WEBVTT 00:00.000 --> 00:12.840 So I'll new go for us. We are going to it using Go. And we have some very nice performance 00:12.840 --> 00:18.640 things to do. I find it Alex Ober, Alex is actually also managing a devroom today 00:18.640 --> 00:26.120 and he just left his completely busy room to talk about how to use PGO in Go, so an extra 00:26.120 --> 00:37.760 round of applause. Thank you. So I guess we can start. Before the talk, I wanted to 00:37.760 --> 00:44.440 ask how many of you actually know what profile-guided optimization is. Great, wonderful. 00:44.440 --> 00:51.040 I like to skip through slides. So, a few words about me: I used to be a C++ engineer, 00:51.040 --> 00:58.880 right now I'm a Rust engineer. Keep in mind, I'm not a Go engineer at all. I wrote several 00:58.880 --> 01:06.160 lines of Go and I'm having that. But I think my point of view will be interesting because 01:06.160 --> 01:14.720 PGO in Go is kind of special compared to, let's say, a more mature PGO from the C++ and 01:14.720 --> 01:20.640 Rust world. It's kind of special. Regarding my interests, I'm interested in compiler 01:20.640 --> 01:27.840 stuff. I was hacking on LLVM. I prefer LLVM stuff compared to GCC. I'm also the Awesome PGO 01:27.840 --> 01:32.960 author; this is a resource that's trying to collect as much information as possible about PGO, 01:32.960 --> 01:41.920 mostly from the C++ and Rust world. But I've covered the Go ecosystem too. And I'm 01:41.920 --> 01:47.520 a software performance devroom organizer this year. So let's start. A bit of theory about PGO, 01:47.520 --> 01:54.080 but we can skip it. Great. So we have the ahead-of-time compilation model. It's pretty simple. 01:54.080 --> 02:02.080 So: source code, binary output. It's executed on the target machine, the regular way. So unfortunately, 02:02.080 --> 02:09.680 we have a just-in-time compilation model too. So it's the Java, V8 stuff. 
So they have 02:09.760 --> 02:15.120 a possibility to not simply compile to some kind of bytecode, but also execute the bytecode 02:15.120 --> 02:25.360 on the target machine. So just-in-time has a really huge advantage here because they can collect 02:25.360 --> 02:32.640 runtime statistics on the target machine: how our code is executed, which parts, which functions 02:32.720 --> 02:38.080 are called frequently, et cetera. And unfortunately, in the ahead-of-time world, we don't have such a privilege. 02:39.840 --> 02:43.920 We need this information because we have a lot of optimizations which depend on it and 02:43.920 --> 02:50.960 can be improved by providing runtime statistics to them. The most important one is inlining; 02:50.960 --> 02:57.920 it is important in every compiler, including the Go one. And the solution is profile-guided optimization: 02:58.480 --> 03:02.720 we just collect runtime statistics, which is called a PGO profile, 03:02.720 --> 03:06.800 pass the profile to the compiler, and use the profile during the compilation phase. 03:07.440 --> 03:13.760 And that's it, our binary is faster, with an asterisk, because it's not actually true for all cases. 03:13.760 --> 03:20.400 First, we need to understand that with PGO we can for now optimize only CPU-intensive stuff 03:20.400 --> 03:28.160 and stuff that is not already optimized. There are some developments, 03:29.680 --> 03:36.560 really fresh developments, less than a week ago, about applying PGO to GPU stuff, device GPU 03:36.560 --> 03:47.440 stuff, but it's highly unstable and available only in LLVM. And okay, our binary will be pretty fast. 03:47.520 --> 03:54.560 So, if you want to know more about PGO internals in the Go compiler, in the main Go compiler, 03:54.560 --> 04:01.280 I highly recommend this talk from GopherCon UK. 
And I highly recommend asking questions to 04:01.280 --> 04:09.520 this guy on GitHub, in the Go repository, because actually he was the only person from the 04:09.600 --> 04:16.800 Go compiler team who answered my PGO-related questions well, from my point of view. 04:18.320 --> 04:29.920 So, why should we care about PGO? Most of these benchmarks are not Go, unfortunately, 04:29.920 --> 04:36.720 unfortunately for you. But the reason is that it's simply much easier to find 04:36.720 --> 04:46.000 ready benchmarks for more mature PGO ecosystems. Let's see it. And most of these benchmarks, actually 04:46.000 --> 04:54.080 all of them, I did myself. So, I'm pretty sure. You can expect almost similar results from 04:54.960 --> 05:01.760 applying PGO in the Go ecosystem, but keep in mind one thing: the Go compiler implements 05:02.720 --> 05:09.600 less aggressive optimizations compared to the C, C++, Rust, GCC or LLVM 05:10.400 --> 05:16.240 ecosystems, because if you've seen at least once the amount of optimization passes in LLVM, 05:16.240 --> 05:21.040 you understand. There are a lot, really a lot, of optimizations inside LLVM. 05:21.920 --> 05:28.320 So, let's talk about PGO in Go. In Go, it was first implemented in 1.20 as a 05:28.320 --> 05:36.240 preview; since 1.21, it's generally available. Much, much later compared to the C and C++ world. 05:37.600 --> 05:44.560 It was implemented by Google, and Google is also the author of sampling PGO, which is 05:45.760 --> 05:54.000 frequently called AutoFDO or simply SPGO. It's just sampling PGO, nothing special, that's it. 05:54.960 --> 06:00.640 Google implemented this kind of PGO for exactly one reason: they wanted to collect 06:00.640 --> 06:07.280 PGO profiles directly from the production environment. 
The regular one, the default PGO mode, 06:07.280 --> 06:17.760 instrumentation mode, gives you the ability to collect much more precise PGO profiles, 06:17.760 --> 06:21.520 but at the cost of a much, much higher instrumentation overhead at run time. 06:22.480 --> 06:29.280 If you hear that you cannot run instrumented binaries in production, that's not completely 06:29.280 --> 06:35.280 true, because the actual answer is: it depends, it depends on your requirements, etc. But the 06:35.280 --> 06:41.040 Go engineers decided you don't need it, and sampling PGO will be enough. 06:42.800 --> 06:51.120 And the consequence of that is that the Go compiler misses a lot of interesting PGO modes, 06:51.200 --> 06:58.560 instrumentation kinds. For example, there are a lot of instrumentation flavors: 06:58.560 --> 07:06.000 before translating to IR, after the inlining phase, combining sampling and instrumentation at once, 07:06.000 --> 07:14.080 trying to fill gaps in the PGO profile by passing it to some machine learning models. 07:14.800 --> 07:22.720 And one of the fanciest PGO kinds is temporal PGO, which optimizes not the run-time 07:22.720 --> 07:28.560 speed of your application, but the start-up time, the cold start. It was implemented by Meta for 07:28.560 --> 07:39.040 mobile devices. It's not available for Go. So unfortunately, only the main Go compiler supports PGO. 07:39.600 --> 07:46.480 LLGo, the LLVM-based compiler for Go, doesn't support PGO at all, and 07:46.480 --> 07:53.440 there is not even a request for this; I forgot to report it to the upstream. And if you've heard about 07:54.320 --> 08:00.960 TinyGo, a subset of Go for embedded, I suppose, TinyGo also doesn't support PGO. 08:00.960 --> 08:07.600 I left a request, I guess one year ago, also in the upstream, and the developers said, yeah, 08:07.600 --> 08:16.560 great idea, give it a try. Okay, that's it. So, I don't care much, sorry. 
So, another interesting 08:16.560 --> 08:25.280 detail is that the Go compiler reuses the pprof ecosystem for gathering PGO profiles. It's a huge 08:25.280 --> 08:33.920 difference compared to the C++ world, because they implemented custom profile formats that are not 08:33.920 --> 08:39.440 compatible between compilers. You need special tooling, which is called AutoFDO, from Google; 08:39.440 --> 08:47.280 this tooling is not so good, it's not so easy to build on your machine, it requires a specific 08:47.280 --> 08:53.760 version of LLVM, you cannot build it on the latest LLVM version, and a lot of other stuff. So, 08:54.720 --> 08:58.240 from this point of view, PGO in Go is much better. 08:58.320 --> 09:06.000 Actually, if you are not happy with the pprof format for some reason, for example, 09:06.000 --> 09:13.040 you have your own proprietary or open source profiling system which was developed 09:13.040 --> 09:19.200 before pprof, you can try to convert your profiles into the pprof-compatible format, 09:19.200 --> 09:28.240 but with nuances. There is official documentation; by the way, the Go documentation about PGO is, 09:28.240 --> 09:34.400 I would say, one of the most user-friendly in the ecosystem, not only in Go, but actually 09:34.400 --> 09:42.320 in the whole PGO ecosystem, because for the C++ world the PGO documentation is terrible. 09:42.400 --> 09:52.160 It was terrible; right now, it's simply bad. So, the Go developers really tried and they did a good 09:52.160 --> 09:59.680 job, trying to describe not only what PGO is, that's what we usually have, but how to use it 09:59.760 --> 10:08.160 in practice, and huge kudos to them, really. And the second interesting detail: 10:09.040 --> 10:14.960 it was designed mainly for service-like workloads. 
For example, if you have a binary 10:14.960 --> 10:21.680 with your HTTP handlers, it is running continuously and you collect PGO profiles from it 10:22.080 --> 10:30.080 from time to time, with some frequency. In the C++ world, instrumentation was the first PGO mode, 10:30.640 --> 10:37.680 and it didn't have such limitations, so usually you got a PGO profile at the exit of your program. 10:38.240 --> 10:45.440 You can highly customize this behavior, but it's pretty undocumented. You need to read a lot of 10:45.520 --> 10:53.440 LLVM sources, et cetera. Compared to that, in Go, once again, you can just 10:55.360 --> 11:03.840 add the net/http/pprof package, and you will get profiles provided over HTTP, which you can simply 11:03.840 --> 11:12.160 gather and dump somewhere to a file. So, let's talk about interesting 11:12.240 --> 11:19.200 possible PGO issues that you can meet in your PGO journey in Go. So, at first, 11:19.200 --> 11:26.960 mismatched and outdated PGO profiles. The official documentation talks about that, 11:26.960 --> 11:36.080 and they handle it very carefully. So, if your profile is kind of outdated because 11:36.480 --> 11:43.360 new code is added, so it will not be covered by the existing PGO profile, or some code 11:43.360 --> 11:48.080 that is already covered by the PGO profile is changed. So, your profile. 11:48.880 --> 12:04.000 Okay, backup, yep, backup is fine. 12:04.000 --> 12:20.160 Yep, yeah, a couple of last engineers here. So, the Go implementation of PGO is pretty stable 12:21.120 --> 12:28.320 and resilient to these changes. However, to avoid problems with 12:28.400 --> 12:36.960 stale profiles, I still recommend you just try to regenerate them as frequently as possible, and that's it. You will 12:36.960 --> 12:46.080 eliminate a really big amount of potential problems with possible performance regressions, 12:46.080 --> 12:53.200 et cetera. 
Please don't even try to play with these things because it's really hard to debug them, 12:53.840 --> 13:03.680 I would say so. And another interesting thing: the C++ ecosystem gives you an ability. 13:04.480 --> 13:10.800 When you provide a really outdated profile to the compiler, you will get, at least from the LLVM 13:10.800 --> 13:15.280 ecosystem, a lot of warnings: mismatched profiles, something like that. 13:15.280 --> 13:20.080 Unfortunately, such functionality is not available for the Go compiler yet. There is a request, 13:21.040 --> 13:27.840 on the screen, but it's not implemented and there is no activity. So, be careful. 13:29.680 --> 13:36.320 And regarding performance regressions, there is an interesting note in the 13:36.320 --> 13:41.120 documentation that you should not expect performance regressions from a wrong PGO profile. 13:42.720 --> 13:49.680 I was kind of curious about such a bold statement, I would say. And of course, it was a lie. 13:50.560 --> 13:57.920 Because there is an issue in the upstream where a person applied a PGO profile to a workload, 13:58.560 --> 14:05.920 and the resulting binary was performing worse than before PGO. 14:06.960 --> 14:13.600 And I would say it's kind of a problem for the whole PGO ecosystem, not only for Go, for one 14:14.400 --> 14:19.440 really annoying reason. If you meet such a thing, what do you do? 14:20.560 --> 14:27.920 You can try to regenerate the PGO profile. Okay, let's say you did try it and it didn't help. 14:27.920 --> 14:35.840 What else? You, as a good engineer, go to the upstream and report it, and they will try to, 14:36.000 --> 14:43.280 so they need a reproduction, they need your code, a minimal reproducible example; they will need 14:44.720 --> 14:51.520 probably your profile or, better, a benchmarking and profile-gathering scenario. You will need to provide it. 14:52.400 --> 15:00.240 Once again, it's a very difficult job. 
And even with all of this information, highly likely 15:00.400 --> 15:08.560 your issue will not be resolved, because they have a lot of other work to do on 15:08.560 --> 15:15.920 the Go compiler, and debugging performance regressions from PGO in your specific case 15:16.320 --> 15:24.800 is not a very fun job to do. We have the same problem in LLVM and GCC. I reported such bugs in 15:24.800 --> 15:32.480 these ecosystems and I got exactly the same answer: silence. So that's it. 15:34.880 --> 15:42.400 And debugging it yourself is a really challenging thing. And I wanted to highlight one thing regarding 15:42.400 --> 15:51.680 one-shot utilities like CLIs, for example a log generator, etc. So remember the service-like 15:51.680 --> 15:59.920 orientation of PGO. So if you have a one-shot utility, for example, you start it, 15:59.920 --> 16:05.280 it runs for, say, 10 seconds, and it finishes. How do you collect PGO profiles 16:06.000 --> 16:14.560 from this workload? Your utility doesn't have an HTTP handler, it's not a service, it's 16:14.560 --> 16:24.080 a one-time utility. Unfortunately, I proposed an idea about some kind of automation 16:24.080 --> 16:30.000 for dumping a PGO profile, like counters, internal counters, etc., at the end of the program, 16:30.000 --> 16:38.800 how it's done in the C++ ecosystem. This idea was rejected, and the main motivation, 16:38.880 --> 16:43.680 all the links are public, of course, on GitHub, the main motivation, as far as I understand, is that 16:43.680 --> 16:50.240 they don't actually care as much about such workloads. I cannot blame them because 16:50.240 --> 16:59.920 a lot of us, a lot of you, are writing services in Go, not utilities. But anyway, 17:00.480 --> 17:07.840 if you want to dump PGO profiles, you will need to implement it by yourself. 
For example, 17:08.480 --> 17:16.400 once again, an example from the documentation: you will just need to use the pprof ecosystem 17:16.400 --> 17:24.720 and write some code manually. Compare it to the Rust ecosystem, for example: in Rust, 17:24.720 --> 17:28.560 there is a really great, 18:25.120 --> 18:34.400 you can get a copy of it because as you know, it will also form an app as it grandsons. 18:38.720 --> 18:51.840 To begin with, this recommendation mandatory, so for example, the pres yours there. 18:51.840 --> 18:55.840 in variable variables, and possible compiler switches. 18:58.840 --> 19:02.840 Unfortunately, this use case is ugly in every ecosystem, 19:02.840 --> 19:05.840 because it's a multi-language build. 19:05.840 --> 19:08.840 In a multi-language build, you will usually have a build system 19:08.840 --> 19:10.840 which is oriented only to one language: 19:10.840 --> 19:16.840 Cargo in Rust, whatever in C and C++, 19:16.840 --> 19:18.840 they have a bunch of them. 19:19.840 --> 19:21.840 So whatever, it's always ugly. 19:21.840 --> 19:26.840 Unfortunately, no build system can really automate 19:26.840 --> 19:30.840 this use case well, for actually 19:30.840 --> 19:33.840 a lot of interesting reasons. 19:33.840 --> 19:36.840 For example, so how to implement it manually. 19:36.840 --> 19:41.840 You will need to manually pass to the C compiler, 19:41.840 --> 19:43.840 for example if you have a C dependency, 19:43.840 --> 19:46.840 all the corresponding PGO flags. 19:47.840 --> 19:49.840 And here you go: 19:49.840 --> 19:51.840 different PGO modes. 19:51.840 --> 19:55.840 You will need to read the whole documentation 19:55.840 --> 19:59.840 regarding PGO for your GCC or Clang compiler. 19:59.840 --> 20:02.840 And remember, this documentation is not that good, 20:02.840 --> 20:05.840 especially in the GCC part. 20:05.840 --> 20:08.840 You will need to choose the right PGO mode, 20:08.840 --> 20:12.840 because there are a lot of them, especially in LLVM. 
20:12.840 --> 20:15.840 And you can have really interesting combinations. 20:15.840 --> 20:18.840 For example, sampling PGO for the Go part, 20:18.840 --> 20:21.840 and instrumentation PGO for the C part. 20:21.840 --> 20:25.840 And of course, PGO profiles are not compatible at all 20:25.840 --> 20:27.840 between compilers. 20:27.840 --> 20:30.840 GCC and LLVM use their own formats, 20:30.840 --> 20:32.840 and pprof is not compatible at all. 20:32.840 --> 20:37.840 You will need to get this file from the C compiler, 20:37.840 --> 20:40.840 prepare it in the right way. 20:40.840 --> 20:43.840 You will need to learn the right tools; 20:43.840 --> 20:47.840 especially for GCC, they are not trivial. 20:47.840 --> 20:50.840 And pass it in the right way to the compiler, once again, 20:50.840 --> 20:51.840 to the C compiler. 20:51.840 --> 20:54.840 Of course, all of these things should be done via 20:54.840 --> 20:56.840 cgo environment variables. 20:56.840 --> 20:59.840 I wouldn't say it's impossible to do, 20:59.840 --> 21:01.840 but it's tedious stuff to do. 21:01.840 --> 21:04.840 It's manual labor. 21:04.840 --> 21:07.840 And believe me, a lot of likes, 21:07.840 --> 21:09.840 it could be a problem. 21:09.840 --> 21:12.840 And if you want to do it in a reproducible way, 21:12.840 --> 21:15.840 like write a script, believe me, 21:15.840 --> 21:17.840 such a script will be really ugly. 21:17.840 --> 21:19.840 For example, if you're trying to package, 21:19.840 --> 21:22.840 writing a recipe for a Go application 21:22.840 --> 21:23.840 with a native dependency, 21:23.840 --> 21:26.840 and you want to write a routine which 21:26.840 --> 21:29.840 optimizes the whole application with PGO 21:29.840 --> 21:33.840 during the build process of the package from this recipe. 
21:33.840 --> 21:36.840 That is, for example, a regular use case for PGO- 21:36.840 --> 21:38.840 optimized applications, for example, 21:38.840 --> 21:42.840 for rustc or for Clang, how they are packaged 21:42.840 --> 21:44.840 in the distributions, especially 21:44.840 --> 21:46.840 performance-oriented distributions. 21:46.840 --> 21:49.840 Regarding PGO with cgo, 21:49.840 --> 21:53.840 I actually tried to find any information. 21:53.840 --> 21:57.840 And I found only one topic on the Rust forum, 21:57.840 --> 22:01.840 and one person actually asked about exactly this use case. 22:01.840 --> 22:05.840 And as you see, not so many people were interested 22:05.840 --> 22:07.840 in the topic, so it was automatically closed. 22:07.840 --> 22:09.840 Not interested. 22:09.840 --> 22:11.840 So that's it. That's all the information. 22:11.840 --> 22:16.840 So you will meet a lot of interesting 22:16.840 --> 22:19.840 traps on your journey. 22:19.840 --> 22:22.840 So, reproducibility and PGO, 22:22.840 --> 22:24.840 that's really a huge topic. 22:24.840 --> 22:26.840 For example, when I was talking 22:26.840 --> 22:30.840 to any person like a maintainer from a distribution, 22:30.840 --> 22:32.840 it's their first question: 22:32.840 --> 22:36.840 how to keep reproducibility when PGO is enabled? 22:36.840 --> 22:40.840 So here there are actually two separate cases. 22:40.840 --> 22:44.840 The first one: reproducibility with a saved PGO profile. 22:44.840 --> 22:47.840 You dump a profile and commit it to a VCS. 22:47.840 --> 22:50.840 For example, Go once again did a great job: 22:50.840 --> 22:53.840 there is a standard default.pgo. 22:53.840 --> 22:56.840 We don't have such an option in C++, 22:56.840 --> 22:58.840 and we really suffer a bit from that. 22:58.840 --> 23:01.840 And the second: reproducible PGO profile generation. 23:01.840 --> 23:04.840 So the first case: just save the profile, 23:04.840 --> 23:07.840 but please keep in mind, it can be stale. 
23:07.840 --> 23:11.840 You need to regenerate it with some frequency, et cetera, et cetera. 23:11.840 --> 23:13.840 Like outdated profiles. 23:13.840 --> 23:17.840 The second case, reproducible PGO profile generation, 23:17.840 --> 23:19.840 is impossible to achieve. 23:19.840 --> 23:22.840 Don't even try, because it requires 23:22.840 --> 23:25.840 deterministic execution of your application. 23:25.840 --> 23:30.840 And, so if we're talking about huge applications, 23:31.840 --> 23:33.840 it's not a viable option to achieve. 23:33.840 --> 23:36.840 So just don't waste your time 23:36.840 --> 23:37.840 on it. 23:37.840 --> 23:41.840 Regarding saved PGO profiles, there is another thing 23:41.840 --> 23:45.840 I met. For example, when you find an open source project, 23:45.840 --> 23:48.840 for example, this project is file.d, 23:48.840 --> 23:53.840 that's like a log processor written in Go by Ozon. 23:53.840 --> 23:58.840 And I found this project, and I found default.pgo; 23:59.840 --> 24:02.840 that means the project is optimized with PGO, 24:02.840 --> 24:04.840 and I asked this question: 24:04.840 --> 24:08.840 how did you collect this PGO profile? 24:08.840 --> 24:10.840 The answer was silence. 24:10.840 --> 24:15.840 So when you try to build this application, 24:15.840 --> 24:18.840 this profile will be automatically applied 24:18.840 --> 24:22.840 by the Go compiler, because it's default.pgo, 24:22.840 --> 24:25.840 but you don't know the scenario 24:25.840 --> 24:27.840 where this profile was gathered. 24:27.840 --> 24:31.840 There is no way to get this information 24:31.840 --> 24:33.840 except for asking maintainers. 
24:33.840 --> 24:36.840 Actually, I found this guy in a Telegram chat 24:36.840 --> 24:38.840 in some local group, and asked, 24:38.840 --> 24:40.840 if I remember correctly, asking directly, 24:40.840 --> 24:43.840 and he told me about the scenario, et cetera, et cetera, 24:43.840 --> 24:47.840 but there is no publicly available information on the internet yet. 24:47.840 --> 24:50.840 Keep it in mind when you see any open source project 24:50.840 --> 24:54.840 with a provided PGO profile. 24:54.840 --> 24:58.840 So, at scale: a bunch of additional issues. 24:58.840 --> 25:02.840 You need to collect profiles from hundreds of services, 25:02.840 --> 25:05.840 from thousands of machines. 25:05.840 --> 25:07.840 You need to implement proper gathering, 25:07.840 --> 25:10.840 symbolizing, storing, cleaning routines 25:10.840 --> 25:12.840 for all of these PGO profiles, 25:12.840 --> 25:14.840 because you will have thousands of them, 25:14.840 --> 25:17.840 especially if we are talking about large fleets, 25:17.840 --> 25:20.840 like big tech stuff. 25:20.840 --> 25:23.840 And of course, you need to make this process 25:23.840 --> 25:27.840 robust, et cetera, et cetera. 25:27.840 --> 25:29.840 Tracking, for all of these profiles, 25:29.840 --> 25:32.840 skew, and how outdated they are, 25:32.840 --> 25:34.840 for your services, raising alerts, 25:34.840 --> 25:36.840 we're talking about actually 25:36.840 --> 25:38.840 pretty strong enterprise here, et cetera. 25:38.840 --> 25:40.840 You don't want to implement all of these things 25:40.840 --> 25:41.840 from scratch. 25:41.840 --> 25:45.840 Luckily, we have several solutions. 25:45.840 --> 25:48.840 That's not our way, using PGO manually 25:48.840 --> 25:49.840 at such a scale. 25:49.840 --> 25:51.840 It's not a viable option, believe me. 25:51.840 --> 25:53.840 There are open source solutions. 25:53.840 --> 25:55.840 Go, once again: the first one is Parca, parca.dev. 
25:55.840 --> 25:57.840 It's an open source one. 25:57.840 --> 26:00.840 And another one is a solution from Yandex. 26:00.840 --> 26:04.840 Yandex Perforator was open-sourced 26:04.840 --> 26:07.840 exactly one year ago, 26:07.840 --> 26:10.840 one day before the previous FOSDEM. 26:10.840 --> 26:13.840 And there are actually possibilities 26:13.840 --> 26:16.840 to extend other open source solutions, 26:16.840 --> 26:18.840 like Grafana Pyroscope, copetanlas, 26:18.840 --> 26:19.840 or Fireing Platform. 26:19.840 --> 26:22.840 I opened requests to them 26:22.840 --> 26:24.840 regarding extending their functionality 26:24.840 --> 26:29.840 for doing PGO: silence in both cases. 26:29.840 --> 26:33.840 So, of course, if you're large enough, like Google 26:33.840 --> 26:35.840 or whatever big tech, you can write 26:35.840 --> 26:37.840 your own profiler, and that's a viable option. 26:37.840 --> 26:40.840 For example, Google, Google-Wide Profiling, 26:40.840 --> 26:42.840 a zone vision at a zone, 26:42.840 --> 26:44.840 and many, many other systems at profiler 26:44.840 --> 26:46.840 is very, very low data, 26:46.840 --> 26:50.840 also has a linear one, probably ready to do it. 26:50.840 --> 26:55.840 So, however, if you decide to use 26:55.840 --> 26:58.840 these solutions, these solutions are kind of complicated 26:58.840 --> 27:00.840 to admin, because it's large-scale 27:00.840 --> 27:04.840 stuff: you need to deploy several parts, 27:04.840 --> 27:06.840 you need to 27:06.840 --> 27:08.840 admin different databases, 27:08.840 --> 27:10.840 you need to monitor them, etc., 27:10.840 --> 27:12.840 for example, object storage, 27:12.840 --> 27:14.840 free open source stress, probably safe, 27:14.840 --> 27:16.840 so if you produce, it's not a 
27:16.840 --> 27:18.840 fun job to admin it. 27:18.840 --> 27:20.840 That's actually the same picture, 27:20.840 --> 27:22.840 but for Yandex Perforator: 27:22.840 --> 27:24.840 once again, Postgres, ClickHouse, 27:24.840 --> 27:26.840 an S3-compatible object store. 27:26.840 --> 27:28.840 Okay, that's not that fun, 27:28.840 --> 27:30.840 but you need to do it 27:30.840 --> 27:32.840 if you want to use it. 27:32.840 --> 27:33.840 Let's quickly compare, 27:33.840 --> 27:35.840 because I don't have much time, 27:35.840 --> 27:36.840 four minutes. 27:36.840 --> 27:38.840 So, both projects are alive, great. 27:38.840 --> 27:40.840 Unfortunately, 27:40.840 --> 27:42.840 Perforator is written in C++, 27:42.840 --> 27:44.840 because Yandex uses mainly C++; 27:44.840 --> 27:46.840 Parca is written in Go. 27:46.840 --> 27:48.840 So, if you decide to extend, 27:48.840 --> 27:50.840 Parca is your way to go, 27:50.840 --> 27:52.840 please don't touch C++. 27:52.840 --> 27:54.840 Licenses are both fine, 27:54.840 --> 27:56.840 Apache License 2.0. 27:56.840 --> 27:58.840 The documentation, actually, 27:58.840 --> 28:00.840 I would say, both could be improved, 28:00.840 --> 28:06.840 but Parca doesn't cover the SPGO part 28:06.840 --> 28:08.840 of the functionality at all, 28:08.840 --> 28:11.840 only one commit in the git commit history, 28:11.840 --> 28:12.840 where they said, 28:12.840 --> 28:14.840 we added an end-to-end test 28:14.840 --> 28:17.840 to cover the SPGO functionality. 28:17.840 --> 28:18.840 That's it. 28:18.840 --> 28:21.840 Yandex covers sampling PGO 28:21.840 --> 28:23.840 very well, because they use it. 28:23.840 --> 28:25.840 Paid support, 28:25.840 --> 28:26.840 if you're interested: 28:26.840 --> 28:27.840 Perforator, no, 28:27.840 --> 28:28.840 and they are not interested, 28:28.840 --> 28:29.840 I'm pretty sure; 28:29.840 --> 28:31.840 Parca, yes, they provide it. 
28:31.840 --> 28:33.840 PGO support for Go, 28:33.840 --> 28:34.840 it's an important part. 28:34.840 --> 28:36.840 Parca supports it. 28:37.840 --> 28:40.840 It's written documentation. 28:40.840 --> 28:42.840 Perforator supports it, 28:42.840 --> 28:43.840 but in theory, 28:43.840 --> 28:46.840 because they didn't try to use it in production yet, 28:46.840 --> 28:49.840 because most of the services are written in C++, 28:49.840 --> 28:52.840 and C++ is extensively optimized with PGO 28:52.840 --> 28:54.840 from Perforator; Go, 28:54.840 --> 28:56.840 not yet, but we want to achieve it. 28:56.840 --> 28:58.840 PGO support for other languages: 28:58.840 --> 29:00.840 Parca, no; 29:00.840 --> 29:02.840 there is an issue in the upstream, 29:02.840 --> 29:04.840 no activity yet, 29:04.840 --> 29:06.840 I'd say no, they don't plan to implement it. 29:06.840 --> 29:08.840 Perforator, yes, 29:08.840 --> 29:10.840 and they are interested in it a lot, 29:10.840 --> 29:12.840 especially in native languages, 29:12.840 --> 29:13.840 because once again, 29:13.840 --> 29:16.840 most of the services are written in C++, 29:16.840 --> 29:18.840 and they will support 29:18.840 --> 29:20.840 it definitely in a good way. 29:20.840 --> 29:22.840 So, in the last 29:22.840 --> 29:24.840 minute, a word about PLO, 29:24.840 --> 29:25.840 what is it? 29:25.840 --> 29:27.840 Perforator supports it, 29:27.840 --> 29:28.840 but with nuances, 29:28.840 --> 29:30.840 and Parca doesn't. 29:30.840 --> 29:33.840 PLO, it's actually PGO on steroids. 29:33.840 --> 29:35.840 It optimizes the binary 29:35.840 --> 29:36.840 even after PGO; 29:36.840 --> 29:37.840 the most important optimization 29:37.840 --> 29:40.840 is reordering functions inside the binary 29:40.840 --> 29:45.840 to make it friendlier to your CPU instruction cache, 29:45.840 --> 29:48.840 because there will be fewer cache misses, 29:48.840 --> 29:51.840 instruction cache misses, during the execution. 
29:51.840 --> 29:53.840 Available open source tools at the moment: 29:53.840 --> 29:54.840 LLVM BOLT, the main one; 29:54.840 --> 29:56.840 Google Propeller 29:56.840 --> 29:57.840 is the second one; 29:57.840 --> 29:58.840 Intel Thin Layout 29:58.840 --> 29:59.840 Optimizer, 29:59.840 --> 30:01.840 archived, rest in peace. 30:02.840 --> 30:04.840 So, the performance impact 30:04.840 --> 30:09.840 from using PLO for Go 30:09.840 --> 30:10.840 is really huge. 30:10.840 --> 30:11.840 You can check: 30:11.840 --> 30:14.840 Huawei implemented this functionality, 30:14.840 --> 30:17.840 but unfortunately, I have bad news for you. 30:17.840 --> 30:19.840 Upstream 30:19.840 --> 30:23.840 didn't agree to accept 30:23.840 --> 30:25.840 this change into the Go linker, 30:25.840 --> 30:27.840 because it requires some additional 30:27.840 --> 30:29.840 modifications and changes 30:29.840 --> 30:30.840 in the Go linker, 30:30.840 --> 30:33.840 and the Huawei guys just decided 30:33.840 --> 30:35.840 they are not motivated enough 30:35.840 --> 30:36.840 to push it into the upstream, 30:36.840 --> 30:37.840 and that's it. 30:37.840 --> 30:39.840 So, the Go ecosystem 30:39.840 --> 30:41.840 doesn't have PLO; 30:41.840 --> 30:43.840 C++ has. 30:43.840 --> 30:45.840 So, summary: 30:45.840 --> 30:46.840 PGO is great, 30:46.840 --> 30:48.840 local PGO optimizations. 30:48.840 --> 30:49.840 That's it. Thank you. 30:49.840 --> 30:50.840 Thank you. 31:00.840 --> 31:02.840 Thank you.