So, thanks for inviting me and for organizing a track on our favorite topic.

I don't think I have ever spoken at a conference where the previous talk, among other things, was about a project that won the Nobel Peace Prize. No, no, the Nobel Prize in Physics, I think. So it's a high bar to follow, and the other talks have been good, but I will try to do my best.

I'm here to talk about what I have been calling continuous performance engineering, something I have been doing for the past 10 years in various database companies, and now with my own company, trying to productize and evangelize it. While preparing this presentation I realized that everybody else is talking about continuous benchmarking. So this is where we are: we haven't even agreed on what we should call this thing that we know we need to be doing. Maybe after this talk I will adopt that term as well, because I realized the majority is gravitating there.

There was a poll earlier, but maybe people have moved around a little bit. This is of course FOSDEM, and out of FOSDEM this is the few hundred people who actually care about performance. Option one: in your project, not just for you personally, there aren't really any benchmarks and nobody is doing benchmarking. Yes, and for some projects this is actually valid. Then maybe the second option: yes, we do have benchmarks, but they don't run continuously every day; they run once a year, with the release candidate, and we're told whether it passed. Ten or so people, maybe.

So things are not as bad as they could be, we are making progress. I think number three is probably going to be popular: we do have benchmarks running every day in continuous integration, but we don't actually look at the results, or even if you try to look at the results, they are not very useful. Who would identify with this one?

Seriously? Okay, so number four: we have benchmarks in continuous integration, we run them at least every day, or maybe on every commit in the best case, and we get actionable results; we assign tickets to the engineer who caused the regression, and regressions are fixed within weeks rather than in December before Christmas.

Okay, and most people are just listening, that's fine.
So, a fairly even distribution. This is still better than some years ago when I did the same poll and there was maybe one person in number four. But of course, out of all of Europe, the engineers who care about performance might all be in this room, so it's not so bad, we are making progress. Still, if you asked the same question about unit testing, or testing in general, like who has automated testing, and not even flaky tests but tests with good results, where if something fails we probably catch it in the pull request and don't even merge it to main, then basically everybody has been doing that for 10 or 15 years at least.

In 2006 I had a project at Nokia where one guy actually had the phone in his hand, and with his own fingers he ran a hundred tests, physically pressing the different buttons in the user interface. He was finished at three or four in the afternoon, went home, and the next morning he took another phone and ran the same program again. That of course never happens anymore. But with performance we are a little bit behind, I would say. It's not a given that all projects have some performance tests, that we know which tools to use, that we can just put a few YAML lines into GitHub and it just works. So we are, I would say, 10 or 20 years behind. We are like the velociraptor, which is the fastest dinosaur: it's fast, but it's not quite there.

And where is the author of that slide? Right here? Yes. I had to steal your slide because it really is the problem. This is exactly the problem.

I think there are some other problems too. One is that performance work and benchmarking are kind of difficult. Automated deployment, I think, was also difficult until we had tools like Terraform or something similar that take care of the hard problem. But the thing with automated deployment is that when we build software, we do need the deployment; otherwise there is no point in what we are doing. Whereas you can kind of postpone the performance work until customers really complain. So this might be one reason: you really had to solve deployment, and for benchmarking you also need deployment, so there is a dependency, you need to solve the deployment problem first anyway.

Now that I have had a startup for two years and have been trying to sell performance and benchmarking related services to companies, one thing I find is that performance engineers are often very busy. It might take weeks or months to even talk with them, because they are at some important customer fixing performance issues in production.
And because we are busy fixing performance issues in production, we don't have the capacity to evolve the tooling that would let us fix them before they get into production. The other thing is, and I used to do this as well, that sometimes it's kind of nice to be the hero: you get to work with the most important customers, you go there, spend some days doing something, and then it's fixed and everybody is really grateful, the account manager takes you to a nice restaurant, and then you fly to the next place and do the same thing again. So maybe we don't want to fix it. I don't know.

Certainly, and this was genuinely my slide before I saw the Datadog talk, part of the problem is that math is kind of difficult, but unfortunately we are going to need it. So that is probably one problem as well. And later, when we talk about tuning, which has also been covered a little bit in previous talks here, I find that for most performance engineers it is actually unintuitive how to tune a server to get repeatable results.

So what I'm trying to say is that maybe we need to do what was done for QA testing and deployment. Some years ago there was the DevOps movement; the first time I heard about DevOps was in Boston, probably 15 years ago or so. And if there were to be a continuous benchmarking movement, these are some people, conferences, open source projects and companies that you might want to follow. Scott Moore is very active on YouTube and LinkedIn, so he's a good place to start. And, if it wasn't obvious, Nyrkiö here is the company that I have been pushing for two years, and the goal is to mainstream some open source tools that have been developed and open sourced over the past 10 years. So I'm here to tell you what I've learned and made available in the past 10 years, and I hope that you will also tell others about these solutions: continuous benchmarking actually is possible and achievable, and getting repeatable, stable results is something we can do if we share the knowledge of how to do it.

From here on the talk has three points. If you ever watch the Finnish President on TV, there is not one interview where the answer isn't structured into three points, so I believe this is a good way to go. Now, for the first one, and we had kind of synchronized this beforehand: benchmark design in itself is already hard, let alone doing it repeatably or continuously. So I'm not going to repeat a lot of what was already in the previous talk, and part of this is fairly well established already. The nice thing is that we have these frameworks; typically each language has one or two. So if you are writing in Java you use JMH, and so on.
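To make that concrete, here is a minimal sketch of what such a microbenchmark can look like. The talk mentions JMH for Java; this sketch uses Python's standard-library timeit instead, and `build_index` is a made-up stand-in for whatever code you actually want to measure.

```python
# Minimal microbenchmark sketch using only Python's standard library.
# `build_index` is a hypothetical stand-in for the code under test;
# in Java the equivalent would be a JMH @Benchmark method.
import statistics
import timeit

def build_index():
    # Made-up workload: sort a reversed list as a stand-in for real work.
    data = list(range(100_000, 0, -1))
    data.sort()
    return data

# Repeat the timed run several times so we can report a spread,
# not just a single number.
samples = timeit.repeat(build_index, number=10, repeat=5)
print(f"median {statistics.median(samples):.4f}s  "
      f"min {min(samples):.4f}s  max {max(samples):.4f}s")
```

Whatever framework you use, the important part for the rest of this talk is that it emits results you can collect run after run.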
So that part is fairly well understood, and of course if you use these frameworks you can relatively easily put them into a GitHub workflow or Jenkins and so on. So here we have a good starting point.

I have also, in my career, developed frameworks to run distributed benchmarks: deploy an entire MongoDB cluster, then deploy some application that creates a workload, then run the benchmark, and then, because it's in the cloud, all these servers disappear, so you need to collect all the log files and results and so on. It's quite complicated to do that. I believe this is an area where solutions are not very established yet, but I see a lot of opportunity for innovation here. So I'm not going to talk more about that; I just want to highlight it as an area where things aren't very established. But even if you start with single-node benchmarks using the well-established tools, that's a good starting point.

I want to mention one open source project that is relatively popular and active, and which I have also forked into a new version: github-action-benchmark. It has existed for at least six years, I think, and has several contributors. What this project does is: you run your benchmark with the frameworks from the previous slide, and github-action-benchmark has parsers that understand the output of all of them, and you can of course contribute more. I saw some activity recently; I have also contributed a parser for when you just want to use the `time` command line utility to execute something from the command line, which of course isn't a very high-fidelity result, but it can be a simple way to do at least something. Hopefully that will get merged upstream.

What the tool then does is collect your results, and there is an interesting way of basically maintaining a database, really a JSON file, inside your GitHub repository. I wouldn't do that, because your analytics gets embedded in the test history itself, but it's a simple way and it doesn't cost anything when it lives in your GitHub repository. And then it has threshold-based alerts, and the default threshold highlights the problem I'm talking about in this talk: the default threshold in this tool is 100%. So if your performance degrades 2x, then it will create a ticket or, depending on configuration, fail the pull request and block it. But 2x is quite large, is the point; we would want to catch regressions much smaller than that.
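In pseudocode terms, threshold-based alerting boils down to a one-line comparison against the previous result. The sketch below only illustrates that idea; the numbers and the function name are made up, and this is not the actual implementation inside github-action-benchmark.

```python
# Illustration of threshold-based alerting: flag a regression when the
# new result is slower than the previous one by more than `threshold`.
# With threshold=1.0 (i.e. 100%, the default described in the talk),
# only a 2x slowdown would trigger an alert.
def threshold_alert(previous: float, current: float, threshold: float = 1.0) -> bool:
    """Results are durations, so larger is worse."""
    return current > previous * (1.0 + threshold)

print(threshold_alert(previous=120.0, current=250.0))                  # True, more than 2x slower
print(threshold_alert(previous=120.0, current=130.0))                  # False, only ~8% slower
print(threshold_alert(previous=120.0, current=130.0, threshold=0.05))  # True with a 5% threshold
```

And of course, the moment you lower the threshold to catch small regressions, noise starts triggering it, which is exactly the problem the rest of the talk is about.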
And this leads to topic number two: how do we do change point detection, and what is change point detection? So, topic number one: we have designed benchmarks, and we try to design them well so that we get useful results, but they are still noisy.

This is from an old publication we did at MongoDB. Excuse me, I think I need to take my medicine; I should have taken it earlier. So this was a MongoDB test, and at least the first one is one of the worst-case scenarios that we had in 2015. In this picture there is one regression on one of the graphs, but everything else is noise. So how do you work with this? Even looking at it as a human it's not easy to spot where the regression is, let alone having something automated, with no human looking at it, that should find the regression.

But this is normal. In the Datadog case, for example, there was 20 to 30% up and down, and you, for example, had variance in your slide. At MongoDB we decided to look at the range from the minimum to the maximum result, which is stricter, but we felt it was meaningful because we want all test runs to behave well. And then you sometimes get up to 70% outliers, but there is also an actual change that is persistent, caused by something in the actual git commit at that point.
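As a small aside, that minimum-to-maximum range is easy to compute for yourself. The sketch below, with made-up numbers, is a minimal version of the metric described here, expressed relative to the median result.

```python
# Minimal sketch of the min-to-max range metric described above:
# instead of (only) a standard deviation, look at the full spread from
# the slowest to the fastest of the repeated runs.
from statistics import median

def min_max_range_pct(results: list[float]) -> float:
    """Spread from min to max as a percentage of the median result."""
    return 100.0 * (max(results) - min(results)) / median(results)

# Illustrative numbers only: one run out of five is a large outlier.
runs = [1020.0, 980.0, 1710.0, 1005.0, 995.0]
print(f"{min_max_range_pct(runs):.1f}% min-to-max range")
```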
What this meant in practice: when I was on the team there were four of us, so each month each of us had one week where we were attached to triaging the results coming out of the CI system. And you can imagine that when the input data looks like this, mostly they were false alarms.

This is something more recent, from the Nyrkiö side. Some of our users choose to publish their results, which is convenient especially for open source projects because contributors don't need to create an account. This one is the Turso database, a Finnish-American project, very popular; they get something like 20 pull requests per day, many of them from external contributors. Two simple tests, SELECT 1 and SELECT COUNT(*), running on the default GitHub runner. Once in a while you get these kinds of hiccups, 40 to 50% up and down. With the human eye you can of course see where the normal result is, the one without these effects from the infrastructure.

Now, if we do threshold-based alerting on this kind of data, from MongoDB or the Turso database, which is the first idea that comes to mind, this is what happens. Here is a generated data set, and in these two graphs the green line is an actual change, which could be a regression, and everything else is false alarms. In 2015 this was my job: one week every month I would come to work on Monday and look through the issues created out of all of those, and you knew already, before looking at them, that maybe today 100% of them would be false positives, or maybe once a week there would be an actual regression that you could then file and send to the developer to fix.

We were lucky. The first thing we tried other than threshold-based alerting was an algorithm from Matteson and James that had just been published a year earlier. You need to understand some math, and we were also lucky that we had an intern who actually studied math, and later went back for a master's and PhD, who did the first version of what is today the Otava project, incubating at Apache. And this is the same graph, the same time series, in both of these plots: the top one is the change point detection algorithm, and the bottom one just compares against the previous point by some percentage and raises an alert.

The key to getting this kind of better behavior is that it looks at more data than just the previous point or some small window. And the key word here is "monitoring": many monitoring tools that have these kinds of functions have been designed for monitoring production, and that is a different problem than finding regressions in a pull request before you commit. Why? In production, the two phenomena you have are outliers, which can be a spike or a single event, and change points, which mean there is a persistent change, like in the middle of this demo data set. We are only interested in change point detection: if there is a regression in the actual code, it means that when we rerun the benchmark tomorrow, the regression will still be there, and everything else, from network flakiness or noisy neighbors and so on, we want to ignore. But a lot of the existing tooling is actually designed to also alert you on outliers, and for us that's a distraction.

Okay, so that is change point detection. You can check it out at Apache. This was transformative for me and the team, and since we published it, it has been used by many other companies. That was the first time we started believing that continuous benchmarking is possible, in the sense that not only can we execute the benchmarks, but we can also automatically analyze the results in an actionable way.
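To give a feel for the idea without the full statistics, here is a deliberately simplified sketch. It is not the Matteson and James E-divisive algorithm that Apache Otava (incubating) implements; it only illustrates the core point made above, that looking at the whole series and asking where the level shifts persistently behaves very differently from comparing each point to its predecessor.

```python
# Deliberately simplified illustration of the change point idea: scan
# every possible split of the series and keep the one where the mean
# before and after differs the most. Real change point detection
# (e.g. E-divisive in Apache Otava) adds proper statistics on top.
from statistics import mean

def best_split(series: list[float]) -> tuple[int, float]:
    """Return (index, shift) of the split with the largest shift in mean."""
    best_i, best_shift = 0, 0.0
    for i in range(2, len(series) - 1):
        shift = abs(mean(series[i:]) - mean(series[:i]))
        if shift > best_shift:
            best_i, best_shift = i, shift
    return best_i, best_shift

# Made-up series: one outlier at index 3, a persistent change from index 7.
series = [100, 103, 98, 135, 99, 102, 100, 131, 128, 133, 130, 129]
i, shift = best_split(series)
print(f"most likely change point at index {i}, mean shift {shift:.1f}")
```

A point-to-point threshold would fire on the single outlier at index 3 at least as loudly as on the persistent change, which is exactly the distraction described above.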
Back then we actually also did another project, where we looked at the other problem: why is there so much noise in the benchmark results in the first place? There is a little bit of overlap here with the previous talks, but let's say, if you were here for the Datadog talk, this is the "wrong answers only" slide. We already know the answer, but if I ask people sitting in this room today, the typical answers are: well, you shouldn't use cloud infrastructure for benchmarking, we probably had noisy neighbors, or there are some otherwise bad cloud instances. The previous talk actually covered this, and it's true that sometimes you observe this correctly and there is a reason why this kind of change in performance happens. But it's not across the whole population, and it's not well understood what the reasons are why we can't get easy-to-analyze, reliable performance results.

So again we had to go back to our core principles: let's apply the scientific method, not just math. Let's stop quoting things we read on somebody's blog, let's get to work, run the experiment, see what the results are, and then repeat. Two of our engineers spent three months where the objective was to find out what would be a good configuration, in Amazon in our case, to minimize this range from the minimum to the maximum result of benchmarks where we knew there was no regression; in fact we ran the same benchmark every time, 25 times.

And here is a fun exercise you can do; I'm not going to read all of them, we will come back to this. We retroactively listed the assumptions, because this is how science works: you have a hypothesis, then you test, and the test either confirms or invalidates the assumption, and then you go to the next one. These were assumptions that were built into our benchmarking infrastructure, but they had never been validated; they were just accepted as fact.

For example, if you read the third one, there was this kind of belief that when you launch cloud instances you can sometimes get bad instances for some reason, maybe a noisy neighbor, or maybe they're just bad without neighbors. Because of this we had a system where we would run some kind of test benchmark first, and if the result wasn't above some threshold we would shut everything down and try again, up to three times, and the third time we would just give up and run the benchmark anyway.

So what we did instead was take those three months, during which we didn't really test MongoDB that much, and test our own infrastructure: can we get better results out of it? We stopped on a specific git commit that we knew was good, so it was always the same binary, because you don't want too many moving parts; you only want to change one thing in the test and keep everything else constant. Then we launched five different servers and repeated each test five times on each. What follows here is copied again from an old publication; it's what I did.
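A sketch of how you might check the "bad instance" hypothesis from that experiment: run the same benchmark several times on each freshly launched server and compare the spread between servers to the spread within a server. The numbers and host names below are made up for illustration.

```python
# Sketch of the "bad instance" check: five freshly launched servers,
# five runs of the same benchmark binary on each. If one server were
# systematically bad, the spread between per-server means would
# dominate the spread within a single server.
from statistics import mean, stdev

results = {
    "server-1": [101.0, 97.0, 104.0, 99.0, 102.0],
    "server-2": [100.0, 103.0, 96.0, 105.0, 98.0],
    "server-3": [99.0, 102.0, 101.0, 97.0, 103.0],
    "server-4": [104.0, 98.0, 100.0, 102.0, 96.0],
    "server-5": [97.0, 101.0, 103.0, 99.0, 100.0],
}

per_server_means = {host: mean(runs) for host, runs in results.items()}
within = mean(stdev(runs) for runs in results.values())
between = stdev(per_server_means.values())

for host, m in per_server_means.items():
    print(f"{host}: mean {m:.1f}")
print(f"average spread within a server: {within:.2f}")
print(f"spread between server means:    {between:.2f}")
```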
And this is what it looks like. Now, immediately when you see this result, what can we see from it that validates or invalidates the assumption on the previous slide? The first five dots on each line are from the same server, dots five to ten are from the next server, and so on. It does not show that suddenly there is a server where the five dots are significantly different. You see maybe, in the first test, a bit of a cold cache effect, where runs one to four out of five are faster, and generally there is a lot of variability here, but the variability does not correlate with the points 5, 10, 15 where we changed server. So this assumption was false.

And imagine: this was also at the time when we paid for Amazon instances per hour. Imagine we would start a 16-node sharded cluster and, in the worst case, it would run this test and say, oh no, this is a bad server, let's shut down everything, even though we had already paid for an hour, and start 16 new ones. This was all based on a false assumption. But this does happen: the reason it didn't happen for us is that we used the C3 family of instances, which are a bit more expensive and better quality. If you use the M instances, it's true that you can get different CPU generations even if you are using the same instance type from Amazon; I think they sometimes give you a better CPU than you are paying for, or something like that. But with the C family, still today, this does not happen. So this is also a good lesson for when you go home: you could make the same mistakes. You may have listened to Henrik at FOSDEM saying that this does not happen, but actually it can happen if you don't have the exact same instances that I had, so you need to do the testing yourself, with a rigorous, scientific attitude. So that assumption was covered.

Once we iterated over this and changed various configurations, we found, for example, that the infrastructure we use isn't even trying to produce repeatable results. In fact the CPUs have a lot of features that explicitly change performance all the time for environmental reasons, for example to save energy. There is now a command line utility called cpupower which you can use to turn off most of these features that your CPUs have, and you can use this in the cloud; some things you cannot do on virtualized infrastructure, but it actually turns out that mostly the cloud is not the problem for benchmarking. All of these things would still be a problem if you had your own server in your own laboratory, and you would still have to turn them off by configuration.
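As a rough illustration of that kind of tuning, here is a minimal sketch that calls cpupower from Python to pin the frequency governor to `performance` before a benchmark run. It assumes cpupower is installed and the script runs with root privileges; exact options and what is permitted on a given (virtualized) machine will vary, and the talk does not prescribe these specific commands.

```python
# Sketch only: pin the CPU frequency governor to `performance` so that
# on-demand frequency scaling does not add noise to benchmark results.
# Assumes the Linux `cpupower` utility is installed and we run as root.
import subprocess

def tune_cpu_for_benchmarking() -> None:
    # Use the fixed `performance` governor instead of a power-saving one.
    subprocess.run(["cpupower", "frequency-set", "-g", "performance"], check=True)
    # Record the resulting frequency configuration in the benchmark log.
    subprocess.run(["cpupower", "frequency-info"], check=True)

if __name__ == "__main__":
    tune_cpu_for_benchmarking()
```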
The other big source of variation: we had assumed that SSDs, of course, are fast, so that has to be good for benchmarking, and that if you use the so-called local SSDs in Amazon, that's probably good. But that was the biggest source of variation. The explanation is that if you read the Amazon documentation, nowhere does it say that these are local to the server instance, so they sit somewhere, top of rack or wherever, and this is where you have the noisy neighbor problem. The CPU virtualization, by contrast, is actually quite good quality, and in my experience the CPU doesn't have a lot of noisy neighbor issues. The disks are the reason, and even on the same instance, during the same hour, results could suddenly change a lot because of the SSD. If you use EBS, and at the time we used provisioned IOPS, you get very stable performance, because Amazon is cheap: if you pay for 5,000 IOPS, you get exactly that, not more.

This is what it looked like when we then deployed the improvements in production: we were able to get much more stable lines. In the MongoDB case there is a blog post that is still public, and there is a link at the end of this presentation; we got all the tests within a 5% range, which is a significant improvement compared to something like 40%.

One thing we did as part of this project: we also ran tests where we omitted MongoDB completely and just tested the CPU, disk and network performance. And then we thought, hey, why don't we put this in the CI as well, so that every day we verify that the infrastructure is still behaving the same as before. Generally the answer is yes, but one day we came to work and noticed something; this was January 4th, 2018. What had happened? All cloud vendors had released firmware fixes for Spectre, or what was the other one, Meltdown, so a security fix, and all our performance tests were red. We thought, okay, so who screwed up? But because we had the canaries running, we could say, no, this is actually a problem in the infrastructure. By that time we had already been running for a year or so, so this was an exceptional event, where the regression was explained by the cloud environment.
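The canary idea is simple enough to sketch: run a fixed, application-free workload every day and keep its result next to your real benchmarks, so that when everything turns red you can tell whether the infrastructure itself moved. The sketch below is a made-up, minimal version of that idea, not the actual canary MongoDB used.

```python
# Minimal, made-up sketch of an infrastructure canary: a fixed CPU-bound
# workload with no application code in it. If this number shifts
# persistently, the environment changed, not your software.
import hashlib
import time

def cpu_canary(iterations: int = 200_000) -> float:
    """Return the time in seconds to hash a fixed payload many times."""
    payload = b"x" * 4096
    digest = b""
    start = time.perf_counter()
    for _ in range(iterations):
        digest = hashlib.sha256(payload + digest).digest()
    return time.perf_counter() - start

if __name__ == "__main__":
    # In CI you would store this result alongside the real benchmarks
    # and run the same change point detection on it.
    print(f"cpu canary: {cpu_canary():.3f}s")
```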
Okay, so I've been working on this again for the last half year, ten years later. What has changed? On nyrkio.com I've done the same kind of tuning. You can have a GitHub runner, and there are many companies that offer these third-party runners; often they are faster than the default runner, or cheaper, or both. Nyrkiö is the only one offering runners that are not faster and not cheaper, but are better quality, because they are configured for repeatable results. And okay, I'm apparently out of time, but using this is quite simple: you install the app, and it's one line where you change the runner.

I just want to say that I was personally not expecting this, but in some of the tests, this is the same SELECT COUNT(*) that was at the beginning of the presentation, when you switch to this configuration in the Turso project, you actually get, for a time period when there is no regression in the code itself, results that stay within one nanosecond. I was quite impressed by this. So you can use the cloud for continuous benchmarking, and if you have your own hardware you will have the same problems, and the same configuration is needed. I think this is a good slide to end on.

Do we take any questions? No? I will hang around here for the rest of the day, so happy to talk.

Sorry, a small announcement: could you please help us and leave through that door.