WEBVTT 00:00.000 --> 00:12.000 But this is going to be a talk about using NixOS for 00:12.000 --> 00:16.000 deterministic distributed system benchmarking. 00:16.000 --> 00:19.000 And the speaker here is Bruce Gain. 00:19.000 --> 00:21.000 It's a really cool topic. 00:21.000 --> 00:24.000 I think there are a lot of applications 00:24.000 --> 00:27.000 Nix can be leveraged for, and this is a particularly 00:27.000 --> 00:31.000 interesting one, so take it away. 00:31.000 --> 00:32.000 Okay. 00:32.000 --> 00:37.000 A lot of applause for this speaker, please. 00:37.000 --> 00:39.000 Yeah, hi. 00:39.000 --> 00:42.000 As Martin just said, I'm Bruce Gain. 00:42.000 --> 00:45.000 I'm an analyst with a consulting firm called 00:45.000 --> 00:47.000 the Revcom. 00:47.000 --> 00:51.000 And what we do, among other things, is benchmark testing. 00:51.000 --> 00:55.000 And we try to really drill down into the performance characteristics 00:55.000 --> 00:59.000 of certain software packages and tools. 00:59.000 --> 01:03.000 And we're looking at it on a runtime basis as much as possible. 01:03.000 --> 01:07.000 And as you can probably understand, that's quite difficult to do, 01:07.000 --> 01:12.000 especially when trying to benchmark commercial applications. 01:12.000 --> 01:16.000 And by the way, about myself, I'm a huge Linux advocate. 01:16.000 --> 01:22.000 I've been using Linux for over 25 years and I love open source. 01:22.000 --> 01:28.000 And we ran into a wall recently with really trying to gauge the runtime 01:28.000 --> 01:30.000 performance of certain applications, 01:30.000 --> 01:32.000 as I just mentioned. 01:32.000 --> 01:35.000 And we looked at a few things, 01:35.000 --> 01:37.000 a few alternatives to do that. 01:37.000 --> 01:39.000 And we saw Guix. 01:39.000 --> 01:40.000 I don't know. 01:40.000 --> 01:42.000 Is anybody here familiar with Guix? 01:42.000 --> 01:43.000 Guix? 01:43.000 --> 01:44.000 Yeah, cool. 01:44.000 --> 01:45.000 Yeah, great. 
01:45.000 --> 01:46.000 I love Guix. 01:46.000 --> 01:49.000 But right now, we're not getting it to function as we'd like. 01:49.000 --> 01:51.000 It's really hard for us. 01:51.000 --> 01:55.000 So what we started out with is Nix. 01:55.000 --> 01:57.000 We're doing a lot of work with Nix now. 01:57.000 --> 02:01.000 And we're initially looking at trying to gauge the performance 02:01.000 --> 02:05.000 of ScyllaDB and Cassandra. 02:05.000 --> 02:08.000 You know, these database applications and platforms. 02:08.000 --> 02:09.000 Who's here? 02:09.000 --> 02:12.000 Who here is familiar with ScyllaDB? 02:12.000 --> 02:13.000 That's a database. 02:13.000 --> 02:15.000 OK, Cassandra, probably. 02:15.000 --> 02:16.000 Yeah, Cassandra. 02:16.000 --> 02:18.000 Yeah, almost everybody knows Cassandra, right. 02:18.000 --> 02:26.000 So it became notoriously hard, for some reason 02:26.000 --> 02:28.000 that might be obvious, 02:28.000 --> 02:36.000 to compare ScyllaDB versus Cassandra for performance based on the different 02:36.000 --> 02:42.000 benchmarks for that, latency, et cetera. 02:42.000 --> 02:46.000 It was just very skewed. 02:46.000 --> 02:48.000 That's not the right word. 02:48.000 --> 02:52.000 But as far as the performance and the benchmarks of ScyllaDB go, 02:52.000 --> 02:56.000 those on file, the performance of ScyllaDB, 02:56.000 --> 03:00.000 according to those benchmarks, is like 2 to 3x. 03:00.000 --> 03:03.000 And we'll get to the specific benchmarks later. 03:03.000 --> 03:07.000 But that's compared to Cassandra. 03:07.000 --> 03:12.000 So what we wanted to do is just compare the two in a more apples-to-apples 03:12.000 --> 03:15.000 way. Nix, or sorry, excuse me, 03:15.000 --> 03:18.000 Cassandra versus ScyllaDB. 03:18.000 --> 03:23.000 And it's difficult because right now we don't have access 03:23.000 --> 03:29.000 to a ScyllaDB package to put it on Nix. 
03:29.000 --> 03:33.000 So we're trying to get the ScyllaDB folks to help us with that. 03:33.000 --> 03:36.000 So hopefully we'll be able to do that soon, one day. 03:36.000 --> 03:39.000 But in the meantime, we were looking at how to do this, how to gauge 03:39.000 --> 03:42.000 the benchmarks with Cassandra. 03:42.000 --> 03:46.000 And we covered this with, for example, 03:46.000 --> 03:49.000 Docker. Oh yeah, the other issue is we wanted to share our work, 03:49.000 --> 03:53.000 to share our benchmarks so you can reproduce those and see for yourself. 03:53.000 --> 04:00.000 As for the precedent with Docker, I love Docker as much as everybody else. 04:00.000 --> 04:02.000 We use it every day. 04:02.000 --> 04:06.000 But reproducing that with Docker is problematic. 04:07.000 --> 04:09.000 It's not accurate, as you know. 04:09.000 --> 04:15.000 The performance of, say, Cassandra differs according to the, 04:15.000 --> 04:18.000 you know, the runtime, the operating system, et cetera. 04:18.000 --> 04:24.000 I mean, you just can't directly replicate those environments 04:24.000 --> 04:25.000 with Docker. 04:25.000 --> 04:26.000 We know that. 04:26.000 --> 04:28.000 So Nix, I think everybody else 04:28.000 --> 04:33.000 here knows that one of the beautiful things that Nix does 04:34.000 --> 04:36.000 is that reproducibility aspect. 04:36.000 --> 04:40.000 And I find it quite amazing, actually. 04:40.000 --> 04:43.000 So yeah, here are the benchmarks I was referring to. 04:43.000 --> 04:45.000 Cassandra versus ScyllaDB. 04:45.000 --> 04:48.000 These are provided by ScyllaDB. 04:48.000 --> 04:51.000 Now, you know, these different, you know, 04:51.000 --> 04:56.000 latency, et cetera, the read/write frequency, et cetera. 04:56.000 --> 05:00.000 Yeah, Cassandra gets killed. 05:00.000 --> 05:03.000 But, you know, this is probably... 05:03.000 --> 05:04.000 Yeah, I mean, I don't know. 
05:04.000 --> 05:06.000 I would like to put this on Nix. 05:06.000 --> 05:08.000 That's what I want to do. 05:08.000 --> 05:10.000 And hopefully we can make that happen. 05:10.000 --> 05:12.000 Make that comparison happen. 05:12.000 --> 05:14.000 And going back to the Docker issue, you know, 05:14.000 --> 05:19.000 the leaky abstractions, the containers on the host, et cetera. 05:19.000 --> 05:22.000 You know, those are, you know, 05:22.000 --> 05:25.000 and Docker is just not... We're often told, 05:25.000 --> 05:26.000 okay, just look at Docker. 05:26.000 --> 05:27.000 That's how we share our work. 05:27.000 --> 05:29.000 We'll put it in a Docker container. 05:30.000 --> 05:32.000 No, that's not really... 05:32.000 --> 05:35.000 That doesn't work as far as reproducibility goes. 05:35.000 --> 05:38.000 You know, for a number of different reasons. 05:38.000 --> 05:41.000 I mean, it's even contingent on the, you know, operating system, 05:41.000 --> 05:44.000 of course, your laptop, whatever. 05:44.000 --> 05:47.000 With, you know, Docker, for those, you know, 05:47.000 --> 05:50.000 reproducing results or reproducing environments, 05:50.000 --> 05:53.000 it's just not cut out for that. 05:53.000 --> 05:58.000 So that's, again, what I just mentioned 05:58.000 --> 06:02.000 at the beginning, that, you know, we were frustrated 06:02.000 --> 06:03.000 in our journey. 06:03.000 --> 06:06.000 I hate that word, but for lack of a better word, 06:06.000 --> 06:08.000 our journey to figure out, you know, 06:08.000 --> 06:11.000 how are we going to accurately, you know, compare 06:11.000 --> 06:15.000 not just Cassandra and ScyllaDB, 06:15.000 --> 06:19.000 but, you know, other runtime performances of different applications. 06:19.000 --> 06:23.000 And, you know, for right now, it just looks like Nix 06:23.000 --> 06:24.000 is the way to go for that. 
06:24.000 --> 06:26.000 If anybody has any alternatives... 06:26.000 --> 06:29.000 I don't know, Guix shows promise, but right now, 06:29.000 --> 06:31.000 I can't think of anything better than Nix. 06:31.000 --> 06:34.000 And it's pretty fun to set up. 06:34.000 --> 06:37.000 I mean, but we'll get to that in a second. 06:37.000 --> 06:43.000 I think most people are probably familiar already 06:43.000 --> 06:47.000 with, you know, the Nix functionality, 06:47.000 --> 06:49.000 you know, how that works. 06:49.000 --> 06:53.000 You know, I've learned recently that, oh, yeah, 06:53.000 --> 06:57.000 sorry, to go back to one thing about Nix, 06:57.000 --> 07:01.000 I've learned recently that Debian actually, 07:01.000 --> 07:06.000 for a while, was trying to get over that reproducibility hump 07:06.000 --> 07:10.000 with, you know, with Docker, you know, 07:10.000 --> 07:12.000 to solve that Docker issue of, quote, 07:12.000 --> 07:13.000 drift. 07:13.000 --> 07:15.000 It's another word I hate using, but, you know, 07:15.000 --> 07:18.000 adding, you know, reproducibility to something like 07:18.000 --> 07:20.000 Docker, and Debian stopped that. 07:20.000 --> 07:22.000 In fact, they started using Nix. 07:22.000 --> 07:23.000 I learned that recently. 07:23.000 --> 07:24.000 I thought it was interesting. 07:24.000 --> 07:26.000 Everybody here knows Debian Linux? 07:26.000 --> 07:27.000 Yep. 07:27.000 --> 07:28.000 Yeah. 07:28.000 --> 07:29.000 I don't know. 
07:29.000 --> 07:34.000 Anyway, so, going back to Nix, you know, 07:34.000 --> 07:37.000 the reproducibility, which I found fascinating, 07:37.000 --> 07:40.000 it was, you know, on the level of the output, 07:40.000 --> 07:43.000 it was the hash functionality of, you know, 07:43.000 --> 07:47.000 the flake configuration, F-L-A-K-E, 07:48.000 --> 07:53.000 and that, for me, just on a computational level, 07:53.000 --> 07:57.000 is fascinating, I thought, because 07:57.000 --> 08:02.000 the reproducibility hinges on that hash functionality, 08:02.000 --> 08:08.000 where if that hash sees one single digit in the code 08:08.000 --> 08:13.000 that is different from the build, it will not function. 08:13.000 --> 08:14.000 It just stops. 08:14.000 --> 08:16.000 It just will not read that. 08:16.000 --> 08:20.000 And so, that's what I thought was, you know, 08:20.000 --> 08:22.000 a computationally interesting 08:22.000 --> 08:25.000 main aspect of Nix, which I found fascinating. 08:25.000 --> 08:31.000 I guess you could call it the power of the hash, if you'd like. 08:31.000 --> 08:36.000 So again, you know, we put this through, you know, 08:36.000 --> 08:39.000 set this up with, well, an engineer did it at first. 08:39.000 --> 08:42.000 One of the engineers on our team, 08:42.000 --> 08:45.000 who's credited at the end of this, he's in the US. 08:45.000 --> 08:49.000 And he, you know, we, you know, this configuration 08:49.000 --> 08:51.000 involved, you know, Cassandra, as I mentioned. You know, 08:51.000 --> 08:56.000 we loaded up Nix, we, you know, looked at, 08:56.000 --> 08:58.000 we did, we pulled it off GitHub, you know, 08:58.000 --> 09:00.000 we did the standard thing of putting Nix onto the machine, 09:00.000 --> 09:04.000 getting that flake file and integrating the package, 09:04.000 --> 09:08.000 the, um, the Cassandra package with Nix. 09:08.000 --> 09:12.000 And that proved a little difficult sometimes. 
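The hash check the speaker describes can be illustrated with a small sketch. This is not how Nix computes or stores its hashes internally; it only shows the general principle that a pinned digest rejects any input that differs by even one character. The function names and the sample input strings are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def verify(content: bytes, pinned: str) -> None:
    """Raise ValueError if the content does not match the pinned digest."""
    actual = digest(content)
    if actual != pinned:
        raise ValueError(f"hash mismatch: expected {pinned}, got {actual}")


# Hypothetical input standing in for a pinned source; not taken from the talk.
original = b"cassandra-benchmark-config v1"
pinned_digest = digest(original)

# The unchanged input matches its pinned digest, so this passes silently.
verify(original, pinned_digest)

# A single changed character produces a different digest and is rejected.
tampered = b"cassandra-benchmark-config v2"
try:
    verify(tampered, pinned_digest)
except ValueError as err:
    print("rejected:", err)
```

The point of the sketch is the failure mode: the check does not try to judge whether a change is harmless, it simply refuses anything that is not byte-for-byte what was pinned.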
09:12.000 --> 09:18.000 You know, the, um, sometimes the, you know, 09:18.000 --> 09:22.000 it was at one point, it said that we, 09:22.000 --> 09:25.000 if I remember correctly, the cache was not 09:25.000 --> 09:28.000 configured correctly, where the Java was reading 09:28.000 --> 09:31.000 into the wrong file or the wrong place. 09:31.000 --> 09:35.000 And that, um, and it kept failing. 09:35.000 --> 09:39.000 So that was, um, I had to look and dig into the documentation, 09:39.000 --> 09:42.000 and that took a few hours, just that one part. 09:42.000 --> 09:45.000 But we figured it out, um, and just to be 09:45.000 --> 09:48.000 quite candid, I just cut and pasted from the documentation 09:48.000 --> 09:49.000 and it works now. 09:49.000 --> 09:50.000 All right. 09:50.000 --> 09:51.000 I hope it will 09:51.000 --> 09:52.000 when I do the demo. 09:52.000 --> 09:54.000 So again, going back to the, you know, 09:54.000 --> 09:58.000 summary of the, um, you know, 09:58.000 --> 10:00.000 you know, for the, you know, how Nix works with 10:00.000 --> 10:02.000 Cassandra particularly, or just in general, 10:02.000 --> 10:05.000 you know, you have to look at the hash, um, 10:05.000 --> 10:07.000 it's calculating, you know, ensuring that the, 10:07.000 --> 10:11.000 the code, um, has not changed. 10:11.000 --> 10:14.000 It's immutable, as the term goes, uh, 10:14.000 --> 10:15.000 it checks everything. 10:15.000 --> 10:17.000 You can, you know, [unclear]. 10:17.000 --> 10:20.000 Download the pre-built environment, 10:20.000 --> 10:22.000 which I did once. 10:22.000 --> 10:25.000 So at the end of this, I'll show you the GitHub link. 10:25.000 --> 10:29.000 You should just be able to clone the GitHub repository 10:30.000 --> 10:31.000 and run this benchmark. 10:31.000 --> 10:33.000 Uh, it'll save you probably, 10:33.000 --> 10:35.000 according to my engineer, 10:35.000 --> 10:37.000 he spent 10 hours getting this set up. 
10:37.000 --> 10:39.000 So that way you can just do that. 10:39.000 --> 10:41.000 Um, I hope. 10:41.000 --> 10:43.000 Let me know if it doesn't work. 10:43.000 --> 10:45.000 Or put in a pull request on GitHub, 10:45.000 --> 10:46.000 if you like. 10:46.000 --> 10:48.000 So anyway, going back to this workflow, uh, 10:48.000 --> 10:50.000 you know, you download the environment, 10:50.000 --> 10:53.000 and you spin it up and you start looking at your benchmark. 10:53.000 --> 10:55.000 Uh, that's essentially it. 10:55.000 --> 10:58.000 Um, you know, until now, 10:58.000 --> 11:00.000 let's see if there are any questions right now. 11:00.000 --> 11:02.000 Nope. 11:02.000 --> 11:03.000 Nope. 11:03.000 --> 11:04.000 Okay. 11:04.000 --> 11:05.000 Great. 11:05.000 --> 11:06.000 Okay. 11:06.000 --> 11:08.000 I see how this works. 12:12.000 --> 12:13.000 So, that's like what I'm getting. 12:13.000 --> 12:14.000 I see you. 12:14.000 --> 12:15.000 Okay. 12:15.000 --> 12:16.000 I apologize. 12:16.000 --> 12:18.000 We had a problem with that cable. 12:18.000 --> 12:19.000 Yeah. 
12:19.000 --> 12:21.000 It's, well, the old cable. 12:21.000 --> 12:25.000 I'm not sure what we can do about it. 12:26.000 --> 12:28.000 Well, excuse me, sir. 12:28.000 --> 12:29.000 Okay. 12:39.000 --> 12:41.000 I apologize. 12:42.000 --> 12:44.000 Actually we already had this one cable. 12:52.000 --> 12:53.000 Yeah. 12:53.000 --> 12:54.000 That seems to be a bit flaky. 12:54.000 --> 12:58.000 Did it start working when I was pulling it or when it was...? 13:04.000 --> 13:06.000 I don't have a different screen. 13:07.000 --> 13:09.000 Oh yeah. 13:24.000 --> 13:50.000 I understand the impulse of trying to, you know, have, like, chatting a bit, but let's not lean into that too much because the room gets really, you know, a bit too energetic. 13:51.000 --> 13:52.000 Can I go? 14:04.000 --> 14:05.000 Okay. 14:05.000 --> 14:06.000 Sorry. 14:06.000 --> 14:09.000 So we got that figured out. 14:12.000 --> 14:14.000 All right, everybody, please quiet down. 14:14.000 --> 14:18.000 We're continuing after just a bit of technical difficulty. 14:19.000 --> 14:21.000 Okay, now it's not going to work. 14:21.000 --> 14:22.000 It's not going to work. 14:24.000 --> 14:29.000 Anyway, so if you pull this down, assuming you have Python, 14:29.000 --> 14:31.000 you have Java, 14:31.000 --> 14:34.000 you have what's necessary, you know, to do this, 14:34.000 --> 14:37.000 just pull it off GitHub, and I'll show you the link at the end of this. 14:37.000 --> 14:40.000 And once you're in, you just go into the directory 
14:40.000 --> 14:45.000 you cloned into, and it'll just, hopefully, work. 14:49.000 --> 14:50.000 All right. 14:57.000 --> 15:00.000 It just takes like 15, 20 seconds. 15:09.000 --> 15:11.000 Anybody have any questions so far? 15:11.000 --> 15:12.000 Nope. 15:12.000 --> 15:19.000 It's always great to have talks with live demos. 15:19.000 --> 15:21.000 I'm sorry? 15:21.000 --> 15:26.000 It's always great to have talks where you really see something happening, mate. 15:28.000 --> 15:30.000 Tick, tick, tick, tick. 15:30.000 --> 15:33.000 All right, there we go. 15:33.000 --> 15:36.000 Okay, let's run the benchmark. 15:43.000 --> 15:48.000 Here we go. 15:48.000 --> 15:49.000 That's it. 15:49.000 --> 15:51.000 We got our benchmarks. 15:51.000 --> 15:55.000 Yep. 15:55.000 --> 15:58.000 That's not the easy part, actually. 15:58.000 --> 15:59.000 There are two things. 15:59.000 --> 16:04.000 If you look at the statistics, I mean, if you look at the other benchmarks we did, 16:04.000 --> 16:07.000 there's a variation of 20 to 30%. 16:08.000 --> 16:14.000 So you would say we failed, but we didn't, because this is running on my laptop. 16:14.000 --> 16:20.000 So as this is scaled, if we scaled to thousands of, you know, 16:20.000 --> 16:26.000 to thousands of X, that variation would be maybe one to two percent. 16:26.000 --> 16:31.000 So the fact I'm doing it on my laptop, there are a lot of parasitics, or parasites, 16:31.000 --> 16:36.000 which would contribute to that 20 to 30% difference between the different benchmarks we ran. 16:37.000 --> 16:39.000 So that's it. 16:39.000 --> 16:43.000 And then what I found particularly interesting too, 16:46.000 --> 16:51.000 is that we just, it's very, you know, 16:51.000 --> 16:57.000 stateless, I don't think it's the right word, but we just, you know, 16:57.000 --> 17:00.000 kill all and start over. 17:00.000 --> 17:04.000 That's it. 17:04.000 --> 17:05.000 It's done. 17:05.000 --> 17:08.000 It's gone. 
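The 20 to 30% run-to-run variation mentioned above can be quantified in more than one way, and the talk does not say which measure was used. A minimal sketch, assuming the figure refers to the gap between the best and worst run relative to the mean, with the coefficient of variation shown alongside for comparison. The throughput numbers are hypothetical placeholders, not results from the talk.

```python
import statistics
from typing import Sequence


def relative_spread(samples: Sequence[float]) -> float:
    """(max - min) / mean: the gap between the best and worst run."""
    return (max(samples) - min(samples)) / statistics.mean(samples)


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


# Hypothetical operations-per-second figures from four runs; not measured data.
runs = [10000.0, 11500.0, 12600.0, 10900.0]

print(f"relative spread: {relative_spread(runs):.1%}")
print(f"coefficient of variation: {coefficient_of_variation(runs):.1%}")
```

Reporting both numbers alongside the raw runs makes it easier for someone reproducing the benchmark on different hardware to tell whether their results fall within the same noise band.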
17:08.000 --> 17:15.000 That's, that's out of the hard part. 17:15.000 --> 17:25.000 So hopefully my slides will be working again. 17:26.000 --> 17:29.000 Okay, so here's the shout-outs to, you know, 17:29.000 --> 17:33.000 here's, if you want to use the, you know, do this. 17:33.000 --> 17:37.000 Again, the easy part should be to just pull this off GitHub and clone it. 17:37.000 --> 17:41.000 You can start doing your benchmarks on Cassandra. 17:41.000 --> 17:42.000 Hmm. 17:42.000 --> 17:45.000 The, you know, the acknowledgements: obviously Nix, NixOS. 17:45.000 --> 17:48.000 And then Shaiid Khan, he's based in the US, he was working for us. 17:48.000 --> 17:50.000 Now he's working for Deloitte. 17:50.000 --> 17:53.000 And that's not so great, but anyway. 17:53.000 --> 17:56.000 And come give us a shout if you like. 17:56.000 --> 17:58.000 We love doing science and testing. 17:58.000 --> 17:59.000 That's what we like to do. 17:59.000 --> 18:00.000 Like to do. 18:00.000 --> 18:01.000 Thank you. 18:01.000 --> 18:03.000 Thank you very much, Bruce Gain.