WEBVTT 00:00.000 --> 00:13.960 Hello everyone, I think I'm the final presentation in this room for today. Forever? No. 00:13.960 --> 00:19.240 Nice to see a lot of you around. Forensics is quite a niche, so it's kind of fun to see 00:19.240 --> 00:23.760 a lot of people, also a lot of known faces. 00:23.760 --> 00:27.000 This presentation is called "Your function signature here, please". 00:27.000 --> 00:33.840 But first, about me: I'm Jeffrey Rung, I'm the lead scientist of the Exploitation 00:33.840 --> 00:38.880 Team at the NFI, which is a team that gains initial access to devices for forensic research 00:38.880 --> 00:41.640 or other types of investigations. 00:41.640 --> 00:49.040 I've worked there for more than 10 years, I think 12 now, and my main interests are at the low 00:49.040 --> 00:54.240 level intersection between hardware and software, like where you're going to write raw registers 00:54.240 --> 00:57.000 and try to de-imagines what I really like. 00:57.000 --> 01:04.880 I was also here two years ago, but then this t-shirt fit me a lot better, a bit of a shame. 01:04.880 --> 01:10.800 So the problem we have is we do a lot of reverse engineering, software reverse engineering, 01:10.800 --> 01:14.520 and software reverse engineering in its most basic form is two things. 01:14.520 --> 01:20.040 You annotate functions, variables, structures, and you form hypotheses about the workings 01:20.040 --> 01:22.880 of these functions based on those annotations. 01:22.880 --> 01:27.440 You're essentially labeling stuff and then thinking about what it might do, and as you 01:27.440 --> 01:31.560 label more, you understand more. 01:31.560 --> 01:37.560 Most of the knowledge that you put in there as a researcher is in the annotations. 01:37.560 --> 01:43.880 So there are some common problems when we do reversing: simple, but very frequently occurring 01:43.880 --> 01:44.880 functions. 
01:44.880 --> 01:48.200 I don't think I've ever seen an implementation of anything that didn't have memset 01:48.200 --> 01:55.880 and memcpy, so you will see them a lot, but they will differ in a lot of forms. 01:55.880 --> 02:01.440 Then you have some complex functions, and this doesn't look very fun to reverse, and 02:01.440 --> 02:11.120 it is actually not, but what you will see is that the core logic often looks like this, 02:11.120 --> 02:16.240 especially on firmware or on embedded devices, you have these huge switch cases where you're 02:16.240 --> 02:20.960 trying to figure out a protocol that you don't really know based on the code, and then 02:20.960 --> 02:25.320 when you start to gain some insight into it, it goes faster. 02:25.320 --> 02:29.120 But you don't want to do this too often, and if you did this once, you don't want to do 02:29.120 --> 02:32.840 it again the next week. 02:32.840 --> 02:37.440 There are some common problems across implementations of functions. Like I said, memcpy and 02:37.440 --> 02:41.920 memset you will see everywhere, but the assembly that's generated by your 02:42.000 --> 02:47.120 compiler can of course differ for the same source files, and that depends on 02:47.120 --> 02:51.200 the compiler options, but also on the way compilers are implemented. 02:51.200 --> 02:56.480 And there are slightly different implementations of the same algorithms for all kinds 02:56.480 --> 03:02.440 of stuff and data structures, and there's no real easy way, at least what I think is 03:02.440 --> 03:08.840 an easy way, to store and share this reversing knowledge and work, because again, looking 03:08.920 --> 03:13.440 at this, I don't want to do this two times in a month if I can help it. 03:13.440 --> 03:16.400 So there are some possible solutions. 
03:16.400 --> 03:24.640 For example, we use Ghidra a lot, and you have BSim, which can make sort of signatures 03:24.640 --> 03:28.040 of functions that you can then compare and say, oh, does this look like this? 03:28.040 --> 03:30.640 But the problem is it's dependent on the Ghidra decompiler. 03:30.640 --> 03:34.840 I don't know if any of you have looked at the Ghidra decompiler source, but it's not very 03:34.840 --> 03:38.160 pretty, and it's subject to change every time. 03:38.160 --> 03:43.160 So if you start to build databases based upon these kinds of signatures, then they sort 03:43.160 --> 03:47.680 of don't work anymore over time, because the decompiler evolves. 03:47.680 --> 03:52.680 Then we have fuzzy hashing, there are some Trend Micro papers on this. 03:52.680 --> 03:58.520 It works, but you don't really use the context of the function, so for more complicated 03:58.520 --> 04:01.120 stuff, it gets less and less efficient. 04:01.120 --> 04:06.720 And of course, you have Lumina, but that's IDA-dependent, which is not a problem 04:06.720 --> 04:12.800 in itself, but I prefer something that would be cross-tool. 04:12.800 --> 04:20.800 So we started to look at this with, we have a department that does big data and neural networks. 04:20.800 --> 04:23.160 And we found some papers that looked very interesting. 04:23.160 --> 04:31.440 One of them is jTrans. It's a way of training a model based on disassembly that makes 04:31.440 --> 04:36.400 it easy to match functions and basic blocks within functions. 04:36.400 --> 04:41.800 So I don't understand a lot of this, as I'm not a neural network and data science 04:41.800 --> 04:48.080 kind of guy, but what you do is, a function always consists of multiple basic blocks. 
04:48.080 --> 04:53.360 So for example, you run some code, then based on a condition it jumps to a different 04:53.360 --> 04:58.560 piece of the function or not, and these things where it jumps, when you snip them apart, 04:58.560 --> 05:00.000 they're called basic blocks. 05:00.000 --> 05:04.040 And these basic blocks, they have pointers and code paths to each other. 05:04.760 --> 05:09.480 The token embedding stuff, you will have to read the paper, because I cannot tell you how 05:09.480 --> 05:13.280 exactly they encoded this into the tokenizer for the model. 05:13.280 --> 05:21.000 But it is quite nicely done: they take the basic blocks, they strip away all the constants, 05:21.000 --> 05:24.400 and they keep the jumps between basic blocks and the assembly. 05:24.400 --> 05:30.840 So the model is aware of what a function looks like based on how it runs. 05:31.080 --> 05:36.920 That is very handy, because if you throw enough functions at it to train the model, 05:36.920 --> 05:41.240 you're actually training it to look at the structure of the functions, which makes it quite 05:41.240 --> 05:43.720 powerful, so we wanted to try this. 05:43.720 --> 05:51.960 But there is one little problem: jTrans is trained for x86, and we don't do a lot of x86. 05:51.960 --> 05:56.280 We do some, but not a lot, most of our stuff is ARM. 05:56.360 --> 06:03.240 So we thought, oh, maybe I can go to these neural network guys at our lab, and just say, 06:03.240 --> 06:05.880 oh, can we retrain this for ARM64? 06:05.880 --> 06:13.560 And they said, yeah, sure. Turns out it's not that easy, because there was code for pre-training 06:13.560 --> 06:18.200 and everything, but the code quality is not that great, so they had to rewrite a lot of stuff, but in 06:18.200 --> 06:20.280 the end they ended up doing it. 
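The normalisation described above (strip the constants, keep the jumps between basic blocks) can be sketched roughly as follows. This is a minimal illustration and not the jTrans tokenizer, which the speaker refers to the paper for. The input layout, the `normalise` function, and the `CONST` and `JUMP_n` token names are assumptions made for this example, and only a handful of ARM64 branch mnemonics are recognised.

```python
import re

# Assumption: a small, incomplete set of ARM64 branch mnemonics.
BRANCH_MNEMONICS = {"b", "bl", "cbz", "cbnz", "tbz", "tbnz"}
NUMBER = re.compile(r"^#?-?(0x[0-9a-f]+|\d+)$")


def is_branch(mnemonic):
    # Conditional branches look like "b.eq", "b.ne", ...
    return mnemonic in BRANCH_MNEMONICS or mnemonic.startswith("b.")


def parse_number(text):
    text = text.lstrip("#")
    return int(text, 16) if "0x" in text else int(text)


def normalise(blocks):
    """blocks: list of (start_address, [instruction, ...]) in address order.

    Returns one token list per basic block. Immediates become CONST and
    branch targets inside the function become JUMP_<block index>.
    """
    block_index = {start: i for i, (start, _) in enumerate(blocks)}
    result = []
    for _, instructions in blocks:
        tokens = []
        for ins in instructions:
            parts = [p for p in re.split(r"[\s,\[\]!]+", ins.lower()) if p]
            mnemonic, operands = parts[0], parts[1:]
            tokens.append(mnemonic)
            for pos, op in enumerate(operands):
                if not NUMBER.match(op):
                    tokens.append(op)  # register or other symbol, kept as-is
                    continue
                value = parse_number(op)
                is_target = is_branch(mnemonic) and pos == len(operands) - 1
                if is_target and value in block_index:
                    tokens.append(f"JUMP_{block_index[value]}")
                else:
                    # Immediates and targets outside this function
                    tokens.append("CONST")
        result.append(tokens)
    return result


example = [
    (0x1000, ["cmp x0, #0x40", "b.eq 0x1010"]),
    (0x1008, ["mov w1, #7", "b 0x1010"]),
    (0x1010, ["ret"]),
]
print(normalise(example))
```

With this kind of normalisation, the same function loaded at a different address and using different immediates produces the same token sequence, which is the property that lets a model focus on structure rather than on concrete values.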
06:20.280 --> 06:25.800 So what you then need, they told me, if you want to do this, you need a lot of functions 06:25.800 --> 06:32.360 in binary form with the source available. And the way you train a model to say, oh, this function 06:32.360 --> 06:37.720 is this function, but they're not exactly the same, you also need a way to change a function, 06:37.720 --> 06:41.160 and that's quite easy, because you just compile it with different optimization levels, which 06:41.160 --> 06:45.480 was in the original paper as well, and then you take those functions out using Ghidra and 06:45.480 --> 06:48.040 compare them to each other and you train on that. 06:48.040 --> 06:53.880 To do that, I took the Arch User Repo, and just compiled everything I could get my hands on, 06:54.040 --> 06:58.760 and it took a long time, it took a lot of CPU power as well, but I ended up with about six 06:58.760 --> 07:08.040 thousand binaries and about 2.1 million functions to use for training, and I don't know if we're 07:08.040 --> 07:15.400 going to hear it, but I went on vacation, and the people in my team did not, and I thought, 07:15.400 --> 07:20.280 oh, I'll just leave it running while I'm away for three weeks, and I don't know if I have sound, 07:20.520 --> 07:35.000 I don't think so, no, maybe, no, just like the sound, yeah, no, it sounds like a damn jet engine. 07:37.000 --> 07:40.920 We have these dual computers, and normally they're pretty quiet, but when they get too warm, 07:40.920 --> 07:46.680 they just go into full-on panic mode, and I got a text, it said, if you don't turn this shit off right 07:46.760 --> 07:53.320 now, we're going to burn your computer. So in the end, I ended up training it on my cluster, 07:58.760 --> 08:09.480 it's my beautiful setup that doesn't work, and I was doing this like 24/7, so 08:09.480 --> 08:18.840 they were not very happy about this, I cannot imagine why. So in the end, I ended up 08:18.840 --> 08:22.600 making a 
dataset, they ended up training the model, but then I thought, okay, how are we going to 08:22.600 --> 08:27.320 actually use this, because you want it to be usable, so then it cannot just run on my machine, 08:27.320 --> 08:32.040 it has to be somewhat usable, so I thought I could make a lot of slides about this, but it's 08:32.120 --> 08:41.880 better to do a live demo, hopefully that will go better than the video, so I think it's easiest 08:41.880 --> 08:52.280 to do, yeah, I can clip it on, this clip is too advanced for me, I don't understand any of this, 08:52.280 --> 09:06.200 no, but, it's magnetic, and that works, so, I'm going to get this here, 09:10.440 --> 09:21.320 so here we have Ghidra, bear with me, while I try to change some weird-ass display settings, 09:22.280 --> 09:43.160 yes, that works, okay, so here we have Ghidra, well, you don't have Ghidra, apparently, 09:53.080 --> 10:01.240 yes I'm gonna take it, I think it's not, but so, 10:07.880 --> 10:13.480 the demo effect in full swing, 10:22.280 --> 10:37.280 Yeah, it looks good. 10:37.280 --> 10:47.280 The way I'm looking at it, it looks good. 10:47.280 --> 10:48.280 Perfect. 10:48.280 --> 10:50.280 There we go. 10:50.280 --> 10:52.280 I'm going to be there for a bit. 10:52.280 --> 10:54.280 We're going to do it like this. 10:54.280 --> 10:58.280 So I wanted to show you the server, but it's not so interesting. 10:58.280 --> 11:02.280 It's just a Hypercorn server, which is FastAPI. 11:02.280 --> 11:10.280 For people who think they're better than everyone. 11:10.280 --> 11:14.280 Let's see. 11:20.280 --> 11:24.280 I broke it. 11:24.280 --> 11:28.280 Not again. 11:28.280 --> 11:30.280 Yeah, it works. 11:30.280 --> 11:32.280 I will find a thingy. 11:32.280 --> 11:34.280 So I run a server in the background. 11:34.280 --> 11:38.280 Just a FastAPI thing that now has a SQLite database. 11:38.280 --> 11:40.280 It used to be... 11:40.280 --> 11:44.280 It used to be Postgres. 
11:44.280 --> 11:48.280 And it's going to be Postgres again. 11:48.280 --> 11:51.280 It's quite a lot faster, I have to say. 11:51.280 --> 11:56.280 And that just runs the model with an HTTP front end. 11:56.280 --> 12:00.280 So what I've done, maybe I can show you that. 12:00.280 --> 12:04.280 What I've done is I've compiled ARM Trusted Firmware, which is... 12:04.280 --> 12:09.280 I would say the reference implementation for a lot of secure monitors on ARM devices. 12:09.280 --> 12:15.280 So the thing that monitors when you switch between TrustZone and back. 12:15.280 --> 12:20.280 And your computer, which is a very interesting surface to reverse engineer. 12:20.280 --> 12:25.280 And I've compiled everything at multiple optimization levels. 12:25.280 --> 12:30.280 So, for example, if we look at O2, 12:30.280 --> 12:36.280 the UI scale is not normally like this after so. 12:36.280 --> 12:42.280 We click this away. 12:42.280 --> 12:47.280 And you should be able to see this. 12:47.280 --> 12:50.280 The UI scale is a bit... 12:50.280 --> 12:53.280 But we'll get through it together, I swear. 12:53.280 --> 13:04.280 Now, if I take the function window... 13:04.280 --> 13:06.280 We just select a function. 13:06.280 --> 13:09.280 And what I usually do when reversing, if I don't know anything yet, 13:09.280 --> 13:14.280 I just sort by reference count and start reversing the most referenced functions. 13:14.280 --> 13:18.280 Because then you usually get your log functions and your memcpy and your memcmp. 13:18.280 --> 13:21.280 So, we take the log function. 13:21.280 --> 13:24.280 And you can see that this is the plugin for the model. 13:24.280 --> 13:31.280 It matches 1 to 1, which is logical, because I've made the signatures based on the O2 version. 13:31.280 --> 13:34.280 We'll do it once more just to show that it works. 13:34.280 --> 13:36.280 We'll take a bit of a bigger function. 13:36.280 --> 13:38.280 And then again, it's 1. 
13:38.280 --> 13:42.280 So, this is the database that this runs against. 13:42.280 --> 13:48.280 So, when you click on a function in Ghidra, it just sends the whole basic block structure as JSON to the server. 13:48.280 --> 13:51.280 That makes an embedding of it, compares it to the database and sends back: 13:51.280 --> 13:53.280 these look the most like it. 13:53.280 --> 13:58.280 Now, O2 is not that interesting, of course, because it will match perfectly. 13:58.280 --> 14:01.280 Let's take O0, for example, no optimization. 14:01.280 --> 14:04.280 Then we do the same thing. 14:04.280 --> 14:07.280 Take the function window. 14:07.280 --> 14:11.280 No, no, thank it. 14:11.280 --> 14:16.280 Love me some Python too. 14:16.280 --> 14:23.280 And we do the same thing, we take tf_log. 14:23.280 --> 14:25.280 Now, this is not the same function, you can imagine. 14:25.280 --> 14:27.280 Because O2 optimizes a lot. 14:27.280 --> 14:36.280 So, the assembly will look fully different, which we will never be able to see at this resolution. 14:36.280 --> 14:39.280 But that's fine. 14:39.280 --> 14:45.280 And if we go to the functions of O2, we say tf_log. 14:45.280 --> 14:50.280 Oh, shit, sorry, here. 14:50.280 --> 14:57.280 So, for O0, it will match by quite a good amount. 14:57.280 --> 15:02.280 So, the problem with this is that the absolute numbers don't really mean a lot. 15:02.280 --> 15:08.280 What you want is that the first entry here discriminates very well against the next entry. 15:08.280 --> 15:13.280 So, then you know, oh, it's quite certain that at least this looks like tf_log. 15:13.280 --> 15:16.280 Or it has tf_log inlined, something like that. 15:16.280 --> 15:19.280 And we can do that for a couple of functions. 15:19.280 --> 15:23.280 So, if we take another one, console_flush again. 15:23.280 --> 15:27.280 Now, we see this also goes very well. 15:28.280 --> 15:32.280 This is quite easy, because here we have a console_flush. 
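The lookup and the point about relative scores can be sketched as below. The talk does not say which similarity measure the server uses, so cosine similarity is an assumption here, as are the function names `cosine`, `rank` and `margin` and the toy two-dimensional vectors. The sketch only illustrates why the gap between the first and second result is more informative than the absolute score.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query, database, top_k=25):
    # database: mapping of function name -> embedding (hypothetical layout).
    scored = [(cosine(query, vec), name) for name, vec in database.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


def margin(ranked):
    # Gap between the best and second-best score; None if it is undefined.
    if len(ranked) < 2:
        return None
    return ranked[0][0] - ranked[1][0]


# Toy embeddings, invented for illustration only.
database = {
    "tf_log": [1.0, 0.0],
    "console_flush": [0.0, 1.0],
    "memcpy": [0.7, 0.7],
}
ranked = rank([0.9, 0.1], database)
print(ranked[0][1], margin(ranked))
```

A large margin suggests the top result stands out from the rest. A margin near zero means the candidates all look about equally similar, so the top name deserves little trust even if its score is high.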
15:32.280 --> 15:35.280 So, we already have the symbols for this thing. 15:35.280 --> 15:37.280 But you can imagine, if you're reversing, 15:37.280 --> 15:42.280 and this thing gives you some decompilation, some disassembly, 15:42.280 --> 15:45.280 and you don't know what you're looking at, 15:45.280 --> 15:48.280 it would be quite nice to have that function name there. 15:48.280 --> 15:53.280 Now, by the power of having already done this. 15:53.280 --> 15:56.280 I took a stripped one, and this is -Os. 15:56.280 --> 15:59.280 So, it optimizes, but in a very different way. 15:59.280 --> 16:02.280 It will not unroll loops or anything like O3 will. 16:02.280 --> 16:08.280 And here, if we do the same thing, you can see these functions don't have any names. 16:08.280 --> 16:12.280 So, I also don't know what they are now. 16:12.280 --> 16:16.280 But there you can see, ah, this is cm_setup_context. 16:16.280 --> 16:19.280 Where is the OS? 16:20.280 --> 16:23.280 Yeah, yeah. This is cm_setup_context, for example. 16:23.280 --> 16:29.280 Let's take some different ones. 16:29.280 --> 16:30.280 Now, you will also see it. 16:30.280 --> 16:33.280 They're just named FUN_. 16:33.280 --> 16:35.280 Take a different big function. 16:35.280 --> 16:38.280 So, it's a really ergonomic setup, put them on. 16:38.280 --> 16:41.280 Let's see. 16:42.280 --> 16:45.280 Woo. 16:45.280 --> 16:52.280 And still, it will say, ah, that is probably console_flush. 16:52.280 --> 16:57.280 So, you can see that if you're reversing, this would be kind of nice to have. 16:57.280 --> 17:00.280 And the nice thing is, it works the other way around as well, 17:00.280 --> 17:01.280 when you name a function. 17:01.280 --> 17:05.280 So, when, for example, I name the FUN_ bla bla console_flush. 17:05.280 --> 17:10.280 Or I double click on this one, which makes it auto-name it. 17:11.280 --> 17:13.280 Then it will also send it to the server. 
17:13.280 --> 17:15.280 And say, ah, okay, now you've named that. 17:15.280 --> 17:19.280 So, if a bunch of us are reversing, and there's five of us in one room, 17:19.280 --> 17:22.280 and I name something, then immediately for the next person, 17:22.280 --> 17:26.280 if they click that function, even if it's in a different bootloader or 17:26.280 --> 17:29.280 different secure monitor or anything, it will say, ah, it's probably that. 17:29.280 --> 17:31.280 It's exactly what I was looking for. 17:31.280 --> 17:34.280 Because this model will not change for now. 17:34.280 --> 17:37.280 And even if it changes, you can 17:37.280 --> 17:42.280 remake the signatures, because it also sends the basic blocks to the server. 17:42.280 --> 17:46.280 And you can keep on using this. 17:46.280 --> 17:48.280 The assembly will not change, and you build a database. 17:48.280 --> 17:50.280 It gets bigger and bigger and bigger over time. 17:50.280 --> 17:55.280 And in the end, you're never going to manually name memcpy or memcmp again. 17:55.280 --> 17:57.280 That SCSI switch statement you saw. 17:57.280 --> 18:00.280 That, well, you will also never have to do that by hand again, 18:00.280 --> 18:02.280 because I've already done it once. 18:02.280 --> 18:04.280 So, why do it again? 18:04.280 --> 18:07.280 So, that's what I wanted to show you. 18:07.280 --> 18:13.280 And all of this, like, 18:13.280 --> 18:20.280 got to fix the world's most janky setup. 18:34.280 --> 18:37.280 All of this work is open source. 18:37.280 --> 18:38.280 No. 18:38.280 --> 18:39.280 No. 18:39.280 --> 18:40.280 Let's see. 18:40.280 --> 18:41.280 One sec. 18:47.280 --> 18:49.280 All of this work is open source. 18:49.280 --> 18:52.280 You can find it on our GitHub, the model is on Hugging Face. 18:52.280 --> 18:55.280 So, you can use it yourself if you want to as well. 18:55.280 --> 18:59.280 And you can set up your own server, the documentation is all there. 
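The idea that signatures can be remade after a model change, because the server keeps the basic blocks and not only the embedding, can be sketched with a small SQLite store. This is a hypothetical illustration, not the project's actual schema or API: the table layout, `open_store`, `add_signature`, `reembed_all` and the stand-in `toy_embed` function are all assumptions.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    # Hypothetical schema: name, raw basic blocks (JSON), embedding (JSON).
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS signatures ("
        "name TEXT NOT NULL, blocks TEXT NOT NULL, embedding TEXT NOT NULL)"
    )
    return db


def add_signature(db, name, blocks, embed):
    # Store the raw basic blocks next to the embedding derived from them.
    db.execute(
        "INSERT INTO signatures (name, blocks, embedding) VALUES (?, ?, ?)",
        (name, json.dumps(blocks), json.dumps(embed(blocks))),
    )
    db.commit()


def reembed_all(db, embed):
    # After a model change, recompute every embedding from the stored blocks.
    rows = db.execute("SELECT rowid, blocks FROM signatures").fetchall()
    for rowid, blocks_json in rows:
        new_embedding = embed(json.loads(blocks_json))
        db.execute(
            "UPDATE signatures SET embedding = ? WHERE rowid = ?",
            (json.dumps(new_embedding), rowid),
        )
    db.commit()


def toy_embed(blocks):
    # Stand-in for a real model: counts blocks and instructions.
    return [float(len(blocks)), float(sum(len(b) for b in blocks))]


store = open_store()
add_signature(store, "console_flush", [["mov", "bl"], ["ret"]], toy_embed)
reembed_all(store, toy_embed)
print(store.execute("SELECT name, embedding FROM signatures").fetchall())
```

Keeping the model-independent input next to the derived embedding is the design choice that matters here: names contributed under an older model stay usable, since only the embedding column has to be regenerated.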
18:59.280 --> 19:02.280 So, I have a QR code somewhere. 19:03.280 --> 19:05.280 But I don't know where anymore. 19:11.280 --> 19:13.280 Ah, one sec. 19:33.280 --> 19:38.280 So, it consists of three parts. 19:38.280 --> 19:40.280 You have ASM transformers, which is the model. 19:40.280 --> 19:41.280 It's on Hugging Face. 19:41.280 --> 19:45.280 I will also share this presentation with you all so you can have the links. 19:45.280 --> 19:46.280 This is the model. 19:46.280 --> 19:49.280 The training set, unfortunately, I wanted to share it. 19:49.280 --> 19:51.280 But all these things are source. 19:51.280 --> 19:53.280 If you compile them, you're distributing binaries. 19:53.280 --> 19:57.280 And it's quite hard to get through all the licenses and be sure that that's allowed, 19:57.280 --> 19:59.280 or what you have to deliver with it. 19:59.280 --> 20:02.280 So, our legal department doesn't want it. 20:02.280 --> 20:07.280 It might slip onto the internet somewhere. 20:07.280 --> 20:08.280 So, this is just a model. 20:08.280 --> 20:10.280 You can run it standalone. 20:10.280 --> 20:13.280 The Python wrappers, everything is there. 20:13.280 --> 20:15.280 This is the server that we're running against. 20:15.280 --> 20:17.280 It's just a FastAPI implementation. 20:17.280 --> 20:18.280 It's actively developed. 20:18.280 --> 20:20.280 It has been a little bit silent on the repo. 20:20.280 --> 20:23.280 But in May, we are continuing this work again. 20:23.280 --> 20:26.280 With a new cross-instruction-set model. 20:26.280 --> 20:29.280 Because first we did a lot of ARM64. 20:29.280 --> 20:30.280 Now we're seeing some RISC-V. 20:30.280 --> 20:32.280 I want to do VEX IR as well. 20:32.280 --> 20:34.280 Like an intermediate representation. 20:34.280 --> 20:36.280 You can run this server on your own. 20:36.280 --> 20:38.280 In the setup that I just showed you. 20:38.280 --> 20:40.280 And you have Centencia, which is for now. 
20:40.280 --> 20:41.280 Just a Ghidra plugin. 20:41.280 --> 20:44.280 The one that I just showed you, that you point to the server. 20:44.280 --> 20:48.280 And if you click on a function, it will request what functions it knows. 20:48.280 --> 20:51.280 And if you label a function, it will put it in the database. 20:51.280 --> 20:55.280 I am very much planning on also making an IDA plugin. 20:55.280 --> 20:57.280 Because with us, some people also use IDA. 20:57.280 --> 21:00.280 So we also want to be able to use it. 21:00.280 --> 21:03.280 And that would mean you have a sort of Lumina-like thing 21:03.280 --> 21:06.280 across Ghidra, IDA and everything. 21:06.280 --> 21:09.280 So please use it. 21:09.280 --> 21:13.280 And please contribute. 21:13.280 --> 21:15.280 That was it. 21:16.280 --> 21:22.280 Sorry for the janky demo. 21:22.280 --> 21:25.280 Are there any questions? 21:25.280 --> 21:27.280 Excuse me. 21:33.280 --> 21:38.280 In essence, how well does this method generalize? 21:38.280 --> 21:42.280 If the source code is different, for example, Rust? 21:42.280 --> 21:44.280 That's a bit of the problem. 21:44.280 --> 21:47.280 If you have a project that's compiled in Rust, 21:47.280 --> 21:51.280 and it's different from the same project that was originally in C, 21:51.280 --> 21:54.280 you will not see any generalization between those two things. 21:54.280 --> 21:59.280 If you reversed a Rust project first and you're reversing the next version of that Rust project, 21:59.280 --> 22:00.280 it will still work fine. 22:00.280 --> 22:03.280 Because it's so low level that it only looks at the basic blocks of a function. 22:03.280 --> 22:07.280 And it doesn't really matter if there are vtables or anything. 22:07.280 --> 22:10.280 Because they're in the older version as well. 22:11.280 --> 22:13.280 But can you still relate to assembly? 22:13.280 --> 22:17.280 Because if you use Rust or C, 22:17.280 --> 22:20.280 the assembly instructions are at least the same, right? 
22:20.280 --> 22:21.280 Yes. 22:29.280 --> 22:32.280 Yes, so the question is, even if you use Rust or C, 22:32.280 --> 22:35.280 can you still use the assembly with the model? 22:35.280 --> 22:37.280 Yes, yes. 22:37.280 --> 22:39.280 Rust just compiles to assembly. 22:39.280 --> 22:40.280 And on an assembly level, 22:40.280 --> 22:41.280 it's not that different. 22:41.280 --> 22:44.280 It is very different with calling conventions and vtables. 22:44.280 --> 22:49.280 And the debug stuff and runtime stuff that it has. 22:49.280 --> 22:51.280 But in the things that the model sees, 22:51.280 --> 22:56.280 it's just jumps between basic blocks and certain mnemonics. 22:56.280 --> 23:03.280 And it's almost the same question about C++. 23:03.280 --> 23:09.280 I mean, doing the rust and change because less from the amateur, 23:09.280 --> 23:14.280 this regard is pretty hard because it's the manual. 23:14.280 --> 23:16.280 And the overloading, et cetera. 23:16.280 --> 23:19.280 So what is learning? 23:19.280 --> 23:21.280 Yeah, it is. 23:21.280 --> 23:23.280 It is also hard to reverse. 23:23.280 --> 23:24.280 Sorry. 23:24.280 --> 23:29.280 If I understand correctly: C++ is quite hard to reverse. 23:29.280 --> 23:32.280 Will the model still work correctly, and the implementation? 23:32.280 --> 23:33.280 Yeah. 23:33.280 --> 23:36.280 In essence, the reversing is a lot harder. 23:36.280 --> 23:38.280 And that's going to stay that way. 23:38.280 --> 23:39.280 But the model will not see that. 23:39.280 --> 23:42.280 The model will just see the basic blocks, even if it's C++. 23:42.280 --> 23:44.280 And it will see it calling vtables. 23:44.280 --> 23:49.280 But to the model it's just the dereferencing of a vtable and an offset inside it. 23:49.280 --> 24:03.280 So true, but on an assembly level, that doesn't really matter to the model. 24:03.280 --> 24:06.280 So you will have to know it as the reverser. 
24:06.280 --> 24:12.280 But if you say this thing is the template for this thing, then to the model, 24:12.280 --> 24:15.280 it just says template for this function. 24:15.280 --> 24:19.280 One of the things I also use this for, maybe I forgot to mention, 24:19.280 --> 24:23.280 is to just compile a project from source at different optimization levels. 24:23.280 --> 24:26.280 Throw them in the database and then you start reversing something. 24:26.280 --> 24:28.280 And then it says, oh, I already know these things. 24:28.280 --> 24:31.280 You can do the same thing with C++ or Rust. 24:31.280 --> 24:32.280 Just compile it. 24:32.280 --> 24:33.280 Throw it in Ghidra. 24:33.280 --> 24:34.280 Throw it against the database. 24:34.280 --> 24:40.280 And then if you start reversing, you see what the symbols in the assembly would have said 24:40.280 --> 24:42.280 if it had debug symbols. 24:43.280 --> 24:45.280 So reversing is still quite hard. 24:45.280 --> 24:46.280 For C++. 24:57.280 --> 25:01.280 Not yet. Not yet. What I really want is for now. 25:01.280 --> 25:07.280 Sorry, how do you deal with things like vtables and structure definitions? 25:07.280 --> 25:09.280 At this point, we don't. 25:09.280 --> 25:11.280 So now it's only function names. 25:11.280 --> 25:13.280 And what I really want is function parameters. 25:13.280 --> 25:14.280 Structs. 25:14.280 --> 25:17.280 And that's going to be a lot harder because if you do an IDA plugin, 25:17.280 --> 25:19.280 it should still be cross-platform. 25:19.280 --> 25:22.280 You have to think about how you're going to make that portable, 25:22.280 --> 25:25.280 which of course could be header files or anything. 25:27.280 --> 25:28.280 Yeah, yeah. 25:28.280 --> 25:32.280 As we go to start problem, for example, Ghidra has a collaborative server. 25:32.280 --> 25:35.280 And there you can also collaborate on symbols. 25:35.280 --> 25:37.280 So you can say, oh, I have the symbols for this binary. 
25:37.280 --> 25:40.280 I've made some structs, you can also edit them. 25:40.280 --> 25:43.280 And that's essentially something like Git. 25:43.280 --> 25:46.280 But I've not done anything to implement that here. 25:46.280 --> 25:49.280 That would definitely be on the wish list. 25:56.280 --> 25:59.280 The nice thing is if you compile the Arch User Repo, 25:59.280 --> 26:03.280 everybody gets to make their own package files. 26:03.280 --> 26:05.280 So you get a lot of different compilers. 26:05.280 --> 26:08.280 You also get Rust in there and some different languages. 26:08.280 --> 26:12.280 Because it's like the whole landscape comes back 26:12.280 --> 26:14.280 in what you put into the model. 26:18.280 --> 26:19.280 Sorry. 26:19.280 --> 26:22.280 Have we also looked at different compilers? 26:22.280 --> 26:24.280 Yes, we have. 26:24.280 --> 26:26.280 Because of the way the dataset is built. 26:29.280 --> 26:30.280 Yes. 26:30.280 --> 26:35.280 Can you tell your model on your application with for the next? 26:35.280 --> 26:36.280 Yes. 26:36.280 --> 26:41.280 Do you want to write to reverse some of the Android packages? 26:41.280 --> 26:42.280 Yes. 26:42.280 --> 26:46.280 Native libraries, you mean, or applications? 26:46.280 --> 26:50.280 No, no, because then Ghidra can do some of that stuff. 26:50.280 --> 26:53.280 But then you're looking at the Java bytecode. 26:53.280 --> 26:56.280 And at this point, it only supports ARM64. 26:56.280 --> 26:59.280 So you can do native libraries for Android, 26:59.280 --> 27:02.280 but not... I will repeat the question. 27:02.280 --> 27:09.280 The question was if we've also looked at Android applications 27:09.280 --> 27:11.280 and stuff like that. 27:11.280 --> 27:14.280 But at this point, it doesn't support it yet. 27:15.280 --> 27:17.280 I have another question. 27:17.280 --> 27:20.280 Because when you showed it in the demo, 27:20.280 --> 27:22.280 looking at the function names, 
27:22.280 --> 27:23.280 only the first one, 27:23.280 --> 27:25.280 the green one, was like a good match. 27:25.280 --> 27:26.280 But the other ones, 27:26.280 --> 27:27.280 are they different functions? 27:27.280 --> 27:28.280 Yes. 27:28.280 --> 27:29.280 So you only look at the green one? 27:29.280 --> 27:31.280 Well, you need it. 27:31.280 --> 27:32.280 Sorry. 27:32.280 --> 27:35.280 So only the top one is green. 27:35.280 --> 27:39.280 And it says, oh, this first function is the perfect one. 27:39.280 --> 27:42.280 And that's only the case for most of the functions. 27:42.280 --> 27:45.280 Some still don't discriminate that well. 27:45.280 --> 27:50.280 But it's also because this dataset is very synthetic. 27:50.280 --> 27:53.280 But in essence, it doesn't really matter if the function is green. 27:53.280 --> 27:56.280 It mostly matters how much it differs from the other ones. 27:56.280 --> 27:59.280 If the model says, they all look about the same to me, 27:59.280 --> 28:03.280 usually that's an indication that there's nothing good in there. 28:03.280 --> 28:07.280 So do you have any specific sort of false positives? 28:07.280 --> 28:08.280 Yes. 28:08.280 --> 28:10.280 Well, false positives we don't see a lot. 28:10.280 --> 28:13.280 Only, of course, the model will always give you 25 results. 28:13.280 --> 28:14.280 Well, the server. 28:14.280 --> 28:16.280 So even if it says, oh, it doesn't look like anything to me, 28:16.280 --> 28:18.280 you will still get the top 25. 28:18.280 --> 28:20.280 So if you run it now for a different binary, 28:20.280 --> 28:22.280 it's going to say, all these results are shit. 28:22.280 --> 28:23.280 But. 28:23.280 --> 28:25.280 All right. 28:25.280 --> 28:26.280 Thank you. 28:26.280 --> 28:29.280 Yes. 28:29.280 --> 28:30.280 Yes? 28:30.280 --> 28:31.280 All right. 28:31.280 --> 28:32.280 Thank you. 28:38.280 --> 28:40.280 Thank you.