So, hey, I'm Daniel Almeida. Thanks for having me here. It's the first time I'm ever in the graphics devroom. I'm here today to talk about Tyr, a new Rust GPU driver that we have been writing. And when I say we, there's me, Daniel, two people from Google, one lady from Arm, and some volunteers as well contributing code.

So, this is a Rust GPU driver, as I just said. What I want to start with is talking a bit about what a GPU kernel driver is, versus what a user mode GPU driver is, and the difference between the two, because usually people get it confused and think we're working on the user mode driver when we are not. And also, I'd like to begin with a little bit of the story behind this project.

So, basically, Arm approached us to try and get a Rust driver done. At that point, they were experimenting with a driver that was basically half written in C and half written in Rust, and that did not go anywhere. So we basically started over from a clean slate with a driver completely written in Rust. And after that, Google also joined the project; they're also contributing one or two engineers today.

And the reason why we're using Rust is that it turns out it's fairly easy to compromise a GPU driver. And by compromising the GPU driver, you take over the entire system, and then, basically, you can steal the user's personal data, et cetera, et cetera. So, it seems like the GPU driver is really inoffensive, but actually it's one of the most targeted attack vectors.

So, during this talk, we'll briefly talk about how GPUs work. Then we will discuss a little bit how the kernel driver exposes its API to user space. We'll talk a bit about job submission. And that will conclude the first part: what is this, and how does it work? And then the second part is: what is ahead of us? My point here is to try to lay out some plan for the future. So: what is ahead of us? What does the prototype look like? What's the plan for this year and the next year?

So, again, briefly: how does a GPU work? I'm simplifying a few things here. But you basically have geometry, like this one, where you can have more complex shapes like this. And textures, which give a shape the appearance of a real-life object.
And then shaders, for example, which are programs running on the GPU; you're going to compile those down to some machine code that can execute on the hardware. And then everything is going to be, you know, placed inside some buffers in GPU memory: the vertex data, the texture data, your command stream has to be placed inside of a buffer object as well, and the shader machine code.

And then, at some point, you basically record your commands into a software queue, like a VkQueue. And then that has to be backed by something. And that something, in this particular case for Arm Mali, is Tyr, the kernel driver. So, for an application like vkcube, for example: vkcube will be, you know, taking the geometry for a cube, the shaders, everything, feeding all of that to Vulkan; Vulkan will eventually, you know, build a command buffer — I'm sorry, the application will place the command buffer in the VkQueue, and then the driver will take over, basically, after this point.

So, in terms of, you know, how the code is structured, much of the code is actually implemented at the user mode level. When I say user mode level, I'm referring to Mesa, for those of you who were here for the earlier talk, basically. So, mostly Mesa, which is where the shader compiler lives, where the Vulkan driver lives, where the GL driver lives. The majority of the actual stack is implemented at the user mode level. And then the kernel mode driver is just a small layer that will do things that user mode drivers cannot do: basically, sharing the hardware between different people who are accessing the GPU, ensuring isolation, bringing up power or clocks, things of this nature, allocating memory — things that the user mode driver cannot do on its own because they require more privilege.

So, as I said, the user mode driver will be implementing the API, like Vulkan or GL; it will also be compiling the shaders, et cetera, et cetera. And then, eventually, it has to talk to the kernel driver, where the kernel driver is basically making sure that the user mode driver is implementable at all.
How? By basically, again, allowing the user mode driver to allocate memory; providing a way for the user mode driver to access the GPU's ring buffer, which is where commands are going to be placed; doing dependency management — for example, most jobs will depend on a series of previous jobs, and it's up to user space to build this dependency graph and then tell the kernel about it, such that when the time comes to execute a job, a job is not executed before its dependencies are done. So, basically, the kernel driver is the one keeping track of this dependency graph and making sure that it is followed. Power management; debugging facilities — because when things break, usually the kernel driver will be able to give you some information through devcoredump, because it has the privilege to do so. When the GPU breaks — I'm sorry, when the GPU goes down because, I don't know, it crashed — the kernel driver is the one that can restart it. And again, the kernel mode driver is just the bridge between a much larger user mode driver and the overall hardware.

So, for example, just to go into a little bit more detail of what Tyr is doing. GPUs nowadays can isolate different contexts using virtual memory through the IOMMU, much like we do for CPUs, for example. And this is something that the kernel driver is actually managing. So, the kernel driver is the one programming the IOMMU in the system, basically making sure that one application cannot access or corrupt the memory of another application. Again, we give the user mode driver the ability to say what it wants, where it wants things to be mapped and unmapped, but the kernel mode driver is in charge of programming the actual hardware and the page tables.

Synchronization: the kernel mode driver is in charge of letting user space know when jobs are done, so that, again, the user mode driver doesn't wait forever. Some resources are shared between multiple users, like the GPU driver, the display driver, et cetera, et cetera. I've already talked about this. I've already talked about the dependency graph, where the user mode driver builds those dependency graphs, and then Tyr makes sure that these dependencies are actually respected.
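To make the dependency-tracking idea concrete, here is a minimal, self-contained Rust sketch. All the names and types are invented for illustration; this is not the kernel's dma-fence interface or Tyr's actual code, just the wait/signal idea: a job is only allowed to run once every fence it depends on has been signaled.

```rust
// Toy model of fence-based dependency tracking (hypothetical names, not the real API).
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// A fence has exactly two operations: signal, and (here) a non-blocking "is it done?" check.
#[derive(Default)]
struct Fence {
    signaled: AtomicBool,
}

impl Fence {
    fn signal(&self) {
        self.signaled.store(true, Ordering::Release);
    }
    fn is_signaled(&self) -> bool {
        self.signaled.load(Ordering::Acquire)
    }
}

/// A job carries the fences it depends on and the fence it signals on completion.
struct Job {
    name: &'static str,
    depends_on: Vec<Arc<Fence>>,
    done: Arc<Fence>,
}

impl Job {
    /// Only run the job if all of its dependencies have completed.
    fn try_run(&self) -> bool {
        if self.depends_on.iter().all(|f| f.is_signaled()) {
            println!("running {}", self.name);
            self.done.signal(); // lets user space and dependent jobs know we are done
            true
        } else {
            false
        }
    }
}

fn main() {
    let render_done = Arc::new(Fence::default());
    let compose_done = Arc::new(Fence::default());

    let render = Job { name: "render", depends_on: vec![], done: render_done.clone() };
    let compose = Job { name: "compose", depends_on: vec![render_done], done: compose_done };

    assert!(!compose.try_run()); // dependency not met yet
    assert!(render.try_run());
    assert!(compose.try_run()); // now it can execute
}
```

In the real driver, the signaling side would be driven by the hardware reporting completion, as described later in the talk, and the waiting side is what the scheduling code checks before letting a job run.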
And error recovery — again, we've already spoken about this — and power management. The kernel is the only thing that can, like, bring up regulators, and basically, when the device goes idle, it's the job of the kernel mode driver to lower the clocks a little bit so that you don't use as much power. Unfortunately, Tyr cannot do any of that at the moment, but it's planned. Also debug facilities — we've talked about this.

So, the API for Tyr is much smaller, again, than the API for user space. Because user space is actually covering a full API like Vulkan or OpenGL, but Tyr is basically just this. These are the main APIs that we have. So, creating and allocating GPU memory is the top row. Enforcing isolation between different applications using virtual memory regions, which is what the VM create and VM bind ioctls do. And then giving user space access to the GPU's ring buffer by creating an execution context. So, basically: allocating GPU memory, job submission, and enforcing isolation.

So, here is the overall picture of what a GPU kernel mode driver looks like. We have the uAPI, which is communicating with user space. And then, again, as I said, scheduling jobs is one of the major responsibilities, so we have a shared component between multiple drivers in DRM, called the DRM GPU scheduler, which is doing this dependency-tracking thing we've been talking about. There's one shared component called DRM GPUVM, which is doing this isolation between multiple applications that we've been talking about. Allocating GPU memory is done through this GEM infrastructure within DRM. And then, below that, we have the actual hardware: we have the VRAM, we have the firmware scheduler — we're going to be talking a little bit more about the firmware scheduler — and then the actual cores.

And on top of that, Arm Mali has its own firmware-assisted job scheduling system, basically inside of it, and that's implemented through a microcontroller unit. So, in Tyr, we never really interact with any of the cores. We talk to this microcontroller unit through a shared memory region, and with this microcontroller unit we can basically allocate a ring buffer where we can place command buffers, and then we can tell the microcontroller unit to execute that. At which point it will take over, and then tell Tyr when the work is done.
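As a rough illustration of that flow — the kernel placing command buffers into a ring shared with the microcontroller, and the microcontroller reporting back when they are done — here is a small, self-contained Rust simulation. The names and structure are invented; the real CSF interface is a shared-memory protocol with doorbells and interrupts, not Rust objects.

```rust
// Toy model of the firmware-assisted flow: the kernel never touches the cores,
// it only feeds a ring buffer and reacts to completion notifications.
use std::collections::VecDeque;

/// One entry in the shared ring buffer: a recorded command buffer.
struct CommandBuffer {
    label: &'static str,
}

/// The ring the kernel driver writes into ("a view into the GPU's ring buffer").
struct RingBuffer {
    slots: VecDeque<CommandBuffer>,
    capacity: usize,
}

impl RingBuffer {
    fn new(capacity: usize) -> Self {
        Self { slots: VecDeque::new(), capacity }
    }

    /// Kernel side: place a command buffer in the ring, or hand it back if the ring is full.
    fn submit(&mut self, cb: CommandBuffer) -> Result<(), CommandBuffer> {
        if self.slots.len() == self.capacity {
            return Err(cb);
        }
        self.slots.push_back(cb);
        Ok(())
    }

    /// "Firmware" side: pick up the next command buffer and report completion
    /// (on real hardware this ends in an interrupt back to the kernel).
    fn firmware_step(&mut self) -> Option<&'static str> {
        self.slots.pop_front().map(|cb| cb.label)
    }
}

fn main() {
    let mut ring = RingBuffer::new(4);
    ring.submit(CommandBuffer { label: "draw cube" }).unwrap();
    ring.submit(CommandBuffer { label: "post-process" }).unwrap();

    while let Some(done) = ring.firmware_step() {
        // Here the kernel driver would signal the job's fence so user space
        // knows the work has completed.
        println!("completed: {done}");
    }
}
```

The point is only the division of labour: Tyr feeds this ring and reacts to completion notifications, while the firmware decides how the work actually runs on the cores.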
So, again, if we go back to Vulkan, for example, we have command buffers being recorded into queues. Then Tyr will back these queues with what we call CSF groups, which are just one view into the GPU's ring buffer, basically. And then, eventually, once you place your command buffers inside of this ring buffer, the microcontroller will take over, the work will eventually get scheduled, and then the firmware will eventually raise an interrupt to tell you, hey, this job is done. At which point you can tell user space — because, again, the kernel mode driver is the one that has to tell user space when work has completed. So, this is basically what a kernel driver does, and this is not specific to Rust, but to GPU kernel drivers in general.

So, where are we with the Rust driver in particular? I don't know how many of you have seen my presentation at XDC, which I presented some two to three months ago. Basically, back then we didn't have a functional prototype, but now we have one, and this thing is actually running Tyr. So, this is being rendered using the Rust driver at this moment; it's running, and it hasn't crashed so far. We ran this thing for three days during the Linux Plumbers Conference — so there's this board here; I don't know how many of you can see it, and I'm not going to touch it. We had controllers actually plugged into the board, and people were playing SuperTuxKart, you know, the kart game, against each other for three days, and it didn't crash. So, at this point, I believe somewhat that this thing isn't going to crash on us, and it hasn't so far.

So, basically, we have a prototype, but yeah, what now, right? We have two different strategies at the moment. We have this thing downstream, which we basically wanted to use to show and tell the community that, hey, we can get a Rust driver done that can perform as well as a C driver — we wanted to show people something real. So, we did this downstream driver, we demoed it at Plumbers, et cetera, et cetera. We didn't do enough benchmarking at this point, because there are a lot of moving parts at the moment, but we ran a few games, and the performance between Tyr and the C driver, called Panthor, is roughly similar. And by roughly similar, what I'm trying to say is you get roughly the same amount of frames, but we did not do actual benchmarking, which would be the right thing to do; we're going to get there in the future.

And then we have the upstream effort, which is basically the most important thing at this moment, right? Because it's no good to have a simple downstream driver like this. What we want is to have this thing basically be a part of the Linux kernel.
And for that, we will have to figure out how to upstream what we currently have into the kernel. And as I've been saying — I explained this a couple of slides ago — writing a GPU kernel driver requires a lot of shared components. I showed you that when we were discussing GPUVM and the GPU scheduler, in the picture that had basically Tyr in a nutshell. And we need to have abstractions for these shared components, not only for Tyr. They have to work, of course, for all drivers.

So, in particular, nowadays we have Nova, which is, you know, being written by the NVIDIA people — a massive driver for NVIDIA hardware, which is going to support Turing and up. I guess Turing starts at the 20xx series of NVIDIA GPUs, if I'm not mistaken. So Nova is going to support that and up — a lot of hardware, a very complex driver. There's also Asahi, which was initially written by Asahi Lina, right? Nowadays the Asahi people have taken over the project because she left, and they're also trying to upstream their driver. So there's basically three of us, plus RVKMS, which is a mode setting driver, also written in Rust. So there are four drivers, and they're all going to use this shared infrastructure.

So our plan initially was: let's get this infrastructure upstream, right? Because if we don't have the infrastructure that's going to be shared by everybody, we cannot have our driver. It's as simple as that. So we basically have most of the DRM stuff upstream, so you can have a DRM device, a render node, show up. We have clocks and regulators, so that you can bring the GPU hardware up, because you need power and clock signals for that. We need IRQs, as I told you before — IRQs are used to know when jobs are done, for example, and interrupts are the main way through which we communicate with the microcontroller. And we also need workqueues, for a lot of reasons.

All of these other things are downstream, unfortunately. And this is a problem, because for as long as these things are downstream, we cannot upstream our work, because our work depends on them. So allocating GPU memory nowadays is not possible in the upstream Linux kernel. The reason being GEM shmem: there's no way to talk to that from Rust today in upstream code. If there's no way to talk to the GEM framework, again, there's no way to allocate GPU memory. And if there's no way to allocate GPU memory — as I said, allocating GPU memory is basically the number one thing that a kernel driver does — you can't do anything.

So, to answer a question that somebody may or may not have: where are we upstream? We can basically probe the device and bring the power up, and not much more than that. The reason being, again, we cannot allocate memory. We cannot talk to the IOMMUs, so we cannot provide the isolation I've been talking about.
We cannot have fences. Again, fences are this synchronization mechanism where you can basically wait on a fence; there are only two operations, waiting and signaling. It's how the kernel driver tells user space, for example, that a given job has finished executing. It's how you build these dependency graphs I've been talking about: you say, hey, this job, when it completes, is going to signal this fence, and this other job will depend on that fence. And we cannot allocate fences in Tyr, because there's no way to talk to the current C code that does that from Rust.

And the GPU scheduler — again, one of the major responsibilities of a kernel driver is scheduling jobs, by noticing which job has its dependencies met so that it can execute now. This is done by this GPU scheduler component, and there's no way to talk to this thing upstream at the moment. So there's very little that can be done; however, there is work tackling all of these things. Lyude Paul from Red Hat is working on the first bullet point, and I'm pretty sure that series is going to be merged soon, so soon we're going to get the first one. The second one is probably going to be merged in the next cycle. And then there's a person, Philipp Stanner from Red Hat, who's working on the scheduler and the fence stuff, and just last week he put out an RFC. I'm not really sure how this RFC is going to perform — and it's not really the DRM GPU scheduler, but a new thing — but we're going to talk a little bit about this.

But the good news is we have a prototype driver, so we can just take whatever code anybody puts out, backport it into the prototype, and then test it on a real device to see how it performs. Because drivers like Nova, for example, are not at a point where they can test a scheduler: they cannot submit jobs yet, they did not get there yet.

And for the prototype, which is this thing — I used to have pictures here before I could present from the actual device, so I don't think pictures are really needed, because we're seeing it being presented. We can basically watch YouTube, play games. So yeah, GNOME and Weston are working. And in fact, let's just see whether we can run something like SuperTuxKart. And hey, SuperTuxKart is working. This GPU counter at zero means that we don't have support for performance counters — we did not get there yet. This thing, basically — we're actually only 50% sure that it will keep working until the end of the presentation, because there are a lot of things to do still. And you can play.
Yeah, whatever. And yeah, there you go — you can basically play. And again, people played this for a few hours during Plumbers. Um, how do I quit this?

And then, if we do, like, dmesg and grep for tyr: Tyr is basically outputting a lot of things. Basically, all mappings have traces on them for debugging reasons, so every time you map and unmap memory, you print something to the console.

Let's see if vkcube is actually working. So yes, it's running on the Mali G610, and Vulkan is working, albeit with some glitches, because, as we will discuss, there are a lot of shortcuts in there. So if you run vkcube on a — how do I say this? — on a driver that has actually been deployed, let's put it this way, it doesn't stutter like this. This is because the synchronization isn't 100% okay yet; we have taken some shortcuts to get this to work. But it works. I mean, I could use this as my daily driver for email, you know, as long as I saved all my work every three or five minutes. All right. So GNOME is working, vkcube is working, SuperTuxKart is working. Can we run Firefox? No? Yeah — so Firefox is working. All right, let's get back to the presentation.

So, anyway, as I said, some parts will need more iterations before being ready. As you can obviously see, there are a few things that this thing does not have. For example, it doesn't have any power management code whatsoever. The power management strategy for this thing is: probe the driver, clock the GPU to the max clock frequency you can, and then just leave it there. Also, as I said, synchronization is not good enough, so I'm not going to show this, but if you run SuperTuxKart in windowed mode, it's actually glitchy, and there are a few — how do I say that? — yeah, well, glitches, as you see, with different colors popping up every now and then. What else is just not here? Oh — error recovery. So if the GPU crashes on this thing, you're done; you have to reboot the computer. Usually, when the GPU crashes, the first thing you have to do is ask: hey, is this crash recoverable? Because some crashes are recoverable.
And if they are, then you should rewind the state — you should basically recover and put the GPU in a state where the user can resume using the hardware. But this is not the case here. So if this thing crashes, then you reboot the system, and bye-bye to all of your work. Let me see if I can remember something else that is not implemented yet. Oh — you can only run eight programs at the same time. For reasons. No, really: I said that the microcontroller can schedule jobs automatically, but that's only if you give it up to eight jobs, right? If you give it eight jobs, like from zero to seven, it can pick between these eight and automatically schedule them. But if you have more than eight, you have to have a software scheduler on top, which is more code, and that is not as nice and shiny to show people. So it's not there yet.

But what do we want this thing for? As I said, going forward, we now have a driver where we can test code from other people — from NVIDIA, from Red Hat, or any other contributor — on a real device, right? We can basically backport their code on top of our prototype, run it, and then collect data on whether it's working, whether it's performant, or whatever. And if it works for Tyr, there's a good chance it's going to work for Asahi and for NVIDIA as well.

So again: how can we, you know, take this prototype, upstream it, and get rid of the shortcuts — that's where we are at the moment. We are focusing basically on upstream now. We're going to take a lot of the code that we have downstream, and then we're going to see: hey, where did we take shortcuts, what can we improve? Can we design the Rust API to actually be better, or safer, or more performant, et cetera, et cetera? So this is where we are.

There's actually a big discussion nowadays about what job submission is going to look like. Again, job submission is, let's say, one of the major responsibilities of a kernel driver. And this thing is using the GPU scheduler — the DRM GPU scheduler — that's written in C. However, as I said, this hardware has a firmware, a microcontroller unit, that can automatically schedule jobs, up to eight jobs. And then it doesn't really make sense to have this DRM GPU scheduler on top. And this is also the same thing for Asahi, and the same thing for Nova.
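A rough sketch of what that extra software layer has to do, using invented names: the firmware only has a fixed number of slots (eight here), so the driver either grants a free slot, reuses the slot of an idle group, or reports that the GPU is busy — the two options described again in the Q&A below. This is not Tyr's or Panthor's actual code, just the idea.

```rust
// Toy model of slot-limited, firmware-assisted scheduling (hypothetical names).
const FIRMWARE_SLOTS: usize = 8;

#[derive(Clone, Copy, PartialEq)]
enum GroupState {
    Idle,
    Running,
}

struct Slots {
    groups: Vec<GroupState>, // at most FIRMWARE_SLOTS entries
}

enum Admit {
    /// The group got a slot; the firmware now schedules it on its own.
    Granted,
    /// Every slot is occupied by running work: tell the caller to retry (EBUSY).
    Busy,
}

impl Slots {
    fn admit(&mut self) -> Admit {
        if self.groups.len() < FIRMWARE_SLOTS {
            self.groups.push(GroupState::Running);
            return Admit::Granted;
        }
        // All slots taken: look for an idle group we can boot out.
        if let Some(idx) = self.groups.iter().position(|g| *g == GroupState::Idle) {
            self.groups[idx] = GroupState::Running; // reuse the freed slot
            return Admit::Granted;
        }
        Admit::Busy
    }
}

fn main() {
    let mut slots = Slots { groups: vec![GroupState::Running; FIRMWARE_SLOTS - 1] };
    assert!(matches!(slots.admit(), Admit::Granted)); // last free slot
    assert!(matches!(slots.admit(), Admit::Busy));    // nothing idle to evict
}
```

Everything that fits within the slot limit is then scheduled by the firmware itself, which is why, as explained next, the generic job queue does not need to do any scheduling of its own.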
So it's looking like we're going to be writing a new scheduler, directly in Rust, for job submission — which is not going to schedule anything, because, as we said, GPUs nowadays can do that on their own. It's only going to do the dependency management part, right? So, figuring out whether the dependencies are met before executing work. Red Hat is basically spearheading this effort; they're calling it the job queue. And we're working together with them to help them test it, give them input on the design, and so on and so forth.

So, as I said, we're going to start with a clean slate: try to upstream whatever we have here, try to upstream whatever dependencies are still not upstream, in order to get a driver upstream. And then we're going to focus some effort on benchmarking, basically. So: how performant is this thing compared to the C driver? We ran a couple of games, but we need more data than that. We're going to run the CTS and ensure that the CTS is passing, you know, that sort of thing, on the upstream driver as we get there.

So, basically, this is what I had to show for today. Sorry I did not bring the controllers, otherwise you guys could have played for ten minutes; it'd be cool. Do you have any questions?

Which SoC are you using in your demo?

The RK3588. The question was which SoC we're using for the demo, and it's the RK3588.

Hi, thank you for your talk. You said that you don't need scheduling for up to eight jobs, but then you need some kind of scheduling, and then you said the job queue won't schedule anything. I'm a bit confused by that.

All right. So CSF will give you eight — sorry. So the question was: the person was confused because I said that the GPU can schedule up to eight jobs, but then we need a software scheduler on top, and I also said that the job queue is not going to be scheduling anything. So there was some confusion in all of this different information. To answer your question: CSF, which is this firmware-assisted scheduling thing, gives you up to eight slots — you have eight ring buffers where you can place your command streams — and it can automatically schedule across these eight ring buffers. If you have more than eight jobs trying to use the GPU simultaneously, you have two options.
Either you say, hey, the GPU is busy — which is what this is doing, so just return EBUSY — or you boot someone out: you stop the world and have a look, hey, who's idle? Or, if nobody's idle, who can I remove from the ring buffer at this moment to free up one slot to give to another job, basically. So this is what I'm referring to when I say software scheduler; it's this component. Now, this component is really only for Tyr and Panthor, because Asahi and Nova can basically have as many ring buffers as they want. It depends on the amount of memory, but there's no restriction, no eight or sixteen slots. So when I say that this job queue thing is not going to be doing any scheduling, this is what I mean. If your driver, for example, needs extra scheduling because you only have a limited number of ring buffers, then you have to do that on top. And so far, this is only the case for Tyr. Any other questions?

I have a more general question about GPU memory allocation. Modern Arm chips have lots of CPUs, GPU cores, and so on, all sharing the same memory. Do we really need to allocate memory if this memory is already shared between all parts of the chip?

So the question was — and correct me if I'm wrong — given that modern SoCs have a lot of memory that's shared between a lot of components, do we really need to allocate memory? The answer is yes. Basically, you need to allocate a portion of the system's memory to the GPU, and this is what GEM is doing. GEM is going to use the shmem layer to actually do the allocation for you, and it's going to carve out memory from the system's overall memory. So yes.

I have a question about choosing Rust. Did you do it only because of memory safety, or are you also looking to get more performance — and is that even possible in this framework?

Oh, sorry. So the question is: are we using Rust only because of the memory safety, or are we also looking to get more performance out of Rust? So, more performance is actually a non-goal.
What we're trying to do is get as much performance as the C driver. And the reason why is that it's basically not a goal of the language to be faster than C in general. And even if the language were faster, having a faster kernel driver wouldn't necessarily make things faster overall, because the kernel driver is not really the bottleneck most of the time. So having the kernel driver execute faster wouldn't necessarily lead to games executing much faster. So no, we're only doing it for the safety.

Yes. You mentioned the Nova driver and the Asahi driver. Have you heard anything from the people from AMD or Intel — I mean, they have probably the biggest drivers in the kernel — did they show any interest in looking at Rust?

Has AMD shown any interest in looking at Rust? Well, no, not as far as I'm aware. I haven't heard anything. That's my answer. Thank you.

Yes? The features you mentioned that are missing — these are implemented in Panthor, I guess? Is there anything else that is implemented in Panthor — nice-to-have features, not must-have features — that is not implemented in Tyr? Performance counters, I guess?

Yeah, a lot of things. The question is: are there more things implemented in Panthor that are not implemented in Tyr? In Tyr, basically, in the downstream prototype, we have the bare minimum. So, a lot of things, basically. We only have the bare minimum here to submit jobs and, you know, boot the firmware and have jobs execute. So: performance counters, power management, I think the debug facilities, error recovery, support for more GPU models — Tyr only supports the Mali G610, whereas Panthor supports other GPUs and other architectures. So there's a lot missing at this moment. More questions?

Yes. So this is public, I guess? Could I just pull your prototype driver, build it, and run it on my own RK3588?

The question was: can I download the driver and test it on my own RK3588? Yes, but don't blame me if it breaks. Yeah, it's public.
But don't do your most important work on it and then have it crash. I think — are there any more questions?

You mentioned video playback. It's a bit of a blind spot for me. Is that an entirely separate set of APIs in the driver from the acceleration, or is it a different driver? Presumably it uses the same buffer objects as everything else?

All right, so the question is — and correct me if I'm wrong — how is video encoding and decoding implemented in the driver? Basically, I don't think Arm has any encode or decode engines in the GPU, so it's totally separate from this driver. More questions?

I have a question — I know we're very close to the end. Who's funding the work you're doing, and why? Because that's one thing I don't know about.

Oh, yes. So, who's funding the work and why? That was the question. Basically, Arm and Google at this point. And the reason is, basically, as I said, security: making sure that people cannot hack devices they may otherwise sell, for example. And eventually — so there's the C driver, which is Panthor, and then there's Tyr — in a few years, if everything works out, the overall plan might be to move the platform from Panthor to Tyr, and then, you know, to develop new features there. This is still too far in the future to actually discuss, but that's the general idea. More questions?

Hi. You mentioned some stuff that's not ready yet because there are no Rust abstractions for the infrastructure you rely on. Are they painful to implement? Is it a technical problem, or is it more a problem with Rust itself?

The question was about the abstractions: I said there are things that are missing, and the question is whether they're hard to implement, or what the problem actually is. Correct? They're not hard to implement, but we need community consensus, actually. So that's the hard part. And the number one roadblock — because when I had this slide saying what we have versus what we don't have in terms of abstractions, I said most of them are going to be upstream soon, except for the job submission logic.
And this is where this job queue stuff — replacing the GPU scheduler, doing versus not doing scheduling, et cetera, et cetera — comes in. It's something we have to talk a lot about, because it has to work for everybody, for all drivers. And getting it to work, and getting it to work correctly — I mean, we already have this scheduler in C, and it's plagued with some issues that nobody has managed to fix for seven years. We don't want to repeat that just so that we have something that works. So that's, yeah. And fences are usually the same: it's like a house of cards, so if you have a bug somewhere, everything comes crashing down, and you have to be really careful there. More questions?

Hi. Have you ever destroyed hardware while doing this?

Thankfully not. Have you ever destroyed hardware while doing this? Thankfully not, not yet. Hopefully never, right? More questions?

Yes. Mali is mostly used in Android. Are there plans to bring this work there, or do we have to wait a few years before it can move to our devices?

The question is: what are the plans for Android, basically, right? I'm not sure I'm allowed to discuss this. There may or may not be plans. Yeah. And it may or may not be soon-ish.

All right. Yes. Any more questions? No? All right. Thanks for having me. Hopefully you guys enjoyed it.