WEBVTT

00:00.000 --> 00:15.600
Okay, hi everybody. We're approaching the end of the event, and today I'm going to update the

00:15.600 --> 00:23.600
talk I gave back at [unclear] in Tokyo. I gave a similar talk

00:23.600 --> 00:28.480
there. I think most of the people here in Europe may not have attended that conference, so I'm going

00:28.480 --> 00:36.640
to repeat it, although it's online, and also give some updates. It's been two months...

00:36.640 --> 00:42.240
the talk was in December, so now it's one month, and we have made a lot of great progress.

00:42.240 --> 00:49.360
So today's talk is all about breaking the record: how do we get RISC-V upstreaming

00:49.360 --> 00:58.120
to be better than any other ISA, and try to match x86 as much as we can.

00:58.600 --> 01:04.920
I'm Yuning, I'm the founder of DeepComputing. Actually I'm a software guy. Before,

01:05.800 --> 01:13.160
I earned my money from software. I started as a VM guy and a compiler guy. I ported most of the VMs

01:13.960 --> 01:21.720
from the old days, [unclear] to JavaScript to [unclear]; whatever VM, I ported most of them.

01:21.720 --> 01:27.240
And I also started as a compiler guy. I'm familiar with all of it: back in the old days, the MIPS compiler,

01:27.480 --> 01:32.280
Open64. So I'm the founder of DeepComputing. I founded the company, a hardware company, during

01:32.280 --> 01:39.240
COVID, and I'm a very serious software guy. I understand what Alan Kay said, so I earn

01:39.240 --> 01:44.840
the money from software and lose it all in hardware. It's very good.

01:47.240 --> 01:53.640
So, I make laptops, RISC-V laptops. This was 2023, and we made the most expensive

01:53.640 --> 02:02.840
RISC-V laptop in the world; now it's in museums. It was $5,000, and yeah, we didn't make many,

02:03.880 --> 02:12.360
about 100 pieces, [unclear]. And then in 2024, I made a very affordable

02:13.160 --> 02:17.240
second-generation laptop. We have 8 cores, and [unclear] as well.

02:17.240 --> 02:23.560
So [unclear], and for the first time ever, Ubuntu, Fedora [unclear] support,

02:24.600 --> 02:29.800
and that's the beauty of RISC-V. After a while, I started thinking: I only care about

02:29.800 --> 02:33.480
RISC-V. I don't care about the battery, I don't care about the screen, I don't care about everything else,

02:33.480 --> 02:40.360
the keyboard, on the hardware. So when I did laptops, I found out that I lose so much money on the

02:40.360 --> 02:45.800
rest of the things, which have nothing to do with RISC-V. So I started working with Framework. They have

02:45.800 --> 02:50.200
a booth there, so you can see: they have a laptop that supports multiple architectures. I think you

02:50.200 --> 02:54.680
guys should know about it. If you want to do kernel work, you can support all the different architectures, from

02:54.680 --> 03:00.120
x86 to Arm to RISC-V, in one machine. You can support all the architectures with different

03:00.120 --> 03:07.160
motherboards, so do take a look. And although it has nothing to do with the

03:07.160 --> 03:11.800
kernel, [unclear] this. Basically it's very good for our European culture:

03:11.800 --> 03:18.440
it's DIY, fix it yourself, lasts forever, very environmental. And then I did the first

03:18.440 --> 03:26.360
ROMA, a Framework motherboard, for them. The StarFive quad-core one is around 150 dollars, with 8 GB of RAM,

03:27.160 --> 03:35.240
but now we've moved to about 200 dollars, probably, because DDR
prices have gone crazy. And then [unclear]

03:35.240 --> 03:43.880
another board, with 64 GB of DDR, and it's double the price now. Whoever bought the first batch,

03:43.880 --> 03:52.280
you already got rich. And now we're two months ahead of mass production,

03:52.280 --> 03:59.080
so now we get 16 cores at 2.1 GHz, and it's a pretty usable RISC-V now. So everyone

03:59.080 --> 04:03.800
who works on the kernel should help on the RISC-V kernel, and you will ride the hype, because I think

04:03.880 --> 04:09.480
the RISC-V kernel needs more help than any other architecture. For Arm,

04:09.480 --> 04:16.040
they've got a lot of money. Arm has a lot of money, they have a lot of people working on the kernel.

04:16.040 --> 04:23.560
But for RISC-V, we don't. So now, back to the slides: we'll talk about RISC-V

04:23.640 --> 04:31.000
and upstreaming, the differences. For Arm, whatever Arm SoC comes out,

04:32.280 --> 04:38.200
it will sell millions, so they don't really care about upstreaming. Why should they care,

04:38.200 --> 04:47.320
right? Their SoCs have volume. And for RISC-V, it's really bad, because, for now,

04:47.320 --> 04:52.440
the whole RISC-V software stack, from the kernel to the toolchain to the upper stack,

04:52.440 --> 04:58.760
whatever, is not optimized, because the hardware is not there yet, so the software can't catch up.

04:58.760 --> 05:07.160
So for us, looking at hardware, I had a very funny experience. Three years ago,

05:07.160 --> 05:13.560
my 4-core machine running Debian at that time was very slow, but now, running Debian 13,

05:13.640 --> 05:21.240
it's much faster. It's bizarre, right? The reason is that

05:22.520 --> 05:29.240
RISC-V software is getting better and better. So that means, for RISC-V hardware,

05:30.680 --> 05:37.960
it has to live a very long silicon lifetime, and that demands more upstreaming, which needs to be

05:38.040 --> 05:45.160
needed for a long time. So that's why I would say that, compared to Arm,

05:46.280 --> 05:51.000
RISC-V desperately needs upstreaming, more than any other architecture.

05:52.680 --> 05:58.920
And the other thing is that all the SoC guys, including myself, I'm making a motherboard,

05:58.920 --> 06:04.600
I have to guarantee mass production, so I'm only allowed to work on LTS,

06:05.560 --> 06:13.000
because if you use the upstream kernel, the mainline, I have no idea where the problem is. It's so freaky.

06:14.840 --> 06:20.760
If you give me a 6.20, if it's not LTS, I'm scared,

06:21.560 --> 06:29.640
because I have no idea whether the problem is in the hardware. So that is the hardware side: I'm saying

06:29.640 --> 06:34.520
that we have to work on LTS. But by the time we work on LTS to guarantee mass production quality,

06:35.160 --> 06:42.680
we're off track, we're off mainline. I remember I gave my second ROMA

06:42.680 --> 06:49.800
laptop to Greg KH and asked him, can you help me upstream everything?
He said,

06:49.800 --> 06:56.120
you have 100, or what, 60,000 lines of code changes. No chance.

06:57.400 --> 07:04.920
He said even Linus can't do that. But anyway, that is the pain of it.

07:04.920 --> 07:11.640
And I think there are two methodologies, and we got it completely wrong, because all of our SoC guys,

07:11.640 --> 07:17.000
and the mass production guys, still inherit a lot of habits from Arm,

07:17.400 --> 07:26.200
where we work on LTS. But in fact, we should work on upstream, same as x86: work on

07:27.000 --> 07:34.360
mainline and backport to LTS. So that is the thing we have done wrong. And I'll give you

07:34.360 --> 07:42.920
the status. So now I'll give you the status of the first-generation motherboard, and it's very funny.

07:43.000 --> 07:52.360
I'm not sure you guys can see it, the old PowerPoint. In 2018, we got the IP,

07:52.760 --> 07:59.400
and then four years later, it got into the SoC, and then it took another two years

07:59.400 --> 08:06.360
to get into my motherboard, so it's six years. But take a look at

08:06.360 --> 08:16.200
the upstream progress: for our SoC, the first upstreaming was back in 2023. It's painful, it's slow.

08:17.000 --> 08:23.000
And according to Greg KH, that's the wrong thing to do. x86 has done upstreaming

08:23.000 --> 08:32.200
since FPGA evaluation, verification. So the patches are still ongoing, [unclear] GPU-

08:32.280 --> 08:39.080
related stuff. We know, right, Linus... we know we can't solve the GPU saga. If Linus can't,

08:39.080 --> 08:47.480
no one else can, and that's it. [unclear] the upstreaming. And another thing is that we

08:47.480 --> 08:54.520
lack experience in upstreaming, so it takes 10 rounds, ping-pong, 10 rounds, to submit one patch.

08:55.240 --> 09:04.680
Okay, for this first motherboard, that's pretty much the upstreaming. And then the second one

09:04.680 --> 09:13.800
is even more
awful: the CPU-IP-related upstreaming is not even done yet, since the IP

09:13.880 --> 09:22.440
was introduced in 2020. So that's the wrong thing to do. Okay, so how are we going to do better? I

09:22.440 --> 09:31.640
forced all my SoC guys to do more upstreaming, and RISC-V has to do better, and Greg says that as

09:31.640 --> 09:44.520
well. So this is the work back in 2008, which says that you need to upstream

09:44.520 --> 09:53.800
stuff from the simulation stage, right? That's the x86 way, [unclear], right? So it's actually 15 years

09:53.880 --> 10:04.280
ago, 20 years ago, experience. And how do we do it? Basically, I forced my first-

10:04.280 --> 10:11.080
generation motherboard SoC guys. I said, you have to upstream before the chip comes back. And we,

10:13.960 --> 10:21.480
and actually it's scary, because all our engineers, even those with experience, come from the Arm culture.

10:22.280 --> 10:30.920
They were scared; they had never done it before. And a lot of other things as well: even the

10:30.920 --> 10:38.040
maintainers were concerned: you don't have hardware, should I allow you to submit it? So that's a problem.

10:43.880 --> 10:49.800
And the approach: we do the simple ones first, pin control, and the majority of the CPU-IP-related parts

10:49.800 --> 10:56.200
first, on FPGA, and then we go from easy to difficult. That's the way we do it on the K3.

10:57.640 --> 11:05.640
And how's the progress? We did meet a few surprises during the process. Before,

11:05.640 --> 11:10.360
during the FPGA verification and the tape-out, before the chip comes back, the three months were

11:10.360 --> 11:18.600
full of surprises. First thing we realized: when people look at the core, if you work on the

11:18.600 --> 11:23.720
kernel, then you know straight away what the problem with the core is. The way the hardware is designed,

11:24.840 --> 11:32.040
we call it big.LITTLE, asymmetric. That means they have mixed clusters:

11:32.120 --> 11:44.040
one RVA23, another RVA23, and they have different RVV, and they have AI custom instructions in one.

11:44.040 --> 11:48.760
So from the software point of view... I still remember Greg's email to me. He said, Yuning,

11:48.760 --> 11:55.720
this is screwed up; you should have let me know about this diagram nine months ago. Basically it's very painful to get

11:55.720 --> 12:02.600
upstream and software support. And besides that, we have some DMA zone issues as well for the

12:02.600 --> 12:09.560
[unclear]. And the other thing is that when we started the silicon upstreaming, we did

12:09.560 --> 12:15.320
share a lot of code with all the community guys. Initially I invited people to a private,

12:15.320 --> 12:19.640
private repository to look at our code. We were scared of disclosing all the hardware information

12:19.640 --> 12:25.240
before the product was announced. But in the end I managed to convince the SoC

12:25.240 --> 12:30.360
guys that we have nothing to lose. It's bad enough; let's open it up, let the world see it.

12:31.400 --> 12:40.120
So that's why we open-sourced everything: the first time ever that everything was open-sourced before the chip comes out.

12:41.160 --> 12:47.160
And then we did make a few mistakes. The FPGA target: we made the wrong target, we submitted the

12:47.320 --> 12:53.400
patch, and people refused it. And then we didn't go through most of the testing;

12:53.400 --> 13:01.480
we just did the upstreaming without testing it, so it's very brave. And yes, [unclear] reputation

13:01.480 --> 13:08.280
ruined. But according to Greg, there's nothing wrong: we just fix it if there's a bug, right? Nothing

13:08.280 --> 13:15.000
to be ashamed of. So that's what the progress is. So what can we improve? And

13:15.000 --> 13:22.760
I would say that we can still do better, especially for all the CPU features; we can do a lot

13:22.760 --> 13:27.960
better. So far we... I forgot to bring the board, but feel free to come down to our booth.

13:27.960 --> 13:34.360
We already have the latest chip with the manufacturer's sample board working, and we have still

13:34.360 --> 13:39.800
found some major CPU features that don't work, for example the hypervisor. Actually we should

13:39.800 --> 13:45.240
have done all that software back in the FPGA stage; we hadn't done all the verification. Also,

13:45.240 --> 13:50.840
we have seen that some AI features we still haven't got working.

13:50.840 --> 13:57.000
We should have done that back in the FPGA verification at that time, and also upstreamed it at that time;

13:57.000 --> 14:01.560
then we'd pretty much have everything ready. But now we haven't got it, so there's still a lot

14:01.560 --> 14:11.480
of things to improve. And of course now we've learned: for SoCs, you have to have consistency, reuse as much

14:11.480 --> 14:18.600
as you can. For SpacemiT, the K3 and K1 share the majority, 60%, 70%:

14:18.600 --> 14:26.600
they are the same IP. So with all that upstreaming effort for K1, we reuse most of the upstream

14:26.600 --> 14:36.440
effort from K1 anyway, so it's much faster. So that's all for today's talk.

14:36.440 --> 14:44.280
And this is a first ever; I think we have done better than Arm, including Qualcomm, and we'll continue

14:44.280 --> 14:50.680
to do better for the next chip, for the [unclear] server chip, because there's less IP involved. And

14:50.680 --> 14:56.600
I look forward to your help, especially since in RISC-V we really lack resources.

14:58.840 --> 15:10.120
Thank you, thank you guys. Yeah, you can go to GitHub and take our upstream

15:10.120 --> 15:15.720
patches. It's not done by me, it's done by the SoC guys, but they were great.

15:16.040 --> 15:21.400
Any questions?

15:21.400 --> 15:38.600
Yeah, hi, thank you for your talk. Regarding upstreaming: the datasheets of all these processors

15:38.600 --> 15:43.720
that you want to get upstream, are they easily available for download?

15:44.280 --> 15:51.720
Yes. We are aware that if we want to have upstream support, we have to release all the datasheets, and that

15:51.720 --> 15:59.880
includes our schematics. All right, thank you. We have to do better than Arm.

15:59.880 --> 16:02.920
[unclear]

16:07.160 --> 16:07.960
Other questions?

16:13.400 --> 16:21.240
So, do take a look at the upstream kernel log. We did pretty well, actually: the SoC guys did about 10

16:21.640 --> 16:27.960
patches before the chip came back. Whether they work or not, I don't know.

16:30.120 --> 16:38.360
Not really a question, more of a comment. We have TH1520 SoCs at

16:38.360 --> 16:43.560
Scaleway, and we have to maintain our own fork of the kernel because of that, because of the lack of

16:43.640 --> 16:51.560
upstreaming. So I think upstreaming patches to mainline is something that will benefit all of us.

16:55.960 --> 17:00.440
Definitely. We will upstream as much as we can, except the GPU. The GPU I can't deal with now.

17:02.760 --> 17:10.040
I have no experience working on a CPU, but working on a device, I've had quite a

17:11.000 --> 17:17.160
good experience implementing it in QEMU and doing a software simulation, and then we had

17:18.360 --> 17:25.480
the driver stuff and so on in pretty good shape when we had the hardware ready. I don't know how it works

17:25.480 --> 17:30.840
out for RISC-V; probably it's too complicated. I think it's very complicated. I think using a

17:30.840 --> 17:36.440
simulator, well, I'm a VM guy, so usually I do the simulator first, before I touch the hardware;

17:36.520 --> 17:43.400
otherwise it's too [unclear]. But the thing is, because there are so many peripherals, we have to code that,

17:43.400 --> 17:45.000
[unclear]

17:45.000 --> 17:52.920
[unclear]
So the only thing we can emulate is the majority of the CPU features: okay, the

17:52.920 --> 17:57.880
hypervisor, blah, blah, blah, and AI, those things. Okay, those go into the simulator,

17:57.880 --> 18:03.160
and then start upstreaming, and then the FPGA, and so on as well. So that we should do better,

18:03.240 --> 18:07.960
seriously. But the rest of the code, the I/O, needs real hardware, all right.

18:08.840 --> 18:14.120
For CPU features we could definitely do even better. Thank you. All right, we're out of time, thank you.

18:14.120 --> 18:16.120
Thank you very much, thank you.