WEBVTT 00:00.000 --> 00:12.880 Hello everyone, my name is Daniel and I'm going to talk about ElemRV and how to make open source 00:12.880 --> 00:16.080 chips. 00:16.080 --> 00:22.280 So ElemRV is a collection of end-to-end open source RISC-V microcontrollers, and 00:22.280 --> 00:28.080 end-to-end in this context means it's open source from the design files to the tape-out-ready GDSII 00:28.080 --> 00:29.080 files. 00:29.080 --> 00:34.560 It's available in multiple platforms for different applications, and those are based on element 00:34.560 --> 00:35.560 names. 00:35.560 --> 00:39.560 So the name is just a combination of "elements" and "RISC-V". 00:39.560 --> 00:44.760 They are fully written in SpinalHDL, they are licensed under the CERN Open Hardware 00:44.760 --> 00:50.720 Licence W 2.0, and they support an FPGA and an ASIC flow. 00:50.720 --> 00:55.360 The history of ElemRV is tightly coupled to the IHP, Innovations for High Performance 00:55.360 --> 00:57.680 Microelectronics. 00:57.680 --> 01:02.240 This is a publicly funded research institute in Germany. 01:02.240 --> 01:06.600 They focus on silicon-germanium electronics. 01:06.600 --> 01:13.920 They operate a small pilot line in Germany, where they produce research 01:13.920 --> 01:14.920 projects. 01:14.920 --> 01:20.440 For example, they produce high-speed transistors, that's what they focus on. 01:20.440 --> 01:23.240 They're located in Frankfurt (Oder), Germany. 01:23.240 --> 01:31.960 They got two key fundings a couple of years ago: one for a 130 nanometer open source PDK, and 01:31.960 --> 01:40.320 another for free MPW runs to validate those analog and digital components. 01:40.320 --> 01:46.800 So they had a lot of analog designs, but they lacked digital designs for these free MPW 01:46.800 --> 01:49.160 runs. 
01:49.160 --> 01:56.640 I already had RISC-V designs for FPGAs and I had also written a HyperBus interface for those, 01:56.640 --> 02:02.720 so I got asked through connections if I wanted to port those to make a tape-out. 02:02.720 --> 02:07.280 And as you can see on this image, this is the first tape-out of ElemRV. 02:07.280 --> 02:13.160 It's a flat design; the design philosophy is pretty conservative, because it's an alpha 02:13.160 --> 02:17.920 PDK and the goal was just to get bootable silicon. 02:17.920 --> 02:25.400 I received the silicon last year in February. There was one bug in the HyperBus interface: 02:25.400 --> 02:33.120 it was basically unable to run data and instructions at the same time from the external memory. 02:33.120 --> 02:39.240 All other core functions were working, and it was the first booting silicon with the IHP Open 02:39.240 --> 02:41.600 PDK. 02:41.600 --> 02:49.440 Technically it's the first booting silicon in Europe, the first open source microcontroller. 02:49.440 --> 02:58.320 And the cool thing about this is that there's no NDA required, so this is end-to-end open source. 02:58.320 --> 03:05.280 I did a second tape-out in April last year. I fixed the HyperBus interface and now it's 03:05.280 --> 03:12.280 actually running Zephyr. It got an on-chip RAM, that's the two small yellow bars 03:12.280 --> 03:18.640 on the bottom left, it got more I/O, more interfaces to silicon-prove, it got 03:18.640 --> 03:23.840 a pinmux controller and a masked AES, because there's a research team in Germany that's 03:23.840 --> 03:27.840 trying to break the masked AES implementation. 
03:27.840 --> 03:34.360 All right, so let's talk about the architecture. This is the Nitrogen, the more powerful 03:34.360 --> 03:43.240 platform. It's using a VexRiscv CPU, then it's going through a BMB interconnect. This 03:43.240 --> 03:49.960 has an SPI XIP controller, so it's communicating with an external SPI flash, it has the on-chip 03:49.960 --> 03:56.800 RAM and HyperBus for external memory, and all peripherals are connected through a Wishbone 03:56.800 --> 04:02.040 bus and then go through the pinmux to the I/O pads. 04:03.000 --> 04:09.520 All right, so the VexRiscv, I think that's a pretty famous or familiar CPU by now. It's 04:09.520 --> 04:16.960 a 32-bit RISC-V CPU with a five-stage design, and it's fully written in SpinalHDL. 04:16.960 --> 04:25.080 It's actually written by Charles Papon, the maintainer and inventor of SpinalHDL. 04:25.080 --> 04:29.600 It's a plugin-based architecture, so you can define your CPU with a list of configurations 04:29.600 --> 04:35.840 and plugins. In this architecture it has the I extension, the full integer base extension, 04:35.840 --> 04:43.080 which is mandatory; it has hardware multiply and divide, the M extension; the C extension 04:43.080 --> 04:48.760 for compressed instructions, so 16-bit instructions; and the control and status registers. 04:48.760 --> 04:57.160 All right, so the Wishbone, that's actually the biggest change now compared to the 04:57.160 --> 05:02.240 first two silicon revisions. The HyperBus issue was actually an issue in the 05:02.240 --> 05:13.440 AXI4 port, and AXI4 is a very complicated bus interface, a bus that is maybe too complicated 05:13.440 --> 05:16.440 for small microcontrollers. 
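[Editor's note] The CPU configuration described above (mandatory I base, plus M, C and the control and status registers) corresponds to what RISC-V toolchains usually write as an ISA string. The following is a minimal Python sketch of that naming convention only; it is not VexRiscv or SpinalHDL code, the helper name `isa_string` is hypothetical, and the ordering table covers just a small subset of the single-letter extensions.

```python
# Editor's sketch, not VexRiscv code: build a RISC-V ISA string from a
# list of extensions. `isa_string` is a hypothetical helper name.

# Canonical order of a subset of the single-letter extensions.
CANONICAL_ORDER = "imafdc"


def isa_string(xlen, extensions):
    """Return e.g. 'rv32imc_zicsr' for xlen=32 and ['i', 'm', 'c', 'zicsr']."""
    # Single-letter extensions are concatenated in canonical order.
    single = sorted((e for e in extensions if len(e) == 1),
                    key=CANONICAL_ORDER.index)
    # Multi-letter extensions (such as Zicsr) follow, separated by underscores.
    multi = sorted(e for e in extensions if len(e) > 1)
    base = f"rv{xlen}" + "".join(single)
    return "_".join([base] + multi)


# The configuration named in the talk: I, M, C and the CSRs (Zicsr).
print(isa_string(32, ["c", "m", "i", "zicsr"]))
```

A string of this shape is what a compiler's `-march` option typically expects when building firmware for such a core.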
05:16.440 --> 05:20.840 The specification is also very hard to implement and also to read; there are a lot of people 05:20.840 --> 05:25.520 that actually have questions or problems with it. And the biggest problem is, it's an open 05:25.520 --> 05:32.720 specification, but it's still owned by Arm Limited, so that's a risk. The replacement 05:32.720 --> 05:41.680 for AXI4 is now the Banana Memory Bus, BMB, written by Charles Papon too. That's a simple command 05:41.680 --> 05:47.720 and response channel architecture, so AXI4 for example has five channels, this has only 05:47.720 --> 05:52.520 two, and it's completely replacing the AXI4 crossbar. 05:52.520 --> 05:58.880 The problem is that BMB, like AXI4, is a point-to-point bus, and there was also a need 05:58.880 --> 06:07.400 for a new bus to replace APB3. Wishbone actually supports shared subordinates, or shared slaves, 06:07.400 --> 06:17.760 so Wishbone is the replacement for APB3. And this is an overview of the bus architecture: 06:17.760 --> 06:21.720 on the left side you can see the instruction and data ports, they're going through 06:21.720 --> 06:27.840 a decoder. The HyperBus and the on-chip RAM have an arbiter, because both the 06:27.840 --> 06:33.960 instruction and data port have access to those. Only the data port has access to the Wishbone 06:33.960 --> 06:43.480 bus, so you cannot read instructions from peripherals, and the data port has no access to 06:43.480 --> 06:51.520 the external SPI flash, because this is read-only right now. Speaking of SPI flash: 06:51.520 --> 06:57.280 there's an SPI XIP controller, as it's called, that's for reading from an 06:57.280 --> 07:05.160 external SPI flash. The problem is that those have a communication protocol that 07:05.160 --> 07:11.440 needs to be implemented. Typically that's a command, then you need to send an address, 07:11.440 --> 07:16.880 you might need to include dummy cycles, and then you can read or write 
to an SPI 07:16.880 --> 07:23.800 flash. In this case this is a dual-mode controller: it includes the normal SPI controller 07:23.800 --> 07:29.560 to handle the SPI transactions, and on top is the SPI XIP controller that implements 07:29.560 --> 07:37.760 the SPI flash protocol. This time it also has a run-time switchable I/O protocol. Normally 07:37.760 --> 07:44.120 SPI flashes boot in a single-bit mode, a standard mode; this controller can 07:44.120 --> 07:53.600 switch to dual or quad mode, so you can increase the bandwidth of the SPI flash. 07:53.600 --> 08:01.440 And then there's also the HyperBus interface, as mentioned. The problem is that 08:01.440 --> 08:09.680 IHP is 130 nanometer, so SRAM is really expensive in 130 nanometer. External 08:09.680 --> 08:17.760 SRAM devices are very small and very expensive. DRAMs, on the other hand, are big and 08:17.760 --> 08:26.760 cheap, but they have a quite complex interface. HyperBus is an Octal-SPI-like interface, 08:26.760 --> 08:33.000 so it has a very low I/O count: there are only 11 mandatory pins, plus as many chip 08:33.000 --> 08:41.040 selects as wanted. You can connect HyperRAM and HyperFlash devices, so RAM as volatile 08:41.040 --> 08:48.320 and flash as non-volatile devices, on the same bus, and the throughput is up to 800 megabytes 08:48.320 --> 08:56.320 per second. I don't know if you can see this on this chip; this is a 64-I/O 08:56.560 --> 09:06.600 chip. On the top there's a small red rectangle, those are the HyperBus interface I/Os. 09:06.600 --> 09:12.640 As you can see, one bank is already gone for HyperBus, it's memory only, and on the right 09:12.640 --> 09:21.000 side, the blue one, this is only SPI, so almost 50% of all pins are gone for external memory and 09:21.000 --> 09:38.120 flash. All right, so that's about the architecture, let's talk about the ASIC flow. So this 09:38.120 --> 09:47.480 is a very, very simplified 
overview. So you need a seal ring and you also need your design 09:47.480 --> 09:54.200 files, or RTL. Then the open flow is using OpenROAD, so we're going to convert those 09:54.200 --> 10:00.640 into a GDSII file, then we need to add metal dummy filler, and then at the end we're going 10:00.640 --> 10:07.040 to do the design rule check. All right, so, the EDA tools normally only understand 10:07.040 --> 10:15.360 Verilog or VHDL; commercial and open source EDA tools normally only understand Verilog, for 10:15.360 --> 10:22.000 example OpenROAD is using Yosys. ElemRV is written in SpinalHDL, so you cannot feed your 10:22.000 --> 10:30.080 SpinalHDL files directly into the EDA tools, but SpinalHDL is a Scala-based framework to 10:30.080 --> 10:35.920 actually generate Verilog or VHDL code. The advantage is that it's a high-level 10:35.920 --> 10:41.200 language, so you can actually use modern programming constructs, but you can also use the 10:41.200 --> 10:45.960 entire Scala or Java infrastructure in the background; for example, you can use file 10:45.960 --> 10:54.440 I/O to actually read data from files. All right, so next we need a seal ring, or sometimes 10:54.440 --> 11:01.320 it's called a guard ring. That is basically just a ring around the entire chip on all 11:01.320 --> 11:08.120 metal layers, and it's there to seal off or to guard the chip itself. It has multiple purposes 11:08.120 --> 11:15.360 during the lifetime of a chip. First, during manufacturing, 11:15.360 --> 11:22.400 it's blocking ionic contamination; then it's reducing the 11:22.400 --> 11:30.240 mechanical stress when separating the dies from the wafer; it's also stopping cracks from going into 11:30.240 --> 11:37.600 the core of the chip from the edges; and during its lifetime it's preventing moisture from getting 11:37.600 --> 11:45.360 into the chip from the outside, from these edges. And as you can see on this image, this 11:45.360 --> 
11:52.680 is a cut through the IHP seal ring. On the bottom, that green is the active layer, then 11:52.680 --> 11:58.400 you have five smaller metal layers and two big top metal layers, and in between are just 11:58.400 --> 12:08.920 the vias. All right, so the next step is generating the GDSII file. So we have our seal ring, 12:08.920 --> 12:13.920 we have RTL; you also need some config files for OpenROAD, but we're going to ignore 12:13.920 --> 12:21.360 those now. Anyhow, the first step is to convert the RTL into a netlist; this is done by using 12:21.360 --> 12:26.720 Yosys, for example with OpenROAD. Next is setting up the floorplan and 12:26.720 --> 12:32.480 doing the power grid. And again on this image you can see these squares on the outer side, 12:32.480 --> 12:40.120 those are the bond pads. Then you have the I/O ring next to them, this is for the I/O 12:40.120 --> 12:45.080 voltage rail and the core voltage rail, and then next to them, on the inside, this 12:45.080 --> 12:51.280 is the power and ground ring, and on the inside this is the core area, but we're going 12:51.280 --> 12:57.400 to ignore this because that's not placed yet. This is just the I/O ring and bond pads, but 12:57.400 --> 13:03.520 the chip is laid out, so its size is already defined. So the next step is actually building 13:03.520 --> 13:10.920 the core. First OpenROAD will start placing the standard cells based on some estimations, then 13:10.920 --> 13:16.240 it will connect those standard cells, or rather it will build the clock tree, so connecting that 13:16.240 --> 13:20.880 to the standard cells. Next it will do the global and detailed routing between the 13:20.880 --> 13:28.240 standard cells, and finally it's doing a parasitic extraction, so it's getting the RC and 13:28.240 --> 13:35.160 L values of those traces, and then doing the timing and electrical analysis. So if those analyses 13:35.160 --> 13:41.240 meet the requirements, it's done, but normally that's not the case on the first 
run. So 13:41.240 --> 13:46.440 if there are problems with the parasitics or the electrical analysis or the timing, then it's 13:46.440 --> 13:51.400 going back to the placement of the standard cells, and it's trying to optimize things and 13:51.400 --> 13:59.000 doing all this again. So it's doing a lot of iterations until no violations are found, 13:59.000 --> 14:04.600 and finally it's doing the physical verification and then a sign-off. And now you can 14:04.600 --> 14:08.560 see it's actually placing the seal ring, that's on the outside on this image. I don't 14:08.560 --> 14:18.800 know, you might barely see this, that's this seal ring here. So now we have our GDSII file, 14:18.800 --> 14:26.160 but it's not production-ready. So next we need to add filler. The problem is that, I'm 14:26.160 --> 14:32.760 going to go back, you can see on, for example, top metal 2, that's what you see 14:32.760 --> 14:38.520 mostly: there are a lot of bond pads, the I/Os have a lot of metal, there are 14:38.600 --> 14:46.200 some power stripes on top metal 2, but there's nothing else on top metal 2. So the metal 14:46.200 --> 14:53.480 density is pretty much unevenly distributed over the entire chip right now, and during chemical-mechanical 14:53.480 --> 14:59.480 polishing you get a different thickness of those layers. That's a problem because you want 14:59.480 --> 15:05.000 to build the next layers on top, and then you have uneven layers, but you also actually get smaller 15:05.080 --> 15:12.360 electrical traces, which means you have unpredictable electrical characteristics, something 15:12.360 --> 15:20.680 you don't want. And the solution is that you just add dummy metal into the layout. 15:20.680 --> 15:28.360 You can see this again on the top, this is top metal 2, those are really big shapes. And the 15:28.440 --> 15:35.720 idea is, you just calculate the metal density on a tile, for example IHP is using a tile of 15:35.720 --> 15:45.080 about 800 micrometers, 
and you get the metal density. IHP also defines a minimum and maximum 15:45.080 --> 15:52.200 metal density for this layer, and then you try to fill up to the desired metal density, and then you 15:52.200 --> 16:00.360 get these metal shapes just placed into the layout. But you can also see there are some layers that 16:00.360 --> 16:05.080 don't have fill metal here in this image, because they probably have a lot of routing 16:05.080 --> 16:11.240 in the core, so they don't need metal here. But you can also see on the bottom these green and red 16:12.040 --> 16:18.280 overlapping shapes, those are the active and gate poly layers, so the transistor layers; obviously 16:18.280 --> 16:32.120 they need fill too. Okay, the last step is the design rule check. There are a lot of fabs 16:32.120 --> 16:37.800 in this world, and the fabs in this world always have different tools, they have different 16:37.800 --> 16:43.720 experience, they have different, they call them recipes, and they also have different process nodes, 16:43.720 --> 16:51.560 so this is why they always have pretty much unique design rules, and they provide those. 16:51.560 --> 16:55.640 But on the other hand you have tools like OpenROAD that cannot check for all design rules, 16:55.640 --> 17:00.760 because they focus on getting a layout; if they checked for all design rules, this would take 17:00.760 --> 17:09.480 too much time, it's very complex. So the last step of getting a tape-out-ready design is basically 17:09.480 --> 17:18.200 running the design rule checks. For this example, those are the layout rules for top metal 17:18.200 --> 17:23.800 one. I mean, the top metal rules are pretty simple, so it's a good example, but this has two 17:23.800 --> 17:32.840 important rules. The first one is A, it's inside the yellowish shape, 17:33.800 --> 17:43.000 so this defines the minimum top metal one width, and that's 1.64 micrometers. The problem is, if it's 17:43.000 --> 17:51.800 too narrow 
you might get disconnected traces or something else. And the other rule is B, that's the 17:51.800 --> 17:57.560 minimum top metal one space or notch, so that defines the minimum space between two top metal 17:57.560 --> 18:04.440 one shapes, and that's also 1.64 micrometers. The problem is, if they're too close together you maybe get 18:04.440 --> 18:09.080 shorts during production, so you want to make sure that doesn't happen. 18:11.560 --> 18:17.160 Yeah, that sounds simple, but that's probably the most exhausting step, because if you're two days ahead 18:17.160 --> 18:23.320 of the tape-out, which is a hard deadline, and you have design rule violations, you question your life. 18:23.880 --> 18:36.120 All right, so this is not using a Makefile; ElemRV is using Taskfile to implement this flow. 18:36.120 --> 18:42.760 The flow is basically divided into four steps. So you need to do a task prepare, this is 18:42.760 --> 18:50.200 generating the Verilog code from the SpinalHDL, it's also generating the OpenROAD 18:50.440 --> 18:58.520 config files, et cetera, and then it's generating the seal ring macros. Next is just 18:58.520 --> 19:07.240 running task layout, then it's starting OpenROAD to generate the GDSII file. Next is doing the task filler, 19:07.880 --> 19:14.760 now it's adding filler, and then at the end it's just task run DRC, and it probably takes the most 19:14.760 --> 19:28.760 time to do the DRC check. And that's basically it. So the funding, the IHP funding, has been over 19:30.120 --> 19:36.840 since December, so they now provide open silicon MPW shuttles with this open PDK. 19:37.160 --> 19:46.120 It's 1,500 euros per square millimeter, but it's only for open source designs. So I think you can 19:46.120 --> 19:50.360 actually make them with Cadence, that's fine, but you need to open those designs, you need to upload them 19:50.360 --> 19:59.800 to the GitHub repository. They have three tape-out dates this year: now in March, the CMOS, that's only 
19:59.880 --> 20:09.720 digital; in July the CMOS 5L, so this has five metal layers, it has four normal metal layers 20:09.720 --> 20:17.320 and one top metal; and then in October the G2, that's with the high-speed bipolar transistors. 20:19.640 --> 20:24.040 Yeah, and if you're interested you can register, or there's the same link on the bottom 20:24.120 --> 20:32.520 if you don't trust my QR code. Yeah, the minimum requirement is actually one square millimeter, 20:32.520 --> 20:39.640 as far as I know, so you can actually make pretty cheap chips for research, for your hobby projects, 20:39.640 --> 20:47.800 for your company; the only requirement is that it has to be open source. Also, yeah, skip one slide. 20:47.800 --> 20:53.240 All right, so for the outlook, I reserved seven square millimeters on the next tape-out 20:54.280 --> 21:01.800 shuttle. I'd like to do some performance improvements, for example more HyperBus performance, 21:02.440 --> 21:10.680 adding caches to the VexRiscv based on SRAM blocks, and also adding more partitions with different 21:10.760 --> 21:21.480 clocks. And if you're interested in doing high quality images based on your 21:21.480 --> 21:28.680 GDSII files, there is a Blender add-on for importing GDSII files directly 21:28.680 --> 21:37.000 into Blender, and then you can use Blender's Cycles engine to do high quality images based on them, 21:37.000 --> 21:47.560 like this one; it is using ray tracing and all this computer vision stuff. And obviously ElemRV is also 21:47.560 --> 21:57.080 open source, so if you're looking for FPGA-based microcontrollers, or you want to tape out your 21:57.080 --> 22:04.120 own microcontroller, ElemRV is silicon-proven, you can use it, it's open source; 22:04.120 --> 22:17.400 report bugs or issues, request features. Yeah, that's basically it; feel free to connect with 22:17.400 --> 22:21.080 me on several platforms. Thank you. 22:21.720 --> 22:28.520 you 22:42.360 --> 22:44.360 
memory bus 22:51.080 --> 23:04.440 Yeah, so the question was if I did performance tests on the BMB versus AXI4. I did not. 23:04.440 --> 23:13.480 So far, but I was actually able to improve the chip clock, so before I was at almost 23:13.480 --> 23:19.080 like 55 MHz with the IHP, and now it's getting up to 80. 23:19.080 --> 23:22.080 So I get a faster clock now, but I haven't checked. 23:22.080 --> 23:29.560 I assume it was not really bad, because I used the SpinalHDL implementation of AXI4, 23:29.560 --> 23:38.320 which has the shared address bus, so it's just four AXI channels instead of five, so it's 23:38.320 --> 23:46.160 not an official AXI4 implementation, but now I'm done. 23:46.160 --> 23:48.560 There was another question, yes. 23:48.560 --> 24:06.200 When you get DRC violations, what do you do? When you have DRC issues, what should you do? 24:06.200 --> 24:10.120 Sometimes you just have to take a look into OpenROAD, for example, and see if there are 24:10.120 --> 24:16.080 issues in your config; for example, your bond pads could be located too close 24:16.080 --> 24:20.920 to the seal ring, then you just need to increase your chip size. 24:20.920 --> 24:29.200 We already had an OpenROAD bug that actually affected the metal shapes between 24:29.200 --> 24:34.480 vias, and the solution was going into the design and manually fixing all those, just making 24:34.480 --> 24:40.480 them bigger, because there were only, like, 30 violations. 24:40.480 --> 24:46.280 Sometimes you just need to fix this, but mostly it's an issue with your configuration. 24:46.280 --> 24:51.280 You get more issues when you do an analog design, when you draw everything by yourself; 24:51.280 --> 24:55.320 then you get a lot more issues. 24:55.320 --> 25:01.800 And there's one more, and then, yep. 
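[Editor's note] To make the two top metal one rules from the talk concrete (minimum width and minimum space or notch, both 1.64 micrometers), here is a minimal Python sketch of such a check on axis-aligned rectangles. It is only an illustration under stated assumptions: it is not the IHP DRC deck and not an OpenROAD or KLayout API, the function names are hypothetical, coordinates are integers in nanometers, and the gap between two shapes is measured as plain Euclidean distance, which a real rule deck may define differently.

```python
# Editor's sketch, not the IHP rule deck: width and spacing checks for
# axis-aligned rectangles given as (x1, y1, x2, y2) in nanometers.
from itertools import combinations
from math import hypot

TM1_MIN_WIDTH_NM = 1640  # rule A from the talk: 1.64 micrometers
TM1_MIN_SPACE_NM = 1640  # rule B from the talk: 1.64 micrometers


def width_violations(rects):
    """Return the rectangles whose narrower side is below the minimum width."""
    return [r for r in rects
            if min(r[2] - r[0], r[3] - r[1]) < TM1_MIN_WIDTH_NM]


def spacing_violations(rects):
    """Return pairs of separate rectangles that are closer than the minimum space."""
    found = []
    for a, b in combinations(rects, 2):
        # Horizontal and vertical gaps; 0 means the projections overlap.
        dx = max(b[0] - a[2], a[0] - b[2], 0)
        dy = max(b[1] - a[3], a[1] - b[3], 0)
        gap = hypot(dx, dy)
        # gap == 0 means touching or overlapping shapes, treated as merged here.
        if 0 < gap < TM1_MIN_SPACE_NM:
            found.append((a, b))
    return found


shapes = [
    (0, 0, 5000, 2000),      # wide enough
    (6000, 0, 11000, 2000),  # only 1000 nm away from the first shape
    (0, 4000, 1000, 9000),   # only 1000 nm wide
]
print(width_violations(shapes))
print(spacing_violations(shapes))
```

In this toy layout the third shape fails the width rule and the first two shapes fail the spacing rule, which are the two failure modes the talk describes (disconnected traces and shorts).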
25:01.800 --> 25:08.520 My question is that VexRiscv is normally used for FPGAs. When you 25:08.520 --> 25:16.320 port it to an ASIC, normally, when you're actually doing a register file and all, 25:17.320 --> 25:23.320 then when you move it, you start using your own memory cells and, yeah, whatever, 25:23.320 --> 25:30.000 with the HyperBus and all. Apart from that, apart from the 25:30.000 --> 25:34.600 placement, do you do any changes in the RTL design, or do you have to move it? 25:34.600 --> 25:42.200 Okay, the question was if I did any changes to the register file. So far I did not, so right now 25:42.200 --> 25:51.280 it's using just standard cells for the register file, for 25:51.280 --> 25:59.520 example. But I'm working on the caches right now, based on these SRAM blocks, because 25:59.520 --> 26:04.000 otherwise it would be too big here, so I'm trying to do that now. 26:04.080 --> 26:13.760 IHP has no SRAM blocks that are three-port, 32 by, what is this? 32? That's the problem. 26:13.760 --> 26:14.760 It's too small for IHP. 26:14.760 --> 26:18.280 There was one question here. 26:18.280 --> 26:23.280 I mentioned what, sorry? 26:23.280 --> 26:33.360 Oh, the masked AES. I don't know, sorry. The question was if the masked AES is working. I don't 26:33.360 --> 26:38.480 know, because the chip is at IHP and they're doing the silicon interposer right now, the PCB 26:38.480 --> 26:44.000 interposer right now, so I hope I get the chip in four weeks. 26:44.000 --> 26:54.040 I think there was one last question. 26:54.040 --> 26:57.840 Which extension? 26:57.880 --> 27:03.000 The question was if the vector extension is implemented. I don't know for this 27:03.000 --> 27:14.040 VexRiscv; it might be for the new VexiiRiscv. I don't know, I don't have it. 27:14.040 --> 27:19.560 I think there might be someone else here who is more familiar with VexRiscv; I don't know 27:19.560 --> 27:25.000 right now. 
27:25.000 --> 27:27.520 Is there another question? 27:27.520 --> 27:29.760 Do we have time for one question? 27:29.760 --> 27:40.600 All right, yeah. 27:40.600 --> 27:50.200 Which other parts, a PLL? I'd like to have a PLL, but are there mixed-signal parts on this chip? No. 27:50.200 --> 27:57.000 The question was if there are mixed-signal parts in this design. No, because IHP has no 27:57.000 --> 28:00.040 PLL right now; they're working on this. 28:00.040 --> 28:05.600 I think that's another thing: IHP is publicly funded, so everything has 28:05.600 --> 28:11.840 to be funded and it's a little bit slower, but they're working on a PLL right now and 28:11.840 --> 28:15.120 [unclear] and stuff like this, so they're working on analog designs. 28:15.120 --> 28:18.720 They're also working on an ADC right now, so this is all coming now. 28:18.720 --> 28:25.960 This should be available for free as part of this open silicon, so you can pick those from the 28:25.960 --> 28:32.160 analog library, but currently not, unfortunately. 28:32.160 --> 28:37.520 All right, thank you again.