WEBVTT

00:00.000 --> 00:12.880
Hello everyone, my name is Daniel and I'm going to talk about ElemRV and how to make open source

00:12.880 --> 00:16.080
chips.

00:16.080 --> 00:22.280
So ElemRV is a collection of end-to-end open source RISC-V microcontrollers, and

00:22.280 --> 00:28.080
end-to-end in this context means it's open source from the design files to the tape-out-ready GDSII

00:28.080 --> 00:29.080
files.

00:29.080 --> 00:34.560
It's available in multiple platforms for different applications and those are based on element

00:34.560 --> 00:35.560
names.

00:35.560 --> 00:39.560
So it's just a combination of elements and RISC-V.

00:39.560 --> 00:44.760
They are fully written in SpinalHDL and they are licensed under the CERN Open Hardware

00:44.760 --> 00:50.720
Licence W 2.0, and they support an FPGA and an ASIC flow.

00:50.720 --> 00:55.360
The history of ElemRV is tightly coupled to IHP, Innovations for High Performance

00:55.360 --> 00:57.680
Microelectronics.

00:57.680 --> 01:02.240
This is a publicly funded research institute in Germany.

01:02.240 --> 01:06.600
They focus on silicon-germanium electronics.

01:06.600 --> 01:13.920
They operate a small pilot line in Germany, where they produce for research

01:13.920 --> 01:14.920
projects.

01:14.920 --> 01:20.440
For example, they produce high speed transistors, that's what they focus on.

01:20.440 --> 01:23.240
They're located in Frankfurt (Oder), Germany.

01:23.240 --> 01:31.960
They had two key funded projects a couple of years ago: one is a 130 nanometer open source PDK, and

01:31.960 --> 01:40.320
the other is free MPW runs to validate those analog and digital components.

01:40.320 --> 01:46.800
So they had a lot of analog designs but they lacked digital designs for these free MPW

01:46.800 --> 01:49.160
runs.

01:49.160 --> 01:56.640
I already had RISC-V designs for FPGAs and I also wrote a HyperBus interface for those,

01:56.640 --> 02:02.720
so I got asked through connections if I wanted to port those and make a tape-out.

02:02.720 --> 02:07.280
And as you can see on this image, this is the first tape-out of ElemRV.

02:07.280 --> 02:13.160
It's a flat design. The design philosophy is pretty conservative: it's an alpha

02:13.160 --> 02:17.920
PDK, and the goal was just to get bootable silicon.

02:17.920 --> 02:25.400
I received the silicon last year in February. There was one bug in the HyperBus interface:

02:25.400 --> 02:33.120
it was basically unable to run data and instructions at the same time from the external memory.

02:33.120 --> 02:39.240
All other core functions were working, and it was the first booting silicon made with the IHP Open

02:39.240 --> 02:41.600
PDK.

02:41.600 --> 02:49.440
Technically it's the first booting silicon in Europe by the first open source microcontroller.

02:49.440 --> 02:58.320
And the cool thing about this is there's no NDA required, so this is end-to-end open source.

02:58.320 --> 03:05.280
I did a second tape-out in April last year. I fixed the HyperBus interface and now it's

03:05.280 --> 03:12.280
actually running Zephyr RTOS. It got on-chip RAM, that's the two small yellow bars

03:12.280 --> 03:18.640
on the bottom left. It got more I/O and more interfaces to silicon-prove. It got

03:18.640 --> 03:23.840
a pinmux controller and a masked AES, because there's a research team in Germany that's

03:23.840 --> 03:27.840
trying to break the masked AES implementation.

03:27.840 --> 03:34.360
All right, so let's talk about the architecture. This is Nitrogen, the more powerful

03:34.360 --> 03:43.240
platform. It's using a VexRiscv CPU, then it's going through a BMB interconnect. This

03:43.240 --> 03:49.960
has an SPI XIP controller, so it's communicating with external SPI flash. It has the on-chip

03:49.960 --> 03:56.800
RAM and HyperBus for external memory, and all peripherals are connected through a Wishbone

03:56.800 --> 04:02.040
bus and then go through the pinmux to the I/O pads.

04:03.000 --> 04:09.520
All right, so the VexRiscv. I think that's a pretty famous or familiar CPU now. It's

04:09.520 --> 04:16.960
a 32-bit RISC-V CPU designed with five stages. It's fully written in SpinalHDL;

04:16.960 --> 04:25.080
it's actually written by Charles Papon, the maintainer and inventor of SpinalHDL.

04:25.080 --> 04:29.600
It's a plugin-based architecture, so you can define your CPU with a list of configurations

04:29.600 --> 04:35.840
and plugins, and in this architecture it has the I extension, the base integer extension,

04:35.840 --> 04:43.080
which is mandatory. It has the hardware multiply and divide, the M extension, the C extension

04:43.080 --> 04:48.760
for compressed instructions, so 16-bit instructions, and the control and status registers.

04:48.760 --> 04:57.160
All right, so Wishbone. That's actually the biggest change now

04:57.160 --> 05:02.240
from the first two silicon revisions. The HyperBus issue was actually an issue in the

05:02.240 --> 05:13.440
AXI4 port, and AXI4 is a very complicated bus interface, maybe too complicated

05:13.440 --> 05:16.440
for small microcontrollers.

05:16.440 --> 05:20.840
The specification is also very hard to implement and also to read. There are a lot of people

05:20.840 --> 05:25.520
that actually have questions or problems with that, and the biggest problem is it's an open

05:25.520 --> 05:32.720
specification, but it's still owned by Arm Limited, so that's a risk. The replacement

05:32.720 --> 05:41.680
for AXI4 is now the Banana Memory Bus, written by Charles Papon too. That's a simple command

05:41.680 --> 05:47.720
and response channel architecture. AXI4, for example, has five channels; this has only

05:47.720 --> 05:52.520
two, and it's completely replacing the AXI4 crossbar.

05:52.520 --> 05:58.880
The problem is BMB, like AXI4, is a point-to-point bus, and there was also a requirement

05:58.880 --> 06:07.400
for a new bus to replace APB3. Wishbone actually supports shared subordinates, or shared slaves,

06:07.400 --> 06:17.760
so Wishbone is the replacement for APB3. This is an overview of the bus architecture.

06:17.760 --> 06:21.720
On the left side, you can see the instruction and data ports. They're going through

06:21.720 --> 06:27.840
a decoder. The HyperBus and the on-chip RAM have an arbiter, because both the

06:27.840 --> 06:33.960
instruction and data ports have access to those. Only the data port has access to the Wishbone

06:33.960 --> 06:43.480
bus, so you cannot read instructions from peripherals, and the data port has no access to

06:43.480 --> 06:51.520
the external SPI flash, because this is read-only right now. Speaking of SPI flash:

06:51.520 --> 06:57.280
there's an SPI XIP controller, as it's called. That's for reading from an

06:57.280 --> 07:05.160
external SPI flash. The problem is they have a communication protocol that

07:05.160 --> 07:11.440
needs to be implemented. Typically that's a command, then you need to send an address,

07:11.440 --> 07:16.880
you might need to include dummy cycles, and then you can read from or write to an SPI

07:16.880 --> 07:23.800
flash. In this case this is a dual-mode controller: it includes the normal SPI controller

07:23.800 --> 07:29.560
to handle the SPI transaction, and then on top is the SPI XIP controller that implements

07:29.560 --> 07:37.760
the SPI flash protocol. This time it also has a runtime-switchable I/O protocol. Normally

07:37.760 --> 07:44.120
SPI flashes boot in a single-bit mode, the standard mode. This controller can

07:44.120 --> 07:53.600
switch to dual or quad mode, so you can increase the bandwidth of the SPI flash.
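
NOTE
A rough sketch of why switching from single to dual or quad mode raises flash read bandwidth. The frame layout (8-bit command, 24-bit address, 8 dummy cycles) and the assumption that only the address and data phases use the wider bus are illustrative and not taken from the talk.
```python
def read_cycles(data_bytes: int, lanes: int, cmd_bits: int = 8,
                addr_bits: int = 24, dummy_cycles: int = 8) -> int:
    """Clock cycles for one flash read when address and data use `lanes` I/O lines."""
    if lanes not in (1, 2, 4):
        raise ValueError("lanes must be 1, 2 or 4")
    # Assumption: the command byte is always sent on a single line.
    cmd = cmd_bits
    # Ceiling division: bits are spread across the available lanes.
    addr = -(-addr_bits // lanes)
    data = -(-(data_bytes * 8) // lanes)
    return cmd + addr + dummy_cycles + data
for lanes in (1, 2, 4):
    print(lanes, read_cycles(32, lanes))
```
With these assumed numbers, a 32-byte read takes 296 clock cycles in single mode, 156 in dual mode and 86 in quad mode.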

07:53.600 --> 08:01.440
And then there's also the HyperBus interface, as mentioned. The problem is that

08:01.440 --> 08:09.680
IHP is 130 nanometer, so SRAM is really expensive in 130 nanometer. External

08:09.680 --> 08:17.760
SRAM devices are very small and very expensive. DRAMs, on the other side, are big and

08:17.760 --> 08:26.760
cheap, but they have a quite complex interface. HyperBus is an Octal-SPI-like interface,

08:26.760 --> 08:33.000
so it has a very low I/O count: there are only 11 mandatory pins, plus as many chip

08:33.000 --> 08:41.040
selects as wanted. You can connect HyperRAM and HyperFlash devices, so volatile RAM

08:41.040 --> 08:48.320
and non-volatile flash devices, on the same bus, and the throughput is up to 800 megabytes

08:48.320 --> 08:56.320
per second. I don't know if you can see this on this chip; this is a 64-I/O

08:56.560 --> 09:06.600
chip. On the top, there's a small red rectangle; those are the HyperBus interface I/Os.

09:06.600 --> 09:12.640
As you can see, one bank is already gone for HyperBus, it's memory only, and on the right

09:12.640 --> 09:21.000
side, the blue, this is only SPI. So almost 50% of all pins are gone for external memory and

09:21.000 --> 09:38.120
flash. All right, so that's about the architecture. Let's talk about the ASIC flow. This

09:38.120 --> 09:47.480
is a very, very simplified overview. You need a seal ring and you also need your design

09:47.480 --> 09:54.200
files, your RTL. Then the open flow is using OpenROAD, so we're going to convert those

09:54.200 --> 10:00.640
into a GDSII file. Then we need to add dummy metal fill, and then at the end we're going

10:00.640 --> 10:07.040
to do the design rule check. All right, so EDA tools normally only understand Verilog

10:07.040 --> 10:15.360
or VHDL; normally, commercial and open source EDA tools only understand Verilog. For

10:15.360 --> 10:22.000
example, OpenROAD is using Yosys. ElemRV is written in SpinalHDL, so you cannot feed your

10:22.000 --> 10:30.080
SpinalHDL files directly into the EDA tools. But SpinalHDL is a Scala-based framework to

10:30.080 --> 10:35.920
actually generate Verilog or VHDL code, and the advantage is that it's a high-level

10:35.920 --> 10:41.200
language, so you can actually use modern programming constructs, but you can also use the

10:41.200 --> 10:45.960
entire Scala or Java infrastructure in the background. For example, you can use file

10:45.960 --> 10:54.440
I/O to actually read data from files. All right, so next we need a seal ring, or sometimes

10:54.440 --> 11:01.320
it's called a guard ring. That is basically just a ring around the entire chip on all

11:01.320 --> 11:08.120
metal layers, and it's there to seal off or guard the chip itself. It has multiple purposes

11:08.120 --> 11:15.360
during the lifetime of a chip. First, during manufacturing,

11:15.360 --> 11:22.400
it's blocking ionic contamination. Then it's reducing the

11:22.400 --> 11:30.240
mechanical stress when separating dies from the wafer. It's also stopping cracks from going into

11:30.240 --> 11:37.600
the core of the chip from the edges, and during its lifetime it's preventing moisture from getting

11:37.600 --> 11:45.360
into the chip from the outside, from these edges. As you can see on this image, this

11:45.360 --> 11:52.680
is a cut through the IHP seal ring. On the bottom, the green is the active layer, then

11:52.680 --> 11:58.400
you have five smaller metal layers and two big top metal layers, and in between are just

11:58.400 --> 12:08.920
the vias. All right, so the next step is generating the GDSII file. We have our seal ring,

12:08.920 --> 12:13.920
we have our RTL. You also need some config files for OpenROAD, but I'm going to ignore

12:13.920 --> 12:21.360
this for now. Anyhow, the first step is to convert the RTL into a netlist. This is done using

12:21.360 --> 12:26.720
Yosys, for example with OpenROAD. Next is setting up the floorplan and

12:26.720 --> 12:32.480
doing the power grid. Again, on this image you can see these squares on the outside;

12:32.480 --> 12:40.120
those are the bond pads. Then you have the I/O ring next to them; this is for the I/O

12:40.120 --> 12:45.080
voltage rail and the core voltage rail. Then next to them, on the inside, this

12:45.080 --> 12:51.280
is the power and ground ring, and on the inside this is the core area, but we're going

12:51.280 --> 12:57.400
to ignore this because that's not placed yet. This is just the I/O ring and bond pads, but

12:57.400 --> 13:03.520
the chip is laid out, so the size is already defined. The next step is actually building

13:03.520 --> 13:10.920
the core. First, OpenROAD will start placing the standard cells based on some estimations. Then

13:10.920 --> 13:16.240
it will connect those standard cells, that is, it will build the clock tree, connecting it

13:16.240 --> 13:20.880
to the standard cells. Next, it will do the global and detailed routing between the

13:20.880 --> 13:28.240
standard cells, and finally it's doing a parasitic extraction, so it's getting the R, C and

13:28.240 --> 13:35.160
L values of those traces, and then doing the timing analysis. If those analyses

13:35.160 --> 13:41.240
meet the requirements, it's done, but normally that's not the case on the first run. So,

13:41.240 --> 13:46.440
if there are problems with the parasitics, the electrical side or the timing, then it's

13:46.440 --> 13:51.400
going back to the placement of standard cells, and it's trying to optimize things and

13:51.400 --> 13:59.000
doing all this again. So it's doing a lot of iterations until no violations are found,

13:59.000 --> 14:04.600
and finally it's doing the physical verification and then the sign-off. And now you can

14:04.600 --> 14:08.560
see it's actually placing the seal ring; that's on the outside of this image. I don't

14:08.560 --> 14:18.800
know, you might barely see this; that's the seal ring here. So now we have our GDSII file,

14:18.800 --> 14:26.160
but it's not production-ready, so next we need to add... The problem is, I'm

14:26.160 --> 14:32.760
going to go back, so you can see, for example, top metal 2; that's what you see

14:32.760 --> 14:38.520
mostly. There are a lot of bond pads; the I/Os, they have a lot of metal; there are

14:38.600 --> 14:46.200
some power stripes on top metal 2, but there's nothing else on top metal 2. So the metal

14:46.200 --> 14:53.480
density is pretty unevenly distributed over the entire chip right now, and during chemical-mechanical

14:53.480 --> 14:59.480
polishing you get a different thickness of those layers. That's a problem because you want

14:59.480 --> 15:05.000
to build the next layers on top, and then you have uneven layers. But you actually also get smaller

15:05.080 --> 15:12.360
electrical traces, which means you have unpredictable electrical characteristics, something

15:12.360 --> 15:20.680
you don't want. The solution is that you just add dummy metal into the layout.

15:20.680 --> 15:28.360
You can see this again on the top; this is top metal 2, those are really big shapes. The

15:28.440 --> 15:35.720
idea is, you just calculate the metal density on a tile; for example, IHP is using a tile of

15:35.720 --> 15:45.080
about 800 micrometers, and you get the metal density. IHP also defines a minimum and maximum

15:45.080 --> 15:52.200
metal density for this layer. Then you try to fill to the desired metal density, and then you

15:52.200 --> 16:00.360
just get these metal shapes put into the layout. But you can also see there are some layers that
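
NOTE
A minimal sketch of the tile-based density calculation described here. The 800 micrometer tile edge follows the talk; the minimum and maximum density limits are hypothetical placeholders, not values from the PDK, and the function names are made up for illustration.
```python
TILE_UM = 800.0  # tile edge length mentioned in the talk
D_MIN = 0.25     # hypothetical minimum density, placeholder only
D_MAX = 0.75     # hypothetical maximum density, placeholder only
def tile_density(metal_area_um2: float, tile_um: float = TILE_UM) -> float:
    """Fraction of a square tile that is covered by metal."""
    return metal_area_um2 / (tile_um * tile_um)
def fill_needed(metal_area_um2: float, target: float, tile_um: float = TILE_UM) -> float:
    """Extra metal area in square micrometers needed to reach the target density."""
    missing = target * tile_um * tile_um - metal_area_um2
    return max(0.0, missing)
def check_tile(density: float, d_min: float, d_max: float) -> str:
    """Classify a tile against the allowed density window."""
    if density < d_min:
        return "add fill"
    if density > d_max:
        return "too dense"
    return "ok"
density = tile_density(64000.0)
print(density, check_tile(density, D_MIN, D_MAX), fill_needed(64000.0, 0.35))
```
A real fill step would also place the dummy shapes so that they keep their distance from existing geometry; this sketch only covers the density arithmetic.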

16:00.360 --> 16:05.080
don't have metal here in this image because they probably have a lot of routing architecture

16:05.080 --> 16:11.240
in the core, so they don't need metal here. But you can also see on the bottom these green and red

16:12.040 --> 16:18.280
overlapping shapes; those are the active and gate poly layers, so the transistor layers. Obviously,

16:18.280 --> 16:32.120
you need fill there too. OK, the last step is the design rule check. There are a lot of fabs

16:32.120 --> 16:37.800
in this world, and they always have different tools, they have different

16:37.800 --> 16:43.720
experience, they have different, as they call them, recipes. They also have different process nodes,

16:43.720 --> 16:51.560
so this is why they always have pretty much unique design rules, and they provide these.

16:51.560 --> 16:55.640
But on the other hand you have tools like OpenROAD; they cannot check all design rules,

16:55.640 --> 17:00.760
because they focus on getting a layout. If they were to check all design rules, this would take

17:00.760 --> 17:09.480
too much time; it's very complex. So the last step of making a tape-out-ready design is basically

17:09.480 --> 17:18.200
running design rule checks. As an example, these are the layout rules for top metal

17:18.200 --> 17:23.800
one. I mean, the top metal rules are pretty simple, so it's a good example, but this has two

17:23.800 --> 17:32.840
important rules. The first one is A; it's inside the highest yellowish shape,

17:33.800 --> 17:43.000
so this defines the minimum top metal one width, and that's 1.64 micrometers. The problem is, if it's

17:43.000 --> 17:51.800
too narrow, you might get disconnected traces or something else. The other rule is B; that's the

17:51.800 --> 17:57.560
minimum top metal one space or notch, so that defines the minimum space between two top metal

17:57.560 --> 18:04.440
one shapes, and that's also 1.64 micrometers. The problem is, if they're too close together, you maybe get

18:04.440 --> 18:09.080
shorts during production, so you want to make sure that doesn't happen.
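
NOTE
A simplified sketch of the two rules described here, for axis-aligned rectangles only. The 1.64 micrometer limits come from the talk; the rectangle model, the function names and the treatment of touching shapes as merged are assumptions, and a real DRC deck also handles notches and arbitrary polygons.
```python
from itertools import combinations
MIN_WIDTH_UM = 1.64  # rule A from the talk: minimum width
MIN_SPACE_UM = 1.64  # rule B from the talk: minimum space
Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2) in micrometers
def width(r: Rect) -> float:
    """Smaller side of the rectangle."""
    return min(r[2] - r[0], r[3] - r[1])
def spacing(a: Rect, b: Rect) -> float:
    """Euclidean gap between two rectangles, 0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5
def drc(shapes: list[Rect]) -> list[str]:
    """Return width violations first, then spacing violations."""
    errors = []
    for i, r in enumerate(shapes):
        if width(r) < MIN_WIDTH_UM:
            errors.append(f"width: shape {i}")
    for (i, a), (j, b) in combinations(enumerate(shapes), 2):
        gap = spacing(a, b)
        # Assumption: a gap of 0 means the shapes are merged, not too close.
        if 0.0 < gap < MIN_SPACE_UM:
            errors.append(f"space: shapes {i} and {j}")
    return errors
shapes = [(0.0, 0.0, 10.0, 2.0), (0.0, 3.0, 10.0, 5.0), (0.0, 10.0, 10.0, 11.0)]
print(drc(shapes))
```
In this example the third shape is only 1 micrometer wide and the first two shapes are only 1 micrometer apart, so both rules are violated once.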

18:11.560 --> 18:17.160
Yeah, that sounds simple, but that's probably the most exhausting step, because if you're two days ahead

18:17.160 --> 18:23.320
of the tape-out, which is a hard deadline, and you have design rule violations, you question your life.

18:23.880 --> 18:36.120
All right, so this is not using a Makefile; ElemRV is using Taskfile to implement this flow.

18:36.120 --> 18:42.760
The flow is basically divided into four steps. First you need to do a task prepare; this is generating

18:42.760 --> 18:50.200
the Verilog code from the SpinalHDL, it's also generating the OpenROAD

18:50.440 --> 18:58.520
config files, et cetera, and then it's generating the seal ring macros. Next is just

18:58.520 --> 19:07.240
run task layout; then it's starting OpenROAD to generate the GDSII file. Next is doing the task filler;

19:07.880 --> 19:14.760
now it's adding filler. And then at the end it's just task run DRC, and it probably takes the most

19:14.760 --> 19:28.760
time to do the DRC check. And that's basically it. So the funding, the IHP funding, is over

19:30.120 --> 19:36.840
since December, so they now provide open silicon MPW shuttles with this open PDK.

19:37.160 --> 19:46.120
It's 1,500 euros per square millimeter, but it's only for open source designs. I think you can

19:46.120 --> 19:50.360
actually make them with Cadence, that's fine, but you need to open those designs; you need to upload them

19:50.360 --> 19:59.800
to the GitHub repository. They have three tape-out dates this year: now in March the CMOS, that's only

19:59.880 --> 20:09.720
digital; in July the CMOS 5L, so this has five metal layers, it has four normal metal layers

20:09.720 --> 20:17.320
and one top metal; and then in October the G2, that's with the high-speed bipolar transistors.

20:19.640 --> 20:24.040
Yeah, and if you're interested you can register; this is the same link on the bottom,

20:24.120 --> 20:32.520
if you don't trust my QR code. Yeah, the minimum requirement is actually one square millimeter,

20:32.520 --> 20:39.640
as far as I know, so you can actually make pretty cheap chips for research, for your hobby projects,

20:39.640 --> 20:47.800
for your company. The only requirement is that it has to be open source. Also, yeah, skip one slide.

20:47.800 --> 20:53.240
All right, so for ElemRV, I reserved seven square millimeters on the next tape-out

20:54.280 --> 21:01.800
shuttle. I'd like to do some performance improvements, for example more HyperBus performance,

21:02.440 --> 21:10.680
adding caches to the VexRiscv based on SRAM blocks, and also adding more partitions with different

21:10.760 --> 21:21.480
clocks. And if you're interested in making high-quality images based on your

21:21.480 --> 21:28.680
GDSII files: BlenderGDS is a Blender add-on for importing GDSII files directly

21:28.680 --> 21:37.000
into Blender, and then you can use Blender's Cycles engine to make high-quality images;

21:37.000 --> 21:47.560
this is using ray tracing and all this computer vision stuff. And obviously ElemRV is also

21:47.560 --> 21:57.080
open source, so if you're looking for FPGA-based microcontrollers or you want to tape out your

21:57.080 --> 22:04.120
own microcontroller, ElemRV is silicon-proven. You can use this open source

22:04.120 --> 22:17.400
silicon. Report bugs or issues, request features. Yeah, that's basically it. Feel free to connect

22:17.400 --> 22:21.080
with me on several platforms. Thank you.

22:42.360 --> 22:44.360
memory bus

22:51.080 --> 23:04.440
Yeah, so the question was if I did performance tests on BMB versus AXI4. I did not,

23:04.440 --> 23:13.480
so far, but I was actually able to improve the chip clock. Before, I was at almost

23:13.480 --> 23:19.080
like 55 MHz with the IHP, and now it's getting up to 80.

23:19.080 --> 23:22.080
So I get faster clock now, but I haven't checked.

23:22.080 --> 23:29.560
I assume it's not really bad, because I used the SpinalHDL implementation of AXI4,

23:29.560 --> 23:38.320
which has the shared address bus, so it's just four AXI channels instead of five, so it's

23:38.320 --> 23:46.160
not an official AXI4 implementation, but now I'm done.

23:46.160 --> 23:48.560
There was another question, yeah.

23:48.560 --> 24:06.200
When you get DRC violations, what do you do? When you have DRC issues, what should you do?

24:06.200 --> 24:10.120
Sometimes you just have to take a look into OpenROAD, for example, and see if there are

24:10.120 --> 24:16.080
issues in your config. For example, your bond pads could be located too close

24:16.080 --> 24:20.920
to the seal ring; then you just need to increase your chip size.

24:20.920 --> 24:29.200
We already had an OpenROAD bug that actually made two big metal sheets between

24:29.200 --> 24:34.480
vias, and the solution was going into the design and manually fixing all those, just making

24:34.480 --> 24:40.480
them bigger, because there were only, like, 30 violations.

24:40.480 --> 24:46.280
Sometimes you just need to fix this, but mostly it's an issue with your configuration.

24:46.280 --> 24:51.280
You get more issues when you do an analog design, when you draw everything by yourself;

24:51.280 --> 24:55.320
then you get a lot more issues.

24:55.320 --> 25:01.800
And there's one more, and then, yep.

25:01.800 --> 25:08.520
My question is that VexRiscv is normally used for FPGAs. When you

25:08.520 --> 25:16.320
ported it to an ASIC, normally when you're actually doing a VexRiscv and all,

25:17.320 --> 25:23.320
then when you move it, you start using your own memory cells and, yeah, whatever.

25:23.320 --> 25:30.000
With the HyperBus and all, apart from that, apart from the memories and the

25:30.000 --> 25:34.600
placement, did you make any changes in the RTL design, or did you just move it?

25:34.600 --> 25:42.200
Okay, the question was if I made any changes to the VexRiscv. So far I did not, so right now

25:42.200 --> 25:51.280
it's using just standard cells for the register file, for

25:51.280 --> 25:59.520
example, but I'm working on the caches right now, based on these SRAM blocks, because

25:59.520 --> 26:04.000
otherwise it would be too big, so I'm trying to do that now.

26:04.080 --> 26:13.760
IHP has no SRAM blocks that are 3 ports, 32 by, what is this? 32? That's the problem.

26:13.760 --> 26:14.760
It's too small for IHP.

26:14.760 --> 26:18.280
There was one question here.

26:18.280 --> 26:23.280
I mentioned what, sorry?

26:23.280 --> 26:33.360
Oh, the masked AES? I don't know. Sorry, the question was if the masked AES is working. I don't

26:33.360 --> 26:38.480
know, because the chip is at IHP and they're doing the silicon and POSAR right now, the PCB

26:38.480 --> 26:44.000
and POSAR right now, so I hope I get the chip in four weeks.

26:44.000 --> 26:54.040
I think there was one last question.

26:54.040 --> 26:57.840
Which extension?

26:57.880 --> 27:03.000
The question was if the vector extension is implemented. I don't know, for this

27:03.000 --> 27:14.040
VexRiscv; it might be for the new VexiiRiscv. I don't know, I don't have it.

27:14.040 --> 27:19.560
I think there might be someone else here who's more familiar with VexRiscv. I don't know

27:19.560 --> 27:25.000
right now.

27:25.000 --> 27:27.520
Is there another question?

27:27.520 --> 27:29.760
Do you have time for one question?

27:29.760 --> 27:40.600
All right, yeah.

27:40.600 --> 27:50.200
Which other parts? A PLL... I got the PLL, but are there mixed-signal parts on this chip? No.

27:50.200 --> 27:57.000
The question was if there are mixed-signal parts on this design. No, because IHP has no

27:57.000 --> 28:00.040
PLL right now, they're working on this.

28:00.040 --> 28:05.600
I think that's the other thing: IHP is publicly funded, so everything has

28:05.600 --> 28:11.840
to be funded and it's a little bit slower, but they're working on a PLL right now and

28:11.840 --> 28:15.120
POSAR and stuff like this, so they're working on analog designs.

28:15.120 --> 28:18.720
They're working on an ADC also right now, so this is all coming now.

28:18.720 --> 28:25.960
This should be free as part of this open silicon, so you can pick those from the

28:25.960 --> 28:32.160
analog library, but currently not, unfortunately.

28:32.160 --> 28:37.520
All right, thank you again.