WEBVTT 00:00.000 --> 00:14.160 So, hello everyone, my name is Haralambos, but you can just call me Babis, and today we'll 00:14.160 --> 00:18.360 talk a bit about how we can run BSD applications as Linux containers. 00:18.360 --> 00:24.560 A bit about us: we are a small company, we are working on various research and commercial 00:24.560 --> 00:28.840 projects, and we are actively contributing to various open-source projects. We are the 00:28.840 --> 00:35.480 core maintainers of urunc, which is a CNCF sandbox project, and we focus 00:35.480 --> 00:42.000 on what we call systems software: hypervisors, container runtimes, lightweight 00:42.000 --> 00:51.600 virtualization, and in general hardware acceleration, and also IoT devices on the cloud. 00:51.600 --> 01:00.400 So, I know this is a controversial topic, so I want you to be prepared: we will talk 01:00.400 --> 01:02.640 about Linux and Kubernetes. 01:02.640 --> 01:12.360 No, I'm kidding, we will not only talk about that. 01:12.360 --> 01:19.840 Okay, let me try again, no, it's okay, I'll try to do it like I have it here, maybe this 01:19.840 --> 01:28.600 slide... one second. 01:28.600 --> 01:38.520 Okay, so, we'll talk about BSD. And, of course, we are in the BSD room, 01:38.520 --> 01:47.600 I don't have to convince you that BSD is an operating system that is worth looking at; there 01:47.600 --> 01:57.440 are specific use cases where BSD is widely used, various distributions of BSD, but the 01:57.440 --> 02:05.720 thing is that what we want to look at in this talk is how we can bring BSD into this picture. 02:05.720 --> 02:13.000 So how can we run BSD applications and manage them through Kubernetes and over Linux? Because, 02:13.080 --> 02:18.280 whether we like it or not, this is the stack that currently dominates the cloud and most 02:18.280 --> 02:21.800 of the servers that are out there. 02:21.800 --> 02:30.680 So, currently we have various ways; it's still possible to run BSD on the cloud. 02:30.680 --> 02:35.440 Of course, we have the option of a dedicated server, where we just have some kind 02:35.440 --> 02:41.520 of server that's just dedicated to BSD applications, and then we have to do various networking 02:41.520 --> 02:47.160 things to try to connect it with the rest of the cluster, for example a Kubernetes cluster. 02:47.160 --> 02:51.760 And of course, we also have the option of a virtual machine: with KubeVirt, we can 02:51.760 --> 02:56.360 use this virtual machine through Kubernetes, and we can manage it like that, so we can 02:56.360 --> 03:02.040 again have BSD running in Kubernetes over Linux. 03:02.040 --> 03:07.040 But with the work from Emile, and the work from other people from FreeBSD, who have managed 03:07.040 --> 03:13.280 to make NetBSD and FreeBSD run as a microVM and boot that fast, there is a new thing 03:13.280 --> 03:14.280 that we can do. 03:14.280 --> 03:17.280 We can run BSD as a microservice. 03:17.280 --> 03:22.040 So the idea here is that we still have the same stack: we have Kubernetes, we have 03:22.040 --> 03:27.920 Linux, and then we can have BSD running the application that we want as a microservice. 03:27.920 --> 03:34.600 And Emile showed that it's so small and boots so fast, so we will 03:34.600 --> 03:37.680 also see that this is actually possible.
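For reference, the KubeVirt option mentioned above looks roughly like the sketch below: a minimal VirtualMachine object whose root disk comes from a container image. This is only a sketch; it assumes KubeVirt is already installed in the cluster, and the registry and image name are placeholders.

# Sketch: managing a BSD VM through Kubernetes with KubeVirt.
# The containerDisk image below is a placeholder for a bootable
# BSD disk image packaged as a container image.
cat <<'EOF' | kubectl apply -f -
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: bsd-vm
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 1Gi
      volumes:
      - name: rootdisk
        containerDisk:
          image: registry.example.com/freebsd-containerdisk:latest
EOF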
03:37.680 --> 03:46.180 So the thing here is that what we want to do... we, of course, have the option of 03:46.180 --> 03:52.120 a full virtual machine, but sometimes we just have a few small applications that we just 03:52.120 --> 03:57.280 want to execute, and we want to manage them as an application, like a container, 03:57.280 --> 03:59.160 for example. 03:59.160 --> 04:04.080 So in this kind of scenario, a full virtual machine doesn't really get us there, because 04:04.080 --> 04:09.160 the full virtual machine will need to have a full BSD operating system running 04:09.160 --> 04:14.040 there, and as a result we might have a lot of unnecessary services running at the same 04:14.040 --> 04:24.400 time as our application, stealing resources and, yeah, also increasing the maintenance cost. 04:24.400 --> 04:29.600 But what we do in urunc, and with other similar projects, is that we have the concept 04:29.600 --> 04:31.840 of a single-application kernel. 04:31.840 --> 04:36.040 So a single-application kernel is just a kernel that is configured for a single, specific 04:36.040 --> 04:38.560 purpose. 04:38.560 --> 04:44.280 It is supposed to run only a single service, and this is how the rootfs is built too: 04:44.280 --> 04:46.840 just for one application. 04:46.840 --> 04:52.440 And as you can understand, this has lower resource usage and less noise from other services 04:52.440 --> 04:55.200 that don't exist anymore. 04:55.200 --> 05:00.920 And we need to maintain only what we actually require for this 05:00.920 --> 05:04.800 service. 05:04.800 --> 05:07.200 But there is also the case of unikernels. 05:07.200 --> 05:14.160 So rumprun, based on NetBSD, is one of the first unikernel frameworks 05:14.160 --> 05:19.600 that existed, and Antti Kantee did great work on that. 05:19.600 --> 05:25.920 So the concept here is that we have a specialized kernel, a specialized library 05:25.920 --> 05:31.280 operating system, which we can link directly against our application, and therefore 05:31.280 --> 05:33.960 we have everything in one binary. 05:33.960 --> 05:36.000 This is extremely small and extremely fast. 05:36.000 --> 05:40.000 We don't have any separation between user space and kernel space. 05:40.000 --> 05:48.360 So everything is just a library call, we can boot very fast, and as you can understand, 05:48.360 --> 05:55.480 it is a big improvement in regards to performance and resource consumption. 05:55.480 --> 06:00.120 But we have to admit that these are not the easiest frameworks to work with. 06:00.120 --> 06:05.080 Unikernels require a lot of time to port applications to them, and the benefit of 06:05.080 --> 06:10.080 specialization comes with the cost of porting applications to them. 06:10.080 --> 06:18.440 So this idea sounds good, but the problem is: how can we build these things so we can use them 06:18.440 --> 06:26.120 in Kubernetes, and how can we deploy them later on Kubernetes? 06:26.120 --> 06:31.720 So the concept is pretty simple: we'll just build them like containers. 06:31.720 --> 06:38.240 And what we mean by build them like containers is that we will have to pack them as an OCI image. 06:39.200 --> 06:48.720 So for that purpose we have implemented bunny. Bunny is a new framework which is based 06:48.720 --> 06:55.360 on BuildKit from Docker, and it allows us to build any kind of kernel, any kind of 06:55.360 --> 07:00.560 library operating system, in the same way that we do with containers.
07:00.560 --> 07:04.880 So basically we have a container-like experience: we can do a docker build, and this will build 07:04.880 --> 07:08.000 the kernel for us. 07:08.800 --> 07:14.160 The purpose of bunny is to provide, let's say, a uniform experience for all kinds of 07:14.160 --> 07:19.760 library operating systems that exist out there, but even for generic kernels, and to try to 07:19.760 --> 07:23.520 tame this kind of dependency hell that we have, for example, with different frameworks, 07:23.520 --> 07:29.520 different kinds of operating systems, and the tools that we need for everything; it tries to make 07:29.520 --> 07:32.800 that cost smaller. 07:33.280 --> 07:41.680 Now, how does it work? So everything is declared as YAML, and bunny reads this YAML 07:41.680 --> 07:46.320 and, based on the framework it has to build, whether it is a Linux kernel, a BSD kernel, 07:46.320 --> 07:52.880 rumprun, or MirageOS, whatever it is, it fetches the build layers; 07:52.880 --> 07:58.480 these can be the tools that will be needed to build this kernel, or any other kind of 07:59.440 --> 08:07.760 utilities that are required. It will build the kernel like any other kind of system, and later 08:07.760 --> 08:14.160 it will package everything as an OCI image. So the packaging takes place depending on whether we have 08:14.160 --> 08:20.560 a unikernel or a single-application kernel. So the green things are what we have in the case of a 08:20.560 --> 08:26.720 unikernel, because in the unikernel the kernel, the libraries, and the application 08:26.720 --> 08:32.000 are packed all at once, squashed together as a single binary, and then of course 08:32.000 --> 08:36.080 we need the configuration files for our application. Now, in the case of single-application 08:36.080 --> 08:40.880 kernels we have different kinds of layers, because we also have the dependencies of the application 08:40.880 --> 08:45.440 and the kernel that we need to execute. So all these things together are packed 08:45.440 --> 08:54.160 in the OCI image. So let's see how this works. So we have here, for example... 08:54.240 --> 09:03.920 so I'm here, and I can go here into smolBSD. So here, for example, I have already 09:04.480 --> 09:16.160 got the kernel from smolBSD, and I also have the base image, again from smolBSD. 09:16.160 --> 09:23.760 So this is the image and the block file, and we have two ways 09:23.760 --> 09:30.080 that we can declare how we want to package this thing. The first way, I hope it's visible now. 09:31.920 --> 09:37.920 The first way is just using a traditional Dockerfile, where we just copy the stuff inside 09:37.920 --> 09:41.520 and we just have to specify some annotations for the runtime, which I will explain later. 09:42.560 --> 09:48.000 And the other way is with the bunnyfile that I explained before. So we have here, for example, a 09:48.080 --> 09:53.520 YAML file, and we declare what we want to include inside the rootfs. So here we declare that we 09:53.520 --> 10:00.400 have a kernel that exists locally, and it's a NetBSD one; here is the path of the kernel. We declare 10:00.400 --> 10:07.600 the framework that we want to use, which is NetBSD. It will run on top of QEMU, and we have this 10:07.600 --> 10:11.680 rootfs that we will build from scratch. It will be raw, which means that we will just copy 10:11.680 --> 10:20.240 everything into the OCI image.
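To make that concrete, here is a sketch of a bunnyfile along the lines just described: a local NetBSD kernel, NetBSD as the framework, QEMU as the monitor, and a raw rootfs built from scratch. The field names are approximations reconstructed from this description, the #syntax frontend reference and all paths are placeholders, so check the bunny documentation for the exact schema.

# Sketch of the smolBSD bunnyfile described above. Field names are
# approximations; paths and the frontend image are placeholders.
cat > bunnyfile <<'EOF'
#syntax=bunny/frontend:latest
kernel:
  from: local
  path: ./netbsd-kernel
platforms:
  framework: netbsd
  monitor: qemu
rootfs:
  from: scratch
  type: raw
  include:
  - ./rootfs-block.img
EOF

# bunny plugs into BuildKit, so building is just a docker build:
docker build -f bunnyfile -t registry.example.com/smolbsd-app:test .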
And, yeah, I have already mounted here the block file that I 10:20.240 --> 10:28.160 mentioned before. So, yeah, let me try to build this, for example. So what I do is just a docker build. 10:29.040 --> 10:33.440 And this will fetch everything that is required to create this kind of OCI image for us. 10:34.960 --> 10:39.840 Now, it would take a lot of time, but okay, a lot of stuff was cached, so we didn't actually build anything. 10:40.800 --> 10:46.720 But you can see the steps here, and we have this kind of image that we will deploy later. 10:47.760 --> 10:52.000 So let's go back; we will play a bit more with that later. 10:53.840 --> 10:59.280 So the question here is: okay, now we have an OCI bundle. In our OCI image we have the 11:00.000 --> 11:03.280 kernel that we need, we have the rootfs for the application, we have the application, we have 11:03.280 --> 11:08.080 everything. How do we run this thing? Because we cannot just do it with Docker or something like that. 11:08.800 --> 11:16.400 Well, we can. We can do that with urunc. So urunc is a new container runtime 11:17.600 --> 11:23.120 focusing on Linux, and it tries to be the runc for unikernels and single-application 11:23.120 --> 11:29.520 kernels. It is a CNCF sandbox project, as I said before. It's an OCI-compatible runtime. 11:30.080 --> 11:34.320 It's written from scratch; maybe that was a bad idea, but we tried it. 11:34.320 --> 11:41.120 And the goal here is to support a variety of guests and monitors. So we can have Solo5- 11:41.120 --> 11:46.880 based monitors, we can have hardware-virtualization-based monitors like QEMU, which use KVM, but also 11:46.880 --> 11:52.640 software-based ones; think about it like gVisor, if you're familiar, using seccomp filters, using 11:52.640 --> 12:00.800 some more restrictive software-based things. And it's very extensible, so anyone can easily add 12:00.800 --> 12:07.360 a new monitor, and anyone can easily add a new guest. So the key difference of urunc 12:07.360 --> 12:11.840 compared to other kinds of deployment models that we have is that, in the case of runc, for example, 12:11.840 --> 12:18.240 in an actual container, the application is just executing inside the container. Runc just creates 12:18.240 --> 12:25.280 this Linux container using namespaces and cgroups. KubeVirt has a different kind of philosophy, 12:25.440 --> 12:32.480 where it deploys a full virtual machine for us in Kubernetes, inside the pod. Then we have the 12:32.480 --> 12:37.840 case of Kata Containers, which is a sandboxed container runtime. And the idea here is that Kata creates 12:37.840 --> 12:42.560 a virtual machine, and inside this virtual machine it will create the container. It's more for security 12:42.560 --> 12:48.560 purposes. And then we have the case of urunc, which takes a completely different approach, 12:48.560 --> 12:54.400 where we create the virtual machine inside a container. So inside the Linux container we have the 12:54.480 --> 12:58.400 virtual machine that will be executed. And inside this virtual machine we will have the actual 12:58.400 --> 13:08.640 application, like a microservice, but only for a specialized purpose.
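As a side note before the BSD part: since urunc is an OCI-compatible runtime with a containerd shim, hooking it into containerd is the usual runtime registration, roughly as below. The runtime_type string follows common shim naming but is an assumption here; see the urunc installation docs for the exact value.

# Sketch: registering urunc as an additional containerd runtime.
# The runtime_type value is an assumption based on common shim naming;
# check the urunc installation docs for the exact string.
cat <<'EOF' | sudo tee -a /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.urunc]
  runtime_type = "io.containerd.urunc.v2"
EOF
sudo systemctl restart containerd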
And we also have the case of 13:20.560 --> 13:27.280 for example small BSD that we saw before that we have the root of FSD and the kernel that 13:28.160 --> 13:37.280 it's packed in the OCI. So what Uranc is doing is that we construct a Linux container. We use 13:37.280 --> 13:41.760 namespaces, we use C-groups, we use all the stuff that we will do in a normal container. 13:41.760 --> 13:46.240 This will happen again. But inside this container we have created a very, very tiny environment 13:47.200 --> 13:53.120 depending on the monitor. If we have far cracker, chemo, software based monitor. 13:53.120 --> 13:57.200 These all these kind of you have different dependencies for the execution. So we prepare this kind of 13:57.200 --> 14:02.400 environment. So we can boot the virtual machine and inside this virtual machine we will run our 14:02.400 --> 14:07.440 application as the init process. So we have directly direct execution of the application. 14:07.440 --> 14:13.440 There is nothing inside the VM that is part of the Uranc. No agent. Maybe an init that will help us 14:13.440 --> 14:22.240 I will talk about later about how an init can help us make this work properly. And the idea 14:22.240 --> 14:29.680 how it works in Kubernetes is that we create a pod and we only like all the side of the containers 14:29.680 --> 14:35.120 will run as actual containers, but only our application will run in let's say this virtualized 14:35.120 --> 14:43.840 environment. So this way we can also separate the actual containers, the user container which is 14:44.560 --> 14:50.320 not let's say trusted with the rest of the containers in the pod like side of the containers 14:50.320 --> 14:54.240 that can be part of the deployment of our deployment that is supposed to be secure. 14:55.680 --> 15:00.560 So I showed before here that we created this little 15:01.520 --> 15:08.480 image. So I will try now to execute it here in this. So I will just do a simple 15:08.480 --> 15:12.560 nerd CTL is the third placement for Docker. I use nerd CTL because I can have 15:12.560 --> 15:18.160 easier access to DevMapper. DevMapper is a snapshot that allows us to create block-based 15:18.720 --> 15:24.080 root-of-face for our container and here on second get this block-based root-of-face and directly 15:24.080 --> 15:31.680 attach it to the VM. So I will just do this one. I will specify that we want to execute 15:31.680 --> 15:42.400 of your RNC. I will have the DevMapper snapshoter and I will run this one. I hope I didn't forget anything. 15:43.360 --> 15:58.400 Yeah, I forgot anything. And let's pull it. And here you can see that we have booted the 16:00.000 --> 16:06.560 this small BSD. Yeah, this is forget about these, there's some debug stuff that I was doing. 16:06.720 --> 16:13.920 I will explain later why. And we can see that we have also, we don't have any network here, 16:13.920 --> 16:23.440 but if I get here, you can see that this one is looking like a container. I can inspect it. 16:23.440 --> 16:37.920 And oops. Yeah. So if I go here and 16:38.560 --> 16:53.600 I will be able to do this one. Yeah, this one should be what I said before that this is part of what 16:53.600 --> 16:59.840 the init should have done for us, but right now we don't have any init. And if I add here 16:59.840 --> 17:13.600 okay, so I can even ping anything hopefully. Yeah. So this is just running here. So let me go 17:13.600 --> 17:22.080 back to the presentation. The same thing I can do it with a free BSD. We have support for free BSD. 
The same thing I can do with FreeBSD; we have support for FreeBSD. 17:22.080 --> 17:28.080 We added it; it works in the same way. Maybe I can try it, and yeah, if we have time I will also show 17:28.080 --> 17:34.000 it later. So the use case for this kind of thing is that, as I said before, we can deploy 17:34.000 --> 17:39.600 these kinds of small BSD microservices in Kubernetes, and we can manage them like any other kind of 17:39.600 --> 17:44.000 container. We can run them with Knative, so we can have a serverless 17:44.640 --> 17:51.600 stack that spawns BSD applications. We can also use it for BSD development on Linux. 17:52.560 --> 17:59.520 So another thing is that we can, for example, build some of the stuff that we want to create 17:59.520 --> 18:05.200 for BSD on Linux, so we can create this kind of VM as a dev environment that we can access, 18:05.200 --> 18:12.400 and we can build whatever we want there. And we can also have it... I don't know why this is here, it shouldn't 18:12.400 --> 18:21.600 be twice. And yeah, we also did a small, very early evaluation. This evaluation has been running 18:23.600 --> 18:32.880 for the past two weeks, and our results need a bit more work; we will try to retake them, 18:32.880 --> 18:40.400 because we had some very weird behavior in both the FreeBSD and the smolBSD case. 18:40.400 --> 18:48.560 And we have not included here, for example, the case of smolBSD; I will explain why. 18:49.360 --> 18:53.680 So this is the time from start to service. To do this kind of benchmark, 18:54.560 --> 19:03.760 we created a small server that keeps the timestamps of two things. So when we create the container, 19:04.400 --> 19:09.600 the server gets a timestamp, and when the container boots, the application directly connects 19:09.680 --> 19:15.920 to the server and we get another timestamp. So we have the duration from the moment we 19:15.920 --> 19:22.400 got the first timestamp, when we invoked the container, until the container was able to respond, 19:22.400 --> 19:28.080 that is, the actual application. And as you can see, runc, a typical container, 19:28.800 --> 19:35.600 can boot in around 400 milliseconds; the testbed is just a small NUC machine, so just a very small 19:35.680 --> 19:42.080 device. With Kata Containers, as we can also see, it exploded; it's a lot of overhead. And then you can see 19:42.080 --> 19:50.800 urunc booting Linux on QEMU. It's okay, but if we go further and use 19:50.800 --> 19:56.720 rumprun, for example, it can boot even faster, and the same goes for the FreeBSD case. And I didn't 19:56.720 --> 20:01.440 include smolBSD because we had some issues; for some reason, the userland was taking 20:01.520 --> 20:07.120 much more time to boot compared to the kernel. Most probably we have screwed up the configuration 20:07.120 --> 20:12.080 there, we have done something very wrong, and we would be happy to work, for example, 20:12.080 --> 20:18.400 with iMil later; we can discuss it, and I'm pretty sure that these numbers for 20:18.400 --> 20:22.960 smolBSD can easily come down to what we see for rumprun, for example.
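As a sketch of the start-to-service measurement described above: the talk's setup had the application connect out to a timestamp server, while the version below polls the guest from the host, which approximates the same measurement. The address, port, image name, and runtime string are all placeholders.

# Sketch of the start-to-service benchmark: t0 when we invoke the
# container, t1 when the application inside the guest first responds.
t0=$(date +%s%N)
sudo nerdctl --snapshotter devmapper run -d \
    --runtime io.containerd.urunc.v2 \
    registry.example.com/smolbsd-app:test
until curl -sf http://10.4.0.2:8080/ >/dev/null; do
    sleep 0.001
done
t1=$(date +%s%N)
echo "start-to-service: $(( (t1 - t0) / 1000000 )) ms"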
20:23.920 --> 20:35.600 And we also did some networking evaluation. So here are the packets per second that we sent. 20:35.600 --> 20:42.400 We have the iperf server and the iperf client. Of course, this plot is the opposite of the previous one; 20:42.400 --> 20:48.880 here, higher is better, and we measure the packets per second. This is not a good 20:48.880 --> 20:53.840 evaluation setup, we know it, and we will try to do it better; we just wanted to get a glimpse of 20:53.840 --> 20:59.920 what happens. And as you can see, all the BSD versions are performing much better than all the Linux ones. 21:04.160 --> 21:08.160 However, yeah, we had some issues with the TCP stack and stuff, but hopefully, 21:09.280 --> 21:15.360 with your help, we can easily resolve that. So, regarding the future things that we want to do: 21:15.920 --> 21:20.880 the first bullet has actually already been done, I guess, by the people from 21:20.880 --> 21:26.880 smolBSD; from what we saw today, things can be way faster. We want to have integration with 21:26.880 --> 21:32.400 the OCI images from FreeBSD that exist, and what we're going to do here is just get a FreeBSD OCI 21:32.400 --> 21:37.120 image and use it as the rootfs that we're going to run over a FreeBSD kernel. 21:38.640 --> 21:43.040 And we want to find some ways to pass some specific information inside the VM; 21:43.680 --> 21:50.160 SWG would be an option there. We need the init to set up the environment, like the user, 21:50.160 --> 21:56.800 for example: the UID, the GID, the IP, all this kind of stuff. We can also use docker build to 21:56.800 --> 22:04.880 actually build, as I said before, with bunny. We also want to explore integration 22:04.880 --> 22:11.520 with volumes, so we can easily attach stuff to these kinds of containers, to the VMs. And of course, 22:11.600 --> 22:15.280 we have done it in the past, and we will be very happy to update 22:15.280 --> 22:22.080 rumprun with newer NetBSD versions. I would like to note that this work is partially funded 22:22.080 --> 22:31.920 by some European projects. And to summarize: we know that BSD has some significant benefits for 22:31.920 --> 22:39.200 certain workloads; recent work in the BSD community has allowed us to boot BSD in 22:39.200 --> 22:45.680 microVMs in an extremely fast way, and we can take advantage of that and actually package BSD applications 22:45.680 --> 22:51.520 and deploy them like any other kind of container that exists, I mean, any other kind of Linux 22:51.520 --> 22:59.760 container. And we can do that with urunc, and build them with bunny. Everything is of course open source; 22:59.760 --> 23:06.240 as I said, this is a CNCF sandbox project. You can find everything on GitHub, you can join the 23:07.120 --> 23:15.360 Slack channel that we have. We have the documentation and the patches for BSD; hopefully, 23:16.000 --> 23:24.640 with help from the BSD people, we will finally merge them later. And yeah, so that's all from my side. 23:26.000 --> 23:31.040 We have one and a half minutes, so if you have any questions, I'll be happy to answer them. 23:36.240 --> 23:47.600 Okay, there is one question there. 23:48.560 --> 24:09.440 Well, not really, because urunc doesn't target BSD hosts right now. Oh, sorry, I have to repeat the 24:09.440 --> 24:17.840 question. So the question is that currently we can use Podman to run a BSD container as a Linux container, right? 24:23.680 --> 24:29.920 On FreeBSD, yes. And as for whether with urunc you can do the same thing, running a FreeBSD container as a Linux 24:30.080 --> 24:40.160 container: I would not recommend urunc for that; currently we don't have support for that. 24:40.160 --> 24:45.520 But yes, that would be a nice concept for us.
And we would like to explore it using, for example, 24:45.520 --> 24:51.200 from a security perspective, some sandboxing mechanisms like MAC on FreeBSD, bhyve, and stuff like that. 24:59.920 --> 25:06.360 If they run in Kubernetes, will they, theoretically, also run, for example, in an existing 25:06.360 --> 25:08.360 Podman or Docker setup? 25:08.360 --> 25:09.360 Yes. 25:09.360 --> 25:13.080 If the question is whether these containers can run in Kubernetes, Docker, 25:13.080 --> 25:16.600 Podman: yeah, the demo that I showed before, that's nerdctl. 25:16.600 --> 25:21.160 nerdctl is from the containerd community, and it just 25:21.160 --> 25:24.160 tries to replicate everything that Docker does. 25:24.160 --> 25:27.360 So yeah, it's going to run in Docker as well. 25:27.360 --> 25:31.880 We have some issues with Podman, but we'll resolve them in the next 25:31.880 --> 25:32.880 release. 25:32.880 --> 25:36.760 So there is not, I mean, there is not a hard dependency on Docker, I see. 25:36.760 --> 25:39.840 There is a dependency on urunc; these kinds of images have to be 25:39.840 --> 25:41.160 run on top of urunc. 25:41.160 --> 25:52.000 Oh, so for example, in Docker, will it work? It will; it's just that, yeah, we are already 25:52.000 --> 25:58.080 over time, but you can just ask me this question afterwards. 25:58.080 --> 25:59.280 Okay, thank you. 25:59.280 --> 26:00.280 Thank you.
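Following up on that last question: making these images run under plain Docker boils down to registering urunc as an extra OCI runtime, roughly as sketched below. The binary path is an assumption, the image name is a placeholder, and an existing daemon.json should be merged rather than overwritten.

# Sketch: registering urunc as an additional runtime in Docker.
# The binary path is an assumption; adjust it to where urunc is
# installed, and merge with any existing daemon.json settings.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "urunc": { "path": "/usr/local/bin/urunc" }
  }
}
EOF
sudo systemctl restart docker
docker run --rm --runtime urunc registry.example.com/smolbsd-app:test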