WEBVTT 00:00.000 --> 00:14.160 So, hello everyone, my name is Haralambos, but you can just call me Babis, and today we'll 00:14.160 --> 00:18.360 talk a bit about how we can run BSD applications as Linux containers. 00:18.360 --> 00:24.560 A bit about us: we are a small company, we are working on various research and commercial 00:24.560 --> 00:28.840 projects, and we are actively contributing to various open-source projects. We are the 00:28.840 --> 00:35.480 core maintainers of urunc, which is a CNCF sandbox project, and we focus 00:35.480 --> 00:42.000 on what we call systems software: hypervisors, container runtimes, lightweight 00:42.000 --> 00:51.600 virtualization, and in general hardware acceleration, and also IoT devices on the cloud. 00:51.600 --> 01:00.400 So, I know this is a controversial topic, so I want you to be prepared: we will talk 01:00.400 --> 01:02.640 about Linux and Kubernetes. 01:02.640 --> 01:12.360 No, I'm kidding, we will not only talk about that. 01:12.360 --> 01:19.840 Okay, let me try again, no, it's okay, I'll try to do it like I have it here, maybe this 01:19.840 --> 01:28.600 slide... one second. 01:28.600 --> 01:38.520 Okay, so, we'll talk about BSD. And, of course, we are in the BSD room, 01:38.520 --> 01:47.600 I don't have to convince you that BSD is an operating system that is worth looking at; there 01:47.600 --> 01:57.440 are specific use cases where BSD is widely used, various distributions of BSD, but the 01:57.440 --> 02:05.720 thing is that what we want to look at in this talk is how we can bring BSD into this picture. 02:05.720 --> 02:13.000 So how can we run BSD applications and manage them through Kubernetes and over Linux? Because, 02:13.080 --> 02:18.280 whether we like it or not, this is the stack that currently dominates the cloud and most 02:18.280 --> 02:21.800 of the servers that are out there. 02:21.800 --> 02:30.680 So, currently we have various ways; it's still possible to run BSD on the cloud. 02:30.680 --> 02:35.440 Of course, we have the option of a dedicated server, where we just have some kind 02:35.440 --> 02:41.520 of server that's just dedicated to BSD applications, and then we have to do various networking 02:41.520 --> 02:47.160 things to try to connect it with the rest of the cluster, for example a Kubernetes cluster. 02:47.160 --> 02:51.760 And of course, we also have the option of a virtual machine: with KubeVirt, we can 02:51.760 --> 02:56.360 use this virtual machine through Kubernetes, and we can manage it like that, so we can 02:56.360 --> 03:02.040 again have BSD running in Kubernetes over Linux. 03:02.040 --> 03:07.040 But with the work from Emile, and the work from other people from FreeBSD, who have managed 03:07.040 --> 03:13.280 to make NetBSD and FreeBSD run as a microVM and boot that fast, there is a new thing 03:13.280 --> 03:14.280 that we can do. 03:14.280 --> 03:17.280 We can run BSD as a microservice. 03:17.280 --> 03:22.040 So the idea here is that we still have the same stack: we have Kubernetes, we have 03:22.040 --> 03:27.920 Linux, and then we can have BSD running the application that we want as a microservice. 03:27.920 --> 03:34.600 And Emile showed that it's so small and boots so fast, so we will 03:34.600 --> 03:37.680 also see that this is actually possible.
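For reference, the KubeVirt option mentioned above looks roughly like the sketch below: a minimal VirtualMachine object whose root disk comes from a container image. This is only a sketch; it assumes KubeVirt is already installed in the cluster, and the registry and image name are placeholders.

# Sketch: managing a BSD VM through Kubernetes with KubeVirt.
# The containerDisk image below is a placeholder for a bootable
# BSD disk image packaged as a container image.
cat <<'EOF' | kubectl apply -f -
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: bsd-vm
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 1Gi
      volumes:
      - name: rootdisk
        containerDisk:
          image: registry.example.com/freebsd-containerdisk:latest
EOF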
03:37.680 --> 03:46.180 So the thing here is that what we want to do... we, of course, have the option of 03:46.180 --> 03:52.120 a full virtual machine, but sometimes we just have a few small applications that we just 03:52.120 --> 03:57.280 want to execute, and we want to manage them as an application, like a container, 03:57.280 --> 03:59.160 for example. 03:59.160 --> 04:04.080 So in this kind of scenario, a full virtual machine doesn't really get us there, because 04:04.080 --> 04:09.160 the full virtual machine will need to have a full BSD operating system running 04:09.160 --> 04:14.040 there, and as a result we might have a lot of unnecessary services running at the same 04:14.040 --> 04:24.400 time as our application, stealing resources and, yeah, also increasing the maintenance cost. 04:24.400 --> 04:29.600 But what we do in urunc, and with other similar projects, is that we have the concept 04:29.600 --> 04:31.840 of a single-application kernel. 04:31.840 --> 04:36.040 So a single-application kernel is just a kernel that is configured for a single, specific 04:36.040 --> 04:38.560 purpose. 04:38.560 --> 04:44.280 It is supposed to run only a single service, and this is how the rootfs is built too: 04:44.280 --> 04:46.840 just for one application. 04:46.840 --> 04:52.440 And as you can understand, this has lower resource usage and less noise from other services 04:52.440 --> 04:55.200 that don't exist anymore. 04:55.200 --> 05:00.920 And we need to maintain only what we actually require for this 05:00.920 --> 05:04.800 service. 05:04.800 --> 05:07.200 But there is also the case of unikernels. 05:07.200 --> 05:14.160 So rumprun, based on NetBSD, is one of the first unikernel frameworks 05:14.160 --> 05:19.600 that existed, and Antti Kantee did great work on that. 05:19.600 --> 05:25.920 So the concept here is that we have a specialized kernel, a specialized library 05:25.920 --> 05:31.280 operating system, which we can link directly against our application, and therefore 05:31.280 --> 05:33.960 we have everything in one binary. 05:33.960 --> 05:36.000 This is extremely small and extremely fast. 05:36.000 --> 05:40.000 We don't have any separation between user space and kernel space. 05:40.000 --> 05:48.360 So everything is just a library call, we can boot very fast, and as you can understand, 05:48.360 --> 05:55.480 it is a big improvement in regards to performance and resource consumption. 05:55.480 --> 06:00.120 But we have to admit that these are not the easiest frameworks to work with. 06:00.120 --> 06:05.080 Unikernels require a lot of time to port applications to them, and the benefit of 06:05.080 --> 06:10.080 specialization comes with the cost of porting applications to them. 06:10.080 --> 06:18.440 So this idea sounds good, but the problem is: how can we build these things so we can use them 06:18.440 --> 06:26.120 in Kubernetes, and how can we deploy them later on Kubernetes? 06:26.120 --> 06:31.720 So the concept is pretty simple: we'll just build them like containers. 06:31.720 --> 06:38.240 And what we mean by build them like containers is that we will have to pack them as an OCI image. 06:39.200 --> 06:48.720 So for that purpose we have implemented bunny. Bunny is a new framework which is based 06:48.720 --> 06:55.360 on BuildKit from Docker, and it allows us to build any kind of kernel, any kind of 06:55.360 --> 07:00.560 library operating system, in the same way that we do with containers.
07:00.560 --> 07:04.880 So basically we have a container-like experience: we can do a docker build, and this will build 07:04.880 --> 07:08.000 the kernel for us. 07:08.800 --> 07:14.160 The purpose of bunny is to provide, let's say, a uniform experience for all kinds of 07:14.160 --> 07:19.760 library operating systems that exist out there, but even for generic kernels, and to try to 07:19.760 --> 07:23.520 tame this kind of dependency hell that we have, for example, with different frameworks, 07:23.520 --> 07:29.520 different kinds of operating systems, and the tools that we need for everything; it tries to make 07:29.520 --> 07:32.800 that cost smaller. 07:33.280 --> 07:41.680 Now, how does it work? So everything is declared as YAML, and bunny reads this YAML 07:41.680 --> 07:46.320 and, based on the framework it has to build, whether it is a Linux kernel, a BSD kernel, 07:46.320 --> 07:52.880 rumprun, or MirageOS, whatever it is, it fetches the build layers; 07:52.880 --> 07:58.480 these can be the tools that will be needed to build this kernel, or any other kind of 07:59.440 --> 08:07.760 utilities that are required. It will build the kernel like any other kind of system, and later 08:07.760 --> 08:14.160 it will package everything as an OCI image. So the packaging takes place depending on whether we have 08:14.160 --> 08:20.560 a unikernel or a single-application kernel. So the green things are what we have in the case of a 08:20.560 --> 08:26.720 unikernel, because in the unikernel the kernel, the libraries, and the application 08:26.720 --> 08:32.000 are packed all at once, squashed together as a single binary, and then of course 08:32.000 --> 08:36.080 we need the configuration files for our application. Now, in the case of single-application 08:36.080 --> 08:40.880 kernels we have different kinds of layers, because we also have the dependencies of the application 08:40.880 --> 08:45.440 and the kernel that we need to execute. So all these things together are packed 08:45.440 --> 08:54.160 in the OCI image. So let's see how this works. So we have here, for example... 08:54.240 --> 09:03.920 so I'm here, and I can go here into smolBSD. So here, for example, I have already 09:04.480 --> 09:16.160 got the kernel from smolBSD, and I also have the base image, again from smolBSD. 09:16.160 --> 09:23.760 So this is the image and the block file, and we have two ways 09:23.760 --> 09:30.080 that we can declare how we want to package this thing. The first way, I hope it's visible now. 09:31.920 --> 09:37.920 The first way is just using a traditional Dockerfile, where we just copy the stuff inside 09:37.920 --> 09:41.520 and we just have to specify some annotations for the runtime, which I will explain later. 09:42.560 --> 09:48.000 And the other way is with the bunnyfile that I explained before. So we have here, for example, a 09:48.080 --> 09:53.520 YAML file, and we declare what we want to include inside the rootfs. So here we declare that we 09:53.520 --> 10:00.400 have a kernel that exists locally, and it's a NetBSD one; here is the path of the kernel. We declare 10:00.400 --> 10:07.600 the framework that we want to use, which is NetBSD. It will run on top of QEMU, and we have this 10:07.600 --> 10:11.680 rootfs that we will build from scratch. It will be raw, which means that we will just copy 10:11.680 --> 10:20.240 everything into the OCI image.
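To make that concrete, here is a sketch of a bunnyfile along the lines just described: a local NetBSD kernel, NetBSD as the framework, QEMU as the monitor, and a raw rootfs built from scratch. The field names are approximations reconstructed from this description, the #syntax frontend reference and all paths are placeholders, so check the bunny documentation for the exact schema.

# Sketch of the smolBSD bunnyfile described above. Field names are
# approximations; paths and the frontend image are placeholders.
cat > bunnyfile <<'EOF'
#syntax=bunny/frontend:latest
kernel:
  from: local
  path: ./netbsd-kernel
platforms:
  framework: netbsd
  monitor: qemu
rootfs:
  from: scratch
  type: raw
  include:
  - ./rootfs-block.img
EOF

# bunny plugs into BuildKit, so building is just a docker build:
docker build -f bunnyfile -t registry.example.com/smolbsd-app:test .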
And, yeah, I have already mounted here the block file that I 10:20.240 --> 10:28.160 mentioned before. So, yeah, let me try to build this, for example. So what I do is just a docker build. 10:29.040 --> 10:33.440 And this will fetch everything that is required to create this kind of OCI image for us. 10:34.960 --> 10:39.840 Now, it would take a lot of time, but okay, a lot of stuff was cached, so we didn't actually build anything. 10:40.800 --> 10:46.720 But you can see the steps here, and we have this kind of image that we will deploy later. 10:47.760 --> 10:52.000 So let's go back; we will play a bit more with that later. 10:53.840 --> 10:59.280 So the question here is: okay, now we have an OCI bundle. In our OCI image we have the 11:00.000 --> 11:03.280 kernel that we need, we have the rootfs for the application, we have the application, we have 11:03.280 --> 11:08.080 everything. How do we run this thing? Because we cannot just do it with Docker or something like that. 11:08.800 --> 11:16.400 Well, we can. We can do that with urunc. So urunc is a new container runtime 11:17.600 --> 11:23.120 focusing on Linux, and it tries to be the runc for unikernels and single-application 11:23.120 --> 11:29.520 kernels. It is a CNCF sandbox project, as I said before. It's an OCI-compatible runtime. 11:30.080 --> 11:34.320 It's written from scratch; maybe that was a bad idea, but we tried it. 11:34.320 --> 11:41.120 And the goal here is to support a variety of guests and monitors. So we can have Solo5- 11:41.120 --> 11:46.880 based monitors, we can have hardware-virtualization-based monitors like QEMU, which use KVM, but also 11:46.880 --> 11:52.640 software-based ones; think about it like gVisor, if you're familiar, using seccomp filters, using 11:52.640 --> 12:00.800 some more restrictive software-based things. And it's very extensible, so anyone can easily add 12:00.800 --> 12:07.360 a new monitor, and anyone can easily add a new guest. So the key difference of urunc 12:07.360 --> 12:11.840 compared to other kinds of deployment models that we have is that, in the case of runc, for example, 12:11.840 --> 12:18.240 in an actual container, the application is just executing inside the container. Runc just creates 12:18.240 --> 12:25.280 this Linux container using namespaces and cgroups. KubeVirt has a different kind of philosophy, 12:25.440 --> 12:32.480 where it deploys a full virtual machine for us in Kubernetes, inside the pod. Then we have the 12:32.480 --> 12:37.840 case of Kata Containers, which is a sandboxed container runtime. And the idea here is that Kata creates 12:37.840 --> 12:42.560 a virtual machine, and inside this virtual machine it will create the container. It's more for security 12:42.560 --> 12:48.560 purposes. And then we have the case of urunc, which takes a completely different approach, 12:48.560 --> 12:54.400 where we create the virtual machine inside a container. So inside the Linux container we have the 12:54.480 --> 12:58.400 virtual machine that will be executed. And inside this virtual machine we will have the actual 12:58.400 --> 13:08.640 application, like a microservice, but only for a specialized purpose.
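As a side note before the BSD part: since urunc is an OCI-compatible runtime with a containerd shim, hooking it into containerd is the usual runtime registration, roughly as below. The runtime_type string follows common shim naming but is an assumption here; see the urunc installation docs for the exact value.

# Sketch: registering urunc as an additional containerd runtime.
# The runtime_type value is an assumption based on common shim naming;
# check the urunc installation docs for the exact string.
cat <<'EOF' | sudo tee -a /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.urunc]
  runtime_type = "io.containerd.urunc.v2"
EOF
sudo systemctl restart containerd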
And we also have the case of 13:20.560 --> 13:27.280 for example small BSD that we saw before that we have the root of FSD and the kernel that 13:28.160 --> 13:37.280 it's packed in the OCI. So what Uranc is doing is that we construct a Linux container. We use 13:37.280 --> 13:41.760 namespaces, we use C-groups, we use all the stuff that we will do in a normal container. 13:41.760 --> 13:46.240 This will happen again. But inside this container we have created a very, very tiny environment 13:47.200 --> 13:53.120 depending on the monitor. If we have far cracker, chemo, software based monitor. 13:53.120 --> 13:57.200 These all these kind of you have different dependencies for the execution. So we prepare this kind of 13:57.200 --> 14:02.400 environment. So we can boot the virtual machine and inside this virtual machine we will run our 14:02.400 --> 14:07.440 application as the init process. So we have directly direct execution of the application. 14:07.440 --> 14:13.440 There is nothing inside the VM that is part of the Uranc. No agent. Maybe an init that will help us 14:13.440 --> 14:22.240 I will talk about later about how an init can help us make this work properly. And the idea 14:22.240 --> 14:29.680 how it works in Kubernetes is that we create a pod and we only like all the side of the containers 14:29.680 --> 14:35.120 will run as actual containers, but only our application will run in let's say this virtualized 14:35.120 --> 14:43.840 environment. So this way we can also separate the actual containers, the user container which is 14:44.560 --> 14:50.320 not let's say trusted with the rest of the containers in the pod like side of the containers 14:50.320 --> 14:54.240 that can be part of the deployment of our deployment that is supposed to be secure. 14:55.680 --> 15:00.560 So I showed before here that we created this little 15:01.520 --> 15:08.480 image. So I will try now to execute it here in this. So I will just do a simple 15:08.480 --> 15:12.560 nerd CTL is the third placement for Docker. I use nerd CTL because I can have 15:12.560 --> 15:18.160 easier access to DevMapper. DevMapper is a snapshot that allows us to create block-based 15:18.720 --> 15:24.080 root-of-face for our container and here on second get this block-based root-of-face and directly 15:24.080 --> 15:31.680 attach it to the VM. So I will just do this one. I will specify that we want to execute 15:31.680 --> 15:42.400 of your RNC. I will have the DevMapper snapshoter and I will run this one. I hope I didn't forget anything. 15:43.360 --> 15:58.400 Yeah, I forgot anything. And let's pull it. And here you can see that we have booted the 16:00.000 --> 16:06.560 this small BSD. Yeah, this is forget about these, there's some debug stuff that I was doing. 16:06.720 --> 16:13.920 I will explain later why. And we can see that we have also, we don't have any network here, 16:13.920 --> 16:23.440 but if I get here, you can see that this one is looking like a container. I can inspect it. 16:23.440 --> 16:37.920 And oops. Yeah. So if I go here and 16:38.560 --> 16:53.600 I will be able to do this one. Yeah, this one should be what I said before that this is part of what 16:53.600 --> 16:59.840 the init should have done for us, but right now we don't have any init. And if I add here 16:59.840 --> 17:13.600 okay, so I can even ping anything hopefully. Yeah. So this is just running here. So let me go 17:13.600 --> 17:22.080 back to the presentation. The same thing I can do it with a free BSD. We have support for free BSD. 
The same thing I can do with FreeBSD; we have support for FreeBSD. 17:22.080 --> 17:28.080 We added it; it works in the same way. Maybe I can try it, and yeah, if we have time I will also show 17:28.080 --> 17:34.000 it later. So the use case for this kind of thing is that, as I said before, we can deploy 17:34.000 --> 17:39.600 these kinds of small BSD microservices in Kubernetes, and we can manage them like any other kind of 17:39.600 --> 17:44.000 container. We can run them with Knative, so we can have a serverless 17:44.640 --> 17:51.600 stack that spawns BSD applications. We can also use it for BSD development on Linux. 17:52.560 --> 17:59.520 So another thing is that we can, for example, build some of the stuff that we want to create 17:59.520 --> 18:05.200 for BSD on Linux, so we can create this kind of VM as a dev environment that we can access, 18:05.200 --> 18:12.400 and we can build whatever we want there. And we can also have it... I don't know why this is here, it shouldn't 18:12.400 --> 18:21.600 be twice. And yeah, we also did a small, very early evaluation. This evaluation has been running 18:23.600 --> 18:32.880 for the past two weeks, and our results need a bit more work; we will try to retake them, 18:32.880 --> 18:40.400 because we had some very weird behavior in both the FreeBSD and the smolBSD case. 18:40.400 --> 18:48.560 And we have not included here, for example, the case of smolBSD; I will explain why. 18:49.360 --> 18:53.680 So this is the time from start to service. To do this kind of benchmark, 18:54.560 --> 19:03.760 we created a small server that keeps the timestamps of two things. So when we create the container, 19:04.400 --> 19:09.600 the server gets a timestamp, and when the container boots, the application directly connects 19:09.680 --> 19:15.920 to the server and we get another timestamp. So we have the duration from the moment we 19:15.920 --> 19:22.400 got the first timestamp, when we invoked the container, until the container was able to respond, 19:22.400 --> 19:28.080 that is, the actual application. And as you can see, runc, a typical container, 19:28.800 --> 19:35.600 can boot in around 400 milliseconds; the testbed is just a small NUC machine, so just a very small 19:35.680 --> 19:42.080 device. With Kata Containers, as we can also see, it exploded; it's a lot of overhead. And then you can see 19:42.080 --> 19:50.800 urunc booting Linux on QEMU. It's okay, but if we go further and use 19:50.800 --> 19:56.720 rumprun, for example, it can boot even faster, and the same goes for the FreeBSD case. And I didn't 19:56.720 --> 20:01.440 include smolBSD because we had some issues; for some reason, the userland was taking 20:01.520 --> 20:07.120 much more time to boot compared to the kernel. Most probably we have screwed up the configuration 20:07.120 --> 20:12.080 there, we have done something very wrong, and we would be happy to work, for example, 20:12.080 --> 20:18.400 with iMil later; we can discuss it, and I'm pretty sure that these numbers for 20:18.400 --> 20:22.960 smolBSD can easily come down to what we see for rumprun, for example.
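As a sketch of the start-to-service measurement described above: the talk's setup had the application connect out to a timestamp server, while the version below polls the guest from the host, which approximates the same measurement. The address, port, image name, and runtime string are all placeholders.

# Sketch of the start-to-service benchmark: t0 when we invoke the
# container, t1 when the application inside the guest first responds.
t0=$(date +%s%N)
sudo nerdctl --snapshotter devmapper run -d \
    --runtime io.containerd.urunc.v2 \
    registry.example.com/smolbsd-app:test
until curl -sf http://10.4.0.2:8080/ >/dev/null; do
    sleep 0.001
done
t1=$(date +%s%N)
echo "start-to-service: $(( (t1 - t0) / 1000000 )) ms"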
20:23.920 --> 20:35.600 And we also did some networking evaluation. So here are the packets per second that we sent. 20:35.600 --> 20:42.400 We have the iperf server and the iperf client. Of course, this plot is the opposite of the previous one; 20:42.400 --> 20:48.880 here, higher is better, and we measure the packets per second. This is not a good 20:48.880 --> 20:53.840 evaluation setup, we know it, and we will try to do it better; we just wanted to get a glimpse of 20:53.840 --> 20:59.920 what happens. And as you can see, all the BSD versions are performing much better than all the Linux ones. 21:04.160 --> 21:08.160 However, yeah, we had some issues with the TCP stack and stuff, but hopefully, 21:09.280 --> 21:15.360 with your help, we can easily resolve that. So, regarding the future things that we want to do: 21:15.920 --> 21:20.880 the first bullet has actually already been done, I guess, by the people from 21:20.880 --> 21:26.880 smolBSD; from what we saw today, things can be way faster. We want to have integration with 21:26.880 --> 21:32.400 the OCI images from FreeBSD that exist, and what we're going to do here is just get a FreeBSD OCI 21:32.400 --> 21:37.120 image and use it as the rootfs that we're going to run over a FreeBSD kernel. 21:38.640 --> 21:43.040 And we want to find some ways to pass some specific information inside the VM; 21:43.680 --> 21:50.160 SWG would be an option there. We need the init to set up the environment, like the user, 21:50.160 --> 21:56.800 for example: the UID, the GID, the IP, all this kind of stuff. We can also use docker build to 21:56.800 --> 22:04.880 actually build, as I said before, with bunny. We also want to explore integration 22:04.880 --> 22:11.520 with volumes, so we can easily attach stuff to these kinds of containers, to the VMs. And of course, 22:11.600 --> 22:15.280 we have done it in the past, and we will be very happy to update 22:15.280 --> 22:22.080 rumprun with newer NetBSD versions. I would like to note that this work is partially funded 22:22.080 --> 22:31.920 by some European projects. And to summarize: we know that BSD has some significant benefits for 22:31.920 --> 22:39.200 certain workloads; recent work in the BSD community has allowed us to boot BSD in 22:39.200 --> 22:45.680 microVMs in an extremely fast way, and we can take advantage of that and actually package BSD applications 22:45.680 --> 22:51.520 and deploy them like any other kind of container that exists, I mean, any other kind of Linux 22:51.520 --> 22:59.760 container. And we can do that with urunc, and build them with bunny. Everything is of course open source; 22:59.760 --> 23:06.240 as I said, this is a CNCF sandbox project. You can find everything on GitHub, you can join the 23:07.120 --> 23:15.360 Slack channel that we have. We have the documentation and the patches for BSD; hopefully, 23:16.000 --> 23:24.640 with help from the BSD people, we will finally merge them later. And yeah, so that's all from my side. 23:26.000 --> 23:31.040 We have one and a half minutes, so if you have any questions, I'll be happy to answer them. 23:36.240 --> 23:47.600 Okay, there is one question there. 23:48.560 --> 24:09.440 Well, not really, because urunc doesn't target BSD hosts right now. Oh, sorry, I have to repeat the 24:09.440 --> 24:17.840 question. So the question is that currently we can use Podman to run a BSD container as a Linux container, right? 24:23.680 --> 24:29.920 On FreeBSD, yes. And as for whether with urunc you can do the same thing, running a FreeBSD container as a Linux 24:30.080 --> 24:40.160 container: I would not recommend urunc for that; currently we don't have support for that. 24:40.160 --> 24:45.520 But yes, that would be a nice concept for us.
And we would like to explore it using, for example, 24:45.520 --> 24:51.200 from a security perspective, some sandboxing mechanisms like MAC on FreeBSD, bhyve, and stuff like that. 24:59.920 --> 25:06.360 If they run in Kubernetes, will they, theoretically, also run, for example, in an existing 25:06.360 --> 25:08.360 Podman or Docker setup? 25:08.360 --> 25:09.360 Yes. 25:09.360 --> 25:13.080 If the question is whether these containers can run in Kubernetes, Docker, 25:13.080 --> 25:16.600 Podman: yeah, the demo that I showed before, that's nerdctl. 25:16.600 --> 25:21.160 nerdctl is from the containerd community, and it just 25:21.160 --> 25:24.160 tries to replicate everything that Docker does. 25:24.160 --> 25:27.360 So yeah, it's going to run in Docker as well. 25:27.360 --> 25:31.880 We have some issues with Podman, but we'll resolve them in the next 25:31.880 --> 25:32.880 release. 25:32.880 --> 25:36.760 So there is not, I mean, there is not a hard dependency on Docker, I see. 25:36.760 --> 25:39.840 There is a dependency on urunc; these kinds of images have to be 25:39.840 --> 25:41.160 run on top of urunc. 25:41.160 --> 25:52.000 Oh, so for example, in Docker, will it work? It will; it's just that, yeah, we are already 25:52.000 --> 25:58.080 over time, but you can just ask me this question afterwards. 25:58.080 --> 25:59.280 Okay, thank you. 25:59.280 --> 26:00.280 Thank you.
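Following up on that last question: making these images run under plain Docker boils down to registering urunc as an extra OCI runtime, roughly as sketched below. The binary path is an assumption, the image name is a placeholder, and an existing daemon.json should be merged rather than overwritten.

# Sketch: registering urunc as an additional runtime in Docker.
# The binary path is an assumption; adjust it to where urunc is
# installed, and merge with any existing daemon.json settings.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "urunc": { "path": "/usr/local/bin/urunc" }
  }
}
EOF
sudo systemctl restart docker
docker run --rm --runtime urunc registry.example.com/smolbsd-app:test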