Yeah, thanks for joining, thanks for letting me talk here. First, a little introduction: I am Ankit Durk, from Microsoft Defender for Linux, which is an anti-malware solution for the Linux space. My colleagues Lakshmi and McNadir also contributed to this project. I am going to talk about how to reliably extract process metadata, specifically for short-lived processes, using eBPF, which is actually quite important for security products.

Before moving ahead, let me talk about the background. Endpoint detection and response (EDR) solutions are cybersecurity tools responsible for monitoring, detection, investigation, and response to threats actively happening on endpoints like servers and laptops. These EDR solutions actively monitor endpoints and give you comprehensive visibility into them; detection and investigation are some of the features provided by EDR, so that any security expert can track down malicious behavior. EDR solutions also generate automated alerts, which security experts can use to react in real time to threats happening on the endpoint. These EDR solutions continuously monitor your endpoints based on events like logins, authentication attempts, and process spawns.
The process timeline plays a quite crucial role in each of these EDR features, because it provides the logical order of the lifecycle of a process; it helps you see the full shape and scope of a security breach. The process timeline is created using process metadata like the PID, creation time, process name, paths, and command-line arguments. Any gaps in the process metadata can hamper detection and investigation, which is what maps events to the corresponding threats.

Typical process timelines are created by monitoring basic events occurring on the endpoint, like fork and exec, and these events are used to extract metadata. The information collected within these events is collated and aggregated across events for one particular process and then sent to the security portal for display. Other features like detection and investigation use these timelines for further investigation of the process. So this shows that the process timeline plays a very crucial role here, and any gaps in this timeline can result in a lot of issues. Let us move ahead.

There are existing methods, three basic ones: proc-filesystem-based extraction, auditd, and netlink. Each has its own advantages and disadvantages.
For example, the proc filesystem gives you extensive information about a process, but it is quite unreliable for short-lived processes, because whenever the process exits its data gets deleted. Auditd gives you reliable data for the events you subscribe to, but it is very noisy and can result in a lot of performance issues on a server. Netlink is quite lightweight and very performant, but its data is mostly numerical, like the PID or PPID, so you still have to rely on other methods to extract complete information about the process.

So specifically for short-lived processes, we were looking for a solution that gives reliable data and is also performant on servers, because on servers performance is the most important thing; an anti-malware solution cannot eat up the resources of the server.

We were exploring different methods, and eBPF came out as very reliable. By extracting process information within a BPF program in kernel space, and by utilizing eBPF's access to kernel data structures such as the task_struct, the information we collected was very reliable and complete, and you can then hand that information to user-space consumers. The method proved to be very effective for short-lived processes as well as performant.
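The /proc race for short-lived processes is easy to demonstrate. This is a small userspace C sketch, not from the talk, showing that once a short-lived child has exited and been reaped, its /proc entry is gone before a scanner could read it (function names here are hypothetical):

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Returns 0 if /proc/<pid>/cmdline is readable, -1 otherwise. */
int proc_cmdline_readable(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/cmdline", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    fclose(f);
    return 0;
}

/* Spawn a child that exits immediately, reap it, and return its now-dead PID. */
pid_t spawn_and_reap(void)
{
    pid_t child = fork();
    if (child == 0)
        _exit(0);             /* short-lived process: gone before anyone can scan /proc */
    waitpid(child, NULL, 0);  /* after reaping, /proc/<child> no longer exists */
    return child;
}
```

A procfs-based collector polling at any realistic interval would simply never see such a process, which is exactly the gap the eBPF approach closes.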
The flow diagram looks like this: there were multiple probes attached to different system calls. An entry probe and an exit probe were attached to the execve system call; the process metadata was collected in the entry probe and placed into an intermediate hash map keyed by PID. When the exit probe fires, the remaining data collected as part of execve is combined with it and then sent through a ring buffer. On the user-space side there is a thread constantly polling this ring buffer, which hands each event to a callback that passes it on to the rest of the product.

Looking at the implementation in more detail, different pieces of process metadata are collected in different ways. For example, for the PID the BPF helpers give it to you directly: bpf_get_current_pid_tgid actually gives you that. Other, more complex entries, like the current working directory, process paths, and command lines, require more work because of their complexity, and I will talk more about them. Based on that, different probes were attached to different system calls to collect all the process metadata.
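The entry/exit combining step can be modeled in plain C. The actual eBPF map and event layout were not shown in the talk, so the names and fields here are hypothetical; in the real program the map would be a BPF hash map and the "emit" step a ring-buffer submit:

```c
#include <stdio.h>
#include <string.h>

#define MAP_SLOTS 64   /* stand-in for the BPF hash map keyed by PID */

struct exec_meta {
    int  pid;
    char comm[16];
    int  used;
};

static struct exec_meta pid_map[MAP_SLOTS];

/* Entry probe: stash the metadata collected at execve entry, keyed by PID. */
void on_exec_entry(int pid, const char *comm)
{
    struct exec_meta *slot = &pid_map[pid % MAP_SLOTS];
    slot->pid = pid;
    snprintf(slot->comm, sizeof slot->comm, "%s", comm);
    slot->used = 1;
}

/* Exit probe: look up the entry data, combine it with the return value,
 * and format the record that would be submitted to the ring buffer. */
int on_exec_exit(int pid, int retval, char *event, size_t sz)
{
    struct exec_meta *slot = &pid_map[pid % MAP_SLOTS];
    if (!slot->used || slot->pid != pid)
        return -1;                       /* no matching entry event */
    snprintf(event, sz, "pid=%d comm=%s ret=%d", pid, slot->comm, retval);
    slot->used = 0;                      /* free the slot, as a map delete would */
    return 0;
}
```

The intermediate map is what lets the exit probe see data that was only available at entry time, so the user-space side receives one complete record per exec instead of two partial ones.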
Before moving ahead, let us talk about some basic building blocks that are used. As you know, the task_struct is a basic building block of the Linux kernel, which holds an immense amount of information the kernel needs to manage and schedule any process, including its process state, memory management, file descriptors, and more. eBPF gives you access to this data structure: a BPF helper function gives you the pointer to the task in whose context the program runs, and once you have that pointer you can extract all possible information related to the process. We specifically use the task_struct to extract the parent PID, the process path, the current working directory, and the mount point entries.

I will talk about what exactly a dentry is, because full paths are not directly available inside the program. The dentry structure is a fundamental part of the virtual file system (VFS) layer in the Linux kernel. It organizes all of your files and directories into a tree-like structure, where internal nodes are directories and leaves are the files. So simply doing a tree traversal from the leaf to the root node, collecting the name at each node on the way, will give you the full path. A little adjustment is needed for the mount point, because if your file system is mounted at a different point, you need to adjust the path accordingly.
One particular problem we saw is that if paths are very deep in the system, the traversal can have a small performance impact, so you might have to put a threshold on how deep you are willing to go.

So the very first task in extracting a path is to extract the corresponding dentry, and the dentry node is reachable from your task_struct. Based on what kind of path you are collecting, whether it is the process path or the current working directory, the corresponding dentry node can be extracted from the task_struct. Once you have a dentry node, the next thing you need to collect is the dentry for the mount point, because if the file system is mounted somewhere else, you need to adjust the path according to that mount point. Once you have all the dentries for the different paths, you simply call a function to get the actual path. This function does a tree traversal from the leaf node up to the root, and every dentry node has a field called d_name which you can use to get the name of the file or directory. If you look at this, there is a for loop, but it is a static loop: in an eBPF program you cannot have a dynamic loop, the bound has to be static, and that is the reason you have to place static limits on it.
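The dentry walk itself was only described, not shown, so here is a userspace C model of the idea (the struct and names are simplified stand-ins, not the kernel's): walk from the leaf dentry toward the root, which is its own parent in the kernel, under a static depth bound, then join the collected components front to back.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16          /* static bound, mirroring the verifier-imposed loop limit */

struct dentry_sim {
    const char *d_name;       /* stands in for dentry->d_name.name */
    struct dentry_sim *d_parent;
};

/* Walk from the leaf toward the root, then join components front-to-back.
 * Returns -1 if the path is deeper than the threshold. */
int build_path(const struct dentry_sim *leaf, char *out, size_t out_sz)
{
    const char *parts[MAX_DEPTH];
    int depth = 0;
    const struct dentry_sim *d = leaf;

    for (int i = 0; i < MAX_DEPTH && d && d->d_parent != d; i++) {
        parts[depth++] = d->d_name;   /* collect this component's name */
        d = d->d_parent;              /* step one level up the tree */
    }
    if (d && d->d_parent != d)
        return -1;                    /* bailed out before reaching the root */

    out[0] = '\0';
    for (int i = depth - 1; i >= 0; i--) {
        strncat(out, "/", out_sz - strlen(out) - 1);
        strncat(out, parts[i], out_sz - strlen(out) - 1);
    }
    return 0;
}
```

In the real eBPF program each `d->d_parent` and name access would go through a helper such as bpf_probe_read_kernel rather than a plain dereference, but the bounded-walk-then-join shape is the same.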
Secondly, as explained before, for performance reasons you do not really want to go very deep into the paths, and you only collect those that meet the threshold. Of the two loops, the first loop collects the path components and the second collects the mount point path, and combining these two paths gives you the final path for the corresponding process.

This part of the code gives you the command lines. Based on the probe, the index you read from varies: for execve it is the first index, and for execveat it is the second. First, if you look at the top line, it reads the address of the command-line array, which points into user space. Once you have that address, you loop over it, collecting each argument. For each one you have to do a similar two-step read: first extract the address of the argument at that index, which again points into user space, and then, once you finally have the user-space address of the argument, you read the argument from that address, place it into a local buffer, and send it on. In this way the process paths, current working directories, and command lines are collected and then passed to the user-space side via the ring buffer.

In order to understand how effective this method is, we did some experiments.
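Since the argv-reading code itself was not reproduced here, this is a userspace C model of that two-step loop (hypothetical names and bounds): in the real eBPF program both the pointer fetch and the string copy would be bpf_probe_read_user calls, and the loop bound must be static for the verifier.

```c
#include <stdio.h>

#define MAX_ARGS 8        /* static bound on the number of arguments collected */
#define ARG_LEN  32       /* static bound on each argument's length */

/* Copy up to MAX_ARGS NUL-terminated arguments into fixed-size local
 * buffers, stopping at the NULL terminator of the argv array.
 * Returns the number of arguments captured. */
int collect_args(char *const argv[], char out[][ARG_LEN])
{
    int n = 0;
    for (int i = 0; i < MAX_ARGS; i++) {     /* verifier requires a static bound */
        char *p = argv[i];                   /* step 1: read the pointer at index i */
        if (!p)
            break;                           /* end of the argv array */
        snprintf(out[n], ARG_LEN, "%s", p);  /* step 2: copy the string it points to */
        n++;
    }
    return n;
}
```

Arguments past MAX_ARGS, or longer than ARG_LEN, are silently truncated; that is the same trade-off the talk describes for deep paths, bounding work in the probe at the cost of completeness in extreme cases.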
We ran different commands representing short-lived processes, and the process metadata was collected with two different methods: one eBPF-based and the other procfs-based. We ran these experiments in three different scenarios: first, keeping the machine idle with no workload; second, a stress scenario where a large number of PostgreSQL stress queries ran in the background; and third, running an OpenSSL compilation. We really wanted to simulate an actual environment and see how effectively each method captures the processes on the system.

The results were great. If you look especially at the part representing the eBPF results, it is almost close to 100 percent: the data was always complete, with no gaps. If you look at the same for the procfs-based method, it got significantly reduced in the stress scenarios; for almost 30 percent of the processes there was actually no data.

Of course, there were some limitations. You need quite a good understanding of the kernel; eBPF programs are a little bit difficult to write, and you need some knowledge to do it. Another thing is that eBPF programs have to be thoroughly vetted by the verifier, so you have to write code that the BPF verifier will not reject, and any bug in the verifier itself can result in security vulnerabilities and kernel crashes. Also, as far as I know, BTF is not supported below 4.18 kernels, so for that set of kernels you still have to rely on other methods.
In terms of performance this is great compared to procfs, but you still need some balance: for example, for some of these paths you may still want to rely on procfs-based collection rather than doing all of the data collection in eBPF. If you can balance out what needs to be collected in BPF space versus user space, you can meet the requirements of the product.

Yeah, I am done. Thanks.

Q: Do you check if the BPF program is altered by an attacker?

A: So the question is how we verify whether the BPF program has been tampered with. I think you mean the BPF maps, because the maps are what is exposed there. This is a gap we are facing right now: maps can be tampered with, and we probably need a way to protect them. We are still exploring how to prevent users from tampering with the BPF maps.

Q: How did this work come up? Was it part of a security initiative?

A: Yes, it was part of a security initiative. We were observing a lot of gaps in the EDR timelines; researchers were saying that because the majority of the data was missing, they could not map events to the corresponding attacks, and that we needed to do something about the way we were collecting data. That started the thinking about how to close this gap, and we ended up using eBPF.
Q: How often do you need to poll the ring buffer from user space, and do you have visibility into whether it is being overwhelmed?

A: So the question is how often we poll the ring buffer and whether we know it is overflowing. We did a lot of experiments and found a certain number for how often we have to poll, but it is a problem we have seen: the ring buffer can actually overflow, so first you have to adjust the size of the ring buffer so that it can meet your requirements. Once you have the right size, you should be fine; I do not think there is anything beyond adjusting the ring buffer size and the polling interval. We poll every one second, and that gives us the right behavior.

Q: Microsoft recently published a project about running eBPF on Windows. Have you experimented with that project?

A: I belong to the Linux team, but I know people were exploring eBPF for Windows; right now it is more in the POC phase, they are still exploring it.

Thanks. Thank you.
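The sizing-versus-polling trade-off in that answer can be sketched with a toy single-threaded ring buffer model in C (not the kernel BPF ring buffer, just an illustration of why a burst larger than the buffer between two polls means dropped events):

```c
#define RB_CAP 4   /* deliberately tiny; the real ring buffer is sized in pages */

struct rb_sim {
    int  buf[RB_CAP];
    long head, tail;     /* head: next write slot, tail: next read slot */
    long dropped;        /* events lost because the consumer fell behind */
};

/* Kernel side: try to reserve a slot; drop the event if the buffer is full. */
int rb_produce(struct rb_sim *rb, int ev)
{
    if (rb->head - rb->tail == RB_CAP) {
        rb->dropped++;               /* consumer did not poll in time */
        return -1;
    }
    rb->buf[rb->head % RB_CAP] = ev;
    rb->head++;
    return 0;
}

/* User-space side: one poll iteration drains whatever is available.
 * Returns the number of events consumed. */
long rb_drain(struct rb_sim *rb)
{
    long n = 0;
    while (rb->tail < rb->head) {
        rb->tail++;                  /* consume rb->buf[(rb->tail - 1) % RB_CAP] */
        n++;
    }
    return n;
}
```

If six events arrive between two polls of a four-slot buffer, two are dropped no matter how fast the next poll is, which is why the speaker says you tune the buffer size first and the polling interval second.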