WEBVTT

00:00.000 --> 00:13.040
Okay, so let me welcome Alexander, who will be presenting dynamic bot blocking with

00:13.040 --> 00:16.320
web server access log analytics.

00:16.320 --> 00:26.840
Okay, thank you. So my name is Alexander, and instead of starting from who I am, I'm going

00:26.840 --> 00:29.160
to start from what we do.

00:29.160 --> 00:36.440
We develop Tempesta FW. It's a high-performance hybrid of an HTTP accelerator and a firewall, directly

00:36.440 --> 00:39.240
embedded into the Linux TCP/IP stack.

00:39.240 --> 00:45.680
So essentially we continue the TCP stack with the TLS and HTTP protocols; everything works

00:45.680 --> 00:47.440
inside the Linux kernel.

00:47.440 --> 00:53.960
The main focus of the product is high performance and security. We emphasize web security

00:53.960 --> 01:03.760
and complete DDoS protection, including volumetric DDoS protection and layer 7 DDoS protection.

01:03.760 --> 01:13.400
For volumetric DDoS protection, we also provide an XDP module, Tempesta xFW, which is a firewall against volumetric

01:13.400 --> 01:14.400
DDoS.

01:14.400 --> 01:20.960
Thanks to the in-kernel implementation, there are no copies and no context switches, so a lot of overhead is

01:20.960 --> 01:21.960
eliminated.

01:21.960 --> 01:27.760
So typically, Tempesta FW outperforms traditional HTTP servers in terms of high throughput

01:27.760 --> 01:29.920
and low latency.

01:29.920 --> 01:38.560
In terms of security, Tempesta has a very deep HTTP parser, which uses about an order of magnitude

01:38.560 --> 01:44.640
larger number of states than usual HTTP parsers, and this way we can catch a lot of injection

01:44.640 --> 01:51.920
attacks right at the HTTP parsing stage, so we don't need an additional security filtering stage.

01:51.920 --> 01:59.160
There are a lot of rate limits, from basic request rate limits to advanced HTTP/2 CONTINUATION

01:59.160 --> 02:02.160
frame rate limits.

02:02.160 --> 02:10.360
JavaScript and cookie challenges are available out of the box, and there is also a light key-value database,

02:10.360 --> 02:17.240
Tempesta DB, which keeps the web cache and filter rules.

02:17.480 --> 02:25.480
Thanks to the TCP/IP stack integration, Tempesta FW is coupled with netfilter, so you can

02:25.480 --> 02:31.000
write multi-level filter rules, like blocking traffic that comes from a particular IP address and

02:31.000 --> 02:35.880
has a particular value of the Referer header.

02:35.880 --> 02:44.240
Today I'll mostly focus on bots and how Tempesta FW handles bots, and on the

02:44.240 --> 02:46.640
different types of bots.

02:46.640 --> 02:54.960
The first one is DDoS, which could be a flood or slow HTTP. A flood is very easy to detect

02:54.960 --> 03:01.800
and block because floods are characterized by high volumes of packets or requests.

03:01.800 --> 03:08.360
Slow HTTP is more difficult to catch because there are no such high connection rates or request

03:08.360 --> 03:16.320
rates, but they target memory exhaustion, or CPU, or sockets; it's still possible to catch them, though.

03:17.240 --> 03:23.000
Security scanners and password crackers are also easy to detect because they usually

03:23.000 --> 03:27.560
imply high rates of HTTP error responses.

03:27.560 --> 03:35.280
And when we talk about bots, we usually mean specific types of bots. Content

03:35.280 --> 03:43.320
scrapers are among the most popular bots, and they are so widespread because of AI: AI needs

03:43.400 --> 03:53.760
a lot of data to learn from, and usually the data is scraped using bots.

03:53.760 --> 03:59.600
In e-commerce, inventory is hoarded using bots, and there are booking bots: for example,

03:59.600 --> 04:05.840
if some public service provides appointments, a bot can grab those appointments and

04:05.840 --> 04:07.840
sell them for some money.

04:07.840 --> 04:11.080
So there are a lot of types of bots.

04:11.120 --> 04:14.320
Why is it hard to protect against bots?

04:14.320 --> 04:21.120
With simple bots like security scanners, it's simple, but, for example, for scrapers,

04:21.120 --> 04:30.760
there are cloud vendors which provide proxies that help bots bypass protection.

04:30.760 --> 04:37.640
They can provide thousands of IP addresses; for example, LWN experienced such scrapers, which

04:37.680 --> 04:41.160
they considered to be DDoS attacks against them.

04:41.160 --> 04:45.760
They can solve CAPTCHAs and JavaScript challenges, and they can pretend to be a browser, meaning that they don't

04:45.760 --> 04:53.960
just send a simple HTTP request for the HTML page: the proxies also generate requests

04:53.960 --> 04:57.720
for CSS, JavaScript, and other included resources.

04:57.720 --> 05:02.560
They provide impersonation consistency, I'll talk about this later, and the most advanced

05:02.560 --> 05:14.160
of them emulate human behavior. This is a screenshot from a web search, and basically

05:14.160 --> 05:21.280
you can Google any large vendor, Cloudflare or whatever you can name, like

05:21.280 --> 05:28.240
the top 10, and there are plenty of companies providing services to bypass their protection.

05:28.240 --> 05:33.600
There's one highlighted, and those guys provide a very nice technical article on how they actually

05:33.600 --> 05:39.200
bypass Cloudflare and the other top 10 services. But that's not the end of the

05:39.200 --> 05:46.480
story: there is a very active community developing custom bypasses of bot protection, so even if it's not enough

05:46.480 --> 05:58.160
to use cloud services, you can bypass detection with custom development. To fight

05:58.160 --> 06:03.360
bots, we need to identify clients. That could be by IP address, the most traditional way, or it

06:03.360 --> 06:09.760
could be by fingerprint. Fingerprints are basically computed on different layers of the network

06:09.760 --> 06:17.600
stack as hash values; usually we do not use user-controlled data like the

06:17.600 --> 06:24.400
HTTP request. Next, we need to classify the traffic: it could be by packet rate, request rate, and so on. And

06:24.560 --> 06:32.080
also, for some cases like DDoS, we need some trigger events so as not to be in protection mode always.

06:33.120 --> 06:38.960
Traffic fingerprints are just like human fingerprints: there is a huge number of metrics from which we can

06:38.960 --> 06:47.360
compute a fingerprint and generate a hash value. An example is JA3, the most popular fingerprint.

06:47.440 --> 06:55.280
It's computed over TLS parameters: it's an MD5 over several TLS parameters, and you might see it as a

06:56.240 --> 07:05.360
hexadecimal string in an access log. JA4 is a further development of JA3. It has three parts: the first part is

07:06.960 --> 07:11.600
a compact version of the data, and the second and the third parts are

07:11.600 --> 07:19.200
SHA-256 hashes. The problem with cryptographic hashes is that one of the main properties of a cryptographic

07:19.200 --> 07:25.360
hash is to distribute the data well, so if you have two inputs which differ in only one

07:25.360 --> 07:31.840
bit, you get fingerprint values which differ in many bits. However, if we use fingerprints

07:31.840 --> 07:38.640
for some machine classification, we need a similarity metric, so if you have very similar

07:38.720 --> 07:45.920
inputs, we need very similar fingerprints. Also, inside the kernel, the cryptography API is very

07:45.920 --> 07:55.040
expensive: we need to save and restore the FPU state for it. Just to mention, p0f also works over

07:55.040 --> 08:02.000
all the network layers, but it also tries to guess software versions. We developed our own fingerprints,

08:02.000 --> 08:07.840
which do not use cryptography, to be fast and to be machine-learning friendly.

08:07.840 --> 08:13.920
This is a small example of how they are computed; it is very similar to JA3 or JA4.
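
NOTE
A minimal sketch of the JA3-style hashing the speaker describes: an MD5 over the
comma-separated ClientHello fields, with the values inside each field joined by
dashes. The two ClientHello value sets are hypothetical, and hamming() is an
illustrative helper, not part of JA3 or of Tempesta FW. It shows why a
cryptographic digest is a poor similarity metric: a single changed cipher suite
yields an unrelated digest.
```python
import hashlib
def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """JA3-style digest: MD5 over 'version,ciphers,extensions,curves,formats'."""
    lists = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(map(str, values)) for values in lists]
    return hashlib.md5(",".join(fields).encode()).hexdigest()
def hamming(a_hex, b_hex):
    """Number of differing bits between two equal-length hex digests."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")
# Two hypothetical ClientHellos that differ in one cipher suite only.
a = ja3_hash(771, [4865, 4866, 4867], [0, 11, 10, 16], [29, 23, 24], [0])
b = ja3_hash(771, [4865, 4866, 49195], [0, 11, 10, 16], [29, 23, 24], [0])
print(a, b, hamming(a, b))
```
The printed bit distance is typically large even though the inputs are nearly
identical, which is the property that makes such digests awkward for
classification.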

08:15.520 --> 08:24.640
Next, you see the fingerprints in access logs, like the example on the slide. Usually HTTP servers

08:24.640 --> 08:32.880
store access logs in files as formatted strings, and there are performance problems with

08:33.040 --> 08:38.320
this way of storage: formatting strings is slow, and writing to a file is also slow, so it's usually

08:38.320 --> 08:47.040
hard to write a large amount of data to files, and even harder to analyze such an amount of

08:47.040 --> 08:53.360
data. To cope with the problem, people usually use relatively sophisticated pipelines to deliver

08:53.360 --> 08:59.600
the logs to some analytics database. We ended up with ClickHouse because of its high ingestion

08:59.600 --> 09:08.560
performance and powerful analytics. To quickly send our access logs to ClickHouse, we

09:08.560 --> 09:17.040
have the Tempesta FW logger, and it has helped us fight against bots in several cases. The first case

09:17.040 --> 09:25.280
is an e-commerce website with inventory hoarding, which is characterized by high rates of requests to

09:25.280 --> 09:32.160
the cart. In that case we saw thousands of IP addresses and fake user agents, so we used the

09:32.160 --> 09:42.880
ClickHouse query here, where we find the top TLS fingerprints which have a ratio of

09:43.840 --> 09:53.040
cart URL accesses to all other URLs of more than 30%, and we see that the very top result

09:53.040 --> 10:02.960
not only produced a lot of requests, but also has a much larger ratio between cart requests and

10:02.960 --> 10:11.600
all others. The next one is security scanning. We run our website on WordPress, and it still
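
NOTE
The query on the slide is not in the transcript, so this is only a sketch of the
same idea in plain Python: per TLS fingerprint, compare cart requests with all
other requests and keep the fingerprints whose ratio exceeds 30%. The record
layout, the "/cart" prefix, and the sample rows are assumptions.
```python
from collections import Counter
def suspicious_fingerprints(records, cart_prefix="/cart", min_ratio=0.3):
    """Return (fingerprint, total requests, cart/other ratio), busiest first."""
    total, cart = Counter(), Counter()
    for fingerprint, uri in records:
        total[fingerprint] += 1
        if uri.startswith(cart_prefix):
            cart[fingerprint] += 1
    rows = []
    for fingerprint, requests in total.items():
        # Guard against division by zero when a client requests only the cart.
        ratio = cart[fingerprint] / max(requests - cart[fingerprint], 1)
        if ratio > min_ratio:
            rows.append((fingerprint, requests, ratio))
    return sorted(rows, key=lambda row: row[1], reverse=True)
# Hypothetical access log rows: (TLS fingerprint, requested URI).
log = [("aa01", "/cart/add")] * 8 + [("aa01", "/")] * 2
log += [("bb02", "/")] * 9 + [("bb02", "/cart/add")]
print(suspicious_fingerprints(log))
```
Grouping by fingerprint rather than by IP address matters here because the
attack came from thousands of addresses.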

10:11.600 --> 10:20.560
opens an XML-RPC endpoint by default, and this is a popular endpoint for layer 7 DDoS attacks. If you

10:20.560 --> 10:25.840
just query the database for how frequently it is accessed, the only more frequently accessed

10:25.840 --> 10:31.120
URL for us is the index page, so a lot of traffic comes from security-scanning

10:31.120 --> 10:39.120
bots. Usually, the endpoint is exploited using POST requests, and we verified that we really

10:39.120 --> 10:45.200
replied with a successful response to such POST requests, so in that case we were really

10:45.280 --> 10:54.400
vulnerable to layer 7 DDoS. Next, we investigated what the bots having the top three TLS fingerprints

10:56.000 --> 11:02.160
do on our website, and we found that, besides that endpoint, they also visit regular

11:03.680 --> 11:10.640
URLs, shown in green, but also queries like these. I noticed that two of the TLS fingerprints are

11:11.360 --> 11:20.320
ending with zero bits, and the least significant four bits of our TLS fingerprint encode ALPN:

11:20.320 --> 11:27.680
it's a negotiation by the client of which HTTP version it is going to talk. Usually, a browser places some

11:27.680 --> 11:33.200
value there, but in this case there is no value. This is quite unusual for normal clients, and we can be

11:33.280 --> 11:41.280
pretty sure that these TLS fingerprints are malicious. In that case, we didn't block the fingerprints,

11:41.280 --> 11:47.120
because some false positives are possible and fingerprints can change; we blocked the endpoint

11:47.120 --> 11:56.800
just by URL on the Tempesta FW side. The last example is slow HTTP. It tries to consume as much

11:56.880 --> 12:03.680
server time as possible, but we can't just take the largest cumulative time, because that will

12:03.680 --> 12:12.800
be just the most popular TLS fingerprint, and we cannot just block the top talkers,

12:12.800 --> 12:20.320
or the requests which produced the longest response time, because those will be just

12:20.320 --> 12:28.720
unlucky clients. But if we try to intersect the top talkers with the clients who produce the

12:28.720 --> 12:37.760
longest requests, we might see interesting results, and in this query we also see zero ALPN

12:37.760 --> 12:43.440
and a very high value of average response time. The query is pretty large; it's not handy to
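
NOTE
A sketch of the intersection idea in plain Python rather than the large query
from the slide: take the top talkers by request count, take the clients with
the longest average response time, and keep only those in both sets. The
record layout, the cut-off n, and the sample rows are assumptions.
```python
from collections import defaultdict
def top_n(scores, n):
    """Keys with the n highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])
def slow_talkers(records, n=2):
    """Fingerprints in both the top-n by requests and top-n by average time."""
    count, total_time = defaultdict(int), defaultdict(float)
    for fingerprint, response_time in records:
        count[fingerprint] += 1
        total_time[fingerprint] += response_time
    average = {fp: total_time[fp] / count[fp] for fp in count}
    return top_n(count, n) & top_n(average, n)
# Hypothetical rows: (TLS fingerprint, response time in seconds).
log = [("browser", 0.05)] * 50 + [("slowbot", 9.0)] * 30
log += [("unlucky", 12.0)] + [("rare", 0.1)] * 2
print(slow_talkers(log))
```
The popular browser fingerprint and the single unlucky client each appear in
only one of the two sets, so neither is reported.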

12:43.520 --> 12:54.320
write it by hand, so we developed a small, extensible Python daemon, which runs such queries

12:54.320 --> 13:03.840
on your behalf. Such queries are named detectors, and it can issue automatic

13:04.400 --> 13:13.200
blocking rules for ipset, for nftables, or for Tempesta FW, based on the detector findings,

13:13.200 --> 13:22.720
and the detectors and defense mode can be switched on by some triggers, like z-scores, or manually.

13:23.840 --> 13:29.840
This is also the detector validation logic: we can configure a couple of detectors,

13:29.840 --> 13:39.120
for fingerprints by requests per second, or by error response codes per second, and define

13:39.840 --> 13:46.800
the threshold. This is a z-score value of ten, meaning that when we reach 10 standard deviations,

13:46.800 --> 13:53.760
we go to defense mode. Next, the algorithm works like this: we first make the query for

13:53.840 --> 14:00.320
the requests per second under attack, and we repeat the query for one hour ago. As for the
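
NOTE
A minimal sketch of a z-score trigger with the threshold of 10 standard
deviations mentioned in the talk. How the real daemon builds its baseline is
not in the transcript; comparing the current rate with samples from an hour
earlier, and the function names used here, are assumptions.
```python
import statistics
def z_score(current, history):
    """Distance of `current` from the mean of `history`, in standard deviations."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A flat baseline: any increase counts as an unbounded deviation.
        return float("inf") if current > mean else 0.0
    return (current - mean) / stdev
def defense_mode(current_rps, baseline_rps, threshold=10.0):
    """True when the current rate is at least `threshold` deviations above baseline."""
    return z_score(current_rps, baseline_rps) >= threshold
# Hypothetical requests-per-second samples from one hour ago.
baseline = [100, 110, 90, 105, 95]
print(defense_mode(120, baseline), defense_mode(1000, baseline))
```
With this baseline a rate of 120 is under three deviations from the mean, while
1000 is far beyond ten, so only the second call reports an attack.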

14:02.000 --> 14:07.920
defense against bots: in our experience, we saw that the User-Agent is still

14:08.880 --> 14:15.120
a reliable metric to filter on, and a lot of bots still expose themselves, not trying to

14:15.120 --> 14:20.800
pretend to be any real-life browser. Also, TLS fingerprints work in most of the cases, as

14:20.800 --> 14:28.240
we saw in the examples. However, there is a problem with GREASE and with Firefox fingerprint protection:

14:28.800 --> 14:34.880
this is random noise. We can live with it with just normalization of fingerprints.
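
NOTE
One possible normalization, sketched under assumptions: drop the reserved
GREASE code points defined by RFC 8701 (0x0A0A, 0x1A1A, ..., 0xFAFA) and sort
the remaining values before fingerprinting, so that random noise and ordering
do not change the result. Sorting is an illustrative choice; the transcript
does not say which normalization Tempesta FW applies.
```python
def is_grease(value):
    """True for RFC 8701 GREASE values: both bytes equal, each with low nibble 0xA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)
def normalize(values):
    """Drop GREASE values and sort the rest into a stable tuple."""
    return tuple(sorted(v for v in values if not is_grease(v)))
# Two hypothetical extension lists from the same client software.
first = normalize([0x1A1A, 0, 11, 10, 16])
second = normalize([16, 0xFAFA, 10, 0, 11])
print(first == second)
```
Both lists reduce to the same tuple, so a fingerprint computed over the
normalized values stays stable across connections.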

14:35.920 --> 14:41.520
The good news is that, even without normalization, normal browsers will produce

14:41.600 --> 14:51.440
a lot of fingerprints, but if a bot doesn't try to impersonate, we will see a single, huge

14:51.440 --> 14:57.760
spike for the same TLS fingerprint. The bad news is that there are a lot of open source libraries

14:57.760 --> 15:04.560
to bypass this and to provide impersonation for bot developers. I want to conclude with

15:04.560 --> 15:13.920
two key ideas. The first one is that it might not be wise to rely solely on large vendors, because

15:13.920 --> 15:23.440
large vendors protect a lot of websites, and some of those websites are pretty large and good targets for

15:23.440 --> 15:29.600
bot developers. So the whole crowd puts a lot of effort into bypassing the large

15:29.600 --> 15:36.160
vendors, and a custom solution might be more reliable in comparison with a large vendor.

15:36.160 --> 15:41.840
By a custom solution, I don't mean just repeating some of the strategies of the large vendors.

15:41.840 --> 15:48.880
Instead, there is a good discussion in the Shopify community, where people are

15:50.640 --> 15:57.200
developing bot-protection logic right on the web resource database. So if you have an

15:57.200 --> 16:02.160
e-commerce website, you have a database of your goods, your clients, and so on, and in that

16:02.160 --> 16:08.960
database you have your secret weapon: the knowledge of how your users interact with your

16:08.960 --> 16:15.760
website. This is unique knowledge and a unique pattern for your resource only. And this way, you can

16:17.680 --> 16:22.480
take a simple Python program, integrate the access log with your

16:23.200 --> 16:27.600
production database, and develop a very strong behavioral analytics

16:28.640 --> 16:36.000
solution which will be resilient against bot attacks. That's all. WebShield and

16:36.000 --> 16:42.960
Tempesta are available on GitHub, along with, just in case, a large knowledge base on what you can do on the

16:43.040 --> 16:48.640
mitigation side. That's all. Thank you, and maybe we have questions.

16:55.040 --> 16:56.320
We have time for a question, so

17:13.920 --> 17:19.280
Hi, I was wondering how you deal with the difference between, like, a DDoS and a

17:19.280 --> 17:22.960
hug of death, where you've got, like, thousands of legitimate clients who've all

17:23.760 --> 17:28.400
come to a specific page from Reddit or something. Sorry, the difference between DDoS and

17:28.400 --> 17:35.760
the hug of death? Hug of death, like when somebody posts a link on Reddit and so, like, you

17:35.760 --> 17:40.240
suddenly get tens of thousands of people all going to one page, but they're all real, I mean, like a

17:40.240 --> 17:47.440
flash crowd, right? You mean just, like, legit users, just a lot of traffic, right? But legit,

17:47.440 --> 17:58.400
normal users. For now, it's simple, we mostly work with the request attributes, but it's

17:58.400 --> 18:06.560
possible to do it with behavioral analysis. One of the points is that during a DDoS, you never see

18:06.640 --> 18:11.680
sophisticated patterns, like a client first requests the index page, then traverses to some

18:11.680 --> 18:22.400
goods page and makes some search, and produces more or less normal traffic. So the second

18:22.400 --> 18:30.000
point would be to build transition graphs. I believe that's a good way to differentiate a lot of real

18:30.080 --> 18:37.120
users from DDoS traffic, as DDoS traffic is usually very simple, just requesting the same

18:37.120 --> 18:59.920
resource. DDoS... I'm sorry. We still have time for a question, I think, but please be quiet,

18:59.920 --> 19:10.720
please don't go anywhere yet. There is still a little bit of time for that. Any more questions? No questions?

19:11.440 --> 19:15.200
Okay, thank you. Thank you very much.