WEBVTT

00:00.000 --> 00:10.720
Hello everyone, my name is Alsu. It's my first time at FOSDEM and my first time presenting

00:10.720 --> 00:15.600
at a conference, so I'm a bit nervous, so don't judge me. Today I'm going to be talking

00:15.600 --> 00:19.880
about testing support for multiple authentication methods in ClickHouse using combinatorics

00:19.880 --> 00:22.560
and behavior models.

00:22.560 --> 00:30.440
I work as a QA engineer at Altinity and I'm also pursuing my master's degree in data science

00:30.440 --> 00:34.320
at LMU in Munich, and I got my bachelor's degree in Applied Mathematics and Computer Science

00:34.320 --> 00:36.520
from Moscow State University.

00:36.520 --> 00:41.760
A little bit about our company: we provide managed services and support for ClickHouse, develop

00:41.760 --> 00:46.200
features for ClickHouse, and run other open source projects that are listed here. Feel

00:46.200 --> 00:51.760
free to join our community, where we help each other, discuss new topics, and do such

00:51.760 --> 00:52.760
things.

00:52.760 --> 00:57.440
The feature that we're going to test today is from ClickHouse. It's an open source

00:57.440 --> 01:03.000
columnar database designed for real-time analytics. It's super fast, super efficient, and

01:03.000 --> 01:07.120
the feature is called multiple authentication methods. It's a recent addition to ClickHouse

01:07.120 --> 01:13.080
by an Altinity developer for better security and flexibility, and it simply allows

01:13.080 --> 01:18.440
a user to have multiple authentication methods, either of the same type or of different types.

01:18.440 --> 01:21.480
So let's get familiar with this feature.

01:21.480 --> 01:26.480
In ClickHouse, in order to create a user, we use the CREATE USER query, or statement. Before,

01:26.480 --> 01:32.040
we could only have one authentication method per user: we use CREATE USER, some

01:32.040 --> 01:38.320
username, in this case it's name1, IDENTIFIED WITH, the authentication type, here it's

01:38.320 --> 01:43.440
plaintext_password, the BY clause, and the password itself, here it's the string my_password.

01:43.440 --> 01:47.320
But now you can specify multiple authentication methods, separated by commas.

01:47.320 --> 01:51.960
Here we specified three methods, and the user will be able to log in with all three passwords,

01:51.960 --> 01:58.240
1, 2, and 3. The internal representation of the user also changed: before, the authentication

01:58.240 --> 02:03.240
type was stored as an Enum8, but now it's stored as an array of enums,

02:03.240 --> 02:09.840
and the authentication parameters are also stored as an array.

02:09.840 --> 02:15.640
In order to change a user, we use the ALTER USER statement. Before, we could replace

02:15.640 --> 02:21.080
one authentication method with another, but now we can replace one set of multiple authentication

02:21.080 --> 02:25.800
methods with another set of multiple authentication methods. By using the ALTER USER

02:25.800 --> 02:30.120
IDENTIFIED WITH statement, we override the previous authentication methods with new ones,

02:30.120 --> 02:34.920
and we will only be able to log in to the ClickHouse server with the new authentication methods.

02:34.920 --> 02:38.600
But what if we don't want to override the methods?

02:38.600 --> 02:44.680
Now ClickHouse supports the ALTER USER ADD IDENTIFIED WITH statement, which basically adds new authentication

02:45.000 --> 02:50.200
methods to the user. So by executing this query, the user will be able to log in with the previous

02:50.200 --> 02:57.560
passwords and also with the new passwords 6 and 7. Also, one more statement, ALTER USER

02:57.560 --> 03:03.480
RESET AUTHENTICATION METHODS TO NEW, was introduced. It was inspired by MySQL's DISCARD OLD PASSWORD,

03:03.480 --> 03:08.280
and it basically deletes all the passwords except the most recently added one. So the user will

03:08.280 --> 03:16.040
be able to log in with the newest password only. Also, VALID UNTIL was improved. Now you can have

03:16.040 --> 03:21.320
a separate expiration date for each authentication method, but if you use the VALID UNTIL

03:21.320 --> 03:27.720
clause without the IDENTIFIED WITH clause, you apply this expiration date to all authentication

03:27.720 --> 03:34.600
methods that the user has at the moment you execute this query. Now let's check an example to

03:34.600 --> 03:39.800
better understand this feature. In the first query, we create a user Bob identified with plain

03:39.800 --> 03:45.000
text password by 1, bcrypt password by 2, plain text password by 3. Here we created a user

03:45.000 --> 03:50.280
with three authentication methods, and user Bob can log in to ClickHouse with passwords 1, 2, and 3.

03:50.280 --> 03:55.320
After that, we change Bob's authentication methods to plain text password by 4,

03:55.320 --> 04:01.160
bcrypt password by 5. Now user Bob can only log in with passwords 4 and 5

04:01.240 --> 04:07.240
because we have overwritten his authentication methods. Now we are adding two new authentication methods

04:07.240 --> 04:12.600
to Bob, and he is able to log in with passwords 4, 5, 6, and 7 because we added 6

04:12.600 --> 04:17.480
and 7. And in the last query we are resetting authentication methods to new. In this case,

04:17.480 --> 04:23.240
Bob will be able to log in to ClickHouse only with password 7, because bcrypt password by 7

04:23.240 --> 04:28.680
is the last authentication method that was added to him. So imagine how hard it would be to test

04:29.080 --> 04:33.320
this feature by hand, because you have so many permutations of these authentication methods, and also

04:34.200 --> 04:40.520
a lot of different actions can be performed. This is why we're using combinatorial testing to test

04:40.520 --> 04:46.840
this feature. So what is combinatorial testing? It's a way of testing software by checking different

04:46.840 --> 04:54.040
combinations of input parameters. Here is a small example: we have three variables, A, B, and C, and each

04:54.120 --> 05:00.520
variable has two values. So in this case, we're going to get eight test cases. But in real

05:00.520 --> 05:07.720
world examples, the number of variables is not just a handful, it's tens, hundreds, or even thousands, and the

05:07.720 --> 05:14.280
range of values that each variable can take is also pretty huge. This tree will grow exponentially

05:14.280 --> 05:21.400
and you end up with a lot of different combinations that you want to test. We do that to catch

05:21.400 --> 05:28.680
any potential issues that interactions between variables could cause. This way we can

05:28.680 --> 05:39.080
expand our test coverage and not focus on just single-scenario tests. But generating

05:39.080 --> 05:45.080
these combinations is not enough. We also need to know the expected result for all these combinations.
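
NOTE
A minimal sketch of the small A, B, C example from this part of the talk, assuming
two illustrative values (0 and 1) per variable. The names are placeholders, not
code from the talk. It only enumerates the combinations; it does not say what the
expected result of each one is.
```python
from itertools import product
# Three hypothetical variables, each with two values (illustrative only).
variables = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
# Every combination of values is one test case: 2 * 2 * 2 = 8.
test_cases = [dict(zip(variables, values)) for values in product(*variables.values())]
print(len(test_cases))
```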

05:45.080 --> 05:51.960
And for that, we're going to use oracles. When we're talking about combinatorial testing, the test

05:51.960 --> 05:58.200
oracle problem arises: it's the problem that we cannot say what the expected result is for a given test case.

06:00.360 --> 06:05.400
An oracle is just some function, model, or anything else that can tell us the expected

06:05.400 --> 06:09.880
result for a given test case. How are we going to use this oracle? We have a test case,

06:09.880 --> 06:14.360
we run it on the software that we're testing, and we also ask the oracle

06:14.440 --> 06:20.280
what the expected result is for the given test case. Then we compare the two outcomes: if they

06:20.280 --> 06:27.400
are the same, then we say that the test passed; otherwise the test failed. There are different types

06:27.400 --> 06:33.400
of oracles. The first one is automated oracles: just some algorithm that can automatically tell us

06:33.400 --> 06:39.640
the expected result for a given test case. There are also human-based oracles, which rely on

06:39.720 --> 06:45.640
human judgment or domain experts, and there are hybrid approaches that combine automation with

06:45.640 --> 06:55.560
human expertise. Today we're going to focus on creating automated oracles, or behavior

06:55.560 --> 07:02.200
models to automatically tell the expected result for generated combinations.
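
NOTE
A toy automated oracle for the Bob example earlier in the talk, as a sketch only.
It assumes each action is a (kind, passwords) pair; the function and the action
names are hypothetical and are not taken from the real test suite. It encodes the
rules as described: IDENTIFIED WITH replaces the set, ADD IDENTIFIED WITH appends,
and RESET AUTHENTICATION METHODS TO NEW keeps only the most recently added method.
```python
def expected_passwords(actions):
    """Toy oracle: return the passwords that should still work after the actions."""
    methods = []
    for kind, passwords in actions:
        if kind in ("create", "alter"):  # IDENTIFIED WITH replaces the whole set
            methods = list(passwords)
        elif kind == "add":  # ADD IDENTIFIED WITH appends new methods
            methods = methods + list(passwords)
        elif kind == "reset":  # RESET ... TO NEW keeps the most recently added one
            methods = methods[-1:]
    return methods
bob = [("create", ["1", "2", "3"]), ("alter", ["4", "5"]), ("add", ["6", "7"]), ("reset", [])]
print(expected_passwords(bob))
```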

07:02.600 --> 07:10.600
So the first thing that we want to do to test our multiple authentication methods feature

07:10.600 --> 07:17.000
is to define the input parameters for the feature. And we're going to think of them in terms of

07:17.000 --> 07:22.040
actions because it's the most intuitive approach, and we're going to answer the question: what are

07:22.040 --> 07:28.280
the possible actions that a user can perform with this feature? So basically, a short list:

07:28.280 --> 07:32.200
we can create a user with multiple authentication methods, we can change a user's authentication methods,

07:32.200 --> 07:37.400
we can add new authentication methods to a user, and we can reset authentication methods to the most recently

07:37.400 --> 07:43.240
added method. And of course we can just drop a user. But testing these actions separately does not

07:43.240 --> 07:49.880
make much sense, because in the real world people who use software perform sequences of actions,

07:49.880 --> 07:54.920
and we want to make sure that these sequences work correctly. That's why we will test

07:54.920 --> 08:05.240
sequences of these actions. Let's do some math. Let's focus on one action, CREATE USER,

08:05.240 --> 08:12.280
and calculate how many ways we have to create a user. But first we're

08:12.280 --> 08:18.680
going to introduce some assumptions for the sake of simplicity. We will say that a user can

08:18.680 --> 08:24.920
have no more than two authentication methods assigned or changed at once, and the authentication methods

08:24.920 --> 08:30.920
can only be selected from the following five types. In total, ClickHouse has 13 different

08:30.920 --> 08:37.400
authentication types, but for simplicity we will be working with five. So in order to create a user

08:37.400 --> 08:45.160
with one authentication method we obviously have five ways, choosing one of each of these, and

08:45.400 --> 08:55.880
what happened? Okay, everything works. Let's calculate how many ways we have to create

08:55.880 --> 09:03.080
a user with two authentication methods. It's just the number of combinations of two from five; we use

09:03.080 --> 09:08.680
this formula and we get 10 ways. So in total we have 15 ways to create a user, with each

09:08.680 --> 09:13.480
user having no more than two authentication methods, and these authentication methods are selected

09:13.480 --> 09:20.120
from five available types. The same math applies to the ALTER USER IDENTIFIED WITH statement,

09:20.120 --> 09:25.560
and the same for ALTER USER ADD IDENTIFIED WITH. For RESET AUTHENTICATION METHODS TO NEW we only have one

09:25.560 --> 09:32.040
way, because this query doesn't have any parameters. So in total we will have 15 plus 15 plus 1,

09:32.040 --> 09:40.360
31 different ways of changing a user's authentication methods with the ALTER USER statement. Now let's

09:40.680 --> 09:56.680
check sequences that start with a CREATE followed by three queries that change the user. For the first query we will

09:56.680 --> 10:03.080
have 15 different ways to create a user, as we calculated before, and for the second, third, and fourth

10:03.080 --> 10:09.720
we will have 31 ways each. Then we take the Cartesian product of all these numbers and we

10:09.720 --> 10:16.360
get almost half a million different sequences. That's a huge number, and it would be so painful
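
NOTE
The arithmetic from this part of the talk, written out as a short sketch. The
variable names are illustrative; the numbers follow the stated assumptions (five
authentication types, at most two methods per query, one CREATE followed by three
ALTER queries).
```python
from math import comb
auth_types = 5  # simplifying assumption stated in the talk
create_ways = comb(auth_types, 1) + comb(auth_types, 2)  # 5 + 10 = 15
alter_ways = create_ways + create_ways + 1  # IDENTIFIED WITH, ADD IDENTIFIED WITH, RESET
sequences = create_ways * alter_ways ** 3  # one CREATE followed by three ALTERs
print(create_ways, alter_ways, sequences)
```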

10:16.360 --> 10:22.760
to do that by hand. That's why we need some oracle to tell the expected result for all

10:22.760 --> 10:29.560
these combinations. And why did I decide to stick with four actions? Basically, we have four states:

10:30.040 --> 10:36.440
user was created, user was changed, new authentication methods were added to the user, and authentication

10:36.440 --> 10:43.400
methods were reset. By having sequences of length four, we will have all these states in the

10:43.400 --> 10:47.960
sequence, and we will also check all the transitions from one state to another, because we're taking

10:47.960 --> 10:57.480
the Cartesian product. We take all 15 ways here and 31 here, and we will check all the transitions

10:57.560 --> 11:03.400
from one state to another. We want to have efficient coverage without unnecessary complexity,

11:03.400 --> 11:14.520
because it's already too complex. So let's sketch our test. We have four queries: one CREATE query and

11:14.520 --> 11:20.600
three ALTER queries, and after executing each query we will check that the result of execution is

11:20.600 --> 11:26.280
correct, and we will try to log in to ClickHouse with the authentication methods seen in

11:26.280 --> 11:30.760
all previous queries. So in this case we will try to log in with every authentication method

11:30.760 --> 11:36.040
seen in the CREATE query. After executing the first ALTER, we will try to log in with every authentication method

11:36.040 --> 11:42.680
seen in the CREATE query and in the ALTER query, and in the end we will try to log in with every authentication method

11:42.680 --> 11:52.440
that's seen in this sequence. Now let's check the so-called architecture of the test. We have two

11:52.440 --> 11:57.320
constructors, the CREATE USER query constructor and the ALTER USER query constructor, that basically

11:57.320 --> 12:04.440
construct the different queries for us, and we have two actions, actually three actions:

12:05.400 --> 12:11.080
the create user action and the alter user action. They call the constructors, the constructors return

12:11.080 --> 12:17.320
the queries to them, they execute the queries, and after that they build the behavior state.

12:18.200 --> 12:25.240
This is just an object that stores the query, user name, authentication methods, and other parameters,

12:25.240 --> 12:30.440
but most importantly it stores the exit code and output message that we received from ClickHouse.

12:31.080 --> 12:37.000
After that, this behavior state is appended to the global behavior, and after that the action

12:37.000 --> 12:43.080
goes to the model, and the model computes the expected exit code and output message based on the current

12:43.080 --> 12:50.760
state and on the behavior. Then it compares the expected exit code and message with the ones

12:50.760 --> 12:56.840
stored in the behavior state, and if they are the same then the test passed; otherwise the test failed.

12:56.840 --> 13:02.920
The alter user action does the same: it creates a state, appends the state to the behavior, goes to the model,

13:02.920 --> 13:09.960
asks whether everything is okay, and so on. For the log in to ClickHouse action, we execute SELECT currentUser(),

13:09.960 --> 13:16.440
which returns the user that is currently executing the query, and we check that this user is the

13:16.440 --> 13:27.000
same as in the state, that is, whether we tried to log in with a valid user. And now let's check the code, but before that

13:27.640 --> 13:33.640
I will say a couple of words about TestFlows. We're using this framework for writing test programs;

13:33.640 --> 13:38.440
it supports behavior, parallel, combinatorial, and requirement-driven testing, and if you want to know

13:38.440 --> 13:46.120
more about TestFlows you can visit the website. So let's check the code.

13:46.120 --> 13:52.040
It's probably too small, so I'm going to just explain what's going on here. It's the CREATE USER constructor.

13:52.040 --> 13:59.000
It's just a super basic Python class that has some attributes and methods and each method just

13:59.000 --> 14:05.000
generates the query step by step. First we create an instance of the class, and we have this query.

14:05.080 --> 14:11.480
After we add the username, we get this query. We add IDENTIFIED, basically adding

14:11.480 --> 14:17.720
just a string, and then we call a method to add an authentication method to the query. Here we call

14:17.720 --> 14:22.360
set with no password, and we add IDENTIFIED WITH no_password, and in the end we get a

14:22.360 --> 14:33.320
perfectly valid ClickHouse query that we can execute. Here is how the model looks that gives us an expected result
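
NOTE
A minimal sketch of a step-by-step query constructor like the one described in
this part of the talk. The class and method names are hypothetical; the real
constructor in the test suite may look different. Only the resulting SQL shape
(CREATE USER ... IDENTIFIED WITH ..., methods separated by commas) follows the talk.
```python
class CreateUserQuery:
    """Hypothetical builder: each method appends one piece of the query."""
    def __init__(self):
        self.query = "CREATE USER"
        self.auth_methods = []
    def set_username(self, name):
        self.query += f" {name}"
        return self
    def set_identified(self):
        self.query += " IDENTIFIED"
        return self
    def set_with_no_password(self):
        return self._add_method("no_password")
    def set_with_plaintext_password(self, password):
        return self._add_method(f"plaintext_password BY '{password}'")
    def _add_method(self, clause):
        # The first method follows WITH; later ones are separated by commas.
        self.query += (" WITH " if not self.auth_methods else ", ") + clause
        self.auth_methods.append(clause)
        return self
query = CreateUserQuery().set_username("user1").set_identified().set_with_no_password().query
print(query)
```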

14:33.320 --> 14:40.840
for every test case. The main method of the model is the expect method, which goes through

14:40.840 --> 14:46.360
the other expect methods, and each expect method is responsible for catching a certain error,

14:46.360 --> 14:50.920
a certain exception. The first one catches that no_password cannot be used with

14:50.920 --> 14:56.120
the ADD IDENTIFIED statement, because it does not make sense to have no_password with another authentication

14:56.120 --> 15:02.760
method; it's not secure, and that's why it's forbidden by design. If we're not using

15:02.760 --> 15:09.320
ADD IDENTIFIED, we go to the next expect method, which is that a non-existing user cannot be altered.

15:09.320 --> 15:15.240
If we don't have the user, we cannot change his authentication methods or do

15:15.240 --> 15:21.320
anything with this user. So if we're not doing that we will go to the next expect method which is

15:21.320 --> 15:26.600
no_password cannot be used with another authentication method. It's similar to the first one, but for

15:26.680 --> 15:32.520
example you cannot create a user with the query CREATE USER user_name IDENTIFIED WITH no_password

15:32.520 --> 15:36.760
and plaintext_password, because if a user is identified with no_password, he can log in with

15:36.760 --> 15:46.360
any password he likes. And so on: it checks for all possible errors, and if everything is

15:46.360 --> 15:53.480
okay we go to the last method, expect OK, which expects that we will not see

15:53.480 --> 16:03.640
exceptions and the exit code will be zero. This is how the test looks. First I create two lists:

16:03.640 --> 16:09.320
ways to create a user and ways to alter. This list will have 15 ways, as we calculated. After

16:10.360 --> 16:18.040
adding all the change-user methods we will have 31 ways to change a user. Here we take the

16:18.040 --> 16:23.320
Cartesian product, ending up with almost half a million combinations, and then we execute the

16:23.320 --> 16:30.520
combinations in parallel using a pool of threads to make the test faster. We call this scenario, which

16:30.520 --> 16:39.400
just executes the actions and then tries to log in after every action. Let's check how the model works.
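
NOTE
A rough sketch of the test structure described in this part of the talk, using
only the Python standard library as a stand-in for the TestFlows pool the talk
uses. The placeholder strings and the scenario function are hypothetical. To keep
the sketch quick, only the first 1000 of the 446,865 combinations are run.
```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice, product
ways_to_create = [f"create_{i}" for i in range(15)]  # placeholders for 15 CREATE USER variants
ways_to_alter = [f"alter_{i}" for i in range(31)]  # placeholders for 31 ALTER USER variants
def scenario(sequence):
    """Stand-in for the real scenario: run each action, then try to log in."""
    return len(sequence)  # a real test would execute queries and ask the model
combinations = product(ways_to_create, ways_to_alter, ways_to_alter, ways_to_alter)
sample = islice(combinations, 1000)  # demo subset, not the full Cartesian product
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(scenario, sample))
print(len(results))
```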

16:40.360 --> 16:47.320
First we will check a simple expect method, which checks that no_password cannot be used with

16:47.320 --> 16:53.400
the ADD keyword. We basically check that the current state is an ALTER, then we check that ADD

16:53.400 --> 16:59.320
IDENTIFIED is not empty, and we check that in the list of authentication methods we don't have

16:59.320 --> 17:05.160
no_password. If we do have no_password in this list, we expect to see this syntax error

17:06.200 --> 17:13.320
from ClickHouse. Here is an example of a bad query, because we use no_password with

17:13.320 --> 17:25.400
ADD IDENTIFIED, and here is an example of a good query. Here's a more complex expect method that

17:25.400 --> 17:35.320
goes through the whole behavior, and it checks that no_password cannot

17:35.320 --> 17:43.480
coexist with other authentication methods. Basically we go through all the behavior here and we build

17:43.480 --> 17:50.600
the list of authentication methods that are currently available for a user. We go state by state, we build

17:50.600 --> 17:57.080
this authentication method list, and in the end we check that no_password is not in this authentication

17:57.080 --> 18:04.360
method list. If it is, we expect the error; otherwise everything is correct and we go to the

18:04.360 --> 18:12.200
next expect method. Here again are the examples: this query is not valid because we try to create

18:12.200 --> 18:20.680
a user with two methods and one of them is no_password, and here again the same example: we try to

18:20.680 --> 18:28.920
change a user and one of the authentication methods is no_password, so the query is not valid. We found
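
NOTE
A simplified sketch of the chain of expect methods described in this part of the
talk. Everything here is hypothetical: the state layout (dicts with kind, add,
methods), the method names, and the error strings are placeholders, not the real
model or ClickHouse's actual messages. Unlike the real model, the last check only
looks at the current query instead of walking the whole behavior.
```python
class Model:
    """Hypothetical behavior model: each expect_* returns an expected error or None."""
    def __init__(self, behavior):
        self.behavior = behavior  # list of state dicts seen so far, newest last
    def expect_no_password_with_add(self):
        state = self.behavior[-1]
        if state["kind"] == "alter" and state.get("add") and "no_password" in state["methods"]:
            return "syntax error"
    def expect_user_does_not_exist(self):
        state = self.behavior[-1]
        created = any(s["kind"] == "create" for s in self.behavior[:-1])
        if state["kind"] == "alter" and not created:
            return "unknown user"
    def expect_no_password_with_others(self):
        methods = self.behavior[-1]["methods"]
        if "no_password" in methods and len(methods) > 1:
            return "bad arguments"
    def expect(self):
        checks = (
            self.expect_no_password_with_add,
            self.expect_user_does_not_exist,
            self.expect_no_password_with_others,
        )
        for check in checks:  # the first check that fires decides the expected result
            error = check()
            if error is not None:
                return error
        return "ok"  # no exception expected, exit code 0
print(Model([{"kind": "create", "methods": ["no_password", "plaintext_password"]}]).expect())
```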

18:28.920 --> 18:35.800
a couple of issues by doing this combinatorial testing with a behavior model, and we also run our

18:35.800 --> 18:41.480
tests every other day in CI, so if anything changes in new ClickHouse releases we will

18:42.280 --> 18:48.760
fix everything quickly: review the test or review the feature itself and fix it. The full test code is

18:48.760 --> 18:56.280
available here. It's a bit harder to read because we don't have these assumptions in the code, so if you want to check it out, you

18:56.360 --> 19:10.440
can do that, it's open source. Thank you for your time. I'm happy to answer any questions.

19:10.440 --> 19:27.000
So are you running all of the tests, every possible combination, in your continuous

19:27.000 --> 19:33.240
integration, or do you try to run subsets and hope that's enough to cover all of the edge cases?

19:34.120 --> 19:42.280
Without the assumptions that we had, there are millions of combinations, and we tried to run as many as we can, but

19:42.280 --> 19:48.680
we didn't run all of them because it's too many combinations and it would take weeks or months to run

19:48.760 --> 20:00.360
all of them. No, we don't use any code coverage. So you do random combinations?

20:05.880 --> 20:13.240
Yes, sometimes we do just random, and sometimes we use covering arrays, so we say that we want to test

20:13.240 --> 20:18.440
every parameter a certain number of times, and we use covering arrays, which are also part

20:18.440 --> 20:26.280
of TestFlows, yes, and the sequences are chosen so that each parameter is

20:26.840 --> 20:38.120
tested a certain number of times. Yes? For your tests that use the VALID UNTIL

20:38.120 --> 20:50.200
parameter, do you simulate time passing in your test so that you can check

20:50.200 --> 20:57.560
the expiration, or do you just set times in the past and in the future?

20:58.840 --> 21:03.560
Okay, the question was, I'm repeating questions so they will be available on the recording,

21:04.280 --> 21:12.600
about VALID UNTIL, and whether I just specified times in the past and in the future, or whether I

21:12.600 --> 21:22.120
wait in the test. Yes, something like that. I had problems with changing the time

21:22.120 --> 21:28.680
in ClickHouse, so I just use sleeps: I set some expiration date and I sleep for some

21:28.680 --> 21:35.480
time and then check that I cannot log in after that time. And yes, I also just set a past

21:36.520 --> 21:43.640
time and a future time. In the other tests I don't use sleeps, just setting a time in the

21:43.720 --> 21:58.920
past and a time in the future. The worst one, I think, was the last one, with the ON CLUSTER

21:59.960 --> 22:08.520
clause: I created a user on one node with certain authentication methods and I was not able

22:08.520 --> 22:20.600
to log in with this user on the other nodes. Yes? Have you covered only authentication, or

22:20.600 --> 22:26.440
have you also used these types of combinatorial methods on other features?

22:29.000 --> 22:36.440
The question was whether this technique can be used on other ClickHouse features. Yes, we use

22:36.440 --> 22:43.400
combinatorial testing when we test ATTACH PARTITION with different partition keys, and I also used

22:43.400 --> 22:47.160
this technique to test JSON Web Token authorization in ClickHouse.

22:55.480 --> 23:06.280
Okay, then you can obviously reach the speaker after the session. Yeah. Thanks very much. Thank you.