WEBVTT 00:00.000 --> 00:10.720 Hello everyone, my name is Alsu. It's my first time at FOSDEM and my first time presenting 00:10.720 --> 00:15.600 at a conference, so I'm a bit nervous, so don't judge me. Today I'm going to be talking 00:15.600 --> 00:19.880 about testing support for multiple authentication methods in ClickHouse using combinatorics 00:19.880 --> 00:22.560 and behavior models. 00:22.560 --> 00:30.440 I work as a QA engineer at Altinity and am also pursuing my master's degree in data science 00:30.440 --> 00:34.320 at LMU in Munich, and I got my bachelor's degree in Applied Mathematics and Computer Science 00:34.320 --> 00:36.520 from Moscow State University. 00:36.520 --> 00:41.760 A little bit about our company: we provide managed services and support for ClickHouse, develop 00:41.760 --> 00:46.200 features for ClickHouse, and run other open source projects that are listed here. Feel 00:46.200 --> 00:51.760 free to join our Slack community, where we help each other, discuss new topics, and do such 00:51.760 --> 00:52.760 things. 00:52.760 --> 00:57.440 The feature that we're going to test today is from ClickHouse. It's an open source 00:57.440 --> 01:03.000 columnar database designed for real-time analytics; it's super fast and super efficient. 01:03.000 --> 01:07.120 The feature is called multiple authentication methods. It's a recent addition to ClickHouse 01:07.120 --> 01:13.080 by an Altinity developer for better security and flexibility, and it simply allows 01:13.080 --> 01:18.440 a user to have multiple authentication methods, either of the same type or of different types. 01:18.440 --> 01:21.480 So let's get familiar with this feature.
01:21.480 --> 01:26.480 In ClickHouse, in order to create a user, we use the CREATE USER query, or statement. Before, 01:26.480 --> 01:32.040 we could only have one authentication method per user: we use CREATE USER, some 01:32.040 --> 01:38.320 username (in this case it's name1), IDENTIFIED WITH, the authentication type (here it's 01:38.320 --> 01:43.440 plaintext_password), the BY clause, and the password itself (here it's the string 'my password'). 01:43.440 --> 01:47.320 But now you can specify multiple authentication methods, separated by commas. 01:47.320 --> 01:51.960 Here we specified three methods, and the user will be able to log in with all three passwords: 01:51.960 --> 01:58.240 one, two, and three. The internal representation of the user also changed: before, the authentication 01:58.240 --> 02:03.240 type was stored as an Enum8, but now it's stored in the form of an array of Enums, 02:03.240 --> 02:09.840 and the authentication parameters are also stored in the form of an array. 02:09.840 --> 02:15.640 In order to change a user, we use the ALTER USER statement. Before, we could replace 02:15.640 --> 02:21.080 one authentication method with another, but now we can replace one set of multiple authentication 02:21.080 --> 02:25.800 methods with another set of multiple authentication methods. By using the ALTER USER 02:25.800 --> 02:30.120 IDENTIFIED WITH statement, we override the previous authentication methods with new ones, 02:30.120 --> 02:34.920 and we will only be able to log in to the ClickHouse server with the new authentication methods. 02:34.920 --> 02:38.600 But what if we don't want to override the methods? 02:38.600 --> 02:44.680 Now ClickHouse supports the ALTER USER ADD IDENTIFIED statement, which basically adds new authentication 02:45.000 --> 02:50.200 methods to the user. So by executing this query, the user will be able to log in with the previous 02:50.200 --> 02:57.560 passwords and also with the new passwords 6 and 7.
Also, one more statement, ALTER USER 02:57.560 --> 03:03.480 RESET AUTHENTICATION METHODS TO NEW, was introduced. It was inspired by MySQL's DISCARD OLD PASSWORD, 03:03.480 --> 03:08.280 and it basically deletes all the passwords except the most recently added one. So the user will 03:08.280 --> 03:16.040 be able to log in with the newest password only. Also, VALID UNTIL was improved. Now you can have 03:16.040 --> 03:21.320 a separate expiration date for each authentication method, but if you use the VALID UNTIL 03:21.320 --> 03:27.720 clause without the IDENTIFIED WITH clause, you will apply this expiration date to all authentication 03:27.720 --> 03:34.600 methods that the user has at the moment you execute this query. Now let's check an example to 03:34.600 --> 03:39.800 better understand this feature. In the first query, we create a user Bob identified with plaintext_password 03:39.800 --> 03:45.000 by 'one', bcrypt_password by 'two', plaintext_password by 'three'. Here we created a user 03:45.000 --> 03:50.280 with three authentication methods, and user Bob can log in to ClickHouse with passwords one, two, and three. 03:50.280 --> 03:55.320 After that, we change Bob's authentication methods to plaintext_password by 'four', 03:55.320 --> 04:01.160 bcrypt_password by 'five'. Now user Bob can only log in with passwords four and five 04:01.240 --> 04:07.240 because we have overwritten his authentication methods. Now we are adding two new authentication methods 04:07.240 --> 04:12.600 to Bob, and he is able to log in with passwords four, five, six, and seven because we added six 04:12.600 --> 04:17.480 and seven. In the last query we are resetting authentication methods to new. In this case, 04:17.480 --> 04:23.240 Bob will be able to log in to ClickHouse only with password seven because bcrypt_password by 'seven' 04:23.240 --> 04:28.680 is the last authentication method that was added to him.
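The Bob example above can be traced with a small Python sketch. This is a toy model and not ClickHouse code: the class `UserAuth` and its method names are hypothetical, and only passwords are tracked, not authentication types.

```python
class UserAuth:
    """Toy model of a user's password list (hypothetical, not ClickHouse code)."""

    def __init__(self):
        self.passwords = []

    def create(self, *passwords):
        # CREATE USER ... IDENTIFIED WITH ...: sets the initial methods.
        self.passwords = list(passwords)

    def alter(self, *passwords):
        # ALTER USER ... IDENTIFIED WITH ...: replaces all existing methods.
        self.passwords = list(passwords)

    def add(self, *passwords):
        # ALTER USER ... ADD IDENTIFIED WITH ...: appends to existing methods.
        self.passwords.extend(passwords)

    def reset_to_new(self):
        # ALTER USER ... RESET AUTHENTICATION METHODS TO NEW: keeps only the newest.
        self.passwords = self.passwords[-1:]


bob = UserAuth()
bob.create("one", "two", "three")
after_create = list(bob.passwords)
bob.alter("four", "five")
after_alter = list(bob.passwords)
bob.add("six", "seven")
after_add = list(bob.passwords)
bob.reset_to_new()
after_reset = list(bob.passwords)
print(after_create, after_alter, after_add, after_reset)
```

Each snapshot lists the passwords Bob could log in with after the corresponding statement in the talk's example.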
So imagine how hard it would be to test 04:29.080 --> 04:33.320 this feature by hand, because you have so many permutations of these authentication methods and also 04:34.200 --> 04:40.520 a lot of different actions that could be performed. This is why we're using combinatorial testing to test 04:40.520 --> 04:46.840 this feature. So what is combinatorial testing? It's a way of testing software by checking different 04:46.840 --> 04:54.040 combinations of input parameters. Here is a small example: we have three variables, A, B, and C, and each 04:54.120 --> 05:00.520 variable has two values. So in this case, we're going to get eight test cases. But in real 05:00.520 --> 05:07.720 world examples, the number of variables is not just a handful; it's tens, hundreds, or even thousands, and the 05:07.720 --> 05:14.280 range of values that each variable can take is also pretty huge. This tree will grow exponentially, 05:14.280 --> 05:21.400 and you end up with a lot of different combinations that you want to test. We do that to catch 05:21.400 --> 05:28.680 any potential issues that interactions between variables could cause. This way we can 05:28.680 --> 05:39.080 expand our test coverage and not focus on just single-scenario tests. But generating 05:39.080 --> 05:45.080 these combinations is not enough. We also need to know the expected result for all these combinations. 05:45.080 --> 05:51.960 For that, we're going to use oracles. When we're talking about combinatorial testing, 05:51.960 --> 05:58.200 the oracle problem arises: it's the problem that we cannot say what the expected result is for a given test case. 06:00.360 --> 06:05.400 An oracle is just some function, model, or anything else that can tell us the expected 06:05.400 --> 06:09.880 result for a given test case. How are we going to use this oracle?
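The three-variable example can be reproduced in a few lines of Python. This is only an illustration: the values 0 and 1 are placeholders, since the talk does not say which values A, B, and C take.

```python
from itertools import product

# Three variables with two values each (placeholder values).
variables = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}

# Every combination of values is one test case: 2 * 2 * 2 of them.
test_cases = [dict(zip(variables, values)) for values in product(*variables.values())]

for case in test_cases:
    print(case)
print(len(test_cases))
```

Adding a variable or a value multiplies the number of test cases, which is the exponential growth mentioned in the talk.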
So we have a test case; 06:09.880 --> 06:14.360 we're going to run it on the software that we're testing, and we're also going to ask the oracle 06:14.440 --> 06:20.280 what the expected result is for the given test case. Then we're going to compare these two outcomes: if they 06:20.280 --> 06:27.400 are the same, we say that the test passed; otherwise, the test failed. There are different types 06:27.400 --> 06:33.400 of oracles. The first one is automated oracles: just some algorithm that can automatically tell us 06:33.400 --> 06:39.640 the expected result for a given test case. There are also human-based oracles, which rely on 06:39.720 --> 06:45.640 human judgment or domain experts, and there are hybrid approaches that combine automation with 06:45.640 --> 06:55.560 human expertise. Today we're going to focus on creating automated oracles, or behavior 06:55.560 --> 07:02.200 models, to automatically tell the expected result for the generated combinations. 07:02.600 --> 07:10.600 So the first thing that we want to do to test our multiple authentication methods feature 07:10.600 --> 07:17.000 is to define the input parameters for the feature. We're going to think of them in terms of 07:17.000 --> 07:22.040 actions because it's the most intuitive approach, and we're going to answer the question: what are 07:22.040 --> 07:28.280 the possible actions that a user can perform with this feature? So, basically, a short list: 07:28.280 --> 07:32.200 we can create a user with multiple authentication methods, we can change a user's authentication methods, 07:32.200 --> 07:37.400 we can add new authentication methods to a user, and we can reset authentication methods to the most recently 07:37.400 --> 07:43.240 added method. And of course we can just drop a user.
But testing these actions separately does not 07:43.240 --> 07:49.880 make much sense because in the real world, people who use software perform sequences of actions, 07:49.880 --> 07:54.920 and we want to make sure that such a sequence works correctly. That's why we will test 07:54.920 --> 08:05.240 sequences of these actions. Let's do some math: let's focus on one action, create user, 08:05.240 --> 08:12.280 and calculate how many ways we have to create a user. But first we're 08:12.280 --> 08:18.680 going to introduce some assumptions for the sake of simplicity. We will say that a user can 08:18.680 --> 08:24.920 have no more than two authentication methods assigned or changed at a time, and the two authentication methods 08:24.920 --> 08:30.920 can only be selected from the following five types. In total, ClickHouse has 13 different 08:30.920 --> 08:37.400 authentication types, but for simplicity we will be working with five. So in order to create a user 08:37.400 --> 08:45.160 with one authentication method we obviously have five ways, just choosing one of these. And 08:45.400 --> 08:55.880 what happened? Okay, everything works. Let's calculate how many ways we have to create 08:55.880 --> 09:03.080 a user with two authentication methods. It's just the number of combinations of two out of five; we use 09:03.080 --> 09:08.680 this formula and we get 10 ways. So in total we have 15 ways to create a user, with each 09:08.680 --> 09:13.480 user having no more than two authentication methods and these authentication methods being selected 09:13.480 --> 09:20.120 from five available types. The same math applies to the ALTER USER IDENTIFIED WITH statement, and 09:20.120 --> 09:25.560 the same to ALTER USER ADD IDENTIFIED WITH. For RESET AUTHENTICATION METHODS TO NEW we only have one 09:25.560 --> 09:32.040 way because this query has no parameters.
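The counting above can be checked by enumerating the choices. This is a sketch under the talk's simplifying assumptions; the names `type_1` to `type_5` are placeholders, because the five types shown on the slide are not named in the transcript.

```python
from itertools import combinations

# Placeholder names for the five authentication types on the slide.
auth_types = ["type_1", "type_2", "type_3", "type_4", "type_5"]

one_method = list(combinations(auth_types, 1))   # 5 ways
two_methods = list(combinations(auth_types, 2))  # C(5, 2) = 10 ways
ways_to_create = one_method + two_methods

# ALTER USER ... IDENTIFIED WITH and ALTER USER ... ADD IDENTIFIED WITH
# take the same parameters as CREATE USER; the reset query takes none.
ways_to_alter = len(ways_to_create)
ways_to_add = len(ways_to_create)
ways_to_reset = 1
ways_to_change = ways_to_alter + ways_to_add + ways_to_reset

print(len(one_method), len(two_methods), len(ways_to_create), ways_to_change)
```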
So in total we will have 15 plus 15 plus one, 09:32.040 --> 09:40.360 31 different ways of changing a user's authentication methods with the ALTER USER statement. Now let's 09:40.680 --> 09:56.680 consider a sequence that starts with CREATE, followed by three queries that change the user. For the first query we will 09:56.680 --> 10:03.080 have 15 different ways to create a user, as we calculated before, and for the second, third, and fourth 10:03.080 --> 10:09.720 we will have 31 ways each. Then we take the Cartesian product of all these numbers, and we 10:09.720 --> 10:16.360 get almost half a million different sequences. That's a huge number, and it would be so painful 10:16.360 --> 10:22.760 to do that by hand. That's why we need some oracle to tell the expected result for all 10:22.760 --> 10:29.560 these combinations. And why did we decide to stick with four actions? Basically, we have four states: 10:30.040 --> 10:36.440 user was created, user was changed, new authentication methods were added to the user, and authentication 10:36.440 --> 10:43.400 methods were reset. By having a sequence of length four, we can have all these states in the 10:43.400 --> 10:47.960 sequence, and we will also check all the transitions from one state to another because we're taking 10:47.960 --> 10:57.480 the Cartesian product. We take all 15 ways here and all 31 here, and we will check all the transitions 10:57.560 --> 11:03.400 from one state to another. We want to have efficient coverage without unnecessary complexity, 11:03.400 --> 11:14.520 because it's already too complex. So let's sketch our test. We have four queries: one CREATE query and 11:14.520 --> 11:20.600 three ALTER queries. After executing each query we will check that the result of the execution is 11:20.600 --> 11:26.280 correct, and we will try to log in to ClickHouse with the authentication methods seen in 11:26.280 --> 11:30.760 all previous queries.
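The "almost half a million" figure follows from 15 × 31 × 31 × 31. A short Python check, using index ranges as stand-ins for the actual queries:

```python
from itertools import product

WAYS_TO_CREATE = 15  # first query: CREATE USER
WAYS_TO_CHANGE = 31  # each of the three following ALTER USER queries

# Cartesian product of one CREATE and three ALTER choices, generated lazily.
sequences = product(
    range(WAYS_TO_CREATE),
    range(WAYS_TO_CHANGE),
    range(WAYS_TO_CHANGE),
    range(WAYS_TO_CHANGE),
)

total = sum(1 for _ in sequences)
print(total)
```

Because `product` is a lazy iterator, the sequences can be consumed one at a time without holding all of them in memory.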
So in this case we will try to log in with every authentication method 11:30.760 --> 11:36.040 seen in the CREATE query; after executing the first ALTER, we will try to log in with every authentication method 11:36.040 --> 11:42.680 seen in the CREATE query and in the ALTER query; and in the end we will try to log in with every authentication method 11:42.680 --> 11:52.440 that was seen in this sequence. Now let's check the so-called architecture of the test. We have two 11:52.440 --> 11:57.320 constructors, the CREATE USER query constructor and the ALTER USER query constructor, that basically just 11:57.320 --> 12:04.440 construct the different queries for us. And we have two actions, actually three actions: 12:05.400 --> 12:11.080 the create user action and the alter user action. They call the constructors, the constructors return 12:11.080 --> 12:17.320 the queries to them, they execute the queries, and after that they build the behavior state. 12:18.200 --> 12:25.240 This is just an object that stores the query, username, authentication methods, and other parameters, 12:25.240 --> 12:30.440 but most importantly it stores the exit code and output message that we received from ClickHouse. 12:31.080 --> 12:37.000 After that, this behavior state is appended to the global behavior, and then the action 12:37.000 --> 12:43.080 goes to the model. The model computes the expected exit code and output message based on the current 12:43.080 --> 12:50.760 state and on the behavior, and then it compares the expected exit code and message with the ones 12:50.760 --> 12:56.840 stored in the behavior state. If they are the same, then the test passed; otherwise the test failed. 12:56.840 --> 13:02.920 The alter user action does the same: it creates a state, appends the state to the behavior, goes to the model, 13:02.920 --> 13:09.960 asks whether everything is okay, and so on.
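The state, behavior, and model flow described above might look roughly like the following sketch. It is not the actual test code: the field names, the single oracle rule, and the message "unknown user" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class State:
    """One executed query plus what the server returned (field names are assumed)."""
    query: str
    username: str
    exitcode: int
    message: str


def expected(state, earlier_states):
    """Toy oracle with a single rule: altering a user that was never created fails."""
    created = any(
        s.query.startswith("CREATE USER") and s.username == state.username and s.exitcode == 0
        for s in earlier_states
    )
    if state.query.startswith("ALTER USER") and not created:
        return 1, "unknown user"  # placeholder, not a real ClickHouse message
    return 0, ""


def check(state, behavior):
    """Compare observed vs. expected results, then record the state in the behavior."""
    result = expected(state, behavior) == (state.exitcode, state.message)
    behavior.append(state)
    return result


behavior = []
ok_create = check(State("CREATE USER bob IDENTIFIED WITH no_password", "bob", 0, ""), behavior)
ok_alter = check(State("ALTER USER alice IDENTIFIED WITH no_password", "alice", 1, "unknown user"), behavior)
mismatch = check(State("ALTER USER carol IDENTIFIED WITH no_password", "carol", 0, ""), behavior)
print(ok_create, ok_alter, mismatch)
```

The third call returns `False` because the observed result (success) disagrees with what the toy oracle expects for a user that does not exist, which is how a failing test would surface.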
For the log in to ClickHouse action, we execute SELECT currentUser(), 13:09.960 --> 13:16.440 which returns the user that is currently executing the query, and we check that this user is the 13:16.440 --> 13:27.000 same as in the state if we tried to log in as a valid user. Now let's check the code, but before that 13:27.640 --> 13:33.640 I will say a couple of words about TestFlows. We're using this framework for writing test programs; 13:33.640 --> 13:38.440 it supports behavior, parallel, combinatorial, and requirement-driven testing, and if you want to know 13:38.440 --> 13:46.120 more about TestFlows you can visit the website. So let's check the code. 13:46.120 --> 13:52.040 It's probably too small, so I'm just going to explain what's going on here. It's the CREATE USER constructor. 13:52.040 --> 13:59.000 It's just a super basic Python class that has some attributes and methods, and each method just 13:59.000 --> 14:05.000 generates the query step by step. First we create an instance of the class, and we have this query. 14:05.080 --> 14:11.480 After we add the username, we get this query. Then we add IDENTIFIED, basically adding 14:11.480 --> 14:17.720 just a string, and then we call a method to add an authentication method to the query: here we call 14:17.720 --> 14:22.360 set with no password, and the query now has IDENTIFIED WITH no_password. In the end we get 14:22.360 --> 14:33.320 a perfectly valid ClickHouse query that we can execute. Here is what the model looks like that gives us an expected result 14:33.320 --> 14:40.840 for every test case. The main method of the model is the expect method, which goes through the 14:40.840 --> 14:46.360 other expect methods, and each expect method is responsible for catching a certain error, 14:46.360 --> 14:50.920 a certain exception.
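A step-by-step query constructor of the kind described above could be sketched as follows. The class and method names are assumptions based on the spoken description, not the names used in the real test code.

```python
class CreateUserQuery:
    """Sketch of a step-by-step CREATE USER query builder (names are assumed)."""

    def __init__(self):
        self.query = "CREATE USER"
        self.auth_methods = []

    def set_username(self, name):
        self.query += f" {name}"
        return self

    def set_identified(self):
        self.query += " IDENTIFIED"
        return self

    def _add_method(self, method, clause):
        # The first method is introduced by WITH; later ones are comma-separated.
        prefix = " WITH " if not self.auth_methods else ", "
        self.auth_methods.append(method)
        self.query += prefix + clause
        return self

    def set_with_no_password(self):
        return self._add_method("no_password", "no_password")

    def set_with_plaintext_password(self, password):
        return self._add_method("plaintext_password", f"plaintext_password BY '{password}'")


query = CreateUserQuery().set_username("user1").set_identified().set_with_no_password().query
two_methods = (
    CreateUserQuery()
    .set_username("user2")
    .set_identified()
    .set_with_plaintext_password("1")
    .set_with_plaintext_password("2")
    .query
)
print(query)
print(two_methods)
```

Keeping the list of methods on the object alongside the query string means the same instance can later tell a model which authentication methods the query contained.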
So the first one catches that no_password cannot be used with the 14:50.920 --> 14:56.120 ADD IDENTIFIED statement, because it does not make sense to have no_password with another authentication 14:56.120 --> 15:02.760 method; it's not secure, and that's why it's forbidden by design. If we're not using 15:02.760 --> 15:09.320 ADD IDENTIFIED, we go to the next expect method, which is that a nonexistent user cannot be altered. 15:09.320 --> 15:15.240 If we don't have the user, we cannot change his authentication methods, and we cannot do 15:15.240 --> 15:21.320 anything with this user. If we're not doing that, we go to the next expect method, which is that 15:21.320 --> 15:26.600 no_password cannot be used with another authentication method. It's similar to the first one, but for 15:26.680 --> 15:32.520 example, you cannot create a user with the query CREATE USER username IDENTIFIED WITH no_password 15:32.520 --> 15:36.760 and plaintext_password, because if you have a user identified with no_password, you can log in with 15:36.760 --> 15:46.360 any password you like. And so on: it checks for all possible errors, and if everything is 15:46.360 --> 15:53.480 okay, we go to the last method, expect OK, which expects that we will not see any 15:53.480 --> 16:03.640 exceptions and the exit code will be zero. This is what the test looks like. First I create two lists, 16:03.640 --> 16:09.320 ways to create a user and ways to alter. The first list will have 15 ways, as we calculated. After 16:10.360 --> 16:18.040 adding all the change user methods, we will have 31 ways to change a user. Here we take the 16:18.040 --> 16:23.320 Cartesian product, ending up with half a million combinations, and then we execute the 16:23.320 --> 16:30.520 combinations in parallel using a pool of threads to make the test faster. We call this scenario, 16:30.520 --> 16:39.400 which just executes the actions and then tries to log in after every action. Let's check how the model works.
16:40.360 --> 16:47.320 We will first check the simple expect method, which checks that no_password cannot be used with the 16:47.320 --> 16:53.400 ADD keyword. We basically check that the current state is an ALTER, then we check that ADD 16:53.400 --> 16:59.320 IDENTIFIED is not empty, and we check that the list of authentication methods does not contain 16:59.320 --> 17:05.160 no_password. If we do have no_password in this list, we expect to see this syntax error 17:06.200 --> 17:13.320 from ClickHouse. Here is an example of a bad query, because we use no_password with 17:13.320 --> 17:25.400 ADD IDENTIFIED, and here is an example of a good query. Here's a more complex expect method that 17:25.400 --> 17:35.320 goes through the whole behavior and checks that no_password cannot 17:35.320 --> 17:43.480 coexist with other authentication methods. Basically, we go through the whole behavior here and build 17:43.480 --> 17:50.600 the list of authentication methods that are currently available for the user. We go state by state, build 17:50.600 --> 17:57.080 this authentication method list, and in the end we check that no_password is not in this authentication 17:57.080 --> 18:04.360 method list. If it is, we expect the error; otherwise everything is correct and we go to the 18:04.360 --> 18:12.200 next expect method. Again, the examples: this query is not valid because we try to create 18:12.200 --> 18:20.680 a user with two methods and one of them is no_password. And here again the same kind of example: we try to 18:20.680 --> 18:28.920 change a user and one of the authentication methods is no_password, so the query is not valid.
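The chain of expect methods can be sketched as a list of checks tried in order, with "expect OK" as the fallback. This is a simplified illustration: it looks only at the current state rather than replaying the whole behavior, and the dictionary keys, exit codes, and messages are placeholders rather than real ClickHouse output.

```python
NO_PASSWORD = "no_password"


def expect_no_password_with_add(state):
    # no_password cannot be used with ADD IDENTIFIED.
    if state["action"] == "add" and NO_PASSWORD in state["methods"]:
        return 1, "syntax error"  # placeholder message
    return None


def expect_no_password_with_other_methods(state):
    # no_password cannot be combined with any other authentication method.
    if NO_PASSWORD in state["methods"] and len(state["methods"]) > 1:
        return 1, "bad arguments"  # placeholder message
    return None


def expect_ok(state):
    # No error expected: exit code zero and an empty message.
    return 0, ""


def expect(state):
    """Return the first expected error, or the OK result if no check fires."""
    for check in (expect_no_password_with_add, expect_no_password_with_other_methods):
        result = check(state)
        if result is not None:
            return result
    return expect_ok(state)


bad_add = expect({"action": "add", "methods": ["no_password"]})
bad_create = expect({"action": "create", "methods": ["no_password", "plaintext_password"]})
good_create = expect({"action": "create", "methods": ["plaintext_password"]})
print(bad_add, bad_create, good_create)
```

The order of the checks matters: the first check that fires decides the expected error, which mirrors the talk's description of falling through from one expect method to the next.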
We found 18:28.920 --> 18:35.800 a couple of issues by doing this combinatorial testing with a behavior model, and we are also running our 18:35.800 --> 18:41.480 tests every other day in CI/CD, so if anything changes in new ClickHouse releases we will 18:42.280 --> 18:48.760 fix everything quickly: review the test or review the feature itself and fix it. The full test code is 18:48.760 --> 18:56.280 available here; it's a bit harder because we don't have these assumptions in the code. So if you want to check it out, you 18:56.360 --> 19:10.440 can do that, it's open source. Thank you for your time; I'm happy to answer any questions. 19:10.440 --> 19:27.000 So are you running all of the tests, every possible combination you need to continue to 19:27.000 --> 19:33.240 defend, or do you try to run subsets and hope that's enough to cover all of these cases? 19:34.120 --> 19:42.280 Without the assumptions, we had millions of combinations, and we tried to run as many as we could, but 19:42.280 --> 19:48.680 we didn't run all of them because there are too many combinations and it would take weeks or months to run 19:48.760 --> 20:00.360 all of them. No, we don't use any code coverage. So you do random combinations? 20:05.880 --> 20:13.240 Yes, sometimes we do just random and sometimes we use covering arrays, so we say that we want to test 20:13.240 --> 20:18.440 every parameter a certain number of times, and we use covering arrays, which are also part 20:18.440 --> 20:26.280 of TestFlows. Yes, and in each sequence, each parameter is tested 20:26.840 --> 20:38.120 a certain number of times. Yes? For your tests that use the VALID UNTIL 20:38.120 --> 20:50.200 parameter, do you use some kind of time passing in your test so that you can check 20:50.200 --> 20:57.560 the expiration, or do you just set times in the past and in the future?
20:58.840 --> 21:03.560 Okay, the question was (I'm repeating questions so they will be available on the recording) 21:04.280 --> 21:12.600 about VALID UNTIL, and whether I just specified a time in the past and in the future or I 21:12.600 --> 21:22.120 wait in the test. Yes, something like that. I had problems with changing time 21:22.120 --> 21:28.680 in ClickHouse, so I just use sleeps: I set some expiration date, I sleep for some 21:28.680 --> 21:35.480 time, and then I check that I cannot log in after that time. And yes, I also just set a past 21:36.520 --> 21:43.640 time and a future time. Talking about the topics, I don't use loops, just setting a time in the 21:43.720 --> 21:58.920 past and a time in the future. The worst one, I think, was the last one, with the ON CLUSTER 21:59.960 --> 22:08.520 clause: I created a user on one node with certain authentication methods and I was not able 22:08.520 --> 22:20.600 to log in with this user on other nodes. Yes? How did you cover the authentication or 22:20.600 --> 22:26.440 their full load use date and check how you see where these types of the combination methods were used. 22:29.000 --> 22:36.440 The question was whether this technique can be used on other ClickHouse features. Yes, we use 22:36.440 --> 22:43.400 combinatorial testing when we test ATTACH PARTITION with different partition keys, and I also used 22:43.400 --> 22:47.160 this technique to test JSON Web Token authorization in ClickHouse. 22:55.480 --> 23:06.280 Okay, then you can obviously reach the speaker after the session. Yeah. Thanks very much. Thank you.