So, the idea is to talk about another distributed SQL database, YugabyteDB, which has this particularity of being Postgres-compatible: it reuses the Postgres code for the query layer, and that is what I will focus on in this presentation. But it puts that Postgres code on top of a distributed storage that is completely different.

So, I am Franck Pachot, and I have been working with a lot of databases. I have been a Developer Advocate for Yugabyte for five years, and in one more week I will be a Developer Advocate for MongoDB. But here, let's talk about distributed SQL and YugabyteDB.

Basically, if you look at the architecture of a monolithic database, one that is not distributed, you have a query layer. The application connects to the query layer, which receives a SQL statement, parses it, looks at the dictionary, and builds an execution plan, to finally have something to execute against the storage: read and write operations. Those read and write operations go to the disk through shared memory, shared by all the processes. This is why it is monolithic: it is difficult to share that memory between multiple nodes. That is basically how it works, and even though it is monolithic, there are multiple layers visible in the code. And Postgres is open source, so you can look at the code.
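The monolithic path just described can be pictured with a small sketch. This is my own illustration, not Postgres internals: the names and the two-page "plan" are invented for the example; the point is only that every read funnels through one shared buffer cache backed by local disk.

```python
# Toy sketch (illustrative, not Postgres code): the query layer turns a
# statement into read operations, and those go through one shared buffer
# cache to the local disk. That single shared memory is exactly what is
# hard to spread across multiple nodes.

DISK = {1: "row data for page 1", 2: "row data for page 2"}  # local disk pages
shared_buffers = {}  # one cache, shared by every backend process

def read_page(page_no):
    """Read a page through the shared buffer cache."""
    if page_no not in shared_buffers:      # cache miss: fetch from local disk
        shared_buffers[page_no] = DISK[page_no]
    return shared_buffers[page_no]

def execute(statement):
    """Parse -> plan -> execute, reduced to its skeleton."""
    plan = [1, 2] if statement == "SELECT * FROM t" else []
    return [read_page(p) for p in plan]
```

Everything here happens in one process space on one machine, which is the constraint the next part of the talk removes.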
The idea of YugabyteDB is to do the same, but distributed, running on multiple nodes, so that the application can connect to any node, whichever you choose, or through a load balancer, and find a query layer there that understands the query and builds the execution plan. The difference is the storage layer: instead of reading and writing through the filesystem to the local disk, it communicates through the network with the other nodes, to get data from where it is and to write data where it should be, so it can be distributed.

So here we will focus on this blue path, the one that looks like Postgres. You just take the code and run it on multiple nodes, but you have a few things to change first, because you store data differently, you have network calls, and the different layers may need to be aware of that. Optimizing an execution plan is different when it involves network calls, for example. This is what I will detail in this talk.

Basically, the idea of YugabyteDB is to be as compatible as possible with Postgres, but with a storage layer that is similar to Google Spanner, CockroachDB, TiDB, YDB: those distributed SQL databases.

Okay, so how does it look from the application point of view? You connect to a Postgres endpoint that really looks like Postgres; you can query the version.
We just add the YugabyteDB version, but it is Postgres-compatible: for the moment it is a fork of Postgres 15. The project started with Postgres 10, then Postgres 11 was merged, and then for a few years it stayed at Postgres 11 compatibility. Last year we merged all the changes to the query layer from Postgres 11 up to Postgres 15. So that's the compatibility layer. The idea is that what you have in Postgres 15, you have the same features in YugabyteDB, but with a distributed storage: the ability to scale out, to scale horizontally.

Okay, it's not the first database to reuse the Postgres code, because the Postgres license allows it: it allows you to reuse the code, even for commercial purposes. YugabyteDB is fully open source, but there are also some forks of Postgres that are not open source. On Postgres Weekly, you can find a list of Postgres-derived databases, and they can be derived in different ways.

First, Postgres is very extensible: there is this possibility to have some kind of plugins, the Postgres extensions. Some Postgres-derived databases use that, for example TimescaleDB or Citus. They do something differently on Postgres, but without forking the whole Postgres code: they just add a plugin that works with it.

Others really fork it at a specific version, and then it has its own life. This is what Greenplum did, for example, without trying to follow the Postgres versions.
And some forks do the same, except that they want to be able to merge from new versions of Postgres, to benefit from the new features that come to Postgres, but also to make it easier for users who are on Postgres and want to go distributed, or who want to move back to Postgres. And some forks even let you choose the Postgres compatibility version: in Aurora, for example, you create a Postgres-compatible database and you can even say, I want it to be compatible with Postgres 16.1. So there are different ways to do it.

I will explain quickly why we don't just use an extension. When you look at Postgres, you see a lot of great extensibility, and it's not only the extensions. You also have the foreign data wrappers, which can basically connect to any database; so if you have a very different storage, you can use that. That's at the top, in the query layer, where foreign tables can be queried like local tables. There are also the table and index access methods, a kind of API between the query layer and the storage layer, and extensions. So it looks like you can do everything with simply an extension.

But it also has some limitations. The problem is that if you want to be able to plug in anything, the software becomes quite complex. So there are things where, even if you can plug in a very different storage, Postgres still expects heap tables.
And in YugabyteDB, we don't store the rows in heap tables: we store them in the primary key structure, which changes a bit what you can do with the table access method. I will not go into all the details, but there are limitations with extensions, and those who do something very different from Postgres with an extension have to work around them. So maybe the idea is just to fork, which makes it easier, rather than trying to have hooks everywhere.

In this diagram of the initial architecture, we tried to show where the code must be different. It's not a single place where you plug in between the query layer and the storage. For example, in YugabyteDB we have threads for the connections, an internal connection pool, so there is some code that must be different there. The planner, the optimizer, must be different: you need to take into account that it is distributed, that you have network calls. And of course the execution is different. There are many different places, and you don't want hundreds of small extensions everywhere. Maybe it makes more sense to just fork, but still try not to change the code everywhere, so that you can merge new versions.

I just listed many things that YugabyteDB must do differently from Postgres. For the user, we have seen that you just connect with psql, the Postgres client, or pgbench (this is an example using pgbench), just like any Postgres application. And most things are very similar: you describe a table, and you see it as if you were in Postgres.
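The "fork, but change the code in as few places as possible" approach can be sketched in a few lines. This is my own toy illustration; the function names are invented, not the actual C functions:

```python
# Illustrative sketch (names are mine, not YugabyteDB's C code): one
# condition in the shared Postgres path decides whether a relation is
# distributed, and all YugabyteDB-specific behaviour lives elsewhere,
# so merging a new Postgres version touches few places.

def is_yb_relation(rel):
    # Temporary tables stay on the unmodified Postgres path: they are local.
    return rel.get("distributed", False) and not rel.get("temporary", False)

def begin_scan(rel):
    if is_yb_relation(rel):               # the single hook in the shared path
        return yb_begin_scan(rel)         # ...delegating to separate YB code
    return "postgres heap scan of " + rel["name"]

def yb_begin_scan(rel):
    # In the real fork this kind of code lives in its own files.
    return "distributed scan of " + rel["name"]
```

The shared path keeps one small `if`; everything behind it can evolve without creating merge conflicts with upstream Postgres.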
Of course there are little things that must be a bit different. For example, in Postgres you have B-tree indexes by default. Here we store everything in LSM trees. I will not go into the details in this presentation, but it fits the storage better for a distributed database. And in a Postgres index, you have columns that are ascending or descending. We have added something, HASH, to apply a hash function automatically, if you use a column only with equality predicates. The advantage of a hash function is that we know the range of values, so it's much easier to distribute, rather than range sharding, where you may need to split differently depending on the data.

So, those are little changes, and they cannot be done with an extension. It's just one more line in the code, to define B-tree or LSM tree, or to define the index option HASH, but there is no hook in Postgres to add that. So there are different places where the code must be different, and that's the reason for a fork.

From the developer's side, if you run a query and run an EXPLAIN to see the execution plan, you will see something very similar to Postgres, because most of the code for the explain, and for the plan, is the same, but with little differences. For example, here it's an Index Only Scan. We still display Heap Fetches: if you know Postgres, when you have an Index Only Scan, it may still go to the table to check the visibility of the rows.
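Before continuing, the hash-sharding idea above can be made concrete with a sketch. This is my own illustration, not YugabyteDB's implementation: I use a slice of SHA-256 as a stand-in for its internal hash; the only point is that a hash maps every key into a fixed, known 16-bit range (0 to 65535), which can be cut into tablets up front, whereas range-sharding split points depend on the data.

```python
import hashlib

HASH_SPACE = 65536  # a fixed 16-bit hash range, 0..65535

def hash_code(key):
    # Stand-in for the real hash: take 2 bytes of a uniform hash.
    return int.from_bytes(hashlib.sha256(str(key).encode()).digest()[:2], "big")

def tablet_for(code, num_tablets):
    """The hash range is known up front, so it can be cut into equal
    slices without looking at the stored values."""
    return code * num_tablets // HASH_SPACE
```

With 8 tablets, every key lands in a predictable slice of the hash space, and splitting further never requires scanning the data to find split points.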
In the execution plan, you see the number of heap fetches for that; we show it. It's always zero here, because we do MVCC differently from Postgres, and the Index Only Scan really reads only the index. But just for compatibility, it shows the same line, with zero heap fetches.

And there are also more metrics that matter in a distributed database. For example, one thing that must be very different is the catalog. In Postgres, you have a catalog that describes your tables, indexes, columns, whatever, and also all the function code. In a distributed database, we want to shard, and not share everything, but the catalog must be shared. So when you have a query that runs, it's important to know how many read requests must go to the catalog, because the catalog is in a single place. To be scalable, you don't read it every time you execute a query; but parsing a query has to read the catalog, and if you distribute your database across multiple regions, that can add latency. So more metrics are needed in the execution plan.

Basically, in Postgres you run EXPLAIN (ANALYZE, BUFFERS) to see the number of pages, of blocks, that are accessed; this is really how you know whether your query is expensive or not. Here, we don't read from buffers, because we don't have this shared buffer cache: we cannot share it between multiple nodes. But we have a DIST option, for distributed calls, and basically it shows the distributed read requests and write requests.
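The idea behind those DIST counters can be sketched as follows. This is a toy model of my own, not the real instrumentation, and the 1,024 rows per request is an arbitrary assumption for the example: the point is that, with no shared buffer cache to count hits in, the interesting cost metric becomes the number of read and write requests sent to the distributed storage, and one request can carry many rows.

```python
import math

# Toy model (not the real implementation) of what a DIST-style counter
# tracks: RPCs to the distributed storage, not buffer accesses.

class DistCounters:
    def __init__(self):
        self.read_requests = 0
        self.write_requests = 0

    def read_rows(self, rows_needed, rows_per_request=1024):
        # One request can fetch many rows, so requests << rows.
        requests = math.ceil(rows_needed / rows_per_request)
        self.read_requests += requests
        return requests

counters = DistCounters()
counters.read_rows(1000)   # 1,000 rows fit in a single request here
```

Judging a plan by `read_requests` rather than rows is what makes the network cost visible.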
For example, this Index Only Scan has sent two read requests to get 1,000 rows here. So this was just an example to show that it looks like Postgres, but some things must be different, as little as possible.

Another example of something that must be done differently: we use the Postgres query planner, and we use the same join methods, nested loop, hash join, merge join, but we implemented a new join method. The nested loop is really nice, because it's the only join method where you can push down the join condition to the inner table; the others have to read both sides and then join. But the problem is that in Postgres it executes the inner access as many times as there are rows in the outer table. That is okay in Postgres, because that is just an access in memory. But here we access, through the network, a table that may be in a different place, and then you want to batch. Of course you need a network call if the table is on another node, but you don't want one network call for each row; you want just one that gets all the rows. This is why there is a different join method, displayed as YB Batched Nested Loop, and you see the difference in the plan: what it does is push down the join predicate with the values that are batched, put into an array. And of course the access to the LSM trees is made to be efficient with an array.
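The effect of batching on round trips is easy to count. A small sketch of my own, where I assume a batch size of 1,024 outer keys per request just for the example:

```python
import math

# Rough count of inner-side round trips for a join (my illustration):
# a plain nested loop sends one read per outer row, while a batched
# nested loop ships the outer keys as an array, one read per batch.

def nested_loop_requests(outer_rows):
    return outer_rows                             # one network call per row

def batched_nested_loop_requests(outer_rows, batch_size=1024):
    return math.ceil(outer_rows / batch_size)     # one call per batch of keys
```

For roughly 1,100 outer rows, the plain nested loop costs 1,100 round trips while the batched one costs 2, which is the shape of the "only two loops" difference shown in the plan.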
And you can see that you then have fewer loops. With a Postgres nested loop you would have more than 1,000 loops here; you have only two loops to get all of it. So those are the kinds of things that must be done differently, just because you are distributed, to get the same performance.

Okay, then let's look in a bit more detail at what is different in the code. YugabyteDB is fully open source, so you can look at the code, and you will see a lot of code that comes from Postgres, and a lot of code that is different, for the storage. Here we focus on the Postgres code where something was done differently. If you look for the function IsYBRelation, that is where you have special code for when the table is not a Postgres table but a YugabyteDB one. If you use temporary tables, it is fully the Postgres code, because that runs locally; but when it is distributed, it must be different. And you can see that usually you have this if condition, but everything else is implemented in another file. That is deliberate: if you want to merge new versions, you don't want to add code in many different places. You want the minimal number of places, and you put the specific code somewhere else. A bit like extensions or plugins, but with its own hooks in the code.

So those are just examples. And if you look at the code, there are also interesting things in the comments.
One very good thing in Postgres: the code is very clean, but the comments are very good too, and give a lot of information. So we also tried to explain, in the comments, what is done differently.

And when I was preparing this presentation, I wondered how much of the code is actually different. There are probably better tools and better ways, but I just took the code from the two versions: the YugabyteDB version that is compatible with Postgres 15, and Postgres 15 itself. I compared them with a diff, using diffstat, to see the differences, only on the C files, and looked at the files with the most changes. And of course there are a lot, but not so many. What was nice with this exercise was that, just by listing the files with the most changes, I was able to explain why there are a lot of changes in each of them. I will not go into all of them; I just listed them here. But for example, of course, table commands: we have some commands that are different, and some where we just need to flag that it is a different command, so of course there are many changes there. You see the relation cache: of course we don't use the same cache for the tables. GUCs, those are the parameters: we have a lot more parameters, to control what is done in the storage, what is distributed. So that makes sense.
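The diff exercise above (I used diff and diffstat on the real source trees) can be approximated in a few lines of Python for two file contents. This is just my sketch of the per-file counting idea:

```python
import difflib

# Rough stand-in for "diff | diffstat" on one file: count the added and
# removed lines between two versions of its content.

def changed_lines(old_text, new_text):
    diff = difflib.unified_diff(old_text.splitlines(),
                                new_text.splitlines(), lineterm="")
    count = 0
    for line in diff:
        if line.startswith("+++") or line.startswith("---"):
            continue                      # file headers, not changes
        if line.startswith("+") or line.startswith("-"):
            count += 1
    return count
```

Running this per file and sorting by the count gives the same "files with the most changes" ranking the slide shows.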
We have seen that we use the same EXPLAIN but display different things, so of course there are some differences there. Things like sequences must also be done differently, because a sequence is a single point of truth, so you must optimize it to be able to scale horizontally.

And let's take one more: tablespaces. In Postgres, you use tablespaces to put files in different filesystems. Here we are distributed, and we use the same tablespaces, but to hold placement information instead. For example, you can distribute across multiple regions, but one table, or one table partition, must stay in Europe. We reuse the Postgres feature, but instead of pointing to a different filesystem, it points to a different cloud region: it makes tablespaces cloud native.

Another way to look at what is different or not: you run it, you do a perf record on it, and you look at the call stacks. This was an example where I just created a table and inserted some rows, and I looked at the call stack, taking a flame graph from it and looking at what is different. There are some functions that are exactly the same as Postgres; it is maybe a bit too small to read from there, but at the beginning, the parsing, it is the same as Postgres. There are some YugabyteDB functions in the middle.
For example, one thing that we do differently in read committed: when a conflict is detected, Postgres just re-reads the row; we retry the whole query, to be consistent. So at that point there is a function to do that. And when you go lower, closer to the storage, this is where things are done differently, with what we call PgGate, the gate from Postgres to the YugabyteDB distributed storage. And finally you have network calls, RPCs, to another layer, the storage layer, which is the distributed storage.

I will go fast on it, but it is multi-threaded: you have threads in the storage layer that receive the reads and writes and process them. And here, if you want to see where the code comes from: you don't want to reinvent a completely new database. In the storage layer, it is distributed with Raft, and we used Apache Kudu for that; you can find some similar functions there. And basically, on each server, each shard is stored in a RocksDB LSM tree, so you see some RocksDB functions. This is also forked from RocksDB, but without trying to merge new versions, because we don't need to be compatible with RocksDB; we want to be compatible with Postgres. Okay, and you can also see the Raft replication, and what is coming from Apache Kudu.
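Coming back to the read-committed difference mentioned above, the retry behaviour can be sketched like this. This is my own toy code, not YugabyteDB's: the point is only that the whole statement is re-executed on conflict, so its result comes from one consistent snapshot rather than a mix of row versions.

```python
# Sketch (my illustration): on a conflict, retry the *entire* statement
# instead of re-reading only the conflicting row.

class ConflictError(Exception):
    pass

def run_with_statement_retry(execute_statement, max_retries=3):
    attempt = 0
    while True:
        try:
            return execute_statement()    # re-runs the whole statement
        except ConflictError:
            attempt += 1
            if attempt > max_retries:
                raise

# usage: a statement that hits one conflict, then succeeds
calls = []
def statement():
    calls.append(1)
    if len(calls) == 1:
        raise ConflictError("write conflict detected")
    return "consistent result"
```

The first execution conflicts and is discarded; the second starts from scratch and returns a result built from a single consistent read point.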
Another way, if you want to see exactly what is happening between the query layer and the storage layer: you can enable a trace. There is a GUC parameter that you can set to log the requests to the storage. I highlighted a bit of it; the trace is a bit verbose. You basically see the whole API between the query layer and the distributed storage. For example, you will see that there is a read or a write operation, and that it concerns something that comes from Postgres, because Yugabyte also has a Cassandra-like API, and maybe something must be done differently in the storage to get a different behaviour. There is a catalog version: this is because your query relies on a version of the catalog. If DDL happens, it increases the version, and then the storage must know that the query comes from a different one. And there are also some pushdowns. For example, if I select count(*) from a table, with the architecture we have seen, I would read millions of rows and count them with the Postgres code. We don't want to do that, so we just push the count down to the storage. The storage can also use some Postgres code: for example, if a function is pushed down, the storage will call that code too. They are different logical layers, but those layers are all present on all nodes here.
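The count(*) pushdown just described can be mimicked with a toy comparison. My own sketch, with two invented tablets: without pushdown, every row crosses the network to be counted by the query layer; with pushdown, each storage node counts locally and ships back one number.

```python
# Toy comparison (my illustration, not YugabyteDB code) of the count(*)
# pushdown: what travels over the network in each case.

tablets = [list(range(1000)), list(range(500))]   # rows held by two nodes

def count_without_pushdown():
    shipped = [row for tablet in tablets for row in tablet]  # all rows travel
    return len(shipped), len(shipped)     # (result, values sent over network)

def count_with_pushdown():
    partials = [len(tablet) for tablet in tablets]  # one count per tablet
    return sum(partials), len(partials)   # (result, values sent over network)
```

Both give the same count, but one ships 1,500 rows and the other ships 2 integers, which is why the aggregate is pushed down.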
And there is a partition key: this is the shard. I don't like the term partition; that one belongs to the Cassandra people. I don't like it because we also have Postgres partitioning, which is not used to distribute, because distribution happens at a lower level. But Postgres partitioning can be used here, for example, to do geo-partitioning: I have a country code in my customer table, and I want the customers in Europe to be stored in Europe; I use partitions and tablespaces to map all that. Of course, there is a row identification. It is not the same as the ctid of Postgres, because the ctid of Postgres is an offset in a heap table; here we have an LSM tree, so it is a different one. And many operations there.

Okay, last slide, where I just summarize the why and the benefit of it. The reason for using the Postgres code is simply better runtime compatibility. There are some other databases that build the compatibility, but in another language; for example, CockroachDB uses Go. And yeah, why not? Except that there are some little things that end up done differently: the arithmetic on a data type may give you a different result; some regular expressions are not processed the same in Postgres as in a Go library. So here the idea was really to behave like the same version of Postgres, just by using the same code.
Something that has already been proven for years in Postgres. And how to do that: with a fork, rather than an extension, but a fork that is clean enough to be able to merge from new versions. Here, going from 11 to 15 took a bit long. The reason was that it was not just about merging features, but about being able to do an online upgrade between those two versions. That involves modifications in the catalog while the application is still reading and writing. Now that the framework is there, future versions will be merged more quickly.

And the benefit is having a lot of features. My boss, the CTO, was saying that Postgres is the gift that keeps on giving. One user asks, can we do that? Yeah, you can do that, it is in Postgres. And many things are just done the same, without any effort. So this is mostly it: use 100% of the Postgres code for the query layer, do not reinvent something for the query layer, and do the storage differently where needed. That's mostly the point: better compatibility. And also, the Postgres C code with the RocksDB storage in C++ matches very well.

That's my LinkedIn QR code, and that's all I have for you. Do you have any questions, for the two minutes?

We have time for one question. Your hand went up first.

Hello. Do you have an idea of how large the catalog can be? How large does it become before it gets problematic?
How large the catalog can be? Yeah, if I have 50,000 tables, it will be okay. For the moment, the catalog is a single tablet; it is not distributed, and that works, because you don't always need to access the catalog. Ideally, you parse a statement once and then execute the plan, the prepared statement, multiple times. So for the moment, the catalog has never been a bottleneck. It could also be distributed, but for the moment it is not a bottleneck for its size. It is something to take care of if you are geo-distributed, because if you parse a lot of queries and you need to read the catalog from Europe to Australia, then you add some latency. But the size itself is usually okay.

Thank you.

Unfortunately, we will have to wrap it up. But Franck is going to be here; he was here yesterday, and he has another session coming up too. So, thank you, Franck.

Thank you.