WEBVTT 00:00.000 --> 00:12.160 Now, so, I mean, the next session is, I mean, sort of continuing on talking about both post-gressive 00:12.160 --> 00:18.040 and my sequel, and we're going to talk about compatibility, starting with Jimmy, talking 00:18.040 --> 00:23.200 about post-gress, and then we'll turn things over to Daniel to talk about my sequel families. 00:24.200 --> 00:39.200 Can anyone hear me? 00:39.200 --> 00:40.200 Cool. 00:40.200 --> 00:41.200 Okay. 00:41.200 --> 00:52.200 So, thanks for coming to our talk with Daniel, it was a great collaborating with him, 00:53.200 --> 00:57.200 because we're going to be going over some important topics. 00:57.200 --> 01:00.200 I think compatibility is really important. 01:00.200 --> 01:06.200 It's a word that's thrown around semi-randomly these days. 01:06.200 --> 01:12.200 Everything is compatible with everything else, and it's usually not decided on technical merit. 01:12.200 --> 01:15.200 It's usually decided by marketing people. 01:15.200 --> 01:24.200 So, this being a technical chiefly, a technical community conference, we'd like to set a few things 01:24.200 --> 01:27.200 straight and just offer our opinion on this. 01:27.200 --> 01:30.200 So, my name is Jimmy Angelacos. 01:30.200 --> 01:32.200 I'm based in Edinburgh. 01:32.200 --> 01:34.200 I work for PG Edge. 01:34.200 --> 01:39.200 I've been doing things around open source for a long time. 01:40.200 --> 01:43.200 I contribute to the PostgresQL project. 01:43.200 --> 01:48.200 I've written a book called PostgresQL Mistakes and how to avoid them, and I'm the maintainer of 01:48.200 --> 01:54.200 Postgres extension called PG Statiles. 01:54.200 --> 01:56.200 Yeah, so I'm Daniel. 01:56.200 --> 02:01.200 I'm based in the Netherlands, and I'm actually working for PINCAP on TIDV. 02:01.200 --> 02:06.200 TIDV is a MySQL compatible database, so I really care about compatibility. 02:06.200 --> 02:10.200 And I've also been working with companies that run MySQL. 02:10.200 --> 02:12.200 I've been running MySQL for a long time. 02:12.200 --> 02:17.200 So, I know where compatibility matters and where it breaks down. 02:17.200 --> 02:22.200 I'm also working on the Wireshark projects on the MySQL protocol decoder. 02:22.200 --> 02:29.200 So, also there, it's really important to be compatible and figure out what all the features 02:29.200 --> 02:35.200 are that we are actually using on the wire and also to see how different connectors 02:35.200 --> 02:41.200 and the server implementations all behave and talk to each other. 02:41.200 --> 02:50.200 I was also awarded to MySQL Rockstar Award in 2023, so I'm well-verk and nice and old MySQL ecosystem. 02:50.200 --> 03:06.200 So, let's go over the current landscape, right? 03:06.200 --> 03:15.200 I mentioned briefly that compatibility is let's say point of contention. 03:15.200 --> 03:24.200 There is a lot of Postgres and MySQL compatible databases out there. 03:24.200 --> 03:30.200 The absolute success of open source databases has made this possible. 03:30.200 --> 03:44.200 Postgres and MySQL are always the top two databases, and this has caused people to want to claim compatibility with them. 03:44.200 --> 03:49.200 I think this causes a bit of user confusion. 03:49.200 --> 03:55.200 When you see things like Amazon DCQL saying it's Postgres compatible. 03:55.200 --> 04:04.200 When it's a non-relational database that doesn't even support sequences, triggers, views. 04:04.200 --> 04:06.200 So, how is it compatible? 04:06.200 --> 04:12.200 I think it's reasonable for users to be confused when marketing makes these decisions. 04:12.200 --> 04:21.200 And I think this is also a source of brand dilution when something that doesn't work as well as Postgres says it's Postgres compatible. 04:21.200 --> 04:27.200 I think that detracts from the actual merits of the Postgres scale project. 04:27.200 --> 04:31.200 compatibility, of course, is not yes, no, right? 04:31.200 --> 04:33.200 It's not true false, it's not an absolute. 04:33.200 --> 04:40.200 You can be compatible with some aspects of a database and not compatible with other aspects. 04:40.200 --> 04:53.200 Now, this is why we don't like 90% compatible or 99% compatible because the one feature that you want may not be there. 04:53.200 --> 05:05.200 So, even within the same database, you know that different versions of the same database platform might not be 100% compatible. 05:05.200 --> 05:12.200 And as matter of fact, Postgres says clearly that versions are not compatible between them, right? 05:12.200 --> 05:18.200 So, if you have a Postgres 11 and a Postgres 12, they cannot talk to each other. 05:18.200 --> 05:20.200 So, in that sense, they're not compatible. 05:20.200 --> 05:23.200 But also, we're talking about features. 05:23.200 --> 05:26.200 Features are getting added and removed all the time. 05:26.200 --> 05:33.200 And this is what makes versions not exactly compatible with one another. 05:33.200 --> 05:37.200 But what does Postgres or my SQL compatible actually mean? 05:37.200 --> 05:45.200 So, from a user perspective, it's like, will my application be able to work with this database, right? 05:45.200 --> 05:53.200 I bought a piece of software that says it works with Postgres, will it work with this Postgres compatible database? 05:53.200 --> 05:55.200 What database drivers do I need? 05:55.200 --> 06:02.200 What APIs can I use to connect to this database? 06:02.200 --> 06:09.200 Very importantly, can I replicate out of this database to some other system? 06:09.200 --> 06:12.200 And can I replicate from some other system into this database? 06:12.200 --> 06:19.200 So, if it's Postgres compatible, is it replication compatible with Postgres? 06:19.200 --> 06:27.200 And if I use a cloud offering, like, let's say from a hyperscaler, 06:27.200 --> 06:32.200 I buy my SQL or Postgres compatible service, right? 06:32.200 --> 06:34.200 Is that the same experience? 06:34.200 --> 06:37.200 Is it the same as installing it? 06:37.200 --> 06:42.200 Myself on, let's say, bare metal, will it work exactly in the same way? 06:42.200 --> 06:56.200 And of course, can I use the existing tools and drivers that I am used to using with this database flavor for backups, for replication, for any sort of thing? 06:56.200 --> 07:00.200 So, there are two perspectives. 07:00.200 --> 07:03.200 I'm not going to talk about strict Postgres compatibility. 07:03.200 --> 07:12.200 I'm going to talk about more the project governance aspect of this thing. 07:12.200 --> 07:21.200 So, as the Postgres scale project on the Postgres scale community, we got together to discuss this issue, 07:21.200 --> 07:29.200 because we started noticing that people were saying, we're compatible with your thing. 07:29.200 --> 07:35.200 And afterwards Daniel is going to talk about the implementation side. 07:35.200 --> 07:44.200 So, how they actually made sure that TIDB was compatible with my SQL. 07:44.200 --> 07:55.200 Now, at PGCon for you, the biggest Postgres conference in the world, which took place in Riga in Latvia last year. 07:55.200 --> 08:00.200 We had a session that was called the Stublishing the Postgres scale standard, what's Postgres compatible? 08:00.200 --> 08:02.200 It was a working session. 08:02.200 --> 08:09.200 We gathered opinions from the community with discussed things, and we came to some consensus before we left there. 08:09.200 --> 08:16.200 So, the goal was to find a practical framework of criteria and tests for Postgres compatibility. 08:16.200 --> 08:24.200 And people have said that Postgres is becoming so ubiquitous that it's becoming the new Linux, 08:24.200 --> 08:32.200 because everything that's running on a database in production is most likely running on Postgres these days. 08:32.200 --> 08:37.200 So, the word standard, right? 08:37.200 --> 08:45.200 You all know that instead of having a pass or fail, like this is Postgres compatible, this is not Postgres compatible, 08:45.200 --> 08:53.200 we will have a checklist of features, and they, of course, will have weights and importance, 08:53.200 --> 09:01.200 like how important this is to be replication compatible versus have the exact same syntax for this function. 09:02.200 --> 09:11.200 We also have the realistic use case that some Postgres services are managed, right? 09:11.200 --> 09:16.200 So, let's say on a hyperscaler, you wouldn't get super user access. 09:16.200 --> 09:22.200 You would get some user that they created that has some of the privileges, but wouldn't be a super user. 09:22.200 --> 09:27.200 And, of course, you can't read the files from the disk. 09:27.200 --> 09:33.200 You don't have direct access, you only have access to a database connection. 09:33.200 --> 09:37.200 And also, we mentioned silent failures. 09:37.200 --> 09:47.200 It's like, okay, you accept the command, create index, but does it actually create an index when you type it in, right? 09:47.200 --> 09:54.200 So, everything that's in the Postgres documentation part two, SQL commands, I think that's all required, 09:54.200 --> 09:57.200 even rarely used features. 09:57.200 --> 10:03.200 There is implicit behavior in Postgres that you must understand, to understand compatibility. 10:03.200 --> 10:08.200 Things like insert with a select statement that doesn't order by. 10:08.200 --> 10:09.200 What does that do? 10:09.200 --> 10:15.200 Yes, it guarantees that the order of the select will be exactly as you specify or they're by, 10:15.200 --> 10:21.200 but it doesn't guarantee that the data will get written to disk in that order, right? 10:21.200 --> 10:29.200 So, that's an implicit behavior that we understand the Postgres users, but it's not specified anywhere. 10:29.200 --> 10:34.200 So, data types, of course, array by the array, JSONB and so on. 10:34.200 --> 10:37.200 And, of course, we talked about feature dependencies. 10:37.200 --> 10:43.200 So, you cannot have triggers without PLSQL, PLPGSQL, excuse me. 10:43.200 --> 10:51.200 So, of course, you must be compatible with the transaction isolation levels, 10:51.200 --> 10:56.200 the Postgres offers in order to claim compatibility with Postgres. 10:56.200 --> 11:02.200 And, of course, when the software is the same behavior, if you claim to support the same isolation level. 11:02.200 --> 11:10.200 Error codes are a good thing to have, you know, consistency with error codes, system catalogs, 11:10.200 --> 11:16.200 like when you're using a monitoring tool, it attempts to connect to the PG catalog. 11:16.200 --> 11:24.200 So, if the PG catalog isn't there in your compatible version, what is there to monitor? 11:24.200 --> 11:36.200 And, a really important point, if your database allows identifiers that are 256,000 long, right? 11:36.200 --> 11:46.200 And, then you can't backport that data into Postgres, because you've broken compatibility by adding a feature. 11:46.200 --> 11:52.200 So, server string, it must be recognizable by existing tools. 11:52.200 --> 11:54.200 Standard drivers must work. 11:54.200 --> 12:01.200 Can you PG dump data in and out of the database in a standard way? 12:01.200 --> 12:07.200 We said the execution plans don't need to be identical, but they need to do the same thing. 12:07.200 --> 12:17.200 So, if you're using a partition table, it must be able to prune the same way that Postgres can. 12:17.200 --> 12:25.200 So, replication, about the ability we mentioned briefly before, it must be bi-directional. 12:25.200 --> 12:33.200 It must be observable, like from standard views like PG, stat replication. 12:33.200 --> 12:40.200 And, of course, any vendor extensions must not break compatibility with vanilla Postgres. 12:40.200 --> 12:48.200 So, you should be able to replicate two and from a standard Postgres node. 12:48.200 --> 12:56.200 You may or may not support physical replication in the sense of taking the data files and dumping them on some other server. 12:56.200 --> 13:05.200 You may or may not support hybrid clusters, like if vanilla Postgres node and a vendor Postgres node can be talked to each other. 13:05.200 --> 13:09.200 That would be part of the same HA cluster. 13:09.200 --> 13:13.200 And, of course, point in time recovery is a key Postgres feature. 13:13.200 --> 13:21.200 And, for some of these things, including point in time recovery, you need access to the wall files. 13:21.200 --> 13:27.200 So, we said the test suite must reside outside of the Postgres URL code base. 13:27.200 --> 13:33.200 We're testing for compatibility against the specific versions, so are you compatible with Postgres 14? 13:33.200 --> 13:37.200 Are you compatible with Postgres 17 and so forth? 13:37.200 --> 13:47.200 Vendors, of course, must provide build targets because we don't know how they build their flavor of Postgres. 13:47.200 --> 13:55.200 And, of course, lastly, if you fix a Postgres bug, that doesn't mean that the test. 13:55.200 --> 14:01.200 If you fix a MySQL network protocol, it works, but like no one is going to be helped with this. 14:01.200 --> 14:03.200 So, is that Postgres compatible? 14:03.200 --> 14:07.200 Because it just doesn't network protocol. 14:07.200 --> 14:09.200 I don't know. 14:09.200 --> 14:13.200 Yeah. 14:13.200 --> 14:23.200 So, the examples that I built are written with go MySQL, go to link that allows you to build your own MySQL protocol server. 14:23.200 --> 14:27.200 And do whatever you want with the backend. 14:27.200 --> 14:33.200 This is just like a fun project, but a cloud provider could do the same. 14:33.200 --> 14:41.200 They could say, well, we have like a SQL server in the cloud, but we make it talk to MySQL network protocol. 14:41.200 --> 14:43.200 Now we're MySQL compatible. 14:43.200 --> 14:45.200 No, you're not. 14:45.200 --> 14:55.200 So, you should be really careful with what are you calling MySQL compatible and what not. 14:55.200 --> 14:57.200 So, TIDB is written in Go. 14:57.200 --> 15:00.200 So, there are some things that are easy to do in Go. 15:00.200 --> 15:03.200 MySQL is written in C, C++. 15:03.200 --> 15:09.200 So, sometimes that makes things a bit weird because we want to implement exactly the same thing. 15:09.200 --> 15:13.200 Our storage layer is actually written in Rust. 15:13.200 --> 15:21.200 And one of the things that's really different is like if you create a table in MySQL, you use like engine equals inner DB. 15:21.200 --> 15:25.200 Well, with our system, that really doesn't make any sense. 15:25.200 --> 15:29.200 So, we accept the syntax and ignore it. 15:29.200 --> 15:35.200 But we're really careful with accepting syntax and then ignoring it because that might break a lot of things. 15:35.200 --> 15:41.200 Signantly, and that's not what we want, but in some cases to just get a functional thing. 15:41.200 --> 15:46.200 You need this because otherwise, if you do a dumb promo MySQL system, you cannot load it. 15:46.200 --> 15:50.200 And that's one of the important things to do. 15:50.200 --> 15:53.200 Feature compatibility. 15:53.200 --> 15:58.200 We are really open on what features we support, what features we don't support. 15:58.200 --> 16:03.200 So, that comes back to the core and optional part that we talked about before. 16:03.200 --> 16:07.200 Like, we do not do any geospatial. 16:07.200 --> 16:12.200 We do have a vector support, XML, for example. 16:12.200 --> 16:16.200 It's a nice feature, but no one is using it. Everyone is using JSON. 16:16.200 --> 16:19.200 So, we're not supporting XML. 16:19.200 --> 16:23.200 And I think for any database vendor, it's really important to be very open. 16:23.200 --> 16:31.200 If you're compatible, like are you exactly compatible, mostly compatible, what features do you support and don't you support? 16:31.200 --> 16:39.200 Because if you're actually running an application, only then finding out that's really a crappy user experience. 16:39.200 --> 16:47.200 So, in TIDB, we try to be MySQL compatible, but we also do support sequences because sequences are very useful. 16:47.200 --> 16:56.200 But that's not in MySQL. So, we implemented in a MariaDB compatible way, because MariaDB did implement sequences. 16:56.200 --> 17:00.200 So, then you get a bit of like a Frankenstein solution. 17:00.200 --> 17:04.200 But yeah, it's still a feature we want to implement. 17:04.200 --> 17:08.200 And we cannot just say, well, we're not going to implement it because MySQL didn't. 17:08.200 --> 17:11.200 We are not going to limit ourselves. 17:11.200 --> 17:18.200 And I think the whole solution is still MySQL compatible, because you don't have to use this feature. 17:18.200 --> 17:25.200 And it's not going to break replication, et cetera. 17:25.200 --> 17:34.200 So, a few words on like MariaDB, they originally try to be a fully dropping replacement for MySQL. 17:34.200 --> 17:40.200 They were using the same codebase, so you could just easily take your data files, 17:40.200 --> 17:44.200 store MySQL, start MariaDB, everything would work. 17:44.200 --> 17:49.200 Over time this changed, so now you have to export your data, import your data, 17:49.200 --> 17:55.200 some new features that were added by MariaDB and MySQL are implemented in slightly different ways. 17:55.200 --> 17:59.200 So, MariaDB is not fully MySQL compatible. 17:59.200 --> 18:04.200 They have very good documentation on like, what is compatible, what's not compatible. 18:04.200 --> 18:11.200 We did have to fork the MySQL connector J for some TIDB specific authentication mechanisms. 18:11.200 --> 18:16.200 But most of our customers are using plain vanilla connector J because they don't, 18:16.200 --> 18:21.200 they're not interested in this new authentication method that we implemented. 18:21.200 --> 18:34.200 So, that's a place where we are compatible, but we have one optional feature that requires us to do something different. 18:34.200 --> 18:39.200 So, as well as syntax and type friction, reserve keywords is always a pain, 18:39.200 --> 18:43.200 because new versions of the database might introduce new keywords, 18:43.200 --> 18:51.200 so there's certainly your statements that we're working with, a previous version, still working. 18:51.200 --> 18:57.200 And when it comes to explain formats, that's one of the things where we decided not to be compatible with MySQL. 18:57.200 --> 19:04.200 So, if you do an explain in TIDB, like to get the same outputs as MySQL would not really make sense, 19:04.200 --> 19:12.200 because it's being executed in a distributed way, and like to try and morph everything to look exactly the same. 19:12.200 --> 19:16.200 It just doesn't help anyone because you don't get the right information. 19:16.200 --> 19:22.200 And also, the explain that's not anything that an application usually does, it's what a DBA does. 19:22.200 --> 19:33.200 And they are the amount of information that you're showing is way more important than having exactly the same format. 19:34.200 --> 19:41.200 For the JSON data type, my readb implemented this as just a text data field, and they have functions like JSON, 19:41.200 --> 19:43.200 and if check constraints etc. 19:43.200 --> 19:50.200 In mySQL, this was implemented differently with the UUID data type. 19:50.200 --> 20:01.200 With the JSON data type, so the JSON data type actually holds your data, if you insert it automatically, 20:01.200 --> 20:03.200 you have better dates etc. 20:03.200 --> 20:09.200 And then MariaDB did the exact opposite with the UUID data type. 20:09.200 --> 20:16.200 MariaDB as UUID data type, MySQL does not, but they have functions to change things. 20:16.200 --> 20:22.200 So, this is a quick demo on the difference of the explain formats. 20:22.200 --> 20:26.200 It's just a very simple query, so don't give you too much things. 20:27.200 --> 20:30.200 I will just skip over the binlog. 20:32.200 --> 20:35.200 Yeah, errors, limits, and bugs. 20:35.200 --> 20:39.200 So one of the things is that, like, of course, we are a MySQL compatible database, 20:39.200 --> 20:42.200 but we also have to have our own error messages. 20:42.200 --> 20:45.200 We try to have the same error messages for the same situation, 20:45.200 --> 20:49.200 but there might be new and different situations, but there's not really any, like, 20:49.200 --> 20:55.200 like, reins of error codes that MySQL is saying that they're never going to use. 20:55.200 --> 21:07.200 So that's quite difficult to actually avoid conflicts, but we try by just using a quite high number. 21:07.200 --> 21:10.200 That was everything I had for now. Thank you. 21:10.200 --> 21:17.200 I think we have time for questions. 21:17.200 --> 21:20.200 I think we have a little bit of time for questions. 21:20.200 --> 21:22.200 I'm the just side we don't. 21:22.200 --> 21:24.200 So we don't have, you can reach us outside. 21:24.200 --> 21:25.200 Yes.