WEBVTT

00:00.000 --> 00:15.000 Thank you.
00:15.000 --> 00:16.000 Welcome.
00:16.000 --> 00:21.000 Today we will be talking about the remarkable features of multi-vector embedding models.
00:21.000 --> 00:28.000 We will try to answer whether they are a revolution or just an evolution.
00:28.000 --> 00:31.000 My name is Marcin Santos and I'm a Weaviate core engineer.
00:31.000 --> 00:38.000 On a daily basis, I'm working on an open-source vector database called Weaviate.
00:38.000 --> 00:40.000 I'm Roberto.
00:40.000 --> 00:44.000 I'm a research engineer and I'm part of the applied research team.
00:44.000 --> 00:49.000 I have experience in approximate nearest neighbor search and compression.
00:49.000 --> 00:55.000 We'll start by explaining what types of embedding models are out there in the market.
00:55.000 --> 01:01.000 Then we will show how those models are used with vector databases.
01:01.000 --> 01:10.000 We'll then explain how a multi-vector encoding algorithm is changing the way we work with multi-vectors.
01:10.000 --> 01:16.000 And our presentation will end with a short demo.
01:17.000 --> 01:28.000 So embedding models are models which have only one task, and that task is to create a vector representation of the processed input.
01:28.000 --> 01:36.000 For example, when we send some sentence into one, we get back an embedding vector of a given dimensionality.
01:36.000 --> 01:44.000 All of the modern embedding models are trained with the Matryoshka Representation Learning technique.
01:44.000 --> 01:48.000 It means that during the learning process we are using multiple loss functions.
01:48.000 --> 01:58.000 And that results in a model that is able to output embeddings of varying dimensionalities.
01:58.000 --> 02:03.000 Before that method, we would have to train one model for each dimensionality.
02:03.000 --> 02:09.000 When you are starting your journey with AI, perhaps you are building some RAG pipelines.
02:09.000 --> 02:20.000 You are most certainly using models from OpenAI, Google, or Cohere that return regular, single-vector embeddings.
02:20.000 --> 02:24.000 But there is also another family of embedding models.
02:24.000 --> 02:26.000 They are called multi-vector embedding models.
02:26.000 --> 02:33.000 The difference is that for a processed input, we get back an array of vectors.
02:33.000 --> 02:36.000 An array of so-called token-level embeddings.
02:36.000 --> 02:42.000 And as you can see here, we are working with a multi-vector, not with one vector.
02:42.000 --> 02:53.000 And that also makes it a little bit harder for us to calculate a similarity metric for those embeddings.
02:53.000 --> 03:00.000 Because now we have to take into account all token-level embeddings for both the query and the document.
03:00.000 --> 03:04.000 And we have to calculate a similarity metric for each of the pairs.
03:04.000 --> 03:08.000 And then pick the highest results and sum them.
03:08.000 --> 03:12.000 This is then our similarity score, our MaxSim score.
03:12.000 --> 03:20.000 The formula which we use in order to find the similar vectors is called MaxSim.
03:20.000 --> 03:24.000 And MaxSim is the basis for late interaction.
03:24.000 --> 03:29.000 Late interaction means that when we are working with multi-vectors, we don't have to pool them into one vector
03:29.000 --> 03:33.000 and then calculate some similarity metric.
03:33.000 --> 03:36.000 We can just work with those arrays of vectors.
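To make the MaxSim scoring just described concrete, here is a minimal numpy sketch. The names and sizes are illustrative, and it assumes both multi-vectors are L2-normalized row matrices, so a dot product acts as cosine similarity:

```python
import numpy as np

def maxsim(query_mv: np.ndarray, doc_mv: np.ndarray) -> float:
    """MaxSim: for each query token embedding, take its best-matching
    document token embedding, then sum those maxima.

    query_mv: (n_query_tokens, dim), doc_mv: (n_doc_tokens, dim),
    both assumed L2-normalized so dot product = cosine similarity.
    """
    sims = query_mv @ doc_mv.T             # all pairwise similarities
    return float(sims.max(axis=1).sum())   # best match per query token, summed

# Toy usage with random stand-ins for token-level embeddings
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(9, 128)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim(q, d))
```

Note that no pooling happens anywhere: the two arrays of vectors interact directly, which is exactly what "late interaction" refers to.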
03:36.000 --> 03:43.000 The coolest family of multi-vector embedding models are the multi-vector vision embedding models.
03:43.000 --> 03:50.000 Those that understand text and images, like ColPali, ColQwen, and ColNomic.
03:50.000 --> 03:55.000 This image is taken from the ColPali paper.
03:55.000 --> 04:03.000 You can see that there is an example of standard retrieval when you are working with a complex document.
04:03.000 --> 04:09.000 When you want to index that data, you have to first perform some layout detection.
04:09.000 --> 04:14.000 You have to find where the images are and where the text is, then extract them separately.
04:14.000 --> 04:20.000 Then, of course, you have to pre-process the text, I mean, chunk it.
04:20.000 --> 04:22.000 And then you can calculate some embeddings.
04:22.000 --> 04:28.000 But with multi-vector vision embedding models, you can treat a page as one image.
04:28.000 --> 04:32.000 And then send that image of a page over to the model.
04:32.000 --> 04:39.000 The model then takes the image and divides it into small squares, or patches.
04:39.000 --> 04:44.000 And for each patch of that image, it calculates a token-level embedding.
04:44.000 --> 04:50.000 So then you send, for example, a query, which also gets translated into a multi-vector.
04:50.000 --> 04:57.000 When we calculate those similarity metrics for each of those token-level embedding pairs,
04:57.000 --> 05:05.000 and we project the results in the form of a heat map over the image,
05:05.000 --> 05:11.000 you will see that the parts most relevant to our query are highlighted.
05:11.000 --> 05:20.000 Our query was which hour of the day had the highest electricity generation in 2019, and those parts are highlighted.
05:20.000 --> 05:26.000 And this is the way we can visualize how those multi-vector vision embedding models are able to see
05:26.000 --> 05:28.000 what is in the picture.
05:28.000 --> 05:37.000 So when to use those models: they are best used with visually rich documents, with screenshots and presentations.
05:37.000 --> 05:44.000 Now we are about to go over how those models are used with vector databases.
05:44.000 --> 05:46.000 Thank you, Marcin.
05:46.000 --> 05:50.000 So here we have two pictures of two cute dogs.
05:50.000 --> 05:55.000 And what you can see is that we have the vector representation of each image.
05:55.000 --> 06:01.000 And as Marcin just said, we can have those vector representations regardless of the modality of our data.
06:01.000 --> 06:03.000 So let's say you also have text or videos.
06:03.000 --> 06:11.000 You can create those embeddings, and similar data, or semantically similar data, will have similar vectors.
06:11.000 --> 06:17.000 So here we have a machine learning model which is just an embedding function.
06:17.000 --> 06:21.000 And as you can see, here we have several dots, which are vectors.
06:21.000 --> 06:24.000 Here we are just in a 3-dimensional space.
06:24.000 --> 06:29.000 And for example, meat and chicken, which are quite similar, are close in this vector space.
06:29.000 --> 06:32.000 And here you also have apple and banana.
06:32.000 --> 06:34.000 So you have this area with fruit.
06:34.000 --> 06:37.000 So let's suppose you have your own data.
06:37.000 --> 06:42.000 For example, a collection of wines. You will be computing the embedding of your data.
06:42.000 --> 06:46.000 So whether it's image or text, you will be computing the embedding.
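A minimal sketch of this embed-then-search flow, before any vector database enters the picture. The `embed` function here is only a deterministic placeholder (not a real model API), so the actual ranking is meaningless; the point is the shape of the pipeline:

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 128) -> np.ndarray:
    """Placeholder for a real embedding model: a deterministic random unit
    vector per input. Swap in a call to your actual model here."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

corpus = ["Chardonnay", "Cabernet Sauvignon", "banana", "roast chicken"]
vectors = np.stack([embed(t) for t in corpus])

query = embed("wine for seafood")
scores = vectors @ query               # cosine similarity (unit vectors)
top_k = np.argsort(scores)[::-1][:2]   # exhaustive, "flat index" style search
print([corpus[i] for i in top_k])
```

The exhaustive scan at the end is exactly what a flat index does; the vector databases discussed next exist to make this step scale.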
06:46.000 --> 06:49.000 And here you need a vector DB to store that data.
06:49.000 --> 06:54.000 Because whenever you have a query, you want to search for the most similar vectors in an efficient way.
06:54.000 --> 06:57.000 So let's suppose your query is wine for seafood.
06:57.000 --> 07:00.000 You will be computing the embedding of your query.
07:00.000 --> 07:04.000 And then you will use the vector DB to search for the top k elements.
07:04.000 --> 07:07.000 So for example here, a Chardonnay wine.
07:07.000 --> 07:10.000 There are several types of vector indexes.
07:10.000 --> 07:14.000 For example, Weaviate currently supports HNSW and a flat index.
07:14.000 --> 07:18.000 HNSW is a fully in-memory, graph-based index.
07:18.000 --> 07:21.000 And the flat index is just exhaustive search.
07:21.000 --> 07:23.000 So you are computing all the distances.
07:23.000 --> 07:26.000 On top of them you can also have quantization.
07:26.000 --> 07:31.000 Quantization is a technique that allows you to compress the memory used by those vectors.
07:31.000 --> 07:35.000 Because by default, you are using 32-bit floating point.
07:35.000 --> 07:37.000 And this can be expensive.
07:37.000 --> 07:41.000 So you can use quantization like PQ, BQ, SQ, or RQ.
07:41.000 --> 07:43.000 They have different trade-offs.
07:43.000 --> 07:48.000 So for example, if you compress more, you will lose more information.
07:48.000 --> 07:50.000 But you are saving more memory.
07:50.000 --> 07:56.000 On the other hand, if you compress less, you will still have high quality results.
07:56.000 --> 08:02.000 But today we will focus on HNSW with multi-vectors, specifically with MUVERA.
08:02.000 --> 08:07.000 And before jumping into it, here is an example of how HNSW works.
08:07.000 --> 08:12.000 HNSW stands for Hierarchical Navigable Small World graph.
08:12.000 --> 08:15.000 And as you can see here, we have just three layers.
08:15.000 --> 08:17.000 And in each layer, we have nodes.
08:17.000 --> 08:22.000 And whenever we have to perform a search operation, we will start from the topmost layer.
08:22.000 --> 08:25.000 And we will be computing the distances in that layer.
08:25.000 --> 08:29.000 Once we find the closest vector, we will go down to the next layer, and so on.
08:29.000 --> 08:36.000 Until we reach layer 0, where we will be in a local region, and we will be looking for the most similar vectors.
08:36.000 --> 08:39.000 So let's jump into MUVERA.
08:39.000 --> 08:45.000 MUVERA is a research paper that was released by a Google research group.
08:45.000 --> 08:51.000 And the idea here is to encode those multi-vectors that Marcin described before into single vectors.
08:51.000 --> 08:55.000 These are called FDEs, fixed dimensional encodings.
08:55.000 --> 08:56.000 Why do we want to do this?
08:56.000 --> 09:01.000 Because if you want to compute similarity with multi-vectors, this can be very expensive.
09:01.000 --> 09:04.000 It's quadratic in the number of vectors you have.
09:04.000 --> 09:08.000 And the idea of MUVERA is that given two FDEs, so two single vectors,
09:08.000 --> 09:13.000 when you compute the dot product between those two vectors,
09:13.000 --> 09:16.000 you will have an approximation of the MaxSim score.
09:16.000 --> 09:23.000 And the dot product can be computed efficiently, because it can be highly optimized using, for example, SIMD instructions.
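Before moving on to MUVERA's steps, here is a simplified sketch of the layered HNSW search described a moment ago. Real HNSW keeps a beam of candidates (the ef parameter) and returns the top k at layer 0; this toy version only shows the greedy descent, with each layer represented as an adjacency dict, and all names are illustrative:

```python
import numpy as np

def greedy_search(query, start, graph, vectors):
    """Walk to the neighbor closest to the query until no neighbor improves."""
    best, best_dist = start, np.linalg.norm(vectors[start] - query)
    improved = True
    while improved:
        improved = False
        for nb in graph.get(best, ()):
            d = np.linalg.norm(vectors[nb] - query)
            if d < best_dist:
                best, best_dist, improved = nb, d, True
    return best

def hnsw_descend(query, layers, vectors, entry_point):
    """Start at the topmost (sparsest) layer and refine the entry point
    layer by layer until we land in a local region of layer 0."""
    node = entry_point
    for graph in layers:   # layers ordered top -> bottom (layer 0 last)
        node = greedy_search(query, node, graph, vectors)
    return node            # nearest node found by greedy routing

# Tiny usage example: 3 layers over 6 points on a line
vectors = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
layers = [
    {0: [5], 5: [0]},                                       # top layer: 2 nodes
    {0: [2, 5], 2: [0, 5], 5: [0, 2]},                      # middle layer
    {i: [max(i - 1, 0), min(i + 1, 5)] for i in range(6)},  # layer 0
]
print(hnsw_descend(np.array([3.2]), layers, vectors, entry_point=0))  # -> 3
```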
09:23.000 --> 09:26.000 So MUVERA has three main steps.
09:26.000 --> 09:32.000 You have space partitioning, dimensionality reduction, and repetition of the first and second steps.
09:32.000 --> 09:35.000 For each of them, you have a parameter. The first is k_sim.
09:35.000 --> 09:40.000 It tells you into how many parts you want to divide your space.
09:40.000 --> 09:47.000 And then there is the number of repetitions, which by default is 10.
09:47.000 --> 09:51.000 So let's suppose you have the following multi-vector.
09:51.000 --> 09:57.000 You have five vectors, and each of them has dimensionality 128.
09:57.000 --> 10:03.000 If you are using, for example, HNSW without MUVERA, you have to add all those vectors just for one document.
10:03.000 --> 10:09.000 This can be expensive, because HNSW is fully in memory, so this can bring the cost up.
10:09.000 --> 10:12.000 So the first step is space partitioning.
10:12.000 --> 10:17.000 And here we are using a technique that is called SimHash, and it's based on locality sensitive hashing.
10:17.000 --> 10:22.000 So in other words, based on k_sim, which in this case is three,
10:22.000 --> 10:25.000 you are sampling three random Gaussian vectors.
10:25.000 --> 10:31.000 And whenever you want to find the corresponding cluster for a vector, for example for vector one,
10:31.000 --> 10:37.000 you will be computing the dot product between vector one and those three Gaussian vectors.
10:37.000 --> 10:45.000 You then look at the signs of the dot products. So for example, for v1, you have all negative values.
10:45.000 --> 10:49.000 You will encode it as 0 0 0, which is cluster 0.
10:49.000 --> 10:56.000 For vector 5, in this case, the first two dot products were negative
10:56.000 --> 10:58.000 and the third one was positive.
10:58.000 --> 11:01.000 So the encoding will be 0 0 1, and you add it to cluster 1.
11:01.000 --> 11:06.000 Here, you could also use k-means. The main difference is that k-means requires training.
11:06.000 --> 11:13.000 This way, you can do the partitioning without having any training data, and it's also data oblivious.
11:13.000 --> 11:18.000 So in this case, if you have k_sim equal to three, you can end up with eight clusters.
11:18.000 --> 11:21.000 What you will be doing then is to just flatten them.
11:21.000 --> 11:27.000 In case you have more than one vector in a cluster, you will be applying mean pooling.
11:27.000 --> 11:33.000 So for example, for cluster 0, you have v1 and v4, so we will just compute the mean for each entry.
11:33.000 --> 11:39.000 Then you have cluster 1 with just one vector, and as you can see, we have some empty clusters.
11:39.000 --> 11:42.000 The next step is filling the empty clusters.
11:42.000 --> 11:50.000 So for those that are empty, we substitute those zeros with the closest non-empty cluster.
11:50.000 --> 11:55.000 So for example, in this case for cluster 2, the closest non-empty one is cluster 1.
11:55.000 --> 11:58.000 So we copy it here, and we do the same for the other empty clusters.
11:58.000 --> 12:01.000 For cluster 5, the closest vector was v3.
12:01.000 --> 12:08.000 As you can see, the length of this vector is the number of clusters times dim, so 8 times 128.
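A small numpy sketch of this first step, using the talk's example sizes (five 128-dimensional vectors, k_sim = 3). The Hamming-distance rule used for filling empty clusters is one reasonable reading of "closest non-empty cluster"; the details may differ from the paper and from Weaviate's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
k_sim, dim = 3, 128
doc = rng.normal(size=(5, dim))            # 5 token-level embeddings
gaussians = rng.normal(size=(k_sim, dim))  # 3 random Gaussian directions

# SimHash: the sign pattern of the dot products gives each vector a cluster id
bits = (doc @ gaussians.T > 0).astype(int)           # shape (5, 3)
cluster_ids = bits @ (2 ** np.arange(k_sim)[::-1])   # e.g. 0 0 1 -> cluster 1

n_clusters = 2 ** k_sim                    # 8 clusters
blocks = np.zeros((n_clusters, dim))
non_empty = []
for c in range(n_clusters):
    members = doc[cluster_ids == c]
    if len(members) > 0:                   # mean-pool vectors sharing a cluster
        blocks[c] = members.mean(axis=0)
        non_empty.append(c)

# Fill each empty cluster from the nearest non-empty one (Hamming distance
# between the clusters' bit patterns; an assumption for this sketch)
for c in range(n_clusters):
    if c not in non_empty:
        nearest = min(non_empty, key=lambda o: bin(c ^ o).count("1"))
        blocks[c] = blocks[nearest]

flattened = blocks.reshape(-1)             # length = 8 * 128, as in the talk
print(flattened.shape)
```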
12:08.000 --> 12:14.000 In the second step, the dimensionality reduction, we want to shrink those vectors.
12:14.000 --> 12:19.000 So we take the vector coming from the previous operation.
12:19.000 --> 12:24.000 And here we will be using random matrices that are made of plus one and minus one.
12:24.000 --> 12:31.000 In this case, the matrix dimensionality is dim times d_proj, so 128 times 16.
12:31.000 --> 12:41.000 So when you compute the vector-matrix operation, you end up with a new cluster block of smaller dimensionality, in this case d_proj.
12:41.000 --> 12:43.000 And you will be just concatenating them.
12:43.000 --> 12:49.000 So the new resulting vector will be the number of clusters times 16, not 128 anymore.
12:49.000 --> 12:53.000 As you can see, we have done some approximation here and in the previous steps.
12:53.000 --> 13:00.000 So in order to reduce the randomness coming from those approximations, we repeat this process several times.
13:00.000 --> 13:04.000 So what are the pros of MUVERA, and is it worth using?
13:04.000 --> 13:09.000 Actually, yes, if you are dealing with a lot of documents. In the demo we will see, for example,
13:09.000 --> 13:13.000 only one document, but we will see how many vectors we end up with.
13:13.000 --> 13:16.000 So let's suppose you are working with hundreds of thousands of documents.
13:16.000 --> 13:18.000 You need some sort of filtering approach.
13:18.000 --> 13:23.000 So you can use MUVERA with HNSW in memory to create a candidate set.
13:23.000 --> 13:25.000 And then you can rescore those candidates.
13:25.000 --> 13:30.000 So the benefits of MUVERA: the first one is improved import times.
13:30.000 --> 13:34.000 You just need to preprocess those vectors and store them in HNSW.
13:34.000 --> 13:39.000 Then you have reduced memory requirements, because if you have to build HNSW in this case,
13:39.000 --> 13:44.000 you would have five nodes, and for each of them you also have to find the connections.
13:44.000 --> 13:48.000 With MUVERA, it's just one node, and you find its corresponding connections.
13:48.000 --> 13:53.000 One consequence is clearly worse recall, but this can be fixed with rescoring.
13:53.000 --> 13:56.000 So you will be using MUVERA to build a candidate set.
13:56.000 --> 14:02.000 And only for those candidates, you will be reading the original multi-vectors and computing the MaxSim score.
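Putting the three steps together, here is a hedged end-to-end sketch of building an FDE: SimHash partitioning with mean pooling as above, then the random plus/minus one projection down to d_proj, repeated and concatenated. The 1/sqrt(d_proj) scaling and the pooling choice follow a common presentation of the paper (which, for instance, pools differently on the query side), so Weaviate's actual implementation may differ:

```python
import numpy as np

def fde_one_rep(vectors, gaussians, proj):
    """One repetition: partition -> mean pool -> fill empty -> project."""
    k_sim = gaussians.shape[0]
    bits = (vectors @ gaussians.T > 0).astype(int)
    ids = bits @ (2 ** np.arange(k_sim)[::-1])
    n_clusters = 2 ** k_sim
    blocks = np.zeros((n_clusters, vectors.shape[1]))
    non_empty = []
    for c in range(n_clusters):
        members = vectors[ids == c]
        if len(members) > 0:
            blocks[c] = members.mean(axis=0)
            non_empty.append(c)
    for c in range(n_clusters):
        if c not in non_empty:  # fill from Hamming-closest non-empty cluster
            blocks[c] = blocks[min(non_empty, key=lambda o: bin(c ^ o).count("1"))]
    # dimensionality reduction: each dim-sized block becomes d_proj entries
    reduced = blocks @ proj / np.sqrt(proj.shape[1])
    return reduced.reshape(-1)            # n_clusters * d_proj entries

def fde(vectors, k_sim=3, d_proj=16, reps=10, seed=0):
    """Fixed Dimensional Encoding: concatenate `reps` independent repetitions
    to average out the randomness of hashing and projection."""
    rng = np.random.default_rng(seed)
    dim = vectors.shape[1]
    parts = []
    for _ in range(reps):
        gaussians = rng.normal(size=(k_sim, dim))
        proj = rng.choice([-1.0, 1.0], size=(dim, d_proj))
        parts.append(fde_one_rep(vectors, gaussians, proj))
    return np.concatenate(parts)          # length = reps * 2**k_sim * d_proj

doc = np.random.default_rng(1).normal(size=(5, 128))
print(fde(doc).shape)                     # (10 * 8 * 16,) = (1280,)
```

For the dot-product approximation of MaxSim to work, queries and documents must of course be encoded with the same random partitions and projections, hence the fixed seed.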
14:02.000 --> 14:05.000 But now it's time for the demo.
14:05.000 --> 14:08.000 Yeah, thank you, Roberto.
14:08.000 --> 14:11.000 So for today we have prepared a demo.
14:11.000 --> 14:15.000 If you want to run it locally, here is the link to the project.
14:15.000 --> 14:19.000 We will be using Weaviate to store the multi-vector embeddings,
14:19.000 --> 14:24.000 and the ColQwen 2.5 ColVision model.
14:24.000 --> 14:29.000 So let's first load the model.
14:30.000 --> 14:33.000 As you can see here, we have two methods.
14:33.000 --> 14:39.000 One is for vectorizing the image, and the second one we will use to vectorize the text.
14:39.000 --> 14:43.000 Both of those methods output a multi-vector.
14:43.000 --> 14:46.000 We will try to index this document.
14:46.000 --> 14:50.000 You can see that this is an NVIDIA investor presentation.
14:50.000 --> 14:57.000 It's a very complex document.
14:57.000 --> 15:02.000 It has a mix of text and images.
15:02.000 --> 15:06.000 There are also some pages with charts.
15:06.000 --> 15:10.000 The idea here is to treat one page as an image.
15:10.000 --> 15:13.000 And one page will be our one chunk.
15:13.000 --> 15:17.000 We then send that image over to the ColQwen model to get the multi-vector embeddings
15:17.000 --> 15:22.000 and try to find something on those pages.
15:22.000 --> 15:26.000 So the model is loaded.
15:26.000 --> 15:31.000 Now, this is a helper method which we will use to generate the embeddings.
15:31.000 --> 15:34.000 Let's start Weaviate.
15:34.000 --> 15:38.000 Let's try to connect to it.
15:38.000 --> 15:40.000 Let's see if it's running.
15:40.000 --> 15:41.000 It's running.
15:41.000 --> 15:44.000 So first we need to create a collection.
15:44.000 --> 15:48.000 We are creating a collection with one property, page number.
15:48.000 --> 15:54.000 And the most important part is here, that we are defining a vector index.
15:54.000 --> 15:56.000 A multi-vector vector index.
15:56.000 --> 15:59.000 And we are enabling MUVERA encoding here.
15:59.000 --> 16:03.000 It means that when we send multi-vectors to Weaviate,
16:03.000 --> 16:09.000 those will automatically get translated to FDEs using the MUVERA algorithm.
16:09.000 --> 16:12.000 So let's create the collection.
16:12.000 --> 16:15.000 Let's generate the embeddings for our document.
16:15.000 --> 16:20.000 And now you can see here that we have 32 pages.
16:20.000 --> 16:26.000 And for each page we have generated a multi-vector
16:26.000 --> 16:30.000 that consists of 731 token-level embeddings.
16:30.000 --> 16:35.000 So in total for 32 pages we have over 23,000 vectors.
16:35.000 --> 16:37.000 But we have MUVERA.
16:37.000 --> 16:41.000 Those 23,000 vectors will get translated into only 32 FDEs.
16:41.000 --> 16:46.000 So you can see how big the reduction in resources is.
16:46.000 --> 16:50.000 So let's save those multi-vectors into Weaviate.
16:50.000 --> 16:52.000 And now here's another helper method.
16:52.000 --> 16:58.000 I will use that method to generate the query embedding using the ColQwen model.
16:58.000 --> 17:00.000 So the query will be passed to it.
17:00.000 --> 17:02.000 It will generate a multi-vector.
17:02.000 --> 17:06.000 And then I will use this multi-vector in order to perform a vector search.
17:06.000 --> 17:09.000 And I will just present the first result.
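Under the hood, the search the demo runs amounts to the two-stage pipeline Roberto described earlier: rank every page by a single FDE dot product, then rescore only a short candidate list with exact MaxSim. A hedged numpy sketch with random stand-ins for the real embeddings and FDEs (the `candidates` and `k` parameters are illustrative):

```python
import numpy as np

def maxsim(query_mv, doc_mv):
    """Exact late-interaction score (see the earlier sketch)."""
    return float((query_mv @ doc_mv.T).max(axis=1).sum())

def two_stage_search(query_mv, query_fde, doc_fdes, doc_mvs, candidates=10, k=3):
    """Stage 1: one cheap dot product per document against the FDE index.
       Stage 2: exact MaxSim rescoring, but only for the small candidate set."""
    approx = doc_fdes @ query_fde                      # FDE scores, (n_docs,)
    shortlist = np.argsort(approx)[::-1][:candidates]  # best approximate scores
    exact = {i: maxsim(query_mv, doc_mvs[i]) for i in shortlist}
    return sorted(exact, key=exact.get, reverse=True)[:k]

# Toy data shaped like the demo: 32 "pages", 731 token embeddings each,
# plus precomputed 2560-entry FDEs (random stand-ins, not real encodings)
rng = np.random.default_rng(0)
doc_mvs = [rng.normal(size=(731, 128)) for _ in range(32)]
doc_fdes = rng.normal(size=(32, 2560))
query_mv = rng.normal(size=(20, 128))
query_fde = rng.normal(size=2560)
print(two_stage_search(query_mv, query_fde, doc_fdes, doc_mvs))
```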
17:09.000 --> 17:14.000 So let's look for a list of countries using AI.
17:14.000 --> 17:18.000 You can see that I got a page which is like a roadmap.
17:18.000 --> 17:23.000 And there are those countries; that's what I was looking for.
17:23.000 --> 17:27.000 Let's look at what is in NVIDIA's infrastructure roadmap.
17:27.000 --> 17:31.000 Maybe there's information about that. Of course there is.
17:31.000 --> 17:38.000 So I just wanted to make the point that those multi-vector embeddings are good
17:38.000 --> 17:42.000 also with pages that consist only of text.
17:42.000 --> 17:48.000 And of course they really shine with mixed content.
17:48.000 --> 17:54.000 So let's make another request: revenue and income charts.
17:54.000 --> 18:00.000 You can see that we got a page that describes exactly what we are searching for.
18:00.000 --> 18:06.000 I think that would be very hard to achieve using regular embedding models,
18:06.000 --> 18:10.000 because it's really hard to extract that data.
18:10.000 --> 18:14.000 So let's make two last queries.
18:14.000 --> 18:18.000 The first in Italian and the second one in Polish.
18:18.000 --> 18:24.000 First we will ask what are the industries that will benefit the most from AI.
18:24.000 --> 18:28.000 You can see that we got a page that describes those industries.
18:28.000 --> 18:32.000 Then we ask what are the plans for dividends.
18:32.000 --> 18:39.000 And you can see that in 2025 there will be over 300 million dollars in dividends.
18:39.000 --> 18:45.000 So you can see those models are also really good with multilingual data.
18:45.000 --> 18:49.000 So.
18:49.000 --> 18:51.000 Thanks, Marcin, for your demo.
18:51.000 --> 18:57.000 So we have just seen how we can build an AI-powered OCR pipeline, and the steps are pretty simple.
18:57.000 --> 19:03.000 The first one is the extraction of the document pages, just converting each one to an image.
19:03.000 --> 19:08.000 Then we use a multi-vector vision embedding model to create those vectors.
19:08.000 --> 19:12.000 And then we can store them in a vector DB using the MUVERA encoding.
19:12.000 --> 19:15.000 And then you are ready for your semantic search.
19:15.000 --> 19:19.000 So if you liked our presentation, you can connect with us.
19:19.000 --> 19:23.000 And also if you are interested in more details about MUVERA,
19:24.000 --> 19:30.000 we have a blog post, and we also have a podcast episode with one of the authors of the paper.
19:30.000 --> 19:33.000 Thank you for your attention.
19:54.000 --> 19:56.000 Yeah, actually this is a good question.
19:56.000 --> 20:01.000 So the question was what our advice is for the parameters of MUVERA.
20:01.000 --> 20:08.000 So by default, we are using k_sim equal to 4, the projection d_proj equal to 16, and the number of repetitions equal to 10.
20:08.000 --> 20:14.000 So in this case, the dimensionality of the vector will be 10 times 16 times 2 to the power of 4.
20:14.000 --> 20:21.000 And as you can see, this is a longer vector with respect to, let's say, the regular ones, which are just 128.
20:21.000 --> 20:25.000 So in this case, I would say it really depends on your data distribution.
20:25.000 --> 20:32.000 So for example, if you have visually rich documents, where you have charts and tables,
20:32.000 --> 20:35.000 you may want them to be as accurate as possible.
20:35.000 --> 20:37.000 So you can increase those parameters.
20:37.000 --> 20:41.000 And actually, there is something that you can also do on top of that.
20:41.000 --> 20:45.000 The vector that you get, the FDE, is uncompressed.
20:45.000 --> 20:54.000 But if you want, you can also use quantization. I would not suggest BQ, which shrinks the data a lot,
20:54.000 --> 21:00.000 but most probably SQ or RQ would fit well here,
21:00.000 --> 21:06.000 where you are just going from 4 bytes per entry to 1 byte per entry.
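For reference, the arithmetic behind those defaults, with rough per-document storage figures added as an illustration (the byte counts are assumptions, not figures from the talk):

```python
# Default MUVERA parameters mentioned in the talk
k_sim, d_proj, reps = 4, 16, 10

fde_dim = reps * d_proj * 2 ** k_sim   # 10 * 16 * 16 = 2560 entries
print(fde_dim)                         # 2560, vs. 128 for one token embedding

# Rough storage per document FDE (illustrative only)
print(fde_dim * 4)                     # float32: 10240 bytes (~10 KB)
print(fde_dim * 1)                     # 1 byte per entry (e.g. SQ/RQ): 2560 bytes
```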
21:06.000 --> 21:07.000 Yes?
21:08.000 --> 21:17.000 [Audience question, partly inaudible: have you compared this against using a standard single-vector embedding and vector search for the first retrieval stage?]
21:17.000 --> 21:26.000 Also, this is a good question. So the question is whether we have tried any benchmarking doing the first retrieval stage with single vectors and the second one with multi-vectors, right?
21:26.000 --> 21:27.000 Okay.
21:27.000 --> 21:30.000 So actually, we have also tried something like this.
21:30.000 --> 21:36.000 I didn't see any, let's say, benefit in doing this.
21:36.000 --> 21:42.000 Personally, I think the main challenge there would be dealing with those chunking techniques.
21:42.000 --> 21:47.000 Because personally, what I like most about this type of approach is the simplicity,
21:47.000 --> 21:54.000 because if you go with the regular approach, you also have to do those steps of chunking and layout detection.
21:54.000 --> 21:59.000 And in this case, you can just focus on one multi-vector per page.
22:01.000 --> 22:02.000 Okay.
22:02.000 --> 22:24.000 [Audience question, partly inaudible: how does this approach's performance compare to the standard way of building the HNSW index for multi-vectors, which is basically having multiple vectors under the same label, and to using a single vector?]
22:25.000 --> 22:38.000 Okay. So the question is how this compares to HNSW with just multi-vectors, and also with single vectors, right?
22:38.000 --> 22:46.000 [Audience clarification, partly inaudible: one option is the multi-vector construction, which is basically having multiple vectors under one common label.]
22:46.000 --> 22:49.000 So do you mean like a filtered search?
22:49.000 --> 22:53.000 Like each label is used as a filter, or...
22:53.000 --> 23:03.000 [Audience, partly inaudible: ...the index allows you to have multiple vectors under the same label.]
23:03.000 --> 23:08.000 Okay. So in that case, you are also, let's say, building a candidate set for those vectors,
23:08.000 --> 23:10.000 and then you will be computing the rescoring.
23:10.000 --> 23:16.000 So for example, in this case, maybe two or more nodes of the HNSW are part of the same document, right?
23:16.000 --> 23:18.000 So they are just two vectors.
23:18.000 --> 23:19.000 Okay.
23:19.000 --> 23:24.000 Yeah. So for this, yes, actually, we also have support for this, where if you don't pass
23:24.000 --> 23:30.000 the MUVERA encoding, you can use this. And personally, I think this gives slightly better quality,
23:30.000 --> 23:35.000 but the main consequence of this is the memory pressure, because, for example, in this case,
23:35.000 --> 23:40.000 you have seen you have over 700 nodes for just one chunk.
23:40.000 --> 23:44.000 So building the HNSW, since it's fully in memory, can be expensive.
23:45.000 --> 23:54.000 [Audience comment, partly inaudible: you could reduce that by not keeping the vectors themselves in memory.]
23:54.000 --> 23:59.000 That's true, but you still need memory for the connections.
23:59.000 --> 24:05.000 Because whether you are using BQ or uncompressed vectors, you still, for example, have the M parameter,
24:05.000 --> 24:09.000 which controls how many connections you have for each node.
24:09.000 --> 24:13.000 So that can still be expensive, but yes, we also have support for this.
24:13.000 --> 24:23.000 You can use the regular multi-vector one, and we have also done some experiments on this.
24:23.000 --> 24:27.000 Personally, I didn't find any big benefits in those approaches,
24:27.000 --> 24:38.000 compared to just HNSW with multi-vectors, or HNSW with multi-vectors and MUVERA.
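As a back-of-the-envelope illustration of that memory-pressure point, using the demo's numbers (32 pages, roughly 731 token embeddings per page, 128 dimensions); the M value and per-connection cost here are assumptions for the sketch, not Weaviate's actual figures:

```python
# Demo numbers
pages, tokens_per_page, dim = 32, 731, 128
fde_dim = 2560            # with k_sim=4, d_proj=16, reps=10
M = 32                    # assumed max connections per HNSW node
conn_bytes = M * 8        # assumed 8 bytes per neighbor reference

# Plain multi-vector HNSW: one graph node per token embedding
nodes_plain = pages * tokens_per_page               # 23,392 nodes
mem_plain = nodes_plain * (dim * 4 + conn_bytes)    # vectors + edges

# MUVERA: one FDE node per page
nodes_muvera = pages                                # 32 nodes
mem_muvera = nodes_muvera * (fde_dim * 4 + conn_bytes)

print(nodes_plain, nodes_muvera)
print(f"{mem_plain / 1e6:.1f} MB vs {mem_muvera / 1e6:.2f} MB")
```

Even though each FDE is 20 times longer than a single token embedding, having three orders of magnitude fewer graph nodes (and their connections) is what drives the savings.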
24:38.000 --> 24:39.000 Yes?
24:40.000 --> 24:53.000 [Audience question, partly inaudible: a question about the difference in power between a multi-vector and a single-vector representation.]
24:53.000 --> 25:04.000 Does that mean that it's potentially possible to learn a single-vector representation that is equivalent to a multi-vector one, so you don't need the multi-vector anymore?
25:05.000 --> 25:07.000 Correct me if I'm wrong.
25:07.000 --> 25:13.000 The question is, can we take this idea of converting multi-vectors into single vectors
25:13.000 --> 25:18.000 and do it inside the embedding model, so we just work with single vectors?
25:18.000 --> 25:19.000 Yes.
25:19.000 --> 25:24.000 This is something that can be done, and I think it would be effective, because when you are doing
25:24.000 --> 25:28.000 the training of the embedding model, you will improve the quality.
25:28.000 --> 25:32.000 So that's something I think would definitely be a benefit.
25:33.000 --> 25:34.000 I can add to that.
25:34.000 --> 25:40.000 There are document screenshot embedding models that are virtually doing that.
25:40.000 --> 25:50.000 They are creating one vector out of the multi-vectors, and they just send you back
25:50.000 --> 25:55.000 that one vector. So it's possible with those types of models.
26:02.000 --> 26:07.000 How does it work with tables?
26:07.000 --> 26:24.000 [Audience question, partly inaudible: a table can take up a large part of the page, but it gets cut into patches and compressed into the multi-vector; how well does that hold up?]
26:26.000 --> 26:29.000 I actually didn't try that,
26:29.000 --> 26:31.000 but,
26:31.000 --> 26:35.000 if I understand correctly, your question is how we handle those tables,
26:35.000 --> 26:38.000 because we are working with patches, right?
26:38.000 --> 26:40.000 Okay.
26:40.000 --> 26:41.000 Yes.
26:41.000 --> 26:47.000 I mean, what you can do is to have some sort of overlap between patches.
26:47.000 --> 26:51.000 We did not try it, but I'm pretty sure this would bring benefits,
26:51.000 --> 26:54.000 having some overlap between each patch.
26:54.000 --> 27:00.000 So you can, let's say, not lose as much as when just doing strict, straight squares.
27:01.000 --> 27:08.000 [Audience follow-up, partly inaudible: it was more about how much quality is lost when tables are converted to vectors.]
27:08.000 --> 27:13.000 Oh, actually, something like this specifically, I did not test.
27:13.000 --> 27:16.000 [Inaudible.]
27:16.000 --> 27:19.000 Okay, thank you very much.
27:19.000 --> 27:21.000 Thank you very much.