WEBVTT

00:00.000 --> 00:15.000 Thank you.
00:15.000 --> 00:16.000 Welcome.
00:16.000 --> 00:21.000 Today we will be talking about the remarkable features of multi-vector embedding models.
00:21.000 --> 00:28.000 We will try to answer whether they are a revolution or just an evolution.
00:28.000 --> 00:31.000 My name is Marcin Santos and I'm a Weaviate core engineer.
00:31.000 --> 00:38.000 On a daily basis, I'm working on an open-source vector database called Weaviate.
00:38.000 --> 00:40.000 I'm Roberto.
00:40.000 --> 00:44.000 I'm a research engineer and I'm part of the applied research team.
00:44.000 --> 00:49.000 I have experience in approximate nearest neighbor search and compression.
00:49.000 --> 00:55.000 We'll start by explaining what types of embedding models are out there in the market.
00:55.000 --> 01:01.000 Then we will show how those models are used with vector databases.
01:01.000 --> 01:10.000 We'll then explain how a multi-vector encoding algorithm is changing the way we work with multi-vectors.
01:10.000 --> 01:16.000 And our presentation will end with a short demo.
01:17.000 --> 01:28.000 So embedding models are models which have only one task, and that task is to create a vector representation of the processed input.
01:28.000 --> 01:36.000 For example, when we send some sentence into one, we get back an embedding vector of a given dimensionality.
01:36.000 --> 01:44.000 All of the modern embedding models are trained with the Matryoshka Representation Learning technique.
01:44.000 --> 01:48.000 It means that during the learning process we are using multiple loss functions.
01:48.000 --> 01:58.000 And that results in a model that is able to output embeddings of varying dimensionalities.
01:58.000 --> 02:03.000 Before that method, we would have to train one model for each dimensionality.
02:03.000 --> 02:09.000 When you are starting your journey with AI, perhaps you are building some RAG pipelines.
02:09.000 --> 02:20.000 You are most certainly using models from OpenAI, Google, or Cohere that return regular, single-vector embeddings.
02:20.000 --> 02:24.000 But there is also another family of embedding models.
02:24.000 --> 02:26.000 They are called multi-vector embedding models.
02:26.000 --> 02:33.000 The difference is that for a processed input, we get back an array of vectors.
02:33.000 --> 02:36.000 An array of so-called token-level embeddings.
02:36.000 --> 02:42.000 And as you can see here, we are working with a multi-vector, not with one vector.
02:42.000 --> 02:53.000 And that also makes it a little bit harder for us to calculate a similarity metric for those embeddings.
02:53.000 --> 03:00.000 Because now we have to take into account all token-level embeddings for both the query and the document.
03:00.000 --> 03:04.000 And we have to calculate a similarity metric for each of the pairs.
03:04.000 --> 03:08.000 And then pick the highest results and sum them.
03:08.000 --> 03:12.000 This is then our similarity score, our MaxSim score.
03:12.000 --> 03:20.000 The formula which we use in order to find the similar vectors is called MaxSim.
03:20.000 --> 03:24.000 And MaxSim is the basis for late interaction.
03:24.000 --> 03:29.000 Late interaction means that when we are working with multi-vectors, we don't have to pool them into one vector
03:29.000 --> 03:33.000 and then calculate some similarity metric.
03:33.000 --> 03:36.000 We can just work with those arrays of vectors.
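To make the MaxSim scoring just described concrete, here is a minimal numpy sketch. The names and sizes are illustrative, and it assumes both multi-vectors are L2-normalized row matrices, so a dot product acts as cosine similarity:

```python
import numpy as np

def maxsim(query_mv: np.ndarray, doc_mv: np.ndarray) -> float:
    """MaxSim: for each query token embedding, take its best-matching
    document token embedding, then sum those maxima.

    query_mv: (n_query_tokens, dim), doc_mv: (n_doc_tokens, dim),
    both assumed L2-normalized so dot product = cosine similarity.
    """
    sims = query_mv @ doc_mv.T             # all pairwise similarities
    return float(sims.max(axis=1).sum())   # best match per query token, summed

# Toy usage with random stand-ins for token-level embeddings
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(9, 128)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim(q, d))
```

Note that no pooling happens anywhere: the two arrays of vectors interact directly, which is exactly what "late interaction" refers to.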
03:36.000 --> 03:43.000 The coolest family of multi-vector embedding models are the multi-vector vision embedding models.
03:43.000 --> 03:50.000 Those that understand text and images, like ColPali, ColQwen, and ColNomic.
03:50.000 --> 03:55.000 This image is taken from the ColPali paper.
03:55.000 --> 04:03.000 You can see that there is an example of standard retrieval when you are working with a complex document.
04:03.000 --> 04:09.000 When you want to index that data, you have to first perform some layout detection.
04:09.000 --> 04:14.000 You have to find where the images are and where the text is, then extract them separately.
04:14.000 --> 04:20.000 Then, of course, you have to pre-process the text, I mean, chunk it.
04:20.000 --> 04:22.000 And then you can calculate some embeddings.
04:22.000 --> 04:28.000 But with multi-vector vision embedding models, you can treat a page as one image.
04:28.000 --> 04:32.000 And then send that image of a page over to the model.
04:32.000 --> 04:39.000 The model then takes the image and divides it into small squares, or patches.
04:39.000 --> 04:44.000 And for each patch of that image, it calculates a token-level embedding.
04:44.000 --> 04:50.000 So then you send, for example, a query, which also gets translated into a multi-vector.
04:50.000 --> 04:57.000 When we calculate those similarity metrics for each of those token-level embedding pairs,
04:57.000 --> 05:05.000 and we project the results in the form of a heat map over the image,
05:05.000 --> 05:11.000 you will see that the parts most relevant to our query are highlighted.
05:11.000 --> 05:20.000 Our query was which hour of the day had the highest electricity generation in 2019, and those parts are highlighted.
05:20.000 --> 05:26.000 And this is the way we can visualize how those multi-vector vision embedding models are able to see
05:26.000 --> 05:28.000 what is in the picture.
05:28.000 --> 05:37.000 So when to use those models: they are best used with visually rich documents, with screenshots and presentations.
05:37.000 --> 05:44.000 Now we are about to go over how those models are used with vector databases.
05:44.000 --> 05:46.000 Thank you, Marcin.
05:46.000 --> 05:50.000 So here we have two pictures of two cute dogs.
05:50.000 --> 05:55.000 And what you can see is that we have the vector representation of each image.
05:55.000 --> 06:01.000 And as Marcin just said, we can have those vector representations regardless of the modality of our data.
06:01.000 --> 06:03.000 So let's say you also have text or videos.
06:03.000 --> 06:11.000 You can create those embeddings, and similar data, or semantically similar data, will have similar vectors.
06:11.000 --> 06:17.000 So here we have a machine learning model which is just an embedding function.
06:17.000 --> 06:21.000 And as you can see, here we have several dots, which are vectors.
06:21.000 --> 06:24.000 Here we are just in a 3-dimensional space.
06:24.000 --> 06:29.000 And for example, meat and chicken, which are quite similar, are close in this vector space.
06:29.000 --> 06:32.000 And here you also have apple and banana.
06:32.000 --> 06:34.000 So you have this area with fruit.
06:34.000 --> 06:37.000 So let's suppose you have your own data.
06:37.000 --> 06:42.000 For example, a collection of wines. You will be computing the embedding of your data.
06:42.000 --> 06:46.000 So whether it's image or text, you will be computing the embedding.
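A minimal sketch of this embed-then-search flow, before any vector database enters the picture. The `embed` function here is only a deterministic placeholder (not a real model API), so the actual ranking is meaningless; the point is the shape of the pipeline:

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 128) -> np.ndarray:
    """Placeholder for a real embedding model: a deterministic random unit
    vector per input. Swap in a call to your actual model here."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

corpus = ["Chardonnay", "Cabernet Sauvignon", "banana", "roast chicken"]
vectors = np.stack([embed(t) for t in corpus])

query = embed("wine for seafood")
scores = vectors @ query               # cosine similarity (unit vectors)
top_k = np.argsort(scores)[::-1][:2]   # exhaustive, "flat index" style search
print([corpus[i] for i in top_k])
```

The exhaustive scan at the end is exactly what a flat index does; the vector databases discussed next exist to make this step scale.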
06:46.000 --> 06:49.000 And here you need a vector DB to store that data.
06:49.000 --> 06:54.000 Because whenever you have a query, you want to search for the most similar vectors in an efficient way.
06:54.000 --> 06:57.000 So let's suppose your query is wine for seafood.
06:57.000 --> 07:00.000 You will be computing the embedding of your query.
07:00.000 --> 07:04.000 And then you will use the vector DB to search for the top k elements.
07:04.000 --> 07:07.000 So for example here, a Chardonnay wine.
07:07.000 --> 07:10.000 There are several types of vector indexes.
07:10.000 --> 07:14.000 For example, Weaviate currently supports HNSW and a flat index.
07:14.000 --> 07:18.000 HNSW is a fully in-memory, graph-based index.
07:18.000 --> 07:21.000 And the flat index is just exhaustive search.
07:21.000 --> 07:23.000 So you are computing all the distances.
07:23.000 --> 07:26.000 On top of them you can also have quantization.
07:26.000 --> 07:31.000 Quantization is a technique that allows you to compress the memory used by those vectors.
07:31.000 --> 07:35.000 Because by default, you are using 32-bit floating point.
07:35.000 --> 07:37.000 And this can be expensive.
07:37.000 --> 07:41.000 So you can use quantization like PQ, BQ, SQ, or RQ.
07:41.000 --> 07:43.000 They have different trade-offs.
07:43.000 --> 07:48.000 So for example, if you compress more, you will lose more information.
07:48.000 --> 07:50.000 But you are saving more memory.
07:50.000 --> 07:56.000 On the other hand, if you compress less, you will still have high quality results.
07:56.000 --> 08:02.000 But today we will focus on HNSW with multi-vectors, specifically with MUVERA.
08:02.000 --> 08:07.000 And before jumping into it, here is an example of how HNSW works.
08:07.000 --> 08:12.000 HNSW stands for Hierarchical Navigable Small World graph.
08:12.000 --> 08:15.000 And as you can see here, we have just three layers.
08:15.000 --> 08:17.000 And in each layer, we have nodes.
08:17.000 --> 08:22.000 And whenever we have to perform a search operation, we will start from the topmost layer.
08:22.000 --> 08:25.000 And we will be computing the distances in that layer.
08:25.000 --> 08:29.000 Once we find the closest vector, we will go down to the next layer, and so on.
08:29.000 --> 08:36.000 Until we reach layer 0, where we will be in a local region, and we will be looking for the most similar vectors.
08:36.000 --> 08:39.000 So let's jump into MUVERA.
08:39.000 --> 08:45.000 MUVERA is a research paper that was released by a Google research group.
08:45.000 --> 08:51.000 And the idea here is to encode those multi-vectors that Marcin described before into single vectors.
08:51.000 --> 08:55.000 These are called FDEs, fixed dimensional encodings.
08:55.000 --> 08:56.000 Why do we want to do this?
08:56.000 --> 09:01.000 Because if you want to compute similarity with multi-vectors, this can be very expensive.
09:01.000 --> 09:04.000 It's quadratic in the number of vectors you have.
09:04.000 --> 09:08.000 And the idea of MUVERA is that given two FDEs, so two single vectors,
09:08.000 --> 09:13.000 when you compute the dot product between those two vectors,
09:13.000 --> 09:16.000 you will have an approximation of the MaxSim score.
09:16.000 --> 09:23.000 And the dot product can be computed efficiently, because it can be highly optimized using, for example, SIMD instructions.
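Before moving on to MUVERA's steps, here is a simplified sketch of the layered HNSW search described a moment ago. Real HNSW keeps a beam of candidates (the ef parameter) and returns the top k at layer 0; this toy version only shows the greedy descent, with each layer represented as an adjacency dict, and all names are illustrative:

```python
import numpy as np

def greedy_search(query, start, graph, vectors):
    """Walk to the neighbor closest to the query until no neighbor improves."""
    best, best_dist = start, np.linalg.norm(vectors[start] - query)
    improved = True
    while improved:
        improved = False
        for nb in graph.get(best, ()):
            d = np.linalg.norm(vectors[nb] - query)
            if d < best_dist:
                best, best_dist, improved = nb, d, True
    return best

def hnsw_descend(query, layers, vectors, entry_point):
    """Start at the topmost (sparsest) layer and refine the entry point
    layer by layer until we land in a local region of layer 0."""
    node = entry_point
    for graph in layers:   # layers ordered top -> bottom (layer 0 last)
        node = greedy_search(query, node, graph, vectors)
    return node            # nearest node found by greedy routing

# Tiny usage example: 3 layers over 6 points on a line
vectors = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
layers = [
    {0: [5], 5: [0]},                                       # top layer: 2 nodes
    {0: [2, 5], 2: [0, 5], 5: [0, 2]},                      # middle layer
    {i: [max(i - 1, 0), min(i + 1, 5)] for i in range(6)},  # layer 0
]
print(hnsw_descend(np.array([3.2]), layers, vectors, entry_point=0))  # -> 3
```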
09:23.000 --> 09:26.000 So MUVERA has three main steps.
09:26.000 --> 09:32.000 You have space partitioning, dimensionality reduction, and repetition of the first and second steps.
09:32.000 --> 09:35.000 For each of them, you have a parameter. The first is k_sim.
09:35.000 --> 09:40.000 It tells you into how many parts you want to divide your space.
09:40.000 --> 09:47.000 And then there is the number of repetitions, which by default is 10.
09:47.000 --> 09:51.000 So let's suppose you have the following multi-vector.
09:51.000 --> 09:57.000 You have five vectors, and each of them has dimensionality 128.
09:57.000 --> 10:03.000 If you are using, for example, HNSW without MUVERA, you have to add all those vectors just for one document.
10:03.000 --> 10:09.000 This can be expensive, because HNSW is fully in memory, so this can bring the cost up.
10:09.000 --> 10:12.000 So the first step is space partitioning.
10:12.000 --> 10:17.000 And here we are using a technique that is called SimHash, and it's based on locality sensitive hashing.
10:17.000 --> 10:22.000 So in other words, based on k_sim, which in this case is three,
10:22.000 --> 10:25.000 you are sampling three random Gaussian vectors.
10:25.000 --> 10:31.000 And whenever you want to find the corresponding cluster for a vector, for example for vector one,
10:31.000 --> 10:37.000 you will be computing the dot product between vector one and those three Gaussian vectors.
10:37.000 --> 10:45.000 You then look at the signs of the dot products. So for example, for v1, you have all negative values.
10:45.000 --> 10:49.000 You will encode it as 0 0 0, which is cluster 0.
10:49.000 --> 10:56.000 For vector 5, in this case, the first two dot products were negative
10:56.000 --> 10:58.000 and the third one was positive.
10:58.000 --> 11:01.000 So the encoding will be 0 0 1, and you add it to cluster 1.
11:01.000 --> 11:06.000 Here, you could also use k-means. The main difference is that k-means requires training.
11:06.000 --> 11:13.000 This way, you can do the partitioning without having any training data, and it's also data oblivious.
11:13.000 --> 11:18.000 So in this case, if you have k_sim equal to three, you can end up with eight clusters.
11:18.000 --> 11:21.000 What you will be doing then is to just flatten them.
11:21.000 --> 11:27.000 In case you have more than one vector in a cluster, you will be applying mean pooling.
11:27.000 --> 11:33.000 So for example, for cluster 0, you have v1 and v4, so we will just compute the mean for each entry.
11:33.000 --> 11:39.000 Then you have cluster 1 with just one vector, and as you can see, we have some empty clusters.
11:39.000 --> 11:42.000 The next step is filling the empty clusters.
11:42.000 --> 11:50.000 So for those that are empty, we substitute those zeros with the closest non-empty cluster.
11:50.000 --> 11:55.000 So for example, in this case for cluster 2, the closest non-empty one is cluster 1.
11:55.000 --> 11:58.000 So we copy it here, and we do the same for the other empty clusters.
11:58.000 --> 12:01.000 For cluster 5, the closest vector was v3.
12:01.000 --> 12:08.000 As you can see, the length of this vector is the number of clusters times dim, so 8 times 128.
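A small numpy sketch of this first step, using the talk's example sizes (five 128-dimensional vectors, k_sim = 3). The Hamming-distance rule used for filling empty clusters is one reasonable reading of "closest non-empty cluster"; the details may differ from the paper and from Weaviate's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
k_sim, dim = 3, 128
doc = rng.normal(size=(5, dim))            # 5 token-level embeddings
gaussians = rng.normal(size=(k_sim, dim))  # 3 random Gaussian directions

# SimHash: the sign pattern of the dot products gives each vector a cluster id
bits = (doc @ gaussians.T > 0).astype(int)           # shape (5, 3)
cluster_ids = bits @ (2 ** np.arange(k_sim)[::-1])   # e.g. 0 0 1 -> cluster 1

n_clusters = 2 ** k_sim                    # 8 clusters
blocks = np.zeros((n_clusters, dim))
non_empty = []
for c in range(n_clusters):
    members = doc[cluster_ids == c]
    if len(members) > 0:                   # mean-pool vectors sharing a cluster
        blocks[c] = members.mean(axis=0)
        non_empty.append(c)

# Fill each empty cluster from the nearest non-empty one (Hamming distance
# between the clusters' bit patterns; an assumption for this sketch)
for c in range(n_clusters):
    if c not in non_empty:
        nearest = min(non_empty, key=lambda o: bin(c ^ o).count("1"))
        blocks[c] = blocks[nearest]

flattened = blocks.reshape(-1)             # length = 8 * 128, as in the talk
print(flattened.shape)
```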
12:08.000 --> 12:14.000 In the second step, the dimensionality reduction, we want to shrink those vectors.
12:14.000 --> 12:19.000 So we take the vector coming from the previous operation.
12:19.000 --> 12:24.000 And here we will be using random matrices that are made of plus one and minus one.
12:24.000 --> 12:31.000 In this case, the matrix dimensionality is dim times d_proj, so 128 times 16.
12:31.000 --> 12:41.000 So when you compute the vector-matrix operation, you end up with a new cluster block of smaller dimensionality, in this case d_proj.
12:41.000 --> 12:43.000 And you will be just concatenating them.
12:43.000 --> 12:49.000 So the new resulting vector will be the number of clusters times 16, not 128 anymore.
12:49.000 --> 12:53.000 As you can see, we have done some approximation here and in the previous steps.
12:53.000 --> 13:00.000 So in order to reduce the randomness coming from those approximations, we repeat this process several times.
13:00.000 --> 13:04.000 So what are the pros of MUVERA, and is it worth using?
13:04.000 --> 13:09.000 Actually, yes, if you are dealing with a lot of documents. In the demo we will see, for example,
13:09.000 --> 13:13.000 only one document, but we will see how many vectors we end up with.
13:13.000 --> 13:16.000 So let's suppose you are working with hundreds of thousands of documents.
13:16.000 --> 13:18.000 You need some sort of filtering approach.
13:18.000 --> 13:23.000 So you can use MUVERA with HNSW in memory to create a candidate set.
13:23.000 --> 13:25.000 And then you can rescore those candidates.
13:25.000 --> 13:30.000 So the benefits of MUVERA: the first one is improved import times.
13:30.000 --> 13:34.000 You just need to preprocess those vectors and store them in HNSW.
13:34.000 --> 13:39.000 Then you have reduced memory requirements, because if you have to build HNSW in this case,
13:39.000 --> 13:44.000 you would have five nodes, and for each of them you also have to find the connections.
13:44.000 --> 13:48.000 With MUVERA, it's just one node, and you find its corresponding connections.
13:48.000 --> 13:53.000 One consequence is clearly worse recall, but this can be fixed with rescoring.
13:53.000 --> 13:56.000 So you will be using MUVERA to build a candidate set.
13:56.000 --> 14:02.000 And only for those candidates, you will be reading the original multi-vectors and computing the MaxSim score.
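Putting the three steps together, here is a hedged end-to-end sketch of building an FDE: SimHash partitioning with mean pooling as above, then the random plus/minus one projection down to d_proj, repeated and concatenated. The 1/sqrt(d_proj) scaling and the pooling choice follow a common presentation of the paper (which, for instance, pools differently on the query side), so Weaviate's actual implementation may differ:

```python
import numpy as np

def fde_one_rep(vectors, gaussians, proj):
    """One repetition: partition -> mean pool -> fill empty -> project."""
    k_sim = gaussians.shape[0]
    bits = (vectors @ gaussians.T > 0).astype(int)
    ids = bits @ (2 ** np.arange(k_sim)[::-1])
    n_clusters = 2 ** k_sim
    blocks = np.zeros((n_clusters, vectors.shape[1]))
    non_empty = []
    for c in range(n_clusters):
        members = vectors[ids == c]
        if len(members) > 0:
            blocks[c] = members.mean(axis=0)
            non_empty.append(c)
    for c in range(n_clusters):
        if c not in non_empty:  # fill from Hamming-closest non-empty cluster
            blocks[c] = blocks[min(non_empty, key=lambda o: bin(c ^ o).count("1"))]
    # dimensionality reduction: each dim-sized block becomes d_proj entries
    reduced = blocks @ proj / np.sqrt(proj.shape[1])
    return reduced.reshape(-1)            # n_clusters * d_proj entries

def fde(vectors, k_sim=3, d_proj=16, reps=10, seed=0):
    """Fixed Dimensional Encoding: concatenate `reps` independent repetitions
    to average out the randomness of hashing and projection."""
    rng = np.random.default_rng(seed)
    dim = vectors.shape[1]
    parts = []
    for _ in range(reps):
        gaussians = rng.normal(size=(k_sim, dim))
        proj = rng.choice([-1.0, 1.0], size=(dim, d_proj))
        parts.append(fde_one_rep(vectors, gaussians, proj))
    return np.concatenate(parts)          # length = reps * 2**k_sim * d_proj

doc = np.random.default_rng(1).normal(size=(5, 128))
print(fde(doc).shape)                     # (10 * 8 * 16,) = (1280,)
```

For the dot-product approximation of MaxSim to work, queries and documents must of course be encoded with the same random partitions and projections, hence the fixed seed.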
14:02.000 --> 14:05.000 But now it's time for the demo.
14:05.000 --> 14:08.000 Yeah, thank you, Roberto.
14:08.000 --> 14:11.000 So for today we have prepared a demo.
14:11.000 --> 14:15.000 If you want to run it locally, here is the link to the project.
14:15.000 --> 14:19.000 We will be using Weaviate to store the multi-vector embeddings,
14:19.000 --> 14:24.000 and the ColQwen 2.5 ColVision model.
14:24.000 --> 14:29.000 So let's first load the model.
14:30.000 --> 14:33.000 As you can see here, we have two methods.
14:33.000 --> 14:39.000 One is for vectorizing the image, and the second one we will use to vectorize the text.
14:39.000 --> 14:43.000 Both of those methods output a multi-vector.
14:43.000 --> 14:46.000 We will try to index this document.
14:46.000 --> 14:50.000 You can see that this is an NVIDIA investor presentation.
14:50.000 --> 14:57.000 It's a very complex document.
14:57.000 --> 15:02.000 It has a mix of text and images.
15:02.000 --> 15:06.000 There are also some pages with charts.
15:06.000 --> 15:10.000 The idea here is to treat one page as an image.
15:10.000 --> 15:13.000 And one page will be our one chunk.
15:13.000 --> 15:17.000 We then send that image over to the ColQwen model to get the multi-vector embeddings
15:17.000 --> 15:22.000 and try to find something on those pages.
15:22.000 --> 15:26.000 So the model is loaded.
15:26.000 --> 15:31.000 Now, this is a helper method which we will use to generate the embeddings.
15:31.000 --> 15:34.000 Let's start Weaviate.
15:34.000 --> 15:38.000 Let's try to connect to it.
15:38.000 --> 15:40.000 Let's see if it's running.
15:40.000 --> 15:41.000 It's running.
15:41.000 --> 15:44.000 So first we need to create a collection.
15:44.000 --> 15:48.000 We are creating a collection with one property, page number.
15:48.000 --> 15:54.000 And the most important part is here, that we are defining a vector index.
15:54.000 --> 15:56.000 A multi-vector vector index.
15:56.000 --> 15:59.000 And we are enabling MUVERA encoding here.
15:59.000 --> 16:03.000 It means that when we send multi-vectors to Weaviate,
16:03.000 --> 16:09.000 those will automatically get translated to FDEs using the MUVERA algorithm.
16:09.000 --> 16:12.000 So let's create the collection.
16:12.000 --> 16:15.000 Let's generate the embeddings for our document.
16:15.000 --> 16:20.000 And now you can see here that we have 32 pages.
16:20.000 --> 16:26.000 And for each page we have generated a multi-vector
16:26.000 --> 16:30.000 that consists of 731 token-level embeddings.
16:30.000 --> 16:35.000 So in total for 32 pages we have over 23,000 vectors.
16:35.000 --> 16:37.000 But we have MUVERA.
16:37.000 --> 16:41.000 Those 23,000 vectors will get translated into only 32 FDEs.
16:41.000 --> 16:46.000 So you can see how big the reduction in resources is.
16:46.000 --> 16:50.000 So let's save those multi-vectors into Weaviate.
16:50.000 --> 16:52.000 And now here's another helper method.
16:52.000 --> 16:58.000 I will use that method to generate the query embedding using the ColQwen model.
16:58.000 --> 17:00.000 So the query will be passed to it.
17:00.000 --> 17:02.000 It will generate a multi-vector.
17:02.000 --> 17:06.000 And then I will use this multi-vector in order to perform a vector search.
17:06.000 --> 17:09.000 And I will just present the first result.
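Under the hood, the search the demo runs amounts to the two-stage pipeline Roberto described earlier: rank every page by a single FDE dot product, then rescore only a short candidate list with exact MaxSim. A hedged numpy sketch with random stand-ins for the real embeddings and FDEs (the `candidates` and `k` parameters are illustrative):

```python
import numpy as np

def maxsim(query_mv, doc_mv):
    """Exact late-interaction score (see the earlier sketch)."""
    return float((query_mv @ doc_mv.T).max(axis=1).sum())

def two_stage_search(query_mv, query_fde, doc_fdes, doc_mvs, candidates=10, k=3):
    """Stage 1: one cheap dot product per document against the FDE index.
       Stage 2: exact MaxSim rescoring, but only for the small candidate set."""
    approx = doc_fdes @ query_fde                      # FDE scores, (n_docs,)
    shortlist = np.argsort(approx)[::-1][:candidates]  # best approximate scores
    exact = {i: maxsim(query_mv, doc_mvs[i]) for i in shortlist}
    return sorted(exact, key=exact.get, reverse=True)[:k]

# Toy data shaped like the demo: 32 "pages", 731 token embeddings each,
# plus precomputed 2560-entry FDEs (random stand-ins, not real encodings)
rng = np.random.default_rng(0)
doc_mvs = [rng.normal(size=(731, 128)) for _ in range(32)]
doc_fdes = rng.normal(size=(32, 2560))
query_mv = rng.normal(size=(20, 128))
query_fde = rng.normal(size=2560)
print(two_stage_search(query_mv, query_fde, doc_fdes, doc_mvs))
```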
17:09.000 --> 17:14.000 So let's look for a list of countries using AI.
17:14.000 --> 17:18.000 You can see that I got a page which is like a roadmap.
17:18.000 --> 17:23.000 And there are those countries; that's what I was looking for.
17:23.000 --> 17:27.000 Let's look at what is in NVIDIA's infrastructure roadmap.
17:27.000 --> 17:31.000 Maybe there's information about that. Of course there is.
17:31.000 --> 17:38.000 So I just wanted to make the point that those multi-vector embeddings are good
17:38.000 --> 17:42.000 also with pages that consist only of text.
17:42.000 --> 17:48.000 And of course they really shine with mixed content.
17:48.000 --> 17:54.000 So let's make another request: revenue and income charts.
17:54.000 --> 18:00.000 You can see that we got a page that describes exactly what we are searching for.
18:00.000 --> 18:06.000 I think that would be very hard to achieve using regular embedding models,
18:06.000 --> 18:10.000 because it's really hard to extract that data.
18:10.000 --> 18:14.000 So let's make two last queries.
18:14.000 --> 18:18.000 The first in Italian and the second one in Polish.
18:18.000 --> 18:24.000 First we will ask what are the industries that will benefit the most from AI.
18:24.000 --> 18:28.000 You can see that we got a page that describes those industries.
18:28.000 --> 18:32.000 Then we ask what are the plans for dividends.
18:32.000 --> 18:39.000 And you can see that in 2025 there will be over 300 million dollars in dividends.
18:39.000 --> 18:45.000 So you can see those models are also really good with multilingual data.
18:45.000 --> 18:49.000 So.
18:49.000 --> 18:51.000 Thanks, Marcin, for your demo.
18:51.000 --> 18:57.000 So we have just seen how we can build an AI-powered OCR pipeline, and the steps are pretty simple.
18:57.000 --> 19:03.000 The first one is the extraction of the document pages, just converting each one to an image.
19:03.000 --> 19:08.000 Then we use a multi-vector vision embedding model to create those vectors.
19:08.000 --> 19:12.000 And then we can store them in a vector DB using the MUVERA encoding.
19:12.000 --> 19:15.000 And then you are ready for your semantic search.
19:15.000 --> 19:19.000 So if you liked our presentation, you can connect with us.
19:19.000 --> 19:23.000 And also if you are interested in more details about MUVERA,
19:24.000 --> 19:30.000 we have a blog post, and we also have a podcast episode with one of the authors of the paper.
19:30.000 --> 19:33.000 Thank you for your attention.
19:54.000 --> 19:56.000 Yeah, actually this is a good question.
19:56.000 --> 20:01.000 So the question was what our advice is for the parameters of MUVERA.
20:01.000 --> 20:08.000 So by default, we are using k_sim equal to 4, the projection d_proj equal to 16, and the number of repetitions equal to 10.
20:08.000 --> 20:14.000 So in this case, the dimensionality of the vector will be 10 times 16 times 2 to the power of 4.
20:14.000 --> 20:21.000 And as you can see, this is a longer vector with respect to, let's say, the regular ones, which are just 128.
20:21.000 --> 20:25.000 So in this case, I would say it really depends on your data distribution.
20:25.000 --> 20:32.000 So for example, if you have visually rich documents, where you have charts and tables,
20:32.000 --> 20:35.000 you may want them to be as accurate as possible.
20:35.000 --> 20:37.000 So you can increase those parameters.
20:37.000 --> 20:41.000 And actually, there is something that you can also do on top of that.
20:41.000 --> 20:45.000 The vector that you get, the FDE, is uncompressed.
20:45.000 --> 20:54.000 But if you want, you can also use quantization. I would not suggest BQ, which shrinks the data a lot,
20:54.000 --> 21:00.000 but most probably SQ or RQ would fit well here,
21:00.000 --> 21:06.000 where you are just going from 4 bytes per entry to 1 byte per entry.
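For reference, the arithmetic behind those defaults, with rough per-document storage figures added as an illustration (the byte counts are assumptions, not figures from the talk):

```python
# Default MUVERA parameters mentioned in the talk
k_sim, d_proj, reps = 4, 16, 10

fde_dim = reps * d_proj * 2 ** k_sim   # 10 * 16 * 16 = 2560 entries
print(fde_dim)                         # 2560, vs. 128 for one token embedding

# Rough storage per document FDE (illustrative only)
print(fde_dim * 4)                     # float32: 10240 bytes (~10 KB)
print(fde_dim * 1)                     # 1 byte per entry (e.g. SQ/RQ): 2560 bytes
```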
21:06.000 --> 21:07.000 Yes?
21:08.000 --> 21:17.000 [Audience question, partly inaudible: have you compared this against using a standard single-vector embedding and vector search for the first retrieval stage?]
21:17.000 --> 21:26.000 Also, this is a good question. So the question is whether we have tried any benchmarking doing the first retrieval stage with single vectors and the second one with multi-vectors, right?
21:26.000 --> 21:27.000 Okay.
21:27.000 --> 21:30.000 So actually, we have also tried something like this.
21:30.000 --> 21:36.000 I didn't see any, let's say, benefit in doing this.
21:36.000 --> 21:42.000 Personally, I think the main challenge there would be dealing with those chunking techniques.
21:42.000 --> 21:47.000 Because personally, what I like most about this type of approach is the simplicity,
21:47.000 --> 21:54.000 because if you go with the regular approach, you also have to do those steps of chunking and layout detection.
21:54.000 --> 21:59.000 And in this case, you can just focus on one multi-vector per page.
22:01.000 --> 22:02.000 Okay.
22:02.000 --> 22:24.000 [Audience question, partly inaudible: how does this approach's performance compare to the standard way of building the HNSW index for multi-vectors, which is basically having multiple vectors under the same label, and to using a single vector?]
22:25.000 --> 22:38.000 Okay. So the question is how this compares to HNSW with just multi-vectors, and also with single vectors, right?
22:38.000 --> 22:46.000 [Audience clarification, partly inaudible: one option is the multi-vector construction, which is basically having multiple vectors under one common label.]
22:46.000 --> 22:49.000 So do you mean like a filtered search?
22:49.000 --> 22:53.000 Like each label is used as a filter, or...
22:53.000 --> 23:03.000 [Audience, partly inaudible: ...the index allows you to have multiple vectors under the same label.]
23:03.000 --> 23:08.000 Okay. So in that case, you are also, let's say, building a candidate set for those vectors,
23:08.000 --> 23:10.000 and then you will be computing the rescoring.
23:10.000 --> 23:16.000 So for example, in this case, maybe two or more nodes of the HNSW are part of the same document, right?
23:16.000 --> 23:18.000 So they are just two vectors.
23:18.000 --> 23:19.000 Okay.
23:19.000 --> 23:24.000 Yeah. So for this, yes, actually, we also have support for this, where if you don't pass
23:24.000 --> 23:30.000 the MUVERA encoding, you can use this. And personally, I think this gives slightly better quality,
23:30.000 --> 23:35.000 but the main consequence of this is the memory pressure, because, for example, in this case,
23:35.000 --> 23:40.000 you have seen you have over 700 nodes for just one chunk.
23:40.000 --> 23:44.000 So building the HNSW, since it's fully in memory, can be expensive.
23:45.000 --> 23:54.000 [Audience comment, partly inaudible: you could reduce that by not keeping the vectors themselves in memory.]
23:54.000 --> 23:59.000 That's true, but you still need memory for the connections.
23:59.000 --> 24:05.000 Because whether you are using BQ or uncompressed vectors, you still, for example, have the M parameter,
24:05.000 --> 24:09.000 which controls how many connections you have for each node.
24:09.000 --> 24:13.000 So that can still be expensive, but yes, we also have support for this.
24:13.000 --> 24:23.000 You can use the regular multi-vector one, and we have also done some experiments on this.
24:23.000 --> 24:27.000 Personally, I didn't find any big benefits in those approaches,
24:27.000 --> 24:38.000 compared to just HNSW with multi-vectors, or HNSW with multi-vectors and MUVERA.
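As a back-of-the-envelope illustration of that memory-pressure point, using the demo's numbers (32 pages, roughly 731 token embeddings per page, 128 dimensions); the M value and per-connection cost here are assumptions for the sketch, not Weaviate's actual figures:

```python
# Demo numbers
pages, tokens_per_page, dim = 32, 731, 128
fde_dim = 2560            # with k_sim=4, d_proj=16, reps=10
M = 32                    # assumed max connections per HNSW node
conn_bytes = M * 8        # assumed 8 bytes per neighbor reference

# Plain multi-vector HNSW: one graph node per token embedding
nodes_plain = pages * tokens_per_page               # 23,392 nodes
mem_plain = nodes_plain * (dim * 4 + conn_bytes)    # vectors + edges

# MUVERA: one FDE node per page
nodes_muvera = pages                                # 32 nodes
mem_muvera = nodes_muvera * (fde_dim * 4 + conn_bytes)

print(nodes_plain, nodes_muvera)
print(f"{mem_plain / 1e6:.1f} MB vs {mem_muvera / 1e6:.2f} MB")
```

Even though each FDE is 20 times longer than a single token embedding, having three orders of magnitude fewer graph nodes (and their connections) is what drives the savings.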
24:38.000 --> 24:39.000 Yes?
24:40.000 --> 24:53.000 [Audience question, partly inaudible: a question about the difference in power between a multi-vector and a single-vector representation.]
24:53.000 --> 25:04.000 Does that mean that it's potentially possible to learn a single-vector representation that is equivalent to a multi-vector one, so you don't need the multi-vector anymore?
25:05.000 --> 25:07.000 Correct me if I'm wrong.
25:07.000 --> 25:13.000 The question is, can we take this idea of converting multi-vectors into single vectors
25:13.000 --> 25:18.000 and do it inside the embedding model, so we just work with single vectors?
25:18.000 --> 25:19.000 Yes.
25:19.000 --> 25:24.000 This is something that can be done, and I think it would be effective, because when you are doing
25:24.000 --> 25:28.000 the training of the embedding model, you will improve the quality.
25:28.000 --> 25:32.000 So that's something I think would definitely be a benefit.
25:33.000 --> 25:34.000 I can add to that.
25:34.000 --> 25:40.000 There are document screenshot embedding models that are virtually doing that.
25:40.000 --> 25:50.000 They are creating one vector out of the multi-vectors, and they just send you back
25:50.000 --> 25:55.000 that one vector. So it's possible with those types of models.
26:02.000 --> 26:07.000 How does it work with tables?
26:07.000 --> 26:24.000 [Audience question, partly inaudible: a table can take up a large part of the page, but it gets cut into patches and compressed into the multi-vector; how well does that hold up?]
26:26.000 --> 26:29.000 I actually didn't try that,
26:29.000 --> 26:31.000 but,
26:31.000 --> 26:35.000 if I understand correctly, your question is how we handle those tables,
26:35.000 --> 26:38.000 because we are working with patches, right?
26:38.000 --> 26:40.000 Okay.
26:40.000 --> 26:41.000 Yes.
26:41.000 --> 26:47.000 I mean, what you can do is to have some sort of overlap between patches.
26:47.000 --> 26:51.000 We did not try it, but I'm pretty sure this would bring benefits,
26:51.000 --> 26:54.000 having some overlap between each patch.
26:54.000 --> 27:00.000 So you can, let's say, not lose as much as when just doing strict, straight squares.
27:01.000 --> 27:08.000 [Audience follow-up, partly inaudible: it was more about how much quality is lost when tables are converted to vectors.]
27:08.000 --> 27:13.000 Oh, actually, something like this specifically, I did not test.
27:13.000 --> 27:16.000 [Inaudible.]
27:16.000 --> 27:19.000 Okay, thank you very much.
27:19.000 --> 27:21.000 Thank you very much.