WEBVTT 00:00.000 --> 00:11.280 The break is over, on to the next talk, and we've been really fortunate to have a really 00:11.280 --> 00:20.240 nice spread of topics covered, so now we move on to, well, polytopes, or metabolic flux, 00:20.240 --> 00:37.040 you decide, and it's my pleasure to introduce Vissarion Fisikopoulos. OK. Thanks a lot. So, today, I would 00:37.040 --> 00:44.920 like to introduce Dingo, which is a Python package for sampling on metabolic networks. 00:45.880 --> 00:53.680 OK. So, I will start with some intro; probably most of you know it, but it's a good 00:53.680 --> 00:59.240 background introduction. So, probably you know that in our cells there are thousands 00:59.240 --> 01:06.280 of reactions in every cell. So, here we are interested in the inputs and the outputs of those 01:06.280 --> 01:14.040 reactions, what we call metabolites. And of course, you can model all these reactions and 01:14.040 --> 01:24.440 the metabolites that take part in these reactions as a network. OK. So, in those networks, we are 01:24.440 --> 01:31.960 interested in the flow rate, the rate of a reaction, the rate at which one reaction happens, 01:31.960 --> 01:41.480 which we call flux. OK. So, as you see, the reactions are the arrows, and these v_i are the 01:41.640 --> 01:50.920 fluxes, all the flows. OK. So, you can also imagine that from this whole set of flows we can create 01:52.520 --> 02:01.960 a flux vector. OK. So, this can be modelled as a polytope, and this polytope is a feasible set, 02:01.960 --> 02:07.800 and the feasible set contains all the fluxes that balance the network. So, I'm 02:07.880 --> 02:15.560 skipping the linear algebra here, but you can model it as a linear system, actually linear inequalities, 02:15.560 --> 02:22.920 where all the feasible solutions of these inequalities are the fluxes that balance my network. 02:22.920 --> 02:31.000 So, I want to study all this space. OK. Of course, networks in real life are like this.
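The linear algebra skipped in the talk can be written compactly. As a sketch in standard notation, where the symbols S (stoichiometric matrix), l and u (lower and upper flux bounds) and n (number of reactions) are assumptions and do not come from the slides, the feasible set of steady-state flux vectors v is:

```latex
\[
P = \{\, v \in \mathbb{R}^{n} \;:\; S v = 0,\;\; l \le v \le u \,\}
\]
```

The equality S v = 0 expresses that every metabolite is produced as fast as it is consumed, and the bounds are the linear inequalities that make P a convex polytope.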
So, 02:31.000 --> 02:37.960 this is Recon 1, the human metabolic network. As far as I know, there are also two more 02:39.960 --> 02:47.240 versions, like Recon 2 or 3, that are even larger. OK. And all this network will 02:47.240 --> 02:55.160 become for us a convex polytope. So, OK, imagine that box; this is a convex 02:55.160 --> 03:02.600 polytope in a 3D space, but here you have as many dimensions as reactions. So, 03:02.600 --> 03:11.320 it can be thousands of dimensions. OK. And in this polytope, the red one represents the balance, 03:11.320 --> 03:19.560 so the steady states. OK. Now, if we want to find, say, the optimal steady states 03:19.640 --> 03:26.520 with respect to, for example, some biomass objective function, then we have to do some optimization, 03:26.520 --> 03:32.920 which is simple linear optimization. This is the method called FBA. So, in the 03:32.920 --> 03:40.520 polytope, this means that you would like to find the blue thing, which is a facet; let's say if 03:40.520 --> 03:48.840 you have a box, it's one facet of the box. This is optimal according to the biomass. So, 03:48.840 --> 03:56.120 there, all the fluxes are optimal with respect to this objective. OK. So, this is one, let's say, 03:56.120 --> 04:06.680 biased way of studying this, because FBA will give you one vertex of this blue facet that corresponds 04:06.680 --> 04:14.760 to the optimal state. The other way is the unbiased one. So, we can do some sampling on this facet, 04:15.400 --> 04:22.200 and, let's say, this will cover all the possible states that are optimal. 04:23.160 --> 04:31.160 So, this is the last figure. So, imagine that you do some, in this case, uniform sampling on this facet, 04:31.160 --> 04:42.200 and then each point of the sample represents some optimal flux. OK. In order to do it, 04:42.200 --> 04:50.200 we built a package called Dingo. It's written in Python.
It has several sampling 04:50.200 --> 04:59.480 and rounding algorithms, which are based on a C++ library called volesti. So, a library that 04:59.480 --> 05:07.160 we also maintain, and everything that needs performance is in C++. So, the Python side contains 05:07.800 --> 05:14.760 the bindings, and also some extra functions, like loading models from different standards, 05:15.560 --> 05:23.160 loading utilities, and doing some statistics like computing copulas or joint distributions, 05:23.160 --> 05:30.280 and things like that. OK. So, yes, this is also published in Bioinformatics Advances, if you want to see 05:30.360 --> 05:40.120 the paper, and we are a small team working on this problem. So, let's see a use case, 05:41.960 --> 05:49.640 how you can use Dingo to study these fluxes. So, in the first line, you just import 05:51.880 --> 05:58.840 MetabolicNetwork and the sampler from Dingo, and then you create a model 05:58.920 --> 06:06.680 in the second line; this is a simple model from E. coli. So, this is quite small. If 06:06.680 --> 06:14.520 you run it on your computer, this will be really fast. Then we do FBA. This is, 06:14.520 --> 06:22.840 let's say, the biased method. This will give us one value, one flux value, that is optimal. 06:23.800 --> 06:29.800 And then we try to sample. So, we create a sampler, like PolytopeSampler, given the model, 06:30.360 --> 06:38.040 and the heavy part, the computationally heavy part, is the line that 06:38.920 --> 06:45.640 computes steady states, like sampler generate steady states. So, there, ESS is a statistical thing that 06:45.640 --> 06:56.120 somehow tells you that we want 3,000 samples that are somehow statistically unbiased. 06:56.120 --> 07:01.240 So, we want to have 3,000 points that are uniform, let's say, on this 07:01.240 --> 07:06.840 facet, in some high-dimensional space. Okay, and then we plot the histogram.
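The steps just described (import, load a model, run FBA, build a sampler, generate steady states, plot a histogram) might look roughly like the following. This is only a sketch: the calls follow the names mentioned in the talk and the dingo README as best recalled, so check them against the installed version. The helper names `sample_fluxes` and `plot_reaction`, the model path and the default of 60 bins are assumptions made for illustration.

```python
def sample_fluxes(model_path, ess=3000):
    """Run FBA and flux sampling on a metabolic model stored as JSON.

    The dingo import is done lazily, so this sketch can be loaded even
    when dingo is not installed.
    """
    from dingo import MetabolicNetwork, PolytopeSampler

    # Load the model (for example a small E. coli core model).
    model = MetabolicNetwork.from_json(model_path)

    # FBA, the "biased" method: one optimal flux vector and its objective value.
    fba_output = model.fba()
    fba_fluxes, fba_objective = fba_output[0], fba_output[1]

    # Sampling, the computationally heavy step. `ess` is the target
    # effective sample size.
    sampler = PolytopeSampler(model)
    steady_states = sampler.generate_steady_states(ess=ess)

    return model.reactions, fba_fluxes, fba_objective, steady_states


def plot_reaction(steady_states, reactions, index, n_bins=60):
    """Plot the sampled flux histogram of one reaction (assumed dingo helper)."""
    from dingo import plot_histogram

    # Each row of steady_states is assumed to hold the samples of one reaction.
    plot_histogram(steady_states[index], reactions[index], n_bins=n_bins)
```

A call such as `sample_fluxes("path/to/e_coli_core.json")` would then return the reaction names, the FBA result and the sampled steady states, with the path standing in for wherever the model file actually lives.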
So, in the 07:06.840 --> 07:12.760 histogram, you can see the blue line is FBA, the information that FBA gives you. And the histogram 07:12.840 --> 07:19.560 is all the samples, the flux distribution we compute from the samples. And this is 07:20.600 --> 07:25.800 from one reaction in the network, and this is from another reaction. So, even from this, 07:27.000 --> 07:35.560 you can understand that FBA can give you different information. Okay, you can do it for 07:35.560 --> 07:41.240 all the reactions, and you can do it for different networks. Okay, now, the other thing that you can 07:41.320 --> 07:49.720 do is that you can also study the connection, the 07:49.720 --> 07:55.960 dependencies between the reactions. And in order to do it, we use copulas, which is what you use when 07:55.960 --> 08:03.240 you study multivariate distributions. So, here, I select two different reactions from 08:03.240 --> 08:09.880 my network, ACONTa and PPC, and I compute the copula, again with sampling. So, 08:10.840 --> 08:16.200 then I have this graph. So, this means that when the flux in one 08:16.200 --> 08:22.840 reaction goes up, the other also goes up. That could also be different. So, this is a way 08:22.840 --> 08:31.560 to study the unbiased dependence between two different reactions 08:31.640 --> 08:43.560 in my network. Okay, and then one thing that you could also do is, okay, this is maybe a 08:43.560 --> 08:50.440 long shot, that you can use this to do some targeting. What we did is that we took 08:50.440 --> 08:57.000 a paper from Renz et al., where they generate a host-virus network. They are going to study 08:57.960 --> 09:05.640 the reactions that are related to COVID in order to create a drug.
So here you have two objective 09:05.640 --> 09:10.680 functions. They model it as having two objective functions, the human biomass and the virus growth. 09:11.800 --> 09:18.200 And then they compute the FBA. So, what we did is, we said, okay, instead of FBA, 09:18.200 --> 09:26.280 let's do some sampling. Okay, what they show with FBA is that 09:27.080 --> 09:31.640 most of the reactions have the same FBA value, but one reaction has a different FBA value. So what we do with 09:31.640 --> 09:39.080 sampling is that we look at the flux, the distribution, for the host biomass 09:39.080 --> 09:45.400 and for the virus growth rate. So one is human with COVID and the other is human without COVID. And we 09:45.400 --> 09:52.600 look at whether it changes: if the flux does not change, this means that this reaction probably does not have to 09:52.680 --> 09:58.200 do with COVID, or it's not a target that we want to study. But if we have something like this, two 09:58.200 --> 10:05.080 different fluxes, then this means that this reaction is doing something in my organism. So the question 10:05.080 --> 10:11.240 here that we could raise is: is it possible that sampling can give you more information 10:11.240 --> 10:21.080 than FBA? It seems that it gives you the uncertainty, but can we use it for targeting? Okay, so this 10:21.080 --> 10:27.880 is my last slide. I would like to give some notes about the current and future work. 10:27.880 --> 10:34.440 We have some Google Summer of Code projects. Two of them were last year: 10:34.440 --> 10:41.080 one is sampling from the boundary. So now we sample from the inside of the polytope, but it seems that 10:41.080 --> 10:46.520 for some applications it's interesting to sample from the boundary.
So we had 10:46.600 --> 10:54.120 one student that implemented the boundary sampling, which is, let's say, more difficult to 10:54.120 --> 10:58.920 do. We don't have a lot of ways to sample from the boundary. Let's say that it's not 10:58.920 --> 11:04.920 convex as a problem. And then there is another student that was doing statistical analysis. 11:04.920 --> 11:12.360 So if you remember the copula, you can imagine that every cell of this matrix is 11:12.440 --> 11:19.320 one copula. But instead of drawing the copula, we just have a Pearson correlation and 11:19.320 --> 11:25.640 we just draw the value here. So this is somehow the connections between all 11:25.640 --> 11:34.200 the reactions in a network. Yes, and this is in the package. 11:34.200 --> 11:41.320 You can use it. Yeah. So that's it. The repository is on GitHub. We have a Colab 11:41.400 --> 11:49.960 notebook where you can run all of these, and there you will also find the paper and the 11:49.960 --> 11:59.720 package. That's all. Thanks a lot. And Olga, would you like to start 11:59.720 --> 12:10.920 coming to set up? Questions? So this was probably the first talk that we had with more 12:10.920 --> 12:23.240 questions. So I tried to remove them. It's very hard. I mean, just that this is either 12:23.240 --> 12:28.440 a sort of static solution versus numerical simulation to get a broader space of 12:28.440 --> 12:36.120 solutions. So what sort of compute would work better for the copula computation? 12:36.120 --> 12:43.320 So if you're doing all dimensions, or all reactions against all reactions, is it 12:43.320 --> 12:50.120 fully parallelized? So it depends on the network.
So for Recon3D, which is the most 12:50.200 --> 12:57.080 advanced model, even sampling, where one sampling means that you get one copula, 12:59.320 --> 13:05.960 used to take, okay, maybe days. So for the whole matrix, I don't know. Yeah. 13:07.960 --> 13:18.440 But, okay. But yes, probably you can, because 13:18.520 --> 13:24.920 if you have the sample, then you can have the matrix. Yes. So the sample gives you, 13:24.920 --> 13:28.920 let's say that in the sample you have a point that is 1,000-dimensional; 13:29.560 --> 13:34.600 every coordinate corresponds to a reaction. So if you have the sample, you have everything. 13:35.720 --> 13:41.400 So let's say, yeah, in a day, it's not bad. All right. Thank you very much. Thanks.
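The point of the last answer, that one sample already contains every pairwise relation because each coordinate of a sampled point belongs to one reaction, can be illustrated with a small sketch. It uses only the standard library and toy numbers; the function names and the data are made up for illustration and are not part of Dingo.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists of flux samples.

    Assumes neither list is constant (a zero variance would divide by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(samples):
    """samples[i] holds the sampled flux values of reaction i."""
    k = len(samples)
    return [[pearson(samples[i], samples[j]) for j in range(k)]
            for i in range(k)]


# Toy sample: 3 reactions, 4 sampled steady states each.
samples = [
    [1.0, 2.0, 3.0, 4.0],   # reaction 0
    [2.0, 4.0, 6.0, 8.0],   # reaction 1, rises with reaction 0
    [4.0, 3.0, 2.0, 1.0],   # reaction 2, falls as reaction 0 rises
]
corr = correlation_matrix(samples)
```

The sampling itself is run once; every cell of the matrix is then computed from the same sample, and since the cells are independent of each other this step could be split across workers.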