The purpose of this package is to discover directionality in the changes in a time or pseudo-time series of gene expression.
A direction in N-dimensional space is a point on the (N-1)-sphere. That is, a direction in 2-dimensional space is a point on the circle (i.e., the circumference of the circle), a direction in 3-dimensional space is a point on the sphere (i.e., the surface of a solid ball), and so on in higher dimensions. A direction in 25000-dimensional gene expression space is a point on the 24999-dimensional sphere in this space.
If we have a trajectory in some space, we can look out from the starting point and follow the direction of that trajectory on the sphere as seen from our vantage point. This will give us a collection of points on the sphere. If that trajectory is moving in some well-defined direction, those points on the sphere should be close together. We’ll say more in a moment as to how to determine whether these points are close.
The figure below illustrates a path in 3-dimensional space, the projection of its points onto the 2-sphere, the center of a circle that minimizes the mean spherical distance to these points, and the circle whose radius is this mean distance. We can see that some of the early steps produce outliers on the sphere.
```{r, echo=FALSE, out.width="50%", fig.cap="A path and its projection to the sphere."}
knitr::include_graphics("pathAndSphere.png")
```
This approach is very general. Given gene expression data organized in a time series, we can apply these methods to the gene expression, to normalized gene expression, or to any chosen number of dimensions of a principal component analysis of the data set.
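As a minimal sketch of the last option, assuming the rows of an expression matrix are time points, base R's `prcomp()` can turn a gene-level matrix into a low-dimensional path. The data below are simulated and hypothetical, and keeping three components is an arbitrary choice for illustration.

```r
# Sketch: build a path from the first few principal components of a
# simulated (hypothetical) expression matrix whose rows are time points.
set.seed(1)
n_time  <- 20
n_genes <- 100
# Hypothetical data: each gene drifts linearly in time, plus Gaussian noise
trend <- outer(seq_len(n_time), rnorm(n_genes))
expr  <- trend + matrix(rnorm(n_time * n_genes, sd = 0.5), n_time, n_genes)

# prcomp() centres each gene; the scores in pca$x give the coordinates of
# each time point in principal-component space.
pca  <- prcomp(expr)
path <- pca$x[, 1:3]   # an N x D path with N = 20 time points and D = 3
dim(path)              # 20 3
```

The resulting matrix can be used wherever a path is expected; whether to scale each gene first (`scale. = TRUE`) is a modelling choice.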
Suppose we have a matrix $X$ with $N$ rows and $D$ columns. Here we consider each row as a time point and each column as a feature, which could be the expression or normalized expression of a given gene, or a principal component. We therefore have a path consisting of $N$ points in $D$-dimensional space. In our experience, biological trajectories are capable of changing direction, so we may wish to query the data for directionality starting at the $k$th time point. Let $x_i$ be the $i$th row of $X$. We will consider the matrix $X'$ consisting of rows $k$ through $N$ of $X$. We view the successive points from $x_k$ and consider the successive vectors $y_1 = x_{k+1} - x_k$, $y_2 = x_{k+2} - x_k$, $\ldots$, $y_m = x_N - x_k$. (Thus $m = N - k$.) Taking $p_i = \frac{y_i}{\lVert y_i \rVert}$ gives $m$ points on the unit sphere which represent the directionality of the path $x_k, \ldots, x_N$.
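The construction of the points $p_i$ is short enough to write out directly. The helper `projectToSphere()` below is a hypothetical base-R sketch of the formulas above, not a function from this package.

```r
# Hypothetical helper: project a path onto the unit sphere as seen from row k.
# X is an N x D matrix whose rows are time points; requires k < N.
projectToSphere <- function(X, k = 1) {
  N <- nrow(X)
  stopifnot(k < N)
  # y_i = x_{k+i} - x_k for i = 1, ..., m, with m = N - k
  Y <- sweep(X[(k + 1):N, , drop = FALSE], 2, X[k, ])
  # p_i = y_i / ||y_i||; a row equal to x_k would give NaN here
  Y / sqrt(rowSums(Y^2))
}

X <- rbind(c(0, 0, 0), c(1, 0, 0), c(2, 1, 0), c(3, 1, 1))
P <- projectToSphere(X, k = 1)
nrow(P)          # m = N - k = 3
rowSums(P^2)     # every row has unit length
```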
We would like to measure how compactly a set of points $p_1, \ldots, p_m$ sits on the sphere. To do this we find a center for these points and a circle (more generally a $(D-2)$-dimensional sphere) which measures their collective separation from that center. The measure of their collective distance from their center could be the median, the mean or the maximum of their distances from this center, and the center is chosen to minimize whichever summary statistic we are using. Distance in this context is spherical distance, i.e., the angle between points as viewed from the center of the sphere. That is, we take $\mu$ to be one of the functions median, mean or max. The center is then
$$C = \operatorname{argmin}_{x} \, \mu\bigl(d(x,p_1), d(x,p_2), \ldots, d(x,p_m)\bigr)$$
and
$$R_\mu = \mu\bigl(d(C,p_1), d(C,p_2), \ldots, d(C,p_m)\bigr).$$
In this way we get a map $X' \mapsto R_\mu$, and we take $R_\mu$ as our measure of the clustering of the points $p_i$ and therefore of the directionality of $X'$.
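The following base-R sketch computes the spherical distance $d(x, p) = \arccos(x \cdot p)$ for unit vectors and approximates the center $C$ and radius $R_\mu$ with a general-purpose local optimizer (`optim()` with its default Nelder-Mead method). The helper names are hypothetical, and the search strategy, started at the mean of the points, is our assumption rather than the package's method; being local, it can in principle stop at a non-global minimum.

```r
# Spherical (angular) distance between two unit vectors. The clamp guards
# acos() against dot products drifting just outside [-1, 1] by rounding.
sphericalDistance <- function(x, p) {
  acos(pmax(-1, pmin(1, sum(x * p))))
}

# Summary mu of the distances from a candidate centre x to the rows of P.
# x is rescaled to unit length so the optimizer can move freely in R^D.
radiusFor <- function(x, P, mu = mean) {
  x <- x / sqrt(sum(x^2))
  mu(apply(P, 1, sphericalDistance, x = x))
}

# Approximate C = argmin_x mu(d(x, p_1), ..., d(x, p_m)) and R_mu.
# Assumption: a Nelder-Mead search started at the mean of the points.
findCenter <- function(P, mu = mean) {
  fit <- optim(colMeans(P), radiusFor, P = P, mu = mu)
  C <- fit$par / sqrt(sum(fit$par^2))
  list(center = C, radius = radiusFor(C, P, mu))
}

# Usage: 50 points scattered tightly around (1, 0, 0)
set.seed(2)
P <- matrix(c(1, 0, 0), 50, 3, byrow = TRUE) +
  matrix(rnorm(150, sd = 0.05), 50, 3)
P <- P / sqrt(rowSums(P^2))
fit <- findCenter(P)   # mu = mean; pass mu = median or mu = max instead
fit$radius             # small, since the points are tightly clustered
```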
We would like to test whether this clustering is statistically significant. To ask whether the points $p_i$ are close is to ask: close compared to what? We answer this by producing $K$ randomized paths $Y_1, \ldots, Y_K$ modeled on $X'$. For each of these we compute $R_{\mu,i}$. A p-value for the directionality of $X'$ comes from the rank of $R_\mu$ among $R_{\mu,1}, \ldots, R_{\mu,K}$: if $R_\mu$ is smaller than all but 5% of these values, the directionality is significant at $p < 0.05$.
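The ranking step amounts to an empirical p-value. In the sketch below (the helper name is hypothetical), small $R_\mu$ means tight clustering, so we count the randomized values that are at least as small as the observed one. Adding 1 to numerator and denominator, which treats the observed path as one more sample, is a common convention that we assume here; it keeps the p-value from ever being exactly zero.

```r
# Hypothetical helper: empirical p-value from the rank of the observed R_mu
# among K values R_mu,1, ..., R_mu,K computed on randomized paths.
empiricalPValue <- function(R_obs, R_random) {
  (1 + sum(R_random <= R_obs)) / (length(R_random) + 1)
}

# Usage with K = 99 made-up randomized radii, all larger than the observed one
R_random <- seq(0.5, 1.5, length.out = 99)
empiricalPValue(0.3, R_random)   # 0.01
```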
This package offers several methods for producing a set of randomized paths based on a path $X$. They fall roughly into two families: one produces random paths by taking random steps in the ambient space of $X$, and the other produces random paths by permuting the entries of $X$.
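```{r}
library(TrajectoryGeometry)
# Random steps in the ambient space, preserving the step lengths of the path
randomizationParams = c('bySteps','preserveLengths')
Y = generateRandomPaths(path=straight_path,
                        randomizationParams=randomizationParams,
                        N=10)
```
```{r}
# As above, but keeping all coordinates non-negative, as for expression values
randomizationParams = c('bySteps','preserveLengths','nonNegative')
Y = generateRandomPaths(path=straight_path,
                        randomizationParams=randomizationParams,
                        N=10)
```
```{r}
# Random paths obtained by permuting the entries of the path matrix as a whole
randomizationParams = c('byPermutation','permuteAsMatrix')
# Random paths obtained by permuting entries separately within each column
randomizationParams = c('byPermutation','permuteWithinColumns')
```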
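For intuition, one member of each of the two families of randomization can be sketched in base R. Both helpers below are hypothetical illustrations, not the package's `generateRandomPaths()`, whose implementation may differ in detail. The step-based sketch keeps each original step length and draws a uniformly random direction for it (a normalized Gaussian vector is uniform on the sphere). The permutation sketch shuffles each column independently.

```r
# Hypothetical step-based randomizer: same step lengths, random directions.
randomStepsPreservingLengths <- function(path) {
  steps    <- diff(path)                          # (N - 1) x D matrix of steps
  lengths  <- sqrt(rowSums(steps^2))
  dirs     <- matrix(rnorm(length(steps)), nrow(steps), ncol(steps))
  dirs     <- dirs / sqrt(rowSums(dirs^2))        # unit random directions
  newSteps <- dirs * lengths                      # row i now has length lengths[i]
  walk <- apply(rbind(0, newSteps), 2, cumsum)    # N x D, first row all zeros
  sweep(walk, 2, path[1, ], "+")                  # start where the path starts
}

# Hypothetical permutation randomizer: every feature keeps its values,
# but their order in time is shuffled independently per column.
permuteWithinColumns <- function(path) {
  apply(path, 2, function(col) col[sample.int(length(col))])
}

set.seed(3)
path <- cbind(1:10, (1:10)^2 / 10, sin(1:10))
R1 <- randomStepsPreservingLengths(path)
R2 <- permuteWithinColumns(path)
# K = 10 randomized paths, analogous to N=10 above
randomPaths <- replicate(10, randomStepsPreservingLengths(path), simplify = FALSE)
```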
Does strong directionality mean that the path actually progresses in that direction? Not necessarily. Consider a path which starts out at the origin and then heads out along the x-axis. Perhaps it gets a little way out and then takes to wandering back and forth. As long as it never quite gets back to the origin, the methods we have described will show this path as highly directional. Accordingly, if you have determined that your path has strong directionality using these methods, it is appropriate to check for progression in that direction. This can be done with the function `pathProgression()`:
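```{r}
# Progress of straight_path in the direction straight_path_center
progress = pathProgression(straight_path,direction=straight_path_center)
# Progress of crooked_path in the direction crooked_path_center, from its 6th point
progress = pathProgression(crooked_path,from=6,direction=crooked_path_center)
```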
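To see why directionality alone can mislead, the sketch below builds a small hypothetical version of the wandering path described above and measures progress as the signed length of the projection of $x_i - x_{\text{from}}$ onto the normalized direction. The helper `progressAlong()` is hypothetical: it is one natural notion of progression, and we do not claim it reproduces the output of `pathProgression()`.

```r
# Hypothetical helper: signed distance travelled along a direction,
# measured from row `from` (the direction is normalized first).
progressAlong <- function(path, direction, from = 1) {
  u   <- direction / sqrt(sum(direction^2))
  rel <- sweep(path[from:nrow(path), , drop = FALSE], 2, path[from, ])
  as.numeric(rel %*% u)
}

# Heads out along the x-axis, then wanders back and forth without
# returning to the origin: every projected point stays near (1, 0),
# yet progress stalls after the third point.
wander <- rbind(c(0, 0), c(1, 0), c(2, 0), c(1.5, 0.3), c(2, -0.3), c(1.5, 0.3))
progressAlong(wander, direction = c(1, 0))   # 0 1 2 1.5 2 1.5
```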
A related issue, which can also be checked using `pathProgression()`, is the instability of behavior in the neighborhood of the starting point of a path. Consider a path which starts at a point $P_0$ and, after oscillating small distances in the neighborhood of $P_0$, proceeds in a highly directional manner towards a point $P_k$. We would like to see this as a highly directional path. However, the oscillations around $P_0$, no matter how small, can in projection occupy large portions of the sphere. Accordingly, `testPathForDirectionality()` may be bamboozled out of detecting directionality. A prophylaxis against this is the following:
```{r}
# Direction from the first point of the path to its last point
direction = oscillation[nrow(oscillation),] - oscillation[1,]
progress = pathProgression(oscillation,direction=direction)
```
This will allow you to detect a portion of the path which does not depart from the neighborhood of its starting point. You can then use the `from` parameter of `testPathForDirectionality()` to eliminate these oscillations from consideration. Note that we did not need to normalize `direction`, as that is done within `pathProgression()`.
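The effect of a jittery start can be seen in a small hypothetical example. The base-R sketch below (helper names are illustrative, not package functions) builds a path that jitters around the origin before running straight along $(1, 1, 0)$. It then reports the largest angle between any projected point and that direction, first viewing from the first point and then from the fourth, which mimics choosing a later `from`.

```r
# Hypothetical path: three tiny jitters around the origin, then a straight
# run along (1, 1, 0).
jitter      <- rbind(c(0, 0, 0), c(0.01, 0, 0), c(0, -0.01, 0), c(-0.01, 0, 0))
run         <- outer(1:6, c(1, 1, 0))
oscillating <- rbind(jitter, run)

# Unit vectors from row k to every later row
directionsFrom <- function(X, k) {
  Y <- sweep(X[(k + 1):nrow(X), , drop = FALSE], 2, X[k, ])
  Y / sqrt(rowSums(Y^2))
}

# Largest angle between any projected point and a target direction
maxAngle <- function(P, target) {
  u <- target / sqrt(sum(target^2))
  max(acos(pmin(1, pmax(-1, P %*% u))))
}

maxAngle(directionsFrom(oscillating, 1), c(1, 1, 0))  # 3*pi/4: the jitter scatters the points
maxAngle(directionsFrom(oscillating, 4), c(1, 1, 0))  # below 0.01 once the jitter is skipped
```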