cellmig
analysisprofiles
for different compoundslibrary(cellmig)
library(ggplot2)
library(ggforce)
ggplot2::theme_set(new = theme_bw(base_size = 10))
High-throughput tracking of cells with time-lapse microscopy followed by the acquisition of images at fixed time intervals facilitates the analysis of cell migration across many wells treated under different biological conditions. These workflows generate considerable technical noise and biological variability, and therefore technical and biological replicates are necessary, leading to large, hierarchically structured datasets, i.e., cells are nested within technical replicates that are nested within biological replicates.
Current statistical analyses of such data usually ignore the hierarchical structure of the data and fail to explicitly quantify uncertainty arising from technical or biological variability. To address this gap, we present cellmig, an R package implementing Bayesian hierarchical models for migration analysis. cellmig quantifies condition-specific velocity changes (e.g., drug effects) while modeling nested data structures and technical artifacts, providing uncertainty-aware estimates through credible intervals.
There are currently no Bioconductor packages providing specialized statistical methods for analyzing hierarchical high-throughput cell migration data. cellmig addresses this gap and will represent a valuable addition to the ecosystem.
To install this package, start R (version “4.5”) and enter:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("cellmig")
This is how a typical cell migration data looks like \(\rightarrow\) a table.
Each rows is a cell with the following features:
data("d", package = "cellmig")
str(d)
FALSE 'data.frame': 7560 obs. of 6 variables:
FALSE $ well : chr "1" "1" "1" "1" ...
FALSE $ plate : chr "1" "1" "1" "1" ...
FALSE $ compound: chr "C1" "C1" "C1" "C1" ...
FALSE $ dose : chr "D1" "D1" "D1" "D1" ...
FALSE $ v : num 21.905 0.535 3.348 5.351 1.194 ...
FALSE $ offset : num 1 1 1 1 1 1 1 1 1 1 ...
In this vignette we will use simulated data from:
v
Let’s visualize the data. Each dot represents a cell with its velocity on the y-axis. Each facet corresponds to a compound (e.g., specific drug that may affect cellular velocity). The x-axis represents the dose. There are three plates, indicated by color. Four technical replicates (wells), analyzed on the same plate, are stacked next to each other and have the same color.
ggplot(data = d)+
facet_wrap(facets = ~paste0("compound=", compound),
scales = "free_y", ncol = 2)+
geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well),
size = 0.5)+
theme_bw()+
theme(legend.position = "top",
strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
ylab(label = "migration velocity")+
xlab(label = '')+
scale_color_grey()+
guides(color = guide_legend(override.aes = list(size = 3)))+
guides(shape = guide_legend(override.aes = list(size = 3)))+
scale_y_log10()+
annotation_logticks(base = 10, sides = "l")
Alternatively, we can visualize the well-specific mean velocities to highlight plate-specific batch effects.
dm <- aggregate(v~well+plate+compound+dose, data = d, FUN = mean)
ggplot(data = dm)+
facet_wrap(facets = ~paste0("compound=", compound),
scales = "free_y", ncol = 2)+
geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well),
size = 1.5, alpha = 0.7)+
theme_bw()+
theme(legend.position = "top",
strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
ylab(label = "migration velocity")+
xlab(label = '')+
scale_color_grey()+
guides(color = guide_legend(override.aes = list(size = 3)))+
guides(shape = guide_legend(override.aes = list(size = 3)))+
scale_y_log10()+
annotation_logticks(base = 10, sides = "l")
cellmig
analysisWe will use this data to infer the overall treatment effects (parameter \(\delta_t\)), relative to a control treatment (the offset) to correct for plate-specific batch effects. At the same time, cellmig will quantify many different features of the data using its model parameters (e.g., variability between technical or biological replicates; or plate-specific treatment effects (\(\gamma_{pt}\))).
We fit the Stan model employed by cellmig with the
control parameters defined in the list control
. There are many other
input parameters in control
, check the cellmig
function documentation.
o <- cellmig(x = d,
control = list(mcmc_warmup = 300, # nr. of MCMC warmup step?
mcmc_steps = 1000, # nr. of MCMC iteration steps?
mcmc_chains = 2, # nr. of MCMC chains
mcmc_cores = 2)) # nr. of MCMC cores
To extract the means, medians, and 95% Highest Density Intervals (HDIs,
quantifying parameter value uncertainty) of \(\delta_t\), we have to
access the data.frame delta_t
in the output object posteriors
:
str(o$posteriors$delta_t)
FALSE 'data.frame': 35 obs. of 16 variables:
FALSE $ group_id: int 1 2 3 4 5 6 7 8 9 10 ...
FALSE $ mean : num -0.8661 -0.4553 -0.2073 0.0623 0.112 ...
FALSE $ se_mean : num 0.00151 0.00155 0.00144 0.00149 0.00144 ...
FALSE $ sd : num 0.0694 0.0737 0.0698 0.0739 0.0707 ...
FALSE $ X2.5. : num -0.9955 -0.5944 -0.3429 -0.0873 -0.0237 ...
FALSE $ X25. : num -0.9129 -0.5058 -0.2542 0.0119 0.0633 ...
FALSE $ X50. : num -0.8662 -0.4578 -0.2076 0.0645 0.1112 ...
FALSE $ X75. : num -0.82 -0.403 -0.163 0.111 0.16 ...
FALSE $ X97.5. : num -0.732 -0.314 -0.067 0.207 0.247 ...
FALSE $ n_eff : num 2111 2263 2343 2456 2405 ...
FALSE $ Rhat : num 1.001 0.999 1 0.999 0.999 ...
FALSE $ group : chr "C2|D1" "C2|D2" "C2|D3" "C2|D4" ...
FALSE $ compound: chr "C2" "C2" "C2" "C2" ...
FALSE $ dose : chr "D1" "D2" "D3" "D4" ...
FALSE $ plate_id: num 1 1 1 1 1 1 1 1 1 1 ...
FALSE $ plate : chr "1" "1" "1" "1" ...
It is better to visualize the mean \(\delta_t\)s and their 95% HDIs
As compound t=1 was selected as control (by setting offset=1), the treatment effects of this compounds are not shown.
ggplot(data = o$posteriors$delta_t)+
geom_line(aes(x = dose, y = mean, col = compound, group = compound))+
geom_point(aes(x = dose, y = mean, col = compound))+
geom_errorbar(aes(x = dose, y = mean, ymin = X2.5., ymax = X97.5.,
col = compound), width = 0.1)+
ylab(label = expression("Overall treatment effect ("*delta*")"))+
theme(legend.position = "top")
profiles
for different compoundsFor “rectangular datasets”, i.e. datasets with multiple compounds and overlapping doses, we can study the treatment dose-response profiles by hierarchical clustering based on the complete posteriors of \(\delta_t\), account for uncertainty in this parameter.
Panel A: dendrogram constructed by hierarchical clustering with average linkage, based on euclidean distances between vectors of \(\delta_t\) (shown in panel B) of each compound (leaf) across doses. Branch support values show branch robustness (label = 1000 implies this branch was encountered in each of the 1000 dendrograms constructed from the posterior of \(\delta_t\)). Plate-specific treatment dose-responses based on parameters \(\gamma_pt\).
get_dose_response_profile(x = o)+
patchwork::plot_layout(widths = c(.7, 1, 4))
Pairwise dot-plot comparison \(\rightarrow\) x minus y axis
(Left panel) Differences in overall treatment effects. Log fold change (LFC; described by parameter \(\rho_{ij}\)) between overall treatments effects (\(\delta_t\)) of row (\(i\)) vs. column (\(j\)) treatment groups. Tile colors and labels represent \(\rho_{ij}\). (Right panel) Probability of differential treatment effect described by parameter \(\pi_{ij}\). Tile colors and labels represent \(\pi_{ij}\).
u <- get_pairs(x = o, exponentiate = FALSE)
u$plot
FALSE NULL
from_groups
vs. to_group
).get_groups(x = o)
FALSE group_id group compound dose
FALSE 1 1 C2|D1 C2 D1
FALSE 2 2 C2|D2 C2 D2
FALSE 3 3 C2|D3 C2 D3
FALSE 4 4 C2|D4 C2 D4
FALSE 5 5 C2|D5 C2 D5
FALSE 6 6 C2|D6 C2 D6
FALSE 7 7 C2|D7 C2 D7
FALSE 8 8 C3|D1 C3 D1
FALSE 9 9 C3|D2 C3 D2
FALSE 10 10 C3|D3 C3 D3
FALSE 11 11 C3|D4 C3 D4
FALSE 12 12 C3|D5 C3 D5
FALSE 13 13 C3|D6 C3 D6
FALSE 14 14 C3|D7 C3 D7
FALSE 15 15 C4|D1 C4 D1
FALSE 16 16 C4|D2 C4 D2
FALSE 17 17 C4|D3 C4 D3
FALSE 18 18 C4|D4 C4 D4
FALSE 19 19 C4|D5 C4 D5
FALSE 20 20 C4|D6 C4 D6
FALSE 21 21 C4|D7 C4 D7
FALSE 22 22 C5|D1 C5 D1
FALSE 23 23 C5|D2 C5 D2
FALSE 24 24 C5|D3 C5 D3
FALSE 25 25 C5|D4 C5 D4
FALSE 26 26 C5|D5 C5 D5
FALSE 27 27 C5|D6 C5 D6
FALSE 28 28 C5|D7 C5 D7
FALSE 29 29 C6|D1 C6 D1
FALSE 30 30 C6|D2 C6 D2
FALSE 31 31 C6|D3 C6 D3
FALSE 32 32 C6|D4 C6 D4
FALSE 33 33 C6|D5 C6 D5
FALSE 34 34 C6|D6 C6 D6
FALSE 35 35 C6|D7 C6 D7
u <- get_violins(x = o,
from_groups = get_groups(x = o)$group,
to_group = "C2|D1",
exponentiate = FALSE)
u$plot
To assess model validity, we performed posterior predictive checks, which showed that the simulated data (pink violin) were consistent with the observed data (black violins). Each dot is a cells.
g <- get_ppc_violins(x = o, wrap = TRUE, ncol = 3)
g+scale_y_log10()
Using posterior predictive checks we compared the mean simulated velocity per well (y-axis) with the observed mean per well (x-axis). Each dot is a well.
g <- get_ppc_means(x = o)
g
g_alpha_p <- ggplot(data = o$posteriors$alpha_p)+
geom_errorbarh(aes(y = plate_id, x = mean, xmin = X2.5., xmax = X97.5.),
height = 0.1)+
geom_point(aes(y = plate_id, x = mean))
g_sigma <- ggplot()+
geom_errorbarh(data = o$posteriors$sigma_bio,
aes(y = "sigma_bio",
x = mean, xmin = X2.5., xmax = X97.5.),
height = 0.1)+
geom_errorbarh(data = o$posteriors$sigma_tech,
aes(y = "sigma_tech",
x = mean, xmin = X2.5., xmax = X97.5.),
height = 0.1)+
geom_point(data = o$posteriors$sigma_bio,
aes(y = "sigma_bio", x = mean))+
geom_point(data = o$posteriors$sigma_tech,
aes(y = "sigma_tech", x = mean))+
ylab(label = '')
g_alpha_p|g_sigma
sessionInfo()
FALSE R version 4.5.1 Patched (2025-08-23 r88802)
FALSE Platform: x86_64-pc-linux-gnu
FALSE Running under: Ubuntu 24.04.3 LTS
FALSE
FALSE Matrix products: default
FALSE BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
FALSE LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
FALSE
FALSE locale:
FALSE [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
FALSE [3] LC_TIME=en_GB LC_COLLATE=C
FALSE [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
FALSE [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
FALSE [9] LC_ADDRESS=C LC_TELEPHONE=C
FALSE [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
FALSE
FALSE time zone: America/New_York
FALSE tzcode source: system (glibc)
FALSE
FALSE attached base packages:
FALSE [1] stats graphics grDevices utils datasets methods base
FALSE
FALSE other attached packages:
FALSE [1] ggforce_0.5.0 ggplot2_4.0.0 cellmig_0.99.16 BiocStyle_2.37.1
FALSE
FALSE loaded via a namespace (and not attached):
FALSE [1] gtable_0.3.6 xfun_0.53 bslib_0.9.0
FALSE [4] QuickJSR_1.8.0 inline_0.3.21 lattice_0.22-7
FALSE [7] vctrs_0.6.5 tools_4.5.1 generics_0.1.4
FALSE [10] yulab.utils_0.2.1 stats4_4.5.1 curl_7.0.0
FALSE [13] parallel_4.5.1 tibble_3.3.0 pkgconfig_2.0.3
FALSE [16] ggplotify_0.1.2 RColorBrewer_1.1-3 S7_0.2.0
FALSE [19] RcppParallel_5.1.11-1 lifecycle_1.0.4 compiler_4.5.1
FALSE [22] farver_2.1.2 stringr_1.5.2 treeio_1.33.0
FALSE [25] tinytex_0.57 codetools_0.2-20 ggtree_3.17.1
FALSE [28] ggfun_0.2.0 htmltools_0.5.8.1 sass_0.4.10
FALSE [31] yaml_2.3.10 lazyeval_0.2.2 pillar_1.11.0
FALSE [34] jquerylib_0.1.4 tidyr_1.3.1 MASS_7.3-65
FALSE [37] cachem_1.1.0 magick_2.9.0 StanHeaders_2.32.10
FALSE [40] nlme_3.1-168 rstan_2.32.7 aplot_0.2.9
FALSE [43] tidyselect_1.2.1 digest_0.6.37 stringi_1.8.7
FALSE [46] dplyr_1.1.4 reshape2_1.4.4 purrr_1.1.0
FALSE [49] bookdown_0.44 labeling_0.4.3 polyclip_1.10-7
FALSE [52] fastmap_1.2.0 grid_4.5.1 cli_3.6.5
FALSE [55] magrittr_2.0.4 patchwork_1.3.2 loo_2.8.0
FALSE [58] dichromat_2.0-0.1 pkgbuild_1.4.8 ape_5.8-1
FALSE [61] withr_3.0.2 rappdirs_0.3.3 scales_1.4.0
FALSE [64] rmarkdown_2.29 matrixStats_1.5.0 gridExtra_2.3
FALSE [67] evaluate_1.0.5 knitr_1.50 V8_7.0.0
FALSE [70] gridGraphics_0.5-1 rstantools_2.5.0 rlang_1.1.6
FALSE [73] Rcpp_1.1.0 glue_1.8.0 tidytree_0.4.6
FALSE [76] tweenr_2.0.3 BiocManager_1.30.26 jsonlite_2.0.0
FALSE [79] R6_2.6.1 plyr_1.8.9 fs_1.6.6