library(cellmig)
library(ggplot2)
library(ggforce)
ggplot2::theme_set(new = theme_bw(base_size = 10))

1 Background

High-throughput tracking of cells with time-lapse microscopy enables the analysis of cell migration across many wells treated under different conditions. Such experiments generate substantial technical and biological variability, making technical and biological replicates necessary. This leads to hierarchically structured datasets: cells are nested within technical replicates, which in turn are nested within biological replicates.
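The nesting described above can be made concrete with a small simulation. The sketch below is not part of cellmig: the variable names, the log-normal speed distribution, and all parameter values are assumptions chosen only to illustrate cells nested in wells nested in plates.

```r
set.seed(42)

# Hypothetical design: 3 plates (biological replicates), 4 wells per plate
# (technical replicates), 20 cells per well
n_plates <- 3
wells_per_plate <- 4
cells_per_well <- 20

# Assumed standard deviations on the log scale (illustrative values only)
sd_plate <- 0.3   # variability between biological replicates
sd_well  <- 0.1   # variability between technical replicates
sd_cell  <- 0.5   # variability between cells within a well

# One row per well; each well inherits the effect of its plate
wells <- expand.grid(well = seq_len(wells_per_plate),
                     plate = seq_len(n_plates))
plate_effect <- rnorm(n_plates, mean = 0, sd = sd_plate)
wells$log_mean <- 1 + plate_effect[wells$plate] +
  rnorm(nrow(wells), mean = 0, sd = sd_well)

# Expand to one row per cell and draw a speed for each cell
cells <- wells[rep(seq_len(nrow(wells)), each = cells_per_well), ]
cells$v <- rlnorm(nrow(cells), meanlog = cells$log_mean, sdlog = sd_cell)

# Number of cells in each plate (rows) and well (columns)
cell_counts <- table(plate = cells$plate, well = cells$well)
print(cell_counts)
```

Cells from the same well share one well-level mean, and wells from the same plate share one plate effect, so the observations are not independent. This dependence is what a hierarchical model accounts for.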

Most current statistical analyses ignore the hierarchical structure and do not explicitly quantify uncertainty from technical or biological variability. The Bioconductor package cellmig addresses this gap by implementing Bayesian hierarchical models for cell migration analysis. It quantifies condition-specific changes in migration speed (e.g., drug effects) while modeling nested data structures, producing uncertainty-aware estimates through credible intervals.

Currently, there are no other Bioconductor packages specialized for hierarchical high-throughput cell migration data analysis. cellmig addresses this gap and integrates naturally into the Bioconductor ecosystem.

2 Installation

To install this package, start R (version “4.5”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("cellmig")

3 Data

A typical cell migration dataset is a table.

Each row is a cell with the following features:

data("d", package = "cellmig")
str(d)
FALSE 'data.frame': 7560 obs. of  6 variables:
FALSE  $ well    : chr  "1" "1" "1" "1" ...
FALSE  $ plate   : chr  "1" "1" "1" "1" ...
FALSE  $ compound: chr  "C1" "C1" "C1" "C1" ...
FALSE  $ dose    : chr  "D1" "D1" "D1" "D1" ...
FALSE  $ v       : num  21.905 0.535 3.348 5.351 1.194 ...
FALSE  $ offset  : num  1 1 1 1 1 1 1 1 1 1 ...
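Before fitting, it can help to verify that a table has this layout. The following is a minimal sketch on a hand-made toy table with the same six columns. The toy values, the helper variable names, and the coding of offset as 0 for non-control rows are assumptions for illustration; only the column names are taken from the output above.

```r
# Toy table with the same columns as d (values are made up)
toy <- data.frame(
  well     = c("1", "1", "2", "2", "3", "3"),
  plate    = c("1", "1", "1", "1", "1", "1"),
  compound = c("C1", "C1", "C2", "C2", "C2", "C2"),
  dose     = c("D1", "D1", "D1", "D1", "D2", "D2"),
  v        = c(21.9, 0.5, 3.3, 5.4, 1.2, 2.8),
  offset   = c(1, 1, 0, 0, 0, 0),   # assumed coding: 1 = control rows
  stringsAsFactors = FALSE
)

# All expected columns are present and speeds are strictly positive
required <- c("well", "plate", "compound", "dose", "v", "offset")
has_columns <- all(required %in% names(toy))
positive_speed <- all(toy$v > 0)

# Each (plate, well) pair should map to exactly one compound and dose
wells <- unique(toy[, c("plate", "well", "compound", "dose")])
one_treatment_per_well <- !any(duplicated(wells[, c("plate", "well")]))

# Compound(s) flagged as control under the assumed offset coding
control <- unique(toy$compound[toy$offset == 1])
```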

In this vignette we will use simulated data from:

4 Visualizing raw cell speed

Each dot represents a cell, and the y-axis shows its migration speed on a log scale. Facets represent compounds and the x-axis represents dose. Color indicates the plate. Technical replicates (wells) from the same plate are drawn side by side in the same color.

ggplot(data = d)+
  facet_wrap(facets = ~paste0("compound=", compound), 
             scales = "free_y", ncol = 2)+
  geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well), 
            size = 0.5)+
  theme_bw()+
  theme(legend.position = "top",
        strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
  ylab(label = "migration speed")+
  xlab(label = '')+
  scale_color_grey()+
  guides(color = guide_legend(override.aes = list(size = 3)))+
  guides(shape = guide_legend(override.aes = list(size = 3)))+
  scale_y_log10()+
  annotation_logticks(base = 10, sides = "l")

4.1 Mean migration speed per well

Visualizing the mean speed within wells highlights plate-specific batch effects.

dm <- aggregate(v~well+plate+compound+dose, data = d, FUN = mean)
ggplot(data = dm)+
  facet_wrap(facets = ~paste0("compound=", compound), 
             scales = "free_y", ncol = 2)+
  geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well), 
            size = 1.5, alpha = 0.7)+
  theme_bw()+
  theme(legend.position = "top",
        strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
  ylab(label = "migration speed")+
  xlab(label = '')+
  scale_color_grey()+
  guides(color = guide_legend(override.aes = list(size = 3)))+
  guides(shape = guide_legend(override.aes = list(size = 3)))+
  scale_y_log10()+
  annotation_logticks(base = 10, sides = "l")

5 cellmig analysis

We will use these data to infer the overall treatment effects (parameter \(\delta_t\)) relative to a control treatment (the offset), which corrects for plate-specific batch effects. At the same time, cellmig quantifies other features of the data through its model parameters, e.g., the variability between technical or biological replicates and the plate-specific treatment effects (\(\gamma_{pt}\)).
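To see why expressing effects relative to a control on the same plate removes a plate-wide batch effect, consider the naive calculation below. It is only a sketch on simulated data and is not the model fitted by cellmig: the toy design, the log-normal speeds, and the names plate_shift and true_effect are assumptions.

```r
set.seed(7)

# Hypothetical toy data: 2 plates, control C1 and treatment C2, 30 cells each
toy <- expand.grid(cell = 1:30,
                   compound = c("C1", "C2"),
                   plate = c("1", "2"),
                   stringsAsFactors = FALSE)

plate_shift <- c("1" = 0, "2" = 0.5)   # assumed batch effect (log scale)
true_effect <- c(C1 = 0, C2 = -0.7)    # assumed log fold change vs. control

toy$v <- rlnorm(nrow(toy),
                meanlog = 1 + plate_shift[toy$plate] + true_effect[toy$compound],
                sdlog = 0.4)

# Mean log speed per plate (rows) and compound (columns)
m <- tapply(log(toy$v), list(toy$plate, toy$compound), mean)

# The raw control means differ between plates because of the batch effect
batch_gap <- m["2", "C1"] - m["1", "C1"]

# Plate-wise log fold change: the plate shift cancels in the difference
lfc <- m[, "C2"] - m[, "C1"]
print(round(lfc, 2))
```

Both plate-wise log fold changes scatter around the assumed effect of -0.7 even though the plates differ in baseline speed. A hierarchical model pools such plate-wise estimates and also propagates their uncertainty, which this naive calculation does not do.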

5.1 Model fitting

We fit the Stan model used by cellmig with the settings given in the list control. The list accepts many other parameters; see the documentation of the cellmig function.

o <- cellmig(x = d,
             control = list(mcmc_warmup = 300, # number of MCMC warmup steps
                            mcmc_steps = 1000, # number of MCMC iteration steps
                            mcmc_chains = 2,   # number of MCMC chains
                            mcmc_cores = 2))   # number of cores used for MCMC

5.2 What are the overall treatment effects (\(\delta_t\)) on speed?

To extract the means, medians, and 95% highest density intervals (HDIs, which quantify the uncertainty in the parameter values) of \(\delta_t\), we access the data.frame delta_t in the posteriors element of the output object:

str(o$posteriors$delta_t)
FALSE 'data.frame': 35 obs. of  16 variables:
FALSE  $ group_id: int  1 2 3 4 5 6 7 8 9 10 ...
FALSE  $ mean    : num  -0.862 -0.448 -0.205 0.064 0.118 ...
FALSE  $ se_mean : num  0.00158 0.00175 0.00161 0.00161 0.00145 ...
FALSE  $ sd      : num  0.0694 0.0702 0.0723 0.07 0.0712 ...
FALSE  $ X2.5.   : num  -0.9928 -0.5779 -0.3523 -0.0739 -0.0241 ...
FALSE  $ X25.    : num  -0.9095 -0.4991 -0.2537 0.0171 0.0716 ...
FALSE  $ X50.    : num  -0.8608 -0.4481 -0.2044 0.0624 0.1143 ...
FALSE  $ X75.    : num  -0.815 -0.402 -0.155 0.109 0.164 ...
FALSE  $ X97.5.  : num  -0.7253 -0.3085 -0.0658 0.1984 0.2565 ...
FALSE  $ n_eff   : num  1920 1602 2011 1896 2393 ...
FALSE  $ Rhat    : num  0.999 0.999 0.999 0.999 0.999 ...
FALSE  $ group   : chr  "C2|D1" "C2|D2" "C2|D3" "C2|D4" ...
FALSE  $ compound: chr  "C2" "C2" "C2" "C2" ...
FALSE  $ dose    : chr  "D1" "D2" "D3" "D4" ...
FALSE  $ plate_id: num  1 1 1 1 1 1 1 1 1 1 ...
FALSE  $ plate   : chr  "1" "1" "1" "1" ...

5.3 Visualizing \(\delta_t\)

The posterior means of \(\delta_t\) and their 95% HDIs are easier to interpret when plotted:

  • Dot: Posterior mean of \(\delta_t\)
  • Error bar: 95% highest density interval (HDI) of \(\delta_t\)
  • \(\exp(\delta_t)\): Fold change in cell speed relative to control

Because compound t=1 was selected as the control (by setting offset=1), the treatment effects of this compound are not shown.
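As a small worked example of the fold-change interpretation, the sketch below exponentiates the posterior medians and interval bounds of the first four groups, copied from the str() output above. The column names median, lower, and upper are renamed stand-ins for X50., X2.5., and X97.5. and are used here only for readability.

```r
# Values copied from the first four rows of o$posteriors$delta_t shown above
delta <- data.frame(
  group  = c("C2|D1", "C2|D2", "C2|D3", "C2|D4"),
  median = c(-0.8608, -0.4481, -0.2044, 0.0624),
  lower  = c(-0.9928, -0.5779, -0.3523, -0.0739),
  upper  = c(-0.7253, -0.3085, -0.0658, 0.1984)
)

# exp() is monotone, so quantiles on the log scale map directly to
# quantiles of the fold change
fc <- data.frame(
  group  = delta$group,
  median = exp(delta$median),
  lower  = exp(delta$lower),
  upper  = exp(delta$upper)
)

# An interval that excludes 0 on the log scale excludes a fold change of 1
fc$excludes_one <- delta$lower > 0 | delta$upper < 0
print(fc, digits = 3)
```

For group C2|D1 the median fold change is about 0.42, i.e., cells move at roughly 42% of the control speed. For C2|D4 the interval includes a fold change of 1, so no change in speed cannot be ruled out.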

ggplot(data = o$posteriors$delta_t)+
  geom_line(aes(x = dose, y = mean, col = compound, group = compound))+
  geom_point(aes(x = dose, y = mean, col = compound))+
  geom_errorbar(aes(x = dose, y = mean, ymin = X2.5., ymax = X97.5., 
                    col = compound), width = 0.1)+
  ylab(label = expression("Overall treatment effect ("*delta*")"))+
  theme(legend.position = "top")

5.4 Dose-response profiles

For “rectangular datasets”, i.e., datasets with multiple compounds and overlapping doses, we can study the treatment dose-response profiles by hierarchical clustering based on the complete posteriors of \(\delta_t\), thereby accounting for the uncertainty in this parameter.

Panel A: dendrogram constructed by hierarchical clustering with average linkage, based on Euclidean distances between the vectors of \(\delta_t\) (shown in panel B) of each compound (leaf) across doses. Branch support values show branch robustness (label = 1000 implies that this branch was encountered in each of the 1000 dendrograms constructed from the posterior of \(\delta_t\)). Panel C: plate-specific treatment dose-responses based on parameters \(\gamma_{pt}\).

  • Dot in panel B/C: Posterior mean of \(\delta_t\) and \(\gamma_{pt}\)
  • Error bar: 95% highest density interval (HDI)

get_dose_response_profile(x = o, exponentiate = TRUE)+
  patchwork::plot_layout(widths = c(.7, 1, 4))
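The branch support idea can be sketched with base R alone. The example below is an illustration under stated assumptions, not the implementation used by get_dose_response_profile: the matrix mu, the noise level, and the helper first_pair are hypothetical, and the posterior draws are replaced by normal noise around made-up means.

```r
set.seed(1)
n_draws <- 200

# Hypothetical posterior means of delta_t: compounds in rows, doses in columns.
# C2 and C3 have similar dose-response profiles, C4 differs.
mu <- rbind(C2 = c(-0.8, -0.4, -0.2),
            C3 = c(-0.7, -0.5, -0.1),
            C4 = c( 0.5,  0.6,  0.7))
colnames(mu) <- c("D1", "D2", "D3")

# Names of the two compounds joined in the first merge of the dendrogram
first_pair <- function(m) {
  h <- hclust(dist(m, method = "euclidean"), method = "average")
  # negative entries of h$merge index the original rows (leaves)
  rownames(m)[sort(-h$merge[1, ])]
}

# Reference clade from the means
reference <- first_pair(mu)

# Stand-in for posterior draws: perturb the means and re-cluster each time,
# counting how often the reference clade is recovered
support <- sum(replicate(n_draws, {
  draw <- mu + rnorm(length(mu), mean = 0, sd = 0.07)
  identical(first_pair(draw), reference)
}))
```

A support value equal to n_draws means that the clade appears in every dendrogram, which corresponds to the label 1000 in panel A when 1000 dendrograms are built.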

5.5 Pairwise comparisons of treatment effects

Pairwise dot-plot comparison \(\rightarrow\) x minus y axis

(Left panel) Differences in overall treatment effects. Log fold change (LFC; described by parameter \(\rho_{ij}\)) between the overall treatment effects (\(\delta_t\)) of row (\(i\)) vs. column (\(j\)) treatment groups. Tile colors and labels represent \(\rho_{ij}\). (Right panel) Probability of a differential treatment effect, described by parameter \(\pi_{ij}\). Tile colors and labels represent \(\pi_{ij}\).

  • x/y-axis: treatment groups (combinations of compounds and doses)
  • \(\rho\): difference between treatment groups (y-axis group minus x-axis group)
  • \(\pi\): probability of observing either a completely positive or a completely negative \(\rho\)

u <- get_pairs(x = o, exponentiate = FALSE)
u$plot
FALSE NULL
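The two quantities can be illustrated with simulated posterior draws. This is a sketch only: the draws are normal stand-ins rather than output of cellmig, and the definition of \(\pi\) used here (the larger of the posterior probabilities that \(\rho\) is positive or negative) is an assumption about what "completely positive or negative" means, not a statement about the package internals.

```r
set.seed(3)
n_draws <- 2000

# Stand-ins for posterior draws of delta_t in two treatment groups i and j
delta_i <- rnorm(n_draws, mean = -0.86, sd = 0.07)
delta_j <- rnorm(n_draws, mean = -0.45, sd = 0.07)

# rho: draw-wise difference between the two treatment effects (i minus j)
rho <- delta_i - delta_j
rho_mean <- mean(rho)

# Assumed definition of pi: share of draws on the dominant side of zero
p_positive <- mean(rho > 0)
pi_ij <- max(p_positive, 1 - p_positive)

print(c(rho_mean = rho_mean, pi_ij = pi_ij))
```

Under this definition, values of \(\pi\) close to 1 indicate that nearly all posterior draws agree on the sign of the difference, while values near 0.5 indicate no clear direction.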

5.6 Violin plots of pairwise differences

  • from_groups: vector of treatment groups to consider (combinations of compounds and doses)
  • to_group: target treatment group
  • violins: posterior distributions of the differences (\(\rho\)) between each element of from_groups and to_group
  • label: probability, \(\pi\), of observing a completely positive or a completely negative \(\rho\)

str(get_groups(x = o))
FALSE 'data.frame': 35 obs. of  4 variables:
FALSE  $ group_id: int  1 2 3 4 5 6 7 8 9 10 ...
FALSE  $ group   : chr  "C2|D1" "C2|D2" "C2|D3" "C2|D4" ...
FALSE  $ compound: chr  "C2" "C2" "C2" "C2" ...
FALSE  $ dose    : chr  "D1" "D2" "D3" "D4" ...
u <- get_violins(x = o, 
                 from_groups = get_groups(x = o)$group,
                 to_group = "C2|D1",
                 exponentiate = FALSE)
u$plot

5.7 Posterior predictive checks (PPCs)

5.7.1 Compare simulated data to observed data at the cell level

To assess model validity, we performed posterior predictive checks, which showed that the simulated data (pink violins) were consistent with the observed data (black violins). Each dot is a cell.

g <- get_ppc_violins(x = o, wrap = TRUE, ncol = 3)
g+scale_y_log10()

5.7.2 Compare mean velocity per well

Using posterior predictive checks we compared the mean simulated speed per well (y-axis) with the observed mean per well (x-axis). Each dot is a well.

g <- get_ppc_means(x = o)
g
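The logic of a posterior predictive check on well means can be sketched for a single well. Everything in this example is an assumption made for illustration: the log-normal likelihood, the normal approximation used as a stand-in for posterior draws, and all variable names. It does not reproduce the model or the draws used by get_ppc_means.

```r
set.seed(2)
n_cells <- 50
n_draws <- 500

# Observed speeds in one hypothetical well
v_obs <- rlnorm(n_cells, meanlog = 1, sdlog = 0.8)

# Stand-in for posterior draws of the well's log-scale mean
meanlog_draws <- rnorm(n_draws,
                       mean = mean(log(v_obs)),
                       sd = sd(log(v_obs)) / sqrt(n_cells))
sdlog_hat <- sd(log(v_obs))

# For each draw, simulate a replicated well and record its mean speed
rep_means <- vapply(meanlog_draws,
                    function(m) mean(rlnorm(n_cells, meanlog = m, sdlog = sdlog_hat)),
                    numeric(1))

# Compare the observed mean with the central 95% of the replicated means
interval <- quantile(rep_means, probs = c(0.025, 0.975))
covered <- mean(v_obs) >= interval[[1]] && mean(v_obs) <= interval[[2]]
```

Repeating this for every well and plotting the observed mean against the replicated means gives a plot of the kind shown above, in which wells falling far from the diagonal point to model misfit.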

5.8 Other model parameters

g_alpha_p <- ggplot(data = o$posteriors$alpha_p)+
  geom_errorbarh(aes(y = plate_id, x = mean, xmin = X2.5., xmax = X97.5.),
                 height = 0.1)+
  geom_point(aes(y = plate_id, x = mean))

g_sigma <- ggplot()+
  geom_errorbarh(data = o$posteriors$sigma_bio,
                 aes(y = "sigma_bio",
                     x = mean, xmin = X2.5., xmax = X97.5.),
                 height = 0.1)+
  geom_errorbarh(data = o$posteriors$sigma_tech,
                 aes(y = "sigma_tech",
                     x = mean, xmin = X2.5., xmax = X97.5.),
                 height = 0.1)+
  geom_point(data = o$posteriors$sigma_bio,
             aes(y = "sigma_bio", x = mean))+
  geom_point(data = o$posteriors$sigma_tech,
             aes(y = "sigma_tech", x = mean))+
  ylab(label = '')

g_alpha_p|g_sigma

6 Session Info

sessionInfo()
FALSE R Under development (unstable) (2026-01-15 r89304)
FALSE Platform: x86_64-pc-linux-gnu
FALSE Running under: Ubuntu 24.04.3 LTS
FALSE 
FALSE Matrix products: default
FALSE BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
FALSE LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
FALSE 
FALSE locale:
FALSE  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
FALSE  [3] LC_TIME=en_GB              LC_COLLATE=C              
FALSE  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
FALSE  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
FALSE  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
FALSE [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
FALSE 
FALSE time zone: America/New_York
FALSE tzcode source: system (glibc)
FALSE 
FALSE attached base packages:
FALSE [1] stats     graphics  grDevices utils     datasets  methods   base     
FALSE 
FALSE other attached packages:
FALSE [1] ggforce_0.5.0    ggplot2_4.0.1    cellmig_1.1.6    BiocStyle_2.39.0
FALSE 
FALSE loaded via a namespace (and not attached):
FALSE  [1] ggiraph_0.9.3           tidyselect_1.2.1        dplyr_1.1.4            
FALSE  [4] farver_2.1.2            loo_2.9.0               S7_0.2.1               
FALSE  [7] fastmap_1.2.0           lazyeval_0.2.2          tweenr_2.0.3           
FALSE [10] fontquiver_0.2.1        digest_0.6.39           lifecycle_1.0.5        
FALSE [13] StanHeaders_2.32.10     tidytree_0.4.7          magrittr_2.0.4         
FALSE [16] compiler_4.6.0          rlang_1.1.7             sass_0.4.10            
FALSE [19] tools_4.6.0             yaml_2.3.12             knitr_1.51             
FALSE [22] labeling_0.4.3          htmlwidgets_1.6.4       pkgbuild_1.4.8         
FALSE [25] curl_7.0.0              plyr_1.8.9              RColorBrewer_1.1-3     
FALSE [28] aplot_0.2.9             withr_3.0.2             purrr_1.2.1            
FALSE [31] grid_4.6.0              polyclip_1.10-7         stats4_4.6.0           
FALSE [34] gdtools_0.4.4           inline_0.3.21           scales_1.4.0           
FALSE [37] MASS_7.3-65             tinytex_0.58            dichromat_2.0-0.1      
FALSE [40] cli_3.6.5               rmarkdown_2.30          treeio_1.35.0          
FALSE [43] generics_0.1.4          otel_0.2.0              RcppParallel_5.1.11-1  
FALSE [46] ggtree_4.1.1            reshape2_1.4.5          ape_5.8-1              
FALSE [49] cachem_1.1.0            rstan_2.32.7            stringr_1.6.0          
FALSE [52] parallel_4.6.0          ggplotify_0.1.3         BiocManager_1.30.27    
FALSE [55] matrixStats_1.5.0       vctrs_0.7.1             yulab.utils_0.2.3      
FALSE [58] V8_8.0.1                jsonlite_2.0.0          fontBitstreamVera_0.1.1
FALSE [61] bookdown_0.46           gridGraphics_0.5-1      patchwork_1.3.2        
FALSE [64] magick_2.9.0            systemfonts_1.3.1       tidyr_1.3.2            
FALSE [67] jquerylib_0.1.4         glue_1.8.0              codetools_0.2-20       
FALSE [70] stringi_1.8.7           gtable_0.3.6            QuickJSR_1.9.0         
FALSE [73] tibble_3.3.1            pillar_1.11.1           rappdirs_0.3.4         
FALSE [76] htmltools_0.5.9         R6_2.6.1                evaluate_1.0.5         
FALSE [79] lattice_0.22-7          ggfun_0.2.0             fontLiberation_0.1.0   
FALSE [82] bslib_0.10.0            rstantools_2.6.0        Rcpp_1.1.1             
FALSE [85] gridExtra_2.3           nlme_3.1-168            xfun_0.56              
FALSE [88] fs_1.6.6                pkgconfig_2.0.3