library(cellmig)
library(ggplot2)
library(ggforce)
ggplot2::theme_set(new = theme_bw(base_size = 10))

1 Background

High-throughput cell tracking, in which time-lapse microscopy acquires images at fixed time intervals, facilitates the analysis of cell migration across many wells treated under different biological conditions. These workflows generate considerable technical noise and biological variability, so technical and biological replicates are necessary. The result is large, hierarchically structured datasets: cells are nested within technical replicates, which are nested within biological replicates.

Current statistical analyses of such data usually ignore the hierarchical structure of the data and fail to explicitly quantify uncertainty arising from technical or biological variability. To address this gap, we present cellmig, an R package implementing Bayesian hierarchical models for migration analysis. cellmig quantifies condition-specific velocity changes (e.g., drug effects) while modeling nested data structures and technical artifacts, providing uncertainty-aware estimates through credible intervals.

There are currently no Bioconductor packages providing specialized statistical methods for analyzing hierarchical high-throughput cell migration data. cellmig addresses this gap and will represent a valuable addition to the ecosystem.

2 Installation

To install this package, start R (version “4.5”) and enter:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("cellmig")

3 Data

Typical cell migration data come as a table.

Each row is a cell with the following features:

well = unique well ID (w1, w2, w3, etc.).
plate = unique plate ID (p1, p2, p3, etc.). Each plate is a biological replicate. A plate contains multiple wells, some of which are treated with the same compound and dose (technical replicates)
compound = compound name (c1, c2, c3, etc.)
dose = compound concentration (0, 1, 5, 10, low, mid, high, etc.)
v = Observed cell migration velocity (numeric)
offset = binary (0 or 1). Indicates whether a treatment should be used for batch correction across plates. By default offset = 0 (no correction). Set to 1 for specific treatment groups (compound x dose) used as offsets. Ensure that this treatment group appears on each plate.
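As a hedged illustration of the layout described above, the following sketch builds a small made-up table with the six columns and checks that the offset group is present on every plate. The object name toy, the simulated velocities, and the choice of compound c1 at dose 1 as the offset group are assumptions for illustration only; they are not part of the cellmig example data.

```r
set.seed(1)

# Toy table: 2 plates, 4 wells per plate, 3 cells per well (all values made up)
toy <- data.frame(
  well     = rep(paste0("w", 1:8), each = 3),
  plate    = rep(c("p1", "p2"), each = 12),
  compound = rep(rep(c("c1", "c2"), each = 6), times = 2),
  dose     = "1",
  v        = round(rexp(24, rate = 0.2), 2),
  stringsAsFactors = FALSE
)

# Mark the treatment group (compound x dose) used as offset
toy$offset <- as.numeric(toy$compound == "c1" & toy$dose == "1")

# The offset group should appear on each plate
offset_plates <- unique(toy$plate[toy$offset == 1])
stopifnot(setequal(offset_plates, unique(toy$plate)))

str(toy)
```

The final check fails with an error if some plate lacks the offset group, which is the situation the offset description above warns against.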

data("d", package = "cellmig")
str(d)

FALSE 'data.frame': 7560 obs. of  6 variables:
FALSE  $ well    : chr  "1" "1" "1" "1" ...
FALSE  $ plate   : chr  "1" "1" "1" "1" ...
FALSE  $ compound: chr  "C1" "C1" "C1" "C1" ...
FALSE  $ dose    : chr  "D1" "D1" "D1" "D1" ...
FALSE  $ v       : num  21.905 0.535 3.348 5.351 1.194 ...
FALSE  $ offset  : num  1 1 1 1 1 1 1 1 1 1 ...

In this vignette we will use simulated data from:

plates (\(p\)): 1, 2, 3
wells (\(w\)): 1, 2, …, 378
cells per well, each with a migration velocity v
wells are treated with compounds 1, 2, …, 6 at doses 1, 2, …, 7
each combination of a compound and a dose is a treatment group (\(t\)): 1, 2, …, 42
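To make the nesting of cells within wells within plates concrete, here is a minimal simulation sketch in base R. It is not the generator used for the vignette data: the design is deliberately smaller, the log-normal velocity distribution and the cell-level spread sigma_cell are assumptions, and the way alpha_p, sigma_bio and sigma_tech enter the simulation is one plausible reading of those parameter names rather than the cellmig model itself.

```r
set.seed(42)

n_plate <- 3; n_trt <- 4; n_well <- 2; n_cell <- 10

alpha_p    <- rnorm(n_plate, mean = 1.5, sd = 0.3)  # plate baselines (log scale)
delta_t    <- c(0, -0.5, 0.2, 0.6)                  # overall treatment effects, first = control
sigma_bio  <- 0.10   # assumed spread of plate-specific treatment effects
sigma_tech <- 0.05   # assumed spread between replicate wells
sigma_cell <- 0.80   # assumed spread between cells within a well

# Plate-specific treatment effects: rows = treatment, columns = plate
gamma_pt <- matrix(delta_t + rnorm(n_trt * n_plate, sd = sigma_bio), nrow = n_trt)

# One row per well
design <- expand.grid(replicate = seq_len(n_well),
                      trt = seq_len(n_trt),
                      plate = seq_len(n_plate))
design$well <- seq_len(nrow(design))
design$mu <- alpha_p[design$plate] +
  gamma_pt[cbind(design$trt, design$plate)] +
  rnorm(nrow(design), sd = sigma_tech)

# One row per cell
sim <- design[rep(seq_len(nrow(design)), each = n_cell), c("well", "plate", "trt")]
sim$v <- rlnorm(nrow(sim), meanlog = rep(design$mu, each = n_cell), sdlog = sigma_cell)

# Number of wells per plate and treatment group
wells_tab <- table(unique(sim[, c("well", "plate", "trt")])[, c("plate", "trt")])
wells_tab
```

Tabulating wells per plate and treatment group in this way is also a quick check of a real dataset before fitting.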

Let’s visualize the data. Each dot represents a cell with its velocity on the y-axis. Each facet corresponds to a compound (e.g., specific drug that may affect cellular velocity). The x-axis represents the dose. There are three plates, indicated by color. Four technical replicates (wells), analyzed on the same plate, are stacked next to each other and have the same color.

ggplot(data = d)+
  facet_wrap(facets = ~paste0("compound=", compound), 
             scales = "free_y", ncol = 2)+
  geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well), 
            size = 0.5)+
  theme_bw()+
  theme(legend.position = "top",
        strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
  ylab(label = "migration velocity")+
  xlab(label = '')+
  scale_color_grey()+
  guides(color = guide_legend(override.aes = list(size = 3)))+
  guides(shape = guide_legend(override.aes = list(size = 3)))+
  scale_y_log10()+
  annotation_logticks(base = 10, sides = "l")

3.1 Mean migration velocity per well

Alternatively, we can visualize the well-specific mean velocities to highlight plate-specific batch effects.

dm <- aggregate(v~well+plate+compound+dose, data = d, FUN = mean)
ggplot(data = dm)+
  facet_wrap(facets = ~paste0("compound=", compound), 
             scales = "free_y", ncol = 2)+
  geom_sina(aes(x = as.factor(dose), col = plate, y = v, group = well), 
            size = 1.5, alpha = 0.7)+
  theme_bw()+
  theme(legend.position = "top",
        strip.text.x = element_text(margin = margin(0.03,0,0.03,0, "cm")))+
  ylab(label = "migration velocity")+
  xlab(label = '')+
  scale_color_grey()+
  guides(color = guide_legend(override.aes = list(size = 3)))+
  guides(shape = guide_legend(override.aes = list(size = 3)))+
  scale_y_log10()+
  annotation_logticks(base = 10, sides = "l")

4 `cellmig` analysis

We will use these data to infer the overall treatment effects (parameter \(\delta_t\)) relative to a control treatment (the offset), which corrects for plate-specific batch effects. At the same time, cellmig quantifies further features of the data through its model parameters, e.g., the variability between technical or biological replicates and the plate-specific treatment effects (\(\gamma_{pt}\)).
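The idea of measuring effects relative to an offset can be illustrated with a naive, non-Bayesian calculation. The sketch below is not what cellmig does internally; it uses made-up well means (the data frame wm and the group names are hypothetical) in which plate p2 is uniformly twice as fast as plate p1, and shows that subtracting the control on the log scale within each plate removes this batch effect.

```r
# Hypothetical mean velocities per plate and treatment group
wm <- data.frame(
  plate = rep(c("p1", "p2"), each = 3),
  group = rep(c("control", "drugA", "drugB"), times = 2),
  v     = c(10, 5, 20,  20, 10, 40)
)

log_v   <- log(wm$v)
is_ctrl <- wm$group == "control"

# Plate-wise control level on the log scale
ctrl <- tapply(log_v[is_ctrl], wm$plate[is_ctrl], mean)

# Plate-specific log fold change relative to the control (loosely analogous to gamma_pt)
wm$lfc <- log_v - as.numeric(ctrl[wm$plate])

# Average across plates (loosely analogous to delta_t)
naive_delta <- tapply(wm$lfc, wm$group, mean)
exp(naive_delta)
```

Unlike this plug-in calculation, the hierarchical model also propagates the uncertainty from cells, wells and plates into the estimates.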

4.1 Model fitting

We fit the Stan model employed by cellmig with the control parameters defined in the list control. control accepts many other input parameters; see the cellmig function documentation.

o <- cellmig(x = d,
             control = list(mcmc_warmup = 300, # number of MCMC warmup steps
                            mcmc_steps = 1000, # number of MCMC iteration steps
                            mcmc_chains = 2,   # number of MCMC chains
                            mcmc_cores = 2))   # number of CPU cores used for MCMC

4.2 What are the overall treatment effects (\(\delta_t\)) on velocity?

To extract the means, medians, and 95% Highest Density Intervals (HDIs, quantifying parameter value uncertainty) of \(\delta_t\), we have to access the data.frame delta_t in the output object posteriors:

str(o$posteriors$delta_t)

FALSE 'data.frame': 35 obs. of  16 variables:
FALSE  $ group_id: int  1 2 3 4 5 6 7 8 9 10 ...
FALSE  $ mean    : num  -0.8662 -0.4546 -0.2061 0.0616 0.1164 ...
FALSE  $ se_mean : num  0.00162 0.00178 0.00187 0.00174 0.00191 ...
FALSE  $ sd      : num  0.069 0.0685 0.0683 0.0742 0.0709 ...
FALSE  $ X2.5.   : num  -1.0014 -0.5855 -0.3317 -0.0912 -0.0176 ...
FALSE  $ X25.    : num  -0.9115 -0.5019 -0.2531 0.011 0.0676 ...
FALSE  $ X50.    : num  -0.8661 -0.4552 -0.2082 0.0634 0.1159 ...
FALSE  $ X75.    : num  -0.82 -0.409 -0.16 0.111 0.163 ...
FALSE  $ X97.5.  : num  -0.7349 -0.3256 -0.0644 0.2059 0.2557 ...
FALSE  $ n_eff   : num  1821 1479 1338 1821 1370 ...
FALSE  $ Rhat    : num  1 1 1 1 1 ...
FALSE  $ group   : chr  "C2|D1" "C2|D2" "C2|D3" "C2|D4" ...
FALSE  $ compound: chr  "C2" "C2" "C2" "C2" ...
FALSE  $ dose    : chr  "D1" "D2" "D3" "D4" ...
FALSE  $ plate_id: num  1 1 1 1 1 1 1 1 1 1 ...
FALSE  $ plate   : chr  "1" "1" "1" "1" ...
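For readers unfamiliar with the term, a highest density interval is the shortest interval that contains a given share of the posterior mass. The sketch below computes one from raw draws in base R and contrasts it with the equal-tailed percentile interval. The helper hdi() is hypothetical and written for this illustration; how cellmig derives the interval columns shown above is not specified here.

```r
# Shortest interval containing `mass` of the draws (hypothetical helper)
hdi <- function(x, mass = 0.95) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(mass * n)                  # number of draws inside the interval
  widths <- x[k:n] - x[1:(n - k + 1)]     # width of every window of k sorted draws
  i <- which.min(widths)
  c(lower = x[i], upper = x[i + k - 1])
}

set.seed(3)
draws <- rexp(10000)                      # skewed stand-in for posterior draws

h <- hdi(draws)
q <- quantile(draws, probs = c(0.025, 0.975))
h
q
```

For a symmetric posterior the two intervals nearly coincide; for a skewed one, as in this example, the highest density interval is shorter and shifted towards the mode.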

These estimates are easier to read from a plot of the posterior means of \(\delta_t\) and their 95% HDIs:

Dot: Posterior mean of \(\delta_t\)
Error bar: 95% highest density interval (HDI) of \(\delta_t\)
\(\exp(\delta)\): Fold change in cell velocity relative to control
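As a worked example of the fold-change reading, we can exponentiate the estimates for group C2|D1, copied from the first row of the str() output above. This is a quick back-of-the-envelope conversion: the interval bounds transform exactly because exp() is monotone, whereas the exponentiated posterior mean is only an approximation of the posterior mean fold change.

```r
# Values copied from the first row (group "C2|D1") of o$posteriors$delta_t
delta_mean <- -0.8662
delta_lo   <- -1.0014   # column X2.5.
delta_hi   <- -0.7349   # column X97.5.

fold_change <- exp(c(mean = delta_mean, lower = delta_lo, upper = delta_hi))
round(fold_change, 2)
```

The result is a fold change of about 0.42 (interval roughly 0.37 to 0.48), i.e. cells in this group migrate at less than half the velocity of the control.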

Because compound C1 was selected as the control (by setting offset = 1), its treatment effects are not shown.

ggplot(data = o$posteriors$delta_t)+
  geom_line(aes(x = dose, y = mean, col = compound, group = compound))+
  geom_point(aes(x = dose, y = mean, col = compound))+
  geom_errorbar(aes(x = dose, y = mean, ymin = X2.5., ymax = X97.5., 
                    col = compound), width = 0.1)+
  ylab(label = expression("Overall treatment effect ("*delta*")"))+
  theme(legend.position = "top")

4.3 Compare the dose-response `profiles` for different compounds

For “rectangular datasets”, i.e., datasets with multiple compounds and overlapping doses, we can study the treatment dose-response profiles by hierarchical clustering based on the complete posteriors of \(\delta_t\), which accounts for the uncertainty in this parameter.

Panel A: dendrogram constructed by hierarchical clustering with average linkage, based on Euclidean distances between the vectors of \(\delta_t\) (shown in panel B) of each compound (leaf) across doses. Branch support values show branch robustness (label = 1000 implies that this branch was encountered in each of the 1000 dendrograms constructed from the posterior of \(\delta_t\)). Panel C: plate-specific treatment dose-responses based on parameters \(\gamma_{pt}\).

Dot in panel B/C: Posterior mean of \(\delta_t\) and \(\gamma_{pt}\)
Error bar: 95% highest density interval (HDI)
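The clustering idea described for panel A can be sketched in base R. The example below is a simplified stand-in, not the code behind get_dose_response_profile: the profile matrix mu, the noise level and the number of draws are made up, and branch support is approximated by counting how often one particular two-cluster split is recovered.

```r
set.seed(1)

doses <- paste0("D", 1:5)
# Hypothetical posterior means: C2/C3 share one profile, C4/C5 another
mu <- rbind(C2 = seq(-1.0,  0.0, length.out = 5),
            C3 = seq(-0.9,  0.1, length.out = 5),
            C4 = seq( 1.0,  0.0, length.out = 5),
            C5 = seq( 0.9, -0.1, length.out = 5))
colnames(mu) <- doses

n_draws <- 200
same_split <- logical(n_draws)
for (i in seq_len(n_draws)) {
  # One stand-in posterior draw of the compound x dose matrix
  draw <- mu + rnorm(length(mu), sd = 0.1)
  hc <- hclust(dist(draw, method = "euclidean"), method = "average")
  cl <- cutree(hc, k = 2)
  # Is the split {C2, C3} vs. {C4, C5} recovered in this draw?
  same_split[i] <- cl[["C2"]] == cl[["C3"]] &&
    cl[["C4"]] == cl[["C5"]] &&
    cl[["C2"]] != cl[["C4"]]
}

support <- sum(same_split)
support
```

A support close to n_draws means the split is robust to posterior uncertainty; with real output the draws would come from the fitted model rather than from rnorm().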

get_dose_response_profile(x = o)+
  patchwork::plot_layout(widths = c(.7, 1, 4))

4.4 Compare the effects between treatment groups

Pairwise dot-plot comparison \(\rightarrow\) x minus y axis

(Left panel) Differences in overall treatment effects. Log fold change (LFC; described by parameter \(\rho_{ij}\)) between the overall treatment effects (\(\delta_t\)) of row (\(i\)) vs. column (\(j\)) treatment groups. Tile colors and labels represent \(\rho_{ij}\). (Right panel) Probability of differential treatment effect described by parameter \(\pi_{ij}\). Tile colors and labels represent \(\pi_{ij}\).

x/y-axis treatment groups (combinations of compounds and doses)
\(\rho\): Difference between treatment groups at y-x axis.
\(\pi\): probability of observing either a completely positive or negative \(\rho\)
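To illustrate how such quantities can be derived from posterior draws, the sketch below computes a difference \(\rho\) and a probability \(\pi\) for two treatment groups. The draws are simulated stand-ins with made-up means and spreads, the direction of subtraction is arbitrary, and the formula for pi_ij is one possible reading of the definition above (the larger of the posterior probabilities that \(\rho\) is positive or negative); it is an assumption, not the documented cellmig computation.

```r
set.seed(7)

# Stand-in posterior draws of delta for two treatment groups i and j
delta_i <- rnorm(4000, mean = -0.45, sd = 0.07)
delta_j <- rnorm(4000, mean = -0.21, sd = 0.07)

# Draw-wise difference in treatment effects (log fold change)
rho <- delta_i - delta_j

# Assumed definition: probability that rho lies entirely on one side of zero
pi_ij <- max(mean(rho > 0), mean(rho < 0))

c(rho_mean = mean(rho), pi = pi_ij)
```

With real output, both groups' draws would be taken from the same MCMC iterations so that any posterior correlation between them is preserved.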

u <- get_pairs(x = o, exponentiate = FALSE)
u$plot

FALSE NULL

4.5 Violin plot based comparison

from_groups: vector of treatment groups to consider (combinations of compounds and doses)
to_group: target treatment group
violins show the posterior distributions of the differences (\(\rho\): each element from from_groups vs. to_group).
label: probability, \(\pi\), of observing completely positive or negative \(\rho\)

get_groups(x = o)

FALSE    group_id group compound dose
FALSE 1         1 C2|D1       C2   D1
FALSE 2         2 C2|D2       C2   D2
FALSE 3         3 C2|D3       C2   D3
FALSE 4         4 C2|D4       C2   D4
FALSE 5         5 C2|D5       C2   D5
FALSE 6         6 C2|D6       C2   D6
FALSE 7         7 C2|D7       C2   D7
FALSE 8         8 C3|D1       C3   D1
FALSE 9         9 C3|D2       C3   D2
FALSE 10       10 C3|D3       C3   D3
FALSE 11       11 C3|D4       C3   D4
FALSE 12       12 C3|D5       C3   D5
FALSE 13       13 C3|D6       C3   D6
FALSE 14       14 C3|D7       C3   D7
FALSE 15       15 C4|D1       C4   D1
FALSE 16       16 C4|D2       C4   D2
FALSE 17       17 C4|D3       C4   D3
FALSE 18       18 C4|D4       C4   D4
FALSE 19       19 C4|D5       C4   D5
FALSE 20       20 C4|D6       C4   D6
FALSE 21       21 C4|D7       C4   D7
FALSE 22       22 C5|D1       C5   D1
FALSE 23       23 C5|D2       C5   D2
FALSE 24       24 C5|D3       C5   D3
FALSE 25       25 C5|D4       C5   D4
FALSE 26       26 C5|D5       C5   D5
FALSE 27       27 C5|D6       C5   D6
FALSE 28       28 C5|D7       C5   D7
FALSE 29       29 C6|D1       C6   D1
FALSE 30       30 C6|D2       C6   D2
FALSE 31       31 C6|D3       C6   D3
FALSE 32       32 C6|D4       C6   D4
FALSE 33       33 C6|D5       C6   D5
FALSE 34       34 C6|D6       C6   D6
FALSE 35       35 C6|D7       C6   D7

u <- get_violins(x = o, 
                 from_groups = get_groups(x = o)$group,
                 to_group = "C2|D1",
                 exponentiate = FALSE)
u$plot

4.6 Posterior predictive checks (PPCs)

To assess model validity, we performed posterior predictive checks, which showed that the simulated data (pink violins) were consistent with the observed data (black violins). Each dot is a cell.

g <- get_ppc_violins(x = o, wrap = TRUE, ncol = 3)
g+scale_y_log10()

Using posterior predictive checks we compared the mean simulated velocity per well (y-axis) with the observed mean per well (x-axis). Each dot is a well.

g <- get_ppc_means(x = o)
g

4.7 Inspecting other model parameters

g_alpha_p <- ggplot(data = o$posteriors$alpha_p)+
  geom_errorbarh(aes(y = plate_id, x = mean, xmin = X2.5., xmax = X97.5.),
                 height = 0.1)+
  geom_point(aes(y = plate_id, x = mean))

g_sigma <- ggplot()+
  geom_errorbarh(data = o$posteriors$sigma_bio,
                 aes(y = "sigma_bio",
                     x = mean, xmin = X2.5., xmax = X97.5.),
                 height = 0.1)+
  geom_errorbarh(data = o$posteriors$sigma_tech,
                 aes(y = "sigma_tech",
                     x = mean, xmin = X2.5., xmax = X97.5.),
                 height = 0.1)+
  geom_point(data = o$posteriors$sigma_bio,
             aes(y = "sigma_bio", x = mean))+
  geom_point(data = o$posteriors$sigma_tech,
             aes(y = "sigma_tech", x = mean))+
  ylab(label = '')

g_alpha_p|g_sigma

5 Session Info

sessionInfo()

FALSE R Under development (unstable) (2025-10-20 r88955)
FALSE Platform: x86_64-pc-linux-gnu
FALSE Running under: Ubuntu 24.04.3 LTS
FALSE 
FALSE Matrix products: default
FALSE BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
FALSE LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
FALSE 
FALSE locale:
FALSE  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
FALSE  [3] LC_TIME=en_GB              LC_COLLATE=C              
FALSE  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
FALSE  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
FALSE  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
FALSE [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
FALSE 
FALSE time zone: America/New_York
FALSE tzcode source: system (glibc)
FALSE 
FALSE attached base packages:
FALSE [1] stats     graphics  grDevices utils     datasets  methods   base     
FALSE 
FALSE other attached packages:
FALSE [1] ggforce_0.5.0    ggplot2_4.0.0    cellmig_1.1.0    BiocStyle_2.39.0
FALSE 
FALSE loaded via a namespace (and not attached):
FALSE  [1] ggiraph_0.9.2           tidyselect_1.2.1        dplyr_1.1.4            
FALSE  [4] farver_2.1.2            loo_2.8.0               S7_0.2.0               
FALSE  [7] fastmap_1.2.0           lazyeval_0.2.2          tweenr_2.0.3           
FALSE [10] fontquiver_0.2.1        digest_0.6.37           lifecycle_1.0.4        
FALSE [13] StanHeaders_2.32.10     tidytree_0.4.6          magrittr_2.0.4         
FALSE [16] compiler_4.6.0          rlang_1.1.6             sass_0.4.10            
FALSE [19] tools_4.6.0             yaml_2.3.10             knitr_1.50             
FALSE [22] labeling_0.4.3          htmlwidgets_1.6.4       pkgbuild_1.4.8         
FALSE [25] curl_7.0.0              plyr_1.8.9              RColorBrewer_1.1-3     
FALSE [28] aplot_0.2.9             withr_3.0.2             purrr_1.1.0            
FALSE [31] grid_4.6.0              polyclip_1.10-7         stats4_4.6.0           
FALSE [34] gdtools_0.4.4           inline_0.3.21           scales_1.4.0           
FALSE [37] MASS_7.3-65             dichromat_2.0-0.1       tinytex_0.57           
FALSE [40] cli_3.6.5               rmarkdown_2.30          treeio_1.35.0          
FALSE [43] generics_0.1.4          RcppParallel_5.1.11-1   ggtree_4.1.1           
FALSE [46] reshape2_1.4.4          ape_5.8-1               cachem_1.1.0           
FALSE [49] rstan_2.32.7            stringr_1.5.2           parallel_4.6.0         
FALSE [52] ggplotify_0.1.3         BiocManager_1.30.26     matrixStats_1.5.0      
FALSE [55] yulab.utils_0.2.1       vctrs_0.6.5             V8_8.0.1               
FALSE [58] jsonlite_2.0.0          fontBitstreamVera_0.1.1 bookdown_0.45          
FALSE [61] gridGraphics_0.5-1      patchwork_1.3.2         systemfonts_1.3.1      
FALSE [64] magick_2.9.0            tidyr_1.3.1             jquerylib_0.1.4        
FALSE [67] glue_1.8.0              codetools_0.2-20        stringi_1.8.7          
FALSE [70] gtable_0.3.6            QuickJSR_1.8.1          tibble_3.3.0           
FALSE [73] pillar_1.11.1           rappdirs_0.3.3          htmltools_0.5.8.1      
FALSE [76] R6_2.6.1                evaluate_1.0.5          lattice_0.22-7         
FALSE [79] ggfun_0.2.0             fontLiberation_0.1.0    bslib_0.9.0            
FALSE [82] rstantools_2.5.0        Rcpp_1.1.0              gridExtra_2.3          
FALSE [85] nlme_3.1-168            xfun_0.54               fs_1.6.6               
FALSE [88] pkgconfig_2.0.3

cellmig: quantifying cell migration with hierarchical Bayesian models

30 October 2025

Contents

1 Background

2 Installation

3 Data

3.1 Mean migration velocity per well

4 `cellmig` analysis

4.1 Model fitting

4.2 What are the overall treatment effects (\(\delta_t\)) on velocity?

4.3 Compare the dose-response `profiles` for different compounds

4.4 Compare the effects between treatment groups

4.5 Violin plot based comparison

4.6 Posterior predictive checks (PPCs)

4.7 Inspecting other model parameters

5 Session Info
