--- title: "miaSim: Microbiome Data Simulation" author: - name: Daniel Rios Garza email: danielrios.garza@kuleuven.be - name: Emma Gheysen email: emma.gheysen@student.kuleuven.be - name: Karoline Faust email: karoline.faust@kuleuven.be - name: Leo Lahti email: leo.lahti@iki.fi - name: Yagmur Simsek email: yagmur.simsek@hsrw.org - name: Yu Gao email: gaoyu19920914@gmail.com date: "`r Sys.Date()`" package: miaSim output: BiocStyle::html_document: fig_height: 7 fig_width: 10 toc: yes toc_depth: 2 number_sections: true vignette: > %\VignetteIndexEntry{miaSim} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} \usepackage[utf8]{inputenc} --- ```{r, echo=FALSE} knitr::opts_chunk$set(cache = FALSE, fig.width = 9, message = FALSE, warning = FALSE) ``` # Introduction `miaSim` implements tools for microbiome data simulation based on the `microsim` . Microbiome time series simulation can be obtained by generalized Lotka-Volterra model,`simulateGLV`, and Self-Organized Instability (SOI), `simulateSOI`. Hubbell's Neutral model, `simulateHubbell` is used to determine the species abundance matrix. The resulting abundance matrix from these three simulation models can be implemented to `SummarizedExperiment` object or `TreeSummarizedExperiment` object by using `convertToSE`. `powerlawA` and `randomA` give interaction matrix of species generated by normal distribution and uniform distribution, respectively. These matrices can be used in the simulation model examples. `simulationTimes` generates lists of time series that can be specified as simulation time and time points to keep in simulated time. # Installation ### Bioc-release ```{r install-bioc,eval=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") ``` # Models for simulating microbiome data sets `simulateGLV` is the generalized Lotka-Volterra simulation model fitted to time-series estimates microbial population dynamics and relative rates of interaction. The model relies on interaction matrix that represents interaction heterogeneity between species. This interaction matrix can be generated with `powerlawA` or `randomA` functions depending on the distribution method. The function `powerlawA` uses normal distribution to create interaction matrix. ```{r eval=FALSE, include=FALSE} library(miaSim) A_normal <- powerlawA(n_species = 4, alpha = 3) ``` The function `randomA` uses uniform distribution to create interaction matrix. ```{r eval=FALSE, include=FALSE} A_uniform <- randomA(n_species = 10, d = -0.4, min_strength = -0.8, max_strength = 0.8, connectance = 0.5) ``` The number of species specified in the interaction matrix must be the same amount as the species used in the `simulateGLV` and `simulateSOI` models. `simulateGLV` and `simulateRicker` functions both use the generalized Lotka-Volterra model. ```{r eval=FALSE, include=FALSE} A_normal <- powerlawA(n_species = 4, alpha = 3) ExampleGLV <- simulateGLV(n_species = 4, A_normal, t_start = 0, t_store = 1000, stochastic = FALSE, norm = FALSE) ExampleRicker <- simulateRicker(n_species=4, A_normal, tend=100, norm = FALSE) ``` Time series is added to `simulateGLV` with `simulationTimes` function where the time points can be kept and extracted from simulation time as a separate list. ```{r eval=FALSE, include=FALSE} Time <- simulationTimes(t_start = 0, t_end = 100, t_step = 0.5, t_store = 100) Time$t.index ``` `simulateHubbell` includes the Hubbell Neutral simulation model which explains the diversity and relative abundance of species in ecological communities. This model is based on the community dynamics; migration, births and deaths. Loss in society is either replaced by migration or birth. Similarly, `simulateHubbellRates` also uses the Hubbell Neutral model.Migration is determined by the metacommunity and replacement by birth is measure by growth rates. The time between events is the determinant of replacement. ```{r eval=FALSE, include=FALSE} ExampleHubbell <- simulateHubbell(n_species = 8, M = 10, I = 1000, d = 50, m = 0.02, tend = 100) ExampleHubbellRates <- simulateHubbellRates(community_initial = c(0,5,10), migration_p = 0.1, metacommunity_p = NULL, k_events = 1, growth_rates = NULL, norm = FALSE, t_end=1000) ``` The Self-Organised Instability (SOI) model can be found in `simulateSOI` and it generates time series for communities and accelerates stochastic simulation. ```{r eval=FALSE, include=FALSE} ExampleSOI <- simulateSOI(n_species = 4, I = 1000, A_normal, k=5, com = NULL, tend = 150, norm = TRUE) ``` Stochastic logistic model is used in `simulateStochasticLogistic` to determine dead and alive counts in community. ```{r eval=FALSE, include=FALSE} ExampleLogistic <- simulateStochasticLogistic(n_species = 5) ``` The consumer resource community simulation model `simulateConsumerResource` requires the use of the `randomE` function, which returns a matrix containing the production rates and consumption rates of each species. The resulting matrix is used as a determination of resource consumption efficiency. ```{r eval=FALSE, include=FALSE} ExampleConsumerResource <- simulateConsumerResource(n_species = 2, n_resources = 4, eff = randomE(n_species = 2, n_resources = 4)) # visualize the dynamics of the model Consumer_plot <- matplot(ExampleConsumerResource, type = "l") ``` The community simulations result in abundance matrix that can be stored in `SummarizedExperiment` [@SE] class object. Other fields, such as rowData containing information about the samples, and colData, consisting of sample metadata describing the samples, can be added to the `SummarizedExperiment` class object. This can be done by conversion function `convertToSE`. ```{r eval=FALSE, include=FALSE} ExampleHubbellRates <- simulateHubbellRates(community_initial = c(0,5,10), migration_p = 0.1, metacommunity_p = NULL, k_events = 1, growth_rates = NULL, norm = FALSE, t_end=1000) HubbellSE <- convertToSE(matrix = ExampleHubbellRates$counts, colData = ExampleHubbellRates$time, metadata = ExampleHubbellRates$metadata) ``` Also, the `TreeSummarizedExperiment` class object can be created using the required hierarchical information in addition to the `SummarizedExperiment` class object. It can also be reached through `convertToSE`. For further details: ```{r eval=FALSE, include=FALSE} library(TreeSummarizedExperiment) help("TreeSummarizedExperiment-constructor", package = TreeSummarizedExperiment) ``` After converting to `SummarizedExperiment` or `TreeSummarizedExperiment` the models can be used with: various visualization functions under `miaViz`, many analysis tools that are available under `mia` and time series analysis tools under `miaTime`. `miaViz` installation ```{r eval=FALSE, include=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("miaViz") ``` `mia` installation ```{r eval=FALSE, include=FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("mia") ``` `HubbellSE` population density can be plotted with `plotAbundanceDensity` function from `miaViz`. ```{r eval=FALSE, include=FALSE} library(miaViz) HubbellDensityPlot <- plotAbundanceDensity(HubbellSE, abund_values = "counts") ``` ```{r eval=FALSE, include=FALSE} A_normal <- powerlawA(n_species = 4, alpha = 3) ExampleGLV <- simulateGLV(n_species = 4, A_normal, t_start = 0, t_store = 1000, stochastic = FALSE, norm = FALSE) rownames(ExampleGLV) <- c(paste("Species", rownames(ExampleGLV), sep = "_")) colnames(ExampleGLV) <- c(paste("Sample", seq_len(ncol(ExampleGLV)), sep = "_")) df <- DataFrame(sampleID = colnames(ExampleGLV), Time = seq(1, 1000, 1), SubjectID = rep(1:4, 250), row.names = colnames(ExampleGLV)) SE_GLV <- convertToSE(matrix = ExampleGLV, colData = df) ``` More examples on `SummarizedExperiment` object manipulation and analysis can be found at https://microbiome.github.io/OMA/ . # Session info ```{r} sessionInfo() ```