---
title: "multiSight quick start guide"
author:
- name: Florian Jeanneret
affiliation: Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France.
date: Sys.Date()
- name: Stéphane Gazut
affiliation: Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France.
date: Sys.Date()
output:
BiocStyle::html_document:
toc_float: true
BiocStyle::pdf_document: default
vignette: |
%\VignetteIndexEntry{multiSight quick start guide}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
# multiSight
The purpose of this vignette is to help you become productive as quickly as
possible with the **multiSight** package.
## Version Info
**R version**: `r R.version.string`
**Bioconductor version**: `r BiocManager::version()`
**Package version**: `r packageVersion("multiSight")`
# Installation
You can install the released version of **multiSight** from
[Bioconductor](https://www.bioconductor.org/) with:
```{r, eval = FALSE}
#To install this package ensure you have BiocManager installed
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
#The following initializes usage of Bioc devel
BiocManager::install("multiSight")
```
# **multiSight** purpose
**multiSight** is an R package providing a graphical interface to
analyze and explore your omic datasets (e.g. RNAseq, RPPA) by both
__single-omics__ and __multi-omics__ approaches.
_Single-omics_ means Differential Expression Analysis (DEA) is led by DESeq2
towards Over Representation Analysis (ORA). Features with DEA p-values under
the 0.05 threshold, or user's value set in the *Biological Insight* tab, are
selected. These features are saved to carry out ORA on several databases
(see *Biological insights* tab section).
_Multi-omics_ is defined by substituting the DEA part by sPLS-DA method.
This is a multi-block selection features method taking several omic blocks
into account to select features (see *sPLS-DA* section). Biological processes
(*BPs*) from a database (e.g. Reactome) are then enriched with these selected
features by ORA to reveal putative altered biological processes.
Enrichment results obtained by ORAs are tables with *BP* IDs,
descriptions and p-values. An enrichment table is provided for each omic data
block and for each database.
**multiSight** proposes easing enrichment results interpretation from
several omic datasets. For each *BP* shared between enrichment results
from a given database, the relative p-values are merged.
Stouffer's pooling method is used to calculate probabilities fusion
(see Multi-omics table section).
For instance, if "*signaling by ERBB2*" is highlighted by ORAs of RNAseq and
RPPA features with p-values of 0.06 and 0.10, Stouffer's method
calculates a probability of 0.0224 for this pathway.
Thereby, multi-omics
information is harnessed to make decision of significance.
Stouffer's p-values for *BP* beget a multi-omics table where
*BP* IDs, description and Stouffer's p-values are provided.
This *multi-omics* enrichment table is used to draw an *enrichment map*.
*Enrichment map* is a network based on *BPs* information.
*BPs* are nodes while the bonds are built between
*BPs* sharing some features (see *Multi-omics enrichment map* section).
In addition, classification models are fitted to select few subsets of
features, using **biosigner** or **sPLS-DA methods**.
*biosigner* provides one model by omic block and one list of features named
*biosignature*.
Nevertheless, sPLS-DA *biosignatures* are based on more features than biosigner.
**Biosignatures** can be used:
- To forecast phenotypes (e.g. to diagnostic tasks, histological subtyping);
- To design _**Pathways**_ and _**gene ontology**_ **enrichments**
(sPLS-DA biosignatures only);
- To build _**Network inference**_;
- To find _**PubMed**_ references to make assumptions easier and data-driven.
Moreover, numerical relationships between features selected by *sPLS-DA* or
*biosigner* can feed network inferences (for instance by correlation
or mutual information, see *Assumption tab* section).
_**multiSight** package provides a graphical interface and relative functions_
_to use in a script._
## App structure
**multiSight** allows to get better biological insights for each omic dataset
based on __four analytic modules__:
- __Data input__ & __results__;
- __Classification__ models building;
- __Biological databases__ querying;
- __Network Inference__ & __PubMed__ querying.
*All figures and tables can be retrieved in an automatic report.*
## Which omic data?
All types of omic data respecting input format are supported to build
**classification models**, **biosignatures** and
**network inference**.
- Genomics;
- Transcriptomics;
- Proteomics;
- Metabolomics;
- Lipidomics;
*This package supports all numerical matrices.*
_You can take a look on data format supported in **Home** tab details._
# Home tab
This is the first tab of **multiSight**.
In this tab, you can load your biological data and find all results in a
generated report (.html or .doc formats).
## Data format
You should provide two types of data: **numerical matrices** and
**classes vector** as **csv tables** for the **same samples**.
| Omic data 1 | | | | |
|-------------|--------|--------|--------|-----|
| | SIGIRR | SIGIRR | MANSC1 | |
| AOFJ | 0 | 150 | 1004 | ... |
| A13E | 34 | 0 | 0 | |
| | | ... | | |
| Omic data 2 | | | | |
|-------------|-----------------|-----------------|-----------------|-----|
| | ENSG00000139618 | ENSG00000226023 | ENSG00000198695 | |
| AOFJ | 25 | 42 | 423 | ... |
| A13E | 0 | 154 | 4900 | |
| | | ... | | |
| Omic classes | |
|--------------|-------|
| | Y |
| AOFJ | condA |
| A13E | condB |
| | ... |
## Organism
**multiSight** downloads organism database automatically according to your
choice in *Home tab*. **19 organism databases** are available (see *Home tab*).
Please note that *multiSight* can be used without enrichment analysis, and
thus, without a organism database matching with your data.
## Generated report
Report of results can be generated in *Home tab* below *datasets input part*
inside the **Analysis Results** block (.html or .doc formats).
# Classification tab
*This tab background is launched when you start analysis in Home tab. Results*
*are displayed after biosigner and sPLS-DA models fitting*.
This tab presents **classification models** to **predict sample classes**
based upon only a **small subset of features** selected by **multiSight**
models.
## Model methods
Two types of models have been implemented so far to answer different
questions: from **biosigner** & **mixOmics (sPLS-DA)** R packages.
- To determine *small biosignatures* - biosigner.
- To build *classification models* by a *multi-block* approach - sPLS-DA
- To select relevant biological *features* to *enrich* - sPLS-DA
### biosigner
[**biosigner**](bioconductor.org/packages/release/bioc/html/biosigner.html)
is an R package available in *Bioconductor* project.
_In a single-omics_ approach, **biosigner** computes **SVM**
and **Random Forest** models and selects features for all omic
datasets **one by one**.
Thereby, a biosignature is defined for each omic as the *union of SVM and RF*
variable lists.
*If there are 3 omic datasets, __biosigner__ gives 6 models and 3 feature lists
(about 5-10 features by data type).*
### Example
```{r model_biosigner, echo=TRUE, message=FALSE, warning=FALSE}
require(multiSight)
## omic2 is a multi-omics dataset of 2 simulated omics included in package
data("omic2", package = "multiSight")
splitData <- splitDatatoTrainTest(omic2, freq = 0.8)
data.train <- splitData$data.train
data.test <- splitData$data.test
## Build model and one biosignature by omic dataset.
biosignerRes <- runSVMRFmodels_Biosigner(data.train)
## Results
biosignerModels <- biosignerRes$model #list of SVM/RF models for each omic.
biosignerFeats <- biosignerRes$biosignature #selected features for each omic.
## Assess model classification performances
biosignerPerf <- assessPerformance_Biosigner(modelList = biosignerModels,
dataTest = data.test)
print(biosignerPerf) #confusion matrices and performance metrics
```
### sPLS-DA (DIABLO)
By a _multi-omics_ approach,
[**sPLS-DA (DIABLO)**](doi.org/10.1093/bioinformatics/bty1054)
method selects relevant features to explain biological outcome of interest.
This is implemented into the
[**mixOmics**](bioconductor.org/packages/release/bioc/html/mixOmics.html)
R package available in *Bioconductor* project.
*sPLS-DA method* builds a new space for each omic dataset by linear
combinations of initial features giving several components.
In fact, around 20-40 features, for each omic, are selected according to
contributions to components. Then, these features can be used to
enrich some biological processes from a biological database
(e.g. KEGG, Reactome) by ORA.
Classification **performances** are displayed in *classification tab* for
each model and **selected features** by omic data type (e.g. if you have
provided 3 omic types, you could observe 3 feature lists).
### Example
```{r model_DIABLO, message=FALSE, warning=FALSE}
require(multiSight)
## omic2 is a multi-omics dataset of 2 simulated omics included in package
data("omic2", package = "multiSight")
data("diabloRes", package = "multiSight")
splitData <- splitDatatoTrainTest(omic2, freq = 0.9)
data.train <- splitData$data.train
data.test <- splitData$data.test
## Build model and one biosignature by omic dataset.
# diabloRes <- runSPLSDA(data.train)
# diabloRes #internal object of package to save time
## Results
diabloModels <- diabloRes$model #sPLS-DA model using all omics.
diabloFeats <- diabloRes$biosignature #selected features for each omic.
## Asses model classification performances
diabloPerf <- assessPerformance_Diablo(splsdaModel = diabloModels,
dataTest = data.test)
print(diabloPerf) #confusion matrices and performance metrics
```
```{r model_DIABLO, message=FALSE, warning=FALSE}
```
# Biological insights tab
*This tab is ready to use when Classification tab features selection is over.*
**Biological Insights** tab is dedicated to give biological sense to your data.
You can carry out enrichment by ORA with features selected according to
*DESeq2* feature probabilities under the 0.05 threshold.
Or, you can compute biological
enrichment with features selected by *sPLS-DA*.
## Biological Annotation Databases
**Several databases** are implemented in **multiSight** package to
provide a large panel of **enrichment analysis**:
There are **Pathways** and **Gene Ontology** databases helped by
**clusterProfiler** and **reactomePA** R Bioconductor packages:
- Kegg;
- Reactome;
- wikiPathways;
- Molecular Function (GO)
- Cellular Component (GO)
- Biological Process (GO)
See table below to check available biological information for **pathways** from
these databases, according to organism annotations.
**multiSight** supports **19 organism databases**
(required only for enrichment analysis):
| orgDb | kegg | reactome | wikipathways |
|-------------------|------|-----------|--------------------------|
| org.Hs.eg.db | hsa | human | Homo sapiens |
| org.Mm.eg.db | mmu | mouse | Mus musculus |
| org.Rn.eg.db | rno | rat | Rattus norvegicus |
| org.Sc.sgd.db | sce | yeast | Saccharomyces cerevisiae |
| org.Dm.eg.db | dme | fly | Drosophila melanogaster |
| org.Dr.eg.db | dre | zebrafish | Danio rerio |
| org.Ce.eg.db | cel | celegans | Caenorhabditis elegans |
| org.At.tair.db | ath | x | Arabidopsis thaliana |
| org.Bt.eg.db | bta | x | Bos taurus |
| org.Gg.eg.db | gga | x | Gallus gallus |
| org.Cf.eg.db | cfa | x | Canis familiaris |
| org.Ss.eg.db | ssc | x | Sus scrofa |
| org.EcK12.eg.db | eck | x | Escherichia coli |
| org.Pt.eg.db | ptr | x | Pan troglodytes |
| org.Ag.eg.db | aga | x | Anopheles gambiae |
| org.Pf.plasmo.db | pfa | x | Plasmodium falciparum |
| org.EcSakai.eg.db | ecs | x | Escherichia coli |
| org.Mmu.eg.db | mcc | x | x |
| org.Xl.eg.db | xla | x | x |
**Note for enrichment**: only convertible feature names to genes could be
enriched according to information in organism's database (e.g. *Gene SYMBOL*
*to entrez ids*).
To know _what feature name types are supported_, copy these two lines of
codes with your organism database of interest.
For instance, org.Mm.eg.db for Human provides 24 input names you can use as
feature names in your datasets.
```{r message=FALSE}
if (requireNamespace("org.Mm.eg.db", quietly = TRUE))
{
library(org.Mm.eg.db)
columns(org.Mm.eg.db)
}
```
Please note that feature IDs can differ between omic blocks (e.g. one dataset
with gene SYMBOL, another one with ENSEMBL IDs). You can provide this
information to convert them for enrichment in the application.
## **Single-omic** DESeq2 differential expression analysis
Therefore, in this *Biological insights tab* you can compute
**Differential Expression Analysis (DEA) by DESeq2** for several omic datasets
Moreover, with features selected according to
*p-value threshold set by the user* you can enrich pathway or Gene Ontology
databases.
### Example
```{r multiOmicEnrichment_DESeq2, message=FALSE, warning=FALSE}
require(multiSight)
## omic2 is a multi-omics dataset of 2 simulated omics included in package
data("omic2", package = "multiSight")
data("deseqRes", package = "multiSight")
# deseqRes <- runMultiDeseqAnalysis(omicDataList = omic2,
# padjUser = 0.05)
## One Differential Expression Analysis table for each omic dataset
# View(deseqRes$DEtable)
## One feature selected list for each omic according to padjust user threshold
multiOmic_biosignature <- deseqRes$selectedFeatures
# View(multiOmic_biosignature)
## Multi-omics enrichment
### convert features
dbList <- list(rnaRead = "ENSEMBL",
dnaRead = "ENSEMBL")
convFeat <- convertToEntrezid(multiOmic_biosignature, dbList, "org.Mm.eg.db")
### ORA enrichment analysis
if (requireNamespace("org.Mm.eg.db", quietly = TRUE))
{
# database <- c("kegg", "wikiPathways", "reactome", "MF", "CC", "BP")
database <- c("reactome")
data("enrichResList", package = "multiSight")
# enrichResList <- runMultiEnrichment(omicSignature = convFeat,
# databasesChosen = database,
# organismDb = "org.Mm.eg.db",
# pvAdjust = "BH", #default value
# minGSSize = 5, #default value
# maxGSSize = 800, #default value
# pvStouffer = 0.1) #default value
reacRes <- enrichResList$pathways$reactome
names(reacRes$result) # classical enrichment tables, multi-omics and EMap
}
```
## **Multi-omics** sPLS-DA selected features
This tab gives a graphical interface where all **multi-omics features**
selected by sPLS-DA method are used to enrich chosen databases.
You could retrieve **functional enrichment results** in Home tab's report or
save tables in this tab.
```{r multiOmicEnrichment_DIABLO, message=FALSE, warning=FALSE}
require(multiSight)
## omic2 is a multi-omics dataset of 2 simulated omics included in package
data("omic2", package = "multiSight")
data("diabloRes", package = "multiSight")
# splitData <- splitDatatoTrainTest(omic2, 0.8)
# data.train <- splitData$data.train
# data.test <- splitData$data.test
#
# diabloRes <- runSPLSDA(data.train)
# diabloRes #internal object of package to save time
diabloModels <- diabloRes$model #sPLS-DA model using all omics.
diabloFeats <- diabloRes$biosignature #selected features for each omic.
## Multi-omics enrichment
### convert features
names(diabloFeats) #/!\use same names for dbList and omic datasets.
dbList <- list(rnaRead = "ENSEMBL", #feature names origin
dnaRead = "ENSEMBL")
if (requireNamespace("org.Mm.eg.db", quietly = TRUE))
{
convFeat <- convertToEntrezid(diabloFeats,
dbList,
"org.Mm.eg.db")
### ORA enrichment analysis for omic feature lists
# database <- c("kegg", "wikiPathways", "reactome", "MF", "CC", "BP")
database <- c("reactome")
multiOmicRes <- runMultiEnrichment(omicSignature = convFeat,
databasesChosen = database,
organismDb = "org.Mm.eg.db",
pvAdjust = "BH", #default value
minGSSize = 5, #default value
maxGSSize = 800, #default value
pvStouffer = 0.1) #default value
## Results
reacRes <- multiOmicRes$pathways$reactome
names(reacRes$result) # classical enrichment tables, multi-omics and EMap
}
```
## Visualizations
Two types of result visualization are given:
- Classical **enrichment tables** for each omic and each database (with
pathways id, p-value, padjust column).
- Then, when more than one omic is used for enrichment,
a *multi-omics table* and a *multi-omics enrichment map*
with **DEA** and *sPLS-DA selected features* are available.
### Multi-omics Table
A **multi-omics table** is built by annotation database (e.g. Reactome) with
all enrichment results obtained for omic datasets using Stouffer's method.
[*Stouffer's p-value method*](https://doi.org/10.1371/journal.pone.0089297)
consists in *p-values pooling* for same pathways or
ontology.
In fact, a Stouffer's value is computed for every biological
annotations shared by at least 2 enrichment analysis results (NOTE: each omic
data used for enrichment has one usual enrichment table by selected database).
Thereby, you can **summarize information for several datasets** and
enrichment results with a multi-omics table, for instance:
| ID | Description | p-value:Omic1 | p-value:Omic2 | Stouffer | StoufferWeighted | geneID | GeneRatio | Count |
|---------------|---------------------------------------------------------------------------|---------------|---------------|----------|------------------|------------------------------------|-----------|-------|
| R-MMU-2173791 | TGF-beta receptor signaling in EMT (epithelial to mesenchymal transition) | 0.043 | 0.047 | 0.008 | 0.008 | 102098/11848/16456/18762/21803/... | 2/60 | 2 |
| R-MMU-194315 | Signaling by Rho GTPases | 0.096 | 0.029 | 0.011 | 0.011 | 100043813/101497/102098/102920/... | 1/60 | 1 |
| R-MMU-73887 | Death Receptor Signalling | 0.056 | 0.06 | 0.013 | 0.013 | 101497/102098/106025/109934/11491/ | 2/60 | 2 |
| R-MMU-8953854 | Metabolism of RNA | 0.533 | 0.002 | 0.024 | 0.021 | 100043813/100044627/100502825/... | 1/60 | 1 |
Each value is similar to an *enrichRes* object obtained by *clusterProfiler*.
In fact, Stouffer's results are transformed as *enrichRes** objects and could be
used for usual *enrichRes* analysis and visualization functions.
Note in our *cytoscape-like case* that *geneID* column refers to all genes in
pathways or ontology named in database annotations. In usual *enrichRes*, this
column presents genes enriched contributing in pathways.
A graphical network is built according to this table as an **Enrichment Map**.
### Multi-omics enrichment map
**Enrichment map** is a graph drawn to observe pathway or ontology
relationships according to overlapping elements in each parts.
Each pathway is a node and a bond is built between them if there have
common genes (see *Jaccard similarity coefficient - JC's similarity* method).
Here, enrichment map functions from *enrichPlot* package are adapted to draw
similar map to **cytoscape** add-on
[*EnrichmentMap*](doi.org/10.1371/journal.pone.0013984):
all genes in each pathway are used to compute similarity between pathway
building network bonds.
![Enrichment Map from Stouffer p-values pooling method](EnrichmentMapForInstance.png)
*Each multi-omics enrichment map (one by enrichment database) could be*
*retrieve in a report automatically generated and in the Home tab to be saved.*
# Assumption tab
**Assumption tab** aims to help biological hypothesis making by *network
inference* from feature relationship values (e.g correlation,
partial correlation) and by a *PubMed module*.
You can find both functions:
- To compute _*network inference*_ and to reveal feature relationships.
- To get _*PubMed articles*_ based on your personalized query without leaving
app.
```{r networkInference, message=FALSE, warning=FALSE}
require(multiSight)
data("omic2", package = "multiSight")
data("diabloRes", package = "multiSight")
## omic2 is a multi-omics dataset of 2 simulated omics included in package
splitData <- splitDatatoTrainTest(omic2, 0.8)
data.train <- splitData$data.train
data.test <- splitData$data.test
## Build sPLS-DA models
# diabloRes <- runSPLSDA(data.train)
diabloFeats <- diabloRes$biosignature #selected features for each omic.
## Build biosigner models
biosignerRes <- runSVMRFmodels_Biosigner(data.train)
biosignerFeats <- biosignerRes$biosignature #selected features for each omic.
## Network inference
### DIABLO features
concatMat_diablo <- getDataSelectedFeatures(omic2, diabloFeats)
corrRes_diablo <- correlationNetworkInference(concatMat_diablo, 0.85)
pcorRes_diablo <- partialCorrelationNI(concatMat_diablo, 0.4)
miRes_diablo <- mutualInformationNI(concatMat_diablo, 0.2)
### biosigner features
concatMat_biosigner <- getDataSelectedFeatures(omic2, biosignerFeats)
corrRes_bios <- correlationNetworkInference(concatMat_biosigner, 0.85)
pcorRes_bios <- partialCorrelationNI(concatMat_biosigner, 0.4)
miRes_bios <- mutualInformationNI(concatMat_biosigner, 0.2)
corrRes_diablo$graph
```
# Session info {.unnumbered}
Here is the output of `sessionInfo()` on the system on which this document was
compiled running pandoc `r rmarkdown::pandoc_version()`:
```{r sessionInfo, echo=FALSE}
sessionInfo()
```