scToppR is a package that allows seamless, workflow-based interaction with ToppGene, a portal for gene enrichment analysis. Researchers can use scToppR to directly query ToppGene’s databases and conduct analysis with a few lines of code. The use of data from ToppGene is governed by their Terms of Use: https://toppgene.cchmc.org/navigation/termsofuse.jsp
This vignette shows the use of scToppR within a differential expression workflow. Using the ‘airway’ dataset, we’ll perform a quick differential expression analysis using DESeq2. With the list of differentially expressed genes, we can easily use scToppR.
library(scToppR)
#> NOTE: scToppR provides data via ToppGene. Any use of this data must adhere to
#> ToppGene's Terms of Use. Please visit https://toppgene.cchmc.org/navigation/termsofuse.jsp
#> for more information.
suppressMessages({
  library(airway)
  library(DESeq2)
})
data("airway")
se <- airway
rownames(se) <- rowData(se)$gene_name
dds <- DESeqDataSet(se, design = ~ cell + dex)
#> Warning in DESeqDataSet(se, design = ~cell + dex): 7039 duplicate rownames were
#> renamed by adding numbers
smallestGroupSize <- 3
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]
dds <- DESeq(dds)
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
res <- results(dds)
# Add the gene names as a column in the results
res$gene <- rownames(res)
# Add a cluster column; with this bulk RNA-seq data there is only one cluster
res$cluster <- "cluster0"
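Before sending genes to ToppGene, it can help to check how many would pass a significance filter. DESeq2 reports `padj` as `NA` for genes excluded by its independent filtering, so those rows need explicit handling. The following is a minimal sketch on a made-up data frame with the same column names as `res`; the gene names, values, and both cutoffs are assumptions for illustration and are not necessarily the defaults used by toppFun().

```r
# Toy stand-in for the DESeq2 results table; all values are made up.
toy_res <- data.frame(
  gene = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  log2FoldChange = c(2.1, -1.4, 0.1, 3.0, -0.2),
  padj = c(0.001, 0.03, 0.20, NA, 0.04),
  cluster = "cluster0",
  stringsAsFactors = FALSE
)

# Illustrative cutoffs (assumptions, not necessarily toppFun() defaults).
padj_cutoff <- 0.05
lfc_cutoff <- 0.3

# which() drops rows where the comparison is NA, so genes without an
# adjusted p-value are excluded rather than propagated as NA rows.
keep_rows <- which(toy_res$padj < padj_cutoff &
                     abs(toy_res$log2FoldChange) > lfc_cutoff)
sig_genes <- toy_res[keep_rows, ]

# Number of genes per cluster that pass both cutoffs
table(sig_genes$cluster)
sig_genes$gene
```

Using `which()` instead of a bare logical index is a small design choice: a logical index containing `NA` would return rows filled with `NA`.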
With these results, we will use the toppFun() function to query the ToppGene database for all categories for each cluster. This function requires users to specify the relevant columns in their dataset.
toppData <- toppFun(res,
                    gene_col = "gene",
                    cluster_col = "cluster",
                    p_val_col = "padj",
                    logFC_col = "log2FoldChange")
#> This function returns data generated from ToppGene (https://toppgene.cchmc.org/)
#>
#> Any use of this data must be done so under the Terms of Use and citation guide established by ToppGene.
#>
#> Terms of Use: https://toppgene.cchmc.org/navigation/termsofuse.jsp
#> Citations: https://toppgene.cchmc.org/help/publications.jsp
#> Working on cluster: cluster0
head(toppData)
#> Category ID
#> 1 GeneOntologyMolecularFunction GO:0008083
#> 2 GeneOntologyMolecularFunction GO:0030546
#> 3 GeneOntologyMolecularFunction GO:0048018
#> 4 GeneOntologyMolecularFunction GO:0030545
#> 5 GeneOntologyMolecularFunction GO:0008201
#> 6 GeneOntologyMolecularFunction GO:0005539
#> Name PValue QValueFDRBH QValueFDRBY
#> 1 growth factor activity 1.778945e-08 2.502976e-05 0.0001959026
#> 2 signaling receptor activator activity 9.167716e-08 6.073895e-05 0.0004753908
#> 3 receptor ligand activity 1.295074e-07 6.073895e-05 0.0004753908
#> 4 signaling receptor regulator activity 1.754794e-07 6.172489e-05 0.0004831075
#> 5 heparin binding 2.066628e-06 5.815491e-04 0.0045516604
#> 6 glycosaminoglycan binding 2.526794e-06 5.925332e-04 0.0046376305
#> QValueBonferroni TotalGenes GenesInTerm GenesInQuery GenesInTermInQuery
#> 1 2.502976e-05 19912 177 820 26
#> 2 1.289898e-04 19912 558 820 51
#> 3 1.822169e-04 19912 548 820 50
#> 4 2.468996e-04 19912 619 820 54
#> 5 2.907745e-03 19912 197 820 24
#> 6 3.555199e-03 19912 270 820 29
#> Source URL Cluster
#> 1 cluster0
#> 2 cluster0
#> 3 cluster0
#> 4 cluster0
#> 5 cluster0
#> 6 cluster0
As the output reminds you, any use of this data must comply with ToppGene’s Terms of Use. For more information, please visit: https://toppgene.cchmc.org/navigation/termsofuse.jsp
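Because the result is an ordinary data frame, it can be filtered and summarized with base R. The sketch below retypes three rows from the head(toppData) output above (only a subset of columns) and keeps the terms below an FDR cutoff; the cutoff value and the object names are assumptions chosen for illustration.

```r
# Three rows retyped from the head(toppData) output above, for illustration.
top_terms <- data.frame(
  Category = "GeneOntologyMolecularFunction",
  ID = c("GO:0008083", "GO:0030546", "GO:0008201"),
  Name = c("growth factor activity",
           "signaling receptor activator activity",
           "heparin binding"),
  QValueFDRBH = c(2.502976e-05, 6.073895e-05, 5.815491e-04),
  GenesInTerm = c(177, 558, 197),
  GenesInTermInQuery = c(26, 51, 24),
  Cluster = "cluster0",
  stringsAsFactors = FALSE
)

# Illustrative cutoff on the Benjamini-Hochberg adjusted value (assumption).
fdr_cutoff <- 1e-4

# Keep terms below the cutoff, ordered from most to least significant.
strong <- top_terms[top_terms$QValueFDRBH < fdr_cutoff, ]
strong <- strong[order(strong$QValueFDRBH), c("ID", "Name", "QValueFDRBH")]
strong

# Count how many terms each category contributes per cluster.
term_counts <- table(top_terms$Category, top_terms$Cluster)
term_counts
```

The same pattern should apply to the full toppData object, assuming its column names match those printed above.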
The toppData data frame includes all results from ToppGene. We can use this data frame to quickly generate pathway analysis plots using the toppPlot() function. The function can be used to generate a single plot for one category and cluster.
The toppPlot() function can also create a plot for each cluster for a specified category; simply assign the parameter clusters to NULL. In this case, the function will return a list of plots.
plot_list <- toppPlot(toppData,
                      category = "GeneOntologyMolecularFunction",
                      clusters = NULL)
plot_list[1]
#> $data
#> Category ID
#> 1 GeneOntologyMolecularFunction GO:0009032
#> 2 GeneOntologyMolecularFunction GO:0031703
#> 3 GeneOntologyMolecularFunction GO:0004947
#> 4 GeneOntologyMolecularFunction GO:0003845
#> 5 GeneOntologyMolecularFunction GO:0035276
#> 6 GeneOntologyMolecularFunction GO:0000293
#> 7 GeneOntologyMolecularFunction GO:0008823
#> 8 GeneOntologyMolecularFunction GO:0052851
#> 9 GeneOntologyMolecularFunction GO:0004556
#> 10 GeneOntologyMolecularFunction GO:0016160
#> Name PValue
#> 1 thymidine phosphorylase activity 1.693908e-03
#> 2 type 2 angiotensin receptor binding 1.693908e-03
#> 3 bradykinin receptor activity 1.693908e-03
#> 4 11-beta-hydroxysteroid dehydrogenase [NAD(P)+] activity 1.693908e-03
#> 5 ethanol binding 4.007714e-05
#> 6 ferric-chelate reductase activity 6.538038e-04
#> 7 cupric reductase (NADH) activity 6.538038e-04
#> 8 ferric-chelate reductase (NADPH) activity 6.538038e-04
#> 9 alpha-amylase activity 6.538038e-04
#> 10 amylase activity 1.267530e-03
#> QValueFDRBH QValueFDRBY QValueBonferroni TotalGenes GenesInTerm GenesInQuery
#> 1 0.064414282 0.50415681 1.00000000 19912 2 820
#> 2 0.064414282 0.50415681 1.00000000 19912 2 820
#> 3 0.064414282 0.50415681 1.00000000 19912 2 820
#> 4 0.064414282 0.50415681 1.00000000 19912 2 820
#> 5 0.005467418 0.04279231 0.05638853 19912 6 820
#> 6 0.043804853 0.34285122 0.91990192 19912 5 820
#> 7 0.043804853 0.34285122 0.91990192 19912 5 820
#> 8 0.043804853 0.34285122 0.91990192 19912 5 820
#> 9 0.043804853 0.34285122 0.91990192 19912 5 820
#> 10 0.061510511 0.48142961 1.00000000 19912 6 820
#> GenesInTermInQuery Source URL Cluster nlog10_fdr geneRatio
#> 1 2 cluster0 1.191018 1.0000000
#> 2 2 cluster0 1.191018 1.0000000
#> 3 2 cluster0 1.191018 1.0000000
#> 4 2 cluster0 1.191018 1.0000000
#> 5 4 cluster0 2.262218 0.6666667
#> 6 3 cluster0 1.358478 0.6000000
#> 7 3 cluster0 1.358478 0.6000000
#> 8 3 cluster0 1.358478 0.6000000
#> 9 3 cluster0 1.358478 0.6000000
#> 10 3 cluster0 1.211051 0.5000000
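The plot data printed above carries two columns that are not in the raw ToppGene results: nlog10_fdr and geneRatio. The printed values are consistent with nlog10_fdr being -log10(QValueFDRBH) and geneRatio being GenesInTermInQuery divided by GenesInTerm. This relationship is inferred from the output shown here, not taken from the scToppR source, so treat the sketch below as a check of that reading rather than a description of the implementation.

```r
# Rows 1, 5, 6 and 10 of the plot data above, retyped for illustration.
plot_data <- data.frame(
  QValueFDRBH = c(0.064414282, 0.005467418, 0.043804853, 0.061510511),
  GenesInTerm = c(2, 6, 5, 6),
  GenesInTermInQuery = c(2, 4, 3, 3)
)

# Negative log10 of the Benjamini-Hochberg adjusted value.
plot_data$nlog10_fdr <- -log10(plot_data$QValueFDRBH)

# Fraction of the term's genes that also appear in the query list.
plot_data$geneRatio <- plot_data$GenesInTermInQuery / plot_data$GenesInTerm

# Compare against the values printed in the vignette output.
printed_nlog10 <- c(1.191018, 2.262218, 1.358478, 1.211051)
printed_ratio <- c(1.0000000, 0.6666667, 0.6000000, 0.5000000)
max(abs(plot_data$nlog10_fdr - printed_nlog10))
max(abs(plot_data$geneRatio - printed_ratio))
```

Note that under this reading a geneRatio of 1 can come from a very small term (2 of 2 genes), which explains why small terms appear at the top of the table despite modest FDR values.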
All of these plots can also be saved automatically by the toppPlot() function. The files and their save locations can be set using the parameters:
- save = TRUE
- save_dir = "/path/to/save_directory"
- file_prefix = "GO_Molecular_Function"
The cluster/celltype name will be automatically added to the filename prior to saving.
plot_list <- toppPlot(toppData,
                      category = "GeneOntologyMolecularFunction",
                      clusters = NULL,
                      save = TRUE,
                      save_dir = "./GO_results",
                      file_prefix = "GO_molecular_function")
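When saving in a pipeline, it can be useful to make sure the output directory exists and to anticipate one file per cluster. The sketch below uses a temporary directory so it can be run anywhere. The naming pattern (prefix, underscore, cluster name) and the .pdf extension are hypothetical and only illustrate the idea of the cluster name being added to the file name; the names actually written by toppPlot() may differ.

```r
# Use a temporary location so the sketch does not touch the working directory.
save_dir <- file.path(tempdir(), "GO_results")

# Create the directory (and any missing parents) if it does not exist yet.
if (!dir.exists(save_dir)) {
  dir.create(save_dir, recursive = TRUE)
}

file_prefix <- "GO_molecular_function"
clusters <- c("cluster0")

# Hypothetical naming pattern and extension, for illustration only.
expected_files <- file.path(save_dir,
                            paste0(file_prefix, "_", clusters, ".pdf"))
basename(expected_files)
```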
scToppR also provides the toppBalloon() function to create a balloon plot, allowing researchers to quickly compare the top terms from the ToppGene results.
toppBalloon(toppData,
            categories = "GeneOntologyBiologicalProcess")
#> Balloon Plot: GeneOntologyBiologicalProcess
One advantage of using scToppR in a pipeline is access to the other categories in ToppGene. Users can quickly view results from all ToppGene categories using these plotting functions, or by examining the toppData results. For example, a user could explore results shared among clusters or cell types in categories such as Pathway, ToppCell, and TFBS.
For example, the toppBalloon plot for the Pathway category gives a quick look at the top pathway terms; because this bulk dataset has a single cluster, the plot shows only cluster0:
toppBalloon(toppData,
            categories = "Pathway")
#> Balloon Plot: Pathway
#> Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
#> argument is not numeric or logical: returning NA
The Pubmed category also points researchers to other papers exploring similar data.
To save toppData results, scToppR also includes a toppSave() function. This function can save the toppData results as a single file, or it can split the data into different clusters/celltypes and save each individually. To do so, set split = TRUE in the function call. The function saves the files as Excel spreadsheets by default, but this can be changed to .csv or .tsv files using the format parameter.
toppSave(toppData,
         filename = "airway_toppData",
         save_dir = "./toppData_results",
         split = TRUE,
         format = "xlsx")
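To make the idea of splitting by cluster concrete, the following base R sketch writes one .csv file per cluster to a temporary directory. It is not the toppSave() implementation; the toy data, the second cluster (cluster1), and the file naming pattern are assumptions for illustration only.

```r
# Toy results with two clusters; cluster1 is hypothetical.
toy_topp <- data.frame(
  Cluster = c("cluster0", "cluster0", "cluster1"),
  Name = c("growth factor activity",
           "heparin binding",
           "receptor ligand activity"),
  QValueFDRBH = c(2.5e-05, 5.8e-04, 6.1e-05),
  stringsAsFactors = FALSE
)

# Write to a temporary directory so the sketch can be run anywhere.
out_dir <- file.path(tempdir(), "toppData_results")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# split() returns a named list with one data frame per cluster.
by_cluster <- split(toy_topp, toy_topp$Cluster)

# Save each cluster to its own file; the naming pattern is illustrative.
for (cl in names(by_cluster)) {
  out_file <- file.path(out_dir, paste0("airway_toppData_", cl, ".csv"))
  write.csv(by_cluster[[cl]], out_file, row.names = FALSE)
}

written <- sort(list.files(out_dir, pattern = "\\.csv$"))
written
```

Reading one of the files back with read.csv() is a quick way to confirm that each file holds only the rows for its cluster.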
sessionInfo()
#> R Under development (unstable) (2024-10-21 r87258)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] DESeq2_1.47.0 airway_1.27.0
#> [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
#> [5] GenomicRanges_1.59.0 GenomeInfoDb_1.43.0
#> [7] IRanges_2.41.0 S4Vectors_0.45.1
#> [9] BiocGenerics_0.53.2 generics_0.1.3
#> [11] MatrixGenerics_1.19.0 matrixStats_1.4.1
#> [13] scToppR_0.99.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 rjson_0.2.23 xfun_0.49
#> [4] bslib_0.8.0 ggplot2_3.5.1 lattice_0.22-6
#> [7] vctrs_0.6.5 tools_4.5.0 curl_6.0.1
#> [10] parallel_4.5.0 tibble_3.2.1 fansi_1.0.6
#> [13] pkgconfig_2.0.3 Matrix_1.7-1 lifecycle_1.0.4
#> [16] GenomeInfoDbData_1.2.13 compiler_4.5.0 farver_2.1.2
#> [19] stringr_1.5.1 munsell_0.5.1 codetools_0.2-20
#> [22] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
#> [25] pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
#> [28] BiocParallel_1.41.0 cachem_1.1.0 DelayedArray_0.33.1
#> [31] viridis_0.6.5 abind_1.4-8 locfit_1.5-9.10
#> [34] tidyselect_1.2.1 zip_2.3.1 digest_0.6.37
#> [37] stringi_1.8.4 dplyr_1.1.4 labeling_0.4.3
#> [40] forcats_1.0.0 fastmap_1.2.0 grid_4.5.0
#> [43] colorspace_2.1-1 cli_3.6.3 SparseArray_1.7.1
#> [46] magrittr_2.0.3 patchwork_1.3.0 S4Arrays_1.7.1
#> [49] utf8_1.2.4 withr_3.0.2 scales_1.3.0
#> [52] UCSC.utils_1.3.0 rmarkdown_2.29 XVector_0.47.0
#> [55] httr_1.4.7 gridExtra_2.3 openxlsx_4.2.7.1
#> [58] evaluate_1.0.1 knitr_1.49 viridisLite_0.4.2
#> [61] rlang_1.1.4 Rcpp_1.0.13-1 glue_1.8.0
#> [64] jsonlite_1.8.9 R6_2.5.1 zlibbioc_1.53.0