This package functions as an API wrapper for ToppGene. It takes a markers table from Seurat’s FindAllMarkers or Presto’s wilcoxauc function, or any similarly formatted data with columns for genes, groups of cells (clusters or cell types), average log fold changes, and p-values.
As an introduction, this vignette will work with the FindAllMarkers output from Seurat’s PBMC 3k clustering tutorial: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
If you follow that tutorial, the markers table is produced by this line:
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE)
Alternatively, this markers table is included in the scToppR package:
library(scToppR)
data("pbmc.markers")
head(pbmc.markers)
#> p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
#> RPS12 1.273332e-143 0.7387061 1.000 0.991 1.746248e-139 0 RPS12
#> RPS6 6.817653e-143 0.6934523 1.000 0.995 9.349729e-139 0 RPS6
#> RPS27 4.661810e-141 0.7372604 0.999 0.992 6.393206e-137 0 RPS27
#> RPL32 8.158412e-138 0.6266075 0.999 0.995 1.118845e-133 0 RPL32
#> RPS14 5.177478e-130 0.6336957 1.000 0.994 7.100394e-126 0 RPS14
#> RPS25 3.244898e-123 0.7689940 0.997 0.975 4.450053e-119 0 RPS25
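To make the expected input shape concrete, the sketch below retypes three rows of the preview above into a small data frame and checks that the four columns used in this vignette are present. The helper check_marker_cols is hypothetical and not part of scToppR; it only illustrates a check one might run before querying ToppGene.

```r
# Three rows retyped from the head(pbmc.markers) output above (illustration only)
toy_markers <- data.frame(
  p_val_adj  = c(1.746248e-139, 9.349729e-139, 6.393206e-137),
  avg_log2FC = c(0.7387061, 0.6934523, 0.7372604),
  cluster    = factor(c(0, 0, 0)),
  gene       = c("RPS12", "RPS6", "RPS27"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of scToppR): return the required columns
# that are absent from a markers table
check_marker_cols <- function(markers,
                              required = c("cluster", "gene",
                                           "p_val_adj", "avg_log2FC")) {
  setdiff(required, colnames(markers))
}

missing_cols <- check_marker_cols(toy_markers)
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

The rest of this vignette continues with the full pbmc.markers table rather than this toy subset.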
With this data we can run the function toppFun to get results from ToppGene.
toppData <- toppFun(markers = pbmc.markers,
                    topp_categories = NULL,
                    cluster_col = "cluster",
                    gene_col = "gene",
                    p_val_col = "p_val_adj",
                    logFC_col = "avg_log2FC")
#> This function returns data generated from ToppGene (https://toppgene.cchmc.org/)
#>
#> Any use of this data must be done so under the Terms of Use and citation guide established by ToppGene.
#>
#> Terms of Use: https://toppgene.cchmc.org/navigation/termsofuse.jsp
#> Citations: https://toppgene.cchmc.org/help/publications.jsp
#> Working on cluster: 0
#> Working on cluster: 1
#> Working on cluster: 2
#> Working on cluster: 3
#> Working on cluster: 4
#> Working on cluster: 5
#> Working on cluster: 6
#> Working on cluster: 7
#> Working on cluster: 8
Here it is important to tell toppFun which columns hold the clusters, genes, p-values, and log fold changes. Setting topp_categories to NULL runs toppFun on all ToppGene categories; you may also provide one or more specific categories as a list. To see all ToppGene categories, use the function get_ToppCats():
get_ToppCats()
#> [1] "GeneOntologyMolecularFunction" "GeneOntologyBiologicalProcess"
#> [3] "GeneOntologyCellularComponent" "HumanPheno"
#> [5] "MousePheno" "Domain"
#> [7] "Pathway" "Pubmed"
#> [9] "Interaction" "Cytoband"
#> [11] "TFBS" "GeneFamily"
#> [13] "Coexpression" "CoexpressionAtlas"
#> [15] "ToppCell" "Computational"
#> [17] "MicroRNA" "Drug"
#> [19] "Disease"
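Because category names are long and easy to mistype, it can help to compare a requested set against the known names before querying. The following is a minimal sketch in base R, with the names retyped from the output above; in a live session one would presumably use the result of get_ToppCats() instead. The variable names (known_cats, requested, unknown, valid) are illustrative only.

```r
# Category names retyped from the get_ToppCats() output above
known_cats <- c(
  "GeneOntologyMolecularFunction", "GeneOntologyBiologicalProcess",
  "GeneOntologyCellularComponent", "HumanPheno",
  "MousePheno", "Domain",
  "Pathway", "Pubmed",
  "Interaction", "Cytoband",
  "TFBS", "GeneFamily",
  "Coexpression", "CoexpressionAtlas",
  "ToppCell", "Computational",
  "MicroRNA", "Drug",
  "Disease"
)

# A requested set containing one deliberate typo ("Geno" instead of "Gene")
requested <- c("GeneOntologyMolecularFunction", "Pathway",
               "GenoOntologyMolecularFunction")

# Names that do not match any known category
unknown <- setdiff(requested, known_cats)
if (length(unknown) > 0) {
  message("Unknown categories: ", paste(unknown, collapse = ", "))
}

# Valid names, kept as a list since the text says categories are given as a list
valid <- as.list(intersect(requested, known_cats))
```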
You can also set additional parameters in toppFun; please check the documentation for more information.
The results of toppFun are organized into a data frame as follows:
head(toppData)
#> Category ID
#> 1 GeneOntologyMolecularFunction GO:0003735
#> 2 GeneOntologyMolecularFunction GO:0005198
#> 3 GeneOntologyMolecularFunction GO:0019843
#> 4 GeneOntologyMolecularFunction GO:1990948
#> 5 GeneOntologyMolecularFunction GO:0055105
#> 6 GeneOntologyMolecularFunction GO:0070180
#> Name PValue QValueFDRBH
#> 1 structural constituent of ribosome 1.551778e-96 9.108938e-94
#> 2 structural molecule activity 1.810316e-46 5.313277e-44
#> 3 rRNA binding 3.550084e-30 6.946331e-28
#> 4 ubiquitin ligase inhibitor activity 1.381737e-12 2.027699e-10
#> 5 ubiquitin-protein transferase inhibitor activity 4.557616e-12 5.350641e-10
#> 6 large ribosomal subunit rRNA binding 1.240237e-11 1.213365e-09
#> QValueFDRBY QValueBonferroni TotalGenes GenesInTerm GenesInQuery
#> 1 6.333528e-93 9.108938e-94 19912 195 245
#> 2 3.694370e-43 1.062655e-43 19912 911 245
#> 3 4.829848e-27 2.083899e-27 19912 81 245
#> 4 1.409878e-09 8.110798e-10 19912 9 245
#> 5 3.720350e-09 2.675321e-09 19912 10 245
#> 6 8.436637e-09 7.280189e-09 19912 11 245
#> GenesInTermInQuery Source URL Cluster
#> 1 76 0
#> 2 80 0
#> 3 26 0
#> 4 7 0
#> 5 7 0
#> 6 7 0
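Since the result is an ordinary data frame, it can be filtered and sorted with base R. The sketch below uses a small stand-in table retyped from the first four rows above (only some columns are reproduced, and the Cluster column type is assumed). The cutoff of 1e-20 is arbitrary and chosen only for illustration.

```r
# Stand-in for toppData, retyped from the head(toppData) output above
topp_example <- data.frame(
  Category = "GeneOntologyMolecularFunction",
  ID = c("GO:0003735", "GO:0005198", "GO:0019843", "GO:1990948"),
  Name = c("structural constituent of ribosome",
           "structural molecule activity",
           "rRNA binding",
           "ubiquitin ligase inhibitor activity"),
  QValueFDRBH = c(9.108938e-94, 5.313277e-44, 6.946331e-28, 2.027699e-10),
  GenesInTermInQuery = c(76, 80, 26, 7),
  Cluster = "0",  # assumed to be character here; the real column type may differ
  stringsAsFactors = FALSE
)

# Keep terms below an arbitrary FDR cutoff
sig <- topp_example[topp_example$QValueFDRBH < 1e-20, ]

# Sort by the number of query genes found in each term, largest first
sig <- sig[order(-sig$GenesInTermInQuery), ]

top_terms <- sig$Name
```

The same pattern applies to the full toppData object, for example after subsetting on the Cluster or Category column.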
scToppR can automatically create DotPlots for each ToppGene category. Simply run:
plots <- toppPlot(toppData, category = "GeneOntologyMolecularFunction", clusters = NULL)
#> Multiple clusters entered: function returns a list of ggplots
plots[1]
#> $`0`
This creates a list of plots, one per cluster, for a single category. Here, the category “GeneOntologyMolecularFunction” was requested, and the clusters parameter was left at its default, NULL, so all available clusters are used. If multiple clusters are selected, setting combine = TRUE returns a patchwork object of plots, while leaving combine = FALSE returns a list of ggplot objects. With save = TRUE, the function automatically saves each individual plot in the format: {category}_{cluster}_dotplot.pdf
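As a rough illustration of that naming pattern, the sketch below builds the file names one would expect for the nine clusters in this vignette. It assumes the pattern is filled in literally with the category and cluster labels; the exact names and output directory used by toppPlot may differ.

```r
category <- "GeneOntologyMolecularFunction"
clusters <- 0:8  # the nine clusters queried in this vignette

# Assumed naming pattern, taken from the text: {category}_{cluster}_dotplot.pdf
expected_files <- sprintf("%s_%s_dotplot.pdf", category, clusters)

# Which of the expected files are not present in the working directory
not_found <- expected_files[!file.exists(expected_files)]
```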
scToppR can also create balloon plots showing overlapping terms between all clusters.
toppBalloon(toppData, categories = "GeneOntologyMolecularFunction")
#> Balloon Plot: GeneOntologyMolecularFunction
#> Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
#> argument is not numeric or logical: returning NA
#> Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
#> argument is not numeric or logical: returning NA
#> Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
#> argument is not numeric or logical: returning NA
#> Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
#> argument is not numeric or logical: returning NA
This function also has a save parameter that automatically saves the plots, which is helpful when multiple categories are visualized.
scToppR will also automatically save the results of the ToppGene query. By default it saves a separate file for each cluster; to save one large file instead, set the parameter split = FALSE. Files are saved as Excel spreadsheets by default, but this can be changed using the format parameter, which must be one of c("xlsx", "csv", "tsv").
sessionInfo()
#> R Under development (unstable) (2024-10-21 r87258)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 DESeq2_1.47.0
#> [3] airway_1.27.0 SummarizedExperiment_1.37.0
#> [5] Biobase_2.67.0 GenomicRanges_1.59.0
#> [7] GenomeInfoDb_1.43.0 IRanges_2.41.0
#> [9] S4Vectors_0.45.1 BiocGenerics_0.53.2
#> [11] generics_0.1.3 MatrixGenerics_1.19.0
#> [13] matrixStats_1.4.1 scToppR_0.99.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.6 rjson_0.2.23 xfun_0.49
#> [4] bslib_0.8.0 ggplot2_3.5.1 lattice_0.22-6
#> [7] vctrs_0.6.5 tools_4.5.0 curl_6.0.1
#> [10] parallel_4.5.0 tibble_3.2.1 fansi_1.0.6
#> [13] pkgconfig_2.0.3 Matrix_1.7-1 lifecycle_1.0.4
#> [16] GenomeInfoDbData_1.2.13 compiler_4.5.0 farver_2.1.2
#> [19] stringr_1.5.1 munsell_0.5.1 codetools_0.2-20
#> [22] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
#> [25] pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
#> [28] BiocParallel_1.41.0 cachem_1.1.0 DelayedArray_0.33.1
#> [31] viridis_0.6.5 abind_1.4-8 locfit_1.5-9.10
#> [34] tidyselect_1.2.1 zip_2.3.1 digest_0.6.37
#> [37] stringi_1.8.4 labeling_0.4.3 forcats_1.0.0
#> [40] fastmap_1.2.0 grid_4.5.0 colorspace_2.1-1
#> [43] cli_3.6.3 SparseArray_1.7.1 magrittr_2.0.3
#> [46] patchwork_1.3.0 S4Arrays_1.7.1 utf8_1.2.4
#> [49] withr_3.0.2 scales_1.3.0 UCSC.utils_1.3.0
#> [52] rmarkdown_2.29 XVector_0.47.0 httr_1.4.7
#> [55] gridExtra_2.3 openxlsx_4.2.7.1 evaluate_1.0.1
#> [58] knitr_1.49 viridisLite_0.4.2 rlang_1.1.4
#> [61] Rcpp_1.0.13-1 glue_1.8.0 jsonlite_1.8.9
#> [64] R6_2.5.1 zlibbioc_1.53.0