1 Introduction

scToppR is a package that allows seamless, workflow-based interaction with ToppGene, a portal for gene enrichment analysis. Researchers can use scToppR to directly query ToppGene’s databases and conduct analysis with a few lines of code. scToppR’s availability on Bioconductor ensures easy installation and integration with other Bioconductor workflows, allowing researchers to incorporate functional enrichment analysis from ToppGene into their existing pipelines.

The use of data from ToppGene is governed by their Terms of Use: https://toppgene.cchmc.org/navigation/termsofuse.jsp

This vignette demonstrates the use of scToppR within a differential expression workflow. We show the complete workflow from differential expression results to pathway analysis and visualization. While the examples show how to make live API calls to ToppGene, this vignette uses pre-computed results to ensure reproducibility and avoid dependency on internet connectivity.

2 Installation

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("scToppR")

3 Load Data

As an introduction, this vignette will work with the FindAllMarkers output from Seurat’s PBMC 3k clustering tutorial: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html

You can follow that tutorial and get the markers file from this line:

pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE)

Alternatively, this markers table is included in the scToppR package:

library(scToppR)
data("pbmc.markers")
head(pbmc.markers)
#>               p_val avg_log2FC pct.1 pct.2     p_val_adj cluster  gene
#> RPS12 1.273332e-143  0.7387061 1.000 0.991 1.746248e-139       0 RPS12
#> RPS6  6.817653e-143  0.6934523 1.000 0.995 9.349729e-139       0  RPS6
#> RPS27 4.661810e-141  0.7372604 0.999 0.992 6.393206e-137       0 RPS27
#> RPL32 8.158412e-138  0.6266075 0.999 0.995 1.118845e-133       0 RPL32
#> RPS14 5.177478e-130  0.6336957 1.000 0.994 7.100394e-126       0 RPS14
#> RPS25 3.244898e-123  0.7689940 0.997 0.975 4.450053e-119       0 RPS25
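
Because toppFun has to be told which columns hold clusters, genes, p values, and log fold changes, it can be worth confirming that those columns exist before querying. The following is a minimal sketch in base R, not part of scToppR: `markers_example` is a small stand-in built from the six rows printed above, and `needed_cols` is an illustrative name.

```r
# Stand-in for pbmc.markers: selected columns of the six rows printed above.
markers_example <- data.frame(
    avg_log2FC = c(0.7387061, 0.6934523, 0.7372604,
                   0.6266075, 0.6336957, 0.7689940),
    p_val_adj = c(1.746248e-139, 9.349729e-139, 6.393206e-137,
                  1.118845e-133, 7.100394e-126, 4.450053e-119),
    cluster = rep("0", 6),
    gene = c("RPS12", "RPS6", "RPS27", "RPL32", "RPS14", "RPS25"),
    stringsAsFactors = FALSE
)

# Columns that will be named in the query (illustrative variable name).
needed_cols <- c("cluster", "gene", "p_val_adj", "avg_log2FC")

# Report any column that is absent from the input table.
missing_cols <- setdiff(needed_cols, colnames(markers_example))
if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Count how many marker genes each cluster contributes.
genes_per_cluster <- table(markers_example$cluster)
```

With the full pbmc.markers table, the same check would be run on `colnames(pbmc.markers)`.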

With this data we can run the function toppFun to get results from ToppGene. The toppFun function accepts three different input formats, selected through its type argument.

The pbmc.markers data is in the “degs” format, so we will set type = "degs" in the toppFun function. We will also need to specify the relevant columns for clusters, genes, p values, and log fold changes:

# This is how you would run the analysis with live data (requires internet)
if (curl::has_internet()) {
    toppdata.pbmc <- toppFun(
        input_data = pbmc.markers,
        type = "degs",
        topp_categories = NULL, # NULL queries all ToppGene categories
        cluster_col = "cluster",
        gene_col = "gene",
        p_val_col = "p_val_adj",
        logFC_col = "avg_log2FC"
    )
} else {
    # Offline: load the pre-computed results included in the package
    data("toppdata.pbmc")
}
#> This function returns data generated from ToppGene (https://toppgene.cchmc.org/)
#> 
#> Any use of this data must be done so under the Terms of Use and citation guide established by ToppGene.
#> 
#> Terms of Use: https://toppgene.cchmc.org/navigation/termsofuse.jsp
#> Citations: https://toppgene.cchmc.org/help/publications.jsp
#> Working on cluster:0
#> Working on cluster:1
#> Working on cluster:2
#> Working on cluster:3
#> Working on cluster:4
#> Working on cluster:5
#> Working on cluster:6
#> Working on cluster:7
#> Working on cluster:8

head(toppdata.pbmc)
#>                        Category         ID                                Name
#> 1 GeneOntologyMolecularFunction GO:0003735  structural constituent of ribosome
#> 2 GeneOntologyMolecularFunction GO:0005198        structural molecule activity
#> 3 GeneOntologyMolecularFunction GO:0019843                        rRNA binding
#> 4 GeneOntologyMolecularFunction GO:1990948 ubiquitin ligase inhibitor activity
#> 5 GeneOntologyMolecularFunction GO:1990932                   5.8S rRNA binding
#> 6 GeneOntologyMolecularFunction GO:0048027                 mRNA 5'-UTR binding
#>         PValue  QValueFDRBH  QValueFDRBY QValueBonferroni TotalGenes
#> 1 1.418323e-99 8.566669e-97 5.980920e-96     8.566669e-97      19978
#> 2 9.507317e-47 2.871210e-44 2.004569e-43     5.742419e-44      19978
#> 3 1.453993e-27 2.927374e-25 2.043780e-24     8.782121e-25      19978
#> 4 6.350812e-11 9.589727e-09 6.695180e-08     3.835891e-08      19978
#> 5 2.718755e-10 3.284256e-08 2.292942e-07     1.642128e-07      19978
#> 6 4.258090e-10 4.286477e-08 2.992654e-07     2.571886e-07      19978
#>   GenesInTerm GenesInQuery GenesInTermInQuery Source URL
#> 1         181          246                 76           
#> 2         902          246                 80           
#> 3          77          246                 24           
#> 4          13          246                  7           
#> 5           5          246                  5           
#> 6          25          246                  8           
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Genes
#> 1                            RPL21, RPL22, RPL23A, RPL24, RPL26, RPL27, RPL30, RPL27A, RPL28, RPL29, RPL31, RPL32, RPL34, RPL35A, RPL37, RPL37A, RPL38, RPL41, RPL36A, RPLP0, RPLP1, RPLP2, RPS2, RPS3, RPS3A, RPS4X, RPS4Y1, RPS5, RPS6, RPS7, RPS8, RPS9, RPS10, RPS12, RPS13, RPS14, RPS15, RPS15A, RPS16, RPS18, RPS19, RPS20, RPS21, RPS23, RPS25, RPS26, RPS27, RPS27A, RPS28, RPS29, RPL10A, RPL23, FAU, RPL36, RPSA, RPL14, RPSA2, RPL35, RPL13A, RPL3, RPL4, RPL5, RPL6, RPL7, RPL7A, RPL8, RPL9, RPL10, RPL11, RPL12, RPL13, RPL15, RPL17, RPL18, RPL18A, RPL19
#> 2 RPL21, RPL22, RPL23A, RPL24, RPL26, RPL27, RPL30, RPL27A, RPL28, RPL29, RPL31, RPL32, RPL34, RPL35A, MAL, RPL37, RPL37A, RPL38, RPL41, RPL36A, RPLP0, RPLP1, RPLP2, RPS2, RPS3, RPS3A, RPS4X, RPS4Y1, RPS5, RPS6, RPS7, RPS8, RPS9, RPS10, RPS12, RPS13, RPS14, RPS15, RPS15A, RPS16, SPOCK2, RPS18, RPS19, RPS20, RPS21, RPS23, RPS25, RPS26, ACTN1, RPS27, RPS27A, RPS28, RPS29, RPL10A, RPL23, FAU, RPL36, FBLN5, RPSA, RPL14, RPSA2, RPL35, RPL13A, RPL3, RPL4, RPL5, RPL6, RPL7, RPL7A, RPL8, RPL9, RPL10, RPL11, RPL12, RPL13, RPL15, RPL17, RPL18, RPL18A, RPL19
#> 3                                                                                                                                                                                                                                                                                                                                                                                                           RPL23A, RPL37, RPLP0, RPS3, RPS4X, RPS4Y1, RPS5, RPS9, RPS13, RPS18, RPL23, NPM1, NOP53, RPL3, RPL4, RPL5, RPL6, RPL7, RPL8, RPL9, RPL11, RPL12, RPL17, RPL19
#> 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           RPL37, RPS7, RPS15, RPS20, RPL23, RPL5, RPL11
#> 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          RPS9, RPS13, RPL6, RPL8, RPL19
#> 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   RPL26, RPL41, RSL1D1, RPS3A, RPS7, RPS13, RPS14, RPL5
#>   Cluster
#> 1       0
#> 2       0
#> 3       0
#> 4       0
#> 5       0
#> 6       0
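
A connectivity check alone does not guarantee that a remote request succeeds. The following is a minimal sketch of a more defensive pattern, assuming that a failed request surfaces as an ordinary R error. `with_fallback` is a hypothetical helper and not part of scToppR, and the query and cached objects are stand-ins for demonstration only.

```r
# Hypothetical helper (not part of scToppR): run a live query and fall back
# to a cached result if the query signals an error.
with_fallback <- function(live_query, cached_result) {
    tryCatch(
        live_query(),
        error = function(e) {
            message("Live query failed (", conditionMessage(e),
                    "); using cached results.")
            cached_result
        }
    )
}

# Stand-ins for demonstration only.
failing_query <- function() stop("service unavailable")
cached <- data.frame(Name = "rRNA binding", Cluster = "0",
                     stringsAsFactors = FALSE)

# The query fails, so the cached data frame is returned instead.
result <- with_fallback(failing_query, cached)
```

In this vignette's setting, `live_query` would wrap the toppFun call and `cached_result` would be the packaged toppdata.pbmc object.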

Setting topp_categories to NULL, as in the call above, runs toppFun on all ToppGene categories. You may also provide one or more specific categories as a list. To see all available ToppGene categories, use the function get_ToppCats():

get_ToppCats()
#>  [1] "GeneOntologyMolecularFunction" "GeneOntologyBiologicalProcess"
#>  [3] "GeneOntologyCellularComponent" "HumanPheno"                   
#>  [5] "MousePheno"                    "Domain"                       
#>  [7] "Pathway"                       "Pubmed"                       
#>  [9] "Interaction"                   "Cytoband"                     
#> [11] "TFBS"                          "GeneFamily"                   
#> [13] "Coexpression"                  "CoexpressionAtlas"            
#> [15] "ToppCell"                      "Computational"                
#> [17] "MicroRNA"                      "Drug"                         
#> [19] "Disease"
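
When requesting specific categories, a misspelled name is easy to miss. The following sketch checks a requested list against the category names printed above. It uses only base R: `known_cats` copies that output by hand (in practice it could come from get_ToppCats()), and `requested` is an illustrative list with one deliberate misspelling.

```r
# Category names copied from the get_ToppCats() output above.
known_cats <- c(
    "GeneOntologyMolecularFunction", "GeneOntologyBiologicalProcess",
    "GeneOntologyCellularComponent", "HumanPheno", "MousePheno", "Domain",
    "Pathway", "Pubmed", "Interaction", "Cytoband", "TFBS", "GeneFamily",
    "Coexpression", "CoexpressionAtlas", "ToppCell", "Computational",
    "MicroRNA", "Drug", "Disease"
)

# Illustrative request; the second entry is misspelled on purpose.
requested <- list("Pathway", "GenoOntologyMolecularFunction")

# Names that do not match any known category.
unknown <- setdiff(unlist(requested), known_cats)
if (length(unknown) > 0) {
    message("Unrecognized categories: ", paste(unknown, collapse = ", "))
}

# Keep only the valid names, still as a list.
valid_request <- as.list(intersect(unlist(requested), known_cats))
```

The cleaned `valid_request` list is what would then be supplied as topp_categories.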

The toppFun function accepts additional parameters; please check its documentation for more information.

The results of toppFun (whether from a live API call or loaded from cached data) are organized into a data frame with the following structure:

# Examine the structure of the results
str(toppdata.pbmc)
#> 'data.frame':    8550 obs. of  15 variables:
#>  $ Category          : chr  "GeneOntologyMolecularFunction" "GeneOntologyMolecularFunction" "GeneOntologyMolecularFunction" "GeneOntologyMolecularFunction" ...
#>  $ ID                : chr  "GO:0003735" "GO:0005198" "GO:0019843" "GO:1990948" ...
#>  $ Name              : chr  "structural constituent of ribosome" "structural molecule activity" "rRNA binding" "ubiquitin ligase inhibitor activity" ...
#>  $ PValue            : num  1.42e-99 9.51e-47 1.45e-27 6.35e-11 2.72e-10 ...
#>  $ QValueFDRBH       : num  8.57e-97 2.87e-44 2.93e-25 9.59e-09 3.28e-08 ...
#>  $ QValueFDRBY       : num  5.98e-96 2.00e-43 2.04e-24 6.70e-08 2.29e-07 ...
#>  $ QValueBonferroni  : num  8.57e-97 5.74e-44 8.78e-25 3.84e-08 1.64e-07 ...
#>  $ TotalGenes        : int  19978 19978 19978 19978 19978 19978 19978 19978 19978 19978 ...
#>  $ GenesInTerm       : int  181 902 77 13 5 25 10 18 14 605 ...
#>  $ GenesInQuery      : int  246 246 246 246 246 246 246 246 246 246 ...
#>  $ GenesInTermInQuery: int  76 80 24 7 5 8 6 7 6 24 ...
#>  $ Source            : chr  " " " " " " " " ...
#>  $ URL               : chr  " " " " " " " " ...
#>  $ Genes             : chr  "RPL21, RPL22, RPL23A, RPL24, RPL26, RPL27, RPL30, RPL27A, RPL28, RPL29, RPL31, RPL32, RPL34, RPL35A, RPL37, RPL"| __truncated__ "RPL21, RPL22, RPL23A, RPL24, RPL26, RPL27, RPL30, RPL27A, RPL28, RPL29, RPL31, RPL32, RPL34, RPL35A, MAL, RPL37"| __truncated__ "RPL23A, RPL37, RPLP0, RPS3, RPS4X, RPS4Y1, RPS5, RPS9, RPS13, RPS18, RPL23, NPM1, NOP53, RPL3, RPL4, RPL5, RPL6"| __truncated__ "RPL37, RPS7, RPS15, RPS20, RPL23, RPL5, RPL11" ...
#>  $ Cluster           : chr  "0" "0" "0" "0" ...
cat("Number of enriched terms:", nrow(toppdata.pbmc), "\n")
#> Number of enriched terms: 8550
cat("Categories analyzed:", length(unique(toppdata.pbmc$Category)), "\n")
#> Categories analyzed: 19
cat("Clusters analyzed:", length(unique(toppdata.pbmc$Cluster)), "\n")
#> Clusters analyzed: 9
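
Since the results are an ordinary data frame, they can be filtered and ranked with base R. The sketch below works on a stand-in, `topp_example`, built from selected columns of the six rows printed earlier. The threshold of 1e-10 and the derived column name `TermCoverage` are illustrative choices, not scToppR conventions.

```r
# Stand-in for toppdata.pbmc: selected columns of the six rows shown earlier.
topp_example <- data.frame(
    ID = c("GO:0003735", "GO:0005198", "GO:0019843",
           "GO:1990948", "GO:1990932", "GO:0048027"),
    Name = c("structural constituent of ribosome",
             "structural molecule activity",
             "rRNA binding",
             "ubiquitin ligase inhibitor activity",
             "5.8S rRNA binding",
             "mRNA 5'-UTR binding"),
    QValueFDRBH = c(8.566669e-97, 2.871210e-44, 2.927374e-25,
                    9.589727e-09, 3.284256e-08, 4.286477e-08),
    GenesInTerm = c(181, 902, 77, 13, 5, 25),
    GenesInTermInQuery = c(76, 80, 24, 7, 5, 8),
    Cluster = rep("0", 6),
    stringsAsFactors = FALSE
)

# Keep terms below an (illustrative) FDR threshold.
sig <- topp_example[topp_example$QValueFDRBH < 1e-10, ]

# Fraction of each term's genes that appear in the query gene list.
sig$TermCoverage <- sig$GenesInTermInQuery / sig$GenesInTerm

# Rank the remaining terms from most to least significant.
sig <- sig[order(sig$QValueFDRBH), ]
```

The same subsetting applies unchanged to the full toppdata.pbmc object, for example after first restricting it to one value of Cluster or Category.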

3.1 Plotting

scToppR can automatically create DotPlots for each ToppGene category. Simply run:

plots <- toppPlot(toppdata.pbmc,
    category = "GeneOntologyMolecularFunction",
    clusters = NULL
)
#> Warning in toppPlot.data.frame(toppdata.pbmc, category =
#> "GeneOntologyMolecularFunction", : P value adjustment not found - using 'BH' by
#> default. For no adjustment, use p_val_adj = 'none'.
#> Multiple clusters entered: function returns a list of ggplots
plots[1]
#> $`0`

This creates a list of plots, one per cluster, for a single category. Here, the category “GeneOntologyMolecularFunction” was requested and the clusters parameter was left at its default, NULL, so all available clusters are used. If multiple clusters are selected, users can set combine = TRUE to return a patchwork object of plots; leaving combine = FALSE returns a list of ggplot objects. With save = TRUE, the function automatically saves each individual plot in the format: {category}_{cluster}_dotplot.pdf
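
To make the naming pattern concrete, the sketch below builds the file names that the {category}_{cluster}_dotplot.pdf format implies for the nine clusters in this dataset. It only constructs strings in base R and does not call scToppR; the directory the files would be written to is not covered here.

```r
# Category and cluster labels used in this vignette.
category <- "GeneOntologyMolecularFunction"
clusters <- as.character(0:8)

# File names implied by the {category}_{cluster}_dotplot.pdf pattern.
expected_files <- sprintf("%s_%s_dotplot.pdf", category, clusters)

# First expected file name, for cluster 0.
expected_files[1]
```

A vector like `expected_files` can be handy for checking afterwards that every plot was written.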

scToppR can also create balloon plots showing overlapping terms between all clusters.

toppBalloon(toppdata.pbmc, categories = "GeneOntologyMolecularFunction")
#> Creating Balloon Plot:GeneOntologyMolecularFunction

This function also has a save parameter that automatically saves the plots, which is helpful when multiple categories are visualized.

3.2 Saving

scToppR can also save the results of the ToppGene query with toppSave. By default it saves a separate file for each cluster; to save one large file instead, set the parameter split = FALSE. Files are saved as Excel spreadsheets by default, but this can be changed using the format parameter, which must be one of c("xlsx", "csv", "tsv").

tmpdir <- tempdir()
toppSave(toppdata.pbmc, filename = "PBMC", save_dir = tmpdir, split = TRUE, format = "xlsx")
#> Saving file:/tmp/RtmpCjRf7W/PBMC_0.xlsx
#> Saving file:/tmp/RtmpCjRf7W/PBMC_1.xlsx
#> Saving file:/tmp/RtmpCjRf7W/PBMC_2.xlsx
#> Saving file:/tmp/RtmpCjRf7W/PBMC_3.xlsx
#> Saving file:/tmp/RtmpCjRf7W/PBMC_4.xlsx
#> Saving file:/tmp/RtmpCjRf7W/PBMC_5.xlsx
#> Saving file:/tmp/RtmpCjRf7W/PBMC_6.xlsx
#> Saving file:/tmp/RtmpCjRf7W/PBMC_7.xlsx
#> Saving file:/tmp/RtmpCjRf7W/PBMC_8.xlsx
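
Conceptually, saving with split = TRUE amounts to writing one file per cluster. The sketch below illustrates that idea in base R with CSV output. It is not scToppR's implementation, and `results_example`, the placeholder term names, and the EXAMPLE_ file prefix are all made up for illustration.

```r
# Placeholder results table; term names are dummies, not real enrichments.
results_example <- data.frame(
    Name = c("term A", "term B", "term C"),
    Cluster = c("0", "0", "1"),
    stringsAsFactors = FALSE
)

# Write into a dedicated subfolder of the session's temporary directory.
out_dir <- file.path(tempdir(), "topp_split_example")
dir.create(out_dir, showWarnings = FALSE)

# One data frame per cluster, each written to its own CSV file.
by_cluster <- split(results_example, results_example$Cluster)
for (cl in names(by_cluster)) {
    out_file <- file.path(out_dir, paste0("EXAMPLE_", cl, ".csv"))
    write.csv(by_cluster[[cl]], out_file, row.names = FALSE)
}

# List what was written.
saved_files <- sort(list.files(out_dir, pattern = "^EXAMPLE_.*\\.csv$"))
```

Reading a file back with read.csv is a quick way to confirm that each cluster's rows landed in the expected file.
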
sessionInfo()
#> R version 4.6.0 alpha (2026-04-05 r89794)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] dplyr_1.2.1                 DESeq2_1.51.7              
#>  [3] airway_1.31.0               SummarizedExperiment_1.41.1
#>  [5] Biobase_2.71.0              GenomicRanges_1.63.2       
#>  [7] Seqinfo_1.1.0               IRanges_2.45.0             
#>  [9] S4Vectors_0.49.2            BiocGenerics_0.57.1        
#> [11] generics_0.1.4              MatrixGenerics_1.23.0      
#> [13] matrixStats_1.5.0           scToppR_0.99.10            
#> [15] knitr_1.51                  BiocStyle_2.39.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.6        xfun_0.57           bslib_0.10.0       
#>  [4] ggplot2_4.0.2       httr2_1.2.2         lattice_0.22-9     
#>  [7] vctrs_0.7.3         tools_4.6.0         curl_7.0.0         
#> [10] parallel_4.6.0      tibble_3.3.1        pkgconfig_2.0.3    
#> [13] Matrix_1.7-5        RColorBrewer_1.1-3  S7_0.2.1-1         
#> [16] lifecycle_1.0.5     compiler_4.6.0      farver_2.1.2       
#> [19] stringr_1.6.0       textshaping_1.0.5   tinytex_0.59       
#> [22] codetools_0.2-20    htmltools_0.5.9     sass_0.4.10        
#> [25] yaml_2.3.12         pillar_1.11.1       jquerylib_0.1.4    
#> [28] BiocParallel_1.45.0 cachem_1.1.0        DelayedArray_0.37.1
#> [31] magick_2.9.1        viridis_0.6.5       abind_1.4-8        
#> [34] tidyselect_1.2.1    locfit_1.5-9.12     zip_2.3.3          
#> [37] digest_0.6.39       stringi_1.8.7       bookdown_0.46      
#> [40] labeling_0.4.3      forcats_1.0.1       fastmap_1.2.0      
#> [43] grid_4.6.0          cli_3.6.6           SparseArray_1.11.13
#> [46] magrittr_2.0.5      patchwork_1.3.2     S4Arrays_1.11.1    
#> [49] dichromat_2.0-0.1   withr_3.0.2         scales_1.4.0       
#> [52] rappdirs_0.3.4      rmarkdown_2.31      XVector_0.51.0     
#> [55] otel_0.2.0          gridExtra_2.3       ragg_1.5.2         
#> [58] openxlsx_4.2.8.1    evaluate_1.0.5      viridisLite_0.4.3  
#> [61] rlang_1.2.0         Rcpp_1.1.1-1        glue_1.8.1         
#> [64] BiocManager_1.30.27 jsonlite_2.0.0      R6_2.6.1           
#> [67] systemfonts_1.3.2