---
title: "Getting Started with RankMap"
author:
- name: Jinming Cheng
  affiliation:
  - &nus_cbds Centre for Biomedical Data Science, Duke-NUS Medical School, Singapore 169857, Singapore
  # email: jinming.cheng@outlook.com
date: "`r format(Sys.time(), '%d %B, %Y')`"
vignette: >
  %\VignetteIndexEntry{Getting Started with RankMap}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
output:
  BiocStyle::html_document:
    toc_float: false
bibliography-style: plain
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  eval = TRUE,
  collapse = TRUE,
  comment = "#>",
  out.width = "100%",
  dev = "png",
  dpi = 60,
  fig.height = 4.2,
  fig.width = 5.6
)
```

# Introduction

**RankMap** is an R package for fast, robust, and scalable reference-based cell type annotation in single-cell and spatial transcriptomics data. It transforms gene expression matrices into sparse ranked representations and trains a multinomial logistic regression model using the `glmnet` framework. This rank-based approach improves robustness to batch effects, platform differences, and partial gene coverage, which is especially beneficial for targeted-panel technologies such as Xenium and MERFISH.

**RankMap** supports commonly used data structures, including `Seurat`, `SingleCellExperiment`, and `SpatialExperiment` objects. The workflow includes flexible preprocessing steps (top-K gene masking, binning, expression weighting, and scaling), followed by efficient model training and rapid prediction.

Compared with existing tools such as **SingleR**, **RCTD** (via **spacexr**), and **Azimuth**, **RankMap** achieves comparable or superior accuracy with substantially faster runtime, making it particularly well suited to high-throughput applications on large datasets.

This vignette is a quick-start guide to using **RankMap** for cell type prediction.
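To make the idea of a ranked, top-K-masked representation concrete, the chunk below sketches one possible transformation in base R. It is an illustration only, not **RankMap**'s internal code. The helper `rank_topk()` is hypothetical, and its scoring scheme is an assumption chosen for clarity: the most highly expressed gene in a cell receives score `k`, the next receives `k - 1`, and so on, while genes outside the top `k` (or with zero expression) are set to 0.

```{r rank_topk_sketch}
# Hypothetical helper for illustration only; this is NOT RankMap's implementation.
# For each cell (column), keep the k most highly expressed genes, replace their
# values with ranks (k = highest), and mask every other gene to 0.
rank_topk <- function(mat, k) {
  stopifnot(is.matrix(mat), is.numeric(mat), k >= 1)
  n <- nrow(mat)
  k <- min(k, n)
  scores <- apply(mat, 2, function(x) {
    # ascending ranks: the highest-expressed gene gets rank n; ties broken by order
    r <- rank(x, ties.method = "first")
    # a gene is kept only if it is among the top k AND actually expressed
    keep <- r > n - k & x > 0
    # shift kept ranks to 1..k so the top gene scores k
    ifelse(keep, r - (n - k), 0)
  })
  # rebuild a genes x cells matrix (also covers the single-gene edge case)
  matrix(scores, nrow = n, dimnames = dimnames(mat))
}

# Toy matrix with made-up values: 5 genes x 2 cells
expr <- matrix(
  c(5, 0, 2, 9, 1,   # cell1
    0, 3, 4, 1, 7),  # cell2
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("cell1", "cell2"))
)

ranked <- rank_topk(expr, k = 2)
ranked

# Fraction of zero entries; the representation gets sparser as k shrinks
mean(ranked == 0)
```

With a realistic gene panel and a small `k`, most entries are zero, so a matrix like this would normally be stored in a sparse format (for example, with the **Matrix** package). The sketch deliberately omits the binning, expression weighting, and scaling steps that **RankMap** also applies.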
# Installation

Install **RankMap** from Bioconductor:

```{r install_pkg, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("RankMap")
```

# Quick Start (Seurat Objects)

## Load Data

```{r load_pkgs}
library(RankMap)
library(Seurat)
```

Load the example single-cell RNA-seq dataset (17,597 genes × 150 cells):

```{r read_sc}
seu_sc <- readRDS(system.file("extdata", "seu_sc.rds", package = "RankMap"))
seu_sc
```

Load the example Xenium spatial transcriptomics dataset (313 genes × 150 cells):

```{r read_xen}
seu_xen <- readRDS(system.file("extdata", "seu_xen.rds", package = "RankMap"))
seu_xen
```

## Predict Cell Types

Run cell type prediction with the `RankMap()` function. By default, **RankMap** uses the normalized expression values in the `data` layer. For spatial datasets with limited gene panels, a smaller `k` (the number of top-ranked genes used, e.g., `k = 20`) is typically sufficient; for single-cell RNA-seq data with deeper coverage, larger values (e.g., `k = 100` or `k = 200`) are generally recommended.
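One rough way to see why `k` should scale with gene coverage is to compare it with the number of genes in each example dataset (313 in the Xenium panel, 17,597 in the single-cell reference). The chunk below only performs this arithmetic and does not call **RankMap**. Treating `k` as a percentage of the gene set is an illustrative heuristic, not a documented selection rule, and the names `n_genes` and `k_values` are placeholders.

```{r k_fraction_sketch}
# Illustrative arithmetic only: what share of each dataset's genes would fall
# inside the top-k window for a few candidate k values?
n_genes <- c(xenium_panel = 313, scrna_reference = 17597)  # sizes quoted above
k_values <- c(k20 = 20, k100 = 100, k200 = 200)

# percentage of genes retained per cell, rounded to two decimals
pct_retained <- outer(n_genes, k_values, function(n, k) round(100 * k / n, 2))
pct_retained
```

On the 313-gene panel, `k = 20` already covers about 6% of the genes in each cell. On the full transcriptome, even `k = 200` covers only about 1%.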
```{r run_rankmap}
pred_df <- RankMap(
  ref_data = seu_sc,
  ref_labels = seu_sc$cell_type,
  new_data = seu_xen,
  k = 20
)
```

The result is a `data.frame` containing the columns `cell_id`, `predicted_cell_type`, and `confidence`:

```{r pred_res}
head(pred_df)
```

## Evaluate Performance

If ground-truth labels are available, evaluate prediction accuracy with `evaluatePredictionPerformance()`. In this example, the `cell_type_SingleR` labels stored in the query object serve as the ground truth:

```{r pred_performance}
perf <- evaluatePredictionPerformance(
  prediction_df = pred_df,
  truth = seu_xen$cell_type_SingleR
)
perf
```

# Quick Start (SummarizedExperiment Objects)

## Prepare Data

Convert the `Seurat` objects into `SingleCellExperiment` objects:

```{r load_pkg_sce}
library(SingleCellExperiment)
```

```{r prepare_sce_data}
# Reference: raw counts plus log-normalized values, with cell metadata as colData
sce_sc <- SingleCellExperiment(
  assays = list(
    counts = GetAssayData(seu_sc, layer = "counts"),
    logcounts = GetAssayData(seu_sc, layer = "data")
  ),
  colData = seu_sc[[]] # cell-level metadata, equivalent to seu_sc@meta.data
)

# Query (Xenium): same assay layout as the reference
sce_sp <- SingleCellExperiment(
  assays = list(
    counts = GetAssayData(seu_xen, layer = "counts"),
    logcounts = GetAssayData(seu_xen, layer = "data")
  ),
  colData = seu_xen[[]] # cell-level metadata, equivalent to seu_xen@meta.data
)
```

## Predict Cell Types

Run cell type prediction with the `RankMap()` function. When the optimal number of top-ranked genes is unknown, `k = 100` is a reasonable default. For `SummarizedExperiment`-based input (such as `SingleCellExperiment`), the `logcounts` assay is used automatically.

```{r run_rankmap_sce}
pred_df <- RankMap(
  ref_data = sce_sc,
  ref_labels = sce_sc$cell_type,
  new_data = sce_sp,
  k = 100
)
```

## Evaluate Performance

Compare the predictions with the ground-truth labels:

```{r pred_performance_sce}
perf <- evaluatePredictionPerformance(
  prediction_df = pred_df,
  truth = sce_sp$cell_type_SingleR
)
perf
```

# Session Info

```{r sessioninfo}
sessionInfo()
```

\pagebreak