For the examples in this book, we will rely on a set of publicly available datasets that cover different sequencing-based and imaging-based platforms, namely: Visium, Visium HD, Xenium (10x Genomics), and CosMx (Bruker).
This chapter provides an overview of the example datasets used in the code examples in the later chapters.
6.2 Distribution
6.2.1 OSF repository and OSTA.data
These datasets have been deposited in an Open Science Framework (OSF) repository, and can be queried and downloaded using functions from the osfr CRAN package. For convenience, we have implemented the OSTA.data Bioconductor package to list and retrieve datasets available through our OSF node.
Cache invalidation
We may occasionally change, add, or remove data in the OSF repository underlying OSTA.data. Should you run into any issues that might be related to this, we suggest removing the affected cache resource(s) as follows, and retrieving the data anew.
Code
library(BiocFileCache)
library(OSTA.data)
bfc <- BiocFileCache()
# specify dataset identifier
id <- "Xenium_HumanColon_Oliveira"
# query cached files for 'id'
que <- bfcquery(bfc, id)
# clear matching resource(s)
bfcremove(bfc, que$rid)
# retrieve current dataset
OSTA.data_load(id)
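As a minimal sketch of listing and retrieving a dataset with OSTA.data: only OSTA.data_load() and the dataset identifier appear elsewhere in this chapter. The listing function OSTA.data_list() and the assumption that the loader returns the path to a zip archive are not confirmed here and should be checked against the package documentation. Running this requires an internet connection, and the download may be large.

```r
library(OSTA.data)

# list dataset identifiers available on the OSF node
# (assumption: OSTA.data_list() is the listing counterpart of OSTA.data_load())
ids <- OSTA.data_list()
head(ids)

# retrieve one dataset; downloads are cached, so repeated calls reuse them
id <- "Xenium_HumanColon_Oliveira"
pa <- OSTA.data_load(id)

# assumption: 'pa' is the path to a compressed archive;
# unpack it into a fresh temporary directory
dir.create(td <- tempfile())
unzip(pa, exdir = td)
list.files(td)
```

The unpacked directory can then be passed to whichever reader function matches the platform.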
In addition, several datasets are available from the STexampleData Bioconductor package as pre-formatted SpatialExperiment and SingleCellExperiment objects. These objects are stored on Bioconductor’s ExperimentHub resource, and can be loaded in R by querying ExperimentHub or using the loader functions provided in the STexampleData package.
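The two routes for loading STexampleData objects might look as follows. This is a sketch only: the choice of Visium_humanDLPFC() as the loader is ours for illustration, and both routes need an internet connection the first time they run.

```r
library(ExperimentHub)
library(STexampleData)

# route 1: query ExperimentHub for records belonging to STexampleData
eh <- ExperimentHub()
qu <- query(eh, "STexampleData")
qu

# route 2: call a loader function from the package directly
# (Visium_humanDLPFC() is chosen here purely as an example)
spe <- Visium_humanDLPFC()
spe
```

Records returned by the query can be retrieved individually by their ExperimentHub identifier, while the loader function returns the formatted object in one step.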
Data files downloaded with the packages above are stored and managed in a local cache directory using BiocFileCache. Sometimes, these cached files may need to be deleted manually, for example if there has been a recent change to files stored in the OSF repository. The code example below shows how to remove the cached files for one of these datasets. Alternatively, you can find the cache directory on your system with BiocFileCache::BiocFileCache() and delete files individually.
Code
# locate and delete files in BiocFileCache directory
id <- "VisiumHD_HumanColon_Oliveira"
bfc <- BiocFileCache::BiocFileCache()
qid <- BiocFileCache::bfcquery(bfc, id)$rid
BiocFileCache::bfcremove(bfc, qid)
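To see what these calls do without touching your real cache, the same query-and-remove pattern can be tried on a throwaway cache. In this sketch, the cache location, the placeholder file, and the name "Demo_Dataset" are all hypothetical and serve only as an illustration.

```r
library(BiocFileCache)

# create a throwaway cache so the default one is left untouched
bfc <- BiocFileCache(file.path(tempdir(), "demo-cache"), ask = FALSE)

# register a small local file under a dataset-like name (hypothetical)
src <- tempfile(fileext = ".txt")
writeLines("placeholder", src)
bfcadd(bfc, rname = "Demo_Dataset", fpath = src, action = "copy")

# directory on disk that holds the cached files
bfccache(bfc)

# one row per cached resource: identifier, name, and file path
bfcinfo(bfc)[, c("rid", "rname", "rpath")]

# query by name, remove the matching resource, and confirm it is gone
hit <- bfcquery(bfc, "Demo_Dataset")
bfcremove(bfc, hit$rid)
n_left <- nrow(bfcquery(bfc, "Demo_Dataset"))
n_left
```

Calling bfccache() on the default BiocFileCache() object in the same way gives the directory to inspect if you prefer to delete files by hand.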
6.3 Datasets
Below, we briefly summarize the characteristics of several key datasets, and note where in the book they are used.
6.3.1 HumanBreast_Janesick
In the underlying paper (Janesick et al. 2023), the Xenium data (2 replicates) were accompanied by Chromium and Visium data from consecutive slices. Because the slices are consecutive, the replicates are expected to yield nearly identical biological findings. By transferring Chromium cell type labels to the spatial technologies, such as Visium (with full transcriptome) and Xenium (at single-cell resolution), we can combine analytical insights from different platforms.
6.3.2 HumanColon_Oliveira
The underlying paper (de Oliveira et al. 2025) includes both normal adjacent tissue (NAT) and colorectal carcinoma (CRC) samples from 5 patients. The Visium HD data (P2 CRC) were accompanied by consecutive slices of Chromium, Visium, and Xenium data. Therefore, we can jointly analyze these modalities.
de Oliveira, Michelli Faria, Juan Pablo Romero, Meii Chung, Stephen R. Williams, Andrew D. Gottscho, Anushka Gupta, Susan E. Pilipauskas, et al. 2025. “High-Definition Spatial Transcriptomic Profiling of Immune Cell Populations in Colorectal Cancer.” Nature Genetics 57: 1512–23. https://doi.org/10.1038/s41588-025-02193-3.
Janesick, Amanda, Robert Shelansky, Andrew D. Gottscho, Florian Wagner, Stephen R. Williams, Morgane Rouault, Ghezal Beliakoff, et al. 2023. “High Resolution Mapping of the Tumor Microenvironment Using Integrated Single-Cell, Spatial and in Situ Analysis.” Nature Communications 14 (8353). https://doi.org/10.1038/s41467-023-43458-x.