Retinal Organoid Count Data: Gene- and Isoform-Level Exploration

Required packages

library(HumanRetinaLRSData)
library(SummarizedExperiment)
library(ggplot2)
library(ggrepel)

Overview

HumanRetinaLRSData is a companion data package for our long-read sequencing study characterising isoform diversity, splicing dynamics, and allele-specific expression in developing human retinal organoids.

In this study we used nanopore-based long-read RNA sequencing to profile human stem cell-derived retinal organoids across multiple developmental stages, as well as purified retinal ganglion cells. These datasets enable exploration of gene- and isoform-level expression patterns, detection of differential isoform usage, identification of neuron-specific splicing programs, and quantification of allelic imbalance.

All raw and processed data generated in the study have been deposited in an Open Science Framework (OSF) repository. The HumanRetinaLRSData package provides convenient access to these files directly from the OSF node using the osfr package.

For ease of use, HumanRetinaLRSData is designed to:

Load data

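# Gene-level data; accessed below with assay() and colData()
se_gene     <- ROGeneLevelData()
# Isoform-level data for the same analysis at transcript resolution
se_isoform  <- ROIsoformLevelData()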
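Before plotting, it can help to confirm that the loaded objects have the shape the later code relies on. The sketch below uses a toy SummarizedExperiment rather than the real data, so every feature name, sample name, and value in it is invented for illustration; only the assay name "cpm" and the column name "stage" are taken from the code in this vignette.

```r
library(SummarizedExperiment)

# Toy object standing in for the output of ROGeneLevelData();
# feature names, sample names and values are invented for illustration.
toy_cpm <- matrix(
  c(10, 0, 5, 20, 3, 8),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)
toy_se <- SummarizedExperiment(
  assays  = list(cpm = toy_cpm),
  colData = DataFrame(
    stage     = c("stage 1", "stage 2", "stage 3"),
    row.names = colnames(toy_cpm)
  )
)

dim(toy_se)                # features x samples
assayNames(toy_se)         # the plots below expect an assay called "cpm"
colnames(colData(toy_se))  # and a sample annotation column called "stage"

# Fail early if either is missing.
stopifnot(
  "cpm"   %in% assayNames(toy_se),
  "stage" %in% colnames(colData(toy_se))
)
```

The same three accessor calls can be run on se_gene and se_isoform in place of toy_se.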

Visualise gene-level expression

We perform PCA on log2(CPM + 1) values, scaling each gene to unit variance, and colour samples by developmental stage.

stage_colors <- c(
  "stage 1" = "orange",
  "stage 2" = "seagreen",
  "stage 3" = "purple"
)

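# CPM matrix (genes x samples); prcomp() expects samples in rows, hence t()
expr_gene <- assay(se_gene, "cpm")
pca_gene  <- prcomp(t(log2(expr_gene + 1)), scale. = TRUE)
# Percentage of total variance explained by each principal component
var_gene  <- round(
  pca_gene$sdev^2 / sum(pca_gene$sdev^2) * 100, 1
)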

pca_df_gene <- data.frame(
  PC1    = pca_gene$x[, 1],
  PC2    = pca_gene$x[, 2],
  sample = colnames(expr_gene),
  stage  = colData(se_gene)[["stage"]]
)

ggplot(pca_df_gene, aes(x = PC1, y = PC2, color = stage)) +
  geom_point(size = 3) +
  geom_label_repel(
    aes(label = sample),
    size = 2.5, show.legend = FALSE
  ) +
  scale_color_manual(values = stage_colors, name = "Stage") +
  labs(
    title = "PCA of Gene-Level Expression",
    x     = paste0("PC1 (", var_gene[1], "%)"),
    y     = paste0("PC2 (", var_gene[2], "%)")
  ) +
  theme_bw(base_size = 12)
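One practical caveat with prcomp(..., scale. = TRUE) is that it stops with an error when a feature has the same value in every sample, for example a gene with zero CPM throughout. Whether this applies to the packaged data is not stated here, so the following is only a minimal sketch on simulated values (the matrix, its dimensions, and the object names are all hypothetical) showing how such features could be dropped first.

```r
# Simulated CPM matrix: 200 hypothetical features x 6 hypothetical samples.
set.seed(1)
cpm <- matrix(
  rexp(200 * 6, rate = 0.01),
  nrow = 200,
  dimnames = list(paste0("feature", 1:200), paste0("sample", 1:6))
)
cpm[1:5, ] <- 0  # features never detected: constant after log2(x + 1)

log_cpm <- log2(cpm + 1)

# prcomp(scale. = TRUE) cannot rescale constant columns,
# so drop zero-variance features before transposing.
keep <- apply(log_cpm, 1, var) > 0
pca  <- prcomp(t(log_cpm[keep, ]), scale. = TRUE)

# Percentage of variance explained per component.
var_explained <- round(pca$sdev^2 / sum(pca$sdev^2) * 100, 1)
var_explained
```

Filtering on variance after the log transform, rather than on raw CPM, removes exactly the features that would make the scaling step fail.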

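Visualise isoform-level expression

The same PCA is applied to the isoform-level CPM values, reusing the stage colours defined above.

# CPM matrix (isoforms x samples), transposed so that samples are rows
expr_iso <- assay(se_isoform, "cpm")
pca_iso  <- prcomp(t(log2(expr_iso + 1)), scale. = TRUE)
# Percentage of total variance explained by each principal component
var_iso  <- round(
  pca_iso$sdev^2 / sum(pca_iso$sdev^2) * 100, 1
)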

pca_df_iso <- data.frame(
  PC1    = pca_iso$x[, 1],
  PC2    = pca_iso$x[, 2],
  sample = colnames(expr_iso),
  stage  = colData(se_isoform)[["stage"]]
)

ggplot(pca_df_iso, aes(x = PC1, y = PC2, color = stage)) +
  geom_point(size = 3) +
  geom_label_repel(
    aes(label = sample),
    size = 2.5, show.legend = FALSE
  ) +
  scale_color_manual(values = stage_colors, name = "Stage") +
  labs(
    title = "PCA of Isoform-Level Expression",
    x     = paste0("PC1 (", var_iso[1], "%)"),
    y     = paste0("PC2 (", var_iso[2], "%)")
  ) +
  theme_bw(base_size = 12)
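The gene-level and isoform-level sections run the same steps on different matrices, so the shared part could be wrapped in a small function. The helper below is a sketch only: pca_scores is a hypothetical name, not a function exported by HumanRetinaLRSData, and it is demonstrated on simulated values.

```r
# Hypothetical helper: scores for the first two principal components plus
# the percentage of variance explained by each component.
pca_scores <- function(cpm, stage) {
  stopifnot(ncol(cpm) == length(stage))
  pca <- prcomp(t(log2(cpm + 1)), scale. = TRUE)
  list(
    scores = data.frame(
      PC1    = pca$x[, 1],
      PC2    = pca$x[, 2],
      sample = colnames(cpm),
      stage  = stage
    ),
    var_explained = round(pca$sdev^2 / sum(pca$sdev^2) * 100, 1)
  )
}

# Simulated input: 50 hypothetical features x 6 hypothetical samples.
set.seed(42)
toy_cpm <- matrix(
  rexp(50 * 6, rate = 0.01),
  nrow = 50,
  dimnames = list(paste0("feature", 1:50), paste0("sample", 1:6))
)
toy_stage <- rep(c("stage 1", "stage 2", "stage 3"), each = 2)

res <- pca_scores(toy_cpm, toy_stage)
head(res$scores)
```

With the real objects, the call would be pca_scores(assay(se_isoform, "cpm"), colData(se_isoform)[["stage"]]), and res$scores can be passed to ggplot() in the same way as pca_df_iso.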

Interpretation

The PCA plots reveal:

Session information

sessionInfo()
## R version 4.6.0 alpha (2026-04-05 r89794)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggrepel_0.9.8               ggplot2_4.0.2              
##  [3] HumanRetinaLRSData_0.99.5   SummarizedExperiment_1.41.1
##  [5] Biobase_2.71.0              GenomicRanges_1.63.2       
##  [7] Seqinfo_1.1.0               IRanges_2.45.0             
##  [9] S4Vectors_0.49.1-1          BiocGenerics_0.57.0        
## [11] generics_0.1.4              MatrixGenerics_1.23.0      
## [13] matrixStats_1.5.0          
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6        xfun_0.57           bslib_0.10.0       
##  [4] httr2_1.2.2         lattice_0.22-9      vctrs_0.7.3        
##  [7] tools_4.6.0         curl_7.0.0          tibble_3.3.1       
## [10] RSQLite_2.4.6       blob_1.3.0          pkgconfig_2.0.3    
## [13] Matrix_1.7-5        dbplyr_2.5.2        RColorBrewer_1.1-3 
## [16] S7_0.2.1-1          lifecycle_1.0.5     compiler_4.6.0     
## [19] farver_2.1.2        htmltools_0.5.9     sass_0.4.10        
## [22] yaml_2.3.12         pillar_1.11.1       jquerylib_0.1.4    
## [25] DelayedArray_0.37.1 cachem_1.1.0        abind_1.4-8        
## [28] tidyselect_1.2.1    digest_0.6.39       stringi_1.8.7      
## [31] dplyr_1.2.1         purrr_1.2.2         labeling_0.4.3     
## [34] fastmap_1.2.0       grid_4.6.0          cli_3.6.6          
## [37] SparseArray_1.11.13 magrittr_2.0.5      S4Arrays_1.11.1    
## [40] triebeard_0.4.1     dichromat_2.0-0.1   crul_1.6.0         
## [43] withr_3.0.2         osfr_0.2.9          filelock_1.0.3     
## [46] scales_1.4.0        rappdirs_0.3.4      bit64_4.6.0-1      
## [49] rmarkdown_2.31      XVector_0.51.0      httr_1.4.8         
## [52] bit_4.6.0           otel_0.2.0          memoise_2.0.1      
## [55] evaluate_1.0.5      knitr_1.51          BiocFileCache_3.1.0
## [58] urltools_1.7.3.1    rlang_1.2.0         Rcpp_1.1.1-1       
## [61] glue_1.8.0          DBI_1.3.0           httpcode_0.3.0     
## [64] jsonlite_2.0.0      R6_2.6.1            fs_2.0.1