#+SETUPFILE: orgsetup.org

* Annotation resources - =ensembldb=

*CSAMA2019*

*Johannes Rainer* (Eurac Research, Italy)
johannes.rainer@eurac.edu
github: /jorainer/ twitter: /jo_rainer/

** Annotation of genomic features

+ Annotations for genomic features (genes, transcripts, exons) provided by
  =TxDb= (=GenomicFeatures=) and =EnsDb= (=ensembldb=) databases.
+ =EnsDb=:
  - Designed for Ensembl-based annotations.
  - One database per species and Ensembl release.
+ Extract data using methods:
  - =genes=
  - =transcripts=
  - =exons=
  - =txBy=
  - =exonsBy=
  - ... 
+ Results returned as =GRanges=, =GRangesList= or =DataFrame=.
+ _Example_: get all gene annotations from an =EnsDb=:
  #+BEGIN_SRC R
    ## Load the database for human genes, Ensembl release 86.
    library(EnsDb.Hsapiens.v86)
    edb <- EnsDb.Hsapiens.v86

    ## Get all genes from the database.
    gns <- genes(edb)

    gns
  #+END_SRC

** =AnnotationFilter=: basic classes for filtering annotation resources

+ Extracting the full data not always required: filter databases.
+ =AnnotationFilter= defines *concepts* for filtering data resources.
+ One filter class for each annotation type/database column.
+ _Example_: create filters
  #+BEGIN_SRC R
    ## Create filter using the constructor function
    GeneNameFilter("BCL2", condition = "!=")

    ## Create using a filter expression
    AnnotationFilter(~ gene_name != "BCL2")

    ## Combine filters
    AnnotationFilter(~ seq_name == "X" & gene_biotype == "lincRNA")
  #+END_SRC

** Filtering =EnsDb= databases

+ _Example_: what filters can we use?
  #+BEGIN_SRC R
    ## List all supported filters by an EnsDb
    supportedFilters(edb)
  #+END_SRC
+ Provide filter(s) with the =filter= parameter or use the =filter= function to
  subset the data resource.
+ _Example_: get all transcripts for the gene /BCL2/.
  #+BEGIN_SRC R
    ## Get all transcripts for BCL2
    txs <- transcripts(edb, filter = ~ gene_name == "BCL2")
    txs

    ## Combine filters: only protein coding tx for the gene
    txs <- transcripts(edb, filter = ~ gene_name == "BCL2" &
                                tx_biotype == "protein_coding")
    txs

    ## For the pipe lovers:
    library(magrittr)
    txs <- edb %>%
        filter(~ gene_name == "BCL2" & tx_biotype == "protein_coding") %>%
        transcripts
    txs
  #+END_SRC
  
** Additional =ensembldb= capabilities

+ =EnsDb= contain also protein annotations:
  - Protein sequence.
  - Mapping of transcripts to proteins.
  - Annotation to Uniprot accessions.
  - Annotation of all protein domains within the protein sequences.
+ Functionality to map coordinates:
  - =genomeToTranscript=
  - =genomeToProtein=
  - =transcriptToGenome=
  - =transcriptToProtein=
  - =proteinToGenome=
  - =proteinToTranscript=

** Where to find =EnsDb= databases?

+ =AnnotationHub=!
+ _Example_: list =EnsDb= databases in =AnnotationHub=
  #+BEGIN_SRC R
    library(AnnotationHub)
    ah <- AnnotationHub()
    query(ah, "EnsDb")
  #+END_SRC

** Finally...

*Thank you for your attention!*