%\VignetteIndexEntry{The GenomeGraphs users guide}
%\VignetteDepends{GenomeGraphs}
%\VignetteKeywords{Visualization}
%\VignettePackage{GenomeGraphs}
\documentclass[11pt]{article}
\usepackage{hyperref}
\usepackage{url}
\usepackage[authoryear,round]{natbib}
\bibliographystyle{plainnat}

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}

<<echo=FALSE>>=
options(width=50)
@ 

\author{Steffen Durinck\footnote{steffen@stat.berkeley.edu} and James
  Bullard\footnote{bullard@stat.berkeley.edu}}
\begin{document}
\title{The GenomeGraphs user's guide}

\maketitle

\tableofcontents
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

Genomic data analyses can benifit from integrated visualization of the
genomic information.  The GenomeGraphs package uses the biomaRt
package to do live queries to Ensembl and translates
e.g. gene/transcript structures to viewports of the grid graphics
package, resulting in genomic information plotted together with your
data.  Possible genomics datasets that can be plotted are: Array CGH
data, gene expression data and sequencing data.

<<>>=
library(GenomeGraphs)
@

<<eval=FALSE, echo=FALSE>>=
require(biomaRt)
require(grid)
source("../../R/GenomeGraphs-classes.R")
source("../../R/GenomeGraphs-methods.R")
source("../../R/Overlay.R")
source("../../R/GenomeGraphs.R")
@ 

\section{Creating a Ensembl annotation graphic}

To create an Ensembl annotation graphic, you need to decide what you
want to plot.  Genes and transcripts can be plotted individually using
the \Robject{Gene} and \Robject{Transcript} objects respectively.  Or
one can plot a gene region the forward strand or reverse strand only
or both.  In this section we will cover these different graphics.

\subsection{Plotting a Gene}

If one wants to plot annotation information from Ensembl then you need
to connect to the Ensembl BioMart database using the useMart function
of the biomaRt package.

<<>>=
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
@

Next we can retrieve the gene structure of the gene of interest.

<<fig=TRUE>>=
gene <- makeGene(id = "ENSG00000095203", type="ensembl_gene_id", biomart = mart)
gdPlot(gene) 
@

\subsection{Adding alternative transcripts}

To add alternative transcripts you first have to create a \Robject{Transcript} object.
Note that the order of the objects in the list determines the order in the plot.
<<fig=TRUE>>=
transcript <- makeTranscript(id = "ENSG00000095203", type="ensembl_gene_id", biomart = mart)
gdPlot(list(gene, transcript))
@  

\subsection{Plotting a gene region}

If you're interested in not just plotting one gene but a whole gene
region the you should create a \Robject{GeneRegion} object.  Note that
a \Robject{GeneRegion} object is strand specific.  In the example
below we will retrieve the genes on the forward (+) strand only and
add a genomic axis as well to give us the base positions.

<<fig=TRUE>>=
plusStrand <- makeGeneRegion(chromosome = 17, start = 30450000, end = 30550000, strand = "+", biomart = mart)
genomeAxis <- makeGenomeAxis(add53 = TRUE)
gdPlot(list(genomeAxis, plusStrand))
@

Let's now add the genes on the negative strand as well and an ideogram
of chromosome 17, highlighting the region we are looking at.

<<fig=TRUE>>=
minStrand <- makeGeneRegion( chromosome = 17, start = 30450000, end = 30550000, strand = "-", biomart = mart)
ideogram <- makeIdeogram(chromosome = 17)
genomeAxis <- makeGenomeAxis(add53=TRUE, add35=TRUE)
gdPlot(list(ideogram, plusStrand, genomeAxis, minStrand), minBase = 30450000, maxBase =  30550000)
@

\section{Adding Array data to the plot}

\subsection{Array CGH and gene expression array data}
The \Robject{Generic Array} object enables plotting of expression and
CGH array data together with segments if available.  The array
intensity data should be given as a matrix, with in the rows t he
different probes and in the columns the different samples.  For each
probe the start location should be given using the probeStart
argument.  This should be a one column matrix.  Lets load some dummy
data.

<<fig=TRUE>>=
data("exampleData", package="GenomeGraphs")

minbase <- 180292097 
maxbase <- 180492096

genesplus <- makeGeneRegion(start = minbase, end = maxbase, 
                            strand = "+", chromosome = "3", biomart=mart)
genesmin <- makeGeneRegion(start = minbase, end = maxbase, 
                           strand = "-", chromosome = "3", biomart=mart)

seg <- makeSegmentation(segStart[[1]], segEnd[[1]], segments[[1]], 
                        dp = DisplayPars(color = "black", lwd=2,lty = "solid"))

cop <- makeGenericArray(intensity  = cn, probeStart = probestart, 
                        trackOverlay =  seg, dp = DisplayPars(size=3, color = "seagreen", type="dot"))

ideog <- makeIdeogram(chromosome = 3)
expres <- makeGenericArray(intensity = intensity, probeStart = exonProbePos, 
                           dp = DisplayPars(color="darkred", type="point"))
genomeAxis <- makeGenomeAxis(add53 = TRUE, add35=TRUE)

gdPlot(list(a=ideog,b=expres,c=cop,d=genesplus,e=genomeAxis,f=genesmin), 
       minBase = minbase, maxBase =maxbase, labelCex = 2)
@ 

\subsection{Exon array data}

The example below plots probe level exon array data and is usefull in
relating alternative splicing with known transcript structures.

<<fig=TRUE>>=
data("unrData", package="GenomeGraphs")

title <- makeTitle(text ="ENSG00000009307", color = "darkred")
exon <- makeExonArray(intensity = unrData, probeStart = unrPositions[,3], 
            probeEnd=unrPositions[,4], probeId = as.character(unrPositions[,1]), 
            nProbes = unrNProbes, dp = DisplayPars(color = "blue", mapColor = "dodgerblue2"), 
            displayProbesets=FALSE)
affyModel.model <- makeGeneModel(start = unrPositions[,3], end = unrPositions[,4])
affyModel <- makeAnnotationTrack(start = unrPositions[,3], end = unrPositions[,4],
                                 feature = "gene_model", group = "ENSG00000009307", 
                                 dp = DisplayPars(gene_model = "darkblue"))

gene <- makeGene(id = "ENSG00000009307", biomart = mart)


transcript <- makeTranscript( id ="ENSG00000009307" , biomart = mart)
legend <- makeLegend(c("affyModel","gene"), fill = c("darkgreen","orange"))
rOverlay <- makeRectangleOverlay(start = 115085100, end = 115086500, region = c(3,5),  
                                 dp = DisplayPars(alpha = .2, fill = "olivedrab1"))
gdPlot(list(title, exon, affyModel, gene, transcript, legend), 
       minBase = 115061061, maxBase=115102147, overlay = rOverlay)
@

\subsection{Plotting Conservation Data}
The UCSC genome browser offers downloadable conservation data for a
variety of species. Here we show how you can plot that conservation
data along with annotation. 

<<fig=TRUE>>=
yeastMart <- useMart("ensembl", dataset = "scerevisiae_gene_ensembl")
minB <- 10000
maxB <- 20000

chrRoman <- as.character(as.roman(1))
grP <- makeGeneRegion(start = minB, end = maxB, strand = "+", 
                      chromosome = chrRoman, biomart = yeastMart)
grM <- makeGeneRegion(start = minB, end = maxB, strand = "-", 
                      chromosome = chrRoman, biomart = yeastMart)
gaxis <- makeGenomeAxis(add53 = TRUE, add35 = TRUE)
conserv <- yeastCons1[yeastCons1[,1] > minB & yeastCons1[,1] < maxB, ]

s1 <- makeSmoothing(x = lowess(conserv[,1], conserv[,2], f = .01)$x,
                    y = lowess(conserv[,1], conserv[,2], f = .01)$y, 
                    dp = DisplayPars(lwd = 3, color = "green"))
s2 <- makeSmoothing(x = lowess(conserv[,1], conserv[,2], f = .1)$x,
                    y = lowess(conserv[,1], conserv[,2], f = .1)$y, 
                    dp = DisplayPars(lwd = 3, color = "purple"))

consTrack <- makeBaseTrack(base = conserv[, 1], value = conserv[,2],
                           dp = DisplayPars(lwd=.2, ylim = c(0, 1.25), 
                           color = "darkblue"), trackOverlay = list(s1, s2))
gdPlot(list(grP, gaxis, grM, "conservation" = consTrack)) 
@ 


\section{Odds and Ends}
In addition to plotting the genes we can enable the plotting of names of genes. 

<<fig=TRUE>>=
plotGeneRegion <- function(chr = 1, minB = 9000, maxB = 13000, rot = 0, col = "green") {
  chrRoman <- as.character(as.roman(1:17)[chr])
  grP <- makeGeneRegion(start = minB, end = maxB, 
                        strand = "+", chromosome = chrRoman, biomart = yeastMart, 
                        dp = DisplayPars(plotId = TRUE, idRotation = rot, 
                        idColor = col))
  gaxis <- makeGenomeAxis( add53 = TRUE, add35 = TRUE, littleTicks = FALSE)
  gdPlot(list(grP, gaxis), minBase = minB, maxBase = maxB)
}
plotGeneRegion(col = "yellow", rot=90)
@ 

Finally, if you are interested in seeing how things look you can just
plot the object without the list, or without the \emph{minBase}, \emph{maxBase}
arguments. 
<<fig=TRUE>>=
gdPlot(makeGeneRegion(start = 9000, end = 15000, biomart = yeastMart,
                      strand = "-", chromosome = "I", 
                      dp = DisplayPars(plotId=TRUE)))
@ 

\subsection{Overlays} 
\texttt{Overlays} can be used to annotate different regions of the
plot. Currently, we can draw boxes and write text on the plot.

<<echo=TRUE, fig=TRUE>>=
ga <- makeGenomeAxis()
grF <- makeGeneRegion(start = 10000, end = 20000, chromosome = "I", strand = "+", biomart = yeastMart)
grR <- makeGeneRegion(start = 10000, end = 20000, chromosome = "I", strand = "-", biomart = yeastMart)
bt <- makeBaseTrack(base = yeastCons1[,1], value = yeastCons1[,2])
hr1 <- makeRectangleOverlay(start = 11000, end = 13000)
hr2 <- makeRectangleOverlay(start = 15900, end = 16500)
gdPlot(list(grF, ga, grR, bt), overlays = list(hr1, hr2))
@ 

A little nifty feature is to allow alpha blending to make things
slightly transparent. If the device you wish to plot on however, does
not support transparency then you will get a warning. 

<<echo=TRUE,fig=TRUE>>=
ro <- makeRectangleOverlay(start = 11000, end = 13000, region = c(1,3), 
          dp = DisplayPars(color = "green", alpha = .3))
to <- makeTextOverlay("here is some text", xpos = 15000, ypos = .95)
gdPlot(list(grF, ga, grR, bt), overlay = c(ro, to))
@ 

Also, one can use "absolute" coordinates to specify a region just in
case one wants to be a bit more precise. 
<<fig=TRUE>>=
roR <- makeRectangleOverlay(start = .1, end = .3, coords = "absolute",
                            dp = DisplayPars(fill = "grey", alpha = .2, lty = "dashed"),
                            region = c(.4,.7))
gdPlot(list(grF, ga, grR, bt), overlays = list(ro, roR))
@ 

\subsection{GenomeGraphs Classes}
\begin{table}[bp!]
    \begin{center}
      \begin{tabular}{@{}cp{8cm}@{}}
        \hline
        class & description \\
        \hline
        \texttt{gdObject} &  the root class of the system, never directly instantiated \\
        \texttt{Gene} &  class representing a gene \\
        \texttt{GeneRegion} & class defining a region of a
        chromsome, generally a set of genetic elements (genes) \\
        \texttt{Transcript} & class defining a transcript \\
        \texttt{TranscriptRegion} &  class defining a region of a chromsome, generally a set of genetic elements (transcripts)\\
        \texttt{Ideogram} &  an ideogram \\
        \texttt{Title} &  class to draw a title  \\
        \texttt{Legend} &  class to draw a legend  \\
        \texttt{GenomeAxis} &  class to draw a axis \\
        \texttt{Segmentation} &  class to draw horizontal lines in various sets of data \\
        \texttt{GenericArray} &  class to draw data from microarrays. \\
        \texttt{ExonArray} &  class to draw data from exon microarrays. \\
        \texttt{GeneModel} & class to draw custom gene models (intron-exon structures)  \\
        \texttt{BaseTrack} &  class to draw whatever kind of data at a given base \\
        \texttt{MappedRead} &  class to plot sequencing reads that are mapped to the genome \\
        \texttt{DisplayPars} & class managing various plotting parameters \\
        \texttt{AnnotationTrack} & class used to represent custom annotation \\
        \texttt{Overlay} & root class for overlays, never directly instantiated \\
        \texttt{RectangleOverlay} & class to represent rectangular regions of interest \\
        \texttt{TextOverlay} & class to draw text on plots \\
        \hline
      \end{tabular}
    \end{center}
  \end{table}

<<echo=TRUE, eval=TRUE, fig=TRUE>>=
data("seqDataEx", package = "GenomeGraphs")
str = seqDataEx$david[,"strand"] == 1
biomart = useMart("ensembl", "scerevisiae_gene_ensembl")
pList = list("-" = makeGeneRegion(chromosome = "IV", start = 1300000, end = 1310000, 
                                   strand = "-", biomart = biomart, 
                                   dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = .5)),
              makeGenomeAxis(dp = DisplayPars(size = 3)),
              "+" = makeGeneRegion(chromosome = "IV", start = 1300000, end = 1310000, 
                                   strand = "+", biomart = biomart, 
                                   dp = DisplayPars(plotId = TRUE, idRotation = 0, cex = .5)),
              "Nagalakshmi" = makeBaseTrack(base = seqDataEx$snyder[, "location"], value = seqDataEx$snyder[, "counts"], 
                                            dp = DisplayPars(lwd = .3, color = "darkblue", ylim = c(0,300))),
              "David +" = makeGenericArray(probeStart = seqDataEx$david[str, "location"], 
                                           intensity = seqDataEx$david[str, "expr", drop=FALSE], 
                                           dp = DisplayPars(pointSize = .5)),
              "David -" = makeGenericArray(probeStart = seqDataEx$david[!str, "location"], 
                                           intensity = seqDataEx$david[!str, "expr", drop=FALSE], 
                dp = DisplayPars(color = "darkgreen", pointSize = .5)),
              "Lee" = makeBaseTrack(base = seqDataEx$nislow[, "location"], 
                                    value = seqDataEx$nislow[, "evalue"], dp = DisplayPars(color="grey", lwd=.25)),
              "Conservation" = makeBaseTrack(base = seqDataEx$conservation[, "location"], 
                                             value = seqDataEx$conservation[, "score"], 
                                             dp = DisplayPars(color="gold4", lwd=.25)))

gdPlot(pList, minBase = 1301500, maxBase = 1302500, 
       overlay = makeRectangleOverlay(start = 1302105, end = 1302190, region = c(4,8),  dp = DisplayPars(alpha = .2)))
@ 

We can also employ different plotting types for the BaseTrack
object.
<<echo=TRUE, eval=TRUE, fig=TRUE>>=
setPar(pList$Lee@dp, "type", "h")
setPar(pList$Lee@dp, "color", "limegreen")
setPar(pList$Lee@dp, "lwd", 2)

gdPlot(pList, minBase = 1301500, maxBase = 1302500, 
       overlay = makeRectangleOverlay(start = 1302105, end = 1302190, region = c(4,8),  
       dp = DisplayPars(alpha = .2)))
@ 

\end{document}