---
title: "CopyNumberPlots: create copy-number specific plots using karyoploteR "
author: "Bernat Gel (bgel@igtp.cat); Miriam Magallon (mmagallon@igtp.cat)"
date: "`r doc_date()`"
package: "`r pkg_ver('CopyNumberPlots')`"
output: 
  BiocStyle::html_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{CopyNumberPlots vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}  
---


```{r, include=FALSE}
library(knitr)
opts_chunk$set(concordance=FALSE)
knitr::opts_chunk$set(fig.width = 18)
knitr::opts_chunk$set(fig.height = 12)
set.seed(21666641)
```


# Introduction

Data visualisation is a powerful tool used for data analysis and exploration in 
many fields. Genomics data analysis is one of these fields where good 
visualisation tools can be of great help. 
The aim of `r BiocStyle::Biocpkg("CopyNumberPlots")` is to offer the user an 
easy way to create copy-number related plots using the infrastructure provided
by the R package `r BiocStyle::Biocpkg("karyoploteR")`.

In addition to a set of specialized plotting functions for copy-number analysis
data and results (`plotBAF`, `plotCopyNumberCalls`, ...), 
`r BiocStyle::Biocpkg("CopyNumberPlots")` contains a number of data loading 
functions to help parsing and loading the results of widely used 
copy-number calling software such as `r BiocStyle::Biocpkg("DNAcopy")`, 
[DECoN](https://wellcomeopenresearch.org/articles/1-20/v1) or
[CNVkit](https://cnvkit.readthedocs.io/en/stable/).

Finally, since `r BiocStyle::Biocpkg("CopyNumberPlots")` extends the 
functionality of `r BiocStyle::Biocpkg("karyoploteR")`, it is possible 
to combine the plotting functions of both packages to get the perfect
figure for your data.

# Installation

`r BiocStyle::Biocpkg("CopyNumberPlots")` is a 
[Bioconductor](http://bioconductor.org) package and to install it we have
to use `r BiocStyle::Biocpkg("BiocManager")`.

```{r getPackage, eval=FALSE}
  if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
  BiocManager::install("CopyNumberPlots")
```

We can also install the package from github to get the latest devel version,
but beware that it might be incompatible with the release version of 
Bioconductor!

```` {r, eval = FALSE}
  BiocManager::install("bernatgel/CopyNumberPlots")
```



# Quick Start

To start working with `r BiocStyle::Biocpkg("CopyNumberPlots")` we will need
to use the `plotKaryoptype` function from `r BiocStyle::Biocpkg("karyoploteR")`.
If you want more information on how to customize it, use for other organisms
or genome version, etc... you can take a look at the
[karyoploteR tutorial](https://bernatgel.github.io/karyoploter_tutorial/) and 
specifically at the section on 
[how to plot ideograms](https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CreateIdeogram/CreateIdeogram.html).

For this quick start example we'll plot SNP-array data simulating a cancer 
genome. The data is in a file included with the package. You can use almost
any table-like file format, including the Final Report file you would get from
Illumina's Genome Studio. In this case, to keep the example small, we have 
data only for chomosome 1.

To load the data we'll use `loadSNPData` which will detect the right columns, 
read the data and build a GRanges object for us. 

If data uses Ensembl-style chromosome names (1,2,3,...,X,Y) instead of 
default karyoploteR UCSC chromosome names (chr1,chr2,chr3,...,chrX,chrY)
we could change the chromosome style to UCSC with the function `UCSCStyle`.

```{r}
  library(CopyNumberPlots)

  s1.file <- system.file("extdata", "S1.rawdata.txt", package = "CopyNumberPlots", mustWork = TRUE)
  s1 <- loadSNPData(s1.file)
  s1
```




Once we have our data loaded we can start plotting. We'll start by creating 
a karyoplot using `plotKaryotype`. If we were plotting more than one 
chromosome, we could use `plot.type=4` to get all chromosomes in a single line 
one next to the other. You can get more information on the available plot types
at 
[the karyoploteR tutorial](https://bernatgel.github.io/karyoploter_tutorial//Tutorial/PlotTypes/PlotTypes.html).

```{r, fig.wide=TRUE}
  kp <- plotKaryotype(chromosomes="chr1")
```

And once we have a karyoplot we can start adding out data. We can plot the 
B-allele frequency using `plotBAF`

```{r, fig.wide=TRUE}
  kp <- plotKaryotype(chromosomes="chr1")
  plotBAF(kp, s1)
```

We can plot LRR using `plotLRR`

```{r, fig.wide=TRUE}
  kp <- plotKaryotype(chromosomes="chr1")
  plotLRR(kp, s1)
```

And we can see in this plot that points with a LRR below -4 (and above 2) are 
plotted in red at -4 (and at 2) so we don't lose them. 

We can also use the 
[data positioning](https://bernatgel.github.io/karyoploter_tutorial//Tutorial/DataPositioning/DataPositioning.html) parameters `r0` and `r1` to add more than one data type 
on the same plot.

```{r, fig.wide=TRUE}
  kp <- plotKaryotype(chromosomes="chr1")
  plotBAF(kp, s1, r0=0.55, r1=1)
  plotLRR(kp, s1, r0=0, r1=0.45)
```

Finally, we can load a copy number calling made on this data and plot it.
To load the copy number calls in this file we can use the function
`loadCopyNumberCalls` that will read the data, identify the correct columns and 
create a GRanges object for us.

```{r}
  s1.calls.file <- system.file("extdata", "S1.segments.txt", package = "CopyNumberPlots", mustWork = TRUE)
  s1.calls <- loadCopyNumberCalls(s1.calls.file)
  s1.calls
```

And then use `plotCopyNumberCalls` to add them to the previous plot.

```{r, fig.wide=TRUE}
  kp <- plotKaryotype(chromosomes="chr1")
  plotBAF(kp, s1, r0=0.6, r1=1)
  plotLRR(kp, s1, r0=0.15, r1=0.55)
  plotCopyNumberCalls(kp, s1.calls, r0=0, r1=0.10)
```
 
With that the main functionality of  `r BiocStyle::Biocpkg("CopyNumberPlots")`
is covered. It is important to take into account that since we are extending
the functionality of `r BiocStyle::Biocpkg("karyoploteR")`, we can use all
karyoploteR functions to add more data and other data types into these plots.

In the following pages you will find more information on how to load 
data to use with `r BiocStyle::Biocpkg("CopyNumberPlots")`, how to create other 
plot types and how to customize them.


# Loading Copy-Number Data


The plotting functions in `r BiocStyle::Biocpkg("CopyNumberPlots")` expect
data to be in a `GRanges` with a few columns with specific names: 

* **baf** for B-allele frequency data (or similar) (real value between 0 and 1)
* **lrr** for log-ratio intensity data (or similar) (real value)
* **cn** for integer copy-number data  (integer value)
* **loh** for loss of heterozygosity (logical value)
* **segment.value** for values representing the copy-number called segment but
not necessarily the integer copy-number calls (real value)

You can create these structures yourself, but 
`r BiocStyle::Biocpkg("CopyNumberPlots")` has functions to help in loading
both raw data (mainly SNP-array and aCGH data) and copy-number calls.

## Load Raw Data

The main function to load raw data is `loadSNPData`. It will take either 
a file or an R object (`data.frame` or similar) and will load it, detect
the columns with the needed information (chromosome, position, log-ratio, 
B-allele frequency) based on the column names and build a `GRanges` object 
ready to use by the plotting functions.

```{r}
  raw.data.file <- system.file("extdata", "snp.data_test.csv", package = "CopyNumberPlots", mustWork=TRUE)
  snps <- loadSNPData(raw.data.file)
  snps
```

When run, the function will tell us the columns it identified and will proceed
load the data. To identify the columns it will internally use a set of 
regular expressions that work in most cases including on the 'Final Report' 
files created by Illumina's Genome Studio. If for any reason the automatic 
identification of the columns failed, it is possible to specify the exact column
names using the appropiate parameters (`chr.col`, `start.col`, `end.col`...).

<!-- In addition to that generic function, we have a specialized function to extract  -->
<!-- raw data from the results generated by the package  -->
<!-- `r BiocStyle::Biocpkg("DNAcopy")`. In this case, the function expects the  -->
<!-- the standard results object generated by the package, of class `DNAcopy`  -->
<!-- although it will also accept the object before segmentation, of class `CNA`. -->
<!-- It will return a list of GRanges with one GRanges per sample. -->

<!-- ```{r} -->
<!--   library(DNAcopy) -->
<!--   data(coriell) -->

<!--   CNA.object <- suppressWarnings(CNA(cbind(coriell$Coriell.05296), coriell$Chromosome, coriell$Position, data.type="logratio")) -->
<!--   CNA.object <-  smooth.CNA(CNA.object) -->
<!--   DNAcopy.results <- DNAcopy::segment(CNA.object) -->
<!--   DNAcopy.results -->
<!-- ``` -->

<!-- And to load it we simply need to call `loadSNPData.DNAcopy` -->

<!-- ```{r} -->
<!--   snps <- loadSNPData.DNAcopy(DNAcopy.results) -->
<!--   snps -->
<!-- ``` -->
  
<!-- ### Special case: BAF from VCF files -->

<!-- It is also possible to simulate BAF and LRR values from a VCF file, using the  -->
<!-- variant allele frequency. `r BiocStyle::Biocpkg("CopyNumberPlots")` includes -->
<!-- a function, `loadSNPDataFromVCF`, that will take a VCF file and build a GRanges -->
<!-- object with a baf column. If more than one sample is present in the file it -->
<!-- will return a list of GRanges with one GRanges per sample. -->

<!-- **Important:** Currently, only VCFs with an AD value. Computation of allele -->
<!-- frequency using other data is planned in the near future. -->

<!-- To load BAF-like data we need to call the function giving it the VCF file -->

<!-- ```{r} -->
<!--  vcf.file <- system.file("extdata", "example.vcf.gz", package = "CopyNumberPlots", mustWork = TRUE) -->
<!--  snps <- loadSNPDataFromVCF(vcf.file) -->
<!--  snps -->
<!-- ``` -->

<!-- And we can plot this using the same functions we used for SNP-array data -->

<!-- ```{r} -->
<!--  kp <- plotKaryotype(plot.type = 4) -->
<!--  plotBAF(kp, snps = snps, labels = names(snps)) -->
<!-- ``` -->


 
## Load Copy-Number Calls

Another set of functions included in the package are functions to load 
the results of copy-number calling algorithms, the copy number calls per se.
In this case we also have a generic function, `loadCopyNumberCalls`, and a
few functions specialized in specific copy-number calling packages. 

Again, the generic function can work with a file or an R object with a 
table-like structure and will try to discover the right columns itself. It will 
return a GRanges with the copy-number called segments and the optional columns
`cn` for integer copy-number values, `loh` for loss-of-heterozigosity regions and
`segment.value` for values computed for the segments (for example, mean value
of the probes in the segment).

As an example we will generate a "random" calling

```{r}
  cn.data <- toGRanges(c("chr14:66459785-86459774", "chr17:68663111-88866308",
                         "chr10:43426998-83426994", "chr3:88892741-120892733",
                         "chr2:12464318-52464316", "chrX:7665575-27665562"))
  
  cn.data$CopyNumberInteger <- sample(c(0,1,3,4), size = 6, replace = TRUE)
  cn.data$LossHetero <- cn.data$CopyNumberInteger<2
  
  cn.data
```

and load it 

```{r}
  cn.calls <- loadCopyNumberCalls(cn.data)
  cn.calls
```

we can see how the columns for cn and loh were correctly identified.

To plot this objet we can call, for example `plotCopyNumberCalls`.

```{r}
  kp <- plotKaryotype(plot.type = 1)
  plotCopyNumberCalls(kp, cn.calls = cn.calls)
```

  There are other specialized functions that will load either the R object
  produced by copy-number calling R packages or the files produced by 
  either R or external copy-number calling software.
  
  Currently there are specilized functions to load the data produced by:

* ASCAT
* DECoN
* DNAcopy
* pennCNV
* cnmops
* panel.cnmops
* CNVkit




<!-- ### ASCAT -->

<!-- ### cnmops -->

<!-- ### CNVkit -->

<!-- ### DECoN -->

<!-- ### DNAcopy -->

<!-- ### panel.cnmops -->

<!-- ### pennCNV -->


# Plotting Copy-Number Data

Once we have data loaded (or directly created by us) we can plot it.

There are two functions to plot raw data (`plotBAF` and `plotLRR`) and three
functions to plot the copy-number calls (`plotCopyNumberCalls`, 
`plotCopyNumberCallsAsLines` and `plotCopyNumberSummary`).

## Plotting Raw Data

  To demonstrate the raw-data plotting functions we'll use two example
  files included with the package
  
```{r}
  s1.file <- system.file("extdata", "S1.rawdata.txt", package = "CopyNumberPlots", mustWork = TRUE)
  s1 <- loadSNPData(s1.file)
  head(s1)
  
  s2.file <- system.file("extdata", "S2.rawdata.txt", package = "CopyNumberPlots", mustWork = TRUE)
  s2 <- loadSNPData(s2.file)
  head(s2)
```  

## plotBAF

To plot the B-Allele frequency (BAF) we'll use `plotBAF`. We'll start creating 
a karyoplot using
[karyoploteR's plotKaryotype](https://bernatgel.github.io/karyoploter_tutorial//Tutorial/CreateIdeogram/CreateIdeogram.html) and then add the BAF values into it.


```{r}
  kp <- plotKaryotype(chromosomes="chr1")
  plotBAF(kp, snps=s1)
```


We can change a number of parameters to alter the appearance of the plot.
We can activate and deactivate the axis and label, we can change the color, size
and glyph (shape) of the points, we can [use r0 and r1 alter the vertical 
position of the data](https://bernatgel.github.io/karyoploter_tutorial//Tutorial/DataPositioning/DataPositioning.html) and in general we can use any of the standard base R
plotting parameters.

```{r}
  kp <- plotKaryotype(chromosomes="chr1")
  plotBAF(kp, snps=s1, r0=0, r1=0.2, labels = "BAF1", points.col = "orange",
          points.cex = 2, points.pch = 4, axis.cex = 0.3)
  plotBAF(kp, snps=s1, r0=0.3, r1=0.5, labels = "BAF2", points.col = "red",
          points.cex = 0.5, points.pch = 8, axis.cex = 0.7)
  plotBAF(kp, snps=s1, r0=0.6, r1=1, labels = "BAF3", 
          points.col = "#FF552222", points.cex = 1.8, points.pch = 16, 
          axis.cex = 0.7)
```

If we want to plot more than one sample, if we have  the data in a list of 
GRanges or in a GRanges list, `plotBAF` will take care of it and plot the 
different samples one below the other. It will also use the names of the list
as labels to identify the different samples. 

```{r}
  samples <- list("Sample1"=s1, "Sample2"=s2)
  kp <- plotKaryotype(chromosomes="chr1")
  plotBAF(kp, snps=samples)
```

## Plot LRR

The function `plotLRR` is equivalent to the `plotBAF` function but will plot the
data in the "lrr" column.

```{r}
  kp <- plotKaryotype(chromosomes="chr1")
  kpAddBaseNumbers(kp)
  plotLRR(kp, snps=s1)
```

`plotLRR` has a few specific parameters. Since the range of the data points
is not limited to [0,1] as in BAF, you can define the `ymin` and `ymax` values
and any point falling out of the [ymin, ymax] range will be plotted in red 
within this range. 

This can help us identify out-of-range data, such as the
deletion arround 50Mb in the plot above or the gained region at ~220Mb.

Changing the values of `ymin` and `ymax` we can see a bit different picture

```{r}
  kp <- plotKaryotype(chromosomes="chr1")
  kpAddBaseNumbers(kp)
  plotLRR(kp, snps=s1,  ymin=-1.5, ymax=1.5)
```

In this case we see many more points out-of-range. We can change the appearance
of this points, changing their color, for example, of we can change how they are
represented, using a density plot instead of raw points.

```{r}
  kp <- plotKaryotype(chromosomes="chr1")
  kpAddBaseNumbers(kp)
  plotLRR(kp, snps=s1,  ymin=-1.5, ymax=1.5, out.of.range = "density")
```

In this case, due to the very few points in the example, the default parameters
for the density plot are not optimal. We can increase the window size to compute
the density using larger windows. For example, we can set the window to 1
megabase.

```{r}
  kp <- plotKaryotype(chromosomes="chr1")
  plotLRR(kp, snps=s1,  ymin=-1.5, ymax=1.5, out.of.range = "density", density.window = 1e6)
```

And we can see the peaks corresponding to the accumulation of out-of-range
points.

Finally, we can control the presence and color of the horizontal line marking
the 0 with the "line.at.0.*" parameters.


We can also use the standard customization options with `plotLRR`.

```{r}
  kp <- plotKaryotype(chromosomes="chr1")
  plotLRR(kp, snps=s1, r0=0, r1=0.2, labels = "LRR1", points.col = "orange",
          points.cex = 2, points.pch = 4, axis.cex = 0.3)
  plotLRR(kp, snps=s1, r0=0.3, r1=0.5, labels = "LRR2", points.col = "red",
          points.cex = 0.5, points.pch = 8, axis.cex = 0.7, ymin=-1.5, ymax=1.5,
          out.of.range.col = "gold", out.of.range = "density",
          density.window = 10e6, density.height = 0.3)
  plotLRR(kp, snps=s1, r0=0.6, r1=1, labels = "LRR3",
          points.col = "#FF552222", points.cex = 1.8, points.pch = 16,
          axis.cex = 0.7)
```



## Plot Copy-Number Calls

The final data type we can plot with `r BiocStyle::Biocpkg("CopyNumberPlots")`
are copy number calls, that is, the results from copy-number calling algorithms.
To plot that we need a GRanges object with a at least one column of:
 * "cn" for integer copy number calls
 * "segment.value" for non-integer segment regional values
 * "loh" a logical for loss-of-heterozygosity
 
As an example we'll use the data generated by ASCAT in a cancer cell line.

```{r}
  s1.calls.file <- system.file("extdata", "S1.segments.txt", package = "CopyNumberPlots", mustWork = TRUE)
  s1.calls <- loadCopyNumberCalls(s1.calls.file)
  s2.calls <- loadCopyNumberCalls(system.file("extdata", "S2.segments.txt", package = "CopyNumberPlots", mustWork = TRUE))
  s3.calls <- loadCopyNumberCalls(system.file("extdata", "S3.segments.txt", package = "CopyNumberPlots", mustWork = TRUE))
  s1.calls
```

### plotCopyNumberCalls

The first function to plot the copy-number calls is `plotCopyNumberCalls`,
which will plot them as colored rectangles over the genome. It will create 
2 lines of rectangles: the top one with copy-number values and the bottom one
with loss-of-heterozygosity in blue.

```{r, fig.wide=TRUE}
  kp <- plotKaryotype(chromosomes = "chr1")
  plotCopyNumberCalls(kp, s1.calls)
```

By default we'll see losses in green, 2n regions in gray and gains in 
yellow-orange-red. And the LOH regions as a blue line below the CN data.
We can change the colors used with `cn.colors`. This parameter will take
any value accepted by `getCopyNumberColors`, including the predefined 
palletes. You can find them all in the documentation of `getCopyNumberColors`.
This fuction can also help us creating a legend.

```{r, fig.wide=TRUE}
  kp <- plotKaryotype(chromosomes="chr1")
  plotCopyNumberCalls(kp, s1.calls, cn.colors = "red_blue", loh.color = "orange", r1=0.8)
  cn.cols <- getCopyNumberColors(colors = "red_blue")
  legend("top", legend=names(cn.cols), fill = cn.cols, ncol=length(cn.cols))
```

As with the other plotting functions, giving it a list of GRanges will plot
them all.

```{r, fig.wide=TRUE}
  cn.calls <- list("Sample1"=s1.calls, "Sample2"=s2.calls, "Sample3"=s3.calls)
  kp <- plotKaryotype(chromosomes="chr1")
  plotCopyNumberCalls(kp, cn.calls, r1=0.3)
```

<!-- Using the data from more samples (in this case randomly generated) and -->
<!-- hiding the LOH data (`loh.height=0`) we can get the look of a heatmap usual -->
<!-- in many papers. -->


<!-- ```{r} -->
<!--   cn.calls <- createRandomRegions(nregions = 200, length.mean = 40e6, -->
<!--                                   length.sd = 10e6, non.overlapping = FALSE, -->
<!--                                   genome="hg19", mask=NA) -->
<!--   cn.calls$cn <- sample(0:5, size = length(cn.calls), replace = TRUE) -->
<!--   cn.calls$sample <- sample(paste0("S", 1:10), size = length(cn.calls), replace = TRUE) -->
<!--   cn.calls <- split(cn.calls, f = cn.calls$sample) -->

<!--   cn.cols <- getCopyNumberColors(colors = "red_blue") -->
<!--   kp <- plotKaryotype(plot.type=4) -->
<!--   kpDataBackground(kp, color = cn.cols["2"]) -->
<!--   plotCopyNumberCalls(kp, cn.calls, loh.height = 0, cn.colors = "red_blue") -->
<!-- ``` -->

### plotCopyNumberCallsAsLines

Another option is to plot the copy-number calls as lines using the function
`plotCopyNumberCallsAsLines`. We'll show a single chromosome in this case.

```{r, fig.wide=TRUE}
  kp <- plotKaryotype(chr="chr1")
  plotCopyNumberCallsAsLines(kp, s1.calls)
```

In this case we can change the standard customization options and make it use 
segments instead of lines using the additional parameter `style`. 

```{r, fig.wide=TRUE}
  kp <- plotKaryotype(chr="chr1")
  plotCopyNumberCallsAsLines(kp, s1.calls, style = "segments")
```

### plotCopyNumberSummary

Finally, to plot a view of the accumulation of copy number alterations we can
use `plotCopyNumberSummary`. It will create a coverage plot of gains and losses
over all samples in our dataset.


```{r}
  cn.cols <- getCopyNumberColors(colors = "green_orange_red")
  kp <- plotKaryotype(chromosomes="chr1")
  kpDataBackground(kp, color = cn.cols["2"], r0=0.3)
  plotCopyNumberCalls(kp, cn.calls, loh.height = 0, r0=0.3)
  plotCopyNumberSummary(kp, cn.calls, r1=0.25)
```

And we can change the appearance of the summary using the `direction` parameter.

```{r}
  cn.cols <- getCopyNumberColors(colors = "green_orange_red")
  kp <- plotKaryotype(chromosomes="chr1")
  kpDataBackground(kp, color = cn.cols["2"], r0=0.3)
  plotCopyNumberCalls(kp, cn.calls, loh.height = 0, r0=0.3)
  plotCopyNumberSummary(kp, cn.calls, r1=0.25, direction = "out")
```

# Session Info

```{r}
  sessionInfo()
```