---
title: "A Quick Start of cola Package"
author: "Zuguang Gu ( z.gu@dkfz.de )"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{1. A Quick Start of cola Package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, echo = FALSE, message = FALSE}
library(markdown)
library(knitr)
knitr::opts_chunk$set(
    error = FALSE,
    tidy  = FALSE,
    message = FALSE,
    fig.align = "center")
options(width = 100)
options(rmarkdown.html_vignette.check_title = FALSE)
library(cola)
```

Assuming your matrix is stored in an object called `mat`, you only need to run
the following code to perform consensus partitioning with *cola*:

```{r, eval = FALSE}
# code only for demonstration
mat = adjust_matrix(mat)  # optional
rl = run_all_consensus_partition_methods(mat, mc.cores = ...)
cola_report(rl, output_dir = ..., mc.cores = ...)
```

The above code has three steps:

1. Adjust the matrix. In this step, rows with too many `NA`s and rows with
   very low variance are removed, `NA` values are imputed in rows where they
   make up less than 50% of the values, and outliers are adjusted in each row.
2. Run consensus partitioning with several methods.
   The partitioning methods are `hclust` (hierarchical clustering with `cutree()`),
   `kmeans` (k-means clustering), `skmeans::skmeans` (spherical k-means
   clustering), `cluster::pam` (partitioning around medoids) and
   `mclust::Mclust` (model-based clustering). The default methods to extract
   the top n rows are `SD` (standard deviation), `CV` (coefficient of variation),
   `MAD` (median absolute deviation) and `ATC` (ability to correlate to other
   rows).
3. Generate a detailed HTML report for the complete analysis.
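
To make the idea of "top n rows" in step 2 more concrete, the following is a
minimal base R sketch that scores rows by `SD`, `CV` and `MAD` and keeps the
highest-scoring rows. It is only an illustration under simplifying
assumptions: the matrix `m`, the value of `top_n` and the plain `sd/mean`
definition of `CV` are hypothetical choices made here, and the code is not
how *cola* computes these scores internally.

```{r, eval = FALSE}
# illustration only: simplified row scores in base R, not cola's internal code
set.seed(123)
m = matrix(rnorm(100 * 10, mean = 10), nrow = 100, ncol = 10)  # hypothetical data
rownames(m) = paste0("row", seq_len(nrow(m)))

row_sd  = apply(m, 1, sd)             # standard deviation of each row
row_cv  = row_sd / abs(rowMeans(m))   # coefficient of variation (plain sd/mean)
row_mad = apply(m, 1, mad)            # median absolute deviation of each row

top_n = 20                            # hypothetical number of rows to keep
top_rows = order(row_sd, decreasing = TRUE)[seq_len(top_n)]  # top rows by SD
m_top = m[top_rows, , drop = FALSE]   # sub-matrix that would go to partitioning
dim(m_top)
```

Replacing `row_sd` with `row_cv` or `row_mad` in the call to `order()` selects
rows by the other two scores.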


`run_all_consensus_partition_methods()` runs multiple methods in sequence, which might
take a long time for big datasets. Users can also run consensus partitioning with
a specific top-value method (e.g. SD) and a specific partitioning method (e.g. skmeans)
with the `consensus_partition()` function:

```{r, eval = FALSE}
res = consensus_partition(mat, top_value_method = ..., partition_method = ...)
cola_report(res, output_dir = ..., mc.cores = ...)
```
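
As a more concrete sketch, the call might look as follows on a small simulated
matrix. The data, the row and column names and the value of `max_k` are
hypothetical choices for illustration, and the run time and results will
depend on your own data.

```{r, eval = FALSE}
# illustration only: simulated data and hypothetical argument values
library(cola)
set.seed(123)
# 200 rows x 40 samples, with two simulated groups of 20 samples each
mat = cbind(
    matrix(rnorm(200 * 20, mean = 0), nrow = 200),
    matrix(rnorm(200 * 20, mean = 1), nrow = 200))
rownames(mat) = paste0("gene", seq_len(nrow(mat)))
colnames(mat) = paste0("sample", seq_len(ncol(mat)))

res = consensus_partition(mat,
    top_value_method = "SD",
    partition_method = "skmeans",  # needs the skmeans package to be installed
    max_k = 4)                     # try k = 2, 3 and 4 subgroups

suggest_best_k(res)      # number of subgroups suggested by cola
get_classes(res, k = 2)  # subgroup label of each sample for k = 2
```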

For extremely large datasets, users can run `consensus_partition_by_down_sampling()`,
which randomly samples a subset of samples for classification and then predicts the
classes of the remaining samples from the signatures of the _cola_ classification.
More details can be found in the vignette ["Work with Big Datasets"](working_with_big_datasets.html).

```{r, eval = FALSE}
res = consensus_partition_by_down_sampling(mat, subset = ...,
    top_value_method = ..., partition_method = ...)
cola_report(res, output_dir = ..., mc.cores = ...)
```
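
The general idea of down-sampling can be sketched in base R as follows. This
is a conceptual illustration only and not what
`consensus_partition_by_down_sampling()` does internally: it assumes plain
k-means on the sampled columns and a nearest-centroid rule for the remaining
ones, whereas *cola* uses consensus partitioning and signature-based
prediction. The matrix `m` and all sizes are hypothetical.

```{r, eval = FALSE}
# conceptual illustration only: k-means plus nearest-centroid prediction
set.seed(123)
# hypothetical data: 50 rows x 200 samples, two simulated groups of 100 samples
m = cbind(
    matrix(rnorm(50 * 100, mean = 0), nrow = 50),
    matrix(rnorm(50 * 100, mean = 3), nrow = 50))

subset_ind = sample(ncol(m), 40)  # randomly sampled subset of samples
km = kmeans(t(m[, subset_ind]), centers = 2, nstart = 10)  # classify the subset only

# squared distance of every remaining sample to each centroid from the subset
rest_ind = setdiff(seq_len(ncol(m)), subset_ind)
dist_to_centroid = sapply(seq_len(nrow(km$centers)), function(i) {
    colSums((m[, rest_ind] - km$centers[i, ])^2)
})
predicted = apply(dist_to_centroid, 1, which.min)  # closest centroid per sample

# combine the classes of the subset with the predicted classes of the rest
classes = integer(ncol(m))
classes[subset_ind] = km$cluster
classes[rest_ind] = predicted
table(classes)
```

Only the 40 sampled columns take part in the clustering step, which is what
keeps the expensive part of the analysis small for large datasets.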

Examples of _cola_ analysis on real datasets can be found at https://jokergoo.github.io/cola_collection/.