---
title: "SMAD Quick Start"
author: 
- name: "Qingzhou (Johnson) Zhang"
  email: zqzneptune@hotmail.com
date: "`r Sys.Date()`"
package: SMAD
output: 
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{SMAD Quick Start}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction
This R package implements statistical modelling of affinity purification–mass 
spectrometry (AP-MS) data to compute confidence scores to identify *bona fide* 
protein-protein interactions (PPI).

# Prepare Input Data
Arrange the input data in a data frame, *datInput*, with the following columns:

|idRun|idBait|idPrey|countPrey|lenPrey|
|-----|:----:|:----:|:-------:|:-------:|
|AP-MS run ID|Bait ID|Prey ID|Prey peptide count|Prey protein length|


```{r}
library(SMAD)
data("TestDatInput")
head(TestDatInput)
```

The test data are a subset of the unfiltered BioPlex 2.0 data, restricted to
runs that used apoptosis proteins as baits.
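
For your own experiments, a data frame with the same five columns can be
assembled by hand. The chunk below is a minimal sketch: the run, bait and prey
identifiers and all numbers are invented for illustration, and the checks are
only a suggestion, not something the package requires.

```{r}
# Hypothetical toy input: two AP-MS runs with invented IDs and values
toyInput <- data.frame(
  idRun     = c("run01", "run01", "run02", "run02"),
  idBait    = c("baitA", "baitA", "baitB", "baitB"),
  idPrey    = c("preyX", "preyY", "preyX", "preyZ"),
  countPrey = c(12, 30, 8, 25),      # prey peptide counts
  lenPrey   = c(240, 650, 240, 190), # prey protein lengths
  stringsAsFactors = FALSE
)

# Check that every column from the table above is present and counts are numeric
requiredCols <- c("idRun", "idBait", "idPrey", "countPrey", "lenPrey")
stopifnot(all(requiredCols %in% colnames(toyInput)))
stopifnot(is.numeric(toyInput$countPrey), is.numeric(toyInput$lenPrey))
str(toyInput)
```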

# Methods

## CompPASS

Comparative Proteomic Analysis Software Suite (CompPASS) is based on the spoke 
model. The algorithm was developed by Dr. Mathew Sowa to define the human 
deubiquitinating enzyme interaction landscape [(Sowa, Mathew E., et al., 
2009)][1]. This implementation was inspired by Dr. Sowa's 
[online tutorial][2]. 
The output includes the Z-score, S-score, D-score and WD-score. In its 
implementation in BioPlex 1.0 [(Huttlin, Edward L., et al., 2015)][3] and 
BioPlex 2.0 [(Huttlin, Edward L., et al., 2017)][4], a naive 
Bayes classifier that learns to distinguish true interacting proteins from 
non-specific background and false positive identifications was included in the 
CompPASS pipeline. This function was optimized from the [source code][5].

```{r echo=TRUE, message=FALSE, warning=FALSE}
scoreCompPASS <- CompPASS(TestDatInput)
head(scoreCompPASS)
```

Based on these scores, bait-prey interactions can be ranked for downstream analyses.
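
As a minimal sketch of such a ranking, the chunk below orders a toy score table
by WD-score and keeps the highest-scoring pairs. The table is a stand-in with
invented values; the `idBait` and `idPrey` columns and the cutoff `n = 2` are
assumptions made for illustration, so adapt them to the actual output and to
your own threshold.

```{r}
# Toy stand-in for a CompPASS score table (values invented for illustration)
toyScores <- data.frame(
  idBait  = c("baitA", "baitA", "baitB", "baitB"),
  idPrey  = c("preyX", "preyY", "preyX", "preyZ"),
  scoreWD = c(3.2, 0.4, 1.1, 7.5),
  stringsAsFactors = FALSE
)

# Rank bait-prey pairs by WD-score, highest first
ranked <- toyScores[order(toyScores$scoreWD, decreasing = TRUE), ]

# Keep the top n pairs; n = 2 is an arbitrary choice for this example
topPairs <- head(ranked, n = 2)
topPairs
```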

```{r CompPASS output figure, echo=FALSE, fig.height=7, fig.width=7, message=FALSE, warning=FALSE, paged.print=FALSE}
par(mfrow = c(2, 2))
plot(sort(scoreCompPASS$scoreZ, decreasing = TRUE), pch = 16,
     xlab = "Ranked bait-prey interactions",
     ylab = "Z-score")
plot(sort(scoreCompPASS$scoreS, decreasing = TRUE), pch = 16,
     xlab = "Ranked bait-prey interactions",
     ylab = "S-score")
plot(sort(scoreCompPASS$scoreD, decreasing = TRUE), pch = 16,
     xlab = "Ranked bait-prey interactions",
     ylab = "D-score")
plot(sort(scoreCompPASS$scoreWD, decreasing = TRUE), pch = 16,
     xlab = "Ranked bait-prey interactions",
     ylab = "WD-score")

```


## HGScore

HGScore is a scoring algorithm based on a hypergeometric distribution error 
model [(Hart et al., 2007)][6] that incorporates the normalized spectral 
abundance factor (NSAF) [(Zybailov, Boris, et al., 2006)][7]. It was first 
introduced to predict the protein complex network of *Drosophila melanogaster* 
[(Guruharsha, K. G., et al., 2011)][8]. The algorithm is based on the
matrix model. Unlike CompPASS, HGScore requires the protein length of each 
prey, supplied in the additional column *lenPrey*.

```{r}
scoreHG <- HG(TestDatInput)
head(scoreHG)
```
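
The prey length is needed because NSAF divides each count by the protein length
before normalizing within a run. The chunk below is a sketch of that
normalization as defined by Zybailov et al., applied to invented numbers; it
illustrates the formula only and is not taken from the internals of `HG()`,
which may differ in detail.

```{r}
# Toy data: prey counts and lengths in two runs (values invented)
toyRun <- data.frame(
  idRun     = c("run01", "run01", "run01", "run02", "run02"),
  idPrey    = c("preyX", "preyY", "preyZ", "preyX", "preyZ"),
  countPrey = c(10, 30, 20, 5, 10),
  lenPrey   = c(100, 600, 200, 100, 200),
  stringsAsFactors = FALSE
)

# Spectral abundance factor: count divided by protein length
saf <- toyRun$countPrey / toyRun$lenPrey

# Normalize by the per-run total so that NSAF sums to 1 within each run
toyRun$NSAF <- saf / ave(saf, toyRun$idRun, FUN = sum)
toyRun
```

In this toy example, preyY has the highest raw count in run01 but the lowest
NSAF, because it is also the longest protein.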

```{r HG output figure, echo=FALSE, fig.height=7, fig.width=7, message=FALSE, warning=FALSE, paged.print=FALSE}

plot(sort(scoreHG$HG, decreasing = TRUE), pch = 16,
     xlab = "Ranked prey-prey interactions",
     ylab = "HGscore")

```

Note that HG scoring implements the matrix model, which leads to a significant increase in the number of inferred protein-protein interactions.
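
The difference in size between the two models can be sketched by enumerating
the candidate pairs from a single purification. The identifiers below are
hypothetical, and counting the bait among the co-purified proteins in the
matrix model is an assumption of this example.

```{r}
# One hypothetical purification: a bait and the preys it pulled down
bait  <- "baitA"
preys <- c("preyW", "preyX", "preyY", "preyZ")

# Spoke model: only bait-prey pairs are considered
spokePairs <- data.frame(A = bait, B = preys, stringsAsFactors = FALSE)

# Matrix model: every pair among the co-purified proteins is considered
members <- c(bait, preys)
matrixPairs <- as.data.frame(t(combn(members, 2)), stringsAsFactors = FALSE)
colnames(matrixPairs) <- c("A", "B")

nrow(spokePairs)   # 4 pairs
nrow(matrixPairs)  # choose(5, 2) = 10 pairs
```

With n co-purified proteins the spoke model yields n - 1 candidate pairs per
run, while the matrix model yields n(n - 1)/2, so the gap widens quickly as
runs get larger.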


[1]: https://doi.org/10.1016/j.cell.2009.04.042
[2]: http://besra.hms.harvard.edu/ipmsmsdbs/cgi-bin/tutorial.cgi
[3]: https://doi.org/10.1016/j.cell.2015.06.043
[4]: https://www.nature.com/articles/nature22366
[5]: https://github.com/dnusinow/cRomppass
[6]: https://doi.org/10.1186/1471-2105-8-236
[7]: https://doi.org/10.1021/pr060161n
[8]: https://doi.org/10.1016/j.cell.2011.08.047