---
title: "METAbolic pathway testing combining POsitive and NEgative mode data (metapone)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{metapone}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
The metapone package conducts pathway tests for untargeted metabolomics data. It has three main characteristics: (1) expanded database combining SMPDB and Mummichog databases, with manual cleaning to remove redundancies; (2) A new weighted testing scheme to address the issue of metabolite-feature matching uncertainties; (3) Can consider positive mode and negative mode data in a single analysis.

Compared to existing methods, the weighted testing scheme allows the user to apply different level of penalty for multiple-mapped features, in order to reduce their undue impact on the results. In addition, considering positive mode and negative mode data simultaneously can improve the statistical power of the test.  

```{r setup}
library(metapone)
```

The input should contain at least three clumns - m/z, retention time, and feature p-value. Here to illustrate the usage of the method, we borrow the test results from our published study "Use of high-resolution metabolomics for the identification of metabolic T signals associated with traffic-related air pollution" in Environment International. 120: 145–154.

The positive mode results are in the object "pos". 

```{r example input}
data(pos)
head(pos)
```

The negative mode results are in the object "neg". If both positive mode and negative mode data are present, each is input into the algorithm as a separate matrix

```{r example input second matrix}
data(neg)
head(neg)
```

It is not required that both positive and negative mode results are present. Having a single ion mode is also OK. The test is based on HMDB identification. The common adduct ions are pre-processed and stored in:

```{r example load database}
data(hmdbCompMZ)
head(hmdbCompMZ)
```

Pathway information that was summarized from Mummichog and smpdb is built-in:

```{r example load pathway}
data(pa)
head(pa)
```

The user can specify which adduct ions are allowed by setting the allowed adducts. For example:

```{r example adduct ions}
pos.adductlist = c("M+H", "M+NH4", "M+Na", "M+ACN+H", "M+ACN+Na", "M+2ACN+H", "2M+H", "2M+Na", "2M+ACN+H")
neg.adductlist = c("M-H", "M-2H", "M-2H+Na", "M-2H+K", "M-2H+NH4", "M-H2O-H", "M-H+Cl", "M+Cl", "M+2Cl")
```

It is common for a feature to be matched to multiple metabolites. Assume a feature is matched to m metabolites, metapone weighs the feature by (1/m)^p, where p is a power term to tune the penalty. m can also be capped at a certain level such that the penalty is limited. These are controlled by parameters:

Setting p: fractional.count.power = 0.5
Setting the cap of n: max.match.count = 10

It is easy to see that when p=0, no penalty is assigned for multiple-matching. The higher p is, the larger penalty for features that are multiple matched. 

Other parameters include p.threshold, which controls which metabolic feature is considered significant. If use.fgsea = FALSE, then testing is done by permutation. Otherwise, a GSEA type testing is conducted and the parameter use.meta controls using metabolite-based or feature-based testing. Overall, the analysis is conducted this way:

```{r example analysis, warning=FALSE}
# permutation test
r<-metapone(pos, neg, pa, hmdbCompMZ=hmdbCompMZ, pos.adductlist=pos.adductlist, neg.adductlist=neg.adductlist, p.threshold=0.05,n.permu=200,fractional.count.power=0.5, max.match.count=10, use.fgsea = FALSE)

# GSEA type testing based on metabolites
#r<-metapone(pos, neg, pa, hmdbCompMZ=hmdbCompMZ, pos.adductlist=pos.adductlist, neg.adductlist=neg.adductlist, p.threshold=0.05,n.permu=100,fractional.count.power=0.5, max.match.count=10, use.fgsea = TRUE, use.meta = TRUE)

# GSEA type testing based on features
#r<-metapone(pos, neg, pa, hmdbCompMZ=hmdbCompMZ, pos.adductlist=pos.adductlist, neg.adductlist=neg.adductlist, p.threshold=0.05,n.permu=100,fractional.count.power=0.5, max.match.count=10, use.fgsea = TRUE, use.meta = FALSE)

hist(ptable(r)[,1])
```

We can subset the pathways that are significant:

```{r example continued}
selection<-which(ptable(r)[,1]<0.025)
ptable(r)[selection,]
```

We note that applying the multiple-matching penalty using parameter fractional.count.power will effectively making fractional counts out of the multiple-matched features. Thus the mapped feature tables, you will see fractional counts, rather than integer counts. 

```{r example continued 2}
ftable(r)[which(ptable(r)[,1]<0.025 & ptable(r)[,2]>=2)]
```

We can also use bbplot1d() and bbplot2d to visualize the pathway testing result.

```{r example visulization, warning=FALSE}
bbplot1d(r@test.result, 0.025)
bbplot2d(r@test.result, 0.025)
```