%\VignetteIndexEntry{Molecule Identification with CAMERA} %\VignetteKeywords{CAMERA}\\ %\VignettePackage{CAMERA}\\ \documentclass[a4paper,12pt]{article} \usepackage{hyperref} \usepackage[table]{xcolor} \setlength{\parindent}{0cm} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\textit{#1}}} \newcommand{\Rfunarg}[1]{{\textit{#1}}} \newcommand{\denovo}{{\em de-Novo{}}} \begin{document} \title{LC-MS Peak Annotation and Identification with \Rpackage{CAMERA}} \author{Carsten Kuhl, Ralf Tautenhahn and Steffen Neumann} \maketitle \section*{Introduction} %{{{ The R-package \Rpackage{CAMERA} is a ({\bf C}ollection of {\bf A}lgorithms for {\bf ME}tabolite p{\bf R}ofile {\bf A}nnotation). It's primarily used for annotation of LC-MS data. Therefore it interacts directly with processed data from \Rpackage{xcms} and \Rpackage{Rdisop} for additional analyses. It includes the annotation of isotope peaks, adducts and fragments in peak lists generated by \Rpackage{xcms}. Additional methods group together mass signals measured from a single metabolite, based on rules for mass differences and peak shape comparison \cite{annobird07}. Based on this annotation, the molecular composition can be calculated if the mass spectrometer has a high-enough accuracy for both the mass and the isotope pattern intensities. %}}} \section{Peak Annotation} %{{{ \subsection{Adduct list and molecular mass estimation} For soft ionisation methods such as LC/ESI-MS, different adducts (e.g. $[M+K]^+$, $[M+Na]^+ $) and fragments (e.g. $[M-C_3H_9N]^+$, $[M+H-H_20]^+ $) occur. Depending on the molecule having an intrinsic charge, $[M]^+$ may be observed as well. In most cases, substances generates a bulk of different ions, so deconvolution is necessary. Therefore an estimation of the molecular mass of $[M]$ can be calculated from at least two annotated adduct ions. To scan for adducts every theoretical possible combination of adducts from a given list of ions are calculated. For a small example of common ions see Tab.~\ref{tab:anno}. \begin{table}[ht] \parbox{0.60\textwidth}{% \small \begin{tabular}{|l|r|}\hline Formula & Mass difference in amu \\ \hline $[M+H]^+$ & 1.007276 \\ $[M+Na]^+$ & 22.98977 \\ $[M+K]^+$ & 38.963708 \\ $[2M+Na]^+$ & 22.98977 \\ $[M+H+Na]^{2+}$ & 23.9976 \\ ... & ... \\ \hline \end{tabular} } \parbox{0.38\textwidth}{% \caption{{\footnotesize{Examples of calculated adducts for the Kations (K,H,Na) with their mass differences occuring in positive ion mode. The actual difference is calculated considering the charge and the number of molecules $M$ in the observed ion.}}} \label{tab:anno}} \end{table} Every group of peaks is scanned, if these combinations fit with the mass differences and then molecular masses are computated. \section{Processing with \Rpackage{CAMERA}} \subsection{Preprocessing with \Rpackage{xcms}} \label{preprocess} At first, create an \Rclass{xcmsSet} with your favourite parameters, e.g.: \begin{verbatim} library(CAMERA) #Single sample example file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file,method="centWave",ppm=30,peakwidth=c(5,10)) #Multiple sample library(faahKO) filepath <- system.file("cdf", package = "faahKO") xsg <- group(faahko) \end{verbatim} \subsection{Annotation Workflow} The annotation procedure can be split into two parts: first, we want to answer the questions which peaks/ions from one substance belongs together and second, computate the exact mass of the molecule. The principial use of CAMERA is demonstrated in the next sections. See the manpages for further information about the functions and their parameters . The annotation workflow in CAMERA contains four different functions: For dealing with the first question: \begin{enumerate} \item peak grouping after retention time (\Rfunction{groupFWHM}) \item peak group verification with EICs correlation (\Rfunction{groupCorr}) \end{enumerate} Both methods separate the peaks into groups, which we define as "pseudospectra". These pseudospectra can consists from one up to 100 ions, depending on your molecule and solves the first task. And for the second: \begin{enumerate} \item annotation of possible isotopes (\Rfunction{findIsotopes}) \item annotation of adducts and calculating hypothetical masses for the group (\Rfunction{findAdducts}) \end{enumerate} This results in a data-frame similiar to a \Rpackage{xcms} peak table, that can be easily stored in a comma separated table (Excel-readable). The use of these methods can be in different order, which is shown in the next section. \subsubsection{Working with single sample} Let's come to the practical work. Here we have a single sample file either in positive or negative ionisation mode. The xcmsSet was created as shown in section \ref{preprocess}. \begin{verbatim} # Create an xsAnnotate object an <- xsAnnotate(xs) # Group after RT anF <- groupFWHM(an, perfwhm = 0.6) # Annotate Isotopes anI <- findIsotopes(anF, mzabs = 0.01) # Verify grouping anIC <- groupCorr(anI, cor_eic_th = 0.75) #Annotate adducts anFA <- findAdducts(anIC, polarity="positive") \end{verbatim} In the above example, we first create the pseudospectra with the retentiontime information. The \Rfunarg{perfwhm} parameter defines the window width, which is used for matching. Lower it for a smaller windows or set it to a higher value, if the retention time varies. This step generate 14 pseudospectra. Afterwards we annotate isotopic peaks, with \Rfunarg{mzabs} as allowed m/z error. In this example we find 36 isotope peaks, with means the number of $[M+1],[M+2],ldots$ ions. These isotope information are useful in the next step, where for every peak in one pseudospectra a pairwise EIC correlation is done. If the correlation value between two peaks is higher than the threshold \Rfunarg{cor\_eic\_th} it will stay in the group, otherwise both are separated. If the peaks are annotated isotope ions, they will not be divided. That seperate our 14 pseudospectra into 35. Additionally you can set an constraint, that the primary adducts e.g. $[M+H]$ and $[M+Na]$ stay together, even if the correlation value is small. To do so call the function with an polarity parameter like in \Rfunction{findAdducts} (groupCorr (..., polarity="positive"), which only seperate the 14 into 32 groups. The difference is only small, three single ions were not sorted out, but as you can see the importance in figure \ref{fig:groupCorr}. In the upper example a annotation was possible in contrast to the lower one, because the $[M+Na]$ has put into his own group. So we strongly encourage the use of the polarity mode. <>= library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave",ppm=30, peakwidth=c(5,10)) an <- xsAnnotate(xs) an <- groupFWHM(an) an <- findIsotopes(an) anC <- groupCorr(an, cor_eic_th = 0.75) anCp<- groupCorr(an, cor_eic_th = 0.75, polarity="positive") an1 <- findAdducts(anC, polarity="positive") an2 <- findAdducts(anCp, polarity="positive") par(mfrow=c(2,1)) plotEICs(an1, pspecIdx=8, maxlabel=5) plotEICs(an2, pspecIdx=8, maxlabel=5) @ \begin{figure} \begin{center} \includegraphics{CAMERA-groupCorr} \end{center} \caption{\label{fig:GroupCorr} Example of using the polarity parameter} \end{figure} After the second pseudogroup creating step we now finally do a complete annotation of adducts. Therefore the \Rfunarg{polarity} parameter must be set. For exporting the result simply do: \begin{verbatim} peaklist <- getPeaklist(xsa) write.csv(peaklist, file='xsannotated.csv') \end{verbatim} where file is the ouput filename. That's so far as for the simple sample approach. Please note that every method has additional parameters, that are not explicitly mentioned here. Also if your analysis doesn't need annotations, only a seperation into groups, then simply stop after groupCorr. The grouping results are stored in \Rfunarg{object@pspectra}, which is a list, that saves as element the peakindexes for every group. \begin{verbatim} #xsa is here the result from findAdducts peak.idx <- xsa@pspectra[[1]] #print the indexes of all peaks from pseudospectrum 1 cat(peak.idx) \end{verbatim} \subsubsection{Working with multiple sample} In this case we have multiple samples like replicates of one probe or e.g. a wildtype vs. mutant experiment. As in the example before, we start with the already processed xcmsSet-object. Note: If you want to do an \Rfunction{diffreport} later on, make sure that you run \Rfunction{fillPeaks} on your xcmsSet before. As test dataset we use here the faahKO. For more information about that dataset see http://dx.doi.org/10.1021/bi0480335. CAMERA contains different approaches, how it deals with multiple sample datasets. Here we only show the most common way, for the other strategies see the manpage of xsAnnotate, especially the parameter sample. \begin{verbatim} #Create an xsAnnotate object xsa <- xsAnnotate(xsg) #Group after RT value of the xcms grouped peak xsaF <- groupFWHM(xsa, perfwhm=0.6) #Verify grouping xsaC <- groupCorr(xsaF) #Annotate isotopes, could be done before groupCorr xsaFI <- findIsotopes(xsaC) #Annotate adducts xsaFA <- findAdducts(xsaFI, polarity="positive") #Get final peaktable and store on harddrive write.csv(getPeaklist(xsaFA),file="result_CAMERA.csv") \end{verbatim} Similar to the previous example, the grouping is followed by an annotation. But in contrast to that, we now have more additional summaries respectively analysis functions. For a comparison and statistical analysis between different sample classes, \Rpackage{xcms} contains the \Rfunction{diffreport} function. CAMERA can use this method for better representation. \begin{verbatim} #Run fillPeaks on xcmsSet xsa.fill <- fillPeaks(xsg) #Make a diffreport with CAMERA result diffreport <- annotateDiffreport(xsg.fill) #Save on harddrive write.csv(diffreport, file="diffreport.csv") \end{verbatim} The \Rfunction{annotateDiffreport} is a wrapper for the \Rpackage{xcms} \Rfunction{diffreport} function and combines the results from CAMERA. The resulting table has three different columns, see section \ref{CAMERAresults}. For a speed up it's possible to preselect pseudospectra or make an automatic selection based on the diffreport result. For example select only groups with a fold change higher than 4. \begin{verbatim} #Example 1 with creating list of interest from grouped xcmsSet diffreport <- annotateDiffreport(xsg.fill, quick=TRUE) #save Results write.csv(diffreport, file="diffreport.csv") #Look into the table and select interesting pseudospectra #e.g. pseudospectra 10,11 and 30 psg_list <- c(10,11,30) diffreport.annotated <- annotateDiffreport(xsg.fill, psg_list=psg_list, polarity="positive") #Example 2 with automatic selection diffreport.annotated <- annotateDiffreport(xsg.fill, fc_th=4, polarity="positive") \end{verbatim} Both examples generate a data-frame, identical to the normal diffreport result, but now with three additional result columns from CAMERA. In example 1 we perform a quick-run, that means we only generate the xsAnnotate object und call groupFWHM and findIsotopes. From these results we preselect 3 pseudospectra (10,11,30), taken from the column pc. In the next run, we run annotateDiffreport again with our list as parameter. An annotation will only be done for these three groups. In example 2 we perform an automatic preselection, where the \Rfunarg{fc\_th} parameter defines a threshold for selecting groups, which contains ions with a fold change higher than four. For other pseudospectra, no adduct annotation will be calculated. The fold change value is taken from the diffreport result. For other parameters see the manpage of \Rfunction{annotateDiffreport}. \subsubsection{The function \Rfunction{annotate}} \Rfunction{Annotate} is a wrapper function for working with multiple samples, which have the same sample class. It's similar to \Rfunction{annnoteDiffreport}, but doesn't use the diffreport. Therefore you can't use the parameter \Rfunarg{fc\_th} and \Rfunarg{pval\_th} for an automatic selection. Only a handmade preselection with the parameter \Rfunarg{pval\_list} is possible. A "quick"\ mode is also available, that runs only \Rfunction{groupFWHM} and \Rfunction{findIsotopes}. The normal mode runs \Rfunction{groupFWHM}, \Rfunction{findIsotopes}, \Rfunction{group\-Corr} and \Rfunction{findAdducts} in order as mentioned. Every parameter of these functions also works with \Rfunction{annotate}. As a small example: \begin{verbatim} #A full annotation run an <- annotate(xs, perfwhm=0.7, cor_eic_th=0.75, ppm=10, polarity="positive") #Generate result peaklist <- getPeaklist(an) #Save results write.csv(peaklist,file="results.csv") \end{verbatim} \subsection{Interpretation of the Results} \label{CAMERAresults} \begin{table}[ht] \begin{center} \begin{tabular}{|c|cc|c|c|l|}\hline id & mz & rt & isotopes & adduct & pc \\ \hline 65 & 176.04 & 280.09 & & & \\ \rowcolor{blue!20} 76 & 136.05 & 280.43 &[14][M+1]1+ & & 5\\ \rowcolor{blue!20} 77 & 135.05 & 280.43 &[14][M]1+ & & 5\\ \rowcolor{blue!20} 74 & 153.06 & 280.43 & &[M+H]+ 152.05437 & 5\\ \rowcolor{blue!20} 75 & 175.04 & 280.43 & &[M+Na]+ 152.05437 & 5\\ \rowcolor{blue!20} 73 & 197.02 & 280.76 & &[M+2Na-H]+ 152.05437 & 5\\ 78 & 377.74 & 286.15 & & & \\ 79 & 732.5 & 286.49 & & & \\ \rowcolor{red!20} 83 & 488.32 & 286.82 & &[M+Na]+ 465.33205 & 7\\ \rowcolor{red!20} 82 & 466.34 & 286.82 & &[M+H]+ 465.33205 & 7\\ ...&&&&&\\ \hline \end{tabular} \end{center} \caption{{\footnotesize{Example of annotation result for one sample. Colums with intensity values are omitted. blue-line: annotated group 5, red-line: annotated group 7}}} \label{tab:int} \end{table} %\rowcolor{blue!20} 1594 & 149.02 & 746.02 & &\small [36] [M+1]1+ & 7 \\ %\rowcolor{red!20} 36 & 150.03 & 746.02 & & & 7 \\ %155 & 205.09 & 745.69 & & & 7 \\ %323 & 279.05 & 745.69 & & & 7 \\ %\rowcolor{blue!20} 359 & 279.16 & 746.02 &\small [M+H]+ 278.15 &\small [419] [M+1]1+ & 7 \\ %\rowcolor{red!20} 419 & 280.16 & 746.02 & & & 7 \\ %459 & 297.06 & 746.02 & & & 7 \\ %\rowcolor{blue!20} 1592 & 301.14 & 746.02 &\small [M+Na]+ 278.15 &\small [1593] [M+1]1+ [426] [M+2]1+ & 7\\ %\rowcolor{red!20} 1593 & 302.14 & 746.02 & & & 7\\ %\rowcolor{red!20} 426 & 303.15 & 746.02 & & & 7\\ %\rowcolor{blue!20} 446 & 317.12 & 745.69 &\small [M+K]+ 278.15 &\small [500] [M+1]1+ & 7\\ %\rowcolor{red!20} 500 & 318.12 & 746.02 & & & 7 \\ %\rowcolor{blue!20} 623 & 363.11 & 746.02 & &\small [642] [M+1]1+ & 7\\ %\rowcolor{red!20} 642 & 364.11 & 746.36 & & & 7 \\ %597 & 381.12 & 746.02 & & & 7 \\ %\rowcolor{blue!20} 756 & 579.29 & 746.02 &\small [2M+Na]+ 278.15 &\small [758] [M+1]1+ [760] [M+2]1+ & 7\\ %\rowcolor{red!20} 758 & 580.29 & 746.02 & & & 7\\ %\rowcolor{red!20} 760 & 581.30 & 745.69 & & & 7 \\ Table \ref{tab:int} shows an example of annotation results. A small cutout of the result table is displayed, the columns with the intensity values are omitted and the rows are ordered by there rt values for better readability. The column \textit{pc} shows the result of the peak correlation based annotation (independent of the annotations \textit{iso} and \textit{adduct}). Peaks with the same label are supposed to belong to the same spectrum. The column \textit{adduct} shows the annotation hypotheses for the ions. The value after the brackets is the estimated molecular mass. The column \textit{isotopes} contains the annotated isotopes for a monoisotopic peak. The values in the first square brackets denote the isotope-group-id(column \textit{id}), the second is the isotope annotation and the number after the brackets is the charge of the isotope. \subsection{Annotation without verification by correlation} A short notice for former \Rpackage{esi} user, this step is now obsolete and not longer supported. All annotations use the peak correlation if possible. % TODO: NYI % \subsection{Visualisation} % % After the inspection of the annotated peaklist one might be interested % in visualising a group of peaks, using the row numbers as indices. % \begin{verbatim} % plot.pspcetra(xs_annotate,idx=c(289,287,314,320)) % \end{verbatim} %}}} % \section{Metabolite Identification} %{{{ % Sooner or later you want to know which metabolites have been % measured. The package supports two approaches with the help of the \Rpackage{Rdisop}: % TODO JOE: Bitte mal deine beiden Funktionen collectIsotopes und von der anderen fällt mir gerade der Name nicht ein, % beschreiben und deren Handling mit Rdisop % \begin{description} % \item{A targeted} search based on a list of compounds or spectra. You % can search for the occurence of any compounds known in e.g. KEGG, % or an in-house library of known spectra. % \item{An untargeted} approach which collects as many isotope patterns % as possible for a compound, and performs a \denovo{} % interpretation. Optionally followed by a check whether the compound % is known in e.g. PubChem. % \end{description} % % \subsection{Compound library lookup} % %If you have a set of metabolites of interest, you might want to know %whether they are present in an acquired profile run, and if so, their %intensities. % %The best source for {\em recognition} would be a library of known %spectra obtained specifically for your experimental setting. This %might even include retention times. The second best approach is to %calculate a set of expected pseudo-spectra resembling a molecule of %interest. % %Finally, the (pseudo-) spectra have to be matched against the XCMS %peak list. % %\subsection{\denovo{} interpretation} % %The \denovo{} identification is based on the package \Rpackage{Rdisop}. %Based on the exact mass and optionally the intensities of the isotopes, %hypotheses for the molecular formula can be calculated: % %<<>>= %library(Rdisop) %decomposeMass(46.042) %@ % %However, this works only for the molecular mass, and not the ions %measured in ESI-MS. So we first annotate the xcmsSet, and obtain %the adduct and isotope list: % %<<>>= %# library(xcms) %# library(esi) %# cdfpath <- system.file("cdf", package = "faahKO") %# cdffiles <- list.files(cdfpath, recursive = TRUE,full=T) % %# xset <- xcmsSet(cdffiles,snthresh=3,max=10) %# xsg <- group(xset) %# xsg <- retcor(xsg) %# xsg <- group(xsg,bw=10) % %# xsann <- annotate(xsg) %@ % %We now extract those peaks, for which we can guesstimate the [M] Mass, %and collect their isotope peaks: % %<<>>= %# createPseudoSpectra(xs, xsann) %@ % %For some of the monoisotopic peaks we might not have the higher %isotope peaks. So optionally we check back in the raw data if they %actually do exist: % %<<>>= %# fillSpectrumPeaks(xs, expectedPeaks) %@ % % %If they are known adducts, we should remove these adducts first: % %<<>>= %# subMolecule(, adduct) %@ % %Now we are ready to ask \Rpackage{Rdisop} for the molecular formulae: % %<<>>= %# decomposeIsotope(isotopesMatrix) %@ % %Optionally, we can check whether these molecule are known in some %compound library. A few lookup-functions are predefined: % %<<>>= %# lookupPubchem() %@ % %and also % %<<>>= %# library(KEGGSOAP) %# lookupKEGG() %@ % %You might now ant to use these compounds as legend for your exported %data matrix, in your clustered heatmap, or whereever. % %}}} \subsection{Visualisation of the Results} For a graphical presentation of the annotation result CAMERA provides the function \Rfunction{plotEICs} to visualize the raw data and the function \Rfunction{plotPsSpectrum} to plot all peaks of a speudospectrum. The next examples shows the use of both functions. <>= library(CAMERA) file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") xs <- xcmsSet(file, method="centWave",ppm=30, peakwidth=c(5,10)) an <- xsAnnotate(xs) an <- groupFWHM(an) an <- findAdducts(an, polarity="positive") plotEICs(an, pspecIdx=2, maxlabel=5) @ \begin{figure} \begin{center} \includegraphics{CAMERA-EICPspec1} \end{center} \caption{\label{EICpspec1} EICs.} \end{figure} In figure \ref{EICpspec1} you see the EICs of all peaks from one pseudospectrum. With this plot you can manual check if the grouping makes sense. In the other figure \ref{EICpspec2} you see a typical $m/z$ plot, with labelled, annotated peaks. <>= plotPsSpectrum(an, pspec=2, maxlabel=5) @ \begin{figure} \begin{center} \includegraphics{CAMERA-Pspec1} \end{center} \caption{\label{EICpspec2} Spectra.} \end{figure} % \section{Small Collection of helpful functions} % \subsection{Pspeudospectra} % \label{getpseudospectra} % \section{Examples using CAMERA test dataset} % \label{section:example} % %{{{ % % \textbf{Example 1 } % Fast annotation without further using of xsAnnotate of the MM14 dataset. % \begin{verbatim} % library(CAMERA) % file <- system.file('mzdata/MM14.mzdata', package = "CAMERA") % xs <- xcmsSet(file,method="centWave",ppm=30,peakwidth=c(5,10)) % an <- annotate(xs) % peaklist <- getPeaklist(an) % write.csv(peaklist,'/tmp/mm14.csv') % \end{verbatim} % %xset <- xcmsSet(cdffiles,snthresh=6,method="centWave",ppm=30,peakwidth=c(5,10)) % There are 126 peaks in 10 groups, of which 48 peaks get isotope annotations and 25 % peaks are annotated as adducts. {length( which(peaklist[,"adduct"]!=""))} % % \bigskip % \textbf{Example 2 } % Annotation with exact use of an xsAnnotate object. % \begin{verbatim} % library(CAMERA) % cdfpath <- system.file("cdf", package = "faahKO") % cdffiles <- list.files(cdfpath, recursive = TRUE,full=T) % % xset <- xcmsSet(cdffiles,snthresh=3,max=10) % xsg <- group(xset) % xsg <- retcor(xsg) % xsg <- group(xsg,bw=10) % % #create xsAnnotate object % xanno<-xsAnnotate(xsg,sample=1,category="WT") % % #group according to retention time % xanno<-groupFWHM(xanno) % % #check grouping with EIC correlation, when indicated regroup % xanno<-groupCorr(xanno) % % #search for isotopes % xanno<-findIsotopes(xanno) % % #calculate possible adducts % xanno<-findAdducts(xanno,polarity="positive") % % #get annotated peaklist % an<-getPeaklist(xanno) % write.csv(an,'/tmp/faah-an2.csv') % \end{verbatim} % There are 1829 peaks in 221 groups, of which 126 peaks get isotope annotations and % 126 peaks are annotated as adducts. %\bigskip %\textbf{Example 3 } %Annotation with preceding EIC extraction. The most comprehensive %verification method is used and the peak-correlation-only based %annotations can be computed. %\begin{verbatim} % library(esi) % cdfpath <- system.file("cdf", package = "faahKO") % cdffiles <- list.files(cdfpath, recursive = TRUE,full=T) % % xset <- xcmsSet(cdffiles,snthresh=3,max=10) % xsg <- group(xset) % xsg <- retcor(xsg) % xsg <- group(xsg,bw=10) % % an <- annotate(xsg, gcmfrac=.4, cor_exp_th=0.5, cor_eic_th=0.7, na.ok=F,pc=TRUE) % write.csv(an$annotated,'/tmp/faah-an3-gc.csv') %\end{verbatim} %More verifications are performed, 200 peaks get isotope annotations %and 221 peaks are annotated as adducts/fragments. 76 peak correlation %groups are found. %}}} \begin{thebibliography}{t1} \bibitem{annobird07} Ralf Tautenhahn, Christoph B\"ottcher, Steffen Neumann : Annotation of LC/ESI--MS Mass Signals, BIRD 2007 Proc. of BIRD 2007 -- 1st International Conference on Bioinformatics Research and Development, 2007. \url{http://www.springerlink.com/content/473l404001787974/} and \url{http://msbi.ipb-halle.de/~rtautenh/bird07.pdf} %\bibitem{disopwabi06} B\"ocker \end{thebibliography} \end{document}