%\VignetteIndexEntry{RMAGEML}
%\VignetteDepends{RMAGEML}
%\VignetteKeywords{RMAGEML}
%\VignettePackage{RMAGEML}
\documentclass[12pt]{article}

\usepackage{amsmath,epsfig,fullpage}
\usepackage{hyperref}
\usepackage{url}
\usepackage[authoryear,round]{natbib}

\newcommand{\scscst}{\scriptscriptstyle}
\newcommand{\scst}{\scriptstyle}

\author{Steffen Durinck$^\ddagger$\footnote{Steffen.Durinck@esat.kuleuven.ac.be},
Joke Allemeersch$^\ddagger$\footnote{Joke.Allemeersch@esat.kuleuven.ac.be},
Vincent J Carey$^\P$, Yves Moreau$^\ddagger$,\\
Bart De Moor$^\ddagger$}

\begin{document}
\title{Documentation of the RMAGEML package}
\maketitle

\begin{center}
$^\ddagger$Department of Electrical Engineering, ESAT-SCD, K.U.Leuven,\\
Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium,
\url{http://www.esat.kuleuven.ac.be/~dna/BioI}\\
and $^\P$Channing Laboratory, Brigham and Women's Hospital,
75 Francis Street, Boston 02115, USA
\end{center}

%library(tools)
%Rnwfile<- file.path("/home/steffen/programming/R/MAGEML/inst/doc/RMAGEML.Rnw")
%Sweave(Rnwfile,pdf=TRUE,eps=TRUE,stylepath=TRUE,driver=RweaveLatex())

\tableofcontents
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

MAGE-ML, the Microarray Gene Expression Markup Language, is a language
designed to describe and exchange information about microarray experiments.
MAGE-ML is based on XML and can describe microarray designs, microarray
experiment setups, gene expression data, and data analysis results.\\
This package provides the link between MAGE-ML files and BioConductor.
It reads MAGE-ML files that describe cDNA microarray experiments.
The functions convert the MAGE-ML files into the customary BioConductor
objects (i.e., \texttt{marrayLayout}, \texttt{marrayInfo} and
\texttt{marrayRaw} objects or limma \texttt{RGList} objects).\\
\

\noindent Here we give a short introduction to the Microarray and
GeneExpression Object Model (MAGE-OM) and describe how we implemented the
extraction of the information needed to build BioConductor objects.
For a full description of MAGE-OM, we refer to the Gene Expression
Specification: \url{http://www.omg.org/cgi-bin/doc?formal/03-02-03}.\\
The main classes of the MAGE object model are BioSequence, QuantitationType,
ArrayDesign, DesignElement, Array, BioMaterial, BioAssay, BioAssayData,
Experiment, HigherLevelAnalysis, Protocol, Description, AuditAndSecurity,
Measurement, and BioEvent.\\
In MAGE-ML these translate into packages with the same names. The packages
needed for building BioConductor objects are BioAssayData, BioAssay,
BioMaterial, BioSequence, ArrayDesign, and DesignElement.\\
The \textsf{DesignElement} package contains a mapping of
\textsf{\emph{Features}}, which are the actual features present on the array,
to \textsf{\emph{Reporters}}, which identify what each feature represents.
The \textsf{DesignElement} package also provides a mapping from
\textsf{\emph{Reporters}} to their corresponding \textsf{\emph{BioSequence}}
references. These \textsf{\emph{BioSequence}} objects are characterized by
their name and database entries in the \textsf{BioSequence} package.
The \textsf{ArrayDesign} package contains information on the layout of the
array. From this package, we can derive the position of each
\textsf{\emph{Feature}} on the array in terms of its \textsf{\emph{Zone}}
(block or grid) and its row and column within that \textsf{\emph{Zone}}.
The \textsf{BioAssayData} package describes the feature references that were
assayed and the measured and derived \textsf{\emph{QuantitationTypes}}.
The \textsf{BioAssay} package describes the different steps in the microarray
experiment.
The last package used to make BioConductor objects is the
\textsf{BioMaterial} package, which describes how a sample is treated to
obtain, for example, the labeled samples used for hybridization.\\

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Prerequisites}{
The RMAGEML package depends on SJava ($\geq$ 0.68) and a Java VM,
e.g. j2resdk1.4.0.\\
Other dependencies, such as the Java-MAGEstk API and Java Xerces, are
included in the package itself.\\
}

\section{Getting started}

{\bf Installing the package.} The package can be installed as a normal R
package: download the RMAGEML\_2.1.0.tar.gz package and, under Unix, use the
command\\
\

\noindent{\tt R CMD INSTALL RMAGEML\_2.1.0.tar.gz}.\\
\

\noindent The equivalent command for Windows is\\
\

\noindent{\tt Rcmd INSTALL RMAGEML\_2.1.0.zip}.\\
\

\noindent The package automatically loads the Biobase and marrayInput
packages from BioConductor and the SJava libraries, so these should be
installed as well.\\
\

\noindent {\bf Starting R.} The RMAGEML package uses SJava, which requires
the LD\_LIBRARY\_PATH environment variable to be set before R is started.\\
Without this variable the package will not work.\\

\noindent {\bf Loading the package.} You can load the package into R by
typing\\
<<>>=
## load up the library
library(RMAGEML)
@

\section{Import to marray packages}
\subsection{One step import and creation of an marrayRaw object from MAGE-ML files}

In the marray packages of BioConductor the design of an array experiment is
typically described by an \texttt{marrayLayout} and an \texttt{marrayInfo}
object. The function \texttt{importMAGEML} parses all MAGE-ML files present in
the directory that is passed to it as a parameter.
From these files it creates an \texttt{marrayLayout} object, containing the
layout of one type of microarray, and an \texttt{marrayInfo} object containing
the gene names and database entries of the features spotted on the array.
The name of the database to which the entries refer is given in the `notes'
slot of the Gnames object. Next, the function extracts the raw data values
and returns a complete \texttt{marrayRaw} object.\\
\

\noindent The function can be tested on the MEXP-14 dataset. This example is
available from ArrayExpress at \url{http://www.ebi.ac.uk/arrayexpress/}.\\
If one knows which \emph{DesignElement Dimension, QuantitationType Dimension}
and \emph{Quantitation Types} are required, the import function can be used
as:
<<>>=
## create marrayRaw object
datadir <- system.file("MAGEMLdata", package="RMAGEML")
raw <- importMAGEML(directory = datadir, package = "marray",
                    arrayID = "A-MEXP-14", DED = "DED:707", QTD = "QTD:707",
                    name.Rf = "QT:F635 Mean", name.Rb = "QT:B635 Median",
                    name.Gf = "QT:F532 Mean", name.Gb = "QT:B532 Median")
print(raw)
@

\noindent If, however, you do not know which \emph{DesignElement Dimension,
QuantitationType Dimension} and \emph{Quantitation Types} to use, you can call
the function as follows:
<<>>=
## create marrayRaw object, choosing the dimensions interactively
datadir <- system.file("MAGEMLdata", package="RMAGEML")
if (interactive()) {
  raw <- importMAGEML(directory = datadir, package = "marray")
}
@

\noindent This will generate a few selection panels which allow selection of
the appropriate \emph{DesignElement Dimension, QuantitationType Dimension} and
\emph{Quantitation Types}.

\subsection{Creation of a Gnames marrayInfo object}

If one just wants to make an \texttt{marrayInfo} object containing the gene
names and database identifiers of the spotted features, the function
\texttt{getGnames} can be used.
<<>>=
## obtain an marrayInfo object containing the database identifiers
## of the features present on the array
data <- system.file("MAGEMLdata", package="RMAGEML")
mageom <- importMAGEOM(directory = data)
getGnames(mageom, arrayID = "A-MEXP-14", DED = "DED:707", package = "marray")
@

\noindent Again, leaving out the `DED' parameter will cause selection panels
to pop up displaying the available \emph{DesignElement Dimensions}.

\subsection{Creation of an marrayLayout object}

In the marray packages the information on the array layout is stored in an
\texttt{marrayLayout} object, which can be created by the
\texttt{getArrayLayout} function.
<<>>=
## obtain an marrayLayout object describing the layout of the array
data <- system.file("MAGEMLdata", package="RMAGEML")
mageom <- importMAGEOM(directory = data)
getArrayLayout(mageom, arrayID = "A-MEXP-14", DED = "DED:707")
@

\subsection{Make an marrayRaw object}

The function \texttt{makeMarrayRaw} takes a Gnames and a Layout object and
parameters corresponding to the \emph{DesignElement Dimension,
QuantitationType Dimension} and \emph{Quantitation Types} to create an
\texttt{marrayRaw} object.
<<>>=
## build the Gnames and Layout objects, then create the marrayRaw object
data <- system.file("MAGEMLdata", package="RMAGEML")
mageom <- importMAGEOM(directory = data)
gnames <- getGnames(mageom, arrayID = "A-MEXP-14", DED = "DED:707",
                    package = "marray")
layout <- getArrayLayout(mageom, arrayID = "A-MEXP-14", DED = "DED:707")
raw <- makeMarrayRaw(mageOM = mageom, layout = layout, gnames = gnames,
                     directory = data, arrayID = "A-MEXP-14",
                     DED = "DED:707", QTD = "QTD:707",
                     name.Rf = "QT:F635 Mean", name.Rb = "QT:B635 Median",
                     name.Gf = "QT:F532 Mean", name.Gb = "QT:B532 Median")
@

\section{Import to limma package}
\subsection{One step import and creation of a limma RGList object from MAGE-ML files}

In the limma package of BioConductor the raw data is stored in an
\texttt{RGList} object.
The function \texttt{importMAGEML} parses all MAGE-ML files present in the
directory that is passed to it as a parameter. From these files it creates the
\texttt{RGList} object, containing the layout, gene names and database entries
of the features spotted on the array and the foreground and background
intensities for the green and red channels.\\

\noindent The function can be tested on the MEXP-14 dataset. This example is
available from ArrayExpress at \url{http://www.ebi.ac.uk/arrayexpress/}.\\
The same function that imports MAGE-ML into the marray packages can be used
for limma; just set the `package' parameter to limma as follows:
<<>>=
## create RGList object
datadir <- system.file("MAGEMLdata", package="RMAGEML")
raw <- importMAGEML(directory = datadir, package = "limma",
                    arrayID = "A-MEXP-14", DED = "DED:707", QTD = "QTD:707",
                    name.Rf = "QT:F635 Mean", name.Rb = "QT:B635 Median",
                    name.Gf = "QT:F532 Mean", name.Gb = "QT:B532 Median")
print(raw)
@

\noindent Similarly, if one specifies only the `directory' and the `package',
selection panels will pop up to select the \emph{DesignElement Dimension,
QuantitationType Dimension} and \emph{Quantitation Types}.

\subsection{Creating the genes dataframe of an RGList object}

In limma the gene names, gene identifiers and layout information are stored
in a dataframe, which can be created by the \texttt{getArrayLayoutLimma}
function.
<<>>=
## obtain the genes dataframe (layout, gene identifiers and gene names)
data <- system.file("MAGEMLdata", package="RMAGEML")
mageom <- importMAGEOM(directory = data)
genes <- getArrayLayoutLimma(mageom, arrayID = "A-MEXP-14", DED = "DED:707")
print(genes[1:10,])
@

\subsection{Make an RGList object}

The function \texttt{makeRG} takes a genes dataframe (containing the layout,
gene identifiers and gene names) and parameters corresponding to the
\emph{DesignElement Dimension, QuantitationType Dimension} and
\emph{Quantitation Types} to create a limma \texttt{RGList} object.
<<>>=
## build the genes dataframe, then create the RGList object
data <- system.file("MAGEMLdata", package="RMAGEML")
mageom <- importMAGEOM(directory = data)
genes <- getArrayLayoutLimma(mageom, arrayID = "A-MEXP-14", DED = "DED:707")
raw <- makeRG(mageOM = mageom, genes = genes, directory = data,
              arrayID = "A-MEXP-14", DED = "DED:707", QTD = "QTD:707",
              name.Rf = "QT:F635 Mean", name.Rb = "QT:B635 Median",
              name.Gf = "QT:F532 Mean", name.Gb = "QT:B532 Median")
@

\end{document}