---
title: "Analyzing Lipid Feature Tendencies with LipidTrend"
package: LipidTrend
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 3
    number_sections: true
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{LipidTrend}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r style, echo=FALSE, results='asis'}
BiocStyle::markdown(css.files = c('custom.css'))
knitr::opts_chunk$set(collapse=TRUE, comment="#>")
```
# Introduction
`LipidTrend` is an R package designed to identify statistically significant
differences in lipidomic feature-level trends between groups. It supports both
one-dimensional and two-dimensional analyses of continuous lipid features (e.g.,
chain length, double bond count).
The package includes three main functions:
1. **analyzeLipidRegion()** – Performs the core statistical analysis to detect
lipid feature trends using Gaussian kernel smoothing.
2. **plotRegion1D()** – Visualizes trend analysis results for one-dimensional
lipid features.
3. **plotRegion2D()** – Visualizes trend analysis results for two-dimensional
lipid features.
In addition to these core functions, several helper functions are available to
facilitate the exploration and extraction of results from the returned
`LipidTrendSE` object.
For more details, please refer to the [Helper Functions section](#helper).
# Installation
To install `LipidTrend`, ensure that you have R 4.5.0 or later installed
(see the R Project at [http://www.r-project.org](http://www.r-project.org))
and are familiar with its usage.
The `LipidTrend` package is available from the Bioconductor repository
[http://www.bioconductor.org](http://www.bioconductor.org).
Before installing `LipidTrend`, you must first install the core Bioconductor
packages. If you have already installed them, you can skip the following step.
```{r install_Bioconductor, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install()
```
Once the core Bioconductor packages are installed, you can proceed with
installing `LipidTrend`.
```{r install_package, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("LipidTrend")
```
Once the installation is complete, load the package to start using
`LipidTrend`.
```{r load, message=FALSE}
library(LipidTrend)
```
# Data preparation
`LipidTrend` requires a SummarizedExperiment object as input data. It must
contain the following components:
1. **Assay**: A numeric matrix representing lipid abundance values,
    where each row corresponds to a lipid species and each column to a sample.
    Please ensure the values meet the following requirements:
    * Preprocessing required: Before using `LipidTrend`, the abundance data
        must undergo preprocessing to address missing or noisy values. This
        preprocessing should include filtering, imputation, and normalization.
    * No transformation needed: Abundance values should not undergo any
        transformations (e.g., log or square root transformations) before
        analysis.
    * Value constraints: Zero values are acceptable; however, missing values
        (NA) are not supported and must be handled during preprocessing.
2. **RowData**: A data frame containing lipid feature information
    (e.g., double bond count, chain length, or other continuous variables),
    where each row corresponds to a lipid species and each column to a
    specific lipid feature.
    * The number and order of rows must exactly match those in the abundance
        matrix (Assay).
    * This component must include either one or two numeric feature columns:
        * One column enables one-dimensional trend analysis.
        * Two columns enable two-dimensional trend analysis.
3. **ColData**: A data frame containing metadata for each sample.
    * Each row must correspond to a sample, matching the column order of
        the Assay matrix.
    * The columns must be arranged in the following order:
        * `sample_name`: A unique identifier for each sample.
        * `label_name`: A display name used for plotting or grouping.
        * `group`: The experimental condition or biological group associated
            with each sample.
If you are already familiar with constructing a SummarizedExperiment object,
you can skip the following section. Otherwise, refer to the example in the rest
of this section to learn how to build a SummarizedExperiment object.
## Abundance data
The abundance data is a matrix containing lipid abundance values across lipids
and samples, where rows represent lipids and columns represent samples.
```{r abundance}
# load example abundance data
data("abundance_2D")
# view abundance data
head(abundance_2D, 5)
```
## Lipid characteristic table
The lipid characteristic table is a data frame containing information about
each lipid's characteristics, such as the number of double bonds and chain
length.
The order of the lipids in this table must align with the abundance data.
A one-dimensional analysis will be conducted if the table has only one column,
and a two-dimensional analysis will be performed if it contains two columns.
The table can only have a maximum of two columns.
In this example, we use data suitable for a two-dimensional analysis.
```{r char_table}
# load example lipid characteristic table (2D)
data("char_table_2D")
# view lipid characteristic table
head(char_table_2D, 5)
```
## Group information table
The group information table is a data frame containing grouping details
corresponding to the samples in the lipid abundance data. It must adhere to the
following requirements:
1. The columns must be arranged in the following order: `sample_name`,
`label_name`, and `group`.
2. All sample names must be unique.
3. The sample names in the `sample_name` column must match those in the lipid
abundance data.
4. The columns `sample_name`, `label_name`, and `group` must not contain
missing values (NA).
For example:
```{r groupInfo}
# load example group information table
data("group_info")
# view group information table
group_info
```
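The requirements for the three input tables can be checked before building the SummarizedExperiment object. The following is a minimal base-R sketch, not part of `LipidTrend`: `check_lipid_inputs()` is a hypothetical helper, and the toy tables exist only to make the example self-contained.

```r
# Hypothetical helper (not a LipidTrend function): stops with an error if any
# of the input requirements described in this section is violated.
check_lipid_inputs <- function(abundance, char_table, group_info) {
    stopifnot(
        is.matrix(abundance), is.numeric(abundance),
        !anyNA(abundance),                      # NA values are not supported
        nrow(char_table) == nrow(abundance),    # one feature row per lipid
        ncol(char_table) %in% c(1, 2),          # 1D or 2D analysis only
        all(vapply(char_table, is.numeric, logical(1))),
        identical(colnames(group_info)[1:3],
                  c("sample_name", "label_name", "group")),
        !anyDuplicated(group_info$sample_name), # sample names must be unique
        identical(as.character(group_info$sample_name), colnames(abundance)),
        !any(is.na(group_info[, 1:3]))
    )
    invisible(TRUE)
}

# Toy inputs: 3 lipids x 4 samples, two feature columns (2D analysis)
abundance <- matrix(
    c(10, 0, 5, 12, 3, 8, 9, 1, 6, 15, 2, 7), nrow=3,
    dimnames=list(c("PC 32:0", "PC 34:1", "PC 36:2"),
                  c("ctrl_1", "ctrl_2", "case_1", "case_2")))
char_table <- data.frame(Total.C=c(32, 34, 36), Total.DB=c(0, 1, 2))
group_info <- data.frame(
    sample_name=colnames(abundance),
    label_name=c("Ctrl 1", "Ctrl 2", "Case 1", "Case 2"),
    group=c("ctrl", "ctrl", "case", "case"))

ok <- check_lipid_inputs(abundance, char_table, group_info)
```

Running a check like this first tends to give clearer error messages than a failure later in the analysis.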
## Constructing SummarizedExperiment object
Once the abundance data, lipid characteristic table, and group information
table are prepared, we can construct the input SummarizedExperiment object.
We will use the `SummarizedExperiment` function from
`r Biocpkg("SummarizedExperiment")`.
Follow the command below to create this object.
```{r build_se}
se_2D <- SummarizedExperiment::SummarizedExperiment(
    assays=list(abundance=abundance_2D),
    rowData=S4Vectors::DataFrame(char_table_2D),
    colData=S4Vectors::DataFrame(group_info))
```
# Initiate `LipidTrend` workflow
The `LipidTrend` workflow starts with a SummarizedExperiment object as input.
It supports both one-dimensional (1D) and two-dimensional (2D) lipid
feature analyses.
Based on the number of feature columns provided in the `rowData` (e.g.,
chain length, double bond count), the function automatically performs
either 1D or 2D trend detection.
After statistical computation and visualization, the workflow returns:
* A result table summarizing statistical significance, fold change, and trend
direction.
* One or more plots highlighting significant lipid regions with group-specific
abundance trends.
This streamlined workflow enables researchers to identify structured lipidomic
patterns across feature dimensions with statistical rigor and biological
interpretability.
We recommend calling `set.seed()` before starting so that the results of the
permutation process are reproducible across runs.
```{r set_seed}
set.seed(1234)
```
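To see why the seed matters, consider how a permutation shuffles group labels. The sketch below uses only base R and made-up labels; it illustrates the general behavior of `set.seed()` and `sample()`, not the internals of `LipidTrend`.

```r
labels <- c("ctrl", "ctrl", "ctrl", "case", "case", "case")

set.seed(1234)
first_run <- sample(labels)    # one random shuffle of the group labels

set.seed(1234)
second_run <- sample(labels)   # same seed, so the same shuffle

identical(first_run, second_run)  # TRUE
```

Without resetting the seed, the two shuffles would generally differ, and so would any permutation-based p-values derived from them.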
## One-dimensional analysis {#lipid1d}
One-dimensional analysis is applied when the input dataset contains a single
continuous lipid feature, such as chain length or double bond count.
This approach is ideal when the biological question centers on one specific
biochemical property of lipids or when only one type of feature annotation
is available.
Compared to two-dimensional analysis, the one-dimensional approach is more
straightforward to interpret and requires less data completeness. It is
particularly suitable in the following scenarios:
* The study focuses on a specific lipid characteristic (e.g., elongation of
chain length or degree of desaturation).
* Only one lipid feature (e.g., double bond count) is annotated or reliably
available in the dataset.
* Visualizing lipid trends along a single axis sufficiently addresses the
biological hypothesis.
To begin, we will first examine the structure of the example input data to
ensure it is correctly formatted for one-dimensional analysis.
```{r LipidTrend_1D}
# load example data
data("lipid_se_CL")
# quick look at the SE structure
show(lipid_se_CL)
```
### Analyze lipid region
**Overview of Region-Based Trend Analysis**
The `analyzeLipidRegion()` function performs region-based statistical analysis
of lipidomic features, integrating both marginal testing and permutation-based
smoothed testing to identify meaningful trends across continuous lipid features
(e.g., chain length, double bond count).
This two-stage approach enhances statistical power and robustness, especially in
small-sample datasets:
1. **Marginal Test**: Each lipid feature is first tested individually using
either a t-test (with glog10 transformation) or a Wilcoxon test. This step
yields a marginal statistic and a corresponding marginal p-value for each
feature.
2. **Region-Based Permutation Test with Smoothing**: The resulting vector of
marginal statistics is then smoothed using a Gaussian kernel, which integrates
information from neighboring lipid features based on their similarity (e.g.,
proximity in chain length). A smoothed statistic is computed for each lipid,
and an empirical p-value is derived via permutation testing by comparing the
observed statistic to a null distribution.
**Note:**
- If `test="t.test"` (default), abundance values are internally transformed
  using the glog10 transformation before testing.
- If `test="Wilcoxon"`, we recommend setting `permute_time` to fewer than
  10,000 to maintain a reasonable runtime, because the Wilcoxon test is more
  computationally expensive under repeated permutation.
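The two stages described above can be illustrated with a minimal base-R sketch on simulated data. It is not the package's implementation: `log10(x + 1)` stands in for the glog10 transformation, `bandwidth` is a hypothetical kernel width rather than the `radius` or `own_contri` arguments, and the helpers `marginal_stat()` and `gaussian_kernel()` are made up for illustration.

```r
set.seed(1)
chain <- 30:41                               # one feature value per lipid
group <- rep(c("ctrl", "case"), each=4)      # 8 samples
abund <- matrix(rlnorm(length(chain) * 8, meanlog=5), nrow=length(chain))
# simulate an increase over a contiguous region (chain 34-37) in case samples
abund[chain %in% 34:37, group == "case"] <-
    3 * abund[chain %in% 34:37, group == "case"]

# Stage 1: marginal t statistic per lipid (assumed log10(x + 1) transform)
marginal_stat <- function(x, g) {
    apply(log10(x + 1), 1, function(v) {
        unname(t.test(v[g == "case"], v[g == "ctrl"])$statistic)
    })
}

# Stage 2: Gaussian kernel weights from pairwise feature distance
gaussian_kernel <- function(feature, bandwidth=1.5) {
    d <- as.matrix(dist(feature))
    k <- exp(-d^2 / (2 * bandwidth^2))
    k / rowSums(k)                           # each row sums to 1
}

K <- gaussian_kernel(chain)
obs <- as.vector(K %*% marginal_stat(abund, group))

# Null distribution: recompute smoothed statistics under shuffled labels
n_perm <- 200
null <- replicate(n_perm,
    as.vector(K %*% marginal_stat(abund, sample(group))))

# Empirical two-sided p-value per lipid
p_emp <- (1 + rowSums(abs(null) >= abs(obs))) / (n_perm + 1)
```

Because each smoothed statistic borrows strength from neighboring chain lengths, a run of moderately shifted lipids can stand out even when no single lipid is significant on its own.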
**Split-Chain Analysis for Chain Length Features**
To enhance the biological interpretability of chain length–related trends,
`analyzeLipidRegion()` provides an option to analyze even-chain and odd-chain
lipids separately via the `split_chain` parameter.
Enabling this option allows the function to separate lipids based on chain
length parity (even vs. odd) and perform region-based statistical testing
independently for each group. This is particularly beneficial when distinct
biosynthetic or regulatory patterns are expected between even-
and odd-chain lipid species.
To activate this feature, configure the `split_chain` and `chain_col`
parameters as follows:
* If `split_chain=TRUE`:
    * The function performs separate analyses for even- and odd-chain lipids.
    * Specify the column name containing chain length information using
        the `chain_col` parameter.
* If `split_chain=FALSE` (default):
    * All lipids will be analyzed together without parity distinction.
    * Set `chain_col=NULL`.
**Recommendation:**
Set `split_chain=TRUE` when analyzing features such as chain length, as it
often leads to more meaningful biological insights.
**Note:**
- If fewer than two lipids are present in either the even or odd group,
analysis for that group will be skipped, and a warning will be issued.
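The parity split itself is simple to picture. The sketch below uses a made-up feature table in base R; the column name `chain` plays the role of `chain_col`, and the code is an illustration rather than the package's internal logic.

```r
# Toy feature table; `chain` holds the total chain length of each lipid
features <- data.frame(
    lipid=c("TG 48:1", "TG 49:1", "TG 50:2", "TG 51:2", "TG 52:3"),
    chain=c(48, 49, 50, 51, 52))

# Label each lipid by chain-length parity and split the table
parity <- ifelse(features$chain %% 2 == 0, "even", "odd")
by_parity <- split(features, parity)

# Groups with fewer than two lipids would not be analyzed
analysable <- names(by_parity)[vapply(by_parity, nrow, integer(1)) >= 2]
analysable
```

Here both groups contain at least two lipids, so both would be analyzed.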
**Abundance-Weighted vs. Unweighted Statistics**
To reflect the biological importance of more abundant lipids, we provide an
option controlled by the `abund_weight` parameter, which allows for weighting
region statistics based on the average abundance of each lipid species.
* When `abund_weight=TRUE` (default), the marginal test statistic is scaled by
each lipid’s normalized average abundance during the smoothing step. This
emphasizes biologically dominant signals while down-weighting low-abundance,
potentially noisy lipids.
* When `abund_weight=FALSE`, all lipids are treated equally, regardless of
their abundance. In this case, the region statistic is calculated solely based
on test statistics and feature similarity.
This flexibility allows users to choose between:
* A weighted strategy that highlights strong, dominant lipid shifts
(recommended for most biological datasets).
* An unweighted strategy that gives equal weight to all lipids, which may be
suitable for hypothesis-driven or targeted studies.
**Note:**
- Abundance weighting is applied only during the smoothing and permutation
steps; it does not affect the initial marginal testing.
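The difference between the two strategies can be sketched with a few made-up numbers. The normalization below (dividing by the total abundance) and the kernel width are assumptions chosen for illustration; the package may scale the weights differently.

```r
marginal_t <- c(2.1, 2.4, 1.9, -0.3, 0.2)   # toy marginal statistics
avg_abund  <- c(500, 20, 450, 300, 10)      # toy mean abundance per lipid
chain      <- c(32, 34, 36, 38, 40)

# Gaussian kernel on chain-length distance (hypothetical width of 2)
d <- as.matrix(dist(chain))
K <- exp(-d^2 / (2 * 2^2))

# Assumed normalization: abundance as a fraction of the total
w <- avg_abund / sum(avg_abund)

# Weighted: abundant lipids dominate their neighborhood
smooth_weighted <- as.vector(K %*% (w * marginal_t)) / as.vector(K %*% w)
# Unweighted: only feature similarity determines the contribution
smooth_unweighted <- as.vector(K %*% marginal_t) / rowSums(K)

round(cbind(chain, smooth_weighted, smooth_unweighted), 3)
```

In the weighted version, the low-abundance lipids at chain lengths 34 and 40 contribute little to their neighbors' smoothed statistics.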
```{r countRegion1D}
# run analyzeLipidRegion
res1D <- analyzeLipidRegion(
    lipid_se_CL, ref_group="sgCtrl", split_chain=TRUE,
    chain_col="chain", radius=3, own_contri=0.5, test="t.test",
    abund_weight=TRUE, permute_time=1000)
# view result summary
show(res1D)
```
The `analyzeLipidRegion()` function produces an extended SummarizedExperiment
object called `LipidTrendSE`. To facilitate result extraction, we offer several
helper functions that make viewing the resulting data frame easier. For more
information on these helper functions, please refer to the
[Helper Functions section](#helper).
```{r Region1D_result}
# view even chain result (first 5 lines)
head(even_chain_result(res1D), 5)
# view odd chain result (first 5 lines)
head(odd_chain_result(res1D), 5)
```
### Result visualization
This section demonstrates how to visualize the results from the `LipidTrendSE`
object returned by the `analyzeLipidRegion()` function.
**Note:**
- If `split_chain=TRUE` was set in `analyzeLipidRegion()`, two separate plots
will be generated: one for even-chain lipids, and one for odd-chain lipids.
- If `split_chain=FALSE`, only a single plot will be returned, showing all
lipids together.
```{r plotRegion1D}
# plot result
plots <- plotRegion1D(res1D, p_cutoff=0.05, y_scale='identity')
# even chain result
plots$even_result
# odd chain result
plots$odd_result
```
The visualization illustrates lipid trends and includes the following
components:
1. **Blue/Red Ribbons** – Highlight regions where the trend significantly
differs between groups, as determined by the smoothed permutation test.
2. **Points** – Represent the mean abundance within each group (case vs.
control) for each lipid feature.
**Color Interpretation:**
* Red regions indicate a significant increase in lipid abundance in the case
group compared to the control group.
* Blue regions indicate a significant decrease in lipid abundance in the case
group compared to the control group.
These visualizations highlight not only the magnitude of lipid abundance
changes but also the specific feature-level regions (e.g., chain length)
where group differences are most pronounced.
## Two-dimensional analysis {#lipid2d}
Two-dimensional analysis is applicable when the input dataset includes two
continuous lipid features, such as chain length, double bond count, or other
numeric lipid characteristics. Compared to one-dimensional analysis,
2D analysis enables the detection of more complex patterns by simultaneously
evaluating lipid trends across two biochemical axes.
This method is particularly well-suited for the following scenarios:
* The dataset contains two well-annotated lipid features suitable for trend
analysis (e.g., chain length, double bond count, or other structural indices).
* The biological question involves joint effects or interactions between
lipid properties—such as the enrichment of long-chain polyunsaturated species.
* The objective is to identify localized regions within the 2D feature space
that contribute to group differences.
* There is a need for more detailed and spatial visualization of
lipid trend patterns.
Two-dimensional analysis provides a high-resolution map of lipid changes,
allowing for the identification of specific combinations of features (e.g.,
long-chain saturated vs. short-chain unsaturated lipids) that may
indicate pathway-level regulation.
Let’s now take a quick look at the structure of the example input data
used for 2D analysis.
```{r LipidTrend_2D}
# load example data
data("lipid_se_2D")
# quick look at the SE structure
show(lipid_se_2D)
```
### Analyze lipid region
**Overview of Region-Based Trend Analysis**
The `analyzeLipidRegion()` function performs region-based statistical analysis
of lipidomic features, integrating both marginal testing and permutation-based
smoothed testing to identify meaningful trends across continuous lipid features
(e.g., chain length, double bond count).
This two-stage approach enhances statistical power and robustness, especially in
small-sample datasets:
1. **Marginal Test**: Each lipid feature is first tested individually using
either a t-test (with glog10 transformation) or a Wilcoxon test. This step
yields a marginal statistic and a corresponding marginal p-value for
each feature.
2. **Region-Based Permutation Test with Smoothing**: The resulting vector of
marginal statistics is then smoothed using a Gaussian kernel, which integrates
information from neighboring lipid features based on their similarity (e.g.,
proximity in chain length). A smoothed statistic is computed for each lipid,
and an empirical p-value is derived via permutation testing by comparing the
observed statistic to a null distribution.
**Note:**
- If `test="t.test"` (default), abundance values are internally transformed
  using the glog10 transformation before testing.
- If `test="Wilcoxon"`, we recommend setting `permute_time` to fewer than
  10,000 to maintain a reasonable runtime, because the Wilcoxon test is more
  computationally expensive under repeated permutation.
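In the two-dimensional case, "neighboring" lipids are those close in both features at once. The sketch below shows one way to build such a kernel in base R. Using unscaled Euclidean distance over `Total.C` and `Total.DB` and a kernel width of 1.5 are assumptions made for illustration; they are not taken from the package.

```r
# Toy 3 x 3 grid of lipids over chain length and double bond count
features <- expand.grid(Total.C=c(32, 34, 36), Total.DB=0:2)

# Pairwise Euclidean distance in the 2D feature space
d2 <- as.matrix(dist(features))

# Gaussian kernel with a hypothetical width, rows normalized to sum to 1
K2 <- exp(-d2^2 / (2 * 1.5^2))
K2 <- K2 / rowSums(K2)

# Toy marginal statistics, larger toward long, unsaturated lipids
marginal_t <- c(0.1, 0.3, 2.5, 0.2, 1.8, 2.9, -0.1, 2.2, 3.1)
smoothed <- as.vector(K2 %*% marginal_t)

cbind(features, marginal_t, smoothed=round(smoothed, 2))
```

Because the two features are on different scales, how distance is measured affects which lipids count as neighbors; that is a modeling choice to keep in mind when reading 2D results.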
**Split-Chain Analysis for Chain Length Features**
To enhance the biological interpretability of chain length–related trends,
`analyzeLipidRegion()` provides an option to analyze even-chain and odd-chain
lipids separately via the `split_chain` parameter.
Enabling this option allows the function to separate lipids based on chain
length parity (even vs. odd) and perform region-based statistical testing
independently for each group. This is particularly beneficial when distinct
biosynthetic or regulatory patterns are expected between even-
and odd-chain lipid species.
To activate this feature, configure the `split_chain` and `chain_col`
parameters as follows:
* If `split_chain=TRUE`:
    * The function performs separate analyses for even- and odd-chain lipids.
    * Specify the column name containing chain length information using
        the `chain_col` parameter.
* If `split_chain=FALSE` (default):
    * All lipids will be analyzed together without parity distinction.
    * Set `chain_col=NULL`.
**Recommendation:**
Set `split_chain=TRUE` when analyzing features such as chain length, as it
often leads to more meaningful biological insights.
**Note:**
- If fewer than two lipids are present in either the even or odd group,
analysis for that group will be skipped, and a warning will be issued.
**Abundance-Weighted vs. Unweighted Statistics**
To reflect the biological importance of more abundant lipids, we provide an
option controlled by the `abund_weight` parameter, which allows for weighting
region statistics based on the average abundance of each lipid species.
* When `abund_weight=TRUE` (default), the marginal test statistic is scaled by
each lipid’s normalized average abundance during the smoothing step. This
emphasizes biologically dominant signals while down-weighting low-abundance,
potentially noisy lipids.
* When `abund_weight=FALSE`, all lipids are treated equally, regardless of
their abundance. In this case, the region statistic is calculated solely based
on test statistics and feature similarity.
This flexibility allows users to choose between:
* A weighted strategy that highlights strong, dominant lipid shifts
(recommended for most biological datasets).
* An unweighted strategy that gives equal weight to all lipids, which may be
suitable for hypothesis-driven or targeted studies.
**Note:**
- Abundance weighting is applied only during the smoothing and permutation
steps; it does not affect the initial marginal testing.
```{r countRegion2D}
# run analyzeLipidRegion
res2D <- analyzeLipidRegion(
    lipid_se_2D, ref_group="sgCtrl", split_chain=TRUE,
    chain_col="Total.C", radius=3, own_contri=0.5, test="t.test",
    abund_weight=TRUE, permute_time=1000)
# view result summary
show(res2D)
```
The `analyzeLipidRegion()` function produces an extended SummarizedExperiment
object called `LipidTrendSE`. To facilitate result extraction, we offer several
helper functions that make viewing the resulting data frame easier. For more
information on these helper functions, please refer to the
[Helper Functions section](#helper).
```{r Region2D_result}
# view even chain result (first 5 lines)
head(even_chain_result(res2D), 5)
# view odd chain result (first 5 lines)
head(odd_chain_result(res2D), 5)
```
### Result visualization
This section demonstrates how to visualize the results from the `LipidTrendSE`
object returned by the `analyzeLipidRegion()` function.
**Note:**
- If `split_chain=TRUE` was set in `analyzeLipidRegion()`, two separate plots
will be generated: one for even-chain lipids, and one for odd-chain lipids.
- If `split_chain=FALSE`, only a single plot will be returned, showing all
lipids together.
```{r LipidTrend_2D_plot}
# plot result
plot2D <- plotRegion2D(res2D, p_cutoff=0.05)
# even chain result
plot2D$even_result
# odd chain result
plot2D$odd_result
```
This plot visualizes two-dimensional lipid features, highlighting regional
trends and significant differences between case and control groups.
The X- and Y-axes represent two continuous lipid characteristics (e.g., total
chain length and double bond count). Each point corresponds to a lipid, with
the color indicating its log₂ fold-change (log₂FC) between case and
control groups:
* Red points indicate higher abundance in case samples.
* Blue points indicate higher abundance in control samples.
If `abund_weight=TRUE`, point size reflects the mean abundance of each lipid.
If `abund_weight=FALSE`, all points are displayed with equal size.
Asterisks (\*, \*\*, \*\*\*) denote levels of statistical significance based on
the marginal test, with more asterisks representing stronger evidence.
Colored outlines (red or blue) represent significant regions identified by
the smoothed region-based permutation test, which incorporates information
from neighboring lipids with similar feature values:
* Red regions indicate a significant trend of increasing abundance
in case samples.
* Blue regions indicate a significant trend of decreasing abundance
in case samples.
This visualization enables detection of both individual lipid-level changes
and region-level patterns, providing biologically meaningful trends
across two lipid features.
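The point-level encodings described above can be reproduced by hand on toy numbers. In the sketch below, the p-value cutoffs (0.05, 0.01, 0.001) are conventional values assumed for illustration, since the thresholds used by `plotRegion2D()` are not stated here, and the group means are made up.

```r
ctrl_mean <- c(100, 80, 60, 40)      # toy mean abundance in control samples
case_mean <- c(200, 80, 15, 44)      # toy mean abundance in case samples

# log2 fold-change: positive means higher abundance in case samples
log2_fc <- log2(case_mean / ctrl_mean)

# Map marginal p-values to asterisks (assumed conventional cutoffs)
p_marginal <- c(0.0004, 0.8, 0.008, 0.03)
stars <- as.character(cut(
    p_marginal, breaks=c(0, 0.001, 0.01, 0.05, 1),
    labels=c("***", "**", "*", ""), include.lowest=TRUE))

# Colour by the sign of the fold-change
point_colour <- ifelse(log2_fc > 0, "red",
                       ifelse(log2_fc < 0, "blue", "grey"))

data.frame(log2_fc=round(log2_fc, 2), stars, point_colour)
```

Note that asterisks reflect the per-lipid marginal test, whereas the colored outlines come from the smoothed region-based test, so the two can disagree for an individual lipid.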
# Helper Functions – enhancing result viewing {#helper}
## View Results
`LipidTrend` provides four helper functions to enhance the viewing of the
`LipidTrendSE` object returned by `analyzeLipidRegion()`:
1. `result()` – Returns the result data frame.
2. `even_chain_result()` – Returns the even-chain result data frame.
3. `odd_chain_result()` – Returns the odd-chain result data frame.
4. `show()` – Displays a summary of the `LipidTrendSE` object.
**Notes:**
- If `split_chain=TRUE`, use `even_chain_result()` and `odd_chain_result()` to
view the results separately. Otherwise, use `result()`.
- To extract `assay`, `rowData`, or `colData` from the `LipidTrendSE` object,
use functions from the `r Biocpkg("SummarizedExperiment")` package.
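Because `LipidTrendSE` extends SummarizedExperiment, the standard accessors should apply to it. The sketch below demonstrates them on a small, made-up SummarizedExperiment object rather than on an actual `LipidTrendSE` result, so the object `se` and its contents are purely illustrative.

```r
library(SummarizedExperiment)

# Toy object: 3 lipids x 2 samples
se <- SummarizedExperiment(
    assays=list(abundance=matrix(
        c(10, 20, 30, 15, 18, 45), nrow=3,
        dimnames=list(paste0("lipid", 1:3), c("s1", "s2")))),
    rowData=S4Vectors::DataFrame(chain=c(32, 34, 36)),
    colData=S4Vectors::DataFrame(
        sample_name=c("s1", "s2"), label_name=c("S1", "S2"),
        group=c("ctrl", "case"), row.names=c("s1", "s2")))

abund    <- assay(se, "abundance")  # abundance matrix
features <- rowData(se)             # lipid feature table
samples  <- colData(se)             # sample metadata
```

The same three calls, applied to the object returned by `analyzeLipidRegion()`, would return the input data stored alongside the results.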
## Interpreting the Result Table
The result table contains the following columns:
* **Feature columns (Total.C, Total.DB, etc.)**: Lipid feature values from the
    input `rowData`, such as chain length or double bond count. Column names
    vary depending on the input dataset.
* **avg.abund**: Mean abundance of each lipid across all samples. For
    one-dimensional analysis, the table may also include **avg.abund.ctrl**
    and **avg.abund.case** for group-wise means.
* **direction**: Indicates the sign of the smoothed statistic:
    * "+" signifies an increasing trend in the case group.
    * "–" signifies a decreasing trend in the case group.
* **smoothing.pval.BH**: Benjamini–Hochberg adjusted p-value from the
    region-based permutation test.
* **marginal.pval.BH**: Benjamini–Hochberg adjusted p-value from the marginal
    test (per lipid).
* **log2.FC**: Log₂ fold-change in lipid abundance between case and
    control groups.
* **significance**: Overall significance label based on the smoothed test and
    fold-change direction:
    * "Increase" – significant positive trend in the case group
    * "Decrease" – significant negative trend in the case group
    * "NS" – not significant
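Once extracted, the result table is an ordinary data frame and can be filtered with base R. The sketch below works on a made-up table that borrows the column names listed above; the values are invented, and the 0.05 cutoff is only an example.

```r
# Toy result table using the documented column names (values are made up)
res <- data.frame(
    Total.C=c(32, 34, 36, 38),
    direction=c("+", "+", "-", "-"),
    smoothing.pval.BH=c(0.01, 0.03, 0.20, 0.04),
    marginal.pval.BH=c(0.04, 0.30, 0.50, 0.02),
    log2.FC=c(1.2, 0.8, -0.1, -0.9))

# Keep lipids inside a significant region, most significant first
sig <- res[res$smoothing.pval.BH < 0.05, ]
sig <- sig[order(sig$smoothing.pval.BH), ]
sig[, c("Total.C", "direction", "smoothing.pval.BH", "log2.FC")]
```

In this toy table, the lipid at `Total.C` 34 is not significant marginally but falls within a significant region, which is the kind of case the smoothed test is designed to pick up.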
# Session info
```{r sessionInfo, echo=FALSE}
sessionInfo()
```