LipidTrend is an R package designed to identify
statistically significant differences in lipidomic feature-level trends
between groups. It supports both one-dimensional and two-dimensional
analyses of continuous lipid features (e.g., chain length, double bond
count).
The package includes three main functions:
In addition to these core functions, several helper functions are
available to facilitate the exploration and extraction of results from
the returned LipidTrendSE object.
For more details, please refer to the Helper Functions section.
To install LipidTrend, ensure that you have R 4.5.0 or
later installed (see the R Project at http://www.r-project.org) and are
familiar with its usage.
The LipidTrend package is available from the Bioconductor
repository (http://www.bioconductor.org).
Before installing LipidTrend, you must first install the
core Bioconductor packages. If you have already installed them, you can
skip the following step.
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install()

Once the core Bioconductor packages are installed, you can proceed
with installing LipidTrend.
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("LipidTrend")

After the installation is complete, you’re ready to start using
LipidTrend. Now, let’s load the package first.
library(LipidTrend)
LipidTrend requires a SummarizedExperiment object as
input data. It must contain the following components:
- Assay: A numeric matrix representing lipid abundance values, where each row corresponds to a lipid species and each column to a sample. Please ensure the values meet the following requirements:
    - Before running LipidTrend, the abundance data must undergo preprocessing to address missing or noisy values. This preprocessing should include filtering, imputation, and normalization.
- RowData: A data frame containing lipid feature information (e.g., double bond count, chain length, or other continuous variables), where each row corresponds to a lipid species and each column to a specific lipid feature.
- ColData: A data frame containing metadata for each sample, with the following columns:
    - sample_name: A unique identifier for each sample.
    - label_name: A display name used for plotting or grouping.
    - group: The experimental condition or biological group associated with each sample.

If you are already familiar with constructing a SummarizedExperiment object, you can skip the following section. Otherwise, refer to the example in the rest of this section to learn how to build a SummarizedExperiment object.
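The requirements listed above can be checked in plain R before the object is built. The following is a minimal sketch, not part of LipidTrend: the helper name check_inputs() and the toy values are hypothetical, and the check for remaining NA values is an assumption about what "preprocessed" means here.

```r
# Hypothetical helper (not a LipidTrend function): basic sanity checks on
# the three input tables before they are combined.
check_inputs <- function(abundance, char_table, group_info) {
    required <- c("sample_name", "label_name", "group")
    stopifnot(
        is.matrix(abundance), is.numeric(abundance),
        !anyNA(abundance),                        # assumed: no NA after imputation
        nrow(char_table) == nrow(abundance),      # one feature row per lipid
        ncol(char_table) %in% 1:2,                # 1D or 2D analysis
        all(required %in% names(group_info)),
        !anyNA(group_info[required]),
        !anyDuplicated(group_info$sample_name),   # identifiers must be unique
        setequal(group_info$sample_name, colnames(abundance)))
    invisible(TRUE)
}

# Toy inputs (made-up values)
abundance <- matrix(
    c(0.76, 3.29, 0.31, 2.76, 0.01, 0.19, 0.05, 0.25), nrow=2,
    dimnames=list(c("TG 33:1", "TG 36:1"),
                  c("KO01", "KO02", "Ctrl01", "Ctrl02")))
char_table <- data.frame(
    Total.C=c(33, 36), Total.DB=c(1, 1), row.names=rownames(abundance))
group_info <- data.frame(
    sample_name=colnames(abundance), label_name=colnames(abundance),
    group=rep(c("KO", "Ctrl"), each=2))

ok <- check_inputs(abundance, char_table, group_info)
```

If any requirement is violated, stopifnot() stops with a message naming the failed condition.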
The abundance data is a matrix containing lipid abundance values across lipids and samples, where rows represent lipids and columns represent samples.
# load example abundance data
data("abundance_2D")
# view abundance data
head(abundance_2D, 5)
#> HSD17B12KO01 HSD17B12KO02 HSD17B12KO03 sgCtrl01 sgCtrl02
#> TG 33:1 0.7591750 0.3109753 0.4456624 0.008158902 0.04569340
#> TG 36:1 3.2885277 2.7623366 3.0669865 0.193283659 0.25024117
#> TG 36:2 1.9974341 1.3529295 1.5173635 0.187196921 0.28231711
#> TG 37:2 1.7893400 0.8872451 1.0449693 0.121982450 0.71770893
#> TG 38:0 0.3413125 0.3504082 0.3784324 0.031840554 0.04336764
#> sgCtrl03
#> TG 33:1 0.08986397
#> TG 36:1 0.36918016
#> TG 36:2 0.44319480
#> TG 37:2 1.13574059
#> TG 38:0 0.05876135

The lipid characteristic table is a data frame containing information about each lipid’s characteristics, such as the number of double bonds and chain length. The order of the lipids in this table must align with the abundance data.
A one-dimensional analysis will be conducted if the table has only one column, and a two-dimensional analysis will be performed if it contains two columns. The table can have at most two columns.
In this example, we use data suitable for a two-dimensional analysis.
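When no annotation table is at hand, a two-column characteristic table can sometimes be derived from the lipid names themselves. The sketch below assumes names of the form "TG 36:1" (total carbons, then total double bonds); it is an illustration in base R, not a LipidTrend function, and the column names Total.C and Total.DB simply follow the example data.

```r
lipid_names <- c("TG 33:1", "TG 36:1", "TG 36:2", "TG 37:2", "TG 38:0")

# Capture "<carbons>:<double bonds>" at the end of each name
parts <- regmatches(lipid_names, regexec("([0-9]+):([0-9]+)$", lipid_names))

char_table <- data.frame(
    Total.C=as.numeric(vapply(parts, `[`, character(1), 2)),
    Total.DB=as.numeric(vapply(parts, `[`, character(1), 3)),
    row.names=lipid_names)
char_table
```

Because the rows are generated in the order of lipid_names, the table stays aligned with an abundance matrix whose rows use the same order.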
The group information table is a data frame containing grouping details corresponding to the samples in the lipid abundance data. It must adhere to the following requirements:
- It must contain the columns sample_name, label_name, and group.
- The sample names in the sample_name column must match those in the lipid abundance data.
- The columns sample_name, label_name, and group must not contain missing values (NA).

For example:
# load example group information table
data("group_info")
# view group information table
group_info
#> sample_name label_name group
#> 1 HSD17B12KO01 HSD17B12KO01 HSD17B12KO
#> 2 HSD17B12KO02 HSD17B12KO02 HSD17B12KO
#> 3 HSD17B12KO03 HSD17B12KO03 HSD17B12KO
#> 4 sgCtrl01 sgCtrl01 sgCtrl
#> 5 sgCtrl02 sgCtrl02 sgCtrl
#> 6 sgCtrl03 sgCtrl03 sgCtrl

Once the abundance data, lipid characteristic table, and group
information table are prepared, we can construct the input
SummarizedExperiment object. We will use the
SummarizedExperiment() function from the SummarizedExperiment package.
Follow the command below to create this object.
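A minimal sketch of such a construction is shown here with small made-up tables rather than the package's example data. The assay name "abundance" matches the example objects shown later; setting the colData row names to the sample names is a choice made for this sketch so that the assay's column names and the colData agree.

```r
# Toy inputs (hypothetical values)
abundance <- matrix(
    c(0.76, 3.29, 0.31, 2.76, 0.01, 0.19, 0.05, 0.25), nrow=2,
    dimnames=list(c("TG 33:1", "TG 36:1"),
                  c("KO01", "KO02", "Ctrl01", "Ctrl02")))
char_table <- data.frame(
    Total.C=c(33, 36), Total.DB=c(1, 1), row.names=rownames(abundance))
group_info <- data.frame(
    sample_name=colnames(abundance), label_name=colnames(abundance),
    group=rep(c("KO", "Ctrl"), each=2))

# Build the object only if the Bioconductor package is installed
if (requireNamespace("SummarizedExperiment", quietly=TRUE)) {
    se <- SummarizedExperiment::SummarizedExperiment(
        assays=list(abundance=abundance),
        rowData=S4Vectors::DataFrame(
            char_table, row.names=rownames(char_table)),
        colData=S4Vectors::DataFrame(
            group_info, row.names=group_info$sample_name))
    print(dim(se))
}
```

The rows of rowData must follow the row order of the assay, and the rows of colData must follow its column order.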
LipidTrend workflow

The LipidTrend workflow starts with a
SummarizedExperiment object as input. It supports both one-dimensional
(1D) and two-dimensional (2D) lipid feature analyses.
Based on the number of feature columns provided in the
rowData (e.g., chain length, double bond count), the
function automatically performs either 1D or 2D trend detection.
After statistical computation and visualization, the workflow returns:
This streamlined workflow enables researchers to identify structured lipidomic patterns across feature dimensions with statistical rigor and biological interpretability.
We recommend calling set.seed() before
running the analysis so that the permutation step gives reproducible
results.
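The effect of seeding can be seen with a plain label permutation in base R. This short sketch uses a made-up label vector and an arbitrary seed value.

```r
labels <- rep(c("case", "ctrl"), each=3)

set.seed(1234)
perm_a <- sample(labels)   # one random shuffling of the group labels

set.seed(1234)
perm_b <- sample(labels)   # same seed, so the same shuffling

identical(perm_a, perm_b)
```

Resetting the same seed reproduces the same sequence of shuffles, which is what keeps permutation-based p-values stable between runs.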
One-dimensional analysis is applied when the input dataset contains a single continuous lipid feature, such as chain length or double bond count. This approach is ideal when the biological question centers on one specific biochemical property of lipids or when only one type of feature annotation is available.
Compared to two-dimensional analysis, the one-dimensional approach is more straightforward to interpret and requires less data completeness. It is particularly suitable in the following scenarios:
To begin, we will first examine the structure of the example input data to ensure it is correctly formatted for one-dimensional analysis.
# load example data
data("lipid_se_CL")
# quick look of SE structure
show(lipid_se_CL)
#> class: SummarizedExperiment
#> dim: 29 6
#> metadata(0):
#> assays(1): abundance
#> rownames(29): 33 36 ... 62 64
#> rowData names(1): chain
#> colnames: NULL
#> colData names(3): sample_name label_name group

Overview of Region-Based Trend Analysis
The analyzeLipidRegion() function performs region-based
statistical analysis of lipidomic features, integrating both marginal
testing and permutation-based smoothed testing to identify meaningful
trends across continuous lipid features (e.g., chain length, double
bond count).
This two-stage approach enhances statistical power and robustness, especially in small-sample datasets:
Marginal Test: Each lipid feature is first tested individually using either a t-test (with glog10 transformation) or a Wilcoxon test. This step yields a marginal statistic and a corresponding marginal p-value for each feature.
Region-Based Permutation Test with Smoothing: The resulting vector of marginal statistics is then smoothed using a Gaussian kernel, which integrates information from neighboring lipid features based on their similarity (e.g., proximity in chain length). A smoothed statistic is computed for each lipid, and an empirical p-value is derived via permutation testing by comparing the observed statistic to a null distribution.
Note:
- If test="t.test" (default), abundance values are internally
transformed using the glog10 transformation before testing.
- If the Wilcoxon test is selected, it is recommended to set permute_time to
fewer than 10,000 to maintain a reasonable runtime, because this test has a
higher computational cost during repeated permutations.
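The two stages can be illustrated with a simplified base-R sketch on simulated data. This is not the package's implementation: the glog10 definition used here is one common form of the generalized log, the bandwidth sigma and the permutation count n_perm are hypothetical parameters, and the final Benjamini-Hochberg adjustment is assumed from the ".BH" suffix of the result columns.

```r
# Generalized log, base 10 (one common definition, assumed here)
glog10 <- function(x) log10((x + sqrt(x^2 + 1)) / 2)

# Stage 1: marginal t statistic per lipid (rows = lipids, columns = samples)
marginal_t <- function(mat, is_case) {
    apply(mat, 1, function(v) {
        unname(t.test(v[is_case], v[!is_case])$statistic)
    })
}

# Stage 2: Gaussian-kernel smoothing of the statistics along one feature
smooth_stat <- function(stat, feature, sigma=2) {
    w <- exp(-outer(feature, feature, "-")^2 / (2 * sigma^2))
    as.vector(w %*% stat) / rowSums(w)
}

set.seed(1)
chain <- seq(36, 54, by=2)                 # toy chain lengths
is_case <- rep(c(TRUE, FALSE), each=3)     # 3 case and 3 control samples
abund <- matrix(rexp(length(chain) * 6), nrow=length(chain))
# Raise case abundance for longer chains to create a regional trend
abund[chain >= 46, is_case] <- abund[chain >= 46, is_case] + 3

obs <- smooth_stat(marginal_t(glog10(abund), is_case), chain)

# Null distribution: shuffle the labels and recompute the smoothed statistic
n_perm <- 200
null <- replicate(n_perm, {
    smooth_stat(marginal_t(glog10(abund), sample(is_case)), chain)
})

# Two-sided empirical p-value per lipid, then BH adjustment
pval <- (rowSums(abs(null) >= abs(obs)) + 1) / (n_perm + 1)
pval_bh <- p.adjust(pval, method="BH")
```

With only three samples per group there are few distinct label shufflings, so the empirical p-values in this toy example are coarse.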
Split-Chain Analysis for Chain Length Features
To enhance the biological interpretability of chain length–related
trends, the split_chain parameter provides an option to analyze
even-chain and odd-chain lipids separately.
Enabling this option allows the function to separate lipids based on chain length parity (even vs. odd) and perform region-based statistical testing independently for each group. This is particularly beneficial when distinct biosynthetic or regulatory patterns are expected between even- and odd-chain lipid species.
To activate this feature, configure the split_chain and
chain_col parameters as follows:
- split_chain=TRUE: specify the rowData column that holds chain length via the
chain_col parameter.
- split_chain=FALSE (default): leave chain_col=NULL.

Recommendation: Set split_chain=TRUE
when analyzing features such as chain length, as it often leads to more
meaningful biological insights.
Note:
- If fewer than two lipids are present in either the even or odd group,
analysis for that group will be skipped, and a warning will be
issued.
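The parity split and the skip rule can be pictured with a few lines of base R. The helper split_by_parity() and its min_n argument are hypothetical names used only for this sketch; the package performs the split internally.

```r
# Hypothetical helper: split row indices by chain-length parity and drop
# any group with fewer than `min_n` lipids, warning when that happens.
split_by_parity <- function(chain, min_n=2) {
    parity <- ifelse(chain %% 2 == 0, "even", "odd")
    groups <- split(seq_along(chain), parity)
    too_small <- lengths(groups) < min_n
    if (any(too_small)) {
        warning("skipping group(s) with fewer than ", min_n, " lipids: ",
                paste(names(groups)[too_small], collapse=", "))
    }
    groups[!too_small]
}

idx <- split_by_parity(c(33, 36, 37, 38, 40, 41))
idx$even   # positions of even-chain lipids: 2 4 5
idx$odd    # positions of odd-chain lipids: 1 3 6
```

Each index vector can then be used to subset the abundance matrix so that the region test is run on one parity group at a time.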
Abundance-Weighted vs. Unweighted Statistics
To reflect the biological importance of more abundant lipids, we
provide an option controlled by the abund_weight parameter,
which allows for weighting region statistics based on the average
abundance of each lipid species.
- If abund_weight=TRUE (default), the marginal test
statistic is scaled by each lipid’s normalized average abundance during
the smoothing step. This emphasizes biologically dominant signals while
down-weighting low-abundance, potentially noisy lipids.
- If abund_weight=FALSE, all lipids are treated
equally, regardless of their abundance. In this case, the region
statistic is calculated solely based on test statistics and feature
similarity.

This flexibility allows users to choose between an abundance-weighted analysis that emphasizes dominant lipids and an unweighted analysis that treats all lipids equally.
Note:
- Abundance weighting is applied only during the smoothing and
permutation steps; it does not affect the initial marginal testing.
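The difference between the two settings can be sketched in base R as follows. This is one plausible reading of the description, not the package's code: the function smooth_region(), the bandwidth sigma, the normalization by the mean abundance, and all numbers are assumptions made for illustration.

```r
# Hypothetical smoother: Gaussian-kernel average of marginal statistics,
# optionally scaling each statistic by its normalized average abundance.
smooth_region <- function(stat, feature, avg_abund=NULL, sigma=2) {
    kern <- exp(-outer(feature, feature, "-")^2 / (2 * sigma^2))
    if (!is.null(avg_abund)) {
        # assumed normalization: divide by the mean abundance
        stat <- stat * avg_abund / mean(avg_abund)
    }
    as.vector(kern %*% stat) / rowSums(kern)
}

chain <- c(36, 38, 40, 42, 44)
stat <- c(2.1, 1.8, 2.5, 0.3, -0.2)       # toy marginal statistics
avg_abund <- c(0.6, 0.8, 1.0, 4.4, 19.2)  # toy average abundances

weighted <- smooth_region(stat, chain, avg_abund)
unweighted <- smooth_region(stat, chain)
```

In the weighted version, the statistics of the abundant lipids at chain lengths 42 and 44 dominate their neighborhood, while the unweighted version depends only on the statistics and on feature proximity.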
# run analyzeLipidRegion
res1D <- analyzeLipidRegion(
    lipid_se_CL, ref_group="sgCtrl", split_chain=TRUE,
    chain_col="chain", radius=3, own_contri=0.5, test="t.test",
    abund_weight=TRUE, permute_time=1000)
# view result summary
show(res1D)
#> class: LipidTrendSE
#> dim: 29 6
#> metadata(0):
#> assays(1): abundance
#> rownames(29): 33 36 ... 62 64
#> rowData names(1): chain
#> colnames: NULL
#> colData names(3): sample_name label_name group
#>
#> LipidTrend Results:
#> ------------------------
#> Split chain analysis: Yes
#> Even chain result: 15 features
#> Odd chain result: 14 features

The analyzeLipidRegion() function produces an extended
SummarizedExperiment object called LipidTrendSE. To
facilitate result extraction, we offer several helper functions that
make viewing the resulting data frame easier. For more information on
these helper functions, please refer to the Helper
Functions section.
# view even chain result (first 5 lines)
head(even_chain_result(res1D), 5)
#> chain avg.abund.ctrl avg.abund.case direction smoothing.pval.BH
#> 36 36 0.5751379 4.661859 + 0.001071429
#> 38 38 0.7665390 3.708691 + 0.001071429
#> 40 40 0.9500241 6.091883 + 0.001071429
#> 42 42 4.4355047 20.377929 + 0.001071429
#> 44 44 19.1928487 61.133996 + 0.001071429
#> marginal.pval.BH log2.FC significance
#> 36 0.0003639142 3.018926 Increase
#> 38 0.0007890215 2.274479 Increase
#> 40 0.0002324393 2.680852 Increase
#> 42 0.0004215113 2.199837 Increase
#> 44 0.0005093479 1.671406 Increase
# view odd chain result (first 5 lines)
head(odd_chain_result(res1D), 5)
#> chain avg.abund.ctrl avg.abund.case direction smoothing.pval.BH
#> 33 33 0.04790542 0.5052709 + 0.001
#> 37 37 0.65847733 1.2405181 + 0.001
#> 39 39 0.11565136 1.2085836 + 0.001
#> 41 41 0.41424681 4.3259329 + 0.001
#> 43 43 1.80730482 14.2470104 + 0.001
#> marginal.pval.BH log2.FC significance
#> 33 2.565158e-02 3.3987963 Increase
#> 37 2.233191e-01 0.9137372 Increase
#> 39 5.134067e-05 3.3854631 Increase
#> 41 1.591450e-04 3.3844488 Increase
#> 43 1.050032e-03 2.9787475 Increase

This section demonstrates how to visualize the results from the
LipidTrendSE object returned by the
analyzeLipidRegion() function.
Note:
- If split_chain=TRUE was set in analyzeLipidRegion(), two separate plots will be generated: one for even-chain lipids and one for odd-chain lipids.
- If split_chain=FALSE, only a single plot will be returned, showing all lipids together.
# plot result
plots <- plotRegion1D(res1D, p_cutoff=0.05, y_scale='identity')
# even chain result
plots$even_result
#> Warning: Removed 8 rows containing missing values or values outside the scale range
#> (`geom_ribbon()`).
#> Removed 8 rows containing missing values or values outside the scale range
#> (`geom_ribbon()`).
# odd chain result
plots$odd_result
#> Warning: Removed 9 rows containing missing values or values outside the scale range
#> (`geom_ribbon()`).
#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_ribbon()`).

The visualization illustrates lipid trends and includes the following components:
Color Interpretation:
These visualizations highlight not only the magnitude of lipid abundance changes but also the specific feature-level regions (e.g., chain length) where group differences are most pronounced.
Two-dimensional analysis is applicable when the input dataset includes two continuous lipid features, such as chain length, double bond count, or other numeric lipid characteristics. Compared to one-dimensional analysis, 2D analysis enables the detection of more complex patterns by simultaneously evaluating lipid trends across two biochemical axes.
This method is particularly well-suited for the following scenarios:
Two-dimensional analysis provides a high-resolution map of lipid changes, allowing for the identification of specific combinations of features (e.g., long-chain saturated vs. short-chain unsaturated lipids) that may indicate pathway-level regulation.
Let’s now take a quick look at the structure of the example input data used for 2D analysis.
# load example data
data("lipid_se_2D")
# quick look of SE structure
show(lipid_se_2D)
#> class: SummarizedExperiment
#> dim: 137 6
#> metadata(0):
#> assays(1): abundance
#> rownames(137): TG 33:1 TG 36:1 ... TG 64:5 TG 64:8
#> rowData names(2): Total.C Total.DB
#> colnames: NULL
#> colData names(3): sample_name label_name group

The analyzeLipidRegion() function is applied in the same way as in the one-dimensional analysis. The region-based trend analysis, the split-chain option (split_chain, chain_col), and abundance weighting (abund_weight) described in that section apply here unchanged.
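With two features, the notion of neighboring lipids extends to both axes at once. The sketch below assumes a Euclidean distance on the unscaled pair (chain length, double bond count) and an isotropic Gaussian kernel with a hypothetical bandwidth sigma; the package may define similarity differently, and the statistics are made-up numbers.

```r
# Hypothetical 2D kernel: similarity decays with squared Euclidean distance
kernel_2d <- function(features, sigma=1) {
    d2 <- as.matrix(dist(features))^2
    exp(-d2 / (2 * sigma^2))
}

features <- cbind(
    Total.C=c(36, 36, 38, 38, 38),
    Total.DB=c(1, 2, 0, 1, 2))
stat <- c(3.5, 2.4, 3.0, 3.0, 1.7)   # toy marginal statistics

k <- kernel_2d(features)
smoothed <- as.vector(k %*% stat) / rowSums(k)
```

Lipids that are close in both chain length and double bond count contribute most to each other's smoothed statistic, which is what lets a 2D analysis pick out regions rather than isolated points.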
# run analyzeLipidRegion
res2D <- analyzeLipidRegion(
    lipid_se_2D, ref_group="sgCtrl", split_chain=TRUE,
    chain_col="Total.C", radius=3, own_contri=0.5, test="t.test",
    abund_weight=TRUE, permute_time=1000)
# view result summary
show(res2D)
#> class: LipidTrendSE
#> dim: 137 6
#> metadata(0):
#> assays(1): abundance
#> rownames(137): TG 33:1 TG 36:1 ... TG 64:5 TG 64:8
#> rowData names(2): Total.C Total.DB
#> colnames: NULL
#> colData names(3): sample_name label_name group
#>
#> LipidTrend Results:
#> ------------------------
#> Split chain analysis: Yes
#> Even chain result: 81 features
#> Odd chain result: 56 features

The analyzeLipidRegion() function produces an extended
SummarizedExperiment object called LipidTrendSE. To
facilitate result extraction, we offer several helper functions that
make viewing the resulting data frame easier. For more information on
these helper functions, please refer to the Helper
Functions section.
# view even chain result (first 5 lines)
head(even_chain_result(res2D), 5)
#> Total.C Total.DB avg.abund direction smoothing.pval.BH marginal.pval.BH
#> TG 36:1 36 1 1.6550926 + 0.001038462 7.707795e-05
#> TG 36:2 36 2 0.9634060 + 0.001038462 1.839016e-03
#> TG 38:0 38 0 0.2006871 + 0.001038462 7.391351e-05
#> TG 38:1 38 1 0.8449176 + 0.001038462 1.052245e-04
#> TG 38:2 38 2 0.9041542 + 0.001038462 2.100744e-03
#> log2.FC significance
#> TG 36:1 3.487890 Increase
#> TG 36:2 2.415022 Increase
#> TG 38:0 2.997840 Increase
#> TG 38:1 2.970089 Increase
#> TG 38:2 1.708606 Increase
# view odd chain result (first 5 lines)
head(odd_chain_result(res2D), 5)
#> Total.C Total.DB avg.abund direction smoothing.pval.BH marginal.pval.BH
#> TG 33:1 33 1 0.2765882 + 0.001056604 0.0232316173
#> TG 37:2 37 2 0.9494977 + 0.103854545 0.2273794624
#> TG 39:0 39 0 0.1750303 + 0.001056604 0.0001638433
#> TG 39:1 39 1 0.4870872 + 0.001056604 0.0001274996
#> TG 41:0 41 0 0.5079328 + 0.001056604 0.0001506772
#> log2.FC significance
#> TG 33:1 3.3987963 Increase
#> TG 37:2 0.9137372 NS
#> TG 39:0 3.8873738 Increase
#> TG 39:1 3.2356822 Increase
#> TG 41:0 2.8007824 Increase

This section demonstrates how to visualize the results from the
LipidTrendSE object returned by the
analyzeLipidRegion() function.
Note:
- If split_chain=TRUE was set in analyzeLipidRegion(), two separate plots will be generated: one for even-chain lipids and one for odd-chain lipids.
- If split_chain=FALSE, only a single plot will be returned, showing all lipids together.
This plot visualizes two-dimensional lipid features, highlighting regional trends and significant differences between case and control groups.
The X- and Y-axes represent two continuous lipid characteristics (e.g., total chain length and double bond count). Each point corresponds to a lipid, with the color indicating its log₂ fold-change (log₂FC) between case and control groups:
* Red points indicate higher abundance in case samples.
* Blue points indicate higher abundance in control samples.

If abund_weight=TRUE, point size reflects the mean abundance of each lipid. If abund_weight=FALSE, all points are displayed with equal size.

Asterisks (*, **, ***) denote levels of statistical significance based on the marginal test, with more asterisks representing stronger evidence.

Colored outlines (red or blue) represent significant regions identified by the smoothed region-based permutation test, which incorporates information from neighboring lipids with similar feature values:
* Red regions indicate a significant trend of increasing abundance in case samples.
* Blue regions indicate a significant trend of decreasing abundance in case samples.
This visualization enables detection of both individual lipid-level changes and region-level patterns, providing biologically meaningful trends across two lipid features.
LipidTrend provides four helper functions to enhance the
viewing of the LipidTrendSE object returned by
analyzeLipidRegion():
- result() – Returns the result data frame.
- even_chain_result() – Returns the even-chain result data frame.
- odd_chain_result() – Returns the odd-chain result data frame.
- show() – Displays a summary of the LipidTrendSE object.

Notes:
- If split_chain=TRUE, use even_chain_result() and odd_chain_result() to view the results separately. Otherwise, use result().
- To extract assay, rowData, or colData from the LipidTrendSE object, use functions from the SummarizedExperiment package.
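Because the helper functions return ordinary data frames, standard data-frame operations apply to the results. The sketch below works on a hand-typed copy of three rows from the one-dimensional even-chain output shown earlier, so it runs without the package. For these rows, the printed log2.FC agrees with log2 of the case mean divided by the control mean; the 0.05 cutoff is only an example threshold.

```r
# Three rows copied from the even-chain result of the 1D example
res <- data.frame(
    chain=c(36, 38, 40),
    avg.abund.ctrl=c(0.5751379, 0.7665390, 0.9500241),
    avg.abund.case=c(4.661859, 3.708691, 6.091883),
    smoothing.pval.BH=c(0.001071429, 0.001071429, 0.001071429),
    log2.FC=c(3.018926, 2.274479, 2.680852))

# Recompute the fold change from the group means
recomputed <- log2(res$avg.abund.case / res$avg.abund.ctrl)
all.equal(recomputed, res$log2.FC, tolerance=1e-5)

# Keep features with a significant regional trend and a positive change
sig <- subset(res, smoothing.pval.BH < 0.05 & log2.FC > 0)
sig$chain
```

The same subset() call can be applied to the data frame returned by result(), even_chain_result(), or odd_chain_result().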
The result table contains the following columns:
- Feature columns taken from rowData, such as chain length
or double bond count. Column names vary depending on the input
dataset.

#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] LipidTrend_1.1.0 BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 generics_0.1.4
#> [3] SparseArray_1.11.1 robustbase_0.99-6
#> [5] lattice_0.22-7 digest_0.6.37
#> [7] magrittr_2.0.4 MKmisc_1.9
#> [9] evaluate_1.0.5 grid_4.5.1
#> [11] RColorBrewer_1.1-3 fastmap_1.2.0
#> [13] Matrix_1.7-4 jsonlite_2.0.0
#> [15] ggnewscale_0.5.2 limma_3.67.0
#> [17] BiocManager_1.30.26 scales_1.4.0
#> [19] jquerylib_0.1.4 abind_1.4-8
#> [21] cli_3.6.5 rlang_1.1.6
#> [23] XVector_0.51.0 Biobase_2.71.0
#> [25] withr_3.0.2 DelayedArray_0.37.0
#> [27] cachem_1.1.0 yaml_2.3.10
#> [29] S4Arrays_1.11.0 tools_4.5.1
#> [31] dplyr_1.1.4 ggplot2_4.0.0
#> [33] matrixTests_0.2.3.1 SummarizedExperiment_1.41.0
#> [35] BiocGenerics_0.57.0 buildtools_1.0.0
#> [37] vctrs_0.6.5 R6_2.6.1
#> [39] matrixStats_1.5.0 stats4_4.5.1
#> [41] lifecycle_1.0.4 Seqinfo_1.1.0
#> [43] S4Vectors_0.49.0 IRanges_2.45.0
#> [45] pkgconfig_2.0.3 pillar_1.11.1
#> [47] bslib_0.9.0 gtable_0.3.6
#> [49] glue_1.8.0 statmod_1.5.1
#> [51] DEoptimR_1.1-4 xfun_0.54
#> [53] tibble_3.3.0 GenomicRanges_1.63.0
#> [55] tidyselect_1.2.1 sys_3.4.3
#> [57] MatrixGenerics_1.23.0 knitr_1.50
#> [59] farver_2.1.2 htmltools_0.5.8.1
#> [61] labeling_0.4.3 rmarkdown_2.30
#> [63] maketools_1.3.2 compiler_4.5.1
#> [65] S7_0.2.0