---
title: "Analyzing Lipid Feature Tendencies with LipidTrend"
package: LipidTrend
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 3
    number_sections: true
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{LipidTrend}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r style, echo=FALSE, results='asis'}
BiocStyle::markdown(css.files = c('custom.css'))
knitr::opts_chunk$set(collapse=TRUE, comment="#>")
```

# Introduction
`LipidTrend` is an R package designed to identify statistically significant 
differences in lipidomic feature-level trends between groups. It supports both 
one-dimensional and two-dimensional analyses of continuous lipid features (e.g.,
chain length, double bond count).

The package includes three main functions:

1. **analyzeLipidRegion()** – Performs the core statistical analysis to detect 
lipid feature trends using Gaussian kernel smoothing.
2. **plotRegion1D()** – Visualizes trend analysis results for one-dimensional 
lipid features.
3. **plotRegion2D()** – Visualizes trend analysis results for two-dimensional 
lipid features.

In addition to these core functions, several helper functions are available to 
facilitate the exploration and extraction of results from the returned 
`LipidTrendSE` object.

For more details, please refer to the [Helper Functions section](#helper).


# Installation 
To install `LipidTrend`, ensure that you have R 4.5.0 or later installed 
(see the R Project at [http://www.r-project.org](http://www.r-project.org)) 
and are familiar with its usage.

The `LipidTrend` package is available from the Bioconductor repository at 
[http://www.bioconductor.org](http://www.bioconductor.org). 
Before installing `LipidTrend`, you must first install the core Bioconductor 
packages. If you have already installed them, you can skip the following step.
```{r install_Bioconductor, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install()
```

Once the core Bioconductor packages are installed, you can proceed with 
installing `LipidTrend`.
```{r install_package, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("LipidTrend")
``` 

After the installation is complete, load the package to start using 
`LipidTrend`.
```{r load, message=FALSE}
library(LipidTrend)
```

# Data preparation
`LipidTrend` requires a SummarizedExperiment object as input data. It must 
contain the following components:

1. **Assay**: A numeric matrix representing lipid abundance values, 
where each row corresponds to a lipid species and each column to a sample. 
Please ensure the values meet the following requirements:
    
    * Preprocessing required: Before using `LipidTrend`, the abundance data 
    must undergo preprocessing to address missing or noisy values. This 
    preprocessing should include filtering, imputation, and normalization.
    * No transformation needed: Abundance values should not undergo any 
    transformations (e.g., log or square root transformations) before analysis.
    * Value constraints: Zero values are acceptable; however, missing values 
    (NA) are not supported and must be handled during preprocessing.
    
2. **RowData**: A data frame containing lipid feature information 
(e.g., double bond count, chain length, or other continuous variables), 
where each row corresponds to a lipid species and each column to a 
specific lipid feature.

    * The number and order of rows must exactly match those in the abundance 
    matrix (Assay).
    * This component must include either one or two numeric feature columns:
        * One column enables one-dimensional trend analysis.
        * Two columns enable two-dimensional trend analysis.

3. **ColData**: A data frame containing metadata for each sample.

    * Each row must correspond to a sample, matching the column order of 
    the Assay matrix.
    * The columns must be arranged in the following order:
        * `sample_name`: A unique identifier for each sample.
        * `label_name`: A display name used for plotting or grouping.
        * `group`: The experimental condition or biological group associated 
        with each sample.

If you are already familiar with constructing a SummarizedExperiment object, 
you can skip the following section. Otherwise, refer to the example in the rest 
of this section to learn how to build a SummarizedExperiment object.

## Abundance data
The abundance data is a matrix containing lipid abundance values across lipids 
and samples, where rows represent lipids and columns represent samples.
```{r abundance}
# load example abundance data
data("abundance_2D")

# view abundance data
head(abundance_2D, 5)
```
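
Before building the input object, it can help to confirm that the matrix meets 
the requirements listed earlier (numeric values, no `NA`). The following is a 
minimal sketch on a small made-up matrix; `toy_abund` and its values are 
hypothetical and are not part of the package's example data.

```{r check_abundance, eval=FALSE}
# Hypothetical 3-lipid x 4-sample abundance matrix (illustrative values only)
toy_abund <- matrix(
    c(120, 0, 35, 98, 15, 40, 110, 22, 0, 105, 18, 37),
    nrow=3,
    dimnames=list(c("PC 32:0", "PC 34:1", "PC 36:2"),
                  c("ctrl1", "ctrl2", "case1", "case2")))

# Values must be numeric and free of missing values; zeros are allowed
is_valid <- is.numeric(toy_abund) && !anyNA(toy_abund)
is_valid
```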

## Lipid characteristic table
The lipid characteristic table is a data frame containing information about 
each lipid's characteristics, such as the number of double bonds and chain 
length. 
The order of the lipids in this table must align with the abundance data.

A one-dimensional analysis will be conducted if the table has only one column, 
and a two-dimensional analysis will be performed if it contains two columns. 
The table can only have a maximum of two columns.

In this example, we use data suitable for a two-dimensional analysis.
```{r char_table}
# load example lipid characteristic table (2D)
data("char_table_2D")

# view lipid characteristic table
head(char_table_2D, 5)
```
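
The column-count rule above can be checked directly. This sketch uses a 
hypothetical table `toy_char` with made-up values (the column names `Total.C` 
and `Total.DB` are only examples) and reports which analysis its shape would 
correspond to.

```{r check_char_table, eval=FALSE}
# Hypothetical lipid characteristic table with two numeric feature columns
toy_char <- data.frame(
    Total.C=c(32, 34, 36),
    Total.DB=c(0, 1, 2),
    row.names=c("PC 32:0", "PC 34:1", "PC 36:2"))

# Every feature column must be numeric, and there may be one or two of them
stopifnot(all(vapply(toy_char, is.numeric, logical(1))),
          ncol(toy_char) %in% c(1, 2))

# One column corresponds to a 1D analysis, two columns to a 2D analysis
analysis_type <- if (ncol(toy_char) == 1) "1D" else "2D"
analysis_type
```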

## Group information table
The group information table is a data frame containing grouping details 
corresponding to the samples in the lipid abundance data. It must adhere to the 
following requirements:

1. The columns must be arranged in the following order: `sample_name`, 
`label_name`, and `group`.
2. All sample names must be unique.
3. The sample names in the `sample_name` column must match those in the lipid 
abundance data.
4. The columns `sample_name`, `label_name`, and `group` must not contain 
missing values (NA).

For example:
```{r groupInfo}
# load example group information table
data("group_info")

# view group information table
group_info
```
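
The four requirements above can be verified with a few lines of base R. The 
sketch below uses a hypothetical table `toy_group` and a hypothetical vector of 
sample names `toy_samples` standing in for the column names of an abundance 
matrix; none of these objects come from the package.

```{r check_group_info, eval=FALSE}
# Hypothetical sample names taken from the columns of an abundance matrix
toy_samples <- c("ctrl1", "ctrl2", "case1", "case2")

# Hypothetical group information table in the required column order
toy_group <- data.frame(
    sample_name=toy_samples,
    label_name=c("Ctrl 1", "Ctrl 2", "Case 1", "Case 2"),
    group=c("ctrl", "ctrl", "case", "case"))

# One logical flag per requirement listed above
checks <- c(
    column_order=identical(colnames(toy_group),
                           c("sample_name", "label_name", "group")),
    unique_names=!anyDuplicated(toy_group$sample_name),
    names_match=identical(toy_group$sample_name, toy_samples),
    no_missing=!any(is.na(toy_group)))
checks
```

Any `FALSE` entry in `checks` points to the requirement that needs attention 
before the table is passed on.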

## Constructing SummarizedExperiment object
Once the abundance data, lipid characteristic table, and group information 
table are prepared, we can construct the input SummarizedExperiment object. 
We will use the `SummarizedExperiment` function from 
`r Biocpkg("SummarizedExperiment")`. 

Follow the command below to create this object.
```{r build_se}
se_2D <- SummarizedExperiment::SummarizedExperiment(
    assays=list(abundance=abundance_2D),
    rowData=S4Vectors::DataFrame(char_table_2D),
    colData=S4Vectors::DataFrame(group_info))
```
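
To confirm that the three components ended up in the right slots, the standard 
accessors `assay()`, `rowData()`, and `colData()` from `SummarizedExperiment` 
can be used. The sketch below builds a small hypothetical object (`toy_se`, 
with made-up values) so that it runs on its own; the same accessors apply to 
`se_2D`.

```{r inspect_se, eval=FALSE}
library(SummarizedExperiment)

# Hypothetical inputs (illustrative values only)
toy_abund <- matrix(
    c(120, 15, 35, 98, 22, 40, 110, 18, 37, 105, 20, 41),
    nrow=3,
    dimnames=list(c("PC 32:0", "PC 34:1", "PC 36:2"),
                  c("ctrl1", "ctrl2", "case1", "case2")))
toy_char <- data.frame(
    Total.C=c(32, 34, 36), Total.DB=c(0, 1, 2),
    row.names=rownames(toy_abund))
toy_group <- data.frame(
    sample_name=colnames(toy_abund),
    label_name=colnames(toy_abund),
    group=c("ctrl", "ctrl", "case", "case"),
    row.names=colnames(toy_abund))

toy_se <- SummarizedExperiment(
    assays=list(abundance=toy_abund),
    rowData=S4Vectors::DataFrame(toy_char),
    colData=S4Vectors::DataFrame(toy_group))

dim(assay(toy_se, "abundance"))   # lipids x samples
colnames(rowData(toy_se))         # lipid feature columns
colData(toy_se)$group             # group label of each sample
```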

# Initiate `LipidTrend` workflow

The `LipidTrend` workflow starts with a SummarizedExperiment object as input. 
It supports both one-dimensional (1D) and two-dimensional (2D) lipid 
feature analyses.

Based on the number of feature columns provided in the `rowData` (e.g., 
chain length, double bond count), the function automatically performs 
either 1D or 2D trend detection.

After statistical computation and visualization, the workflow returns:

* A result table summarizing statistical significance, fold change, and trend 
direction.
* One or more plots highlighting significant lipid regions with group-specific 
abundance trends.

This streamlined workflow enables researchers to identify structured lipidomic 
patterns across feature dimensions with statistical rigor and biological 
interpretability.

We recommend calling `set.seed()` before starting so that the permutation 
step gives reproducible results across runs.

```{r set_seed}
set.seed(1234)
```
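
The effect of fixing the seed can be seen with a base R example that is 
unrelated to the package internals: shuffling group labels twice from the same 
seed gives the same permutation. The label vector `toy_labels` is hypothetical.

```{r seed_demo, eval=FALSE}
# Hypothetical group labels for six samples
toy_labels <- rep(c("ctrl", "case"), each=3)

set.seed(1234)
perm_a <- sample(toy_labels)
set.seed(1234)
perm_b <- sample(toy_labels)

# Same seed, same shuffle
identical(perm_a, perm_b)
```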

## One-dimensional analysis {#lipid1d}

One-dimensional analysis is applied when the input dataset contains a single 
continuous lipid feature, such as chain length or double bond count. 
This approach is ideal when the biological question centers on one specific 
biochemical property of lipids or when only one type of feature annotation 
is available.

Compared to two-dimensional analysis, the one-dimensional approach is more 
straightforward to interpret and requires less data completeness. It is 
particularly suitable in the following scenarios:

* The study focuses on a specific lipid characteristic (e.g., elongation of 
chain length or degree of desaturation).
* Only one lipid feature (e.g., double bond count) is annotated or reliably 
available in the dataset.
* Visualizing lipid trends along a single axis sufficiently addresses the 
biological hypothesis.

To begin, we will first examine the structure of the example input data to 
ensure it is correctly formatted for one-dimensional analysis.

```{r LipidTrend_1D}
# load example data
data("lipid_se_CL")

# quick look at the SE structure
show(lipid_se_CL)
```

### Analyze lipid region
**Overview of Region-Based Trend Analysis**

The `analyzeLipidRegion()` function performs region-based statistical analysis 
of lipidomic features, integrating both marginal testing and permutation-based 
smoothed testing to identify meaningful trends across continuous lipid features 
(e.g., chain length, double bond).

This two-stage approach enhances statistical power and robustness, especially in
small-sample datasets:

1. **Marginal Test**: Each lipid feature is first tested individually using 
either a t-test (with glog10 transformation) or a Wilcoxon test. This step 
yields a marginal statistic and a corresponding marginal p-value for each 
feature.

2. **Region-Based Permutation Test with Smoothing**: The resulting vector of 
marginal statistics is then smoothed using a Gaussian kernel, which integrates 
information from neighboring lipid features based on their similarity (e.g., 
proximity in chain length). A smoothed statistic is computed for each lipid, 
and an empirical p-value is derived via permutation testing by comparing the 
observed statistic to a null distribution.

**Note:**  
- If `test="t.test"` (default), abundance values are internally transformed 
using the glog10 transformation before testing. <br>
- If `test="Wilcoxon"`, the test is more computationally expensive during 
repeated permutations, so it is recommended to set `permute_time` to fewer 
than 10,000 to maintain a reasonable runtime.<br>
<br>
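
The two stages described above can be illustrated with a small base R sketch. 
It is not the package's internal code: the simulated data, the kernel bandwidth 
`sigma`, the 200 permutations, and all object names are assumptions made for 
illustration. It also omits the glog10 transformation and does not model the 
`radius`, `own_contri`, or `abund_weight` arguments.

```{r two_stage_sketch, eval=FALSE}
set.seed(1)

# Simulated data: 8 lipids (chain lengths 30..44) x 10 samples (5 ctrl, 5 case)
chain <- seq(30, 44, by=2)
group <- rep(c("ctrl", "case"), each=5)
abund <- matrix(rnorm(8 * 10, mean=10), nrow=8)
# Shift the four longest chains upward in the case group
abund[5:8, group == "case"] <- abund[5:8, group == "case"] + 3

# Stage 1: marginal t statistic per lipid (case vs. ctrl)
marginal_stat <- function(x, g) {
    apply(x, 1, function(v) {
        t.test(v[g == "case"], v[g == "ctrl"])$statistic
    })
}

# Stage 2: Gaussian kernel weights from pairwise chain-length distance
sigma <- 2
kern <- exp(-outer(chain, chain, "-")^2 / (2 * sigma^2))
kern <- kern / rowSums(kern)          # each row sums to 1
smooth_stat <- function(x, g) as.vector(kern %*% marginal_stat(x, g))

obs <- smooth_stat(abund, group)

# Null distribution: recompute the smoothed statistic after shuffling labels
n_perm <- 200
null <- replicate(n_perm, smooth_stat(abund, sample(group)))

# Two-sided empirical p-value per lipid, then BH adjustment
p_emp <- (rowSums(abs(null) >= abs(obs)) + 1) / (n_perm + 1)
p_bh <- p.adjust(p_emp, method="BH")
data.frame(chain, smoothed=round(obs, 2), p_emp, p_bh)
```

Because each smoothed value borrows from its neighbors, a run of consistently 
shifted lipids tends to stand out more clearly than any single lipid would in 
the marginal test alone.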

**Split-Chain Analysis for Chain Length Features**

To enhance the biological interpretability of chain length–related trends, 
`analyzeLipidRegion()` provides an option, via the `split_chain` parameter, to 
analyze even-chain and odd-chain lipids separately.

Enabling this option allows the function to separate lipids based on chain 
length parity (even vs. odd) and perform region-based statistical testing 
independently for each group. This is particularly beneficial when distinct 
biosynthetic or regulatory patterns are expected between even- 
and odd-chain lipid species.

To activate this feature, configure the `split_chain` and `chain_col`
parameters as follows:

* If `split_chain=TRUE`:
    * The function performs separate analyses for even- and odd-chain lipids.
    * Specify the column name containing chain length information using 
    the `chain_col` parameter.
* If `split_chain=FALSE` (default):
    * All lipids will be analyzed together without parity distinction.
    * Set `chain_col=NULL`.
    
**Recommendation:**
Set `split_chain=TRUE` when analyzing features such as chain length, as it 
often leads to more meaningful biological insights.

**Note:**  
- If fewer than two lipids are present in either the even or odd group, 
analysis for that group will be skipped, and a warning will be issued.<br>
<br>
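
The parity split itself is simple to picture. The sketch below, which uses a 
hypothetical vector of chain lengths `toy_chain`, separates lipid indices into 
even and odd groups and flags whether each group has at least two members. It 
only illustrates the idea and is not taken from the package source.

```{r parity_demo, eval=FALSE}
# Hypothetical chain lengths for six lipids
toy_chain <- c(30, 31, 32, 33, 34, 36)

# Split lipid indices by chain-length parity
parity <- ifelse(toy_chain %% 2 == 0, "even", "odd")
idx <- split(seq_along(toy_chain), parity)
idx

# Groups with fewer than two lipids would be skipped, per the note above
lengths(idx) >= 2
```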

**Abundance-Weighted vs. Unweighted Statistics**

To reflect the biological importance of more abundant lipids, we provide an 
option controlled by the `abund_weight` parameter, which allows for weighting 
region statistics based on the average abundance of each lipid species.

* When `abund_weight=TRUE` (default), the marginal test statistic is scaled by 
each lipid’s normalized average abundance during the smoothing step. This 
emphasizes biologically dominant signals while down-weighting low-abundance, 
potentially noisy lipids.
* When `abund_weight=FALSE`, all lipids are treated equally, regardless of 
their abundance. In this case, the region statistic is calculated solely based 
on test statistics and feature similarity.

This flexibility allows users to choose between:

* A weighted strategy that highlights strong, dominant lipid shifts 
(recommended for most biological datasets).
* An unweighted strategy that gives equal weight to all lipids, which may be 
suitable for hypothesis-driven or targeted studies.

**Note:**  
- Abundance weighting is applied only during the smoothing and permutation 
steps; it does not affect the initial marginal testing.
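
How abundance weighting can change a smoothed statistic is sketched below. 
This is an illustration under stated assumptions rather than the package's 
formula: the weights are taken as each lipid's mean abundance divided by the 
largest mean abundance, the kernel bandwidth is fixed at 2, and the statistics 
and abundances are made up.

```{r weight_sketch, eval=FALSE}
# Hypothetical marginal statistics and mean abundances for five lipids
chain <- c(30, 32, 34, 36, 38)
stat <- c(0.5, 2.1, 2.8, -0.4, 3.0)
avg_abund <- c(5, 200, 150, 10, 2)

# Gaussian kernel over chain-length distance (bandwidth is an assumption)
kern <- exp(-outer(chain, chain, "-")^2 / (2 * 2^2))

# Unweighted: kernel average of the marginal statistics
unweighted <- as.vector(kern %*% stat / rowSums(kern))

# Weighted: scale each statistic by its normalized abundance before smoothing
w <- avg_abund / max(avg_abund)
weighted <- as.vector(kern %*% (stat * w) / rowSums(kern))

round(cbind(chain, unweighted, weighted), 2)
```

In this toy example the last lipid has a large marginal statistic but very low 
abundance, so its weighted value shrinks toward zero while the unweighted value 
stays large.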

```{r countRegion1D}
# run analyzeLipidRegion
res1D <- analyzeLipidRegion(
    lipid_se_CL, ref_group="sgCtrl", split_chain=TRUE, 
    chain_col="chain", radius=3, own_contri=0.5, test="t.test", 
    abund_weight=TRUE, permute_time=1000)

# view result summary 
show(res1D)
```

The `analyzeLipidRegion()` function returns an extended SummarizedExperiment 
object called `LipidTrendSE`. Several helper functions are provided to extract 
and view the result data frame. For more information on these helper 
functions, please refer to the 
[Helper Functions section](#helper).

```{r Region1D_result}
# view even chain result (first 5 lines)
head(even_chain_result(res1D), 5)

# view odd chain result (first 5 lines)
head(odd_chain_result(res1D), 5)
```

### Result visualization
This section demonstrates how to visualize the results from the `LipidTrendSE` 
object returned by the `analyzeLipidRegion()` function.

**Note:**  
- If `split_chain=TRUE` was set in `analyzeLipidRegion()`, two separate plots 
will be generated: one for even-chain lipids, and one for odd-chain lipids.<br>
- If `split_chain=FALSE`, only a single plot will be returned, showing all 
lipids together.


```{r plotRegion1D}
# plot result
plots <- plotRegion1D(res1D, p_cutoff=0.05, y_scale='identity')

# even chain result
plots$even_result
# odd chain result
plots$odd_result
```

The visualization illustrates lipid trends and includes the following 
components:

1. **Blue/Red Ribbons** – Highlight regions where the trend significantly 
differs between groups, as determined by the smoothed permutation test.
2. **Points** – Represent the mean abundance within each group (case vs. 
control) for each lipid feature.

**Color Interpretation:**

* Red regions indicate a significant increase in lipid abundance in the case 
group compared to the control group.
* Blue regions indicate a significant decrease in lipid abundance in the case 
group compared to the control group.

These visualizations highlight not only the magnitude of lipid abundance 
changes but also the specific feature-level regions (e.g., chain length) 
where group differences are most pronounced.


## Two-dimensional analysis {#lipid2d}

Two-dimensional analysis is applicable when the input dataset includes two 
continuous lipid features, such as chain length, double bond count, or other 
numeric lipid characteristics. Compared to one-dimensional analysis, 
2D analysis enables the detection of more complex patterns by simultaneously 
evaluating lipid trends across two biochemical axes.

This method is particularly well-suited for the following scenarios:

* The dataset contains two well-annotated lipid features suitable for trend 
analysis (e.g., chain length, double bond, or other structural indices).
* The biological question involves joint effects or interactions between 
lipid properties—such as the enrichment of long-chain polyunsaturated species.
* The objective is to identify localized regions within the 2D feature space 
that contribute to group differences.
* There is a need for more detailed and spatial visualization of 
lipid trend patterns.

Two-dimensional analysis provides a high-resolution map of lipid changes, 
allowing for the identification of specific combinations of features (e.g., 
long-chain saturated vs. short-chain unsaturated lipids) that may 
indicate pathway-level regulation.

Let’s now take a quick look at the structure of the example input data 
used for 2D analysis.


```{r LipidTrend_2D}
# load example data
data("lipid_se_2D")

# quick look at the SE structure
show(lipid_se_2D)
```

### Analyze lipid region
**Overview of Region-Based Trend Analysis**

The `analyzeLipidRegion()` function performs region-based statistical analysis 
of lipidomic features, integrating both marginal testing and permutation-based 
smoothed testing to identify meaningful trends across continuous lipid features 
(e.g., chain length, double bond).

This two-stage approach enhances statistical power and robustness, especially in
small-sample datasets:

1. **Marginal Test**: Each lipid feature is first tested individually using 
either a t-test (with glog10 transformation) or a Wilcoxon test. This step 
yields a marginal statistic and a corresponding marginal p-value for 
each feature.

2. **Region-Based Permutation Test with Smoothing**: The resulting vector of 
marginal statistics is then smoothed using a Gaussian kernel, which integrates 
information from neighboring lipid features based on their similarity (e.g., 
proximity in chain length). A smoothed statistic is computed for each lipid, 
and an empirical p-value is derived via permutation testing by comparing the 
observed statistic to a null distribution.
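
In two dimensions, neighboring lipids are those close in both features at 
once. The sketch below shows one way to build such a kernel from Euclidean 
distance in the (chain length, double bond count) plane. It is an illustration, 
not the package's implementation: the feature values, the bandwidth `sigma`, 
the unscaled Euclidean distance, and the object names are all hypothetical.

```{r kernel_2d_sketch, eval=FALSE}
# Hypothetical 2D features for five lipids
feat <- cbind(Total.C=c(32, 32, 34, 34, 38),
              Total.DB=c(0, 1, 1, 2, 6))
stat <- c(1.2, 1.5, 2.0, 1.8, -2.5)   # made-up marginal statistics

# Pairwise Euclidean distance in the two-feature plane
d <- as.matrix(dist(feat))

# Gaussian kernel; each row is normalized to sum to 1
sigma <- 2
kern <- exp(-d^2 / (2 * sigma^2))
kern <- kern / rowSums(kern)

# Smoothed statistic: kernel-weighted average over neighbors
smoothed <- as.vector(kern %*% stat)
round(smoothed, 2)
```

The fifth lipid sits far from the others in both features, so its smoothed 
value is driven almost entirely by its own statistic, while the four clustered 
lipids pull toward a shared value.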

**Note:**  
- If `test="t.test"` (default), abundance values are internally transformed 
using the glog10 transformation before testing. <br>
- If `test="Wilcoxon"`, the test is more computationally expensive during 
repeated permutations, so it is recommended to set `permute_time` to fewer 
than 10,000 to maintain a reasonable runtime.<br>
<br>

**Split-Chain Analysis for Chain Length Features**

To enhance the biological interpretability of chain length–related trends, 
`analyzeLipidRegion()` provides an option, via the `split_chain` parameter, to 
analyze even-chain and odd-chain lipids separately.

Enabling this option allows the function to separate lipids based on chain 
length parity (even vs. odd) and perform region-based statistical testing 
independently for each group. This is particularly beneficial when distinct 
biosynthetic or regulatory patterns are expected between even- 
and odd-chain lipid species.

To activate this feature, configure the `split_chain` and `chain_col`
parameters as follows:

* If `split_chain=TRUE`:
    * The function performs separate analyses for even- and odd-chain lipids.
    * Specify the column name containing chain length information using 
    the `chain_col` parameter.
* If `split_chain=FALSE` (default):
    * All lipids will be analyzed together without parity distinction.
    * Set `chain_col=NULL`.
    
**Recommendation:**
Set `split_chain=TRUE` when analyzing features such as chain length, as it 
often leads to more meaningful biological insights.

**Note:**  
- If fewer than two lipids are present in either the even or odd group, 
analysis for that group will be skipped, and a warning will be issued.<br>
<br>

**Abundance-Weighted vs. Unweighted Statistics**

To reflect the biological importance of more abundant lipids, we provide an 
option controlled by the `abund_weight` parameter, which allows for weighting 
region statistics based on the average abundance of each lipid species.

* When `abund_weight=TRUE` (default), the marginal test statistic is scaled by 
each lipid’s normalized average abundance during the smoothing step. This 
emphasizes biologically dominant signals while down-weighting low-abundance, 
potentially noisy lipids.
* When `abund_weight=FALSE`, all lipids are treated equally, regardless of 
their abundance. In this case, the region statistic is calculated solely based 
on test statistics and feature similarity.

This flexibility allows users to choose between:

* A weighted strategy that highlights strong, dominant lipid shifts 
(recommended for most biological datasets).
* An unweighted strategy that gives equal weight to all lipids, which may be 
suitable for hypothesis-driven or targeted studies.

**Note:**  
- Abundance weighting is applied only during the smoothing and permutation 
steps; it does not affect the initial marginal testing.


```{r countRegion2D}
# run analyzeLipidRegion
res2D <- analyzeLipidRegion(
    lipid_se_2D, ref_group="sgCtrl", split_chain=TRUE, 
    chain_col="Total.C", radius=3, own_contri=0.5, test="t.test", 
    abund_weight=TRUE, permute_time=1000)

# view result summary 
show(res2D)
```

The `analyzeLipidRegion()` function returns an extended SummarizedExperiment 
object called `LipidTrendSE`. Several helper functions are provided to extract 
and view the result data frame. For more information on these helper 
functions, please refer to the 
[Helper Functions section](#helper).

```{r Region2D_result}
# view even chain result (first 5 lines)
head(even_chain_result(res2D), 5)
# view odd chain result (first 5 lines)
head(odd_chain_result(res2D), 5)
```

### Result visualization
This section demonstrates how to visualize the results from the `LipidTrendSE` 
object returned by the `analyzeLipidRegion()` function.

**Note:**  
- If `split_chain=TRUE` was set in `analyzeLipidRegion()`, two separate plots 
will be generated: one for even-chain lipids, and one for odd-chain lipids.<br>
- If `split_chain=FALSE`, only a single plot will be returned, showing all 
lipids together.

```{r LipidTrend_2D_plot}
# plot result
plot2D <- plotRegion2D(res2D, p_cutoff=0.05)

# even chain result
plot2D$even_result
# odd chain result
plot2D$odd_result
```

This plot visualizes two-dimensional lipid features, highlighting regional 
trends and significant differences between case and control groups.

The X- and Y-axes represent two continuous lipid characteristics (e.g., total 
chain length and double bond count). Each point corresponds to a lipid, with 
the color indicating its log₂ fold-change (log₂FC) between case and 
control groups:

* Red points indicate higher abundance in case samples.
* Blue points indicate higher abundance in control samples.

If `abund_weight=TRUE`, point size reflects the mean abundance of each lipid. 
<br>
If `abund_weight=FALSE`, all points are displayed with equal size.

Asterisks (\*, \*\*, \*\*\*) denote levels of statistical significance based on 
the marginal test, with more asterisks representing stronger evidence.

Colored outlines (red or blue) represent significant regions identified by 
the smoothed region-based permutation test, which incorporates information 
from neighboring lipids with similar feature values:

* Red regions indicate a significant trend of increasing abundance 
in case samples.
* Blue regions indicate a significant trend of decreasing abundance 
in case samples.

This visualization enables detection of both individual lipid-level changes 
and region-level patterns, providing biologically meaningful trends 
across two lipid features.


# Helper Functions – enhancing result viewing {#helper}
## View Results
`LipidTrend` provides four helper functions for viewing the 
`LipidTrendSE` object returned by `analyzeLipidRegion()`:

1. `result()` – Returns the result data frame.
2. `even_chain_result()` – Returns the even-chain result data frame.
3. `odd_chain_result()` – Returns the odd-chain result data frame.
4. `show()` – Displays a summary of the `LipidTrendSE` object.

**Notes:**  
- If `split_chain=TRUE`, use `even_chain_result()` and `odd_chain_result()` to 
view the results separately. Otherwise, use `result()`.<br>
- To extract `assay`, `rowData`, or `colData` from the `LipidTrendSE` object, 
use functions from the `r Biocpkg("SummarizedExperiment")` package.

## Interpreting the Result Table
The result table contains the following columns:

* **Feature columns (Total.C, Total.DB, etc.)**: Lipid feature values from the 
input `rowData`, such as chain length or double bond count. Column names vary 
depending on the input dataset.
* **avg.abund**: Mean abundance of each lipid across all samples. For 
one-dimensional analysis, the table may also include **avg.abund.ctrl** and 
**avg.abund.case** for group-wise means.
* **direction**: Indicates the sign of the smoothed statistic:
    * "+" signifies an increasing trend in the case group.<br>
    * "–" signifies a decreasing trend in the case group.
* **smoothing.pval.BH**: Benjamini–Hochberg adjusted p-value from the 
region-based permutation test.
* **marginal.pval.BH**: Benjamini–Hochberg adjusted p-value from the marginal 
test (per lipid).
* **log2.FC**: Log₂ fold-change in lipid abundance between case and 
control groups.
* **significance**: Overall significance label based on the smoothed test and 
FC direction:
    * "Increase" – significant positive trend in the case group
    * "Decrease" – significant negative trend in the case group
    * "NS" – not significant
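
Once extracted, the result table can be filtered like any other data frame. 
The sketch below uses a hypothetical table `toy_res` with made-up values and 
only some of the columns listed above; the 0.05 cutoff is an example threshold, 
not a package default.

```{r filter_result_sketch, eval=FALSE}
# Hypothetical result table (illustrative values only)
toy_res <- data.frame(
    Total.C=c(30, 32, 34, 36),
    smoothing.pval.BH=c(0.01, 0.03, 0.20, 0.04),
    log2.FC=c(1.1, 0.8, -0.2, -0.9),
    significance=c("Increase", "Increase", "NS", "Decrease"))

# Keep lipids whose region-level adjusted p-value passes the cutoff
sig <- toy_res[toy_res$smoothing.pval.BH < 0.05, ]

# Count significant lipids by trend label
table(sig$significance)
```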

# Session info
```{r sessionInfo, echo=FALSE}
sessionInfo()
```