---
title: "gVenn: Proportional Venn diagrams for genomic regions and gene set overlaps"
author: "Christophe Tav"
date: "August 2025"
output:
  html_document:
    toc: true
    toc_depth: 3
    number_sections: false
vignette: >
  %\VignetteIndexEntry{gVenn}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, echo=FALSE, out.width="20%", fig.align="center"}
knitr::include_graphics("figures/20250827_hex_gVenn_v1.png")
```
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
suppressWarnings(library(GenomicRanges))
```
# Introduction
**gVenn** stands for **gene/genomic Venn**.
It provides tools to compute overlaps between genomic regions or sets of genes
and visualize them as **Venn** diagrams with areas proportional to the number of
overlapping elements. In addition, the package can generate **UpSet** plots for
cases with many sets, offering a clear alternative to complex Venn diagrams.
With seamless support for `GRanges` and `GRangesList` objects, **gVenn**
integrates naturally into Bioconductor workflows such as ChIP-seq, ATAC-seq,
or other interval-based analyses.
Overlap groups can be easily extracted for further analysis, such as motif
enrichment, transcription factor binding enrichment, or gene annotation.
The **gVenn** package produces clean, publication-ready figures.
```{r, echo=FALSE, out.width="100%", fig.align="center"}
knitr::include_graphics("figures/20250827_graphical_abstract_v2.png")
```
# Example workflow
This section demonstrates a typical workflow with gVenn, from computing overlaps to generating clean, publication-ready figures. The examples show how to work with genomic interval data.
We start by loading the package:
```{r setup}
library(gVenn)
```
## 1. Load example ChIP-seq genomic regions
We use the dataset **`a549_chipseq_peaks`**, which contains example consensus
peak subsets for **MED1**, **BRD4**, and **GR** after dexamethasone treatment
in A549 cells. To keep the dataset small and suitable for examples and tests,
each set has been restricted to peaks located on *chromosome 7*.
These data originate from Tav *et al.* (2023)
doi:10.3389/fgene.2023.1237092.
```{r, load_chip_dataset}
# Load the example A549 ChIP-seq peaks (subset on chr7 for demo)
data(a549_chipseq_peaks)
```
## 2. Compute overlaps between genomic regions
We compute overlaps between the ChIP-seq peak sets using `computeOverlaps()`:
```{r compute_overlaps}
genomic_overlaps <- computeOverlaps(a549_chipseq_peaks)
```
The result is a structured object that contains:

- A GRanges object, where each region includes metadata describing its
  overlap pattern across the input sets.
- An associated count matrix (or data frame) summarizing the number of
  regions in each intersection.
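
The idea of an overlap pattern per element plus a count per intersection can be illustrated with plain gene sets in base R. This is a minimal sketch on toy data, not the implementation used by `computeOverlaps()`; the gene sets and the object names (`gene_sets`, `membership`, `pattern`, `pattern_counts`) are hypothetical.

```r
# Toy gene sets (hypothetical data, for illustration only)
gene_sets <- list(
  A = c("TP53", "MYC", "EGFR", "KRAS", "STAT3"),
  B = c("MYC", "EGFR", "BRCA1", "STAT3"),
  C = c("EGFR", "KRAS", "PTEN")
)

# All elements seen in at least one set
all_genes <- sort(unique(unlist(gene_sets)))

# Logical membership matrix: one row per gene, one column per set
membership <- sapply(gene_sets, function(s) all_genes %in% s)
rownames(membership) <- all_genes

# Binary overlap pattern per gene, e.g. "110" = in A and B, not in C
pattern <- apply(membership, 1, function(x) paste(as.integer(x), collapse = ""))

# Number of elements in each overlap pattern
pattern_counts <- table(pattern)
pattern_counts
```

Here `pattern` plays the role of the per-element overlap metadata, and `pattern_counts` plays the role of the count summary.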
## 3. Visualization
### Venn diagram
`plotVenn()` draws proportional Venn diagrams from the overlap object.
```{r, plot_venn, fig.width=5, fig.height=3, fig.align='center'}
plotVenn(genomic_overlaps)
```
### UpSet plot
For more than **three sets**, a Venn diagram with **areas exactly proportional**
to all intersections is **generally not mathematically attainable**. Solvers
(like those used by `eulerr`) provide **best-effort approximations**, but the
layout can become hard to read. In these cases, an **UpSet plot** is the
recommended visualization because it scales cleanly to many sets and preserves
intersection sizes precisely on bar axes.
We therefore suggest using `plotUpSet()` when you have **> 3 sets** (or any
time the Venn becomes visually crowded).
```{r, plot_upset, fig.width=5, fig.height=3, fig.align='center'}
plotUpSet(genomic_overlaps)
```
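Part of the reason Venn diagrams become crowded is combinatorial: *n* sets define up to 2^n - 1 non-empty intersection regions. The following base R lines are illustrative arithmetic only and do not depend on gVenn:

```r
# Maximum number of non-empty intersection regions for n sets
n_sets <- 2:6
n_regions <- 2^n_sets - 1
data.frame(n_sets, n_regions)
```

Three sets give at most 7 regions to draw, while six sets already give up to 63, which an UpSet plot can display as individual bars.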
### Export visualization
You can export any visualization using `saveViz()`:
```{r save_plot, eval=FALSE}
venn <- plotVenn(genomic_overlaps)
saveViz(venn,
        output_dir = ".",
        output_file = "figure_gVenn",
        format = "pdf")
```
By default, files are written to the current directory (`"."`).
If the date option is enabled, today's date is prepended to the file name.
You can also export to PNG or SVG:
```{r save_plot2, eval=FALSE}
saveViz(venn,
        output_dir = ".",
        output_file = "figure_gVenn",
        format = "png")
saveViz(venn,
        output_dir = ".",
        output_file = "figure_gVenn",
        format = "svg")
```
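To make the date prefix concrete, the sketch below builds such a file name in base R. The helper `dated_filename()` is hypothetical and not part of gVenn, and the `YYYYMMDD_` prefix format is an assumption; the name that `saveViz()` actually writes may differ.

```r
# Hypothetical helper (not part of gVenn): prepend a date to a file name.
# The "YYYYMMDD_" prefix format is an assumption made for illustration.
dated_filename <- function(output_file, ext, date = Sys.Date()) {
  paste0(format(date, "%Y%m%d"), "_", output_file, ".", ext)
}

# Fixed date so the result is reproducible
example_name <- dated_filename("figure_gVenn", "pdf", as.Date("2025-08-27"))
example_name
#> [1] "20250827_figure_gVenn.pdf"
```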
## 4. Extract elements per overlap group
`extractOverlaps()` returns the elements of each overlap group, which can then
be inspected or passed to downstream analyses:
```{r, extractOverlaps_example1}
groups <- extractOverlaps(genomic_overlaps)
```
```{r, extractOverlaps_example2}
# Display the number of genomic regions per overlap group
sapply(groups, length)
```
#### Overlap group naming
When overlaps are computed, each group of elements or genomic regions is
labeled with a binary code that indicates which sets the element belongs to.

- Each digit in the code corresponds to one input set (e.g., A, B, C).
- A `1` means the element is present in that set, while `0` means absent.
- The group names in the output are prefixed with `"group_"` for clarity.
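
For example, with three sets A, B, and C, a name such as `group_110` would denote elements present in A and B but absent from C. The naming rules above can be turned into a small decoder. `decode_group()` is a hypothetical helper, not a gVenn function, and it assumes that the digits follow the order of the input sets.

```r
# Hypothetical helper (not part of gVenn): translate a group name such as
# "group_110" into the names of the sets it covers.
# Assumption: digit i of the code refers to the i-th input set.
decode_group <- function(group_name, set_names) {
  code <- sub("^group_", "", group_name)
  bits <- strsplit(code, "")[[1]]
  stopifnot(
    length(bits) == length(set_names),
    all(bits %in% c("0", "1"))
  )
  set_names[bits == "1"]
}

decode_group("group_110", c("A", "B", "C"))
#> [1] "A" "B"
decode_group("group_001", c("A", "B", "C"))
#> [1] "C"
```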