element;intro
#welcome; Welcome to the <b>Distance Scores</b> tour for <code>GeDi</code>.
#Step1; This tour will take you through the relevant elements of the UI. You can (re)start the tour in each section at any time by clicking the dedicated button. You can exit the guided introduction at any time by clicking outside of the highlighted area.
#sidebar; If you have not yet provided your data, please use the sidebar to navigate to the <b>Data Input</b> panel and complete the steps described there.
#distance_calc_box; Once you have provided your data, you can calculate the distances between the individual genesets. Several distance metrics are available for this purpose.
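#distance_calc_box; The protein-protein interaction (PPI) weighted Meet-Min (<b>pMM</b>) distance is a variant of the Meet-Min distance that has been adapted to biological data. In addition to the overlap of two genesets, it takes the protein-protein interactions between the geneset members into account. The score is defined as: <math display="block"><mi>pMM</mi><mo>=</mo><mi>min</mi><mo>(</mo><mi>pMM</mi><mo>(</mo><mi>A</mi><mo>&#8594</mo><mi>B</mi><mo>)</mo><mo>,</mo><mi>pMM</mi><mo>(</mo><mi>B</mi><mo>&#8594</mo><mi>A</mi><mo>)</mo><mo>)</mo></math> where, for two genesets X and Y: <math display="block"><mi>pMM</mi><mo>(</mo><mi>X</mi><mo>&#8594</mo><mi>Y</mi><mo>)</mo><mo>=</mo><mn>1</mn><mo>-</mo><mfrac><mrow><mo>|</mo><mi>X</mi><mo>&#8745</mo><mi>Y</mi><mo>|</mo></mrow><mrow><mi>min</mi><mo>(</mo><mo>|</mo><mi>X</mi><mo>|</mo><mo>,</mo><mo>|</mo><mi>Y</mi><mo>|</mo><mo>)</mo></mrow></mfrac><mo>-</mo><mfrac><mi>&#945</mi><mrow><mi>min</mi><mo>(</mo><mo>|</mo><mi>X</mi><mo>|</mo><mo>,</mo><mo>|</mo><mi>Y</mi><mo>|</mo><mo>)</mo></mrow></mfrac><munder><mo>&#8721</mo><mrow><mi>x</mi><mo>&#8712</mo><mi>X</mi><mo>-</mo><mi>Y</mi></mrow></munder><mfrac><mrow><mi>w</mi><munder><mo>&#8721</mo><mrow><mi>y</mi><mo>&#8712</mo><mi>X</mi><mo>&#8745</mo><mi>Y</mi></mrow></munder><mi>PPI</mi><mo>(</mo><mi>x</mi><mo>,</mo><mi>y</mi><mo>)</mo><mo>+</mo><munder><mo>&#8721</mo><mrow><mi>y</mi><mo>&#8712</mo><mi>Y</mi><mo>-</mo><mi>X</mi></mrow></munder><mi>PPI</mi><mo>(</mo><mi>x</mi><mo>,</mo><mi>y</mi><mo>)</mo></mrow><mrow><mi>max</mi><mo>(</mo><mi>PPI</mi><mo>)</mo><mo>&#8901</mo><mo>(</mo><mi>w</mi><mo>&#8901</mo><mo>|</mo><mi>X</mi><mo>&#8745</mo><mi>Y</mi><mo>|</mo><mo>+</mo><mo>|</mo><mi>Y</mi><mo>-</mo><mi>X</mi><mo>|</mo><mo>)</mo></mrow></mfrac></math> and: <math display="block"><mi>w</mi><mo>=</mo><mfrac><mrow><mi>min</mi><mo>(</mo><mo>|</mo><mi>X</mi><mo>|</mo><mo>,</mo><mo>|</mo><mi>Y</mi><mo>|</mo><mo>)</mo></mrow><mrow><mo>|</mo><mi>X</mi><mo>|</mo><mo>+</mo><mo>|</mo><mi>Y</mi><mo>|</mo></mrow></mfrac></math> Here, &#945 is a scaling factor between 0 and 1. The corresponding PPI matrix can be downloaded in the <b>Data Input</b> panel. The exact definition of the PPI-weighted Meet-Min distance can be found in the <a href="https://doi.org/10.1186/s12864-019-5738-6">paper</a> by Yoon et al.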
#distance_calc_box; The <b>Kappa</b> distance is a set-based distance that relies on the observed (O) and expected (E) agreement rates of two genesets A and B. It is defined as: <math display="block"><mi>Kappa</mi><mo>=</mo><mn>1</mn><mo>-</mo><mfrac><mrow><mi>O</mi><mo>-</mo><mi>E</mi></mrow><mrow><mn>1</mn><mo>-</mo><mi>E</mi></mrow></mfrac></math> where: <math display="block"><mi>O</mi><mo>=</mo><mfrac><mrow><mo>|</mo><mi>A</mi><mo>&#8745</mo><mi>B</mi><mo>|</mo><mo>+</mo><mo>|</mo><msup><mrow><mo>(</mo><mi>A</mi><mo>&#8746</mo><mi>B</mi><mo>)</mo></mrow><mi>c</mi></msup><mo>|</mo></mrow><mrow><mo>|</mo><mi>U</mi><mo>|</mo></mrow></mfrac></math> and: <math display="block"><mi>E</mi><mo>=</mo><mfrac><mrow><mo>|</mo><mi>A</mi><mo>|</mo><mo>&#8901</mo><mo>|</mo><mi>B</mi><mo>|</mo><mo>+</mo><mo>|</mo><msup><mi>A</mi><mi>c</mi></msup><mo>|</mo><mo>&#8901</mo><mo>|</mo><msup><mi>B</mi><mi>c</mi></msup><mo>|</mo></mrow><msup><mrow><mo>|</mo><mi>U</mi><mo>|</mo></mrow><mn>2</mn></msup></mfrac></math> Here, U is the set of all unique genes in the data, and the complements (<sup>c</sup>) are taken with respect to U. In this application, the Kappa distance is additionally normalized to the range between 0 and 1 to make it comparable to the remaining distance metrics.
#distance_calc_box; The <b>Jaccard</b> distance is derived from the Jaccard coefficient, the ratio of the size of the intersection to the size of the union of two genesets. It is defined as: <math display="block"><mi>Jaccard</mi><mo>=</mo><mn>1</mn><mo>-</mo><mfrac><mrow><mo>|</mo><mi>A</mi><mo>&#8745</mo><mi>B</mi><mo>|</mo></mrow><mrow><mo>|</mo><mi>A</mi><mo>&#8746</mo><mi>B</mi><mo>|</mo></mrow></mfrac></math> The Jaccard similarity is subtracted from 1 to transform it into a distance metric. The distance is based solely on set comparisons.
#distance_calc_box; Another distance metric available in <code>GeDi</code> is the <b>Meet-Min</b> (MM) distance, a transformation of the overlap coefficient. The overlap coefficient is a similarity measure defined as: <math display="block"><mi>OC</mi><mo>=</mo><mfrac><mrow><mo>|</mo><mi>A</mi><mo>&#8745</mo><mi>B</mi><mo>|</mo></mrow><mrow><mi>min</mi><mo>(</mo><mo>|</mo><mi>A</mi><mo>|</mo><mo>,</mo><mo>|</mo><mi>B</mi><mo>|</mo><mo>)</mo></mrow></mfrac></math> To transform this similarity measure into a distance, the overlap coefficient is subtracted from 1, which yields the Meet-Min (MM) distance: <math display="block"><mi>MM</mi><mo>=</mo><mn>1</mn><mo>-</mo><mfrac><mrow><mo>|</mo><mi>A</mi><mo>&#8745</mo><mi>B</mi><mo>|</mo></mrow><mrow><mi>min</mi><mo>(</mo><mo>|</mo><mi>A</mi><mo>|</mo><mo>,</mo><mo>|</mo><mi>B</mi><mo>|</mo><mo>)</mo></mrow></mfrac></math> As a purely set-based measure, the Meet-Min distance only takes the composition of the genesets into account, not the underlying biological information inherent in them.
#distance_calc_box; Additionally, the <b>Sorensen-Dice</b> distance is available. Like the Jaccard and Meet-Min distances, it is purely set-based and is defined as: <math display="block"><mi>Sorensen-Dice</mi><mo>=</mo><mn>1</mn><mo>-</mo><mfrac><mrow><mn>2</mn><mo>&#8901</mo><mo>|</mo><mi>A</mi><mo>&#8745</mo><mi>B</mi><mo>|</mo></mrow><mrow><mo>|</mo><mi>A</mi><mo>|</mo><mo>+</mo><mo>|</mo><mi>B</mi><mo>|</mo></mrow></mfrac></math>
#distance_calc_box; Lastly, the <b>GO</b> distance is available. It is based on the GO semantic similarity measures provided by the <a href="https://bioconductor.org/packages/GOSemSim/">GOSemSim</a> package, which are subtracted from 1 to transform them into a distance metric. By default, the GeDi app uses the GO Biological Process (BP) ontology and the Wang method. When the function is used outside of the app, all options offered by the <a href="https://bioconductor.org/packages/GOSemSim/">GOSemSim</a> package are available.
#score_data; After you have selected a distance metric via the options on the left, you can start calculating the distance scores by clicking the <b>Score the Genesets</b> button. Please note that this can take a considerable amount of time, depending on the number of genesets and the selected distance metric. You can follow the progress of the scoring via the progress bar in the lower right corner of the panel.
#distance_scores_box; Once the distance scores have been calculated, they are visualized in this panel.
#tabsetpanel_scores; In the <b>Distance Scores Heatmap</b> card, the distance scores are visualized as a heatmap. You first have to start the computation of the heatmap by clicking the <b>Calculate Distance Score Heatmap</b> button. Again, this computation can take some time, depending on the number of genesets in your data. Once the heatmap has been computed, you can interact with it: hovering over the heatmap shows the involved genesets and their distance score, and selecting an area zooms into the heatmap. To reset the view, simply click anywhere outside of the heatmap.
#tabsetpanel_scores; The <b>Distance Scores Dendrogram</b> card shows a dendrogram of the distance scores. The dendrogram is based on hierarchical clustering, which iteratively merges the most similar genesets or groups of genesets. Via the drop-down menu on the left, you can select different linkage methods for this merging step, which influence the resulting dendrogram.
#tabsetpanel_scores; The <b>Distance Scores Graph</b> card visualizes the distance scores as a graph. In the graph, each geneset is a node, and genesets with a distance score below a certain threshold are connected by an edge. The default threshold is 0.3, but you can adjust it via the slider on the left. The graph is also interactive: hovering over or clicking on a node highlights the node and all nodes connected to it, and the displayed text provides additional information about the respective geneset. Furthermore, clicking on the geneset ID in the text field opens the corresponding entry in an external database with more information about your geneset. The database depends on the ID type of your genesets (GO, KEGG, Reactome, etc.).
#tabsetpanel_scores; In this graph, you can also search for specific genesets via the text input on the left. Upon clicking on the input field, a drop-down menu opens. You can either select a geneset from the shown options or type in the name of the geneset you are looking for, in which case the drop-down menu suggests genesets that match your text. Once you have selected a geneset, it is highlighted in the graph.
#hub_genes_box; Besides the graph, the <b>Graph metrics</b> box is also available. It contains a table with different metrics of the graph shown in this card. For each geneset, the metrics include the degree, the betweenness, the harmonic centrality and the clustering coefficient, as well as the input data. This table gives you a good overview of the data and the distance scores, for example which geneset is similar to the largest number of other genesets (i.e. has the highest degree).
#Thanks; Thank you for taking the <b>Distance Scores</b> tour of <code>GeDi</code>!
