SynLethDB is a comprehensive knowledge base of synthetic lethality (SL). It contains SL pairs collected from manual curation of the literature, four other related databases (including Syn-lethality, Decipher and GenomeRNAi), predictions by DAISY, and text-mining results. The current version of SynLethDB covers human and four model species: mouse, fruit fly, worm and yeast.
Fig.1 Schematic diagram of the data resources, functional modules and graphical visualization components included in SynLethDB
SynLethDB also contains genomic data (expression profiles, mutations and copy number alterations from COSMIC), high-quality drug-target interactions (built from KIBA, DrugBank and STITCH), and three drug sensitivity datasets (CCLE, GDSC and NCI-60) covering more than 1,000 cancer cell lines. Based on the integrated data, six functional modules have been developed to explore the data resources: query and filtering, calculation of integrative confidence scores, search for orthologous genes, gene set enrichment analysis, drug-SL partner interaction query, and statistical analysis of drug sensitivity. In addition, a user-friendly web interface, including an interactive network and tabular viewer, statistical diagrams and graphical visualization plugins, has been implemented to facilitate data analysis and interpretation. The schematic diagram in Fig.1 shows the system architecture, data sources and functional modules of SynLethDB.
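This page does not state how the integrative confidence score is computed. Purely as a hypothetical illustration of how several per-source scores can be merged into one number, the Python sketch below uses a noisy-OR rule. It assumes that every evidence score lies in [0, 1] and treats the evidence sources as independent; it is not necessarily the formula used by SynLethDB, and the function name and example scores are made up.

```python
from typing import Sequence


def integrative_score(evidence_scores: Sequence[float]) -> float:
    """Hypothetical noisy-OR combination of per-source scores in [0, 1].

    Each score is read as the chance that one source alone is right about
    the SL pair; the result is the chance that at least one source is right.
    """
    for s in evidence_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    prob_all_wrong = 1.0
    for s in evidence_scores:
        prob_all_wrong *= 1.0 - s
    return 1.0 - prob_all_wrong


# Hypothetical scores from three evidence sources for one SL pair
print(round(integrative_score([0.9, 0.5, 0.3]), 3))  # prints 0.965
```

A noisy-OR never decreases when another supporting source is added, which matches the intuition that more independent evidence should raise confidence; a weighted average would not have that property.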
Users can enter one or more genes (HGNC symbols and Entrez Gene IDs are currently supported) to search for SL pairs in a species of interest. The SL gene pairs are displayed in a network viewer and a tabular viewer. To provide biological context, the SL network also shows the SL pairs among the partners of the query genes. In the network viewer, the width of each edge is proportional to the integrative confidence score of the corresponding SL pair. Users can also filter the SL network by adjusting the confidence-score threshold and the number of SL pairs to display via the toolbar below the network viewer, as shown in Fig.2. Clicking a gene node opens a popup window with a concise description of the gene and hyperlinks to public resources such as UniProt, Ensembl and NCBI GenBank. Clicking an edge displays the evidence sources, supporting references and confidence score.
On the right side of the SL network, summary statistics are shown: the number of references supporting the SL pairs in the network, the percentages of the evidence sources (pie chart), and the distribution of confidence scores (line chart). In addition, users can launch gene set enrichment analysis on the set of genes associated with the query genes, or download the SL pairs, using the buttons in the bottom right corner.
Fig.2 SL network associated with FEN1 in human, with statistics on related references, percentages of the evidence sources and the distribution of confidence scores
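The two network-viewer controls described above (a confidence threshold and a cap on the number of SL pairs), together with score-proportional edge widths, can be sketched in Python as follows. This is a minimal illustration, not the SynLethDB code: the `SLEdge` class, both function names, the `max_width` scale and the partner genes P1-P3 with their scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SLEdge:
    gene_a: str
    gene_b: str
    score: float  # integrative confidence score, assumed to lie in [0, 1]


def filter_network(edges: List[SLEdge], min_score: float, max_pairs: int) -> List[SLEdge]:
    """Keep edges scoring at least min_score, then the max_pairs highest-scoring ones."""
    kept = [e for e in edges if e.score >= min_score]
    kept.sort(key=lambda e: e.score, reverse=True)
    return kept[:max_pairs]


def edge_width(score: float, max_width: float = 8.0) -> float:
    """Edge width proportional to the confidence score (max_width is a hypothetical scale)."""
    return max_width * score


# Hypothetical partners P1-P3 of the query gene FEN1; the P1-P3 edge is an
# SL pair between two partners, kept in the network for context.
edges = [
    SLEdge("FEN1", "P1", 0.92),
    SLEdge("FEN1", "P2", 0.40),
    SLEdge("FEN1", "P3", 0.75),
    SLEdge("P1", "P3", 0.60),
]
for e in filter_network(edges, min_score=0.5, max_pairs=2):
    print(e.gene_a, e.gene_b, edge_width(e.score))
```

Applying the threshold before the cap means a low-confidence pair can never enter the display just because few pairs pass the filter.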
In the tabular viewer, detailed information on each SL pair is shown, including species, diseases, integrative confidence score and the PubMed IDs of supporting references. Users interested in the individual scores and evidence sources can click an evidence source to open the corresponding evidence-source page. As illustrated in Fig. 3, this page shows the evidence source, experimental methods, PMIDs of related references, diseases and individual scores. Furthermore, using the ranking function of the tabular viewer, users can pick out high-confidence SL pairs by sorting on the integrative confidence score, as shown in Fig.4. Users can also enter a keyword in the text box at the top right to filter the SL pairs. The last column of the tabular viewer provides, for each SL pair, entry points to two functional modules: search for orthologs and statistical analysis of drug sensitivity.
Fig.4 Details of the SL pairs associated with FEN1 in human, ranked by integrative confidence score in descending order.
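The keyword filter and score ranking of the tabular viewer amount to a case-insensitive substring match over all columns followed by a descending sort. The Python sketch below shows that idea under the assumption that each table row is a plain dictionary; the column names, the `search_table` function and the example rows are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def search_table(rows: List[Row], keyword: str) -> List[Row]:
    """Keep rows where any column contains the keyword (case-insensitive),
    then rank them by integrative confidence score, highest first."""
    kw = keyword.lower()
    hits = [r for r in rows if any(kw in str(v).lower() for v in r.values())]
    return sorted(hits, key=lambda r: r["score"], reverse=True)


# Hypothetical rows; partner names and scores are placeholders, not database content.
rows = [
    {"gene_a": "FEN1", "gene_b": "P1", "species": "human", "disease": "breast cancer", "score": 0.81},
    {"gene_a": "FEN1", "gene_b": "P2", "species": "human", "disease": "colorectal cancer", "score": 0.64},
    {"gene_a": "FEN1", "gene_b": "P3", "species": "human", "disease": "breast cancer", "score": 0.93},
]
for r in search_table(rows, "breast"):
    print(r["gene_a"], r["gene_b"], r["score"])
```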
The ortholog page first summarizes the orthologous genes of the query SL pair in the other four species. Below this summary, detailed information is shown on the orthologs identified by four leading computational methods, i.e., InParanoid, NCBI HomoloGene, Ensembl Compara and PhylomeDB. Fig. 5 shows the orthologs of the human SL pair (CHAF1B, FEN1). On the right, a floating navigation menu guides the user to detailed information on the genes and orthologs of each species.
Fig.5 Orthologs of the human SL pair ATM and EGFR in the other four species, identified by InParanoid, NCBI HomoloGene, Ensembl Compara and PhylomeDB.
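One simple way to build a per-species summary from the four ortholog resources is to count how many methods report each candidate ortholog. The Python sketch below does this; it is an assumed illustration rather than the SynLethDB summary logic, and the input layout, the `summarize_orthologs` name and the placeholder ortholog symbols (m1, m2, y1) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def summarize_orthologs(
    predictions: Dict[str, Dict[str, Set[str]]]
) -> Dict[str, Dict[str, int]]:
    """Map {method: {species: orthologs}} to {species: {ortholog: number of supporting methods}}."""
    summary: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for by_species in predictions.values():
        for species, genes in by_species.items():
            for gene in genes:
                summary[species][gene] += 1
    return {species: dict(counts) for species, counts in summary.items()}


# Hypothetical predictions for one human gene; symbols are placeholders.
predictions = {
    "InParanoid": {"mouse": {"m1"}, "yeast": {"y1"}},
    "NCBI HomoloGene": {"mouse": {"m1"}},
    "Ensembl Compara": {"mouse": {"m1", "m2"}, "yeast": {"y1"}},
    "PhylomeDB": {"yeast": {"y1"}},
}
print(summarize_orthologs(predictions))
```

Orthologs supported by several independent methods are usually the more reliable ones, so the support count is a natural column to sort the summary on.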
To provide a preliminary evaluation of SL partners as anticancer drug targets, we developed a statistical analysis module that estimates the druggability and efficacy of each SL pair, based on the collected genomic data (from COSMIC), drug-target interactions (from KIBA, DrugBank and STITCH) and three large-scale drug sensitivity datasets (CCLE, GDSC and NCI-60). Formally, let A and B be the two genes of the SL pair under analysis. A Wilcoxon rank-sum test is used to examine whether drugs inhibiting gene B yield significantly higher drug sensitivity in samples in which gene A is inactive (or overactive) than in the remaining samples, as shown in Fig.6.
Fig.6 Workflow of the statistical analysis based on drug sensitivity dataset
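The Wilcoxon rank-sum comparison described above can be sketched in plain Python. The sketch assumes the samples have already been split into those where gene A is inactive and the rest, and that each value is a sensitivity score where higher means more sensitive (for IC50-type measures, where lower means more sensitive, the sign of z flips). It uses the large-sample normal approximation without tie correction; the function names and example values are hypothetical, and in practice `scipy.stats.ranksums` offers a library implementation of the same test.

```python
import math
from typing import List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(group_x: Sequence[float], group_y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    A positive z means group_x tends to rank above group_y."""
    n1, n2 = len(group_x), len(group_y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    ranks = rank_with_ties(list(group_x) + list(group_y))
    w = sum(ranks[:n1])  # rank sum of group_x
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Hypothetical sensitivity scores for one drug that inhibits gene B
a_inactive = [0.82, 0.91, 0.77, 0.88, 0.95]  # samples where gene A is inactive
rest = [0.41, 0.55, 0.62, 0.48, 0.70, 0.52]  # all remaining samples
z, p = rank_sum_test(a_inactive, rest)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

Because the question is directional ("higher sensitivity when A is inactive"), a one-sided p-value, `math.erfc(z / math.sqrt(2)) / 2`, may be the more natural choice; the two-sided form is shown as the conservative default.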
Users can click the corresponding link in the tabular viewer to launch the statistical analysis for a specific SL pair. The results are presented with a variety of graphical visualization plugins, as shown in Figs. 7-9.
Fig.7 Box plots of the statistical analysis result.
Fig.8 Bubble chart of the data points of the drug sensitivity values.
Fig.9 Scatter plots of the data points of the drug sensitivity values.
For an SL pair, users can retrieve the drugs that target each gene by clicking the “drug-target interactions” link in the popup window of the network viewer. As shown in Fig. 10, each drug is listed with cross-links to PubChem, ChEMBL and DrugBank and with its interaction scores from KIBA, STITCH and DrugBank. Users can also filter and rank the drugs in the tabular viewer.
Fig.10 Drugs targeting the SL pair AMIGO2 and EGFR.
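Retrieving the drugs for each gene of an SL pair, then filtering and ranking them by one interaction score, can be sketched as below. The record layout, the `drugs_for_pair` name and the drug names are hypothetical. Because KIBA, STITCH and DrugBank scores are on different scales, the sketch ranks within a single chosen source; it also assumes that a higher value means a stronger interaction, which should be checked for each source before use.

```python
from typing import Any, Dict, Iterable, List, Tuple


def drugs_for_pair(
    interactions: Iterable[Dict[str, Any]],
    gene_a: str,
    gene_b: str,
    source: str = "STITCH",
    min_score: float = 0.0,
) -> Dict[str, List[Tuple[str, float]]]:
    """Group drugs by target gene for an SL pair, keeping those whose score
    from `source` is at least min_score, ranked from highest to lowest."""
    result: Dict[str, List[Tuple[str, float]]] = {gene_a: [], gene_b: []}
    for it in interactions:
        if it["target"] in result:
            score = it["scores"].get(source)
            if score is not None and score >= min_score:
                result[it["target"]].append((it["drug"], score))
    for hits in result.values():
        hits.sort(key=lambda t: t[1], reverse=True)
    return result


# Hypothetical interaction records; drug names and scores are placeholders.
interactions = [
    {"drug": "drug1", "target": "EGFR", "scores": {"STITCH": 0.90}},
    {"drug": "drug2", "target": "EGFR", "scores": {"STITCH": 0.95, "DrugBank": 1.0}},
    {"drug": "drug3", "target": "AMIGO2", "scores": {"KIBA": 11.2}},
    {"drug": "drug4", "target": "TP53", "scores": {"STITCH": 0.99}},
]
print(drugs_for_pair(interactions, "AMIGO2", "EGFR"))
```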
Because gene set enrichment analysis (GSEA) helps in understanding the biological mechanisms of SL genes, we carried out enrichment analysis to find statistically significant pathways and GO functional annotation terms, based on the set of genes that form SL relationships with each specific gene. For significant pathways and functional annotation terms, links to external databases such as KEGG, Reactome and Gene Ontology are provided. Fig. 11 and Fig. 12 show the results of GO enrichment and pathway analysis, respectively, based on the SL partners of EGFR.
Fig.11 GO annotation enrichment analysis based on the SL partners of EGFR.
Fig.12 Pathway analysis based on the SL partners of EGFR.
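This page does not say which statistical test underlies the enrichment results. For an unranked gene list such as the SL partners of one gene, a common choice is a hypergeometric over-representation test with Benjamini-Hochberg correction across terms; note that this differs from the rank-based GSEA method of the same name. The Python sketch below assumes that approach; the function names and the example counts are hypothetical.

```python
from math import comb
from typing import List, Sequence


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: query genes (SL partners) annotated to the term
    n: number of query genes
    K: background genes annotated to the term
    N: background size
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 40 SL partners, 6 of them in a term of 150 genes,
# against a background of 20,000 genes.
p = hypergeom_enrichment_p(k=6, n=40, K=150, N=20000)
print(p, benjamini_hochberg([p, 0.2, 0.04]))
```

Python's arbitrary-precision integers keep `math.comb` exact even for a genome-sized background, so the tail sum is computed without intermediate overflow.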
All datasets in our database can be freely downloaded here.