Software

AutoRT

Description: AutoRT is a peptide sequence-based retention time (RT) prediction tool developed using automated deep learning and transfer learning. With transfer learning, it can provide highly accurate RT prediction using models trained on a small number of peptides (~1000).
URL: https://github.com/bzhanglab/AutoRT
Reference: Wen, B., Li, K., Zhang, Y. et al. Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis. Nat Commun 11, 1759 (2020). Manuscript.

CanProVar

Description: CanProVar is designed to store and display single amino acid alterations, including both germline and somatic variations, in the human proteome, especially those related to the genesis or development of human cancer according to the published literature. Cancer-related variations and corresponding annotations can be queried through the web interface using protein IDs from the Ensembl, IPI, RefSeq, and UniProt/Swiss-Prot databases, or using gene names and Entrez gene IDs. FASTA files with variation information are also available for download.
URLs: http://canprovar.zhang-lab.org/ (version 1); http://canprovar2.zhang-lab.org/ (version 2)
Reference: Jing Li, Dexter T Duncan, Bing Zhang. CanProVar: a human cancer proteome variation database. Hum Mutat, 31(3):219-228, 2010

customProDB

Description: customProDB is an R package that enables the easy generation of sample-specific protein databases from RNA-Seq data for proteomics search.
URL: http://www.bioconductor.org/packages/devel/bioc/html/customProDB.html
Reference: Xiaojing Wang, Bing Zhang. customProDB: an R package to generate customized protein databases from RNA-Seq data for proteomics search. Bioinformatics, 29:3235-3237, 2013.

Customprodbj

Description: Customprodbj is a Java-based tool for customized protein database construction. It can be used to (1) build a customized database based on a single VCF file, (2) build a customized database based on multiple VCF files from a single sample, and (3) build a customized database based on multiple VCF files from multiple samples.
URL: https://github.com/wenbostar/Customprodbj
Reference: Wen, B., Li, K., Zhang, Y. et al. Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis. Nat Commun 11, 1759 (2020). Manuscript.

DeepRescore

Description: DeepRescore is a post-processing tool that combines peptide features derived from deep learning predictions, namely accurate retention time and MS/MS spectra predictions, with previously used features to rescore peptide-spectrum matches (PSMs). DeepRescore is implemented using Docker and Nextflow. The current implementation supports four search engines: MS-GF+, Comet, X!Tandem, and MaxQuant. DeepRescore takes PSM identification results from a search engine and MS/MS data in MGF format as input, although the latter is not required for MaxQuant.
URL: https://github.com/bzhanglab/DeepRescore
Reference: Li, K., Jain, A., Malovannaya, A., Wen, B.* and Zhang, B.* (2020), DeepRescore: Leveraging Deep Learning to Improve Peptide Identification in Immunopeptidomics. Proteomics. doi:10.1002/pmic.201900334

GLAD4U

Description: Gene List Automatically Derived For You (GLAD4U) implements an algorithm that searches the scientific literature (PubMed) to retrieve the publications matching a user’s query. The algorithm then translates the list of publications into a list of genes referenced in these publications. The last step presents the user with a prioritized gene list, ordered from the most to the least referenced genes within the search space.
URL: http://glad4u.zhang-lab.org
Reference: Jerome Jourquin, Dexter Duncan, Zhiao Shi, Bing Zhang. GLAD4U: deriving and prioritizing gene lists from PubMed literature. BMC Genomics. 13(Suppl 8):S20, 2012.
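
The final prioritization step described above can be illustrated with a minimal sketch. It assumes that the literature search and the publication-to-gene translation have already produced a mapping from publication IDs to gene symbols. The function name `prioritize_genes`, the sample IDs, and the plain count-based ranking are illustrative assumptions, not GLAD4U's actual implementation or scoring.

```python
from collections import Counter


def prioritize_genes(pub_to_genes):
    """Rank genes by the number of retrieved publications that reference them.

    pub_to_genes: dict mapping a publication ID to the gene symbols it mentions
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter()
    for genes in pub_to_genes.values():
        counts.update(set(genes))  # count each gene at most once per publication
    # Most-referenced genes first; ties are broken alphabetically
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical search result: three publications and the genes they mention
pubs = {
    "pub_1": ["TP53", "BRCA1"],
    "pub_2": ["TP53"],
    "pub_3": ["TP53", "EGFR", "BRCA1"],
}
ranked = prioritize_genes(pubs)
print(ranked)  # [('TP53', 3), ('BRCA1', 2), ('EGFR', 1)]
```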

GPU-FAN

Description: Network analysis plays an important role in systems biology, but network analysis algorithms are usually computationally intensive. Modern general-purpose computation on graphics processing units (GPGPU) provides a cost-effective platform for this type of application. We have initiated a project to enable fast network analysis on GPUs. The first version of the software package, gpu-fan (GPU-based Fast Analysis of Networks), includes methods for computing four shortest path-based centrality metrics on NVIDIA’s CUDA platform. Speedups of 10x to 50x were observed for simulated scale-free networks and for real-world protein interaction and gene co-expression networks.
URL: http://gpu-fan.zhang-lab.org
Reference: Zhiao Shi, Bing Zhang. Fast network centrality analysis using GPUs. BMC Bioinformatics, 12:149, 2011
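
To make "shortest path-based centrality" concrete, here is a small CPU-only sketch of one such metric, closeness centrality, on an unweighted, undirected network. The choice of closeness is an assumption made for illustration (the description above does not name the four metrics), and the function names are hypothetical; this is not the gpu-fan code and does not use CUDA.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj):
    """Closeness = (number of reachable nodes) / (sum of shortest-path lengths)."""
    scores = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        reachable = len(dist) - 1  # exclude the node itself
        scores[node] = reachable / total if total > 0 else 0.0
    return scores


# Toy path network a - b - c - d, stored as adjacency lists
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = closeness_centrality(graph)
print(scores)  # {'a': 0.5, 'b': 0.75, 'c': 0.75, 'd': 0.5}
```

Each source node requires its own shortest-path search, and these searches are independent of one another, which is one reason this family of metrics is a natural fit for parallel hardware.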

ICE

Description: The Iterative Clique Enumeration (ICE) algorithm identifies relatively independent co-expression modules from gene co-expression networks in order to facilitate further analyses of the transcriptional mechanisms encoded in the networks.
URL: http://ice.zhang-lab.org
Reference: Zhiao Shi, Catherine K Derow, Bing Zhang. Co-expression module analysis reveals biological processes, genomic gain, and regulatory mechanisms associated with breast cancer progression. BMC Systems Biology, 4:74, 2010

LinkedOmics

Description: LinkedOmics is a publicly available portal that includes multi-omics data from all 32 TCGA cancer types. It also includes mass spectrometry-based proteomics data generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) for TCGA breast, colorectal, and ovarian tumors. The web application has three analytical modules: LinkFinder, LinkInterpreter, and LinkCompare. LinkFinder allows users to search for attributes that are associated with a query attribute, such as mRNA or protein expression signatures of genomic alterations, candidate biomarkers of clinical attributes, and candidate target genes of transcription factors, microRNAs, or protein kinases. Analysis results can be visualized with scatter plots, box plots, or Kaplan-Meier plots. To derive biological insights from the association results, the LinkInterpreter module performs enrichment analysis based on Gene Ontology, biological pathways, and network modules, among other functional categories. The LinkCompare module uses visualization functions (interactive Venn diagram, scatter plot, and sortable heat map) and meta-analysis to compare and integrate association results generated by the LinkFinder module, which supports multi-omics analysis within a cancer type as well as pan-cancer analysis.
URL: http://www.linkedomics.org
Reference: Vasaikar S., Straub P., Wang J., Zhang B. LinkedOmics: analyzing multi-omics data within and across 32 cancer types. Nucleic Acids Research, gkx1090, 2017, https://doi.org/10.1093/nar/gkx1090
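
As a rough illustration of what an enrichment analysis computes, the sketch below implements an over-representation test based on the hypergeometric distribution, one common form of enrichment analysis. The description above does not state which statistical method LinkInterpreter uses, so the test, the function name `hypergeom_upper_tail`, and the numbers are assumptions made purely for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: background genes in the functional category,
    n: genes in the list of interest, k: list genes that fall in the category.
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible overlaps contribute nothing to the sum
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Hypothetical numbers: 20 background genes, 5 in the category,
# 4 genes in the list of interest, 2 of which are in the category
p_value = hypergeom_upper_tail(k=2, N=20, K=5, n=4)
print(round(p_value, 4))  # 0.2487
```

A small p-value would suggest that the category is over-represented in the gene list; in practice the p-values are also corrected for testing many categories at once, which this sketch omits.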

NeoFlow

Description: NeoFlow is a streamlined computational workflow that integrates WES and MS/MS proteomics data for neoantigen prioritization to facilitate cancer immunotherapy. It includes four modules: (1) variant annotation and customized database construction; (2) variant peptide identification including MS/MS searching, FDR estimation, PepQuery validation, and optional RT-based validation; (3) human leukocyte antigen (HLA) typing; and (4) MHC-binding prediction and neoantigen prioritization.
URL: https://github.com/bzhanglab/neoflow
Reference: Wen, B., Li, K., Zhang, Y. et al. Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis. Nat Commun 11, 1759 (2020). Manuscript.

NetGestalt

Description: NetGestalt is a data integration framework that allows simultaneous presentation of large-scale experimental and annotation data from various sources in the context of a biological network to facilitate data visualization, analysis, interpretation, and hypothesis generation.
URL: http://www.netgestalt.org
Reference: Zhiao Shi, Jing Wang, Bing Zhang. NetGestalt: integrating multidimensional omics data over biological networks. Nature Methods, 10:597-598, 2013

NetSAM

Description: NetSAM (Network Seriation and Modularization) is an R package that takes an edge-list representation of a network as an input and generates files that can be used as an input for the one-dimensional network visualization tool NetGestalt (http://www.netgestalt.org) or other network analysis. NetSAM uses random walk distance-based hierarchical clustering to identify the hierarchical modules of the network (network modularization) and then uses the optimal leaf ordering (OLO) method to optimize the one-dimensional ordering of the genes in each module by minimizing the sum of the pair-wise random walk distance of adjacent genes in the ordering (network seriation).
URL: http://www.bioconductor.org/packages/release/bioc/html/NetSAM.html
Reference: Zhiao Shi, Jing Wang, Bing Zhang. NetGestalt: integrating multidimensional omics data over biological networks. Nature Methods, 10:597-598, 2013

NetWalker

Description: NetWalker takes a network and a list of nodes from the network as input and uses a random walk technique to calculate steady-state probabilities (final scores) for all nodes in the network. Statistical analysis is implemented to evaluate the significance of the final scores. Specifically, for each node, a global p value is calculated to evaluate the overall significance of the node with regard to the input nodes, while a local p value is calculated to ensure that the significance is not simply due to network topology.
URL: http://netwalker.zhang-lab.org
Reference: Bing Zhang, Zhiao Shi, Dexter T. Duncan, Naresh Prodduturi, Lawrence J Marnett, Daniel C Liebler. Relating protein adduction to gene expression changes: a systems approach. Molecular Biosystems. 2011
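
The steady-state scores described above can be sketched with a random walk with restart computed by power iteration. This is a minimal illustration under stated assumptions, not NetWalker's code: the network is unweighted and undirected, every node has at least one neighbor, the restart probability of 0.5 is arbitrary, and the function name is hypothetical. The global and local p values are not computed here.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return approximate steady-state visiting probabilities for all nodes.

    adj: dict mapping each node to a list of its neighbors (no isolated nodes).
    seeds: set of input nodes where the walker restarts (assumed non-empty).
    """
    nodes = list(adj)
    # Restart distribution: uniform over the input nodes
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, jump back to an input node
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise, move to a uniformly chosen neighbor
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network a - b - c with a single input node "a"
network = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
final_scores = random_walk_with_restart(network, seeds={"a"})
print({n: round(s, 4) for n, s in final_scores.items()})
```

For this toy network the scores converge to 7/12, 1/3, and 1/12 for a, b, and c, so nodes closer to the input node receive higher scores.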

PepQuery

Description: PepQuery is a peptide-centric search engine that allows quick and easy proteomic validation of genomic alterations without customized database construction. Next generation sequencing-based genomic studies continuously identify new genomic alterations that may lead to novel protein sequences, which are attractive candidates for disease biomarkers and therapeutic targets after proteomic validation. The popular approach for proteomic validation requires customized database construction and a full evaluation of all possible spectrum-peptide pairs, which is time-consuming. We implemented PepQuery as both stand-alone and web-based applications. The stand-alone version supports batch analysis and user-provided MS/MS data. The web version provides access to more than half a billion MS/MS spectra from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and other cancer proteomic studies, making MS/MS data directly available and useful to scientists outside the proteomics community.
URL: http://pepquery.org
Reference: Wen, B., et al. (2019). “PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations.” Genome Res 29(3): 485-493. Manuscript.

WebGestalt

Description: WebGestalt is a “WEB-based GEne SeT AnaLysis Toolkit”. It is designed for functional genomic, proteomic, and large-scale genetic studies, which continuously generate large numbers of gene lists (e.g., differentially expressed gene sets and co-expressed gene sets). WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense of gene lists.
URL: http://www.webgestalt.org
References:
- Bing Zhang, Stefan A. Kirov, Jay R. Snoddy. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res, 33(Web Server issue), W741-8, 2005
- Jing Wang, Dexter Duncan, Zhiao Shi, Bing Zhang. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res, 41(Web Server issue), W77-83, 2013