dbPTM has been updated as an integrated resource for post-translational modifications (PTMs), providing not only a comprehensive dataset of experimentally verified, literature-supported PTMs but also an integrative platform for accessing the databases and tools associated with PTM analysis, listed below.
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | GlycoEpitope | Carbohydrate chains occupy important positions in many fields of the life sciences and biotechnology. Their known roles now extend to functions as diverse as cell-to-cell recognition and communication in neuronal tissues and immune systems, pathogen recognition, sperm-egg recognition and fertilization, regulation of hormonal half-lives in the blood, direction of embryonic development and differentiation, and distribution of various cells and proteins throughout the body. Large numbers of polyclonal and monoclonal antibodies have been used as important tools for analyzing the expression and functions of carbohydrate chains. This database assembles useful information on these carbohydrate antigens (glyco-epitopes) and their antibodies as a compact encyclopedia. | http://www.glycoepitope.jp/ | |
2 | GlycomeDB | Carbohydrates are the third major class of biological macromolecules, besides proteins and DNA. They are involved in numerous biological processes, among them protein folding and inter- and intracellular recognition. In contrast to DNA and proteins, neither a comprehensive database for carbohydrate structures nor a universal nomenclature for computational purposes exists. After funding for the Complex Carbohydrate Structure Database (CCSDB, often referred to as CarbBank) ceased in 1997, four initiatives developed independent databases with partially overlapping foci. Each database designed a proprietary encoding scheme for the residues and topology of its structures. As a result, it was virtually impossible to get an overview of all existing structures or to compare the contents of the different databases. We have analysed all of the existing public databases and defined an XML-based sequence format (GlycoCT) capable of storing all structural information of carbohydrate sequences. We have implemented a library of parsers that interpret the different encoding schemes for carbohydrates. With this library we have translated the carbohydrate sequences of all freely available databases (CFG, KEGG, GLYCOSCIENCES.de, BCSDB and CarbBank) to GlycoCT and created a new database (GlycomeDB) containing all structures and annotations. During data integration we found multiple inconsistencies in the existing databases, which were corrected in collaboration with the responsible curators. With GlycomeDB, it is possible to get an overview of all carbohydrate structures in the different databases and to cross-link common structures across them. Scientists can now search for a particular structure in the meta-database and get information about its occurrence in the five carbohydrate structure databases. | http://www.glycome-db.org/ | 21045056 |
3 | UnicarbKB | UniCarbKB is an initiative that aims to promote the creation of an online information storage and search platform for glycomics and glycobiology research. The knowledgebase will offer a freely accessible and information-rich resource supported by querying interfaces, annotation technologies and the adoption of common standards to integrate structural, experimental and functional data. | http://unicarbkb.org | 24234447 |
4 | GLYCOSCIENCES.de | The human genome seems to encode no more than 30,000 to 40,000 proteins. A major challenge is to understand how post-translational events, such as glycosylation, affect the activities and functions of these proteins in health and disease. The importance of protein glycosylation is becoming widely recognized through studies on protein folding, protein localization and trafficking, protein solubility and biological half-life, as well as studies on cell-cell interactions. Ongoing glycomics projects will dramatically accelerate the understanding of the roles of carbohydrates in cell communication and lead to novel therapeutic approaches for treating human disease. MIT's magazine of innovation (January 21, 2003) identified glycomics as one of the top ten technologies that will change the future. | http://www.glycosciences.de/ | 16239495 |
5 | GlycoSuiteDB | GlycoSuiteDB is a curated database of glycoprotein glycan structures and their biological sources; its content is now hosted by UniCarbKB (see entry 3 above). | http://www.unicarbkb.org/ | 12520065 |
6 | CFG | The CFG's Glycan Structures Database offers detailed structural and chemical information for thousands of glycans, including both synthetic glycans and glycans isolated from biological sources. Each glycan structure in the database is linked to relevant entries in CFG and external databases (including primary data and information about binding proteins, where available). Links are also provided to a 3-D modeling feature, references, and other information. | http://www.functionalglycomics.org/glycomics/molecule/jsp/carbohydrate/carbMoleculeHome.jsp | 25753711 |
7 | ProGlycProt | ProGlycProt (Prokaryotic Glycoproteins) is a manually curated, comprehensive repository of experimentally characterized bacterial and archaeal glycoproteins, generated from an exhaustive literature search. It is the focused beginning of an effort to provide concise, relevant, cross-referenced information derived from the rapidly expanding literature on prokaryotic glycoproteins, their glycosylating enzyme(s), glycosylation-linked genes and their genomic context. ProGlycProt is an extensive online collection of experimentally verified glycosites and glycoproteins of prokaryotes. The database (menu ProGlycProtdb) is arranged into two sections, ProCGP and ProUGP. ProCGP is the main section, containing characterized prokaryotic glycoproteins, defined as entries with at least one experimentally known glycosylated residue (glycosite). ProUGP is the supplementary section, presenting uncharacterized prokaryotic glycoproteins, defined as entries with experimentally identified glycosylation but unidentified glycosites. ProGlycProt was developed to aid and advance the emerging scientific interest in the mechanisms, implications and novelties of protein glycosylation in prokaryotes, which include many pathogenic as well as economically important bacterial species. Data are generally updated every three months, while existing entries are updated in real time. | http://www.proglycprot.org/ | 22039152 |
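
The cross-linking step described for GlycomeDB (translate every source entry into one encoding, then group identical structures) can be sketched as a simple index. This is a minimal illustration, not GlycomeDB's code: it assumes the per-database parsers have already produced a normalized sequence string for each entry, and the function name `crosslink`, the entry IDs and the structure placeholders are hypothetical.

```python
from collections import defaultdict


def crosslink(entries):
    """Group source-database entries that share the same canonical structure.

    entries: iterable of (source_db, entry_id, canonical_sequence) tuples.
    canonical_sequence is assumed to be already normalized (e.g. by a
    GlycoCT-style translator), so identical structures compare equal.
    Returns {canonical_sequence: {source_db: [entry_id, ...]}}.
    """
    index = defaultdict(lambda: defaultdict(list))
    for source_db, entry_id, canonical in entries:
        index[canonical][source_db].append(entry_id)
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {seq: dict(by_db) for seq, by_db in index.items()}


# Hypothetical entries; "STRUCT_A"/"STRUCT_B" stand in for normalized sequences.
entries = [
    ("CFG", "cfg-001", "STRUCT_A"),
    ("KEGG", "kegg-001", "STRUCT_A"),
    ("GLYCOSCIENCES.de", "gs-042", "STRUCT_B"),
]
links = crosslink(entries)
print(sorted(links["STRUCT_A"]))  # → ['CFG', 'KEGG']
```

Keying on the normalized string is what makes the cross-link possible: without a shared encoding, the same glycan stored under different proprietary formats would land in separate groups.
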
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | CPLA | CPLA 1.0 (Compendium of Protein Lysine Acetylation) is a database of protein lysine acetylation; it was later extended and adapted into CPLM (Compendium of Protein Lysine Modifications), which covers 12 types of protein lysine modifications (see the CPLM entry in the next table). | http://cpla.biocuckoo.org | 21059677 |
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | RedoxDB | Redox regulation and signaling, which are involved in various cellular processes, have become a research focus in the past decade. Cysteine thiol groups are particularly susceptible to post-translational modification, and their reversible oxidation plays a critical role in redox regulation and signaling. With tremendous improvements in techniques, hundreds of redox proteins and their redox-sensitive cysteines have been reported, and the number is still growing fast. However, until now there has been no database to accommodate the rapid accumulation of information on protein oxidative modification. Here we present RedoxDB, a manually curated database of experimentally validated redox proteins. RedoxDB (version 1.0) consists of two datasets (A and B, for proteins with or without verified modified cysteines, respectively) and includes 2157 redox proteins containing 2203 cysteine residues with oxidative modification. For each modified cysteine, the exact position, modification type and flanking sequence are provided. Additional information, including gene name, organism, sequence, literature references and links to UniProt and PDB, is also supplied. The database supports data search, BLAST and browsing, and the entire dataset is available for bulk download. We expect that RedoxDB will be useful for both experimental studies and computational analyses of protein oxidative modification. The database is freely available at http://biocomputer.bio.cuhk.edu.hk/RedoxDB. | http://biocomputer.bio.cuhk.edu.hk/RedoxDB/ | 22833525 |
2 | PTM-SD | Post-translational modifications (PTMs) are covalent chemical modifications of protein residues. They play important roles in modulating various biological functions. Current PTM databases contain important sequence annotations but do not provide an informative 3D structural resource for these modifications. The Posttranslational modification structural database (PTM-SD) provides access to structurally solved modified residues that are experimentally annotated as PTMs. It combines PTM information and annotations gathered from other databases, e.g. the Protein Data Bank for protein structures, and dbPTM and PTMCuration for fine sequence annotation. PTM-SD accurately detects PTMs in structural data. PTM-SD can be browsed by PDB ID, UniProt accession number, organism and classic PTM annotation. Advanced queries can also be performed on detailed PTM annotations, amino acid type, secondary structure, SCOP class, PDB chain length and number of PTMs per chain. Statistics and analyses can be computed on a selected dataset of PTMs. Each PTM entry has a dedicated page with information on the protein sequence and its local conformation, including secondary structure and Protein Blocks. PTM-SD gives valuable information on observed PTMs in protein 3D structures, which is of great interest for studying sequence-structure-function relationships in the light of PTMs, and could provide insights for comparative modeling and PTM prediction protocols. PTM-SD can be accessed at http://www.dsimb.inserm.fr/dsimb_tools/PTM-SD/. | http://www.dsimb.inserm.fr/dsimb_tools/PTM-SD/ | 24857970 |
3 | PTMfunc | PTMfunc is a repository of functional predictions for protein post-translational modifications (PTMs). Predictions for a protein of interest can be found by searching with a protein name or ID; the resource relies mostly on Ensembl IDs but also includes protein names for most species. | http://ptmfunc.com/ | 22817900 |
4 | PTMCode | PTMCode is a resource of known and predicted functional associations between protein post-translational modifications (PTMs) within and between interacting proteins. It currently contains 316,546 modified sites from 69 different PTM types, which are also propagated through orthologs across 19 eukaryotic species. In total, 1.6 million sites and 17 million functional associations involving more than 100,000 proteins can currently be explored. | http://ptmcode.embl.de/ | 25361965 |
5 | PSP | PhosphoSitePlus® (PSP) is an online systems biology resource providing comprehensive information and tools for the study of protein post-translational modifications (PTMs), including phosphorylation, ubiquitination, acetylation and methylation. Please cite the following reference for this resource: Hornbeck PV, et al. (2015) PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 43:D512-20. | http://www.phosphosite.org/homeAction.do | 25514926 |
6 | ProteomeScout | ProteomeScout is a database of proteins and post-translational modifications with two main data types: (1) proteins, where users can visualize proteins or annotate their own; and (2) experiments, where users can load a new experiment or browse and analyze an existing one. | https://proteomescout.wustl.edu/ | 25414335 |
7 | novPTMenzy | Several attempts have been made to catalog the wealth of available information on post-translational modifications (PTMs) for easy retrieval and analysis. However, the available tools and databases mainly focus on modified sites or enzymes of well-known PTMs. Tools for newly discovered PTMs such as AMPylation and eliminylation, or unusual PTMs such as sulfation, hydroxylation and deamidation, are not yet available. novPTMenzy is a step towards cataloging information about novel and unusual PTMs and using this information for genome mining of the enzymes involved in these PTMs and for understanding the pathways in which they participate. Using the novPTMenzy database, users can search for enzymes involved in five PTMs: AMPylation, eliminylation, sulfation, hydroxylation and deamidation. The search tool also links each protein to its closest experimentally characterized neighbor and its closest structural neighbor. | http://www.nii.ac.in/novptmenzy.html | 25931459 |
8 | HIstome | Post-translational modification (PTM) of histones is a crucial step in the epigenetic regulation of genes. The N-terminal tails of histones are the most accessible regions of these proteins, as they protrude from the nucleosome and have no specific structure. These tails are subjected to various modifications, such as acetylation, methylation, phosphorylation and ubiquitination, by the 'writers'. PTMs are believed to function in a combinatorial pattern referred to as the 'histone code'. The major function of PTMs is either to create sites for the recruitment of specific factors or to modify existing sites so as to abolish previous interactions. This alters the expression states of associated loci in multiple ways, thus enabling gene regulation. PTMs can recruit enzymes that 'write', 'erase' or 'read' modifications, and the repertoire of such modifiers is fairly large (~150 different enzymes in humans). Certain modifications, such as acetylation and phosphorylation, change the overall charge of the basic histone proteins and thereby interfere with the histone-DNA interaction essential for nucleosome stability. In terms of molecular weight, these modifications range from light (acetylation, methylation, phosphorylation) to heavy (ubiquitination, poly-ADP-ribosylation). Here we include eight types of modifications that exist on all histone peptides. PTMs are often found to be cell cycle dependent. The roles of various histone PTMs have been evaluated in many important cellular processes, such as demarcating euchromatin and heterochromatin regions, transcriptional regulation of Hox gene clusters, maintenance of stemness and cell cycle control. The presence or absence of certain PTMs has been shown to be a hallmark of different cancers. | http://www.actrec.gov.in/histome/ptm_main.php | 22140112 |
9 | dbPTM | Protein modification is an extremely important post-translational regulation that adjusts the physical and chemical properties, conformation, stability and activity of a protein, thus altering protein function. Because mass spectrometry-based methods identify site-specific post-translational modifications (PTMs) at high throughput, dbPTM has been updated to integrate experimental PTMs obtained from public resources as well as manually curated MS/MS peptides associated with PTMs from research articles. The new version of dbPTM aims to be an informative resource for investigating the substrate specificity of PTM sites and the functional association of PTMs between substrates and their interacting proteins. To investigate the substrate specificity of modification sites, a newly developed statistical method has been applied to identify significant substrate motifs for each PTM type with sufficient experimental data. According to the data statistics in dbPTM, over 60% of PTM sites are located in functional domains of proteins. Since most PTMs can create binding sites for specific protein-interaction domains that work together in cellular functions, this update integrates protein-protein and domain-domain interactions to determine the functional association of PTM sites located in protein-interacting domains. Additionally, structural topologies of transmembrane proteins are integrated in dbPTM to delineate the structural correlation between reported PTM sites and transmembrane topologies; to facilitate the investigation of PTMs on transmembrane proteins, the PTM substrate sites and the structural topology are represented graphically. Literature information related to PTMs, orthologous conservation and substrate motifs of PTMs is also provided. Lastly, this version features an improved web interface for convenient access to the resource. | http://dbptm.mbc.nctu.edu.tw/index.php | 23193290 |
10 | CrosstalkDB | This database collects mass spectrometry data on multiply modified histones or histone tails. Data can be searched, analyzed and downloaded without logging in. Quantification can be based on either spectral counting or peak intensities; we recommend isoScale and Histone Coder for spectrum validation and quantification. For details of the database, see Schwämmle, V.; Aspalter, C.-M.; Sidoli, S.; Jensen, O. N. Large-scale analysis of co-existing post-translational modifications on histone tails reveals global fine-structure of crosstalk. Mol Cell Proteomics 2014, 13, 1855-1865. We encourage users to register and upload data from their mass spectrometry experiments. Registration is only a formality and requires no private data (not even an email address). After uploading, users can still correct errors or delete selected entries. As a special feature, the statistical part calculates interaction patterns between different histone modifications, which should make it possible to reveal crosstalk between multiple histone modifications. | http://crosstalkdb.bmb.sdu.dk/ | 24741113 |
11 | CPLM | CPLM (Compendium of Protein Lysine Modifications) is an online data resource specifically designed for protein lysine modifications (PLMs). The CPLM database was extended and adapted from our CPLA 1.0 (Compendium of Protein Lysine Acetylation) database (Liu et al., 2011), and the 2.0 release contains 203,972 modification events on 189,919 modified lysines in 45,748 proteins for 12 types of PLMs, including Nε-lysine acetylation (Yang et al., 2007; Shahbazian et al., 2007; Smith et al., 2009), ubiquitination (Gao et al., 2013), methylation (Chen et al., 2006), sumoylation (Ren et al., 2009; Xue et al., 2006), glycation (Priego-Capote et al., 2010), butyrylation (Chen et al., 2007; Cheng et al., 2009; Zhang et al., 2009), crotonylation (Tan et al., 2011), malonylation (Xie et al., 2012), propionylation (Chen et al., 2007; Cheng et al., 2009; Zhang et al., 2009), succinylation (Xie et al., 2012; Zhang et al., 2011), phosphoglycerylation (Moellering and Cravatt, 2013) and prokaryotic pupylation (Liu et al., 2011). | http://cplm.biocuckoo.org/ | 24214993 |
12 | RESID | The RESID Database of Protein Modifications is a comprehensive collection of annotations and structures for protein modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link post-translational modifications. | http://pir.georgetown.edu/resid/ | 12520062 |
13 | SysPTM | SysPTM Version 2.0, updated June 15th, 2013. SysPTM provides a systematic and sophisticated platform for proteomic PTM research, equipped not only with a knowledge base of manually curated multi-type modification data but also with fully developed, in-depth data mining tools. Currently, SysPTM contains data detailing 471,109 experimentally determined PTM sites on 53,235 proteins, covering more than 50 modification types, curated from public resources, including five databases, four web servers and more than three hundred peer-reviewed mass spectrometry papers. Protein annotations, including Pfam domains, KEGG pathways, GO functional classification and ortholog groups, are integrated into the database. Five online tools have been developed and incorporated: PTMBlast, PTMPathway, PTMPhylog, PTMCluster and PTMGO. In SysPTM, the roles of single-type and multi-type modifications can be systematically investigated in a full biological context. SysPTM could be an important contribution to modificomics research. | http://lifecenter.sgst.cn/SysPTM/ | 24705204 |
14 | topPTM | topPTM is a database that integrates experimentally verified post-translational modifications (PTMs) from available databases and research articles, and annotates the PTM sites on transmembrane proteins with structural topology. The biological effects of PTMs on transmembrane proteins include phosphorylation for signal transduction and ion transport, acetylation for structural stability, attachment of fatty acids for membrane anchoring and association, and glycosylation for substrate targeting, cell-cell interactions and virus infection. The experimentally verified PTMs are mainly collected from public resources including dbPTM, Phospho.ELM, PhosphoSite, OGlycBase and UbiProt. For transmembrane proteins, membrane topology information is collected from TMPad, TOPDB, PDBTM and OPM. To fully investigate PTMs on transmembrane proteins, UniProtKB entries annotated as membrane proteins and carrying membrane topology information are regarded as potential transmembrane proteins. To delineate the structural correlation and consensus motifs of the reported PTM sites, topPTM also provides structural analyses, including the membrane accessibility of PTM substrate sites, protein secondary and tertiary structures, protein domains and the cross-species conservation of each entry. | http://topptm.cse.yzu.edu.tw/ | 24302577 |
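
Several resources above report a modified residue together with its flanking sequence (RedoxDB does so for each oxidized cysteine), and substrate motif analyses such as dbPTM's start from fixed-width windows of this kind. A minimal sketch of extracting such a window follows; the half-width of 7 residues, the '-' padding character and the function name `flanking_window` are assumptions for illustration, not conventions taken from any of these databases.

```python
def flanking_window(sequence: str, position: int, half_width: int = 7, pad: str = "-") -> str:
    """Return a window centered on a 1-based site, padded past the termini.

    The result always has length 2 * half_width + 1, with the modified
    residue in the middle.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside a sequence of length {len(sequence)}")
    idx = position - 1  # 1-based residue number -> 0-based string index
    left = sequence[max(0, idx - half_width):idx]
    right = sequence[idx + 1:idx + 1 + half_width]
    # Pad so the site stays centered near the N- or C-terminus.
    left = pad * (half_width - len(left)) + left
    right = right + pad * (half_width - len(right))
    return left + sequence[idx] + right


# Hypothetical sequence with a cysteine at position 5.
seq = "MKTACYIAKQR"
print(flanking_window(seq, 5, half_width=3))  # → KTACYIA
print(flanking_window(seq, 5))  # → ---MKTACYIAKQR-
```

Padding keeps every window the same length, which lets windows from sites near the protein termini be aligned column by column with the rest when counting residue frequencies around a modification.
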
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | PubMeth | Epigenetics, and more specifically DNA methylation, is a fast-evolving research area. For almost every cancer type, new publications appear each month confirming the differential regulation of specific genes by methylation and reporting novel methylation markers. It is therefore extremely useful to have an annotated, reviewed, sorted and summarized overview of all available data. PubMeth is a cancer methylation database of genes reported to be methylated in various cancer types. A query can be based either on genes (to check in which cancer types a gene is reported as methylated) or on cancer types (to list the genes reported as methylated in the cancer (sub)types of interest). The database is freely accessible at http://www.pubmeth.org. PubMeth is based on text mining of Medline/PubMed abstracts, combined with manual reading and annotation of preselected abstracts. The text-mining approach increases speed and selectivity (for instance, many aliases of a gene are searched at once), while the manual screening significantly raises the specificity and quality of the database. The summarized overview of the results is especially useful when several genes or cancer types are searched at the same time. | http://www.pubmeth.org | 17932060 |
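
PubMeth's two query modes (by gene, to list the cancer types in which it is reported as methylated, or by cancer type, to list the reported genes) amount to keeping a two-way index over gene-cancer reports. The class below is a hypothetical sketch of that idea, not PubMeth's implementation; the class name, method names and the gene and cancer-type labels are placeholders rather than real annotations.

```python
from collections import defaultdict


class MethylationIndex:
    """Hypothetical two-way index of (gene, cancer type) methylation reports."""

    def __init__(self):
        self._by_gene = defaultdict(set)    # gene -> cancer types
        self._by_cancer = defaultdict(set)  # cancer type -> genes

    def add_report(self, gene: str, cancer_type: str) -> None:
        """Record one curated report in both directions."""
        self._by_gene[gene].add(cancer_type)
        self._by_cancer[cancer_type].add(gene)

    def cancers_for_gene(self, gene: str) -> set:
        """Cancer types in which the gene is reported as methylated."""
        # .get() avoids creating empty entries for unknown genes.
        return set(self._by_gene.get(gene, set()))

    def genes_for_cancers(self, *cancer_types: str) -> set:
        """Genes reported as methylated in any of the given cancer types."""
        result = set()
        for cancer in cancer_types:
            result |= self._by_cancer.get(cancer, set())
        return result


index = MethylationIndex()
index.add_report("GENE_A", "tumor_type_1")
index.add_report("GENE_A", "tumor_type_2")
index.add_report("GENE_B", "tumor_type_2")
print(sorted(index.cancers_for_gene("GENE_A")))     # → ['tumor_type_1', 'tumor_type_2']
print(sorted(index.genes_for_cancers("tumor_type_2")))  # → ['GENE_A', 'GENE_B']
```

Storing both directions doubles the memory of a flat list of reports but turns each query into a direct lookup, and searching several cancer types at once becomes a set union.
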
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | MYRbase | Myristoylation is a common lipid modification of proteins in eukaryotes and their viruses, as well as some bacteria, and is essential for the function of several important proteins (such as G proteins, SRC and related kinases, ADP-ribosylation factors, HIV Gag and HIV Nef). The saturated 14-carbon fatty acid (myristate) is attached, most often co-translationally, by the enzyme NMT (myristoyl-CoA:protein N-myristoyltransferase) to N-terminal glycines or to glycines that become N-terminal after proteolytic cleavage. Based on the sequence variability of known substrate proteins, physical property profiles and structural models of NMT-substrate interactions (J Mol Biol. 2002 Apr 5;317[4]:523-40), we developed a powerful prediction tool for glycine myristoylation (J Mol Biol. 2002 Apr 5;317[4]:541-57) that is available as a webserver (http://mendel.imp.univie.ac.at/myristate/) and whose sensitivity allows large-scale database runs. To facilitate the selection of targets for experimental verification of our predictions, we evaluate the evolutionary conservation of the predicted myristoylation motif within close homologues (EvOluation). If a sequence is predicted to be myristoylated and the same applies to its homologues (preferably in a series of different organisms), this not only adds another dimension of credibility to our prediction but also suggests that the lipid anchor might play an essential role in that protein's function. This analysis has been applied on a large scale to the proteins in the SwissProt and GenBank databases. The corresponding predicted entries and their homologues were annotated and summarized in tabular form accessible from MYRbase. | http://mendel.imp.ac.at/myristate/myrbase/ | 15003124 |
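
The substrate requirement described for NMT (a glycine that becomes N-terminal after initiator-methionine removal or after proteolytic cleavage) can be expressed as a crude prefilter. This is explicitly not the MYRbase myristoylation predictor, which relies on sequence variability, physical property profiles and structural models of NMT-substrate interactions; the function name, the "MG" start check and the cleavage-site convention are assumptions made for this sketch.

```python
def exposed_n_terminal_glycines(sequence: str, cleavage_sites=()) -> list:
    """List 1-based positions of glycines that would become N-terminal.

    Crude prefilter only (not a myristoylation predictor):
    - position 2 when the protein starts with Met-Gly, assuming the
      initiator methionine is removed;
    - the residue right after each hypothetical proteolytic cut, where each
      cut is given as the 1-based position of the last residue before it.
    """
    candidates = []
    if sequence.startswith("MG"):
        candidates.append(2)
    for site in cleavage_sites:
        new_n_term = site + 1  # 1-based position of the new N-terminal residue
        if 1 <= site < len(sequence) and sequence[new_n_term - 1] == "G":
            candidates.append(new_n_term)
    return sorted(set(candidates))


# Hypothetical sequence and a hypothetical cut after the arginine at position 9.
print(exposed_n_terminal_glycines("MGAKQSGLRGD"))       # → [2]
print(exposed_n_terminal_glycines("MGAKQSGLRGD", [9]))  # → [2, 10]
```

A filter like this only shortlists candidates; deciding whether such a glycine is actually myristoylated requires motif-level scoring and, as described for MYRbase, checking conservation across homologues.
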
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | GlycoFish | A database of zebrafish (Danio rerio) N-linked glycoproteins and their glycosylation sites, identified by mass spectrometry. | http://betenbaugh.jhu.edu/GlycoFish/ | 21591763 |
2 | UniPep | UniPep is a project to provide access to proteomics data from the Serum Biomarker group at the Swiss Federal Institute of Technology (ETH) in Zurich, Switzerland, and the Institute for Systems Biology (ISB) in Seattle, Washington, USA. In the initial phase, we provide a searchable interface to a library of putative glycopeptides, i.e. those containing a consensus N-X-S/T motif. The database maps peptides observed in a series of LC-MS/MS experiments to a library of theoretical glycopeptides. The theoretical peptides are derived from an 'electronic' tryptic digestion of the IPI protein database (version 2.28). The observed peptides are obtained from glycocapture experiments in which whole cell lysates are covalently bound to beads that preferentially bind sugar moieties. The bound proteins are cleaved with trypsin and the beads washed to remove non-glycosylated peptides; the glycopeptides are then eluted by enzymatic deglycosylation. The next phase of the project will bring online a similar repository of proteotypic peptides seen in a variety of LC-MS/MS experiments. These will also be compared to a library of theoretical peptides scored for their proteotypic potential (i.e. the likelihood of detection in such an experiment). | http://www.unipep.org/ | 16901351 |
3 | GlycoProtDB | GlycoProtDB is a glycoprotein database providing information on Asn (N)-glycosylated proteins and their glycosylation site(s), constructed with a bottom-up strategy from actual glycopeptide sequences identified by LC/MS-based glycoproteomic technologies. Current contents are glycoproteins identified from the model organisms C. elegans and mouse (C57BL/6, male). The database can be searched by gene ID, gene name and description (protein name). Each glycoprotein data sheet is based on a single amino acid sequence in the Wormpep database for C. elegans and the NCBI RefSeq database for mouse. The sheet presents the actually detected N-glycosylation site(s), displayed separately for each glycopeptide capture method (e.g. the lectins concanavalin A and wheat germ agglutinin (WGA), or HILIC (hydrophilic interaction chromatography)), as well as potential N-glycosylation sites (N-X-[S/T/C], X ≠ P). Protein sequences that share glycopeptide sequence(s) are linked to each other. | http://jcggdb.jp/rcmg/gpdb/index.action | 22823882 |
4 | GlycoFly |  | http://betenbaugh.jhu.edu/GlycoFly/ | 21480662 |
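
Unipep builds its theoretical glycopeptides from an 'electronic' tryptic digestion filtered for the NxS/T motif, and GlycoProtDB flags potential sites with the NX[STC], X ≠ P pattern. A minimal Python sketch of that idea follows. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and uses the NX[STC] pattern quoted for GlycoProtDB; the function names and the toy sequence are hypothetical, and this is not the pipeline either resource actually uses.

```python
import re

# Sequon as listed for GlycoProtDB: N, then any residue except P, then S, T or C.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][STC])")


def tryptic_digest(seq: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (simplified rule, no missed cleavages)."""
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # C-terminal peptide that does not end in K/R
        peptides.append(seq[start:])
    return peptides


def sequon_sites(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-[STC] (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def putative_glycopeptides(seq: str) -> list[tuple[str, list[int]]]:
    """Tryptic peptides containing at least one sequon, with sequon positions within each peptide."""
    result = []
    for pep in tryptic_digest(seq):
        sites = sequon_sites(pep)
        if sites:
            result.append((pep, sites))
    return result
```

For the toy sequence `MKNGSAKPLRNPTKNVTR`, the digest yields `MK`, `NGSAKPLR`, `NPTK` and `NVTR`; `NPTK` is rejected because proline follows the Asn. Because the scan runs per peptide, a sequon split by a cleavage site would be missed; running `sequon_sites` on the whole protein avoids that.
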
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | dbOGAP | Protein O-GlcNAcylation is an O-linked glycosylation involving attachment of beta-N-acetylglucosamine (GlcNAc) to Ser/Thr residues, catalyzed by O-GlcNAc transferase (OGT) without further extension of GlcNAc; its removal is catalyzed by O-GlcNAcase (OGA). Unlike N-linked and mucin-type O-linked glycosylation, O-GlcNAcylation occurs primarily in nucleocytoplasmic proteins, and is often dynamic and reciprocal to phosphorylation at the same or adjacent Ser/Thr residues (often mutually inhibitory). Compared to phosphorylation, the amount of research on O-GlcNAcylation has been disproportionately small. Growing evidence now suggests that O-GlcNAcylation is common and has broad roles in physiology and disease, especially through its interplay with phosphorylation, e.g., regulation of insulin signaling and roles in diabetes and neurodegenerative diseases. To facilitate research on O-GlcNAcylated proteins, we developed a database of O-GlcNAcylated proteins and sites (dbOGAP) based on experimental data curated from the literature as well as from collaborating labs. The database also provides additional sequence annotations and functional information integrated from databases such as UniProt, and from pathway and disease databases. The current version is dbOGAP v1.0, described in Jinlian Wang, Manabu Torii, Hongfang Liu, Gerald W. Hart and Zhang-Zhi Hu, "dbOGAP - An Integrated Bioinformatics Resource for Protein O-GlcNAcylation", BMC Bioinformatics 2011, 12:91. | http://cbsb.lombardi.georgetown.edu/hulab/OGAP.html | 21466708 |
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | O-GlycBase | The Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark was formed in 1993 and conducts basic research in the field of bioinformatics and systems biology. The group of more than 90 scientists, working in ten specialist research groups, has a highly multidisciplinary profile (molecular biologists, biochemists, medical doctors, physicists and computer scientists) with a 2:1 ratio of biological to non-biological backgrounds. CBS is one of the large bioinformatics groups in academia in Europe. Bioinformatics is the term used to refer to the combination of methods in biology, computation, and information management that are necessary to advance research relating to all aspects of living systems, from individual molecules, cells, and organs to entire organisms. Today, research in molecular biology, biotechnology and pharmacology depends on information technology all the way from experiment to the publication of the results. Comprehensive public databases of DNA and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling have been established to optimise scientific exploitation of the explosion of data within biology. Unlike many other groups in the field of biomolecular informatics, CBS directs its research primarily towards topics related to the elucidation of the functional aspects of complex biological mechanisms. Among contemporary bioinformatics concerns are reliable computational interpretation of a wide range of experimental data and the detailed understanding of the molecular apparatus behind cellular mechanisms of sequence information. By exploiting available experimental data and evidence in the design of algorithms, sequence correlations and other features of biological significance can be inferred. In addition to its computational research, the center also has experimental efforts in gene expression analysis using DNA chips and in data generation related to the physical and structural properties of DNA. In the last decade, CBS has produced a large number of computational methods, which are offered to others via WWW servers. Based on bioinformatics efforts started in the late 1980s, the activity was formally established as a center in 1993 by a grant from the Danish National Research Foundation. | http://www.cbs.dtu.dk/databases/OGLYCBASE/ | 9847232 |
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | PhosPhAt | Phosphorylation site database: The Arabidopsis Protein Phosphorylation Site Database (PhosPhAt 3.0) contains information on Arabidopsis phosphorylation sites identified by mass spectrometry in large-scale experiments by different research groups. Specific information about the peptide properties, their annotated biological function, and the experimental and analytical context is given. For a majority of peptides, the actual annotated mass spectrum is displayed interactively. Phosphorylation site predictor: The PhosPhAt service has a built-in plant-specific phosphorylation site predictor trained on the experimental dataset for serine, threonine and tyrosine phosphorylation (pSer, pThr, pTyr). Protein sequences or Arabidopsis AGI gene identifiers can be submitted to the predictor. | http://phosphat.uni-hohenheim.de/ | 23172287 |
2 | SubPhos | Protein phosphorylation is the most common post-translational modification (PTM) regulating major cellular processes such as cell division, growth, and differentiation through highly dynamic and complex signaling pathways. However, the dynamic interplay of protein phosphorylation does not occur randomly within the cell but is finely orchestrated by specific kinases and phosphatases that are unevenly distributed across subcellular compartments. This spatial separation not only regulates protein phosphorylation but can also control the activity of other enzymes and the transfer of other post-translational modifications. | http://bioinfo.ncu.edu.cn/SubPhos.aspx | 25236462 |
3 | PhosSNP | As we enter the age of "Personal Genomics" or "Personalized Medicine", it is expected that knowledge of human genetic polymorphisms and variations could provide a foundation for understanding differences in susceptibility to diseases and for designing individualized therapeutic treatments (Cargill, et al., 1999; Collins, et al., 1998). Recent progress in the International HapMap Project and similar projects (International HapMap Consortium, 2005; Frazer, et al., 2007) has provided a wealth of information detailing tens of millions of human genetic variations between individuals, including copy number variations (CNVs) (Redon, et al., 2006) and single nucleotide polymorphisms (SNPs) (Hinds, et al., 2005). It was estimated that ~90% of human genetic variations are due to SNPs (Collins, et al., 1998). In particular, by changing amino acids in proteins, non-synonymous SNPs (nsSNPs) in gene coding regions could account for nearly half of the known genetic variations linked to human inherited diseases (Stenson, et al., 2003). In this regard, numerous efforts have been made to elucidate how nsSNPs generate deleterious effects on the stability and function of proteins. An nsSNP might change the physicochemical properties of a wild-type amino acid to affect protein stability and dynamics, or disrupt an interacting interface, preventing the protein from forming a complex with its partners (Kono, et al., 2008; Stitziel, et al., 2004; Uzun, et al., 2007; Yue and Moult, 2006). Alternatively, nsSNPs could also influence post-translational modifications (PTMs) of proteins (e.g., phosphorylation) by changing the residue types of the target sites or key flanking amino acids (Erxleben, et al., 2006; Gentile, et al., 2008; Ryu, et al., 2009; Savas and Ozcelik, 2005; Yang, et al., 2008). The Armstrong group first coined the term phosphorylopathy to describe human genetic variation that results in aberrant regulation of protein phosphorylation (Erxleben, et al., 2006; Gentile, et al., 2008). | http://phossnp.biocuckoo.org/ | 19995808 |
4 | PhosphoPep | PhosphoPep version 2.0 is a project to support systems biology signaling research by providing interactive interrogation of MS-derived phosphorylation data from 4 different organisms. Currently there are data from the fly (Drosophila melanogaster), human (Homo sapiens), worm (Caenorhabditis elegans), and yeast (Saccharomyces cerevisiae). The experimental data were collected and analyzed by the Aebersold group at the Swiss Federal Institute of Technology (ETH) in collaboration with the Functional Genomics Center (FGCZ) in Zurich, Switzerland, and the Institute for Systems Biology (ISB) in Seattle, Washington, USA. PhosphoPep offers different software tools that allow users to browse through single proteins and through pathways and, importantly, to integrate the data with information from external sources, such as protein-protein interaction data. Finally, all data can be readily exported, e.g., for a targeted proteomics approach, and the generated data can again be validated using PhosphoPep, enabling systems biology signaling research. | http://www.phosphopep.org/ | 21082442 |
5 | PhosphoPOINT | MOTIVATION: To fully understand how a protein kinase regulates biological processes, it is imperative to first identify its substrate(s) and interacting protein(s). However, of the 518 known human serine/threonine/tyrosine kinases, 35% have known substrates, while 14% have identified substrate recognition motifs. In contrast, 85% of the kinases have protein-protein interaction (PPI) datasets, raising the possibility that potential kinase-substrate pairs might be revealed from these PPIs. RESULTS: PhosphoPOINT, a comprehensive human kinase interactome and phospho-protein database, is a collection of 4195 phospho-proteins with a total of 15,738 phosphorylation sites. PhosphoPOINT annotates the interactions among kinases, with their downstream substrates and with interacting (phospho)-proteins to modulate the kinase-substrate pairs. PhosphoPOINT integrates various gene expression profiles and Gene Ontology cellular component information to evaluate each kinase and its interacting (phospho)-proteins/substrates. Integrating coding SNPs (cSNPs) that cause amino acid changes with the phosphoprotein dataset reveals 64 phosphorylation sites that result in disease phenotypes when changed; the linked phenotypes include schizophrenia and hypertension. PhosphoPOINT also provides a search function for all phospho-peptides using about 300 known kinase/phosphatase substrate/binding motifs. Altogether, PhosphoPOINT provides robust annotation for kinases, their downstream substrates and their interacting (phospho)-proteins, which should accelerate the functional characterization of kinome-mediated signaling. AVAILABILITY: PhosphoPOINT can be freely accessed at http://kinase.bioinformatics.tw/. | http://kinase.bioinformatics.tw/ | 18689816 |
6 | PhosphoNET | PhosphoNET is an open-access, online resource developed by Kinexus Bioinformatics Corporation to foster the study of cell signalling systems to advance biomedical research in academia and industry. PhosphoNET is the world’s largest repository of known and predicted information on human phosphorylation sites, their evolutionary conservation and the identities of protein kinases that may target these sites. PhosphoNET presently holds data on over 950,000 known and putative phosphorylation sites (P-sites) in over 23,000 human proteins that have been collected from the scientific literature and other reputable websites. Over 19% of these phospho-sites have been experimentally validated. The rest have been predicted with a novel P-Site Predictor algorithm developed at Kinexus with academic partners at the University of British Columbia and Simon Fraser University. With the PhosphoNET Evolution module, this website also provides information about cognate proteins in over 20 other species that may share these human phospho-sites. This helps to define the most functionally important phospho-sites, as these are expected to be highly conserved in nature. With the Kinase Predictor module, listings are provided for the top 50 human protein kinases that are likely to phosphorylate each of these phospho-sites using another proprietary kinase substrate prediction algorithm developed at Kinexus. Our kinase substrate predictions are based on deduced consensus phosphorylation site amino acid frequency scoring matrices that we have determined for each of ~500 different human protein kinases. The specificity matrices are generated directly from the primary amino acid sequences of the catalytic domains of these kinases, and when available, have proven to correlate strongly with substrate prediction matrices based on alignment of known substrates of these kinases. The higher the score, the better the prospect that a kinase will phosphorylate a given site. Over 30 million kinase-substrate phospho-site pairs are quantified in PhosphoNET. Kinexus Bioinformatics Corporation has the capability to test most of these putative interactions in vitro for our clients. | http://www.phosphonet.ca/ | 22165948 |
7 | PhosphoGRID | PhosphoGRID is an online database of experimentally verified in vivo protein phosphorylation sites in the model eukaryotic organism Saccharomyces cerevisiae. The database includes results from both high-throughput (HTP) MS proteomics studies and phosphosites identified in low-throughput (LTP) studies of individual proteins or protein complexes. The identities of specific protein kinases and phosphatases shown to regulate the appearance of phosphorylations are recorded, where available, as are the function(s) of the phosphorylation and the conditions under which the modification was demonstrated to occur. The PhosphoGRID curators would appreciate comments on omissions and errors, as well as notifications of newly published or submitted data. | http://www.phosphogrid.org/ | 23674503 |
8 | Phospho3D | Phospho3D is a database of three-dimensional structures of phosphorylation sites which stores information retrieved from the Phospho.ELM database and which is enriched with structural information and annotations at the residue level. The database also collects the results of a large-scale structural comparison procedure providing clues for the identification of new putative phosphorylation sites. Phospho3D 2.0 also includes P3Dscan, which allows users to compare their own protein structure against the set of 3D phosphorylation sites collected in the database. | http://www.phospho3d.org/ | 20965970 |
9 | Phospho.ELM | Phospho.ELM is a database of experimentally verified phosphorylation sites in eukaryotic proteins. The current release (Version 9.0, September 2010) of Phospho.ELM contains 8,718 substrate proteins from different species covering more than 42,500 instances. Instances are fully linked to literature references. | http://phospho.elm.eu.org/ | 21062810 |
10 | PHOSIDA | This database accompanies 'PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites', Florian Gnad, Shubin Ren, Juergen Cox, Jesper V Olsen, Boris Macek, Mario Oroshi, Matthias Mann (2007); Genome Biology. An update of the database is described in 'PHOSIDA 2011: the posttranslational modification database', Florian Gnad, Jeremy Gunawardena, Matthias Mann (2011); Nucleic Acids Research. Phosida allows the retrieval of phosphorylation, acetylation, and N-glycosylation data of any protein of interest. It lists posttranslational modification sites associated with particular projects and proteomes or, alternatively, displays posttranslational modifications found for any protein or protein group of interest. In addition, structural and evolutionary information on each modified protein and posttranslational modification site is integrated. Importantly, Phosida links extensive peptide information to the sites, such as several peptides implicating the same site and temporal profiles of each site in response to stimulus (e.g., EGF stimulation). | http://www.phosida.com/ | 21081558 |
11 | PepCyber:P~PEP | PepArray pro is a proteomics tool that generates a PepArray Layout file containing information about peptides, peptide IDs, and the array location of the peptides to be synthesized on chip. The Layout file is required for the synthesis of an addressable peptide microarray. Peptide microarrays (PepArrays) provide a powerful proteomics technology platform for a broad range of applications in studying protein-protein, protein-nucleic acid, and many other intermolecular interactions as signatures of cellular signaling pathways and regulatory network activities. Such studies can be applied not only to basic research but also to clinical biomedical tool development, such as biomarker detection, diagnostic reagent discovery, and drug development. PepArray pro supports the generation of peptide sequences containing standard or non-standard amino acids by reading user-input sequences, importing from web resources, or modifying existing peptides. Currently, phosphopeptides from the corresponding databases Phospho.ELM and PepCyber P~PEP are also supported. The designed Layout file can incorporate reference and/or control peptides for quality of synthesis and assay, generate peptide modifications, replicate the generated peptides, and provide design statistics. The designed PepArrays can be stored and archived. PepArray pro also makes a set of catalog PepArray Layout files available. | http://www.pepcyber.org/PPEP/copyright.php | 18160410 |
12 | P3DB | P(3)DB (http://www.p3db.org/) provides a resource of protein phosphorylation data from multiple plants. The database was initially constructed with a dataset from oilseed rape, including 14,670 nonredundant phosphorylation sites from 6382 substrate proteins, representing the largest collection of plant phosphorylation data to date. Additional protein phosphorylation data are being deposited into this database from large-scale studies of Arabidopsis thaliana and soybean. Phosphorylation data from current literature are also being integrated into the P(3)DB. With a web-based user interface, the database is browsable, downloadable and searchable by protein accession number, description and sequence. A BLAST utility was integrated and a phosphopeptide BLAST browser was implemented to allow users to query the database for phosphopeptides similar to protein sequences of their interest. With the large-scale phosphorylation data and associated web-based tools, P(3)DB will be a valuable resource for both plant and nonplant biologists in the field of protein phosphorylation. | http://www.p3db.org/ | 18931372 |
13 | MAPRes | The new version of MAPRes extends the old version to mine association rules on the basis of the biophysical and biochemical properties of amino acids. Several studies have analysed the primary sequence of amino acids, but analyses based on physicochemical properties of amino acids, such as polarity and charge, have not yet been considered. The new version of MAPRes also allows users to analyze non-modified sites. | http://www.imsb.edu.pk/Database.htm | 25258092 |
14 | LymPHOS | Current proteomic technology is capable of producing huge amounts of analytical information, which is often difficult to manage in a comprehensive form. Curation, further annotation and public communication of proteomic data require the development of standard data formats and efficient, multimedia database structures. We have implemented a workflow for the annotation of a phosphopeptide database (LymPHOS) that includes tools for MS data filtering and phosphosite assignment, mass spectrum visualization, experimental description and accurate phosphorylation site assignment. Experimental annotations were fitted to the current Minimum Information About a Proteomics Experiment (MIAPE) guidelines. A new guideline for phosphoprotein sample preparation is also proposed. Currently, the database describes 342 phosphorylation sites mapping to more than 200 gene sequences, and it can be accessed online (http://www.lymphos.org). | http://www.lymphos.org/ | 19639593 |
15 | HPRD | Commercial entities may not use this site without prior licensing authorization. The Human Protein Reference Database represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data. HPRD has been created using an object-oriented database in Zope, an open source web application server, which provides versatility in query functions and allows data to be displayed dynamically. | http://www.hprd.org/ | 18988627 |
16 | dbPSP | As one of the most important and ubiquitous post-translational modifications (PTMs), protein phosphorylation regulates a broad spectrum of biological processes not only in humans but also in plants. The identification of site-specific phosphorylated substrates is fundamental for understanding the regulatory molecular mechanisms of protein phosphorylation in controlling plant growth and development. Besides experimental approaches, computational prediction of potential candidates has also attracted great attention for its convenience and speed. In this review, we present a comprehensive but brief summary of computational resources for protein phosphorylation in plants, including databases and predictors. Computational studies without web links to databases or tools are not included in this compendium, since experimentalists cannot easily use them directly. | http://dbpsp.biocuckoo.org/ | 25841437 |
17 | dbPPT | As one of the most important and ubiquitous post-translational modifications (PTMs), protein phosphorylation regulates a broad spectrum of biological processes not only in humans but also in plants. The identification of site-specific phosphorylated substrates is fundamental for understanding the regulatory molecular mechanisms of protein phosphorylation in controlling plant growth and development. Besides experimental approaches, computational prediction of potential candidates has also attracted great attention for its convenience and speed. In this review, we present a comprehensive but brief summary of computational resources for protein phosphorylation in plants, including databases and predictors. Computational studies without web links to databases or tools are not included in this compendium, since experimentalists cannot easily use them directly. | http://dbppt.biocuckoo.org/ | 25534750 |
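
PhosphoNET scores candidate sites against per-kinase amino acid frequency matrices, where a higher score means a kinase is more likely to phosphorylate the site, and notes that its matrices correlate with matrices built from aligned known substrates. Kinexus' own algorithm is proprietary and derived from catalytic-domain sequences, so the sketch below swaps in a generic position-specific scoring matrix (PSSM) built from aligned substrate windows, using log-odds against a uniform background with pseudocounts. The window length, pseudocount, function names and toy substrate windows are assumptions for illustration only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(windows: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds PSSM from equal-length aligned substrate windows (uniform background)."""
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        # Start every residue at the pseudocount so unseen residues get a finite score.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for w in windows:
            if w[pos] in counts:  # non-standard letters are ignored
                counts[w[pos]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pssm


def score_window(pssm: list[dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds; higher means more substrate-like.
    Residues not in AMINO_ACIDS (e.g. 'X') contribute 0."""
    if len(window) != len(pssm):
        raise ValueError("window length must match the PSSM length")
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
```

With toy basophilic windows such as `["RRASL", "RRGSV", "RKASI"]`, a candidate `RRASL` scores higher than `DEASL`, because the matrix rewards arginine at the first two positions. A log-odds sum is used here because it is the standard way to combine independent per-position preferences.
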
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | PRENbase | PRENbase is an annotated database of known and predicted prenylated proteins. Homologous proteins are merged into clusters. This search interface is designed to allow sophisticated queries for the experimental status of the modification (known/predicted...), exclusive or shared types of modifying enzymes (FT, GGT1, GGT2) as well as for evolutionary conservation by constraining the taxonomic distribution within these clusters or for single sequences. | http://mendel.imp.ac.at/PrePS/PRENbase/ | 17411337 |
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | dbGSH | dbGSH is a database that integrates the experimentally verified cysteine S-glutathionylation (GSH) sites from multiple species. S-glutathionylation (GSH), the reversible protein post-translational modification (PTM) that generates a mixed disulfide bond between glutathione and a cysteine residue, critically regulates protein activity, stability, and redox regulation. Because of its importance in regulating oxidative/nitrosative stress and balance in cellular responses, a number of methods have rapidly evolved to expand the dataset of experimentally determined glutathionylation sites. However, no database was dedicated to integrating all experimentally verified S-glutathionylation sites with their characteristics, structure or functional information. The dbGSH database was therefore created to integrate all available datasets and to provide their structural analysis. As of December 10, 2013, dbGSH had manually accumulated more than 2200 experimentally verified S-glutathionylated peptides from research articles using a text-mining approach. To resolve the heterogeneity among the data collected from different sources, the reported S-glutathionylated peptides are mapped to UniProtKB protein entries by sequence identity. To delineate the structural correlation and consensus motif of these GSH sites, dbGSH also provides structural and functional analyses, including substrate-site motifs, solvent accessibility, protein secondary and tertiary structures, protein domains, and gene ontology. | http://csb.cse.yzu.edu.tw/dbGSH/ | 24790154 |
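
dbGSH maps reported modified peptides onto UniProtKB protein entries by sequence identity. A minimal sketch of the exact-match case is shown below. It assumes a hypothetical input convention in which the modified cysteine is written in lowercase, and the accessions and sequences are toy placeholders rather than real UniProtKB entries. Real curation would also have to handle isoforms, near-identical matches and ambiguous hits.

```python
def map_modified_peptide(peptide: str, proteins: dict[str, str]) -> list[tuple[str, int]]:
    """Return (accession, 1-based residue position) for every modified residue of `peptide`
    at every exact match of the peptide in each protein sequence.

    Hypothetical convention: modified residues are lowercase in `peptide`,
    e.g. "cAGL" marks the first cysteine as S-glutathionylated."""
    plain = peptide.upper()
    # Offsets of modified residues within the peptide.
    offsets = [i for i, aa in enumerate(peptide) if aa.islower()]
    hits = []
    for acc, seq in proteins.items():
        start = seq.find(plain)
        while start != -1:  # a peptide can occur more than once in a protein
            hits.extend((acc, start + off + 1) for off in offsets)
            start = seq.find(plain, start + 1)
    return hits
```

For example, with the toy entries `{"TOY1": "MSDKCAGLRCAGLK", "TOY2": "MAAAAK"}`, the peptide `cAGL` maps to positions 5 and 10 of `TOY1`, which shows why a peptide-to-protein mapping can yield more than one candidate site.
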
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | dbSNO | Protein S-nitrosylation (SNO) is a reversible post-translational modification (PTM) that involves the covalent attachment of nitric oxide (NO) to the thiol group of cysteine (Cys) residues. Given the increasing number of proteins reported to be regulated by this modification, S-nitrosylation is considered to act, in a manner analogous to phosphorylation, as a pleiotropic regulator that elicits dual effects to regulate diverse pathophysiological processes by altering protein function, stability, and conformation in various cancers and human disorders. Because of its importance in regulating protein functions and cell signaling, dbSNO (http://dbSNO.mbc.nctu.edu.tw) has been extended as an informative resource for exploring the structural environment of SNO substrate sites and the regulatory networks of S-nitrosylated proteins. Increasing interest in the structural environment of PTM substrate sites motivated us to map all manually curated SNO peptides (4165 SNO sites within 2277 proteins) to PDB protein entries by sequence identity, which provides information on spatial amino acid composition, solvent-accessible surface area, spatially neighboring amino acids, and side-chain orientation for 298 substrate cysteine residues. Additionally, annotations of protein molecular functions, biological processes, functional domains and human diseases are integrated to explore the functional and disease associations of the S-nitrosoproteome. In this update, users can search a group of proteins/genes of interest, and the system reconstructs the S-nitrosylation regulatory network based on metabolic pathway and protein-protein interaction information. Most importantly, an endogenous yet pathophysiological S-nitrosoproteomic dataset from colorectal cancer patients was used to demonstrate that dbSNO could discover potential SNO proteins involved in the regulation of NO signaling in cancer pathways. | http://140.138.144.145/~dbSNO/index.php | 25399423 |
# | Database Name | Description | URL | Reference |
---|---|---|---|---|
1 | UbiProt | The UbiProt Database project aims to summarize a significant volume of data concerning various protein substrates of ubiquitylation. Each database entry describing a particular ubiquitylated protein comprises information about protein properties and sources; ubiquitylation features, including details of the respective conjugation cascade; literature references; and links to related databases. All data included were experimentally obtained by research groups from around the world and can be verified using the respective references. | http://ubiprot.org.ru/ | 17442109 |
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | GlycoMine | Glycosylation is a ubiquitous type of protein post-translational modification (PTM) in eukaryotic cells, which plays vital roles in various biological processes such as cellular communication, ligand recognition, and subcellular recognition. It is estimated that >50% of the entire human proteome is glycosylated. We present a novel bioinformatics tool called GlycoMine, which is a comprehensive tool for the systematic in silico identification of C-, N- and O-linked glycosylation sites in the human proteome. GlycoMine was developed using the random forest algorithm and evaluated based on a well-prepared up-to-date benchmark dataset that encompasses all three types of glycosylation sites, which was curated from multiple public resources. | http://www.structbioinfor.org/Lab/GlycoMine/ | 25568279 |
2 | GlycoPP |  | http://www.imtech.res.in/raghava/glycopp/ | 22808107 |
3 | GPP | Ab Initio Calculations of the Electronic Excited States of Molecules, Electronic Structure and Circular Dichroism of Proteins, Protein Folding and Evolution, Bioinformatics, Computer-Aided Drug Design, Drug Resistance. | http://comp.chem.nottingham.ac.uk/glyco/ | 19038042 |
4 | GS-align | Glycans play critical roles in many biological processes, and their structural diversity is key for specific protein-glycan recognition. GS-align is a novel computational method for glycan structure alignment and similarity measurement. GS-align generates possible alignments between two glycan structures through iterative maximum clique search and fragment superposition, and the optimal alignment is determined by the maximum structural similarity score, GS-score, whose significance is size-independent. | http://www.glycanstructure.org/gsalign | 25857669 |
5 | GlycoEP | Glycosylation is one of the most abundant and important post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in various cellular functions such as protein folding, cell-cell interactions, cell recognition and host-pathogen interactions. Many eukaryotic glycoproteins also have therapeutic and potential technological applications. Characterization and analysis of glycosites (glycosylated residues) in these proteins is therefore of great interest to biologists. A number of in silico tools have been developed over the years to meet this need, but better prediction tools are still needed. In this study we developed a new web server, GlycoEP, for more accurate prediction of N-linked, O-linked and C-linked glycosites in eukaryotic glycoproteins using two larger datasets, namely standard and advanced datasets. In the standard datasets, no two glycosylated proteins are more than 40% similar; the advanced datasets are highly non-redundant, with no two glycosite patterns (as defined in the methods) sharing more than 60% similarity. Based on our results with several algorithms developed using different machine-learning techniques, we found the Support Vector Machine (SVM) to be the optimal technique for building glycosite prediction models. Using our more stringent and non-redundant advanced datasets, the SVM-based models developed in this study achieved prediction accuracies of 84.26%, 86.87% and 91.43%, with corresponding MCCs of 0.54, 0.20 and 0.78, for N-, O- and C-linked glycosites, respectively. The best-performing models trained on the advanced datasets were then implemented as a user-friendly web server, GlycoEP (http://www.imtech.res.in/raghava/glycoep/). The server also provides prediction models developed on the standard datasets and allows users to scan sequons in input protein sequences. | http://www.imtech.res.in/raghava/glycoep | 23840574 |
6 | EnsembleGly | BACKGROUND: Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences. RESULTS: We explore machine learning methods for training classifiers to predict the amino acid residues that are likely to be glycosylated using information derived from the target amino acid residue and its sequence neighbors. We compare the performance of Support Vector Machine classifiers and ensembles of Support Vector Machine classifiers trained on a dataset of experimentally determined N-linked, O-linked, and C-linked glycosylation sites extracted from O-GlycBase version 6.00, a database of 242 proteins from several different species. The results of our experiments show that the ensembles of Support Vector Machine classifiers outperform single Support Vector Machine classifiers on the problem of predicting glycosylation sites in terms of a range of standard measures for comparing the performance of classifiers. The resulting methods have been implemented in EnsembleGly, a web server for glycosylation site prediction. CONCLUSION: Ensembles of Support Vector Machine classifiers offer an accurate and reliable approach to automated identification of putative glycosylation sites in glycoprotein sequences. | http://turing.cs.iastate.edu/EnsembleGly/ | 17996106 |
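
GlycoEP lets users scan input sequences for sequons. As a rough illustration of what such a scan does, the sketch below finds the canonical N-linked glycosylation sequon Asn-X-Ser/Thr, where X is any residue except proline. The function name and toy sequence are hypothetical and this is not GlycoEP's code; a sequon match only marks a candidate site, which is why trained predictors such as those listed above are used to decide whether the site is actually glycosylated.

```python
import re

# Canonical N-glycosylation sequon: N, then any residue except P, then S or T.
# The zero-width lookahead lets overlapping sequons (e.g. in "NNST") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the Asn, matched tripeptide) for each sequon."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence: NVT and NGS are sequons; NPS is not (proline at X).
    print(find_sequons("MKNVTLLNPSAANGSW"))  # [(3, 'NVT'), (13, 'NGS')]
```

A regular expression keeps the rule readable; the lookahead is what prevents a match at one asparagine from consuming the residues of an adjacent, overlapping sequon.
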
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | FragAnchor | A glycosylphosphatidylinositol (GPI) anchor is a common but complex C-terminal post-translational modification of extracellular proteins in eukaryotes. Here we investigate the problem of correctly annotating GPI-anchored proteins for the growing number of sequences in public databases. We developed a computational system, called FragAnchor, based on the tandem use of a neural network (NN) and a hidden Markov model (HMM). First, the NN selects potential GPI-anchored proteins in a dataset; then the HMM parses these potential GPI signals and refines the prediction by qualitative scoring. FragAnchor correctly predicted 91% of all the GPI-anchored proteins annotated in the Swiss-Prot database. In a large-scale analysis of 29 eukaryotic proteomes, FragAnchor predicted that the percentage of highly probable GPI-anchored proteins is between 0.21% and 2.01%. The distinctive feature of FragAnchor, compared with other systems, is that it targets only the C-terminus of a protein, making it less sensitive to the background noise found in databases and to possibly incomplete protein sequences. Moreover, FragAnchor can be used to predict GPI-anchored proteins in all eukaryotes. Finally, by using qualitative scoring, the predictions combine both sensitivity and information content. The predictor is publicly available online. | http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html | 17893077 |
2 | PredGPI | PredGPI is a prediction system for GPI-anchored proteins. It is based on a support vector machine (SVM) for discriminating the anchoring signal and on a hidden Markov model (HMM) for predicting the most probable omega (ω) site. | http://gpcr.biocomp.unibo.it/predgpi/index.htm | 18811934 |
3 | GPI-SOM | MOTIVATION: Anchoring of proteins to the extracytosolic leaflet of membranes via C-terminal attachment of glycosylphosphatidylinositol (GPI) is ubiquitous and essential in eukaryotes. The signal for GPI-anchoring is confined to the C-terminus of the target protein. In order to identify anchoring signals in silico, we have trained neural networks on known GPI-anchored proteins, systematically optimizing input parameters. RESULTS: A Kohonen self-organizing map, GPI-SOM, was developed that predicts GPI-anchored proteins with high accuracy. In combination with SignalP, GPI-SOM was used in genome-wide surveys for GPI-anchored proteins in diverse eukaryotes. Apart from specialized parasites, a general trend towards higher percentages of GPI-anchored proteins in larger proteomes was observed. AVAILABILITY: GPI-SOM is accessible on-line at http://gpi.unibe.ch. The source code (written in C) is available on the same website. SUPPLEMENTARY INFORMATION: Positive training set, performance test sets and lists of predicted GPI-anchored proteins from different eukaryotes in fasta format. | http://gpi.unibe.ch/ | 15691858 |
4 | big-Pi plant | Post-translational glycosylphosphatidylinositol (GPI) lipid anchoring is common not only for animal and fungal but also for plant proteins. The attachment of the GPI moiety to the carboxyl terminus after proteolytic cleavage of a C-terminal propeptide is performed by the transamidase complex. Its four known subunits also have obvious full-length orthologs in the Arabidopsis and rice (Oryza sativa) genomes; thus, the mechanism of substrate protein processing appears similar for all eukaryotes. A learning set of plant proteins (substrates for the transamidase complex) has been collected both from the literature and from plant sequence databases. We find that the plant GPI lipid anchor motif differs in minor aspects from the animal signal (e.g. the plant hydrophobic tail region can contain a higher fraction of aromatic residues). We have developed the "big-Pi plant" program for predicting the compatibility of query protein C-termini with the plant GPI lipid anchor motif requirements. Validation tests show that the sensitivity for transamidase targets is approximately 94% and the false-positive prediction rate is about 0.1%. Thus, the big-Pi predictor can be applied as an unsupervised genome annotation and target selection tool. The program is also suited for the design of modified protein constructs to test their GPI lipid anchoring capacity. The big-Pi plant predictor web server and lists of potential plant precursor proteins in Swiss-Prot, SPTrEMBL, Arabidopsis and rice proteomes are available at http://mendel.imp.univie.ac.at/gpi/plants/gpi_plants.html. Arabidopsis and rice protein hits have been functionally classified. Several GPI lipid-anchored arabinogalactan-related proteins have been identified in rice. | http://mendel.imp.ac.at/gpi/plant_server.html | 14681532 |
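
Several entries report accuracy alongside sensitivity, specificity, false-positive rate or the Matthews correlation coefficient (MCC). GlycoEP's O-linked model, for example, reaches 86.87% accuracy but an MCC of only 0.20, a pattern typical when negative sites far outnumber positive ones. The sketch below computes these standard measures from confusion-matrix counts; the function names and toy counts are illustrative and are not taken from any of the listed tools.

```python
import math


def _ratio(num: float, den: float) -> float:
    """Safe division: returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification measures from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),          # true-positive rate
        "specificity": _ratio(tn, tn + fp),          # true-negative rate
        "false_positive_rate": _ratio(fp, fp + tn),  # 1 - specificity
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),   # NaN if any margin is empty
    }


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 10 true sites among 1000 candidates.
    m = classification_metrics(tp=2, fp=8, tn=982, fn=8)
    for name, value in m.items():
        print(f"{name}: {value:.3f}")
```

With these toy counts, accuracy is 0.984 while MCC is about 0.19, because a classifier that labels almost everything negative still scores well on accuracy. Reporting MCC (or sensitivity and specificity separately) exposes that imbalance.
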
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | PSKAcePred | In the PSKAcePred prediction model, sequence fragments are first extracted with a window spanning positions -10 to +10 (with the acetylated lysine centered at position 0). Then, 13 optimized positions are selected using a position-specific method. Users can submit protein sequence(s) in FASTA format to the web interface. The system returns the prediction results, including the protein name, the position of the site, the flanking amino acid sequence and the SVM probability. In the flanking sequences, amino acids shown in green are those selected using information gain, and lysines (K) shown in red are predicted acetylation sites. There is no upper bound on protein sequence length, but input sequences must (i) contain at least 21 amino acids and (ii) contain only characters that represent amino acids. A maximum of 20 protein sequences per submission is recommended in the text box; submitting too many sequences can cause the system to crash. For large-scale predictions, researchers can download the Matlab code for PSKAcePred from the website. | http://bioinfo.ncu.edu.cn/inquiries_PSKAcePred.aspx | 23173045 |
2 | LAceP | Lysine acetylation is a crucial type of protein post-translational modification, which is involved in many important cellular processes and serious diseases. However, identifying acetylated sites with traditional experimental methods is time-consuming and laborious, and these methods are not suited to identifying large numbers of acetylated sites quickly. Computational methods therefore remain very valuable for accelerating the discovery of lysine acetylation sites. In this study, many biological characteristics of acetylated sites were investigated, such as the amino acid sequence around the acetylated sites, the physicochemical properties of the amino acids and the transition probabilities of adjacent amino acids. Logistic regression was then used to integrate this information into a novel lysine acetylation prediction algorithm named LAceP. Compared with existing methods, LAceP outperforms most state-of-the-art methods; in particular, it has a more balanced prediction capability on positive and negative datasets. An online web server is freely available at http://www.scbit.org/iPTM/. | http://www.scbit.org/iPTM/ | 24586884 |
3 | PAIL | Protein acetylation is a widespread covalent modification in eukaryotes that transfers acetyl groups from acetyl coenzyme A (acetyl-CoA) either to the α-amino (Nα) group of amino-terminal residues or to the ε-amino (Nε) group of internal lysines at specific sites (Glozak,MA et al., 2005; Kouzarides,T, 2000; Polevoda,B et al., 2000; Polevoda,B et al., 2002; Yang,XJ, 2004). As one of the most ubiquitous protein modifications, approximately 85% of eukaryotic proteins are Nα-terminally acetylated in a co-translational manner on several types of residues, such as serine and alanine (Polevoda,B et al., 2000; Polevoda,B et al., 2002). Nε-lysine acetylation is less common, but probably more important. Nε-acetylation of internal lysine residues is an essential and highly reversible type of post-translational modification (PTM) that orchestrates a variety of cellular processes, including transcription regulation (Faiola,F et al., 2005; Brunet,A et al., 2004), DNA repair (Murr,R et al., 2006), apoptosis (Subramanian,C et al., 2005; Cohen,HY et al., 2004), cytokine signaling (Yuan,ZL et al., 2005) and nuclear import (Bannister,AJ et al., 2000). In a proposed 'loss-of-function' mechanism, Nε-acetylation greatly alters the electrostatic properties of a protein by neutralizing the positive charge of lysine residues, and it also disrupts the formation of hydrogen bonds by lysine side chains (Yang,XJ, 2004; Yang,XJ, 2004b). In addition, as a 'gain-of-function' mechanism, lysine acetylation creates a new interface for protein binding (Yang,XJ, 2004; Yang,XJ, 2004b). Thus, Nε-acetylation may modulate protein functions such as protein-protein interaction, DNA binding, enzymatic activity, stability and subcellular localization (Glozak,MA et al., 2005; Polevoda,B et al., 2002; Yang,XJ, 2004; Faiola,F et al., 2005; Brunet,A et al., 2004; Yuan,ZL et al., 2005; Bannister,AJ et al., 2000; Yang,XJ, 2004b). In this work, we present PAIL (Prediction of Acetylation on Internal Lysines), a novel online predictor of protein acetylation sites. We manually mined the scientific literature to collect 249 experimentally verified acetylation sites in 92 distinct proteins and then employed the BDM (Bayesian Discriminant Method) algorithm. The window length of a potential acetylated peptide was optimized to 13. The accuracy of PAIL is highly encouraging: 85.13%, 87.97% and 89.21% at low, medium and high thresholds, respectively. Both jackknife validation and n-fold (6-, 8- and 10-fold) cross-validation were performed to show that PAIL is accurate and robust. We therefore propose that PAIL could be a useful tool for experimentalists, and its prediction results might also inform further experimental design. For convenience, we have implemented the prediction system as a web server, available at http://bdmpail.biocuckoo.org/. | http://bdmpail.biocuckoo.org/prediction.php | 17045240 |
4 | BRABSB-PHKA | BRABSB-PHKA is an in silico online tool for the Prediction of potential Human Lysine (K) Acetylation (PHKA) sites from protein sequences. The computational methodology is based on Bi-Relative Adapted Binomial Score Bayes (BRABSB) feature representation combined with support vector machines (SVMs). In 5-fold cross-validation, BRABSB-PHKA yields on average a sensitivity of 83.91%, a specificity of 87.25% and an accuracy of 85.58%; together with the results on independent test datasets, this suggests that BRABSB-PHKA can facilitate the identification of human lysine acetylation sites and more confident annotation. BRABSB-PHKA supports two input forms for query sequences: directly PASTE one or more sequences in FASTA format into the input frame, or UPLOAD a FASTA file from a local disk (all protein sequences are represented in single-letter amino acid code). The sequence part allows any character, digit or space except “>”. The prediction results are shown in an output table with the following columns. Sequence name: the name of each query sequence in FASTA format; if no names are provided, the system names the sequences “default sequence 1”, “default sequence 2”, and so on. Position: the absolute position of the potential acetyllysine site in the protein. Acetylated residue: the corresponding acetylated amino acid. Score: the predicted probability of acetylation at the corresponding site. Flanking residues: the flanking sequence centered on the acetylated residue (15 residues long for BRABSB-PHKA). | http://www.bioinfo.bio.cuhk.edu.hk/bpbphka | 22936054 |
5 | LysAcet | Reversible acetylation on lysine residues, a crucial post-translational modification (PTM) for both histone and non-histone proteins, governs many central cellular processes. Due to limited data and the lack of a clear acetylation consensus sequence, little research has focused on the prediction of lysine acetylation sites. Incorporating almost all currently available lysine acetylation information, and using the support vector machine (SVM) method along with a coding schema for protein sequence coupling patterns, we propose here a novel lysine acetylation prediction algorithm: LysAcet. When compared with other methods or existing tools, LysAcet is the best predictor of lysine acetylation, with K-fold (5- and 10-fold) and jackknife cross-validation accuracies of 75.89%, 76.73% and 77.16%, respectively. LysAcet's superior predictive accuracy is attributed primarily to the use of sequence coupling patterns, which describe the relative position of two amino acids. LysAcet contributes to the limited PTM prediction research on lysine epsilon-acetylation and may serve as a complementary in silico approach for exploring acetylation on proteomes. An online web server is freely available at http://www.biosino.org/LysAcet/. | http://www.biosino.org/LysAcet/ | 19689425 |
6 | ASEB | Protein lysine acetylation plays an important role in the normal functioning of cells, including gene expression regulation, protein stability and metabolism regulation. Although large numbers of lysine acetylation sites have been identified via large-scale mass spectrometry or traditional experimental methods, the lysine (K)-acetyl-transferase (KAT) responsible for the acetylation of a given protein or lysine site remains largely unknown due to the experimental limitations of KAT substrate identification. Hence, in silico prediction of KAT-specific acetylation sites may provide direction for further experiments. In our previous study, we developed the acetylation set enrichment based (ASEB) computer program to predict which KAT families are responsible for the acetylation of a given protein or lysine site. In this article, we provide KAT-specific acetylation site prediction as a web service. This web server not only provides the online tool and R package for the method in our previous study but also includes several useful services, such as the integration of protein-protein interaction information to enhance prediction accuracy. This web server can be freely accessed at http://cmbi.bjmu.edu.cn/huac. | http://cmbi.bjmu.edu.cn/huac | 22600735 |
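
PSKAcePred extracts fragments from -10 to +10 around each lysine and requires inputs of at least 21 residues made only of amino-acid letters; PAIL uses a window of 13 and BRABSB-PHKA reports 15 flanking residues. The sketch below shows one plausible way to do this kind of lysine-centered window extraction. The function name, the padding character `X` for lysines near a terminus, and tying the minimum length to the window size are assumptions for illustration; the listed tools may handle termini and validation differently.

```python
import re

# The 20 standard amino acids in one-letter code.
VALID_SEQUENCE = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")


def lysine_windows(sequence: str, flank: int = 10, pad: str = "X") -> list[tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in `sequence`.

    Each window spans positions -flank..+flank around the lysine, so its
    length is 2 * flank + 1. Positions beyond either terminus are filled
    with `pad` (a hypothetical convention chosen for this sketch).
    """
    seq = "".join(sequence.split()).upper()  # drop spaces and newlines
    if len(seq) < 2 * flank + 1:
        raise ValueError(f"sequence must contain at least {2 * flank + 1} residues")
    if not VALID_SEQUENCE.fullmatch(seq):
        raise ValueError("sequence contains non-amino-acid characters")

    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # seq[i] sits at padded[i + flank], so this slice is centered on it.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows
```

With the default `flank=10` this yields 21-residue fragments like those described for PSKAcePred; `flank=6` gives 13-residue windows (PAIL's length) and `flank=7` gives 15 residues (BRABSB-PHKA's flanking length).
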
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | NetChop | The NetChop server produces neural network predictions of cleavage sites of the human proteasome. NetChop has been trained on human data only and will therefore presumably perform best at predicting the cleavage sites of the human proteasome. However, since proteasome structure is highly conserved, we believe that the server can produce reliable predictions for at least the other mammalian proteasomes. This server (version 3.1) is an update to the NetChop 2.0 server; it has been trained using a novel sequence encoding scheme and an improved neural network training strategy. The NetChop 3.0 version offers two network methods for prediction, C-term 3.0 and 20S 3.0, and all previous versions remain available online for comparison and reference. The C-term 3.0 network is trained on a database of 1260 publicly available MHC class I ligands (using only the C-terminal cleavage site of each ligand). The 20S network is trained on in vitro degradation data published by Toes et al. and Emmerich et al. The C-term 3.0 network performs best at predicting the boundaries of CTL epitopes. Another proteasome cleavage prediction server, PAProC, is available from the University of Tübingen. | http://www.cbs.dtu.dk/services/NetChop/ | 11983929 |
2 | PHOXTRACK | PHOXTRACK (PHOsphosite-X-TRacing Analysis of Causal Kinases) is a computational tool to compare kinase activities between different phosphoproteomes to identify key regulating proteins. In its current version, PHOXTRACK maps quantified phosphopeptides to their putative kinases and tests for concordant changes of kinase activity comparing whole phosphoproteomes. For this purpose, PHOXTRACK searches for an enrichment of known kinase targets in the uploaded phosphoproteomics profile data. PHOXTRACK thus allows for identification of regulated kinase activities between experimental conditions. | http://phoxtrack.molgen.mpg.de/ | 25152232 |
3 | NetCorona | NetCorona uses artificial neural networks to predict cleavage sites of coronavirus 3C-like proteinases (3CLpro) in protein sequences. | http://www.cbs.dtu.dk/services/NetCorona/ | 15180906 |
4 | NetPicoRNA | NetPicoRNA uses artificial neural networks to predict cleavage sites of picornaviral proteases in protein sequences. | http://www.cbs.dtu.dk/services/NetPicoRNA/ | 8931139 |
5 | Pcleavage | Antigen processing and presentation are intracellular processes in which proteins are fragmented by proteolysis, the fragments associate with MHC molecules, and the peptide-MHC complexes are expressed at the cell surface, where they can be recognized by the T cell receptor on a T cell. This leads to the stimulation of cytotoxic T lymphocytes (CTLs), which clear the infection. Prediction rules can be devised for three major steps: degradation of antigens by proteasomes, transport of peptide fragments through the TAP transporter, and binding of transported peptides to MHC molecules. | http://www.imtech.res.in/raghava/pcleavage/index.html | 15988831 |
6 | PEIMAN | PEIMAN (Posttranslational modification Enrichment, Integration and Matching ANalysis) is free, platform-independent standalone software for enrichment analysis of post-translational modification (PTM) types. It can also compare two different protein lists with respect to PTM types and report the PTM frequency in each list. | http://bs.ipm.ir/softwares/PEIMAN/ | 25911152 |
7 | PyTMs | BACKGROUND: Post-translational modifications (PTMs) constitute a major aspect of protein biology, particularly of signaling events. Conversely, several different pathophysiological PTMs are hallmarks of oxidative imbalance or inflammatory states and are strongly associated with the pathogenesis of autoimmune diseases or cancers. Accordingly, it is of interest to assess both the biological and the structural effects of modification. For the latter, computer-based modeling offers an attractive option. We thus identified the need for easily applicable modeling options for PTMs. RESULTS: We developed PyTMs, a plugin implemented with the commonly used visualization software PyMOL. PyTMs enables users to introduce a set of common PTMs into protein/peptide models and can be used to address research questions related to PTMs. Ten types of modification are currently supported: acetylation, carbamylation, citrullination, cysteine oxidation, malondialdehyde adducts, methionine oxidation, methylation, nitration, proline hydroxylation and phosphorylation. Furthermore, advanced settings integrate the pre-selection of surface-exposed atoms, define stereochemical alternatives and allow for basic structure optimization of the newly modified residues. CONCLUSION: PyTMs is a useful, user-friendly modeling plugin for PyMOL. Advantages of PyTMs include standardized generation of PTMs, rapid time-to-result and facilitated user control. Although modeling cannot substitute for conventional structure determination, it constitutes a convenient tool that allows uncomplicated exploration of potential implications prior to experimental investments, as well as basic explanation of experimental data. PyTMs is freely available as part of the PyMOL script repository project on GitHub and will further evolve. | http://www.pymolwiki.org/index.php/Pytms | 25431162 |
8 | PTM-X | Post-translational modification (PTM) plays an important role in regulating the functions of proteins. PTMs of multiple residues on one protein may work together to determine a functional outcome, which is known as PTM cross-talk. Identification of PTM cross-talk is an emerging theme in proteomics and has elicited great interest, but its properties remain to be systematically characterized. To this end, we collected 193 PTM cross-talk pairs in 77 human proteins from the literature and then tested location preference and co-evolution at the residue and modification levels. We found that cross-talk events preferentially occurred among nearby PTM sites, especially in disordered protein regions, and that cross-talk pairs tended to co-evolve. Given these properties of PTM cross-talk pairs, a naïve Bayes classifier integrating different features was built to predict cross-talk for pairwise combinations of PTM sites. In 10-fold cross-validation, the integrated prediction model showed an area under the receiver operating characteristic (ROC) curve of 0.833, superior to using any individual feature alone. The prediction performance was also shown to be robust to biases in the collected PTM cross-talk pairs. The integrated approach has the potential for large-scale prioritization of PTM cross-talk candidates for functional validation and was implemented as a web server available at http://bioinfo.bjmu.edu.cn/ptm-x/. | http://bioinfo.bjmu.edu.cn/ptm-x/ | 25605461 |
9 | PAProC | Proteasomes are cytosolic multisubunit proteases that are involved in cell cycle control, transcription factor activation and the generation of peptide ligands for MHC I molecules (for reviews, see Baumeister et al. (1998), Rock & Goldberg (1999), Uebel & Tampe (1999)). They exist in several forms: as proteolytically active core complexes (20S proteasomes) and, when associated with the ATP-dependent 19S cap complexes, as larger 26S proteasomes that are able to recognize proteins marked by ubiquitin for proteasomal degradation (Jentsch & Schlenker, 1995; Hershko & Ciechanover, 1998). Another protein complex known to associate with the 20S core particle is PA28, the 11S regulator (Ahn et al., 1995), which was shown to improve the yield of antigenic peptides (Groettrup et al., 1996; Dick et al., 1996). Eukaryotic 20S proteasomes consist of four stacked rings (overall stoichiometry α7β7β7α7), each consisting of 7 different subunits (Groll et al., 1997). Each of the two inner β-rings carries three catalytically active sites on its inner surface. Their proteolytic specificities have been described as chymotrypsin-like (cleaving after large, hydrophobic amino acids), trypsin-like (cleaving after basic amino acids) and peptidyl-glutamyl-peptide-hydrolyzing (cleaving after acidic amino acids) (for review, see Uebel & Tampe (1999)). Strings of unfolded protein are thought to be inserted into the cylinder and cut into pieces by the active sites; the resulting peptide fragments are then released into the cytosol. Functionally, proteasomal protein degradation is believed to proceed processively from one end of the substrate to the other, without the release of large degradation intermediates (Akopian et al., 1997; Nussbaum et al., 1998; Kisselev et al., 1999). Proteasomal cleavage specificity matters for immune responses because, in vertebrate cells, some of the proteolytic fragments produced by proteasomes are fed into the antigen processing machinery. Since peptide presentation by MHC I molecules at the cell surface is an intrinsic requirement for the immune system to eradicate virus-infected or transformed cells (Rammensee et al., 1993; Pamer & Cresswell, 1998), it is of general interest to know exactly how the proteasome is involved in this process. Proteasomal cleavage specificity has been assessed by in vitro digestion experiments using as substrates either tri- or tetrapeptides with fluorogenic leaving groups (Kuckelkorn et al., 1995; Heinemeyer et al., 1997; Arendt & Hochstrasser, 1997), peptides of 15-40 amino acids (Boes et al., 1994; Niedermann et al., 1995; Niedermann et al., 1996; Dick et al., 1998), or denatured proteins (Dick et al., 1991; Dick et al., 1994; Kisselev et al., 1998; Kisselev et al., 1999). We analyzed the cleavage preferences of yeast wild-type and mutant proteasomes in a non-modified protein (Nussbaum et al., 1998). Using statistical analysis of cut sites, it was possible for the first time to determine cleavage motifs, i.e. the preferred sequences around cleavage sites, for the three active β-subunits of yeast proteasomes. To apply this experimentally determined information on cleavage site selection to any possible proteasome substrate, an automated prediction tool is needed. Such tools already exist for the binding of peptides to MHC I molecules (the SYFPEITHI database, Rammensee et al., 1997) and have been described for peptide transport by the transporter associated with antigen processing (TAP) (Daniel et al., 1998). However, tools for predicting proteasomal cleavages are only at the beginning of their development. A proteasomal cleavage prediction tool could, especially in combination with MHC ligand predictors such as SYFPEITHI, help improve the prediction of MHC class I-restricted CTL responses. More specifically, it could support researchers searching for individual CTL epitopes by limiting the number of possible MHC class I ligands from protein antigens. In addition, the effect of amino acid mutations in viral or tumor-specific proteins on antigen presentation could be assessed. Proteasomal cleavage prediction would thus aid rational vaccine design. We have taken a first step towards this end by providing PAProC (Prediction Algorithm for Proteasomal Cleavages), a public prediction tool for proteasomal cleavages. PAProC offers information both on the general cleavability of amino acid sequences (cuts per amino acid) and on individual cleavages (positions and estimated strength; for details, please refer to the user information). PAProC was developed from the experimental basis to the ready-to-use public prediction tool by proteasome experts at the Department of Immunology in close collaboration with programmers at the Department of Biomathematics, both at the University of Tübingen, Germany. We are therefore confident that PAProC has profited from the best possible expertise. However, PAProC is still at an early stage; for example, cleavage sites and estimated cleavage strengths are not yet based on quantified cleavage data (in PAProC I). We are therefore continuously working to improve PAProC and welcome user feedback on its performance. | http://paproc.de/ | 11345595 |
10 | ProP | ProP predicts arginine and lysine propeptide cleavage sites in eukaryotic protein sequences using an ensemble of neural networks. It is hosted by the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark, which conducts basic research in bioinformatics and systems biology. Its more than 90 scientists, working in ten specialist research groups, have a highly multidisciplinary profile (molecular biologists, biochemists, medical doctors, physicists and computer scientists), with a 2:1 ratio of biological to non-biological backgrounds, and CBS is one of the large academic bioinformatics groups in Europe. Bioinformatics refers to the combination of methods from biology, computation and information management needed to advance research on all aspects of living systems, from individual molecules, cells and organs to entire organisms. Today, research in molecular biology, biotechnology and pharmacology depends on information technology all the way from experiment to publication. Comprehensive public databases of DNA and protein sequences, macromolecular structures, gene and protein expression levels, pathway organization and cell signalling have been established to make the best scientific use of the explosion of biological data. Unlike many other groups in biomolecular informatics, CBS directs its research primarily towards elucidating the functional aspects of complex biological mechanisms. Contemporary bioinformatics concerns include the reliable computational interpretation of a wide range of experimental data and a detailed understanding of the molecular apparatus behind the cellular processing of sequence information. By exploiting available experimental data in the design of algorithms, sequence correlations and other biologically significant features can be inferred. In addition to its computational research, the center carries out experimental work on gene expression analysis using DNA chips and on the physical and structural properties of DNA. Over the last decade, CBS has produced a large number of computational methods, which are offered to others via web servers. Building on bioinformatics efforts that started in the late 1980s, the activity was formally established as a center in 1993 with a grant from the Danish National Research Foundation. | http://www.cbs.dtu.dk/services/ProP/ | 14985543 |
11 | PeptideMap | PeptideMap marks a peptide sequence at every position where a known proteolytic enzyme or reagent might cut it. You can select one or a few enzymes or let PeptideMap use the whole list. PeptideMap is simply the program Map run with -PROGRAMname=PeptideMap. (See the documentation for Map in the Program Manual for a complete description.) | http://prowl.rockefeller.edu/prowl/peptidemap.html | |
12 | ModPred | ModPred is a sequence-based predictor of potential post-translational modification (PTM) sites in proteins. It consists of 34 ensembles of logistic regression models, trained separately on a combined set of 126,036 non-redundant experimentally verified sites for 23 different modifications, obtained from public databases and an ad-hoc literature search. Areas under the ROC curve (AUCs) were estimated to range from ~60 to 97%, depending on the type of PTM. | http://www.modpred.org/ | 24888500 |
13 | ISSPred | Protein expression is further complicated by post-translational modification events such as proteolytic cleavage of polyproteins, proteasome-mediated peptide ligation, non-ribosomal addition of moieties and intein-mediated protein splicing. Protein splicing is a relatively recently discovered post-translational modification in which an internal fragment, termed the intein (protein intron), is excised from a precursor protein and the flanking regions, termed exteins, are ligated to form the mature protein. The precision of intein splicing and the formation of specific peptide bonds have inspired many novel applications. This server aims to help biologists identify inteins hidden in their protein sequences. | http://www.imtech.res.in/raghava/isspred/ | |
14 | CarSPred | CarSPred identifies carbonylation sites in query human protein sequences. The software consists of four modules dedicated to K, R, T and P carbonylation site prediction, respectively. It accepts protein sequences or a file in FASTA format as input. Results can be returned as a list or a file, and the annotations indicate the precise location and probability of each putative carbonylation site in the sequence. Owing to their close homology with human proteins, the software can also be used, to a certain extent, to predict carbonylation sites in other mammalian proteins. The software is in the 'CarSPred' folder; datasets of carbonylated proteins and sample sequences of carbonylation sites are in the 'Datasets' folder. | http://sourceforge.net/projects/hqlstudio/files/CarSPred-1.0/ | 25347395 |
15 | Motifs tree | MOTIVATION: Post-translational modifications (PTMs) are important steps in the maturation of proteins. Several models exist to predict specific PTMs, from manually detected patterns to machine learning methods. On the one hand, manual pattern detection does not yield the most efficient classifiers and requires a substantial workload; on the other hand, models built by machine learning methods are hard to interpret and do not increase biological knowledge. We therefore developed a novel method based on pattern discovery and decision trees to predict PTMs. The proposed algorithm builds a decision tree by coupling the C4.5 algorithm with genetic algorithms, producing high-performance white-box classifiers. Our method was tested on initiator methionine cleavage (IMC) and N(α)-terminal acetylation (N-Ac), two of the most common PTMs. RESULTS: The resulting classifiers perform well compared with existing models. On a set of eukaryotic proteins, they display a cross-validated Matthews correlation coefficient of 0.83 (IMC) and 0.65 (N-Ac). When used to predict potential substrates of N-terminal acetyltransferase B and N-terminal acetyltransferase C, our classifiers perform better than the state of the art. Moreover, we present an analysis of the model predicting IMC for Homo sapiens proteins and show that it recovers experimentally known facts without prior knowledge. These results confirm that our method produces white-box models. AVAILABILITY AND IMPLEMENTATION: Predictors for IMC and N-Ac and all datasets are freely available at http://terminus.unige.ch/. | http://terminus.unige.ch/ | 24681905 |
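
PeptideMap marks every position where a chosen enzyme or reagent might cut a sequence. The idea can be sketched in a few lines of Python. The rule table below is a hypothetical, simplified stand-in and not PeptideMap's own enzyme list. It has two entries: trypsin, cutting after K or R unless the next residue is P, and cyanogen bromide, cutting after M.

```python
import re

# Hypothetical, simplified cleavage rules (not PeptideMap's enzyme table).
# Each regex matches the empty position *between* two residues where a cut may occur.
RULES = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),  # after K/R, not before P
    "CNBr": re.compile(r"(?<=M)"),             # cyanogen bromide: after Met
}

def cut_positions(seq, enzymes=None):
    """Return {enzyme: [positions]}; position i means a cut between
    residue i and residue i+1 (1-based numbering of the left residue)."""
    seq = seq.upper()
    chosen = enzymes or list(RULES)  # default: use the whole list
    result = {}
    for name in chosen:
        # Ignore a "cut" at the very end of the sequence (nothing to release).
        result[name] = [m.start() for m in RULES[name].finditer(seq)
                        if 0 < m.start() < len(seq)]
    return result

def fragments(seq, positions):
    """Split seq at the given cut positions."""
    bounds = [0] + sorted(positions) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

For example, `cut_positions("MAKPLRGK")` reports a trypsin site after R6 but not after K3, because K3 is followed by proline. Using zero-width regex matches keeps each rule a single declarative line, so adding another reagent only means adding another entry to the table.
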
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | iMethyl-PseAAC | iMethyl-PseAAC is a web server for predicting methylation sites in proteins. It is built on a support vector machine (SVM), and its main feature is the use of amino acid sequence features extracted from sequence evolution information via a grey system model (Grey-PSSM). Caveat: to obtain predictions with the anticipated success rate, the entire sequence of the query protein, rather than a fragment, should be used as input. | http://www.jci-bioinfo.cn/iMethyl-PseAAC | 24977164 |
2 | MeMo | MeMo was the first protein methylation prediction server based on a support vector machine (SVM). Limited by the available training data, MeMo currently focuses only on arginine and lysine sites. Users can submit protein sequences to predict which arginine and lysine residues are methylated. | http://www.bioinfo.tsinghua.edu.cn/~tigerchen/memo.html | 16845004 |
3 | MASA | Studies within the last few years have shown that methylation of histones and other proteins is involved in the regulation of gene transcription. Several previous works computationally identified potential methylation sites on lysine and arginine. Analyses of protein tertiary structures indicate that methylation sites preferentially occur in easily accessible regions; however, previous works did not take the solvent-accessible surface area (ASA) surrounding the methylation sites into account. Herein, we propose a method named MASA that combines a support vector machine (SVM) with sequence and structural characteristics to identify protein methylation sites on lysine, arginine, glutamate and asparagine. Because most experimentally verified methylation sites lack corresponding protein tertiary structures in the Protein Data Bank (PDB), an effective solvent accessibility prediction tool was applied to estimate the ASA values of the amino acids in each protein. Cross-validation of the predictive performance demonstrates that the ASA values surrounding the methylation sites improve prediction accuracy. Moreover, an independent test yields prediction accuracies of 80.8% and 85.0% for methylated lysine and arginine, respectively. Finally, the proposed method is implemented as a prediction system for identifying protein methylation sites. | http://MASA.mbc.nctu.edu.tw/ | 19263424 |
4 | BPB-PPMS | Protein methylation is a reversible post-translational modification (PTM) that plays vital roles in many cellular processes, such as transcriptional activity and DNA repair. Experimental identification of methylation sites without prior knowledge is costly and time-consuming. In silico prediction of methylation sites can not only provide researchers with candidate sites for further experimental determination but also facilitate downstream characterization and site-specific investigations. In the present study, a novel approach combining Bi-profile Bayes feature extraction with support vector machines (SVMs) was used to develop a model for the Prediction of Protein Methylation Sites (BPB-PPMS) from primary sequence. Methylation can occur at many residues, including arginine, lysine, histidine, glutamine and proline. At present, BPB-PPMS predicts the methylation status of lysine and arginine residues only, because there are not enough experimentally verified data to build and train prediction models for the other residues. In 5-fold cross-validation, BPB-PPMS achieved a sensitivity of 74.71%, a specificity of 94.32% and an accuracy of 87.98% for arginine, and a sensitivity of 70.05%, a specificity of 77.08% and an accuracy of 75.51% for lysine. Results from cross-validation and tests on independent data sets suggest that BPB-PPMS can facilitate the identification and annotation of protein methylation. In addition, the approach can readily be extended to build predictors for other types of PTM sites. BPB-PPMS is publicly available at http://www.bioinfo.bio.cuhk.edu.hk/bpbppms. | http://www.bioinfo.bio.cuhk.edu.hk/bpbppms | 19290060 |
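
BPB-PPMS reports sensitivity, specificity and accuracy, and Motifs tree reports the Matthews correlation coefficient (MCC). These standard metrics can be computed from a confusion matrix as sketched below. The function name is illustrative. Accuracy is the average of sensitivity and specificity, weighted by the share of positive and negative examples. An accuracy that sits closer to the specificity therefore suggests an evaluation set dominated by non-sites.

```python
import math

def site_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics for a PTM site predictor.

    tp / fn: verified sites predicted as sites / non-sites
    tn / fp: non-sites predicted as non-sites / sites
    """
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; conventionally 0 when a margin is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

With 8 true positives, 1 false positive, 9 true negatives and 2 false negatives, the function gives Sn = 0.8, Sp = 0.9 and Acc = 0.85. Unlike accuracy, the MCC stays informative when sites are much rarer than non-sites.
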
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | NMT | Many posttranslational modifications (N-myristoylation or glycosylphosphatidylinositol (GPI) lipid anchoring) and localization signals (the peroxisomal targeting signal PTS1) are encoded in short, partly compositionally biased regions at the N- or C-terminus of the protein sequence. These sequence signals are not well defined in terms of amino acid type preferences but they have significant interpositional correlations. Although the number of verified protein examples is small, the quantification of several physical conditions necessary for productive protein binding with the enzyme complexes executing the respective transformations can lead to predictors that recognize the signals from the amino acid sequence of queries alone. Taxon-specific prediction functions are required due to the divergent evolution of the active complexes. The big-Pi tool for the prediction of the C-terminal signal for GPI lipid anchor attachment is available for metazoan, protozoan and plant sequences. The myristoyl transferase (NMT) predictor recognizes glycine N-myristoylation sites (at the N-terminus and for fragments after processing) of higher eukaryotes (including their viruses) and fungi. The PTS1 signal predictor finds proteins with a C-terminus appropriate for peroxisomal import (for metazoa and fungi). Guidelines for application of the three WWW-based predictors (http://mendel.imp.univie.ac.at/) and for the interpretation of their output are described. | http://mendel.imp.ac.at/myristate/SUPLpredictor.htm | 12824382 |
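
The NMT predictor scores N-terminal glycines using physical-property terms over the N-terminal region, which goes well beyond a pattern match. As a much cruder baseline, one can screen for the PROSITE N-myristoylation pattern (PS00008, G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}). The pattern is anchored so that the glycine either follows the initiator methionine or, for processed fragments, is the first residue. The pattern is quoted here from memory as an assumption to be checked against PROSITE, and this sketch is not the NMT algorithm.

```python
import re

# Assumed PROSITE-style N-myristoylation pattern (PS00008) as a regex:
# G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
MYR_CORE = r"G[^EDRKHPFYW].{2}[STAGCN][^P]"

def has_myristoylation_motif(seq, processed_fragment=False):
    """Crude screen for an N-terminal myristoylation motif.

    Full-length protein: the Gly must follow the initiator Met, which is
    removed before myristoylation. Processed fragment: the Gly must be
    the first residue.
    """
    prefix = "" if processed_fragment else "M"
    return re.match(prefix + MYR_CORE, seq.upper()) is not None
```

`re.match` anchors at the start of the string, which mirrors the biological requirement that the glycine be N-terminal. Many sequences that pass this screen will not be NMT substrates, which is why dedicated predictors add physical-property terms.
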
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | NetAcet | NetAcet uses neural networks to predict substrates of N-acetyltransferase A (NatA), i.e. protein N-termini likely to undergo N-terminal acetylation. It is hosted by the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark. | http://www.cbs.dtu.dk/services/NetAcet/ | 15539450 |
2 | N-Ace | N-Ace is a web tool for predicting protein acetylation sites with a two-stage support vector machine (SVM) method. The SVM is trained on the amino acid sequence surrounding the modification site together with other structural and physicochemical characteristics, such as accessible surface area, absolute entropy, non-bonded energy, size, amino acid composition, steric parameter, hydrophobicity, volume, mean polarity, electric charge, heat capacity and isoelectric point. | http://N-Ace.mbc.NCTU.edu.tw/ | 20839302 |
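
N-Ace, like many window-based SVM predictors, describes each candidate residue by physicochemical properties of its neighbours. A minimal sketch of that feature-extraction step follows. It assumes a symmetric window padded with a placeholder 'X', lysine as the candidate residue, and the Kyte-Doolittle hydrophobicity scale as one example property. The window size, padding convention and choice of scale are illustrative assumptions, not N-Ace's published settings.

```python
# Kyte-Doolittle hydrophobicity scale; the padding symbol 'X' maps to 0.0.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2, "X": 0.0,
}

def window(seq, center, half_width):
    """Residues center-half_width .. center+half_width (0-based), padded with X."""
    padded = "X" * half_width + seq + "X" * half_width
    return padded[center:center + 2 * half_width + 1]

def hydrophobicity_features(seq, center, half_width=7):
    """One numeric feature per window position (unknown residues -> 0.0)."""
    return [KD.get(aa, 0.0) for aa in window(seq.upper(), center, half_width)]

def lysine_windows(seq, half_width=7):
    """Feature vectors for every lysine, keyed by 1-based position."""
    return {i + 1: hydrophobicity_features(seq, i, half_width)
            for i, aa in enumerate(seq.upper()) if aa == "K"}
```

Padding keeps every feature vector the same length, even for residues near the termini, which an SVM requires. Further properties such as charge or volume would be added as extra lookup tables concatenated per position.
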
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | GECS | GECS (Gene Expression to Chemical Structure) is a collection of prediction methods linking genomic or transcriptomic contents of genes to chemical structures of biosynthetic substances. This N-Glycan Prediction Server is based on the repertoire of glycosyltransferases for N-glycan biosynthesis. | http://www.genome.jp/tools/gecs/ | 16159923 |
2 | NetNGlyc | NetNGlyc predicts N-glycosylation sites in human proteins using artificial neural networks that examine the sequence context of Asn-Xaa-Ser/Thr sequons. It is hosted by the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark. | http://www.cbs.dtu.dk/services/NetNGlyc/ | |
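
NetNGlyc-style prediction starts from the N-glycosylation sequon Asn-Xaa-Ser/Thr, where Xaa is any residue except proline. Listing the sequons is a simple pre-filter, which can be sketched as below. It is not the NetNGlyc neural network, and many sequons are not actually glycosylated.

```python
import re

# Asn-Xaa-Ser/Thr with Xaa != Pro; the zero-width lookahead lets
# overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_positions(seq):
    """1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]
```

For `"NASNPTNGT"` this returns `[1, 7]`. The Asn at position 4 is skipped because proline occupies the Xaa position. A neural network predictor would then score only these candidate asparagines using their wider sequence context.
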
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | DictyOGlyc | DictyOGlyc produces neural network predictions for GlcNAc O-glycosylation sites in Dictyostelium discoideum proteins. It is hosted by the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark. | http://www.cbs.dtu.dk/services/DictyOGlyc/ | 10521537 |
2 | YinOYang | YinOYang produces neural network predictions for O-β-GlcNAc attachment sites in eukaryotic protein sequences. It also flags "Yin-Yang" sites, serine or threonine residues predicted to be both O-GlcNAc-modified and phosphorylated. It is hosted by the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark. | http://www.cbs.dtu.dk/services/YinOYang/ | 11928486 |
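
YinOYang highlights so-called Yin-Yang sites: serine or threonine residues predicted both for O-β-GlcNAc attachment and for phosphorylation, reflecting the reciprocal relationship between the two modifications. Given per-residue scores from two predictors, the combination step amounts to an intersection. The score dictionaries and the 0.5 threshold below are hypothetical inputs, not the server's output format.

```python
def yin_yang_sites(seq, glcnac_scores, phospho_scores, threshold=0.5):
    """Return 1-based S/T positions whose O-GlcNAc and phosphorylation
    scores both reach the threshold.

    glcnac_scores / phospho_scores: hypothetical {1-based position: score}
    dicts produced by two separate predictors; missing positions score 0.
    """
    sites = []
    for pos, aa in enumerate(seq.upper(), start=1):
        if aa not in "ST":
            continue  # only serine/threonine can carry both modifications here
        if (glcnac_scores.get(pos, 0.0) >= threshold
                and phospho_scores.get(pos, 0.0) >= threshold):
            sites.append(pos)
    return sites
```

Keeping the two score sources separate means either predictor can be swapped out or recalibrated without touching the intersection logic.
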
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | Oglyc | O-glycosylation is one of the most important, frequent and complex post-translational modifications; it can activate and affect protein functions. Here, we present three support vector machine (SVM) models based on physical properties, on a 0/1 system, and on a combination of both feature types. Evaluated by leave-one-out cross-validation, the three models reached prediction accuracies of 0.82, 0.85 and 0.85, respectively. This approach provides a useful tool for identifying O-glycosylation sites in mammalian proteins. An online prediction web server is available at http://www.biosino.org/Oglyc. | http://www.biosino.org/Oglyc/ | 16731044 |
2 | NetOGlyc | NetOGlyc produces neural network predictions of mucin-type GalNAc O-glycosylation sites in mammalian proteins. It is hosted by the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark. | http://www.cbs.dtu.dk/services/NetOGlyc/ | 23584533 |
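
Oglyc's "0/1 system" is not spelled out in its description. A common reading is orthogonal (one-hot) encoding, in which each window position becomes a 21-bit vector with a single 1: 20 amino acids plus a padding symbol. The sketch below assumes that reading and an arbitrary window half-width. It illustrates the encoding only, not Oglyc's model.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # 20 amino acids + 'X' for padding/unknown

def one_hot_window(seq, center, half_width=5):
    """Flattened 0/1 encoding of the window around a 0-based center residue."""
    seq = seq.upper()
    vec = []
    for i in range(center - half_width, center + half_width + 1):
        aa = seq[i] if 0 <= i < len(seq) else "X"  # pad beyond the termini
        if aa not in ALPHABET:
            aa = "X"                               # map non-standard residues to X
        bits = [0] * len(ALPHABET)
        bits[ALPHABET.index(aa)] = 1
        vec.extend(bits)
    return vec
```

One-hot encoding treats residues as categories without imposing any numeric ordering, which complements property-based features. Oglyc's third model combines these two feature types.
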
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | PostMod | PostMod is a prediction server for phosphorylation sites. We developed a new, solely sequence-based prediction system that combines physicochemical information, motif information and evolutionary information obtained by simply comparing sequence similarities. Taking all these features together, we applied a novel algorithm, an indirect-relationship-based noise-reducing system. This approach is powerful and intuitive for recognizing phosphorylation sites. Moreover, our method is generally applicable to predicting other types of PTMs. | http://pbil.kaist.ac.kr/PostMod | 20122181 |
2 | PlantPhos | Protein phosphorylation is the most widespread and well-studied post-translational modification in eukaryotic cells. It is one of the most prevalent intracellular protein modifications and influences numerous cellular processes (Steen, Jebanathirajah et al. 2006). It has been estimated that one-third to one-half of all proteins in a eukaryotic cell are phosphorylated (Hubbard and Cohen 1993). Furthermore, protein phosphorylation, catalyzed by specific kinases, plays crucial regulatory roles in intracellular signal transduction, in which networks of proteins and small molecules transmit information from the cell surface to the nucleus, where they ultimately bring about transcriptional changes (Steffen, Petti et al. 2002). An estimated 1 to 3% of functional eukaryotic genes encode protein kinases, suggesting that kinases are involved in many aspects of cellular regulation and metabolism (Stone and Walker 1995). However, a full understanding of the mechanisms of intracellular signal transduction remains a major challenge in cell biology. Protein phosphorylation regulates various cellular processes not only in humans but also in plants. The regulation of carbon and nitrogen metabolism in plants has been reported to be driven by phosphorylation (Diolez, Kesseler et al. 1993). Phosphorylation modulates sucrose-phosphate synthase, an enzyme that controls sucrose synthesis in plants (Huber 2007), and it is also involved in modulating ammonia synthesis in plants (Huber 2007). Furthermore, although not yet fully studied, phosphorylation appears to be involved in plant growth and in plant responses to stress (Luan 2002; Huber 2007). Stone et al. identified part of the plant kinase repertoire; however, the precise functional roles have been elucidated for only a few specific protein kinases (Stone and Walker 1995). | http://csb.cse.yzu.edu.tw/PlantPhos/ | 21703007 |
3 | PKIS | The large and growing gap in kinase-specific phosphorylation data hampers the reconstruction of signal transduction networks. Existing experimental methods and computational phosphorylation site (P-site) prediction tools have various limitations in addressing this problem. Here, based on the latest version of Phospho.ELM (9.0), a novel kinase identification web server, PKIS, combines support vector machines (SVMs) with the composition of monomer spectrum (CMS). It assigns protein kinases to experimentally verified human P-sites with high specificity (no less than 99%). Comparisons with well-known P-site prediction tools, such as KinasePhos 2.0, Musite and GPS2.1, show that PKIS is more competitive at identifying the protein kinases associated with P-sites. This suggests that a dedicated kinase assignment algorithm is critical. In addition, applying PKIS to human phosphoproteomes identified the corresponding kinases for tens of thousands of P-sites. These predictions are valuable for decoding human signaling networks. PKIS is expected to become a useful bioinformatics tool for identifying novel signaling pathways and even for drug development. | http://bioinformatics.ustc.edu.cn/pkis/ | 23941207 |
4 | pkaPS | BACKGROUND: Protein kinase A (cAMP-dependent kinase, PKA) is a serine/threonine kinase for which ca. 150 substrate proteins are known. Based on a refinement of the recognition motif using the available experimental data, we wished to apply the simplified substrate protein binding model for accurate prediction of PKA phosphorylation sites. This approach was previously successful for the prediction of lipid posttranslational modifications and of the PTS1 peroxisomal translocation signal. RESULTS: Approximately 20 sequence positions flanking the phosphorylated residue on both sides have been found to be restricted in their sequence variability (region -18...+23 with the site at position 0). The conserved physical pattern can be rationalized in terms of a qualitative binding model with the catalytic cleft of protein kinase A. Positions -6...+4 surrounding the phosphorylation site are influenced to varying degrees by direct interaction with the kinase. This sequence stretch is embedded in an intrinsically disordered region composed preferentially of hydrophilic residues with flexible backbones and small side chains. This knowledge has been incorporated into a simplified analytical model of productive binding of substrate proteins with PKA. CONCLUSION: The scoring function of the pkaPS predictor can confidently discriminate PKA phosphorylation sites from serines/threonines with non-permissive sequence environments (sensitivity of approximately 96% at a specificity of approximately 94%). The tool "pkaPS" has been applied to the whole human proteome. Among the newly predicted PKA targets are entirely uncharacterized protein groups as well as apparently well-known families such as the ribosomal proteins L21e, L22 and L6. AVAILABILITY: The supplementary data and the prediction tool (as a WWW server) are available at http://mendel.imp.univie.ac.at/sat/pkaPS. REVIEWERS: Erik van Nimwegen (Biozentrum, University of Basel, Switzerland), Sandor Pongor (International Centre for Genetic Engineering and Biotechnology, Trieste, Italy), Igor Zhulin (University of Tennessee, Oak Ridge National Laboratory, USA). | http://mendel.imp.ac.at/sat/pkaPS/ | 17222345 |
5 | PhosphoSVM | Phosphorylation is the most essential post-translational modification in eukaryotes and plays a crucial role in a wide range of cellular processes. However, experimental discovery of phosphorylation sites is time-consuming and expensive. Computational prediction methods have therefore become popular as an important complementary approach in the study of protein phosphorylation sites. Prediction tools can be grouped into two categories: kinase-specific and non-kinase-specific tools. A kinase-specific prediction program requires as input both a protein sequence and the type of a kinase. It produces some measure of the likelihood that each S/T/Y residue in the sequence is phosphorylated by the chosen kinase. In contrast, a non-kinase-specific prediction tool requires only a protein sequence as input. It reports the likelihood that each S/T/Y residue is phosphorylated by any possible kinase. Non-kinase-specific tools may be able to detect phosphorylation sites for which the associated kinase is unknown, or for which few substrate sequences of the associated kinase are known. With the development of sequencing technology, demand for non-kinase-specific tools is increasing, but existing tools remain unsatisfactory in both quality and quantity. In this work, we developed a non-kinase-specific protein phosphorylation site prediction method that uses a support vector machine to integrate eight sequence-level scores. These scores are Shannon entropy (SE), relative entropy (RE), predicted protein secondary structure (SS), predicted protein disorder (PD), accessible surface area (ASA), overlapping properties (OP), averaged cumulative hydrophobicity (ACH), and k-nearest neighbor (KNN). With carefully optimized parameters and sliding-window size, our method achieved AUC values of 0.8405/0.8183/0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites in animals in ten-fold cross-validation. | http://sysbio.unl.edu/PhosphoSVM/ | 24623121 |
6 | PhosphoRice | PhosphoRice, a meta-predictor of rice-specific phosphorylation sites, was constructed by integrating the phosphorylation site predictors NetPhos2.0, NetPhosK, KinasePhos, Scansite, DISPHOS and PredPhospho. Their parameters were selected by restricted grid search and random search. It achieves a 7.1% increase in MCC and a 4.6% increase in accuracy (ACC) over the best individual predictor (Disphos_default). | https://github.com/PEHGP/PhosphoRice | 22305189 |
7 | PhosphoPICK | We're a bioinformatics group at the University of Queensland, Australia. Our research aims to develop, investigate and apply bioinformatics methodologies to understand and resolve a range of open problems in genomics, molecular and systems biology. Recent applications involve protein sorting, nuclear protein organisation, mechanisms of transcriptional regulation, sequence and structure determinants of protein function and modification, and protein engineering. | http://bioinf.scmb.uq.edu.au/phosphopick/phosphopick | 25304781 |
8 | PHOSITE | SUMMARY: The prediction of significant short functional protein sequences has inherent problems. In predicting phosphorylation sites, problems arise from the shortness of phosphorylation sites and the difficulty of maintaining many different predefined models of binding sites. Further problems are the difficulty of obtaining highly sensitive predictions and the difficulty of obtaining predictions with constant sensitivity and specificity. The algorithm presented in this paper overcomes these problems. The proposed algorithm, PHOSITE, is based on case-based sequence analysis, which enables the prediction of phosphorylation sites with constant specificity and sensitivity. Furthermore, the method not only predicts phosphorylation sites in general but also predicts the most probable type of kinase involved. AVAILABILITY: The tool PHOSITE, which implements the presented method, can be accessed at http://www.phosite.com. | http://www.phosite.com | 15297298 |
9 | PHOSFER | MOTIVATION: Phosphorylation is the most important post-translational modification in eukaryotes. Many computational phosphorylation site prediction tools exist for mammals, and a few were created specifically for Arabidopsis thaliana, but none are currently available for other plants. RESULTS: In this article, we propose a novel random forest-based method called PHOSFER (PHOsphorylation Site FindER) that applies phosphorylation data from other organisms to enhance the accuracy of predictions in a target organism. As a test case, PHOSFER is applied to phosphorylation sites in soybean. We show that it predicts soybean sites more accurately than both the existing Arabidopsis-specific predictors and a simpler machine-learning scheme that uses only known phosphorylation sites and non-phosphorylation sites from soybean. In addition to soybean, PHOSFER will be extended to other organisms in the near future. | http://saphire.usask.ca/saphire/phosfer/index.html | 23341503 |
10 | PPRED | One of the most critical cellular phenomena is the phosphorylation of proteins, which is involved in signal transduction in various processes including the cell cycle, proliferation and apoptosis. Phosphorylation is catalyzed by protein kinases that act on certain acceptor residues (serine, threonine and tyrosine) in substrate sequences. Experiments by 2D-gel electrophoresis indicate that 30-50% of the proteins in a eukaryotic cell undergo phosphorylation. Accurate prediction of the phosphorylation sites of eukaryotic proteins will therefore help in understanding overall intracellular events. Both experimental and computational methods have been developed to investigate phosphorylation sites. In vivo and in vitro methods are often time-consuming and expensive, and they can be limited by the restrictions of enzymatic reactions. In contrast, in silico prediction can provide fast, automatic annotation of candidate phosphorylation sites. This will be an important advance in many areas of current molecular biology and very helpful for disease-related research and drug design. We have developed a prediction system (PPRED) that incorporates the evolutionary information of proteins to train SVMs. It can accurately predict phosphorylation sites from given protein sequences and can be used to analyze the importance of such information for devising generalized prediction systems. | biomecis.uta.edu/~ashis/res/ppred/ | 20492656 |
11 | PPSP | As a reversible and dynamic post-translational modification of proteins, phosphorylation plays an essential regulatory role in a broad spectrum of biological cellular processes. Conventional experimental identification of protein kinase (PK)-specific phosphorylation sites on substrates in vivo and in vitro has provided the foundation for understanding the mechanisms of phosphorylation dynamics. However, these experiments are often time-consuming and expensive. Moreover, the enzymatic activity of PKs is usually diminished or impeded in vitro, which greatly hampers studies of phosphorylation. In silico prediction of PK-specific phosphorylation sites is therefore urgently needed to guide further experimental work. In this work, we present a novel, versatile and comprehensive program, PPSP (Prediction of PK-specific Phosphorylation site), based on Bayesian decision theory. Trained on an unambiguous, experimentally verified data set, PPSP accurately predicts bona fide phosphorylation sites for 68 PK groups. | http://ppsp.biocuckoo.org/ | 16549034 |
12 | Predikin | Predikin is a system to predict the substrate specificity of protein kinases. It can be used to: predict the most likely phosphorylation site for a specific protein kinase; predict the most likely protein kinase for a phosphorylation site; and make predictions about whole proteomes. Full details are given in the published articles on Predikin and in its documentation pages. | http://predikin.biosci.uq.edu.au/ | 18477637 |
13 | PredPhospho | MOTIVATION: Phosphorylation is involved in diverse signal transduction pathways. Predicting phosphorylation sites and their kinases from primary protein sequences provides valuable information that can form the basis for further research. Using support vector machines, we attempted to predict phosphorylation sites and the type of kinase that acts at each site. RESULTS: Our prediction system was limited to phosphorylation sites catalyzed by four protein kinase families and four protein kinase groups. The accuracy of the predictions ranged from 83 to 95% at the kinase family level and from 76 to 91% at the kinase group level. The resulting prediction system, PredPhospho, can be applied to the functional study of proteins. It can also help predict changes in phosphorylation sites caused by amino acid variations at intra- and interspecies levels. | http://www.ngri.re.kr/proteo/PredPhospho.htm | 15231530 |
14 | PSEA | Protein phosphorylation catalysed by kinases plays crucial regulatory roles in intracellular signal transduction. The number of identified kinase-specific phosphorylation sites and disease-related phosphorylation substrates keeps increasing. This motivates exploring the regulatory relationships between protein kinases and disease-related phosphorylation substrates. In this work, we analysed the kinase characteristics of all disease-related phosphorylation substrates using our Phosphorylation Set Enrichment Analysis (PSEA) method. We evaluated the efficiency of our method with an independent test and concluded that our approach is helpful for identifying the kinases responsible for phosphorylated substrates. In addition, we found that the mitogen-activated protein kinase (MAPK) and glycogen synthase kinase (GSK) families are more associated with abnormal phosphorylation. Our method may help to elucidate the mechanisms of phosphorylation and the relationships between kinases and phosphorylation-related diseases. | http://bioinfo.ncu.edu.cn/PKPred_Home.aspx | 24681538 |
15 | PTMPred | Recent efforts to develop a universal view of complex networks have created both excitement and confusion about the way in which knowledge of network structure can be used to understand, control, or design system behavior. This paper offers perspective on the emerging field of “network science” in three ways. First, it briefly summarizes the origins, methodological approaches, and most celebrated contributions within this increasingly popular field. Second, it contrasts the predominant perspective in the network science literature (that abstracts away domain-specific function and instead focuses on graph-theoretic measures of system structure and dynamics) with that of engineers and practitioners of decision science (who emphasize the importance of network performance, constraints, and trade-offs). Third, it proposes optimization-based reverse engineering to address some important open questions within network science from an operations research perspective. We advocate for increased, yet cautious, participation in this field by operations researchers. | http://doc.aporc.org/wiki/PTMPred | 24291233 |
16 | RLIMS-P | RLIMS-P is a rule-based text-mining program specifically designed to extract protein phosphorylation information on protein kinase, substrate and phosphorylation sites from biomedical literature (Hu et al., 2005). RLIMS-P currently works on PubMed abstracts and open access full text articles. | http://research.bioinformatics.udel.edu/rlimsp/ | 25122463 |
17 | ViralPhos | ViralPhos is a web server for identifying potential virus phosphorylation sites with substrate motifs. Phosphorylation of virus proteins is linked to viral replication, which leads to an inhibition of normal host-cell functions. This has motivated efforts to further elucidate the process of phosphorylation in viral proteins. However, few studies have investigated substrate motifs for identifying virus phosphorylation sites. Additionally, the mass spectrometry-based experiments used to investigate such sites tend to be time-consuming and labor-intensive. In this study, 329 experimentally verified phosphorylation fragments on 111 virus proteins were collected from virPTM. They were clustered into subgroups of significantly conserved motifs using a recursive statistical method. Two-layered support vector machines (SVMs) are then applied to train a predictive model for the identified substrate motifs. The SVM models are evaluated using five-fold cross-validation, which yields an average accuracy of 0.86 for serine and 0.81 for threonine. Furthermore, the proposed method is shown to perform on par with three other phosphorylation site prediction tools: PPSP, KinasePhos 2.0 and GPS 2.1. ViralPhos thus aims to investigate virus substrate site motifs and identify potential phosphorylation sites on virus proteins. We identified informative substrate motifs that matched several well-studied kinase groups as potential catalytic kinases for virus protein substrates. These motifs were further exploited to identify potential virus phosphorylation sites. | http://csb.cse.yzu.edu.tw/ViralPhos/ | 24564381 |
18 | DISPHOS | DISPHOS computationally predicts serine, threonine and tyrosine phosphorylation sites in proteins. The new version of the predictor (DISPHOS 1.3) was trained on over 2000 non-redundant experimentally confirmed protein phosphorylation sites (1,079 serine sites, 666 threonine sites, and 375 tyrosine sites). The new set of phosphorylation sites was augmented with entries from SwissProt R44, the Phospho.ELM database, and the literature. Regions adjacent to phosphorylation sites closely resemble intrinsically disordered protein regions in amino acid composition, sequence complexity, hydrophobicity, charge and other sequence attributes. This suggests that disorder in and around the potential phosphorylation target site is an important prerequisite for phosphorylation. Thus, DISPHOS uses disorder information to improve the discrimination between phosphorylation and non-phosphorylation sites. The accuracy of DISPHOS reaches 81.3% +/- 2.2% for serine, 74.8% +/- 2.5% for threonine, and 79.0% +/- 2.4% for tyrosine. DISPHOS was applied to ordered and disordered protein regions, as well as to various functional protein categories and proteomes. The results provide strong support for the hypothesis that protein phosphorylation predominantly occurs in regions of intrinsic disorder. An executable version of DISPHOS 1.3 was developed in collaboration with Molecular Kinetics, Inc. This predictor is also available on the Molecular Kinetics website: http://www.pondr.com | http://www.dabi.temple.edu/disphos/ | 14960716 |
19 | KinomeXplorer | KinomeXplorer is an integrated framework for modeling kinase-substrate interactions and aiding the design of inhibitor-based follow-up perturbation experiments. An interactive web interface allows investigation of predicted kinase-substrate interactions from human and major eukaryotic model organisms. | http://kinomexplorer.info/ | 24874572 |
20 | PhoScan | Protein phosphorylation plays important roles in a variety of cellular processes. Detecting possible phosphorylation sites and their corresponding protein kinases is crucial for studying the function of many proteins. This article presents a new prediction system, called PhoScan, that predicts phosphorylation sites in a kinase-family-specific way. Common phosphorylation features and kinase-specific features are extracted from substrate sequences of different protein kinases based on the analysis of published experiments. A scoring system is then developed to evaluate the likelihood that a peptide can be phosphorylated by the protein kinase at a specific site in its sequence context. PhoScan can achieve a specificity above 90% with a sensitivity around 90% at the kinase-family level on the experimental data. The system is applied to a set of human proteins collected from Swiss-Prot, and sets of putative phosphorylation sites are predicted for the protein kinase A, cyclin-dependent kinase, and casein kinase 2 families. PhoScan is available at http://bioinfo.au.tsinghua.edu.cn/phoscan/. | http://bioinfo.au.tsinghua.edu.cn/phoscan/ | 17680694 |
21 | AMS | We present here the 2011 update of the AutoMotif Service (AMS 4.0), which predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTMs) in protein sequences. The experimentally confirmed modifications used for training are acquired from the latest UniProt and Phospho.ELM databases. The sequence vicinity of each modified residue is represented using amino acid physico-chemical features encoded with high quality indices (HQI). These indices are obtained by automatic clustering of known indices extracted from the AAindex database. For each type of numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising a different objective during training (for example, recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of a machine learning algorithm with the data fusion of different training-object representations. The aim is to boost the overall prediction accuracy for conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7%, as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most difficult sequence motif types, it improves prediction performance by almost 32% compared with the previously used single machine learning methods. In summary, the brainstorming consensus meta-learning methodology boosts the AUC score to around 89% on average over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with a comparison to previously published methods and state-of-the-art software tools. The source code and precompiled binaries of the brainstorming tool are available at http://code.google.com/p/automotifserver/ under the Apache 2.0 license. | https://code.google.com/p/automotifserver/ | 22555647 |
22 | CKSAAP_PhSite | As one of the most widespread protein post-translational modifications, phosphorylation is involved in many biological processes, such as the cell cycle and apoptosis. Identification of phosphorylated substrates and their corresponding sites will facilitate understanding of the molecular mechanisms of phosphorylation. Compared with labor-intensive and time-consuming experimental approaches, computational prediction of phosphorylation sites is desirable for its convenience and speed. In this paper, a new bioinformatics tool named CKSAAP_PhSite was developed that ignores kinase information and uses only primary sequence information to predict protein phosphorylation sites. CKSAAP_PhSite uses the composition of k-spaced amino acid pairs as its encoding scheme and a support vector machine as the predictor. For serine, CKSAAP_PhSite achieved a sensitivity of 84.81%, a specificity of 86.07% and an accuracy of 85.43%. For threonine, it achieved a sensitivity of 78.59%, a specificity of 82.26% and an accuracy of 80.31%. For tyrosine, it achieved a sensitivity of 74.44%, a specificity of 78.03% and an accuracy of 76.21%. Experimental results from cross-validation and an independent benchmark suggest that the method is very promising for predicting phosphorylation sites and can serve as a useful supplementary tool for the community. For public access, CKSAAP_PhSite is available at http://59.73.198.144/cksaap_phsite/. | http://59.73.198.144/cksaap_phsite/ | 23110047 |
23 | CRPhos | Welcome to the pTools webserver. This website is a joint development by the Centre for Proteome Analysis and the Intelligent Systems Lab at the University of Antwerp. Here we present in-house developed tools for protein and proteome (data) analysis. Downloadable code is shared here whenever a project is considered sufficiently mature; in the meantime, these pages give an overview of the current status of some projects. Although the proteomics field is rapidly evolving, data analysis is still a major bottleneck in proteome analysis. Sharing data, databases and tools among the research community is one of our goals. The pTools website has a sister site, called pData, on which we share experimental proteomic datasets. | http://www.ptools.ua.ac.be/CRPhos | 18940828 |
24 | DAPPLE | DAPPLE is an alternative to machine-learning approaches for predicting phosphorylation sites in an organism of interest. It is a pipeline of BLAST searches that uses experimentally determined phosphorylation sites in one or more organisms to predict phosphorylation sites in the organism of interest. It outputs a table in tab-delimited text format, which can easily be imported into a spreadsheet program such as Excel. The table contains information helpful for choosing phosphorylation sites of interest, including: the number of sequence differences between the query site and the hit site; the locations of the query site and the hit site in their respective intact proteins; and whether the corresponding intact proteins are reciprocal BLAST hits (and thus predicted orthologues). DAPPLE is available both as a web interface and as a downloadable .zip package, with setup instructions, for running on a local machine. | http://saphire.usask.ca/saphire/dapple/index.html | 23658419 |
25 | GPS | Protein phosphorylation is the most ubiquitous post-translational modification (PTM) and plays important roles in most biological processes. Identification of site-specific phosphorylated substrates is fundamental for understanding the molecular mechanisms of phosphorylation. Besides experimental approaches, computational prediction of potential candidates has also attracted great attention for its convenience and speed. In this review, we present a comprehensive but brief summary of computational resources for protein phosphorylation. These include phosphorylation databases, prediction of non-specific or organism-specific phosphorylation sites, prediction of kinase-specific phosphorylation sites or phospho-binding motifs, and other tools. A testing data set taken from four high-throughput experiments is also provided. Computational studies without web links to databases or tools are not included in this compendium, since experimentalists cannot easily use them directly. | http://gps.biocuckoo.org/ | 21062758 |
26 | HMMpTM | During the last decades a large number of computational methods have been developed for predicting transmembrane protein structure and topology. Current predictors rely on two topogenic signals in the protein sequence: the distribution of positively charged residues in extra-membrane loops and the existence of N-terminal signals. However, phosphorylation and glycosylation are post-translational modifications (PTMs) that occur in a compartment-specific manner and therefore the presence of a phosphorylation or glycosylation site in a transmembrane protein provides topological information. Here we report a Hidden Markov Model based method capable of predicting the topology of transmembrane proteins and the existence of kinase specific phosphorylation and N/O-linked glycosylation sites across the protein sequence. Our method integrates a novel feature in transmembrane protein topology prediction which results in improved performance for topology prediction and reliable prediction of phosphorylation and glycosylation sites when compared to currently available predictors. | http://aias.biol.uoa.gr/HMMpTM/ | 24225132 |
27 | KinasePhos | Protein phosphorylation is an important reversible post-translational modification of proteins, and it affects many essential cellular processes. Because of the importance of protein phosphorylation in cellular control, many schemes and models have been proposed to predict kinase-specific phosphorylation sites. Most methods are based on consensus sequences of position probabilities. Our previous version, KinasePhos 1.0, was also a web server of this kind. The known phosphorylation sites from public data sources are categorized by their annotated protein kinases. In the previous version, profile hidden Markov models were learned from the kinase-specific groups of phosphorylation sites. After the learned models were evaluated, the model with the highest accuracy was selected from each kinase-specific group for use in a web-based tool for identifying protein phosphorylation sites. This gave a kinase-specific phosphorylation site prediction tool with both high sensitivity and specificity. The current release of KinasePhos, version 2.0, adopts sequence-based amino acid coupling-pattern analysis and solvent accessibility as new features for an SVM (support vector machine) that characterizes phosphorylation sites. The coupling-pattern feature [XdZ] denotes the coupling pattern of amino acid types X and Z separated by d amino acids. We use the coupling strength CXdZ defined by coupling-pattern analysis and compute its differences between the positive and negative sets of phosphorylation proteins. The 250 features with the largest differences in CXdZ are selected and used to build SVM models, which are then evaluated by cross-validation. This prediction model achieves about 95% prediction accuracy, a 7% improvement over the previous version. Compared with other tools, the features chosen for SVM model-building produced the best predictions at the time of publication. | http://kinasephos2.mbc.nctu.edu.tw/ | 17517770 |
28 | MetaPredPS | Remarkable morphological anomalies were observed in a female of Hoplopleura capitosa found on Mus musculus caught in Niemirówek, the Tomaszów district (Poland). The anomalies concerned the shape and chaetotaxis of some parapleural plates on the abdomen, which constitute one of the basic taxonomical features of Anoplura. | http://metapred.biolead.org/MetaPredPS/ | 1823471 |
29 | Musite | To address the various limitations of current tools when applied to proteomes, and to better utilize the large number of experimentally verified phosphorylation sites, we developed a unique standalone application, Musite. It is specifically designed for large-scale prediction of both general and kinase-specific phosphorylation sites. Musite uses local sequence similarity patterns (KNN scores) and generic features (disorder scores and amino acid frequencies) of phosphorylation sites, and employs a comprehensive machine learning approach to make predictions. Musite is the first tool that allows users to train a phosphorylation site prediction model on their own data, and it supports continuous adjustment of stringency levels. Musite provides a user-friendly graphical user interface, which makes it easy for biologists to perform predictions in an automated fashion. Applying Musite to six proteomes yielded tens of thousands of putative phosphorylation sites with high stringency. These predictions provide useful hypotheses for experimental validation. Cross-validation tests show that Musite significantly outperforms existing tools for predicting general phosphorylation sites and is at least comparable to those for predicting kinase-specific phosphorylation sites. Moreover, as open-source software, Musite can also serve as an open platform for building machine learning applications for phosphorylation site prediction. | http://musite.sourceforge.net/ | 20702892 |
30 | NetPhorest | NetPhorest is part of KinomeXplorer, an integrated framework for modeling kinase-substrate interactions and aiding the design of inhibitor-based follow-up perturbation experiments. An interactive web interface allows investigation of predicted kinase-substrate interactions in human and major eukaryotic model organisms. | http://netphorest.info/ | 18765831 |
31 | NetPhos | NetPhos produces neural network predictions of serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins. It is provided by the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark, which was formed in 1993 and conducts basic research in bioinformatics and systems biology. The group of more than 90 scientists, working in ten specialist research groups, has a highly multidisciplinary profile (molecular biologists, biochemists, medical doctors, physicists and computer scientists), with a 2:1 ratio of biological to non-biological backgrounds. CBS is one of the larger academic bioinformatics groups in Europe. Bioinformatics refers to the combination of methods in biology, computation and information management that are necessary to advance research on all aspects of living systems, from individual molecules, cells and organs to entire organisms. Today, research in molecular biology, biotechnology and pharmacology depends on information technology all the way from experiment to publication of the results. Comprehensive public databases of DNA and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling have been established to make the best scientific use of the explosion of biological data. Unlike many other groups in biomolecular informatics, CBS directs its research primarily towards elucidating the functional aspects of complex biological mechanisms. Its current concerns include the reliable computational interpretation of a wide range of experimental data and a detailed understanding of the molecular apparatus behind the cellular processing of sequence information. By exploiting available experimental data and evidence in the design of algorithms, sequence correlations and other features of biological significance can be inferred. In addition to its computational research, the center carries out experimental work on gene expression analysis using DNA chips and generates data on the physical and structural properties of DNA. Over the last decade, CBS has produced a large number of computational methods, which are offered to others via web servers. Building on bioinformatics efforts started in the late 1980s, the center was formally established in 1993 with a grant from the Danish National Research Foundation. | http://www.cbs.dtu.dk/services/NetPhos/ | 10600390 |
32 | NetPhosK | NetPhosK produces neural network predictions of kinase-specific phosphorylation sites in eukaryotic proteins. Like NetPhos, it is provided by the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark. | http://www.cbs.dtu.dk/services/NetPhosK/ | 15174133 |
33 | NetPhosYeast | NetPhosYeast predicts serine and threonine phosphorylation sites in yeast proteins using neural networks. Like NetPhos, it is provided by the Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark. | http://www.cbs.dtu.dk/services/NetPhosYeast/ | 17282998 |
34 | phos_pred | Reversible protein phosphorylation is one of the most important post-translational modifications and regulates various cellular processes. Identifying kinase-specific phosphorylation sites helps in understanding phosphorylation mechanisms and regulatory processes. Although a number of computational approaches have been developed, few studies have considered the hierarchical structure of kinases, and most existing tools use only local sequence information to build predictive models. In this work, we conduct a systematic, hierarchy-specific investigation of protein phosphorylation site prediction in which protein kinases are clustered into a four-level hierarchy: kinase, subfamily, family and group. To improve prediction at all hierarchical levels, functional information about proteins, including gene ontology (GO) and protein-protein interaction (PPI) data, is used in addition to the primary sequence to build random forest (RF) prediction models. Analysis of the selected GO and PPI features shows that functional information is critical for determining phosphorylation sites at every hierarchical level. Furthermore, results on Phospho.ELM and an additional test dataset demonstrate that the proposed method substantially outperforms existing phosphorylation prediction methods at all hierarchical levels. The method is freely available at http://bioinformatics.ustc.edu.cn/phos_pred/. | http://bioinformatics.ustc.edu.cn/phos_pred/ | 24452754 |
35 | Phos3D | Phos3D is a web server for predicting phosphorylation sites (P-sites) in proteins, originally designed to investigate the benefits of including spatial information in P-site prediction. The approach is based on support vector machines trained on sequence profiles enhanced with information from the spatial context of experimentally identified P-sites. In addition to general serine, threonine and tyrosine P-sites, Phos3D can predict kinase-specific phosphorylation by the serine kinases PKA, PKC, MAPK and CKII and by the tyrosine kinase SRC. Prediction quality depends strongly on the quality of the submitted protein structures. | http://phos3d.mpimp-golm.mpg.de/ | 19383128 |
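
Most of the sequence-based phosphorylation predictors above (for example Musite and phos_pred) start by cutting a fixed-length window around each candidate serine, threonine or tyrosine and then computing features on that window. The sketch below shows only that windowing step. The window half-width of 7 (a 15-residue window) and the padding letter `X` are illustrative assumptions, not parameters taken from any of the listed tools.

```python
def site_windows(seq, residues="STY", half_width=7, pad="X"):
    """Return (1-based position, residue, window) for every candidate residue.

    Each window holds the residue plus `half_width` neighbours on each side;
    positions that fall off either end of the sequence are filled with `pad`.
    The default half-width is an illustrative assumption.
    """
    seq = seq.upper()
    padded = pad * half_width + seq + pad * half_width
    width = 2 * half_width + 1
    windows = []
    for i, aa in enumerate(seq):
        if aa in residues:
            # seq[i] sits at padded[i + half_width], so its window starts at padded[i].
            windows.append((i + 1, aa, padded[i:i + width]))
    return windows
```

For example, `site_windows("MSK", half_width=2)` centres a five-residue window on the serine at position 2, padding the missing N-terminal neighbour with `X`. Feature extraction and the classifier itself (KNN scores, SVMs, random forests) would be applied to these windows and are not shown.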
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | PrePS | PrePS (Prenylation Prediction Suite) combines three predictors, for protein CaaX farnesylation, CaaX geranylgeranylation and Rab geranylgeranylation, in one web interface. The predictors aim to model substrate-enzyme interactions by refining the recognition motifs of each prenyltransferase. Motif information was extracted from sets of known substrates (learning sets), and specific scoring functions were created using both sequence and physical property profiles, including interpositional correlations, to account for partially overlapping substrate specificities. PrePS selectively assigns the modifying enzyme to predicted substrate proteins and sensitively filters out false-positive predictions, using a methodology already applied successfully to the prediction of GPI anchors, myristoylation and PTS1 peroxisomal targeting. | http://mendel.imp.univie.ac.at/sat/PrePS | 15960807 |
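
The CaaX motif that PrePS refines can be illustrated with a crude shape check: a cysteine fourth from the C-terminus followed by three residues. This is a minimal sketch, assuming only that the query ends at the true C-terminus. It is not PrePS's method, which scores substrates with sequence and physical-property profiles rather than a regular expression, and it does not distinguish farnesylation from geranylgeranylation.

```python
import re

# C, then two residues (the "aa"), then the C-terminal residue X, anchored at the end.
# Only the motif shape is checked; enzyme specificity is not modelled.
CAAX_BOX = re.compile(r"C[A-Z]{2}[A-Z]$")


def caax_box(seq):
    """Return the C-terminal CaaX tetrapeptide if the sequence ends with one, else None."""
    cleaned = "".join(seq.split()).upper()
    match = CAAX_BOX.search(cleaned)
    return match.group(0) if match else None
```

A sequence ending in `...CVLS` returns `"CVLS"`, while one ending in `...KCVL` returns `None`. Treat a hit as a necessary precondition to be scored by a real predictor, not as a prediction.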
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | GPS-PUP | The 2004 Nobel Prize in Chemistry was awarded to Aaron Ciechanover, Avram Hershko and Irwin Rose for their discovery of ubiquitin-mediated protein degradation (Vogel, G. et al., 2004). Numerous subsequent studies showed that selective degradation by ubiquitination is a critical mechanism by which eukaryotes regulate cellular processes such as the cell cycle and division, immune response and inflammation, and signal transduction. More recently, prokaryotic ubiquitin-like protein (PUP) was identified as a tagging system in prokaryotes (Pearce, M. J. et al., 2008); PUP is coupled to its targets through deamidation by Dop (PUP deamidase/depupylase) followed by conjugation catalyzed by PafA (PUP-protein ligase) (Striebel, F. et al., 2009). Although the PUP-proteasome system needs further characterization, the discovery of this degradation mechanism opens the door to investigating dynamic protein regulation in Mycobacterium, which could be targeted by pathogen-specific drugs (Salgame, P. et al., 2008). In this regard, identifying pupylated substrates and their sites could provide fundamental insights into cellular processes in Mycobacterium. GPS-PUP predicts pupylation sites in prokaryotic proteins computationally. | http://pup.biocuckoo.org/ | 21850344 |
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | GSTPred | GSTPred is a web server trained specifically to predict glutathione S-transferase (GST) proteins. Predictions are made by support vector machines (SVMs) using amino acid, dipeptide or tripeptide composition, and results are displayed in the web browser as a table giving the name and score of each sequence. The authors report accuracies of 91.59% for the amino acid composition model, 95.79% for the dipeptide composition model and 97.66% for the tripeptide composition model. Users can paste a sequence into the input box or upload a sequence file, then choose the composition model (simple, dipeptide or tripeptide) and a threshold. The supplementary dataset and a standalone version of the GSTPred software can be downloaded free of charge. | http://www.imtech.res.in/raghava/gstpred/ | 17627599 |
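
The three GSTPred input encodings are k-mer composition vectors: 20 values for amino acid composition (k = 1), 400 for dipeptides (k = 2) and 8000 for tripeptides (k = 3). The sketch below computes them as normalized overlapping k-mer frequencies. Skipping k-mers that contain non-standard letters is an assumption of this sketch, not a documented GSTPred behaviour.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq, k):
    """Fraction of each of the 20**k possible k-mers among the overlapping k-mers of seq.

    k-mers containing letters outside the 20 standard amino acids are skipped
    (an assumption of this sketch). Returns a list ordered like itertools.product.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]
```

Each vector could then be passed to an SVM implementation such as scikit-learn's `SVC`; the kernel and parameters GSTPred actually used are not stated here.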
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | GPS-SNO | GPS-SNO predicts cysteine S-nitrosylation sites in proteins using the Group-based Prediction System (GPS) algorithm, from the same group that developed GPS-PUP and GPS-SUMO. | http://sno.biocuckoo.org/ | 20585580 |
2 | iSNO-AAPair | The iSNO-AAPair web server predicts cysteine S-nitrosylation sites in proteins. Caveats: (1) To obtain predictions with the anticipated success rate, submit the entire sequence of the query protein rather than a fragment; a sequence with fewer than 50 amino acid residues is generally considered a fragment. (2) The accepted characters are A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y and the dummy code X; if the query sequence contains any other character, the prediction is stopped. | http://app.aporc.org/iSNO-AAPair/index.html | 24109555 |
3 | iSNO-PseAAC | The iSNO-PseAAC web server predicts cysteine S-nitrosylation sites in proteins using pseudo amino acid composition (PseAAC) features. The same input caveats as for iSNO-AAPair apply: submit full-length sequences (at least 50 residues) written only with the 20 standard amino acid letters and the dummy code X. | http://app.aporc.org/iSNO-PseAAC/ | 23409062 |
4 | PSNO | S-nitrosylation (SNO) is one of the most widespread reversible post-translational modifications and is involved in many biological processes. Malfunction or dysregulation of SNO has been linked to a range of severe diseases, including developmental abnormalities. Identifying SNO sites (SNOs) therefore provides insights into disease progression and drug development. PSNO is a bioinformatics tool that identifies SNOs from protein sequences. First, it explores a range of sequence-derived discriminative features, including the evolutionary profile, the predicted secondary structure and physicochemical properties. Second, rather than simply combining these features, which may introduce redundancy and noise, it uses relative entropy selection and incremental feature selection to choose optimal feature subsets. Third, the model is trained with the k-nearest neighbor (KNN) algorithm. With these features and feature selection, PSNO achieves a mean Matthews correlation coefficient (MCC) of 0.5119 on the training dataset under 10-fold cross-validation, which the authors consider competitive with state-of-the-art SNO site predictors. A web server implementing PSNO is freely available at http://59.73.198.144:8088/PSNO/. | http://59.73.198.144:8088/PSNO/ | 24968264 |
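
The iSNO-AAPair input caveats can be checked locally before submission. The sketch below assumes a plain or FASTA-formatted string (header lines starting with `>` are dropped) and uppercases the input; whether the server itself accepts lowercase letters is not stated, so that normalization is an assumption. It also lists the 1-based cysteine positions, which are the candidate S-nitrosylation sites.

```python
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")  # 20 standard letters plus the dummy code X
MIN_FULL_LENGTH = 50  # shorter sequences are treated as fragments


def check_query(seq):
    """Validate a query against the stated caveats.

    Returns (cleaned sequence, is_fragment, 1-based cysteine positions).
    Raises ValueError if any character outside ALLOWED is present.
    """
    # Drop FASTA header lines, then remove all whitespace and uppercase (assumption).
    body = "".join(line for line in seq.splitlines() if not line.startswith(">"))
    cleaned = "".join(body.split()).upper()
    illegal = sorted(set(cleaned) - ALLOWED)
    if illegal:
        raise ValueError("illegal characters: " + "".join(illegal))
    is_fragment = len(cleaned) < MIN_FULL_LENGTH
    cysteines = [i + 1 for i, aa in enumerate(cleaned) if aa == "C"]
    return cleaned, is_fragment, cysteines
```

A query flagged as a fragment can still be submitted, but per the caveats its predictions may not reach the anticipated success rate.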
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | iSuc-PseAAC | The iSuc-PseAAC web server predicts lysine succinylation sites in proteins using pseudo amino acid composition (PseAAC) features. | http://app.aporc.org/iSuc-PseAAC/ | 26084794 |
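
Pseudo amino acid composition extends the 20 amino acid frequencies with λ sequence-order correlation factors. The sketch below implements Chou's original (type-1) PseAAC as a generic illustration: each property is standardized over the 20 amino acids, θ_j averages the squared property difference between residues j positions apart, and all 20 + λ components are divided by a common normalizer so that they sum to 1. The encodings actually used by iSNO-PseAAC and iSuc-PseAAC differ in their details; the Kyte-Doolittle hydropathy scale, λ = 3 and weight w = 0.05 are example choices only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy, used here only as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale a property to zero mean and unit standard deviation over the 20 amino acids."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (prop[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac(seq, properties, lam=3, w=0.05):
    """Type-1 PseAAC vector of length 20 + lam for one sequence."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    thetas = []
    for j in range(1, lam + 1):
        # Average correlation between residues j positions apart.
        total = 0.0
        for i in range(n - j):
            a, b = residues[i], residues[i + j]
            total += sum((p[b] - p[a]) ** 2 for p in props) / len(props)
        thetas.append(total / (n - j))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

With a single property and λ = 3, `pseaac(seq, [KYTE_DOOLITTLE])` returns 23 non-negative values. A homopolymer such as `"AAAAAA"` has all θ_j equal to zero, so its vector reduces to plain composition.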
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | JASSA | SUMOylation is a post-translational modification, conserved from yeast to human, that modulates several fundamental cellular processes and has been shown to be involved in human disorders. It consists of the covalent attachment of a small ubiquitin-related modifier protein (SUMO) to a target protein by a mechanism similar to that of ubiquitination (Melchior, 2000; Geiss-Friedlander and Melchior, 2007). There are at least three SUMO isoforms (SUMO1, SUMO2 and SUMO3) in mammalian cells but only one in Saccharomyces cerevisiae (Smt3). The modification involves a cascade of SUMO-specific enzymes (for reviews, see Johnson, 2004; Kerscher et al., 2006; Martin et al., 2007). First, SUMO is proteolytically cleaved to expose the C-terminal diglycine motif required for conjugation. SUMO is then activated in an ATP-dependent manner by the heterodimeric SUMO-activating enzyme SAE1/SAE2, transferred to the conjugating enzyme Ubc9 and conjugated to the target substrate protein. This process can be enhanced by a growing number of E3 ligases. SUMO is reversibly and covalently conjugated to an acceptor lysine residue (K) of the substrate, which often lies within the consensus sequence ΨKxE (where Ψ is a hydrophobic residue and x is any amino acid; Melchior, 2000; Rodriguez et al., 2001). An inverted SUMOylation motif ([E/D]xKΨ) has also been reported (Matic et al., 2010) (Table 1). Extended SUMO consensus motifs, such as the inverted consensus, PDSM (phosphorylation-dependent SUMOylation motif) (Gregoire et al., 2006; Hietakangas et al., 2006; Shalizi et al., 2006), NDSM (negatively charged amino acid-dependent SUMOylation motif) (Yang et al., 2006), the SUMO-acetyl switch (Stankovic-Valentin et al., 2007) and HCSM (hydrophobic cluster SUMOylation motif) (Matic et al., 2010; Hietakangas et al., 2006), have also been reported (for a review, see Martin et al., 2007). Moreover, elements flanking the core motif (such as acidic, phosphorylatable or proline residues) may affect SUMO conjugation (Table 1) (Gareau and Lima, 2011). Notably, about 25% of experimentally validated SUMOylated sites do not match any of these motifs. Conversely, not all sites that match the consensus are modified, probably because SUMO is conjugated only to residues appropriately presented to the SUMOylation machinery. SUMO can also interact non-covalently with proteins harboring SUMO-interacting motifs (SIMs), also known as SUMO-binding domains (SBDs) or motifs (SBMs), which typically consist of a hydrophobic core (Kerscher, 2007) (Table 1). Negatively charged residues flanking the SIM may contribute to binding orientation and/or isoform specificity (Hannich et al., 2005; Hecker et al., 2006), and a role for phosphorylation near the SIM (phosphoSIM) has been reported (Stehmeier and Muller, 2009). | http://www.jassa.fr/ | 26142185 |
2 | SUMOplot | The SUMOplot™ Analysis Program predicts and scores sumoylation sites in your protein. This post-translational modification may explain a larger-than-expected molecular weight on SDS gels, because the SUMO protein (11 kDa) can be attached at multiple positions. SUMO-1 (small ubiquitin-related modifier; also known as PIC1, UBL1, Sentrin, GMP1 and Smt3) is a member of the ubiquitin and ubiquitin-like superfamily. Most SUMO-modified proteins contain the tetrapeptide motif B-K-x-D/E, where B is a hydrophobic residue, K is the lysine conjugated to SUMO, x is any amino acid (aa) and D or E is an acidic residue. SUMOplot™ predicts the probability that a SUMO consensus sequence (SUMO-CS) is engaged in SUMO attachment. Its scoring system is based on two criteria: (1) a direct amino acid match to the SUMO-CS and (2) substitution of consensus residues with residues of similar hydrophobicity. | http://www.abgent.com/sumoplot | |
3 | SUMOhydro | SUMOhydro predicts sumoylated lysine (K) sites in proteins by adding hydrophobicity information to binary encoding. The predictor was trained with a support vector machine (SVM; http://svmlight.joachims.org/) and tested on a new, stringent sumoylation site dataset. The authors report that it is more powerful than traditional methods that build a single prediction model from all sumoylation sites, and that it is competitive with two existing predictors. | http://protein.cau.edu.cn/others/SUMOhydro/introduction.html | 22720073 |
4 | GPS-SUMO | GPS-SUMO predicts both SUMOylation sites and SUMO-interaction motifs (SIMs) in proteins using the Group-based Prediction System (GPS) approach; like GPS-PUP, it is hosted on biocuckoo.org. | http://sumosp.biocuckoo.org/ | 24880689 |
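
The consensus ΨKxE and inverted [E/D]xKΨ motifs described for JASSA and SUMOplot can be located with regular expressions. This is a minimal sketch: the hydrophobic set Ψ = {A, I, L, M, F, V} is an assumption (published definitions vary), and the extended motifs (PDSM, NDSM, HCSM) and SIMs are not covered. Lookahead patterns are used so that overlapping motifs are all reported.

```python
import re

HYDROPHOBIC = "AILMFV"  # assumed definition of Ψ; published sets differ

# Zero-width lookaheads so overlapping motifs are all reported.
CONSENSUS = re.compile(rf"(?=[{HYDROPHOBIC}]K[A-Z]E)")  # ΨKxE, lysine at offset 1
INVERTED = re.compile(rf"(?=[ED][A-Z]K[{HYDROPHOBIC}])")  # [E/D]xKΨ, lysine at offset 2


def find_sumo_motifs(seq):
    """Return sorted (1-based lysine position, motif type, tetrapeptide) tuples."""
    seq = seq.upper()
    hits = []
    for m in CONSENSUS.finditer(seq):
        start = m.start()
        hits.append((start + 2, "consensus", seq[start:start + 4]))
    for m in INVERTED.finditer(seq):
        start = m.start()
        hits.append((start + 3, "inverted", seq[start:start + 4]))
    return sorted(hits)
```

As the JASSA description notes, about a quarter of validated sites match no motif and many motif matches are never modified, so a scan like this only enumerates candidates.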
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | The Sulfinator | The Sulfinator is a software tool that predicts tyrosine sulfation sites in protein sequences. It employs four different hidden Markov models (HMMs) that were built to recognise sulfated tyrosine residues located N-terminally, within sequence windows of more than 25 amino acids and C-terminally, as well as sulfated tyrosines clustered within 25-amino-acid windows, respectively. All four HMMs contain the distilled information from one multiple sequence alignment. The data sets used to train and test the HMMs are available. | http://web.expasy.org/sulfinator/ | 12050077 |
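
An HMM scores a sequence by summing over all hidden-state paths that could have emitted it, which the forward algorithm does efficiently. The sketch below is a generic forward algorithm, not the Sulfinator's models: state names and probabilities are supplied by the caller, and any model passed in is hypothetical. It works in plain probability space, so long sequences will underflow; practical implementations use log space or per-step scaling.

```python
def forward_probability(seq, states, start, trans, emit):
    """P(seq | HMM) computed with the forward algorithm.

    start[s]     : probability of starting in state s
    trans[s][t]  : probability of moving from state s to state t
    emit[s][sym] : probability that state s emits symbol sym (missing means 0)
    """
    if not seq:
        return 1.0
    # alpha[s] = P(first symbols emitted, current state = s)
    alpha = {s: start[s] * emit[s].get(seq[0], 0.0) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: sum(alpha[s] * trans[s][t] for s in states) * emit[t].get(symbol, 0.0)
            for t in states
        }
    return sum(alpha.values())
```

Comparing the probability under a site model with that under a background model, for example as a log-odds ratio, is one common way such HMM scores are turned into predictions.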
# | Tool Name | Description | URL | Reference |
---|---|---|---|---|
1 | hCKSAAP_UbSite | hCKSAAP_UbSite extends CKSAAP_UbSite, a web server that predicts ubiquitination sites in proteins, with a focus on human ubiquitination sites. CKSAAP_UbSite uses an SVM with the composition of k-spaced amino acid pairs (CKSAAP) as its input feature vector. It was trained and tested on a set of experimentally verified ubiquitination sites from Radivojac et al. (Proteins, 2010, 78: 365-380), reaching a class-balanced accuracy of 73.40% and an MCC of 0.4694. Because Radivojac et al.'s dataset was drawn from the S. cerevisiae proteome, CKSAAP_UbSite is expected to perform best on S. cerevisiae proteins. | http://protein.cau.edu.cn/cksaap_ubsite/ | 23603789 |
2 | iUbiq-Lys | iUbiq-Lys is a web server that predicts ubiquitination sites in proteins. It uses an SVM with sequence features derived from evolutionary information through a grey system model applied to the position-specific scoring matrix (Grey-PSSM). | http://www.jci-bioinfo.cn/iUbiq-Lys | 25248923 |
3 | UbiPred | BACKGROUND: Ubiquitylation plays an important role in regulating protein functions. Experimental methods have recently been developed for effective identification of ubiquitylation sites. To explore undiscovered ubiquitylation sites efficiently, this study aims to develop an accurate sequence-based method for predicting promising ubiquitylation sites. RESULTS: We established a ubiquitylation dataset of 157 ubiquitylation sites and 3676 putative non-ubiquitylation sites extracted from 105 proteins in the UbiProt database. We first evaluated sequence-based features and classifiers for predicting ubiquitylation sites, assessing three kinds of features (amino acid identity, evolutionary information and physicochemical properties) and three classifiers (support vector machine, k-nearest neighbor and naive Bayes). The set of 531 physicochemical properties and the support vector machine (SVM) were the best features and classifier, respectively; their combination reached a prediction accuracy of 72.19% under leave-one-out cross-validation. We then proposed an informative physicochemical property mining algorithm (IPMA) to select an informative subset of the 531 physicochemical properties. The prediction system UbiPred uses an SVM with 31 informative physicochemical properties selected by IPMA, which improves accuracy from 72.19% to 84.44%. To further analyze these properties, the C5.0 decision tree method was used to derive if-then rules for predicting ubiquitylation sites. UbiPred uses prediction scores to screen promising ubiquitylation sites from putative non-ubiquitylation sites; applied to an independent dataset of 3424 putative non-ubiquitylation sites, it identified 23 promising ubiquitylation sites, which were also supported by the derived prediction rules. CONCLUSION: We have proposed IPMA, an algorithm for mining informative physicochemical properties from protein sequences, and used it to build the SVM-based prediction system UbiPred. UbiPred predicts ubiquitylation sites, each accompanied by a prediction score, to help biologists identify promising sites for experimental verification. UbiPred is implemented as a web server at http://iclab.life.nctu.edu.tw/ubipred. | http://iclab.life.nctu.edu.tw/ubipred | 18625080 |
4 | UbiProber | Systematic dissection of the ubiquitylation proteome is an appealing but challenging research topic because ubiquitylation plays significant roles not only in protein degradation but also in many other cellular functions. Because ubiquitylation is rapid and reversible, identifying ubiquitylation sites with conventional experimental approaches is time-consuming and labor-intensive. To discover lysine ubiquitylation sites efficiently, a highly specific in silico predictor for individual organisms is needed to guide experimental design. UbiProber is a protein ubiquitylation site predictor implemented with support vector machines; it integrates local sequence similarity to known ubiquitylation sites, physicochemical properties and amino acid composition, and uses information gain to identify the key positions and amino acids for optimizing the prediction model. Although the sequences around ubiquitin conjugation sites contain no conserved motifs, cross-validation indicates that integrating key-position and amino acid features improves predictive performance. UbiProber offers four models (Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Combined); in an independent test with a 1:1 ratio of positive to negative samples, the area under the ROC curve (AUC) of the Combined model reached 83.36%. Cross-validation tests also show that UbiProber achieves some improvement over existing tools in predicting species-specific ubiquitylation sites. | http://bioinfo.ncu.edu.cn/UbiProber.aspx | 23626001 |
5 | UbPred | UbPred is a random forest-based predictor of potential ubiquitination sites in proteins. It was trained on a combined set of 266 non-redundant experimentally verified ubiquitination sites available from our experiments and from two large-scale proteomics studies (Hitchcock, et al., 2003; Peng, et al., 2003). Class-balanced accuracy of UbPred reached 72%, whereas the AUC (area under the ROC curve) was estimated to be ~80%. | http://www.ubpred.org/ | 19722269 |
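
The CKSAAP encoding used by CKSAAP_UbSite counts, for each gap k, how often each of the 400 ordered residue pairs occurs k positions apart in a window, and normalizes by the number of such pairs. The sketch below follows that description. The choice of k_max = 3 is an illustrative assumption, and pairs involving padding or non-standard letters are counted in the denominator but not in any feature, which is also an assumption of this sketch.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window, k_max=3):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count ordered pairs (window[i], window[i + k + 1]) and
    divide by the number of such pairs, len(window) - k - 1.
    Returns 400 * (k_max + 1) features, grouped by k.
    """
    window = window.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs with padding or non-standard letters are skipped
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS)
    return features
```

With k = 0 this reduces to ordinary dipeptide composition; larger gaps capture pairwise patterns that are not adjacent in sequence. The resulting vector would be the SVM input.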