Download

The download of experimental PTM sites in dbPTM


Because the contents of several online PTM resources cannot be accessed directly, dbPTM integrates a total of 41 biological databases related to PTMs. To resolve the heterogeneity among data collected from different sources, the reported modification sites are mapped to UniProtKB protein entries by sequence comparison. Given the high throughput of mass spectrometry-based methods in post-translational proteomics, this update also includes manually curated, MS/MS-identified peptides associated with PTMs, gathered from research articles through a literature survey.

The survey proceeds as follows. First, a table of PTM-related keywords is constructed by referring to the UniProtKB PTM list (http://www.uniprot.org/docs/ptmlist.txt) and the annotations of RESID. Then, all fields in the PubMed database are searched using these keywords, and the full text of the matching research articles is downloaded. Because the proteomic identification experiments vary, a text-mining system was developed to survey the full-text literature for articles that potentially describe the site-specific identification of modified sites. To determine the locations of PTMs on a full-length protein sequence, the experimentally verified MS/MS peptides are then mapped to UniProtKB protein entries based on their database identifier (ID) and sequence identity. During this mapping, MS/MS peptides that cannot be aligned exactly to a protein sequence are discarded. Finally, each mapped PTM site is annotated with its corresponding literature reference (PubMed ID).

All PTM types are categorized by the modified amino acid and provided in tab-delimited format. Each dataset provides the UniProt ID, the modified position, the PTM type, and the sequence spanning from 10 amino acids upstream to 10 amino acids downstream of the site. For PTMs that occur near the N-terminus or C-terminus of a protein, the missing flanking positions in the extracted sequence are filled with dashes ('-').
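The mapping and window-extraction steps described above can be illustrated with a minimal Python sketch. It is not the dbPTM pipeline itself: the function names are hypothetical, exact matching is done with a plain substring search, and peptides that match more than one location are dropped, which is an assumption of this sketch rather than something the text states.

```python
FLANK = 10  # residues kept on each side of the modified site, per the text


def locate_peptide(protein_seq, peptide):
    """Return every 1-based start position where the peptide matches exactly."""
    starts = []
    idx = protein_seq.find(peptide)
    while idx != -1:
        starts.append(idx + 1)
        idx = protein_seq.find(peptide, idx + 1)
    return starts


def site_window(protein_seq, position, flank=FLANK):
    """Return the residues around a 1-based site, padded with '-' at the termini."""
    if not 1 <= position <= len(protein_seq):
        raise ValueError("position lies outside the protein sequence")
    centre = position - 1
    window = []
    for i in range(centre - flank, centre + flank + 1):
        inside = 0 <= i < len(protein_seq)
        window.append(protein_seq[i] if inside else "-")
    return "".join(window)


def map_peptide_site(protein_seq, peptide, offset_in_peptide):
    """Map a modified residue (1-based offset within the peptide) to the protein.

    Returns None when the peptide does not align exactly, mirroring the
    discard rule in the text. Peptides with several exact hits are also
    dropped here (assumption of this sketch).
    """
    starts = locate_peptide(protein_seq, peptide)
    if len(starts) != 1:
        return None
    position = starts[0] + offset_in_peptide - 1
    return position, site_window(protein_seq, position)
```

For a site at position 1, `site_window` returns ten dashes followed by the first eleven residues, which matches the dash-padded form described for terminal modifications.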


PTM Type | Number of experimental sites | Number of literature references | Download
ADP-ribosylation10735WindowsMAC / Linux
AMPylation2720WindowsMAC / Linux
Acetylation13816919451WindowsMAC / Linux
Amidation33161084WindowsMAC / Linux
Biotinylation1212WindowsMAC / Linux
Blocked amino end2625WindowsMAC / Linux
Butyrylation823WindowsMAC / Linux
C-linked Glycosylation21119WindowsMAC / Linux
Carbamidation221WindowsMAC / Linux
Carboxyethylation11WindowsMAC / Linux
Carboxylation4463WindowsMAC / Linux
Cholesterol ester22WindowsMAC / Linux
Citrullination12218WindowsMAC / Linux
Crotonylation3737WindowsMAC / Linux
D-glucuronoylation11WindowsMAC / Linux
Deamidation9244WindowsMAC / Linux
Deamination4213WindowsMAC / Linux
Decanoylation67WindowsMAC / Linux
Decarboxylation22WindowsMAC / Linux
Dephosphorylation13395WindowsMAC / Linux
Disulfide bond61WindowsMAC / Linux
Farnesylation8783WindowsMAC / Linux
Formation of an isopeptide bond2916WindowsMAC / Linux
Formylation25647WindowsMAC / Linux
GPI-anchor10357WindowsMAC / Linux
Gamma-carboxyglutamic acid50894WindowsMAC / Linux
Geranylgeranylation8148WindowsMAC / Linux
Glutarylation9524WindowsMAC / Linux
Glutathionylation412985WindowsMAC / Linux
Hydroxyceramide ester31WindowsMAC / Linux
Hydroxylation2404353WindowsMAC / Linux
Iodination192WindowsMAC / Linux
Lactoylation3061WindowsMAC / Linux
Lactylation3362WindowsMAC / Linux
Lipoylation3530WindowsMAC / Linux
Malonylation1284712WindowsMAC / Linux
Methylation161148897WindowsMAC / Linux
Myristoylation312207WindowsMAC / Linux
N-carbamoylation11WindowsMAC / Linux
N-linked Glycosylation273611941WindowsMAC / Linux
N-palmitoylation7464WindowsMAC / Linux
Neddylation16764WindowsMAC / Linux
Nitration8117WindowsMAC / Linux
O-linked Glycosylation166924457WindowsMAC / Linux
O-palmitoleoylation611WindowsMAC / Linux
O-palmitoylation33WindowsMAC / Linux
Octanoylation78WindowsMAC / Linux
Oxidation42783WindowsMAC / Linux
Phosphatidylethanolamine amidation97WindowsMAC / Linux
Phosphorylation161505444986WindowsMAC / Linux
Propionylation133WindowsMAC / Linux
Pyrrolidone carboxylic acid965571WindowsMAC / Linux
Pyrrolylation11WindowsMAC / Linux
Pyruvate2521WindowsMAC / Linux
S-Cyanation12WindowsMAC / Linux
S-archaeol11WindowsMAC / Linux
S-carbamoylation12WindowsMAC / Linux
S-cysteinylation22WindowsMAC / Linux
S-diacylglycerol5850WindowsMAC / Linux
S-linked Glycosylation57WindowsMAC / Linux
S-nitrosylation4172190WindowsMAC / Linux
S-palmitoylation6501571WindowsMAC / Linux
Serotonylation92WindowsMAC / Linux
Stearoylation33WindowsMAC / Linux
Succinylation1797359WindowsMAC / Linux
Sulfation252129WindowsMAC / Linux
Sulfhydration87WindowsMAC / Linux
Sulfoxidation7581124WindowsMAC / Linux
Sumoylation5889221WindowsMAC / Linux
Thiocarboxylation1313WindowsMAC / Linux
UMPylation108WindowsMAC / Linux
Ubiquitination348307669WindowsMAC / Linux
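
The tab-delimited files described above can be read with Python's standard `csv` module. The sketch below assumes each row carries the four fields in the order the text names them (UniProt ID, modified position, PTM type, sequence window). That order is an assumption, and the downloaded files may contain additional or reordered columns, so the indices should be checked against a real file. The `PtmSite` type and function names are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class PtmSite(NamedTuple):
    uniprot_id: str
    position: int
    ptm_type: str
    window: str


def read_ptm_sites(handle):
    """Yield PtmSite records from a tab-delimited file handle.

    Assumed column order: UniProt ID, position, PTM type, sequence window.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or short lines
        uniprot_id, position, ptm_type, window = row[:4]
        if not position.isdigit():
            continue  # skip a header line or a malformed position
        yield PtmSite(uniprot_id, int(position), ptm_type, window)


def modified_residue(site):
    """Return the central residue of the window, i.e. the modified amino acid."""
    return site.window[len(site.window) // 2]


# Made-up rows for illustration only; they are not real dbPTM records.
sample = (
    "X0001\t15\tPhosphorylation\tAAAAAAAAAASAAAAAAAAAA\n"
    "X0002\t1\tAcetylation\t----------MKTAYIAKQRQ\n"
)
sites = list(read_ptm_sites(io.StringIO(sample)))
```

Grouping the records by `modified_residue` reproduces the "categorized by the modified amino acid" view mentioned in the text.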

The benchmark data set for PTM analyses


Because MS/MS-based experiments are labor-intensive, a variety of computational methods have been proposed to identify putative PTM sites from protein sequence. With so many PTM prediction methods available, it is difficult to determine the best prediction tool from cross-validation performance alone. Although most of these studies report independent testing results for their prediction methods, there is no standard dataset for comparing the predictive power of the various PTM prediction tools. Therefore, this update compiles non-homologous benchmark datasets for evaluating the predictive power of PTM site prediction tools, which can guide users who need to predict PTM sites with high sensitivity (Sn), high specificity (Sp), or balanced Sn and Sp.
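
As an illustration of how Sn and Sp could be computed on such a benchmark, the following Python sketch scores a predictor against positive (1) and negative (0) site labels. It uses the standard definitions Sn = TP / (TP + FN) and Sp = TN / (TN + FP). The function names, the score threshold, and the toy data are assumptions for illustration and are not part of dbPTM.

```python
def confusion_counts(labels, predictions):
    """Count TP, FP, TN, FN for binary labels and predictions (1 = PTM site)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(labels, predictions):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sn_sp(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    predictions = [score >= threshold for score in scores]
    tp, fp, tn, fn = confusion_counts(labels, predictions)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy labels and prediction scores, made up for illustration.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

strict = sn_sp(labels, scores, threshold=0.5)     # favors specificity
lenient = sn_sp(labels, scores, threshold=0.25)   # favors sensitivity
```

Lowering the threshold raises Sn at the cost of Sp, which is the trade-off behind the high-Sn, high-Sp, and balanced settings mentioned above.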


PTM Type | Number of proteins | Number of positive sites | Number of negative sites | Download
Phosphorylation by CDK1,0201,50329,823WindowsMAC / Linux
Phosphorylation by MAPK8571,27022,436WindowsMAC / Linux
Phosphorylation by PKA9051,20929,813WindowsMAC / Linux
Phosphorylation by PKC69194324,207WindowsMAC / Linux
Phosphorylation by CK251181915,387WindowsMAC / Linux
Phosphorylation by CAMKL45455620,129WindowsMAC / Linux
Phosphorylation by GSK29139710,328WindowsMAC / Linux
Phosphorylation by AKT35138014,617WindowsMAC / Linux
Phosphorylation by CAMK225436612,575WindowsMAC / Linux
Phosphorylation by CK11743395,808WindowsMAC / Linux
Phosphorylation by RSK2212156,985WindowsMAC / Linux
Phosphorylation by GRK771472,310WindowsMAC / Linux
Phosphorylation by PKG1261457,311WindowsMAC / Linux
Phosphorylation by DYRK1091423,470WindowsMAC / Linux
Phosphorylation by MAPKAPK1001253,096WindowsMAC / Linux
Phosphorylation by DMPK991093,533WindowsMAC / Linux
Phosphorylation by PKD88973,401WindowsMAC / Linux
Phosphorylation by PDK177932,274WindowsMAC / Linux
Phosphorylation by SGK63773,057WindowsMAC / Linux
Phosphorylation by RAD5329751,560WindowsMAC / Linux
Phosphorylation by DAPK51531,284WindowsMAC / Linux
Phosphorylation by PKN2650866WindowsMAC / Linux
Phosphorylation by CAMK134442,342WindowsMAC / Linux
Phosphorylation by MLCK2034484WindowsMAC / Linux
Phosphorylation by NDR28321,096WindowsMAC / Linux
Acetylation5,64614,4078,704WindowsMAC / Linux
Citrullination66761,501WindowsMAC / Linux
C-linked Glycosylation39113159WindowsMAC / Linux
Crotonylation2011736WindowsMAC / Linux
Formylation1301721,452WindowsMAC / Linux
Gamma-carboxyglutamic acid54319553WindowsMAC / Linux
Glutarylation2177252,543WindowsMAC / Linux
Glutathionylation1,4933,5556,617WindowsMAC / Linux
Hydroxylation2011,2702,900WindowsMAC / Linux
Lipoylation2829779WindowsMAC / Linux
Malonylation2,7687,63517,371WindowsMAC / Linux
Methylation5,43814,68636,501WindowsMAC / Linux
Nitration6164983WindowsMAC / Linux
N-linked Glycosylation1,9692,5178,330WindowsMAC / Linux
O-linked Glycosylation1,2984,47037,969WindowsMAC / Linux
S-diacylglycerol235759WindowsMAC / Linux
S-nitrosylation1,4343,5925,803WindowsMAC / Linux
Succinylation2,5995,0495,526WindowsMAC / Linux
Sumoylation1,4325,19116,066WindowsMAC / Linux
Ubiquitination4,4539,7678,579WindowsMAC / Linux

The proteomics data for different types of cancer


To advance cancer research, we present a collection of proteomics datasets spanning multiple cancer types. These datasets offer valuable insights into tumor and normal tissue samples, supporting research on the molecular mechanisms driving cancer progression. The table below lists critical details for each dataset, including the cancer type, number of tumor and normal samples, access to volcano plots for differential expression analysis, and direct download links. This resource empowers researchers to explore protein-level changes, aiding in discovering potential biomarkers and therapeutic targets.


Cancer Type | Number of Tumor Samples | Number of Normal Samples | Volcano Plot | Download | Source
Colon Adenocarcinoma (COAD) | 97 | 100 | Click to view figure | Windows / MAC / Linux | PDC000109
Ovarian Serous Cystadenocarcinoma (OV) | 83 | 20 | Click to view figure | Windows / MAC / Linux | PDC000110
Breast Invasive Carcinoma (BRCA) | 135 | 18 | Click to view figure | Windows / MAC / Linux | PDC000120
Uterine Corpus Endometrial Carcinoma (UCEC) | 104 | 49 | Click to view figure | Windows / MAC / Linux | PDC000125
Clear Cell Renal Cell Carcinoma (CCRCC) | 110 | 84 | Click to view figure | Windows / MAC / Linux | PDC000127
Lung Adenocarcinoma (LUAD) | 110 | 101 | Click to view figure | Windows / MAC / Linux | PDC000153
Hepatocellular Carcinoma (HCC) | 165 | 165 | Click to view figure | Windows / MAC / Linux | PDC000198
Glioblastoma (GBM) | 99 | 10 | Click to view figure | Windows / MAC / Linux | PDC000204
Early Onset Gastric Cancer (EOGC) | 80 | 80 | Click to view figure | Windows / MAC / Linux | PDC000214
Head and Neck Squamous Cell Carcinoma (HNSCC) | 109 | 63 | Click to view figure | Windows / MAC / Linux | PDC000221
Lung Squamous Cell Carcinoma (LSCC) | 108 | 99 | Click to view figure | Windows / MAC / Linux | PDC000234
Pancreatic Ductal Adenocarcinoma (PDAC) | 140 | 75 | Click to view figure | Windows / MAC / Linux | PDC000270
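
For readers who want to reproduce a volcano-style view from the downloaded data, the sketch below computes the two coordinates conventionally used in such plots, log2 fold change (tumor over normal) and -log10 of the p-value, and labels each protein. The page does not state how its volcano plots were generated, so everything here is an assumption: the inputs are taken to be positive, linear-scale mean abundances, the p-value is supplied by whatever statistical test the user prefers, and the cutoffs (a two-fold change and p = 0.05) are common defaults rather than dbPTM settings.

```python
import math


def volcano_point(tumor_mean, normal_mean, p_value):
    """Return (log2 fold change, -log10 p) for one protein."""
    if tumor_mean <= 0 or normal_mean <= 0:
        raise ValueError("abundances must be positive to take a ratio")
    if not 0 < p_value <= 1:
        raise ValueError("p_value must lie in (0, 1]")
    return math.log2(tumor_mean / normal_mean), -math.log10(p_value)


def classify(log2_fc, neg_log10_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a point as up, down, or not significant (cutoffs are assumptions)."""
    if neg_log10_p < -math.log10(p_cutoff):
        return "not significant"
    if log2_fc >= fc_cutoff:
        return "up in tumor"
    if log2_fc <= -fc_cutoff:
        return "down in tumor"
    return "not significant"


# Made-up abundances and p-value for a single hypothetical protein.
x, y = volcano_point(tumor_mean=8.0, normal_mean=2.0, p_value=0.001)
label = classify(x, y)
```

If the downloaded abundances are already log-transformed, the fold change is the difference of the two means instead of the log of their ratio.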
