The 2020 metrics of the HUPO Human Proteome Project (HPP), the effort to credibly detect every protein in the human proteome, have been released (see https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00485). The report now provides evidence of detected expression for more than 90% of the 19,773 predicted proteins coded by the human genome. Each year the HPP reports on worldwide progress toward credibly identifying and characterizing the complete human protein parts list, and toward making proteomics an integral part of multiomics studies in medicine and the life sciences. The 2020 metrics paper describes the credibly detected proteins (PE1 level) as well as the four other levels of protein evidence (PE2 to PE5), all held in a central repository for community sharing of these results.
In neXtProt release 2020-01, 17,874 protein-coding genes are classified as PE1, meaning they have strong protein-level evidence. This is up 180 from 17,694 one year earlier and represents 90.4% of the 19,773 predicted coding genes (all PE1, PE2, PE3, and PE4 entries in neXtProt). Conversely, the number of neXtProt PE2, PE3, and PE4 proteins, termed the “missing proteins” (MPs), fell by 230, from 2,129 to 1,899, since the previous year’s release, neXtProt 2019-01.
PeptideAtlas is the primary source of uniform reanalysis of raw mass spectrometry (MS) data for neXtProt, and this year it was supplemented with extensive data from the MS repository MassIVE. Over the past year these MS knowledge bases promoted 362 (PeptideAtlas) and 84 (MassIVE) proteins to canonical status, increasing the number of credibly identified proteins. The Human Protein Atlas also released new protein detection resources, based on antibody binding to human proteins: the Blood, Brain, and Metabolic Atlases.
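The neXtProt headline figures are linked by simple arithmetic: the missing proteins are the PE1-4 total minus the PE1 count, and the 90.4% figure is PE1 divided by that total. The sketch below recomputes them from the counts quoted above. The function name `summarize` and its dictionary keys are hypothetical and used only for illustration. The 2019-01 total is not stated in the text, so the sketch takes the previous year's missing-protein count directly rather than deriving it.

```python
# Counts quoted in the text for neXtProt releases 2019-01 and 2020-01.
TOTAL_2020 = 19_773    # all PE1-4 predicted coding genes, release 2020-01
PE1_2020 = 17_874      # PE1 genes, release 2020-01
PE1_2019 = 17_694      # PE1 genes, release 2019-01
MISSING_2019 = 2_129   # PE2-4 "missing proteins", release 2019-01


def summarize(total: int, pe1: int, pe1_prev: int, missing_prev: int) -> dict:
    """Recompute the headline metrics (hypothetical helper, for illustration)."""
    missing = total - pe1  # PE2-4 = all PE1-4 genes minus the PE1 genes
    return {
        "pe1_gain": pe1 - pe1_prev,                     # year-over-year PE1 increase
        "pe1_percent": round(100 * pe1 / total, 1),     # share of coding genes at PE1
        "missing": missing,                             # current missing-protein count
        "missing_drop": missing_prev - missing,         # year-over-year MP reduction
    }


print(summarize(TOTAL_2020, PE1_2020, PE1_2019, MISSING_2019))
# {'pe1_gain': 180, 'pe1_percent': 90.4, 'missing': 1899, 'missing_drop': 230}
```

The PE1 gain (180) and the missing-protein drop (230) differ. This presumably reflects a change in the total number of predicted coding genes between the two releases, not an arithmetic inconsistency.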
The Biology and Disease-driven HPP (B/D-HPP) teams continue to identify driver proteins that underlie disease states and to characterize these proteins' proteoforms, their interactions, and the regulatory mechanisms that control their functions.
About 40% of the remaining “missing proteins” are hydrophobic, and their detection is further complicated by sequences from which it is difficult to obtain credible peptides for high-stringency identification. The missing proteins include members of large families and groups such as GPCRs and zinc-finger, homeobox, keratin-associated, and coiled-coil domain proteins. We expect novel strategies for finding missing proteins, characterizing the functions of already-detected “dark” proteins, and applying proteogenomics in precision medicine to be fruitful in the coming years.
In addition, the Journal of Proteome Research (JPR) will publish a year-end virtual issue collecting dozens of high-impact papers from the seven annual Human Proteome Project special issues of JPR.