Proteomics is the large-scale study of proteins, including their structures, functions, and interactions. It has become an important technology for understanding biological systems on a global scale. Mass spectrometry plays a key role in proteomic analysis by allowing researchers to identify and characterize proteins and their post-translational modifications, such as phosphorylation. Analyzing post-translational modifications is challenging because each protein can exist in multiple modified forms, but methods such as affinity enrichment and tandem mass spectrometry are used to map modifications and locate them on protein sequences.
Proteomics is the large-scale study of proteins, particularly their structures and functions. It is much more complicated than genomics, mostly because an organism's genome is more or less constant, whereas the proteome differs from cell to cell and from time to time. This is because distinct genes are expressed in distinct cell types, so even the basic set of proteins produced in a cell needs to be determined. In the past this was done by mRNA analysis, but mRNA levels were found not to correlate reliably with protein content. It is now known that mRNA is not always translated into protein, and the amount of protein produced for a given amount of mRNA depends on the gene it is transcribed from and on the current physiological state of the cell. Proteomics confirms the presence of the protein and provides a direct measure of the quantity present. Not only does translation from mRNA cause differences, but many proteins are also subjected to a wide variety of chemical modifications after translation. Many of these post-translational modifications are critical to the protein's function.
Proteins are biochemical compounds consisting of one or more polypeptides, typically folded into a globular or fibrous form in a biologically functional way. A polypeptide is a single linear polymer chain of amino acids bonded together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. The amino acid sequence of a protein is encoded by the DNA sequence of a gene and is read out according to the genetic code.
Proteins are vital parts of living organisms, as they are the main components of the physiological metabolic pathways of cells.
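To make the gene-to-protein relationship concrete, here is a minimal Python sketch of translation with the standard genetic code. It assumes an in-frame coding DNA sequence (mRNA codons written with T instead of U), ignores introns and start-codon search, and stops at the first stop codon; `translate` and `CODON_TABLE` are illustrative names, not part of any particular library.

```python
# Standard genetic code (NCBI translation table 1) with bases ordered T, C, A, G.
# A codon's index is 16*first + 4*second + third; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence until the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```

For example, `translate("ATGGCCAAATAG")` returns `"MAK"`: ATG (Met), GCC (Ala), AAA (Lys), and then the TAG stop codon ends translation.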
Each level of protein structure is essential to the finished molecule's function. The primary sequence of the amino acid chain determines where secondary structures will form, as well as the overall shape of the final 3D conformation. The 3D conformation of each polypeptide subunit, in turn, determines the final structure and function of a multi-subunit protein complex.
The proteome is the entire set of proteins expressed by a genome, cell, tissue or organism. More specifically, it is the set of proteins expressed in a given cell type or organism at a given time under defined conditions. The word "proteome" is a blend of "protein" and "genome", and was coined by Marc Wilkins in 1994 while he was working on the concept as a PhD student.[4][5] The term "proteomics" was first coined in 1997[3] by analogy with genomics, the study of genes. The proteome is the entire complement of proteins,[4] including the modifications made to a particular set of proteins, produced by an organism or system. It varies with time and with the distinct requirements, or stresses, that a cell or organism undergoes.
A research team from ETH Zurich, led by Professor Ruedi Aebersold, and from the Institute for Systems Biology (ISB), Seattle, has used mass spectrometry methods to map the complete human proteome for the first time. The data is being made available to all researchers.
The big wide world of proteins: Systems biologists at ETH Zurich and the ISB have drawn a complete map of the human proteome. (Image: e-pics/ETH Zurich) https://www.e-pics.ethz.ch/
DNA tells what possibly happens, RNA what probably happens, and proteins what actually happens.
• Sample preparation: cells, body fluids, tissues
• Separation: chromatography, electrophoresis
• Mass analyzers: time-of-flight (TOF), quadrupole (Q), ion trap (IT), Fourier transform ion cyclotron resonance (FTICR)
• Detectors: secondary electron multipliers, microchannel plates
• Different combinations differ in accuracy, sensitivity, mass range, …
Generic mass spectrometry (MS)-based proteomics experiment. The typical proteomics experiment consists of five stages. In stage 1, the proteins to be analysed are isolated from cell lysate or tissues by biochemical fractionation or affinity selection. This often includes a final step of one-dimensional gel electrophoresis, and defines the ‘sub-proteome’ to be analysed. MS of whole proteins is less sensitive than peptide MS, and the mass of the intact protein by itself is insufficient for identification. Therefore, in stage 2 the proteins are degraded enzymatically to peptides, usually by trypsin, which yields peptides ending in a basic, readily protonated residue (lysine or arginine) at the C terminus; this provides an advantage in subsequent peptide sequencing. In stage 3, the peptides are separated by one or more steps of high-pressure liquid chromatography in very fine capillaries and eluted into an electrospray ion source, where they are nebulized into small, highly charged droplets. After evaporation, multiply protonated peptides enter the mass spectrometer and, in stage 4, a mass spectrum of the peptides eluting at this time point is taken (MS1 spectrum, or ‘normal mass spectrum’). The computer generates a prioritized list of these peptides for fragmentation, and a series of tandem mass spectrometric or ‘MS/MS’ experiments ensues (stage 5). These consist of isolation of a given peptide ion, fragmentation by energetic collision with gas, and recording of the tandem or MS/MS spectrum. The MS and MS/MS spectra are typically acquired for about one second each and stored for matching against protein sequence databases. The outcome of the experiment is the identity of the peptides and therefore of the proteins making up the purified protein population.
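Stage 2 can be mimicked in silico. The following is a minimal Python sketch, assuming the simplified trypsin rule (cleave C-terminal to Lys or Arg, but not before Pro) and an optional number of missed cleavages; real search engines use more elaborate rules, and `tryptic_digest` is a hypothetical helper name.

```python
import re


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Simplified in silico trypsin digest: cleave after K or R unless followed by P."""
    # Zero-width split points sit right after K/R when the next residue is not P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` neighbouring fragments to model skipped sites.
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides
```

For example, `tryptic_digest("MAKPLRGEKAR")` gives `['MAKPLR', 'GEK', 'AR']`; the K in "KP" is not cleaved because it precedes proline.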
Strategies for quantitative peptide analyses. (A) Quantification using isotope dilution is widely used and accepted in the proteomics community. It is based on the incorporation of a stable isotope signature into all of the proteins of one sample and the incorporation of a different stable isotope signature in all proteins of a second sample. The samples are then combined to serve as mutual references. Stable isotope incorporation has been achieved by chemical modification of proteins using suitable isotope coded labeling reagents (26), metabolic labeling (35), or by enzyme reactions (36). The method is schematically illustrated here. (B) Quantification using tandem mass tags relies on variants of stable isotope labeling reagents (37, 38). They consist of two isotopically labeled elements, which have an overall constant mass. Currently, these reagents can be multiplexed to four channels. Quantification is performed in the MS/MS mode by measuring the relative intensity of the reporter group attached at the N terminus and observed in low mass range of the CID spectrum. (C) Quantification using internal standards is a variant of isotopic dilution in which a subset of isotopically labeled peptides is added to the sample at defined concentrations to perform precise quantification using calibration curves. Although it is more demanding in terms of sample preparation, this method is likely to gain importance in the future in the more directed approach indicated above for quantifying proteins in a larger number of samples. It may also be a more effective way to perform hypothesis-driven studies by screening for known or putative proteins (i.e., peptides) present in samples.
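The arithmetic behind strategies (A) and (C) can be sketched in a few lines of Python. This is an illustration under stated assumptions: light and heavy forms of a peptide ionize identically, the detector response is linear, and a calibration curve is fitted as a straight line by ordinary least squares. The function names are hypothetical, and published workflows differ in what exactly is plotted and how it is fitted.

```python
def relative_ratio(light_area: float, heavy_area: float) -> float:
    """Isotope dilution (A): light/heavy peak-area ratio for one peptide pair."""
    return light_area / heavy_area


def spiked_standard_amount(light_area: float, heavy_area: float, spiked_fmol: float) -> float:
    """Internal standard (C), single point: endogenous amount from a known heavy spike.

    Assumes equal ionization efficiency of light and heavy forms and a linear
    response, so the area ratio equals the amount ratio.
    """
    return relative_ratio(light_area, heavy_area) * spiked_fmol


def amount_from_calibration(known_fmol, measured_ratios, sample_ratio):
    """Internal standard (C), calibration curve.

    Fits ratio = slope * amount + intercept by least squares over the
    calibration points, then inverts the line for the sample's measured ratio.
    """
    n = len(known_fmol)
    mean_x = sum(known_fmol) / n
    mean_y = sum(measured_ratios) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_fmol, measured_ratios))
    sxx = sum((x - mean_x) ** 2 for x in known_fmol)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return (sample_ratio - intercept) / slope
```

With illustrative numbers, a light peak twice as large as a 50 fmol heavy spike gives `spiked_standard_amount(3e6, 1.5e6, 50.0) == 100.0` fmol.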
The bottom-up approach is the most popular method for tackling high-complexity samples in large-scale analyses. The term shotgun proteomics (33, 80, 97) refers to the protein equivalent of shotgun genomic sequencing, in which the DNA is sheared and sequenced in smaller overlapping contigs. Bottom-up proteomics is an approach in which proteins are proteolytically digested into peptides prior to mass analysis, and the ensuing peptide masses and sequences are used to identify the corresponding proteins. Most bottom-up applications require tandem data acquisition, in which peptides are subjected to collision-activated dissociation (CAD or CID). The most widely used method for identifying bottom-up tandem MS data is the database search (132, 133), in which experimental MSn data are compared with the predicted, in silico–generated fragmentation patterns of the peptides under investigation.
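The core idea of a database search can be sketched in Python: predict singly charged b and y fragment ions for each candidate peptide and count how many match peaks in the observed spectrum. The "shared peak count" score used here is a deliberately simple stand-in, not the cross-correlation or probability scores of real search engines, which also prefilter candidates by precursor mass. All names and the 0.02 tolerance are assumptions for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide: str) -> list[float]:
    """Singly charged b (N-terminal) and y (C-terminal) ion m/z for each backbone cleavage."""
    ions = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE[aa] for aa in peptide[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def shared_peaks(spectrum: list[float], peptide: str, tol: float = 0.02) -> int:
    """Number of predicted b/y ions that have an observed peak within +/- tol."""
    return sum(
        any(abs(mz - peak) <= tol for peak in spectrum) for mz in by_ions(peptide)
    )


def search(spectrum: list[float], candidates: list[str]) -> str:
    """Return the candidate whose predicted fragments best explain the spectrum."""
    return max(candidates, key=lambda pep: shared_peaks(spectrum, pep))
```

Feeding it the ideal fragment list of "PEPTIDEK" as a mock spectrum, `search` picks "PEPTIDEK" over its reversed sequence, whose fragments are all shifted by the mass of water.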
Protein MS is tightly linked to, and highly dependent on, separation technologies that simplify incredibly complex biological samples prior to mass analysis. Because proteins are identified by the mass-to-charge ratios of their peptides and fragments, sufficient separation is required for unambiguous identifications. Front-end separation is also required to detect low-abundance species that would otherwise be overshadowed by a higher-abundance signal. Therefore, both the accuracy and the sensitivity of a mass spectrometric experiment rely on efficient separation. There is a very strong conceptual link between chemical separation and MS, in which the latter is viewed as the mass-resolution dimension of the separation of molecules (33). Selecting appropriate separation methods is often the first step in designing a proteomic application. Two major approaches to separation widely used in proteomics are gel based and gel free. Two-dimensional polyacrylamide gel electrophoresis (2D PAGE) is the historic centerpiece of the gel-based separation methods (68–71). There are many excellent reviews that cover 2D PAGE and gel-based approaches to proteomics (72–75). Reverse phase resins (RPLC or RP) separate compounds based on their hydrophobicity, and a significant advantage of RPLC is that the buffers used are compatible with ESI.
Multidimensional separation is used to address this high sample complexity. By definition, the multidimensional separation approach couples several separation techniques to improve the resolving power. An important consideration for multidimensional separation is the orthogonality of the individual separation methods (98), in which each dimension uses different (orthogonal) molecular properties as a basis for separation. Although recent review papers cover historical and theoretical aspects of multidimensional separation (77, 99), we mention some of the milestones in addition to the current trends.
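Why orthogonality matters can be shown with the usual back-of-the-envelope estimate: if the dimensions are perfectly orthogonal and no resolution is lost between them, their peak capacities multiply. The Python sketch below assumes exactly that; the `fraction_used` factor (the share of the 2D separation space that peaks actually occupy) is a simplified heuristic for correlated dimensions, not a rigorous orthogonality metric, and both function names are hypothetical.

```python
from math import prod


def ideal_peak_capacity(*dimension_capacities: float) -> float:
    """Upper bound for perfectly orthogonal dimensions: capacities multiply."""
    return prod(dimension_capacities)


def effective_peak_capacity(n1: float, n2: float, fraction_used: float) -> float:
    """Rough 2D estimate when correlated dimensions fill only part of the space.

    fraction_used is between 0 and 1; 1 means fully orthogonal dimensions.
    """
    return n1 * n2 * fraction_used
```

With illustrative numbers, 20 cation-exchange fractions feeding a reverse phase gradient with a capacity of 400 give an ideal bound of 8000; if the two dimensions were correlated so that peaks occupied only half of the 2D space, the estimate would drop to 4000.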
One of the first 2D setups featured cation exchange chromatography coupled to a reverse phase column in line with a mass spectrometer (82) used for separation of Escherichia coli proteins.
Two-dimensional gel electrophoresis:
• 1st dimension: isoelectric focusing (IEF), separation by isoelectric point (pI)
• 2nd dimension: SDS-PAGE, separation by molecular weight (MW)
• Staining: more than 1000 proteins per gel
• Molecular analysis by mass spectrometry, HPLC, Western blot, …
• Pitfalls: very basic or acidic, very large or small, hydrophobic, and low-abundance proteins, …
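Where a protein stops during isoelectric focusing can be estimated from its sequence. The Python sketch below sums Henderson-Hasselbalch charges over all ionizable groups and bisects for the pH of zero net charge. The pKa values are an assumed set (several published scales exist and give somewhat different pI values), modifications are ignored, and the function names are hypothetical.

```python
# Assumed pKa values; published scales differ, so tools disagree slightly on pI.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts = {aa: sequence.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free terminus at each end
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; net charge decreases monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

A lysine-rich peptide such as "KKKK" comes out strongly basic and "DDDD" strongly acidic, the two extremes listed above as 2D gel pitfalls.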
Modern laboratories use robots to excise protein spots from 2D gels.
Protein and peptide fractionation
Proteins of interest to biological researchers are usually part of a very complex mixture of other proteins and molecules that co-exist in the biological medium. This presents two significant problems. First, the two ionization techniques used for large molecules only work well when the mixture contains roughly equal amounts of constituents, while in biological samples, different proteins tend to be present in widely differing amounts. If such a mixture is ionized using electrospray or MALDI, the more abundant species tend to "drown" or suppress signals from less abundant ones. Second, the mass spectrum of a complex mixture is very difficult to interpret because of the overwhelming number of mixture components. This is exacerbated by the fact that enzymatic digestion of a protein gives rise to a large number of peptide products. To contend with these problems, two methods are widely used to fractionate proteins, or their peptide products from an enzymatic digestion. The first method fractionates whole proteins and is called two-dimensional gel electrophoresis. The second method, high performance liquid chromatography, is used to fractionate peptides after enzymatic digestion. In some situations, it may be necessary to combine both of these techniques. Gel spots identified on a 2D gel are usually attributable to one protein. If the identity of the protein is desired, the method of in-gel digestion is usually applied, in which the protein spot of interest is excised and digested proteolytically. The peptide masses resulting from the digestion can be determined by mass spectrometry using peptide mass fingerprinting. If this information does not allow unequivocal identification of the protein, its peptides can be subjected to tandem mass spectrometry for de novo sequencing. Characterization of protein mixtures using HPLC/MS is also called shotgun proteomics or MudPIT.
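Peptide mass fingerprinting can be sketched in Python as follows: digest each database protein in silico, compute monoisotopic masses of the resulting peptides, and count how many observed masses each protein explains. This assumes unmodified peptides, complete cleavage under the simplified trypsin rule, a 0.05 Da tolerance, and a toy dictionary in place of a real sequence database; real PMF tools use probability-based scores, and all names here are hypothetical.

```python
import re

# Monoisotopic residue masses (Da); a peptide's neutral mass adds one water.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def digest(protein: str) -> list[str]:
    """Simplified trypsin rule: cleave after K/R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def fingerprint_score(observed: list[float], protein: str, tol: float = 0.05) -> int:
    """Number of observed peptide masses explained by the protein's in silico digest."""
    theoretical = [peptide_mass(p) for p in digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def identify(observed: list[float], database: dict[str, str]) -> str:
    """Name of the database entry whose digest explains the most observed masses."""
    return max(database, key=lambda name: fingerprint_score(observed, database[name]))
```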
A peptide mixture that results from digestion of a protein mixture is fractionated by one or two steps of liquid chromatography. The eluent from the chromatography stage can be either directly introduced to the mass spectrometer through electrospray ionization, or laid down on a series of small spots for later mass analysis using MALDI.
Peptides tend to fragment along the backbone. Fragments can also lose neutral chemical groups such as NH3 and H2O.
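Neutral losses show up as satellite peaks at fixed offsets from a backbone fragment (commonly labelled b ions for N-terminal and y ions for C-terminal fragments). A minimal Python sketch, assuming monoisotopic masses: because the lost molecule carries no charge, the fragment keeps its charge and the m/z shift is the neutral mass divided by that charge. `neutral_loss_peaks` is a hypothetical name.

```python
# Monoisotopic masses (Da) of the most common neutral losses.
NEUTRAL_LOSSES = {"NH3": 17.026549, "H2O": 18.010565}


def neutral_loss_peaks(fragment_mz: float, charge: int = 1) -> dict[str, float]:
    """Expected m/z of a fragment ion after losing NH3 or H2O.

    The neutral molecule removes mass but not charge, so the observed
    shift in m/z is loss / charge.
    """
    return {name: fragment_mz - loss / charge for name, loss in NEUTRAL_LOSSES.items()}
```

For a singly charged fragment at m/z 500.0, the water-loss peak is expected near 481.99; at charge 2 the same loss shifts the peak by only about 9.01.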
Proteins and peptides are polar, nonvolatile, and thermally unstable species that require an ionization technique that transfers an analyte into the gas phase without extensive degradation. Two such techniques paved the way for modern bench-top MS proteomics: matrix-assisted laser desorption/ionization (MALDI) (10–13) and electrospray ionization (ESI) (14). In MALDI, ionization is triggered by a laser beam (normally a nitrogen laser), and a matrix is used to protect the biomolecule from being destroyed by the direct laser beam and to facilitate vaporization and ionization.
William E. Stephens 1952 TOF patent
Mass spectrometry (MS) is an analytical technique that measures the mass-to-charge ratio of charged particles.[1] It is used for determining masses of particles, for determining the elemental composition of a sample or molecule, and for elucidating the chemical structures of molecules, such as peptides and other chemical compounds. The MS principle consists of ionizing chemical compounds to generate charged molecules or molecule fragments and measuring their mass-to-charge ratios.[1]
In a typical MS procedure:
• A sample is loaded onto the MS instrument and undergoes vaporization.
• The components of the sample are ionized by one of a variety of methods (e.g., by impacting them with an electron beam), which results in the formation of charged particles (ions).
• The ions are separated according to their mass-to-charge ratio in an analyzer by electromagnetic fields.
• The ions are detected, usually by a quantitative method.
• The ion signal is processed into mass spectra.
MS instruments consist of three modules:
• An ion source, which converts gas-phase sample molecules into ions (or, in the case of electrospray ionization, moves ions that exist in solution into the gas phase)
• A mass analyzer, which sorts the ions by their masses by applying electromagnetic fields
• A detector, which measures the value of an indicator quantity and thus provides data for calculating the abundance of each ion present
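Because the instrument reports m/z rather than mass, a multiply protonated peptide or protein from an electrospray source appears at several m/z values. The Python sketch below assumes [M + zH]z+ ions (protonation only, no other adducts) and shows both directions: computing m/z from a neutral mass, and recovering charge and mass from two adjacent peaks of the same charge-state envelope. The function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton (Da)


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion formed by adding z protons to neutral mass M."""
    return (neutral_mass + charge * PROTON) / charge


def deconvolute(mz_a: float, mz_b: float) -> tuple[int, float]:
    """Charge and neutral mass from two adjacent peaks of one charge envelope.

    mz_a < mz_b, and the ion at mz_a carries exactly one more proton.
    Returns (z, M) for the ion observed at mz_b.
    """
    z = round((mz_a - PROTON) / (mz_b - mz_a))
    return z, z * (mz_b - PROTON)
```

For example, `mz(1000.0, 1)` is about 1001.007 and `mz(1000.0, 2)` about 501.007; the same 1000 Da molecule appears at two different positions in the spectrum.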
Mass spectrometers used in proteome research. The left and right upper panels depict the ionization and sample introduction process in electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI). The different instrumental configurations (a–f) are shown with their typical ion source. a, In reflector time-of-flight (TOF) instruments, the ions are accelerated to high kinetic energy and are separated along a flight tube as a result of their different velocities. The ions are turned around in a reflector, which compensates for slight differences in kinetic energy, and then impinge on a detector that amplifies and counts arriving ions. b, The TOF-TOF instrument incorporates a collision cell between two TOF sections. Ions of one mass-to-charge (m/z) ratio are selected in the first TOF section, fragmented in the collision cell, and the masses of the fragments are separated in the second TOF section. c, Quadrupole mass spectrometers select by time-varying electric fields between four rods, which permit a stable trajectory only for ions of a particular desired m/z. Again, ions of a particular m/z are selected in a first section (Q1), fragmented in a collision cell (q2), and the fragments separated in Q3. In the linear ion trap, ions are captured in a quadrupole section, depicted by the red dot in Q3. They are then excited via a resonant electric field and the fragments are scanned out, creating the tandem mass spectrum. d, The quadrupole TOF instrument combines the front part of a triple quadrupole instrument with a reflector TOF section for measuring the mass of the ions. e, The (three-dimensional) ion trap captures the ions as in the case of the linear ion trap, fragments ions of a particular m/z, and then scans out the fragments to generate the tandem mass spectrum. f, The FT-MS instrument also traps the ions, but does so with the help of strong magnetic fields.
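The TOF principle in panel a follows from an energy balance: an ion of charge ze accelerated through a potential U gains kinetic energy zeU = mv²/2, so its velocity, and hence its flight time over a field-free length L, depends on m/z. The Python sketch below assumes a linear, field-free flight path and ignores the reflector, initial energy spread, and delayed extraction; `flight_time` and the example voltage and length are illustrative assumptions.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb
DALTON = 1.66053906660e-27  # kilogram per dalton


def flight_time(mass_da: float, charge: int, voltage: float, length_m: float) -> float:
    """Field-free flight time (s) of an ion accelerated through `voltage` volts.

    From z*e*U = m*v**2 / 2, v = sqrt(2*z*e*U / m) and t = L / v,
    so t grows with the square root of m/z.
    """
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / (mass_da * DALTON))
    return length_m / velocity
```

For a 1000 Da singly charged ion, 20 kV acceleration and a 2 m flight path (illustrative values), `flight_time(1000, 1, 20_000, 2.0)` is roughly 32 microseconds, and a 4000 Da ion takes twice as long.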
The figure shows the combination of FT-MS with the linear ion trap for efficient isolation, fragmentation and fragment detection in the FT-MS section.
High-resolution separation demonstrated by the UPLC-MudPIT system. (a) A base peak chromatogram of tryptic peptides from yeast lysate separated by a 60-cm triphasic column with a 350-min gradient. (b) A mass chromatogram of six typical peptides used to estimate a peak capacity. (c) A gradient profile monitored by UV. (d) Representative fragmentation scan (MS/MS) spectra and their assignments. A triphasic column composed of 5-cm C18 trap (5 μm)/5-cm SCX/60-cm C18 analytical (3 μm) was operated at 125 μL min−1 constant flow (system pressure, ∼15 kpsi; column flow rate, ∼0.16 μL min−1). A 10-μg aliquot of yeast Lys-C + tryptic digest was injected into the system. Peptides were eluted by a two-step UHP-MudPIT (i.e., the chromatogram shown in a is of the second step eluted with 500 mM ammonium acetate). In b, six mid-intensity peaks distributed nearly evenly across the chromatogram were picked. wb, peak width at the baseline, given in minutes. The estimated peak capacity was ∼400. Reprinted with permission from Reference 92.
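A common way to estimate gradient peak capacity from a few representative peaks is n ≈ 1 + t_gradient / mean(w_b), where w_b is the baseline peak width. The Python sketch below assumes that definition; the cited reference may have used a variant, and `gradient_peak_capacity` is a hypothetical name.

```python
from statistics import mean


def gradient_peak_capacity(gradient_min: float, base_widths_min: list[float]) -> float:
    """Peak capacity ~ 1 + gradient time / mean baseline peak width.

    Both arguments must use the same time unit (minutes here).
    """
    return 1 + gradient_min / mean(base_widths_min)
```

Under this definition, a capacity of about 400 over the 350-min gradient in the caption would correspond to mean baseline widths of roughly 0.88 min.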
Schematic detailing the quantitative analysis capabilities of Census. (a) Use of Census with isotopic labeling (see text). (b) Use of Census with label-free analysis. For chromatogram alignment, Census uses a Pearson correlation between mass spectra and dynamic time warping (255). After alignment, chromatograms are extracted as described. LC, liquid chromatography. Reprinted with permission from Reference 189. Census is capable of achieving en masse quantification of proteins for high-complexity samples analyzed with MudPIT.
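Dynamic time warping, which Census uses for chromatogram alignment in label-free analysis, can be sketched in Python for two one-dimensional intensity traces. As a simplification, the cost of pairing two scans here is the absolute intensity difference; Census instead compares mass spectra by Pearson correlation, so this is a swapped-in cost function, and `dtw_path` is a hypothetical name.

```python
def dtw_path(reference: list[float], query: list[float]) -> tuple[float, list[tuple[int, int]]]:
    """Dynamic time warping between two intensity traces.

    Returns the minimal total alignment cost and the matched (reference, query)
    index pairs. Pairing cost = absolute intensity difference (an assumption).
    """
    n, m = len(reference), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - query[j - 1])
            # Extend the cheapest of: diagonal match, or stretching either trace.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Trace back from the end to recover which scans were paired.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    return cost[n][m], path[::-1]
```

Two traces that differ only by a time shift, such as `[0, 1, 5, 1, 0]` and `[0, 0, 1, 5, 1, 0]`, align with zero cost because the warping path absorbs the shift.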
Schematic representation of methods for stable-isotope protein labelling for quantitative proteomics. a, Proteins are labelled metabolically by culturing cells in media that are isotopically enriched (for example, containing 15N salts, or 13C-labelled amino acids) or isotopically depleted. b, Proteins are labelled at specific sites with isotopically encoded reagents. The reagents can also contain affinity tags, allowing for the selective isolation of the labelled peptides after protein digestion. The use of chemistries of different specificity enables selective tagging of classes of proteins containing specific functional groups. c, Proteins are isotopically tagged by means of enzyme-catalysed incorporation of 18O from 18O water during proteolysis. Each peptide generated by the enzymatic reaction carried out in heavy water is labelled at the carboxy terminal. In each case, labelled proteins or peptides are combined, separated and analysed by mass spectrometry and/or tandem mass spectrometry for the purpose of identifying the proteins contained in the sample and determining their relative abundance. The patterns of isotopic mass differences generated by each method are indicated schematically. The mass difference of peptide pairs generated by metabolic labelling is dependent on the amino acid composition of the peptide and is therefore variable. The mass difference generated by enzymatic 18O incorporation is either 4 Da or 2 Da, making quantitation difficult. The mass difference generated by chemical tagging is one or multiple times the mass difference encoded in the reagent used.
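The statement that metabolic labelling gives composition-dependent mass differences can be checked for the 15N case: every nitrogen atom replaced by 15N adds about 0.997 Da, so the heavy-light spacing depends on how many nitrogens a peptide contains. The Python sketch below assumes complete 15N incorporation and counts nitrogens per standard residue; `n15_mass_shift` is a hypothetical name.

```python
# Nitrogen atoms in each standard amino acid residue.
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1, "I": 1, "N": 2,
    "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3, "F": 1, "R": 4, "Y": 1, "W": 2,
}
N15_SHIFT = 15.000109 - 14.003074  # Da gained per nitrogen replaced by 15N


def n15_mass_shift(peptide: str) -> float:
    """Heavy-minus-light mass difference of a fully 15N-labelled peptide."""
    return N15_SHIFT * sum(NITROGENS[aa] for aa in peptide)
```

"PEPTIDEK" (9 nitrogens) shifts by about 8.97 Da, whereas "RRR" (12 nitrogens) shifts by about 11.96 Da, which is why pair spacing varies from peptide to peptide.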
Proteome analysis is not a single-step process. Rather, individual components for separating, identifying and quantifying the polypeptides, as well as tools for integrating and analysing all the data, must be used in concert. Out of a bewildering multitude of techniques and instruments, two main tracks can be identified. The first, and most commonly used, is a combination of 2DE and MS. The second track combines limited protein purification with the more recently developed techniques of automated peptide MS/MS and, if accurate quantification is desired, stable-isotope tagging of proteins or peptides. In either track, a suitable data-processing, storage and visualization infrastructure needs to be developed if the platform is intended for high-throughput operation.
Post-translational modifications modulate the activity of most eukaryote proteins. Analysis of these modifications presents formidable challenges but their determination generates indispensable insight into biological function. Strategies developed to characterize individual proteins are now systematically applied to protein populations. The combination of function- or structure-based purification of modified 'subproteomes', such as phosphorylated proteins or modified membrane proteins, with mass spectrometry is proving particularly successful. To map modification sites in molecular detail, novel mass spectrometric peptide sequencing and analysis technologies hold tremendous potential. Finally, stable isotope labeling strategies in combination with mass spectrometry have been applied successfully to study the dynamics of modifications.
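Mapping a modification site is harder than detecting the modification. A minimal Python sketch for phosphorylation, assuming the standard monoisotopic mass of HPO3 (+79.966331 Da) and considering only Ser, Thr and Tyr as acceptors: the precursor mass reveals how many phosphates are present, but every placement over the candidate residues gives the same mass, so fragment spectra are needed to localize the site. The function names are hypothetical.

```python
from itertools import combinations

PHOSPHO = 79.966331  # monoisotopic mass (Da) of HPO3 added per phosphate


def phospho_candidates(peptide: str) -> list[int]:
    """0-based positions of Ser, Thr and Tyr residues that could carry a phosphate."""
    return [i for i, aa in enumerate(peptide) if aa in "STY"]


def phospho_variants(peptide: str, unmodified_mass: float, n_sites: int = 1):
    """Modified mass plus every way of placing n_sites phosphates on S/T/Y.

    All placements share the same precursor mass, which is why site
    localization requires fragment ions.
    """
    placements = list(combinations(phospho_candidates(peptide), n_sites))
    return unmodified_mass + n_sites * PHOSPHO, placements
```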
Some common and important post-translational modifications
In some reactions, the purpose of phosphorylation is to "activate" or "energize" a molecule, increasing its energy so it is able to participate in a subsequent reaction with a negative free-energy change. All kinases require a divalent metal ion such as Mg2+ or Mn2+ to be present, which stabilizes the high-energy bonds of the donor molecule (usually ATP or an ATP derivative) and allows phosphorylation to occur. In other reactions, phosphorylation of a protein substrate can inhibit its activity (as when AKT phosphorylates the enzyme GSK-3). One common mechanism for phosphorylation-mediated enzyme inhibition was demonstrated in the tyrosine kinase called "src" (pronounced "sarc"). When src is phosphorylated on a particular tyrosine, it folds on itself, masking its own kinase domain, and is thus shut "off". In still other reactions, phosphorylation of a protein causes it to be bound by other proteins that have "recognition domains" for a phosphorylated tyrosine, serine, or threonine motif. As a result of binding a particular protein, a distinct signaling system may be activated or inhibited. In the late 1990s it was recognized that phosphorylation of some proteins causes them to be degraded by the ATP-dependent ubiquitin/proteasome pathway. These target proteins become substrates for particular E3 ubiquitin ligases only when they are phosphorylated.
Glycosylation is the addition of a saccharide to a protein or lipid molecule.
• N-linked glycosylation: attachment to the amide nitrogen of asparagine
• O-linked glycosylation: attachment to the hydroxyl oxygen of serine and threonine
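Candidate N-linked sites are often located by scanning for the well-known N-X-S/T sequon (X being any residue except proline), a consensus motif not stated above. The minimal Python sketch below reports every Asn in such a motif; note that a sequon marks a possible site only, since not every sequon is actually glycosylated. `n_glyc_sequons` is a hypothetical name.

```python
import re


def n_glyc_sequons(protein: str) -> list[int]:
    """0-based positions of Asn residues in N-X-S/T motifs where X is not Pro.

    The zero-width lookahead lets overlapping motifs (e.g. "NNST") each be reported.
    """
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", protein)]
```

For "ANVSNPTNGT", the sketch returns `[1, 7]`; the Asn at position 4 is skipped because it is followed by proline.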
The protein is shown as a line, and modifications are indicated by the symbols. The two columns show enzymatic digestion by two different enzymes to cover as much as possible of the protein sequence with peptides in the preferred mass range for MS analysis (500–3,000 Da).
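The benefit of digesting with two enzymes can be sketched by computing sequence coverage from peptides that fall in the 500–3,000 Da window. The Python sketch below assumes trypsin (after Lys/Arg, not before Pro) and Glu-C simplified to cleave only after Glu; the figure does not name its enzymes, so this pairing, like the function names, is an assumption for illustration. Masses are monoisotopic and unmodified.

```python
import re

# Monoisotopic residue masses (Da); a peptide's neutral mass adds one water.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

# Assumed cleavage rules, written as zero-width split patterns.
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after Lys/Arg, not before Pro
    "GluC": r"(?<=E)",  # simplified: after Glu only
}


def peptides_in_range(protein: str, rule: str, low: float = 500.0, high: float = 3000.0):
    """(start, end) spans of digest peptides whose neutral mass lies in [low, high]."""
    spans, start = [], 0
    for piece in re.split(rule, protein):
        mass = sum(RESIDUE[aa] for aa in piece) + WATER
        if piece and low <= mass <= high:
            spans.append((start, start + len(piece)))
        start += len(piece)
    return spans


def coverage(protein: str, spans) -> float:
    """Fraction of residues covered by at least one span."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    return len(covered) / len(protein)
```

For a repetitive sequence such as "GKGKGKGKGKGK", trypsin yields only "GK" dipeptides far below 500 Da (no coverage), while the simplified Glu-C rule leaves the whole ~1129 Da sequence intact, so combining both digests covers what one enzyme alone misses.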