Transcript Document
In-depth Analysis of Protein Amino Acid Sequence and PTMs with High-resolution Mass Spectrometry
Lian Yang2; Baozhen Shan1; Bin Ma2
1Bioinformatics Solutions Inc, Canada
2University of Waterloo, Canada

Protein sequence analysis
• Problem
Complete protein sequence coverage
o antibody confirmation
o biomarker discovery
Database search software alone is insufficient

Protein sequence analysis
• Possible reasons for incomplete coverage
• “non-database” peptides
o unexpected modifications
o mutated residues
o novel peptides
• database errors
• Meanwhile
A large number of high-quality spectra are not matched.

Proposed workflow for in-depth analysis
• A workflow to identify both the database and “non-database” peptides
• Objective
• Maximize protein sequence coverage
• Explain more high-quality MS/MS spectra

Proposed workflow for in-depth analysis
• Workflow: Multiple enzyme
• Multiple protein digests with different enzymes
• High accuracy MS for both precursor and fragment ions

Proposed workflow for in-depth analysis
• Workflow
• Identify de novo sequence tags
• Reveal a set of high-quality spectra
PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom. 2003;17(20):2337-42.

Proposed workflow for in-depth analysis
• Workflow
• Identify database peptides.
• Database search result validated by de novo tags
• Reveal a set of confident proteins
PEAKS DB: De Novo sequencing assisted database search for sensitive and accurate peptide identification. Mol Cell Proteomics 2012; 11:10.1074, 1–8.

Proposed workflow for in-depth analysis
• Workflow
For input spectra with
+ highly confident de novo tags
- no significant database matches
• Identify peptides with unexpected modifications
• Peptides from the set of confident proteins are “modified” in silico by trying all possible modifications in UNIMOD.
• The search is sped up by de novo tags
PeaksPTM: Mass spectrometry-based identification of peptides with unspecified modifications. Journal of Proteome Research 10.7 (2011): 2930-2936

Proposed workflow for in-depth analysis
• Workflow
For input spectra with
+ highly confident de novo tags
- no significant database matches
• Identify peptides with mutations, such as residue insertion, deletion, and substitution.
• Screen the protein database to find short sequences similar to de novo tags
• Use both the de novo tags and the database sequence to reconstruct the most probable sequences that match the spectrum
SPIDER: software for protein identification from sequence tags with de novo sequencing error. J Bioinform Comput Biol. 2005 Jun;3(3):697-716.

Proposed workflow for in-depth analysis
• Workflow
Unassigned de novo sequence tags are reported as possible novel peptides

Proposed workflow for in-depth analysis
• Result integration

In-depth analysis of BSA
Test the workflow with the standard bovine serum albumin
• Sample
o Pure ALBU_BOVIN from SIGMA
o 3 digests with Trypsin, LysC, GluC.
o LC-MS/MS with Thermo LTQ-Orbitrap XL.
• Workflow
o Workflow implemented in PEAKS 6
o 3 digests in one project
o Searched database: Swiss-Prot
[Diagram: Trypsin, LysC, and GluC digests -> LC-MS/MS -> Workflow]

Result
• More PSMs are identified in each additional step:
[Figure: 5,152 MS/MS spectra; successive steps yield 1,737 PSMs, 906 PSMs, and 44 PSMs; 38 MS/MS spectra; filtered at 1% FDR; PEAKS ALC score > 70%]
1,737 -> 2,687 PSMs

Result
• BSA coverage
[Bar chart: Trypsin + PEAKS DB 87%; Proposed workflow 96%]
The uncovered 4% is in the protein N-terminal region, which is most likely cleaved off and not in the purchased sample1.
1specific binding site (Asp-Thr-His-Lys) for Cu(II) ions. T. Peters Jr., F.A. Blumenstock. J. Biol. Chem., 242 (1967), p. 1574

Result
• Contaminants
• Identified with at least 3 unique peptides.
– Human keratin proteins (K2C1_HUMAN and K1C_HUMAN)
– Bacterial protein (SSPA_STAAR)
– Trypsin (TRY1_BOVIN)

Result
• PTMs
• Unsuspected modifications identified by PTM search
– Three PTMs specified in database search
» Carbamidomethylation (C)
» Oxidation (M)
» Deamidation (NQ)

Result
• Mutation
• 214th amino acid A -> T
• Brown 1975, Fed. Proc. 34:591

Result
• Unexplained de novo tags
• Might be…
– Novel peptides outside of the searched database
KK.QTALVELLK.HK
     |||||||
   DPALVELLKK

Summary
• A software workflow proposed for in-depth protein sequence analysis
• Found many things in a “pure” sample
– Contaminants
– Unsuspected PTMs
– Mutations
• Improved protein sequence coverage
– BSA coverage: 87% -> 96%
• Explained more high-quality MS/MS spectra
– Identified MS/MS spectra: 1,737 -> 2,687

Q/A
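
Supplementary sketch: the coverage figures quoted in the summary are, under the usual definition, the percentage of protein residues covered by at least one identified peptide. A minimal Python sketch of that calculation, pooling peptides from more than one digest, could look like the following. The function name `sequence_coverage`, the toy protein, and the peptide lists are illustrative assumptions; they are not part of PEAKS and are not the BSA data from the talk.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one peptide.

    Peptides are matched by exact substring search, so this sketch ignores
    modifications and mutations (an assumption made for simplicity).
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


# Hypothetical toy data: a 20-residue "protein" and two peptide sets
# standing in for two different enzyme digests.
protein = "ACDEFGHIKLMNPQRSTVWY"
digest_a = ["ACDEFGHIK", "LMNPQR"]
digest_b = ["STVW"]

print(f"{sequence_coverage(protein, digest_a):.0%}")             # 75%
print(f"{sequence_coverage(protein, digest_a + digest_b):.0%}")  # 95%
```

Pooling digests can only add covered residues, never remove them, which is one way to read the motivation for the multiple-enzyme step.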