Transcript Slide 1
Protein Sequencing Research Group (PSRG): Results of the PSRG 2011 Study: Sensitivity Assessment of Edman and Mass Spectrometric Terminal Sequencing of an Undisclosed Protein

H.A. Remmer1, J.S. Smith2, W. Sandoval3, B. Xiang4, K. Mawuenyega5, D. Suckau6, V. Katta3, J.J. Walters7, P. Hunziker8
1University of Michigan, Ann Arbor, MI, United States; 2University of Texas Medical Branch, Galveston, TX, United States; 3Genentech, Inc., South San Francisco, CA, United States; 4Monsanto Company, St. Louis, MO, United States; 5Washington University School of Medicine, St. Louis, MO, United States; 6Bruker Daltonics, Bremen, Germany; 7Sigma-Aldrich, St. Louis, MO, United States; 8University of Zurich, Zurich, Switzerland

INTRODUCTION
Establishing the N-terminal sequence of intact proteins plays a critical role in biochemistry and drug development. Edman degradation and top-down and bottom-up mass spectrometry methods for N-terminal sequence analysis have been used for that task. In this study, we set out to determine how well these sequencing techniques deal with various sample formats and to assess their sensitivity. For the 2011 study, the PSRG distributed three kinds of sample sets (designated A, B, or C) of 3 tubes each. Each tube contained the same artificial recombinant (unknown) protein in varying amounts and formats (see table below). Participants chose which of the three sample sets, or any combination of sets, they would like to receive. Participants were given the following information: (a) the protein MW is ~52 kDa; (b) the sequence is NOT in a public database; (c) tube 1, with the lowest sample amount, contains ~5 pmol protein in the selected format; (d) a copurified E. coli protein at <20 kDa may be present in Sample Set A, but is of no interest to the current study; and (e) Sample Set A is soluble in 0.1% TFA, 0.1% TFA/20% acetonitrile, or 25 mM AMBIC. Study participants were directed to a website to anonymously upload sequences and supporting data.
The analysis of the results of the 2011 study focuses on the length and accuracy of the sequence calls as a function of increasing amounts of protein. A total of 38 participants requested 74 sample sets.

REFERENCE
T. Kishimoto, J. Kondo, T. Takako-Igarashi and H. Tanaka. A novel method for analyzing protein terminals. Poster presented at the ASMS conference, Salt Lake City, 2010.

ACKNOWLEDGEMENTS
Dr. Robert English (University of Texas Medical Branch) for accumulation and anonymization of data; Sigma-Aldrich for donation of the study sample; the Executive Board of the ABRF for support and scrutiny of the study proposal; Dr. Jack Simpson (National Cancer Institute, Frederick, MD) for functioning as liaison to the ABRF Executive Board; and participating labs for analyzing samples and returning data.

Study Results: Edman Sequencing
Rows list, where reported: participant, instrument, solvent, initial yield, repetitive yield, and the residues called from cycle 1 onward.

Sample A1
Participant 004
L R V F D E F K P L V E E P Q N L I R V F D E F K P L V K P E E P Q N L I R V F
P L V E X P Q N L I
0.1% TFA/30% IPA; 2.8; 93.60%; G A L X V F D E F K
Procise 494; 0.1% TFA/20% ACN; 0.7; 95.60%; G A L R V F D E F K
Procise 494HT; 0.1% TFA; na; na; note 1
PSRG002; Procise 494HT; 0.1% TFA/50% ACN; 3.7; 91.30%; G A L R V F D E F K P L V E E P Q N L I R V F D

Sample A2
Participant 020; Procise 494HT; na; na; na; X A L X V F D E F K
Participant 024; Procise 494; 0.1% TFA/20% ACN; 1.2; 86.10%; G A L R V F D E F K
Participant 058; Procise 494HT; 0.1% TFA; na; note 1
PSRG002; Procise 494HT; 0.1% TFA/50% ACN; 9.4; 94.80%; G A L R V F D E F K P L V E E P Q N L I R V F D E F K P L V K P

Sample A3
Participant 024; Procise 494; 0.1% TFA/20% ACN; 7.8; 98.60%; G A L R V F D E F K
Participant 058; Procise 494HT; 0.1% TFA; na; na; note 1
PSRG002; Procise 494HT; 0.1% TFA/50% ACN; 29.3; 95.60%; G A L R V F D E F K P L V E E P Q N L I R V F D E F K P L V K P E E P Q N L I R V F

Sample B2
Participant 020; Procise 494HT; na; na; na; X A L R V F D E F K
P L V K k e e X q n l i; note 2

Sample C1
Procise 494; na; na; na; G
Participant 014; Procise HT; na; 0.5; 94.60%; X A L X V F D E F K X L
Participant 016; Procise 494cLC; na; na; na; X A L R V F D E F K P L V E E P Q N L I R V F D E R E P
Participant 006
Participant 036

CONCLUSION
Edman degradation was successfully employed in this study to obtain N-terminal sequence information of an unknown protein, not present in public databases, independent of the sample format. The most frequently selected sample format was the PVDF membrane, followed by the lyophilized sample. A slight dependency between concentration and read length was found, but intra-group variation was much higher. Bottom-up work applied to the study samples typically yielded sequences of another protein; however, the correct sequence was called as well. One participant also called the 70 C-terminal residues. In this study, top-down sequencing was attempted by MALDI-ISD from samples A without any success. Investigation of the sample by the PSRG showed that the protein amount in samples A (lyophilized) accessible to the analysis was only ~5% of what was determined by AAA, potentially due to poor solubility. Only much higher amounts of sample A than distributed allowed de novo sequences to be retrieved, and several bacterial heat shock proteins (15-16 kDa range) were identified in that sample after LC protein separation.

Taken together, Edman sequencing, with its strict dependency on the sample material, operated robustly and reliably, in particular when the sample was applied to a membrane after SDS-PAGE. All mass spectrometric methods, if not linked strictly to an intact protein MW, can easily identify "non-target" sequences. Here the solubility and the homogeneity of the sample play a much greater role, in particular for the top-down approaches, which have the highest requirements for sample amount and quality; this should be particularly recognized in future studies.

The PSRG prepared the 3 sample sets for distribution as follows: The study protein (95% purity by SEC) was dissolved in 50% acetonitrile/0.1% TFA and lyophilized, and the protein content was determined by AAA. The sample was then aliquoted based on protein content to achieve the desired amounts (5 pmol, 15 pmol, and 45 pmol, respectively). Samples A were lyophilized; samples B and C were subjected to SDS-PAGE (B) and subsequent electroblotting (C). Upon test analyses for validation, the presence of contaminating proteins was acknowledged and found to mimic a client sample in a core facility setting. The validation analysis by ISD was performed on an UltrafleXtreme MALDI-TOF/TOF instrument after samples were shipped and showed that much less protein was available for analysis than anticipated from the original protein quantification. Participants obtained instructions for dissolution of samples in set A. However, valid ISD was only obtained for nominal 100 pmol of the sample. The participants were asked to use their code number to report their data in Survey Monkey (www.surveymonkey.com).
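To put the solubility finding in numbers, the following minimal sketch scales the nominal aliquot amounts by the ~5% accessible fraction reported for samples A. It assumes that this fraction applies uniformly to every nominal amount, which the study does not state; the function name and the constant are illustrative only.

```python
# Assumption: the ~5% accessible fraction found in the PSRG validation of
# samples A applies uniformly to every nominal amount (not stated in the study).
ACCESSIBLE_FRACTION = 0.05


def accessible_pmol(nominal_pmol, fraction=ACCESSIBLE_FRACTION):
    """Estimate the protein amount (pmol) actually available for analysis."""
    if nominal_pmol < 0:
        raise ValueError("nominal amount must be non-negative")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return nominal_pmol * fraction


# Nominal set A aliquots, plus the nominal 100 pmol needed for valid ISD.
for nominal in (5, 15, 45, 100):
    print(f"{nominal:>3} pmol nominal -> ~{accessible_pmol(nominal):.2f} pmol accessible")
```

Under this assumption, even the largest distributed aliquot (nominal 45 pmol) would leave only about 2.25 pmol accessible, compared with about 5 pmol for the nominal 100 pmol sample that gave valid ISD.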
STUDY METHODS: TYPICAL PARTICIPANT METHODS

Edman Degradation: Most participants performed the analysis on a Procise 494HT sequencer using standard reagents and protocols. The majority of participants used the sample as provided. For sample set C, the PVDF membrane was loaded directly onto the instrument; for set A, the sample was dissolved in 0.1% TFA containing 20%-50% acetonitrile and applied onto a ProSorb filter. Initial yields and repetitive yields were reported (see table).

Bottom-Up MS Method: Sample sets A and B were used for this analysis; samples A were dissolved in ammonium bicarbonate and digested, usually with trypsin and 1-2 additional enzymes. The analysis was mostly performed on an LTQ or LTQ Orbitrap, and the MS/MS data were subjected to a database search using Thermo Proteome Discoverer, or manual de novo Mascot searches were performed.

Top-Down MS Method: The majority of participants used an Ultraflex MALDI-TOF/TOF instrument and performed in-source decay (ISD) using 1,5-diaminonaphthalene (DAN) or 2,5-dihydroxybenzoic acid (DHB) as matrix.

Study Results: Edman Sequencing (continued)

Sample C1 (continued)
1 G
Participant 024; Procise 494; na; 0.5; 95.80%; G A L V F D E F K
Procise 494HT; na; 0.6; 90.80%; X X L X V F X E F X P L V E
Participant 040; Procise; na; na; na; X A L X V F D E F K X L V E
Participant 058; Procise 494HT; na; 1.7; 94.80%; X A L R V F D E F K P L V E
PSRG001; Procise 494cLC; 93.00%; G A
PSRG002
g L R V F D E E E P Q N L I R V F D E F
Procise 494HT; na; 2.3; 88.00%; A L R V F D E F X P X V

Sample C2
Participant 006; Procise 494; na na 0.6 na na; G A L r V F D E F f K
Participant 014; Procise HT; na; 1.2; 96.40%; G A L R V F D E F K P L V E E P Q N L I R V F D X F K
Participant 016; Procise 494cLC; na; 3.5; P L V E E P Q N L I R V F D E F K
Participant 020; P L V E E P Q N L I X F
95.50%; G A L R V F D E F K
Procise 494HT; na; na; na; G A L R V F D E F K
Participant 024; Procise 494; na; 1.4; 99.20%; G A L R V F D E F K
Participant 036; Procise 494HT; na; 2.3; 92.70%; G A L X V F D E F K
Participant 058; Procise 494HT; 95.80%; L F
PSRG001; Procise 494HT
PSRG002
na; 4.2; G A R V D E
95.70%; G A L R V F D E I R V F D E F K P N X X P E E X Q X N D I G
na; 3.5; 92.20%; G A L R V F D E F K P L V E E P Q N L

Sample C3
Participant 006; Procise 494; na; na; na; G A L R V F D E F K P L V E E p Q
Participant 024; Procise 494; na; 10.8; 97.70%; G A L R V F D E F K
Participant 036; X E D Q N F K P V
Procise 494HT
3 K L
na; F P P L V L E V E E E P Q N L
Procise 494HT; na; 4.8; 93.60%; G A L X V F D E F K P L V E E P Q N L I X F
Participant 040; Procise; na; 11.5; 93.50%; G A L R V F D E F K P L V E E P Q N L I R V F D E F K P L V K P E
Participant 058; Procise 494HT; na; 18.1; 96.30%; G A L R V F D E F K P L V E E
PSRG001; Procise 494HT; na; 5.5; 96.60%; G A L R V F D E F K P L V E E P Q N L I R V F D E E P N L H P X E
na; 11.3; 89.80%; G A L R V F D E F K P L V E E P Q N L I R V F D
6 F 7 D
PSRG002; Procise 494HT

Note 1: no sequence detected.
Participant suspects the sample is not soluble in 0.1% TFA.
Note 2: a total of 50 amino acid residues were sequenced.
Legend: a correct N-terminal call is color coded; a tentative call is denoted with a lower-case letter; no call is marked with "X"; a wrong call is denoted with a letter that is not color coded.

Study Results: Bottom-Up Sequencing

Sample A1
Participant #040; sample processing: 10 mM AmBiC, trypsin and Glu-C; LC: N/A

Sample A2
Participant #048; sample processing: trypsin and chymotrypsin; LC: not provided
Participant #034; trypsin, Glu-C and Lys-C; N/A
100 mM AmBiC, Lys-C and Lys-N; Eksigent NanoLC-2D

Samples B1, B2, B3
Participant #048*; sample processing: trypsin and chymotrypsin; LC: not provided; LTQ Orbitrap Velos ETD
Trypsin; not provided; LTQ
PSRG003
Participant #026
MS: Ultraflex TOF/TOF; LTQ

N-terminal and C-terminal sequence calls by position
N-terminus: 1 G; 2 A; 3 L; 4 R; 5 V; 8 E; 9 F; 10 K; 11 P; 12-26: L V E E P Q N L I R V F D E F; 27 K
C-terminus: 440-449: E T N L Y F Q G D D; 450 D; 451-465: D K A L V A L H V H H H H H H
V K
17 note 2
V A

MS: 4700 Proteomics Analyzer; terminal sequence (de novo)
Position, N-terminus, C-terminus: 1 G X; 2 A X; 3 L X; 4 R X; 5 V X; 6 F X; 7 D X; 8 E X; 9 F X; 10 K X; 11 P X; 12 L X; 13 V X; 14 E X; 15 E X; 16; 17 note 1

MS: LTQ Orbitrap Velos ETD; terminal sequence (de novo)
Position, then N-terminus and C-terminus calls for three result columns: 1 X X X X G; 2 X X X X A; 3 X X X X L; 4 X X X X R; 5 X X X X V; 6 X X X X F; 7 X X X X D; 8 X X X X E; 9 X X X X F; 10 X X X X K; 11 X X X X X; 12 X X X X X; 13 X X X X X; 14 X X X X X A; 15 X X X X X L
L H V H H H H H H

Terminal sequence (de novo)
1 AcM E X X
F T X X
M N X X
D L X X
D Y X X
F F X X
A Q X X
A G X X
F D V D E D K D A A V L K V F A V L V H P V R H A H L H E H L H L H F

* Participant #048 sequenced more than 200 amino acids by manual spectra interpretation.
Note 1: Participant 040 also sequenced by Edman degradation and had the opportunity to search MS/MS data for the correct N-terminal peptide.
Note 2: Participant PSRG003 used Lys-C and Lys-N in combination according to a published procedure for N-terminal sequencing (see reference section).
Legend: correct N-terminal and C-terminal calls are color coded; no call is marked with "X"; an incorrect call is denoted with a letter that is not color coded.

Study Results: Top-Down Sequencing

Sample A
Rows list: participant; instrument; matrix and methods; sample prep.
Participant 016; UltraFlex III; DHB, DAN; MALDI-ISD; used sample as provided
Participant 028; Ultraflex II, Flex control; DHB, DAN; intact MW, ISD; C4 ZipTip, eluted with 75% ACN, 0.1% TFA
Participant 002; information not provided; no details provided; intact MW; used sample as provided
Participant 034; Ultraflex TOF/TOF; DAN; ISD; used sample as provided
PSRG001; 4800 MALDI-TOF/TOF; DAN; ISD/T3; Cl-MeOH precip., reconst. in 0.1% TFA

Results
None of the participants were able to call an N-terminal or C-terminal sequence when analyzing sample set A. Investigation of the sample by the PSRG showed that the protein amount in samples A (lyophilized) accessible to the analysis was significantly less than was determined by AAA, due to poor solubility of the sample in aqueous solvents alone. The validation analysis by ISD was performed on an UltrafleXtreme MALDI-TOF/TOF instrument after samples were shipped. Participants obtained instructions for dissolution of samples in set A. However, valid ISD was only obtained for nominal 100 pmol of the sample after LC purification.
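The call conventions used in the result tables (a lower-case letter for a tentative call, "X" for no call, a mismatching letter for a wrong call) can be tallied against a reference sequence with a short script. The sketch below is illustrative only: it assumes that the 42-residue Edman read reported by PSRG002 for sample A3 is the correct N-terminal sequence, and the names `REFERENCE` and `score_calls` are hypothetical, not part of the study.

```python
# Assumption: the longest Edman read in the tables (PSRG002, sample A3,
# 42 residues) is taken as the reference N-terminal sequence.
REFERENCE = (
    "GALRVFDEFK"  # cycles 1-10
    "PLVEEPQNLI"  # cycles 11-20
    "RVFDEFKPLV"  # cycles 21-30
    "KPEEPQNLIR"  # cycles 31-40
    "VF"          # cycles 41-42
)


def score_calls(calls, reference=REFERENCE):
    """Tally space-separated residue calls, starting at cycle 1.

    correct   : upper-case letter matching the reference
    tentative : lower-case letter matching the reference
    no_call   : "X"
    wrong     : anything else, including calls beyond the reference length
    """
    counts = {"correct": 0, "tentative": 0, "no_call": 0, "wrong": 0}
    for cycle, call in enumerate(calls.split()):
        if call == "X":
            counts["no_call"] += 1
        elif cycle >= len(reference):
            counts["wrong"] += 1
        elif call == reference[cycle]:
            counts["correct"] += 1
        elif call.islower() and call.upper() == reference[cycle]:
            counts["tentative"] += 1
        else:
            counts["wrong"] += 1
    return counts


# A 10-cycle read with one no-call at cycle 4.
print(score_calls("G A L X V F D E F K"))
# A 17-cycle read with a tentative call at cycle 16.
print(score_calls("G A L R V F D E F K P L V E E p Q"))
```

The read length of a row is simply the number of calls it contains, so the same tally can be used to compare read length and accuracy across the 5, 15, and 45 pmol samples.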