PROTEOMICS WORKSHOP COMPUTER EXERCISES
MASS SPECTROMETERS
EXERCISE #1: Observe how one type of mass spectrometer works
Select this link to observe how a trapping instrument works. Electrospray source, Ion trap analyzer: LCQ and ESI Ion Trap System
ON-LINE RESOURCES
EXERCISE #2: Explore some useful on-line websites
General Information
NCBI http://www.ncbi.nlm.nih.gov/
Expasy http://ca.expasy.org/
Protein Data Bank (PDB) http://www.rcsb.org/pdb/home/home.do
Tools available on line
Expasy Tools http://ca.expasy.org/
Protein Prospector http://prospector.ucsf.edu/
XTandem Search Algorithm http://www.thegpm.org/
more data tools http://www.proteomecommons.org/
Database download
Expert Protein Analysis System (ExPASy) SwissProt & trEMBL http://ca.expasy.org/sprot/download.html
European Bioinformatics Institute (EBI) http://www.ebi.ac.uk/
NCBI for GenBank http://www.ncbi.nlm.nih.gov/Ftp/
PEPTIDE SEQUENCING
Exercise #3: Peptide Sequencing
Sequence a peptide "de novo"
PEPTIDE MASS MAPPING
Exercise #4: Peptide Mass Mapping
- data: Open the Excel spreadsheet provided in the link excel_spreadsheet. Use the first 4 tabs.
- program: MS-Fit, found at Protein Prospector
- copy and paste the m/z values from the spreadsheet into the mass values field in MS-Fit
- consider varying options: database, mass tolerance
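To see what a peptide mass mapping search does with the pasted values, the matching step can be sketched in a few lines of Python. This is only an illustration, not the MS-Fit implementation: it assumes the spreadsheet values are singly charged [M+H]+ ions, a tryptic digest with no missed cleavages, and no modifications. The function names and the 50 ppm default tolerance are made up for this sketch.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the residue sum of an intact peptide
PROTON = 1.007276   # charge carrier for a singly protonated ion


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Theoretical monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_peaks(observed, sequence, tol_ppm=50.0):
    """Pair each observed m/z with tryptic peptides within tol_ppm (hypothetical default)."""
    matches = []
    for pep in tryptic_digest(sequence):
        theo = mh_plus(pep)
        for mz in observed:
            if abs(mz - theo) / theo * 1e6 <= tol_ppm:
                matches.append((mz, pep, round(theo, 4)))
    return matches
```

Widening tol_ppm lets more peptides match each peak, which is the same trade-off you see when you vary the mass tolerance option in MS-Fit.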
PEPTIDE MS/MS IDENTIFICATION
Exercise #5: MS/MS Identification
- data: Open the Excel spreadsheet provided in the link excel_spreadsheet. Use the tab labelled fragments.
- program: MS-Tag, found at Protein Prospector
- copy and paste the m/z values from the spreadsheet into the mass values field in MS-Tag
- consider varying options: database, mass tolerance
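The fragment masses that an MS/MS search compares against can be computed by hand for any candidate peptide. The sketch below lists singly charged b and y ions for an unmodified peptide. It is an illustration under those assumptions, not the MS-Tag code, and the function name is made up.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b, y): singly charged b1..b(n-1) and y1..y(n-1) m/z lists."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    # b ions: N-terminal residues plus one proton.
    b, running = [], PROTON
    for m in masses[:-1]:
        running += m
        b.append(running)

    # y ions: C-terminal residues plus water plus one proton.
    y, running = [], WATER + PROTON
    for m in reversed(masses[1:]):
        running += m
        y.append(running)

    return b, y


if __name__ == "__main__":
    b_ions, y_ions = fragment_ions("PEPTIDE")
    for i, (b_mz, y_mz) in enumerate(zip(b_ions, y_ions), start=1):
        print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

Because L and I have the same mass, and K and Q differ by only about 0.036 Da, fragment masses alone cannot always distinguish them. This is one reason the mass tolerance setting matters.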
Exercise #6: MS/MS Identification
Use the method in Exercise #5 to find protein identities for additional spectra. Hard copies of the spectra will be distributed.
Exercise #7: XTandem protein identification
Identify proteins using the XTandem program. The program has been installed on each computer.
- program location: C:\program files\tandem
- data location: C:\data (*.dta files)
- Open the tandem\bin folder (C:\program files\tandem\bin). Four "input.xml" files have been prepared to search the data. Open one of these files in Notepad and note the location of the original data and the destination of the output (data location: C:\data). The four prepared files are:
  single-spectrum data: inputJK04_1560.xml, inputJK03_1471.xml
  multiple-spectra data: inputJK03.xml, inputLuci.xml
- Search a data file. Open a command prompt window and cd to C:\program files\tandem\bin (as a typing shortcut, type cd followed by a space, then drag and drop the folder icon from an open window). Type tandem inputJK04_1560.xml and press <enter>. If the command is not recognized, retype it as tandem.exe inputJK04_1560.xml and press <enter>. The program should report ‘loading spectra’, ‘spectra matching criteria = xx’ and ‘computing models’, then print a line of seemingly random letters and numbers to show that it is working. The command prompt reappears when the search is complete.
- View the output. Go to the www.thegpm.org website, click on the ‘genomes’ link at top left, then click on the ‘view saved xml data’ link at top left. Browse to select the XML file you just created (the *output.xml files in C:\data) and click the ‘view models’ link. This opens an HTML page of results.
- Search the four prepared data files and view the results.
- Make your own input files as follows: Go to the tandem\bin directory (C:\program files\tandem\bin). Open an input file and change the names of the input and output files, using the *.dta files available in the C:\data folder. Save the edited file under a new name: select "save as" and change the file type to "all files" so that it is saved as an xml file rather than a text file. Repeat the search and view steps described above.
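If you need many input files, the Notepad edit can also be scripted. The sketch below assumes the prepared files follow the usual XTandem input layout, in which the data file and the result file are given in note elements labelled "spectrum, path" and "output, path". Check one of the prepared files first, since the workshop copies may differ. The function name and the example paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def retarget_input(template_xml, spectrum_path, output_path):
    """Return a copy of an input file's XML text pointing at new data and output files.

    Assumes <note label="spectrum, path"> and <note label="output, path">
    elements are present; every other note is left unchanged.
    """
    root = ET.fromstring(template_xml)
    for note in root.iter("note"):
        label = note.get("label")
        if label == "spectrum, path":
            note.text = spectrum_path
        elif label == "output, path":
            note.text = output_path
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Minimal stand-in for a prepared input file (not copied from the workshop files).
    template = (
        '<?xml version="1.0"?>'
        "<bioml>"
        '<note type="input" label="spectrum, path">old.dta</note>'
        '<note type="input" label="output, path">old_output.xml</note>'
        "</bioml>"
    )
    # Hypothetical file names, for illustration only.
    print(retarget_input(template, r"C:\data\example.dta", r"C:\data\example_output.xml"))
```

Write the returned text to a new .xml file in the tandem\bin folder, then run the search and view steps as before.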
DATA FILTERING
Exercise #8: A Scaffold file (.sfd) will be provided.
Examine the filtering capabilities of Scaffold. The program has been installed on each computer.
Program location: C:\program files\scaffold
Data location: C:\workshop (test.sfd)
Open the C:\workshop folder and double click on test.sfd. Test.sfd is a Scaffold output file created using the data from an experiment with two MudPITs.
1). First note how many identified proteins there are -- this can be seen in the Bio View column. Now go to the pull down menus at the top of the screen and relax the filtering criteria (in this case statistical values and the number of peptides found) by setting the minimum protein probability to 20%, the minimum number of peptides to 1, and the minimum peptide probability to 20%. Now note the number of identified proteins.
2). To see if there are any mitochondrial proteins identified in LS5 that are not in LS2, click on the word mitochondrion in the cellular compartment columns at the top of the display. You may have to click a couple of times to get the view you want. You can click on the column headings to get different views of the data.
3). Next double click on the Cathepsin B precursor row to get a view of the sequence coverage.
4). To view the spectra for a peptide click on one of the peptides in the top right portion of the display. The b and y ions are color-coded. Black peaks don't have an assignment.
5). Click on the publish button to look at the display. This view gives the specifics needed for writing up your results.
6). Lastly, click on the statistics button to look at that display.
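The effect of the three thresholds changed in step 1 can be pictured with a small Python sketch. It is not Scaffold's algorithm. It simply assumes a protein is kept when its probability meets the protein threshold and enough of its peptides meet the peptide threshold. The data structure, the default thresholds and the toy values are invented for illustration and do not come from test.sfd.

```python
def filter_proteins(proteins, min_protein_prob=0.95, min_peptides=2,
                    min_peptide_prob=0.95):
    """Return names of proteins passing all three thresholds (hypothetical defaults)."""
    kept = []
    for protein in proteins:
        # Count only the peptides that individually pass the peptide threshold.
        good = [p for p in protein["peptide_probs"] if p >= min_peptide_prob]
        if protein["protein_prob"] >= min_protein_prob and len(good) >= min_peptides:
            kept.append(protein["name"])
    return kept


# Toy values, invented for this sketch.
TOY = [
    {"name": "protein A", "protein_prob": 0.99, "peptide_probs": [0.98, 0.97, 0.60]},
    {"name": "protein B", "protein_prob": 0.50, "peptide_probs": [0.40]},
    {"name": "protein C", "protein_prob": 0.10, "peptide_probs": [0.15]},
]

if __name__ == "__main__":
    print("strict: ", filter_proteins(TOY))
    print("relaxed:", filter_proteins(TOY, 0.20, 1, 0.20))
```

Relaxing the thresholds lengthens the list, but the added entries are lower-confidence identifications. Compare this with the change in protein count you observed in step 1.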