'96 BC/BP 378

Week 5

Working with proteins - section 3. You will use molecular modelling software to create amino acid and small peptide three dimensional structures; visualize protein secondary structural elements; do physical measurements on proteins of interest; and color protein structures for various purposes.

Author:

Susan Jean Johns

Protein Background Information

Biochemistry is the study of the molecular basis of life. All molecules have a three-dimensional conformation associated with their respective formulas. While all molecules are composed of atoms, these atoms are not rigid building blocks, but rather respond to their local environment, thereby resulting in a number of different conformations for the same formula. Working from a three-dimensional perspective can provide a great deal of insight into the chemical basis of life.

Proteins are composed of amino acids. While an individual amino acid can be considered a structural element, it is not a rigid one and the same amino acid will assume different conformations depending on its location within a given protein structure. Protein folding starts as a local event along a linear sequence whose final result depends on the interaction of all parts of the molecule with one another.

Proteins have primary structure, the order of the individual amino acids that comprise the protein and the location, if any, of its disulfide bonds. This is the type of information that you have been working with so far. Proteins also have a secondary structure. This refers to the spatial arrangement of amino acid residues that are near one another in the primary sequence. Such arrangements can produce periodic structures due to steric interactions and result in helixes, sheets and turns. This type of data is conformational and is determined by x-ray or NMR analysis of the protein. The tertiary structure of a protein refers to the spatial relationship of amino acid residues that are far apart from one another in the primary sequence. When a protein is composed of more than one primary sequence, or chain, there are additional relationships among these chains, or sub-units, known as quaternary structure.

Proteins, therefore, have two different types of available data: data based on their primary sequence, and data derived from determining their actual structural conformation through x-ray crystallography or NMR. Conformational data as well as primary sequence data have been collected into databases for use by the molecular biology community. These databases can be used to explore the structural side of a protein.

By understanding the structure of a protein, its relationship to other proteins of the same or different families can be visualized and explored. Models can be developed of unknown structures based on known ones. This is called homology modelling. Exploring the structural side of proteins is the theme of this week's laboratory session.


Background Information on Structural Data Entry

In order to work with a structure on the computer, its relevant information has to be entered in a form the machine can recognize. When that data doesn't exist in any other source, it is necessary for the user to input the data. Here at WSU, the VADMS Computing Resource supports the use of the MacroModel program for molecular modelling tasks. Therefore, in order to have structural data in a format that can be used by this software, it should be entered either directly via the program's input mode or converted from x-ray data formats that its auxiliary software can work with. For this course all the data conversions necessary to have x-ray data available are done for you. All you have to do is enter the necessary name.


MacroModel Program Background Information

Before attempting to enter a structure into the computer with MacroModel, there are some facets of the program you should be familiar with.

1) The following is the initial screen of the MacroModel program.



When the program is started the initial screen shows this image with the INPUT and ORGANI buttons colored green. MacroModel is set up with a working display window surrounded on the bottom and right-hand side of the screen by a series of option buttons, and an area for program messages to the user (or user input window) at the top.

The option buttons are used to communicate user wishes to the program. To activate a button, move the cursor (or cross-hair) to the button's location and press the spacebar. Cursor control on your machine is with the mouse. An activated option is colored green. The message area either informs the user of the status of an option with multiple selection possibilities, requests parameter input, or relates error messages.

2) There are two different types of option buttons, those which move the program to major functional areas, and those which set the parameters for a given function. The major function buttons are located at the bottom of the screen while parameter buttons are on the right-hand side.

3) The program assumes that all atoms drawn on the screen with the DRAW option are carbons until told otherwise. Atom types can be changed by choosing the desired atom from the list on the right-hand side of the screen, moving to the location in the structure where the change is required, and pressing the spacebar.

4) When entering double bonds with the DRAW option, it is best to choose the DRAW option again, move to the starting point of the double bond, press the spacebar and then move off to the ending point of the bond, pressing the spacebar when the cursor is in the desired position. The DRAW option is always active when the button is green, and interesting bonds can result if you do not reset the option.

5) Structures can be grown on the screen using the GROW option. Select GROW and then the unit from the listing that is desired. The purple box denotes the site of the next addition to the growing molecule. This can be changed by using the Orign button.

6) When you make a mistake and wish to remove a bond or an atom or perhaps the entire structure use the DELT button. This button can toggle three ways. Press the space bar slowly and deliberately to send the multiple signals. Pressing it once will allow the removal of a bond or an atom. Once this mode of deletion is selected, go over to the location of the atom to be removed, or the center of the bond to be removed with the cursor and press the space bar. Pressing DELT twice will allow the removal of a molecule from a screen when more than one molecule is shown. Once this mode has been selected, move the cursor over to one of the atoms of the molecule to be removed and press the space bar. The entire screen can be cleared by pressing the DELT button three times and responding with y to the question Confirm complete deletion (Y/N):.

7) The most common problem encountered in minimization (the process for turning art into a realistic structure) is forgetting to add all of the necessary hydrogens to complete the structure prior to the minimizing process. If a problem occurs during a minimization attempt, check to see that all of the hydrogens are in place. Hydrogens can be added using the H ADD button in the INPUT menu. It is necessary to select this button three times to add hydrogen to all of the molecules on the screen. A message will appear at the top of the screen stating what sort of addition is currently possible.

8) Minimization errors not cured by the addition of hydrogens require the examination of the generated mmod.err file. The information contained in this file gives the atom type, number, and type of problem encountered. To look at this file from within the program, select the TTY option and then type out the mmod.err file at the TTY prompt. Viewing this file will give the numbers of the problem atoms. Enter log and press ENTER when the TTY prompt reappears, to return to the program. To see how the structure is numbered, select ANALYZ and the NUM option from this menu. It may be necessary to re-draw the section of the molecule where the error occurred, or it could be that the MM2 force field lacks the parameters needed to minimize the desired structure. If redrawing the suspect section doesn't correct the problem, contact Susan Johns for assistance.


Background Information on the Minimization Process

The minimization process takes an image that has been entered into the working window of the MacroModel program and converts it into a structure in which the distances between atoms, the angles between atoms and other structural relationships are realistic and match experimentally determined parameters. This process makes use of files known as force fields, which are really tables of the various parameters needed to make such calculations. A number of different force fields can be used, and each has strengths and weaknesses. Some are better at handling the types of interactions found in organic molecules; others work best with amino acids and peptides. Experience with running such processes determines which force field to use with what type of molecule, or if a combination of force fields should be used.
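To make the idea of a force field as a table of parameters more concrete, here is a minimal Python sketch. It is not how MacroModel stores its data: the names BOND_PARAMS and bond_energy are invented for illustration, the numbers are placeholder values rather than real MM2 or AMBER parameters, and only a simple harmonic bond-stretch term is shown.

```python
# Illustrative only: a toy "force field" holding bond-stretch parameters.
# The numbers are placeholders for the example, NOT real MM2 or AMBER values.

# (atom type 1, atom type 2) -> (ideal length in angstroms, force constant in kJ/mol/A^2)
BOND_PARAMS = {
    ("C", "C"): (1.53, 1300.0),
    ("C", "N"): (1.47, 1400.0),
    ("C", "O"): (1.43, 1500.0),
}


def bond_energy(type_a, type_b, length):
    """Harmonic stretch energy 0.5 * k * (r - r0)^2 for one bond."""
    key = (type_a, type_b) if (type_a, type_b) in BOND_PARAMS else (type_b, type_a)
    if key not in BOND_PARAMS:
        # A table with no entry for this bond cannot score it at all.
        raise KeyError("no parameters for bond %s-%s" % (type_a, type_b))
    r0, k = BOND_PARAMS[key]
    return 0.5 * k * (length - r0) ** 2


if __name__ == "__main__":
    # A hand-drawn C-C bond that is too long costs energy; the ideal length costs none.
    print(bond_energy("C", "C", 1.80))
    print(bond_energy("C", "C", 1.53))
```

The KeyError branch mirrors, in a very loose way, the situation in which a force field has no parameters for part of a structure.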

Minimization ensures that all the various bond lengths, bond angles, and torsional values for the molecule being processed are within the expected parameter values. The time this takes depends on the size of the molecule: the more atoms in the molecule, the longer the minimization process takes. Once a molecule reaches a certain size, say over 20 atoms or so, it is best to create a batch process to do the actual calculations and allow you to move on to other computing tasks. The standard minimization run produces a molecule as it would exist in a vacuum. Once such a structure has been created, it can then be run again with changes made in the force fields to account for the effects of solvents (i.e., its environment).

When attempting a minimization, the first step is for the program to check and see if there are any problems with the structure as it appears on the screen. If there are, error messages will appear in the user input window. These need to be corrected before the minimization process can take place. Usually it is a simple problem like forgetting to add all the hydrogens needed by the MM2 force field or not having any hydrogens for the AMBER force field. Once the errors are corrected, the appearance of the following statement on the right-hand side of the screen tells you that the minimization is beginning: Iter Movt kJ/mol. The numbers in the Iter column should get larger, those under Movt should decrease and finally reach zero, and those below kJ/mol should get as small as possible. At times this will be a negative number. The program's default settings result in you being prompted after every 50 passes (or iterations) to see if you want the minimization process to continue or not. This can be changed if desired.
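The behaviour of the three columns can be imitated with a very small Python sketch. It is only an analogy, not MacroModel's algorithm: it minimizes one bond length by steepest descent, and the function name minimize_bond, the force constant, the ideal length and the step size are all assumed values picked for the example.

```python
# Toy minimization of a single bond length by steepest descent.
# k, r0 and step are made-up values for illustration only.

def minimize_bond(r_start, r0=1.53, k=1300.0, step=0.0002, tol=1e-6, max_iter=50):
    """Return (iteration, movement, energy) rows, loosely like Iter / Movt / kJ/mol."""
    r = r_start
    rows = []
    for iteration in range(1, max_iter + 1):
        gradient = k * (r - r0)        # dE/dr for E = 0.5 * k * (r - r0)^2
        move = -step * gradient        # take a small step downhill
        r += move
        energy = 0.5 * k * (r - r0) ** 2
        rows.append((iteration, abs(move), energy))
        if abs(move) < tol:            # movement has effectively reached zero
            break
    return rows


if __name__ == "__main__":
    print("Iter      Movt        kJ/mol")
    for iteration, movement, energy in minimize_bond(1.80):
        print("%4d  %10.6f  %12.6f" % (iteration, movement, energy))
```

In this toy case the energy can only approach zero. A real force field also contains attractive terms, which is why the energies reported by the program can be negative.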


Background Information on Structural Databases

As with primary sequences for proteins, you don't have to enter every peptide or protein structure that you are interested in. There are structural databases that contain this information. Here at WSU, the VADMS computing resource supports three databases that fall into this category. The first is NRL_3D, which, though a primary sequence database, only contains information on protein sequences with determined x-ray structures. The second is the Cambridge Structural Database (CSD). This database contains the x-ray structures of small organic molecules, including peptides. The last database is the Protein Data Bank (PDB). Data from x-ray and NMR sources are collected here on proteins, DNA, RNA and a few carbohydrate structures.


Background Information on Structural Formats

Structural data comes in different types of formats. Some modelling packages use different formats depending on the size of the molecule being worked with. Small molecules may be given as fractional coordinates, Cambridge (CSD) database entries, or in the format of a given software package, depending on their source. The data for large structures such as proteins and nucleotides usually come from x-ray crystallographic efforts and are available in some form of a PDB format.


Small Molecule Formats

The normal way of reporting the x-ray crystal data from small molecules is to give the crystal's cell parameters and then the fractional coordinates for the x, y and z values. Papers usually also give information on bond lengths and angles for the component atoms of the structure. In order to use this data as given, either the program must be able to accept data in this format, or the data should be converted into another format.
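As an illustration of what such a conversion involves, the following Python sketch turns fractional coordinates into Cartesian coordinates in angstroms. It assumes the common convention in which the a axis lies along x and the b axis lies in the xy plane; the function name fractional_to_cartesian and the example cell are hypothetical and are not taken from any particular package.

```python
import math


def fractional_to_cartesian(frac, a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Convert fractional (x, y, z) to Cartesian angstroms.

    a, b, c are cell edge lengths in angstroms; alpha, beta, gamma are the
    cell angles in degrees. Assumed convention: a along x, b in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Cell volume divided by a*b*c
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    fx, fy, fz = frac
    x = a * fx + b * cg * fy + c * cb * fz
    y = b * sg * fy + c * (ca - cb * cg) / sg * fz
    z = c * v / sg * fz
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical cubic cell, 10 angstroms on a side: the cell centre.
    print(fractional_to_cartesian((0.5, 0.5, 0.5), 10.0, 10.0, 10.0))
```

For a cell with all angles at 90 degrees this reduces to multiplying each fractional value by the matching cell edge.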

For smaller molecules, data can also be supplied in a modelling package's own format. Most graphical modelling software allows you to enter a molecule and then minimize it to produce a final structure. The parameters used in this minimization may affect the quality of the resultant structure. Some programs have auxiliary software that allows you to convert selected modelling package files into their own formats.

Modelling packages that can use the data from the Cambridge Structural Database (CSD) either have an interface built into the program or use auxiliary software to get the data into a form that the software will accept. The conversion process may be long and involved.

Most molecular modelling packages have auxiliary software that can convert a file produced in its own internal format into something akin to a PDB format and vice versa. A few can handle fractional data directly.


Large Molecule Formats

Large molecules such as proteins or nucleotides are usually available in PDB format. Some software can access these ASCII files directly; others have conversion programs to transform the data into their own internal formats.

Molecular modelling efforts require a modeller to be familiar with the general ideas of the PDB format for storing x-ray crystallographic data. A PDB file is an ASCII file with each line being 80 characters long. In general, the data has been divided into various subject areas, each area using a code located in the first six characters of a line in the file to distinguish it from other areas. The access code for the structure and the line number in the file are located at the end of each line.

A listing of some of the common subject areas:

HEADER	type of the material studied
COMPND	name of the material studied
SOURCE	source of the material used for the crystal
AUTHOR	who did the work
JRNL	journal reference for the work
REVDAT	revisions to the original data submitted
REMARK	comments on some aspect of the crystallization process, the refinement
	process used, references or changes in the data
SEQRES	the sequence of the material studied
HET	the names of nonpeptide units in the structure other than water
FORMUL	the formula(s) for these nonpeptide units
HELIX	helical assignments within the structure and their type
SHEET	sheet assignments within the structure and their type
TURN 	turn assignments within the structure and their type
SSBOND	the location of disulfide linkages in the structure
CRYST1	the crystal's cell parameters
ORIGX 	transformation values
SCALE 	scaling factors for the crystal
HETATM	atom data for nonpeptide units of the structure
ATOM	atom data for peptide residues of the structure
CONECT	connections between atoms in the structure
TER	the end of a protein chain
MASTER	line stating the number of various types of areas
END	the end of the file

Each of these codes gives the line a set format. PDB files don't believe in tabs in a file. The presence of tabs in a data file that otherwise looks fine to the eye will cause conversion software to crash.
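A quick way to guard against this is to scan a file for tabs before handing it to a converter. The Python sketch below is one possible check, not part of any conversion package; the function name check_fixed_columns and the sample lines are invented for the example, and the 80-character limit is the line length described above.

```python
# Sketch: flag lines that would upset a fixed-column reader.

def check_fixed_columns(lines, width=80):
    """Return (line number, problem) pairs for tabs or over-long lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if "\t" in text:
            problems.append((number, "contains a tab"))
        if len(text) > width:
            problems.append((number, "longer than %d characters" % width))
    return problems


if __name__ == "__main__":
    # Made-up sample lines: the second hides a tab that looks like spaces.
    sample = [
        "REMARK   1 spaces only, so this line is fine",
        "REMARK   1\ta tab hides in this line",
    ]
    print(check_fixed_columns(sample))
```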

The use of these codes makes locating certain types of desired data from a PDB file very easy. Any data that has been made into an accepted subject area can be searched for with the computer's search utility. Nonsubject area data can be more difficult to find.
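The same kind of search can be sketched in a few lines of Python, relying only on the fact that the subject area code sits in the first six characters of each line. The helper names record_name, select_records and count_records are hypothetical, and the sample lines are made up for the example rather than copied from a real entry.

```python
from collections import Counter


def record_name(line):
    """Subject area code: the first six characters, with padding removed."""
    return line[:6].strip()


def select_records(lines, name):
    """Return every line whose subject area code matches name."""
    return [line for line in lines if record_name(line) == name]


def count_records(lines):
    """Tally how many lines belong to each subject area."""
    return Counter(record_name(line) for line in lines if line.strip())


if __name__ == "__main__":
    # Made-up sample lines, not a real PDB entry.
    sample = [
        "HEADER    EXAMPLE PROTEIN",
        "HELIX    1 (details omitted)",
        "HELIX    2 (details omitted)",
        "SSBOND   1 (details omitted)",
        "END",
    ]
    print(select_records(sample, "HELIX"))
    print(count_records(sample))
```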

Care must be taken with PDB files for the following reasons: some workers in this field name things after themselves, the residue codes for unusual amino acids may vary, and the newest x-ray equipment appears to have developed its own order for the component atoms of a peptide residue that does not match the one originally used in earlier PDB data. These small changes can cause problems with software written to convert files from one format to another.

In this course all data conversions will be handled for you. You only need to enter the appropriate name to pull up a structural data file into your modelling program. However, you should be aware that these resources exist and can be used to acquire data of interest.

Exercise for week 5

This series of exercises will acquaint you with a number of different skills needed by a molecular modeller. These skills include: entering structural data and then minimizing it to obtain its most realistic 3-D conformation; making physical measurements on structural models; and color coding protein structure for various purposes. Instructions in bold should be entered followed by pressing the ENTER key.

Molecular structures get complex in a hurry and even a structure as small as a tri-peptide is large enough to cause a modeller to have to use batch processing in order to get the final result. To see what this process is like, you will be creating a tri-peptide and using batch processing to produce its final structure. With a structural model it is possible to determine distances between atoms and compare these values with those for known structural features such as helixes and sheets. You will make measurements of this type. Color coding protein structures makes the information they contain more readily apparent. What appears simple can be difficult to model, including color coding.

1) Activate the computer.

By this time you should know how to activate the machine you want to use, make connections with ribozyme and log into your account. If you still need help with these functions, refer to the beginning of the exercises for weeks 2 and 3 for step by step instructions.


2) Move to this week's subdirectory and copy over to it the necessary files.

% cd five

Now copy over all the files needed to do this week's exercise. They are located in the directory location $UGRAD_DIR/week5.

% cp $UGRAD_DIR/week5/* .


3) Run the demo that describes this week's activities.

This week's demo deals with creating structural data on the computer. The materials to be modelled are amino acids and small peptides. Actual measurements are done on x-ray data to get a feel for the size of the structures being worked with. You will customize protein x-ray data to highlight certain information.

Graphical demos run on a different computer. In fact, most of this week's exercise will be conducted on that machine. To reach it and place yourself in a directory from which you can run the demo, enter the following command.

% model1

Now get into MacroModel and view the demo for week five. Entering mmv30 starts up the program. Respond to the question about a script file with week5.log and that about doing a batch process with n.

$ mmv30

week5.log

n

The demo shows you the building of the simple amino acid, glycine. It creates and minimizes the structure. In the actual exercises you will be creating structures for alanine, phenylalanine and histidine. The process is the same: draw the structure, check it for accuracy and then attempt to minimize it. If errors appear, stop the process, correct them and try again until the computer shows you that the minimization process is working. Very small structures can be done at the terminal by a patient modeller; larger ones should be started at the terminal and then written to a file to be submitted as a batch job for final processing. After glycine is created and minimized, the beginning steps of the creation of a tri-peptide model for HAF are shown. The structure is only minimized for 200 iterations. This is not enough to produce the final structure, but it is far enough to know that the process is working. Even 200 iterations bores the user forced to sit in front of the terminal.

The demo next examines secondary structural elements. After the text screens introduce the topic, a helix is shown on the screen. It has been color coded so that the backbone is white. A second helix structure is brought up, this time the carbons of the backbone are purple and a series of C's appears to the right of the helix. These C's represent the location of the carbons in the helix along the y axis. The atom distances between these C's will be determined. These distances should correspond to the observed rise along the axis of a standard helix. Sheets are explored next. Two sheet sections are shown. Their backbone atoms are white. A second screen is shown in which the carbons of the backbone are purple. Atom distances are determined between these highlighted carbons. This should correspond to the axial distance between adjacent amino acid residues in a sheet.
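The kind of measurement made here can be sketched in Python. The example below builds alpha-carbon positions for an idealized alpha helix lying along the y axis and measures distances between them. The helper names are hypothetical, and the geometry uses commonly quoted textbook values for an ideal alpha helix (a rise of 1.5 angstroms and a rotation of 100 degrees per residue, with the alpha carbons about 2.3 angstroms from the axis), not values read from the demo's structure.

```python
import math


def distance(p, q):
    """Straight-line distance between two (x, y, z) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def ideal_helix_ca(n_residues, radius=2.3, rise=1.5, turn_deg=100.0):
    """Alpha-carbon positions for an idealized alpha helix along the y axis."""
    coords = []
    for i in range(n_residues):
        angle = math.radians(turn_deg * i)
        coords.append((radius * math.cos(angle), rise * i, radius * math.sin(angle)))
    return coords


if __name__ == "__main__":
    ca = ideal_helix_ca(5)
    for first, second in zip(ca, ca[1:]):
        axial_rise = second[1] - first[1]      # separation measured along the y axis
        through_space = distance(first, second)
        print("rise %.2f A, direct distance %.2f A" % (axial_rise, through_space))
```

The rise along the axis is smaller than the direct distance between neighbouring alpha carbons because each residue also rotates around the helix axis.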

Last week you worked with one of four unknown proteins. The demo now shows you some information for unknown4 of that series. The name of the unknown protein is given along with its PDB access code and its backbone color coded to show secondary structure. Color coding a protein looks easy when just the final result is shown. To show you the actual steps necessary, the protein crambin is color coded on the screen before you to reflect its secondary structure. When this is finished the demo will stop. It is your turn to use the NRes button. Select the NRes button with the cursor. Now move the cursor to a point on the screen at one end of the aqua section of the molecule and press the space bar. A purple box should appear along with the three letter code for the found amino acid and a number showing its position in the sequence. Record that number below. Repeat this process for the other end of the aqua region and record the result below.

determined points in the aqua region: ________________________________________

Select STOP, and respond with y to the two questions asked by the program.
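The bookkeeping behind this kind of color coding can be sketched in Python: once the first and last residue numbers of each secondary structure region are known, every residue can be assigned a color. The function name, the color choices and the residue ranges below are all hypothetical; they are not the values for crambin or for any of the unknowns.

```python
def color_by_secondary_structure(n_residues, helices, sheets,
                                 helix_color="aqua", sheet_color="yellow",
                                 other_color="white"):
    """Return {residue number: color}; ranges are inclusive (start, end) pairs."""
    colors = {}
    for res in range(1, n_residues + 1):
        if any(start <= res <= end for start, end in helices):
            colors[res] = helix_color
        elif any(start <= res <= end for start, end in sheets):
            colors[res] = sheet_color
        else:
            colors[res] = other_color
    return colors


if __name__ == "__main__":
    # Hypothetical 15-residue peptide with one helix and one sheet strand.
    colors = color_by_secondary_structure(15, helices=[(5, 12)], sheets=[(1, 3)])
    for res in sorted(colors):
        print(res, colors[res])
```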


4) Entering structural data into the computer.

In this section you will use the MacroModel program to enter 3 simple protein amino acid structures into the computer. The respective amino acid formulas are given below.


                                            CH3    O
                                            |      ||
amino acid 1 -    alanine           NH2 --- CH --- C - OH

                                            C6H5
                                            |
                                            CH2    O
                                            |      ||
amino acid 2 -    phenylalanine     NH2 --- CH --- C - OH

                                               CH
                                           HN       NH
                                            |       |
                                            C ===== CH
                                            |
                                            CH2    O
                                            |      ||
amino acid 3 -    histidine         NH2 --- CH --- C - OH
[html note: - Please connect the top atoms of the five member ring of histidine with single bonds.]
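One simple way to check a finished drawing is to count its atoms against the formula. The Python sketch below tabulates the atom counts implied by the three formulas and computes a molar mass from standard atomic masses. The names ATOMIC_MASS, AMINO_ACIDS and molar_mass are invented for the example, and the histidine entry assumes the neutral form; a ring carrying a positive charge with both nitrogens protonated has one more hydrogen.

```python
# Standard atomic masses in g/mol, rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Atom counts for the neutral amino acids drawn in this section.
AMINO_ACIDS = {
    "alanine": {"C": 3, "H": 7, "N": 1, "O": 2},
    "phenylalanine": {"C": 9, "H": 11, "N": 1, "O": 2},
    "histidine": {"C": 6, "H": 9, "N": 3, "O": 2},   # neutral ring assumed
}


def molar_mass(composition):
    """Sum of atomic mass times atom count over every element."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


if __name__ == "__main__":
    for name, composition in AMINO_ACIDS.items():
        print("%-14s H atoms: %2d  molar mass: %.2f g/mol"
              % (name, composition["H"], molar_mass(composition)))
```

Comparing the hydrogen count on the screen with the count in the table is a quick test of whether the hydrogen addition step did what was expected.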


section 4a:

Create the first molecular structure. Its formula was given in the previous section. Use the template and instructions below to help you enter the structure for the amino acid alanine.

1) Activate MacroModel by typing mmv30.

2) Once in the program, respond to the first question by pressing the ENTER key. Then answer the question about what terminal you are using by entering 5 for a Tektronix. Enter the following responses to the asked questions with 4107, n, and 0.

3) When the menu window comes up, move the cursor to the DRAW button and press the space bar. Moving the mouse will move the cursor around on the terminal screen.


cursor - the large cross-hair lines that appear on the screen. In this program, movement of the cursor to a given button area followed by pressing the space bar selects the various options from the menu shown on the screen.


4) Put the cursor inside the working window near the left side of the screen in the middle, and press the space bar again. A letter C will appear on the screen. This point will correspond to position 1 of the template.

5) Use the cursor to move to position 2 of the template and press the spacebar again. The C will disappear and be replaced by a green line.

6) Repeat step 5 until positions 3 and 4 appear on the screen. Use the template above as a guide in drawing your model.

7) Put in the double bond. Select the DRAW button. Move over to the 3 position on the structure and press the spacebar. The terminal should beep at you for finding an existing atom on the screen. Then move down to position 5 and press the spacebar again. Move back to position 3 again. Another beep should sound. Once this move is finished a double bond will be on the screen between positions 3 and 5. You need to be accurate in the location process or additional atoms and not a double bond will appear on the screen.

8) Now select the O from the side and move to position 5 of the structure, and press the space bar. An O should appear at the chosen site. Move over to position 4 and likewise change that carbon atom into an oxygen as well. Select the N from the side and change the atom in position 1 to a nitrogen.

9) Select DRAW and move to position 2 and press the spacebar. Now move off to a location a quarter inch or so above position 2 and press the spacebar again. This adds the methyl side chain to the alanine structure.

10) Now move over to the H ADD button and select it to add the required hydrogens to all of the carbons of your molecule by pressing the space bar three times, slowly. After the first pressing of the spacebar hydrogen will appear on the nitrogen and the single bonded oxygen atoms. Information on the type of hydrogen addition going on will appear in the user input window. It is easy to press the keyboard keys faster than the computer can respond. This is especially true when lots of users are on the system. Pressing the spacebar slowly is the key to adding the necessary hydrogens. The user input window should have the phrase Full screen H addition at the end of this process. These hydrogens will appear as green lines off of the existing carbons of the structure.

11) Minimize the structure: select ENERGY, select MM2, and then select Start when the cursor reappears. Depending on the accuracy of your initial structure, you may be asked to continue the minimization process. If asked, respond with y and keep doing so until the process stops on its own. It should prompt you to continue 2 or 3 times. You will notice that the minimization process causes the buttons on the right-hand side of the screen to change. Numbers are presented there to allow you to track the minimization process. The first column, Iters, should start small and go as high as needed to complete the desired process. The second column, Movt, should start high and go down to zero. The last column is the current energy value for the molecule. This should go as low as possible. Its actual value depends on the molecule being worked with. Don't be surprised to see negative values for these energy figures.

12) Write your entered structure to a file. Select WRITE, answer the prompt for the name of file with ala, and enter a short structure statement. The cursor will reappear when the file has been written to your account.


section 4b:

Create the second molecular structure. Its formula was given in the previous section. Use the template and instructions on the next page to help you enter the structure for the amino acid phenylalanine.

1) Notice that the first part of the template on the next page is the same as that for alanine. In fact the only difference occurs from position 6 on. Therefore, instead of repeating the whole process of re-entering the alanine portion of the molecule, just start from the carbon of the methyl side chain of the alanine as position 6 of this template. Since you haven't deleted the alanine molecule it is still on the screen. Select INPUT to change operating modes of the program from energy minimization to data entry.

2) Currently the image of alanine occupies the entire screen. You will need to make more room on the screen if you want to enter the rest of the phenylalanine molecule. To do this use the cursor to select the Scale button. Then move the cursor to a spot at the bottom of the working window on the left and press the spacebar. Now move the cursor to a spot at the right side of the working window about in the middle of the window and press the spacebar. Your image should now fit in only that part of the window you marked off.

3) Select DELT. The phrase Atom or bond deletion should appear in the user input window. Then move the cursor to the middle hydrogen of the methyl side chain on the alanine molecule and press the spacebar. The program should beep at you, which means that it found an existing atom and then the atom and the bond connecting it to the methyl group's carbon atom should disappear from the screen.

4) Select the DRAW button and move the cursor to position 6 (the carbon of the methyl group). The terminal should beep at finding the existing atom. Then follow the template to put in the atoms at positions 7 through 12. At position 12 make a bond to connect to position 7 and form a six-member ring.

5) Put in the ring's double bonds. Select the DRAW button. Move over to the 7 position on the structure, press the spacebar, then move up to position 8 and press the spacebar again. Select the DRAW button again. Move over to the 9 position on the structure, press the spacebar, then move to position 10 and press the spacebar. Select the DRAW button once more. Move over to the 11 position on the structure, press the spacebar, then move down to position 12 and hit the spacebar. Once these moves are made there will be three double bonds on the screen between positions 7 and 8, 9 and 10, and 11 and 12. You need to be accurate in the location process, or additional atoms and not double bonds will appear in your molecule.

6) Now move over to the H ADD button and select it to add all the hydrogens to the atoms in your molecule that require them by pressing the space bar three times, slowly.

7) Minimize the structure: select ENERGY. Since you already have the force field selected (the MM2 button is highlighted in green), select Start when the cursor reappears and start the minimization process. Depending on the accuracy of your initial structure, you may be asked to continue the minimization process. If asked, respond with y and keep doing so until the process stops on its own. Again, this should take 2 or 3 passes to get a minimized structure.

8) Write your entered structure to a file. Select WRITE, answer the prompt for the name of file with phe, and enter a short structure statement. The cursor will reappear when the file has been written to your account.


section 4c:

Create the third molecular structure. Its formula was given at the beginning of section 4. Use the template and instructions below to help you enter the structure for the amino acid histidine.

1) Notice that the first part of this template is the same as that for phenylalanine. In fact the only difference occurs from position 7 on. Therefore, instead of repeating the whole process of reentering the alanine portion of the molecule, just delete the portion of the molecule that doesn't fit what you need. Select INPUT to change operating modes of the program from energy minimization to data entry. Then select the DELT button. Move the cross-hair to the middle of the bond connecting positions 6 and 7. Press the spacebar. The designated bond should disappear from the screen. Select the DELT button again. Press the spacebar twice to get the molecule deletion version of this button. Remember to do it slowly. The user input window should have the phrase Molecule deletion in it. Move the cross-hair over to any position on the portion of the molecule to be deleted (the part at the top of the working window) and press the spacebar. The undesired part of the old phenylalanine molecule is now gone.

2) Currently the image of the partial histidine molecule fills the bottom section of the working window. You should have enough space to enter in the rest of the molecule. Select the DRAW button and move the cursor to position 6. The terminal should beep at finding the existing atom. Then follow the template to put in the atoms at positions 7 through 11. At position 11 make a bond to connect to position 7 and form a five-member ring.

3) Change the atoms at positions 9 and 11 into nitrogens. To do this select the N from the side of the working window and then move the cross-hair to the two desired positions on the structure and press the spacebar. The atoms at these positions should now be labelled with an N and be colored blue. Select the + from this area and use the cross-hair to place this charge on the nitrogen at position 11 of the template.

4) Put in the ring's double bonds. Select the DRAW button. Move over to the 7 position on the structure, press the spacebar, then move up to position 8 and press the spacebar again. Select the DRAW button again. Move over to the 10 position on the structure, press the spacebar, then move to position 11 and press the spacebar. Once these moves are made there should be two double bonds on the screen between positions 7 and 8, and 10 and 11. You need to be accurate in the location process, or additional atoms and not double bonds will appear in your molecule.

5) Now move over to the H ADD button and select it by pressing the spacebar three times, slowly. This adds hydrogens to all the atoms in your molecule that require them.

6) Minimize the structure: select ENERGY. Since you already have the force field selected (the MM2 button is highlighted in green), select Start when the cursor reappears and start the minimization process. Depending on the accuracy of your initial structure, you may be asked to continue the minimization process. If asked, respond with y and keep doing so until the process stops on its own. This should take 3 or 4 passes to get a minimized structure.

7) Write your entered structure to a file. Select WRITE, answer the prompt for the name of file with hist and enter a short structure statement. The cursor will reappear when the file has been written to your account. To ensure that you will have a file to work with, write another file with the name filler.

You have just entered three amino acid structures into the computer. The files you have written all have the extension, dat. Since it is impossible to determine just how long each student in the class will take to reach this point, the remaining sections have been written assuming that you have started a new session with the MacroModel program. Therefore, exit the program by selecting STOP and respond with y to the questions about exiting. [If there is still time in the first lab session for the week, ignore selecting STOP and select INPUT instead. Then continue on to the next section to the point after the MacroModel program has been activated.]


5) Entering a small peptide into the computer.

Most of the time, a user is interested in something a little bigger than a single amino acid. Usually you are interested in a small peptide or in creating sequence sections for plugging into a homology model. To give you the flavor of doing this, the next section of the exercise deals with creating a tri-peptide out of the amino acid files you have already generated.

Modelling software is great; however, it does have some problems. One of these problems shows up when you attempt to create a small peptide. The software can automatically generate these fragments; however, you need to know the final conformation the segment should have. The software believes in helixes, sheets and turns, but not in a random configuration. If you just want to see what shape the sequence naturally assumes, you can't do so with the automatic features. Instead, you must read in the individual amino acid files, delete unnecessary atoms, make the proper peptide bonds and then minimize the resulting structure. This is because once the conformation of the segment has been set, no amount of minimization will ever change it.
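
The atom bookkeeping behind this manual assembly can be checked with a short calculation. Each peptide bond costs one hydroxyl from a carboxyl group and one hydrogen from an amino nitrogen, that is, one water per bond. The Python sketch below is only an illustration and is not part of MacroModel. It assumes the standard neutral formulas for the three free amino acids (the charged histidine entered in section 4c carries one more hydrogen), and the names peptide_formula and format_formula are made up for this example.

```python
from collections import Counter

# Neutral free amino acid formulas (assumed textbook values).
AMINO_ACIDS = {
    "H": Counter({"C": 6, "H": 9, "N": 3, "O": 2}),   # histidine
    "A": Counter({"C": 3, "H": 7, "N": 1, "O": 2}),   # alanine
    "F": Counter({"C": 9, "H": 11, "N": 1, "O": 2}),  # phenylalanine
}
WATER = Counter({"H": 2, "O": 1})


def peptide_formula(sequence):
    """Sum the amino acid formulas and remove one water per peptide bond."""
    total = Counter()
    for letter in sequence:
        total.update(AMINO_ACIDS[letter])
    for _ in range(len(sequence) - 1):
        total.subtract(WATER)
    return dict(total)


def format_formula(formula):
    """Write the element counts in C, H, N, O order."""
    return "".join(f"{el}{formula[el]}" for el in "CHNO" if formula.get(el))


print(format_formula(peptide_formula("HAF")))  # C18H23N5O4
```

Two peptide bonds remove two waters, which matches the two hydroxyl groups and two nitrogen hydrogens that are deleted by hand when the tri-peptide is built.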

With this background information, create the tri-peptide HAF using the instructions beginning on the next page.

1) Activate MacroModel by typing mmv30.

2) Once in the program, respond to the first question by pressing the ENTER key. Then enter the following responses to the prompts: 5, 4107, n, and 0.

3) Select the READ button. When the File: prompt appears in the user input window, type in the name of the first amino acid to be used, hist. Answer the prompt for a structure number by pressing the ENTER key. Repeat this process to enter in the next file, that for alanine named ala. This time you will get the following prompt after the expected structure number one, Delete current structure (Y/N)?. Respond with n and press the ENTER key. Read in the third file, the one for phenylalanine, phe. Again, don't delete the two other previous files.

4) There should be three structures in the working window. The orientation of the last one, that of phenylalanine, may not be the best for seeing all the atoms in the structure. If that is the case, do the following steps to make the structure more visible. Select the ORIENT button. From that screen select Mol and then move the cross-hair over to any atom in the phe molecule. Then select the Rot button. You will be prompted by the message, Input x, y, z or s for simplex search. Enter y to have your structure rotate around its y axis. The next prompt in the user input window wants to know the angle to be used, Enter angle (* to exit). Enter 30 and press the ENTER key. Your phe molecule will move around its y axis by 30 degrees. When you can clearly see the hydrogens off the α-carbon and amino nitrogen of the phenylalanine molecule, stop pressing the ENTER key and enter * instead, followed by pressing the ENTER key, to get out of this rotation mode of the program.

5) Select DELT and move the cross-hair to the middle of the bond between the hydroxyl group and the carbonyl group of the hist molecule. Press the spacebar. The bond should disappear. Do the same thing to the similar bond on the ala molecule. Once the bonds are gone, select DELT slowly twice. The user input window should have the message in it, Molecule deletion. Move the cross-hair to either atom of the hydroxyl group that used to be attached to the hist molecule and press the spacebar. Do the same thing to the other hydroxyl that used to be attached to the ala molecule.

6) Put in the peptide bonds between the three amino acids. Select DRAW. Then move the cross-hair to the carbonyl carbon of the hist molecule, press the spacebar. The terminal should beep at you. Now move the cross-hair over to the nitrogen of the ala molecule and press the spacebar. Again the terminal should beep and a bond should appear between the two atoms. This bond will be half green and half blue. Select DRAW, again. Then move the cross-hair to the carbonyl carbon of the ala molecule, press the spacebar. The terminal should beep at you. Now move the cross-hair over to the nitrogen of the phe molecule and press the spacebar. Again the terminal should beep and a bond should appear between the two atoms. This bond will be green and blue.

7) If you look closely at the structure on the screen, you will notice that it has two nitrogens with too many hydrogens on them. Select DELT and then use the cross-hair to remove one of the hydrogens from the nitrogen on the ala residue and one from the nitrogen on the phe residue.

8) Start the minimization process on this tri-peptide. Actually, you will make a short pass at minimization to ensure that the molecule doesn't have any problems with it and then submit the file to the batch queue to have it run there. Select ENERGY, select MM2, and then select IT/s. Information will be given on the current number of iterations between prompts asking about continuing or not. Enter 5 and press the ENTER key. Now select Start to get the process going. If everything is ok, the process will run for 5 iterations and then ask you if you want it to continue; respond with n. If this is not the case, get some help from your instructor.

9) Now that you know that everything is ok with your structure, write the image on the screen to an output file that could be used to complete the rest of the minimization process. Select WRITE, enter TRI as the file name and put in some short comment on the nature of the molecule, such as HAF raw structure.

10) Exit the program by selecting STOP and responding to the two questions with y.
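
The rotation requested in step 4 (a turn about the y axis in 30 degree increments) is an ordinary coordinate transformation. The Python sketch below shows the arithmetic for a single atom. It is not MacroModel code: the function name and the coordinates are made up, and the direction MacroModel treats as a positive rotation is an assumption here.

```python
import math


def rotate_about_y(point, degrees):
    """Rotate an (x, y, z) point about the y axis by the given angle."""
    x, y, z = point
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Right-handed rotation about y: the y coordinate is unchanged.
    return (x * cos_t + z * sin_t, y, -x * sin_t + z * cos_t)


atom = (1.0, 2.0, 0.0)  # made-up coordinates in angstroms
for step in range(1, 4):
    atom = rotate_about_y(atom, 30)
    print(step * 30, [round(c, 3) for c in atom])
```

Applying the 30 degree rotation repeatedly is equivalent to entering the same angle again at the prompt: twelve applications bring the atom back to where it started.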

To complete this process, the following steps would have to be taken. A batch job control file would have to be modified to reflect the data for your batch job. An example of such a file is given below. In this control file, the input file is read in and an output file called tri_out.dat will be created to hold the minimized structure. The force field used in this calculation will be the MM2 one (FFLD is 1) and the job will run until either an rms of less than .01 is reached or a total of 9945 iterations have passed (the MINI line).

   $! pcd  xxxxx  xxxxx  xxxxx  xxxxx  ffff.ffff  ffff.ffff  ffff.ffff ffff.ffff
   $ define incloc1 disk2:[public.mmv30.mmv30.inc1]
   $ run incloc1:batchmin.exe
    tri
    tri_out
    DEMX       0                         20.0000
    FFLD       1
    BGIN
    READ       1
    MINI       2      1   9945
    CONV       2      1
    END
   $ exit    

With the data and control file in hand, a user would then submit the job to a batch queue. A message stating the status of the job would be returned. It will either be executing or pending depending on the order in which it reached the queue. Batch queues run on a first come, first served basis. Another notice would be received when the job finishes if the user is still logged in at the time and not using the MacroModel program.
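
The stopping rule described for the batch job (quit when the rms falls below .01 or after 9945 iterations, whichever comes first) can be illustrated with a toy minimizer. The Python sketch below is not the BatchMin algorithm. It uses plain steepest descent on a made-up one-term "energy", and it assumes that the rms in question is the root-mean-square gradient.

```python
import math


def minimize(coords, gradient, step=0.1, rms_limit=0.01, max_iter=9945):
    """Steepest descent that stops on a small rms gradient or an iteration cap."""
    coords = list(coords)
    for iteration in range(1, max_iter + 1):
        grad = gradient(coords)
        rms = math.sqrt(sum(g * g for g in grad) / len(grad))
        if rms < rms_limit:
            return coords, iteration, rms
        coords = [c - step * g for c, g in zip(coords, grad)]
    # Iteration cap reached: report the rms of the final coordinates.
    grad = gradient(coords)
    rms = math.sqrt(sum(g * g for g in grad) / len(grad))
    return coords, max_iter, rms


def toy_gradient(coords):
    """Gradient of the toy energy E = sum((x - 1.5)**2)."""
    return [2.0 * (c - 1.5) for c in coords]


final, iterations, rms = minimize([0.0, 3.0, 5.0], toy_gradient)
print(iterations, round(rms, 4), [round(c, 3) for c in final])
```

A well-behaved starting structure meets the rms test long before the iteration cap; the cap only matters for structures that converge slowly or not at all.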


6) Looking at secondary structural elements.

In last week's exercise you predicted the secondary structure of one of four proteins. Look back in that exercise if necessary to determine which one you were assigned and record that information below.

unknown used in week 4: ___________________________________________________

To better understand the nature of helixes, sheets and turns, you will be shown a number of files in which you can see the general characteristics of these structures. You will be asked to determine the distances between α-carbon atoms in both helical and sheet structures and see the hydrogen bonding in all three structural elements. After this structural review, you will check your prediction results.

1) Activate MacroModel with mmv30.

2) Once in the program, respond to the first question by pressing the ENTER key. Then enter the following responses to the prompts: 5, 4107, n, and 0.

3) Select the READ button. When the File: prompt appears in the user input window, enter helix. A general helical structure will appear in the working window. The helix is shown with its backbone atoms colored in white and the side chains in the default atom colors of the program.

Determine the hydrogen bonding pattern that a helix has. To do this, select ANALYZ. From that button set, select HBOND. The user input window will report the number of hydrogen bonds found and show them in the working window as purple dashed lines.

4) When a helix is aligned so that its axis runs straight up and down, you can determine the displacement of the α-carbons along the helix's axis. To see this for yourself, select the READ button. To the File: prompt, enter helix2 and press the ENTER key in response to the structure number prompt. Delete the current structure on the screen by responding with y to the prompt. The structure on the screen changes to that of a properly aligned helix with its α-carbons colored purple. Along the right side of the screen is a series of likewise aligned C's. The position of these C's was determined by finding the α-carbons of the helix's backbone and using the long cursor of an actual Tektronix terminal to fix them into a row. Select the ADist button and move the cross-hair over to the C at the bottom of the C column. Press the spacebar and move up to the next C and press the spacebar again. The distance between the two points is shown as well as a colored dashed line connecting them. Press the spacebar again and move off to the next point up the row. Repeat this process until you have determined all the distances between the row of C's. Record the results below. Then determine the average of these values.

axial distances: ____________________________________________________________

_____________________________________________________________________________

average distance: ___________________________________________________________

How well do these values agree with the 1.5 angstrom number normally given for the standard α-carbon displacement or rise for a helix? Can you come up with any reasons for the variations?
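
If you would rather not average the readings by hand, the arithmetic can be done with a few lines of Python. This is only a sketch: the values in the measured list are placeholders, not real measurements, and should be replaced with the distances you recorded above.

```python
def average(values):
    """Arithmetic mean of a list of measurements."""
    return sum(values) / len(values)


EXPECTED_RISE = 1.5  # angstroms per residue, the value quoted in the text

# Placeholder readings; replace them with the distances you recorded.
measured = [1.4, 1.6, 1.5, 1.7, 1.3]

mean_rise = average(measured)
deviations = [round(d - EXPECTED_RISE, 2) for d in measured]
print(f"average rise: {mean_rise:.2f} angstroms")
print("deviation of each reading from 1.5:", deviations)
```

Listing the individual deviations alongside the average makes it easier to see whether the variation is spread evenly along the helix or concentrated at its ends.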

5) Now look at a sheet file. For a sheet to exist there must be two strands, so that the hydrogen bonding between the strands can stabilize them. Select READ and enter sheet to the File: prompt. Again respond to the structure number prompt by pressing the ENTER key. Delete the current structure on the screen by responding with y to the prompt. The image changes to that of two sheet strands on the screen. As with the helix images, the backbone is colored white. Select the HBOND button again to see the hydrogen bonding pattern between the two strands. Look closely at the hydrogen bonds. You will notice that there are bonds between the amino acid side chains and the opposite strand, between the two strands and between individual side chains.

Check out the distances between the α-carbons of the strands. Select READ and enter sheet2 to the File: prompt. Respond to the structure number prompt by pressing the ENTER key. Delete the current structure by responding with y to the prompt. The α-carbons are purple. Select ADist and then move the cross-hair to the first α-carbon on the left side of the bottom strand. The α-carbon is the atom between the N and the C=O group. Press the spacebar, then move off to the next α-carbon to the right. Keep doing this until you have determined all the distances between the α-carbons of the bottom strand. Then determine these distances for the top strand. Record these distances on the next page.


α-carbon distances: _________________________________________________________

_____________________________________________________________________________

How well do your distances agree with the 3.5 angstrom number normally given for the standard α-carbon displacement along a strand? What are possible reasons for these variations?
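
What ADist reports for each pair of atoms is a straight-line distance between two points. The Python sketch below performs the same calculation for a run of consecutive α-carbons. The coordinates are made up for illustration and do not come from the sheet2 file, and the function name is hypothetical.

```python
import math

# Made-up alpha-carbon coordinates (angstroms) for one zigzag strand.
ca_coords = [
    (0.0, 0.0, 0.0),
    (3.4, 1.2, 0.0),
    (6.8, 0.0, 0.0),
    (10.2, 1.2, 0.0),
]


def consecutive_distances(coords):
    """Straight-line distance between each atom and the next one in the list."""
    return [math.dist(a, b) for a, b in zip(coords, coords[1:])]


for i, d in enumerate(consecutive_distances(ca_coords), start=1):
    print(f"CA {i} to CA {i + 1}: {d:.2f} angstroms")
```

Four atoms give three distances, just as four α-carbons on the screen give three ADist readings.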

Now go through the following section to find out more information about your unknown protein and to determine just how good your predictions were. Replace the term x with the number you used in the instruction set given below.

6) Select the READ button. When the File: prompt appears in the user input window, enter the following answer, unkx_name. Answer the structure number prompt by pressing the ENTER key. Delete the current structure. The name of your protein will appear in the working window.

7) Select READ. When the File: prompt appears, enter the following answer, unkx_code. Answer the structure number prompt by pressing the ENTER key. Delete the current structure. The PDB access code for your protein will appear in the working window.

8) Because the protein structures to be read in next are large, select the A LAB button. The button turns from green to white. When you read in your structure it will be displayed without atom labels. Select READ again. When the File: prompt appears, enter the following response, unkx_struct. Answer the structure number prompt by pressing the ENTER key. Delete the current structure. A color coded structure of your protein will appear in the working window. Only the backbone structure is shown to make the image more easily understood. Side chains make it difficult to locate secondary structural elements. The white sections of the protein are considered to be random structural sections of the protein. The red ones are considered to be helical, the yellow ones sheets and the aqua ones turns.

Check your predictions by looking at the various colored regions of the protein. By selecting the button NRes and then moving to any point on a colored residue, you can determine its position in the protein's sequence and its residue name. This information is shown in purple by the selected residue. Use this button to check the colored sections' starting and ending points and record them below. If needed, you can expand a small section of the screen through the use of the Clip button. To do this, move the cross-hair to the lower left corner of the area to be expanded. Press the spacebar. A red mark appears on the screen. Now move to the upper right-hand corner of the area and press the spacebar again. First, the second red mark appears and then the working window clears and the selected area expands to fill the window. You will have to repeat the NRes process, but that is a small price to pay for being able to see your area of interest better. By pressing the Clip button twice you can return the entire image to the working window.


helix (red sections): _______________________________________________________

_____________________________________________________________________________

sheet (yellow sections): ____________________________________________________

_____________________________________________________________________________

turn (aqua sections): _______________________________________________________

_____________________________________________________________________________


9) Exit the program by selecting STOP and responding to the two questions with y.


7) Color coding a protein to show its secondary structure.

Now that you have seen what a color-coded protein secondary structure looks like, try coloring a protein yourself. A lot of proteins have a mixed structure; that is, they contain both helixes and sheets. Turns are often under-reported in x-ray structures. A small protein that is ideal for this coloring project is crambin. Crambin is small, 46 residues, and a mixed protein.

1) Activate MacroModel by typing mmv30.

2) Once in the program, respond to the first question by pressing the ENTER key. Then enter the following responses to the prompts: 5, 4107, n, and 0.

3) Select the READ button. At the File: prompt, enter the name of the protein file to be used in this section, crambin. Answer the prompt for a structure number by pressing ENTER.

The image on the screen is that of the complete crambin structure. It has all the side chains shown. You must strip off the side chains in order to simplify the coloring process.

4) Select the ANALYZ button. From the buttons shown there select the SETS button. From this new button set select the MainS one. Pick the DISPLA button and then the Dis button. The crambin structure on the screen will disappear to be replaced by its backbone structure.

5) Color the backbone white. Do this by selecting the Mono button. Answer the question about what color to use with w for white. At the next prompt, respond with w again, this time standing for "working set." The image will turn white when it is redrawn on the screen.

6) The next step is to color the various parts of the structure according to the following scheme: red for helix, yellow for sheet and aqua for turn. The crambin protein has two helixes (from 7 to 19 and 23 to 30), two sheets (from 1 to 4 and 32 to 35) and one turn (41 to 44). Select GEOMTR to get to the portion of the program that allows for residue selection. Select the FRes button. You will be asked for the residue number; enter the starting point for the first helix. You will be prompted for the chain number. There is only one chain in crambin; therefore, whenever asked for this number, just press ENTER. A purple label will be shown on the screen denoting the desired residue. The label appears on the carbonyl carbon of the sought-for residue. Enter the ending point of the first helix and press ENTER. You now have the data on the screen needed to color the first helix. To get out of the FRes program mode, press the ENTER key twice.

Check the location of these two markers. If needed, you can expand a small section of the screen through the use of the Clip button. To do this, move the cross-hair to the lower left corner of the area to be expanded. Press the spacebar. A red mark appears on the screen. Now move to the upper right-hand corner of the area and press the spacebar again. First, the second red mark appears and then the working window clears and the selected area expands to fill the window. You will have to repeat the FRes process, but that is a small price to pay for being able to see your area of interest better. By pressing the Clip button twice you can return the entire image to the working window.

7) To color this section, select SETS. This time use the ResSq button. You will be prompted for the starting residue. Move the cross-hair over to any atom of the marked beginning of the first helix and press the spacebar. The program will prompt you for the ending point. Move the cross-hair over to any atom of the marked ending of the first helix and press the spacebar. Ignore the prompt for another starting point and select the Set1 button. Enter d for deposit to the question you are asked. Now select the DISPLA button and then the Set1 button. This time respond to the question with r for retrieve. Now select the Mono button and respond with r for red when asked for a color. Since you only want the section of the data in the working set to be colored, respond with w for working. Your designated portion of the sequence will disappear and be replaced by a properly colored one.

hint:- It is wise to save your work after each successful coloring operation. Do this by selecting WRITE and entering a filename for the current version of the data set. You don't need to enter any structure name information if you don't want to. Just press ENTER to continue the writing of the output file.

8) Use the steps given above to color all the rest of the secondary structural elements. You may find the use of Clip a great help to zero in on an area of interest and ensure that you select only the atoms you really want.

9) When you have completed the coloring process, write the image on the screen to an output file. Select WRITE, enter crambin as the filename and use color as the extension. Enter your complete name to the program's prompt for the structure's name.

10) Exit the program by selecting STOP and responding to the two questions with y.
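
The coloring scheme from step 6 amounts to a lookup from residue number to color. The Python sketch below tabulates it for all 46 residues of crambin, using the residue ranges quoted in step 6 and white for everything else. It is a bookkeeping aid for checking your work, not something MacroModel reads, and the names used are made up for this example.

```python
from collections import Counter

# Residue ranges (inclusive) and colors as given in step 6.
SEGMENTS = [
    (7, 19, "red"),      # helix
    (23, 30, "red"),     # helix
    (1, 4, "yellow"),    # sheet
    (32, 35, "yellow"),  # sheet
    (41, 44, "aqua"),    # turn
]
CHAIN_LENGTH = 46  # crambin


def color_for_residue(number):
    """Return the color for a residue; unassigned residues stay white."""
    for start, end, color in SEGMENTS:
        if start <= number <= end:
            return color
    return "white"


colors = {n: color_for_residue(n) for n in range(1, CHAIN_LENGTH + 1)}
print(Counter(colors.values()))
```

With these ranges, 21 residues end up red, 8 yellow, 4 aqua and 13 white, which gives a quick check that no segment was skipped during coloring.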

Log out of this machine by entering the command, logout, at the dollar sign prompt.

$ logout

Back on ribozyme, one of the files that were copied over at the beginning of the exercise contains images of some of the information shown in the demo. To see what this data looks like in a slightly different format, use the week5.images file. First, rename this file to reflect your own last name and then print it off on the teaching lab printer.

% mv week5.images (your lastname).images5

% lpr (your lastname).images5

Pick up your hardcopy at the printer. The images shown are those of the minimized HAF structure and a colored secondary structure representation of crambin. Save this information. Add it to your growing collection of molecular images.


8) Creating an image file.

Along with the report form for this week, the files necessary to generate some image files were copied over at the beginning of the exercise. These include the control file and data file for generating a secondary structure representation of your unknown sequence from week 4. The software to be used to generate this image is called molscript.

To create one of these image files the following steps need to be taken. First, a control file containing the secondary structural information and color coding instructions needs to be generated. Second, a PDB formatted data file containing the coordinate information needs to be present. Both these files are now in your account's five sub-directory.

To generate the image file, enter the command line on the next page. Replace the x in this line with the number of your week 4 unknown sequence. The command line calls for the molscript program to use the unknownx.in file as its control file and to output a postscript file called (your lastname).ps.

% molscript < unknownx.in > (your lastname).ps

If the molscript process appears to have run successfully, then print out your image file. If you have any questions on this process, ask your instructor.

% lpr (your lastname).ps


9) Finishing up.

Rename the report form to your last name, go into the file using the pico editor and fill in all the questions except those dealing with surfing the nets. The surfing questions are for extra credit and will give you an idea of the type of structural information that is available on the internet.

% mv week5.week5 (your lastname).week5

% pico (your lastname).week5

If you don't intend to do the extra credit, rcp over your report form to the teacher account and log off of the machine. Otherwise, don't rcp over the report form and continue on with the extra credit optional portion of the exercise.

% rcp (your lastname).week5 teacher@ribozyme:receive

This concludes your computing session for this week. Log off ribozyme, get out of the emulator and back to the overlapping windows screen.

% logout

Press the alt and x keys together. This will cause the screen to ask if you really want to exit the program. Respond with y to get out of the teemtalk emulator and return to the overlapping windows screen.


Extra Credit (Optional) - Surfing the Nets for Structure Images

Now that you are back at the windows screen, you can go and explore the Nets for structural images. There is a site on the web known as Molecules R US. This site contains data for all the entries contained in the PDB x-ray database. To go there, select the Netscape icon (the large N) with the arrow and press the left mouse button. The arrow changes to an hourglass while the connection is being made to the VADMS home page. Use the arrow to select the Bookmarks menu, and the FORM for PDB query: Molecules R US entry from this menu.

You are now connected to the Molecules R US home page. Depending on network traffic, it may take a moment for their logo to appear. This is a form driven system. Note the empty white box beside the Enter search keyword line. Move the arrow to the beginning of this box and press the left mouse button. You are now ready to enter either a PDB access code for a structure or a keyword to search the database with. The PDB access code for crambin is 1crn. Type this in the box and press ENTER.

The results of the search of the PDB database for the 1crn code are shown on the screen in blue text. There is only one data file with that access code. Real keyword searches may produce a number of hits to choose from. Move the arrow to that line (it turns to a hand in this process) and click the left mouse button.

You have reached the form to actually request a structural image of the desired access code. Position the arrow on the Submit Request box and press the left mouse button. [If network traffic is slow, the extent of data file transfer will be shown followed by the phrase Document Done.] A new window appears on the screen for the Rasmol application program on the AT&T, in which a wire frame model of the crambin structure is displayed.

At the top of the screen is a menu bar with the terms File Edit Display Colours ... in it. Use the mouse to select the Colours menu and the Structure option from it. The wire frame image is now colored in the default secondary structure colors used by this site. The residues involved in helixes are purple, those in sheets yellow, turns aqua and random white. Select the Display menu with the mouse and its Ribbons option. The crambin image now appears as a ribbon image with the side chain atoms removed.

The image before you can be rotated in real time on the screen. This is done by either moving the scroll bar blocks in the bottom or right side scroll bars with the mouse, or by placing the mouse in the window and pressing down the left mouse button and moving it around the screen. This process is very slow. When you are finished playing with the crambin image, select the Exit option from the File menu on the control bar. The Rasmol window disappears and you are returned to the Molecules R Us submit window.

Click three times on the Back button at the top of the screen to go back to the initial query screen. At this point you can either explore with the following access codes (2fam, 2sn3, 4mt2 and 6lyz) or exit the program. To exit the program, select the File option from the top of the screen and its Exit option. This will return you to the overlapping windows screen.

If you decide to explore the other access codes listed above, then you will need to move the arrow to the white box and click the left mouse button. Then use the Backspace key to remove the text found in the box and type in the new code. Repeat the instructions given above.

After you have checked out the images, get back on ribozyme. Move to the five sub-directory and finish filling out the report form for the week with your comments on surfing the nets for structural data. Rcp this file over to the teacher account and log off the system.

% pico (your lastname).week5

% rcp (your lastname).week5 teacher@ribozyme:receive

This concludes your computing session for this week. Log off ribozyme, get out of the emulator and back to the overlapping windows screen.

% logout

Press the alt and x keys together. This will cause the screen to ask if you really want to exit the program. Respond with y to get out of the teemtalk emulator and return to the overlapping windows screen.