Modelling Series
Learning about entering data for various uses. Exploring data formats and how to use them.
Author:
Susan Jean Johns
There are two basic ways to enter data into a computer. One is to use an editor or a piece of software to create a new data file in the proper format. The second is to modify an existing data file already in the proper format, by editing or some other means, to contain the new data.
The steps to create a data file from scratch are as follows:
1) collect the necessary raw data for the creation of the file
2) become familiar with the format needed for the desired computing tasks
3) determine the proper way to create the data, through editing or software usage
4) enter the data
5) check the created data for errors by visual examination
6) check the created data for errors by using it in a computing task similar to the desired one, and see if it functions properly
To modify an existing data file:
1) collect the necessary raw data
2) become familiar with the format being used
3) determine the proper way to change the data, through editing or software usage
4) change the existing data
5) check the modified data for errors by visual examination
6) check the modified data for errors by using it in a computing task similar to the desired one, and see if it functions properly
You must know the type of format the data needs to be in prior to entering it.
For doing sequence analysis tasks, data will be entered in the GCG format. There are other possible data formats, but GCG will be used in this course since that is the software package supported for sequence analysis by the VADMS Center.
Molecular modelling tasks have a greater variety of potential formats into which the data can be entered. The format used depends on the size of the molecule, the source of its raw data, and the software used to produce the raw data.
You can enter small molecular structures directly into a given program's format through the program's graphical interface followed by minimization.
Large molecular data sets, such as those for proteins or nucleotides, are usually entered in a PDB format. New data of this type is collected by sophisticated computer software and the output is usually some form of a PDB format. At times, this data must be modified to make it workable with currently available visualization software. Some visualization packages read ASCII PDB data files directly, while others use their own form of this data and require data file conversion.
Format conversion for molecular modelling packages is a well-known problem. Many packages have their own conversion programs to allow data created with other software to be used within their program.
Molecular modelling data comes in different types of formats. Some modelling packages use different formats depending on the size of the molecule being worked with. Small molecules may be given as fractional coordinates, Cambridge database entries, or in the format of a given software package, depending on their source. The data for large structures such as proteins and nucleotides usually come from x-ray crystallographic efforts and are available in some form of a PDB format.
The normal way of reporting the x-ray crystal data from small molecules is to give the crystal's cell parameters and then the fractional coordinates for the x, y and z values. Papers usually also give information on bond lengths and angles for the component atoms of the structure. In order to use this data as given, either the program must be able to accept data in this format, or the data must be converted into another format.
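As an illustration of the conversion mentioned above, here is a minimal Python sketch that turns fractional coordinates into Cartesian coordinates in angstroms. It is not taken from any of the modelling packages discussed here. The function name and the axis convention (a along x, b in the xy plane) are assumptions made for the example, and a given program may orient the cell differently.

```python
import math

def fractional_to_cartesian(frac, a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Convert fractional coordinates (u, v, w) to Cartesian angstroms.

    a, b, c are the cell edge lengths in angstroms; alpha, beta, gamma are
    the cell angles in degrees. Assumed convention: a lies along x and b
    lies in the xy plane.
    """
    u, v, w = frac
    cos_a = math.cos(math.radians(alpha))
    cos_b = math.cos(math.radians(beta))
    cos_g = math.cos(math.radians(gamma))
    sin_g = math.sin(math.radians(gamma))
    # Cell volume divided by a*b*c.
    volume = math.sqrt(1.0 - cos_a**2 - cos_b**2 - cos_g**2
                       + 2.0 * cos_a * cos_b * cos_g)
    x = a * u + b * cos_g * v + c * cos_b * w
    y = b * sin_g * v + c * (cos_a - cos_b * cos_g) / sin_g * w
    z = c * volume / sin_g * w
    return (x, y, z)

# Example: the centre of a 10 angstrom cubic cell.
print(fractional_to_cartesian((0.5, 0.5, 0.5), 10.0, 10.0, 10.0))
```

For a cell with all angles at 90 degrees the conversion reduces to multiplying each fractional value by its cell edge length.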
For smaller molecules, data can also be supplied in a modelling package's own format. Most graphical modelling software allows you to enter a molecule and then minimize it to produce a final structure. The parameters used in this minimization may affect the quality of the resultant structure. Some programs have auxiliary software that allows you to convert selected modelling packages files into their own formats.
Modelling packages that can use the data from the Cambridge Structural Database either have an interface built into the program or use auxiliary software to get the data into a form that the software will accept. The conversion process may be long and involved. For VADMS users, a conversion tool has been written that allows a user armed with the REFCODE(s) of the desired data to extract the data, move it over to Model1, and then automatically convert the data sets into MacroModel-usable files. The software on Model1's end is old and this process might not work in all cases.
Most molecular modelling packages have auxiliary software that can convert a file produced in its own internal format into something akin to a PDB format and vice versa. A few can handle fractional data directly.
Large molecules such as proteins or nucleotides are usually available in some sort of a PDB format. Some software can access these ASCII files directly; others have conversion programs to transform the data into their own internal formats.
Molecular modelling efforts require a modeller to be familiar with the general ideas of the PDB format for storing x-ray crystallographic data. A PDB file is an ASCII file with lines 80 characters long. In general, the data has been divided into various subject areas, each area using a code located in the first six characters of a line in the file to distinguish it from other areas. The access code for the structure and the line number in the file are located at the end of each line.
A listing of some of the common subject areas:
HEADER type of the material studied
COMPND name of the material studied
SOURCE source of the material used for the crystal
AUTHOR who did the work
JRNL journal reference for the work
REVDAT revisions to the original data submitted
REMARK comments on some aspect of the crystallization process, the
refinement process used, references or changes in the data
SEQRES the sequence of the material studied
HET the names of non peptide units in the structure other than water
FORMUL the formula(s) for these non peptide units
HELIX helical assignments within the structure and their type
SHEET sheet assignments within the structure and their type
TURN turn assignments within the structure and their type
SSBOND the location of disulfide linkages in the structure
CRYST1 the crystal's cell parameters
ORIGX transformation values
SCALE scaling factors for the crystal
HETATM atom data for non peptide units of the structure
ATOM atom data for peptide residues of the structure
CONECT connections between atoms in the structure
TER the end of a protein chain
MASTER line stating the number of various types of areas
END the end of the file
Each code signifies a set format for each line. PDB files don't have tabs. The presence of tabs in a file that otherwise looks fine to the eye will cause conversion software to crash.
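Because a tab is invisible on the screen, it can help to let a short script look for it. The following Python sketch is only an illustration and is not part of the VADMS software. It reports lines that contain a tab or that run past 80 characters. The function name and the wording of the messages are made up for this example.

```python
def check_pdb_lines(lines):
    """Return (line_number, problem) tuples for lines that look suspicious."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if "\t" in text:
            problems.append((number, "contains a tab"))
        if len(text) > 80:
            problems.append((number, "longer than 80 characters"))
    return problems

# Example with three made-up lines; the second hides a tab.
sample = ["ATOM      1  N   THR     1\n", "ATOM\t2\n", "X" * 81]
for number, problem in check_pdb_lines(sample):
    print(number, problem)
```

To check a real file, pass the open file object in place of the sample list.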
The use of these codes makes locating certain types of desired data from a PDB file very easy. Any data that has been made into an accepted subject area can be searched for with grep. Non subject area data can be more difficult to find.
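The same kind of lookup can be sketched in Python. Unlike a plain grep, which matches the search string anywhere in a line, this sketch compares only the code in the first six characters, so a REMARK line that happens to mention HELIX is not returned. The function name is hypothetical.

```python
def records_of_type(lines, name):
    """Yield the lines whose subject-area code (first six characters) is name."""
    wanted = name.strip().upper()
    for line in lines:
        if line[:6].strip().upper() == wanted:
            yield line.rstrip("\n")

# Made-up lines for illustration only.
sample = [
    "REMARK   1 SEE HELIX ASSIGNMENTS BELOW",
    "HELIX    1  H1 ILE      7  PRO     19  1",
    "ATOM      1  N   THR     1",
    "HETATM  400  O   HOH    50",
]
print(list(records_of_type(sample, "HELIX")))
```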
Care must be taken with PDB files for the following reasons: some workers in this field name things after themselves, the residue codes for unusual amino acids may vary, and the newest x-ray equipment appears to have developed its own order for the component atoms of a peptide residue that does not match the one originally used in earlier PDB data. These small changes can cause problems with software written to convert files from one format to another.
The software you will be using to enter structural data in this course is MacroModel. This software can be run in an automatic mode through the use of a log file as well as interactively. It is the interactive mode that you use to enter data. This software is housed on model1.
Before attempting to use MacroModel, there are some facts about the program you should be familiar with.
1) MacroModel is set up with a working display window surrounded on the
bottom and right-hand side of the screen by a series of option buttons, and an
area for program messages to the user (or user input window) at the top. Use
the option buttons to communicate your wishes to the program. To activate a
button, move the cursor (or cross-hair) to the button's location and press the
spacebar. Cursor (or cross-hair) control on your machine is with the mouse. An
activated option button is colored green. The message (or user input window)
area either informs you of the status of an option button with multiple
selection possibilities, requests parameter input, or relates error messages.
On the following page is the initial screen of the MacroModel program.
When the program is started the initial screen shows this image with the INPUT and ORGANI buttons colored in green. The input of organic molecules is the default operating mode of the program upon startup.
2) There are two different types of option buttons, those which move the program to major function areas, and those which set the parameters for a given function. The major function buttons are located at the bottom of the screen and parameter buttons are on the right-hand side.
3) The default atom colors used by the program are: green for hydrogen and carbon, dark blue for nitrogen, pale green for phosphorus, red for oxygen and yellow for sulfur.
4) When the program is running on automatic, such as in the demos you will be doing, there is no real way to pause the program to let you study a structure.
A type of pause has been created by changing between the various operating modes of the program. You will notice at times while a demo is running that the image in the working window will remain the same while the buttons are changing. This pause allows you to study what is on the screen. The rate at which a demo runs depends on the number of users on the system. The greater the number of users, the slower the demo. The demos will probably run slower than you would like them to.
Modellers need to know how to enter primary sequence data, use secondary structure prediction programs, create multiple alignments and do other aspects of sequence analysis if they are going to work with proteins.
At WSU, the software supported for sequence analysis tasks is GCG. The databases are created and stored using software from GCG. GCG sequence files have header information at the beginning of the file, the actual sequence at the end of the file, and a checksum line between the two sections.
The nature of the header section of the file depends on the database from which it was extracted or the verboseness of the person who created the file. If the file is from a database, there will be information on the name of the sequence, its source, its accession number, references, and feature information about the sequence. A sequence file that is not from a database may contain anything in its header section. The information placed there depends solely on the whims of its creator. Hopefully, at a minimum, the name of the sequence and some information about its preparation or features will be there.
Located between the header and sequence sections is what GCG refers to as the checksum line. It contains the filename of the sequence file, the length of the sequence, the date the file was created, the type of data it is (P for protein and N for nucleotide), and a number. This number is used in GCG programs to see if any scrambling of the data has occurred for whatever reason. After the checksum number are two periods. GCG uses the location of these two periods to signal the end of non sequence material in the file, and the beginning of the actual sequence information.
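This handout does not say how the checksum number is calculated. The Python sketch below uses the scheme commonly described for GCG files: each character code is multiplied by a position weight that cycles from 1 to 57, and the total is taken modulo 10000. Treat this as an assumption rather than a statement of what the GCG programs do internally, although it does reproduce the value Check: 923 shown for the 1CRN crambin example later in this handout.

```python
def gcg_checksum(sequence):
    """Checksum of a sequence using the commonly described GCG scheme.

    Assumption: weights cycle 1..57, characters are upper-cased, and the
    sum is reduced modulo 10000.
    """
    total = 0
    for index, symbol in enumerate(sequence):
        total += ((index % 57) + 1) * ord(symbol.upper())
    return total % 10000

crambin = "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
print(gcg_checksum(crambin))
```

Because each position has its own weight, swapping two different residues changes the number, which is what makes it useful for spotting scrambled data.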
In the sequence portion of the file, the data is shown in blocks of ten, with fifty characters to a line. Each data line is preceded by a number showing the position in the sequence of the first character in that line.
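The layout just described can be sketched in a few lines of Python. The exact column widths used by GCG are not given here, so the field width of eight characters for the position number is an assumption, as is the function name.

```python
def format_gcg_sequence(sequence, block=10, per_line=50):
    """Return numbered lines holding blocks of ten, fifty characters per line."""
    lines = []
    for start in range(0, len(sequence), per_line):
        chunk = sequence[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # The number is the sequence position of the first character on the line.
        lines.append(f"{start + 1:>8}  {' '.join(blocks)}")
    return lines

for line in format_gcg_sequence("TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"):
    print(line)
```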
With this format information, you can create a GCG-convertible file. Use an editor to create a file that contains whatever is desired in the header information. On a line by itself, between the header information and the actual sequence data, there must be a line with just two periods, one right after the other. The sequence can be entered in any manner you find convenient. No numbering is necessary, and a three-letter code may be used if desired. After the file is created, the GCG program REFORMAT is then run. REFORMAT will convert this file from a text file into a usable GCG sequence file. Command switches can be used to handle the conversion of three-letter codes to one-letter codes.
Modifying an existing sequence file can be done by making the necessary changes in the header and sequence sections. After the use of REFORMAT on this changed file, the new output file containing the modified data is ready for use.
In the GCG software suite a program called SEQED is normally used to enter and/or modify sequence data. This program can be run from a subdirectory in which the keyboard has been redefined to allow easy entry of nucleotide data.
Like GCG, other software packages have their own data format for storing sequence data files. When working with data gathered off the nets or from other computer users, it is not unusual to have conflicting data formats to deal with. Luckily, conversion software exists to handle the problem of converting one data format into another. Each of the databases has its own format. The main differences are in the way reference information is presented, the order in which the information is presented, and the length of their respective access codes. Therefore, data on the same sequence from various databases will look different, but contain the same information.
Examples of GCG formatted data files.
A protein file from the NRL_3D database in GCG format.
P1;1CRN - crambin - Abyssinian crambe
C;Species: Crambe abyssinica (Abyssinian crambe)
A;Note: seed
R;Hendrickson, W.A.; Teeter, M.M.
submitted to the Brookhaven Protein Data Bank, April 1981
A;Reference number: A50099; PDB:1CRN
R;Teeter, M.M.
Proc. Natl. Acad. Sci. U.S.A. 81, 6014, 1984
A;Title: Water structure of a hydrophobic protein at atomic resolution.
pentagon rings of water molecules in crystals of crambin.
R;Hendrickson, W.A.; Teeter, M.M.
Nature 29, 107, 1981
A;Title: Structure of the hydrophobic protein crambin determined directly from
the anomalous scattering of sulphur.
R;Teeter, M.M.; Hendrickson, W.A.
J. Mol. Biol. 12, 219, 1979
A;Title: Highly ordered crystals of the plant seed protein crambin.
C;Resolution: 1.5 angstroms
C;Determination: X-ray diffraction
C;Keywords: seed
F;1-4,32-35/Region: beta sheet
F;7-19/Region: helix (right hand alpha) (3/10 conformation res 17,19)
F;23-30/Region: helix (right hand alpha) (distorted 3/10 at res 30)
F;41-44/Region: turn
F;3-40/Disulfide bonds:
F;4-32/Disulfide bonds:
F;16-26/Disulfide bonds:
1CRN Length: 46 January 18, 1996 07:16 Type: P Check: 923 ..
1 TTCCPSIVAR SNFNVCRLPG TPEAICATYT GCIIIPGATC PGDYAN
Since this file is in GCG format, a checksum line separates the header or reference information from the sequence data. This line contains the access code for the sequence in the database, its length, the type of sequence it is (P for protein) and the checksum number for the sequence. Note the two periods at the end of the line. The location of these two periods allows GCG to determine where the header information ends and the sequence data starts.
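To show how the pieces of a checksum line can be picked apart, here is a small Python sketch that reads the line from the crambin example. The regular expression is written to fit the lines shown in this handout. Other GCG files may lay the line out slightly differently, so treat the pattern and the function name as assumptions.

```python
import re

# Pattern fitted to the checksum lines shown in this handout.
CHECK_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+Length:\s*(?P<length>\d+)\s+(?P<date>.*?)\s+"
    r"Type:\s*(?P<type>[PN])\s+Check:\s*(?P<check>\d+)\s+\.\.\s*$"
)

def parse_check_line(line):
    """Return the fields of a GCG checksum line as a dict, or None if no match."""
    match = CHECK_LINE.match(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "length": int(match.group("length")),
        "date": match.group("date"),
        "type": match.group("type"),
        "check": int(match.group("check")),
    }

print(parse_check_line("1CRN Length: 46 January 18, 1996 07:16 Type: P Check: 923 .."))
```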
At the end of the file is the actual sequence data. The default GCG format puts the sequence into numbered lines containing blocks of 10 characters, 5 blocks to a line. The number at the beginning of the line is the position in the sequence of the first character in the first block on that line of data. Since crambin is only 46 residues long it takes a single line to show the entire sequence.
A nucleotide file from the GenBank database in GCG format.
LOCUS A22411 479 bp DNA PAT 09-NOV-1994
DEFINITION DNA for CMV Ribozyme RNA-spacer-antisense RNA.
ACCESSION A22411
NID g641479
KEYWORDS .
SOURCE unidentified.
ORGANISM unidentified
unclassified.
REFERENCE 1 (bases 1 to 479)
AUTHORS Muellner,H., Uhlmann,E., Eckes,P., Schneider,R. and Uijtewaal,B.
TITLE Multifunctional RNA with self-processing activity, its production
and use
JOURNAL Patent: EP 0421376-A 1 10-APR-1991;
HOECHST AKTIENGESELLSCHAFT
COMMENT NCBI gi: 641479
FEATURES Location/Qualifiers
source 1. .479
/organism="Artificial sequences"
BASE COUNT 134 a 110 c 133 g 102 t
ORIGIN
A22411 Length: 479 January 18, 1996 07:27 Type: N Check: 6241 ..
1 CCGGGAGGTA GCTCCTGATG AGTCCGTGAG GACGAAACAA CCTTGTCGTC
51 GACAAAATGG TCAGTATGCC CCTCGAGTGG TCTCCTTATG GAGAACCTGT
101 GGAAAACCAC AGGCGGTACC CGCACTCTTG GTAATATCAG TGTATTACCG
151 TGCACGAGCT TCTCACGAAG CCCTTCCGAA GAAATCTAGG AGATGATTTC
201 AAGGGTAGCT CGACAACCTG GATCCAAAAT GGTCAGTATG CCCCCCATGG
251 CAACAGATTG GCGAATGAGA AAGTGGGTGG AGGACTTATC ATAGTAACAG
301 AAGAGAGACT AGAACTGCAG AAAATGGTCA GTATGCCCCA GATCTACCGG
351 AGGTTCTACT AGCATTGGGA GAGCTCGATT TGTCCATAGG CACACTGAGA
401 CGCAAAAAGC TTAAGGTTGT CGAGCTACCG GGGCCCAGGG CATACTCTGA
451 TGAGTCCGTG AGGACGAAAC CATTTTGGG
Since this file is in GCG format, a checksum line separates the header or reference information from the sequence data. At the end of the file is the actual sequence data. Since this sequence is 479 bases long, it takes 10 lines to show the entire sequence.
Week 4 Exercise
This series of exercises will acquaint you with entering data for a number of different uses. Items in these instructions which appear in bold should be entered followed by pressing the RETURN key.
1) Activate the computer
Moving the mouse changes the terminal from screen saver mode to active.
2) Select the RIBOZYME icon
From the Launcher window, select the RIBOZYME icon by moving the arrow with the mouse over to the RIBOZYME icon and pressing the mouse button twice. Successful connection to ribozyme is denoted by the appearance of a ribozyme information line and a login: prompt.
IRIX (ribozyme) login:
Once the login: prompt appears, log on to the machine by entering first your account name to the login: prompt, and then your password to the Password: prompt. Now that you are on ribozyme, enter sequence data into the computer. To do this, you will go through a number of steps designed to give you some insight into how data entry works.
4) Create a subdirectory to keep this week's work in.
To keep data in separate working areas, it is necessary to create subdirectories. This is done with the mkdir command. Create the following subdirectory in your account.
% mkdir week4
Now move into that location using the following command line.
% cd week4
Copy over the data files needed for this week's activities.
% cp $GRAD_DIR/week4m/*.* .
As you discovered last week in doing a gopher search of Brookhaven's PDB site, there can be more than one set of coordinate data to work with. These data sets vary in possible included substrate groups, crystal shape and level of resolution. To make the number of possible data sets used in this course manageable, one of the various possible coordinate data sets has been chosen for each selected molecule. To ensure that you will be using the correct one throughout the rest of the semester, copy the chosen one over to your account now. A command line example is given for each of the four selected molecules. Carry out only the command line that applies to your selected molecule.
plant ribulose bisphosphate carboxylase/oxygenase small subunit
% cp $GRAD_DIR/data/4rub.coords .
P21 ras proto-oncogene transforming protein
% cp $GRAD_DIR/data/6q21.coords .
basic fibroblast growth factor
% cp $GRAD_DIR/data/4fgf.coords .
superoxide dismutase
% cp $GRAD_DIR/data/1sdy.coords .
Now examine the coordinate file with more and record the following pieces of information below: resolution, number of residues, chain identifier (if any), secondary structural elements. You will need to be familiar with these coordinates and what these values are for your selected protein. In the example command line xxxx.coords represents the name of your selected molecule's coordinate file.
% more xxxx.coords
resolution: _____________________
number of residues: _____________________
chain identifier: __________________________________________________________
secondary structural elements: _____________________________________________
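Reading the file with more is the point of this step, but a script can be used afterwards to double-check what you wrote down. The Python sketch below is an illustration only. It assumes the usual PDB column layout: the resolution on a REMARK 2 line, the chain identifier in column 12 of each SEQRES line, and the residue count in columns 14 to 17. If your coordinate file is laid out differently, the sketch will need adjusting. The function name is hypothetical.

```python
import re

# Assumed wording of the resolution remark, e.g. "REMARK   2 RESOLUTION. 1.5 ANGSTROMS."
RESOLUTION = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+([0-9.]+)\s+ANGSTROMS")

def summarize_pdb(lines):
    """Collect resolution, residues per chain, and secondary structure counts."""
    resolution = None
    chain_lengths = {}
    secondary = {}
    for line in lines:
        record = line[:6].strip()
        if record == "REMARK":
            match = RESOLUTION.match(line)
            if match:
                resolution = float(match.group(1))
        elif record == "SEQRES":
            chain = line[11:12].strip() or "(none)"   # column 12
            count = line[13:17].strip()               # columns 14-17
            if count.isdigit():
                chain_lengths[chain] = int(count)
        elif record in ("HELIX", "SHEET", "TURN"):
            secondary[record] = secondary.get(record, 0) + 1
    return {"resolution": resolution,
            "chain_lengths": chain_lengths,
            "secondary": secondary}

# Made-up lines for illustration; use open("xxxx.coords") for a real file.
sample = [
    "REMARK   2 RESOLUTION. 1.5 ANGSTROMS.",
    "SEQRES   1 A   21  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU",
    "HELIX    1  H1 ILE A    2  CYS A    7  1",
]
print(summarize_pdb(sample))
```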
There are two ways you can enter sequence data from scratch with the VADMS software. The first way uses an editor to create a file. To do this, you need to be familiar with the format required by the software the file will be used with. The second way uses the GCG program SEQED. This program will create a sequence file in compatible GCG format. See the GCG program manual for complete details. A brief description is given later in this exercise. The method you choose to enter a sequence depends on how comfortable you feel with editing files. Simple changes will allow a file created with one data entry method to be usable in another software system.
section 6a
In this section you will use the editor, pico, to create a sequence file in raw GCG format and then reformat it into final GCG format.
Enter the following protein sequence into a file with pico. The name of the protein is melittin, its source is the honeybee, and it is 27 amino acids long. Call your file bee.seq. The sequence you wish to enter is
GIGAVLKVLTTGLPALISWISRKKRQQ
% pico bee.seq
Now insert some sort of comment lines, naming the protein, its source, and any other information you feel is important for future reference. The better the comments made at the beginning of a user-entered sequence, the more useful that data will be to future users of the sequence. Follow this with two periods on a line by itself. This is used by GCG as a marker to denote the end of the header or comment information and the beginning of the actual sequence.
..
Enter the sequence using capital letters, since most formats require them and it's a good habit to get into. There are times when DNA sequencers use lower case letters to denote bases that they are not sure of. If you are using such a coding system for any reason, be sure it will not negatively affect any later analysis work you want to run on the sequence. Test it out on dummy data and if need be, keep two forms of the sequence data around -- one coded and the other in all upper case letters.
Finish the editing session by pressing Ctrl-x and responding to the exiting prompts appropriately.
Type the file you just created. See if you can spot any possible problems with the data prior to using your file in the GCG program REFORMAT.
% cat bee.seq
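The kinds of problems worth looking for can also be checked with a short script. This Python sketch is an illustration, not a replacement for REFORMAT. It looks for exactly one line holding just two periods, for characters other than the twenty standard upper-case one-letter amino acid codes, and for an unexpected length. The function name and messages are hypothetical, and REFORMAT itself may accept more than this sketch does.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def check_raw_gcg(text, expected_length=None):
    """Return a list of problems found in a raw, hand-edited protein file."""
    problems = []
    lines = text.splitlines()
    dividers = [i for i, line in enumerate(lines) if line.strip() == ".."]
    if len(dividers) != 1:
        problems.append("expected exactly one line containing only ..")
        return problems
    # Everything after the divider is sequence; drop spaces and line breaks.
    sequence = "".join("".join(lines[dividers[0] + 1:]).split())
    bad = sorted(set(sequence) - VALID_RESIDUES)
    if bad:
        problems.append("unexpected characters: " + " ".join(bad))
    if expected_length is not None and len(sequence) != expected_length:
        problems.append(f"length is {len(sequence)}, expected {expected_length}")
    return problems

raw = "melittin from the honeybee\n..\nGIGAVLKVLTTGLPALISWISRKKRQQ\n"
print(check_raw_gcg(raw, expected_length=27))
```

An empty list means none of these particular problems were found. Replace the raw string with the contents of bee.seq to check your own file.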
Now use your file with the GCG program REFORMAT. Invoke the GCG package by entering gcg. The GCG software package is set up to run programs simply by entering their name. You will be using the program REFORMAT to convert your edited file into a GCG compatible sequence file.
% gcg
The GCG welcome message appears on the screen.
REFORMAT is an interesting program that can be run in a variety of ways. Here you will be using only its simplest features. The program will ask you for the name of the file to work with and what to call the changed file. The default value will name it the same as before, so try to keep your raw data distinct from sequence files that will work in the system by using a different extension for each case. If the file is OK, there will be no error messages. If you get one, go back and revise bee.seq and repeat this process until there aren't any error messages. User input is shown in bold type.
% reformat
REFORMAT rewrites sequence file(s), symbol comparison table(s), or
enzyme data file(s) so that they can be read by GCG programs.
REFORMAT what sequence file(s) ? bee.seq <rtn>
bee.seq length: 27 aa
Type the reformatted file and note the differences between it and the original file you entered with the pico editor. While the comment or header information you entered is the same, the two periods have been replaced by an elaborate checksum line and your sequence data is now in blocks of 10 characters. Examine the reported length for the reformatted sequence. If it is not the 27 that it should be, locate the problem, correct it via editing and reformat it again.
% cat bee.seq
In this section you will use the GCG program SEQED to enter a simple protein sequence into the computer in GCG usable format.
Activate the SEQED program by entering its name at the prompt.
% seqed
The screen displays a double set of lines numbered from 0 to either 70 or 100. Since a filename was not given when the program was started, the software prompts you for the name of the sequence to work with. Respond as shown below. User input is shown in bold type.
SEQED of what sequence ? bee2.seq <rtn>
Because there is no such file in your present directory location to work with, the program starts prompting you for information to insert in the header or comment portion of the new file. The cursor moves to the top of the screen. Enter some comment lines. For this section you will be entering the same data as in section 6a, so put in comments relating to the name of the material and its source. When you are finished entering comment information, press Ctrl-d to return to sequence entry mode.
When the cursor moves to the first position on the top of the two lines, enter the sequence. As you enter the actual sequence, the cursor moves to the right and each new character appears on the screen. Symbols appear as you move along on the lower line as well. When you are finished, press Ctrl-d to go into the command mode of the program.
When you are in the command mode, a colon will be in the bottom left-hand corner of the screen. Save your efforts to an output file by entering exit. The program will write the file with the same name you gave it earlier, bee2.seq, and return a notice that the file contains so many residues, and then quit.
Type off the results of your efforts. Notice the P after the Type: term in the checksum line. This indicates that the file contains a protein sequence.
% cat bee2.seq
SEQED is a complex program with many different options. For more complete information on its operation, consult the GCG manual located in your carrel drawer.
7) Using sequence data to search for structural information.
There is a small primary sequence database composed solely of the protein sequences deposited in the PDB database. The name of this database is NRL_3D. It can be used to do sequence-based searches on the PDB data. As originally organized, PDB is not set up to permit this type of sequence searching.
To explore this type of search in the database, use the earlier created sequence of melittin, bee.seq. One way of doing database searching in GCG is with the program FASTA. Normally FASTA runs are done in batch mode; however, the NRL_3D database is so small that this search can be done interactively.
% fasta
You are now in the program, selecting the parameters to be used. The query sequence is bee.seq, the database to be searched is nrl_3d:*, and you want only the top 5 sequences to be saved. For the rest of the parameters, accept the default values shown by pressing the RETURN key and moving on to the next item. Given below is an example of what you can expect in this procedure. User input is shown in bold type.
FASTA does a Pearson and Lipman search for similarity between a query sequence and any group of sequences of the same type (nucleic acid or protein.) For nucleotide searches, FastA may be more sensitive than BLAST.
FASTA with what query sequence ? bee.seq <rtn>
Begin (* 1 *) ? <rtn>
End (* 27 *) ? <rtn>
Search for query in what sequence(s) (* SwissProt:* *) ? nrl_3d:* <rtn>
What word size (* 2 *) ? <rtn>
Don't show scores whose E() value exceeds: (* 10.0 *): <rtn>
What should I call the output file (* bee.fasta *) ? <rtn>
At this point information is given about the search and the hits that were found. The number of hits saved is determined by the number of hits that exceed the default expectation value. If this number is greater than 40 as it is in this case, you will see the following on the screen.
Show more scores (* yes *) ? no
The list contains 40 entries. How many alignments would you like to see (* 40 *) ? <rtn>
Aligning...
Data is given on the time it took to complete the FASTA run, along with the name of the output file.
Examine the results of this search by typing the bee.fasta file. The first part of the file contains information on the statistics of the search and the rest shows the hits and their alignments to the original sequence. These results show that there has been a crystal structure solved for a melittin sequence though it does not match the one you entered exactly.
% cat bee.fasta
Another way to see if PDB contained any melittin data would have been to search the PDB gopher site for the term, melittin. This approach would not have produced aligned sequences, but would have resulted in letting you know that melittin data exists in the database and giving you the needed access code(s). Searching NRL_3D this way produces alignments between sequences whose structures may be unknown with those of known structures.
8) Converting sequence information from PDB files into GCG usable files.
Sometimes it is handy to be able to convert sequence data from a PDB ASCII file into a GCG usable file. At times a user needs to be sure that the sequence reported in a sequence database exactly matches that of a crystal structure, and the user doesn't want to enter the sequence manually.
In order to get the necessary data from a PDB file, you must know how these files designate sequence information. PDB files use the first 6 characters of each line as a label for the type of data contained in the rest of the line. Sequence data is found in lines beginning with the term SEQRES. Once a file of interest has been located, using the grep utility to find the character string SEQRES will yield the desired sequence information. PDB files use three-letter codes for peptide residues instead of the one-letter code utilized in the databases. GCG's REFORMAT program can handle the conversion of three-letter to one-letter code and back again.
For this section use the sequence data located in the PDB file with the code 2SN3. For the purpose of this section, this PDB file was copied over to your subdirectory at the beginning of the exercise. Use the grep command to locate the lines in the file that contain the protein sequence data. These lines begin with the term SEQRES.
% grep SEQRES 2sn3.pdb > 2sn3.seq
The contents of your 2sn3.seq file are the five lines of the 2sn3.pdb file that start with SEQRES. Type off this file and get an idea of the steps you must take next to convert this data.
% cat 2sn3.seq
After examining the contents of the file, it will be obvious that some editing will need to be done. The final outcome of your editing efforts should result in a file that has some sort of header or comment information separated from the lines of three-letter sequence code by a line containing just two periods. The sequence lines need to be free of everything but the actual three-letter residue codes, separated from one another by a single space. Use pico to edit the 2sn3.seq file and put it into the necessary shape. Write out your modifications into a file called lastname.seq2. As in previous examples the lastname here represents your own last name.
% pico 2sn3.seq
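The editing described above is meant to be done by hand in pico, but the same clean-up can be sketched in Python for comparison. The sketch assumes that the residue codes occupy columns 20 to 70 of each SEQRES line, so that the access code and line number at the end of the line are left out. The function name and the comment text are hypothetical.

```python
def seqres_to_raw_gcg(seqres_lines, comment):
    """Build raw GCG text: comment, a line with two periods, then residue codes."""
    codes = []
    for line in seqres_lines:
        if line.startswith("SEQRES"):
            # Columns 20-70 hold the three-letter codes (assumed layout).
            codes.extend(line[19:70].split())
    # Thirteen codes per line, separated by single spaces.
    body = [" ".join(codes[i:i + 13]) for i in range(0, len(codes), 13)]
    return "\n".join([comment, ".."] + body) + "\n"

# A made-up SEQRES line, padded so the trailing access code starts in column 73.
residues = "LYS GLU GLY TYR LEU VAL LYS LYS SER ASP GLY CYS LYS"
line = ("SEQRES   1     65  " + residues).ljust(72) + "2SN3  47"
print(seqres_to_raw_gcg([line], "sequence taken from SEQRES lines"))
```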
The REFORMAT program can be run with the command switch -three. It will convert the three-letter code found in the file to be worked with into one-letter code in the resulting output file. Do the steps listed below to perform this conversion. If an error message results, find out why, and correct the situation. User input shown in bold type.
% reformat -three
REFORMAT rewrites sequence file(s), scoring matrix file(s), or enzyme data file(s) so that they can be read by GCG programs.
REFORMAT what sequence file(s) ? lastname.seq2 <rtn>
lastname.seq2 length: 65 aa
Check the results of your work by typing the file onto the screen. Most PDB files are not as small as the one used in this section. Many PDB files have more than one chain, adding to the complexity of the process. The procedure, however, is still a handy one to be aware of.
% cat lastname.seq2
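The idea behind the three-letter to one-letter conversion can be sketched in Python. This is not how REFORMAT is implemented; it is a minimal illustration using the standard codes for the 20 common amino acids. The function name three_to_one and the choice of X for an unrecognized residue are assumptions made for the example.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def three_to_one(residues, unknown="X"):
    """Convert a list of three-letter residue names to a one-letter string."""
    return "".join(THREE_TO_ONE.get(name.upper(), unknown) for name in residues)


one_letter = three_to_one("LYS GLU GLY TYR".split())
print(one_letter)  # KEGY
```

Comparing the length of the converted string with the length REFORMAT reports is a quick way to catch a residue lost during editing.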
Sometimes it is necessary to get data files directly from a database server. The database servers on the networks have their information updated every evening while VADMS' sequence databases are updated on a bimonthly or quarterly basis and the PDB database is not even available here. Therefore, a search of NRL_3D would report an access code for a PDB data file that is not locally available. Assume that you have come across a needed sequence in NRL_3D and its code is 2BOP.
It is time to get help from a server. For this data location, we will use GOPHER. GOPHER is an access tool that allows you to search the Internet for useful information. To assist in this search a bookmark file was created in your account when it was first set up to narrow the search field you will need. You can either ignore the bookmark file and surf GOPHER without it or add more names to it. See Steve Thompson for more information on using GOPHER effectively. The screen traces shown on the next few pages have been truncated to take up less space. The blank space between the menus and the bottom of the displayed screens has been reduced.
% gopher
The following screen appears on your terminal.
Internet Gopher Information Client v2.1.3
Home Gopher server: serval.net.wsu.edu
--> 1. About WSUinfo/
2. Student Information System/
3. WSU Campuses Information/
4. Desktop Resources/
5. Discussion Forums/
6. Library Resources/
7. Software Archives/
8. Gopher Tunnels/
9. News & Weather/
10. Internet Reference/
Press ? for Help, q to Quit Page: 1/1
Once the screen is displayed, press the v key. This will load the existing list of bookmarks into the program and start the process of searching molecular biology sites for the information being sought.
Internet Gopher Information Client v2.1.3
Bookmarks
--> 1. Computational Biology (Welchlab - Johns Hopkins University)/
2. Brookhaven National Laboratory Protein Data Bank/
3. EMBnet BioInformation Resource EMBL (Germany)/
4. IUBio Biology Archive, Indiana University/
5. PIR Archive, University of Houston/
Press ? for Help, q to Quit, u to go up a menu Page: 1/1
Select option 2 from this list by moving the horizontal arrow down to that position with the terminal down arrow key and pressing the RETURN key. This moves you to the Brookhaven National Laboratory gopher site.
Internet Gopher Information Client v2.1.3
Brookhaven National Laboratory Protein Data Bank
--> 1. Welcome to the Brookhaven PDB Gopher Hole!
2. An (almost) full text search of the PDB Bibliographic Headers <?>
3. Search by entry id only <?>
4. *NEW* Check the Status of a Pending Entry by ID, Tracking, Auth.. <?>
5. *NEW* Check the Status of Entries on "HOLD" by ID, Tracking, Au.. <?>
6. Raw access (Try the indexed searches instead)/
7. Important message for BNL INFORM users.
8. Documents/
9. Information about the PDB Mailing List (List Server)
10. Recent Announcements and Changes/
11. Recent PDB Newsletters/
12. Related Databases and the rest of Gopherville/
13. Software Available from PDB and friends/
14. Some hints for searching the Brookhaven PDB/
15. The PDB's Anonymous FTP /
Press ? for Help, q to Quit, u to go up a menu Page: 1/1
Select option 3 by moving the horizontal arrow to that location with the terminal's down arrow key and pressing the RETURN key. Since the access code for the data is already known this is the most effective way to search for the data. This action results in the screen given below.
Internet Gopher Information Client v2.1.3
Brookhaven National Laboratory Protein Data Bank
1. Welcome to the Brookhaven PDB Gopher Hole!
2. An (almost) full text search of the PDB Bibliographic Headers <?>
--> 3. Search by entry id only <?>
4. *NEW* Check the Status of a Pending Entry by ID, Tracking, Auth.. <?>
----------------------------Search by entry id only----------------------------
| |
| Words to search for |
| |
| |
| |
| [Help: ^-] [Cancel: ^G] |
------------------------------------------------------------------------------
13. Software Available from PDB and friends/
14. Some hints for searching the Brookhaven PDB/
15. The PDB's Anonymous FTP /
Press ? for Help, q to Quit, u to go up a menu Page: 1/1
Enter the term 2bop in the highlighted box to begin the search for the data and press the RETURN key. The search will start and after a few moments the following screen will appear.
Internet Gopher Information Client v2.1.3
Search by entry id only: 2bop
--> 1. 2bop : BOVINE PAPILLOMAVIRUS-1 E2 (DNA-BINDING DOMAIN) (RESIDUES 3../
Press ? for Help, q to Quit, u to go up a menu Page: 1/1
Press the RETURN key to get the actual data file to appear on the screen. If a keyword search rather than an access code search had been requested, there would have been a number of possible selections instead of the single one our search resulted in. The following screen appears.
Internet Gopher Information Client v2.1.3
2bop : BOVINE PAPILLOMAVIRUS-1 E2 (DNA-BINDING DOMAIN) (RESIDUES 325 - 410) COM
--> 1. 2bop.biblio
2. 2bop.full
3. 2bop.gif <Picture>
Press ? for Help, q to Quit, u to go up a menu Page: 1/1
There are usually three files in the database for any given entry. The full file contains the coordinate data of the structure. This is the information that is being sought. Select the full file by moving the arrow and pressing the RETURN key. This is a moderately sized file and it may take some time for it to load if the network is busy. In time the following screen should appear.
2bop.full (96k) 0%
-------------------------------------------------------------------------------
HEADER TRANSCRIPTION REGULATION 13-JAN-94 2BOP 2BOP 2
COMPND BOVINE PAPILLOMAVIRUS-1 E2 (DNA-BINDING DOMAIN) 2BOP 3
COMPND 2 (RESIDUES 325 - 410) COMPLEXED WITH DNA 2BOP 4
SOURCE BOVINE PAPILLOMAVIRUS-1 GENE FRAGMENT RECOMBINANT FORM 2BOP 5
SOURCE 2 EXPRESSED IN (ESCHERICHIA COLI) 2BOP 6
AUTHOR R.S.HEGDE,S.R.GROSSMAN,L.A.LAIMINS,P.B.SIGLER 2BOP 7
REVDAT 1 31-JAN-94 2BOP 0 2BOP 8
JRNL AUTH R.S.HEGDE,S.R.GROSSMAN,L.A.LAIMINS,P.B.SIGLER 2BOP 9
JRNL TITL CRYSTAL STRUCTURE AT 1.7 ANGSTROMS OF THE BOVINE 2BOP 10
JRNL TITL 2 PAPILLOMAVIRUS-1 E2 DNA-BINDING DOMAIN BOUND TO 2BOP 11
-------------------------------------------------------------------------------
[Help: ?] [Exit: u] [PageDown: Space]
With this screen displayed, press the s key. This will tell the gopher site that you want to save this file in your own account on the platform you are using. The following screen appears, offering a default name for the file; you can accept it or enter a filename of your own.
2bop.full (96k) 0%
-------------------------------------------------------------------------------
HEADER TRANSCRIPTION REGULATION 13-JAN-94 2BOP 2BOP 2
COMPND BOVINE PAPILLOMAVIRUS-1 E2 (DNA-BINDING DOMAIN) 2BOP 3
COMPND 2 (RESIDUES 325 - 410) COMPLEXED WITH DNA 2BOP 4
-----------------------------------2bop.full----------------------------------5
| |
| Save in file: |6
| |
| 2bop.full |7
| |
| [Help: ^-] [Cancel: ^G] |8
------------------------------------------------------------------------------
JRNL AUTH R.S.HEGDE,S.R.GROSSMAN,L.A.LAIMINS,P.B.SIGLER 2BOP 9
JRNL TITL CRYSTAL STRUCTURE AT 1.7 ANGSTROMS OF THE BOVINE 2BOP 10
JRNL TITL 2 PAPILLOMAVIRUS-1 E2 DNA-BINDING DOMAIN BOUND TO 2BOP 11
-------------------------------------------------------------------------------
[Help: ?] [Exit: u] [PageDown: Space]
Just press the RETURN key and the file will be saved in your current directory location. You should give the process a few minutes to complete and then exit the gopher program. Press the q key twice and press the RETURN key in response to the query about quitting the program to return to the machine prompt.
With the data in hand, go through the process outlined in section 8 above. Use grep to locate the sequence lines. Check these results to ensure that you are only working with the protein portion of the structure. If not, edit the file to contain only the desired data. Modify the grep output into raw GCG format and run REFORMAT on the results with the -three option. Change the filename of the resulting sequence file to (your lastname).full and send it over to the teacher account.
% mv 2bop.full (your lastname)-bop.full
% rcp (your lastname)-bop.full teacher@ribozyme:receive
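Checking that only the protein portion is present can also be sketched in Python. The sketch assumes the PDB column layout in which the chain identifier sits in column 12 and residue names in columns 20 through 70 of each SEQRES line, and it assumes that nucleic acid chains use residue names outside the 20 standard amino-acid codes. The function names and the sample lines are hypothetical; they are not taken from the 2BOP entry.

```python
AMINO_ACIDS = set(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
    "LEU LYS MET PHE PRO SER THR TRP TYR VAL".split()
)


def residues_by_chain(seqres_lines):
    """Collect residue names per chain ID (column 12 of each SEQRES line)."""
    chains = {}
    for line in seqres_lines:
        chain_id = line[11:12]
        chains.setdefault(chain_id, []).extend(line[19:70].split())
    return chains


def protein_chains(chains):
    """Keep only chains whose residues are all standard amino-acid codes."""
    return {
        chain_id: residues
        for chain_id, residues in chains.items()
        if residues and all(name in AMINO_ACIDS for name in residues)
    }


# Made-up sample: chain A is a peptide, chain B is a short DNA strand.
sample = [
    "SEQRES   1 A    3  SER CYS PHE",
    "SEQRES   1 B    4    C   G   A   T",
]
kept = protein_chains(residues_by_chain(sample))
print(sorted(kept))
```

A chain containing a modified or unusual residue would be dropped by this test, so inspect the rejected chains by eye before discarding them.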
You are now in an account on ribozyme. This machine is fast; however, even at this speed it is not possible for all of the modelling students in BC/BP 578 to do a complete CSD search at one time. Because of this, a subset of the database was created. This subset will be accessed in a manner exactly like that of a full-fledged search, only the amount of data to be searched is much smaller than the complete database.
The searching aspects of the system are contained in a program known as Quest. Quest can be run either as a text search or graphically. The user applies a list of tests to the database to see if it has information of interest. When a structure passes all the tests outlined for it, a hit is produced that appears both on the terminal and in a journal file of the search. Careful selection of tests for a search will produce the minimum number of hits that contain just the information sought and not a lot of extraneous stuff. A copy of the Quest instructional manual is in the desk drawer of the carrel. Consult this manual if more complex searches are needed.
There are numerous ways to search for data. Listed below are a number of tests based on the nature of compounds being sought and what will be done with the data. The tests are given in the format expected by the QUEST program.
1) Since the desired data for this exercise needs to have coordinates to create a MacroModel file, testing for coordinate data is the logical way to start.
t1 *coords.gt.0
2) The compounds of interest for this exercise are small, well-established compounds that have had their structures known for a long time. One way to search for items of interest is by their compound name.
t2 *xname 'acetic'
When searching for names of any kind, the software expects to see the term being sought within single quotes.
Note: There are times when the name you know a compound by and how it is named in the database do not match. If such a situation arises, try searching for other possible ways of naming the compound, or try the formula test explained later in t4.
3) Since these structures are small, they often appear in the CSD as solvent molecules in a more complex crystal structure. Coupling the compound's name with a test for solvate will increase the chances of the desired structure being a separate entity.
t3 *xname 'solvate'
4) Such solvates have their formula given as a separate item in the formula line of an entry, and that is a good way to ensure that the compound you find is the desired solvate molecule and not a different one. In this example the desired compound is acetic acid with a formula of C2H4O2.
t4 *formula c2 h4 o2
When formula tests are being used to find a compound with naming variations, select from the hit list only those refcodes that have the exact formula you are looking for. After coming up with a list of possibilities, the refcodes can be converted into files that can be viewed in MacroModel to find the structure being sought. There may be a number of structures with the correct formula that are not the desired one.
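Picking out only the hits with the exact formula can be sketched in Python. The sketch assumes that the formula line of a hit lists its components separated by commas, with each element written as a symbol followed by a count, as in the hit listing shown later in this section. The function names are hypothetical and are not part of QUEST, and charge tokens such as 1+ are simply ignored, so a charged species would compare equal to its neutral form.

```python
import re

# One element token: a one- or two-letter symbol followed by a count.
ELEMENT_TOKEN = re.compile(r"^([A-Za-z]{1,2})(\d+)$")


def parse_component(text):
    """Turn 'C2 H4 O2' or 'c2 h4 o2' into {'C': 2, 'H': 4, 'O': 2}."""
    counts = {}
    for token in text.split():
        match = ELEMENT_TOKEN.match(token)
        if match:  # tokens such as '1+' are not elements and are skipped
            symbol = match.group(1).capitalize()
            counts[symbol] = counts.get(symbol, 0) + int(match.group(2))
    return counts


def has_exact_component(formula_line, target):
    """True if any comma-separated component matches the target exactly."""
    wanted = parse_component(target)
    return any(
        parse_component(part) == wanted for part in formula_line.split(",")
    )


hit_formula = "C10 H10 Fe1,C12 H38 Cl2 P4 W1 1+,B1 F4 1-"
print(has_exact_component(hit_formula, "c10 h10 fe1"))
print(has_exact_component(hit_formula, "c2 h4 o2"))
```

Comparing element counts rather than raw strings means the order and letter case of the elements do not matter.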
5) At times there are still too many hits to be easily dealt with. You need another test that will reduce hits. A lot of solvate structures have metal atoms in them. Having the search file require that no metals be present in a successful hit will reduce the list of hits dramatically. Testing for metals is one of the bit test functions of the software.
t5 *btest -253
The bit test for metals is number 253. Placing a negative sign before the test number means that the absence of metal atoms is sought, not their presence. Remember not to use this test if a metal atom is vital to the nature of the structure being sought, such as ferrocene.
6) The tests are summed together into a question in the following manner to form the final set of tests that a structure must pass before it becomes a hit in the search. You can use .and. or a plus sign to create the question line. An example of both methods is given below.
quest t1.and.t2.and.t3.and.t4.and.t5
or
quest t1+t2+t3+t4+t5
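If you plan several searches, it can help to assemble the test lines and the question line before starting QUEST. The Python sketch below only builds the text to be typed, using the test and question syntax shown above; it does not run QUEST, and the helper name build_quest_input is hypothetical.

```python
def build_quest_input(tests, joiner="+"):
    """Number the tests t1, t2, ... and append the question line.

    `joiner` is either '+' or '.and.', the two forms shown in the text.
    """
    if joiner not in ("+", ".and."):
        raise ValueError("joiner must be '+' or '.and.'")
    lines = []
    names = []
    for number, test in enumerate(tests, start=1):
        name = "t{}".format(number)
        lines.append("{} {}".format(name, test))
        names.append(name)
    lines.append("quest " + joiner.join(names))
    return lines


my_tests = ["*coords.gt.0", "*xname 'acetic'", "*btest -253"]
for line in build_quest_input(my_tests):
    print(line)
```

Numbering the tests automatically keeps the question line in step with the list when a test is added or removed.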
Sometimes the nature of the data suggests additional types of tests to be run. Oxalic acid for instance has different forms and you may want to seek only data on the dihydrate one.
t6 *xname 'dihydrate'
For the purpose of this section, pick any one of the following four compounds to do a Cambridge Structural Database Quest search. Carefully use the list of tests given on the previous pages or modifications thereof to formulate your own testing series.
Compounds to Choose from:
acetic acid C2H4O2
cyclohexane C6H12
oxalic acid C2H2O4
pyridine C5H5N
Select the tests you feel would be most effective for your search. Write them down below. Check them again to make sure that they will do what you want.
quest tests to be tried: _________________________________________________________________
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
Invoke the quest system with the following command line. In this
command line, johns is the name of the desired output files, -db
tells the program to use a different database than the standard one and
$CSDHOME/csd/little.ss is the name of that database. In your run you
would replace johns with the name you want for your output files. An
example screen trace is given for you to follow. User input is shown in
bold type.
% quest -j johns -db $CSDHOME/csd/little.ss
-----------------------------------------------
Graphic QUEST3D Software Product id 512
Starting Quest from /disk2/usr/local/soft/model/csd/bin/d_sgi64v6/questfg.x
Configuration file read from: /disk2/usr/local/soft/model/csd/csds/quest.fig
Initialisation file read from: /disk2/usr/local/soft/model/csd/csds/quest.ini
Templates file read from: /disk2/usr/local/soft/model/csd/csds/quest.tem
Session: interactive
-----------------------------------------------
number of database entries= 544
Now interpreting instructions in QUEST initialisation file
>COMM +----------------------------------------------------------------------+
>COMM | These are comments in the QUEST initialisation file. This file can |
>COMM | contain QUEST commands, such as terminal type, that are always read. |
>COMM | For more information enter "HELP INITIALISATION FILES" within QUEST. |
>COMM +----------------------------------------------------------------------+
>COMM | For more information on... |
>COMM | the Brookhaven Protein Data Bank, type "HELP BROOKHAVEN" |
>COMM | the QUEST/RASMOL link, type "HELP RASMOL" |
>COMM | the database of CSDS citations, type "HELP DBUSE" |
>COMM | starting the graphical interface, type "HELP GRAPHICS" |
>COMM | the distributed release notes, type "HELP RELEASE NOTES" |
>COMM | the PreQuest data input program, type "HELP PREQUEST" |
>COMM | the CIF/MIF output file, type "HELP save" |
>COMM +----------------------------------------------------------------------+
>COMM | Visit the CCDC web site at: http://www.ccdc.cam.ac.uk/ |
>COMM +----------------------------------------------------------------------+
>COMM Set better PRINT style:
> PRINT 10
>COMM +----------------------------------------------------------------------+
Now continue.
>
Enter your various tests, one to a prompt. An example set of tests is given below along with its question line.
>t1 *coords.gt.0
>t2 *xname 'ferrocene'
>quest t1+t2
With your tests and question entered, the system starts its search of the database for your desired information.
Now searching D/B ..... Converted value is 11
Any hits are shown here. The database is searched in blocks of 1000 entries. In the case of the little.ss database, the whole database is examined in a single pass. The search process will start giving you information on the terminal, allowing you to follow the search. A hit in this program takes the form of a reference listing such as that given below.
--------+---------+---------+---------+---------+---------+---------+----------+
DAPBIF
Dichloro-dihydrido-tetrakis(trimethylphosphine)-tungsten tetrafluoroborate ferro
cene solvate C10 H10 Fe1,C12 H38 Cl2 P4 W1 1+,B1 F4 1-
P.R.Sharp,K.G.Frank
Inorg.Chem., 24, 1808,1985
*COOR=74 //
---------+---------+---------+---------+---------+---------+---------+---------+
You will be asked if you want to keep the found hit or not with the following prompt:
Type "K"(Keep), "R"(Reject) or "O"(for list of options) k
Enter k and press the RETURN key to keep the hit in your journal file. You will be presented with hits until the end of the database is reached, when the following lines will appear. Respond to the prompt about keeping all the data with y if you are satisfied with the results. Data is then given on the nature of the search just completed and the quest process is cleaned up.
>End of database encountered
If you do not EXIT QUEST now, you will lose all hits from this search...
Do you want to exit QUEST? [Y] y
Finished reading ASER
Hits......... 4
D/B entries.. 544
CONNSER calls 0
SCREened out. 85%
F2-4 skipped. 0
F1 read...... 80
F2 read...... 79
F3 read...... 79
F4 read...... 0
Max MDATe....960807
AFTER screens
T1 succeeded 79 times
T2 succeeded 4 times
JRNL file....johns.jnl
RECOver file.johns.rco
QUEST exit
Removing links..
Done.
There are over 500 entries in this mini-database. If you get more than 25 hits in a search, go back and revise your tests to reduce the successful hit rate. A journal file is produced that contains the results of your search. It has the same filename you requested in the command line with the extension jnl.
Study the result of your searches by typing out the journal file. If your file contains a number of hits, you may need to use the following command to study them carefully. In the example, xxxx.jnl represents the filename of your journal file.
% more xxxx.jnl
With the CSD still being updated and evolving while MacroModel remains static, it is getting harder and harder to find CSD entries that will convert properly with the old MacroModel software. Given below are the instructions for creating the necessary files to do a successful conversion from data contained in the little.ss database. While this process might not work with all the data in CSD, it is handy to know about the process. The refcode is fopfiz. Use this as the name of the job to be run since its output files will be given that name.
% quest -j fopfiz -db $CSDHOME/csd/little.ss
Enter the following commands to retrieve the desired data and put that information into the files that will be ftped to model1.
>save 1 2 3 4
>retrieve 2
>>fopfiz
>>end
Information is given about the entry being retrieved and the creation of three data files. When the Now continue. prompt returns, enter exit to get out of the QUEST program.
>exit
A QUEST run without the -db command switch would do a full-blown Cambridge search for information on a desired molecule. This section was designed as an example of using this type of database. Usually only folks interested in small organic molecules and metallo-organic complexes use this database. However, it can be helpful in locating drug structures or very small peptides (less than 15 residues).
There is a way of extracting files from the CSD and converting them over for use in the MacroModel program. It requires that the necessary .dat and .con files for a structure be ftped over to model1 and a conversion process run on these two files. This will be done later in the exercise. This completes your exploration of the Cambridge Structural Database on ribozyme for today. The rest of this week's exercise will take place on other platforms. Log off of ribozyme.
10) Entering data via a modelling program.
When data doesn't exist elsewhere and/or the structures to be made are those of simple molecules or possible models, the data can be created directly within a modelling program.
This section will show you how to do structural minimizations. The process mimics research steps needed to solve structures on the computer. You will be shown how to enter a structure into the computer from the terminal. In section 10a, there is an automated run-through of the procedure, and in section 10b you will enter the desired structures.
Select model1 by moving the cursor arrow with the mouse over to the MODEL1 icon on the Launcher window and pressing the mouse button twice. Successful connection to model1 is denoted by the appearance of a model1 information line and a Username: prompt.
Welcome to OpenVMS V6.1 Username:
Once the Username: prompt appears, log on to the machine by entering the same account name you are using on ribozyme at the Username: prompt, and then mygenes0 at the Password: prompt.
Your first task on model1 is to set the password on your account. This process on model1 is similar to what you did on ribozyme. An example of this process is given below. Follow it to change your password. User input is shown in bold type. When you have a number of accounts, it is handy to use the same password in all of them.
$ set password
Old password: mygenes0 <rtn>
New password: enter a new password <rtn>
Verification: repeat that password <rtn>
Now that you are on model1 and your account is secure, enter structural data into the computer. To do this, you will go through a number of steps designed to give you insight into how structural data entry works.
section 10a
Enter the following commands to see the structure of acetic acid drawn on the screen in rough form, minimized with the MM2 force field, and then rotated to see the methyl group hydrogens better.
In order to do this automated run, first copy the automated file into your own directory.
$ copy draws *.*
Now get into MacroModel and see the demo of how acetic acid is entered into the computer. More complex molecules are entered in a similar manner. Respond to the questions asked as shown below. User response is shown in bold type.
$ mmv30 draw2.log n
The program will go into an automatic mode in which an acetic acid molecule is drawn onto the screen. Atom types will be changed; the program assumes that carbon is the atom of interest when drawing and so it is necessary to change carbons to the needed other elements such as hydrogen, oxygen, nitrogen, phosphorus and the halogens. Once a structure is entered, it can have manipulations done on it, such as a minimization, to produce the lowest energy form of the molecule from force field parameters. Sometimes atoms line up behind each other and it is necessary to rotate the entire molecule to see them.
After the presentation, the program will exit and return you to your account prompt with a number of beeps. Repeat the above demo several times if you wish to study it before proceeding to the next section.
Before attempting to enter a structure into the computer with MacroModel, there are some facets of the program you should be familiar with.
1) MacroModel is set up with a working display window surrounded on the bottom and right-hand side of the screen by a series of option buttons, and an area for program messages to the user at the top. The option buttons are used to communicate user wishes to the program. To activate a button, move the cursor to the button's location and press the spacebar. Cursor control on your machine is with the mouse. An activated option is colored green. The message area either informs the user of the status of an option with multiple selection possibilities, requests parameter input, or relates error messages.
2) There are two different types of option buttons, those which move the program to major function areas, and those which set the parameters for a given function. The major function buttons are located at the bottom of the screen while parameter buttons are on the right-hand side.
3) The program assumes that all atoms drawn on the screen with the DRAW option are carbons until told otherwise. Atom types can be changed by choosing the desired atom from the list on the right-hand side of the screen, moving to the location in the structure where the change is required, and pressing the spacebar.
4) When entering double bonds with the DRAW option, it is best to go back and choose the DRAW option again, move to the starting point of the double bond, press the spacebar and then move off to the ending point of the bond, pressing the spacebar when the cursor is in the desired position. The DRAW option is always active when the button is green, and interesting bonds can result if you do not reset the option.
5) Structures can be grown on the screen using the GROW option. To use this option, select GROW and then the unit from the listing that is desired. The purple box denotes the site of the next addition to the growing molecule. This can be changed by using the Origin button.
6) When you make a mistake and wish to remove a bond, an atom or the entire structure use the DELT button. This button can toggle three ways. Press the mouse button slowly and deliberately to send the multiple signals. Pressing it once will allow the removal of a bond or an atom. Once this mode of deletion is selected, go over to the location of the atom to be removed, or the center of the bond to be removed with the cursor and press the mouse button. Pressing DELT twice will allow the removal of a molecule from a screen when more than one molecule is shown. Once this mode has been selected, move the cursor over to one of the atoms of the molecule to be removed and press the mouse button. The entire screen can be cleared by pressing the DELT button three times and responding with y to the question Confirm complete deletion (Y/N):.
7) The most common problem encountered in minimization is forgetting to add all of the necessary hydrogens to complete the structure prior to the minimizing process. If a problem occurs during a minimization attempt, check to see that all of the hydrogens are in place. Hydrogens can be added using the H ADD button in the INPUT menu. It is necessary to select this button three times to add hydrogen to all of the molecules on the screen. A message will appear at the top of the screen stating what sort of addition is currently possible.
8) Minimization errors not cured by the addition of hydrogens require the examination of the generated mmod.err file. The information contained in this file gives the atom type, number, and type of problem encountered. To look at this file from within the program, select the TTY option, enter the term syst and then t mmod.err. Viewing this file will give the numbers of the problem atoms. Enter log and press RETURN when the TTY prompt reappears to return to the program. To see how the structure is numbered, select ANALYZ and the NUM option from this menu. It may be necessary to redraw the section of the molecule where the error occurred, or it could be that parameters are missing in the MM2 force field to allow the minimization of the desired structure. If redrawing the suspect section doesn't correct the problem, contact Susan Johns for assistance.
section 10b
Use phenol as the test molecule. Its formula is C6H6O. Use the template and instructions on the next page to help you enter the structure.
1) Activate MacroModel by typing mmv30.
2) Once in the program, respond to the first question by pressing RETURN. Then answer the question about what terminal you are using by entering 7 for a Tektronix with Versa Term Pro.
3) When the menu window comes up, move the cursor to the DRAW button and press the mouse button. Moving the mouse will move the cursor around the terminal screen.
4) Put the cursor inside the window near the top of the screen in the middle, and press the mouse button again. The letter C will appear on the screen. This point will correspond to position 1 of the template.
5) Move the cursor to various points on the screen, pressing the mouse button when a desired location is found until the desired structure appears on the screen. Use the template above as a guide in drawing your model.
6) Now select the O from the side buttons and move to position 2 of the structure, and press the mouse button. An O should appear at the chosen site. Select the H from the side buttons and change the atom in position 1 to a hydrogen.
7) Put in the double bonds. Select the DRAW button. Move over to the 4 position on the structure, press the mouse button, then move down to position 5 and press again. You need to be accurate in the location process, or additional atoms and not a double bond will appear on the screen. A double line should appear between locations 4 and 5 on the screen. Select DRAW again, and this time put a double bond between locations 6 and 7 in the structure. Finally, put a double bond in between locations 8 and 3 in your model.
8) Now move over to the H ADD button and select it to add the required hydrogens to all of the carbons of your molecule by pressing the mouse button three times. These hydrogens will appear as green lines off of the existing carbons of the structure.
9) Minimize the structure: select ENERGY, select MM2, and then select Start when the cursor reappears. Depending on the accuracy of your initial structure, you may be asked to continue the minimization process. If asked, respond with y and keep doing so until the process stops on its own.
10) Write your entered structure to a file. Select WRITE, answer the prompt for the name of file with (your lastname).phenol, and enter a short structure statement. The cursor will reappear when the file has been written to your account.
11) Scale down the image on the screen by selecting Scale and then moving the cursor to a point about half an inch up from the lower left hand corner of the working window. Click the mouse button. A cross appears at that point on the screen. Move to a point about half an inch off the upper right hand corner of the working window. Click the mouse button. A cross appears at that point on the screen.
12) Select ANALYZ, Model, ARad and respond to the Input Sphere Radius: prompt with .85 <rtn>. Select CPK and then Start to have a CPK structure drawn around the stick drawing.
13) After the drawing is complete, go up to the emulator control bar File location. From the pull down menu select the Print Graphics option. This causes a picture of the image on the screen to be produced on the teaching lab's printer. Go over to the printer to get your image. There will be no way of telling to whom the picture belongs, so pick it up as soon as you hear it being printed out.
11) Entering a more complex structure.
Using the picture of the structure of benzyl penicillin given below, enter this structure into the computer. Run 50 minimization iterations on the structure using the MM2 force field, and then save your structure under a filename that reflects your last name and the fact that this file contains penicillin.
1) Return to the data input area of the program by selecting INPUT.
2) Clean off the previous molecule by selecting the DELT button three times and responding with y to the question the program asks. This will clear the screen and you can begin to enter your new structure.
3) Select Draw and enter in the structure shown above. Don't worry about hydrogen atoms at this point.
4) Make the necessary replacements of carbon atoms that should be something else.
5) Use H ADD to put on all the needed hydrogens for a complete structure.
6) Move over to the ENERGY working area and minimize the structure by selecting Start. Notice that once a button is colored green it is active. Since MM2 was active, it was not necessary to re-select it or re-load the force field before starting the minimization run. Only go through 50 iterations. The more complex a molecule is the longer it takes to minimize. If you have any errors in your structure, contact your instructor for assistance.
7) Once the structure has gone through 50 iterations, stop the minimization process by responding with n to the question asking whether the procedure should continue. This question will appear at the top of the screen in an area reserved for program user interactions.
8) Save your data by writing it to a file. Select the WRITE button and give the file a name that reflects the name of the compound located therein such as pen.
9) Select STOP and respond with y to the Confirm Program Stop (Y/N): prompt. Your screen goes white. Select Emulation from the Versa Term Pro menu bar and choose DEC VT220 from the menu presented. The screen changes from white to blue and shows the question Delete the current log file (Y/N):. Respond with y. The y you enter turns into ù and a line of junk characters appears below the prompt. Select Emulation from the Versa Term Pro menu bar again, this time choosing Reset Terminal from the menu options presented. The type on the blue screen is now readable and you can continue with your computing tasks.
12) Minimizing a complex structure to final configuration.
It will take many iterations to get the penicillin structure you entered into its final lowest energy form. When it takes more than a few hundred iterations to reach a final conformation for a structure, it is best to run the process in batch mode rather than sit there and baby-sit the machine.
Copy the file denoted by the logical name batchlesson30 into your own account, giving it a name that reflects its function as the penicillin batch job. In the example command line, xxxxx.com represents the complete filename you have decided to use. This file should have the extension com.
$ copy batchlesson30 xxxxx.com
A copy of this file is given below.
$! pcd xxxxx xxxxx xxxxx xxxxx ffff.ffff ffff.ffff ffff.ffff ffff.ffff
$ set def [xxxxxx]
$ define incloc1 disk2:[public.mmv30.mmv30.inc1]
$ run disk1:[manage.loging]batchm30
$ run incloc1:batchmin.exe
aspirin
aspirin_out
 DEMX 0 20.0000
 FFLD 1
 BGIN
 READ 1
 MINI 2 1 9945
 CONV 2 1
 END
$ exit
You must change three items in this file for it to work properly. First, change the set def [xxxxx] statement to the location of this file and the data to be used. Since you have all your files in your main directory, just put an exclamation point after the dollar sign on that line. Second, change aspirin to the name of the file you want it to work on, i.e., the penicillin file you saved (pen). If that file has a dat extension, the extension need not be repeated on this line. Third, change the aspirin_out term to the name of the output file you want produced (pen_out). This template file is set up to use steepest descent with the MM2 force field until an rms value of less than .01 is reached or a maximum of 9945 iterations have run. Use the editor on model1 to make these changes. Its name is eve and it works very similarly to pico on ribozyme. Documentation on eve is contained in your carrel drawer.
$ eve xxxxx.com
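The three changes described above can also be pictured as a small text transformation. The Python sketch below is only an illustration and is not part of the course software: the function name customize_batch_file is made up, and the line layout of the template (one file name or BATCHMIN command per line) is an assumption about how the com file is arranged.

```python
import re

# Template text as laid out in the lesson; the line breaks are an assumption.
TEMPLATE = """\
$! pcd xxxxx xxxxx xxxxx xxxxx ffff.ffff ffff.ffff ffff.ffff ffff.ffff
$ set def [xxxxxx]
$ define incloc1 disk2:[public.mmv30.mmv30.inc1]
$ run disk1:[manage.loging]batchm30
$ run incloc1:batchmin.exe
aspirin
aspirin_out
 DEMX 0 20.0000
 FFLD 1
 BGIN
 READ 1
 MINI 2 1 9945
 CONV 2 1
 END
$ exit
"""


def customize_batch_file(template, input_name, output_name):
    """Apply the three edits described above to the template text."""
    edited = []
    for line in template.splitlines():
        if line.lower().startswith("$ set def"):
            # Edit 1: turn the directory change into a comment ("$!").
            line = "$!" + line[1:]
        # Edits 2 and 3: swap the placeholder file names. The \b anchors
        # match whole words only, so "aspirin" never matches inside
        # "aspirin_out".
        line = re.sub(r"\baspirin_out\b", output_name, line)
        line = re.sub(r"\baspirin\b", input_name, line)
        edited.append(line)
    return "\n".join(edited) + "\n"


if __name__ == "__main__":
    print(customize_batch_file(TEMPLATE, "pen", "pen_out"), end="")
```

Running the sketch prints the template with the set def line commented out and pen and pen_out in place of the aspirin names, which is the same result you should see in eve after making the edits by hand.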
After changing the com file to reflect your individual information, submit the batch job as follows. xxxxx denotes your MacroModel batch com file.
$ batchs xxxxx
For further information on what the various features of the batch com file mean, see the MacroModel BATCHMIN manual located in the carrel drawer. The batch queue is set up to notify you if you are logged in when the batch job finishes.
13) Working with PDB data
MacroModel can be used for more than just creating small molecular structures. It can also display protein structures. Solved protein structures are stored in the Brookhaven Protein Data Bank (PDB). This is a very old, rigidly formatted database that precedes the advent of modern modelling software packages. As a result, PDB formatted data has become the medium of exchange for protein information. Remember the 2sn3.pdb file you worked with earlier. This file contains the structural information for scorpion neurotoxin [a small protein (65 residues) containing 1 helix, 3 beta sheets and 4 disulfide bridges] from PDB.
To work with this data, you first need to get it from your account on ribozyme. While you are at it, you can pull over the CSD data you retrieved as well. This is done by using FTP (File Transfer Protocol) to move the data. Instructions for using FTP on Model1 are given below.
$ ftp ribozyme.vadms.wsu.edu
When the ribozyme machine prompt appears, enter your account name and password on that computer. You are using the FTP protocol to transfer files from ribozyme to model1. Follow these commands to move the files. Replace the bcsxx of the example with your own account name. User input is shown in bold type.
model1.vadms.wsu.edu MultiNet FTP user process 3.4(111)
Connection opened (Assuming 8-bit connections)
<ribozyme.vadms.wsu.edu FTP server ready.
RIBOZYME.VADMS.WSU.EDU>l bcsxx<rtn>
<Password required for prcadams.
Password:(enter your own password<rtn>)
<User prcadams logged in.
RIBOZYME.VADMS.WSU.EDU>cd week4<rtn>
<CWD command successful.
RIBOZYME.VADMS.WSU.EDU>get 2sn3.pdb<rtn>
To remote file:<rtn>
<Opening ASCII mode data connection for '2sn3.pdb'.
<Transfer complete.
RIBOZYME.VADMS.WSU.EDU>get fopfiz.con<rtn>
To remote file:<rtn>
<Opening ASCII mode data connection for 'fopfiz.con'.
<Transfer complete.
RIBOZYME.VADMS.WSU.EDU>get fopfiz.dat<rtn>
To remote file:<rtn>
<Opening ASCII mode data connection for 'fopfiz.dat'.
<Transfer complete.
RIBOZYME.VADMS.WSU.EDU>quit<rtn>
<Goodbye.
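For readers who want to see the same transfer expressed as a script, here is a minimal Python sketch using the standard ftplib module. It is not part of the course setup on model1. The helper name fetch_text_files is hypothetical, and the host, account and directory in the usage comment are simply the example values from the session above.

```python
from pathlib import Path


def fetch_text_files(ftp, remote_dir, names, dest_dir="."):
    """Copy text files from the server's remote_dir into dest_dir.

    `ftp` is expected to behave like a logged-in ftplib.FTP object.
    """
    ftp.cwd(remote_dir)  # same role as "cd week4" in the session above
    saved = []
    for name in names:
        lines = []
        # retrlines transfers in ASCII mode and hands each line,
        # without its line ending, to the callback.
        ftp.retrlines("RETR " + name, lines.append)
        target = Path(dest_dir) / name
        target.write_text("".join(line + "\n" for line in lines))
        saved.append(target)
    return saved


# Usage sketch (not run here):
#   from ftplib import FTP
#   ftp = FTP("ribozyme.vadms.wsu.edu")
#   ftp.login("bcsxx", "your-password")
#   fetch_text_files(ftp, "week4", ["2sn3.pdb", "fopfiz.con", "fopfiz.dat"])
#   ftp.quit()
```

Passing the connection in as an argument keeps the login step separate from the transfer step, which mirrors the order of the interactive session.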
Now use the program bfiler to convert the 2sn3.pdb file into a data file that is usable in MacroModel. Modelling software can either access PDB files directly or, as in the case of MacroModel, rely on auxiliary software to handle the conversion into its own specific format. Instructions for using bfiler are given below. User input is shown in bold type.
$ bfiler
BFiler (v 0.2)
22-NOV-96 11:32:42
BFiler: SELECT A MENU ITEM FROM BELOW--
HELP=Information
TAPE=Read Brookhaven format files Brookhaven tape and
translate to MMOD format,
COPY=Copy files from Brookhaven tape to disk
without translation
DISK=Translate Brookhaven format files to MMOD files
BARE=Translate Bare Brookhaven atom table (from file(s)
disk) to MMOD format file(s)
EXIT=Exit BFiler
BFiler>disk<rtn>
BFILER-DISK:This routine attempts to translate Brookhaven
format files which are on a disk
BFILER-DISK:Continue?(y)><rtn>
Default suffix is ".BRK"
Type in the names of the files you want to process,
Hit return after each code name and
a bare "." to finish>
2sn3.pdb<rtn>
.
Below is a list of names for files you want to translate --
Options: (1) type in corrected entry;
(2) type "i" to insert an entry,
(3) type "x" to delete entry,
(4) type "." to finish,
(5) hit return to verify entry:
Go back and re-edit the filecodes?(n)><rtn>
Looking for file 2SN3.PDB
Reading 2SN3.PDB
This entry has an unusual atom, residue, or molecule:
HET MPD 66 9 2-METHYL-2,4-PENTANEDIOL 2SN3 79
Reading atomic coordinates...
-- Alternate location indicated
atom number 179 file 2SN3
[ a listing is given of a series of alternate locations ]
Typing atoms...
Creating bond entries...
ISOLATED HYDROGEN, ATOM NO: 93
ISOLATED HYDROGEN, ATOM NO: 729
BFiler: SELECT A MENU ITEM FROM BELOW--
HELP=Information
TAPE=Read Brookhaven format files Brookhaven tape and
translate to MMOD format,
COPY=Copy files from Brookhaven tape to disk
without translation
DISK=Translate Brookhaven format files to MMOD files
BARE=Translate Bare Brookhaven atom table (from file(s)
disk) to MMOD format file(s)
EXIT=Exit BFiler
BFiler>exit <rtn>
There is now a file in your account called 2sn3.bdt. The bfiler program puts the extension bdt on all data files that it converts. You can now use this file in the MacroModel program to display the structure of the neurotoxin protein. Follow the instructions given below to do this task.
1) Activate MacroModel by typing mmv30.
2) Once in the program, respond to the first question by pressing RETURN. Then answer the question about what terminal you are using by entering 7 for a Tektronix with Versa Term Pro.
3) Select READ and respond to the File: prompt with 2sn3.bdt to read in your converted file. Respond to the Structure number: prompt by pressing the RETURN key.
Notice all the small green lines on this structure. Those lines denote hydrogens. Originally, X-ray structure determinations could not locate the hydrogens on a structure, so they were not reported. The latest programs include hydrogens that have either been determined by NMR runs or been added by the software. These extra hydrogens can cause problems with older software, so go through and remove them.
4) Select H DEL and press the mouse button three times slowly. Select Updat to redraw the structure.
There is still too much data being shown to get any idea of the secondary structure for the protein. All those side chains make the backbone atoms hard to find and interpret. Strip down the structure on the screen to only its backbone atoms in the following manner.
5) Select ANALYZ, then SETS followed by MainS. This puts the protein backbone atoms into a working set within the program. To view the backbone, select DISPLA followed by Dis. The screen will clear and the structure is redrawn showing only the backbone atoms.
6) To save this data to be used later, select WRITE, respond to the Save displayed fragment only? (N/Y): with y and use 2sn3.back as the name of the desired output file (the File: prompt). Respond to the Structure Name: prompt as you wish, either enter some short description or press the RETURN key.
7) To save this data in a different format, select the PLOT button on this screen. This action saves the data in a format that can be used directly by an HP plotter, or converted for use with GCG compatible software to produce a PostScript image.
8) Get out of the program. Select STOP and respond with y to the Confirm Program Stop (Y/N): prompt. Your screen goes white. Select Emulation from the Versa Term Pro menu bar and choose DEC VT220 from the menu presented. The screen changes from white to blue and shows the question Delete the current log file (Y/N):. Respond with y. The y you enter turns into ù and a line of junk characters appears below the prompt. Select Emulation from the Versa Term Pro menu bar again, this time choosing Reset Terminal from the menu options presented. The type on the blue screen is now readable and you can continue with your computing tasks.
The plot option output was written to a file called mmod.plt. Since you never know when you might want to use this option again, rename the file to something that reflects its contents. In this case rename it 2sn3.plt. Instructions for doing this are given below.
$ ren mmod.plt 2sn3.plt
When the PDB file was copied over, so were the two Cambridge data files. To convert these files and look at the resulting data with MacroModel, go through the following procedure. Instructions are given below. User input is shown in bold type. This process is an automated one: once the number of structures and the code name(s) have been given, the rest of the process is automatic.
$ cam_to_mmod
Cambridge to MacroModel conversion system
This system will do a maximum of 10 structures per pass.
The procedure is automatic.
use Ctrl-y to abort out of the procedure if needed
please note any error messages for a given data set
enter the number of structures to be converted : 1 <rtn>
use Ctrl-y to abort out of the procedure if needed
enter the code name for structure 1
code : fopfiz <rtn>
STARTING TIME
25-NOV-96 11:47:45
CFILER (Version 1.0) SELECT A MENU ITEM FROM BELOW--
1. HELP=Information
2. KWIC=Consult keyword-in-context file to find 6-character
Cambridge Codes
3. TAPE=Read Cambridge tapes and produce VMS files
4. VMS =Translate data and connectivity files already On
a VMS tape or disk to MacroModel files
5. EXIT=Exit CFILER
CFILER>
Translate files which have already been copied to disk
continue?(y):
Default suffix is "CAM"
Type in the names of the files you want to process:
LIST OF FILES TO TRANSLATE:
FOPFIZ.CAM
Go back and re-edit the filecodes?(n)>
reading FCONN entry for FOPFIZ
FDAT entry contains textual comments:
R=0.0737 Coordinates for H-atoms should be available, but we have been unable to
obtain them from stated sources. dx given as 1.25; we calculate 1.29
CFILER (Version 1.0) SELECT A MENU ITEM FROM BELOW--
1. HELP=Information
2. KWIC=Consult keyword-in-context file to find 6-character
Cambridge Codes
3. TAPE=Read Cambridge tapes and produce VMS files
4. VMS =Translate data and connectivity files already On
a VMS tape or disk to MacroModel files
5. EXIT=Exit CFILER
CFILER>
Directory DISK1:[BCSXX]
FOPFIZ.CDT;1 3
Total of 1 file, 3 blocks.
sometimes an error message data set will not produce
an output file, please check this listing carefully
The fopfiz.cdt is the file that you need to view the CSD structure with MacroModel. If you have any problems with this procedure contact your instructor for assistance. Not all data from CSD will convert into MacroModel viewable files with this process. It takes experimenting to find out which will work and which won't.
15) Viewing the CSD file.
The cfiler program used in the above process puts the extension cdt on all data files that it converts. You can now use this file in the MacroModel program to display the structure of the CSD file. Follow the instructions given below to do this task.
1) Activate MacroModel by typing mmv30.
2) Once in the program, respond to the first question by pressing RETURN. Then answer the question about what terminal you are using by entering 7 for a Tektronix with Versa Term Pro.
3) Select READ and respond to the File: prompt with fopfiz.cdt to read in your converted file. Respond to the Structure number: prompt by pressing the RETURN key.
The image on the screen is that of a compound with narcotic analgetic activity that was crystallized with oxalic acid, oxalate and two water molecules in its unit cell. Rotate the image in the Y direction 35 degrees to get a better view of the compound.
4) Select Rot Y and respond to the Enter rotation angle: prompt with 35. The image is rotated to give a better view of the principal molecule.
5) Select DELT, pressing the mouse button twice to get Molecular deletion. Then move the cursor over to an atom on one of the small non-water structures and press the mouse button. The terminal beeps at you and the structure disappears from the screen. Remove the other small molecule in the same manner. To remove the water representation, you will need to place the cursor on the H of the H2O symbol and press the mouse button. Remove both water representations.
Create the plot file of the data shown on the screen by selecting ANALYZ and then PLOT.
6) To save this data to be used later, select WRITE and use csd as the name of the desired output file (the File: prompt). Respond to the Structure Name: prompt as you wish, either enter some short description or press the RETURN key.
7) Get out of the program. Select STOP and respond with y to the Confirm Program Stop (Y/N): prompt. Your screen goes white. Select Emulation from the Versa Term Pro menu bar and choose DEC VT220 from the menu presented. The screen changes from white to blue and shows the question Delete the current log file (Y/N):. Respond with y. The y you enter turns into ù and a line of junk characters appears below the prompt. Select Emulation from the Versa Term Pro menu bar again, this time choosing Reset Terminal from the menu options presented. The type on the blue screen is now readable and you can continue with your computing tasks.
The plot option output was written to a file called mmod.plt. Since you never know when you might want to use this option again, rename the file to something that reflects its contents. In this case rename it csd.plt. Instructions for doing this are given below. User input is shown in bold type.
$ ren mmod.plt csd.plt
A program called mmodpdb takes a MacroModel data file and converts it into a MacroModel version of a PDB format. In this format the atoms and their x, y, and z coordinates are given. An example file is listed below to give you an idea of what to expect when you run your own data file through the program. User input is shown in bold type.
$ mmodpdb
THIS PROGRAM READS V1.5-2.0 MACROMODEL STRUCTURE FILES
AND PRODUCES FORMATTED PDB STYLE OUTPUT FILES
Enter MacroModel input filename: (your lastname).phenol <rtn>
Enter .PDB output filename: phenol.pdb <rtn>
Charge file (.CHG) not found, charges set to 0.0
my phenol structure
Enter MacroModel input filename: <rtn>
FORTRAN STOP
A file has just been created that contains the x, y, and z coordinates for the structure you entered. The example file given below shows what this type of file looks like and the location of the items of interest. In the file below, UNK denotes the residue name. Since this is not a protein the term UNK is used instead of a real three letter code name for an amino acid. The X following UNK is the chain designation. Again this is a protein term and since there wasn't one in the initial data set the program put in an X. The CONECT lines show which atoms are connected to one another. Most modelling programs ignore these lines.
          atom            x value  y value  z value
ATOM   1  H01  UNK X  0    10.072   11.709    0.000   1.00  0.0000  0
ATOM   2  O02  UNK X  0     9.168   11.353    0.000   1.00  0.0000  0
ATOM   3  C03  UNK X  0     9.187    9.990    0.000   1.00  0.0000  0
ATOM   4  C04  UNK X  0     7.981    9.289    0.000   1.00  0.0000  0
ATOM   5  C05  UNK X  0     7.977    7.895    0.000   1.00  0.0000  0
ATOM   6  C06  UNK X  0     9.183    7.195    0.000   1.00  0.0000  0
ATOM   7  C07  UNK X  0    10.391    7.892    0.000   1.00  0.0000  0
ATOM   8  C08  UNK X  0    10.393    9.286    0.000   1.00  0.0000  0
ATOM   9  H09  UNK X  0     7.025    9.840    0.000   1.00  0.0000  0
ATOM  10  H10  UNK X  0     7.021    7.345    0.000   1.00  0.0000  0
ATOM  11  H11  UNK X  0     9.182    6.092    0.000   1.00  0.0000  0
ATOM  12  H12  UNK X  0    11.346    7.339    0.000   1.00  0.0000  0
ATOM  13  H13  UNK X  0    11.353    9.830    0.000   1.00  0.0000  0
TER   14       UNK
CONECT    1    2
CONECT    2    1    3
CONECT    3    2    4    8    8
CONECT    4    3    5    5    9
CONECT    5    4    4    6   10
CONECT    6    5    7    7   11
CONECT    7    6    6    8   12
CONECT    8    7    3    3   13
CONECT    9    4
CONECT   10    5
CONECT   11    6
CONECT   12    7
CONECT   13    8
END
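To make the layout of this file concrete, the Python sketch below reads a few of the records from the listing above and pulls out the atom names, coordinates and connections. It is an illustration only. The function name parse_pdb_style is made up, and splitting on whitespace is an assumption that suits this example listing; standard PDB files use fixed columns, so a general reader would need to slice by column position instead.

```python
# A few records copied from the example listing above.
SAMPLE = """\
ATOM 1 H01 UNK X 0 10.072 11.709 0.000 1.00 0.0000 0
ATOM 2 O02 UNK X 0 9.168 11.353 0.000 1.00 0.0000 0
ATOM 3 C03 UNK X 0 9.187 9.990 0.000 1.00 0.0000 0
TER 14 UNK
CONECT 1 2
CONECT 2 1 3
CONECT 3 2 4 8 8
END
"""


def parse_pdb_style(text):
    """Return (atoms, bonds) from whitespace-separated ATOM/CONECT lines."""
    atoms = {}  # atom number -> (atom name, x, y, z)
    bonds = {}  # atom number -> list of connected atom numbers
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ATOM":
            serial = int(fields[1])
            name = fields[2]
            # Fields 3-5 hold the residue name, chain and residue number;
            # fields 6-8 hold the x, y and z coordinates.
            x, y, z = (float(v) for v in fields[6:9])
            atoms[serial] = (name, x, y, z)
        elif fields[0] == "CONECT":
            # Connected atom numbers are kept exactly as listed,
            # including any repeats.
            bonds[int(fields[1])] = [int(v) for v in fields[2:]]
    return atoms, bonds


if __name__ == "__main__":
    atoms, bonds = parse_pdb_style(SAMPLE)
    for serial, (name, x, y, z) in atoms.items():
        print(serial, name, x, y, z, bonds.get(serial, []))
```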
Now convert a number of your files into MacroModel versions of PDB files. The files you will use are: 2sn3.back, csd.dat, (your lastname).phenol and pen_out.dat. For the respective output files name 2sn3.back's output file 2sn3-back.pdb, csd.dat's file csd.pdb, (your lastname).phenol's file phenol.pdb and pen_out's file pen_out.pdb.
You will need to check whether your penicillin minimization is finished before you can convert this last file. One way to do this is to see if your batch job is still running on the machine. You can do this with the command line given below.
$ sho que/batch
If your job is still running or hasn't started to run yet it will show up in the information given on the SYS$LONG batch queue for the machine. An example of what this looks like is given below.
Batch queue SYS$LONG, busy, on MODEL1::
Entry Jobname Username Status
----- ------- -------- ------
86 JOHNS BCSXX Executing
If your job is listed as pending or executing on this list, you will have to wait to convert the file until the job has finished running. If your job is not listed there, check to see if you have an output file with data in it. Computers have a nasty habit of creating output files with zero size at the beginning of a batch process, so finding a file with the desired output name doesn't mean that the job is finished.
$ dir/siz pen_out.dat
Directory DISK1:[BCSXX]
PEN_OUT.DAT;1 3
Total of 1 file, 3 blocks.
If your minimization file is there and contains data, convert it along with the rest of the desired files using the example mmodpdb process given at the beginning of this section. If the minimization is not finished, go on to the next section and check again when you are finished with it.
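The zero-size pitfall described above is easy to express as a check. The Python sketch below is a hypothetical helper and not something installed on model1. It treats an output file as usable only when it exists and is not empty, which is the same judgment you make by reading the block count in the dir/siz listing.

```python
import os


def output_ready(path):
    """True only if the file exists and holds at least one byte."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


if __name__ == "__main__":
    # "pen_out.dat" is the output name used in this lesson.
    if output_ready("pen_out.dat"):
        print("pen_out.dat contains data")
    else:
        print("pen_out.dat is missing or still empty")
```

A non-empty file is necessary but not sufficient: the batch queue listing is still the way to confirm that the job itself is no longer running.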
17) Checking on the protein data set
MacroModel has its own version of the PDB format. The major difference is that it doesn't keep the regular PDB atom names in the converted file. For a lot of things this isn't a problem, especially if you are not interested in working with protein data. However, if you want to use this output in another program that expects a real PDB formatted file, you need to find out exactly what that program expects and then edit the MacroModel-produced file to match.
The eve editor on Model1 is a more powerful editor than pico on ribozyme. It allows you to do global replacements of one term for another. The final use of this data file on ribozyme is to produce a postscript image of the data using the program Molscript. Molscript requires a standard PDB format line for CA atoms when working with protein data to produce an image. In your 2sn3-back.pdb file the atom that corresponds to the CA of a regular PDB file is called C02 (C zero 2). It will therefore be necessary to edit your file to get it into the proper format.
Use eve to edit the file using the following instructions.
$ eve 2sn3-back.pdb
Once in the editor, press the F11 or Do key. The program will return with a Command: prompt. Respond to this with the term replace. You will be asked for an Old String:, respond with C02 [that is (C zero 2)]. To the New String: prompt, respond with CA [that is (C A space)]. You are replacing three characters with two, so the trailing space is needed to keep the spacing in the file the same as before. The program will go and find the first instance of the term to be replaced, highlight it and then ask, Replace? Type Yes, No, All, Last, or Quit:, respond with a for all. You will get a statement back from the editor telling you how many replacements it has made. You can use this number to check whether all the residues have been modified. There are 65 residues in the protein, each with only one C02 term, so there should have been 65 replacements. Press Ctrl-z (the control and Z keys together) to write the file and get out of the editor.
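The same global replacement, together with the replacement count used as a check, can be sketched in Python. This is an illustration rather than a substitute for the eve exercise. The function name rename_ca_atoms is hypothetical, and limiting the change to ATOM lines is a design choice of this sketch; eve replaces the string wherever it appears in the file.

```python
def rename_ca_atoms(text, old="C02", new="CA "):
    """Replace `old` with `new` on ATOM lines; return (new_text, count)."""
    if len(old) != len(new):
        # Equal widths keep every later column where it was.
        raise ValueError("old and new must be the same width")
    count = 0
    out = []
    for line in text.splitlines(keepends=True):
        if line.startswith("ATOM"):
            count += line.count(old)
            line = line.replace(old, new)
        out.append(line)
    return "".join(out), count


# Usage sketch (not run here), assuming the file name from this lesson:
#   with open("2sn3-back.pdb") as fh:
#       new_text, n = rename_ca_atoms(fh.read())
#   print(n)  # compare with the number of residues in the protein
```

Padding CA with a trailing space is what keeps the coordinate columns lined up, and the returned count plays the same role as the number of replacements reported by eve.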
18) Moving the data over to ribozyme for more processing.
To finish up this week's activities, you will need to put the converted PDB formatted files and the plt files into your account on ribozyme. This is done by using FTP (File Transfer Protocol) to move the data. Instructions for using FTP on Model1 are given below.
$ ftp ribozyme.vadms.wsu.edu
When the ribozyme machine prompt appears, enter your account name and password on that computer. You are using the FTP protocol to transfer files from model1 to ribozyme. Follow these commands to move the files. Replace the bcsxx of the example with your own account name. User input is shown in bold type.
model1.vadms.wsu.edu MultiNet FTP user process 3.4(111)
Connection opened (Assuming 8-bit connections)
<ribozyme.vadms.wsu.edu FTP server ready.
RIBOZYME.VADMS.WSU.EDU>l bcsxx<rtn>
<Password required for prcadams.
Password:(enter your own password<rtn>)
<User prcadams logged in.
RIBOZYME.VADMS.WSU.EDU>cd week4<rtn>
<CWD command successful.
RIBOZYME.VADMS.WSU.EDU>put 2sn3-back.pdb<rtn>
To remote file:<rtn>
<Opening ASCII mode data connection for '2sn3-back.pdb'.
<Transfer complete.
RIBOZYME.VADMS.WSU.EDU>put csd.pdb<rtn>
To remote file:<rtn>
<Opening ASCII mode data connection for 'csd.pdb'.
<Transfer complete.
RIBOZYME.VADMS.WSU.EDU>put phenol.pdb<rtn>
To remote file:<rtn>
<Opening ASCII mode data connection for '(your lastname)-phenol.pdb'.
<Transfer complete.
RIBOZYME.VADMS.WSU.EDU>put pen_out.pdb<rtn>
To remote file:<rtn>
<Opening ASCII mode data connection for 'pen_out.pdb'.
<Transfer complete.
RIBOZYME.VADMS.WSU.EDU>put csd.plt<rtn>
To remote file:<rtn>
<Opening ASCII mode data connection for 'csd.plt'.
<Transfer complete.
RIBOZYME.VADMS.WSU.EDU>put 2sn3.plt<rtn>
To remote file:<rtn>
<Opening ASCII mode data connection for '2sn3.plt'.
<Transfer complete.
RIBOZYME.VADMS.WSU.EDU>quit<rtn>
<Goodbye.
This completes your work for the week on model1. Log off the machine by entering the following command.
$ log
To completely finish up your work for this week log back into ribozyme. From the Launcher window, select the X RIBOZYME icon and press the mouse button twice. Successful connection to ribozyme is denoted by the appearance of a ribozyme information line and a login: prompt. Once the login: prompt appears, log on to the machine by entering first your account name to the login: prompt, and then your password to the Password: prompt.
Move over to the week4 subdirectory to finish up. There are some image files to create and a report form to complete and send off to the teacher account.
% cd week4
section 19 a - producing the Molscript image files
Produce the Molscript images in the following manner. Molscript uses an input control file to control the creation of the image. This file contains the name of the PDB formatted file to be used and what is to be done to this data set. In the case of the three simple organic files, CPK images will be drawn of their structures. For the protein, an image will be produced displaying the protein's CA carbons to give you an idea of its secondary structure.
% molscript < phenol.in > (your lastname)-phenol.ps
% molscript < pen_out.in > (your lastname)-pen.ps
% molscript < csd.in > (your lastname)-csd.ps
% molscript < 2sn3-back.in > (your lastname)-2sn3.ps
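In these command lines the < and > symbols are shell redirections: Molscript reads its control file from standard input and writes the PostScript image to standard output. The Python sketch below shows the same plumbing as a rough illustration only. The helper name run_with_redirect is made up, and the usage comment assumes molscript is on your path and uses johns as a stand-in for your last name.

```python
import subprocess


def run_with_redirect(command, in_path, out_path):
    """Run `command` with stdin read from in_path and stdout written to out_path."""
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        # check=True raises an error if the program exits with a failure code.
        subprocess.run(command, stdin=fin, stdout=fout, check=True)


# Usage sketch (not run here):
#   for stem, tag in [("phenol", "phenol"), ("pen_out", "pen"),
#                     ("csd", "csd"), ("2sn3-back", "2sn3")]:
#       run_with_redirect(["molscript"], stem + ".in", "johns-" + tag + ".ps")
```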
Print off a copy of each of these files for your records.
% lpr (your lastname)-phenol.ps
% lpr (your lastname)-pen.ps
% lpr (your lastname)-csd.ps
% lpr (your lastname)-2sn3.ps
Send off a copy of each of these files to the teacher account to record your efforts.
% rcp (your lastname)-phenol.ps teacher@ribozyme:receive
% rcp (your lastname)-pen.ps teacher@ribozyme:receive
% rcp (your lastname)-csd.ps teacher@ribozyme:receive
% rcp (your lastname)-2sn3.ps teacher@ribozyme:receive
Check one of these files [(your lastname)-phenol.ps] by using ghostscript (gs) to view the results. Watch carefully as the image is drawn on the screen. Record your observations in the space provided below.
% gs (your lastname)-phenol.ps
cpk drawing observations: _______________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
When the image is completed, move the cursor into the window behind the white
GhostScript window and press the mouse button. The xterm window is now
in front of the other one. Press the RETURN key. The GS>
prompt appears on the screen. Get out of the program by entering
quit.
section 19 b - create mmhp images
Using the two plt files you created, you will be exposed to another type of structural image. MacroModel plt files can be run through the program mmhp to produce either screen images or PostScript files of the structures contained therein. To do this you need to activate the GCG software package, since the mmhp software was created using GCG subroutines.
% gcg
After the GCG software is activated, then go through and set up your session for postscript output with the following commands. You set up the name of the output file during this process. We will create the postscript file for the csd.plt file first.
% postscript
Use Postscript graphics with what device:
LaserWriter
Lzr1200
LN03-ScriptPrinter
LPS20
ColorScript-100
EPSF (single page encapsulated postscript format)
CEPSF (color EPSF)
Please choose one ( * LASERWRITER * ) epsf<rtn>
To what port is your EPSF connected (* /dev/tty15 *) (your lastname)-csd-plt.ps<rtn>
Plotting Configuration set to:
Language: psd
Device: EPSF
Port or Queue: (your lastname)-csd.ps
With the graphics mode set, it is time to run the software to produce the desired output file. You don't need to use the -out= command switch if you don't want to. It just makes the situation more straightforward.
% mmhp -out=(your lastname)-csd-plt.ps
Process set to plot with EPSF attached to csd.ps using the psd graphic interface.
Enter file to be plotted: csd.plt<rtn>
Enter the format wanted B&W = 0, Color = 1 : 1<rtn>
Plotting structure 1
PostScript instructions for a EPSF are now being sent to (your lastname)-csd-plt.ps.
finished with plot
Now go through the process with your second file, 2sn3.plt. This is a protein data file in which only the backbone atoms are displayed.
% postscript
Use Postscript graphics with what device:
LaserWriter
Lzr1200
LN03-ScriptPrinter
LPS20
ColorScript-100
EPSF (single page encapsulated postscript format)
CEPSF (color EPSF)
Please choose one ( * LASERWRITER * ) epsf<rtn>
To what port is your EPSF connected (* csd.ps *) (your lastname)-2sn3-plt.ps<rtn>
Plotting Configuration set to:
Language: psd
Device: EPSF
Port or Queue: (your lastname)-2sn3-plt.ps
Run through the mmhp program with the 2sn3.plt file.
% mmhp -out=(your lastname)-2sn3-plt.ps
Process set to plot with EPSF attached to (your lastname)-2sn3.ps using the psd graphic interface.
Enter file to be plotted: 2sn3.plt<rtn>
Enter the format wanted B&W = 0, Color = 1 : 1<rtn>
Plotting structure 1
PostScript instructions for a EPSF are now being sent to (your lastname)-2sn3.ps.
finished with plot
Print off a copy of each of these files for your records.
% lpr (your lastname)-csd-plt.ps
% lpr (your lastname)-2sn3-plt.ps
Send off a copy of each of these files to the teacher account to record your efforts.
% rcp (your lastname)-csd-plt.ps teacher@ribozyme:receive
% rcp (your lastname)-2sn3-plt.ps teacher@ribozyme:receive
Copy over the report form for this week to have a filename that matches your last name. Use the editor, pico, to fill in the report form. Send the report form over to the teacher account.
% cp week4m.week4m (your lastname).week4m
% pico (your lastname).week4m
% rcp (your lastname).week4m teacher@ribozyme:receive
This concludes your computing session for this week. Log off the computer.
% logout
Now exit the emulator program by selecting the Quit option from the File location on the control bar. You will be returned to the Launcher window screen.
Per J. Kraulis, "MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures", Journal of Applied Crystallography (1991) vol 24, pp 946-950.