Learning about data visualization techniques using Molscript software. Recognizing desired structural data elements and combining them with PDB formatted data files from different sources for use in the software.
Author:
Susan Jean Johns
Presenting an attractive version of a structure can do a great deal to get a point across to an audience. It can be used to draw attention to points that you want to emphasize, or relationships you feel are important, or to provide a basis of reference for discussions.
To generate such a picture requires a great deal of understanding on your part. You must answer questions such as what information is desired, how can it be presented most effectively and what is involved in creating such a picture.
The information shown depends on the data collected and the purpose of the presentation. If it is a homology model of a new protein based on relationships with another protein whose x-ray structure is known, then a number of possible things may be illustrated with such a picture: the differences between the two proteins, points of consensus, or areas of uncertainty in the generated model. A study conducted on a given protein might display information on functional sites in the structure, secondary structure assignments, charge distribution, substrate interactions, etc.
Whatever the ideas you wish to convey, you must be able to create a view of the desired material that displays the idea in the best way. If the information is directly from a PDB file and requires little or no adjustment to the original orientation of the data, use that PDB data file directly in Molscript. If, on the other hand, the original orientation requires a lot of adjustment, then make those adjustments in a modelling package prior to generating a PDB formatted data file to use in Molscript. Your data may have been generated entirely within a modelling package to begin with. If this is the case, you will need to generate a PDB formatted output file to use in the program. For most VADMS users, using a modelling package means working with MacroModel. It is helpful to color code the area(s) of interest prior to trying to find the best view. This allows you to see how these areas potentially interact with one another and locate an orientation that emphasizes them most effectively.
Some ideas are expressed very well within the MacroModel program's capabilities, others are not. Molscript is an effective software package that produces quality graphical output both off the terminal screen and in hard copy. It runs on ribozyme. Any data which requires extensive modifications should be worked on first in MacroModel on model1. The final form of the data should be converted over to a PDB formatted file with mmodpdb and then ftp-ed to ribozyme. The quality of the output makes the process, though challenging, well worth the effort.
Molscript was written to run on a number of different platforms. Our version is available on ribozyme running under the IRIX operating system. It requires a standard formatted PDB file for input, a user-generated file telling it what to do with the structure, and familiarity with its limited documentation. The software processes information sequentially from the data file it is given. Complex images can be generated using a mixture of the possible display forms: space filled, ball and stick, and Jane Richardson protein images.
To understand the process of creating a Molscript image you need to know the best way to generate a view of your molecule, the requirements of the Molscript package and how it works, how to move files back and forth between different computing platforms, and perhaps how to move a final image around on the page. None of these steps are difficult, but combining this many new things all at once may seem overwhelming at first.
The first step is to understand the data well enough to know what you want to show. This means being very familiar with the functional sites of the molecule, its secondary structure or other features. The best way to work with a molecule is to color code those regions of interest, and work with the color coded data on the E&S terminal to derive the best view.
The E&S terminal in the VADMS central lab runs MacroModel. It has the advantage of being a 3D terminal giving a sense of depth to the data it works with. Movement on the screen is controlled by turning knobs, not the re-entering of values to produce new views of data. Pressing the transmit button on the terminal saves the view currently on the screen and all the work that went into creating that view can then be written to a file.
The key to using Molscript is having a view of the structure in a standard PDB format file. The device which produced that file is not important, the PDB file is. It is also vital that the atom coding for the data be in standard PDB code for most of the possible display modes. Therefore, editing the data files into the proper format may be required.
An example flow pattern for creating a Molscript image is given below. In general this process requires getting the desired structure into the proper orientation for the image, creating a PDB formatted file of that desired orientation and generating an instruction file to control Molscript operations. Most of the time you can just start with the default orientation of the PDB data file. Molscript will even allow rotations and translating of the coordinate data in its instruction file. However, when massive adjustments are required it is best to move the data over to model1 and create the desired orientation in MacroModel with the E&S terminal.
The process of working with MacroModel on the E&S to create desired molecular orientations involves actions similar to those given below in steps 1 and 2. Suitable data generated by other software packages or taken directly from PDB would enter the process at step 3 or step 6, depending on the final desired image.
% ftp model1.vadms.wsu.edu
When the model1 machine prompt appears, enter your account name and password on that computer. Follow these commands to move the file. Replace the bcsxx of the example with your own account name. In the example given below xxxxx.pdb represents your chosen PDB file. User input is shown in bold type.
Connected to model1.vadms.wsu.edu
220 model1.vadms.wsu.edu MultiNet FTP Server Process 3.4(14) at Wed 24-Jan-96 11:59AM-PST
Name (model1.vadms.wsu.edu:bcsxx): bcsxx<rtn>
331 user name (bcsxx) ok. Password, please.
Password:(enter your own password<rtn>)
230 User BCSXX logged into DISK1:{BCSXX} at WED 24-JAN-96 12:01PM-PST, job 23b.
Remote system type is VMS.
ftp>type ascii<rtn>
200 Type A ok.
ftp>put xxxx.pdb<rtn>
local xxxx.pdb remote xxxx.pdb
200 Port 10.72 at Host 134.121.43.151 accepted.
150 ASCII Wrote of DISK1:[BCSXX]XXXX.PDB:1 started.
226 Transfer completed. xxxxx (8) bytes transferred.
xxxxx bytes sent in 0.03 seconds (1077.52 Kbytes/s)
ftp>quit<rtn>
221 QUIT command received. Goodbye.
2) The data file is then worked on in MacroModel on the E&S terminal. The operation of MacroModel is very similar on the E&S as on the Tektronix emulating terminals that you are familiar with. Button selection is done with the white button on the mouse. And the mouse must be located on the white mouse pad. In this version of the software all the control buttons are at the bottom of the screen. To save a desired orientation of a molecule press the transmit button at the top of the black keyboard.
With the chosen view written to a file, the data must then be converted into PDB format. This is done by running the data file through the program mmodpdb. An example of running this program is given below. User input is shown in bold type.
$ mmodpdb
THIS PROGRAM READS V1.5-2.0 MACROMODEL STRUCTURE FILES
AND PRODUCES FORMATTED PDB STYLE OUTPUT FILES
Enter MacroModel input filename: phenol.dat
Enter .PDB output filename: phenol.pdb <rtn>
Charge file (.CHG) not found, charges set to 0.0
my phenol structure
Enter MacroModel input filename: <rtn>
FORTRAN STOP
3) Look at the results of this conversion process or the PDB formatted file(s) generated by a program other than MacroModel. The Molscript software only looks at the ATOM and HETATM lines of a PDB file, so all the other lines can be removed from the PDB formatted data file. If you only intend to do simple ball and stick or solid sphere (CPK) images, then no further changes need to be made to these PDB formatted data file(s). Any other form of representation requires that at least the CA atoms be corrected to their proper PDB atom names. Mixed representations combining ball and stick sections with secondary structures require careful editing, as does the display of DNA structures.
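The stripping step described above can be done in any editor, but it is also easy to script. The following is a minimal sketch, not part of the course materials: the function name and the sample records are made up for illustration, and it simply keeps every line that starts with ATOM or HETATM.

```python
def keep_coordinate_lines(lines):
    """Return only the ATOM and HETATM records from an iterable of PDB lines."""
    return [line for line in lines if line.startswith(("ATOM", "HETATM"))]


# Made-up sample records, for illustration only.
sample = [
    "HEADER    EXAMPLE STRUCTURE",
    "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79",
    "HETATM  328  O   HOH A 101      10.000  10.000  10.000  1.00 20.00",
    "END",
]
kept = keep_coordinate_lines(sample)
for line in kept:
    print(line)
```

Writing the kept lines to a new file, rather than overwriting the original, leaves the full PDB file available for later checks of its other records.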
In the mmodpdb conversion process the atom names are written out as MacroModel compatible names, not in standard PDB protein atom names. Therefore you must change the second atom in the listing for each residue to CA instead of its current name. The conversion process is not consistent and not all the atoms in the second position of a residue stack will have the same atom name. This is particularly true of multiple chain protein data sets. Take care when correcting this type of PDB formatted data file.
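As a starting point for this correction, the renaming can be sketched in a few lines of code. This is only an illustration under stated assumptions: it assumes the file uses the standard fixed PDB columns (atom name in columns 13-16, chain in column 22, residue number in columns 23-26), and it blindly renames the second ATOM line of every residue. The function names and the sample atom names are hypothetical. Because the conversion is not consistent, the output still has to be checked by hand.

```python
def make_atom(serial, name, resname, chain, resseq):
    """Build a fixed-column ATOM record with zeroed coordinates (test data only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def rename_second_atom_to_ca(lines):
    """Rename the second ATOM line of each residue to ' CA '."""
    out = []
    current_residue = None
    position = 0
    for line in lines:
        if not line.startswith("ATOM"):
            out.append(line)  # leave HETATM and any other records untouched
            continue
        residue = (line[21], line[22:26])  # chain id and residue number
        if residue != current_residue:
            current_residue = residue
            position = 0
        position += 1
        if position == 2:
            # Overwrite only the four atom-name columns.
            line = line[:12] + " CA " + line[16:]
        out.append(line)
    return out


# Hypothetical MacroModel-style atom names for two residues.
sample = [
    make_atom(1, " N01", "GLY", "A", 1),
    make_atom(2, " C02", "GLY", "A", 1),
    make_atom(3, " C03", "GLY", "A", 1),
    make_atom(4, " N04", "ALA", "A", 2),
    make_atom(5, " C2 ", "ALA", "A", 2),
]
fixed = rename_second_atom_to_ca(sample)
for line in fixed:
    print(line)
```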
4) Work out the type of image you wish to produce and how to go about it. The information you wish to show may be a number of different treatments of the same data file. If the structure is composed of a number of chains, record their designations and use them in the instruction file. Record the names of any included non protein structures as well.
5) The Molscript system requires that an instruction file be created to provide the information for handling the PDB formatted data file. The creation of this file is best handled with the editor you feel most comfortable with, either eve on model1 or pico on ribozyme. A number of example instruction files have been given the logical names, mol_ins1, mol_ins2, mol_ins3 and mol_ins4 on model1 and called mol_ins1.in, mol_ins2.in, mol_ins3.in and mol_ins4.in on ribozyme. On ribozyme these example files are located in the $GRAD_DIR/week10m location.
Mol_ins1 is a simple Molscript instruction file for crambin. It contains just the instructions necessary to create a Jane Richardson protein image of the data and color the helixes red, the sheets green and the turn blue. It uses rgb values to make the color selection. There is no labeling attempted in this instruction file.
Mol_ins2 is a more complex instruction file for the human HIV protease molecule in which a Jane Richardson protein image surrounds a CPK model of the molecule's substrate. There is no coloring of the information. The CPK version of the substrate is in the atom default colors of the program. Again there is no labeling attempted in this file. It does show the treatment of different chains in a protein, however.
Mol_ins3 is the instruction file for a series of 6 color Jane Richardson protein images all on the same page. It shows how to divide up a single page to contain multiple images.
Mol_ins4 is the instruction file example of two images on the same page with labels.
6) Create an instruction file to produce the desired image of your data. Use the example instruction files as guides for creating your own files. The basic idea behind this file is to select the desired PDB formatted data file to use for the picture, move the coordinates around on the page to the proper position and then produce the desired image. Such an instruction file should use an exclamation point to start comment lines, have a plot line at the start of the file and an end_plot line at its end. All lines between the plot and end_plot lines should end with a semi-colon. In the simple example of an instruction file given below a crambin PDB formatted file is read into the program, the data set is centered on the page, and a coil representation of the protein is created.
! this is an attempt at using molscript
plot
read mol "1crn.pdb";
transform atom * by centre position atom *;
coil from 1 to 46;
end_plot
Adding the following lines to the example file would add a label to the image. The label would be centered below the structure. Depending on the size of the structure, the values currently set to -15.0 and 24.0 might have to be adjusted to produce an ideal image.
set depthcue 0.0, labelsize 24.0;
label 0.0 -15.0 0.0 "Crambin";
7) With PDB formatted data files and instruction files in hand, you can start to use the Molscript program. Molscript only exists on ribozyme, so if you have generated data on model1 for your image, it will have to be ftp-ed over to that platform.
Each type of image has its own challenges. Simple typing mistakes will prevent the program from generating an image. Watch the progress of the program as it runs. This will give you clues about problem spots in your instruction file.
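Since simple typing mistakes are the usual cause of a failed run, a quick check of the instruction file before running Molscript can save time. The sketch below is not a Molscript tool; it only applies the layout rules stated in this handout (comment lines start with an exclamation point, the file starts with plot and ends with end_plot, and every line in between ends with a semi-colon). The function name is hypothetical.

```python
def check_instruction_file(text):
    """Return a list of problems found using the handout's simple layout rules."""
    problems = []
    # Drop blank lines and comment lines (comments start with "!").
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("!")]
    if not lines or lines[0] != "plot":
        problems.append("first non-comment line should be 'plot'")
    if not lines or lines[-1] != "end_plot":
        problems.append("last line should be 'end_plot'")
    for ln in lines:
        if ln in ("plot", "end_plot"):
            continue
        if not ln.endswith(";"):
            problems.append(f"missing semi-colon: {ln}")
    return problems


good = """! this is an attempt at using molscript
plot
read mol "1crn.pdb";
transform atom * by centre position atom *;
coil from 1 to 46;
end_plot
"""
bad = good.replace("coil from 1 to 46;", "coil from 1 to 46")

print(check_instruction_file(good))
print(check_instruction_file(bad))
```

A clean result only means the layout rules are met. Misspelled keywords or wrong residue ranges will still only show up in the Molscript screen trace.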
Molscript uses standard UNIX input and output symbols to direct the flow of its operation. The < symbol means that the filename that follows is to be used as the input file. The > symbol denotes the desired name of the output file. A typical command line for using Molscript would be as follows. User input is shown in bold type.
% molscript < crn.in > crn.ps
The following screen trace was produced by that command line. It indicates that a 46-residue structure was read in composed of 327 atoms. All these atoms were used to center the structure. The resulting protein was represented by a coil. The program ended normally.
MolScript v1.4 (C) 1993 Per Kraulis
email: per.kraulis@sto.pharmacia.se
ref: P J Kraulis, J Appl Cryst (1991) 24, 946-950.
reading PDB file...
46 residues and 327 atoms read into molecule mol
327 atoms selected for position
327 atoms selected for transform
matrix applied:
1.000000 0.000000 0.000000
0.000000 1.000000 0.000000
0.000000 0.000000 1.000000
translation vector applied:
-9.268826 -9.787284 -6.967090
coil from 1 to 46
setting window to 29.00
setting slab to 24.04
writing graphical segments to PostScript output...
0 lines, 1 spheres, 270 planes, 0 sticks and 1 labels written
Carefully checking this screen trace can point out problems. If you were attempting to create a ball and stick representation and got a screen trace that said that zero atoms were selected for ball-and-stick, then you would know that it didn't work.
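If the screen trace is saved to a file, the final summary line can also be checked automatically. The following sketch assumes the summary line has exactly the wording shown in the v1.4 trace in this handout; other versions of the program may word it differently. The function name is hypothetical.

```python
import re


def parse_summary(trace):
    """Return the counts from the final 'written' line of a Molscript trace."""
    match = re.search(
        r"(\d+) lines, (\d+) spheres, (\d+) planes, "
        r"(\d+) sticks and (\d+) labels written",
        trace,
    )
    if match is None:
        return None  # the run probably stopped before writing any output
    keys = ("lines", "spheres", "planes", "sticks", "labels")
    return dict(zip(keys, map(int, match.groups())))


trace = (
    "writing graphical segments to PostScript output...\n"
    "0 lines, 1 spheres, 270 planes, 0 sticks and 1 labels written\n"
)
summary = parse_summary(trace)
print(summary)
```

A ball and stick run that reports zero spheres, or a coil run that reports zero planes, is a sign that the selection in the instruction file matched nothing.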
8) When you feel that everything you want has been generated in the image, either create a hard copy of the results on the printer or view it on the terminal by using a xterm session and the Ghostscript (gs) program.
Without an image you are working blind. The only way you know that everything has worked is to examine your generated output carefully. There are times when the image is not positioned correctly or is too small. You can correct these problems. The image can be moved on the screen with translation and rotation commands. Colors can be changed, titles or labels added as well as the molecule centered according to your own tastes. Use the Molscript documentation or contact Susan Johns for assistance.
Image location problems
The portion of the instruction file that controls the location of the image is at the beginning of the file in the transform section. The example files only use the transform function centre. That is really how it is spelled (standard British English). Data can also be transformed using translation and rotation functions.
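To make the idea of centering concrete, the sketch below shifts a set of coordinates so that their average position lands on the origin. It assumes that the centre is the plain, unweighted average of the selected atom positions. The handout does not define this, so treat it as an illustration of the idea and not as a description of Molscript's internals. The function name is hypothetical.

```python
def centre(coords):
    """Shift (x, y, z) tuples so that their average position is the origin."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    shifted = [tuple(c[i] - centroid[i] for i in range(3)) for c in coords]
    return centroid, shifted


centroid, shifted = centre([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)])
print(centroid)
print(shifted)
```

Under this assumption the shift applied to every atom is the negative of the centre, which is the kind of value reported on the "translation vector applied" line of a screen trace.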
Image size problems
Controlling the size of an image is usually not a problem with this software. It normally automatically scales things for you. Problems could arise when attempting to do multiple data sets on one page. This is either controlled through modifying the initial data set or controlling the size of the area that the image is to appear in using the area function. See the mol_ins3 example instruction file for one solution to this problem.
Coloration problems
Coloring an image to produce the most dramatic picture takes some effort. Colors that look great on the display screen do not always show up well on the gray scale hard copy versions or in the flat colors of the PostScript printers. The required instruction file can use three different methods of assigning colors. The program knows about a limited number of colors: red, blue, cyan, green, yellow, white, purple and grey. Other colors can be obtained by using either an rgb scale or an E&S color scheme. Consult the various instruction files for examples on doing this. There are no simple generalizations that will work to solve these problems.
Labeling problems
At times you might want to add things to your file that weren't there in the original data. An example of such an add-on would be a colored dot to show the nature of a molecular coloring scheme. These types of changes are done by actually editing the generated postscript file instead of making changes in the instruction file.
Week 10 Exercise
This exercise will acquaint you with using the Molscript software for various data visualization tasks. You will learn how to work with regular PDB files and MacroModel PDB formatted data files to generate images. A wide variety of data sets will be used to give you a broad spectrum of experience.
1) Log into ribozyme.
From the Launcher window, select the X RIBOZYME icon and press the mouse button twice. Successful connection to ribozyme is shown by the appearance of a ribozyme information line and a login: prompt.
2) Create a subdirectory to keep this week's work in.
Create the following subdirectory in your account to store this week's computing activities.
% mkdir week10
Now move over into this subdirectory location.
% cd week10
Copy over the data files needed for this week's activities.
% cp $GRAD_DIR/week10m/*.* .
3) Generate a simple image.
Simple organic molecules are easy to work with and provide good practice for learning about the fundamentals of the Molscript program. Usually such molecules are the result of a minimization process with a piece of modelling software such as MacroModel. As explained in the beginning of this exercise, the mmodpdb program converts a MacroModel data file into a PDB formatted file, but doesn't use the proper atom names for proteins. These files work well for simple organic molecules, however, where there are no real atom name requirements.
One such simple organic data file is the one for phenol that you copied over in section 2. This file has been stripped down to just its ATOM lines since those are the only lines in a PDB data file that the Molscript program is interested in. Look closely at the copy of this file shown below. It can help you understand the nature of PDB files and the way their data is organized.
Coordinate data lines start with either ATOM or HETATM. The next item in the line is the number of the atom in the structure. This is followed by the atom name. In the example line given below the atom name is H01. Normally the next item in the line is the residue name of the amino acid that the atom is in. Since this is not a protein, the mmodpdb program used the term UNK for the residue name. Next would come the chain designation. In this case X was used. A residue number would normally follow; the example shows a 0. The x, y and z coordinate values follow.
ATOM 1 H01 UNK X 0 8.728 10.464 0.000 1.00 0.0000 0
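The field order just described can be sketched as a small parser. This is an illustration only: it splits the line on white space, which works for the example line as printed here. Real PDB files use fixed column positions, where fields can run together or the chain designation can be blank, so a white space split is not a general solution. The function and key names are hypothetical.

```python
def parse_coordinate_line(line):
    """Split a white-space separated ATOM/HETATM line into named fields."""
    fields = line.split()
    return {
        "record": fields[0],
        "serial": int(fields[1]),
        "atom_name": fields[2],
        "residue_name": fields[3],
        "chain": fields[4],
        "residue_number": int(fields[5]),
        "x": float(fields[6]),
        "y": float(fields[7]),
        "z": float(fields[8]),
    }


atom = parse_coordinate_line(
    "ATOM 1 H01 UNK X 0 8.728 10.464 0.000 1.00 0.0000 0"
)
print(atom["atom_name"], atom["residue_name"], atom["x"], atom["y"], atom["z"])
```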
Look closely at the following file. Record generic atom names for the hydrogen, carbon and oxygen atoms shown in the space provided.
ATOM 1 H01 UNK X 0 8.728 10.464 0.000 1.00 0.0000 0
ATOM 2 O02 UNK X 0 7.864 10.021 0.000 1.00 0.0000 0
ATOM 3 C03 UNK X 0 8.017 8.666 0.000 1.00 0.0000 0
ATOM 4 C04 UNK X 0 6.886 7.849 0.000 1.00 0.0000 0
ATOM 5 C05 UNK X 0 7.021 6.462 0.000 1.00 0.0000 0
ATOM 6 C06 UNK X 0 8.290 5.885 0.000 1.00 0.0000 0
ATOM 7 C07 UNK X 0 9.423 6.697 0.000 1.00 0.0000 0
ATOM 8 C08 UNK X 0 9.287 8.085 0.000 1.00 0.0000 0
ATOM 9 H09 UNK X 0 5.881 8.303 0.000 1.00 0.0000 0
ATOM 10 H10 UNK X 0 6.123 5.820 0.000 1.00 0.0000 0
ATOM 11 H11 UNK X 0 8.398 4.786 0.000 1.00 0.0000 0
ATOM 12 H12 UNK X 0 10.428 6.242 0.000 1.00 0.0000 0
ATOM 13 H13 UNK X 0 10.189 8.721 0.000 1.00 0.0000 0
generic atom names: __________________________________________________
Create the following instruction file for phenol with pico, calling it phenol.in. This file will read in the PDB data file, center the data and then draw a ball and stick image of the molecule. The sixth and seventh lines of the file are used to create a label for the image.
! doing the basics with a small organic file
plot
read mol "phenol.try-pdb";
transform atom * by centre position atom *;
ball-and-stick in type UNK;
set depthcue 0.0, labelsize 20.0;
label 0.0 -5.0 0.0 "phenol";
end_plot
The fifth line of this file could also have been as given below since there is only one molecule in the data set.
ball-and-stick in type *;
Ball and stick drawings use the default atom colors in the program to color their generated images unless you tell the program to do otherwise. So do CPK drawings. The default colors are: carbon - black, hydrogen - white, oxygen - red and nitrogen - blue. The default color for the planes of the image is white, so the bonds connecting the various atoms will be white.
After you have created the instruction file, run the Molscript program on it using the following command.
% molscript < phenol.in > phenol.ps
Watch the screen trace as the program runs. If the following two lines don't appear at the end of the run, get help from your instructor. Problems in such a simple file usually result from typing errors or forgetting the semi-colon.
writing graphical segments to PostScript output...
0 lines, 13 spheres, 13 planes, 0 sticks and 1 labels written
If the run goes well, then use ghostscript (gs) to view your results.
% gs phenol.ps
Copy your successful instruction file to phenol-cpk.in. Use pico to change the ball-and-stick line into the following one. Change the label to reflect that it is now a CPK image.
cpk in type *;
Run your new file through Molscript. Watch the progress of the run and then use ghostscript (gs) to view your results. For problems, contact your instructor.
% molscript < phenol-cpk.in > phenol-cpk.ps
% gs phenol-cpk.ps
With two simple images completed, attempt coloring the image in something other than default atom colors. A number of colors come pre-defined in the program. These are black, white, red, blue, yellow, green, cyan, and purple. Colors can be defined globally, by residue type or atom name.
In this attempt at using color, do it by atom name using variations of the line given below. Copy your phenol-cpk.in file to phenol-cpk2.in. Use pico to add the necessary three lines to color each of the three different atoms found in phenol a different color. Change the label to reflect these colors.
set atomcolour atom X* cyan;
Run your revised file through Molscript. Watch the progress of the run and use ghostscript (gs) to view your results. For problems, contact your instructor.
% molscript < phenol-cpk2.in > phenol-cpk2.ps
% gs phenol-cpk2.ps
Rename the result file when you are satisfied with it to (your lastname)-pcpk2.ps.
4) Working with a slightly bigger organic molecule.
The phenol data you have been using was a conversion product. Now use an actual small organic molecule from PDB. The file to use is 9lyz.full. Look at this file with the more command. Record below the various atom names, chain designations and residue types you find.
% more 9lyz.full
atom names: _____________________________________________________________
chain designations: _____________________________________________________
residue types: __________________________________________________________
In a real PDB file, non protein coordinate data lines start with the
term HETATM. Copy your successful phenol.in file to
9lyz.in and use this as the basis of your editing efforts to get an
image of the NAM-NAG-NAM substrate.
Start by changing the name of the data file. Then change the label for the image, something like 9lyz colored. Choose a color for the NAM atoms and another for the NAG. Since this is a ball and stick model you need to set the color for the bonds between the atoms. This is done with a planecolour line like the one given below.
set planecolour cyan;
After the planecolour line, set atomcolours for the various generic atom names included in the NAM molecule. Follow this with a ball-and-stick line. With NAM complete, put in a planecolour line for NAG and its required set atomcolour lines. Finish it off with a ball-and-stick line. You might want to drop the label to a -10.0 value instead of using the -5.0 that worked in the phenol image.
Run your instruction file through Molscript. If no problems show up, use ghostscript (gs) to view your results.
% molscript < 9lyz.in > 9lyz.ps
% gs 9lyz.ps
Looks good, doesn't it? With this image in hand, attempt a data rotation to see what that looks like. Copy your 9lyz.in file to 9lyz2.in and then copy 9lyz.full to 9lyz2.full. Edit 9lyz2.in to reflect the new name of the data file and the new output. To rotate a data set, enter a second command in the transform area of the instruction file. Rotate the structure -60.0 degrees around the y axis. An appropriate line is given below.
transform atom * by rotation y -60.0;
Run your revised instruction file through Molscript. If no problems show up, use ghostscript (gs) to view your results.
% molscript < 9lyz2.in > 9lyz2.ps
% gs 9lyz2.ps
Suddenly things don't look so good. Our structure should be connected. The above approach colored a residue and then created a ball and stick image of that residue. Doing this in separate steps creates what appear to be three totally unrelated molecules. Fear not, with some clever editing this too can be corrected.
This time the changes need to occur in the PDB file. The atoms that actually make the connections between the three parts are atoms 1-27-20 for the first bridge and 20-42-37 for the second one. Make copies of these lines and then move them to the bottom of the coordinate data. Change the names of the atoms from NAM to either CX or OX depending on what they are and the names of the atoms from NAG to CY or OY. Change the residue name for these 6 atoms to NAX.
Back in the instruction file add some more set atomcolour lines. This time set the color in the X atom names to cyan and the Y atom names to green. These changes should be followed by a ball-and-stick call for the NAX residue.
Run your changed instruction file through Molscript. If no problems show up, try viewing your results with ghostscript (gs).
% molscript < 9lyz2.in > 9lyz2.ps
% gs 9lyz2.ps
Actually you have to be very sharp to catch the problem that still remains. It doesn't look too bad. However, the color of the bond between atoms 37 and 42 is wrong. It is green when it should be cyan. This too can be fixed.
Go back into the 9lyz2.full file. Make copies of the 37 and 42 atom lines at the very end of the coordinate data. You will need to have 37 before 42. Change the residue type for these two new lines from NAX to NAY and change their residue number from 3 to 4.
Add to the instruction file four lines at the bottom of the file before the label section. First set planecolour to cyan, second, have lines to set the atomcolour in CX and OX to cyan, and then a ball and stick call for the NAY residue.
Try another run through Molscript. If no problems show up, use ghostscript to view your results.
% molscript < 9lyz2.in > 9lyz2.ps
% gs 9lyz2.ps
This time all should be right with the image. Knowing how a piece of software works and effectively editing data can make the most awkward data sets behave properly. Rename the resulting file when you are satisfied with it to (your lastname)-lys2.ps.
5) Working with a small protein file.
Proteins can represent interesting problems when using Molscript. At times a user is not aware that what is on the screen is actually two or more chains connected by a disulfide bridge or just a number of intertwined chains. In the case of PDB files the chains are always denoted with a chain designator. The best way to determine if there are chains in a PDB file is to grep the file for its SEQRES lines. For this study we will be using the 2mlt.pdb file.
% grep SEQRES 2mlt.pdb
This result shows that there are two chains in the structure. Each appears to be 27 residues long, but the NH2 given as the last residue name is not a standard residue. It is best to check the actual number of alpha carbons in the data set. Use the following command.
% grep " CA " 2mlt.pdb
There are only 26 alpha carbons listed for each chain. Since the program actually uses alpha carbon lines to generate secondary structural elements, their actual number is the key to having a successful run. With some protein data sets, not all of the sequence has coordinate data, so checking the actual number of alpha carbons contained in the data is important.
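The two grep checks above can be combined in a short script that compares the chain lengths declared in the SEQRES lines with the number of alpha carbon lines actually present. This is a sketch under stated assumptions: SEQRES lines are split on white space (so the chain designation must not be blank), ATOM lines are assumed to use the standard fixed PDB columns, and the sample records are made up for illustration and are not taken from 2mlt.pdb. All function names are hypothetical.

```python
from collections import Counter


def make_atom(serial, name, resname, chain, resseq):
    """Build a fixed-column ATOM record with zeroed coordinates (test data only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def declared_lengths(lines):
    """Map chain designation to the residue count declared on its SEQRES lines."""
    lengths = {}
    for line in lines:
        if line.startswith("SEQRES"):
            fields = line.split()
            lengths[fields[2]] = int(fields[3])
    return lengths


def count_alpha_carbons(lines):
    """Count ATOM lines whose atom-name columns hold ' CA ', per chain."""
    counts = Counter()
    for line in lines:
        if line.startswith("ATOM") and line[12:16] == " CA ":
            counts[line[21]] += 1
    return counts


# Made-up sample: each chain declares 3 residues, the last one being NH2.
sample_pdb = [
    "SEQRES   1 A    3  GLY ILE NH2",
    "SEQRES   1 B    3  GLY ILE NH2",
    make_atom(1, " N  ", "GLY", "A", 1),
    make_atom(2, " CA ", "GLY", "A", 1),
    make_atom(3, " CA ", "ILE", "A", 2),
    make_atom(4, " CA ", "GLY", "B", 1),
    make_atom(5, " CA ", "ILE", "B", 2),
]
declared = declared_lengths(sample_pdb)
found = count_alpha_carbons(sample_pdb)
for chain in sorted(declared):
    print(chain, "declared:", declared[chain], "alpha carbons:", found[chain])
```

In this made-up sample, as with the melittin file, the declared length is one more than the number of alpha carbons because the terminal NH2 has no CA line.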
Use grep again to get the secondary structure assignments. Record these assignments below.
% grep HELIX 2mlt.pdb
% grep SHEET 2mlt.pdb
% grep TURN 2mlt.pdb
helix locations: ________________________________________________________
sheet locations: ________________________________________________________
turn locations: _________________________________________________________
Copy your phenol.in file to the name mel.in and use that file for your starting instruction file in the melittin visualization process. Change the name of the PDB file and the label for the image to melittin. Replace the ball and stick line with two lines to produce a coil image of the two chains. An example of this line is given below.
coil from A1 to A26;
Use your instruction file in Molscript to get an image. Watch to see if there are any problems with your instructions. If not, view the results with ghostscript.
This output is the bare bones of an image that the available data can provide. Re-edit the file to add in the secondary structure information that you collected earlier. Remember to have the coil segments connect all the secondary elements.
Now look once more at the actual PDB file. Does it contain any other data in it besides the protein information? Use grep to look for HETATM lines. Record your findings below.
% grep HETATM 2mlt.pdb
HETATM results: _________________________________________________________
Revise your melittin instructions file to contain this new information. Create ball and stick drawings of the new data.
Re-submit your instruction file to Molscript to get an image. Watch to see if there are any problems. If not, use ghostscript to view the results.
By controlling the order in which you create plane colors, you can selectively color graphical elements that are composed of planes, i.e., the bonds in ball and stick drawings, and secondary structural elements (coils, turns, sheets and helices). Using this fact, rearrange the order of your existing melittin instruction file to have the following things happen:
1) The coil sections are colored white.
2) The ball and stick drawings appear in default atom colors with white bonds connecting the atoms.
3) Color each of the helices a different color using the pre-defined colors listed in section 3 and planecolour calls.
Pass your modified melittin instruction file through Molscript and generate an output file called (your lastname)-mel2.ps. Check to see that the process proceeded normally and then view your results. Stop working on this file when the image matches the description of what is expected.
6) Working with a MacroModel protein file.
As noted before, the mmodpdb program converts protein data into a file that works but contains nonstandard atom names. For Molscript to generate a secondary structure, a file must contain CA lines. Use the more command on the 2mlt-mmod.pdb file and see for yourself the nature of the problem.
% more 2mlt-mmod.pdb
Study this file carefully and devise a strategy for converting the data contained therein into a usable PDB file for Molscript. Record below the steps you would take to make this conversion.
conversion strategy: _______________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
7) Making a complex protein image.
Sometimes an image needs to be very complex in order to get the desired point across. For this study you will be using an insulin data set, ins-1.pdb. This file was generated by mmodpdb, but is already modified to contain the necessary CA lines to allow secondary structural element display.
Color code the various chains of the data set, show the determined secondary structural elements, do a CPK drawing of the included irons in the structure and show the inter-chain disulfide bridges in the structure. To do this will require getting additional information from both the PDB data file and its associated NRL_3D data files.
Use grep on the PDB data file to determine the number of chains, their starting and ending amino acids and their respective lengths.
% grep " CA " ins-1.pdb
grep results: ______________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
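Reading chain boundaries off a long grep listing by eye is error prone. As a cross-check, the bookkeeping can be sketched in Python as below. This is an illustration only: it assumes the standard PDB column layout (atom name in columns 13-16, chain identifier in column 22, residue number in columns 23-26), which a file produced by mmodpdb may not follow exactly, and the name chain_summary is hypothetical.

```python
def chain_summary(pdb_lines):
    """Summarize chains from the CA lines of a PDB file.

    Returns {chain_id: (first_residue, last_residue, ca_count)}.
    Assumes standard PDB columns: atom name in 13-16, chain
    identifier in 22, residue number in 23-26.
    """
    chains = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resseq = int(line[22:26])
            # Keep the first residue seen, update the last and the count.
            first, _, count = chains.get(chain, (resseq, resseq, 0))
            chains[chain] = (first, resseq, count + 1)
    return chains
```

The CA count equals the chain length only if every residue has exactly one CA line, so compare the result with your grep listing before relying on it.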
Activate the GCG software suite and then use the typedata command to
look at the contents of the NRL_3D data files for these same chains. The names
of these files are 2insx, where x is the name of the chain. Items
of interest are secondary structure assignments (ignore turns), disulfide bond
locations and in particular any interchain disulfide bonds. Record the
information you find below.
% gcg
% typedata nrl_3d:2insx
typedata information: ______________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
Copy the melittin instruction file to serve as the basis of this
visualization process. Edit it down to change the name of the data set to
be used and its title. Do the following steps. First, generate coil
structures for the protein chains in the structure and give each of them a
different color. Second, add the secondary structure assignments to the
chains. Run your evolving instruction file through Molscript to
make sure that it is behaving properly.
There are two iron atoms in this structure. Use grep to confirm this. Notice that the current residue name associated with them is PHE. If you were to try to do a CPK drawing of them there would be a lot of undesired PHE CPK structures in the image. Go into the PDB file and change the residue name associated with the iron lines to XXX. Then revise the instruction file to set an atom color for the iron and call for CPK in type XXX.
% grep FE ins-1.pdb
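The residue name change for the iron lines can be made by hand in an editor. If you would rather script it, the following Python sketch shows one way. It assumes the standard PDB column layout (atom name in columns 13-16, residue name in columns 18-20) and that the iron atom names begin with FE. The function name rename_iron_residue is invented for this example.

```python
def rename_iron_residue(pdb_lines, new_name="XXX"):
    """Give every iron atom line a new residue name.

    Assumes standard PDB columns: atom name in 13-16 and residue
    name in 18-20. An atom counts as iron if its name starts with FE.
    """
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[12:16].strip().upper().startswith("FE"):
            # Overwrite only the three-character residue name field.
            line = line[:17] + new_name.ljust(3)[:3] + line[20:]
        out.append(line)
    return out
```

All other lines, including the real PHE residues, pass through unchanged, so a later CPK call on type XXX should pick up only the irons.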
Not all the CYS residues in the data set are involved with interchain bonds. Go into the PDB data file and change the residue name of those that aren't to CYX. The next step is more complex. The final image is to have the alpha carbon and the side chain atoms of the interchain cysteines displayed in a distinct color from the rest of the structure so you can see them. To do this requires customizing the CYS residues of the data file.
Edit the involved CYS residues. Remove the N line for the residue. Delete the C and O lines that follow the normal CA line from the residue. Make a copy of the CA line and change the atom name in this new line to CX. Change the atom names of the remaining two unchanged lines to CX and SX instead of their current atom names. Your resulting CYS residues should now be composed of a CA line, two CX lines and an SX line.
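The cysteine edits described above can be sketched in Python as follows. Treat this as an illustration under several assumptions rather than a tested conversion tool: it assumes standard PDB columns, assumes the two remaining side chain lines carry the standard names CB and SG (a file from mmodpdb may name them differently), and takes the interchain cysteines as a set of (chain, residue number) pairs that you supply from your NRL_3D notes. The function names are hypothetical.

```python
def set_atom_name(line, name):
    """Return the PDB line with a new two-letter atom name in columns 13-16."""
    return line[:12] + (" " + name).ljust(4) + line[16:]


def customize_cys(pdb_lines, interchain):
    """Rewrite CYS residues for display of interchain disulfides.

    interchain: set of (chain_id, residue_number) pairs for the
    cysteines involved in interchain bonds (supplied by the user).
    Other CYS residues are only renamed to CYX.
    """
    out = []
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[17:20] != "CYS":
            out.append(line)
            continue
        key = (line[21], int(line[22:26]))
        if key not in interchain:
            out.append(line[:17] + "CYX" + line[20:])
            continue
        name = line[12:16].strip()
        if name in ("N", "C", "O"):
            continue  # drop the backbone N, C and O lines
        if name == "CA":
            out.append(line)  # keep the normal CA line
            out.append(set_atom_name(line, "CX"))  # plus a copy named CX
        elif name == "CB":  # assumed standard name
            out.append(set_atom_name(line, "CX"))
        elif name == "SG":  # assumed standard name
            out.append(set_atom_name(line, "SX"))
        else:
            out.append(line)
    return out
```

Each involved residue should come out as a CA line, two CX lines and an SX line. Check a few residues in the edited file by eye before running Molscript.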
Change the instruction file to add a section before the label to put in the cysteine side chain instructions. Set the planecolour and then the atomcolours for the three different atom names in the modified CYS residues. Call for ball and stick in type CYS. Run the instructions file through Molscript. Check the progress of the program. Use ghostscript to look at your results. It is interesting to note that while all the CA atoms in the data set were selected for a new atom color, only those in the CYS residues were drawn with the ball and stick command.
Copy your final insulin output file to (your lastname)-ins.ps.
8) Working with a complex organic structure.
Working with converted PDB data from even a fairly simple organic molecule can be an editing nightmare depending upon the number of structures in the dataset and problems which might not appear until the final visualization phase.
MacroModel uses LP to stand for a lone pair and treats lone pairs just as it does atoms. What actually appears on the screen is a representation of a lone pair of electrons: a colon next to an atom. Think back to the organic structures you have entered; lone pairs are often associated with oxygens. While Molscript doesn't crash with these lines present, it treats them as unknown atoms and puts them in the final image. Your resulting image is a lot more crowded than you remembered it.
Edit the following organic data file, sucrose-1.pdb, and remove its LP lines. Remove all the CONECT lines as well. The data file is now a more manageable size and contains only the atom lines with data you really want to have displayed.
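The same clean-up can be expressed as a small filter. The Python sketch below is one possible approach, assuming the lone pairs appear as atom lines whose atom name (columns 13-16 in the standard PDB layout) begins with LP. The function name strip_lone_pairs is made up for this example.

```python
def strip_lone_pairs(pdb_lines):
    """Drop CONECT records and lone-pair (LP) atom lines.

    Assumes lone pairs are written as ATOM or HETATM lines whose
    atom name, in columns 13-16, starts with LP.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("CONECT"):
            continue
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[12:16].strip().upper().startswith("LP"):
            continue
        kept.append(line)
    return kept
```

Write the returned lines to a new file rather than over sucrose-1.pdb, so the original stays available if the filter removes more than you intended.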
Molscript ignores the CONECT lines of a PDB file and uses its own distance system to connect atoms. Most of the time this works very well, particularly with proteins. It does have a problem with hydrogens, however. Normally you don't do ball and stick drawings of entire proteins, only a small number of side chains. A normal PDB file doesn't include the hydrogens anyway.
In organic molecules, the location and existence of hydrogens does matter. Their bond length to carbon is so short that the Molscript program connects the hydrogens to one another. A methyl group ends up with a bonded triangle structure on the carbon. Getting around this problem requires modifying the initial structure in the modelling software, replacing the problem hydrogens with another atom that has a slightly longer bond length. The difficulty is selecting the proper atom to replace the hydrogens with. You may have noticed that the file you just edited contained some unusual fluorine atoms.
All is well until the data is in Molscript again. The common atoms expected to be found in a PDB file all have been given default atom colors and sizes. Therefore, if you have changed the atom name to get a longer bond length, you have changed the atom color and its size. Fortunately, you can adjust atom color and radius back to what it should be.
The sucrose-1.pdb file you have been working with has a lot more wrong with it than extra LP lines. Because of the way the data was entered, the atoms for one molecule are not all in the same place, but are interlaced with data lines for the second molecule. This is very confusing. Using the atom numbers displayed on the screen in MacroModel, the data file was reorganized into two separate molecules with different residue names.
The result of all these changes is contained in the sucrose.pdb file. This time the troublesome hydrogen locations are named starting with an N. The two residue names are MOL and TTT. Copy over the phenol.in file to sucrose.in and use it as your starting point for the instruction file for this image. Edit this file to reflect the change in the input PDB file and the name of the structure (sucrose).
Change the usual centering line to be:
transform atom * by centre position atom * by translation -2.5 0.0 0.0;
Since N is a standard atom name, you can change its color and size with the following two lines to the values normally expected for hydrogen.
set atomcolour atom N* white;
set atomradius atom N* 1.0;
The image you will produce is that of side and front views of the sucrose molecule. These views must be labeled. The labels should appear under the two molecules. The image has the side view on the left and the frontal view on the right. The following three lines will produce a starting point for fine tuning the actual location of these two labels. You will have to adjust the location of the sucrose label as well.
set depthcue 0.0, labelsize 10.0;
label -9.5 -5.0 0.0 "side";
label 3.5 -5.0 0.0 "front";
Have the MOL residue drawn as a CPK model and the TTT residue as ball and stick. Placing the following color control line just before the last label command for the image changes that label's color from the default black to green.
set linecolour green;
Run your sucrose instruction file through the Molscript program. View the result with ghostscript and use it to fine tune your image. You want the side and front labels to be in smaller print than the sucrose label. The sucrose label should be at the bottom of the final image.
When you are satisfied with your efforts, copy the final output postscript file to (your lastname)-sug.ps.
9) Dividing up the page for multiple images.
More than a single image can be placed on a page. This is done by dividing the page up into pre-defined areas and using an area call to restrict a given image to its designated part of the page. Think of the physical page as having the following dimensions (x value, y value).
(0.0, 800.0)  ________________________________________  (600.0, 800.0)
             |                                        |
             |                                        |
             |                                        |
             |                                        |
             |                                        |
             |                                        |
             |                                        |
             |                                        |
             |                                        |
             |                                        |
             |                                        |
             |________________________________________|
(0.0, 0.0)                                              (600.0, 0.0)
Molscript puts a bounding box around its images. This box is 49 units in from the edge of the physical sheet of paper. You can have images extend into this bounding area as long as the intrusion isn't too far. Normally consider the working page to start at 50.0, 50.0 and end at 550.0, 750.0.
When you want to put multiple images on one page, use one of the following approaches.
1) Go back to your modelling software and create a data set containing all the desired molecules. Keep track of the number of atoms in each molecule so that you can identify them after the conversion process, and give them different residue names to tell them apart.
2) Divide up the working area of the page to contain the desired images. Dividing up the page allows you to rotate separate image coordinate data sets if needed.
For this week's exercise, the task will be to divide up the page to hold the desired images. The task at hand is to show two related images of the insulin material that you created earlier. The first image (on the top of the page) will be the original view of the data and the second image (on the bottom of the page) is rotated 90 degrees in the y direction (refer back to page 11 for instructions on doing this).
First, copy your existing instruction file to another name. Then edit this second instruction file to transform the coordinate data in the desired way and modify its label to denote the orientation change. Now merge the two files into one by using the following example command line. In this command line, ins1.in stands for your first insulin instruction file, ins2.in your second one and ins3.in the resulting combined file.
% cat ins1.in ins2.in > ins3.in
Decide how you want to cut up the page. Have at least a 50-unit dividing space between the two images. For each area you will need the x and y values of its lower left-hand corner, plus a second pair for its upper right-hand corner. Record your desired locations' starting and ending points below.
image one values: ________________________________________________ image two values: ________________________________________________
Now edit your combined file to put in the area restrictions for each of the
two images. This line should go before the read line in that portion of
the file that deals with a given image. The format of the area line first
lists the lower left-hand side starting point and then the upper right-hand
ending point. Values are given first for the x position and then the y. An
example area line would be as follows:
area 50.0 50.0 275.0 250.0;
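The arithmetic behind such area lines can be sketched in Python. This is only an illustration of one way to divide the page: it assumes the working page described earlier (50.0, 50.0 to 550.0, 750.0), equal-height areas stacked from top to bottom, and a fixed gap between them. The function names stacked_areas and area_line are hypothetical and not part of Molscript.

```python
def stacked_areas(n_images, gap=50.0,
                  left=50.0, bottom=50.0, right=550.0, top=750.0):
    """Split the working page into equal areas stacked top to bottom.

    Returns a list of (x_low, y_low, x_high, y_high) tuples,
    first image at the top of the page.
    """
    height = (top - bottom - gap * (n_images - 1)) / n_images
    areas = []
    for i in range(n_images):
        upper = top - i * (height + gap)
        areas.append((left, upper - height, right, upper))
    return areas


def area_line(area):
    """Format one area tuple as a Molscript area line."""
    return "area %.1f %.1f %.1f %.1f;" % area


for area in stacked_areas(2):
    print(area_line(area))
```

Other divisions of the page are equally valid. The point is only that each area is given as a lower left-hand corner followed by an upper right-hand corner.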
Run your final instruction file through Molscript and view your results. If you are satisfied with it, copy the postscript file to (your lastname)-page.ps.
10) Using your experience.
Now that you have created a number of Molscript images, look at the following file, ethid.in, and figure out what sort of image it would produce. Record your observations below.
% more ethid.in
image observations: _________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
11) Finishing up.
Rename the report form for this exercise and use pico to fill it in. Then rcp the following files you have created over to the teacher account. At the end of this exercise is a copy of the documentation for the Molscript program.
% mv week10m.week10m (your lastname).week10m
% pico (your lastname).week10m
% rcp (your lastname).week10m teacher@ribozyme:receive
% rcp (your lastname)-pcpk2.ps teacher@ribozyme:receive
% rcp (your lastname)-lys2.ps teacher@ribozyme:receive
% rcp (your lastname)-mel2.ps teacher@ribozyme:receive
% rcp (your lastname)-ins.ps teacher@ribozyme:receive
% rcp (your lastname)-sug.ps teacher@ribozyme:receive
% rcp (your lastname)-page.ps teacher@ribozyme:receive
This concludes your computing session for this week. Log off the computer.
Now exit the emulator program by selecting Quit from the File menu of the control bar.
Per J. Kraulis, "MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures", Journal of Applied Crystallography (1991) vol 24, pp 946-950.