

DIALIGN 2

User Guide


Developed by:



at GSF, AG BioDV, University of Bielefeld - FSPM, GSF, Institute of Biomathematics and Biometry and North Carolina State University, Department of Genetics


E-mail contact: morgenstern@gsf.de



Important Note:

Use of DIALIGN 2 is subject to the copyright notice (COPYRIGHT).

Distribution of copies of this user guide to all users of the program is explicitly encouraged since it will facilitate using DIALIGN.



References:

The basic ideas behind DIALIGN are described in
The main improvement of DIALIGN 2 - compared to the first version of the program - is described in


Program Input:

There are two ways to run DIALIGN on your computer: you can run the program interactively, or you can enter parameters on the command line. In either case, the sequences must be contained in a single sequence file.

Sequence file:

DIALIGN requires an ASCII file containing the sequences to be aligned. Four different file formats are supported: IG, FASTA, EMBL and GCG-RSF format. The following is an example of the FASTA sequence file format:


        >HTL2  
        LDTAPCLFSDGSPQKAAYVLWDQTILQQDITPLPSHETHSAQKGELLALICGLRAAKPWP
        SLNIFLDSKYLIKYLHSLAIGAFLGTSAHQTLQAALPPLLQGKTIYLHHVRSHTNLPDPI
        STFNEYTDSLILAPL
        >MMLV   
        PDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALDAGTSAQRAELIALTQALKMAE
        GKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIH
        CPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL
        >HEPB 
        RPGLCQVFADATPTGWGLVMGHQRMRGTFSAPLPIHTAELLAACFARSRSGANIIGTDNS
        VVLSRKYTSFPWLLGCAANWILRGTSFVYVPSALNPADDPSRGRLGLSRPLLRLPFRPTT
        GRTSLYADSPSVPSHLPDRVH
        >ECOL   
        MLKQVEIFTDGSCLGNPGPGGYGAILRYRGREKTFSAGYTRTTNNRMELMAAIVALEALK
        EHCEVILSTDSQYVRQGITQWIHNWKKRGWKTADKKPVKNVDLWQRLDAALGQHQIKWEW
        VKGHAGHPENERCDELARAAAMNPTLEDTGYQVEV



The first line for each sequence starts with ">" and contains the name of the sequence.
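The FASTA layout described above can be read with a few lines of code. The following is a minimal sketch in Python, not part of DIALIGN; the function name parse_fasta is hypothetical, and the sketch assumes that every line after a ">" header belongs to that sequence until the next header.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence data may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# The HTL2 record from the example file above.
example = """
>HTL2
LDTAPCLFSDGSPQKAAYVLWDQTILQQDITPLPSHETHSAQKGELLALICGLRAAKPWP
SLNIFLDSKYLIKYLHSLAIGAFLGTSAHQTLQAALPPLLQGKTIYLHHVRSHTNLPDPI
STFNEYTDSLILAPL
"""
for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Such a check can be useful before running the program, for example to confirm that all sequence names are present and distinct.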

Options:

Entering parameters via command line:

If you want to enter options via command line, the program call is as follows:

 
  dialign [ -fn "out_file" ; -pln x ; -n ; -nt ; -sto ; -thr x ] "seq_file"
  
Here, "seq_file" is the name of the input sequence file. The meaning of the (optional) parameters in brackets is as follows:

  -fn "out_file"   output file is named "out_file"
  -pln x           maximum number of '*' characters representing the
                   degree of local similarity among sequences = x
  -n               input sequences are nucleic acid sequences;
                   no translation of diagonals
  -nt              input sequences are nucleic acid sequences, and
                   'nucleic acid diagonals' are translated into
                   'peptide diagonals'
  -sto             results are written to the standard output
  -thr x           threshold T = x
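If DIALIGN is called from a script, the program call can be assembled from the options listed above. The following Python sketch only builds the argument list and does not run the program. The function name dialign_command, its keyword names and the file names are hypothetical. It also assumes that -n and -nt are alternatives and that the semicolons in the syntax line merely separate the optional parameters.

```python
import shlex


def dialign_command(seq_file, out_file=None, pln=None, nucleic=False,
                    translate=False, to_stdout=False, threshold=None):
    """Build the argument list for a DIALIGN 2 call (sketch only)."""
    if nucleic and translate:
        # Assumption: -n (no translation) and -nt (translation) exclude each other.
        raise ValueError("-n and -nt are alternatives; choose one")
    args = ["dialign"]
    if out_file is not None:
        args += ["-fn", out_file]
    if pln is not None:
        args += ["-pln", str(pln)]
    if nucleic:
        args.append("-n")
    if translate:
        args.append("-nt")
    if to_stdout:
        args.append("-sto")
    if threshold is not None:
        args += ["-thr", str(threshold)]
    args.append(seq_file)  # the sequence file comes last, as in the syntax line
    return args


# Hypothetical file names, for illustration only.
cmd = dialign_command("seqs.fasta", out_file="seqs.ali", nucleic=True, threshold=2)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a system where the dialign executable is installed.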


Similarity Matrix:

DIALIGN 2 uses the BLOSUM62 amino acid substitution matrix. In the current version, it is not possible to replace BLOSUM62 by other similarity matrices, since the probability values contained in the files n_prob and p_prob refer to the BLOSUM62 matrix.


Program Output:


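DIALIGN creates a file containing the alignment in DIALIGN format, the same alignment in FASTA format, and a sequence tree in PHYLIP format.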


This is DIALIGN alignment format:

  
HTL2          1   ldtapcLFSD GS------PQ KAAYVLWDQT IL---QQDIT PLPSHethSA
MMLV          1   pdadhtwYTD GSSLLQEGQR KAGAAVTTET eviwaKALDA G---T---SA
HEPB          1   rpglcQVFAD AT------PT GWGLVMGHQR MR---GTFSA PLPIHt----
ECOL          1   mlkqvEIFTD GSCLGNPGPG GYGAILRYRG RE---KTFSA GytrT---TN
                                                                
                       ***** ********** ********** **   ***** *****   **
                        **** **      ** ********** **   ***** *****   **
                         *** **      ** ********** **   *****           
                                     ** ******                          
                                                                        


HTL2         42   QKGELLALIC GLRAAKPWPS LNIFLDSKYL IKYLHslaig aflgtsah--
MMLV         45   QRAELIALTQ ALKMAEgkk- LNVYTDSRYA FATAHIHGEI YRRRGLLTSE
HEPB         38   --AELLAACF Arsrsgan-- -IIGTDN--- ---------- ----------
ECOL         45   NRMELMAAIV ALEALKEHCE VILSTDSQYV RQGITQWIHN WKKRGWKTAD
                                                                
                  ********** ********** ********** ********** **********
                  ********** ********** ********** ********** **********
                     ******* ******     ********** *****                
                     ******* ******     ********** *****                
                                          ********                      


HTL2         90   -------QT- --LQAALPPL LQGKTIYLHH VRSHT----- -NLPDPISTF
MMLV         94   GKEIKNKDE- --ILALLKAL FLPKRLSIIH CPGHQ----- -KGHSAEARG
HEPB         60   ---------- ---SVVLSR- ---------- ---KYTSFPW LLGCAANWI-
ECOL         95   KKPVKNVDlw qrLDAALGQ- ---------- ---HQIKWEW VKGHAGHPE-
                                                                
                  *********    ******** ********** ********** **********
                  ********                                              
                         *                                              
                                                                        
                                                        


HTL2        124   NEYTDSLILA pl-------- ---------- ---------- ----------
MMLV        135   NRMADQAARK AAITETPDTS tll------- ---------- ----------
HEPB         82   LRGTSFVYVP SALNPADDPS rgrlglsrpl lrlpfrpttg rtslyadsps
ECOL        130   NERCDELARA AAMNPTledt gyqvev---- ---------- ----------
                                                                
                  ********** **********                                 
                  ********** ******                                     
                                                                        
                                                                        
                                                                        


HTL2        136   ----------
MMLV              ----------
HEPB        132   vpshlpdrvh
ECOL        156   ----------
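
In the example above, rows of '*' characters appear beneath each alignment block, and the -pln option limits how many such rows are printed. One way to turn these rows into numbers is to count the stars in each column. The Python sketch below does this for a small made-up set of star rows; the function name column_similarity is hypothetical. Reading the star count as a per-column similarity level is an interpretation of the -pln description, not a documented DIALIGN interface.

```python
def column_similarity(star_rows):
    """Count the '*' characters in each column of the given rows."""
    width = max((len(row) for row in star_rows), default=0)
    counts = [0] * width
    for row in star_rows:
        for i, ch in enumerate(row):
            if ch == "*":
                counts[i] += 1
    return counts


# Made-up toy rows (not taken from a real DIALIGN run).
rows = [
    "*** **",
    "**  * ",
    "*     ",
]
print(column_similarity(rows))  # prints [3, 2, 1, 0, 2, 1]
```

When applying this to real output, the star rows must first be cut to the same column range as the sequence rows, because the name and position fields on the left are not part of the alignment.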
                    



This is FASTA alignment format:


>HTL2
ldtapcLFSDGS------PQKAAYVLWDQTIL---QQDITPLPSHethSA
QKGELLALICGLRAAKPWPSLNIFLDSKYLIKYLHslaigaflgtsah--
-------QT---LQAALPPLLQGKTIYLHHVRSHT------NLPDPISTF
NEYTDSLILApl--------------------------------------
----------
>MMLV
pdadhtwYTDGSSLLQEGQRKAGAAVTTETeviwaKALDAG---T---SA
QRAELIALTQALKMAEgkk-LNVYTDSRYAFATAHIHGEIYRRRGLLTSE
GKEIKNKDE---ILALLKALFLPKRLSIIHCPGHQ------KGHSAEARG
NRMADQAARKAAITETPDTStll---------------------------
----------
>HEPB
rpglcQVFADAT------PTGWGLVMGHQRMR---GTFSAPLPIHt----
--AELLAACFArsrsgan---IIGTDN-----------------------
-------------SVVLSR--------------KYTSFPWLLGCAANWI-
LRGTSFVYVPSALNPADDPSrgrlglsrpllrlpfrpttgrtslyadsps
vpshlpdrvh
>ECOL
mlkqvEIFTDGSCLGNPGPGGYGAILRYRGRE---KTFSAGytrT---TN
NRMELMAAIVALEALKEHCEVILSTDSQYVRQGITQWIHNWKKRGWKTAD
KKPVKNVDlwqrLDAALGQ--------------HQIKWEWVKGHAGHPE-
NERCDELARAAAMNPTledtgyqvev------------------------
----------
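
The FASTA alignment block can be post-processed in a script. The Python sketch below reads aligned rows, checks that all rows have the same length, and recovers the ungapped sequences by removing the '-' characters. The function names are hypothetical. The sketch does not interpret the difference between upper-case and lower-case letters; it simply converts everything to upper case.

```python
def read_aligned_fasta(text):
    """Return a dict mapping each sequence name to its aligned row."""
    rows = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            rows[name] = ""
        elif line and name is not None:
            rows[name] += line  # aligned rows may span several lines
    return rows


def ungap(row):
    """Remove gap characters and normalise the case."""
    return row.replace("-", "").upper()


# First 20 alignment columns of two sequences from the example above.
aligned = """
>HTL2
ldtapcLFSDGS------PQ
>MMLV
pdadhtwYTDGSSLLQEGQR
"""
rows = read_aligned_fasta(aligned)
if len({len(row) for row in rows.values()}) != 1:
    raise ValueError("aligned rows differ in length")
for name, row in rows.items():
    print(name, ungap(row))
```

Comparing the ungapped rows with the input sequences is a simple consistency check on the output file.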


This is PHYLIP tree format:

 
((HTL2:0.111024,
(MMLV:0.078471,
ECOL:0.078471):0.032554):0.121218,
HEPB:0.232242);


Trees can be visualized using the treetool program contained in the PHYLIP software package.
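
The tree shown above is written in nested-parentheses notation with a branch length after each colon. For users who want to inspect it without a viewer, the following Python sketch parses that notation and sums the branch lengths from the root to each leaf. The function names parse_newick and leaf_depths are hypothetical, and the parser assumes well-formed input without quoted names or comments.

```python
def parse_newick(text):
    """Parse a tree string into nested (name, length, children) tuples."""
    s = "".join(text.split())  # drop line breaks and spaces
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        # Optional node name (empty for the internal nodes in the example).
        start = pos
        while pos < len(s) and s[pos] not in ":,();":
            pos += 1
        name = s[start:pos]
        # Optional branch length after ":".
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return (name, length, children)

    return node()


def leaf_depths(tree, above=0.0):
    """Return a dict mapping each leaf name to its distance from the root."""
    name, length, children = tree
    depth = above + length
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, depth))
    return depths


tree_text = """
((HTL2:0.111024,
(MMLV:0.078471,
ECOL:0.078471):0.032554):0.121218,
HEPB:0.232242);
"""
for name, depth in leaf_depths(parse_newick(tree_text)).items():
    print(name, round(depth, 6))
```

For the example tree, all four root-to-leaf distances agree up to rounding in the sixth decimal place (0.232242 or 0.232243).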


Burkhard Morgenstern (last modified: May 11, 1998)