DIALIGN 2
User Guide
Distribution of copies of this user guide to all users of the program is explicitly encouraged since it will facilitate using DIALIGN.
The input sequence file contains the sequences to be aligned in FASTA format, for example:
>HTL2
LDTAPCLFSDGSPQKAAYVLWDQTILQQDITPLPSHETHSAQKGELLALICGLRAAKPWP
SLNIFLDSKYLIKYLHSLAIGAFLGTSAHQTLQAALPPLLQGKTIYLHHVRSHTNLPDPI
STFNEYTDSLILAPL
>MMLV
PDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALDAGTSAQRAELIALTQALKMAE
GKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIH
CPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL
>HEPB
RPGLCQVFADATPTGWGLVMGHQRMRGTFSAPLPIHTAELLAACFARSRSGANIIGTDNS
VVLSRKYTSFPWLLGCAANWILRGTSFVYVPSALNPADDPSRGRLGLSRPLLRLPFRPTT
GRTSLYADSPSVPSHLPDRVH
>ECOL
MLKQVEIFTDGSCLGNPGPGGYGAILRYRGREKTFSAGYTRTTNNRMELMAAIVALEALK
EHCEVILSTDSQYVRQGITQWIHNWKKRGWKTADKKPVKNVDLWQRLDAALGQHQIKWEW
VKGHAGHPENERCDELARAAAMNPTLEDTGYQVEV
The first line for each sequence starts with ">" and contains the
name of the sequence.
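As an illustration of the input format described above, the following Python sketch reads such a file into a dictionary of names and sequences. It is not part of DIALIGN; the function name `read_fasta` is hypothetical, and the sketch assumes that everything after ">" on a header line is the sequence name and that sequence data may span several lines.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict mapping name -> sequence."""
    parts = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the text after ">" is taken as the sequence name.
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            # Sequence data line belonging to the most recent header.
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


example = """>HTL2
LDTAPCLFSDGSPQKAAYVLWDQTILQQDITPLPSHETHSAQKGELLALICGLRAAKPWP
SLNIFLDSKYLIKYLHSLAIGAFLGTSAHQTLQAALPPLLQGKTIYLHHVRSHTNLPDPI
STFNEYTDSLILAPL
"""
seqs = read_fasta(example.splitlines())
print(sorted(seqs), len(seqs["HTL2"]))
```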
The user can decide if nucleic acid or protein sequences are to be aligned.
As described in our papers, DIALIGN constructs alignments from gap-free pairs of similar segments of the sequences. Such segment pairs are referred to as `diagonals'.
Every possible diagonal is assigned a weight reflecting the degree of similarity between the two segments involved. The overall score of an alignment is defined as the sum of the weights of the diagonals it consists of, and the program tries to find an alignment with maximum score; in other words, it tries to find a consistent collection of diagonals with maximum sum of weights. This novel scoring scheme is the basic difference between DIALIGN and other global or local alignment methods. Note that DIALIGN does not employ any kind of gap penalty.
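The scoring scheme described above can be written compactly as follows. The symbols S, A, D_i and w are notation chosen here for illustration and do not come from the program itself: A is an alignment made up of the diagonals D_1, ..., D_k, and w(D_i) is the weight of diagonal D_i.

```latex
S(A) = \sum_{i=1}^{k} w(D_i),
\qquad
A^{*} = \arg\max_{A \text{ consistent}} S(A)
```

No gap term appears in the sum, which reflects the absence of a gap penalty.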
It is possible to use a threshold T for the quality of the diagonals. In this case, a diagonal is considered for alignment only if its `weight' exceeds this threshold. Regions of lower similarity are not aligned.
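A minimal Python sketch of the threshold rule follows. It is not DIALIGN's implementation: the `Diagonal` record, both function names and the example weights are hypothetical and serve only to show that a diagonal is kept when its weight strictly exceeds T, and that an alignment's score is the sum of the weights of its diagonals.

```python
from typing import NamedTuple


class Diagonal(NamedTuple):
    """A gap-free segment pair (hypothetical record for illustration)."""
    seq_a: str
    start_a: int
    seq_b: str
    start_b: int
    length: int
    weight: float


def filter_by_threshold(diagonals, threshold=0.0):
    """Keep only diagonals whose weight exceeds the threshold T."""
    return [d for d in diagonals if d.weight > threshold]


def alignment_score(diagonals):
    """Score of an alignment: the sum of the weights of its diagonals."""
    return sum(d.weight for d in diagonals)


# Illustrative weights only; real weights are computed by the program.
candidates = [
    Diagonal("HTL2", 7, "MMLV", 8, 5, 4.0),
    Diagonal("HTL2", 42, "ECOL", 45, 10, 9.5),
    Diagonal("HEPB", 6, "ECOL", 6, 4, 1.5),
]
kept = filter_by_threshold(candidates, threshold=2.0)
print(len(kept), alignment_score(kept))
```

With T = 2.0 the third candidate is dropped, so that region would remain unaligned.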
In the first version of the program (DIALIGN 1), this threshold was in many situations absolutely necessary to obtain meaningful alignments. By contrast, DIALIGN 2 should produce reasonable alignments without a threshold, i.e. with T = 0. This is the most important difference between DIALIGN 2 and the first version of the program.
Nevertheless, it is still possible to use a threshold T, so it is up to the user to experiment with this option.
If (possibly) coding nucleic acid sequences are to be aligned, DIALIGN optionally translates the compared `nucleic acid segments' to `peptide segments' according to the genetic code -- without (necessarily) presupposing any of the three possible reading frames, so all three of them get checked for significant similarity. In this case, the similarity among segments will be assessed on the `peptide level' rather than on the `nucleic acid level'.
We strongly recommend this option if nucleic acid sequences are expected to contain protein coding regions, as it will significantly increase the sensitivity of the alignment procedure in such cases.
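To make the idea of comparing segments on the `peptide level' concrete, here is a small Python sketch that translates a nucleic acid segment in each of the three forward reading frames using the standard genetic code. It only illustrates the translation step and is not how DIALIGN itself is implemented; the function names are hypothetical, and incomplete trailing codons are simply ignored.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(segment):
    """Translate a nucleic acid segment codon by codon ('*' marks a stop)."""
    segment = segment.upper().replace("U", "T")
    codons = (segment[i:i + 3] for i in range(0, len(segment) - 2, 3))
    # Codons containing unknown characters are rendered as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(sequence):
    """Return the translations starting at offsets 0, 1 and 2."""
    return [translate(sequence[offset:]) for offset in range(3)]


print(three_frames("ATGGCCT"))
```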
The user can specify the maximum number of `*' characters printed below each column of the DIALIGN alignment; these characters indicate the degree of local similarity among the sequences.
dialign [ -fn "out_file" ; -pln x ; -n ; -nt ; -sto ; -thr x ] "seq_file"
Here, "seq_file" is the name of the input sequence file. The meaning of the (optional) parameters in brackets is as follows:
| -fn "out_file" | output file is named "out_file" |
| -pln x | maximum number of `*' characters representing degree of local similarity among sequences = x |
| -n | input sequences are nucleic acid sequences. No translation of diagonals. |
| -nt | input sequences are nucleic acid sequences and `nucleic acid diagonals' are translated into `peptide diagonals'. |
| -sto | Results are written to the standard output. |
| -thr x | Threshold T = x. |
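The options listed above can be assembled programmatically. The Python sketch below builds an argument list that could be passed to `subprocess.run`; it does not execute anything. The function name and its parameter names are hypothetical. It also assumes that options are separated by whitespace on the real command line, that -n and -nt are mutually exclusive, and that omitting both means protein input.

```python
def build_dialign_command(seq_file, out_file=None, max_stars=None,
                          sequence_type="protein", to_stdout=False,
                          threshold=None):
    """Build a dialign argument list from the documented options."""
    cmd = ["dialign"]
    if out_file is not None:
        cmd += ["-fn", out_file]          # name of the output file
    if max_stars is not None:
        cmd += ["-pln", str(max_stars)]   # maximum number of '*' characters
    if sequence_type == "nucleic":
        cmd.append("-n")                  # nucleic acid, no translation
    elif sequence_type == "translated":
        cmd.append("-nt")                 # nucleic acid, translated diagonals
    elif sequence_type != "protein":
        raise ValueError("unknown sequence_type: %r" % (sequence_type,))
    if to_stdout:
        cmd.append("-sto")                # write results to standard output
    if threshold is not None:
        cmd += ["-thr", str(threshold)]   # threshold T
    cmd.append(seq_file)                  # the input file comes last
    return cmd


print(" ".join(build_dialign_command("seq_file", sequence_type="translated",
                                     threshold=2)))
```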
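Program Output:
DIALIGN creates a file containing the alignment in DIALIGN format, the same alignment in FASTA format, and a tree in PHYLIP format. Each part is shown below for the example sequences.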
This is DIALIGN alignment format:
HTL2 1 ldtapcLFSD GS------PQ KAAYVLWDQT IL---QQDIT PLPSHethSA
MMLV 1 pdadhtwYTD GSSLLQEGQR KAGAAVTTET eviwaKALDA G---T---SA
HEPB 1 rpglcQVFAD AT------PT GWGLVMGHQR MR---GTFSA PLPIHt----
ECOL 1 mlkqvEIFTD GSCLGNPGPG GYGAILRYRG RE---KTFSA GytrT---TN
***** ********** ********** ** ***** ***** **
**** ** ** ********** ** ***** ***** **
*** ** ** ********** ** *****
** ******
HTL2 42 QKGELLALIC GLRAAKPWPS LNIFLDSKYL IKYLHslaig aflgtsah--
MMLV 45 QRAELIALTQ ALKMAEgkk- LNVYTDSRYA FATAHIHGEI YRRRGLLTSE
HEPB 38 --AELLAACF Arsrsgan-- -IIGTDN--- ---------- ----------
ECOL 45 NRMELMAAIV ALEALKEHCE VILSTDSQYV RQGITQWIHN WKKRGWKTAD
********** ********** ********** ********** **********
********** ********** ********** ********** **********
******* ****** ********** *****
******* ****** ********** *****
********
HTL2 90 -------QT- --LQAALPPL LQGKTIYLHH VRSHT----- -NLPDPISTF
MMLV 94 GKEIKNKDE- --ILALLKAL FLPKRLSIIH CPGHQ----- -KGHSAEARG
HEPB 60 ---------- ---SVVLSR- ---------- ---KYTSFPW LLGCAANWI-
ECOL 95 KKPVKNVDlw qrLDAALGQ- ---------- ---HQIKWEW VKGHAGHPE-
********* ******** ********** ********** **********
********
*
HTL2 124 NEYTDSLILA pl-------- ---------- ---------- ----------
MMLV 135 NRMADQAARK AAITETPDTS tll------- ---------- ----------
HEPB 82 LRGTSFVYVP SALNPADDPS rgrlglsrpl lrlpfrpttg rtslyadsps
ECOL 130 NERCDELARA AAMNPTledt gyqvev---- ---------- ----------
********** **********
********** ******
HTL2 136 ----------
MMLV 158 ----------
HEPB 132 vpshlpdrvh
ECOL 156 ----------
This is FASTA alignment format:
>HTL2
ldtapcLFSDGS------PQKAAYVLWDQTIL---QQDITPLPSHethSA
QKGELLALICGLRAAKPWPSLNIFLDSKYLIKYLHslaigaflgtsah--
-------QT---LQAALPPLLQGKTIYLHHVRSHT------NLPDPISTF
NEYTDSLILApl--------------------------------------
----------
>MMLV
pdadhtwYTDGSSLLQEGQRKAGAAVTTETeviwaKALDAG---T---SA
QRAELIALTQALKMAEgkk-LNVYTDSRYAFATAHIHGEIYRRRGLLTSE
GKEIKNKDE---ILALLKALFLPKRLSIIHCPGHQ------KGHSAEARG
NRMADQAARKAAITETPDTStll---------------------------
----------
>HEPB
rpglcQVFADAT------PTGWGLVMGHQRMR---GTFSAPLPIHt----
--AELLAACFArsrsgan---IIGTDN-----------------------
-------------SVVLSR--------------KYTSFPWLLGCAANWI-
LRGTSFVYVPSALNPADDPSrgrlglsrpllrlpfrpttgrtslyadsps
vpshlpdrvh
>ECOL
mlkqvEIFTDGSCLGNPGPGGYGAILRYRGRE---KTFSAGytrT---TN
NRMELMAAIVALEALKEHCEVILSTDSQYVRQGITQWIHNWKKRGWKTAD
KKPVKNVDlwqrLDAALGQ--------------HQIKWEWVKGHAGHPE-
NERCDELARAAAMNPTledtgyqvev------------------------
----------
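One simple way to sanity-check an alignment in this FASTA format is sketched below in Python. The sketch assumes that "-" marks a gap and that letter case does not change the identity of a residue; under these assumptions, removing the gaps from an aligned row should give back the original input sequence. The function names are hypothetical.

```python
def degap(aligned_row):
    """Remove gap characters and normalise case."""
    return aligned_row.replace("-", "").upper()


def check_alignment(aligned, originals):
    """Return True if all rows have equal length and match their inputs."""
    lengths = {len(row) for row in aligned.values()}
    if len(lengths) > 1:
        return False  # rows of an alignment must have the same length
    return all(degap(row) == originals[name].upper()
               for name, row in aligned.items())


# First line of the HTL2 row from the alignment above.
row = "ldtapcLFSDGS------PQKAAYVLWDQTIL---QQDITPLPSHethSA"
print(degap(row))
```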
This is PHYLIP tree format:
((HTL2:0.111024,
(MMLV:0.078471,
ECOL:0.078471):0.032554):0.121218,
HEPB:0.232242);
Trees can be visualized using the treetool
program contained in the PHYLIP software package.
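For readers who want to inspect the tree without a viewer, the following Python sketch computes the distance from the root to each leaf from the parenthesised tree string. It is a minimal reader written for this illustration, not a full parser for the format: it assumes that every branch carries a length after a ":" and that names contain none of the characters "(", ")", ",", ":" or ";". The function name `leaf_depths` is hypothetical.

```python
import re


def leaf_depths(tree_text):
    """Return {leaf name: summed branch lengths from the root}."""
    text = "".join(tree_text.split()).rstrip(";")
    tokens = re.findall(r"[(),]|:[0-9.eE+-]+|[^(),:;]+", text)
    depths = {}
    stack = []   # one list of leaf names per currently open "("
    last = []    # leaves affected by the next branch length
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            last = stack.pop()
            if stack:
                stack[-1].extend(last)   # leaves also belong to the parent
        elif tok == ",":
            continue
        elif tok.startswith(":"):
            for name in last:
                depths[name] += float(tok[1:])
        else:
            depths[tok] = 0.0            # a leaf name
            last = [tok]
            if stack:
                stack[-1].append(tok)
    return depths


tree = """((HTL2:0.111024,
(MMLV:0.078471,
ECOL:0.078471):0.032554):0.121218,
HEPB:0.232242);"""
for name, depth in sorted(leaf_depths(tree).items()):
    print(name, round(depth, 4))
```

For the example tree, all four leaves come out at a root distance of about 0.2322.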
Burkhard Morgenstern (last modified: May 11, 1998)