If you use MView in your work, please cite:
Brown, N.P., Leroy, C., Sander, C. (1998). MView: A Web compatible database search or multiple alignment viewer. Bioinformatics. 14(4):380-381.
The EBI operates a web service for BLAST2 and FASTA3, which uses MView, as do GeneQuiz and the GeneQuiz sequence submission services.
BLAST (NCBI series 2.0)

| format | tested versions | status | MView option |
|---|---|---|---|
| blastp | 2.0.4, 2.0.5 | ok | -in blast |
| blastn | 2.0.4, 2.0.5 | ok | -in blast |
| blastx | 2.0.5 | ok | -in blast |
| tblastn | 2.0.5 | ok | -in blast |
| tblastx | 2.0.5 | ok | -in blast |
| psi-blast | 2.0.2, 2.0.4, 2.0.5, 2.0.6 | ok | -in blast |

BLAST (NCBI series 1.4)

| format | tested versions | status | MView option |
|---|---|---|---|
| blastp | 1.4.7, 1.4.9 | ok | -in blast |
| blastn | 1.4.9 | ok | -in blast |
| blastx | 1.4.9 | ok | -in blast |
| tblastn | 1.4.9 | ok | -in blast |
| tblastx | 1.4.9 | ok | -in blast |

BLAST (WashU series 2.0)

| format | tested versions | status | MView option |
|---|---|---|---|
| blastp | 2.0a13, 2.0a19 | ok | -in blast |
| blastn | 2.0a19 | ok | -in blast |
| blastx | 2.0a19 | ok | -in blast |
| tblastn | 2.0a19 | ok | -in blast |
| tblastx | 2.0a19 | ok | -in blast |

FASTA (series 3)

| format | tested versions | status | MView option |
|---|---|---|---|
| fasta3 | 3.0t76 | ok | -in fasta |
| tfastx3 | 3.0t82, 3.1t07 | ok | -in fasta |

FASTA (series 2)

| format | tested versions | status | MView option |
|---|---|---|---|
| fasta | 2.0u | ok | -in fasta |
| tfastx | 2.0u63 | ok | -in fasta |

FASTA (series 1)

| format | tested versions | status | MView option |
|---|---|---|---|
| fasta | 1.6c24 | ok | -in fasta |

Multiple alignment formats

| format | versions | status | MView option |
|---|---|---|---|
| plain | - | ok | -in plain |
| Pearson/FASTA | - | ok | -in pearson |
| PIR | - | ok | -in pir |
| MSF | - | ok | -in msf |
| CLUSTAL W | 1.60, 1.70 | ok | -in clustal |
| MaxHom/HSSP | 1.0 1991 | ok | -in hssp |
| MULTAS/MULTAL | - | experimental | -in multas |
The "plain" multiple alignment format is a trivial format comprising a column of identifiers and an adjacent column of aligned sequences. If you can convert some strange alignment to this you can always read it into MView. More formats can be expected to follow.
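As a rough illustration of such a conversion, the Python sketch below writes one row per sequence: an identifier, some whitespace, then the aligned sequence. The function name `format_plain`, the padding of identifiers to a common width, and the requirement that identifiers contain no whitespace are assumptions made for this example; they are not taken from MView itself.

```python
def format_plain(rows):
    """Return 'plain' alignment text: an identifier column and a sequence column.

    rows is a list of (identifier, aligned_sequence) pairs; all aligned
    sequences are expected to have the same length.
    """
    if not rows:
        return ""
    lengths = {len(seq) for _, seq in rows}
    if len(lengths) > 1:
        raise ValueError("aligned sequences differ in length")
    # Pad identifiers so the sequence column lines up (assumed layout).
    width = max(len(ident) for ident, _ in rows)
    return "\n".join(f"{ident.ljust(width)} {seq}" for ident, seq in rows) + "\n"


if __name__ == "__main__":
    example = [("seq1", "MKV-LA"), ("longer_id", "MRVTLA")]
    print(format_plain(example), end="")
```

Text saved this way to a file such as "data" is the kind of input intended for `-in plain`.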
Note that as of version 1.37 MView automatically selects an appropriate parser for the particular BLAST or FASTA program/version once it knows it is dealing with input from either program suite.
-out pearson), PIR format
(-out pir), or MSF format (-out msf) for
processing by another program, or as an RDB table for storage/manipulation
in relational database form (-out rdb).
For historical reasons, the default mode can also be selected explicitly
(-out new), in contrast to an obsolete mode (-out
old) that interleaves extra rows into the alignment containing
just the sequence identities, as in a sequence comparison.
mview -help
There are a lot of options, but the commonest ones are detailed here. The basic action of the program is to generate a plain text dump of the input data with percent sequence identities computed with respect to the first sequence in the output.
There are more command line options that I haven't documented below - some were added for locally used features. Expect changes and new options as the software evolves.
mview -in plain data > data.out
Or you might attach MView to the end of a pipeline:
some_process | mview -in plain > data.out
To change the input format to scan a FASTA run, also in "data", use:
mview -in fasta data > data.out
mview -in fasta -html body data > data.html
produces a page of HTML wrapped inside <BODY>
</BODY> tags with a coloured background, and you can load
this into your Web browser with a URL like
"file://your_path/data.html".
If you want a complete Web page, you can use -html full
(gives MIME-type, <HTML>, <BODY>
tags) or -html head (gives <HTML>,
<BODY> tags).
To get just the alignment block without these tags use -html
data.
Adding some colour is simple. To colour all the residues:
mview -in fasta -html head -coloring any data > data.html
and this looks better in my Netscape if the residues are emboldened, so
mview -in fasta -html head -coloring any -bold data > data.html
Now try colouring by identity to the first sequence:
mview -in fasta -html head -coloring identity -bold data > data.html
and then make the non-identical residues and gaps grey, instead of black:
mview -in fasta -html head -coloring identity -bold -symcolor gray -gapcolor gray data > data.html
Now try using an internal style sheet to get blocked
colouring. The -bold option is no longer needed:
mview -in fasta -html head -css on -coloring identity -symcolor gray -gapcolor gray data > data.html
The -in option isn't always necessary. If the filename
extension, or the filename itself minus any directory path begins with or
contains the first few letters of the valid -in options (eg.,
mydata.msf or mydata.fasta or
tfastx_run1.dat), MView tries to choose a
sensible input format, allowing multiple files in mixed formats to be
supplied on the command line. The -in option will always
override this mechanism but requires that all input files be of the same
format.
-ruler on. Only one kind of
ruler is currently provided, numbering the columns of the final alignment
from M to N (incrementing) or N to M (decrementing) based on the input
sequence numbering, if any. This defaults to 1 to the length of the
alignment for multiple alignments. TBLASTX rulers differ slightly in that
the native query numbering is given in nucleotide units, but
MView reports amino acid units instead (using modulo 3
arithmetic).
-coloring any colours every residue according to the
currently selected palette.
-coloring identity colours only those residues that are
identical to some reference sequence (usually the
query or first row).
-coloring consensus colours only those residues that
belong to a specified physicochemical class that is conserved in at least
a specified percentage of all rows for a given column. The threshold defaults to 70%
and may be set to another value, eg., -coloring
consensus -threshold 80 would specify 80%. Note that the physicochemical
classes in question can be confined to individual residues.
-coloring group is like -coloring consensus,
but colours residues by the colour of the class to which they belong.
By default, the consensus computation counts gap characters, so that
sections of the alignment may be uncolored where the presence of gaps
prevents the non-gap count from reaching the threshold. Setting
-con_gaps off prevents this, allowing sequence-only based
consensus thresholding.
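The effect of counting gaps can be sketched in a few lines of Python. This is only an illustration of the idea described above, not MView's code: the function name `conserved_percentage` and the choice of `-` and `.` as gap symbols are assumptions.

```python
GAP_CHARS = "-."  # assumed gap symbols for this sketch


def conserved_percentage(column, members, count_gaps=True):
    """Percentage of rows in one column whose residue belongs to `members`.

    With count_gaps=True, gap characters stay in the denominator, so gappy
    columns struggle to reach the threshold; with count_gaps=False only the
    non-gap rows are counted.
    """
    rows = [c.upper() for c in column]
    if not count_gaps:
        rows = [c for c in rows if c not in GAP_CHARS]
    if not rows:
        return 0.0
    hits = sum(1 for c in rows if c in members)
    return 100.0 * hits / len(rows)


if __name__ == "__main__":
    column = "LLLI--LV--"            # one alignment column, read top to bottom
    aliphatic = {"I", "L", "V"}
    print(conserved_percentage(column, aliphatic, count_gaps=True))   # 60.0
    print(conserved_percentage(column, aliphatic, count_gaps=False))  # 100.0
```

With the default 70% threshold, this example column would stay uncoloured when gaps are counted and would be coloured when they are not.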
The default palette assumes the input alignment is of protein sequences and sets their colours according to amino acid physicochemical properties: another palette should be selected for DNA or RNA alignments.
Consensus colouring is more complicated: some understanding of palettes and consensus patterns is needed before alignment consensus colouring can be explained.
P1 for proteins
or D1 for nucleotides. To change the default molecule type, use
-dna. Different palettes are explicitly selected using the
-colormap option. For example, to select one of the built-in
palettes for viewing nucleotide sequences, use -colormap D1.
There are default palettes for protein and nucleotide sequences. The
latter can be selected with the -dna option.
The built-in palettes can be listed from the command line with
-listcolors, and new colour schemes can be loaded from a file
using -colorfile in exactly the same format as produced by
-listcolors. Palette names are case-insensitive, while symbols
to be coloured are case-sensitive. Lines can contain comments beginning
with a hash '#' character. Colours are specified as hexadecimal RGB codes
prefixed with hash '#', exactly as used in HTML markup (named colours may
not be supported equally by all browsers). Here are the default palettes:
#symbol ->/=> colour (RGB hex or colorname) #comment
#amino acids
[P1]
Gg => #33cc00 #bright green
Aa => #33cc00 #bright green
Ii => #33cc00 #bright green
Vv => #33cc00 #bright green
Ll => #33cc00 #bright green
Mm => #33cc00 #bright green
Ff => #009900 #dark green
Yy => #009900 #dark green
Ww => #009900 #dark green
Hh => #009900 #dark green
Cc => #ffff00 #yellow
Pp => #33cc00 #bright green
Kk => #cc0000 #bright red
Rr => #cc0000 #bright red
Dd => #0033ff #bright blue
Ee => #0033ff #bright blue
Qq => #6600cc #purple
Nn => #6600cc #purple
Ss => #0099ff #dull blue
Tt => #0099ff #dull blue
Bb => #666666 #dark grey (D or N)
Zz => #666666 #dark grey (E or Q)
Xx => #666666 #dark grey
? => #999999 #light grey
* => #666666 #dark grey
#DNA/RNA
#symbol ->/=> colour (RGB hex or colorname) #comment
[D1]
Aa => #0033ff #bright blue
Gg => #0033ff #bright blue
Tt => #0099ff #dull blue
Cc => #0099ff #dull blue
Uu => #0099ff #dull blue
Mm => #666666 #dark grey (A or C)
Rr => #666666 #dark grey (A or G)
Ww => #666666 #dark grey (A or T)
Ss => #666666 #dark grey (C or G)
Yy => #666666 #dark grey (C or T)
Kk => #666666 #dark grey (G or T)
Vv => #666666 #dark grey (A or C or G; not T)
Hh => #666666 #dark grey (A or C or T; not G)
Dd => #666666 #dark grey (A or G or T; not C)
Bb => #666666 #dark grey (C or G or T; not A)
Nn => #666666 #dark grey (A or C or G or T)
Xx => #666666 #dark grey
? => #999999 #light grey
* => #666666 #dark grey
In these examples, both lower and uppercase versions of each residue are given with their associated colour to ensure that either case is coloured the same.
The arrow separating the symbols from the colour codes can be double
=> or single ->. When style
sheets have been selected (-css on), a double arrow means
that the colour should be applied to the background of the symbol, while a
single arrow means that only the letter should be coloured. When style
sheets are off, only letters can be coloured anyway and the arrows are
equivalent.
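For readers who want to generate or check palette files with their own scripts, here is a minimal Python sketch of a reader for the format described above. It is not MView's parser: the function name `parse_palettes`, the returned data structure, and the handling of malformed lines are all assumptions made for illustration.

```python
import re

# symbols, then '=>' or '->', then a colour; anything after that is a comment
ENTRY = re.compile(r"^(\S+)\s*(=>|->)\s*(\S+)")


def parse_palettes(text):
    """Parse palette text into {PALETTE: {symbol: (colour, background)}}.

    `background` is True for '=>' (colour the background when style sheets
    are on) and False for '->' (colour only the letter).
    """
    palettes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        if line.startswith("[") and line.endswith("]"):
            # Palette names are case-insensitive, so normalise them.
            current = palettes.setdefault(line[1:-1].upper(), {})
            continue
        match = ENTRY.match(line)
        if match is None or current is None:
            raise ValueError(f"cannot parse palette line: {raw!r}")
        symbols, arrow, colour = match.groups()
        for symbol in symbols:  # symbols are case-sensitive, keep them as given
            current[symbol] = (colour, arrow == "=>")
    return palettes


if __name__ == "__main__":
    sample = "[P1]\nGg => #33cc00 #bright green\nKk -> #cc0000 #bright red\n"
    print(parse_palettes(sample))
```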
-consensus on. By default, this adds 4 extra lines giving
consensus patterns computed at thresholds of 100, 90, 80 and 70%.
Consensus patterns are based on residue equivalence classes, that is,
sets of residues that share some physicochemical property. There are two
default consensus group definitions for protein P1 and
nucleotide D1 alignments, the latter being selected with the
-dna option.
At a given percentage threshold, the most discriminating equivalence class is chosen to represent the residues in a given column and an associated symbol is displayed. For example, the default protein and nucleotide consensus groups define the following symbols and equivalence class mappings:
#description => symbol members
[P1]
* => .
A => A { A }
C => C { C }
D => D { D }
E => E { E }
F => F { F }
G => G { G }
H => H { H }
I => I { I }
K => K { K }
L => L { L }
M => M { M }
N => N { N }
P => P { P }
Q => Q { Q }
R => R { R }
S => S { S }
T => T { T }
V => V { V }
W => W { W }
Y => Y { Y }
alcohol => o { S, T }
aliphatic => l { I, L, V }
aromatic => a { F, H, W, Y }
charged => c { D, E, H, K, R }
hydrophobic => h { A, C, F, G, H, I, K, L, M, R, T, V, W, Y }
negative => - { D, E }
polar => p { C, D, E, H, K, N, Q, R, S, T }
positive => + { H, K, R }
small => s { A, C, D, G, N, P, S, T, V }
tiny => u { A, G, S }
turnlike => t { A, C, D, E, G, H, K, N, Q, R, S, T }
#description => symbol members
[D1]
A
G
C
T
U
purine => r { A, G }
pyrimidine => y { C, T, U }
Alternative equivalence classes can be selected using
-con_groupmap. The list of available built-ins can be seen
with -listgroups, and new groups can be defined in the same
format and read in from a file using -groupfile.
Alternative thresholds to be displayed can be specified as a
comma-separated list using the -con_threshold option.
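To make the idea of a consensus symbol concrete, the Python sketch below picks a symbol for a single column using the default protein classes listed above. It assumes that "most discriminating" means the qualifying class with the fewest members and that gap rows count towards the total; both are interpretations for illustration, and MView's actual selection and tie-breaking may differ. The names `CLASSES` and `consensus_symbol` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# (symbol, members) pairs: single residues first, then the property classes
# copied from the default protein groups.
CLASSES = [(aa, {aa}) for aa in AMINO_ACIDS] + [
    ("o", set("ST")),               # alcohol
    ("l", set("ILV")),              # aliphatic
    ("a", set("FHWY")),             # aromatic
    ("c", set("DEHKR")),            # charged
    ("h", set("ACFGHIKLMRTVWY")),   # hydrophobic
    ("-", set("DE")),               # negative
    ("p", set("CDEHKNQRST")),       # polar
    ("+", set("HKR")),              # positive
    ("s", set("ACDGNPSTV")),        # small
    ("u", set("AGS")),              # tiny
    ("t", set("ACDEGHKNQRST")),     # turnlike
]


def consensus_symbol(column, threshold):
    """Return the symbol of the smallest class covering >= threshold % of rows."""
    rows = [c.upper() for c in column]
    if not rows:
        return "."
    best = None
    for symbol, members in CLASSES:
        share = 100.0 * sum(c in members for c in rows) / len(rows)
        if share >= threshold and (best is None or len(members) < len(best[1])):
            best = (symbol, members)
    return best[0] if best else "."  # '.' when no class reaches the threshold


if __name__ == "__main__":
    for threshold in (100, 90, 80, 70, 60):
        print(threshold, consensus_symbol("LLLLIV", threshold))
```

In this example the column "LLLLIV" is reported as aliphatic (`l`) at the default thresholds, and as the residue `L` itself only once the threshold drops to 60%.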
Tip: A useful capability is to control whether only consensus properties
(-con_ignore singleton) or just the conserved residues
themselves (-con_ignore class) are displayed in consensus
lines. The default is to show both using whichever equivalence class is
most specific.
By default, the consensus computation counts gap characters, so that
sections of the alignment may have gaps as the consensus. Setting
-con_gaps off prevents this, producing consensus lines based only on
sequence.
You can specify a colour scheme for the consensus lines using
-con_coloring and -con_colormap to change the
default palette (PC1 for protein or DC1 for
nucleotide). These options are analogous to those for controlling the
alignment colouring and follow the same naming scheme.
Colouring of an alignment by consensus determines which residues to colour and the colours to use based on (1) the consensus threshold chosen for the colouring operation (covered in the section on alignment colouring modes), (2) a consideration of the common physicochemical properties of the residues in that column, and (3) the chosen colour scheme:
Given the most specific equivalence class describing the column using the prevailing consensus equivalence classes, any residues in the column belonging to that class will be coloured using the prevailing palette.
In practice, for a protein alignment with no
special selection of palettes or consensus groups on the command line,
the P1 equivalence classes and the P1 colour palette are used. For
nucleotide alignments (option -dna), the D1 equivalence classes and the
D1 colour palette are used instead.
Tip: If you want to see only the conserved residues above the threshold
(ie., only one type of conserved residue per column), add the option
-ignore class.
Alternative consensus classes and palettes can be specified using
-groupmap and -colormap. Note that these are
distinct from any settings used to control displayed consensus lines,
although the option naming is similar.
-reference
option. This takes either the sequence identifier or an integer argument
corresponding to the ranking or ordering of a sequence. For multiple
alignment input formats, sequences are numbered from 1, while for searches
the hits are numbered from 1, but the query itself is 0, so
beware.
-label2 to remove descriptions.
The default layout is a single unbroken horizontal band of alignment -
fine if scrolling inside Netscape. However, you may prefer to break the
alignment into vertically stacked panes. For panes that are, say, 80 columns
wide, set -width 80. Widths refer to the alignment, not to the
descriptor information at left.
It is possible to narrow (or expand!) the displayed sequence range, for
example, -range 10:78 would select only that column range of
the alignment using the numbering scheme reported when -ruler
on is set (see Rulers). The order of the
numbers is unimportant making it simpler to state interest in a region of
the alignment that might actually be reversed in the output (eg., a BLASTN
search hit matching the reverse complement of the query strand).
-top 10.
You also can squeeze more out of a deep alignment and get a less biased
view if a threshold on the pairwise sequence identity is set using
-maxident N, where N is some value between 0 and 100.
Other filters specific to BLASTP, FASTA, etc., input formats allow
cutoffs on scores or p-values, etc. In particular, it is possible to apply
some control over the selection of HSPs used in
building the MView alignment using the -hsp
filtering option.
Of interest to anyone using PSI-BLAST, you can display alignments for any/all iterations of a psi-blast run using, say:
mview -in blast -cycle 1,5,10,20 mydata
to get just those iterations. The default is to display only the last iteration. If you want all output, use
-cycle '*'.
If you want to apply filtering, yet wish to force some sequence or
sequences to remain, you can do this with the -keep option,
which requires a comma separated list of identifiers or numbers or number
ranges (see above for an explanation of sequence rank numbers) as its
argument.
Similarly, if having examined the output, you wish to discard selected
rows, use the -disc option with a comma separated list of row
numbers and number ranges (eg., -disc 6,7,30-35,42).
The -disc option overrides -keep whenever a
row id or number is common to both. The reference
row is always kept, so if you explicitly attempt to discard it, you must
select an alternative row to act as the reference sequence.
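The interplay of filtering, -keep and -disc described above can be sketched as follows. This Python example handles row numbers and number ranges only (not identifiers), and the function names `parse_rows` and `rows_to_display` are hypothetical; it illustrates the stated precedence rules rather than MView's implementation.

```python
def parse_rows(spec):
    """Expand a list such as '6,7,30-35,42' into a set of row numbers."""
    rows = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low, high = (int(x) for x in item.split("-", 1))
            if low > high:  # tolerate reversed ranges (an assumption)
                low, high = high, low
            rows.update(range(low, high + 1))
        else:
            rows.add(int(item))
    return rows


def rows_to_display(all_rows, passed_filter, keep, disc, reference):
    """Apply the precedence rules: reference first, then -disc, then -keep."""
    shown = []
    for row in all_rows:
        if row == reference:
            shown.append(row)   # the reference row is always kept
        elif row in disc:
            continue            # -disc overrides -keep
        elif row in keep or row in passed_filter:
            shown.append(row)
    return shown


if __name__ == "__main__":
    keep = parse_rows("3,6-7")
    disc = parse_rows("0,2,7")
    print(rows_to_display(range(8), {0, 1, 2, 3}, keep, disc, reference=0))
```

Here row 0 survives despite being listed for discarding because it is the reference, while row 7 is dropped even though it also appears in the keep list.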
Another control option can be used to prevent MView
from using rows for colouring or for calculation of percent identities
although these rows will still be displayed. Use -nop to
specify a list (comma separated as usual) of ids or row numbers to flag
for 'no processing'. This is useful for displaying non-alignment data (eg.,
secondary structure predictions) alongside the alignment.
database|accession|identifier
database:identifier
as produced by some BLAST and FASTA servers. Such links will be to
the EBI and EMBL SRS services and will only be constructed if the database
names are listed in the SRS.pm library with this software. This library can
be modified for your site if you know some Perl and a little SRS syntax.
Release 1.40 added cascading style sheets allowing more specific control of HTML elements. In particular, this enables selective colouring of text fore/backgrounds allowing alignments to use coloured blocks instead of just coloured lettering.
This is enabled with the -css on option in combination with
the -html option to switch HTML processing on generally. It is
disabled with -css off. You can refer to an external style
file with -css URL, where the URL gives a valid path for the Web
server to find the file (ie., file:/some/path or
http://server/path).
Having loaded your own colour schemes into MView with
the -colorfile option, you can dump these as a style
file with -html css which just dumps the style sheet to
standard output for redirection to a file.
Controlling coloured fore/backgrounds for alignment lettering is handled in the colour scheme definition mechanism.
-width 60 or similar and turn off some of the leading text,
eg., -label2 -label3.
number of identical residues
------------------------------ x 100
length of ungapped reference
sequence over aligned region
Still, in the case of BLAST MView output, minor deviations
from the percentages reported by BLAST are due to 1) different rounding,
and 2) the way MView assembles a single pseudo-sequence for a hit composed of multiple HSPs,
giving an averaged percent identity.
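The percent identity calculation can be sketched in Python as below. The sketch takes the "aligned region" to be the span from the first to the last non-gap position of the compared sequence, treats `-` and `.` as gaps, and compares residues case-insensitively; these are assumptions made for illustration, and the function name `percent_identity` is hypothetical.

```python
GAPS = "-."  # assumed gap symbols for this sketch


def percent_identity(reference, other):
    """Identical residues as a percentage of the ungapped reference length
    over the region where `other` is aligned."""
    if len(reference) != len(other):
        raise ValueError("rows of an alignment must have equal length")
    aligned = [i for i, c in enumerate(other) if c not in GAPS]
    if not aligned:
        return 0.0
    start, end = aligned[0], aligned[-1]
    identical = 0
    ref_length = 0
    for r, o in zip(reference[start:end + 1], other[start:end + 1]):
        if r in GAPS:
            continue  # gaps in the reference do not count towards its length
        ref_length += 1
        if r.upper() == o.upper():
            identical += 1
    return 100.0 * identical / ref_length if ref_length else 0.0


if __name__ == "__main__":
    print(percent_identity("MKV-LAGT", "--VQLSG-"))  # 75.0
```

In the example, the compared row spans five columns, four of which hold a reference residue, and three of those match, giving 75%.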
-html option is set. Also, selecting the relational RDB output
format will switch off HTML.
In outline the default method of processing of HSPs is as follows:
For BLAST (series 1), as of MView version 1.37, only the HSPs contributing to the ranked hit contribute to this overlay process. A sorting scheme ensures that the best of these fragments are overlaid last and are not obscured by weaker ones; for example, BLAST hits are sorted by score and length. Differences in the ordering of fragments along query and hit naturally result in a patchwork that may not correspond exactly to the real hit sequences. Nevertheless, the resulting alignment stack is very informative, and the user can always run and view a gapped search if that is preferred.
For BLAST (series 2), only the single gapped hit reported in the ranking is used and the patchwork problem does not arise.
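The overlay idea for ungapped fragments can be sketched in Python as follows. The data layout (a score, a 1-based query start and the hit fragment) and the function name `tile_hsps` are assumptions for illustration; the sketch simply sorts fragments so that the best are written last, as described above, and does not reproduce MView's actual selection rules.

```python
def tile_hsps(query_length, hsps):
    """Overlay ungapped HSP fragments onto a single row aligned to the query.

    hsps is a list of (score, query_start, hit_fragment) tuples, where
    query_start is 1-based. Weaker fragments are written first so that
    stronger ones overwrite them where they overlap.
    """
    row = ["-"] * query_length
    for score, start, fragment in sorted(hsps, key=lambda h: (h[0], len(h[2]))):
        for offset, residue in enumerate(fragment):
            position = start - 1 + offset
            if 0 <= position < query_length:
                row[position] = residue
    return "".join(row)


if __name__ == "__main__":
    fragments = [(50, 1, "AAAAAA"), (80, 4, "CCCCC"), (20, 10, "GGG")]
    print(tile_hsps(12, fragments))  # AAACCCCC-GGG
```

The highest-scoring fragment (the run of C) overwrites part of the weaker fragment it overlaps, which is the patchwork effect mentioned above.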
More detailed descriptions of the rules for
HSP selection and tiling are available. Some control over the choice of
HSPs is available through the -hsp mode option described
therein which allows a) only ranked HSPs (the default) to be tiled; b) all
HSPs to be tiled, or c) all HSPs to be extracted separately.
As of MView release 1.40, the code requires a minimum of perl version 5.004 and has been tested with 5.004_03/04 and 5.005_02. However, if you only have perl 5.003, you can run older versions of MView, also available from the ftp site.
Formatting and colouring of HTML alignments requires a fixed-width font (eg., Courier) and support for the <FONT> tag, so a recent version of a browser such as Netscape is recommended. In particular, use of style sheets as of MView release 1.40 requires that your browser supports HTML 4.0.
Save the archive to your software area, eg., /usr/local.
gunzip < mview-1.24.tar.gz | tar xvof -
This would create the subdirectory
mview-1.24.
bin/mview into an
editor.
#!' magic number.
use lib 'some stuff';" line to, in our example,
use lib '/usr/local/mview-1.24/lib';
mview to somewhere on your PATH
and rehash or login again.
Nigel P. Brown
National Institute for Medical Research,
The Ridgeway, Mill Hill,
London NW7 1AA, U.K.
Tel: +44 (0)181 959 3666
Fax: +44 (0)181 913 8545
Email: nbrown@nimr.mrc.ac.uk
People who have contributed include C. Leroy (prototype FASTA parser, BLAST2 (WashU) modifications to BLAST 1.4 parser, prototype PSI-BLAST parser). Useful suggestions came from R. Lopez, and my former colleagues in the defunct Sander group.
This project is unrelated to the Bioperl project, but probably should be...