ALIGN(1) DragonFly General Commands Manual ALIGN(1)
NAME
align - compute the global alignment of two protein or DNA sequences
align0 - compute the global alignment of two protein or DNA sequences
without penalizing for end-gaps
SYNOPSIS
align [ -f # -g # -O filename -m # -s SMATRIX -w # ] sequence-file-1
sequence-file-2
DESCRIPTION
align produces an optimal global alignment between two protein or DNA
sequences. align will automatically decide whether the query sequence
is DNA or protein by reading the query sequence as protein and
determining whether the `amino-acid composition' is more than 85%
A+C+G+T. align uses a modification of the algorithm described by E.
Myers and W. Miller in "Optimal Alignments in Linear Space" CABIOS
(1988) 4:11-17. The program can be invoked either with command line
arguments or in interactive mode.
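The composition test described above can be sketched in Python. This is
an illustration only, not the program's own code: the function name
looks_like_dna, the handling of non-letter characters, and the treatment
of an empty sequence are assumptions; only the 85% A+C+G+T threshold
comes from this page.

```python
def looks_like_dna(sequence, threshold=0.85):
    """Guess DNA when more than `threshold` of the letters are A, C, G or T.

    Hypothetical helper: the real program may count residues differently.
    """
    # Keep letters only; blanks, digits and punctuation do not count.
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        return False  # assumption: an empty sequence is not called DNA
    acgt = sum(1 for ch in letters if ch in "ACGT")
    return acgt / len(letters) > threshold
```

A protein that happens to be rich in Ala, Cys, Gly and Thr could in
principle cross the threshold, which is one reason to treat the guess
as a heuristic.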
align weights end gaps, so that an alignment of the form
-----MACF
SRTKIMACF
will have a higher score than:
MACF
MACF
align0 uses the same algorithm, but does not weight end gaps.
Sometimes this can have surprising effects.
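The effect of weighting end gaps can be illustrated by scoring one
fixed alignment both ways. The sketch below is not the algorithm used
by align or align0: it only scores an alignment that is already given.
The gap costs use the -f and -g defaults listed under OPTIONS; the
match and mismatch values are invented stand-ins for a real scoring
matrix, and the function names are hypothetical.

```python
def gap_cost(length, first=-12, extend=-2):
    """Cost of one gap: `first` for its first residue, `extend` for each further one."""
    if length <= 0:
        return 0
    return first + (length - 1) * extend


def score_alignment(top, bottom, weight_end_gaps=True, match=5, mismatch=-4):
    """Score two aligned strings of equal length that use '-' for gaps.

    match and mismatch are assumed values, not taken from any real matrix.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    n = len(top)
    total = 0
    i = 0
    while i < n:
        a, b = top[i], bottom[i]
        if a == "-" or b == "-":
            # Find the full run of '-' in whichever string holds the gap.
            gapped = top if a == "-" else bottom
            j = i
            while j < n and gapped[j] == "-":
                j += 1
            is_end_gap = (i == 0 or j == n)
            if weight_end_gaps or not is_end_gap:
                total += gap_cost(j - i)
            i = j
        else:
            total += match if a == b else mismatch
            i += 1
    return total


# The alignment shown above, scored with and without end-gap weighting.
weighted = score_alignment("-----MACF", "SRTKIMACF", weight_end_gaps=True)
unweighted = score_alignment("-----MACF", "SRTKIMACF", weight_end_gaps=False)
```

With these assumed values the five-residue end gap costs
-12 + 4 * (-2) = -20 when it is weighted and nothing when it is not, so
the two modes can rank candidate alignments differently.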
align and align0 use the standard fasta format sequence file. Lines
beginning with '>' or ';' are considered comments and ignored.
Sequences can be upper or lower case; blanks, tabs and unrecognizable
characters are ignored. align expects sequences to use the single
letter amino acid codes; see protcodes(5).
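The file-reading rules above can be approximated as follows. This is a
minimal sketch under stated assumptions, not the program's parser:
read_sequence is a hypothetical name, "unrecognizable characters" is
approximated as "anything that is not a letter", and every non-comment
line is treated as part of a single sequence.

```python
def read_sequence(lines):
    """Collect residues from FASTA-style lines, skipping '>' and ';' lines."""
    residues = []
    for line in lines:
        if line.startswith((">", ";")):
            continue  # comment or title line
        # Assumption: only letters are residues; case is folded to upper.
        residues.extend(ch.upper() for ch in line if ch.isalpha())
    return "".join(residues)
```

Because the function accepts any iterable of lines, an open file
object can be passed to it directly.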
OPTIONS
align can be directed to change the scoring matrix and output format by
entering options on the command line (preceded by a `-', or by a `/' on
MS-DOS). All of the options should precede the file name arguments.
Alternatively, these options can be changed by setting environment
variables. The options and environment variables are:
-f # Penalty for the first residue in a gap (-12 by default).
-g # Penalty for additional residues in a gap (-2 by default).
-O filename
Sends a copy of the results to "filename".
-m # (MARKX) =1,2,3. Alternate display of matches and mismatches in
alignments. MARKX=1 uses ":", ".", and " " for identities,
conservative replacements, and non-conservative replacements,
respectively. MARKX=2 uses " ", "x", and "X". MARKX=3 does not
show the second sequence, but uses the second alignment line to
display matches with a "." for identity, or with the mismatched
residue for mismatches. MARKX=3 is useful for aligning large
numbers of similar sequences.
-s str (SMATRIX) the filename of an alternative scoring matrix file or
"250" to use the PAM250 matrix.
-w # (LINLEN) output line length for sequence alignments (normally
60; can be set as high as 200).
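The MARKX=3 display and the LINLEN setting can be pictured with a short
Python sketch. It follows only the description above ("." for an
identity, otherwise the residue from the second sequence, and a fixed
number of residues per output line). How the program itself prints
gaps, labels or coordinates is not covered here; markx3_line and
wrap_alignment are hypothetical names.

```python
def markx3_line(first, second):
    """Second display line for MARKX=3: '.' for identity, else the residue of `second`."""
    if len(first) != len(second):
        raise ValueError("aligned strings must have equal length")
    return "".join("." if a == b else b for a, b in zip(first, second))


def wrap_alignment(text, line_length=60):
    """Split one alignment line into pieces of at most `line_length` characters."""
    return [text[i:i + line_length] for i in range(0, len(text), line_length)]
```

For two long, nearly identical sequences the MARKX=3 line is mostly
dots, so the few differing residues stand out.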
EXAMPLES
(1) align musplfm.aa lcbo.aa
Compare the amino acid sequence in the file musplfm.aa with the amino
acid sequence in the file lcbo.aa. Each sequence should be in the form:
>LCBO bovine preprolactin
WILLLSQ ...
(2) align -w 80 musplfm.aa lcbo.aa > musplfm.aln
Compare the amino acid sequence in the file musplfm.aa with the
sequence in the file lcbo.aa. Show both sequences with 80 residues on
each output line and write the output to the file musplfm.aln.
(3) align
Run the align program in interactive mode. The program will prompt for
the file name for the first sequence and the second sequence.
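When align is driven from a script, the command line can be assembled
so that all options come before the two file names, as required above.
The Python sketch below only builds and prints the argument list; it
does not run the program. The helper name build_align_command and its
keyword names are hypothetical; the flags themselves are the ones
listed under OPTIONS.

```python
import shlex


def build_align_command(file1, file2, gap_first=None, gap_extend=None,
                        output=None, markx=None, matrix=None, line_length=None):
    """Return an argv list for align with the options placed before the file names."""
    argv = ["align"]
    options = (("-f", gap_first), ("-g", gap_extend), ("-O", output),
               ("-m", markx), ("-s", matrix), ("-w", line_length))
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    argv += [file1, file2]
    return argv


# Reproduces the command used in example (2), without the redirection.
print(shlex.join(build_align_command("musplfm.aa", "lcbo.aa", line_length=80)))
```

The resulting list could be handed to subprocess.run() on a system
where align is installed.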
SEE ALSO
rdf2(1), protcodes(5), dnacodes(5)
AUTHOR
Bill Pearson
wrp@virginia.EDU
local ALIGN(1)