MIGRATE(1) DragonFly General Commands Manual MIGRATE(1)
NAME
MIGRATE - estimate population parameters: migration rate and population
size
SYNOPSIS
migrate-n
DESCRIPTION
Migrate estimates population parameters (effective population size and
migration rates) using genetic data (electrophoretic markers,
microsatellite markers, sequence data, and single nucleotide
polymorphism data). It is a maximum likelihood or Bayesian
estimator and uses a coalescent theory approach that takes into account
the history of mutations and the uncertainty of the genealogy.
A copy of the manual in PDF format is available from
http://popgen.scs.fsu.edu
OPTIONS
There are no options on the command line, but you can specify
options in a parmfile or in the menu.
PARMFILE OPTIONS
The parmfile options are split into Datatype, Input/Output, Start
parameters, and Search strategy.
DATATYPE
datatype=<Allele | Microsatellites | Brownian | Sequences |
Nucleotide-polymorphisms | Panel-SNP | Genealogies >
specifies the datatype used for the analyses. If the data do not
match the chosen type, the program will crash.
Allele: infinite allele model, suitable for electrophoretic
markers; perhaps the "best" guess for codominant markers whose
mutation model is unknown.
Microsatellite: a simple electrophoretic ladder model is used
for the change along the branches of the genealogy.
Brownian: a Brownian motion approximation to the stepwise
mutation model for microsatellites is used. This is MUCH faster
than the exact model, but is not a good approximation if
population sizes are small (say below 10).
Sequences: data are DNA or RNA sequences and the mutation model
used is F84, first used by Felsenstein 1984 (the same as in
dnaml, Phylip version 3.5); a description of this model can be
found in Swofford et al. 1996.
Nucleotide-polymorphism: [SNP] the data likelihood is corrected
for sampling only variable sites. We assume that the data were
used to find the SNPs.
Panel-SNP: the data likelihood is corrected for using a panel of
SNP sites that were polymorphic. The panel has to be population 1.
Genealogies: reads the sumfile of a previous run. With this
option the genealogy sampling step is skipped and the
genealogies provided in the sumfile are analyzed. This datatype
makes it easy to rerun the program for different likelihood
ratio tests or different settings for the profile likelihood
printouts.
Sequence data specific options
freq-from-data=< Yes | No:freqA freqG freqC freqT>
ttratio=< r1 r2 .....>
interleaved=<Yes | No >
categories=<Yes | No>
If you specify Yes you need a file named catfile
in the same directory with the following syntax for each locus:
number_of_categories cat1 cat2 cat3 .. categorylabel_for_each_site
A # in the first column starts a comment line. The following
example is for a data set with 2 loci of 20 base pairs each:
# Example catfile for two loci
# in migrate you can use # as comments
2 1 10 11111111112222222222
5 0.1 2 5 23 3 11111122223333445555
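The catfile layout above can be checked before a run with a short script. The following is a minimal Python sketch and is not part of migrate. The function name parse_catfile is hypothetical, and the sketch assumes one locus per line, whitespace-separated fields, and single-digit category labels from 1 to number_of_categories.

```python
def parse_catfile(text):
    """Return a list of (rates, labels) tuples, one per locus.

    Assumed layout per locus (one line):
    number_of_categories cat1 cat2 .. categorylabel_for_each_site
    """
    loci = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        n = int(fields[0])
        rates = [float(x) for x in fields[1:1 + n]]
        if len(rates) != n:
            raise ValueError("expected %d category rates" % n)
        labels = "".join(fields[1 + n:])
        allowed = "123456789"[:n]  # assumption: labels are digits 1..n
        for c in labels:
            if c not in allowed:
                raise ValueError("invalid category label %r" % c)
        loci.append((rates, labels))
    return loci


example = """# Example catfile for two loci
# in migrate you can use # as comments
2 1 10 11111111112222222222
5 0.1 2 5 23 3 11111122223333445555
"""

loci = parse_catfile(example)
for rates, labels in loci:
    print(len(rates), "categories,", len(labels), "sites")
```

A check like this only confirms that each locus has as many rates as announced and that every site label refers to an existing category; it does not compare the number of labels with the sequence length in the infile.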
rates=< n : r1 r2 r3 ..rn>
prob-rates=< n : p1 p2 p3 ... pn>
autocorrelation=<Yes:value | No>
weights=<Yes | No>
If you specify Yes you need a file weightfile with weights for
each site, the weights can be the following numbers 0-9 and
letters A-Z, so you have 35 possible weights available.
# Example weightfile for two loci
11111111112222222222
1111112222AAAA445XXXX5
distfile=<Yes | No>
You can supply a distance file for each locus (using PHYLIP
syntax). The order of individuals must be the same as in the
infile. This option appears in the menu when you choose
0 Start genealogy is estimated using a UPGMA topology
The distance file is then used to create a UPGMA tree with a
minimal number of migration events. For large trees this
option helps to get better starting trees than the automatic
tree generation, which uses a rather unsophisticated distance
method (differences).
usertree=<Yes | No>
If you specify Yes you need a file named intree that contains a
starting tree for each locus. These trees MUST have
migration events in them!
Microsatellite data
micro-threshold=value
specifies the window in which probabilities of change are
calculated. For example, with allele 34 only the probabilities
of a change from 34 to 35-44 and 24-34 are considered. The
higher this value, the longer you wait for your result, but
choosing it too small will produce wrong results.
Default is micro-threshold=10
Electrophoretic data
No special variables.
Nucleotide polymorphism
Similar to sequence data.
INPUT/OUTPUT
infile=filename
Default is infile
random-seed=<Auto | Noauto | Own:seedvalue>
The random number seed guarantees that you can reproduce a run
exactly. Good random number seeds are (values * 4) + 1. If you
do not specify the random number seed ( random-seed=Auto ) the
program will use the system clock. With random-seed=Noauto the
program expects to find a file named seedfile with the random
number seed. With random-seed=Own:seedvalue you can specify the
seed value in the parmfile (or in the menu).
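To keep runs reproducible, the seed line can be generated rather than typed by hand. The following is a minimal Python sketch, assuming that a good seed has the form 4n + 1 (the convention used by PHYLIP); the function name seed_line is hypothetical and not part of migrate.

```python
def seed_line(n):
    """Return a parmfile line with a seed of the assumed form 4n + 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    seed = 4 * n + 1
    return "random-seed=Own:%d" % seed


line = seed_line(305)
print(line)  # random-seed=Own:1221
```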
title=titletext
progress=<Yes|No|Verbose>
The default is progress=Yes
outfile=filename
The default is outfile=outfile
print-data=<Yes|No>
Print the data in the outfile. Default is print-data=No.
print-fst=<Yes|No>
Print a table of an FST estimate for comparison (Beerli and
Felsenstein 1999, Beerli 1998) [not recommended].
plot=<No | Yes>[:<Outfile|Both>[:<std|log>:{mig-axis-start,mig-axis-end,theta-axis-start,theta-axis-end}<:printpos<M | Nm>>]]
If plot=No then no plot of the parameter space is shown in the
outfile. If Yes then you can specify whether you want the
accurate numbers in a separate file ( mathfile ), using printpos
"pixels" in each direction, or only the ASCII-graphics plot in
the outfile. The last option ( M or N ) lets you define whether
you want the plot in M=m/mu or (default) 4Nm units. Default is
plot=Yes:Outfile. Example of a more complicated statement:
plot=Yes:Both:std:0,10,0,0.025:100N
For the syntax of the mathfile see the documentation.
profile=<No|Yes<:<Fast|Percentile|Spline|Discrete|Quick >><:M | Nm >
Print profile likelihood. See section Likelihood ratio tests
and profile likelihood. Default is profile=Yes:Fast:N.
l-ratio=<None | <Mean|Loci>:testparam> (N-POP)
Likelihood ratio tests. See section Likelihood ratio tests
and profile likelihood. Default is l-ratio=None.
print-trees=<All | None | Last | Best>
Default is print-trees=None
mathfile=filename
sumfile=<No | Yes | Yes:filename >
Intermediate results of the genealogy sampling process are saved
into a file named sumfile or into the file whose name you
specify. You can use this sumfile to rerun the program for
further analysis, e.g. calculating likelihood ratios or
profile likelihoods, see datatype=Genealogies.
START VALUES FOR THE PARAMETERS
theta=<Fst | Own:{value1,value2 ,...}>
With Fst the program tries to use an FST-based measure
(Maynard Smith 1970, Nei and Feldman 1972).
Own:{value1,value2,...} defines arbitrary start values.
migration=<Fst|Own:Migration matrix > (N-POP)
The migration matrix is an n by n table with - on the diagonal.
For four populations it can look like this:
migration=OWN:{ - 1.0 1.1 1.2 0.9 - 0.8 0.7 2.1 2.2 - 2.3 1.4 1.5 1.6 - }
or like this:
migration=OWN:{ - 1.0 1.1 1.2
0.9 - 0.8 0.7
2.1 2.2 - 2.3
1.4 1.5 1.6 - }
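For more than a few populations the matrix is easier to generate than to type. The following is a minimal Python sketch that formats a square table of start values into the multi-line form shown above; the function name format_migration is hypothetical and not part of migrate, and values on the diagonal of the input are ignored and written as -.

```python
def format_migration(matrix):
    """Format an n by n table as a migration=OWN:{...} parmfile entry."""
    n = len(matrix)
    rows = []
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError("migration matrix must be n by n")
        # The diagonal is written as "-"; off-diagonal values are kept.
        cells = ["-" if i == j else str(v) for j, v in enumerate(row)]
        rows.append(" ".join(cells))
    return "migration=OWN:{ " + "\n".join(rows) + " }"


start = [
    [None, 1.0, 1.1, 1.2],
    [0.9, None, 0.8, 0.7],
    [2.1, 2.2, None, 2.3],
    [1.4, 1.5, 1.6, None],
]
entry = format_migration(start)
print(entry)
```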
mutation=<Gamma | NoGamma>
The default is mutation=NoGamma
fst-type=<Theta | Migration >
custom-migration=< NONE|migration - matrix >
The migration matrix contains the migration rates from j to i on
row i, and the thetas are on the diagonal. The migration matrix
can consist of connections that are
*: no restriction
0: not estimated
m: mean value of either 4Nm or M
s: symmetric migration [only for M]
c: constant value (together with migration=OWN.. or theta=OWN..)
The values can be separated by blanks or newlines. A few examples
for 4 populations:
Full model: custom-migration={**** **** **** ****}
N-island model: custom-migration={m m m m mm mm m mmm mmmm}
Stepping Stone model: with symmetric migrations, and
unrestricted estimates: custom-migration={*s00 s*s0 0s*s 00s*}
Source-Sink: (the first population is the source):
custom-migration={*000**000**0*000}
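The regular patterns among these examples can be generated for any number of populations. The following is a minimal Python sketch that builds the full model and the stepping stone model strings; the function name custom_migration and the model labels "full" and "stepping-stone" are hypothetical and not part of migrate.

```python
def custom_migration(n, model):
    """Build a custom-migration entry for n populations.

    model "full":           every connection is unrestricted (*).
    model "stepping-stone": neighbours exchange migrants symmetrically
                            (s), all other connections are 0, and the
                            diagonal is unrestricted (*).
    """
    if model not in ("full", "stepping-stone"):
        raise ValueError("unknown model: %s" % model)
    rows = []
    for i in range(n):
        row = ""
        for j in range(n):
            if model == "full" or i == j:
                row += "*"
            elif abs(i - j) == 1:
                row += "s"
            else:
                row += "0"
        rows.append(row)
    return "custom-migration={" + " ".join(rows) + "}"


full = custom_migration(4, "full")
stepping = custom_migration(4, "stepping-stone")
print(full)
print(stepping)
```

For 4 populations both results agree with the corresponding examples above.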
SEARCH STRATEGY
Please read the documentation; these settings are important and will
influence the accuracy of your results.
short-chains=value
Default is 10.
short-inc=value
Default is 20.
short-sample=value
Default is 500.
long-chains=value
Default is 2.
long-inc=value
Default is 20.
long-sample=value
Default is 5000.
burn-in=value
Default is 10000.
replicate=<NO | YES<:LONGCHAINS | number>>
heating=<NO | YES<:{1,1.1,1.2,1.3}>>
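When several runs differ only in their search strategy, the corresponding parmfile lines can be generated from the defaults listed above. The following is a minimal Python sketch; the names DEFAULTS and search_strategy_lines are hypothetical and not part of migrate, and only the options with a documented default are covered.

```python
# Defaults as listed in this manual page.
DEFAULTS = {
    "short-chains": 10,
    "short-inc": 20,
    "short-sample": 500,
    "long-chains": 2,
    "long-inc": 20,
    "long-sample": 5000,
    "burn-in": 10000,
}


def search_strategy_lines(overrides=None):
    """Return option=value lines, using the defaults unless overridden."""
    settings = dict(DEFAULTS)
    for key, value in (overrides or {}).items():
        if key not in DEFAULTS:
            raise KeyError("unknown search strategy option: %s" % key)
        settings[key] = int(value)
    return ["%s=%d" % (key, value) for key, value in settings.items()]


lines = search_strategy_lines({"long-chains": 3, "burn-in": 20000})
print("\n".join(lines))
```

The resulting lines can be pasted into the search strategy part of a parmfile; the sketch does not judge whether a particular combination of values is sensible for a given data set.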
Obscure options
see documentation
BUGS
This man page is not up to date and misses the Bayesian inference
section, but see documentation.
MAIN DISTRIBUTION WEBSITE
http://popgen.csit.fsu.edu
SEE ALSO
coalesce, fluctuate, recombine, lamarc (the program) available from
http://evolution.gs.washington.edu/lamarc.html
AUTHOR
Peter Beerli <beerli@csit.fsu.edu>
[if you use this man page, please let me know]
4.2 Berkeley Distribution July 20 2006 MIGRATE(1)