Tutorial
YAML
A Demes model is written as a YAML file.
A YAML file contains property: value pairs, and Demes imposes restrictions
by allowing only properties with specific names, which have specific types
of values.
YAML files have the extension .yaml, or sometimes .yml.
Don’t worry if you’ve never heard of YAML before, as the details of YAML
aren’t particularly important.
The example below gives an indication of what a Demes file looks like,
but we’ll explain each component gradually using additional examples.
For now, select the “Drawing” tab to see a diagrammatic overview
of the demographic model.
Example 01
# Comments start with a hash.
description:
  Asymmetric migration between two extant demes.
time_units: generations
defaults:
  epoch:
    start_size: 5000
demes:
  - name: X
    epochs:
      - end_time: 1000
  - name: A
    ancestors: [X]
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
        end_time: 500
      - start_size: 400
        end_size: 10000
migrations:
  - source: A
    dest: B
    rate: 1e-4
Note
The Demes specification defines a data model, not the YAML format itself. A demographic model that uses the Demes data model can be easily converted to any desired data-serialisation language.
Terminology
We use the word “Demes” with a capital D to refer to the Demes data model or the Demes specification, and “deme” or “demes” with a lowercase d when we refer to a collection of individuals (a deme) or a property in the Demes data model.
What is a deme?
A deme is a collection of individuals that share a set of population parameters at any given time. Consider the simplest population model: a single deme with constant population size.
Example 02
time_units: generations
demes:
  - name: A
    start_time: .inf
    epochs:
      - start_size: 1000
        end_time: 0
This deme exists now (zero generations ago), and also exists as far back
in time as we’re interested in looking (towards infinity generations ago).
In Demes, we say that the deme has an infinite start_time.
Select the “YAML” tab above the drawing to see how this model was implemented.
Note
In YAML, infinity is spelled .inf, not inf.
Note
Each Demes model must define at least one deme with an infinite start_time.
Default values
Having a deme’s start_time property be infinite is very common.
So for demes without ancestors, the start_time may be omitted,
and will default to .inf. Similarly, the end_time property of
the final epoch may be omitted, and will default to 0.
The following Demes file describes an equivalent model.
Example 03
time_units: generations
demes:
  - name: A
    epochs:
      - start_size: 1000
What is an epoch?
We partition a deme’s interval of existence into distinct epochs.
A deme always has at least one epoch. The population parameters for a
deme are allowed to change over time, but are fixed within an epoch.
Consider a single deme that suffered a bottleneck 50 generations ago.
Example 04
time_units: generations
demes:
  - name: B
    start_time: .inf
    epochs:
      - start_size: 1000
        end_time: 50
      - start_size: 200
        end_time: 0
To specify the change in the population size, we’ve introduced a second epoch
into the YAML file.
We now need three time values to define the epoch boundaries:
the start_time of the deme, the end_time of epoch 0,
and the end_time of epoch 1. All epochs must be listed in time-descending
order (from the past towards the present).
The same model can be written using default values for the deme’s
start_time and the final epoch’s end_time.
Example 05
time_units: generations
demes:
  - name: B
    epochs:
      - start_size: 1000
        end_time: 50
      - start_size: 200
Why don’t epochs have a start_time too?
The start time of an epoch can be inferred indirectly, by looking at the
deme’s start_time (for epoch 0), or by looking at the previous epoch’s
end_time (for epoch 1, 2, etc.).
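The inference described above can be sketched in a few lines of Python. This is only an illustration that treats a deme as a plain dictionary shaped like the YAML; the function name epoch_intervals is hypothetical and is not part of any Demes software. For brevity it applies the default end_time of 0 to any epoch that omits it, without checking that only the final epoch does so.

```python
import math


def epoch_intervals(deme):
    """Return a (start_time, end_time) pair for each epoch of a deme dict."""
    # A deme without ancestors defaults to an infinite start_time.
    start = deme.get("start_time", math.inf)
    intervals = []
    for epoch in deme["epochs"]:
        # Assumption: a missing end_time means 0 (valid for the final epoch).
        end = epoch.get("end_time", 0)
        intervals.append((start, end))
        # The next epoch starts where this one ends.
        start = end
    return intervals


# The bottleneck deme from Example 05, written as a dictionary.
deme_b = {
    "name": "B",
    "epochs": [
        {"start_size": 1000, "end_time": 50},
        {"start_size": 200},
    ],
}
intervals = epoch_intervals(deme_b)
print(intervals)  # [(inf, 50), (50, 0)]
```

Because every start time falls out of the preceding boundary, storing it again on each epoch would only create an opportunity for inconsistent values.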
Exponential size changes
In the previous model, the population size was constant in each epoch.
But what if we wanted to model a period of exponential population size growth,
or an exponential decay? In any given epoch, there are actually two
population size parameters, start_size and end_size, which correspond
to the size at the start and end of the epoch. If both parameters have the
same value, then the size is constant over the epoch.
However, if an epoch’s start_size and end_size properties are not equal,
then the epoch is defined to have an exponentially-changing population size
over the epoch’s time interval. Two important implications of this system are:
- We don’t need to specify a rate parameter for the exponential. If this really is needed, it can be calculated from the start_size, end_size, start_time and end_time values for an epoch.
- Infinitely long epochs must have a constant population size. So if a deme has an infinite start_time, then start_size and end_size must be equal for epoch 0.
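As a rough sketch of how such a rate could be recovered, the Python below assumes the usual parameterisation in which the size at time t (measured backwards from the present) is start_size * exp(rate * (start_time - t)). The function names exponential_rate and size_at are hypothetical, and the rate is expressed per unit of whatever time_units the model uses. The numbers are taken from the final epoch of deme B in Example 01.

```python
import math


def exponential_rate(start_size, end_size, start_time, end_time):
    """Forwards-in-time growth rate, per time unit, over a single epoch."""
    if math.isinf(start_time):
        # An infinitely long epoch cannot change size exponentially.
        raise ValueError("an infinitely long epoch must have a constant size")
    # Times count backwards from the present, so start_time > end_time.
    duration = start_time - end_time
    return math.log(end_size / start_size) / duration


def size_at(time, start_size, start_time, rate):
    """Population size at `time` (time units ago) within the epoch."""
    return start_size * math.exp(rate * (start_time - time))


# Final epoch of deme B in Example 01: 400 -> 10000 between 500 and 0.
rate = exponential_rate(start_size=400, end_size=10000, start_time=500, end_time=0)
print(rate)  # about 0.0064 per generation
print(size_at(0, start_size=400, start_time=500, rate=rate))  # close to 10000
```

A positive rate corresponds to growth towards the present and a negative rate to decay; a rate of zero is the constant-size case.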
To make these ideas more concrete, let’s look at an implementation of the well-known ZigZag model from Schiffels & Durbin (2014).
Example 06
description: A single population model with epochs of exponential growth and decay.
doi:
  - https://doi.org/10.1038/ng.3015
time_units: generations
demes:
  - name: generic
    epochs:
      - {end_time: 34133.31, start_size: 7156}
      - {end_time: 8533.33, end_size: 71560}
      - {end_time: 2133.33, end_size: 7156}
      - {end_time: 533.33, end_size: 71560}
      - {end_time: 133.33, end_size: 7156}
      - {end_time: 33.333, end_size: 71560}
      - {end_time: 0, end_size: 71560}
We’ve introduced several new features to implement this model, so let’s step through it from top to bottom.
- The doi property is a list of strings corresponding to the DOI(s) for publication(s) in which the model was described. By convention, the elements of the doi list are URLs, but any string value can be used.
- A compact-form syntax is used for each epoch in the epochs list (with curly braces { and }), rather than using multiple lines. This is known as “flow style” in YAML parlance.
- Epoch 0 has a start_size, but no end_size. Because this epoch has an infinite time span, the population size must be constant, so the epoch’s end_size will be the same as the start_size.
- Epoch 1 has an end_size, but no start_size. So the start_size is inherited from the end_size of the previous epoch. This means the end_size and start_size for this epoch are different, and there will be exponential population growth over the epoch.
- Epochs 2 through 5 also inherit their start_size from the previous epoch, and in each case these are different from the end_size provided.
- Epochs 2 and 4 have exponential decay, whereas epochs 3 and 5 have exponential growth.
- The final epoch has a constant size.
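The size-inheritance rules walked through above can be checked with a small Python sketch. It works on plain dictionaries that mirror the epochs list of Example 06; the helper names resolve_sizes and trend are hypothetical, and the sketch assumes the first epoch always provides a start_size.

```python
def resolve_sizes(epochs):
    """Fill in the start_size and end_size of every epoch in the list."""
    resolved = []
    previous_end_size = None
    for epoch in epochs:
        # A missing start_size is inherited from the previous epoch's end_size.
        start_size = epoch.get("start_size", previous_end_size)
        if start_size is None:
            raise ValueError("this sketch needs a start_size in the first epoch")
        # A missing end_size means the size is constant over the epoch.
        end_size = epoch.get("end_size", start_size)
        resolved.append({**epoch, "start_size": start_size, "end_size": end_size})
        previous_end_size = end_size
    return resolved


def trend(epoch):
    """Classify a resolved epoch as growth, decay, or constant."""
    if epoch["end_size"] > epoch["start_size"]:
        return "growth"
    if epoch["end_size"] < epoch["start_size"]:
        return "decay"
    return "constant"


# The epochs of the ZigZag model from Example 06.
zigzag = [
    {"end_time": 34133.31, "start_size": 7156},
    {"end_time": 8533.33, "end_size": 71560},
    {"end_time": 2133.33, "end_size": 7156},
    {"end_time": 533.33, "end_size": 71560},
    {"end_time": 133.33, "end_size": 7156},
    {"end_time": 33.333, "end_size": 71560},
    {"end_time": 0, "end_size": 71560},
]
trends = [trend(epoch) for epoch in resolve_sizes(zigzag)]
print(trends)
# ['constant', 'growth', 'decay', 'growth', 'decay', 'growth', 'constant']
```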
Warning
Other modelling frameworks may use the terms “epoch” or “epochs” to refer to time intervals that partition an entire model. However, in Demes, epochs are a deme-specific property, and each deme has its own list of epochs which do not apply to the other demes in the model.
Multiple demes
A split event
Suppose we’re interested in modelling two demes, A and B.
The two demes are related by a common ancestor, from which they split
1000 generations ago. In addition to A and B, we’ll model their
common ancestor as an additional deme, X.
Here, we introduce the ancestors property of a deme,
which is a list of deme names.
Example 07
time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors:
      - X
    epochs:
      - start_size: 2000
  - name: B
    ancestors:
      - X
    epochs:
      - start_size: 2000
When a deme has an ancestor, its start_time does not default to .inf.
In this case, the start_time for the deme is inherited from the
end_time of the ancestor.
That is, for the model above, both A and B have a start_time of
1000 generations ago.
Note
A deme cannot appear in the demes list before its ancestor(s). This means models must be written in a “top down” manner, starting with the ancestral (root) deme(s), and followed by increasingly recent demes.
By convention, we use a more compact form for the ancestors list.
Lists can be written more compactly with YAML “flow style”, which uses
square brackets ([ and ], with a comma separating list items).
The following Demes file describes an equivalent model.
Example 08
time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
A branch event
An alternative way of modelling a population split is for the
ancestral deme to remain alive after the split. We will refer to this
as a branch event, rather than a split event.
In the model below, deme A has X as an ancestor like the previous model,
except here X continues to exist until 0 generations ago
(recall that 0 is the default value for the final epoch’s end_time).
Now that A’s ancestor exists until 0 generations ago we must
explicitly provide a start_time for A.
Example 09
time_units: generations
demes:
  - name: X
    epochs:
      - start_size: 2000
  - name: A
    ancestors: [X]
    start_time: 1000
    epochs:
      - start_size: 2000
Multiple ancestors
When a deme has multiple ancestors, these appear in the ancestors list
as one might expect. But for multiple ancestors we need to also specify the
proportion of ancestry inherited from each ancestor. This is done using
the deme’s proportions list property. The first proportion in the
proportions list is for the first ancestor in the ancestors list,
the second proportion is for the second ancestor, and so on.
Just like the case of a single ancestor, an ancestor can terminate at
the descendant’s start_time, or can instead continue to exist.
Example 10
time_units: generations
demes:
  - name: X
    epochs:
      - start_size: 2000
  - name: Y
    epochs:
      - start_size: 2000
        end_time: 1000
  - name: A
    ancestors: [X, Y]
    proportions: [0.1, 0.9]
    start_time: 1000
    epochs:
      - start_size: 2000
Note
With multiple ancestors, the start_time of the descendant deme does not
default to the end_time of any of its ancestors. So the start_time
must always be specified for a deme with multiple ancestors.
Continuous migration
Let’s again consider a model with demes A and B, which are related
via a common ancestor X (a split event).
Asymmetric migration
To define continuous migration from A to B, we’ll add an entry to the
migrations list.
Concretely, we are modelling migrants born in deme A, the source deme,
who (potentially) have offspring in deme B, the dest deme.
Example 11
time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
migrations:
  - source: A
    dest: B
    rate: 1e-4
The migrations here occur at a rate of 1e-4 per generation,
and occur continuously over the lifetime of both A and B.
Warning
Migration rate units are always “per generation” regardless of
the chosen value for time_units (described later).
In the example above, A and B have identical existence time intervals.
If the source and dest demes do not have identical start
or end times, then by default the migration will occur over
the period of time when both demes exist simultaneously.
Example 12
time_units: generations
demes:
  - name: X
    epochs:
      - start_size: 2000
  - name: A
    ancestors: [X]
    start_time: 1000
    epochs:
      - start_size: 2000
migrations:
  - source: A
    dest: X
    rate: 1e-4
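The default time interval for the migration in Example 12 can be worked out by intersecting the two demes' existence intervals. The Python below is a minimal sketch of that idea, with each deme reduced to a (start_time, end_time) tuple; the function name default_migration_interval is hypothetical.

```python
import math


def default_migration_interval(source, dest):
    """Overlap of two (start_time, end_time) existence intervals."""
    # Times count backwards, so the overlap starts at the smaller start_time
    # (the more recent start) and ends at the larger end_time.
    start = min(source[0], dest[0])
    end = max(source[1], dest[1])
    if start <= end:
        raise ValueError("the demes never exist at the same time")
    return start, end


# Example 12: X exists from infinity to 0, and A from 1000 to 0.
deme_x = (math.inf, 0)
deme_a = (1000, 0)
interval = default_migration_interval(deme_a, deme_x)
print(interval)  # (1000, 0)
```

So in Example 12 the migration from A to X lasts for the 1000 generations in which A exists.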
To obtain greater control over when migrations occur,
we can use the start_time and end_time migration properties.
Below we model three periods of continuous migration.
Example 13
time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
migrations:
  - {source: A, dest: B, rate: 1e-4, start_time: 1000, end_time: 800}
  - {source: B, dest: A, rate: 1e-3, start_time: 500, end_time: 200}
  - {source: A, dest: B, rate: 1e-5, start_time: 200, end_time: 0}
Question
What do you think will happen if we list two or more migrations with
time intervals that overlap? Does it matter if the source and dest
demes are the same in each case?
Answer
Overlapping migrations are allowed in general.
But because it’s not clear what the expected behaviour should be when
overlapping migrations are defined with the same source and dest,
it’s an error to define such migrations.
Symmetric migration
It’s common to model migrants in both directions simultaneously,
with the same rate. We could use multiple asymmetric migrations,
but it’s simpler to specify a list of deme names (the migration’s
demes property), instead of a source and dest.
Example 14
time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
migrations:
  - demes: [A, B]
    rate: 1e-4
The demes property can list arbitrarily many demes. For example,
we could define symmetric migration between all pairwise combinations
of four demes.
Example 15
time_units: generations
demes:
  - name: alpha
    epochs:
      - start_size: 1000
  - name: beta
    epochs:
      - start_size: 1000
  - name: gamma
    epochs:
      - start_size: 1000
  - name: delta
    epochs:
      - start_size: 1000
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4
When the following three conditions are satisfied:
- a symmetric migration is defined between more than two demes,
- and the demes’ start and/or end times are not all identical,
- and the migration’s start_time and/or end_time are omitted,
then the time intervals for the migration are resolved separately for
each pair of participating demes. E.g. for the model below, migration
between A and B occurs at all times because both A and B exist for all
time. However, migrations between A and C, and between B and C, are
limited to the most recent 100 generations, because C does not exist
before then.
Example 16
time_units: generations
demes:
  - name: A
    epochs:
      - start_size: 1000
  - name: B
    epochs:
      - start_size: 1000
  - name: C
    start_time: 100
    ancestors: [B]
    epochs:
      - start_size: 1000
migrations:
  - demes: [A, B, C]
    rate: 1e-5
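One way to picture the pairwise resolution for Example 16 is to expand the symmetric migration into one asymmetric migration per ordered pair of demes, each with its own time interval. The Python below is an illustrative sketch only: expand_symmetric is a hypothetical helper, and the existence intervals are typed in by hand rather than read from a Demes file.

```python
import itertools
import math


def expand_symmetric(deme_names, rate, intervals):
    """Expand a symmetric migration into per-pair asymmetric migrations."""
    migrations = []
    # Every ordered pair gets its own migration, so A->B and B->A both appear.
    for source, dest in itertools.permutations(deme_names, 2):
        # Each pair migrates only while both of its demes exist.
        start = min(intervals[source][0], intervals[dest][0])
        end = max(intervals[source][1], intervals[dest][1])
        migrations.append(
            {"source": source, "dest": dest, "rate": rate,
             "start_time": start, "end_time": end}
        )
    return migrations


# Existence intervals (start_time, end_time) of the demes in Example 16.
intervals = {"A": (math.inf, 0), "B": (math.inf, 0), "C": (100, 0)}
migrations = expand_symmetric(["A", "B", "C"], 1e-5, intervals)
for m in migrations:
    print(m["source"], "->", m["dest"], m["start_time"], m["end_time"])
```

The two migrations between A and B start at infinity, whereas the four that involve C start at 100.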
A pulse of admixture
To model migration that is limited to a very short period of time,
we can define one or more pulses.
A pulse has a proportions list property and a sources list property
(analogous to the proportions and ancestors properties of a deme).
Each pulse proportion defines the proportion of the dest deme that is
made up of ancestry from the corresponding source deme at the instant after
the pulse’s time.
Note
The exact duration of a pulse is not defined by the Demes specification. Software which implements a continuous-time model (such as a coalescent simulator) might treat a pulse as occurring instantaneously. In contrast, software which implements a discrete-time model is free to treat the pulse as occurring over a single time step (such as a single generation).
Example 17
time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
pulses:
  - sources: [A]
    dest: B
    proportions: [0.05]
    time: 500
Exercise
The dest deme in an admixture pulse could instead be modelled using
multiple ancestors.
Try doing this for deme B in the model above. What do you think are
the advantages of one approach over the other?
Question
How should one interpret multiple pulses that occur at the same time?
Does it matter whether the sources and dest demes are the same in
each pulse?
Answer
When multiple pulses are specified with the same time,
the migration pulses occur in the order in which they are written.
Consider the following two pulses into deme A at time 100.
pulses:
  - sources: [B]
    dest: A
    time: 100
    proportions: [0.1]
  - sources: [C]
    dest: A
    time: 100
    proportions: [0.2]
The second pulse replaces 20% of A’s ancestry, including 20% of the ancestry
that was inherited from B in the first pulse.
So immediately after time 100, A has 20% ancestry from C but only 8%
ancestry from B.
As this may be confusing, we recommend avoiding the use of multiple pulses
in this way, and instead implementing the model using multiple sources with
the desired final ancestry proportions.
pulses:
  - sources: [B, C]
    dest: A
    time: 100
    proportions: [0.08, 0.2]
More complex models involving multiple simultaneous pulses are possible, but we caution that they can be difficult to reason about.
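The 8% and 20% figures above can be reproduced with a short Python sketch that tracks ancestry fractions in the dest deme. It is a simplification made for illustration: each source is treated as contributing ancestry under its own single label, and the function name apply_pulse is hypothetical.

```python
def apply_pulse(ancestry, sources, proportions):
    """Return the dest deme's ancestry fractions after a single pulse."""
    # Existing ancestry is scaled down by the total incoming proportion.
    retained = 1 - sum(proportions)
    updated = {label: fraction * retained for label, fraction in ancestry.items()}
    # Each source then contributes its own proportion.
    for source, proportion in zip(sources, proportions):
        updated[source] = updated.get(source, 0.0) + proportion
    return updated


# Two separate pulses at the same time, applied in the order written.
sequential = apply_pulse({"A": 1.0}, ["B"], [0.1])
sequential = apply_pulse(sequential, ["C"], [0.2])

# A single pulse with two sources and the final proportions.
combined = apply_pulse({"A": 1.0}, ["B", "C"], [0.08, 0.2])

print(sequential)  # roughly 72% A, 8% B, 20% C
print(combined)    # the same fractions, up to floating-point rounding
```

Both routes end with the same ancestry in A, which is why the single multi-source pulse is the easier form to read.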
Setting defaults
To avoid duplication in a Demes graph with many features, it’s possible to set default values for some properties. Suppose we wish to define multiple demes, each with only one epoch, and each with a constant population size.
Example 18
time_units: generations
defaults:
  # Note: this is spelled "epoch", as distinct from the "epochs" list.
  epoch:
    start_size: 1000
demes:
  - name: alpha
  - name: beta
  - name: gamma
  - name: delta
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4
The epoch defaults can be overridden by providing an explicit value inside the desired epoch.
Example 19
time_units: generations
defaults:
  epoch:
    start_size: 1000
demes:
  - name: alpha
  - name: beta
  - name: gamma
  - name: delta
    epochs:
      - end_time: 500
      - start_size: 200
        end_time: 0
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4
It’s also possible to provide defaults for properties of a deme,
such as the start_time and ancestors.
Example 20
time_units: generations
defaults:
  deme: {start_time: 1000, ancestors: [X]}
  epoch: {start_size: 1000}
demes:
  - name: X
    start_time: .inf
    ancestors: []
    epochs:
      - end_time: 1000
  - name: alpha
  - name: beta
  - name: gamma
  - name: delta
    epochs:
      - end_time: 500
      - start_size: 200
        end_time: 0
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4
Question
How would the model be interpreted if we failed to override the start_time
and ancestors deme defaults for deme X?
Answer
If we didn’t override the default start_time, then X would have
both a start_time and end_time of 1000. This would be invalid,
because (a) there would be no deme with an infinite start_time,
and (b) the time span over which X existed would be zero.
If we didn’t override the default ancestors list, then X would
be in its own ancestors list. This would be invalid, because each ancestor
in the ancestors list must already be defined (earlier in the demes list).
This requirement has the pleasant side-effect that the directed graph of
ancestor/descendant relations cannot have cycles.
Alas, it is not possible to model time travel using Demes.
Defaults for migration and pulse objects may also be specified for
elements of the migrations and pulses lists.
Time units and generation time
In the previous examples, we’ve exclusively set time_units to “generations”.
This is appropriate in many cases, but sometimes other units are preferred.
For example, it’s sometimes more natural to describe times using years,
or even thousands of years. However, most simulation software operates
using generations as the canonical unit of time. In Demes, the time_units
property may be any string, but “generations” is special. If the time_units
are not “generations”, then an additional generation_time property must
be specified. This latter property can then be used by the simulator to
convert from the chosen time_units into units of generations.
Warning
The units for the rate of migrations are always per generation,
even when the time_units are not generations.
Only the various start_time, end_time, and time properties should
match the time_units.
Example 21
description: The Gutenkunst et al. (2009) OOA model.
doi:
  - https://doi.org/10.1371/journal.pgen.1000695
time_units: years
generation_time: 25
demes:
  - name: ancestral
    description: Equilibrium/root population
    epochs:
      - {end_time: 220e3, start_size: 7300}
  - name: AMH
    description: Anatomically modern humans
    ancestors: [ancestral]
    epochs:
      - {end_time: 140e3, start_size: 12300}
  - name: OOA
    description: Bottleneck out-of-Africa population
    ancestors: [AMH]
    epochs:
      - {end_time: 21.2e3, start_size: 2100}
  - name: YRI
    description: Yoruba in Ibadan, Nigeria
    ancestors: [AMH]
    epochs:
      - start_size: 12300
  - name: CEU
    description: Utah Residents (CEPH) with Northern and Western European Ancestry
    ancestors: [OOA]
    epochs:
      - {start_size: 1000, end_size: 29725}
  - name: CHB
    description: Han Chinese in Beijing, China
    ancestors: [OOA]
    epochs:
      - {start_size: 510, end_size: 54090}
migrations:
  - {demes: [YRI, OOA], rate: 25e-5}
  - {demes: [YRI, CEU], rate: 3e-5}
  - {demes: [YRI, CHB], rate: 1.9e-5}
  - {demes: [CEU, CHB], rate: 9.6e-5}
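The conversion a simulator might perform for Example 21 is a single division, sketched below in Python. The function name to_generations is hypothetical, and only time values are converted; migration rates are left alone because they are already per generation.

```python
def to_generations(time, time_units, generation_time=None):
    """Convert a time value from the model's time_units into generations."""
    if time_units == "generations":
        return time
    if generation_time is None:
        raise ValueError(
            "generation_time is required when time_units is not 'generations'"
        )
    return time / generation_time


# The ancestral deme in Example 21 ends 220e3 years ago, with 25-year generations.
split = to_generations(220e3, time_units="years", generation_time=25)
print(split)  # 8800.0
```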
Selfing and cloning
Epochs also have selfing_rate and cloning_rate properties,
which default to 0 if not specified.
Todo
Give examples.