Tutorial#

YAML#

A Demes model is written as a YAML file. A YAML file contains property: value pairs, and Demes imposes restrictions by allowing only properties with specific names, which have specific types of values. YAML files have the extension .yaml, or sometimes .yml. Don’t worry if you’ve never heard of YAML before, as the details of YAML aren’t particularly important. The example below gives an indication of what a Demes file looks like, but we’ll explain each component gradually using additional examples. For now, select the “Drawing” tab to see a diagrammatic overview of the demographic model.

Example 01

YAML

# Comments start with a hash.
description:
  Asymmetric migration between two extant demes.
time_units: generations
defaults:
  epoch:
    start_size: 5000
demes:
  - name: X
    epochs:
      - end_time: 1000
  - name: A
    ancestors: [X]
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
        end_time: 500
      - start_size: 400
        end_size: 10000
migrations:
  - source: A
    dest: B
    rate: 1e-4

Drawing

_images/e96a38c21b1548d3492ce74aad052f0c439db2c183e96c5179e3b445dcb8c613.svg

Note

The Demes specification defines a data model, not the YAML format itself. A demographic model that uses the Demes data model can be easily converted to any desired data-serialisation language.
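To illustrate the note above, here is a minimal sketch in Python that uses only the standard library. It writes a one-deme model as nested dictionaries and lists, then serialises it to JSON rather than YAML. The variable names (`model`, `text`, `round_tripped`) are ours and are not part of Demes.

```python
import json

# A Demes model is plain nested data: mappings, lists, strings and numbers.
# This dictionary mirrors a one-deme model with a constant size of 1000.
model = {
    "time_units": "generations",
    "demes": [
        {"name": "A", "epochs": [{"start_size": 1000}]},
    ],
}

# Serialise to JSON, then parse it back to confirm nothing is lost.
text = json.dumps(model, indent=2)
round_tripped = json.loads(text)
print(text)
```

The same property names and values appear in the JSON output, which is why the choice of serialisation language is a matter of convenience.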

Terminology#

We use the word “Demes” with a capital D to refer to the Demes data model or the Demes specification, and “deme” or “demes” with a lowercase d when we refer to a collection of individuals (a deme) or a property in the Demes data model.

What is a deme?#

A deme is a collection of individuals that share a set of population parameters at any given time. Consider the simplest population model: a single deme with constant population size.

Example 02

YAML

time_units: generations
demes:
  - name: A
    start_time: .inf
    epochs:
      - start_size: 1000
        end_time: 0

Drawing

_images/643e41e8a45e0efca4e6309be560adc056490211bf0a7c4049ded089a28692eb.svg

This deme exists now (zero generations ago), and also exists as far back in time as we’re interested in looking (towards infinity generations ago). In Demes, we say that the deme has an infinite start_time. Select the “YAML” tab above the drawing to see how this model was implemented.

Note

In YAML, infinity is spelled .inf, not inf.

Note

Each Demes model must define at least one deme with an infinite start_time.

Default values#

Having a deme’s start_time property be infinite is very common. So for demes without ancestors, the start_time may be omitted, and will default to .inf. Similarly, the end_time property of the final epoch may be omitted, and will default to 0. The following Demes file describes an equivalent model.

Example 03

YAML

time_units: generations
demes:
  - name: A
    epochs:
      - start_size: 1000

Drawing

_images/817c2217c3e0c5e28bfc267f69399576fd33a1a8e18a2aaad3827daa18fedc79.svg

What is an epoch?#

We partition a deme’s interval of existence into distinct epochs. A deme always has at least one epoch. The population parameters for a deme are allowed to change over time, but are fixed within an epoch. Consider a single deme that suffered a bottleneck 50 generations ago.

Example 04

YAML

time_units: generations
demes:
  - name: B
    start_time: .inf
    epochs:
      - start_size: 1000
        end_time: 50
      - start_size: 200
        end_time: 0

Drawing

_images/e1e528d3511323709468f039f125c5fe9e42c1678dc3fb687ce1edde23a3d2f5.svg

To specify the change in the population size, we’ve introduced a second epoch into the YAML file. We now need three time values to define the epoch boundaries: the start_time of the deme, the end_time of epoch 0, and the end_time of epoch 1. All epochs must be listed in time-descending order (from the past towards the present).

The same model can be written using default values for the deme’s start_time and the final epoch’s end_time.

Example 05

YAML

time_units: generations
demes:
  - name: B
    epochs:
      - start_size: 1000
        end_time: 50
      - start_size: 200

Drawing

_images/722e8b20a3b91445648e4c3b45215af2394c736d92a9f99bf7669867990faaf5.svg

Why don’t epochs have a `start_time` too?#

The start time of an epoch can be inferred indirectly, by looking at the deme’s start_time (for epoch 0), or by looking at the previous epoch’s end_time (for epoch 1, 2, etc.).
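The inference described above can be sketched in a few lines of Python. This is only an illustration, not how any particular Demes implementation works, and the function name `epoch_intervals` is hypothetical.

```python
import math


def epoch_intervals(deme_start_time, end_times):
    """Return (start_time, end_time) pairs, one per epoch, oldest first."""
    intervals = []
    start = deme_start_time
    for end in end_times:
        # Times count backwards from the present, so an epoch must end
        # more recently (at a smaller value) than it starts.
        if not end < start:
            raise ValueError("end_time must be more recent than the epoch's start")
        intervals.append((start, end))
        start = end  # the next epoch starts where this one ends
    return intervals


# The bottleneck model: deme start_time .inf, epoch end_times 50 and 0.
intervals = epoch_intervals(math.inf, [50, 0])
print(intervals)  # [(inf, 50), (50, 0)]
```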

Exponential size changes#

In the previous model, the population size was constant in each epoch. But what if we wanted to model a period of exponential population size growth, or an exponential decay? In any given epoch, there are actually two population size parameters, start_size and end_size, which correspond to the size at the start and end of the epoch. If both parameters have the same value, then the size is constant over the epoch. However, if an epoch’s start_size and end_size properties are not equal, then the epoch is defined to have an exponentially-changing population size over the epoch’s time interval. Two important implications of this system are:

We don’t need to specify a rate parameter for the exponential. If this really is needed, it can be calculated from the start_size, end_size, start_time and end_time values for an epoch.
Infinitely long epochs must have a constant population size. So if a deme has an infinite start_time, then start_size and end_size must be equal for epoch 0.
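As a worked illustration of the first point, the rate can be recovered as r = ln(end_size / start_size) / (start_time - end_time). The Python sketch below assumes a forward-time rate per time unit, which is a sign convention we chose for illustration; other software may define the rate differently. The function names and the example epoch (growth from 1000 to 4000 between 200 and 0 generations ago) are hypothetical.

```python
import math


def growth_rate(start_size, end_size, start_time, end_time):
    """Forward-time exponential rate per time unit (an assumed convention)."""
    if start_size == end_size:
        return 0.0
    if math.isinf(start_time):
        raise ValueError("an infinitely long epoch must have a constant size")
    return math.log(end_size / start_size) / (start_time - end_time)


def size_at(t, start_size, end_size, start_time, end_time):
    """Population size at time t, where end_time <= t <= start_time."""
    r = growth_rate(start_size, end_size, start_time, end_time)
    if r == 0.0:
        return float(start_size)
    # (start_time - t) is how much time has elapsed since the epoch began.
    return start_size * math.exp(r * (start_time - t))


# Hypothetical epoch: 1000 -> 4000 individuals from 200 to 0 generations ago.
r = growth_rate(1000, 4000, 200, 0)
halfway = size_at(100, 1000, 4000, 200, 0)
print(r, halfway)
```

Halfway through this epoch the size has doubled once, and by the end of the epoch it has doubled twice. Asking for the rate of an infinitely long epoch with unequal sizes raises an error, which mirrors the second point.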

To make these ideas more concrete, let’s look at an implementation of the well-known ZigZag model from Schiffels & Durbin (2014).

Example 06

YAML

description: A single population model with epochs of exponential growth and decay.
doi:
  - https://doi.org/10.1038/ng.3015
time_units: generations
demes:
  - name: generic
    epochs:
    - {end_time: 34133.31, start_size: 7156}
    - {end_time: 8533.33, end_size: 71560}
    - {end_time: 2133.33, end_size: 7156}
    - {end_time: 533.33, end_size: 71560}
    - {end_time: 133.33, end_size: 7156}
    - {end_time: 33.333, end_size: 71560}
    - {end_time: 0, end_size: 71560}

Drawing

_images/1b893ac1967041f39865664e8ac691cadf041195e05dd5bb8c19a43c61f4e773.svg

We’ve introduced several new features to implement this model, so let’s step through it from top to bottom.

The doi property is a list of strings corresponding to the DOI(s) for publication(s) in which the model was described. By convention, the elements of the doi list are URLs, but any string value can be used.
A compact-form syntax is used for each epoch in the epochs list (with curly braces { and }), rather than using multiple lines. This is known as “flow style” in YAML parlance.
Epoch 0 has a start_size, but no end_size. Because this epoch has an infinite time span, the population size must be constant, so the epoch’s end_size will be the same as the start_size.
Epoch 1 has an end_size, but no start_size. So the start_size is inherited from the end_size of the previous epoch. This means the end_size and start_size for this epoch are different, and there will be exponential population growth over the epoch.
Epochs 2 through 5 also inherit their start_size from the previous epoch, and in each case these are different from the end_size provided.
Epochs 2 and 4 have exponential decay, whereas epochs 3 and 5 have exponential growth.
The final epoch has a constant size.
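The size-inheritance rules in the list above can be checked with a short Python sketch. It is an illustration under simplifying assumptions (for example, it requires the first epoch to give a start_size), and the helper names `resolve_sizes` and `trend` are hypothetical.

```python
def resolve_sizes(epochs):
    """Fill in missing start_size/end_size values, oldest epoch first."""
    resolved = []
    previous_end = None
    for epoch in epochs:
        # A missing start_size is inherited from the previous epoch's end_size.
        start = epoch.get("start_size", previous_end)
        if start is None:
            raise ValueError("the first epoch needs a start_size in this sketch")
        # A missing end_size means the size is constant over the epoch.
        end = epoch.get("end_size", start)
        resolved.append({**epoch, "start_size": start, "end_size": end})
        previous_end = end
    return resolved


def trend(epoch):
    """Classify a resolved epoch by comparing its two sizes."""
    if epoch["end_size"] > epoch["start_size"]:
        return "growth"
    if epoch["end_size"] < epoch["start_size"]:
        return "decay"
    return "constant"


# The epochs of the ZigZag model, copied from the YAML above.
zigzag = [
    {"end_time": 34133.31, "start_size": 7156},
    {"end_time": 8533.33, "end_size": 71560},
    {"end_time": 2133.33, "end_size": 7156},
    {"end_time": 533.33, "end_size": 71560},
    {"end_time": 133.33, "end_size": 7156},
    {"end_time": 33.333, "end_size": 71560},
    {"end_time": 0, "end_size": 71560},
]
trends = [trend(epoch) for epoch in resolve_sizes(zigzag)]
print(trends)
```

The printed list starts and ends with a constant epoch, with growth and decay alternating in between, matching the description above.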

Warning

Other modelling frameworks may use the terms “epoch” or “epochs” to refer to time intervals that partition an entire model. However, in Demes, epochs are a deme-specific property, and each deme has its own list of epochs which do not apply to the other demes in the model.

Multiple demes#

A split event#

Suppose we’re interested in modelling two demes, A and B. The two demes are related by a common ancestor, from which they split 1000 generations ago. In addition to A and B, we’ll model their common ancestor as an additional deme, X. Here, we introduce the ancestors property of a deme, which is a list of deme names.

Example 07

YAML

time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors:
      - X
    epochs:
      - start_size: 2000
  - name: B
    ancestors:
      - X
    epochs:
      - start_size: 2000

Drawing

_images/540602d4b0a557def71cc683b2db95e240a86cff5c84d522cfe2b48f2d31fa40.svg

When a deme has an ancestor, its start_time does not default to .inf. In this case, the start_time for the deme is inherited from the end_time of the ancestor. That is, in the model above, both A and B have a start_time of 1000 generations ago.

Note

A deme cannot appear in the demes list before its ancestor(s). This means models must be written in a “top down” manner, starting with the ancestral (root) deme(s), and followed by increasingly recent demes.

By convention, we use a more compact form for the ancestors list. Lists can be written more compactly with YAML “flow style”, which uses square brackets ([ and ], with a comma separating list items). The following Demes file describes an equivalent model.

Example 08

YAML

time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000

Drawing

_images/4a7899301a19d0e14628c00d574abdfc0ed435341be4b7015716ec806d179698.svg

A branch event#

An alternative way of modelling a population split is for the ancestral deme to remain alive after the split. We will refer to this as a branch event, rather than a split event. In the model below, deme A has X as an ancestor like the previous model, except here X continues to exist until 0 generations ago (recall that 0 is the default value for the final epoch’s end_time). Now that A’s ancestor exists until 0 generations ago, we must explicitly provide a start_time for A.

Example 09

YAML

time_units: generations
demes:
  - name: X
    epochs:
      - start_size: 2000
  - name: A
    ancestors: [X]
    start_time: 1000
    epochs:
      - start_size: 2000

Drawing

_images/57974efd2df58b5d2edaff6f86dd39e281fa3d81b8ac7dbeb888fe9e46a53dec.svg

Multiple ancestors#

When a deme has multiple ancestors, these appear in the ancestors list as one might expect. But for multiple ancestors we need to also specify the proportion of ancestry inherited from each ancestor. This is done using the deme’s proportions list property. The first proportion in the proportions list is for the first ancestor in the ancestors list, the second proportion is for the second ancestor, and so on. Just like the case of a single ancestor, an ancestor can terminate at the descendant’s start_time, or can instead continue to exist.

Example 10

YAML

time_units: generations
demes:
  - name: X
    epochs:
      - start_size: 2000
  - name: Y
    epochs:
      - start_size: 2000
        end_time: 1000
  - name: A
    ancestors: [X, Y]
    proportions: [0.1, 0.9]
    start_time: 1000
    epochs:
      - start_size: 2000

Drawing

_images/8cc609836b450ae0594867b53d40e7f44084d0664d299a6274820a74fbbb0912.svg

Note

With multiple ancestors, the start_time of the descendant deme does not default to the end_time of any of its ancestors. So the start_time must always be specified for a deme with multiple ancestors.
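The start_time rules covered so far (no ancestors, one ancestor, a branch event, and multiple ancestors) can be summarised in a Python sketch. It works on plain dictionaries shaped like the YAML examples. The function name `resolve_start_times` is hypothetical, and this is an illustration rather than the behaviour of any particular Demes implementation.

```python
import math


def resolve_start_times(demes):
    """Return {name: start_time}, applying the defaults described above."""
    start_times = {}
    end_times = {}
    for deme in demes:
        name = deme["name"]
        ancestors = deme.get("ancestors", [])
        # A deme cannot appear before its ancestors in the demes list.
        for ancestor in ancestors:
            if ancestor not in end_times:
                raise ValueError(f"ancestor {ancestor!r} must be listed before {name!r}")
        if "start_time" in deme:
            start = deme["start_time"]
        elif not ancestors:
            start = math.inf
        elif len(ancestors) == 1 and end_times[ancestors[0]] > 0:
            # Split event: inherit the single ancestor's end_time.
            start = end_times[ancestors[0]]
        else:
            # Branch events and multiple ancestors need an explicit value.
            raise ValueError(f"deme {name!r} needs an explicit start_time")
        start_times[name] = start
        # The final epoch's end_time defaults to 0.
        end_times[name] = deme["epochs"][-1].get("end_time", 0)
    return start_times


split_model = [
    {"name": "X", "epochs": [{"end_time": 1000, "start_size": 2000}]},
    {"name": "A", "ancestors": ["X"], "epochs": [{"start_size": 2000}]},
    {"name": "B", "ancestors": ["X"], "epochs": [{"start_size": 2000}]},
]
split_times = resolve_start_times(split_model)
print(split_times)  # {'X': inf, 'A': 1000, 'B': 1000}
```

Removing the explicit start_time from the branch or multiple-ancestor examples would make this sketch raise an error, because there is no single ancestor end_time to inherit.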

A pulse of admixture#

To model migration that is limited to a very short period of time, we can define one or more pulses. A pulse has a proportions list property and a sources list property (analogous to the proportions and ancestors properties of a deme). Each pulse proportion defines the proportion of the dest deme that is made up of ancestry from the corresponding source deme at the instant after the pulse’s time.

Note

The exact duration of a pulse is not defined by the Demes specification. Software which implements a continuous-time model (such as a coalescent simulator) might treat a pulse as occurring instantaneously. In contrast, software which implements a discrete-time model is free to treat the pulse as occurring over a single time step (such as a single generation).

Example 17

YAML

time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
pulses:
  - sources: [A]
    dest: B
    proportions: [0.05]
    time: 500

Drawing

_images/59e6184c19738e1d95d059feeeaa11e466475b3c909b2f8b7f788d59d94acb67.svg

Exercise

The dest deme in an admixture pulse could instead be modelled using multiple ancestors. Try doing this for deme B in the model above. What do you think are the advantages of one approach over the other?

Question

How should one interpret multiple pulses that occur at the same time? Does it matter whether the sources and dest demes are the same in each pulse?

Answer

When multiple pulses are specified with the same time, the migration pulses occur in the order in which they are written. Consider the following two pulses into deme A at time 100.

pulses:
- sources: [B]
  dest: A
  time: 100
  proportions: [0.1]
- sources: [C]
  dest: A
  time: 100
  proportions: [0.2]

The second pulse replaces 20% of A’s ancestry, including 20% of the ancestry that was inherited from B in the first pulse. So immediately after time 100, A has 20% ancestry from C but only 8% ancestry from B. As this may be confusing, we recommend avoiding the use of multiple pulses in this way, and instead implementing the model using multiple sources with the desired final ancestry proportions.

pulses:
- sources: [B, C]
  dest: A
  time: 100
  proportions: [0.08, 0.2]

More complex models involving multiple simultaneous pulses are possible, but we caution that they can be difficult to reason about.
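The arithmetic in the answer above can be verified with a small Python sketch. It tracks the fraction of deme A’s ancestry attributed to each deme and applies pulses in the order written. The function name `apply_pulse` and the bookkeeping scheme are our own illustration, not part of Demes.

```python
def apply_pulse(ancestry, sources, proportions):
    """Return the dest deme's ancestry fractions just after one pulse."""
    # Existing ancestry is diluted by whatever the pulse brings in.
    remaining = 1.0 - sum(proportions)
    updated = {name: fraction * remaining for name, fraction in ancestry.items()}
    for source, proportion in zip(sources, proportions):
        updated[source] = updated.get(source, 0.0) + proportion
    return updated


before = {"A": 1.0}

# Two pulses at the same time, applied in the order they are written.
sequential = apply_pulse(apply_pulse(before, ["B"], [0.1]), ["C"], [0.2])

# One pulse with two sources and the desired final proportions.
combined = apply_pulse(before, ["B", "C"], [0.08, 0.2])

print({name: round(fraction, 4) for name, fraction in sequential.items()})
print({name: round(fraction, 4) for name, fraction in combined.items()})
```

Both versions leave A with 8% ancestry from B and 20% from C, with the remaining 72% being A’s original ancestry.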

Setting defaults#

To avoid duplication in a Demes graph with many features, it’s possible to set default values for some properties. Suppose we wish to define multiple demes, each with only one epoch, and each with a constant population size.

Example 18

YAML

time_units: generations
defaults:
  # Note: this is spelled "epoch", as distinct from the "epochs" list.
  epoch:
    start_size: 1000
demes:
  - name: alpha
  - name: beta
  - name: gamma
  - name: delta
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4

Drawing

_images/6775cc662cca4cdb050accfb752054e5fbb2bc4f3bfcdd79da8071e5d6ac4a04.svg

The epoch defaults can be overridden by providing an explicit value inside the desired epoch.

Example 19

YAML

time_units: generations
defaults:
  epoch:
    start_size: 1000
demes:
  - name: alpha
  - name: beta
  - name: gamma
  - name: delta
    epochs:
      - end_time: 500
      - start_size: 200
        end_time: 0
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4

Drawing

_images/97a6c5240308fe147cb8b799f9858a81cbbcd021f7c5bd76fe0c5cda61b640fe.svg
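The override behaviour can be pictured as a dictionary merge in which explicit values win over defaults. The Python sketch below is an illustration only. The function name `apply_epoch_defaults` is hypothetical, and it assumes that a deme listed without an epochs list receives a single epoch built from the defaults.

```python
def apply_epoch_defaults(deme, epoch_defaults):
    """Return the deme's epochs with defaults filled in; explicit values win."""
    # Assumption: a deme without an epochs list gets one epoch.
    epochs = deme.get("epochs") or [{}]
    # Later keys override earlier ones, so the epoch's own values take priority.
    return [{**epoch_defaults, **epoch} for epoch in epochs]


defaults = {"start_size": 1000}
alpha = {"name": "alpha"}
delta = {
    "name": "delta",
    "epochs": [{"end_time": 500}, {"start_size": 200, "end_time": 0}],
}

alpha_epochs = apply_epoch_defaults(alpha, defaults)
delta_epochs = apply_epoch_defaults(delta, defaults)
print(alpha_epochs)
print(delta_epochs)
```

Here alpha ends up with one epoch of size 1000, while delta keeps the default size in its first epoch and uses its explicit start_size of 200 in the second.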

It’s also possible to provide defaults for properties of a deme, such as the start_time and ancestors.

Example 20

YAML

time_units: generations
defaults:
  deme: {start_time: 1000, ancestors: [X]}
  epoch: {start_size: 1000}
demes:
  - name: X
    start_time: .inf
    ancestors: []
    epochs:
      - end_time: 1000
  - name: alpha
  - name: beta
  - name: gamma
  - name: delta
    epochs:
      - end_time: 500
      - start_size: 200
        end_time: 0
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4

Drawing

_images/c428e006db89b64bf7954f160852d9bb0c2441e180ad092d99c1fa463cbdd058.svg

Question

How would the model be interpreted if we failed to override the start_time and ancestors deme defaults for deme X?

Answer

If we didn’t override the default start_time, then X would have both a start_time and end_time of 1000. This would be invalid, because (a) there would be no deme with an infinite start_time, and (b) the time span over which X existed would be zero.

If we didn’t override the default ancestors list, then X would be in its own ancestors list. This would be invalid, because each ancestor in the ancestors list must already be defined (earlier in the demes list). This requirement has the pleasant side-effect that the directed graph of ancestor/descendant relations cannot have cycles. Alas, it is not possible to model time travel using Demes.
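Both problems in the answer above can be reproduced with a Python sketch that merges the deme defaults into X and then runs two simple checks. The helper names `apply_deme_defaults` and `problems` are hypothetical, and the checks cover only the two conditions discussed here.

```python
import math


def apply_deme_defaults(deme, deme_defaults):
    """Merge deme-level defaults into a deme; explicit values win."""
    return {**deme_defaults, **deme}


def problems(deme, already_defined):
    """List the issues discussed above for a deme with defaults applied."""
    found = []
    end_time = deme["epochs"][-1].get("end_time", 0)
    if not deme["start_time"] > end_time:
        found.append("zero time span")
    for ancestor in deme["ancestors"]:
        if ancestor not in already_defined:
            found.append(f"ancestor {ancestor} not yet defined")
    return found


deme_defaults = {"start_time": 1000, "ancestors": ["X"]}
epochs = [{"end_time": 1000}]

# X is the first deme in the list, so no other deme is defined yet.
x_without = apply_deme_defaults({"name": "X", "epochs": epochs}, deme_defaults)
x_with = apply_deme_defaults(
    {"name": "X", "start_time": math.inf, "ancestors": [], "epochs": epochs},
    deme_defaults,
)
print(problems(x_without, already_defined=set()))
print(problems(x_with, already_defined=set()))
```

Without the overrides, the sketch reports both the zero time span and the undefined ancestor. With them, it reports no problems.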

Defaults for migration and pulse objects may also be specified for elements of the migrations and pulses lists.

Time units and generation time#

In the previous examples, we’ve exclusively set time_units to “generations”. This is appropriate in many cases, but sometimes other units are preferred. For example, it’s sometimes more natural to describe times using years, or even thousands of years. However, most simulation software operates using generations as the canonical unit of time. In Demes, the time_units property may be any string, but “generations” is special. If the time_units are not “generations”, then an additional generation_time property must be specified. This latter property can then be used by the simulator to convert from the chosen time_units into units of generations.

Warning

The units for the rate of migrations are always per generation, even when the time_units are not generations. Only the various start_time, end_time, and time properties should match the time_units.

Example 21

YAML

description: The Gutenkunst et al. (2009) OOA model.
doi:
- https://doi.org/10.1371/journal.pgen.1000695
time_units: years
generation_time: 25

demes:
- name: ancestral
  description: Equilibrium/root population
  epochs:
  - {end_time: 220e3, start_size: 7300}
- name: AMH
  description: Anatomically modern humans
  ancestors: [ancestral]
  epochs:
  - {end_time: 140e3, start_size: 12300}
- name: OOA
  description: Bottleneck out-of-Africa population
  ancestors: [AMH]
  epochs:
  - {end_time: 21.2e3, start_size: 2100}
- name: YRI
  description: Yoruba in Ibadan, Nigeria
  ancestors: [AMH]
  epochs:
  - start_size: 12300
- name: CEU
  description: Utah Residents (CEPH) with Northern and Western European Ancestry
  ancestors: [OOA]
  epochs:
  - {start_size: 1000, end_size: 29725}
- name: CHB
  description: Han Chinese in Beijing, China
  ancestors: [OOA]
  epochs:
  - {start_size: 510, end_size: 54090}

migrations:
- {demes: [YRI, OOA], rate: 25e-5}
- {demes: [YRI, CEU], rate: 3e-5}
- {demes: [YRI, CHB], rate: 1.9e-5}
- {demes: [CEU, CHB], rate: 9.6e-5}

Drawing

_images/b696b3e2579da6f84569a4b471864073d1a17f9a219cdbc69a4585bbeda93d9d.svg
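As a rough illustration of the conversion, the Python sketch below divides the end_time values from the model above by the generation_time. Treating the conversion as a simple division is our assumption about how a simulator might use generation_time, and the function name `to_generations` is hypothetical.

```python
def to_generations(time, generation_time):
    """Convert a time in the model's time_units into generations."""
    return time / generation_time


generation_time = 25  # years per generation, as in the model above

# end_time values in years, copied from the model above.
end_times_years = {"ancestral": 220e3, "AMH": 140e3, "OOA": 21.2e3}
end_times_generations = {
    name: to_generations(time, generation_time)
    for name, time in end_times_years.items()
}
print(end_times_generations)

# Migration rates are already per generation, so they are not converted.
yri_ooa_rate = 25e-5
```

For example, the ancestral deme’s end_time of 220,000 years corresponds to 8,800 generations, while the migration rates are used as written.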

Selfing and cloning#

Epochs also have selfing_rate and cloning_rate properties, which default to 0 if not specified.

