Tutorial

YAML

A Demes model is written as a YAML file. A YAML file contains property: value pairs, and Demes restricts these to properties with specific names, each of which takes a specific type of value. YAML files have the extension .yaml, or sometimes .yml. Don’t worry if you’ve never heard of YAML before, as the details of YAML aren’t particularly important. The example below gives an indication of what a Demes file looks like, and we’ll explain each component gradually using additional examples.

Example 01

# Comments start with a hash.
description:
  Asymmetric migration between two extant demes.
time_units: generations
defaults:
  epoch:
    start_size: 5000
demes:
  - name: X
    epochs:
      - end_time: 1000
  - name: A
    ancestors: [X]
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
        end_time: 500
      - start_size: 400
        end_size: 10000
migrations:
  - source: A
    dest: B
    rate: 1e-4

Note

The Demes specification defines a data model, not the YAML format itself. A demographic model that uses the Demes data model can be easily converted to any desired data-serialisation language.

Terminology

We use the word “Demes” with a capital D to refer to the Demes data model or the Demes specification, and “deme” or “demes” with a lowercase d when we refer to a collection of individuals (a deme) or a property in the Demes data model.

What is a deme?

A deme is a collection of individuals that share a set of population parameters at any given time. Consider the simplest population model: a single deme with constant population size.

Example 02

time_units: generations
demes:
  - name: A
    start_time: .inf
    epochs:
      - start_size: 1000
        end_time: 0

This deme exists now (zero generations ago), and also exists as far back in time as we’re interested in looking (towards infinity generations ago). In Demes, we say that the deme has an infinite start_time, as shown in the YAML above.

Note

In YAML, infinity is spelled .inf, not inf.

Note

Each Demes model must define at least one deme with an infinite start_time.

Default values

An infinite start_time is very common, so for demes without ancestors the start_time may be omitted, and will default to .inf. Similarly, the end_time property of the final epoch may be omitted, and will default to 0. The following Demes file describes a model equivalent to the previous one.

Example 03

time_units: generations
demes:
  - name: A
    epochs:
      - start_size: 1000

What is an epoch?

We partition a deme’s interval of existence into distinct epochs. A deme always has at least one epoch. The population parameters for a deme are allowed to change over time, but are fixed within an epoch. Consider a single deme that suffered a bottleneck 50 generations ago.

Example 04

time_units: generations
demes:
  - name: B
    start_time: .inf
    epochs:
      - start_size: 1000
        end_time: 50
      - start_size: 200
        end_time: 0

To specify the change in the population size, we’ve introduced a second epoch into the YAML file. We now need three time values to define the epoch boundaries: the start_time of the deme, the end_time of epoch 0, and the end_time of epoch 1. All epochs must be listed in time-descending order (from the past towards the present).

The same model can be written using default values for the deme’s start_time and the final epoch’s end_time.

Example 05

time_units: generations
demes:
  - name: B
    epochs:
      - start_size: 1000
        end_time: 50
      - start_size: 200

Why don’t epochs have a start_time too?

The start time of an epoch can be inferred indirectly, by looking at the deme’s start_time (for epoch 0), or by looking at the previous epoch’s end_time (for epoch 1, 2, etc.).
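As an illustration, here is the two-epoch bottleneck model from Example 05 again, annotated with the start time implied for each epoch. The comments are explanatory only and are not part of the Demes data model.

```yaml
time_units: generations
demes:
  - name: B
    # start_time is omitted and the deme has no ancestors,
    # so the deme's start_time defaults to .inf.
    epochs:
      # Epoch 0 starts at the deme's start_time (.inf) and ends at 50.
      - start_size: 1000
        end_time: 50
      # Epoch 1 starts at the previous epoch's end_time (50)
      # and ends at the default end_time of 0.
      - start_size: 200
```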

Exponential size changes

In the previous model, the population size was constant in each epoch. But what if we wanted to model a period of exponential population size growth, or an exponential decay? In any given epoch, there are actually two population size parameters, start_size and end_size, which correspond to the size at the start and end of the epoch. If both parameters have the same value, then the size is constant over the epoch. However, if an epoch’s start_size and end_size properties are not equal, then the epoch is defined to have an exponentially-changing population size over the epoch’s time interval. Two important implications of this system are:

  • We don’t need to specify a rate parameter for the exponential. If this really is needed, it can be calculated from the start_size, end_size, start_time and end_time values for an epoch.

  • Infinitely long epochs must have a constant population size. So if a deme has an infinite start_time, then start_size and end_size must be equal for epoch 0.
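If the rate is needed, one way to recover it is sketched below. The symbols are our own notation rather than part of the Demes specification: N_s and N_e are the epoch’s start_size and end_size, T_s and T_e are its start and end times (measured as time ago, so T_s > T_e), and r is the growth rate per time unit, forwards in time.

```latex
N(t) = N_s \exp\bigl(r\,(T_s - t)\bigr),
\qquad
r = \frac{\ln(N_e / N_s)}{T_s - T_e},
\qquad
T_e \le t \le T_s
```

For a hypothetical epoch that grows from 1000 to 2000 individuals over 100 generations, this gives r = ln(2) / 100, or roughly 0.0069 per generation. A positive r corresponds to growth and a negative r to decay. The expression has no meaningful value when T_s is infinite and the two sizes differ, which is consistent with the requirement that infinitely long epochs have a constant size.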

To make these ideas more concrete, let’s look at an implementation of the well-known ZigZag model from Schiffels & Durbin (2014).

Example 06

description: A single population model with epochs of exponential growth and decay.
doi:
  - https://doi.org/10.1038/ng.3015
time_units: generations
demes:
  - name: generic
    epochs:
    - {end_time: 34133.31, start_size: 7156}
    - {end_time: 8533.33, end_size: 71560}
    - {end_time: 2133.33, end_size: 7156}
    - {end_time: 533.33, end_size: 71560}
    - {end_time: 133.33, end_size: 7156}
    - {end_time: 33.333, end_size: 71560}
    - {end_time: 0, end_size: 71560}

We’ve introduced several new features to implement this model, so let’s step through it from top to bottom.

  • The doi property is a list of strings corresponding to the DOI(s) for publication(s) in which the model was described. By convention, the elements of the doi list are URLs, but any string value can be used.

  • A compact-form syntax is used for each epoch in the epochs list (with curly braces { and }), rather than using multiple lines. This is known as “flow style” in YAML parlance.

  • Epoch 0 has a start_size, but no end_size. Because this epoch has an infinite time span, the population size must be constant, so the epoch’s end_size will be the same as the start_size.

  • Epoch 1 has an end_size, but no start_size. So the start_size is inherited from the end_size of the previous epoch. This means the end_size and start_size for this epoch are different, and there will be exponential population growth over the epoch.

  • Epochs 2 through 5 also inherit their start_size from the previous epoch, and in each case these are different from the end_size provided.

  • Epochs 2 and 4 have exponential decay, whereas epochs 3 and 5 have exponential growth.

  • The final epoch has a constant size.

Warning

Other modelling frameworks may use the terms “epoch” or “epochs” to refer to time intervals that partition an entire model. However, in Demes, epochs are a deme-specific property, and each deme has its own list of epochs which do not apply to the other demes in the model.

Multiple demes

A split event

Suppose we’re interested in modelling two demes, A and B. The two demes are related by a common ancestor, from which they split 1000 generations ago. In addition to A and B, we’ll model their common ancestor as an additional deme, X. Here, we introduce the ancestors property of a deme, which is a list of deme names.

Example 07

time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors:
      - X
    epochs:
      - start_size: 2000
  - name: B
    ancestors:
      - X
    epochs:
      - start_size: 2000

When a deme has an ancestor, its start_time does not default to .inf. Instead, the start_time for the deme is inherited from the end_time of the ancestor. That is, in the model above, both A and B have a start_time of 1000 generations ago.

Note

A deme cannot appear in the demes list before its ancestor(s). This means models must be written in a “top down” manner, starting with the ancestral (root) deme(s), and followed by increasingly recent demes.

By convention, we use a more compact form for the ancestors list. Lists can be written more compactly with YAML “flow style”, which uses square brackets ([ and ], with a comma separating list items). The following Demes file describes an equivalent model.

Example 08

time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000

A branch event

An alternative way of modelling a population split is for the ancestral deme to remain alive after the split. We will refer to this as a branch event, rather than a split event. In the model below, deme A has X as an ancestor, as in the previous model, except here X continues to exist until 0 generations ago (recall that 0 is the default value for the final epoch’s end_time). Because A’s ancestor exists until 0 generations ago, A’s start_time can no longer be inherited from the ancestor’s end_time, so we must explicitly provide a start_time for A.

Example 09

time_units: generations
demes:
  - name: X
    epochs:
      - start_size: 2000
  - name: A
    ancestors: [X]
    start_time: 1000
    epochs:
      - start_size: 2000

Multiple ancestors

When a deme has multiple ancestors, these appear in the ancestors list as one might expect. But for multiple ancestors we need to also specify the proportion of ancestry inherited from each ancestor. This is done using the deme’s proportions list property. The first proportion in the proportions list is for the first ancestor in the ancestors list, the second proportion is for the second ancestor, and so on. Just like the case of a single ancestor, an ancestor can terminate at the descendant’s start_time, or can instead continue to exist.

Example 10

time_units: generations
demes:
  - name: X
    epochs:
      - start_size: 2000
  - name: Y
    epochs:
      - start_size: 2000
        end_time: 1000
  - name: A
    ancestors: [X, Y]
    proportions: [0.1, 0.9]
    start_time: 1000
    epochs:
      - start_size: 2000

Note

With multiple ancestors, the start_time of the descendant deme does not default to the end_time of any of its ancestors. So the start_time must always be specified for a deme with multiple ancestors.

Continuous migration

Let’s again consider a model with demes A and B, which are related via a common ancestor X (a split event).

Asymmetric migration

To define continuous migration from A to B, we’ll add an entry to the migrations list. Concretely, we are modelling migrants born in deme A, the source deme, who (potentially) have offspring in deme B, the dest deme.

Example 11

time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
migrations:
  - source: A
    dest: B
    rate: 1e-4

The migrations here occur at a rate of 1e-4 per generation, and occur continuously over the lifetime of both A and B.

Warning

Migration rate units are always “per generation” regardless of the chosen value for time_units (described later).

In the example above, A and B have identical existence time intervals. If the source and dest demes do not have identical start or end times, then by default the migration will occur over the period of time when both demes exist simultaneously.

Example 12

time_units: generations
demes:
  - name: X
    epochs:
      - start_size: 2000
  - name: A
    ancestors: [X]
    start_time: 1000
    epochs:
      - start_size: 2000
migrations:
  - source: A
    dest: X
    rate: 1e-4

To obtain greater control over when migrations occur, we can use the start_time and end_time migration properties. Below we model three periods of continuous migration.

Example 13

time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
migrations:
  - {source: A, dest: B, rate: 1e-4, start_time: 1000, end_time: 800}
  - {source: B, dest: A, rate: 1e-3, start_time: 500, end_time: 200}
  - {source: A, dest: B, rate: 1e-5, start_time: 200, end_time: 0}

Question

What do you think will happen if we list two or more migrations with time intervals that overlap? Does it matter if the source and dest demes are the same in each case?

Symmetric migration

It’s common to model migrants in both directions simultaneously, with the same rate. We could use multiple asymmetric migrations, but it’s simpler to specify a list of deme names (the migration’s demes property), instead of a source and dest.

Example 14

time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
migrations:
  - demes: [A, B]
    rate: 1e-4
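For comparison, here is a sketch of the longer form that the demes property replaces: the same model as Example 14, written with one asymmetric migration per direction. We expect the two forms to describe the same model.

```yaml
time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
migrations:
  # One entry per direction, each with the same rate.
  - {source: A, dest: B, rate: 1e-4}
  - {source: B, dest: A, rate: 1e-4}
```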

The demes property can list arbitrarily many demes. For example, we could define symmetric migration between all pairwise combinations of four demes.

Example 15

time_units: generations
demes:
  - name: alpha
    epochs: 
      - start_size: 1000
  - name: beta
    epochs: 
      - start_size: 1000
  - name: gamma
    epochs: 
      - start_size: 1000
  - name: delta
    epochs: 
      - start_size: 1000
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4

When the following three conditions are satisfied:

  • a symmetric migration is defined between more than two demes,

  • and the demes’ start and/or end times are not all identical,

  • and the migration’s start_time and/or end_time are omitted,

then the time intervals for the migration are resolved separately for each pair of participating demes. E.g. for the model below, migration between A and B occurs at all times because both A and B exist for all time. However, migrations between A and C, and between B and C, are limited to the period of time after 100 generations ago, because C does not exist before then.

Example 16

time_units: generations
demes:
- name: A
  epochs:
    - start_size: 1000
- name: B
  epochs:
    - start_size: 1000
- name: C
  start_time: 100
  ancestors: [B]
  epochs:
    - start_size: 1000
migrations:
- demes: [A, B, C]
  rate: 1e-5
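To make the pairwise resolution explicit, the sketch below rewrites the migrations of Example 16 as one symmetric migration per pair of demes, with the time bounds written out. This is our own expansion for illustration. The start_time values simply state the defaults that would otherwise be inferred.

```yaml
time_units: generations
demes:
- name: A
  epochs:
    - start_size: 1000
- name: B
  epochs:
    - start_size: 1000
- name: C
  start_time: 100
  ancestors: [B]
  epochs:
    - start_size: 1000
migrations:
# A and B both exist for all time, so no time bounds are needed.
- {demes: [A, B], rate: 1e-5}
# C exists only from 100 generations ago onwards.
- {demes: [A, C], rate: 1e-5, start_time: 100}
- {demes: [B, C], rate: 1e-5, start_time: 100}
```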

A pulse of admixture

To model migration that is limited to a very short period of time, we can define one or more pulses. A pulse has a proportions list property and a sources list property (analogous to the proportions and ancestors properties of a deme). Each pulse proportion defines the proportion of the dest deme that is made up of ancestry from the corresponding source deme at the instant after the pulse’s time.

Note

The exact duration of a pulse is not defined by the Demes specification. Software which implements a continuous-time model (such as a coalescent simulator) might treat a pulse as occurring instantaneously. In contrast, software which implements a discrete-time model is free to treat the pulse as occurring over a single time step (such as a single generation).

Example 17

time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
pulses:
  - sources: [A]
    dest: B
    proportions: [0.05]
    time: 500
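Because sources and proportions are both lists, a single pulse can also draw on more than one source deme. The sketch below extends the model above with a hypothetical third deme C. The deme and the proportions are chosen only for illustration.

```yaml
time_units: generations
demes:
  - name: X
    epochs:
      - end_time: 1000
        start_size: 2000
  - name: A
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: B
    ancestors: [X]
    epochs:
      - start_size: 2000
  - name: C
    ancestors: [X]
    epochs:
      - start_size: 2000
pulses:
  # The first proportion belongs to the first source, and so on.
  # Immediately after the pulse, 5% of B's ancestry comes from A,
  # 10% comes from C, and the remaining 85% comes from B itself.
  - sources: [A, C]
    dest: B
    proportions: [0.05, 0.1]
    time: 500
```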

Exercise

The dest deme in an admixture pulse could instead be modelled using multiple ancestors. Try doing this for deme B in the model above. What do you think are the advantages of one approach over the other?

Question

How should one interpret multiple pulses that occur at the same time? Does it matter whether the sources and dest demes are the same in each pulse?

Setting defaults

To avoid duplication in a Demes graph with many features, it’s possible to set default values for some properties. Suppose we wish to define multiple demes, each with only one epoch, and each with a constant population size.

Example 18

time_units: generations
defaults:
  # Note: this is spelled "epoch", as distinct from the "epochs" list.
  epoch:
    start_size: 1000
demes:
  - name: alpha
  - name: beta
  - name: gamma
  - name: delta
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4

The epoch defaults can be overridden by providing an explicit value inside the desired epoch.

Example 19

time_units: generations
defaults:
  epoch:
    start_size: 1000
demes:
  - name: alpha
  - name: beta
  - name: gamma
  - name: delta
    epochs:
      - end_time: 500
      - start_size: 200
        end_time: 0
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4

It’s also possible to provide defaults for properties of a deme, such as the start_time and ancestors.

Example 20

time_units: generations
defaults:
  deme: {start_time: 1000, ancestors: [X]}
  epoch: {start_size: 1000}
demes:
  - name: X
    start_time: .inf
    ancestors: []
    epochs:
      - end_time: 1000
  - name: alpha
  - name: beta
  - name: gamma
  - name: delta
    epochs:
      - end_time: 500
      - start_size: 200
        end_time: 0
migrations:
  - demes: [alpha, beta, gamma, delta]
    rate: 1e-4

Question

How would the model be interpreted if we failed to override the start_time and ancestors deme defaults for deme X?

Defaults may also be specified for migration and pulse objects. These apply to the elements of the migrations and pulses lists, in the same way that epoch and deme defaults apply to epochs and demes.
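A minimal sketch of such defaults is shown below, assuming they live under migration and pulse keys within defaults, by analogy with the epoch and deme keys used above. All values are hypothetical and chosen only for illustration.

```yaml
time_units: generations
defaults:
  epoch: {start_size: 1000}
  # Every migration uses this rate unless it sets its own.
  migration: {rate: 1e-4}
  # Every pulse comes from alpha with this proportion unless overridden.
  pulse: {sources: [alpha], proportions: [0.05]}
demes:
  - name: alpha
  - name: beta
  - name: gamma
migrations:
  # Uses the default rate.
  - demes: [alpha, beta]
  # Overrides the default rate.
  - demes: [beta, gamma]
    rate: 1e-5
pulses:
  - {dest: beta, time: 100}
  - {dest: gamma, time: 50}
```

As with the epoch defaults, an explicit value inside a migration or pulse takes precedence over the default.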

Time units and generation time

In the previous examples, we’ve exclusively set time_units to “generations”. This is appropriate in many cases, but sometimes other units are preferred. For example, it’s sometimes more natural to describe times using years, or even thousands of years. However, most simulation software operates using generations as the canonical unit of time. In Demes, the time_units property may be any string, but “generations” is special. If the time_units are not “generations”, then an additional generation_time property must be specified. This latter property can then be used by the simulator to convert from the chosen time_units into units of generations.

Warning

The units for the rate of migrations are always per generation, even when the time_units are not generations. Only the various start_time, end_time, and time properties should match the time_units.

Example 21

description: The Gutenkunst et al. (2009) OOA model.
doi:
- https://doi.org/10.1371/journal.pgen.1000695
time_units: years
generation_time: 25

demes:
- name: ancestral
  description: Equilibrium/root population
  epochs:
  - {end_time: 220e3, start_size: 7300}
- name: AMH
  description: Anatomically modern humans
  ancestors: [ancestral]
  epochs:
  - {end_time: 140e3, start_size: 12300}
- name: OOA
  description: Bottleneck out-of-Africa population
  ancestors: [AMH]
  epochs:
  - {end_time: 21.2e3, start_size: 2100}
- name: YRI
  description: Yoruba in Ibadan, Nigeria
  ancestors: [AMH]
  epochs:
  - start_size: 12300
- name: CEU
  description: Utah Residents (CEPH) with Northern and Western European Ancestry
  ancestors: [OOA]
  epochs:
  - {start_size: 1000, end_size: 29725}
- name: CHB
  description: Han Chinese in Beijing, China
  ancestors: [OOA]
  epochs:
  - {start_size: 510, end_size: 54090}

migrations:
- {demes: [YRI, OOA], rate: 25e-5}
- {demes: [YRI, CEU], rate: 3e-5}
- {demes: [YRI, CHB], rate: 1.9e-5}
- {demes: [CEU, CHB], rate: 9.6e-5}
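The conversion into generations can be sketched as a simple division. The symbols are our own notation: t is a time in the chosen time_units, g is the generation_time, and t_gen is the same time in generations. How a particular simulator performs or rounds this conversion is up to that software.

```latex
t_{\mathrm{gen}} = \frac{t}{g},
\qquad \text{e.g.} \qquad
\frac{220 \times 10^{3}\ \text{years}}{25\ \text{years per generation}} = 8800\ \text{generations}
```

Applied to the model above, the end_time values of 220e3, 140e3 and 21.2e3 years correspond to 8800, 5600 and 848 generations. The migration rates are left unchanged, because they are already per generation.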

Selfing and cloning

Epochs also have selfing_rate and cloning_rate properties, which default to 0 if not specified.
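A minimal sketch of how these properties might be set is shown below. The rates are hypothetical and chosen only for illustration. Because both are epoch properties, they can differ from one epoch to the next.

```yaml
time_units: generations
demes:
  - name: A
    epochs:
      # Older epoch: some selfing, no cloning (cloning_rate defaults to 0).
      - start_size: 1000
        end_time: 100
        selfing_rate: 0.2
      # Recent epoch: some cloning, no selfing (selfing_rate defaults to 0).
      - start_size: 1000
        cloning_rate: 0.1
```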
