Specification#

Introduction#

Demes is a specification for describing population genetic models of demographic history. This specification precisely defines the population genetics model and its assumptions, along with the data model used for interchange and the required behaviour of implementations.

The Demes standard is largely agnostic to the processes that occur within populations, and provides a minimal set of parameters that can accommodate a wide spectrum of population genetic models.

Who is this specification for?

This specification is intended to provide a detailed and definitive resource for the following groups:

  • Those implementing support for Demes as an input or output format in their programs

  • Those implementing a Demes parser

As such, this specification contains a lot of detail that is not interesting to most users. If you wish to learn how to understand and create your own Demes models, please see the Tutorial instead.

Note to Readers#

To provide feedback on this specification, please use the issue tracker.

Conventions and Terminology#

The term “Demes” in this document is to be interpreted as a reference to this specification. A deme refers to a set of individuals that can be modelled by a fixed set of parameters; to avoid confusion with the name of the specification we will usually use the term “population”, on the understanding that the terms are equivalent for the purposes of this document.

Todo

Define the human and machine data models. And link to assumptions.

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

The terms “JSON”, “JSON text”, “JSON value”, “member”, “element”, “object”, “array”, “number”, “string”, “boolean”, “true”, “false”, and “null” in this document are to be interpreted as defined in RFC 8259.

The term “JSON Schema” in this document is to be interpreted as defined in the JSON Schema core specification.

Infinity#

JSON does not define an encoding for infinite-valued numbers. However, infinite values are used in Demes for start times. When writing a Demes model to a format that does not permit infinity, such as JSON, the string “Infinity” must be used to encode infinity. When writing to formats that do support infinity, such as YAML, the native encoding for infinity should be used instead (.inf in YAML). When reading a Demes model, the string “Infinity” must be decoded to mean an infinite-valued number. When reading from formats that do support infinity, the format’s native encoding for infinite-valued numbers must also be supported.
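The encoding rule can be sketched in Python as follows. This is a minimal illustration rather than part of the specification: the helper names encode_time and decode_time are hypothetical, and only a single time value is handled.

```python
import json
import math


def encode_time(value):
    """Encode a time for a format that cannot represent infinity (e.g. JSON)."""
    return "Infinity" if value == math.inf else value


def decode_time(value):
    """Decode a time; accepts both the string form and a native infinity."""
    return math.inf if value == "Infinity" else value


# allow_nan=False makes json.dumps raise ValueError on a raw infinite float,
# which guards against accidentally emitting non-standard JSON.
text = json.dumps({"start_time": encode_time(math.inf)}, allow_nan=False)
start_time = decode_time(json.loads(text)["start_time"])
```

Here text holds the string “Infinity” for start_time, and decoding it recovers an infinite-valued number.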

Machine Data Model#

The Demes Machine Data Model (MDM) is a formal representation of the Demes model as a JSON document. The MDM is designed to be used as input by programs such as population genetics simulators, and explicitly includes all necessary details. The structure of JSON documents conforming to the specification is formally defined in the Schema, and the detailed requirements for each of the elements in this data model are defined in this section.

The Demes Human Data Model (HDM) is a closely related specification that is intended to be easily human-readable and writable. An MDM document is also a valid HDM document. An MDM document is constructed by resolving and validating an HDM document.

Common concepts#

This section provides details on properties that occur in multiple contexts.

Time#

Times are specified as units in the past, so that time zero corresponds to the final generation or “now”, and event times in the past are values greater than zero with larger values for events that occur in the more distant past. By default, time is measured in generations, but other values (“years”, for example) are allowed. When time units are not given in generations, the generation time must also be specified so that times can be converted into generations. In general, as time flows from the past to the present, populations, epochs, and migration events should be specified in their order of appearance, so that their times are in descending order.

Population sizes#

A fundamental concept in Demes is the population size.

Todo

Things to cover:

  • We’re counting individuals not genomes

  • We usually mean the population size in expectation, but there are no hard requirements. For example, it’s up to the implementation whether it thinks a population size of 0.33 is meaningful. Clarify with some examples from forward and backward sims.

  • What do proportions mean? Similar point to pop size above. Give forward pointers to sections we mention proportions in.

  • What do we mean by migration of individuals? Forward pointers to sections.

Metadata#

Todo

Discussion of metadata. What’s it for?

MDM documents#

The top-level MDM document describes a single instance of a Demes model.

Each MDM document contains the following list of properties. All properties MUST be specified, and additional properties MUST NOT be included in these documents. Please see the Schema for definitive details on the types and structure of these properties.

description#

A concise description of the demographic model.

doi#

The DOI(s) of the publication(s) in which the model was inferred or originally described.

metadata#

An object containing arbitrary additional properties and values. May be empty.

time_units#

The units of time used to specify times and time intervals. These SHOULD be one of “generations” or “years”.

generation_time#

The number by which times must be divided, to convert them to have units of generations. Hence generation_time uses the same time units specified by time_units.

demes#

The list of demes in the model. At least one deme MUST be specified.

pulses#

The list of pulses in the model.

migrations#

The list of migrations in the model.

Deme#

A Deme is a single population (see the Conventions and Terminology for clarification of these two terms) that exists for some non-empty time interval. A population is defined operationally as some set of individuals that can be modelled by a set of fixed parameters over a series of epochs. Population parameters are defined per epoch, and are defined in the Epoch section below.

A population may have one or more ancestors, which are other populations that exist at the population’s start time. If one ancestor is specified, the first generation is constructed by randomly sampling parents from the ancestral population to contribute to offspring in the newly generated population.

If more than one ancestor is specified, the proportions of ancestry from each contributing population must be provided, and those proportions must sum to one. In this case, parents are chosen randomly from each ancestral population with probability given by those proportions. If no ancestors are specified, the population is assumed to have start time equal to infinity.

The deme may be a descendant of one or more demes in the graph, and may be an ancestor to others. The deme exists over the half-open time interval (start_time, end_time], and it may continue to exist after contributing ancestry to a descendant deme. The deme’s end_time is implicit (there is no end_time deme property), but for convenience we define it as the end_time of the deme’s last epoch.

name#

A string identifier for a deme, which MUST be unique among all demes in a document. The name must be a valid Python identifier.

description#

A concise description of the deme.

ancestors#

The list of ancestors of the deme at the start of the deme’s first epoch. May be an empty list if the deme has no ancestors in the graph, in which case the start_time must be infinite. Each ancestor must be in the graph, and each ancestor must be specified only once. A deme must not be one of its own ancestors.

proportions#

The proportions of ancestry derived from each of the ancestors at the start of the deme’s first epoch. The proportions must be ordered to correspond with the order of ancestors. The proportions must be an empty list or sum to 1 (within a reasonable tolerance, e.g. 1e-9). See the Population sizes section for more details on how these proportions should be interpreted.
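A check of these constraints might look like the following sketch. The function name is hypothetical, and the requirement that ancestors and proportions have equal length is our reading of "ordered to correspond", not a quoted rule.

```python
import math


def check_ancestry_proportions(ancestors, proportions, tol=1e-9):
    """Raise ValueError if proportions are inconsistent with ancestors."""
    if len(ancestors) != len(proportions):
        raise ValueError("ancestors and proportions must have the same length")
    # An empty list is allowed (root deme); otherwise the sum must be 1
    # within an absolute tolerance.
    if proportions and not math.isclose(
        sum(proportions), 1.0, rel_tol=0.0, abs_tol=tol
    ):
        raise ValueError("proportions must sum to 1")


check_ancestry_proportions(["A", "B"], [0.3, 0.7])  # passes silently
check_ancestry_proportions([], [])                  # root deme, passes
```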

start_time#

The most ancient time at which the deme exists, in time_units before the present. Demes with no ancestors are root demes and must have an infinite start_time. Otherwise, the start_time must correspond with the interval of existence for each of the deme’s ancestors. I.e. the start_time must be within the half-open interval (deme.start_time, deme.end_time] for each deme in ancestors.
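Because times count backwards from the present, a deme's start_time is numerically larger than its end_time, so the interval condition above amounts to ancestor.end_time <= start_time < ancestor.start_time. A minimal sketch, using a hypothetical helper that takes each ancestor as a (start_time, end_time) pair:

```python
import math


def start_time_is_valid(start_time, ancestor_intervals):
    """True if start_time lies in (a_start, a_end] for every ancestor.

    Each interval is a (start_time, end_time) pair with start_time > end_time,
    since times are measured backwards from the present.
    """
    return all(
        a_end <= start_time < a_start for a_start, a_end in ancestor_intervals
    )


# A child may start exactly when its ancestor ends ...
ok = start_time_is_valid(100, [(math.inf, 100)])
# ... but not at the ancestor's own (excluded) start_time.
bad = start_time_is_valid(100, [(100, 0)])
```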

epochs#

The list of epochs for this deme. There MUST be at least one epoch for each deme.

Epoch#

A deme-specific period of time spanning the half-open interval (start_time, end_time], in which a fixed set of population parameters apply. The epoch’s start_time is implicit (there is no start_time epoch property), but for convenience we define it as the end_time of the previous epoch, or the deme’s start_time if it is the first epoch.

Each epoch specifies the population size over that interval, which can be a constant value or a function defined by start and end sizes that must remain positive. If an epoch has a start time of infinity, the population size for that epoch must be constant.

Epochs can also specify parameters for nonrandom mating, such as selfing or cloning rates, which give the probability that offspring are generated from one generation to the next by self-fertilisation or cloning of an individual. Selfing and cloning rates take values between zero and one.

end_time#

The most recent time of the epoch, in time_units before the present.

start_size#

The population size at the epoch’s start_time.

end_size#

The population size at the epoch’s end_time.

size_function#

A function describing the population size change between start_time and end_time. This may be any string, but the values “constant” and “exponential” are explicitly acknowledged to have the following meanings.

  • constant: the deme’s size does not change over the epoch. start_size and end_size must be equal.

  • exponential: the deme’s size changes exponentially from start_size to end_size over the epoch. If t is a time within the span of the epoch, the deme size N at time t can be calculated as:

    dt = (epoch.start_time - t) / (epoch.start_time - epoch.end_time)
    r = log(epoch.end_size / epoch.start_size)
    N = epoch.start_size * exp(r * dt)
    

size_function must be constant if the epoch has an infinite start_time.
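The two acknowledged size functions can be written as runnable Python. This is a sketch only: the function name epoch_size_at and its argument order are our own, and it assumes a finite start_time.

```python
import math


def epoch_size_at(t, start_time, end_time, start_size, end_size,
                  size_function="exponential"):
    """Deme size at a time t within the span of the epoch."""
    if size_function == "constant":
        return start_size
    if size_function != "exponential":
        raise ValueError(f"unsupported size_function: {size_function!r}")
    # Fraction of the epoch elapsed, moving from the past towards the present.
    dt = (start_time - t) / (start_time - end_time)
    r = math.log(end_size / start_size)
    return start_size * math.exp(r * dt)


# Halfway through an epoch that grows from 100 to 400, the size is the
# geometric mean of the two, i.e. about 200.
midpoint_size = epoch_size_at(50, 100, 0, 100, 400)
```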

cloning_rate#

The proportion of offspring in each generation that are expected to be generated through clonal reproduction. The remaining proportion, 1 - cloning_rate, is expected to arise through sexual reproduction.

selfing_rate#

Among the sexually reproduced offspring, a proportion selfing_rate is expected to be born via self-fertilisation, while the rest have parents drawn at random from the previous generation.

Note

Depending on the simulator, this random drawing of parents may occur either with or without replacement. When drawing occurs with replacement, a small amount of residual selfing is expected, so that even with cloning_rate=0 and selfing_rate=0, selfing may still occur with probability 1/N. Simulators that allow variable rates of selfing are expected to clearly document their behaviour.

Pulse#

An instantaneous pulse of migration at time, from a list of source demes (sources) into the destination deme (dest).

Pulse migration events specify the instantaneous replacement of a given fraction of individuals in a destination population by individuals with parents from a source population. The fraction must be between zero and one, and if more than one pulse occurs at the same time, those replacement events are applied sequentially in the order that they are specified in the model. The list of pulses must be sorted in time-descending order.

sources#

The list of deme names of the migration sources.

dest#

The deme name of the migration destination.

time#

The time of migration, in time_units before the present. The demes defined by sources and dest must all exist at the given time. I.e. time must be contained in the (deme.start_time, deme.end_time] interval of each of the sources demes and of the dest deme.

proportions#

The proportions of ancestry in the dest deme derived from the demes in sources immediately after the time of migration. The proportions must be ordered to correspond with the order of sources. The proportions must sum to less than or equal to 1 (within a reasonable tolerance, e.g. 1e-9). See the Population sizes section for more details on how proportions should be interpreted.

Example: sequential application of pulses#

Consider the following model:

time_units: generations
demes:
 - name: A
   epochs:
    - start_size: 1000
 - name: B
   epochs:
    - start_size: 1000
 - name: C
   epochs:
    - start_size: 1000
pulses:
- sources: [A]
  dest: C
  proportions: [0.25]
  time: 10
- sources: [B]
  dest: C
  proportions: [0.2]
  time: 10

Ten (10) generations ago, pulse events occur from source demes A and B into destination deme C.

We need to arrive at the final ancestry proportions for destination deme C after this time. Software implementing pulse events must generate output that is equivalent to the following procedure.

The steps are:

  1. Initialize an array of zeros with length equal to the number of demes.

  2. Set the ancestry proportion of the destination deme to 1.

  3. For each pulse:

     a. Multiply every entry of the array by one (1) minus the sum of the pulse's proportions.

     b. For each source, add its proportion to that source's entry in the array.

For the above model, the steps are:

1. x = [0, 0, 0]
2. x = [0, 0, 1]
3. p = 1 - 0.25
   x = x*p = [0, 0, 0.75]
   x[A] += 0.25, x = [0.25, 0, 0.75]
   p = 1 - 0.2
   x = x*p = [0.2, 0, 0.6]
   x[B] += 0.2, x = [0.2, 0.2, 0.6]

Thus, our final ancestry proportions for deme C after time 10 are [0.2, 0.2, 0.6].
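The procedure can be sketched in Python as follows. This is an illustration, not the reference implementation: the function name is hypothetical, and it assumes that all pulses passed in share the same time and the same dest.

```python
def ancestry_after_pulses(deme_names, dest, pulses):
    """Ancestry proportions of dest after applying pulses in listed order."""
    # Steps 1 and 2: all zeros, except 1 for the destination deme.
    x = {name: 0.0 for name in deme_names}
    x[dest] = 1.0
    # Step 3: apply each pulse sequentially.
    for pulse in pulses:
        p = 1.0 - sum(pulse["proportions"])
        for name in x:
            x[name] *= p
        for source, proportion in zip(pulse["sources"], pulse["proportions"]):
            x[source] += proportion
    return [x[name] for name in deme_names]


pulses = [
    {"sources": ["A"], "dest": "C", "proportions": [0.25], "time": 10},
    {"sources": ["B"], "dest": "C", "proportions": [0.2], "time": 10},
]
# Approximately [0.2, 0.2, 0.6], up to floating-point rounding.
result = ancestry_after_pulses(["A", "B", "C"], "C", pulses)
```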

Important considerations#

  • The final ancestry proportions depend on the order of the pulses in the model! If we reverse the above model such that:

    pulses:
    - sources: [B]
      dest: C
      proportions: [0.2]
      time: 10
    - sources: [A]
      dest: C
      proportions: [0.25]
      time: 10
    

    We get [0.25, 0.15, 0.6] as our ancestry proportions due to pulses.

    The fact that the outcome of applying sequential pulses depends on the order is why demes-python emits a warning when resolving such models.

  • Given the procedure used to apply sequential pulses at the same time, the following two sets of Pulses are not equivalent:

    pulses:
    - sources: [A]
      dest: C
      proportions: [0.2]
      time: 10
    - sources: [B]
      dest: C
      proportions: [0.2]
      time: 10
    
    pulses:
    - sources: [B, A]
      dest: C
      proportions: [0.2, 0.2]
      time: 10
    
  • Therefore, we strongly recommend that models be represented using the following syntax that makes the intended outcome of the model explicit:

    pulses:
    - sources: [A, B]
      dest: C
      proportions: [0.2, 0.2]
      time: 10
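To see numerically why the two-pulse and single-pulse forms differ, the following sketch applies the procedure to both. The helper apply_pulse is hypothetical and works on a plain list ordered as [A, B, C].

```python
def apply_pulse(x, source_indexes, proportions):
    """Apply one pulse to the ancestry array x and return the new array."""
    remaining = 1.0 - sum(proportions)
    x = [value * remaining for value in x]
    for index, proportion in zip(source_indexes, proportions):
        x[index] += proportion
    return x


A, B, C = 0, 1, 2
start = [0.0, 0.0, 1.0]  # all ancestry of C initially comes from C

# Two sequential pulses: the first contribution is diluted by the second.
sequential = apply_pulse(apply_pulse(start, [A], [0.2]), [B], [0.2])

# One pulse with two sources: both contributions arrive together.
joint = apply_pulse(start, [A, B], [0.2, 0.2])
```

Under these assumptions sequential is approximately [0.16, 0.2, 0.64], whereas joint is approximately [0.2, 0.2, 0.6].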
    

Migration#

Continuous asymmetric migration over the half-open time interval (start_time, end_time], from the deme with name source to the deme with name dest. Rates are defined as the probability that parents in the “destination” population are chosen from the “source” population. Migration rates are thus per generation and must be less than or equal to one. There must be at most one migration specified per source/destination pair for any given time interval. Furthermore, if more than one source population has continuous migration into the same destination population, the sum of those migration rates must also be less than or equal to one, as rates define probabilities. The probability that parents come from the same population is one minus the sum of incoming migration rates.
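As an illustration of how rates translate into parent-choice probabilities at a single point in time, consider the sketch below. The function name and the tolerance argument are assumptions of this example; the specification itself does not define a tolerance for migration rates.

```python
def parent_deme_probabilities(dest, incoming_rates, tol=1e-9):
    """Probability of drawing a parent from each deme, for offspring in dest.

    incoming_rates maps source deme name -> migration rate into dest.
    """
    for source, rate in incoming_rates.items():
        if not 0 <= rate <= 1:
            raise ValueError(f"rate from {source} must be between 0 and 1")
    total = sum(incoming_rates.values())
    if total > 1 + tol:
        raise ValueError("incoming migration rates must sum to at most 1")
    probabilities = dict(incoming_rates)
    # Parents not drawn from any source come from the destination itself.
    probabilities[dest] = max(0.0, 1.0 - total)
    return probabilities


probs = parent_deme_probabilities("C", {"A": 0.1, "B": 0.05})
```

In this example, parents of individuals in C come from C itself with probability of about 0.85.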

Warning

When continuous migration occurs over a time period that includes a pulse, the continuous migration probabilities define the probability of choosing parents from each deme conditional on individuals not arriving via the pulse.

source#

The deme name of the asymmetric migration source.

dest#

The deme name of the asymmetric migration destination.

start_time#

The time at which migration begins, in time_units before the present. The start_time must be contained in the [deme.start_time, deme.end_time) interval of the source deme and the dest deme.

end_time#

The time at which migration stops, in time_units before the present. The end_time must be contained in the (deme.start_time, deme.end_time] interval of the source deme and the dest deme.

rate#

The rate of migration per generation.

Schema#

The schema listed here is definitive in terms of types and the structure of the JSON documents that are considered to be valid instances of the MDM.

$schema: http://json-schema.org/draft-07/schema#
title: Fully qualified Demes graph
type: object
additionalProperties: false
properties:
  description:
    type: "string"
  doi:
    type: array
    items:
      type: string

  time_units:
    type: string
    # TODO: shouldn't this be an enum?

  generation_time:
    type: "number"
    exclusiveMinimum: 0

  metadata:
    type: object
    additionalProperties: true

  demes:
    type: array
    minItems: 1
    items:
      $ref: '#/definitions/deme'

  pulses:
    type: array
    items:
      $ref: '#/definitions/pulse'

  migrations:
    type: array
    items:
      $ref: '#/definitions/migration'

required:
- description
- doi
- time_units
- demes
- generation_time
- pulses
- migrations

definitions:
  name:
    type: string

  rate:
    type: number
    minimum: 0
    maximum: 1

  proportion:
    type: number
    exclusiveMinimum: 0
    maximum: 1

  size:
    type: number
    exclusiveMinimum: 0

  start_time:
    oneOf:
      - type: number
        exclusiveMinimum: 0
      - const: "Infinity"

  end_time:
    type: number
    minimum: 0

  epoch:
    type: object
    additionalProperties: false
    properties:
      end_time:
        $ref: '#/definitions/end_time'
      start_size:
        $ref: '#/definitions/size'
      end_size:
        $ref: '#/definitions/size'
      size_function:
        # TODO: make this an enumeration
        type: string
      cloning_rate:
        $ref: '#/definitions/rate'
      selfing_rate:
        $ref: '#/definitions/rate'
    required:
    - end_time
    - start_size
    - end_size
    - size_function
    - cloning_rate
    - selfing_rate

  deme:
    type: object
    additionalProperties: false
    properties:
      name:
        $ref: '#/definitions/name'
      description:
        type: "string"
      ancestors:
        type: array
        items:
          $ref: '#/definitions/name'
      proportions:
        type: array
        items:
          $ref: '#/definitions/proportion'
      start_time:
        $ref: '#/definitions/start_time'
      epochs:
        type: array
        minItems: 1
        items:
          $ref: '#/definitions/epoch'
    required:
    - name
    - description
    - ancestors
    - proportions
    - start_time
    - epochs

  pulse:
    type: object
    additionalProperties: false
    properties:
      sources:
        type: array
        items:
          $ref: '#/definitions/name'
      dest:
        $ref: '#/definitions/name'
      time:
        type: number
        exclusiveMinimum: 0
      proportions:
        type: array
        items:
          $ref: '#/definitions/proportion'
    required:
    - sources
    - dest
    - time
    - proportions

  migration:
    type: object
    additionalProperties: false
    properties:
      source:
        $ref: '#/definitions/name'
      dest:
        $ref: '#/definitions/name'
      start_time:
        $ref: '#/definitions/start_time'
      end_time:
        $ref: '#/definitions/end_time'
      rate:
        $ref: '#/definitions/rate'
    required:
    - source
    - dest
    - start_time
    - end_time
    - rate

Human Data Model#

The Demes Human Data Model (HDM) is an extension of the Machine Data Model that is designed for human readability. The HDM provides default values for many parameters, removes redundant information in the MDM via rules described in this section and also provides a default value replacement mechanism. JSON documents conforming to the HDM are intended to be processed by a parser, which outputs the corresponding MDM document.

This section defines the structure of HDM documents, the rules by which they are transformed into MDM documents, and the error conditions that should be detected by parsers.

We also provide a reference implementation of a parser for the HDM (which also parses the MDM, by definition) written in Python. This implementation is intended to clarify any ambiguities there may be in this specification, but is not intended to be used directly in downstream software. Please use the demes Python library instead.

Defaults#

Repeated values such as shared population sizes represent a significant opportunity for error in human-generated models. The HDM provides the default value propagation mechanism to avoid this repetition. The essential idea is that we declare default values hierarchically within the document, and that the ultimate value assigned to a property is prioritised by proximity within the document hierarchy.

Default values can be provided in two places within an HDM document: at the top-level or within a deme definition.

See also

See the tutorial for examples of how the defaults section can be used.

See also

See the reference implementation for a practical example of how defaults in the HDM can be implemented.

Top-level defaults#

The top-level HDM document may contain a property defaults, which defines values to use for entities within the rest of the document unless otherwise specified. The defaults object can have the following properties:

  • epoch: this can specify any property valid for an MDM epoch. All epochs in the document will be assigned these properties, unless specified within the epoch or the deme defaults.

  • migration: this can specify any property valid for an MDM migration. All migrations in the document will be assigned these properties, unless specified within the migration itself.

  • pulse: this can specify any property valid for an MDM pulse. All pulses in the document will be assigned these properties, unless specified within the pulse itself.

  • deme: this can specify the following properties for a deme only: description, ancestors, proportions and start_time.

See also

See the schema for definitive information on the structural properties of the top level defaults section.

Deme defaults#

Deme defaults operate in the same manner as top-level defaults: the specified values will be used if omitted in any epochs within the deme. Defaults specified within a deme override any values specified in the top level deme defaults.

This can specify any property valid for an MDM epoch. Any epochs in the deme will be assigned these properties, unless specified within an epoch itself.

See also

See the schema for definitive information on the structural properties of the deme defaults section.

Resolution#

The Demes machine data model contains many values that are technically redundant, in that they can be reliably inferred from other values in the model. For example, if a deme’s size_function is "constant" during an Epoch, then clearly the start_size and end_size will be equal. The MDM still requires that both be specified, because it is intended for machine consumption, and having a fully specified and complete data model allows code that consumes this model to be simple and straightforward. However, such redundancy is a significant downside for human consumption, where having repeated or redundant values leads to poorer readability and increases the probability of errors. Thus, one of the differences between the Demes Human Data Model and Machine Data Model is that the HDM tries to remove as much redundancy as possible. A major part of a Demes parser implementation’s task is to fill in the redundant information, a process that we refer to as model “resolution”. Please consult the reference implementation for more detailed information.

Resolution is idempotent; that is, resolution of an already resolved model (i.e., in MDM form) MUST result in identical output. Thus a parser need not know if a model is in HDM or MDM form a priori.

Resolution happens in a set of steps in a defined order:

Todo

Resolution order matters for some things, but not for others. Clarify where order matters (and why).

time_units#

time_units must be specified. The value “generations” is special, in that it implies that the generation_time will be 1 and may thus be omitted.

generation_time#

If time_units is not “generations”, then generation_time MUST be specified.

If time_units is “generations”, then

  • generation_time may be omitted, in which case it shall be given the value 1.

  • an error shall be raised if generation_time is not 1.
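These rules for time_units and generation_time can be sketched as a single function. This is a minimal illustration operating on a plain dictionary; the function name is hypothetical and the reference implementation may be structured differently.

```python
def resolve_generation_time(data):
    """Return the resolved generation_time for a top-level HDM dictionary."""
    if "time_units" not in data:
        raise ValueError("time_units must be specified")
    generation_time = data.get("generation_time")
    if data["time_units"] == "generations":
        if generation_time is None:
            return 1
        if generation_time != 1:
            raise ValueError(
                "generation_time must be 1 when time_units is 'generations'"
            )
        return generation_time
    if generation_time is None:
        raise ValueError(
            "generation_time is required unless time_units is 'generations'"
        )
    return generation_time


in_generations = resolve_generation_time({"time_units": "generations"})
in_years = resolve_generation_time({"time_units": "years", "generation_time": 25})
```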

metadata#

If metadata is omitted, it shall be given the value of an empty dictionary. If metadata is present, the value must be a dictionary, but metadata is otherwise transferred to the output without further processing. Errors may be raised if metadata is not parsable (e.g. invalid YAML), but the parser shall not attempt to validate fields within the metadata.

description#

If description is omitted, it shall be given the value of an empty string.

doi#

If doi is omitted, it shall be given the value of an empty list.

defaults#

If top-level defaults is provided, default values shall be validated to the extent possible, to avoid propagating invalid values. E.g. defaults.epoch.start_size cannot be negative.

Deme resolution#

Each deme is resolved in the order that it occurs in the input. A deme can only be resolved if its ancestor demes have already been resolved, and the parser MUST raise an error if a deme is encountered that has unresolved ancestors. Thus, valid input files list the demes in topologically sorted order, such that ancestors are listed before their descendants.
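The ordering requirement can be checked in a single pass over the input, as in the following sketch. The function name is hypothetical, and demes are represented as plain dictionaries.

```python
def check_deme_order(demes):
    """Raise ValueError unless every ancestor is listed before its descendant."""
    resolved = set()
    for deme in demes:
        for ancestor in deme.get("ancestors", []):
            # Also rejects unknown names and a deme listing itself.
            if ancestor not in resolved:
                raise ValueError(
                    f"deme {deme['name']!r} has unresolved ancestor {ancestor!r}"
                )
        resolved.add(deme["name"])


check_deme_order([
    {"name": "ancestral"},
    {"name": "X", "ancestors": ["ancestral"]},
    {"name": "Y", "ancestors": ["ancestral", "X"]},
])
```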

Resolution order:

defaults#

If deme-level defaults is provided, default values shall be validated to the extent possible, to avoid propagating invalid values. E.g. defaults.epoch.start_size cannot be negative.

For each deme, deme-level defaults override top-level defaults.

description#

If description is omitted,

  • If the deme.description defaults field is present, description shall be given this value.

  • Otherwise, description shall be given the value of the empty string.

ancestors#

If ancestors is omitted,

  • If the deme.ancestors defaults field is present, ancestors shall be given this value.

  • Otherwise, ancestors shall be given the value of the empty list.

proportions#

If proportions is omitted,

  • If the deme.proportions defaults field is present, proportions shall be given this value.

  • Otherwise, if ancestors has length one, proportions shall be a single-element list containing the element 1.0.

  • Otherwise, if ancestors has length zero, proportions shall be given the value of the empty list.

  • Otherwise, proportions cannot be determined and an error MUST be raised.
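A sketch of this cascade, assuming ancestors has already been resolved and that deme_defaults is a hypothetical dictionary holding the applicable deme defaults:

```python
def resolve_proportions(deme, deme_defaults=None):
    """Return the resolved proportions list for a deme dictionary."""
    deme_defaults = deme_defaults or {}
    if "proportions" in deme:
        return deme["proportions"]
    if "proportions" in deme_defaults:
        return deme_defaults["proportions"]
    ancestors = deme["ancestors"]
    if len(ancestors) == 1:
        return [1.0]
    if len(ancestors) == 0:
        return []
    raise ValueError("proportions must be given when there are several ancestors")


single = resolve_proportions({"ancestors": ["A"]})
root = resolve_proportions({"ancestors": []})
```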

start_time#

If start_time is omitted,

  • If the deme.start_time defaults field is present, start_time shall be given this value.

  • Otherwise, if ancestors has length one and the ancestor has an end_time > 0, the ancestor’s end_time value shall be used.

  • Otherwise, if ancestors has length zero, start_time shall be given the value infinity.

  • Otherwise, start_time cannot be determined and an error MUST be raised.
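The same cascade for start_time might be written as below. This is a sketch: the function name is hypothetical, the default is passed in as a plain argument, and ancestor_end_times is assumed to hold the already resolved end_time of each ancestor.

```python
import math


def resolve_start_time(deme, ancestor_end_times, default_start_time=None):
    """Return the resolved start_time for a deme dictionary."""
    if "start_time" in deme:
        return deme["start_time"]
    if default_start_time is not None:
        return default_start_time
    if len(ancestor_end_times) == 1 and ancestor_end_times[0] > 0:
        # The deme begins when its single ancestor ends.
        return ancestor_end_times[0]
    if len(ancestor_end_times) == 0:
        return math.inf
    raise ValueError("start_time cannot be determined")


root_start = resolve_start_time({}, [])
child_start = resolve_start_time({}, [250])
```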

Epoch resolution#

Epochs are listed in time-descending order (from oldest to youngest), and population sizes are inherited from older epochs. Resolution order:

If a deme’s epochs field is omitted, it will be given the value of a single-element list, where the list element has the value of an epoch with all fields omitted. This may produce a valid epoch during subsequent resolution, e.g. if the epoch.start_size defaults field has a value.

end_time#

If end_time is omitted,

  • If the epoch.end_time defaults field is present, end_time shall be given this value.

  • Otherwise, if this is the last epoch, end_time shall be given the value 0.

  • Otherwise, end_time cannot be determined and an error MUST be raised.

The end_time value of the first epoch MUST be strictly smaller than the deme’s start_time. The end_time values of successive epochs MUST be strictly decreasing.
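The end_time rules, including the ordering constraints, can be sketched as follows. The function name and the default_end_time argument are assumptions of this example.

```python
import math


def resolve_end_times(epochs, deme_start_time, default_end_time=None):
    """Return the resolved end_time of each epoch, oldest epoch first."""
    resolved = []
    previous = deme_start_time
    for index, epoch in enumerate(epochs):
        end_time = epoch.get("end_time", default_end_time)
        if end_time is None:
            if index != len(epochs) - 1:
                raise ValueError("end_time is required for all but the last epoch")
            end_time = 0
        # Each end_time must be strictly more recent than what precedes it.
        if not end_time < previous:
            raise ValueError("epoch end_time values must be strictly decreasing")
        resolved.append(end_time)
        previous = end_time
    return resolved


end_times = resolve_end_times([{"end_time": 100}, {}], math.inf)
```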

start_size, end_size#

Note

Sizes are never inherited from ancestors.

If start_size is omitted and the epoch.start_size defaults field is present, then the epoch’s start_size shall be given this value. If end_size is omitted and the epoch.end_size defaults field is present, then the epoch’s end_size shall be given this value.

In the first epoch,

  • at least one of start_size or end_size MUST be specified (possibly via a defaults field).

  • If start_size is omitted (and no default exists), it shall be given the same value as end_size.

  • If end_size is omitted (and no default exists), it shall be given the same value as start_size.

  • If the deme’s start_time is infinite, start_size MUST have the same value as end_size.

In subsequent epochs,

  • If start_size is omitted (and no defaults exist), it shall be given the same value as the previous epoch’s end_size.

  • If end_size is omitted (and no defaults exist), it shall be given the same value as start_size.
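The size rules above can be sketched in Python as follows. This is an illustration under stated assumptions: the function name is hypothetical, and defaults is a single dictionary in which deme-level and top-level epoch defaults have already been merged.

```python
import math


def resolve_sizes(epochs, deme_start_time, defaults=None):
    """Return (start_size, end_size) for each epoch, oldest epoch first."""
    defaults = defaults or {}
    resolved = []
    for index, epoch in enumerate(epochs):
        start_size = epoch.get("start_size", defaults.get("start_size"))
        end_size = epoch.get("end_size", defaults.get("end_size"))
        if index == 0:
            if start_size is None and end_size is None:
                raise ValueError("first epoch needs start_size or end_size")
            if start_size is None:
                start_size = end_size
        elif start_size is None:
            # Inherit from the previous (older) epoch of the same deme.
            start_size = resolved[-1][1]
        if end_size is None:
            end_size = start_size
        if index == 0 and math.isinf(deme_start_time) and start_size != end_size:
            raise ValueError("size must be constant in an infinitely long epoch")
        resolved.append((start_size, end_size))
    return resolved


sizes = resolve_sizes(
    [{"start_size": 1000, "end_time": 100}, {"end_size": 2000, "end_time": 50}, {}],
    math.inf,
)
```

For this input the resolved sizes are (1000, 1000), (1000, 2000) and (2000, 2000).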

size_function#

If size_function is omitted,

  • if the epoch.size_function defaults field is present, size_function shall be given this value.

  • Otherwise, if start_size has the same value as end_size, size_function will be given the value "constant".

  • Otherwise, size_function will be given the value "exponential".
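Once sizes are resolved, the size_function rule is a short fallback chain. A minimal sketch with a hypothetical function name, where default stands for the applicable epoch.size_function defaults value:

```python
def resolve_size_function(epoch, default=None):
    """Return the resolved size_function for an epoch with resolved sizes."""
    if "size_function" in epoch:
        return epoch["size_function"]
    if default is not None:
        return default
    if epoch["start_size"] == epoch["end_size"]:
        return "constant"
    return "exponential"


flat = resolve_size_function({"start_size": 1000, "end_size": 1000})
growing = resolve_size_function({"start_size": 1000, "end_size": 2000})
```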

selfing_rate#

If selfing_rate is omitted,

  • if the epoch.selfing_rate defaults field is present, selfing_rate shall be given this value.

  • Otherwise, selfing_rate shall be given the value 0.

cloning_rate#

If cloning_rate is omitted,

  • if the epoch.cloning_rate defaults field is present, cloning_rate shall be given this value.

  • Otherwise, cloning_rate shall be given the value 0.

Migration resolution#

Migrations must be resolved after all demes are resolved. Asymmetric migration can be specified using the source and dest properties, or symmetric migration can be specified using the demes property to list the names of the participating demes. Each symmetric migration is resolved into two asymmetric migrations (one in each direction) for each pair of participating demes.

Resolution order:

rate#

If the rate is omitted,

  • if the migration.rate defaults field is present, rate shall be given this value.

  • Otherwise, an error MUST be raised.

source#

If source is omitted and the migration.source defaults field is present, source shall be given this value.

dest#

If dest is omitted and the migration.dest defaults field is present, dest shall be given this value.

demes#

If demes is omitted and the migration.demes defaults field is present, demes shall be given this value.

Symmetric migration#

The following rules shall determine the mode of migration (either asymmetric or symmetric):

  • If demes does not have a value, and both source and dest have values, the migration is asymmetric. Resolution continues from start_time.

  • If demes has a value, and neither source nor dest has a value, the migration is symmetric.

  • Otherwise, the mode of migration cannot be determined, and an error MUST be raised.

If the migration is symmetric, demes MUST be validated before further resolution:

  • demes MUST be a list of at least two deme names.

  • Each element of demes must be unique.

  • Each element of demes must be the name of a resolved deme.

If any of the previous conditions are not met, an error MUST be raised.

If the migration is symmetric, two new asymmetric migrations shall be constructed for each pair of deme names in demes. E.g. if demes = ["a", "b", "c"], then asymmetric migrations shall be constructed for the following cases:

  • source="a", dest="b",

  • source="b", dest="a",

  • source="a", dest="c",

  • source="c", dest="a",

  • source="b", dest="c",

  • source="c", dest="b".

Values for rate, start_time, and end_time for the new asymmetric migrations shall be taken from the symmetric migration. If start_time and/or end_time are omitted from the symmetric migration, these shall also be omitted for the new asymmetric migrations. Resolution now proceeds separately for each distinct asymmetric migration.
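
The expansion of a symmetric migration can be sketched with itertools.permutations, which yields every ordered pair of distinct elements. The function name expand_symmetric and the dictionary representation are assumptions made for this example; the order in which the pairs are produced differs from the list above, which should not matter since each new migration is resolved separately.

```python
from itertools import permutations


def expand_symmetric(demes, rate, start_time=None, end_time=None):
    """Build one asymmetric migration per ordered pair of deme names."""
    migrations = []
    for source, dest in permutations(demes, 2):
        migration = {"source": source, "dest": dest, "rate": rate}
        # Omitted times stay omitted, so that they are later resolved
        # separately for each asymmetric migration.
        if start_time is not None:
            migration["start_time"] = start_time
        if end_time is not None:
            migration["end_time"] = end_time
        migrations.append(migration)
    return migrations


print(len(expand_symmetric(["a", "b", "c"], rate=1e-4)))  # 6
```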

Note

The symmetric migration shall not appear in the MDM output. Once the symmetric migration has been resolved into the corresponding asymmetric migrations, the symmetric migration may be discarded.

start_time#

If start_time is omitted,

  • If the migration.start_time defaults field has a value, start_time shall be given this value.

  • Otherwise, start_time shall be the oldest time at which both the source and dest demes exist. I.e. min(source.start_time, dest.start_time).

end_time#

If end_time is omitted,

  • If the migration.end_time defaults field has a value, end_time shall be given this value.

  • Otherwise, end_time shall be the most recent time at which both the source and dest demes exist. I.e. max(source.end_time, dest.end_time).
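
A short worked example of the start_time and end_time fallbacks may help. Because times are measured backwards from the present, the oldest shared time is the smaller of the two start times and the most recent shared time is the larger of the two end times. The function name default_migration_times and the tuple representation of a deme's existence interval are assumptions of this sketch.

```python
import math


def default_migration_times(source, dest):
    """Return the default (start_time, end_time) for an asymmetric migration.

    Each argument is a (start_time, end_time) tuple giving the existence
    interval of a resolved deme.
    """
    start_time = min(source[0], dest[0])
    end_time = max(source[1], dest[1])
    return start_time, end_time


# One deme has start_time infinity and end_time 0;
# the other has start_time 1000 and end_time 200.
print(default_migration_times((math.inf, 0), (1000, 200)))  # (1000, 200)
```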

Pulse resolution#

Pulses must be resolved after all demes are resolved.

Resolution order:

sources#

If sources is omitted,

  • if the pulse.sources defaults field has a value, sources shall be given this value.

  • Otherwise, an error MUST be raised.

proportions#

If proportions is omitted,

  • if the pulse.proportions defaults field has a value, proportions shall be given this value.

  • Otherwise, an error MUST be raised.

dest#

If dest is omitted,

  • if the pulse.dest defaults field has a value, dest shall be given this value.

  • Otherwise, an error MUST be raised.

time#

If time is omitted,

  • if the pulse.time defaults field has a value, time shall be given this value.

  • Otherwise, an error MUST be raised.

Sort pulses#

Pulses MUST be sorted in time-descending order (from oldest to youngest). A stable sorting algorithm MUST be used to avoid changing the model interpretation when multiple pulses are specified with the same time value.

Note

In a discrete-time setting, non-integer pulse times that are distinct could be rounded to the same time value. If pulses are in time-ascending order when times are rounded, then the pulses would be applied in the opposite order compared to a continuous-time setting. Sorting in time-descending order avoids this discrepancy.
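
In Python, for instance, the built-in sorted function is stable, and it preserves the input order of elements with equal keys even when reverse=True is given. The sketch below assumes pulses are dictionaries with a time key; the name sort_pulses is hypothetical.

```python
def sort_pulses(pulses):
    """Sort pulses from oldest to youngest, keeping ties in input order."""
    return sorted(pulses, key=lambda pulse: pulse["time"], reverse=True)


pulses = [
    {"dest": "x", "time": 100},
    {"dest": "y", "time": 500},
    {"dest": "z", "time": 100},
]
print([pulse["dest"] for pulse in sort_pulses(pulses)])  # ['y', 'x', 'z']
```

The two pulses at time 100 keep their input order (x before z), so the model interpretation is unchanged.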

Validation#

Note

It may be convenient to perform some or all validation during model resolution, e.g. to avoid code duplication or to provide better error messages to the user.

Following resolution, the model must be validated against the MDM schema. This includes checking:

  • all required properties now have values,

  • no additional properties are present (except where permitted by the schema),

  • the types of properties match the schema,

  • the values are within the ranges specified (noting that infinity is permitted only for deme start_time and for migration start_time).

In addition to validation against the schema, the following constraints must be checked to ensure overall consistency of the model. If any condition is not met, an error must be raised.

generation_time#

If time_units is “generations”, then generation_time must be 1.

demes#

  • There must be at least one deme.

  • Each deme’s name must be unique in the model.

  • name must be a valid Python identifier.

  • If start_time is infinity, ancestors must be an empty list.

  • If ancestors is an empty list, start_time must have the value infinity.

  • No deme may appear in its own ancestors list.

  • Each element of the ancestors list must be unique.

  • The proportions list must have the same length as the ancestors list.

  • If the proportions list is not empty, then the values must sum to 1 (within a reasonable tolerance, e.g. 1e-9).
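
The ancestry-related constraints above could be checked along the following lines. This is a sketch only: the function name check_ancestry is hypothetical, ancestors is assumed to be a list of deme names, and the tolerance of 1e-9 is the example value mentioned above.

```python
import math


def check_ancestry(name, start_time, ancestors, proportions, tol=1e-9):
    """Check the ancestry constraints for one resolved deme."""
    # Infinite start_time and an empty ancestors list must go together.
    if math.isinf(start_time) != (len(ancestors) == 0):
        raise ValueError(
            "start_time must be infinity exactly when ancestors is empty"
        )
    if name in ancestors:
        raise ValueError(f"deme {name} appears in its own ancestors list")
    if len(set(ancestors)) != len(ancestors):
        raise ValueError("ancestors list contains duplicates")
    if len(proportions) != len(ancestors):
        raise ValueError("proportions must have the same length as ancestors")
    if len(proportions) > 0 and abs(sum(proportions) - 1) > tol:
        raise ValueError("proportions must sum to 1")


check_ancestry("root", math.inf, [], [])
check_ancestry("child", 100, ["a", "b"], [0.3, 0.7])
```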

epochs#

  • Each deme must have at least one epoch.

  • The end_time values of successive epochs must be strictly descending (ordered from the past towards the present).

  • The end_time values must be strictly smaller than the deme’s start_time.

  • If the deme has an infinite start_time, the first epoch’s size_function must have the value “constant”.

  • If the size_function is “constant”, the start_size and end_size must be equal.
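
One way to check the epoch constraints in a single pass is shown below. It is a sketch under stated assumptions: the function name check_epochs is hypothetical, and each epoch is assumed to be a dictionary holding the resolved end_time, start_size, end_size and size_function values.

```python
import math


def check_epochs(start_time, epochs):
    """Check the epoch constraints for a deme with the given start_time."""
    if len(epochs) == 0:
        raise ValueError("each deme must have at least one epoch")
    # Starting from the deme's start_time, every end_time must be strictly
    # smaller than the time that precedes it.
    previous_time = start_time
    for epoch in epochs:
        if epoch["end_time"] >= previous_time:
            raise ValueError("epoch end_time values must be strictly descending")
        previous_time = epoch["end_time"]
        if (
            epoch["size_function"] == "constant"
            and epoch["start_size"] != epoch["end_size"]
        ):
            raise ValueError("constant epochs need start_size == end_size")
    if math.isinf(start_time) and epochs[0]["size_function"] != "constant":
        raise ValueError("the first epoch of a root deme must be constant")
```

Seeding previous_time with the deme's start_time means the same comparison enforces both the ordering of successive epochs and the bound against start_time.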

migrations#

This section assumes that symmetric migrations have been resolved into pairs of asymmetric migrations and validated as per the migration resolution section. Resolution of symmetric migrations includes validation of the migration.demes property, and this property is not considered below as it is not part of the MDM.

  • source must not be the same as dest.

  • start_time and end_time must both be in the closed interval [deme.start_time, deme.end_time], for both the source deme and the dest deme.

  • start_time must be strictly greater than end_time.

  • There must be at most one migration specified per source/destination pair for any given time interval.

  • If more than one source population has continuous migration into the same destination population, the sum of those migration rates must be less than or equal to 1 (within a reasonable tolerance, e.g. 1e-9).
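
The last two constraints involve comparing migrations with one another, which can be sketched as follows. The function name check_migrations and the dictionary representation are assumptions of this example, and each migration is assumed to already satisfy start_time > end_time. The rate total into a deme can only change at a start_time or end_time, so it is enough to test each segment between consecutive boundary times.

```python
from collections import defaultdict
from itertools import combinations


def check_migrations(migrations, tol=1e-9):
    """Check overlap and rate-sum constraints for asymmetric migrations.

    Each migration is a dict with the keys source, dest, start_time,
    end_time and rate.
    """
    by_pair = defaultdict(list)
    by_dest = defaultdict(list)
    for m in migrations:
        by_pair[(m["source"], m["dest"])].append(m)
        by_dest[m["dest"]].append(m)

    # At most one migration per source/dest pair at any time: no two
    # time intervals for the same pair may overlap.
    for (source, dest), group in by_pair.items():
        for a, b in combinations(group, 2):
            if a["end_time"] < b["start_time"] and b["end_time"] < a["start_time"]:
                raise ValueError(f"overlapping migrations from {source} to {dest}")

    # Incoming rates must sum to at most 1 in every time segment.
    for dest, group in by_dest.items():
        times = sorted(
            {m["start_time"] for m in group} | {m["end_time"] for m in group}
        )
        for younger, older in zip(times, times[1:]):
            total = sum(
                m["rate"]
                for m in group
                if m["end_time"] <= younger and older <= m["start_time"]
            )
            if total > 1 + tol:
                raise ValueError(f"migration rates into {dest} sum to more than 1")
```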

pulses#

  • sources must be a list containing at least one element.

  • Each element of sources must be unique.

  • The dest deme must not appear in the sources list.

  • For each source deme in sources, time must be in the open-closed interval (deme.start_time, deme.end_time], defined by the existence interval of the source deme.

  • time must be in the closed-open interval [deme.start_time, deme.end_time), defined by the existence interval of the dest deme.

  • Hence, time must not have the value infinity, nor the value 0.

  • The proportions list must have the same length as the sources list.

  • The sum of values in the proportions list must be less than or equal to 1 (within a reasonable tolerance, e.g. 1e-9).
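
The two time constraints differ only in which end of the existence interval is open, which is easy to get wrong. The sketch below spells out both comparisons; the function name check_pulse_time and the (start_time, end_time) tuple representation of a deme are assumptions made for this example.

```python
import math


def check_pulse_time(time, sources, dest):
    """Check a pulse time against the existence intervals of its demes.

    sources is a list of (start_time, end_time) tuples, one per source
    deme, and dest is a single (start_time, end_time) tuple.
    """
    for start_time, end_time in sources:
        # Open-closed interval: start_time excluded, end_time included.
        if not (start_time > time >= end_time):
            raise ValueError(f"source deme does not exist at time {time}")
    start_time, end_time = dest
    # Closed-open interval: start_time included, end_time excluded.
    if not (start_time >= time > end_time):
        raise ValueError(f"dest deme does not exist at time {time}")


# A pulse at the dest deme's start_time is permitted.
check_pulse_time(100, sources=[(math.inf, 0)], dest=(100, 0))
```

A time of 0 always fails the dest check, and a time of infinity always fails the source check, which is why neither value is permitted.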

Schema#

The schema listed here is definitive in terms of types and the structure of the JSON documents that are considered to be valid instances of the Demes standard.

$schema: http://json-schema.org/draft-07/schema#
title: Demes graph
type: object
additionalProperties: false
properties:
  description:
    type: "string"
    default: ""
  doi:
    type: array
    items:
      type: string
    default: []

  time_units:
    type: string
    # TODO: shouldn't this be an enum?

  generation_time:
    type: "number"
    exclusiveMinimum: 0

  metadata:
    type: object
    default: {}
    additionalProperties: true

  defaults:
    type: object
    default: {}
    additionalProperties: false
    properties:
      epoch:
        $ref: '#/definitions/epoch'
      migration:
        $ref: '#/definitions/migration'
      pulse:
        $ref: '#/definitions/pulse'
      deme:
        properties:
          description:
            type: "string"
          ancestors:
            type: array
            items:
              $ref: '#/definitions/name'
          proportions:
            type: array
            items:
              $ref: '#/definitions/proportion'
          start_time:
            $ref: '#/definitions/start_time'

  demes:
    type: array
    minItems: 1
    items:
      $ref: '#/definitions/deme'

  pulses:
    type: array
    items:
      $ref: '#/definitions/pulse'
    default: []

  migrations:
    type: array
    items:
      $ref: '#/definitions/migration'
    default: []

required:
- time_units
- demes

definitions:
  name:
    type: string

  rate:
    type: number
    minimum: 0
    maximum: 1

  proportion:
    type: number
    exclusiveMinimum: 0
    maximum: 1

  size:
    type: number
    exclusiveMinimum: 0

  start_time:
    oneOf:
      - type: number
        exclusiveMinimum: 0
      - const: "Infinity"

  end_time:
    type: number
    minimum: 0

  epoch:
    type: object
    additionalProperties: false
    properties:
      end_time:
        $ref: '#/definitions/end_time'
      start_size:
        $ref: '#/definitions/size'
      end_size:
        $ref: '#/definitions/size'
      size_function:
        # TODO: make this an enumeration
        type: string
        default: exponential
      cloning_rate:
        $ref: '#/definitions/rate'
      selfing_rate:
        $ref: '#/definitions/rate'

  deme:
    type: object
    additionalProperties: false
    properties:
      name:
        $ref: '#/definitions/name'
      description:
        type: "string"
        default: ""
      ancestors:
        type: array
        items:
          $ref: '#/definitions/name'
        default: []
      proportions:
        type: array
        items:
          $ref: '#/definitions/proportion'
      start_time:
        $ref: '#/definitions/start_time'
      epochs:
        type: array
        default: []
        minItems: 0
        items:
          $ref: '#/definitions/epoch'
      defaults:
        type: object
        default: {}
        additionalProperties: false
        properties:
          epoch:
            $ref: '#/definitions/epoch'
    required:
    - name

  pulse:
    type: object
    additionalProperties: false
    properties:
      sources:
        type: array
        items:
          $ref: '#/definitions/name'
      dest:
        $ref: '#/definitions/name'
      time:
        type: number
        exclusiveMinimum: 0
      proportions:
        type: array
        items:
          $ref: '#/definitions/proportion'

  migration:
    anyOf:
    # Asymmetric
    - type: object
      additionalProperties: false
      properties:
        source:
          $ref: '#/definitions/name'
        dest:
          $ref: '#/definitions/name'
        start_time:
          $ref: '#/definitions/start_time'
        end_time:
          $ref: '#/definitions/end_time'
        rate:
          $ref: '#/definitions/rate'
    # Symmetric
    - type: object
      additionalProperties: false
      properties:
        demes:
          type: array
          items:
            $ref: '#/definitions/name'
          minItems: 2
        start_time:
          $ref: '#/definitions/start_time'
        end_time:
          $ref: '#/definitions/end_time'
        rate:
          $ref: '#/definitions/rate'

Reference parser implementation#

# A simple parser that builds a fully-qualified Demes Graph from an input JSON
# string.
#
# Requires Python 3.7+.
#
# This implementation is NOT recommended for use in any downstream software and
# is provided purely as reference material for parser writers (i.e., in other
# programming languages). Python users should use the "demes" package in their
# software: https://github.com/popsim-consortium/demes-python
#
# The entry point is the ``parse`` function, which returns a fully-qualified
# Graph. The implementation is written with clarity and correctness as the main
# priorities. Its main purpose is to remove any potential ambiguities that may
# exist in the written specification and to simplify the process of writing
# other parsers. In the interest of simplicity, the parser does not generate
# useful error messages in all cases (but we would hope that practical
# implementations would).
#
# Type annotations are used where they help with readability, but not applied
# exhaustively.
from __future__ import annotations

import math
import numbers
import copy
import pprint
import dataclasses
from typing import Dict, List, Union

# Numerical wiggle room.
EPSILON = 1e-6

# JSON does not provide a way to encode IEEE infinity values, which we
# require to describe start_time values. To work around this we use the
# string "Infinity" to represent IEEE positive infinity.
JSON_INFINITY_STR = "Infinity"


def parse(data: dict) -> Graph:
    # Parsing is done by popping items out of the input data dictionary and
    # creating the appropriate Python objects. We ensure that extra items
    # have not been included in the data payload by checking if the objects
    # are empty once we have removed all the values defined in the
    # specification. Type and range validation of simple items (e.g., the
    # value must be a positive integer) is performed at the same time,
    # using the pop_x functions. Once the full object model of the input
    # data has been built, the rules for creating a fully-qualified Demes
    # graph are applied in the "resolve" functions. Finally, we validate
    # the fully-qualified graph to ensure that relationships between the
    # entities have been specified correctly.
    data = copy.deepcopy(data)

    defaults = pop_object(data, "defaults", {})
    deme_defaults = pop_object(defaults, "deme", {})
    migration_defaults = pop_object(defaults, "migration", {})
    pulse_defaults = pop_object(defaults, "pulse", {})
    # epoch defaults may also be specified within a Deme definition.
    global_epoch_defaults = pop_object(defaults, "epoch", {})
    check_empty(defaults)

    graph = Graph(
        description=pop_string(data, "description", ""),
        time_units=pop_string(data, "time_units", None),
        doi=pop_list(data, "doi", [], str, is_nonempty),
        generation_time=pop_number(
            data, "generation_time", None, is_positive_and_finite
        ),
        metadata=pop_object(data, "metadata", {}),
    )
    check_defaults(
        deme_defaults,
        dict(
            description=(str, None),
            start_time=((str, numbers.Number), is_positive_or_json_infinity),
            ancestors=(list, is_list_of_identifiers),
            proportions=(list, is_list_of_proportions),
        ),
    )

    allowed_epoch_defaults = dict(
        end_time=(numbers.Number, is_non_negative_and_finite),
        start_size=(numbers.Number, is_positive_and_finite),
        end_size=(numbers.Number, is_positive_and_finite),
        selfing_rate=(numbers.Number, is_rate),
        cloning_rate=(numbers.Number, is_rate),
        size_function=(str, None),
    )
    check_defaults(global_epoch_defaults, allowed_epoch_defaults)

    for deme_data in pop_list(data, "demes"):
        insert_defaults(deme_data, deme_defaults)
        deme = graph.add_deme(
            name=pop_string(deme_data, "name", validator=is_identifier),
            description=pop_string(deme_data, "description", ""),
            start_time=pop_number(
                deme_data,
                "start_time",
                None,
                is_positive_or_json_infinity,
                allow_inf=True,
            ),
            ancestors=pop_list(deme_data, "ancestors", [], str, is_identifier),
            proportions=pop_list(
                deme_data, "proportions", None, numbers.Number, is_proportion
            ),
        )

        local_defaults = pop_object(deme_data, "defaults", {})
        local_epoch_defaults = pop_object(local_defaults, "epoch", {})
        check_empty(local_defaults)
        check_defaults(local_epoch_defaults, allowed_epoch_defaults)
        epoch_defaults = global_epoch_defaults.copy()
        epoch_defaults.update(local_epoch_defaults)
        check_defaults(epoch_defaults, allowed_epoch_defaults)

        # There is always at least one epoch defined with the default values.
        for epoch_data in pop_list(deme_data, "epochs", [{}]):
            insert_defaults(epoch_data, epoch_defaults)
            deme.add_epoch(
                end_time=pop_number(
                    epoch_data, "end_time", None, is_non_negative_and_finite
                ),
                start_size=pop_number(
                    epoch_data, "start_size", None, is_positive_and_finite
                ),
                end_size=pop_number(
                    epoch_data, "end_size", None, is_positive_and_finite
                ),
                selfing_rate=pop_number(epoch_data, "selfing_rate", 0, is_rate),
                cloning_rate=pop_number(epoch_data, "cloning_rate", 0, is_rate),
                size_function=pop_string(epoch_data, "size_function", None),
            )
            check_empty(epoch_data)
        check_empty(deme_data)

        if len(deme.epochs) == 0:
            raise ValueError(f"no epochs for deme {deme.name}")

    if len(graph.demes) == 0:
        raise ValueError("the graph must have one or more demes")

    check_defaults(
        migration_defaults,
        dict(
            rate=(numbers.Number, is_rate),
            start_time=((numbers.Number, str), is_positive_or_json_infinity),
            end_time=(numbers.Number, is_non_negative_and_finite),
            source=(str, is_identifier),
            dest=(str, is_identifier),
            demes=(list, is_list_of_identifiers),
        ),
    )
    for migration_data in pop_list(data, "migrations", []):
        insert_defaults(migration_data, migration_defaults)
        graph.add_migration(
            rate=pop_number(migration_data, "rate", validator=is_rate),
            start_time=pop_number(
                migration_data,
                "start_time",
                None,
                is_positive_or_json_infinity,
                allow_inf=True,
            ),
            end_time=pop_number(
                migration_data, "end_time", None, is_non_negative_and_finite
            ),
            source=pop_string(migration_data, "source", None, is_nonempty),
            dest=pop_string(migration_data, "dest", None, is_nonempty),
            demes=pop_list(
                migration_data,
                "demes",
                default=None,
                required_type=str,
                validator=is_identifier,
            ),
        )
        check_empty(migration_data)

    check_defaults(
        pulse_defaults,
        dict(
            sources=(list, is_nonempty_list_of_identifiers),
            dest=(str, is_identifier),
            time=(numbers.Number, is_positive_and_finite),
            proportions=(list, is_nonempty_list_of_proportions_with_sum_less_than_1),
        ),
    )
    for pulse_data in pop_list(data, "pulses", []):
        insert_defaults(pulse_data, pulse_defaults)
        graph.add_pulse(
            sources=pop_list(
                pulse_data,
                "sources",
                default=[],
                required_type=str,
                validator=is_identifier,
            ),
            dest=pop_string(pulse_data, "dest", validator=is_identifier),
            time=pop_number(pulse_data, "time", validator=is_positive_and_finite),
            proportions=pop_list(
                pulse_data,
                "proportions",
                default=[],
                required_type=numbers.Number,
                validator=is_proportion,
            ),
        )
        check_empty(pulse_data)

    check_empty(data)

    # The input object model has now been fully populated, and local type and
    # value checking done. Default values (either from the schema or set explicitly
    # by the user via "defaults" sections) have been assigned. We now "resolve"
    # the model so that any values that can be imputed from the structure of the
    # model are set explicitly. Once this is done, we then validate the model to
    # check that the relationships between various entities make sense. Note that
    # there isn't a clean separation between resolution and validation here, since
    # some validation is simplest to perform as part of the resolution logic in
    # this particular implementation.
    graph.resolve()
    graph.validate()

    return graph


def encode_inf(value):
    if math.isinf(value):
        return JSON_INFINITY_STR
    return value


# Validator functions. These are used as arguments to the pop_x functions and
# check properties of the values.


def is_positive_or_json_infinity(value):
    return value == JSON_INFINITY_STR or value > 0


def is_positive_and_finite(value):
    return value > 0 and not math.isinf(value)


def is_non_negative_and_finite(value):
    return value >= 0 and not math.isinf(value)


def is_rate(value):
    return 0 <= value <= 1


def is_proportion(value):
    return 0 < value <= 1


def is_nonempty(value):
    return len(value) > 0


def is_identifier(value):
    return value.isidentifier()


def is_list_of_identifiers(value):
    return all(isinstance(v, str) and is_identifier(v) for v in value)


def is_nonempty_list_of_identifiers(value):
    return is_list_of_identifiers(value) and len(value) > 0


def is_list_of_proportions(value):
    return all(isinstance(v, numbers.Number) and is_proportion(v) for v in value)


def is_nonempty_list_of_proportions_with_sum_less_than_1(value):
    return is_list_of_proportions(value) and len(value) > 0 and sum(value) <= 1


def validate_item(name, value, required_type, validator=None):
    if not isinstance(value, required_type):
        raise TypeError(
            f"Attribute '{name}' must be a {required_type}; "
            f"current type is {type(value)}."
        )
    if validator is not None and not validator(value):
        validator_name = validator.__name__[3:]  # Strip off is_ from function name
        raise ValueError(f"Attribute '{name}' is not {validator_name}")


# We need to use this trick because None is a meaningful input value for these
# pop_x functions.
NO_DEFAULT = object()


def pop_item(data, name, *, required_type, default=NO_DEFAULT, validator=None):
    if name in data:
        value = data.pop(name)
        validate_item(name, value, required_type, validator)
    else:
        if default is NO_DEFAULT:
            raise KeyError(f"Attribute '{name}' is required")
        value = default
    return value


def pop_list(data, name, default=NO_DEFAULT, required_type=None, validator=None):
    value = pop_item(data, name, default=default, required_type=list)
    if required_type is not None and value is not None:
        for item in value:
            validate_item(name, item, required_type, validator)
    return value


def pop_object(data, name, default=NO_DEFAULT):
    return pop_item(data, name, default=default, required_type=dict)


def pop_string(data, name, default=NO_DEFAULT, validator=None):
    return pop_item(data, name, default=default, required_type=str, validator=validator)


def pop_number(data, name, default=NO_DEFAULT, validator=None, allow_inf=False):
    # If infinite values are allowed for this number, the string "Infinity"
    # is also accepted, and so str is an accepted type. There is a small loophole
    # here in which string numbers like "1000" will be accepted by the type
    # checking machinery, but the is_positive_or_json_infinity validator
    # will catch this and raise a TypeError when it tries to compare with 0.
    if allow_inf:
        assert validator is is_positive_or_json_infinity
    required_type = (numbers.Number, str) if allow_inf else numbers.Number
    value = pop_item(
        data,
        name,
        default=default,
        required_type=required_type,
        validator=validator,
    )
    if value == JSON_INFINITY_STR:
        return math.inf
    return value


def check_empty(data):
    if len(data) != 0:
        raise ValueError(f"Extra fields are not permitted: {data}")


def check_defaults(defaults, allowed_fields):
    for key, value in defaults.items():
        if key not in allowed_fields:
            raise ValueError(
                f"Only fields {list(allowed_fields.keys())} can be specified "
                "in the defaults"
            )
        required_type, validator = allowed_fields[key]
        validate_item(key, value, required_type, validator)


def insert_defaults(data, defaults):
    for key, value in defaults.items():
        if key not in data:
            data[key] = value


@dataclasses.dataclass
class Interval:
    """
    A half-open time interval (start_time, end_time].
    """

    start_time: float
    end_time: float

    def __init__(self, start_time, end_time):
        assert start_time > end_time
        self.start_time = start_time
        self.end_time = end_time

    def intersects(self, other):
        """True if self and other intersect, False otherwise."""
        assert isinstance(other, self.__class__)
        return not (
            self.end_time >= other.start_time or other.end_time >= self.start_time
        )

    def is_subinterval(self, other):
        """True if self is completely contained within other, False otherwise."""
        assert isinstance(other, self.__class__)
        return self.start_time <= other.start_time and self.end_time >= other.end_time

    def __contains__(self, time):
        return self.start_time > time >= self.end_time


@dataclasses.dataclass
class Epoch:
    end_time: Union[float, None]
    start_size: Union[float, None]
    end_size: Union[float, None]
    size_function: str
    selfing_rate: float
    cloning_rate: float

    def as_json_dict(self) -> dict:
        return dataclasses.asdict(self)

    def resolve(self):
        if self.size_function is None:
            if self.start_size == self.end_size:
                self.size_function = "constant"
            else:
                self.size_function = "exponential"

    def validate(self):
        if self.size_function not in ("constant", "exponential", "linear"):
            raise ValueError(f"unknown size_function '{self.size_function}'")
        if self.size_function == "constant" and self.start_size != self.end_size:
            raise ValueError(
                "size_function is constant but "
                f"start_size ({self.start_size}) != end_size ({self.end_size})"
            )


@dataclasses.dataclass
class Deme:
    name: str
    start_time: Union[None, float]
    description: str
    ancestors: List[Deme]
    proportions: Union[List[float], None]
    epochs: List[Epoch] = dataclasses.field(default_factory=list)

    def add_epoch(
        self,
        end_time: Union[float, None],
        start_size: Union[float, None],
        end_size: Union[float, None],
        selfing_rate: float,
        cloning_rate: float,
        size_function: str,
    ) -> Epoch:
        epoch = Epoch(
            end_time=end_time,
            start_size=start_size,
            end_size=end_size,
            selfing_rate=selfing_rate,
            cloning_rate=cloning_rate,
            size_function=size_function,
        )
        self.epochs.append(epoch)
        return epoch

    @property
    def end_time(self):
        return self.epochs[-1].end_time

    @property
    def time_interval(self):
        return Interval(self.start_time, self.end_time)

    def as_json_dict(self) -> dict:
        return {
            "name": self.name,
            "description": self.description,
            "start_time": encode_inf(self.start_time),
            "epochs": [epoch.as_json_dict() for epoch in self.epochs],
            "proportions": self.proportions,
            "ancestors": [deme.name for deme in self.ancestors],
        }

    def __resolve_times(self):
        if self.start_time is None:
            default = math.inf
            if len(self.ancestors) == 1:
                default = self.ancestors[0].epochs[-1].end_time
            elif len(self.ancestors) > 1:
                raise ValueError(
                    "Must explicitly set Deme.start_time when > 1 ancestor"
                )
            self.start_time = default
        if len(self.ancestors) == 0 and not math.isinf(self.start_time):
            raise ValueError(
                f"deme {self.name} has finite start_time, but no ancestors"
            )

        for ancestor in self.ancestors:
            if self.start_time not in ancestor.time_interval:
                raise ValueError(
                    f"Deme {ancestor.name} ({ancestor.time_interval}) doesn't "
                    f"exist at deme {self.name}'s start_time ({self.start_time})"
                )

        # The last epoch has a default end_time of 0
        last_epoch = self.epochs[-1]
        if last_epoch.end_time is None:
            last_epoch.end_time = 0
        last_time = self.start_time
        for epoch in self.epochs:
            if epoch.end_time is None:
                raise ValueError("Epoch end_time must be specified")
            if epoch.end_time >= last_time:
                raise ValueError("Epoch end_times must be in decreasing order.")
            last_time = epoch.end_time

    def __resolve_sizes(self):
        first_epoch = self.epochs[0]
        # The first epoch must specify either start_size or end_size
        if first_epoch.start_size is None and first_epoch.end_size is None:
            raise ValueError(
                "Must specify one or more of start_size and end_size "
                "for the initial epoch"
            )
        if first_epoch.start_size is None:
            first_epoch.start_size = first_epoch.end_size
        if first_epoch.end_size is None:
            first_epoch.end_size = first_epoch.start_size
        last_epoch = first_epoch
        for epoch in self.epochs[1:]:
            if epoch.start_size is None:
                epoch.start_size = last_epoch.end_size
            if epoch.end_size is None:
                epoch.end_size = epoch.start_size
            last_epoch = epoch

        if self.start_time == math.inf:
            if first_epoch.start_size != first_epoch.end_size:
                raise ValueError(
                    "Cannot have varying population size in an infinite time interval"
                )

    def __resolve_proportions(self):
        if self.proportions is None:
            if len(self.ancestors) == 0:
                self.proportions = []
            elif len(self.ancestors) == 1:
                self.proportions = [1]
            else:
                raise ValueError("Must specify proportions for > 1 ancestor demes")

    def resolve(self):
        self.__resolve_times()
        self.__resolve_sizes()
        self.__resolve_proportions()
        for epoch in self.epochs:
            epoch.resolve()

    def validate(self):
        if len(self.proportions) != len(self.ancestors):
            raise ValueError("proportions must be same length as ancestors")
        if len(self.ancestors) > 0:
            if not math.isclose(sum(self.proportions), 1):
                raise ValueError("Sum of proportions must be approximately 1")
        if len(set(anc.name for anc in self.ancestors)) != len(self.ancestors):
            raise ValueError("ancestors list contains duplicates")
        for epoch in self.epochs:
            epoch.validate()


@dataclasses.dataclass
class Pulse:
    sources: List[Deme]
    dest: Deme
    time: float
    proportions: List[float]

    def as_json_dict(self) -> dict:
        d = dataclasses.asdict(self)
        d["sources"] = [source.name for source in self.sources]
        d["dest"] = self.dest.name
        return d

    def validate(self):
        sources_names = set(source.name for source in self.sources)
        if self.dest.name in sources_names:
            raise ValueError("Cannot have source deme equal to dest")
        if len(sources_names) != len(self.sources):
            raise ValueError("Duplicate deme in sources")
        if len(self.sources) == 0:
            raise ValueError("Must have one or more source demes")
        if len(self.sources) != len(self.proportions):
            raise ValueError("Sources and proportions must have same lengths")
        for source in self.sources:
            if self.time not in source.time_interval:
                raise ValueError(
                    f"Deme {source.name} does not exist at time {self.time}"
                )
        # Time limits for the destination deme are different to the source deme,
        # because the destination deme is affected immediately after the time
        # of the pulse. Thus, a pulse can occur at the destination deme's
        # start_time, but not at the destination deme's end_time.
        if not (self.dest.start_time >= self.time > self.dest.end_time):
            raise ValueError(
                f"Deme {self.dest.name} does not exist at time {self.time}"
            )

        if sum(self.proportions) > 1 + EPSILON:
            raise ValueError(
                f"Pulse proportions into {self.dest.name} at time {self.time} "
                "sum to more than 1"
            )


@dataclasses.dataclass
class Migration:
    rate: Union[float, None]
    start_time: Union[float, None]
    end_time: Union[float, None]
    source: Deme
    dest: Deme

    @property
    def time_interval(self):
        return Interval(self.start_time, self.end_time)

    def as_json_dict(self) -> dict:
        d = dataclasses.asdict(self)
        d["start_time"] = encode_inf(self.start_time)
        d["source"] = self.source.name
        d["dest"] = self.dest.name
        return d

    def resolve(self):
        if self.start_time is None:
            self.start_time = min(self.source.start_time, self.dest.start_time)
        if self.end_time is None:
            self.end_time = max(self.source.end_time, self.dest.end_time)

    def validate(self):
        if self.start_time <= self.end_time:
            raise ValueError("start_time must be > end_time")
        if self.source.name == self.dest.name:
            raise ValueError("Cannot migrate from a deme to itself")
        for deme in [self.source, self.dest]:
            if not self.time_interval.is_subinterval(deme.time_interval):
                raise ValueError(
                    "Migration time interval must be within each deme's "
                    "time interval"
                )


@dataclasses.dataclass
class Graph:
    time_units: str
    generation_time: Union[float, None]
    doi: List[str]
    description: str
    metadata: dict
    demes: Dict[str, Deme] = dataclasses.field(default_factory=dict)
    migrations: List[Migration] = dataclasses.field(default_factory=list)
    pulses: List[Pulse] = dataclasses.field(default_factory=list)

    def add_deme(
        self,
        name: str,
        description: str,
        start_time: Union[float, None],
        ancestors: List[str],
        proportions: Union[List[float], None],
    ) -> Deme:
        deme = Deme(
            name=name,
            description=description,
            start_time=start_time,
            ancestors=[self.demes[deme_name] for deme_name in ancestors],
            proportions=proportions,
        )
        if deme.name in self.demes:
            raise ValueError(f"Duplicate deme name '{deme.name}'")
        self.demes[deme.name] = deme
        return deme

    def add_migration(
        self,
        *,
        rate: float,
        start_time: Union[float, None],
        end_time: Union[float, None],
        source: Union[str, None],
        dest: Union[str, None],
        demes: Union[List[str], None],
    ) -> List[Migration]:
        migrations: List[Migration] = []
        if not (
            # symmetric
            (demes is not None and source is None and dest is None)
            # asymmetric
            or (demes is None and source is not None and dest is not None)
        ):
            raise ValueError("Must specify either source and dest, or demes")
        if source is not None:
            assert dest is not None
            migrations.append(
                Migration(
                    rate=rate,
                    start_time=start_time,
                    end_time=end_time,
                    source=self.demes[source],
                    dest=self.demes[dest],
                )
            )
        else:
            assert demes is not None
            if len(demes) < 2:
                raise ValueError("Must specify two or more deme names")
            for j, deme_a in enumerate(demes, 1):
                for deme_b in demes[j:]:
                    migration_ab = Migration(
                        rate=rate,
                        start_time=start_time,
                        end_time=end_time,
                        source=self.demes[deme_a],
                        dest=self.demes[deme_b],
                    )
                    migration_ba = Migration(
                        rate=rate,
                        start_time=start_time,
                        end_time=end_time,
                        source=self.demes[deme_b],
                        dest=self.demes[deme_a],
                    )
                    migrations.extend([migration_ab, migration_ba])
        self.migrations.extend(migrations)
        return migrations

    def add_pulse(
        self, sources: List[str], dest: str, time: float, proportions: List[float]
    ):
        pulse = Pulse(
            sources=[self.demes[source] for source in sources],
            dest=self.demes[dest],
            time=time,
            proportions=proportions,
        )
        self.pulses.append(pulse)
        return pulse

    def __str__(self):
        data = self.as_json_dict()
        return pprint.pformat(data, indent=2)

    def as_json_dict(self):
        d = dataclasses.asdict(self)
        d["demes"] = [deme.as_json_dict() for deme in self.demes.values()]
        d["migrations"] = [migration.as_json_dict() for migration in self.migrations]
        d["pulses"] = [pulse.as_json_dict() for pulse in self.pulses]
        return d

    def validate(self):
        if self.generation_time is None:
            if self.time_units == "generations":
                self.generation_time = 1
            else:
                raise ValueError(
                    "Must specify Graph.generation_time if time_units is not "
                    "'generations'"
                )
        if self.time_units == "generations" and self.generation_time != 1:
            raise ValueError(
                "If time_units are in generations, generation_time must be 1"
            )
        for deme in self.demes.values():
            deme.validate()
        for pulse in self.pulses:
            pulse.validate()
        for migration in self.migrations:
            migration.validate()

        # Migrations involving the same source and dest can't overlap temporally.
        for j, migration_a in enumerate(self.migrations, 1):
            for migration_b in self.migrations[j:]:
                if (
                    migration_a.source == migration_b.source
                    and migration_a.dest == migration_b.dest
                    and migration_a.time_interval.intersects(migration_b.time_interval)
                ):
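                    # The overlap begins at the more recent of the two start
                    # times and ends at the more ancient of the two end times.
                    start_time = min(migration_a.start_time, migration_b.start_time)
                    end_time = max(migration_a.end_time, migration_b.end_time)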
                    raise ValueError(
                        f"Competing migration definitions for {migration_a.source.name} "
                        f"and {migration_a.dest.name} during time interval "
                        f"({start_time}, {end_time}]"
                    )

        # The rate of migration entering a deme cannot be more than 1 in any
        # given interval of time.
        time_boundaries = set()
        time_boundaries.update(migration.start_time for migration in self.migrations)
        time_boundaries.update(migration.end_time for migration in self.migrations)
        time_boundaries.discard(math.inf)
        end_times = sorted(time_boundaries, reverse=True)
        start_times = [math.inf] + end_times[:-1]
        ingress_rates = {deme_name: [0.0] * len(end_times) for deme_name in self.demes}
        for j, (start_time, end_time) in enumerate(zip(start_times, end_times)):
            current_interval = Interval(start_time, end_time)
            for migration in self.migrations:
                if current_interval.intersects(migration.time_interval):
                    rate = ingress_rates[migration.dest.name][j] + migration.rate
                    if rate > 1 + EPSILON:
                        raise ValueError(
                            f"Migration rates into {migration.dest.name} sum to "
                            "more than 1 during the time interval "
                            f"({start_time}, {end_time}]"
                        )
                    ingress_rates[migration.dest.name][j] = rate

    def resolve(self):
        # A deme's ancestors must be listed before it, so any deme we
        # visit must always be visited after its ancestors.
        for deme in self.demes.values():
            deme.resolve()
        for migration in self.migrations:
            migration.resolve()

        # Sort pulses from oldest to youngest.
        # In a discrete-time setting, non-integer pulse times that are distinct
        # could be rounded to the same time value. If the input file has the pulses
        # in time-ascending order, then the pulses would occur in the opposite order
        # compared to a continuous-time simulator. Sorting before the rounding
        # occurs avoids this ambiguity, so we explicitly require pulses to be
        # sorted as part of the parser.
        # Note that Python implements "stable" sorting, which maintains the order
        # of pulses that have the same time value to start with (as required by
        # the spec).
        self.pulses.sort(key=lambda pulse: pulse.time, reverse=True)

Appendix#

Converting backwards time to forwards time#

Times in Demes models use a backwards-time convention, where the value 0 represents now and time values increase towards the past. However, many simulators use the opposite convention, where time 0 represents some time in the past and time values increase towards the present.

To convert times in a Demes model into a forward-time representation:

  • Set y equal to the minimum epoch end time in the resolved graph.

  • Set x equal to the most ancient finite value among all epoch start_time, epoch end_time, migration start_time, and pulse time values.

  • The model duration is d = x - y.

  • Using the convention of starting a forward-in-time model at time zero (representing the parental generation at the beginning of a model), the model runs forward in time over the interval (0, d].

  • For explicit simulations involving a “burn-in” period of length b, the previous interval is shifted by that length: the burn-in period spans (0, b], and the events in the Demes graph occur in (b, b + d].

  • Given these definitions, f = b + d - t, where t is a backwards time in the Demes model and f is the forwards-time equivalent.
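The steps above can be sketched in Python as follows. This is a minimal illustration rather than part of the reference implementation: the function name `forward_time_converter` and its arguments are hypothetical, and the caller is assumed to have already collected the relevant times from a resolved graph. The toy model's most recent epoch ends at time 0, so y = 0.

```python
import math


def forward_time_converter(epoch_end_times, event_times, burn_in=0.0):
    """Build a backwards-to-forwards time converter (hypothetical helper).

    epoch_end_times: the end_time of every epoch in the resolved graph.
    event_times: every epoch start_time and end_time, migration start_time,
        and pulse time in the resolved graph; may contain math.inf.
    burn_in: length b of an optional burn-in period, in model time units.

    Returns (d, to_forward), where d is the model duration and
    to_forward(t) applies f = b + d - t.
    """
    y = min(epoch_end_times)
    # Infinite start times are skipped, because x must be finite.
    x = max(t for t in event_times if math.isfinite(t))
    d = x - y

    def to_forward(t):
        return burn_in + d - t

    return d, to_forward


# Toy model (assumed values): an ancestral deme splits into two demes
# 100 time units ago, and a pulse occurs 50 time units ago.
epoch_end_times = [100, 0, 0]
event_times = [math.inf, 100, 100, 0, 100, 0, 50]
d, to_forward = forward_time_converter(epoch_end_times, event_times, burn_in=1000)

split_f = to_forward(100)   # the split happens right after the burn-in
pulse_f = to_forward(50)
present_f = to_forward(0)   # the present day is the last forward time
```

With a burn-in of 1000, the split maps to forward time 1000, the pulse to 1050, and the present to 1100, so all events in the graph fall within (b, b + d] or on its lower boundary b.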