Specification#
Introduction#
Demes is a specification for describing population genetic models of demographic history. This specification precisely defines the population genetics model and its assumptions, along with the data model used for interchange and the required behaviour of implementations.
The Demes standard is largely agnostic to the processes that occur within populations, and provides a minimal set of parameters that can accommodate a wide spectrum of population genetic models.
Who is this specification for?
This specification is intended to provide a detailed and definitive resource for the following groups:
Those implementing support for Demes as an input or output format in their programs
Those implementing a Demes parser
As such, this specification contains a lot of detail that is not interesting to most users. If you wish to learn how to understand and create your own Demes models, please see the Tutorial instead.
Note to Readers#
To provide feedback on this specification, please use the issue tracker.
Conventions and Terminology#
The term “Demes” in this document is to be interpreted as a reference to this specification. A deme refers to a set of individuals that can be modelled by a fixed set of parameters; to avoid confusion with the name of the specification we will usually use the term “population”, with the understanding that the two terms are equivalent for the purposes of this document.
Todo
Define the human and machine data models. And link to assumptions.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
The terms “JSON”, “JSON text”, “JSON value”, “member”, “element”, “object”, “array”, “number”, “string”, “boolean”, “true”, “false”, and “null” in this document are to be interpreted as defined in RFC 8259.
The term “JSON Schema” in this document is to be interpreted as defined in the JSON Schema core specification.
Infinity#
JSON does not define an encoding for infinite-valued numbers.
However, infinite values are used in Demes for start times.
When writing a Demes model to a format that does not permit infinity,
such as JSON, the string “Infinity” must be used to encode infinity.
When writing to formats that do support infinity, such as YAML, the native
encoding for infinity should be used instead (.inf
in YAML).
When reading a Demes model, the string “Infinity” must be decoded to mean
an infinite-valued number. When reading from formats that do support infinity,
the format’s native encoding for infinite-valued numbers must also be
supported.
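As a non-normative illustration, the following sketch shows one way to encode and decode infinite start times when writing and reading JSON (the helper function names are illustrative only):
import json
import math

def encode_inf(value):
    # Encode IEEE positive infinity as the string "Infinity" for JSON output.
    return "Infinity" if isinstance(value, float) and math.isinf(value) else value

def decode_inf(value):
    # Decode the string "Infinity" back into an infinite-valued number.
    return math.inf if value == "Infinity" else value

encoded = json.dumps({"start_time": encode_inf(math.inf)})
assert json.loads(encoded)["start_time"] == "Infinity"
assert decode_inf(json.loads(encoded)["start_time"]) == math.inf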
Machine Data Model#
The Demes Machine Data Model (MDM) is a formal representation of the Demes model as a JSON document. The MDM is designed to be used as input by programs such as population genetics simulators, and explicitly includes all necessary details. The structure of JSON documents conforming to the specification is formally defined in the Schema, and the detailed requirements for each of the elements in this data model are defined in this section.
The Demes Human Data Model (HDM) is a closely related specification that is intended to be easily human-readable and writable. An MDM document is also a valid HDM document. An MDM document is constructed by resolving and validating an HDM document.
Common concepts#
This section provides details on properties that occur in multiple contexts.
Time#
Times are specified as amounts of time in the past, so that time zero corresponds to the final generation or “now”, and event times in the past are values greater than zero, with larger values for events that occur in the more distant past. By default, time is measured in generations, but other units (“years”, for example) are allowed. When time units are not given in generations, the generation time must also be specified so that times can be converted into generations. In general, as time flows from the past to the present, populations, epochs, and migration events should be specified in their order of appearance, so that their times are in descending order.
Population sizes#
A fundamental concept in demes is the population size.
Todo
Things to cover:
We’re counting individuals not genomes
We usually mean the population size in expectation, but there are no hard requirements. For example, it’s up to the implementation whether it thinks a population size of 0.33 is meaningful. Clarify with some examples from forward and backward sims.
What do proportions mean? Similar point to pop size above. Give forward pointers to sections we mention proportions in.
What do we mean by migration of individuals? Forward pointers to sections.
Metadata#
Todo
Discussion of metadata. What’s it for?
MDM documents#
The top-level MDM document describes a single instance of a Demes model.
Each MDM document contains the following list of properties. All properties MUST be specified, and additional properties MUST NOT be included in these documents. Please see the Schema for definitive details on the types and structure of these properties.
description#
A concise description of the demographic model.
doi#
The DOI(s) of the publication(s) in which the model was inferred or originally described.
metadata#
An object containing arbitrary additional properties and values. May be empty.
time_units#
The units of time used to specify times and time intervals. These SHOULD be one of “generations” or “years”.
generation_time#
The number by which times must be divided, to convert them to have units of generations. Hence generation_time uses the same time units specified by time_units.
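For example (an illustrative sketch, not part of the data model), converting a time value into generations amounts to a single division:
def time_in_generations(time, generation_time):
    # Times are divided by generation_time to express them in generations.
    return time / generation_time

# With time_units "years" and generation_time 25, an event 5000 years ago
# occurred 200 generations ago.
assert time_in_generations(5000, 25) == 200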
demes#
The list of demes in the model. At least one deme MUST be specified.
pulses#
The list of pulses in the model.
migrations#
The list of migrations in the model.
Deme#
A Deme is a single population (see the Conventions and Terminology for clarification of these two terms) that exists for some non-empty time interval. A population is defined operationally as some set of individuals that can be modelled by a set of fixed parameters over a series of epochs. Population parameters are defined per epoch, and are defined in the Epoch section below.
A population may have one or more ancestors, which are other populations that exist at the population’s start time. If one ancestor is specified, the first generation is constructed by randomly sampling parents from the ancestral population to contribute to offspring in the newly generated population.
If more than one ancestor is specified, the proportions of ancestry from each contributing population must be provided, and those proportions must sum to one. In this case, parents are chosen randomly from each ancestral population with probability given by those proportions. If no ancestors are specified, the population is assumed to have start time equal to infinity.
The deme may be a descendant of one or more demes in the graph, and may be an ancestor to others. The deme exists over the half-open time interval (start_time, end_time], and it may continue to exist after contributing ancestry to a descendant deme. The deme’s end_time is implicit (there is no end_time deme property), but for convenience we define it as the end_time of the deme’s last epoch.
name#
A string identifier for a deme, which MUST be unique among all demes in a document. The name MUST be a valid Python identifier.
description#
A concise description of the deme.
ancestors#
The list of ancestors of the deme at the start of the deme’s first epoch. May be an empty list if the deme has no ancestors in the graph, in which case the start_time must be infinite. Each ancestor must be in the graph, and each ancestor must be specified only once. A deme must not be one of its own ancestors.
proportions#
The proportions of ancestry derived from each of the ancestors at the start of the deme’s first epoch. The proportions must be ordered to correspond with the order of ancestors. The proportions must be an empty list or sum to 1 (within a reasonable tolerance, e.g. 1e-9). See the Population sizes section for more details on how these proportions should be interpreted.
start_time#
The most ancient time at which the deme exists, in time_units before the present. Demes with no ancestors are root demes and must have an infinite start_time. Otherwise, the start_time must correspond with the interval of existence for each of the deme’s ancestors. I.e. the start_time must be within the half-open interval (deme.start_time, deme.end_time] for each deme in ancestors.
epochs#
The list of epochs for this deme. There MUST be at least one epoch for each deme.
Epoch#
A deme-specific period of time spanning the half-open interval (start_time, end_time], in which a fixed set of population parameters apply. The epoch’s start_time is implicit (there is no start_time epoch property), but for convenience we define it as the end_time of the previous epoch, or the deme’s start_time if it is the first epoch.
Each epoch specifies the population size over that interval, which can be a constant value or a function defined by start and end sizes; sizes must remain positive. If an epoch has a start time of infinity, the population size for that epoch must be constant.
Epochs can also specify parameters for nonrandom mating, such as selfing or cloning rates, which give the probability that offspring are generated from one generation to the next by self-fertilisation or cloning of an individual. Selfing and cloning rates take values between zero and one.
end_time#
The most recent time of the epoch, in time_units before the present.
start_size#
The population size at the epoch’s start_time.
end_size#
The population size at the epoch’s end_time.
size_function#
A function describing the population size change between start_time and end_time. This may be any string, but the values “constant” and “exponential” are explicitly acknowledged to have the following meanings.
constant: the deme’s size does not change over the epoch. start_size and end_size must be equal.
exponential: the deme’s size changes exponentially from start_size to end_size over the epoch. If t is a time within the span of the epoch, the deme size N at time t can be calculated as:
dt = (epoch.start_time - t) / (epoch.start_time - epoch.end_time)
r = log(epoch.end_size / epoch.start_size)
N = epoch.start_size * exp(r * dt)
size_function must be constant if the epoch has an infinite start_time.
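The following non-normative Python sketch implements the exponential size calculation above (the argument names mirror the epoch properties):
import math

def epoch_size_at(t, start_time, end_time, start_size, end_size):
    # Deme size at time t within the epoch, per the exponential size_function.
    dt = (start_time - t) / (start_time - end_time)
    r = math.log(end_size / start_size)
    return start_size * math.exp(r * dt)

# An epoch from time 100 to time 0, growing from 100 to 1000 individuals.
assert math.isclose(epoch_size_at(100, 100, 0, 100, 1000), 100)
assert math.isclose(epoch_size_at(0, 100, 0, 100, 1000), 1000)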
cloning_rate#
The proportion of offspring in each generation that are expected to be generated through clonal reproduction. The remaining proportion, 1 - cloning_rate, are expected to arise through sexual reproduction.
selfing_rate#
Within the sexually-reproduced offspring, a proportion selfing_rate are born via self-fertilisation, while the rest have parents drawn at random from the previous generation.
Note
Depending on the simulator, this random drawing of parents may occur either with or without replacement. When drawing occurs with replacement, a small amount of residual selfing is expected, so that even with cloning_rate=0 and selfing_rate=0, selfing may still occur with probability 1/N. Simulators that allow variable rates of selfing are expected to clearly document their behaviour.
Pulse#
An instantaneous pulse of migration at a given time, from a list of source demes (sources) into the destination deme (dest).
Pulse migration events specify the instantaneous replacement of a given fraction of individuals in a destination population by individuals with parents from a source population. The fraction must be between zero and one, and if more than one pulse occurs at the same time, those replacement events are applied sequentially in the order that they are specified in the model. The list of pulses must be sorted in time-descending order.
sources#
The list of deme names of the migration sources.
dest#
The deme name of the migration destination.
time#
The time of migration, in time_units before the present. The demes specified by sources and dest must all exist at the given time. I.e. time must be contained in the (deme.start_time, deme.end_time] interval of each of the sources demes and the dest deme.
proportions#
The proportions of ancestry in the dest deme derived from the demes in sources immediately after the time of migration. The proportions must be ordered to correspond with the order of sources. The proportions must sum to less than or equal to 1 (within a reasonable tolerance, e.g. 1e-9). See the Population sizes section for more details on how proportions should be interpreted.
Example: sequential application of pulses#
Consider the following model:
time_units: generations
demes:
- name: A
  epochs:
  - start_size: 1000
- name: B
  epochs:
  - start_size: 1000
- name: C
  epochs:
  - start_size: 1000
pulses:
- sources: [A]
  dest: C
  proportions: [0.25]
  time: 10
- sources: [B]
  dest: C
  proportions: [0.2]
  time: 10
Ten (10) generations ago, pulse events occur from source demes A and B into destination deme C. We need to arrive at the final ancestry proportions for destination deme C after this time.
Software implementing pulse events must generate output that is equivalent to the following
procedure.
The steps are:
Initialize an array of zeros with length equal to the number of demes.
Set the ancestry proportion of the destination deme to 1.
For each pulse:
a. Multiply the array by one (1) minus the sum of the pulse's proportions.
b. For each source, add its proportion to the array entry for that source deme.
For the above model, the steps are:
1. x = [0, 0, 0]
2. x = [0, 0, 1]
3. First pulse (from A): p = 1 - 0.25
   x = x*p = [0, 0, 0.75]
   x[A] += 0.25, so x = [0.25, 0, 0.75]
   Second pulse (from B): p = 1 - 0.2
   x = x*p = [0.2, 0, 0.6]
   x[B] += 0.2, so x = [0.2, 0.2, 0.6]
Thus, our final ancestry proportions for deme C after time 10 are [0.2, 0.2, 0.6].
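A non-normative Python sketch of this procedure, applied to the model above, is shown below (the dictionary-based pulse representation is illustrative only):
import math

def pulse_ancestry(deme_names, dest, pulses):
    # Ancestry proportions of the dest deme immediately after applying all
    # pulses (specified at the same time) in the order given.
    x = {name: 0.0 for name in deme_names}
    x[dest] = 1.0
    for pulse in pulses:
        p = 1 - sum(pulse["proportions"])
        for name in x:
            x[name] *= p
        for source, proportion in zip(pulse["sources"], pulse["proportions"]):
            x[source] += proportion
    return x

pulses = [
    {"sources": ["A"], "proportions": [0.25]},
    {"sources": ["B"], "proportions": [0.2]},
]
result = pulse_ancestry(["A", "B", "C"], "C", pulses)
expected = {"A": 0.2, "B": 0.2, "C": 0.6}
assert all(math.isclose(result[k], v) for k, v in expected.items())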
Important considerations#
The final ancestry proportions depend on the order of the pulses in the model! If we reverse the above model such that:
pulses:
- sources: [B]
  dest: C
  proportions: [0.2]
  time: 10
- sources: [A]
  dest: C
  proportions: [0.25]
  time: 10
we get [0.25, 0.15, 0.6] as our ancestry proportions due to pulses. The fact that the outcome of applying sequential pulses depends on the order is why demes-python emits a warning when resolving such models.
Given the procedure used to apply sequential pulses at the same time, the following two sets of pulses are not equivalent:
pulses:
- sources: [A]
  dest: C
  proportions: [0.2]
  time: 10
- sources: [B]
  dest: C
  proportions: [0.2]
  time: 10
pulses:
- sources: [B, A]
  dest: C
  proportions: [0.2, 0.2]
  time: 10
Therefore, we strongly recommend that models be represented using the following syntax, which makes the intended outcome of the model explicit:
pulses:
- sources: [A, B]
  dest: C
  proportions: [0.2, 0.2]
  time: 10
Migration#
Continuous asymmetric migration over the half-open time interval
(start_time, end_time]
, from the deme with name source
to the
deme with name dest
.
Rates are defined as the probability that parents in the “destination”
population are chosen from the “source” population. Migration rates are thus
per generation and must be less than or equal to one.
There must be at most one migration specified per source/destination pair for any given time interval. Furthermore, if more than one source population has continuous migration into the same destination population, the sum of those migration rates must also be less than or equal to one, as rates define probabilities. The probability that parents come from the same population is just one minus the sum of incoming migration rates.
Warning
When continuous migration occurs over a time period that includes a pulse, the continuous migration probabilities define the probability of choosing parents from each deme conditional on individuals not arriving via the pulse.
source#
The deme name of the asymmetric migration source.
dest#
The deme name of the asymmetric migration destination.
start_time#
The time at which migration begins, in time_units before the present. The start_time must be contained in the [deme.start_time, deme.end_time) interval of both the source deme and the dest deme.
end_time#
The time at which migration stops, in time_units before the present. The end_time must be contained in the (deme.start_time, deme.end_time] interval of both the source deme and the dest deme.
rate#
The rate of migration per generation.
Schema#
The schema listed here is definitive in terms of types and the structure of the JSON documents that are considered to be valid instances of the MDM.
$schema: http://json-schema.org/draft-07/schema#
title: Fully qualified Demes graph
type: object
additionalProperties: false
properties:
description:
type: "string"
doi:
type: array
items:
type: string
time_units:
type: string
# TODO: shouldn't this be an enum?
generation_time:
type: "number"
exclusiveMinimum: 0
metadata:
type: object
additionalProperties: true
demes:
type: array
minItems: 1
items:
$ref: '#/definitions/deme'
pulses:
type: array
items:
$ref: '#/definitions/pulse'
migrations:
type: array
items:
$ref: '#/definitions/migration'
required:
- description
- doi
- time_units
- demes
- generation_time
- pulses
- migrations
definitions:
name:
type: string
rate:
type: number
minimum: 0
maximum: 1
proportion:
type: number
exclusiveMinimum: 0
maximum: 1
size:
type: number
exclusiveMinimum: 0
start_time:
oneOf:
- type: number
exclusiveMinimum: 0
- const: "Infinity"
end_time:
type: number
minimum: 0
epoch:
type: object
additionalProperties: false
properties:
end_time:
$ref: '#/definitions/end_time'
start_size:
$ref: '#/definitions/size'
end_size:
$ref: '#/definitions/size'
size_function:
# TODO: make this an enumeration
type: string
cloning_rate:
$ref: '#/definitions/rate'
selfing_rate:
$ref: '#/definitions/rate'
required:
- end_time
- start_size
- end_size
- size_function
- cloning_rate
- selfing_rate
deme:
type: object
additionalProperties: false
properties:
name:
$ref: '#/definitions/name'
description:
type: "string"
ancestors:
type: array
items:
$ref: '#/definitions/name'
proportions:
type: array
items:
$ref: '#/definitions/proportion'
start_time:
$ref: '#/definitions/start_time'
epochs:
type: array
minItems: 1
items:
$ref: '#/definitions/epoch'
required:
- name
- description
- ancestors
- proportions
- start_time
- epochs
pulse:
type: object
additionalProperties: false
properties:
sources:
type: array
items:
$ref: '#/definitions/name'
dest:
$ref: '#/definitions/name'
time:
type: number
exclusiveMinimum: 0
proportions:
type: array
items:
$ref: '#/definitions/proportion'
required:
- sources
- dest
- time
- proportions
migration:
type: object
additionalProperties: false
properties:
source:
$ref: '#/definitions/name'
dest:
$ref: '#/definitions/name'
start_time:
$ref: '#/definitions/start_time'
end_time:
$ref: '#/definitions/end_time'
rate:
$ref: '#/definitions/rate'
required:
- source
- dest
- start_time
- end_time
- rate
Human Data Model#
The Demes Human Data Model (HDM) is an extension of the Machine Data Model that is designed for human readability. The HDM provides default values for many parameters, removes redundant information in the MDM via rules described in this section and also provides a default value replacement mechanism. JSON documents conforming to the HDM are intended to be processed by a parser, which outputs the corresponding MDM document.
This section defines the structure of HDM documents, the rules by which they are transformed into MDM documents, and the error conditions that should be detected by parsers.
We also provide a reference implementation of a parser for the HDM (which also parses the MDM, by definition) written in Python. This implementation is intended to clarify any ambiguities there may be in this specification, but is not intended to be used directly in downstream software. Please use the demes Python library instead.
Defaults#
Repeated values such as shared population sizes represent a significant opportunity for error in human-generated models. The HDM provides the default value propagation mechanism to avoid this repetition. The essential idea is that we declare default values hierarchically within the document, and that the ultimate value assigned to a property is prioritised by proximity within the document hierarchy.
Default values can be provided in two places within a HDM document: at the top-level or within a deme definition.
See also
See the tutorial for examples of how the defaults section can be used.
See also
See the reference implementation for a practical example of how defaults in the HDM can be implemented.
Top-level defaults#
The top-level HDM document may contain a property defaults, which defines values to use for entities within the rest of the document unless otherwise specified. The defaults object can have the following properties:
epoch: this can specify any property valid for an MDM epoch. All epochs in the document will be assigned these properties, unless specified within the epoch or the deme defaults.
migration: this can specify any property valid for an MDM migration. All migrations in the document will be assigned these properties, unless specified within the migration itself.
pulse: this can specify any property valid for an MDM pulse. All pulses in the document will be assigned these properties, unless specified within the pulse itself.
deme: this can specify the following properties for a deme only: description, ancestors, proportions and start_time.
See also
See the schema for definitive information on the structural properties of the top-level defaults section.
Deme defaults#
Deme defaults operate in the same manner as top-level defaults: the specified values will be used if omitted in any epochs within the deme. Defaults specified within a deme override any values specified in the top-level deme defaults.
Deme-level defaults can specify any property valid for an MDM epoch. Any epochs in the deme will be assigned these properties, unless specified within an epoch itself.
See also
See the schema for definitive information on the structural properties of the deme defaults section.
Resolution#
The Demes machine data model contains many values that are technically redundant, in that they can be reliably inferred from other values in the model. For example, if a deme’s size_function is "constant" during an Epoch, then clearly the start_size and end_size will be equal. The MDM still requires that both be specified, because it is intended for machine consumption, and having a fully specified and complete data model allows code that consumes this model to be simple and straightforward. However, such redundancy is a significant downside for human consumption, where having repeated or redundant values leads to poorer readability and increases the probability of errors.
Thus, one of the differences between the Demes Human Data Model and Machine Data Model is that the HDM tries to remove as much redundancy as possible. A major part of a Demes parser implementation’s task is to fill in the redundant information, a process that we refer to as model “resolution”. Please consult the reference implementation for more detailed information.
Resolution is idempotent; that is, resolution of an already resolved model (i.e., in MDM form) MUST result in identical output. Thus a parser need not know if a model is in HDM or MDM form a priori.
Resolution happens in a set of steps in a defined order:
Todo
Resolution order matters for some things, but not for others. Clarify where order matters (and why).
time_units#
time_units must be specified. The value “generations” is special, in that it implies that the generation_time will be 1 and may thus be omitted.
generation_time#
If time_units is not “generations”, then generation_time MUST be specified.
If time_units is “generations”, then:
1. generation_time may be omitted, in which case it shall be given the value 1.
2. An error shall be raised if generation_time is not 1.
metadata#
If metadata is omitted, it shall be given the value of an empty dictionary. If metadata is present, the value must be a dictionary, but metadata is otherwise transferred to the output without further processing. Errors may be raised if metadata is not parsable (e.g. invalid YAML), but the parser shall not attempt to validate fields within the metadata.
description#
If description is omitted, it shall be given the value of an empty string.
doi#
If doi is omitted, it shall be given the value of an empty list.
defaults#
If top-level defaults is provided, default values shall be validated to the extent possible, to avoid propagating invalid values. E.g. defaults.epoch.start_size cannot be negative.
Deme resolution#
Each deme is resolved in the order that it occurs in the input. A deme can only be resolved if its ancestor demes have already been resolved, and the parser MUST raise an error if a deme is encountered that has unresolved ancestors. Thus, valid input files list the demes in topologically sorted order, such that ancestors are listed before their descendants.
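For illustration only, a minimal check of this ordering requirement might look like the following sketch (assuming demes are plain dictionaries with "name" and "ancestors" keys):
def check_deme_order(demes):
    # Ancestors must be listed before their descendants.
    resolved = set()
    for deme in demes:
        for ancestor in deme.get("ancestors", []):
            if ancestor not in resolved:
                raise ValueError(
                    f"deme {deme['name']} has unresolved ancestor {ancestor}"
                )
        resolved.add(deme["name"])

check_deme_order([{"name": "X", "ancestors": []}, {"name": "Y", "ancestors": ["X"]}])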
Resolution order:
defaults#
If deme-level defaults is provided, default values shall be validated to the extent possible, to avoid propagating invalid values. E.g. defaults.epoch.start_size cannot be negative. For each deme, deme-level defaults override top-level defaults.
description#
If description is omitted,
1. If the deme.description defaults field is present, description shall be given this value.
2. Otherwise, description shall be given the value of the empty string.
ancestors#
If ancestors is omitted,
1. If the deme.ancestors defaults field is present, ancestors shall be given this value.
2. Otherwise, ancestors shall be given the value of the empty list.
proportions#
If proportions is omitted,
1. If the deme.proportions defaults field is present, proportions shall be given this value.
2. Otherwise, if ancestors has length one, proportions shall be a single-element list containing the element 1.0.
3. Otherwise, if ancestors has length zero, proportions shall be given the value of the empty list.
4. Otherwise, proportions cannot be determined and an error MUST be raised.
start_time#
If start_time is omitted,
1. If the deme.start_time defaults field is present, start_time shall be given this value.
2. Otherwise, if ancestors has length one and the ancestor has an end_time > 0, the ancestor's end_time value shall be used.
3. Otherwise, if ancestors has length zero, start_time shall be given the value infinity.
4. Otherwise, start_time cannot be determined and an error MUST be raised.
Epoch resolution#
Epochs are listed in time-descending order (from oldest to youngest), and population sizes are inherited from older epochs. Resolution order:
If a deme’s epochs field is omitted, it will be given the value of a single-element list, where the list element has the value of an epoch with all fields omitted. This may produce a valid epoch during subsequent resolution, e.g. if the epoch.start_size defaults field has a value.
end_time#
If end_time is omitted,
1. If the epoch.end_time defaults field is present, end_time shall be given this value.
2. Otherwise, if this is the last epoch, end_time shall be given the value 0.
3. Otherwise, end_time cannot be determined and an error MUST be raised.
The end_time value of the first epoch MUST be strictly smaller than the deme’s start_time. The end_time values of successive epochs MUST be strictly decreasing.
start_size, end_size#
Note
Sizes are never inherited from ancestors.
If start_size is omitted and the epoch.start_size defaults field is present, then the epoch’s start_size shall be given this value. If end_size is omitted and the epoch.end_size defaults field is present, then the epoch’s end_size shall be given this value.
In the first epoch,
1. At least one of start_size or end_size MUST be specified (possibly via a defaults field).
2. If start_size is omitted (and no default exists), it shall be given the same value as end_size.
3. If end_size is omitted (and no default exists), it shall be given the same value as start_size.
4. If the deme’s start_time is infinite, start_size MUST have the same value as end_size.
In subsequent epochs,
1. If start_size is omitted (and no defaults exist), it shall be given the same value as the previous epoch’s end_size.
2. If end_size is omitted (and no defaults exist), it shall be given the same value as start_size.
size_function#
If size_function is omitted,
1. If the epoch.size_function defaults field is present, size_function shall be given this value.
2. Otherwise, if start_size has the same value as end_size, size_function will be given the value "constant".
3. Otherwise, size_function will be given the value "exponential".
selfing_rate#
If selfing_rate is omitted,
1. If the epoch.selfing_rate defaults field is present, selfing_rate shall be given this value.
2. Otherwise, selfing_rate shall be given the value 0.
cloning_rate#
If cloning_rate is omitted,
1. If the epoch.cloning_rate defaults field is present, cloning_rate shall be given this value.
2. Otherwise, cloning_rate shall be given the value 0.
Migration resolution#
Migrations must be resolved after all demes are resolved.
Asymmetric migration can be specified using the source and dest properties, or symmetric migration can be specified using the demes property to list the names of the participating demes. Each symmetric migration is resolved into two asymmetric migrations (one in each direction) for each pair of participating demes.
Resolution order:
rate#
If the rate is omitted,
1. If the migration.rate defaults field is present, rate shall be given this value.
2. Otherwise, an error MUST be raised.
source#
If source is omitted and the migration.source defaults field is present, source shall be given this value.
dest#
If dest is omitted and the migration.dest defaults field is present, dest shall be given this value.
demes#
If demes is omitted and the migration.demes defaults field is present, demes shall be given this value.
Symmetric migration#
The following rules shall determine the mode of migration (either asymmetric or symmetric):
1. If demes does not have a value, and both source and dest have values, the migration is asymmetric. Resolution continues from start_time.
2. If demes has a value, and neither source nor dest have values, the migration is symmetric.
3. Otherwise, the mode of migration cannot be determined, and an error MUST be raised.
If the migration is symmetric, demes MUST be validated before further resolution:
1. demes MUST be a list of at least two deme names.
2. Each element of demes must be unique.
3. Each element of demes must be the name of a resolved deme.
If any of the previous conditions are not met, an error MUST be raised.
If the migration is symmetric, two new asymmetric migrations shall be constructed for each pair of deme names in demes. E.g. if demes = ["a", "b", "c"], then asymmetric migrations shall be constructed for the following cases:
source="a", dest="b",
source="b", dest="a",
source="a", dest="c",
source="c", dest="a",
source="b", dest="c",
source="c", dest="b".
Values for rate, start_time, and end_time for the new asymmetric migrations shall be taken from the symmetric migration. If start_time and/or end_time are omitted from the symmetric migration, these shall also be omitted for the new asymmetric migrations.
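As a non-normative sketch, this expansion can be expressed with ordered pairs of deme names (the dictionary representation of a migration here is illustrative only):
import itertools

def expand_symmetric(demes, rate, start_time=None, end_time=None):
    # One asymmetric migration per ordered pair of participating demes.
    # Omitted start_time/end_time values remain omitted (None) and are
    # resolved later for each asymmetric migration separately.
    return [
        {"source": a, "dest": b, "rate": rate,
         "start_time": start_time, "end_time": end_time}
        for a, b in itertools.permutations(demes, 2)
    ]

# demes = ["a", "b", "c"] yields the six source/dest pairs listed above.
assert len(expand_symmetric(["a", "b", "c"], rate=1e-4)) == 6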
Resolution now proceeds separately for each distinct asymmetric migration.
Note
The symmetric migration shall not appear in the MDM output. Once the symmetric migration has been resolved into the corresponding asymmetric migrations, the symmetric migration may be discarded.
start_time#
If start_time is omitted,
1. If the migration.start_time defaults field has a value, start_time shall be given this value.
2. Otherwise, start_time shall be the oldest time at which both the source and dest demes exist. I.e. min(source.start_time, dest.start_time).
end_time#
If end_time is omitted,
1. If the migration.end_time defaults field has a value, end_time shall be given this value.
2. Otherwise, end_time shall be the most recent time at which both the source and dest demes exist. I.e. max(source.end_time, dest.end_time).
Pulse resolution#
Pulses must be resolved after all demes are resolved.
Resolution order:
sources#
If sources is omitted,
1. If the pulse.sources defaults field has a value, sources shall be given this value.
2. Otherwise, an error MUST be raised.
proportions#
If proportions is omitted,
1. If the pulse.proportions defaults field has a value, proportions shall be given this value.
2. Otherwise, an error MUST be raised.
dest#
If dest is omitted,
1. If the pulse.dest defaults field has a value, dest shall be given this value.
2. Otherwise, an error MUST be raised.
time#
If time is omitted,
1. If the pulse.time defaults field has a value, time shall be given this value.
2. Otherwise, an error MUST be raised.
Sort pulses#
Pulses MUST be sorted in time-descending order (from oldest to youngest). A stable sorting algorithm MUST be used to avoid changing the model interpretation when multiple pulses are specified with the same time value.
Note
In a discrete-time setting, non-integer pulse times that are distinct could be rounded to the same time value. If pulses are in time-ascending order when times are rounded, then the pulses would be applied in the opposite order compared to a continuous-time setting. Sorting in time-descending order avoids this discrepancy.
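In Python, for example, the built-in sort is stable, so a sketch of this step could be (the pulse representation is illustrative only):
pulses = [
    {"sources": ["A"], "dest": "C", "proportions": [0.1], "time": 10},
    {"sources": ["B"], "dest": "C", "proportions": [0.2], "time": 20},
]
# Stable sort in time-descending order; pulses with equal times retain
# their relative order from the input model.
pulses.sort(key=lambda pulse: pulse["time"], reverse=True)
assert [pulse["time"] for pulse in pulses] == [20, 10]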
Validation#
Note
It may be convenient to perform some or all validation during model resolution. E.g. to avoid code duplication, or to provide better error messages to the user.
Following resolution, the model must be validated against the MDM schema. This includes checking:
all required properties now have values,
no additional properties are present (except where permitted by the schema),
the types of properties match the schema,
the values are within the ranges specified (noting that infinity is permitted only for deme start_time and for migration start_time).
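One way to perform this schema check in Python is sketched below (non-normative; it assumes the third-party jsonschema package and that the MDM schema above has been loaded into a dictionary):
import jsonschema

def validate_against_mdm_schema(document: dict, mdm_schema: dict) -> None:
    # Raises jsonschema.ValidationError if the resolved document does not
    # conform to the MDM schema (required properties, types, ranges).
    jsonschema.validate(instance=document, schema=mdm_schema)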
In addition to validation against the schema, the following constraints must be checked to ensure overall consistency of the model. If any condition is not met, an error must be raised.
generation_time#
If time_units is “generations”, then generation_time must be 1.
demes#
There must be at least one deme.
Each deme’s name must be unique in the model.
name must be a valid Python identifier.
If start_time is infinity, ancestors must be an empty list.
If ancestors is an empty list, start_time must have the value infinity.
No deme may appear in its own ancestors list.
Each element of the ancestors list must be unique.
The proportions list must have the same length as the ancestors list.
If the proportions list is not empty, then the values must sum to 1 (within a reasonable tolerance, e.g. 1e-9; see the sketch below).
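For example, a tolerance check like the following non-normative sketch could be used:
import math

def proportions_sum_to_one(proportions, rel_tol=1e-9):
    # An empty list is permitted; otherwise the values must sum to 1
    # within a reasonable tolerance.
    return len(proportions) == 0 or math.isclose(sum(proportions), 1.0, rel_tol=rel_tol)

assert proportions_sum_to_one([])
assert proportions_sum_to_one([0.3, 0.7])
assert not proportions_sum_to_one([0.3, 0.6])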
epochs#
Each deme must have at least one epoch.
The end_time values of successive epochs must be strictly descending (ordered from the past towards the present).
The end_time values must be strictly smaller than the deme’s start_time.
If the deme has an infinite start_time, the first epoch’s size_function must have the value “constant”.
If the size_function is “constant”, the start_size and end_size must be equal.
migrations#
This section assumes that symmetric migrations have been resolved into pairs of asymmetric migrations and validated as per the migration resolution section. Resolution of symmetric migrations includes validation of the migration.demes property, and this property is not considered below as it is not part of the MDM.
source must not be the same as dest.
start_time and end_time must both be in the closed interval [deme.start_time, deme.end_time], for both the source deme and the dest deme.
start_time must be strictly greater than end_time.
There must be at most one migration specified per source/destination pair for any given time interval.
If more than one source population has continuous migration into the same destination population, the sum of those migration rates must also be less than or equal to 1 (within a reasonable tolerance, e.g. 1e-9).
pulses#
sources must be a list containing at least one element.
Each element of sources must be unique.
The dest deme must not appear in the sources list.
For each source deme in sources, time must be in the open-closed interval (deme.start_time, deme.end_time], defined by the existence interval of the source deme.
time must be in the closed-open interval [deme.start_time, deme.end_time), defined by the existence interval of the dest deme.
Hence, time must not have the value infinity, nor the value 0.
The proportions list must have the same length as the sources list.
The sum of values in the proportions list must be less than or equal to 1 (within a reasonable tolerance, e.g. 1e-9).
Schema#
The schema listed here is definitive in terms of types and the structure of the JSON documents that are considered to be valid instances of the Demes standard.
$schema: http://json-schema.org/draft-07/schema#
title: Demes graph
type: object
additionalProperties: false
properties:
description:
type: "string"
default: ""
doi:
type: array
items:
type: string
default: []
time_units:
type: string
# TODO: shouldn't this be an enum?
generation_time:
type: "number"
exclusiveMinimum: 0
metadata:
type: object
default: {}
additionalProperties: true
defaults:
type: object
default: {}
additionalProperties: false
properties:
epoch:
$ref: '#/definitions/epoch'
migration:
$ref: '#/definitions/migration'
pulse:
$ref: '#/definitions/pulse'
deme:
properties:
description:
type: "string"
ancestors:
type: array
items:
$ref: '#/definitions/name'
proportions:
type: array
items:
$ref: '#/definitions/proportion'
start_time:
$ref: '#/definitions/start_time'
demes:
type: array
minItems: 1
items:
$ref: '#/definitions/deme'
pulses:
type: array
items:
$ref: '#/definitions/pulse'
default: []
migrations:
type: array
items:
$ref: '#/definitions/migration'
default: []
required:
- time_units
- demes
definitions:
name:
type: string
rate:
type: number
minimum: 0
maximum: 1
proportion:
type: number
exclusiveMinimum: 0
maximum: 1
size:
type: number
exclusiveMinimum: 0
start_time:
oneOf:
- type: number
exclusiveMinimum: 0
- const: "Infinity"
end_time:
type: number
minimum: 0
epoch:
type: object
additionalProperties: false
properties:
end_time:
$ref: '#/definitions/end_time'
start_size:
$ref: '#/definitions/size'
end_size:
$ref: '#/definitions/size'
size_function:
# TODO: make this an enumeration
type: string
default: exponential
cloning_rate:
$ref: '#/definitions/rate'
selfing_rate:
$ref: '#/definitions/rate'
deme:
type: object
additionalProperties: false
properties:
name:
$ref: '#/definitions/name'
description:
type: "string"
default: ""
ancestors:
type: array
items:
$ref: '#/definitions/name'
default: []
proportions:
type: array
items:
$ref: '#/definitions/proportion'
start_time:
$ref: '#/definitions/start_time'
epochs:
type: array
default: []
minItems: 0
items:
$ref: '#/definitions/epoch'
defaults:
type: object
default: {}
additionalProperties: false
properties:
epoch:
$ref: '#/definitions/epoch'
required:
- name
pulse:
type: object
additionalProperties: false
properties:
sources:
type: array
items:
$ref: '#/definitions/name'
dest:
$ref: '#/definitions/name'
time:
type: number
exclusiveMinimum: 0
proportions:
type: array
items:
$ref: '#/definitions/proportion'
migration:
anyOf:
# Asymmetric
- type: object
additionalProperties: false
properties:
source:
$ref: '#/definitions/name'
dest:
$ref: '#/definitions/name'
start_time:
$ref: '#/definitions/start_time'
end_time:
$ref: '#/definitions/end_time'
rate:
$ref: '#/definitions/rate'
# Symmetric
- type: object
additionalProperties: false
properties:
demes:
type: array
items:
$ref: '#/definitions/name'
minItems: 2
start_time:
$ref: '#/definitions/start_time'
end_time:
$ref: '#/definitions/end_time'
rate:
$ref: '#/definitions/rate'
Reference parser implementation#
# A simple parser that builds a fully-qualified Demes Graph from an input JSON
# string.
#
# Requires Python 3.7+.
#
# This implementation is NOT recommended for use in any downstream software and
# is provided purely as reference material for parser writers (i.e., in other
# programming languages). Python users should use the "demes" package in their
# software: https://github.com/popsim-consortium/demes-python
#
# The entry point is the ``parse`` function, which returns a fully-qualified
# Graph. The implementation is written with clarity and correctness as the main
# priorities. Its main purpose is to remove any potential ambiguities that may
# exist in the written specification and to simplify the process of writing
# other parsers. In the interest of simplicity, the parser does not generate
# useful error messages in all cases (but we would hope that practical
# implementations would).
#
# Type annotations are used where they help with readability, but not applied
# exhaustively.
from __future__ import annotations
import math
import numbers
import copy
import pprint
import dataclasses
from typing import Dict, List, Union
# Numerical wiggle room.
EPSILON = 1e-6
# JSON does not provide a way to encode IEEE infinity values, which we
# require to describe start_time values. To work around this we use the
# string "Infinity" to represent IEEE positive infinity.
JSON_INFINITY_STR = "Infinity"
def parse(data: dict) -> Graph:
# Parsing is done by popping items out of the input data dictionary and
# creating the appropriate Python objects. We ensure that extra items
# have not been included in the data payload by checking if the objects
# are empty once we have removed all the values defined in the
# specification. Type and range validation of simple items (e.g., the
# value must be a positive integer) is performed at the same time,
# using the pop_x functions. Once the full object model of the input
# data has been built, the rules for creating a fully-qualified Demes
# graph are applied in the "resolve" functions. Finally, we validate
# the fully-qualified graph to ensure that relationships between the
# entities have been specified correctly.
data = copy.deepcopy(data)
defaults = pop_object(data, "defaults", {})
deme_defaults = pop_object(defaults, "deme", {})
migration_defaults = pop_object(defaults, "migration", {})
pulse_defaults = pop_object(defaults, "pulse", {})
# epoch defaults may also be specified within a Deme definition.
global_epoch_defaults = pop_object(defaults, "epoch", {})
check_empty(defaults)
graph = Graph(
description=pop_string(data, "description", ""),
time_units=pop_string(data, "time_units", None),
doi=pop_list(data, "doi", [], str, is_nonempty),
generation_time=pop_number(
data, "generation_time", None, is_positive_and_finite
),
metadata=pop_object(data, "metadata", {}),
)
check_defaults(
deme_defaults,
dict(
description=(str, None),
start_time=((str, numbers.Number), is_positive_or_json_infinity),
ancestors=(list, is_list_of_identifiers),
proportions=(list, is_list_of_proportions),
),
)
allowed_epoch_defaults = dict(
end_time=(numbers.Number, is_non_negative_and_finite),
start_size=(numbers.Number, is_positive_and_finite),
end_size=(numbers.Number, is_positive_and_finite),
selfing_rate=(numbers.Number, is_rate),
cloning_rate=(numbers.Number, is_rate),
size_function=(str, None),
)
check_defaults(global_epoch_defaults, allowed_epoch_defaults)
for deme_data in pop_list(data, "demes"):
insert_defaults(deme_data, deme_defaults)
deme = graph.add_deme(
name=pop_string(deme_data, "name", validator=is_identifier),
description=pop_string(deme_data, "description", ""),
start_time=pop_number(
deme_data,
"start_time",
None,
is_positive_or_json_infinity,
allow_inf=True,
),
ancestors=pop_list(deme_data, "ancestors", [], str, is_identifier),
proportions=pop_list(
deme_data, "proportions", None, numbers.Number, is_proportion
),
)
local_defaults = pop_object(deme_data, "defaults", {})
local_epoch_defaults = pop_object(local_defaults, "epoch", {})
check_empty(local_defaults)
check_defaults(local_epoch_defaults, allowed_epoch_defaults)
epoch_defaults = global_epoch_defaults.copy()
epoch_defaults.update(local_epoch_defaults)
check_defaults(epoch_defaults, allowed_epoch_defaults)
# There is always at least one epoch defined with the default values.
for epoch_data in pop_list(deme_data, "epochs", [{}]):
insert_defaults(epoch_data, epoch_defaults)
deme.add_epoch(
end_time=pop_number(
epoch_data, "end_time", None, is_non_negative_and_finite
),
start_size=pop_number(
epoch_data, "start_size", None, is_positive_and_finite
),
end_size=pop_number(
epoch_data, "end_size", None, is_positive_and_finite
),
selfing_rate=pop_number(epoch_data, "selfing_rate", 0, is_rate),
cloning_rate=pop_number(epoch_data, "cloning_rate", 0, is_rate),
size_function=pop_string(epoch_data, "size_function", None),
)
check_empty(epoch_data)
check_empty(deme_data)
if len(deme.epochs) == 0:
raise ValueError(f"no epochs for deme {deme.name}")
if len(graph.demes) == 0:
raise ValueError("the graph must have one or more demes")
check_defaults(
migration_defaults,
dict(
rate=(numbers.Number, is_rate),
start_time=((numbers.Number, str), is_positive_or_json_infinity),
end_time=(numbers.Number, is_non_negative_and_finite),
source=(str, is_identifier),
dest=(str, is_identifier),
demes=(list, is_list_of_identifiers),
),
)
for migration_data in pop_list(data, "migrations", []):
insert_defaults(migration_data, migration_defaults)
graph.add_migration(
rate=pop_number(migration_data, "rate", validator=is_rate),
start_time=pop_number(
migration_data,
"start_time",
None,
is_positive_or_json_infinity,
allow_inf=True,
),
end_time=pop_number(
migration_data, "end_time", None, is_non_negative_and_finite
),
source=pop_string(migration_data, "source", None, is_nonempty),
dest=pop_string(migration_data, "dest", None, is_nonempty),
demes=pop_list(
migration_data,
"demes",
default=None,
required_type=str,
validator=is_identifier,
),
)
check_empty(migration_data)
check_defaults(
pulse_defaults,
dict(
sources=(list, is_nonempty_list_of_identifiers),
dest=(str, is_identifier),
time=(numbers.Number, is_positive_and_finite),
proportions=(list, is_nonempty_list_of_proportions_with_sum_less_than_1),
),
)
for pulse_data in pop_list(data, "pulses", []):
insert_defaults(pulse_data, pulse_defaults)
graph.add_pulse(
sources=pop_list(
pulse_data,
"sources",
default=[],
required_type=str,
validator=is_identifier,
),
dest=pop_string(pulse_data, "dest", validator=is_identifier),
time=pop_number(pulse_data, "time", validator=is_positive_and_finite),
proportions=pop_list(
pulse_data,
"proportions",
default=[],
required_type=numbers.Number,
validator=is_proportion,
),
)
check_empty(pulse_data)
check_empty(data)
# The input object model has now been fully populated, and local type and
# value checking done. Default values (either from the schema or set explicitly
# by the user via "defaults" sections) have been assigned. We now "resolve"
# the model so that any values that can be imputed from the structure of the
# model are set explicitly. Once this is done, we then validate the model to
# check that the relationships between various entities make sense. Note that
# there isn't a clean separation between resolution and validation here, since
# some validation is simplest to perform as part of the resolution logic in
# this particular implementation.
graph.resolve()
graph.validate()
return graph
def encode_inf(value):
if math.isinf(value):
return JSON_INFINITY_STR
return value
# Validator functions. These are used as arguments to the pop_x functions and
# check properties of the values.
def is_positive_or_json_infinity(value):
return value == JSON_INFINITY_STR or value > 0
def is_positive_and_finite(value):
return value > 0 and not math.isinf(value)
def is_non_negative_and_finite(value):
return value >= 0 and not math.isinf(value)
def is_rate(value):
return 0 <= value <= 1
def is_proportion(value):
return 0 < value <= 1
def is_nonempty(value):
return len(value) > 0
def is_identifier(value):
return value.isidentifier()
def is_list_of_identifiers(value):
return all(isinstance(v, str) and is_identifier(v) for v in value)
def is_nonempty_list_of_identifiers(value):
return is_list_of_identifiers(value) and len(value) > 0
def is_list_of_proportions(value):
return all(isinstance(v, numbers.Number) and is_proportion(v) for v in value)
def is_nonempty_list_of_proportions_with_sum_less_than_1(value):
return is_list_of_proportions(value) and len(value) > 0 and sum(value) <= 1
def validate_item(name, value, required_type, validator=None):
if not isinstance(value, required_type):
raise TypeError(
f"Attribute '{name}' must be a {required_type}; "
f"current type is {type(value)}."
)
if validator is not None and not validator(value):
validator_name = validator.__name__[3:] # Strip off is_ from function name
raise ValueError(f"Attribute '{name}' is not {validator_name}")
# We need to use this trick because None is a meaningful input value for these
# pop_x functions.
NO_DEFAULT = object()
def pop_item(data, name, *, required_type, default=NO_DEFAULT, validator=None):
if name in data:
value = data.pop(name)
validate_item(name, value, required_type, validator)
else:
if default is NO_DEFAULT:
raise KeyError(f"Attribute '{name}' is required")
value = default
return value
def pop_list(data, name, default=NO_DEFAULT, required_type=None, validator=None):
value = pop_item(data, name, default=default, required_type=list)
if required_type is not None and value is not None:
for item in value:
validate_item(name, item, required_type, validator)
return value
def pop_object(data, name, default=NO_DEFAULT):
return pop_item(data, name, default=default, required_type=dict)
def pop_string(data, name, default=NO_DEFAULT, validator=None):
return pop_item(data, name, default=default, required_type=str, validator=validator)
def pop_number(data, name, default=NO_DEFAULT, validator=None, allow_inf=False):
# If infinite values are allowed for this number, the string "Infinity"
# is also accepted, and so str is an accepted type. There is a small loophole
# here in which string numbers like "1000" will be accepted by the type
# checking machinery, but the is_positive_or_json_infinity validator
# will catch this and raise a TypeError when it tries to compare with 0.
if allow_inf:
assert validator is is_positive_or_json_infinity
required_type = (numbers.Number, str) if allow_inf else numbers.Number
value = pop_item(
data,
name,
default=default,
required_type=required_type,
validator=validator,
)
if value == JSON_INFINITY_STR:
return math.inf
return value
def check_empty(data):
if len(data) != 0:
raise ValueError(f"Extra fields are not permitted:{data}")
def check_defaults(defaults, allowed_fields):
for key, value in defaults.items():
if key not in allowed_fields:
raise ValueError(
f"Only fields {list(allowed_fields.keys())} can be specified "
"in the defaults"
)
required_type, validator = allowed_fields[key]
validate_item(key, value, required_type, validator)
def insert_defaults(data, defaults):
for key, value in defaults.items():
if key not in data:
data[key] = value
@dataclasses.dataclass
class Interval:
"""
A half-open time interval (start_time, end_time].
"""
start_time: float
end_time: float
def __init__(self, start_time, end_time):
assert start_time > end_time
self.start_time = start_time
self.end_time = end_time
def intersects(self, other):
"""True if self and other intersect, False otherwise."""
assert isinstance(other, self.__class__)
return not (
self.end_time >= other.start_time or other.end_time >= self.start_time
)
def is_subinterval(self, other):
"""True if self is completely contained within other, False otherwise."""
assert isinstance(other, self.__class__)
return self.start_time <= other.start_time and self.end_time >= other.end_time
def __contains__(self, time):
return self.start_time > time >= self.end_time
@dataclasses.dataclass
class Epoch:
end_time: Union[float, None]
start_size: Union[float, None]
end_size: Union[float, None]
size_function: str
selfing_rate: float
cloning_rate: float
def as_json_dict(self) -> dict:
return dataclasses.asdict(self)
def resolve(self):
if self.size_function is None:
if self.start_size == self.end_size:
self.size_function = "constant"
else:
self.size_function = "exponential"
def validate(self):
if self.size_function not in ("constant", "exponential", "linear"):
raise ValueError(f"unknown size_function '{self.size_function}'")
if self.size_function == "constant" and self.start_size != self.end_size:
raise ValueError(
"size_function is constant but "
f"start_size ({self.start_size}) != end_size ({self.end_size})"
)
@dataclasses.dataclass
class Deme:
name: str
start_time: Union[None, float]
description: str
ancestors: List[Deme]
proportions: Union[List[float], None]
epochs: List[Epoch] = dataclasses.field(default_factory=list)
def add_epoch(
self,
end_time: Union[float, None],
start_size: Union[float, None],
end_size: Union[float, None],
selfing_rate: float,
cloning_rate: float,
size_function: str,
) -> Epoch:
epoch = Epoch(
end_time=end_time,
start_size=start_size,
end_size=end_size,
selfing_rate=selfing_rate,
cloning_rate=cloning_rate,
size_function=size_function,
)
self.epochs.append(epoch)
return epoch
@property
def end_time(self):
return self.epochs[-1].end_time
@property
def time_interval(self):
return Interval(self.start_time, self.end_time)
def as_json_dict(self) -> dict:
return {
"name": self.name,
"description": self.description,
"start_time": encode_inf(self.start_time),
"epochs": [epoch.as_json_dict() for epoch in self.epochs],
"proportions": self.proportions,
"ancestors": [deme.name for deme in self.ancestors],
}
def __resolve_times(self):
if self.start_time is None:
default = math.inf
if len(self.ancestors) == 1:
default = self.ancestors[0].epochs[-1].end_time
elif len(self.ancestors) > 1:
raise ValueError(
"Must explicitly set Deme.start_time when > 1 ancestor"
)
self.start_time = default
if len(self.ancestors) == 0 and not math.isinf(self.start_time):
raise ValueError(
f"deme {self.name} has finite start_time, but no ancestors"
)
for ancestor in self.ancestors:
if self.start_time not in ancestor.time_interval:
raise ValueError(
f"Deme {ancestor.name} ({ancestor.time_interval}) doesn't "
f"exist at deme {self.name}'s start_time ({self.start_time})"
)
# The last epoch has a default end_time of 0
last_epoch = self.epochs[-1]
if last_epoch.end_time is None:
last_epoch.end_time = 0
last_time = self.start_time
for epoch in self.epochs:
if epoch.end_time is None:
raise ValueError("Epoch end_time must be specified")
if epoch.end_time >= last_time:
raise ValueError("Epoch end_times must be in decreasing order.")
last_time = epoch.end_time
def __resolve_sizes(self):
first_epoch = self.epochs[0]
# The first epoch must specify either start_size or end_size
if first_epoch.start_size is None and first_epoch.end_size is None:
raise ValueError(
"Must specify one or more of start_size and end_size "
"for the initial epoch"
)
if first_epoch.start_size is None:
first_epoch.start_size = first_epoch.end_size
if first_epoch.end_size is None:
first_epoch.end_size = first_epoch.start_size
last_epoch = first_epoch
for epoch in self.epochs[1:]:
if epoch.start_size is None:
epoch.start_size = last_epoch.end_size
if epoch.end_size is None:
epoch.end_size = epoch.start_size
last_epoch = epoch
if self.start_time == math.inf:
if first_epoch.start_size != first_epoch.end_size:
raise ValueError(
"Cannot have varying population size in an infinite time interval"
)
def __resolve_proportions(self):
if self.proportions is None:
if len(self.ancestors) == 0:
self.proportions = []
elif len(self.ancestors) == 1:
self.proportions = [1]
else:
raise ValueError("Must specify proportions for > 1 ancestor demes")
def resolve(self):
self.__resolve_times()
self.__resolve_sizes()
self.__resolve_proportions()
for epoch in self.epochs:
epoch.resolve()
def validate(self):
if len(self.proportions) != len(self.ancestors):
raise ValueError("proportions must be same length as ancestors")
if len(self.ancestors) > 0:
if not math.isclose(sum(self.proportions), 1):
raise ValueError("Sum of proportions must be approximately 1")
if len(set(anc.name for anc in self.ancestors)) != len(self.ancestors):
raise ValueError("ancestors list contains duplicates")
for epoch in self.epochs:
epoch.validate()
@dataclasses.dataclass
class Pulse:
sources: List[Deme]
dest: Deme
time: float
proportions: List[float]
def as_json_dict(self) -> dict:
d = dataclasses.asdict(self)
d["sources"] = [source.name for source in self.sources]
d["dest"] = self.dest.name
return d
def validate(self):
sources_names = set(source.name for source in self.sources)
if self.dest.name in sources_names:
raise ValueError("Cannot have source deme equal to dest")
if len(sources_names) != len(self.sources):
raise ValueError("Duplicate deme in sources")
if len(self.sources) == 0:
raise ValueError("Must have one or more source demes")
if len(self.sources) != len(self.proportions):
raise ValueError("Sources and proportions must have same lengths")
for source in self.sources:
if self.time not in source.time_interval:
raise ValueError(
f"Deme {source.name} does not exist at time {self.time}"
)
# Time limits for the destination deme are different to the source deme,
# because the destination deme is affected immediately after the time
# of the pulse. Thus, a pulse can occur at the destination deme's
# start_time, but not at the destination deme's end_time.
if not (self.dest.start_time >= self.time > self.dest.end_time):
raise ValueError(
f"Deme {self.dest.name} does not exist at time {self.time}"
)
if sum(self.proportions) > 1 + EPSILON:
raise ValueError(
f"Pulse proportions into {self.dest.name} at time {self.time} "
"sum to more than 1"
)
@dataclasses.dataclass
class Migration:
rate: Union[float, None]
start_time: Union[float, None]
end_time: Union[float, None]
source: Deme
dest: Deme
@property
def time_interval(self):
return Interval(self.start_time, self.end_time)
def as_json_dict(self) -> dict:
d = dataclasses.asdict(self)
d["start_time"] = encode_inf(self.start_time)
d["source"] = self.source.name
d["dest"] = self.dest.name
return d
def resolve(self):
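        # By default, migration occurs over the full interval during which
        # both the source and dest demes exist.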
if self.start_time is None:
self.start_time = min(self.source.start_time, self.dest.start_time)
if self.end_time is None:
self.end_time = max(self.source.end_time, self.dest.end_time)
def validate(self):
if self.start_time <= self.end_time:
raise ValueError("start_time must be > end_time")
if self.source.name == self.dest.name:
raise ValueError("Cannot migrate from a deme to itself")
for deme in [self.source, self.dest]:
if not self.time_interval.is_subinterval(deme.time_interval):
raise ValueError(
"Migration time interval must be within the each deme's "
"time interval"
)
@dataclasses.dataclass
class Graph:
time_units: str
generation_time: Union[float, None]
doi: List[str]
description: str
metadata: dict
demes: Dict[str, Deme] = dataclasses.field(default_factory=dict)
migrations: List[Migration] = dataclasses.field(default_factory=list)
pulses: List[Pulse] = dataclasses.field(default_factory=list)
def add_deme(
self,
name: str,
description: str,
start_time: Union[float, None],
ancestors: List[str],
proportions: Union[List[float], None],
) -> Deme:
deme = Deme(
name=name,
description=description,
start_time=start_time,
ancestors=[self.demes[deme_name] for deme_name in ancestors],
proportions=proportions,
)
if deme.name in self.demes:
raise ValueError(f"Duplicate deme name '{deme.name}'")
self.demes[deme.name] = deme
return deme
def add_migration(
self,
*,
rate: float,
start_time: Union[float, None],
end_time: Union[float, None],
source: Union[str, None],
dest: Union[str, None],
demes: Union[List[str], None],
) -> List[Migration]:
migrations: List[Migration] = []
if not (
# symmetric
(demes is not None and source is None and dest is None)
# asymmetric
or (demes is None and source is not None and dest is not None)
):
raise ValueError("Must specify either source and dest, or demes")
if source is not None:
assert dest is not None
migrations.append(
Migration(
rate=rate,
start_time=start_time,
end_time=end_time,
source=self.demes[source],
dest=self.demes[dest],
)
)
else:
assert demes is not None
if len(demes) < 2:
raise ValueError("Must specify two or more deme names")
for j, deme_a in enumerate(demes, 1):
for deme_b in demes[j:]:
migration_ab = Migration(
rate=rate,
start_time=start_time,
end_time=end_time,
source=self.demes[deme_a],
dest=self.demes[deme_b],
)
migration_ba = Migration(
rate=rate,
start_time=start_time,
end_time=end_time,
source=self.demes[deme_b],
dest=self.demes[deme_a],
)
migrations.extend([migration_ab, migration_ba])
self.migrations.extend(migrations)
return migrations
def add_pulse(
self, sources: List[str], dest: str, time: float, proportions: List[float]
):
pulse = Pulse(
sources=[self.demes[source] for source in sources],
dest=self.demes[dest],
time=time,
proportions=proportions,
)
self.pulses.append(pulse)
return pulse
def __str__(self):
data = self.as_json_dict()
return pprint.pformat(data, indent=2)
def as_json_dict(self):
d = dataclasses.asdict(self)
d["demes"] = [deme.as_json_dict() for deme in self.demes.values()]
d["migrations"] = [migration.as_json_dict() for migration in self.migrations]
d["pulses"] = [pulse.as_json_dict() for pulse in self.pulses]
return d
def validate(self):
if self.generation_time is None:
if self.time_units == "generations":
self.generation_time = 1
else:
raise ValueError(
"Must specify Graph.generation_time if time_units is not "
"'generations'"
)
if self.time_units == "generations" and self.generation_time != 1:
raise ValueError(
"If time_units are in generations, generation_time must be 1"
)
for deme in self.demes.values():
deme.validate()
for pulse in self.pulses:
pulse.validate()
for migration in self.migrations:
migration.validate()
# Migrations involving the same source and dest can't overlap temporally.
for j, migration_a in enumerate(self.migrations, 1):
for migration_b in self.migrations[j:]:
if (
migration_a.source == migration_b.source
and migration_a.dest == migration_b.dest
and migration_a.time_interval.intersects(migration_b.time_interval)
):
                    start_time = min(migration_a.start_time, migration_b.start_time)
                    end_time = max(migration_a.end_time, migration_b.end_time)
raise ValueError(
f"Competing migration definitions for {migration_a.source.name} "
f"and {migration_a.dest.name} during time interval "
f"({start_time}, {end_time}]"
)
# The rate of migration entering a deme cannot be more than 1 in any
# given interval of time.
time_boundaries = set()
time_boundaries.update(migration.start_time for migration in self.migrations)
time_boundaries.update(migration.end_time for migration in self.migrations)
time_boundaries.discard(math.inf)
end_times = sorted(time_boundaries, reverse=True)
start_times = [math.inf] + end_times[:-1]
ingress_rates = {deme_name: [0.0] * len(end_times) for deme_name in self.demes}
for j, (start_time, end_time) in enumerate(zip(start_times, end_times)):
current_interval = Interval(start_time, end_time)
for migration in self.migrations:
if current_interval.intersects(migration.time_interval):
rate = ingress_rates[migration.dest.name][j] + migration.rate
if rate > 1 + EPSILON:
raise ValueError(
f"Migration rates into {migration.dest.name} sum to "
"more than 1 during the time inverval "
f"({start_time}, {end_time}]"
)
ingress_rates[migration.dest.name][j] = rate
def resolve(self):
# A deme's ancestors must be listed before it, so any deme we
# visit must always be visited after its ancestors.
for deme in self.demes.values():
deme.resolve()
for migration in self.migrations:
migration.resolve()
# Sort pulses from oldest to youngest.
# In a discrete-time setting, non-integer pulse times that are distinct
# could be rounded to the same time value. If the input file has the pulses
# in time-ascending order, then the pulses would occur in the opposite order
# compared to a continuous-time simulator. Sorting before the rounding
# occurs avoids this ambiguity, so we explicitly require pulses to be
# sorted as part of the parser.
# Note that Python implements "stable" sorting, which maintains the order
# of pulses that have the same time value to start with (as required by
# the spec).
self.pulses.sort(key=lambda pulse: pulse.time, reverse=True)
Appendix#
Converting backwards time to forwards time#
Times in Demes models use a backwards-time convention, where the value 0 represents now and time values increase towards the past. However, many simulators use the opposite convention, where time 0 represents some time in the past and time values increase towards the present. To convert times in a Demes model into a forward-time representation:
Set y equal to the minimum epoch end time in the resolved graph.
Set x equal to the most ancient finite value out of epoch start_time, epoch end_time, migration start_time, or pulse time.
The model duration is d = x - y.
Using the convention of starting a forward-in-time model at time zero (representing the parental generation at the beginning of a model), the model runs forward in time from (0, d].
For explicit simulations involving a “burn in time”, the previous interval is shifted by that length. The duration of the burn-in period is (0, b] and the events in the Demes graph occur from (b, b + d].
Given these definitions, f = b + d - t, where t is a backwards time in the Demes model and f is the forwards-time equivalent.
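As an illustration, the following sketch (not part of the reference implementation above) applies these steps to a resolved Graph object, using the attributes defined by the reference implementation; the function name forwards_time and the burn_in parameter are assumptions made for this example.
import math

def forwards_time(graph: Graph, t: float, burn_in: float = 0.0) -> float:
    # y: the minimum epoch end time in the resolved graph.
    y = min(
        epoch.end_time for deme in graph.demes.values() for epoch in deme.epochs
    )
    # Candidate times for x. In a resolved graph an epoch's start time is
    # either the deme's start_time or the previous epoch's end_time, so deme
    # start_times plus epoch end_times cover all epoch boundaries.
    times = []
    for deme in graph.demes.values():
        times.append(deme.start_time)
        times.extend(epoch.end_time for epoch in deme.epochs)
    times.extend(migration.start_time for migration in graph.migrations)
    times.extend(pulse.time for pulse in graph.pulses)
    # x: the most ancient finite time in the graph; d: the model duration.
    x = max(time for time in times if not math.isinf(time))
    d = x - y
    # Events in the Demes graph occur in the forwards-time interval
    # (burn_in, burn_in + d].
    return burn_in + d - t
For example, with a burn-in of b = 100 and a model duration of d = 50, an event at backwards time t = 10 occurs at forwards time f = 100 + 50 - 10 = 140.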