Fill has a huge library of thousands of forms all set up to be filled in easily and signed.
Fill in your chosen form
Sign the form using our drawing tool
Send to someone else to fill in and sign.
Contract number
o
507953
Workpackage 8
Population dynamics in the Evolutionary En
vironment
Deliverable 8.3
Report on Population dynamics in EvE
Project funded by the European
Community under the “Information
Society Technology” Programme
1 / 66
Contract number: 507953
Pr oject acronym: DBE
Title: Digital Business Ecosystem
Deliverable N
o
: D8.3
Due date: 05/2006
Delivery date: 07/2006
Short description: The ﬁnal report on population dynamics. Note
that this task has been brought forward by ﬁve months.
Author: UBHAM
Partners contributed: STU, LSE, HWU
Made available to: Public
Versioning
Version Date Author, Organisation
1.0 30/06/2006 J. E. Rowe, B. Mitavskiy, J. Woodward (UBHAM)
2.0 24/07/2006 J. E. Rowe, B. Mitavskiy, J. Woodward (UBHAM)
Quality check
1
st
internal reviewer: Gerard Briscoe (HWU)
2
nd
internal reviewer: Thomas Kurz (STU)
This work is licensed under the Creative Commons
AttributionNonCom mercialShareAlike 2.5 License. To view a copy of this license,
visit : http://creativecommons.org/licenses/byncsa/2.5/ or send a letter to Creative
Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
2 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
3 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Contents
1 Introduction and relationship to the project 6
2 An evolutionary approach to the dynamic set cover problem 8
2.1 The dynamic set cover problem . .................... 8
2.2 TheUMDAalgorithm ......................... 9
2.3 Javacode ................................ 10
3 Theoretical analysis of variablelength evolutionary systems 12
4 Some Results about the Markov Chains Associated to GPs and General
EAs. 14
4.1 Introduction ............................... 14
4.2 Notation................................. 15
4.3 HowDoesaHeuristicSearchAlgorithmWork?............ 19
4.4 TheMarkovChainAssociatedtoanEvolutionaryAlgorithm..... 19
4.5 A Special Kind of Reproduction Steps and the Extended Geiringer
Theorem................................. 20
4.6 Nonlinear Genetic Programming (GP) with Homologous Crossover. . 24
4.7 The Statement of the SchemaBased Version of Geiringer’s Theorem
for Nonlinear GP under Homologous Crossover. . . ......... 29
4.8 HowDoWeObtainTheorem4.7.1fromTheorem4.5.2? ....... 33
4.9 What Does Theorem 4.5.2 Tell Us in the Presence of Mutation for Non
linearGP?................................ 39
4.10 What Can Be Said in the Presence of Selection in the General Case? . 49
4.11 What are the relations and for the case of ﬁtnessproportional
selection? ................................ 53
4.12 What Can Be Said when the Last Elementary Step is Mutation? . . . . 61
4.13Conclusions............................... 64
4 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Executive Summary
This report is the ﬁnal report from UBHAM on WP8 (subtask S4, Population dynam
ics in the evolutionary environment) brought forward from month 36. It is a summary
of work done at UBHAM during the period January–April 2006.
The report is in four main parts. The ﬁrst part introduces the work and describes
its relationship to the rest of the project. The second par t outlines our work on ex
tending evolutionary algorithms to cope with changing requirements (the dynamic set
cover problem). The third part gives an overview of our continuing theoretical analy
sis of evolutionary systems with variablelength structures (including trees). Th e main
theoretical results are presented in the fourth part, which is a technical paper, to be
published in the journal Theoretical Computer Science.
The main “customers” within the DBE project of this work have been STU and,
indirectly, Intel and Sun. The applications of our research have been of some assistance
in the creation of the EvE architecture and algorithms. Most of the work, however, will
be of longterm interest with an impact beyond the lifetime of the project.
5 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Chapter 1
Introduction and relationship to
the project
This deliverable is the ﬁnal report for subtask S4 (Population dynamics in the evolu
tionary environment), which is scheduled to run from month 19 to month 36, and builds
on the work of S3, already reported [21] and the preliminary report from S4 [20]. The
objectives of this subtask are:
1. To investigate the effects of a changing environment on evolution. Such changes
may come from external forces (e.g. changes in users’ requirements) or inter
nally (e.g. from the migration of information).
2. To inform the development of DBE Evolutionary Environment, by liaising with
STU, Intel and Sun with regard to representations, operators and ﬁtness deﬁni
tions.
3. To study population dynamics of variablesized structures. This will involve
both empirical studies, as well as a theoretical extension of existing work on
limit theorems for variablesized strin gs.
In our previous reports [21, 20] we described a formalisation of the problem faced
by the EvE as a (weighted) set covering problem, and showed than an evolutionary
approach could be effective on such a problem. We then planned to build on this work
in the following ways:
1. Working with STU to help with their initial implementation of the EvE.
2. Extending the evolutionary approach to deal with changing requirements.
3. Empirical and theoretical studies of the extended algorithm.
4. Continuing collaboration with STU (see [18]) and HWU (see [1]) in the design
of a more sophisticated EvE (including visits to STU).
6 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Unfortunately, due to personnel problems, UBHAM has had to withdraw from the
project early. We have made some contribution to STU’s development of the EvE, in
cluding collaborative visits. We have also begun work on designing the extension to
deal with dynamic requests. This work is presented in chapter 2. However, the empir
ical and theoretical studies have not been completed, although the java code has been
handed over to STU. We are also unable to assist STU with their further development.
We have made more progress, however, in our theoretical studies on variablelength
evolution. Since it is envisioned that the extended algorithm may have to deal with
such structures to represent SBVR statements (e.g. in tr ee form) this is a potentially
rich area of research [19]. We summarise our results in chapter 3. Chapter 4 contains a
copy of our most recent publication on this subject (to appear in Theoretical Computer
Science). It is highly technical, and is included for completeness, since this is our ﬁnal
deliverable.
7 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Chapter 2
An evolutionary approach to the
dynamic set cover problem
2.1 The dynamic set cover problem
We b rieﬂy recall the set cover problem as a model f or the DBE EvE. We begin b y
assuming a large collection of features. These are individual things that a consumer
might want. We assume they are atomic. A request from a user is (in the simplest
form) a list of desired features. Service providers make available services to users. A
service comprises, amongst other things, a descirption of all the features which it can
provide. So, from an abstract perspective, we think of a service as being a subset of
features. The problem faced by the EvE is to ﬁnd a collection of services so that, when
taken as a whole, they provide all the features requested. We also want this to be done
as cheaply as possible. We assume each service has an associated cost, and the user
would like the cheapest collection o f services that meets their request.
This problem can be solved effectively u sing an evolutionary approach, as we have
seen in our previous reports. What we now consider is the dynamic set cover problem.
Over time, a user may produce a sequence o f different requests. Similarly, a set of users
with a common (or similar) local service pool, may make different, but related requests.
If the different requests were all unrelated to each other, then the most efﬁcient thing to
do would be to rerun the evolutionary algorithm from scratch on a random population.
However, it is assumed that requests from related users will be somehow similar to
each other. For example, there may be some features, or feature combinations, which
frequently occur in requests from users of a certain type. We want to be able to exploit
this commonality to make the evolutionary algorithm more efﬁcient.
One way to approach this problem is to make sure the initial population used is
biased towards services which occur frequently in response to user requests. We pro
pose a pro bability vector v which a ssigns a certain probability to each service in the
genepool. When a new request arrives, we generate the new initial population as fol
lows. We suppose N is the population size, and there are S services in the genepool).
Then to create a new member of the population, we create a binary string of length S,
8 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
where the probability of setting the k
th
bitto1isv
k
. We repeat this N times to ﬁll up
the population. Each string is interpreted as a subset of services, where a 1 in position
k indicates that service k is included.
When a new habitat is ﬁrst created, it is given a genepool. It must also then get a
vector v to help it construct populations. It could take this vector from a neighbouring
habitat that corresponds to a similar user type. Or, if there is no suitable neighbour, it
could start off b y setting v
k
to be a small p robability such as 1/N .
After a request comes from the user, the population is initialised using v. The evo
lutionary algorithm (see next section) is then run to produce a solution. That solution
will be a collection of services that meets the request as cheaply as possible. We now
want to update v so that the services that appeared in the solution become more likely
to appear in the future. One scheme is to set
v
k
=
(1 − α)v
k
+ α if k is in the solution
αv
k
if k is not in the solution
where 0 <α<1 is a learning rate. When the next request comes in, the new version
of v is used to create the initial population, which is now biased towards the more
frequently used services. This form of selection is related to the population “thinning”
algorithm proposed in [18], which acts as a preselection phase in the EvE. We propose
that such a method be included in future versions.
2.2 The UMDA algorithm
The scheme d escribed in the previous section incrementally modiﬁes the frequencies
of services in the initial population. We thought it might be appropriate, then, to in
vestigate the use of an evolutionary algorithm which also operates on the frequencies
of the services. The UMDA (‘Univariate Marginal Distribution Algorithm’) is such an
evolutionary algorithm. It was invented by Heinz Muhlenbein [7] and has been well
analysed from a theoretical perspective [6, 23]. The algorithm follows three phases:
selection, generation, mutation.
Selection A subset of individuals from the current population are selected depending
on their ﬁtness.
Generation For each service k, denote by p
k
the frequency with which it appears in
the selected population. Generate a new population using the probability vector
p.
Mutation Mutate each bit of each string in the population with probability μ.
There are several ways in which to perform selection. One may use the standard pro
portional or tournament selection schemes to select a number of individuals. Or one
may selected a certain fraction of the best individuals (e.g. take the best half of the
population). The generation stage proceeds exactly as with the initial population, only
now we use the vector p. The mutation rate is typically set to 1/S, so that, on average,
one service is included or deleted from a solution.
As a simple example, suppose that our population is as follows:
9 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
0 0 1 1 0 0 1 fitness = 100
1 0 1 1 0 0 0 fitness = 90
1 0 1 0 0 0 1 fitness = 75
0 1 0 1 0 0 1 fitness = 50
1 0 0 0 0 1 0 fitness = 30
0 1 1 1 0 0 1 fitness = 25
That is, the ﬁrst string indicates a subset containing the third, fourth and seventh ser
vice, and so on. The population has been ranked in order o f ﬁtness. We now select the
best half of the population, namely the ﬁrst three strings. For each service (bit posi
tion) we consider the frequency with which it appears in these top three strings. For
example, the ﬁrst service occurs in the second and third string, but not the ﬁrst string.
Its frequency is therefore 2/3. The frequencies for each service are:
service
1234567
frequency 2/3012/3002/3
We now generate a new population by sampling the services according to these prob
abilities. That is, every time we generate a new string, the ﬁrst bit is set to a one with
probability 2/3 and so on. So we might get:
1011001
1010001
1011000
0011001
0011001
1010000
We then apply mutation. That is, each bit has a probability μ of changing. The value
of μ is usually set to be 1/S, which in this case is 1/7 so that, on average, one bit is
changed p er string. The resulting population might be:
1001001
1110001
1011000
0011101
0011011
1010110
The ﬁtnesses are then calculated, and the population ranked accordingly, and the next
generation begins.
2.3 Java code
Java code implemented the UMDA algorithm can be found at
http://www.cs.bham.ac.uk/ jer/scp.tar.
10 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
After unpackin g the tar ﬁle, the program must be compiled using
javac pop.java
which can then be run. Various parameters can be adjusted in the ﬁle
parameters.java.
This code is preliminary and for experimental purposes only.
11 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Chapter 3
Theoretical analysis of
variablelength evolutionary
systems
We have p reviously reported several theoretical results in the analysis of evolutionary
algorithms. These were generally in two areas:
• Results relating to variablelength structures (in the inﬁnite population limit)
• Results relating to ﬁnite populations for strings.
As an example, we proposed and proved a generalisation of the classic Geiringer’s the
orem for ﬁnite populations. Simply put, this states that the repeated action of crossover
is to destroy the correlations between elements of solutions, so that a population tends
to s state of ”linka ge equilibrium”. I t is worth pointing out that th e generation stag e of
the UMDA algorithm takes the population in a single step to this state, as it explicitly
decouples the linkage between services in a solution. UMDA can therefore be seen
(at least theoretically) as applying repeated strong crossover at each generation.
In our current work, we have continued to look at these themes. In the following
chapter, we present a technical paper containing our results, which we summarise here.
Firstly, we return again to Geiringer’s theorem [4], but now considering the evo
lution of variable size and shape individuals, namely trees. This, therefore, applies to
standard Genetic Programming (in which we evolve programs in the form of trees). It
also potentially applies to the DBE EvE if, as is envisaged, the types of requests and
service combinations allowed become more sophisticated than simple lists and subsets.
For example, a hierarchical workﬂow pattern could be represented as a tree structure.
We establish, for tree structures, the kinds of shapes (or schemata as they are tech
nically known) which play a fundamental role in the analogue of Geiringer’s theorem,
and show how crossover again has the effect of decoupling the correlations that exist
in the population.
On the issue of bloat (that is, uncontrolled growth of structures during evolution),
there are a number of possible countermeasures that can be employed. The simplest
12 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
and strictest is to disallow items larger than a certain threshold. Less severe, but still ef
fective, is to have a certain probability that items larger than the threshold will be killed
off. Such a method (called the Tarpeian method) has been proposed and theoretically
motivated by Riccardo Poli [10]. An even less strict method is to have a penalty func
tion that penalises large structures, but this can be difﬁcult to design correctly. See [8]
for a survey of methods.
Secondly, we continue to look at general evolutionary algorithms from the Markov
chain perspective, and begin to build a framework in which bounds on the stationary
distribution can be proved. This distribution gives the probability that a given popu
lation (or a population with any desired property) will be found by the evolutionary
algorithm in the longrun. The framework relies on the construction of a kind of order
relation (technically a preorder ) between populations. Once such an order has been
appropriately constructed, then one can use it to determine if one population is more
likely than another to appear in the stationary distribution.
A simple application of this framework establishe s, f or example, that if we have an
evolutionary algorithm which goes through the phases: mutation, selection, crossover
(in that order), then it is impossible that the stationary distribution can be uniform.
There are always some populations which are preferred to other. This is even true even
if all individuals actually have the same ﬁtness! We ascribe this to the implicit biases
induced by crossover, which tends to “prefer” some types of population over others.
One of the practical consequences of this observation is that one has to be rather
careful when designing crossover operators that one is aware that such biases are being
introduced. For example, it is kn own that certain crossovers for variablelength string
structures have biases towards strings of a certain length. That is, such strings will tend
to be “over sampled” in the long run.
Although we have produced a considerable amount of theoretical work in this pe
riod, it is true to say that at the moment it is still largely of only theoretical interest. We
include our paper in the next chapter for completeness, rather than expecting it to be
of direct value to the project. We had hoped to use the remaining time of the project to
use such results to help design appropriate crossover operators for structured services
within the EvE but, with our withdrawal from the project, this must be left to others, or
to some future work.
13 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Chapter 4
Some Results about the Markov
Chains Associated to GPs and
General EAs.
This chapter is a paper accepted for publication in Theoretical Computer Science.
4.1 Introduction
Geiringer’s classical theorem (see [4]) is an important part of GA theory. It has been
cited in a number of papers: see, for instance, [12], [13], [17] and [22]. It deals with
the limit of the sequence of population vectors obtained by repeatedly applying the
crossover operator C(p)
k
=
i,j
p
i
p
j
r
(i, j→k)
where r
(i, j→k)
denotes th e proba bility
of obtaining the individual k from the parents i and j after crossover. In other words,
it speaks to the limit of repeated crossover in the case o f an inﬁnite population. In [5],
a new version of this result was proved for ﬁnite populations, addressing the limiting
distribution of the associated Markov chain, as follows. Let Ω=
n
i=1
A
i
denote the
search space of a given genetic algorithm (intuitively A
i
is the set of alleles correspond
ingtothei
th
gene and n is the chromosome length). Fix a population P consisting of
m individuals with m being an even number. P can be thought of as an m by n matrix
whose rows are the individuals of the population P . Write
P =
⎛
⎜
⎜
⎜
⎝
a
11
a
12
... a
1n
a
21
a
22
... a
2n
.
.
.
.
.
.
.
.
.
.
.
.
a
m1
a
m2
... a
mn
⎞
⎟
⎟
⎟
⎠
.
Notice that the elements of the i
th
column of P are members of A
i
. Continuing with
the notation used in [12], denote by Φ(h, P, i) where h ∈ A
i
the proportion of rows,
say j,ofP for which a
ji
= h. In other words, let R
h
= {j  1 ≤ j ≤ m and a
ji
= h}.
14 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Now simply let Φ(h, P, i)=
R
h

m
. The classical Geiringer theorem (see [4] or, [12]
for modern notation) says that if one starts with a population P of individuals and runs
a genetic algorithm (GA) in the absence of selection and mutation (cro ssover being
the only operator involved) then, in the “long run”, the frequency of occurrence o f
the individual (h
1
,h
2
,...,h
n
) before time t, call it Φ(h
1
,h
2
,...,h
n
,t), approaches
independence:
lim
t→∞
Φ(h
1
,h
2
,...,h
n
,t)=
n
i=1
Φ(h, P, i).
Thereby, Geiringer’s theorem tells us something about the limiting frequency with
which certain elements of the search space are sampled in the long run, provided one
uses crossover alone. In [12] this theorem has been generalized to cover the cases
of variablelength GA’s and homologous linear genetic programming (GP) crossover.
The limiting distributions of the frequency of occurrence of individuals belonging to a
certain schema under these algorithms have been computed. The special conditions un
der which such a limiting distribution exists for linear GP under homologous crossover
have been established (see theorem 9 and section 4.2.1 of [12]). In [5] a rather powerful
extension of the ﬁnite population version of Geiringer’s theorem has been established.
In the current paper we shall use the recipe described in [5] to derive a version of
Geiringer’s theorem for nonlinear GP with homologous crossover (see section 4.6 or
[9] for a detailed description of how nonlinear GP with homologous crossover works)
which is based on Poli hyperschemata (see section 4.6 or [9]). The ﬁrst step in this
procedure is to describe the search space and the appropriate family of reproduction
transformations so that the resulting GP algorithm is bijective and selftransient in the
sense of deﬁn ition 5.2 of [5]. Then the generalized Geiring e r theorem (theorem 5.2
of [5]) as well as corollaries 6.1 and 6.2 of [5] apply. The necessary details are sum
marized in the next few sections. A schema based version of Geirnger’s theorem for
nonlinear GP applies even in the presence of “nodemutation” (see section 4.9).
The ﬁnite population Geiringer theorem established in [5] may completely describe
the stationary distribution of the Markov chain associated to an evolutionary algorithm
only in the absence of selection. In section 4.10 we introduce a preorder relation on
the states of a Markov chain associated to an evolutionary algorithm which is deﬁned
in terms of selection alone, and establish some general inequalities about the station
ary distribution of this Markov chain when selection is the “last stage” in the cycle.
In section 4.12 we demonstrate that the stationary distribution of the Markov chain
associated to most evolutionary algorithms in the presence of selection can never be
uniform when mutation rate is small enough, even if the ﬁtness function is constant.
The material in sections 4.10, 4.11 and 4.12 is independent of the results in sections
5  9. Thus, the reader has an option of jumping to read section 4.10 right after section
4.
4.2 Notation
Ω is a ﬁnite set, called a search space.
15 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
f :Ω→ (0, ∞) is a function, called a ﬁtness function. The goal is to ﬁnd a
maximum of the function f.
F
q
is a collection o f qary operations on Ω. Intuitively F
q
can be thought of as the
collection of reproduction operators: some q parents produce one offspring. In nature
often q =2, for every child has two parents, but in the artiﬁcial setting there seem s to
be no special reason to assume that every child has no more than two parents. When
q =1, the family F
1
can be thought of as asexual reproductions or mutations. The
following deﬁnitions will be used in section 4.3 to describe the general evolutionary
search algorithm. This approach makes it easy to state the Geiringer Theorem.
Deﬁnition 4.2.1 A population P of size m is simply an element of Ω
m
. (Intuitively it
is convenient to think of a population as a “column vector”.)
Remark 4.2.1 There are 2 primary methods for representing populations: multisets
and ordered multisets. Each has advantages, depending upon the particular analyt
ical goals. Lothar Shmitt has published a number of papers which use the ordered
multiset representation to advantage (see, for insta nce, [15] and [16]). According to
deﬁnition 4.2.1, in the current paper we continue the development o f analysis based
upon the presentation pioneered by Lothar Schmitt. The following example illustrates
an aspect of the representation which the reader would do well to keep in mind:
Example 4.2.1 Let Ω={0, 1}
3
. Consider the populations
⎛
⎝
000
111
111
⎞
⎠
,
⎛
⎝
111
000
111
⎞
⎠
and
⎛
⎝
111
111
000
⎞
⎠
. According to deﬁnition 4 .2.1 (the ordered multiset model which is
exploited in the current paper) these are distinct populations despite the fact that they
represent the same population under the multiset model.
An elementary step is a probabilistic rule which takes one population as an input and
produces another population of the same size as an output. For example, the follow
ing elementary step corresponds to the ﬁtnessproportional selection which has been
studied in detail by Wright and Fisher (see [24] and [3]).
Deﬁnition 4.2.2 An elementary step of type 1 (alternatively, of type selection) takes
a given population P =
⎛
⎜
⎜
⎜
⎝
x
1
x
2
.
.
.
x
m
⎞
⎟
⎟
⎟
⎠
with x
i
∈ Ω as an input. The individuals of P are
evaluated:
⎛
⎜
⎜
⎜
⎝
x
1
x
2
.
.
.
x
m
⎞
⎟
⎟
⎟
⎠
→ f(x
1
)
→ f(x
2
)
.
.
.
.
.
.
→ f(x
m
)
16 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
A new population
P
=
⎛
⎜
⎜
⎜
⎝
y
1
y
2
.
.
.
y
m
⎞
⎟
⎟
⎟
⎠
is obtained where y
i
’s are chosen independently m times form the individuals of P and
y
i
= x
j
with probability
f(x
j
)
Σ
m
l=1
f(x
l
)
.
In other words, all of the individuals of P
are among those of P , and the expec
tation of the number of occurrences of any individual of P in P
is proportional to the
number of occurrences of that individual in P times the individual’s ﬁtness value. In
particular, the ﬁtter the individual is, the more copies of that individual are likely to be
present in P
. On the other hand, the individuals having relatively small ﬁtness value
are not likely to enter into P
at all. This is designed to imitate the natural survival of
the ﬁttest principle.
Population P
is the output of this elementary step.
In order to deﬁne an elementary step of type 2 (reproduction) in a general setting which
uses the ordered multiset repr e sentation (see remark 4.2.1 and example 4.2.1) one
needs to introduce the following deﬁnitions:
Deﬁnition 4.2.3 Fix an ordered ktuple of integers q =(q
1
,q
2
, ...,q
k
).LetK de
note a partition of the set {1, 2,...,m} for some m ∈ N. We say that partition K is
qﬁtifeveryelementofK consists of exactly q
i
elements for some i. In logical symbols
this means that if K = {P
1
,P
2
,...,P
l
} then K is qﬁtif∀ 1 ≤ j ≤ l ∃ 1 ≤ i ≤ k
such that P
j
 = q
i
. Denote by E
m
q
the family of all qﬁt partitions of {1, 2,...,m} (i.
e. E
m
q
= {K K is a qﬁt partition of {1, 2,...,m}}).
Deﬁnition 4.2.4 Let Ω be a set, F
q
1
, F
q
2
,...,F
q
k
be some ﬁxed families of q
j
ary op
erations on Ω (F
q
j
is simply a family of functions from Ω
q
j
into Ω), and p
1
,p
2
,...,p
k
be probability distributions on (F
q
1
)
q
1
, (F
q
2
)
q
2
,...,(F
q
k
)
q
k
respectively. Let q =
(q
1
,q
2
, ...,q
k
). Finally, let ℘
m
be a probability distribution o n the collection E
m
q
of
partitions of {1, 2,...,m} (see deﬁnition 4.2.3 above). We then say that the ord ered
2(k +1)tuple (Ω, F
q
1
, F
q
2
,...,F
q
k
,p
1
,p
2
,...,p
k
,℘
m
) is a reproduction ktuple
of arity (q
1
,q
2
, ...,q
k
).
The following deﬁnition o f reproduction covers both, crossover and mutation. Deﬁni
tion 4.2.6 (see also remark 4.2.2) will make it possible to combine different reproduc
tion opera tors in a simple and natural way.
Deﬁnition 4.2.5 An elementary step of type 2 (alternatively, of type reproduction) as
sociated to a given reproduction ktuple (Ω, F
q
1
, F
q
2
,...,F
q
k
,p
1
,p
2
,...,p
k
,℘
m
)
takes a given population P =
⎛
⎜
⎜
⎜
⎝
x
1
x
2
.
.
.
x
m
⎞
⎟
⎟
⎟
⎠
with x
i
∈ Ω as an input.
17 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
The individuals of P are partitioned into pairwise disjoint tuples for mating accord
ing to the probability distribution ℘
m
. For instance, if the partition selected according
to ℘
m
is K = {(i
1
1
,i
1
2
,...,i
1
q
1
), (i
2
1
,i
2
2
,...,i
2
q
2
),...,(i
j
1
,i
j
2
,...,i
j
q
j
) ...} the corre
sponding tuples are
Q
1
=
⎛
⎜
⎜
⎜
⎝
x
i
1
1
x
i
1
2
.
.
.
x
i
1
q
1
⎞
⎟
⎟
⎟
⎠
Q
2
=
⎛
⎜
⎜
⎜
⎝
x
i
2
1
x
i
2
2
.
.
.
x
i
2
q
2
⎞
⎟
⎟
⎟
⎠
... Q
j
=
⎛
⎜
⎜
⎜
⎜
⎝
x
i
j
1
x
i
j
2
.
.
.
x
i
j
q
j
⎞
⎟
⎟
⎟
⎟
⎠
...
Having selected the partition, replace every one of the selected q
j
tuples Q
j
=
⎛
⎜
⎜
⎜
⎜
⎝
x
i
j
1
x
i
j
2
.
.
.
x
i
j
q
j
⎞
⎟
⎟
⎟
⎟
⎠
with the q
j
tuples
Q
=
⎛
⎜
⎜
⎜
⎜
⎝
T
1
(x
i
j
1
,x
i
j
2
,...,x
i
j
q
j
)
T
2
(x
i
j
1
,x
i
j
2
,...,x
i
j
q
j
)
.
.
.
T
q
j
(x
i
j
1
,x
i
j
2
,...,x
i
j
q
j
)
⎞
⎟
⎟
⎟
⎟
⎠
for a q
j
tuple of transformations (T
1
,T
2
,...,T
q
j
) ∈ (F
q
j
)
q
j
selected randomly ac
cording to the probability distribu tion p
j
on (F
q
j
)
q
j
. This gives us a new population
P
=
⎛
⎜
⎜
⎜
⎝
y
1
y
2
.
.
.
y
m
⎞
⎟
⎟
⎟
⎠
which serves as the output of this elementary step.
Notice that a single child does not have to be produced by exactly two parents. It is
possible that a child has more than two parents. Asexual reproduction (mutation) is
also allowed.
Deﬁnition 4.2.6 A cycle is a ﬁnite sequence of elementary steps, say {s
n
}
j
n=1
,which
are either of type 1 or of type 2 and such that all of the steps in the sequence {s
n
}
j
n=1
have the same underlying search space and the same arity of input/output.
Remark 4.2.2 Intuitively, these steps are linked together in such a way that the output
of the step s
i
is the input of the step s
i+1
. This is why all of the steps in the same
cycle must have the same underlying search space and the same arity of input/output
(otherwise the input/output relationship does not make sense).
We are ﬁnally ready to describe a rather wide class of evolutionary heuristic search
algorithms.
18 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
4.3 How Does a Heuristic Search Algorithm Work?
A general evolutionary search algorithm works as follows: Fix a cycle,sayC =
{s
n
}
j
n=1
(see deﬁnition 4.2.6). Now start the algorithm with an initial population
P =
⎛
⎜
⎜
⎜
⎝
x
1
x
2
.
.
.
x
m
⎞
⎟
⎟
⎟
⎠
The initial population P may b e selected completely randomly, or it
may also b e predetermined depending on the circumstances. The actual method of se
lecting the initial population P is irrelevant for the purposes of the current paper. To
run the algorithm with cycle C = {s
n
}, simply input P into s
1
,runs
1
, then input the
output of s
1
into s
2
... input the output of s
j−1
into s
j
and produce the new output, say
P
.NowuseP
as an initial population and run the cycle C again. Continue this loop
ﬁnitely many times depending on the circumstances.
Deﬁnition 4.3.1 A subalgorithm of a given evolutionary search algorithm deﬁned by
a cycle C = { s
n
}
j
n=1
is simply an evolutionary search algorithm deﬁned by a subse
quence {s
n
q
}
l
q=1
of the sequence C of elementary steps.
A recombination subalgorithm is subalgorithm deﬁned by a sequence of elemen
tary steps of type 2 (Reproduction) only.
4.4 The Markov Chain Associated to an Evolutionary
Algorithm
In [22] it has been pointed out that heuristic search algorithms give rise to the following
Markov process
1
(see also [2], for instance): The state space of this Markov process
is the set of all populations of a ﬁxed size m. This set, in our notation, is simply Ω
m
.
The transition probability p
xy
is simply the probab ility that the population y ∈ Ω
m
is obtained from the population x by going through the cycle once (where the notion
of a cycle is described in section 4.3: see deﬁnition 4 .2 .6 and remark 4.2 .2 ) . The
aim o f the current paper is to establish a few rather general properties of this Markov
chain. In case when there are several algorithms present in our discussion we shall
write {p
A
xy
}
x,y∈Ω
m
to denote the Markov transition matrix associated to the algorithm
A while {p
B
xy
}
x,y∈Ω
m
would denote the Markov transition matrix associated to th e
algorithm B.
Deﬁnition 4.4.1 Fix an evolutionary search algorithm A. Denote by p
n
x,y
the prob
ability that a population y is obtained from the population x upon the completion of
n complete cycles (in the sense of deﬁnition 4.2.6 and remark 4.2 .2) of the algorithm.
We say that a population x leads to a population y under A if and only if p
n
x,y
> 0
for some n. We also write x
A
−→ y as a shorthand notation for x leads to y.(This
terminology is adopted from [2].)
1
In the current paper the state space of this process is slightly modi ﬁed for technical reasons which will
be seen later.
19 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
4.5 A Special Kind of Reproduction Steps and the Ex
tended Geiringer Theorem
To understand the intuitive meaning of the deﬁnition below, see sections 4.2 and 4.3.
Deﬁnition 4.5.1 Given a set Ω and a family of transformations F
q
from Ω
q
into Ω, ﬁx
a qtuple of transformations (T
1
,T
2
, ...,T
q
) ∈ (F
q
)
q
. Now consider the transforma
tion T
1
,T
2
, ...,T
q
:Ω
q
→ Ω
q
sending any given element
⎛
⎜
⎜
⎜
⎝
x
1
x
2
.
.
.
x
q
⎞
⎟
⎟
⎟
⎠
∈ Ω
q
into
⎛
⎜
⎜
⎜
⎝
T
1
(x
1
,x
2
,...,x
q
)
T
2
(x
1
,x
2
,...,x
q
)
.
.
.
T
q
(x
1
,x
2
,...,x
q
)
⎞
⎟
⎟
⎟
⎠
∈ Ω
q
We say that the transformation T
1
,T
2
, ...,T
q
is the tupling of the ordered qtuple
(T
1
,T
2
, ...,T
q
).
Deﬁnition 4.5.2 Given an elementary step of type 2 (reproduction) associated to the
reproduction ktuple Ω=(Ω, F
q
1
, F
q
2
,...,F
q
k
,p
1
,p
2
,...,p
k
,℘
m
), ﬁx some index
i with 1 ≤ i ≤ k and denote by
G(Ω,q
i
)={T
1
,T
2
, ...,T
q
:Ω
q
i
→ Ω
q
i
 T
j
∈F
q
i
,p
i
(T
1
,T
2
, ...,T
q
i
) > 0}
the family of all tuplings which have a positive probability of being selected.
Remark 4.5.1 The family of tupling transformations G(Ω,q
i
) described in deﬁni
tion 4.5.2 represents the family of q parents → q children crossover transformations
while the family F
q
represents the family of q parents → 1 child crossovers. Depend
ing on the circumstances it may be more convenient to specify the family of q parents
→ q children crossover transformations directly rather than specifying the families F
q
individually. We shall see an example of this situation in section 4.6. The family F
q
of q parents → 1 child crossovers can then be recovered from the family of q parents
→ q children crossover transformations by using coordinate p rojections.
As mentioned in section 4.2, in nature often the arity of the reproduction transfor
mations is 2 meaning that every child has 2 parents.
It turns out that quite many evolutionary algorithms, including the classical genetic
algorithm and nonlinear (as well as linear) genetic programming are equipped with the
reproduction steps having the following nice property which has been introduced and
investigated in [5].
Deﬁnition 4.5.3 A given elementary step of type 2 (reproduction) associated to the re
production ktuple (Ω, F
q
1
, F
q
2
,...,F
q
k
,p
1
,p
2
,...,p
k
,℘
m
) is said to be bijective
(and selftransient) if it satisﬁes conditions 1 (and 2) stated below:
1. ∀ 1 ≤ i ≤ k we have p
i
(T
1
,T
2
, ...,T
q
i
) > 0=⇒T
1
,T
2
, ...,T
q
i
(see
deﬁnition 4.5.1 for the meaning of T
1
,T
2
, ...,T
q
i
) is a bijection (a onetoone and
onto map of Ω
q
i
onto itself).
20 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
2. ∀ 1 ≤ i ≤ k ∃ (T
1
,T
2
, ...,T
q
i
) ∈ (F
q
i
)
q
i
such that p
i
(T
1
,T
2
, ...,T
q
i
) > 0
and T
1
,T
2
, ...,T
q
i
= 1 where 1 :Ω
q
i
→ Ω
q
i
denotes the identity map (i. e. ∀ x ∈
Ω
q
i
we have T
1
,T
2
, ...,T
q
i
(x)=x). We say that a recombination subalgorithm
(see deﬁnition 4.3.1) of a given evolutionary search algorithm is bijective (and self
transient) if every given term of the subsequence, s
n
k
by which the subalgorithm is
deﬁned is bijective (an d selftransient).
Remark 4.5.2 Notice that conditions 1 and 2 of deﬁnition 4.5.3 ca n be restated in
terms of the family G(Ω,q
i
) as follows:
1. Every transformation in the family of tuplings, G(Ω,q
i
) is a bijection.
2. 1 ∈G(Ω,q
i
) where 1 :Ω
q
i
→ Ω
q
i
denotes the identity map.
In [5] the following nice facts have been established:
Proposition 4.5.1 Let A denote a bijective and selftransient algorithm (see deﬁni
tion 4.5.3). Then
A
−→ is an equivalence relation.
Proposition 4.5.1 motivates the following deﬁnition:
Deﬁnition 4.5.4 Given a bijective and selftransient alg orithm A and a population
P ∈ Ω
m
, denote b y [P ]
A
the equivalence class of the population P under the equiva
lence relation
A
−→ .
To alleviate the level of abstraction we illustrate proposition 4.5.1 and deﬁn ition 4.5.4
with a couple of examples.
Example 4.5.1 Consider a binary genetic algorithm over the search space Ω={0, 1}
n
under the action of crossover alone. Let the population size be some even number m.
Consider the following family of masked crossover transformations: F = {F
M
 M ⊆
{1, 2,...,n}} where each F
M
is a binary operation (i. e. a function from Ω
2
into Ω)
deﬁned as follows: For every a =(a
1
,a
2
,...,a
n
) and b =(b
1
,b
2
,...,b
n
) ∈ Ω
n
,
F
M
(a, b)=x =(x
1
,x
2
,...,x
n
) ∈ Ω
n
where
x
i
=
a
i
if i ∈ M
b
i
otherwise
Let A denote the evolutionary algorithm determined by a single elementary step of type
2 (crossover) which is associated to the reproduction
m
2
tuple
Ω=(Ω, F, F,...,F,p
1
,p
2
,...,p
m
2
,℘
m
)
(see deﬁnitions 4.2.4 and 4.2.5) where the probability distributions p
i
have the property
that p
i
(F
M
,F
K
) =0only if K = M
c
(here M
c
denotes the complement of M in
{1, 2,...,n}). This assumption on the distributions p
i
ensures that the elementary
step of crossover associated to the reproduction
m
2
tuple Ω is bijective. Depending on
the further properties of the distributions p
i
and the distribution ℘
m
, different types
of equivalence relations
A
−→ would be induced. Typically, in case of a classical GA
crossover, the distributions p
i
are all identical (i. e. p
1
= p
2
= ... = p
m
2
= p)where
21 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
p is the uniform distribution on {1, 2,...,n} and the distribution ℘
m
is uniform over
all partitions of {1, 2,...,n} into 2element subsets. In such a case the equivalence
relation
A
−→ is determined by the numbers of 0’s in the columns (or, equivalently, by
the numbers of 1’s in the columns). The reason this is so is that a population Q can
be reached from a population P in by performing a sequence of crossover elementary
steps only if it has the same amount of “genetic material” in every column since alleles
are neither lost nor created during homologous crossover. Using the fact that every
permutation can be obtained by performing enough transpositions, one can show the
converse of this fact. This fact is a particular case of lemma 47 of [5]. For instance, if
n =5and m =4we have
⎛
⎜
⎜
⎝
01011
01010
10101
00101
⎞
⎟
⎟
⎠
A
−→
⎛
⎜
⎜
⎝
01001
10101
00010
01111
⎞
⎟
⎟
⎠
.
Indeed, the number of 0’s in both populations in the ﬁrst column is 3,inthe2
nd
, 3
rd
and 4
th
columns is 2 and in the last column is 1. Thus the equivalence class cor
responding to a given population P can be described by an ordered ntuple [c]
P
=
(c
1
,c
2
,...,c
n
) of numbers between 0 and m where c
i
is the number of 0sinthei
th
column of P . For example, if P is either one of the equivalent populations above then
[c]
P
=(3, 2, 2, 2, 1).
2
Example 4.5.2 Continuing with example 4.5.1, consider the following family of muta
tion transformations M = { T
u
 u ∈ Ω} where each transformation T
u
is deﬁned as
follows: Denote by +
2
the addition modulo 2 (0+
2
0=0, 1+
2
0=1, 0+
2
1=1,
1+
2
1=0). We then deﬁne T
u
to be the function from Ω into itself which sends every
a =(a
1
,a
2
,...,a
n
) to T
u
(a)=a ⊕ u where ⊕ is componentwise addition modulo
2, i. e. given x =(x
1
,x
2
,...,x
n
) and y =(y
1
,y
2
,...,y
n
) ∈ Ω
n
,the⊕ operation
is deﬁned as follows: x ⊕ y = z where z =(z
1
,z
2
,...,z
n
) with z
i
= x
i
+
2
y
i
.
Notice ﬁrst that every transformation T
u
is bijective (in fact T
u
◦ T
u
= 1 where 1 is
the identity map on Ω). Since every mutation transformation T
u
is uniquely determined
by the element u ∈ Ω,deﬁning a probability distribution on the family M amounts to
deﬁning a probability distribution o n Ω={0, 1}
n
. To achieve a situ ation equivalent
to the classical case where every b it is mutated independently with a small probability
>0 and remains unchanged with probability 1 − , we choose 1 with probability
and 0 with probability 1 − independently n times. Given a population of size m we
let mutation be the elementary step associated to the reproduction mtuple
Ω
mutation
=(Ω, M, M,...,M, p, p,...,p,℘
m
)
where p is the probability distribution on M described above and ℘
m
is the unique
trivial p robability distribution o n the oneelement set (since there is exactly one way to
partition a given set into singleton subsets). Now let B denote the algorithm determined
by the elementary step of crossover as described in example 4.5.1 followed by the
2
One point crossover under reasonable assumptions will produce the same equivalence relation.
22 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
elementary step of mutation as described above. Then the algorithm B is ergodic in the
sense of deﬁnition 58 of [5] which means that the equivalen ce relation
B
−→ is trivial,
i.e. there is only one equivalence class or, in other words, for any two populations
P and Q we have P
B
−→ Q. Indeed, thanks to the availability of mutation, any given
population can be reached from any other given population in a single step with a small
but a positive p robability which means that any two g iven populations are equivalent
under
B
−→ .
The main result of [5] is the following fact:
Theorem 4.5.2 Let A denote a bijective and selftransient algorithm. Then the Markov
chain initiated a t some population P ∈ Ω
m
is irreducible and its unique stationary dis
tribution is the uniform distribution (on [P ]
A
).
The classical versions of Geiringer theorem, such as the ones established in [4]
and in [12] are stated in terms of the “limiting frequency of occurr e nce” of a certain
element of the search space. The following deﬁnitions, which also appear in [5], make
these notions precise in the ﬁnite population setting:
Deﬁnition 4.5.5 We d eﬁne the characteristic function X :Ω
m
×P(Ω) → N ∪{0}
as follows: X (P, S)=the number of individuals of P which are the elements of S.
(Recall that P ∈ Ω
m
is a population consisting of m individuals and S ∈P(Ω) simply
means that S ⊆ Ω.)
Example 4.5.3 For instance, suppose Ω={0, 1}
n
, P =
⎛
⎜
⎜
⎜
⎜
⎜
⎜
⎝
01010
01010
10101
00101
01010
10101
⎞
⎟
⎟
⎟
⎟
⎟
⎟
⎠
and
S ⊆ Ω={0, 1}
n
is determined by the Holland schema (∗, 1, ∗, 1, ∗).ThenX (P, S)=
3 because exactly three rows of P ,the1
st
,the2
nd
, and the 5
th
are in S.
Deﬁnition 4.5.6 Fix an evolutionary algorithm A and an initial population P ∈ Ω
m
.
Let P (t) denote the population obtained upon the completion of t reproduction steps
of the algorithm A in the absence of selection and mutation. For instance, P (0) = P .
Denote by Φ(S, P, t) the proportion of individuals from the set S which occur before
time t. That is, Φ(S, P, t)=
P
t
s=1
X (P (s),S)
tm
. (Notice that tm is simply the total
number of individuals encountered before time t. The same individual may be repeated
more than once and the multiplicity co ntributes to Φ.) Denote by X (2,S):Ω
m
→ N
the restriction of the function X when the set S is ﬁxed (the notation suggests that one
plugs a population P into the box).
Intuitively, Φ(S, P, t) is the frequency of encountering the individuals in S before time
t when we run the algorithm starting with the initial population P .
23 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
4.6 Nonlinear Genetic Programming (GP) with Homol
ogous Crossover.
In genetic programming, the search space, Ω, consists of the parse trees which usually
represent various computer programs.
Example 4.6.1 A typical parse tree representing the program (+(sin(x), ∗(x, y))) is
drawn below:
Since computers have only a ﬁnite amount of memory, it is reasonable to assume that
there are ﬁnitely many basic operations which can be used to construct programs and
that every program tree has d epth less than or equal to some integer L. Under these
assumptions Ω is a ﬁnite set. We may then deﬁne the search space as follows:
Deﬁnition 4.6.1 Fix a signature Σ=(Σ
0
, Σ
1
, Σ
2
,...,Σ
N
) where Σ
i
’s are ﬁnite
sets.
3
We assume that Σ
0
= ∅ and Σ
j
 =1∀ j
4
. The search space Ω consists of all
parse trees having depth at most L. Interior nodes having i children are labelled by
the e lements of Σ
i
. The leaf nodes are labelled by the elements of Σ
0
.
In order to study the appropriate family of reproduction (crossover) transformations
with the aim of applying the generalized Geiringer theorem, it is most convenient to
exploit Poli hyperschemata ([9] for a more detailed description).
Deﬁnition 4.6.2 A Poli hyperschema is a rooted parse tree which may have two ad
ditional labels for the nodes, namely # and = signs (it is assumed, of course, that
neither one of these denotes an operation). The = sign may label any interior node v
of the tree. Since v does occur in the tree, we must have Σ
i
 > 0.) The # sign can
only label a leaf node. A given Poli hyperschema represents the set of all programs
whose parse tree can be obtained by replacing the = signs with any operation of the
appropriate arities and attaching any program trees in place of the # signs. Different
occurrences of # or = may be replaced differently. We shall denote by S
t
the set of
programs represented by a hyperschema t.
Consider, for instance, the hyperschema t deﬁned as (+(= (#,x), ∗(sin(y), #)) which
is pictured below:
3
Intuiti vely Σ
i
is the set consisting of iary operations and Σ
0
consists of the input variables. Formally
this does not have to be the case though.
4
The assumption that Σ
j
 =1∀ j does not cause any problems since we are free to select any elements
from the search space that we want. On the other hand, this assumption helps us to av oid unnecessary
complications when dealing with the poset of Poli hyperschemata later
24 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
A couple of p rograms ﬁtting the hyperschema t are shown below:
In order to model the family of reproduction (crossover) transformation in a way which
makes it obvious that GP is a b ijective and selftransient algorithm, we shall introduce
a partial order on the set of all Poli hyperschema so that every two elements have the
least upper bound. The notion of the least upper bound will be also used to deﬁne
the common region (see [11] for an alternative description of the notion of a common
region).
Deﬁnition 4.6.3 Denote by O the set of all basic operations which can be used to
construct the programs (i. e. O =Σ
1
∪ ...,∪Σ
N
) and by V the set of all variables (i.
e. V =Σ
0
). Put the following partial order, ,onthesetO∪V∪{=, #}:
1. ∀ a, b ∈O∪Vwe have a b ⇐⇒ a = b.
2. ∀ a ∈Owe have a =.
3. ∀ a ∈O∪Vwe have a #.
4. ==, # # and = #.
We shall also write a b to mean b a.
It is easy to see that is, indeed a partial order. Moreover, every collection of elem e nts
of O∪V∪{=, #} has the least upper bound under . We are now ready to deﬁne the
partial order relation on the set o f all Poli hyperschemata:
Deﬁnition 4.6.4 Let t
1
and t
2
denote two Poli hyperschemata. We say that t
1
≥ t
2
if
and only if the following two conditions are satisﬁed:
1. the tree corresponding to t
1
when all of the labels are deleted is a subtree of the
tree corresponding to t
2
with all of the labels deleted.
2. Every one of the labels (which represents an operation or a variable) of t
1
is
the label o f the node in the corresponding position of t
2
.
25 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Example 4.6.2 For instance, the hyperschema t
1
=(+(=(#,x)), ∗(sin(y), #)) ≥
t
2
= (+(+(∗(sin(x),y),x)), ∗(sin(y), = (#))). Indeed, the parse trees of t
1
and t
2
appear on the picture below:
When all the labels in the dashed subtree of the parse tree of t
2
are deleted one gets
the tree isomorphic to that obtained from t
1
by deleting all the labels. Thus condition 1
of deﬁnition 4.6.4 is satisﬁed. To see that condition 2 is fulﬁlled as well, we notice th at
the labels of t
1
are to the corresponding labels of the dashed subtree of t
2
: Indeed,
we have + +, = +, ∗∗, # ∗, x x, sin sin, # = and y y.
Again it is easy to check that ≥ is, indeed, a partial order relation on th e collection of
Poli hyperschemata. Proposition 4.6.1 below tells us even more:
Proposition 4.6.1 Any given collection of Poli h yperschemata has the least upper
bound under ≥.
Proof: Denote by S a given collection of Poli hyperschemata. We provide an algo
rithm to construct the least upper bound of S as follows: Copies of all the trees in S are
recursively jointly traversed starting from the root nodes to identify the parts with the
same shape, i. e. the same arity in the nodes visited. Recursion is stopped as soon as
an arity mismatch between corresponding nodes in some two trees from S is present.
All the nodes and links encountered are stored. This way we obtain a tree. It remains
to stick in the labels. Each one of the interior nodes is labeled by the least upper bound
of the corresponding labels of the trees in S. The label of a leaf node is a variable, say
x, if all the labels of the corresponding nodes of the trees in S are x (which implies
that they are leaf nodes themselves). In all other cases the label of the leaf node is the
# sign. It is not hard to see that this produces the least upper bound of the collection
S of parse trees.
It was pointed out before, that programs themselves are Poli hyperschemata. The fol
lowing fact is almost immediate from the explicit construction of the least upper bound
carried out in the proof of proposition 4.6.1:
Proposition 4.6.2 A given Poli hypersch ema t is the least upper bound of the set S
t
of
programs determined by t.
26 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
From proposition 4.6.2 it follows easily that ≥ is order isomorphic to the collection of
subsets determined by the Poli hyperschemata:
Proposition 4.6.3 Let t and s denote Poli hyperschemata. Denote by S
t
and S
s
the
subsets of the search space determined by the hyperschemata t and s respectively.
Then t ≥ s ⇐⇒ S
t
⊇ S
s
.
There is another type of schemata which is useful to introduce in order to deﬁne the
family of reproduction (crossover) transformations:
Deﬁnition 4.6.5 A shape schema is just a rooted ordered tree. If
˜
t is a given shape
schema then S
˜
t
is just the set of all programs whose underlying tree when all the
labels are deleted is precisely
˜
t. Given a Poli hyperschema s, we shall denote by ˜s the
underlying shape schema of s, i. e. the tree obtained by deleting all the labels in s.
The notion of a common region which is equivalent to the one deﬁned below also
appears in [11]:
Deﬁnition 4.6.6 Given two Poli hyperschemata t and s we deﬁne their c ommon region
to be the underlying shape schema of the least upper bound of t and s.
Deﬁnition 4.6.7 Fix a shape schema
˜
t. We shall say that the set C
˜
t
= {(a, b)  a, b are
program trees and
˜
t is the common region of a and b} is a component corresponding
to the shape
˜
t.
Notice that sets determined by the shape schemata partition the search space:
Remark 4.6.1 Notice that Ω
2
=
˜
t is a shape
C
˜
t
. Moreover, C
˜
t
∩C
˜s
= ∅ for t = s.(This
is so because least upper bounds in a poset are uniquely determined and so the function
sending (a, b) → sup(a, b) → the underlying shape of sup(a, b) is well deﬁned. But
then the sets C
˜
t
are simply the preimages under this function of the individual shape
schemata and, hence, form a partition of Ω
2
.)
We now proceed to deﬁne the family of reproduction transformations. Our goal is to
introduce a family of functions on Ω
2
in such a way that each one of them is easily seen
to be bijective (see theorem 4.5.2, deﬁnition 4.5.2 , deﬁnition 4.5.3 and remar k 4.5.2 ) .
Theideaistodeﬁne these transformations on each of the components ﬁrst:
Deﬁnition 4.6.8 Fix a shape schema
˜
t. Fix a node, v of
˜
t. A onepoint partial ho
mologous crossover transformation T
v
: C
˜
t
→ C
˜
t
is deﬁned as follows: For given
(a, b) ∈ C
˜
t
let T
v
(a, b)=(c, d) where c and d are obtained from the program trees of
a and b as follows: First identify the node v in the parse trees of a and b respectively.
Now obtain the pair (c, d) by swapping the subtrees of a and b rooted at v. (This pro
cedure is described in detail in [11] and it is also illustrated in the example below).
Let G
˜
t
= {T
v
 v is a node of
˜
t} denote the family of all partial homologous onepoint
crossover transformations associated to the shape
˜
t.
The following example illustrates the concepts in deﬁnitions 4.6.5, 4.6.6 and 4.6.8:
27 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Example 4.6.3 In the upper left part of the picture parse trees of the two sample pro
grams a and b are shown. Then on the upper right one can see the least upper bound
of a and b. On the lower right the underlying tree of the least upper bound of a and b
is drawn. According to deﬁnitio n 4.6.6, this tree is precisely the common region of the
programs a and b. The isomorphic subtrees inside both, a and b, are emphasized inside
the dashed areas:
A node v is selected inside the common region. The pair of children (c, d)=T
v
(a, b)
appears on the lower left of the picture above. The subtrees rooted at v which are
swapped during crossover are emphasized inside the dashed area.
Remark 4.6.2 One does need to show that for (a, b) ∈ C
˜
t
we have T
v
(a, b) ∈ C
˜
t
.
A rigorous argument can be given as follows: Clearly T
v
: C
˜
t
→
˜
t is a shape
C
˜
t
is a
welldeﬁned map. Moreover, since v is a node of the least upper bound of a and b and
the pair (c, d) is obtained simply by swapping the corresponding subtrees rooted at
v,wegets =sup{c, d}≤sup{a, b}. Now consider the transformation F
v
: C
˜s
→
˜
t is a shape
C
˜
t
and notice that, by deﬁnition, we have F
v
(c, d)=(a, b).Butthen,
according to the reasoning above, we have sup{c, d}≤sup{a, b}. Thereby, we get
sup{c, d}≤sup{a, b}≤sup{c, d} =⇒ sup{c, d} =sup{ a, b} =⇒
˜
t =˜s.This
shows that T
v
does, indeed, map into C
˜v
. Moreover, in the process, we have also
observed a couple of very important facts:
1. T
v
◦ T
v
= 1
C
˜
t
where 1
C
˜
t
denotes the identity map on C
˜
t
. This shows, in
particular, that T
v
is a bijection.
2. T
v
preserves the least upper bounds: sup{a, b} =supT
v
(a, b).
We ar e ﬁnally ready to deﬁne the family of reproduction transformations on the search
space Ω of all programs:
Deﬁnition 4.6.9 For every shape schema
˜
t ﬁx a node v
˜
t
of
˜
t.Deﬁne a one point
crossover transformation T
{v
˜
t
}
˜
t
is a shape schema
:Ω
2
→ Ω
2
to be the settheoretic union
of all partial crossover transformations of the form T
v
˜
t
. More explicitly, this means
that whenever a given pair (a, b) ∈ Ω
2
we must have (a, b) ∈ C
˜s
for a unique shape
schema ˜s (since, a ccording to remark 4.6.1, Ω
2
is a disjoint union of components cor
responding to various shapes). But then T
{v
˜
t
}
˜
t is a shape schema
(a, b)=T
v
˜s
(a, b). Denote by
28 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
G the family of all crossover transformations together with the id entity map on Ω
2
.For
simplicity of notation we shall denote the transformations in G by plain English letters:
T , F etc., keeping in mind that every such transformation is determined by making
choices of partial crossover transformations on every one of the components.
Remark 4.6.3 Thanks to remark 4.6.2, every one of the crossover transformations in
the family G is bijective (since it is a union of bijections on the pieces of a partition). It
follows now that the generalized Geiringer theorem (theorem 4.5.2) applies to the case
of homologous GP.
Remark 4.6.4 It is also possible to model uniform GP crossover (this type of crossover
is examined in detail in [11]) in the analogous manner. All o f the results established in
the current paper apply to this case without any modiﬁcation.
4.7 The Statement of the SchemaBased Version of Geiringer’s
Theorem for Nonlinear GP under Homologous Crossover.
As mentioned before, the schemabased version of Geiringer’s theorem for nonlinear
GP is stated in terms of Poli hyperschemata.
Deﬁnition 4.7.1 A Poli hyperschema of order i is a Poli hyperschema which has ex
actly i nodes whose label is not a # or an = sign.
Aconﬁguration schema is a 0order Poli hyperschema (i.e a hyperschema which
has only the equal signs in the interior nodes and # signs in the leaf nodes.)
An operation schema is a Poli hyperschema of order 1 (i. e. a hyperschema which
has exactly one node whose label is not a # or an = sign).
Fix an individual (a parse tree) u ∈ Ω.Letv denote any node of u.LetB(v)
denote the branch of the shape schema of u from the root down to the node v.Let
B
+
(v)=B(v) ∪{w  w is a child of some node z of B with z = v}. Now deﬁne cs(v)
to be the conﬁguration schema whose underlying shape schema is B
+
(v).Leto denote
an operation or a variable (an element of Σ
i
for some i between 0 and N ). Now obtain
the operation schema os
o
from cs(v) by attaching the node labelled by o in place of the
# sign at the node corresponding to v of cs(v).Unlessv is the leaf node of u, all the
children of this new node are the leaf nodes of os
o
labelled by the # sign. When o is
the operation (or the variable) labelling the node v of u, we shall write os(v) instead
of os
o
.
Notice that if v is a root node then cs(v) is just the schema which determines the entire
search space, i. e. the parse tree consisting of a single node labelled b y the # sign.
Example 4.7.1 illustrates deﬁnition 4.7.1.
Example 4.7.1 Below we list all of the conﬁguration schemata and operation schemata
for the individual of example 4.6.1:
29 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Recall from deﬁnition 4.5.5 that X (P, S) denotes the number of individuals in the
population P which are the elements of S ⊆ Ω. The following deﬁnition makes it
more convenient to state the schemabased version of Geiringer’s theorem:
Deﬁnition 4.7.2 Given a Poli hyperschema H, we shall write H(P ) in place of
X (P, S
H
) (see deﬁnition 4.6.2) to denote the number of individuals (counting repe
titions) in the population P ﬁtting the hyperschema H.
We can now ﬁnally state the Geiringer’s theorem for nonlinear GP under homologous
crossover:
Theorem 4.7.1 Fix an initial population P ∈ Ω
m
and an individual u ∈ Ω. Suppose
every pair of individuals has a positive probability to be paired u p for crossover and
30 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
every transformation in G has a positive probability of being chosen
5
. Then the limiting
frequency of occurrence of a given individual u,
lim
t→∞
Φ(u,P,t)=
v is a node of u
os(v)(P )
cs(v)(P)
.
Example 4.7.2 To illustrate how theorem 4.7.1 can b e applied in practice, suppose
we are interested in computing the frequency of encountering the individual u from
examples 4.6.1 and 4.7.1 when the initial population of 6 individuals pictured below is
chosen:
The number of individuals in P ﬁtting the operation schema os(v
1
) is 2 (these are
x
1
and x
5
) while every individual ﬁts the con ﬁguration schema cs(v
1
). Therefore
os(v
1
)(P )
cs(v
1
)(P )
=
2
6
=
1
3
. 4 individuals, namely x
1
, x
3
, x
4
and x
5
ﬁt cs(v
2
)=cs(v
3
),
among these only 2 individuals, namely x
3
and x
5
, ﬁt os(v
2
) and 2 individuals, x
4
and
x
5
ﬁt os(v
3
) so that
os(v
2
)(P )
cs(v
2
)(P )
=
os(v
3
)(P )
cs(v
3
)(P )
=
2
4
=
1
2
. Individuals x
3
, x
4
and x
5
ﬁt
the conﬁguration schema cs(v
4
) while only x
4
ﬁts the operation schema os(v
4
) so that
os(v
4
)(P )
cs(v
4
)(P )
=
1
3
. x
1
, x
3
, x
4
and x
5
ﬁt cs(v
5
)=cs(v
6
). Among these only x
3
and x
4
ﬁt os(v
5
) while only x
4
ﬁts os(v
6
) so that
os(v
5
)(P )
cs(v
5
)(P )
=
2
4
=
1
2
and
os(v
6
)(P )
cs(v
6
)(P )
=
1
4
.
Thereby, according to theorem 4.7.1, we obtain:
lim
t→∞
Φ(u,P,t)=
6
i=1
os(v
i
)(P )
cs(v
i
)(P )
=
=
1
3
·
1
2
·
1
2
·
1
3
·
1
2
·
1
4
=
1
288
.
Roughly speaking, this means that if we run GP starting with the population P pictured
above, in the absence of mutation and selection (crossover being the only step) for an
inﬁnitely long time, the individual u will be encountered on average 1 out of 288 times.
5
These conditions can be slightly relaxed, but we try to present the main idea only
31 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Example 4.7.3 Notice that linear GP (or, equivalently, variable length GA) as de
scribed in [12] is a special case of nonlinear GP when ∀ i>1Σ
i
= ∅ and Σ
0
and
Σ
1
= ∅. Indeed, the elements of such a search space are parse trees such that every
interior node has exactly one child and the depth of the tree is bounded by some integer
N. One can think of such a tree as a sequence of labels (a
1
,a
2
,...,a
n
),theﬁrst la
bel afﬁliated with the root node, second label with the child of the root node and so on.
The label a
n
is afﬁliated with the leaf node. This gives us a onetoone correspondence,
call it φ between the search space for nonlinear GP in our speciﬁccasewhen∀ i>1
Σ
i
= ∅ while Σ
0
and Σ
1
= ∅ and the search space for linear GP which preserves
crossover. The following types of schemata have been introduced in [12]:
Deﬁnition 4.7.3 The schema H =(∗
i−1
,h
i
, #) represents the subset S
H
= {x =
(x
1
,x
2
,...,x
l
)  l>iand x
i
= h
i
}. In words, S
H
is simply the set of all individuals
whose length is at least i +1and whose i
th
allele is h
i
.
Deﬁnition 4.7.4 The schema H =(∗
i
, #) represents the subset
S
H
= {x =(x
1
,x
2
,...,x
l
)  l>i}.
In word s, S
H
is simply the subset of all individuals whose length is at least i +1.
Deﬁnition 4.7.5 The schema H =(∗
i−1
,h
i
) represents the subset
S
H
= {x =(x
1
,x
2
,...,x
i
)  x
i
= h
i
}
of the search space which is simply the set of all individuals of length exactly equal to
i whose i
th
(last) allele is h
i
.
The reader may check that under the correspondence φ th e conﬁguration schemata
correspond to the schemata H
i
=(∗
i
, #) for i ≥ 1, operation schemata correspond
to the schemata of the form H =(∗
i−1
,h
i
, #) and of the form H =(∗
i−1
,h
i
) for
i>1. Finally, the hyperschema t
(1, 1)
corresponds to the schema H =(h
1
, #).
Fix a population P ∈ Ω
m
. Recall that we denote by H the number of individuals
in P whic h ﬁ t the schema H counting repetitio ns. Also recall from deﬁnition 4.5.6
that Φ(S
H
,P,1) =
H
m
denotes the fraction of the number of individuals of P which
ﬁt the schema H. To abbreviate the notation we shall write Φ(H, P, 1) instead of
Φ(S
H
,P,1). Fix an individual u =(h
1
,h
2
,...,h
n
) ∈ Ω. Theorem 4.7.1 tells us
that
lim
t→∞
Φ(u,P,t)=
(h
1
, #)
m
· (
n−2
i=1
(∗
i
,h
i+1
, #)
(∗
i
, #)
) ·
(∗
n−1
,h
n
)
(∗
n−1
, #)
=
=
(h
1
, #)
m
· (
n−2
i=1
(∗
i
,h
i+1
, #)
m
(∗
i
, #)
m
) ·
(∗
n−1
,h
n
)
m
(∗
n−1
, #)
m
=
Φ(h
1
, #) · (
n−2
i=1
Φ(∗
i
,h
i+1
, #)
Φ(∗
i
, #)
) ·
Φ(∗
n−1
,h
n
)
Φ(∗
n−1
, #)
=
32 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
=Φ(∗
n−1
,h
n
) ·
0
i=n−2
Φ(∗
i
,h
i+1
, #)
1
i=n−1
Φ(∗
i
, #)
=Φ(∗
n−1
,h
n
) ·
i=1
i=n−1
Φ(∗
i−1
,h
i
, #)
Φ(∗
i
, #)
which is precisely the formula obtained in [12].
4.8 How Do We Obtain Theorem 4.7.1 from Theorem 4.5.2?
The following couple of corollaries from [5] are useful in obtaining the schemabased
versions of Geiringer theorem for various evolutionary algorithms. Throughout, we
shall d enote by
[P ]
A
the uniform probability distribution on the set [P ]
A
(see deﬁni
tion 4.5.4).
Corollary 4.8.1 Fix a bijective and selftransient algorithm A and an initial popula
tion P ∈ Ω
m
.FixasetS of individuals in Ω (S ⊆ Ω). Then lim
t→∞
Φ(S, P, t)=
1
m
E
[P ]
A
(X (2,S)) (here E
[P ]
A
(f) denotes the expectation of the random variable
f with respect to the uniform distribution on the set [P ]
A
).
6
To state the next corollary which brings us one step closer to deriving results similar
in ﬂavor to Geiringer’s original theorem we need one more, purely formal, assumption
about the algorithm:
Deﬁnition 4.8.1 We say that a given algorithm A is regular if the following is true:
for every population P =(x
1
,x
2
,...,x
m
) ∈ Ω
m
and for every permutation π ∈
S
m
, the population obtained by permuting the elements of P by π, namely π(P)=
(x
π(1)
,x
π(2)
,...,x
π(m)
) ∈ [P ]
A
. In words this says that the equivalence classes
[P ]
A
are permutation invariant.
Remark 4.8.1 Deﬁnition 4.8.1 is only needed because our description of an evolu
tionary search algorithm uses the ordered multiset model. This makes the generalized
Geiringer theorem (theorem 4.5.2) look nice (the stationary distribution is uniform on
[P ]
A
). A disadvantage of the multiset model is that it allows algorithms which are not
regular. If we were to use the model of [22] where the order of elements in a population
is not taken into account (a reasonable assumption since most evolutionary algorithms
used in practice are, indeed, regular) then the Generalized Geiringer theorem would
have to be modiﬁed accordingly since the stationary distribution of the corresponding
Markov process would be different from uniform (it is not difﬁcult to compute it though
since the corresponding Markov chain is just a “projection” of the one used in the
current paper).
Corollary 4.8.2 Fix a regular bijective and selftransient algorithm A and an ini
tial population P ∈ Ω
m
. Denote by
[P ]
A
the uniform probability distribution on
[P ]
A
(see deﬁnition 4.5.4). Fix a set S of individuals in Ω (S ⊆ Ω). Then we have
lim
t→∞
Φ(S, P, t)=
[P ]
A
(V
S
) where
V
S
= {P  P ∈ [P ]
A
and the 1
st
individual of P is an element o f S}.
6
Throughout the paper , wheneve r a limit is involved, the equality is meant to hold for almost every inﬁ nite
sequence of trials.
33 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Corollaries 4.8.1 and 4.8.2 are proved in section 6 of [5]. When d eriving schemabased
versions of Geiringer theorem for a speciﬁc algorithm the following strategy may be
implemented: Continuing with the notation in corollaries 4.8.1 and 4.8.2, suppose we
are given a nested sequence o f subsets of the search space: S
1
⊇ S
2
⊇ ... ⊇ S
n
.
According to corollary 4.8.2,
lim
t→∞
Φ(S
n
,P,t)=
[P ]
A
(V
S
n
)=
V
S
n

[P ]
A

=
V
S
n

V
S
n−1

·
V
S
n−1

[P ]
A

=
=
V
S
n

V
S
n−1

·
V
S
n−1

V
S
n−2

· ...·
V
S
2

V
S
1

·
V
S
1

[P ]
A

=
=
[P ]
A
(V
S
1
) ·
n−2
j=0
V
S
n−j

V
S
n−j− 1

=
1
m
E
[P ]
A
(X (2,S)) ·
n−2
j=0
V
S
n−j

V
S
n−j− 1

Notice that
V
S
j

V
S
j−1

is just the proportion of populations in [P ]
A
whose ﬁrst individual
is a member of S
j
inside the set of populations in [P ]
A
whose ﬁrst individual is a
member of S
j−1
.
Corollary 4.8.3 Fix a regular, bijective and selftransient algorithm A and an ini
tial population P ∈ Ω
m
. Fix a nested sequence of subsets S
1
⊇ S
2
⊇ ... ⊇ S
n
of individuals in Ω (S
1
⊆ Ω). Then lim
t→∞
Φ(S
n
,P,t)=
1
m
E
[P ]
A
(X (2,S)) ·
n−2
j=0
V
S
n−j

V
S
n−j− 1

where, as before, V
S
denotes the set of all populations whose ﬁrst
individual is a member of S for a given subset S ⊆ Ω.
Denote by A a given GP algorithm. Fix an individual u ∈ Ω. In order to apply
corollary 4.8.3, we may choose a descending chain of Poli hyperschemata t
1
≥ t
2
≥
...≥ t
n
= u. Fix an initial population P . To avoid putting many subscripts, we shall
write V
t
instead of V
S
t
for the set of all populations in [P ]
A
(see deﬁnition 4.4.1) whose
1
st
individual is a member of S
t
(the set of individuals determined by the hyperschema
t). In order to construct the desired sequence of nested hyperschemata, we assign the
following numerical labelling to the nodes of the parse tree of u: The nodes are labelled
by the pairs of integer coordinates. The ﬁrst coordinate shows the depth of the tree and
the second coordinate shows how far to the right a given node at the depth speciﬁed by
the ﬁrst coordinate is located. Notice, for instance, that the root node is labelled by the
coordinates (1, 1). We also introduce the following lexicographic linear ordering on
the set of coordinate pairs:
Deﬁnition 4.8.2 (a, b) ≤ (c, d) if and only if either a ≤ c or (a = c and b ≤ d).
It is well known and easy to verify that this deﬁnes a linear ordering.
Deﬁnition 4.8.3 Given a pair of coordinates (i, j), denote by ↑ (i, j) the immediate
successor of (i, j) under the lexicographic ordering deﬁned above. Explicitly,
↑ (i, j)=
(i +1, 1) if (i, j) labels the rightmost node of u at depth i
(i, j +1)otherwise
34 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
We obtain the desired nested sequence o f hyperschemata for the given individual u
recursively in the following manner:
Deﬁnition 4.8.4 Deﬁne t
(1,1)
to be the hyperschema whose root node has the same
label (operation) and arity as that of the root node of u. All children of the root node
are the leaf nodes labelled by the # sign. Once the hyperschema t
(i,j)
has been con
structed, we obtain the hyperschema t
↑(i,j)
by attaching the node of u with coordinate
↑ (i, j) in place of the # sign at coordinate ↑ (i, j) to the parse tree of t
(i,j)
.Unless
this node, call it v, is a leaf node of u, all children of this new node are the leaf nodes
of t
↑(i,j)
labelled by the # sign.
We illustrate the co nstruction with an explicit exam ple:
Example 4.8.1 Below, the nested sequence t
(1, 1)
≥ t
(2, 1)
≥ t
(2, 2)
≥ t
(3, 1)
, ≥
t
(3, 2)
≥ t
(3, 3)
corresponding to the program of example 4.6.1 is drawn explicitly:
The formula for the limiting frequency of occurrence of a given program u in corol
lary 4.8.3 involves the ratios of the form
V
t
↑(i, j)
V
t
(i, j)
. It turns out that these ratios can
be expressed nicely in terms of the presence of certain conﬁguration and operation
schemata in the initial population P :
Deﬁnition 4.8.5 Given a program tree u and the corresponding nested sequence t
(1, 1)
≥
t
(2, 1)
≥ ... ≥ t
(i, j)
≥ t
↑(i, j)
, ≥ ... ≥ t
(l, k)
= u of hyperschemata as in deﬁni
tion 4.8.4, for every (i, j) =(l, k), denote by cs
(i, j)
(os
(i, j)
)theconﬁguration schema
cs(v) (operation schema os(v))wherev is the node of u with coordinate ↑ (i, j).
Example 4.8.2 Continuing with examples 4.6.1 and 4.7.1 notice that for the individual
in these examples we have cs
(1,1)
= cs
(2,1)
= cs(v
2
)=cs(v
3
) while os
(1,1)
= os(v
2
)
and os
(2,1)
= os(v
3
) (see example 4.7.1), cs
(2,2)
= cs(v
4
) while os
(2,2)
= os(v
4
) and
cs
(3,1)
= cs
(3,2)
= cs(v
5
)=cs(v
6
) while os
(3,1)
= os(v
5
) and os
(3,2)
= os(v
6
).
The following “orbit description” lemma is the reason for introducing conﬁguration
and operation schemata: We prove the lemma under the following special assumption:
35 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Deﬁnition 4.8.6 We say that a population P is special with respect to the individual u
if for every node v of u and for every operation (or variable) o we have os
o
(P )≤1
where os
o
is obtained from cs(v) by means of attaching the operation o at the leaf node
of cs(v) corresponding to v as described in deﬁnition 4.7.1.
Deﬁnition 4.8.6 basically requires that no 2 operations (or variables) occurring in P
at the speciﬁed location are the same. It turns out that the orbit description lemma
stated below is a lot more convenient to prove under this special assumption. The
general case will then follow by introducing enough extra labels for the operations and
variables involved and then deleting the extra labels.
Lemma 4.8.4 Fix a n initial population P and a program u ∈ Ω Assume that the
population P is special with resp ect to the individual u. Suppose every pair of individ
uals has a positive probability to be paired up for crossover and every transformation
in G has a positive probability of being chosen
7
. Consider the sequences of hyper
schemata t
(1, 1)
≥ t
(2, 1)
≥ ... ≥ t
(i, j)
≥ t
↑(i, j)
, ≥ ... ≥ t
(l, k)
= u, {cs
(i, j)
 (i, j)
is a coordinate of u, (i, j) is not the maximal coordinate } and {os
(i, j)
 (i, j) is a
coordinate of u, (i, j) is not the maximal coordinate } corresponding to the individ
ual u. For a given hypersch ema t, denote by t(P ) the number of individuals in P
which ﬁt the hyperschema t counting repetitions. Suppose ∀ nonmaximal pairs of co
ordinates (i, j) we have os
(i, j)
(P ) =0and t
(1, 1)
(P ) =0. Then it is true that
∀ (i, j)
V
t
↑(i, j)

V
t
(i, j)

=
1
cs
(i, j)
(P )
.
Proof: The key idea is to observe the following fact:
Claim: Fix a coordinate (i, j). Fix any two operation schemata os
1
and os
2
which
are obtained from cs
(i, j)
by attaching either a variable or an operation at the node
(i, j). Suppose ∃ individuals in P ﬁtting both, os
1
and os
2
.ThenV
t
(i, j)
∩V
os
1
 =
V
t
(i, j)
∩V
os
2
. Proof: Consider the map F :[P ]
A
→ [P ]
A
deﬁned as follows:
Given a population, say Q ∈ [P ]
A
, notice that ∃ an individual, say x
1
,inQ ﬁtting
the oper ation schema os
1
(due to the way crossover is deﬁned, the number of individ
uals ﬁtting the operation schema os
1
(Q) is the same in every population Q ∈ [P ]
A
).
Moreover, such an individual is unique since we assumed that all operations appearing
in the individuals of P are distinct. Likewise, ∃ unique individual in Q,sayx
2
ﬁtting
the operation schema os
2
. Pair up individuals x
1
and x
2
and pair up the rest of the
individuals arbitrarily for crossover. Select the cro ssover tra nsformation T
v
where v
is the node with coordinate (i, j) for the pair (x
1
, x
2
) and choose the identity trans
formation for the rest of th e pairs. Now let F (Q) be the population obtained upon
the completion of the reproduction step described above (notice that F (Q) ∈ [P ]
A
by
deﬁnition of [P ]
A
). Notice also that F is its own inverse (i. e. F ◦ F = 1
[P ]
A
). This
tells us, in particular, that F is bijective. Moreover, it is clear from the deﬁnitions that
F (V
t
(i, j)
∩V
os
1
) ⊆V
t
(i, j)
∩V
os
2
and, likewise, F (V
t
(i, j)
∩V
os
2
) ⊆V
t
(i, j)
∩V
os
1
.
The desired conclusion follows at once.
Now observe that t
↑(i, j)
= t
(i, j)
∩ os
(i, j)
so that V
t
↑(i, j)
= V
t
(i, j)
∩V
os
(i, j)
and
t
(i, j)
=
o is an operation or a variable
(t
(i, j)
∩ os
o
) where os
o
is obtained from cs
(i, j)
by
7
These conditions can be slightly relaxed, but we try to present the main idea only
36 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
attaching the operation (or variable) o at the node ↑ (i, j). Therefore we also have
V
t
(i, j)
=
o is an operation or a variable,
(V
t
(i, j)
∩V
os
o
). Since operations can not appear or
disappear from a population during crossover, V
os
o
= ∅ =⇒∃an individual in P
ﬁtting th e operation schema os
o
. Thus the only sets of the form V
t
(i, j)
∩V
os
o
which
may possibly contribute to the union above are these for which ∃ an individual in P
ﬁtting the operation schema os
o
. According to the claim above, all such sets contribute
exactly the same amount. Moreover, by assumption os
(i, j)
(P ) = ∅, and so we have
V
t
(i, j)
 = n ·V
t
(i, j)
∩V
os
(i, j)
 = n ·V
t
(i, j)
∩os
(i, j)
 = n ·V
t
↑(i, j)
 =⇒
V
t
↑(i, j)

V
t
(i, j)

=
1
n
where n is the number of operation schemata of the form os
o
which are obtained from
cs
(i, j)
by attaching a variable or an operation at the node with coordinate (i, j) and
for which ∃ an individual in P ﬁtting the operation schema os
o
and the last implica
tion holds under the condition that V
t
(i, j)
 =0. This condition is, indeed satisﬁed.
(Suppose not. Let (a, b) denote the smallest coordinate such that V
t
(a, b)
 =0. Notice
that (a, b) =(1, 1) since V
t
(1, 1)
 =0. (By assumption ∃ an individual, say x,inP
ﬁtting the hyperschema t
(1, 1)
.Evenifx is not the 1
st
individual of P , by perform
ing crossover of x with the 1
st
individual of P at the root node one gets a population
Q ∈V
t
(1, 1)
.) But then (a, b)=↑ (i, j) for some coordinate (i, j) and according to the
equation above we have V
t
(i, j)
 = n ·V
t
↑(i, j)
 = n ·V
t
(a, b)
 =0which contradicts
the minimality of the coordinate (a, b). So we conclude that V
t
(i, j)
 =0) Thereby
we have
V
t
↑(i, j)

V
t
(i, j)

=
1
n
.Butcs
(i, j)
=
o is an operation or a variable
os
o
=⇒ cs
(i, j)
(P )=
o is an operation or a variable
os
o
(P ). Since we assumed that all of the operations and vari
ables ar e d istinct, ∃ at most one individual in P ﬁtting the operation schema os
o
and it
now follows that cs
(i, j)
(P ) = the number of operation schemata of the form os
o
such
that os
o
(P ) = ∅ which is pr ecisely the number n.Weﬁnally obtain
V
t
↑(i, j)

V
t
(i, j)

=
1
cs
(i, j)

which is precisely the conclu sion of the lemma.
Remark 4.8.2 Given an individual u and a population P consisting of m individuals,
observe that the number of individuals ﬁtting the hyperschema t
(1, 1)
is the same in
every population from [P ]
A
,i.e.∀ Q ∈ [P ]
A
we have t
(1, 1)
(Q) = t
(1, 1)
(P ) =1.
It follows immediately now that
1
m
E
[P ]
A
(X (2,S
t
(1, 1)
)) =
1
m
.
We now combine corollary 4.8.3, remark 4.8.2 and lemma 4.8.4 to obtain the following
special case of Geiringer theorem for nonlinear GP under homologous crossover in case
when all of the operations appearing in the individuals of the initial population P are
distinct:
lim
t→∞
Φ(u,P,t)=
1
m
·
(i, j) is not the maximal coordinate of u
1
cs
(i, j)
(P )
=
=
v is a node of u
1
cs(v)(P)
(recall that when v is the root node of u, cs(v) determines the entire search space, and
so
1
cs(v)(P )
=
1
m
) To obtain the general case, suppose we are given an initial popula
tion P . For every node v of u consider the set of operations O(v)={o os
o
(P )≥
37 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
1 wh ere os
o
is obtained from cs(v) as in deﬁnition 4.7.1}. For every Moreover, for ev
ery operation (or variable) o ∈O(v) let x
o
1
, x
o
2
,...,x
o
os
o
(P )
denote an enumeration
of the individuals in P ﬁtting the operation schema os
o
(P ). Relabel the operation o
occurring in the node v of x
o
i
by the formally different operation (o, i) (i.e.bythe
ordered pair (o, i) whose ﬁrst element is the operation o itself and the second element
is the index telling us in which individual of P the operation o labels the node v). Af
ter all of the relabelling is complete we obtain a new population P
which is special
with respect to the individual u in the sense of deﬁnition 4.8.6. Formally speaking,
we expand our signature Σ=(Σ
1
, Σ
2
,...,Σ
N
) as in deﬁnition 4.6.1 by adding the
operations (variables) (o, i) into Σ
j
where j is the arity of the operation o.Thisgives
us a new signature Σ
∗
=(Σ
∗
1
, Σ
∗
2
,...,Σ
∗
N
) where
Σ
∗
j
= {o  o ∈ Σ
j
and o/∈
v is a node of u
O(v)}∪
∪{(o, i)  o ∈O(v) for some v and 1 ≤ i ≤os
o
(P )}.
Denote by Ω
∗
the search space induced by the signature Σ
∗
. The natural projection
maps p
j
:Σ
∗
j
→ Σ
j
sending 0 → o when o/∈
v is a node of u
O(v) and (o, i) →
o when o ∈O(v) for some node v of u, induce the natural “deletion of the extra
labels” p rojection of the search spaces ϕ :Ω
∗
→ Ω where the individual ϕ(w) ∈ Ω is
obtained from the individual w ∈ Ω
∗
by replacing the label of every node w of w with
p
j
(w) where j is the arity of the node w. It is easily seen that the natural projection
ϕ commutes with the crossover transfo rmations in the sense that for any individuals
x, y ∈ Ω
∗
and for any crossover transformation T ∈G(see deﬁnition 4.6.9) we have
ϕ(T (x, y)) = T (ϕ(x),ϕ(y)).
8
Notice also that the population P can be obtained
from the population P
by applying the n atural projection ϕ to every individual of P
.
Therefore, running the algorithm with the initial population P is the same thing as
running the algorithm with the initial population P
and reading the output by applying
the natural projection ϕ. The special case does apply to the population P
, as mentioned
above, and so we have
lim
t→∞
Φ(u,P,t)=
w∈ϕ
−1
(u)
lim
t→∞
Φ(w,P,t)=
w∈ϕ
−1
(u)
v is a node of w
1
cs(v)(P)
.
Notice that w ∈ ϕ
−1
(u) precisely when the underlying shape schema of w is the
same as that of u, call this shape schema t
u
, and the label of every node v of w is
(o, i) where o is the label of the node v of u. According to the way the population P
was introduced, there are precisely os(v)(P) such labels (see also deﬁnition 4.7.1).
We can then identify the preimage ϕ
−1
(u) with the set
K
j=1
{i  1 ≤ i ≤os(v
j
)}
of ordered Ktuples of integers where K is the number of nodes in the parse tree of
u and v
1
, v
2
,...,v
K
is any ﬁxed enumeration of the nodes of u, in the following
manner: The identiﬁcation map ı :
K
j=1
{i  1 ≤ i ≤os(v
j
)(P )} → ϕ
−1
(u) sends
a given ordered Ktuple (i
1
,i
2
,...,i
K
) into the tree w = ı((i
1
,i
2
,...,i
K
)) whose
8
Of course, formally speaking, the two transformations T inv olved in the equation above are distinct,
as they have different domains (Ω
∗
and Ω respectively), but they are determined by the same set of shape
schemata and the same choice of nodes for crosso ver so we denote them by the same symbol.
38 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
underlying shape schema is t
u
and the label of a node v
j
of w is (o
j
,i
j
) where o
j
is
the label of the node v
j
in the parse tree of u.Weﬁnally obtain:
lim
t→∞
Φ(u,P,t)=
w∈ϕ
−1
(u)
v is a node of w
1
cs(v)(P)
=
=
(i
1
,i
2
,...i
K
)∈
Q
K
j=1
{i  1≤i≤os(v
j
)}
v is a node of u
1
cs(v)(P)
=
=
os(v
1
)(P )
i
1
=1
os(v
2
)(P )
i
2
=1
...
os(v
K
)(P )
i
K
=1
v is a node of u
1
cs(v)(P)
=
K
j=1
os(v
j
)(P )
i
j
=1
1
cs(v
j
)(P )
=
v is a node of u
os(v)(P )
cs(v)(P)
which is precisely the assertion of theorem 4.7.1.
4.9 What Does Theorem 4.5.2 Tell Us in the Presence of
Mutation for Nonlinear GP?
In general, mutation is an elementary step of type 2 (see deﬁnition 4.2.5) which is de
termined by the reproduction 1tuple of the form (Ω, M,p, ℘
m
) where M is a family
of functions on Ω. Notice that the set of partitions of the set of m elements into 1
element subsets consists of exactly one element – the partition into the singletons. This
forces ℘
m
to be the tr ivial probability distribution. We shall, therefore, omit it from
writing:
Deﬁnition 4.9.1 A mutation 1tuple is a reproduction 1tuple (Ω, M,p) where M
consists of functions on Ω and 1
Ω
∈M.(Here1
Ω
:Ω→ Ω denotes the identity map.)
An ergodic mutation 1tuple is a mutation 1tuple (Ω, M,p) such that ∀ x, and y ∈
Ω ∃ M ∈Mwith M (x)=y and p(M) > 0.
The following fact is a rather simple consequence of theorem 4.5.2 (see corollaries 7.1
and 7.2 of [5]):
Corollary 4.9.1 Let A denote a bijective and selftransient algorithm which involves
an elementary step determined by an ergodic mutation. Then the Markov chain asso
ciated to the algorithm A is irreducible and the unique stationary distribution of this
Markov chain is uniform. In particular, the limiting frequency of occurrence of any
given individual x is lim
t→∞
Φ({x},P,t)=
1
Ω
(see deﬁnition 4.5.6 for the meanin g
of Φ({x},P,t)).
When dealing with nonlinear GP, depending on the circumstances, one may want to
consider different types of mutation. Below we deﬁne one such possible mutation:
39 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Deﬁnition 4.9.2 Let Ω denote the search space for nonlinear GP over the signature
Σ=(Σ
0
, Σ
1
, Σ
2
,...,Σ
N
) where Σ
i
’s ar e ﬁnite sets (see deﬁnition 4.6.1). Consider
aconﬁguration schema t and a node v of t.Leti denote the arity of the node v and
let π denote a permutation of Σ
i
.Wedeﬁne a node mutation transformation M
t, v, π
:
Ω → Ω to be the function which sends a given program tree u which ﬁts the schema t
to the program M
t, v, π
(u) obtained from u by replacing the label a ∈ Σ
i
of the node v
of u with π(a) whenever u ﬁts the conﬁguration schema t (if u does not ﬁt the schema
t then M
t, v, π
(u)=u). We deﬁne the family of node mutations
M
node
= {M
t, v, π
:Ω→ Ω  π ∈S
Σ
i
where t is a conﬁguration schema and
i is the arity of the node v of t}.
As usual S
X
denotes the set of all permutations of the set X. Denote by Ω
NodeMut
=
(Ω, M
node
,p) the corresponding mutation 1tuple.
Although node mutation described in deﬁnition 4.9.2 above is not ergodic in the sense
of deﬁnition 4.9.1, it deﬁnes a bijective elementary step (see deﬁnition 4.5.3) . Indeed,
it is easy to see that the transformation M
t, v, π
−1
is a 2sided inverse of the trans
formation M
t, v, π
. Thereby theorem 4.5.2 applies to nonlinear GP with homologous
crossover and node mutation. It is also possible to derive a formula for the limiting
frequency of occurrence of a given individual u, namely lim
t→∞
Φ(u,P,t) much in
the same way as in theorem 4.7.1. In order to state the corresponding result for non
linear GP with homologous crossover and node mutation, it is convenient to introduce
the following deﬁnitions ﬁrst:
Deﬁnition 4.9.3 Fix an individual u ∈ Ω.Letv denote a node of u and consider
the conﬁguration schema cs(↑ v, i) obtained from the conﬁguration schema cs(v) by
attaching a node of degree i together with it’s i children in p lace of the # sign at the
node corresponding to v of cs(v). The newly attached nodes are then labelled by the =
and # signs respectively. If the newly attached node is of arity 0 then it is a leaf node
labelled by the = sign
9
. Furthermore, write cs(↑ v, u) in place of cs(↑ v, i) when i
is the arity of the node v in u. Also denote by os(v, o) the operation schema obtained
from the conﬁguration schema cs(v) by attaching a node labelled by the operation o
together with its appropriate number of children in place of the # sign. The children
of the newly attached node (if there are any) a re labelled by the # signs.
Deﬁnition 4.9.4 Given a mutation 1tuple (Ω, M,p),aconﬁguration schema t (see
deﬁnition 4.7.1), and a node v of t having arity i, denote by G(t, v) the group generated
by all the permutations π ∈ Σ
i
such that p(M
s, v, π
) > 0 for some conﬁguration
schema s such that the common region of s and t contains the node v. Fix an operation
a ∈ Σ
i
.Let
O(t, v, a)={ o ∈ Σ
i
∃g ∈ G(t, v) with g · a = o}
denote the orbit of the operation a under the action of the group G(t, v).
9
Formally speaking, according to deﬁnition 4.7.1, cs(↑ v, i) is not alwa ys a Poli hyperschema since it
may contain a leaf node labelled by the = sign. However, such a schema also deﬁnes a subset of the search
space Ω in much the same way as Poli hyperschemata. The only difference is that a leaf node labelled by the
= sign can be replaced by a variable only. One can not attach a nontrivial program tree to it.
40 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Suppose we are given a population P of size m consisting of program trees from Ω.
Recall from deﬁnition 4.7.2 that we denote by H(P ) the number of individuals in the
population P ﬁtting the schema H.
Theorem 4.9.2 Let A denote an algorithm determined by 2 elementary steps of type
2 one of which is determined by the node mutation (see deﬁnition 4.9.2) and the other
one by a homologous GP crossover. Suppose every one of the transformations in the
family G of GP homologous crossovers has a positive probability of being chosen.
10
Fix an individual (a program tree) u ∈ Ω and an initial population P .Leto(u,v)
denote the operation labelling the node v of the program tree u. Denote by
ˆ
u the
conﬁguration schema obtained from the shape schema,
˜
u,ofu (see deﬁnition 4.6.5) by
labelling all the interior nodes of u with the = signs and all the leaf nodes with the #
signs. Suppose that the probability distribution on the collection of node mutations is
such that whenever v is a node of u we have p(M
s, v, π
) > 0=⇒ s = cs(v) where as
before cs(v) is the conﬁguration schema of u corresponding to the node v.
11
Then we
have
lim
t→∞
Φ(u,P,t)=
v is a node of u
o∈O (
ˆ
u,v,o(u,v))
os(v, o)(P )
N
i=0
o∈Σ
i
os(v, o)(P )·O(
ˆ
u,v,o)
.
Proof: The proof of theorem 4.9.2 is very similar to the proof of theorem 4.7.1
and contains essentially no new ideas. We leave most of the details for the interested
reader as an exercise and p rovide only the rough outline: Just like theorem 4.7.1, the
orem 4.9.4 follows from corollary 4.8.3 by considering the nested sequence of hyper
schemata t
(1, 1)
≥ t
(2, 1)
≥ ... ≥ t
(i, j)
≥ t
↑(i, j)
≥ ... ≥ t
(l, k)
= u corresponding
to the program u (see deﬁnition 4.8.4). First we consider a special case when every
set Σ
i
consists of ordered p airs (l, o) where l is an integer, and mutation is allowed to
modify only the operation o and is not allowed to change the integer l. We then prove
theorem 4.9.2 in the special case when all the labels contained in the initial population
P have distinct ﬁrst coordinates. The general case then follows by introducing the ex
tra integer labels for the ﬁrst coordinate, applying the special case and then “erasing
the integer part from the labels” in exactly the same way as it was done in the proof of
theorem 4.7.1. The main difference lies in the claim proved inside lemma 4.8.4. The
corresponding claim for the proof of theorem 4.9.2 then says the following:
Lemma 4.9.3 Fix a node v with coordinate (i, j). Fix any two operation schemata
os(a) and os(b) which are obtained from cs
(i, j)
by attaching either a v ariable or
an operation at the node with coordinate (i, j). Suppose ∃ individuals in P ﬁ tting
both, os(c) and os(d) where a ∈O(cs(i, j),v,c) and b ∈O(cs(i, j),v,d).Then
V
t
(i, j)
∩V
os(a)
 = V
t
(i, j)
∩V
os(b)
.
Just like the claim inside of the lemma 4.8.4, lemma 4.9.3 is proved by constructing
an explicit bijection between the sets V
t
(i, j)
∩V
os(a)
and V
t
(i, j)
∩V
os(b)
, The only
difference is that these bijections make use of mutations as well as crossover.
10
Again we remark that this condition can be slightly relaxed but it does not introduce any new ideas of
interest.
11
Notice that this implies that O(
ˆ
u,v,o(u,v))=O(cs(v),v,o(u,v))
41 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
It may b e worth mentioning that theorem 4.7.1 is a special case of theorem 4.9.2. In
deed, if the only mutation tran sformations chosen with positive probability are these
which assign positive probability only to the mutations deﬁned by the identity per
mutations, then every orbit O(
ˆ
u,v,o) consists of exactly one element so that ∀ t and
v we have O(
ˆ
u,v,o) =1. To compress the language, we shall use
to denote
the union of disjoint sets. We then have
o∈O (
ˆ
u,v,o(u,v))
os(v, o)(P )=os(v)(P )
since o(u,v) is the only operation inside of O(
ˆ
u,v,o(u,v)) so that os(v)(P ) is
the only contributor to the disjoint union above. Moreover, we also have cs(v)=
N
i=0
o∈Σ
i
os(v, o) so that we obtain
N
i=0
o∈Σ
i
os(v, o)(P )·O(
ˆ
u,v,o) =
N
i=0
o∈Σ
i
os(v, o)(P )·1=cs(v)(P).
The f ormula in theorem 4.9.2 now simpliﬁes to the o ne in theorem 4.7.1.
Example 4.9.1 Continuing with example 4.7.2, suppose the signature Σ is deﬁned as
follows: Σ=(Σ
0
, Σ
1
, Σ
2
) where Σ
0
= {x, y, z, w}, Σ
1
= {sin, cos, tan, cot, ln}
and Σ
2
= {+, −, ∗, } where is a binary operation symbol different from +, ∗ and
−. (Of course, the semantics of the binary operation is irrelevan t to the c ontent of
this example, but if the reader feels more comfortable with a concrete interpretation,
they may assume, for instance, that x y =
y
x
e
ξ
2
dξ.) Now suppose the individual u
is the program (+(ln(x), (x, y)) pictured below (with nodes being enumerated just
like in example 4.7.1) (it has the same shape schema as the individual in that example):
Notice that this individual has exactly the same set of conﬁguration schemata as the
corresponding set of conﬁguration schemata in example 4.7.1 (the reader may see these
conﬁguration schemata pictured in that example). Suppose that the following mutation
transformations are the only ones chosen with positive probability:
M
cs(v
1
),v
1
, (+, −)
,M
cs(v
1
),v
1
, (∗, )
,M
cs(v
2
),v
2
, (ln, sin, cos)(tan, cot)
,
M
cs(v
2
),v
2
, (tan, cot)(sin, cos)
,M
cs(v
2
),v
2
, (ln, sin)
,M
cs(v
3
),v
3
, (+, , ∗)
,
M
cs(v
3
),v
3
, (−, ∗)
,M
cs(v
4
),v
4
, (x, w)(y,z)
,M
cs(v
4
),v
4
, (+, ∗)(−, )
,M
cs(v
5
),v
5
, (x, y, z)
,
M
cs(v
5
),v
5
, (+, −, ∗, )
,M
cs(v
6
),v
6
, (x, w)(y,z)
, and
M
cs(v
6
),v
6
, (x, w)
,M
cs(v
6
),v
6
, (sin, cos,tan,cot)
,M
cs(v
6
),v
6
, (+, −)
.
Here we represent permu tations in terms of their “disjoint cycle decompo sitions”: for
example, (ln, sin, cos)(tan, cot) rep resents the permutation on Σ
1
which sends ln into
sin, sin into cos and cos back into ln. Likewise, it sends tan into cot and cot back into
42 / 66
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
tan. If a cycle has length one (i.e. the element appearing in the cycle is a ﬁxed point of
the corresponding permutation) we omit that cycle from writing. For example, and ∗
are the ﬁxed points of the permutation (+, −). The corresponding permutation groups
are:
G(cs(v
1
),v
1
)=(+, −), (∗, ),
G(cs(v
2
),v
2
)=(ln, sin, cos)(tan, cot), (tan, cot)(sin, cos), (ln, sin),
G(cs(v
3
),v
3
)=(+, , ∗), (−, ∗),
G(cs(v
4
),v
4
)=(x, w)(y, z), (+, ∗)(−, ),
G(cs(v
5
),v
5
)=(x, y, z), (+, −, ∗, ), and
G(cs(v
6
),v
6
)=(x, w)(y, z), (x, w)(sin, cos, tan, cot), (+, − ).
The cycle decomposition makes it easy to compute the corresponding orbits:
O(cs(v
1
),v
1
, +) = {+, −}, O (cs(v
2
),v
2
, ln) = {ln, sin, cos},
O(cs(v
3
),v
3
, )={+, −, ∗, }, O (cs(v
4
),v
4
,x)={x, w},
O(cs(v
5
),v
5
,x)={x, y, z} and O(cs(v
6
),v
6
,y)={y, z} .
Now suppose the initial population is the same as in example 4.7.2. In ord er to apply
theorem 4.9.2, for every node v of u we need to compute the number os(v, o)(P ).
Recall that the schema os(v, o) is obtained from the schema os(v) by attaching the
operation o at the node v and labelling its children nodes b y the # signs. For the pop
ulation P in example 4.7.2 it was already computed that os(v
1
, +) = os(v
1
) =2.
There are exactly 2 individuals (namely x
3
and x
4
) ﬁtting the schema os(v
1
, ∗) and
so os(v
1
, ∗) =2. Exactly one individual, namely x
2
, and one individual, namely
x
6
, ﬁt the schemata os(v