Contract number: 507953
Workpackage 8: Population dynamics in the Evolutionary Environment
Deliverable 8.3
Report on Population dynamics in EvE
Project funded by the European Community under the “Information Society Technology” Programme
Contract number: 507953
Project acronym: DBE
Title: Digital Business Ecosystem
Deliverable No: D8.3
Due date: 05/2006
Delivery date: 07/2006
Short description: The final report on population dynamics. Note that this task has been brought forward by five months.
Author: UBHAM
Partners contributed: STU, LSE, HWU
Made available to: Public
Versioning
Version Date Author, Organisation
1.0 30/06/2006 J. E. Rowe, B. Mitavskiy, J. Woodward (UBHAM)
2.0 24/07/2006 J. E. Rowe, B. Mitavskiy, J. Woodward (UBHAM)
Quality check
1st internal reviewer: Gerard Briscoe (HWU)
2nd internal reviewer: Thomas Kurz (STU)
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. To view a copy of this license, visit: http://creativecommons.org/licenses/by-nc-sa/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Del 8.3 Report on Population dynamics in EvE
DBE Project (Contract n° 507953)
Contents
1 Introduction and relationship to the project
2 An evolutionary approach to the dynamic set cover problem
2.1 The dynamic set cover problem
2.2 The UMDA algorithm
2.3 Java code
3 Theoretical analysis of variable-length evolutionary systems
4 Some Results about the Markov Chains Associated to GPs and General EAs.
4.1 Introduction
4.2 Notation
4.3 How Does a Heuristic Search Algorithm Work?
4.4 The Markov Chain Associated to an Evolutionary Algorithm
4.5 A Special Kind of Reproduction Steps and the Extended Geiringer Theorem
4.6 Nonlinear Genetic Programming (GP) with Homologous Crossover
4.7 The Statement of the Schema-Based Version of Geiringer’s Theorem for Non-linear GP under Homologous Crossover
4.8 How Do We Obtain Theorem 4.7.1 from Theorem 4.5.2?
4.9 What Does Theorem 4.5.2 Tell Us in the Presence of Mutation for Non-linear GP?
4.10 What Can Be Said in the Presence of Selection in the General Case?
4.11 What are the relations for the case of fitness-proportional selection?
4.12 What Can Be Said when the Last Elementary Step is Mutation?
4.13 Conclusions
Executive Summary
This report is the final report from UBHAM on WP8 (sub-task S4, Population dynamics in the evolutionary environment), brought forward from month 36. It is a summary of work done at UBHAM during the period January–April 2006.
The report is in four main parts. The first part introduces the work and describes its relationship to the rest of the project. The second part outlines our work on extending evolutionary algorithms to cope with changing requirements (the dynamic set cover problem). The third part gives an overview of our continuing theoretical analysis of evolutionary systems with variable-length structures (including trees). The main theoretical results are presented in the fourth part, which is a technical paper to be published in the journal Theoretical Computer Science.
The main “customers” of this work within the DBE project have been STU and, indirectly, Intel and Sun. The applications of our research have been of some assistance in the creation of the EvE architecture and algorithms. Most of the work, however, will be of long-term interest, with an impact beyond the lifetime of the project.
Chapter 1
Introduction and relationship to the project
This deliverable is the final report for sub-task S4 (Population dynamics in the evolutionary environment), which is scheduled to run from month 19 to month 36, and builds on the work of S3, already reported [21], and the preliminary report from S4 [20]. The objectives of this sub-task are:
1. To investigate the effects of a changing environment on evolution. Such changes may come from external forces (e.g. changes in users’ requirements) or internally (e.g. from the migration of information).
2. To inform the development of the DBE Evolutionary Environment, by liaising with STU, Intel and Sun with regard to representations, operators and fitness definitions.
3. To study the population dynamics of variable-sized structures. This involves both empirical studies and a theoretical extension of existing work on limit theorems for variable-sized strings.
In our previous reports [21, 20] we described a formalisation of the problem faced by the EvE as a (weighted) set covering problem, and showed that an evolutionary approach could be effective on such a problem. We then planned to build on this work in the following ways:
1. Working with STU to help with their initial implementation of the EvE.
2. Extending the evolutionary approach to deal with changing requirements.
3. Empirical and theoretical studies of the extended algorithm.
4. Continuing collaboration with STU (see [18]) and HWU (see [1]) in the design of a more sophisticated EvE (including visits to STU).
Unfortunately, due to personnel problems, UBHAM has had to withdraw from the project early. We have made some contribution to STU’s development of the EvE, including collaborative visits. We have also begun work on designing the extension to deal with dynamic requests. This work is presented in chapter 2. However, the empirical and theoretical studies have not been completed, although the Java code has been handed over to STU. We are also unable to assist STU with their further development.
We have made more progress, however, in our theoretical studies of variable-length evolution. Since it is envisioned that the extended algorithm may have to deal with such structures to represent SBVR statements (e.g. in tree form), this is a potentially rich area of research [19]. We summarise our results in chapter 3. Chapter 4 contains a copy of our most recent publication on this subject (to appear in Theoretical Computer Science). It is highly technical, and is included for completeness, since this is our final deliverable.
Chapter 2
An evolutionary approach to the dynamic set cover problem
2.1 The dynamic set cover problem
We briefly recall the set cover problem as a model for the DBE EvE. We begin by assuming a large collection of features. These are the individual things that a consumer might want, and we assume they are atomic. A request from a user is (in its simplest form) a list of desired features. Service providers make services available to users. A service comprises, amongst other things, a description of all the features which it can provide. So, from an abstract perspective, we think of a service as a subset of features. The problem faced by the EvE is to find a collection of services that, taken as a whole, provides all the features requested. We also want this to be done as cheaply as possible: we assume each service has an associated cost, and the user would like the cheapest collection of services that meets their request.
This problem can be solved effectively using an evolutionary approach, as we have seen in our previous reports. What we now consider is the dynamic set cover problem. Over time, a user may produce a sequence of different requests. Similarly, a set of users with a common (or similar) local service pool may make different, but related, requests. If the different requests were all unrelated to each other, then the most efficient thing to do would be to re-run the evolutionary algorithm from scratch on a random population. However, it is assumed that requests from related users will be somehow similar to each other. For example, there may be some features, or feature combinations, which frequently occur in requests from users of a certain type. We want to exploit this commonality to make the evolutionary algorithm more efficient.
One way to approach this problem is to make sure that the initial population is biased towards services which occur frequently in response to user requests. We propose a probability vector $v$ which assigns a probability to each service in the genepool. When a new request arrives, we generate the new initial population as follows. Suppose $N$ is the population size and there are $S$ services in the genepool. Then, to create a new member of the population, we create a binary string of length $S$,
where the probability of setting the $k$-th bit to 1 is $v_k$. We repeat this $N$ times to fill up the population. Each string is interpreted as a subset of services, where a 1 in position $k$ indicates that service $k$ is included.
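The biased initialisation just described can be sketched in Java as follows. This is an illustrative sketch only: the class and method names are our own and are not taken from the code handed over to STU.

```java
import java.util.Random;

// Sketch of the biased initialisation described above: bit k of each new
// member is set to 1 with probability v[k]. Names are illustrative, not
// taken from the project's handed-over code.
public class BiasedInit {

    /** Sample a population of n binary strings of length S = v.length. */
    public static boolean[][] initialPopulation(double[] v, int n, Random rng) {
        boolean[][] population = new boolean[n][v.length];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < v.length; k++) {
                // true means service k is included in this member's subset
                population[i][k] = rng.nextDouble() < v[k];
            }
        }
        return population;
    }
}
```

Each row of the returned matrix is one member of the population, read as a subset of the $S$ services.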
When a new habitat is first created, it is given a genepool. It must also then get a vector $v$ to help it construct populations. It could take this vector from a neighbouring habitat that corresponds to a similar user type. Or, if there is no suitable neighbour, it could start off by setting each $v_k$ to a small probability such as $1/N$.
After a request comes from the user, the population is initialised using $v$. The evolutionary algorithm (see next section) is then run to produce a solution. That solution will be a collection of services that meets the request as cheaply as possible. We now want to update $v$ so that the services that appeared in the solution become more likely to appear in the future. One scheme is to set
$$v_k \leftarrow \begin{cases} (1-\alpha)v_k + \alpha & \text{if } k \text{ is in the solution} \\ \alpha v_k & \text{if } k \text{ is not in the solution} \end{cases}$$
where $0 < \alpha < 1$ is a learning rate. When the next request comes in, the new version of $v$ is used to create the initial population, which is now biased towards the more frequently used services. This form of selection is related to the population “thinning” algorithm proposed in [18], which acts as a pre-selection phase in the EvE. We propose that such a method be included in future versions.
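The update scheme above can be sketched as a few lines of Java (again, names are illustrative, not taken from the project code):

```java
// Sketch of the probability-vector update described above. After each
// solved request, services in the solution are reinforced and all other
// services decay. Names are illustrative.
public class VectorUpdate {

    /** Update v in place with learning rate alpha, 0 < alpha < 1. */
    public static void update(double[] v, boolean[] solution, double alpha) {
        for (int k = 0; k < v.length; k++) {
            if (solution[k]) {
                v[k] = (1 - alpha) * v[k] + alpha; // k is in the solution
            } else {
                v[k] = alpha * v[k];               // k is not in the solution
            }
        }
    }
}
```

For example, with $\alpha = 0.1$ a service with $v_k = 0.5$ moves to $0.55$ if it appears in the solution and drops to $0.05$ otherwise.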
2.2 The UMDA algorithm
The scheme described in the previous section incrementally modifies the frequencies of services in the initial population. We thought it might be appropriate, then, to investigate the use of an evolutionary algorithm which also operates on the frequencies of the services. The UMDA (‘Univariate Marginal Distribution Algorithm’) is such an evolutionary algorithm. It was invented by Heinz Mühlenbein [7] and has been well analysed from a theoretical perspective [6, 23]. The algorithm follows three phases: selection, generation, mutation.

Selection: A subset of individuals from the current population is selected depending on their fitness.

Generation: For each service $k$, denote by $p_k$ the frequency with which it appears in the selected population. Generate a new population using the probability vector $p$.

Mutation: Mutate each bit of each string in the population with probability $\mu$.

There are several ways in which to perform selection. One may use the standard proportional or tournament selection schemes to select a number of individuals. Or one may select a certain fraction of the best individuals (e.g. take the best half of the population). The generation stage proceeds exactly as with the initial population, only now we use the vector $p$. The mutation rate is typically set to $1/S$, so that, on average, one service is included in or deleted from a solution.
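The three phases can be sketched in Java as follows, here using truncation selection of the best half. This is an illustrative sketch; the handed-over implementation (pop.java) may be organised differently.

```java
import java.util.Random;

// Sketch of the three UMDA phases described above. Illustrative code
// only; names are not taken from the project's handed-over code.
public class UmdaSketch {

    /** Marginal frequency p[k] of service k among the selected individuals. */
    public static double[] marginals(boolean[][] selected) {
        double[] p = new double[selected[0].length];
        for (boolean[] individual : selected) {
            for (int k = 0; k < p.length; k++) {
                if (individual[k]) p[k] += 1.0;
            }
        }
        for (int k = 0; k < p.length; k++) p[k] /= selected.length;
        return p;
    }

    /** Generation phase: sample a fresh population from the vector p. */
    public static boolean[][] generate(double[] p, int size, Random rng) {
        boolean[][] next = new boolean[size][p.length];
        for (int i = 0; i < size; i++) {
            for (int k = 0; k < p.length; k++) {
                next[i][k] = rng.nextDouble() < p[k];
            }
        }
        return next;
    }

    /** Mutation phase: flip each bit independently with probability mu. */
    public static void mutate(boolean[][] population, double mu, Random rng) {
        for (boolean[] individual : population) {
            for (int k = 0; k < individual.length; k++) {
                if (rng.nextDouble() < mu) individual[k] = !individual[k];
            }
        }
    }
}
```

A generation then consists of selecting the best half, computing `marginals`, calling `generate`, and applying `mutate` with `mu = 1.0 / S`.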
As a simple example, suppose that our population is as follows:
0 0 1 1 0 0 1 fitness = 100
1 0 1 1 0 0 0 fitness = 90
1 0 1 0 0 0 1 fitness = 75
0 1 0 1 0 0 1 fitness = 50
1 0 0 0 0 1 0 fitness = 30
0 1 1 1 0 0 1 fitness = 25
That is, the first string indicates a subset containing the third, fourth and seventh services, and so on. The population has been ranked in order of fitness. We now select the best half of the population, namely the first three strings. For each service (bit position) we consider the frequency with which it appears in these top three strings. For example, the first service occurs in the second and third strings, but not the first string. Its frequency is therefore 2/3. The frequencies for each service are:

service    1    2    3    4    5    6    7
frequency  2/3  0    1    2/3  0    0    2/3

We now generate a new population by sampling the services according to these probabilities. That is, every time we generate a new string, the first bit is set to a one with probability 2/3, and so on. So we might get:
1011001
1010001
1011000
0011001
0011001
1010000
We then apply mutation. That is, each bit has a probability $\mu$ of changing. The value of $\mu$ is usually set to $1/S$, which in this case is $1/7$, so that, on average, one bit is changed per string. The resulting population might be:
1001001
1110001
1011000
0011101
0011011
1010110
The fitnesses are then calculated, the population is ranked accordingly, and the next generation begins.
2.3 Java code
Java code implementing the UMDA algorithm can be found at
http://www.cs.bham.ac.uk/~jer/scp.tar.
After unpacking the tar file, the program must be compiled using
javac pop.java
which can then be run. Various parameters can be adjusted in the file parameters.java.
This code is preliminary and for experimental purposes only.
Chapter 3
Theoretical analysis of variable-length evolutionary systems
We have previously reported several theoretical results in the analysis of evolutionary algorithms. These were generally in two areas:
Results relating to variable-length structures (in the infinite population limit)
Results relating to finite populations of strings.
As an example, we proposed and proved a generalisation of the classic Geiringer’s theorem for finite populations. Simply put, this states that the repeated action of crossover destroys the correlations between elements of solutions, so that a population tends to a state of “linkage equilibrium”. It is worth pointing out that the generation stage of the UMDA algorithm takes the population in a single step to this state, as it explicitly de-couples the linkage between services in a solution. UMDA can therefore be seen (at least theoretically) as applying repeated strong crossover at each generation.
In our current work, we have continued to look at these themes. In the following
chapter, we present a technical paper containing our results, which we summarise here.
Firstly, we return again to Geiringer’s theorem [4], but now considering the evolution of individuals of variable size and shape, namely trees. This therefore applies to standard Genetic Programming (in which we evolve programs in the form of trees). It also potentially applies to the DBE EvE if, as is envisaged, the types of requests and service combinations allowed become more sophisticated than simple lists and subsets. For example, a hierarchical workflow pattern could be represented as a tree structure. We establish, for tree structures, the kinds of shapes (or schemata, as they are technically known) which play a fundamental role in the analogue of Geiringer’s theorem, and show how crossover again has the effect of de-coupling the correlations that exist in the population.
On the issue of bloat (that is, uncontrolled growth of structures during evolution), there are a number of possible counter-measures that can be employed. The simplest and strictest is to disallow items larger than a certain threshold. Less severe, but still effective, is to have a certain probability that items larger than the threshold will be killed off. Such a method (called the Tarpeian method) has been proposed and theoretically motivated by Riccardo Poli [10]. An even less strict method is to have a penalty function that penalises large structures, but this can be difficult to design correctly. See [8] for a survey of methods.
Secondly, we continue to look at general evolutionary algorithms from the Markov chain perspective, and begin to build a framework in which bounds on the stationary distribution can be proved. This distribution gives the probability that a given population (or a population with any desired property) will be found by the evolutionary algorithm in the long run. The framework relies on the construction of a kind of order relation (technically, a pre-order) between populations. Once such an order has been appropriately constructed, one can use it to determine whether one population is more likely than another to appear in the stationary distribution.
A simple application of this framework establishes, for example, that if we have an evolutionary algorithm which goes through the phases mutation, selection, crossover (in that order), then the stationary distribution cannot be uniform. There are always some populations which are preferred to others. This is true even if all individuals have the same fitness! We ascribe this to the implicit biases induced by crossover, which tends to “prefer” some types of population over others.
One of the practical consequences of this observation is that one has to be rather careful, when designing crossover operators, to be aware that such biases are being introduced. For example, it is known that certain crossovers for variable-length string structures have biases towards strings of a certain length. That is, such strings will tend to be “over-sampled” in the long run.
Although we have produced a considerable amount of theoretical work in this period, it is true to say that at the moment it is still largely of only theoretical interest. We include our paper in the next chapter for completeness, rather than expecting it to be of direct value to the project. We had hoped to use the remaining time of the project to apply such results to the design of appropriate crossover operators for structured services within the EvE but, with our withdrawal from the project, this must be left to others, or to some future work.
Chapter 4
Some Results about the Markov Chains Associated to GPs and General EAs.
This chapter is a paper accepted for publication in Theoretical Computer Science.
4.1 Introduction
Geiringer’s classical theorem (see [4]) is an important part of GA theory. It has been cited in a number of papers: see, for instance, [12], [13], [17] and [22]. It deals with the limit of the sequence of population vectors obtained by repeatedly applying the crossover operator
$$C(p)_k = \sum_{i,j} p_i p_j \, r_{(i,j) \to k}$$
where $r_{(i,j) \to k}$ denotes the probability of obtaining the individual $k$ from the parents $i$ and $j$ after crossover. In other words, it speaks to the limit of repeated crossover in the case of an infinite population. In [5], a new version of this result was proved for finite populations, addressing the limiting distribution of the associated Markov chain, as follows. Let $\Omega = \prod_{i=1}^{n} A_i$ denote the search space of a given genetic algorithm (intuitively, $A_i$ is the set of alleles corresponding to the $i$-th gene and $n$ is the chromosome length). Fix a population $P$ consisting of $m$ individuals, with $m$ an even number. $P$ can be thought of as an $m \times n$ matrix whose rows are the individuals of the population $P$. Write
$$P = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.$$
Notice that the elements of the $i$-th column of $P$ are members of $A_i$. Continuing with the notation used in [12], denote by $\Phi(h, P, i)$, where $h \in A_i$, the proportion of rows, say $j$, of $P$ for which $a_{ji} = h$. In other words, let $R_h = \{\, j \mid 1 \le j \le m \text{ and } a_{ji} = h \,\}$.
Now simply let $\Phi(h, P, i) = |R_h| / m$. The classical Geiringer theorem (see [4] or [12] for modern notation) says that if one starts with a population $P$ of individuals and runs a genetic algorithm (GA) in the absence of selection and mutation (crossover being the only operator involved) then, in the “long run”, the frequency of occurrence of the individual $(h_1, h_2, \ldots, h_n)$ before time $t$, call it $\Phi(h_1, h_2, \ldots, h_n, t)$, approaches independence:
$$\lim_{t \to \infty} \Phi(h_1, h_2, \ldots, h_n, t) = \prod_{i=1}^{n} \Phi(h_i, P, i).$$
Thereby, Geiringer’s theorem tells us something about the limiting frequency with which certain elements of the search space are sampled in the long run, provided one uses crossover alone. In [12] this theorem has been generalized to cover the cases of variable-length GAs and homologous linear genetic programming (GP) crossover. The limiting distributions of the frequency of occurrence of individuals belonging to a certain schema under these algorithms have been computed. The special conditions under which such a limiting distribution exists for linear GP under homologous crossover have been established (see theorem 9 and section 4.2.1 of [12]). In [5] a rather powerful extension of the finite population version of Geiringer’s theorem has been established.
In the current paper we shall use the recipe described in [5] to derive a version of Geiringer’s theorem for nonlinear GP with homologous crossover (see section 4.6 or [9] for a detailed description of how nonlinear GP with homologous crossover works) which is based on Poli hyperschemata (see section 4.6 or [9]). The first step in this procedure is to describe the search space and the appropriate family of reproduction transformations so that the resulting GP algorithm is bijective and self-transient in the sense of definition 5.2 of [5]. Then the generalized Geiringer theorem (theorem 5.2 of [5]) as well as corollaries 6.1 and 6.2 of [5] apply. The necessary details are summarized in the next few sections. A schema-based version of Geiringer’s theorem for nonlinear GP applies even in the presence of “node-mutation” (see section 4.9).
The finite population Geiringer theorem established in [5] may completely describe the stationary distribution of the Markov chain associated to an evolutionary algorithm only in the absence of selection. In section 4.10 we introduce a pre-order relation on the states of a Markov chain associated to an evolutionary algorithm which is defined in terms of selection alone, and establish some general inequalities about the stationary distribution of this Markov chain when selection is the “last stage” in the cycle. In section 4.12 we demonstrate that the stationary distribution of the Markov chain associated to most evolutionary algorithms in the presence of selection can never be uniform when the mutation rate is small enough, even if the fitness function is constant.
The material in sections 4.10, 4.11 and 4.12 is independent of the results in sections 4.5–4.9. Thus, the reader has the option of jumping to section 4.10 right after section 4.4.
4.2 Notation
$\Omega$ is a finite set, called a search space.
$f : \Omega \to (0, \infty)$ is a function, called a fitness function. The goal is to find a maximum of the function $f$.
$\mathcal{F}_q$ is a collection of $q$-ary operations on $\Omega$. Intuitively, $\mathcal{F}_q$ can be thought of as the collection of reproduction operators: some $q$ parents produce one offspring. In nature often $q = 2$, for every child has two parents, but in the artificial setting there seems to be no special reason to assume that every child has no more than two parents. When $q = 1$, the family $\mathcal{F}_1$ can be thought of as asexual reproductions or mutations. The following definitions will be used in section 4.3 to describe the general evolutionary search algorithm. This approach makes it easy to state the Geiringer Theorem.
Definition 4.2.1 A population $P$ of size $m$ is simply an element of $\Omega^m$. (Intuitively, it is convenient to think of a population as a “column vector”.)
Remark 4.2.1 There are two primary methods for representing populations: multi-sets and ordered multi-sets. Each has advantages, depending upon the particular analytical goals. Lothar Schmitt has published a number of papers which use the ordered multi-set representation to advantage (see, for instance, [15] and [16]). According to definition 4.2.1, in the current paper we continue the development of analysis based upon the representation pioneered by Lothar Schmitt. The following example illustrates an aspect of the representation which the reader would do well to keep in mind:

Example 4.2.1 Let $\Omega = \{0, 1\}^3$. Consider the populations
$$\begin{pmatrix} 000 \\ 111 \\ 111 \end{pmatrix}, \quad \begin{pmatrix} 111 \\ 000 \\ 111 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 111 \\ 111 \\ 000 \end{pmatrix}.$$
According to definition 4.2.1 (the ordered multi-set model which is exploited in the current paper) these are distinct populations, despite the fact that they represent the same population under the multi-set model.
An elementary step is a probabilistic rule which takes one population as an input and produces another population of the same size as an output. For example, the following elementary step corresponds to the fitness-proportional selection which has been studied in detail by Wright and Fisher (see [24] and [3]).
Definition 4.2.2 An elementary step of type 1 (alternatively, of type selection) takes a given population $P = (x_1, x_2, \ldots, x_m)^T$ with $x_i \in \Omega$ as an input. The individuals of $P$ are evaluated: $f(x_1), f(x_2), \ldots, f(x_m)$.
A new population $P' = (y_1, y_2, \ldots, y_m)^T$ is obtained, where the $y_i$ are chosen independently $m$ times from the individuals of $P$, and $y_i = x_j$ with probability
$$\frac{f(x_j)}{\sum_{l=1}^{m} f(x_l)}.$$
In other words, all of the individuals of $P'$ are among those of $P$, and the expectation of the number of occurrences of any individual of $P$ in $P'$ is proportional to the number of occurrences of that individual in $P$ times the individual’s fitness value. In particular, the fitter the individual is, the more copies of that individual are likely to be present in $P'$. On the other hand, individuals having relatively small fitness values are not likely to enter into $P'$ at all. This is designed to imitate the natural survival-of-the-fittest principle. Population $P'$ is the output of this elementary step.
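For concreteness, this elementary step of type selection can be sketched as a roulette-wheel procedure. The code below is our own illustrative sketch, not part of the published paper: it returns, for each of the $m$ slots of $P'$, the index $j$ such that $y_i = x_j$.

```java
import java.util.Random;

// Sketch of an elementary step of type 1 (Definition 4.2.2): each of the
// m slots of the new population is filled independently by drawing index
// j with probability f(x_j) / sum_l f(x_l). Illustrative code only.
public class ProportionalSelection {

    /** Return the indices j chosen for y_1, ..., y_m. */
    public static int[] select(double[] fitness, Random rng) {
        int m = fitness.length;
        double total = 0.0;
        for (double f : fitness) total += f; // f is positive on Omega
        int[] chosen = new int[m];
        for (int i = 0; i < m; i++) {
            double r = rng.nextDouble() * total; // spin the roulette wheel
            double acc = fitness[0];
            int j = 0;
            while (acc < r && j < m - 1) {
                j++;
                acc += fitness[j];
            }
            chosen[i] = j;
        }
        return chosen;
    }
}
```

Since the $m$ draws are independent, fitter individuals are expected to occupy proportionally more slots of $P'$, exactly as described above.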
In order to define an elementary step of type 2 (reproduction) in a general setting which uses the ordered multi-set representation (see remark 4.2.1 and example 4.2.1), one needs to introduce the following definitions:
Definition 4.2.3 Fix an ordered $k$-tuple of integers $q = (q_1, q_2, \ldots, q_k)$. Let $K$ denote a partition of the set $\{1, 2, \ldots, m\}$ for some $m \in \mathbb{N}$. We say that the partition $K$ is $q$-fit if every element of $K$ consists of exactly $q_i$ elements for some $i$. In logical symbols this means that if $K = \{P_1, P_2, \ldots, P_l\}$ then $K$ is $q$-fit $\iff \forall\, 1 \le j \le l\ \exists\, 1 \le i \le k$ such that $|P_j| = q_i$. Denote by $\mathcal{E}^m_q$ the family of all $q$-fit partitions of $\{1, 2, \ldots, m\}$ (i.e. $\mathcal{E}^m_q = \{K \mid K \text{ is a } q\text{-fit partition of } \{1, 2, \ldots, m\}\}$).
Definition 4.2.4 Let $\Omega$ be a set, let $\mathcal{F}_{q_1}, \mathcal{F}_{q_2}, \ldots, \mathcal{F}_{q_k}$ be some fixed families of $q_j$-ary operations on $\Omega$ ($\mathcal{F}_{q_j}$ is simply a family of functions from $\Omega^{q_j}$ into $\Omega$), and let $p_1, p_2, \ldots, p_k$ be probability distributions on $(\mathcal{F}_{q_1})^{q_1}, (\mathcal{F}_{q_2})^{q_2}, \ldots, (\mathcal{F}_{q_k})^{q_k}$ respectively. Let $q = (q_1, q_2, \ldots, q_k)$. Finally, let $\wp_m$ be a probability distribution on the collection $\mathcal{E}^m_q$ of partitions of $\{1, 2, \ldots, m\}$ (see definition 4.2.3 above). We then say that the ordered $2(k+1)$-tuple $(\Omega, \mathcal{F}_{q_1}, \mathcal{F}_{q_2}, \ldots, \mathcal{F}_{q_k}, p_1, p_2, \ldots, p_k, \wp_m)$ is a reproduction $k$-tuple of arity $(q_1, q_2, \ldots, q_k)$.
The following definition of reproduction covers both crossover and mutation. Definition 4.2.6 (see also remark 4.2.2) will make it possible to combine different reproduction operators in a simple and natural way.
Definition 4.2.5 An elementary step of type 2 (alternatively, of type reproduction) associated to a given reproduction $k$-tuple $(\Omega, \mathcal{F}_{q_1}, \mathcal{F}_{q_2}, \ldots, \mathcal{F}_{q_k}, p_1, p_2, \ldots, p_k, \wp_m)$ takes a given population $P = (x_1, x_2, \ldots, x_m)^T$ with $x_i \in \Omega$ as an input.
The individuals of $P$ are partitioned into pairwise disjoint tuples for mating according to the probability distribution $\wp_m$. For instance, if the partition selected according to $\wp_m$ is $K = \{(i^1_1, i^1_2, \ldots, i^1_{q_1}),\ (i^2_1, i^2_2, \ldots, i^2_{q_2}),\ \ldots,\ (i^j_1, i^j_2, \ldots, i^j_{q_j}),\ \ldots\}$, the corresponding tuples are
$$Q_1 = \begin{pmatrix} x_{i^1_1} \\ x_{i^1_2} \\ \vdots \\ x_{i^1_{q_1}} \end{pmatrix}, \quad Q_2 = \begin{pmatrix} x_{i^2_1} \\ x_{i^2_2} \\ \vdots \\ x_{i^2_{q_2}} \end{pmatrix}, \quad \ldots, \quad Q_j = \begin{pmatrix} x_{i^j_1} \\ x_{i^j_2} \\ \vdots \\ x_{i^j_{q_j}} \end{pmatrix}, \quad \ldots$$
Having selected the partition, replace every one of the selected $q_j$-tuples $Q_j$ with the $q_j$-tuple
$$Q'_j = \begin{pmatrix} T_1(x_{i^j_1}, x_{i^j_2}, \ldots, x_{i^j_{q_j}}) \\ T_2(x_{i^j_1}, x_{i^j_2}, \ldots, x_{i^j_{q_j}}) \\ \vdots \\ T_{q_j}(x_{i^j_1}, x_{i^j_2}, \ldots, x_{i^j_{q_j}}) \end{pmatrix}$$
for a $q_j$-tuple of transformations $(T_1, T_2, \ldots, T_{q_j}) \in (\mathcal{F}_{q_j})^{q_j}$ selected randomly according to the probability distribution $p_j$ on $(\mathcal{F}_{q_j})^{q_j}$. This gives us a new population $P' = (y_1, y_2, \ldots, y_m)^T$, which serves as the output of this elementary step.
Notice that a single child does not have to be produced by exactly two parents. It is
possible that a child has more than two parents. Asexual reproduction (mutation) is
also allowed.
Definition 4.2.6 A cycle is a finite sequence of elementary steps, say $\{s_n\}_{n=1}^{j}$, which are either of type 1 or of type 2, and such that all of the steps in the sequence $\{s_n\}_{n=1}^{j}$ have the same underlying search space and the same arity of input/output.
Remark 4.2.2 Intuitively, these steps are linked together in such a way that the output of the step $s_i$ is the input of the step $s_{i+1}$. This is why all of the steps in the same cycle must have the same underlying search space and the same arity of input/output (otherwise the input/output relationship does not make sense).
We are finally ready to describe a rather wide class of evolutionary heuristic search algorithms.
4.3 How Does a Heuristic Search Algorithm Work?
A general evolutionary search algorithm works as follows. Fix a cycle, say $C = \{s_n\}_{n=1}^{j}$ (see definition 4.2.6). Now start the algorithm with an initial population $P = (x_1, x_2, \ldots, x_m)^T$. The initial population $P$ may be selected completely randomly, or it may be predetermined, depending on the circumstances. The actual method of selecting the initial population $P$ is irrelevant for the purposes of the current paper. To run the algorithm with cycle $C = \{s_n\}$, simply input $P$ into $s_1$, run $s_1$, then input the output of $s_1$ into $s_2$, ..., input the output of $s_{j-1}$ into $s_j$ and produce the new output, say $P'$. Now use $P'$ as an initial population and run the cycle $C$ again. Continue this loop finitely many times, depending on the circumstances.
Definition 4.3.1 A sub-algorithm of a given evolutionary search algorithm defined by a cycle $C = \{s_n\}_{n=1}^{j}$ is simply an evolutionary search algorithm defined by a subsequence $\{s_{n_q}\}_{q=1}^{l}$ of the sequence $C$ of elementary steps.
A recombination sub-algorithm is a sub-algorithm defined by a sequence of elementary steps of type 2 (reproduction) only.
4.4 The Markov Chain Associated to an Evolutionary Algorithm
In [22] it has been pointed out that heuristic search algorithms give rise to the following Markov process¹ (see also [2], for instance): the state space of this Markov process is the set of all populations of a fixed size $m$. This set, in our notation, is simply $\Omega^m$. The transition probability $p_{xy}$ is simply the probability that the population $y \in \Omega^m$ is obtained from the population $x$ by going through the cycle once (where the notion of a cycle is described in section 4.3: see definition 4.2.6 and remark 4.2.2). The aim of the current paper is to establish a few rather general properties of this Markov chain. In case there are several algorithms present in our discussion, we shall write $\{p^A_{xy}\}_{x,y \in \Omega^m}$ to denote the Markov transition matrix associated to the algorithm $A$, while $\{p^B_{xy}\}_{x,y \in \Omega^m}$ denotes the Markov transition matrix associated to the algorithm $B$.
Definition 4.4.1 Fix an evolutionary search algorithm $A$. Denote by $p^n_{x,y}$ the probability that a population $y$ is obtained from the population $x$ upon the completion of $n$ complete cycles (in the sense of definition 4.2.6 and remark 4.2.2) of the algorithm. We say that a population $x$ leads to a population $y$ under $A$ if and only if $p^n_{x,y} > 0$ for some $n$. We also write $x \xrightarrow{A} y$ as shorthand notation for "$x$ leads to $y$". (This terminology is adopted from [2].)
¹ In the current paper the state space of this process is slightly modified for technical reasons which will be seen later.
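The "leads to" relation of definition 4.4.1 can be computed mechanically from any transition matrix by taking the transitive closure of the positive-probability transitions. A minimal sketch in Python; the 3-state chain below is purely illustrative and not taken from the paper:

```python
from collections import deque

def leads_to(p, x):
    """Return the set of states y with p^n_{x,y} > 0 for some n >= 1,
    i.e. all populations that x leads to under the algorithm."""
    reachable, frontier = set(), deque([x])
    while frontier:
        s = frontier.popleft()
        for y, prob in enumerate(p[s]):
            if prob > 0 and y not in reachable:
                reachable.add(y)
                frontier.append(y)
    return reachable

# Illustrative 3-state chain: state 2 is absorbing.
p = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]

assert leads_to(p, 0) == {0, 1, 2}
assert leads_to(p, 2) == {2}
```

Note that this relation is always transitive, but, as the absorbing state shows, it need not be symmetric; bijectivity and self-transience (definition 4.5.3 below) are exactly what force it to be an equivalence relation.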
4.5 A Special Kind of Reproduction Steps and the Extended Geiringer Theorem
To understand the intuitive meaning of the definition below, see sections 4.2 and 4.3.
Definition 4.5.1 Given a set $\Omega$ and a family of transformations $\mathcal{F}_q$ from $\Omega^q$ into $\Omega$, fix a $q$-tuple of transformations $(T_1, T_2, \ldots, T_q) \in (\mathcal{F}_q)^q$. Now consider the transformation $\langle T_1, T_2, \ldots, T_q \rangle : \Omega^q \to \Omega^q$ sending any given element
$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_q \end{pmatrix} \in \Omega^q \quad \text{into} \quad \begin{pmatrix} T_1(x_1, x_2, \ldots, x_q) \\ T_2(x_1, x_2, \ldots, x_q) \\ \vdots \\ T_q(x_1, x_2, \ldots, x_q) \end{pmatrix} \in \Omega^q.$$
We say that the transformation $\langle T_1, T_2, \ldots, T_q \rangle$ is the tupling of the ordered $q$-tuple $(T_1, T_2, \ldots, T_q)$.
Definition 4.5.2 Given an elementary step of type 2 (reproduction) associated to the reproduction $k$-tuple $\Omega = (\Omega, \mathcal{F}_{q_1}, \mathcal{F}_{q_2}, \ldots, \mathcal{F}_{q_k}, p_1, p_2, \ldots, p_k, \wp_m)$, fix some index $i$ with $1 \le i \le k$ and denote by
$$\mathcal{G}(\Omega, q_i) = \{\langle T_1, T_2, \ldots, T_{q_i}\rangle : \Omega^{q_i} \to \Omega^{q_i} \mid T_j \in \mathcal{F}_{q_i},\; p_i(T_1, T_2, \ldots, T_{q_i}) > 0\}$$
the family of all tuplings which have a positive probability of being selected.
Remark 4.5.1 The family of tupling transformations $\mathcal{G}(\Omega, q_i)$ described in definition 4.5.2 represents the family of $q$-parents, $q$-children crossover transformations, while the family $\mathcal{F}_q$ represents the family of $q$-parents, one-child crossovers. Depending on the circumstances it may be more convenient to specify the family of $q$-parents, $q$-children crossover transformations directly rather than specifying the families $\mathcal{F}_q$ individually. We shall see an example of this situation in section 4.6. The family $\mathcal{F}_q$ of $q$-parents, one-child crossovers can then be recovered from the family of $q$-parents, $q$-children crossover transformations by using coordinate projections.
As mentioned in section 4.2, in nature the arity of the reproduction transformations is often 2, meaning that every child has 2 parents.
It turns out that quite a few evolutionary algorithms, including the classical genetic algorithm and nonlinear (as well as linear) genetic programming, are equipped with reproduction steps having the following nice property, which has been introduced and investigated in [5].
Definition 4.5.3 A given elementary step of type 2 (reproduction) associated to the reproduction $k$-tuple $(\Omega, \mathcal{F}_{q_1}, \mathcal{F}_{q_2}, \ldots, \mathcal{F}_{q_k}, p_1, p_2, \ldots, p_k, \wp_m)$ is said to be bijective (and self-transient) if it satisfies condition 1 (and condition 2) stated below:
1. $\forall\, 1 \le i \le k$ we have $p_i(T_1, T_2, \ldots, T_{q_i}) > 0 \implies \langle T_1, T_2, \ldots, T_{q_i}\rangle$ (see definition 4.5.1 for the meaning of $\langle T_1, T_2, \ldots, T_{q_i}\rangle$) is a bijection (a one-to-one and onto map of $\Omega^{q_i}$ onto itself).
2. $\forall\, 1 \le i \le k\; \exists\, (T_1, T_2, \ldots, T_{q_i}) \in (\mathcal{F}_{q_i})^{q_i}$ such that $p_i(T_1, T_2, \ldots, T_{q_i}) > 0$ and $\langle T_1, T_2, \ldots, T_{q_i}\rangle = 1$, where $1 : \Omega^{q_i} \to \Omega^{q_i}$ denotes the identity map (i.e. $\forall x \in \Omega^{q_i}$ we have $\langle T_1, T_2, \ldots, T_{q_i}\rangle(x) = x$). We say that a recombination sub-algorithm (see definition 4.3.1) of a given evolutionary search algorithm is bijective (and self-transient) if every term $s_{n_k}$ of the subsequence by which the sub-algorithm is defined is bijective (and self-transient).
Remark 4.5.2 Notice that conditions 1 and 2 of definition 4.5.3 can be restated in terms of the family $\mathcal{G}(\Omega, q_i)$ as follows:
1. Every transformation in the family of tuplings $\mathcal{G}(\Omega, q_i)$ is a bijection.
2. $1 \in \mathcal{G}(\Omega, q_i)$, where $1 : \Omega^{q_i} \to \Omega^{q_i}$ denotes the identity map.
In [5] the following nice facts have been established:
Proposition 4.5.1 Let $A$ denote a bijective and self-transient algorithm (see definition 4.5.3). Then $\xrightarrow{A}$ is an equivalence relation.
Proposition 4.5.1 motivates the following definition:
Definition 4.5.4 Given a bijective and self-transient algorithm $A$ and a population $P \in \Omega^m$, denote by $[P]_A$ the equivalence class of the population $P$ under the equivalence relation $\xrightarrow{A}$.
To alleviate the level of abstraction we illustrate proposition 4.5.1 and definition 4.5.4 with a couple of examples.
Example 4.5.1 Consider a binary genetic algorithm over the search space $\Omega = \{0,1\}^n$ under the action of crossover alone. Let the population size be some even number $m$. Consider the following family of masked crossover transformations: $\mathcal{F} = \{F_M \mid M \subseteq \{1, 2, \ldots, n\}\}$, where each $F_M$ is a binary operation (i.e. a function from $\Omega^2$ into $\Omega$) defined as follows: for every $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n) \in \{0,1\}^n$, $F_M(a, b) = x = (x_1, x_2, \ldots, x_n) \in \{0,1\}^n$ where
$$x_i = \begin{cases} a_i & \text{if } i \in M, \\ b_i & \text{otherwise.} \end{cases}$$
Let $A$ denote the evolutionary algorithm determined by a single elementary step of type 2 (crossover) which is associated to the reproduction $\frac{m}{2}$-tuple
$$\Omega = (\Omega, \mathcal{F}, \mathcal{F}, \ldots, \mathcal{F}, p_1, p_2, \ldots, p_{m/2}, \wp_m)$$
(see definitions 4.2.4 and 4.2.5), where the probability distributions $p_i$ have the property that $p_i(F_M, F_K) \neq 0$ only if $K = M^c$ (here $M^c$ denotes the complement of $M$ in $\{1, 2, \ldots, n\}$). This assumption on the distributions $p_i$ ensures that the elementary step of crossover associated to the reproduction $\frac{m}{2}$-tuple $\Omega$ is bijective. Depending on the further properties of the distributions $p_i$ and the distribution $\wp_m$, different types of equivalence relations $\xrightarrow{A}$ would be induced. Typically, in the case of classical GA crossover, the distributions $p_i$ are all identical (i.e. $p_1 = p_2 = \ldots = p_{m/2} = p$) where
$p$ is the uniform distribution over the masks $M \subseteq \{1, 2, \ldots, n\}$, and the distribution $\wp_m$ is uniform over all partitions of $\{1, 2, \ldots, m\}$ into 2-element subsets. In such a case the equivalence relation $\xrightarrow{A}$ is determined by the numbers of 0s in the columns (or, equivalently, by the numbers of 1s in the columns). The reason this is so is that a population $Q$ can be reached from a population $P$ by performing a sequence of crossover elementary steps only if it has the same amount of "genetic material" in every column, since alleles are neither lost nor created during homologous crossover. Using the fact that every permutation can be obtained by performing enough transpositions, one can show the converse of this fact. This fact is a particular case of lemma 47 of [5]. For instance, if $n = 5$ and $m = 4$ we have
$$\begin{pmatrix} 01011 \\ 01010 \\ 10101 \\ 00101 \end{pmatrix} \xrightarrow{A} \begin{pmatrix} 01001 \\ 10101 \\ 00010 \\ 01111 \end{pmatrix}.$$
Indeed, the number of 0s in both populations in the first column is 3, in the 2nd, 3rd and 4th columns it is 2, and in the last column it is 1. Thus the equivalence class corresponding to a given population $P$ can be described by an ordered $n$-tuple $[c]_P = (c_1, c_2, \ldots, c_n)$ of numbers between 0 and $m$, where $c_i$ is the number of 0s in the $i$th column of $P$. For example, if $P$ is either one of the equivalent populations above then $[c]_P = (3, 2, 2, 2, 1)$.²
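The invariant $[c]_P$ is straightforward to compute. A quick Python check (a sketch, using the two populations of the example above) that both populations carry the same column-wise genetic material and hence lie in the same equivalence class:

```python
def zero_column_counts(population):
    """[c]_P = (c_1, ..., c_n): c_i = number of 0s in column i of P."""
    return tuple(col.count('0') for col in zip(*population))

P = ["01011", "01010", "10101", "00101"]
Q = ["01001", "10101", "00010", "01111"]

assert zero_column_counts(P) == (3, 2, 2, 2, 1)
assert zero_column_counts(P) == zero_column_counts(Q)  # P and Q are equivalent under A
```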
Example 4.5.2 Continuing with example 4.5.1, consider the following family of mutation transformations $\mathcal{M} = \{T_u \mid u \in \Omega\}$, where each transformation $T_u$ is defined as follows: denote by $+_2$ the addition modulo 2 ($0 +_2 0 = 0$, $1 +_2 0 = 1$, $0 +_2 1 = 1$, $1 +_2 1 = 0$). We then define $T_u$ to be the function from $\Omega$ into itself which sends every $a = (a_1, a_2, \ldots, a_n)$ to $T_u(a) = a \oplus u$, where $\oplus$ is componentwise addition modulo 2; i.e. given $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n) \in \Omega$, the operation $\oplus$ is defined as follows: $x \oplus y = z$ where $z = (z_1, z_2, \ldots, z_n)$ with $z_i = x_i +_2 y_i$.
Notice first that every transformation $T_u$ is bijective (in fact $T_u \circ T_u = 1$, where $1$ is the identity map on $\Omega$). Since every mutation transformation $T_u$ is uniquely determined by the element $u \in \Omega$, defining a probability distribution on the family $\mathcal{M}$ amounts to defining a probability distribution on $\Omega = \{0,1\}^n$. To achieve a situation equivalent to the classical case, where every bit is mutated independently with a small probability $\epsilon > 0$ and remains unchanged with probability $1 - \epsilon$, we choose 1 with probability $\epsilon$ and 0 with probability $1 - \epsilon$ independently $n$ times. Given a population of size $m$ we let mutation be the elementary step associated to the reproduction $m$-tuple
$$\Omega_{\text{mutation}} = (\Omega, \mathcal{M}, \mathcal{M}, \ldots, \mathcal{M}, p, p, \ldots, p, \wp_m)$$
where $p$ is the probability distribution on $\mathcal{M}$ described above and $\wp_m$ is the unique trivial probability distribution on the one-element set (since there is exactly one way to partition a given set into singleton subsets). Now let $B$ denote the algorithm determined by the elementary step of crossover as described in example 4.5.1 followed by the
² One-point crossover under reasonable assumptions will produce the same equivalence relation.
elementary step of mutation as described above. Then the algorithm $B$ is ergodic in the sense of definition 58 of [5], which means that the equivalence relation $\xrightarrow{B}$ is trivial; i.e. there is only one equivalence class or, in other words, for any two populations $P$ and $Q$ we have $P \xrightarrow{B} Q$. Indeed, thanks to the availability of mutation, any given population can be reached from any other given population in a single step with a small but positive probability, which means that any two given populations are equivalent under $\xrightarrow{B}$.
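The bijectivity of each mutation transformation $T_u$ boils down to the self-inverse property of componentwise XOR. A minimal Python check (the particular bit strings are arbitrary illustrations):

```python
def mutate(a, u):
    """T_u: componentwise addition modulo 2 (XOR) of the mask u to a."""
    return tuple(ai ^ ui for ai, ui in zip(a, u))

a = (0, 1, 0, 1, 1)
u = (1, 1, 0, 0, 1)

assert mutate(a, u) == (1, 0, 0, 1, 0)
assert mutate(mutate(a, u), u) == a  # T_u composed with T_u is the identity
```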
The main result of [5] is the following fact:
Theorem 4.5.2 Let $A$ denote a bijective and self-transient algorithm. Then the Markov chain initiated at some population $P \in \Omega^m$ is irreducible and its unique stationary distribution is the uniform distribution (on $[P]_A$).
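Theorem 4.5.2 can be sanity-checked numerically: when every step applies a bijection chosen with fixed probabilities, the transition matrix is a convex combination of permutation matrices and hence doubly stochastic, so the uniform distribution is stationary. A small Python sketch for the complementary-mask crossover of example 4.5.1, with the sizes $m = 2$ and $n = 2$ chosen purely for illustration:

```python
from itertools import product, chain, combinations

n = 2
bits = list(product((0, 1), repeat=n))      # Omega = {0,1}^n
states = list(product(bits, repeat=2))      # populations of size m = 2
index = {s: i for i, s in enumerate(states)}

def crossover(pop, mask):
    """(a, b) -> (F_M(a, b), F_{M^c}(a, b)): an involution, hence a bijection."""
    a, b = pop
    c = tuple(a[i] if i in mask else b[i] for i in range(n))
    d = tuple(b[i] if i in mask else a[i] for i in range(n))
    return (c, d)

masks = [set(m) for m in chain.from_iterable(
    combinations(range(n), k) for k in range(n + 1))]  # all 2^n masks

# Transition matrix: pick a mask uniformly, apply the paired crossover.
trans = [[0.0] * len(states) for _ in states]
for s in states:
    for m in masks:
        trans[index[s]][index[crossover(s, m)]] += 1 / len(masks)

# Rows AND columns sum to 1 (doubly stochastic), so the uniform
# distribution on each equivalence class [P]_A is stationary.
assert all(abs(sum(row) - 1) < 1e-12 for row in trans)
assert all(abs(sum(trans[i][j] for i in range(len(states))) - 1) < 1e-12
           for j in range(len(states)))
```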
The classical versions of the Geiringer theorem, such as the ones established in [4] and in [12], are stated in terms of the "limiting frequency of occurrence" of a certain element of the search space. The following definitions, which also appear in [5], make these notions precise in the finite population setting:
Definition 4.5.5 We define the characteristic function $\mathcal{X} : \Omega^m \times \mathcal{P}(\Omega) \to \mathbb{N} \cup \{0\}$ as follows: $\mathcal{X}(P, S) =$ the number of individuals of $P$ which are elements of $S$. (Recall that $P \in \Omega^m$ is a population consisting of $m$ individuals and $S \in \mathcal{P}(\Omega)$ simply means that $S \subseteq \Omega$.)
Example 4.5.3 For instance, suppose $\Omega = \{0, 1\}^n$,
$$P = \begin{pmatrix} 01010 \\ 01010 \\ 10101 \\ 00101 \\ 01010 \\ 10101 \end{pmatrix}$$
and $S \subseteq \Omega = \{0, 1\}^n$ is determined by the Holland schema $(*, 1, *, 1, *)$. Then $\mathcal{X}(P, S) = 3$, because exactly three rows of $P$, the 1st, the 2nd, and the 5th, are in $S$.
Definition 4.5.6 Fix an evolutionary algorithm $A$ and an initial population $P \in \Omega^m$. Let $P(t)$ denote the population obtained upon the completion of $t$ reproduction steps of the algorithm $A$ in the absence of selection and mutation. For instance, $P(0) = P$. Denote by $\Phi(S, P, t)$ the proportion of individuals from the set $S$ which occur before time $t$. That is,
$$\Phi(S, P, t) = \frac{\sum_{s=1}^{t} \mathcal{X}(P(s), S)}{tm}.$$
(Notice that $tm$ is simply the total number of individuals encountered before time $t$. The same individual may be repeated more than once and the multiplicity contributes to $\Phi$.) Denote by $\mathcal{X}(\Box, S) : \Omega^m \to \mathbb{N}$ the restriction of the function $\mathcal{X}$ when the set $S$ is fixed (the notation suggests that one plugs a population $P$ into the box).
Intuitively, $\Phi(S, P, t)$ is the frequency of encountering the individuals in $S$ before time $t$ when we run the algorithm starting with the initial population $P$.
4.6 Nonlinear Genetic Programming (GP) with Homologous Crossover.
In genetic programming, the search space, $\Omega$, consists of parse trees, which usually represent various computer programs.
Example 4.6.1 A typical parse tree representing the program $(+(\sin(x), *(x, y)))$ is drawn below:
Since computers have only a finite amount of memory, it is reasonable to assume that there are finitely many basic operations which can be used to construct programs and that every program tree has depth less than or equal to some integer $L$. Under these assumptions $\Omega$ is a finite set. We may then define the search space as follows:
Definition 4.6.1 Fix a signature $\Sigma = (\Sigma_0, \Sigma_1, \Sigma_2, \ldots, \Sigma_N)$ where the $\Sigma_i$'s are finite sets.³ We assume that $\Sigma_0 \neq \emptyset$ and $|\Sigma_j| \neq 1\; \forall j$.⁴ The search space $\Omega$ consists of all parse trees having depth at most $L$. Interior nodes having $i$ children are labelled by the elements of $\Sigma_i$. The leaf nodes are labelled by the elements of $\Sigma_0$.
In order to study the appropriate family of reproduction (crossover) transformations with the aim of applying the generalized Geiringer theorem, it is most convenient to exploit Poli hyperschemata (see [9] for a more detailed description).
Definition 4.6.2 A Poli hyperschema is a rooted parse tree which may have two additional labels for the nodes, namely the # and = signs (it is assumed, of course, that neither one of these denotes an operation). The = sign may label any interior node $v$ of the tree (if $v$ has $i$ children then, since $v$ does occur in the tree, we must have $|\Sigma_i| > 0$). The # sign can only label a leaf node. A given Poli hyperschema represents the set of all programs whose parse tree can be obtained by replacing the = signs with operations of the appropriate arities and attaching any program trees in place of the # signs. Different occurrences of # or = may be replaced differently. We shall denote by $S_t$ the set of programs represented by a hyperschema $t$.
Consider, for instance, the hyperschema $t$ defined as $(+(=(\#, x), *(\sin(y), \#)))$, which is pictured below:
³ Intuitively $\Sigma_i$ is the set consisting of the $i$-ary operations and $\Sigma_0$ consists of the input variables. Formally this does not have to be the case, though.
⁴ The assumption that $|\Sigma_j| \neq 1\; \forall j$ does not cause any problems, since we are free to select any elements from the search space that we want. On the other hand, this assumption helps us to avoid unnecessary complications when dealing with the poset of Poli hyperschemata later.
A couple of programs fitting the hyperschema $t$ are shown below:
In order to model the family of reproduction (crossover) transformations in a way which makes it obvious that GP is a bijective and self-transient algorithm, we shall introduce a partial order on the set of all Poli hyperschemata so that every two elements have a least upper bound. The notion of the least upper bound will also be used to define the common region (see [11] for an alternative description of the notion of a common region).
Definition 4.6.3 Denote by $\mathcal{O}$ the set of all basic operations which can be used to construct the programs (i.e. $\mathcal{O} = \Sigma_1 \cup \ldots \cup \Sigma_N$) and by $\mathcal{V}$ the set of all variables (i.e. $\mathcal{V} = \Sigma_0$). Put the following partial order, $\le$, on the set $\mathcal{O} \cup \mathcal{V} \cup \{=, \#\}$:
1. $\forall\, a, b \in \mathcal{O} \cup \mathcal{V}$ we have $a \le b \iff a = b$.
2. $\forall\, a \in \mathcal{O}$ we have $a \le\, =$.
3. $\forall\, a \in \mathcal{O} \cup \mathcal{V}$ we have $a \le \#$.
4. $=\, \le\, =$, $\# \le \#$ and $=\, \le \#$.
We shall also write $a \ge b$ to mean $b \le a$.
It is easy to see that $\le$ is indeed a partial order. Moreover, every collection of elements of $\mathcal{O} \cup \mathcal{V} \cup \{=, \#\}$ has a least upper bound under $\le$. We are now ready to define the partial order relation on the set of all Poli hyperschemata:
Definition 4.6.4 Let $t_1$ and $t_2$ denote two Poli hyperschemata. We say that $t_1 \ge t_2$ if and only if the following two conditions are satisfied:
1. The tree corresponding to $t_1$ when all of the labels are deleted is a subtree of the tree corresponding to $t_2$ with all of the labels deleted.
2. Every one of the labels (which represents an operation or a variable) of $t_1$ is $\ge$ the label of the node in the corresponding position of $t_2$.
Example 4.6.2 For instance, the hyperschema $t_1 = (+(=(\#, x), *(\sin(y), \#)))$ satisfies $t_1 \ge t_2 = (+(+(*(\sin(x), y), x), *(\sin(y), =(\#))))$. Indeed, the parse trees of $t_1$ and $t_2$ appear in the picture below:
When all the labels in the dashed subtree of the parse tree of $t_2$ are deleted, one gets a tree isomorphic to that obtained from $t_1$ by deleting all the labels. Thus condition 1 of definition 4.6.4 is satisfied. To see that condition 2 is fulfilled as well, we notice that the labels of $t_1$ are $\ge$ the corresponding labels of the dashed subtree of $t_2$: indeed, we have $+ \ge +$, $=\, \ge +$, $* \ge *$, $\# \ge *$, $x \ge x$, $\sin \ge \sin$, $\# \ge\, =$ and $y \ge y$.
Again, it is easy to check that $\le$ is indeed a partial order relation on the collection of Poli hyperschemata. Proposition 4.6.1 below tells us even more:
Proposition 4.6.1 Any given collection of Poli hyperschemata has a least upper bound under $\le$.
Proof: Denote by $S$ a given collection of Poli hyperschemata. We provide an algorithm to construct the least upper bound of $S$ as follows: copies of all the trees in $S$ are recursively jointly traversed, starting from the root nodes, to identify the parts with the same shape, i.e. the same arity in the nodes visited. Recursion is stopped as soon as an arity mismatch between corresponding nodes in some two trees from $S$ is present. All the nodes and links encountered are stored. This way we obtain a tree. It remains to fill in the labels. Each one of the interior nodes is labelled by the least upper bound of the corresponding labels of the trees in $S$. The label of a leaf node is a variable, say $x$, if all the labels of the corresponding nodes of the trees in $S$ are $x$ (which implies that they are leaf nodes themselves). In all other cases the label of the leaf node is the # sign. It is not hard to see that this produces the least upper bound of the collection $S$ of parse trees.
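The construction in the proof translates directly into code. A sketch in Python for two trees, representing each tree as a (label, children) pair; the two programs below are hypothetical examples of our own (the paper's figures are not reproduced here), and the label-joining rule follows the order of definition 4.6.3:

```python
HASH, EQ = '#', '='

def label_sup(l1, l2, leaf):
    """Least upper bound of two node labels in the order of definition 4.6.3."""
    if l1 == l2:
        return l1
    if leaf:                                  # distinct leaf labels
        return HASH
    return HASH if HASH in (l1, l2) else EQ   # distinct same-arity operations

def sup(t1, t2):
    """Least upper bound of two parse trees (label, children-list)."""
    (l1, c1), (l2, c2) = t1, t2
    if len(c1) != len(c2):            # arity mismatch: recursion stops, # leaf
        return (HASH, [])
    if not c1:                        # both leaves
        return (label_sup(l1, l2, True), [])
    return (label_sup(l1, l2, False), [sup(a, b) for a, b in zip(c1, c2)])

def shape(t):
    """Underlying shape schema: the tree with all labels deleted."""
    return [shape(c) for c in t[1]]

# Hypothetical programs a = +(sin(x), *(x, y)) and b = +(sin(y), +(x, sin(x)))
a = ('+', [('sin', [('x', [])]), ('*', [('x', []), ('y', [])])])
b = ('+', [('sin', [('y', [])]), ('+', [('x', []), ('sin', [('x', [])])])])

assert sup(a, b) == ('+', [('sin', [('#', [])]),
                           ('=', [('x', []), ('#', [])])])
# The common region of a and b (definition 4.6.6) is the shape of their lub:
assert shape(sup(a, b)) == [[[]], [[], []]]
```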
It was pointed out before that programs themselves are Poli hyperschemata. The following fact is almost immediate from the explicit construction of the least upper bound carried out in the proof of proposition 4.6.1:
Proposition 4.6.2 A given Poli hyperschema $t$ is the least upper bound of the set $S_t$ of programs determined by $t$.
From proposition 4.6.2 it follows easily that $\le$ is order isomorphic to the collection of subsets determined by the Poli hyperschemata:
Proposition 4.6.3 Let $t$ and $s$ denote Poli hyperschemata. Denote by $S_t$ and $S_s$ the subsets of the search space determined by the hyperschemata $t$ and $s$ respectively. Then $t \le s \iff S_t \subseteq S_s$.
There is another type of schema which is useful to introduce in order to define the family of reproduction (crossover) transformations:
Definition 4.6.5 A shape schema is just a rooted ordered tree. If $\tilde{t}$ is a given shape schema then $S_{\tilde{t}}$ is just the set of all programs whose underlying tree, when all the labels are deleted, is precisely $\tilde{t}$. Given a Poli hyperschema $s$, we shall denote by $\tilde{s}$ the underlying shape schema of $s$, i.e. the tree obtained by deleting all the labels in $s$.
The notion of a common region which is equivalent to the one defined below also appears in [11]:
Definition 4.6.6 Given two Poli hyperschemata $t$ and $s$ we define their common region to be the underlying shape schema of the least upper bound of $t$ and $s$.
Definition 4.6.7 Fix a shape schema $\tilde{t}$. We shall say that the set $C_{\tilde{t}} = \{(a, b) \mid a, b \text{ are program trees and } \tilde{t} \text{ is the common region of } a \text{ and } b\}$ is the component corresponding to the shape $\tilde{t}$.
Notice that the sets determined by the shape schemata partition the search space:
Remark 4.6.1 Notice that $\Omega^2 = \bigcup_{\tilde{t} \text{ is a shape}} C_{\tilde{t}}$. Moreover, $C_{\tilde{t}} \cap C_{\tilde{s}} = \emptyset$ for $\tilde{t} \neq \tilde{s}$. (This is so because least upper bounds in a poset are uniquely determined, and so the function sending $(a, b) \mapsto \sup(a, b) \mapsto$ the underlying shape of $\sup(a, b)$ is well defined. But then the sets $C_{\tilde{t}}$ are simply the pre-images under this function of the individual shape schemata and, hence, form a partition of $\Omega^2$.)
We now proceed to define the family of reproduction transformations. Our goal is to introduce a family of functions on $\Omega^2$ in such a way that each one of them is easily seen to be bijective (see theorem 4.5.2, definition 4.5.2, definition 4.5.3 and remark 4.5.2). The idea is to define these transformations on each of the components first:
Definition 4.6.8 Fix a shape schema $\tilde{t}$ and a node $v$ of $\tilde{t}$. A one-point partial homologous crossover transformation $T_v : C_{\tilde{t}} \to C_{\tilde{t}}$ is defined as follows: for a given $(a, b) \in C_{\tilde{t}}$ let $T_v(a, b) = (c, d)$, where $c$ and $d$ are obtained from the program trees of $a$ and $b$ as follows: first identify the node $v$ in the parse trees of $a$ and $b$ respectively. Now obtain the pair $(c, d)$ by swapping the subtrees of $a$ and $b$ rooted at $v$. (This procedure is described in detail in [11] and it is also illustrated in the example below.) Let $G_{\tilde{t}} = \{T_v \mid v \text{ is a node of } \tilde{t}\}$ denote the family of all partial homologous one-point crossover transformations associated to the shape $\tilde{t}$.
The following example illustrates the concepts in definitions 4.6.5, 4.6.6 and 4.6.8:
Example 4.6.3 In the upper left part of the picture, parse trees of the two sample programs $a$ and $b$ are shown. Then on the upper right one can see the least upper bound of $a$ and $b$. On the lower right the underlying tree of the least upper bound of $a$ and $b$ is drawn. According to definition 4.6.6, this tree is precisely the common region of the programs $a$ and $b$. The isomorphic subtrees inside both $a$ and $b$ are emphasized inside the dashed areas.
A node $v$ is selected inside the common region. The pair of children $(c, d) = T_v(a, b)$ appears on the lower left of the picture above. The subtrees rooted at $v$ which are swapped during crossover are emphasized inside the dashed area.
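A runnable sketch of $T_v$ in Python, with trees as (label, children) pairs, paths (sequences of child indices) standing in for nodes of the common region, and two hypothetical programs of our own; the assertions check the involution property and the preservation of the common region discussed in remark 4.6.2 below:

```python
def common_region(t1, t2):
    """Shape of the least upper bound: joint traversal, stopped at arity mismatch."""
    (l1, c1), (l2, c2) = t1, t2
    if len(c1) != len(c2):
        return []
    return [common_region(x, y) for x, y in zip(c1, c2)]

def subtree(t, path):
    for i in path:
        t = t[1][i]
    return t

def grafted(t, path, s):
    """Copy of t with the subtree at `path` replaced by s."""
    if not path:
        return s
    label, children = t
    i, rest = path[0], path[1:]
    return (label, children[:i] + [grafted(children[i], rest, s)] + children[i + 1:])

def T(v, a, b):
    """One-point partial homologous crossover: swap the subtrees rooted at v,
    where v must be a node (path) of the common region of a and b."""
    return (grafted(a, v, subtree(b, v)), grafted(b, v, subtree(a, v)))

# Hypothetical programs a = +(sin(x), *(x, y)) and b = +(sin(y), +(x, sin(x)))
a = ('+', [('sin', [('x', [])]), ('*', [('x', []), ('y', [])])])
b = ('+', [('sin', [('y', [])]), ('+', [('x', []), ('sin', [('x', [])])])])

c, d = T([1], a, b)                     # crossover at the root's second child
assert T([1], c, d) == (a, b)           # T_v composed with T_v is the identity
assert common_region(c, d) == common_region(a, b)   # (c, d) stays in C_t
```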
Remark 4.6.2 One does need to show that for $(a, b) \in C_{\tilde{t}}$ we have $T_v(a, b) \in C_{\tilde{t}}$. A rigorous argument can be given as follows: clearly $T_v : C_{\tilde{t}} \to \bigcup_{\tilde{s} \text{ is a shape}} C_{\tilde{s}}$ is a well-defined map. Moreover, since $v$ is a node of the least upper bound of $a$ and $b$, and the pair $(c, d)$ is obtained simply by swapping the corresponding subtrees rooted at $v$, we get $\sup\{c, d\} \le \sup\{a, b\}$. Now let $\tilde{s}$ denote the underlying shape of $\sup\{c, d\}$, consider the partial crossover transformation $F_v : C_{\tilde{s}} \to \bigcup_{\tilde{t} \text{ is a shape}} C_{\tilde{t}}$ at the same node $v$, and notice that, by definition, we have $F_v(c, d) = (a, b)$. But then, according to the reasoning above, we have $\sup\{a, b\} \le \sup\{c, d\}$. Thereby we get $\sup\{c, d\} \le \sup\{a, b\} \le \sup\{c, d\}$, so that $\sup\{c, d\} = \sup\{a, b\}$ and hence $\tilde{s} = \tilde{t}$. This shows that $T_v$ does, indeed, map into $C_{\tilde{t}}$. Moreover, in the process, we have also observed a couple of very important facts:
1. $T_v \circ T_v = 1_{C_{\tilde{t}}}$, where $1_{C_{\tilde{t}}}$ denotes the identity map on $C_{\tilde{t}}$. This shows, in particular, that $T_v$ is a bijection.
2. $T_v$ preserves least upper bounds: $\sup\{a, b\} = \sup T_v(a, b)$.
We are finally ready to define the family of reproduction transformations on the search space $\Omega$ of all programs:
Definition 4.6.9 For every shape schema $\tilde{t}$, fix a node $v_{\tilde{t}}$ of $\tilde{t}$. Define a one-point crossover transformation $T_{\{v_{\tilde{t}}\}_{\tilde{t} \text{ is a shape schema}}} : \Omega^2 \to \Omega^2$ to be the set-theoretic union of all of the partial crossover transformations of the form $T_{v_{\tilde{t}}}$. More explicitly, this means that a given pair $(a, b) \in \Omega^2$ must satisfy $(a, b) \in C_{\tilde{s}}$ for a unique shape schema $\tilde{s}$ (since, according to remark 4.6.1, $\Omega^2$ is a disjoint union of the components corresponding to the various shapes), and then $T_{\{v_{\tilde{t}}\}_{\tilde{t} \text{ is a shape schema}}}(a, b) = T_{v_{\tilde{s}}}(a, b)$. Denote by
$\mathcal{G}$ the family of all crossover transformations together with the identity map on $\Omega^2$. For simplicity of notation we shall denote the transformations in $\mathcal{G}$ by plain English letters: $T$, $F$, etc., keeping in mind that every such transformation is determined by making choices of partial crossover transformations on every one of the components.
Remark 4.6.3 Thanks to remark 4.6.2, every one of the crossover transformations in the family $\mathcal{G}$ is bijective (since it is a union of bijections on the pieces of a partition). It follows now that the generalized Geiringer theorem (theorem 4.5.2) applies to the case of homologous GP.
Remark 4.6.4 It is also possible to model uniform GP crossover (this type of crossover is examined in detail in [11]) in an analogous manner. All of the results established in the current paper apply to this case without any modification.
4.7 The Statement of the Schema-Based Version of Geiringer's Theorem for Non-linear GP under Homologous Crossover.
As mentioned before, the schema-based version of Geiringer's theorem for non-linear GP is stated in terms of Poli hyperschemata.
Definition 4.7.1 A Poli hyperschema of order $i$ is a Poli hyperschema which has exactly $i$ nodes whose label is not a # or an = sign.
A configuration schema is a Poli hyperschema of order 0 (i.e. a hyperschema which has only = signs in the interior nodes and # signs in the leaf nodes).
An operation schema is a Poli hyperschema of order 1 (i.e. a hyperschema which has exactly one node whose label is not a # or an = sign).
Fix an individual (a parse tree) $u \in \Omega$. Let $v$ denote any node of $u$. Let $B(v)$ denote the branch of the shape schema of $u$ from the root down to the node $v$. Let $B^+(v) = B(v) \cup \{w \mid w \text{ is a child of some node } z \text{ of } B(v) \text{ with } z \neq v\}$. Now define $cs(v)$ to be the configuration schema whose underlying shape schema is $B^+(v)$. Let $o$ denote an operation or a variable (an element of $\Sigma_i$ for some $i$ between 0 and $N$). Now obtain the operation schema $os_o$ from $cs(v)$ by attaching the node labelled by $o$ in place of the # sign at the node corresponding to $v$ of $cs(v)$. Unless $v$ is a leaf node of $u$, all the children of this new node are the leaf nodes of $os_o$ labelled by the # sign. When $o$ is the operation (or the variable) labelling the node $v$ of $u$, we shall write $os(v)$ instead of $os_o$.
Notice that if $v$ is a root node then $cs(v)$ is just the schema which determines the entire search space, i.e. the parse tree consisting of a single node labelled by the # sign.
Example 4.7.1 illustrates definition 4.7.1.
Example 4.7.1 Below we list all of the configuration schemata and operation schemata for the individual of example 4.6.1:
Recall from definition 4.5.5 that $\mathcal{X}(P, S)$ denotes the number of individuals in the population $P$ which are elements of $S \subseteq \Omega$. The following definition makes it more convenient to state the schema-based version of Geiringer's theorem:
Definition 4.7.2 Given a Poli hyperschema $H$, we shall write $|H(P)|$ in place of $\mathcal{X}(P, S_H)$ (see definition 4.6.2) to denote the number of individuals (counting repetitions) in the population $P$ fitting the hyperschema $H$.
We can now finally state Geiringer's theorem for non-linear GP under homologous crossover:
Theorem 4.7.1 Fix an initial population $P \in \Omega^m$ and an individual $u \in \Omega$. Suppose every pair of individuals has a positive probability to be paired up for crossover and every transformation in $\mathcal{G}$ has a positive probability of being chosen.⁵ Then the limiting frequency of occurrence of the given individual $u$ is
$$\lim_{t \to \infty} \Phi(u, P, t) = \prod_{v \text{ is a node of } u} \frac{|os(v)(P)|}{|cs(v)(P)|}.$$
Example 4.7.2 To illustrate how theorem 4.7.1 can be applied in practice, suppose we are interested in computing the frequency of encountering the individual $u$ from examples 4.6.1 and 4.7.1 when the initial population of 6 individuals pictured below is chosen:
The number of individuals in $P$ fitting the operation schema $os(v_1)$ is 2 (these are $x_1$ and $x_5$), while every individual fits the configuration schema $cs(v_1)$. Therefore $\frac{|os(v_1)(P)|}{|cs(v_1)(P)|} = \frac{2}{6} = \frac{1}{3}$. Four individuals, namely $x_1$, $x_3$, $x_4$ and $x_5$, fit $cs(v_2) = cs(v_3)$; among these only 2 individuals, namely $x_3$ and $x_5$, fit $os(v_2)$, and 2 individuals, $x_4$ and $x_5$, fit $os(v_3)$, so that $\frac{|os(v_2)(P)|}{|cs(v_2)(P)|} = \frac{|os(v_3)(P)|}{|cs(v_3)(P)|} = \frac{2}{4} = \frac{1}{2}$. Individuals $x_3$, $x_4$ and $x_5$ fit the configuration schema $cs(v_4)$, while only $x_4$ fits the operation schema $os(v_4)$, so that $\frac{|os(v_4)(P)|}{|cs(v_4)(P)|} = \frac{1}{3}$. $x_1$, $x_3$, $x_4$ and $x_5$ fit $cs(v_5) = cs(v_6)$. Among these only $x_3$ and $x_4$ fit $os(v_5)$, while only $x_4$ fits $os(v_6)$, so that $\frac{|os(v_5)(P)|}{|cs(v_5)(P)|} = \frac{2}{4} = \frac{1}{2}$ and $\frac{|os(v_6)(P)|}{|cs(v_6)(P)|} = \frac{1}{4}$.
Thereby, according to theorem 4.7.1, we obtain:
$$\lim_{t \to \infty} \Phi(u, P, t) = \prod_{i=1}^{6} \frac{|os(v_i)(P)|}{|cs(v_i)(P)|} = \frac{1}{3} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{2} \cdot \frac{1}{4} = \frac{1}{288}.$$
Roughly speaking, this means that if we run GP starting with the population $P$ pictured above, in the absence of mutation and selection (crossover being the only step) for an infinitely long time, the individual $u$ will be encountered on average 1 out of 288 times.
⁵ These conditions can be slightly relaxed, but we prefer to present only the main idea.
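The arithmetic of example 4.7.2 is easy to verify with exact rationals; below, each pair is the count data $(|os(v_i)(P)|, |cs(v_i)(P)|)$ read off in the example:

```python
from fractions import Fraction
from math import prod  # Python 3.8+

# Counts from example 4.7.2: (|os(v_i)(P)|, |cs(v_i)(P)|) for i = 1..6.
counts = [(2, 6), (2, 4), (2, 4), (1, 3), (2, 4), (1, 4)]

limit_frequency = prod(Fraction(os, cs) for os, cs in counts)
assert limit_frequency == Fraction(1, 288)
```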
Example 4.7.3 Notice that linear GP (or, equivalently, a variable-length GA) as described in [12] is a special case of nonlinear GP when $\Sigma_i = \emptyset$ for $i > 1$ while $\Sigma_0 \neq \emptyset$ and $\Sigma_1 \neq \emptyset$. Indeed, the elements of such a search space are parse trees such that every interior node has exactly one child and the depth of the tree is bounded by some integer $N$. One can think of such a tree as a sequence of labels $(a_1, a_2, \ldots, a_n)$, the first label affiliated with the root node, the second label with the child of the root node, and so on. The label $a_n$ is affiliated with the leaf node. This gives us a one-to-one correspondence, call it $\phi$, between the search space for nonlinear GP in our specific case (when $\Sigma_i = \emptyset$ for $i > 1$ while $\Sigma_0 \neq \emptyset$ and $\Sigma_1 \neq \emptyset$) and the search space for linear GP, which preserves crossover. The following types of schemata have been introduced in [12]:
Definition 4.7.3 The schema $H = (*_{i-1}, h_i, \#)$ represents the subset $S_H = \{x = (x_1, x_2, \ldots, x_l) \mid l > i \text{ and } x_i = h_i\}$. In words, $S_H$ is simply the set of all individuals whose length is at least $i + 1$ and whose $i$th allele is $h_i$.
Definition 4.7.4 The schema $H = (*_i, \#)$ represents the subset
$$S_H = \{x = (x_1, x_2, \ldots, x_l) \mid l > i\}.$$
In words, $S_H$ is simply the subset of all individuals whose length is at least $i + 1$.
Definition 4.7.5 The schema $H = (*_{i-1}, h_i)$ represents the subset
$$S_H = \{x = (x_1, x_2, \ldots, x_i) \mid x_i = h_i\}$$
of the search space, which is simply the set of all individuals of length exactly $i$ whose $i$th (last) allele is $h_i$.
The reader may check that under the correspondence $\phi$ the configuration schemata correspond to the schemata $H_i = (*_i, \#)$ for $i \ge 1$, while operation schemata correspond to the schemata of the form $H = (*_{i-1}, h_i, \#)$ and of the form $H = (*_{i-1}, h_i)$ for $i > 1$. Finally, the hyperschema $t_{(1,1)}$ corresponds to the schema $H = (h_1, \#)$.
Fix a population $P \in \Omega^m$. Recall that we denote by $|H|$ the number of individuals in $P$ which fit the schema $H$, counting repetitions. Also recall from definition 4.5.6 that $\Phi(S_H, P, 1) = \frac{|H|}{m}$ denotes the fraction of the number of individuals of $P$ which fit the schema $H$. To abbreviate the notation we shall write $\Phi(H, P, 1)$ instead of $\Phi(S_H, P, 1)$. Fix an individual $u = (h_1, h_2, \ldots, h_n) \in \Omega$. Theorem 4.7.1 tells us that
that
lim
t→∞
Φ(u,P,t)=
|(h
1
, #)|
m
· (
n2
i=1
|(
i
,h
i+1
, #)|
|(
i
, #)|
) ·
|(
n1
,h
n
)|
|(
n1
, #)|
=
=
|(h
1
, #)|
m
· (
n2
i=1
|(
i
,h
i+1
, #)|
m
|(
i
, #)|
m
) ·
|(
n1
,h
n
)|
m
|(
n1
, #)|
m
=
Φ(h
1
, #) · (
n2
i=1
Φ(
i
,h
i+1
, #)
Φ(
i
, #)
) ·
Φ(
n1
,h
n
)
Φ(
n1
, #)
=
$$= \Phi(*_{n-1}, h_n) \cdot \frac{\prod_{i=0}^{n-2} \Phi(*_i, h_{i+1}, \#)}{\prod_{i=1}^{n-1} \Phi(*_i, \#)} = \Phi(*_{n-1}, h_n) \cdot \prod_{i=1}^{n-1} \frac{\Phi(*_{i-1}, h_i, \#)}{\Phi(*_i, \#)}$$
(with the convention that $(*_0, h_1, \#)$ denotes $(h_1, \#)$), which is precisely the formula obtained in [12].
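The linear GP formula above can be evaluated directly on a concrete population. A sketch in Python implementing the three schema types of definitions 4.7.3-4.7.5 with exact rationals; the variable-length population and the individual $u$ below are hypothetical illustrations of our own:

```python
from fractions import Fraction

def phi(P, schema_type, i, h=None):
    """Phi(H, P, 1) = |H|/m for the schema types of definitions 4.7.3-4.7.5:
    'mid'  = (*_{i-1}, h_i, #): length > i and the i-th allele equals h,
    'tail' = (*_i, #):          length > i,
    'end'  = (*_{i-1}, h_i):    length exactly i, last allele equals h."""
    m = len(P)
    if schema_type == 'mid':
        hits = sum(len(x) > i and x[i - 1] == h for x in P)
    elif schema_type == 'tail':
        hits = sum(len(x) > i for x in P)
    else:  # 'end'
        hits = sum(len(x) == i and x[i - 1] == h for x in P)
    return Fraction(hits, m)

def limit_frequency(P, u):
    """Phi(*_{n-1}, h_n) * product over i of Phi(*_{i-1}, h_i, #) / Phi(*_i, #)."""
    n = len(u)
    result = phi(P, 'end', n, u[n - 1])
    for i in range(1, n):
        result *= phi(P, 'mid', i, u[i - 1]) / phi(P, 'tail', i)
    return result

# Hypothetical population of variable-length strings and individual u = "ab":
P = ["ab", "aba", "ba", "abb"]
assert limit_frequency(P, "ab") == Fraction(3, 16)
```

Here $\Phi(a, \#) = 3/4$, $\Phi(*_1, \#) = 1$ and $\Phi(*_1, b) = 1/4$, so the product is $3/16$.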
4.8 How Do We Obtain Theorem 4.7.1 from Theorem 4.5.2?
The following couple of corollaries from [5] are useful in obtaining the schema-based versions of the Geiringer theorem for various evolutionary algorithms. Throughout, we shall denote by $\mu_{[P]_A}$ the uniform probability distribution on the set $[P]_A$ (see definition 4.5.4).

Corollary 4.8.1 Fix a bijective and self-transient algorithm $A$ and an initial population $P \in \Omega^m$. Fix a set $S$ of individuals in $\Omega$ ($S \subseteq \Omega$). Then $\lim_{t\to\infty} \Phi(S, P, t) = \frac{1}{m} E_{\mu_{[P]_A}}(X(2, S))$, where $E_{\mu_{[P]_A}}(f)$ denotes the expectation of the random variable $f$ with respect to the uniform distribution on the set $[P]_A$.^6
To state the next corollary, which brings us one step closer to deriving results similar in flavor to Geiringer's original theorem, we need one more, purely formal, assumption about the algorithm:

Definition 4.8.1 We say that a given algorithm $A$ is regular if the following is true: for every population $P = (x_1, x_2, \ldots, x_m) \in \Omega^m$ and for every permutation $\pi \in S_m$, the population obtained by permuting the elements of $P$ by $\pi$, namely $\pi(P) = (x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(m)})$, is in $[P]_A$. In words, this says that the equivalence classes $[P]_A$ are permutation invariant.
Remark 4.8.1 Definition 4.8.1 is only needed because our description of an evolutionary search algorithm uses the ordered multi-set model. This makes the generalized Geiringer theorem (theorem 4.5.2) look nice (the stationary distribution is uniform on $[P]_A$). A disadvantage of the multi-set model is that it allows algorithms which are not regular. If we were to use the model of [22], where the order of elements in a population is not taken into account (a reasonable assumption, since most evolutionary algorithms used in practice are indeed regular), then the generalized Geiringer theorem would have to be modified accordingly, since the stationary distribution of the corresponding Markov process would differ from uniform (it is not difficult to compute, though, since the corresponding Markov chain is just a "projection" of the one used in the current paper).
Corollary 4.8.2 Fix a regular, bijective and self-transient algorithm $A$ and an initial population $P \in \Omega^m$. Denote by $\mu_{[P]_A}$ the uniform probability distribution on $[P]_A$ (see definition 4.5.4). Fix a set $S$ of individuals in $\Omega$ ($S \subseteq \Omega$). Then we have $\lim_{t\to\infty} \Phi(S, P, t) = \mu_{[P]_A}(V_S)$ where
$$V_S = \{Q \mid Q \in [P]_A \text{ and the 1st individual of } Q \text{ is an element of } S\}.$$
^6 Throughout the paper, whenever a limit is involved, the equality is meant to hold for almost every infinite sequence of trials.
Corollaries 4.8.1 and 4.8.2 are proved in section 6 of [5]. When deriving schema-based versions of the Geiringer theorem for a specific algorithm, the following strategy may be implemented. Continuing with the notation of corollaries 4.8.1 and 4.8.2, suppose we are given a nested sequence of subsets of the search space: $S_1 \supseteq S_2 \supseteq \ldots \supseteq S_n$. According to corollary 4.8.2,
$$\lim_{t\to\infty} \Phi(S_n, P, t) = \mu_{[P]_A}(V_{S_n}) = \frac{|V_{S_n}|}{|[P]_A|} = \frac{|V_{S_n}|}{|V_{S_{n-1}}|} \cdot \frac{|V_{S_{n-1}}|}{|[P]_A|} =$$
$$= \frac{|V_{S_n}|}{|V_{S_{n-1}}|} \cdot \frac{|V_{S_{n-1}}|}{|V_{S_{n-2}}|} \cdot \ldots \cdot \frac{|V_{S_2}|}{|V_{S_1}|} \cdot \frac{|V_{S_1}|}{|[P]_A|} = \mu_{[P]_A}(V_{S_1}) \cdot \prod_{j=0}^{n-2} \frac{|V_{S_{n-j}}|}{|V_{S_{n-j-1}}|} =$$
$$= \frac{1}{m} E_{\mu_{[P]_A}}(X(2, S_1)) \cdot \prod_{j=0}^{n-2} \frac{|V_{S_{n-j}}|}{|V_{S_{n-j-1}}|}.$$
Notice that $\frac{|V_{S_j}|}{|V_{S_{j-1}}|}$ is just the proportion that the populations in $[P]_A$ whose first individual is a member of $S_j$ form inside the set of populations in $[P]_A$ whose first individual is a member of $S_{j-1}$.
Corollary 4.8.3 Fix a regular, bijective and self-transient algorithm $A$ and an initial population $P \in \Omega^m$. Fix a nested sequence of subsets $S_1 \supseteq S_2 \supseteq \ldots \supseteq S_n$ of individuals in $\Omega$ ($S_1 \subseteq \Omega$). Then
$$\lim_{t\to\infty} \Phi(S_n, P, t) = \frac{1}{m} E_{\mu_{[P]_A}}(X(2, S_1)) \cdot \prod_{j=0}^{n-2} \frac{|V_{S_{n-j}}|}{|V_{S_{n-j-1}}|},$$
where, as before, $V_S$ denotes the set of all populations whose first individual is a member of $S$, for a given subset $S \subseteq \Omega$.
Denote by $A$ a given GP algorithm. Fix an individual $u \in \Omega$. In order to apply corollary 4.8.3, we may choose a descending chain of Poli hyperschemata $t_1 \supseteq t_2 \supseteq \ldots \supseteq t_n = u$. Fix an initial population $P$. To avoid writing many subscripts, we shall write $V_t$ instead of $V_{S_t}$ for the set of all populations in $[P]_A$ (see definition 4.4.1) whose first individual is a member of $S_t$ (the set of individuals determined by the hyperschema $t$). In order to construct the desired sequence of nested hyperschemata, we assign the following numerical labelling to the nodes of the parse tree of $u$: the nodes are labelled by pairs of integer coordinates, where the first coordinate shows the depth of a node in the tree and the second coordinate shows how far to the right the node is located among the nodes at the depth specified by the first coordinate. Notice, for instance, that the root node is labelled by the coordinates $(1, 1)$. We also introduce the following lexicographic linear ordering on the set of coordinate pairs:
Definition 4.8.2 $(a, b) \preceq (c, d)$ if and only if either $a < c$ or ($a = c$ and $b \leq d$).

It is well known and easy to verify that this defines a linear ordering.

Definition 4.8.3 Given a pair of coordinates $(i, j)$, denote by $(i, j)^+$ the immediate successor of $(i, j)$ under the lexicographic ordering defined above. Explicitly,
$$(i, j)^+ = \begin{cases} (i + 1, 1) & \text{if } (i, j) \text{ labels the rightmost node of } u \text{ at depth } i, \\ (i, j + 1) & \text{otherwise.} \end{cases}$$
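The ordering and the successor map are simple to implement. The sketch below is our own illustration, not part of the deliverable; it assumes the tree is summarized by an array `widths`, where `widths[d - 1]` holds the number of nodes at depth `d`:

```java
import java.util.Arrays;

public class NodeCoordinates {

    // (a, b) precedes-or-equals (c, d) iff a < c, or a = c and b <= d
    static boolean leq(int a, int b, int c, int d) {
        return a < c || (a == c && b <= d);
    }

    // Immediate successor (i, j)^+ of definition 4.8.3;
    // widths[d - 1] = number of nodes of the tree at depth d.
    static int[] successor(int i, int j, int[] widths) {
        if (j == widths[i - 1]) return new int[]{i + 1, 1}; // rightmost node at depth i
        return new int[]{i, j + 1};
    }

    public static void main(String[] args) {
        int[] widths = {1, 2, 3}; // a root, two children, three grandchildren
        System.out.println(Arrays.toString(successor(1, 1, widths)));
        System.out.println(Arrays.toString(successor(2, 2, widths)));
        System.out.println(leq(2, 2, 3, 1));
    }
}
```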
We obtain the desired nested sequence of hyperschemata for the given individual $u$ recursively in the following manner:

Definition 4.8.4 Define $t_{(1,1)}$ to be the hyperschema whose root node has the same label (operation) and arity as the root node of $u$, and all children of whose root node are leaf nodes labelled by the $\#$ sign. Once the hyperschema $t_{(i,j)}$ has been constructed, we obtain the hyperschema $t_{(i,j)^+}$ by attaching the node of $u$ with coordinate $(i, j)^+$ in place of the $\#$ sign at coordinate $(i, j)^+$ in the parse tree of $t_{(i,j)}$. Unless this node, call it $v$, is a leaf node of $u$, all children of this new node are leaf nodes of $t_{(i,j)^+}$ labelled by the $\#$ sign.
We illustrate the construction with an explicit example:

Example 4.8.1 Below, the nested sequence $t_{(1,1)} \supseteq t_{(2,1)} \supseteq t_{(2,2)} \supseteq t_{(3,1)} \supseteq t_{(3,2)} \supseteq t_{(3,3)}$ corresponding to the program of example 4.6.1 is drawn explicitly:
The formula for the limiting frequency of occurrence of a given program $u$ in corollary 4.8.3 involves ratios of the form $\frac{|V_{t_{(i,j)^+}}|}{|V_{t_{(i,j)}}|}$. It turns out that these ratios can be expressed nicely in terms of the presence of certain configuration and operation schemata in the initial population $P$:
Definition 4.8.5 Given a program tree $u$ and the corresponding nested sequence $t_{(1,1)} \supseteq t_{(2,1)} \supseteq \ldots \supseteq t_{(i,j)} \supseteq t_{(i,j)^+} \supseteq \ldots \supseteq t_{(l,k)} = u$ of hyperschemata as in definition 4.8.4, for every $(i, j) \neq (l, k)$, denote by $cs_{(i,j)}$ (respectively $os_{(i,j)}$) the configuration schema $cs(v)$ (respectively the operation schema $os(v)$), where $v$ is the node of $u$ with coordinate $(i, j)^+$.
Example 4.8.2 Continuing with examples 4.6.1 and 4.7.1, notice that for the individual in these examples we have $cs_{(1,1)} = cs_{(2,1)} = cs(v_2) = cs(v_3)$ while $os_{(1,1)} = os(v_2)$ and $os_{(2,1)} = os(v_3)$ (see example 4.7.1); $cs_{(2,2)} = cs(v_4)$ while $os_{(2,2)} = os(v_4)$; and $cs_{(3,1)} = cs_{(3,2)} = cs(v_5) = cs(v_6)$ while $os_{(3,1)} = os(v_5)$ and $os_{(3,2)} = os(v_6)$.
The following “orbit description” lemma is the reason for introducing configuration and operation schemata. We prove the lemma under the following special assumption:
Definition 4.8.6 We say that a population $P$ is special with respect to the individual $u$ if for every node $v$ of $u$ and for every operation (or variable) $o$ we have $|os_o(P)| \leq 1$, where $os_o$ is obtained from $cs(v)$ by attaching the operation $o$ at the leaf node of $cs(v)$ corresponding to $v$, as described in definition 4.7.1.

Definition 4.8.6 basically requires that no two operations (or variables) occurring in $P$ at the specified location are the same. It turns out that the orbit description lemma stated below is much more convenient to prove under this special assumption. The general case will then follow by introducing enough extra labels for the operations and variables involved and then deleting the extra labels.
Lemma 4.8.4 Fix an initial population $P$ and a program $u \in \Omega$. Assume that the population $P$ is special with respect to the individual $u$. Suppose every pair of individuals has a positive probability of being paired up for crossover and every transformation in $G$ has a positive probability of being chosen.^7 Consider the sequence of hyperschemata $t_{(1,1)} \supseteq t_{(2,1)} \supseteq \ldots \supseteq t_{(i,j)} \supseteq t_{(i,j)^+} \supseteq \ldots \supseteq t_{(l,k)} = u$ together with the families $\{cs_{(i,j)} \mid (i, j)$ is a non-maximal coordinate of $u\}$ and $\{os_{(i,j)} \mid (i, j)$ is a non-maximal coordinate of $u\}$ corresponding to the individual $u$. For a given hyperschema $t$, denote by $|t(P)|$ the number of individuals in $P$ which fit the hyperschema $t$, counting repetitions. Suppose that for all non-maximal pairs of coordinates $(i, j)$ we have $|os_{(i,j)}(P)| \neq 0$, and that $|t_{(1,1)}(P)| \neq 0$. Then for every such $(i, j)$ it is true that
$$\frac{|V_{t_{(i,j)^+}}|}{|V_{t_{(i,j)}}|} = \frac{1}{|cs_{(i,j)}(P)|}.$$
Proof: The key idea is to observe the following fact.

Claim: Fix a coordinate $(i, j)$. Fix any two operation schemata $os_1$ and $os_2$ which are obtained from $cs_{(i,j)}$ by attaching either a variable or an operation at the node with coordinate $(i, j)^+$. Suppose there exist individuals in $P$ fitting both $os_1$ and $os_2$. Then $|V_{t_{(i,j)}} \cap V_{os_1}| = |V_{t_{(i,j)}} \cap V_{os_2}|$.

Proof of the claim: Consider the map $F : [P]_A \to [P]_A$ defined as follows. Given a population $Q \in [P]_A$, notice that there is an individual, say $x_1$, in $Q$ fitting the operation schema $os_1$ (due to the way crossover is defined, the number of individuals fitting the operation schema $os_1$ is the same in every population $Q \in [P]_A$). Moreover, such an individual is unique, since we assumed that all operations appearing in the individuals of $P$ are distinct. Likewise, there is a unique individual in $Q$, say $x_2$, fitting the operation schema $os_2$. Pair up the individuals $x_1$ and $x_2$, and pair up the rest of the individuals arbitrarily for crossover. Select the crossover transformation $T_v$, where $v$ is the node with coordinate $(i, j)^+$, for the pair $(x_1, x_2)$, and choose the identity transformation for the rest of the pairs. Now let $F(Q)$ be the population obtained upon the completion of the reproduction step described above (notice that $F(Q) \in [P]_A$ by definition of $[P]_A$). Notice also that $F$ is its own inverse (i.e. $F \circ F = 1_{[P]_A}$). This tells us, in particular, that $F$ is bijective. Moreover, it is clear from the definitions that $F(V_{t_{(i,j)}} \cap V_{os_1}) \subseteq V_{t_{(i,j)}} \cap V_{os_2}$ and, likewise, $F(V_{t_{(i,j)}} \cap V_{os_2}) \subseteq V_{t_{(i,j)}} \cap V_{os_1}$. The desired conclusion follows at once.
Now observe that $t_{(i,j)^+} = t_{(i,j)} \cap os_{(i,j)}$, so that $V_{t_{(i,j)^+}} = V_{t_{(i,j)}} \cap V_{os_{(i,j)}}$, and $t_{(i,j)} = \bigcup_{o \text{ an operation or a variable}} (t_{(i,j)} \cap os_o)$, where $os_o$ is obtained from $cs_{(i,j)}$ by attaching the operation (or variable) $o$ at the node with coordinate $(i, j)^+$. Therefore we also have $V_{t_{(i,j)}} = \bigcup_{o \text{ an operation or a variable}} (V_{t_{(i,j)}} \cap V_{os_o})$. Since operations can not appear in or disappear from a population during crossover, $V_{os_o} \neq \emptyset \implies \exists$ an individual in $P$ fitting the operation schema $os_o$. Thus the only sets of the form $V_{t_{(i,j)}} \cap V_{os_o}$ which may possibly contribute to the union above are those for which there is an individual in $P$ fitting the operation schema $os_o$. According to the claim above, all such sets contribute exactly the same amount. Moreover, by assumption $os_{(i,j)}(P) \neq \emptyset$, and so we have
$$|V_{t_{(i,j)}}| = n \cdot |V_{t_{(i,j)}} \cap V_{os_{(i,j)}}| = n \cdot |V_{t_{(i,j)} \cap os_{(i,j)}}| = n \cdot |V_{t_{(i,j)^+}}| \implies \frac{|V_{t_{(i,j)^+}}|}{|V_{t_{(i,j)}}|} = \frac{1}{n},$$
where $n$ is the number of operation schemata of the form $os_o$ which are obtained from $cs_{(i,j)}$ by attaching a variable or an operation at the node with coordinate $(i, j)^+$ and for which there is an individual in $P$ fitting the operation schema $os_o$; the last implication holds under the condition that $|V_{t_{(i,j)}}| \neq 0$. This condition is, indeed, satisfied. (Suppose not. Let $(a, b)$ denote the smallest coordinate such that $|V_{t_{(a,b)}}| = 0$. Notice that $(a, b) \neq (1, 1)$ since $|V_{t_{(1,1)}}| \neq 0$: by assumption there is an individual, say $x$, in $P$ fitting the hyperschema $t_{(1,1)}$, and even if $x$ is not the 1st individual of $P$, by performing crossover of $x$ with the 1st individual of $P$ at the root node one gets a population $Q \in V_{t_{(1,1)}}$. But then $(a, b) = (i, j)^+$ for some coordinate $(i, j)$, and according to the equation above we have $|V_{t_{(i,j)}}| = n \cdot |V_{t_{(i,j)^+}}| = n \cdot |V_{t_{(a,b)}}| = 0$, which contradicts the minimality of the coordinate $(a, b)$. So we conclude that $|V_{t_{(i,j)}}| \neq 0$.) Thereby we have $\frac{|V_{t_{(i,j)^+}}|}{|V_{t_{(i,j)}}|} = \frac{1}{n}$. But $cs_{(i,j)} = \bigcup_{o \text{ an operation or a variable}} os_o \implies cs_{(i,j)}(P) = \bigcup_{o \text{ an operation or a variable}} os_o(P)$. Since we assumed that all of the operations and variables are distinct, at most one individual in $P$ fits the operation schema $os_o$, and it now follows that $|cs_{(i,j)}(P)|$ equals the number of operation schemata of the form $os_o$ such that $os_o(P) \neq \emptyset$, which is precisely the number $n$. We finally obtain
$$\frac{|V_{t_{(i,j)^+}}|}{|V_{t_{(i,j)}}|} = \frac{1}{|cs_{(i,j)}(P)|},$$
which is precisely the conclusion of the lemma.

^7 These conditions can be slightly relaxed, but we try to present the main idea only.
Remark 4.8.2 Given an individual $u$ and a population $P$ consisting of $m$ individuals, observe that the number of individuals fitting the hyperschema $t_{(1,1)}$ is the same in every population from $[P]_A$, i.e. $\forall\, Q \in [P]_A$ we have $|t_{(1,1)}(Q)| = |t_{(1,1)}(P)| = 1$. It follows immediately now that $\frac{1}{m} E_{\mu_{[P]_A}}(X(2, S_{t_{(1,1)}})) = \frac{1}{m}$.
We now combine corollary 4.8.3, remark 4.8.2 and lemma 4.8.4 to obtain the following special case of the Geiringer theorem for nonlinear GP under homologous crossover, in the case when all of the operations appearing in the individuals of the initial population $P$ are distinct:
$$\lim_{t\to\infty} \Phi(u, P, t) = \frac{1}{m} \cdot \prod_{(i,j) \text{ is not the maximal coordinate of } u} \frac{1}{|cs_{(i,j)}(P)|} = \prod_{v \text{ is a node of } u} \frac{1}{|cs(v)(P)|}$$
(recall that when $v$ is the root node of $u$, $cs(v)$ determines the entire search space, and so $\frac{1}{|cs(v)(P)|} = \frac{1}{m}$).

To obtain the general case, suppose we are given an initial population $P$. For every node $v$ of $u$ consider the set of operations $O(v) = \{o \mid |os_o(P)| \geq 1$, where $os_o$ is obtained from $cs(v)$ as in definition 4.7.1$\}$. Moreover, for every operation (or variable) $o \in O(v)$, let $x^o_1, x^o_2, \ldots, x^o_{|os_o(P)|}$ denote an enumeration of the individuals in $P$ fitting the operation schema $os_o$. Relabel the operation $o$ occurring at the node $v$ of $x^o_i$ by the formally different operation $(o, i)$ (i.e. by the ordered pair $(o, i)$ whose first element is the operation $o$ itself and whose second element is the index telling us in which individual of $P$ the operation $o$ labels the node $v$). After all of the relabelling is complete we obtain a new population $P'$ which is special with respect to the individual $u$ in the sense of definition 4.8.6. Formally speaking, we expand our signature $\Sigma = (\Sigma_1, \Sigma_2, \ldots, \Sigma_N)$ as in definition 4.6.1 by adding the operations (variables) $(o, i)$ into $\Sigma_j$, where $j$ is the arity of the operation $o$. This gives us a new signature $\Sigma' = (\Sigma'_1, \Sigma'_2, \ldots, \Sigma'_N)$ where
$$\Sigma'_j = \{o \mid o \in \Sigma_j \text{ and } o \notin \bigcup_{v \text{ is a node of } u} O(v)\} \cup \{(o, i) \mid o \in O(v) \text{ for some } v \text{ and } 1 \leq i \leq |os_o(P)|\}.$$
Denote by $\Omega'$ the search space induced by the signature $\Sigma'$. The natural projection maps $p_j : \Sigma'_j \to \Sigma_j$, sending $o \mapsto o$ when $o \notin \bigcup_{v \text{ is a node of } u} O(v)$ and $(o, i) \mapsto o$ when $o \in O(v)$ for some node $v$ of $u$, induce the natural "deletion of the extra labels" projection of the search spaces $\varphi : \Omega' \to \Omega$, where the individual $\varphi(w) \in \Omega$ is obtained from the individual $w \in \Omega'$ by replacing the label of every node of $w$ with its image under $p_j$, where $j$ is the arity of that node. It is easily seen that the natural projection $\varphi$ commutes with the crossover transformations in the sense that for any individuals $x, y \in \Omega'$ and for any crossover transformation $T \in G$ (see definition 4.6.9) we have $\varphi(T(x, y)) = T(\varphi(x), \varphi(y))$.^8 Notice also that the population $P$ can be obtained from the population $P'$ by applying the natural projection $\varphi$ to every individual of $P'$. Therefore, running the algorithm with the initial population $P$ is the same thing as running the algorithm with the initial population $P'$ and reading the output by applying the natural projection $\varphi$. The special case does apply to the population $P'$, as mentioned above, and so we have
$$\lim_{t\to\infty} \Phi(u, P, t) = \sum_{w \in \varphi^{-1}(u)} \lim_{t\to\infty} \Phi(w, P', t) = \sum_{w \in \varphi^{-1}(u)} \prod_{v \text{ is a node of } w} \frac{1}{|cs(v)(P')|}.$$
Notice that $w \in \varphi^{-1}(u)$ precisely when the underlying shape schema of $w$ is the same as that of $u$ (call this shape schema $t_u$) and the label of every node $v$ of $w$ is $(o, i)$, where $o$ is the label of the node $v$ of $u$. According to the way the population $P'$ was introduced, there are precisely $|os(v)(P)|$ such labels (see also definition 4.7.1). We can then identify the preimage $\varphi^{-1}(u)$ with the set $\prod_{j=1}^{K} \{i \mid 1 \leq i \leq |os(v_j)(P)|\}$ of ordered $K$-tuples of integers, where $K$ is the number of nodes in the parse tree of $u$ and $v_1, v_2, \ldots, v_K$ is any fixed enumeration of the nodes of $u$, in the following manner: the identification map $\imath : \prod_{j=1}^{K} \{i \mid 1 \leq i \leq |os(v_j)(P)|\} \to \varphi^{-1}(u)$ sends a given ordered $K$-tuple $(i_1, i_2, \ldots, i_K)$ into the tree $w = \imath((i_1, i_2, \ldots, i_K))$ whose underlying shape schema is $t_u$ and the label of whose node $v_j$ is $(o_j, i_j)$, where $o_j$ is the label of the node $v_j$ in the parse tree of $u$. We finally obtain:

^8 Of course, formally speaking, the two transformations $T$ involved in the equation above are distinct, as they have different domains ($\Omega'$ and $\Omega$ respectively), but they are determined by the same set of shape schemata and the same choice of nodes for crossover, so we denote them by the same symbol.
$$\lim_{t\to\infty} \Phi(u, P, t) = \sum_{w \in \varphi^{-1}(u)} \prod_{v \text{ is a node of } w} \frac{1}{|cs(v)(P')|} = \sum_{(i_1, i_2, \ldots, i_K) \in \prod_{j=1}^{K} \{i \mid 1 \leq i \leq |os(v_j)(P)|\}} \;\; \prod_{v \text{ is a node of } u} \frac{1}{|cs(v)(P)|} =$$
$$= \sum_{i_1=1}^{|os(v_1)(P)|} \sum_{i_2=1}^{|os(v_2)(P)|} \ldots \sum_{i_K=1}^{|os(v_K)(P)|} \; \prod_{v \text{ is a node of } u} \frac{1}{|cs(v)(P)|} = \prod_{j=1}^{K} \sum_{i_j=1}^{|os(v_j)(P)|} \frac{1}{|cs(v_j)(P)|} = \prod_{v \text{ is a node of } u} \frac{|os(v)(P)|}{|cs(v)(P)|}$$
(note that $|cs(v)(P')| = |cs(v)(P)|$, since configuration schemata constrain only the shape of an individual), which is precisely the assertion of theorem 4.7.1.
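Once the counts $|os(v)(P)|$ and $|cs(v)(P)|$ are known for every node of $u$, the limit is a plain product. A minimal sketch (ours, not from the deliverable; the counts are invented for illustration, with the root's configuration schema covering the whole search space so that its count equals the population size $m$):

```java
import java.util.List;

public class TreeGeiringer {

    // Each entry is {|os(v)(P)|, |cs(v)(P)|} for one node v of u.
    static double limitingFrequency(List<int[]> counts) {
        double f = 1.0;
        for (int[] c : counts) f *= (double) c[0] / c[1];
        return f;
    }

    public static void main(String[] args) {
        int m = 6; // population size
        List<int[]> counts = List.of(
                new int[]{2, m},   // root node: |cs(root)(P)| = m
                new int[]{1, 3},   // an interior node
                new int[]{2, 4});  // a leaf node
        System.out.println(limitingFrequency(counts));
    }
}
```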
4.9 What Does Theorem 4.5.2 Tell Us in the Presence of
Mutation for Nonlinear GP?
In general, mutation is an elementary step of type 2 (see definition 4.2.5) which is determined by a reproduction 1-tuple of the form $(\Omega, \mathcal{M}, p, \mu_m)$, where $\mathcal{M}$ is a family of functions on $\Omega$. Notice that the set of partitions of a set of $m$ elements into 1-element subsets consists of exactly one element, the partition into singletons. This forces $\mu_m$ to be the trivial probability distribution. We shall, therefore, omit it from the notation:

Definition 4.9.1 A mutation 1-tuple is a reproduction 1-tuple $(\Omega, \mathcal{M}, p)$ where $\mathcal{M}$ consists of functions on $\Omega$ and $1_\Omega \in \mathcal{M}$. (Here $1_\Omega : \Omega \to \Omega$ denotes the identity map.) An ergodic mutation 1-tuple is a mutation 1-tuple $(\Omega, \mathcal{M}, p)$ such that $\forall\, x, y \in \Omega$ $\exists\, M \in \mathcal{M}$ with $M(x) = y$ and $p(M) > 0$.
The following fact is a rather simple consequence of theorem 4.5.2 (see corollaries 7.1 and 7.2 of [5]):

Corollary 4.9.1 Let $A$ denote a bijective and self-transient algorithm which involves an elementary step determined by an ergodic mutation. Then the Markov chain associated to the algorithm $A$ is irreducible and the unique stationary distribution of this Markov chain is uniform. In particular, the limiting frequency of occurrence of any given individual $x$ is $\lim_{t\to\infty} \Phi(\{x\}, P, t) = \frac{1}{|\Omega|}$ (see definition 4.5.6 for the meaning of $\Phi(\{x\}, P, t)$).
When dealing with nonlinear GP, depending on the circumstances, one may want to consider different types of mutation. Below we define one such possible mutation:
Definition 4.9.2 Let $\Omega$ denote the search space for nonlinear GP over the signature $\Sigma = (\Sigma_0, \Sigma_1, \Sigma_2, \ldots, \Sigma_N)$, where the $\Sigma_i$'s are finite sets (see definition 4.6.1). Consider a configuration schema $t$ and a node $v$ of $t$. Let $i$ denote the arity of the node $v$ and let $\pi$ denote a permutation of $\Sigma_i$. We define a node mutation transformation $M_{t, v, \pi} : \Omega \to \Omega$ to be the function which sends a given program tree $u$ which fits the schema $t$ to the program $M_{t, v, \pi}(u)$ obtained from $u$ by replacing the label $a \in \Sigma_i$ of the node $v$ of $u$ with $\pi(a)$ whenever $u$ fits the configuration schema $t$ (if $u$ does not fit the schema $t$ then $M_{t, v, \pi}(u) = u$). We define the family of node mutations
$$\mathcal{M}_{node} = \{M_{t, v, \pi} : \Omega \to \Omega \mid \pi \in S_{\Sigma_i}, \text{ where } t \text{ is a configuration schema and } i \text{ is the arity of the node } v \text{ of } t\}.$$
As usual, $S_X$ denotes the set of all permutations of the set $X$. Denote by $\Omega_{NodeMut} = (\Omega, \mathcal{M}_{node}, p)$ the corresponding mutation 1-tuple.
Although the node mutation described in definition 4.9.2 above is not ergodic in the sense of definition 4.9.1, it defines a bijective elementary step (see definition 4.5.3). Indeed, it is easy to see that the transformation $M_{t, v, \pi^{-1}}$ is a 2-sided inverse of the transformation $M_{t, v, \pi}$. Thereby theorem 4.5.2 applies to nonlinear GP with homologous crossover and node mutation. It is also possible to derive a formula for the limiting frequency of occurrence of a given individual $u$, namely $\lim_{t\to\infty} \Phi(u, P, t)$, much in the same way as in theorem 4.7.1. In order to state the corresponding result for nonlinear GP with homologous crossover and node mutation, it is convenient to introduce the following definitions first:
Definition 4.9.3 Fix an individual $u \in \Omega$. Let $v$ denote a node of $u$ and consider the configuration schema $cs(v, i)$ obtained from the configuration schema $cs(v)$ by attaching a node of degree $i$, together with its $i$ children, in place of the $\#$ sign at the node corresponding to $v$ of $cs(v)$. The newly attached nodes are then labelled by the $=$ and $\#$ signs respectively. If the newly attached node is of arity 0 then it is a leaf node labelled by the $=$ sign.^9 Furthermore, write $cs(v, u)$ in place of $cs(v, i)$ when $i$ is the arity of the node $v$ in $u$. Also denote by $os(v, o)$ the operation schema obtained from the configuration schema $cs(v)$ by attaching a node labelled by the operation $o$, together with its appropriate number of children, in place of the $\#$ sign. The children of the newly attached node (if there are any) are labelled by the $\#$ signs.
Definition 4.9.4 Given a mutation 1-tuple $(\Omega, \mathcal{M}, p)$, a configuration schema $t$ (see definition 4.7.1), and a node $v$ of $t$ having arity $i$, denote by $G(t, v)$ the group generated by all the permutations $\pi \in S_{\Sigma_i}$ such that $p(M_{s, v, \pi}) > 0$ for some configuration schema $s$ such that the common region of $s$ and $t$ contains the node $v$. Fix an operation $a \in \Sigma_i$. Let
$$O(t, v, a) = \{o \in \Sigma_i \mid \exists\, g \in G(t, v) \text{ with } g \cdot a = o\}$$
denote the orbit of the operation $a$ under the action of the group $G(t, v)$.
^9 Formally speaking, according to definition 4.7.1, $cs(v, i)$ is not always a Poli hyperschema, since it may contain a leaf node labelled by the $=$ sign. However, such a schema also defines a subset of the search space $\Omega$ in much the same way as Poli hyperschemata do. The only difference is that a leaf node labelled by the $=$ sign can be replaced by a variable only; one cannot attach a nontrivial program tree to it.
Suppose we are given a population $P$ of size $m$ consisting of program trees from $\Omega$. Recall from definition 4.7.2 that we denote by $|H(P)|$ the number of individuals in the population $P$ fitting the schema $H$.
Theorem 4.9.2 Let $A$ denote an algorithm determined by two elementary steps of type 2, one of which is determined by the node mutation (see definition 4.9.2) and the other one by a homologous GP crossover. Suppose every one of the transformations in the family $G$ of GP homologous crossovers has a positive probability of being chosen.^10 Fix an individual (a program tree) $u \in \Omega$ and an initial population $P$. Let $o(u, v)$ denote the operation labelling the node $v$ of the program tree $u$. Denote by $\hat{u}$ the configuration schema obtained from the shape schema $\tilde{u}$ of $u$ (see definition 4.6.5) by labelling all the interior nodes of $u$ with the $=$ signs and all the leaf nodes with the $\#$ signs. Suppose that the probability distribution on the collection of node mutations is such that whenever $v$ is a node of $u$ we have $p(M_{s, v, \pi}) > 0 \implies s = cs(v)$, where, as before, $cs(v)$ is the configuration schema of $u$ corresponding to the node $v$.^11 Then we have
$$\lim_{t\to\infty} \Phi(u, P, t) = \prod_{v \text{ is a node of } u} \frac{\sum_{o \in O(\hat{u}, v, o(u, v))} |os(v, o)(P)|}{\sum_{i=0}^{N} \sum_{o \in \Sigma_i} |os(v, o)(P)| \cdot |O(\hat{u}, v, o)|}.$$
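Each factor of the product in theorem 4.9.2 depends only on the counts $|os(v, o)(P)|$ and the orbit sizes $|O(\hat{u}, v, o)|$. The sketch below is our own illustration with invented operation names and counts; the second call in `main` checks that with singleton orbits the factor collapses to the ratio $|os(v)(P)|/|cs(v)(P)|$ of theorem 4.7.1:

```java
import java.util.Map;
import java.util.Set;

public class MutationGeiringer {

    // One factor of the product in theorem 4.9.2 for a fixed node v:
    // osCount.get(o)   = |os(v, o)(P)|  (operations absent from the map count 0),
    // orbitSize.get(o) = |O(u-hat, v, o)|,
    // orbitOfLabel     = O(u-hat, v, o(u, v)).
    static double nodeFactor(Map<String, Integer> osCount,
                             Map<String, Integer> orbitSize,
                             Set<String> orbitOfLabel) {
        int num = 0, den = 0;
        for (Map.Entry<String, Integer> e : osCount.entrySet()) {
            if (orbitOfLabel.contains(e.getKey())) num += e.getValue();
            den += e.getValue() * orbitSize.getOrDefault(e.getKey(), 1);
        }
        return (double) num / den;
    }

    public static void main(String[] args) {
        // Invented counts at one node whose label has orbit {+, -}.
        Map<String, Integer> osCount = Map.of("+", 2, "-", 1, "*", 1);
        Map<String, Integer> orbitSize = Map.of("+", 2, "-", 2, "*", 1);
        System.out.println(nodeFactor(osCount, orbitSize, Set.of("+", "-")));

        // Singleton orbits: the factor collapses to |os(v)(P)| / |cs(v)(P)|.
        Map<String, Integer> singletons = Map.of("+", 1, "-", 1, "*", 1);
        System.out.println(nodeFactor(osCount, singletons, Set.of("+")));
    }
}
```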
Proof: The proof of theorem 4.9.2 is very similar to the proof of theorem 4.7.1 and contains essentially no new ideas. We leave most of the details to the interested reader as an exercise and provide only a rough outline. Just like theorem 4.7.1, theorem 4.9.2 follows from corollary 4.8.3 by considering the nested sequence of hyperschemata $t_{(1,1)} \supseteq t_{(2,1)} \supseteq \ldots \supseteq t_{(i,j)} \supseteq t_{(i,j)^+} \supseteq \ldots \supseteq t_{(l,k)} = u$ corresponding to the program $u$ (see definition 4.8.4). First we consider a special case in which every set $\Sigma_i$ consists of ordered pairs $(l, o)$, where $l$ is an integer, and mutation is allowed to modify only the operation $o$ and is not allowed to change the integer $l$. We then prove theorem 4.9.2 in the special case when all the labels contained in the initial population $P$ have distinct first coordinates. The general case then follows by introducing the extra integer labels for the first coordinates, applying the special case, and then "erasing the integer part from the labels" in exactly the same way as was done in the proof of theorem 4.7.1. The main difference lies in the claim proved inside lemma 4.8.4. The corresponding claim for the proof of theorem 4.9.2 says the following:
Lemma 4.9.3 Fix a node $v$ with coordinate $(i, j)^+$. Fix any two operation schemata $os(a)$ and $os(b)$ which are obtained from $cs_{(i,j)}$ by attaching either a variable or an operation at the node with coordinate $(i, j)^+$. Suppose there exist individuals in $P$ fitting both $os(c)$ and $os(d)$, where $a \in O(cs_{(i,j)}, v, c)$ and $b \in O(cs_{(i,j)}, v, d)$. Then $|V_{t_{(i,j)}} \cap V_{os(a)}| = |V_{t_{(i,j)}} \cap V_{os(b)}|$.
Just like the claim inside lemma 4.8.4, lemma 4.9.3 is proved by constructing an explicit bijection between the sets $V_{t_{(i,j)}} \cap V_{os(a)}$ and $V_{t_{(i,j)}} \cap V_{os(b)}$. The only difference is that these bijections make use of mutations as well as crossover.
^10 Again we remark that this condition can be slightly relaxed, but it does not introduce any new ideas of interest.

^11 Notice that this implies that $O(\hat{u}, v, o(u, v)) = O(cs(v), v, o(u, v))$.
It may be worth mentioning that theorem 4.7.1 is a special case of theorem 4.9.2. Indeed, if the only mutation transformations chosen with positive probability are those defined by the identity permutations, then every orbit $O(\hat{u}, v, o)$ consists of exactly one element, so that for every $v$ and $o$ we have $|O(\hat{u}, v, o)| = 1$. To compress the language, we shall use $\sqcup$ to denote the union of disjoint sets. We then have $\bigsqcup_{o \in O(\hat{u}, v, o(u, v))} os(v, o)(P) = os(v)(P)$, since $o(u, v)$ is the only operation inside of $O(\hat{u}, v, o(u, v))$, so that $os(v)(P)$ is the only contributor to the disjoint union above. Moreover, we also have $cs(v) = \bigsqcup_{i=0}^{N} \bigsqcup_{o \in \Sigma_i} os(v, o)$, so that we obtain
$$\sum_{i=0}^{N} \sum_{o \in \Sigma_i} |os(v, o)(P)| \cdot |O(\hat{u}, v, o)| = \sum_{i=0}^{N} \sum_{o \in \Sigma_i} |os(v, o)(P)| \cdot 1 = |cs(v)(P)|.$$
The formula in theorem 4.9.2 now simplifies to the one in theorem 4.7.1.
Example 4.9.1 Continuing with example 4.7.2, suppose the signature $\Sigma$ is defined as follows: $\Sigma = (\Sigma_0, \Sigma_1, \Sigma_2)$ where $\Sigma_0 = \{x, y, z, w\}$, $\Sigma_1 = \{\sin, \cos, \tan, \cot, \ln\}$ and $\Sigma_2 = \{+, -, *, \oslash\}$, where $\oslash$ is a binary operation symbol different from $+$, $-$ and $*$. (Of course, the semantics of the binary operation $\oslash$ is irrelevant to the content of this example, but if the reader feels more comfortable with a concrete interpretation, they may assume, for instance, that $x \oslash y = \frac{y}{x} e^{\xi^2}$.) Now suppose the individual $u$ is the program $+(\ln(x), \oslash(x, y))$ pictured below (with nodes enumerated just like in example 4.7.1); it has the same shape schema as the individual in that example:
Notice that this individual has exactly the same set of configuration schemata as the corresponding set of configuration schemata in example 4.7.1 (the reader may see these configuration schemata pictured in that example). Suppose that the following mutation transformations are the only ones chosen with positive probability:
$$M_{cs(v_1), v_1, (+,-)}, \;\; M_{cs(v_1), v_1, (*,\oslash)}, \;\; M_{cs(v_2), v_2, (\ln,\sin,\cos)(\tan,\cot)}, \;\; M_{cs(v_2), v_2, (\tan,\cot)(\sin,\cos)},$$
$$M_{cs(v_2), v_2, (\ln,\sin)}, \;\; M_{cs(v_3), v_3, (+,-,*)}, \;\; M_{cs(v_3), v_3, (*,\oslash)}, \;\; M_{cs(v_4), v_4, (x,w)(y,z)}, \;\; M_{cs(v_4), v_4, (+,*)(-,\oslash)},$$
$$M_{cs(v_5), v_5, (x,y,z)}, \;\; M_{cs(v_5), v_5, (+,-,*,\oslash)}, \;\; M_{cs(v_6), v_6, (x,w)(y,z)}, \;\; M_{cs(v_6), v_6, (x,w)},$$
$$M_{cs(v_6), v_6, (\sin,\cos,\tan,\cot)}, \;\; \text{and} \;\; M_{cs(v_6), v_6, (+,-)}.$$
Here we represent permutations in terms of their "disjoint cycle decompositions": for example, $(\ln, \sin, \cos)(\tan, \cot)$ represents the permutation on $\Sigma_1$ which sends $\ln$ into $\sin$, $\sin$ into $\cos$ and $\cos$ back into $\ln$. Likewise, it sends $\tan$ into $\cot$ and $\cot$ back into $\tan$. If a cycle has length one (i.e. the element appearing in the cycle is a fixed point of the corresponding permutation) we omit that cycle. For example, $*$ and $\oslash$ are the fixed points of the permutation $(+, -)$. The corresponding permutation groups are:
$$G(cs(v_1), v_1) = \langle (+,-), \; (*,\oslash) \rangle,$$
$$G(cs(v_2), v_2) = \langle (\ln,\sin,\cos)(\tan,\cot), \; (\tan,\cot)(\sin,\cos), \; (\ln,\sin) \rangle,$$
$$G(cs(v_3), v_3) = \langle (+,-,*), \; (*,\oslash) \rangle,$$
$$G(cs(v_4), v_4) = \langle (x,w)(y,z), \; (+,*)(-,\oslash) \rangle,$$
$$G(cs(v_5), v_5) = \langle (x,y,z), \; (+,-,*,\oslash) \rangle, \text{ and}$$
$$G(cs(v_6), v_6) = \langle (x,w)(y,z), \; (x,w), \; (\sin,\cos,\tan,\cot), \; (+,-) \rangle.$$
The cycle decomposition makes it easy to compute the corresponding orbits:
$$O(cs(v_1), v_1, +) = \{+, -\}, \quad O(cs(v_2), v_2, \ln) = \{\ln, \sin, \cos\},$$
$$O(cs(v_3), v_3, \oslash) = \{+, -, *, \oslash\}, \quad O(cs(v_4), v_4, x) = \{x, w\},$$
$$O(cs(v_5), v_5, x) = \{x, y, z\} \quad \text{and} \quad O(cs(v_6), v_6, y) = \{y, z\}.$$
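These orbit computations can be checked mechanically. The following sketch (ours, not part of the deliverable) encodes each generator as a map built from its disjoint cycles and closes the orbit by repeatedly applying the generators; for permutations of a finite set, this forward closure already yields the group orbit, since each inverse is a power of its generator:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Orbits {

    // Build a permutation from one cycle, e.g. (ln, sin, cos); omitted elements are fixed.
    static Map<String, String> cycle(String... elems) {
        Map<String, String> p = new HashMap<>();
        for (int k = 0; k < elems.length; k++) p.put(elems[k], elems[(k + 1) % elems.length]);
        return p;
    }

    // Orbit of a under the group generated by the given permutations.
    static Set<String> orbit(String a, List<Map<String, String>> generators) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(a);
        queue.add(a);
        while (!queue.isEmpty()) {
            String x = queue.poll();
            for (Map<String, String> g : generators) {
                String y = g.getOrDefault(x, x); // elements outside the map are fixed
                if (seen.add(y)) queue.add(y);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Generators of G(cs(v2), v2) from example 4.9.1.
        Map<String, String> g1 = cycle("ln", "sin", "cos");
        g1.putAll(cycle("tan", "cot"));
        Map<String, String> g2 = cycle("tan", "cot");
        g2.putAll(cycle("sin", "cos"));
        Map<String, String> g3 = cycle("ln", "sin");
        System.out.println(orbit("ln", List.of(g1, g2, g3))); // the orbit {ln, sin, cos}
    }
}
```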
Now suppose the initial population is the same as in example 4.7.2. In order to apply theorem 4.9.2, for every node $v$ of $u$ we need to compute the numbers $|os(v, o)(P)|$. Recall that the schema $os(v, o)$ is obtained from the schema $cs(v)$ by attaching the operation $o$ at the node $v$ and labelling its children nodes by the $\#$ signs. For the population $P$ of example 4.7.2 it was already computed that $|os(v_1, +)(P)| = |os(v_1)(P)| = 2$. There are exactly 2 individuals (namely $x_3$ and $x_4$) fitting the schema $os(v_1, *)$, and so $|os(v_1, *)(P)| = 2$. Exactly one individual, namely $x_2$, and one individual, namely $x_6$, fit the schemata $os(v_1, \sin)$ and $os(v_1, \cos)$ respectively, and so $|os(v_1, \sin)(P)| = |os(v_1, \cos)(P)| = 1$. For all the other operations $o \in (\Sigma_0 \cup \Sigma_1 \cup \Sigma_2) - \{+, *, \sin, \cos\}$ there are no individuals in $P$ fitting the schema $os(v_1, o)$, and so we have $|os(v_1, o)(P)| = 0$. There are no individuals in $P$ fitting the schema $os(v_2, \ln) = os(v_2)$ for the individual $u$ of the current example, and no individuals fitting the schemata of the form $os(v_2, o)$ where $o \notin \{*, \sin, \cos\}$, so that for such $o$ we have $|os(v_2, o)(P)| = 0$. Moreover, there is exactly one individual, namely $x_1$, fitting the schema $os(v_2, *)$ and exactly one, namely $x_4$, fitting the schema $os(v_2, \cos)$, so that $|os(v_2, *)(P)| = |os(v_2, \cos)(P)| = 1$; exactly two individuals, namely $x_3$ and $x_5$, fit the schema $os(v_2, \sin)$, so that $|os(v_2, \sin)(P)| = 2$. Continuing in this manner with the rest of the nodes of $u$ we obtain: $|os(v_3, o)(P)| = 0$ for $o \notin \{+, *\}$; $x_1$ and $x_3$ fit $os(v_3, +)$ while $x_4$ and $x_5$ fit $os(v_3, *)$, and so $|os(v_3, +)(P)| = |os(v_3, *)(P)| = 2$; $|os(v_4, o)(P)| = 0$ for $o \notin \{x, y, +\}$; $x_3$ is the only individual fitting the schema $os(v_4, y)$, $x_4$ is the only individual fitting the schema $os(v_4, x)$, and $x_5$ is the only individual fitting the schema $os(v_4, +)$, and so we have $|os(v_4, x)(P)| = |os(v_4, y)(P)| = |os(v_4, +)(P)| = 1$; $|os(v_5, o)(P)| = 0$ for $o \notin \{*, x, y\}$; $x_1$ is the only individual fitting the schema $os(v_5, *)$ and $x_5$ is the only individual fitting the schema $os(v_5, y)$, while the individuals $x_3$ and $x_4$ are the only two which fit the schema $os(v_5, x)$, so that we have $|os(v_5, *)(P)| = |os(v_5, y)(P)| = 1$ and $|os(v_5, x)(P)| = 2$; $|os(v_6, o)(P)| = 0$ for $o \notin \{\sin, x, y\}$; moreover, $x_1$ is the only individual fitting the schema $os(v_6, \sin)$ and $x_4$ is the only individual fitting the schema $os(v_6, y)$, while the individuals $x_3$ and $x_5$ are the only two which fit the schema $os(v_6, x)$, so that we have $|os(v_6, \sin)(P)| = |os(v_6, y)(P)| = 1$ and $|os(v_6, x)(P)| = 2$.
Finally, for every node v and for every operation o ∈ Σ₀ ∪ Σ₁ ∪ Σ₂ such that |os(v, o)(P)| ≠ 0 we need to compute |O(û, v, o)|. For the node v₁ these are O(û, v₁, +) = O(cs(v₁), v₁, +) = {+, −}, so that |O(cs(v₁), v₁, +)| = 2, and, likewise, from the description of the groups G(cs(vᵢ), vᵢ) given above it is easy to compute that |O(cs(v₁), v₁, ∗)| = 2 (the orbit of ∗ consists of ∗ together with the remaining binary operation of Σ₂), O(cs(v₁), v₁, sin) = {sin} and O(cs(v₁), v₁, cos) = {cos}, so that

|O(cs(v₁), v₁, sin)| = |O(cs(v₁), v₁, cos)| = 1.

O(cs(v₂), v₂, ∗) = {∗}, O(cs(v₂), v₂, sin) = O(cs(v₂), v₂, cos) = {ln, sin, cos} and so

|O(cs(v₂), v₂, sin)| = |O(cs(v₂), v₂, cos)| = 3;

O(cs(v₃), v₃, ∗) = O(cs(v₃), v₃, +) = Σ₂ so that

|O(cs(v₃), v₃, ∗)| = |O(cs(v₃), v₃, +)| = 4;

O(cs(v₄), v₄, x) = {x, w} and O(cs(v₄), v₄, y) = {y, z} so that

|O(cs(v₄), v₄, x)| = |O(cs(v₄), v₄, y)| = 2;

O(cs(v₄), v₄, +) = {+, ∗} so that |O(cs(v₄), v₄, +)| = 2; O(cs(v₅), v₅, ∗) = Σ₂, O(cs(v₅), v₅, x) = O(cs(v₅), v₅, y) = {x, y, z} and so

|O(cs(v₅), v₅, x)| = |O(cs(v₅), v₅, y)| = 3;

O(cs(v₆), v₆, sin) = {sin, cos, tan, cot}, O(cs(v₆), v₆, x) = {x, w} and, finally, O(cs(v₆), v₆, y) = {y, z} so that

|O(cs(v₆), v₆, x)| = |O(cs(v₆), v₆, y)| = 2.
Now we are ready to compute the ratios of the form

∑_{o ∈ O(û, vⱼ, o(u, vⱼ))} |os(vⱼ, o)(P)| / ∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(vⱼ, o)(P)| · |O(û, vⱼ, o)|.

From the data computed above we have

∑_{o ∈ O(û, v₁, o(u, v₁))} |os(v₁, o)(P)| = |os(v₁, +)(P)| + |os(v₁, −)(P)| = 2 + 0 = 2

(x₁ and x₅ are the only two individuals in P which fit the schema os(v₁, +) while no individual in P fits os(v₁, −));

∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₁, o)(P)| · |O(û, v₁, o)| = |os(v₁, +)(P)| · |O(û, v₁, +)| + |os(v₁, sin)(P)| · |O(û, v₁, sin)| + |os(v₁, ∗)(P)| · |O(û, v₁, ∗)| + |os(v₁, cos)(P)| · |O(û, v₁, cos)| = 2·2 + 1·1 + 2·2 + 1·1 = 10

and so

∑_{o ∈ O(û, v₁, o(u, v₁))} |os(v₁, o)(P)| / ∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₁, o)(P)| · |O(û, v₁, o)| = 2/10 = 1/5.
Continuing in this manner, we obtain

∑_{o ∈ O(û, v₂, o(u, v₂))} |os(v₂, o)(P)| = |os(v₂, ln)(P)| + |os(v₂, sin)(P)| + |os(v₂, cos)(P)| = 0 + 2 + 1 = 3

and

∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₂, o)(P)| · |O(û, v₂, o)| = |os(v₂, ∗)(P)| · |O(û, v₂, ∗)| + |os(v₂, sin)(P)| · |O(û, v₂, sin)| + |os(v₂, cos)(P)| · |O(û, v₂, cos)| = 1·1 + 2·3 + 1·3 = 10

so that

∑_{o ∈ O(û, v₂, o(u, v₂))} |os(v₂, o)(P)| / ∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₂, o)(P)| · |O(û, v₂, o)| = 3/10.

Likewise,

∑_{o ∈ O(û, v₃, o(u, v₃))} |os(v₃, o)(P)| = ∑_{o ∈ Σ₂} |os(v₃, o)(P)| = 0 + 2 + 0 + 2 = 4

(only + and ∗ contribute) and

∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₃, o)(P)| · |O(û, v₃, o)| = |os(v₃, +)(P)| · |O(û, v₃, +)| + |os(v₃, ∗)(P)| · |O(û, v₃, ∗)| = 2·4 + 2·4 = 16

and so

∑_{o ∈ O(û, v₃, o(u, v₃))} |os(v₃, o)(P)| / ∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₃, o)(P)| · |O(û, v₃, o)| = 4/16 = 1/4.
∑_{o ∈ O(û, v₄, o(u, v₄))} |os(v₄, o)(P)| = |os(v₄, x)(P)| + |os(v₄, w)(P)| = 1 + 0 = 1

and

∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₄, o)(P)| · |O(û, v₄, o)| = |os(v₄, y)(P)| · |O(û, v₄, y)| + |os(v₄, x)(P)| · |O(û, v₄, x)| + |os(v₄, +)(P)| · |O(û, v₄, +)| = 1·2 + 1·2 + 1·2 = 6

and so

∑_{o ∈ O(û, v₄, o(u, v₄))} |os(v₄, o)(P)| / ∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₄, o)(P)| · |O(û, v₄, o)| = 1/6.

∑_{o ∈ O(û, v₅, o(u, v₅))} |os(v₅, o)(P)| = |os(v₅, x)(P)| + |os(v₅, y)(P)| + |os(v₅, z)(P)| = 2 + 1 + 0 = 3

and

∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₅, o)(P)| · |O(û, v₅, o)| = |os(v₅, ∗)(P)| · |O(û, v₅, ∗)| + |os(v₅, x)(P)| · |O(û, v₅, x)| + |os(v₅, y)(P)| · |O(û, v₅, y)| = 1·4 + 2·3 + 1·3 = 13

and so

∑_{o ∈ O(û, v₅, o(u, v₅))} |os(v₅, o)(P)| / ∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₅, o)(P)| · |O(û, v₅, o)| = 3/13.
Finally,

∑_{o ∈ O(û, v₆, o(u, v₆))} |os(v₆, o)(P)| = |os(v₆, y)(P)| + |os(v₆, z)(P)| = 1 + 0 = 1

and

∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₆, o)(P)| · |O(û, v₆, o)| = |os(v₆, sin)(P)| · |O(û, v₆, sin)| + |os(v₆, x)(P)| · |O(û, v₆, x)| + |os(v₆, y)(P)| · |O(û, v₆, y)| = 1·4 + 2·2 + 1·2 = 10

so that

∑_{o ∈ O(û, v₆, o(u, v₆))} |os(v₆, o)(P)| / ∑_{i=0}^{N} ∑_{o ∈ Σᵢ} |os(v₆, o)(P)| · |O(û, v₆, o)| = 1/10.

Now we finally compute the product of these ratios and obtain:

lim_{t→∞} Φ(u, P, t) = ∏_{i=1}^{6} [ ∑_{o ∈ O(û, vᵢ, o(u, vᵢ))} |os(vᵢ, o)(P)| / ∑_{j=0}^{N} ∑_{o ∈ Σⱼ} |os(vᵢ, o)(P)| · |O(û, vᵢ, o)| ] = (1/5) · (3/10) · (1/4) · (1/6) · (3/13) · (1/10) = 3/52000.
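The product of ratios just computed can be checked mechanically. The following sketch (illustrative only — the class and method names are ours, not part of the deliverable's Java code of section 2.3) multiplies the six per-node ratios with exact integer arithmetic:

```java
import java.math.BigInteger;

public class LimitFrequency {
    // Multiply a list of fractions {numerator, denominator} exactly
    // and reduce the result by the greatest common divisor.
    static BigInteger[] product(long[][] ratios) {
        BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
        for (long[] r : ratios) {
            num = num.multiply(BigInteger.valueOf(r[0]));
            den = den.multiply(BigInteger.valueOf(r[1]));
        }
        BigInteger g = num.gcd(den);
        return new BigInteger[] { num.divide(g), den.divide(g) };
    }

    public static void main(String[] args) {
        // The six per-node ratios computed above: 2/10, 3/10, 4/16, 1/6, 3/13, 1/10.
        long[][] ratios = { {2, 10}, {3, 10}, {4, 16}, {1, 6}, {3, 13}, {1, 10} };
        BigInteger[] p = product(ratios);
        System.out.println(p[0] + "/" + p[1]); // prints 3/52000
    }
}
```

Exact rational arithmetic avoids the rounding that plain `double` multiplication would introduce here.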
At the opposite extreme is the case when every mutation transformation in the family M_node has a positive probability of being chosen. In this case O(û, v, o(u, v)) = Σᵢ where i is the arity of the operation o. In particular, for every operation o we have o ∈ O(û, v, o(u, v)) if and only if o ∈ Σᵢ. But then we have

⋃_{o ∈ O(û, v, o(u, v))} os(v, o) = cs(v, u)

and so

∑_{o ∈ O(û, v, o(u, v))} |os(v, o)(P)| = |cs(v, u)(P)|.

We also have ⋃_{o ∈ Σᵢ} os(v, o) = cs(v, i) so that

∑_{o ∈ Σᵢ} |os(v, o)(P)| · |O(û, v, o)| = |Σᵢ| · ∑_{o ∈ Σᵢ} |os(v, o)(P)| = |cs(v, i)(P)| · |Σᵢ|.
Combining these equations with theorem 4.9.2 we obtain:

Corollary 4.9.4 Let A denote an algorithm determined by 2 elementary steps of type 2, one of which is determined by the node mutation (see definition 4.9.2) and the other one by a homologous GP crossover. Suppose every one of the transformations in the family G of GP homologous crossovers has a positive probability of being chosen. Suppose also that for every node v of u of arity i and for every permutation π of Σᵢ we have p(M_{û,v}) = p(M_{cs(v),v,π}) > 0. Fix an individual (a program tree) u ∈ Ω and an initial population P. Then we have

lim_{t→∞} Φ(u, P, t) = ∏_{v is a node of u} |cs(v, u)(P)| / ∑_{i=0}^{N} |cs(v, i)(P)| · |Σᵢ|.
Example 4.9.2 Suppose the signature Σ = (Σ₀, Σ₁, Σ₂), the initial population P and the individual u are exactly as in example 4.9.1. Now suppose, unlike in example 4.9.1, that for every permutation π of Σᵢ we have p(M_{û,v}) > 0. Now corollary 4.9.4 applies and we can compute the frequency of occurrence of the individual u according to the formula given there. To apply this formula we need to compute the numbers |cs(vⱼ, i)(P)| where 1 ≤ j ≤ 6 and 0 ≤ i ≤ 2 (the numbers |cs(v, u)(P)| are among these). The configuration schemata cs(vᵢ) for the individual u are exactly the same as those for the individual of example 4.7.1 (since these two individuals have the same underlying shape schema) and they are pictured in that example. Below we display only those schemata cs(vⱼ, i) for which |cs(vⱼ, i)(P)| ≠ 0. According to definition 4.9.3 they are obtained from the corresponding schemata cs(vᵢ) by attaching a node which has i children (if i = 0 it has no children) in place of the # sign at the node vᵢ (in the case i = 0 this means that the # sign can be replaced with an arbitrary variable but not with an operation symbol):
There are exactly 2 individuals, namely x₂ and x₆, in P fitting the schema cs(v₁, 1), so that |cs(v₁, 1)(P)| = 2; x₁, x₃, x₄ and x₅ are the only individuals in P which fit the schema cs(v₁, 2) = cs(v₁, u), so that |cs(v₁, 2)(P)| = |cs(v₁, u)(P)| = 4; x₃, x₄ and x₅ are the only individuals in P which fit the schema cs(v₂, 1) = cs(v₂, u), so that |cs(v₂, 1)(P)| = |cs(v₂, u)(P)| = 3; x₁ is the only individual fitting the schema cs(v₂, 2), so that |cs(v₂, 2)(P)| = 1; x₁, x₃, x₄ and x₅ are the only individuals in P which fit the schema cs(v₃, 2) = cs(v₃, u), so that |cs(v₃, 2)(P)| = |cs(v₃, u)(P)| = 4; x₃ and x₄ are the only individuals in P which fit the schema cs(v₄, 0) = cs(v₄, u), so that |cs(v₄, 0)(P)| = |cs(v₄, u)(P)| = 2; x₅ is the only individual fitting the schema cs(v₄, 2) and so |cs(v₄, 2)(P)| = 1; x₃, x₄ and x₅ are the only individuals in P fitting the schemata cs(v₅, 0) = cs(v₅, u) and cs(v₆, 0) = cs(v₆, u), and so we have |cs(v₅, 0)| = |cs(v₅, u)| = |cs(v₆, 0)| = |cs(v₆, u)| = 3, while x₁ is the only individual fitting both of the schemata cs(v₅, 2) and cs(v₆, 1), so that |cs(v₅, 2)| = |cs(v₆, 1)| = 1. From the definition of the signature Σ = (Σ₀, Σ₁, Σ₂) in example 4.9.1 we see that |Σ₀| = 4, |Σ₁| = 5 and |Σ₂| = 4. We are now ready to compute the ratios:
|cs(v₁, u)(P)| / ∑_{i=0}^{2} |cs(v₁, i)(P)| · |Σᵢ| = |cs(v₁, u)(P)| / ( |cs(v₁, 2)(P)| · |Σ₂| + |cs(v₁, 1)(P)| · |Σ₁| ) = 4 / (4·4 + 2·5) = 2/13.

|cs(v₂, u)(P)| / ∑_{i=0}^{2} |cs(v₂, i)(P)| · |Σᵢ| = |cs(v₂, u)(P)| / ( |cs(v₂, 2)(P)| · |Σ₂| + |cs(v₂, 1)(P)| · |Σ₁| ) = 3 / (1·4 + 3·5) = 3/19.

|cs(v₃, u)(P)| / ∑_{i=0}^{2} |cs(v₃, i)(P)| · |Σᵢ| = |cs(v₃, u)(P)| / ( |cs(v₃, 2)(P)| · |Σ₂| ) = 4 / (4·4) = 1/4.
|cs(v₄, u)(P)| / ∑_{i=0}^{2} |cs(v₄, i)(P)| · |Σᵢ| = |cs(v₄, u)(P)| / ( |cs(v₄, 0)(P)| · |Σ₀| + |cs(v₄, 2)(P)| · |Σ₂| ) = 2 / (2·4 + 1·4) = 1/6.

|cs(v₅, u)(P)| / ∑_{i=0}^{2} |cs(v₅, i)(P)| · |Σᵢ| = |cs(v₅, u)(P)| / ( |cs(v₅, 0)(P)| · |Σ₀| + |cs(v₅, 2)(P)| · |Σ₂| ) = 3 / (3·4 + 1·4) = 3/16.

|cs(v₆, u)(P)| / ∑_{i=0}^{2} |cs(v₆, i)(P)| · |Σᵢ| = |cs(v₆, u)(P)| / ( |cs(v₆, 0)(P)| · |Σ₀| + |cs(v₆, 1)(P)| · |Σ₁| ) = 3 / (3·4 + 1·5) = 3/17.

And now corollary 4.9.4 tells us that

lim_{t→∞} Φ(u, P, t) = ∏_{i=1}^{6} |cs(vᵢ, u)(P)| / ∑_{j=0}^{2} |cs(vᵢ, j)(P)| · |Σⱼ| = (2/13) · (3/19) · (1/4) · (1/6) · (3/16) · (3/17) = 9/268736.
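The formula of corollary 4.9.4 lends itself to a direct computation from the cs-counts. A sketch (class and method names are ours, not part of the deliverable's code) reproducing the arithmetic of example 4.9.2:

```java
import java.math.BigInteger;

public class Corollary494 {
    // Limiting frequency per corollary 4.9.4:
    //   product over nodes of |cs(v,u)(P)| / sum_i |cs(v,i)(P)| * |Sigma_i|.
    // counts[j][i] = |cs(v_{j+1}, i)(P)|; numer[j] = |cs(v_{j+1}, u)(P)|.
    static BigInteger[] limitFrequency(int[][] counts, int[] numer, int[] sigmaSize) {
        BigInteger num = BigInteger.ONE, den = BigInteger.ONE;
        for (int j = 0; j < counts.length; j++) {
            long d = 0;
            for (int i = 0; i < sigmaSize.length; i++) d += (long) counts[j][i] * sigmaSize[i];
            num = num.multiply(BigInteger.valueOf(numer[j]));
            den = den.multiply(BigInteger.valueOf(d));
        }
        BigInteger g = num.gcd(den);
        return new BigInteger[] { num.divide(g), den.divide(g) };
    }

    public static void main(String[] args) {
        int[] sigma = {4, 5, 4};  // |Sigma_0|, |Sigma_1|, |Sigma_2| from example 4.9.1
        // Nonzero counts |cs(v_j, i)(P)| for j = 1..6 listed above (zeros filled in):
        int[][] counts = { {0,2,4}, {0,3,1}, {0,0,4}, {2,0,1}, {3,0,1}, {3,1,0} };
        int[] numer = {4, 3, 4, 2, 3, 3};  // |cs(v_j, u)(P)|
        BigInteger[] p = limitFrequency(counts, numer, sigma);
        System.out.println(p[0] + "/" + p[1]); // prints 9/268736
    }
}
```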
It is possible to introduce mutation operators for nonlinear GP which are ergodic in the sense of definition 4.9.1, but the easiest thing to do is probably just to define the family M_erg to be the family of all permutations of the search space Ω. The probability distribution p must then be concentrated on some subset of M_erg which satisfies the ergodicity requirement of definition 4.9.1. This would ensure that corollary 4.9.1 applies.
4.10 What Can Be Said in the Presence of Selection in the General Case?

Theorem 4.5.2, established in [5], which allows us to deduce results such as theorems 4.7.1 and 4.9.2, applies only in the absence of selection. The theme of the remainder of the current paper is to establish a few basic properties of the Markov chains associated to evolutionary algorithms in the presence of fitness-proportional selection (as described in definition 4.2.2). Throughout the rest of the paper we shall break up our algorithm, call it A, into sub-algorithms and then consider their composition. This idea will be made clear below:
Proposition 4.10.1 Denote by A an evolutionary algorithm determined by the cycle (s₁, s₂, ..., sₙ). Fix i with 1 < i < n and let B and C denote the sub-algorithms determined by the cycles (s₁, s₂, ..., sᵢ) and (s_{i+1}, s_{i+2}, ..., sₙ) respectively. Recall from section 4.5 that {pᴬ_{xy}}_{x,y∈Ω^m}, {pᴮ_{xy}}_{x,y∈Ω^m} and {pᶜ_{xy}}_{x,y∈Ω^m} denote the Markov transition matrices associated to the algorithms A, B and C respectively. Then we have

{pᴬ_{xy}}_{x,y∈Ω^m} = {pᶜ_{xy}}_{x,y∈Ω^m} · {pᴮ_{xy}}_{x,y∈Ω^m}

where · denotes the usual matrix multiplication.
Proof: Denote by λ a probability distribution on Ω^m. Completing a cycle of the algorithm A amounts to completing a cycle of B and then completing a cycle of C. The next-generation probability distribution upon the completion of the cycle of B with the input distribution λ is {pᴮ_{xy}}_{x,y∈Ω^m} · λ. Likewise, the next-generation distribution obtained upon the completion of a cycle of the algorithm C with the input distribution {pᴮ_{xy}}_{x,y∈Ω^m} · λ is just

{pᶜ_{xy}}_{x,y∈Ω^m} · ({pᴮ_{xy}}_{x,y∈Ω^m} · λ) = ({pᶜ_{xy}}_{x,y∈Ω^m} · {pᴮ_{xy}}_{x,y∈Ω^m}) · λ

which means that {pᴬ_{xy}}_{x,y∈Ω^m} = {pᶜ_{xy}}_{x,y∈Ω^m} · {pᴮ_{xy}}_{x,y∈Ω^m}, since the equation above holds for an arbitrary input distribution λ.
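Proposition 4.10.1 can be illustrated numerically. A minimal sketch with hypothetical 2-state matrices (our own helper names; `apply` follows the convention of the text, namely the output distribution assigns to y the value ∑_x λ(x)·p_{xy}):

```java
public class ComposeChains {
    // p[x][y] = transition probability from state x to state y.
    static double[] apply(double[][] p, double[] lambda) {
        double[] out = new double[lambda.length];
        for (int y = 0; y < out.length; y++)
            for (int x = 0; x < lambda.length; x++)
                out[y] += lambda[x] * p[x][y];
        return out;
    }

    // Transition matrix of "one cycle of B, then one cycle of C".
    static double[][] compose(double[][] c, double[][] b) {
        int n = b.length;
        double[][] a = new double[n][n];
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                for (int z = 0; z < n; z++)
                    a[x][y] += b[x][z] * c[z][y];
        return a;
    }

    public static void main(String[] args) {
        double[][] b = { {0.5, 0.5}, {0.2, 0.8} };
        double[][] c = { {0.9, 0.1}, {0.3, 0.7} };
        double[] lambda = {0.4, 0.6};
        // Running B then C agrees (up to rounding) with the composed matrix:
        double[] twoCycles = apply(c, apply(b, lambda));
        double[] oneMatrix = apply(compose(c, b), lambda);
        System.out.println(twoCycles[0] + " " + twoCycles[1]);
        System.out.println(oneMatrix[0] + " " + oneMatrix[1]);
    }
}
```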
We now proceed to study the effects of selection alone. First of all it is convenient to observe the following general fact:

Definition 4.10.1 Let {p_{xy}}_{x,y∈X} denote a Markov transition matrix on a finite set X. Fix x ∈ X. We define the transition support of x to be the set S(x) = {z | p_{zx} > 0} of all states z from which it is possible to get to x.

Definition 4.10.2 Let {p_{xy}}_{x,y∈X} denote a Markov transition matrix on a finite set X. Fix x and y ∈ X. We say that y ≽ x if S(y) ⊇ S(x) and for every z ∈ S(y) we have p_{zy} ≥ p_{zx}. Moreover, if, in addition, either S(y) ⊋ S(x) or p_{zy} > p_{zx} for some z ∈ S(x), we write y ≻ x.
Proposition 4.10.2 below provides the reason for definitions 4.10.1 and 4.10.2:

Proposition 4.10.2 Let {p_{xy}}_{x,y∈X} denote a Markov transition matrix on a finite set X. Fix u and v ∈ X with u ≽ v and an input probability distribution λ on X. Denote by ρ the output distribution (ρ = {p_{xy}}_{x,y∈X} · λ). Then we have ρ(u) ≥ ρ(v). Moreover, if u ≻ v and λ(x) > 0 for every x ∈ X, then ρ(u) > ρ(v).

Proof: This is a straightforward verification of the definitions:

ρ(u) = ∑_{z∈X} λ(z) p_{zu} = ∑_{z∈S(u)} λ(z) p_{zu} ≥ ∑_{z∈S(v)} λ(z) p_{zv} = ∑_{z∈X} λ(z) p_{zv} = ρ(v),

where the middle inequality holds since S(u) ⊇ S(v) and p_{zu} ≥ p_{zv} for every z ∈ S(v); it becomes strict when u ≻ v and λ(x) > 0 for every x ∈ X.
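Definitions 4.10.1 and 4.10.2 translate directly into code. A small sketch (our own helper names, with hypothetical numbers; states are indexed 0..n−1 and p[z][u] is the transition probability from z to u):

```java
public class Dominance {
    // y ≽ x: S(y) ⊇ S(x) and p[z][y] >= p[z][x] for all z in S(y).
    // Both conditions together amount to the pointwise test below, since
    // p[z][y] >= p[z][x] > 0 already forces z to lie in S(y) whenever z is in S(x).
    static boolean weaklyDominates(double[][] p, int y, int x) {
        for (int z = 0; z < p.length; z++)
            if (p[z][y] < p[z][x]) return false;
        return true;
    }

    // y ≻ x: y ≽ x with either S(y) a proper superset of S(x) or a strict
    // inequality somewhere; both cases show up as p[z][y] > p[z][x] for some z.
    static boolean strictlyDominates(double[][] p, int y, int x) {
        if (!weaklyDominates(p, y, x)) return false;
        for (int z = 0; z < p.length; z++)
            if (p[z][y] > p[z][x]) return true;
        return false;
    }

    public static void main(String[] args) {
        double[][] p = { {0.2, 0.5, 0.3}, {0.1, 0.6, 0.3}, {0.0, 0.7, 0.3} };
        System.out.println(weaklyDominates(p, 1, 0));   // prints true
        System.out.println(strictlyDominates(p, 1, 0)); // prints true
        System.out.println(weaklyDominates(p, 0, 1));   // prints false
    }
}
```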
The following mild technical condition on a Markov transition matrix (which is easily satisfiable by most transition matrices modeling crossover and mutation) will extend proposition 4.10.2.

Definition 4.10.3 We call a Markov transition matrix {q_{xy}}_{x,y∈X} non-annihilating if for every y ∈ X there exists x ∈ X such that q_{xy} > 0.
The main reason for introducing definition 4.10.3 is the following fact:

Proposition 4.10.3 A given Markov transition matrix {q_{xy}}_{x,y∈X} is non-annihilating if and only if for every input probability distribution λ on X with λ(x) > 0 for every x ∈ X, the output distribution ρ = {q_{xy}}_{x,y∈X} · λ also satisfies the property that ρ(x) > 0 for every x ∈ X.

Proof: Given an input distribution λ with λ(x) > 0 for every x ∈ X and any state y ∈ X, we have ρ(y) = ∑_{x∈X} λ(x) · q_{xy} > 0 if and only if λ(z) · q_{zy} > 0 for some z ∈ X, if and only if q_{zy} > 0 for some z ∈ X (since λ(z) > 0 automatically by assumption), if and only if the transition matrix {q_{xy}}_{x,y∈X} is non-annihilating.
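The non-annihilating condition is a simple column check. A sketch (hypothetical class name and matrices; q[x][y] is the transition probability from x to y):

```java
public class NonAnnihilating {
    // A transition matrix q is non-annihilating if every state y has some
    // predecessor x with q[x][y] > 0 (definition 4.10.3).
    static boolean isNonAnnihilating(double[][] q) {
        for (int y = 0; y < q.length; y++) {
            boolean reachable = false;
            for (int x = 0; x < q.length; x++)
                if (q[x][y] > 0) { reachable = true; break; }
            if (!reachable) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] ok  = { {0.5, 0.5}, {1.0, 0.0} }; // state 1 is reachable from state 0
        double[][] bad = { {1.0, 0.0}, {1.0, 0.0} }; // nothing ever reaches state 1
        System.out.println(isNonAnnihilating(ok));   // prints true
        System.out.println(isNonAnnihilating(bad));  // prints false
    }
}
```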
Before proceeding any further it is worthwhile to mention that a product of non-annihilating transition matrices is non-annihilating:

Corollary 4.10.4 Given non-annihilating Markov transition matrices {q_{xy}}_{x,y∈X} and {m_{xy}}_{x,y∈X}, the matrix {r_{xy}}_{x,y∈X} = {m_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} is non-annihilating as well.

Proof: Given an input distribution λ with λ(x) > 0 for every x ∈ X, since {q_{xy}}_{x,y∈X} is non-annihilating, the "intermediate" output distribution μ = {q_{xy}}_{x,y∈X} · λ also has the property that μ(x) > 0 for all x ∈ X. Now

ρ = {r_{xy}}_{x,y∈X} · λ = {m_{xy}}_{x,y∈X} · ({q_{xy}}_{x,y∈X} · λ) = {m_{xy}}_{x,y∈X} · μ

also has the property that ρ(x) > 0 for all x ∈ X, since it is an output of μ under the non-annihilating transition matrix {m_{xy}}_{x,y∈X}. By proposition 4.10.3, the transition matrix {r_{xy}}_{x,y∈X} is non-annihilating as well.
Though quite elementary, proposition 4.10.2 readily implies subtle and rather general inequalities about the stationary distributions of the Markov chains for which the last elementary step is selection:

Corollary 4.10.5 Let {q_{xy}}_{x,y∈X} and {p_{xy}}_{x,y∈X} denote Markov transition matrices on a finite set X. Fix u and v ∈ X with u ≽ v, where the ≽ and ≻ relations are meant with respect to the matrix {p_{xy}}_{x,y∈X}, and an input probability distribution λ on X. Denote by ρ the output distribution of the composed matrix {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} (ρ = {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} · λ). Then we have ρ(u) ≥ ρ(v). Suppose, in addition, the matrix {q_{xy}}_{x,y∈X} is non-annihilating. Now, if u ≻ v and λ(x) > 0 for every x ∈ X, then ρ(u) > ρ(v).
Proof: Since

{p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} · λ = {p_{xy}}_{x,y∈X} · ({q_{xy}}_{x,y∈X} · λ) = {p_{xy}}_{x,y∈X} · μ

where μ = {q_{xy}}_{x,y∈X} · λ, the desired conclusions follow by applying proposition 4.10.2 to the matrix {p_{xy}}_{x,y∈X} and the input distribution μ. For the second conclusion we use the assumption that {q_{xy}}_{x,y∈X} is non-annihilating to deduce that μ(x) > 0 for every x.
As an almost immediate consequence we deduce the following fact:

Corollary 4.10.6 Let {q_{xy}}_{x,y∈X}, {m_{xy}}_{x,y∈X} and {p_{xy}}_{x,y∈X} denote Markov transition matrices on a finite set X. Suppose that the matrices {q_{xy}}_{x,y∈X} and {p_{xy}}_{x,y∈X} are non-annihilating, while the matrix {m_{xy}}_{x,y∈X} has the property that m_{xy} > 0 for all x, y ∈ X. Fix u and v ∈ X with u ≽ v, where the ≽ and ≻ relations are meant with respect to the matrix {p_{xy}}_{x,y∈X}. Then the Markov chain determined by either one of the composed matrices {p_{xy}}_{x,y∈X} · {m_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} or {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} · {m_{xy}}_{x,y∈X} is irreducible. Let π denote the unique stationary distribution of the composed chain. Then we have π(u) ≥ π(v). Moreover, if u ≻ v, then π(u) > π(v) (note that π(x) > 0 for every x ∈ X since the composed chain is irreducible).
Proof: The irreducibility of the composed chain is left as an exercise for the reader. As a hint, the reader may notice that, from the assumptions that {m_{xy}}_{x,y∈X} has all positive entries while {p_{xy}}_{x,y∈X} and {q_{xy}}_{x,y∈X} are non-annihilating, it follows that every one of the composed matrices has all positive entries and, hence, determines an irreducible Markov chain. The second conclusion follows from the fact that {p_{xy}}_{x,y∈X} is the leftmost matrix in the composition, by applying corollary 4.10.5 to the matrix {p_{xy}}_{x,y∈X} · {a_{xy}}_{x,y∈X}, where {a_{xy}}_{x,y∈X} = {m_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} or {a_{xy}}_{x,y∈X} = {q_{xy}}_{x,y∈X} · {m_{xy}}_{x,y∈X}, with π being the input distribution, which is then also the output distribution by stationarity. The condition of corollary 4.10.5 is satisfied thanks to corollary 4.10.4.
When applying corollary 4.10.6 we have in mind that {q_{xy}}_{x,y∈X} is the Markov transition matrix corresponding to recombination (i.e. a sub-algorithm determined by a single elementary step of type 2: see definitions 4.3.1 and 4.2.5), {p_{xy}}_{x,y∈X} is the Markov transition matrix corresponding to selection (i.e. a sub-algorithm determined by a single elementary step of type 1: see definition 4.2.2) and {m_{xy}}_{x,y∈X} is the Markov transition matrix corresponding to mutation. In order to apply proposition 4.10.2 to the case of fitness-proportional selection we need to determine the relations ≽ and ≻ for this special case. Although this task is not difficult, it requires a careful "bookkeeping" analysis. This will be the subject of the next section. We end the current section with an immediate consequence (basically a restatement) of corollary 4.10.6:¹²

¹² It is worth mentioning that fitness-proportional selection is not the only possible type of selection. Other elementary steps of type 1 include, for instance, tournament selection and rank selection.
Corollary 4.10.7 Suppose we are given an evolutionary algorithm A determined by the elementary steps s₁, s₂ and s₃, where s₁ and s₂ are any elementary steps (usually one of them is recombination and the other is mutation) which define non-annihilating Markov transition matrices such that one of these matrices has all positive entries, while s₃ is an elementary step of type 1, i.e. selection. As before, let {p_{xy}}_{x,y∈X} denote the Markov transition matrix determined by the elementary step s₃, and let ≽ and ≻ be defined with respect to the transition matrix {p_{xy}}_{x,y∈X}. Then the Markov chain determined by the algorithm A with state space X = Ω^m is irreducible, and its unique stationary distribution π satisfies π(x) ≥ π(y) and π(x) > π(y) whenever x ≽ y and x ≻ y respectively.
4.11 What are the relations ≽ and ≻ for the case of fitness-proportional selection?

This section is devoted to classifying the relations ≽ and ≻ for the case of fitness-proportional selection. Although this task is not difficult, it requires a careful step-by-step analysis. The reader who is interested only in the net results can read only definitions 4.11.1 and 4.11.2, example 4.11.1, theorem 4.11.5 followed by examples 4.11.3, 4.11.4, 4.11.5 and 4.11.6, and theorem 4.11.6, which is illustrated by example 4.11.7. It is recommended (but not essential for understanding) that the reader does not omit the discussion between example 4.11.6 and theorem 4.11.6. We also strongly recommend that the reader become familiar with lemma 4.11.1, since this fact is rather simple and reveals a very important step in the classification process.
Definition 4.11.1 Fix a population x = (x₁, x₂, ..., x_m) ∈ Ω^m and denote by I(x) = {x | x = xᵢ for some i with 1 ≤ i ≤ m} the set of all individuals in the population x.
Lemma 4.11.1 Given populations x and y, we have S(x) ⊇ S(y) if and only if I(x) ⊆ I(y). In particular, a necessary condition for x ≽ y is that I(x) ⊆ I(y). Moreover, if x ≽ y and I(x) ⊊ I(y), then x ≻ y.

Proof: Since individuals can only disappear (and new individuals can not appear) upon the completion of the elementary step of fitness-proportional selection (see definition 4.2.2), it follows immediately that for any populations z and w we have p_{zw} ≠ 0 if and only if I(z) ⊇ I(w). In other words, S(w) = {z | I(z) ⊇ I(w)}. It follows immediately now that if I(x) ⊆ I(y) then S(x) ⊇ S(y). On the other hand, if S(x) ⊇ S(y), then, since y ∈ S(y) ⊆ S(x), we also have I(x) ⊆ I(y) according to the characterization given above. We deduce now that I(x) ⊆ I(y) if and only if S(x) ⊇ S(y). In particular, it follows immediately from the previous statement that I(x) ⊊ I(y) if and only if S(x) ⊋ S(y). All of the remaining conclusions follow immediately from definition 4.10.2.
Definition 4.11.2 Given a population x = (x₁, x₂, ..., x_m) and an individual x ∈ I(x), denote by n(x, x) = |{i | x = xᵢ}| the number of times x occurs in the population x.

Example 4.11.1 Suppose

x = (a, a, a, b, c, a, b, b, b) and y = (b, c, c, c, a, b, b, d, b).

Then I(x) = {a, b, c} and I(y) = {a, b, c, d}. We also have

n(x, a) = n(x, b) = 4 and n(x, c) = 1.

Likewise,

n(y, a) = 1, n(y, b) = 4, n(y, c) = 3 and n(y, d) = 1.
According to definition 4.2.2, when performing fitness-proportional selection the individuals are chosen independently with probability proportional to their fitness. Thus, if z = (z₁, z₂, ..., z_m) and x = (x₁, x₂, ..., x_m) is obtained from z by performing fitness-proportional selection, then the probability that xᵢ = z for a given z ∈ I(z) is n(z, z)·f(z) / ∑_{i=1}^{m} f(zᵢ). Thus

p_{zx} = ∏_{j=1}^{m} ( n(z, xⱼ)·f(xⱼ) / ∑_{i=1}^{m} f(zᵢ) ).

Moreover, every x ∈ I(x) occurs in the above product n(x, x) times, while every z ∈ I(z) occurs n(z, z) times in the denominator sum of each of the factors, and so we deduce the following:
Proposition 4.11.2 Given populations x = (x₁, x₂, ..., x_m) and z = (z₁, z₂, ..., z_m) we have

p_{zx} = ( 1 / ∑_{z∈I(z)} n(z, z)·f(z) )^m · ∏_{x∈I(x)} (n(z, x))^{n(x,x)} · (f(x))^{n(x,x)}   if I(x) ⊆ I(z),

and p_{zx} = 0 otherwise. In particular, p_{zx} does not depend on the way the individuals in z and in x are ordered, but only depends on the sets I(x) and I(z) and the numbers n(x, x) for x ∈ I(x) and n(z, z) for z ∈ I(z). In other words, if σ and τ denote arbitrary permutations of the set {1, 2, ..., m}, and x_σ = (x_{σ(1)}, x_{σ(2)}, ..., x_{σ(m)}) and z_τ = (z_{τ(1)}, z_{τ(2)}, ..., z_{τ(m)}), then p_{z_τ x_σ} = p_{zx}.
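The slot-by-slot definition of fitness-proportional selection underlying proposition 4.11.2 is easy to implement directly. A sketch (our own illustration; individuals are strings and f is a hypothetical fitness table):

```java
import java.util.Arrays;
import java.util.Map;

public class SelectionMatrix {
    // p_{zx}: probability that fitness-proportional selection applied to the
    // population z yields the ordered population x (definition 4.2.2).
    static double pZX(String[] z, String[] x, Map<String, Double> f) {
        double total = 0;
        for (String zi : z) total += f.get(zi);                   // sum of f(z_i)
        double p = 1.0;
        for (String xj : x) {
            long n = Arrays.stream(z).filter(xj::equals).count(); // n(z, x_j)
            p *= n * f.get(xj) / total;
        }
        return p;
    }

    public static void main(String[] args) {
        Map<String, Double> f = Map.of("a", 2.0, "b", 1.0);
        String[] z = {"a", "b", "b"};                              // total fitness 4
        System.out.println(pZX(z, new String[]{"a", "a", "a"}, f)); // prints 0.125
        // Reordering x (or z) leaves p_{zx} unchanged, as proposition 4.11.2 asserts:
        System.out.println(pZX(z, new String[]{"a", "a", "b"}, f)
                == pZX(z, new String[]{"b", "a", "a"}, f));         // prints true
    }
}
```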
In order to continue the investigation of the relation ≽ for the case of fitness-proportional selection, it is convenient to introduce the following notions:

Definition 4.11.3 Given populations x and y of size m, let

I(y|x) = {y | y ∈ I(y), n(y, y) > n(x, y)}

(if y ∉ I(x) then n(x, y) = 0). Moreover, for y ∈ I(y|x) let κ(y|x, y) = n(y, y) − n(x, y).
Example 4.11.2 Continuing with example 4.11.1, we have n(x, x) > n(y, x) if and only if x = a, and so I(x|y) = {a}. Likewise, n(y, y) > n(x, y) if and only if y = c or y = d (since d ∉ I(x), according to definition 4.11.3 we have n(x, d) = 0 < 1 = n(y, d)), and so I(y|x) = {c, d}. Moreover, we also have κ(x|y, a) = 4 − 1 = 3, κ(y|x, c) = 3 − 1 = 2 and κ(y|x, d) = 1 − 0 = 1.
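Definition 4.11.3 and example 4.11.2 can be reproduced with a few lines of code. A sketch (our own helper names) computing I(y|x) together with the surpluses κ(y|x, ·) for the populations of example 4.11.1:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SurplusCounts {
    // n(x, a) = number of occurrences of individual a in population x.
    static Map<String, Integer> counts(String[] pop) {
        Map<String, Integer> n = new HashMap<>();
        for (String a : pop) n.merge(a, 1, Integer::sum);
        return n;
    }

    // For each y in I(y|x), the surplus kappa(y|x, y) = n(y,y) - n(x,y)
    // (definition 4.11.3); the key set of the result is exactly I(y|x).
    static Map<String, Integer> surplus(String[] y, String[] x) {
        Map<String, Integer> ny = counts(y), nx = counts(x), k = new TreeMap<>();
        for (Map.Entry<String, Integer> e : ny.entrySet()) {
            int diff = e.getValue() - nx.getOrDefault(e.getKey(), 0);
            if (diff > 0) k.put(e.getKey(), diff);
        }
        return k;
    }

    public static void main(String[] args) {
        String[] x = {"a","a","a","b","c","a","b","b","b"};
        String[] y = {"b","c","c","c","a","b","b","d","b"};
        System.out.println(surplus(x, y)); // prints {a=3}:      I(x|y) = {a}
        System.out.println(surplus(y, x)); // prints {c=2, d=1}: I(y|x) = {c, d}
    }
}
```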
The sets I(y|x) play a crucial role in discovering a necessary and sufficient condition for x ≽ y, in view of the fact below:
Lemma 4.11.3 Given populations z, y and x with I(z) ⊇ I(y) ⊇ I(x), we have the following:

∑_{x∈I(x|y)} κ(x|y, x) = ∑_{y∈I(y|x)} κ(y|x, y)

and

p_{zx} / p_{zy} = ∏_{x∈I(x|y)} (n(z, x))^{κ(x|y,x)} · (f(x))^{κ(x|y,x)} / ∏_{y∈I(y|x)} (n(z, y))^{κ(y|x,y)} · (f(y))^{κ(y|x,y)}.
Proof: Given populations z, y and x, from definition 4.11.3 it follows that for every x ∈ I(x) we have

n(x, x) = min(n(x, x), n(y, x))                if x ∉ I(x|y),
n(x, x) = min(n(x, x), n(y, x)) + κ(x|y, x)    if x ∈ I(x|y).

Likewise, for every y ∈ I(y) we have

n(y, y) = min(n(x, y), n(y, y))                if y ∉ I(y|x) and y ∈ I(x),
n(y, y) = min(n(x, y), n(y, y)) + κ(y|x, y)    if y ∈ I(y|x) and y ∈ I(x),
n(y, y) = κ(y|x, y)                            if y ∈ I(y|x) and y ∉ I(x).

Since there are in total m elements in every population, we must have

∑_{y∈I(y)} n(y, y) = ∑_{x∈I(x)} n(x, x) = m.

Rearranging the terms on both sides of the last equation according to the observations made above, we obtain

∑_{x∈I(x)} min(n(x, x), n(y, x)) + ∑_{x∈I(x|y)} κ(x|y, x) = ∑_{y∈I(x)} min(n(x, y), n(y, y)) + ∑_{y∈I(y|x)} κ(y|x, y)

and the first desired equation follows by subtracting ∑_{x∈I(x)} min(n(x, x), n(y, x)) from both sides. The second equation follows by rearranging the factors in the formula of proposition 4.11.2 according to the equation above and letting

k(z) = ( 1 / ∑_{z∈I(z)} n(z, z)·f(z) )^m

so that we can write

p_{zx} = k(z) · ∏_{x∈I(x)} (n(z, x))^{min(n(x,x), n(y,x))} · (f(x))^{min(n(x,x), n(y,x))} × ∏_{x∈I(x|y)} (n(z, x))^{κ(x|y,x)} · (f(x))^{κ(x|y,x)}

and, likewise,

p_{zy} = k(z) · ∏_{y∈I(x)} (n(z, y))^{min(n(x,y), n(y,y))} · (f(y))^{min(n(x,y), n(y,y))} × ∏_{y∈I(y|x)} (n(z, y))^{κ(y|x,y)} · (f(y))^{κ(y|x,y)}.

Now the common factor

k(z) · ∏_{x∈I(x)} (n(z, x))^{min(n(x,x), n(y,x))} · (f(x))^{min(n(x,x), n(y,x))}

in the numerator and the denominator of the ratio p_{zx}/p_{zy} cancels out and we obtain the desired formula.
In fact, according to lemma 4.11.1, we have x ≽ y ⟹ I(y) ⊇ I(x). Moreover, since new individuals can not appear as a result of selection, whenever z ∈ S(y) (see definition 4.10.1 for the meaning of S(y)) we must have I(z) ⊇ I(y). Therefore, x ≽ y if and only if I(y) ⊇ I(x) and for every z such that I(z) ⊇ I(y) we have p_{zx} ≥ p_{zy} (for z ∈ S(x) − S(y) the inequality p_{zx} ≥ p_{zy} = 0 holds automatically). The condition p_{zx} ≥ p_{zy} can be restated equivalently as p_{zx}/p_{zy} ≥ 1. But, thanks to lemma 4.11.3, we have

p_{zx}/p_{zy} = ∏_{x∈I(x|y)} (n(z, x))^{κ(x|y,x)} · (f(x))^{κ(x|y,x)} / ∏_{y∈I(y|x)} (n(z, y))^{κ(y|x,y)} · (f(y))^{κ(y|x,y)}

= [ ∏_{x∈I(x|y)} (n(z, x))^{κ(x|y,x)} / ∏_{y∈I(y|x)} (n(z, y))^{κ(y|x,y)} ] · [ ∏_{x∈I(x|y)} (f(x))^{κ(x|y,x)} / ∏_{y∈I(y|x)} (f(y))^{κ(y|x,y)} ] ≥ 1

⟺ ∏_{x∈I(x|y)} (n(z, x))^{κ(x|y,x)} / ∏_{y∈I(y|x)} (n(z, y))^{κ(y|x,y)} ≥ ∏_{y∈I(y|x)} (f(y))^{κ(y|x,y)} / ∏_{x∈I(x|y)} (f(x))^{κ(x|y,x)}.

Observing that

∏_{y∈I(y|x)} (f(y))^{κ(y|x,y)} / ∏_{x∈I(x|y)} (f(x))^{κ(x|y,x)}

does not depend on z at all, we deduce the following:
Lemma 4.11.4 Given populations x and y of size m, we have x ≽ y if and only if I(y) ⊇ I(x) and

min_{z : I(z) ⊇ I(y)} ∏_{x∈I(x|y)} (n(z, x))^{κ(x|y,x)} / ∏_{y∈I(y|x)} (n(z, y))^{κ(y|x,y)} ≥ ∏_{y∈I(y|x)} (f(y))^{κ(y|x,y)} / ∏_{x∈I(x|y)} (f(x))^{κ(x|y,x)}.
Thanks to lemma 4.11.4, the rest of our analysis boils down to constructing a population z which minimizes the ratio on the left-hand side over all z with I(z) ⊇ I(y). In view of proposition 4.11.2, without loss of generality we can assume that the first |I(y)| individuals of z enumerate the elements of I(y), i.e. z = (y₁, y₂, ..., y_{|I(y)|}, z₁, z₂, ..., z_{m−|I(y)|}). Our goal is then to select z₁, z₂, ..., z_{m−|I(y)|} in a way which minimizes this ratio. First, it is worth pointing out that, unless y = x_σ for some permutation σ of {1, 2, ..., m} in the sense described in proposition 4.11.2 (in which case we trivially have x ≽ y thanks to proposition 4.11.2), we can assume that I(y|x) ≠ ∅. (Indeed, according to lemma 4.11.3, we have ∑_{x∈I(x|y)} κ(x|y, x) = ∑_{y∈I(y|x)} κ(y|x, y). If I(y|x) = ∅ then ∑_{x∈I(x|y)} κ(x|y, x) = ∑_{y∈I(y|x)} κ(y|x, y) = 0, which forces I(x|y) = ∅. But then for every y ∈ I(y) we have y ∉ I(y|x), so that I(y) ⊆ I(x) and n(y, y) ≤ n(x, y); likewise, since I(x|y) = ∅, we get I(x) ⊆ I(y) and n(y, y) ≥ n(x, y). Summarizing, we obtain n(y, y) = n(x, y) for every y ∈ I(y) = I(x), which means that y = x_σ.) Next, we observe that for every i we have zᵢ ∈ I(y|x). (If not, then for some i we have zᵢ ∉ I(y|x). In such a case, replacing zᵢ with an element of I(y|x) ≠ ∅ would increase the denominator and would either decrease the numerator (in case zᵢ ∈ I(x|y)) or not influence the numerator in any way (it is clear from definition 4.11.3 that I(y|x) ∩ I(x|y) = ∅) in the ratio on the left-hand side of the inequality of lemma 4.11.4. This in turn would only decrease the ratio, so that z can not minimize it.) Since the zᵢ are chosen from the set I(y|x), and I(y|x) ∩ I(x|y) = ∅, for every x ∈ I(x|y) we have x ≠ zᵢ for 1 ≤ i ≤ m − |I(y)| and x = yⱼ for a unique j with 1 ≤ j ≤ |I(y)| (since I(y) ⊇ I(x) and y₁, y₂, ..., y_{|I(y)|} enumerate the elements of I(y)), so we have n(z, x) = 1 for every x ∈ I(x|y). But then the numerator of the ratio on the left-hand side of lemma 4.11.4 is ∏_{x∈I(x|y)} (n(z, x))^{κ(x|y,x)} = 1, and the left-hand side of the inequality in lemma 4.11.4 simplifies to

min_{z∈Q} 1 / ∏_{y∈I(y|x)} (n(z, y))^{κ(y|x,y)}

where Q = {z | z = (y₁, y₂, ..., y_{|I(y)|}, z₁, z₂, ..., z_{m−|I(y)|}), y₁, y₂, ..., y_{|I(y)|} enumerate I(y) and zᵢ ∈ I(y|x)}. Moreover, notice that the zᵢ can be chosen arbitrarily from the set I(y|x), while for every y ∈ I(y|x) there exists exactly one j with 1 ≤ j ≤ |I(y)| such that y = yⱼ. We then have n(z, y) = 1 + |{i | 1 ≤ i ≤ m − |I(y)|, zᵢ = y}| and ∑_{y∈I(y|x)} n(z, y) = |I(y|x)| + m − |I(y)|. On the other hand, given a finite sequence of natural numbers {n_y}_{y∈I(y|x)} satisfying the constraint ∑_{y∈I(y|x)} n_y = |I(y|x)| + m − |I(y)|, we can construct z = (y₁, y₂, ..., y_{|I(y)|}, z₁, z₂, ..., z_{m−|I(y)|}) with n(z, y) = n_y by picking exactly n_y − 1 of the zᵢ equal to y for every y ∈ I(y|x). All of this is summarized in the main theorem below:
All of this is summarized in the main theorem below:
Theorem 4.11.5 Given populations x and y of size m, we have x y with respect
to tness-proportional selection as described in denition 4 .2.2, if and on ly if I(y)
I(x) and
max
{n
y
}
yI(y|x)
Q(y|x)
yI(y|x)
(n
y
)
κ(y|x ,y)
xI(x|y)
(f(x))
κ(x|y ,x)
yI(y|x)
(f(y))
κ(y|x ,y)
where
Q(y|x)={{n
y
}
yI(y|x)
|∀y I(y|x) n
y
N,
yI(y|x)
n
y
= |I(y|x)|+m−|I(y)|}.
Below we illustrate theorem 4.11.5 with a few simple examples:

Example 4.11.3 Continuing with examples 4.11.1 and 4.11.2, notice that we do have I(y) ⊇ I(x) and |I(y|x)| + m − |I(y)| = 2 + 9 − 4 = 7, so according to theorem 4.11.5 we have x ≽ y if and only if

f(a)³ / (f(c)² · f(d)) ≥ max_{n_c + n_d = 7, n_c ≥ 1, n_d ≥ 1} n_c² · n_d.

There are 6 possible pairs (n_c, n_d) over which we want to maximize the product n_c² · n_d. These are (1, 6), (2, 5), (3, 4), (4, 3), (5, 2) and (6, 1). Moreover, since the power of n_c in the product is bigger than that of n_d, we only have to check 3 of these pairs: (4, 3), (5, 2) and (6, 1). The corresponding products are 4² · 3 = 48, 5² · 2 = 50 and 6² · 1 = 36. The biggest one among these is 50, and so we deduce that x ≽ y if and only if f(a)³ / (f(c)² · f(d)) ≥ 50.
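The small integer optimization in example 4.11.3 is easy to brute-force. A sketch (hypothetical class name) enumerating all pairs n_c + n_d = 7 with n_c, n_d ≥ 1:

```java
public class MaxProduct {
    // Maximum of n_c^2 * n_d over n_c + n_d = total with n_c, n_d >= 1
    // (the maximization required by theorem 4.11.5 in example 4.11.3).
    static long maxProduct(int total) {
        long best = 0;
        for (int nc = 1; nc <= total - 1; nc++) {
            long nd = total - nc;
            best = Math.max(best, (long) nc * nc * nd);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(maxProduct(7)); // prints 50, attained at (n_c, n_d) = (5, 2)
    }
}
```

For larger sets I(y|x), the same exhaustive enumeration over compositions of the total works, though the search space grows combinatorially.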
Example 4.11.4 Suppose y = (y₁, y₂, ..., y_m) with yᵢ ≠ yⱼ for i ≠ j (i.e. y is a population consisting of distinct individuals). Now let x = (x₁, x₂, ..., x_m) with I(y) ⊇ I(x). Notice that in this example I(y|x) = I(y) − I(x), while I(x|y) = {x | there is more than one i s.t. x = xᵢ}. Moreover, for every x ∈ I(x|y) we have κ(x|y, x) = n(x, x) − 1 (since n(y, x) = 1), and for every y ∈ I(y|x) we have n(y, y) = 1, so that 0 < κ(y|x, y) ≤ n(y, y) and hence κ(y|x, y) = 1. Finally, observe that the set Q(y|x) = {{1, 1, 1, ..., 1}}, since |I(y)| = m and we must have ∑_{y∈I(y|x)} n_y = |I(y|x)| + m − |I(y)| = |I(y|x)| and n_y ≥ 1, which forces n_y = 1 for every y ∈ I(y|x). According to theorem 4.11.5, we have x ≽ y if and only if

∏_{x∈I(x|y)} (f(x))^{κ(x|y,x)} / ∏_{y∈I(y|x)} (f(y))^{κ(y|x,y)} ≥ 1

if and only if

∏_{x∈I(x|y)} (f(x))^{n(x,x)−1} ≥ ∏_{y∈I(y|x)} f(y).
Example 4.11.5 Continuing with example 4.11.4, suppose, in addition, that there is exactly one individual in x which occurs more than once in this population. That is, without loss of generality, let x = (y_1, y_2, ..., y_k, y_m, y_m, ..., y_m) and y = (y_1, y_2, ..., y_k, y_{k+1}, ..., y_m) where y_i ≠ y_j for i ≠ j and k < m. This is a special case of example 4.11.4 where I(x|y) = {y_m} and I(y|x) = {y_{k+1}, y_{k+2}, ..., y_{m−1}}. According to the conclusion of example 4.11.4 we have x ≽ y if and only if
\[
(f(y_m))^{m-(k+1)}\ \ge\ \prod_{i=1}^{m-(k+1)} f(y_{k+i})
\]
if and only if
\[
f(y_m)\ \ge\ \sqrt[m-(k+1)]{\prod_{i=1}^{m-(k+1)} f(y_{k+i})}
\]
which, in words, says that x ≽ y if and only if the fitness of the unique repeated individual of x is at least as large as the geometric mean of the fitnesses of all the individuals in y which do not occur in x. It is also worth pointing out that even if the inequality above is an equality we still have x ≽ y, since I(y) ⊋ I(x) and so S(x) ⊊ S(y). In particular, even if the fitness function is flat, the relation ≽ is not the equality relation when one uses fitness-proportional selection.
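The characterization in example 4.11.5 is easy to check numerically; the sketch below (Python, with illustrative fitness values of our own choosing) compares the fitness of the repeated individual with the geometric mean:

```python
from math import prod

def repeated_dominates(f_repeated, f_absent):
    """Example-4.11.5 condition: x dominates y iff the fitness of the unique
    repeated individual of x is at least the geometric mean of the fitnesses
    of the individuals of y that do not occur in x."""
    geometric_mean = prod(f_absent) ** (1.0 / len(f_absent))
    return f_repeated >= geometric_mean

# illustrative values: the geometric mean of (2, 4, 3.5) is about 3.04
print(repeated_dominates(3.0, [2.0, 4.0, 3.5]))  # False
print(repeated_dominates(3.5, [2.0, 4.0, 3.5]))  # True
```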
Example 4.11.6 Now consider an "opposite extreme" to example 4.11.4 (in the sense of the diversity of elements in the population) where I(y) = I(x) = {x, y}. Let's say n(x, x) = k (which implies that n(x, y) = m − k) and n(y, x) = l < k (hence n(y, y) = m − l). It follows then that I(x|y) = {x} and I(y|x) = {y}. Moreover, κ(y|x, y) = κ(x|y, x) = k − l. Since |I(y|x)| = |{y}| = 1, it follows that Q(y|x) = {{k − l}}, which makes the maximization procedure trivial. According to theorem 4.11.5, we have x ≽ y if and only if
\[
(k-l)^{k-l}\ \le\ \frac{f(x)^{k-l}}{f(y)^{k-l}}
\]
if and only if f(x) ≥ (k − l) · f(y).
Theorem 4.11.5 tells us that in order to check whether x ≽ y with respect to fitness-proportional selection we need to solve an integer optimization problem subject to linear constraints. Examples 4.11.4, 4.11.5 and 4.11.6 are particularly simple mainly because the sets Q(y|x) were singletons, so there was not much choice for the maximizing domain element. Although we do not intend to pursue studying this optimization problem in much detail, since it is not the main subject of the current paper, it is worth mentioning that the method of Lagrange multipliers allows us to give an upper bound on
\[
\max_{\{n_y\}_{y\in I(y|x)}\in Q(y|x)}\ \prod_{y\in I(y|x)}(n_y)^{\kappa(y|x,y)}
\]
by letting the n_y's range over positive real numbers subject to the linear constraint Σ_{y∈I(y|x)} n_y = |I(y|x)| + m − |I(y)|. Moreover, if one wants an exact solution, then the method allows one to narrow down the choice of suitable integer sequences significantly. Indeed, according to the method of Lagrange multipliers, if any local maximum of a differentiable function f(n⃗) on an open set D ⊆ R^n subject to the constraint g(n⃗) = c, where g is another differentiable function on D, is achieved at a point q⃗, then we must have ∇f(q⃗) = λ · ∇g(q⃗), where ∇ denotes the gradient (derivative of a real-valued function) operator and λ is some constant proportionality coefficient (in other words, the gradients of f and g evaluated at the point q⃗ must be collinear vectors). In our case f, g : U ⊆ R^{I(y|x)} → R, where U = {n⃗ | n⃗ ∈ R^{I(y|x)} and n_y ≥ 0}, are defined according to the following formulas: f(n⃗) = ∏_{y∈I(y|x)} (n_y)^{κ(y|x,y)} and g(n⃗) = Σ_{y∈I(y|x)} n_y. Our goal is to maximize f subject to the constraint g(n⃗) = |I(y|x)| + m − |I(y)|, where n⃗ = {n_y}_{y∈I(y|x)}. Clearly, for every n_y we have ∂g/∂n_y = 1, and so the condition ∇f(q⃗) = λ · ∇g(q⃗) boils down to the condition ∂f/∂n_u = ∂f/∂n_v for every u and v ∈ I(y|x). For any given w ∈ I(y|x) we have
\[
\frac{\partial f}{\partial n_w}\ =\ \kappa(y|x,w)\cdot (n_w)^{\kappa(y|x,w)-1}\cdot \prod_{y\in I(y|x),\,y\ne w}(n_y)^{\kappa(y|x,y)}.
\]
Therefore, the equation ∂f/∂n_u = ∂f/∂n_v holds for every u and v ∈ I(y|x) if and only if for every u and v ∈ I(y|x) we have κ(y|x, u) · n_v = κ(y|x, v) · n_u, if and only if n_u/κ(y|x, u) = n_v/κ(y|x, v) for all u and v ∈ I(y|x). In other words, the equation ∂f/∂n_u = ∂f/∂n_v holds for every u and v ∈ I(y|x) if and only if the ratio n_y/κ(y|x, y) = α is a constant independent of y ∈ I(y|x). Moreover, this also gives us n_u = (n_v/κ(y|x, v)) · κ(y|x, u) = α · κ(y|x, u) for every u ∈ I(y|x) and, according to the constraint, we also have
\[
\sum_{y\in I(y|x)} n_y\ =\ \alpha\cdot\sum_{y\in I(y|x)}\kappa(y|x,y)\ =\ |I(y|x)|+m-|I(y)|
\]
which gives
\[
\alpha\ =\ \frac{|I(y|x)|+m-|I(y)|}{\sum_{y\in I(y|x)}\kappa(y|x,y)}.
\]
Notice that the point q⃗ with coordinates
\[
n_u\ =\ \frac{|I(y|x)|+m-|I(y)|}{\sum_{y\in I(y|x)}\kappa(y|x,y)}\cdot \kappa(y|x,u)
\]
is the unique point which satisfies ∇f(q⃗) = λ · ∇g(q⃗). We argue that this point must be the global maximum of the function f on the domain D = {n⃗ | n_y ≥ 0} ∩ g⁻¹({|I(y|x)| + m − |I(y)|}), which is a closed and bounded subset of R^{I(y|x)} and, hence, is compact. Clearly the function f is continuous on D and, since D is compact, it must achieve a minimum and a maximum on D. The only interesting case to consider is when |I(y|x)| > 1 (indeed, if |I(y|x)| = 1, then D is a singleton set whose only point is q⃗, so that it is trivially a global maximum). If the maximum of f were not attained at the point q⃗, then it would have to be attained at a point on the boundary of D (since q⃗ is the only interior point satisfying ∇f(q⃗) = λ · ∇g(q⃗)). But every boundary point of D has at least one zero coordinate, so that f(y⃗) = 0 for every boundary point y⃗ of D. On the other hand, f(q⃗) > 0. Thus we deduce that f achieves its global maximum at the point q⃗.
We then have the following sufficient condition for x ≽ y:

Theorem 4.11.6 Given populations x and y of size m, we have x ≽ y with respect to fitness-proportional selection as described in definition 4.2.2, if I(y) ⊇ I(x) and
\[
\frac{\prod_{x\in I(x|y)}(f(x))^{\kappa(x|y,x)}}{\prod_{y\in I(y|x)}(f(y))^{\kappa(y|x,y)}}\ \ge\ \left(\frac{|I(y|x)|+m-|I(y)|}{\sum_{y\in I(y|x)}\kappa(y|x,y)}\right)^{\sum_{y\in I(y|x)}\kappa(y|x,y)}\cdot \prod_{y\in I(y|x)}\kappa(y|x,y)^{\kappa(y|x,y)}
\]
Example 4.11.7 Continuing with examples 4.11.1, 4.11.2 and 4.11.3, according to theorem 4.11.6 we have x ≽ y (recall that we do have I(y) ⊇ I(x)) if
\[
\frac{f(a)^3}{f(c)^2\cdot f(d)}\ \ge\ \left(\frac{|I(y|x)|+m-|I(y)|}{\kappa(y|x,c)+\kappa(y|x,d)}\right)^{\kappa(y|x,c)+\kappa(y|x,d)}\cdot \kappa(y|x,c)^{\kappa(y|x,c)}\cdot \kappa(y|x,d)^{\kappa(y|x,d)}\ =\ \left(\frac{7}{3}\right)^3\cdot 2^2\cdot 1^1\ =\ \frac{1372}{27}\ =\ 50.\overline{814}.
\]
Notice that the bound is only slightly larger than the exact one given in example 4.11.3. Moreover, although theorem 4.11.6 itself only provides a numerical bound, the method of Lagrange multipliers which was used to establish it suggests how one can narrow down the search for the optimizing choice of coefficients: consider only the pairs (n_c, n_d) with integer coordinates which are closest, in "every direction", to the point with coordinates
\[
n_u\ =\ \frac{|I(y|x)|+m-|I(y)|}{\sum_{y\in I(y|x)}\kappa(y|x,y)}\cdot\kappa(y|x,u).
\]
In our specific example, this point is ((7/3) · 2, (7/3) · 1) = (14/3, 7/3) ≈ (4.67, 2.33), and so the only potential candidates are (4, 3) and (5, 2). We saw in example 4.11.3 that the point (5, 2) is the winner. Of course, this "narrowing down" procedure is particularly useful in cases when |I(y|x)| is a large number.
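For the running example, the quality of the bound of theorem 4.11.6 and the "narrowing down" of candidates can both be checked directly; the sketch below (Python, variable names ours) reproduces the numbers above:

```python
from math import prod

# data of examples 4.11.3 and 4.11.7: kappa(y|x,c) = 2, kappa(y|x,d) = 1,
# and the n_y's must sum to |I(y|x)| + m - |I(y)| = 7
kappas = (2, 1)
total = 7

# exact integer maximum of n_c^2 * n_d over n_c + n_d = 7, n_c, n_d >= 1
exact = max(nc ** 2 * (total - nc) for nc in range(1, total))

# Lagrange-multiplier upper bound of theorem 4.11.6
alpha = total / sum(kappas)                    # = 7/3
bound = alpha ** sum(kappas) * prod(k ** k for k in kappas)

# the real-valued maximizer alpha * kappa tells us which integer pairs to try
candidates = tuple(alpha * k for k in kappas)  # (14/3, 7/3)

print(exact, round(bound, 3))  # 50 50.815
```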
4.12 What Can Be Said when the Last Elementary Step is Mutation?
Although not nearly as much can be said when the last elementary step is mutation, the following result is a rather general "anti-communism" theorem. It should be noted that a much stronger and more informative result, which depends on the assumption that crossover is "pure" in the sense of [14] (meaning that an identical pair of parents produces a pair of identical children), will be established in a sequel paper.
Theorem 4.12.1 Let {q_{xy}}_{x,y∈X} and {p_{xy}}_{x,y∈X} denote Markov transition matrices on a finite set X. Suppose {q_{xy}}_{x,y∈X} is non-annihilating in the sense of definition 4.10.3. Also let {{m^δ_{xy}}_{x,y∈X} | 0 < δ < 1} denote an indexed family of Markov transition matrices such that for every ε > 0 there exists r > 0 such that for all δ < r we have ‖{m^δ_{xy}}_{x,y∈X} − I‖ < ε for some norm on the finite-dimensional vector space of |X| × |X| matrices.¹³ Suppose also that for all δ with 0 < δ < 1 the composed Markov chain M(δ) = {m^δ_{xy}}_{x,y∈X} · {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} is irreducible. Let ≽ denote the relation associated with the Markov transition matrix {p_{xy}}_{x,y∈X} (see definition 4.10.2). Finally, let π_δ denote the unique stationary distribution of the Markov chain M(δ). Then, for all small enough δ, either there exists a state z ∈ X such that π_δ(z) < 1/|X| or, whenever x ≽ y, we also have π_δ(x) > π_δ(y). In particular, as long as ≽ is not the equality relation, the stationary distribution of the Markov chain determined by the transition matrix M(δ) is never uniform for all sufficiently small "mutation rates" δ.
Proof: Denote by Λ = {{λ_z}_{z∈X} | Σ_{z∈X} λ_z = 1, λ_z ≥ 0} the probability simplex and let Λ_{1/|X|} = {{λ_z}_{z∈X} | Σ_{z∈X} λ_z = 1, λ_z ≥ 1/(2|X|)}. For any given x ≽ y in X consider the function f_{x,y} : Λ_{1/|X|} → R which sends a given λ ∈ Λ_{1/|X|} to the number
\[
\{p_{xy}\}_{x,y\in\mathcal{X}}\cdot\{q_{xy}\}_{x,y\in\mathcal{X}}(\lambda)(x)\ -\ \{p_{xy}\}_{x,y\in\mathcal{X}}\cdot\{q_{xy}\}_{x,y\in\mathcal{X}}(\lambda)(y),
\]
which is positive thanks to proposition 4.10.5. From basic point-set topology we know that the set Λ_{1/|X|} is a compact topological space (it is a closed and bounded subset of R^{|X|} with |X| < ∞) and, moreover, the function f_{x,y} is continuous (it is a restriction of a linear map). It follows then that the function f_{x,y} achieves a minimum, min(f_{x,y}), on Λ_{1/|X|}. Thanks to proposition 4.10.5 this minimum must be a positive number, since the matrix {q_{xy}}_{x,y∈X} is non-annihilating and every λ ∈ Λ_{1/|X|} has the property that λ(z) ≥ 1/(2|X|) > 0 for every z ∈ X. We now conclude that
\[
\alpha\ =\ \min\bigl\{\min\{f_{x,y}(\lambda)\mid \lambda\in\Lambda_{1/|\mathcal{X}|}\}\ \bigm|\ x\succeq y\in\mathcal{X}\bigr\}\ >\ 0.
\]
Now choose r > 0 small enough so that whenever 0 < δ < r we have
\[
\|\{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}-I\|_{op}\ <\ \min\Bigl\{\frac{\alpha}{3},\ \frac{1}{3|\mathcal{X}|}\Bigr\}.
\]
¹³It is a fact that all norms on finite-dimensional vector spaces are equivalent. It is therefore irrelevant which norm we consider. For practical applications it is convenient to use ‖{a_{xy}}_{x,y∈X}‖_max = max{|a_{xy}| | x, y ∈ X}. For the purpose of proving the theorem it seems most convenient to use the operator norm defined as ‖{a_{xy}}_{x,y∈X}‖_op = sup{‖{a_{xy}}_{x,y∈X}(v)‖ | ‖v‖ = 1}, where ‖(v_1, v_2, ..., v_{|X|})‖ = Σ_{i=1}^{|X|} |v_i|.
Choose any δ satisfying 0 < δ < r. Now there are exactly two mutually exclusive and exhaustive cases:

Case 1: there exists n ∈ ℕ such that {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} · M(δ)^n(Λ) ⊆ Λ_{1/|X|}.
In this case, let γ_δ = {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} · M(δ)^n(π_δ). Since π_δ is the stationary distribution of M(δ) (see the statement of the theorem), it is also the stationary distribution of M(δ)^{n+1}, and it follows that
\[
\{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}(\gamma_\delta)\ =\ \{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}\cdot\bigl(\{p_{xy}\}_{x,y\in\mathcal{X}}\cdot\{q_{xy}\}_{x,y\in\mathcal{X}}\cdot\mathcal{M}(\delta)^{n}(\pi_\delta)\bigr)\ =\ \bigl(\{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}\cdot\{p_{xy}\}_{x,y\in\mathcal{X}}\cdot\{q_{xy}\}_{x,y\in\mathcal{X}}\bigr)\cdot\mathcal{M}(\delta)^{n}(\pi_\delta)\ =\ \mathcal{M}(\delta)^{n+1}(\pi_\delta)\ =\ \pi_\delta
\]
and so
\[
\|\gamma_\delta-\pi_\delta\|\ =\ \|(I-\{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}})(\gamma_\delta)\|\ \le\ \|I-\{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}\|_{op}\ <\ \frac{\alpha}{3}.
\]
In particular, for all x ≽ y in X we have
\[
|\gamma_\delta(x)-\pi_\delta(x)|\ <\ \frac{\alpha}{3}\quad\text{and}\quad |\gamma_\delta(y)-\pi_\delta(y)|\ <\ \frac{\alpha}{3}
\]
so that π_δ(x) ≥ γ_δ(x) − α/3 and π_δ(y) ≤ γ_δ(y) + α/3 and, finally,
\[
\pi_\delta(x)-\pi_\delta(y)\ \ge\ \gamma_\delta(x)-\frac{\alpha}{3}-\Bigl(\gamma_\delta(y)+\frac{\alpha}{3}\Bigr)\ =\ \gamma_\delta(x)-\gamma_\delta(y)-\frac{2\alpha}{3}\ \ge\ \alpha-\frac{2\alpha}{3}\ =\ \frac{\alpha}{3}\ >\ 0
\]
thanks to the choice of α, and it follows, in this case, that π_δ(x) > π_δ(y).
Case 2: for all n ∈ ℕ we have {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} · M(δ)^n(Λ) ⊈ Λ_{1/|X|}.
In this case, we first claim that for every n ∈ ℕ there exists a distribution γ ∈ M(δ)^{n+1}(Λ) such that γ(z) < 5/(6|X|) for some z ∈ X. Indeed, the assumption of case 2 says that there exists a distribution λ ∈ {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} · M(δ)^n(Λ) such that λ(z) < 1/(2|X|). But then γ = {m^δ_{xy}}_{x,y∈X}(λ) is a distribution with the desired property. Indeed, since λ ∈ {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} · M(δ)^n(Λ), it follows that λ = {p_{xy}}_{x,y∈X} · {q_{xy}}_{x,y∈X} · M(δ)^n(η) for some distribution η ∈ Λ. But then
\[
\gamma\ =\ \{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}\cdot\bigl(\{p_{xy}\}_{x,y\in\mathcal{X}}\cdot\{q_{xy}\}_{x,y\in\mathcal{X}}\cdot\mathcal{M}(\delta)^{n}(\eta)\bigr)\ =\ \bigl(\{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}\cdot\{p_{xy}\}_{x,y\in\mathcal{X}}\cdot\{q_{xy}\}_{x,y\in\mathcal{X}}\bigr)\cdot\mathcal{M}(\delta)^{n}(\eta)\ =\ \mathcal{M}(\delta)^{n+1}(\eta)\ \in\ \mathcal{M}(\delta)^{n+1}(\Lambda).
\]
Moreover, since δ < r we have
\[
|\gamma(z)-\lambda(z)|\ \le\ \|\gamma-\lambda\|\ =\ \|\{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}(\lambda)-\lambda\|\ =\ \|(\{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}-I)(\lambda)\|\ \le\ \|\{m^{\delta}_{xy}\}_{x,y\in\mathcal{X}}-I\|_{op}\ <\ \frac{1}{3|\mathcal{X}|}.
\]
But then we also have γ(z) ≤ λ(z) + 1/(3|X|) < 1/(2|X|) + 1/(3|X|) = 5/(6|X|), as desired. So we deduce that every one of the sets M(δ)^{n+1}(Λ) contains a point γ_{n+1} with γ_{n+1}(z) < 5/(6|X|) for some z ∈ X. It is well known from Markov chain theory that the sequence of convex compact sets {M(δ)^{n+1}(Λ)}_{n=1}^∞ is nested (M(δ)^{n+1}(Λ) ⊇ M(δ)^{n+2}(Λ)) and ∩_{n=1}^∞ M(δ)^{n+1}(Λ) = {π_δ}, where π_δ is the unique stationary distribution of the Markov chain determined by the matrix M(δ). Also, all the elements of the sequence {γ_{n+1}}_{n=1}^∞ lie inside the compact set Λ and, hence, the sequence {γ_{n+1}}_{n=1}^∞ has a convergent subsequence {γ_{(n+1)_k}}_{k=1}^∞. But then γ_{(n+1)_k} → π_δ as k → ∞ (since the limit point must lie inside every one of the compact sets M(δ)^{n+1}(Λ) and their intersection consists of the single point π_δ). Moreover, notice that since X is a finite set while {γ_{(n+1)_k}}_{k=1}^∞ is an infinite sequence, according to the "pigeonhole principle" there exists z ∈ X such that infinitely many elements of the subsequence {γ_{(n+1)_k}}_{k=1}^∞ have the property that γ_{(n+1)_k}(z) < 5/(6|X|). In other words, there exists z ∈ X for which we can extract a subsequence {γ_{(n+1)_{k_s}}}_{s=1}^∞ of the convergent sequence {γ_{(n+1)_k}}_{k=1}^∞ with the property that γ_{(n+1)_{k_s}}(z) < 5/(6|X|). In particular, γ_{(n+1)_{k_s}}(z) → π_δ(z) as s → ∞, and it follows that π_δ(z) ≤ 5/(6|X|) < 1/|X|, which is what we were after.
When applying theorem 4.12.1 we have in mind that {q_{xy}}_{x,y∈X} is the Markov transition matrix corresponding to recombination (i.e. a sub-algorithm determined by a single elementary step of type 2: see definitions 4.3.1 and 4.2.5), {p_{xy}}_{x,y∈X} is the Markov transition matrix corresponding to selection (i.e. a sub-algorithm determined by a single elementary step of type 1: see definition 4.2.2) and {m^δ_{xy}}_{x,y∈X} is the Markov transition matrix corresponding to mutation with some "rate" δ. For the purpose of the current section, thanks to the generality of theorem 4.12.1, it is sufficient to assume only that m^δ_{xy} > 0 for all x, y and that max({m^δ_{xy} | x ≠ y ∈ X}) → 0 as δ → 0. The following definition and proposition tell us when mutation determined by the reproduction 1-tuple (Ω, M, p) satisfies the conditions of theorem 4.12.1:
Definition 4.12.1 An ergodic family of mutations is an indexed family of ergodic mutation 1-tuples (see definition 4.9.1) of the form {(Ω, M, p_δ)}_{δ∈(0,1)} where p_δ(1_Ω) ≥ 1 − δ.
Proposition 4.12.2 Suppose {(Ω, M, p_δ)}_{δ∈(0,1)} is an ergodic family of mutations as in definition 4.12.1. Then for every δ ∈ (0, 1) the Markov transition matrix {m^δ_{xy}}_{x,y∈X} associated to the sub-algorithm determined by the mutation 1-tuple (Ω, M, p_δ) has the property that ‖I − {m^δ_{xy}}_{x,y∈X}‖ → 0 as δ → 0.
Proof: Notice that ‖I − {m^δ_{xy}}_{x,y∈X}‖ = max{m^δ_{xy} | x ≠ y}, and so it suffices to show that for every x ≠ y we have m^δ_{xy} → 0 as δ → 0 (since the state space is finite). Notice also that for all x ≠ y we have 0 ≤ m^δ_{xy} ≤ 1 − m^δ_{xx}. It now suffices to show only that for every x ∈ Ω^m we have 1 − m^δ_{xx} → 0 as δ → 0 or, equivalently, that for every x ∈ Ω^m we have m^δ_{xx} → 1 as δ → 0. If we write x = (x_1, x_2, ..., x_m) ∈ Ω^m then, since 1_Ω(x_i) = x_i, we see that 1 ≥ m^δ_{xx} ≥ (p_δ(1_Ω))^m ≥ (1 − δ)^m → 1 as δ → 0, and the desired conclusion follows.
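The bound used in the proof of proposition 4.12.2 is easy to illustrate numerically; the sketch below (Python; the function name is ours) tabulates the guaranteed diagonal mass (1 − δ)^m approaching 1 as δ → 0:

```python
def diagonal_lower_bound(delta, m):
    """Per the proof of proposition 4.12.2: if mutation applies the identity
    to each individual with probability at least 1 - delta, then a population
    of m individuals is left unchanged with probability at least
    (1 - delta)**m, so the off-diagonal mass is at most 1 - (1 - delta)**m."""
    return (1 - delta) ** m

# the off-diagonal mass of the mutation matrix vanishes as delta -> 0
for d in (0.1, 0.01, 0.001):
    print(d, 1 - diagonal_lower_bound(d, 10))
```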
Combining theorem 4.12.1 with the conclusion of example 4.11.5 (saying that the relation ≽ is not the equality relation for fitness-proportional selection) we deduce the following:
Corollary 4.12.3 Suppose for every 0 < δ < 1 we are given an evolutionary algorithm A_δ determined by the cycle s_1, s_2, s^δ_3, where s_1 is any elementary step (but usually it is an elementary step of type 2), s_2 is the elementary step of type 1 (fitness-proportional selection as described in definition 4.2.2) and s^δ_3 is an elementary step of type 2 determined by an ergodic mutation 1-tuple chosen from an ergodic family of mutations (see definition 4.12.1). Then the Markov chain determined by the algorithm A_δ with state space Ω^m is irreducible and, for all small enough δ, the unique stationary distribution of this Markov chain is not uniform.
Corollary 4.12.3 tells us, in particular, that the stationary distribution of the Markov chain associated to an algorithm A whose second elementary step is of type 1 (selection) is never uniform, even when the fitness function is flat. It is still reasonable to conjecture, though, that in the case of flat-fitness selection, under certain symmetry assumptions on recombination and mutation, every one of the individuals in a given population is equally likely to occur "in the long run" in the sense of definition 4.5.6. Results of this nature (and even stronger ones) will be established in an upcoming paper.
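The non-uniformity asserted by corollary 4.12.3 can be seen on a toy instance. The sketch below (Python; the construction is our own illustration, not code from the project) composes flat-fitness proportional selection with independent per-individual mutation of rate δ on populations of size 2 over a two-element search space, and approximates the stationary distribution by power iteration; the two uniform populations carry strictly more stationary mass than the mixed one:

```python
def stationary(delta, iters=5000):
    """Approximate stationary distribution of the composed chain
    selection-then-mutation on the population states {0,0}, {0,1}, {1,1}."""
    # flat-fitness proportional selection: each of the 2 slots is resampled
    # uniformly from the current population
    S = [[1.0, 0.0, 0.0],
         [0.25, 0.5, 0.25],
         [0.0, 0.0, 1.0]]
    d = delta
    # mutation: each individual flips independently with probability d
    M = [[(1 - d) ** 2, 2 * d * (1 - d), d * d],
         [d * (1 - d), (1 - d) ** 2 + d * d, d * (1 - d)],
         [d * d, 2 * d * (1 - d), (1 - d) ** 2]]

    def step(p):
        q = [sum(p[i] * S[i][j] for i in range(3)) for j in range(3)]
        return [sum(q[i] * M[i][j] for i in range(3)) for j in range(3)]

    p = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(iters):
        p = step(p)
    return p

pi = stationary(0.01)
print(pi[0] > pi[1] and pi[2] > pi[1])  # True: the distribution is not uniform
```

Even though the fitness is flat, the near-absorbing uniform populations dominate the stationary distribution for small δ, exactly as the corollary predicts.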
4.13 Conclusions
In the current paper we applied the methods developed in [5] to obtain a schema-based version of Geiringer's theorem for non-linear GP with homologous crossover. The result enables us to calculate exactly the limiting distribution of the Markov chain associated with the evolution of a finite (fixed-size) population under the action of repeated crossover, or under the action of a mixture of crossover and mutation. This is an extension of the results for fixed- and variable-length strings given in [5] for finite populations.

The main result established in [5] applies only in the absence of selection and only when crossover and mutation are bijective (which is often, but not always, the case). In the current paper we established a property of the stationary distribution of the Markov chain for a rather wide class of EAs. More specifically, we introduced a pre-order relation on the state space of a Markov chain which allows us to establish rather general inequalities concerning the stationary distribution of the Markov chain determined by an EA. This pre-order relation depends primarily on selection and not on the other stages determining an EA. In section 4.11 this pre-order is completely classified for the case of fitness-proportional selection. More results on this issue, as well as some connections between the infinite and the finite population Geiringer theorems, will appear in a forthcoming paper.
Bibliography
[1] G. Briscoe and P. De Wilde. D6.6 High-level design specification of the distributed intelligence system. Digital Business Ecosystem, Contract no. 507953, 2006.
[2] S. Coffey. An applied probabilist's guide to genetic algorithms. Master's thesis, The University of Dublin, 1999.
[3] R. A. Fisher. The genetical theory of natural selection. Oxford: Clarendon Press, 1930.
[4] H. Geiringer. On the probability of linkage in Mendelian heredity. Annals of Mathematical Statistics, 15:25–57, 1944.
[5] B. Mitavskiy and J. Rowe. An extension of Geiringer's theorem for a wide class of evolutionary algorithms. Evolutionary Computation, 14, 2006.
[6] Heinz Mühlenbein and Thilo Mahnig. Evolutionary algorithms and the Boltzmann distribution. In Foundations of Genetic Algorithms (FOGA 2002). Morgan Kaufmann Publishers, 2002.
[7] Heinz Mühlenbein and Thilo Mahnig. Evolutionary computation and Wright's equation. Theoretical Computer Science, 2002.
[8] Liviu Panait and Sean Luke. Alternative bloat control methods. In Genetic and Evolutionary Computation, GECCO-2004, pages 630–64. Springer-Verlag, 2004.
[9] R. Poli. Hyperschema theory for GP with one-point crossover, building blocks, and some new results in GA theory. In R. Poli, W. Banzhaf, et al., editors, Genetic Programming, Proceedings of EuroGP'2000, pages 163–180. Springer-Verlag, 2000.
[10] R. Poli. A simple but theoretically-motivated method to control bloat in genetic programming. In Genetic Programming, Proceedings of the 6th European Conference, EuroGP 2003, pages 211–223. Springer-Verlag, 2003.
[11] R. Poli and W. Langdon. On the search properties of different crossover operators in genetic programming. In Proceedings of the Third Annual Genetic Programming Conference, pages 293–301, 1998.
[12] R. Poli, C. Stephens, A. Wright, and J. Rowe. A schema-theory-based extension of Geiringer's theorem for linear GP and variable-length GAs under homologous crossover. In K. De Jong, R. Poli, and J. E. Rowe, editors, Foundations of Genetic Algorithms, volume 7, pages 45–62, 2002.
[13] Y. Rabani, Y. Rabinovich, and A. Sinclair. A computational view of population genetics. In Annual ACM Symposium on the Theory of Computing, pages 83–92, 1995.
[14] N. Radcliffe. The algebra of genetic algorithms. Annals of Mathematics and Artificial Intelligence, 10:339–384, 1994.
[15] L. Schmitt. Theory of genetic algorithms. Theoretical Computer Science, 259:1–61, 2001.
[16] L. Schmitt. Theory of genetic algorithms II: models for genetic operators over the string-tensor representation of populations and convergence to global optima for arbitrary fitness function under scaling. Theoretical Computer Science, 310:181–231, 2004.
[17] W. Spears. The equilibrium and transient behavior of mutation and recombination. In W. Martin and W. Spears, editors, Foundations of Genetic Algorithms, volume 6, pages 241–260, 2000.
[18] STU. Report on evolutionary and distributed fitness environment. Internal, 2006.
[19] M. De Tommasi. D15.3 BML framework 2nd release. Digital Business Ecosystem, 2005.
[20] UBHAM. Report on population dynamics for variable-sized structures. DBE Deliverable 8.2.
[21] UBHAM. Report on the evolution of high-level software components. DBE Deliverable 8.1.
[22] M. Vose. The simple genetic algorithm: foundations and theory. MIT Press, 1999.
[23] A. H. Wright, J. E. Rowe, R. Poli, and C. R. Stephens. A fixed point analysis of a genepool GA with mutation. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2002). Morgan Kaufmann Publishers, 2002.
[24] S. Wright. Evolution in Mendelian populations. Genetics, 16:97–159, 1931.