<TeXmacs|1.99.7>

<style|<tuple|article|std-latex>>

<\body>
  <\hide-preamble>
    <assign|familydefault|<macro|ptm>>

    <assign|red|<macro|>>

    <assign|pmm|<macro|p<rsub|m>>>

    <assign|pii|<macro|p<rsub|i>>>

    <assign|nc|<macro|N<rsub|c>>>

    <assign|nt|<macro|N<rsub|t>>>

    <assign|bnc|<macro|<wide|N|\<bar\>><rsub|c>>>

    <assign|bnt|<macro|<wide|N|\<bar\>><rsub|t>>>

    <assign|be|<macro|>>

    <assign|ee|<macro|>>

    <assign|nofigure|<macro|0>>

    <\assign|@makecaption>
      <\macro|1|2>
        abovecaptionskip <sbox><@tempboxa|<arg|1>.
        <arg|2>><ifdim><wd><@tempboxa>\<gtr\><hsize>
        <space|1cm><minipage|f|14cm|<with|font-series|bold|<arg|1>.> <arg|2>>

        <else> <global><@minipagefalse> <hb@xt@><hsize><htab|0pt>\<box\><@tempboxa><htab|0pt><fi>
        belowcaptionskip
      </macro>
    </assign>

    <assign|figure-name|<macro|Figure>>
  </hide-preamble>

  <\center>
    <with|font-size|1.68|font-series|bold|Mutation-selection dynamics and
    error threshold in an evolutionary model for Turing Machines>

    <vspace|10mm>

    <\minipage||110mm>
      <\center>
        <with|font-size|1.41|Fabio Musso<rsup|<math|>> & Giovanni
        Feverati<rsup|<math|\<dag\>>>><next-line><vspace*|10mm>
        <rsup|<math|>> Departamento de Física, Universidad de
        Burgos,<next-line>Plaza Misael Bañuelos s/n, 09001 Burgos,
        Spain<next-line>fmusso@ubu.es<next-line><vspace*|2mm>
        <rsup|<math|\<dag\>>> Laboratoire de physique theorique LAPTH, CNRS,
        Université de Savoie,<next-line>9, Chemin de Bellevue, BP 110, 74941,
        Annecy le Vieux Cedex, France feverati@lapp.in2p3.fr
      </center>
    </minipage>
  </center>

  <paragraph|Keywords:> Darwinian evolution, in-silico evolution,
  mutation-selection, error threshold, Turing machines

  <section*|Abstract>

  <\quote-env>
    We investigate the mutation-selection dynamics for an evolutionary
    computation model based on Turing Machines that we introduced in a
    previous article <cite|PRE>.

    The use of Turing Machines allows for very simple mechanisms of code
    growth and code activation/inactivation through point mutations. To each
    value of the point mutation probability there corresponds a maximum
    amount of active code that can be maintained by selection, and the Turing
    machines that reach it are said to be at the error threshold. Simulations
    with our model show that the population of Turing machines evolves
    towards the error threshold.

    Mathematical descriptions of the model point out that this behaviour is
    due more to the mutation-selection dynamics than to the intrinsic nature
    of the Turing machines. This indicates that this result is much more
    general than the model considered here and could play a role also in
    biological evolution.
  </quote-env>

  <section|Introduction>

  The study of \Pin silico\Q evolutionary models has increased significantly
  in recent times, see <cite|Tierra>, <cite|Lenskietal1999>,
  <cite|Wilkeetal2001>, <cite|Lenskietal2003>, <cite|Knibbeetal2007>,
  <cite|Knibbeetal2007b>, <cite|Cluneetal2008>, <cite|PRE>, to give just a
  few examples. The basic idea behind these models is to simulate the
  evolution of computer algorithms subject to mutation and selection
  procedures. In this artificial evolution setting, the algorithms play the
  role of biological organisms and are selected on the basis of their
  ability to perform one or more prescribed tasks (replicating themselves,
  computing some mathematical function, etc.). While the simulated
  algorithms are clearly incomparably less complex than any biological
  organism, the hope is that (at least some of) the phenomena observed in the
  digital evolution model could correspond to general behaviours of
  evolutionary systems. Indeed, it seems that this is what happens in some
  cases: emergence of parasitism in <cite|Tierra>, quasi-species selection in
  <cite|Wilkeetal2001> and the striking similarity between the C-value enigma
  <cite|Gregory> and the phenomenon of code-bloat in evolutionary programming
  <cite|Luke>, <cite|PRE>.

  One of the motivations for performing artificial evolution experiments is
  the continuously increasing computational power of modern computers.
  Nowadays, very fast multiprocessor computers have relatively low prices and
  many scientific institutions have at their disposal large facilities for
  parallel computation. For example, one run of <math|50000> generations of
  a population of <math|300> Turing machines (TMs) in our evolutionary model
  takes about half a day per processor on an ordinary home computer (for the
  highest value of the states-increase rate <math|<pii>>; for lower values it
  takes considerably less). The long-term evolution experiment on E. coli
  directed by R.E. Lenski reached <math|40000> generations after almost
  <math|20> years <cite|Lenski> (however, the population considered in this
  experiment is much larger, of the order of <math|10<rsup|7>> cells).
  When population size is not a crucial parameter, digital evolution
  experiments can explore a number of generations inaccessible to laboratory
  experiments with real organisms. If one wants to study evolutionary effects
  on such a large time scale in real biological organisms, one has to resort
  to paleontological studies. However, such studies are hampered by the
  incompleteness of the fossil record and by the unrepeatability of the
  experiments. Indeed, repeatability makes it easy to discriminate between
  effects due to adaptation and those simply due to drift. These problems are
  overcome in laboratory experiments such as Lenski's, but at the price of
  reducing the environment to a Petri dish. Artificial evolution experiments
  allow one to explore larger time scales than laboratory experiments at much
  reduced costs, but at the higher price of replacing biological organisms
  with algorithms. Moreover, there is another big advantage when performing
  artificial evolution experiments, namely the complete control over all the
  experimental settings. This gives the opportunity to use a reductionistic
  approach, by studying separately the effects of the various mechanisms
  involved in the evolutionary dynamics, something that is very difficult to
  obtain when working with real organisms. Finally, as a last argument in
  favour of artificial evolution experiments, we cite one given by Maynard
  Smith <cite|MaynardSmith>: \P...we badly need a comparative biology. So
  far, we have been able to study only one evolving system and we cannot wait
  for interstellar flight to provide us with a second. If we want to discover
  generalizations about evolving systems, we will have to look at artificial
  ones.\Q

  As we said, even the most complicated computer algorithm is incomparably
  simpler than any biological organism. Moreover, typical artificial
  evolution experiments have a single ecological niche and the interaction
  between the artificial organisms is often limited to the comparison of
  their performances. So, a very large distance separates artificial evolution
  experiments from biological evolution. For this reason, many biologists are
  skeptical about the biological relevance of the results obtained in the
  digital framework; for example, some objections typically raised are
  reported in <cite|ONeill>. On the other hand, supporters of artificial
  evolution experiments reply that the observed results can actually be
  general phenomena of evolutionary systems, and therefore independent of
  the particular model under consideration. To test this hypothesis it would
  be desirable to compare the results obtained in the artificial evolution
  setting with real biological data, but this is very hard to do for
  long-term evolutionary effects, which is precisely where artificial
  evolution models are most useful. On the other hand, general evolutionary
  behaviours do emerge if the mutation-selection dynamics prevails over the
  peculiar characteristics of the evolving organism. When this is the case,
  the observed effects can be reproduced through a population genetic
  mathematical model. Indeed, these models center on the dynamics induced by
  the selection and mutation operators (under some working hypotheses), more
  than on the specific details of how the organism functions. If successful,
  this procedure extends the validity of the results observed in the
  evolutionary model under consideration to all the evolutionary models
  working with the same mutation and selection operators (under the same
  hypotheses). This means that the problem of the biological relevance of the
  results obtained in the artificial evolution experiment is switched to the
  problem of assessing the biological likelihood of the mutation and
  selection operators and of the hypotheses used in the mathematical model.

  In this paper we apply this strategy: we derive a deterministic and a
  stochastic population genetic model of our evolutionary model for TMs. Our
  main aim is to show that the evolutionary dynamics pushes the TMs toward
  the error threshold <cite|Eigen71>. The population genetic model is used to
  compute mathematically the value of the error threshold and to show that
  this dynamical behaviour follows from quite mild hypotheses.

  According to this program, in the Materials and Methods section we first
  briefly recall our evolutionary model for TMs <cite|PRE> and the Eigen
  error threshold concept. A deterministic population genetic model for our
  digital evolution model is introduced in the third subsection \PThe
  deterministic model\Q, while its stochastic counterpart (limited to the
  evolution of the best performing TMs) is given in the fourth one. In the
  Results section we report the results obtained by the computer simulations
  and compare them with those predicted by the mathematical models. Finally,
  our concluding remarks are given in the Discussion section.

  <section|Materials and Methods>

  <subsection|The evolutionary model><label|model>

  We basically use the same evolutionary programming model based on Turing
  Machines that has been introduced in <cite|PRE>. The following are the only
  differences between that model and the model we use in this article:

  <\enumerate>
    <item>the TMs' movable head can now move only right or left; it can no
    longer stay still (this also affects the definition of the added
    state);<label|still>

    <item>the TMs' tape is now circular, so that the TM head cannot exit from
    the tape.<label|circular>
  </enumerate>

  The first choice allows us to save one bit of memory for each state of the
  TMs and, at the same time, makes our definition more similar to the
  original one <cite|Turing>. The second choice seems to us the most
  convenient when dealing with finite tapes.

  For the sake of making this article self-contained, we give a terse
  description of Turing machines and of the evolutionary programming model
  that we use.

  Turing Machines are very simple symbol-manipulating devices which can be
  used to encode any feasible algorithm. They were invented in 1936 by Alan
  Turing <cite|Turing> and used as abstract tools to investigate the problem
  of function computability. For a complete treatment of this subject we
  refer to <cite|davis>.

  A Turing machine consists of a movable head acting on an infinite tape
  <math|T<around|(|t|)>>, see figure <reference|Turing>. The tape consists of
  discrete cells that can contain a 0 or a 1 symbol. The head has a finite
  number of internal states that we denote by
  <math|<with|font-series|bold|N>> (in which case the TM is called an
  <math|<with|font-series|bold|N>>-state TM). At any time <math|t> the head
  is in a given internal state <math|<math-bf|s><around|(|t|)>> and it is
  located upon a single cell <math|k<around|(|t|)>> of the infinite tape
  <math|T<around|(|t|)>>. It reads the symbol stored inside the cell and,
  according to its internal state and the symbol read, performs three
  actions:

  <\enumerate>
    <item>\Pwrite\Q: writes a new symbol on the <math|k<around|(|t|)>> cell
    (<math|T<around|(|t|)>\<mapsto\>T*<around|(|t+1|)>>),

    <item>\Pmove\Q: moves one cell to the right or to the left
    (<math|k<around|(|t|)>\<mapsto\>k*<around|(|t+1|)>>),

    <item>\Pcall\Q: changes its internal state to a new state
    (<math|<math-bf|s><around|(|t|)>\<mapsto\><math-bf|s><around|(|t+1|)>>).
  </enumerate>

  Accordingly, a state can be specified by two triplets \Pwrite-move-call\Q
  listing the actions to undertake after reading respectively a <math|0> or
  <math|1> symbol. There exists a distinguished state (the Halt state) that
  stops the machine when called. The initial tape <math|T<around|(|0|)>> is
  the input tape of the TM, and the tape <math|T<around|(|<wide|t|\<bar\>>|)>>
  at the instant <math|<wide|t|\<bar\>>> when the machine stops is its output
  tape, that is the result of executing the algorithm defined by the given TM
  on the input tape <math|T<around|(|0|)>>. However, many TMs will never
  stop, so that they will not be associated with any algorithm. Moreover, the
  halting problem, that is the problem of establishing if a TM will
  eventually stop when provided with a given input tape, is undecidable. This
  means that there will exist TMs for which it is impossible to predict if
  they will eventually halt or not for a given input tape.

  We have to introduce some restrictions on the definition of the TMs in our
  evolutionary model. Since we want to perform computer simulations, we need
  to use a tape of finite length, which we fix at <math|300> cells. The
  position of the head is taken modulo the length of the tape, that is, we
  consider a circular tape with cell <math|1> coming after cell <math|300>.
  Since it is quite easy to generate machines that run forever, we also need
  to fix a maximum number of time steps: we force the machine to halt if it
  reaches <math|4000> steps.
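
  As an illustration of the restricted machines just described, the following
  Python sketch runs a TM on a circular tape with a step limit. The names
  <verbatim|run_tm> and <verbatim|HALT>, the encoding of a machine as a
  dictionary of \Pwrite-move-call\Q triplets, and the choice of state
  <math|<with|font-series|bold|1>> and of the first cell as the starting
  configuration are assumptions made for this example; they are not taken
  from the program used for the simulations.

```python
HALT = 0  # assumed encoding of the Halt state; ordinary states are 1..N


def run_tm(machine, tape, max_steps=4000, start_state=1, start_cell=0):
    """Run a TM on a circular tape and return (output tape, steps executed).

    machine[state] is a pair of triplets (write, move, call): the first is
    used when a 0 is read, the second when a 1 is read. move is +1 (Right)
    or -1 (Left).
    """
    tape = list(tape)  # work on a copy of the input tape
    state, cell, steps = start_state, start_cell, 0
    for _ in range(max_steps):  # forced halt after max_steps steps
        if state == HALT:
            break
        write, move, call = machine[state][tape[cell]]
        tape[cell] = write                # "write"
        cell = (cell + move) % len(tape)  # "move" on the circular tape
        state = call                      # "call"
        steps += 1
    return tape, steps
```

  With this encoding, a machine whose triplets all call the Halt state stops
  after a single step and leaves a tape of zeroes unchanged.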

  We begin with a population of <math|300> <math|1>-state TMs of the
  following form

  <\equation>
    <tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|1|1|1|-1|cell-bborder|1ln>|<cwith|2|2|1|-1|cell-bborder|1ln>|<cwith|3|3|1|-1|cell-bborder|1ln>|<table|<row|<cell|>|<cell|<with|font-series|bold|1>>>|<row|<cell|0>|<cell|0-<text|move1>-<text|<with|font-series|bold|Halt>>>>|<row|<cell|1>|<cell|1-<text|move2>-<text|<with|font-series|bold|Halt>>>>>>><label|state>
  </equation>

  where move1 and move2 are fixed at random as Right or Left, and let them
  evolve for <math|50000> generations. At each generation every TM undergoes
  the following three processes (in this order):

  <\enumerate>
    <item>states-increase,

    <item>mutation,

    <item>selection and reproduction.
  </enumerate>

  <paragraph|States-increase.> In this phase, further states are added to the
  TM with a rate <math|<pii>>. The new states are the same as
  (<reference|state>) with the <math|<with|font-series|bold|1>> label
  replaced by <math|<with|font-series|bold|N+1>>,
  <math|<with|font-series|bold|N>> being the number of states before the
  addition. While it is clear that the states-increase should be considered a
  form of mutation (vaguely resembling insertion), we preferred to keep it
  distinct because its effect is always neutral.

  <paragraph|Mutation.> During mutation, all entries of each state of the TM
  are randomly changed with probability <math|<pmm>>. The new entry is
  randomly chosen among all corresponding permitted values excluding the
  original one. The permitted values are:

  <\itemize>
    <item>0 or 1 for the \Pwrite\Q entries;

    <item>Right, Left for the \Pmove\Q entries;

    <item>The Halt state or an integer from <with|font-series|bold|1> to the
    number of states <math|<math-bf|N>> of the machine for the \Pcall\Q
    entries.
  </itemize>
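
  The mutation step can be sketched in Python as follows. The function name
  <verbatim|mutate>, the dictionary encoding of a machine (each state is a
  pair of \Pwrite-move-call\Q triplets, with moves stored as +1 or -1 and the
  Halt state stored as 0) and the use of <verbatim|random.Random> are
  assumptions of this example, not a description of the original program.

```python
import random

HALT = 0  # assumed encoding of the Halt state; ordinary states are 1..N


def mutate(machine, p_m, rng):
    """Return a mutated copy: every entry changes with probability p_m."""
    n_states = len(machine)
    call_values = [HALT] + list(range(1, n_states + 1))
    mutated = {}
    for state, triplets in machine.items():
        new_triplets = []
        for write, move, call in triplets:
            if rng.random() < p_m:
                write = 1 - write  # only one other permitted value
            if rng.random() < p_m:
                move = -move       # Right becomes Left and vice versa
            if rng.random() < p_m:
                # any permitted value except the original one
                call = rng.choice([c for c in call_values if c != call])
            new_triplets.append((write, move, call))
        mutated[state] = tuple(new_triplets)
    return mutated
```

  Since the \Pwrite\Q and \Pmove\Q entries have only two permitted values,
  excluding the original value means that a mutation always flips them.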

  <paragraph|Selection and reproduction.> In the selection and reproduction
  phase a new population is created from the current one (old population).
  The number of offspring of a TM is determined by its \Pperformance\Q and,
  to a lesser extent, by chance. Actually, in the field of evolutionary programming
  the word used is fitness. However, in population genetics, fitness is used
  to denote the expected number of offspring (or the fraction that reach the
  reproductive age) of an individual. To avoid ambiguities, we decided to
  reserve the word fitness for this meaning, and to use the word performance
  for the evolutionary programming one. The performance of a TM is a function
  that measures how well the output tape of the machine reproduces a given
  \Pgoal\Q tape starting from a prescribed input tape. We compute it in the
  following way. The performance is initially set to zero. Then the output
  tape and the goal tape are compared cell by cell. The performance is
  increased by one for any <math|1> on the output tape that has a matching
  <math|1> on the goal tape and it is decreased by 3 for any <math|1> on the
  output tape that matches a <math|0> on the goal tape.
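
  The scoring rule above can be written as a short Python function. The name
  <verbatim|performance> and the representation of tapes as lists of 0 and 1
  are assumptions of this sketch.

```python
def performance(output_tape, goal_tape):
    """Score an output tape against the goal tape, cell by cell."""
    score = 0
    for out, goal in zip(output_tape, goal_tape):
        if out == 1:
            # +1 for a 1 matching a 1, -3 for a 1 matching a 0
            score += 1 if goal == 1 else -3
    return score
```

  Cells where the output tape contains a 0 never change the score, so an
  output tape filled with zeroes has performance zero.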

  As a selection process, we use what in the field of evolutionary algorithms
  is known as \Ptournament selection of size 2 without replacement\Q. Namely,
  two TMs are randomly drawn from the old population, they are run on the
  input tape and a performance value is assigned to them according to their
  output tapes. The performance values are compared and the machine which
  scores higher creates two copies of itself in the new population, while the
  other one is eliminated (asexual reproduction). If the performance values
  are equal, each TM creates a copy of itself in the new population. The two
  TMs that were chosen for the tournament are eliminated from the old
  population (namely they are not replaced) and the process restarts until
  the exhaustion of the old population. Notice that this selection procedure
  keeps the total population size <math|N> (with <math|N> an even number)
  constant. From our point of view this selection mechanism has two main
  advantages: it is computationally fast and quite simple to treat
  mathematically.
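
  A minimal Python sketch of one round of this selection scheme is given
  below. Drawing pairs without replacement is implemented here by shuffling
  the population and pairing consecutive individuals, which is an
  implementation choice of this example; the names
  <verbatim|tournament_selection> and <verbatim|perf> are likewise
  hypothetical.

```python
import random


def tournament_selection(population, perf, rng):
    """One generation of size-2 tournament selection without replacement.

    population must have even size; perf maps an individual to its
    performance value.
    """
    pool = list(population)
    rng.shuffle(pool)  # consecutive pairs are random pairs without replacement
    new_population = []
    for a, b in zip(pool[0::2], pool[1::2]):
        pa, pb = perf(a), perf(b)
        if pa == pb:
            new_population += [a, b]            # tie: one copy each
        else:
            winner = a if pa > pb else b
            new_population += [winner, winner]  # winner is copied twice
    return new_population
```

  Each pair contributes exactly two individuals to the new population, so
  the population size is conserved. In a full simulation the two copies of a
  winner would have to be independent objects, since they mutate separately
  in the next generation.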

  The choice of TMs to encode the algorithms in our evolutionary model was
  convenient for various reasons. The first reason is that any feasible
  algorithm can be encoded through a TM (Church-Turing thesis <cite|davis>),
  so that TMs are universal objects within the class of algorithms. The second
  reason is that even TMs with a very low number of states can exhibit a very
  complicated and unpredictable behaviour (even if the input tape is filled
  only with zeroes as in our case, see for example the busy beaver function
  <cite|Beaver>). Thanks to this property, it is very difficult to predict
  the dynamics of our evolutionary model.

  While developing the model, we were primarily interested in how the
  variations in the length of the code affect the evolutionary dynamics. From
  this point of view, the TMs present many advantages. The distinction
  between coding and non-coding triplets is unambiguous and very easy to
  verify. We define a triplet as non-coding (with respect to a given input
  tape) if mutations in its entries cannot affect the output tape of the TM
  and we will call it coding in the complementary case. This definition is
  practically equivalent to saying that a triplet is coding if it is executed
  at least once when the TM runs on a given input tape and it is
  non-coding if it is never executed. In this way, to identify coding
  triplets, one has only to run the TM on the prescribed input tape and mark
  the triplets that are executed. Another advantage is that the mechanism of
  state-adding is completely neutral; the added states are always non-coding,
  so that they cannot change the performance of the TM. On the other hand,
  there is a simple mechanism of code activation. Namely, a triplet of a
  non-coding state <math|s> can be activated, for example, when a mutation
  occurs in the call entry of a coding state, changing its value to <math|s>;
  notice, however, that mutations in the write and move entries (of a coding
  triplet) can also result in the activation or inactivation of the TM's
  triplets.
  Finally, another advantage of using TMs is that they are specified in terms
  of an atomic instruction: the state.
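
  The procedure for identifying coding triplets (run the TM on the prescribed
  input tape and mark the triplets that are executed) can be sketched in
  Python as follows. The machine encoding, the starting configuration and
  the name <verbatim|coding_triplets> are assumptions of this example.

```python
HALT = 0  # assumed encoding of the Halt state; ordinary states are 1..N


def coding_triplets(machine, tape, max_steps=4000):
    """Return the set of (state, symbol read) pairs executed during the run.

    machine[state] is a pair of (write, move, call) triplets, indexed by
    the symbol read; move is +1 (Right) or -1 (Left).
    """
    tape = list(tape)
    state, cell = 1, 0  # assumed starting state and cell
    executed = set()
    for _ in range(max_steps):
        if state == HALT:
            break
        symbol = tape[cell]
        executed.add((state, symbol))  # this triplet is coding
        write, move, call = machine[state][symbol]
        tape[cell] = write
        cell = (cell + move) % len(tape)
        state = call
    return executed
```

  Every triplet that is not in the returned set is non-coding for the given
  input tape. In particular, a newly added state is never reported, since no
  existing triplet calls it.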

  <subsection|Eigen's error threshold><label|error>

  The error threshold concept was introduced in 1971 by Eigen in the context
  of his quasispecies model <cite|Eigen71>, <cite|Eigen-Schuster>. The model
  describes the dynamics of a population of self-replicating polynucleotides
  of fixed length <math|L>, subject to mutation and under the constraint of
  constant population size. Each polynucleotide <math|I<rsup|<around|(|i|)>>>
  is characterized by its replication rate <math|A<rsub|i>>, its degradation
  rate <math|D<rsub|i>> and the probabilities <math|Q<rsub|j*i>> of mutating
  into a different polynucleotide <math|I<rsup|<around|(|j|)>>> as a
  consequence of an inexact replication. All these parameters are assumed to
  be fixed numbers, independent of time and of population composition. The
  Eigen model then consists of a set of ODEs determining the evolution of the
  frequency <math|\<phi\><rsub|i>> of the polynucleotides
  <math|I<rsup|<around|(|i|)>>> in the total population:

  <\equation>
    <wide|\<phi\><rsub|i>|\<dot\>>=<big|sum><rsub|j><around|(|A<rsub|j>*Q<rsub|i*j>-D<rsub|j>*\<delta\><rsub|i*j>|)>*\<phi\><rsub|j>-\<phi\><rsub|i>*<big|sum><rsub|j><around|(|A<rsub|j>-D<rsub|j>|)>*\<phi\><rsub|j>,<label|Eigen>
  </equation>

  where the sum is over all possible polynucleotide templates
  <math|I<rsup|<around|(|j|)>>>. It is assumed that the polynucleotide
  <math|I<rsup|<around|(|1|)>>> has a larger fitness than the others:
  <math|A<rsub|1>-D<rsub|1>\<gtr\>A<rsub|k>-D<rsub|k>,k\<gtr\>1>. This
  polynucleotide is usually called the master sequence, while the others are
  called mutants. If we assume that mutation is exclusively due to point
  mutations, neglect transversions, and suppose that all transitions occur
  with the same probability and that the point mutation probability is
  independent of the site, then we can identify our polynucleotides with
  binary chains of length <math|L>, and the mutation probabilities
  <math|Q<rsub|j*i>> depend only on the point mutation probability <math|q>
  and the Hamming distance <math|d<around|(|i,j|)>> between the binary chain
  <math|I<rsup|<around|(|i|)>>> and the binary chain
  <math|I<rsup|<around|(|j|)>>>:

  <\equation*>
    Q<rsub|j*i>=q<rsup|d<around|(|i,j|)>>*<around|(|1-q|)><rsup|L-d<around|(|i,j|)>>
  </equation*>
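
  As a small numerical illustration of this formula, the following Python
  sketch builds the full mutation matrix for short binary chains. The
  function names <verbatim|hamming> and <verbatim|mutation_matrix> are
  hypothetical, and the enumeration of all <math|2<rsup|L>> chains is only
  practical for small <math|L>.

```python
from itertools import product


def hamming(a, b):
    """Number of sites at which two chains differ."""
    return sum(x != y for x, y in zip(a, b))


def mutation_matrix(L, q):
    """Q[j][i] = q**d(i,j) * (1-q)**(L-d(i,j)) for all binary chains."""
    chains = list(product((0, 1), repeat=L))
    return [[q ** hamming(ci, cj) * (1 - q) ** (L - hamming(ci, cj))
             for ci in chains]
            for cj in chains]
```

  The matrix is symmetric, and each of its rows sums to one, because the
  number of chains at Hamming distance <math|d> from a given chain is the
  binomial coefficient of <math|L> and <math|d>.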

  Once the <math|A<rsub|j>> and <math|D<rsub|j>> parameters are assigned, one can
  study the asymptotic composition of the population as a function of the
  point mutation probability <math|q>. It turns out that (at least for some
  choices of the fitness landscape, see <cite|Swetina-Schuster>,
  <cite|Krall>, <cite|Wilke>, <cite|Takeuchi>, <cite|Saakiaan>) there is a
  sharp transition in the population composition near a particular value of
  <math|q> that is termed the error threshold. Below the error threshold, the
  population is organized as a cloud of mutants surrounding the master
  sequence, while, above the error threshold, all polynucleotides are almost
  equally represented. In the thermodynamic limit (when the chain length
  <math|L> goes to infinity and the point mutation probability <math|q> goes
  to zero in such a way that the genomic mutation rate <math|p=q*L> stays
  finite) this
  is a real phase transition of first order <cite|Tarazona>, and the error
  threshold is mathematically well defined. As a consequence, from this model
  it follows that natural selection can preserve the information content of
  the genome only if the mutation rate is lower than the error threshold (see
  <cite|Eigen2000>); above the error threshold, all the information content
  is lost. For a single peak fitness landscape (i.e.
  <math|A<rsub|k>-D<rsub|k>=A<rsub|2>-D<rsub|2>,k\<gtr\>2>) and in the
  thermodynamic limit, the system of equations (<reference|Eigen>) can be
  decoupled into a two by two system by introducing a collective variable
  <math|\<phi\><rsub|M>> for the overall frequency of mutants in the
  population

  <\equation*>
    \<phi\><rsub|M>=<big|sum><rsub|k=2><rsup|\<infty\>>\<phi\><rsub|k>
  </equation*>

  In the thermodynamic limit, the fidelity rate of the master sequence will
  be given by <math|Q<rsub|11>=e<rsup|-p>> and the probability of back
  mutation <math|Q<rsub|1*M>> will go to zero. So, the Eigen equations take
  the form:

  <eqnarray|<tformat|<table|<row|<cell|<wide|\<phi\>|\<dot\>><rsub|1>>|<cell|=>|<cell|<around*|(|A<rsub|1>*e<rsup|-p>-D<rsub|1>|)>*\<phi\><rsub|1>-\<phi\><rsub|1>*<around*|[|<around*|(|A<rsub|1>-D<rsub|1>|)>*\<phi\><rsub|1>+<around*|(|A<rsub|2>-D<rsub|2>|)>*\<phi\><rsub|M>|]>,<eq-number><label|Eigen1>>>|<row|<cell|<wide|\<phi\>|\<dot\>><rsub|M>>|<cell|=>|<cell|A<rsub|1>*<around*|(|1-e<rsup|-p>|)>*\<phi\><rsub|1>+<around|(|A<rsub|2>-D<rsub|2>|)>*\<phi\><rsub|M>-\<phi\><rsub|M>*<around*|[|<around*|(|A<rsub|1>-D<rsub|1>|)>*\<phi\><rsub|1>+<around*|(|A<rsub|2>-D<rsub|2>|)>*\<phi\><rsub|M>|]>.<eq-number><label|Eigen2>>>>>>

  The error threshold <math|<wide|P|\<bar\>>>, in this case, will coincide
  with the lowest value of the mutation probability <math|P=1-e<rsup|-p>> for
  which the master sequence goes extinct in the asymptotic limit
  <math|t\<to\>\<infty\>> <cite|Nowak>. Using the constraint
  <math|\<phi\><rsub|1>+\<phi\><rsub|M>=1> in equation (<reference|Eigen1>)
  we get a closed equation for <math|\<phi\><rsub|1>> that gives:

  <\equation>
    <wide|P|\<bar\>>=<frac|<around|(|A<rsub|1>-D<rsub|1>|)>-<around|(|A<rsub|2>-D<rsub|2>|)>|A<rsub|1>>.<label|Pt>
  </equation>
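
  The threshold (<reference|Pt>) can be checked numerically by integrating
  equation (<reference|Eigen1>) with the constraint
  <math|\<phi\><rsub|1>+\<phi\><rsub|M>=1>. The Python sketch below uses a
  plain Euler scheme; the function names, the step size, the initial
  frequency and the parameter values quoted afterwards are illustrative
  assumptions.

```python
import math


def eigen_threshold(A1, D1, A2, D2):
    """Error threshold of equation (Pt) for the single peak landscape."""
    return ((A1 - D1) - (A2 - D2)) / A1


def master_frequency(A1, D1, A2, D2, p, phi1=0.9, dt=0.01, steps=10000):
    """Euler integration of equation (Eigen1) with phi_M = 1 - phi_1.

    p is the genomic mutation rate, so the mutation probability is
    P = 1 - exp(-p). Returns the frequency of the master sequence at
    time steps * dt.
    """
    for _ in range(steps):
        mean_fitness = (A1 - D1) * phi1 + (A2 - D2) * (1 - phi1)
        phi1 += dt * phi1 * (A1 * math.exp(-p) - D1 - mean_fitness)
    return phi1
```

  For example, with <math|A<rsub|1>=2>, <math|A<rsub|2>=1> and
  <math|D<rsub|1>=D<rsub|2>=0.5> the threshold is
  <math|<wide|P|\<bar\>>=0.5>. Setting the right-hand side of
  (<reference|Eigen1>) to zero gives the stationary frequency
  <math|\<phi\><rsub|1>=0.5> for <math|P=0.25>, which the integration
  reproduces, while for <math|P=0.75> the master sequence goes extinct.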

  Observe that the infinite population limit has the effect of removing the
  genetic drift and that the survival of the master sequence in the
  asymptotic limit for values of the mutation probability less than the error
  threshold is possible only for infinite populations. For finite
  populations, when the probability of reverse mutations is zero, genetic
  drift will always push the population into its only absorbing state: the
  extinction of the master sequence. In the finite population case, however,
  the expected number of generations before the extinction of the master
  sequence grows by several orders of magnitude when the mutation probability
  drops below the error threshold (see <cite|Nowak89>, <cite|io>). This is
  the reason why the value of the error threshold predicted by the
  deterministic model also works for finite populations. This effect is also
  present in our model (see the next two sections and figure
  <reference|ext1>).

  <subsection|The deterministic model>

  In this section we will describe a deterministic mutation-selection model
  with tournament selection of rank two and we will obtain the corresponding
  error threshold. Our model of selection and reproduction is very different
  from that of the Eigen model. In particular, the number of offspring is not
  constant for each genotype, since while the performance landscape is fixed,
  the fitness landscape changes in time following the changes in the
  population composition. Despite this fact, when one neglects the
  probability of back mutations, one obtains a closed equation for the number
  of individuals with the best performance, as happens for the Eigen model
  with the single peak fitness landscape. Following the example of the Eigen
  model (see also <cite|bull2005>), we will define the error threshold as the
  value of the mutation probability that causes the extinction of the master
  sequence (that, in our case, is the best performance class) for our
  selection model considered in the deterministic limit.

  Let us suppose that we have <math|M> possible performance classes, a
  population of size <math|N>, and let us denote with <math|n<rsub|i>> the
  number of individuals belonging to the <math|i>th performance class. In the
  selection step we draw <math|2> individuals from the population without
  replacement and compare their performances. The individual with the higher
  performance is copied into the new population and has a probability
  <math|f> of giving rise to another copy, while, with probability
  <math|1-f>, the second copy is a copy of the individual with the lower
  performance. When the two individuals have the same performance, both are
  passed to the new population. The two individuals are eliminated from the
  old population and the process is repeated until the old population is
  exhausted and the new population is complete. Notice that
  <math|0\<leq\>f\<leq\>1> must hold. The selection mechanism we used in our
  TM model corresponds to the particular choice <math|f=1>.

  With this mechanism, each individual in the <math|i>th performance class has a
  probability

  <\equation*>
    P<rsub|2>=f<frac|<big|sum><rsub|j\<less\>i>n<rsub|j>|N-1>
  </equation*>

  of making two copies of itself, a probability

  <\equation*>
    P<rsub|1>=<frac|1|N-1>*<around*|(|<around|(|1-f|)>*<big|sum><rsub|j\<less\>i>n<rsub|j>+n<rsub|i>-1+<around|(|1-f|)>*<big|sum><rsub|j\<gtr\>i>n<rsub|j>|)>
  </equation*>

  of making one copy of itself, and, finally, a probability

  <\equation*>
    P<rsub|0>=<frac|f|N-1>*<big|sum><rsub|j\<gtr\>i>n<rsub|j>
  </equation*>

  of making no copy at all.

  It follows that the expected number <math|n<rsub|i><rprime|'>> of
  individuals in the <math|i>th performance class after selection is given
  by:

  <\equation>
    n<rsub|i><rprime|'>=n<rsub|i>*<around*|[|1+<frac|f|N-1>*<around*|(|<big|sum><rsub|j\<less\>i>n<rsub|j>-<big|sum><rsub|j\<gtr\>i>n<rsub|j>|)>|]><label|nprimo>
  </equation>

  Notice that it holds:

  <\equation*>
    <big|sum><rsub|i=1><rsup|M>n<rsub|i><rprime|'>=N.
  </equation*>
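
  The three probabilities and the expectation (<reference|nprimo>) can be
  cross-checked with a few lines of Python. In this sketch the performance
  classes are stored in a list ordered by increasing performance and indexed
  from zero; the function names are hypothetical.

```python
def copy_probabilities(n, i, f):
    """Return (P2, P1, P0) for one individual of class i.

    n[k] is the number of individuals in class k; classes are ordered by
    increasing performance and class i must be occupied.
    """
    N = sum(n)
    lower = sum(n[:i])        # individuals with lower performance
    higher = sum(n[i + 1:])   # individuals with higher performance
    p2 = f * lower / (N - 1)
    p1 = ((1 - f) * lower + n[i] - 1 + (1 - f) * higher) / (N - 1)
    p0 = f * higher / (N - 1)
    return p2, p1, p0


def expected_after_selection(n, f):
    """Expected class sizes after selection, equation (nprimo)."""
    N = sum(n)
    return [n_i * (1 + f * (sum(n[:i]) - sum(n[i + 1:])) / (N - 1))
            for i, n_i in enumerate(n)]
```

  For any occupied class the three probabilities sum to one, and
  <math|n<rsub|i>*<around|(|2*P<rsub|2>+P<rsub|1>|)>> coincides with the
  expected class size <math|n<rsub|i><rprime|'>>.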

  Now, let us consider the mutation step. We assume that the individuals in
  each performance class <math|i> share the same probability <math|Q<rsub|i>>
  of undergoing neutral mutations only, or no mutations at all. We will call
  <math|Q<rsub|i>>, with a slight abuse of terminology, the fidelity rate of
  the <math|i>th performance class. Let us denote by <math|g<rsub|i*j>> the
  probability that an individual in the <math|j>th performance class gives
  rise to an individual in the <math|i>th performance class, given that it
  undergoes a non-neutral mutation (<math|g<rsub|i*i>=0>, since we included
  the neutral mutations in the fidelity rate <math|Q<rsub|i>>. Alternatively,
  we could define <math|Q<rsub|i>> as the probability of undergoing no
  mutations at all and <math|g<rsub|i*i>> as the probability that the
  intervening mutations are neutral. However, even if the alternative chosen
  in the text may seem clumsier, it is better suited to our mathematical
  analysis). This mutation mechanism gives rise to the following
  deterministic discrete equation:

  <\equation>
    n<rsub|i><rprime|''>=n<rsub|i><rprime|'>*Q<rsub|i>+<big|sum><rsub|j=1><rsup|M><around|(|1-Q<rsub|j>|)>*n<rsub|j><rprime|'>*g<rsub|i*j>.<label|nsecondo>
  </equation>

  Notice that, since by definition,

  <\equation*>
    <big|sum><rsub|i=1><rsup|M>g<rsub|i*j>=1,<label|geq1>
  </equation*>

  it will also hold

  <\equation*>
    <big|sum><rsub|i=1><rsup|M>n<rsub|i><rprime|''>=N.
  </equation*>

  Suppose now that <math|g<rsub|i*j>\<ll\>1> if <math|i\<gtr\>j>, namely that
  the probability of a mutation to a higher performance class is very small,
  so that the fraction of individuals undergoing a beneficial mutation in
  one generation is negligible. Let <math|s> be the best occupied performance
  class at a given time, that is, <math|n<rsub|s>\<gtr\>0> and
  <math|n<rsub|i>=0,i\<gtr\>s>, and suppose <math|1\<less\>s\<less\>M>. From
  equation (<reference|nprimo>), it follows that
  <math|n<rsub|s><rprime|'>\<gtr\>0>, <math|n<rsub|i><rprime|'>=0,i\<gtr\>s>
  also hold. Then, neglecting the beneficial mutations, from
  (<reference|nsecondo>) and (<reference|nprimo>) we get:

  <\equation>
    n<rsub|s><rprime|''>=n<rsub|s><rprime|'>*Q<rsub|s>=n<rsub|s>*Q<rsub|s>*<around*|[|1+<frac|f|N-1><around*|(|<big|sum><rsub|j\<less\>s>n<rsub|j>|)>|]>=n<rsub|s>*Q<rsub|s>*<around*|[|1+<frac|f*<around|(|N-n<rsub|s>|)>|N-1>|]><label|nss>
  </equation>

  The best performance class is stably populated if
  <math|n<rsub|s><rprime|''>=n<rsub|s>>. We have two solutions. The first one
  is given by <math|n<rsub|s><rsup|<around|(|1|)>>=0> and the second by

  <\equation>
    n<rsub|s><rsup|<around|(|2|)>>=<frac|1|f>*<around*|(|N*<around|(|1+f|)>-1-<frac|N-1|Q<rsub|s>>|)>.<label|ns2>
  </equation>

  <math|n<rsub|s><rsup|<around|(|2|)>>> is greater than zero if

  <\equation*>
    Q<rsub|s>\<gtr\><frac|1|1+f<frac|N|N-1>>=<frac|1|1+f>+O<around*|(|<frac|1|N>|)>
  </equation*>

  and in such a case <math|n<rsub|s>=n<rsub|s><rsup|<around|(|2|)>>> is a
  sink and <math|n<rsub|s>=0> an unstable equilibrium, since the function
  <math|n<rsub|s><rprime|''>-n<rsub|s>> is positive for
  <math|n<rsub|s>\<in\><around|(|0,n<rsub|s><rsup|<around|(|2|)>>|)>> and
  negative for <math|n<rsub|s>\<in\><around|(|n<rsub|s><rsup|<around|(|2|)>>,N|)>>,
  as shown in figure <reference|dinamica>. If

  <\equation*>
    Q<rsub|s>\<less\><frac|1|1+f<frac|N|N-1>>,
  </equation*>

  then there is only a sink at <math|n<rsub|s>=0>. Hence, the error threshold
  is given by

  <\equation>
    <label|qt><wide|Q|\<bar\>>=1-<wide|P|\<bar\>>=<frac|1|1+f<frac|N|N-1>>
  </equation>

  Neglecting the <math|O*<around|(|1/N|)>> corrections, we obtain:

  <\equation>
    <label|errort><wide|P|\<bar\>>=<frac|f|1+f>
  </equation>

  This is the same result that one gets from the Eigen model when considering
  the single peak fitness landscape with <math|A<rsub|1>=<around|(|1+f|)>*<around|(|A<rsub|2>-D<rsub|2>+D<rsub|1>|)>>
  (see equation (<reference|Pt>) and also <cite|Nowak>).

  With the previous argument we have shown that, after infinitely many
  generations, the best occupied performance class must satisfy
  <math|Q<rsub|s>\<gtr\><wide|Q|\<bar\>>>, namely that <math|n<rsub|j>=0> for
  all <math|j> such that <math|Q<rsub|j>\<less\><wide|Q|\<bar\>>>. We can now
  show that the index <math|s> is actually the largest possible one,
  namely that there is no class <math|i> such that
  <math|Q<rsub|s>\<gtr\>Q<rsub|i>\<gtr\><wide|Q|\<bar\>>>. Indeed, let us
  suppose that <math|g<rsub|i+1,i>\<neq\>0> for all
  <math|i=1,\<ldots\>,M-1>. Then, if at a given generation the <math|i>th
  performance class is populated while the <math|i+1>th is empty, at the
  following generation we will have

  <eqnarray|<tformat|<table|<row|<cell|n<rsub|i+1><rprime|''>>|<cell|=>|<cell|<big|sum><rsub|l\<leq\>i><around|(|1-Q<rsub|l>|)>*g<rsub|i+1,l>*n<rsub|l>*<around*|[|1+<frac|f|N-1>*<around*|(|<big|sum><rsub|j\<less\>l>n<rsub|j>-<big|sum><rsub|j\<gtr\>l>n<rsub|j>|)>|]>>>|<row|<cell|>|<cell|\<geq\>>|<cell|<around|(|1-Q<rsub|i>|)>*g<rsub|i+1,i>*n<rsub|i>*<around*|[|1+<frac|f|N-1>*<big|sum><rsub|j\<less\>i>n<rsub|j>|]>\<gtr\>0<eq-number><label|riempi>>>>>>

  So, a fraction of the population (possibly very small) will filter
  progressively into higher performance classes. This process will continue
  until the last performance class <math|M>, or a performance class <math|s>
  such that <math|Q<rsub|s>\<gtr\><wide|Q|\<bar\>>> and
  <math|Q<rsub|i>\<less\><wide|Q|\<bar\>>> if <math|i\<gtr\>s>, is
  reached. Then the asymptotic occupation number of this class will be given
  by the nonzero fixed point (<reference|ns2>) of equation
  (<reference|nss>). A few observations about this
  result are in order. First, let us notice that according to equation
  (<reference|riempi>), the <math|s+1>th performance class will be populated
  at each generation by mutants of the <math|s>th one. As we said, if
  <math|g<rsub|s+1,s>> is small, this number will be a tiny fraction of
  <math|n<rsub|s>> and we can neglect it (as we did in equation
  (<reference|nss>)). So, what we have really shown is that the <math|s>th
  performance class is the last one that will have a significant occupation
  number. The actual value of this number will depend mainly on how near
  <math|Q<rsub|s>> is to <math|<wide|Q|\<bar\>>>.

  The second point to take into account is that the time needed to populate
  the <math|s>th class could be astronomical and will depend on the values of
  <math|g<rsub|i*j>>, <math|Q<rsub|i>> and <math|f>. In particular, to keep
  it reasonable, <math|g<rsub|i+1,i>> and <math|f> must not be exceedingly
  small. It is also necessary that for <math|i\<less\>s> the fidelity rates
  <math|Q<rsub|i>> are not smaller than or too near to
  <math|<wide|Q|\<bar\>>>. A natural assumption avoiding this occurrence is
  that <math|Q<rsub|i>> is a monotonically decreasing function of <math|i>.

  As an illustrative example, we show in figure <reference|simul> the results
  of a numerical simulation of the discrete system (<reference|nprimo>),
  (<reference|nsecondo>) with the following choices of the parameters:
  <math|M=40>, <math|N=100>, <math|f=10<rsup|-3>>,

  <\equation*>
    g<rsub|i*j>=<around*|{|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|2|2|cell-halign|l>|<cwith|1|-1|2|2|cell-rborder|0ln>|<table|<row|<cell|1-10<rsup|-6>,>|<cell|<space|2em><text|if
    >i=j=1*<text|or >j-i=1>>|<row|<cell|10<rsup|-6>,>|<cell|<space|2em><text|if
    >i=j=40*<text|or >i-j=1>>|<row|<cell|0,>|<cell|<space|2em><text|otherwise>,>>>>>|\<nobracket\>>
  </equation*>

  <\equation>
    Q<rsub|i>=<around|(|1-10<rsup|-5>|)><rsup|<sqrt|i<rsup|3>>>,<space|2em>i=1,\<ldots\>,40.<label|quali>
  </equation>

  and with all the <math|100> individuals in the first performance class as
  the initial state. Needless to say, this choice of the parameters makes no
  claim to any degree of biological realism. The green points show
  the occupation numbers of the <math|40> performance classes obtained after
  <math|2\<cdot\>10<rsup|6>> generations, while the red line connects those
  obtained after <math|10<rsup|6>> generations. The maximum difference
  between the occupation numbers of the same performance class at
  <math|2\<cdot\>10<rsup|6>> and <math|10<rsup|6>> generations is
  <math|5\<cdot\>10<rsup|-5>> and, consequently, cannot be detected in the
  figure. This means that the population had almost reached its stable state
  after <math|10<rsup|6>> generations. The error catastrophe occurs when the
  fidelity rate (<reference|quali>) is less than the fidelity threshold
  (<reference|qt>). With the above choices, we have
  <math|Q<rsub|i>\<gtr\><wide|Q|\<bar\>>> for <math|i=1,\<ldots\>,21> and
  <math|Q<rsub|i>\<less\><wide|Q|\<bar\>>> for <math|i=22,\<ldots\>,40>.
  According to the above, we expect that, in the asymptotic limit, the
  performance classes from the <math|22>nd on should be empty, while
  equation (<reference|ns2>) predicts that the <math|21>st one should be
  occupied by <math|4.682> individuals. At the end of the simulation, the
  number of individuals in the <math|21>st performance class is <math|4.686>,
  while the <math|22>nd contains <math|2.02\<cdot\>10<rsup|-4>> individuals;
  the occupation numbers then decrease progressively, by approximately
  <math|5> orders of magnitude per performance class, as the class index
  increases.
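
  The threshold class and the predicted occupation number quoted above can
  be checked with a few lines of Python. This is only a sketch: the function
  names are hypothetical, and it evaluates equations (<reference|qt>),
  (<reference|ns2>) and (<reference|quali>) directly rather than iterating
  the discrete system.

```python
from math import sqrt


def fidelity_threshold(f, N):
    """Error threshold on the fidelity rate: 1 / (1 + f * N / (N - 1))."""
    return 1.0 / (1.0 + f * N / (N - 1))


def stable_occupation(Q_s, f, N):
    """Nonzero fixed point: (1/f) * (N * (1 + f) - 1 - (N - 1) / Q_s)."""
    return (N * (1.0 + f) - 1.0 - (N - 1) / Q_s) / f


# Parameters of the illustrative example in the text.
M, N, f = 40, 100, 1e-3
Q = [(1.0 - 1e-5) ** sqrt(i ** 3) for i in range(1, M + 1)]  # Q[i-1] = Q_i

Q_bar = fidelity_threshold(f, N)
# Last class whose fidelity rate is still above the threshold.
s = max(i for i in range(1, M + 1) if Q[i - 1] > Q_bar)
n_s = stable_occupation(Q[s - 1], f, N)
print(s, n_s)
```

  Since <math|Q<rsub|i>> is monotonically decreasing here, taking the largest
  index above the threshold identifies the class <math|s> unambiguously.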

  <paragraph|The TMs critical number of coding states> We made two hypotheses
  in our deterministic model to find the value of the error threshold
  (<reference|errort>). The first hypothesis is that
  <math|g<rsub|i*j>\<ll\>1> if <math|i\<gtr\>j>, that is, that favorable
  mutations are extremely rare. This is a natural assumption in our TMs
  model, because very often the mutations induce a big change in the output
  tape that has a very small probability of being favorable. Moreover, in
  the next section, we will develop a stochastic model based on the same
  assumption, and we will see (figure (<reference|ext1>)) that there is good
  agreement between the prediction of this model and the observed results.
  The second relevant assumption to compute the error threshold
  (<reference|errort>) is that the individuals belonging to the best
  performance class <math|s> have the same fidelity rate <math|Q<rsub|s>>. We
  will make the further assumption that, for TMs, the probability that a
  mutation occurring in a coding triplet is neutral is also negligible. Then,
  the fidelity rate of a TM with <math|N<rsub|c>> coding triplets is given by

  <\equation>
    Q=<around*|(|1-<pmm>|)><rsup|3*N<rsub|c>>,<label|qualityTM>
  </equation>

  since mutations occurring in non-coding triplets are, by definition,
  neutral. It follows that, for a given value of <math|<pmm>>, the fidelity
  rate is determined only by the number <math|<nc>> of coding triplets of the
  TM. The assumption that the best performing TMs have the same fidelity rate
  is therefore equivalent to the assumption that they have the same number of
  coding triplets. Figure <reference|sigmaNc> shows that this assumption is
  very close to the truth when considered for a given run (which is what we
  really need). However, notice that the relation between <math|s> and
  <math|Q<rsub|s>> varies considerably between different runs. This is
  particularly evident in figure <reference|correlation>, where the number of
  coding triplets associated with the performance scores of <math|47> and
  <math|48> exhibits a more than two-fold variation.

  Having established that the hypotheses under which we have obtained the
  error threshold (<reference|errort>) are accurate for our model, we can use
  it to determine the maximum allowed number of coding triplets for a TM. In
  the case of our model, <math|f=1>, so that equation (<reference|errort>)
  gives us the error threshold at <math|<wide|P|\<bar\>>=1/2>. The mutation
  probability for a TM with <math|<nc>> coding triplets is given by:

  <\equation>
    P=1-<around|(|1-<pmm>|)><rsup|3<nc>>.<label|mutP>
  </equation>

  By equating (<reference|mutP>) to the error threshold, we get the critical
  number of coding states for the TMs:

  <\equation>
    <nc><rsup|\<ast\>>=-<frac|ln <around|(|2|)>|3*ln
    <around|(|1-<pmm>|)>><label|Ncrit>
  </equation>

  This expression is represented by the thick black line in
  figure<nbsp><reference|nclimit>. The ultimate fate of TMs with a number of
  coding states larger than <math|<nc><rsup|\<ast\>>>, according to our
  deterministic model, is extinction.
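
  For concreteness, equations (<reference|mutP>) and (<reference|Ncrit>) can
  be evaluated numerically as in the following Python sketch. The function
  names are hypothetical, and the optional argument <verbatim|f> is an
  assumption added for illustration: it generalizes (<reference|Ncrit>) by
  equating (<reference|mutP>) to the threshold (<reference|errort>), and
  reduces to our formula for <math|f=1>.

```python
from math import log


def mutation_probability(p_m, n_c):
    """P = 1 - (1 - p_m)^(3 * N_c): at least one mutation in a coding triplet."""
    return 1.0 - (1.0 - p_m) ** (3 * n_c)


def critical_coding_triplets(p_m, f=1.0):
    """N_c* such that P(N_c*) = f / (1 + f); f = 1 gives -ln 2 / (3 ln(1 - p_m))."""
    return -log(1.0 + f) / (3.0 * log(1.0 - p_m))


# Three of the point mutation probabilities used in the simulations.
for p_m in (3.64e-4, 1.64e-3, 7.35e-3):
    n_crit = critical_coding_triplets(p_m)
    print(p_m, n_crit, mutation_probability(p_m, n_crit))
```

  For small <math|<pmm>> the critical number behaves approximately as
  <math|ln <around|(|2|)>/<around|(|3*<pmm>|)>>, so it is roughly inversely
  proportional to the point mutation probability.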

  <subsection|The stochastic model><label|stocha>

  In this section we will take into account the stochastic effects in our
  mutation and selection procedures.

  We recall that the constant population size <math|N> must be an even
  number. We will introduce a stochastic model for the evolution of only the
  number <math|n<rsub|s>> of individuals with the best performance value. Let
  us consider separately the selection and mutation steps.

  <paragraph|The selection step> Since we are interested in the evolution of
  the number <math|n<rsub|s>> of the best individuals only, we can put all
  the remaining <math|N-n<rsub|s>> individuals into the same class. Let us
  denote with the symbol \P<math|1>\Q the individuals of the best performance
  class and with the symbol \P<math|0>\Q all the others. We will denote by
  <math|n<rsub|s><rprime|'>> the number of individuals in the highest
  performance class in the new population. <math|n<rsub|s><rprime|'>> will be
  determined by the number of pairs <math|11>, <math|10> and <math|00> that
  we will get extracting random pairs without replacement from the old
  population. Let us denote by <math|k> the number of <math|11> pairs, by
  <math|l> the number of <math|10> pairs and by <math|m> the number of
  <math|00> pairs. As a consequence we will have

  <\equation>
    n<rsub|s>=2*k+l,<space|2em>n<rsub|s><rprime|'>=2*<around|(|k+l|)>,<space|2em>2*<around|(|k+l+m|)>=N.
  </equation>

  The probability that we get <math|n<rsub|s><rprime|'>=2*<around|(|k+l|)>>
  individuals in the best class when applying the selection step to a
  population with <math|n<rsub|s>=2*k+l> individuals in the best class is
  given by the probability that we extract <math|k> <math|11> pairs, <math|l>
  <math|10> pairs and <math|m> <math|00> pairs from a set containing
  <math|2*k+l> ones and <math|l+2*m> zeroes. This probability is given by:

  <\equation*>
    P<around|(|<wide|<around|(|11|)>*\<ldots\><around|(|11|)>|\<wide-overbrace\>><rsup|k><wide|<around|(|10|)>*\<ldots\><around|(|10|)>|\<wide-overbrace\>><rsup|l><wide|<around|(|00|)>*\<ldots\><around|(|00|)>|\<wide-overbrace\>><rsup|m>|)>=2<rsup|l><around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|k+l+m>>|<row|<cell|k>>>>>|)><around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|l+m>>|<row|<cell|l>>>>>|)><around*|/|<around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|2*<around|(|k+l+m|)>>>|<row|<cell|2*k+l>>>>>|)>|\<nobracket\>>
  </equation*>

  Indeed, the <math|2<rsup|l>> term takes into account that each of the
  <math|l> pairs <math|10> can be obtained extracting the <math|1> before the
  <math|0> or vice versa. The term

  <\equation*>
    <around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|k+l+m>>|<row|<cell|k>>>>>|)>
  </equation*>

  gives the number of possible distributions of the <math|k> <math|11> pairs
  inside the <math|k+l+m> total pairs. The term

  <\equation*>
    <around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|l+m>>|<row|<cell|l>>>>>|)>
  </equation*>

  gives the number of possible distributions of the <math|l> <math|10> pairs
  inside the remaining <math|l+m> pairs. Finally,

  <\equation*>
    <around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|2*<around|(|k+l+m|)>>>|<row|<cell|2*k+l>>>>>|)>
  </equation*>

  is the number of possible distributions of the <math|2*k+l> <math|1>
  symbols in the <math|2*<around|(|k+l+m|)>=N> possible places.

  Let us notice that

  <\itemize>
    <item><math|n<rsub|s><rprime|'>> is always even,

    <item><math|n<rsub|s><rprime|'>\<geq\>n<rsub|s>>,

    <item><math|n<rsub|s><rprime|'>\<leq\>2*n<rsub|s>>.
  </itemize>

  If we fix <math|n<rsub|s>> and <math|n<rsub|s><rprime|'>> satisfying the
  above constraints, we can obtain <math|k> and <math|l> as a function of
  <math|n<rsub|s>> and <math|n<rsub|s><rprime|'>>:

  <\equation*>
    k=<frac|2*n<rsub|s>-n<rsub|s><rprime|'>|2>,<space|2em>l=n<rsub|s><rprime|'>-n<rsub|s>.
  </equation*>

  Since the total population is fixed to <math|N> we have:

  <\equation*>
    2*<around|(|k+l+m|)>=N<space|1em>\<Longrightarrow\><space|1em>m=<frac|N-n<rsub|s><rprime|'>|2>
  </equation*>

  Hence, the probability of getting <math|n<rsub|s><rprime|'>> individuals
  in the best performance class after applying the selection procedure to a
  population with <math|n<rsub|s>> individuals in the best performance
  class is:

  <\equation*>
    P<rsub|r*i*p>*<around|(|n<rsub|s>\<to\>n<rsub|s><rprime|'>|)>=<around*|{|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|0<space|2em><math-up|if>n<rsub|s><rprime|'>\<less\>n<rsub|s><math-up|or>n<rsub|s><rprime|'>\<gtr\><math-up|min><around|(|2*n<rsub|s>,N|)>>>|<row|<cell|0<space|2em><math-up|if>n<rsub|s><rprime|'><math-up|odd>>>|<row|<cell|2<rsup|<around|(|n<rsub|s><rprime|'>-n<rsub|s>|)>><around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|<frac|N|2>>>|<row|<cell|<frac|2*n<rsub|s>-n<rsub|s><rprime|'>|2>>>>>>|)><around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|<frac|N-2*n<rsub|s>+n<rsub|s><rprime|'>|2>>>|<row|<cell|n<rsub|s><rprime|'>-n<rsub|s>>>>>>|)><around*|/|<around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|N>>|<row|<cell|n<rsub|s>>>>>>|)>|\<nobracket\>><space|2em><math-up|otherwise>.>>>>>|\<nobracket\>>
  </equation*>
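
  The selection probability above is straightforward to evaluate. The
  following Python sketch (the function name <verbatim|p_rip> and the values
  <math|N=20>, <math|n<rsub|s>=3> are hypothetical choices for this example)
  tabulates it for a small population and prints the normalization and the
  mean of <math|n<rsub|s><rprime|'>>.

```python
from math import comb


def p_rip(n_s, n_new, N):
    """Probability that selection turns n_s best individuals into n_new (N even)."""
    if n_new % 2 or n_new < n_s or n_new > min(2 * n_s, N):
        return 0.0
    k = (2 * n_s - n_new) // 2  # number of 11 pairs
    l = n_new - n_s             # number of 10 pairs
    # l + m = N/2 - k pairs remain once the 11 pairs have been placed
    return 2 ** l * comb(N // 2, k) * comb(N // 2 - k, l) / comb(N, n_s)


N, n_s = 20, 3
distribution = {n_new: p_rip(n_s, n_new, N) for n_new in range(N + 1)}
total = sum(distribution.values())
mean = sum(n_new * p for n_new, p in distribution.items())
print(total, mean)
```

  The mean should coincide with the deterministic selection step
  (<reference|nprimo>) restricted to the best class with <math|f=1>, namely
  <math|n<rsub|s>*<around|[|1+<around|(|N-n<rsub|s>|)>/<around|(|N-1|)>|]>>,
  which offers a simple consistency check between the two models.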

  <paragraph|The mutation step> Let us introduce mutation into the model. We
  will continue to use the two simplifying assumptions that we used for the
  deterministic model, namely:

  <\enumerate>
    <item>TMs in the best performance class have the same number of coding
    triplets <math|<nc>>.

    <item>Mutations in coding triplets are (almost) always deleterious.
  </enumerate>

  By definition, mutations in non-coding triplets are neutral.

  Under these assumptions, if <math|n<rsub|s><rprime|'>> is the number of
  best individuals before mutation, the probability of getting
  <math|n<rsub|s><rprime|''>=n<rsub|s><rprime|'>-k> individuals after the
  mutation step is given by

  <\equation*>
    P<rsub|m*u*t>*<around|(|n<rsub|s><rprime|'>\<to\>n<rsub|s><rprime|''>=n<rsub|s><rprime|'>-k|)>=<around*|(|<tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|n<rsub|s><rprime|'>>>|<row|<cell|k>>>>>|)>*P<rsup|k>*<around|(|1-P|)><rsup|n<rprime|'><rsub|s>-k>,
  </equation*>

  where we denoted by <math|P> the probability that an individual in the best
  performance class will undergo at least one mutation in a coding triplet:

  <\equation*>
    P=1-<around|(|1-<pmm>|)><rsup|3<nc>>
  </equation*>

  <paragraph|The Markov matrix> If the total population <math|N> is finite,
  under our assumptions the best individuals will always go extinct in a
  finite time. The expected number of generations <math|\<tau\>> before this
  happens can be computed using the Markov matrix <math|M> of the process
  <cite|AMS>. The entries <math|M<rsub|i*j>> of the Markov matrix give the
  probability that the system under scrutiny passes from its <math|i>th state
  to the <math|j>th one. In our case the state of the system is labeled by
  the number <math|n<rsub|s>> of individuals in the best performance class
  and the entries of <math|M> are given by

  <\equation*>
    M<rsub|n<rsub|s>+1,n<rsub|s><rprime|''>+1>=<big|sum><rsub|n<rsub|s><rprime|'>=0><rsup|N>P<rsub|r*i*p>*<around|(|n<rsub|s>\<to\>n<rsub|s><rprime|'>|)>*P<rsub|m*u*t>*<around|(|n<rsub|s><rprime|'>\<to\>n<rsub|s><rprime|''>|)>,<space|2em>n<rsub|s>,n<rsub|s><rprime|''>=0,\<ldots\>,N
  </equation*>

  The state <math|n<rsub|s><rprime|''>=0> is an absorbing state for
  <math|M>, and the procedure to compute the expected number of generations
  <math|\<tau\>> for reaching it works as follows. Let <math|S> be the
  matrix that one obtains from <math|M> by removing the first row and the
  first column, corresponding to the only absorbing state, and let
  <math|<with|font-series|bold|c>> be an <math|N>-dimensional vector whose
  entries are all one. The matrix <math|\<bbb-I\>-S>, where <math|\<bbb-I\>>
  denotes the identity matrix, is invertible. If the Markov process begins in
  the state <math|i>, then the expected number of generations before
  extinction will be given by:

  <\equation>
    \<tau\>=<around*|[|<around|(|\<bbb-I\>-S|)><rsup|-1>*<with|font-series|bold|c>|]><rsub|i><space|0.17em>.<label|eqt>
  </equation>

  Let us stress that equation (<reference|eqt>) is obtained by assuming
  that the system evolves for an infinite number of generations. The expected
  extinction times versus the number of coding triplets are plotted in figure
  <reference|ext1> for <math|5> different values of the mutation probability.
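
  A minimal Python sketch of the computation in equation (<reference|eqt>)
  is given below. It is not the program used to produce the figure: the
  function names, the plain Gaussian elimination used in place of an explicit
  matrix inverse, and the small population size <math|N=10> are all
  assumptions made to keep the example self-contained.

```python
from math import comb


def p_rip(n_s, n_new, N):
    """Selection step: probability of going from n_s to n_new best individuals."""
    if n_new % 2 or n_new < n_s or n_new > min(2 * n_s, N):
        return 0.0
    k, l = (2 * n_s - n_new) // 2, n_new - n_s
    return 2 ** l * comb(N // 2, k) * comb(N // 2 - k, l) / comb(N, n_s)


def p_mut(n_sel, n_new, P):
    """Mutation step: binomial loss of n_sel - n_new best individuals."""
    if n_new > n_sel:
        return 0.0
    k = n_sel - n_new
    return comb(n_sel, k) * P ** k * (1.0 - P) ** (n_sel - k)


def expected_extinction_times(N, P):
    """Solve (I - S) tau = c, where S is M restricted to the states 1..N."""
    S = [[sum(p_rip(a, mid, N) * p_mut(mid, b, P) for mid in range(N + 1))
          for b in range(1, N + 1)] for a in range(1, N + 1)]
    # Augmented matrix [I - S | c], with c the vector of ones.
    A = [[(1.0 if i == j else 0.0) - S[i][j] for j in range(N)] + [1.0]
         for i in range(N)]
    for col in range(N):  # forward elimination with partial pivoting
        pivot = max(range(col, N), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, N):
            factor = A[r][col] / A[col][col]
            for c in range(col, N + 1):
                A[r][c] -= factor * A[col][c]
    tau = [0.0] * N
    for i in reversed(range(N)):  # back substitution
        tail = sum(A[i][j] * tau[j] for j in range(i + 1, N))
        tau[i] = (A[i][N] - tail) / A[i][i]
    return tau  # tau[i - 1] is the expected time starting from n_s = i


p_m, n_c, N = 7.35e-3, 100, 10  # N = 10 is a hypothetical small population
P = 1.0 - (1.0 - p_m) ** (3 * n_c)
tau = expected_extinction_times(N, P)
print(tau[0])  # start from a single best individual (i = 1)
```

  Solving the linear system instead of inverting <math|\<bbb-I\>-S> gives
  the same vector and is usually preferable numerically, especially when
  <math|P> is small and the matrix is close to singular.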

  <subsection|Simulations settings>

  In this subsection we introduce the parameter values that we adopted in our
  computer simulations.

  As the goal tape we chose the binary expansion of the fractional part
  of <math|\<pi\>> (the dots are just a useful separator):

  <\equation*>
    <tabular*|<tformat|<cwith|1|-1|1|1|cell-halign|l>|<cwith|1|-1|1|1|cell-lborder|0ln>|<cwith|1|-1|1|1|cell-rborder|0ln>|<table|<row|<cell|0010010000.1111110110.1010100010.0010000101.1010001100.0010001101.0011000100.1100011001.>>|<row|<cell|1000101000.1011100000.0011011100.0001110011.0100010010.1001000000.1001001110.0000100010.>>|<row|<cell|0010100110.0111110011.0001110100.0000001000.0010111011.1110101001.1000111011.0001001110.>>|<row|<cell|0110110010.0010010100.0101001010.0000100001.1110011000.1110001101.>>>>>
  </equation*>

  As a consequence, the maximum possible performance value is <math|125>. We
  performed simulations with the following (approximate) values of the
  states-increase rate <math|<pii>> and point mutation probability
  <math|<pmm>>:

  <eqnarray*|<tformat|<table|<row|<cell|<pii>>|<cell|\<in\>>|<cell|<around*|{|9.26\<cdot\>10<rsup|-5><space|0.17em>;1.66\<cdot\>10<rsup|-4><space|0.17em>;3.00\<cdot\>10<rsup|-4><space|0.17em>;5.40\<cdot\>10<rsup|-4><space|0.17em>;9.72\<cdot\>10<rsup|-4><space|0.17em>;1.75\<cdot\>10<rsup|-3><space|0.17em>;|\<nobracket\>>>>|<row|<cell|>|<cell|>|<cell|<around*|\<nobracket\>|3.14\<cdot\>10<rsup|-3><space|0.17em>;5.68\<cdot\>10<rsup|-3><space|0.17em>;1.02\<cdot\>10<rsup|-2><space|0.17em>;1.85\<cdot\>10<rsup|-2><space|0.17em>;3.33\<cdot\>10<rsup|-2>;6.00\<cdot\>10<rsup|-2><space|0.17em>;|\<nobracket\>>>>|<row|<cell|>|<cell|>|<cell|<around*|\<nobracket\>|1.08\<cdot\>10<rsup|-1><space|0.17em>;1.95\<cdot\>10<rsup|-1><space|0.17em>;3.51\<cdot\>10<rsup|-1><space|0.17em>;6.33\<cdot\>10<rsup|-1><space|0.17em>;1.14|}><space|0.17em>.>>|<row|<cell|<pmm>>|<cell|\<in\>>|<cell|<around*|{|4.91\<cdot\>10<rsup|-5><space|0.17em>;8.10\<cdot\>10<rsup|-5><space|0.17em>;1.34\<cdot\>10<rsup|-4><space|0.17em>;2.21\<cdot\>10<rsup|-4><space|0.17em>;3.64\<cdot\>10<rsup|-4><space|0.17em>;6.01\<cdot\>10<rsup|-4><space|0.17em>;|\<nobracket\>>>>|<row|<cell|>|<cell|>|<cell|<around*|\<nobracket\>|9.91\<cdot\>10<rsup|-4><space|0.17em>;1.64\<cdot\>10<rsup|-3><space|0.17em>;2.70\<cdot\>10<rsup|-3><space|0.17em>;4.44\<cdot\>10<rsup|-3><space|0.17em>;7.35\<cdot\>10<rsup|-3><space|0.17em>|}><space|0.17em>.>>>>>

  These values have been chosen in such a way that consecutive ones have a
  constant ratio. For any pair of values <math|<pii>,<pmm>>, we performed
  <math|20> simulations varying the initial seed of the C native random
  number generator, for a total of <math|3740> runs. Each simulation lasted
  <math|50000> generations.

  <section|Results><label|results>

  <subsection|Performance, coding triplets and mutation probabilities>

  In this subsection we analyze how the performance and the number of coding
  triplets of the best performing machines vary with the different values of
  the mutation and states-increase rates.

  In figure <reference|performance> we plot the best performance value
  obtained in the population at the last generation (averaged over the
  different choices of the seed) versus the states-increase rate <math|<pii>>
  and the mutation probability <math|<pmm>>. The maximum performance value of
  <math|50.6> is obtained for the maximum value of <math|<pii>>,
  <math|<pii>\<simeq\>1.14> (see figure <reference|performance>.c) and an
  intermediate value of <math|<pmm>>, <math|<pmm>\<simeq\>3.64\<cdot\>10<rsup|-4>>
  (see figure <reference|performance>.d). In figure <reference|esoni>, we
  show the number of coding triplets <math|<bnc>> (averaged over the best
  performing machines at the last generation and over the seeds), versus
  <math|<pii>> and <math|<pmm>>. Again the maximum value <math|<bnc>=333.9>
  is obtained for exactly the same values of <math|<pii>\<simeq\>1.14> and
  <math|<pmm>\<simeq\>3.64\<cdot\>10<rsup|-4>>. This fact and the similarity
  between figure <reference|performance>.a and <reference|esoni> suggest a
  strong correlation between the performance and the number of coding
  triplets. Indeed, the correlation coefficient between them is <math|r=0.95>
  (see also figures <reference|correlation>, <reference|doppio>,
  <reference|correlazione>). The fact that the maximum performance occurs for
  an intermediate value of <math|<pmm>> is also partially due to this
  correlation. Indeed, if the mutation probability is too low, there is not
  enough variability among the TMs for selection to work on, while when the
  mutation probability is too high, the error threshold exerts a strong
  limiting action on the maximum number of coding triplets. This latter
  effect is clearly visible in figures <reference|correlation> and
  <reference|nclimit>, both taken after 50000 generations. Indeed, in figure
  <reference|correlation>, the abscissa positions seldom exceed the
  corresponding vertical lines at <math|<nc><rsup|\<ast\>>>. The presence of
  the error threshold also affects the trend of the performance with the
  generations. Indeed, when both <math|<pmm>> and <math|<pii>> are large, the
  TMs approach very early the maximum number of coding triplets. From that
  moment on, further accumulation of coding triplets is strongly opposed by
  mutation and selection. This leads to a saturation in the performance and
  in the number of coding triplets that is clearly visible in the plateau of
  figure <reference|doppio>, (b) and (d). Notice that this plateau effect is
  not present when <math|<pii>> is small (figure <reference|doppio>, (a) and
  (c)). This behaviour suggests that an adaptive choice for the mutation
  probability could maximize the speed of evolution. Indeed, one could start
  with a high mutation rate in the first generations to increase the
  variability, progressively diminishing it as the number of coding
  triplets increases, to reduce the limiting effect due to the error
  threshold. We presented a proposal for the optimal adaptive mutation
  probability for this model in <cite|Ideal>.

  Figure <reference|correlazione> shows, on a log-log scale, the relation
  between <math|<bnc>> and <math|<pii>>. The straight line of linear
  regression has been evaluated in the range
  <math|<pii>\<leq\>3.33\<cdot\>10<rsup|-2>>. The reason is that this range
  corresponds to the one considered in <cite|PRE>, allowing us to compare the
  two results. Moreover, it is clear that the linear regime does not hold for
  large values of <math|<pii>>, for which we observe a saturation effect. The
  regression gives the relation

  <eqnarray|<tformat|<table|<row|<cell|<bnc>=7.3\<cdot\>10<rsup|2><pii><rsup|0.46>,>|<cell|>|<cell|<text|present
  simulations,>>>|<row|<cell|<bnc>=2.5\<cdot\>10<rsup|3><pii><rsup|0.53>,>|<cell|>|<cell|<text|paper
  <cite|PRE>,><eq-number><label|scala>>>>>>

  both exponents being close to <math|<frac|1|2>>. Now, if <math|<bnt>> is
  the total number of states, its expected value is

  <\equation>
    <bnt>=50000\<cdot\><pii>+1<space|1em>\<Rightarrow\><space|1em><pii>\<simeq\><frac|<bnt>|50000>,
  </equation>

  (this is true in the absence of selection but, as discussed in <cite|PRE>,
  it remains approximately true even with selection, except for very small
  values of <math|<pii>>) so that, using
  <math|<bnc>\<propto\><pii><rsup|1/2>> from (<reference|scala>),

  <\equation>
    <frac|<bnc>|<bnt>>\<propto\><frac|1|<sqrt|<pii>>>
  </equation>

  This means that the fraction of coding triplets in the total will decrease
  when <math|<pii>> increases (strictly speaking, this analysis holds only
  for the linear regime; however, it is clear that, for larger values of
  <math|<pii>>, the plateau of figure <reference|correlazione> corresponds to
  an amplification of this effect).

  As in our previous paper <cite|PRE>, we observe that the maximum
  performance is obtained for the maximum value of <math|<pii>>. However,
  this time, the trend of the performance with <math|<pii>> is not strictly
  monotonically increasing, since, as shown in figure
  <reference|performance>.c, there is a plateau for high values of
  <math|<pii>>. Notice that this plateau corresponds exactly to the region in
  figure <reference|correlazione> where the number of coding triplets reaches
  a saturation, so that this effect also is due to the presence of the error
  threshold. For various reasons, explained in <cite|PRE>, we believe that
  the performance should exhibit a maximum for a finite value of <math|<pii>>
  (this is one of the reasons that led us to extend upward the range of
  variation of <math|<pii>>). Unfortunately, it seems that if this maximum
  exists, it lies outside the range of values of <math|<pii>> that we
  selected. Finally, it is interesting to compare figure
  <reference|performance>.c, restricted to the range of <math|<pii>> values
  considered in <cite|PRE>, with figure 3.c of <cite|PRE>, which we
  reproduce here (see figure <reference|comparison>). Despite the fact that
  we changed the number of generations (they were <math|200000> in
  <cite|PRE>), the way the head can move on the tape (in <cite|PRE> it could
  also stay still) and the topology of the tape (in <cite|PRE> it was not
  periodic), the two profiles are very similar. This means that the
  dependence of the performance on <math|<pii>> is very robust in this model.
  This is not the case for the dependence of the performance on
  <math|<pmm>>, which is influenced, for example, by the choice of the number
  of generations.

  The trend of the performance with <math|<pii>> is interesting because it
  suggests that, in our model, the presence of inactive and free to mutate
  code strongly improves the evolvability of our populations.

  <subsection|Extinction times>

  In figure <reference|ext1>, we plot the base <math|10> logarithm of
  <math|\<tau\>>, the expected number of generations before extinction given
  by equation (<reference|eqt>) versus the number of coding triplets (red
  line), and superimpose the data obtained from our simulations (blue
  points). The red line is obtained numerically starting with a single
  individual in the best performance class (<math|i=1> in (<reference|eqt>))
  through the Markov matrix of the process, as explained in the previous
  section. The blue points give the observed value of <math|\<tau\>> averaged
  over bins through the following procedure. First, the whole range of the
  number of coding triplets is divided into bins. The size of the bin is
  different for the different values of the mutation probability <math|<pmm>>
  and is given by the smallest integer greater than or equal to the critical
  value for the number of coding triplets (see eq. (<reference|Ncrit>))
  divided by <math|40>. It is necessary to introduce bins because otherwise,
  especially when <math|<nc><rsup|\<ast\>>> is large, there are too few
  extinction events associated with any single value of the number of coding
  states to yield even minimal statistics. On the other hand, the bin size
  must not be so large that the expected number of generations before
  extinction varies considerably inside it. It seems that dividing
  <math|<nc><rsup|\<ast\>>> by <math|40> is a good choice for the bin size.

  Now, suppose that at a given generation <math|\<tau\><rprime|'>> the data
  register a drop in the maximum performance value. We then go back to the
  generation <math|\<tau\>> at which this performance value appeared and
  count the number of TMs scored with it. If this number is exactly two, we
  register the extinction time <math|\<tau\><rprime|'>-\<tau\>> and
  increment the number of extinction events registered in the bin containing
  the number of coding triplets of the two TMs at generation
  <math|\<tau\>>. We compare extinction data obtained starting with two
  individuals in the best performance class with those obtained from the
  Markov process starting with a unique individual because our program
  registers the data after the selection step, when the best performing
  individual has already made a copy of itself. We neglect the quite
  improbable case in which two new best performing individuals (with the same
  performance) emerge after mutation and are extracted as a pair in the
  subsequent selection step. If the number of extinction events registered
  for a bin is greater than or equal to <math|5>, a blue point is plotted
  with an <math|x> coordinate equal to the center of the bin and a <math|y>
  coordinate equal to the mean of all the registered extinction times. We
  require at least <math|5> extinction events in each bin because the
  expected number of generations before extinction corresponds to the mean
  over an infinite number of extinction events. This pushes us to select a
  minimum number of extinction events as large as possible, to reduce the
  stochastic noise. On the other hand, if we choose too large a number, we
  get too few points from our data. Again, fixing the minimum number of
  extinction events per bin to <math|5> seemed to us a reasonable
  compromise.
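
  The binning procedure described above can be sketched in Python as follows.
  This is only an illustrative sketch, not the program used for the
  simulations: the function name, the input format (a list of pairs of number
  of coding triplets and extinction time), the choice of starting the first
  bin at zero and the definition of the bin center as the midpoint of the
  half-open interval are all assumptions made here.

```python
import math
from collections import defaultdict


def bin_extinction_times(events, n_crit, divisor=40, min_events=5):
    """Average extinction times over bins of the number of coding triplets.

    events: iterable of (n_coding, extinction_time) pairs (hypothetical format).
    n_crit: critical number of coding triplets for the given mutation probability.
    Returns a list of (bin_center, mean_extinction_time) points.
    """
    # Bin size: smallest integer greater than or equal to n_crit / divisor.
    bin_size = math.ceil(n_crit / divisor)

    bins = defaultdict(list)
    for n_coding, ext_time in events:
        # Assumption: bins are the intervals [k * bin_size, (k + 1) * bin_size).
        bins[n_coding // bin_size].append(ext_time)

    points = []
    for index in sorted(bins):
        times = bins[index]
        # Keep only bins with enough extinction events to limit the noise.
        if len(times) >= min_events:
            center = (index + 0.5) * bin_size
            points.append((center, sum(times) / len(times)))
    return points
```

  The base <math|10> logarithm of the second coordinate would then be taken
  when plotting, as in figure <reference|ext1>.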

  Figure <reference|ext1> corresponds to the <math|5> largest values of the
  mutation probability <math|<pmm>> considered in the simulations. For
  smaller values of <math|<pmm>>, too few \Pexperimental\Q points are
  obtained. The agreement between the theoretical model and the simulation
  data is extremely good from large values of <math|<nc>> down to the peak
  of the blue points, which occurs when the expected number of generations
  before extinction is near <math|100>. To the left of this peak, the
  agreement is completely lost and a peculiar monotonic growth of
  <math|\<tau\>> appears instead. The main reason is that we ran our
  simulations for <math|50000> generations, while the theoretical model
  assumes an infinite number of them. The agreement between the data and the
  theoretical model can therefore be good only as long as <math|50000>
  generations is a good approximation to <math|\<infty\>>, that is, when
  <math|50000> generations is much larger than the expected number of
  generations before extinction. Clearly, as this latter number increases,
  the approximation is doomed to worsen. Indeed, when the theoretical model
  predicts that the expected number of generations before extinction is
  larger than <math|50000> (around <math|y=4.7> in our graphs), agreement
  between the model and the simulations is impossible.
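
  The effect of the finite simulation length on the observed mean can be
  illustrated with a deliberately simplified sketch. Here we assume, purely
  for illustration, that extinction occurs with a constant probability
  <math|q> per generation (a geometric distribution); this is not the Markov
  process of the Methods, and the function name and the parameter <math|q>
  are hypothetical.

```python
import math


def truncated_mean(q, horizon):
    """Mean of a geometric extinction time conditioned on t <= horizon.

    q: assumed constant extinction probability per generation (toy model).
    horizon: number of simulated generations (50000 in the text).
    """
    survive = 1.0      # probability of surviving up to generation t - 1
    total_prob = 0.0   # probability of extinction within the horizon
    weighted = 0.0     # sum of t * P(extinction exactly at generation t)
    for t in range(1, horizon + 1):
        p_t = survive * q
        total_prob += p_t
        weighted += t * p_t
        survive *= 1.0 - q
    return weighted / total_prob


# The horizon of 50000 generations in the logarithmic scale of the plots.
log_horizon = math.log10(50000)
```

  In this toy model, for <math|q=10<rsup|-2>> the truncated mean is
  essentially the untruncated value of <math|100> generations, whereas for
  <math|q=10<rsup|-6>> it drops to roughly half the horizon, far below the
  untruncated mean of <math|10<rsup|6>> generations.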

  We suggest two possible mechanisms to explain why, in the region where
  there is no agreement with the theoretical model, the observed extinction
  times increase with the number of coding triplets. The first mechanism is
  that, in this region, there is a relatively high probability of extinction
  when the best TMs have just emerged. Indeed, we know that at the beginning
  there will be only two TMs in the best performance class (otherwise we do
  not register the data, as we explained above). If both TMs undergo a
  mutation (in a coding triplet) in the next mutation phase, then they will
  most probably go extinct. On the other hand, if they start to spread into
  the population, then extinction becomes more and more improbable and, as a
  consequence, the waiting time before observing it increases greatly. In
  these cases, the cutoff at <math|50000> generations will throw away a
  considerable portion of the extinction probability distribution, with the
  effect of amplifying the weight of the probabilities before that
  generation. This effect is clearly visible in figure
  <reference|estinzione>, where we plot, for
  <math|<pmm>=4.44\<cdot\>10<rsup|-3>> and five different values of
  <math|<nc>>, the extinction probability distribution renormalized to
  <math|1> in the range between <math|1> and <math|32768> generations (this
  number is dictated by computational reasons). We see that the relative
  probability of observing an extinction event in the first few generations
  decreases with <math|<nc>> below <math|<nc><rsup|\<ast\>>> and increases
  after, in accordance with figure <reference|ext1>.

  Another mechanism can amplify this effect. Indeed, if the extinction time
  of the TMs is large, the probability that the performance value increases
  in the meantime also grows. In such a case, the extinction of the original
  TMs simply will not be registered, creating a bias toward short extinction
  times.

  <subsection|The route to the error threshold through punctuated equilibria>

  Let us notice that all the possible output tapes (and consequently all the
  possible performance scores) can be obtained with a <math|300>-state TM
  with <math|300> coding triplets and running for <math|300> time steps.
  Indeed, let us denote by <math|o> the desired output tape and by
  <math|o<rsub|i>> the entry of its <math|i>th cell; then the following
  <math|300>-state TM will produce it:

  <\equation*>
    <tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|1|1|1|-1|cell-bborder|1ln>|<cwith|2|2|1|-1|cell-bborder|1ln>|<cwith|3|3|1|-1|cell-bborder|1ln>|<table|<row|<cell|>|<cell|<with|font-series|bold|i>>>|<row|<cell|0>|<cell|o<rsub|i>-<text|Right>-<text|<with|font-series|bold|i+1>>>>|<row|<cell|1>|<cell|\<ast\>-\<ast\>-<space|0.17em><with|font-series|bold|\<ast\>><space|0.17em>>>>>><space|1em><with|font-series|bold|i>=1,\<ldots\>,299,<space|2em><tabular*|<tformat|<cwith|1|-1|1|1|cell-lborder|1ln>|<cwith|1|-1|1|1|cell-halign|c>|<cwith|1|-1|1|1|cell-rborder|1ln>|<cwith|1|-1|2|2|cell-halign|c>|<cwith|1|-1|2|2|cell-rborder|1ln>|<cwith|1|1|1|-1|cell-tborder|1ln>|<cwith|1|1|1|-1|cell-bborder|1ln>|<cwith|2|2|1|-1|cell-bborder|1ln>|<cwith|3|3|1|-1|cell-bborder|1ln>|<table|<row|<cell|>|<cell|<with|font-series|bold|300>>>|<row|<cell|0>|<cell|o<rsub|300>-\<ast\>-<text|<with|font-series|bold|Halt>>>>|<row|<cell|1>|<cell|\<ast\><space|0.17em>-\<ast\>-<space|0.17em><with|font-series|bold|\<ast\>><space|0.17em>>>>>>
  </equation*>

  Here the <math|\<ast\>> symbol means that the corresponding entry of the
  state is irrelevant. Notice also that this is only one possible solution
  and, quite probably, not the shortest one.
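
  The construction above can be checked with a short Python sketch. It is
  only illustrative and relies on assumptions not restated in this section:
  the tape is initially blank (all zeros), the head starts on the first cell,
  and the irrelevant head move of the last state is taken to be no move. The
  names <verbatim|build_writer_tm> and <verbatim|run_tm> are hypothetical.

```python
HALT = "Halt"


def build_writer_tm(target):
    """Build the transition table that writes `target` on a blank tape.

    Only the entries for reading a 0 are stored, since the head always
    arrives on a blank cell; the entries for reading a 1 are irrelevant.
    table[state] = (symbol to write, head move, next state)
    """
    n = len(target)
    table = {}
    for i in range(1, n + 1):
        if i < n:
            table[i] = (target[i - 1], +1, i + 1)   # write o_i, move right
        else:
            table[i] = (target[i - 1], 0, HALT)     # write o_n, then halt
    return table


def run_tm(table, tape_length, max_steps):
    """Run the machine from state 1 on a blank tape; return (tape, steps)."""
    tape = [0] * tape_length
    state, head, steps = 1, 0, 0
    while state != HALT and steps < max_steps:
        if tape[head] != 0:
            raise ValueError("head reached a non-blank cell")
        symbol, move, next_state = table[state]
        tape[head] = symbol
        head += move
        state = next_state
        steps += 1
    return tape, steps
```

  With a target tape of <math|300> cells, the table contains exactly
  <math|300> used triplets and the run halts after <math|300> time steps,
  well within the <math|4000> allowed.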

  Figure <reference|correlation> shows that, for some of the values of the
  mutation probabilities, TMs can indeed accumulate <math|300> coding
  triplets, or even more, during the <math|50000> generations. So, the TMs
  could, in principle, attain the maximum possible performance value of
  <math|125>. However, the actual maximum performance value obtained in the
  <math|3740> runs of our simulations is <math|70>, quite far from the
  theoretical maximum. This is because TMs do not optimize their use of
  coding triplets, as is apparent from figure
  <reference|correlation>. Indeed, if we consider the various TMs that obtain
  a performance value of <math|47> (for example), we see that the number of
  coding triplets spans a range from <math|131> to <math|443>. It is worth
  noticing that by diminishing the mutation probability, the number of coding
  triplets tends to spread and to shift toward larger values. Moreover, even
  if the TMs use many more coding triplets than strictly necessary, we see
  from figure <reference|doppio> that once they approach the error threshold,
  the performance growth with generations slows down considerably, while the
  number of coding triplets remains practically constant. In this way, an
  \Phistorical\Q factor is introduced into the evolutionary dynamics; namely,
  once the TMs have wasted their coding triplets, they need a very large
  amount of time to achieve a more efficient usage.

  The mathematical model developed in the Methods showed that under the
  following hypotheses

  <\enumerate>
    <item><math|Q<rsub|i>\<less\>1\<forall\>i>,

    <item><math|Q<rsub|i>> is a monotonically decreasing function of
    <math|i>,

    <item><math|g<rsub|i+1,i>\<neq\>0\<forall\>i=1,\<ldots\>,M-1>,
  </enumerate>

  the mutation-selection dynamics will always decrease the fidelity rates of
  the evolving organisms (in our case the TMs), until they reach the error
  threshold <math|<wide|Q|\<bar\>>> or the highest performance class
  <math|M>. All of these hypotheses do hold true for our evolutionary model.
  Indeed, the first one is implied by the definition (<reference|qualityTM>)
  of the fidelity rate for the TMs, which was also used to calculate the
  extinction times. From (<reference|qualityTM>) and from the fact that the
  performance and the number of coding triplets are positively correlated it
  follows that, on average, the fidelity rate decreases while the performance
  increases. The second hypothesis corresponds to the deterministic limit of
  this effect.

  The third hypothesis states that it is always possible to increase by one
  the performance through mutations. From the simulations we see (figure
  <reference|doppio>) that the performance grows almost linearly with the
  generations until approaching the critical number of coding triplets, thus
  supporting this hypothesis.

  From a theoretical point of view, a performance increase of one can be
  obtained in our model in the following way. Suppose that the TM stops
  before reaching the maximum of <math|4000> time steps (by entering the halt
  state). Let <math|d> be the distance on the output tape between the head
  position after the TM stopped and the nearest cell that would improve the
  performance score if its value were changed. The TM can then increase its
  performance by one by adding <math|d> further coding triplets that move the
  machine head to the desired cell (without altering the intermediate cells)
  and change its value. What is important is that the probability of this
  process is not exceedingly small, so that performance increases can be
  observed within the generation range. The above mechanism does not work for
  non-halting machines. Indeed, since the maximum observed number of coding
  triplets is less than <math|600>, non-halting machines have to use some
  subset of them several times. If we introduce a mutation inside a coding
  triplet belonging to this subset, with the aim of modifying the dynamics at
  a given time step, we cannot predict how it will affect the earlier
  dynamics (notice that, in the previous case, we are sure that the triplet
  calling the Halt state is executed only once). So, in the non-halting case,
  we cannot see any simple recipe to increase the performance. In general,
  mutations inside the above subset of coding triplets will probably result
  in a big change in the output tape and will almost always be discarded by
  selection. As a result, non-halting machines need longer times to improve
  their performance. We analyzed the <math|3740> best performing machines
  that we registered at the end of the <math|50000>th generation and found
  that <math|2730> stop by calling the halt state, while the remaining
  <math|1010> stop by exhausting the <math|4000> time steps. The former TMs
  have a better average performance (<math|19.57>) than the latter
  (<math|11.95>). We found that, on average, the distance <math|d> is
  <math|5.99> for the halting TMs and <math|2.47> for the non-halting TMs. As
  a reference value, the average distance between two adjacent ones on the
  output tape is <math|2.44>. The fact that non-halting machines have, on
  average, lower performance and <math|d> values is consistent with the above
  analysis.

  Since in our simulations no TM reached the maximum performance of
  <math|125>, our deterministic model predicts that they will accumulate
  coding triplets until reaching the error threshold. However, this model
  assumes an infinite number of generations, while our simulations last
  <math|50000>. We can see from figures <reference|doppio>.c and
  <reference|doppio>.d that, when far from the critical number of coding
  triplets, the TMs do indeed accumulate coding triplets at a steady rate
  that depends on the values of <math|<pmm>> and <math|<pii>>. While for high
  values of these probabilities <math|50000> generations are enough to reach
  the critical number of coding triplets, for low ones they are not (see
  figures <reference|nclimit>, <reference|doppio>.c, <reference|doppio>.d and
  <reference|correlation>). We conclude that our evolutionary model gives a
  working example of the dynamical behaviour predicted by the deterministic
  model described in the previous section.

  In figure <reference|equilibria>.a we show the time evolution of the
  highest performance score for all values of the state-increase probability
  <math|<pii>>, for a particular choice of the seed of the random number
  generator and <math|<pmm>=3.64\<cdot\>10<rsup|-4>>. We observe the
  dynamical behaviour typical of punctuated equilibria <cite|Punctuated>:
  long periods of stasis and brief periods of rapid evolution. The same kind
  of evolution is also observed for the other choices of the seed, so we
  present figure <reference|equilibria>.a as a representative case. The
  apparent big jumps in the performance in figure <reference|equilibria>.a,
  such as that from <math|28> to <math|46> in the boxed region, are simply
  due to the time scale used. A zoom of the boxed region (fig.
  <reference|equilibria>.b) shows that this big jump is really composed of
  <math|14> jumps of one point and <math|2> jumps of two points, occurring in
  a relatively small number of generations. The same is true also for the
  other big performance jumps observed in figure <reference|equilibria>.a.
  Indeed, figure <reference|salti> shows the histogram of the number of
  occurrences of positive performance jumps versus their amplitude.
  Performance jumps of one point (the minimum possible value) are, by far,
  the most common, while performance jumps larger than three are extremely
  infrequent. There is however a single, quite amazing, performance jump of
  <math|12> that is not visible in figure <reference|salti> due to the scale
  used. The mean positive performance jump in our simulations is <math|1.16>.
  This value seems to us sufficiently near to the minimum possible value of
  <math|1> that we would say that the evolution of the performance in our
  model is essentially gradualistic. Let us stress that while the mechanism
  proposed in <cite|Punctuated> is based on a particular speciation
  mechanism, in our case this dynamical behaviour is determined only by the
  form of the performance landscape.
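
  The jump statistics discussed above (histogram of positive jumps and their
  mean) can be computed from a time series of the best performance value as
  in the following Python sketch. The function names and the input format
  (one best performance value per generation) are assumptions made for
  illustration only.

```python
from collections import Counter


def positive_jumps(best_performance):
    """Return the amplitudes of all increases between consecutive generations."""
    return [b - a
            for a, b in zip(best_performance, best_performance[1:])
            if b > a]


def jump_statistics(best_performance):
    """Return (histogram of jump amplitudes, mean positive jump)."""
    jumps = positive_jumps(best_performance)
    histogram = Counter(jumps)
    mean = sum(jumps) / len(jumps) if jumps else 0.0
    return histogram, mean
```

  As a consistency check, a series rising from <math|28> to <math|46> through
  <math|14> jumps of one point and <math|2> jumps of two points gives
  <math|14+2\<cdot\>2=18=46-28> and a mean jump of <math|18/16=1.125>.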

  The mean performance jumps for different values of <math|<pmm>> and
  <math|<pii>> fluctuate between <math|1.00> and <math|1.27>, the largest
  value appearing for <math|<pmm>\<simeq\>1.64\<cdot\>10<rsup|-3>> and
  <math|<pii>\<simeq\>3.51\<cdot\>10<rsup|-1>>. Notice that these probability
  values do not coincide with those associated with the largest performance,
  namely <math|<pmm>\<simeq\>3.64\<cdot\>10<rsup|-4>>, <math|<pii>=1.14>,
  whose corresponding mean performance jump is <math|1.13>, which is lower
  than average.

  For the value of the mutation probability considered in figure
  <reference|equilibria>, the TMs stay far from the error threshold and we
  see the typical increasing trend of performance with generations. In figure
  <reference|oscillations> we present the same graph but for a much higher
  value of the mutation probability <math|<pmm>>. In this case, for some
  values of <math|<pii>>, the TMs do indeed reach the error threshold. From
  that moment on, a typical oscillatory behaviour around a base performance
  value emerges. There are many performance increases followed by a rapid
  extinction, and also performance decreases followed shortly afterwards by
  back mutations.

  <section|Discussion><label|conclusions>

  In this paper we studied, through computer simulations and mathematical
  modeling, the dynamics of an evolutionary model for Turing machines. In the
  mathematical models, by imposing suitable hypotheses on the impact of
  mutations on the performance landscape, we were able to compute the value
  of the error threshold and the expected extinction times for Turing
  machines versus the mutation rate. The agreement between theoretical and
  simulation data (see fig. <reference|ext1>) shows that the hypotheses we
  made are accurate for our model. Our main finding is that evolution pushes
  the TMs towards the error threshold. Again, we substantiated this finding
  through mathematical analysis, by showing that this behaviour is due to the
  mutation and selection mechanisms used and to some further hypotheses
  related to the structure of the performance landscape. Consequently, the
  question of the similarity between TMs and biological organisms is
  irrelevant for assessing the biological relevance of this finding. What is
  really relevant is the biological plausibility of the mutation-selection
  mechanisms and of the hypotheses employed. Let us stress that, despite this
  fact, our model still has to be considered a toy model of evolution, in
  that it contains simplifying (hence necessarily unrealistic) hypotheses
  that make it mathematically tractable. Nevertheless, toy models can give
  valuable suggestions on mechanisms that also work in the full,
  non-simplified system of which they are approximations. In the following we
  discuss the hypotheses we made from a biological point of view.

  The main and most relevant approximation is to consider a unique and fixed
  performance landscape. In nature, the existence of different ecological
  niches, the changing environment and the coevolution with other species
  give rise to multiple and ever changing fitness landscapes. While it is
  difficult to estimate how this approximation influences our conclusion, it
  is worth noticing that an ever changing performance landscape makes a
  perfectly fit organism substantially unattainable. By ruling out one of the
  possible end points of evolution, this fact could reinforce our results.

  The deterministic mutation-selection model that we used is completely
  specified by the selection mechanism and by the choices of the parameters
  <math|Q<rsub|j>>, <math|j=1,\<ldots\>,M>, <math|g<rsub|i*j>>,
  <math|i,j=1,\<ldots\>,M> and <math|f>. The values of <math|Q<rsub|j>> and
  <math|g<rsub|i*j>> are related to the mutation mechanisms and to the
  genotype <math|\<to\>> performance mapping, while <math|f> specifies,
  through the tournament selection, the performance <math|\<to\>> fitness
  mapping. Let us first discuss the selection mechanism and the choice of
  <math|f>. First of all, the selection mechanism keeps the total population
  <math|N> constant (soft selection). This is a frequent assumption in
  population genetic models (for instance, it is used in the Wright-Fisher
  model and in the Eigen model). From a biological point of view it amounts
  to assuming that the population fecundity is always sufficient to keep the
  population at the (constant) carrying capacity <math|N> of the environment.
  The fact that only two individuals are compared at each generation is
  clearly unrealistic from a biological point of view. However, over a large
  number of generations, virtually all individuals will have interacted
  through these pairwise interactions, so we think that a more realistic
  interaction mechanism would not alter the conclusions. The assumption that
  the fitness difference <math|f> depends on the performance values of the
  two individuals only through the sign of their difference is also not
  realistic. However, since the conclusion that the population will
  eventually reach the error threshold holds for any <math|f\<gtr\>0>, a more
  realistic choice would not change it, but would only affect the population
  distribution in performance classes near the error threshold.

  Regarding the mutation mechanism, we needed three basic assumptions:

  <\enumerate>
    <item>there is no perfect replicator,

    <item>the fidelity rates and the performance classes are negatively
    correlated,

    <item>the probability of improving the performance is never exceedingly
    small.
  </enumerate>

  The first and the third hypotheses seem to us perfectly acceptable from a
  biological point of view. The second assumption deserves a deeper
  discussion. It can be interpreted as saying that the performance of an
  individual is incremented mainly through the addition of coding DNA (here
  we use \Pcoding\Q in the same informatic/algorithmic sense that we used for
  our TMs model, to indicate parts of the genome that influence the
  phenotype; the usual biological meaning would be to indicate protein coding
  sequences), and that this addition increases the probability of undergoing
  a non-neutral mutation.

  The latter statement is quite natural: if, for example, an organism
  increases its performance by converting a piece of junk DNA into a new gene
  (the metabolic cost should be taken into account to evaluate whether there
  is a real performance increase), all the mutations that inactivate the new
  gene will be new non-neutral mutations. As for the first part, one has to
  assume that organisms improve their adaptation more by increasing their
  coding DNA than by reorganizing it.

  We suggested that the fact that the mutation-selection dynamics pushes the
  evolving organisms towards the error threshold is due to quite mild
  hypotheses and could be a quite robust property of evolutionary systems.
  Indeed, the same phenomenon appears in another artificial evolution
  experiment <cite|Knibbeetal2007b>, where the codification of the evolving
  algorithms and the mutation and selection mechanisms used are completely
  different from ours.

  At the biological level, RNA viruses have error rates (per genome per
  replication) near to one and have been suggested to replicate near the
  error threshold (see for example <cite|Eigen2000> and the references
  therein), in accordance with the behaviour that we observe in our model.
  Indeed, these organisms lack proof-reading mechanisms, so that their
  mutation rates per nucleotide are quite large. Moreover, the necessity to
  escape the immune response produces a selective pressure toward a high
  variability. Notice, however, that this latter effect is not present in our
  evolutionary model, since we considered a static performance landscape. As
  for DNA-based organisms, they also have remarkably small variations in
  their mutation rates <cite|Drake>. However, their mutation rates are much
  smaller than those of RNA viruses, of the order of <math|1/300> per genome
  per replication. In <cite|Eigen2000> it has been suggested that these
  almost constant mutation rates could be due to the fact that DNA-based
  organisms also reproduce near the error threshold.
  Their higher fidelity rate could be explained by two factors: a larger
  number of neutrals in DNA sequences and the dissymmetry between the error
  rates of the two daughter DNA double strands (see <cite|Eigen2000>). While
  our results could encourage this explanation, the development of error
  correction mechanisms is not considered in our model. Approaching the error
  threshold surely induces a high selective pressure for the development of
  error correction mechanisms. The short-term advantage derived from an
  increase in the reproductive fidelity could be reinforced by the long-term
  advantage of a higher evolvability (if the evolvability of the organisms is
  much lower near the error threshold, as happens in our model; see figure
  <reference|doppio>). Possibly, the mutation rates observed for DNA could
  be due to a balance between the natural trend toward the error threshold,
  due to the mutation-selection dynamics, the need of proof-reading
  mechanisms and their metabolic costs.

  In this paper, we showed that some of the features observed in an
  artificial evolutionary model can have a much more general validity than
  the specific model itself. This happens when the phenomenon under
  consideration is mainly due to the mutation-selection dynamics, so that it
  can be described through a population genetic model. It seems to us that
  this synergistic integration between artificial evolution and population
  genetic models should be pursued, when possible. Since the phenomena
  observed in an artificial evolutionary model can have a quite wide degree
  of generality, we think that they can give interesting suggestions on
  possible evolutionary mechanisms working also at the biological level.

  <section*|Funding>

  This work was partially supported by the Spanish Ministerio de Ciencia e
  Innovación under grant MTM2007-67389 (with EU-FEDER support), by Junta de
  Castilla y León (Project GR224) and by UBU-Caja de Burgos (Project K07J0I).

  <thebibliography|99|<bibitem|PRE>Feverati G, Musso F (2008) Evolutionary
  Model for Turing Machines. Phys. Rev. E 77: 061901.<bibitem|Tierra>Ray TS
  (1991) An approach to the synthesis of life. In : Langton, C., C. Taylor,
  J. D. Farmer, S. Rasmussen editors, Artificial Life II, Santa Fe Institute
  Studies in the Sciences of Complexity, vol. XI, 371-408. Redwood City, CA:
  Addison-Wesley.<bibitem|Lenskietal1999>Lenski RE, Ofria C, Collier TC,
  Adami C (1999) Genome complexity, robustness and genetic interactions in
  digital organisms. Nature 400: 661\U664.<bibitem|Wilkeetal2001>Wilke CO,
  Wang JL, Ofria C, Lenski RE, Adami C (2001) Evolution of digital organisms
  at high mutation rates leads to survival of the flattest. Nature 412:
  331\U333.<bibitem|Lenskietal2003>Lenski RE, Ofria C, Pennock RT, Adami C
  (2003) The evolutionary origin of complex features. Nature 423:
  139\U144.<bibitem|Knibbeetal2007>Knibbe C, Mazet O, Chaudier F, Fayard JM,
  Beslon G (2007) Evolutionary coupling between the deleteriousness of gene
  mutations and the amount of non-coding sequences. J. Theor. Biol. 244:
  621\U630.<bibitem|Knibbeetal2007b>Knibbe C, Coulon A, Mazet O, Fayard JM,
  Beslon G (2007) A Long-Term Evolutionary Pressure on the Amount of
  Noncoding DNA. Mol. Biol. Evol. 24: 2344\U2353.<bibitem|Cluneetal2008>Clune
  J, Misevic D, Ofria C, Lenski RE, Elena SF, Sanjuán R (2008) Natural
  Selection Fails to Optimize Mutation Rates for Long-Term Adaptation on
  Rugged Fitness Landscapes. PLOS Comp. Biol. 4:
  e1000187.<bibitem|Gregory>Gregory TR (2001) Coincidence, coevolution or
  causation? DNA content, cell size, and the C-value Enigma. Biol. Rev. 76:
  65\U101.<bibitem|Luke>Luke S (2005) Evolutionary Computation and the
  c-Value Paradox. In: Genetic And Evolutionary Computation Conference,
  Proceedings of the 2005 conference on Genetic and evolutionary computation,
  H-G Beyer et al. editors, Association for Computing Machinery, Inc.,
  91\U97.<bibitem|Lenski>Barrick JE et al. (2009) Genome evolution and
  adaptation in a long-term experiment with
  <with|font-shape|italic|Escherichia coli>. Nature 461:
  1243\U1247.<bibitem|MaynardSmith>Maynard Smith J (1992) Byte-sized
  evolution. Nature 355: 772\U773.<bibitem|ONeill>O'Neill B (2003) Digital
  Evolution. PLoS Biol. 1: 11\U14.<bibitem|Eigen71>Eigen M (1971),
  Selforganization of Matter and the Evolution of Biological Macromolecules.
  Naturwissenschaften 58: 465\U523.<bibitem|Turing>Turing AM (1937) On
  computable numbers, with an application to the Entscheidungsproblem.
  Proceedings of the London Mathematical Society, Ser. 2, Vol. 42:
  230\U265.<bibitem|davis>Davis M (1982) Computability and unsolvability.
  Dover, New York.<bibitem|Beaver>Radó T (1962) On non-computable functions,
  Bell System Technical Journal, Vol. 41, No. 3:
  877\U884.<bibitem|Eigen-Schuster>Eigen M, Schuster P (1977) The hypercycle.
  A principle of natural self-organization. Part A: Emergence of the
  hypercycle. Naturwissenschaften 64: 541\U565.<bibitem|Swetina-Schuster>Swetina
  J, Schuster P (1982), Self-replication with errors. A model for
  polynucleotide replication. Biophys. Chem. 16:
  329\U345.<bibitem|Krall>Wagner GP, Krall P (1993), What is the difference
  between models of error thresholds and Muller's ratchet? J. Math. Biol. 32:
  33\U44.<bibitem|Wilke>Wilke CO (2005), Quasispecies theory in the context
  of population genetics. BMC Evol. Biol. 5:44.<bibitem|Takeuchi>Takeuchi N,
  Hogeweg P (2007), Error-threshold exists in fitness landscapes with lethal
  mutants. BMC Evol. Biol. 7:15.<bibitem|Saakiaan>Saakian DB, Hu CK (2006),
  Exact solution of the Eigen model with general fitness functions and
  degradation rates. PNAS 103: 4935\U4939.<bibitem|Tarazona>Tarazona P
  (1992), Error thresholds for molecular quasispecies as phase transitions:
  From simple landscapes to spin-glass models. Phys. Rev. A 45:
  6038\U6050.<bibitem|Eigen2000>Eigen M (2000) Natural selection: a phase
  transition? Biophys. Chem. 85: 101\U123.<bibitem|Nowak>Nowak MA (2006)
  Evolutionary Dynamics: Exploring the Equations of Life. Harvard University
  Press.<bibitem|Nowak89>Nowak M, Schuster P (1989), Error Thresholds of
  Replication in Finite Populations. Mutation Frequencies and the Onset of
  Muller's Ratchet. J. Theor. Biol. 137: 375\U395.<bibitem|io>Musso F (2010)
  A stochastic version of the Eigen model. Bull. Math.
  Biol.<bibitem|bull2005>Bull JJ, Meyers LA, Lachmann M (2005), Quasispecies
  Made Simple. PLOS Comp. Biol. 1: 450\U460.<bibitem|AMS>Grinstead CM, Snell
  JL (1997) Introduction to Probability: Second Revised Edition.
  AMS.<bibitem|Ideal>Musso F, Feverati G (2009) A Proposal for an Optimal
  Mutation Probability in an Evolutionary Model Based on Turing Machines.
  Lecture Notes in Computer Science 5788, H. Yin, E. Corchado editors,
  Springer Verlag, 735\U742.<bibitem|Punctuated>Eldredge N, Gould SJ (1972),
  Punctuated equilibria: an alternative to phyletic gradualism. In T.J.M.
  Schopf editor, Models in Paleobiology. San Francisco: Freeman Cooper. pp
  82\U115.<bibitem|Drake>Drake J, Charlesworth B, Charlesworth D, Crow JF
  (1998) Rates of Spontaneous Mutation. Genetics 148: 1667\U1686.>

  <new-page><ifcase><nofigure>

  <section*|Figures>

  \;

  <section*|Figure Legends>

  <fi>

  <big-figure|<ifcase><nofigure> <image|Figure01.eps||||> <fi>
  |<with|font-series|bold|Graphical representation of a Turing machine.> The
  machine is shown at time <math|t>, in the internal state
  <math|<math-bf|s><around|(|t|)>>, located on the <math|k<around|(|t|)>>-th
  cell of an infinite tape.<label|Turing>>

  <big-figure|<ifcase><nofigure><space|5mm><image|Figure02.eps||||> <fi>
  |<with|font-series|bold|Stable state for the occupation number of the
  highest occupied performance class.> The blue curve represents the function
  <math|n<rsub|s><rprime|''><around|(|n<rsub|s>|)>> defined by
  (<reference|nss>), while the red line corresponds to
  <math|n<rsub|s><rprime|''>=n<rsub|s>>. The green points represent <math|15>
  iterates of the discrete map (<reference|nss>) starting from the initial
  datum <math|<wide|n|\<bar\>>>. The asymptotic value of the map
  (<reference|nss>) will be <math|n<rsup|<around|(|2|)>><rsub|s>> for any
  initial datum <math|<wide|n|\<bar\>>\<neq\>0>.<label|dinamica>>

  <big-figure|<ifcase><nofigure><space|13mm><image|Figure03.eps||||> <fi>
  |<with|font-series|bold|Results of the numerical simulation described in
  section ``The deterministic model''.> The green points show the occupation
  numbers of the <math|40> performance classes obtained after
  <math|2\<cdot\>10<rsup|6>> generations while the red line connects those
  obtained after <math|10<rsup|6>> generations. The prediction in
  (<reference|qt>, <reference|quali>) for the best occupied performance class
  is <math|s=21>.<label|simul>>

  <big-figure|<ifcase><nofigure><space|30mm> <image|sigma.eps||||> <fi>
  |<with|font-series|bold|Histogram of the distribution of
  <math|\<sigma\>/<wide|<nc>|\<bar\>>> for the <math|3740> runs of our
  simulations.> <math|<wide|<nc>|\<bar\>>> is the average number of coding
  triplets for the best individuals in the last generation and
  <math|\<sigma\>> is the corresponding standard deviation. The size of the
  bins is <math|0.0025>.<label|sigmaNc> >

  <big-figure|<ifcase><nofigure><space|12mm><image|Figure05.eps||||> <fi>
  |<with|font-series|bold|Performance versus the number of coding triplets.>
  The performance is shown for the best performing TMs at generation
  <math|50000> for the <math|3740> runs of our simulations. Each color
  corresponds to a different value of the mutation probability as indicated
  in the scale under the image. The dashed lines correspond to the critical
  number of coding triplets for the <math|6> highest values of the mutation
  probability. For lower values the corresponding critical number of coding
  triplets lies outside of the graph. Notice that many points are
  superimposed.<label|correlation> >

  <big-figure|<ifcase><nofigure><space|32mm><image|Limit.eps||||> <fi>
  |<label|nclimit> <with|font-series|bold|Plot of the number of coding
  triplets for the best machine in the population.> The number of coding
  triplets after <math|50000> generations, averaged over the seeds, is shown as
  a function of <math|<pmm>>, for all the values of <math|<pii>>. The black
  thick line on the right represents the critical number of coding triplets,
  according to equation (<reference|Ncrit>).>

  <big-figure|<ifcase><nofigure><image|Figure07.eps||||> <fi>
  |<label|performance><with|font-series|bold|Best performance value in the
  population at the last generation.> The best performance is averaged over the
  twenty different seeds and plotted as a function of the states-increase
  rate <math|<pii>> and of the mutation rate <math|<pmm>>. (a) shows a 3D
  view, while subfigures (b), (c), (d) correspond to the three orthogonal
  projections.>

  <big-figure|<ifcase><nofigure><space|40mm><image|Figure08.eps||||> <fi>
  |<label|esoni> <with|font-series|bold|Number of coding triplets in the
  population at the last generation.> The number of coding triplets is
  averaged over the best machines and over the seeds; it is plotted versus
  <math|<pii>> (right) and <math|<pmm>> (left).>

  <big-figure|<ifcase><nofigure><image|Figure09.eps||||> <fi>
  |<label|doppio><with|font-series|bold|Data along the generations.> Here we
  show the evolution of the performance (top) and of the number of coding
  triplets (bottom) with the generations, for the values of <math|<pii>>
  indicated by the matching colours and for two values of <math|<pmm>>. In
  (d), the dashed line represents the maximal number of coding triplets
  (<reference|Ncrit>). In (c), the corresponding line lies outside the graph,
  since <math|<nc><rsup|\<ast\>>=1047.0>. Data are sampled every 100
  generations and averaged over the seeds.>

  <big-figure|<ifcase><nofigure><space|35mm><image|codingtriplets_pi.eps||||>
  <fi> |<with|font-series|bold|Correlation of the mean number of coding
  triplets <math|<wide|N|\<bar\>><rsub|c>> versus the states-increase
  probability <math|<pii>>.> For each value of <math|<pii>>, only the four
  best values of the final performance (at four different <math|<pmm>>) are
  retained for the evaluation of <math|<wide|N|\<bar\>><rsub|c>>, for each
  seed. The green linear regression line is computed on the
  range <math|<pii>\<leq\>3.33\<cdot\>10<rsup|-2>> only, in order to compare
  with figure 7 of <cite|PRE>.<label|correlazione>>

  <big-figure|<ifcase><nofigure><image|Figure11.eps||||> <fi>
  |<with|font-series|bold|Comparison between current and previous data.>
  Subfigure (a) corresponds to figure 4.c restricted to the range of
  <math|<pii>> values considered in <cite|PRE>. Subfigure (b) is the same as
  (a) but for the data obtained in <cite|PRE>.<label|comparison>>

  <big-figure|<ifcase><nofigure><space|10mm><image|Figure12.eps||||> <fi>
  |<with|font-series|bold|Extinction times vs mutation probabilities.>
  Logarithm of the theoretical (continuous line) and observed (points)
  extinction time <math|log<rsub|10> \<tau\>> versus the number of coding
  triplets <math|<nc>> for the indicated mutation probabilities. The black
  vertical lines correspond to <math|<nc><rsup|\<ast\>>>, the critical number
  of coding triplets of the deterministic model (eq.
  (<reference|Ncrit>)).<label|ext1>>

  <big-figure|<ifcase><nofigure><space|10mm><image|renormalized.eps||||> <fi>
  |<with|font-series|bold|Relative extinction probabilities versus the
  generation number.> The curves correspond to
  <math|<pmm>=4.44\<cdot\>10<rsup|-3>>, for five different values of
  <math|<nc>>. The error threshold for the given <math|<pmm>> is
  <math|<nc><rsup|\<ast\>>\<sim\>52>, represented by the blue line. Data are
  renormalized by setting to one the probability of observing an extinction
  event before generation <math|32768>.<label|estinzione>>

  <big-figure|<ifcase><nofigure><space|-5mm><image|storia_1234_pm_0.000364.eps||||>
  <fi> |<label|equilibria><with|font-series|bold|Each coloured line shows the
  evolution of the performance over the generations for a single
  simulation.> The mutation and seed values are shown in the upper left
  corner, while the state-increase rate <math|<pii>> is indicated on the
  right by the matching colour. We observe long periods of stasis
  alternating with short periods of fast evolution. The region inside the
  small black rectangle is magnified in the right part of the figure to show
  the actual jumps in the performance.>

  <big-figure|<ifcase><nofigure><space|30mm><image|salti.eps||||><vspace|-5mm>
  <fi> |<with|font-series|bold|Distribution of the performance jumps versus
  their amplitudes.> This histogram shows the number of increases in the
  performance versus their amplitude.<label|salti> >

  <big-figure|<ifcase><nofigure><space|30mm><image|storia_1234_pm_0.004444.eps||||>
  <fi> |<label|oscillations> <with|font-series|bold|Performance evolution
  near the error threshold.> Here, as in figure <reference|equilibria>, we
  show the growth of the performance over the generations, but for the much
  higher value of the mutation probability <math|<pmm>=0.0044>. For this
  value of <math|<pmm>> and certain values of <math|<pii>>, TMs reach the
  error threshold. From there on, a typical oscillatory pattern emerges.>
</body>