Sunday, November 21, 2010

Collatz Ecologies

I mentioned the Collatz Conjecture in "On Design" as an example of the qualitative difference between simulation and inverse problem solving. In this article I want to use it for another purpose: to show how structure emerges out of iteration. Specifically, I want to create a very simple model of Darwinian evolution and demonstrate with simulations and mathematical proof that patterns emerge naturally. In a later post I will talk more about what is significant about this, but here's the preview: when stable patterns emerge in some iterated system, it's possible to build new systems on top of the old ones. Moreover, these new systems can be seen as independent of the old ones. The full discussion on that can wait. This is the fun part.

Iteration at the heart of the conjecture is a single branching formula that works on (usually positive) integers:
$N_{new}:=\left\{ {N_{old}/2\mbox{ if even}\atop (3N_{old}+1)/2\mbox{ if odd}}\right.$
I have used the more compact form of the formula that goes ahead and divides by two in the odd case, since 3N+1 will always give an even number when N is odd. As an example, starting with 8, we get the sequence 8 ->; 4 ->; 2 ->;1, since all but the last is even (8 is a power of 2). A more interesting example is 3 -> 5 -> 8 -> 4 -> 2 -> 1. The unproven conjecture is that any starting number eventually ends up at one. If the conjecture is not true, then for some starting N, either the sequence grows without bound, or it forms a repeating loop. The Wikipedia page has a nice summary of what is known.

Artificial Life is the use of computer simulations to understand biological-like behavior. Conway's Game of Life is one of the best known. For more on the general topic see the Wiki on ALife. For our purposes here, I want to use the Collatz iterator to construct a population that is subject to Darwinian evolution. For that we need these components:

  1. An initial state from which to begin the simulation. In practice this will be an individual species, identified with an odd integer 3,5,7...
  2. A simulation of change over time. This will be the Collatz iterator acting on the numerical species.
  3. A fitness function. If a species "evolves to" 1 via the iteration, it is eliminated from the population. The Collatz Conjecture in this context is that all species eventually go extinct.
  4. Reproduction and variation. For any odd numbered species, when it is transformed by the iterator, an imperfect copy is also created. The copy is the species N-2, where N is the new number of the original after iteration. For example, when 11 -> (3*11+1)/2 = 17, the species 17 - 2 = 15 is also added to the population.
  5. A carrying capacity C. When the population comprises C species already, reproduction is paused until a vacancy opens up through some species going extinct.
For my purposes I don't care how many copies of an individual species exist, since they will all share the same deterministic fate. It also helps keep the computation cost down. So if 17 already exists in the population, and another 17 comes along, I don't create two copies of 17.

Initial Results. Alife sims are just plain fun to play with. I include my code at the bottom of this post in case you want to try it yourself. For what follows, the carrying capacity is set to 100. Here are the first seven starting species charted over 20 generations.

We see exponential growth capped at C except for N=3 and N=5 (they overlap), which form the line at the bottom. What's going on there?

A little investigation shows that the population {2,3,4,5,6,8} forms a closed "ecosystem" that is generated from starting with 3 or 5 (or 6 if we include evens). Moreover, this system follows a stable pattern that will go on forever, showing that there exists Collatz ecologies that never go extinct. In fact, if any other N-ecology ever generates 3 or 5, it will create this pattern as well, and so will also live forever. This gives us:
Conjecture 1: All odd N-ecologies with N > 1 survive forever for sufficiently large C
I hedged a bit there, but I don't think C needs to be very large at all. Another question that we can frame as an affirmative in the form of a conjecture is:
Conjecture 2: There is only one bounded ecosystem for odd N > 1.
Here bounded means that the largest species encountered doesn't grow beyond a ceiling. It looks like the other graphs are bounded too, because they run into C, but this is only true for the number of species at any one time. The individual species identifications continue to grow. We'll look at that next.

The graph below shows the population of N=9 after 16 generations. It has been capped by C, and so the number of species will stay about 100, but which ones those are will continue to change. The individual species numbers are shown by the heights of the bars. The x-axis is unimportant--it just arranges the species from smallest to largest. Each bar is a distinct species. The four highest ones are {1893,1895,1896,1898}.





Obviously there's a lot of structure here. If we plot the populations of all 16 generations together, we see a consistent pattern over time. Each generation has its own color:

The quartet we saw as the top plateau for the population seems to be an enduring structure. Moreover, this pattern is visible on the other populations created from N = 9, 13, and 25, for example:


The graph shows the 16th generation of each, when N=9 and 25 are just hitting the carrying capacity. The quartet and plateau patterns are obvious in all three.

A note about the above graphs. I was constantly fiddling with the program, running different numbers of generations and messing around with how reproduction gets handled. The final version of the program kills any number that attempts to grow beyond a billion, in order to prevent integer overflows. So if you run the program at the bottom and compare it to these you'll see minor differences at the upper end. The graphs above were generated without that constraint, but no overflows actually happened in the data represented.

The Quartets are not hard to explain. Whenever N begets another two odd numbers by the action of the iterator and mutator/reproducer, a quartet will be formed. And because the "two odds in a row" means that the starting number is multiplied by essentially (3/2)(3/2) = 2.25, it and its three new kinfolk will leapfrog over the competition. Suppose for some odd N we get another odd number (plus new variation). This will happen when $(3N+1)/2 = 2k+1$ for some $k=1,2,...$. Solving for $N$ gives $N=(4k+1)/3$. This fraction is only an integer when $k$ is of the form $k=3m+2, m=0,1,...$ Plugging all this back in, we find that $N=4m+3, m=0,1,...$, so $N=3,7,11,15,19,...$ all have the property that they are odd numbers that also yield new odd numbers under iteration. When this happens, two pairs of new species are generated. In terms of $m$ they are $9m+8, 9m+6, 9m+5, 9m+3$. This gives exactly the pattern we observe in the simulations.

Plotting the population of N=7 over 100 generations against a log scale shows how the shape of the ecology changes over time. The lines of dots sloping up represent the constant log(3/2) growth of the odds. The dots seem to pretty much cover all the integers (not all at once, but eventually). That gives us a third question:
Conjecture 3: Except for N = 1, 3, or 5, the ecology for odd N eventually reaches any given positive integer.
As supporting evidence (certainly not proof!), if we look at N=7,9,...99 and locate the missing integers after 1000 generations, we see that every one has swept up the integers through 250.

The horizontal axis is N (odd numbers from 7 to 99), and the vertical axis locates the missing value: each dot shows an integer that was missed by the N-ecology after 1000 generations. The distribution of these misses, inclusive of all the populations above shows some structure:

Is there some crazy number out there that isn't reachable by a given N-ecology? Does the carrying capacity change this one way or the other, over straight exponential growth?

Behavior at the point of capacity shows another emergent pattern. We see this in the number of species in an N-ecology after it faces the limitation of C, and new species are created much more slowly. We would expect that larger Ns have lower extinction rates once population size C is reached. This is because in order for a species to die, it has to land on a power of two, after which it zips to 1 and goes extinct. Powers of two are distributed more sparsely (on a linear scale) as N increases. Of course, there is only one "2" in the population at any given time, so the actual extinction rate due to the 2 -> 1 evolution is either zero or one each generation. The graph below shows 1000 generations, averaging the odd Ns from 7 to 99, giving the demand for new population, the actual number created, and the extinction rate. The first of these would be expected to be half the population, since the odds get reproduced, and indeed we see the line bounces around 50. The extinction events come from 2->1 and from the overflow limit, when a number tries to grow beyond a billion.
The red line is more interesting. It shows the actual number of slots that opened up for new population. If we subtract out the ones we can account for due to 2->1 or overflow, we get a constant average of about 5.4. This must be the rate at which two species turn into one species due to the action of the iterator. In other words
$(3N_1 + 1)/2 = N_2 / 2$ or $3N_1 + 1 = N_2$. 
How likely is it that $N_2 \mod 3 = 1$? Should be one third, but we have to remember that $N_1$ is odd and $N_2$ is even, so for positive integers $n1$ and $n2$ we have $3(2n_1+1)+1 = 2n_2$ which gives us $3n_1 + 2 = n_2$. That doesn't change the odds. So if all integers were in the population at the same time, we'd expect a third of them to join up after iteration, decreasing the new generation's size by one each. Given that we have to account for an actual average loss of 5.4, this seems to imply that at any given time, the ecology is 16% saturated (5.4%/33%). At 100% saturation, all integers are present, and one third will disappear due to collisions. If this analysis is right, is this saturation level really constant over time? It seems unlikely.

Reference. I discovered by Googling around that Hiroki Sayama had a similar idea to use the Collatz sequence to study artificial life. His version is completely different from mine, and you can find it here.

The code. [download perl script from snipplr]


Edit: I made a couple of minor edits to two sentences after posting.

1 comment:

  1. Collatz is just the name of the conjecture. The sequence is often called the Hailstone sequence.

    ReplyDelete