# collatz plot thickens

consider the collatz problem as a graph traversal of an infinite directed graph. each vertex of the graph has an integer associated with it. there is an edge from vertex $a$ to $b$ if $f(a)=b$ where $f(a)$ is one iteration. it would seem, a proof of the collatz problem must be something like a graph traversal algorithm for this graph (or in other words a proof of the correctness of a verification algorithm, which would take the form of a graph traversal). it must visit each vertex and mark them as “verified” as having a path ending with the 1 vertex. another way of looking at it is as a streaming algorithm where it “chews off” unresolved integers on one side, digests them, and outputs “verified” integers on the other side.

now see the collatz problem as the plot of the function such that $x$ is the starting point and $y$ is the number of iterations $f(...f(x))$ required, eg on wikipedia “collatz stopping time”. imagine an algorithm that verifies all integers. a simple algorithm starts at each $x$ and verifies each $x$ sequentially. but imagine a more sophisticated algorithm. it can jump around to different $x$ and verify them out of a sequential order. it could also have partial calculations for $x_1, x_2, ...$ and have some set of these partial calculations, and some strategy for adding new unexplored integers to this set, and extending the partial calculations of the current set.

the simple algorithm is problematic for two reasons. (1) it repeats work already done if it re-encounters some integer in the middle of a trajectory. (2) it leads to highly irregular and nearly-random-looking results. in fact it suggests that collatz iteration strategy would make a good pseudorandomness generator, and surely not merely coincidentally there is amazingly strong similarity between “linear congruential pseudo random number generators” and the collatz problem. this extraordinary randomness—apparently a variation of a number-theoretic fractal!—is very much at the heart of the fiendish difficulties in finding a proof.

speaking of fractals, the formula for the mandelbrot set $z \leftarrow z^2 + c$ and its magical quasi-random results has strong resemblance to the collatz problem also, and in the case of the mandelbrot set, it has been proven to be undecidable whether a point $c$ is contained in it! actually, if you ask me, I would say fractal phenomena was discovered in number theory ages before anywhere else, eg by the greeks with primes, etc!

the prior blog had another such somewhat-naive algorithm, more advanced than totally basic, but unfortunately on closer inspection it is a little too naive. it has a “trajectory set” of partial trajectories that are currently being considered. note that if one trajectory is found to “collide” with another trajectory, the two trajectories can be merged. [this is somewhat reminiscent of the merge operation of red-black trees.] however in this algorithm, it has some merge logic using a variable called `seen` to track if a integer has been encountered in an actively considered trajectory, but merging two partial trajectories is not equivalent to resolving either one of them, and this code does not handle that subtlety further.

it is not a proof of the problem because the following scenario cannot be ruled out. the output shows it continually merges trajectories in a very consistent way. but it does not keep track of how long an existing trajectory has been in the active partial set [ie basically whether it has terminated or not]. so trajectory merges could keep happening but a trajectory in the active set could get very old, in fact it could “hang around forever” as the algorithm continues, in the case that there exists a “loop” or [what is known as] a “diverging trajectory”.

the algorithm does have another interesting strategy of determining which partial trajectory to advance. it finds the trajectory with the smallest last iterate out of all active partial trajectories. it adds to the set of partial trajectories if the next integer to consider is less than the smallest last iterate.

* * *

considering the shortcomings of the last algorithm, the following is a new algorithm with more sophistication. it explicitly keeps track of whether trajectories have terminated. a merge of a trajectory that has not terminated with one that has results in terminating the first one also. but a merge of a trajectory with another one that has not terminated does not result in a terminated trajectory, just a newly modified partial trajectory that is the result of the merge. when a trajectory terminates, all the vertexes it visited can be output as “verified”. [based on the logic to keep trajectories “nonintersecting” the algorithm will never output a vertex as visited more than once.]

hint on this code: trajectories are stored as triples $[a, b, c]$ where $a$ is the starting point, $b$ is the current iterate, and $c$ is a list of visited vertices/integers on the trajectory. if the trajectory is terminating, $b$ is set to 1, eg with a trajectory that is merged into another terminating trajectory.

```

def out(l, y)

\$f2.puts(y)
l.each \
{
|x|
\$f.puts([\$t, x, y].join("\t"))
\$f.flush
\$t2 += 1
}
end

\$f = File.open('out1.txt', 'w')
\$f2 = File.open('out2.txt', 'w')

l = [[3, 3, [3]]]

seen = {3 => 0}

i = 0
\$t = 0
\$t2 = 0

loop \
{
\$t += 1

a = (0...l.size).to_a.map { |j| l[j][1] == 1 ? nil : j }.compact

if (!a.empty?) then

i = a.sort_by { |j| l[j][1] }[0]

puts

a.each { |j| puts((j == i ? 'x' : ' ') + " #{j} " + l[j].inspect) }

l2 = l[i]

n = l2[1]

n2 = n * 3 + 1
while (n2.even?)
n2 >>= 1
end
l2[2] << n2
j = seen[n2]

if (!j.nil?) then
#				puts("#{i} => #{j} (#{n2})")
out(l[i][2], l[i][0]) if (l[j][1] == 1)

l[j][2] |= l[i][2]
l2[1] = 1
else
l2[1] = n2
seen[n2] = i
end

end

if (a.empty? || i == l.size - 1) then
n = l[-1][0]
n2 = n + 2
l << [n2, n2, [n2]]
seen[n2] = l.size - 1
i = 0
end

}
```

there seems to be some key significant patterns in these graphs, hinting at proof directions. so—a rorschach test for number theory geniuses? what do you see here?

* * *

the strategy here is to use datamining on the collatz problem, but in a way that seems general, hinting at some wider vista of theory, some large expanse of terra incognita. the idea of traversing an infinite graph based on a problem with an algorithm is something that can be applied to any problem, such that there is probably some way to formulate this that shows its basic/inherent property of TM-completeness, aka “undecidability”. but here that is only to be taken not as a synonym for “impossible”, as is the frequent connotation, but as a synonym for “equivalent to theorem proving,” which of course happens every day by humans but is still typically out of reach of machines.

normally algorithmic verification traversals of the collatz problem are highly irregular and look random, so some success of this approach can be gauged based on how orderly the algorithmic traversal ensues. the algorithm converts something noisy and random to something orderly. if it can be done to a high degree of success, a proof would seem to be lurking in this conversion. it would then just need to be extracted or “reverse engineered”. [easier said than done]

the graph traversal algorithm can be compared to classic/famous algorithms such as A*. in this way it takes on the characteristics of an intelligent robot attempting to map a [infinite] maze. the maze may not have a 2d or 3d nature at all. it is multidimensional. actually, solving the problem seems to somehow involve determining the “dimensionality” of the maze.

so in traversing an infinite graph, the algorithm can basically do several things.

• increase the number of vertices in the “frontier set” by adding a new vertex, but not one that is adjacent to the existing frontier, using some particular strategy.
• look at a vertex that is adjacent to the current frontier. choose which of the vertices to jump from via some strategy.
• discover that a new vertex is “equivalent” to some vertex already encountered. this is like a “graph wormhole”.
• “resolve”/”verify” a vertex is terminating and therefore all vertices leading to it are also, and mark them and report them.

it is natural to organize this traversal into independent trees that may merge during the traversal, and trees that are partial and those that are terminating.