# ***startling*** collatz plateau!

💡 ❗ ❗ ❗ ❓ this looks like it could be something very big! (this initially hazy idea has been poking me for a long time but was not able to articulate/ pin it down formally in logic/ code, but also building on some earlier ideas spread out in several places.) this is very simple code that just looks at how much of the (base 2) msbs of two sequences match, one is a “long (glide)” collatz sequence, and the other is a sequence with the same starting seed multiplied by 3x where x is the number of 3n+1 operations in the collatz sequence so far. the results are rather startling/ unexpected and yet somehow obvious in retrospect…?!? 3 separate runs graphed. this can apparently be theoretically justified (but need to think it out more): the initial plateau length perfectly matches/ coincides with the glide length. 😮

(6/9) revised/ slight code change that also puts the total bit length, as in 4th graph. 5th is same graph in different filled plotting style.

long18b.rb

(6/10) “the honeymoon wears off” again. this still seems quite significant/ notable unf it is not immediate obvious how to leverage and/ or exploit these results. one idea that came up is as follows. what is the inverse of this operation? roughly, given any particular integer in a trajectory, one might wonder if its in the “middle of some other trajectory”. this is not so easy to define but it could probably be formalized. in other words what is the lowest seed that includes this current value in its trajectory? one can (possibly) recover this initial lowest seed via this 3x relation (subject to some “noise” in the sense of “false candidates”). the inverse involves finding seed(s) n such that the “higher bits” (msbs) of n * 3x = n2 where n2 is the current value in question. this is apparently not easy to compute via arithmetic or code but a semi obvious idea is that the multiplication/ msb relation can be encoded in SAT and candidate n can be found that way (worked long ago on code converting multiplication/ factoring to SAT and might write up that topic sometime on the blog).

(6/11) that line above about determining “the middle of a trajectory” reminded me of some other approach. this is an idea that showed up many months ago in a slightly different form of looking at resolving sequential blocks and also locating multiple graph centers with a greedy algorithm. an earlier conjecture was that it might be possible to do this with a constant amount of iterations or that the graph centers might be within constant distances. had refuted that earlier but didnt post the code. just wrote up something new to analyze this that was a bit involved, and then realized there was a very simple variation on an earlier idea.

this similar code does not work in blocks. its very simple and just keeps a large array of “seen” (visited) integers and resolves trajectories in a forward/ sequential order. the trajectory is resolved if it “falls below” (ends its “glide”) or and/ or hits any prior seen integer.

the “seen” array can be thought of as a 2 region distribution. all the integers below the current n are “seen”/ visited in a large contiguous block. above that, there is a massive cloud or scattering, roughly in a gaussian distribution (the exact nature of that distribution would be interesting to study). new trajectories may hit this upper scatter cloud early without extending the whole glide. how far do they extend? this code lists the current max # of iterations to hit the seen cloud in the 4th column, for the starting seed in the 3rd column, and also outputs a current status line for each new power of 2 in trajectory seed size.

it ran for many hours (gory details: started at night, woke up to it in morning with it thrashing the hard drive with virtual memory swapping) and built a cloud of over 106.7M points (5th column) for integers up to 134.2M. the max # of iterations is never very large (max 236 at the end) even for large integer start points. it also does not seem to be bounded, it continually trends upward in a nearly linear trend (on a logarithmic scale) with no apparent levelling or maybe even slowing.

seen.rb

```["11", 2, 3, 1, 2]
["101", 3, 5, 1, 2]
["111", 3, 7, 3, 6]
["1001", 4, 9, 3, 7]
["10001", 5, 17, 3, 11]
["11011", 5, 27, 36, 52]
["100001", 6, 33, 36, 53]
["1000001", 7, 65, 36, 75]
["10000001", 8, 129, 36, 113]
["100000001", 9, 257, 36, 197]
["1000000001", 10, 513, 36, 399]
["1010111111", 10, 703, 50, 598]
["10000000001", 11, 1025, 50, 834]
["100000000001", 12, 2049, 50, 1631]
["1000000000001", 13, 4097, 50, 3235]
["10000000000001", 14, 8193, 50, 6510]
["100000000000001", 15, 16385, 50, 12983]
["1000000000000001", 16, 32769, 50, 26099]
["1000101101000111", 16, 35655, 84, 28436]
["10000000000000001", 17, 65537, 84, 52062]
["100000000000000001", 18, 131073, 84, 104178]
["1000000000000000001", 19, 262145, 84, 208222]
["1000001111110111111", 19, 270271, 102, 214670]
["1011000011101100111", 19, 362343, 103, 287933]
["1100001111011111111", 19, 401151, 104, 319279]
["10000000000000000001", 20, 524289, 104, 416459]
["10011000111010011011", 20, 626331, 105, 497483]
["11111010110101100111", 20, 1027431, 114, 815591]
["100000000000000000001", 21, 1048577, 114, 832481]
["101000100001001111111", 21, 1327743, 117, 1055247]
["101010100011011111111", 21, 1394431, 134, 1108386]
["1000000000000000000001", 22, 2097153, 134, 1666981]
["10000000000000000000001", 23, 4194305, 134, 3335969]
["10111101011010010111111", 23, 6206655, 139, 4937470]
["11110110110100111111111", 23, 8088063, 154, 6436014]
["100000000000000000000001", 24, 8388609, 154, 6675842]
["110011001100110001100111", 24, 13421671, 180, 10675205]
["1000000000000000000000001", 25, 16777217, 180, 13343701]
["1100101111010100111111111", 25, 26716671, 187, 21248398]
["10000000000000000000000001", 26, 33554433, 187, 26688268]
["11011001001001101100011011", 26, 56924955, 193, 45280067]
["11110011000110100111111111", 26, 63728127, 236, 50692993]
["100000000000000000000000001", 27, 67108865, 236, 53381599]
["1000000000000000000000000001", 28, 134217729, 236, 106759575]
```

❓ 😮 kinda weird/ strange, dont know if this is something significant or just a coincidence but look at that, all the prior found starting seeds are either divisible by 3 or prime. huh! (they also seem to be inside/ nearly at the center ~½ “density core”.)

```3: 3
7: 7
27: 3 3 3
703: 19 37
35655: 3 5 2377
270271: 270271
362343: 3 269 449
401151: 3 133717
626331: 3 17 12281
1027431: 3 3 3 38053
1327743: 3 3 151 977
1394431: 1394431
6206655: 3 5 7 13 4547
8088063: 3 2696021
13421671: 13421671
26716671: 3 3 2968519
56924955: 3 3 5 853 1483
63728127: 3 3 3 3 97 8111
```

(6/12) this is another idea looking at associating a trajectory and a “lower partner” via changing the msb of the seed, looking over very long trajectories with starting seeds up to 21000. it looks at the maximum difference in the (logarithmic scaled) trajectories until the lower partner falls below/ ends its glide. it seems to be bounded/ less than a ceiling…? or is that just wishful thinking? 💡 ❓

long23.rb

this next code is a slight modification which finds long glides based on glide length instead of starting seed bit width, and computes the max (log) trajectory value inside the glide minus the starting (log) seed (red), also for the partner (green). if this increases as expected while the “lower partner” difference seemingly stays flat (red line), this is further evidence the lower partner difference is being constrained/ bounded. this is indeed the case although the max glide minus seed tends to slow down in its increase (still increasing and concave downward). note this code and the last one end with Float overflows probably in the logarithm function which seems to fail around numbers at ~21024.

long24.rb

next version refactors some of the variable names for better readability and takes out the log calculations in favor of log2 operations instead which dont overflow. it replaces the hard seed search with finding seeds that have a large max trajectory height versus start seed difference. it was manually terminated after seemingly to have gotten stuck (aka “painted itself”) in a dead end (in the hard seed search) at the end of this plot. this plot is more revealing and seems to show that the max partner difference is definitely attenuated but probably not bounded. 😦 😳

long25.rb

(6/14) ⭐ 💡 ❗ 😮 😀 holy cow this looks like a really big deal! was musing on ways to go from the prior result. was thinking about ways of trying to “tie together” the partner trajectories eg by looking at whether they have common factors, other ways of compressing them in parallel, etc., and extending them at different rates to try to keep them in sync. in a sense the worst case of nearness between the two functions, allowing for variable extending, would have to happen at their respective maxes, ie a variable extension method could not do better than distance-between-maxes as worst case. & then came up with this.

a tiny modification on the prior code allows looking for the max of a hard seed trajectory versus the max of the partner over the partner glide, ie the log2 difference. it appears to be bounded by a constant! (~15) this code ran nearly ~24hr and final individual points took about 6 minutes each and ~6K list entries. (epic history: after not having any other obvious/ immediate ideas for awhile after prior results, this slight riff was devised on sunny day yesterday shortly after intense exercising & while nodding off lying down on a scenic park bench at a favorite park. am making a note of this for a film crew if they ever want to do a documentary someday.) 😎

long25b.rb

(6/15) refactored that code a little. a lot of this series has a funky “quick-and-dirty” approach of recalculating trajectories with different code than used in the hard seed generator. this code uses a common subroutine. variable names have been renamed again. another immediate possible weakness of prior code to rule out is whether the “secondary” partner glide terminating earlier than the “primary” glide affects results much. good news, it doesnt! not anticipating any surprises, this run is terminated earlier in about half the graph size and a fraction of the runtime (which goes up by ~n2 in graph size).

partner.rb

💡 another small variation in the code but maybe with major implications? this code generates seeds with long glides, and then compares the stopping distance of the 2ndary partner (green) with the primary glide distance (red) showing the latter is always less than the former (difference in blue) except for a few outliers, maybe all less than a small constant…? now, but how could this be proven in general?

partner2.rb

(6/18) revisiting an idea from last october (2014). that was some interesting code that resolved integers in a nearly linear rate with a slightly increasing buffer size. have been musing on some rough ideas as follows. suppose one had a buffer or “glob” of data that represents intermediate computations and integers resolved so far. suppose that one could consistently do a constant x computations, resolve a constant y new trajectories, and increase the buffer/ memory size by some unpredictable amount. wouldnt this be that )( close to a proof? that old algorithm comes close, but diverges at least in the following way.

this is a slight refactoring/ reformulation that looks at min and max trajectories (differences) resolved within equal sized advancing intervals and finds that its noisy and increasing. the plot of the min and max trends lines is however (as discovered/ documented earlier) extremely smooth linear. moreover its possible that the min resolved trajectory is nearly exactly linear which would superficially at least seemingly satisfy the conjecture/ criteria. but note that this algorithm is not computing a constant x trajectory iterations per interval, it varies exactly by current buffer size… so then how does that vary?

since graphs are cheap, am including several which are not exactly dramatic or anything (and are typically noisy, so not pretty either) but are included for completeness to illustrate the experiment. 200K (odd seed) trajectories. intervals are 100 iterations. 1st graph, (instantaneous) buffer size. 2nd, resolved trajectories per interval, note average is about 100. 3rd, difference in max/ min in each interval. 4th, min and max (nearly totally coincident).

in a sense, seemingly, the more noisy the internal processing (graphs 1-3), the smoother the output (4th). this may not merely be a coincidence. could the algorithm processing entropy be high in a kind of tradeoff with the problem entropy…? the 1st graph represents roughly computations per time and while noisy is apparently significantly less noisy than calculating/ resolving collatz trajectories iteratively, which is basically as noisy as the classic stopping distance graph.

so could this maybe be the overall objective in general? somehow decreasing the noise of the calculation (resolving sequential trajectories) until its “smooth”/ “derandomized”/ “recursive”?

parallel2b.rb

(6/21) that last run led to some new experiments & some new ideas. there was some misconception. the code creates threadlike lists that are processed in parallel. however, this does not affect the rate that trajectories are resolved because there are no interactions between them (where rate is # of trajectory iterations required to resolve trajectories on average, which is experimentally unbounded anyway). so attempting to stabilize that rate based on increasing or decreasing the “thread-lists” is impossible!

but then immediately wondered what would happen if some other stabilization strategies were attempted, eg the # of resolved trajectories per overall thread loop iteration. that leads to somewhat notable results but unfortunately nothing extraordinary. looking at the standard deviation of list count, it is not divergent, but noisy and apparently impossible to stabilize with a simple strategy of increasing or decreasing the list size. there is a parameter on this code for a exponential average strategy on the prior list sizes. there were two runs with param 0.01 and 1.00 reflecting heavily weighted past average, and no running average (totally weight only the current value). this code does seem to “attack the entropy” of the problem but never stabilizes and ends up “thrashing endlessly” in trying.

1st graph is min, max, avg of the lists size over two runs (6 lines total). the non-average-smoothed strategy shows two major regime shifts at ~50 and ~750.
2nd graph is the std deviation of the list size. the past-weighted strategy leads to more noise and both show gradually increasing noise.
3rd graph is the trace of list size over sequential iterations and shows the triangular-like vs fractal random-walk of the two runs. the list size hovers lorentz-attractor-like within min/ max range.

parallel4.rb

(6/27)

musing on this led to some more ideas. again the basic idea is that it seems one wants a traversal algorithm for trajectories that is “ordered”. but thats a very tricky/ slippery concept. what is order? maybe a thousands of year old philosophical question even addressed by the greeks & their philosophy/ religion eh? (aka pandoras box etc!) and also collatz seems to have a deep fractal structure and the idea of traversing a fractal in an ordered way seems hopeless. however, maybe something small )( can be salvaged here.

the modern concept of order is entropy, a very crosscutting concept across many fields. (am intending to write a whole blog on it sometime, have many links, but its a daunting idea.) and this is not always well described in CS but compression theory has a lot to say about order. a compression algorithm is based on “finding order”. now what is compression in general in CS? that leads us to kolmogorov complexity which is proven to be undecidable to compute. however, taking a more pragmatic pov (as this blog long/ nearly always tends to do) any compression algorithm can be seen as a “rough approximation/ estimation” of kolmorogov complexity.

in this blog compression has already shown up in a signficant way in experiments with grammar compression on collatz sequences/ data. that led to some interesting analysis that showed they are indeed highly compressible, which is not immediately obvious/ expected from all the theory about collatz as a random walk.

that grammar compression used a subroutine to find a (pairwise) longest common/ matching substring in a single string. that code took a long time to debug & creates O(n2) size temporary arrays. it occurred to me this can be all done much more elegantly “in memory” without the temporary prefix array table just with list sublist operations and sort/ comparisons. here is some code for that in the compress subroutine.

then one can use this compression to measure (dis)order/ entropy of sequences as simply the final size of a compressed string or a compression ratio of compressed vs starting string. then it can be used to traverse the collatz graph or DAG. which requires a few extra ideas but yet is full of experimentation possibilities.

my 1st idea was to have a buffer that fills up to a max size and then the lowest “complexity” trajectories are extracted from it (or ie shifted out running/ sequentially), and then one can look at the overall order of this result. this is quite similar to some earlier ideas tried out that just advance the partial trajectories in a buffer with the lowest current intermediate value or whatever, except using the trajectory complexity as a greedy measure. the 1st parameter is the buffer size. following are two runs for 500, 1000 buffer sizes. somewhat unexpectedly the output is ordered for about the same # of points/ trajectories 1st processed after the buffer has filled up (strictly speaking, slightly later) and then the disorder sets in. the results look promising and the algorithm does indeed seem to “chew” on the complexity of the problem so to speak.

compress.rb

so then the next fairly obvious idea/ step is to have buffer that periodically flushes. that is not totally simple though because one cannot reprocess trajectories that are already processed, so theres again a familiar seen cache that remembers all previously processed trajectories to avoid them. this is again run for 500, 1000 buffer/ period sizes.

💡 😮 ❗ wow! that turns out to be quite ordered! in fact its so ordered that am doing something that is rarely done and plotting very many pts, 10K. usually this is avoided around here because the plots tend to become too cluttered or whatever. it leads to a mildly fractal pattern. the only obvious difference between these two graphs is that the 2nd has larger gaps. so some inherent self similarity. the big question is, can it be exploited?

these graphs seemingly come very close to something that have been seeking from day 1. an enumeration pattern is a recursive way/ pattern to traverse all integers. it seems possibly all recursive algorithms can be connected to “enumeration patterns”, but need to work out some more of this theory.

compress3.rb

(7/2) did some more experiments that didnt work too well. the 1st idea was to have a buffer/ frontier of say 100 future/ partial trajectories. then one can look at the overall “compress order” of incrementally extending any of the current frontier trajectories. however, using this scheme the “incremental order” of each trajectory is not different because many have the same parity and then some other critiera must be used to discriminate them if the algorithm is not pseudorandom. the obvious choice is the least trajectory seed, but then this gives results somewhat similar to earlier ideas independent of the expensive compression calculation.

then came up with this idea. have 100 frontier trajectories and add those full trajectories that greedily minimize the overall compression (dis)order where the computation is over the current total trajectory “glob” so to speak. this leads to the following run. it enumerates roughly in linear seed order, with outliers. is it just me and/or wishful thinking or do the outliers seem to get more rare as it progresses? could it be very, very slowly “converging” to some ordered equilibrium? anyway this code is very computationally expensive because it has to constantly/ continually recalculate the cumulative disorder over all the frontier trajectories, which are not independent, its versus the trajectory wrt current “glob” which changes with every new trajectory. 2nd graph is current glob order measure.

compress5.rb

(7/4) some further thought, maybe that was way too complicated. an interesting finding would be if just compressing the first n trajectories leads to a gradually increasing compression size. from the above plots, that seems likely to be the case, and a somehow meaningful and conceivably exploitable property. then the compression can be seen as a sort of smoothing mechanism. and a proof would involve showing the “incremental compressions” always exist & increase nearly incrementally in size. however, from past experience, each local compression is a much different structure than its neighbors. it seems very difficult or near impossible to relate one compression with its nearest neighbors. could it be done somehow?

however the idea of nondeterminism/ searching seemingly comes into play here. the compression can be seen as a sort of certificate of proof that a set of trajectories up to n halt. then the problem of showing incrementally increasing compressions seems to relate to nondeterministically searching through certificates of increasing size, and being guaranteed to find one that “matches”. ofc this is quite similar to the idea of setting out constraints (say in SAT) that trajectories of certain lengths exist and finding a guarantee that those solutions are always satisfied. (this lines up with some of my earlier musings on “SAT induction” wrt empirical study.)

the possible difference in working with compressions of trajectories instead of trajectories is that the compressions are somehow “better behaved” and smoother, and maybe have some kind of inductive structure where the “straight/ raw” trajectories do not. so overall it seems somehow as mused some earlier, nondeterministic search seems to be a flip side of inductively proving halting.

to flesh that SAT idea out a little more (which havent done so far wrt collatz, but this idea has now been bouncing around in my head for some months), think about it like this. suppose that one had an array of SAT circuits. each instantiated circuit computes sequential trajectories and asserts, “the trajectory is finished by f(n) steps”. the circuits grow in size, based on a conjecture of a limit on # of iterations required to terminate each collatz trajectory, ie a conjecture of a precise/ recursive f(n).

lets call them C_n. a proof of collatz involves the proof that every C_n is true. but this leads immediately to the idea of SAT induction sketched out earlier. an inductive proof would be in the form that “if C_n is true then C_(n+1)” is true (aka “has a satisfying assignment”). but it seems nobody has created such proofs anywhere in the literature at all. its an apparently genuinely new idea. and it clearly has amazing ability to somehow capture the basic nature of math/ CS theorem proving in general.

so this can be related to one of my favorite problems SAT which have been studying over ~2.5 decades now. but, SAT formulas have been analyzed for their graph properties in one of my favorite refs (have to dig it up, a masters thesis cited on cstheory.se). also another idea is to look at SAT formula properties related to their “core”. there is an emerging study of a structure called the “Minimal Satisfiable Subset” (MSS) of clauses. have been wondering if it is a way to attack SAT induction eg in the sense, could the MSS of C_n be related to the MSS of C_(n+1) somehow?

but then here is my fear of this attack. does it really reformulate the problem in a way that adds leverage? so much of collatz study seems to be going around in circles and restating the same problem in different ways but without any additional leverage. (quite a frustrating/ endless disappointment in that way!) one seems to be left with the SAT case also of attempting to relate patterns (expressed instead in SAT form) that have no “relatable structure”…

(7/6) musing on some of those prior swirl of ideas on compression vs attempting to “relate” adjacent trajectories… this is some somewhat tricky/ cryptic looking code but the idea is not so complex. the idea is to compress new trajectories not so independently or based solely on best (highest) compressions. instead while it is again based on prior trajectory “runs” (“runs” the compression sense of eg Lempel Ziv ie identical parity sequences), in addition it also attempts to reuse the most-frequent runs as best possible (using a greedy approach).

there is a tradeoff between selecting long runs and shorter runs with more frequency. that is captured in a weighting function (of run length and frequency “weight,” a sum total of sequence element reuses) at line 52. an extra array keeps track of run reuse by incrementing counters associated with each element in the run and the counters are initialized to 1. the output is the index of the current match. results show significant order but alas, still ultimately pseudo-random looking. heavier weight on the run length leads to higher points and more weight on the frequency leads to lower points which have a “preferential attachment” effect.

run3.rb

## 3 thoughts on “***startling*** collatz plateau!”

1. Pingback: collatz killer | Turing Machine