collatz search for meaning

victor frankl, an extraordinary figure, a psychologist who survived nazi german camps, wrote the book “mans search for meaning”. read it last summer, was impressed. in contrast also am fond of the kurzgesagt video on “optimistic nihilism”. it seems one can get lost searching for meaning in certain math problems, the hard unsolvable ones. it appears that Church was involved in one of these “wild goose chases” (found that described in a paper somewhere, wanna find out more). finding a solution is a lot like the search for meaning in the problem. for me, as written at times, if something meaningful can be said about the problem, then feelings of despair about the unlikelihood of a solution can be held somewhat at arms length… like wandering around in underground caverns, its not entirely clear if one is any closer to an “exit,” but its clear that many diverse, even wondrous properties unseen by anyone else have been isolated over the years. the fractal nature in particular, a theme explored in great depth/ detail with this attack, is still not remarked on/ noticed much by almost all other authorities. other concepts such as “mixing” and the pre vs postdetermined themes tie in closely with ideas outlined by authorities/ experts Lagarias and Terras.

dont have a lot of engaging/ strong ideas at the moment. its been a period of rethinking or to some degree unthinking. feel have very powerful tools now that shed major insight into the dynamics of the problem but still seem very limited in making generalizations, the key to the solution via induction.

mentioned in last months installment that it seemed like long 0/1 runs (ufos) in the glide decline were rare. this was once called “glide right” and its a simple nickname. technically this is postdetermined glide right, not exactly the same thing as earlier glide right concepts which was not restricted to postdetermined. the bitwise code seems to confirm this but have seen before how this algorithm can be limited in the search for ufos. mentioned that wanted to create a hybrid search that also is multivariable optimized. looking back over previous code, did in fact write multiparameter optimized code only about 6wks ago (4/2019) in hybrid15b. forgot! so much other stuff is going on lately outside of work on this prj.

so this is some relatively quick/ sophisticated adaptation/ refactoring. its multiparameter optimized on ‘mx01r’ max 0/1 ufo size in glide right, ‘cm’ glide max index, ‘cg’ glide length, ‘nl’ seed bit width. for multiparameter optimization there is an optimal point in the weighted multidimensional space denoted by the ‘z’ variable. this is output as the ‘h_0’ value. then there is the max ‘mx01r’ found over all points, called ‘mx01r_0’ in the output.

results: the code is able to find very large ‘mx01r’ values but not at the same time as increasing the other variables, they are found only for very short glides eg cm=0 and cg=2. it is possible to find very large glides also in the search space but then they have low ‘mx01r’ values around ~19 or so. the hypothesis seems to be confirmed with more advanced/ sophisticated searching but it has to be phrased in a multidimensional way. it seems to be a reliable/ dependable feature of the glide right region (ie large ufos not possible there) but from prior analysis/ study doesnt apply to the postglide decline/ drain (ie large ufos can be contrived/ constructed there).

did notice the prior multiparameter optimization code hybrid15b did not reevaluate prior points every time the parameter weights are recomputed, while its apparently not crucial, this is probably something of an oversight. without it, with this algorithm an early-found ‘z’ optimized point stayed optimal indefinitely and thats not surprising because presumably the later weight recalculations/ trends would tend to make earlier points look like outliers.

another point is that the other pattern for multiparameter optimization used is based on sort rankings. that is a little more inconvenient to run here and havent built an incremental version of it yet. its not known if it is more powerful but it seems like it might work better on non normally distributed variables, which tend to show up with the recurring fractal aspects of the problem.


(6/7) it made sense to visualize that in a basic way. the relationship between seed (bit) size and ‘mx01r’ is something like an inverse, but not exactly. its more like a “spread wedge”. graphing it leads to the following results. this code ran for 5k iterations. the 1st graph is the optimization trend and the algorithm reaches very high bit lengths ‘nl’ magenta up to 6k spikes and around 2k baseline, very high sizes for any optimization ever run so far. scatterplot graphing ‘mx01r’ vs the trajectory size variables cg, cm, cr red, green, blue respectively where ‘cr’ is the right trajectory length cg – cm, also included in the optimization, leads to some nice insight in 2nd graph. for low trajectory sizes there is significant spread and the algorithm can find high ‘mx01r’ but going rightward for larger trajectories they become extremely rare; there are a few very isolated points/ spikes—definitely need to look at these closer. as always the question here is whether absence of evidence is evidence of absence. on other hand the postdetermined ufos were initially found by a similar hybrid algorithm (after a bitwise algorithm missed them).

this graph seemed like it really wanted to be in log/log scale axes, in 3rd graph, revealing more detail. here the left compression and right isolation of some points about the baseline are more clear. thinking about this, was wondering how to have the algorithm put more emphasis on optimizing for the hard variable ‘mx01r’ at the expense of other variables. it certainly seems like the algorithm is maybe getting “carried away” some looking thru higher, very large seeds without focusing much on the smaller ones. the answer is quite simple and ingenious, dont think have seen/ tried it before, see if you can figure it out! hint: its 1 line of code chg.





answer: the line in the code with the optimization variable weight assignments at the end, assigning ‘v’ line 314, controls the relative weights of each, and can be used to tune the emphasis put on one at the expense of another. eg one can increase optimization weight/ attention/ priority on ‘mx01r’ this way.

😳 however trying this out led to finding a defect. at line 222 dist subroutine the code uses the ‘sd’ variable as a weight, but it also multiplies out by the weight at line 138 adj subroutine, so the effect is cancelled out. this defect is not apparent/ effective in the prior optimization setup with only positive 1 weights ie the prior code ran ok/ as intended somewhat by accident. the line at 222 has to be deleted to fix it.

but then trying this out, small weights on ‘nl’ did not really push down the ‘nl’ trend that much. so tried even negative weights on ‘nl’ ie it is optimized to be pushed down, and even some of those didnt work, it still trended upwards. the reason for this is that the other trajectory length variables are all implicitly pushing ‘nl’ up. also, the ‘nl’ trend is very sensitive, not smoothly affected, ie theres an inherent nonlinearity, and after some adjusting it, cant really be controlled very effectively with a single weight (eg for some negative ‘nl’ weights some runs may decrease and others may increase, or an increase followed by a decrease, etc). am thinking need some other more sophisticated way to control/ constrain/ manage ‘nl’ growth independent of the other variable optimizations, dont have an immediate idea for that yet. and overall it might even point to a more general multivariate optimization direction/ technique.

(later) 💡 😳 maybe was overthinking this some. a trick used earlier is to throw out the mutation operators that affect bit string length, lengthening and shortening, ext and short.  then the algorithm should not change bit string lengths and a single length can be studied as a (presumably) representative case—the idea of different behavior over different starting seed lengths seems a little outlandish and hasnt really ever been seen. but then this quickly led to uncovering a defect. the relatively new initw routine introduced in a refactor is supposed to just paste a random-length uniform bitstring onto a ½ density bitstring. but its all mixed/ messed up, doesnt work anything like that, and increases the lengths of the strings. this probably explains some or much of the algorithms (over?)eager tendency to increase string lengths.

fixed that up and adjusted the code to run for 100k iterations on fixed 100 bit length strings, it can run longer because it can compute them faster than when they grow large. the graphs are actually (over)plotting all 100k points (instead of the previous sampling, ie 1/5 1k points of 5k total in that case) and it takes about 5s each to parse the large datafile. the wedge/ tightening effect/ pattern is more clear/ pronounced/ distinguishable/ outlined here. it seems to be something like a transition point thats maybe been observed in some other contexts.




(6/9) ❓ these results seem to suggest that maybe mixing is happening in the glide right region. this probably ties in with the idea of mixing as not a sharp transition but gradual/ a continuum from left to right. again its tricky to relate this to pre vs postdetermined ranges. it can be hard to visualize how all this fits together after now 70+ blogs on the subj and many related experiments. but this idea of a transition point reminds me of a few earlier experiments. crunch2 experiment from 11/2018 shows a transition point where “crunch spans” ie long sequences with larger-than-x max 0/1 runs become much shorter as the 0/1 runs are higher. also there are many prior experiments relating climb distances to max 0/1 runs. still cant quite visualize the “big picture” from all this. coming up with some general trend seems to require figuring out how mixing relates to other basic boundaries such as climb max, glide span, pre vs post determined etc, and there are many different findings on this but not exactly a unifying theme yet.

(6/10) 💡 😮 ❗ missed something. looking at the previous results, on closer examination its clear the transition point may be the familiar one: in the 1st graph the narrowing happens right at the postdetermined point around 100 bit width/ iterations. this immediately leads to the following idea. heres a very simple few-line twist that looks at the same statistics over the postdetermined glide. if the glide is shorter than the postdetermined range the function evaluates to 0. its quickly found the wedge/ narrowing effect is mostly absent—its roughly a level plateau esp if scaled by starting bit size. overall the optimization cant find ufos larger than about ¼ the (starting) iterate size.

this combined with other experiments/ observations seems to suggest that large ufos are a mostly postglide drain phenomenon (or also predetermined glide range, ie basically “non postdetermined glide”). after all the chasing/ wandering, it looks like the 0/1 run idea may continue to be applicable, maybe even crucial, once all the subtle distinctions/ qualifications are isolated. the details (surely devilish as always!) need to be hammered out but basically this seems right now a strong candidate for an inductive proof (yes, am amazed to be typing that myself, but also dont see a more straightfwd way to say it). it seems to (1) be a mixing property and (2) actually “causative” in the decline (as suggested by many other experiments). in other words, roughly, “non-ufos are characteristic of mixing, postdetermined glide is mixed, and mixing causes decline.” if all this holds “somehow” its sufficient for a proof. a similar algorithm was able to find ufos in the unrestricted trajectory so it seems that this one might not be missing them.

ok all this is great… so then what…?



(6/19) dont have any really big ideas right now but this is an incremental step similar to previous analysis. was wondering about the 0/1 runs in the postdetermined glide right parity sequence. here its called ‘prmx01’. the optimization is by that in addition to the recent variables cm, cg, cr, nl. this is a very large 100k run that allows expansion and contraction of the bit strings starting at 100 width. the graph is ‘prmx01’ versus ‘nl’ bit width and doesnt find much trend. the climb in bit widths in the optimization is rather gradual but doubles over the 100k iterations; the optimization find ‘hmx’ max bin/ bit width ~194. ‘prmx01’ seems to be quite limited/ bounded in the graph.

💡 in the glide right region, something about the iterate bit structure is leading to “identical distributions” such that the distribution is somehow self-similar, ie something like a fixed point of a distribution, and points in the distribution are draining.



(6/28) 💡 ❗ this was an idea in the back of my mind now quickly tested using bitwise framework (really getting a lot of mileage out of these frameworks now, almost like rapid prototyping!). there is a concept of a “bit length forward/ lookahead trend”. measuring the trajectory (iterate bit width) at each point x distance into the future trend, where x is the current bit width, leads to a new sequence. what are its properties? even though based on the collatz function, the sequence is obviously decidable/ computable—at each point that is, and terminates essentially iff the collatz sequences terminate. earlier experiments related to Terras constructions suggest its “better behaved” in key senses. was wondering how much. a lot of earlier work looked for a monotone declining indicator/ estimate of remaining sequence length. this forward trend sequence seems to come close. a simple test of its orderliness is the (old/ familiar technique) nonmonotone run length here ‘mx’. hooked that up and found it doesnt increase much, it tends to max out around ~200 as other climb length variables increase. then looked at the ratio ‘mxnw’ or ‘mx’ over initial bit length ‘nw’ and found that tends to narrow to less than 1, black line right side scale.

overall this seems like it could be a big deal. it ties in with glide distances maybe not exceeding current bit widths by much, maybe just bounded by a fairly small constant (~200), as found in earlier experiments, and is linked almost “intimately” with the Terras construction/ ideas. maybe proving the convergence of the problem can be reduced to proving the convergence of the forward trend. the approach involves relating the collatz sequence to the forward sequence within bounds as derivable properties… ❓




3 thoughts on “collatz search for meaning

  1. Alberto Ibañez

    I have been reading about Collatz on Terence tao blog.i would like to make some questions. How must be a demonstration? I do not know, but if you think in a function when n is odd you make n+1 or n -1 and when n is even you make n/2, is a function that goes to 1 very quickly. Is easy to demonstrate this? is useful to demonstrate Collatz?
    If we think about the 5n+1 problem, is easy to see why it fails, some numbers go to infinity and there are some cycles not trivial. How can demonstrate this? is difficult? is useful to apply to Collatz?
    weak Collatz is a problem that involves powers of 2 and 3, is Collatz a similar problem? (i think that yes it is).
    About congruency, even numbers are=mod 2(remainder 0),and =mod3(remainder 0,1,2),and odd numbers are= mod 2(remainder 1), and =mod3(remainder 0,1,2). is this useful or important?
    later i wiil ask about 9232 and prefered number for Collatz.thanks

  2. Pingback: collatz match | Turing Machine

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s