# collatz finetuning

the last idea from 10/2018 was bouncing around in my thinking some, it left open a question about “degree of monotonicity”. it shows that the max 0/1 runs seems to be almost monotonically decreasing even in a climb/ glide. havent really looked at the monotonicity of the max 0/1 runs all that directly with the nonmono run length measurement. ofc all the recent analysis is related to monotonicity, but not exactly tied to nonmono run lengths. there is some relation. at this point am interested in finding long nonmono run lengths in the max 0/1 runs sequence. last months genetic algorithm code looked into this but for a fixed (“1d”) bit width. what would that look like for variable bit width (2d)?

the last “2d” genetic algorithm search code was `mix32` from 9/2018. the `mix` series is aptly named because it mixes quite a few ideas across many algorithms. it has single dimensional genetic algorithms, bitwise optimizations, and 2d genetic algorithms (ie both within given bit widths and over multiple bit widths). am going to rename the 2d genetic algorithm here `hybrid`.

got all kinds of cool ideas for refactoring/ streamlining working on it. it has some abstraction for all the initialization and combination/ crossover operators. it has a very sophisticated 2d algorithm that is similar but different from prior algorithms. it has no restrictions on expanding “bit bins” except that the expansion operator does it only by a max of 3 bits. it dynamically analyzes bit bins for fitness similarly to analyzing fitness within the bins based on top performing candidate in each bin, and combination operators work on the top performing bins. it currently throws out the most underperforming bit bin and currently keeps the total bit bins to 50 and the size of each to 50 for max 2500 candidates. the code enforces a minimum of the max 0/1 runs in each iterate, here 6. starting bit size 10. the recent `initw` bit string initializer code turned out to be crucial after the minimum was added because the other initializers dont seem to create many iterates with sufficiently large max runs.

after ~1hr this code seems to paint into a corner for a max nonmono run length ~650 and max bit sizes ~300 (total 122K candidates explored on last output) but needs further analysis/ runs. letting it run longer about ~9hr it reaches max 415 bit width, 1091 max nonmono run length, 1274 glide length, 609K candidates explored. this is very little code and packs quite a lot of (optimizational) punch. (honestly theres a lot of moving parts and all very sophisticated and only a very advanced student of lots of prior inquiry here would be able to comprehend it all.) what this is suggesting is that for a large enough min max 0/1 run ~6, no seeds exist of arbitrary size with arbitrary large nonmono run lengths in the max 0/1 runs, or they are very rare/ hard to find. lol, everyone got that?

this almost seems to suggest an extraordinary infinite/ finite transition point where maybe glides of arbitrary length exist (for larger iterate sizes) with small 0/1 runs but not for larger 0/1 runs, but such a conjecture while tempting would be risky/ questionable; one near achilles-heel of this type of research is (as pointed out/ commented long ago and intermittently) that its very hard to differentiate very rare/ slowly/ hard-found solutions from total lack of them/ nonexistence. have to look back over the prior FSM results to try to figure out how it meshes with those. ofc this code is fundamentally different in that its operating over a glide instead of a trajectory and is in line with the idea that both have to be meshed somehow to work toward the “final solution”.

another somewhat tricky element of this logic is in the nonmono run length analysis and has something like a “fencepost” aspect/ factor. in the current code a later max 0/1 run of exactly the same count does not reset the length counter. the alternative is to reset it each time on equality and then the results would be different, am now wondering about that given the high prevalence of the case in the result, suspect the alternative logic could lead to substantially different results… this “order” selection happened somewhat by chance in current code and was not intentionally assigned. from inspection the `min_by/ max_by` operators are apparently selecting the 1st equal element in the list (in this case preferring/ prioritizing the prior minimum). this subtlety has come up occasionally in prior code over the years…

here is a graph of the max 0/1 runs red left side scale on the final iterate/ glide found after code was terminated at ~9hr and iterate bit width in blue right side scale. from the graph theres very strong overlap/ “coupling” between the 6-min 0/1 run range (entirely the glide trail extending into some pre-descent climb range) and the glide decline. this further generally is aligned with/ supports the “crunch” idea but again specifics need to be worked out.

n=16298397491632173330154922373989217019805837799941759439490191263164482415394740301070197391506878932875994066099915194367

`hybrid.rb`

(12/7) was thinking quite a bit about the remarkable histogram for `review30` from 5/2018, my thoughts return to it & it really calls out for further attn, meant to revisit it. my recent thinking/ conjecture was that something like it might arise naturally on all iterates descent region. was a bit surprised to find this is not the case although in 2020 hindsight/ retrospect the conjecture seems naive. as a relatively simple exercise created some random iterates based on ½ density and then looked at/ analyzed the 0/1 histograms over 10 iterate windows.

results are as follows. the earlier histograms are color coded hotter/ starting 1st because it seems like it looked better that way eg wrt overplotting. the 0 runs are on left and 1 runs on right and theres a strong symmetry in this case. the prior peak at width 4 is not apparent, the peaks are instead always at 1 although the histogram shapes do change somewhat. so there is some scale invariance found here but not like the earlier case. am looking at/ thinking about other ways to normalize these curves which so far dont seem to add insight, need to try by iterate bit width next. am also wondering about the earlier data for post-glide region, would it look the same as previously or as below? in both cases theres some kind of (different-yet-similar) stability in the histograms over iterates…

another way to look at this, it also again shows the distinction of entropy vs density. the `review30` distribution and this one both have ½ density but different entropies. have to think about this more and what it means. it seems almost like 2 different “fixed points” of the collatz mapping (for both density and entropy) and maybe it makes sense to study fixed points more generally. some of this is confirming the current hypothesis about these random-looking patterns actually themselves being a sort of pattern as seen in the histogram statistics. are there some more subtle patterns embedded in them? ❓

`stats.rb`

another fairly simple calculation. this looks at max 0/1 runs (red) over a 5-iterate window. the bit width average is in blue right side scale and declines in a rough linear way but the red max is a little warped/ nonlinear in comparison, a possible edge to exploit? also graphed are ratio and inverse ratio green, magenta. this suggests theres “something going on” in the “higher length” max 0/1 runs as seen in some prior analysis.

`stats3.rb`

(12/14) this idea occurred to me thinking about density + entropy. how can one vary the two independently? there is some interrelation. my idea was to start with a fixed density and then vary the entropy over that density. this involves cutting or sectioning the 0/1 runs similar/ related to an earlier idea from 10/2018 in the `gap3` code. from this code it seems like maybe entropy cannot vary over all values for given densities. this is probably not hard to prove once found but not very trivial either.

in this graph the x axis is density, the y axis is entropy, and the color is the change in new iterate size versus prior iterate size using the compressed mapping. the pyramid shape shows the entropy-density restriction. 2nd graph is color coded by log of semicompressed mapping divided by log of iterate.

`blend2.rb`

(12/16) that quote by sun-tzu on strategy vs tactics is on my mind lately. do not feel like there is a lot of overall direction. have some very powerful tools at my disposal at this point but lately it seems theyre not exactly adding up as in the sense of “the whole is greater than the sum of the parts”. did have some new ideas.

my thinking lately is that some of the difficulty is over trying to find “representative” samples of glides. as years of testing have shown this is a very problematic concept. its like trying to find “representative fractals”. but also there seems to be some senses in which representative samples do exist. but its maybe a key part of the extremely difficult challenge of the problem, because it cuts to the heart of the “generalization” issue which ties in with induction structure. a proof is about finding something general which then involves application over “representative” samples.

my new idea is that in some sense prior approaches are not exactly representative. there was a huge amount of generation analysis but a lot of it was oriented around the “bitwise database” and about 8 separate bitwise algorithms. in particular the bitwise generation scheme is now seen to lead to some subtle biases eg explored at length with the ‘w’ scheme and maybe some others. am now thinking due to its nature the genetic optimizer might have a little less internal bias wrt this issue.

😳 ❗ so am banging on the hybrid optimizer further. after some careful testing/ isolation discovered that the `initw` scheme was not correctly always creating w-width iterates. it has to do with (not surprisingly) edgecases/ fencepost issues. this was discovered via the following idea. instead of making other multiple adjustments, simply taking out the `ext` operator is a neat/ clever way to alter the algorithm from 2-d to 1-d. all the other operators do not extend/ alter the iterate size. then found the issue with the misbehaving `initw` code expecting to see no iterate size changes and then finding them rarely. fixed that.

then this code simply tries to generate long climbs using the genetic approach and a 1-d (constant width) optimizing. while optimizing it encounters some # of iterates of increasing climb length and saves them to write to the database file at the end. there is a time-based stopping criteria of a 3m interval with no improvement (no new candidates above a basic fitness threshhold found after that time). that turned out to be substantially hard to debug late at nite due to getting tripped up on some (non)copying/ duplication logic…

there is some built in review code. it sorts/ samples 50 trajectories and plots the 0/1 max runs (red). the max 0/1 runs are closely tracking the iterate density. the previously seen/ nearly familiar story about the special max run tail region in the range ~10-20 is apparent, its basically “yet another” transition point and its footprint can really be seen in some of last months results (eg/ esp `crunch2`). its quite remarkable its apparently exactly the same as found for the bitwise ‘w’ method generated with much different logic and arriving at the same result. it turns out the final initial iterates found just simply vary noisily over the density range from low to high, mostly high, roughly correlating with the left-to-right sorted climb length (dashed blue line, plotted over all the trajectories found about ~150/ 200 count, right/ top scale).

but very interestingly/ notably theres more to the story; they are not as simple as just picking varying density iterates! this can be seen in the green trajectories that are plotted “upside down” (negated max bit width) which start with a random iterate with exactly the same density. they are quite strikingly different eg usually substantially shorter and dont have the “bounded tails”. the graphs were run for width 100 and width 200 and the tail bound region is apparently about the same “height” in each case but note the vertical scale is doubled over the two graphs.

`hybrid3.rb`

(12/18) decided to look at any trend over larger iterates. this code looks at bit sizes starting at 100 and 100 increments up to 1000 and runs the optimizer. it has some analysis code to extract the leading part of the climb that has a monotonically decreasing max 0/1 run sequence. found some iterates that had a monotonically decreasing sequence but also with double-sequential-repeats for max 0/1 runs, this code handles that (by regarding equal successive values as part of the descent). the length of the remainder sequence, the “tail” section, corresponding to the crosshatched red sections in prior graph, is graphed in different colors wrt starting bit width, by tail length from left to right over all climbs.

the finding is that the tail sections are “mostly/ roughly” the same size typically ending around ~200 length. in other words the max for lower bit sizes is not clearly different than the max for higher bit sizes nor is there a discernable trend from lower to higher. this was somewhat surprising. immediately am wondering how the density measurements relate to this. a lot of old experiments looked at maximizing the density crossover point to the climb max and found it could be expanded indefinitely. does the density crossover come close to the lead-tail section transition being measured here? another way to interpret this is that maybe longer leading sequences can be found but the optimizer quits too early based on its hardness stopping criteria. there were maybe some prior experiments that showed that sometimes bitwise optimizing has an advantage over the genetic approach wrt certain optimization criteria.

`hybrid4.rb`

(12/20) was wondering about monotonicity of the 0/1 max runs lengths in the climb only. the earlier `hybrid` experiment tested over the full glide and did not seem to (clearly) find a limit.

😳 oops! discovered the stopping time criteria was messed up and was finishing after 3m instead of 3m after last best candidate found!

this ran ~3hr with the stopping criteria fixed but the test commented out and found a 402-length nonmonotone 0/1 max run sequence in a climb. the graph is the 0/1 max run lengths in green and the scaled ones (by iterate bit width) in red right side scale. the run starts out at a 4 0/1 max length and goes up to 15. one of the intermediate delays between best candidates was about ~90m. the other metrics tracking time of last new top bin candidate or any bin candidate were both found to not be very informative because they are constantly updated due to the (least-performing) bins being frequently deleted and recreated. iterate bit width divided by 10 is in blue left side scale.

it would seem this tends to refute the hypothesis about short 0/1 max runs preceding a decline. its noticeable however that the min is a downward spike, ie an outlier. so again, maybe some kind of short 0/1 max run criteria over multiple iterates instead?

n=112128985689030716168423525390073875953612807070044492059201425377586787263

`hybrid5.rb`

(12/21) maybe never looked at monotonicity of the entropy/ order over the climb, dont recall doing that, and was going in that direction with some other measurements, so then coded this up. as mentioned entropy is expected to increase and therefore order is expected to decline and looked at the monotonicity of the latter. the code seems to find increasingly long nonmonotonic runs even after a long ~7hr run, improvement seemingly not running out aka “no end in sight”, last one found after 19m gap. it looks like the entropy (and density) can be very close to ~½ for nearly arbitrarily long periods (green, right side scale). the max 0/1 runs are low over the whole 553 length climb and never exceed 17 (blue, left side scale). iterate bit width divided by 10 is in red left side scale.

overall this seems to be yet another variant on the “curse of indistinguishability”. there are some other (sub)routines in this code that have useful smoothing properties and were used in some other exploratory experiments. its timeconsuming to write everything up sometimes. the good news is that the framework seems to be very powerful in avoiding painting-into-corner, allowing customizations/ trying out different ideas with minimal hassle, and seems to consistently find larger iterates with unusual/ significantly constrained properties. elsewhere looking at the “static” sometimes have the urge to do fourier analysis on these patterns… yet another analysis technique to try!

n=2883481216062657139846755870748961915683958171563828656170558493021276597516504651416145266496282781022093159

`hybrid7.rb`

💡 ❓ ❗ 😮 ⭐ staring at that gives me some idea. a theme for prior difficulty is that there is some challenge phrasing stopping criteria a priori and not a posteriori. here is something new along the lines of the x-y trap idea, and in line with recent and last months FSM experiments. its similar to recent themes/ conjectures/ hypotheses but has some new twist.

how about the 2 part idea, (1) a sequence of x length with max 0/1 runs less than y ensures that no future max 0/1 run will be greater than f(x,y), and (2) a limit of max 0/1 runs of f(x,y) restricts the glide/ climb/ trajectory length(s) eg g(x,y). (3) (notwithstanding) there is also a possible “large prefix” of decreasing max 0/1 runs in the order-to-disorder transition.

another way of phrasing this is that a long sequence of “small” 0/1 run lengths cannot be followed by a “spontaneous” large spike in the 0/1 run lengths, or that 0/1 run lengths restrict future 0/1 run lengths. this was alluded to earlier: in other words, again this is similar to the idea of the 0/1 run lengths being another “inner” random walk that interrelates/ correlates to the “outer” random walk behavior. so the problem reduces in a sense to proving nearly the same question on the inner random walks! ie that they all glide! and that this inner glide constrains the outer glide! hows that for a fractal proof? now, how to code this? or rather, mount the directional attack via code?

so to summarize some, this nonmonotone investigation from several angles was interesting/ informative but it seems to fall down/ be ineffective on very long sections of plateauing metrics either in entropy or the max 0/1 runs at least within the glide and/ or climb (apparently there is more trend outside the glide/ climb), because it appears these very long plateauing sections can be of “arbitrary length” (for unlimited bit widths) from the computational investigations/ searches. on other hand this is in line with and was already suspected/ revealed from years old density investigations, and all these measurements seem to be mostly interconnected, eg the density crossover seems to correspond to the plateauing max 0/1 statistics etc. it looks like the metrics should be regarded as something like “noisy but (potentially) mostly flat” and monotonicity is a problematic measurement on a “mostly flat” region…

(12/22) maybe will just mention/ highlight some experiments instead of attaching full analysis/ results/ code for them, they are easy variants (“exercise for reader”). really try to avoid doing this in the name of scientific repeatability (incl for myself) but am going to give myself a pass for the moment in favor of expediting exploration/ freeform experimenting.

wrt recent ideas an interesting straightforward idea is to look at “climb length of 0/1 max runs” sequence.

• looking over the (outer) glide, it found a length 299 sequence and then seemed to paint into corner after about 7m. (looking at later results this was maybe terminated too early, the 3rd similar experiment below once found a new candidate after 12m.)
• looking over the overall trajectory, it seemed to be unbounded and reach length 846, but it was finding sequences consistently only over longer and longer descents, ie leaving open some question about it being unbounded out of glide or climb.
• looking over the (outer) climb only yields a 326 length sequence and apparent paint into corner at 30m. note this would also satisfy the 1st experiment criteria (because the climb is in a glide) and the numbers are not too far apart. on 2nd thought am at least including the code/ graph for this last run. glide in red left scale and max 0/1 runs in green right scale. its remarkable that the inner 0/1 max run climb sequence matches the outer climb exactly!
• a 2nd run found a 386 length sequence and went 134m since last improvement running ~8hr total. 3rd run found a 478 length sequence running ~10hr last improvement gap 6.4hr.

`hybrid8.rb`

(12/23) ❓ so now there is some question. it seems like a critical metric has been found thats bounded/ finite, ties in/ aligns strongly with the general “mixing” hypothesis, and which proving it so seems to be “close” to the full collatz conjecture, but how can the confidence in that be increased? recalling in a sense a proof is “100% confidence”… this “momentary snapshot of uncertainty” starts to cut to a near-epistemological or almost-philosophical problem alluded to in this blog quite a few times over the years. which reminds me of “if a tree falls in a forest and nobody heard it, does it make any noise”? a similar problem is “if new improved candidates stop being found, are they really not there?” ie “is absence of evidence evidence of absence”? aka in the language of this blog/ algorithms, painting into corner. eg is there really a corner, or is there a hidden escape from the corner? and this in turn reminds me of deep QM physics mysteries as alluded to by einstein, schroedinger et al about observation etc, eg “I like to think the moon is there even if Im not looking at it.” (lately maybe up for a long-needed serious collective reexamination maybe even on the verge of something bigger, more on the great new book what is real by becker later!)

this dilemma is part of the core challenge of turning empirical analysis/ finite searches into proofs and at the heart of the general open question at hand. right now there is not a clear answer (either specifically in this case or more generally) but there is also maybe some more than can be done eg in this specific case. eg the GA code has various metrics such as lowest and highest bins currently being searched and performance metrics over bins that can be analyzed further for some additional indications/ circumstantial evidence. it has to do with eg whether the algorithm “mistakenly passes over” a region with higher fitness candidates eg in the bit bin mechanism etc. the bigger unanswered question is one for the following decades of the 21st century. its a research program slowly taking shape that a few are aware of at the moment, but do see unmistakable signs its gradually gaining attention and traction.

actually, have been musing lately there is an algorithmic approach that can to some degree answer questions more definitively or even completely at least over finite ranges, by constructing giant satisfiability formulas that can encode searches, and solving them, and an earlier blog introduced the idea of “satisfiability formula induction” that might be viable “some day” on infinite ranges by “computational induction” on formulas. right now mostly just science fiction and just a gleam in my eye! but as mentioned previously the breakthru work of Konev + Lisitsa on EDP is very evocative of this direction, and the recent work here on FSM encodings has some very strong/ direct parallels to theirs.

(12/24) musing on some of these results gives new context to last months FSM ideas. that is all solid as written but theres another way to look at it and some of this was already alluded to. as seen here clearly theres a lot of noise/ variance in the 0/1 max run statistics sometimes over long ranges. so the idea of a “crunch” from a n-run to a (n-1)-run takes on a different picture/ interpretation in light of this. ie a crunch is likely followed by later iterates in the sequence with higher 0/1 max runs due to the noise aka “later upward bounces” so to speak. ie the crunch concept is inherently measuring something like local minima, the shortest # of iterations to reach a new smaller value, but not guaranteeing all later values are smaller. another natural concept would be to look at local maxima. ie look at successive maxima, ie sequences of local maxima that are monotonically decreasing. this is similar to the nonmonotone run length already calculated but a slightly different twist. the idea is to find the successive “highest” values in the sequence such that all later values are less. a simple way to picture this is as “finding successive climbs” but over the entire sequence, not just the “local” glide. and so the recent code that finds the 1st climb is already along these lines.

(12/25) ❗ 😮 switching gears. was studying the prior results/ dataset and looking for some statistical edge in the “curse of undifferentiability”. then this rather obvious idea occurred to me that maybe/ really should have tried a long time ago, immediately on the density discovery several years ago, under heading let the computer do the work for you. it doesnt even use any optimization but seems to show something quite significant. it is simply to generate 100 100-bit ½-density iterates and then look at their trajectory statistics. this sample is very similar to prior generated trajectories at least wrt the 1st iterate appearance. from prior investigation (both recent and extended) the ½-density trajectories can have widely varying attributes (variation in climb/ glide/ trajectory length).

after quite a bit of hacking came up with this minor edge, maybe observed somewhere before in another context, cant remember right now and a search could be very timeconsuming. this looks at scaled cumulative variance in density distance, entropy distance, and scaled max run length (dt, et, mxt), each scaled by current trajectory length, and finds it leads to some distinguishable signal for (final) trajectory lengths (sorted distribution in 1st graph), but only after a few hundred iterations (significantly earlier in the higher mxt signal). in general shorter trajectories have higher scaled cumulative values. the curves are color coded by trajectory length.

`vary1.rb`

⭐ ❗ 😮 😀 the prior statistics are computed locally. after further examination discovered this astonishing property! its more globally computed but still basically local (over a longer range) and apparently reveals something very meaningful/ deep even without any sophisticated optimization logic! was just looking at slopes of the trajectories and found that after a few hundred iterations they were highly predictive of the overall trajectory length. dont think any prior experiments really brought this out directly. moreover this didnt hold for either glide lengths or climb lengths.

this code looks for the lowest length trajectory for the 100 100-bit width random samples and finds its typically ~300. so then it looks at slope of trajectories (slope of the bit width changes) over that intermediate range (all other trajectories are longer), and uses it as an estimator of the overall trajectory length at that point. the fit is really striking. the red line ‘c’ is the trajectory lengths and the green line/ right side scale ‘r’ is the intermediate slope estimate. wow! honestly am a bit reeling/ shocked right now and have to take a little time/ pause to figure out/ explain what this means. in a word: the glides are exhibiting some kind of memory property. ie the classic mean returning random walk where the slope seems to be the “mean”. in retrospect the immediate prior experiment properties might be related to this one.

apparently these results are suggestive that the finite range collatz mapping might be a key estimator for the infinite range. how could this utterly fundamental property have been missed previously? @#%& argh kicking self! reminding me of the old zen expression, “in the beginners mind there are many possibilities, in the masters only a few.” 😳 😡 👿 😥

`vary3.rb`

(12/27) ❗ 😮 ⭐ ❓ its getting very hard to keep track of all the optimizations by now and its not really all that searchable. need to build some summaries/ indexes of all my prior blogs but thats now a rather herculean undertaking (or mini-herculean so to speak compared to all the actual work represented). my basic recent idea is related to the recent Terras analysis that seems to show essentially that the random walk of the semicompressed mapping can be completely “custom designed” up to about the length of the iterate, after which it must apparently have some kind of emergent property(s) at least if the Collatz conjecture is true (ie should be descending regardless of prior dynamics), which the prior experiment also seems to bring out/ confirm/ hit on.

so then was thinking about optimizing glide peak index minus starting iterate bit width. dont really recall doing that exactly, do recall a lot of related/ nearby stuff. and frankly a lot of that stuff needs to be reexamined/ reanalyzed/ reinterpreted in light of the recent new striking Terras analysis. and have noted before how maybe had some aversion to this type of analysis (ie limits/ bounds relative to starting iterate bit size) in hopes of other metrics that might have bounds that are more amenable to the FSM analysis approach (dont see how to use it to express bounds related to iterate bit widths but maybe have to think about it more). there are many experiments that look at crossover distance minus starting iterate size etc…

the closest that came up on quick search was from just this year 4/2018, the `mix12e` and `mix15` experiments, but on closer look they use the high-compression mapping, whereas the recent Terras thm reveals there is something special about the semicompressed mapping based on its role in the thm. it seems now maybe a major/ glaring/ gaping oversight that this wasnt analyzed sooner? but still have to look thru old blogs for it (where is that collective memory/ audience with photographic memory when you need em?)…

so dashed this off using the `bitwise` code which is still very robust for quick-yet-relatively-thorough analysis/ investigations/ “mapping”/ survey “sketches”. about ½ dozen of the latest versions of the recent code in that series is focused on multirun logic and has review code and other non-core features. so went back to `bitwise5` code and did some quick adjustment to come up with this. it maximizes ‘cm’ glide peak index minus ‘nl’ starting bit width, ‘cmnl’, and consistently paints into corner. moreover the two metrics nearly track each other indistinguishably in the graph, ‘nl’ bit width is in green but nearly overplotted by ‘cm’ blue. ‘cmnl’ lightblue reaches about a max of ~200 early on. for reasons outlined, this seems to be a big deal! …

`bitwise12.rb`

(12/28) ofc it makes sense to try the now easily adapted powerful hybrid optimizer on the same criteria. it does optimize farther and finds a 298 max ‘cmnl’ (‘z’) red line quickly in about 2m and then hung there for 20m trying to improve on it before manually terminated. this plot is a 2000 point sample of 14017 points (which didnt include the rare max by chance, and its a large outlier on the typical range close to zero) and has y ranges adjusted to do a kind of nonoverlapping multiplot, a technique occasionally used before. ‘d’, ‘e’ density and entropy lightblue, yellow are on right side scale and the algorithm favors high density around ~0.8 and low entropy around ~¼.

`hybrid9.rb`

(12/29) 😳 😦 (sigh) as reported the `vary3` experiment seemed like a big deal at the time, but on 2nd look got confused in the analysis. a simpler way to describe it is as follows. taking 100 ½-density iterates, there is some variation in trajectory lengths. the final trajectory lengths are mostly predictable after about 300 intermediate iterations. but the prediction for (“remaining”) trajectory length is simply the current iterate size in bits after the 300 iterations. so then the question remains, is there any possible deeper encoding of trajectory length than the bit width? this experiment tries to measure that. it generates many 200 bit starting seeds and extracts only the exactly 50 bit width iterates that occur after 600 iterations. then it generates exactly as many 50 bit width seeds using uniform random density. it outputs the lengths of trajectories of the 1st group (starting from the “intermediate” 50 bit width seeds) followed by the lengths of the 2nd. there does not seem to be any discernable difference. ie basically a null result. so the idea of “memory” wrt this angle seems to be refuted on further analysis. (so easy to get mixed/ tripped up sometimes…) but maybe there is still something to salvage here?

`vary4.rb`

(12/30) “those following closely” will understand the implications of `bitwise12` but its worth spelling out in higher detail. as related to the Terras custom random walk construction, it is analyzing the behavior of the trajectory after ‘n’ iterations (the semicompressed mapping is apparently crucial/ ie “natural” here) where ‘n’ is the initial bit width, and finds that most of the climb potential is “exhausted by then”. in other words there is some further climb potential but it seems to be limited by a constant # of iterations. another pov is that there is “sufficient mixing” after ‘n’ iterations to “nearly guarantee” descent at that point. another way to look at it: in early iterations the collatz sequence can have an arbitrary climb but the 3n+1 op during that climb adds enough “later bits” to the iterate “of a certain pattern” that correspondingly push it downward. this is again like the stack/ PDA (push down automaton) analogy. to quote newton for each action (climb) there is an equal and opposite reaction (descent).

as earlier experiments laid out the analogy, the iteration dimension is like a horizontal dimension, and its then/ also natural to then wonder about a (corresponding) limit in the vertical dimension, which corresponds to iterate bit widths. this is easy to code but had a small wrinkle. the basic idea is to look at the iterate width at the ‘n’ initial bit width iteration count ‘w1’ (say, call this iteration count the “bit width distance” for shorthand) vs the bit width at the climb peak ‘w2’. but sometimes the climb peak is “far before” the bit width distance, in which case this code inverts (negates) the difference instead. call this the “bit width distance difference” ‘w21’. results were found as conjectured—limited/ bounded. ‘w21’ black line is multiplied by 10 to help with the visual/ graph scaling but basically stays below ~10 (unscaled) and was once seen at ~15.

`bitwise16.rb`

again its easy to adapt to the hybrid code and it finds somewhat better but similar results (bounded). ‘w21’ is in yellow (10x scale) and it finds a max of 22 after 30m and a final improvement time gap of 11m after manually terminated. plot ranges again manually adjusted for layout.

`hybrid10.rb`

(12/31) am looking thru old blogs for related metrics. there was some study of ‘j2nl’ in 5/2018. reviewing, ‘j2’ is peak index (‘cm’) minus density crossover index and ‘j2nl’ is j2 – nl with ‘nl’ the initial bit width. ‘j2nl’ was found then to flatline/ plateau for semicompressed mappings. am then wondering how ‘j2nl’ compares to ‘cmnl’. which reduces to wondering how ‘j2’ compares to ‘cm’. this seems to then relate to comparing density crossover index and peak index. from memory it seems like that (difference) was unbounded with peak index overtaking the density crossover. there is also the question of how density crossover index compares to ‘nl’, cant remember studying that in particular at moment. actually on 2nd thought am suspecting that maybe density crossover can be regarded as bounded by starting bit width? the idea is that density crossover is basically measuring the time for the max 0/1 run(s) in the starting seed to decline to the “crunch region,” possibly which happens “quickly”. or that is, a ‘j2nl’ optimization is biased towards trajectories with a short density crossover index ie the case j2 ≈ cm. hmmm, that makes sense… and am now realizing have never studied the entropy crossover! + wondering if it has any significance…

as already lamented (have to be careful not to sound like broken record…), its getting hard to keep all these straight in prior experiments (they are very scattered over years). and note that only optimizing by one metric doesnt consistently give a full picture, ie the interrelationships maybe inherently need multivariate optimization strategies to better/ fully understand. and even multivariate optimizations may push metrics away from extrema. etc! need to try to integrate/ synthesize all this and build up a coherent story (although maybe understanding all the interrelationships/ dynamics is not entirely nec for a proof), but as the expr goes nobody ever said it was gonna be easy…

further looking/ notable: `extend11b` from 10/2016 looked at glide length minus initial bit width for the compressed sequence. maybe somewhat related there was a ‘div2’ index versus initial bit size comparison in 11/2016.

(later) this is a neat simple toy experiment that builds on/ conveniently illustrates some/ many of the prior ideas and illustrates some of my complex current musings/ thinkings banging around in my head.

at this point as noted it the goal might be to prove a bound on ‘cmnl’. as noted over the yearsand this is a key/ crucial point misunderstood by various troll(s) responses, at the heart of the difference in ideology here vs mathematicians thinking, aka paradigm shift—the optimization strategies have curious, sometimes verging on strong parallels to proofs. they traverse an infinite tree or graph in a certain order looking for optimal candidates. the traversal is very reminiscent of induction step(s). a proof is equivalent to some kind of demonstration that these algorithms never find unexpectedly (“outlier”) optimal candidates after running indefinitely. but again this leads to the entropy concept. a computational tableau with low entropy is closer to a proof than one with high entropy, and it seems that a proof is essentially equivalent to trying to build or find an (optimization) algorithm that has a computational tableau with very low entropy.

but one can further just throw out the ‘cmnl’ idea. it was based on the more fundamental idea of trying to prove a bound on the collatz conjecture. but the same idea applies there: if bounds can be found on every iterate via some analysis/ optimization algorithm, which has very low entropy maybe by careful design, its the same theme/ concept.

have been experimenting with different tree/ graph reorderings. its a lot like adjusting the sort in the `bitwise` code, or the fitness function in the `hybrid` code. some behind-the-scenes experimenting along these lines went on with `bitwise13, bitwise14, bitwise15`. however, as quickly realized, some orderings immediately have the problem that they get biased in certain directions/ regions, pursuing higher/ larger iterates, and indefinitely do not return to lower ones. so part of the reordering/ balancing/ tradeoff that is required is that the algorithm divides its attention among all areas of the infinite graph, avoiding what might be called “getting fixated” aka “drunk searching for keys under streetlight.”

a simple way to ensure this is to look at bit widths of iterates and make sure that lower iterates are continuously included in the frontier advance. and this has to be somehow balanced with trying to find large outliers in the optimization criteria. the literature on optimization techniques is vast and wonder some if these “design patterns” have been encountered in other contexts. on other hand maybe in other ways its very specialized and the idea of traversing an infinite graph does not seem to be very common. there are some rough parallels with some classic CS pathfinding algorithms such as A* that can conceivably work on infinite graphs (they are sometimes used in robotics where the idea originated, and have wide use in video games) but those typically require a “destination node”. is there possibly some ingenious adaptation?

heres a neat simple demonstration of these themes, with brief code. this code builds bigger iterates simply by prepending the ’11’ and ’10’ prefixes to the bit strings, overwriting the highest bit (msb). this provably generates all bit strings but not in the typical binary ordering, its more the “bitwise” ordering. (also note its essentially the sibling concept again!) it sorts the candidates by bit size minus some factor ‘r’ times ‘cmnl’ and chooses smallest candidates (ie trying to maximize ‘cmnl’). called for r=1.0 the graph is very noisy, high entropy, in both metrics. simply adjusting r to 0.1 changes it dramatically, increasing the weight of minimal bit sizes visited 1st vs the other ‘cmnl’ criteria and introduces substantial visible order, yet still advancing the frontier. to be more specific theres some remarkable fractal/ scale invariance features corresponding to bit size increases (“spiky staircases!”) similar graphs were seen years ago in much earlier/ earliest explorations.

the big idea/ theme long espoused is to use machine learning to somehow come up with a balancing criteria or “equivalent reordering scheme” using much more advanced techniques (ie rather than/ far beyond merely adjusting a single numerical parameter) that increase the order of the computational tableau… again ‘cmnl’ is now seen as an important metric as outlined but any metric could be attacked in this way eg simply climb or trajectory length etc.

`traverse4.rb`

## 6 thoughts on “collatz finetuning”

1. Pingback: collatz mod3 | Turing Machine

2. Pingback: collatz comb | Turing Machine

3. Pingback: collatz microscope | Turing Machine

4. Pingback: collatz match | Turing Machine

5. Pingback: intractable collatz | Turing Machine