# collatz comb

this is an idea that occurred to me, building on the prior theme, to look into bit patterns connected to the subsequences of the parity sequence. for a glide there is a predetermined and postdetermined parity sequence. this looks at ‘w’ width bit windows over the iterate sequences, here w=10, for a single long trajectory. for the climb it is clear that the lsbs (in the iterate/ window) tend toward 1, ie odd. what about adjacent bits? the 1st graph somewhat ingeniously shows they are indeed also tending toward 1s, ie the familiar 1-lsb runs/ triangles. the visualization is a tree diagram where the branches are 0/1 with 1 the upper and 0 the lower branch, and are easier to draw than one might guess; the code uses a “cumulative coordinate halving” idea. the left side is the climb/ predetermined range and the right side is the decline/ postdetermined range. color coding is by iterate # in the ranges. the 2nd graph is the windows over the msbs instead of the lsbs. close study shows some difference in the “ending” comb density from one side to the other, with the left side a little denser than the right, implying some kind of different distribution. the color distribution seems fairly even/ uniform.
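the “cumulative coordinate halving” idea can be sketched roughly as follows (an assumed reconstruction; the actual plotting code isnt shown here): each bit branches up (1) or down (0), and the vertical step is halved at each depth so sibling branches never collide.

```ruby
# assumed sketch of the "cumulative coordinate halving" tree layout;
# 1 takes the upper branch, 0 the lower, and the vertical step halves
# at each depth so the 2^w branch endpoints form the "comb" teeth
def branch_path(bits)
  y = 0.0
  h = 0.5
  bits.map do |b|
    y += (b == 1 ? h : -h)  # branch up or down
    h /= 2                  # halve the step for the next bit
    y
  end
end

branch_path([1, 0, 1])  # => [0.5, 0.25, 0.375]
```

plotting each successive pair of points as a line segment draws the full tree for all windows simultaneously.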

(2/3) have been having some ideas to use the Terras construction to study dynamics more. KT had some excellent ideas. in particular here is a notable/ meaningful construction that KT alluded to directly and that ties in with other aspects of the Terras proof such as the logarithmic approximation. did not immediately grasp how to code this but then finally it occurred to me; dont know if his formulation is identical, would like to see that. it is possible to “synthesize” a nearly horizontal glide. the logic is revealing. this was probably done inadvertently in some earlier code eg 9/2018 `bitwise3b` and here its more methodical. the idea is to treat the glide, like Terras did, as a multiplicative random walk where there are 2 factors corresponding to the (3n+1)/2 and div2 ops of the mapping. then looking at it in logarithmic terms its a “very close approximation” of an additive random walk (either adding log(3/2) or subtracting log(2)). a simple greedy algorithm balances the two and leads to the following code.

1st graph is the straight logarithmic walk vs the collatz-synthesized walk, normalized by starting iterate sizes, and they are almost perfectly aligned. it looks regular on quick glance but its apparently an irregular/ nonrepeating fractal pattern. 2nd graph is the difference between them and this is due to the “small” +1 additive factor on the (3n+1)/2 op. 3rd graph is the bit width of the iterates and again its a nonrepeating pattern that looks regular.
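the greedy balance can be sketched roughly like this (a simplified sketch of the logic; see `construct.rb` below for the actual code): track the log-scale walk and always pick the op that keeps it nearest 0.

```ruby
# simplified sketch of the greedy balance (see construct.rb for the real
# version): in log space the two ops add log(3/2) or subtract log(2);
# always pick whichever keeps the walk nearest 0
up   = Math.log(3.0 / 2)  # (3n+1)/2 op, ignoring the small +1
down = Math.log(2)        # div2 op
walk   = 0.0
parity = []
200.times do
  if walk <= 0
    walk += up    # climb step
    parity << 1
  else
    walk -= down  # fall step
    parity << 0
  end
end
# the walk never strays more than one step from 0, and the parity density
# comes out at the log(2)/log(3) ~ 0.63 transition point
```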

`construct.rb`

this is a quick yet revealing riff. the prior code balances on a “razor edge” where the density is balanced but in a very finetuned way, ie both locally + globally. another idea is to look at the average density of the generated sequence and then create a parity sequence with that same density. this has the same global value but is locally more divergent. this is a special “transition point” in the parity density where the trajectory is most likely to run sideways, higher densities lead to climbs and lower densities lead to falls. that was seen in the 10/2018 `121` experiment. the “small” difference illustrates a subtlety of the different methods. the density transition point (note, of the parity sequence) turns out to be measured at 0.64. the next obvious step is to look at interrelations to iterate density.

how to compute/ derive the parity transition point? the up increment is log(3/2) = log(3) – log(2) ≈ 0.41. and the down decrement is log(2) ≈ 0.69. then the ratio of up increments to down decrements needs to be r = log(2) / log(3/2) ≈ 1.71. this occurs if the (parity) density is r / (r + 1) = log(2) / log(3) ≈ 0.63. that formula “normalizes” the down decrement length to 1 and then computes the relative up increments to total length.
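the derivation in code form, as a quick check of the numbers:

```ruby
# quick numeric check of the transition point derivation
up   = Math.log(3) - Math.log(2)  # up increment log(3/2) ~ 0.41
down = Math.log(2)                # down decrement ~ 0.69
r = down / up                     # ups needed per down ~ 1.71
density = r / (r + 1)             # simplifies to log(2)/log(3) ~ 0.631
```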

`construct2.rb`

❓ ❗ further look: not surprisingly the starting iterates are close to ½ density and there are not clearly discernible differences in density or entropy of the iterates in the trajectories (until typically the ends). this is one of the big question marks of this research. ½ density starting iterates can have almost arbitrary behavior: climb, descent, sideways, for arbitrary lengths, but chosen at random, tend to be declining. it seems to relate to the fractal/ self similar nature of the problem. just as mini-mandelbrot turtles can be found in the mandelbrot set “all over the place,” in a reminiscent way different glides of arbitrary shapes/ sizes can be found in the ½ density region. at least the Terras lemma is a powerful way/ tool to study the phenomenon. but maybe the way to look at density/ entropy is that in analyzing the problem they both have a lot of power and powerlessness at the same time. this has been very hard to wrap my brain around and come to grips/ terms with.

(2/5) 💡 ❗ 😮 ⭐ this is some code further building on the Terras 1-1 mapping. it creates 50 100-bit width iterates with varying (parity sequence) density similar to the `121` experiment just cited but doing a 100 sample integration/ average and analyzes more statistics. the dotted statistics are entropy/ density on the right side axis and the others are on the left side axis. theres a lot of info here but the basic story is that the mean statistics are very flat with “almost no differentiating signal” except at the extremes in parity density (left/ right edges). in other words they cannot do much to distinguish the behavior of iterates in the ½ density core.

there are a few almost-exceptions: entropy ‘e’ red dotted, density ‘d’ green dotted, and max 1-run length ‘mx1’ lightblue have signal toward/ nearly reaching the core density, and avg 1-run length ‘a1’ blue actually seems to fit the bill by a very narrow edge/ sliver. it is gratifying to see these key old statistics again confirmed as substantial from an entirely new angle (not long ago CT insisted this angle invalidated the entire paradigm… ironically his findings/ spirit live on esp in the prior and this exercise!). the big question: is there some way to “amplify” these signals toward the density core? another immediate idea coming to mind is to look at deviation in these statistics. ❓

something else to note: theres been a long challenge in finding “typical” or “representative” trajectories and ofc thats an extremely problematic concept with fractals and this problem. and recently took other stabs at it with the hybrid optimizer `hybrid3`, `hybrid4` (12/2018) and `hybrid11` (1/2019). but the Terras construction combined with density specification of the parity sequence is in a new sense a way of generating “representative” or “evenly distributed” random walks. the Terras 1-1 construction has a lot of implications for how to consider representative trajectories/ glides.
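the Terras 1-1 construction at the heart of these experiments can be sketched as follows (a standard version; the actual generator in `construct5.rb` below may differ in details): a length-k parity vector determines the starting iterate mod 2^k, because adding 2^i flips exactly the i-th parity and none before it.

```ruby
# sketch of the Terras 1-1 correspondence (standard construction; the
# generator in construct5.rb may differ in details)
def t(n)
  n.even? ? n / 2 : (3 * n + 1) / 2
end

# unique residue mod 2^k whose first k parities match the given vector;
# adding 2^i flips the i-th parity while leaving parities 0..i-1 intact
def terras_preimage(parity)
  r = 0
  parity.each_index do |i|
    x = r
    i.times { x = t(x) }
    r += (1 << i) if x % 2 != parity[i]
  end
  r
end

terras_preimage([1, 1, 1])  # => 7, ie n = -1 mod 2^3 for all-odd parities
```

sampling the parity vector bits iid at a chosen density then yields starting iterates with that parity sequence density, exactly the generator idea used here.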

`construct5.rb`

⭐ ❗ 😮 🙄 did you catch that? ‘a1’ avg 1-run length deserves further notice. eerily, it looks like it predicts less than ½ parity sequence density for ‘a1’ less than 2 and larger than ½ density for ‘a1’ larger than 2. this is a rediscovery of this threshold/ transition point from an entirely different angle. the 2-attractor was seen in some generated statistics but the threshold/ transition point connection to parity sequence density is new and bordering on eyepopping. heres a detail with a blue line at the ½ density. striking emergent property!
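one candidate explanation for the 2-attractor (an assumption, not established above): for iid bits at density d a 1-run extends with probability d, so the mean 1-run length is 1/(1−d), which equals 2 exactly at d = ½; then ‘a1’ above or below 2 would directly track density above or below ½.

```ruby
# check of the hypothesized 1/(1-d) mean 1-run length at d = 1/2
# (an assumed explanation, not claimed in the original analysis)
def mean_one_run(bitstring)
  runs = bitstring.scan(/1+/).map(&:size)  # all maximal 1-runs
  runs.sum.to_f / runs.size
end

srand(7)  # fixed seed for reproducibility
bits = Array.new(200_000) { rand < 0.5 ? "1" : "0" }.join
mean_one_run(bits)  # lands near 2.0, the observed attractor value
```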

(2/6) the 1-run statistics (of the generated starting iterates) seemed to have more signal. was looking at the median 1-run length and it seemed to correlate fairly well with the parity sequence density, maybe the highest signal so far. that led to wondering about the 1-run distributions, and then to reusing this prior technique: just sort the 1-run lengths and then normalize by count on the x axis and max 1-run length on the y axis.

was surprised to find this remarkable/ strange/ unexpected globbing/ bunching/ threshold/ fragmentation effect and dont have an explanation right now except to note on cursory further look some of this is due to overplotting where there are a lot of overlapping “grooves.” this is 200 densities and 1000-length parity sequences colored by parity sequence density. the effect was not apparent for some lower parameters, showing some threshold effect and how substantial effects can be hidden in noise. feel like some kind of basic analysis of the distributions would shed light on this but cant think of what right now, maybe some new formulation… guess (related) histogram analysis is an obvious idea to try… ❓

`construct7.rb`

(2/8) 💡 ❗ 😎 looking at 1-run histograms was inconclusive because they seem to be very noisy. however here is something very similar that builds on those last ideas, almost staring me in the face. sorting the 1-runs leads to the stairstep pattern and it makes sense to find the points on the stairstep and do linear interpolation between them. this is easier done than said so to speak because gnuplot does “linear interpolation” by default on line plotting. so this code has a new routine `convert` that finds the stairstep points basically by sorting the 1-runs and finding the transitions, and then converts those to coordinates. the coordinates are normalized by x, y width/ height. theres an integration over 100 separate samples of each density and the 1-runs are all combined into a single array.
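a minimal `convert`-style sketch (an assumption about the routine; see `construct9c.rb` below for the real one):

```ruby
# minimal sketch of a convert-style routine (the real construct9c.rb may
# differ): sort the 1-run lengths, keep the stairstep transition points,
# and normalize x by the count and y by the max run length
def convert(run_lengths)
  sorted = run_lengths.sort
  pts = []
  sorted.each_with_index do |v, i|
    pts << [i, v] if i.zero? || v != sorted[i - 1]  # start of each step
  end
  pts << [sorted.size - 1, sorted.last]             # close the last step
  pts.uniq!
  xmax = [sorted.size - 1, 1].max.to_f
  ymax = sorted.last.to_f
  pts.map { |x, y| [x / xmax, y / ymax] }
end

convert([1, 1, 2, 2, 3])  # => [[0.0, 0.333...], [0.5, 0.666...], [1.0, 1.0]]
```

gnuplot then linearly interpolates between the points by default, exactly the effect described above.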

this all leads to a very low-noise result, possibly one of the lowest noise/ smoothest seen. in a sense, a work of art. the density range 0-½ seemed more noisy so this is the ½-1 range only. the display is revealing a shift in the distribution of 1-runs, a kind of “skewing/ squeezing/ squashing” effect as density increases. the prior bunching effect is beautifully adjusted, a real “x-ray” scope visualization into the heart of the problem. the bunching seemed to be related to a few integer #s of total counts of 1-run lengths, around a dozen. an immediate question, easily answered: would a synthetic density gradation lead to the same results?

`construct9c.rb`

💡 ❗ pondering it more, the `construct5` diagram is really remarkable, it captures/ synthesizes many ideas. is it enough to create a “pre vs post determination” signal? it would seem so! havent sketched out this idea yet but basically its possibly near to a proof formulation because it might provide the inductive structure/ dynamics/ mechanics such that the predetermined region inevitably transitions/ evolves into the postdetermined region!

this next item is not yet remarked on, but its also remarkable. in the `construct5` diagram the left half is a region with ½ density and increasing entropy, ie leftward starting from center (oops the legend annoyingly overplots/ obscures some of this crucial trend, meant to center it). does that remind anyone of anything? it took a little bit to find it again but it was a direct/ memorable exercise from last year, the 4/2018 `entropy` code, where the same “gradation” was synthesized but starting instead from 0 entropy; the upper/ top ½ of the diagram would seem to correspond to ½ entropy and higher.

the binary entropy aspect/ measurement was discovered “only” at the beginning of last year pondering the ordered vs disordered regions, although the more general concept of entropy has been mentioned/ pondered for years in this blog. also the striking pyramid entropy-density figure `blend2` from just 2 mos ago 12/2018 continued this theme of looking at the entropy-density gradations/ variation/ spectrum. also, maybe the presence of ½ density somehow takes away order from the 1-run distribution statistics, maybe explaining why the last code was noisy in that region; not sure how to tie this all together/ picture all this yet.

💡 ❓ there are likely some simple relationships between all the known metrics even for randomly distributed binary strings eg between density, entropy, 1-runs, 1-run avg, max 1-runs, etc… and how do those relate to the Terras mapping? none of this is very hard to study… and developing these basic views/ tools has already proven highly valuable verging on crucial…

(2/9) 💡 some of the old seeming barriers persist. the latest conjecture about “pre vs postdetermined” regions should somehow apply to some kind of set constructions. how are they technically/ formally classified based on some kind of property? defining them in terms of collatz iterations is a technical definition but for purposes of a proof verges on circular. another challenge is the old “global vs local” dichotomy. the global properties are fairly well understood but need to find local properties that connect to the global ones. the `construct5` graph is very meaningful along these lines but requires more attention/ work. another aspect is “dichotomy vs gradations.” the pre vs postdetermined concept is a dichotomy, but for a proof, it seems one needs a property that measures “distance from one or the other or between the two” and then a proof would involve showing convergence in that property. this is the familiar long search for a monotonically declining indicator.

it is not exactly clear what to do next but then came up with this straightfwd idea. the `construct5` properties show that all the following would seem to be expected to narrow from pre to post determined: average 1-bit run length distance from the 2 attractor, entropy distance from ½, and density distance from ½. but these are very noisy to measure and there are tricky aspects. basically at both the beginning and end of trajectories they are/ can be more divergent and somehow only narrow in the “middle.” so how to handle that? one could look at just the glide which excludes the end but that still includes the start of the trajectory. one can try to exclude based on ordered vs disordered regions and there have been statistics/ features constructed for/ close to that but as recently found (the “UFO” effect) they are problematic for various/ misc reasons. another idea is to look at glide descent only, but its not yet so clear how that relates to the forefront concept/ study of pre vs postdetermined regions.

so after all that, hit on this. the ‘cmnl’ region was helpful to study. its the region after the predetermined region where the trajectory is still increasing; its the questionable area. earlier it was found to be at least ~300 iterations and its not known if it could be larger. its special so call it the limbo range. following is the `bitwise` code with the postanalysis ripped out. 3 new statistics ‘mxda’, ‘mxea’, ‘mxa2’ are constructed as indicated over the limbo range.

optimizing over these 3 only tended to paint into a corner and the code eventually couldnt find nonempty limbo ranges at the end (‘cmnl’ going negative). optimizing over these plus all the trajectory length metrics leads to this result. these 3 new “narrowing” metrics (‘mxda’ dotted gray, ‘mxea’ dotted red) indeed narrow even when the algorithm attempts to push them up. note ‘mxa2’ green is multiplied by 100. dotted lines are right side scale. also notably ‘cmnl’ tended to hover at very low values under ~100. in other words these metrics might be gradation signal possibilities for pre vs post determined. its a new pov on the old “density core” idea. apparently the density and entropy cores narrow in the limbo range. but maybe approaching a nonzero constant and not asymptotically approaching zero? notably/ interestingly/ remarkably this exact same constant apparently shows up in the `construct5` diagram where both ‘da’ and ‘ea’ seem to converge around 0.05 in the center region and its striking its apparently the same value for each. ❓

`bitwise24.rb`

(2/11) already understood to be very substantial, the `construct5` dynamic could be something of a gamechanger. the key is about what might be called “self-reinforcing properties” ie recursive, inductive, or convergent in a sense. density is known to converge to the core region and this meshes with the `construct5` dynamic, but am not sure much about any further convergence once in the core. lets call this graph the “random walk map.”

the problem is to try to show that all trajectories move toward the center of this graph as the trajectory evolves, this is the hypothesized ½-parity sequence density “strange attractor” of the postdetermined region. this is sufficient but not necessary for trajectory terminations. the graph represents (down)slope-mean-returning glides in the center and slope-mean-diverging for the right and left extremes (“faster” upward or downward resp). trajectories terminate “faster” on the left side but in the proof structure it seems this acceleration is attenuated from pre- to post-determined. for iterate density, it flatlines for left extreme which means for all these initial iterates, they already or “instantly” start in the core density region.

a convergent property is one that converges toward something through each iteration and as doing so, pushes the iterate density toward core density but also the parity sequence density to ½.

the random walk map might also be measuring other properties that are not necessarily convergent. for example am suspecting ‘da’ and ‘ea’ as associated with the 1st iterate maybe do not change much (disregarding their inherent noisiness) with subsequent iterates in trajectories. the `bitwise24` is showing some squeezing of ‘mxda’ and ‘mxea’ but an important distinction is that this is not necessarily implying particular dynamics over individual trajectories, its measured as a global property (of/ within the limbo region) as trajectories get larger; ie ‘da’, ‘ea’ may not be “convergent” within a trajectory. so part of the key challenge is finding the/ any convergent properties which seem to cause the trajectory terminations.

(2/15) 💡 ❗ 😮 this is a riff off `construct5`. was interested in “evolution of density/ entropy in further iterations after initial found.” so in this sense theres a “depth” of points associated with each sample in the graph. the code is not so complex but it really took a long time to wrap my brain around how to write it. maybe a little burned out in that moment. its a depth (iteration count) of 200 along with averaging over the 100 samples. also there was a glitch in that the code wasnt paying attn to early terminations and then was surprised at the results. it found cycles, but it was the 4, 2, 1, 4 cycle repeated at the end that messed up the statistics. its not so easy to see due to overplotting but there is some surprising oscillation effect on the tail ends of the trends esp on the left side for both density, entropy.
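the early-termination fix can be sketched like so (hypothetical; the corrected handling in `construct5d.rb` below may differ): stop the depth evolution when the iterate hits 1 instead of letting the trivial cycle repeat into the averages.

```ruby
# sketch of the early-termination guard (the actual construct5d.rb may
# handle this differently): stop once the iterate reaches 1 rather than
# letting the trailing trivial cycle skew the depth statistics
def evolve(n, depth)
  out = [n]
  (depth - 1).times do
    break if out.last == 1  # entering the trivial end cycle
    m = out.last
    out << (m.even? ? m / 2 : (3 * m + 1) / 2)
  end
  out
end

evolve(6, 10)  # => [6, 3, 5, 8, 4, 2, 1], halting early instead of cycling
```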

`construct5d.rb`

⭐ 😮 🙄 😎 ❗ ❤ this is some quick code to look at a left side iterate evolution, here 0.05 density on a 100 bit iterate. it made a little more sense to show the binary evolution left to right with lsb at bottom, ie a 45 degree flip from prior binary diagrams, and it overplots the entropy and density on the right side scale (note, the x-y ranges/ boundaries got mixed up/ swapped in the flip). these are some really extraordinary/ wild iterate patterns, combining a sort of correlated thrashing or sawtooth pattern in the entropy and density, stripe/ checkerboard patterns, and lsb 0-triangles, and they really can leave no remaining doubt whatsoever about the sometimes-amazing relevance of looking at binary diagrams. just when you thought youd seen it all!

2nd graph is a 0.95 density evolution. deja vu all over again: the familiar 1-lsb triangles correlated with the climb. there is some major vindication here of prior analysis that found these structures already. it would appear from these two cases that 1-lsb triangles are the lynchpin, that 0-runs point toward decline-bias and 1-runs point toward climb-bias, and interestingly, related to the 1-runs, the density/ entropy converge/ narrow right at the glide max. however from other relatively recent experiments its known that apparently in some sense there seems to be no limit to climbs that have negligible 1-lsb runs.

😳 note! after noticing strange outliers larger than 1 that were not noticed in earlier aggregate statistics and are much more apparent in this format, the entropy calculation was found to be slightly off and adjusted; the parsing code in the `len` function was finding empty strings at the front when parsing 0s because of the leading 1. this had been encountered and handled before but newer code introduced this glitch and it affects a lot of prior calculations!
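the glitch class can be illustrated like this (a reconstruction; the actual `len` function isnt shown): splitting on the opposite bit leaves a spurious empty string from the leading 1, while a regexp scan sidesteps it.

```ruby
# illustration of the parsing glitch (reconstruction; the actual len
# function is not shown): split leaves a spurious leading "" because a
# binary string always starts with 1, while scan never emits empty runs
s = 0b1001110.to_s(2)      # "1001110"
s.split(/1+/)              # => ["", "00", "0"], note the empty string!
s.scan(/0+/).map(&:size)   # => [2, 1], clean 0-run lengths
s.scan(/1+/).map(&:size)   # => [1, 3], clean 1-run lengths
```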

`construct11.rb`

(2/16) the Terras 1-1 density generator can be seen also as a glide generator. heres the standard statistics: peak index ‘cm’, glide length ‘cg’, and trajectory length ‘c’, along with the globally computed ones ‘cm1’ and ‘cg1’ which are indeed different; there is some thrashing between the two in the ~⅗-⅘ density range, and the non-global/ local metrics are plotted with the dashed lines. the stairstep effect in max index is quite notable: its exactly the bit width of the starting iterate, 200, and maybe even suggests the recently studied (nondeclining) limbo region between bit distance and peak max is “atypical.” this actually gives strong further credence to the idea of the postdetermined slope-mean constrained random walk, as starting downward right at the pre/postdetermined boundary.

`construct12.rb`

⭐ ⭐ ⭐

😮 ❗ 😀 😎 ⭐ ❤ there is some kind of deep culmination/ vindication/ even bordering on closure here. it seems, the endgame is in sight. this is now a simple/ basic visualization that builds on prior findings and could have done this sooner ie maybe 1st priority! have already visualized this mentally at this point but here a pictures really worth a thousand words. this looks like a rosetta stone to the whole problem. this is 50 trajectories displayed based on parity density as the coloring using again the Terras 1-1 density construction. breathtaking!

its a beautiful, striking revelation of the pre- vs post-determined boundary/ dynamics at the bit width distance 200. its exactly as expected and reveals the deeper structure and possibly even the overall proof outline as already sketched out. basically an iterate determines the random walk up to the bit distance and then theres a quick conversion to a slope-mean-returning random walk. however, on closer look there is some wrinkle/ exception in that the very low parity density trajectories actually have a higher slope bias in the (early/ short) postdetermined region. the top density iterate also seems to be an outlier/ biased away/ outward.

`construct13.rb`

`construct11` code range mixup/ flip not fixed was bugging/ nagging at me and those amazing evolutions deserve some further highlighting. its a 1 line iota )( of a code chg but feel unless saved somewhere in cyberspace it doesnt exist (the modern take on a tree falling in the forest). also have some weird slightly irrational aversion to revising blog mistakes, maybe partly as a way to be candid-verging-on-raw about the research process/ history. (aka stream of consciousness which acc. to my highschool english teacher was once a paradigm shift in english literature style, maybe captured in catcher in the rye which reminds me of modern blog style.) here it is fixed and a gallery of variations. the characteristic tail disorder is captured/ included with the corresponding noisy region of density/ entropy whereas before it was cut off, but the opaque upper left legend obscures a little of the edge. as remarked once awhile ago, to repeat, its truly a wonder/ marvel the same algorithm is behind all these dynamics.

`construct11c.rb`

⭐ ⭐ ⭐

❓ something to ponder, a deep question, something long dancing, poking, or slithering at the edges of my awareness. earlier it was mentioned that a key challenge seems to be separating pre- and post-determined sets via some kind of computable property, and that defining this in terms of collatz iterations seems circular. but maybe this is some kind of bias or failed head-stretching going on. thinking about this more carefully, it is apparently trivial that pre- vs post-determined iterate is a decidable/ computable property: just iterate n iterations and examine the trajectory, and determine how close it is to the slope-mean-returning post-determined region. (although its also possible to “construct” a post-determined looking slope in/ at the end of the predetermined region… thinking out loud, is that a problem?) so can a proof be constructed out of this “property”? maybe its a deceptive/ hopeless/ futile quest to find a “simpler” or “more local” indicator, there is a lot of suggestive statistics but nothing really substantial that seems to indicate “something else might be there”…

(2/18) 💡 😮 ❗ ⭐ 😎 ❤ life is lived fwd but understood backward. this is the `construct5d` code with a single line of code change to reveal a different format/ layout, ie same data plotted with different orientation, even dramatically clarifying some of the prior obscuring due to overplotting (ie in retrospect this could be readily merged with the prior code). the code computes an average iteration depth/ density or entropy average matrix. just changing the transpose of the matrix (taking out the transpose operation) plots the average evolution of the iterates colored by generated parity density instead of iteration depth.

these graphs are further revealing. there is some breakthrough revelation here of underlying dynamics.

the basic idea is that the collatz function is like a sampling operator from a distribution where the bits of the iterate express the distribution. it samples the initial bit (lsb) of an iterate with some amount of (evolving) density and entropy.

however, these general tendencies are not necessarily easily detectable in individual trajectories where density statistics are very noisy. this following graph shows that roughly, the sampling operator is not obviously biased: the probability of a 0/1 measurement in the lsb strongly correlates with the average entropy or density of the iterate. for example for the (hotter) climbing/ higher parity density trajectories, the overall iterate density is elevated throughout the climb, and conversely pushed down for the (darker) descents. its nearly a long initial plateau for most parity densities except the very hottest, which is a roughly linear trend downward. there is some adjustment to that trend however in the later ~¼ part of the 200 iterate bit distance where average density/ entropy converge toward the core.

there are further ways to look at this and it would be very helpful to try to quantify computationally/ numerically whether there is any “bias” in this sampling operator. this is also possibly nearly a further revelation of the overall proof structure. it depends if this framework can encompass the entire dynamics of the problem without problematic “outliers/ exceptions” but the current signs are that its truly the beating heart of the problem. the “mechanism” of convergence apparently indeed turns out to be (somewhat ironically? aka “what a long strange trip its been…”) iterate density. now in retrospect the difficulty in showing this over the years has been removing the noise/ “extraneous factors,” and other techniques that are probably/ actually in comparison “outliers.” one might say these new Terras construction + averaging techniques allow a powerful xray-like view that overcomes the inherent noisiness of the problem that somewhat capriciously masks/ veils its internal dynamics.

`construct15.rb`

this is a revision/ reformulation that immediately answers the question of degree of sampling operator bias. (immediately on reading but not on writing.) basically a “local pointwise parity density” can be estimated over the iterates of the ensemble of trajectories emanating from each of the 50 starting parity densities. then the bias is the ratio of the iterate density divided by the pointwise parity density, here its graphed logarithmic scale due to the stretching otherwise. there is some systematic bias. parity density is biased high for low iterate densities and vice versa, biased low for high iterate densities. note the heatmap is inverted here by parity density. the story for entropy bias is a little more complicated but similar, there is a relationship but its apparently nonlinear and/ or discontinuous. the isolated downward hot spikes in the density plot are notable/ currently mysterious/ deserve further analysis.

it would be interesting to try to measure this same bias via other means eg using random starting iterates, its apparently fundamental but possibly a relatively simple exercise (at least as far as code complexity) thats been overlooked until now. but its maybe not trivial to construct either, because lsb sampling of a uniform density distribution will be unbiased by definition, have to think about it more. this bias seems to have something to do with the iterative/ sequence structure of trajectories. also from long prior analysis its known sometimes the density isnt uniform/ varies in the iterate from lsb side to msb side.
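a quick numeric check of the “unbiased by definition” point (a sketch, not from the original code): lsb-sampling freshly generated fixed-density strings just recovers the density, so whatever bias is measured above must come from the sequential structure of trajectories.

```ruby
# check that lsb sampling of fresh random fixed-density iterates is
# unbiased: the measured lsb rate just recovers the generating density,
# so any real bias must come from trajectory structure, not sampling
srand(11)  # fixed seed for reproducibility
d = 0.3
rand_iterate = ->(width) { Array.new(width) { rand < d ? 1 : 0 } }
trials = 50_000
rate = (0...trials).count { rand_iterate.call(32).first == 1 } / trials.to_f
# rate comes out ~ d, ie no bias without dynamics
```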

the smooth/ outlier trending lines are the extreme densities (lowest parity density trajectory in density diagram and highest density in entropy diagram). the measured bias is basically independent of the location in the predetermined region. bottom line, more basic/ deep properties that likely reveal fundamental proof structure.

`construct15b.rb`

this is a graph of the inverse value (expected parity density divided by iterate density) not logarithmic scaled, could this be more natural?

💡 ❗ further thought and comparison with prior graphs: this is showing the sampling operator acts in a sense as a nearly linear amplifier! it amplifies the signal of the current density core distance! in other words even for densities near the core, slight off-center bias above or below the center ½ density causes a large effect in the parity density distance (from center ½). notice the unamplified ratio ~1 roughly corresponds to the center parity density ½.

`construct15c.rb`

(2/20) ❗ ⭐ 😎 😀 as already emphasized the `construct13` diagram is a very big deal. might as well call it the rosetta diagram. just cited it on Taos collatz page! (forgot to say, the time is ripe!) havent posted there in 4 years! currently have a total of maybe 10 thumbs down on my comments there over the years. lol, thanks guys! friends in high low places! lol oh well whatever you dont get to my position without some cyberspace scars over the years, some rather deep, and learn early on its better not to take everything personally. nevertheless it all leads to the new philosophical question if somebody proves Collatz on a blog page in cyberspace, will it (ever?) make any noise?

the rosetta diagram and the recent experiments are already substantially outlining/ building a major proof structure. the same global-local problem/ theme touched on and pointed out over the years here continues in this new context, but at this point its apparently mostly overcome/ conquered! the theme of the prior few exercises is “integration (averaging), effectively targeted, can be ones friend.” thats because its like a noise reducer, which is strong/ key medicine for this problem. this is a quick/ simple confirmation of that. its a more local calculation of the more global properties just measured.

this takes 200 100-bit width iterates of varying density and then looks at average density of the pre (red) vs postdetermined (green) regions, only over individual trajectories. for the low-parity-density range (left 50 points) there is no differentiation (but there is direct/ strong signal away from center) whereas for the high-parity-density range (right 50 points) theres a strong differentiating signal. hence as expected the global characteristic of pre or postdetermined range can be extracted from more local trajectory statistics, ie is recoverable from more local density, although some adjustment/ alternative will have to be made/ found for the low parity density region to fully generalize; some of the difficulty may be that its postdetermined region can be small. that deserves special notice/ repeating!

the global pre/ postdetermined region location can be extracted from (the very noisy) local iterate density/ core of individual trajectories and its subtle (non)bias toward the center!
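
as a concrete sketch of the local statistic in play (illustrative ruby, not the actual `construct16` code; `density`/ `window_density` are made-up names), the average iterate density over a window of the trajectory, where density = fraction of 1 bits in the iterate:

```ruby
# sketch: average iterate density over a window of a collatz trajectory,
# the local statistic behind the pre vs postdetermined comparison
def collatz_step(n)
  n.odd? ? (3 * n + 1) / 2 : n / 2
end

# density = fraction of 1 bits in the iterate's binary representation
def density(n)
  bits = n.to_s(2)
  bits.count('1').to_f / bits.size
end

# average density of iterates from..to (exclusive) along the trajectory
def window_density(n, from, to)
  from.times { n = collatz_step(n) }
  ds = (from...to).map { d = density(n); n = collatz_step(n); d }
  ds.sum / ds.size
end
```

comparing `window_density` over the predetermined vs postdetermined iterate ranges of a trajectory gives the red vs green averages described above.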

(to be honest, think maybe this was uncovered in different context(s) in some earlier experiment(s), have to try to go dig them up.) anyway this is impressive wrt old experiments over the years that showed a lot of noise in this (iterate density) statistic and the idea of hidden bias was never really evident in the past, it would have seemed too radical. oh, and it still needs some more sharpening, because it was already known that with various climb algorithms there were 1-lsb runs in the ordered region (thereby affecting iterate density), and maybe these Terras-generated glides have significant leading ordered regions in there, and dont know how much this affects the overall average, have to analyze all this further (eg are the high density measurements in predetermined region for the high-density parity sequence guaranteed to indicate mostly ordered region? think so, dont see why not). in other words it will probably be important for the proof structure to show the signal is still detectable in the later disordered region of the predetermined range and that could be more difficult/ elusive… it depends on how much of this current calculation predetermined range is ordered vs disordered. if the leading ordered part is not large, it will not affect results much.

in retrospective view of recent experiments, it is nearly obvious: the integration experiments almost prove the bias has to be there (it would be a very interesting exercise to contrive/ explain a scenario where the integration statistics have a signal but do not successfully uncover a local property, maybe have seen something like this in another experiment but would have trouble pinpointing it). bottom line in arbitrage circles (which really are not so foreign from this work/ project!) this would be known as “a slight )( edge” and as denizens there know—one of them named Simons—its sometimes worth a lot.

this idea can be taken further. let me give away most of the idea already: the collatz problem is equivalent to studying large n-block multi-iterations where n is just iteration count within a “batch,” iirc there were some experiments posted along these lines a long time ago. already suspect this could be most of the proof structure right there… precocious readers will “get my drift.” another big fat hint: look/ think of the postdetermined density green line flatlining

`construct16.rb`

(2/21) 😳 this code quickly tests and somewhat dashes/ refutes some of those hopes of something like a short(er)cut, so that they land on “too easy to be true” or in the words in a letter of Einstein on Bohms theory of quantum mechanics, “too cheap.” the idea is to look at two adjacent n-blocks of multi-iterations and compare the average iterate densities over each, hopefully showing a decline, without much concern otherwise. this is not such a naive idea thinking about the ordered region and suspect it might even be provable there in some sense. but as already “learned the hard way,” hardwon intuitions/ findings built on the ordered region sometimes crumble or go to die in the disordered region… to borrow a dramatic/ striking/ evocative/ poetic turn of phrase from Zizek and the Matrix (near 2-decade anniversary!), kind of like the barren/ scorched desert region of the problem. this is like prior experiment `construct16` except that the 2nd window, closer to the postdetermined region than the 1st, is fixed size and not variable, and also like `construct15` experiment in looking at average iterate densities.

the code/ graph is as follows, not esp complex compared to prior code but not simple to describe either. the code takes 200 trajectories in the 0.64-1.0 parity density range. the 0.64 density was found earlier as the “flat glide” transition point wrt Terras generation; anything less is basically not a glide and anything more is a longer/ steeper glide. then it looks at 2 adjacent windows left/ right of fixed 100 iterate blocks and the average iterate density relation between them. it loops over starting bit widths of size 100..250 in increments of 25, in other words intentionally the left window is entirely in the starting predetermined region and the right window increasingly covers/ shifts into the remaining predetermined region instead of the postdetermined (a visualization/ diagram of this window placement would help…). the x axis is initial iterate parity density within the integer test runs. the difference of average iterate densities of the post minus the pre window is graphed in green if negative and red if positive. the green shows the desired decrease for an inductive proof structure. but the red becomes more predominant for larger bit width seeds over the 11 test runs left-to-right.
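
for reference, the Terras-style parity-density generation used in these experiments can be sketched roughly as follows (illustrative code, `terras_seed` is a made-up name; it relies on the Terras fact that the first k parities of a trajectory depend only on the seed mod 2^k, and that adding 2^i flips exactly the i-th parity while leaving earlier ones intact):

```ruby
def collatz_step(n)
  n.odd? ? (3 * n + 1) / 2 : n / 2
end

# greedy lsb-first construction of a seed whose first k trajectory
# parities match the given target array (k = parities.size): bit i is
# set iff the i-th iterate's parity would otherwise mismatch the target
def terras_seed(parities)
  x = 0
  parities.each_with_index do |p, i|
    t = x
    i.times { t = collatz_step(t) }   # t = i-th iterate of candidate x
    x |= (1 << i) if t % 2 != p
  end
  x
end
```

eg `terras_seed([1] * k)` gives the all-odd residue 2^k − 1, and a random 0/1 array of chosen density gives a seed whose predetermined parity sequence has that density; padding the result with a high bit (adding 2^k) leaves the first k parities unchanged.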

in other words the general idea is not rejected and valid “to some degree”—ie when the window size is nearer to the starting bit size as on the left, and this continues the trick/ theme of pulling a “rabbit signal” out of what was previously thought to be “hat noise”—but a “mere” fixed window/ multi-iteration block size is quickly ruled out and (apparently) “not gonna cut it” as starting iterates grow without bound, the trend moving right. this also relates to “insufficient mixing.” in short larger iterates require larger mixing as measured by iteration counts to achieve the same density decreases, and some more sophisticated induction beyond merely looking at multi-iteration blocks is required.

💡 further thinking just realized the slope-mean-returning property may have been discovered in at least one different context previously but its implication/ significance partly unrecognized at the time. have to go back and find them/ hunt them down; there were some experiments with siblings that showed their post-determined descents were somehow “tied together” in various ways, eg their descents only differed in bit widths under some bound, before the concept of the post-determined region was better understood. the post-determined idea was implicit in the earlier idea that two “top” siblings only differ in 1-2 msbs. and maybe this has a key role in some kind of an induction framework… a distinction to think about is that as observed two random walks could be “tied together” which enforces a kind of order, although their “tied path” could be still quite random, but if they are slope-mean-returning, that is an additional level of order, previously missed at the time. am now starting to see some ideas/ hints of some kind of “stochastic calculus” dynamics/ framework/ more general theory…

`construct17.rb`

(2/22) 💡 ❗ ⭐ its like “kid in a candy store” around here lately. on rethinking last idea now have some excellent ideas that look like they could really nail down the whole enchilada so to speak (its both fun and frowned on to mix colorful metaphors but honestly am very excited). but then this more immediate experiment occurred to me after some tinkering (aka shiny object lying by the side of the road), parts of it are one of those basic ideas that could have been investigated years ago. this is basically a study over the predetermined range of sizes 25..500 in increments of 25, with color coding by starting bit width of the same sizes, and again with the Terras parity density generator.

3 metrics are studied: ‘ba’, ‘ma’, ‘da’. the general idea is to look at over- or under- center ½ density as a binary indicator, almost like a “sign/ positive/ negative indicator.” this is also somewhat building on the amplifier idea. ‘ba’ is just the average count or density of over/ under, over the predetermined part of the trajectory, and thats expected to correlate with density; its also like a “density parity” (note! to be carefully distinguished and not to be confused with (sequence) parity density already mentioned lots previously and at the heart of the Terras density generation, got that?)

‘ma’ is a little more advanced and is the average # of matches between the lsb and the density parity. then ‘da’ is the average density. results are really unexpected in some ways. ‘ba’ and ‘da’ would be mostly predictable from prior results—eg both ‘da’ and ‘ba’ relate to `construct5` which used fixed bit width size instead, and ‘da’ is very similar to `construct5d` although that experiment was on varying iteration depth instead of bit width/ predetermined size!—but ‘ma’ shows a remarkable/ dramatic shift on left hand side based on bit width/ predetermined size. ‘ba’ fairly closely tracks ‘da’ as expected (ie in a sense highly correlated) but not exactly and there is some distinguishable distortion esp in the left/ lower side. ie clearly something about the relatively subtle distinction between density and density parity is driving some of the underlying dynamics, in statistical terms, some “bias” is identifiable/ measurable.

actually all 3 plots show a dependency on bit width/ predetermined size partly in spread but also in trend shift. another way to think about all this is that apparently even in a very noisy density range where density hovers inside the core/ near center eg high/ midright end of ‘da’ plot, the density parity can have a lot of “spread” signal in it, and highly (anti)-correlates with the trajectory parity sequence as with ‘ma’ at the left/ right extremes. some of these tendencies could probably have been figured out a long time ago with density concepts and the “artificial” glide generators—density parity is very fundamental in retrospect—but it took this new conceptual framework to leverage the understanding.
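
the three metrics can be sketched like so (illustrative ruby, not the `construct20` code; metric names match the text above but the function is hypothetical):

```ruby
# 'da' average iterate density, 'ba' average density parity (density
# over/ under the 1/2 center as a 0/1 indicator), 'ma' average match
# rate between the density parity and the iterate lsb (sequence parity)
def metrics(n, steps)
  ds, bs, ms = [], [], []
  steps.times do
    s = n.to_s(2)
    d = s.count('1').to_f / s.size
    b = d > 0.5 ? 1 : 0
    ds << d
    bs << b
    ms << (b == n % 2 ? 1 : 0)
    n = n.odd? ? (3 * n + 1) / 2 : n / 2
  end
  { 'da' => ds.sum / ds.size,
    'ba' => bs.sum.to_f / bs.size,
    'ma' => ms.sum.to_f / ms.size }
end
```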

`construct20.rb`

ah, that now dispatched, might as well write out my current big picture/ hypothesis/ conjecture/ outline. this aligns with prior results and is somewhat inspired by the density “squeeze” dynamics observed in recent and `bitwise24` experiments, also `construct5d` (clear on right side but somewhat obscured/ garbled on left side) and esp `construct15`, yet further in `construct16` and `construct17`… lol anyone sensing a theme here? its a bold statement subject to adjustments around the edges. as in line with the building momentum around here noted, its nearly a proof statement. it bridges all the concepts together, connects the dots.

the general idea is about what might be soundbited as “density narrowing.” density narrowing has long been seen/ noted/ poked at/ delved into in the ordered region, but prior to recent experiments, there was not an understanding of density narrowing in the disordered region which overall had a very mysterious quality for many months of hard/ painstaking/ meticulous analysis. it turns out, at least in the average/ integration statistics, there is apparently “extraordinary” density narrowing measurable also in the disordered region, and its not at all apparent from prior analysis that didnt really expect to find it there consistently as a deep universal property (although there were hints maybe with the previously contrived glides)…

• for all iterates the ending iterate(s) of the predetermined region will have a “compressed” near-center-density range.
• this also bounds the density range of the postdetermined region and is sufficient to guarantee the mean-slope-returning random walk—crucial property not mentioned until now, ½ parity density corresponds to the DOWNslope-mean-returning walk!
• the sequence parity (emanating from the lsb sampling operator) closely correlates with the density parity which in turn correlates with the iterate density.
• also, Terras trajectories with parity density smaller than ½ (actually less than the 0.64 parity density flatline transition point found above) can be generally disregarded as they dont lead to glides.

❓ ❗ 😳 😦 😥 😡 👿 😈 how was all this missed?

• part of the issue in retrospect was that density parity bias/ signal and its key connection/ correlation to sampling operator wasnt apparent visually in the density core in individual runs.
• another aspect is that, somewhat equivalently stated, very small bias in the density around the center is apparently equivalent to an amplified signal wrt sampling operator. ie what looks like noise provably contains signal.
• there seemed to be a lot of experiments that showed nearly arbitrary behavior wrt climb/ descents inside of the disordered region were possible eg the ‘w’ generation strategy investigated at length; should maybe go back and re-examine/ reinterpret that, was it misleading/ somewhat illusory?
• was “late to the party” wrt Terras insights
• the Terras density generation idea is only about ~2wk old with `construct5` and its deeper implications more clear with the ~1wk old `construct13` rosetta diagram and am getting tremendous mileage out of it. while obviously not fully general, it seems to finally crack the long-stated subproblem to some degree of generating/ looking at “representative” glides from a “random sample”
• was this mentioned before? the problem is @#%& world class difficult
• etc!

the Terras related povs/ angles were able to uncover it finally. now once identified/ isolated there are probably some other neat ways of visualizing/ magnifying/ highlighting these interrelated/ interconnected/ intertwined concepts…

(later) 😀 quick early confirmation/ gratification. so `construct16` shows density narrowing in the aggregate over pre vs post determined region averages, and `bitwise24` also seems to show a similar aggregate density spread tightening for increasing seeds in the early postdetermined region. how about something more dramatic? what about single iterates? that is a rough prediction of about the first ½ of the hypothesis. so a simple exercise is to modify `construct5` (already yielding “high mileage/ insight”) to evaluate the same metrics over the “slice” of iterates at the beginning of the postdetermined region instead of the predetermined region (the entropy evaluation fix mentioned earlier for `construct11` is merged backwards so to speak).

sure enough even for those individual iterates (averaged over 100 samples, and larger 400 bit/ region length which improves the effect) theres a very dramatic tightening of the density visible in the extremes (low/ high, left/ right). as expected, its reasonable/ supported at this point to assume this tightening is solidly in the disordered region for nearly all the glides except maybe a few at the extremes. its interesting that the densities of lsbs vs msbs `dlo, dhi` are different in the low/ left density region, and the max 0/1 runs `mx0, mx1` increase even as the average bit runs `a0, a1` are constant. this shows some shifting of the overall distribution in a way that doesnt affect averages and `construct9c` diagram points at similar “stretching” as `sd0, sd1` hint at. how to explain this? it would seem there would have to be some commensurate/ counterbalanced adjustment to sub-average runs as the high long-tail ones spread out. that could lead to some (more) interesting visualizations.
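
the run metrics referenced (`mx0, mx1, a0, a1`) can be sketched as follows (illustrative, not the `construct5e` code):

```ruby
# max and average 0/1 run lengths over an iterate's binary representation
def run_stats(n)
  zruns, oruns = n.to_s(2).scan(/0+|1+/).partition { |r| r.start_with?('0') }
  z = zruns.map(&:size)   # lengths of 0-runs
  o = oruns.map(&:size)   # lengths of 1-runs
  { 'mx0' => z.max || 0, 'mx1' => o.max || 0,
    'a0' => z.empty? ? 0.0 : z.sum.to_f / z.size,
    'a1' => o.empty? ? 0.0 : o.sum.to_f / o.size }
end
```

eg for binary `11100101` the runs are `111, 00, 1, 0, 1`, so `mx1` = 3 and `mx0` = 2.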

`construct5e.rb`

(2/24) have been using `bitwise` code for quite awhile, over a few years now going back to earliest instantiations. feel the latest structure honestly is a bit funky and was based on some “single graph lookahead” ideas from quite awhile back, and have been questioning it a little, it adds nodes as triplets ie node plus 2 adjacent neighbors. a question, long nagging at me, is a greedy search on the lookahead structure the same as just a greedy search on a nonlookahead structure? it seems so—since writing it having 2nd thoughts—but dont have a strict proof of that.

this is some new `bitwise` code similar to some recent code that changes it to modify the msb from ’11’ and ’10’ as 2 next nodes, which as noted is a basic induction pattern, and remove the triplet lookahead structure, and seems much more straightfwd/ streamlined, and think its nearly identical but do still wonder about how it might behave differently than the prior triplet lookahead structure. it also refactors the multioptimization ordering logic somewhat to be called as a subroutine `order`.

the basic optimization here is on ‘nw’ initial bit width blue and maximizing ‘d2a’ the density core distance of the 1st postdetermined iterate, green. density narrowing is already discovered in various aggregate statistics as stated but this wonders about density narrowing in single iterates/ trajectories, and finds it; ‘d2a’ cannot be increased even as pushed on. however, its not especially tight. the search ends around d2a≈0.05 similarly to `bitwise24`. another immediate question is if pushing on trajectory size/ length statistics as not done here would affect these results much. doubt it but its worth testing (at some point by someone aka “todo list”).
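
the ‘d2a’ metric as described—density core distance of the 1st postdetermined iterate, taking the predetermined length as roughly the starting bit width—might look like this (hypothetical sketch, not the `bitwise25` code):

```ruby
# distance of the first postdetermined iterate's density from the 1/2
# core center; predetermined length taken here as the seed bit width
def d2a(n)
  n.bit_length.times { n = n.odd? ? (3 * n + 1) / 2 : n / 2 }
  s = n.to_s(2)
  (s.count('1').to_f / s.size - 0.5).abs
end
```

density narrowing at the start of the postdetermined region corresponds to this value staying small even under optimization pressure pushing it upward.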

so all this is nearly exactly as expected and still in conformance with the general proof outline sketched out, but theres a remaining “fly in the ointment/ wrench in the works”—my other remaining issue is again with the recently identified ‘ufos’ or large 0/1 runs that can arise within near-center density regions and cause density spikes.

💡 ❓ ❗ an idea on pondering this was the observation that the emergent ufos apparently can only arise with 01 (pair) repetitions. but what if those are somehow absent? it could be an “explanation”/ property… hmmm… but this doesnt avoid that ufos were already found in the postdetermined region. so on the other hand maybe they are a red herring? maybe the postdetermined region conforms to a mean-slope-returning random walk even with occasional ufos which dont really affect its dynamics?

despite some of the prior characterization it is not necessarily possible to separate every iterate into “pre vs postdetermined” because “context matters” and the same iterate could show up as either. need to nail down this distinction further and its tricky. have already alluded to it with the idea that a predetermined part could contain parts that look postdetermined in the sense that walks close to the slope-mean-returning shape look postdetermined. but on the other hand the postdetermined shape is (or must be) more constrained.

call the downslope-mean-returning shape the drain (could call it the flush but decided against it lol).

so another way of looking at it, shifting slightly away from pre- to post- determined property/ concept, is that there should be a “drain vs pre/ nondrain” property of individual iterates and its very hard to isolate it. its basically the disordered region but the ufos mean the disordered region is roughly but apparently not strictly characterizable by any feature seen/ means known so far except “semicircularly” in terms of collatz iterations.

`bitwise25.rb`

(2/25) some thoughts. ufos, max 0/1 runs, density are all interrelated in a fairly simple way but have never stated this outright. so directly connecting the dots…

• a ufo is simply an extended sequence of spatially nearby 0/1 runs, triangular in shape, ie decreasing width.
• an iterate can have a max 0/1 run bounded based on density such that almost all the 1s or 0s are adjacent eg a 1-run of size d·w where w is the bit width, or a 0-run of size (1-d)·w.
• there were older experiments almost a year old that looked for longest trajectories while limiting max 1-runs and these are essentially the same idea/ feature.
• the large series of recent FSM experiments are also essentially oriented around the same feature. etc!
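
the density bound in the list above is easy to illustrate: a w-bit iterate with k = d·w ones has max 1-run at most k, attained exactly when all the ones are adjacent (toy example, names illustrative):

```ruby
# longest 1-run in an iterate's binary representation
def max_run1(n)
  n.to_s(2).scan(/1+/).map(&:size).max || 0
end

w = 16
k = 8                                  # density d = k/w = 1/2
packed = ((1 << k) - 1) << (w - k)     # 1111111100000000: one run of k
spread = ('10' * k).to_i(2)            # 1010101010101010: runs of 1
```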

the 3rd ‘nl’ graph of `hybrid14c` code from last month installment 1/2019 is on my mind and deserves further comment on a striking aspect. the search started out at bit width 100. looking at the graph, and it would pay to regraph this using a new format, there are longer 0/1 runs found in the range 110-140 and there are other 0/1 run “bumps” around that center but they are all shorter. there could be at least two explanations for this. (1) its an aspect of the search algorithm centering its “energy” on that range, and running it longer would cause this center to slowly move upward through larger iterates. (2) maybe not as probable, but a striking finding that would be very good for a proof: maybe there is something special about this range such that larger iterates dont have long 1-runs or maybe their relative size decreases etc (ofc all binary strings exist, but further context in the sense of “in a (long) climb, drain, etc” wrt collatz fn/ recursion are potentially more complex/ limiting). (or is all that already ruled out by other findings? not sure.) so more investigation/ focus on this phenomenon is warranted.

also dabbled a little with an idea of creating iterates with large 0/1-runs and “working backwards” with the inverse collatz mapping (not really a function because its not single-valued) and looking at what can be found/ is possible, either smaller or larger iterates. that code is turning out tricky to write because it may “get lost” wandering in a large space of smaller iterates trying to find larger iterates—or was it the other way around? need to review/ revise/ finetune/ polish the idea.

(2/28) 💡 ok this is an idea that have been thinking about a long time and never got around to it, and represents a different angle/ switching gears. its a sort of reverse-engineering technique that was not so simple to pull off. there is somewhat similar code in the past that looks at “pre-iterations” but this is more sophisticated. the idea is to find all possible smaller iterates that could lead to a given iterate and also wonder about 0/1 runs affecting this property. given the empirical observation from many months ago (“vertical vs horizontal dimensions”) that iterates never get larger than about ~1½ times the initial bit width (and this tightens for higher iterates), one can exhaustively dynamic search all possible smaller iterates for smaller iterate bit sizes.
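
a sketch of such an inverse search (illustrative; `predecessors`/ `pre_search` are made-up names, and the simple bit-width cap stands in for the ~1½× width bound mentioned):

```ruby
# inverse of the (3n+1)/2 -- n/2 map: every m has the even predecessor
# 2m, plus an odd predecessor (2m-1)/3 whenever that is an odd integer
def predecessors(m)
  ps = [2 * m]
  q = 2 * m - 1
  ps << q / 3 if q % 3 == 0 && (q / 3).odd?
  ps
end

# bounded exhaustive search over pre-iterates within max_bits bit width,
# returning [count of distinct pre-iterates, longest pre-iteration chain]
def pre_search(m, max_bits)
  seen = { m => true }
  total = 0
  best = 0
  stack = [[m, 0]]
  until stack.empty?
    n, d = stack.pop
    predecessors(n).each do |p|
      next if p.bit_length > max_bits || seen[p]
      seen[p] = true
      total += 1
      best = d + 1 if d + 1 > best
      stack << [p, d + 1]
    end
  end
  [total, best]
end
```

eg `predecessors(8)` gives `[16, 5]` since both 16 and 5 map forward to 8; the two returned counts correspond loosely to the ‘oks’/ ‘okmx’ (‘n’/ ‘c’) statistics described below.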

once the pre-iteration algorithm is in hand then looked for glaring statistical effects. did not find them immediately and then had to write a little more sophisticated analysis. this ran overnight for ⅔ of a day and generated 9379 samples. it is looking at a null hypothesis type situation for 2 different iterate types, both exactly 20 bits long. longer iterates did not complete the exhaustive search due to large search space. (1) the 1st generation strategy ‘r’ puts a “moderate size” run on top of a ½ density iterate reusing prior code to generate samples/ candidates. (2) the 2nd generation ‘d’ is just “straight” ½ density.

then it runs the preiteration finding routine. this routine looks for prior iterates and returns the total # of pre iterates found and the longest pre-sequence count; in the code these are called ‘oks’ and ‘okmx’ but are renamed in the graph as ‘n’ and ‘c’. the algorithm may also return ‘x’ for no iterates found after exhaustive search completed or ‘e’ for a run that exceeded 10K iterations and “timed out”. a quirk of the code is that a search ordering based on least iterate was necessary and apparently more efficient (ie earlier path terminations aka smaller search space) than sorting by longest preiteration count (commented).

there are sophisticated statistical techniques to do “null hypothesis analysis.” empirically here is another less strict method, basically running average and visual discrimination. as in the graph after about 5K samples the most varying lines ‘rc’ and ‘dc’ start to stabilize and separate by a roughly stable distance, others stabilizing much earlier. could graph the narrowing standard error eg between them eg as error bars to show this more definitively, it would be interesting to look at (aka “worthwhile exercise for reader”). the major finding of the experiment is that its indeed possible to see a subtle trend/ difference/ distinguishability between total # of pre-iterations in the run vs density samples, and the run samples tend to have a slightly longer max pre-iteration count (‘rc’ green vs ‘dc’ magenta). the rates of timeouts ‘re’ black, ‘de’ red are further distinguishable. ‘rn’ dotted red, ‘dn’ dotted blue are very close. ‘rx’ lightblue, ‘dx’ yellow are distinguishable also. ‘rx’ lightblue is very close to ‘rn’, ‘dn’ and have no idea how to explain that right now (maybe just a coincidence? yet seems unlikely).

the idea that 1-runs somehow “boost” the potential pre-iterations seems somewhat coherent with prior results but also cant totally visualize at moment how all this meshes together. how does this concept of maximum-length pre-iteration sequences relate to trajectories/ glides/ climbs/ descents? ❓

`vary10.rb`

## 5 thoughts on “collatz comb”

1. Alberto Ibañez

Hello. I would like to collaborate with you but I have been reading your paper and I do not understand, because my level is very basic. Despite this, I have been reading some paper about Collatz as entertainment. I think you are very good with graphics and I am trying to do one but I am not able. Could you help me?
On the y-axis (or x) we set all n. On the x-axis we set the numbers with 0 odd rules applied, the powers of 2; after, the numbers with 1 odd rule applied, then with 2 odd rules applied, and so on. I am very curious about the form of this graphic.

When we talk about steps, I prefer to differentiate between odds steps and even steps. So, we can distribute all numbers into a “group”:
– 3^0: the numbers that go to 1 without any odd rule applied, the powers of 2.
-3^1 : the numbers that go to 1 before 1 odd rule applied. These numbers are the numbers of the 3^0 list with the 2^x – 1 form that is divisible by 3, and their 2-multiples. I am not sure if this is a kind of congruence. They are the powers of 2 when the power is even. The odd numbers of this list connect to a number of the 3^0 list, the even connect to an odd number from the same list and could be able to connect to the 3^2 list.
-3^2: the numbers that go to 1 before 2 odd rules applied. These numbers are the even numbers from the 3^1 list with the form 3^1-1 and divisible by 3, and their 2-multiples.
And so on with the numbers with 3,4,5,…odd rules applied.
-3^1: 5,10,20,21,40,42,80,84,85,160,168,170,320,336,340,341,…
-3^2 : 3,6,12,13,24,26,48,52,53,96,104,106,113,213,226,227,…
-3^3 : 17,34,35,68,70,,75,,136,140,,150,272,280,296,…
-3^4 : 11,22,23,44,45,46,49,88,90,92,93,98,176,180,184,186,…
-3^5: 7,14,15,28,29,30,56,58,60,61,112,116,120,122,,224,232,240,244…
-3^6: 9,18,19,36,37,38,72,74,76,77,81,144,148,149,,152,154,162…

Thanks.

1. vznvzn Post author