it might seem sometimes like work on this problem is “endless”. am coming up on the ¾ 100 mark count of blogs on the subject. it seems sometimes endless not unlike the sense of an unhalting Turing computation, but one which neither cant be proven nonhalting *(lol hows that for a triple negative?)*. there have been ups and downs to say the least. or maybe its been more of a rollercoaster ride. in the sense that bipolar disorder switching between mania and depression is also a rollercoaster. ok, just being a little )( dramatic there. but one tends to get dramatic after years of work on a problem. in agile programming theres a concept of “features, stories, sprints, epics” of increasing complexity. work on this problem is exceeding an epic at this point. speaking of that and drama, there was also an old book on software engineering about *“death march projects”*. they are real, they do exist. maybe am feeling a bit demoralized/ disillusioned after working on one at *slaving* work for ~¾ of a year now. ofc such things arent *the* *corporate salt mine**announced/ designated* (thats part of the *groupthink/ collective denial/ blind spot* that goes on!) so it comes down to *“for all practical purposes…”*

overall feel the work has been worthwhile but the goal of a proof is always lurking and by that measure, theres something of a failure. *argh!* but then, thats a very high standard that no human alive out of 7B, even more nonliving, has ever achieved. yes, lets keep in mind that mathematicians who have worked on collatz in the past have died, not the least collatz himself. it would be dramatic to say they “died working on the problem” although it would not be entirely inaccurate either…! hard problems were once compared to a *femme fatale* in this blog… and the other metaphor is relevant, *the (mostly unpublished, but still very real) “graveyard of failed approaches”* by others + myself…

its a bit awkward or unseemly to call all this an “obsession” but not entirely inaccurate either.

in this realm, have to take solace and reassurance in some small )( victories.

*small victory:*CS blogging superstar**RJLipton**recently inquired about my work, and replied to him. (one of the rare few who successfully published a book out of his blogs. utterly shamelessly, another that immediately comes to mind is*belle de jour*.) didnt lead to any further dialog so far but its a delightful moment/ incident/ vignette.- another very noteworthy item:
**Terence Tao**one of the biggest legendary living mathematicians, the only one winning $3M to my knowledge, filed a paper on arxiv on the collatz problem. afaik its his only “published” work on it outside of a blog page. have posted comments to his collatz blog page over the years asking for attn, never got his or others direct attn, the comments have tended to get anonymously downvoted—*no hard feelings lol!* - am reading the Book of Satoshi edited by Phil Champagne, what a awesome joy/ pleasure. SN is one of the greatest living computer scientists if you ask me, one cannot overstate his
*once-in-a-century*accomplishments, and as far as code affecting reality in an almost*superpower/ superhero or even godlike*way instead of the typical case of*vice versa,*he is unmatched, it is an extraordinary gift/ honor merely to be a contemporary and living on the planet at the same time!*a new hero of our age for the pantheon.*he accomplished what almost nobody has ever done. he managed to write and start/ lead the open source project and gain followers, almost making it look easy. almost all other cases of this type of accomplishment pale in comparison. his work is quite similar to Linus Torvalds in many ways and yet in many ways*on a much vaster/ impactful scale (honestly am a bit envious of them both, having long attempted a vaguely similar venture and having achieved an infinitesmal*)(*fraction in comparison).*basically,*geeks meet economics.*which brings to mind that old zen question,*what happens when an irresistible force meets an immovable object?*had no idea how difficult such endeavors really are. and theres a unmistakable element of randomness as with all “viral” phenomena… alas, there are very few clues as to his personality or identity…

⭐ ⭐ ⭐

brief summary: had some )( fun last month with some more entropy-traversal ideas. do believe these are important for the general problem of trying to find proofs from computer searches, a theme that is starting to emerge maybe somewhat more clearly. further thoughts on that. there are really 2 different key ways to look at the entropy dynamics in those experiments.

- average entropy: the traversal frontier has a set of points with some average entropy over all the points. it appears possible to keep this as a relatively stable distribution even as the traversal frontier expands, and that seems significant somehow.
- this was not brought out so directly in those last experiments, but 2nd thoughts + further tinkering help me identify it. the total entropy seems to be the sum of the entropy over all the traversal frontier points. and this is not so obvious from recent experiments (the reported parts), but the traversal frontiers almost always increase exponentially. did an experiment trying to greedy reduce/ minimize the traversal frontier, and it did have a major effect of maybe decreasing its size by say roughly a factor of 2, but it was still exponential. and then the total entropy over the traversal frontier must necessarily be expanding exponentially also.

the 2nd point seems to be relevant to understanding whats going on. one can “play (‘sophisticated’) games” with average entropy over all the points, but the overall complexity of the traversal measured in “total entropy” seems to be increasing nearly exponentially. now, ofc theres an idea of trying to find increasing order in the traversal order and that was pointed out/ emphasized, but the traversal orders found were relatively disorderly.

💡 idea on rereading that: need to somehow take into account the *(overall)* order of the traversal *along/ combined/ balanced with* the entropy dynamics of its points. or rather, the “entropy” of the traversal tree needs to be part of the optimization… do recall there were several-year old experiments that looked at/ optimized based on traversal orders without looking at entropy of the points…

on the other hand, these ideas seem to be generalized as far as trying to turn arbitrary problem optimizations that reveal feature/ property trends into overall proof structures. there are *many, many* optimizations over the many dozens of blogs on this subject, and they almost all seem to reveal trends, but those trends seem “purely” empirical and the quest of transforming them into (“more”) *universal properties* is (extremely?) elusive, out of reach so far, yet seems to be at the heart of the problem… ❓

**(10/6)** 😮 😥 😡 😳 gobsmacked/ reeling/ disoriented/ dizzy from some personal situation aka “once in lifetime midlife crisis” “served up” early this sunday morning, but long time coming, speaking very euphemistically a “deep personal-social paradigm shift,” but also feeling, borrowing a phrase apparently from the relevant foreign culture, *its like talking to the wall…* nevertheless am gonna try to write up some results developed last nite. `bitwise49`

from last month had some sophisticated ideas and looks at ‘mx01s’ over the global max peak index relating to the glide length ratio ‘h’. lately it seems like ‘mx01s’ over the postdetermined glide “remainder,” mostly descending, is crucial in “causing” the descent. this code names that latter measurement ‘mx01s’ and tries to maximize it along with the postdetermined glide remainder length (expressed as ratio over initial bit width), ‘cr’, along with the other standard trajectory length variables, along with a new combined variable ‘mxsr’ which is the product of ‘mx01s’ and ‘cr’. however, somewhat similarly as prior experiment, ‘mxsr’ in green seems to be asymptotically declining/ bounded by a constant for sufficiently large trajectories. the basic idea of ‘mxsr’ is that either large postdetermined glide remainder length (ratios) or large ‘mx01s’ 0/1 runs in the region will enable it to increase but “pushing on” both at the same time runs into a limit. the graph is very crowded/ overplotted/ obscured and needs some attn on that. but ‘mx01s’ ends right at the range ~0.1-0.2 and ‘cr’ more noisily in range ~0.1-0.4.

the 2nd graph is a detail on the same data ‘mx01s’ vs ‘cr’ color coded by initial bit width ‘nw’. its not a simple story somewhat because lower iterates behave somewhat differently than higher ones, but basically for higher (hotter) iterates on left side, theres a very steep “knife edge” cutoff where, moving leftward, decreasing ‘mx01s’ (*x* axis) seems to “cause” ‘cr’ (*y* axis) to decline steeply also. in other words ‘mx01s’ seems to measure mixing and decrease for higher mixing, and the mixing seems to cause decline, and higher degrees of mixing seem to cause steeper decline. not examined further or in more detail here is how mixing would or may tend to change over the individual decline (that is, after the predetermined range, because its long known that there can be large 0/1 runs in the predetermined range that disappear by the postdetermined point/ range). but that question reminds me of the breakthru/ almost eyepopping experiment `construct40`

from 2 months ago which did seem to directly explore individual decline slope causation, 8/2019, and which was able to strongly link/ correlate descent slope to the 3-power sequence.

**(10/10)** ❗ 💡 😮 going thru some crazy freaking @#%& personal stuff and dont have much time or sleep lately but cant stop working some on my favorite problem in the world of course. had some great ideas and havent had a chance to code them up. this is a quick result that is found without much analysis, and is *remarkable* and not all that much suggested by prior work except maybe with some very subtle observation/ analysis.

`construct40`

from 8/2019 has been on my mind some along with the cascade of related experiments on the “3-power sequence”. another way of looking at it is that initial iterates seem to control all later iterates esp during/ inside the drain. ofc this is already obvious but another idea is that even given that, somehow there is (or seems to be, hinted in some experiments/ analyses) roughly only a 1-variable (dimensional) range/ continuum of possible drain angles. how could this be further explored? my idea is there are different emergent properties of drains for individual trajectories, ie something persistent about their behavior. there are already all kinds of features to look at wrt this. and what exactly would it mean? a simple concept is that one part of the drain is “like” another part wrt the same features. the word “like” is typically translated in data science as “correlated”. this ties in heavily with “emergent properties” (of the drain). so heres a 1st cut on this very simple concept at heart that already yields striking results. didnt know what to expect and found something!

this code does a scatterplot revealing hidden properties. it looks at 50 * 4^n points for n=[1, 2, 3] ie [200, 800, 3200] points. ½ density iterates are generated ie likely in the drain or close to it (ie quickly iterating into it). the points are plotted (*x*, *y*) where *x* is the slope of the 1st half of the trajectory and *y* is the slope of the 2nd half. what is expected? anything? the 1st plot reveals no signal. but here is a story where, as in some cases, the point count turns out to be everything. increasing the point counts reveals a hidden trend. *the points cluster in lines!* (may have to click on detail image to see this.) but then was wondering about taking noncontinuous bitwidth slope measurements vs the continuous logarithm, thinking it could influence results, graphed red vs blue. it does, the linear trends seem to change alignment, but they dont disappear. *hmmm* ❓

**(10/23)** some further thinking about it. the collatz sequence can be seen as decreasing by log(2) or increasing by log(3/2). many diagrams show this pattern. am wondering if the previous clustering is related to that somehow. maybe there is more clustering for smaller trajectories due to the discrete increment nature of the sequence and the clustering tends to get “washed out” for longer trajectories…?

**(10/25)** ❗ 😳 looking over `construct40`

code again carefully, find a huge glitch/ oversight. *its too good to be true.* the code was trying to measure “density parity” ‘d3p’ of the 3-power sequence. but it was actually just a *count* of the density parity sequence, which is *longer* for longer trajectories. oops! this explains some of the difficulty of the reproducibility of the idea. now the 3-power sequence ideas need a huge reevaluation at this point.

this code 2nd looks at it all again a little more carefully using a similar framework. it uncovers a *visually striking trend,* but on closer examination just a basic property of statistics. `construct40`

did not look at parity density of the trajectories ‘d’ green and unsurprisingly it follows the trajectory lengths ‘ls’ red right scale. but then am trying to understand how parity density plays a role in the trajectory lengths. simply looked at parity densities over the left and right halves of the trajectories, `d1, d2`

blue, magenta. they are *anticorrelated around the mean* ‘d’. this *at first glance* seems to be a very big deal and appears to be a *long-term memory effect *but really its just a simple property of sequence statistics. the other metrics ‘da’, ‘d3’, ‘d3p’ are average iterate density, average density of the 3-power sequence, and density of the 3-power parity sequence, all coming up mostly flat, turned off in this graph. the `construct40`

results showed ‘da’ has some distinct correlated trend with trajectory length but only (“emerging”) over very many averages, 100 samples in that case. the big mystery is why some trajectories have different parity densities than others and have been looking at various features trying to explain this but am so far coming up emptyhanded. ❓

**(10/26)** this code looks into a new backtracking idea. the basic inspiration is the ongoing hypothesis/ theme that long 0/1 runs dont show up in “glide right”. this creates long 1-run seeds and tries to work backwards to find a glide that contains the seed, by minimizing the initial iterate, and is somewhat similar to some earlier ideas. it works starting on a bunch of random seeds (10) in parallel and runs for 1K iterations. then was wondering about the effects/ trend over different size 1-runs. it plots logarithm of iterate width found during search. the cooler colors are for shorter 1-runs and the hotter colors are for longer 1-runs. it looks for 1-runs over the whole range of 500 bits in increments of 10 (10 to 490 1-run lengths). the results are somewhat noisy and variable, but some results come out looking like this image, which tends to suggest that its somewhat easier to find lower starting iterates for seeds with shorter 1-runs, with the cooler colors pushing down at the bottom of the diagram, ie generally supporting the hypothesis. the neat technique/ trick of only plotting min, max values over a recycling buffer is reutilized.

*2nd look.* not revealed obviously in these logarithmic statistics, the algorithm is finding “slightly smaller” seeds of only a few bits shorter. am thinking that the algorithm is almost a little “too greedy”. it is constantly greedily trying to find smaller seeds and then misses smaller seeds that cant be found with a highly greedy approach ie maybe smaller seeds exist by relaxing the greediness some in the middle of the search.

**(10/27)** 💡 ❗ have been “coding up a storm” building many variants of ideas similar to `construct57`

and attacking from very many angles. tried almost every trick in the book so to speak. *almost* everything has turned up empty. the general question is, does anything (local) in the (drain) iterates tend to influence the total trajectory length? its the very strong curse of undifferentiability again playing out. one idea was to look at the sophisticated/ powerful histogram analysis last seen in `construct9c`

in 2/2019. this came up with noise, a null result. the technique was useful for Terras-generated glides but seems to find nothing in the drain type iterates ie ½ density. looked at other histogram ideas and couldnt turn up much. looked at the standard statistics ie run lengths averages/ standard deviations/ maximums, nothing, its all apparently “flat” across different size trajectories. ideas about density parity, parity density, 3-power sequence, *etc.,* many experiments, many null results.

then *finally* somewhat out of desperation came up with this relatively straightfwd idea which maybe doesnt relate exactly to trajectory lengths but is along similar lines. there is an observation and/ or assumption over many experiments that ½ density iterates are the same as/ indistinguishable from the drain. there is very strong support for this, but suppose it wasnt *exactly* the case? how could this be determined or isolated? what would be an experiment that could extricate any subtle difference here?

this code creates 500 200-width ½-density iterates and then finds the first 100 width iterate in their trajectories. this “near midpoint” iterate of the trajectory is maybe “more likely” to be “drain” than the starting ½-density iterate. can it be statistically distinguished in any way? this code tries to find a difference between the midpoint iterates and 500 ½-density 100-width iterates generated at random. *it does find a subtle repeatable difference. the standard deviation on the 1- and 0-runs are different. the ½-density iterates consistently have slightly less.*

the output is in pairs, the 1st for the midpoint iterates and the 2nd for the randomly generated ones. eg for 1-runs ‘a1’ the standard deviations are (rounded) 0.26 vs 0.19, and for 0-runs ‘a0’ its 0.27 vs 0.19. in short the randomized ½-density iterates have a measurably tighter distribution around the mean. a similar trend (tighter distributions) may exist for the max runs ‘mx1’ and ‘mx0’ although not as much difference. other statistics do not seem to be much distinguishable. eg average 0/1 run lengths among the 2 distributions are very close etc.; and something else to remark on here wrt technique: there is no graphing at all, its “back to basics” of just pure basic statistics. one might say in line with all this that while otherwise often powerful, sometimes graphs are just a distraction from numerical analysis/ *“merely/ purely numbers”*… however that said on other hand am already thinking of new ways to try to visualize this…!

"a1" {"a"=>1.9955529866762827, "sd"=>0.2646066067814119, "mx"=>3.2} {"a"=>1.9676823923380227, "sd"=>0.1912118771198235, "mx"=>2.9411764705882355} "sd1" {"a"=>1.2524662868261798, "sd"=>0.3902061098243887, "mx"=>3.2310848483214047} {"a"=>1.2368450734243273, "sd"=>0.32241384022292036, "mx"=>2.581744841618004} "mx1" {"a"=>6.02, "sd"=>1.8405433980213577, "mx"=>16} {"a"=>5.902, "sd"=>1.5660127713400043, "mx"=>13} "a0" {"a"=>2.019627252841446, "sd"=>0.27584362315643784, "mx"=>3.2222222222222223} {"a"=>2.0054265127185165, "sd"=>0.19980328081897816, "mx"=>3.125} "sd0" {"a"=>1.284487098791806, "sd"=>0.41230199771326675, "mx"=>3.0608201577040073} {"a"=>1.2947599235142957, "sd"=>0.36128297064957804, "mx"=>3.2800918843963967} "mx0" {"a"=>6.16, "sd"=>1.9376274151652593, "mx"=>16} {"a"=>6.198, "sd"=>1.6801178530091265, "mx"=>16}

**(10/28)** after quite a bit of hacking and finetuning to further isolate the feature and coming up emptyhanded (null result) on some other angles, finally this somewhat quick-and-dirty-looking code applies the `convert`

subroutine with `scale`

logic turned off (overall something like a 1-run histogram viewer) to look at the different distributions of the (concatenated) 1-runs, and changes the bit widths to 100 and 50 instead of 200 and 100 to amplify the effect some. it appears the effect tends to dissipate as bit widths increase and also “wash out” some with `scale`

logic turned on. its a very subtle bias but consistently detectable/ repeatable/ replicable. red is the iterated distribution and green is the uniform ½ density distribution. turning on the scaling also seemed to decrease the differentiation. in a sense the uniform distribution is slightly more “chopped up” into a larger number of separate runs. which seems like it should show up in some kind of entropy statistics but havent figured out how to isolate it from that angle yet.

**(10/30)** am thinking maybe the way to go is explore backtracking patterns. havent looked into some of this angle much so far. here is an early idea. this just calculates 24 levels of all preiterations of a seed using the same `backtrack18`

long 1-run generator and leads to around a few hundred pre-iterates when it doesnt “dead end,” avoided with restart logic. then it just outputs those found iterates using the grid logic, 1st sorted by generation order, and then sorted by iterates. the patterns are striking. in the 2nd grid it shows theres a combined lsb and msb pattern, with groups of msb patterns emerging, along with regular stripe structures. this is not all that obvious or expected. maybe the idea here is that lsb patterns must necessarily lead to larger preiterates and the msb patterns dont need to be considered. remarkably there also seems to be a single continuous band through all the iterates. have to think thru/ explore this more…

Alberto IbañezHi. I would like to propose a relationship of congruencies related to the periodic orbits in Collatz so that you could tell me if it is correct and if it has a solution, does the latex work in your blog?

As for infinite orbits, it is necessary to consider whether they really exist. It would be an interesting debate.

As for the oscillatory sequence of Collatz, it is because it is compared with the set of numbers ordered from least to greatest, which is not an absolute order, it is an ad hoc order. Like a sequence of Collatz, which is perfect order, in the sense that the function dictates. Why compare them? They are two perfectly ordered sets, each in its own way, even with periodic orbits.

Pingback: collatz going in circles | Turing Machine