collatz histogram discrepancy

based on 3mo 90d span, summer ⅔ over, it always goes in a blink. its like a colorful butterfly, bright, energetic, shortlived. have been telling people its like the Lost Summer this year. but as the infections continue to stretch out (two bigger states TX, FL esp feeling the pain), its almost turning into the Lost Year. the word unrelenting comes to mind as well as this old evocative analogy surveyed by RJL, 2009. but, as the japanese say, fortune and misfortune are like a rope twisted, or, the reverse side has a reverse side, so is that a con or a pro? coincidentally “unrelenting” is the kind of spirit/ discipline reqd to solve problems with world-class difficulty…

this is a quick take on the histogram discrepancy idea and already have new ideas, but its worth recording. my initial idea was to try to find expected values/ curves for histogram discrepancy of ½ density iterates. however, that is not a straightfwd undertaking, its maybe even a little daunting. theres not really a straight line, theres some curve, and curve fitting while very commonplace is still timeconsuming, and the other trickiness is that the curve would vary over iterate sizes. `construct146` from last month show this a clear/ nonnoisy/ but shifting/ nonlinear/ nontrivial trend.

so then my next idea was to just calculate the expected values. the immediate idea is to do a (lazy loaded) table lookup to get them. my other initial idea/ instinct was to use the stepwise code to look at minimizing the histogram discrepancy while trying to push up other trajectory size metrics. this proved to be both interesting but also maybe incomplete. the expectation is that minimizing the histogram discrepancy would tend to attenuate trajectory dimensions. it turned out to lead to substantially different dynamics based on which set of variables were optimized. ‘hd’ magenta is minimized in all cases.

the most striking result is #5 where minimizing histogram discrepancy but pushing up ‘nw’ bit width led to an alternating dynamic where both spike or flatline/ plateau in conjunction with each other and trajectory measures ‘cg’ green, ‘cm’ blue are very low. this shows a clear/ connected relationship. rarely see optimization trends look like this. it also reminds me of the adversarial algorithm idea thats been alluded to here many times now, where in a sense one algorithm tries to control another, or both try to control each other.

the other optimizations tended to push down trajectory measures with clear plateauing trend in graphs #1-#3. another standout run is #4 where ‘cm’ started to flatline and then oscillate between low and high values but all below bit size, which also seems to indicate nearly all the points in the search buffer converging wrt ‘cm’ at the end somehow. #6, #7 are more ambiguous in that the histogram difference didnt seem to push down trajectory size trends much, leading to further thinking/ ideas.

in other words, maybe mixed but still mostly preliminary results here. maybe had some idea/ expectation that ‘cm’ in particular would be (more consistently) pushed lower than ‘nw’, that would have been best case scenario, but stuff is rarely simple around here; on the plus side it did happen in 3/7 runs, #4, #5, #7. also surprised the dynamics work out so differently depending on the optimization parameters, maybe just my imagination but it seems more sensitive/ diverse than typical. as remarked previously each setup is like an energy system with different interacting elements/ facets/ components/ dynamics/ forces/ physics (again recalling that analogy put up awhile back about fusion, or again Bohms fishbowl panes). it is more than a use of a word/ mere metaphor to say optimization(s) push on different parameters. and many physics aspects have been found wrt the problem by myself + others.

the other consideration is that the optimizer doesnt have strong restrictions on how to do multivariate optimization and does a “best effort” balancing the constraints based on the (rank/ index) normalization logic, its a gradient descent algorithm/ “greedy” method that can be compared to “pursuing the low hanging fruit.” have had some ideas about adding more awareness/ detection/ analysis in the optimizer about how the different variables interact/ “trade off” against each other eg SVD (Singular Value Decomposition) analysis of the optimization vectors etc. but dont have clear direction on that yet.

did do some further analysis not pictured here. eg looked at the bit grids for some of the later iterates. it is possible the algorithm was finding a seed with low initial discrepancy and then losing that property later in the trajectory. if its happening, its not clearly visible yet. however, did notice some slight visual difference between trajectory bit grids and randomly generated ½ density control sequences of same size. definitely need to look at histogram discrepancy over the trajectory as a basic analysis. also looked at the “wing diagram” control comparison invented last month in `construct146`, and while there is some difference, nothing stands out strongly.

`stepwise9.rb`

⭐ ⭐ ⭐

💡 ❓ thinking its time to introduce some new ideas matching latest intuitions and helping with grasping/ visualizing. collatz problem seems to require constant hypothesis finetuning. it makes sense to define a key concept that has been alluded to many times, esp in the “postdetermined glide” focus/ now near fixation. call a glide ending in the predetermined range shortfalling, and conversely, call a glide ending in the postdetermined range overarching. as long noted it is sufficient to resolve the conjecture for the overarching glides only. now, restating the histogram discrepancy conjecture/ hypothesis outlined in prior blog, the idea is that all the iterates in the overarching postdetermined glide have nearly ½ density histograms, ie low histogram discrepancy.

the above optimizations are not looking at the distinction of shortfalling vs overarching, and the histogram discrepancy measurement ideally would work in both regions, but it only is necessary to work in the postdetermined (glide!) region, and so maybe some of the ambiguity/ mixed results above is not a problem. (reminder) the ufos existence show that spikes in histogram discrepancy can be found in the postdetermined postglide/ drain. also, some of the epic struggle so far has been about wrapping my brain around the quite different dynamics of shortfalling vs overarching trajectories. thats understated, its much more than a difference, it seems to tie into the crux or even connect with the “spine” of the problem, thinking esp of the rosetta diagram. alas, it took years to take Terras seriously, a massive blindspot leading to maybe much wandering “off in the weeds”… 😦

but if none of the prior optimizations are the crucial ones, then what is? this all also reveals the general difficulty/ limitation of trying to convert proof ideas/ directions into optimization algorithms. on the other hand it also seems about trying to cast the question into an optimization idea. somewhat like the highly tested skill (eg in SATs, scholastic aptitude test) of converting word problems into algebra equations to solve, but far more advanced… even after years of practice, still learning… 😮 👿 😈

heres what immediately came to mind: generate a distributed range of ‘cm’ glides over a fixed bit width. then look at the shortfalling vs overarching transition point, and look at histogram discrepancy for the overarching glides only. the Terras density ideas give an easy way to construct overarching glides, and can compare those found with reverse engineering approach to see if they are representative or not. another consideration is that the histogram discrepancy measurement/ estimate is inherently/ necessarily somewhat noisy, and one can look at the histogram discrepancy between two ½ density iterates to estimate its noisiness/ deviation. just like other features also have to look at its “relative size.”

further thinking (out loud). some of the difficulty is that the histogram discrepancy measure like most other measures maybe scales with iterate sizes; have to look at this more directly, it seems to be implied in some of the graphs eg #5. somehow have to factor that into consideration. the scaling of the histogram discrepancy wrt other measures such as iterate bit size and trajectory dimensions is intertwined, not extricated, and part of the interpretation difficulty of the prior code/ outcome and seems crucial to understand.

probably/ definitely need to look at average histogram discrepancy vs bit size trend. ah, a crucial oversight there. a theme is that the metrics have to be scaled carefully just as the problem is scaled. seem to have a tendency to invent features that start out as unscaled ideas/ concepts and then discover their scaling properties after further investigation. in other words, a theme is to constantly ask the question, after identifying some feature, but how does this scale? the big example of that are 0/1 runs, properties of which were chased for a long time, with a slow realization that many were “merely” related to scaling properties of ½ density iterates. the 0/1 runs are pivotal in various ways but their “natural” scaling was unnoticed/ misunderstood for a long time. on other hand, turning the volume up to 11 so to speak, histogram discrepancy is about a measure calculated over all 0/1 runs.

(8/5) however one powerful tool in the toolbox here instead of trying to carefully quantify trends is the lazy calculation/ caching technique which works great above, a highlight how easy it worked out. its almost an “on-the-fly” control calculation. the earlier work tended to miss the powerful idea of control comparisons/ distributions. again a key standout/ recurring concept is careful normalization.

another idea: there was an old metric studied of ratio of glide length to iterate size, aka “horizontal scale,” even before Terras concept was rediscovered; that seems to be crucial here, although maybe total trajectory length was 1st considered. the basic idea seems to be that for overarching trajectories, the (scaled) histogram discrepancy controls the horizontal scale in the postdetermined glide. now, can that be phrased in terms of an optimization search?

other idea: on other hand, despite all that distinction thinking now laid out, histogram discrepancy in another sense is independent of the pre vs postdetermined range. maybe its about trying to show that histogram discrepancy of x tightness implies drainlike trajectory for y=f(x) span of iterates, that is roughly how it was imagined/ phrased last month. again, trying to figure out how to phrase this as an optimization search, at least with less ambiguous results than the prior one.

ok, heres an immediate look at histogram discrepancy scaling over random ½ density iterates of given bit width using histogram cache subroutine. there are 2 immediate findings. it definitely scales, and its very noisy. ‘hd’ red scales somewhat steeper than max 0/1 runs, ‘mx01’ green in the graph (right side scale). my immediate idea was about decreasing noise through averaging. but how? the histogram cache stores histograms. it would seem one needs to calculate an “average histogram.”

that code was added, and it decreases the noise some in the histogram discrepancy measure, ‘hda’ blue, which is then computed via “fractional histograms” ie histograms with bin counts in fractions. but then realized, there is an inherent noise that cant be averaged out, because an iterate compared to the histogram wont have a fractional histogram, it will retain some amount of chunkiness, which turns out to be sizeable. so it turns out smoothing out the cache histogram doesnt decrease the noise much of comparison with single iterates. as already anticipated, this immediately leads to the idea of computing histogram discrepancies over multiple (adjacent) iterates. it appears one can get a rough fast estimate of histogram discrepancy, or a slower/ smooth/ more accurate one, but not both. a fundamental tradeoff inherent to practitioners of numerical computation… almost inherent to complexity theory itself…

`construct153.rb`

(8/6) while pondering that my immediate idea/ instinct is to do an adaptive calculation of the histogram discrepancy. it can do repeated samples and take an average and decrease the deviation until its under some value. however, expecting that to be expensive and dont think it would be so acceptable inside/ as a subroutine of the optimization code. putting all those previous ideas together, then tried out revised optimization code.

• there are 2 new trajectory relative size metrics `cmr = cm / nw, cmcg = cm / cg` although on the final optimization the latter was not used and only the former was used.
• the code also looks at relative histogram discrepancy ie the discrepancy divided by the expected discrepancy for a random ½ density seed of the same (bit) size. ie expected histogram discrepancy is between two random ½ density seeds of the same bit size.
• then there is an idea to improve the histogram discrepancy estimate. optimizing on a noisy value tends to lead the algorithm to find “inaccurate outliers” and there was some indication this was happening: it was finding estimates consistently less than unity, which is not really expected; the expected value is near unity. so something needed to be done. my initial idea was to just have a fixed number of iterations (sample count) and average. but, however, then saw bit sizes increase substantially. then had a little better/ revised idea. this code has a sample count of ‘ch’ initial iterates based on the bit width of the iterate, in this case its `ch = nw / 10` rounded (up) to integer. hence as bit sizes go up the (initial) sample count goes up linearly.

combining it all, then optimizing on various statistics (actually not nec in that order, ie behind the scenes actually working via interplay between the two as usual) led to interesting results. at first the algorithm was indeed “painting into a corner” and not finding larger iterates and/or glides such that the histogram discrepancy push-down seemed to be working.

this intermediate work/ results led to some better idea what the optimization algorithm can do wrt formulating/ testing/ proof ideas. while some of this is already known/ sketched out, it helps to make everything as explicit as possible, esp after just wondering about this in general (yesterday!). the general idea is to “push on” iterate sizes and overall trajectory dimensions but find elements/ features that restrict glide dimensions. the collatz dynamics may or may not “push back”. if it “pushes back” strongly the optimization algorithm “paints into a corner” and cannot find seeds that simultaneously satisfy the optimization criteria and maybe some feature is discovered that controls glide size. as long outlined, the expected induction framework would tend to be on glide size, with various qualifications, such as postdetermined range only etc.

so overall its about trying to find the internal dynamics/ (esp) constraints/ boundaries/ limits on the energy system so to speak. also notice the similarity to the adversarial algorithm idea, which is in a sense two algorithms pushing on/ against/ aka “fighting” each other. in this case it is reduced to two parameters pushing against each other and not requiring the (sophisticated) ML to find the algorithms. the algorithms are built up painstakingly by hand. the 2 classes of algorithms are basically (1) feature(s) that control dimensions, and (2) dimension measurements.

but, then/ now continuing the story, adding more trajectory dimension optimization parameters and tightening the histogram discrepancy accuracy eventually led to an apparent key finding.

the finding is that (after substantial/ finetuned effort/ coding) the general hypothesis/ conjecture seems to be falsified, ie disproven. the algorithm finds very long trajectories with very low histogram discrepancy over the entire trajectory, even with very stable to almost-gradually-declining normalized/ resampled histogram discrepancy ‘hdr’ hovering a little over unity, yellow line. this code ran a few hours at 10k total iterations and here is the longest glide max found, with max index `cm=425` and glide length `cg=616`, viewed as a bit diagram. as verified from the optimization graph, and visually from the bit diagram, the discrepancy from random ½ density #s is low. note the algorithm sampled/ averaged the initial 616/10 → 62 points for the histogram density calculation. some more work can be done to further/ better validate this, such as trying to come up with very accurate histogram discrepancy measurements, but much of that has already been done, and it looks very solid right now.

in other words, in short/ bottom line, the (isolated) iterates look drainlike but dont drain.

😳 😦 👿 😥 (gallows imagery, so again, the undifferentiated region is where ideas go to die, its like an increasingly large graveyard of bodies…)

❓ actually, thinking it over and looking at the bit diagram, are these really new results? iirc, (need to look this up but thatd be timeconsuming), some old investigations looked into trying to minimize max 0/1 runs while still creating glides and found again the glides iterates looked undifferentiated. are those much different than these? the limiting metric is a little less complex/ easier to calculate than the entire histogram discrepancy, but are results similar?

❓ 💡 there is only ~1-1½ “asterisk” or possible loophole on this apparently very tight finding right now. ok, after extended thinking about it, maybe some leads (my brain is now trained to think this way, out of the box…)

• all the glides sampled have steeply increasing early trends in the glides.
• actually,  one can also see some (corresponding) slightly higher density in the initial iterates/ lower left corner.
• and maybe despite the resampling idea, the histogram discrepancy can be seen as rather noisy. also, again relative value is key.
• would increasing its accuracy and decreasing its amplitude further limit the glides?
• could “measurement noise” actually be hiding a monotonic increase?
• is the optimizer adroitly but not overtly/ imperceptibly “exploiting” some “loophole” in the histogram discrepancy measurement?
• and can do more careful analysis/ further postprocessing of this captured data looking for any subtle bias.
• another thing to look for is any repetition or similarity in the found trajectories. is there some repeated theme? they seem eerily similar in some ways (partly to be expected) but maybe some of that is just staring at them too long and getting a little dizzy, lol. but note its also an expected result/ outcome of having multiple optimization parameters that “shape” them similarly.
• the big question, also, is there another way to format/sort this data that tries to show that ‘hdr’ somehow limits glides?
• some of the challenge of all this comes down to correlation vs causation. graphs typically show correlations but how does one format the data to try to show a causation? in a sense a lot of the challenge is attempting to extract causation out of the correlations...
• another idea in my mind on looking at this below, wonder if the corresponding 3-power sequence has some discernable signal (eg non ½ density seen earlier) etc.
• looking very closely in the bit diagram, it does appear to be a noticeable visual difference of increasing discrepancy after the 1st 61 iterates after the bottom.
• given all this challenge again getting the urge to look at the optimization metric distributions esp “mid-flight” to try to analyze/ identify/ elicit/ extract the underlying forces involved.

💡 ❓ this gives me another idea that has been sketched out/ mentioned previously but never followed up on. one can construct what might be called head fake trajectories that look like initial drains but then “double back/ rebound” into a glide, using Terras construction. —oh wait, lol, quick 2nd thought, that makes no sense; they have deep global max indexes but thats not a glide, which only has to do with the “local” (initial left-to-right) max index. but actually something like those types of trajectories were also created explicitly with the prior experiments that find drain-like presequences via “backtracking” in front of long lsb 1-triangles/ runs, except even after a lot of targeting/ analysis they never turned into initial glides prior to the lsb 1-runs. (exercise for reader, find that blog post.) do those initial seeds then have any discernable features different than ½ density? related, have been thinking a lot about the “twin string/ foreshadowing” idea seen previously, which was applicable to long 1-runs anywhere in the iterate.

`stepwise10.rb`

(later) 💡 ❗ 😮 🙄 holy cow that idea of refmtting the optimization plot data from some different angles/ povs really turned out to be a good idea and lead to striking findings— like crossections or bohms fishbowl panes. it makes sense to look at scatterplots of any of the variables vs each other but esp something to look at is the (supposed) “control variable” in this case ‘hdr’. scatter plotting ‘hdr’ vs the other variables and color coding by optimization iteration, an idea probably not tried before, leads to a lot of signal/ patterns that takes awhile to interpret. some of these could be guessed from the prior diagram, others not, exercise to reader to describe each/ why… next, interpretation, and the key question is whether any of these lead toward better insight into the potential control/ limiting dynamics…?

in order the graphs are `hdr vs cm, hdr vs cg, hdr vs nw, hdr vs cmcg, hdr vs cmr`. the key theme here is wedges! and in #3, an unusual pattern— (jumping the gun and offering immediate absentee/ AWOL reader assistance here, lol!)— lost in the prior graph, somewhat like a unicorn (horn)… a spiral, or screw! the wedge shaped patterns seem to imply a kind of squeezing. overall, significant emergent behavior in the optimization suggesting deeper/ hidden dynamics. “what is going on here?”

it could take quite awhile to untangle all this, have now been staring at this for hours and still scratching my head/ thinking. one implication from graphs #1, #2, #3, it appears that the optimization may be terminating early from these povs, and what happens if it continued? it looks like the upward/ progressing/ narrowing wedge shape is leading to painting into a corner or maybe a strange attractor. on the other hand, thinking in terms of fractals, maybe its an infinite wedge that endlessly keeps getting smaller…? also, is it a direct refutation or contradiction of the whole idea that the highest/ longest glides have histogram discrepancy of nearly unity? but on the other hand, the maximums on upper right edges decline after unity, consistent with the idea… looking more at the data, it appears #3 has a major overplotting effect, such that the “rungs” or “levels” change colors as they are overplotted. these 3 graphs make me want to increase histogram discrepancy accuracy… think it is largely driving their dynamic somehow… maybe they are hinting accuracy increasing for larger iterates?

oh! thinking out loud, there is some natural “wedge” logic in the code, recall? the code is increasing the histogram discrepancy calculation by 1 count/ sample every 10 larger bits. one generally wouldnt want an optimization to be sensitive to varying accuracy, either decreasing or increasing… although simulated annealing has a sort of related concept where accuracy increases during the optimization… again the analogy of an energy system…

a basic question: is the optimization an artifact/ exploitation of inaccurate histogram calculation or reflecting something fundamental about it? possibly the current data does not actually reveal the answer no matter how its analyzed. although high accuracy recalculation of the histogram measure is in a sense stronger/ more definitive than mere re-analysis.

(8/9) 😳 😦 was off on a wild trip last 3 days and had the strong urge to get back and start banging on all this again, feel had to leave off right in the middle with a lot of cool/ promising ideas left hanging. then, realized, stupid mistake! wrote some analysis code that reuses the standard output routine, which overwrites the graph file, which was being used as a datafile to reanalyze the optimization, forgot to rename it. so bottom line the hours-long optimization results just generated were lost. it can be recalculated, but some waste of cpu cycles, and future generations can blame me for some contribution to global warming, lol! a bit frustrating! but not a big loss. will rerun the optimization routine. expect nearly identical results but its a question anyway.

😥 other news, nowhere else to talk about this much. someone in my family very close to me (to say the least) got seriously injured on thursday eve, and didnt know about it, but had trouble sleeping anyway. and as his closest confidant just told me, we are all connected. at this point his latest injuries are now the least of his problems, and to put this vaguely, he is now facing what might be called the Big(est) Problem. if this blog had any regular commenters with any questions, would expand on the story, but as the Weeknd says in his latest hit song, the city is cold and empty… hey Weeknd! have you beat! cyberspace even more so…

(8/10) an update. coded up a storm and have some findings. highlights/ summary:

• came up with more accurate histogram difference measurement. average over x samples.
• its noisy but somewhat unexpectedly does not converge in an orderly way. looking at successive differences has a lot of noise in the differences even later in the series. so an adaptive stopping algorithm based on that is not very accurate. it looks like just taking 10-12 sample average is fairly accurate for iterates around under a few hundred bits.
• coincidentally something about the optimization has tendency to push density above ½ and entropy below ½. deja vu! just saw that ending last month on `hybrid34` and `hybrid35` which try to distribute ‘c’ trajectory length. maybe the data is telling me something here and need to revise thinking and see difference `d - e` as a strong signal.
• `d - e` nonzero suggests that something is not quite right about the histogram discrepancy measure, apparently even for low histogram discrepancy `d - e` can be nonzero. as long observed `d - e` is nearly zero in the undifferentiated region and low histogram discrepancy was designed to be an indicator for the undifferentiated region so something seems off. observed other biases in `d, e` for low histogram discrepancy.
• sorting the `stepwise10` output by ‘nw’ and looking at the prior inaccurate histogram difference ‘hdr’ maybe shows signs the algorithm is exploiting some kind of loophole or correlation in the calculation (ie inaccuracy) based on noisy spikes, similarly to last graph #3. could try to analyze that further, but then ran into this…
• 😮 ❗ 😳 🙄 😡 sorting the output by ‘hdr2’ the new higher accuracy measurement suggests that none of the trajectory metrics are much correlated with it unless it is far from unity in which case also density and entropy of the initial iterate is substantially away from ½ density. ie after all that work, shot down in flames, barking up the wrong tree, @#%&! (maybe the title of a future blog.)

`adapt.rb`

`adapt2.rb`

(8/11) ❗ 😮 😳 🙄 😡 as mentioned am thinking some more about the 3-power (match) sequence and how it might explain a lot. its a years-old idea at this point and need to collect the different experiments in that vein. however, was looking at `review134` and `review134b` from 9/2019 and discovered they are quite wrong/ broken! now looking closely, they look at the parity of a loop iterator `x1` to determined the 3-power parity sequence instead of the collatz sequence! moreover it incorrectly computes parity of the collatz sequence, also! in other words, it compares two sequences, neither of which was correct! GIGO! mustve dashed that all off way too fast in the excitement/ heat of the moment! unsurprisingly it “found” something like a null result, lol! these were crucial, even critical experiments that tried to analyze the seed database wrt 3-power sequence— in short, really botched it!— and they need to be redone ASAP! @#%&!

some of what led me to discover this blunder: have been reminiscing about `bitwise41` from 8/2019 showed high signal for the ‘w’ generation strategy, one of the database methods, or something close to it. it was not immediately replicated at beginning of following month (in `review134, review134b`) whereas at least that generation strategy should have shown/ contained it. coulda, shoulda, woulda paid more attn to that at the time… there are a lot of 3-power sequences study (many or even most of them with high signal!) but the crucial one that could have showed general applicability had a result-altering/ erasing(?) defect. where are those eagle/ sharpeyed commenters when you need them? AWOL! 😛 🙄

(8/12) part of my strong emotion/ frustration in last entry is due to finding already that theres a very strong signal that was missed, and this is the kind of basic experiment that coulda shoulda woulda have been carried out years ago, so again WDITTIE! the problem is hard enough, but defects make it harder, and a defect that loses a signal is more than a serious problem. this is the rewritten code over all 8 trajectory generation methods and the lower black line, match count over 1st 10 iterates of each trajectory/ sequence, reveals very strong signal for long glides over all generation methods more in line/ consistent with the mentioned `bitwise41` results (cf black #7 curve below). it also shows it tends to decline wedge-like for shorter glides more for several of the generation methods (#1 ‘c’, #3 ‘cm’, #7 ‘w’), some not the case or too small to tell/ ambiguous (#2 ‘cg’, #4 ‘m’, #5 ‘p’, #6 ‘r’, #8 ‘z’). overall all this is something of a (earlier “narrowly” missed, but now solid) breakthrough. the recurring idea of the collatz function as a sort of semi-unbiased bit-string sampling “mechanism” or “operator” is again strengthened.

`review134c.rb`

(later) a lot of times have noticed that the metrics in the database file `db2.txt` dont match the glides that much, that is because they were originally computed in compressed format and after focusing on Terras properties have tended to utilize the semicompressed format. this straightfwd code recalculates the found glides and metrics in the database wrt semicompressed fmt and saves into file `db5.txt`. rerunning the prior analysis code on the new database results in even higher signal/ less noise across the different metrics. note it will select different glides based on the new statistics.

`db5.rb`

further initial look at 3-power sequence matching wrt random ½ density iterates shows that it seems to follow the same high correlation, decaying slowly. in other words nearly a universal property over all iterates, a rare and valuable finding. can quantify this further than merely noting it. but, not sure if this was reported previously, it would take a lot of digging to find it, unlike as found for Terras seeds, ½ density iterates only slightly correlate between density parity and parity, although its more than nonrandom. so the idea of a bit sampling mechanism/ operator is not inaccurate but it also requires careful qualification, as here and has been outlined in a lot of prior experiments. actually need to investigate further the connection between density parity vs parity over all iterates (major attn so far but mostly over Terras density iterates) and maybe it is a key to the puzzle esp on predetermined vs postdetermined behavior/ dynamics. best case scenario, could it even be an indicator/ discriminator?

(8/13) 😮 ❗ 💡 😎 ❤ this extraordinary finding was cooked up in less than ½ hr from existing code. it takes last month `hybrid35` and is built to find an even distribution in the histogram discrepancy, using higher accuracy code with 8 iterations, while maximizing glide lengths, all over fixed bit width 200. then finally results are (re)sorted by histogram discrepancy ‘h’ magenta below. results are amazing! the histogram discrepancy is indeed controlling/ limiting/ bounding trajectory dimensions max index blue, glide length green, and total trajectory length red (amplitude!). the code has some refactoring of analysis graph logic.

in a word, apparent vindication.

this seems almost too good to be true, a stunning reversal of prior analysis, will ofc have to poke at it all further, but thinking right now the data doesnt lie. in retrospect/ light of these new results, the prior analysis was apparently getting mixed up by comparing histogram discrepancy across nearly arbitrary bit sizes. this is apparently difficult to do correctly, was mishandled wrt intentions up until now, and will have to totally rethink how to do that “carefully/ cleanly”.

part of the explanation seems to be that the prior optimizations never strayed out of `h < 1½` and this code moves into/ approaching `h ≈ 4`. here for `h < 1½` there seems to be low variation or at least a “similar spread” in trajectory dimensions, ie also not enough range in it to see larger variation happening at higher 2x scale. but it will take some more (maybe mostly easy) analysis also to understand how ‘h’ varies by iterate bit size. the ‘h’ values extracted also are extremely uniform, basically very close to linear, attesting to the high performance/ efficacy of the gap optimization logic/ technique/ implementation. another idea: the current histogram discrepancy is balanced/ “neutral” wrt longer 0 vs 1 runs, and (from long observation here) maybe the total trajectory length below is correlated with the imbalance of 0 vs 1 runs.

further morals to the story, as mostly already known in general but still being sought in particular: optimization setup/ arrangement really matters, relative values of metrics matter. neither not always discovered easily.

`hybrid36.rb`

(later) 💡 maybe there is more to the linear ‘h’ line than initially imagined. why is it so nearly perfectly linear? a rare result in this world of nonlinear fractals. is it maybe more about the feature than the power of the optimization algorithm? its also as if the genetic sub operations of the optimization algorithm are somehow perfectly tuned to explore/ traverse the ‘h’ space. the prior mixup seems to have something to do with ‘h’ overlapping values across different bit widths. rerunning the above optimization for higher bit widths will presumably “push up” the ‘h’ line, but overlapping with other ‘h’ lines for nearby bit widths. a natural question, how can ‘h’ be redefined so that these lines are not overlapping or maybe they all perfectly overlap/ coincide? it seems to point to some more fundamental feature… and that a bit-width unconstrained optimization algorithm would then traverse that new feature more… “naturally/ orderly/ coherently”… again it seems to come down to some kind of scaling/ normalizing

(8/15) `stepwise10/ hybrid36` are similar in what theyre optimizing on but results are significantly different, and am comparing/ contrasting, trying to come up with a bigger story, the combination is helpful vs looking at only 1 or the other, and at 1st glance its almost surprising theyre working with the same data/ “entity space.” the combination shows the limitations of gradient descent/ greedy type algorithms in trying to explore/ grasp the “big picture.” there are some multidimensional relationships going on that are harder to picture than 1 feature variable vs control variable which is maybe how human brains tend to think, and how many graphs are formatted, aka more anthropomorphic lens/ bias. `hybrid36` runs fast in less than ~½m and can get insight quickly on different bit sizes/ “slices.”

`stepwise10` seems to be better at finding the longest trajectories for low ‘h’. they seem to be mostly missing from `hybrid36`, they show up as spikes at the left side, but comparing the two, likely there are many long trajectories being missed. `hybrid36`, as designed, is better at finding the long range of ‘h’ over a given bit size. speaking of “overlap” and ‘h’ trends, the story seems to be that ‘h’ is very sensitive to bit size in controlling dimensions in the sense that for higher bit sizes, nearly identical (low) ‘h’ can lead to much longer trajectory dimensions. moreover ‘h’ high range is not stable and increases quickly as bit widths increase, ie the ‘h’ slope left intercept is typically ~unity but right side/ hence ‘h’ line slope increases for increasing bit size. quick study, naming ‘hmx’ the max ‘h’ found by `hybrid36`, for bit widths ‘nw’, `nw=100 → hmx≈2, nw=200 → hmx≈3` (above), `nw=300 → hmx≈4½, nw=400 → hmx≈4½`. (re last 2 points with similar extremes) also `hybrid36` is maybe not optimizing well to find highest ‘hmx’. the range/ equidistribution logic presumably tends to target its variable endpoints, here ‘h’, but isnt particularly focused on them. but overall the basic idea of ‘h’ controlling trajectory dimensions appears intact.

am thinking now need to come up with some multidimensional optimization/ traversal that ties all this together. what would it look like? thinking out loud, its almost as if (1) range/ uniform distribution (“equidistribution”) optimization (over ‘h’), (2) and extreme optimization (trajectory dimensions), (3) and fixed parameter (bit sizes) need to be all combined in a single optimization structure, possibly ingeniously using/ adapting existing optimizers, or maybe in some new way/ combination. as just outlined/ summarized, the general multidimensional dynamic/ feature space seems not to fit into a single optimization framework at the moment. also what is a revision/ scaling (esp by bit width) of ‘h’ metric that is more desirable? aka closer to a proof structure?

(8/16) more thinking, it seems to come down to (challenges of) multioptimization and “exploring a space” (traversal). the greedy concept works well on single variate optimization but then multioptimization is more complex, esp the tradeoffs between different variables. the current code uses the rank normalizing across all variables. this tends to weight them in a sense “equally.” the multivariate optimization can be seen as attempting to push outward at the edges/ surface of a multidimensional “ball.” as mused earlier, it is natural to examine the ball (or ellipsoid) wrt SVD analysis, not done yet. the equidistribution idea recasts it into an extreme-seeking distribution where the extremes targets are (largest) gaps between the values.

(8/17) 😳 ❗ had misc ideas about new optimization strategies but then ran into this stuff suggesting its “jumping the gun.” started to look closer at some of the solutions and was in for some surprise. the algorithm is generating long 0/1 runs even for low ‘h’, a “bad” result that was thought to have been designed against, because, simply, histogram discrepancy is a generalization of 0/1 run sensitivity. the culprit was quickly found. theres still a lot of noise in ‘h’. some cursory/ quick analysis was done in `adapt, adapt2`. but theres been a rush to get/ look at results instead of looking at stability more carefully. those earlier 2 stability experiments didnt look at the normalization by the average (or “expected”) difference (comparison with ½ density iterates) in the denominator. the denominator moreover isnt computed using any averaging in the latest hybrid code written thinking/ hoping it might not make too much difference. oops, yikes!

this led to this new/ revised/ careful stability/ discrepancy calculation experiment that even with existing code took hours to write (as the expr goes nobody ever said all this was gonna be easy). it uses 4 different methods of increasing accuracy and computational expense, computing over between 1 and 100 samples for an average. ‘h1’ red is the current calculation in `hybrid36` without denominator averaging and has very substantial noise/ variance that is not decreased with numerator averaging at all eg it varies up to/ over ~1 difference! ‘h2’ green uses a denominator with averaging. `h3, h4` borrow/ reuse the fractional histogram (averaging) code from `construct153` but with some refactoring. ‘h3’ blue is a single sample on numerator, denominator, and ‘h4’ magenta is the ratio of averaged numerator and averaged denominator.

`h2, h4` seem to have about the same amount of noise but experiments (namely, just rerunning this code) lead me to conclude that ‘h2’ is consistently under or overestimating the actual value. intuitively, its off by (roughly) the difference of the fractional histogram vs the integer histogram, where one (consistently) over or underestimates the other. so the winner, but unsurprisingly also most expensive, is ‘h4’. visually the noise/ variance seems to decrease up to about ~25-40 samples. ofc now the big question is how much this refinement/ sharpening would change prior analysis.

❓ strangely ‘h3’ variance based on fractional histogram calculation is larger than ‘h1’, an unexpected result and dont have an explanation for that right now.

aside, another point, where else do so many increasing samples in the averages not lead to more stability? only with fractals. working in these areas one gradually realizes, the concept of “normal distribution” is one of the greatest anthropomorphic biases of all time in human thinking… Mandelbrot raised something like this question; what if unnormal distribution is more “normal,” so to speak? the ideas of fractals as a paradigm shift has been rejected by many for decades, but around here, its regarded as the ultimate in backward/ retrograde thinking. math/ CS in the 21st century will continue to underline that.

`adapt3.rb`

❗ 😮 😦 😥 other news: DAD, RIP

(8/18) on some further thought, those graphs trend lines look fairly dramatic in differences but theres some extra consideration. the last `hybrid36` code calculates the normalizing denominator as a constant in the `hdiff0` subroutine and works only with single bit width iterates. the idea of “stability” can be a bit slippery. in other words, more specifically, here one would like/ prefer nearly the same constant to be computed across different optimization runs but variance in it across runs might not change the dynamic of individual runs much. the monotonicity property of the discrepancy measurement can still be maintained with different denominators. in short the apparent dramatic stability improvement of the above diagram is due entirely to a more stable denominator which is real/ nice but doesnt change the optimization over a single (fixed) bit width. even after all that makes me think there is some other comparison to make closer to actual usage pattern than this one but cant picture it immediately.

😳 ❗ 🙄 oops, lol, argh, sigh! “½ baked”! even with hrs of work, coded all that up too fast/ too late/ no 2nd checking. closer look, line 155 for fractional histogram `h3, h4` numerator is averaging a constant! might have been apparent with some other plot format. its annoyingly redundant but doesnt actually change results.

(8/19) was scratching my head and hacking on prior code (sometimes even at the same time, lol!) to try to get a better understanding, sometimes a helpful approach/ technique, and then developed this basic exercise to illustrate the basic ideas involved. in short (again) stability is not simple. it calculates a 3 fractional histograms and computes histogram discrepancy for each vs an initial ½ density value. ‘hd1’ red is calculated from scratch each time, in a statistics sense that is iirc sometimes called “with replacement”. ‘hd2’ green and ‘hd3’ blue are computed without replacement. ‘hd2’ and ‘hd3’ converge quickly but (a bit surprisingly!) not to the same value. ‘hd1’ apparently converges but very, very slowly. also ‘hd2’ looks like its converged early on as far as smoothness of the line but then deviates later. etc! while this is helpful, feel still cant fully grasp all the statistics of what is being revealed here.

❓ ❗ 💡 or… does it even really converge? am starting to think may have stumbled across discovered some kind of process/ measurement that doesnt ever converge no matter how many samples are averaged… does that exist in nature? maybe a sort of computational heisenberg uncertainty principle?

this exercise is basically showing “merely” computing the fractional discrepancy histogram accurately/ precisely is problematic. it also seems to show “sensitive dependence on initial conditions” where the initial conditions are simply early iterations/ samples.

re question on ‘hd3’ vs ‘hd1’ in the last graph, the “interrun stability” (to coin a phrase) might explain why ‘hd3’ variance seems high. maybe multiple runs of ‘hd1’ have a shifting variance range and its displayed variance is misleading. yes, graphs can be misleading and these are some new ways of catching/ visualizing it.

`adapt4.rb`

(8/20) 💡 ⭐ this code took a lot of finesse to match the sophistication of the statistical dynamics under study and gives a much fuller/ better/ detailed story. some new ideas/ patterns. it finally does an adaptive calculation after (painstakingly!) figuring out/ carefully separating/ isolating how to actually do it. theres a lot to comment on here, a lot of moving parts. this looks at iterates of bit size 20..800 in increments of 20. it computes the fractional histograms and then the histogram discrepancy using the “stable” or “converged” fractional histogram for that bit size.

• the 1st diagram tracks fractional histogram calculation/ stability. the total histogram sum as a measure of its overall/ aggregate value/ stability is graphed left side, and a 20 span rolling standard deviation calculation is on right side (+ scale). the standard deviations decrease relatively quickly. the sharp drops are due to omitting outliers in the initial series, ie they “cycle out” of the 20 count rolling window.
• the 2nd graph is similar except calculating histogram discrepancy for random ½ density values of the same bit size. however, the heisenberg uncertainty idea connected to lack of convergence rears its head here. even after 200 iterations the series (20 rolling count) standard deviations do not decrease much and dont seem to converge toward zero; they seem to maybe converge toward a nonzero constant.* somewhat unexpectedly/ unintuitively in both graphs the standard deviation goes down for larger iterates. this seems to relate to the so-called “chunkiness” of bit runs in iterates which decreases for larger iterates.
• the last diagram shows that despite the deviations noted, there is strong (as desired/ designed) monotonicity both in the histogram ‘s1’ red and the histogram discrepancy ‘s2’ blue measures. ‘t1’ green right scale is the iteration count to convergence for the fractional histogram calculation and very strangely/ inexplicably/ unexpectedly it declines somewhat for larger iterates.

* 😳 just realized the 2nd calculation/ graph didnt implement the running standard deviation, instead computing over entire series. it was changed to the full range after some debugging issues. its not exactly incorrect, is informative, but not exactly as intended either.

`adapt6.rb`

here is the revised code implementing the 20 sample running standard deviation and adaptive stability stopping/ termination for histogram discrepancy also. its fun to think/ talk about heisenbergs uncertainty principle but looks like its now approaching wishful thinking at this point (lol!). theres not much surprise here, there is a convergence, except maybe unexpectedly the convergence count seems not to have any clear trend other than “noisy sideways” over a very large iterate bit size range, graph 2 ‘t2’ magenta, and not sure how to explain that right now. ❓

ok, some addition on that. there does seem to be some HUP going on in different scenarios previously such as comparing ½ density samples against the “discrete” (non fractional) histograms such that long averaging/ smoothing doesnt take away the noise.

`adapt6b.rb`

(8/24) ❗ ❓ 💡 rewired the hybrid code to use the new discrepancy calculations. there is some new code to output/ save/ store the calculations up to bit width 300, and the hybrid code loads the table. it calculates histogram discrepancy using adjacent iterates and adaptive accuracy/ termination, also scaling by expected discrepancy by bit size. it found roughly similar results as `hybrid36` except for (at least 1 identifiable) peculiarity. for highest histogram differences ‘hd’≈4 the iterates were all even with large lsb 0-triangles. these iterates seem to “fool” the iterative discrepancy calculation in that it quickly finds low deviation and ends at only 20 count steps. so modified the code to generate/ search only odd iterates. this mostly resolved the strangeness except again a smaller number of high end ‘hd’ values still have some peculiarity. am trying to wrap my brain around it. again they seem to fool the discrepancy calculation somehow with quick termination due to low deviation.

extracted one of them and it looks like this. there is a very long lsb “line” of 2 bits. this is somewhat “misrendered”/ (or at least) distorted in the 1st graph due to pixel size larger than the detail/ “aliasing” like effect. 2nd graph is the detail on lower left of the 1st graph/ grid. am surprised, dont know what to think of this/ how to explain it yet. not sure if the lsb pattern is irregular or regular, have to poke at it some more. it would seem the strangeness never ceases on this problem, and optimization algorithms can sometimes lead to extraordinary findings. and “new unseen anomalies” seem to suggest one is on the right track. also the 1st instinct is that maybe something about the adaptive calculation should be adjusted/ tuned. its apparent a single large spike in the histogram representing the long 0-run is atypical but not weighted much or much differently, only adding ‘1’ in the formula.

n=803469022129495137770981046170581301261101496891396417650689

`adapt7.rb`

`hybrid37.rb`

(8/25) but then some further thoughts on that. do those outliers really “fool” the discrepancy algorithm or not? tightening up the calculations leads to more reliable “longer” trajectories correlated with histogram discrepancy. but is that entirely required for a proof structure? it would seem that the proof structure needs a condition in the sense “if a glide is long, it has high histogram discrepancy.” but the converse maybe need not be true, and the “weird” trajectories are about the converse. in a sense they are “false positives” with high histogram discrepancy but no glide. and maybe the proof structure can deal with them, and has more of an issue with “false negatives,” or low histogram discrepancy but long glide.

❓ 🙄 (closer look) the code seems to be generating mostly large lsb 1-runs to create longer glides. so it seems on cursory glance the histogram discrepancy measure is highly correlated with long lsb 1-runs. ofc the measure was designed/ intended to be more than something so simple.

❗ 🙄 😳 😦 (even closer look) further analysis, there do not seem to be any long glides without lsb 1-runs. also, altering the optimization to push down lsb 1-runs does not change the results, the algorithm cant find any. also comparison of ‘cm’ and the lsb 1-run shows they are nearly the same ie the glide is basically “due to”/ explained by/ associated with the lsb 1-run. alas, full disclosure, its like that collatz post title, going in circles…

💡 (later) it has been very slowly dawning on me, but recent blogs/ ideas underline the point (7/22), and sometimes its hard to figure out basic stuff. this cannot be repeated often enough, and its full/ basic implications have been still eluding me.

the postdetermined glide is basically statistically indistinguishable from ½ density iterates.

now for example, consider the relatively recent experiment `stepwise7` from 5/2020. it just looks at finding max scaled runs over the postdetermined glide. the optimizer goes linearly up to about bit widths ~250. but now how about this? just do a simple measurement of ‘mxs’ over ½ density iterates over the same range. this experiment is utterly basic. it should have been done years ago, but maybe wasnt; dont recall an exact match, but recall similar ones; think there are many near misses. the graphs are nearly identical. am feeling need to do some more review/ survey of this over prior blogs. ❗ 😮

💡 a basic related conjecture is that attempting to optimize for any statistics other than those of ½ density iterates over the postdetermined glide cannot deviate much. and that all the misc properties measured of the postdetermined glide are essentially nearly those of ½ density iterates.

but, the histogram discrepancy ideas definitely/ fully take all this to heart, and already have some idea about what direction to go next. a basic pattern is that the feature takes into account— builds in as a control, via comparison with— ½ density iterates.

`construct154.rb`

pondering further, maybe some of the conceptual difficulty/ cognitive dissonance is related to the following simple ideas not easily grasped, ie somewhat have run against my intuition, aka not easy to wrap brain around, and appearances can be deceiving. (a) the postdetermined postglide/ “drain” is (apparently) statistically different than the postdetermined glide, even though in nearly all graphs it doesnt look different. eg former contains ufos etc. (b) not all iterates have a postdetermined glide. ie many iterates are only a drain. (c) the concept of “drain” is still somewhat elusive and am not sure whether to define it as including or excluding the predetermined/ postdetermined glide.

2/2020 `construct120` near miss!

💡 ❓ ❗ looking at/ thinking about this, had a rather obvious idea that hasnt been tried before now and would have made sense to be investigated long ago (think alluded to this a long time ago). WDITTIE! (why didnt I think of that idea earlier?) one can look at 1-runs in ½ density iterates as a control and see if they have the same statistics! either way the finding would be significant. my expectation is that they will be (very?) close.

have been thinking/ remembering some about “crunch spans” which were searches for long trajectories with minimum run lengths on 11/2018. this work was carried out before understanding of the pre vs postdetermined region. it also fundamentally doesnt consider (understand) the idea of run scaling in iterate sizes. while the idea of scaled runs is old, a lot of work looked at absolute values of run lengths when it needed to look at relative run lengths to size of iterates, and further adjustment of the scaled values is apparently necessary, as in the last diagram/ experiment. ½ density iterates and their associated run scaling, which is clearly inherently nonlinear, are a key part of the puzzle. there was some notice of nonlinear run scaling in the crunch region awhile back, but it took this long to realize its maybe simply identical to ½ density iterates!

in a sense, with fractals, all the action is in scaling. the crunch spans were basically all in the drains, they werent looking at glides at all, or rather, comparisons with glides as a control. a lot of the problem is about trying to find “what is a control”. a reality is the optimization code will generally/ often focus on drains unless forced to focus on glides. as Terras showed long ago, and as many computational experiments reaffirm, glides are like needles in the haystack of drains.

a lot of ufo ideas were explored on 3/2019 right after the pre vs postdetermined distinction was made a month earlier. but they didnt take much into account glides either. in retrospect mostly the ufo code is “glide-blind.” it didnt imagine the distinction was crucial. it established roughly that ufos could be embedded anywhere in the predetermined vs postdetermined regions, and tried to sort out the distinction some, but didnt notice strongly they were always in (glideless) drains! there was some notice at the time that building pre-glides to a ufo was apparently very limited and/ or very difficult, but simply shrugged or attributed it to not powerful enough search algorithms, looking elsewhere at “low-hanging fruit.” it didnt consider it might really be impossible.

need to go back and sort through this more carefully at some point. but basically, as slowly, sometimes painstakingly evolved/ figured/ dissected/ extracted out over time, defying some expectations partly built up from visual studies (“looks can be deceiving…”), postdetermined vs predetermined distinction has a much different implication if there is any vs no glide and also esp crucial, the distinction of glides that are shorter vs longer than the postdetermined point (re “shortfalling vs overarching”).

3 thoughts on “collatz histogram discrepancy”

1. Pingback: collatz outline | Turing Machine