collatz new highs

am still clawing/ scrounging for any “big picture” leverage, resuscitating old leads. after some extended musing of new findings, came up with this latest idea. there are lsb triangles even in the disordered climbs for the uncompressed mapping. do these mean anything? seemed to see some cases/ trend where the triangle sizes successively decrease in the climb, ie dont successively increase. can this be quantified? this is somewhat similar to the old workhorse “nonmonotone run length”. instead, its something like “count of new highs” (in bit run lengths either 0/1). that statistic is (maybe not uncoincidentally) used in stock market analysis, which has to deal with some of the most wild/ intractable fractals of all.

it was not hard to wire up the last “dual/ cross-optimizer” (ie both within fixed bit sizes and also increasing bit sizes) to calculate this metric, named here ‘mc’, serving its intended purpose of trying out new ideas quickly. ran it for 100 bit width seeds and then it did indeed seem to flatline somewhat. upped the seed size to 200 bits and then more of a (very gradual) trend is apparent. it looks like a logarithmic increase (‘mc’ red right side scale, other metrics left side scale). ‘mw’ is the # of iterations since last max/ peak run length. the optimizer ran for a long ~650K iterations.

the significance of this is that bit runs in the iterates are a local feature and yet do seem to have significance globally even in the “hard,” otherwise apparently largely featureless disordered glides. they also relate to the prior bottleneck ideas and ofc density analysis also. the issue with the bottleneck idea is that it needs to be phrased in a local way such that a local feature can lead to a global prediction, and the prior bottleneck analysis didnt quite go there. ie its one thing to say “limiting bit runs over all iterations (apparently) limits the glide” and another to say “the current max bit run is [x], last encountered [y] iterations ago, the [z]th new run length max, how much longer is the glide now limited to?” it appears some limit along these lines exists, if stated wrt seed size. in other words a measurement that “current max bit length is [x]” is local and cant anticipate later increases, and a max bit run length over all iterates is a globally computed property.

mix22.rb

mix22x

(5/4) this is some slightly modified code with various tweaks. (1) output lines are controlled so its not so high, it only outputs 1/20 of the “new bin cases” and this keeps total counts/ gnuplot points around ~5K for the types of optimizations being considered instead of the over 100K in prior code (gnuplot does ok with that, parsing it in a few seconds, but its still on the large/ unwieldy side). (2) the 5 genetic ops are named and code tracks/ counts which lead to new bin entries vs new top entry cases, ordered by most successful. nicely it shows all are working at various levels of efficiency; its interesting that the results show a different distribution over the two cases and also for different optimization/ fitness functions. (3) was interested in the scaled bit run metric instead which shows an interesting declining trend. (4) max metric found ‘mx’ is also graphed (value x10/ right side scale/ green), that was oversight/ missed in prior code. trending right there is a noticeable/ unusual plateauing + spikes effect in ‘cm’ glide max index, magenta, deserves further attn/ analysis.

mix22b.rb

mix22b

(5/8) alluded to some uncomfortable directionlessness at moment. after some reevaluation feel there are really two major overarching directions to go in at the moment. (1) try to come up with limit related to the (local) bit runs which are seen in even disordered glides. (2) there is another hint of “relationships between metrics” somehow leading to a limit, maybe esp the peak glide metric which maybe havent looked at in a lot of detail, preferring more metrics related to overall glide. here is a little more along line (2).

the idea is that some metrics seem to be “deterministic” and others somewhat more “nondeterministic”. since this word already has special meaning in CS and am not exactly alluding to it, maybe a better term is “indeterminate”. the determinate ones are easily proven decidable. the indeterminate ones “seem to” or might be proven to have decidable limits, and bridging that gap seems to be the heart of the problem.

@#%& 😦 😡 👿 am having some difficulty with my hardware lately. its a nice dell laptop with large SSD drive and great quad core cpu, but its been intermittently overheating partly due to high load of “supposedly background corp security scanning software” alone maxing out the cpu at which point it reboots immediately in the middle of any critical situation (eg writing these paragraphs) without warning or saving data, even without running analysis. ofc running collatz analysis makes it even more likely to overheat, and it takes quite a few minutes to restart/ resetup my workspace. argh

this code has some small changes/ improvements but looks into goal (2) of interrelationship of metrics/ limits. it maximizes by glide max ‘cm’ and then does some post analysis. am interested in properties of the climb, specifically the div2 vs 3n ops which have been studied somewhat before. are these somehow limited in the climb? there are prior hints of this and some intrinsic limits must be present if the overall problem is provable.

this code finds something interesting. it counts ‘j’, ‘jl’, ‘jr’ the 3n op count in the whole glide, the climb, and the decline resp, green, magenta, lightblue. ‘jl’ is found to track the glide max index rather closely such that the difference ‘cmjl’ in orange is rather flat, even as ‘cm’ has a very gradual (but noisy) increase.

now ofc after long study these types of limits have the feel of “trying to nail jello to a tree”. optimizing by the same metric eg ‘cmjl’ that has some pattern in a set of trajectories that were optimized by another (here ‘cm’) might lead to a totally different dynamic. another simple question to wonder is how the ‘cm’ GA optimization compares to the bitwise method, want to do a more careful/ thorough comparison/ analysis between optimization methods and metrics at some point.

mix23.rb

mix23

this next code optimizes by ‘cmjl’ and has some slight other tweaks. it increases bit size to 200 and finds more of a trend in the ‘cmjl’ statistic (in red). so the apparent flatlining/ plateauing is not a property. but this does give me another idea…

💡 ❗ 😀 other “peripheral” good news: my world class IT department/ “tech bar” (clearly apples very deeprunning cultural influence there!) blew out the substantial dust from my laptop and gave me my own air can. nice! that helped but didnt quite fully fix the issue, and they replaced the entire chassis in only 30m, and it seems to have fixed the overheating issue. you guys rock! the reboot errors due to overheating indeed showed up after a little investigation in the windows event viewer (havent looked at something like that in maybe ~1½ decade! “old school, blast from the past”!). those modern fans sure get a lot of wear over their lifetime and they start to run hotter after a few yrs, presumably partly due to lubrication burning up. occasionally have to step out of the realm of platonic forms into the “real one”! and now back to the real action! am so sure my “productivity” is really gonna go up now! dont have to worry about frustrating crashes in the middle of typing my scintillating prose anymore! 😎 😛

mix24.rb

mix24

(later) 😳 holy #@%& was trying to analyze results carefully looking at bit patterns, trying to understand discrepancy in 1st bit pattern of the glide, and think theres a mutation and/ or other bug in the operators. have been burned by that exact glitch before! can you spot it? not sure how it will effect results, really need to rerun everything and see if they come up the same. argh

spoiler/ punchline: the bug is in ‘cut2’ mutator. (seems) it would corrupt the lower part of a random prior seed string in the arrays such that its measurements were no longer correct, and reevaluate an identical seed, which would then immediately be discarded (have to think about this more though, because then there should be no counts for ‘cut2’). the effect is not to allow new seeds with incorrect measurements, so really only shows up if one is trying to reevaluate their measurements starting from the seed. thinking it could have a minor effect/ degradation on not finding the best optimizations. (it can be rather difficult/ speculative to try to mentally determine/ estimate the effect of a bug…)

(5/9) had an idea about optimizing ‘jl’ / ‘cm’. this code will typically find long lsb triangles for which ‘jl’ / ‘cm’ = 1.0. altered it slightly to discard that case. this code fixes the ‘cut2’ mutator. it ran for a remarkable 7.5M iterations. the story is that long incremental improvements are found. discarding the equals case leads to an asymptotic trend approaching 1.0, right side scale.

mix24b.rb

mix24b

oops! inspecting the data, the seeds still have large lsb triangles and only some disorder in the msbs. so these seeds are not at all “disordered climbs”. another observation/ question is that shouldnt the earlier/ prior cm – jl optimization also find the same lsb triangles?

other observation: mix24, mix23, mix22b, mix22 all redid the count routine and took out the ‘d’ density distance from core measurement but the optimization code still referred to it and does not give an error. mix24b readded ‘d’ as density but not density distance from core, not exactly as intended (ie did not intend to change it, change was inadvertent).

❓ ❗ 1st cut on rerunning old results to check, reintegrated the mix20 code into last mix24b to get this code. it ran ~1.5M iterations, almost double the prior code, and optimized ‘j2nl’ quite substantially/ dramatically better in comparison, trending upward instead of downward! hard to say because of the probabilistic terminating code, but its presumably “merely” due to the mutation bug being fixed?* it does look like from this graph it could be bounded. theres some refactored code to output the current ‘c2’ gap in a 2nd graph, some surgical modifications which while logically is nearly identical, took surprisingly long to get right.

mix25.rb

mix25

* 😳 ❗ just realized mix20 used the compressed mapping and not the uncompressed like the prior experiment and presumably that explains (all?) the difference. need to compare apples and oranges! but needed to run that anyway and am glad those results showed up, seem defn worthy of followup/ further investigation (eg looking at same metric at higher seed widths).

?!? strange! maybe scale invariant behavior? this (graph #1) ran for 2.6M iterations on ‘j2nl’ on the compressed/ “packed” mapping instead and got nearly identical-looking results! weird!* 😮 🙄

(5/10)

graph #2, ‘mc’ optimization (red right scale) comparable to mix22, 2.9M iterations.
graph #3, ‘mcs’ optimization (red right scale) comparable to mix22b, 2.8M iterations.
graph #4, ‘cm’ optimization with the 3n op counts comparable to mix23, 2.6M iterations.
graph #5, ‘cmjl’ optimization comparable to mix24, 1.8M iterations.

note some of these will be a little tricky to compare because the colors/ line ordering was not matched. this code merges/ consolidates all the prior algorithms and took awhile to combine/ test and adjust graphing logic.

mix25b.rb

mix25b_j2nl

mix25b_mc

mix25b_mcs

mix25b_cm

mix25b_cmjl

* (later) 😳 🙄 😛 aha, lol, mystery solved, did you spot it? there was a defect where the wrong variable was assigned to handle the packing mode for ‘j2nl’ not triggering it correctly. (am leaving up prior version for a few laughs, was clearly focused on the other mass refactoring priorities, and am kinda on a long roll lately, aka hacker stream of consciousness…) this following code fixes it with slight modification to allow cmd line parameter to choose optimization method.

graph #1 below after 1.4M iterations and ~1h. also the iteration gap limit is adjusted to 2e4 from 1e5 and bit size set to 200. it appears the optimizations may be running longer after applying the mutation fix, the connection seems plausible ie a manifestation of degraded performance. ie possibly better performance per iteration count, and also improvement over longer iteration count post fix. the result is now totally in line with the earlier code mix20.

❗ 😮 ⭐ graph #2 seems like a big deal! the prior noted trend is even more standout, it again shows ‘j2nl’ unpacked metric seems to flatline/ be bounded as the seed side increases. 1M iterations in ~2h. this looks like a very solid limit on the (apparently/ so far extremely difficult) disordered climb region.

mix25c.rb

mix25c_jnlp

mix25c_j2nl

(5/12) wanted to run some longer/ harder optimizations. this code is 500 bits, 50 sized bins (instead of 100) and 1e3 stopping distance instead of 1e5. it ran ~4 hrs and stopped 30 seconds before midnight. reminds me of an expr my dad would use sometimes, “burning the midnight oil”… or in my case slowly burning off the fan lubrication.

it actually shows a striking moderate downward trend in ‘j2nl’ on the uncompressed mapping! actually thought mistakenly ran the compressed mode upon 1st seeing this! less steep than the ‘n2nl’ compressed trend but now quite similar. it crosses the x axis intercept around ~250 bit size. ofc this shows how much graphs/ chosen ranges can be incomplete/ misleading/ deceptive.

a few ideas for improvements after some long runs, some terminated due to lack of stopping in many hours:

  • it would be nice to see the intermediate progress on the optimization metric.
  • it needs to have (additional) time-based stopping logic for cutting off very long runs.
  • (very simple) it should track the max percent of the stopping distance seen so far instead of merely the latest.

mix25d

(5/13) ❗ as mentioned another direction to pursue is looking at max bit runs in disordered climbs (+ descents). wanted to expand on that and this idea was floating around in my head. this reuses the useful mix19 to generate a lot of disordered climbs starting at the same bit width (1059 total). then the left/ right ascend/ descend (‘a’, ‘d’) statistics (unpacked) are analyzed with histograms of bit runs for ‘0’, ‘1’, and ’01’ combination. the histogram frequencies are normalized by max counts over the ascend or descend. the ascends are more gradual/ extended than the steeper descends which leads to more counts in the former.

the ends after all the means is “null result”. as in the graph there is not much distinction/ difference in any of it (on the conjectured ascend vs descend, results same for packed mapping). the ’01’ combination max has a slightly different peak (5 vs 4) but its again the same for ascend/ descend. the conceptual logic is not very tricky but the code ends up not so simple looking aka devil was in the details. this is a very strong illustration/ manifestation of the indistinguishability of the disordered trajectory statistics. actually was a bit surprised at the (non)results; yet it looks like simply yet more hopes have been dashed. 😥

however this theme of distinguishability vs indistinguishability is sharpened even further like a sort of vexing, yang vs yin of the problem. (yin and yang are sometimes depicted as two powerful dragons fighting each other…) one might even wonder if there is some deeper principle such that almost any method might tend to polarize the data in this way. a bit )( wild thought.

the positive peripherally-related news/ “background color” is that much of this code was developed in front of a relaxing, roaring fire in front of a nicely ornate/ expensive/ elite fireplace. thats a real aesthetic luxury these days. another miscellaneous anecdote flickering thru my head while writing this: a “millenial generation” friend of mine “J” at work is interested in statistics and got a CS degree at a top college, and we discussed some a possible collaborative prj on bitcoin quant analysis. asked him if hes heard of a “histogram” and he said no. 😮 o_O 🙄

review30.rb

review30

❓ ah, but heres another card up the sleeve. it looks like any remaining hope of differentiation has to be pinned on the location of the bit runs inside the trajectory. heres a quick study of that. there are 6 bar scatter areas for ‘0, 1, 01’ bit runs, ascending above x axis and and descending below x axis, with nearest points to x axis as earliest in the ascent/ descent. the color coding is the bit run length with hotter colors the longer lengths. its limited to 30 trajectories leading to about 13.4K points. nice visualization but can any trend be seen here?

review31.rb

review31

this is a slight modification of integrating the “scaled trajectory position” over all the bit size histogram bins and in the same 6-way order/ bin coloring. the results are hard to judge. it looks noisy and no obvious trend is apparent. the lower and higher bins have higher noise due to fewer points. but even the intermediate bins with less noise dont seem to have a clear trend.

review31b.rb

review31b

(5/14) ❗ 💡 ⭐ 😮 😎 😀 BREAKTHRU? indistinguishability curse lifted? a discernable bit run location-related effect emerges! this similar code rearranges the points a little to put/ align the nearest points to the glide max to the x origin on the ascent/ descent and brings out new information that seems to show “hotter” spots near the peak on both the ascent/ descent, ie longer bit runs, and cooler spots at longer distances away from the glide max, using 1st 100 trajectories only. it doesnt seem to affect the shorter bit runs much or at all. this suggests another way of normalizing the data, ie by glide lengths. a bit )( amazing how sometimes subtle effects can be/ are extricated with only slight changes in logic.

review31c.rb

review31c

that last code was dashed off and its not so clear if the effect could be due to “distance from glide max” vs “long vs short glides” or some combination of both, and then was led to thinking of a way to isolate that. this code doesnt answer that definitively (have to think more on that) but does add a new angle/ pov on it. this excludes all the bit runs less than 8 and averages the remaining bit runs over all the iterates in the glide. it sorts the glides by glide max index from low to high. the results show that there are spikes of high 1-bit runs (red) for longer glide max indexes (blue, right side scale) and the effect is not much for the 0-bit runs (green). (but, looking closely, could this effect fall off on the high ‘cm’ range? and be confined more to the mid-hi-range?) changing (ie, removing) the bit count filter to include all bit runs tends to drown out the effect. am getting some idea of coming up with some kind of metric that captures this “feature”. it seems to/ may have something to do with different histogram distributions, have long been musing on some feature based on that.

review32.rb

review32

(5/15) ❗ ❓ 😮 this is rather eyepopping! something is really going on in the histograms of bit runs in the disordered climbs, but what!? this code finds an “average” bit run histogram of all the climbs and then normalizes each climb histogram based on the average bin counts (ie normalized by total frequencies over each bin, ie expected bin count per trajectory). this graphs the histogram bins vs log of normalized counts. there is a striking/ extraordinary upward linear cutoff effect after bin/ bit run size 8. this uses 2 different color schemes, the 1st is the trajectory order/ index in the entire sort by climb length, and the 2nd is the climb length as the color. the lack of noise in the cutoff edge, is knifelike/ never-before-seen…

an apparent explanation is something like that (roughly as previously already observed) the higher bins dont appear at all in “most” trajectories and only appear in longer ones…? ie when the higher bin counts appear, they are much higher than expected? also theres apparently some hint/ aspect of the “long/ fat tails” ie nongaussian, possibly power-law distribution?

as in the previous graph the range of ‘cm’ values is quite uneven, ie mostly low with a very large/ short spike at the end. its very tempting to try to look more at a linear distribution here, but it would be very timeconsuming to create one, because it would require sampling multiple instances of the current distribution, which takes about ~30m to generate. but, maybe an acceptable/ worthy cost at this point?

❓ trying to figure out why the colors are so different. does it have something to do with overplotting? many trajectories with same climb lengths but different colors? maybe unintended glitch? but the code difference is so basic… built this partly on intuition without really understanding everything going on here… looking at the normalized frequencies, they run less than 1 right at/ after the 12 bin which is also the x intercept; this seems to explain some of the shape but most of it is still hard to understand/ explain…

review35.rb

review35x

review35y

(5/16) ❗ 😮 ⭐ another interesting way of looking at it. this sorts the trajectories again by climb length, and selects 500 linearly distributed out of the total, and sorts all the 1-bit runs longer than 8, and normalizes the sequence to plot between 0..1, color coded by climb length, longer as hotter. the results are again rather striking!

this appears to indicate some kind of self-similarity between the overall glide length and the (longer, above-threshhold) 1-bit runs in the glide. this is almost perfectly a global-local interrelation long being sought. presumably the order would be even more significant with a more evenly distributed glide length distribution. this is globally computed however and there is some further challenge turning this into a purely local, “streaming” statistic. streaming in the sense that a local feature can only “look” at prior data in the trajectory up to a current position. the obvious/ immediate idea is a “running-sort” on the (“cumulative”) over-threshhold 1-bit runs. (but would this give any better information than the current/ running max 1-bit run? which by the way brings this blog to a circle with something close to the same idea it started out with.)

review37.rb

review37

(5/17) the ‘j2nl’ statistic is interesting, in that if there is high compression it trends below the x axis, and the less compression, the more it trends closer to the x axis. ofc many of the metrics considered have this high sensitivity to the compression scheme and sometimes behave in unpredictable/ surprising ways wrt it. was curious about a totally uncompressed run (on high bit widths 500) where there is not an automatic div 2 after the 3n + 1 operation as in f2 function. this implements that in the f3 function, and the metric indeed runs very close to the x axis although apparently not highly flat, but one still inevitably wonders, even with great code, if the optimizer is really finding the best solutions, its a hard question to answer, like if a tree falls in a forest and nobody hears it, does it make a sound, or is one only searching for ones keys under the lamppost, or looking at misleading shadows projected on a cave wall? very hard problems give one a lot of time to ponder deep philosophical questions. 380K iterations, ran about 4½ hrs.

mix25e.rb

mix25e

😳 just realized after trying look at individual trajectories that the mix19 is duplicating about half the found trajectories! one would never guess that merely looking at the code… so that kinda explains why when some are plotted its not a bug in the plotting code, they are identical 😛

therefore need to rerun prior ideas but suspect they will not change much.

this code fixes that by extracting only unique trajectories, and also had another quick idea to try. there is a similar concept to bit run lengths which is just run lengths divided by iterate bit size, the scaled metric, and instead filter by some fraction here 1/8. the iterate bit sizes start out as a same 100 but increase in their glides, so it makes sense to look at the scaled bit run dynamic because it presumably would be different.

from the graph it seems like it might be a little more linear from one trajectory to the next, is it a more natural measurement somehow, or not? actually looking more closely at concavity it seems to affect the concavity of the shorter vs longer glides. this scaled approach seems to make the longer glides more concave (less linear, near the left) and the prior unscaled approach seems to make the shorter ones more concave (less linear). the graph seems sparser even with the same 500 trajectories and not sure how to explain that right now, some overplotting going on? ❓

review37c.rb

review37c

this code looks at intermediate high-bit-run sort sequences for a single trajectory, the 6th longest, and finds its mostly self-similar even at early parts, after at least say about 20 iterates, it only seems to “shift right” some due to more lower occurrences than higher ones (other trajectory cases looked more noisy or irregular). the index in the trajectory is the color code. on thinking about this am noticing by restriction to climbs only, it handles the subglide concept. this uses the scaled method and 1/20 threshhold which looks better than the unscaled method due to far less overplotting into more distinct point locations. another immediate question, what kind of (“normalizing-like”) logic would increase the scale invariance/ self similarity of the trajectories?

review39.rb

review39

(5/18) was mostly following some whim/ intuition here and this transformation isnt totally natural but seems to show something significant. its a bit )( involved and not easy to describe/ explain in words. building off the last idea/ graph, was wondering about what might be called the “normalized scaled 1-run sort” by iterates-nearer-to-peak, lower bound 1/15. this 1st reverses all the trajectories so they go from peak backwards. then it computes the normalized cumulative scaled 1-run sort by position in the glides, cumulative in the sense of over all glides, but only at one index. normalized in the sense the scaled 1-run sort is of arbitrary length but is linearly resampled at 100 points over intervals. this is plotted in the 1st graph where color coding is hotter for longer distance from peak, in a sense a sort of “potential energy” speculated on earlier.

this leads to a asymptotic graph similar to the function 1/(100-x). then decided to find the vertical average of these points, the green line in graph 1, and subtract it from the vertical position, this is in graph 2. the results are a bit hard to interpret but seem to show close-to-peak iterates are below average, middle-distance-from-peak tend to be higher than average, and the long-distance-from-peak are more near the average. it has a fluid-looking aspect. this is not really a local metric because of the way its computed cumulatively over all the trajectories, but my suspicion is that a local histogram metric might look similar, and am looking into that next. note that some of this might be better coded using matrices/ vectors; ruby has some parallel/ natural affinity exploited here (eg map fn/ arrays etc).

review41.rb

review41x

review41y

❗ 😀 ⭐ 😮 😎 looked into normalizing histograms by iterate-distance-to-peak and it turned out to be noisy and not work out too well, and then tried out this other idea. this is somewhat modified code that alters some of the normalization logic and is purely local. its again a normalized scaled 1-run sort with no lower threshhold but it normalizes at each iterate instead of over all iterates. the normalizing is a 10-bin resampling. (suspect the already eyepopping results would be even better/ less noisy with linear interpolation on the bins.) then there is the column normalize logic where a column is a vector of iterates of constant distance from the peak. there are two types of column normalizing, subtracting or dividing by the column mean, graph 1/ 2 resp. the discrimination here is extraordinary, almost perfect esp on the left side/ lower bins. it also seems to reflect some hidden scale invariance/ self-similarity in the bit run sorts. breakthru!

❓ 💡 so roughly, smaller scaled 1-bit runs have less “potential energy” than larger runs. but isnt that exactly related to the earlier entropy property? now wondering, is this all some variant of the entropy phenomenon? was it overlooked just because the data wasnt “sliced and diced” in this particular way? the basic idea is to look at distance-from-peak and maybe astonishingly, while very simple and easy to code, thats a really novel idea wrt prior analysis…? another observation is that it looks like it should be possible to estimate peak distance by looking at only a particular/ arbitrary “slice” of bins, esp the very/ almost perfectly ordered lower ones!

review41b.rb

review41bx

review41by

(later) 😳 😦 👿 argh! a trap, encountered before! too good to be true™! that code was using the scaled bit run sizes. as discovered awhile back, for the disordered glides, the bit run statistics/ histograms/ distributions are about the same for different iterate bit sizes, and so scaled by the bit size leads to a strong predictor of the position in the glide. sigh, that cant be right, have to rethink which experiments this is impacting… this stuff is so @#%& full of pitfalls its like an unwinnable/ impossible video game… the 1st step to solving a problem is admitting you have one™ (old narcotics anonymous literature quote, as missattributed sometimes to einstein…)

❓ more thinking/ pondering, the core problem seems to be coming up with/ deriving effective local normalizing logic… it also seems to be related to figuring out local (vs global) “expected” distributions… this is already tricky because the iterate bit run distributions immediately/ already differ from any kind of “normal-looking” distribution. it does seem to have something to do with “long tails” in the higher run lengths…

(5/19) am going to have to think about all this more carefully. have been hacking in a bit of a directionless way. maybe need to get back to some basics. part of the difficulty is understanding the interrelationship between the metrics and the iterate bit widths. a simple visualization of bit widths in the sample would be helpful. (and 2020 hindsight, maybe should have started with this!)

1st graph is trajectory bit widths from peak to start, reverse order, color coded by climb length (therefore perfect color ordering on bottom x axis, losing some order moving left/ upward). 2nd is same color coded by final peak iterate bit size (corresponding perfect color ordering on left y axis, losing some order right/ downward). somehow these two aspects are nearly the same (presumably highly linearly correlated) but not quite exactly. but then looking at the 3rd graph, climb length vs final iterate bit width is not really very correlated in the scatterplot. looks can be deceiving… (in some cases and not others…) am actually now thinking overplotting may be making the correlation spread look tighter… ❓

review44.rb

review44x

review44y

review44z

(5/20) this is a rehash of review37 to add further normalizing logic. it uses unscaled bit runs and keeps all the other logic incl 8 or less threshhold filter and then just linearly scales by the min/ max of the higher scaled bit runs. it plots in a different order but the color scheme is the same with hotter colors for longer glides. (the code is missing the unique filter inadvertently/ by accident.) the extremely good discriminatory effect (signified by color differentiation, which is turning out to be a great human-friendly way to translate twisty statistics properties into something human viewable/ interpretable) mostly vanishes. this is a null result worth noting and discouraging but maybe there is some hope that in some sense its actually an “overnormalizing”. (have been musing there is maybe some deeper zen in that even null results are just other graphs to look at, and all the deeper meaning such as useful vs null aka yang vs yin or vice versa is human constructed.) 😛

the idea with overnormalizing is that the longer trajectories do have higher bit runs, but how many are “expected” exactly? is this normalizing by “more or less” than is expected somehow? the answer seems to relate to bit run sizes vs iterate bit size but maybe something other (ie more complex function) than a “mere” ratio and counts…

review37d.rb

review37d

this is similar to review37b with the scaled bit runs and a 1/15 threshhold and the linear scaling of the 1-run sort sequence. the threshhold adjustements seem to have a major effect on its shape. the different concavity seems to manifest as “spread” for lower threshholds and stabilize more for higher ones. re the concavity aspect it seems to play a role here wrt glide length although it appears many shorter glides are overplotted by the longer ones, reversing the plot order seems to create a nearly identical “inverse-heat” diagram which wouldnt be expected if there is some correlation between concavity and glide length. there was a slight glitch in prior code not dividing by the exact max of the sequence length, off-by-1 error, which shows up more noticeably in the scaled bit run scheme, corrected here.

sometimes have the urge to name these diagrams. because of the coloring scheme this one really “jumps out” and reminds me of that old silly comedy expression. is that a banana in your pocket or are you just glad to see me? 😛 😮 😳 🙄 😀 😈

review37e.rb

review37e

some experiments are inspired merely by looking at graphs. looking at the last one, was wondering how one could quantify concavity. a simple idea is to find/ estimate the midpoint on each curve. that is conceptually simple and not super simple to implement. an interval halving algorithm works. the results are null. the numeric concavity measurement is not discernably different than noise. the output is concavity measurement vs climb from shorter to longer climbs. looking at the prior plot one might expect to see a very mild effect. am now thinking the overplotting is somewhat visually deceiving wrt this. this code also fixes the computation of the glide length based on the actual uncompressed glide length rather than the database glide length which is compressed. however, as assumed up until now, the 2 are probably highly linearly correlated, and comparing the graph in the two different cases does not look much different. (looking for something to salvage in the effort, maybe this logic will be reusable later!)

review45.rb

matrix45

(5/22) sometimes have zero ideas and sometimes have several. have been banging pretty hard on this stuff & it helps to disengage sometimes to let the creative juices flow. came up with some new ideas thinking/ pondering/ staring at prior graphs/ stuff.

before getting into all that, again in line with when in doubt (and after some timeconsuming null results… defn really “living on the boundary/ edge” wrt that lately…) getting back to basics, this is a simple histogram calculation. this is partly inspired by review41. that code moves over a “slice” or “section” of iterates and finds some distance-to-peak signal, some mixed up. what is influencing it? another way to visualize it is that it is computing a scaled 1-bit run sort sequence over iterates of different sizes. the different sized iterates in slicing the data in various ways and trying to scale/ normalize correspondingly really seems to lie at the heart of the problem somehow.

this is a simple visualization color coding of the histogram of iterate sizes by distance from peak. it plots closest-to-peak 1st in dark colors and farthest-from-peak in brighter colors, progressing from a “small” impulse function of a single seed into a high spike and then into a gaussian curve. its similar info as visually conveyed in review44 but different angles nevertheless seem to help. besides, its kind of pretty, always enough justification alone (see halo effect). 😛

review49.rb

review49

looking over earlier code, again the rather-striking/ now-still-mysterious “quasi-local” review41 seems to have some signal mixed with noise. was just musing on processing it further, and curious about an idea. this changes the visualization a little differently to take out/ magnify the long narrow starting wedge. its two simple normalizing methods, 1st linear scaling by min/ max, the 2nd is a gaussian-like norm of subtracting the average and then dividing by the mean. theres a little refactor of the harness code to simplify it, remove the unused files etc. the 2nd graph seems a little less weird looking but the 1st maybe while less natural/ somewhat distorted looking reveals more detail with the “higher magnification”. there are certain (x axis) intermediate bin-ranges eg 30-40 that seem to have roughly/ nearly correct order in classifying the climb length (peak distance).

review41c.rb

review41cx

review41cy

❗ 😮 ⭐ 😎 😀 ❤ this idea occurred to me again thinking about/ over basic prior results. the general theme of prior results is actually mostly that "high bit runs exist scattered throughout the glides in various (apparently meaningful) ways". then was thinking of a way to visualize the "big picture". this following is a sort of obvious idea in retrospect. the review44 diagram is helpful in “laying everything out”. a natural question is where the high bit runs exist within it. what would that look like?

did a little analysis and then found that the diagram bit run threshhold had a remarkable effect on the diagram of including vs filtering out contiguous lower left areas. was surprised! was trying to understand the overplotting effect which was apparently substantial, then thinking of how to overcome it, and then came up with the idea to plot entirely in the order of the heatmap. this led to these remarkable diagrams. a lot is revealed here and maybe there is some major new understanding and explanatory power of everything encountered so far here.

this has some formula to find the difference between scaled and unscaled which were very close, and the “scaled” plots are actually a (mean/ stddev) normalized difference. these are breakthru! there are many extraordinary aspects! and almost immediately lead to a vista of new ideas/ angles/ directions/ leads! the predominant aspect is that, simply, the heat (high 1-bit runs) is contiguously oriented/ arranged. there are 2 “very hot” spots and not sure of their significance right now. they might relate to triangles in the unpacked sequences but they show up in packed ones also. some but not all of the signal/ differentiation disappears in the packed sequences. the packed sequences also nicely visually reveal how back-and-forth “jitter” decreases significantly in iterate sizes.

even with all the sheer delight here, theres some fineprint. one of the aspects of detail lost in the “mere” max 1-bit run metric is that some are lsbs, some are mid, and some are msbs and its position is surely quite relevant. something else to follow up on.

order: unpacked/ unscaled, unpacked/ scaled, packed/ unscaled, packed/ scaled.

review50.rb

review50w

review50x

review50y

review50z

(5/24) 💡 heres a nice simple way to conceptualize the overall goal. the metric needs to try to predict remaining glide length which would be a (“color”) gradient horizontally from left-to-right ie varying on the x axis. there are some scaled metrics that vary vertically top-to-bottom ie varying on y axis which are predictors over a single glide, but dont generalize across glides (the “trap” referred to earlier). known metrics seem to be very hard to combine to get horizontal differentiation.

at this point am tempted to throw a nonlinear ML algorithm at it. was looking into “auto ML” technology that has advanced significantly in the last few years with google recently announcing a big new cloud product. auto ML is potentially very powerful, it helps find a model out of many models apparently mostly using bayesian optimization to tune parameters. unf in googles case its only for images/ vision right now. would like to just enter the small dataset into an online system and see what it comes up with. think kaggle would be a natural partnership with such a service. theres an auto ML for weka and also autosklearn for python scikit-learn. wonder how hard these are to install and run? suspect its likely not a trivial process. am tempted to buy a new machine and try it out.

did some work on altering/ conditioning the glide distribution to make it more linear (lengthswise) and see what happens, and look at other metrics. need to write that up. it looks like the linear distribution takes away a lot of the effect, maybe mainly due to losing/ lopping off the (rarer) high lengths. also, just realized it would be natural to test the ‘mc’ consecutive-high count introduced at the beginning of this blog entry.

(5/28) this has been a very busy/ hi energy month wrt this problem judged by total word count in this blog now 6.7K. this is a writeup of the linear distribution analysis. this following code generates a lot of graphs mostly with the same story. the contiguous signal effect can be found in all the other metrics for the bit lengths namely avg, std dev, max ‘mx’, max scaled ‘mxs’, ‘d’ density, ‘e’ entropy (count of 0/1 transitions) and ‘es’ entropy scaled. the 2nd 7 group is the same metrics over the extracted linear distribution. the linear distribution takes the range 0.02 – 0.90 of the climb length sorted sequences, and seems to isolate the “hot core”. outside the hot core its “cool” and from this analysis there seems to be a strong/ fast transition from the hot core to the cool outside. additionally there are some superhot regions in the hot core. in 2020 hindsight it would have been better to use the same color scheme on all graphs instead of scaled in each 2 sets (exercise for reader). the line plot shows the linear range (blue/ green points) extracted from the nonlinear climb length distribution (red).

the prior review32 graph format above, ie the aggregate signal variable value eg max or avg arranged vs the sorted climbs, was another nice method of analyzing/ extracting signal here and on some initial analysis believe it will yield similar but noisy results requiring some further averaging to make it more clear. not hard, but involved & would like to have that but am thinking maybe more of a not-immediately-necessary detour at this point.

❗ 💡 a few days ago had a wild/ promising idea staring at the overall shape of these. its really not very hard and wished had come up with it much sooner. hint: review44 analysis has a very natural “feature detector” popping out.

review51.rb

review_51

😮 😀 ⭐ review44 scatterplot of climb length vs final bit width is not highly correlated as was somewhat expected but has definite linear correlation! it was staring me in the face all this time! the climbs were all roughly aligned with/ tracking each other and immediately noticed there wasnt much spread from one side to the other, but didnt realize how obviously this can be exploited as a metric/ signal. this feature, final bit width vs climb length, is simply the average slope of each climb! and its natural to plot it versus the climbs sorted by length, 1st graph. that leads to this correlation/ relatively strong signal/ fantastic result/ lead which seems extremely promising and full of potential at the moment because its apparently so basic/ fundamental. theres a distinct linear signal even in the linear distribution, 2nd graph, which cant be said even for the prior substantial signals/ results! red lines are the climb lengths right side scale.

review52.rb

review52x

review52y

(5/30) all those disordered climb data were generated with the older mix19 bitwise method which is now a bit dated. the question always remains whether all this accumulated now-deep analysis is “merely” an artifact of the generation algorithm, theres always that risk with this type of inherently algorithmic/ statistical attack. (honestly some of the horizontal/ vertical cross sections of the heatmaps looking kind of gaussian certainly makes me wonder. although its hard to explain either way!) another quick way to test this (but not definitively) is to try another generation algorithm. the other new high tuned generation code only outputs top iterates for each bit size/ bin, which is not as large as the other database.

so mix25e was modified here to output the entire table instead of the top scoring iterates in each bit bin. this code also rearranges the plot order to avoid overplotting and also has a time limit variable ‘ts’ measured in seconds. 200 bits, bin count ‘bc’ 50, stopping distance 1e4. it ran for ~1hr. the analysis/ plot code is modified somewhat to look at 10 slices through the bins at depths 0, 1, 2, 3, 4, 10, 20, 30, 49 in the following graphs, where depth 0 is the optimal iterate across the bit width bins. it would be nice to color code this in a single plot for the different metrics but maybe will hold off on that, this is mainly to see how much deterioration there is in this ‘j2’ metric downward thru slices and its not very much (blue line). its actually noticeably more jagged closer to the optimal slice and smooths out away from it, same with other metrics glide length ‘c’, climb length ‘cm’, dont have an explanation for (all) that at moment.

mix25g.rb

 

Advertisements

2 thoughts on “collatz new highs

  1. Pingback: collatz fusion | Turing Machine

  2. Pingback: collatz deja vu | Turing Machine

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s