collatz mod3

ok, this is a sort of shocking discovery at the new year. it is almost obvious to look at properties of iterates wrt remainder by division. there was a years-ago look into factorizing. also there are years-old looks at the bit (base2) diagrams, and this was adapted to do base3 diagrams, but didnt notice anything unusual and left them unwritten-up. did not notice this until just now, but it might have been in those visual diagrams. was looking at some older glides associated with mix30d. was curious about the mod3 behavior intra glide pre/ post peak, actually wondering about existence of repeated 3 factors. was quite surprised to find this major discrepancy/ differentiator. randomly sampling/ spot checking, these glide iterates were never divisible by 3! also, for the other two mod3 values, the distributions are different over the iterates pre and post peak. for climb the mod3=2 case is about 3x the mod3=1 case. for descent the ratio is closer to about 1.5. this is demonstrated in this simple code.

three7.rb

then advanced this idea to compute a absolute difference between these ratios pre and post peak, a “differentiator”. if this is a real property then it would be hard to find long glides that minimize this differentiator ie it will tend to diverge as in prior measurements, if they were not just tracking some kind of specialized generation bias. that is exactly the case: its real signal. this next bitwise multivariate optimizer code (using indexing sorting/ reordering/ sum technique) has two cases. it either optimizes only ‘cm’ peak index or alternately both increases ‘cm’ and minimizes ‘r12’ the differentiator function which is basically the spread between the two aforementioned frequency ratios. the optimization dynamics then are seen to be dramatically different in the two cases. in 1st case it easily linearly increases ‘cm’ black in the now-standard way, in the 2nd case the optimizer cannot extend the peak indexes ‘cm’ past low values, consistently “corner-painting”. an interesting/ key aspect is that basically mod3 calculation is not a local calculation over iterate bits, its global over all the bits, and is also not correlated with density. so this could be a very important/ useful property. maybe even pivotal…?

bitwise17.rb

bitwise17x

bitwise17

(1/9) ⭐ ❗ ❓ when in doubt, come up with a new metaphor. this blog is full of analogies and metaphors. feel they are helpful in stretching the imagination and devising/ guiding strategy. was musing further on the idea of “traversals”. as in the exercise at the end of last month, the general idea is to advance the frontier in a way that minimizes disorder and is balanced in the sense it does not fixate on any region. this is a several-years-old idea at this point explored in many variations. whats another way to view that?

imagine an infinite array/ multidimensional landscape of fractal objects, call them “snowflakes” for shorthand. they are not finite sized but scaled they look self-similar. the snowflakes have various features and theres a lot of divergence in some of the features. some of the features may be finite and others infinite and its not known a priori which is which. there are various/ arbitrary/ unlimited ways to (algorithmically) traverse the landscape. the goal is to somehow traverse the landscape in a way that reaches some kind of predictable pattern/ “equilibrium”. there seems to be some sense of flow here, and also classification/ ordering. almost as if the algorithm is “digesting” the environment in a special/ adaptive/ somehow optimal way. its almost as if its “de-fractalizing the fractals”, or ordering the disorder.

when phrased in this way, the overall idea is not so entirely different from recent ideas about an AGI algorithm posted here in recent blogs from last year… and it does seem that machine learning is the general area of application that is being sought for this automated theorem proving problem/ challenge. feel the overall concept is sound and must inevitably lead to major results at some point. its the future! while it seems very few others are poking in these areas worldwide, other researchers will surely eventually discover the/ this same overarching principle, putting my technological Nostradamus hat on and gazing into the crystal ball momentarily. but the specifics for now remain tantalizingly out of reach, aka the devil is in the details…

(1/12) 💡 ⭐ 😮 ❗ 😀 😎 ❤ was somewhat drifting last few days thinking of ideas, had very little tangible direction to pursue. then late last nite, just curious/ poking around with some basic trajectory color-coded visualizations, after seeing them had some flash(es) of insight and came up with some simple yet grandiose new ideas and am very jazzed/ excited by them! they build on recent themes and seem to point very plausibly to a general outline/ explanatory framework. possibly even a rough proof structure!

the basic idea is to go back to the vary code/ experiments initiated last month. these are just using random samples of large ½-density iterates to look for some way to explain/ understand/ characterize the disordered region. there was a lot of excitement over vary3 experiment but then some retreat as the vary4 experment threw some cold water on that. but it deserves revisiting and theres striking new ideas embedded there/ here.

another way to explain vary3 is that at an intermediate stage in the trajectory, the current iterate is a fairly good approximator of the entire trajectory dynamics. in that case for 100 bit width iterates the shortest trajectories came at about ~300 iterations and then that was then chosen as the “measuring point”. but how about just a “little earlier”? these results need to be meshed with the Terras concept of predetermined glides. recall the “bit width distance” as the nth iterate where n is the initial bit width. it would seem this might be a key transition from a “(pre)determined” to a “constrained” trajectory. actually these words/ terminology are not easy to choose. am trying to explain this carefully and running into language limitations. in some sense the whole trajectory is constrained. struggling, how to say this?

the Terras construction shows the trajectory within the bit width distance can be completely arbitrary. the entire trajectory is strictly “determined” in a sense but this sense seems to change subtly pre- and post- bit width distance. these words are tricky to apply in this context because the entire trajectory has long been compared to a “pseudo-random walk”. it has both deterministic and nondeterministic attributes/ aspects it seems. carefully extricating/ delineating/ unpacking the exact meanings from these words may be crucial to comprehending all this. this also ties in with the recent ‘cmnl’ ideas/ study. ‘cmnl’ is just a basic measurement of how much the peak index correlates with the bit width distance.

😳 😮 😡 😥 here is another way to study all this, simple ideas that could have been applied years ago, and feel that they should have been. having some strong feelings, some chagrin/ embarrassment almost bordering on shame that they werent done sooner! (am being a little melodramatic here, shame in the similar sense a cyberchat denizen reflected on looking at the disorder of his windows download folder, lol!)

this code generates familiar 100 100 bit width ½-density iterates and looks at their dynamics. it sorts the list by many different trajectory parameters and then outputs the full trajectories color coded by the sort criteria. there are 3 standout signals seen below, the rest are basically undifferentiable noise. here they are laid out from basic to more revealing. it all seems obvious/ basic now but the 1st glimpses of these were a bit )( eyepopping. 🙄

  • the 1st graph is ordered by trajectory length ‘c’, this is a global parameter. trajectories are plotted lowest to highest, cold-to-hot by sort criteria. the strong signal/ ordering is apparent.
  • the 2nd graph is by ‘ncm1’ the log of the iterate at the (global) trajectory peak. there is initial signal/ ordering present in the short climb range less than the bit width distance, 100, and this signal/ correlation falls off fairly rapidly and is weak/ mostly gone by the end. another graph by ‘ncm’ which is the local trajectory peak has less strong initial signal.
  • the 3rd graph is by ‘nb’ which is the log of iterate at bit width distance. the graph is not a whole lot different than the 1st with strong signal, there is some remaining signal at the end. actually the initial signal is noticeably stronger in graph #3 than in graph #1. the signal is strongest right at the bit width distance x=100, note the strong ulterior black-lined blue streak in comparison at that bit distance in #3 missing in #1.

vary5b.rb

vary5b

vary5bx

vary5by

here are a few other features in this code.

  1. it outputs a datafile to look for correlations between some of the metrics eg in scatterplots. however, there is not a lot of resolution/ detail/ variation in some of them right now for this to be conclusive eg the glides are all “short”. one the main ideas was to look for entropy correlations, none were found, but dont take as definitive right now due to the unsophisticated sample, and anyway all these questions may not be too crucial/ central in light of new guiding principles.
  2. a 2nd analysis was seen earlier and am writing it up. sorting the initial iterates by (local) glide length ‘cg’ and then outputting them in binary reveals a striking property that ties in with earlier research: basically the thin 1-lsb “wedge” in upper left is correlated with the local glide length. the significance/ emergence of the 1-lsb triangles has been long observed/ known from other directions but its neat/ informative/ revealing to see this basic property “emerge” from this direction/ angle of random samples. signals were investigated for other bit diagrams by other sorts but didnt find any.

vary5bz

⭐ ⭐ ⭐

these are some striking results when carefully put in context. some aspects of this have been seen before eg similar color-coded graphs of long glides extracted that tend to look like self-similar mean-returning random walks but there full significance was not appreciated/ understood/ recognized in retrospect (have to find/ dig up those again to cf them). there is a very big picture here that can be sketched out now, am now having a moment of flashing synthesis, finally at long length seeing forest vs trees. have some new confidence in these basic ideas and the overall problem. its starting to all fit together, it all fits together. a basic insight is that graph #1 is “globally computed” (ordered/ colored based on entire trajectory lengths) and graph #3 is “locally computed” (based on bit width distance) and yet they are highly correlated.

so how about yet another/ new vivid guiding analogy? (synchronistically, just wrote a blog on robotics…) imagine a drone that is operator controlled but then can “return to home position” by pressing a button, some drone controllers now have this feature. here there is a dichotomy of manual control by operator vs automatic control. here automatic control is analogous to so-called autopilot. so here the “autopilot” or “return to home” function corresponds to the descending part of the glide.

but then there is another analogy. the trajectories seem to be “pre-” programmable in the sense that the initial iterate determines their trajectory via Terras construction.

(not wanting to overly complicate/ confuse the issue, but do want to note here in the spirit of careful analysis of terminology that in a sense the “return to home” function is also “programmed”—as an algorithm constructed by the manufacturer, only triggered by the drone operator. so again it seems like the language itself struggles with articulating the exact concept(s) here. various articles on “self-driving cars” are running into some of the same challenges of language to eg describe previously inanimate objects with autonomous near-human control capabilities—actually again spirally even further, is that yet another zen question? is a car an “inanimate” or “animate” object? and this dichotomy of language dates to some of the earliest 20th century terminology/ paradigm shifts when computer originally exclusively meant a human! the meaning precisely flip-flopped with the advent/ advance of technology! etymology can sometimes wondrously reveal lots about humanity and its history/ trends/ perceptions/ thinking/ consciousness ala “consensus reality”/ Jungian “collective subconscious” etc!)

but anyway, back to the matter at hand, there seems to be a key phase transition in the trajectory corresponding to the Terras construction and the bit width distance. not all the trajectory can be equally “programmed,” apparently in fact only its finite initial region can be. wrt drone analogy its as if/ like theres a transition from “manual” to “automatic control,” right around the bit width distance, which is also typically/ usually “not too far from”/ “in the neighborhood of” the trajectory peak in the “worst case,” ie “longest” peak indexes wrt initial bit sizes, ie largest ‘cmnl’, ie “some rough signal” remaining in graph #2 (in other words in graph #2 there is some rough correlation in iterates at or small distance between bit width distance and peak index). hence:

  • the trajectory regions are basically classifiable into what will be termed pre- and post-determined regions, where the boundary is simply the bit width distance.
  • the pre-determined region is completely arbitrary/ customizable/ “pre programmable” ala Terras construction/ algorithm, any sub-trajectory is possible. some can/ will “terminate” the glide/ climb/ even full trajectory within the predetermined region, ie “early”.
  • the post-determined region after the bit width distance is a semi-random, slope-mean-returning, descending walk. it could happen “significantly after” the peak and/ or glide if either or both occur “early” in the pre-determined region.

in other words the entire trajectory inevitably transitions from a “predetermined” random walk to a “semi-random” walk where the semi-random walk is “descent-biased”. the “Terras bit width distance” is the “exact” boundary of this phase transition. those following closely will understand the ramifications of all that but to spell it out, if all the above can be established formally, its all the basic components/ machinery of a full proof. leading to some burst(s) of interconnected/ intertwined thought(s):

  • now the challenge will be trying to verify this conjecture using different computational means. to the best of my knowledge it is consistent with all prior observations/ experiments.
  • in retrospect, a lot of prior experiments would be hard to interpret without this new understanding because they try to understand descents wrt peaks/ glides instead of descents wrt pre/ post determined regions and the above sketches out why that can be problematic/ hard to interpret/ misleading if either occurs “early” in the predetermined region, but also on other hand some prior experiments already point to this distinction; there were some scattered ideas/ suspicions expressed along these lines.
  • and this overarching 2-region concept is very central to a point that CT brought out in comments that led to/ triggered the strongly-/ long-needed deep Terras-oriented analysis recently, actually sketched out similar ideas in response to/ defense of CT at the time!
  • another interesting insight is that the long-analyzed ordered glides are apparently all within the “pre determined” bit width distance/ “programmable range”.
  • a major theme will be trying to understand “outlier” walks in the post-determined region, basically just the “disordered region” (now better qualified where it starts). from the prior plots they exist. this effort will involve nailing down/ tightening the boundaries of the mean-returning trend.
  • it almost surely has some intrinsic scale-invariant properties, such that “larger trajectories” might look like they have more divergence compared to smaller trajectories wrt certain “naive/ unscaled” metrics wrt the key transition point (unfortunately a lot look kind of naive in retrospect because so many are not scaled by or relative to starting iterate width, eg nonmonotone run lengths, trajectory indexes, max 0/1 runs, etc, although on other hand some “scattered” scaled metrics were constructed/ investigated).
  • another general theme would be looking at how well all the diverse different generation schemes, many of them regardable as outliers wrt the uniform density distribution focused here, still conform/ adhere/ fit in to this story.
  • another pov is that a mean-returning random walk is maybe about the simplest/ most basic kind of self similar fractal, and think its a very auspicious/ promising/ engaging sign to see them emerge at the heart of the analysis.
  • as for the disordered region, more substantially/ reassuringly/ impressively much of it has already been investigated, it has many known features, and its already not hard to see “how” the mean-returning mechanism may emerge from it. its obviously correlated with the ½-density, ½-entropy, short 0/1 bit run region. “just” need to look at/ identify patterns/ “downward bias” in the parity vector over this region… simple, right? (caveat: devil in details™ alert…!) 😈

yes it can overused to the point of becoming a cliche, but sometimes it really applies: ultimately this all represents a stellar/ potentially massive paradigm shift. there are so many potential avenues/ angles that this overarching pov nearly immediately uncovers/ puts into play, will write those up ASAP, some can be pursued in a straightfwd way with current optimizers. a very astute reader would be able to see some of the implications. the words of another great/ legendary investigator/ inquirer into secrets of the universe come to mind— who in his early career got a nobel prize in part for work/ analysis on random walks (which supported/ confirmed the atomic hypothesis).

it is the theory which decides what can be observed. —Einstein

⭐ ⭐ ⭐

(1/13) ❗ 💡 😀 this is a quick, nearly immediate/ instantaneous result/ gratification. all the above was a lot of words but maybe it can be distilled into some simple new ideas/ straightfwd incremental steps. my immediate idea was to come up with a new metric that measures the slope-mean-convergence of the postdetermined region and see if countering it could be maximized, ie maximizing divergence from the slope mean. what does that mean? in a sense it is amount of distortion away from a linear (perfect slope) trend. it was not too hard to construct. the idea is to fit the start and endpoints of the trend and then measure divergences (error) at each point in the trend from the linear estimate. the trend is the logarithmic value of iterates after the bit distance. its encoded in the slopediff method, inside the bitwise code. the results are in the 1st graph that measures a gradually increasing ‘m’ value. at the end of the run the largest divergence trajectory is analyzed in graph #2, ‘x’ log scaled iterate value red line, and the ‘y’ green line is the linear estimate, and ‘d’ blue is the absolute difference/ error. this is close to what was desired but the code rules out/ rejects that the divergence (by this 1st cut metric) is bounded.

bitwise18.rb

bitwise18x

bitwise18

the graph quickly reveals the issue, even somewhat already anticipated. even though visually theres a strong conformance to the trend (in a sense an “asterisk”/ “footnote”/ fineprint on the supposed refutation, ie in other words undermining its unequivocality), larger curves with the same relative divergence will measure higher divergence. so then heres a quick riff of changing only a few lines of code. this calculates the relative divergence ie prior metric divided by iterate logarithm, ie a scaled slope mean divergence ‘ms’ and then optimizes by it instead. sure enough, it looks quite stationary/ bounded. a scale-invariant measurement of the slope-mean-returning tendency, voila. apparently a key step in defractalizing already achieved.

bitwise18b.rb

bitwise18b

(1/14) ❗ 💡 ⭐ 🙄 lol had a flash/ brainstorm and it could wait until coded but then couldnt wait to share it. sometimes real substantial progress comes with just a thought/ paragraph. there was something hard to figure out about bitwise18 #2 graph, something missing, “something on the tip of my tongue,” “couldnt quite put my finger on it,” didnt quite like how the scaled metric works relative/ differently wrt the higher vs lower parts of the graphs. what is a natural measurement of discrepancy from the linear fit? oh geez (popping forehead with palm) its just utterly staring me in the face! how many times used it already? the correlation coefficient!

further brief thought/ question: there must be some straightforward/ simplified version of the formula when one of the variables is linear/ a line…? am sure its written up somewhere… it would probably be hard to locate though and am not going to go thru the trouble for now, its more just a curiosity and higher efficiency has no need here…

(1/16) ❗ 😮 😎 😀 ⭐ ❤ this is a big advance, basically a breakthru! wiring up the correlation coefficient ‘sc’ calculated in the slopecorr method was straightfwd (computed over postdetermined log iterate values), and optimizing to minimize it leads to extraordinary findings. the algorithm basically fails to minimize it as iterates get larger, ie it can (apparently) only increase for larger iterates. in the 1st run there is minimization by this ‘sc’ metric only. it does manage to compress it some (thick red dashed line) but generally “at the expense” of not-increasing iterate sizes. or rather iterate sizes increases seem “stunted” some as the optimizer tries to minimize a metric that is apparently harder to minimize for higher iterates. its not totally obvious from the graph but trajectory metrics for peak index and glide are low/ nearly flat, ie the optimization does not and/ or cannot “coincidentally/ naturally” increase them.

note the graphs are postprocessed data sorted by initial bit width ‘nl’ black left-to-right. the run ordered-graphs have quite a bit of thrashing in the ‘nl’ values. the 1st analysis looked at the lowest ‘sc’ seed and found it was not many bits, leading to the idea of the postprocessing analysis sort by bit width.

in the 2nd multi-optimization all the other key trajectory variables iterate size ‘nl’, peak index ‘cm’, glide length ‘cg’, total trajectory length ‘c’ are attempted to be increased while decreasing ‘sc’. here the results are even more striking and the ‘sc’ metric quickly climbs toward its maximum correlation of 1. occasionally “tooting ones own horn,” using the correlation coefficient is now looking like a stroke of genius. the basic conjecture is not only (empirically) confirmed but “confirmed/ even strengthened with flying colors!” spelling it out for those not up on all the details:

in short the postdetermined region is increasingly closer to a slope-mean-returning random walk as iterates get larger!

bitwise19.rb

bitwise19x

bitwise19

💡 further thoughts. these results are great fantastic but are a globally calculated bound (but necessarily with significant implications/ constraints on local dynamics, TBD). what about a more locally oriented metric? nonmonotone run lengths immediately comes to mind, am really wondering now about it, run monotonicity was originally constructed as a powerful local measurement yet also with global span. what is the max nonmonotone run length of the postdetermined iterates? looking for nearby measurement(s), the hybrid9 idea and powerful optimization from near end of last month gives/ implies some bound, it came in at 298 max ‘cmnl’; the postdetermined max nomonotone run length of iterates must be at least ‘cmnl’ (exercise for precocious reader).

but then another immediate question comes to mind. last month there was some philosophical handwringing about the “tree falling in forest, absence of evidence” problem. but just realized the prior methods have some relevance/ sign here. post-ordering the metrics found by the iterate size can imply/ (better) reveal a semi-natural or intrinsic/ ulterior/ hidden trend in the algorithm search, here of upward ‘sc’ correlation, esp if there is a thrashing effect in the bit size of the optimization frontier advance (and much less necessary otherwise due to the more natural bit width ordering). think not the 1st time used but its a valuable technique.

taking this into account then, what is the trend for ‘cmnl’ for higher ‘nl’? thats a key question of context. it could potentially decrease just like ‘sc’ increased and the 298 could be a local spike. 298 was not a very low number but the trend idea gives hope that maybe its a local spike and there is low/ tight monotonicity on the postdetermined iterates in the trend/ “bigger picture”… the extraordinary ‘sc’ trend might even be hinting and/or pointing to it…

(1/17) 😀 ❗ 🙄 more immediate gratification, on a roll lately! “good to be alive right about now!” wiring up the nonmono max measurement ‘mx’ over the postdetermined region (iterate values) is straightfwd/ not hard. the runs are as guessed: max nonmono run length red seems to be bounded as the frontier advances! in the optimization, there was not much bit length thrashing, it was more orderly, so it makes sense to just include the run graphs below. in 1st run there is optimization by ‘mx’ only, it found 247 max, but again theres a side effect of not being able to push up eg the peak index ‘cm’ green. in 2nd run there is optimization over the other glide related variables. ‘sc’ was included by accident from prior code but has no effect. it found 217 max ‘mx’ and pushed up other glide metrics more. color commentary: the old intuition about monotonicity measurement which has been typically much more elusive over the years finally bears substantial fruit attesting to its inherent utility/ value, once maybe the “backbone” or “spine” of the problem is possibly finally isolated.

bitwise20.rb

bitwise20x

bitwise20

(1/18) this is a revision of hybrid4 from last month. that experiment tried to find “typical” long trajectories from a genetic randomization approach, maximizing by peak index. it seems like it may lead to less bias than the bitwise approach. notable in these results are that the “typical” trajectories found start with long ordered triangles, they emerge even though many initial candidate iterates are very random/ disordered. that prior code was broken in that it didnt have a correct stopping criteria based on hardness, only a 3m time limit. this code fixes that and reviews the results. it cuts off the disordered tails (using same hybrid4 1st nonmonotone index logic) of the top 10 glides from bit sizes starting at 100 going to 1000 in increments of 100, graphing the max 0/1 runs inside them color coded by starting iterate size.

part of the idea here was wondering about a “typical crunch region” vs initial bit widths. expected them to scale somehow with bit size. but it seems to be significantly nonlinear. the 10x 1000 bit size iterate (yellow) did seem to lead to slightly larger 0/1 runs in the crunch region vs the 100 bit size iterate (black). but as found earlier, the tail region sizes do not seem to change in much trend; the higher bit sizes maybe are shorter from this but need more analysis to solidify that.

these results remind me of the bitwise9 and review117 results from 10/2018. those basically attempted to minimize (average) 0/1 run lengths while keeping down max 0/1 runs but found while the former can be kept nearly constant the latter tend to push/ grow upward gradually/ nonlinearly for larger iterates. its notable some similar statistics were found from the 2 much different optimization approaches (at least the rising max 0/1 runs wrt rising initial bit widths, also need to look at averages for the hybrid approach).

hybrid11.rb

hybrid11

(1/19) 💡 ❗ ⭐ so as long found/ known, theres something going on with “triangles” (connected to 0/1-lsb runs) in the crunch region. the prior code shows that some kind of natural optimization prefers initial triangles, and they were removed in prior analysis. whats left after those are removed? this code removes any initial triangle in the climb (using same logic wrt (non)monotonicity of the 0/1 runs) and then after in remaining region tries to maximize the relative 0/1 max runs ‘mx01s’ red dotted line (size of max run divided by bit size) and as suspected finds it cant push it upward even while pushing upward the other trajectory metrics as before incl starting bit width black, its apparently declining asymptotically toward a constant for larger iterates, trending to around ~0.1 in the limit. optimizing for ‘mx01s’ only consistently led to finding very high ‘mx01s’ values near 1 but failure to create glides of any length.

bitwise21.rb

bitwise21

(1/20) ⭐ the powerful optimizing framework is paying off along with the new insight into the deep/ fundamental problem structure where new insightful riffs come readily, almost easily. this quickly looks at ‘mx01’ over the postdetermined region and finds it flatlines at about ~19; it is multiplied by 10 for graph scaling/ visibility purpose. the 1st graph is optimization by ‘mx01’ only and the 2nd is by trajectory metrics and pushes them all much higher (roughly doubling them) but the basic outline/ story is the same, even better highlighting/ sharpening the ‘mx01’ bounding. in conjunction with prior graph it seems to show that the region with the largest non-initial triangle in the climb (which can increase in size as iterates increase) is basically in the predetermined region, a consistent result; triangle sizes apparently cannot be increased indefinitely in the postdetermined region.

bitwise22.rb

bitwise22

bitwise22x

this version uses ‘mx01s’ instead. it appears over the postdetermined region, ‘mx01s’ is slowly asymptotically declining toward 0 even though the optimizer is programmed/ “biased” to push it upward. in the 2nd graph optimizing by ‘mx01s’ only the algorithm cannot push up trajectory metrics much. all these results are substantial but have to be seen with the caveat that the bitwise optimizer has the limitation of fixing prefixes over time so it makes sense to confirm all the trends with the hybrid optimizer. but at this point it also makes sense to try to figure out how to construct a proof out of these substantial/ apparently/ hopefully universal properties/ trends. also, the earlier x-y trap idea now seems to fit on top of this recent framework. basically the predetermined region is the “trap” so to speak. its a new riff on the idea because its not x-fixed-sized, but it is variable/ finite-sized. it also makes sense now to define the postdetermined region as basically the same as the crunch region! some of the prior difficulty bordering on confusion in locating the crunch region is now more understandable.

the results are also apparently aligned with the earlier hybrid11 idea and the related experiments noted there. in that experiment there is presumably a rough overlap of the “starting nonmonotone region” selected with the postdetermined region and shows ‘mx01’ lengths gradually (sublinearly) increase with initial bit widths. actually the bitwise21 experiment suggests the starting-nonmonotone region somewhat precedes the postdetermined region. in the postdetermined region it looks like ‘mx01s’ asymptotically approaches 0 (this could be explained if ‘mx01’ approaches a constant in it even for higher iterates as bitwise22 seems to indicate) and in the starting-nonmonotone region it can be larger and approaches ~0.1. so at this point it looks like the variable trap idea reduces to this proof sketch/ outline, which seems aligned/ consistent with all experiments so far:

  • max 0/1 runs crunch down to a max constant say ‘mxc’ in the (ordered) predetermined region (by the end)
  • all max 0/1 runs in the (disordered) postdetermined (crunch) region are less than ‘mxc’ (aka “sufficiently mixed”)
  • limiting max 0/1 runs less than ‘mxc’ can fully account for, aka causes the (downward!) slope-mean-returning trend.
  • glide may be restricted to predetermined region or may extend outside of it into postdetermined region, but in that case only the descent side. climb is restricted to predetermined region.

bitwise22b.rb

bitwise22b

bitwise22bx

(1/21) this is a relatively quick adaptation of the hybrid code to search the bitwise22 metric without multioptimization, ie ‘mx01’ max bit runs in the postdetermined region. it ran for 1hr and then was manually terminated. results are a bit different and need more look. it is finding ‘mx01’ (here ‘z’ red again scaled by 10x) generally in the same range except for large intermittent spikes that approach ~60 and they seem to be increasing whereas the typical range plateaus around 10; its an unusual dual-trend phenomenon. this is a 2k pt sampled graph not displaying all the highest spikes. following are the 2 bit diagrams for 2 spike samples #1 and #7, bit lengths 85, 82, n= 36151301432163398945079295, 4518912679020424842969087

😳 😥 😡 in each case (0/1 both seen) the massive 0/1 runs/ “wedges” start almost exactly at the beginning of the postdetermined region, but both originating in the upper ~½-⅔. msb range. while seemingly already thwarting (“a key part of”) the basic conjecture (#%^&)—ie bullet #2 hypothesizing a limiting constant ‘mxc’—it can be said the bit lengths are relatively low (although not wrt total iterate size), and on the bright side the optimizer finding these “weird” cases is impressive. in two words spontaneous emergence. honestly am highly tempted to nickname them flying UFOs, lol! and a quick/ piercing shootdown of an invalid conjecture is actually a kind of indirect win esp on this problem where constructing/ analyzing relevant/ plausible/ formal conjectures is extremely challengingCT is grinning right now… 👿 😈 😛

it also may show illustrate/ demonstrate some limitation of the bitwise approach such that these eluded it. which by the way, musing on it more conceptually recently, another way of thinking of the bitwise approach is that it makes “small” single-bit changes/ extensions only on exactly the pre/post determined boundary. another way of thinking about the bitwise approach limitation(s) as noticed earlier is that it basically fixates/ narrows on some prefixes earlier in the optimization and cant make adjustments to the prefixes later in the optimization. as in the Terras construction this essentially “builds” trajectories starting from the earlier to the later parts, “left-to-right”.

hybrid12.rb

hybrid12

hybrid12x

hybrid12y

(1/22) was curious if maybe larger iterates had the same finding. it was easy to start the optimizer at 100 bit width instead of 10. this code ran overnight ~11hr and found a 98 width 0-run here and 97-long 1-run, so on spot checking there doesnt seem to be a bias either way. but also worth reporting is that a solid pattern appears in the mod3 diagram and also in the mod6 diagram in “roughly the same place with modified outline” (2nd/ 3rd below). the “starting black wedge” changes shape also. from mod2, mod3, mod6 the right wedge point goes from lower, to upper, to centered in both wedges. n= 1424050471526131671483742067328851476348221410770943

overall not sure what to make of this right now. but basically it looks like the crunch idea/ region is somewhat refuted at least as far as characterized by “small” 0/1 runs. this does not break the entire current hypothesis outline but once again requires some kind of substantial revision. the short 0/1 runs were basically a (long-running) idea to identify the structure of the postdetermined region from iterate “appearances”. possibly there is still some kind of alternate identification mechanism. is there some other way to view/ recognize them other than slope-mean-returning parity sequences? notably all these “anomalous” cases have large leading lsb 1-runs. but removing them does not seem to totally “take care” of the “anomalousness”. ie theres still a big gap between the two structures and roughly the size (in iterate count) of the starting bit widths between them. am next wondering about ways to (more directly) “synthesize” these embedded structures/ wedges… hint: look closely at preceding adjacent pattern, barely visible in the previous 2 bit diagrams…

hybrid12b

hybrid12bx

hybrid12by

(1/25) 💡 ❗ ⭐ 😮 this is some simple/ easy code to demonstrate the idea. its a new pattern! and shows how 0/1 runs can emerge out of even ½ density or entropy iterates. in short continuous ’01’ (pair) runs either lead to (adjacent) new 01 runs, 0-runs or 1-runs in subsequent iterates. not sure the exact rule, its not obvious, but it could be better understood by eg looking at the transducer behavior. this code takes random 50 bit width ½-density iterates and overlays/ “pastes” a random-size 01 run on top of it and looks at the pre/ post 0/1 run lengths. it graphs the deltas for 0, 1 in red/ green with the green line inverted. the iterates all start with small 0/1 runs so all the spikes represent cases that turned into long 0/1 runs.

vary7b.rb

vary7b

(1/26) 💡 ⭐ ❗ 😀 😎 still on a roll! its not obvious what to do at the moment but then came up with this natural/ general idea. the slope-mean-returning property is a very big deal and its already shown to be “tight” via bitwise optimization search wrt some key ways based on (1) correlation coefficient (bitwise19) and (2) nonmonotone run lengths (bitwise20), although need to hook up the hybrid optimizer to explore/ solidify those same metrics also. so this is very significant in that the so-called disordered region is (as earlier conjectured in a now-pivotal paradigm shift) actually apparently highly ordered in this crucial newfound sense.

order and disorder remind me of yang and yin which then reminds me of the yinyang symbol with “a dot of yin in yang and vice versa” and that old (buddhist?) aphorism about “yin and yang are like two dragons fighting”… this problem at its best is always flipping ones perceptions! but thats nearly the definition of a paradigm shift.

so whats another angle on this newfound order in the postdetermined region? at this point its almost like a shiny new toy to play with. there was an old idea a few years back, located it recently, but will have trouble finding it again, of looking at 0/1 runs in the parity sequence, and that old experiment already found some apparent limit. heres the simple concept. the 0/1 occurrences/ runs in the postdetermined parity sequence must be “balanced” somehow with 0 even occurrence slightly exceeding the 1 odd occurrences. one immediately wonders about the scale of this. it seems for larger trajectories it would be possible for these (max) runs to get bigger even as the slope correlation coefficient tightens as seen in bitwise19. but also, its not strictly necessary. on other hand the nonmonotone experiment bitwise20 shows an apparent constant bound that also actually suggests/ implies some constant limit on max 0/1 runs in the postdetermined parity sequence! the converse must be true also, ie max 0/1 runs in parity sequence limited implying bound on nonmono run length.

this code quickly confirms that is the case! ‘mxp’ brown line (scaled 10x) is easily wired up as the max 0/1 runs in the postdetermined section parity sequence and it flatlines/ plateaus maxing out at about ~20 even for increasing iterate sizes ‘nl’ light blue using bitwise optimization. the only quibble is that ‘cm’ green doesnt get pushed up much. the multivariate trajectory lengths optimization in graph #2 fixes that and is even more definitive/ persuasive, its not visually clear but ‘cm’ green hovers above ‘nl’ lightblue (as in many other optimizations and some that focused on their interrelation).

another simple way of looking at this is that its a search for postdetermined 0/1 lsb-runs/ triangles and they seem to be limited which is not inconsistent with prior observation of large spontaneous 0/1 run structures/ wedges emerging in the postdetermined region as long as they dont emerge at the lsb side. deja vu! there was a lot of earlier analysis of 1-lsb triangles not long ago but before the pre/ postdetermined paradigm shift and have to go back and review that. my current recollection was that the eventual finding was that long climbs were possible without 1-lsb runs, therefore apparently disconnecting/ decoupling/ ie “not necessitating” them from/ in the climb dynamics and then not further pursuing that line of analysis at the time. but maybe that doesnt interfere with the dynamic of 1-lsb run absence in the decent region. in other words maybe there is some basic difference between the predetermined lsbs non 1-lsb run/ lsbs “disorder” and the postdetermined lsb disorder. ofc that statement is a tautology for the single lsb (which generates the parity sequence 1-1), but maybe extends further.

oh! but notwithstanding a basic distinction. there is a concept of 0/1 runs in the parity sequence vs runs in the iterates. lsb 0/1 runs in the iterates necessarily lead to runs in the parity sequence but the converse is not necessarily true. there could be a run in a parity sequence without lsb 0/1 runs in the iterates… right? have to think about this/ nail it down a little more. ❓

another big deal: if both 0/1 runs are limited in size it means that adjacent pairs of them must be limited in size also. but this is actually extraordinary because it then implies the postdetermined parity sequence is entirely comprised of a sequence of concatenated finite-set-count “pair” patterns. this experiment even points to the (rough) finite width measurement of the pairs: 2×20 ≈ ~40 bits 😮 ❗

but this then also leads me to wonder in a wild late-night reverie/ flash. there is a sense that iterates in binary are converted to a parity sequence in both pre- and post-determined regions, sequentially/ locally computed. could it be that this iterate-to-parity-sequence mapping could be computed somehow by a FSM transducer? its a tantalizing thought that could really break open the problem dramatically if true. honestly really have no idea at moment how to figure this out. ❓ 🙄 o_O

bitwise23.rb

bitwise23bitwise23x

(1/27) this quickly hooks up the hybrid analyzer to the question of 01 max pair lengths in postdetermined parity region ‘mxps’. finally changed the code to optimize by the variable name instead of the generic ‘z’. ran it for 1 hr. sampled results are below. trying to set aside wishful thinking the results are encouraging but not extremely definitive. the optimizer finds a max 55 and the graph looks somewhat bounded. however it has a bimodal appearance of spikes vs baseline reminiscent of the recent hybrid12 code and the spikes seem to increase in magnitude. visually it looks like they get rarer but slightly higher and theres a very subtle upward trend in the baseline left-to-right, like 2 long plateaus. some of the irregularity is clearly due to the bit width bin selection trends associated with the optimizer (green) and (as usual) more investigation is needed.

another way of thinking of the optimization dynamic here as noted in some prior experiments: the increase in iterate sizes will tend to be stymied as they will go “sideways” instead if the optimizer criteria is inconsistent/ incompatible/ rarer with higher iterate sizes.

hybrid14.rb

hybrid14

(later) it doesnt cost much to run another simulation, maybe just a little electricity, and this is an apparently very meaningful/ crucial property to test/ analyze. and its important to leverage existing code/ ideas as much as possible because honestly, despite the apparent profligacy here, coding is actually in some ways a very labor intensive/ expensive activity. this starts at seed bit sizes 100. it ran ~2hr and then ran out of memory (to keep track of candidates) with about 5M candidates evaluated. the max found was (surprisingly, less!) 42. either larger maxes are not seen at the higher bit widths or they are more rare and couldnt be found. the analysis led to something to think about. noticed something a little funky on graphing and then decided to code it up in the somewhat revised analysis routine. part of the issue is that the 2k point sampling is rather small compared to total # of points generated. looked at the way presorting the large point array before the sampling effected the graph. in another case it didnt, in this case it did.

graph #1 is the unaltered ‘t’ time ordering. #2 is sorted by the optimization metric ‘mxps’ and doesnt look discernably different. #3 is sorted by initial bit width ‘nl’ and shows something moderately remarkable. the stairstep effect in ‘nl’ is due to the algorithm repeatedly finding local optima in either the same (level/ flat) or new bins (the upslanting increases) during the bin arrangements/ evolution. whats quite noticeable is that the ‘mxps’ spikes correlate/ line up with the step increases. dont have an immediate explanation for this right now. but it seems to relate to the bimodal effect already observed. and shows how there can be subtle effects revealed in superficially only slightly different analysis logic. ❓

hybrid14c.rb

hybrid14cx

hybrid14cy

hybrid14cz

(1/31) squeezing in a final analysis this month. was wondering about the distribution of binary strings in successive iterate prefixes then came up with this visualization mainly in response to the data “shape”. it analyzes two earlier distributions from bitwise9 (‘a’/ ‘mx’). the lines are colored by sequence in the generation ie basically starting bit width. it analyzes strings of size 4-10 and counts total (cumulative) unique occurences over 100 trajectory samples. the vertical scale is 2^n scaled to 1 where n is the string length. the higher string lengths push the line trends downward.

what this is showing is that for short bit strings all are quickly found in the iterates (searching the predetermined range only) but for longer strings, a smaller/ “compressed” amount of all possible strings are found. there were some other experiments along these lines (actually among some of the earliest ones written up in this blog). another way of thinking about this is that a certain “small” portion of iterate prefixes are climbs/ (sub)glides and those will be seen more in the distributions. there is also some biasing towards certain prefix strings (basically the steepest climbing ones) due to the generation algorithm. on other hand it is possible to get arbitrary substrings in the prefixes at least over the predetermined range, exercise for reader, hint, the terras 1-1 construction applies/ can be used. however there is some kind of “balancing” that must play out such that downsequences have to be preceded by at “least as much”/ “corresponding” upsequences. but actually (as the astute reader would realize) my aim was analysis of the postdetermined region, and just noticed this strong pattern/ signal 1st and didnt want to let it go without mention.

prefix4.rb

prefix4x

prefix4y

4 thoughts on “collatz mod3

  1. Pingback: collatz comb | Turing Machine

  2. Pingback: collatz revisited | Turing Machine

  3. Pingback: collatz match | Turing Machine

  4. Pingback: collatz search for overarching strategy | Turing Machine

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s