in the 2nd year of the 3rd decade of the 21st century… the search continues… for an overarching strategy!
had some ideas on how to improve the
hybrid algorithm. looked at various scenarios and then turned up this somewhat strange/ remarkable finding. afair (“as far as i recall/remember”), there is nothing so far that optimizes by ‘hc’ aka just total trajectory length. almost all optimization is glide oriented and optimizing by trajectory length is different. my initial idea was that optimizing by trajectory length would tend to imply long glides. but very much to my surprise it doesnt. dont really have a clear idea how to explain this right now.
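for context, a minimal ruby sketch of the basic trajectory/ glide statistics referred to throughout; this uses the plain 3n+1 map for illustration, whereas the actual code may use the compressed (3x+1)/2 map and different names:

```ruby
# illustrative sketch only; the real code's conventions may differ
def collatz_step(x)
  x.even? ? x / 2 : 3 * x + 1
end

# full trajectory from x down to 1 (assuming it terminates!)
def trajectory(x)
  t = [x]
  t << (x = collatz_step(x)) while x != 1
  t
end

# glide length: steps until the iterate 1st drops below the starting value
def glide(traj)
  g = traj.index { |v| v < traj.first }
  g || traj.size
end
```

eg trajectory(6) has trajectory length 9 but glide length only 1, which is the kind of divergence between the two statistics at play here.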
- (long) wanted to do a major/ large refactor but settled for something medium. this code moves the iterate generation logic into a subroutine. it would have made sense to do this a long time ago but was busy hacking away in the meantime.
- this code has some new ideas for calculating binary differences faster. it occurred to me to do binary sorts and then compare prefixes of adjacent entries. this decreases an n² comparison to n log n and it really shows up in the performance difference aka speed improvement. however, it is not turned on in this run.
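a rough ruby sketch of the sorted-prefix idea, not the actual code (names illustrative); the point is that after a lexicographic sort the maximum shared binary prefix always occurs between some adjacent pair, turning the n² pairwise scan into a sort plus a linear pass:

```ruby
# length of the shared prefix of two bit strings
def common_prefix_len(a, b)
  n = [a.size, b.size].min
  (0...n).each { |i| return i if a[i] != b[i] }
  n
end

# naive O(n^2): max shared binary prefix over all pairs
def max_prefix_naive(xs)
  bs = xs.map { |x| x.to_s(2) }
  bs.combination(2).map { |a, b| common_prefix_len(a, b) }.max
end

# O(n log n): after sorting, the max shared prefix is between adjacent entries
def max_prefix_sorted(xs)
  bs = xs.map { |x| x.to_s(2) }.sort
  bs.each_cons(2).map { |a, b| common_prefix_len(a, b) }.max
end
```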
❓ ❗ just looking at optimizing ‘hc’ turned up this effect, 1st graph also sorted by ‘hc’ left to right. the code creates noise in the lsbs and the msbs are nearly uniform. then looking at trajectories, the algorithm is selecting identical trailing trajectories. some feeling of deja vu on this, maybe have seen this before and didnt write it up. is there something exploitable here, some kind of inductive structure? if the longest trajectories are built recursively, and are somehow “unique,” that is a breakthrough finding. but ofc without much else to go on, this finding is likely due to the hybrid algorithm settling on a local minimum.
- the stopping detection/ criteria for this code was very tricky, tacked on last, and am not satisfied with it. it stops after a time count of 10x elapses since the last best iterate was found. over multiple runs it seems to either quit very early, getting stuck in a local minimum, or alternatively, take a very long time making small incremental improvements. on the other hand, a tension/ near root conflict is playing out here some; the stopping criteria cannot really “overcome” an unruly optimization dynamic/ tendency.
- another problem with the stopping criteria: the algorithm is working in an entirely nonstationary way in a sense, due to the shifting adjustment of the z-norm calculation versus the overall current best-candidate population and not over all points found. so in a sense, it cant really “tighten” against an un-fixed, moving target. have to think more about how this affects the optimization… suddenly trying to determine overall “fitness” is getting more complex. the challenge seems to point to trying to (efficiently) track/ estimate statistics over all points found, not just those saved/ retained wrt highest fitness.
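one plausible reading of the 10x stopping rule, sketched with iteration counts instead of wall time (entirely illustrative, not the actual stopping code): stop once 10x as many iterations have elapsed since the last best as it took to find that best.

```ruby
# hypothetical sketch of a "10x since last best" stopping detector
class StopDetector
  def initialize(factor = 10)
    @factor = factor
    @iter = 0
    @best_iter = 0
  end

  # call once per iteration; pass true when a new best iterate was found
  def tick!(improved)
    @iter += 1
    @best_iter = @iter if improved
  end

  # stop once 'factor'x as many iterations have elapsed since the last best
  # as it took to find that best (one reading of the 10x rule)
  def stop?
    @iter - @best_iter > @factor * [@best_iter, 1].max
  end
end
```

the same structure works with Time.now deltas for a wall-clock version.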
the graph is for a long run with 250k iterations manually terminated. the 1st graph is the binary diagram and the 2nd graph is the top 50 trajectories sorted by ‘hc’ showing the algorithm settling on an identical trailing trajectory sequence about ~1k iterates long, colored by trajectory #/ length, shorter to longest.
what is this saying? it is almost the opposite or converse of an earlier hybrid scheme that created noise in the msbs instead of the uniform lsbs, my memory is that it was a glide maximizing algorithm, need to track it down. there is some possibility that there is some kind of “conservation” going on here, such that if a trajectory has a glide, its later drain is steeper than if it doesnt have a glide, so that yes, (for a given initial bit width) the longest trajectories by bit size are nonglides… this would be remarkable if it could be shown somehow… it seems to run counter to a lot of prior intuition/ experiments!
but one must always keep an open mind! or again, the algorithm is just getting stuck in a local minimum, and its easier to generate drains than to find glides of the same distance, the more plausible explanation! sometimes, optimization involves “kicking/ turning screws” on “greedy” algorithms that also end up, not surprisingly, also being “lazy”! kind of a metaphor for human experience too, huh? 😮
(later) surveying, googling, trajectory merging was found eg on 12/2019
construct100b experiment on 1-lsb triangles. have done some other merging diagrams but not sure if they are written up, hopefully close to this (dont like to lose/ not record findings/ results at least somewhere!).
it particularly helps to do what might be called end or right-alignment of trajectories to identify/ study/ understand this property better and it would have made sense to have immediately applied it on those diagrams, have done it occasionally, but maybe the technique is more powerful/ applicable/ revealing than has been utilized so far, and am already getting ideas. eg think it hasnt been done on Terras density (rosetta diagram missed it!)
or even ½ density iterates, wdittie! *
* (oh, see that it was done for the ½ density iterates.)
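right alignment itself is nearly a one-liner worth writing down; a minimal sketch, assuming each trajectory is just an array of values and the pad value plots as missing data (eg gnuplot skips blank/ missing entries):

```ruby
# right-align trajectories by left-padding to the longest length, so that
# trajectory endings line up for plotting; pad entries act as missing data
def right_align(trajs, pad = nil)
  w = trajs.map(&:size).max
  trajs.map { |t| [pad] * (w - t.size) + t }
end
```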
cant easily recall/ find right-aligned diagrams. would like to do a reverse index of these pages by diagrams! that would be very helpful for retrospective surveys… aha, this is almost the same thing,
construct101c on 1/2020 are right aligned heatmap diagrams over complete trajectory sets of given bit widths, the 2nd is ½ density case.
- this codes main idea is to use some higher abstraction of the optimization variable settings to control the auxiliary variable computations. this is somewhat tricky because the auxiliary variables may depend on the variable z-normalization, and also lead to new z-normalizations themselves, so its something like a 2-pass application of z-norms pre and post auxiliary variable calculations.
- then it is easy to specify linearization/ extrema optimization of both ‘hc’ and ‘cg’ with the variable settings.
- this code also turns on/ applies the binary difference logic. however, as found/ seen below, the optimization of the other variables pushes against it.
- the code is missing stopping logic and is manually terminated here after about 100K iterations.
- validating/ verifying that the binary difference, here called “prefix difference,” is working ok can be/ was done by graphing that parameter separately and noting a high range decreasing for adjacent dissimilar-to-similar iterates.
graphs are sorted by ‘hc’ left to right.
- so the results are that in graph #1, strangely ‘cg’ blue (alas not easy to track in the clutter/ overplotting) seems to go through a 3-phase cycle of low, high, medium. have seen something like this before in some diagrams, feeling deja vu, but it would be a big search to find something like it…
- in 2nd graph, even though lsb-bit difference maximization is turned on, the algorithm is not able to achieve it for midrange-to-higher ‘hc’ ranges. and then there is again some strong msb bit similarity/ apparent trailing trajectory merging on the highest ‘hc’ glides.
- in 3rd graph, the low-high-medium aspect is shown over 40 trajectories sampled evenly distributed by ‘hc’ over the entire 500, color coded by ‘hc’ (index). with the new ‘cg’ attention the algorithm is now, after being more “led/ forced to,” finding long glides and they seem to be embedded in the same ‘hc’ range as in the prior experiment. immediately thinking, here again, a right alignment would probably reveal more!
❓ this dynamic/ emergent outcome is quite repeatable over runs. overall it appears to suggest (1) the midrange glides lead to longest trajectories, (2) and that glides of various lengths can be found over trajectories of various lengths. are they independent or dependent? this data seems almost to support both povs. it makes me think about doing a 2-way linear optimization of glide length + trajectory length, what would that look like?
❓ also on thinking about this, it appears to me there are 3 main “characters” or regions in the glide: the left climb, the right descent, and the trailing drain. how are they related? is this anthropomorphic bias to see these? what is the “natural” way to look at trajectories? is the whole concept of a glide a human-centric projection? is it more just a random walk with a predetermined range and a postdetermined range? as usual struggling to wrap brain around the findings/ deeper implications/ ramifications. sometimes it all seems utterly slippery, with nothing to hold on to.
it is a really trivial basic exercise to do these alignments and should have built that basic analysis code long ago, wdittie! but actually, did think of it. the code/ data could obviously be streamlined by outputting the data only once along with the offset columns and replotting the “same” data each time with different commands (column offsets), but oh well, whatever. 😀
this is alignment of the same 40 glides by ‘cm’ glide max, ‘cg’ glide length, ‘c’ (right aligned), revealing some new povs! it was already somewhat apparent but this all emphasizes the varying slope of the drain. havent really thought about this a lot before and it needs some rethinking. the 3rd graph further shows/ highlights the highest “curvature” (initial glides) on the intermediate length trajectories and shows a familiar wedge shape just cited/ recalled in
construct101b/ construct101c. however, those somewhat missed the “inner/ embedded curvature” aspect more apparent here due to their different approach/ angle/ focus.
looking/ staring at the graphs some led me to suspect that
cm1, cg1, global peak/ glide lengths might lead to different results, but they dont, visually they are not distinguishable.
❗ 🙄 😮 holy cow that last graph really reveals a lot, did you catch it? actually it seems to be a confirmation/ validation of the basic proof idea long pursued. it appears it is showing there are roughly 3 types of trajectories that emanate from the “core” density/ entropy region:
- those in “middle” (horizontal) of “middle” (vertical) that glide roughly horizontally for some limited amount of distance before draining in a “midrange slope” drain.
- then there are those that “drain relatively quickly” on the left and right sides, that is, more precisely, enter the drain quickly, and have more (right) or less (left) steep drains.
from prior binary graph #2 (
hybrid53b) the left side long drains tend to have similar binary suffixes, and the midrange glides/ midrange slopes tend to have similar binary prefixes. the idea of steepness of drains has been around since the beginning but havent really thought about it carefully, need to reorient on this now. again, as in prior
hybrid53 experiment, the longest trajectories by initial bit width tend not to include glides, but instead are the milder-slope drains.
the hardest cases to deal with are the “midrange” trajectories which lead to glides, the left/ right ranges dont lead to glides. so then it appears two key remaining, missing elements need to be isolated around the problem, largely already long sought but seen here in a new form/ light:
- a midrange trajectory must have some kind of parameter/ feature that “runs out/ counts down/ decreases” as the trajectory progresses, aka a (long sought) counter concept, ruling out the divergent case. as has previously been analyzed in many other contexts, it does appear here some of these glides might be significantly longer than the predetermined range (here 200).
- the midrange trajectory can “bounce around” some in its relatively narrow glide range, but it cant repeat hitting any points in that range, to rule out the cycle issue/ case. this question/ dynamic is similar to the “pigeonhole principle” found in math/ CS proofs/ algorithms etc
and it can be seen how some of the complexity in the analysis arises, a lot of prior analysis does not discriminate between these different cases of “left, midrange, right” (wrt prior diagram). it seems to be a key new conceptual/ organizational structure/ classification that sheds a lot of insight, and hopefully as always, exploitable leverage.
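the no-repeat condition in the 2nd bullet above is trivially checkable on finite data; a toy sketch:

```ruby
require 'set'

# a glide that revisits any value in its range would imply a cycle,
# so the no-repeat condition amounts to tracking visited values in a set
def revisits?(traj)
  seen = Set.new
  traj.any? { |v| !seen.add?(v) }   # Set#add? returns nil if already present
end
```

ofc the actual difficulty is ruling out repeats over the infinite range, not detecting them in samples.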
(later) this code builds on a subtle seeming glitch that can be seen in prior graphs. in prior graph #1 ‘hc’ red starts out at ~3 on the left side right underneath the graph legend. wait, what? is that right? can it go lower? what about the other side, the max? the algorithm is trying to maximize (extend/ push) the ‘hc’ extrema but ends up possibly throwing out low/ high values; its avoided but not directly ruled out. oops! but its not at all trivial to work around this.
💡 my idea is neat, am psyched how it came out, am calling it something like “pinning.” possibly somebody has done something like this before, but its still cool/ novel-looking; it reminds me of patent criteria: useful, novel, and non-obvious. the idea is the algorithm can track points that are at the extrema and never discard them if they are extrema.
however, much easier said than done. this has to be done in an online, incremental algorithm to save on linear-time complexity of looking for mins/ maxes over the collected data set. and also, there may be multiple instances of a single extremum, ie multiple cases of the same value, and its ok to delete nonunique instances. so this leads to some fairly sophisticated “online/ incremental” code that tracks multiple (nonunique) extrema with dynamic insertion/ deletion! cool stuff! its not very much code, but not easy to get right! the sophisticated optimization construction set/ toolbox continues to expand/ evolve… 😎
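a simplified sketch of the pinning idea (illustrative names; the min/ max here are linear scans over keys, whereas the real online/ incremental code would keep a sorted index to avoid that cost):

```ruby
# sketch of "pinning": a multiset that refuses to discard the last instance
# of a min or max, while nonunique extrema instances remain deletable
class PinnedSet
  def initialize
    @count = Hash.new(0)   # value => multiplicity
  end

  def add(v)
    @count[v] += 1
  end

  def min
    @count.keys.min
  end

  def max
    @count.keys.max
  end

  # nonunique extrema instances are fair game; unique extrema are pinned
  def deletable?(v)
    return true if @count[v] > 1
    v != min && v != max
  end

  def delete(v)
    return false unless deletable?(v)
    @count[v] -= 1
    @count.delete(v) if @count[v].zero?
    true
  end
end
```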
- so then the algorithm automatically pins any variables that are being extrema-optimized. tracking when the prior code was doing that reveals that it does regularly throw out extrema.
- full disclosure however, the prior algorithm was still working very well, eg maybe it was mostly only throwing out nonunique extrema anyway, and the new code doesnt seem to change the final extrema too much, but it does offer a better/ strict guarantee if retaining extrema is considered crucial. (so take that, hah!)
- since the last graph(s) showed both prefix and suffix similarity, there is also some code to do bidirectional bit differences by sorting by both binary prefixes and suffixes, and the bit difference is calculated as the minimum of both.
- this adds linearizing/ extrema optimizing on ‘cm’ to the mix
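a rough sketch of the bidirectional bit difference, assuming “difference” means bits beyond the shared prefix to each entry's nearest sorted neighbor, taken as the minimum over the prefix-sorted and (reversed) suffix-sorted orderings; the difference definition here is my own illustration, not necessarily the actual one:

```ruby
# length of the shared prefix of two bit strings
def common_prefix_len(a, b)
  n = [a.size, b.size].min
  (0...n).each { |i| return i if a[i] != b[i] }
  n
end

# per-entry difference to the nearest neighbor in lexicographic sort order
def nearest_diffs(strs)
  order = strs.each_index.sort_by { |i| strs[i] }
  d = Array.new(strs.size, Float::INFINITY)
  order.each_cons(2) do |i, j|
    lcp = common_prefix_len(strs[i], strs[j])
    diff = [strs[i].size, strs[j].size].max - lcp   # bits beyond shared prefix
    d[i] = [d[i], diff].min
    d[j] = [d[j], diff].min
  end
  d
end

# bidirectional: min of prefix-sorted and suffix-sorted neighbor differences
def bidirectional_diffs(xs)
  bs = xs.map { |x| x.to_s(2) }
  pre = nearest_diffs(bs)
  suf = nearest_diffs(bs.map(&:reverse))
  pre.zip(suf).map(&:min)
end
```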
the code has stopping criteria but this was manually terminated after 200K iterations due to the apparent endless incremental improvement scenario playing out. as already hinted/ noted/ admitted with the extrema, the overall final results are not too much different, much of the prior noted/ described patterns are identifiable/ repeated, eg the prefix/ suffix bit similarity/ trend is still present even with the bidirectional calculation; the other pov/ side is that repeatability is good. the only graph with much difference is the right-aligned trajectories and it shows the algorithm successfully manages, as coded/ “directed,” to find slightly longer/ higher glides. it suggests again that finding glides is hard(er) and the algorithm greedily/ lazily tends to avoid it unless pushed to.
(later) scanned thru a few years of images on wordpress, it does have a by-date image browser/ index, very useful! and am a little shocked that, assuming its complete (have to figure that out/ verify asap!), a particular result wrt that is vivid in my brain seems never to be written up in particular/ detail. the closest experiment seems to be
hybrid3 from 12/2018 that optimizes by ‘cm’ and then finds long 1-runs. it seems not to realize it; my suspicion is the 1-runs are all 1-lsb triangles.
(later) this is basic code/ exercise to optimize by
cm, cg separately in that order alternately and gets similar results across the cases; have been aware of these patterns for awhile but didnt write up and its included now for completeness. it shows the pattern where the algorithm chooses/ settles on long lsb 1-runs and a kind of “dripping matrix dust” in the msbs. this was seen previously/ recently (
hybrid45, hybrid46 on 10/6) but density was restricted to ½ in that prior case and here its unrestricted. there is a slight change from prior code. this terminates on 50k iterations.
the trajectory graphs are aligned by the optimization variable. in the 4th graph theres no visible correlation between glide length and drain length however theres clearly a trend in the glide where glide max is earlier for longer glides. for ‘cg’ the left-aligned graph also has a lot of pattern so its included here also, it shows the post-peak slopes tend to be fairly even, esp in the glide, and then tend to spread some post-glide drain. the 3 graphs are evenly distributed 40 trajectories but note the color coding/ sort sampling is by glide lengths in all cases (oops).
(later) 🙄 😳 😡 lol! found it! not going totally crazy after all! it was
hybrid24 on 12/2019. the wordpress thumbnails displayed the image cropped as blank/ empty/ white so didnt catch it, also forgot it reversed the black/ white bit coloring scheme. it was horizontally instead of vertically aligned (or vice versa). so now the documentation is “extra complete” lol … actually, trajectory plots are new for this and showing ‘cm’ hybrid optimization works similarly to ‘cg’ wrt 1-lsb runs has probably been semi observed but not noticed exactly, ie semi overlooked on
(1/2) ❓ pondering the feature-map outline idea. the prior ideas seem to show a possible wrinkle/ gap/ limitation. the feature-map outline somewhat seeks to convert the problem to a continuous one. however, there remains the key question of trying to avoid/ eliminate/ rule out cycles. the feature-map outline had some idea of converting the continuous states to discrete ones and then analyzing cycles in the discrete states like cycles in a graph. so which is it, continuous or discrete?
(1/3) at this point theres a real risk of repeating code, alas, thats a bit cringeworthy, at least if it took repeated coding, because human attn/ concentration is definitely the key scarce resource. ofc a lot of code is already repeated with low effort wrt copy pasting aka copy pasta. think the multiple glide alignment idea is buried in some old code and now its redone. am doing a lot of behind-the-scenes analysis also. another challenge is trying to write up all significant results, some are now lying around. a lot of “review” code was built up in the last 2 months, some of it throwaway, some of it not, spent quite awhile trying to review the review code, its over ½ dozen separate files now. wrt posting the last major new features were constructed in
review170b on 11/2020. after the new hybrid data is generated from
hybrid53c, wanted to do a quick analysis. really need to refactor a lot of code into a general “feature generating/ tracking” system, thats long needed/ overdue.
here is a neat idea that was lying around in code for weeks, probably intended to write it up due to its signal, and then got carried away with other stuff, so its almost collecting some dust at this point, and narrowly missed getting buried in the sands of time… looking back at the code it was ½-baked and left off in something of a highly commented/ modified/ nearly incoherent disarray, but vaguely/ even hazily/ dimly recalled getting some interesting signal, but could barely remember, so tried to reconstruct it some. got into a groove/ zone/ flow and managed to resuscitate it last nite after hours of careful attn.
so this is highly refactored and collects/ consolidates a few dozen signals into a single program, most of them seen before in some way or another. one challenge is that all the signals tend to vary in min, max, average, spread and graphing them all can be tricky. in the past, have sometimes normalized ranges, but here that can change the signal some. so in this case, went with collecting all the signals that have similar ranges into separate graphs. this has 10 separate graphs. it would be fun to display them all and they look interesting but its also something of a massive clutter at this point. for now, am just posting this neat new analysis that almost got passed by. the graphs come in pairs with the 2nd with a 100-count running average that really helps find trends.
there is some similar approach in years past but its been quite awhile. its a straightfwd/ basic calculation but leads to a lot of emergent aspect. this looks at the length of 0/1 runs and their positions in the iterate, and color codes by the run length, shorter to longer lengths cooler to hotter, with a dot at the starting run location. there are different filtering approaches but this one turned on selects top 15 1-runs. there is some remarkable/ complex pattern extracted here, eg am esp noticing the darker scattering in the msbs. again this analyzes the output of
hybrid53c. there are some other related numerical signals building on it; apparently theres a lot to be extracted here relating to 1-run distribution within an iterate.
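the run extraction itself is straightfwd; a minimal sketch of pulling 0/1 runs with their positions out of an iterate and filtering the top-k 1-runs by length (names illustrative, k=15 in the filter described above):

```ruby
# extract 0/1 runs from the binary form of x, each with bit value,
# starting position (msb side = 0), and length
def runs(x)
  pos = 0
  x.to_s(2).chars.chunk_while { |a, b| a == b }.map do |chunk|
    r = { bit: chunk.first.to_i, start: pos, len: chunk.size }
    pos += chunk.size
    r
  end
end

# keep only the k longest 1-runs, mirroring the "top 15 1-runs" filter
def top_one_runs(x, k = 15)
  runs(x).select { |r| r[:bit] == 1 }.max_by(k) { |r| r[:len] }
end
```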
(later) 😎 ⭐ ❗ this is some remarkable analysis, have wanted to do something like this for a long time. it collects all the features into a prediction engine. there are some features that are based on parameters, in this case bit 0/1 and 4 separate filters, ie 8 total variations of 10 variables each, total 80! there are 33 unparameterized features for a grand total of 113!
then regression analysis is applied, but oops, that is much harder than one might expect because (this is not so surprising!) many of the features are linearly dependent. so how to deal with that? so much for manual intervention at this point. this code has a programmatic, systematic way that is not extremely sophisticated but works/ does the trick.
it looks at the correlation between each variable and the prediction variable, and sorts/ ranks them, and uses the best correlated variables, but only if they are separated by some minimal difference in the correlation. this works because linearly dependent variables will have nearly the same correlation with the prediction variable, but the converse is not necessarily true; if they have the same correlation with the prediction variable they still might not be highly correlated with each other, in which case this algorithm might miss some additional helpful signal.
this still leaves many variables to work with, a few dozen in each case. a more sophisticated analysis would look at the n2 correlations between variables, to deal with the case where two variables have similar correlation with the prediction variable but are not correlated with each other, which would be excluded, but this is acceptable for now, not sure how much the latter more computationally expensive calculation would improve fit right now. there are diminishing returns with adding variables/ signal, and here generally a lot of the combined signal comes from a few of the variables.
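a toy sketch of the correlation-spacing selection heuristic described above (the gap parameter and all names are illustrative): rank features by absolute correlation with the target and keep one representative per “band” of nearly equal correlations, on the heuristic that linearly dependent features land in the same band.

```ruby
# pearson correlation of two equal-length numeric arrays
def pearson(xs, ys)
  n = xs.size.to_f
  mx = xs.sum / n
  my = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sx = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy = Math.sqrt(ys.sum { |y| (y - my)**2 })
  cov / (sx * sy)
end

# keep best-correlated features, skipping any within 'gap' of the last kept
# one, since likely-dependent features have nearly identical correlations
def select_features(features, target, gap = 0.01)
  ranked = features.map { |name, col| [name, pearson(col, target).abs] }
                   .sort_by { |_, r| -r }
  keep = []
  last = nil
  ranked.each do |name, r|
    next if last && (last - r) < gap   # too close: treat as likely dependent
    keep << name
    last = r
  end
  keep
end
```

as noted, this can also drop genuinely independent features that merely tie in correlation; thats the acknowledged tradeoff.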
the prediction is on the basic glide/ trajectory length statistics
c (hc), cm, cg. the regressions achieve beyond very respectable 0.54, 0.65, 0.64 correlation coefficients, bordering on impressive! its also very interesting that individual variable correlations never exceed ~.40, which certainly shows the complementary power of variables wrt linear regression, a sort of statistical equivalent of the rare outcome where in some sense “the sum is greater than the parts.” a glance at the variable selection shows that nearly all the variations are being used in a scattered way. so overall its a highly sophisticated signal extraction system tuned for collatz from years of focus.
there is a massive amount of data to look at, eg one can sort by errors, absolute errors, or even predictions on each variable and look at those graphs for significant insights into the weaknesses/ “blind spots” of the model and how to improve feature “coverage.” theres already so many visible trends its hard to know where to start, and overall it even looks like a whole new analysis angle/ lens/ array/ pov: informative/ information-packed, but time consuming! clearly, in most/ many/ almost all cases, the regression tends to cause errors to distribute into certain patterns, which are probably also striking hints about how features both “work” and “dont work.”
could easily paste more than a half dozen graphs, but am going to limit it for now to these 3 diagrams which reveal an intriguing/ standout/ maybe key/ somewhat strange trend that maybe/ apparently reveals some kind of intrinsic, central limitation. each prediction variable has a very distinct ceiling in its predictions. not sure exactly how to interpret this but its as if theres a “horizon/ boundary/ wall” that the predictions cant overcome/ “see past.” this can be seen in prior graphs but its a bit striking for it to arise here with each and every prediction variable. its almost as if the data itself is pointing to some kind of fundamental unpredictability… ❓
the graph ranges could be improved and are compressed due to the manually switched off plotting of the other trend lines filling the then-empty ranges/ space. in the 3rd/ last case ‘cg’ it apparently corresponds exactly to the postdetermined range. in the 2nd case, it seems to be about/ very close to ~½ the predetermined range, 100. in the 1st case ‘hc’ while still quite distinct its harder to understand what the boundary is relating to, turning on other variable plots, it doesnt seem to relate to the other 2. idea: looking at the trajectory alignments again, it does seem to be about ½ way between the shortest and longest glides by length count…? ❓
💡 it might be/ would be helpful/ illuminating to try to plot the predictions on the real trajectories, wrt the different alignments.
(1/6) wrote a bunch of code to look at some modifications
- looked at how the predictions varied over trajectory plots and did not find clear patterns there.
statsample is handling a lot of variables/ datapoints well/ quickly, in some cases a few dozen variables x 1K points (impressive!), but very unhelpfully doesnt give any hint whatsoever on which specific vectors are “causing” the linear dependency… @#%& 😡 🙄
- did pairwise correlation comparisons between all the feature variables and tried to use the results to fit more variables, and the outcome was very marginal in comparison. was able to fit a few more variables but with no change in correlation, and it could mess up the algorithm quite a bit such that it would return all infinite coefficients without error!
- and so overall it was hard to figure out whether it would fit or not (linear dependency error) based on the correlations, could not figure out a clear scheme by hand/ experimenting.
- so it appears (1) a lot of variables are already included, ie in the sense that maybe almost all that are helpful are selected, and (2) maybe the regression code is a little too sensitive to linearly dependent vectors.
- from distant memory recall there are special linear algorithms to deal with nearly dependent vectors and maybe this code (
ruby statsample) is not implementing it. it would take awhile to figure all that out, ie only by trying out new (specialized) regression algorithms. and it might only lead to marginal improvement.
- looked at how the model works on ‘hc’ prediction on “other” points. it is easy to look at 2nd and 3rd iterates. it looks like the regression correlation decreases about ⅓ to ~0.35 on the 2nd iterates of the 1k trajectories without refitting and goes down to negligible on the 3rd point. refitting the 2nd point gives similar ~0.50 correlation but over different variables. so with all this there is some question of overfitting.
- another idea is to look at the combined prediction accuracy of two consecutive iterates ie 1st and 2nd, ie trying to predict ‘hc’ as the average of both. without refitting the 2nd, correlation drops “over the two” slightly. this ofc is unsurprising but there was maybe some magical thinking going on here that rerunning the model over subsequent iterates could give better prediction accuracy, which is possible if prediction error is not correlated.
thinking over the last graphs, another interpretation is that the algorithm is tending to make predictions near the average, and when they are all lined up, it looks similar to a ceiling. another pov is that it tends to have better predictability/ signal on lower to midrange iterates and then just guesses a not-widely-varying “high” for the high iterates. this might tend to suggest the features help understand the predetermined range, if it has the major influence on the overall trajectory, and maybe its in the postdetermined range that the feature signal is more lost.
(1/7) 💡 ❗ ⭐ 😎 😀 the correlations for
review176 are great, maybe “too good to be true.” quite remarkably they are using signal from only single iterates. is any of this due to an unusual kind of overfitting related to lots of variables to choose from? some of the prior experiments were suggesting to me maybe the model was fitting on artifacts of the generation data again, in the way it was decaying with error very substantially, almost totally, over subsequent iterates, and choosing different variables to fit on each time. and, there was already the issue of binary differences in the iterates known to be “compromised” by the generation algorithm; from graphs/ analysis, higher ‘hc’ iterates have lower differences even as the optimization stabilizes. again in a single key word, originating from ML + data science, its all about generalization.
so, what to do… next? after some very heavy lifting on the signal analysis/ extraction side, looks like time to tighten the entropy/ disorder of the generation side again. its adversarial algorithms again with the developer switching black/ white hats so to speak.
hybrid53b was adjusted again, in a slight way. this version runs 5 times, and has a 50k iteration cutoff, and due to the more efficient binary difference calculation, overall this took maybe only 20m! theres a lot of logic to look at the incremental improvement as a stopping detection indicator, but couldnt find anything consistent/ satisfactory, and there are hints the optimization isnt changing too much after 50k iterations.
then, the 5 x 1K data files are merged using a fairly sophisticated criteria. the code recomputes the binary differences over the entire set (instead of the 5 separate sets) and filters by all binary differences exceeding a minimum. then, it advances the iterate, also to “thwart” any bias/ artifacts in the binary structure of the initial iterates found by the algorithm. looking at the final distribution, the results come out looking almost the same as the
hybrid53c diagram again.
this did indeed decrease the fit correlation substantially from prior higher ranges into the ~.30 range, which is not so much signal. how to counteract that? feature averaging over consecutive iterates hasnt even been employed yet, and so it was plugged in. this logic uses a clever/ elegant idea of an anonymous routine/ lambda to polymorphically compute feature averages, along with a really great, truly reusable reappearing routine called
avgs that very easily computes the averages over identical “named variable matrices” ie matrices with named columns, using hashes.
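a guessed reconstruction of an avgs-style helper over “named variable matrices,” here represented as arrays of hashes mapping column name to value (illustrative only, surely not the actual code):

```ruby
# average the columns of a "named variable matrix": an array of rows,
# each row a hash of column name => numeric value
def avgs(rows)
  sums = Hash.new(0.0)
  rows.each { |row| row.each { |k, v| sums[k] += v } }
  sums.transform_values { |s| s / rows.size }
end
```

the feature-averaging lambda idea then amounts to passing this (or any other reducer) polymorphically over windows of consecutive iterates' feature rows.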
⭐ after some experimenting and tweaking various glitches, eg adj linear dependency spacing parameter, and adj some feature calculations occasionally not being computed due to being absent etc (which plays out as code trying to get statistics on empty lists etc), come up with these awesome results!
- ❗ ⭐ with relatively low 20 count feature averages, the code finds spectacular correlations of 0.73, 0.87, 0.89 in
hc, cm, cg. this is probably the point of diminishing return however; signals do not go up much more with 30 count averages.
- ❗ ⭐ very significantly, the code is consistently choosing/ fitting the same variables for
hc, cm, cg:
mx, a2, ma (with slightly different parameters) are consistently in the top 4 variables. this points to generalization and a fundamental property of the collatz mapping as opposed to “merely” finding signal “artifacts” in the generation algorithm. it seems to be fundamental characteristics of glides and/ or “mixing” (not?) found in drains…
- the variables chosen indicate its the longer 0/1 runs relating to signal, again some affirmation/ vindication of longstanding hypotheses/ approaches/ findings/ hints/ clues. further validation, theyre also visually related to
- for these high correlations, somewhat unlike the lower ones, the code is squeezing most of the correlation from the top 3 variables, because the overall fit correlation is very close to the correlation of the top variable.
- there is a subtlety/ wrinkle already called out a few times, seamlessly surmounted by the algorithm: advancing iterates in glides eg with the merge/ feature averaging operations significantly changes the local calculations of cg, cm due to the “falls in climbs and climbs in falls” aspect of glides, spec the 1st case of the 2. this has the effect of pushing the algorithm to compute/ predict the global cm, cg values, presumably harder to calculate than the local values…
⭐ ⭐ ⭐
(1/8) (switching gears) sometimes it seems there is nothing to work on, other times it seems there are so many leads to follow up on and dont know where to start. lingering in the back, or verging nearer to the middle, of my mind for awhile now: the mod3 analysis of iterates was found to have significant signal on 1/2019 but then got into other stuff. it also shows up some in backtracking ideas eg backtrack6b, backtrack9b on 5/2019. was thinking about it some more and awhile back (~1wk ago) found these very basic but high-signal trends but didnt write them up at the time. theres a “½-life” for going back/ trying to remember stuff like this so am writing it now before any more neural/ neuronal decay.
1st, this simple code compares 3 basic sequences x, y, z. 1st ‘x’ red is a Terras 0.64 density sequence, 2nd ‘y’ green starts with a random ½ density iterate, and 3rd ‘z’ blue is a synthetic sequence with the same iterate density as the 2nd ‘y’ sequence. the graph is the cumulative ratio of mod3=1 to mod3=2 counts, only over the predetermined range ie initial bit size/ sequence count 200. it consistently finds a difference between all of them. also not shown, mod3=0 tends to disappear early on in sequences, and the synthetic sequence, as one would guess/ expect, has mod3=0 with ⅓ probability along with ⅓ probability for mod3=2.
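to make the setup concrete, a minimal sketch of the tally (the actual code differs; the semicompressed map and the mod3=1 to mod3=2 ratio are as described above):

```ruby
# cumulative mod3=1 / mod3=2 count ratio over a semicompressed collatz
# sequence ((3n+1)/2 on odd, n/2 on even), as in the graph described
def mod3_ratio(n, steps)
  counts = Hash.new(0)
  out = []
  steps.times do
    counts[n % 3] += 1
    # ratio so far; 0.0 until the first mod3=2 iterate shows up
    out << (counts[2] == 0 ? 0.0 : counts[1].to_f / counts[2])
    n = n.odd? ? (3 * n + 1) / 2 : n / 2
  end
  out
end
```

eg `mod3_ratio(27, 200)` gives the plot-ready series for a single seed; the graph averages such series over the 3 sequence types.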
the mod3 value is known after a 3n + 1 operation for the uncompressed collatz mapping, but what about the semicompressed mapping associated with the Terras decomposition and used for ages now, ie after a (3n + 1) / 2 op? this seems to be nontrivial and to relate to the statistics of number theory and maybe prime factorization etc…
on cursory look this seems to be very deep (yes those two contrasting or seemingly incompatible things sometimes go together at least this moment— which reminds me of pascal trying to make his long letter short and failing due to time, or maybe the surface vs depth iceberg phenomenon, lol) and probably/ really should have focused on all this more a long time ago. in a way, it seems even more fundamental than density analysis which was discovered literally years ago now. should this seemingly straightfwd property be easy to prove? yes, it seems so, but its not immediately obvious to me how to do/ show it, its the kind of thing youd think other mathematicians would have noticed, and yet wonder if its even been published anywhere…
then its natural to question other ideas, and in my case it was wondering about the mod3 behavior of the 1st postdetermined iterate, for either changing Terras density glide ‘x’ counts, or changing density starting iterate ‘y’ counts. the results surprised me some; using 100 bit width iterates/ sequence lengths over 500 Terras/ density increments and plotting cumulative counts, the 1st postdetermined iterate still has significant mod3 signal difference from a changing initial iterate density trajectory, esp compare lines x1, y1 (Ed. but see clarification/ fineprint below, not all sequences are “full length”). this also shows that for basically any sequence, ie either case here, the mod3=0 case is “quickly” (at the beginning) suppressed by the collatz mapping function, x0, y0 flatlining.
this is also another basic idea. already one is starting to wonder if the mod3 values are correlating at all with climbs vs falls. after years of chasing seemingly similar ideas that turned out to be lukewarm/ borderline/ sketchy signals it would seem like just too good of luck for this to be the case, but the (number theory) stars are aligned or the gods are smiling at the moment!
this is just 2 different trajectories, one starting from a ½ density 100 bit width iterate, the other a 0.85 Terras density glide starting at 75 bits, which causes its peak to be around the same 100 bit width, and then color coding by the mod3 value of each iterate, 1 red and 2 green. clearly, even strikingly, the mod3 value is a very strong predictor of climb vs drain even outside the glide (theres a fully green mini/ intraclimb about ⅓ into the ½ density iterate drain from the left) eg climbs are mostly green with scattered red and the drains are roughly alternating. now years looking for a basic climb vs fall indicator, and… could this hold in general? it appears to be the case… ❓ 😮
😳 (oops) already some wrinkle/ clarification needed on the prior experiment, but also along with more signal extracted. working with Terras density glides, this shows instead that the mod3 signal is almost perfectly aligned with the predetermined range only, ie unlike a lot of other scenarios seen, a basically “instantaneous” transition point, and linearly (anti-)correlates with the parity density/ climb slope.
now think the prior idea from construct184c that there is some remaining signal residue eg in the 1st postdetermined iterate is (apparently) incorrect, or at least somewhat misrepresentative/ misleading. that code is biased in a way: for short trajectories, the 1st postdetermined iterate concept breaks down, because the entire trajectory is shorter than the iterate bit width, although do maybe have to go back and poke at it some more to recheck/ confirm this. (this is not the 1st time for that type of oversight/ miscalculation, although on further thought maybe its only affecting the left side of the graph.) the prior code chooses ‘0’ for this case of “nonfull trajectories” (line 125)— racing thru this some, & nevertheless wow, even this relatively “simple” stuff has “gotchas”…
instead this code with 200 bit width seeds and 500 Terras density increments looks at mod3 signal over the postdetermined range for 50 iterates if the trajectory is full length, red, and skips plotting it if it isnt. it also looks at the signal over the last ¼ of the predetermined range, green. it has more variance due to ¼ the set size but is the “same” as the full range, blue. emerging/ revealing a striking, essentially perfectly linear signal. actually, looking closer, it does appear there is further signal in the variance of the 50 point green signal at middle vs tail ends: higher in the middle, smaller on the ends. but also, a corresponding middle vs outer variance stretching for the blue signal. hmmm… ❓
(1/9) 😳 (sigh, lol!) that worry about “full length” sequences turns out to be spurious. the prior code was considering “full length” to be the bit width plus the 50 extra count. on the other hand, every w bit width iterate will have at least w iterates in its semicompressed collatz sequence, proof left as exercise to reader. so construct184c was working as intended. however, this is another basic factoid that is not at all obvious merely from looking at the code. much like math formulas in that way… as always, code is a language that sometimes or often requires careful thinking to grasp its nuances/ subtleties/ intricacies…
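for the record, a quick empirical check of the factoid (the one-line reason: each halving step strips exactly one bit and a (3n + 1)/2 step never shrinks the iterate, so reaching 1 from w bits takes at least w - 1 steps, ie at least w iterates counting the seed):

```ruby
# count iterates in the semicompressed collatz sequence down to 1
def seq_len(n)
  count = 1                          # count the seed itself
  while n > 1
    n = n.odd? ? (3 * n + 1) / 2 : n / 2
    count += 1
  end
  count
end

# every w bit iterate has at least w iterates in its sequence;
# powers of 2 are the tight case (pure halving)
(2..500).each { |n| raise "fail at #{n}" unless seq_len(n) >= n.bit_length }
```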
(1/10) some brief snippet/ flash of “near dialog” in the comment section yesterday/ today with the elusive/ returning Anonymous, who has some ideas about modular arithmetic, maybe has been seen before around here, maybe not, and maybe can use it to analyze some special case(s). ok, thx for the suggestions! at this moment it feels like Anonymous is semi bluffing about his/ her knowledge of math, having very little to say on direct response/ challenge, and says
Simple enough to see without any computational experiments?
❓ ❗ quite a query there, nothing to take personally, no reason to have chip on the shoulder, except—
😡 👿 😈 wtf?!? @#%&! fighting words! after years of (background/ scattered) troll comments on this blog & elsewhere… almost a taunt there, verging on trolling! oh, wait, Dear Anonymous, have you really been reading this blog at all? lol! oh yeah go ahead and innocently, blithely, (in?)advertently(?) question its raison d’etre! which, plainly stated, wrt this problem, is that maybe some math problems are so hard they cant be solved by the human mind alone, and the answer is to use machines/ machine learning/ data science, etc as tools! because mathematics itself is a tool, is it not? maybe collatz is 1 of those problems! or even a foremost example!
actually, the underlying goal is to expand the “tools of mathematics” in a novel way… this overarching ideology has been questioned/ even attacked over the years by kneejerk trolls, but they have no serious evidence against it, because none really exists, and in fact, there is mounting evidence in favor. the/ any supposed case against it is baseless, biased, actually “anthropocentric thinking/ bias”.
it may sound a bit radical, yet that idea is not actually all that bold or controversial, because the idea has arguably been in the back of the mind of everyone attacking collatz now for ~¾ century, even though it maybe hasnt been expressly written out, although odds are in some of the computational papers, similar sentiments are expressed.
from Lagarias written history, computational experiments came relatively early with this problem, at least maybe the 1950s; it was probably analyzed nearly as soon as universities had computer access… although informally, because there was not a precedent of publishing computational experiments— so running computers on collatz cases has been almost a guilty pleasure, almost an irrational taboo! and that, my opinion, is probably due to some psychological bias/ sociological groupthink (the two are interconnected)…
but, all that does lead me to a basic idea. what is being studied here is the “triplarity sequence.” it seems to have some similarity to the parity sequence discovered/ analyzed by Terras. but it is constrained in a different way. the Terras proof shows that one can construct (semicompressed) collatz prefixes with arbitrary parity sequences, ie there is a remarkable 1-1 mapping derivable.
but as Anonymous is pointing to (re taunting + the stopped clock principle, lol!), and my experiments seem to indicate, the triplarity sequence is not so unconstrained. there seems to be some “semiconstrainedness” going on here, a concept that has been explored in the pre vs postdetermined concepts. apparently some triplarity sequences can lead to (“customized”) glide prefixes, others are not possible. and, analogous to the Terras analysis, there is probably a general/ recursive algorithm to find them. so, what is it?
honestly the Terras construction took me many weeks, even months to wrap my brain around. at heart it is simple, but in practice it is complex. rederived it myself with computer code, but it was extremely painstaking work, some of the most painstaking work carried out in this blog. but it was also some of the most crucial and unavoidable. intellectual/ scientific/ mathematical equivalent of “biting the bullet.”
another idea is that one can look at combinations of collatz mappings and how they map onto/ constrain the triplarity sequence. maybe there is some way to specify some aspect of triplarity sequences that is unconstrained.
(later) 😎 when life hands you lemons, make lemonade! when you are #@%& pissed off because of some semi meaningful/ meaningless random internet comment(er) on your blog, and/ or yet another @#%& moody, irrational female left you for not being a wholly suitably cooperative human doormat… write some cool code! yep, thats the answer!
⭐ srsly, as they say on the chat lines, or, how about a great recent co-obituary of one of the pantheon heroes of this blog, Conway?
(…who, coincidentally, died exactly same age/ year as my DAD…) 😥
… remember that last “hmmm” comment? ok, that variance in the middle of the “triplarity ratio” made me wonder. what does it look like? then on a hunch, came up with this. it is a basic relationship between density and entropy; dont know if its been drawn out explicitly before, it deserved to be long ago when entropy was 1st identified.
🙄 ❗ 😮 this is a modification of construct186. 1st graph is entropy of a smoothly varying (Terras) density sequence, ‘e’ red. wow, a bit )( eyepopping, its quadratic! and notice the variance— look familiar? fitting the curve and subtracting the parabola ‘eq’ blue leads to signal ‘e1’ magenta. look familiar? scatter plotting it vs the prior ‘a21’ corresponding triplarity ratio minus its own linear trend, green, gives graph #2. notice anything? the substantial correlation is computed as -0.60. interesting! around here, code, graphs, strong signal, all the antidote to the endless vicissitudes of life… “take that!”™
(later) ok, in further suitable deference/ credit to a rare commenter here, Anonymous‘ idea, again phrased unfriendly/ tactlessly, more accurately with semi veiled insults + in nearly the most passive aggressive way possible (but still giving him/ her the civility/ benefit of the doubt that he/ she is not outright mentally disordered or autistic), is that the mod3 sequence can be determined iteratively from the initial mod3 value along with the mod2 (parity) sequence, a lot like earlier transducer ideas, and insists its basic, and that it could be a coder interview question, lol, ok, right! and now wonder if Anonymous has any experience in this area, or if its another apparent bluff.
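for concreteness, heres a minimal sketch of what that iterative determination looks like for the semicompressed map (my own derivation, labeled as such): an odd step gives m = (3n + 1)/2, so 2m ≡ 1 (mod 3) and m ≡ 2 (mod 3) always; an even step gives m = n/2 ≡ 2n (mod 3) since 2 is self-inverse mod 3.

```ruby
# next mod3 value from current mod3 value + parity bit, semicompressed map:
# odd step: (3n+1)/2 ≡ 2 (mod 3) always; even step: n/2 ≡ 2n (mod 3)
def next_mod3(mod3, parity)
  parity == 1 ? 2 : (2 * mod3) % 3
end

# sanity check the rule against the actual mapping
n = 27
20.times do
  p = n % 2
  m = p == 1 ? (3 * n + 1) / 2 : n / 2
  raise unless m % 3 == next_mod3(n % 3, p)
  n = m
end
```

note this also matches the earlier observation that mod3=0 disappears early: halving maps residue 0 to 0, but the first odd step forces residue 2, after which 0 never reappears.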
however, again, the graphs already revealed (starting with construct185), and anyway it could be and was easily guessed from the beginning, that its not exactly a simple 1-1 mapping between mod3/ mod2, and that seems immediately to lead to various questions, ie related to constrainedness. clearly, “constrainedness” is a concept that cuts to the heart of the problem.
alas, even with some demonstrated intellectual aptitude/ insight… Anonymous has not suggested any further ideas along those lines. which reminds me of blooms taxonomy, occasionally mentioned in this blog….
- asked Anonymous to point to any collatz refs, no response. is he/ she aware of any, has he/ she read any?
- Anonymous claims/ insists theres a serious misperception about the predetermined region floating around on this blog and in my head… wonder, has Anonymous even heard of Terras?
- asked Anonymous a coding question that is probably as basic to a computer scientist as modular arithmetic is for a mathematician… lets see if he/ she is up to the challenge lol…
- or maybe the whole underlying point of Anonymous is non reciprocity/ rivalry/ evasion/ denigration/ devaluation…
- assumed from 1st example Anonymous wrote some code to find it, but now am wondering…
so now am starting to come up with conjectures about Anonymous, lol ❓
ok, so as usual, exercising my own research brain to fill the gap (aka “nearly talking to myself” again)…
- in short, as mentioned, unlike the mod2 (predetermined) sequence, not all patterns are possible in the mod3 sequence, so what is (fully) possible then? what would be a way of characterizing it?
- clearly an FSM transducer is sufficient, and a worthwhile exercise, but is there another way to look at it?
- the variance noticed/ mentioned in the triplarity ratio is actually directly tied to this. can it be characterized? can something interesting/ relevant be said/ found here?
- eg related, the above graph already shows a quantified correlation, now how about this: what are upper and lower bounds on that correlation?
(whew! maybe sometimes not so healthy to engage with all commenters even if they have demonstrated intelligence… alas sometimes intelligence and emotional intelligence are orthogonal aspects… sometimes unpleasantly so, or worse…) 😳 😦 👿
(1/13) looking at prior glide structures, there is something that really pops out, and this is maybe understandable in hindsight. it looks like in the glide region the glides are “roughly sideways” and this contributes to the ½ density and entropy statistics. this correlation is known/ seen from numerous prior experiments. so heres an idea: just generate “sideways glides” somehow of arbitrary length. what would be a basic way to do that? actually, this already relates almost exactly to one of the earliest Terras experiments that isolated the 0.64 transition point.
the answer is relatively straightfwd. one can construct 0.64 Terras density glides that “terminate” in their “middle” (of the predetermined range) in arbitrary positions, with ½ density parity “tails” (drains). how would that be constructed though? one problem with 0.64 glides is that they might tend to terminate “early” in the glides due to the early “up/ down” noise pushing them below the “starting point.” it would be possible to generate many of them and then build a linearly varying sample, by glide length that is. have a big sense of deja vu on saying that, feel like have generated stuff along these lines, but cant immediately remember where.
💡 however, heres another clever/ effective idea. its possible to get a very close estimate of the actual glide values by the logarithmic increment method, which has been employed a few times previously and is intrinsic to the original Terras ideas. so here a “synthetic” glide is constructed of 0/1 alternations and tracked with the logarithmic estimate: if on the “starting edge” it is forced to stay above/ “incremented over” the starting glide value, and otherwise continues with random 0.64 parity density, all for some random count over the predetermined range. the end result of the 2 cases leads to inexact/ approximate ~0.64 density, followed by a “determined” trailing ½ parity density glide in the remaining predetermined region. lol, got all that? maybe not as complicated as it all sounds…
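got all that? a minimal sketch of the construction as understood (parameter names and the edge margin are illustrative guesses, not the actual code):

```ruby
# synthetic "sideways glide" parity sequence: track the glide height with
# the logarithmic estimate (odd step ~ +log2(3/2), even step ~ -1); force
# an up-step whenever the walk is at/ below the starting edge, otherwise
# emit random ~0.64 density parity bits; then fill the remainder of the
# predetermined range with a 1/2 density drain tail
def synthetic_glide(w, glide_len, edge = 1.0)
  up = Math.log(1.5) / Math.log(2)    # log2(3/2) ≈ 0.585
  h = 0.0
  bits = []
  glide_len.times do
    p = h < edge ? 1 : (rand < 0.64 ? 1 : 0)  # forced up-step near edge
    h += (p == 1) ? up : -1.0
    bits << p
  end
  (w - glide_len).times { bits << (rand < 0.5 ? 1 : 0) }  # drain tail
  bits
end
```

the resulting 0/1 sequence is then fed to the Terras construction to produce an “actual” glide matching it in up/ down statistics.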
in short, its like a semiconstrained random walk with shaping aspects/ boundary-like conditions. then when/ after this 0/1 sequence is known, an “actual” Terras glide can be constructed from it which will tend to match it closely in up/ down/ other statistics, mainly the ‘cg’ sought here. plotting these glides right aligned again with review174 leads to graph #1; note the similarity with prior glides, and these are much more easily/ quickly generated than with complicated hybrid logic runs.
overall these would also be presumed to be some of the “hardest” inputs possible to the signal detector.
- there are some minor tweaks on the review code here to deal with outlier cases and add an averaging parameter etc. full disclosure there is also a subtle “defect” fixed at line 178. the prior code was not normalizing the bit positions by iterate size and this could lead to some bias/ overfitting/ lack of generalization, and unfortunately the prior algorithms seemed to be finding significant signal in this bias.
- further thought on that. actually re “subtle” even that is not the whole story, heres spelling it out even more, because again it is found to relate to squeezing out maximal signal possible, the overarching exercise/ goal/ target/ difficulty/ challenge etc.
review181 introduced averaging over sequential iterates. that leads to a question: should metrics be calculated using bit widths from the initial iterates in the averages, or the bit widths over the averages?
- it appears to make a significant difference here, where the algorithm is finding more signal by using the initial instead of the changing widths for some of the features. ie these can be seen in general as two different multi-iterate calculation strategies. in other words there are different ways to expand “features” from individual iterates to multi-iterates, actually calculated as different multi-iterate signals. ie in a sense “yet another (binary) feature parameter.”
- so again this leads to an/ the idea of calculating (very!) many different kinds of features, many of them closely related to each other in calculation methods, and letting the signal analyzer just choose which ones are most relevant. sometimes subtle calculation differences lead to big signal differences and its possible to expect some of this in general but not possible to humanly anticipate these (which ones) in particular. “who could guess?”
- but actually heres yet another related issue. the features may actually be (in)directly simply measuring bit widths and local slopes in glides, which, sometimes depending on trends over all the glides in a batch, can be used to estimate the position in the glide, if eg the glide(s) in the batch have a “typical” 2nd derivative with some kind of concavity. which leads to the question about generalization and thwarting these kinds of statistics, which are known to be totally thwartable via artificial Terras constructions that basically can create any sequence of slopes in the predetermined range.
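the two multi-iterate width strategies above can be sketched with a hypothetical density-like feature (names and the feature itself are illustrative stand-ins):

```ruby
# two ways to width-normalize a feature over a window of iterates: fix the
# bit width from the window's initial iterate, or use each iterate's own
# (changing) width; the two yield measurably different multi-iterate signals
def feature_initial_width(iterates, stat)
  w0 = iterates.first.bit_length.to_f
  iterates.sum { |n| stat.call(n) / w0 } / iterates.size
end

def feature_changing_width(iterates, stat)
  iterates.sum { |n| stat.call(n) / n.bit_length.to_f } / iterates.size
end

density = lambda { |n| n.to_s(2).count('1') }   # 1-bit count
feature_initial_width([7, 3], density)    # normalizes both by 3 bits
feature_changing_width([7, 3], density)   # normalizes by 3 and 2 bits
```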
⭐ ❗ these are indeed measurably harder, quantified with lower correlations, but nevertheless the sophisticated signal detector still does very well. for 50 count averages over the features, hc, cm, cg come out 0.68, 0.54, 0.73, the higher averaging improving the 1st prediction more than the other 2, which achieve similar values with lower average counts. again the top correlation variables chosen tend to be similar, increasing confidence/ confirmation of real signal— keeping in mind, despite serious effort, never was able to reliably/ consistently weight individual variable signals with the complicated nearest neighbor algorithms.
overall these results seem to suggest the signal analyzer side “wins”… ie (“subject to further testing/ analysis”) it does not appear to be possible to squeeze all “termination signal” out of glide iterates.
in short the attack on the problem is reduced to a sort of “adversarial” battle between the generator attempting to remove all signal and the signal extractor attempting to extract the most possible. if the generator wins, the problem is “more likely” unprovable. if the analyzer wins, the problem is “more likely” provable. this does not contradict the halting/ undecidability argument ruling out/ rejecting fully automated theorem proving, because all the features involved are not “deterministic,” ie are human constructed, and the extracted statistical trend may not be convertible to deterministic/ airtight logic; also outliers could exist, missed special cases, etc.
in short its an interplay/ process that may or may not succeed. another way of looking at it, the adversarial attack is actually a subjective concept in a way. or, it lives in a world between subjective and objective, almost chimeric-like, and as it is better coded/ discriminates/ differentiates nonsignal (noise) vs signal, it moves from subjective to objective. how about that for a mix of yin and yang! ☯ 😮 😎 😀 ❤ ❗ ⭐
😳 after writing it all up, subtle logic glitch just spotted at line 149 in the generator! suspect it wont change results too much but ofc am gonna retest anyway.
(later) answer: 1.0 – 0.64 = 0.36; due to a misplaced/ flipped inequality sign, the code was causing the algorithm to try to “target” 0.36 density away from the edge and hence choosing glides very close to the edge as seen in the diagram. fixing it immediately leads to trajectory graphs closer to/ more in line with the hybrid53c diagrams above where glides have more “above edge margin/ range.” rerunning the analyzer finds 0.70, 0.69, 0.81 correlations, ie significant improvement in both cm, cg! maybe a general way to think about this wrt the already analyzed undifferentiated region/ Terras density glides is that glides very close to sideways have less signal than glides that deviate a little more from sideways.
(1/14) did you catch it? theres seemingly been a breakthru, a major paradigm shift here. even recently the undifferentiated region was considered/ referred to as the “graveyard for feature detection” and at this point its “easy” to see why— in short it almost invariably leads to very hard bordering on (previously) intractable analysis cases. but these new feature analysis methods are very powerful, and apparently cant be thwarted even with the most sophisticated generation algorithms possible. there have been many scattered hints of thin extractable signal in the undifferentiated region but this new framework seems to now unify/ consolidate the situation/ leverage.
the basic challenge of the problem as long/ early noted is “connecting the local to the global” and that is formalized technically with the induction function; a/ the crucial element is that it can be built with the aid of powerful ML. that has largely now been accomplished here generally, consistently, reliably and robustly. the highly local features have to be extended/ amplified some with sequential averages, but they find substantial signal predicting all the global trajectory statistics. what remains is to “connect dots + fill in gaps” but it actually seems probable in this moment the foremost hurdle has now been overcome.
however, there remains some fineprint/ caveat here.
- consistently some of the same variables are extracted, but not over different generation samples. typically hc, cm, cg use similar variables within a given sample, but across samples, different sets of variables are selected.
- theres 113 variables total and thats a bit unwieldy, even though there is not a lot of code involved. it works for now, but the algorithm is throwing out all but a few dozen at best. a nearest neighbors algorithm could conceivably work across all the features over different regions but it seems likely to be unwieldy.
- it is conceivable some “undifferentiated region” outside of this one remains, that it shifts to “some new place” in the analogy of trying to cover a floor with a rug that is not big enough. however at this point it seems unlikely.
so then what? it would be nice to plug these features into some kind of framework and have it now work/ “crunch away” on the general case. that is a logical next step, and many of the elements of that logic have already shown up/ been constructed, eg the hybrid algorithm exploring search space via complicated search criteria. it appears the features are comprehensive and work (detect exploitable signal) across all regions discovered. that remains to be tested.
the intuition/ expectation here is that, to spell it out, finding signal in the undifferentiated region is the hardest part of the problem to overcome, and outside it there is substantial predictive signal. that general signal mainly relating to density was isolated/ harnessed years ago. so in a sense the whole problem “reduces” to finding signal in the undifferentiated/ hard region.
however, there is no such framework for examining the general case immediately available yet. its almost like what is needed is an “on-the-fly” variable selection component combined with regression on the selected variables. have never seen such an algorithm; the above code comes close, but it needs to work with data sets instead of individual cases to find the relevant variables.
did mention decision trees recently which are close in that they typically separate “meaningful from not meaningful” variables. suspect may have to do more work on those or something similar. the nearest neighbor algorithms just cant handle large # of variables well and wasnt able to find consistent ways of narrowing down to the relevant/ key ones.
but ofc, thinking from another angle, knowing what is now seen, the everpresent hacker/ extreme programming question is whats the simplest approach that could work? maybe while it would apparently improve fit substantially it is also not strictly necessary for the algorithm to detect different regions and find their associated “control(ling)” variables… hmmm… that is not so different from what is already mostly coded, but scattered in different places… just need to “tie it all together” as already stated… as usual much easier said than done lol… ❓
💡 ❗ what would this look like? in short it seems the next step is, as already has been roughly outlined previously, to combine/ consolidate/ unify the generator and analyzer logic into a single coherent system/ feedback loop! aka adversarial algorithms! except now the code is largely already laid out, across 2 separate main algorithms: hybrid logic + the feature analysis/ regression…
(1/16) did some timeconsuming/ painstaking/ amazing work on the combined algorithm, getting results already, but not sure if they are correct, maybe needs a lot more analysis.
Anonymous has not returned for 5 days, or at least is silent. something s/he said bothers me (as usual!): “I responded to this not so smart assumption (to put it mildly) without paying attention to the “pictures”:”
❗ lol, oh, not paying attention to the pictures!?! touche! again another casual, offhand, flippant cut/ stab at the heart of this enterprise! the pictures are painstakingly constructed graphs/ diagrams that reveal multitudes, universes about the heart(s) of the problem, and anyone not tracking them is utterly missing 99.x% of the point/ enterprise/ research program etc! yes, ok admittedly it doesnt seem to have a single “heart” right now, or a “spine” as has been mentioned before. as mentioned it seems to be like endless caverns, but maybe the caverns are starting to be mapped out. it seems Anonymous is starting to take the role of simplicio— ie this reminds me of some great/ famous dialog created by Galileo and has been used by later writers eg Jauch wrt QM.
except… simplicio was simplehearted, nearly a fool; even fools are sometimes endearing, much unlike a troll… but ok, Anonymous seems on the face of it to be tempering his language talking about “this not so smart assumption (to put it mildly)”… but on the other hands, it would seem some trolls routinely use the trappings of human courtesy as verbal weaponry… again Anonymous only continues + adds to the mystery, perhaps somewhat apropos of/ in line with the original problem…
what kind of person is Anonymous, really? trying to find the underlying humanity here… MLK day is on monday, day after tomorrow! very strangely, BigCorp didnt name it by name on their calendar! hmmm! anyway despite/ momentarily overlooking the antagonism there is some whole consciousness/ universe there even as there is with all humans… one might say in a moment of reverie/ soliloquy/ weakness (after not being (in)directly insulted again for 5 days…), much evidence to the contrary, in spite of even their own best efforts… even trolls are human… lol!
⭐ ⭐ ⭐
“closure” is one of the hardest, most elusive things with research on “world class stuff,” compounded at times with the inherently highly transitory nature of cyberspace and its denizens. but, as mentioned, on other hand, not a fan of keeping secrets or not filling in the blanks either which might be said to inherently increase “unclosure.” “I have a marvellous proof of this, but the margin is too small to contain it,” lol!
so then, not waiting any longer for his/ her engagement/ return, here is an answer to the question posted to Anonymous in the comments; it was already hinted at. alas Anonymous, despite referencing coding interview questions, seems to have no actual taste for coding puzzles.
construct184c (and others in the series) was directly “exhibiting” or even “testing” his math assertions empirically/ visually. line 125 calculates 0 for the mod3 value of “nonfull” trajectories. but after some further question/ look above on this issue already, all the semicompressed trajectories were found to be “full” by simple reasoning. so why is it coming up nonzero in the ‘y0’ magenta line of the graph?
answer: its a fencepost error, the code was off by 1; if it used array index w - 1 instead of w, it would find the last iterate of the semicompressed trajectory instead of nil. maybe Anonymous doesnt care about the graphs, but at least 1 other person/ human/ consciousness around here does! 😀 😎
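to make the fencepost concrete, here is a tiny self-contained sketch; the names ‘t’ and ‘w’ are illustrative stand-ins, not the actual construct184c variables.

```ruby
# illustrative sketch of the off-by-1 described above; 't' / 'w' are
# hypothetical stand-ins, not the actual construct184c code.
t = [27, 41, 62, 31, 47]   # pretend (semi)compressed trajectory iterates
w = t.size                 # 5

last_buggy = t[w]          # 1 past the end -- ruby silently returns nil
last_fixed = t[w - 1]      # the actual last iterate, 47

# downstream, nil.to_i == 0, so a mod3-style calculation quietly yields 0
raise unless last_buggy.nil?
raise unless last_fixed == 47
raise unless last_buggy.to_i % 3 == 0
```

note ruby raises no exception on the out-of-range index, which is exactly why this kind of fencepost bug runs silently.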
(1/17) 💡 ❗ have been chatting some, meeting old+new cohorts, and turning up some cool links in the process. watched deepmind hassabis interview/ podcast with hannah fry, really great stuff! wow looking at her bio she did phd work in fluid dynamics! hey how about this for a big deal! Heule/ carnegie mellon is working on a SAT solver approach to collatz. inspiring! alas the kind of stuff that might help someone like Anonymous understand the theory/ approach better, but presumably wouldnt even read the article due to so-called “confirmation bias.” too bad, missing out! but no “FOMO”! … is anyone else out there listening? ❓
ok, this is the merged/ refactored code. results are not as expected but right now the logic seems to be correct. whenever making these grand new systems, sometimes hard-to-detect/ isolate bugs tend to show up early on. have done some basic checking. it was very helpful to output weights instead of (sequentially generated/ latest) features, the latter an old graph that has been hanging around for ages but is very noisy to interpret and barely ever noticed/ used much. the weight graph (below) also helps to understand when the optimization is going “sideways” instead of in some “particular direction.” this is not easy to quantify esp over all the variables and again relates to trying to write effective/ comprehensive stopping criteria. the stopping criteria are commented out in this code.
it took many hours to refactor this. the prior feature calculation code is a bit choppy/ pasted together with different arrays, an awkward structure that worked its way into the fit algorithm. my idea also was that the code didnt need to calculate all the features, only those that are being tracked by the regression algorithm, so theres some lazy evaluation, and that took a lot of careful coding/ logic/ adjustment. it also turned out to be sensitive/ tricky to make sure all the variables fit together without conflicts in the “variable namespace” however it seems to be meshing ok.
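the lazy evaluation idea can be sketched roughly like this; the class and feature names here are made up for illustration, not the actual refactored code. only features the regression currently tracks get computed, memoized per iterate.

```ruby
# hypothetical sketch of lazy feature evaluation -- compute a feature only
# when the regression asks for it, and memoize per iterate.
class Features
  FEATURES = {
    "w"  => ->(n) { n.bit_length },                    # bit width
    "d"  => ->(n) { n.to_s(2).count("1") },            # 1s count (density-ish)
    "r1" => ->(n) { n.to_s(2).reverse[/\A1*/].size },  # lsb 1-run length
  }

  def initialize(n)
    @n = n
    @cache = {}
  end

  def [](name)
    @cache[name] ||= FEATURES.fetch(name).call(@n)
  end
end

f = Features.new(0b110111)
tracked = %w[w r1]                # regression tracks only these...
vals = tracked.map { |k| f[k] }   # ...so "d" is never computed here
```

the memoization also sidesteps “variable namespace” collisions somewhat, since each feature is looked up by its own string key.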
the short story is that every 1k new iterates added, the regression is run and the best features are found. then after the algorithm has them, it can calculate fit error on adding new iterates, and attempt to maximize the fit error, ie decrease the analyzer effectiveness. however, unexpectedly/ consistently the analyzer wins out and achieves correlation coefficients in the very high ~0.93 range. the top variables chosen tend to stabilize after some cycles. looking at weights the optimization tends to stabilize around 25k iterations, at which point it doesnt expand the ‘cg’ range much further (‘cg_a’, ‘cg_s’, cyan, yellow), and this run was manually terminated a little before 65k iterations.
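the 1k-cycle adversarial loop can be sketched self-contained as below; the feature, target, fit, and mutation here are toy stand-ins (1-variable least squares on a trailing 1-run feature), not the actual shebang code.

```ruby
# rough sketch of the adversarial cycle: analyzer refits every CYCLE
# iterates; in between, the generator keeps mutations with larger fit error.
CYCLE = 1000

def run1(n)                     # lsb 1-run length, a real feature in spirit
  n.to_s(2).reverse[/\A1*/].size
end

def fit(xs, ys)                 # toy 1-variable least squares: y ~ a*x + b
  n = xs.size.to_f
  sx, sy = xs.sum, ys.sum
  sxx = xs.sum { |x| x * x }
  sxy = xs.zip(ys).sum { |x, y| x * y }
  a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
  [a, (sy - a * sx) / n]
end

pop = (1..50).map { |i| 2 * i + 1 }          # seed population of odd iterates
a = b = nil
2000.times do |t|
  if t % CYCLE == 0                          # analyzer turn: refit on pop
    xs = pop.map { |m| run1(m).to_f }
    ys = pop.map { |m| m.bit_length.to_f }   # stand-in 'cg'-like target
    a, b = fit(xs, ys)
  end
  # generator turn: mutate an iterate, keep it if its fit error is larger
  err = ->(m) { (m.bit_length - (a * run1(m) + b)).abs }
  cand = pop.sample | (1 << rand(10))
  pop << cand if err.call(cand) >= err.call(pop.sample)
end
```

the real code of course runs multivariate regression over many features and a genetic crossover/ mutation operator set; the skeleton (refit rarely, attack constantly) is the point here.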
looking at the iterate bit structures, even though the generator is pushed toward it by the code, it doesnt end up in the core/ undifferentiated region, not managing to escape mostly high density iterates in the form of long lsb 1-runs on most of them. this can apparently be interpreted as/ attributed to the trajectory-lengthening optimizations/ variables winning out over undifferentiation and the greedy algorithm not finding the longer trajectories in the undifferentiated region; the optimization criteria are as last time, extrema + linearizing on ‘hc’, ‘cg’, ‘cm’, plus maximizing ‘cg_e’ the fit error variable.
the huge magenta spike is in ‘cg_e_s’, the standard deviation on the error for 1-2 particular cycles probably relating to a decreased fit from the algorithm choosing different variables, need to look closer at these. the corresponding blue line is ‘cg_e_a’ the error average. the overall cyclical/ periodic spikiness is due to the 1k regression fitting cycle.
“2020 a posteriori hindsight, sigh” visually would like to see some axes/ ie variable scales adjusted here in the plot, ie the flatlining ones. also there is some hair-trigger sensitivity in the fit algorithm such that have seen it occasionally/ randomly/ intermittently fail due to variable selection linear dependency. but the deeper issue here is maybe about gnuplot not always working like a workhorse so to speak. gnuplot while very powerful doesnt have a great way to deal with many variables at different scales and pushes that formatting/ design challenge back onto the user, and at times it shows signs of running into limitations wrt these investigations. it does give hints/ ideas about a more powerful system, am guessing some graphing packages can do better in these areas and maybe will have to look into them sometime!
❓ in short ‘cg_e’ is following a sawtooth pattern tracked by average and standard deviation, the algorithm increases it but then it falls/ snaps back, huh! what is that about? it appears to be showing that the generator can drive up error focusing on particular fit variables, but the analyzer always can find better fit with alternative variables as the generator gets “carried away” or “fixated” on particular/ specific/ latest variables/ features.
so then while not leading to fundamentally new insight or dynamics… yet… ie possibly needing more finetuning/ adjustment somewhere, this finally completes an old design idea/ goal that was formulated years ago, about merging generation and analysis in a unified adversarial system, which has maybe shown up in some few cases over the years, so rarely mostly because of formidable complexity and all the years-long wrestle/ struggle with undifferentiation, a balance now maybe fundamentally shifting. its renamed in the sense of “the whole shebang.” a work of art, am proud of it! 1322 lines! 😎 😀 ⭐
so then anyway what was more expected? my idea is the optimization dynamics would result in driving the population/ analysis region into the undifferentiated region and adopt new fit variables along the way, and the extracted signal would go down, but something like the opposite happened; while some fraction of population iterates could be considered undifferentiated at the low ‘cg’ end, it mostly (collectively) didnt trend toward undifferentiated and correspondingly the signal went up! so what are the options now? not sure exactly… anyone else have any ideas? ❓
- can try adjusting optimization variables eg putting more weight on the fit error
- another idea: give some advantage to the generator instead, like maybe decreasing the frequency of analyzing/ refitting/ switching variables
- try to better understand dynamic of how/ why the fit error “snaps down”
- add back in the entropy/ density constraint/ restriction to the undifferentiated region
- but ofc the real question is how to leverage all this into a proof, and not get distracted by irrelevancies to that— if only they could be reliably identified!
(1/18) 💡 as long laid out now this research is all about dynamics, systems, emergence, “behavior.” its a deep concept that algorithms can have all of these, consider that “behavior” was a term that applied to biological entities eg animals/ humans before the “machine age.”
whats a way to conceptualize this generator vs analyzer dynamic? speaking of adversarial algorithms, there are some aspects of games that relate/ come to mind.
- it reminds me of a card game somehow where the variables are like cards, which the analyzer chooses and the generator has to “play against.” but heres another similar analogy that occurred to me. it appears this dynamic is a bit like a (rigged!) 3-card monte game, also done with “shells.” the generator attempts to win by picking a facedown card controlled by the analyzer. this analogy is not strictly literal in all senses because the generator does have in some sense access to the variable info from the analyzer, but only indirectly via the fit statistic/ correlation error.
- but in a rigged game the “right” card is not actually dependent on card shuffling prior to the choice/ observations of the player but in fact dynamically selected by the dealer after the choice by the player/ “mark,” aka loser/ victim, ie a sort of magic trick, facilitating/ exploited for cheating! the way this plays out here (roughly) is that the analyzer can pick different variables that overcome the choices of the generator, in this case the “mark.”
- another similar example comes from sports and the corresponding metaphor in human psychology, relating to cognitive bias, its called “moving the goalposts.” here the generator tries to make a “goal” via current variables but the analyzer is in control of the goalposts and can move them by changing variables as the generator gets “closer.” aka the football being pulled away from charlie brown (generator) by lucy (analyzer)!
huh, all that is starting to remind me again of Anonymous lol…
(1/19) ❗ 😮 this seems to be some kind of extraordinary situation/ outcome/ finding. was generally doing some basically incremental changes on the algorithm ie improving graph formatting, investigating/ dealing with/ adjusting apparently (visually only) flatlining variables/ graph scales, adding a new graph for the fit variables, etc.; then tried the idea of decreasing the analyzer action/ capability. in this code the analyzer can pick fit variables and best fit (regression weights) over the 1st 1k cycle and then stops picking new variables or refitting, but it does continue to recalculate the correlation, in a sense “freezing” the model at an intermediate stage giving some advantage to the generator after that point. the (glide/ trajectory size) optimization was decreased/ restricted to only ‘cg_e’ model error and maximizing iterate binary differences ‘pd’.
this is a bit or highly strange (not sure which yet), kind of eyepopping, but the correlation goes up to a very high ~0.90 as the generator continues to “attack” the frozen/ limited model— “successfully,” ie while increasing error! correlation ‘r’ black in the graph along with the static or “frozen” fit weight variables. even more extraordinary there is some kind of an exact 4-cycle sawtooth in the increasing error average ‘cg_e_a’ blue in 2nd graph. it is not conceivable to me right now how this 4-cycle could be occurring with almost totally random mutations/ crossover operations by the hybrid/ genetic code.
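for reference, the correlation ‘r’ tracked in the graph is presumably just the plain pearson correlation between predicted and actual values; thats an assumption, the actual statistics code isnt shown in this post, but a minimal version is:

```ruby
# minimal pearson correlation -- presumably what the tracked 'r' amounts to
# (an assumption; the actual shebang statistics routine is not shown here).
def pearson_r(xs, ys)
  n = xs.size.to_f
  mx, my = xs.sum / n, ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sd = ->(vs, m) { Math.sqrt(vs.sum { |v| (v - m)**2 }) }
  cov / (sd.call(xs, mx) * sd.call(ys, my))
end

# with the model frozen, the predictions are fixed while r is recomputed
# each cycle over the growing population, eg on a perfectly linear toy pair:
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # ~1.0 up to rounding
```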
possibly/ apparently some kind of hidden, intrinsic, emergent, exploitable order extracted/ “derived” by the setup. it seems to have something to do with the exact fit variables chosen. so now it may become crucial to capture these different sets. presumably or conceivably this behavior might be repeated with the same variable set, but it would also be surprising to me with all the other randomness in the hybrid algorithm.
graph #3 has more of the story, its just the ‘cg’ related variables displayed, ‘cg’ blue. the algorithm tends to push the predictions yellow toward low/ high only and increase the error on the high ones, black. not included/ pictured, the lsb 1-runs are significantly less prominent than last time and start at about the same rightward spot of increasing ‘cg’ ie most of the optimal population iterates are more undifferentiated. while the generator + model “dont fall down,” the long flat trend on ‘cg’ seems problematic, its interrelated with not finding longer glides in the undifferentiated region.
(1/22) 😮 ❗ 😳 holy @#%& what a massive screwup! after some careful, painstaking, almost painful/ excruciating investigation/ isolation, turns out those cycles are (not so surprisingly!) due to a bug and it took quite a few hours to isolate! can you spot it? lol! this was isolated by discovering that the analysis routine was basically causing some kind of “side effect” on the data that shouldnt have occurred (ie, in other words, corrupting it!) after the analysis was supposed to run in “read only” mode after the 10k read/write interval, and the side effect was occurring in the 1k / 250 → 4-cycles encoded into the analysis interval; narrowing it down took some near-epic detective work! so, it turns out, so far from exploitable, its really quite the opposite…
yes, some caveat about possible defects in the (lots of new) code was already given and it was well deserved/ earned. the subtle needle-in-haystack defect is in some new code that ran without any problem but was computing garbage results. GIGO! in the ‘vars’ “lazy” recalculation subroutine, both line 1085 in shebang and line 1091 in shebangb, the variable name is passed into the ‘diffs1’ analysis routine instead of the iterate bitstring! lol! the analyzer code then manages to run on this string without error and computes random bogus results, both zero and nonzero! oh geez, am so cringing right now.
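a hypothetical reconstruction of how the mixup could run silently: the real diffs1 internals arent shown in the post, but suppose it counts adjacent-character differences on a bitstring. ruby happily runs the same string operations on the variable name.

```ruby
# hypothetical reconstruction -- this diffs1 just counts adjacent-character
# differences on a binary string; the real routine is not shown in the post.
def diffs1(s)
  s.chars.each_cons(2).count { |a, b| a != b }
end

diffs1("110100")   # intended: an iterate bitstring
diffs1("cg_e_a")   # the bug: the variable *name* instead -- silent garbage
```

no exception, no warning: both calls return plausible-looking small integers, which is why the defect only surfaced via the bogus 4-cycles downstream.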
even with this embarrassing, nearly ridiculous, seemingly showstopping numerical bug/ defect/ failure that somewhat randomly/ indiscriminately destroys a lot of feature calculations and seems to disrupt/ destroy most of the non-cycle variable calculations, amazingly, some of the basic findings/ trends are still present/ intact after fixing it, such that the system embodies some “resilience!” some of this is explained by the fact that some variables were not impacted; ie it “only” impacts the lazy-evaluated parameterized variables (80/130 now!), and correct values are recalculated every cycle, and maybe, it appears, between cycle boundaries the analyzer was still able to derive significant/ enough signal from the non-broken variables! how about that for “really powerful,” lol! it can be understood something like a caching bug with time-(cycle-)based deterioration/ “decay” in the data integrity.
the new results are more coherent, there are no perfect-count cycles in sight. this code implements 5 multiruns and varies the analyzer switchoff point ‘tc’ in increments of 1000. running the 1st case tc=1000, the analyzer wins out, reaching an extraordinary ~0.90 correlation after 50k iterations even as the attack/ generator error is increased; the analyzer runs only once at 750 iterates added. in this case the generator has an initial advantage and the correlation is initially pushed down to 0.62 starting from ~0.80, but then the analyzer wins out again as correlation steadily increases even with the frozen model. other runs are mostly similar with final correlations ~0.90. for tc=3000 the correlation was as low as 0.58 halfway thru and 0.76 after 50k iterations. in some runs, the average error tends to plateau.
what is all this suggesting? these dynamics suggest to me there may be a situation where the fit variables are “mutually exclusive” in a sense from the generator pov. it seems the generator can increase error via “pushing” on some features but then there is better fit wrt the “other” features. in other words, apparently the features selected are indeed linearly correlated as a property of the collatz mapping, ie the regression extracts a fundamental “property”/ function of trajectory dynamics as intended, and the generator cannot currently effectively find exceptional groups of iterates that “break it.”
thinking more, there is some trickiness here, because it is not easy to define the “error wrt a particular feature.” or at least, havent done it yet. is there some natural way? currently error is a combined measurement over all feature weights. another pov is that the features are complementary, in other words maybe if one feature is “somewhat missing,” it is made up for by other feature(s) that have “stronger signal.” these are just rough informal ideas inspired by watching the algorithm dynamics that need to be further quantified somehow. ie somewhat like emergent properties.
re the longstanding local-global dichotomy, even as the generator manages to increase local error on each iterate, the global fit error decreases (increasing correlation). not sure how to quantify/ formulate this further at the moment but suspect/ think theres some clever way to understand/ illustrate it. it seems to relate to the error distribution; ie something like that it can be flattened with longer extremes/ “flatter tails” but its average doesnt shift much.
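a toy numerical illustration of that idea, with synthetic numbers only (not measured data from the runs): the generator stretches the tails of the per-iterate error distribution while the average barely moves.

```ruby
# synthetic illustration only -- not measured data from the experiments.
before = [9, 10, 10, 11]   # per-iterate errors, tight around the mean
after  = [2, 8, 12, 18]    # tails stretched by the generator "attack"

avg = ->(a) { a.sum / a.size.to_f }
raise unless avg.call(before) == avg.call(after)   # global average unchanged
raise unless after.max > before.max                # local extremes grow
raise unless after.min < before.min
```

in this picture the generator “wins” every local fight (bigger individual errors at the extremes) while losing the global one (the fitted relationship, hence correlation, is preserved or even sharpened).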
there are now many graphs generated, so far its been a race to build/ understand rather than streamline, and they need to be reorganized some, eg below tracking the stable regression weights is not too informative, but it does show eg their relative scale.
whats the big picture here? recently it was conjectured that multiple linear models could work over different feature regions, but that a single model would tend to break down over all possible cases. these experiments seem to be showing that the (select!) new features and combinations of them are general, generalizing, generalizable, ie robust enough not to require the use of different feature regions. that is a big deal… right now it is hard to imagine an attack system more powerful than the now highly tuned genetic generator and so far the analyzer consistently overcomes it.