hi all. some extended wallowing in self appraisal/ reflection to begin with in this installment. last months installment of collatz had a big highlight, at least wrt this blogs history. as has been stated in various places, & trying not to state the obvious here, part of the idea of this blog was to try to build up an audience… aka communication which (to nearly state the almost-canard) is well known to be a “two way street!” there are all kinds of audiences on the spectrum of passive to active, and in cyberspace those in the former camp are also long known semi-(un?)-affectionately as lurkers.
must admit do have some “blog envy” of some other bloggers and how active their audiences are wrt commenting. one that comes to mind is scott aaronson. wow! thought something like a fraction of that level would be achievable for this blog but now in its 5th year, and candidly/ honestly, it just aint really happening. have lots of very good rationalizations/ justifications/ excuses for that too. ofc it would help to have some breakthrough to post on the blog and drive traffic here through a viral media frenzy… as the beautiful women sometimes say, dream on… ah, so that just aint really happening either. 😐
however, there was a highlight from last month, for this blog something like a breakthrough, but also, as you might pick up from the subtext on reading further, with some major leeway on where the bar is set (cyber limbo anyone?). got an anonymous, openminded, even almost/ verging )( on encouraging comment from someone who wrote perceptively and clearly had a pretty good rough idea of what was going on in that significantly complicated collatz analysis blog post, as if reading a substantial part of it and comprehending it, and getting to some of the crux/ gist of ideas/ approach here. nice! 😎
(alas, full “open kimono”/ self-esteem challenging disclosure… admittedly that is a very rare event on this blog, and despite immediate encouragement and my marginal/ long gradually increasing desperation now verging on… anonymous has so far not returned. this overall predicament is something of a nagging failure/ gap/ regret/ ongoing challenge wrt the original idealism/ enthusiasm/ conception of this blog. which reminds me, also, long ago there was an incisive/ discouraging/ naysaying/ cutting/ near-hostile/ unforgettable comment, and may get to “highlighting” that one too eventually as part of the overall yin/ yang balance etc after changing circumstances and/ or building up enough courage wrt my cyber-ego, long keeping in mind that other quirky aphorism, success is the best revenge…) 😈
anyway here is the comment again, suitably highlighted/ framed/ treasured forever at the top of this blog:
What is a “glide” and how is it related to the trajectory length? Have you defined it somewhere earlier? What are your input variables for the model? What’s the reason to believe that even if you have a good predictor for your “glide” it helps to prove the conjecture?
who was that masked (wo)man? now riding off/ disappeared into the sunset? can you not see some flicker of an unintentionally deep socratic/ zen question here? this commenter/ probable-mere-passerby has somewhat accidentally summarized/ cut to some of the core conjecture being explored here over several years…! ❗ 💡 😮
so anyway, jolting me out of the typically nearly solipsistic reverie/ stream-of-consciousness writing of this blog (momentarily!), this proves there is intelligent life/ consciousness out there, and in this case it only took a few years to make some brief, glancing, flittering, incidental contact with it. thx so much, anonymous & cyberspace! am feeling glimmers/ stirrings of enormous gratitude for this brief )( feedback, like “not all is wasted”. on the other hand, looking back at those rosy early days/ expectations at this blogs inception, nevertheless in a semi-crushed mood at the moment, am reminded of that old saying by (19th century!) general Von Moltke sometimes quoted in eg warfare or chess strategy, no plan survives contact with the enemy. another one from sometimes-sun-tzu-or-zen-like rumsfeld, you go to war with the army you have, not the army you might want, ofc with shades of the lyrics to that old rolling stones song! 😮
⭐ ⭐ ⭐
ok, enough with the melodrama/ emotions (or those relating to publicity/ community engagement efforts), on to the latest installment. have been banging away on a lot of ideas related to general machine learning approaches on the collatz data from a radial basis function (RBF) angle. am very interested in recursive approaches. recently tried predicting/ fitting residuals of the RBF and didnt really get any results out of it, basically the residuals are entirely noise as far as the RBF is concerned so to speak.
another idea tried is that earlier code was computing a “hidden feature vector” but which was based on “cheating” somewhat by looking at the hidden/ blind data. then tried the idea for the RBF itself to compute the mapping of input data to the computed hidden feature. thought this was a very clever/ promising idea, but “reality intruded” (doncha just luv that expression, maybe a general theme for life etc!) & this also came up emptyhanded because apparently as far as the RBF is concerned, the hidden feature is noise/ unpredictable from the input data… notable/ interesting findings, but still “null results” too much of a hassle to write up in any more detail even though the code is quite involved/ delicate. (but did this morning just think up one more trick up my sleeve to try!)…
so for this RBF its a little disappointing/ frustrating to seem to have essentially no tunable parameters even after some massive effort in this direction, no shapely/ viable/ classic train/ validation/ test curves to stare at whatsoever, just a scattered/ desolate junkyard of laboriously-discarded full-of-many-moving-parts ideas/ “null results” (that word combination nearly an oxymoron!). 😦
nevertheless its been good exercise so far. and another way of looking at this is that maybe “straight” RBF is an inherently very effective machine learning approach that doesnt require much “training”. (another way of looking at training in machine learning is that its to try to summarize/ compress large amounts of data, whereas there is essentially no compression in RBF, at least in this version, which includes every point in the dataset in the model in some sense. that lack of compression is likely an intrinsic part of the powerful qualities of RBFs.)
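to make the "every point is in the model" remark concrete, here is a minimal sketch of one common RBF-style predictor, a gaussian kernel-weighted average (aka Nadaraya-Watson regression). this is an illustration only and not necessarily the RBF variant used in the code for this post; the names `rbf_weight`, `rbf_predict` and the `width` parameter are made up for the example, as is the toy data.

```ruby
# Gaussian radial basis weight between two feature vectors.
# `width` is a hypothetical smoothing parameter, not taken from the post.
def rbf_weight(a, b, width)
  d2 = a.zip(b).sum { |x, y| (x - y)**2 }   # squared euclidean distance
  Math.exp(-d2 / (2.0 * width**2))
end

# Predict a target for `query` as the weighted average of ALL stored targets.
# Nothing is compressed: the "model" is just the dataset itself.
def rbf_predict(points, targets, query, width: 1.0)
  weights = points.map { |p| rbf_weight(p, query, width) }
  total = weights.sum
  # if every weight underflows to zero, fall back to the plain average
  return targets.sum / targets.size.to_f if total.zero?
  weights.zip(targets).sum { |w, t| w * t } / total
end

# toy data, made up for illustration
points  = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
targets = [1.0, 2.0, 2.0, 10.0]
puts rbf_predict(points, targets, [0.0, 0.0], width: 0.5)
```

in a scheme like this the only real knob is `width`, which fits the impression above that there is not much to tune: a very large width pushes every prediction toward the overall average, a very small one makes it nearly a nearest-neighbor lookup.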
so! without further ado, finally to the point! in this post, am starting here by getting/ scaling back to basics and just posting the basic RBF code which is not very complex but has been hammered at for days & has some new/ redeeming features. (and noticing just now after reviewing last post that never really did post that basic core code/ result, because was somewhat prematurely getting carried away/ jumping the gun with all the “bells and whistles” which mostly turned out to be duds…)
1st, here the data generation is decoupled from the curve fit logic in this code. then followed by the basic/ streamlined curve fit code. then theres a graph of the fit of ‘h2’. its noticeably better than linear fitting from prior installments but it has the same general shape, such that lower ‘h2’ values fit “not bad” but higher ones tend to generally have predictions of nearly average values. 2nd graph is the ‘ls’ fit. 3rd plot is the error in ‘ls’ fit.
⭐ ⭐ ⭐
(2/11) 💡 ❗ highlights! some )( evidence for non-total personal isolation/ reclusiveness… social media aka cyberfriends… heather with her upcoming physics guest speaker session & DS mention collatz in physics meta. freudian slip? 🙂 😛
Careful with the Collatz conjecture. I can drive you mad.
Continuing to make progress with my two pieces of code (factoring/collatz). This has got to be one of the funnest projects I have ever worked on. 🙂 *
also here in a fun chat on collatz & other misc topics, heather promised me to run every ruby program on collatz on this site… that could take awhile, am not gonna hold her to that one! but, “dream on!” 😛
⭐ ⭐ ⭐
(2/14) 😳 😮 😦 😡 👿 crushing! setback! weeks of chasing ghosts/ phantoms! back to the drawing board! @#$& was always a bit suspicious of the data distribution. had a closer look. there is an initial output of a lot of points that have a ls=2 but have various ‘h2’ values. this is a quirk of the generating process. a tiny change to exclude the trajectories with ls<=3 leads to the following code, and refitting with fit20b.rb gives total noise prediction of ‘h2’ centered around the average. in other words there seems to be no predictive value to input variables in current form and prior predictability was due to the skewed/ biased distribution. 20/20 hindsight, in future will look more for bias in the distribution before jumping the gun to playing with fitting…! (is anything salvageable?) 😥
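a rough sketch of what that kind of exclusion filter looks like, for readers without the code at hand. assumptions, loudly: ‘ls’ is taken here to be the glide length (steps until the trajectory first drops below its seed), and the map is the "compressed" one where an odd step is (3n+1)/2. under that assumption every seed of the form 4k+1 (above 1) has glide length exactly 2, which would be one possible source of a big pile of ls=2 points. `glide_length` and `max_steps` are names invented for this example.

```ruby
# Glide length: number of steps until the trajectory first drops below its seed.
# ASSUMPTION: compressed collatz map, odd n -> (3n + 1) / 2, even n -> n / 2.
# `max_steps` is a hypothetical safety cap so the loop always terminates.
def glide_length(seed, max_steps: 10_000)
  n = seed
  steps = 0
  while n >= seed && steps < max_steps
    n = n.odd? ? (3 * n + 1) / 2 : n / 2
    steps += 1
  end
  steps
end

seeds = (3..40).to_a

# drop the "trivial" seeds whose glide ends within 3 steps
kept = seeds.reject { |s| glide_length(s) <= 3 }

seeds.each { |s| puts "#{s}\t#{glide_length(s)}" }
puts "kept: #{kept.inspect}"
```

printing the seed/ glide table before and after the filter is a cheap way to eyeball how lopsided the raw distribution is before doing any fitting.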
(2/15) 💡 ⭐ 😄 ❗ ❤ 🙄 (typical research as bipolar?) again retrenching and (maybe?) snatching some victory from the jaws of defeat. going back to some of the earlier findings. this is some new more sophisticated data generation code and the basic linear fitting code separated. the generation code consolidates points over 3 separate runs, excludes short glides ‘ls’<=20 and samples half the ‘ls’ range over each ‘h2’ “slice”. including the whole ‘ls’ range in contrast (sometimes) seems to lead to nearly random fit. from the graph there is some impressive signal here but is it only due to bias in generation algorithm?
the question of bias in the generation is turning out to be rather subtle and apparently takes quite a bit of care to try to generate “nonbiased” samples. (and maybe that is the real story of last many weeks.) somewhat counterintuitively, maybe seeds that are part of “unbiased” distributions are “hard” to find. (definitely have to think more about this!) the separation of code allows the sample distribution to be examined more directly via the data.txt file. one idea that is coming to mind is some older code that looked at a 2-way variable distribution to search for seeds, which could be generalized to some kind of multiway analysis/ frontier search.
addendum: maybe got an anomalous random run previously. changing to the full ‘ls’ range for each ‘h2’ slice sometimes leads to nearly the same results/ linear signal below. except the bottom edge tends to flatten out in that case.
(2/18) built some very sophisticated code that took hours to debug. it looked at the histograms on the frontier and selected set over 3 different axes, the ‘h2’, ‘ls’, ‘ns’ dimensions. it has a neat voting/ weighing algorithm that tries to add points based on different vote increments wrt gaps/ outliers in the selected histogram, accumulated over all 3 axes. however, wasnt understanding its behavior, it seemed to be largely selecting only small ‘h2’ values and larger ‘h2’ values were quite rare, and while sophisticated & running as intended, not ready to post it yet wondering if it is still not performing ideally/ as desired.
had to do some more thinking/ analysis. then was led to this basic analysis that maybe have done something like before (using somewhat different logic) & then didnt remember. this simple code tries to maximize ‘h2’ using more recent patterns. noticed ‘h2’ up to ~150 is findable. but on the other hand large ‘h2’ values seem to be confined to lower valued seeds and maybe dont even exist for higher seeds! it appears the ceiling may asymptotically decline for higher seeds to ~25-30. this graph is ‘h2’ scale on left and ‘ns’, ‘ls’ scale on right.
an idea from this data is that maybe fitting larger ‘h2’ values does not make sense if they dont exist for seed sizes approaching infinity! it also shows how a dynamic histogram approach could get messed up if its range is skewed by high ‘h2’ values associated with lower seeds only, later seeds will seem to fall mainly in “lower bins”. and its an uncommon case of a statistic with a ceiling that maybe declines for larger seeds.
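the histogram skew effect is easy to reproduce in miniature. the sketch below uses equal-width bins spanning the min..max of the data, which is only a guess at how a "dynamic" histogram might set its range; the numbers are made up (a cluster in the ~25-30 range plus two large early values) and the `histogram` helper is hypothetical, not from the unposted code.

```ruby
# Count values into `bins` equal-width bins spanning min..max of the data.
def histogram(values, bins)
  lo, hi = values.minmax
  width = (hi - lo) / bins.to_f
  counts = Array.new(bins, 0)
  values.each do |v|
    i = width.zero? ? 0 : ((v - lo) / width).floor
    counts[[i, bins - 1].min] += 1   # clamp the max value into the last bin
  end
  counts
end

# made-up numbers for illustration only
typical  = [25, 26, 27, 28, 29, 30, 26, 28]   # "later seeds", low ceiling
outliers = [150, 140]                         # "early seeds", large values

p histogram(typical, 5)             # => [1, 2, 1, 2, 2]
p histogram(typical + outliers, 5)  # => [8, 0, 0, 0, 2]
```

with the two large values included, the range stretches to 25..150 and all 8 typical points collapse into the lowest bin, so any gap/ outlier voting based on those bins sees almost no structure in the bulk of the data.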
1st graph is maximizing by ‘ls’ (1st arg) which seems to perform best. 2nd graph is maximizing by ‘h2’ which tends to cause ‘ls’, ‘ns’ to run sideways.
this is an interesting twist without too much new code & maybe comes close to what is desired. it seems 2 phases are needed, 1 to generate points and another to sample them in a more uniform way, because almost no matter what the more “raw” generation logic chosen, “easy” (short) trajectories are quite common and the “hard” (long) trajectories are rare. this code actually builds in some of this tradeoff.
the idea is that there are 2 strategies. 1 randomly samples points out of the frontier and is good at giving a balanced distribution & not getting trapped in local minima, but does not maximize the variables much except by accident. the other strategy looks at relative values of ‘ls’, ‘ns’, ‘h2’ and tries to greedily maximize the combination by choosing top frontier points. this 2nd strategy has the effect of putting more force into the maximization of the combination of variables but then its biased away from “typical” samples with more “medium” values. here are 3 runs, using strategy 1 (arg0=1), strategy 2 (arg0=0), and an alternation between them that gives the general desired result of maximizing variables while at the same time sampling broadly over the whole distribution (arg0 null). the last step not implemented yet is to resample the points in a somewhat balanced way across the ‘h2’, ‘ls’, ‘ns’ dimensions. (graph order/ colors switched in this graph)
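a stripped-down sketch of the 2-strategy idea, for orientation only. it is not the code behind the graphs: the greedy score here is just glide length (standing in for the real ‘ls’/ ‘ns’/ ‘h2’ combination), the compressed collatz map is assumed, the `children` expansion (setting a new high bit on the seed) is a hypothetical way to grow the frontier, and the mode symbols stand in for the arg0 values.

```ruby
# Glide length under the compressed map (ASSUMPTION: odd n -> (3n + 1) / 2).
def glide_length(seed, max_steps: 10_000)
  n = seed
  steps = 0
  while n >= seed && steps < max_steps
    n = n.odd? ? (3 * n + 1) / 2 : n / 2
    steps += 1
  end
  steps
end

# HYPOTHETICAL expansion: grow a seed by setting one of the next two high bits.
def children(seed)
  b = seed.bit_length
  [seed | (1 << b), seed | (1 << (b + 1))]
end

# mode :random -> strategy 1 (sample the frontier uniformly)
# mode :greedy -> strategy 2 (always take the top-scoring frontier point)
# anything else -> alternate between the two on successive iterations
def search(mode, iterations: 50, rng: Random.new(1))
  frontier = [3]
  found = []
  iterations.times do |i|
    greedy = case mode
             when :random then false
             when :greedy then true
             else i.odd?
             end
    pick = if greedy
             frontier.max_by { |s| glide_length(s) }
           else
             frontier.sample(random: rng)
           end
    frontier.delete(pick)
    found << pick
    frontier.concat(children(pick))
  end
  found
end

[:random, :greedy, :alternate].each do |mode|
  glides = search(mode).map { |s| glide_length(s) }
  puts "#{mode}: max glide #{glides.max}, mean #{(glides.sum / glides.size.to_f).round(1)}"
end
```

comparing the max and mean glide across the 3 modes gives a quick feel for the tradeoff described above: how hard each mode pushes the maximum versus how broadly it samples.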