the title is semi ironic, because it seems there is never a vacation with a hard problem. only a hiatus? did go on a trip last wk and had a great time, didnt think about math much at all! which is a good thing! work/ life balance and all that! although for some, math is life! would go into more juicy detail for all my loyal readers (like last years epic saga) but alas, havent heard from any of you in ages so not sure you really exist 😳 😥
just picture me in tattered/ dusty/ dirty clothes on the side of a busy cyberhighway, weathered/ sunburned/ wrinkled/ unshaven skin, sitting in the blazing hot sun with a cardboard sign scrawled with marker, million-mile staring-into-distance plaintively…
will write for comments! 😐 😳 🙄
came up with this code awhile back but didnt have a chance to post it. was musing on the results but need to just go ahead and post them. think this shows maybe both an advance and a possible limitation of the method. the code simply looks at the total # of nonconverging trajectories for collatz multi-iterations. it turns out that the nonconverging trajectories go down quite naturally as the collatz multi-iterations go up. wow! rather simple. however, whats the fine print? what this may be showing is that probability of nonconverging iterations decreases substantially (and apparently that trajectories decline more steeply/ consistently/ uniformly as multi-iterations goes up), but from long history, collatz is all about the “outliers”.
from earlier work it is long known that there are “many” exception trajectories that dont converge after ‘n’ iterations. so is this analysis just glossing over all the outliers somehow? (my initial intuition is that…) this seems to be a different dimension/ added wrinkle to merely trying to minimize the nonconverging iterations and am still thinking on what it means, how to interpret it, how to move forward. but am at least making some minor progress, in a sense (meta progress?) of posting this experiment in my queue (now out) for over a week or so.
💡 an immediate idea is to try to “sprinkle in” some of these outlier seeds by searching for them and including them in test and/ or training set.
(7/14) the idea came to me that the previous “wrinkle” or gap seems to come down to how much ‘wr_e’ can be increased outside-of-sample. so a straightforward idea is to try to do an optimization on increasing (the magnitude of) ‘wr_e’ as much as possible after the initial model is “fit”. the easiest/ 1st cut approach seemed to be with a genetic algorithm search working off the initial 50 starting seeds of 100 width each. this is 50k iterations of the genetic algorithm and current ‘wr_e’ output in green and max ‘wr_e’ encountered in red. the code finds about a ~6x increase from ~0.15 to ~0.85 but that seems (judging merely visually from the diagram) a hard limit. this is for a multi-iteration count of 50. this also has some code to enforce a minimum width here, ie 1/4th initial ie 100/4 = 25.
postscript: some of the ambiguity in interpreting these results is that its not clear how to compare this limit with converging/ nonconverging perturbations. musing on how to analyze/ represent/ cover/ close that. from the prior graph it is known that of all the error vectors including the one with max ‘wr_e’, for large enough multi iterations they all converge, but its not known (yet) how much more than the max ‘wr_e’ that there could still be convergence for the higher multi iterations.
(7/20) this is something of a tour-de-force construction that combines a lot of the previous ideas/ last two programs. it adds back in the nonconvergence evaluation logic, and has some code to incrementally alter the training set, and retrain, in a sort of adversarial fashion. the worst ‘e_m’ case is optimized for in the genetic algorithm after 1K iterations and then that point replaces the “best” ‘e_m’ point in the current training set, and then the model retrains (refits).
the general effect was exactly opposite what was desired. on one hand the ‘e_m’ values/ range of the training set tighten toward 0, as plotted below in green/ blue, max/ min, left scale respectively. but the nonconvergence count of the training set goes up dramatically/ nearly linearly (magenta line, right scale), although it notably plateaus in the middle, not sure why that is. one thing can be said, at least the ‘wr_e’ and nonconvergence can be systematically altered in this experiment. later thinking this over, its as if the “adversary” heavily overpowers the machine learning algorithm (although reaching a plateau/ apparent limit). the nonconvergence calculation is expensive so calculated here only every 5 iterations. alas, a/ the “fix” is probably not at all as simple as trying to do the opposite optimizations…
(7/21) 💡 ❗ 😮 😀 ⭐ this seems to be a veritable breakthru. tried “reversing” the optimization logic and surprisingly, it works! this code throws out the worst ‘wr_e’ point in the training set and replaces it with the best one (lowest) found from the genetic optimization. there is some minor refactoring of the code to allow for a test set nonconvergence evaluation in addition to the training set. forgot momentarily that the training set score may diverge substantially from a test set.
this plot is std dev, max, min, max-min for ‘wr_e’ in the training set on left axis, with a similar narrowing of spread as in last experiment, and ‘ta’, ‘tb’ the nonconvergence score/ count over the training and test sets, thicker lines/ right axis. ‘ta’/ lightblue declines quite reliably/ quickly, and the ‘tb’/ black test count tends to oscillate a bit (wildly?!) but hits the desired zero. something else to note is that even as ‘ta’ flattens to 0 there are other dynamics going on with eg the narrowing ‘wr_e’ spread and the training set nonconvergence count ‘tb’, in other words changes to the training seed distribution/ fit model that are not detected by/ reflected in “merely” the training set nonconvergence count.
am a bit shocked and need/ intend to tinker around with this a lot more but this was so remarkable that decided to report/ archive it immediately. this is a very big deal after spending many weeks attempting different ideas to find a model that minimizes nonconverging perturbed trajectories.
… guess its real! not a fluke! this is similar code that just ups the seed sampling resolution to 100 from 50 and does convergence testing every 4 cycles instead of 5. nearly identical trends/ results except more iterations (40*4=160 instead of 12*5=60). the same trend in the test set of increasing, somewhat plateauing, bottoming out shows up…? presumably the seeds are almost entirely different. however, do want to look at internal dynamics of this convergence/ optimization more eg general trend of density of seeds and other statistics. from an earlier experiment noticed that “worst case” ‘wr_e’ seeds found by the genetic optimizer tended to converge to half-density.
(7/25) these are 2 more runs with slightly modifed code that loops indefinitely instead of stopping at 200 and starts with
srand(2) instead respectively. the results seem a bit more random. its strange how the test error drops off dramatically (“cliff-diving-like”) to 0 from higher plateaus, and also the two runs seem to have some similarity in shape. mx/ mn right scale (unlabeled, oops) and ta, tb left scale, thicker.
(7/27) this is some adjusted code that outputs a lot of logging-like data, basically snapshots of the ‘a’ array controlling all the dynamics and some additional data. was interested in looking at behind-the-scenes metrics somewhat like many earlier experiments examine genetic algorithm convergence. the code analyzes all the stored data on the 2nd run after the initial 1st run of the code converges. the convergence stats output in
out1.txt are every 10 iterations instead of every 4, meaning even with the same random seed, this run is not the same as any prior based on different random variable generation.
the 1st graph is plot of # of non-converging trajectories by densities in the test set, with different colors for the iteration count, and output every 10 iterations and 0 the lowest density and 100 the highest. it shows the algorithm starts out with mostly lower density nonconvergence and then shifts to entirely high density nonconvergence profile.
the 2nd graph is a scatterplot of ‘wr_e’ over the training set by iteration, showing a sort of flowing/ rippling fluid effect with different density (in different/ new sense, aka concentration) regions. the 3rd plot is scatterplot of densities of the training set and shows that slowly higher densities are discarded until nearly all points have less than half density. final plot is worst ‘wr_e’ of training set in green, best ‘wr_e’ from genetic algorithm in red, both thick lines left axis, and density of the new added point from the genetic algorithm in blue, right axis. there seem like maybe some cross-correlating trends in all this but not sure what they are yet, it might take a lot of analysis to figure them out and these runs are very timeconsuming. & cant assume this single run is particularly representative of general convergence behavior.
my observation after musing on all this is that its as if the algorithm is a kind of mix of random and nonrandom shifting “somewhat meandering/ sideways” toward convergence, as if its sculpting something and reaches a sudden tipping point where the final “shape” emerges complete, even as (in prior runs) the count of nonconverging perturbed trajectories may be high right before the final solution is found. still mysterious! ❓