this is a modification of the collatz genetic algorithm attack that optimizes the "mx" slopes of the 7 different hardness methods individually (ie they are calculated/ compared as separate metrics instead of being aggregated into a single metric), and holds out the 8th as an out-of-sample test set. it seems to be running correctly, except that it finds/ saves some "empty" solutions where the metrics don't compute (?). haven't looked into this yet; it seems to be some glitch, but for now it doesn't seem to be messing up the overall algorithm/ search. it found some quite remarkable solutions early on, almost merely by random sampling, but later solutions (so far) tend to converge toward the original "mx" curves and have nonnegative "mx" slopes. the solution below is after only a few GA solutions post-random-initialization. the baseline trend is green, the solution is red.
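a minimal sketch of what comparing the hardness methods "as separate metrics" can look like, assuming Pareto dominance as the comparison rule and assuming a lower (more negative) "mx" slope is better on every method. the function names, the list-of-curves input format, and index 7 standing for the held-out 8th method are all hypothetical, not the GA's actual code.

```python
# Hypothetical sketch: compare GA candidates on per-method "mx" slopes
# separately (Pareto dominance) instead of summing them into one score.
# ASSUMPTION: a lower (more negative) slope is better for every method.

def slope(ys):
    """Least-squares slope of ys against x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points for a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def method_slopes(mx_curves, holdout=7):
    """Slopes for each hardness method except the held-out one (index 7 = the 8th)."""
    return [slope(c) for i, c in enumerate(mx_curves) if i != holdout]

def dominates(a, b):
    """True if slope vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the slope vectors that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]
```

with this rule a candidate that is much better on one method and slightly worse on another is kept rather than averaged away, which is the point of not aggregating.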
one encouraging sign is that some of the early random solutions seem to affect the different seed methods differently, which suggests substantial complexity in solutions within the search space. but overall the results so far are unpromising, because the key success metric is negative "mx" slopes, which are apparently very difficult to find, if they even exist. having some of that "quixotic" feeling at the moment. 😦
(6/9) 😮 ❗ 💡 shocked! a bit more analysis. got a bit confused. the baseline plot has a little bit of code to skip missing metrics (they were added after the database-creation code runs); if they are nil, they output as zero. but it wasn't the baseline but the solution curve that gives a flat "mx" value! however, the other metrics for the solution are not missing. in other words, the GA is finding perfectly flat "mx" solutions. these are exactly as sought, "full proof" candidates! need to analyze these further.
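the nil-to-zero confusion above is easy to reproduce. a minimal sketch, with hypothetical field names, of keeping missing values as None so that "never computed" and "genuinely flat" stay distinguishable:

```python
# Hypothetical sketch: keep missing metric values as None instead of
# coercing them to 0, so a genuinely flat "mx" curve can't be confused
# with a metric that was never computed. Field names are illustrative.

def series(records, key):
    """Extract one metric per record, leaving absent values as None."""
    return [r.get(key) for r in records]

def describe(values):
    """Classify a series as missing, partial, flat, or varying."""
    present = [v for v in values if v is not None]
    if not present:
        return "missing"
    if len(present) < len(values):
        return "partial"
    if max(present) == min(present):
        return "flat"
    return "varying"
```

with nil coerced to zero, a "missing" series and a flat-zero series would both plot as the same horizontal line.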
(6/11) 😳 (substantially refactored/ rewrote the analysis code to use a callback, some code that's been bothering me for a while, and regard it as much improved, but it's not ready to post yet.) found the source of the issue: the GA is finding solution(s) that don't weight any of the collatz-specific terms of the metric vector (that is, they zero-weight them), only the sequence index terms ("j"). this ties in with the GA's sometimes-verging-on-uncanny ability to exploit "loopholes" in the fitness function, aka what might be called "degenerate solutions". (which reminds me of finding an interesting page/ topic, googling it, and then that page being the 1st or only related result…) however, the GA is also finding other solutions that maybe are not trivial in the same way. not sure exactly how to "fix" this loophole right now; it seems nontrivial, because it's not simply a matter of looking for zero values on certain terms: the GA can then just pick very small "nearly zero" weights instead…
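to illustrate the loophole: an exact-zero test on the collatz-specific weights catches 0 but not 1e-9. one possible scale-free alternative (an assumption, not something the GA currently does) is to measure the share of total absolute weight carried by the collatz-specific terms and flag candidates below some arbitrary threshold. splitting the weight vector into two lists is hypothetical.

```python
# Hypothetical sketch: why an exact-zero check misses the loophole, and a
# scale-free alternative. ASSUMPTION: weights are already split into
# collatz-specific terms and sequence-index ("j") terms.

def zero_check(collatz_weights):
    """Naive test: flags only weights that are exactly zero."""
    return all(w == 0 for w in collatz_weights)

def collatz_share(collatz_weights, j_weights):
    """Fraction of total absolute weight carried by the collatz-specific terms."""
    c = sum(abs(w) for w in collatz_weights)
    total = c + sum(abs(w) for w in j_weights)
    return 0.0 if total == 0 else c / total

def is_degenerate(collatz_weights, j_weights, min_share=0.05):
    """Flags exact-zero and near-zero collatz weighting alike (threshold is arbitrary)."""
    return collatz_share(collatz_weights, j_weights) < min_share
```

this still ignores that the underlying metrics have different magnitudes, so a small weight on a large-valued term can matter more than it looks; scaling each weight by its term's typical magnitude would be one refinement.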
re/ revising/ clarifying the statement above that negative "mx" slopes are apparently very difficult to find: it does seem very hard to find negative-slope "mx" solutions across multiple hardness methods at once, but maybe not so hard on a single method, and not hard at all to find the loophole "0-flat" mx solutions.
(6/17) so, as mentioned earlier, instead of throwing out "invalid" solutions, a while back added two metrics that haven't been described yet: "mn" measures the nearest point to the origin and "cn" counts the number of negative values in the trend, and both are minimized. these work as intended, but the results are very disappointing. the latest results suggest that the problem, constrained this way, is not solvable, and the solutions found (most "throwing away" about half the weights, ie 0-weighting them) are all close to the original "mx" curves. this latest code also tries to minimize average values across each of the hardness methods in addition to slopes; without that, the code was finding degenerate solutions that had good slopes but bad (very negative) average values.
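a sketch of these extra objectives computed on one combined trend (a list of values). "cn" follows the description directly. "mn" isn't spelled out precisely, so it's read here, as an assumption, as the smallest absolute value in the trend; the slope helper is an ordinary least-squares slope against the step index.

```python
# Hypothetical helpers for the extra minimized objectives on one trend curve.
# ASSUMPTION: "mn" is read as the smallest absolute value in the trend
# (how close the curve gets to zero); the original definition may differ.

def slope(ys):
    """Least-squares slope of ys against the step index 0, 1, ..., n - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points for a slope")
    mean_x, mean_y = (n - 1) / 2, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def cn(trend):
    """Number of negative values in the trend (minimized)."""
    return sum(1 for y in trend if y < 0)

def mn(trend):
    """Smallest absolute value in the trend (minimized, under the assumption above)."""
    return min(abs(y) for y in trend)

def objectives(trend):
    """Objectives for one hardness method's trend, each to be minimized."""
    return {"slope": slope(trend), "cn": cn(trend), "mn": mn(trend)}
```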
some further thoughts/ explanation: the overall problem is to find a curve that is inversely related to the trajectory descents, such that adding it to a trajectory descent gives a nice smooth, monotonically decreasing descent. but none of the individual metrics seem to have that property, and therefore no linear combination of them seems to either. all the earlier solutions that seemed to be improvements contained significant stretches of negative values.
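a minimal sketch of the goal described above: combine a descent curve with a weighted sum of metric curves and check whether the result ever rises. the helper names and the toy curves are hypothetical, and counting upward steps is just one possible penalty, not necessarily what the GA uses. a raw Collatz trajectory is included to show the kind of ups and downs the added curve would have to cancel.

```python
# Hypothetical sketch: evaluate whether descent + sum_k w[k] * metric_k
# is monotonically non-increasing. The GA would search over the weights;
# here they are simply given.

def collatz_trajectory(n):
    """Full Collatz trajectory from n down to 1."""
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

def combine(descent, metrics, weights):
    """Pointwise descent[i] + sum_k weights[k] * metrics[k][i]."""
    return [d + sum(w * m[i] for w, m in zip(weights, metrics))
            for i, d in enumerate(descent)]

def upward_steps(curve):
    """Number of places where the curve rises; 0 means monotone non-increasing."""
    return sum(1 for a, b in zip(curve, curve[1:]) if b > a)
```

for example, the trajectory of 6 (6, 3, 10, 5, 16, 8, 4, 2, 1) rises twice, and a metric curve only "fixes" it if it dips by at least those amounts at exactly those steps, which is why the needed curve looks so much like a negated, trajectory-specific correction.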
quoting a semifamous rock song, "what a long, strange trip it's been." so anyway, again "back to the drawing board" 😳 😦
(6/23) 😥 honestly, am having some serious misgivings and outright crestfallen disappointment at the moment, because this basic strategy, devised over ~1½ years, can now be considered a huge failure. maybe even bordering on "shattering"? but the muse poked me some more last night, and actually woke up with all kinds of new ideas (optimistically, all related to the core of the problem)!
1st, it seems interesting to visualize why this situation is occurring. this is a graph of the "mx" value for 17 metrics similar to those used in the prior experiments. it shows that "mx", evaluated separately for each metric, is either highly correlated or highly uncorrelated (flat) across the metrics (note these are scaled by each metric's max value). 20/20 hindsight! worked a little longer to turn this into a very pretty visualization with all the colors (reminds me of the lemons/ lemonade saying…).
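a sketch of the kind of comparison behind the graph: scale each metric's "mx" curve by its own max (absolute) value, then compute pairwise Pearson correlations. a flat curve has zero variance, so its correlation is undefined and is returned as None. the inputs are hypothetical lists of values, not the 17 metrics themselves.

```python
# Hypothetical sketch: max-scale each metric's "mx" curve, then compute
# pairwise Pearson correlations. Flat curves have zero variance, so their
# correlation is undefined and reported as None.
import math

def scale_by_max(curve):
    """Divide by the largest absolute value (left unchanged if all zeros)."""
    m = max(abs(v) for v in curve)
    return [v / m for v in curve] if m else list(curve)

def pearson(a, b):
    """Pearson correlation of two equal-length lists, or None if either is flat."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return None
    return cov / math.sqrt(va * vb)

def correlation_matrix(curves):
    """Pairwise correlations of the max-scaled curves."""
    scaled = [scale_by_max(c) for c in curves]
    return [[pearson(a, b) for b in scaled] for a in scaled]
```

Pearson correlation doesn't change when a curve is multiplied by a positive constant, so the max-scaling matters for the plot rather than for the correlations themselves.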
but then this led to an interesting/ natural idea: what would a "perfect" metric look like, one that, if present, leads to a solution? am maybe going to pursue this direction, and am slightly hopeful the GA framework can still mostly/ largely be used somehow… (unfortunately it's easy to predict that no such simple perfect metric exists, esp across all the different hardness methods, but it's still not a bad idea, and maybe can be built on…)
further thoughts. what is the current GA attempting to do? it wants to find a "smooth downward curve" as some linear combination of the metrics. but what is the optimal "smooth downward curve"? a line! what does that line represent? quite simply, the fraction of the trajectory completed (or, read downward, the fraction remaining). so this gives a whole new pov. the goal is to use ML+AI to match/ generate the (global) "trajectory fraction line" from local metrics, ie essentially estimating trajectory length at any intermediate position! (so once again it's the same story noticed long ago: attempting to determine global behavior from local metrics.) but maybe there is some way to alter the GA to seek this "trajectory fraction line" more directly…?
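a minimal sketch of seeking the "trajectory fraction line" directly: for a trajectory of length L, the target at step i is i / (L - 1), and a linear combination of local features is fit to it by ordinary least squares (numpy's `lstsq`). the features (log2 of the value, parity, bit length) are illustrative stand-ins, not the metrics from the experiments, and least squares stands in for the GA purely for illustration.

```python
# Hypothetical sketch: fit local features to the trajectory fraction line
# i / (L - 1) by least squares. Features are illustrative stand-ins.
import math
import numpy as np

def trajectory(n):
    """Full Collatz trajectory from n down to 1."""
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

def local_features(x):
    """Constant term plus a few purely local features of one value."""
    return [1.0, math.log2(x), float(x % 2), float(x.bit_length())]

def fraction_line(length):
    """Fraction of the trajectory completed at each step, 0 -> 1."""
    return [i / (length - 1) for i in range(length)]

def fit(starts):
    """Least-squares weights and RMS residual over all given starting values."""
    rows, targets = [], []
    for n in starts:
        if n < 2:  # trajectory of 1 has length 1, so no fraction line
            continue
        seq = trajectory(n)
        rows += [local_features(x) for x in seq]
        targets += fraction_line(len(seq))
    X, y = np.array(rows), np.array(targets)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = float(np.sqrt(np.mean((X @ w - y) ** 2)))
    return w, residual
```

because the fit includes a constant term, fitting the fraction remaining, 1 - i / (L - 1), gives the same residual with the non-constant weights negated; the residual is a rough measure of how far purely local features are from recovering the global position.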