last month was busy for collatz ideas/ code/ graphs etc., and coincidentally right on the now-conventional new month boundary, just came up with what might be some staggering new idea(s). it starts with a very simple observation. it may be either trivial and/or highly nontrivial, depending on how it works out. never really seemed to notice this simple factoid before, if so dont recall it. maybe have been a little too focused on glide lengths when something else was (not noticeably so far) popping out (maybe these two metrics have been tracked together in some prior graph, but along with other stuff).
insight can consist of narrowing down attn/ focus/ pov toward highly meaningful facts which alone/ on their face seem unremarkable. this is just a few lines of code to answer the question, where does the “cm” max iterate index occur wrt the starting seed bit length? for the 8 generation algorithms, “cm” is very much bounded by the starting seed bit length. actually was surprised! this ties in with a whole new direction with new povs/ maybe an entirely new paradigm (shift?). (hint: what does the starting seed bit length relate to?) 😮
the 5th/ 6th ‘cg’ or ‘cm’ methods seem to minimize the gap but the gap is increasing for both.
this code is streamlined/ refactored some and tries to (greedily) minimize the seed width/ ‘cm’ gap and only succeeds for a finite few starting seeds in it finding negative differences. its 5K iterations and then the graph of ‘cm’ (red) and seed width (green) is by seed order, and actually the gap increases for larger starting seeds (just like for all 8 algorithms in prior graph, attenuated to varying degrees).
the results are encouraging, but… aha! there is a big difference/ subtlety that is glossed over here and has not been remarked on very much in prior analysis. in contrast to the db code that analyzes full trajectories, this code terminates the analysis after the glide “finishes”, but that is not necessarily going to include the max value in a trajectory. ie how often does a max occur in a trajectory after a glide? its a bit counterintuitive. a trajectory could later rise higher than its starting seed after the glide, or even rise larger than the max of the glide… definitely something to look into…. easy! done! the 2nd code handles that case and gets slightly )( different/ divergent results with “later” spikes/ crossing in the middle after the low values.
(9/9) this revisits the “lo vs hi” density (msbs vs lsbs densities). earlier a relative density difference was found over the different sides of a glide, and a 3d surface plot was made. this looks at the 4 long glide seed methods, taking the 5 longest trajectories, and computes the cumulative lo vs hi density difference in the colored trends. quite a few of them are strikingly nearly linearly decreasing. others show some other trends. the 3rd method (green) is fairly consistent but the others are not. theres a concave upward trend starting the 2nd method in red. the corresponding glide trajectories are depicted below in red.
this is a few line code change that computes the same statistics for x*3^m where x is the starting seed and m is the number of 3n+1 operations so far. earlier it was discovered that the msbs of this construction “roughly” match the collatz trajectory. as suspected there are some striking trends that need to be explored further. remarkably it appears that of either of the two trends for lo/ hi density difference vs the n*3^m trends, at least one is (roughly) linearly descending.
(9/15) was just returning/ experimenting with an older idea that essentially tracks the number of div2 operations in trajectories and retains glides that are still longer than some iteration count. this does a greedy search through them based on minimizing density. it leads to an asymptotic looking density graph. not exactly sure what this means but it seemed worth saving. the plot is density after the nth greedy iteration and its exploring very long seeds at the end (3363 bits).
⭐ ⭐ ⭐
after melting my brain for a few years on very hard code ideas, thought a simple exercise might be a fun change of pace. had this brainstorm this morning. something has been bubbling up in my consciousness for a few months or so and finally thought of a way to approach it and decided to write it up. this occurred to me as a very neat metaphor for the collatz conjecture that even a kid could understand/ contemplate.
discovered this idea as a young kid experimenting with Basic programming language, over a decade before hearing of collatz, think it might have been my own, it possibly came from some source but dont recall being inspired by some other source in particular (except maze-generation programs in general). the code is utterly simple, it just outputs back- or forward-slashes randomly (via a 50-50 PRNG chance, quite like the binary odd/even aspect of Collatz) in each character square. the brevity is reminiscent of codegolf.
what are the results? the local randomness leads to global patterns. one can ask global questions of the “passageways” like “how many (dis)connected passageways are there? what is the longest passageway?” etc. here the passageways are analogous to connected components on a graph. also, are there some other emergent global properties depending on the (local) PRNG?
the html line spacing messes up the effect somewhat so following is 2 graphs instead, one with spacing (25%) and the other with none. there were some old character graphics systems eg commodore PET & atari 800 that had diagonal characters that rendered as connected. this construction is also reminiscent of wallpaper or cellular automata patterns. hey, maybe a real niche product here for a large nonrepeating geek wallpaper pattern for some real walls/ environments? eg maybe the dark lairs of math or hacker contests? is this pattern remarked on somewhere in the literature? wonder what the 1st case of it is?
this simple play is kind of a distraction from some harsh ideas on Collatz lately, the long view: sometimes it feels like, despite a lot of demonstrable/ tangible creativity to be proud of over the years, am running out of ideas and seems like there are no promising approaches to collatz and the whole thing is just a big hall of mirrors. maybe even though true there is no inductive structure possible to the problem. or maybe a 1st counterexample is of staggeringly large size. also wonder if no possible greedy approach (of which there are many in this blog, its a big theme) could succeed in anything substantial. but there were some recent results “out there” on SAT + a number theory problem that need to write up & am quite inspired/ hopeful about. and often when feeling down about options it can help to just take a (long?) break.
/\/\/\/\/\/\/\\\/\\\/\\//\//\/\\\/\\//\\/\\\\\\\\/ /\//\\/\\\/\/\\\////\//////\\\\/\/\\\\/\/\/\//\/\\ /\/////\\\//\//\///\/\\\\/\\\/\///\////\\/\//////\ ////\////\\\\\\/////\\\\//\/\/\/\\//\\\///\/\\/\/\ /\///\\//\/\/\//\//\/\\///\///\\\\/\\\\\\\////\\\/ /\///\//\/\\\/\/////\/\//////\/\\/\\/\\//\//\//\/\ ///\///\//\\\/\\\\///\\\///\/\/\\\/\\/\\\\\/\///\/ /\\\\\/////\\\\\////\\\/\//\\\\/\///\/\//////\//// \\\\//\\\/\//\\\\///////\//\/\\//\//\\/\/\/\\/\\// \/\\\\/\//\/\\///\/\///\/\/\/\\/\\\/\/\/\//\///\// \////\///\\/\/\\//\\\\\\///\///\\\////\\\\//\\///\ //\\/////\/\/\///\/////\\/\\\//\\/////////\\/\//\/ \/\\\/\\////\//\\\\\\\///\///\/\////\\/\\\/\\\/\// \/////////\///\\/\///\\/\\/\///\/\\/\//\/\\\/\//// /\/\/////\\\\\///\\/\/\/\\\\/\//\\\\\\\/\/\\\\\\\\ \/\/\\\\//\\///\\//\/\/\/////\//\/\/\\\\///\/\\//\ /\\/\/\\///\/\\\\\\/\/\\\\\\///\//////\//\//\\\\\\ \\/\/\\\\\/\\/\//\\\/\//\\//\\//\\/\///////\\/\\\\ /\/\\\\\/\//\\//\\///\/\\\/\///\\\\\\\\\/\\\////// \/\\///\//\//\/\/\/\\//\\\///\\/\/\\/\\\\\\\\/\\\/ \\/\\\\//\\//\\\\\\\/\//\//\//\\//\//\////\/\\\/// /\\\////\\\\//\\\\\\/\//\\\/\\\/\\/\\\\\\////\//\\ \/\/\/////\\\\\//\\/\\/\//\//\/\///\/\//\////\\\\\ //\//\\//\/////\\/\\/\////\/\\//\/\/\\/\\\\//\//\/ \\/\\\/\\///\/\\\\\/\\\////\/\\\/\/\/\\\\/\\/\\\// //\/\/\/\\\//\/\\/\//\////\\/\/\\/\//\//\\/\///\/\ /\\\\\/\\\///\//\\/\\\\///\/\/\/\/\\/\\\/\\/\/\\// /\/\///\//\\//\\\/\/\/\///\\/\/\\/\////\/\/\\///\\ \/\\\/\/////\\\\/\/////\/\/\//\////\\/\/\/\\/\\\\\ \////\//\/\\\\\/\\\///\\\\\\\//\/\\\//\\/\\/\///\/ //\\\/\\//\\\\/\\\/\\\\\//\\/\\\/\/\//\\/\\/\/\\// \\//////\/\/////\/\\/\\/\\//\\//\/\//\\/////\/\\// \//\\////\/\\\/////\\/\\\\/\\\\/\//\\\\//////\\\\/ \/\\/\\\\\\\/\\\\\\\///\///\/\///\/\\/\///\///\\\\ \/\////\\\//////\\/\//\///\\////\/\\/\/\\/\\//\//\ /\\/\\/\\/\\\/\//\/\\\////\\\\//\\\\\\\\/\\\//\/\\ \\\///\\\\/\\\//\\/\\//\/\\/\/\\\///\/\/\///\/\/// /\\/\\//\\\/////\\//\/\//\\//\\/\\/////\\\\\/\\//\ /\/\//\///\/\\//\\///\/\\\\\\/\\///\\\\\/\/\///\\/ //\\///\/\\\\/\\/\/\\////\/\//\/\////\/\\\\//\\//\ //\/\\/////\///\/\//\//\//\/\\///\///////\\\//\//\ /\//\//\\/\/\\\\\\/\\\/\//\\\/\\/\\///\\/\/\/\\\\/ \\\\////\\\//\\//\\\\//\\//////////\/\/\/\/\/\/\\/ //\////\/\///\\\//\//\/\////\\//\\/\\\////\\\/\/\/ \\\\\\\\//\///\/\\\\/////\\//////\\/\////\//////\/ \/\\/\\////\\\\//\/\\\\/\/\///\/\\/\\\/\///\//\\// \\/\/\/\\//\\\/\//////\/\\\\////////\\\\\\\\\/\//\ /\///\\\\//\\//\\\/\\\\\\\\//\\/\\\\\\\/\\////\/\\ ///\\\//\/\/\\\\\\\\///\//\/\\\/\\/\\\\\//\//\\\// \///\///\//\/\\/\\/\/\\//\\\\///\/\\\/\//\\\//\\/\
⭐ ⭐ ⭐
(9/16) this is some more pursuit of an idea that started out this blog post, using a rough check with precomputed database seeds. there seems to be some kind of rough connection between the ‘cm’ max index and the number of bits in the starting iterate/ seed, a rather glaring factoid in retrospect that mostly eluded/ evaded a lot of prior analysis. heres another quick look at that. this looks at the uncompressed trajectories and the position/ index in the trajectory where the cumulative div2 operation count is equal to the number of starting bits.
the inspiration behind this idea is the following intuition: a 3n+1 operation is sort of like a substitution of the bit string that “replaces/ overwrites/ substitutes/ adds” a few bits (1-2 max) on the iterate, whereas the div2 operation “deletes” bits. ie maybe the increasing trajectory only can happen in the region of the bit string corresponding to the original bit string length.
the code again works with the 4 nontrivial glide methods 1st,5th,6th,7th, finds the glides with ‘cg’>20, sorts them descending (red line), and then determines the index of the trajectory corresponding to the div2 operations (green) equalling the initial bit string length (blue). there is a rough correlation in the green and blue lines as guessed. in the 1,5,6 methods the div2 op index is usually less than bit string length, and in the 7 method theres some interchange. the lower ‘cg’ metric is not so meaningful given that the 3 metrics are computed on different “scale” trajectories, the ‘cg’ is computed on the compressed trajectory and the other two are computed on the uncompressed trajectory.
now an immediate next question would be, what is the gap between them, how big can it get, is it increasing, or could it be bounded? a greedy search on that is the obvious next thing to try.
💡 ❗ ⭐ 😀 this is that code, attempting to greedily maximize the difference between trajectory max index ‘cm2’ and ‘i2’ the index of base2 ops = initial bit width, a new metric ‘g’. it was run for several different random selection blocks
ARGV= [50,100,250,500] and 10K iterations but initial results look very promising and seem to suggest that ‘g’ could be bounded. red is ‘g’ metric, green is the initial bit width. note all metrics computed on the uncompressed trajectory. the higher block sizes generate more points but dont push the bit width as high, and therefore the first graph seems to show the plateau/ bound effect being analyzed/ sought here most substantially. in all cases the bound seems max ~150. a bound here is a big deal 😮
something else strange, upping the count to 20K iterations for 50 block size does not seem to find more points, need to look into that further. seems to be some “painting into corner” effect.
(9/20) this is a slightly different approach on that code that outputs the new points rather than the unseen new points, run for a large 20K iterations,
ARGV=100 block size. it shows that the gap ‘g’ metric (red) is quite noisy but actually (gradually) declining overall for larger starting seed lengths (green). the prior runs were mostly outputting the early spikes. so that certainly explains the “painting into corner” effect. while nevertheless informative somehow this seems kind of messy to me (tend to avoid plots with very many points) and was thinking of ways to improve the plot and have some ideas also with an expanded scope somewhat analogous to the versatile/ oft-used multi algorithm seed generator (maybe one of the so-called “greatest hits”).
heres another take with some code that optimizes/ greedy searches using a 1-step frontier lookahead instead, again
ARGV=50. the results are similar/ different. the ~150 bound in ‘g’ remains but the algorithm seems to be better at finding later spikes and any downward trend is not evident at all, and also there is nearly no negative ‘g’ range, and the starting seed bit size reaches far larger at ~400 vs ~250.
(9/21) am now thinking along these lines/ in general that anything limited is quite valuable, so to speak. another way of looking at all this analysis is that its searching for ways that the apparent random walks are not really random, and any inherent limits are connected to this. along these lines an earlier experiment looked at the parity sequence within the core region ie the 0-1 sequence denoting the sequence of div2 or 3n+1 ops, and found some limits on the max size of runs. just to review this, this db-analysis code again takes the top 4 glide methods but in order of those with more long glides first, ie the order [5, 1, 7, 6]. again ‘cg’>20 glides are selected but here sorted by starting seed width instead and the index of core (blue), and the max 0 and 1 run lengths in the (core-only) parity sequence are graphed (magenta/ lightblue for 3n+1 and div2 run lengths respectively), finally the seed width in green and glide length in red.
there is not much connection between glide length/ seed width (red/ green) and core index (blue) except for method #7 where there is some clear rising linear correlation. the run lengths seem to be mostly limited or bounded except maybe for moderately rising 3n+1/ magenta on the 1 method.
this code does a 1-step lookahead greedy search on the core parity 0 or 1 runs and outputs seed length (red), core index (green), and core parity 0 or 1 runs as blue. here are two runs for 10K iterations and 100 block size, 0/1, with parameters encoded into the filename. it looks like optimizing for 0 runs means that the core index gets bounded early. 0 runs are bounded at ~70 in this “sample”, 1 runs at ~40 but am not sure how stable this is. (another m1 run leads to no run ceiling, so its something of a paint-into-corner effect.)
💡 ❗ 😎 messing around some more with slightly different optimization makes me wonder. tried an alternating optimizing routine where the selection is alternating between the parity run length and the core index (in a 4:1 ratio) to match the rough ratio of hardness of advancing each, as a simple bi-variate optimizer. then the core index can nearly track the seed width but then pushing up the parity run is very constrained.
so overall it looks like these two metrics may be inversely related somehow. possibly something like this: the core index is a measure of mixing, and if there is high mixing, there is low likelihood of long parity runs (in the core). seems quite plausible. now, how to show that? and what are the bounds/ limits if any? this graph (scatterplot) seems to tell a lot of that story. however even with increasing core index (green), the core parity 0-run levels off to ~50 (blue). (the bilinear effect in the green/ blue lines is because optimizing by one metric messes up the optimization by the other.)
the parity 0-run reaches ~80 without levelling off on a 2nd run (identical params), exceeding the core index at the end. (the output of only even lines was unintentional carryover but harmless.) there is possibly some deceptive aspect to this graph where points with the combined optimal values dont exist because the optimal values (maybe) dont “coincide,” that will take more analysis to determine. eg a simple method would be to sort point order in the plots by the different metrics.
💡 another way of looking at this is that separately optimizing by different metrics does not necessarily lead to combined optimizations; some of the optimizations could be inversely related. thinking about this and how to deal with it/ overcome it gave me an idea similar to some neat/ clever/ingenious analysis code in the earlier genetic algorithms, intending to reuse/ redeploy that next. was also musing, seems there also might be a more general strategy in here of trying to push a bunch of optimizations all together and see which ones (if any) “break” ie run into limits/ ceilings/ bounds. ie maybe any bounds are pointing at potentially exploitable/ generalizable properties.
⭐ ⭐ ⭐
(9/23) 💡 ❗ 😳 this is some sophisticated code built to gear up for an experiment but without actual collatz code that has quite a bit of a back story. a lot of prior experiments use quicksort but really only need an insertion sort. quicksort is n*log(n) time, and maybe some variants get close to linear if the array is already mostly sorted, but presumably this is not happening with the ruby implementation, and hence this resorting takes significant time in some of the algorithms.
so then started experimenting with an insertion sort with a custom comparison routine. took awhile to code/ debug this and then finally realized that a pairwise comparison routine could not give a vector ordering except maybe for some simple approach eg vector distances. but since this code could be useful anyway, decided to include it.
as mentioned, while experimenting with the genetic algorthms, came up with the idea of an ordering method on vectors/ tuples based on their relative element values. a much simpler approach would be to convert the vectors to distances, but unfortunately its not clear how to weight the separate components and that might tend to bias that sort approach (the largest components will dominate the distance even though maybe relatively wrt other components they should be devalued).
the basic idea is to consider tuples as pairwise competitions of elements where one tuple may dominate the other. this is a partial ordering. but it can be converted to a full ordering by comparing every vector with every other and summing the “pairwise vector domination counts”. this is a n^2 algorithm. and it would be really nice to figure out a way to handle it incrementally.
that turned out to be quite a challenge. the conceptual background is not so complex but the devil is in the details. full disclosure, this took virtually an entire day to code and debug. it has a sophisticated test harness.
the code starts with 3 sorts using the fairly simple insertion sort and single-valued comparison method. then it gets into the testing of the tuple comparison. the code has 3 datastructures: the array of tuples, the hash which keeps track of pairwise domination counts (both pairwise between elements, accumulated, and pairwise between tuples), and finally a list which contains the total domination counts for each tuple.
then the challenge is to maintain these structures incrementally. the code allows pushing a new tuple into the array and incrementally calculates the pairwise domination counts. there is a delete function which allows deleting any tuple in the array and adjusts the pairwise domination count hash. this basically amounts to deleting a row/ column from a matrix. the code also adjusts the total domination count array (basically rankings with more/ most dominating tuples having higher values). the test harness calculates the pairwise and total domination structures from scratch vs incrementally and demonstrates they are equivalent. so the incremental operations are linear instead of quadratic. it exercises key cases where the tuple to delete is at the front, middle, or last of the array.
it will be interesting to try to compare results for live algorithms with much simpler distance ordering methods and see if there is some superiority to this approach. had a real sense of accomplishment after finally debugging this code after literally hours of intense thinking. but must admit after all this trouble, think its possible that a simple distance measurement combined with elementwise-scaling of the vectors might lead to similar results. cant know a priori and am looking foward to eventually doing that benchmark. anyway the tuple ordering has many interesting theoretical properties and it was an excellent intellectual/ logical/ mathematical exercise.
(9/27) that code was posted quickly and it does not have the incremental ranking array logic. that is fixed with this new version.
😳 on 2nd thought this algorithm is not quadratic because the array-shrinking code has to copy the entire subarray “outside” of the deleted column/ row. so whats the linear alternative, does one exist? never thought about it much, but consider how a linked list converts a O(n) delete operation to O(1). so theres the obvious answer right? has anyone heard of a 2d linked list? hmmm, have to think about this a little…! cant be that hard right? 🙄
💡 ❗ managed to get this code integrated today with the trajectory optimizing code, preliminary results look interesting. had the remarkable thought that this is basically a sort of discrete gradient ascent method! hmmm… 😮
(9/28) ❓ 😮 😳 some big cognitive dissonance that took awhile to untangle! an at first surprising/ unexpected/ so far inexplicable result, thought it was a bug but looked correct. after hooking everything up, even with substantial 50 random size block, the code resulted in generating seeds where i2=w where ‘w’ is the bit width and ‘i2’ is the trajectory length after ‘w’ div2 operations (lightblue points perfectly coincide over the green points hiding the latter). weird! didnt know how to explain this at first and was thinking it quite remarkable esp how it occurs for every single point even early ones. hopefully, maybe a big deal, something exploitable? some kind of hidden order in traversal enumeration?
but the answer is very simple, ‘i2’ is always equal to bit width based on the 3n+1 and div2 sequencing that adds one element to the trajectory for each “pair” of operations (where a single div2 without a prior 3n+1 op is considered a “pair”). am still including this code because otherwise it does not seem to be incorrect. ‘i2’ was introduced back with
review16/ extend4 and didnt catch it until now! need to rethink that idea and maybe redefine it in a nontrivial way… so much for coming up with an ingenious new metric! some defense is that while obvious in retrospect/ 2020 hindsight this is all definitely not obvious just staring at the code 😛
at least all the fancy comparison code seems to be running flawlessly! score another victory for test-driven development! a commented line at the end does a very strict consistency/ sanity check wrt total tuple rankings computed from scratch vs incrementally that exercises a lot of code. another little bit of commented code does a quick check of the metric evaluation across the database seeds.
fyi one run of this, not pictured, found an abnormally large/ outlier size ‘g’ value ~350 so the plots where it stays well below that are apparently “looks deceiving”. the plot below does have a ~250 ‘g’ value at about ¾ across (blue). magenta is the max ‘0’ run lengths in the (core) parity sequence. ‘cm’ is in red.
even more substantially/ encouragingly it appears that some of the previously conjectured bounds (namely ‘m0’ the core ‘0’ runs and ‘g’ difference metric the max index minus seed bit size) are still holding even when the optimizer is simultaneously trying to collectively push all upward. note that the compression of the sequence is crucial in the max index vs seed bit size. for the compressed sequence as in the starting observation of this blog with
review9, bit size exceeds ‘cm’ max index, whereas for the uncompressed sequence here, its vice versa.
so anyway now then musing, how would such bounds connect to a proof? this idea of using gradient ascent to find inherent ceilings/ bounds now seems very promising, is there some general theory here?
(9/29) ❗ 😮 this is some slightly adjusted code that works with the high compression trajectory and optimizes 7 parameters simultaneously. A new ‘i2’ metric is computed wrt the compressed sequence for a nontrivial variant instead but not bothered to be renamed here (see the difference?). the results are surprising. the ‘cm’ and ‘c’ (index of core) metrics seem to level off/ plateau/ be bounded. think previously ‘c’ was not optimized for(?), now thinking that is a major oversight. now wondering if they “push” against each other? ‘m0’ and ‘m1’ look bounded as conjectured.
these plots use an apparently fairly new feature of gnuplot where one can name the columns (in the datafile; shoulda been using that years ago eh?!) the runs take about ~80m and 3 different runs lead to similar results. one must keep in mind the combined optimizations may lead to different results than the individual optimizations. the “interplay” between the optimizations is possibly very significant for understanding inherent constraints and apparently now need to think about how to explore that more systematically…
here is a ~90m run of an optimization of [‘cm’, ‘c’] metrics only and the code is modified slightly to separately specify the output metrics from the optimization metrics. the core index starts to plateau as above but the ‘cm’ index of max does not appear to/ clearly plateau in this range. want to run a longer range but maybe will have to do it overnight.
these are shorter ~35m runs optimizing by only a single metric by changing the w1 assignment at the start of the prior code to ‘cm’, ‘c’ respectively, and block size was changed from 100 to 20. ‘c’ alone probably has not been optimized before. ‘cm’ ofc is a standard generator algorithm used “early on” as a generator but dont recall it analyzed for higher generation ranges/ hardness say after 2K iterations. esp in 1st graph it shows one seems to increase at the expense of the other even when only optimizing by the single metric. in the 2nd graph it looks like it was a mistake not to do a search/ long runs before now; ‘c’ appears to plateau. (but note the smaller bunch size makes local minima more likely… had to interrupt this run at very end, and need to do more) 😮