collatz hammers & nails

its not as easy to come up with new titles for these collatz blogs! but its also maybe one of the least challenging aspects of the overall prj! this new monthly installment is named after a thought that came into my head today, from an old CS saying, but which maybe even predates CS (it would be fun to know its more precise origins). “when you have a hammer, everything looks like a nail.” but on the other hand, not everything that looks like a nail is not a nail. the new multivariate discrete optimization routine is turning out to be useful, maybe something of a new tool or hammer, and leading to some new insights. will it be enough to claim the big prize? collatz is a very difficult nail, and a sort of ongoing/ underlying zen question/ theme of this prj asks, if it is even a nail, and even if a collatz hammer were eventually found, could it be useful for anything other than the collatz nail?

the last blog, now rather extensive from all the addendums (and thereby maybe reflecting some burst of ideas/ energy), ended with an analysis of optimizing the max index aspect ‘cm’ alone and found the core index value ‘c’ did not increase in that scenario, basically flatlined at a low value. so its worth looking into the different optimization where ‘c’ is also part of the optimization criteria and whether that changes any of the results. in other words sometimes other metrics besides those being chosen in particular get “pushed” upwards “for free” or “unintentionally” (so to speak) without being specifically optimized for, but if they dont, is that still the case when they are pushed on purpose by the algorithm?

so this code answers that by optimizing both ‘c’ and cm-c (named ‘cmc’ in the code) and finds it doesnt help any as far as pushing ‘c’ upward while ‘cm’ still increases. also there is a clear indication of an apparent tradeoff where ‘c’ starts out larger but gets pressed down as the ‘cm’ trend increases.

this code is also poking at a very notable region that has been pointed out before. in a sense the “unresolved” part of the problem is the region after the trajectory seems to necessarily converge toward the core, but afterwards still increase for apparently longer intervals that are not bounded. another overall direction of these last results seem to be that the convergence-to-core interval, which previously seemed like a significant part of the overall analysis, is maybe decreased in importance. it seems possibly to be bounded and also pushed down heavily when the more important ‘cm’ increases. (100 batch size.)



(10/5) 😮 am now thinking that maybe have had a long mental block/ avoidance about comparing metric limits/ bounds with the seed bit width, possibly because it means that resolving trajectories is roughly a linear time or linear space operation, which would then mess up/ escape some of the “theorem-proving” techniques associated with the FSM transducer (which works for some constant time computations). in light of recent results, and re the hammer/nail metaphor/ theme, this may have been a blind spot, focusing too much on (nonexistent?) low-hanging fruit, or a case of the drunk looking for his keys under the lamppost, so to speak.

to add further fuel to that fire, here is another simple single-variable optimization that seems to show a bound related to seed width. this simply calculates glide length (for the compressed sequence) minus initial seed bit width as ‘g’ and maximizes it, using the non-multivariate code. the code runs very quickly for many iterations, reaching 20k in only a few minutes. batch size 20. the big payoff/ reward is what seems like one of the most bounded plots seen in quite awhile.



(10/6) just wanted to try this, dont think theres a look at it before now and have a nice 1-step lookahead optimizer thats generally doing great. the “sibling” trajectories have a concept of a “unseen” part of the trajectory comparing the higher to the lower sibling. does that increase without bound? kind of expected this (it seems to be evident even over the db seeds) but wanted to look anyway, this is a long 20K run that shows the answer is likely yes.



(10/12) this looks at the max length of iterations for “density crossings” across the ½ density pt inside the core for uncompressed and compressed trajectories (green, magenta). there seems to be a ceiling for both (although less pronounced for the compressed) whereas helping confirm the trend, the seed bit sizes continually increase (red, blue).



(10/19) this code revisits looking at the mx max nonmono run length of the right side of the glide, ie after the peak, which was analyzed a few months ago, but this time using the new 1-step lookahead code. ‘mx’ seems to grow more slowly here, not exactly sure why that is. ‘ns’ is the seed bit width, ‘lr’ is the right side size, ‘ls’ is the glide length. but however notice, “reading between the lines,” that the remainder left side is apparently not increasing…? ❓



💡 😮 ❗ this immediate followup analysis could be a very big deal. it shows that earlier results may have been misleading/ missed/ overlooked in a way that was recently isolated/ explained. the optimization code could be somewhat biased in a surprising sense of splitting into eg two populations that have totally divergent/ different properties, but the graphs might obscure those properties.

this code is a similar approach to a prior analysis that tries to optimize ‘mx’ for left and right sides of the glide simultaneously. but the results are that it can only optimize on one side or the other, but not both! that is apparently previously hidden, but can be seen in two extra statistics graphed here. mx1+mx2 is the sum of both mx values for each side (lightblue) and rises nearly as either side does separately. also the max glide index in yellow alternates between an early bound and a later bound, close to the glide length. (mx1 in blue is almost totally overplotted/ obscured here by the last two.) the immediate obvious next step would be to try the multivariate optimizer on both mx1 and mx2.



(10/20) ❓ the code was not hard to adapt and the discrete multivariate optimizer for left and right mx values is below. it “runs out of steam” on the left mx. so is it possible that both cant be increased indefinitely? need to do a few more runs to see what happens & if results vary, this one took 50m.



(10/21) this code tries out the component-wise scaling for the global sorting/ optimizing search. it subtracts the mean and scales by the standard deviation for each component in this case mx1, mx2 and optimizes based on sum of components. the dual regime/ alternating pattern returns suggesting a weaker optimization effect/ outcome. otherwise results are somewhat similar eg the mx1 parameter seems to be stuck at a low ceiling (but has nonzero values, suggesting the multivariate optimization is ok, but need to do more experiments, eg comparing it to an alternative eg maximizing mx1, mx2 max for each trajectory as in earlier algorithms). ie it seems to be correctly optimizing across multiple components instead of only one at a time but maybe only weakly compared to the prior algorithm. the algorithm speed is much faster even without incremental logic, it only takes ~1m to run. ‘ls’ is the trajectory length. looking at this, wondering if there is some way to strengthen the multivariate vs single variate effect?



❗ this is somewhat surprising, just the tiny change of taking out the componentwise division/ scaling by the standard deviation seems to have a major different effect that maybe pushes the multivariate objective better at least for this parameter setup seen in the ‘mx1’ plateau instead of the alternating regime pattern. however maybe this run was a bit anomalous; another run looks about the same as the prior one. note mx1 and mx2 plot order/ colors were switched in this graph for better visibility.



(11/2) was wondering about a particular metric which is the density of local nonmonotone runs in the larger right side descent and hasnt been optimized before, but accidentally looked into monotone run density instead, nearly the inverse of the former. that is labelled “d2” here and optimized, but also over glide length. it seems to peak at around ~95% (red, left scale) even as the seed bit size gets larger (green) but the right side size seems to paint into a corner (blue). there is some variation in runs.



(11/4) this has the intended metric which is density of local nonmonotone runs in the larger right side descent, ‘mxd’. it does seem to be bounded and peaks at about 80%.



this is an ambitious idea that combines some earlier ones. the stat_test4.rb and db4.rb code from awhile back was used to attempt to linear curve fit the glide completed ratio with various basic trajectory statistics such as msb and lsb densities. these ideas are combined with two optimization phases in the following code. the 1st optimization just finds larger glides, more than 50 each with at least 20 length. across all that gives 1294 trajectory points (for this particular run; ofc it varies each time). the regression is applied to these points. then the optimizer is used to try to find larger glides that maximize the (average) error with the linear estimation (along with glide length, added after a run optimizing only for error painted into a corner ie failed to increase glide length from a low value). this all mostly implements an older 2-phase idea of “find a glide predictor and then attempt to thwart it” not simultaneously but separately/ sequentially.

the average error is graphed as ‘e’ (right side scale), glide length ‘ls’, peak glide index ‘j’. there is some fancy lambda logic as optimization parameters to find the glides with suitable properties and to apply the estimation in the 2nd pass. something that appears quite notable is that the error tends to level off even as the seed size and trajectory lengths increase more steeply, but also however error is definitely increasing (alas). also notable is that the error splits into two regimes and the lower one is obscured to the right by overplotting. theres also a 2-regime effect also for the glide length and glide peak index which presumably coincide. actually one might reasonably guess all the 2-regimes coincide but thats an assumption.

(whenever these regimes show up, it pays to reanalyze the data for the separation/ partitioning effect that can lead to deceptive plots, but havent done that yet here. the basic question is whether the different regimes line up/ coincide across points. a very simple approach that has big effect is to just sort all the points by one of the components and regraph to find hidden correlations.)



(11/5) 💡 😎 ⭐ some thoughts/ musing on the big picture a little more. the general theme now is that there are all kinds of (nonrandom) metrics to measure of trajectories and they have interrelationships and apparent bounds esp when compared with real random walks in which all the same properties are unbounded. in a rough sense “measurable properties”. so the question then becomes, how to systematically explore these properties and turn some of them into universal limits, which are then provable?

heres another really cool idea that recently occurred to me, and has been somewhat staring me in the phase for a long time, but missed a bit, and not sure how to turn it into experimental analysis/ directions yet. the whole enterprise relates to fractal patterns. heres a glaring fractal pattern. a trajectory is composed of subtrajectories in the following sense: a glide has “subglides”! ie in the falling region of a glide after its max, there are other subglides that have starting and ending points. and the long-studied nonmonotone runs are basically like subglides. the recent two experiments measure the min and max density of subglides and find limits on both. maybe something like 10-90%. this seems to indicate that subglides are unbounded in absolute length (and maybe looking for those absolute limits in many prior experiments is now looking a rather quixotic quest) but are bounded in relative lengths. on the other hand it looks like a quite natural reformulation of the problem to study how glides are composed of subglides.

another really beautiful aspect associated with the pov of “glides with subglides” is that its a completely general concept that would be associated with any “baseline returning fractals” not just the collatz problem. and there is an interesting sense in which all decidable problems are “baseline returning”. imagine a Turing machine computing any problem. suppose that it expands its tape usage during the computation to determine halting/ acceptance but then also empties its tape upon/ right before halting. the tape size goes from larger than input/ starting size down to 0, just like with collatz trajectories. (my last few questions on stackexchange CS theory capture some of that correspondence.) then easily provable, all decidable problems can be turned into a baseline-returning tape sequence. in other words theres an apparently natural similarity between collatz trajectories and TM decidable tape sequences (instantaneous descriptions).

(11/8) this code tries instead to minimize the nonmonotone run count ie # of separate nonmonotone runs ‘mc’ over 10k iterations. this particular run saw it slowly declining even near the end but several earlier runs painted into a corner and the ‘ls2’ right glide size and ‘mc’ both flatlined around the midpoints of the graphs. its not obvious in this graph but ‘mc’ (green line) is declining simply to a constant 3.






Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s