collatz unvanquished

(still hanging in there despite being in the middle of a tumultuous 1-time in lifetime personal hardship…) have been pursuing many miscellaneous ideas and firing off many different algorithms without much particular direction, with mostly null results. then was looking closely at backtracking patterns of 1-runs (riffing off the direction of last month's backtrack21 code, which found the stripe pattern in the msbs) and ran into this idea/ finding/ feature, maybe not obviously aligned with overall proof ideas but definitely worth documenting. not sure how exploitable it is, but it was a little surprising. it uses a simple backtracking algorithm (with restarts) to create 20-length pre-trajectories on a 50-length 1-run in a 100-bit-width seed with ½ density random background. it was then found that there is a signal that persists for about ½ as many iterates as the length of the initial 1-run, ie ½ * 50 = 25 in this case (graphed to 20).

the signal is a “twin string” pattern (in binary, what else?!). this code does 100 test cases and then plots the average length of this twin/ duplicate string along with its standard deviation. a pattern is of the form xssy where x is the prefix, y is the suffix, and the string s is repeated twice. the string s starts out as short 01-pair runs and then enlarges to “larger” patterns (that is, from the pov of the seed moving backwards). the patterns increase in size in the sense of becoming less repetitive, and the larger patterns become more irregular. need to build up other ways to visualize this additional internal structure not revealed here. what is somewhat surprising is how “deep” this signal persists, and that the pattern was not detected in earlier bit/ grid plots. in a sense, this effect demonstrates that highly regular patterns can look random visually; a more specific explanation is that the repeated string s gets more “random looking” away from the 1-run, and visually it's hard to detect adjacent twins (strings) if the twin is longer/ random. additionally wrt its structure, in a sense the “edges” of this initial pattern are “dissipated” going backwards and its size shrinks. this is in contrast to the other 0/1-run patterns long investigated that “dissipate” “going forwards” instead.
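as a rough illustration of the xssy form, here is a minimal sketch of a twin detector. it is not the code behind the plots; `longest_twin` and `twin_parts` are hypothetical names, and it assumes the "twin" being measured is the longest adjacent repeat in the binary string.

```ruby
# Brute-force search for the longest adjacent repeat ("twin") in a string.
# Returns [start_index, length] with str[start, length] == str[start + length, length],
# or [nil, 0] when there is no adjacent repeat. Assumption: "twin" = longest such repeat.
def longest_twin(str)
  n = str.length
  (n / 2).downto(1) do |len|
    (0..n - 2 * len).each do |i|
      return [i, len] if str[i, len] == str[i + len, len]
    end
  end
  [nil, 0]
end

# Split a string into the x, s, y parts of the form x s s y (nil if no twin).
def twin_parts(str)
  i, len = longest_twin(str)
  return nil if i.nil?
  { x: str[0, i], s: str[i, len], y: str[(i + 2 * len)..-1] }
end
```

eg `twin_parts(n.to_s(2))` on an iterate n gives the pieces, and the length of `:s` is the quantity to average over test cases. the search is cubic in the worst case, which should be tolerable for ~100-bit strings.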

😳 holy cow! big/ long undetected defect! the stat function was refactored quite a few months ago to use the inject function instead of loops. have been using this function in a lot of places, and it's “approximately” correct for larger datasets, but it has a serious glitch where 1 of the datapoints, the 1st, is computed incorrectly in the sum of squares calculation. the glitch is that the inject function starts with the 1st value of the array as the accumulator if no initial value is given as a parameter. in this case however one wants x^2 and not x in the summation! wow! alas none of my readers caught this, lol. it was only detected here because the variance inside the standard deviation calculation was coming up negative for close datapoints (very low standard deviation), which then results in 0 in this function, ie in the prior buggy version; this code has the fixed logic. in 2020 hindsight, rushed off the refactoring too fast, went with “plausible looking results” while chasing down other ideas, and had no side-by-side comparison/ unit test.
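the glitch is easy to reproduce in isolation. this is a minimal sketch, not the original stat function; the names `sum_sq_buggy`, `sum_sq`, `stdev` and the clamp-to-0 on negative variance are assumptions made to mirror the description above.

```ruby
# Buggy: with no initial value, inject seeds the accumulator with the first
# element itself, so the first term enters the sum as x instead of x**2.
def sum_sq_buggy(a)
  a.inject { |t, x| t + x ** 2 }
end

# Fixed: an explicit initial value of 0 sends every element through the block.
def sum_sq(a)
  a.inject(0) { |t, x| t + x ** 2 }
end

# Standard deviation via E[x^2] - E[x]^2. The clamp to 0.0 for a negative
# variance is an assumption about how the old function behaved.
def stdev(a, sum_sq_fn = method(:sum_sq))
  n = a.size.to_f
  mean = a.sum / n
  var = sum_sq_fn.call(a) / n - mean ** 2
  var < 0 ? 0.0 : Math.sqrt(var)
end
```

for `[10, 10, 11]` the buggy sum is 10 + 100 + 121 = 231 instead of 321, the variance goes negative, and the result collapses to 0, while the fixed version gives about 0.471. a side-by-side check like this is the kind of unit test that would have caught it.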



(11/22) am just poking around at different things and wanted to file this for future reference, nothing earthshaking. it creates ½ density iterates of varying widths, 10 to 200 in increments of 10, and then looks at them in base 6. it's found there's a pattern in the msd (most significant digit, base 6). it was found in the grid layout in graph 1 with the msd left aligned (leftmost band), but it may not be so clear in this shrunk graph, so it's plotted in more detail in graph 2. this is probably easily explained via some kind of straightfwd numerical property, but didn't expect it and actually don't see an immediate explanation myself. my impression is that it relates to the msd being a (rough) approximation of the base-n logarithm, where n=2 or n=6, so that binary values distributing one way affect the base 6 values distributing “the other way”.
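a minimal sketch of the msd measurement, for reference. the helper names are hypothetical, and it assumes a "½ density" value of a given width has its msb set with the remaining bits chosen uniformly at random, which may not match the generator actually used.

```ruby
# Random value of exactly `width` bits: msb forced to 1 (assumption), every
# other bit 1 with probability 1/2.
def half_density_seed(width, rng = Random.new)
  bits = "1" + Array.new(width - 1) { rng.rand(2).to_s }.join
  bits.to_i(2)
end

# Most significant digit of n written in the given base.
def msd(n, base)
  n.to_s(base)[0].to_i(base)
end

# Histogram of msd values over `samples` random seeds of one width.
def msd_counts(width, base, samples, rng = Random.new)
  counts = Hash.new(0)
  samples.times { counts[msd(half_density_seed(width, rng), base)] += 1 }
  counts
end
```

under that assumption a width-w value lies in [2^(w-1), 2^w), a factor-of-2 window, while a base 6 digit position spans a factor of 6, so the set of reachable leading digits shifts as w changes. that may be part of the pattern, though it has not been checked against the graphs.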




(later) this is not easy to explain and will have to delve into it more to understand it. this sorts ½ density seed trajectories by the 10th iterate, and then looks at the 1st iterate in the (newly ordered) list in 2 different ways, a grid plot and then just the logarithm. in the 1st plot the msbs (right side) have a pattern. the signal is striking but not obvious to explain. it has to do with how the ½ density iterates still have a range of values affecting/ correlating with the longer term semirandom walk, and also how the random walk tends to be staggered in a patterned/ jagged way wrt the logarithmic scale of the iterates. in the 2nd plot there are ~7 groupings/ clusters (teeth in the sawtooth so to speak) and this seems to relate to the jagged walk(s) ending in endpoint clusters. but am having a hard time visualizing how the strict monotonic increasing order is maintained inside clusters. it seems to be a sort of holographic-like effect where local and global effects are reflected in each other so to speak. another related angle is looking at the way that msbs and lsbs interact, wrt sorting…

💡 oh! there are as many msb suffixes as clusters, and each cluster has the same msbs, so that the cluster gets sorted by the lsbs. it's notable/ striking here how different views conceal or reveal different aspects of the same data. eg the striking ordering-within-cluster aspect is utterly undetectable in the grid plot, and the (identical) msb groupings are not exactly obvious in the line plot: the groupings are clear, but it's not so obvious there's an invariant (partitioning) associated with each group. also from this one would expect msb grouping counts to be powers of two. it's interesting to contrast this with the many earlier experiments/ ideas that have looked at lsb groupings/ “prefixes” at the opposite end.
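the sort-then-group idea can be sketched as follows. this is only an outline under assumptions: it uses the compressed map n → n/2 or (3n+1)/2, which may differ from the map in the actual code, and all helper names are hypothetical.

```ruby
# One step of the compressed Collatz map (assumed form).
def step(n)
  n.even? ? n / 2 : (3 * n + 1) / 2
end

# Seed followed by `count` iterates.
def trajectory(n, count)
  list = [n]
  count.times { list << step(list[-1]) }
  list
end

# Reorder trajectories by their k-th entry (index 0 is the seed).
def sort_by_iterate(trajs, k)
  trajs.sort_by { |t| t[k] }
end

# Partition fixed-width values by their leading `bits` bits.
def group_by_msb(values, width, bits)
  values.group_by { |v| v >> (width - bits) }
end
```

the point of `group_by_msb` is the observation above: within one msb group every value shares its leading bits, so any ordering inside the group is decided purely by the lsbs.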




(11/25) 💡 ❗ that last experiment of “slicing and dicing” (½-density) trajectories in a relatively new way gave me some new ideas. was graphing some of the output and then stumbled on this. do not recall ever simply graphing ½-density iterate trajectories! was doing so based on different sortings, following the last code (based on later-trajectory points), and then decided to do the simplest experiment in a sense. this simply sorts/ color codes 100 ½-density trajectories by trajectory length in the 1st graph. notice anything unusual? it seems that there is an initial bump or spread in the trajectories, but then they tend to smooth out more…? is it just my imagination? the numbers don't lie!

the next analysis code looks at the slopes of 2 segments, the 1st segment from the start to a later point, and the 2nd from the later point to the end. it computes the standard deviations of the slopes for the 2 segments, m1s and m2s, over 500 trajectories. the later points are sampled/ moved in increments of 5 from 5 to 100. then in the 2nd graph these are plotted along with the slope averages m1a, m2a (the latter 2 with no detectable difference). the trend is quite striking and clear. there is a very high initial standard deviation in slopes that decays (green), vs no decay in the 2nd segment (magenta). in other words all the standard deviation variation can be attributed to the initial parts of the segments, apparently less than 5 points, after which it apparently totally disappears in the 2nd segment trend, but also causes the slow decay in the 1st segment.
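a minimal sketch of the 2-segment slope statistics. it assumes each trajectory is an array of log-scale values and that "slope" means the endpoint-to-endpoint (secant) slope; the actual code may use a fit instead. the function names are hypothetical, while m1a/m1s/m2a/m2s follow the labels above.

```ruby
# Secant slope between indices i and j of a series (assumed slope definition).
def slope(ys, i, j)
  (ys[j] - ys[i]) / (j - i).to_f
end

def avg(a)
  a.sum / a.size.to_f
end

# Population standard deviation, computed from deviations about the mean.
def sd(a)
  m = avg(a)
  Math.sqrt(a.inject(0.0) { |t, x| t + (x - m) ** 2 } / a.size)
end

# Slope statistics for the segment before and the segment after index `cut`.
# Every trajectory needs more than cut + 1 points.
def segment_stats(trajs, cut)
  m1 = trajs.map { |ys| slope(ys, 0, cut) }
  m2 = trajs.map { |ys| slope(ys, cut, ys.size - 1) }
  { m1a: avg(m1), m1s: sd(m1), m2a: avg(m2), m2s: sd(m2) }
end
```

calling `segment_stats(trajs, cut)` for cut = 5, 10, …, 100 would give the 4 curves. note that the 2 segments have different lengths for every cut, which matters when comparing m1s against m2s.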

another experiment to compare this with is construct79 from last month. it was very challenging, but the basic question, and a long-pursued theme, is: what is the difference between ½-density iterates and the drain? all the experiments with “ufos” tend to focus/ push on this question. from another angle, iterating a ½-density seed some number of iterations likely pushes it into the drain, and is it then statistically any different? construct79 revealed a difference, but it was very subtle. this also reveals a difference; don't know if it's the same one (seen from 2 different angles). this new finding seems to be stronger. it also suggests, remarkably, that iterates very close to ½-density are not exactly in the drain. in other words, the drain seems to be associated (more) with slightly-non-½-density iterates. immediately this evokes various measurements of a density distance metric, likely worthwhile to study next.




(11/26) 😳 😮 on 2nd thought maybe this is only measuring that the standard deviation in the slope of a random walk is high(er) for shorter lengths, but doesn't change over the different sections of the random walk. maybe need to normalize for equal lengths of random walk. immediately trying this, the effect disappears! (the new ‘m2s’ coincides with the same ‘m1s’ decay trend in the last diagram.)
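the length effect shows up already in a plain ±1 random walk, with no collatz structure at all. this is a minimal sketch under that stand-in assumption; `walk`, `slope_sd` and `length_effect` are hypothetical names and the seeded generator is only there for repeatability.

```ruby
# Simple +/-1 random walk starting at 0 with `len` steps (stand-in model only).
def walk(len, rng)
  pos = 0
  [0] + Array.new(len) { pos += rng.rand(2) == 0 ? -1 : 1 }
end

# Standard deviation, across walks, of the secant slope between indices i and j.
def slope_sd(walks, i, j)
  slopes = walks.map { |w| (w[j] - w[i]) / (j - i).to_f }
  m = slopes.sum / slopes.size
  Math.sqrt(slopes.inject(0.0) { |t, x| t + (x - m) ** 2 } / slopes.size)
end

# Compare an early short segment, a later short segment, and a later long one.
def length_effect(seed, count = 2000)
  rng = Random.new(seed)
  walks = Array.new(count) { walk(200, rng) }
  { early_short: slope_sd(walks, 0, 10),
    later_short: slope_sd(walks, 100, 110),
    later_long:  slope_sd(walks, 100, 200) }
end
```

for this model the slope over a segment of L steps has standard deviation 1/sqrt(L), so the 2 short segments should agree with each other and the long one should come out smaller, regardless of where in the walk they start. that is consistent with the effect disappearing once segment lengths are equalized.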



1 thought on “collatz unvanquished”

  1. gentzen

    I remember a similar experience to your buggy stat function: Debugged more than a day at work, because numpy.vectorize uses the first computed value to determine the type of the returned numpy array (if you don’t specify that type explicitly). But the vectorized function had a case distinction, and the first returned value happened to be “0”, so numpy.vectorize used “int” as type.

    Sorry about your 1-time in lifetime personal hardship.

