collatz deja vu

❤ 💡 didnt expect to focus on collatz so much when starting this blog, but hard problems require (sometimes intense) focus! as you can see/ tell it has sucked me in. in the sense of “femme fatale” maybe? such a much more interesting/ evocative metaphor imho than others that are proposed.

am gearing up for a new saga/ installment/ pivot (because last mos saga ofc wasnt quite enough, lol!). it requires a bit of “psyching up” as the vernacular goes. it involves dredging up some (now) very old ideas that are potentially quite powerful but at this point have cobwebs on them. blast from the past™/ back from the dead™ one might say dramatically. yeah its a possibly big success story of long having something valuable/ powerful in the back pocket and resuscitating it. its also a case of a years-long near miss/ strong/ shiny/ sparkling lead that came close at the time but wasnt exactly/ completely pursued in the moment. what goes around comes around™.

background (“static”), have been assigned a new fat/ significant prj at work and am a bit bummed/ chagrined/ annoyed about that because it cuts into this “fun time with collatz” (so to speak!). usually end-of-year period allows some respite/ hiatus as the code goes into a temporary freeze period. whatever

😎 more painting scenic/ colorful background: on the porch in the backyard this second with cap + sunglasses + headband + tshirt + long sleeve shirt + sweatshirt + thick black leather jacket + long underwear (“layers”) vs the sunrise, next to the nice winebarrel firepit, rose incense in the shamanic wood tower holder imported from ecuador, admiring my nice sculpture collection (from fish tank decor, repurposed!) including the 2 new 2ft faux rock dragon protectors doing a great job so far. if someone is really nice/ or maybe just human to me in comments, may post picture highlights! believe me its well worth it! it started out at 40°F on this fall day but is already up to 50° after only ~30m. the headband + bike gloves with open fingertips defn help! am lately thinking/ musing that maybe the problem difficulty should be measured not in the midnight oil mentioned earlier but in incense sticks burned, and the count is quite numerous/ sizeable so far!

other news: am starting Frenkels book Love and Math 2013, was tipped on this half a year ago by colorful chat denizen 0celo7 RIP (its a long story™… oops post correction it was cohort Balarka) who pointed out the extraordinary porn star “erotica angle” lol! wow, iirc skipped buying this at the time, thought it was a little too fluffy maybe, but now think its an instant classic. he talks about beauty in math a lot the way Hardy did in a Mathematicians apology. poetic! what a great writer!

ok heres the rundown. its back to the FSM/ transducer formulation to try to analyze the long 0/1 runs behavior after mappings wrt unlimited size iterates. looking/ searching thru blog history, generally havent poked at the FSMs in ~2½ yrs, and in the 2nd case it was for another idea besides collatz (prime analysis/ auto thm proving bertrands postulate via SAT etc). seem to have been inspired at the time partly by the Konev/ Lisitsa SAT-EDP work in 2014 and have said so in this blog on occasion. that work is outstanding and highly aligned with certain directions/ themes/ approaches pointed out/ emphasized/ advocated in this blog repeatedly over the years, but it seems like few have gone in the same or similar directions since then (maybe even the authors themselves!). ❓

the most recent blogs on this are from 1/2016 and 4/2016 which utilize the OpenFST library. another motivation at the time was that FSMs have a natural way to “compress” information using the minimization algorithm and this is an early investigation into entropy/ (dis)order concepts surrounding collatz before running into them more directly more recently. the other major “elephant in the room” around here wrt this is this early writeup that references transducer code.

(later) recalled writing up some of these ideas long ago but couldnt recall details. yep, almost exactly 3 yrs ago oct 2015 was musing on transducers and the relations between input and output lengths, citing my SE questions, this one is the closest to current directions. suspect one of the answers got deleted, the current one mentions the deletion, not sure what the deleted answer was about/ who wrote it, alas. (disliked those SE history vaporizing aspects.) the current question under consideration has some vague resemblance.

(11/4) this took a few hours of some careful coding but is basically/ simply a “warmup exercise”. it demonstrates the ideas of transduction and composition of the collatz transducer. the composition concept is alluded to a lot over the years in this blog, have written sophisticated code on that in the past, but none of that code was posted until now, and those following closely (is anybody out there?!?) will understand it will play a key role in this current angle. the transducer constructed was given here a few years ago on the general outline pg just cited. it combines the 3n + 1 and repeated div2 ops in a single pass (1st followed by 2nd) and runs correctly on odd #s (only) greater than 1. there is a binary encoding using 2 symbols per bit, lsb to msb, followed by a symbol 2 terminator, as described on the general pg. some transducer code is also already given on the general pg. this is a rewrite.
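for reference, the combined op is trivial to state directly on numbers, outside the FSM/ symbol formalism; a minimal ruby sketch (function name mine, not from the posted code):

```ruby
# the transducer's combined op, stated directly on numbers: for odd
# n > 1, apply 3n + 1, then divide out all factors of 2, yielding
# the next odd iterate
def collatz_map(n)
  raise ArgumentError, "need odd n > 1" unless n.odd? && n > 1
  n = 3 * n + 1
  n /= 2 while n.even?
  n
end

p collatz_map(27)   # 27 -> 82 -> 41
```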

the fsm3 batch file demonstrates various sophisticated concepts using the FST(M) library cited, ie/ eg/ mainly the composition of 2 (identical) collatz transducers into a single transduction. the new library is compatible with the old AT&T FSM library format. there is a little code to handle ϵ (epsilon) in/ out transitions in the transducer but its not fully general, it is “only” enough logic to work on the collatz transducer format given, where epsilon transitions do not occur alongside other transitions, and there are none at the starting state. the 2-transducer construction also has an fstepsnormalize which effectively takes them out of the final results. theres also an “arcsort” which is required by the library for some algorithms/ input files, basically another simple normalizing operation; it would be more convenient for purposes here if the parsers didnt require this. the test code verifies the 1- and 2-transducer constructions work for the case “27”. there is encoding and decoding of binary-to-symbols (“alphabet”) and symbols-to-binary. then it selects 3 random numbers under 1000 and tests both the 1- and 2-transducers with no failures.
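for sanity, the composed 2-transducer should compute the same relation as applying the single mapping twice; a quick ruby cross-check of the “27” case done directly on numbers (the FST version operates on the symbol encoding instead, function names mine):

```ruby
# sanity check: composing the collatz transducer with itself should
# compute the same relation as applying the single mapping twice
def collatz_map(n)
  n = 3 * n + 1          # the 3n + 1 op
  n /= 2 while n.even?   # followed by repeated div2
  n
end

def collatz_map2(n)
  collatz_map(collatz_map(n))
end

p collatz_map2(27)   # 27 -> 41 -> 31
```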



fst stdout

(11/8) 😀 heres another exercise that is still mostly “warmup for the main event” but advances everything to within shooting distance so to speak, and is again not very much code but nontrivial to put together and debug/ validate carefully. the basic idea is that next steps will require a FSM encoding “all integers with 0/1 binary bit runs less than n”. it is a “bit” )( quite extraordinary that FSMs can encode such concepts/ sets, they are not available elsewhere in number theory, but this is considered rather elementary CS. more crosscutting/ interdisciplinary action/ building bridges!

this took several pieces of code. the 1st batch file constructs 2 FSMs for 10-runs and 9-runs calling the fstlen generator which outputs a symbolic description of the FSM to stdout. the language is fairly easy to understand, there are basically 2 alternating state runs/ branches, 1 for each bit 0/ 1, and which count the # of sequential bits currently generated, from lsb to msb with at least one msb set (ie not ending in a 0 bit). then there is some FST code to compile and determinize these for further processing (as earlier misc cmds from this list). here the arcsort seems to be unnecessary after the determinize step, but its undocumented that/ whether determinize sorts output arcs (actually the cmd documentation really leaves something to be desired).
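the 2-branch/ run-counting state structure is easy to sketch; here is a hypothetical ruby re-creation of what a fstlen-style generator might emit, in the AT&T-style text acceptor format (`src dst label` arc lines, bare final-state lines) — note the real generator works over the 2-symbol-per-bit encoding with symbol tables, so the state numbering and raw 0/1 labels here are my own simplifications, not the posted code:

```ruby
# emit a text-format acceptor (AT&T-style "src dst label" arc lines,
# bare lines for final states) for all binary strings, read lsb to
# msb, whose 0/1 runs all have length <= n and which end in a 1 bit
# (msb set). state 0 = start; other states encode (last bit, run count)
def run_bounded_fsm(n)
  id = ->(b, c) { 1 + b * n + (c - 1) }
  arcs = []
  [0, 1].each { |b| arcs << [0, id.(b, 1), b] }
  [0, 1].each do |b|
    (1..n).each do |c|
      arcs << [id.(b, c), id.(b, c + 1), b] if c < n   # extend the run
      arcs << [id.(b, c), id.(1 - b, 1), 1 - b]        # switch bits
    end
  end
  arcs.map { |a| a.join(" ") } + (1..n).map { |c| id.(1, c).to_s }
end

# tiny deterministic simulator to sanity-check the emitted text
def accepts?(fsm_lines, bits)
  arc = {}
  finals = {}
  fsm_lines.each do |l|
    f = l.split.map(&:to_i)
    if f.size == 3
      arc[[f[0], f[2]]] = f[1]
    else
      finals[f[0]] = true
    end
  end
  s = 0
  bits.each do |b|
    s = arc[[s, b]]
    return false if s.nil?
  end
  finals.fetch(s, false)
end

puts run_bounded_fsm(2)                       # small enough to eyeball
p accepts?(run_bounded_fsm(3), [1, 0, 1])     # 5 = 101b: accepted
p accepts?(run_bounded_fsm(3), [1, 1, 1, 1])  # run of four 1s: rejected
```

the emitted text would still need a symbol table + fstcompile to enter the FST toolchain; the simulator is just a standalone correctness check.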

then FST difference is called on the 10-runs minus the 9-runs sets. it is not completely obvious how to visualize this but the remaining set will have “all” (odd) integers (in binary) with at least 1 10-run in either or both of the 0/1 bits. this can be verified in 3 separate trials which use the helpful/ convenient random path generator from the FSM library to create a random accepted string from the result (difference) set, parse it into the 0/1 binary format from the 2-symbol format using the previous/ recent sym2bin code/ function adapted, and then break up the bit runs for analysis, correctly finding the 10-run instances as either 0 or 1 bit runs in all 3 cases. apparently the randomizer tends to find cases where the 1st runs are 10 length but other samples show this is not always the case.
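the “break up the bit runs” step is just run-length chunking; a minimal ruby version (helper name mine):

```ruby
# split a binary string into its maximal 0/1 runs, reporting each
# run's bit value and length
def bit_runs(s)
  s.chars.chunk_while { |a, b| a == b }.map { |r| [r[0], r.size] }
end

p bit_runs("10111")      # [["1", 1], ["0", 1], ["1", 3]]
p bit_runs(27.to_s(2))   # 27 = 11011b
```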

am thinking more about the mix22 code from last may, it looks at subsequent maximum counts in 0/1 bit runs vs seed sizes and finds they increase gradually. but its a little undefined based on the ruby max operator: if there are multiple equal maximums in a list, does it find the 1st or the last one, and is that consistent/ defined? (easy to check, but away from coding right now, iirc that is the case.) that experiment somewhat directly looks into whether 0/1 bit run lengths can increase in trajectories and answers affirmatively, but on 2nd look it doesnt give a totally conclusive answer, and now looks like a near miss on finding the current guiding conjecture. the current conjecture is that for larger runs, there are consistent decreases (as found recently, ie/ eg gap3 3rd graph from last month), but for smaller runs, its random, and that prior experiment mixes them together in the analysis, and also doesnt seem to discriminate between multiple repeated maximums… taking into account the more recent results, it may be counting new “maximums” that are equal! this all shows (again/ as usual!) how crucial it is to get the analysis right/ carefully/ precisely formalized… ❓ ❗
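re the max ambiguity: ruby's `max` returns a value, not a position, so “1st vs last” only arises when recovering an index afterward; `index`/ `rindex` on the max value make the choice explicit (quick check on toy data; iirc `max_by`-style lookups return the 1st of tied elements, but pinning the position explicitly avoids relying on that):

```ruby
a = [2, 5, 1, 5, 3]   # two tied maximums, at indices 1 and 3
m = a.max             # 5: a value only, ties are indistinguishable here
p a.index(m)          # 1, the 1st maximum's position
p a.rindex(m)         # 3, the last maximum's position
```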




fsm4 stdout

(11/9) 💡 building towards some more insight. this shows some of the remarkable strength of these techniques. eg they can find global minima across all integers although sometimes at large memory expense. the basic question being answered here is “find a (smallest) number with at least 1 n-run that also has at least 1 n-run after 1 collatz mapping”. this takes quite a bit of careful coding (despite the fairly simple description theres a lot of different sets involved/ “juggled”) and uses the semi-sophisticated fstshortestpath code to find it in the end, plus some hack-like windows cmds to do basic stuff like create empty files or output lines without newlines, yikes, so ugly! really need to move to unix scripts huh!

honestly the bigger conjecture as alluded to all along was more like “after something like t collatz mappings, all n-runs would turn into n-1 or smaller runs (for sufficiently large t, n).” but it looks like, while possible to test with an algorithm in theory, that will be practically difficult due to combinatorial explosion of states, ie on a 1st cut the 2nd collatz mapping for the “at-least-1-n-run” language blew up to ~15K or more states immediately and required a few hundred megabytes of memory to process, may look into tuning that further but it might not help.
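the fstshortestpath answer can be cross-checked by brute force for small n (finite scan only, unlike the FSM which covers all integers at once; here “n-run” is read as a maximal block of equal bits of length ≥ n, and all names are mine):

```ruby
def collatz_map(n)
  n = 3 * n + 1
  n /= 2 while n.even?
  n
end

# longest maximal 0/1 run in n's binary representation
def max_run(n)
  n.to_s(2).chars.chunk_while { |a, b| a == b }.map(&:size).max
end

# smallest odd number with a run >= r both before and after 1 mapping
def smallest_with_run(r, limit = 10_000)
  (3..limit).step(2).find do |x|
    max_run(x) >= r && max_run(collatz_map(x)) >= r
  end
end

p smallest_with_run(3)   # 15 = 1111b, mapping to 23 = 10111b
```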

elsewhere, its never far from a graph around here for long. this graph is the n-run length on the x axis and the minimum symbol count on the y axis, which is basically double the bit size. the regularity of this graph seems quite remarkable and seems to point to deeper properties and ofc important, maybe crucial inductive structure. a little counterintuitively, the determinize and minimize operations do not automatically remove epsilon states, although that doesnt figure until near the end of the analysis because other earlier constructions dont lead to them. its natural to look at the actual found shortest numbers for some pattern, but whew all this took quite awhile.




(later) ofc that glaring oversight is easily corrected although it didnt take a totally brief amount of time. this code adapts sym2bin to be more generic and alters the batch file slightly to output the found strings/ seeds instead of measuring lengths. theres a slightly strange quirk to the library, the output shortest path FSM starts at a highest numbered state and works/ transitions backwards thru states, whereas FSMs handled elsewhere typically start at state 0. theres a very nice utility called fsttopsort that perfectly handles this by renumbering states in increasing order starting at 0. the display utility looks at the final and corresponding initial seeds in binary grid format, lsbs left aligned, also with some embedded simple collatz mapping inverse logic. the patterns are quite remarkable. it seems almost like a collatz yin-yang and/ or checkerboard. the alternation from input to output of the transducer is an n-0-run on the left to an n-1-run on the right. feel like am close to something deep, still cant quite put my finger/ brain on it. its like logotherapy or life… what does it mean™?






(11/12) lots of progress/ ideas yesterday, writing it up today. here is another way of looking at it. as noted there are some similarities between the theory of (prime) implicants and the 1-runs ideas. need to come up with some better notation/ identification to try to avoid confusion/ ambiguity. call a “(x,y)” run set a set with 0/1 runs of length between ‘x’ and ‘y’, lower and upper bounds, the sets created earliest above (fsm4/ fstlen code). the set may or may not have runs at exactly the lower and upper bound lengths. a “(x,y,z)” set is a (x,y) set with at least 1 (0/1) run of at least length ‘z’. it might also be helpful to define a set where there is at least 1 run length of exactly length ‘z’. these are all slightly different but interrelated concepts. (eg if y=z then there must be an exact length run of that length.)

wrote some code to create the (x,y,z) set (as a batch file “subroutine”), basically (re)factoring it out of the fsm5c code above. it does this by building the (x,y) set (actually here x=1) and taking the difference with the (x,y-1) set, giving a (x,y,y) set. then one can do an intersection of the (x,y,y) set with a (x,y-1,y-1) set (or actually any (x,y-n,y-n) set) and find it empty, ie it leads to zero states in the intersection. these are simple “proof of concept” exercises. these concepts are not exactly simple to keep straight with words. in other words for this intersection idea, the sets with at least 1 exact y-run are disjoint from the sets with at least 1 exact (y-n)-run as long as all run lengths in the 2nd are smaller than y. whew! had to think about that carefully/ at length. “math is built forward but understood backward…”
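the set algebra is easier to keep straight with concrete membership predicates; a finite-range ruby sanity check of the disjointness claim (predicate names mine; the FSMs express the same thing over all integers at once):

```ruby
# lengths of the maximal 0/1 runs in n's binary representation
def runs(n)
  n.to_s(2).chars.chunk_while { |a, b| a == b }.map(&:size)
end

# (x,y) set: every 0/1 run length lies between x and y
def xy?(n, x, y)
  runs(n).all? { |r| r.between?(x, y) }
end

# (x,y,z) set: an (x,y) member with at least 1 run of length >= z
def xyz?(n, x, y, z)
  xy?(n, x, y) && runs(n).max >= z
end

# "y-run slice" = the (1,y,y) set, ie max run exactly y
def slice?(n, y)
  xyz?(n, 1, y, y)
end

s4 = (1..5000).select { |n| slice?(n, 4) }
s3 = (1..5000).select { |n| slice?(n, 3) }
p((s4 & s3).empty?)   # exact-run slices are disjoint
```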

(later) this is some not very pretty code (am going to mostly blame windows/ bill gates for that instead of my own prowess) but it “gets the job done,” or better, does some neat/ somewhat deep analysis. at this point some sophisticated theorem-like experiments/ observations can be constructed. via a near accident in some code, noticed that some intersections of sets were coming up empty and then built up this more systematic experiment. to understand it, realize now the 1-run concepts are somewhat close to an idea in boolean function theory called “slices”. a (1,y,y) set acts a lot like a “slice” so call it a “y-run slice” here (but with the caveat the ideas are not exactly the same). then one can look at intersections of a single collatz mapping with “nearby/ adjacent slices”. this maps an n-slice and then looks at intersections with the same n-slice and the n-1, n-2, n-3 slices. it finds the intersection of the n-slice mapping with the n-3 slice is always empty. the slices are disjoint but the union of all (intersection) slices of a set equals the full set.

this is a nearly direct confirmation of some of the ideas about the mapping altering the run/ slice structure so to speak. however this is quite solidly more of an approach toward an inductive proof structure than the earlier empirical observations. the mapping alters the structure but not past a certain limit, here found to be 3. this leads to a sort of concept of “higher and lower slices”. the higher slices tend to get mapped into lower slices in a sort of “cascade” effect. the minimize code is altered slightly in v2 to only output the final state count. also after some confusion the dos setlocal command turned out to be crucial for avoiding variable collisions.

the graph is the minimized state counts in the slices with x axis the slice “length” count. as stated the n-3 slice is empty with 0 states. the minimization leads to some blowup in state counts as the nondeterministic FSM is converted to a deterministic one. have long searched for a great nondeterministic minimization algorithm and it looks like maybe nobody has ever constructed one in the literature, ie its nearly an open TCS problem (minimizing nondeterministic FSMs is known to be computationally intractable in general, which may explain the gap). the lines are so linear, wonder if they perfectly match some linear formulas. was surprised at that. somehow this suggests the pov is “perfectly” analyzing/ characterizing the problem somehow. ie maybe there are in some sense “analytic formulas” for constructing these FSMs. another way of looking at state counts in line with prior analysis themes is that its a type of entropy measurement of the sets.






💡 ❗ 😀 😮 breakthru! this is further vindication of some of the key ideas/ conjectures outlined. this looks at a 3-slice and subsequent collatz mappings. it finds all 3-slices “crunch” to 2-slices after 6 consecutive mappings. again a near-direct confirmation and again these results are a very big deal because they hold over arbitrary “size” integers. the minimization results are interesting too because sometimes the minimized sets are “smaller” (in state counts/ entropy) than the starting sets and other times larger.

alas, just analyzing the next 4-to-3 slice crunch looks like it is impossible due to state complexity explosion into tens or hundreds of thousands of states.* some of this is from the determinizing step but maybe cant be avoided even without it. this immediately leads to the idea of trying to figure out ways to manipulate FSMs without using the standard minimization methods. are there some kinds of patterns/ regularity in these FSMs that can be compressed but not in the standard way? maybe some ingenious reformulations are possible. another approach that has worked well in the past for me, not documented here, is to look at/ analyze size ‘n’ bit sets for some constant. eg one can easily construct the intersection of a t-slice with a n-bit set for arbitrary t, n. another powerful unpublished experiment along these lines looked at how many collatz mappings it takes for (all!) n-bit integers to decrease to a n-1 bit integer and could feasibly run for ‘n’ around say a few dozen (iirc)…
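that unpublished n-bit experiment is easy to reproduce at toy scale by plain enumeration instead of FSMs; a hedged ruby sketch (odd seeds only, matching the transducer domain; names mine):

```ruby
def collatz_map(n)
  n = 3 * n + 1
  n /= 2 while n.even?
  n
end

# mappings until an n-bit odd seed first drops below n bits,
# maximized over all exactly-n-bit odd seeds
def worst_crunch(nbits)
  lo = 1 << (nbits - 1)
  ((lo + 1)...(lo << 1)).step(2).map do |x|
    steps = 0
    while x >= lo
      x = collatz_map(x)
      steps += 1
    end
    steps
  end.max
end

p worst_crunch(3)   # 5, driven by the seed 7 -> 11 -> 17 -> 13 -> 5 -> 1
```

obviously this is exponential in n where the FSM version is symbolic, so it only serves as a correctness check for small n.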


(later) 😳 😮 ❓ ❗ had some trouble/ strangeness with this code. made some changes in it and it didnt seem to affect results. also it was outputting incorrect filenames. calling wrong code/ batch file by accident? some kind of windows glitch? stuck/ hung process in memory? doesnt fully make sense. came back few hours later and had to make changes/ fixes/ replacement and it seems to be coherent now. cant fully explain what happened…

# of states                                       212
# of states                                       160
# of states                                       945
# of states                                       1115
# of states                                       3473
# of states                                       6262
# of states                                       8518
# of states                                       6424
# of states                                       2984
# of states                                       827
# of states                                       0
# of states                                       0

(later) * 😮 🙄 💡 ❗ 😀 😎 ⭐ breakthru II! an old expression: the impossible just takes longer™… spoke too soon! the 4-to-3 slice code works! it was as simple as doing epsilon removal before the minimization step. apparently there is a large blowup due to epsilon transitions. (“computer/ algorithm”) proven: the 4-to-3 run slice “reduction/ crunch” occurs in 7 mappings instead of the prior 6 verified for the 3-to-2 case. the 3rd step uses ~10.5K states. (more notes on this below discovered later; this changed the intersection set from last code and is not exactly comparable.)



# of states                                       274
# of states                                       375
# of states                                       2090
# of states                                       9359
# of states                                       10459
# of states                                       7556
# of states                                       6189
# of states                                       2744
# of states                                       3098
# of states                                       1107
# of states                                       345
# of states                                       128
# of states                                       0
# of states                                       0

(11/14) ❗ 💡 😀 😎 ⭐ breakthru III! yet another cool/ fantastic/ aligned/ consistent finding! building on prior tools this was easy to write/ debug in only about 1hr. the len code generates a FSM language for all possible binary strings of length ‘n’ (in symbol pairs for bits). this can be passed into the mapping code as the fpre set, here with width 50. the fin intersection set is 10-length slices.

the code measures how many 50-bit 10-slice mappings lead to only 9-slices or lower, in this case 6— further confirmation/ vindication! the 4th mapping leads to a remarkable ~1.2M states. this shows some of the remarkable power of this “computerized thm proving system”. this finding/ property can now be expanded to look at other trends over different integer + slice sizes, for example it took fewer mappings for 20 bit width integers (for the same 10-9 crunch)— these combined may tend to suggest there is no constant t for which t mappings always crunch n slices to n-1 slices. also graphing state counts thru the “bulge” of the reduction/ crunch would be interesting. the bulge involves a complexity/ entropy transformation/ transition point from low-to-high-to-low again.

notably, these results are far outside of anything that can be done by hand. while it couldnt be completely ruled out, there is no expectation at all that any by-brain+hand theory could prove such a fact. can relate, just read amusing article on online dating as howling into the void… so is anyone impressed yet? 😛

as mentioned also a basic exercise now is to measure how many mappings of n-bit integers lead to entirely n-1 bit integers. it would be interesting to compare those counts with the time to go from n slices to n-1 slices. the 2nd case is apparently substantially shorter/ therefore possibly (“far”?) more tractable to study. ❓



fsm8 m=50 n=10
# of states                                       15950
# of states                                       7366
# of states                                       73924
# of states                                       82085
# of states                                       498925
# of states                                       748780
# of states                                       1150225
# of states                                       345272
# of states                                       47662
# of states                                       17086
# of states                                       0
# of states                                       0

(11/15) 😳 some slight issues with that uniform-length language code on closer look. (1) it allows (accepts) a zero length string. (2) it allows msbs that dont end in a 1 bit, ie end in 0 bits. after fixing it, it appears it has no effect on the prior computation, leading me to think that the collatz transducer doesnt accept these inputs anyway. (verifying left as exercise for reader.)

⭐ ❗ 😀 😎 anyway need a slight variant of that language, ie “all n-bit integers up to n”. that is an easy variant in this new code and as a bonus the code also produces the “exactly n-bit integers” with a simple command line flag. combining these leads to the following code that has been alluded to for a long time but never posted before. came up with this construction in the early to late 2000s (working at it at mainly two different times/ attacks in the decade, early and late… deja vu!).

the code is similar to prior code except it “shaves off” the lower integers from the “running composition” using the difference/ subtraction method instead of intersection as previously. ie it starts with all exactly n-bit integers and then shaves off all n-1 bit integers and lower. there is a little trick to test for an empty set by constructing one and then testing for equivalence. there is an additional minimization step pre-difference to decrease the substantial complexity of the difference step. it finds all 20-bit integer seeds decrease to 19 bits in 125 steps using up to ~110K states max at mappings 10-11. note this is not exactly the same mapping counts as real glides, it may take “somewhat more” steps for an n-bit iterate to reach n-1 bits than the glide length.
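the construction can be mimicked at toy scale with explicit finite sets standing in for the FSM languages (the real code does this symbolically over binary strings; odd seeds only here, names mine):

```ruby
def collatz_map(n)
  n = 3 * n + 1
  n /= 2 while n.even?
  n
end

# start with all exactly-n-bit odd seeds; repeatedly map the running
# set and "shave off" (set-difference away) everything below n bits,
# counting mappings until the set empties
def crunch_steps(nbits)
  lo = 1 << (nbits - 1)
  set = ((lo + 1)...(lo << 1)).step(2).to_a
  steps = 0
  until set.empty?
    set = set.map { |x| collatz_map(x) }.uniq.reject { |x| x < lo }
    steps += 1
  end
  steps
end

p crunch_steps(3)   # the {5, 7} case empties after 5 mappings
```

eg the per-step set sizes here play the role of the minimized state counts in the FSM runs.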

my feeling has long been that this code might potentially outperform exhaustive searches being used to rule out collatz ranges over the decades based on superior speed/ memory efficiency. there is a distributed one running these days, need to cite it sometime. this could possibly be adapted to a distributed version. its large-scale dynamics are fully unexplored and surely have many exotic properties. any takers? 😀 😎 💡 ⭐ ❗

(later) calling with n=26 leads to about 3.7M states on mapping #13 and completes in 239 mappings total. memory usage on some of the operations seems to approach ~1½ gigabytes and overflows at a little more than that. there is clearly an exponential increase in states so some of that enthusiasm for time/ space complexity is clearly overblown. however its basically/ definitely a very novel way to parallelize the “search” that is unpublished in the literature. the remarkable combination/ fusion of CS and number theory is also quite novel.



fsm9 20 stdout

fsm9 26 stdout

(11/16) 💡 ❗ 😀 😎 ⭐ another breakthru! this code alters the mapping step “slightly” but gets very/ dramatically improved time/ space characteristics. theres also a little code to terminate in case of memory overflow in the generally most-memory-intensive composition step, although it cant be totally relied on because undetected memory overflows messing up the computation can happen in other places (noticed for n=40). the new logic is not easy to visualize but basically it avoids previously-visited numbers by doing an intersection with the prior set of “unresolved” numbers. this then excludes trajectories that increase and then decrease onto previously-seen numbers, ie sort of “delayed” trajectories. (in this way it functions a little like the “seen” cache variable often used in earlier collatz code.) maybe came up with this construction many years ago (about a decade) & recalled its strong/ remarkable performance characteristics but lost it in the sands of time. the same verification for n=20 finishes in only 13 steps using max ~8k states. its almost like magic…

  • the n=30 case takes 19 steps and max ~293k states. this same “start” overflowed memory with the fsm9 algorithm.
  • n=36 22 steps/ max ~2.2M states.
  • n=38 24 steps/ max ~4.1M states.
  • n=40 (undetected) memory overflow. the fst util output memory error(s) (maybe in the minimization subroutine?) but the code continued (full disclosure/ caveat: with incorrect/ “corrupted” file(s)!), manually terminated.

looking at results the max states seem to happen early, on about iterations #11/ #12, which suggests (trying) some new strategy that tries to even/ smooth out the state bulge/ explosion somehow. what is causing it? again its basically an entropy measurement. eg note it is possible to separately process partitions of the set in a parallel way. what is a great (memory-preserving) strategy for building these partitions? eg one would want to find “natural partitions” that somehow tend not to overlap as the computation proceeds.

on the other hand/ “bigger picture” pov there are so many great/ cool “spin off” ideas/ directions but really need to try to focus on what kind of structure(s) would lead to a proof…


fsm10 20 stdout

fsm10 30 stdout

fsm10 36 stdout

fsm10 38 stdout

(11/20) ❓ 😦 this all establishes a nice framework to try out new ideas. one can construct different types of sets “relatively easily” and the sets powerfully specify infinite sets of strings. but it looks like something is recurring and inescapable: a quick blowup in number of states after a few collatz mappings. how to avoid this? the fixed-(bit)-length language seems to help out some with this, are there other strategies? that might help some but the bigger question is how to prove something like: all n-runs shorten to (n-1)-runs, but with the required number of mappings t increasing as bit sizes increase?

two other sets were tried out that seem to lead to the same blowup issue: one can limit only the 1-runs or the 0-runs separately while allowing the “other” runs to be unlimited.

another way to think about this: it seems that the max 0/1 runs inside/ “embedded in” the collatz random walk are themselves another kind of random walk, and the two are interrelated/ interconnected, and this is apparently a very deep insight which stands up under different types of scrutiny. this is brought out more directly by the last diagrams from last month that overplot these two walks. but alas the “inner” 0/1 runs analysis does not appear substantially easier than analyzing the “outer” collatz random walk, so far.

(11/21) ❗ 💡 back to statistical analysis. this looks at how many iterations lead from a max n-run to a max n-1 run. it creates 1000-bit integers with at least a single 0/1 n-run and no larger runs, using some new generation code somewhat built on prior code. then it counts the iterations associated with the so-called “crunch”. it does 1000 runs each and calculates/ plots histograms of the crunches, so that the x axis is crunch iteration count and the y axis is (log scaled) occurrences. this leads to the result that it seems to take up to ~n iterations to crunch in rare cases, where n is the run length. am suspecting (not yet looking closer) that sometimes two n-runs tend to combine into a larger 2n run and then it takes ~n iterations to crunch the 2n run.

the bit count was not found to have any discernible effect on these counts/ histograms for up to thousands of bits, possibly a very substantial observation from the pov of a proof structure. the color coding is by run length from higher to lower (hotter to cooler). in other words most runs crunch quickly but some rare ones take almost n iterations to crunch. this is in line with (more general) prior observations that it takes more iterations to crunch as n-run lengths increase. the 2nd graph is max crunch count (found) vs bit run length showing the nearly linear trend esp for lower bit run lengths. this trend seems a big deal/ finding and is at the same time remarkable, smooth (maybe more than it looks due to sampling error), and unanticipated.




😳 oops, lol, misfire! “heart was in the right place™” so to speak but looking closer at that code/ binary grid formats, a bit embarrassed. the generation logic creates alternating runs and the result is not as intended. its creating large “even” gaps ie numbers of the form 2^n × m where m contains some other n-size runs, and simply shifting the # to remove the “lsb gap” takes ~n steps. it did seem a little too easy! only defense is its not totally obvious looking at the code, but… silly!

2nd thought this makes me suspect maybe some of the earlier set constructions need to be examined wrt the same issue/ hangup/ snag. in retrospect, need to exclude both 0 and 1 lsb-runs (and the prior FSM sets dont do that either). a drawback/ pitfall of a little too much general/ abstract analysis without looking at the details. “getting carried away”. here is a small )( consolation prize, a remarkable diagram/ pattern, and some more corresponding explanation that can be understood wrt it.

the iterate binary structure came mostly from the old initr (renamed initw) random initialization routine thats been used in a lot of the genetic algorithms. it creates alternating longer 0/1 runs. it was adapted to create a fixed n-run at a random location and then “work backwards” or away from that “center” region with random alternating runs. the longest crunch counts seem to always be powers of two; dont understand/ not sure how to explain this right now. it corresponds to a 0-lsb run and it seems like 1-lsb run variants should be possible too but are somehow excluded.
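a rough ruby sketch in the spirit of the adapted routine (name and details hypothetical; simplified so the fixed run sits at the lsb end and padding grows only toward the msb, rather than a random location):

```ruby
# hypothetical initw-style seed: a fixed n-run padded outward with random
# shorter alternating runs, trimmed to nbits width
def altrun_seed(nbits, n)
  b = rand(2)
  s = b.to_s * n                        # the fixed n-run
  until s.length >= nbits
    b = 1 - b                           # alternate the bit value
    s = (b.to_s * (1 + rand(n))) + s    # prepend a random run of length 1..n
  end
  s = s[-nbits, nbits]                  # trim to the target width
  ('1' + s[1..-2] + '1').to_i(2)        # force msb set & odd (may merge/ clip runs)
end

x = altrun_seed(1000, 10)
```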

here is an example grid diagram with the crunch range (~100 count) plus 100 iterations following, remarkable! a striking/ dramatic depiction of a crunch and the similarity to a physics transition point. again these have long been created/ processed in earlier genetic algorithm runs but were never looked at closely/ simply visualized. reminder its lsb left-aligned. this really strongly captures the “order vs chaos” concept/ dichotomy that was explored in some of the early cellular automata work but came up somewhat inconclusive at the time. there are some notable unusual curves in the left edges of the “blades”.

(later) running the code, it very strangely seems to be biased against creating long lsb 1-runs and its not at all obvious from the code why this should be so. (1st thought) it seems to maybe be some interaction between the algorithm and the default ruby PRNG. 2nd thought/ look: it maybe has something to do with the “ending logic” that “treats differently”/ tends to alternate more over the ending (lsb) bits of the seed within the run length.

a lot has been said along these lines, but maybe this hasnt been noted totally explicitly before and this diagram highlights the concept nicely in at least this case. the 0/1 runs function a lot like parallel counters! in iterations #1-80 in the diagram, 1 of the runs (the leftmost) is “decremented”. in the range #80-180 the different runs are “decremented in parallel”.
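the “parallel counter” effect can be eyeballed directly on small cases by printing the run-length profile per iteration (seed here is just an illustrative pick, not from the actual runs):

```ruby
# print the 0/1 run-length profile (msb -> lsb) across collatz mappings;
# the long runs tend to tick down like parallel counters
def runs(x)
  x.to_s(2).scan(/0+|1+/).map(&:length)
end

x = 0b1111111000000011111  # illustrative seed with a few long runs
10.times do
  p runs(x)
  x = x.odd? ? 3 * x + 1 : x / 2
end
```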


(11/22) so this is a task that seems well suited to the genetic optimizer, finding long crunch ranges, both in anticipation and after trying it out. the last 1d version was quite awhile ago, apparently in version mix15; all the later versions seem to be the “2d” optimizer (over different bit width lengths). adapted the 1d version. this has some stopping distance calculations based on tripling the last (optimization) iteration count/ gap since a new optimum was found. the algorithm and the stopping criteria seem to work well because theres a definite/ fairly consistent improvement gradient to track and the algorithm successfully/ repeatedly “builds” on the current candidate solutions. the optimizer does throw away a lot of samples during initialization/ optimization that dont start with the exact run length; it might seem a little wasteful but dont see an obviously better way to do it right now, and it performs great as far as the apparent high strength of the optimum.

the findings are a bit surprising. for a 1000 width iterate, it found a very long ~250 iterate run with 10-or-larger 0/1 runs in every iterate. this is an immediate advance on the recent code that found all 100-bit 10-runs crunch in only 6 steps. it looks like iterate size increases clearly lead to longer crunch times for the same run length, here n=10. fyi the ~250 crunch count sample was found with an unlimited stopping distance where the algorithm was terminated manually. using the 3x stopping distance criteria the code found a max 210 crunch count sample. a lot more (time consuming!) runs would have to be analyzed to see how much the optimum varies/ spreads.

looking at the binary diagram, its a sort of meteor-smear effect at random positions. the runs crunch locally but new ones arise to replace the current ones. the entire trajectory trend is declining. this suggests that maybe the crunch region run lengths vary by iterate size? ❓



(later) this looks at the range [10..20] minimum run length and the max crunch span that can be found. the stopping criteria was found to be a bit convoluted and not so great because it is expressed in terms of genetic algorithm iterations, but later iterations tend to take longer as the longer trajectories built at that point also take longer to evaluate. so a natural stopping criteria that takes hardness into account is just measuring “time since last best new candidate found”. this limits it to 3 min and it highly simplifies all the stopping logic, allowing the several related variables to be discarded, and found a new best candidate at a remarkable 272 crunch span for 10 run length. nice! there is some other similar logic in some of the prior genetic algorithm code… somewhere around here
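the time-based stopping rule is tiny to implement; a hedged sketch (the GA step itself is stubbed out as a block here, not the actual optimizer code):

```ruby
# keep optimizing until no new best candidate has been found for `timeout`
# seconds; the block yields one candidate score per call (GA step stubbed)
def optimize(timeout)
  best = nil
  t_last = Time.now
  loop do
    cand = yield
    if best.nil? || cand > best
      best = cand
      t_last = Time.now      # reset the clock on any improvement
    end
    return best if Time.now - t_last > timeout
  end
end

# trivial demo: a constant score never improves after the 1st candidate,
# so this returns shortly after the timeout elapses
optimize(0.1) { 1 }  # → 1
```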

😮 ❗ ran the variant/ precursor code on a single n=20 case (again bit width 1k) and found that there was a “huge transition”, and then coded up the range logic. this code ran ~5hr; theres a very long slow creep in the optimization at each n. expected to see some trend/ shift over that range, but its almost like a cliff! ‘mxc’ is the crunch span and ‘mx’ is the max run length. mxc dramatically plummets from 182 → 20 at mx 18 → 19. this is rather extraordinary and unexpected. tried running the code on lower values 2-9 and it finds spans in the range of thousands of iterations that take very long to compute; the code/ approach needs to be tuned/ adjusted for that case. just generating the initial 200 iterates takes very long and didnt wait for it to finish; at 1st thought there was some bug in the code because it wouldnt start up quickly! so overall it seems as if there are almost 2 transitions in mxc between high, medium, and low span ranges as mx increases. also there are two initialization routines for iterates that on closer look were likely not generating any candidates with the fixed mx for higher mx, because they dont generate long(er) runs, and those were taken out.

❓ 😳 2nd/ further thought. ok this is all great but something needs to be reconciled here that a very astute reader might have noticed, aka cognitive dissonance. the 2-9 ‘mx’ “very long spans” dont seem to mesh with the earlier fsm7/ fsm7b (11/12 above) that “supposedly” crunch all integers over the 3 → 2 and 4 → 3 transitions in 6-7 iterations resp. oops! these are “valid” calculations but are not doing exactly what one might think. 1st, note that the code is computing an n-1 slice as the f2 set but its unused by the code! that was some older code that was used in earlier experiments but (oversight/ mistakenly) not removed in the final version (dashed off a bit quick partly due to some excitement).

2nd, the slices dont measure/ include any iterates in “larger slices”. so ie the 3 → 2 transition is saying that all iterates have only 2-runs and no 3-runs after 6 iterations/ mappings. but some of the iterates are “moving”/ mapping to larger 4-slices and higher, and these are excluded from the calculation/ set analysis. taking them into account eg with the genetic algorithm indicates there are some very long crunch spans for all ‘mx’ 9 and less. so the set framework is quite powerful but one must be very careful with all the accounting! and this shows the nice interplay of the techniques where one can be leveraged against the other for additional insight/ validation/ verification/ sanity testing.
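the accounting caveat is easy to confirm by brute force over small odd integers (n=3 here, illustrative):

```ruby
# some iterates in an n-slice map into *larger* slices under 3x+1, which the
# slice-only intersections exclude from the accounting
def maxrun(x)
  x.to_s(2).scan(/0+|1+/).map(&:length).max
end

n = 3
grew = (3...2**12).step(2).count { |x| maxrun(x) == n && maxrun(3 * x + 1) > n }
# grew > 0; eg x = 69 = 0b1000101 maps to 208 = 0b11010000 with a 4-run
```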

💡 ❓ this immediately gives some new ideas about variants to study wrt the set analysis scheme… (1) the higher run lengths ~20 seem to behave dramatically differently than those only a little smaller eg the 10 lengths already studied, and apparently crunch dramatically faster, so it makes sense to look into the higher run lengths and maybe expect some different behavior (not anticipated previously). (2) it makes sense to subtract (difference) the n-1 slices from the n slices instead of an intersection, and that should allow/ not remove the n+1 slices and higher (maybe that was some of the near thinking in the original code that still had the unused n-1 slice construction).


{"mx"=>10, "mxc"=>272, "j"=>24078, "t"=>"2018-11-22 18:09:08 -0700", "t2"=>301}
{"mx"=>11, "mxc"=>200, "j"=>20914, "t"=>"2018-11-22 18:21:03 -0700", "t2"=>301}
{"mx"=>12, "mxc"=>235, "j"=>55555, "t"=>"2018-11-22 18:57:49 -0700", "t2"=>301}
{"mx"=>13, "mxc"=>280, "j"=>56875, "t"=>"2018-11-22 19:39:24 -0700", "t2"=>301}
{"mx"=>14, "mxc"=>214, "j"=>73162, "t"=>"2018-11-22 20:23:50 -0700", "t2"=>301}
{"mx"=>15, "mxc"=>209, "j"=>61839, "t"=>"2018-11-22 20:54:53 -0700", "t2"=>301}
{"mx"=>16, "mxc"=>64, "j"=>82377, "t"=>"2018-11-22 21:11:42 -0700", "t2"=>301}
{"mx"=>17, "mxc"=>104, "j"=>60721, "t"=>"2018-11-22 21:30:01 -0700", "t2"=>301}
{"mx"=>18, "mxc"=>182, "j"=>116904, "t"=>"2018-11-22 22:16:28 -0700", "t2"=>301}
{"mx"=>19, "mxc"=>20, "j"=>64806, "t"=>"2018-11-22 22:21:44 -0700", "t2"=>301}
{"mx"=>20, "mxc"=>21, "j"=>100036, "t"=>"2018-11-22 22:27:25 -0700", "t2"=>301}

(11/23) 💡 ❓ some other ideas wrt the important fsm7/ fsm7b themes. there are some code patterns that deserve highlighting; would maybe like to refactor all this at some point. so far they all tend to have somewhat similar complexity profiles as far as state explosions, but that is not to be assumed and deserves further careful study, esp if it can mean the difference between tractable/ intractable:

  • one can create sets and do “sequential vs cumulative” intersections over collatz mappings. the “sequential” intersection starts with an initial constant set and continually intersects the subsequent mappings with that fixed set. illustrated in basic fsm7/ fsm7b variants.
  • the “cumulative” intersection intersects the image with the previous “preimage” set. this seems to increase state complexity/ entropy faster (sometimes significantly). as noted earlier, conceptually the cumulative scheme affects trajectories that “increase” and then later “decrease” with some “delay”, and also works somewhat like a “(previously) seen recognizer cache”. (11/16 fsm10 pattern/ variant above)
  • either of the previous schemes can be mixed with a scheme that starts with a fixed n-width language intersected with the language in question for study (eg slices), eg the 11/14 fsm8/ len code or the 11/15 fsm9/ uplen code.

so these are 2 separate/ independent switches that can be turned on/ off for 4 total code variants (one switch is sequential vs cumulative intersections, and the other skips the intersection entirely). there are other subtle distinctions between the sets to be made eg the nature of slice sets, and this interrelates to the previous points. for example, did an experiment adapting the fsm7/ fsm7b code from the sequential to the cumulative scheme, and found that it didnt affect the state counts at all compared to the sequential approach, ie gave equivalent sets in the (iterative) cascade! this is explained in that the slice sets exclude higher slices, ie exactly those that are affected by the scheme change. yet another element affecting state counts: state minimization sometimes leads to more states due to the (sometimes exponential) blowup associated with the determinizing step; same with epsilon removal, which generally saves states but could increase later set complexity.
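the two intersection schemes in toy form, with plain finite sets standing in for the FSM languages (the real code operates on regular sets/ FSM compositions; this just pins down the dataflow of each scheme):

```ruby
require 'set'

# one collatz mapping applied elementwise to a finite set
def step(s)
  s.map { |x| x.odd? ? 3 * x + 1 : x / 2 }.to_set
end

# "sequential": always intersect the image with the same fixed set
def sequential(init, fixed, k)
  s = init
  k.times { s = step(s) & fixed }
  s
end

# "cumulative": intersect each image with its own preimage set
def cumulative(init, k)
  s = init
  k.times { s = step(s) & s }
  s
end
```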

😳 oops, very close look again at fsm7 vs fsm7b and see now/ find that the 2nd does a sequential intersection with the n-1 slice instead of the n-slice! that was not really as intended; maybe dashed all that off a little )( too quick. this stuff can be close and reminds me of/ is similar to so-called fencepost errors in CS!

another way of looking at this: there is an interesting analogy about time vs space in this computation. the length of the computation is the time/ # of iterations (of set compositions) and the space is the “width” or memory used by individual FST commands. to make the computation feasible one is looking for time/ space tradeoff algorithm reformulations that make the overall computation “long and narrow”.

😳 😡 😐 😦 😥 👿 (argh!) unfortunately after lots of trial and error (right now it feels like mostly error), the computations are often coming up “very fat”, using up eg over 2G memory for composition or minimization steps (the latter mostly the determinizing step leading to blowups). this analysis is very powerful but alas its not quite “magical” yet. creating algorithmic formulations of the questions is a breakthru but alas they are often not of practical computable sizes so far. need to come up with more tricks! one basic idea already alluded to is that maybe there are other ways to process FSMs that compress them using internal patterns that align with the various sets being constructed…

(11/24) 💡 ❗ yay, a win! this takes some of the recent ideas and builds them out. much of this was built/ tested/ coded while staring at a bright 20x10ft HDMI wall projector over several hours just a little )( away from midnite, quite a different/ novel/ wild experience (as mentioned have watched movies but using the wall as a computer monitor with “windows” scattered all over the wall is quite funky/ near surreal, and had quite a bit of trouble sleeping last nite and seriously wondering if the combination of bright projector rays in the dark room melted my brain just a bit, as the expr goes “you have to see it to believe it”…)

a new language that was previously eluding me is constructed but now looks quite crucial in retrospect. its simply “all odd integers with at least one 0/1 run of at least n length”. starting out with this set is not what is desired, because the arbitrarily long sets will never “crunch” (or more precisely, under the current hypothesis they will crunch but FSMs cannot “count” arbitrarily larger lengths than a constant), but it functions importantly as a key filter/ intersection set. also my idea is that it must be equivalent to the complement of the union of all k-slices for k < n, or the union of all k-slices for k >= n, but those ofc are more involved to construct (exercise for ambitious reader(s)). this code was 1st built to look at unlimited length integers but kept failing due to memory overflow for arbitrary parameters.
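on plain integers the filter language is just a regex test; a hedged one-liner version handy for cross-checking (n interpolated into the regex; not the FSM construction itself):

```ruby
# membership test for "odd integers with at least one 0/1 run of length >= n"
def in_filter?(x, n)
  x.odd? && x.to_s(2).match?(/0{#{n},}|1{#{n},}/)
end

in_filter?(0b10000101, 4)  # → true  (contains a 0-run of length 4)
in_filter?(0b10101, 4)     # → false
```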

so here it uses the uplen fixed length set. the overall idea is similar to the 11/14 fsm8 code. it was found the longest sets that could be computed were n=26 (2^26 = ~67M), a respectable number, which led to some intermediate computations with ~5.6M states. this calculation is similar to what the last genetic algorithm was computing/ optimizing except for a much shorter bit length (26 vs 1000). basically looking at max span of a crunch for n-length runs but limited to numbers starting within the limited width (more the idea of what was intended by some of the earlier code/ calculations but it eluded me at the time).

the filter language was verified by intersecting it with an n-slice and verifying that it returns all the same integers in the n-slice (ie the intersection with the n-slice set is equivalent to the same n-slice set). that trick helped catch a few minor glitches. originally the sets were not equal, the filter was missing some integers; then did a difference, found the smallest integers that were missing, and corrected it. (in particular/ detail the shortest “full” n-length 1-run was excluded and it was added with an epsilon transition to the n-1 run state after the 1st/ lsb 1 bit.) the short version is that again there are a lot of sophisticated validation/ verification techniques with these powerful tools. but also one has to be very careful/ mindful with the set constructions.
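the same verification trick can be mimicked by brute force on small integers: the filter intersected with an n-slice must give back exactly the n-slice, so any miss flags a construction bug (a toy analogue of the FSM-level check, not the actual verification code):

```ruby
# check filter ∩ n-slice == n-slice over small odd integers
def maxrun(x)
  x.to_s(2).scan(/0+|1+/).map(&:length).max
end

def in_filter?(x, n)
  x.odd? && x.to_s(2).match?(/0{#{n},}|1{#{n},}/)
end

n = 4
(1...2**14).step(2) do |x|
  in_slice = (maxrun(x) == n)
  raise "filter misses #{x}" if in_slice && !in_filter?(x, n)
end
```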

the computations were very long and narrow after the beginning; early steps at around the 5th led to very large sets, and at that point it had the feel of a python swallowing an alligator (warning: google at your own risk, but hint, have never been able to get it out of my head after seeing it once, lol!). these 4 runs ran concurrently on a 16GB machine (a relatively rare case of parallel processing around here) and didnt seem to use much more memory than a single run, the memory use, even the large “bulges”, seems to be staggered over time. these are basically exploring the same “very steep” transition point just explored by the genetic algorithm from a lower bit size.

the results suggest that the size of the runs versus the bit size of the iterates highly affects the crunch span, because for larger bit widths eg 1000 there are dramatically higher crunch spans in the thousands (for these small run lengths). apparently this ties in with the mixing property being analyzed/ explored, as suspected earlier, taking more shape now: the mixing is the run lengths relative to the overall size of the iterate! the longest case n=3 ran about ~1½ hr, the others less than half that. understanding this mixing property and the way it scales, in an apparently scale invariant/ fractal way, is apparently going to be crucial to finally crack the problem.

💡 so this is remarkable to find this “inner” (somewhat/ hopefully more/ further provably decreasing bit run) random walk within a “outer” random walk of the trajectories, and the two are somewhat interconnected and independent at the same time. its clear that a/ “the” final analysis will somehow have to take into account both the bit run crunch and how it interrelates to trajectory increase/ decrease/ glides. right now all the signs are promising aka where theres smoke, theres fire™.




(11/26) always a good idea to try to leverage existing code and small variants for greater insights as much as possible. as indicated a few times, theres some kind of relationship between the slice size/ crunch span and iterate size, but what? there seems to be some scale invariance. heres one basic study: just look at the crunch time (using prior code) for a “quarter slice” for increasing bit widths, ie slice size one quarter the bit length. the quarter slice crunch is not extremely memory intensive compared to other cases eg small slices vs large bit sizes. this code starts at a 3 slice and works upward. “as usual” the story is not so simple and it seems to be some kind of concave upward curve. ‘x’ is the crunch span. the last datapoint used about ~2½M states at one step. because the results are so close together compared to other points, this does look like it might nearly match the scale invariance. ie fractional slice size, defined as the ratio of slice size to iterate size and already used in some prior experiments, is a rough measure of the “mixing degree” at different “scales” (iterate bit sizes).

the next step led to memory overflow. a note on that: it would be nice to have better/ more elegant memory overflow handling; while outputting errors to stderr, this code continued to run even with truncated (in a sense corrupted) datafiles due to memory overflow. not sure if that is due to using stdin/ stdout pipes instead of filenames at all steps. does the code have some way to signal an “invalid result” in the datafile? maybe not (with stdin/ stdout/ pipes, one disadvantage to using them). therefore some such mechanism would be nice to have built into it. it seems to sometimes output “invalid header”, maybe if the file is empty, which is close to what would be nice. do the utilities output empty files if they overflow? etc



"m=12 n=3 x=24"
"m=16 n=4 x=16"
"m=20 n=5 x=14"
"m=24 n=6 x=14"
"m=28 n=7 x=15"
"m=32 n=8 x=16"
"m=36 n=9 x=19"
"m=40 n=10 x=20"
"m=44 n=11 x=23"
"m=48 n=12 x=24"
