fwd news… marzio de biasi announced in theory salon chat room an interesting 8v collatz codegolf challenge linked with 2 counter machines which have been somewhat extensively studied in the theoretical literature.
hi all, what a @#%& hassle! managed to get code snippets working on this site long ago. there are many options to do this but it seems like none of them are foolproof esp recently. its quite a long saga mentioned/ alluded in some previous posts. finally got it working long ago and have been quite happy with the results/ status quo for maybe over ~2yrs, but now wordpress has recently thrown a wrench in the works & screwed it up. ah, figured this might happen someday! code can be so fragile sometimes! heres some of the gory details.
wordpress has pre & code tags but the wordpress web editor(s) sometimes alter the code after entering it, eg zapping initial whitespace/ indentation on lines after pasting code or, esp egregious, getting confused by/ messing up html special characters, ie basically corrupting the code! also just saw this switching between “visual/ html” modes of the editor… you know, basic/ elementary stuff. recently wordpress has modified its web editor (what else, to try to improve/ streamline it of course) so that this has again become an issue (again saw it coming, they finally completely disabled the “classic” editor which is free of the issue, joy). @#%& amateurs!
so this post is being typed up on the wordpress ipad app, which mostly works ok, but has exactly the same issue in corrupting code snippets. it also barely supports most of the important text editing features. oh yeah and it seems to have a very significant defect in not updating correctly & making it look like post drafts have vanished or changes have been lost. sigh. did I mention, #@%&? am feeling, could almost punctuate every sentence in this blog with that! and wordpress is now how many years old? it dates at least back to the dotcom era over ~1½ decade ago…. it makes one feel like web publishing is still somewhat like a toy system…
went to wordpress forums to see if anyone else is remarking on this significant defect (esp for us power user hackers who might—gasp—wanna put code in their blogs, an utterly subversive application far outside the general main scope of teenage blogger users) but the wordpress forum search capability is so crappy (its probably google based) that it wont turn up posts in reverse date order. have always been amazed how bad google is at date ordering its results, making it basically completely unusable for some basic purposes (eg looking up latest reported defects by keyword… just too much to ask around here!)
so another option is putting the code on another site, but just want to copy & paste it, and not setup eg a complicated source code transfer system eg as is assumed with “git” (which is cool, but timeconsuming). would rather not even deal with the hassle of creating yet another account on another site. ah geez.
as recently mentioned, user randomra utilized a new git concept/ feature addition called a “gist” (gist.github.com) which is a quick copy-and-paste facility. it seems to be mostly undocumented so far (sigh2). do not entirely like the idea of storing code on another site, which doesnt make it clear how long it will be stored or whatever. but have been banging out new results also, and am frustrated in not being able to share them.
so just signed up a github/ gist account and managed to paste a code snippet in ruby, it also has text highlighting. good enough for this post, finally. (oh but the URL is a long randomly generated string completely unrelated to the filename. nice!)
really like the wordpress feature of embedding code on a page (SyntaxHighlighter by Gorbatchev), the technique used copiously here over 2+ yrs, but that has some other issues also. the page loads slowly, and there is an annoying “FOUC” “flash of unstyled content” (as in all “FOUCed” up!) when the code is set to “collapse” mode; the page loads initially with all the code visible and then the sections collapse/ format later. ah, the small )( price of letting developers build layout systems eh? alas FOUC is a tricky issue, nearly intrinsic to HTML design/ dynamics, not widely considered, & found in many advanced products eg even MathJax etc.
yet another issue with the SyntaxHighlighter combined with collapse mode, possibly/ arguably a browser defect, is that a link to an anchor tag can get messed up; the browser initially goes/ scrolls to the correct position on the page without the collapsed content, but after the content dynamically collapses after page load, the browser is still scrolled to some position that is now not correct and off exactly by the same vertical offset associated with the collapsed content size (eg if the anchor tag is after a collapsed section). (hey, admittedly its a bit subtle, but a web developer with attn to detail and finesse notices this kinda stuff.)
another nice simple/ straightfwd option would be if wordpress would just allow uploading of plain text files, which rather amazingly is not allowed in its media upload capability. have no idea why they reject that.
then there is “pastebin” which works great but doesnt guarantee pastes will hang around and does not do syntax formatting on code.
randomra also used a site called ideaone.com which allows creating/ even running code pieces. (python example by randomra.) but again they dont seem to indicate how long results will be kept around. oh but one can just assume they wont disappear/ vaporize at some random time in the future right? its just that old saying the pioneers are the ones with the arrows in their backs.
pseudo induction revisited
anyway without further meandering/ circuitous/ ridiculous ado, here is the code! the basic idea is to revisit the idea of “pseudo induction” where new trajectory parity sequences are built out of prior ones. it has been observed that there is “low entropy” in these sequences ie they are “compressible” and therefore there is significant redundancy/ repetition (there are many different povs on this, eg grammar compression, etc). so its an overall plausible approach. these results are on initial appearance quite promising.
again a greedy approach is used. the 1st graph graphs the 1st longest prior prefix matched. it shows a distinct fractal/ self-similar pattern. the next graph is the total length of the sequence listing prior prefix matches. if this is bounded by a constant its quite promising and essentially hints that “new parity sequences can be built out of prior ones in a near-bounded way” (giving rise to natural DAG-like structures probably quite similar to ones already uncovered from a “different direction/ angle”). for those keeping score at home, and as previously indicated/ pointed out, that is a near/ potentially breakthru-level property…
was musing on some further ideas/ proof outlines along these lines. Erdos was a master of certain proofs from random graph theory (having nearly singlehandedly invented the genre himself) that almost magically showed that even if a graph was random, if a certain size it is guaranteed to have certain distinctive substructures eg cliques or whatever, and these types/ techniques are now very highly regarded eg partially winning an Abel prize for the Szemeredi regularity theorem. could a similar proof idea/ structure apply to collatz? the idea would be to show that even though trajectory parity sequences are random, they are random in a “distinct way” that allows building later ones out of earlier ones.
(4/21) this is a slight modification of the code that calculates the decomposed list size but disregarding even parity run subsequences (which are signified as “0s” in the list), and looks at max size over 100 trajectory intervals. it ran very long, about ~¾ day, and took about 20MB at the end. the algorithm is not very optimized and is basically quadratic, so its a impressive to get this analysis over trajectories up to 1M. while quite noisy, its also visually flat; these results are very encouraging and at least empirically for a large region, it looks quite plausibly bounded by a constant ceiling. ❗
more musing: this is reminding me some of the Erdos Discrepancy Problem (which was blogged on here earlier). there the problem is to analyze a bound over all possible random sequences and show a sort of hidden imposed order. the variation here along these lines is to try to show the existence of a self-similar DAG structure in a pseudorandom sequence that exhibits low entropy. this is conceptually similar to working with a “biased” PRNG and showing patterns that (“necessarily”) arise in it. wonder if there is some theory that would be applicable… ❓
(4/22) this is some slightly rearranged code that refactors for decomposition subroutine modularity and saves the full data for later analysis. for later analysis, it computes a running average of the decomposition list size (div2 runs excluded) outputting a point for every 1000 trajectories in 1st figure below. 2nd graph is averages computed over the 1000 trajectory bins and standard deviation bars. green, blue, purple reference lines are drawn at 0, 1.5, 3 resp. the overall trend is asymptotically decreasing and standard deviation seems almost constant. again quite promising. but admittedly from past experience (eg an exact case in an earlier blog) these types of statistics can hide “needles” (“in the haystack”).
this next idea uses an improved decomposition/ compression scheme looking for largest common sequences in the new sequence and prior ones. it then deletes the matched subsequence (only one instance) and repeats the process until the entire sequence is decomposed into “nothing” (empty string/ sequence). there is a nearly natural connection between the number of these decompositions operations to the Levenshtein distance metric.
😮 ❗ 💡 these results are astonishing and initially thought possibly to be erroneous or defective, sometimes the initial feeling on seeing a real breakthrough which this appears to be! unexpectedly the compression is very high and decomposition list sizes are all less than almost merely ~½ dozen elements and most are merely 1-3! in other words very long runs of sequences are repeated in an almost somewhat eerie way! this is graphed in the 1st graph. the 2nd/ 3rd graph are two views of the 1st prior seed reuse value (first 10K and entire ~45K points resp), which is not as orderly as the earlier graph that used sequence prefixes only.
next merely chg the line 78 sub on prior code (run3.rb) to gsub and the algorithm will do multiple substring replacements if/ where possible instead of a single one. did not expect this to have much effect, but it does! the compression is even tighter where most compressions break down to only one-two prior subsequences!
(4/28) it is known from a lot of prior analysis that initial seeds with identical base-2 prefixes lead to identical (initial) trajectories. so the interrelation of that property with these results has to be analyzed/ determined/ nailed down. the compression algorithms are likely utilizing this property although its probably not a full explanation of their behavior.
(5/2) this is an idea thats been nagging in my head for quite awhile, finally tried it out, got somewhat surprising/ unexpected results. one can see a parity sequence as a long string of 0s and 1s where the 0s are div2 operations and the 1s are 3n+1 operations. these sequences do not have consecutive 1s, ie the 1s are always separated by at least one 0. a natural idea is to wonder how new “nearby” sequences relate to these sequences.
this is studied in the following way. find one of the 0-runs in a generated parity sequence and “perturb” it in the sense of changing it to some new random length other than its prior length. here the random length is 1..10. then determine if this new “perturbed” trajectory is actually a valid trajectory. conceptually this is not so tricky but the code is a bit involved & has various edge cases. the result is that the perturbations never lead to new valid trajectories, except for modifications near the “end” of the sequence, ie only either the last or 2nd-to-last 0-run.
(5/15) this was some code that was not written exactly as intended but led to some interesting/ unexpected results/ patterns. the idea is that prefixes/ “lsbs” of the binary iterates have patterns as seen in prior analysis. but the code accidentally tracked msbs instead. for seeds with n bits, this looks at the n-1 msb bits every n-1 iterations (“blocks”). it throws out sequences where seeds fall below before the first n-1 iterations and then graphs the sequential msbs for the remainder, and finds a pattern. it evaluates all starting seeds of length specified on the first argument p. following graphs are for p=12..15. a remarkable property is that the graphs seem to have lines lined up if they are horizontally/ vertically tiled.