in the 2nd year of the 3rd decade of the 21st century… the search continues… for an overarching strategy!
the outside context for this post: google just announced world-shattering results on the protein folding problem via machine learning. this is historic, deserves to be highly celebrated (not mere typical marketing hype!), and crosscuts more than 3 of my favorite fields all tied up into one problem (bioinformatics + physics + ML etc), and am very inspired/ awed/ psyched about it all. those feelings are not easy to come by these days. would like to put a large effort into commentary on all this, but alas my audience is not into reciprocity. drop me a line (comment) if you want to (rather easily?) prove me wrong…
am immediately working more on the last code, and my full data scientist expertise/ repertoire is being put to the test. its been some back-and-forth, almost a dialog or even conversation with the data, which is nearly the best case scenario. its like a kind of debugging, but at the level of data manipulation more than coding errors, and has a lot to do with trying to understand the presence/ lack of generalization in the model, which, as indicated long ago, is maybe the machine learning equivalent of induction. in other words, the code might be expected to work on less complex data, but it doesnt, so it has to be further tweaked. this is an attempt to make a relatively long story short.
💡 the following code/ signal came up somewhat indirectly, incidentally, almost by accident. had an experiment to generate long ‘cgnw2’ glides with nearly ½ entropy to force them into the undifferentiated region; as explored last month, the ½ density constraint is not sufficient for that, leading to long 1-lsb runs nearly ½ the iterate width. the ~½ entropy constraint works fairly well, but then found small initial 1-lsb runs even in those. so wrote some code to cut off the leading 1-lsb runs of the glide and analyze the remaining glide carefully.
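to make the trimming idea concrete, here is a minimal sketch (not the actual code; the helper names and the compressed (3x+1)/2 step convention are my assumptions): measure the 1-lsb run of an iterate and advance past the leading part of the glide whose runs exceed a cutoff.

```python
def lsb_one_run(n):
    """length of the run of 1 bits at the least significant end of n."""
    count = 0
    while n & 1:
        count += 1
        n >>= 1
    return count

def collatz_step(n):
    """compressed collatz map (an assumption): (3n+1)/2 on odd, n/2 on even."""
    return (3 * n + 1) // 2 if n & 1 else n // 2

def trim_leading_run(n, max_run=1):
    """advance past the initial iterates whose 1-lsb run exceeds max_run."""
    while lsb_one_run(n) > max_run:
        n = collatz_step(n)
    return n
```

eg `trim_leading_run(7)` steps 7 → 11 → 17 and returns 17, the first iterate whose 1-lsb run is at most 1.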
was looking for any kind of signal at all, using lots out of the “bag of tricks,” also looking at the 3-power sequence, and again it seemed to come up as undifferentiated mush despite lots of sophisticated analysis. there is a lot of signal found in prior 3-power sequence analysis, but in general a lot of it was related to Terras density glides. the hybrid approach tends to produce more “in the wild” glides. there is still some hidden order, maybe significant, in Terras glides that various experiments have isolated. but some, much, most, or nearly all of this seems to melt away on “in the wild” glides.
notably, this hasnt been pointed out, but the (local) density metrics associated with the 3-power sequence naturally tighten even for random iterates. so its important to try to separate this “typical” tightening from “atypical” tightening, and thats not a trivial exercise.
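as a null model for that “typical” tightening, a quick simulation sketch (entirely illustrative, not the post’s code) shows the spread of bit density over random ~½ density iterates shrinking as the width grows, roughly like 0.5/√width; any trajectory-driven tightening has to be measured against a baseline like this.

```python
import random
import statistics

def random_iterate(width):
    """random width-bit integer with ~1/2 bit density (iid fair bits, msb forced to 1)."""
    return random.getrandbits(width) | (1 << (width - 1))

def density(n):
    """fraction of 1 bits in n's binary expansion."""
    return bin(n).count("1") / n.bit_length()

random.seed(0)
for width in (50, 200, 800):
    spread = statistics.stdev(density(random_iterate(width)) for _ in range(2000))
    # spread shrinks roughly like 0.5/sqrt(width) -- the "typical" tightening
    print(width, round(spread, 4))
```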
but then started working on some “baseline” comparison, ie control data, and then, chasing down some stuff, lost focus on the glides entirely. the simple way to generate this is via random ½ density iterates, then looking at the drains. then started looking at features over these drains. some of this work has already been done, but it seems that some key signals have been missed. this straightfwd/ finetuned/ yet conceptually near-basic code is on ½ density drains only.
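the baseline generation scheme can be sketched roughly as follows (helper names and the compressed (3x+1)/2 step are assumptions, not the actual code): draw a random ~½ density iterate and split its trajectory into the glide (iterates at or above the start) and the drain (the remainder down to 1).

```python
import random

def collatz_step(n):
    """compressed collatz map (an assumption): (3n+1)/2 on odd, n/2 on even."""
    return (3 * n + 1) // 2 if n & 1 else n // 2

def glide_and_drain(n0):
    """split the trajectory of n0 (> 2) at the first iterate below the start."""
    glide, n = [], n0
    while n >= n0:          # glide: iterates at or above the starting value
        glide.append(n)
        n = collatz_step(n)
    drain = []              # drain: remainder of the trajectory down to 1
    while n > 1:
        drain.append(n)
        n = collatz_step(n)
    drain.append(1)
    return glide, drain

random.seed(0)
n0 = random.getrandbits(40) | (1 << 39)   # random ~1/2 density 40-bit iterate
glide, drain = glide_and_drain(n0)
print(len(glide), len(drain))
```

features would then be computed over the `drain` list only, per the ½ density drain analysis described above.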
last month wrote out a remarkable outline. did anyone catch that? it seems potentially gamechanging and have been rolling it around in my brain, impatient to work on it, with a kind of background excitement + methodical intent bubbling/ mounting, a tricky combination to balance, maybe like walking a tightrope, with the risky urge to rush building. was very busy last week with a family visitor on a rare trip + another key family member mixed in too, and couldnt get to it quickly. after thinking it over carefully, the basic idea is a mix of 2 patterns that have already been explored: nearest neighbor plus the hybrid search algorithm. the NN algorithm handles creating/ matching a finite set of classes, and the hybrid search algorithm tries to find samples that “break” the classification, or improve it; those two sometimes go hand in hand.
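a minimal sketch of how those 2 pieces could fit together (all names and parameters here are hypothetical, not from the outline): a nearest-neighbor step matching feature vectors against a finite set of class centers, plus a greedy random-mutation stand-in for the hybrid search that hunts for samples far from every center, ie samples that “break” the current classification.

```python
import math
import random

def nearest(centers, x):
    """index and distance of the class center closest to feature vector x."""
    dists = [math.dist(c, x) for c in centers]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]

def hybrid_search(centers, seed_x, steps=200, scale=0.5):
    """greedy random-mutation search for a sample maximizing distance to its
    nearest center, ie a sample that 'breaks' the current classification."""
    best = list(seed_x)
    _, best_d = nearest(centers, best)
    for _ in range(steps):
        cand = [v + random.uniform(-scale, scale) for v in best]
        _, d = nearest(centers, cand)
        if d > best_d:          # keep only mutations that worsen the class fit
            best, best_d = cand, d
    return best, best_d
```

in the real scheme the mutation step would be the hybrid trajectory-sample search and the centers would be learned feature classes; both are placeholders here.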
the sample relationships are mapped out in a DAG-like structure over the classes. the concept of breaking the classification (via optimization parameters) is still fuzzy and indistinct, but some ideas have been laid out. my immediate idea was that classes with a high out-degree are more “ambiguous” and therefore less desirable. on further thought, a key criterion is looking for loops in the DAG, which relate to indistinct feature classes creating an illusion of possibly nonterminating sequences. am not sure exactly/ fully yet how loops relate to class ambiguity (with out-degree a rough measure, or maybe others); it seems likely there is some relationship but its not immediately obvious.
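the loop idea can be sketched on a plain directed graph of class-to-class transitions (the adjacency-dict representation is my assumption): a cycle among classes is what would mimic a nonterminating sequence, and out-degree gives the rough ambiguity score per class.

```python
def has_cycle(adj):
    """iterative DFS cycle check on an adjacency dict {node: [successors]}."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current path / done
    color = {u: WHITE for u in adj}
    for start in adj:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(adj[start]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if color.get(v, WHITE) == GRAY:
                    return True            # back edge: loop among classes
                if color.get(v, WHITE) == WHITE:
                    color[v] = GRAY
                    stack.append((v, iter(adj.get(v, []))))
                    break
            else:
                color[u] = BLACK
                stack.pop()
    return False

def ambiguity(adj):
    """rough per-class ambiguity score: out-degree over distinct successors."""
    return {u: len(set(vs)) for u, vs in adj.items()}
```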
RJL has a nice new blog post that cites collatz/ Taos recent work on it. wrote a comment there and it led to a negligible # of hits. lol!
this is some recent hybrid code with some minor improvements, and didnt want to lose the changes. the basic idea was to start with limited 1-runs and see how much the trajectory size could be optimized. it has to throw away candidates that exceed the 1-run limit; the limit is determined by average statistics over ½ density iterates. the graph code is a little improved by numbering multiple graphs etc. it was found the prior “init” routines were not working as expected and/ or had low probability of creating sub-threshold runs, so they were adjusted.
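a stripped-down sketch of the 1-run filter (thresholds and names illustrative only, not the actual code): measure the longest 1-run in a candidate’s binary expansion, calibrate a limit from average statistics over random ~½ density iterates, and reject candidates over the limit.

```python
import random

def max_one_run(n):
    """longest run of 1 bits in n's binary expansion."""
    best = cur = 0
    while n:
        if n & 1:
            cur += 1
            best = max(best, cur)
        else:
            cur = 0
        n >>= 1
    return best

def calibrate_limit(width, samples=1000):
    """average max 1-run over random width-bit ~1/2 density iterates."""
    total = 0
    for _ in range(samples):
        n = random.getrandbits(width) | (1 << (width - 1))
        total += max_one_run(n)
    return total / samples

def accepts(n, limit):
    """candidate filter: reject iterates whose longest 1-run exceeds the limit."""
    return max_one_run(n) <= limit
```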
(uh oh!) wordpress has a new editor. have always run into funky quirks in the past, some of them nearly showstoppers, eg incorrectly handling escaping of code blocks, thereby screwing them up/ aka corrupting them every time saved. but maybe this one will be slightly better. you cannot imagine how many times have edited posts, saved them, then had to scroll back to the point where was previously editing. omg, who the @#%& asked for that? what an unbelievably tedious “feature” that shouldnt have gotten past UAT for even one nanosecond, and yet was default functionality for years, lol! (argh, already found the code-set shortcut doesnt work, sigh) oh, and it only took about 30m to figure out how to insert the excerpt indicator… thanks guys! and it took yet more “fiddling” to find the rearranged controls for image embedding. welcome to 2020, the age of ubiquitous smartphones and endless fiddling, lol! Nero would be happy! also, the idea of going back to old blogs to edit them, esp long ones, and trusting wordpress to “do the right thing” is a bit terrifying right now. welcome to modern software, that weird mix of extremely powerful and extremely fragile at the same time… its probably not unlike the ancient art of metal sword forging, where the old word/ concept is called brittleness.