collatz adversarial attack and the light at end of tunnel

❗ 💡 😀 😎 ⭐ last month was quite a tour de force against collatz, the culmination of many months or even years of hard work and creative ideas, and many different approaches all combined (pyramid-like) to lead to very solid results verging on a “candidate solution”. another theme that was pursued earlier here are “adversarial algorithms” which has been used with great success eg by Google/ Deepmind against Go. the basic theme is “two algorithms competing against each other” so to speak. along these lines there is one final idea to try against the prior Collatz “solution”.

the basic idea is to use the genetic algorithm to try to thwart the generated model. in this context that means trying to find large ‘wr_e’ and starting points that cause the model to be nonconvergent. interestingly, in line with the adversarial concept, this is nearly exactly the inverse of the earlier code to find the solution, in this case the GA simply tries to maximize ‘wr_e’ instead of minimizing it.

the code was modified somewhat but did not require much change. the count2 subroutine was modified to allow/ accept two arrays instead of a single one, one array containing the starting points and the other the perturbation points. in this case the 2nd perturbation points are simply sampled uniformly over the genetic algorithm candidate solutions (100 count even as array size increases reaches ~1K at end) and the starting points are generated at random evenly distributed over densities for each evaluation.

then again the grid “heat map” can be displayed, in this case every 40 genetic algorithm iterations, and the general optimizing goal is to “heat up the heat map”. the limit on the model is 500 iterations and success is revealed in that even after 1K genetic algorithm iterations with ‘wr_e’ increasing some with corresponding heat escalation, ‘wr_e’ cannot be increased much at the end and the max heat map iterations only go up to about 80-100 range. the final heat map is displayed below. there is some minor variability between maps and some of the intermediate maps looked somewhat “hotter”.

the bottom line is that something like a loop invariant has been isolated for a meta function that closely matches the collatz function dynamics. statistically/ experimentally the invariant is revealed as quite solid and very strong search/ optimization algorithms such as this one and others (previous code looked at more points via following trajectories) cannot find any exceptions. the next step is to mathematically prove the invariant so to speak. suspect that will be challenging but maybe conceivably/ hopefully not more challenging than isolating the invariant!



(9/15) the hard task of trying to figure out how to translate the current question into the literature remains (as cited earlier, closely related to matrix difference eqns[a10]). came up with something like the following and then posted it to mathoverflow under title “stochastic recurrence relation convergence”.[a9]

Consider a recurrence relation x_n = f(x_{n-1}). Suppose f(x_n)<0 for some "large enough" n. Now consider a "stochastic variant" x_n = f(x_{n-1})+y_n where y_n is a sequence of random variables. Under what conditions will there again exist a "large enough" n such that x_n < 0 ? E.g. does some c exist such that this "convergence" holds if e.g. |y_n| < c etc? I am looking for general ideas of attack / references. I am actually interested in the "multidimensional / linear algebra" case for a vector \mathbf x but am ok with theory / solution starting from a single variable.

zero responses/ dead silence so far! then started doing some google search/ mini survey which can actually be a fun exercise at times. in retrospect it does look like “stability”[a7] might be a better term than “convergence”. it looks like there may be some close concept of “local stability” wrt recurrence relations.[a8] also others asking about vaguely similar stuff on mathoverflow.[a4] footnote: think the concepts got slightly mangled in an edit and feel its maybe more technically correct/ in line with convention to say y_n is a sequence of random values and y is the random variable.

there is maybe some link to “reaction diffusion eqns”.[a1] another area that seems somewhat close is known as “probabilistic recurrence relations” studied by (small world, legendary) CS geek Karp.[a3][a2][a5][a6] am not sure if my equation can be fit into that format, it appears that Karps eqn involves a fixed function of inherently declining remaining “data size” and may not be strictly applicable. it seems like it ought to be easy to prove this always converges whereas the key challenge with collatz is how the datasize can intermediately increase in a semirandom way. but the equation does capture the idea of stochastic noise added at each iteration.

was feeling slightly dejected that easily applicable theory didnt immediately pop up because the whole area of linear recurrence relations is well understood, but it appears in constrast that the idea of a stochastic component is maybe not so accessible.

💡 then, had a brilliant idea last nite! it goes like this:

marble/ bowl proposition

(1) imagine a marble in a bowl. it will roll randomly back and forth at the bottom as it comes to rest. the randomness can be regarded as noise in its 2d/3d “trajectory.”

(2) (“pov2”) now imagine a marble in a bowl with intermittent perturbations on the marble. as long as the marble is perturbed with less energy than is dissipated in its descent, it will still descend.

now, thats the basic picture/ intuition. pov1 is the classic picture. the idea here is that theres low friction/ resistance like in a smooth bowl and the marble gains speed/ momentum. pov2 is the minor variation closest to the problem at hand where the idea is more like resistance in the motion so that the marble doesnt gain speed or momentum past a particular limit.

so can this be translated into rigorous math? after a little bit more thinking, suspect the idea is yes! my idea is that maybe the linear recurrence relation under study can be regarded as something like a multidimensional derivative. it is “pointing in a direction” of some kind of minimum in some kind of “larger landscape”. to find what the “larger landscape is, need to do some kind of “integration”. suspect there is some kind of solid/ comprehensive theory for this somewhere.

then how to make the marble-bowl proposition more plausible/ mathematically rigorous? ofc the elephant in the room here is gradient descent. (another similar connection would be simulated annealing) it looks like maybe the natural measure is “distance to minimum”. the linear recurrence relation maybe is advancing some “natural distance” δ in the “larger landscape” toward the minimum. then if the perturbations are less than δ, the conditions of the proposition are satisfied for “convergence” to the minimum. simple/ basic, right? 🙂

(9/16) some queries to physics chat room get no response. was looking at [a10] and at end it has ref to “linear quadratic gaussian control”. that is in turn connected to kalman filters.[a11] and maybe the exact eqn is cited on wikipedia:

\mathbf{x_k = F_k x_{k-1} + B_k u_k + w_k}

\mathbf B_k is the control-input model which may be omitted and \mathbf w_k is additive white/ gaussian noise and in my case \mathbf{F_k = F}. my problem does not seem to involve control yet but does seem like it is doing something like “measuring in the presence of noise.” my case seems to be for a system that has its own “steady-state” convergence dynamics.

(9/17) 😮 ❗ 💡 surprise! looks like the problem is “simply” a multidimensional/ matrix equivalent of the autoregressive model, but after quite a bit of searching still havent seen anyone specifically study the matrix version although suspect it may come up in some engr applications.[a12]

Chatfield [a13] (4th ed) has a nice basic analysis of single variable AR, and some proof that it converges to 0 based on the multiplicative constant |\alpha|<1 sec 3.4.4 p35-36,

X_t = \alpha X_{t-1} + Z_t (3.4)

however am thinking the basic proof fmt/ structure may be applicable by using matrix norms on (matrix analog) \alpha instead…?

also am now thinking the Kalman analogy seems vaguely similar but maybe too stretched. he says p187 wrt Kalman filtering, "In state space modelling, the prime objective is to estimate the signal in the presence of noise. In other words we want to estimate the state vector." whereas the "state vector" is fixed and known in my problem.

a. stochastic recurrence

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s