# collatz adversarial attack and the light at end of tunnel

❗ 💡 😀 😎 ⭐ last month was quite a tour de force against collatz, the culmination of many months or even years of hard work and creative ideas, with many different approaches combined (pyramid-like) to lead to very solid results verging on a “candidate solution”. another theme pursued earlier here is “adversarial algorithms”, which have been used with great success eg by Google/ Deepmind against Go. the basic theme is “two algorithms competing against each other” so to speak. along these lines there is one final idea to try against the prior Collatz “solution”.

the basic idea is to use the genetic algorithm to try to thwart the generated model. in this context that means trying to find large ‘wr_e’ and starting points that cause the model to be nonconvergent. interestingly, in line with the adversarial concept, this is nearly exactly the inverse of the earlier code that found the solution: here the GA simply tries to maximize ‘wr_e’ instead of minimizing it.
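the inversion is small enough to sketch in a few lines. the following toy ruby GA is only illustrative — ‘wr_e’ here is a stand-in fitness function (the real metric comes from running the model, in thwart.rb), and all names/ parameters are made up — but it shows the one-line flip from minimization to maximization in the selection step.

```ruby
# toy sketch of the adversarial inversion: the same GA loop as before,
# but selection now keeps candidates with *larger* 'wr_e'.
# 'wr_e' here is a stand-in toy fitness, not the real model metric.

def wr_e(candidate)
  # stand-in: the real code would run the model and measure nonconvergence
  candidate.sum { |x| Math.sin(x) }
end

def evolve(pop_size: 100, gens: 200, dim: 8)
  pop = Array.new(pop_size) { Array.new(dim) { rand * 10 } }
  gens.times do
    # sort descending by fitness: maximize 'wr_e' instead of minimizing it
    scored = pop.map { |c| [wr_e(c), c] }.sort_by { |s, _| -s }
    elite = scored.first(pop_size / 2).map { |_, c| c }
    # keep the elite unchanged, refill the pool with mutated copies
    pop = elite + elite.map { |c| c.map { |x| x + (rand - 0.5) * 0.5 } }
  end
  pop.map { |c| wr_e(c) }.max
end
```

since the elite half is carried over unchanged, the best fitness is monotonically nondecreasing over generations, mirroring how ‘wr_e’ creeps upward in the real run before plateauing.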

the code was modified somewhat but did not require much change. the count2 subroutine was modified to accept two arrays instead of a single one, one containing the starting points and the other the perturbation points. in this case the 2nd perturbation points are simply sampled uniformly over the genetic algorithm candidate solutions (100 samples, even as the candidate array grows to ~1K at the end) and the starting points are generated at random, evenly distributed over densities, for each evaluation.
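the sampling scheme just described can be sketched as follows. all names here are made up for illustration (the real logic lives in count2 in thwart.rb); bit strings stand in for the actual representation.

```ruby
# sketch of the two-array evaluation setup described above:
# perturbation points sampled uniformly from the GA candidate pool (capped
# at 100), starting points drawn at random evenly over bit densities.

def sample_eval_points(candidates, n_starts, width)
  # uniform sample of up to 100 perturbation points from the candidate pool
  perturbs = Array.new([candidates.size, 100].min) { candidates.sample }
  starts = Array.new(n_starts) do
    d = rand                               # target bit density in [0, 1]
    (0...width).map { rand < d ? 1 : 0 }   # random bit string at that density
  end
  [starts, perturbs]
end
```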

then again the grid “heat map” can be displayed, in this case every 40 genetic algorithm iterations, and the general optimizing goal is to “heat up the heat map”. the limit on the model is 500 iterations. success is revealed in that even after 1K genetic algorithm iterations, with ‘wr_e’ increasing some and corresponding heat escalation, ‘wr_e’ cannot be increased much at the end and the max heat map iterations only go up to about the 80-100 range. the final heat map is displayed below. there is some minor variability between maps and some of the intermediate maps looked somewhat “hotter”.

the bottom line is that something like a loop invariant has been isolated for a meta function that closely matches the collatz function dynamics. statistically/ experimentally the invariant is revealed as quite solid and very strong search/ optimization algorithms such as this one and others (previous code looked at more points via following trajectories) cannot find any exceptions. the next step is to mathematically prove the invariant so to speak. suspect that will be challenging but maybe conceivably/ hopefully not more challenging than isolating the invariant!

thwart.rb

(9/15) the hard task of trying to figure out how to translate the current question into the literature remains (as cited earlier, closely related to matrix difference eqns[a10]). came up with something like the following and then posted it to mathoverflow under title “stochastic recurrence relation convergence”.[a9]

Consider a recurrence relation $x_n = f(x_{n-1})$. Suppose $f(x_n)<0$ for some "large enough" $n$. Now consider a "stochastic variant" $x_n = f(x_{n-1})+y_n$ where $y_n$ is a sequence of random variables. Under what conditions will there again exist a "large enough" $n$ such that $x_n < 0$ ? E.g. does some $c$ exist such that this "convergence" holds if e.g. $|y_n| < c$ etc? I am looking for general ideas of attack / references. I am actually interested in the "multidimensional / linear algebra" case for a vector $\mathbf x$ but am ok with theory / solution starting from a single variable.
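the posted question can at least be probed numerically. the sketch below uses an illustrative contracting $f$ (my choice, not from the real model: $f(x) = x/2 - 1$, whose deterministic orbit goes negative since the fixed point is $-2$) and uniform noise with $|y_n| < c$; for modest $c$ the perturbed iterates still go negative.

```ruby
# numeric probe of the mathoverflow question: x_n = f(x_{n-1}) + y_n.
# f is an illustrative contraction with fixed point -2, so the noiseless
# orbit goes negative; we ask whether bounded noise |y_n| < c preserves that.

def first_negative_step(x0, c, steps = 1000)
  f = ->(x) { 0.5 * x - 1.0 }             # fixed point -2 < 0
  x = x0
  steps.times do |n|
    x = f.call(x) + (2 * rand - 1) * c    # add uniform noise, |y_n| < c
    return n + 1 if x < 0                 # "large enough" n found
  end
  nil                                     # never went negative
end
```

for this $f$, even worst-case noise $+c$ every step gives a fixed point of $-2 + 2c$, so any $c < 1$ guarantees the iterates go negative — a one-dimensional instance of the threshold asked about.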

zero responses/ dead silence so far! then started doing some google search/ mini survey which can actually be a fun exercise at times. in retrospect it does look like “stability”[a7] might be a better term than “convergence”. it looks like there may be some close concept of “local stability” wrt recurrence relations.[a8] also others asking about vaguely similar stuff on mathoverflow.[a4] footnote: think the concepts got slightly mangled in an edit and feel it’s maybe more technically correct/ in line with convention to say $y_n$ is a sequence of random values and $y$ is the random variable.

there is maybe some link to “reaction diffusion eqns”.[a1] another area that seems somewhat close is known as “probabilistic recurrence relations” studied by (small world, legendary) CS geek Karp.[a3][a2][a5][a6] am not sure if my equation can be fit into that format, it appears that Karp’s eqn involves a fixed function of inherently declining remaining “data size” and may not be strictly applicable. it seems like it ought to be easy to prove Karp’s eqn always converges whereas the key challenge with collatz is how the datasize can intermediately increase in a semirandom way. but the equation does capture the idea of stochastic noise added at each iteration.

was feeling slightly dejected that easily applicable theory didnt immediately pop up, because the whole area of linear recurrence relations is well understood, but it appears in contrast that the idea of a stochastic component is maybe not so accessible.

💡 then, had a brilliant idea last nite! it goes like this:

### marble/ bowl proposition

(1) (“pov1”) imagine a marble in a bowl. it will roll randomly back and forth at the bottom as it comes to rest. the randomness can be regarded as noise in its 2d/3d “trajectory.”

(2) (“pov2”) now imagine a marble in a bowl with intermittent perturbations on the marble. as long as the marble is perturbed with less energy than is dissipated in its descent, it will still descend.

now, thats the basic picture/ intuition. pov1 is the classic picture. the idea here is that theres low friction/ resistance like in a smooth bowl and the marble gains speed/ momentum. pov2 is the minor variation closest to the problem at hand where the idea is more like resistance in the motion so that the marble doesnt gain speed or momentum past a particular limit.

so can this be translated into rigorous math? after a little bit more thinking, suspect the idea is yes! my idea is that maybe the linear recurrence relation under study can be regarded as something like a multidimensional derivative. it is “pointing in a direction” of some kind of minimum in some kind of “larger landscape”. to find what the “larger landscape” is, need to do some kind of “integration”. suspect there is some kind of solid/ comprehensive theory for this somewhere.

then how to make the marble-bowl proposition more plausible/ mathematically rigorous? ofc the elephant in the room here is gradient descent. (another similar connection would be simulated annealing) it looks like maybe the natural measure is “distance to minimum”. the linear recurrence relation maybe is advancing some “natural distance” δ in the “larger landscape” toward the minimum. then if the perturbations are less than δ, the conditions of the proposition are satisfied for “convergence” to the minimum. simple/ basic, right? 🙂
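the δ-vs-perturbation picture can be checked on the simplest possible “bowl”. the sketch below (my illustrative setup, not from the real model) runs gradient descent on $f(x,y)=(x^2+y^2)/2$, injecting a perturbation of fixed size eps after each step; when eps is smaller than the per-step progress δ = η‖x‖, the distance to the minimum keeps shrinking until it stalls in a small ball of radius roughly eps/η.

```ruby
# marble-in-bowl check: a gradient step on f(x,y) = (x^2 + y^2)/2 advances
# a distance delta = eta*|x| toward the minimum; if each perturbation has
# size eps < delta, the distance to the minimum still shrinks.

def perturbed_descent(start, eta, eps, steps)
  x = start.dup
  steps.times do
    grad = x                                             # gradient of (x^2+y^2)/2 is [x, y]
    x = x.map.with_index { |xi, i| xi - eta * grad[i] }  # descent step
    kick = Array.new(x.size) { 2 * rand - 1 }            # random direction
    norm = Math.sqrt(kick.sum { |k| k * k })
    x = x.map.with_index { |xi, i| xi + eps * kick[i] / norm } # perturbation of size eps
  end
  Math.sqrt(x.sum { |xi| xi * xi })                      # final distance to minimum
end
```

the worst-case recursion here is $\|x_{n+1}\| \le (1-\eta)\|x_n\| + \epsilon$, with fixed point $\epsilon/\eta$ — the marble descends until the perturbation energy matches the dissipation, exactly the pov2 picture.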

(9/16) some queries to physics chat room get no response. was looking at [a10] and at end it has ref to “linear quadratic gaussian control”. that is in turn connected to kalman filters.[a11] and maybe the exact eqn is cited on wikipedia:

$\mathbf{x_k = F_k x_{k-1} + B_k u_k + w_k}$

$\mathbf B_k$ is the control-input model which may be omitted and $\mathbf w_k$ is additive white/ gaussian noise and in my case $\mathbf{F_k = F}$. my problem does not seem to involve control yet but does seem like it is doing something like “measuring in the presence of noise.” my case seems to be for a system that has its own “steady-state” convergence dynamics.

(9/17) 😮 ❗ 💡 surprise! looks like the problem is “simply” a multidimensional/ matrix equivalent of the autoregressive model, but after quite a bit of searching still havent seen anyone specifically study the matrix version although suspect it may come up in some engr applications.[a12]

Chatfield [a13] (4th ed) has a nice basic analysis of single variable AR, and some proof that it converges to 0 based on the multiplicative constant $|\alpha|<1$ (sec 3.4.4, p35-36):

$X_t = \alpha X_{t-1} + Z_t$ (3.4)

however am thinking the basic proof fmt/ structure may be applicable by using matrix norms on the (matrix analog of) $\alpha$ instead…?
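the matrix-norm hunch is at least easy to check numerically. the sketch below (illustrative $A$ and noise scale, my choices) iterates the matrix analog $X_t = A X_{t-1} + Z_t$ with a contracting $A$ (operator norm below 1) using ruby’s stdlib Matrix class; the trajectory norms decay from the starting point and stay bounded near the origin, consistent with the bound $\|X_t\| \le \|A\|^t \|X_0\| + \sup\|Z\|/(1-\|A\|)$.

```ruby
require 'matrix'

# numeric check of the matrix-norm hunch: X_t = A X_{t-1} + Z_t with a
# contracting A stays bounded near the origin despite per-step noise.

def var1_trajectory_norms(a, x0, noise, steps)
  x = x0
  steps.times.map do
    z = Vector[*Array.new(x.size) { (2 * rand - 1) * noise }]  # bounded noise Z_t
    x = a * x + z
    Math.sqrt(x.inner_product(x))                              # ||X_t||
  end
end
```

for example $A = \begin{pmatrix}0.5 & 0.1\\-0.1 & 0.4\end{pmatrix}$ has frobenius norm $\approx 0.66 < 1$, so the noise-free part dies off geometrically and only the noise ball remains.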

also am now thinking the Kalman analogy seems vaguely similar but maybe too stretched. he says p187 wrt Kalman filtering, "In state space modelling, the prime objective is to estimate the signal in the presence of noise. In other words we want to estimate the state vector." whereas the "state vector" is fixed and known in my problem.

(9/24) 💡 🙂 ❗ ❓ rather busy these days & not able to channel much time into this as desired/ preferred but still pursuing a few scraps/ glimmers of ideas. am a bit confused because Chatfield seems to claim in his AR(1) section, without proof, $X_t \rightarrow 0$ as $t \rightarrow \infty$. but wikipedia has a relevant/ specific/ different formula in the AR(1) section where $\varphi$ is Chatfield’s $\alpha$ and $\varepsilon_t$ is $Z_t$:

$X_t = \frac{c}{1-\varphi} + \sum_{k=0}^{\infty}\varphi^{k}\varepsilon_{t-k}$

it looks like am searching for the matrix version of this equation. suspect it will give me the thresholds of (average) noise in the infinite summation vs the constant vector analog $c$ for convergence vs nonconvergence for sign $X_{t}$ negative or positive resp as studied in the prior code. (also, @#%& argh! wordpress editor is screwing up less-than and greater-than substitution in those formulas…) 😳 😦 😡 👿
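the wikipedia closed form can be sanity-checked by simulation: since the noise summation averages to zero, a long-run time average of the AR(1) process should settle near $c/(1-\varphi)$. the following sketch uses illustrative parameter values and a small uniform noise for $\varepsilon_t$ (my choices).

```ruby
# sanity check of the AR(1) closed form: the long-run mean of
# X_t = c + phi*X_{t-1} + eps_t should approach c / (1 - phi),
# since the geometric noise sum averages to zero.

def ar1_long_run_mean(c, phi, steps)
  x = 0.0
  samples = []
  steps.times do
    x = c + phi * x + (2 * rand - 1) * 0.1  # small uniform noise for eps_t
    samples << x
  end
  burn = steps / 2                          # discard transient burn-in
  samples.drop(burn).sum / (steps - burn)
end
```

with $c = 1$ and $\varphi = 0.5$ the predicted mean is $1/(1-0.5) = 2$, and the simulated time average lands right on it.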

(later) just asked[a15] on stats.stackexchange.com aka crossvalidated after finding a related question[a14] and richard hardy responded, it’s called “VAR”, vector autoregression![a16] unf wikipedia doesnt have much on its general properties/ behavior/ dynamics, although it cites application in finance and econometrics. survey ref [a17] “Rise of VAR modelling approach” (in econometrics) looks promising! also feel a bit annoyed/ sheepish that couldnt figure the VAR angle out with google…! 😳

(9/26) 💡 ❗ after searching on “multidimensional/ multivariate derivative/ gradient” finally realized its called the jacobian.[a18] have a very detailed ~1180 page calculus and analytic geometry book[a19] that includes multivariate calculus but barely mentions the jacobian at all! revealing a bit how abstract/ advanced this area is. scanning thru the jacobian info, it mentions newtons method[a20] and the hartman-grobman theorem[a21] for dynamical systems. the thm basically states that a nonlinear dynamical system can be reduced to a linear treatment/ approximation in a particular local area near an “equilibrium point” aka stable state. this sounds fairly close to some of the prior ideas.

thought long and hard and think have a proof structure based on newtons method for a multidimensional quadratic equation for which the iterative gradient descent is a linear equation, tying in with the jacobian which is naturally used for multivariate newtons method/ gradient descent. this also ties in with a formula mentioned on the jacobian page, the 1st order taylor approximation of a multidimensional function, same formula used for newtons method:

$\mathbf{ f(x) = f(p) + J_f(p) \cdot (x - p) + {\mathit o} (\|x - p\|) }$

here the $o(\mathbf{\|x - p\|})$ term acts a lot like an “error term”. as the newton method mentions, there are “basins of attraction”. my problem involves trying to prove that the starting point is inside a “basin of attraction” and the newton/ gradient descent method stepwise convergence toward the stable point is (provably) an (“incremental”) distance that exceeds any/ all local noise perturbations. hasnt anyone studied this? have the feeling that it might be buried somewhere in the vast and not-easy-to-search literature. am maybe getting far enough through the theory/ derivations to write some code to validate some of this.
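a first prototype of that validation code might look like the following: multivariate newton steps $x \leftarrow x - J_f(x)^{-1} f(x)$ on an illustrative 2d system (my choice, root at $(1,1)$), with a perturbation of fixed size eps injected after each step. starting inside the basin of attraction, the newton progress per step dominates the perturbation and the iterates settle into a small ball around the root.

```ruby
require 'matrix'

# prototype of the basin-of-attraction idea: noisy multivariate newton on an
# illustrative system F(x, y) = (x^2 - y, y^2 - x), which has a root at (1, 1).

F = ->(v) { Vector[v[0]**2 - v[1], v[1]**2 - v[0]] }
J = ->(v) { Matrix[[2 * v[0], -1], [-1, 2 * v[1]]] }   # jacobian of F

def noisy_newton(x, eps, steps)
  steps.times do
    x = x - J.call(x).inverse * F.call(x)              # newton step via jacobian
    kick = Vector[2 * rand - 1, 2 * rand - 1]          # random direction
    x = x + kick * (eps / Math.sqrt(kick.inner_product(kick))) # perturbation of size eps
  end
  x
end
```

near the root the newton correction shrinks quadratically, so after each eps-sized kick the next step recovers almost all of it — the iterates hover within roughly eps of $(1,1)$, which is the stepwise-progress-exceeds-noise condition in miniature.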

also, am thinking maybe the time series analysis literature talking about AR(1) or VAR processes such as [a17] which does have the concept of “forecasting” is not so connected to the dynamical systems literature which considers “stable points,” and my problem/ analysis is somewhat crosscutting between them.

a. stochastic recurrence