# collatz perplexity

hi all, last month collatz installment made some progress, but unf have been a bit tied up with work, where a fiscal year transition/ boundary tends to lead to some crunch-like dynamics, leaving less time for one of my favorite side projects, namely this one, bummer/ ouch.

but, here is a small trickle/ dribble of some newer ideas, mainly benefitting from google searches and maybe a little algebra.

consider quadratic eqn $f(x) = a x^2 - b = 0$. then derivative $f'(x) = 2ax$. then the newton method approximation recurrence relation

$x_n = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})} = x_{n-1} - \frac{a (x_{n-1})^2 - b}{2ax_{n-1}}.$

😳 but, oops! as was thinking/ trying earlier, this does not actually translate into a form $x_n = c x_{n-1}$. and this is only the single variate version and dont think the multivariate version offers any additional flexibility/ option otherwise.

however, more googling turns up some stuff. [1] is a better fleshed-out descr of multivariate newtons method than wikipedia. [2] is a very ambitious Msc thesis looking at how gradient descent can find a recurrence relation with arbitrary (increasing) accuracy in the presence of noise perturbations, apparently fairly similar to my current collatz reduction.

but am now wondering about exponential versions. consider the recurrence relation $e^{x_{n+1}} = e^{c} e^{x_n}$. then the exponents follow the recurrence relation $x_{n+1} = c + x_n$. now is that the form of a approximation recurrence relation for an eqn in some form?

looking up quadratic matrix eqns + newtons method led to [3] which has the approximation recurrence relation $X_{i+1} = X_i + H_i$, is $H_i$ which seems close, but $H_i$ isnt a constant, and not even really following how to compute it, it seems like they gloss over that.

[4] has some theory on a newtons method wrt a matrix quadratic eqn incl examples and ref to the Hessian, but cant follow it quite yet.

did some further search on stability/ convergence wrt VAR models and found some slides [5][6] which cover it (p5, p11 “Wold representation of VAR(1)” resp), and apparently also book [7] from another slide pdf.

but is this overthinking it? what about eqn $f(x) = a x^2$? then the recurrence approximation is $x_n = x_{n-1} - \frac{a {x_{n-1}}^2}{2ax_{n-1}} = \frac{1}{2} x_{n-1}$…? but then there is not even a arbitrary constant as in the form looking for. and my eqns actually seem to be in form more like $x_{n+1} = a x_{n} + b$… ❓

(10/3) was up late reading control theory books with some rising interest/ excitement, have 3 or so that have been lying around a long time waiting for some opportune moment. after some more searching, think that the problem seems to be close to the concept of lyapunov stability.[8] that over-century-old concept mathematically captures the idea of stable points or “attractors” given a noiseless function/ derivative/ slope and is maybe nearly a rosetta stone for this problem. it is studied in dynamical systems theory along with control theory.

❗ 💡 ⭐ from the control theory refs/ terminology, and the wikipedia lyapunov stability entry, my problem is about/ close to “stability in the presence of inputs” where the inputs are also called “disturbances”. there is some relevant theory about “bounded input, bounded output/ BIBO” stability for linear systems.[9] the basic idea is that if the input is bounded in some way, then outputs are bounded also and do not increase or decrease indefinitely. have been browsing some but cant find my particular problem. there is a lot of theory about stability wrt impulse response. am not sure how to define/ measure the impulse response of my problem.

basically am searching for some kind of transition-point or threshhold like phenomenon/ boundary between stability and instability which is what is observed experimentally so far. in the sense that “for small disturbances the problem is stable, for larger ones beyond the threshhold it is unstable” aka/ so-called “tipping point”.

❗ ⭐ 😮 🙄 also linked from the lyapunov stability wikipedia page, BAM! exponential stability[10] for “stability of linear systems” reiterates the “marble in a bowl” analogy outlined in prior installment of this series, only except for a “ladle” instead (defn close enough)! yet another related angle seems to be “perturbation theory”.[11]

also worth mentioning here, a generalization of BIBO is called ISS; acc wikipedia but with no entry for it, “Stability for nonlinear systems that take an input is input-to-state stability (ISS), which combines Lyapunov stability and a notion similar to BIBO stability.”

am thinking, would like to find some ref with many examples of BIBO cases, but cant find anything like it so far. my very thorough control theory refs seem only to cover it relatively briefly/ abstractly.

another recent thought, the newtons method approach seems to reduce simply to trying to find f(x)/f'(x) = g(x) solving f for some arbitrary g, ie a differential eqn, say g(x) a polynomial or simply ax. hasnt anyone thought of that before or looked into it? cant seem to find any ref on it yet, still searching. 🙄

😳 😦 😡 👿 😥 🙄 ❗ @#%& other news, my corp firewall now newly blocks wordpress blog editing. ARGH! they sure dont make your life easy do they? (understatement of the century…! viva la Corporatocracy™)

(10/6) thinking about differential eqns a bit, guess the answer is obvious: if f(x)=ax-2 then f'(x)=-2ax-3 and f(x)/f'(x) = -½x-2-(-3) = -½x1 and iterative eqn xn=3/2xn-1. however the arbitrary constant disappears…? but did notice that in newtons method there is a variant known as “relaxation” esp with matrices, which apparently involves a nearly arbitrary constant in front of the term.[13]

but, wait. in this case newtons method is equivalent to solving iteratively x-2 = 0 which has the “root” simply x → ±∞ ie apparently a singularity. ❓

ran across ref [12] looking for newtons method + relaxation. its apparently FEM “finite element modelling” for multidimensional newton method, treated as a dynamical system (!), with advanced/ complex relaxation concepts.

elsewhere: was able to bring up blog in editor behind corp firewall again. was it some temporary reference on the wordpress web page/ web editor that triggered it (maybe have seen something like that before…)? hope it holds! 🙄 😐

(10/11) a few new ideas. there seems to be a basic dichotomy between an iterative version of the equation and and exponential form “solution” eg for “matrix difference eqns”. am thinking maybe the newton method concept applies to the latter? still grasping for something definite there.

❗ ⭐ 💡 😀 turned up some very solid new refs today. looks like nearly the same problem is studied in some refs. the book [14] seems to cover the case in sec 4.6, “stability by linear approximation”. “the linearization method is the oldest in stability theory. scientists and engineers frequently use this method in the design and analysis of control systems and feedback devices.” however this seems to bound the noise term by a constant factor of the current “state vector” amplitude. but it has a strict proof of stability using the Discrete Gronwall inequality also involving the jacobian… on the right track! it cites Lyapunov and Perron.

similarly [15] seems to cover nearly the same material in chapter stability of the linear approximation citing Perron 1929. these refs seem to nearly nail the same problem being considered by me but are a bit abstract/ hard to follow. am now thinking of how to simplify some of the ideas in my own context.

[16] seems to contain very similar theory but instead has a sequence of multiplicative matrices that are bounded, maybe a generalization of my case.

(10/12) havent coded anything in over a month and feel a bit weird about that! finally after review and some linear algebra practice/ exercise, understand some of the basic ideas of the matrix difference eqn theory now. my case is nonhomogenous because it has a constant vector b in the eqn xn = Axn-1 + b. the theory for convergence without noise addition/ perturbations/ disturbances involves looking at eigenvalues/ vectors of A.

did some research and found that ruby statsample has an eigenvalue/ vector routine built in. after some perplexity on how to create a matrix (the so-called doc doesnt really make it clear, not 1st time dealing with that wrt statsample@#%& argh!) wrote some quick code to compute it (deceptively small size; took a long time to get to this point). however, it locks up for all cases tried of prior generated data (n=20)! it is apparently rather fragile/ feeble, or maybe a bug? it seems to succeed in computing eigenvalues/ vectors for some random matrices. so maybe there is something unusual about my matrices. is it not converging? even so decent code should not lock up and report an error. looks like am going to have to find or write some better code, sigh. 😐 🙄

😮 😳 another wrinkle: was trying to understand why the A matrix has diagonal zero entries, didnt notice that until now! this is the recursive formula being constructed to exclude the previous parameter value to compute the new parameter value, ie all parameters are calculated from prior parameter values for “other variables” except the same variable so to speak. surprised to figure this out! this is the fitting algorithm being handicappped in a significant way and still succeeding. need to alter the regression to include all prior variables instead of “all except same one” and presume it will improve fit accuracy, maybe substantially.

matrix.rb

(10/13) ⭐ 💡 ❗ 😀 just wrote and debugged some very tricky code, am proud of it, it realizes a very long-sought milestone/ ideal! this implements the analysis on the wikipedia page for matrix difference equations. it computes the steady-state solution and uses it to solve for the nonhomogenous case in terms of the homogenous eqns. wikipedia doesnt state it explicitly but its just a “coordinate translation” between the two. used the ruby matrix library [17] for matrix eigenvalues [18][19] and inverse, multiplication etc which turned out to be built in and work great! not sure what version it is, my rubydoc installed says v1.9.3 but ruby version is 2.2.0.

it was a bit of a wrinkle/ continued perplexity to figure out how to multiply a matrix times a vector, ruby has a Vector implementation but matrix seems not to be able to recognize it. after google led to a hint/ example on rosetta code (along with 95 other languages!) the solution turned out to be creating a “1-column matrix”. this code calculates the trajectories by iterations in detect and then by the eigensystem “power” formula in detect2 and compares accuracy of both. it looks over 100 trajectory samples by density and analyzes the longest one, which tends to be almost highest density, in this case 97th.

graph 1 is the ‘nl’ variable in red and the ‘wr’ variable in green. as defined earlier, ‘nl’ is computed from an iterated product formula from ‘wr’ iterates. the oscillation after the glide max is quite remarkable; esp considering all this behavior is emergent solely from initial conditions and the matrix difference eqn/ dynamics. the 2nd graph is errors between the ‘wr’ values for each method and the errors in ‘nl’ trajectory values. both are quite low at around 1.0e-15 although there does seem to be a superlinear increase.

the next step, also tricky, is “Extracting the dynamics of a single scalar variable from a first-order matrix system” as in the wikipedia section for ‘wr’ in particular. wikipedia is not all that detailed in this procedure but it involves again working with the eigenvalues. then would like to find a formula to calculate nl<1 ie determine/ estimate trajectory length from initial conditions.

all this is a very big deal because basically the overall problem, to be tractable/ tamed, needs to be reduced to some “mathematically analytic structure” and this fulfills that!

matrix2.rb

⭐ ⭐ ⭐

⭐ 💡 ❗ 😎 had some further thoughts on newtons eqn wrt this problem last night, more breakthrough, think have the basic structure finally figured out! its simple once determined but was seriously scratching head getting to this point!

consider $f(x) = x^b$. then $f'(x) = bx^{b-1}$. newtons formula iteration is then $x_{n} = x_{n-1}-\frac{x_{n-1}^b}{bx_{n-1}^{b-1}} = x_{n-1}-\frac{1}{b}x_{n-1}$. then to get in the “iterative product formula” form $x_{n}=ax_{n-1}$, requires $a=1-\frac{1}{b}$, which implies $b=\frac{1}{1-a}$. in other words for any $a$ in the iterative product formula, there is a corresponding $b$ such that the iterations can be seen as the newton method minimization sequence for $f(x)=0$. so what about $f(x)$ then? for $|a|\,\textless\,1$ and $b\,\textgreater\,1$ its a “power curve” with the origin centered at 0, ie simply the (concave upward) “bowl” with different steepness that was conjectured earlier! my feeling is that the “nonhomogenous” case is a coordinate translation here just as with the prior matrix difference eqns. suspect convergence doesnt happen outside the given range. (not sure about the multivariate case analogy yet, needs more thought…)

(10/23) have been pondering the multivariate case heavily without a solution, but today maybe had a breakthru with the help of cyberspace. the multivariate case (Newton method recurrence) is $\mathbf{x_{n+1}=x_n-J(x_n)^{-1}f(x_n)=Ax_n}$ where $\mathbf J$ is the Jacobian. some quick algebra gives $\mathbf{J(x_n)(1-A)x_n=f(x_n)}$. asking on math stackexchange[20] led to some help thats not really very thorough at all but had a key clue! (unf stackexchange is like that, ie sometimes not getting exactly the answer one wants unless asking exactly the right question…) was wondering if there is a concept of a “matrix exponential” when staring at the eqn but didnt seem to turn it up on googling iirc. however, there is exactly such a concept and suspect its the answer.[21]

there is the little wrinkle of trying to interpret differential eqns in terms of time change as difference eqns, a correspondence that is not always articulated so clearly, but there is a brief mention on wikipedia recurrence relation page. it doesnt mention the basic idea, if $\Delta t=1$, then $\dot{f} = \frac{\Delta y}{\Delta t} = \Delta y$.[22] in other words application of Eulers method for time increment 1, $y_{n+1} = y_n + \Delta t \dot{f}(t_n, y_n)$.[23] also Eulers method can be regarded as an “inverse” of Newtons method ie numerical integration an inverse of numerical differentiation both with a 1st order Taylor approximation of slope/ derivative.

(10/25) wondering/ pondering/ a bit perplexed/ cognitive dissonant in that my single variate solution, containing no ex term, does not seem to be a special case of the multivariate solution.

(10/27) have been wanting to write this basic code for quite awhile, its been in the back of my mind, but got sidetracked. the idea here is to calculate real trajectories and then compare them and the “discrepancy” with the meta function at each iteration. this is calculated as simply error measurements for each of the parameters. then, results/ errors are graphed. the code is streamlined/ refactor to eliminate some discarded computations of data “pairs” that was included in prior code merely as “quick and dirty” (or at least “quick and not cleaned”) code/ copy.

there are 100 trajectories as previously again low to high densities. in this case the trajectory errors are plotted lined up with the “left” or “right” of the trajectory (beginning/ end) in graphs 1, 2 respectively. graph 3 is (the same) ~1400 error points plotted consecutively.

the error is bounded in some rough sense, and its apparent there is some higher trend at beginning and end of trajectories. there is nothing to compare the error scale to but on the other hand there does not seem to be any “red flag(s)” such as huge spikes or large magnitude differences.

also on simple visual inspection there seems a clear anticorrelation trend between at least two signals, blue and magenta, ‘dh’ and ‘dl’.

😳 just realized on closer visual inspection graph 2 is not as intended, its plotting the trajectory backwards/ reverse starting from left. whipped that out a little too fast! exercise for reader, whats a quick fix for that? 😮 🙄

💡 ❓ am thinking the next step for me is to look at the stationary point and calculate how each meta iteration gets “closer” to it based on a vector distance measurement, and how this same metric is affected by the noise/ perturbations. there is a deeper theory involving the basins of attraction for the newton method convergence but while solid in some abstract/ theoretical sense its exact details are eluding me at the moment. part of the challenge is that while the “basins of attraction” for newton method is well established as a known concept, it does not appear (so far after some search) that investigators have looked into its precise technical details.

perturb.rb

(10/28) this is a quick/ minor riff that fixes the alignment/ mirror symmetry glitch and also adds another graph format where the horizontal plot is normalized by relative trajectory position, inserted as 3rd graph. this is a series of plots with a standout longer trajectory as seen in final plot at about ¾ density point (relative density determined by 4th graph) that shows a mean-returning aspect to the errors for multiple parameters esp evident in 2nd “right aligned” graph, because apparently that trajectory is also longest and is then isolated on the left. the alignments are then named “left, right, scaled, serial”.

perturbb.rb

(11/1) did some analysis of some prior code, looking at relative eigenvalues of the previously mentioned eigen-decomposition, and trying to learn the general theory better.[26] do not have a lot of experience/ knowledge in this area & am working to build up some experience/ intuition. it looks like in all my test data, the “highest” (technically, “dominant”) eigenvalue is a little less than -1 (negative). a lot of the theory talks about modulus of eigenvalues less than 1 for convergence, and my data doesnt fit that. did run across some theory on Leslie matrices.[25] these triangular matrices dont match mine either but the idea of population dynamics with a eigenvalue modulus larger than 1 matches. however also in that case, the dominant eigenvalue is positive.

trying to understand what is going on led to the concept of the power iteration to determine the dominant eigenvector.[24] the framing is different than my direction/ situation approaching it, but seems to be entirely applicable; this has a proof that a matrix difference equation/ recurrence relation converges to a multiple of the multiplication matrix dominant eigenvector if its corresponding eigenvalue modulus is greater than 1. this seems to be describing the dynamics of my systems and wrote a little code on top of prior code to validate that and it seems to work.

it calculates difference between current normalized “state vector” (vector divided by its magnitude) and the dominant eigenvector, and it indeed “converges” to zero. however, in a strange/ odd/ currently inexplicable sense: there seems to be a sort of alternating convergence pattern in the results, cant quite figure it out yet; it seems maybe to relate to consecutive integer powers of a negative number leading to a sequence of alternating signs. the error in each parameter (of the “state vector”) seems to alternate between some asymptotic constant value and closer to zero. maybe a numerical/ overflow issue? am delving into it further and plan to post some code/ results on that. it seems like a system with a negative dominant eigenvalue and larger-than-1 modulus may not be encountered much in the literature…?