# collatz ad infinitum

threw this code together and it was unusually/ surprisingly timeconsuming to debug despite having rather little code/ logic. the difficulty relates to the early-discovered property of how skewed (nonlinear) the distribution of trajectory lengths is over a random sample. nearly all are very short and it seems quite possible the distribution is in line with some power law (never tried to fit it, but do have the overwhelming urge to, and to just go hunting for as many power-law properties in the data as one can find; strongly suspect they might be quite widespread).

the idea is to try to bias (or unbias, depending on pov) the sample of randomly chosen trajectories (based on an initial density spread sample) so that they are nearly linearly distributed in length. was developing the test data to exercise the trajectory (meta function) calculation logic. to be more exact, was partly working off of the matrix6 code and wanting to improve the aesthetic graph results with “more linearity” in the sample. did not end up with meta function calculation logic tied to it worth saving, but given how tricky it was, do want to save this sampling code for future reference! it generates 10 samples of 500 points over uniform density, then finds the min and ¾ of the max of the trajectory lengths, and tries to sample linearly over that range (starting top down); the result is quite smooth as in the graph (of trajectory length).
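fwiw the resample idea can be sketched compactly; this is a toy reconstruction (the helper names, seed range, and greedy nearest-length selection are illustrative assumptions, not the actual matrix code):

```ruby
# toy sketch of the length-linearizing resample; names/ logic are
# illustrative guesses at the approach, not the original matrix code
def collatz_len(n)
  c = 0
  until n == 1
    n = n.even? ? n / 2 : 3 * n + 1
    c += 1
  end
  c
end

# raw sample over a spread of random seeds (stand-in for uniform density spread)
raw = Array.new(500) { rand(2..10_000) }.map { |n| [n, collatz_len(n)] }

lo = raw.map { |_, l| l }.min
hi = raw.map { |_, l| l }.max * 3 / 4   # chop off top ¼ of the length range

# walk target lengths linearly top-down, greedily taking the
# nearest-length remaining point for each target
pool = raw.dup
sample = 10.downto(1).map do |i|
  t = lo + (hi - lo) * i / 10.0
  best = pool.min_by { |_, l| (l - t).abs }
  pool.delete(best)
  best
end
```

the greedy nearest-target selection is about the simplest scheme that flattens the skew; the real code works over a uniform density spread of large iterates rather than small random ints.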

(12/3) this is some tricky code that is the first to implement the (continuous) matrix diffeq formulas and euler method and validate it all by comparing with the validated discrete matrix diffeq formula from prior experiments. as derived, the continuous soln eq reduces to the discrete eq for step size $\Delta t=1$ using euler method integration. this code computes the disc diff eqn as ‘l1’, the euler int method $\Delta t = 1$ (on the cont diff eq) as ‘l2’, the cont diff eqn $\Delta t \rightarrow 0$ as ‘l3’, and the euler int method $\Delta t=0.05$ as ‘l4’. graphs 1/2 are found to be coincident as derived earlier, and graphs 3/4 are coincident, all showing the different interrelationships/ equalities previously derived formally.
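the l1/ l2 coincidence has a one-line algebraic explanation: a euler step of size $\Delta t=1$ on the continuous eqn $\dot{x}=(A'-1)x$ is $x + (A'-1)x = A'x$, exactly the discrete recurrence. a minimal sketch with an arbitrary toy matrix (not the collatz-derived one):

```ruby
require 'matrix'

# arbitrary toy matrix standing in for the fit matrix; x_{n+1} = Ap * x_n
Ap = Matrix[[0.9, 0.2], [0.1, 0.8]]
x0 = Vector[1.0, 1.0]
i2 = Matrix.identity(2)

# l1: discrete difference eqn iterates
l1 = [x0]
5.times { l1 << Ap * l1.last }

# l2: euler steps of size dt = 1 on the continuous eqn dx/dt = (Ap - I) x
dt = 1.0
l2 = [x0]
5.times { l2 << l2.last + (Ap - i2) * l2.last * dt }

# the two trajectories coincide term by term (the dt = 1 step cancels exactly)
```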

however graphs 3/4 represent the trajectory evolution using the homogeneous formula and am not exactly sure how it relates to the (inherent) nonhomogeneous formula right now; it was more to validate the euler method/ cont diff eqn solver using arbitrary parameters not nec exactly related to the collatz parameters/ dynamics derived. bottom line, a significant tactical victory after some serious brain stretching/ crunching and somehow feel this brings me very close )( to my key problem but just cant quite line it all up yet still. 😐

matrix9b.rb

(12/7) ❓ in the prior experiment its surprising that just changing the step size from $\Delta t=1$ to $\Delta t \rightarrow 0$ alters the results so dramatically from divergent/ oscillating to convergent, and still cant exactly further derive/ picture what the convergent form represents and its connection to my problem. am not really sure how to intuitively explain this right now. there is a significant transition point at $\Delta t \approx 0.9$. chalk up another unexpected aspect of nonlinear dynamics. possibly a basic demonstration of the concept of numerical (in)stability.

was surfing wikipedia some and came across the articles on numerical methods for ODEs and discretization. these are significantly related to some of the prior ideas. the 1st article at least has an actual derivation of Eulers method whereas the Euler method page does not, doncha just luv wikipedia! the 2nd article has an entire complex derivation that is related to discretizing the same matrix differential eqn via formulas for the matrix exponential (scenarios from control theory), and discusses precision/ accuracy/ exactness considerations. it has many other functions involved but the core relationship is unmistakable.

💡 ❓ am now thinking that some of my ideas relate to the opposite of discretizing, what might be called “continuizing”. am attempting to find a continuous function, solving for which gives the newton method “discretization-like” steps. am having some )( doubts about the newton method analogy because, re prior experiments showing more of a “blowup” dynamics, am not really sure if my equation is “converging” to anything in a standard sense. although the prior power law code shows there can be a sense of “convergence” (toward the dominant eigenvector) even along with a blowup dynamic.

💡 ❓ some other currently hazy/ rough ideas: the formulas to calculate the recurrence relations look a lot like exponential averaging, and subtracting the fixed point “offset” seems a lot like mean correction… has anyone pointed this out, developed it further?

(12/10) 🙄 😳 😐 👿 have been banging my head kind of hard on this tough problem for several weeks now, finally feel maybe gaining some headway. this answer by Gribouillis looks so simple but contains some hidden complexity. it took me quite awhile even to parse his few lines, think finally have a handle on the mere syntax. (some time to realize 2nd formula line uses the chain rule for derivatives!) part of my key problem is apparently trying to generalize the function $f(x)=x^a$ to vector form. what does it even mean in vector form? the Gribouillis answer might have some major hint. the last line is $f(e^{tA}x_0)=e^{t}f(x_0)$. the tricky part is that for non-single-variate case, $A$ and $e^{tA}$ are matrices. its hard to juggle the columns vs matrices in this problem (and the formalism doesnt entirely help there). merely syntactically it looks like in some sense the function needs to recover the exponent $t$ from the matrix expr involving $e^{tA}$. it looks like some kind of inverse operation… oh, ie, matrix logarithm but in some vector-like form instead of matrix form!

💡 after a lot of thought, finally came up with this. the matrix exponential $e^A$ can be written in the form using $U D U^{-1}=A$ for diagonalizable $A$ where $D$ is the eigenvalue diagonal matrix and $U$ the eigenvectors. further then it looks like if column vector $x=e^{tA}x_0$ then $x=U e^{Dt} U^{-1} x_0$ where $x_0$ is also a column vector. but through algebraic manipulation and due to properties of diagonal matrices allowing commutative rearrangement,* let (diagonal) matrix $x'=(U U^{-1} x_0)^T e^{Dt}$ and then $x = \mathrm{diag}(x')$. let (column vector) $c=U U^{-1} x_0$ and $\hat{c}$ be its “hadamard inverse” ie each element $\hat{c}_i=1/c_i$. then apparently $\hat{c}^T x'=e^{Dt}$, and then $\mathrm{ln}(\hat{c}^T x')=Dt$, or further/ finally, $\mathrm{diag}(\mathrm{ln}(\hat{c}^T x'))^T D^{-1} = \mathbf{t}$ where $\mathbf{t}$ is a column vector of $t$ values. whew! wonder if anyone has seen anything like that before? despite all the “nearby” theory it seems a bit exotic and maybe unlikely to have been encountered before…? also am thinking there might be a way to recast that all to powers of diagonal matrices without the hadamard inverse… ❓

(12/11) 😳 😡 ❓ * argh, that took so long to write out, but cant be right! $AB=BA$ is true if both $A,B$ are diagonal but not (nec?) if only one is. also $U U^{-1}=\mathbf{1}$ making it all apparently nonsensical.

💡 😀 aha! but maybe that stuff with the hadamard operations has more to it, just came up with this, “snatching victory from jaws of defeat”! using same definition for $x$, premultiply by $U^{-1}$ to get $U^{-1} x=e^{Dt} U^{-1} x_0$. let $y=U^{-1} x_0$ and $z=U^{-1} x \circ \hat{y}$ where “$\circ$” is the hadamard product. then $z_i=e^{\lambda_i t}$ and therefore $t=\frac{\mathrm{ln}(z_i)}{\lambda_i}$ where the $\lambda_i$ are the eigenvalues in $D$, voila! ❗ ⭐

(12/12) 😀 ❓ just tried it in code and it works! ie prior code was modified to calculate the $\mathbf{t}$ vector from the prior formula for the $\Delta t \rightarrow 0$ cont diffeq case and it recovers the linear steps with high accuracy. so this seems to be a big advance toward defining/ constructing $f(x)$. however, the hadamard operations are quite awkward and wondering if it can be recast in std matrix algebra operations… the problem of determining/ deriving the jacobian of this complicated operation also remains (these two goals are interrelated)…
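that check is compact enough to restate as a sketch; a symmetric toy matrix is used here (so the eigensystem stays real) instead of the collatz-fit data:

```ruby
require 'matrix'

# symmetric toy matrix => real eigensystem; not the collatz-fit matrix
a = Matrix[[2.0, 1.0], [1.0, 2.0]]
u, d, u_inv = a.eigensystem          # a == u * d * u_inv
x0 = Vector[1.0, 2.0]
t  = 0.7

# x = e^{tA} x0 computed via U e^{tD} U^{-1}, with e^{tD} diagonal
e_td = Matrix.diagonal(*(0...2).map { |i| Math.exp(t * d[i, i]) })
x = u * e_td * u_inv * x0

# hadamard recovery: z_i = (U^{-1}x)_i / (U^{-1}x0)_i = e^{λ_i t}
y0 = u_inv * x0
y  = u_inv * x
ts = (0...2).map { |i| Math.log(y[i] / y0[i]) / d[i, i] }
# every component ts[i] should recover the same t
```

note the componentwise ratio is $e^{\lambda_i t}$ regardless of the signs of the individual $y$ components, which is why the log stays real in this clean case.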

lately am pondering a meaningful/ possible key/ previously overlooked expr/ eqn mentioned in Schneider + Barker and also the wikipedia matrix exponential pg but maybe only alluded to in the other recent related pgs/ formulas. is it some kind of answer for me? it defines the matrix exponential derivative in matrix form. all this notation is very confusing because it seems like sometimes identical notation is used to define the column vector derivative… is this (also) the jacobian? still that feeling of hazy confusion… been dazed and confused for so long its not true…[x] 😳 😦 ❓

$\frac{d}{dt}e^{tX} = Xe^{tX}$

(12/13) 💡 😀 ⭐ ❗ after some more wild/ feverish scratchings last nite, just came up with this neat/ remarkable derivation! this is equivalent to prior function with no hadamard operations as desired; the 4th line converts column vectors to diagonal matrices (as this shows, turns out hadamard vector operations on vectors such as product/ inverse are quite similar to operations on diagonal matrices). took awhile to fmt this within rather substantial wordpress math limitations, which allow a lot less than “typical” latex with std packages, found helpful pg on that, multiline aligned eqns in latex without amsmath align.[x]

$\begin{array}{rcl} x & = & e^{tA} x_0 \\ U^{-1}x & = & U^{-1} U e^{tD} U^{-1} x_0 \\ U^{-1}x & = & e^{tD} U^{-1} x_0 \\ \mathrm{diag}(U^{-1} x) & = & \mathrm{diag}(e^{tD} U^{-1} x_0) \\ \mathrm{diag}(U^{-1} x) & = & e^{tD} \mathrm{diag} (U^{-1} x_0) \\ \frac{\mathrm{diag}(U^{-1} x)}{\mathrm{diag}(U^{-1} x_0)} & = & e^{tD} \\ \mathrm{ln}\!\left(\frac{\mathrm{diag}(U^{-1} x)}{\mathrm{diag}(U^{-1} x_0)}\right)\!D^{-1} & = & \mathrm{diag}(\mathbf{t})\\ \left(\frac{\mathrm{diag}(U^{-1} x)}{\mathrm{diag}(U^{-1} x_0)}\right)^{D^{-1}} x_0 & = & e^t x_0\\ f(x) & = & \left(\frac{\mathrm{diag}(U^{-1} x)}{\mathrm{diag}(U^{-1} x_0)}\right)^{D^{-1}} x_0\\ \end{array}$

(12/17) 💡 ❓ am thinking very hard about these formulas over days/ weeks and still dont quite have the key answer yet. so far, while there are strong similarities and have built up some major/ valuable insights/ advances/ intuitions, it just doesnt look like my problem formulation fits into the “linear approximation stability” situation even looking at integrated functions, and dont see a straightfwd way to reformulate it in this form yet… it appears the core nature of my formulation is a diverging eigenvalue that eventually drives the trajectory downward in greater oscillations and havent yet found any theory that translates directly to that situation. however/ nevertheless there is an apparently still reliable/ consistent principle at stake (ie, genuine proof potential).

on the other hand this is a fairly quick riff, a simple idea, one of those “surprised this didnt occur to me before” ideas/ experiments. a lot of prior code looks at correlation of meta function glide length vs two variables in particular, ‘wr_e’ and initial density. initial density is fairly significantly correlated with glide length, but there is a much more direct experiment that makes a lot of sense to try out.

this code uses the nice sampling logic constructed earlier/ recently to get a linear sample of glides by glide length (instead of total trajectory length, with a slight tweak of chopping off the top ½ of the distribution instead of the top ¼, and also bottom-up resampling instead of top-down, although not sure that makes a lot of difference). then it applies the meta function logic to estimate the glide length using only initial parameters. there is a fairly strong, unmistakable correlation although theres some noise. here are two different cases, starting with the weaker correlation followed by the stronger one. the red line ‘c2’ is glide length/ left scale and the green line ‘i’ is the meta function estimate/ calculation/ right scale. in the 1st there is better correlation early on and then the correlation drops off as the estimate plateaus. the 2nd even estimates the longer glides with fairly good accuracy. this is impressive and maybe the most accurate predictor of glide length formulated so far, although it cant be expressed in a simple analytic formula. some encouragement about being “definitely on the right track” although the direct path/ endgame is still not yet clear.

matrix11.rb

(12/18) 💡 ❗ just had a huge brainstorm today, something thats been percolating in my neurons maybe for weeks. it comes from contemplating the following idea. at each new point or iterate in a trajectory, there is a new recurrence relation implied by the starting parameter. in other words, in a sense there is not one meta function but as many as there are iterates in the glide, and maybe that is part of my cognitive difficulty trying to adapt existing theory, which works for a single/ unique underlying (optimal) meta function. then the idea, what would it look like to superimpose/ average all these curves? it seems it would tend to converge toward the optimal underlying meta function! one would expect the matrix derived by the earlier optimization code would be close to this optimal underlying meta function, but ofc not exactly the same.

now the idea of an “underlying meta function” not exactly the same as the one derived from prior optimization code leads to some new potential. my immediate idea is to look at the integrated parameter curves for real trajectories which are, based on this model, sums of exponentials. then, one could try to derive the underlying meta function in another way, using curve fitting! there is probably sophisticated code that could simultaneously fit all the curves and find the matrix parameters (via gradient descent), but for very nice simplification, its possible to fit the curves one-at-a-time. esp of interest is how the residuals compare to this optimal underlying meta function fit. admittedly wishful thinking here, but not entirely implausible wrt prior analysis/ findings—it would be a veritable breakthru if the residuals tend to decrease as the prior linear approximation stability (control) theory requires for its theorem applicability conditions.
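the one-at-a-time idea is simple to sketch in the single-exponential case, where the fit even reduces to ordinary linear regression on the log (synthetic data here, as a stand-in for gnuplot fit and the real integrated curves):

```ruby
# fit y = c * e^(lam * t) by linear regression on ln(y); synthetic data
# with known c = 2.0, lam = 0.15, so the fit should recover them exactly
ts = (0..20).to_a.map(&:to_f)
ys = ts.map { |t| 2.0 * Math.exp(0.15 * t) }

n  = ts.size
ls = ys.map { |y| Math.log(y) }       # log-transform: ln y = ln c + lam * t
mt = ts.sum / n                       # mean of t
ml = ls.sum / n                       # mean of ln y

# least-squares slope and intercept of ln y vs t
lam = ts.zip(ls).sum { |t, l| (t - mt) * (l - ml) } / ts.sum { |t| (t - mt)**2 }
c   = Math.exp(ml - lam * mt)
# recovers lam ≈ 0.15 and c ≈ 2.0
```

ofc the real integrated curves are sums of exponentials, where the log-linear trick no longer applies directly and an iterative optimizer (gnuplot fit, gradient descent) is needed.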

a quick google search leads immediately to some others documenting/ working on/ researching exactly the same problem, always a great sign! [x][y][z] 🙂

also, gnuplot has an apparently very good built-in nonlinear function fitting optimizer (seem to recall it working wonders once…) cant remember, was it used in any of my old blogs? they have piled up into a few dozen now & its hard to keep track now… 😐

(12/19) this code occurred to me to try out; it is somewhat related to the prior ideas and easier/ nearer to carry out wrt prior code. it is similar to the earlier experiment that compares the meta function to the actual function and graphed differences. this one does nearly the same for the integrated functions, namely calculating differences between the integrated parameters for the real function and the integrated meta function calculated at each iterate. the 1st graph is the two functions superimposed, and the 2nd is the cumulative difference. then there are 2 more runs of the 2nd graph just for comparison of variability.

a few comments can be made. the 1st graph functions are quite smooth and nearly linear looking. it looks like the exponential function sums are very “early”/ gradual phase or tend to “cancel each other out” in some sense (eg incl individual oscillations). the meta fn integrations closely match the real fn integrations in 1st graph/ absolute scale. the later graphs show higher noise in the ‘dl’ and ‘dh’ (green, red) parameters which tend to be anticorrelated some. the other trends are smoother. also quite notable is that ‘sd0’ and ‘a0’ (black, yellow) consistently move away and then finally return to low error. alas, something like the hoped-for advancing error decline did not materialize. yet also quite notably the error does not go up very much, possibly linear/ less than exponential in most cases.

overall there seems to be some major technique or leverage lurking here, esp in that the typically very noisy parameters are smoothed out so thoroughly. note the glides end at points 232, 195, 222 resp for the 3 runs. overall the general global error accumulation/ bias ensuing from the earlier visualized local error is easy to understand from these trends. also the error tends to be over- or underestimated and not alternate much between the two.

matrix12.rb

(12/21) 💡 ❗ ❓ again thinking hard on the “vector power” function and its generalizations lately, maybe some new insights. the 1×1 single variate version with formal algebraic manipulation is simply $f(x)=\left(\frac{x}{x_0}\right)^{\frac{1}{A}} x_0$. now from the prior derivation, getting the Newton method recurrence relation $x_{n+1}=A' x_n$ leads to simply $A=(1-A')$, which matches the earlier result $b=\frac{1}{1-a}$ where $f(x)=x^b$ yields $x_{n+1}=a x_n$, and there $f(x)$ is a concave upward “bowl”. earlier it was asserted $|a|\,\textless\,1$ but am now thinking that is halfway incorrect and needs revision. for my matrix difference eqns it seems in some sense that $|A'|\,\textgreater\,1$ ie the vector magnitude grows with each iteration. how does that translate to the 1d version? clearly if $a\,\textgreater\,1$ then that 1d iteration gets larger. in that case $b\,\textless\,0$ ie is negative! what is that function? its a hyperbola that gets closer to the origin (ie zero!) as $x \rightarrow \infty$. the multidimensional version seems analogous. but what does a “multidimensional hyperbola” look like? it appears its exactly the “converging” matrix exponential corresponding to the matrix diffeq that is known to converge for all negative eigenvalues! so here is the insight: even as the recurrence relation is increasing/ divergent-looking, the integrated function is decreasing/ converging.
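the $a\,\textgreater\,1$/ hyperbola case is easy to spot-check numerically: with $b=\frac{1}{1-a}$, the newton step on $f(x)=x^b$ simplifies to $x-\frac{x}{b}=x(1-\frac{1}{b})=ax$ exactly. a tiny sketch:

```ruby
# sanity check: newton's method on f(x) = x^b with b = 1/(1-a)
# steps as x -> a*x; here a = 2 > 1, so b = -1 and f is the hyperbola 1/x
a = 2.0
b = 1.0 / (1.0 - a)
f  = ->(x) { x**b }
fp = ->(x) { b * x**(b - 1) }          # derivative f'(x) = b x^{b-1}

x = 1.5
3.times { x -= f.call(x) / fp.call(x) }  # each newton step multiplies x by a
# x is now 1.5 * 2^3 = 12.0
```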

now this leads to contemplating the signs of the eigenvalues of $1-A'$. also in my multidimensional derivation there is a matrix logarithm step. looking carefully over the matrix log wikipedia pg it points out that the matrix log is not unique, partly due to some complicated rotation/ invariance aspects. cannot quite follow that yet. but again reasoning by analogy, consider the simple 1d eqn $y=x^\frac{1}{2}$. then $\mathrm{ln}(y)=\frac{1}{2}\mathrm{ln}(x)$. but taking only positive logarithms, as the function is typically defined in nearly all undergrad math, that leads only to a positive root. but as we all know $y=x^\frac{1}{2}$ means $y=\pm\sqrt{x}$ ie two roots!

this shows up in matrix powering eg with square root where the diagonal eigenvalue matrix is $D^\frac{1}{2}$. in fact the wikipedia matrix sqrt pg asserts there are as many roots as $2^n$ where $n$ is the number of eigenvalues/ matrix dimension, ie all possible $\pm$ signs of all eigenvalues! this leads me to contemplate the (advanced) complex logarithm fn (simple intro/ property here) defined for negative values and wondering how it relates to the matrix logarithm fn which is unclear right now… part of my difficulty is that the matrix eigenvalues are fractional and how do signs really work in general for fractional roots anyway? this is the kind of stuff that gets glossed over in undergraduate math but is turning out to be pivotal/ crucial for soln to this problem. 😐

(12/22) 💡 😮 🙄 😀 ❗ this code turned out to be quite a saga. was staring at the 12/13 formula. so much thinking/ time/ labor went into it from many aspects, it deserves a name. say, the integrated (column) vector root/ multid hyperbola function (“cvrmdh”). wondering, has this exotic beast ever been studied before in the vast literature? the derivation above was adjusted to write out the final cvrmdh formula. but after a lot of staring at it, and some of the prior concerns about nonunique/ multivalued roots/ signs, was wondering if it was really correct, even after already testing it some numerically (did not publish that code yet). so wrote a little test code to create a random vector ‘x’ and a random ‘t’ value, using A matrices from the test data. the code computes the matrix exponential $e^{At} x$ and then the cvrmdh function $f(e^{At} x) = e^t x$ from that, compares it to a straight calculation of the latter $e^t x$, and calculates the error between the two, finding it small.
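that round-trip test can be reconstructed in outline like so; this sketch uses a benign symmetric matrix where all the component ratios stay positive, so only real logs/ roots arise:

```ruby
require 'matrix'

# outline of the round-trip test: f(e^{tA} x0) should equal e^t x0.
# benign symmetric toy matrix keeps the eigensystem and all roots real
a_m = Matrix[[2.0, 1.0], [1.0, 2.0]]
u, d, u_inv = a_m.eigensystem            # a_m == u * d * u_inv
x0 = Vector[3.0, 1.0]
t  = 0.4

e_td = Matrix.diagonal(*(0...2).map { |i| Math.exp(t * d[i, i]) })
x = u * e_td * u_inv * x0                # x = e^{tA} x0

y0 = u_inv * x0
y  = u_inv * x
# cvrmdh: (diag(U^{-1}x)/diag(U^{-1}x0))^{D^{-1}} x0; all diagonal, so
# componentwise ratio_i^{1/λ_i} = e^t for every i
pow = Matrix.diagonal(*(0...2).map { |i| (y[i] / y0[i])**(1.0 / d[i, i]) })
f_x = pow * x0

err = (f_x - x0 * Math.exp(t)).norm      # should be tiny
```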

but the bigger story is trying to understand the (correct!) cvrmdh function. the calculation results show that one of the parameters, ‘a1’, always diverged while all the others converged when applying the cvrmdh to actual vectors from the iterative matrix multiplication formula. huh? then pondering over the prior questions of the sign of the eigenvalues as it relates to the multivalued log function, on a whim decided to negate the corresponding (also dominant) eigenvalue. it worked! all the parameters converged in that case. this is a veritable breakthru, a several-months long side goal finally achieved! then also tried out negating other eigenvalues, and in those cases the cvrmdh consistently did not converge.

so honestly this seems like near magic at the moment but probably there is some solid math to back it up,* just dont have a clear idea of what it is right now! it does not help that the wikipedia page on matrix logarithm seems quite muddled on the theory of its nonuniqueness, eg wrt eigenvalue signs, noting the property but not really elaborating on it in a systematic way (maybe some intermittent/ sporadic mysterious elaboration though!). there is another open question because apparently in all cases of the data (27) the dominant eigenvector of $1-A$ matches the dominant eigenvector of $A$, but can this be proven? also there was one case (of the 27) where it wasnt the ‘a1’ parameter but instead the ‘a0’ one.

there was some more saga in trying to debug this code. it failed in a few cases. these turned out to be where the samples led to eigenvalues with imaginary parts. the code failed at the Math.exp(x) where the std ruby exponentiation function gave an error that it couldnt convert ‘x’ to a float. did a little research and then just include cmath and the same Math.exp(x) function worked, along with everything else! dont you just luv it when stuff just works? 😀
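fwiw a dependency-free fallback (in case cmath is ever not at hand) is the euler identity itself; a generic sketch, not the actual fix used:

```ruby
# manual complex exponential via e^{a+bi} = e^a (cos b + i sin b);
# plain Math.exp rejects Complex arguments
def cexp(z)
  z = Complex(z)
  Math.exp(z.real) * Complex(Math.cos(z.imag), Math.sin(z.imag))
end
```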

there was another small wrinkle to note. there is a line to compute d ** t for the iterative matrix multiplication formula. that failed for a matrix when t=0, and dont quite understand why, Ruby said “Not Regular Matrix” and was apparently trying to compute an inverse of the ‘d’ matrix. just changing the code to use the identity matrix for d ** 0 instead worked for that case and the rest of the computation!
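a defensive guard, sketched generically here (not the exact original line), avoids hitting the $t=0$ edge case at all:

```ruby
require 'matrix'

# special-case t = 0 so ruby never tries to invert/ power the matrix there
def mat_pow(d, t)
  t.zero? ? Matrix.identity(d.row_count) : d**t
end
```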

the code outputs the eigenvalues for A’, A as ‘ei1, ei2’, the iterate details, dominant eigenvalue #s, etc. the runs are 2 graphs each for samples #7, #12, #13. #7 is notable because it has the 2nd ‘a0’ dominant eigenvalue (green) instead of the typical 1st ‘a1’ (red). #12 is notable for the strange instability at the end; this seems to be a numerical glitch related to the particular matrix and/ or its decomposition (it seems to repeat for different seeds, and was not encountered in any other sample of the 27). #13 is notable for a few complex eigenvalues (instead of none in other samples) and the very long ‘a0’ convergence.

matrix10.rb

["#7", {"c"=>937, "c2"=>363, "n"=>1267650600228229401496703205375, "j"=>100}]
["ei1", {"a1"=>0.9326431241434006, "a0"=>-1.0407631302949885, "dh"=>0.8036152708451543, "dl"=>-0.579292281815563, "sd0"=>-0.5013478498487111, "sd1"=>0.36545492077438674, "mx1"=>0.019689946196317296, "wr"=>1.1657724003579002e-17}]
["ei2", {"a1"=>0.06735687585659778, "a0"=>2.040763130294987, "dh"=>0.1963847291548463, "dl"=>1.5792922818155637, "sd0"=>1.501347849848715, "sd1"=>0.6345450792256139, "mx1"=>0.9803100538036826, "wr"=>1.0000000000000007}]
["dom12", 1, 1]
["e12", 1.9356233870942265e-08]

["#12", {"c"=>937, "c2"=>363, "n"=>1267650600228229401496703205375, "j"=>100}]
["ei1", {"a1"=>-1.361610005203782, "a0"=>0.923265874338447, "dh"=>0.7991545229477245, "dl"=>-0.6799908394658594, "sd0"=>0.43380143765931195, "sd1"=>-0.07843774204447226, "mx1"=>-0.03618324823137143, "wr"=>-2.656317789550015e-18}]
["ei2", {"a1"=>2.3616100052037785, "a0"=>0.0767341256615517, "dh"=>0.2008454770522753, "dl"=>1.67999083946586, "sd0"=>0.5661985623406883, "sd1"=>1.0784377420444715, "mx1"=>1.036183248231372, "wr"=>1.0000000000000007}]
["dom12", 0, 0]
["e12", 8.573274155486491e-08]

["#13", {"c"=>937, "c2"=>363, "n"=>1267650600228229401496703205375, "j"=>100}]
["ei1", {"a1"=>-1.0455826078376702, "a0"=>0.9343458893855396, "dh"=>0.8518810621247627, "dl"=>-0.6910311038859487, "sd0"=>(-0.014140018082487079+0.06441862544022388i), "sd1"=>(-0.014140018082487079-0.06441862544022388i), "mx1"=>3.469446951953614e-18, "wr"=>-0.02133320362171033}]
["ei2", {"a1"=>2.0455826078376687, "a0"=>0.0656541106144593, "dh"=>0.14811893787523758, "dl"=>1.6910311038859485, "sd0"=>(1.0141400180824873+0.06441862544022309i), "sd1"=>(1.0141400180824873-0.06441862544022309i), "mx1"=>1.0000000000000002, "wr"=>1.0213332036217102}]
["dom12", 0, 0]
["e12", 7.904159656753689e-16]

(12/23) 😦 😳 😡 👿 * @#%& argh! full disclosure! 2nd thoughts. some of the euphoria has worn off in the new cold light of day. too much wishful thinking, Too Good To Be True™. embodiment of “the devil is in the details”. the converging graphs look quite plausible and like “expected behavior” but must now be regarded as merely aspirational/ a “near miss”. suspect the code is quite “close” to correct calculations but the prior code is buggy for multiple reasons. did you spot em? 😛

• some confusion over the matrix logarithm. its somewhat implicit, but the code is calculating $\mathrm{diag}(x)^{\frac{1}{D}}$ and the root signs are associated with $x$, not the eigenvalues $D$! ie negating the eigenvalue, while giving desirable/ plausible-looking results, seems to make no sense!
• looking closely into the code results, the components of the (diagonal) vector quantity $\frac{\mathrm{diag}(U^{-1} x)}{\mathrm{diag}(U^{-1} x_0)}$ are sometimes negative, and ruby is not choking on raising them to fractional (positive) powers via the exponentiation operator “**”. its apparently computing a single complex root in those cases. also, it is outputting complex quantities into the data file (real + imaginary parts), but they are output with no space between the real and imaginary parts, just a +/- sign, and gnuplot apparently just skipped over the imaginary parts of the quantities in the graphs!
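one generic way to keep the data file gnuplot-safe (a sketch, not the original output code) is to coerce every value through Complex and emit only the real part, flagging any dropped imaginary residue:

```ruby
# emit only the real part to the data file; warn if the imaginary
# residue is non-negligible rather than letting "a+bi" leak through
def plot_val(v)
  c = Complex(v)
  warn "dropping imaginary part #{c.imaginary}" if c.imaginary.abs > 1e-9
  c.real
end
```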

havent studied this stuff in ages but some of these considerations about complex roots are starting to remind me of de moivres thm.[x] but wikipedia states it fails for non-integer powers; however a generalization exists. in that section it mentions “failure of power and logarithm identities”.[y] the de moivre pg also mentions roots of unity.[z] this seems to be turning into even more of a rabbit hole…

(12/26) 😦 💡 a lot of brain crunching and am feeling a bit frustrated/ stymied right now. going back over eqns and code, and my eyes/ neurons are glazing over some. however, had a small )( idea. adjusted the code to allow an arbitrary matrix/ size. then tried out the code for 1×1 dimensions; in that case the eigendecomposition is $U = U^{-1} = 1$ and $D=1 - A$. it generally worked as expected, and now feel some confidence that the analysis/ code is generally correct. however, just realized that starting from a negative “initial vector” (for the 1×1 case) may be messing up the code/ convergence, presumably due to the imaginary root issue. so thats a simple way to conceptualize/ attack what could be the basic problem… this is giving me some flashbacks to an old numerical methods class in college where the teacher insisted that newtons method failed for finding imaginary roots and that no adjustment/ generalization was known.

(12/27) 😮 ❓ the “failure of power and log identities” section also leads to the concept of the principal branch. looking for generalizations of newtons method leads to this remarkable paper “generalizations of newtons method” by gilbert, talking about defining it on a riemann sphere. the rabbit hole goes even deeper! my problem is again leading to a sort of “6 degrees of separation” with very advanced areas of math! the wikipedia pg on complex log also mentions riemann surfaces.

(12/30) 💡 a few new thoughts. “the plot thickens…” maybe am overthinking this, getting distracted, going off on a tangent, losing sight of the main goal! after a feverish burst of coding, not all of it carefully considered, a few more moments of retrenchment/ contemplation.

• it seems part of the problem may be focusing on the simpler homogeneous eqn instead of the nonhomogeneous eqn. a key insight is that the actual datapoints fit are always positive by design/ definition. so the negative parameters encountered are in a sense artificial or an “artifact”. it appears the homogeneous eqn may encounter negative parameters faster than the nonhomogeneous one, which effectively offsets parameters away from the origin.
• my other immediate insight is that maybe am focusing too much on the theoretical convergence of the method/ eqns after infinite steps, because “when you have a hammer, everything looks like a nail”. it seems all the theory about convergence of similar eqns is phrased in this scenario of convergence as time approaches infinity. is this too much theoretical bias in the face of my practical application, which is quite finite, and more about determining the “end” of a trajectory that occurs when any parameter, in particular ‘wr’, is negative? (recall the earlier empirical finding that $\Pi wr_i\,\textless\,1$ and $wr_i\,\textless\,0$ tend to nearly coincide with the former slightly before the latter.)
• along these lines, reviewing prior results under matrix3 output/ graphs, it looks like some parameters turn negative for the matrix multiplication recurrence eqn very quickly even for “large” starting iterates of ~100 bits, ie in even less than ½ dozen iterations, even for the “correct” or “more accurate” nonhomogeneous eqn version.

fyi this is an extremely nice/ sophisticated site that deserved earlier mention. its a javascript graphing calculator on a site called “desmos”, and a page on newtons method. its been extremely useful in developing some intuition on the 1d case, was very impressed with it! would like to write up some more case studies at some point, although doing this thoroughly can be timeconsuming.

(1/19/2018) 💡 ❗ ❓ took some time off to bang on other pursuits, some in this blog. after a rather large outburst, dont have immediate code ideas. dont have a major update but have some minor ideas (hence not starting a new blog post). thinking hard about the overall newtons method idea again. have been thinking mostly about positive constant recurrence relations $a\,\textgreater\,0, x_{n+1}=a x_n$ and never really looked into the negative case very seriously, but maybe it relates to the negative max eigenvalue situation for the multivariate case. now thinking there is quite a bit of subtlety there even just looking at the single variate case.

for $a\,\textless\,-1$ each subsequent iterate changes sign and gets larger in magnitude. does it “converge” to anything in the earlier senses? it appears not. consider eg $a=-2$. then

$f(x)=x^b = x^\frac{1}{1-a} = x^\frac{1}{1-(-2)} = x^\frac{1}{3}$.

now that formula is concave downward and newtons method diverges on it as in this desmos graph, and theres no sense that it approaches a zero point as earlier with the hyperbola example. but heres a remarkable twist. notice that if $y = x^\frac{1}{3}$ then $x=y^3$ and the formula for $x$ can be solved with newtons method. this leads to a concave upward curve and newtons method converges to the zero (as in 2nd desmos graph). in other words newtons method can be applied along the x axis in 1st case or along y axis in 2nd case, its the same curve and the 1st diverges but the 2nd converges. in fact it is tempting to suggest the 1st algorithm “iterated in reverse” actually does converge to the zero point! maybe some generalization here?
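both branches of that picture are easy to run numerically; toy lambdas here, matching the desmos graphs only in spirit:

```ruby
# newton steps along x on f(x) = x^{1/3}: x -> x - f/f' = -2x, diverging;
# along y on the inverse curve g(y) = y^3: y -> (2/3) y, converging to 0
newton = ->(f, fp, x, n) { n.times { x -= f.call(x) / fp.call(x) }; x }

cbrt = ->(x) { x < 0 ? -((-x)**(1.0 / 3)) : x**(1.0 / 3) }  # real cube root
f  = ->(x) { cbrt.call(x) }
fp = ->(x) { cbrt.call(x) / (3.0 * x) }   # f'(x) = (1/3) x^{-2/3}

x_div  = newton.call(f, fp, 1.0, 4)       # magnitude grows as 2^n
y_conv = newton.call(->(y) { y**3 }, ->(y) { 3.0 * y**2 }, 1.0, 20)
```

starting from 1, four diverging steps reach magnitude $2^4=16$ along x, while twenty converging steps along y shrink toward the zero as $(2/3)^{20}$.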

hmmm! have not yet found any refs that talk about these kinds of subtleties with newtons method such as concavity affecting convergence. there seems like some general theory to be explored here ie which axis along which to find the solution and the relationship to divergence/ “iterations in reverse” wrt other axes, and am now wondering if the matrix multiplication may be simultaneously diverging and converging in different axes? is there also some difference going on between a gradient descent and ascent? its a ~3½ century old algorithm that deserves even more cutting edge modern analysis/ exploration! (for example as alluded earlier its an apparently quite nontrivial problem just to generalize it to work with functions with imaginary components.)

alas, caveat, heavily fear linkrot on those desmos links. and another idea, wouldnt it be cool if they could be embedded in web pages like youtube videos!

💡 another wild idea occurred to me today. the matrix multiplication formula seems to oscillate between even and odd iterates. what if one simply computes a 2-iteration $A A = A^2$ and works with that instead? will this just square the eigenvalues in the new squared matrix, or does it have no effect, what is that relationship? a better mathematician could likely answer easily! suspect it may help the stability/ convergence considerations significantly! again havent seen that (apparently great/ promising) idea elsewhere… 😮
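the eigenvalue question at least has a quick numeric probe (symmetric toy matrix so the eigenvalues stay real); in this case the squared matrix has exactly the squared eigenvalues, consistent with $Av=\lambda v \Rightarrow A^2v=\lambda^2 v$:

```ruby
require 'matrix'

# if A v = λ v then A(Av) = λ A v = λ² v, so eigenvalues of A*A are λ²
a = Matrix[[0.5, 1.0], [1.0, -0.25]]       # symmetric => real eigenvalues
ev  = a.eigensystem.eigenvalues
ev2 = (a * a).eigensystem.eigenvalues

sq = ev.map { |l| l * l }.sort
# sq should match ev2 (sorted) componentwise
```

notably the squared eigenvalues are all nonnegative, which is the possible stability/ convergence payoff hinted at above.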