GA tuning & a very good-looking solution verging on breakthrough!?

โ— ๐Ÿ’ก โ“ now the big fun begins, tuning the algorithm & looking for a real solution. did some further experimenting and came up with some new metrics. the prior metrics succeed in improving x_a and x_b but on further runs & thoughts (as is commonly the case with GAs), maybe these metrics that dont take into account slope of the sampled function are not sufficient to find the desired convergence behavior.

in fact, to have a constant-bounded max nonmonotonic run length x_b, the “sequential decrease/ nonreversal density” x_a needs to increase gradually as the size of the starting seed increases (because average trajectory lengths increase, as graphed previously), so merely maximizing x_a is not enough. another basic metric is “average distance between local minima”, and that needs to be constant; but again, minimizing it over a small local region does not necessarily ensure the solutions will have a constant trend, as some experiments showed. yet another simple (new) metric/ constraint worth looking at is “total count of local minima”, and that needs to increase at least as fast as the trajectory lengths to get the desired constant-bounded distance between local minima. (a sketch of these metrics follows.)
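
to make these metrics concrete, here is a minimal sketch (hypothetical helper, mirroring the inner loop of the analysis code below) that computes them from a 1d list of walk values:

def walk_metrics(l)
        c = 0           # count of sequential decreases
        m = l[0]        # running record minimum
        x = j = 0       # max nonmonotonic run length; index of last new minimum
        gaps = []       # distances between successive record ('local') minima
        (1...l.size).each \
        {
                |i|
                c += 1 if (l[i] < l[i - 1])
                if (l[i] < m) then
                        gaps << i - j
                        m, j = [l[i], i]
                end
                x = [x, i - j].max
        }
        xa = c.to_f / (l.size - 1)      # sequential decrease/ nonreversal density x_a
        ag = gaps.empty? ? 0 : gaps.inject(0.0) { |t, g| t + g } / gaps.size
        return xa, x, gaps.size, ag     # x_a, x_b, minima count, avg minima distance
end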

so then a natural idea is to look at the trends (slopes) of these metric lines and build a corresponding fitness function. how can this be measured/ quantified? the standard/ straightforward method is least squares linear regression, and that method was employed next.
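
for reference, over sample points (x_i, y_i), i = 1..n, the fitted line y = a_0 + a_1 x has slope a_1 = (n \sum x_i y_i - \sum x_i \sum y_i) / (n \sum x_i^2 - (\sum x_i)^2) and intercept a_0 = (\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i) / (n \sum x_i^2 - (\sum x_i)^2), which is exactly what the coef routine in the code below computes.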

experiments also showed that even with the average distance between local minima nearly constant over a small optimizing region, there are sometimes massive irregular distances/ spikes in that metric. so the variance in the distance between local minima seems to be a key metric to minimize. of course a small average distance between local minima is preferred, but tuning GAs to optimize more than one thing at a time can be a can of worms.

the final evaluation function fits a linear regression to the deviation (“variance”) in the distance between local minima of the weighted walk, over random starting seeds with bit lengths n \in \{40...60\} at offsets of 2 (10 points); the fitness is the negated absolute slope. in other words the algorithm is searching for a constant average distance; while smaller is preferable, there is no fitness term directly based on/ connected to its magnitude.

some other bug fixes: after some troubleshooting, discovered the prior code sometimes gets “NaN” (not-a-number) values for averages and standard deviations in rare cases, namely when the binary expansion is all ‘1’s, ie numbers of the form 2^n-1.
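
a minimal repro of that failure mode: for such a number the binary string contains no ‘0’ digits, so the list of 0-run lengths is empty and the average works out to 0.0/0, which is NaN in floating point.

s = (2 ** 4 - 1).to_s(2)        # => "1111", no '0' digits at all
runs = s.split('1').map { |x| x.length }        # => [], no 0-runs to measure
a = runs.inject(0) { |t, x| t + x }.to_f / runs.size    # 0.0 / 0
puts(a.nan?)    # => true; hence the empty-list guard now in stat()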

so, after some not-difficult adjustments, ran the algorithm and found some striking results, possibly verging on yet another breakthrough ❗ as outlined in the following graphs. the algorithm found 9 improved solutions over ~7 hours. it is easy to throw out those for which x_a is visually constant-bounded or declining rather than increasing (even though x_b may often be quite noisy in those cases and difficult to interpret unambiguously individually). but what's left?

a very strange-yet-dramatic (or vice versa?) solution pattern emerged over at least two different GA runs, as documented in the following graphs; it seems to be a strong vindication of the techniques and apparently fits the desired criteria over a large test region n \in \{1...750\}, although in a somewhat unpredictable way. the weights are below. note that this code has not been tested for strict resilience/ repeatability in floating point ops and controlled random number generator starting seeds; that will require some further analysis and code adjustments, although the inherent seed randomness in the trend analysis already seems to indicate significant resilience/ repeatability of the solution.

the 1st graph is the whole “bottom line”: the max nonmonotonic distance (x_b). for many GA solutions these graphs increase very gradually, as in the prior graphs. however, this solution is anomalous in many ways and also shows that the overall idea (linear weighting of nonlinear iterate analysis statistics) can apparently lead to quite structured behavior.

there are three very distinct regions. the leftmost region n \in \{0...{\sim}150\} contains the training region n \in \{40...60\} and manages to stay below a constant of about 100, very desirable. but then it “blows up” around n \approx 150 and climbs rapidly in the 2nd region. in the 3rd region, starting around n \approx 250, it is quite noisy but still apparently “sideways”, below a ceiling of roughly 1800 although with several large spikes.

the 2nd graph plots the average (green) and standard deviation (red) of the nonmonotonic run lengths and has the same 3 corresponding regions. the 1st region is low. the 2nd region has rapidly climbing average/ deviation. the 3rd region shows both falling, rapidly at first and then gradually! this is completely unexpected behavior in the test regions 2/3 outside the training region 1, yet in the “final” region 3 it is the desired behavior, even “better than expected”, almost “too good to be true”, because only constant bounding was sought. 😀 leading to a wild wonder/ conjecture: is an asymptotically declining average really theoretically possible? ❓ ❗

the 3rd graph plots the sequential decrease/ nonreversal density x_a. in region 1 it is noisy and slowly climbing. in region 2 it falls rapidly. in region 3 it climbs gradually, the desired behavior (although it does seem to plateau near the right end, which may be problematic).

the 4th graph plots total trajectory length in red (divided by 5 to fit on the graph, which rescales but does not change the shape) and counts of local minima in green. the former climbs quite predictably but the latter shows the same 3-region behavior. in region 1 it climbs and in region 2 it falls, together forming a single concave-downward curve; then in region 3 it increases linearly. note that in the 3rd region the local minima count climbs faster than the trajectory lengths, in line with the other statistics. wow!

the GA found “a” desired solution in the long run analysis, although not in the form expected. it is nearly ideal except for some large nonmonotone distances and region 2; regions 1/3 are acceptable in their dynamics, with region 3 as the presumed asymptotic behavior. the big question is whether any more region changes are possible beyond the large analysis region. these results underline the limitations of empirical analysis: no hint of the unexpected region 2/3 test behavior appears in the region 1 training statistics. but the test-divergence-from-training is not at all typical of machine learning apps either. it's not just a simplistic increase in error. it's almost as if it's the output of a program/ algorithm that has 3 different conditional cases…! 😮

actually the wondrous nature of this could be emphasized even further: a bunch of random walks are weighted and added together to get a distinctly less random walk. it's almost unbelievable or magical. a dramatic, maybe vista-opening reaffirmation of the ancient/ deep principle, “the whole is greater than the sum of the parts.” or like a rabbit being pulled out of a hat. 😎 ❗

4
w0      -1.2965207298328
w1      0.00881255269287658
w2      0.834874460020838
w3      0.488768570929333
w4      -0.841996911397055
w5      0.461962697182684
w6      0.624682934548628
w7      0.182038750890474
w8      1.11402628548688
w9      0.853460506803983
w10     0.0276695282228376
w11     -0.428453944156957
w12     0.179899938636543
w13     -0.0481205801620707

[image: graph 1, max nonmonotonic run length x_b]

[image: graph 2, avg (green) and standard deviation (red) of nonmonotonic run lengths]

[image: graph 3, sequential decrease/ nonreversal density x_a]

[image: graph 4, trajectory length (red, divided by 5) and local minima count (green)]

#!/usr/bin/ruby1.8
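# GA driver: mode 'a' evolves a population of 14 feature weights, scoring a
# weight vector by the negated absolute regression slope of the record-minima
# gap deviation vs seed size p = 40..58 (10 points, see eval below); any
# other mode replays a saved solution from var.txt via trend()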
 
 
 
def stat(l)
        return 0, 0 if (l.empty?)
        t = t2 = 0
        l.each \
        {
            |x|
            t += x
            t2 += x ** 2
        }
        c = l.size
        a = t.to_f / c
        z = t2.to_f / c - a ** 2
        sd = Math.sqrt(z < 0 ? 0 : z)
        raise if (sd.nan?)
        return a, sd
end
 
def avg(l)
 
        t = 0
        l.each { |x| t += x }
        return t.to_f / l.size
end
 
def d(s)
        c = s.split('').select { |x| x == '1' }.size
        return c.to_f / s.length, s.length, c, s.size - c, c % 2, c % 3
 
end
 
def rank(n, l)
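        # conditioned value: weighted sum (weights l) over 14 features of the
        # binary expansion s of n: log n, log^2 n, log^0.5 n, density of 1s,
        # bit width, 1/0 counts, 1-count mod 2 and 3, |0.5 - density|, and
        # mean/ deviation of the 1-run and 0-run lengths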
 
 
        s = n.to_s(2)
 
        m1 = Math.log(n)
        m2 = m1 ** 2
        m3 = m1 ** 0.5
 
        d1, w, c1, c0, c2, c3 = d(s)
        d2 = (0.5 - d1).abs
 
        a1, sd1 = stat(s.split('0').map { |x| x.length })
        a0, sd0 = stat(s.split('1').map { |x| x.length })
 
        l2 = [m1, m2, m3, d1, w, c1, c0, c2, c3, d2, a1, sd1, a0, sd0]
        t = 0.0
        l.each_with_index { |w, i| t += w * l2[i] }
        raise(l2.join(' ')) if (t.nan?)
        return t
 
end
 
def f3(n)
 
        return sprintf("%.3f", n).to_f
end
 
def f3m(l)
        l.map { |x| f3(x.to_f) }
end
 
def coef(x, y)
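        # least squares linear regression: returns [intercept a0, slope a1]
        # for the fit y = a0 + a1 * x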
 
        x1 = x2 = y1 = y2 = xy = 0.0
 
        n = x.size
        n.times \
        {
                |i|
 
                x1 += x[i]
                y1 += y[i]
                xy += x[i] * y[i]
                x2 += x[i] ** 2
                y2 += y[i] ** 2
        }
 
        d = n * x2 - x1 ** 2
        a0 = (y1 * x2 - x1 * xy) / d
        a1 = (n * xy  - x1 * y1) / d
 
        return a0, a1
end
 
def f(n, w)
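        # iterate the compressed collatz map n -> n/2 (even) or (3n+1)/2 (odd),
        # recording the conditioned walk |rank(n, w)|; returns max nonmonotonic
        # run length x_b, [avg, sd] of gaps between successive record minima,
        # decrease density x_a, record-minima count, and trajectory length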
 
        l = []
        t = 0
        while (n != 1)
                l << rank(n, w).abs
                puts(l[-1]) if ($mode == 'x')
 
                n = (n % 2 == 0) ? n / 2 : (n * 3 + 1) / 2
                t += 1
        end
 
        c = 0
        m = l[0]
        x = j = 0
        l2 = []
        (1...l.size).each \
        {
                |i|
                c += 1 if (l[i] < l[i - 1])
                if (l[i] < m) then
                        l2 << i - j
                        m, j = [l[i], i]
                end
                x = [x, i - j].max
        }
        r = c.to_f / (l.size - 1)
        return x, stat(l2), r, l2.size, t
end
 
def eval1(p, z)
 
        n = 2 ** p + rand(2 ** (p - 1))
        return f(n, z)
end
 
def eval(z)
 
        l = []
        x = []
        p = 40
        10.times { |i| x << p; l << eval1(x[-1], z).flatten; p += 2 }
        l1 = l.transpose
        l2 = stat(l1[0]) + l1[1..-1].map { |l0| avg(l0) }
 
        y = l1[2]
        return -(coef(x, y)[1].abs), l2
end
 
def trend()
 
        l = File.open($fn).readlines
        n = ARGV[0].nil? ? l.size - 1 : ARGV[0].to_i
        d = l[n].split("\t")
 
        s = f3m(d.slice!(0, 8))
        w = f3m(d.slice!(0, $v))
 
        $stderr.puts(n, s.join("\t"), w.join("\t"), d.join("\t"))
 
        f = File.open('out2.txt', 'w')
        (2..750).each \
        {
                |p|
 
                n = 2 ** p + rand(2 ** (p - 1))
                l = f(n, w)
                f.puts(l.join("\t"))
                f.flush
        }
 
end
 
def trend2()
 
        exit
end
 
def rsign()
        # rand(2) yields 0 or 1, and 0 is truthy in ruby, so compare explicitly
        return (rand(2) == 0) ? -1 : 1
end
 
$mode = ARGV[0]
$v = 14
$fn = 'var.txt'
 
if (!['a'].member?($mode))
 
        trend2() if ($mode == 'x')
        trend()
        exit
end
 
l = []
 
50.times \
{
        |i|
        w = (1..$v).map { rand() * 1.5 * ((i % 2 == 0) ? 1 : rsign()) }
        l <<= [eval(w), w, "[#{i}]"]
}
 
f = File.open('out.txt', 'w')
f2 = File.open($fn, 'w')
 
mhi = nil
mlo = nil
 
j = 0
 
loop \
{
        a = l[rand(l.size)]
        b = l[rand(l.size)]
 
        x = a[1]
        y = b[1]
        r = rand($v)
 
        c = rand(4)
        h = nil
        case c
                when 0
                        z = x.dup
                        z[r] = 0.0
                when 1
                        z = x.dup
                        z[r] = rand() * 1.5 * rsign()
                when 2
                        z = x.dup
                        z[r] *= rand() * 1.5 * rsign()
                when 3
                        z = []
                        x.size.times { |i| z[i] = (rand(2) % 2 == 0) ? x[i] : y[i] }
                        h = "(#{a[2]}+#{b[2]})"
        end
 
        op = 'a'
        c.times { op.succ! }
 
        h = "#{a[2]}#{op}#{r}" if (h.nil?)
 
        t = eval(z)
        l.pop if (l.size > 200)
        l << [t, z, h]
        l = l.sort_by { |x| x[0][0] }.reverse
 
        hi = l[0]
        lo = l[-1]
 
        fhi = flo = false
        mhi, fhi = [hi, true] if (mhi.nil? || hi[0][0] >= mhi[0][0] && hi[2] != mhi[2])
        mlo, flo = [lo, true] if (mlo.nil? || lo[0][0] > mlo[0][0])
 
        ts = Time.now.to_s
 
        f.puts([f3(t[0]).abs, f3(mhi[0][0]).abs, f3(mlo[0][0]).abs, h.length, j, ts].join("\t")) if (flo)
        f.flush
        f2.puts((mhi + [j, ts]).join("\t")) if (fhi)
        f2.flush
        j += 1
}

(1/10) this is some new code with some enhancements. oh, and oops: the prior analysis code was truncating weights to 3 decimal digits! ^^’ this was unintentional, but on the flip side the good news is that the results are very stable, ie not much observed butterfly effect; even truncating a lot of decimal digits in the weights seems to have no noticeable effect on the analysis statistics.
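
to illustrate, the 3-decimal display helper f3 was (unintentionally) also applied to the weights when reloading them from var.txt:

def f3(n)
        return sprintf("%.3f", n).to_f  # display rounding to 3 decimals
end

w = -1.2965207298328    # w0 of the solution listed above
puts(f3(w))             # => -1.297, the value that was actually replayed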

the code has some changes in the logging and a bunch of new genetic randomizing gene alterations, too numerous to mention for now but easily understood by inspection of the code. it also has new analysis code and other similar fitness functions, 3 types labelled by letter modes. in each case it attempts to maximize upward slope after a linear regression fit over the same training region:

  1. the sequential decreasing/nonreversal density
  2. the local minima count
  3. the local minima density (prior count divided by trajectory length)

got some interesting/ similar yet somewhat different/ improved results in these 3 runs labelled a, b, c. the ‘b’ metric performed best and found a new solution with max nonmonotonic run length under ~650. here are the weights and max nonmonotonic run length plots for the best solutions found after over 24hr of GA iterations: ‘a’ red, ‘c’ green, ‘b’ blue. in some ways these seem to be newly scaled versions of the prior solution, ie variations on a theme. the 3-region phenomenon continues at roughly the same numeric boundaries.

however the improvement in the max nonmonotonic run length, ~650 for best solution ‘b’ versus the prior ~1400, is about 53%, which is quite encouraging and even dramatic. am hunting for a low value here but haven't figured out how to do this (yet) combined with a constant trend in a fitness function, so am relying on presumed indirect effects through the other 3 fitness functions. it is not known yet whether the improvement was due to the change in fitness functions, to better random starting conditions, or just to randomness in the search etc; it's currently too computationally expensive to do multiple runs. (where's that open science collaboration team and supercomputing cluster when they're needed?)

[image: max nonmonotonic run lengths of best solutions, ‘a’ red, ‘c’ green, ‘b’ blue]

12
0       -0.0621151129675874
1       0.027749950267922
2       0.0285131621680541
3       -0.84576774323482
4       -0.901049776447273
5       -1.61758279610741
6       -0.178417524589377
7       0.0
8       -0.0
9       -1.66351603027066
10      0.0
11      -0.832684135084786
12      0.0
13      0.19429829301678
 
21
0       -0.937405391284104
1       0.0670177950097334
2       0.0
3       -1.73005189948666
4       -1.38257016553931
5       -0.558885208274334
6       -0.175932470473479
7       0.00144369304594031
8       0.0
9       0.463002776570181
10      0.746205672068276
11      0.790480164773667
12      0.0415956136680239
13      -0.918165827279676
 
11
0       4.15867051538285
1       -0.13286683498869
2       -0.49242392058947
3       0.0986854597916064
4       2.8536377301632
5       0.239334156410213
6       -0.341876102500191
7       0.14631302770963
8       -0.361365204553417
9       0.472727150190803
10      0.411456277848107
11      0.0
12      0.306902447155646
13      1.43546799909965
#!/usr/bin/ruby1.8
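# modes 'a'/'b'/'c' run the GA scoring a weight vector by a linear-regression
# fit of, respectively, the nonreversal density, the record-minima count, or
# the minima density against seed size over the training region; mode 'x'
# dumps sample conditioned walks, 'w'/'y'/'z' report on saved var.txt
# solutions, and a numeric arg replots one saved solution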



def stat(l)
        return 0, 0 if (l.empty?) 
        t = t2 = 0
        l.each \
        {
            |x|
            t += x
            t2 += x ** 2
        }
        c = l.size
        a = t.to_f / c
        z = t2.to_f / c - a ** 2
        sd = Math.sqrt(z < 0 ? 0 : z)
        raise if (sd.nan?)
        return a, sd
end

def avg(l)

        t = 0
        l.each { |x| t += x }
        return t.to_f / l.size
end

def d(s)
        c = s.split('').select { |x| x == '1' }.size
        return c.to_f / s.length, s.length, c, s.size - c, c % 2, c % 3

end

def rank(n, l)


        s = n.to_s(2)

        m1 = Math.log(n)
        m2 = m1 ** 2
        m3 = m1 ** 0.5

        d1, w, c1, c0, c2, c3 = d(s)
        d2 = (0.5 - d1).abs

        a1, sd1 = stat(s.split('0').map { |x| x.length })
        a0, sd0 = stat(s.split('1').map { |x| x.length })

        l2 = [m1, m2, m3, d1, w, c1, c0, c2, c3, d2, a1, sd1, a0, sd0]
        t = 0.0
        l.each_with_index { |w, i| t += w * l2[i] }
        raise(l2.join(' ')) if (t.nan?) 
        return t

end

def f3(n)

        return sprintf("%.3f", n).to_f
end

def f3m(l)
        l.map { |x| f3(x.to_f) }
end

def coef(x, y)

        x1 = x2 = y1 = y2 = xy = 0.0

        n = x.size
        n.times \
        {
                |i|

                x1 += x[i]
                y1 += y[i]
                xy += x[i] * y[i]
                x2 += x[i] ** 2
                y2 += y[i] ** 2
        }

        d = n * x2 - x1 ** 2
        a0 = (y1 * x2 - x1 * xy) / d
        a1 = (n * xy  - x1 * y1) / d

        return a0, a1
end

def f(n, w)

        l = []
        l1 = []
        t = 0
        while (n != 1)
                x = rank(n, w).abs
                l << x
                l1 << [Math.log(n), x]

                n = (n % 2 == 0) ? n / 2 : (n * 3 + 1) / 2
                t += 1
        end

        return l1 if ($mode == 'x')

        c = 0
        m = l[0]
        x = j = 0
        l2 = []
        (1...l.size).each \
        { 
                |i| 
                c += 1 if (l[i] < l[i - 1])
                if (l[i] < m) then
                        l2 << i - j
                        m, j = [l[i], i] 
                end
                x = [x, i - j].max
        }
        r = c.to_f / (l.size - 1)
        return x, stat(l2), r, l2.size, l2.size.to_f / l.size, t
end

def eval1(p, z)

        n = 2 ** p + rand(2 ** (p - 1))
        return f(n, z)
end

def eval(z)

        l = []
        x = []
        p = 40
        10.times { |i| x << p; l << eval1(x[-1], z).flatten; p += 2 }
        l1 = l.transpose
        l2 = stat(l1[0]) + l1[1..-1].map { |l0| avg(l0) }

        y = l1[3] if ($mode == 'a')
        y = l1[4] if ($mode == 'b')
        y = l1[5] if ($mode == 'c')

        return coef(x, y)[0], l2
end

def out(n, w)

        puts(n)
        w.each_with_index { |w, i| puts("#{i}\t#{w}") }

end

def info()

        File.open($fn).readlines.each_with_index \
        {
                |ln, i|
                d = ln.split("\t")
                puts([i, f3(d[0]), d[-2], d[-3].length, d[-1]].join("\t")) if ($mode == 'y')
                puts([i, d[-3]].join("\t")) if ($mode == 'z')
                if ($mode == 'w') then
                        s = d.slice!(0, $s)
                        w = d.slice!(0, $v).map { |x| x.to_f }
                        out(i, w)
                end
        }
        exit
end


def trend()

        l = File.open($fn).readlines
        n = ARGV[0].nil? ? l.size - 1 : ARGV[0].to_i
        d = l[n].split("\t")

        s = d.slice!(0, $s)
        w = d.slice!(0, $v).map { |x| x.to_f }
        out(n, w)

        f = File.open("out_#{n}.txt", 'w')
        (2..750).each \
        {
                |p|

                n = 2 ** p + rand(2 ** (p - 1))
                l = f(n, w)
                f.puts(l.join("\t"))
                f.flush
        }

end

def trend2()

        l = File.open($fn).readlines
        d = l[-1].split("\t")

        s = d.slice!(0, $s)
        w = d.slice!(0, $v).map { |x| x.to_f }

        f = File.open('out2.txt', 'w')
        p = 10
        19.times \
        {
                l = eval1(p, w)
                l.each_with_index { |x, i| f.puts([l.size - i, x].join("\t")) }
                p += 10
                f.puts
        }
        exit
end

def rsign()
        return (rand(2) % 2 == 0) ? -1 : 1
end

def rw()
        return rand() * 1.5 * rsign()
end

def re()
        return rand(2) % 2 == 0
end


$mode = ARGV[0]
$s = 9
$v = 14
$fn = 'var.txt'

if (!['a', 'b', 'c'].member?($mode))

        trend2() if ($mode == 'x')
        info() if (['w', 'y', 'z'].member?($mode))
        trend()
        exit
end


l = []

50.times \
{
        |i|
        w = (1..$v).map { rand() * 1.5 * rsign() }
        l <<= [eval(w), w, "[#{i}]"]
}

f = File.open('out.txt', 'w')
f2 = File.open($fn, 'w')

mhi = nil
mlo = nil

j = 0

loop \
{
        a = l[rand(l.size)]
        b = l[rand(l.size)]

        x = a[1]
        y = b[1]
        r = rand($v)

        c = rand(12)
        op = 'a'
        c.times { op.succ! }
        h = "#{a[2]}#{op}#{r}"

        case op
                when 'a'
                        z = []
                        x.size.times { |i| z[i] = re() ? x[i] : y[i] }
                        h = "(#{a[2]}+#{b[2]})"
                when 'b'
                        z = x.dup
                        z[r] = 0.0
                when 'c'
                        z = x.dup
                        z[r] += rw()
                when 'd'
                        z = x.dup
                        z[r] *= rw()
                when 'e'
                        z = x.dup
                        z[r] = rw()
                when 'f'
                        z = x.map { |w| re() ? w : w + rw() }
                when 'g'
                        z = x.map { |w| re() ? w : w * rw() }
                when 'h'
                        z = x.map { |w| re() ? w : w * rsign() }
                when 'i'
                        z = x.map { |w| re() ? w : rw() }
                when 'j'
                        z = x.map { |w| w + rw() }
                when 'k'
                        z = x.map { |w| w * rw() }
                when 'l'
                        z = x.map { |w| w * rsign() }
        end



        t = eval(z)
        l.pop if (l.size > 200)
        l << [t, z, h]
        l = l.sort_by { |x| x[0][0] }.reverse

        hi = l[0]
        lo = l[-1]

        fhi = flo = false
        mhi, fhi = [hi, true] if (mhi.nil? || hi[0][0] >= mhi[0][0] && hi[2] != mhi[2])
        mlo, flo = [lo, true] if (mlo.nil? || lo[0][0] > mlo[0][0])

        ts = Time.now.to_s

        f.puts([$mode, f3(t[0]).abs, f3(mhi[0][0]).abs, f3(mlo[0][0]).abs, h.length, j, ts].join("\t")) if (flo)
        f.flush
        f2.puts((mhi + [j, ts]).join("\t")) if (fhi)
        f2.flush
        j += 1
}

there are so many interesting ways to improve the algorithm and analyze this data right now, it's a bit overwhelming. the blog writeup is also very time consuming at this point. the GAs throw off a lot of auxiliary data related to convergence etc, but there are also so many different/ similar metrics and fitness functions now. it could take a while to sort through all this.

also need to do weight analysis to figure out whether to cut any metrics, and which metrics are most effective by weight, thereby suggesting new metrics. right now the weights seem to be generally nonzero for optimal solutions, but some weights are zero, and not in consistent positions; it is somewhat surprising that totally different weight configurations seem to be leading to similar structured solutions. a big, possibly key, even critical mystery right now. 😮 ❓

⭐ 😎 oh, some other way cool news. am formulating some rough ideas about computing density over FSMs, as hinted in the last post. not sure exactly how to do it (yet) but it seems computable, maybe even in P time. was able to formulate it into a cstheory.se question that got a few votes but, more importantly, hit the jackpot with a quick answer by Shallit. it's always a very good day when someone with a wikipedia biography replies in cyberspace; maybe even the highlight of the week. thanks much JS! 😀 (maybe if this all works out mine will be forthcoming soon) 😉 … JS has answered one of my other questions before also… and it is notable that it takes something different for him to respond; he doesn't have much activity on the site, with currently only 42 answers and zero questions. a real life math guru at the top of the mountain!

(1/11) ❗ just realized on editing my Collatz page that Shallit introduced the concept of using regular languages to understand the Collatz problem in 1991, ref [1] on that page. also added two newly googled interesting refs [27][28], apparently written by advanced undergraduates as senior theses, LaTourette and Chamberland. alas, it seems a pity so few undergraduates are exposed to the problem…

(1/14) ❗ 😮 some deeper eyepopping/ jawdropping analysis of what turn out to be remarkable solutions. a basic analysis is to run the multiple evaluation statistics (in general measuring random walk monotonicity) on the “unconditioned/ raw” collatz trajectories and compare those statistics directly with the “conditioned” walks. those reveal strong/ definitive differences, but that is to be expected. it turns out to be even more dramatic to plot the original 1d walk vs the conditioned 1d random walk as a curve (ie the correlation between the two) to figure out how the algorithm is optimizing and “smoothing” the latter.

these following plots are for “weight set 1”. samples are taken at starting trajectory bit lengths increasing by 10 between 10 and 300. starting positions are plotted in red, then up to 20, 50, or 100 of the initial trajectory values are plotted on top in green. the 3 corresponding regions are now apparent: regions 1, 2 are the rising and falling sections of the concave-downward, roughly parabolic curve, and then there is a climbing region 3. the curve is clearly smoothed but is surprising and unexpected, and seems to represent some attractor in the optimization strategy. note the trajectories are plotted right-to-left.

def stat(l)
        return 0, 0 if (l.empty?)
        t = t2 = 0
        l.each \
        {
            |x|
            t += x
            t2 += x ** 2
        }
        c = l.size
        a = t.to_f / c
        z = t2.to_f / c - a ** 2
        sd = Math.sqrt(z < 0 ? 0 : z)
        raise if (sd.nan?)
        return a, sd
end
  
def avg(l)
  
        t = 0
        l.each { |x| t += x }
        return t.to_f / l.size
end
  
def d(s)
        c = s.split('').select { |x| x == '1' }.size
        return c.to_f / s.length, s.length, c, s.size - c, c % 2, c % 3
  
end
  
def rank(n, l)
  
  
        s = n.to_s(2)
  
        m1 = Math.log(n)
        m2 = m1 ** 2
        m3 = m1 ** 0.5
  
        d1, w, c1, c0, c2, c3 = d(s)
        d2 = (0.5 - d1).abs
  
        a1, sd1 = stat(s.split('0').map { |x| x.length })
        a0, sd0 = stat(s.split('1').map { |x| x.length })
  
        l2 = [m1, m2, m3, d1, w, c1, c0, c2, c3, d2, a1, sd1, a0, sd0]
        t = 0.0
        l.each_with_index { |w, i| t += w * l2[i] }
        raise(l2.join(' ')) if (t.nan?)
        return t
  
end
  
def maxrun(l)
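	# longest contiguous segment of [bits, value] points that sets no new
	# record minimum in value (the longest nonmonotone run), returned for plotting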
	
        m = l[0][1]

	l2 = []
	l1 = []
	l.each \
	{
		|x|
                if (x[1] < m) then
			l2 = []
			m = x[1]
		else
			l2 << x
                end
		l1 = l2 if (l2.size > l1.size)
        }
	return l1
end

def f(n, w)
  
        l = []
        while (n != 1)
                x = rank(n, w).abs
                l << [Math.log(n) / Math.log(2), x]
  
                n = (n % 2 == 0) ? n / 2 : (n * 3 + 1) / 2
        end
  
        return l
  
end

def g1(w)
	
	 [1,20,50,100].each_with_index \
	{
		|t, i|
		
		f = File.open("out#{t}.txt", 'w')

		p = 10
		d = 10
		
		while (p <= 300)
	  
			n = 2 ** p + rand(2 ** (p - 1))
			l = f(n, w)
			l[0...t].each { |xy| f.puts(xy.join("\t")) }
			f.puts if (t != 1)
			f.flush
			p += d
		end
		f.close
	}
	
end

def g2(w)
	
	c = 5
	f = File.open("out.txt", 'w')
	f2 = File.open("outx.txt", 'w')
	
	p = 10
	d = 10
	h = 300
	v = 500
	(c * c).times \
	{
		|i|
		
		x = (i % c) * h
		y = (i / c) * v
		
		n = 2 ** p + rand(2 ** (p - 1))
		l = f(n, w)
		l.each { |xy| f.puts([-xy[0] + x, xy[1] - y].join("\t")) }
		f.puts
		f2.puts([x, -y].join("\t"))
		f2.puts([x, -y + v].join("\t"))
		f2.puts
		f2.puts([x, -y].join("\t"))
		f2.puts([x - h, -y].join("\t"))
		f2.puts
		f.flush
		f2.flush
		p += d
	}
	f.close
	f2.close
	
end

def g3(w)
	
	f = File.open("outy.txt", 'w')
	f2 = File.open("outz.txt", 'w')
	
	p = 150
	n = 2 ** p + rand(2 ** (p - 1))
	l = f(n, w)

	l.each { |x| f.puts(x.join("\t")) }	# full run: [bits, value] per line
	
	maxrun(l).each { |x| f2.puts(x.join("\t")) }
end

def trend()
  
        w = File.open("w#{ARGV[0]}.txt").readlines.map { |x| x.split[1].to_f }
	g1(w)
	g2(w)
	g3(w)
end

trend()

[image: trendx]

[image: trendy]

[image: trendz]

the next 3 plots contain 5×5 subplots of full trajectories starting at bit lengths increasing by 10, from 10 up to the 250 range, arranged from top left to bottom right. the x axis is mirrored so that trajectories move left-to-right instead. they are for weight sets 1, 2, 3. here there is a bigger story, a lot to take in, and it would definitely make sense as an animation! weight sets 2, 3 are strikingly much smoother, the regions are slightly different and getting compressed, and they also tend to climb much quicker; the regions seem to correspond to the changing “bump” size/ shape at the end.

[image: trendw1]

[image: trendw2]

[image: trendw3]

⭐ ⭐ ⭐ the sheer smoothness of these final 2 plots is stunning! more strong, exciting evidence of a real, unmitigated breakthrough. and it is not at all hinted at in the random walk monotonicity statistics, which generally still look quite noisy; this shows how differences in quantitative metrics might not reveal fundamental shifts in qualitative worth, and how critical different types of analysis/ pictures are for understanding the big(ger) picture.

💡 conjecture: interpreting the diagrams, it is natural to wonder where the longest nonmonotone run occurs. maybe the longest nonmonotone run for different trajectories corresponds to the left, upward side of the bump in the final left-to-right plots (right before the steep falling section). this should be easy to determine but haven't gotten to it yet.

(1/15) the conjecture was too easy! it is simple to demonstrate empirically (to say nothing of proof), and it explains the prior graphs in a very elegant way. the code has been modified to add a 3rd analysis routine that tracks the longest nonmonotone run and allows plotting it superimposed on the full run; it always coincides with the “ending bump”.

some more interpretation. imagine computing running averages of the original collatz random walks. that would also smooth out the function, but from computation the longest nonmonotone run lengths seem to be unbounded; therefore no finite averaging window would lead to finite max nonmonotone run lengths. the weighting/ conditioning function seems to be “folding up” the random walk in some currently highly mysterious/ inexplicable way to achieve bounded nonmonotone run lengths. concepts from topology seem relevant here; it seems to be some kind of topological folding.
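
for concreteness, a minimal sketch (hypothetical helper) of such a running average:

# running (moving) average with window k over a 1d walk l; this smooths the
# curve, but an unbounded nonmonotone run in l survives any finite window
def running_avg(l, k)
        (0..(l.size - k)).map \
        {
                |i|
                l[i, k].inject(0.0) { |t, x| t + x } / k
        }
end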

the plots also seem to immediately suggest a new natural fitness metric that might behave much more smoothly than the other metrics and lead to new solutions: the euclidean distance from the origin of (x, y), where x is the collatz trajectory value and y is the weighted/ conditioned trajectory value! can't wait to try it ❗
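
a quick sketch of that candidate metric, assuming the trajectory comes as [x, y] pairs as produced by f(n, w) in the code above:

# euclidean distance from the origin of each (raw, conditioned) pair, where
# x is the log of the collatz iterate and y the weighted/ conditioned value
def dist_walk(l)
        l.map { |x, y| Math.sqrt(x ** 2 + y ** 2) }
end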

(1/22) ❗ 😮 😦 😳 😥 bad news! here's some analysis code that does a bit-wise greedy search for long nonmonotonic runs, successfully finding large ones. it then looks at the unconditioned vs conditioned random walks and the longest nonmonotonic run for each (red, green respectively). both tend to trend upward. if the conditioned results levelled off while the unconditioned walks climbed, as in earlier plots, it would indicate (potential) success; but they are roughly correlated, so the formula is essentially failing on “hard” seeds. (wince) another indication of failure is that the longest nonmonotone runs are as high as ~900, which were not seen in the earlier analyses; those didn't find runs longer than ~650.

this is for weight set #3 (1st arg). there does seem to be a minor improvement effect from the algorithm after x = ~250 bits. overall, it looks like the GA is training well on random inputs but not generalizing to “hard” seeds, which seem to have some different distribution. aka the needles in the haystack throw a curve ball at the conditioning formula & it strikes out.

oh, and the code gets a numeric overflow in the Math.log function when the # of bits hits 1024 (2^1024 is just past the IEEE double range). 😳 never pushed ruby that far before, bummer.

there is maybe some idea to salvage here: if the hard seeds can be differentiated from easier seeds somehow by their initial bit structure, then those would be better to train the GA on.

def stat(l)
        return 0, 0 if (l.empty?)
        t = t2 = 0
        l.each \
        {
            |x|
            t += x
            t2 += x ** 2
        }
        c = l.size
        a = t.to_f / c
        z = t2.to_f / c - a ** 2
        sd = Math.sqrt(z < 0 ? 0 : z)
        raise if (sd.nan?)
        return a, sd
end
  
def avg(l)
  
        t = 0
        l.each { |x| t += x }
        return t.to_f / l.size
end
  
def d(s)
        c = s.split('').select { |x| x == '1' }.size
        return c.to_f / s.length, s.length, c, s.size - c, c % 2, c % 3
  
end
  
def rank(n, l)
  
  
        s = n.to_s(2)
  
        m1 = Math.log(n)
        m2 = m1 ** 2
        m3 = m1 ** 0.5
  
        d1, w, c1, c0, c2, c3 = d(s)
        d2 = (0.5 - d1).abs
  
        a1, sd1 = stat(s.split('0').map { |x| x.length })
        a0, sd0 = stat(s.split('1').map { |x| x.length })
  
        l2 = [m1, m2, m3, d1, w, c1, c0, c2, c3, d2, a1, sd1, a0, sd0]
        t = 0.0
        l.each_with_index { |w, i| t += w * l2[i] }
        raise(l2.join(' ')) if (t.nan?)
        return t
  
end

def f(n, w)
        l = []
        while (n != 1)
                l << [Math.log(n), rank(n, w).abs]
		
                n = (n % 2 == 0) ? n / 2 : (n * 3 + 1) / 2
        end
  
        return l
  
end

def maxrun(l)
	return 0 if (l.empty?)
        m = l[0]

	l2 = []
	l1 = []
	l.each \
	{
		|x|
                if (x < m) then
			l2 = []
			m = x
		else
			l2 << x
                end
		l1 = l2 if (l2.size > l1.size)
        }
	return l1.size
end

def fmt(h, c)
	h = h.dup
	h['n#'] = h.delete('n').to_s(2).length
	h['p#'] = h.delete('p').to_s(2).length
	h['l#'] = c
	return h.to_s.gsub('=>', ' => ')
end

def sample(t, w)
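        # greedy best-first bitwise search for long nonmonotone runs: always
        # expand the candidate whose raw walk has the longest run x, either
        # skipping the next bit (keep n, double p) or setting it (n2 = n + p);
        # x2 tracks the conditioned walk's run length for comparison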
 
        l = [{'x' => 0, 'n' => 1, 'p' => 1, 'x2' => 0}]
 
        loop \
        {
		t = (0...l.size).max_by { |j| l[j]['x'] }
		
		z = l.delete_at(t)
		
		x, n, p, x2 = [z['x'], z['n'], z['p'], z['x2']]
	 
		puts(fmt(z, l.size))
		$stdout.flush
		
		p <<= 1
		l.push({'x' => x, 'n' => n, 'p' => p, 'x2' => x2})
 
		n2 = n + p
		l2 = f(n2, w)
		
		x1 = maxrun(l2.map { |z| z[0] })
		x2 = maxrun(l2.map { |z| z[1] })
		
		l.push({'x' => x1, 'n' => n2, 'p' => p, 'x2' => x2})
        }
end




w = File.open("w#{ARGV[0]}.txt").readlines.map { |x| x.split[1].to_f }

m, n, l = sample(nil, w)

[image: trend3]

(later) on immediate inspection of the binary seed values, the “pattern” is almost trivial! the greedy code is generating long bitstrings of mostly 0’s with some scattered 1’s as seeds. the 0 runs follow some irregular pattern. this simplified code (removing the formula weighting logic) outputs the list of lengths of the 0 runs separated by 1s found by the bitwise greedy search for long nonmonotone runs. this suggests quite simply that (deja vu) a better random seed generator would range over all the different 0/1 densities; the current scheme is centered around 50% density with low deviation. (see the sketch after the program output below.)

def f(n)
        l = []
        while (n != 1)
                l << Math.log(n)
  
                n = (n % 2 == 0) ? n / 2 : (n * 3 + 1) / 2
        end
  
        return l
  
end

def maxrun(l)
	return 0 if (l.empty?)
        m = l[0]

	l2 = []
	l1 = []
	l.each \
	{
		|x|
                if (x < m) then
			l2 = []
			m = x
		else
			l2 << x
                end
		l1 = l2 if (l2.size > l1.size)
        }
	return l1.size
end

def fmt(h, c)
	h = h.dup
	h['n#'] = h.delete('n').to_s(2).length
	h['p#'] = h.delete('p').to_s(2).length
	h['l#'] = c
	return h.to_s.gsub('=>', ' => ')
end

def sample()
 
        l = [{'x' => 0, 'n' => 1, 'p' => 1}]
	
	h = {}
        loop \
        {
		t = (0...l.size).max_by { |j| l[j]['x'] }
		
		z = l.delete_at(t)
		
		x, n, p = [z['x'], z['n'], z['p']]
	 
		if (!h.member?(n)) then
			puts("\t" + fmt(z, l.size))
			h[n] = nil	
			ns = n.to_s(2)
			p(ns.split('1').map { |s| s.length })
			$stdout.flush
			break if (ns.length > 1000)
		end
			
		p <<= 1
		l.push({'x' => x, 'n' => n, 'p' => p})
 
		n2 = n + p
		
		x1 = maxrun(f(n2))
		
		l.push({'x' => x1, 'n' => n2, 'p' => p})
        }
end

sample()

	{"x" => 0, "n#" => 1, "p#" => 1, "l#" => 0}
[]
	{"x" => 4, "n#" => 2, "p#" => 2, "l#" => 1}
[]
	{"x" => 7, "n#" => 3, "p#" => 3, "l#" => 2}
[]
	{"x" => 7, "n#" => 4, "p#" => 4, "l#" => 4}
[]
	{"x" => 56, "n#" => 5, "p#" => 5, "l#" => 5}
[]
	{"x" => 80, "n#" => 11, "p#" => 11, "l#" => 11}
[0, 5]
	{"x" => 80, "n#" => 25, "p#" => 25, "l#" => 26}
[0, 13, 5]
	{"x" => 126, "n#" => 27, "p#" => 27, "l#" => 28}
[0, 15, 5]
	{"x" => 130, "n#" => 80, "p#" => 80, "l#" => 81}
[0, 52, 15, 5]
	{"x" => 181, "n#" => 82, "p#" => 82, "l#" => 83}
[0, 1, 52, 15, 5]
	{"x" => 208, "n#" => 122, "p#" => 122, "l#" => 123}
[0, 39, 1, 52, 15, 5]
	{"x" => 214, "n#" => 131, "p#" => 131, "l#" => 132}
[0, 8, 39, 1, 52, 15, 5]
	{"x" => 286, "n#" => 162, "p#" => 162, "l#" => 163}
[0, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 298, "n#" => 213, "p#" => 213, "l#" => 214}
[0, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 308, "n#" => 238, "p#" => 238, "l#" => 239}
[0, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 329, "n#" => 243, "p#" => 243, "l#" => 244}
[0, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 384, "n#" => 251, "p#" => 251, "l#" => 252}
[0, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 401, "n#" => 260, "p#" => 260, "l#" => 261}
[0, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 406, "n#" => 277, "p#" => 277, "l#" => 278}
[0, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 433, "n#" => 329, "p#" => 329, "l#" => 330}
[0, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 444, "n#" => 333, "p#" => 333, "l#" => 334}
[0, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 481, "n#" => 374, "p#" => 374, "l#" => 375}
[0, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 497, "n#" => 376, "p#" => 376, "l#" => 377}
[0, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 512, "n#" => 393, "p#" => 393, "l#" => 394}
[0, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 524, "n#" => 409, "p#" => 409, "l#" => 410}
[0, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 533, "n#" => 449, "p#" => 449, "l#" => 450}
[0, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 657, "n#" => 465, "p#" => 465, "l#" => 466}
[0, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 706, "n#" => 495, "p#" => 495, "l#" => 496}
[0, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 722, "n#" => 578, "p#" => 578, "l#" => 579}
[0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 742, "n#" => 579, "p#" => 579, "l#" => 580}
[0, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 749, "n#" => 597, "p#" => 597, "l#" => 598}
[0, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 752, "n#" => 618, "p#" => 618, "l#" => 619}
[0, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 764, "n#" => 671, "p#" => 671, "l#" => 672}
[0, 52, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 793, "n#" => 680, "p#" => 680, "l#" => 681}
[0, 8, 52, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 809, "n#" => 695, "p#" => 695, "l#" => 696}
[0, 14, 8, 52, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 972, "n#" => 706, "p#" => 706, "l#" => 707}
[0, 10, 14, 8, 52, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 996, "n#" => 785, "p#" => 785, "l#" => 786}
[0, 78, 10, 14, 8, 52, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 1024, "n#" => 797, "p#" => 797, "l#" => 798}
[0, 11, 78, 10, 14, 8, 52, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 1050, "n#" => 967, "p#" => 967, "l#" => 968}
[0, 169, 11, 78, 10, 14, 8, 52, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 1051, "n#" => 981, "p#" => 981, "l#" => 982}
[0, 13, 169, 11, 78, 10, 14, 8, 52, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
	{"x" => 1062, "n#" => 1014, "p#" => 1014, "l#" => 1015}
[0, 32, 13, 169, 11, 78, 10, 14, 8, 52, 20, 17, 0, 82, 29, 15, 39, 15, 16, 1, 40, 3, 51, 16, 8, 7, 4, 24, 50, 30, 8, 39, 1, 52, 15, 5]
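
a minimal sketch (hypothetical helper) of the suggested seed generator, drawing the 1-density uniformly instead of concentrating near 1/2:

# random p-bit seed with the given density of 1 bits; the leading bit is
# forced so the bit length is exactly p
def seed(p, density)
        n = 2 ** (p - 1)
        (0...(p - 1)).each { |i| n += 2 ** i if (rand() < density) }
        return n
end

n = seed(100, rand())   # density drawn uniformly instead of fixed near 0.5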
