grammar compression on collatz sequences

graph500this is an idea that I tried out a few months ago but failed with a negative result. a slight variation leads to positive results that look quite promising to me. am immed posting it in very preliminary/rough form because its quite novel and am excited about the success. suspect it could lead to a lot of different analysis ideas and maybe be a key milestone/change in direction.

for years have been very interested in grammar compression related to TCS type research.[1] it seems to me it could lead to very deep results. kolmogorov complexity analysis has some fairly rough natural relation to grammar compression. there are natural ways to express ideas about various important complexity results in terms of grammar compression. I tend to see a lot of complexity theory in terms of compression ideas & think the field would very much benefit from a review or conceptual reorientation/reformulation from that pov/angle. eg in the way that much of complexity theory can be approach from the circuit pov. ie a comprehensive “worldview”.

the hypothesis being tested is that there may be some complex grammar (de)compositions of collatz-related sequences. the idea that failed was to use the exact sequences. there is some repeat of sequences in this data (eg some trajectories have the same “tails”) but there is experimentally no “hierarchy” in these decomposition.

in contrast, the successful result is to look for compressions of the sequences of 0/1s that denote the two different collatz steps, either \frac{x}{2} or 3x+1 denoted 0/1 respectively. there is significant hierarchy in these grammar compressions. the approach here is a greedy grammar compression based on finding longest identical substrings. the graph was obtained via a cool online/web-based graphviz viewer.[3] this worked for starting seeds up to about 750 before causing the viewer/graphviz generator to fail. above is the graph for max 500 seed. as in the graph, there is significant hierarchical grammar decomposition structure.

however unfortunately this layout algorithm is clearly not so smart, eg the “ugdec” vertices form a block on the left that very obviously would work better (with fewer edge crossings) on the right side of the layout. it doesnt exacty seem to be a force-based layout. back to the drawing board! also, the patterns will likely not at all become apparent without much larger graphs, and graphviz seems like it might not scale too well to large graphs [at least that was the case last time I played with it…]

suspect at moment this could be the start of a “long relationship” (of investigation using this & similar techniques) wink .. “just the beginning”? 😀

ok, you want the punchline right? the idea is to find grammar decompositions that have more regularity or structure than others. then the holy grail of “induction” cant be far behind.[2] notice that decompositions are not at all unique and the same sequence may have significantly different decompositions. this “flexibility” is a sure sign of where to look for different induction/theorem-proving directions.

oh! I just tried the different layout methods neato and twopi on the same graph and got much different layouts. wonder if they work for more vertices? the 3rd graph is for seeds up to 750 in neato.

graph_neato

graph_twopi

graph_neato_750

def substr(l)

    c = l.size
    
    l2 = []

    c.times { |i| l2 << l[i..-1] }
    l2.sort!


    p = []

    (c - 1).times \
    {
        |i|

        n = 0
        while (l2[i][n] == l2[i + 1][n] && n < l2[i].size && n < l2[i + 1].size)
            n += 1
        end
        p[i] = n
    }

    a = (0...p.size).to_a.map { |i| [p[i], i] }

    a = a.sort.reverse

    sz = 0
    m2 = []
    sz2 = 0

    (1...l.size).each \
    {
        |i|

        x = a[i-1][1]
        next if (l2[x].size < sz)
        mx = nil
        m = []
        sz2 = sz
        (x - 1).downto(0).each \
        {
            |j|
            mx = mx.nil? ? p[j] : [p[j], mx].min
            
            break if (mx == 0 || sz > mx)
        
            d = (l2[x].size - l2[j].size).abs
            next if (d < sz)
            
            w = [mx, d].min
            next if (sz > w)
            
            if (m.empty? || w > sz) then
                m = [c - l2[x].size]
                f = true
            else
                next if (m.find { |z| (c - l2[j].size - z).abs < sz })
            end
            m << c - l2[j].size
#            p(['***', w, m])
            sz = w
        }
        sz2, m2 = [[sz, m], [sz2, m2]].max_by { |x| [x[0], x[1].size] }
    }

    return [sz2, m2.sort]
    
end

def compress(l)

    sz, m = substr(l)
    
#    p([sz, m])
#    exit
    
    return l if (sz <= 1)

    l2 = compress(l[m.first, sz])

    s = $s.dup

    $x[s] = l2
    
    if ($x2.member?(l2)) then
	s = $x2[l2]
    else
        $x2[l2] = s
	$y[s] = $n
        $n += 1
        $s.succ!
    end
    
    l2 = l.dup
    
    m.size.times \
    {
        |i|
        l2[m[i], sz] = s
        m = m.map { |j| j - sz + 1 }
    }
    
    return compress(l2)
end

def expand(l, l2, x)
    l1 = l2.dup
    
    i = 0
    while (i < l1.size)
        
        if (!/^[a-z]+$/.match(l1[i]))
            i += 1
            next
        end
        
        l1[i, 1] = x[l1[i]]
    end
    raise if (l1 != l)
end


def compress2(l)
    $s = 'a'
    $x = {}
    $x2 = {}
    $y = {}
    $n = l.max + 1
    l = l.map { |x| x.to_s }
    l2 = compress(l)
    
    expand(l, l2, $x)
    return l2
end

def f(n)
	l = [n]
	c = 0
	n2 = n
	while (n2 >= n)
		n2 = n2 * 3 + 1
		n2 >>= 1 while (n2.even?) 
		l << n2
		c += 1
	end
	return l
end		
		
def f2(n)
	l = []
	c = 0
	n2 = n
	while (n2 > 1)
		n2 = (n2 * 3 + 1) / 2
		l << n2 % 2
		n2 >>= 1 while (n2.even?) 
		c += 1
	end
	return l
end		

def out()
	$x.each \
	{
		|x, y|
		s = !y.select { |z| /^[a-z]+$/.match(z) }.empty?
		p([x, s, y])
	}

end

n = 3
c = 0
x2 = 0
l2 = []
loop \
{
	l = f2(n)
	l2 += l
	n += 2
	break if (l2.size >= 500)
}

l = compress2(l2)
#out()

puts("graph {")
$x.each \
{
	|x, y|
	y.select { |z| /^[a-z]+$/.match(z)  }.each \
	{
		|z|
		puts([x, '--', z, ';'].join("\t"))
	}
}
puts("}")
  • [1] grammar compression, wikipedia
  • [2] grammar induction, wikipedia
  • [2] erdos simple online graphviz viewer
    Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s