# grammar compression on collatz sequences

this is an idea that I tried out a few months ago but that ended in a negative result. a slight variation leads to positive results that look quite promising to me. am posting it immediately in very preliminary/rough form because it's quite novel and am excited about the success. suspect it could lead to a lot of different analysis ideas and maybe be a key milestone/change in direction.

for years have been very interested in grammar compression related to TCS-type research.[1] it seems to me it could lead to very deep results. kolmogorov complexity analysis has a fairly rough but natural relation to grammar compression. there are natural ways to express ideas about various important complexity results in terms of grammar compression. I tend to see a lot of complexity theory in terms of compression ideas & think the field would very much benefit from a review or conceptual reorientation/reformulation from that pov/angle, eg in the way that much of complexity theory can be approached from the circuit pov, ie as a comprehensive “worldview”.
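for readers new to the idea, here's a toy sketch of my own (not from the collatz code below, which names its rules with lowercase strings instead): a small set of rewrite rules whose expansion reproduces a string acts as a compressed representation of that string.

```ruby
# toy grammar compression: the two rules below are a compressed representation
# of the string they expand to; lowercase letters are terminals,
# uppercase letters are nonterminals (rule names)
def expand_sym(sym, rules)
  return [sym] unless rules.key?(sym)                 # terminal: emit as-is
  rules[sym].flat_map { |t| expand_sym(t, rules) }    # nonterminal: expand
end

rules = { 'S' => %w[A A b], 'A' => %w[a b] }
p expand_sym('S', rules).join   # "ababb"
```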

the hypothesis being tested is that there may be some complex grammar (de)compositions of collatz-related sequences. the initial idea, which failed, was to use the exact sequences. there is some repetition in this data (eg some trajectories have the same “tails”) but experimentally there is no “hierarchy” in these decompositions.
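to make the “same tails” observation concrete, a quick toy sketch (my own, not part of the code below): trajectories of different seeds merge at some iterate and coincide from there on.

```ruby
# toy illustration of shared trajectory tails: the full collatz trajectories
# of 3 and 13 merge at 10 and are identical from there on
def traj(n)
  l = [n]
  l << (n = n.even? ? n / 2 : 3 * n + 1) while n > 1
  l
end

p traj(3)    # [3, 10, 5, 16, 8, 4, 2, 1]
p traj(13)   # [13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
```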

in contrast, the successful approach is to look for compressions of the sequences of 0s/1s that denote the two different collatz steps, $\frac{x}{2}$ and $3x+1$, denoted 0 and 1 respectively. there is significant hierarchy in these grammar compressions. the approach here is a greedy grammar compression based on finding the longest repeated substrings. the graph was obtained via a cool online/web-based graphviz viewer.[3] this worked for starting seeds up to about 750 before causing the viewer/graphviz generator to fail. above is the graph for max seed 500. as the graph shows, there is significant hierarchical grammar decomposition structure.
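the encoding itself, in its plainest form, can be sketched as below (a minimal sketch; note the f2 routine in the code further down records a slightly compressed shortcut variant of this rather than the plain version):

```ruby
# encode a collatz trajectory as the 0/1 sequence of its steps:
# 0 for a halving step x/2, 1 for a 3x+1 step
def steps(n)
  l = []
  while n > 1
    if n.even?
      n /= 2
      l << 0
    else
      n = 3 * n + 1
      l << 1
    end
  end
  l
end

p steps(3)   # [1, 0, 1, 0, 0, 0, 0]
```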

however, unfortunately this layout algorithm is clearly not so smart; eg the “ugdec” vertices form a block on the left that very obviously would work better (with fewer edge crossings) on the right side of the layout. it doesn't exactly seem to be a force-based layout. back to the drawing board! also, the patterns will likely not become apparent without much larger graphs, and graphviz seems like it might not scale too well to large graphs [at least that was the case last time I played with it…]

suspect at the moment this could be the start of a “long relationship” (of investigation using this & similar techniques) wink .. “just the beginning”? 😀

ok, you want the punchline, right? the idea is to find grammar decompositions that have more regularity or structure than others. then the holy grail of “induction” can't be far behind.[2] notice that decompositions are not at all unique and the same sequence may have significantly different decompositions. this “flexibility” is a sure sign of where to look for different induction/theorem-proving directions.
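to make the non-uniqueness concrete, a toy sketch of my own (again not from the code below): the same sequence can be generated by structurally quite different grammars, one flat and one hierarchical.

```ruby
# two different grammars for the same sequence "abababab":
# g1: S -> A A A A, A -> a b         (flat decomposition)
# g2: S -> B B, B -> A A, A -> a b   (hierarchical decomposition)
def expand_sym(sym, rules)
  return [sym] unless rules.key?(sym)
  rules[sym].flat_map { |t| expand_sym(t, rules) }
end

g1 = { 'S' => %w[A A A A], 'A' => %w[a b] }
g2 = { 'S' => %w[B B], 'B' => %w[A A], 'A' => %w[a b] }
p expand_sym('S', g1).join == expand_sym('S', g2).join   # true
```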

oh! I just tried the different layout methods neato and twopi on the same graph and got very different layouts. wonder if they work for more vertices? the 3rd graph is for seeds up to 750 in neato.

def substr(l)
    c = l.size

    # build and sort all suffixes of l
    l2 = []
    c.times { |i| l2 << l[i..-1] }
    l2.sort!

    # p[i] = length of the common prefix of adjacent sorted suffixes
    p = []
    (c - 1).times \
    {
        |i|
        n = 0
        while (n < l2[i].size && n < l2[i + 1].size && l2[i][n] == l2[i + 1][n])
            n += 1
        end
        p[i] = n
    }

    # rank common-prefix lengths, longest first
    a = (0...p.size).to_a.map { |i| [p[i], i] }
    a = a.sort.reverse

    sz = 0
    m2 = []
    sz2 = 0

    # greedily search for the longest substring repeated at
    # non-overlapping positions, tracking the best found so far
    (1...l.size).each \
    {
        |i|
        x = a[i - 1][1]
        next if (l2[x].size < sz)
        mx = nil
        m = []
        sz2 = sz
        (x - 1).downto(0).each \
        {
            |j|
            # mx = common-prefix length between suffixes x and j
            mx = mx.nil? ? p[j] : [p[j], mx].min
            break if (mx == 0 || sz > mx)

            d = (l2[x].size - l2[j].size).abs
            next if (d < sz)

            w = [mx, d].min
            next if (sz > w)

            if (m.empty? || w > sz) then
                m = [c - l2[x].size]
                f = true
            else
                # skip occurrences that overlap ones already found
                next if (m.find { |z| (c - l2[j].size - z).abs < sz })
            end
            m << c - l2[j].size
            #            p(['***', w, m])
            sz = w
        }
        sz2, m2 = [[sz, m], [sz2, m2]].max_by { |x| [x[0], x[1].size] }
    }

    return [sz2, m2.sort]
end

def compress(l)
    sz, m = substr(l)

    #    p([sz, m])
    #    exit

    # no repeated substring longer than 1: nothing left to compress
    return l if (sz <= 1)

    # recursively compress the repeated substring itself
    l2 = compress(l[m.first, sz])

    s = $s.dup
    $x[s] = l2

    if ($x2.member?(l2)) then
        s = $x2[l2]
    else
        $x2[l2] = s
        $y[s] = $n
        $n += 1
        $s.succ!
    end

    # replace every occurrence with the new nonterminal s,
    # shifting the remaining positions left after each substitution
    l2 = l.dup
    m.size.times \
    {
        |i|
        l2[m[i], sz] = s
        m = m.map { |j| j - sz + 1 }
    }
    return compress(l2)
end

def expand(l, l2, x)
    # sanity check: expanding the grammar must reproduce the original sequence
    l1 = l2.dup
    i = 0
    while (i < l1.size)
        if (!/^[a-z]+$/.match(l1[i]))
            i += 1
            next
        end
        l1[i, 1] = x[l1[i]]
    end
    raise if (l1 != l)
end

def compress2(l)
    # globals: $s = next nonterminal name, $x = rules, $x2 = reverse rule
    # lookup, $y = rule numbering, $n = next rule number
    $s = 'a'
    $x = {}
    $x2 = {}
    $y = {}
    $n = l.max + 1
    l = l.map { |x| x.to_s }
    l2 = compress(l)
    expand(l, l2, $x)
    return l2
end

# (unused in this script) trajectory of odd iterates of n until
# it drops below the starting seed
def f(n)
    l = [n]
    c = 0
    n2 = n
    while (n2 >= n)
        n2 = n2 * 3 + 1
        n2 >>= 1 while (n2.even?)
        l << n2
        c += 1
    end
    return l
end

# 0/1 sequence for seed n: records the parity after each shortcut step
# n2 -> (3*n2 + 1) / 2, then strips any remaining factors of 2
def f2(n)
    l = []
    c = 0
    n2 = n
    while (n2 > 1)
        n2 = (n2 * 3 + 1) / 2
        l << n2 % 2
        n2 >>= 1 while (n2.even?)
        c += 1
    end
    return l
end

def out()
    $x.each \
    {
        |x, y|
        # s = true iff rule y references other nonterminals (ie, hierarchy)
        s = !y.select { |z| /^[a-z]+$/.match(z) }.empty?
        p([x, s, y])
    }
end

# concatenate 0/1 step sequences for odd seeds until ~500 symbols collected
n = 3
c = 0
x2 = 0
l2 = []
loop \
{
    l = f2(n)
    l2 += l
    n += 2
    break if (l2.size >= 500)
}

l = compress2(l2)
#out()

# emit the grammar hierarchy as a graphviz graph: an edge from each rule
# to every nonterminal appearing on its right-hand side
puts("graph {")
$x.each \
{
    |x, y|
    y.select { |z| /^[a-z]+$/.match(z) }.each \
    {
        |z|
        puts([x, '--', z, ';'].join("\t"))
    }
}
puts("}")
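for intuition on what substr above is doing, here's a naive quadratic sketch of (what I read as) the same idea; the real version works via sorted suffixes and is much faster, so treat this simplification as an assumption about its semantics rather than a drop-in replacement.

```ruby
# naive sketch: find the longest substring of l repeated at non-overlapping
# positions, returned as [length, [first_pos, second_pos]]; for intuition only
def naive_repeat(l)
  l.size.downto(1) do |sz|
    seen = {}
    (0..l.size - sz).each do |i|
      j = seen[l[i, sz]]
      return [sz, [j, i]] if j && i - j >= sz   # non-overlapping repeat found
      seen[l[i, sz]] ||= i                      # remember earliest occurrence
    end
  end
  [0, []]
end

p naive_repeat([1, 0, 1, 0, 1, 0])   # [2, [0, 2]]
```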

• [1] grammar compression, wikipedia
• [2] grammar induction, wikipedia
• [3] erdos simple online graphviz viewer