Fibonacci Regex Perversity

Repeated Substitutions

Consider these two regex substitutions:

s/fi?b/i/
s/fii(i*)b/f$1bfi$1b/

(For those unfamiliar with Perlish regexes: that first one says “replace the string fb or fib with the string i”. The second one says “replace a string fiiXb with fXbfiXb, where X is zero or more i s.”)

We can repeatedly apply these rules to a string until the string stops changing. So for example, our string might mutate as follows:

fiiiiib
fiiibfiiiib
fibfiibfiibfiiib
ifiibfiibfiiib
ifbfibfbfibfibfiib
iiiiiifbfib
iiiiiiii

Expanding the path of fiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiib is left as an exercise to the reader, although this perl script might help:

#!/usr/bin/perl -w

$_ = "fiiiiib";
print "$_\n";

print "$_\n" while s/fi?b/i/g || s/fii(i*)b/f$1bfi$1b/g;

What on earth is this all this substituion doing? Well, it is calculating Fibonacci numbers of course!

Thinking In Unary

Regexes don’t handle arithmetic well, so we represent numbers in unary ... a string of n i s represents the number n. When dealing with unary, you can add numbers by simply appending them. f and b are like parens around the number we’re calculating the Fibonacci number of. So iiiii represents the number 5, and fiiiiib represents the fifth Fibonacci number, aka fib(5).

So the sequence of strings above could also be written:

fib(5)
fib(3)+fib(4)
fib(1)+fib(2)+fib(2)+fib(3)
1+fib(0)+fib(1)+fib(0)+fib(1)+fib(1)+fib(2)
6+fib(0)+fib(1)
8

So really, any language that allows a sufficiently powerful regex mechanism is able to calculate Fibonacci numbers, albeit using the worst algorithm in the world.

Turing Machines

And it is pretty easy to see how to implement a Turing machine by representing each state transition as a regex substitution. For example, we can represent the "tape" as a string of 0s and 1s, and the "head" as a symbol inserted just before the "current" symbol. The tape can be extended indefinitely by matching the end of the string as if it were a blank symbol.

Taking the Copy Subroutine as an example, using 0 and 1 for the tape symbols 0 and 1, and using A through E to represent the head in states s0 through s5, and H for the head when halted. Translating to that notation, the state transitions are:

Current State Current Symbol New Symbol Move Head New State
A 0     H
A 1 0 R B
B 0 0 R C
B 1 1 R B
C 0 1 L D
C 1 1 R C
D 0 0 L E
D 1 1 L D
E 0 1 R A
E 1 1 L E

We can cope with the unbound tape by either matching $ (end-of-string) as if it were a 0, or by simply extending the tape any time the head is at the very end of it, eg:

s/([A-Z])$/${1}0/

And the rest of the state transitions become simple string substitutions. To move the head right, simply swap it with the symbol you've just written. To move it left, put it behind the preceding symbol:

s/A0/H0/
s/A1/0B/
s/B0/0C/
s/B1/1B/
s/0C0/D01/
s/1C0/D11/
s/C1/1C/
s/0D0/E00/
s/1D1/D11/
s/E0/1A/
s/0E1/E01
s/1E1/E11

We can write this into a Perl script too:

#!/usr/bin/perl -w

# Initial state of the tape.
$_ = "A1111";
print "$_\n";

# Look out for the halting state
while (! /H/) {

    # If we're at the end of the tape, extend it.
    s/([A-Z])$/${1}0/;

    # Do exactly one substitution.
    s/A0/H0/   ||
    s/A1/0B/   ||
    s/B0/0C/   ||
    s/B1/1B/   ||
    s/0C0/D01/ ||
    s/1C0/D11/ ||
    s/C1/1C/   ||
    s/0D0/E00/ ||
    s/1D0/E10/ ||
    s/0D1/D01/ ||
    s/1D1/D11/ ||
    s/E0/1A/   ||
    s/0E1/E01/ ||
    s/1E1/E11/ ;

    print "$_\n";
}

and when we run it we can watch the turing machine at work:

A111
0B11
01B1
011B
0110C
011D01
01E101
0E1101
E01101
1A1101
10B101
101B01
1010C1
10101C
1010D11
101D011
10E1011
1E01011
11A1011
110B011
1100C11
11001C1
110011C
11001D11
1100D111
110D0111
11E00111
111A0111
111H0111

so these languages are bound to be Turing Complete as well, even if they do turn out to be Turing Tarpits.

A Farewall To Unary

Unary arithmetic is tedious and inefficient. Fortunately, Perl Regexs also offer an "expression" mode, which lets us do away with unary and instead write some "advanced" rules which handle decimal arithmetic:

s/(\d+)\+(\d+)/$1+$2/eg;
s/(\d+)-(\d+)/$1-$2/eg;

Now, this is kind of cheating: those aren't really regular expression substitutions at all! If we're going to allow the s///e form, we're really allowing anything at all, since you could just do s/(.*)/artibrary_function($1)/. In fact, we can even go a bit silly and say that if it looks like an arithmetic expression it is Perl's problem:

s/(\d+([-+*\/]\d+)+)/eval($1)/eg

But the point is, once we've defined some basic arithmetic in these terms, we can define the rest of our fib function in such a way as to use these:

s/fib\([01]\)/1/g;
s/fib\((\d+)\)/fib($1-1)+fib($1-2)/g;

We can run this set of expressions just like before:

fib(5)
fib(5-1)+fib(5-2)
fib(4)+fib(3)
fib(4-1)+fib(4-2)+fib(3-1)+fib(3-2)
fib(3)+fib(2)+fib(2)+fib(1)
fib(3)+fib(2)+fib(2)+1
fib(3-1)+fib(3-2)+fib(2-1)+fib(2-2)+fib(2-1)+fib(2-2)+1
fib(2)+fib(1)+fib(1)+fib(0)+fib(1)+fib(0)+1
fib(2)+1+1+1+1+1+1
fib(2)+6
fib(2-1)+fib(2-2)+6
fib(1)+fib(0)+6
1+1+6
8

What's The Point Of All This?

I’m quite interested in substitution as a kind of pure functional programming. This article is slowly expanding in that direction ... More later.