Implementation Details for Hopcroft & Ullman's Universal Turing Machine

Tape encoding

It's probably easiest to demonstrate the tape encoding by example.

Say we have a 2-state Turing machine that accepts strings of the form (01)* (that is, 0 or more repetitions of the pattern "01"). Its tape alphabet is {0, 1, b} where "b" denotes a blank, and its state transition table looks like this:

Tape symbol
b 0 1
State 1 (3,R,1) (2,R,0) reject
2 reject reject (1,R,1)
3 accept

Encoding is done with these rules:

  1. The transitions of the accepting state are encoded by the symbol "0".

  2. The transition reject is encoded by the symbol "0".

  3. For other transitions, the next state (1, 2, ...) should be replaced by the equivalent number of "1"s. An "R" or "L" is appended, followed by the tape symbol to be printed ("0" or "1").

  4. Each transition is preceded by the symbol "c". Transitions are read from the state transition table in top-to-bottom and right-to-left order (i.e., first the transition for state 1, tape symbol 0, then 1, then b, then the same for rest of the states).

  5. States are preceded by an additional "c".

  6. The first state is preceded by an additional "c".

  7. The data immediately follows the state transition information, and is preceded by "ccc". Data consists of zero or more "0"s and "1"s, terminated with a "b".

Applying these to our sample Turing machine, and with a sample data tape of "0101", we get (with some added spaces)

    ccc 111R1 c 11R0 c 0 cc 0 c 0 c 1R1 cc 0 c 0 c 0 ccc 0101
        ^--------------^    ^---------^    ^-------^     ^--^
            state 1           state 2       state 3      data

Removing the spaces yields "ccc111R1c11R0c0cc0c0c1R1cc0c0c0ccc0101", which is the encoded state transition table and data.

Initialization

The UTM begins in state 1.

The tape encoding the Turing machine to be run must be modified in the following ways:

  1. Reading from the left, the third "c" is replaced by the special "marked" symbol (mc). Note that parenthesized symbols denote one symbol on the tape, and are not treated as literal strings.

  2. The first symbol in the data area is replaced by its corresponding marked version (m0), (m1), or (mb).

The UTM's tape head should begin by pointing to the leftmost symbol.

For the example tape encoding above, this yields

       m                               m
     ccc111R1c11R0c0cc0c0c1R1cc0c0c0ccc0101
     ^

Where ^ denotes the position of the UTM's tape head.

Now let 'er rip.


Last updated 4 April 2003
http://www.rdrop.com/~half/General/UTM/ImplementationDetails.html
All contents ©2000-2002 Mark L. Irons except encoding scheme ©1969 Hopcroft & Ullman.