Note: I've just migrated to a different physical server to run Spivey's Corner,
with a new architecture, a new operating system, a new version of PHP, and an updated version of MediaWiki.
Please let me know if anything needs adjustment! – Mike

Deriving regexps from finite automata

Copyright © 2024 J. M. Spivey
Jump to navigation Jump to search

The standard argument for one half of Kleene's theorem is often acknowledged to be complicated. It becomes more so when presented in a form that involves quantities written [math]\displaystyle{ E_{q_1q_2}^{S\cup\{q\}} }[/math]. Much better is to view it as a sequence of transformations on hybrid automata, which are like NFAs except that each transition can be labelled with a regular expression. The definition is easy:

A string \(s\) is in the language accepted by a hybrid automaton if there is a path of length \(n\) in the automaton and a way of splitting \(s\) into \(n\) pieces such that each piece is in the language described by the regexp labelling the corresponding step in the path.

For simplicity, it is often best to begin the sequence of transformations by forming the 'suspension' of the original automaton, adding new start and finish states \(q_i\) and \(q_f\) with \(\epsilon\)-transitions so that the machine never re-enters the new start state or carries on from the new finish state.

Then, as in the algorithm given, we begin to eliminate intermediate states until there are none left, replacing each path that goes via an intermediate state with direct connections labelled with an appropriate regexp. To make the construction clearer, we can first split a state that has more than one outgoing transition, exploiting nondeterminism to replace it with multiple states that have the same incoming transitions but have only one outgoing transition each. This move is not necessary to the construction, but it reduces the mental bandwidth needed in concrete examples to see that each stage in the transformation is equivalent to the preceding stage. In the example attached below, splitting \(q_2\) reveals that it has two 'purposes': one is to return to \(q_1\) via a \(b\), the other to go on to state \(q_0\) via an \(a\), and these paths yield different elements of the final regular expression.

Here (click for a larger size) are the steps needed to deal with the automaton mentioned on the problem sheet.

Clean-Kleene.jpg