[Template fetch failed for http://spivey.oriel.ox.ac.uk/corner/Template:Sitenotice?action=render: HTTP 404]
How to use yacc to test for ambiguity: Difference between revisions
No edit summary |
No edit summary |
||
Line 50: | Line 50: | ||
Now, with the longer string, and with @ELSE BASIC@ as the remaining input, the conflict gives two possibilities: one option is to shift the @ELSE@, in which case we can complete a parse by shifting the token @BASIC@ on top of it, then reducing in several stages as follows. | Now, with the longer string, and with @ELSE BASIC@ as the remaining input, the conflict gives two possibilities: one option is to shift the @ELSE@, in which case we can complete a parse by shifting the token @BASIC@ on top of it, then reducing in several stages as follows. | ||
IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE <u>BASIC</u> | |||
IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE | IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE <u>inner</u> | ||
IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE | IF expr THEN IF expr THEN inner ELSE <u>IF expr THEN inner ELSE stmt</u> | ||
IF expr THEN IF expr THEN inner ELSE | IF expr THEN IF expr THEN inner ELSE <u>inner</u> | ||
IF expr THEN IF expr THEN inner ELSE | IF expr THEN <u>IF expr THEN inner ELSE stmt</u> | ||
IF expr THEN | IF expr THEN <u>inner</u> | ||
IF expr THEN | <u>IF expr THEN stmt</u> | ||
stmt | |||
</ | |||
The other option is immediately to reduce by @stmt --> inner@, beginning a sequence that goes like this: | The other option is immediately to reduce by @stmt --> inner@, beginning a sequence that goes like this: | ||
IF expr THEN IF expr THEN inner ELSE IF expr THEN <u>inner</u> ELSE BASIC | |||
IF expr THEN IF expr THEN inner ELSE IF expr THEN | IF expr THEN IF expr THEN inner ELSE <u>IF expr THEN stmt</u> ELSE BASIC | ||
IF expr THEN IF expr THEN inner ELSE | IF expr THEN <u>IF expr THEN inner ELSE stmt</u> ELSE BASIC | ||
IF expr THEN | IF expr THEN inner ELSE <u>BASIC</u> | ||
IF expr THEN inner ELSE | IF expr THEN inner ELSE <u>inner</u> | ||
IF expr THEN inner ELSE | <u>IF expr THEN inner ELSE stmt</u> | ||
<u>inner</u> | |||
stmt | |||
</ | |||
The string | The string | ||
IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE BASIC | IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE BASIC |
Revision as of 10:53, 21 September 2023
There is no algorithm that decides definitively whether a given context free grammar is ambiguous, but that does not prevent us from using yacc pragmatically. If we submit a grammar (unannotated with precedence declarations) to yacc, and yacc accepts it, then we know the grammar is in the class LALR(1), and all such grammars are unambiguous. If yacc says No and reports parsing conflicts, then either the grammar is ambiguous, or it is unambiguous, but "too difficult" for yacc to handle. In the latter case, we can look at yacc's diagnostic output and use it to find a situation where a conflict is encountered: that can help us to find a string that can be derived in more than one way.
Consider, for example, the following interesting grammar provided by Matteo Cati. It purports to be an unambiguous grammar for the classic Algol--60 dangling else problem.
stmt : inner | IF expr THEN stmt inner : BASIC | IF expr THEN inner ELSE stmt
In order to submit this to yacc, we must surround it with some apparatus declaring various tokens, and supply semantic actions, all of which can be written as the trivial unit expression {()}
. A complete input is shown towards the end of this note. Invoking ocamlyacc
with
$ ocamlyacc -v cati.mly
produces the error message
1 shift/reduce conflict
and also (because of the -v
flag) a report file cati.output
that gives details of parsing automaton ocamlyacc
compiled, and where it discovered the conflict. In what follows, I've used (because why wouldn't I?) the output of my own yacc variant, but the output of ocamlyacc
is equivalent, even if the states are numbered differently. The whole of the report file is shown at the end of this note.
Examining the report, we see the conflict occurs in state 11:
State 11: shift/reduce conflict (shift 12, reduce 1) on ELSE State 11 inner --> IF expr THEN inner . ELSE stmt (4) stmt --> inner . (1) EOF reduce 1 ELSE shift 12
If the machine reaches state 11 and the lookahead is ELSE
, then the machine cannot decide between shifting the ELSE
(in the hope of reducing with the production inner --> IF expr THEN inner ELSE stmt
), or reducing immediately with the production stmt --> inner
(presumably so that it can then reduce with stmt -> IF expr THEN stmt
).
We can look for an ambiguous string by asking what sequence of moves in the machine can lead to state 11. Sadly, the simplifications from the full LR(1) process incorporated into yacc mean that the shortest such sequence does not yield an ambiguous string, but a longer sequence does.[1] Consider the following string of tokens and variables:
IF expr THEN IF expr THEN inner ELSE IF expr THEN inner
Starting in state 2 (states 0 and 1 are present for housekeeping purposes), we can trace out a trajectory in the machine as follows. In the report from yacc, state 2 contains the instruction
State 2: IF shift 4
And state 4 contains
State 4: expr goto 8
Continuing in this way, we can trace out the following path that ends at state 11, incidentally visiting state 11 along the way.
2 IF 4 expr 8 THEN 9 IF 4 expr 8 THEN 9 inner 11 ELSE 12 IF 4 expr 8 THEN 9 inner 11
The shorter sequence that stops after reaching state 11 for the first time also leaves the yacc-based parser unable to decide what to do next, but it does not allow both decisions to lead to a succesful parsing outcome.
Now, with the longer string, and with ELSE BASIC
as the remaining input, the conflict gives two possibilities: one option is to shift the ELSE
, in which case we can complete a parse by shifting the token BASIC
on top of it, then reducing in several stages as follows.
IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE BASIC IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE inner IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE stmt IF expr THEN IF expr THEN inner ELSE inner IF expr THEN IF expr THEN inner ELSE stmt IF expr THEN inner IF expr THEN stmt stmt
The other option is immediately to reduce by stmt --> inner
, beginning a sequence that goes like this:
IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE BASIC IF expr THEN IF expr THEN inner ELSE IF expr THEN stmt ELSE BASIC IF expr THEN IF expr THEN inner ELSE stmt ELSE BASIC IF expr THEN inner ELSE BASIC IF expr THEN inner ELSE inner IF expr THEN inner ELSE stmt inner stmt
The string
IF expr THEN IF expr THEN inner ELSE IF expr THEN inner ELSE BASIC
is not a string of tokens, but the sequence
IF X THEN IF X THEN BASIC ELSE IF X THEN BASIC ELSE BASIC
leads, with some reduction along the way, to the same configuration, and is therefore a witness that the grammar is ambiguous.
Ocamlyacc input
%token BASIC X IF THEN ELSE %type<unit> stmt %start stmt %% stmt : inner {()} | IF expr THEN stmt {()} ; inner : BASIC {()} | IF expr THEN inner ELSE stmt {()} ; expr : X {()} ;
Report file
Rules: 0: *start* --> *entry* EOF 1: stmt --> inner 2: stmt --> IF expr THEN stmt 3: inner --> BASIC 4: inner --> IF expr THEN inner ELSE stmt 5: expr --> X 6: *entry* --> *1* stmt State 0 *start* --> . *entry* EOF (0) *1* shift 2 *entry* goto 1 State 1 *start* --> *entry* . EOF (0) State 2 *entry* --> *1* . stmt (6) BASIC shift 3 IF shift 4 stmt goto 5 inner goto 6 State 3 inner --> BASIC . (3) . reduce 3 State 4 stmt --> IF . expr THEN stmt (2) inner --> IF . expr THEN inner ELSE stmt (4) X shift 7 expr goto 8 State 5 *entry* --> *1* stmt . (6) . reduce 6 State 6 stmt --> inner . (1) . reduce 1 State 7 expr --> X . (5) . reduce 5 State 8 stmt --> IF expr . THEN stmt (2) inner --> IF expr . THEN inner ELSE stmt (4) THEN shift 9 State 9 stmt --> IF expr THEN . stmt (2) inner --> IF expr THEN . inner ELSE stmt (4) BASIC shift 3 IF shift 4 stmt goto 10 inner goto 11 State 10 stmt --> IF expr THEN stmt . (2) . reduce 2 State 11: shift/reduce conflict (shift 12, reduce 1) on ELSE State 11 inner --> IF expr THEN inner . ELSE stmt (4) stmt --> inner . (1) EOF reduce 1 ELSE shift 12 State 12 inner --> IF expr THEN inner ELSE . stmt (4) BASIC shift 3 IF shift 4 stmt goto 13 inner goto 6 State 13 inner --> IF expr THEN inner ELSE stmt . (4) . reduce 4
- ↑ The longer sequence can be found immediately by using an LR(1) parser generator, such as the one at http://jsmachines.sourceforge.net/machines/lr1.html . Here is a suitable grammar:
S' -> S S -> I S -> if E then S I -> b I -> if E then I else S E -> x
The analysis reveals a parsing conflict, and the shortest string that leads to the state with a conflict is the one shown as an ambiguous input above.