Reconstructing Ruby, Part 5: The dreaded shift/reduce

Read Part 4 in case you missed it.

Sorry for the months of silence. While this post is going to be short, I'm getting back to working on new content for the series.

So one problem that you'll often see when writing a parser is a shift/reduce conflict. In fact, at this point, if you run make you'll probably see the following:

bison -d parse.y  
parse.y: conflicts: 1 shift/reduce  
flex ruby.l  
cc -o ruby main.c parse.tab.c lex.yy.c  

A shift/reduce conflict happens when we introduce an ambiguity in our grammer. An ambiguity means the parser doesn't know if it should shift the next terminal onto the stack or if it should reduce according to a specified rule. You might also encounter a reduce/reduce conflict where the parser doesn't know which rule it should use to reduce, but those are typically easy to resolve.

Given we have a shift/reduce conflict, it would be useful for us to be able to see what is causing the current shift/reduce conflict. We can do this by changing our Makefile rule for parse.tab.c to the following:

parse.tab.c: parse.y  
    bison -v -d parse.y

The -v flag is used to generate an output file which breaks down the rules and states of our parser. A state refers to how the parser has interpreted the different terminals. As it reduces these terminals to non-terminals or shifts new tokens on to its stack it will change which state it's in.

Now, when we run make it will generate the file parse.output. At the top of that file you should see something like:

State 8 conflicts: 1 shift/reduce  

If we look at state 8, we see the following:

state 8                                         

    5 expression: expression . tPLUS expression
    5           | expression tPLUS expression .

    tPLUS  shift, and go to state 7

    tPLUS     [reduce using rule 5 (expression)]
    $default  reduce using rule 5 (expression)

Here we can see that our parser is unsure whether it should shift a tPLUS token or reduce using rule 5 which reduces expression tPLUS expression to expression. A situation where this ambiguity would show up is in the program:

1 + 2 + 3  

Essentially our parser doesn't know if it should perform the operations as (1 + 2) + 3 or 1 + (2 + 3). While this isn't a problem mathematically, we should do our best to eliminate ambiquities.

Luckily for us we can easily solve this by telling the "+" token how it should reduce. In this case we'll tell it to always reduce everything to the left of it (essentially choosing the (1 + 2) + 3 path). To solve this add:

%left tPLUS

right above the %token declarations. Now when we run make we'll no longer see the shift/reduce conflict. We'll continue to utilize the parser.output file whenever a shift/reduce conflict arises.

In the next post we'll introduce a simple Abstract Syntax Tree, or AST, and the grammar a bit to cover more of the ruby language.

If you're having any problems you can check the reference implementation on GitHub or look specifically at the diff to see what I've changed. Additionally, if you have any comments or feedback I'd greatly appreciate if you left a comment!

Update: read Part 6 of this series.