Reconstructing Ruby, Part 8: Make our parser reentrant

Read Part 7 in case you missed it.

Reentrantcy basically means our parser doesn't use any global state. This will eventually allow us to write threaded programs. Since Ruby supports threading (kind of) we'll want to make our parser reentrant. We'll talk more about threading once we start working on the actual implementation, but for now, we'll just make the necessary changes to our parser.

Luckily bison provides a convenient directive for generating a reentrant parser. All we need to do is add the following to our parser:

%pure-parser

This makes it so yylval, yylloc, and other internal variables are no longer global variables but local. Since they are all now local variables we're we could potentially run yyparse in multiple threads without consequence.

At this point, running make will generate a bunch of errors. We'll tackle this set first:

parse.tab.c:1368:16: error: too many arguments to function  
  call, expected 0, have 1
      yychar = YYLEX;
               ^~~<del>
parse.tab.c:713:23: note: expanded from macro 'YYLEX'  
# define YYLEX yylex (&yylval)
               </del>~<del>  ^</del>~~~~
parse.y:29:3: note: 'yylex' declared here  
  extern int yylex(void);
  ^
1 error generated.  

The first error is telling us that yylex now takes an argument. In this case, yylval needs to be passed in. We'll need to change our extern to reflect this. Move this line:

  extern int yylex(void);

below our %union directive and change void to YYSTYPE *yylval like so:

%union {
  int ival;
  float fval;
  char *sval;
}

%{
  extern int yylex(YYSTYPE *yylval);
%}

We need to have this declaration below %union because the union directive will define the type YYSTYPE.

To fix the next error:

ruby.l:21:3: error: use of undeclared identifier 'yylval'  
{ yylval.sval = strdup(yytext); TOKEN(STRING); }

We'll need to change the definition of yylex in our lexer. It should now accept yylval as an argument. We'll do this by definining a YY_DECL macro in our ruby.l file like so:

%{
  #include <stdio.h>
  #include <string.h>
  #include "node.h"
  #include "parse.tab.h"
  #define YY_DECL int yylex(YYSTYPE *yylval)
  #define TOKEN(id) return t##id
  #define KEYWORD(id) return k##id
%}

If you run make now, you'll see it complains about yylval being a pointer and that we should use -> instead of .. Let's make those changes to our lexer rules:

\"([^"]|\\.)*\" { yylval->sval = strdup(yytext); TOKEN(STRING); }
\'([^']|\\.)*\' { yylval->sval = strdup(yytext); TOKEN(STRING); }
{NUMBER}(\.{NUMBER}|(\.{NUMBER})?[eE][+-]?{NUMBER}) { yylval->fval = atof(yytext); TOKEN(FLOAT); }
{NUMBER} { yylval->ival = atoi(yytext); TOKEN(NUMBER); }
[a-z_][a-zA-Z0-9_]* { yylval->sval = strdup(yytext); TOKEN(ID); }
[A-Z][a-zA-Z0-9_]* { yylval->sval = strdup(yytext); TOKEN(CONSTANT); }

If we run make one last time everything compiles correctly. We now have a reentrant parser!

One other thing that will be useful to setup now is to change our parser to take a parameter. This parameter will be a struct that holds information about the current parser status. We'll start simple by only storing the filename and line number for now.

First let's create the file node.h where we'll store our struct definition. In the future, we'll use this file to define some nodes for our Abstract Syntax Tree or AST. Inside of the node.h file, add the following:

#ifndef _NODE_H_
#define _NODE_H_

typedef struct parser_state {  
    char *source_file;
    int source_line;
} parser_state;

#endif

In the actual ruby implementation, this struct is known as parser_params. source_file and source_line are both parser_ruby_sourcefile and parser_ruby_sourceline respectively.

Now that we've defined our struct we'll need to include node.h before we include parse.tab.h anywhere. You'll need to make this change in both main.c and ruby.l. You'll also want to include node.h at the top of parse.y.

Now right below %pure-parser add the following lines:

%parse-param { parser_state *state }
%lex-param { state }

The first line changes yyparseto take in an argument to manage it's state. The second line also passes this state to the lexer. We'll need to make some changes to yylex as well. First we'll need to change our forward declaration of yylex in parse.y to the following:

%{                                                       
  extern int yylex(YYSTYPE *yylval, parser_state *state);
%}                   

Then you'll change this line in ruby.l:

#define YY_DECL int yylex(YYSTYPE *yylval)

to:

#define YY_DECL int yylex(YYSTYPE *yylval, parser_state *state)

Then in order to pull this all together we'll need to change main so that it passes in a parser_state, and additionally, we'll set the filename:

int main(int argc, char *argv[]) {  
  parser_state state = { NULL, 0 };

  if(argc > 1) {
    state.source_file = strdup(argv[1]);
    yyin = fopen(argv[1], "r");
  }

  yyparse(&state);
  return EXIT_SUCCESS;
}

You'll also need to add #include <string.h> to the top of your main.c in order to support strdup. Lastly, in order to see our parser state working correctly change the program rule in parse.y to the following:

program: expressions { printf("%s\n", state->source_file); }  

Now let's try it out:

$ make
$ ./ruby program.rb
program.rb  

Nice it works!

One problem we currently have in our implementation is that we're leaking memory. In the next post I'll discuss setting up Valgrind, a memory profiler, using Docker (since OSX doesn't support Valgrind very well) in order to catch and fix our memory leaks.

If you're having any problems you can check the reference implementation on GitHub or look specifically at the diff to see what I've changed. Additionally, if you have any comments or feedback I'd greatly appreciate if you left a comment!