Using Flex and Bison to parse the regular expression

kkken.eth
4 min readJul 25, 2021
Photo by Sincerely Media on Unsplash

What are Flex and Bison?

Flex and Bison are tools for parsing the input string. They were originally tools for building compilers. But they are also useful in many other areas. Flex can transfer a string into the tokens. Bison will check these tokens if it is correct. In this article, We’ll look at an example of using Flex and Bison to parse the regular expression.

Flex

Flex is a scanner that can divide a string into tokens. Let’s just check out the code directly.

// file name is "parser.l"%{
// part1: declarations
#include "y.tab.h"
%}
%%
// part2: translation rules
[a-z] {
yylval.charVal = yytext[0]
return LOWERCASE
}
[A-Z] {
yylval.charVal = yytext[0]
return UPPERCASE
}
[0-9] {
yylval.intVal = atoi(yytext);
return INTEGER;
}
%%
// part3: auxiliary functions

As you can see, the file divides into three parts.

1. Declarations

You can declare some variable or function in this area. Flex will copy the code to the .c file.

2. Translation rules

The second part is the most important also the kernel of Flex. [a-z] is the regular expression that represents the letter a, b, c, …, z. letter ‘a’ matches this expression. If you don’t know what string matches this regular expression. You can go to this website to check.

Next, the code in the scope is C code. That means what Flex will do after the input string matches the regular expression you set. yytext is a *char data type that represents the string. charVal is defined in Bison file which we will talk about it later. LOWERCASE is a token. when input ‘a’ is scanned by Flex, It will transfer it into the token LOWERCASE and tell Bison.

So, if the input is ‘a’, what these codes do is to know what the input is. And tell Bison it is a LOWERCASE token. this token’s value is ‘a’.

3. auxiliary functions

The third part is like the first part. Usually, in this area, you will implement the code you define in the first part. Flex will also copy the code to the .c file.

Bison

Bison is a parser that can parse the token that comes from Flex. The file also divided into three parts.

// the filename is "parser.y"%token INTEGER LOWERCASE UPPERCASE LEFT_B_P RIGHT_B_P COMMA
%union {
char charVal;
int intVal;
}
%{
// part1: declarations
#include "stdio.h"
#include "stdlib.h"
%}
%%
// part2: grammar rules
program:
program statement '\n' {printf("success!");}
|
;
statement:
expr size
;
size:
INTEGER {$$ = $1;}
|LEFT_B_P size RIGHT_B_P
|LEFT_B_P size COMMA size RIGHT_B_P
|LEFT_B_P size COMMA RIGHT_B_P
|'+'
|'*'
;
%%
// part3: supporting C routines
int main(void)
{
yyparse();
return 0;
}

At the top of the file is some declarations of the tokens. Commonly used declarations are: %token, %union %start %type %left %right %nonassoc, etc.

1. Declarations

You can write some definitions here. Bison will copy the code to the .c file.

2. Grammar rules

This part is composed of rules and actions (including C code). Bison’s basic rule is BNF grammar. But it makes a little simplification to input easily.

size:
INTEGER {$$ = $1;}

Let’s just look at this part. the ‘size’ token can be an ‘INTEGER’ token. When the input matches the grammar, it will execute the code in {}. The code in {} is C code. ‘$$=$1’ means ‘size=INTEGER’. For example, if the input is ‘3’, the ‘INTEGER’ token will be 3, and you can get this number from ‘$1’ variable and assign the number to the ‘size’ token by using '$$=$1'. (Remember to put the semicolon at the end of the code because this is the C code.)

3. auxiliary functions

This part is the same as the third part in Flex. Bison will copy the code into the .c file.

Makefile

At this point, the build process is complex enough to be worth putting into a Makefile.

CC = gcc
OUT = parser.out
all :
flex parser.l
bison -vdty parser.y
$(CC) -o $(OUT) lex.yy.c y.tab.c
clean :
rm -rf *.tab.c *.tab.h *.output lex.yy.c $(OUT)

First it runs flex to create lex.yy.c, then it runs bison with the -d(for “definitions” file) flag, which creates parser.tab.c and parser.tab.h. Then it compiles them together, along with the flex library. If you are developing some complicated program, you can add the -d(debug) flag when you run the Flex. The code will be like this:

flex -d parser.l

Conclusion

This is my note for learning Flex and Bison. Of course, there are more powerful things in Flex and Bison. But this is just for a beginner who doesn’t know where to start. If you want to see the whole project, below is my code on GitHub. You can download it and run it by yourself. It will make you feel how powerful Flex and Bison are. Hope you guys are doing well. Thanks for your reading.

--

--