If you've ever written a compiler using compiler construction tools, you have probably heard about or even used LEX for generating a lexical analyser. This article tries to give an insight into the working of LEX.
Example :
lex -T prog_file.l
The finite state machine used by LEX is deterministic in nature i.e. it is a DFA. The simulation of the constructed DFA is done in the lex.yy.c file. Hence, a LEX compiler constructs a DFA according to the specifications of the regular expression in the LEX program, and generates a simulation algorithm (to simulate the DFA) and a matching switch-case algorithm (to match and execute the appropriate action if the DFA enters an accept state).
DFA_simulator()
current_state = start_state
c = get_next_char()
while(c != EOF)
current_state = transition(current_state , c)
c = get_next_char()
if(current_state ∈ Final_states)
/*ACCEPT*/
else
/*REJECT*/
The information about all the transitions made by the DFA can be obtained from the decision table (generally a two dimensional matrix) through the transition() function.
A brief introduction
The LEX tool is used to find a certain pattern in the input stream and execute a corresponding action associated with it, as specified in the LEX program. The patterns are specified in the form of regular expressions and the actions as C code. Technically, LEX is a compiler which compiles a LEX program (prog_file.l) and generates a C file (lex.yy.c) which is in turn compiled using a C compiler to generate an executable file, which is the generated lexical analyser.
But, why use LEX ?
The reason LEX is used instead of cascading a series of if-else statements with stcmp() conditions in an attempt to manually write a lexical analyzer in C , is because LEX offers :- Ability to handle character sequences as abstract entities
- Ease of handling errors
- Speed and efficiency
So what does this generated lex.yy.c file contain ?
Conceptually, LEX converts all the regular expressions into a finite state machine which it uses to accept or reject a string in the input stream. The corresponding action is executed when the machine is in accept state. The LEX compiler stores information about the constructed finite state machine in the form of a decision table (transition table) in the lex.yy.c file. Also the corresponding actions and the information regarding when they are to be executed is stored in the lex.yy.c file. A transition() function is used to access the decision table. LEX makes it's decision table visible if we compile the program with the -T flag.Example :
lex -T prog_file.l
The finite state machine used by LEX is deterministic in nature i.e. it is a DFA. The simulation of the constructed DFA is done in the lex.yy.c file. Hence, a LEX compiler constructs a DFA according to the specifications of the regular expression in the LEX program, and generates a simulation algorithm (to simulate the DFA) and a matching switch-case algorithm (to match and execute the appropriate action if the DFA enters an accept state).
How is the constructed DFA simulated ?
The working of the constructed DFA is simulated using the following algorithm.DFA_simulator()
current_state = start_state
c = get_next_char()
while(c != EOF)
current_state = transition(current_state , c)
c = get_next_char()
if(current_state ∈ Final_states)
/*ACCEPT*/
else
/*REJECT*/
The information about all the transitions made by the DFA can be obtained from the decision table (generally a two dimensional matrix) through the transition() function.