A Lexical Analyzer for the Small Language "Lotus"


Token types:

This is a lexical analyzer for a small language called "Lotus". Lotus contains the following six types of tokens:

1. An identifier is a sequence of letters and digits; the first character must be a letter. All identifiers are returned as the same token type. The string of each identifier is returned as the attribute value of the token.
2. The following identifiers are reserved for use as keywords, and may not be used otherwise:
else exit int if read return while write
Each keyword is returned as a different token. No attribute value is returned with a keyword.
3. An integer constant consists of a sequence of digits. All integer constants are returned as the same token type. The integer value of each integer constant is returned as the attribute value of the token.
4. Operators include
+ - * / % == != > >= < <= && || ! = ; , ( ) { }
Each operator is returned as a different token. No attribute value is returned with an operator.
5. White space. Blanks, tabs, and newlines are ignored except as they serve to separate tokens.
6. Comments. A comment starts with the characters -- and ends with a newline. Comments are ignored except as they serve to separate tokens.
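
The six rules above can be sketched as a single regular-expression scanner. The following Python sketch is illustrative only, not the analyzer's actual implementation: the token names (IDENTIFIER, INTEGER, KW_IF, OP_>=, ERROR) and the function name tokenize are hypothetical. It assumes longest-match scanning, so "==" is one operator rather than two "=" operators, and "--" always begins a comment rather than two "-" operators.

```python
import re

KEYWORDS = {"else", "exit", "int", "if", "read", "return", "while", "write"}

# Two-character operators come first so that "==" is not split into "=" "=".
OPERATORS = ["==", "!=", ">=", "<=", "&&", "||",
             "+", "-", "*", "/", "%", ">", "<", "!", "=",
             ";", ",", "(", ")", "{", "}"]

TOKEN_RE = re.compile(
    r"(?P<skip>[ \t\n]+|--[^\n]*)"          # white space, or a comment up to the newline
    r"|(?P<word>[A-Za-z][A-Za-z0-9]*)"       # identifier or keyword
    r"|(?P<int>[0-9]+)"                      # integer constant
    r"|(?P<op>" + "|".join(re.escape(op) for op in OPERATORS) + r")"
    r"|(?P<bad>.)"                           # any other character is a lexical error
)

def tokenize(text):
    """Yield (token_type, attribute) pairs.

    The attribute is the string for identifiers, the integer value for
    integer constants, and None for keywords and operators.
    """
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        lexeme = match.group()
        if kind == "skip":
            continue                          # separators produce no token
        if kind == "word":
            if lexeme in KEYWORDS:
                yield ("KW_" + lexeme.upper(), None)   # one token type per keyword
            else:
                yield ("IDENTIFIER", lexeme)
        elif kind == "int":
            yield ("INTEGER", int(lexeme))
        elif kind == "op":
            yield ("OP_" + lexeme, None)               # one token type per operator
        else:
            yield ("ERROR", lexeme)                    # hypothetical error token
```

Keywords are recognized by first matching an identifier and then looking it up in the reserved set, which keeps the pattern small and guarantees that a name such as "iffy" is treated as an identifier rather than the keyword "if" followed by "fy".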


Output format:

Identifiers: print "Identifier:", a space, then the string of the identifier.
Keywords: print "Keyword:", a space, then the string of the keyword.
Integer constants: print "Integer:", a space, then the value of the integer constant.
Operators: print "Operator:", a space, then the string of the operator.

The lexical analyzer also handles lexical errors and prints error messages.
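
As a rough illustration of the output rules, the Python sketch below builds one output line per token. The names LABELS, format_token, and format_error are hypothetical. Two points are assumptions rather than part of the specification: an integer constant is printed as its numeric value (so "007" is shown as 7), and the wording of the error message is invented here, since the text does not fix one.

```python
LABELS = {
    "identifier": "Identifier:",
    "keyword": "Keyword:",
    "integer": "Integer:",
    "operator": "Operator:",
}

def format_token(category, lexeme):
    """Return the output line for one token: label, a space, then the text."""
    if category == "integer":
        # Assumption: print the value of the constant, not its raw digits.
        lexeme = str(int(lexeme))
    return LABELS[category] + " " + lexeme

def format_error(char, line):
    """Return an error line; the message wording is hypothetical."""
    return f"Error: illegal character {char!r} on line {line}"
```

Under these rules, the statement "read n;" would produce the three lines "Keyword: read", "Identifier: n", and "Operator: ;".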


A sample program:

-- A program to sum 1 to n
int main( ) {
    int n;
    int s;

    read n;
    if (n < 0) {
        write -1;
        exit 0;
    }
    else {
        s = 0;
    }
    while (n > 0) {
        s = s + n;
        n = n - 1;
    }
    write s;
    return 0;

}