Sunday 27 May 2007

NOTES ON A PARSER / PROGRAM CONVERTER

for NATURAL to Java/J2EE program conversion
by Ali Riza SARAL

Statement

MOVE A TO B

MOVE is the characteristic word which identifies the beginning of a statement.

Each statement begins with a characteristic word.

All the characteristic words that specify the beginning of a statement must be kept in an array or similar structure.

There must be a function, isCharacteristic(word), which scans the characteristic word array and returns whether a given word is characteristic.

This function will be used to find the end of a statement and the beginning of the next one. This matters most when the last parameter is a calculation or a complex expression whose length is not known in advance: knowing that a new statement has begun makes things much simpler.
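
The lookup described above can be sketched as follows. This is a minimal illustration, not the converter's actual code: the container choice and every word in the list other than MOVE and EXAMINE are assumptions made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, deliberately incomplete list of statement-opening words.
// Only MOVE and EXAMINE appear in these notes; the others are assumed.
const std::vector<std::string> kCharacteristicWords = {
    "MOVE", "EXAMINE", "IF", "WRITE", "END"};

// Returns true when `word` is one of the words that begin a statement.
bool isCharacteristic(const std::string& word) {
    return std::find(kCharacteristicWords.begin(),
                     kCharacteristicWords.end(),
                     word) != kCharacteristicWords.end();
}
```

A linear scan is enough for a short list. If the list grows, a sorted container or hash set would be a natural replacement without changing the function's signature.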

Parsing

MOVE A TO B

MOVE,MOVE -> A,MOVE-PARM1 -> TO,MOVE-TO -> B,MOVE-PARM2

There must be a token stack onto which the tokens of a statement are pushed, each together with its meaning (written above as token,meaning). Using a stack helps in two respects:
1- It helps decide the meanings of tokens
2- It overcomes the effect of broken lines
For example,
MOVE A
TO
B
will produce the same token stack status:
MOVE,MOVE -> A,MOVE-PARM1 -> TO,MOVE-TO -> B,MOVE-PARM2
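
One way to get this independence from line breaks is to split the source on any whitespace before tokens reach the stack. The sketch below assumes tokens are separated by spaces or newlines only; the function name tokenize is hypothetical, and real NATURAL source (string literals, comments) would need more care.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits source text on any run of whitespace, so a line break is
// treated exactly like a space between two tokens.
std::vector<std::string> tokenize(const std::string& source) {
    std::istringstream in(source);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

With this, tokenize("MOVE A TO B") and tokenize("MOVE A\nTO\nB") return the same four tokens, so the stack is filled identically in both cases.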

How parsing is done

The relative positions of parameters and key words decide the meaning of the parameters.
For example, MOVE-PARM1 comes after the key word MOVE,MOVE and MOVE-PARM2 comes after the key word MOVE-TO.

Unfortunately, some NATURAL statements can get really complex with all of their options.
In this case, parsing can become cumbersome but remains reasonably manageable. In general terms, there are initiator key words which are followed by a specific sequence of key words and parameters. This requires checking which option the token values after the initiator belong to. At the end of each sequence lies a parameter to which you assign a meaning such as MOVE-PARM2.

Example for MOVE

Get Next Token Loop:
    IF token is MOVE, push MOVE,MOVE
    ELSE IF last pushed meaning is MOVE, push A,MOVE-PARM1
    ELSE IF token is TO, push TO,MOVE-TO
    ELSE IF last pushed meaning is MOVE-TO, push B,MOVE-PARM2

    IF next token is characteristic, BREAK
End Get Next Token Loop.

Of course, this demonstration does not depict the input line management or the statement management based on the characteristic word.
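
The loop above could look roughly like this in C++. It is a sketch under assumptions: the Entry struct, the parseMove name, the use of a std::vector as the stack, and the two-word isCharacteristic stand-in are all introduced here for illustration and are not taken from the converter itself.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One stack entry: the token text plus the meaning assigned to it.
struct Entry {
    std::string value;    // e.g. "A"
    std::string meaning;  // e.g. "MOVE-PARM1"
};

// Stand-in for the characteristic-word lookup (assumed word list).
bool isCharacteristic(const std::string& word) {
    return word == "MOVE" || word == "EXAMINE";
}

// Parses one MOVE statement starting at tokens[pos]. On return, pos
// indexes the first token of the next statement (or tokens.size()).
std::vector<Entry> parseMove(const std::vector<std::string>& tokens,
                             std::size_t& pos) {
    std::vector<Entry> stack;
    while (pos < tokens.size()) {
        const std::string& token = tokens[pos];
        const std::string last = stack.empty() ? "" : stack.back().meaning;
        if (stack.empty() && token == "MOVE") {
            stack.push_back({token, "MOVE"});
        } else if (last == "MOVE") {
            stack.push_back({token, "MOVE-PARM1"});
        } else if (last == "MOVE-PARM1" && token == "TO") {
            stack.push_back({token, "MOVE-TO"});
        } else if (last == "MOVE-TO") {
            stack.push_back({token, "MOVE-PARM2"});
        } else {
            break;  // token does not fit the MOVE sequence
        }
        ++pos;
        // A characteristic word means the next statement has begun.
        if (pos < tokens.size() && isCharacteristic(tokens[pos])) break;
    }
    return stack;
}
```

Calling parseMove on the tokens of "MOVE A TO B MOVE C TO D" with pos set to 0 fills the stack with the four entries of the first statement and leaves pos at the second MOVE, ready for the next call.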

Be aware that there may be more than one TO in the different options of a single statement, and this may require you to use…; things may get really complex, as can be seen below:

    // Last meaning is EXAMINE or EXAMINE-FULL:
    // expect the FULL key word or the first parameter.
    if ((object.last(head)->meaning == "EXAMINE") ||
        (object.last(head)->meaning == "EXAMINE-FULL"))
    {
        if (tokenVal == "FULL")
        {
            head = object.push(head, "FULL", "EXAMINE-FULL");
        }
        else
        {
            head = object.push(head, tokenVal, "EXAMINE-PARM1");
        }
    }
    // After the first parameter comes the second parameter.
    else if (object.last(head)->meaning == "EXAMINE-PARM1")
    {
        head = object.push(head, tokenVal, "EXAMINE-PARM2");
    }
    // After the second parameter comes an option key word.
    else if (object.last(head)->meaning == "EXAMINE-PARM2")
    {
        if (tokenVal == "REPLACE")
        {
            head = object.push(head, "REPLACE", "EXAMINE-REPLACE");
        }
        if (tokenVal == "DELETEPRM")
        ...

The statement forms in question look like:
EXAMINE P1 P2 REPLACE
EXAMINE P1 P2 DELETE
EXAMINE FULL P1

The key in parsing is that you process the line sequentially, but you do not know in advance which options you are going to meet. IF-ELSE statements that check the meaning of the previous token ensure that you are progressing sequentially through a certain type of statement. Checking the current tokenVal helps to decide the meaning of the current token. At the bottom of each sequence there is a return statement, which ensures that for each token processed a single meaning is pushed onto the stack before control returns. The parser then takes the next token and checks that it is not characteristic; if it is not, the statement is continuing, and control comes back to the same sequence of IF statements. This time the stack has progressed one token forward, so the IF sequence determines the meaning of the next token.
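
The push-one-meaning-and-return pattern can be sketched as below. This is an illustration only: it uses a std::vector in place of the linked stack from the fragment above, the names pushExamineToken and parseExamine are hypothetical, and the DELETE branch with the meaning EXAMINE-DELETE is an assumption based on the sample statement forms, not on the original code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Entry {
    std::string value;
    std::string meaning;
};

// Stand-in for the characteristic-word lookup (assumed word list).
bool isCharacteristic(const std::string& word) {
    return word == "MOVE" || word == "EXAMINE";
}

// Assigns a meaning to exactly one token, pushes it, and returns.
// Returns false when the token fits none of the known sequences.
bool pushExamineToken(std::vector<Entry>& stack, const std::string& tokenVal) {
    const std::string last = stack.empty() ? "" : stack.back().meaning;
    if (stack.empty()) {
        if (tokenVal != "EXAMINE") return false;
        stack.push_back({tokenVal, "EXAMINE"});
        return true;
    }
    if (last == "EXAMINE" || last == "EXAMINE-FULL") {
        stack.push_back({tokenVal, tokenVal == "FULL" ? "EXAMINE-FULL"
                                                      : "EXAMINE-PARM1"});
        return true;
    }
    if (last == "EXAMINE-PARM1") {
        stack.push_back({tokenVal, "EXAMINE-PARM2"});
        return true;
    }
    if (last == "EXAMINE-PARM2") {
        if (tokenVal == "REPLACE") {
            stack.push_back({tokenVal, "EXAMINE-REPLACE"});
            return true;
        }
        if (tokenVal == "DELETE") {  // assumed option name and meaning
            stack.push_back({tokenVal, "EXAMINE-DELETE"});
            return true;
        }
    }
    return false;
}

// Feeds tokens one at a time until the next characteristic word.
std::vector<Entry> parseExamine(const std::vector<std::string>& tokens,
                                std::size_t& pos) {
    std::vector<Entry> stack;
    while (pos < tokens.size()) {
        if (!stack.empty() && isCharacteristic(tokens[pos])) break;
        if (!pushExamineToken(stack, tokens[pos])) break;
        ++pos;
    }
    return stack;
}
```

Each call to pushExamineToken looks only at the meaning on top of the stack and the current token, which is what lets the same IF sequence be re-entered for every token of the statement.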

To be followed by:
input token management
statement management
calculation/condition expression handling
statement conversion
variable conversion
data structures conversion
screen conversion to jsp