Compiler Project Additional Notes

Simplified Symbol Table

In these notes we change the miniJava language specification by limiting variable scopes to two levels: global and local. This simplifies the coding. Note that, as before, each procedure has its own local scope. This means that in
    foo() {int x; {int y; ...} x=y; }
the y is at the same scope level as x; the y does not go away when the compound statement in which it is defined is terminated. Thus, the final x=y is legal.

The full multi-scope implementation is still encouraged, but get this simpler "dual-scope" version working first.

To implement this dual-scope change, we simply require that newScope() be called only at the top level of a procedure. We can do that in one of two ways. The first is to arrange that newScope() is not called when we enter non-top-level compound statements; modify compileCompound() to contain something like:
    if (isTopLevel) symTable.newScope();    // ditto for endScope()
Alternatively, we can eliminate newScope()/endScope() from compileCompound() and move them to compileFunction(). To move to the multi-scope case, once SymbolTable supports that, simply go back to having newScope()/endScope() called unconditionally in compileCompound().
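
The two placements can be sketched as follows. This is only an illustration, not the project code: TraceTable is a hypothetical stand-in for SymbolTable that records calls, the nestedBlocks parameter is a hypothetical substitute for the statements actually being parsed, and compileBodyNoScope() is an invented helper name.

```java
import java.util.ArrayList;
import java.util.List;

class ScopePlacementSketch {
    // Hypothetical stand-in for SymbolTable: it only records which calls were made.
    static class TraceTable {
        final List<String> calls = new ArrayList<>();
        void newScope() { calls.add("newScope"); }
        void endScope() { calls.add("endScope"); }
    }

    final TraceTable symTable = new TraceTable();

    // Option 1: keep the calls in compileCompound(), guarded by isTopLevel.
    void compileCompound(boolean isTopLevel, int nestedBlocks) {
        if (isTopLevel) symTable.newScope();
        for (int i = 0; i < nestedBlocks; i++) {
            compileCompound(false, 0);      // inner "{ ... }": no scope change
        }
        if (isTopLevel) symTable.endScope();
    }

    // Option 2: compileFunction() owns the scope; the body never touches it.
    void compileFunction(int nestedBlocks) {
        symTable.newScope();
        compileBodyNoScope(nestedBlocks);
        symTable.endScope();
    }

    // Hypothetical helper: a compound statement with no scope calls at all.
    void compileBodyNoScope(int nestedBlocks) {
        for (int i = 0; i < nestedBlocks; i++) {
            compileBodyNoScope(0);
        }
    }
}
```

Either way, one procedure body produces exactly one newScope()/endScope() pair, however many nested compound statements it contains.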

class SymbolTable

This class needs fields for the following:
Note that the local-declaration map will be reinitialized every time newScope() is called; endScope() will simply assign this map the value null. In fact, instead of isGlobal you can simply base the local/global decision on whether the local-var map is null or not.

The allocate() method has now been replaced by allocVar(), allocProc(), and allocConst(). Each of these takes parameters for the information it needs; note that in allocVar() the variable's offset is determined within the SymbolTable class (using the next-global/local-offset markers), and so is not a parameter (this is a correction from version 1.1).

Each of allocVar/Proc/Const calls a now-private allocate(String ident, IdentInfo ii) method to do the actual insertion, after allocVar(), etc., have filled in the fields of the IdentInfo record. It is allocate()'s job to detect duplicate declarations, as well. Note that allocate() returns null if there was a problem, and returns ii if there was no problem. Note also that allocate() has to determine which "scope map", the global one or the local one, is to be used for the insertion. And note that allocate() (unlike lookup()) always looks at exactly one scope, the "current" one.

Here is what needs to be done in class SymbolTable:
  1. Finish allocate(), which needs to check for duplicate declaration (and, if so, call error() and return null), and otherwise insert the (ident, ii) pair into the current scope map (local if applicable, otherwise global).
  2. Implement lookup(), which searches the local scope, if applicable, and then, if the identifier is not found there, searches the global scope. Again it should call error() if there is no match found; this means the var/const/proc was undeclared.
  3. Implement newScope(), which simply creates a new local-vars map object, and resets the next-local-offset marker to 0. Also, isGlobal would be set to false, if you're using this. In the dual-scope version, it is an error to call newScope() if isGlobal is already false.
  4. Implement endScope(), which sets the local-vars map to null and sets isGlobal to true again. It is an error to call endScope() if isGlobal is already true.
  5. Implement stackFrameSize() so that it returns the number of local variables allocated in this local scope. It should only be called if isGlobal is false.
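
One possible shape for the five steps above is sketched here, for variables only. Several things are assumptions rather than the project's actual code: the minimal IdentInfo, the field names, the one-argument allocVar(), and an error() that just prints and counts. The sketch uses the "local map is null" test in place of isGlobal.

```java
import java.util.HashMap;
import java.util.Map;

class DualScopeSketch {
    // Hypothetical minimal IdentInfo: only what a variable needs.
    static class IdentInfo {
        final boolean isGlobal;
        final int addr;
        IdentInfo(boolean isGlobal, int addr) { this.isGlobal = isGlobal; this.addr = addr; }
    }

    static class SymbolTable {
        private final Map<String, IdentInfo> globals = new HashMap<>();
        private Map<String, IdentInfo> locals = null;   // null means "at global scope"
        private int nextGlobal = 0;                     // next free global offset
        private int nextLocal = 0;                      // next free local offset
        int errorCount = 0;

        // Assumption: error() reports and counts; the real one may behave differently.
        private void error(String msg) { errorCount++; System.err.println("error: " + msg); }

        IdentInfo allocVar(String ident) {
            boolean global = (locals == null);
            IdentInfo ii = allocate(ident, new IdentInfo(global, global ? nextGlobal : nextLocal));
            if (ii != null) { if (global) nextGlobal++; else nextLocal++; }
            return ii;
        }

        // Step 1: looks at exactly one map, the current one.
        private IdentInfo allocate(String ident, IdentInfo ii) {
            Map<String, IdentInfo> current = (locals != null) ? locals : globals;
            if (current.containsKey(ident)) { error("duplicate declaration: " + ident); return null; }
            current.put(ident, ii);
            return ii;
        }

        // Step 2: local scope first (if any), then global.
        IdentInfo lookup(String ident) {
            IdentInfo ii = (locals != null) ? locals.get(ident) : null;
            if (ii == null) ii = globals.get(ident);
            if (ii == null) error("undeclared identifier: " + ident);
            return ii;
        }

        // Step 3
        void newScope() {
            if (locals != null) { error("newScope() called inside a local scope"); return; }
            locals = new HashMap<>();
            nextLocal = 0;
        }

        // Step 4
        void endScope() {
            if (locals == null) { error("endScope() called at global scope"); return; }
            locals = null;
        }

        // Step 5
        int stackFrameSize() {
            if (locals == null) { error("stackFrameSize() called at global scope"); return 0; }
            return nextLocal;
        }
    }
}
```

Because allocate() consults only the current map, a local variable may reuse the name of a global; lookup() then finds the local one first.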

class Compiler

In the Compiler class you will have to make changes to the following:
The places are marked in comments with the tag "pld12".

You should NOT have to change the "parsing structure"; i.e., you should NOT have to add any calls to t.token()! If you do that, the parser will get "out of sync" and fail to parse.

I have eliminated the old calls to the non-SymbolTable versions of allocate(), lookup(), etc, and declared the SymbolTable object. I've also commented out most of my LHack(), GHack(), FHack() code.

The changes to invocation of newScope()/endScope() were outlined way at the beginning: make sure newScope/endScope are only called in compileCompound() if isTopLevel is true, or else move them to compileFunction().

The changes that involve adding things to the symbol table (compileFunction() and compileDeclaration()) are relatively straightforward. You call symTable.allocVar() (or allocConst() or allocProc()), with the identifier string and the supplemental parameter (isGlobal for variables, the constant value for constants, and the entry point for procedures). The existing code in compileDeclaration() to generate the Machine.ALLOC statement should still work, provided SymbolTable.stackFrameSize() is working.

In compileIdentStmt(), there are two cases: assignment and procedure call. For assignment, you use
    varInfo = symTable.lookup(ident)
to get the information, and then call varInfo.getAddr() and varInfo.getIsGlobal() to get the location and isGlobal values. You no longer use GHack or LHack. You should verify that varInfo.isVar() returns true; that is, that the identifier in question is in fact a variable (not a const or proc). The existing code to emit the appropriate Machine.STOR or Machine.STORF instruction should now work.

For procedures, you follow the same strategy but retrieve the procedure's entryPoint using IdentInfo.getEntryPoint(). Again, you should use IdentInfo.isProc() to verify that the identifier in question is in fact a procedure name.

Appropriate use of isVar(), isConst(), isProc() (defined in class IdentInfo) should mean that you never have to use the enumerated IdentInfo.IDType at all.

In compileFactor(), you need to change only the "if (isIdent(theToken))" case. However, the identifier can now be either a variable or a constant. Following the line
    IdentInfo theInfo = symTable.lookup(ident);
you should test theInfo.isVar() and theInfo.isConst() (if it's neither, that's an error!). The existing code deals only with the isVar() case; I've started to set things up for checking two cases but you'll need to complete that. Get rid of my "its_a_variable" test; use theInfo.isVar() instead. In the variable case, you will now get values for theAddr and isGlobal from theInfo. The code that emits LOAD/LOADF should still work.

You will need to complete the case for constants, following the style of the cs.emitLOADINT call in the following "if (isNumber(theToken))" case. You should have a third case to handle the error if theInfo.isProc() is true.
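
The three-way test in compileFactor() might look roughly like the sketch below. It is a stand-alone illustration under several assumptions: the IdentInfo here is a hypothetical stub, getConstValue() is an invented accessor name, emitted instructions are recorded as strings rather than going through cs, and the choice of LOADF for locals and LOAD for globals is a guess that should be checked against the existing code.

```java
import java.util.ArrayList;
import java.util.List;

class FactorDispatchSketch {
    // Hypothetical stub with the three predicates named in the notes.
    static class IdentInfo {
        enum Kind { VAR, CONST, PROC }
        private final Kind kind;
        private final int value;        // address, constant value, or entry point
        private final boolean isGlobal;

        IdentInfo(Kind kind, int value, boolean isGlobal) {
            this.kind = kind; this.value = value; this.isGlobal = isGlobal;
        }
        boolean isVar()   { return kind == Kind.VAR; }
        boolean isConst() { return kind == Kind.CONST; }
        boolean isProc()  { return kind == Kind.PROC; }
        int getAddr() { return value; }
        boolean getIsGlobal() { return isGlobal; }
        int getConstValue() { return value; }   // hypothetical accessor name
    }

    final List<String> code = new ArrayList<>();     // stand-in for emitted instructions
    final List<String> errors = new ArrayList<>();   // stand-in for error()

    // The identifier case of compileFactor(), given the result of symTable.lookup(ident).
    void compileIdentFactor(String ident, IdentInfo theInfo) {
        if (theInfo == null) return;    // lookup() has already reported "undeclared"
        if (theInfo.isVar()) {
            // Assumption: LOADF is the local (frame-relative) form, LOAD the global one.
            String op = theInfo.getIsGlobal() ? "LOAD" : "LOADF";
            code.add(op + " " + theInfo.getAddr());
        } else if (theInfo.isConst()) {
            code.add("LOADINT " + theInfo.getConstValue());
        } else {
            errors.add(ident + " is a procedure, not a variable or constant");
        }
    }
}
```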

Multi-scope implementation

Once you have the dual-scope version working, to implement the multiple-scopes version the changes will be almost exclusively to SymbolTable. However, you will have to change Compiler.compileCompound() to make sure that newScope() and endScope() are called at the beginning and end of each "compound" statement (delimited with "{" and "}").

The symbol-table changes will be to newScope(), endScope(), allocate(String ident, IdentInfo ii), and lookup(String ident). lookup() will look in each scope in turn, starting with the most local. newScope() will create a new scope and endScope() will discard it; note that the underlying implementation will likely involve some sort of list of Map objects. allocVar/Const/Proc will not change, only allocate(); note that allocate() will involve looking at only the innermost ("last", or most-recent) scope map.
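
A minimal sketch of the "list of Map objects" idea follows, using a java.util.ArrayDeque as a stack of scopes. It is one possible arrangement, not the required one: the IdentInfo stub, the errorCount field, and the non-private allocate() are assumptions made so the example stands alone, and offset bookkeeping is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class MultiScopeSketch {
    // Hypothetical stub; the real IdentInfo carries kind, address, and so on.
    static class IdentInfo {
        final String note;
        IdentInfo(String note) { this.note = note; }
    }

    static class SymbolTable {
        // Head of the deque is the innermost scope; the last element is the global scope.
        private final Deque<Map<String, IdentInfo>> scopes = new ArrayDeque<>();
        int errorCount = 0;

        SymbolTable() { scopes.push(new HashMap<>()); }   // global scope

        private void error(String msg) { errorCount++; System.err.println("error: " + msg); }

        void newScope() { scopes.push(new HashMap<>()); }

        void endScope() {
            if (scopes.size() == 1) { error("endScope() called at global scope"); return; }
            scopes.pop();
        }

        // Checks and inserts into the innermost scope only.
        IdentInfo allocate(String ident, IdentInfo ii) {
            Map<String, IdentInfo> current = scopes.peek();
            if (current.containsKey(ident)) { error("duplicate declaration: " + ident); return null; }
            current.put(ident, ii);
            return ii;
        }

        // ArrayDeque iterates from the head, so the innermost scope is searched first.
        IdentInfo lookup(String ident) {
            for (Map<String, IdentInfo> scope : scopes) {
                IdentInfo ii = scope.get(ident);
                if (ii != null) return ii;
            }
            error("undeclared identifier: " + ident);
            return null;
        }
    }
}
```

With this arrangement an inner declaration shadows an outer one of the same name until endScope() pops its map, after which lookup() finds the outer declaration again.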