Compiler Project Additional Notes
Simplified Symbol Table
In these notes we change the miniJava language specification by
limiting variable scopes to two levels: global and local. This
simplifies the coding. Note that, as before, each procedure has its own
local scope. This means that in
foo() {int x; {int y; ...} x=y; }
the y is at the same scope level as x; the y does not go away when the
compound statement in which it is defined is terminated. Thus, the
final x=y is legal.
The full multi-scope implementation is still encouraged, but get this simpler "dual-scope" version working first.
To implement this dual-scope change, we simply require that newScope()
must not be called except at the top level of a procedure. We can do
that in one of two ways. The first is to arrange that newScope()
is not called when we enter non-top-level compound statements in
compileCompound(). That is, modify compileCompound() to contain something like:
if (isTopLevel) symTable.newScope(); // ditto for endScope()
Alternatively, we can eliminate newScope()/endScope() from
compileCompound and move them to compileFunction(). To move to the
multi-scope case, once SymbolTable supports that, simply go back to
having newScope()/endScope() called unconditionally in
compileCompound().
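The first option can be sketched as follows. This is only an illustration, not the project's code: ScopeLog is a hypothetical stand-in for SymbolTable that just records calls, and the depth and nestedBlocks parameters are assumptions standing in for however your compileCompound() knows it is at the top level and parses its nested statements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SymbolTable: it only records the scope calls.
class ScopeLog {
    final List<String> calls = new ArrayList<>();
    void newScope() { calls.add("newScope"); }
    void endScope() { calls.add("endScope"); }
}

class CompoundSketch {
    final ScopeLog symTable = new ScopeLog();

    // depth is an assumed parameter: 0 for a procedure body, >0 for nested blocks.
    // nestedBlocks stands in for parsing the statements between "{" and "}".
    void compileCompound(int depth, int nestedBlocks) {
        boolean isTopLevel = (depth == 0);
        if (isTopLevel) symTable.newScope();
        for (int i = 0; i < nestedBlocks; i++) {
            compileCompound(depth + 1, 0);   // nested block: no scope calls
        }
        if (isTopLevel) symTable.endScope();
    }
}
```

Compiling a procedure body that contains two nested blocks records exactly one newScope() and one endScope(). Dropping the two isTopLevel tests gives the unconditional behavior needed for the multi-scope case.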
class SymbolTable
This class needs fields for the following:
- an int marker for the next global variable offset
- an int marker for the next local variable offset
- a boolean isGlobal to identify whether we're currently working at the global scope level or we're in a procedure (local scope)
- a Map<String, IdentInfo> for global declarations.
- a Map<String, IdentInfo> for local declarations.
Note that the local-declaration map will be reinitialized every time
newScope() is called; endScope() will simply assign this map the value
null. In fact, instead of isGlobal you can simply base the local/global
decision on whether the local-var map is null or not.
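As a rough sketch of these fields, assuming the variant where the local/global decision is based on the local map being null: the field names and the atGlobalScope() helper are made up for illustration, and IdentInfo is an empty placeholder for the project's class.

```java
import java.util.HashMap;
import java.util.Map;

// Empty placeholder for the project's IdentInfo class.
class IdentInfo { }

class SymbolTable {
    private int nextGlobalOffset = 0;   // next free global variable offset
    private int nextLocalOffset = 0;    // next free local variable offset
    private final Map<String, IdentInfo> globalDecls = new HashMap<>();
    private Map<String, IdentInfo> localDecls = null;   // null means global scope

    // Hypothetical helper used in place of a separate isGlobal field.
    boolean atGlobalScope() { return localDecls == null; }

    // The local map is reinitialized on every newScope() ...
    void newScope() { localDecls = new HashMap<>(); nextLocalOffset = 0; }

    // ... and set back to null by endScope().
    void endScope() { localDecls = null; }
}
```

With this layout there is no way for an isGlobal flag and the map to disagree, since there is only one piece of state.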
The allocate() method has now been replaced by allocVar(), allocProc(), and allocConst(). Each of these takes parameters for the information it needs; note that in allocVar() the variable's offset is determined within the SymbolTable class (using the next-global/local-offset markers), and so is not a parameter (this is a correction from version 1.1).
Each of allocVar/Proc/Const calls a now-private allocate(String ident, IdentInfo ii)
method to do the actual insertion, after allocVar(), etc. have filled in
the fields of the IdentInfo record. It is allocate()'s job to detect
duplicate declarations, as well. Note that allocate() returns null if
there was a problem, and returns ii if there was no problem. Note also
that allocate() has to determine which "scope map", the global one or
the local one, is to be used for the insertion. And note that
allocate() (unlike lookup()) always looks at exactly one scope, the "current" one.
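The division of labor between allocVar() and allocate() might look like the sketch below. Several things here are assumptions: IdentInfo is a minimal stand-in, error() just counts and prints, and allocVar() is shown taking only the identifier (the project's version may take additional parameters, such as an isGlobal flag).

```java
import java.util.HashMap;
import java.util.Map;

// Minimal hypothetical stand-in for the project's IdentInfo.
class IdentInfo {
    private final int addr;
    private final boolean isGlobal;
    IdentInfo(int addr, boolean isGlobal) { this.addr = addr; this.isGlobal = isGlobal; }
    int getAddr() { return addr; }
    boolean getIsGlobal() { return isGlobal; }
}

class SymbolTable {
    private int nextGlobalOffset = 0;
    private int nextLocalOffset = 0;
    private final Map<String, IdentInfo> globals = new HashMap<>();
    private Map<String, IdentInfo> locals = null;   // null means global scope
    int errorCount = 0;   // hypothetical; the real error() may behave differently

    void newScope() { locals = new HashMap<>(); nextLocalOffset = 0; }
    void endScope() { locals = null; }

    private void error(String msg) { errorCount++; System.err.println("error: " + msg); }

    // The offset is chosen here from the markers; the caller does not pass it in.
    IdentInfo allocVar(String ident) {
        boolean global = (locals == null);
        int offset = global ? nextGlobalOffset : nextLocalOffset;
        IdentInfo ii = allocate(ident, new IdentInfo(offset, global));
        if (ii != null) {   // advance the marker only if the insertion succeeded
            if (global) nextGlobalOffset++; else nextLocalOffset++;
        }
        return ii;
    }

    // Looks at exactly one scope map: the current one.
    private IdentInfo allocate(String ident, IdentInfo ii) {
        Map<String, IdentInfo> current = (locals != null) ? locals : globals;
        if (current.containsKey(ident)) {
            error("duplicate declaration of " + ident);
            return null;
        }
        current.put(ident, ii);
        return ii;
    }
}
```

Because allocate() checks only the current map, a local variable may reuse the name of a global one; only a second declaration in the same scope is reported.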
Here is what needs to be done in class SymbolTable:
- Finish allocate(), which needs to check for duplicate declaration
(and, if so, call error() and return null), and otherwise insert the
(ident, ii) pair into the current scope map (local if applicable,
otherwise global).
- implement lookup(), which searches the local scope, if applicable,
and then if not found searches the global scope. Again it should call
error() if there is no match found; this means the var/const/proc was
undeclared.
- implement newScope(), which simply creates a new local-vars map
object, and resets the local-vars marker to 0. Also, isGlobal would be
set to false, if you're using this. In the dual-scope version, it is an
error to call newScope() if isGlobal is already false.
- implement endScope(), which sets the local-vars map to null and sets isGlobal to true again. It is an error to call endScope() if isGlobal is already true.
- implement stackFrameSize() so that it returns the number of local
variables allocated in this local scope. It should only be called if
isGlobal is false.
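The lookup and scope items above could be sketched like this, using the explicit isGlobal flag. As assumptions: error() only counts and prints, lookup() returns null when nothing is found, each variable occupies one slot (so the local marker doubles as the frame size), and IdentInfo is a bare stand-in. The maps are left non-private here only so the sketch can be exercised without allocate().

```java
import java.util.HashMap;
import java.util.Map;

// Bare hypothetical stand-in for the project's IdentInfo.
class IdentInfo {
    final String name;
    IdentInfo(String name) { this.name = name; }
}

class SymbolTable {
    final Map<String, IdentInfo> globals = new HashMap<>();
    Map<String, IdentInfo> locals = null;
    int nextLocalOffset = 0;
    boolean isGlobal = true;
    int errorCount = 0;   // hypothetical; the real error() may behave differently

    void error(String msg) { errorCount++; System.err.println("error: " + msg); }

    // Local scope first (if there is one), then global.
    IdentInfo lookup(String ident) {
        if (locals != null && locals.containsKey(ident)) return locals.get(ident);
        IdentInfo ii = globals.get(ident);
        if (ii == null) error("undeclared identifier " + ident);
        return ii;
    }

    void newScope() {
        if (!isGlobal) { error("newScope() called inside a local scope"); return; }
        locals = new HashMap<>();
        nextLocalOffset = 0;
        isGlobal = false;
    }

    void endScope() {
        if (isGlobal) { error("endScope() called at global scope"); return; }
        locals = null;
        isGlobal = true;
    }

    // Assumes one slot per local variable, with offsets starting at 0.
    int stackFrameSize() {
        if (isGlobal) { error("stackFrameSize() called at global scope"); return 0; }
        return nextLocalOffset;
    }
}
```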
class Compiler
In the Compiler class you will have to make changes to the following:
- compileCompound (newScope(), etc)
- compileFunction (to handle proc declaration)
- compileDeclaration (to handle var & const declarations)
- compileIdentStmt (to handle procedure calls and assignments to variables)
- compileFactor (to handle uses of variables and constants)
The places are marked in comments with the tag "pld12".
You should NOT have to change the "parsing structure"; i.e., you should NOT have to add any calls to t.token()! If you do, the parser will get "out of sync" and fail to parse.
I have eliminated the old calls to the non-SymbolTable
versions of allocate(), lookup(), etc, and declared the SymbolTable
object. I've also commented out most of my LHack(), GHack(), FHack()
code.
The changes to invocation of newScope()/endScope() were outlined way at
the beginning: make sure newScope/endScope are only called in
compileCompound() if isTopLevel is true, or else move them to
compileFunction().
The changes that involve adding things to the symbol table (compileFunction() and compileDeclaration())
are relatively straightforward. You call symTable.allocVar() (or
allocConst() or allocProc()), with the identifier string and the
supplemental parameter (isGlobal for variables, the constant value for
constants, and the entry point for procedures). The existing code in
compileDeclaration() to generate the Machine.ALLOC statement should
still work, provided SymbolTable.stackFrameSize() is working.
In compileIdentStmt(), there are two cases: assignment and procedure call. For assignment, you use
varInfo = symTable.lookup(ident)
to get the information, and then call varInfo.getAddr() and
varInfo.getIsGlobal() to get the location and isGlobal values. You no
longer use GHack or LHack. You should
verify that varInfo.isVar() returns true; that is, that the identifier
in question is in fact a variable (not a const or proc). The existing
code to emit the appropriate Machine.STOR or Machine.STORF instruction
should now work.
For procedures, you follow the same strategy but retrieve the
procedure's entryPoint using IdentInfo.getEntryPoint(). Again, you
should use IdentInfo.isProc() to verify that the identifier in question
is in fact a procedure name.
Appropriate use of isVar(), isConst(), isProc() (defined in class
IdentInfo) should mean that you never have to use the enumerated
IdentInfo.IDType at all.
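The checks for the two compileIdentStmt() cases might be organized as in the sketch below. It deliberately stops short of emitting code: check() is a made-up helper that returns a description of the values the existing emission code would be given, or null on error. The IdentInfo here is a stand-in whose constructor and IDType constant names are assumptions.

```java
// Hypothetical stand-in for the project's IdentInfo.
class IdentInfo {
    enum IDType { VAR, CONST, PROC }   // constant names are assumed
    private final IDType type;
    private final int number;          // address or entry point, depending on type
    private final boolean isGlobal;
    IdentInfo(IDType type, int number, boolean isGlobal) {
        this.type = type; this.number = number; this.isGlobal = isGlobal;
    }
    boolean isVar()   { return type == IDType.VAR; }
    boolean isConst() { return type == IDType.CONST; }
    boolean isProc()  { return type == IDType.PROC; }
    int getAddr() { return number; }
    boolean getIsGlobal() { return isGlobal; }
    int getEntryPoint() { return number; }
}

class IdentStmtSketch {
    static int errorCount = 0;   // hypothetical stand-in for error reporting
    static void error(String msg) { errorCount++; System.err.println("error: " + msg); }

    // info is the result of symTable.lookup(ident); isCall says which case we are in.
    static String check(String ident, IdentInfo info, boolean isCall) {
        if (info == null) return null;   // assumed: lookup() already reported it
        if (isCall) {
            if (!info.isProc()) { error(ident + " is not a procedure"); return null; }
            return "call entryPoint=" + info.getEntryPoint();
        }
        if (!info.isVar()) { error(ident + " is not a variable"); return null; }
        return "store addr=" + info.getAddr() + " isGlobal=" + info.getIsGlobal();
    }
}
```

Note that the caller never touches IDType; the three predicates are enough.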
In compileFactor(), you need to
change only the "if (isIdent(theToken))" case. However, the identifier
can now be either a variable or a constant. Following the line
IdentInfo theInfo = symTable.lookup(ident);
you should test theInfo.isVar() and theInfo.isConst() (if it's neither,
that's an error!). The existing code deals only with the isVar() case;
I've started to set things up for checking two cases but you'll need to
complete that. Get rid of my "its_a_variable" test; use theInfo.isVar()
instead. In the variable case, you will now get values for theAddr and isGlobal from theInfo. The code that emits LOAD/LOADF should still work.
You will need to complete the case for constants, following the style
of the cs.emitLOADINT in the following "if (isNumber(theToken))"
case. You should have a third case to handle the error if
theInfo.isProc() is true.
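The three-way test in the identifier case of compileFactor() could take roughly this shape. Everything around the if/else chain is assumed for illustration: CodeStreamStub only records strings, emitLOADINT is given an int parameter here, emitLoadVar() stands in for the existing LOAD/LOADF code, and getValue() is a made-up name for whatever getter returns a constant's value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the project's IdentInfo.
class IdentInfo {
    private final char kind;      // 'v' = variable, 'c' = constant, 'p' = procedure
    private final int number;     // address, constant value, or entry point
    private final boolean isGlobal;
    IdentInfo(char kind, int number, boolean isGlobal) {
        this.kind = kind; this.number = number; this.isGlobal = isGlobal;
    }
    boolean isVar()   { return kind == 'v'; }
    boolean isConst() { return kind == 'c'; }
    boolean isProc()  { return kind == 'p'; }
    int getAddr() { return number; }
    boolean getIsGlobal() { return isGlobal; }
    int getValue() { return number; }   // assumed getter name
}

// Hypothetical stub for the code stream; it records instead of emitting.
class CodeStreamStub {
    final List<String> out = new ArrayList<>();
    void emitLOADINT(int value) { out.add("LOADINT " + value); }
    void emitLoadVar(int addr, boolean isGlobal) {
        out.add("load addr=" + addr + " isGlobal=" + isGlobal);
    }
}

class FactorSketch {
    final CodeStreamStub cs = new CodeStreamStub();
    int errorCount = 0;

    // theInfo is the result of symTable.lookup(ident).
    void identFactor(String ident, IdentInfo theInfo) {
        if (theInfo == null) return;   // assumed: lookup() already reported it
        if (theInfo.isVar()) {
            int theAddr = theInfo.getAddr();
            boolean isGlobal = theInfo.getIsGlobal();
            cs.emitLoadVar(theAddr, isGlobal);
        } else if (theInfo.isConst()) {
            cs.emitLOADINT(theInfo.getValue());
        } else {   // only isProc() is left
            errorCount++;
            System.err.println("error: procedure " + ident + " used as a value");
        }
    }
}
```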
Multi-scope implementation
Once you have the dual-scope version working, to implement the
multiple-scopes version the changes will be almost exclusively to
SymbolTable. However, you will
have to change Compiler.compileCompound() to make sure that newScope()
and endScope() are called at the beginning and end of each "compound"
statement (delimited with "{" and "}").
The symbol-table changes will be to newScope(), endScope(),
allocate(String ident, IdentInfo ii), and lookup(String ident). lookup()
will look in each scope in turn, starting with the most local.
newScope() will create a new scope and endScope() will zap it; note that
the underlying implementation will likely involve some sort of list of Map objects. allocVar/Const/Proc will not change, only allocate(); note that allocate() will involve looking at only the innermost ("last", or most-recent) scope map.
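One possible shape for the list of Map objects is a stack of maps, sketched below with java.util.ArrayDeque. This is an assumption about the design, not a required one. The class name MultiScopeTable is made up, IdentInfo is a bare stand-in, error() only counts and prints, allocate() is left non-private so it can be exercised directly, and the handling of variable offsets across nested scopes is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Bare hypothetical stand-in for the project's IdentInfo.
class IdentInfo {
    final String name;
    IdentInfo(String name) { this.name = name; }
}

class MultiScopeTable {
    // Head of the deque is the innermost scope; the tail is the global scope.
    private final Deque<Map<String, IdentInfo>> scopes = new ArrayDeque<>();
    int errorCount = 0;   // hypothetical; the real error() may behave differently

    MultiScopeTable() { scopes.push(new HashMap<>()); }   // global scope

    private void error(String msg) { errorCount++; System.err.println("error: " + msg); }

    void newScope() { scopes.push(new HashMap<>()); }

    void endScope() {
        if (scopes.size() == 1) { error("endScope() called at global scope"); return; }
        scopes.pop();
    }

    // Only the innermost (most recent) scope map is consulted.
    IdentInfo allocate(String ident, IdentInfo ii) {
        Map<String, IdentInfo> current = scopes.peek();
        if (current.containsKey(ident)) {
            error("duplicate declaration of " + ident);
            return null;
        }
        current.put(ident, ii);
        return ii;
    }

    // ArrayDeque iterates from head to tail, i.e. innermost scope first.
    IdentInfo lookup(String ident) {
        for (Map<String, IdentInfo> scope : scopes) {
            IdentInfo ii = scope.get(ident);
            if (ii != null) return ii;
        }
        error("undeclared identifier " + ident);
        return null;
    }
}
```

With this version, the earlier example foo() {int x; {int y; ...} x=y; } is no longer legal: y is dropped when the inner scope ends, so the lookup of y in x=y reports an undeclared identifier.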