TC-1/2-Parser FAQ
- What is the chunks rule?
In the given code, we specify that a program is either exps or chunks. In the Tiger language, a chunk is a set of declarations which work as a group, and must be processed together during static analysis. These can be handled in various ways: for instance, we could have a visitor which would go through the AST and generate chunks from appropriate declarations. Another method is to directly parse the declaration chunks, which is what we chose to do in our implementation.
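To make the idea of declarations that "work as a group" more concrete, here is a minimal C++ sketch of the grouping itself, independent of whether it is done by a visitor or by the parser. The names Kind, Dec, Chunk and make_chunks are hypothetical and do not come from the tc code base. The sketch also assumes that consecutive type (or function) declarations share a chunk, while each variable declaration forms a chunk on its own.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified declaration: only its kind and name matter here.
enum class Kind { Type, Function, Variable };

struct Dec
{
  Kind kind;
  std::string name;
};

// A chunk: a run of consecutive declarations that are processed together.
struct Chunk
{
  Kind kind;
  std::vector<Dec> decs;
};

// Group consecutive type (or function) declarations into a single chunk.
// Assumption: each variable declaration forms a chunk on its own.
std::vector<Chunk> make_chunks(const std::vector<Dec>& decs)
{
  std::vector<Chunk> chunks;
  for (const Dec& d : decs)
    {
      bool extend = !chunks.empty()
                    && chunks.back().kind == d.kind
                    && d.kind != Kind::Variable;
      if (!extend)
        chunks.push_back(Chunk{d.kind, {}});
      chunks.back().decs.push_back(d);
    }
  return chunks;
}
```

With this sketch, two functions declared back to back end up in the same chunk, which is what allows them to be analyzed together (for instance when they call each other).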
- A NameTy, or a symbol?
In some places, you may use one or the other. Just ask yourself which is the most appropriate given the context. Appel is not always right.
- Bison
Be sure to read its dedicated section: RE/flex & Bison.
- Bison reports type clashes
Bison may report type clashes for some actions. For instance, if you have given a type to "string", but none to exp, then it will choke on:
exp: "string";
because, unless you used %define variant, it actually means
exp: "string" { $$ = $1; };
which is not type consistent. So write this instead:
exp: "string" {};
You may also want to specify the type of exp:
%type <type> exp
exp: "string";
- Memory leaks in the parser during error recovery
To reclaim the memory during error recovery, use the %destructor directive:
%type <ast::Exp*> exp
%type <ast::Var*> lvalue
%destructor { delete $$; } <ast::Exp*> <ast::Var*> /* ... */;
- Memory leaks in the standard containers
See Valgrind, The Ultimate Memory Debugger, for a pointer to the explanation and solution.
- How do I use misc::error?
See The lib/misc Directory, for a description of this component. In the case of the parse module, TigerDriver aggregates the local error handler. From scan_open, for instance, your code should look like:
if (!yyin)
  error_ << misc::error::failure
         << program_name << ": cannot open `" << name << "': "
         << strerror(errno) << std::endl
         << &misc::error::exit;
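The chained `<<` idiom above mixes a status, message fragments, a stream manipulator and a pointer to a member function. The following toy class is a sketch of how such a chain can be made to compile and behave; it is NOT the actual misc::error class, and every name in it (toy_error, status, done, and the accessors) is hypothetical. Where the excerpt above passes an exit action, the toy version only records that an action was run, so that its effect can be observed.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Toy stand-in for a stream-like error handler; NOT the real misc::error.
class toy_error
{
public:
  enum class status { success = 0, failure = 1 };

  // Record the most severe status seen so far.
  toy_error& operator<<(status s)
  {
    if (static_cast<int>(s) > static_cast<int>(status_))
      status_ = s;
    return *this;
  }

  // Accept stream manipulators such as std::endl.
  toy_error& operator<<(std::ostream& (*manip)(std::ostream&))
  {
    stream_ << manip;
    return *this;
  }

  // Run a member function given by pointer, e.g. `err << &toy_error::done`.
  toy_error& operator<<(void (toy_error::*action)())
  {
    (this->*action)();
    return *this;
  }

  // Forward anything else to the message buffer.
  template <typename T>
  toy_error& operator<<(const T& t)
  {
    stream_ << t;
    return *this;
  }

  // Hypothetical action: only sets a flag instead of leaving the program.
  void done() { done_ = true; }

  status current_status() const { return status_; }
  std::string message() const { return stream_.str(); }
  bool is_done() const { return done_; }

private:
  status status_ = status::success;
  std::ostringstream stream_;
  bool done_ = false;
};
```

The non-template overloads are preferred over the generic one during overload resolution, which is why the status and the member function pointer are intercepted instead of being printed.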
- Finding prelude.tih
When run, the compiler needs the file prelude.tih, which includes the signature of all the primitives.
The option --prelude allows you to specify the Tiger prelude.
Note
The default value builtin denotes the builtin prelude; this is the one that you should rely on.
For any other prelude value, the compiler looks for the corresponding file in the library path. However, the executable tc is typically run in two very different contexts:
Warning
The following are legacy behaviors; they are only kept for documentation purposes.
- installed
An installed binary (e.g. via a Linux distribution package) will usually look for an installed prelude.tih, typically in /usr/local/share/tc/. The cpp macro PKGDATADIR is set to this directory.
For your information, its value can be modified via configure’s option --prefix, which defaults to /usr/local.
- compiled, not installed
When compiled, the binary will look for the installed prelude.tih, and of course will fail if it has never been installed. There are two means to address this issue:
  - The environment variable TC_PKGDATADIR
  If set, it overrides the value of PKGDATADIR.
  - The option --library-prepend / -p
  Using this option you may set the library file search path to visit the given directory before the built-in default value. For instance tc -p /tmp foo.tig will first look for prelude.tih in /tmp.
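The lookup order described above (prepended directories first, then TC_PKGDATADIR if set, otherwise the compile-time PKGDATADIR) can be sketched in C++ as follows. This is only an illustration of the legacy behavior as documented here, not the code used by tc: the functions getenv_opt, search_path and find_file are hypothetical, and the sketch assumes the prepended directories are already given in visiting order.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Read an environment variable, or std::nullopt if it is not set.
std::optional<std::string> getenv_opt(const char* name)
{
  if (const char* value = std::getenv(name))
    return std::string(value);
  return std::nullopt;
}

// Hypothetical helper: the directories to visit, in order.
//  - prepended: directories given with -p, assumed already in visiting order;
//  - env_pkgdatadir: value of TC_PKGDATADIR, if set;
//  - builtin_pkgdatadir: compile-time default (the PKGDATADIR macro).
std::vector<fs::path>
search_path(const std::vector<std::string>& prepended,
            const std::optional<std::string>& env_pkgdatadir,
            const std::string& builtin_pkgdatadir)
{
  std::vector<fs::path> dirs(prepended.begin(), prepended.end());
  // The environment variable overrides the built-in default.
  dirs.emplace_back(env_pkgdatadir.value_or(builtin_pkgdatadir));
  return dirs;
}

// Return the first existing `dir / name`, or std::nullopt if none exists.
std::optional<fs::path>
find_file(const std::vector<fs::path>& dirs, const std::string& name)
{
  for (const fs::path& dir : dirs)
    {
      std::error_code ec; // Treat file system errors as "not found".
      if (fs::exists(dir / name, ec))
        return dir / name;
    }
  return std::nullopt;
}
```

A caller mimicking tc -p /tmp foo.tig would build the path with search_path({"/tmp"}, getenv_opt("TC_PKGDATADIR"), "/usr/local/share/tc") and then ask find_file for prelude.tih.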
- Must import be functional?
As mentioned before, in order to handle any prelude other than builtin, the compiler must be able to parse a prelude file such as prelude.tih. You may also need to import your own functions from your own files. Therefore, you need to handle import.
- ast::fields_type vs. ast::VarChunk, Record definition vs. Function declaration
The grammar of the Tiger language (see Syntactic Specifications in Tiger Compiler Reference Manual) includes:
(* Function, primitive and method declarations. *)
fundec =
    "function" id "(" tyfields ")" [ ":" type-id ] "=" exp
  | "primitive" id "(" tyfields ")" [ ":" type-id ] ;
classfield = { "method" id "(" tyfields ")" [ ":" type-id ] "=" exp } ;

(* Record type declarations. *)
ty = "{" tyfields "}" ;

(* List of "id : type". *)
tyfields = [ id ":" type-id { "," id ":" type-id } ] ;
This grammar snippet shows that we used tyfields several times, in two very different contexts: a list of formal arguments of a function, primitive or method; and a list of record fields. The fact that the syntax is similar in both cases is an “accident”: it is by no means required by the language. A. Appel could have chosen to make them different, but what would have been the point then? It does make sense, sometimes, to make two different things look alike; that’s a form of economy - a sane engineering principle.
If the concrete syntaxes were chosen to be identical, should it be the case for the abstract syntaxes too? We would say it depends: the inert data is definitely the same, but the behaviors (i.e., the handling in the various visitors) are very different. So if your language features “inert data”, say C or ML, then keeping the same abstract syntax makes sense; if your language features “active data” - let’s call this… objects - then it is a mistake. Sadly enough, the first edition of the Red Tiger book made this mistake, and we also did it for years.
The second edition of the Tiger in Java introduces a dedicated abstract syntax for formal arguments; we made a different choice: there is little difference between formal arguments and local variables, so we use a VarChunk, which fits nicely with the semantics of chunks.
Regarding the abstract syntax of a record type declaration, we use a list of Fields (aka fields_type).
Of course this means that you will have to duplicate your parsing of the tyfields non-terminal in your parser.
- ast::DefaultVisitor and ast::NonObjectVisitor
The existence of ast::NonObjectVisitor is the result of a reasonable compromise between (relative) safety and complexity.
The problem is: as object-aware programs are to be desugared into object-free ones, (a part of) our front-end infrastructure must support two kinds of traversals:
  - Traversals dealing with AST with objects: ast::PrettyPrinter, object::Binder, object::TypeChecker, object::DesugarVisitor.
  - Traversals dealing with AST without objects: bind::Binder, type::TypeChecker, and all other AST visitors.
The first category has visit methods for all types of nodes of our (object-oriented) AST, so they raise no issue. On the other hand, the second category of visitors knows nothing about objects, and should either be unable to visit AST w/ objects (static solution) or raise an error if they encounter objects (dynamic solution).
This led us to consider several solutions:
  1. Consider that we have two kinds of visitors, and thus two hierarchies of visitors. Two hierarchies might confuse the students, and make the maintenance harder. Hooks in the AST nodes (accept methods) must be duplicated, too.
  2. Have a single hierarchy of visitors, but equip all concrete visitors traversing ASTs w/o objects with methods visiting object-related nodes that abort at run time.
  3. Likewise, but factor the aborting methods in a single place, namely ast::NonObjectVisitor. That is the solution we chose.
Solutions 2 and 3 let us provide a default visitor for ASTs without objects, but it’s harder to have a meaningful default visitor for ASTs with objects: indeed, concrete visitors on ASTs w/ objects inherit from their non-object counterparts, where methods visiting object nodes are already defined! (Though they abort at run time.)
We have found that having two visitors (ast::DefaultVisitor and ast::NonObjectVisitor) to solve this problem was more elegant than merging both of them in ast::DefaultVisitor. The pros are that ast::DefaultVisitor remains a default visitor; the cons are that this visitor is now abstract, since object-related nodes have no visit implementation. Therefore, we also introduced an ast::ObjectVisitor performing default visits of the remaining node types; the combined inheritance of both ast::DefaultVisitor and ast::ObjectVisitor provides a complete default visitor.
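The combined inheritance described above can be illustrated with a deliberately tiny C++ sketch. It borrows the names DefaultVisitor, NonObjectVisitor and ObjectVisitor from the text, but everything else is an assumption made for illustration: the two-node AST (IntExp, ObjectExp), the trace member, the PlainVisitor and FullVisitor classes, and the use of virtual inheritance. The sketch also throws an exception where the text speaks of aborting at run time, so that the behavior can be observed.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical two-node AST: one regular node, one object-related node.
struct IntExp;
struct ObjectExp;

// Abstract root of the single visitor hierarchy.
struct Visitor
{
  virtual ~Visitor() = default;
  virtual void visit(IntExp&) = 0;
  virtual void visit(ObjectExp&) = 0;
  std::string trace; // Records what was visited (illustration only).
};

struct IntExp { void accept(Visitor& v) { v.visit(*this); } };
struct ObjectExp { void accept(Visitor& v) { v.visit(*this); } };

// Default visits of non-object nodes only: this class is still abstract.
struct DefaultVisitor : virtual Visitor
{
  void visit(IntExp&) override { trace += "int;"; }
};

// Default visits of the remaining (object-related) nodes: also abstract.
struct ObjectVisitor : virtual Visitor
{
  void visit(ObjectExp&) override { trace += "object;"; }
};

// Failing visits of object-related nodes, factored in a single place.
// The sketch throws instead of aborting.
struct NonObjectVisitor : virtual Visitor
{
  void visit(ObjectExp&) override
  {
    throw std::logic_error("object node in an object-free AST");
  }
};

// A visitor for ASTs without objects.
struct PlainVisitor : DefaultVisitor, NonObjectVisitor {};

// A complete default visitor for ASTs with objects.
struct FullVisitor : DefaultVisitor, ObjectVisitor {};
```

In this sketch each pure virtual method of Visitor has exactly one final overrider in PlainVisitor and in FullVisitor, so both are concrete, while DefaultVisitor alone cannot be instantiated.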