TC-1/2-Parser FAQ

What is the chunks rule?

In the given code, we specify that a program is either exps or chunks.

In the Tiger language, a chunk is a set of declarations which work as a group, and must be processed together during static analysis. These can be handled in various ways: for instance, we could have a visitor which would go through the AST and generate chunks from appropriate declarations. Another method is to directly parse the declaration chunks, which is what we chose to do in our implementation.

A NameTy, or a symbol?

At some places, you may use one or the other. Just ask yourself which is the most appropriate given the context. Appel is not always right.

Bison

Be sure to read its dedicated section: RE/flex & Bison.

Bison reports type clashes

Bison may report type clashes for some actions. For instance, if you have given a type to "string", but none to exp, then it will choke on:

exp: "string";

because, unless you used %define variant, it actually means

exp: "string" { $$ = $1; };

which is not type consistent. So write this instead:

exp: "string" {};

You may also want to specify the type of exp:

%type <type> exp
exp: "string";

Memory leaks in the parser during error recovery

To reclaim the memory during error recovery, use the %destructor directive:

%type <ast::Exp*> exp
%type <ast::Var*> lvalue
%destructor { delete $$; } <ast::Exp*> <ast::Var*> /* ... */;

Memory leaks in the standard containers

See Valgrind, The Ultimate Memory Debugger, for a pointer to the explanation and solution.

How do I use misc::error?

See The lib/misc Directory, for a description of this component. In the case of the parse module, TigerDriver aggregates the local error handler. From scan_open, for instance, your code should look like:

if (!yyin)
  error_ << misc::error::failure
         << program_name << ": cannot open `" << name << "': "
         << strerror(errno) << std::endl
         << &misc::error::exit;

Finding prelude.tih

When run, the compiler needs the file prelude.tih which includes the signature of all the primitives.

The option --prelude allows to specify the Tiger prelude.

Note

The default value builtin denotes the builtin prelude, this is the one that you should rely on.

Any other prelude value will look for the corresponding file in the library path. However, the executable tc is typically run in two very different contexts:

Warning

The following are legacy behaviors, they are only kept for documentation purposes.

installed

An installed binary (e.g. via a Linux distribution package) will usually look for an installed prelude.tih, typically in /usr/local/share/tc/. The cpp macro PKGDATADIR is set to this directory.

For your information, its value can be modified via use of configure’s option --prefix, defaulting to /usr/local.

compiled, not installed

When compiled, the binary will look for the installed prelude.tih, and of course will fail if it has never been installed. There are two means to address this issue:

The environment variable TC_PKGDATADIR: If set, it overrides the value of PKGDATADIR.
The option --library-prepend/-p: Using this option you may set the library file search path to visit the given directory before the built-in default value. For instance tc -p /tmp foo.tig will first look for prelude.tih in /tmp.

Must import be functional?

As mentioned before, in order to handle any other prelude than builtin, the compiler must be able parse a prelude file such as prelude.tih. You may also need to import your own functions from your own files. Therefore, you need to handle import.

ast::fields_type vs. ast::VarChunk, Record definition vs. Function declaration

The grammar of the Tiger language (see Syntactic Specifications in Tiger Compiler Reference Manual) includes:

(* Function, primitive and method declarations. *)
fundec =
    "function" id "(" tyfields ")" [ ":" type-id ] "=" exp
  | "primitive" id "(" tyfields ")" [ ":" type-id ]
  ;
classfield =
    { "method" id "(" tyfields ")" [ ":" type-id ] "=" exp }
  ;

(* Record type declarations. *)
ty = "{" tyfields  "}" ;

(* List of "id : type". *)
tyfields = [ id ":" type-id { "," id ":" type-id } ] ;

This grammar snippet shows that we used tyfields several times, in two very different contexts: a list of formal arguments of a function, primitive or method; and a list of record fields. The fact that the syntax is similar in both cases is an “accident”: it is by no means required by the language. A. Appel could have chosen to make them different, but what would have been the point then? It does make sense, sometimes, to make two different things look alike, that’s a form of economy - a sane engineering principle.

If the concrete syntaxes were chosen to be identical, should it be the case for abstract too? We would say it depends: the inert data is definitely the same, but the behaviors (i.e., the handling in the various visitors) are very different. So if your language features “inert data”, say C or ML, then keeping the same abstract syntax makes sense if your language features “active data” - let’s call this… objects - then it is a mistake. Sadly enough, the first edition of Red Tiger book made this mistake, and we also did it for years.

The second edition of the Tiger in Java introduces a dedicated abstract syntax for formal arguments; we made a different choice: there is little difference between formal arguments and local variables, so we use a VarChunk, which fits nicely with the semantics of chunks.

Regarding the abstract syntax of a record type declaration, we use a list of Fields (aka fields_type).

Of course this means that you will have to duplicate your parsing of the tyfields non-terminal in your parser.

ast::DefaultVisitor and ast::NonObjectVisitor

The existence of ast::NonObjectVisitor is the result of a reasonable compromise between (relative) safety and complexity.

The problem is: as object-aware programs are to be desugared into object-free ones, (a part of) our front-end infrastructure must support two kinds of traversals:

Traversals dealing with AST with objects: ast::PrettyPrinter, object::Binder, object::TypeChecker, object::DesugarVisitor.

Traversals dealing with AST without objects bind::Binder, type::TypeChecker, and all other AST visitors.

The first category has visit methods for all type of nodes of our (object-oriented) AST, so they raise no issue. On the other hand, the second category of visitors knows nothing about objects, and should either be unable to visit AST w/ objects (static solution) or raise an error if they encounter objects (dynamic solution).

Which led us to several solutions:

Consider that we have two kinds of visitors, and thus two hierarchies of visitors. Two hierarchies might confuse the students, and make the maintenance harder. Hooks in the AST nodes (accept methods) must be duplicated, too.

Have a single hierarchy of visitors, but equip all concrete visitors traversing ASTs w/o objects with methods visiting object-related node aborting at run time.

Likewise, but factor the aborting methods in a single place, namely ast::NonObjectVisitor. That is is the solution we chose.

Solutions 2 and 3 let us provide a default visitor for ASTs without objects, but it’s harder to have a meaningful default visitor for ASTs with objects: indeed, concrete visitors on ASTs w/ objects inherit from their non-object counterparts, where methods visiting object nodes are already defined! (Though they abort at run time.)

We have found that having two visitors (ast::DefaultVisitor and ast::NonObjectVisitor) to solve this problem was more elegant, rather than merging both of them in ast::DefaultVisitor. The pros are that ast::DefaultVisitor remains a default visitor; the cons are that this visitor is now abstract, since object-related nodes have no visit implementation. Therefore, we also introduced an ast::ObjectVisitor performing default visits of the remaining node types; the combined inheritance of both ast::DefaultVisitor and ast::ObjectVisitor provides a complete default visitor.