OVM
Warning
OVM is a brand-new tool which started being developed in 2024. It is currently in a period of transition. Bear in mind that bugs may appear, and that some features are intended to evolve.
We recommend you to read HAVM Documentation (possibly use it), the legacy tool, which can complement OVM’s functionality & technical understanding. Take note of known bugs.
OVM is a Tree
(HIR or LIR, see Intermediate representation TREE)
programs interpreter. OVM is written and maintained by Damien Assire,
Mael Cravero and Ghiles Ziat. Its development has been generously supported
by IRILL.
- Its features are:
evaluate (with tracing) both HIR and LIR
transform the Tree representation (linearize)
perform static checks on the Tree representation
OVM aims to modernize and replace legacy HAVM so that EPITA students could exercise their compiler projects before the final jump to assembly code. It is implemented in OCaml, an industrial-strength functional programming language with an emphasis on expressiveness and safety. OVM was coined on both OCaml, and VM standing for Virtual Machine.
- Resources:
Required version is OVM 0.1
Feedback can be sent to LRE’s Projects Address.
Documentation
Here is a brief summary of the documentation for OVM.
Table of Contents
Invoking OVM
- To invoke
ovm
run:ovm OPTIONS FILE
. Where
FILE
is a simple text file, andOPTIONS
is any combination of the following options:-show-ast
: Enable the printing of the produced AST before actually evaluating the program.-heap-size
: Set the heap size (default :65536
).-stack-size
: Set the stack size (default :16384
).-trace
: Print the trace execution.-lin
: Linearize the program before evaluation.-lir
: Check if the program is low-level (reject high level constructs).-unsafe
: Do not perform the static checks.-ssa
: Check if the program is written using Static Single Assignment.-version
: Display the program version.-help
,--help
: Display this list of options and exit successfully.
Steps of OVM
- OVM works in three steps:
it scans and parses the file (lexical and syntactical analysis), producing an abstract representation of a program;
it performs requested static checks and transformations;
it loads the resolved program in the virtual machine, search for an entry point labeled
main
and start evaluation.
OVM vs HAVM
OVM & HAVM serve the same purpose, however, there are some notable design/technical differences.
- Nested
jump
in recursive structure handling: One of the problems HAVM had, was the
jump
s and how they were handled. Two main problems occurred on compiled programs.- Ineffective break:
The first problem with
jump
in HAVM is with the break in an expression, like this example:ineffective-break.tigwhile 1 do print_int((break; 1))
ineffective-break.png
In this example, HAVM evaluates the
eseq
by evaluating thejump
and the statement after it. However, it will then return to theeseq
and will continue its evaluation by evaluating theconst 1
and returning1
, doing it indefinitely because of thewhile
.OVM correctly handles this case:
tc -H ineffective-break.tig > ineffective-break.hir$ tc -H ineffective-break.tig > ineffective-break.hir $ echo $? 0
ovm ineffective-break.hir$ ovm ineffective-break.hir /bin/sh: 1: ovm: not found $ echo $? 127
- Ineffective boolean operator:
Another means to generate a
jump
that breaks the recursive evaluation is using optimized if, like the following example.ineffective-if.tigprint_int (if 0 | 0 then 0 else 1)
ineffective-if.png
OVM correctly handles this case, compare the traces below:
tc -H ineffective-if.tig > ineffective-if.hir$ tc -H ineffective-if.tig > ineffective-if.hir $ echo $? 0
ovm --trace ineffective-if.hir | ansi2txt$ ovm --trace ineffective-if.hir | ansi2txt /bin/sh: 1: ovm: not found $ echo $? 0
havm --trace ineffective-if.hir$ havm --trace ineffective-if.hir checkingLow plaining unparsing checking evaling call ( name main ) [] 10.6-43.15: eseq 11.6-42.13: seq 12.8-30.15: seq 14.12-14.19: const 0 15.12-15.19: const 0 13.10-17.19: cjump ne 0 0 ( name l3 ) ( name l4 ) 26.12-26.19: const 0 27.12-27.19: const 0 25.10-29.19: cjump ne 0 0 ( name l0 ) ( name l1 ) 40.10-40.17: const 1 38.8-40.17: move ( temp t0 ) 1 41.8-41.16: label l2 12.8-30.15: seq end 31.8-31.16: label l0 34.10-34.17: const 0 32.8-34.17: move ( temp t0 ) 0 35.8-36.17: jump ( name l2 ) 11.6-42.13: seq end 43.8-43.15: temp t0 = 0 8.4-44.12: call ( name print_int ) [0] 8.4-44.12: end call ( name print_int ) [0] = () 7.2-44.12: sxp () 46.4-46.11: const 0 45.2-46.11: sxp 0 end call ( name main ) [] = 0 0 $ echo $? 0
- Proposed solutions:
To address these problems, we chose to use exceptions, a mechanism that is not idiomatic in the Haskell programming language for managing control flow. The approach involves raising an exception when encountering a
jump
. This exception is caught at a point where the code for thejump
’s destination can be evaluated, effectively bypassing the control flow of the previously executing code.One of the encountered problems was getting to the destination quickly. A possible way was scanning all the sequences until finding the destination label. But this method is particularly slow.
First approach:
For the first approach, we use a map which maps to a label a destination.
Unlike HAVM, we don’t “plain” the sequences of the program. Instead, we use a stack of sequences. Each cell of the stack contains the end of a sequence, after the destination label or the sub-sequence containing it, which has to be evaluated. The advantage of this is that we can restore the sub-sequences without loosing information for the traces.
In this exemple, the stack contain two sequences, in red:
[C, (seq [D, E])]
[F, G]
first-approach.png
Note that
D
andE
are not in their own cell in the stack, because the sequence they belong to is not currently being evaluated.The exception is caught at the beginning of the program’s evaluation, and then the corresponding stack is fetched and then evaluated.
The problem of this approach is the
eseq
. The stack doesn’t take account of aneseq
, and will not restore it nor evaluate it, even if thejump
doesn’t go out of its statements.Second approach:
In the second approach, instead of catching the
jump
at the beginning of the evaluation, it is also caught by theeseq
.To do so, instead of having a unique map, we have one associated to each
eseq
, with a physical equality. Each map has the labels of the statements in its statement child, excluding those in childeseq
’s statements. When ajump
is caught, if the destination label is not in theeseq
’s map, it is propagated.The exception is also caught at the beginning of the program, but not propagated in this case.