OVM

Warning

OVM is a brand-new tool which started being developed in 2024. It is currently in a period of transition. Bear in mind that bugs may appear, and that some features are intended to evolve.

We recommend you to read HAVM Documentation (possibly use it), the legacy tool, which can complement OVM’s functionality & technical understanding. Take note of known bugs.

OVM is a Tree (HIR or LIR, see Intermediate representation TREE) programs interpreter. OVM is written and maintained by Damien Assire, Mael Cravero and Ghiles Ziat. Its development has been generously supported by IRILL.

Its features are:

evaluate (with tracing) both HIR and LIR
transform the Tree representation (linearize)
perform static checks on the Tree representation

OVM aims to modernize and replace legacy HAVM so that EPITA students could exercise their compiler projects before the final jump to assembly code. It is implemented in OCaml, an industrial-strength functional programming language with an emphasis on expressiveness and safety. OVM was coined on both OCaml, and VM standing for Virtual Machine.

Resources:

Required version is OVM 0.1
OVM Documentation
OVM Repository
Feedback can be sent to LRE’s Projects Address.

Documentation

Here is a brief summary of the documentation for OVM.

Invoking OVM

To invoke ovm run: ovm OPTIONS FILE.

Where FILE is a simple text file, and OPTIONS is any combination of the following options:

--show-ast: Enable the printing of the produced AST before actually evaluating the program.
--quiet: Disable detailed error output.
--memory-size: Set the memory size (default : 81920).
--trace: Print the trace execution.
--lin: Linearize the program before evaluation.
--lir: Check if the program is low-level (reject high level constructs).
--unsafe: Do not perform the static checks.
--ssa: Check if the program is written using Static Single Assignment.
--version: Display the program version.
-help, --help: Display this list of options and exit successfully.

Steps of OVM

OVM works in three steps:

it scans and parses the file (lexical and syntactical analysis), producing an abstract representation of a program;
it performs requested static checks and transformations;
it loads the resolved program in the virtual machine, search for an entry point labeled main and start evaluation.

OVM vs HAVM

OVM & HAVM serve the same purpose, however, there are some notable design/technical differences.

Nested jump in recursive structure handling:

One of the problems HAVM had, was the jump s and how they were handled. Two main problems occurred on compiled programs.

Ineffective break:

The first problem with jump in HAVM is with the break in an expression, like this example:

ineffective-break.tig

while 1 do
  print_int((break; 1))

graph "Ineffective break"
{
fontname="Source Sans Pro"
node [fontname="consolas"]
node [fontsize=10, shape=none, height=0.25, margin=0]

label="Ineffective break AST"
nodesep=0

n002 ;
n002 [label="seq", width=0.30] ;
n002 -- n003 ;
n003 [label="label l1"] ;
n002 -- n004 ;
n004 [label="cjump ne"] ;
n004 -- n005 ;
n005 [label="const 1"] ;
n004 -- n006 ;
n006 [label="const 0"] ;
n004 -- n007 ;
n007 [label="name l2"] ;
n004 -- n008 ;
n008 [label="name l0"] ;
n002 -- n009 ;
n009 [label="label l2"] ;
n002 -- n010 ;
n010 [label="sxp"] ;
n010 -- n011 ;
n011 [label="call"] ;
n011 -- n012 ;
n012 [label="name print_int", width=1] ;
n011 -- n013 ;
n013 [label="eseq"] ;
n013 -- n014 ;
n014 [label=<<b>jump</b>>] ;
n014 -- n015 ;
n015 [label=<<b>name l0</b>>] ;
n013 -- n016 ;
n016 [label="const 1"] ;
n002 -- n017 ;
n017 [label="jump"] ;
n017 -- n018 ;
n018 [label="name l1"] ;
n002 -- n019 ;
n019 [label="label l0"] ;
} — ineffective-break.png

In this example, HAVM evaluates the eseq by evaluating the jump and the statement after it. However, it will then return to the eseq and will continue its evaluation by evaluating the const 1 and returning 1, doing it indefinitely because of the while.

OVM correctly handles this case:

tc -H ineffective-break.tig > ineffective-break.hir

$ tc -H ineffective-break.tig > ineffective-break.hir

$ echo $?
0

ovm ineffective-break.hir

$ ovm ineffective-break.hir

$ echo $?
0

Ineffective boolean operator:

Another means to generate a jump that breaks the recursive evaluation is using optimized if, like the following example.

ineffective-if.tig

print_int (if 0 | 0 then 0 else 1)

graph "Ineffective break"
{
fontname="Source Sans Pro"
node [fontname="consolas"]
node [fontsize=10, shape=none, height=0.25, margin=0]

label="Ineffective boolean operator AST"
nodesep=0

n001 ;
n001 [label="call"]
n001 -- n002
n002 [label="name print_int", width=1]
n001 -- n003
n003 [label="eseq"]
n003 -- n004
n004 [label="seq", width=0.30]
n004 -- n005
n005 [label="seq", width=0.30]
n005 -- n006
n006 [label="cjump ne"]
n006 -- n007
n007 [label="const 0"]
n006 -- n008
n008 [label="const 0"]
n006 -- n009
n009 [label="name l3"]
n006 -- n010
n010 [label="name l4"]
n005 -- n032
n032 [label="label l3"]
n005 -- n011
n011 [label="cjump ne"]
n011 -- n012
n012 [label="const 1"]
n011 -- n013
n013 [label="const 0"]
n011 -- n014
n014 [label="name l0"]
n011 -- n015
n015 [label="name l1"]
n005 -- n033
n033 [label="label l4"]
n005 -- n016
n016 [label="cjump ne"]
n016 -- n017
n017 [label="const 0"]
n016 -- n018
n018 [label="const 0"]
n016 -- n019
n019 [label="name l0"]
n016 -- n020
n020 [label="name l1"]
n004 -- n021
n021 [label="label l0"]
n004 -- n022
n022 [label="move", width=0.45]
n022 -- n023
n023 [label="temp t0"]
n022 -- n024
n024 [label="const 0"]
n004 -- n025
n025 [label="jump", width=0.45]
n025 -- n026
n026 [label="name l2"]
n004 -- n027
n027 [label="label l1"]
n004 -- n028
n028 [label="move", width=0.45]
n028 -- n029
n029 [label="temp t0"]
n028 -- n030
n030 [label="const 1"]
n004 -- n031
n031 [label="label l2"]
} — ineffective-if.png

OVM correctly handles this case, compare the traces below:

tc -H ineffective-if.tig > ineffective-if.hir

$ tc -H ineffective-if.tig > ineffective-if.hir

$ echo $?
0

ovm --trace ineffective-if.hir | ansi2txt

$ ovm --trace ineffective-if.hir | ansi2txt
Parsing ineffective-if.hir
eval seq
  eval sxp
    evalCall print_int
      evalEseq
        eval seq
          eval seq
            evalCjump
              const 0
              const 0
            ne 0 0
            Jump to label l4            
            label l4
            evalCjump
              const 0
              const 0
            ne 0 0
            Jump to label l1          
          label l1
          eval move
            const 1
          move : temp t0 <- 1
          label l2
        end seq
        temp t0 = 1
      Eseq -> 1
    call print_int 1
    print_int 1 -> void
  sxp : void
  eval sxp
    const 0
  sxp : 0
end seq
label end
Exit with code : 0

Temporaries state :
fp : 81916
sp : 81916
rv : 0
t0 : 1
1
$ echo $?
0

havm --trace ineffective-if.hir

$ havm --trace ineffective-if.hir
checkingLow
plaining
unparsing
checking
evaling
 call ( name main ) []
10.6-43.15:  eseq
11.6-42.13:  seq
12.8-30.15:  seq
14.12-14.19:  const 0
15.12-15.19:  const 0
13.10-17.19:  cjump ne 0 0 ( name l3 ) ( name l4 )
26.12-26.19:  const 0
27.12-27.19:  const 0
25.10-29.19:  cjump ne 0 0 ( name l0 ) ( name l1 )
40.10-40.17:  const 1
38.8-40.17:  move ( temp t0 ) 1
41.8-41.16:  label l2
12.8-30.15:  seq end
31.8-31.16:  label l0
34.10-34.17:  const 0
32.8-34.17:  move ( temp t0 ) 0
35.8-36.17:  jump ( name l2 )
11.6-42.13:  seq end
43.8-43.15:  temp t0 = 0
8.4-44.12:  call ( name print_int ) [0]
8.4-44.12:  end call ( name print_int ) [0] = ()
7.2-44.12:  sxp ()
46.4-46.11:  const 0
45.2-46.11:  sxp 0
 end call ( name main ) [] = 0
0
$ echo $?
0

Proposed solutions:

To address these problems, we chose to use exceptions, a mechanism that is not idiomatic in the Haskell programming language for managing control flow. The approach involves raising an exception when encountering a jump. This exception is caught at a point where the code for the jump’s destination can be evaluated, effectively bypassing the control flow of the previously executing code.

One of the encountered problems was getting to the destination quickly. A possible way was scanning all the sequences until finding the destination label. But this method is particularly slow.

First approach:

For the first approach, we use a map which maps to a label a destination.

Unlike HAVM, we don’t “plain” the sequences of the program. Instead, we use a stack of sequences. Each cell of the stack contains the end of a sequence, after the destination label or the sub-sequence containing it, which has to be evaluated. The advantage of this is that we can restore the sub-sequences without loosing information for the traces.

In this example, the stack contain two sequences, in red:
- [C, (seq [D, E])]
- [F, G]
first-approach.png

Note that D and E are not in their own cell in the stack, because the sequence they belong to is not currently being evaluated.

The exception is caught at the beginning of the program’s evaluation, and then the corresponding stack is fetched and then evaluated.

The problem of this approach is the eseq. The stack doesn’t take account of an eseq, and will not restore it nor evaluate it, even if the jump doesn’t go out of its statements.
Second approach:

In the second approach, instead of catching the jump at the beginning of the evaluation, it is also caught by the eseq.

To do so, instead of having a unique map, we have one associated to each eseq, with a physical equality. Each map has the labels of the statements in its statement child, excluding those in child eseq’s statements. When a jump is caught, if the destination label is not in the eseq’s map, it is propagated.

The exception is also caught at the beginning of the program, but not propagated in this case.