OVM

Warning

OVM is a brand-new tool which started being developed in 2024. It is currently in a period of transition. Bear in mind that bugs may appear, and that some features are intended to evolve.

We recommend you to read HAVM Documentation (possibly use it), the legacy tool, which can complement OVM’s functionality & technical understanding. Take note of known bugs.

OVM is a Tree (HIR or LIR, see Intermediate representation TREE) programs interpreter. OVM is written and maintained by Damien Assire, Mael Cravero and Ghiles Ziat. Its development has been generously supported by IRILL.

Its features are:
  • evaluate (with tracing) both HIR and LIR

  • transform the Tree representation (linearize)

  • perform static checks on the Tree representation

OVM aims to modernize and replace legacy HAVM so that EPITA students could exercise their compiler projects before the final jump to assembly code. It is implemented in OCaml, an industrial-strength functional programming language with an emphasis on expressiveness and safety. OVM was coined on both OCaml, and VM standing for Virtual Machine.

Resources:

Documentation

Here is a brief summary of the documentation for OVM.

Table of Contents

Invoking OVM

To invoke ovm run: ovm OPTIONS FILE.

Where FILE is a simple text file, and OPTIONS is any combination of the following options:

  • -show-ast: Enable the printing of the produced AST before actually evaluating the program.

  • -heap-size: Set the heap size (default : 65536).

  • -stack-size: Set the stack size (default : 16384).

  • -trace: Print the trace execution.

  • -lin: Linearize the program before evaluation.

  • -lir: Check if the program is low-level (reject high level constructs).

  • -unsafe: Do not perform the static checks.

  • -ssa: Check if the program is written using Static Single Assignment.

  • -version: Display the program version.

  • -help, --help: Display this list of options and exit successfully.

Steps of OVM

OVM works in three steps:
  • it scans and parses the file (lexical and syntactical analysis), producing an abstract representation of a program;

  • it performs requested static checks and transformations;

  • it loads the resolved program in the virtual machine, search for an entry point labeled main and start evaluation.

OVM vs HAVM

OVM & HAVM serve the same purpose, however, there are some notable design/technical differences.

Nested jump in recursive structure handling:

One of the problems HAVM had, was the jump s and how they were handled. Two main problems occurred on compiled programs.

Ineffective break:

The first problem with jump in HAVM is with the break in an expression, like this example:

ineffective-break.tig
while 1 do
  print_int((break; 1))
graph "Ineffective break"
   {
    fontname="Source Sans Pro"
    node [fontname="consolas"]
    node [fontsize=10, shape=none, height=0.25, margin=0]

    label="Ineffective break AST"
    nodesep=0

    n002 ;
    n002 [label="seq", width=0.30] ;
    n002 -- n003 ;
    n003 [label="label l1"] ;
    n002 -- n004 ;
    n004 [label="cjump ne"] ;
    n004 -- n005 ;
    n005 [label="const 1"] ;
    n004 -- n006 ;
    n006 [label="const 0"] ;
    n004 -- n007 ;
    n007 [label="name l2"] ;
    n004 -- n008 ;
    n008 [label="name l0"] ;
    n002 -- n009 ;
    n009 [label="label l2"] ;
    n002 -- n010 ;
    n010 [label="sxp"] ;
    n010 -- n011 ;
    n011 [label="call"] ;
    n011 -- n012 ;
    n012 [label="name print_int", width=1] ;
    n011 -- n013 ;
    n013 [label="eseq"] ;
    n013 -- n014 ;
    n014 [label=<<b>jump</b>>] ;
    n014 -- n015 ;
    n015 [label=<<b>name l0</b>>] ;
    n013 -- n016 ;
    n016 [label="const 1"] ;
    n002 -- n017 ;
    n017 [label="jump"] ;
    n017 -- n018 ;
    n018 [label="name l1"] ;
    n002 -- n019 ;
    n019 [label="label l0"] ;
   }

ineffective-break.png

In this example, HAVM evaluates the eseq by evaluating the jump and the statement after it. However, it will then return to the eseq and will continue its evaluation by evaluating the const 1 and returning 1, doing it indefinitely because of the while.

OVM correctly handles this case:

tc -H ineffective-break.tig > ineffective-break.hir
$ tc -H ineffective-break.tig > ineffective-break.hir

$ echo $?
0
ovm ineffective-break.hir
$ ovm ineffective-break.hir
/bin/sh: 1: ovm: not found
$ echo $?
127
Ineffective boolean operator:

Another means to generate a jump that breaks the recursive evaluation is using optimized if, like the following example.

ineffective-if.tig
print_int (if 0 | 0 then 0 else 1)
graph "Ineffective break"
   {
    fontname="Source Sans Pro"
    node [fontname="consolas"]
    node [fontsize=10, shape=none, height=0.25, margin=0]

    label="Ineffective boolean operator AST"
    nodesep=0

    n001 ;
    n001 [label="call"]
    n001 -- n002
    n002 [label="name print_int", width=1]
    n001 -- n003
    n003 [label="eseq"]
    n003 -- n004
    n004 [label="seq", width=0.30]
    n004 -- n005
    n005 [label="seq", width=0.30]
    n005 -- n006
    n006 [label="cjump ne"]
    n006 -- n007
    n007 [label="const 0"]
    n006 -- n008
    n008 [label="const 0"]
    n006 -- n009
    n009 [label="name l3"]
    n006 -- n010
    n010 [label="name l4"]
    n005 -- n032
    n032 [label="label l3"]
    n005 -- n011
    n011 [label="cjump ne"]
    n011 -- n012
    n012 [label="const 1"]
    n011 -- n013
    n013 [label="const 0"]
    n011 -- n014
    n014 [label="name l0"]
    n011 -- n015
    n015 [label="name l1"]
    n005 -- n033
    n033 [label="label l4"]
    n005 -- n016
    n016 [label="cjump ne"]
    n016 -- n017
    n017 [label="const 0"]
    n016 -- n018
    n018 [label="const 0"]
    n016 -- n019
    n019 [label="name l0"]
    n016 -- n020
    n020 [label="name l1"]
    n004 -- n021
    n021 [label="label l0"]
    n004 -- n022
    n022 [label="move", width=0.45]
    n022 -- n023
    n023 [label="temp t0"]
    n022 -- n024
    n024 [label="const 0"]
    n004 -- n025
    n025 [label="jump", width=0.45]
    n025 -- n026
    n026 [label="name l2"]
    n004 -- n027
    n027 [label="label l1"]
    n004 -- n028
    n028 [label="move", width=0.45]
    n028 -- n029
    n029 [label="temp t0"]
    n028 -- n030
    n030 [label="const 1"]
    n004 -- n031
    n031 [label="label l2"]
   }

ineffective-if.png

OVM correctly handles this case, compare the traces below:

tc -H ineffective-if.tig > ineffective-if.hir
$ tc -H ineffective-if.tig > ineffective-if.hir

$ echo $?
0
ovm --trace ineffective-if.hir | ansi2txt
$ ovm --trace ineffective-if.hir | ansi2txt
/bin/sh: 1: ovm: not found
$ echo $?
0
havm --trace ineffective-if.hir
$ havm --trace ineffective-if.hir
checkingLow
plaining
unparsing
checking
evaling
 call ( name main ) []
10.6-43.15:  eseq
11.6-42.13:  seq
12.8-30.15:  seq
14.12-14.19:  const 0
15.12-15.19:  const 0
13.10-17.19:  cjump ne 0 0 ( name l3 ) ( name l4 )
26.12-26.19:  const 0
27.12-27.19:  const 0
25.10-29.19:  cjump ne 0 0 ( name l0 ) ( name l1 )
40.10-40.17:  const 1
38.8-40.17:  move ( temp t0 ) 1
41.8-41.16:  label l2
12.8-30.15:  seq end
31.8-31.16:  label l0
34.10-34.17:  const 0
32.8-34.17:  move ( temp t0 ) 0
35.8-36.17:  jump ( name l2 )
11.6-42.13:  seq end
43.8-43.15:  temp t0 = 0
8.4-44.12:  call ( name print_int ) [0]
8.4-44.12:  end call ( name print_int ) [0] = ()
7.2-44.12:  sxp ()
46.4-46.11:  const 0
45.2-46.11:  sxp 0
 end call ( name main ) [] = 0
0
$ echo $?
0
Proposed solutions:

To address these problems, we chose to use exceptions, a mechanism that is not idiomatic in the Haskell programming language for managing control flow. The approach involves raising an exception when encountering a jump. This exception is caught at a point where the code for the jump’s destination can be evaluated, effectively bypassing the control flow of the previously executing code.

One of the encountered problems was getting to the destination quickly. A possible way was scanning all the sequences until finding the destination label. But this method is particularly slow.

  • First approach:

    For the first approach, we use a map which maps to a label a destination.

    Unlike HAVM, we don’t “plain” the sequences of the program. Instead, we use a stack of sequences. Each cell of the stack contains the end of a sequence, after the destination label or the sub-sequence containing it, which has to be evaluated. The advantage of this is that we can restore the sub-sequences without loosing information for the traces.

    In this exemple, the stack contain two sequences, in red:

    • [C, (seq [D, E])]

    • [F, G]

    graph {
    fontname="Source Sans Pro"
    node [fontname="consolas"]
    node [fontsize=10, shape=none, height=0.25, margin=0]

    n001 [label="seq", width=0.3] ;
    n001 -- n002 ;
    n002 [label="A", width=0.2]
    n001 -- n003
    n003 [label="seq", width=0.3]
    n003 -- n004
    n004 [label="seq", width=0.3]
    n004 -- n005
    n005 [label="jump", width=0.45]
    n005 -- n006
    n006 [label="name l0"]
    n004 -- n007
    n007 [label="B", width=0.2]
    n003 -- n008
    n008 [label="label l0"]

    subgraph {
        cluster=true ;
        shape=box
        node [fontcolor="red2"]
        n003 -- n009
        n009 [label="C", width=0.2]
        n003 -- n010
        n010 [label="seq", width=0.3]
        n010 -- n011
        n011 [label="D", width=0.2]
        n010 -- n012
        n012 [label="E", width=0.2]
        n001 -- n013
        n013 [label="F", width=0.2]
        n001 -- n014
        n014 [label="G", width=0.2]
    }
}

    first-approach.png

    Note that D and E are not in their own cell in the stack, because the sequence they belong to is not currently being evaluated.

    The exception is caught at the beginning of the program’s evaluation, and then the corresponding stack is fetched and then evaluated.

    The problem of this approach is the eseq. The stack doesn’t take account of an eseq, and will not restore it nor evaluate it, even if the jump doesn’t go out of its statements.

  • Second approach:

    In the second approach, instead of catching the jump at the beginning of the evaluation, it is also caught by the eseq.

    To do so, instead of having a unique map, we have one associated to each eseq, with a physical equality. Each map has the labels of the statements in its statement child, excluding those in child eseq’s statements. When a jump is caught, if the destination label is not in the eseq’s map, it is propagated.

    The exception is also caught at the beginning of the program, but not propagated in this case.