Typed high-level intermediate representation (THIR)

The THIR is used as a transition phase between the HIR and the MIR. It progressively rewrites the IR into lower-level constructs until it can be readily lowered into MIR. Compared to HIR, the THIR is:

  • Combined (there is only one Model which contains all of the items from all model files)
  • Fully typed (expressions always have full, correct types - we abort compilation if there were errors)
  • Destructured - declaration items can only declare a single identifier
  • Fully resolved (identifiers do not have their own names, they simply point to the item they are for)
  • Desugared (since type information is now available, we can perform more desugarings)
  • Expressions are boxed rather than arena allocated

The language constructs made available in THIR are an attempt to provide rewriting stages with a convenient representation which is not too high-level and complex to process, but not so low-level as to be cumbersome to use.

Structure

The root THIR node is the Model which contains arenas for each kind of item. Since there is only one Model, the item indices now globally identify the items in the program. Each item contains storage for the expressions it owns in its ExpressionAllocator. Expressions contain their types, which are computed during construction. A builder API is provided (ExpressionBuilder) for creating expressions.

Lowering from HIR

Lowering involves adding the items from the HIR into the THIR model. This is done in topologically sorted order, so we only have to visit each item once. One exception is that functions with bodies are lowered in two stages: first, the function signature is added, then the body is added to it after all items have been processed. Otherwise identifiers in the function bodies could refer to items not yet added to the THIR.

Desugarings

The THIR involves several semantic desugarings from the HIR.

Destructuring variable declarations are rewritten using multiple declarations:

HIR syntaxDesugaring
any: (x, (y, z)) =
  (1, (2, 3));
tuple(int, tuple(int, int)): A =
  (1, (2, 3));
int: x = A.1;
tuple(int, int): B = A.2;
int: y = B.1;
int: z = B.2;

Type aliases are removed as they are resolved:

HIR syntaxDesugaring
type Foo = tuple('..'(1, 3), '..'(2, 4));
var Foo: x;
set of int: A = '..'(2, 4);
set of int: B = '..'(1, 3);
tuple(var A, var B): x;

2D array literals are rewritten using array2d:

MiniZinc syntaxDesugaring
[| 1, 2
 | 3, 4 |]
array2d('..'(1, 2), '..'(1, 2), [1, 2, 3, 4])
[|    c: d:
 | a: 1, 2
 | b: 3, 4 |]
array2d([a, b], [c, d], [1, 2, 3, 4])

Indexed array literals are rewritten using arrayNd:

MiniZinc syntaxDesugaring
[3: a, b, c]
arrayNd(3, [a, b, c])
[3: a, 4: b, 5: c]
arrayNd([3, 4, 5], [a, b, c])

Array access is rewritten into a call to '[]':

HIR syntaxDesugaring
any: x = [1, 2, 3];
any: y = x[1];
array [int] of int: x = [1, 2, 3];
array [int] of int: y = '[]'(1);

Slicing is rewritten using mzn_slice function calls:

HIR syntaxDesugaring
any: x = [|1, 2 | 3, 4 |];
any: y = x['..'(1, 2), '..'(1, 1)];
array [int] of int: x = array2d(
  '..'(1, 2),
  '..'(1, 2),
  [1, 2, 3, 4]
);
array [int] of int: y = let {
  set of int: A = '..'(1, 2);
  set of int: B = '..'(1, 1);
} in array2d(A, B, mzn_slice(x, [A, B]));
any: x = [|1, 2 | 3, 4|];
any: y = x[..<, 2];
array [int] of int: x = array2d(
  '..'(1, 2),
  '..'(1, 2),
  [1, 2, 3, 4]
);
array [int] of int: y = let {
  set of int: A = '..<'(index_set(x));
  set of int: B = {2};
} in array1d(A, mzn_slice(x, [A, B]));

Performing a tuple/record access on an array is rewritten to perform the field access on each element of the array using a comprehension:

HIR syntaxDesugaring
any: x = [(1, "a"), (2, "b")];
any: y = x.1;
array [int] of tuple(int, string): x =
  [(1, "a"), (2, "b")];
array [int] of int: y = [v.1 | v in x];
any: x = [
  (a: 1, b: "hello"),
  (a: 2, b: "world")
];
any: y = x.a;
array [int] of record(int: a, string: b): x = [
  (a: 1, b: "hello"),
  (a: 2, b: "world")
];
array [int] of int: y = [v.a | v in x];

Case expressions are rewritten such that the destructuring is moved into the branch RHS, and pattern identifiers which create new variables are replaced with the wildcard _ pattern:

HIR syntaxDesugaring
enum Foo = A(Bar);
enum Bar = {B};
var Foo: x;
any: y = case x of
  A(v) => v
endcase;
enum Bar = { B };
enum Foo = A(Bar);
var Foo: x;
var Bar: y = case x of
  A(_) => let {
    var Bar: v = A⁻¹(x);
  } in v
endcase;

Pattern matching in comprehension generators is rewritten using a case expression in the generator's where clause. Destructuring is rewritten as assignment generators:

HIR syntaxDesugaring
enum Foo = A(Bar);
enum Bar = {B};
any: x = [v | A(v) in Foo];
enum Bar = { B };
enum Foo = A(Bar);
array [int] of Bar: x = [
  v | i in Foo
      where case i of
              A(_) => true,
              _ => false
            endcase,
        v = A⁻¹(i)
];

Model transformations

At the THIR level, several model-to-model transformations occur which progressively remove language constructs which do not exist in the mid-level intermediate representation.

The order of these transforms is often important as certain information or language constructs may be removed in one transform, and so cannot be used in subsequent transforms.

The transforms which occur are:

See Model transformations for more details.