The Shackle Almanac

This booklet contains the (early) design and documentation of the Shackle constraint modelling framework. The Shackle project was started as a redesign of the MiniZinc compiler, and the handling the MiniZinc language remains its primary goal. Early research into a design of such a framework was conducted in Jip J. Dekker's thesis: "A Modern Architecture for Constraint Modelling Languages" The intention of this document is to be updated throughout the design process as to both keep a history of the design of this framework and make this a start of the internal documentation of the framework.

Acknowledgements

This research was partially funded by the Australian Government through the Australian Research Council Industrial Transformation Training Centre in Optimisation Technologies, Integrated Methodologies, and Applications (OPTIMA), Project ID IC200100009.

The compilation process

The Shackle compiler takes a MiniZinc model and compiles it into a bytecode program which is interpreted to produce FlatZinc for solvers. Compilation is divided into modules dealing with each of the major source code representations. Frontend analysis predominantly uses the high-level intermediate representation (HIR). The typed high-level (THIR) and mid-level (MIR) intermediate representations deal with transforming the MiniZinc program into MicroZinc, then finally the bytecode generation step takes place to produce the final compiled program. This program is data-independent. The interpreter parses the data runs the bytecode program to generates NanoZinc and ultimately FlatZinc for the solver.

flowchart TD;
  mzn>MiniZinc model files]--Parsing-->cst["Parse tree (CST)"]

  cst-->ast["Abstract syntax tree (AST)"]
  ast--"Lowering (syntactic desugaring)"-->hir["High-level intermediate representation"]

  subgraph parser [Parsing]
    cst
    ast
  end

  hir-->validation[/"Validation (e.g. type checking)"/]
  validation-->thir["Typed high-level intermediate representation"]
  validation-->ls["Code tools (e.g. language server)"]
  thir-->transform[/"Basic transformations"/]
  transform-->mir["Middle-level intermediate representation"]
  mir-->analysis[/"Code analysis (e.g. mode analysis)"/]
  analysis-->rewrite[/"Rewriting"/]
  rewrite-->uzn[MicroZinc]
  uzn-->codegen[/"Code generation"/]
  codegen-->bc[Bytecode]

  subgraph compiler [Data independent compilation]
    hir
    thir
    mir
    validation
    transform
    analysis
    rewrite
    uzn
    codegen
    bc
  end

  dzn>Data files]
  bc--->interpret
  dzn-->interpret
  interpret[/Interpretation/]
  interpret-->nzn[NanoZinc]

  subgraph interpreter [Interpreter]
    interpret
    nzn
  end

  nzn-->solver[FlatZinc solver]
  solver-->result[Result]
  result-->heuristic[Search heuristic]
  heuristic-->solution(Solution)
  heuristic--->interpret

  subgraph solving [Solving]
    solver
    result
    heuristic
    solution
  end

Compilation stages

Parsing
tree-sitter is used to generate a CST from a MiniZinc model.
Abstract syntax tree
Type-safe accessors for the CST, used for include resolution.
High-level intermediate representation
A syntactic desugaring phase for the first main intermediate representation.
Includes scope collection, name resolution, function resolution, type checking, and various validation checks.
Typed high-level intermediate representation
A semantic desugaring phase which combines the HIR nodes with their computed types.
Includes type specialisation and removal of optional and enum types, as well as other model-level transformations.
Mid-level intermediate representation
The MiniZinc THIR is transformed into a MicroZinc AST.
Bytecode generation
Code generation of a program which will be interpreted to generate the final FlatZinc.
Bytecode interpretation
The bytecode along with the data is interpreted to produce NanoZinc and later FlatZinc or any other format for solver backends.

Query-based architecture

The compiler utilises Salsa to provide a demand-driven, incremental architecture. Salsa tracks the dependencies for queries, and memoises query results to avoid recomputation where possible. This is especially useful for the language server. The compiler frontend up to the HIR stage is designed to be incremental with respect to changing input files - if a model is changed, then only the changed portion, and anything that depended on it will be recomputed.

This means we use a 'pull' architecture - tasks are done lazily when demanded. The main query produces the compiled program, requires the MIR, which in turn requires the THIR, which in turn requires the HIR, which in turn requires the parsed model files. The CompilerDatabase is the main query database for the compiler. The actual queries are defined in the database trait for the module they belong to.

Development

THIR can be pretty-printed to produce a model file which can be fed to the old compiler
MIR should be able to do the same
This allows us to test the compiler without having to also write the interpreter

Other compilers

Much of the design of the compiler has been influenced by other projects, particularly

Parsing

Tree-sitter is used to generate a concrete syntax tree from a MiniZinc model. The grammar is located in the parsers/tree-sitter-minizinc/grammar.js file. There is also a corpus of tests which can be run to test the parser's output.

Since this concrete syntax tree is too low level to perform most useful compilation steps, an abstract syntax tree will be constructed (with the AST nodes linked to the related CST nodes) during AST generation.

Example

The model

test foo() = true;
function bool: bar() = true;
var 1..3: x;

Gives the CST

kind="source_file", named=true, has_error=false, error=false, missing=false, extra=false, field=None
  kind="predicate", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("item")
    kind="test", named=false, has_error=false, error=false, missing=false, extra=false, field=Some("type")
    kind="identifier", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("name")
    kind="(", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind=")", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind="=", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind="boolean_literal", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("body")
      kind="true", named=false, has_error=false, error=false, missing=false, extra=false, field=None
  kind=";", named=false, has_error=false, error=false, missing=false, extra=false, field=None
  kind="function_item", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("item")
    kind="function", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind="type_base", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("type")
      kind="primitive_type", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("domain")
        kind="bool", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind=":", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind="identifier", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("name")
    kind="(", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind=")", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind="=", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind="boolean_literal", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("body")
      kind="true", named=false, has_error=false, error=false, missing=false, extra=false, field=None
  kind=";", named=false, has_error=false, error=false, missing=false, extra=false, field=None
  kind="declaration", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("item")
    kind="type_base", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("type")
      kind="var", named=false, has_error=false, error=false, missing=false, extra=false, field=Some("var_par")
      kind="infix_operator", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("domain")
        kind="integer_literal", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("left")
        kind="..", named=false, has_error=false, error=false, missing=false, extra=false, field=Some("operator")
        kind="integer_literal", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("right")
    kind=":", named=false, has_error=false, error=false, missing=false, extra=false, field=None
    kind="identifier", named=true, has_error=false, error=false, missing=false, extra=false, field=Some("name")
  kind=";", named=false, has_error=false, error=false, missing=false, extra=false, field=None

AST generation

After parsing, a MiniZinc AST is generated from the CST. This provides type-safe accessors to the nodes in the syntax tree. No desugaring takes place at this stage, and all semantic nodes are are made available other than parentheses (which made explicit in the tree structure). So nodes like whitespace, comments and semicolons from the CST are removed in the AST.

This is still too low level for most analysis, and there several constructs which are semantically the same, but with different syntactic representations. Therefore, the next step is to resolve the include items, and then lower each model into HIR.

Example

The model

test foo() = true;
function bool: bar() = true;
var 1..3: x;

Gives the AST

MznModel(
    Model {
        items: [
            Predicate(
                Predicate {
                    cst_kind: "predicate",
                    declared_type: Test,
                    id: UnquotedIdentifier(
                        UnquotedIdentifier {
                            cst_kind: "identifier",
                            name: "foo",
                        },
                    ),
                    parameters: [],
                    body: Some(
                        BooleanLiteral(
                            BooleanLiteral {
                                cst_kind: "boolean_literal",
                                value: true,
                            },
                        ),
                    ),
                    annotations: [],
                },
            ),
            Function(
                Function {
                    cst_kind: "function_item",
                    return_type: TypeBase(
                        TypeBase {
                            cst_kind: "type_base",
                            var_type: None,
                            opt_type: None,
                            any_type: false,
                            domain: Unbounded(
                                UnboundedDomain {
                                    cst_kind: "primitive_type",
                                    primitive_type: Bool,
                                },
                            ),
                        },
                    ),
                    id: UnquotedIdentifier(
                        UnquotedIdentifier {
                            cst_kind: "identifier",
                            name: "bar",
                        },
                    ),
                    parameters: [],
                    body: Some(
                        BooleanLiteral(
                            BooleanLiteral {
                                cst_kind: "boolean_literal",
                                value: true,
                            },
                        ),
                    ),
                    annotations: [],
                },
            ),
            Declaration(
                Declaration {
                    cst_kind: "declaration",
                    pattern: Identifier(
                        UnquotedIdentifier(
                            UnquotedIdentifier {
                                cst_kind: "identifier",
                                name: "x",
                            },
                        ),
                    ),
                    declared_type: TypeBase(
                        TypeBase {
                            cst_kind: "type_base",
                            var_type: Some(
                                Var,
                            ),
                            opt_type: None,
                            any_type: false,
                            domain: Bounded(
                                InfixOperator(
                                    InfixOperator {
                                        cst_kind: "infix_operator",
                                        left: IntegerLiteral(
                                            IntegerLiteral {
                                                cst_kind: "integer_literal",
                                                value: Ok(
                                                    1,
                                                ),
                                            },
                                        ),
                                        operator: Operator {
                                            cst_kind: "..",
                                            name: "..",
                                        },
                                        right: IntegerLiteral(
                                            IntegerLiteral {
                                                cst_kind: "integer_literal",
                                                value: Ok(
                                                    3,
                                                ),
                                            },
                                        ),
                                    },
                                ),
                            ),
                        },
                    ),
                    definition: None,
                    annotations: [],
                },
            ),
        ],
    },
)

Include resolution

Include resolution involves recursively going through the include items in a MiniZinc model, and finding the linked model files. There are also two 'implied' includes which make the standard library available. These are stdlib.mzn and solver_redefinitions.mzn. While this process is described here as part of the AST module, as the include items are extracted from the AST, it actually produces ModelRef IDs, which are part of the HIR module.

Include resolution gives a list of ModelRefs which comprises all of the model files to be lowered into HIR. Logically these model files get concatenated together, however we don't actually do this until the THIR stage. The list of models produced is in the order that the models appear.

Resolution rules

The resolution method used depends on the kind of path used in the include item. Note that relative includes are never resolved relative to the process's current directory, only to the current model file (so string models without files cannot use ./path/to/model.mzn includes).

Include	Resolution method
`include "foo.mzn";`	Resolved relative to include search directories If not found, resolved relative to current file
`include "./foo.mzn";`	Resolved relative to current file
`include "/path/to/foo.mzn";`	Absolute path used

Cyclic includes

As all models are locally concatenated together, it actually does not matter if there are includes which are cyclic. We only ever add any particular model file to the list of models once, so cycles are simply ignored.

Error handling

Errors during include resolution prevent lowering to HIR from occurring (as without having all of the included files available, it is highly likely that uninformative name resolution and type errors will occur). Note that when we fail to resolve an include, we actually don't abort include resolution. This way, we can produce errors for as many missing files as possible.

High-level intermediate representation (HIR)

The HIR is the representation used for validating the input MiniZinc program. Name resolution and type checking results are used by the language server to provide type information and diagnostics. This phase of the compiler is incremental with respect to changing model code. When an item hasn't changed (e.g. white-space changed, or the change happened in a different file), then results of relevant analyses can be reused.

Structure and IDs

HIR nodes are assigned 'IDs' which are generally based on numeric indices which can be used to access the node itself. The HIR is structured such that each .mzn file gets its own Model, each of which which gets a ModelRef for its ID. Models contain arenas to store each kind of top-level item. Local to each model, items can be referred to using their arena index, and the LocalItemRef type can be used to store the index of any item. The ItemRef type includes both the ModelRef and the LocalItemRef for an item, meaning it can be used to globally identify any item.

Expressions, user-ascribed types, and patterns (called HIR 'entities') are stored in arenas inside each item, and can be locally referred to using their arena indices. To refer to one of these entities globally, the ExpressionRef, TypeRef and PatternRef IDs can be used. EntityRef can be used to store a reference to any of these entities. It should be noted that local items inside let expressions use their parent's expression storage and do not themselves store their expressions internally like top-level items.

There is also a NodeRef type for referencing an arbitrary HIR node (i.e. models, items, entities). This is mainly useful for tracking origins of nodes.

Types in the HIR

The Types used in the HIR represent user-written types, and may be incomplete. This is in contrast to the Ty type, which is type computed by the type-checker, and is always complete.

Consider:

any: x = 1;

In this case, the Type for the declaration of x will be any (set during lowering), but the Ty will be par int (computed during type checking).

Lowering from the AST

Lowering from AST is done on a per-model basis, accessed using the db.lookup_model(model_ref) query. The actual HIR trees are kept separate from the source-mapping back to AST, which allows analysis queries to ignore the source-mapping (unless they are emitting diagnostics) and avoid unnecessary recomputation when source locations change.

Lowering AST expressions is done by walking the AST and allocating the HIR expressions bottom up. We also apply some syntactic desugarings during this process. Patterns and types are also lowered in a similar way.

As the HIR nodes are built up, a SourceMap containing the original AST nodes and the type of desugaring which occurred (if any) is populated. This source mapping can be accessed with db.lookup_source_map(model_ref). This should only be done if a diagnostic needs to be produced.

One complication in lowering is that we use different representations for enum assignment items and parameter assignment items, even though we cannot distinguish them syntactically. To deal with this, we keep track of the names of the enum items in the AST, so that we can detect what kind of assignment item is being used.

A further complication is that type-inst identifiers in MiniZinc do not get 'pre-declared', and may simply appear in a function signature. When lowering function items, we have to keep track of the type-inst identifiers and what positions they are used in to obtain the implied type-inst identifier 'declarations'.

Desugarings

The HIR involves several syntactic desugarings from the AST.

Note that only desugarings which cannot cause future errors referring to non-user written constructs to occur are permitted, in order to avoid emitting confusing error messages, or needing to track desugarings and specialise error messages for them.

For example, we could desugar 2D array literals into calls to array2d at this point, however, if the user were to use an invalid index set type, as in [|"a": "b": | 2: x, 3: y |], we would immediately rewrite that into array2d([2, 3], ["a", "b"], [x, y]), which would give a type error indicating that no overload of array2d exists for the given argument types, even though the user has not called array2d at all.

Predicate/test items are rewritten into function items:

MiniZinc syntax	Desugaring
`predicate foo(); test bar();`	`function var bool: foo(); function bool: bar();`

Unary/binary operators are rewritten as calls:

MiniZinc syntax	Desugaring
`a + b`	`'+'(a, b)`
`-a`	`'-'(a)`
`a..`	`'..o'(a)`

Generator calls are rewritten as calls with comprehension arguments:

MiniZinc syntax	Desugaring
`forall (i in 1..3) (foo(i))`	`forall([foo(i) \| i in 1..3])`

String interpolation is rewritten using concat and show:

MiniZinc syntax	Desugaring
`"foo\(value)bar"`	`concat(["foo", show(value), "bar"])`

Analysis

Analysis of the HIR is generally done on a per-item basis. This allows us to avoid recomputation if an item hasn't changed (since even after lowering a model again, many items may still be the same). Additionally, analysis at this stage is designed to be as robust to errors as possible. For example, scope collection errors do not prevent type checking, and the type-checker tries to continue as much as possible in the presence of type errors.

Scope collection

Scope collection determines the identifiers in scope for each expression in an item (and also the identifiers in the global scope). This can be accessed using the db.lookup_item_scope(item_ref) query. Note that this does not perform name resolution - that is done in the type-checker as identifiers in call expressions cannot be resolved until the argument types are known (due to overloading).

We also check to ensure that variable declarations and function parameters only use irrefutable patterns (i.e. those which always match every value of the type).

See Scope collection for more detail about the process.

Type checking

Type checking is done for each HIR item, accessed through the db.lookup_item_types(item_ref) query. It produces:

A mapping from expression indices to their computed types
A mapping from pattern indices to their computed types
- Note that these types are augmented to include more information about the pattern (e.g. so you can tell the difference between a pattern forming a variable declaration and a pattern that forms a type alias).
A mapping from identifier expression indices to their declaring PatternRef
A mapping from identifier pattern indices to their declaring PatternRef (e.g. for enum atoms/constructor functions)

See Type checking for more detail about the process.

Case expression exhaustiveness checking

Case expressions inside items are checked to ensure that they are exhaustive.

For example

enum Foo = {A, B, C} ++ D(Bar);
enum Bar = {E, F};
var Foo: x;
any: y = case x of
    A => 1,
    B => 2,
    D(E) => 3,
endcase;

Is missing the case for the pattern C and the pattern D(F).

The algorithm is based on testing whether or not a pattern is 'useful' given a list of other patterns. If the wildcard pattern _ is still useful given all the patterns that appear in the case arms, then the case expression is non-exhaustive. See the documentation in pattern_matching.rs for details.

Topological sorting

The topological sorter sorts the items so the declarations are not used in definitions/domains (other than function bodies) before they appear. This is where cyclic definitions are detected.

For example:

int: y = x + 1;
int: z = x - 1;
int: x = 3;

Will be reordered into:

int: x = 3;
int: y = x + 1;
int: z = x - 1;

Whereas

int: x = y;
int: y = x;

Has no ordering that allows definitions to appear before they are used. Therefore, this is an error.

However, this is allowed:

constraint y = x;
var int: x = y;
var int: y;

As it can be reordered into:

var int: y;
var int: x = y;
constraint y = x;

Which does allow every variable declaration to appear before being used.

Validation

Some final validation of the HIR is done at the whole-program level, since some problems may be spread across multiple files.

Check for duplicate function definitions
- For each pair of function items f1 and f2 with the same name, we consider it to be an error if the parameters of f1 can be used to call f2 and vice-versa, and either:
  - Both have bodies, or
  - They have different return types
Check for duplicate constructors
- Defining two constructor functions with the same name is not allowed
- E.g. enum Foo = A(B) ++ A(C) is illegal since there would be two constructors named A.
- Note that it is legal to overload a constructor function using a normal function. In fact, the function may have exactly the same signature as the constructor, and will be used in preference to it.
Check for multiple assignments to the same variable (if allowing multiple assignments to the same variable is not enabled)
Check for multiple solve items

If no errors were emitted at all for the HIR module (or the syntax model), then the program is valid and we proceed to lower the program into the typed high-level intermediate representation (THIR).

Scope collection

Scope collection gives the identifiers in scope for each expression in an item. Note that at this point, we don't perform name resolution, as resolving call identifiers requires the types of the arguments to perform function overloading resolution, and resolving record field access requires knowledge of the accessed record's type.

It would be possible to only keep track of this for expressions which are identifiers, rather than all expressions (and we could even perform name resolution for variables and non-overloaded function calls). However, having the scopes computed for every expression is useful for providing code completion information in the language server.

Scope collection results

Scope collection results in a data structure containing an arena of Scopes which contain the identifiers defined in each scope in an item. Each Scope keeps track of its parent Scope index, so that the chain of scopes can be followed to resolve an identifier. Each expression index in an item is mapped to a scope index and generation (see below for details on generations).

The db.lookup_item_scope(item_ref) query produces a ScopeResult which allows for easy lookup of functions and variables in scope for an expression. It automatically traverses the item scopes and will bubble up to global scope to find an identifier. Note that the result of looking up identifiers is a PatternRef (or list of them for overloaded functions). These point to the declaring identifier pattern of the identifier (as there may be several identifiers declared in a single item, via destructuring or in a let expression, so it is not sufficient to give only the item).

Scopes and generations

A Scope keeps track of the variables and functions declared in a particular scope, along with the parent scope (if any). For example:

function foo(int: a) =
    let {
        int: x = 3 * a;
        int: y = x + 2;
    } in a + y;

Has two scopes: one created by the function item for its body, and one created by the let item.

The first scope contains a declaration for the identifier a and its parent scope (assuming this is a top-level item) is the global scope. The second scope has declarations for the identifiers x and y, and its parent scope is the first scope.

Then, the scope for the outer let would be computed to be the first scope, while the scope for the RHS of x, the RHS of y, and the in expression a + y would be the second scope (and all of their sub-expressions would also be assigned to be the second scope).

Note that items in let expressions are such that they only enter into scope once they appear, so an item cannot refer to an item later than itself. For example:

let {
    int: y = 10;
    int: z = let {
        int: x = y;
        int: y = 1;
    } in x;
} in z

Will give z = 10 as the let item int: y = 1 is not in scope for the declaration of x.

This is achieved by keeping track of the 'generation' where a let-item identifier has been declared. The generation is incremented every time we visit an identifier pattern which creates a new variable. It should also be noted that the patterns for a declaration in a let are always visited after processing their RHS. Expressions are then assigned both a scope, along with a generation defining the minimum generation an identifier must be to be considered available in the expression. Exiting a scope restores the generation the value it had upon entering the scope.

Following the example:

The declaration int: y = 10; is visited
- The RHS expression (10) is assigned scope 0, gen 0
- The LHS identifier y is assigned scope 0, gen 1
The declaration int: z = let { int: x = y; int: y = 1; } in x is visited
- The RHS let expression is assigned scope 0, gen 1
  - The declaration int: x = y; is visited
    - The RHS expression y is assigned scope 1, gen 1
    - The LHS identifier x is assigned scope 1, gen 2
  - The declaration int: y = 1 is visited
    - The RHS expression 1 is assigned scope 1, gen 2
    - The LHS identifier y is assigned scope 1, gen 3
  - The in expression x is assigned scope 1, gen 3
- The LHS identifier z is assigned scope 0, gen 1 (restored because exited scope 1)
The in expression z is assigned scope 0, gen 1

From this, we can see that while the RHS of int: x = y is scope 1, since its generation is 1, the y does not refer to the y in scope 1 (which is generation 3), and so instead refers to the y in scope 0 (which is generation 1).

It should be noted that the 'generation' values are an implementation detail - what actually matters in the end is to be able to determine whether or not an identifier is in scope or not for a given expression.

Patterns

Patterns used for variable or parameter declarations are only allowed to contain tuples, records and identifiers. Otherwise, the pattern is refutable and may not match all values. Identifiers in variable declarations always create new variables, and do not refer to enumeration/annotation atoms (instead they get shadowed).

enum Foo = {A, B};

% OK: creates variables `x`, `y` and `z`.
any: (x, (foo: y, bar: z)) = (1, (foo: 2, bar: 3));

% OK: shadows enum values `A` and `B`
any: z = let {
    int: A = 1;
    int: B = 2;
} in A + B;

% ERROR: refutable pattern may not match all cases.
any: (m, 1) = (3, 2);

In comprehension generators and case expressions, all patterns are allowed, and identifiers can either refer to enumeration/annotation atoms, or if none match, they create new variables. A problem with this approach is that since we don't have namespacing, a user could create an atom with the same name as a variable binding in a case expression or generator in another part of the code, and then that variable binding would instead match that atom, changing the behaviour (likely causing the model to fail type checking).

Type checking

The type-checker performs bottom-up typing of expressions, and checks that the types are correct. This is performed on a per-item basis. The type-checker deals with 'signatures' and 'bodies' of items separately. Signatures give the type information needed to compute the type of an expression referring to the item. Typing a body involves typing the rest of the item (generally this is the annotations and the RHS).

Consider the items:

function float: foo(int: x :: my_ann) = x + 0.5;
1..3: a = 23;
any: b = foo(a);

foo has a signature of function float: foo(int). Its RHS of x + 0.5 is typed separately and verified against the signature return type, along with the annotation my_ann.
a has a signature of int: a, which is computed using only the LHS of the declaration as the type is complete. The RHS of 23 is typed separately and verified against the signature type.
b uses an any type and so its signature of float: b is computed using the RHS.

Type checker results

Type checking an item produces several maps containing the computed types. The TypeResult provides access to all types for an item, by using the signature types and the body types as needed. This can be obtained using the db.lookup_item_types(item_ref) query.

Indexing using an ArenaIndex<Expression> gives the Ty of the expression.
- E.g. the expression {1, 1.5} will have type set of float.
Indexing using an ArenaIndex<Pattern> gives the PatternTy of the pattern.
- The PatternTy includes information about what the pattern is being used for, rather than just the type of the pattern itself (e.g. so we can tell if a pattern is simply for destructuring, or if it actually declares a new variable).
- E.g. the declaration any: (x, y) = (1, 1.5) will be such that
  - (x, y) is a PatternTy::Destructuring(tuple(int, float))
  - x is a PatternTy::Variable(int)
  - y is a PatternTy::Variable(float)
The name_resolution(ident) method finds the PatternRef for an expression which is an identifier (e.g. identifier pointing to a variable).
The pattern_resolution(ident) method finds the PatternRef for a pattern which is an identifier (e.g. enum atom used in a case expression pattern)

Expressions

Typing of expressions is done recursively, using the types of child expressions to determine the type of the outer expression.

For example, the expression ([1, 2.5], 3) will be typed by:

The first expression is a tuple, so visit its child fields
- The first tuple field is an array literal so visit the child members
  - The first child member is an integer literal, so its type is int
  - The second child member is a float literal, so its type is float
- Therefore the array literal is determined to have type array [int] of float as float is the most specific supertype of int and float.
- The second tuple field is an integer literal, so its type is int
Therefore the tuple is of type tuple(array [int] of float, int)

Calls

One exception to bottom-up typing occurs when dealing with calls, as in order to perform overloading resolution, we must specially handle the case of a call with an identifier callee. See the following section on function resolution for more detail.

For example, the call foo(1, "a") would be typed by:

Getting the types of each argument (in this case int and string)
Finding all the function items named foo
Performing overloading resolution given the argument types
foo is given the type of the operation
The call is given the return type of the operation

If the call does not have an identifier as its callee, the callee type is determined in the usual bottom-up fashion, and no overloading resolution is required.

Identifiers

Identifier expressions cause the type checker to lookup the identifier in the expression's scope, and fetch the type signature of the retrieve pattern's item to determine the type of the identifier. For bodies, we can fetch the signature of the item directly (and reuse the computation of that signature if it's already done), but for signatures which depend on other signatures, we do not do this because of the possibility of cycles.

Instead, the signatures of the dependencies are always computed, and if we reach the initiating signature during this process, we can break out of the cycle and return the error type (at that stage, not at the initiating site). Any cyclic definition errors will be emitted later during topological sorting. This is important for dealing with function signatures that include calls to other overloads of the function, since these should work, but resolving the overloaded call would require the type of 'this' signature, which is in the middle of being computed.

Ascribed types

Typing user-written types (the HIR Type) involves computing the concrete type (Ty) they represent. For example, the type var 1..3 is typed by computing the type of the domain 1..3. Since it is a set of int, then the variable is an int type, and since the type has the var modifier, the complete type is computed to be var int.

Typing these becomes more complex when dealing with incomplete types. In these cases, we always have a RHS expression which can be typed first, and then we can use this computed type to fill in the 'holes' in the ascribed type.

For example, consider the (contrived) declaration:

array [_] of var set of any: x = [{1, 2}];

In this case, we start by computing the type of the RHS, and determine it to be of type array [int] of set of int. This type is unified with the ascribed type to obtain the type array [int] of var set of int for x.

Patterns

Typing of patterns is done top-down, based on an already computed type that the pattern must take. For example, to type the declaration

tuple(var int, any): (a, b) = (1, 2.5);

The concrete type of the declaration is tuple(var int, float) (found by computing the type of the RHS and using it to fill in hole in the LHS type)
Therefore, the root pattern (a, b) has type tuple(var int, float).
We then visit the child patterns and set their type to the corresponding tuple field type.
So a has type var int
And b has type float

One complication is that when dealing with constructor calls in patterns, we only know the 'return type' of the call, as we know the type of the call pattern itself, but not its arguments. We use this return type to determine the what the actual constructor call must be. For example:

enum Foo = A(Bar) ++ {B, C};
enum Bar = {D, E};
var Foo: x;
any: y = case x of
    A(v) => v,
    _ => E
endcase;

The type of x is var Foo
Therefore the type of the first case pattern A(v) is also var Foo
- So based on that, we must be using the constructor var Foo: A(var Bar), and so this is the type of the A pattern
- The type of v is therefore var Bar
- So the result for the first case arm is var Bar
The second case pattern _ should also match the type of x (var Bar)
- Its result type is E, which is the type Bar
- So the case expression is of type var Bar

Function resolution

Function resolution involves determining a call identifier and a list of argument Tys, the best matching overload to use (if there is one). The algorithm used is:

If there is a variable which is an op function with the correct name, use it.
If not, then find the function items in scope with the correct name.
Remove any candidates which have an incorrect number of arguments.
Remove any candidates which cannot be instantiated with the argument Tys
For every pair of candidates a, and b remaining, eliminate the 'less specific' function
- Instantiate a and b with our call's argument types
- If the instantiated argument types of a are all subtypes of the instantiated argument types of b, but not the other way around, then eliminate b (and vice-versa).
- If both instantiated argument types are equivalent, then
  - If a is polymorphic but b is a concrete function, then eliminate a (and vice-versa)
  - If both candidates are polymorphic, then
    - If a can be instantiated with b's (original) parameter types, but not the other way around, eliminate a (and vice-versa)
  - If a has a body but b doesn't, eliminate b (and vice-versa)
- Otherwise arbitrarily eliminate a

Essentially, we choose such that:

More specific parameter types are preferred: bool over int, int over var int. We take the instantiated argument types into account, so a function taking $$E will be preferred over a function taking float if called with an int argument.
Where two candidates would instantiate to the same types, prefer $$E functions over $T functions, and prefer concrete types over either of those.

It is an error if we are left with more than one candidate, or no candidates at the end of the process. Additionally, it should be noted that in the case of two identical functions which have bodies, we will simply arbitrarily choose one at this stage to allow type checking to continue with a reasonable return type. The duplicate function error will be emitted during final validation of the HIR.

Type-inst variables

There are two kinds of type-inst variables: $T and $$E. $T matches any type other than arrays and functions (this is because the old compiler doesn't accept arrays for these, but actually it seems like it would make more sense if it did). $$E matches any enumerable type (booleans, integers, enums) - unlike the old compiler, this is a a properly generic type parameter, and is not considered to a special int.

It's also possible to restrict $T to accept only types valid for array indices if it is used as array [$T] of int. In this case only integers/enums or tuples of them would be accepted.

One complication is that the way type-inst variables are used looks different to most other languages. Since omitting var/opt implies par, non-opt, a parameter declared to be $T is actually par (non-opt) $T. That is, the parameter will accept the par, non-optional version of whatever type the type parameter $T is given as. any $T is used to make the parameter accept the actual type of the type parameter $T.

Note that the language is currently missing anyvar and anyopt, which would be needed for expressing parameters whose inst/optionality depend on the input type parameters (for type-inst variables used in parameters only, this is not really a problem since using var $T or opt $T will still let you match the needed combinations - the problem is that if you use the type-inst variable in the return type, specifying opt $T will force the return type to be opt, even if $T is non-optional).

Determining the types substituted for type-inst variables

Since calls do not explicitly set the types to give for each type-inst variable, we have to use the argument types to determine them. Consider the function:

function var opt $T: foo(var $T: x, any $T: y);
var 1..3: p;
opt int: q;
any: v = foo(p, q);

Then for the call foo(p, q) we start by considering the first argument p, which is of type var int. The function parameter is declared as var $T, so that means it applies its own inst (var) and optionality (non-opt) to the type of $T - therefore we strip off the inst and optionality from the given argument type and use that for $T. So $T for this argument is set to be (par, non-opt) int.

Next we look at the second argument q, which is of type opt int. Since the function parameter is declared as any $T, it does not apply any modifiers to the type of $T, so we set the type of $T for this argument to be opt int.

Then the final type for $T is the most specific supertype of int and opt int, meaning the final type is opt int. From this, we can compute the actual instantiated function signature by substituting $T = opt int for the parameters and return type.

The first parameter var $T becomes var int (again, since var $T is really var, non-opt $T).
The second parameter of any $T gives opt int (since any means don't apply any modifiers to $T).
The return type of var opt $T gives var opt int (since it applies the var and opt modifiers to $T).

So the instantiated result is:

function var opt int: foo(var int: x, opt int: y);
var 1..3: p;
opt int: q;
var opt int: v = foo(p, q);

Output typing

The expressions of output items, and definitions of :: output_only declarations are type-checked in a special mode where the types of top-level identifiers are assumed to be par. This alleviates the need to manually insert calls to fix().

In this example, if we don't consider p in the output statement to be par, then the if-then-else will cause will trigger a type error as it will return the illegal type var string.

var bool: p;
output [if p then "Yes" else "No" endif];

Typed high-level intermediate representation (THIR)

The THIR is used as a transition phase between the HIR and the MIR. It progressively rewrites the IR into lower-level constructs until it can be readily lowered into MIR. Compared to HIR, the THIR is:

Combined (there is only one Model which contains all of the items from all model files)
Fully typed (expressions always have full, correct types - we abort compilation if there were errors)
Destructured - declaration items can only declare a single identifier
Fully resolved (identifiers do not have their own names, they simply point to the item they are for)
Desugared (since type information is now available, we can perform more desugarings)
Expressions are boxed rather than arena allocated

The language constructs made available in THIR are an attempt to provide rewriting stages with a convenient representation which is not too high-level and complex to process, but not so low-level as to be cumbersome to use.

Structure

The root THIR node is the Model which contains arenas for each kind of item. Since there is only one Model, the item indices now globally identify the items in the program. Each item contains storage for the expressions it owns in its ExpressionAllocator. Expressions contain their types, which are computed during construction. A builder API is provided (ExpressionBuilder) for creating expressions.

Lowering from HIR

Lowering involves adding the items from the HIR into the THIR model. This is done in topologically sorted order, so we only have to visit each item once. One exception is that functions with bodies are lowered in two stages: first, the function signature is added, then the body is added to it after all items have been processed. Otherwise identifiers in the function bodies could refer to items not yet added to the THIR.

Desugarings

The THIR involves several semantic desugarings from the HIR.

Destructuring variable declarations are rewritten using multiple declarations:

HIR syntax	Desugaring
`any: (x, (y, z)) = (1, (2, 3));`	`tuple(int, tuple(int, int)): A = (1, (2, 3)); int: x = A.1; tuple(int, int): B = A.2; int: y = B.1; int: z = B.2;`

Type aliases are removed as they are resolved:

HIR syntax	Desugaring
`type Foo = tuple('..'(1, 3), '..'(2, 4)); var Foo: x;`	`set of int: A = '..'(2, 4); set of int: B = '..'(1, 3); tuple(var A, var B): x;`

2D array literals are rewritten using array2d:

MiniZinc syntax	Desugaring
`[\| 1, 2 \| 3, 4 \|]`	`array2d('..'(1, 2), '..'(1, 2), [1, 2, 3, 4])`
`[\| c: d: \| a: 1, 2 \| b: 3, 4 \|]`	`array2d([a, b], [c, d], [1, 2, 3, 4])`

Indexed array literals are rewritten using arrayNd:

MiniZinc syntax	Desugaring
`[3: a, b, c]`	`arrayNd(3, [a, b, c])`
`[3: a, 4: b, 5: c]`	`arrayNd([3, 4, 5], [a, b, c])`

Array access is rewritten into a call to '[]':

HIR syntax	Desugaring
`any: x = [1, 2, 3]; any: y = x[1];`	`array [int] of int: x = [1, 2, 3]; array [int] of int: y = '[]'(1);`

Slicing is rewritten using mzn_slice function calls:

HIR syntax Desugaring

HIR syntax	Desugaring
`any: x = [\|1, 2 \| 3, 4 \|]; any: y = x['..'(1, 2), '..'(1, 1)];`	`array [int] of int: x = array2d( '..'(1, 2), '..'(1, 2), [1, 2, 3, 4] ); array [int] of int: y = let { set of int: A = '..'(1, 2); set of int: B = '..'(1, 1); } in array2d(A, B, mzn_slice(x, [A, B]));`
`any: x = [\|1, 2 \| 3, 4\|]; any: y = x[..<, 2];`	`array [int] of int: x = array2d( '..'(1, 2), '..'(1, 2), [1, 2, 3, 4] ); array [int] of int: y = let { set of int: A = '..<'(index_set(x)); set of int: B = {2}; } in array1d(A, mzn_slice(x, [A, B]));`

any: x = [|1, 2 | 3, 4 |];
any: y = x['..'(1, 2), '..'(1, 1)];

array [int] of int: x = array2d(
  '..'(1, 2),
  '..'(1, 2),
  [1, 2, 3, 4]
);
array [int] of int: y = let {
  set of int: A = '..'(1, 2);
  set of int: B = '..'(1, 1);
} in array2d(A, B, mzn_slice(x, [A, B]));

any: x = [|1, 2 | 3, 4|];
any: y = x[..<, 2];

array [int] of int: x = array2d(
  '..'(1, 2),
  '..'(1, 2),
  [1, 2, 3, 4]
);
array [int] of int: y = let {
  set of int: A = '..<'(index_set(x));
  set of int: B = {2};
} in array1d(A, mzn_slice(x, [A, B]));

Performing a tuple/record access on an array is rewritten to perform the field access on each element of the array using a comprehension:

HIR syntax Desugaring

HIR syntax	Desugaring
`any: x = [(1, "a"), (2, "b")]; any: y = x.1;`	`array [int] of tuple(int, string): x = [(1, "a"), (2, "b")]; array [int] of int: y = [v.1 \| v in x];`
`any: x = [ (a: 1, b: "hello"), (a: 2, b: "world") ]; any: y = x.a;`	`array [int] of record(int: a, string: b): x = [ (a: 1, b: "hello"), (a: 2, b: "world") ]; array [int] of int: y = [v.a \| v in x];`

any: x = [(1, "a"), (2, "b")];
any: y = x.1;

array [int] of tuple(int, string): x =
  [(1, "a"), (2, "b")];
array [int] of int: y = [v.1 | v in x];

any: x = [
  (a: 1, b: "hello"),
  (a: 2, b: "world")
];
any: y = x.a;

array [int] of record(int: a, string: b): x = [
  (a: 1, b: "hello"),
  (a: 2, b: "world")
];
array [int] of int: y = [v.a | v in x];

Case expressions are rewritten such that the destructuring is moved into the branch RHS, and pattern identifiers which create new variables are replaced with the wildcard _ pattern:

HIR syntax Desugaring

HIR syntax	Desugaring
`enum Foo = A(Bar); enum Bar = {B}; var Foo: x; any: y = case x of A(v) => v endcase;`	`enum Bar = { B }; enum Foo = A(Bar); var Foo: x; var Bar: y = case x of A(_) => let { var Bar: v = A⁻¹(x); } in v endcase;`

enum Foo = A(Bar);
enum Bar = {B};
var Foo: x;
any: y = case x of
  A(v) => v
endcase;

enum Bar = { B };
enum Foo = A(Bar);
var Foo: x;
var Bar: y = case x of
  A(_) => let {
    var Bar: v = A⁻¹(x);
  } in v
endcase;

Pattern matching in comprehension generators is rewritten using a case expression in the generator's where clause. Destructuring is rewritten as assignment generators:

HIR syntax Desugaring

HIR syntax	Desugaring
`enum Foo = A(Bar); enum Bar = {B}; any: x = [v \| A(v) in Foo];`	`enum Bar = { B }; enum Foo = A(Bar); array [int] of Bar: x = [ v \| i in Foo where case i of A(_) => true, _ => false endcase, v = A⁻¹(i) ];`

enum Foo = A(Bar);
enum Bar = {B};
any: x = [v | A(v) in Foo];

enum Bar = { B };
enum Foo = A(Bar);
array [int] of Bar: x = [
  v | i in Foo
      where case i of
              A(_) => true,
              _ => false
            endcase,
        v = A⁻¹(i)
];

Model transformations

At the THIR level, several model-to-model transformations occur which progressively remove language constructs which do not exist in the mid-level intermediate representation.

The order of these transforms is often important as certain information or language constructs may be removed in one transform, and so cannot be used in subsequent transforms.

The transforms which occur are:

See Model transformations for more details.

Model transformations

THIR seems to be a good fit for model level transformations.
The idea would be to perform some sort of rewriting of the THIR tree to produce a new tree

Output item removal

The MIR does not have output items. Instead, we create :: output_only string declarations for each output section.

output ["foo\n"];
output :: "section" ["bar\n"];
output :: "section" ["qux\n"];

Becomes:

string: mzn_output :: output_only = concat(["foo\n"]);
string: mzn_output_section :: output_only = concat(["bar\n"] ++ ["qux\n"]);

We also make the :: output variables explicit, so var declarations with no RHS definitions get :: output added unless they are already marked as :: output or :: no_output.

Rewriting of domains to constraints

This transform rewrites domains into constraints where necessary.

The MIR only allows domains in the type-insts of fresh variables without RHS definitions which do not contain structured types (or nested arrays).

For example, var 1..3: x, var set of 1..3, array [1..3] of var 1..3, array [1..3] of var set of 1..3.

Other type-inst domains, such as function parameter domains, function return type-inst domains, domains of par declarations, and domains of declarations with RHS definitions, are transformed into constraints which enforce the domains (or assertions if the declaration is par).

1..3: x;

Becomes:

int: x;
constraint assert(x in 1..3, "Value out of range");

Structured types and arrays need to check domains for each element:

tuple(1..2, 1..3): x;
array [int] of 1..3: y;

Becomes:

tuple(int, int): x;
constraint assert(x.1 in 1..2 /\ x.2 in 1..3, "Value out of range");
array [int] of int: y;
constraint forall (y_i in 1..3) (y_i in y);

The actual implementation also has to ensure that the semantics of when the domain gets evaluated remains consistent with what's expected, so some extra let expressions are often needed to prevent re-evaluation.

Since we also would like to present the user with a useful error/warning message, we also keep track of how to display the variable accessor (e.g. x.1.foo for tuple(record(1..3: foo)): x) and use the mzn_domain_constraint functions to perform the checks.

Unpacking of structured type declarations

Consider the declaration:

tuple(var 1..2, var 1..3): x;

Since the type-inst is for a tuple type, we must rewrite this to not have a domain.

However, since this has no RHS, we do not have to introduce constraints - instead we can unpack x into its constituent variables:

tuple(var int, var int): x = let {
  var 1..2: a;
  var 1..3: b;
} in (a, b);

In the case of an array of structured types, a comprehension can be used. For example:

array [1..3] of tuple(var 1..2, var 1..3): x;

Can be transformed into:

array [1..3] of tuple(var int, var int): x = [
  let {
    var 1..2: a;
    var 1..2: b;
  } in (a, b)
| _ in 1..3];

Index set checking

For arrays, each declared index set needs to be checked against the true index set of the array.

array [1..3, int] of int: x;

Becomes:

array [int, int] of int: x;
constraint assert(index_set_1of2(x) = 1..3, "Index set doesn't match");

Since we again need to produce a useful error message, we perform the index set checks using the mzn_check_index_set function, which can produce an error message mentioning which dimension has the incorrect index set.

Top-down typing

Top-down typing of the model involves pushing types from enclosing expressions into their subexpressions. This allows us to determine a real type for every type found to be bottom during bottom-up type-checking.

For example, in the expression [1, <>], bottom-up typing would give:

1 is an int
<> is an opt bottom
[1, <>]: is an array [int] of opt int

In top-down typing, we start from the outer-most expression and work inwards:

[1, <>]: is an array [int] of opt int (found from bottom-up typing)
1: is an opt int (because the element type of the enclosing array is opt int)
<>: is an opt int (because the element type of the enclosing array is opt int)

This allows us to remove the bottom from the MIR entirely.

For a declaration item, the top-down type of the outer-most RHS expression is taken to be the declared LHS type from the declaration.

For example, in tuple(opt int, float): x = (3, 5):

(3, 5) is a tuple(opt int, float) (taken from LHS)
3 is an opt int (first field of tuple)
5 is a float (second field of tuple)

The top-down type of a call argument is taken to be the declared type of the parameter.

For example:

function int: foo(opt int: x);
int: y = foo(3);

foo(3) is an int (taken from the LHS of y)
3 is an opt int (taken from the parameter x of the function foo)

Polymorphic types

For a polymorphic function which with a type-inst variable in the return type, we may need to use the return type to determine the type of the parameters:

For example;

function array [$X] of any $T: foo(array [$X] of any $T: x);
array [int] of int: y = foo([]);

foo([]) is an array [int] of int (taken from the LHS of y)
For this call, the return type array [$X] of any $T is therefore array [int] of int
- So $X = int and $T = int
- Therefore the parameter x is array [int] of int
[] is an array [int] of int (taken from the parameter x of the function foo)

Making coercion to option types explicit

Since option types are going to be transformed into tuples, we must keep track of values which are known to occur, but are used as optional values. These coercions must be made explicit since one the option types become tuples, the non-tuple occurring type will no longer be a subtype of the optional one.

Type specialisation

Type specialisation involves generating concrete versions of polymorphic functions for each call to such a function. This is needed because the MIR does not have polymorphic functions.

We start by finding the top-level polymorphic calls, and then create the concrete versions if we haven't already. We then traverse the bodies of the functions and recursively specialise polymorphic calls in them.

The end result is that all reachable calls are now non-polymorphic.

We can then remove all polymorphic which have bodies.

Calls to polymorphic functions without bodies are left as-is and not specialised since they would be interpreter builtins.

Example

function any $T: foo(any $T: x) = bar(x);
function any $T: bar(any $T: x) = x;
any: a = foo(1);
any: b = bar(2);

First we process the call foo(1) and generate

function int: foo(int: x) = bar(x);

Then we process the call in the body of the newly generated foo, which is bar(x), and generate

function int: bar(int: x) = x;

Then we process the call bar(2), but don't need to generate anything since we already generated an integer version.

The final result is

function int: foo(int: x) = bar(x);
function int: bar(int: x) = x;
any: a = foo(1);
any: b = bar(2);

Dynamic dispatch

In order to facilitate dynamic dispatch, we need to ensure that when there are multiple compatible polymorphic overloads of a function, we instantiate all versions which are more specific that the one that has been called. Otherwise, the more specific versions won't be in the model which we create the function dispatch preambles.

For example, consider:

function int: foo(any $T: x) = 1;
function int: foo(var $$E: x) = 2;
function int: foo($U: x) = 3;
var opt 1..1: x;
constraint foo(x);

If we only instantiated the var opt int version of foo(any $T), then we would not be able to dispatch to the other versions at runtime.

To do this, we find the type-inst var instantiations for each polymorphic overload (other than the original) given the argument types, and then instantiate calls using those types.

In this case, we have two other overloads and an argument of var opt int:

foo(var $$E) gives $$E = int, and so the signature becomes foo(var int).
foo($U) gives $U = int, and so the signature becomes foo(int).

We then instantiate foo(var int) and foo(int) and will be able to generate function dispatch preambles as required.

Note that it's possible for multiple overloads to give the same signature - so we actually have to re-lookup the best matches to prevent instantiating the wrong versions. For example, if the second overload was foo($$E), then both foo($$E) and foo($U) would give foo(int) to instantiate, and in that case, we need to use the more specific $$E version and not the $U one.

Finally, we also have keep track of which polymorphic function each specialised function comes from, because dynamic dispatch should not happen between different instantiations of the same polymorphic function.

Generation of `show` for erased types

Enums, option types and records (and types containing them) need to have specialised versions of show generated. The definition of show for plain enums is actually generated during enum erasure, but the others are generated at this stage.

Optimisation for enum functions

In the future, it would be possible to add an optimisation for functions which are polymorphic over enum types. Functions which take a $$E can actually just specialise into the type-erased int version if the call does not lead to a show($$E) call somewhere down the line.

However, it should be noted that since the fzn_... functions are never polymorphic, there probably aren't that many layers of depth to most calls.

Looking up function calls after specialisation

Once specialisation has occurred, we it's no longer valid to perform function lookups on non-builtin polymorphic functions (as we would have had to specialise them in order to use them correctly).

So all calls introduced after this stage must either be to builtins, or to concrete functions (or exactly matching an existing specialised version).

Generation of function dispatch preambles

The MicroZinc Interpreter only supports static dispatch. Therefore, to emulate dynamic dispatch to more specific versions of functions, we can add preambles to the bodies of functions which dispatch to the other versions.

Dispatch follows this hierarchy:

flowchart TD
    vot["var opt T"]-- occurs is fixed to true -->vt["var T"]
    vt-- variable is fixed -->t["T"]
    vot-- variable is fixed and occurs -->t
    vot-- variable is fixed -->ot["opt T"]
    ot-- variable occurs -->t

We consider a directed graph where the nodes are each function. Then, for each pair of overloads of a function, if the parameters of one can all dispatch to other (or are the same type), then we add this as an edge.

Then by getting the transitive reduction, we can determine a list of functions each function needs to dispatch to.

Erasure of records

Erasing records into tuples involves transforming record literals into tuples, and record access into tuple access, ensuring that the order of the fields remains consistent.

record(int: foo, float: bar): x = (bar: 1.5, foo: 2);
int: y = x.foo;

Transforms into

tuple(float, int): x = (1.5, 2);
int: y = x.2;

Erasure of enums

Desugaring of comprehensions

Erasure of option types

Removal of overloading

Since MIR does not support overloading, we perform name-mangling to give each overloaded variant of a function a distinct name.

Desugaring of capturing

Compilation to MicroZinc

Compilation of the MiniZinc THIR to MicroZinc will involve transformation to a new IR for MicroZinc.

Decomposition of variable conditionals

if-then-else expressions with a variable condition need to be rewritten into function calls.

Lifting partiality

Partial functions need to transformed into total functions.

Subtype based overloading

As functions arguments also accept their subtypes, these should be transformed to dispatch to their specialised counterparts.

Pass values from outer scopes of functions through parameters

Functions which refer to variables outside the scope of the function need to be transformed such that these variables are passed as arguments to the function.

Generation of the main entrypoint

MicroZinc uses a top-level function main as its entrypoint for the interpreter.

The top-level decision variable and constraints are added as a let expression in this function, taking the model parameters as arguments.

Totalisation

Overview

A function is total iff it is defined for any argument in its domain. In MiniZinc, the domain of a function is determined by the type-insts of its arguments, but not by any constraints on its arguments.

A function is partial iff it is undefined for certain arguments in its domain. In MiniZinc, a function is partial if its return type is not a subtype of var opt bool, and at least one of these conditions holds:

If at least one of its arguments has a constrained type. E.g., function int: foo(1..3: x) = x + 1; is partial, because its argument x is constrained.
If its body is a partially defined expression
If it is a built-in function that is known to be partial (e.g. integer division, modulo etc.)

A function that is partial according to the rules above can be annotated as ::promise_total if it can be guaranteed that it will return a value for any input in its domain.

An expression is partially defined if its type is not a subtype of var opt bool and it does not evaluate to a value for all possible evaluations of its free variables. At least one of the following conditions needs to hold for an expression to be partially defined:

It is a call to a partial function or built-in operator
It is a let expression with a type that is not a subtype of var opt bool, and one of the cases in the let is a constraint, or one of the variables in the let has a constrained type-inst
One of its immediate subexpressions is partially defined

Array types

Two options:

An undefined element in an array makes the entire array undefined (MiniZinc 2 semantics)
An undefined element in an array only causes undefinedness if the element is accessed (Stuckey/Frisch paper)

Issues with option 2: Passing an array to a function now means that all arrays have to be arrays of tuples (even if all elements are statically known to be defined); or, we have to generate different versions (tuple and non-tuple) for each function, for all array argument types.

Transformation

The basic idea of the transformation is to turn any partially defined expression into a pair of expressions (b,e) where b is a Boolean that is true iff the expression is defined, and e is the value of the expression if it is defined, and ⊥ otherwise.

The return type of a partial function returning type T is changed to tuple(bool, T).
Any variable declaration of type T whose right hand side is a partially defined expression is changed into a variable declaration of type tuple(bool, T). NOTE: What about array types? Do they capture definedness for each element, or for the array?
Set literals (containing partially defined expressions) The type is transformed from set of T into tuple(bool, set of T) as follows. { e1, e2, pde1, ..., pdek, ... } into let { any: (b1,tmp1) = pde1; any: (bk,tmpk) = pdek; } in (forall([b1,...,bk]), {e1, e2, tmp1, ..., tmpk ...})
Array literals (containing partially defined expressions) ~~Change type array[...] of T into array[...] of tuple(bool, T) and change [ e1, ..., pde1, ... pdek, ... en] into [ (true, e1), ..., pde1, ... pdek, ... (true,en) ]~~ Change type array[...] of T into tuple(bool, array [...] of T and change [ e1, ..., (b1, pde1), ... (bk, pdek), ... en] into (b1 /\ ... /\ bk, [e1, pde1, ... pdek, ... en])
Array comprehensions
- Partially defined generators are not permitted (static type error).
- Partially defined generated expressions are fine. The resulting type changes from array[...] of T into tuple(bool, array[...] of T) (like array literals). May need to create array [...] of tuple(bool, T) first, then extract the definedness from that.****
- Partially defined where clauses are fine (they are their own Boolean context)
- These examples show why it would be a bad idea to allow partial generators. - par comprehensions [ pde | i in 0..10, j in x[i] ] [ pde | i in 0..10, (b,e)=x[i] where b, j in e ] - var generator comprehensions: var 0..10: x [ pde | i in 0..10, j in i div x..100] [ let { any: (b,t) = i div x..100; any: (eb, et) = pde } in if eb /\ b /\ j in t..100 then (true,et) else (b,<>) endif) | i in 0..10, j in ub(i div x..100) ]
Set comprehensions These are transformed into array comprehensions and array2set in a previous compiler phase.
Array access
var array access turns into element() which has the right semantics. function tuple(bool, any $T): element(array [int, ...] of any $T: x, int: i, ...) where the boolean is
if then else endif
Can't use a normal function since the arguments would escape into the boolean context outside, so instead define the function which returns the pair:
function tuple(var bool, var $T): if_then_else(array [int] of var bool: c, array [int] of tuple(var bool, var $T): r)
Tuple field access
Same as for arrays, so undefined inside a tuple makes the whole tuple undefined
Calls Transformed to return a tuple if not known to be total
let { ... } in If not in ::promise_total/root context and non-boolean then the conjunction of the constraints and the variable definedness becomes the let's definedness
```
let {
    var int: x = e1;
    constraint c;
} in e2
```
becomes
```
let {
    tuple(var bool, var int): x' = (de1, e1');
    var bool: c' = c;
} in (de1 /\ c' /\ de2, e2')
```
Annotations Annotations are compiled in the root context.
Output The expression in an output statement must be total. It is a static type error if it isn't. Users can use default to make all expressions total.

Context analysis

Bytecode generation

Output generation

Some kind of code which maps from FlatZinc output variables to user model output should be generated similarly to how we generate .ozn output models currently.

This functionality could be extended to allow for mapping of other solver information such as statistics, duals, etc to report these in the context of the user model.

Bytecode interpretation

MicroZinc specification

MicroZinc is a simple language used to define transformations to be performed by the interpreter. It is a simplified subset of MiniZinc. The transformation are represented in the language through the use of function definitions. A function of type var bool, describing a relation, can be defined as a native constraint. Otherwise, a function body must be provided for rewriting. The function body can use a restriction of the MiniZinc expression language. An important difference between MiniZinc and MicroZinc is that a well-formed MicroZinc does not contain partial expressions. Any partiality in the MiniZinc model must be explicitly expressed using total functions in MicroZinc. As such, constraints introduced in MicroZinc let expressions can be globally enforced. They are guaranteed to only constrain the decision variables introduced in the same let expression.

TODO: Remaining questions

What annotation operations should be allowed in MicroZinc?
The operational semantics for array access definition only works for 1-dimensional arrays. How can we generalise them for multiple dimensions?
Add list of built-in functions

Syntax

MicroZinc is defined using the following syntax,

\[ \begin{array}{lcl} \mathit{program} &::=& \mathit{func}* \\ \mathit{func} &::=& \mathsf{function}~\mathit{typeinst}~\mathsf{:}~\mathit{ident}~\mathsf{(}\mathit{typing} [\mathsf{,}~ \mathit{typing}]*\mathsf{)}~[\mathsf{=}~\mathit{letExpr}]\mathsf{;}\\&|& \mathsf{predicate}~\mathit{ident}~\mathsf{(}\mathit{typing} [\mathsf{,}~ \mathit{typing}]*\mathsf{)}~[\mathsf{=}~\mathit{letRoot}]\mathsf{;} \\ \mathit{typing} &::=& \mathit{typeinst}~\mathsf{:}~\mathit{ident} \\ \\ \mathit{expr} &::=& \mathit{letExpr} \\&|& \mathit{ident}~\mathsf{(}\mathit{val} [ \mathsf{,}~ \mathit{val}]*\mathsf{)} \\&|& \mathsf{if}~\mathit{val}~\mathsf{then}~\mathit{letExpr}~[\mathsf{elseif}~\mathit{val}~\mathsf{then}~\mathit{letExpr}]*~\mathsf{else}~\mathit{letExpr}~\mathsf{endif} \\&|& \mathsf{[}[\mathit{tuple}\mathsf{:}]~\mathit{letExpr}~\mathsf{|}~\mathit{genExpr} [ \mathsf{,}~ \mathit{genExpr}]*\mathsf{]} \\ \mathit{rootExpr} &::=& \mathit{letRoot} \\&|& \mathit{ident}~\mathsf{(}\mathit{val} [ \mathsf{,}~ \mathit{val}]*\mathsf{)} \\&|& \mathsf{if}~\mathit{val}~\mathsf{then}~\mathit{letRoot}~[\mathsf{elseif}~\mathit{val}~\mathsf{then}~\mathit{letRoot}]*~\mathsf{else}~\mathit{letRoot}~\mathsf{endif} \\&|& \mathsf{forall*root}\mathsf{(}\mathsf{[}\mathit{letRoot}~\mathsf{|}~\mathit{genExpr} [ \mathsf{,}~ \mathit{genExpr}]*\mathsf{]}\mathsf{)} \\ \mathit{letExpr} &::=& \mathit{val} \\&|& \mathsf{let}~\mathsf{\\{}\mathit{item}*\mathsf{\\}}~\mathsf{in}~\mathit{val} \\ \mathit{letRoot} &::=& \mathsf{let}~\mathsf{\\{}\mathit{item}*\mathsf{\\}}~\mathsf{in}~\mathsf{root} \\ \mathit{item} &::=& \mathit{typing}~\mathsf{;} \\&|& \mathit{typing}~\mathsf{=}~\mathit{expr}~\mathsf{;} \\&|& \mathsf{constraint}~\mathit{rootExpr}~\mathsf{;} \\ \mathit{genExpr} &::=& \mathit{ident}~\mathsf{in}~\mathit{letExpr}~[\mathsf{where}~\mathit{letExpr}] \\&|& \mathit{ident}~\mathsf{=}~\mathit{letExpr} \\ \mathit{val} &::=& \mathit{lit}~|~\mathit{range}~|~\mathit{tuple} \\&|& \mathsf{\{}\mathit{lit}~\mathsf{,}~ [\mathit{lit}~\mathsf{,}]*\mathsf{\}} \\&|& \mathsf{array}\mathit{int}\mathsf{d(}\mathit{range}~\mathsf{,}~[\mathit{range}~\mathsf{,}]*~\mathsf{[}\mathit{lit}~\mathsf{,}~ [\mathit{lit}~\mathsf{,}]*\mathsf{])} \\&|& \mathit{ident}~\mathsf{.}~\mathit{int} \\&|& \mathit{ident}~\mathsf{[}\mathit{lit}\mathsf{]} \\ \mathit{tuple} &::=& \mathsf{(}\mathit{lit}~\mathsf{,}~ [\mathit{lit}~\mathsf{,}]*\mathsf{)} \\ \mathit{range} &::=& \mathit{lit}\mathsf{..}\mathit{lit} \\&|& \mathit{ident} \\ \mathit{lit} &::=& \mathit{bool} \\&|& \mathit{int} \\&|& \mathit{float} \\&|& \mathit{str} \\&|& \mathit{ident} \\ \\ \mathit{bool} &::=& \mathsf{true}~|~\mathsf{false} \\ \mathit{int} &::=& /\texttt{[0-9]+}/ \\ \mathit{float} &::=& /\texttt{0[xX]$[0-9a-fA-F]*\\.[0-9a-fA-F]+$|$[0-9a-fA-F]+\\.?$$[pP][+-]?[0-9]+$?}/ \\ \mathit{str} &::=& /\texttt{"[\^\"]*"}/ \\ \mathit{ident} &::=& /\texttt{[A-Za-z]A-Za-z0-9*]*}/ \\ \end{array} \]

The Typing Rules of MicroZinc

The following syntax describes the types available in MicroZinc. The type syntax are used directly in let-expressions and parameter declarations.

\[ \begin{array}{lcl} \mathit{typeinst} &::=& \mathsf{pred} \\&|& \mathsf{array}~\mathsf{[}\mathit{int}\mathsf{..}\mathit{int} [ \mathsf{,}~ \mathit{int}\mathsf{..}\mathit{int}]*\mathsf{]}~\mathsf{of}~\mathit{baseType} \\&|& \mathit{baseType} \\ \mathit{baseType} &::=& \mathsf{tuple}~\mathsf{(}\mathit{typeinst} [ \mathsf{,}~ \mathit{typeinst}]*\mathsf{)} \\&|& \mathit{domType} \\&|& \mathit{primType} \\ \mathit{domType} &::=& \mathsf{var}~\mathit{val}\\&|& \mathsf{var}~\mathsf{set}~\mathsf{of}~\mathit{val}\\ \mathit{primType} &::=& \mathsf{par}~\mathsf{bool}~|~\mathsf{var}~\mathsf{bool}\\&|& \mathsf{par}~\mathsf{int}~|~\mathsf{var}~\mathsf{int}\\&|& \mathsf{par}~\mathsf{float}~|~\mathsf{var}~\mathsf{float}\\&|& \mathsf{par}~\mathsf{set}~\mathsf{of}~\mathsf{int}~|~\mathsf{var}~\mathsf{set}~\mathsf{of}~\mathsf{int}\\&|& \mathsf{par}~\mathsf{set}~\mathsf{of}~\mathsf{float}~|~\mathsf{string} \end{array} \]

Importantly, MicroZinc includes two types of sub-typing. When type $ T_1 $ is a sub-type of type $ T_2 $ then $ T_1 $ can be used anywhere where the type $ T_2 $ is required.

In MicroZinc, $ \mathsf{par}~T $ is a sub-type of $ \mathsf{var}~T $.
MicroZinc also has numeric subtyping (i.e., $ \mathsf{par~bool} $ is a subtype of $ \mathsf{par~int} $, which is a sub-type of $ \mathsf{par~float} $, and similarly $ \mathsf{var~bool} $ is a sub-type of $ \mathsf{var~int} $, which is a sub-type of $ \mathsf{var~float} $)

The $ \mathsf{pred} $ type is a special value given to expressions that enforce constraints, but do not return a value. This It should be noted that the

The following rules describe the conditions under which a MicroZinc program is correctly typed. In these rules the variable $ \Gamma $ will denote the typing context. This context contains known types for identifiers.

Functions and calls

At the top level of the MicroZinc program we find different functions. The program is well-typed if the type of the body of each function matches the declared type of the function given the declared types for the arguments. The types of all functions are available when typing the body expression of a function, allowing for (mutual) recursive functions.

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} T : T^r$} \AxiomC{$ \Gamma, f^{id} : (T'_1 \equiv )\langle T^p_1, \dots, T^p_n \rangle \rightarrow T^{r}\vdash{} funcs : \langle T'_2, \dots, T'_m \rangle $} \RightLabel{(T-Builtin)} \BinaryInfC{$ \vdash{} \mathsf{function}~T~\mathsf{:}~f^{id}~\mathsf{(} T^p_1 : x_1\mathsf{,}\dots\mathsf{,}T^p_n : x_n\mathsf{)}~\mathsf{;} funcs : \langle T'_1, \dots, T'_m \rangle $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} T : T^r$} \AxiomC{$ \Gamma, f^{id}_1 : (T'_1 \equiv) \langle T^p_1, \dots, T^p_n \rangle \rightarrow T^r \vdash{} f_2 \mathsf{;} \dots \mathsf{;} f_m : \langle T'_2, \dots, T'_m \rangle $} \noLine{} \BinaryInfC{$\Gamma, f^{id}_1 : T'_1, \dots, f^{id}_m : T'_m, x_1 : T^p_1, \dots, x_n : T^p_n \vdash{} E : T^r$} \RightLabel{(T-Func)} \UnaryInfC{$ \Gamma \vdash{} \mathsf{function}~T~\mathsf{:}~f^{id}_1~\mathsf{(}~x_1: T^p_1, \dots, x_n: T^p_n \mathsf{)}~\mathsf{=}~E~\mathsf{;} f_2 \mathsf{;} \dots \mathsf{;} f_m : \langle T'_1, \dots, T'_m \rangle $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma, f^{id} : (T'_1 \equiv )\langle T^p_1, \dots, T^p_n \rangle \rightarrow \mathsf{pred}\vdash{} funcs : \langle T'_2, \dots, T'_m \rangle $} \RightLabel{(T-Slv-Native)} \UnaryInfC{$ \vdash{} \mathsf{predicate}~f^{id}~\mathsf{(} T^p_1 : x_1\mathsf{,}\dots\mathsf{,}T^p_n : x_n\mathsf{)}~\mathsf{;} funcs : \langle T'_1, \dots, T'_m \rangle $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma, f^{id}_1 : (T'_1 \equiv) \langle T^p_1, \dots, T^p_n \rangle \rightarrow \mathsf{pred} \vdash{} f_2 \mathsf{;} \dots \mathsf{;} f_m : \langle T'_2, \dots, T'_m \rangle $} \noLine{} \UnaryInfC{$\Gamma, f^{id}_1 : T'_1, \dots, f^{id}_m : T'_m, x_1 : T^p_1, \dots, x_n : T^p_n \vdash{} E : \mathsf{par~bool}$} \RightLabel{(T-Pred)} \UnaryInfC{$ \Gamma \vdash{} \mathsf{function}~T~\mathsf{:}~f^{id}_1~\mathsf{(}~x_1: T^p_1, \dots, x_n: T^p_n \mathsf{)}~\mathsf{=}~E~\mathsf{;} f_2 \mathsf{;} \dots \mathsf{;} f_m : \langle T'_1, \dots, T'_m \rangle $} \end{prooftree} \]

Calls are defined in both the context of a constraint and on the right hand side of an assignment of a let-expression. In both cases the typing of call is described by the following rule.

\[ \begin{prooftree} \AxiomC{$ f^{id} : \langle T^p_1, \dots, T^p_n \rangle \rightarrow T^r \in \Gamma$} \AxiomC{$ \Gamma \vdash{} x_1 : T^p_1 ~~ \dots ~~ \Gamma \vdash{} x_n : T^p_n$} \RightLabel{(T-Call)} \BinaryInfC{$ \Gamma \vdash{} f^{id} ~\mathsf{(}~x_1 \mathsf{,} \dots \mathsf{,} x_n~\mathsf{)} : T^r $} \end{prooftree} \]

Let expressions and identifiers

Identifiers are typed simply using a lookup in the typing context. The typing of the let expression iteratively adds the types of each declaration item to the typing context. The program is well-typed when all expressions on the right-hand side of a declaration match their declared types, and any constraint items are of $ \mathsf{pred} $ type.

\[ \begin{prooftree} \AxiomC{$ x : T \in \Gamma $} \RightLabel{(T-Ident)} \UnaryInfC{$ \Gamma \vdash{} x : T$} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} x : T$} \RightLabel{(T-Let-Base)} \UnaryInfC{$ \Gamma \vdash{} \mathsf{let}~\mathsf{\\{}~\mathsf{\\}}~\mathsf{in}~\mathit{x} : T$} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} T : T'$} \AxiomC{$ \Gamma, x : T' \vdash{} \mathsf{let}~\mathsf{\\{}~items~\mathsf{\\}}~\mathsf{in}~y : T''$} \RightLabel{(T-Let-Decl1)} \BinaryInfC{$ \Gamma \vdash{} \mathsf{let}~\mathsf{\\{}~T~\mathsf{:}~x~\mathsf{;}~items~\mathsf{\\}}~\mathsf{in}~y : T''$} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} T : T'$} \AxiomC{$ \Gamma \vdash{} E : T'$} \AxiomC{$ \Gamma, x : T' \vdash{} \mathsf{let}~\mathsf{\\{}~items~\mathsf{\\}}~\mathsf{in}~y : T''$} \RightLabel{(T-Let-Decl2)} \TrinaryInfC{$ \Gamma \vdash{} \mathsf{let}~\mathsf{\\{}~T~\mathsf{:}~x = E~\mathsf{;}~items~\mathsf{\\}}~\mathsf{in}~y : T''$} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} E : \mathsf{pred}$} \AxiomC{$ \Gamma, x : T \vdash{} \mathsf{let}~\mathsf{\\{}~items~\mathsf{\\}}~\mathsf{in}~y : T'$} \RightLabel{(T-Let-Con)} \BinaryInfC{$ \Gamma \vdash{} \mathsf{let}~\mathsf{\\{}~\mathsf{constraint}~E~\mathsf{;}~items~\mathsf{\\}}~\mathsf{in}~y : T'$} \end{prooftree} \]

Comprehensions and Generators

As shown in the following rules, the $ \mathit{genExpr} $ rules will must have either type $ \mathsf{set~of~int} $ or $ \mathsf{array1d~of}~T $. The $ \text{T_Comp} $ rule, to type array comprehensions, will use the $ elem $ function which maps the former type to $ \mathsf{int} $ and the latter to $ T $.

\[ \begin{prooftree} \AxiomC{$ \Gamma, \vdash{} E : T$} \RightLabel{(T-Comp-Expr)} \UnaryInfC{$ \Gamma \vdash{} \mathsf{[} E~\mathsf{|}~\mathsf{]} : \mathsf{array1d~of~} T $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} I : T$} \AxiomC{$ T \in \{ \mathsf{set~of~int}, \mathsf{array1d~of}~V \}$} \AxiomC{$ \Gamma, x : T \vdash{} \mathsf{[} E~\mathsf{|}~gens~\mathsf{]} : T' $} \RightLabel{(T-Comp-In)} \TrinaryInfC{$ \Gamma \vdash{} \mathsf{[} E~\mathsf{|}~x~\mathsf{in}~I, gens~\mathsf{]} : T' $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} B : \mathsf{par~bool}$} \AxiomC{$ \Gamma \vdash{} \mathsf{[} E~\mathsf{|}~x~\mathsf{in}~I, gens~\mathsf{]} : T $} \RightLabel{(T-Comp-Where)} \BinaryInfC{$ \Gamma \vdash{} \mathsf{[} E~\mathsf{|}~x~\mathsf{in}~I~\mathsf{where}~B, gens~\mathsf{]} : T $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} E_2 : T$} \AxiomC{$ \Gamma, x : T \vdash{} \mathsf{[} E_1~\mathsf{|}~gens~\mathsf{]} : T' $} \RightLabel{(T-Comp-Asg)} \BinaryInfC{$ \Gamma \vdash{} \mathsf{[} E_1~\mathsf{|}~x~\mathsf{=}~E_2, gens~\mathsf{]} : T' $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma\vdash{} \mathsf{[} I_i~\mathsf{|}~gens~\mathsf{]} : \mathsf{array1d~of~int}, \forall{} 1 \leq{} i \leq{} X$} \AxiomC{$ \Gamma\vdash{} \mathsf{[} E~\mathsf{|}~gens~\mathsf{]} : \mathsf{array1d~of}~T $} \RightLabel{(T-Comp-Ind)} \BinaryInfC{$ \Gamma \vdash{} \mathsf{[(}I_1, \dots, I_X \mathsf{):} E~\mathsf{|}gens~\mathsf{]} : \mathsf{array}X\mathsf{d~of}~T $} \end{prooftree} \]

Arrays, Sets, and Tuples

MicroZinc has three different container types. Arrays can contain multiple, possibly duplicate, elements of the same type, each associated with a unique index with which the element can be retrieved.

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} r_i : \mathsf{par~set~of~int}, \forall{} 1 \leq{} i \leq{} X$} \AxiomC{$ \Gamma \vdash{} x_i : T, \forall{} 1 \leq{} i \leq{} n$} \RightLabel{(T-Arr)} \BinaryInfC{$ \Gamma \vdash{} \mathsf{array}X\mathsf{d(}~r_1 \mathsf{,} \dots \mathsf{,} r_X \mathsf{,} \mathsf{[} x_1 \mathsf{,} \dots \mathsf{,} x_n \mathsf{])} : \mathsf{array}X\mathsf{d~of}~T $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} v_i : \mathsf{par}~\mathsf{int}, \forall{} 1 \leq{} i \leq{} X$} \AxiomC{$ \Gamma \vdash{} x : \mathsf{array}X\mathsf{d~of}~T$} \RightLabel{(T-Ind)} \BinaryInfC{$ \Gamma \vdash{} x \mathsf{[} v_1 \mathsf{,} \dots \mathsf{,} v_X \mathsf{]} : T $} \end{prooftree} \]

Sets contain a certain number of unique elements of the same type. Ranges of elements are also typed as sets.

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} x_1 : T, \forall{} 1 \leq{} i \leq{} n$} \AxiomC{$ T \in {\mathsf{par~int}, \mathsf{par~float}}$} \RightLabel{(T-Set)} \BinaryInfC{$ \Gamma \vdash{} \mathsf{\{} x_1 \mathsf{,} \dots \mathsf{,} x_n \mathsf{\}} : \mathsf{par~set~of}~T $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} x_1 : T $} \AxiomC{$ \Gamma \vdash{} x_2 : T $} \AxiomC{$ T \in {\mathsf{par~int}, \mathsf{par~float}}$} \RightLabel{(T-Range)} \TrinaryInfC{$ \Gamma \vdash{} x_1 \mathsf{..} x_2 : \mathsf{par~set~of}~T $} \end{prooftree} \]

Finally, tuples are collections of elements with possibly different types. The number of elements in a tuple is known during type checking

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} x_i : T_i, \forall{} 1 \leq{} i \leq{} n$} \RightLabel{(T-Tup)} \UnaryInfC{$ \Gamma \vdash{} \mathsf{(} x_1\mathsf{,} \dots\mathsf{,} x_n \mathsf{)} : \mathsf{tuple(} T_1 \mathsf{,} \dots \mathsf{,} T_n \mathsf{)}$} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} x : \mathsf{tuple(} T_1 \mathsf{,} \dots \mathsf{,} T_i \mathsf{,} \dots \mathsf{,} T_n \mathsf{)}$} \AxiomC{$ i \in 1 \mathsf{..} n $} \RightLabel{(T-Acc)} \BinaryInfC{$ \Gamma \vdash{} x \mathsf{.} i : T_i $} \end{prooftree} \]

If-then-else Expressions

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} x : \mathsf{par}~\mathsf{bool} $} \AxiomC{$ \Gamma \vdash{} E_1 : T$} \AxiomC{$ \Gamma \vdash{} E_2 : T$} \RightLabel{(T-If)} \TrinaryInfC{$ \Gamma \vdash{} \mathsf{if}~x~\mathsf{then}~E_1~\mathsf{else}~E_2~\mathsf{endif} : T $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} x_1 : \mathsf{par}~\mathsf{bool} $} \AxiomC{$ \Gamma \vdash{} E_1 : T$} \AxiomC{$ \Gamma \vdash{} \mathsf{if}~x_2~\mathsf{then}~E_2 \dots \mathsf{else}~E_n~\mathsf{endif} : T$} \RightLabel{(T-ElseIf)} \TrinaryInfC{$ \Gamma \vdash{} \mathsf{if}~x_1~\mathsf{then}~E_1~\mathsf{elseif}~x_2~\mathsf{then}~E_2 \dots \mathsf{else}~E_n~\mathsf{endif} : T $} \end{prooftree} \]

Literals

The remaining parts of the MicroZinc laguage are simple literals that have an intrinsic type.

\[ \begin{prooftree} \AxiomC{} \RightLabel{(T-Root)} \UnaryInfC{$\vdash{} \mathsf{root} : \mathsf{pred}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(T-True)} \UnaryInfC{$\vdash{} \mathsf{true} : \mathsf{par}~\mathsf{bool}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(T-False)} \UnaryInfC{$\vdash{} \mathsf{false} : \mathsf{par}~\mathsf{bool}$} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{} \RightLabel{(T-int)} \UnaryInfC{$\vdash{} /\texttt{[0-9]+}/ : \mathsf{par}~\mathsf{int}$} \end{prooftree} \begin{prooftree} \AxiomC{} \RightLabel{(T-Str)} \UnaryInfC{$\vdash{} /\texttt{"[\^\"]*"}/ : \mathsf{string}$} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{} \RightLabel{(T-float)} \UnaryInfC{$\vdash{} /\texttt{0[xX]\$[0-9a-fA-F]*\\.[0-9a-fA-F]+\$|\$[0-9a-fA-F]+\\.?\$\$[pP][+-]?[0-9]+\$?}/ : \mathsf{par}~\mathsf{float}$} \end{prooftree} \]

Type Instances

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} I_i : \mathsf{par}~\mathsf{set}~\mathsf{of}~\mathsf{int}, \forall{} 1 \leq{} i \leq{} X$} \AxiomC{$ \Gamma \vdash{} T : T'$} \RightLabel{(T-Type-Arr)} \BinaryInfC{$ \Gamma \vdash{} \mathsf{array}~\mathsf{[}I_1 \mathsf{,} \dots, I_X\mathsf{]}~\mathsf{of}~\mathit{T} : \mathsf{array}X\mathsf{d~of}~T' $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} T_1 : T'_1 ~~ \dots ~~ \Gamma \vdash{} T_n : T'_n$} \RightLabel{(T-Type-Tup)} \UnaryInfC{$ \Gamma \vdash{} \mathsf{tuple}\mathsf{(}\mathit{T_1}\mathsf{,} \dots, T_n\mathsf{)} : \mathsf{tuple(}T'_1, \dots, T'_n\mathsf{)} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} S : \mathsf{par}~\mathsf{set}~\mathsf{of}~\mathsf{int}$} \RightLabel{(T-Int-Dom)} \UnaryInfC{$ \Gamma \vdash{} \mathsf{var}~S : \mathsf{var}~\mathsf{int} $} \end{prooftree} \begin{prooftree} \AxiomC{$ \Gamma \vdash{} S : \mathsf{par}~\mathsf{set}~\mathsf{of}~\mathsf{float}$} \RightLabel{(T-Flt-Dom)} \UnaryInfC{$ \Gamma \vdash{} \mathsf{var}~S : \mathsf{var}~\mathsf{float} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Gamma \vdash{} S : \mathsf{par}~\mathsf{set}~\mathsf{of}~\mathsf{int}$} \RightLabel{(T-Set-Dom)} \UnaryInfC{$ \Gamma \vdash{} \mathsf{var}~\mathsf{set}~\mathsf{of}~S : \mathsf{var}~\mathsf{set}~\mathsf{of}~\mathsf{int} $} \end{prooftree} \]

The remaining variants ($ \mathit{primType} $) are trivial, and their expression directly describes their type.

The Operational Semantics of MicroZinc

Note that only function calls and the items in let expressions change the environment.

\[ \def\tuple#1{\left\langle #1 \right\rangle} \def\Sem#1#2{[\![#1]\!]\tuple{#2}} \def\SemLet#1#2{[\![#1]\!]_L\tuple{#2}} \def\Prog\ensuremath{\mathcal{P}} \def\Env\ensuremath{\sigma} \]

Function calls

\[ \begin{prooftree} \AxiomC{$ \mathsf{function}~T\mathsf{:}~F\mathsf{(} p*1, \dots, p_k \mathsf{)} = E; \in{} \Prog{},~\text{where the}~p_i~\text{are fresh} $} \AxiomC{$ \Sem{E*{[p_i \mapsto a_i, \forall{} 1 \leq{} i \leq{} k]}}{\Prog, \Env} \Rightarrow{} \tuple{v, \Env'} $} \RightLabel{(E-Call)} \BinaryInfC{$ \Sem{F\mathsf{(}a_1, \ldots, a_k\mathsf{)}}{\Prog, \Env, C} \Rightarrow{} \tuple{v, \Env'} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ F \in \text{Builtins} $} \RightLabel{(E-Call-Builtin)} \UnaryInfC{$ \Sem{F\mathsf{(}a_1, \ldots, a_k\mathsf{)}}{\Prog, \Env} \Rightarrow{} \tuple{\mathit{eval}(F, \langle a_1, \dots, a_k \rangle), \Env'} $} \end{prooftree} \]

TODO: The following rule comes from my thesis to call a native constraint, but I'm not sure if it works correctly when you have a functionally defined Boolean variable (call on RHS).

\[ \begin{prooftree} \AxiomC{$ \mathsf{function~var~bool:}~F\mathsf{(}p_1, \ldots, p_k\mathsf{)}; \in \Prog $} \RightLabel{(E-Call-Native)} \UnaryInfC{$ \Sem{F\mathsf{(}a_1, \ldots, a_k\mathsf{)}}{\Prog, \Env} \Rightarrow{} \tuple{\mathsf{constraint}~ F\mathsf{(}a_1, \ldots, a_k\mathsf{)}, \Env} $} \end{prooftree} \]

Let expressions

Constraint in let-expression are aggregated and bound to the returned variable. As such, let expressions are evaluated with an addition context collection $ C $, which contains the constraints enforced in the let-expression. Evaluation rules that use this special context are indicated using $ L $.

\[ \begin{prooftree} \AxiomC{$ \SemLet{\mathsf{let}~\mathsf{\{}~items~\mathsf{\}}~\mathsf{in}~y}{\Prog, \Env, \emptyset} \Rightarrow \tuple{x, \Env'} $} \RightLabel{(E-Let)} \UnaryInfC{$ \Sem{\mathsf{let}~\mathsf{\{}~items~\mathsf{\}}~\mathsf{in}~y}{\Prog, \Env} \Rightarrow{} \tuple{x, \Env'} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Sem{y}{\Prog, \Env} \Rightarrow \tuple{x, \Env} $} \RightLabel{(E-Let-In)} \UnaryInfC{$ \SemLet{\mathsf{let}~\mathsf{\{}\mathsf{\}}~\mathsf{in}~y}{\Prog, \Env, C} \Rightarrow{} \tuple{x, \Env \cup x~\texttt{↳}~C} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Sem{E}{\Prog, \Env} \Rightarrow \tuple{c, \Env'} $} \AxiomC{$ \SemLet{\mathsf{let}~\mathsf{\{}~items~\mathsf{\}}~\mathsf{in}~y}{\Prog, \Env', C \cup c} \Rightarrow{} \tuple{x, \Env''} $} \RightLabel{(E-Let-Con)} \BinaryInfC{$ \SemLet{\mathsf{let}~\mathsf{\{}~\mathsf{constraint}~E~\mathsf{;}~items~\mathsf{\}}~\mathsf{in}~y}{\Prog, \Env, C} \Rightarrow{} \tuple{x, \Env''} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \SemLet{\mathsf{let}~\mathsf{\{}~items~\mathsf{\}}~\mathsf{in}~y}{\Prog, \Env \cup \{T: x\}, C} \Rightarrow{} \tuple{x, \Env'} $} \RightLabel{(E-Let-Decl1)} \UnaryInfC{$ \SemLet{\mathsf{let}~\mathsf{\{}~T~\mathsf{:}~x~\mathsf{;}~items~\mathsf{\}}~\mathsf{in}~y}{\Prog, \Env, C} \Rightarrow{} \tuple{x, \Env'} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Sem{E}{\Prog, \Env} \Rightarrow \tuple{v, \Env'} $} \AxiomC{$ \SemLet{\mathsf{let}~\mathsf{\{}~items*{[x \mapsto v]}~\mathsf{\}}~\mathsf{in}~y*{[x \mapsto v]}}{\Prog, \Env', C} \Rightarrow{} \tuple{x, \Env''} $} \RightLabel{(E-Let-Decl2)} \BinaryInfC{$ \SemLet{\mathsf{let}~\mathsf{\{}~T~\mathsf{:}~x = E~\mathsf{;}~items~\mathsf{\}}~\mathsf{in}~y}{\Prog, \Env, C} \Rightarrow{} \tuple{x, \Env''} $} \end{prooftree} \]

Comprehensions and Generators

\[ \begin{prooftree} \AxiomC{} \RightLabel{(T-Comp-NoGen)} \UnaryInfC{$ \Sem{\mathsf{[} E \mathsf{|}~\mathsf{]}}{\Prog,\Env} \Rightarrow{} \tuple{\mathsf{[ ]}, \Env} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Sem{I}{\Prog, \Env} \Rightarrow \tuple{\mathsf{[ ]}, \Env'} $} \RightLabel{(T-Comp-Empty)} \UnaryInfC{$ \Sem{\mathsf{[} E \mathsf{|}~x~\mathsf{in}~I~\mathsf{where}~B, gens~\mathsf{]}}{\Prog,\Env} \Rightarrow{} \tuple{\mathsf{[ ]}, \Env} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Sem{I}{\Prog, \Env} \Rightarrow \tuple{\mathsf{[}v_1~\mathsf{]}, \Env'} $} \RightLabel{(T-Comp-It)} \UnaryInfC{$ \Sem{\mathsf{[} E \mathsf{|}~x~\mathsf{in}~I~\mathsf{where}~B, gens~\mathsf{]}}{\Prog,\Env} \Rightarrow{} \tuple{\mathsf{[ ]}, \Env} $} \end{prooftree} \]

Tuples and Arrays

\[ \begin{prooftree} \AxiomC{$ \Sem{x}{\Prog,\Env} \Rightarrow{} \tuple{\mathsf{(} v_1\mathsf{,} \dots\mathsf{,} v_n \mathsf{)}, \Env} $} \AxiomC{$ i \in 1 \mathsf{..} n$} \RightLabel{(E-Tup-Acc)} \BinaryInfC{$ \Sem{x \mathsf{.} i}{\Prog,\Env} \Rightarrow{} \tuple{v_i, \Env} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Sem{x}{\Prog,\Env} \Rightarrow{} \tuple{\mathsf{[} v_n\mathsf{,} \dots\mathsf{,} v_m \mathsf{]}, \Env} $} \AxiomC{$ \Sem{y}{\Prog,\Env} \Rightarrow{} \tuple{i, \Env} $} \AxiomC{$ i \in n \mathsf{..} m$} \RightLabel{(E-Arr-Ind)} \TrinaryInfC{$ \Sem{x \mathsf{[} y \mathsf{]}}{\Prog,\Env} \Rightarrow{} \tuple{v_i, \Env} $} \end{prooftree} \]

If-then-else expressions

\[ \begin{prooftree} \AxiomC{$ \Sem{x_1}{\Prog,\Env} \Rightarrow{} \tuple{\mathsf{true}, \Env} $} \AxiomC{$ \Sem{E_1}{\Prog,\Env} \Rightarrow{} \tuple{v, \Env'} $} \RightLabel{(E-If)} \BinaryInfC{$ \Sem{\mathsf{if}~x_1~\mathsf{then}~E_1~\mathsf{elseif}~x_2~\mathsf{then}~E_2 \dots \mathsf{else}~E_n~\mathsf{endif}}{\Prog,\Env} \Rightarrow{} \tuple{v, \Env'} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Sem{x_1}{\Prog,\Env} \Rightarrow{} \tuple{\mathsf{false}, \Env} $} \AxiomC{$ \Sem{\mathsf{if}~x_2~\mathsf{then}~E_2 \dots \mathsf{else}~E_n~\mathsf{endif}}{\Prog,\Env} \Rightarrow{} \tuple{v, \Env'} $} \RightLabel{(E-ElseIf)} \BinaryInfC{$ \Sem{\mathsf{if}~x_1~\mathsf{then}~E_1~\mathsf{elseif}~x_2~\mathsf{then}~E_2 \dots \mathsf{else}~E_n~\mathsf{endif}}{\Prog,\Env} \Rightarrow{} \tuple{v, \Env'} $} \end{prooftree} \]

\[ \begin{prooftree} \AxiomC{$ \Sem{x}{\Prog,\Env} \Rightarrow{} \tuple{\mathsf{false}, \Env} $} \AxiomC{$ \Sem{E_2}{\Prog,\Env} \Rightarrow{} \tuple{v, \Env'} $} \RightLabel{(E-Else)} \BinaryInfC{$ \Sem{\mathsf{if}~x~\mathsf{then}~E_1~\mathsf{else}~E_2~\mathsf{endif}}{\Prog,\Env} \Rightarrow{} \tuple{v, \Env'} $} \end{prooftree} \]

Identifiers and Literals

\[ \begin{prooftree} \AxiomC{$ x \in \mathit{ident} $} \AxiomC{$ \{T: x~\texttt{↳}~C \} \in \Env $} \RightLabel{(E-Ident)} \BinaryInfC{$ \Sem{x}{\Prog, \Env} \Rightarrow{} \tuple{x, \Env} $} \end{prooftree} \begin{prooftree} \AxiomC{$ v \in \mathit{literal} $} \RightLabel{(E-Lit)} \UnaryInfC{$ \Sem{v}{\Prog, \Env} \Rightarrow{} \tuple{v, \Env} $} \end{prooftree} \]

The MicroZinc Interpreter

Builtin functions

The following functions have been included as builtin functions for

MiniZinc Syntax Changes

New syntax includes:

Tuples
Records (which are transformed into tuples)
case expression with complex pattern matching (possibly compiled into case with simple matching)
Namespacing/module system

Syntax still to be discussed:

TypeInst variables syntax
Function type syntax
Module syntax

Error handling

Good error reporting should be integral to the compiler.

We need to think about what errors could be encountered and what information is needed to generate a useful message.

MiniZinc <-> MicroZinc

Could keep a stack of the transformations, so we always know the origin of any new constraints
Transformations generate explanations of why they were done, so the logic can be followed for debugging

Bytecode

Could generate bytecode with debugging symbols giving locations for instructions
Interpreter could have a debugging runtime mode enabling full tracing