Type checking
The type-checker performs bottom-up typing of expressions, and checks that the types are correct. This is performed on a per-item basis. The type-checker deals with 'signatures' and 'bodies' of items separately. Signatures give the type information needed to compute the type of an expression referring to the item. Typing a body involves typing the rest of the item (generally this is the annotations and the RHS).
Consider the items:
function float: foo(int: x :: my_ann) = x + 0.5;
1..3: a = 23;
any: b = foo(a);
foo
has a signature offunction float: foo(int)
. Its RHS ofx + 0.5
is typed separately and verified against the signature return type, along with the annotationmy_ann
.a
has a signature ofint: a
, which is computed using only the LHS of the declaration as the type is complete. The RHS of23
is typed separately and verified against the signature type.b
uses anany
type and so its signature offloat: b
is computed using the RHS.
Type checker results
Type checking an item produces several maps containing the computed types. The TypeResult
provides access to all types
for an item, by using the signature types and the body types as needed. This can be obtained using the
db.lookup_item_types(item_ref)
query.
- Indexing using an
ArenaIndex<Expression>
gives theTy
of the expression.- E.g. the expression
{1, 1.5}
will have typeset of float
.
- E.g. the expression
- Indexing using an
ArenaIndex<Pattern>
gives thePatternTy
of the pattern.- The
PatternTy
includes information about what the pattern is being used for, rather than just the type of the pattern itself (e.g. so we can tell if a pattern is simply for destructuring, or if it actually declares a new variable). - E.g. the declaration
any: (x, y) = (1, 1.5)
will be such that(x, y)
is aPatternTy::Destructuring(tuple(int, float))
x
is aPatternTy::Variable(int)
y
is aPatternTy::Variable(float)
- The
- The
name_resolution(ident)
method finds thePatternRef
for an expression which is an identifier (e.g. identifier pointing to a variable). - The
pattern_resolution(ident)
method finds thePatternRef
for a pattern which is an identifier (e.g. enum atom used in a case expression pattern)
Expressions
Typing of expressions is done recursively, using the types of child expressions to determine the type of the outer expression.
For example, the expression ([1, 2.5], 3)
will be typed by:
- The first expression is a tuple, so visit its child fields
- The first tuple field is an array literal so visit the child members
- The first child member is an integer literal, so its type is
int
- The second child member is a float literal, so its type is
float
- The first child member is an integer literal, so its type is
- Therefore the array literal is determined to have type
array [int] of float
as float is the most specific supertype ofint
andfloat
. - The second tuple field is an integer literal, so its type is
int
- The first tuple field is an array literal so visit the child members
- Therefore the tuple is of type
tuple(array [int] of float, int)
Calls
One exception to bottom-up typing occurs when dealing with calls, as in order to perform overloading resolution, we must specially handle the case of a call with an identifier callee. See the following section on function resolution for more detail.
For example, the call foo(1, "a")
would be typed by:
- Getting the types of each argument (in this case
int
andstring
) - Finding all the function items named
foo
- Performing overloading resolution given the argument types
foo
is given the type of the operation- The call is given the return type of the operation
If the call does not have an identifier as its callee, the callee type is determined in the usual bottom-up fashion, and no overloading resolution is required.
Identifiers
Identifier expressions cause the type checker to lookup the identifier in the expression's scope, and fetch the type signature of the retrieve pattern's item to determine the type of the identifier. For bodies, we can fetch the signature of the item directly (and reuse the computation of that signature if it's already done), but for signatures which depend on other signatures, we do not do this because of the possibility of cycles.
Instead, the signatures of the dependencies are always computed, and if we reach the initiating signature during this process, we can break out of the cycle and return the error type (at that stage, not at the initiating site). Any cyclic definition errors will be emitted later during topological sorting. This is important for dealing with function signatures that include calls to other overloads of the function, since these should work, but resolving the overloaded call would require the type of 'this' signature, which is in the middle of being computed.
Ascribed types
Typing user-written types (the HIR Type
) involves computing the concrete type (Ty
) they represent. For example, the
type var 1..3
is typed by computing the type of the domain 1..3
. Since it is a set of int
, then the variable is an
int
type, and since the type has the var
modifier, the complete type is computed to be var int
.
Typing these becomes more complex when dealing with incomplete types. In these cases, we always have a RHS expression which can be typed first, and then we can use this computed type to fill in the 'holes' in the ascribed type.
For example, consider the (contrived) declaration:
array [_] of var set of any: x = [{1, 2}];
In this case, we start by computing the type of the RHS, and determine it to be of type array [int] of set of int
.
This type is unified with the ascribed type to obtain the type array [int] of var set of int
for x
.
Patterns
Typing of patterns is done top-down, based on an already computed type that the pattern must take. For example, to type the declaration
tuple(var int, any): (a, b) = (1, 2.5);
- The concrete type of the declaration is
tuple(var int, float)
(found by computing the type of the RHS and using it to fill in hole in the LHS type) - Therefore, the root pattern
(a, b)
has typetuple(var int, float)
. - We then visit the child patterns and set their type to the corresponding tuple field type.
- So
a
has typevar int
- And
b
has typefloat
One complication is that when dealing with constructor calls in patterns, we only know the 'return type' of the call, as we know the type of the call pattern itself, but not its arguments. We use this return type to determine the what the actual constructor call must be. For example:
enum Foo = A(Bar) ++ {B, C};
enum Bar = {D, E};
var Foo: x;
any: y = case x of
A(v) => v,
_ => E
endcase;
- The type of
x
isvar Foo
- Therefore the type of the first case pattern
A(v)
is alsovar Foo
- So based on that, we must be using the constructor
var Foo: A(var Bar)
, and so this is the type of theA
pattern - The type of
v
is thereforevar Bar
- So the result for the first case arm is
var Bar
- So based on that, we must be using the constructor
- The second case pattern
_
should also match the type ofx
(var Bar
)- Its result type is
E
, which is the typeBar
- So the case expression is of type
var Bar
- Its result type is
Function resolution
Function resolution involves determining a call identifier and a list of argument Ty
s, the best matching overload to
use (if there is one). The algorithm used is:
- If there is a variable which is an
op
function with the correct name, use it. - If not, then find the function items in scope with the correct name.
- Remove any candidates which have an incorrect number of arguments.
- Remove any candidates which cannot be instantiated with the argument
Ty
s - For every pair of candidates
a
, andb
remaining, eliminate the 'less specific' function- Instantiate
a
andb
with our call's argument types - If the instantiated argument types of
a
are all subtypes of the instantiated argument types ofb
, but not the other way around, then eliminateb
(and vice-versa). - If both instantiated argument types are equivalent, then
- If
a
is polymorphic butb
is a concrete function, then eliminatea
(and vice-versa) - If both candidates are polymorphic, then
- If
a
can be instantiated withb
's (original) parameter types, but not the other way around, eliminatea
(and vice-versa)
- If
- If
a
has a body butb
doesn't, eliminateb
(and vice-versa)
- If
- Otherwise arbitrarily eliminate
a
- Instantiate
Essentially, we choose such that:
- More specific parameter types are preferred:
bool
overint
,int
overvar int
. We take the instantiated argument types into account, so a function taking$$E
will be preferred over a function takingfloat
if called with anint
argument. - Where two candidates would instantiate to the same types, prefer
$$E
functions over$T
functions, and prefer concrete types over either of those.
It is an error if we are left with more than one candidate, or no candidates at the end of the process. Additionally, it should be noted that in the case of two identical functions which have bodies, we will simply arbitrarily choose one at this stage to allow type checking to continue with a reasonable return type. The duplicate function error will be emitted during final validation of the HIR.
Type-inst variables
There are two kinds of type-inst variables: $T
and $$E
. $T
matches any type other than arrays and functions (this
is because the old compiler doesn't accept arrays for these, but actually it seems like it would make more sense if it
did). $$E
matches any enumerable type (booleans, integers, enums) - unlike the old compiler, this is a a properly
generic type parameter, and is not considered to a special int
.
It's also possible to restrict $T
to accept only types valid for array indices if it is used as array [$T] of int
.
In this case only integers/enums or tuples of them would be accepted.
One complication is that the way type-inst variables are used looks different to most other languages. Since omitting
var
/opt
implies par
, non-opt
, a parameter declared to be $T
is actually par (non-opt) $T
. That is, the
parameter will accept the par, non-optional version of whatever type the type parameter $T
is given as. any $T
is
used to make the parameter accept the actual type of the type parameter $T
.
Note that the language is currently missing anyvar
and anyopt
, which would be needed for expressing parameters
whose inst/optionality depend on the input type parameters (for type-inst variables used in parameters only, this is
not really a problem since using var $T
or opt $T
will still let you match the needed combinations - the problem
is that if you use the type-inst variable in the return type, specifying opt $T
will force the return type to be
opt
, even if $T
is non-optional).
Determining the types substituted for type-inst variables
Since calls do not explicitly set the types to give for each type-inst variable, we have to use the argument types to determine them. Consider the function:
function var opt $T: foo(var $T: x, any $T: y);
var 1..3: p;
opt int: q;
any: v = foo(p, q);
Then for the call foo(p, q)
we start by considering the first argument p
, which is of type var int
. The function
parameter is declared as var $T
, so that means it applies its own inst (var
) and optionality (non-opt
) to the type
of $T
- therefore we strip off the inst and optionality from the given argument type and use that for $T
. So
$T
for this argument is set to be (par
, non-opt
) int
.
Next we look at the second argument q
, which is of type opt int
. Since the function parameter is declared as
any $T
, it does not apply any modifiers to the type of $T
, so we set the type of $T
for this argument to be
opt int
.
Then the final type for $T
is the most specific supertype of int
and opt int
, meaning the final type is opt int
.
From this, we can compute the actual instantiated function signature by substituting $T
= opt int
for the parameters
and return type.
- The first parameter
var $T
becomesvar int
(again, sincevar $T
is reallyvar
, non-opt
$T
). - The second parameter of
any $T
givesopt int
(sinceany
means don't apply any modifiers to$T
). - The return type of
var opt $T
givesvar opt int
(since it applies thevar
andopt
modifiers to$T
).
So the instantiated result is:
function var opt int: foo(var int: x, opt int: y);
var 1..3: p;
opt int: q;
var opt int: v = foo(p, q);
Output typing
The expressions of output items, and definitions of :: output_only
declarations are type-checked in a special mode
where the types of top-level identifiers are assumed to be par. This alleviates the need to manually insert calls to
fix()
.
In this example, if we don't consider p
in the output statement to be par
, then the if-then-else will cause will
trigger a type error as it will return the illegal type var string
.
var bool: p;
output [if p then "Yes" else "No" endif];