Summary of the ML language

Summary of the ML language

To load the file "foo" into an interactive session, type use "foo", or, if you prefer extensions, use "foo.ml".

Like Lisp, ML supports interactive use. Type an expression and you get back its value. In ML, however, you also get its type. Basic types are int, real, bool, string, and char. Examples are 4, 5.0, 6<7, "eight", #"9" (a character).
To preserve type purity, ~ is used for negation (unary minus), div and mod are used for integer division, and ints and reals cannot be mixed. real(x:int) converts an int x to a real; to go the other way use floor, ceil (ceiling), round, or trunc.

Boolean expressions occur in if...then...else expressions, among other places. Because if...then...else is an expression, and because ML is strongly typed, the else clause must always be present, and must be of the same type as the then clause. Otherwise the type of the entire expression cannot be determined at compile time! Note that in Lisp the form (IF ...) is still an expression, but if the else part is omitted then the result returned by a false condition is nil and nil is compatible with any type. This doesn’t happen in ML. Boolean-valued comparison operators =, <>, <, <=, >, >= behave as in most languages, except that reals cannot be compared for = or <>. The rationale for this is officially that reals are imprecise, and nearly-equal reals may fail to be exactly equal; another element to this, however, is that functions can’t be compared for equality either (for deep reasons), and so why not have basic types be equality-free too?

Variable constants can be set with val statements:
    val pi = 3.141592753;
    val name = "peter" + "dordal";
It is possible to redefine "constants", making them look like variables:
    val pi = "cherry";
but this is misleading: in some sense the old binding is still around (*is* still around in noninteractive settings), and rebinding a name allows its type to change.

ML allows "multiple values" to be a single type; parameter lists are the classic example of multiple values. Lisp allows a function to return multiple values, but this feature is seldom used. Lisp does *not* treat multple values as first-class objects the way ML does. In the statement
val t = (4, 5.0, "six");
the variable t becomes a 3-element "tuple", of type int*real*string. In general if T1 and T2 are two types, then the type of all ordered pairs from T1 and then T2 is denoted T1*T2. (Exercise: how is T1 * T2 different from a record/struct type?) The use of * here is as an operator on types. The components of t, above, can be obtained as #1(t), #2(t), and #3(t); there is no way to obtain the ith component, where i is set at run time, because the type of the ith component at compile time (without knowing i) is ambiguous.

ML supports lists almost the way Lisp does. Notationally, [ ] are used to enclose lists in ML instead of ( ). More deeply, ML lists must be homogeneous: all list elements must be the same type. If T is a type, the type of a list of T’s is T list. For example, [1, 2, 3] is of type int list. The empty list is nil (compatible with all list types) or []. The first element (car) of a list L is hd L; the cdr is tl L. The Lisp cons operator is denoted in ML with ::; if L is [2, 3, 4] then 1::L (in Lisp, (cons 1 L)) is [1, 2, 3, 4]. The @ operator is for appending lists; [1,2]@[3,4] is the list [1,2,3,4]. Note that it is a type error to cons (or append) incompatible types; for example, 1.0 :: [ 2,3 ,4] is illegal because the lefthand side is a real while the righthand side is an int list. Similarly it is illegal to write [1.0, 2,3,4].

When entering lists, quoting them is not an issue. Lisp needs quoting because evaluating a list means treating it as a function call; ML lists are never treated as executable. And in Lisp, symbols may either represent themselves or (as variables) may hold other values; in ML, variables always hold other values. Thus, if we execute val x = 4; then the list [1, 2, 3+4, x] is simply [1,2,7,4].

Defining functions
Functions are defined with
    fun identifier ( parameters) = expression ;
This is much like a Lisp defun statement. However, because ML is strictly typed, all functions used in the function-body expression must be defined before the function itself; there are special rules for mutually recursive functions. The ability to define functions days or years after they are called is a major part of Lisp’s "feel". Here is the squaring function:
    fun square (x) = x*x;
Because of ML97’s rules, the x here is assumed to be int, although real would be equally plausible. If we want to write a squaring function for reals, any of the three x’s above would have to be tagged with :real; eg
    fun square (x) = (x:real) * x;
(the parentheses around x:real can not be omitted!)

ML has an odd notion about function application; the parentheses can be omitted for functions of one variable. So, if we define
fun double(x) = 2*x;
we can call this as either double(5) or double 5. For functions of multiple parameters, the use of ( ) is needed to establish precedence. However, the ( ) can also be viewed as grouping the parameters into a single tuple; if we define
fun f (x, y, z) = y;
then f t, where t is the tuple (4, 5.0, "six") above, returns 5.0.

ML functions are lexically (statically) scoped; when we define a function that refers to a free variable, all calls to that function invoke references to the same variable. For example, suppose we type
    val x = 3;
    fun foo () = x;
    val x = 13;
foo() always returns 3, because that’s how x was bound when foo was defined. Note that this can also be interpreted as an illustration of why val statements are not assignment statements.

Interesting things happen when we define a len function on lists, as
    fun len (L) = if (L = nil) then 0 else 1 + len (tl(L));
This is exactly the same semantically as the recursive definition of len in Lisp:
(defun len (L)
    (cond ((null L) 0)
          (t (+ 1 (len(cdr L))))))
In ML, though, everything is strongly typed, and so we have to state what type of lists len is good for. But len is clearly good for any type, and so ML introduces the notion of a "type variable" to express this. The type of len, above, is `a list, where `a (read "alpha") is a type variable standing for any type. The len function can be applied to any list, but the parameter must be a list.

In this way, ML manages to be strongly typed while at the same time supporting polymorphism; that is, the len function above is strongly typed but works for lists of any type. Pass it a non-list, though, and you will get an error. This combination of strong typing + polymorphism is something of a contradiction, resolved only by ML’s powerful type-variable system.

It is also possible to specify function definitions using patterns. Lisp has such patterns, in destructuring-bind, but that is seldom used. Here is len using patterns:
fun len (nil) = 0
| len(h::t) = 1+ len(t);
The single parameter is replaced by a pattern; the argument type is ‘a list and any list x is either nil or else can be written as h::t; in this case, then, the patterns are exhaustive.

In the example above, h is never used and so it can be replaced with an underscore:
    fun len(nil) = 0 | len(_ :: t) = 1+len(t);
We can also use both patterns and a single name for the whole parameter:
    fun len(nil) = 0 | len(x as h::t) = <expr involving x, h, and t>

Here is a version of reverse in ML; it is just as inefficient as the Lisp equivalent:
    fun rev(nil) = 0 | rev (h::t) = rev (t) @ h;
Here is a two-parameter version, in which x is reversed onto the front of y:
    fun rev2(nil, y) = y | rev2 (h::t, y) = rev2(t, h::y);
Note the use of patterns here.

When defining functions with patterns, it is not always clear if patterns are exhaustive or if one argument could match multiple cases. If the patterns are not exhaustive, a runtime error results when the function is called with arguments that fail to match any pattern; a compile-time warning may also be issued. Patterns may also overlap, as the following inductive example shows.
fun fact (1) = 1 | fact(n) = n * fact (n-1);
Note that the fact(n) case still technically does match, when n=1.

The Lisp notion of let exists in ML; the value of the following expression is 11.
    let
        val x = 5;
        val y = 6;
    in
        x+y
    end;
If `a and `b are types (read alpha and beta), then the following compound types can be defined:
    `a * `b tuples (a, b), where a has type `a and b has type `b
    `a list lists of objects of type `a
    `a->`b functions that take an `a parameter and return a `b.

When ML determines the type of an expression (usually, though not necessarily, a function expression), it leaves as many subtypes as possible represented by type variables, which can then match any specific type. In this way we can define the len function above, of type `a list -> int; one definition of len can then suffice to calculate lengths of all types of lists. An early problem with strong typing was that one needed a separate definition for each type, even when the underlying form was identical. For example, in C++ you would need separate len functions for lists of floats, lists of ints, lists of chars, etc.

C++ supports polymorphism through inheritance, and also "ad-hoc" polymorphism through operator overloading. The latter is not true polymorphism, though; in ML we define len once and it works for all types of lists.

Exercise: what is the type of fun identity (x) = x?

Operators that commit us to specific types are +,-,*, etc, inequalities < <= > >=, boolean connectives andalso, orelse, and not, string operators, and explicit type conversion functions like real().

Exercise: In the definition of fact above, how does ML infer that the argument must be an int? How does ML infer that the result type is int?

The equality operators =, <> are a special case: they can be applied to most but not all types. To distinguish type variables that must support equality, the double quote is used: ``a. Reals cannot be compared for equality; neither can any function types.

Higher-order functions are functions which take functions as parameters, or which return new functions (like mapcar in lisp). Mapcar in ML is known as map (well, sort of; read further before using map); map is of type (`a->`b) * `a list -> `b list; that is, it takes a function F and a list [a1, a2, ..., aN] and returns [F(a1), F(a2), ..., F(aN)]. (Actually, the real ML map is Curried; more below). We could define map as
fun map(F, nil) = nil | map (f, x::cdr) = F(x)::map(F, cdr)

In Lisp, anonymous functions are defined with lambda. In ML, here is an anonymous square function:
fn x => x*x
Note the use of fn instead of fun, and of => instead of =.

Another important ML function is reduce; reduce (F, [a1, a2, a3, a4]) returns F(a1, F(a2, F(a3, a4))); this works perfectly for, say, the plus function. The reduce function takes a binary operator, F, and produces an n-ary operator.

The filter, or remove-if-not, function also can be defined in ML. Its type, at least assuming type definitions from foreign countries, would be (int ->bool) * (int)

So far, the functions we have considered can be viewed as taking a single parameter, namely a tuple. Parentheses are necessary to define what is part of the tuple. ML has a more general way of dealing with multple parameters, called Currying (after Haskell Curry; the language haskell is another language worth looking into). To Curry a function of two parameters, f(a,b), of type `a*`b -> `c, we treat it as a function of one parameter of type `a, but the value returned is a function of type `b. Thus, if F is the curried form of f, then Fa is a function that takes a `b as parameter, and Fa b (or F(a)(b)) is the same thing as f(a,b).

If f(a,b) is given, here is the Curried form:
fun F a = (fn b => f(a,b));
In other words, there is a natural equivalence between `a*`b -> `c and `a->(`b->`c). If f(a,b) takes pairs of type `a*`b, then Curry(f) takes an `a and returns a function `b->`c. We still need both an `a and a `b to get a `c; it is just that Curry(f) allows us to present the parameters one at a time.

The real map function, built into ML, is in fact Curried: map is a 1-parameter function taking a function F as parameter and returning a function that applies to lists:
    val listf = map (fn x => x*x);
    listf [1,2,3,4];
    it = [1,4,9,16]: int list

ML has an operator to form the composition of two functions; if f is the (real) squaring function and g is the function fn x => sqrt(1-x), then g o f is the function the graph of which is a circle.

ML also has a function foldr that takes a list of functions, all of type `a->`a, and composes them: (foldr [F1,...,Fn]) (x) is F1(...(Fn(x)). Many basic operations can be formed by starting with a list, mapping it to a list of functions, and foldring the functions. For example, if L is any list, and inc is the increment function fn(x)=>x+1, then the length of L is foldr(map(fn x=>inc)(L))(0). To multiply the elements of an int list L, we can write foldr(map (fn x => (fn y => x*y)) L) 1.

One last example concerns how one might write iterative loops in ML, without any notion of assignment. The idea is similar to Lisp’s do loop, in the case when all the work is done by the variable updates. We start with a tuple of variables, and successively update them until some condition is met. Let `a be the type of the tuples. The update function has type `a->`a, and the exit condition has type `a->bool. We also need to specify an initial value of type `a; the value returned will be the tuple for which the exit value was true. Thus, one could define while to be of type `a * (`a->bool) * (`a->`a), that is, of type initial*exit_condition*update.

For example, here is how we might calculate factorial: the loop state `a is the tuple (i,prod), where prod=i!. The inital state is (i,prod) = (1,1). The exit_condition is (i=n). And the update is fn(i,p)=>(i+1,p*(i+1)). The while loop would then exit at (n,n!), and we could immediately obtain n! as the second component.

Here is a definition of while:
    fun while(init, exit, update) =
        if exit(init) then init
        else while(update(init), exit, update);