
Specification of the Amy Language

Modern Compiler Construction HKUST COMP 4121 Spring 2026

1. Introduction

Welcome to the Amy project! This semester, you will learn how to compile a simple functional Scala-like language from source files down to executable code. When your compiler is complete, it will be able to take Amy source (text) files as input and produce WebAssembly bytecode files. WebAssembly is a new format for portable bytecode which is meant to be run in browsers.

This document is the specification of Amy. Its purpose is to help you clearly and unambiguously understand what an Amy program means, and to be the Amy language reference, along with the reference compiler. It does not deal with how you will actually implement the compiler; this will be described to you as assignments are released.

Note: The language might change along the way, so check this specification before starting each lab, to make sure you have the latest version in mind.

1.1 Features of Amy

Let us demonstrate the basic features of Amy through some examples:

1.1.1 The factorial function

object Factorial

def fact(i: Int(32)): Int(32) :=
  if (i < 2) then 1
  else i * fact(i-1)
  end if
end fact

end Factorial

Every program in Amy is contained in a module, also called an object. A function is introduced with the keyword def, and all its parameters and its result type must be explicitly typed. Amy supports conditional (or if-) expressions. Notice that conditionals are not statements, but return a value, in this case an Int(32).

In fact, there is no distinction between expressions and statements in Amy. Even expressions that are called only for their side-effects return a value of type Unit.

The condition of an if-expression must be of type Boolean and its branches must have the same type, which is also the type of the whole expression.

1.1.2 Saying hello

object Hello
  Std.printString("Hello " ++ "world!")
end Hello

Amy supports compiling multiple modules together. To refer to functions (or other definitions) in another module, one must explicitly use a qualified name. There is no import statement like in Scala.

In this example, we refer to the printString function in the Std module, which contains some builtin functions to interact with the user. The string we print is constructed by concatenating two smaller strings with the ++ operator.

1.1.3 Input, local variables and sequencing expressions

object ReadName
  Std.printString("What is your name?");
  val name: String = Std.readString();
  Std.printString("Hello " ++ name)
end ReadName

We can read input from the console with the readX functions provided in Std.

We can define local variables with val, which must always be typed explicitly. The value of the variable is given after =, followed by a semicolon.

We can sequence expressions with ;. The value of the first expression is discarded, and the value of the second one is returned. Note that ; is an operator and not a terminator: you are not allowed to put it at the end of a sequence of expressions.

1.1.4 Type definitions

In addition to the basic types, a user can define their own types in Amy. The user-definable types in Amy come from functional programming and are called algebraic data types. In the example below, we define a type, List, and two constructors, Nil and Cons, which we can call to construct values of type List.

object L
  abstract class List
  case class Nil() extends List
  case class Cons(h: Int(32), t: List) extends List
end L

1.1.5 Constructing ADT values

def range(from: Int(32), to: Int(32)): List :=
  if (to < from) then Nil()
  else
    Cons(from, range(from + 1, to))
  end if
end range

We can create a List by calling one of its two constructors like a function, as demonstrated in the range function.

1.1.6 Pattern matching

def length(l: List): Int(32) :=
  l match {
    case Nil() => 0
    case Cons(h, t) => 1 + length(t)
  }
end length

To use a list value in any meaningful way, we have to break it down, according to the constructor used to construct it. This is called pattern matching and is a powerful feature of functional programming.

In length we pattern match against the input value l. Pattern matching checks whether its argument matches the pattern of the first case, and if so evaluates the corresponding expression. Otherwise, it continues with the second case, and so on. If no pattern matches, the program exits with an error. If the constructor has arguments, as does Cons in this case, we can bind their values to fresh variables in the pattern, so we can use them in the case expression.
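The matching procedure described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the reference implementation: the tuple encodings of values and patterns, and the names match_pattern, eval_match and MatchError, are assumptions made for this example.

```python
# Illustrative model only. Assumed encodings:
#   value:   ("Cons", [1, ("Nil", [])])
#   pattern: ("ctor", name, [subpatterns]) or ("var", name)


class MatchError(Exception):
    """Raised when no case matches (an Amy program would exit with an error)."""


def match_pattern(value, pattern):
    """Return a dict of fresh bindings if value matches pattern, else None."""
    kind = pattern[0]
    if kind == "var":
        # An identifier pattern matches anything and binds it to a name.
        return {pattern[1]: value}
    if kind == "ctor":
        _, name, subpatterns = pattern
        if not isinstance(value, tuple) or value[0] != name:
            return None
        fields = value[1]
        if len(fields) != len(subpatterns):
            return None
        bindings = {}
        for field, sub in zip(fields, subpatterns):
            result = match_pattern(field, sub)
            if result is None:
                return None
            bindings.update(result)
        return bindings
    raise ValueError(f"unknown pattern kind: {kind}")


def eval_match(value, cases):
    """Try the cases in order and evaluate the body of the first match."""
    for pattern, body in cases:
        bindings = match_pattern(value, pattern)
        if bindings is not None:
            return body(bindings)
    raise MatchError("no pattern matched")


def length(lst):
    """Python mirror of the Amy length function above."""
    return eval_match(lst, [
        (("ctor", "Nil", []), lambda b: 0),
        (("ctor", "Cons", [("var", "h"), ("var", "t")]),
         lambda b: 1 + length(b["t"])),
    ])
```

Each case body receives only the bindings produced by its own pattern, which mirrors the way pattern variables are visible only inside the corresponding case expression.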

1.1.7 Wildcard patterns and errors

The error keyword takes a string as argument, prints Error: and its argument on the screen, then exits the program immediately with an error code. In the head function below, we try to compute the head of a list, which should fail if the list is empty.

Notice that in the second case, we don’t really care what the tail of the list is. Therefore, we use a wildcard pattern (_), which matches any value without binding it to a name.

def head(l: List): Int(32) :=
  l match {
    case Nil() => error("head(Nil)")
    case Cons(h, _) => h
  }
end head

1.2 Relation to Scala

Amy, with mild syntactic variations, is designed to be as close to a simple subset of Scala as possible. It is not a perfect subset, though: you can easily come up with Amy programs that are not legal in Scala. Still, many “reasonable” programs will be compilable with scalac, as long as you supply an implementation of the Amy standard library along with your code. This should not be necessary, however, as we are providing a reference implementation of Amy.


2. Syntax

The syntax of Amy is given formally by the context-free grammar of Figure 1. Everything spelled in italic is a nonterminal symbol of the grammar, whereas the terminal symbols are spelled in monospace font. * is the Kleene star, s+ stands for one or more repetitions of s, and ? stands for optional presence of a symbol (zero or one repetitions). The square brackets [] are not symbols of the grammar; they merely group symbols together. Please note that the square brackets [] are still tokenized as they are reserved for future use.

Before parsing an Amy program, the Amy lexer generates a sequence of terminal symbols (tokens) from the source files. Some non-terminal symbols mentioned, but not specified, in Figure 1 are also represented as a single token by the lexer. They are lexed according to the rules in Figure 2. In Figure 2, we denote the range between characters α and β (included) with [α-β].

The syntax in Figure 1 is an overapproximation of the real syntax of Amy. This means that it allows some programs that should not be allowed in Amy. To get the real syntax of Amy, there are some additional restrictions presented (among other things) in the following notes:


Figure 1: Syntax of Amy

Program ::= Module∗

Module ::= object Id Definition∗ Expr? end Id
Definition ::= AbstractClassDef | CaseClassDef | FunDef

AbstractClassDef ::= abstract class Id
CaseClassDef ::= case class Id ( Params ) extends Id

FunDef ::= def Id ( Params ) : Type := Expr end Id
Params ::= ϵ | ParamDef [ , ParamDef ]∗

ParamDef ::= Id : Type
Type ::= Int ( 32 ) | String | Boolean | Unit | [ Id . ]? Id
Expr ::= Id
       | Literal
       | Expr BinOp Expr
       | UnaryOp Expr
       | [ Id . ]? Id ( Args )
       | Expr ; Expr
       | val ParamDef = Expr ; Expr
       | if ( Expr ) then Expr else Expr end if
       | Expr match { MatchCase+ }
       | error ( Expr )
       | ( Expr )

Literal ::= true | false | ( )
          | IntLiteral | StringLiteral

BinOp ::= + | - | * | / | % | < | <=
        | && | || | == | ++

UnaryOp ::= - | !

MatchCase ::= case Pattern => Expr
Pattern ::= [ Id . ]? Id ( Patterns ) | Id | Literal | _
Patterns ::= ϵ | Pattern [ , Pattern ]∗

Args ::= ϵ | Expr [ , Expr ]∗

Figure 2: Lexical rules for Amy

IntLiteral ::= Digit+

Id ::= Alpha AlphaNum∗ (and not a reserved word)
AlphaNum ::= Alpha | Digit | _
Alpha ::= [a-z] | [A-Z]
Digit ::= [0-9]

StringLiteral ::= " StringChar∗ "
StringChar ::= Any character except newline and "
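
The lexical rules of Figure 2 translate fairly directly into regular expressions. The following Python sketch covers only the three token classes of Figure 2 (integer literals, identifiers and string literals); operators and punctuation are left out. The RESERVED set is an assumption collected from the terminals that appear in Figure 1, whitespace skipping is likewise assumed, and the function name lex is hypothetical.

```python
import re

# Assumption: reserved words gathered from the terminals of Figure 1.
RESERVED = {
    "abstract", "case", "class", "def", "else", "end", "error", "extends",
    "false", "if", "match", "object", "then", "true", "val",
    "Int", "String", "Boolean", "Unit",
}

TOKEN_RE = re.compile(r'''
    (?P<INT>[0-9]+)                     (?# IntLiteral ::= Digit+)
  | (?P<WORD>[A-Za-z][A-Za-z0-9_]*)     (?# Id ::= Alpha AlphaNum*)
  | (?P<STRING>"[^"\n]*")               (?# no newline or quote inside)
  | (?P<SPACE>\s+)                      (?# assumed to be skipped)
''', re.VERBOSE)


def lex(source):
    """Return (kind, text) pairs for the token classes of Figure 2."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SPACE":
            continue
        if kind == "WORD":
            # An identifier must not be a reserved word.
            kind = "KEYWORD" if text in RESERVED else "ID"
        tokens.append((kind, text))
    return tokens
```

Because an identifier must start with a letter, an input such as _x is rejected by this sketch, while a_b is accepted as a single identifier.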

3. Semantics

In this section we will give the semantics of Amy, i.e. we will systematically explain what an Amy program represents, as well as give the restrictions that a legal Amy program must obey. The discussion will be informal, except for the typing rules of Amy.

3.1 Program Structure

An Amy program consists of one or more source files. Each file contains a single module (object), which in turn consists of a series of type and function definitions, optionally followed by an expression. We will use the terms object and module interchangeably.

3.2 Execution

When an Amy program is executed, the expression at the end of each module, if present, is evaluated. The order of execution among modules is the order in which the user supplied them when compiling or interpreting the program. Each module’s definitions are visible within the module automatically, and in all other modules provided a qualified name is used.

3.3 Naming rules

In this section, we will give the restrictions that a legal Amy program must obey with regard to naming or referring to entities defined in the program. Any program not following these restrictions should be rejected by the Amy name analyzer.

3.4 Types and Classes

Every expression, function parameter, and class parameter in Amy has a type. Types catch some common programming errors by introducing typing restrictions. Programs that do not obey these restrictions are illegal and will be rejected by the Amy type checker.

The built-in types of Amy are Int(32), String, Boolean and Unit. Int(32) represents 32-bit signed integers. String is a sequence of characters. Strings have poor support in Amy: the only operations defined on them are concatenation and conversion to integer. In fact, not even equality is “properly” supported (see Section 3.5). Boolean values can take the values true and false. Unit represents a type with a single value, (). It is usually used as the result of a computation which is invoked for its side-effects only, for example, printing some output to the user. It corresponds to Java’s void.

In addition to the built-in types, the programmer can define their own types. The sort of types that are definable in Amy are called Algebraic Data Types (ADTs) and come from the functional programming world, but they have also been successfully adopted in Scala.

An ADT is a type along with several constructors that can create values of that type. For example, an ADT defining a list of integers in pseudo syntax may look like this:

type List = Nil() | Cons(Int(32), List)

which states that a List is either Nil (the empty list), or a Cons of an integer and another list. We will say that Cons has two fields of types Int(32) and List, whereas Nil has no fields. Inside the program, the only way to construct values of the List type is to call one of these constructors, e.g. Nil() or Cons(1, Cons(2, Nil())). You can think of them as functions from their field types to the List type.

Notice that in the above syntax, Nil and Cons are not types. More specifically, they are not subtypes of List: in fact, there is no subtyping in Amy. Only List is a type, and values such as Nil() or Cons(1, Cons(2, Nil())) have the type List.

In Amy, we use Scala syntax to define ADTs. A type is defined with an abstract class and the constructors with case classes. The above definition in Amy would be:

abstract class List
case class Nil() extends List
case class Cons(h: Int(32), t: List) extends List

Notice that the names of the fields have no practical meaning, and we only use them to stay close to Scala.

We will sometimes use the term abstract class for a type and case class for a type constructor.

The main programming structure to manipulate class types is pattern matching. In Section 3.5 we define how pattern matching works.

3.5 Typing Rules and Semantics of Expressions

Each expression in Amy is associated with a typing rule, which constrains and connects its type and the types of its subexpressions. An Amy program is said to typecheck if (1) all its expressions obey their respective typing rules, and (2) the body of each function has the function's declared return type.

A program that does not typecheck will be rejected by the compiler.

In the following, we will informally give the typing rules and explain the semantics (meaning) of each type of expression in Amy. We will use function type notation for typing of the various operators. For example, (A,B) => C denotes that an operator takes arguments of types A and B and returns a value of type C.

When talking about the semantics of an expression we will refer to a context. A context is a mapping from variables to the values that have been assigned to them.
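
To make the notion of a context concrete, here is a minimal Python sketch that evaluates sequencing and local variable definitions under a context represented as a dict. It is an illustration under stated assumptions, not the reference interpreter: the tuple encoding of expressions and the function name evaluate are hypothetical, and 32-bit overflow of Int(32) is not modelled.

```python
def evaluate(expr, context):
    """Evaluate a tiny tuple-encoded expression under a context (a dict)."""
    kind = expr[0]
    if kind == "lit":
        return expr[1]
    if kind == "var":
        # Look the variable up in the current context.
        return context[expr[1]]
    if kind == "plus":
        return evaluate(expr[1], context) + evaluate(expr[2], context)
    if kind == "seq":
        # e1 ; e2 : the value of e1 is discarded, the value of e2 is returned.
        evaluate(expr[1], context)
        return evaluate(expr[2], context)
    if kind == "val":
        # val name = bound ; body : body runs in an extended context.
        _, name, bound, body = expr
        value = evaluate(bound, context)
        extended = {**context, name: value}  # the outer context is untouched
        return evaluate(body, extended)
    raise ValueError(f"unknown expression kind: {kind}")


# val x = 2 ; (99 ; x + 3)
program = ("val", "x", ("lit", 2),
           ("seq", ("lit", 99), ("plus", ("var", "x"), ("lit", 3))))
result = evaluate(program, {})
```

Building a new dict for the body, rather than mutating the existing one, keeps the binding local to the expression that follows the val.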

3.6 Formal discussion of types

In this section, we give a formal (i.e. mathematically robust) description of the Amy typing rules. A typing rule will be given as:

   Rule Name
   P1  ...  Pn
   -----------
        C

where Pi are the rule premises and C is the rule conclusion. A typing rule means that the conclusion is true under the premises.

Conclusions and most premises will be type judgements in an environment. A type judgement Γ ⊢ e : T means that an expression (or pattern) e has type T in environment Γ. Environments Γ are mappings from variables to types and will be written as Γ = v1 : T1, . . . , vn : Tn. We can add a new pair to an environment Γ by writing Γ, vn+1 : Tn+1. We will also sometimes write a type judgement of the form Γ ⊢ p. This means that p typechecks, but we don’t assign a type to it. Type checking will try to typecheck a program under the initial environment, and reject the program if it fails to do so.

The initial environment Γ0(p) of a program p is one that contains the types of all functions and constructors in p, where a constructor is treated as a function from its fields to its parent type (see Section 3.4). The initial environment is used to kickstart typechecking at the function definition level.
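
As a rough illustration, the initial environment can be modelled in Python as a dictionary from names to function types. The input encoding, the function name initial_environment and the use of unqualified names are assumptions of this sketch; the real compiler may organise this information differently.

```python
def initial_environment(case_classes, functions):
    """Map each function and constructor name to (parameter types, result type).

    case_classes: list of (constructor name, field types, parent type)
    functions:    list of (function name, parameter types, return type)
    """
    env = {}
    for name, field_types, parent in case_classes:
        # A constructor is treated as a function from its fields to its parent type.
        env[name] = (tuple(field_types), parent)
    for name, param_types, return_type in functions:
        env[name] = (tuple(param_types), return_type)
    return env


# The List ADT of Section 3.4 together with the length function of Section 1.1.6.
gamma0 = initial_environment(
    case_classes=[
        ("Nil", [], "List"),
        ("Cons", ["Int(32)", "List"], "List"),
    ],
    functions=[("length", ["List"], "Int(32)")],
)
```

Note that Nil and Cons both map to the result type List: there is no entry that would make either of them a type in its own right.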

Below are the typing rules for expressions, patterns, functions, and programs:

Figure 3: Typing rules for expressions

Variable
v : T ∈ Γ
-----------
Γ ⊢ v : T

Int Literal
i is an integer literal
------------------------
Γ ⊢ i : Int(32)

String Literal
s is a string literal
----------------------
Γ ⊢ s : String

Unit
--------------
Γ ⊢ () : Unit

Boolean Literal
b ∈ {true, false}
------------------
Γ ⊢ b : Boolean

Arith. Bin. Operators
Γ ⊢ e1 : Int(32)   Γ ⊢ e2 : Int(32)
op ∈ {+, -, *, /, %}
--------------------------------------
Γ ⊢ e1 op e2 : Int(32)

Arith. Comp. Operators
Γ ⊢ e1 : Int(32)   Γ ⊢ e2 : Int(32)
op ∈ {<, <=}
-------------------------------------
Γ ⊢ e1 op e2 : Boolean

Arith. Negation
Γ ⊢ e : Int(32)
-----------------
Γ ⊢ -e : Int(32)

Boolean Bin. Operators
Γ ⊢ e1 : Boolean   Γ ⊢ e2 : Boolean
op ∈ {&&, ||}
-------------------------------------
Γ ⊢ e1 op e2 : Boolean

Boolean Negation
Γ ⊢ e : Boolean
------------------
Γ ⊢ !e : Boolean

String Concatenation
Γ ⊢ e1 : String   Γ ⊢ e2 : String
-----------------------------------
Γ ⊢ e1 ++ e2 : String

Equality
Γ ⊢ e1 : T   Γ ⊢ e2 : T
-------------------------
Γ ⊢ e1 == e2 : Boolean

Sequence
Γ ⊢ e1 : T1   Γ ⊢ e2 : T2
---------------------------
Γ ⊢ e1 ; e2 : T2

Local Variable Definition
Γ ⊢ e1 : T1   Γ, n : T1 ⊢ e2 : T2
-----------------------------------
Γ ⊢ val n : T1 = e1 ; e2 : T2

Function/Class Constructor Invocation
Γ ⊢ e1 : T1 ... Γ ⊢ en : Tn
Γ ⊢ f : (T1, ... , Tn) ⇒ T
--------------------------------
Γ ⊢ f(e1, ... , en) : T

If-Then-Else
Γ ⊢ e1 : Boolean   Γ ⊢ e2 : T   Γ ⊢ e3 : T
--------------------------------------------
Γ ⊢ if (e1) then e2 else e3 end if : T

Error
Γ ⊢ e : String
---------------
Γ ⊢ error(e) : T

Pattern Matching
Γ ⊢ e : Ts
∀i ∈ [1, n]. Γ ⊢ pi : Ts
∀i ∈ [1, n]. Γ, bindings(pi) ⊢ ei : Tc
-------------------------------------------------------------
Γ ⊢ e match { case p1 => e1 ... case pn => en } : Tc

Figure 4: Typing rules for patterns, functions and programs

Wildcard Pattern
-------------
Γ ⊢ _ : T

Identifier Pattern
--------------
Γ ⊢ v : T

Case Class Pattern
Γ ⊢ p1 : T1 ... Γ ⊢ pn : Tn
Γ ⊢ C : (T1, ... , Tn) ⇒ T
----------------------------------
Γ ⊢ C(p1, ... , pn) : T

Function Definition
Γ, v1 : T1, ... , vn : Tn ⊢ e : T
---------------------------------------------
Γ ⊢ def f(v1 : T1, ... , vn : Tn): T := e end f

Program
∀f ∈ p. Γ0(p) ⊢ f
--------------------
⊢ p
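
Several of the rules in Figure 3 can be read almost line by line as a recursive function. The Python sketch below covers literals, variables, binary operators, sequencing, local variable definitions, if-expressions and calls; pattern matching and error are omitted, since error can take any type and would need an expected type to be passed down. The tuple encoding of expressions and the names type_of, expect and AmyTypeError are assumptions of this sketch.

```python
INT, STRING, BOOLEAN, UNIT = "Int(32)", "String", "Boolean", "Unit"


class AmyTypeError(Exception):
    """Raised when an expression violates one of the typing rules."""


def expect(actual, expected):
    if actual != expected:
        raise AmyTypeError(f"expected {expected}, found {actual}")


def type_of(expr, env):
    """Return the type of a tuple-encoded expression under environment env."""
    kind = expr[0]
    if kind == "var":                                   # Variable
        if expr[1] not in env:
            raise AmyTypeError(f"unbound name {expr[1]}")
        return env[expr[1]]
    if kind in ("int", "str", "bool", "unit"):          # Literals
        return {"int": INT, "str": STRING, "bool": BOOLEAN, "unit": UNIT}[kind]
    if kind == "binop":
        _, op, lhs, rhs = expr
        left, right = type_of(lhs, env), type_of(rhs, env)
        if op == "==":                                  # Equality: same type T
            expect(right, left)
            return BOOLEAN
        operand, result = {
            "+": (INT, INT), "-": (INT, INT), "*": (INT, INT),
            "/": (INT, INT), "%": (INT, INT),
            "<": (INT, BOOLEAN), "<=": (INT, BOOLEAN),
            "&&": (BOOLEAN, BOOLEAN), "||": (BOOLEAN, BOOLEAN),
            "++": (STRING, STRING),
        }[op]
        expect(left, operand)
        expect(right, operand)
        return result
    if kind == "seq":                                   # Sequence
        type_of(expr[1], env)
        return type_of(expr[2], env)
    if kind == "val":                                   # Local Variable Definition
        _, name, declared, bound, body = expr
        expect(type_of(bound, env), declared)
        return type_of(body, {**env, name: declared})
    if kind == "if":                                    # If-Then-Else
        _, cond, then_branch, else_branch = expr
        expect(type_of(cond, env), BOOLEAN)
        result = type_of(then_branch, env)
        expect(type_of(else_branch, env), result)
        return result
    if kind == "call":                                  # Function/Constructor Invocation
        _, name, args = expr
        if name not in env:
            raise AmyTypeError(f"unbound name {name}")
        param_types, result = env[name]
        if len(args) != len(param_types):
            raise AmyTypeError(f"wrong number of arguments for {name}")
        for arg, param in zip(args, param_types):
            expect(type_of(arg, env), param)
        return result
    raise ValueError(f"unknown expression kind: {kind}")


# Body of fact from Section 1.1.1: if (i < 2) then 1 else i * fact(i-1) end if
fact_env = {"i": INT, "fact": ((INT,), INT)}
fact_body = ("if", ("binop", "<", ("var", "i"), ("int", 2)),
             ("int", 1),
             ("binop", "*", ("var", "i"),
              ("call", "fact", [("binop", "-", ("var", "i"), ("int", 1))])))
fact_type = type_of(fact_body, fact_env)
```

Checking a function definition then amounts to extending the environment with the parameters, as in the Function Definition rule, and comparing the type of the body with the declared return type.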

4. The standard library of Amy

Amy comes with a library of predefined functions, which are accessible in the Std object. Some of these functions implement functionalities that are not expressible in Amy, e.g. printing to the standard output. These built-in functions are implemented in JavaScript and WebAssembly in case of compilation, and in Scala in the interpreter. Built-in functions have stub implementations in the Amy Std module for purposes of name analysis and type checking.

The Amy compiler will not automatically add Std to the input files. If you want it included, you have to provide it manually.

The signature of the Std module is the following:

object Std

// Output
def printString(s: String): Unit = ...
def printInt(i: Int(32)): Unit = ...
def printBoolean(b: Boolean): Unit = ...

// Input
def readString(): String = ...
def readInt(): Int(32) = ...

// Conversions
def intToString(i: Int(32)): String = ...
def digitToString(i: Int(32)): String = ...
def booleanToString(b: Boolean): String = ...

end Std