In this chapter, we describe the syntax and semantics of WhyML.
Blank characters are space, horizontal tab, carriage return, and line feed. Blanks separate lexemes but are otherwise ignored. Comments are enclosed by (* and *) and can be nested. Note that (*) does not start a comment.
Strings are enclosed in double quotes ("
). Double quotes can be
escaped inside strings using the backslash character (\
).
The other special sequences are \n
for line feed and \t
for horizontal tab.
In the following, strings are referred to with the non-terminal
string.
The syntax for numerical constants is given by the following rules:
Integer and real constants have arbitrary precision. Integer constants can be given in base 10, 16, 8 or 2. Real constants can be given in base 10 or 16. Notice that the exponent in hexadecimal real constants is written in base 10.
Identifiers are composed of letters, digits, underscores, and primes. The syntax distinguishes identifiers that start with a lowercase letter or an underscore (lident), identifiers that start with an uppercase letter (uident), and identifiers that start with a prime (qident, used exclusively for type variables):
alpha | ::= | a – z ∣ A – Z |
suffix | ::= | alpha ∣ digit ∣ ' ∣ _ |
lident | ::= | (a – z) suffix* ∣ _ suffix+ |
uident | ::= | (A – Z) suffix* |
qident | ::= | ' (a – z) suffix* |
Identifiers that contain a prime followed by a letter, such as int32’max, are reserved for symbols introduced by Why3 and cannot be used for user-defined symbols.
In order to refer to symbols introduced in different namespaces (scopes), we can put a dot-separated “qualifier prefix” in front of an identifier (e.g. Map.S.get). This allows us to use the symbol get from the scope Map.S without importing it in the current namespace:
All parenthesised expressions in WhyML (types, patterns, logical terms, program expressions) admit a qualifier before the opening parenthesis, e.g. Map.S.(get m i). This imports the indicated scope into the current namespace during the parsing of the expression under the qualifier. For the sake of convenience, the parentheses can be omitted when the expression itself is enclosed in parentheses, square brackets or curly braces.
Prefix and infix operators are built from characters organized in four precedence groups (op-char-1 to op-char-4), with optional primes at the end:
Infix operators from a high-numbered group bind stronger than the infix operators from a low-numbered group. For example, infix operator .*. from group 3 would have a higher precedence than infix operator ->- from group 1. Prefix operators always bind stronger than infix operators. The so-called “tight operators” are prefix operators that have even higher precedence than the juxtaposition (application) operator, allowing us to write expressions like inv !x without parentheses.
Finally, any identifier, term, formula, or expression in a WhyML source can be tagged either with a string attribute or a location:
An attribute cannot contain newlines or closing square brackets; leading and trailing spaces are ignored. A location consists of a file name in double quotes, a line number, and starting and ending character positions.
WhyML features an ML-style type system with polymorphic types, variants (sum types), and records that can have mutable fields. The syntax for type expressions is the following:
Built-in types are int (arbitrary precision integers), real (real numbers), bool, the arrow type (also called the mapping type), and the tuple types. The empty tuple type is also called the unit type and can be written as unit.
Note that the syntax for type expressions notably differs from the usual ML syntax. In particular, the type of polymorphic lists is written list ’a, and not ’a list.
Snapshot types are specific to WhyML, they denote the types of ghost values produced by pure logical functions in WhyML programs. A snapshot of an immutable type is the type itself: thus, {int} is the same as int and {list ’a} is the same as list ’a. A snapshot of a mutable type, however, represents a snapshot value which cannot be modified anymore. Thus, a snapshot array a of type {array int} can be read from (a[42] is accepted) but not written into (a[42] <- 0 is rejected). Generally speaking, a program function that expects an argument of a mutable type will accept an argument of the corresponding snapshot type as long as it is not modified by the function.
term ::= integer integer constant ∣ real real constant ∣ true ∣ false Boolean constant ∣ () empty tuple ∣ qualid qualified identifier ∣ qualifier? ( term ) term in a scope ∣ qualifier? begin term end idem ∣ tight-op term tight operator ∣ { term-field+ } record ∣ { term with term-field+ } record update ∣ term . lqualid record field access ∣ term [ term ] '* collection access ∣ term [ term <- term ] '* collection update ∣ term [ term .. term ] '* collection slice ∣ term [ term .. ] '* right-open slice ∣ term [ .. term ] '* left-open slice ∣ term term+ application ∣ prefix-op term prefix operator ∣ term infix-op-4 term infix operator 4 ∣ term infix-op-3 term infix operator 3 ∣ term infix-op-2 term infix operator 2 ∣ term at uident past value ∣ old term initial value ∣ term infix-op-1 term infix operator 1 ∣ ... continued in Fig. 6.2 term-field ::= lqualid = term ; field = value qualid ::= qualifier? (lident-ext ∣ uident) qualified identifier lident-ext ::= lident lowercase identifier ∣ ( ident-op ) operator identifier ∣ ( ident-op ) (_ ∣ ') alpha suffix* associated identifier ident-op ::= infix-op-1 infix operator 1 ∣ infix-op-2 infix operator 2 ∣ infix-op-3 infix operator 3 ∣ infix-op-4 infix operator 4 ∣ prefix-op _ prefix operator ∣ tight-op _? tight operator ∣ [ ] '* collection access ∣ [ <- ] '* collection update ∣ [ ] '* <- in-place update ∣ [ .. ] '* collection slice ∣ [ _ .. ] '* right-open slice ∣ [ .. _ ] '* left-open slice
Figure 6.1: WhyML terms (part I).
term ::= ... see Fig. 6.1 ∣ not term negation ∣ term /\ term conjunction ∣ term && term asymmetric conjunction ∣ term \/ term disjunction ∣ term || term asymmetric disjunction ∣ term by term proof indication ∣ term so term consequence indication ∣ term -> term implication ∣ term <-> term equivalence ∣ term : type type cast ∣ attribute+ term attributes ∣ term (, term)+ tuple ∣ quantifier quant-vars triggers? . term quantifier ∣ ... continued in Fig. 6.3 quantifier ::= forall ∣ exists quant-vars ::= quant-cast (, quant-cast)* quant-cast ::= binder+ (: type)? binder ::= _ ∣ bound-var bound-var ::= lident attribute* triggers ::= [ trigger (| trigger)* ] trigger ::= term (, term)*
Figure 6.2: WhyML terms (part II).
A significant part of a typical WhyML source file is occupied by non-executable logical content intended for specification and proof: function contracts, assertions, definitions of logical functions and predicates, axioms, lemmas, etc.
Logical expressions are called terms. Boolean terms are called formulas. Internally, Why3 distinguishes the proper formulas (produced by predicate symbols, propositional connectives and quantifiers) and the terms of type bool (produced by Boolean variables and logical functions that return bool). However, this distinction is not enforced on the syntactical level, and Why3 will perform the necessary conversions behind the scenes.
The syntax of WhyML terms is given in Figures 6.1-6.3. The constructions are listed in the order of decreasing precedence. For example, as was mentioned above, tight operators have the highest precedence of all operators, so that -p.x denotes the negation of the record field p.x, whereas !p.x denotes the field x of a record stored in the reference p.
An operator in parentheses acts as an identifier referring to that operator, for example, in a definition. To distinguish between prefix and infix operators, an underscore symbol is appended at the end: for example, (-) refers to the binary subtraction and (-_) to the unary negation. Tight operators cannot be used as infix operators, and thus do not require disambiguation.
In addition to prefix and infix operators, WhyML supports several mixfix bracket operators to manipulate various collection types: dictionaries, arrays, sequences, etc. Bracket operators do not have any predefined meaning and may be used to denote access and update operations for various user-defined collection types. We can introduce multiple bracket operations in the same scope by disambiguating them with primes after the closing bracket: for example, a[i] may denote array access and s[i]’ sequence access. Notice that the in-place update operator a[i] <- v cannot be used inside logical terms: all effectful operations are restricted to program expressions. To represent the result of a collection update, we should use a pure logical update operator a[i <- v] instead.
WhyML supports “associated” names for operators, obtained by adding a suffix after the parenthesised operator name. For example, an axiom that represents the specification of the infix operator (+) may be called (+)’spec or (+)_spec. As with normal identifiers, names with a letter after a prime, such as (+)’spec, can only be introduced by Why3, and not by the user in a WhyML source.
The at and old operators are used inside postconditions and assertions to refer to the value of a mutable program variable at some past moment of execution (see the next section for details). These operators have higher precedence than the infix operators from group 1 (infix-op-1): old i > j is parsed as (old i) > j and not as old (i > j).
Infix operators from groups 2-4 are left-associative. Infix operators from group 1 are non-associative and can be chained. For example, the term 0 <= i < j < length a is parsed as the conjunction of three inequalities 0 <= i, i < j, and j < length a.
As with normal identifiers, we can put a qualifier over a parenthesised operator, e.g. Map.S.([]) m i. Also, as noted above, a qualifier can be put over a parenthesised term, and the parentheses can be omitted if the term is a record or a record update.
The propositional connectives in WhyML formulas are listed in Figure 6.2. The non-standard connectives — asymmetric conjunction (&&), asymmetric disjunction (||), proof indication (by), and consequence indication (so) — are used to control the goal-splitting transformations of Why3 and provide integrated proofs for WhyML assertions, postconditions, lemmas, etc. The semantics of these connectives follows the rules below:
For example, full splitting of the goal (A by (exists x. B so C)) && D produces four subgoals: exists x. B (the premise is verified), forall x. B -> C (the premise justifies the conclusion), (exists x. B /\ C) -> A (the proof justifies the affirmation), and finally, A -> D (the proof of A is discarded and A is used to prove D).
The behaviour of the splitting transformations is further controlled by attributes [@stop_split] and [@case_split]. Consult Section 9.5.5 for details.
Among the propositional connectives, not has the highest precedence, && has the same precedence as /\ (weaker than negation), || has the same precedence as \/ (weaker than conjunction), by, so, ->, and <-> all have the same precedence (weaker than disjunction). All binary connectives except equivalence are right-associative. Equivalence is non-associative and is chained instead: A <-> B <-> C is transformed into a conjunction of A <-> B and B <-> C. To reduce ambiguity, WhyML forbids to place a non-parenthesised implication at the right-hand side of an equivalence: A <-> B -> C is rejected.
term ::= ... see Fig. 6.1 and 6.2 ∣ if term then term else term conditional ∣ match term with term-case+ end pattern matching ∣ let pattern = term in term let-binding ∣ let symbol param+ = term in term mapping definition ∣ fun param+ -> term unnamed mapping term-case ::= | pattern -> term pattern ::= binder variable or ‘_’ ∣ () empty tuple ∣ { (lqualid = pattern ;)+ } record pattern ∣ uqualid pattern* constructor ∣ ghost pattern ghost sub-pattern ∣ pattern as ghost? bound-var named sub-pattern ∣ pattern , pattern tuple pattern ∣ pattern | pattern “or” pattern ∣ qualifier? ( pattern ) pattern in a scope symbol ::= lident-ext attribute* user-defined symbol param ::= type-arg unnamed typed ∣ binder (un)named untyped ∣ ( ghost? type ) unnamed typed ∣ ( ghost? binder ) (un)named untyped ∣ ( ghost? binder+ : type ) multi-variable typed
Figure 6.3: WhyML terms (part III).
In Figure 6.3, we find the more advanced term constructions: conditionals, let-bindings, pattern matching, and local function definitions, either via the let-in construction or the fun keyword. The pure logical functions defined in this way are called mappings; they are first-class values of “arrow” type τ1 -> τ2.
The patterns are similar to those of OCaml, though the when clauses and numerical constants are not supported. Unlike in OCaml, as binds stronger than the comma: in the pattern (p1,p2 as x), variable x is bound to the value matched by pattern p2. Also notice the closing end after the match-with term. A let-in construction with a non-trivial pattern is translated as a match-with term with a single branch.
Inside logical terms, pattern matching must be exhaustive: WhyML rejects a term like let Some x = o in …, where o is a variable of an option type. In program expressions, non-exhaustive pattern matching is accepted and a proof obligation is generated to show that the values not covered cannot occur in execution.
The syntax of parameters in user-defined operations—first-class mappings, top-level logical functions and predicates, and program functions—is rather flexible in WhyML. Like in OCaml, the user can specify the name of a parameter without its type and let the type be inferred from the definition. Unlike in OCaml, the user can also specify the type of the parameter without giving its name. This is convenient when the symbol declaration does not provide the actual definition or specification of the symbol, and thus only the type signature is of relevance. For example, one can declare an abstract binary function that adds an element to a set simply by writing function add ’a (set ’a) : set ’a. A standalone non-qualified lowercase identifier without attributes is treated as a type name when the definition is not provided, and as a parameter name otherwise.
Ghost patterns, ghost variables after as, and ghost parameters in function definitions are only used in program code, and not allowed in logical terms.
The syntax of program expressions is given in Figures 6.4-6.5. As before, the constructions are listed in the order of decreasing precedence. The rules for tight, prefix, infix, and bracket operators are the same as for logical terms. In particular, the infix operators from group 1 can be chained. Notice that binary operators && and || denote here the usual lazy conjunction and disjunction, respectively.
expr ::= integer integer constant ∣ real real constant ∣ true ∣ false Boolean constant ∣ () empty tuple ∣ qualid identifier in a scope ∣ qualifier? ( expr ) expression in a scope ∣ qualifier? begin expr end idem ∣ tight-op expr tight operator ∣ { (lqualid = expr ;)+ } record ∣ { expr with (lqualid = expr ;)+ } record update ∣ expr . lqualid record field access ∣ expr [ expr ] '* collection access ∣ expr [ expr <- expr ] '* collection update ∣ expr [ expr .. expr ] '* collection slice ∣ expr [ expr .. ] '* right-open slice ∣ expr [ .. expr ] '* left-open slice ∣ expr expr+ application ∣ prefix-op expr prefix operator ∣ expr infix-op-4 expr infix operator 4 ∣ expr infix-op-3 expr infix operator 3 ∣ expr infix-op-2 expr infix operator 2 ∣ expr infix-op-1 expr infix operator 1 ∣ not expr negation ∣ expr && expr lazy conjunction ∣ expr || expr lazy disjunction ∣ expr : type type cast ∣ attribute+ expr attributes ∣ ghost expr ghost expression ∣ expr (, expr)+ tuple ∣ expr <- expr assignment ∣ ... continued in Fig. 6.5
Figure 6.4: WhyML program expressions (part I).
Keyword ghost marks the expression as ghost code added for verification purposes. Ghost code is removed from the final code intended for execution, and thus cannot affect the computation of the program results nor the content of the observable memory.
Assignment updates in place a mutable record field or an element of a collection. The former can be done simultaneously on a tuple of values: x.f, y.g <- a, b. The latter form, a[i] <- v, amounts to a call of the ternary bracket operator ([]<-) and cannot be used in a multiple assignment.
expr ::= ... see Fig. 6.4 ∣ expr spec+ added specification ∣ if expr then expr (else expr)? conditional ∣ match expr with (| pattern -> expr)+ end pattern matching ∣ qualifier? begin spec+ expr end abstract block ∣ expr ; expr sequence ∣ let pattern = expr in expr let-binding ∣ let fun-defn in expr local function ∣ let rec fun-defn (with fun-defn)* in expr recursive function ∣ fun param+ spec* -> spec* expr unnamed function ∣ any result spec* arbitrary value fun-defn ::= fun-head spec* = spec* expr function definition fun-head ::= ghost? kind? symbol param+ (: result)? function header kind ::= function ∣ predicate ∣ lemma function kind result ::= ret-type ∣ ( ret-type (, ret-type)* ) ∣ ( ret-name (, ret-name)* ) ret-type ::= ghost? type unnamed result ret-name ::= ghost? binder : type named result spec ::= requires { term } pre-condition ∣ ensures { term } post-condition ∣ returns { (| pattern -> term)+ } post-condition ∣ raises { (| pattern -> term)+ } exceptional post-c. ∣ raises { uqualid (, uqualid)* } raised exceptions ∣ reads { lqualid (, lqualid)* } external reads ∣ writes { path (, path)* } memory writes ∣ alias { alias (, alias)* } memory aliases ∣ variant { variant (, variant)* } termination variant ∣ diverges may not terminate ∣ (reads ∣ writes ∣ alias) { } empty effect path ::= lqualid (. lqualid)* v.field1.field2 alias ::= path with path arg1 with result variant ::= term (with lqualid)? variant + WF-order
Figure 6.5: WhyML program expressions (part II).
The syntax for terms is given in Figure 6.1. The various constructs have the following priorities and associativities, from lowest to greatest priority:
construct | associativity |
if then else / let in | – |
label | – |
cast | – |
infix-op level 1 | left |
infix-op level 2 | left |
infix-op level 3 | left |
infix-op level 4 | left |
prefix-op | – |
function application | left |
brackets / ternary brackets | – |
bang-op | – |
Note the curryfied syntax for function application, though partial application is not allowed (rejected at typing).
The syntax for formulas is given Figure 6.6. The various constructs have the following priorities and associativities, from lowest to greatest priority:
construct | associativity |
if then else / let in | – |
label | – |
-> / <-> | right |
by / so | right |
\/ / || | right |
/\ / && | right |
not | – |
infix level 1 | left |
infix level 2 | left |
infix level 3 | left |
infix level 4 | left |
prefix | – |
Note that infix symbols of level 1 include equality (=) and disequality (<>).
Figure 6.6: Syntax for formulas.
Notice that there are two symbols for the conjunction: /\
and &&
, and similarly for disjunction. They are logically
equivalent, but may be treated slightly differently by some
transformations. For instance, split transforms the goal
A /\ B
into subgoals A
and B
, whereas it transforms
A && B
into subgoals A
and A -> B
. Similarly, it
transforms not (A || B)
into subgoals not A
and
not ((not A) /\ B)
.
The by/so connectives are proof indications. They are
logically equivalent to their first argument, but may affect the result
of some transformations. For instance, the split_goal
transformations interpret those connectives as introduction of logical cuts
(see 9.5.5 for details).
The syntax for theories is given on Figure 6.7 and 6.8.
Figure 6.7: Syntax for theories (part 1).
Figure 6.8: Syntax for theories (part 2).
TO BE COMPLETED
TO BE COMPLETED
A declaration of the form type r = < range a b > defines a type that projects into the integer range [a,b]. Note that in order to make such a declaration the theory int.Int must be imported.
Why3 let you cast an integer literal in a range type (e.g. (42:r)) and will check at typing that the literal is in range. Defining such a range type r automatically introduces the following:
function r'int r : int constant r'maxInt : int constant r'minInt : int |
The function r’int projects a term of type r to its integer value. The two constants represent the high bound and low bound of the range respectively.
Unless specified otherwise with the meta "keep:literal" on r, the transformation eliminate_literal introduces an axiom
axiom r'axiom : forall i:r. r'minInt <= r'int i <= r'maxInt |
and replaces all casts of the form (42:r) with a constant and an axiom as in:
constant rliteral7 : r axiom rliteral7_axiom : r'int rliteral7 = 42 |
This type is used in the standard library in the theories bv.BV8, bv.BV16, bv.BV32, bv.BV64.
A declaration of the form type f = < float eb sb > defines a type of floating-point numbers as specified by the IEEE-754 standard [8]. Here the literal eb represents the number of bits in the exponent and the literal sb the number of bits in the significand (including the hidden bit). Note that in order to make such a declaration the theory real.Real must be imported.
Why3 let you cast a real literal in a float type (e.g. (0.5:f)) and will check at typing that the literal is representable in the format. Note that Why3 do not implicitly round a real literal when casting to a float type, it refuses the cast if the literal is not representable.
Defining such a type f automatically introduces the following:
predicate f'isFinite f function f'real f : real constant f'eb : int constant f'sb : int |
As specified by the IEEE standard, float formats includes infinite values and also a special NaN value (Not-a-Number) to represent results of undefined operations such as 0/0. The predicate f’isFinite indicates whether its argument is neither infinite nor NaN. The function f’real projects a finite term of type f to its real value, its result is not specified for non finite terms.
Unless specified otherwise with the meta "keep:literal" on f, the transformation eliminate_literal will introduce an axiom
axiom f'axiom : forall x:f. f'isFinite x -> -. max_real <=. f'real x <=. max_real |
where max_real is the value of the biggest finite float in the specified format. The transformation also replaces all casts of the form (0.5:f) with a constant and an axiom as in:
constant fliteral42 : f axiom fliteral42_axiom : f'real fliteral42 = 0.5 /\ f'isFinite fliteral42 |
This type is used in the standard library in the theories ieee_float.Float32 and ieee_float.Float64.
A Why3 input file is a (possibly empty) list of theories.
The syntax for specification clauses in programs is given in Figure 6.9.
Figure 6.9: Specification clauses in programs.
Within specifications, terms are extended with new constructs
old
and at
:
Within a postcondition, old
t refers to the value of term t
in the prestate. Within the scope of a code mark L,
the term at
t '
L refers to the value of term t at the program
point corresponding to L.
The syntax for program expressions is given in Figure 6.10 and Figure 6.11.
Figure 6.10: Syntax for program expressions (part 1).
expr ::= ... see Fig. 6.4 ∣ expr spec+ added specification ∣ if expr then expr (else expr)? conditional ∣ match expr with (| pattern -> expr)+ end pattern matching ∣ qualifier? begin spec+ expr end abstract block ∣ expr ; expr sequence ∣ let pattern = expr in expr let-binding ∣ let fun-defn in expr local function ∣ let rec fun-defn (with fun-defn)* in expr recursive function ∣ fun param+ spec* -> spec* expr unnamed function ∣ any result spec* arbitrary value fun-defn ::= fun-head spec* = spec* expr function definition fun-head ::= ghost? kind? symbol param+ (: result)? function header kind ::= function ∣ predicate ∣ lemma function kind result ::= ret-type ∣ ( ret-type (, ret-type)* ) ∣ ( ret-name (, ret-name)* ) ret-type ::= ghost? type unnamed result ret-name ::= ghost? binder : type named result spec ::= requires { term } pre-condition ∣ ensures { term } post-condition ∣ returns { (| pattern -> term)+ } post-condition ∣ raises { (| pattern -> term)+ } exceptional post-c. ∣ raises { uqualid (, uqualid)* } raised exceptions ∣ reads { lqualid (, lqualid)* } external reads ∣ writes { path (, path)* } memory writes ∣ alias { alias (, alias)* } memory aliases ∣ variant { variant (, variant)* } termination variant ∣ diverges may not terminate ∣ (reads ∣ writes ∣ alias) { } empty effect path ::= lqualid (. lqualid)* v.field1.field2 alias ::= path with path arg1 with result variant ::= term (with lqualid)? variant + WF-order
Figure 6.11: Syntax for program expressions (part 2).
In applications, arguments are evaluated from right to left.
This includes applications of infix operators, with the only exception of
lazy operators &&
and ||
that evaluate from left to
right, lazily.
The syntax for modules is given in Figure 6.12.
Figure 6.12: Syntax for modules.
Any declaration which is accepted in a theory is also accepted in a module. Additionally, modules can introduce record types with mutable fields and declarations which are specific to programs (global variables, functions, exceptions).
A WhyML input file is a (possibly empty) list of theories and modules.
A theory defined in a WhyML file can only be used within that file. If a theory is supposed to be reused from other files, be they Why3 or WhyML files, it should be defined in a Why3 file.
The Why3 standard library provides general-purpose modules, to be used in logic and/or programs. It can be browsed on-line at http://why3.lri.fr/stdlib/. Each file contains one or several modules. To use or clone a module M from file file, use the syntax file.M, since file is available in Why3’s default load path. For instance, the module of integers and the module of references are imported as follows:
use import int.Int use import ref.Ref |
A sub-directory mach/ provides various modules to model machine arithmetic. For instance, the module of 63-bit integers and the module of arrays indexed by 63-bit integers are imported as follows:
use import mach.int.Int63 use import mach.array.Array63 |
In particular, the types and operations from these modules are mapped to native OCaml’s types and operations when Why3 code is extracted to OCaml (see Section 7.2).