This is not intended to be a language reference, but rather an informal
introduction to the whitespace language. The closest thing to a formal
specification is the implementation itself! Look in
The only lexical tokens in the whitespace language are Space (ASCII 32), Tab (ASCII 9) and Line Feed (ASCII 10). Because line feed is the only line-ending token, CR/LF problems are avoided across DOS/Unix file conversions. (Um, not sure. Maybe we'll sort this in a later version.)
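A minimal tokenizer sketch in Python. It assumes, as the reference interpreter does, that every character other than Space, Tab and LF is ignored as a comment. Under that assumption a DOS carriage return (ASCII 13) simply drops out. The names `tokenize` and `show` are hypothetical.

```python
SPACE, TAB, LF = " ", "\t", "\n"
NAMES = {SPACE: "[Space]", TAB: "[Tab]", LF: "[LF]"}


def tokenize(source: str) -> list[str]:
    """Keep only the three significant characters, in order.

    Assumption: every other character (letters, CR, ...) is a comment.
    """
    return [ch for ch in source if ch in NAMES]


def show(tokens: list[str]) -> str:
    """Render tokens in the [Space]/[Tab]/[LF] notation used in this tutorial."""
    return "".join(NAMES[t] for t in tokens)


# The letters are ignored and the CR of the DOS line ending disappears,
# leaving only the five whitespace tokens.
print(show(tokenize("push" + "   \t" + "\r\n")))  # [Space][Space][Space][Tab][LF]
```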
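The language itself is an imperative, stack-based language. Each command consists of a series of tokens, beginning with the Instruction Modification Parameter (IMP). These are listed in the table below.

IMP            Meaning
[Space]        Stack Manipulation
[Tab][Space]   Arithmetic
[Tab][Tab]     Heap access
[LF]           Flow Control
[Tab][LF]      I/O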
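No IMP code is a prefix of another, so an interpreter can identify the IMP by matching the start of a command from left to right. The sketch below uses the IMP codes given in the section headings of this tutorial. `split_imp` is a hypothetical helper name.

```python
IMPS = {
    " ": "Stack Manipulation",   # [Space]
    "\t ": "Arithmetic",         # [Tab][Space]
    "\t\t": "Heap access",       # [Tab][Tab]
    "\n": "Flow Control",        # [LF]
    "\t\n": "I/O",               # [Tab][LF]
}


def split_imp(command: str) -> tuple[str, str]:
    """Return (IMP name, remaining tokens) for a command string."""
    for code, name in IMPS.items():
        if command.startswith(code):
            return name, command[len(code):]
    raise ValueError(f"no IMP matches {command!r}")


# The codes are prefix-free, so at most one of them can match.
assert not any(a != b and b.startswith(a) for a in IMPS for b in IMPS)
print(split_imp("\t   "))  # ('Arithmetic', '  ')
```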
The virtual machine on which programs run has a stack and a heap. The programmer is free to push arbitrary-width integers onto the stack (only integers: there is currently no implementation of floating point or real numbers). The heap can also be accessed by the user as a permanent store of variables and data structures.
Many commands require numbers or labels as parameters. Numbers can be any number of bits wide and are simply represented as a series of [Space] and [Tab], terminated by a [LF]. [Space] represents the binary digit 0 and [Tab] represents 1. The sign of a number is given by its first character: [Space] for positive and [Tab] for negative. Note that this is not two's complement; the first character only indicates the sign.
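Here is a sketch of the number encoding described above. It assumes that the magnitude bits are written most significant bit first, as in the reference interpreter. Python integers are unbounded, which fits "any number of bits wide". `encode_number` and `decode_number` are hypothetical names.

```python
def encode_number(n: int) -> str:
    """Sign token, binary magnitude ([Space]=0, [Tab]=1), then [LF]."""
    sign = "\t" if n < 0 else " "
    bits = format(abs(n), "b")  # e.g. 11 -> "1011", most significant bit first
    return sign + bits.replace("0", " ").replace("1", "\t") + "\n"


def decode_number(tokens: str) -> tuple[int, str]:
    """Parse one number from the front of `tokens`; return (value, rest)."""
    end = tokens.index("\n")
    sign, digits = tokens[:1], tokens[1:end]
    magnitude = 0
    for d in digits:
        magnitude = magnitude * 2 + (1 if d == "\t" else 0)
    return (-magnitude if sign == "\t" else magnitude), tokens[end + 1:]


# Sign-magnitude, not two's complement: -5 is [Tab] followed by the bits of 5.
print(repr(encode_number(-5)))  # '\t\t \t\n'
```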
Labels are simply [LF]-terminated lists of spaces and tabs. There is only one global namespace, so all labels must be unique.
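One way an interpreter might enforce the single global namespace is a pre-pass that maps every label to its position and rejects repeats. This sketch assumes the program has already been parsed into hypothetical `(op, argument)` pairs, with `"label"` marking a location. The label text is stored without its terminating [LF].

```python
from typing import Any

Instr = tuple[str, Any]  # hypothetical (op, argument) pair


def build_label_table(program: list[Instr]) -> dict[str, int]:
    """Map each label to the index of the instruction that marks it."""
    table: dict[str, int] = {}
    for index, (op, arg) in enumerate(program):
        if op == "label":
            if arg in table:
                raise ValueError(f"duplicate label {arg!r}")
            table[arg] = index
    return table


# Labels are the space/tab strings themselves (without the final [LF]).
program = [("label", " \t"), ("dup", None), ("jump", " \t"), ("label", "\t\t")]
print(build_label_table(program))  # {' \t': 0, '\t\t': 3}
```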
Stack Manipulation (IMP: [Space])
Stack manipulation is one of the more common operations, hence the shortness of the IMP [Space]. There are four basic stack instructions: push, duplicate, swap and discard.
The copy and slide instructions are an extension implemented in Whitespace 0.3 and are designed to facilitate the implementation of recursive functions. The idea is that local variables are referred to using copy ([Space][Tab][Space]); then, on return, you can push the return value onto the top of the stack and use slide ([Space][Tab][LF]) to discard the local variables.
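Here is a sketch of the stack instructions on a Python list whose last element is the top of the stack. The copy and slide codes match the text above. The tokens shown for push, duplicate, swap and discard are assumed from the reference implementation's command table. The sketch also assumes two further points about the arguments. First, copy counts from the top of the stack starting at 0. Second, slide keeps the top item and removes the n items beneath it.

```python
class Stack:
    """Command tokens in the comments follow the [Space] IMP."""

    def __init__(self) -> None:
        self.items: list[int] = []

    def push(self, n: int) -> None:    # [Space] Number
        self.items.append(n)

    def dup(self) -> None:             # [LF][Space]
        self.items.append(self.items[-1])

    def copy(self, n: int) -> None:    # [Tab][Space] Number (0.3 extension)
        self.items.append(self.items[-1 - n])  # n = 0 copies the top item

    def swap(self) -> None:            # [LF][Tab]
        self.items[-1], self.items[-2] = self.items[-2], self.items[-1]

    def discard(self) -> None:         # [LF][LF]
        self.items.pop()

    def slide(self, n: int) -> None:   # [Tab][LF] Number (0.3 extension)
        top = self.items.pop()
        del self.items[max(0, len(self.items) - n):]  # drop n items under the top
        self.items.append(top)


s = Stack()
for v in (100, 3, 4):  # 100 belongs to the caller; 3 and 4 are locals
    s.push(v)
s.copy(1)              # read a local: [100, 3, 4, 3]
s.discard()
s.push(12)             # the return value
s.slide(2)             # drop the locals, keep the return value
print(s.items)         # [100, 12]
```

The usage lines follow the recursion pattern described above. The caller's value survives, the locals are gone, and the return value is left on top.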
Arithmetic (IMP: [Tab][Space])
Arithmetic commands operate on the top two items on the stack, and replace them with the result of the operation. The first item pushed is considered to be left of the operator.
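Here is a sketch of the arithmetic commands showing the operand order. The item popped second (pushed first) goes on the left. The command tokens after the [Tab][Space] IMP are assumed from the reference implementation. The sketch also assumes that division and modulo round toward negative infinity, as Haskell's `div` and `mod` do. Python's `//` and `%` behave the same way.

```python
import operator

# Command tokens that follow the [Tab][Space] IMP (assumed).
ARITH = {
    "  ": operator.add,         # [Space][Space] addition
    " \t": operator.sub,        # [Space][Tab]   subtraction
    " \n": operator.mul,        # [Space][LF]    multiplication
    "\t ": operator.floordiv,   # [Tab][Space]   integer division
    "\t\t": operator.mod,       # [Tab][Tab]     modulo
}


def arith(stack: list[int], command: str) -> None:
    """Replace the top two stack items with the result of the operation."""
    right = stack.pop()  # pushed second, so it sits right of the operator
    left = stack.pop()   # pushed first, so it sits left of the operator
    stack.append(ARITH[command](left, right))


stack = [7, 2]
arith(stack, " \t")   # subtraction: 7 - 2, not 2 - 7
print(stack)          # [5]
```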
Heap Access (IMP: [Tab][Tab])
Heap access commands look at the stack to find the address of items to be stored or retrieved. To store an item, push the address then the value and run the store command. To retrieve an item, push the address and run the retrieve command, which will place the value stored in the location at the top of the stack.
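Here is a sketch of the store and retrieve commands, following the push order described above. The heap is represented as a Python dict from address to value, which is an assumption about representation. In this sketch, reading an address that was never written raises `KeyError`. The [Space] and [Tab] tokens after the [Tab][Tab] IMP are assumed from the reference implementation.

```python
def store(stack: list[int], heap: dict[int, int]) -> None:
    """[Space]: pop a value, then its address, and write the value to the heap."""
    value = stack.pop()    # pushed last
    address = stack.pop()  # pushed first
    heap[address] = value


def retrieve(stack: list[int], heap: dict[int, int]) -> None:
    """[Tab]: pop an address and push the value stored there."""
    address = stack.pop()
    stack.append(heap[address])  # KeyError for a never-written address


stack: list[int] = [0, 72]  # address 0, value 72 ('H')
heap: dict[int, int] = {}
store(stack, heap)
stack.append(0)
retrieve(stack, heap)
print(stack, heap)  # [72] {0: 72}
```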
Flow Control (IMP: [LF])
Flow control operations are also common. Labels mark subroutines as well as the targets of conditional and unconditional jumps, and loops are implemented with these jumps. Programs must be ended by means of [LF][LF][LF] so that the interpreter can exit cleanly.
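Here is a sketch of how the flow-control commands might be executed over an already-parsed program. The sketch rests on several assumptions:

- The `(op, argument)` pairs and op names are hypothetical.
- The token codes in the comments are assumed from the reference implementation's command table.
- The conditional jumps pop the value they test.
- `call` and `ret` use a separate return-address stack.

Only the few non-flow ops needed for the demo are included. Running off the end of the program raises `IndexError` here. This is why a program needs an explicit end.

```python
from typing import Any


def run(program: list[tuple[str, Any]]) -> list[int]:
    """Execute a parsed program; return the numbers it printed."""
    labels = {arg: i for i, (op, arg) in enumerate(program) if op == "label"}
    stack: list[int] = []
    calls: list[int] = []  # return addresses pushed by call, popped by ret
    printed: list[int] = []
    pc = 0
    while True:
        op, arg = program[pc]  # running off the end raises IndexError
        pc += 1
        if op == "call":            # [LF][Space][Tab] Label
            calls.append(pc)
            pc = labels[arg]
        elif op == "jump":          # [LF][Space][LF] Label
            pc = labels[arg]
        elif op == "jz":            # [LF][Tab][Space] Label
            if stack.pop() == 0:
                pc = labels[arg]
        elif op == "jn":            # [LF][Tab][Tab] Label
            if stack.pop() < 0:
                pc = labels[arg]
        elif op == "ret":           # [LF][Tab][LF]
            pc = calls.pop()
        elif op == "end":           # [LF][LF][LF]
            return printed
        elif op == "push":          # non-flow ops, just enough for the demo
            stack.append(arg)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "sub":
            right = stack.pop()
            stack.append(stack.pop() - right)
        elif op == "outnum":
            printed.append(stack.pop())
        elif op != "label":         # [LF][Space][Space] Label only marks a place
            raise ValueError(f"unknown op {op!r}")


# Count down from 3, printing through a subroutine (label names are readable
# stand-ins for space/tab strings).
countdown = [
    ("push", 3),
    ("label", "loop"), ("dup", None), ("call", "print"),
    ("push", 1), ("sub", None), ("dup", None), ("jz", "done"), ("jump", "loop"),
    ("label", "done"), ("end", None),
    ("label", "print"), ("outnum", None), ("ret", None),
]
print(run(countdown))  # [3, 2, 1]
```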
I/O (IMP: [Tab][LF])
Finally, we need to be able to interact with the user. There are I/O instructions for reading and writing numbers and individual characters. With these, string manipulation routines can be written (the examples show how this may be done).
Each read instruction takes its target heap address from the top of the stack and stores the value it reads at that address.
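Here is a sketch of the four I/O commands. Streams are passed in as parameters so the functions are easy to test. The sketch makes three assumptions:

- The command tokens after the [Tab][LF] IMP come from the reference implementation.
- The output commands pop the value they print.
- Reading a number consumes one line of input.

The function names are hypothetical.

```python
import io
from typing import TextIO


def out_char(stack: list[int], out: TextIO) -> None:      # [Space][Space]
    out.write(chr(stack.pop()))


def out_number(stack: list[int], out: TextIO) -> None:    # [Space][Tab]
    out.write(str(stack.pop()))


def read_char(stack: list[int], heap: dict[int, int], inp: TextIO) -> None:    # [Tab][Space]
    address = stack.pop()             # target heap address comes from the stack
    heap[address] = ord(inp.read(1))  # ord("") raises TypeError at end of input


def read_number(stack: list[int], heap: dict[int, int], inp: TextIO) -> None:  # [Tab][Tab]
    address = stack.pop()
    heap[address] = int(inp.readline())  # assumption: one number per line


heap: dict[int, int] = {}
inp = io.StringIO("H42\n")
read_char([0], heap, inp)    # heap[0] = ord("H") = 72
read_number([1], heap, inp)  # heap[1] = 42
out = io.StringIO()
out_char([heap[0]], out)
out_number([heap[1]], out)
print(out.getvalue())        # H42
```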
Here is an annotated example of a program which counts from 1 to 10, outputting the current value as it goes.
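The following is a hypothetical Python sketch of such a program, not the original annotated source. It writes the counting loop as a list of mnemonics (names invented for this sketch) and assembles them into real Whitespace text. It then parses that text back and runs it with a tiny interpreter that only knows the commands used here. The sketch makes three assumptions:

- Command encodings follow the reference implementation's command table.
- Number bits are written most significant bit first.
- The two label strings are arbitrary.

```python
S, T, L = " ", "\t", "\n"

# mnemonic -> (IMP + command tokens, argument kind)
CODES = {
    "push": (S + S, "num"),          "dup": (S + L + S, None),
    "discard": (S + L + L, None),    "add": (T + S + S + S, None),
    "sub": (T + S + S + T, None),    "outchar": (T + L + S + S, None),
    "outnum": (T + L + S + T, None), "label": (L + S + S, "label"),
    "jump": (L + S + L, "label"),    "jz": (L + T + S, "label"),
    "end": (L + L + L, None),
}

def num(n: int) -> str:
    """Sign token, then magnitude bits (MSB first, an assumption), then [LF]."""
    bits = format(abs(n), "b").replace("0", S).replace("1", T)
    return (T if n < 0 else S) + bits + L

def assemble(listing: list[tuple]) -> str:
    parts = []
    for op, *args in listing:
        code, kind = CODES[op]
        if kind == "num":
            code += num(args[0])
        elif kind == "label":
            code += args[0] + L
        parts.append(code)
    return "".join(parts)

def parse(src: str) -> list[tuple]:
    program, i = [], 0
    while i < len(src):
        # The command codes are prefix-free, so at most one matches here.
        op, (code, kind) = next((o, c) for o, c in CODES.items() if src.startswith(c[0], i))
        i += len(code)
        arg = None
        if kind:
            end = src.index(L, i)
            field, i = src[i:end], end + 1
            if kind == "num":
                magnitude = int(field[1:].replace(S, "0").replace(T, "1") or "0", 2)
                arg = -magnitude if field[0] == T else magnitude
            else:
                arg = field
        program.append((op, arg))
    return program

def run(program: list[tuple]) -> str:
    labels = {arg: i for i, (op, arg) in enumerate(program) if op == "label"}
    stack, out, pc = [], [], 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "discard":
            stack.pop()
        elif op in ("add", "sub"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left - right)
        elif op == "outchar":
            out.append(chr(stack.pop()))
        elif op == "outnum":
            out.append(str(stack.pop()))
        elif op == "jump":
            pc = labels[arg]
        elif op == "jz":
            if stack.pop() == 0:
                pc = labels[arg]
        elif op == "end":
            return "".join(out)
        # "label" only marks a location, so it is a no-op here

START, END = S + T, T + S  # arbitrary label names
COUNT = [
    ("push", 1),                       # the counter starts at 1
    ("label", START),
    ("dup",), ("outnum",),             # print the current value...
    ("push", 10), ("outchar",),        # ...and a newline (ASCII 10)
    ("push", 1), ("add",),             # increment the counter
    ("dup",), ("push", 11), ("sub",),  # zero once the counter reaches 11
    ("jz", END),
    ("jump", START),
    ("label", END),
    ("discard",),                      # drop the counter, to be tidy
    ("end",),
]

print(run(parse(assemble(COUNT))), end="")  # prints 1 to 10, one per line
```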
What could be simpler? The source code for this program is available here. Have fun!
Released April 1st, 2003
Hosted by Durham University Computing Society