Introduction and base language reference
This Lisp has the following characteristics and goals:
- Scheme-like, but only as far as avoiding spurious differences. No compliance with standards.
- Non-opaque closures. All functions (and environments) are s-expressions. Performance is good despite this.
- Super-sleek API
- Common Lisp-style macros
- Builtin functions are constants
- All values are printable and readable
- Case-sensitive symbol names
- Only the minimal core built-in (i.e. written in C), but enough to provide a practical level of performance
- Concise (but not necessarily simple...) implementation
- Nothing excessively weird or fancy
Any character string can be a symbol name, including the empty string. In general, text between whitespace is read as a symbol except in the following cases:
- The text begins with
- The text consists of a single period
- The text contains one of the special characters
- The text is a valid number
- The text is empty
In these cases the symbol can be written by surrounding it with
| characters, or by escaping individual characters within the
symbol using backslash
\. Note that
\ must always be
preceded with a backslash when writing a symbol name.
A number consists of an optional + or - sign followed by one of the following sequences:
NNN...where N is a decimal digit
N.NN[eE][+-]NN[f]floating point (
fmeans single precision)
0xNNN...where N is a hexadecimal digit
0NNN...where N is an octal digit
- -0.0 and -0.0f, IEEE distinguished -0 values
- Special IEEE values:
+Inf, -Inf, +NaN, -NaN, +nan.0, +inf.0, -nan.0, -inf.0
- C99-style hex float literals
#xprefixes for binary, octal, decimal, and hex literals
Negative numbers may be written by prefixing any of the above with a
1.3. Special values
#f are boolean literals (not case sensitive).
() is the empty list.
1.4. Conses and vectors
(a b c) parses to the structure
(cons a (cons b (cons c nil))) where a, b, and c are arbitrary
(a . b) parses to the structure
(cons a b) where a and b are arbitrary expressions.
[a b c] parses to a vector of expressions a, b, and c.
#(a b c) has the same meaning.
Text between a semicolon
; and the next end-of-line is skipped.
|# is also skipped.
#; skips the next single s-expression.
1.6. Prefix tokens
There are five special prefix tokens which parse as follows:
'a is equivalent to
a is equivalent to
,a is equivalent to
,@a is equivalent to
,.a is equivalent to
1.7. Other read macros
femtoLisp provides a few "read macros" that let you accomplish interesting tricks for textually representing data structures.
||read as the identically-same value previously labeled
||read a gensym. NNN is a decimal constant. future occurrences of the same
||reads to the result of evaluating
||triggers an error|
||ignored; provided for compatibility|
||single-line comment, for script execution support|
||UTF-8 character string; may contain newlines.
When a read macro involves persistent state (e.g. label assignments), that
state is valid only within the closest enclosing call to
Builtin functions are represented as opaque constants. Every builtin
function is the value of some constant symbol, so the builtin
for example, can be written as
#.eq? ("the value of symbol eq?").
eq? itself is still an ordinary symbol, except that its
value cannot be changed.
2. Data and execution models
All programs and data are built from a small set of basic data types: number, boolean, symbol, cons (pair), vector, and "other". The "other" category includes machine data types (such as arrays of 32-bit integers) and implementation-defined types (currently hash tables and IO streams).
femtoLisp provides essentially 3 operations:
read parses text into values (instances of the basic data types).
eval treats a value as code and executes it.
As you might guess,
eval is where all the action is.
eval looks at a value and applies several rules to interpret its meaning:
- If the value is not a symbol or a cons, it evaluates to itself.
- If the value is a symbol, return its value in the current environment if any, otherwise return its global value.
- Otherwise we have an expression such as
(A B C D)
Ais a symbol, look in the syntax environment. If it is found there, pass the expression to the associated function and
evalthe result of that. Stop.
evaleverything in the list.
Amust have evaluated to a function; otherwise stop with an error. Pass the values of
Dto that function.
A function is either a builtin (an additional basic data type I neglected
to mention earlier!) or an expression of the form
(lambda arglist body).
Symbols are given values in the syntax environment using
set is used to set the global value of a symbol.
Most of the language revolves around a fixed set of builtin functions and syntax bindings. Builtin functions are bound to symbols whose global values cannot be changed, and that cannot be used as variables. The builtin functions are:
eq?, eqv?, equal?, atom?, not, null?, boolean?, bound?, symbol?, number?, pair?, builtin?, vector?, fixnum?
cons, list, car, cdr, set-car!, set-cdr!, vector, aref, aset!, length, assq
eval, eval*, apply, prog1, raise, for
+, -, *, /, <, lognot, logand, logior, logxor, compare
Builtin syntax bindings cannot be changed and cannot be read directly.
The symbols involved are otherwise normal, i.e. you can use them as
variables (though this is strongly discouraged!). The builtin syntax
quote, cond, if, and, or, while, lambda, trycatch, set!, begin.
At startup, femtoLisp loads the file
reads and evaluates every expression in the file.
If command-line arguments are present, the first argument must be a file name. That file is loaded, then femtoLisp exits.
Otherwise, femtoLisp enters an interactive prompt. At the prompt, the value
of the last evaluated expression gets bound to
> 2 2 > (1+ that) 3
raise signals an error by throwing a value to the
closest dynamically-enclosing exception handler.
Exceptions are handled by the form
(trycatch expr function). This
form evaluates its first argument (the expression) as usual, but if an
exception is raised during evaluation, the second argument is evaluated.
The second argument must evaluate to a function, and that function is
subsequently called with the raised value as its argument.
4. Delimited continuations
It's delightful, it's delicious, it's... delimited!
Delimited continuations provide the subset of Scheme's
to implement generators (or coroutines), and a few more tricks besides.
You get real, first-class, restartable continuations (as functions), callable
from anywhere. The only catch is that their state is limited to an
explicitly delimited region of the program. This could be considered an
advantage however, since it means you need only look at the program to see
the range of places a continuation might return to.
To use the feature, surround a piece of code with
(with-delimited-continuations ...). Continuations can be realized
anywhere inside this form. Such continuations will capture the execution
state of code inside the form; stackframes higher up (e.g. the caller
of the form) are lost.
Continuations are realized by the operators
(reset ...) to mark a control flow point. Inside the reset form,
(shift var expr) will evaluate
var bound to the continuation
shift expression. Calling this continuation with a value behaves
as if the
shift expression had returned that value. Execution continues
until we hit the
reset, whose value is then returned to the caller of
the continuation. If the continuation is not called, and the
expression returns a value when it originally runs
(shift k 2)), then this value is immediately returned from
reset expression (nothing between the
shift and the
As an example, we can use this feature to implement loops by repeatedly
calling the continuation of the top of the loop. We will translate the
(let ((x 0)) (while (< x 10) (begin (print x) (set! x (+ 1 x)))))
Here is the delimited-continuation version:
(with-delimited-continuations (let ((x 0)) (reset (let ((k (shift k (k k)))) (if (< x 10) (begin (print x) (set! x (+ 1 x)) (k k)) x)))))
(k k) acts like a jump that rebinds k to itself, then performs the
(< x 10) test again.
For a much more satisfying example, we'll turn our modest while loop
into a generator. The following expression yields a no-argument function
that returns values from 0 to 9 on successive calls, and the symbol
(let ((ko #f)) (lambda () (with-delimited-continuations (if ko (ko 0) (reset (let ((yield (lambda (v) (shift yk (begin (set! ko yk) v))))) ; -- begin body (let ((x 0)) (while (< x 10) (begin (yield x) (set! x (+ 1 x)))) 'done)))))))
The interesting thing is the way the "body" code is written: there's nothing
special about it. It is the same as our basic
while loop above.
yield call can return from any point, and when the generator function
is invoked again the code will pick up right where it left off. Note that
while loop does not even need to terminate; you can use a
generator to express an infinite sequence.
This pattern is abstracted by the
(define-generator range-iterator (lo hi) (let ((x lo)) (while (<= x hi) (begin (yield x) (set! x (+ 1 x)))) 'done))
(range-iterator a b) will return a no-argument generator function
as above. It is up to you what to do when the generator runs out of values;
you might throw an exception, invoke a caller-supplied escape procedure, or
return a special value like
with-delimited-continuations supports almost the whole language, but
there are some limitations to be aware of. The feature works by converting
the enclosed code to continuation-passing style, which means
- Functions are modified to accept an extra continuation argument.
- Function calls are modified to (conditionally) pass the extra continuation argument.
So while you can call any function from inside
you will not be able to capture continuations during execution of functions
with-delimited-continuations. In particular, this means closures
passed to higher-order functions defined outside cannot use
A workaround is to define a CPS version of the needed higher-order function by
enclosing its definition in
with-delimited-continuations. A better remedy
for this problem will be supplied in the future.
If an expression in a generator function fails due to this limitation, but does
not need to capture continuations, it can be enclosed in
femtoLisp allows you to use the full range of C data types on dynamically-typed Lisp values. The motivation for this feature is that useful interpreters must provide a large library of routines in C for dealing with "real world" data like text and packed numeric arrays, and I would rather not write yet another such library. Instead, all the required data representations and primitives are provided so that such features could be implemented in, or at least described in, Lisp.
The cvalues capability makes it easier to call C from Lisp by providing ways to construct whatever arguments your C routines might require, and ways to decipher whatever values your C routines might return. Here are some things you can do with cvalues:
- Call native C functions from Lisp without wrappers
- Wrap C functions in pure Lisp, automatically inheriting some degree of type safety
- Use the Lisp garbage collector to reclaim malloc'd storage
- Annotate C pointers with size information for bounds checking or serialization
- Attach symbolic type information to a C data structure, allowing it to be printed readably
- Add new types to Lisp
- Use more efficient represenations for your Lisp programs' data
femtoLisp's "cvalues" is inspired in part by Python's "ctypes" package. Lisp doesn't really have first-class types the way Python does, but it does have values, hence my version is called "cvalues".
5.2. Type representations
The core of cvalues is a language for describing C data types as symbolic expressions:
- Primitive types are symbols
int8, uint8, int16, uint16, int32, uint32, int64, uint64, byte, wchar, long, ulong, float, double, void
(array TYPE SIZE), where TYPE is another C type and SIZE is either a Lisp number or a C ulong. SIZE can be omitted to represent incomplete C array types like
int a. As in C, the size may only be omitted for the top level of a nested array; all array element types must have explicit sizes. Examples:
(array (array (array int32 3) 2))
int awould be
(array (array int32) 4), but this is invalid.
(struct ((NAME TYPE) (NAME TYPE) ...))
(union ((NAME TYPE) (NAME TYPE) ...))
(enum (NAME NAME ...))
(c-function RET-TYPE (ARG-TYPE ARG-TYPE ...))
A cvalue can be constructed using
(c-value TYPE arg), where
arg is some Lisp value. The system will try to convert the Lisp
value to the specified type. In many cases this will work better if some
components of the provided Lisp value are themselves cvalues.
Note the function type is called
c-function to avoid confusion, since
functions are such a prevalent concept in Lisp.
sizeof returns the size (in bytes) of a cvalue or a
c type. Every cvalue has a size, but incomplete types will cause
sizeof to raise an error. The function
the type of a cvalue.
For convenience, a specialized constructor is provided for each class of C type (primitives, pointer, array, struct, union, enum, and c-function). For example:
(int32 -4) (wchar 65) (array 'int8 1 1 2 3 5 8)
Cvalue constructors are generally permissive; they do the best they can with whatever you pass in. For example:
(c-value '(array int8 1)) ; ok, full type provided (c-value '(array int8)) ; error, no size information (c-value '(array int8) [0 1]) ; ok, size implied by initializer
Notice that the initializer used with
c-value is always a single value;
to make an array we had to pass a vector. You can think of
a kind of lisp-to-C cast operator.
5.4. Pointers, arrays, and strings
Pointer types are provided for completeness and C interoperability, but they should not generally be used from Lisp. femtoLisp doesn't know anything about a pointer except the raw address and the (alleged) type of the value it points to. Arrays are much more useful. They behave like references as in C, but femtoLisp tracks their sizes and performs bounds checking.
Arrays are used to allocate strings. All strings share
the incomplete array type
> (c-value '(array byte) [#\h #\e #\l #\l #\o]) "hello" > (sizeof that) 5
sizeof reveals that the size is known even though it is not
reflected in the type (as is always the case with incomplete array types).
Since femtoLisp tracks the sizes of all values, there is no need for NUL terminators. Strings are just arrays of bytes, and may contain zero bytes throughout. However, C functions require zero-terminated strings. To solve this problem, femtoLisp allocates magic strings that actually have space for one more byte than they appear to. The hidden extra byte is always zero. This guarantees that a C function operating on the string will never overrun its allocated space.
Strings are UTF-8 encoded by default, but are not required to contain valid UTF-8 data and can be used as arbitrary byte arrays as well. As the type implies, elements of the string are accessed as bytes, not characters. The functions beginning with
string. can be used to manipulate strings at the character level. Wide character strings, which allow characters to be indexed directly, are available with the type
5.6. Memory management concerns
5.7. Guest functions
Functions written in C but designed to operate on Lisp values are
known here as "guest functions". Although they are foreign, they live in
Lisp's house and so live by its rules. Guest functions are what you
use to write interpreter extensions, for example to implement a function
assq in C for performance.
Guest functions must have a particular signature:
value_t func(value_t *args, uint32_t nargs);
Guest functions must also be aware of the femtoLisp API and garbage collector.