Structure Layout 5 (SL5; STS/K2 V1, STS/K2 V2, FSE/L, FSE/DX)

SL5 once was a read-only filing system intended to help bootstrap STS into a usable, self-hosting operating system. Four previous iterations of the SL-family of filing systems once existed; their details, however, are lost to history.

Today, SL5 is a read-write capable filing system, supported by FSE, a new operating system in development (FSE/L and FSE/DX).

SL5 has the following properties:

Single, flat namespace for files.
Files may have labels up to 47 bytes.
Files may have between 1 and 5 extents allocated to them.
The primary (1st) extent can be any size, independent of any secondary extents.
Secondary extents are contributed as the file grows beyond its primary allocation.
Each file has its own preferred secondary extent size.
It properly tracks free space using a linked list.
The Volume Label VTOC entry now maintains the head of the free-space list, and includes the volume's total size.
The VTOC itself is described as a file named $DIR. The format is unique to SL5, however.

Mission

SL5's primary purpose is to get the Kestrel-2 platform to be minimally self-hosting with as few resources required as possible.

It's expected to be used mainly on removable SD/MMC media.
It's expected that reads from storage will far out-number writes once a filesystem has been created.
Its on-device structure is simple enough for a single person to fully comprehend, and easy enough to document in a single chapter of a programmer's reference guide.
It's expected to be used with operating systems and/or language runtime environments designed to execute applications in 32KB of memory or less.

It is explicitly not a goal that SL5 be fast, efficient, or robust against failures. Furthermore, SL5 is not required to support a networked or multi-user environment.

It need only be functionally sufficient, even if not strictly correct, to let the user perform the following operations:

SL5 must allow the user to boot into the host operating system of his/her choice.
SL5 must support an operating system's ability to let the user (directly or indirectly) call a program by name.
SL5 must allow a user to create his/her own source listings directly on a Kestrel-2 or compatible device.
SL5 must allow a user to create his/her own programs, given a source listing (possibly of the user's own authorship) and a set of tools residing on the same SL5 volume.

Given these minimal expectations, SL5 should be powerful enough, and easy enough to use, to let the user build higher-level, more functionally complete filesystems at a later time.

Influences and Inspirations

Commodore DOS

Originally, I had wanted to model the Kestrel file system after the Commodore DOS filesystem. This filesystem is very simple, but not the simplest possible. It relies on a bitmap to track free space. File sectors are chained together in a singly linked list, which means a single sector can store 254 bytes of data, not 256. This complicates file seek operations in at least two ways:

Because the data file sectors are intrusively linked, it's impossible to access random offsets of files without first "rewinding" the file, then reading sequentially through (and ignoring) all the uninteresting sectors, and,
Because a sector stores 2^n-2 bytes of data, you need a full-blown division routine to calculate how many sectors to skip in the file.
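The second point shows up directly in the arithmetic for locating a byte offset. The Python sketch below (function names are illustrative only, not taken from any Commodore ROM) compares a 254-byte payload, which needs a true division, with a hypothetical 256-byte payload, which needs only a shift and a mask:

```python
# Commodore DOS sectors carry 254 data bytes: 2 of every 256 bytes hold the
# forward link to the next sector in the chain.
CBM_PAYLOAD = 254

def cbm_locate(offset):
    """Return (sectors_to_skip, byte_in_sector) for a 254-byte payload.

    This needs a genuine division (divmod); on an 8-bit CPU without a divide
    instruction, that means a software division routine.
    """
    return divmod(offset, CBM_PAYLOAD)

def full_sector_locate(offset):
    """Same calculation if all 256 bytes of a sector held data.

    Dividing by a power of two reduces to a shift and a mask.
    """
    return offset >> 8, offset & 0xFF
```

Even with the division done, the first point still applies: the result only says how many sectors to skip, and the linked chain must still be walked that many times to reach the target sector.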

Now, Commodore DOS works around the first problem by using what's called "REL" (relative) files, named after the concept of "relative record" databases in COBOL. These allow random access to any arbitrary record (assuming you already know the record number), but they work by augmenting sequential file linkage with a pre-computed, and itself sequentially linked, set of "side sectors" which point to individual records. The benefits of REL files, as you can see, come at a fairly significant cost.

IBM System/360

If we take a trip back to 1964, we come across the IBM System/360 mainframe family. This family of computers supported a filesystem which was at once quite bold for its time, and quite conservative.

Unlike its contemporaries, it allowed for very long filenames. A "data set" (IBM's name for a file) had a 44-character label at a time when most other computers allowed between 5 and 10. Why 44? It allowed for a hierarchical naming convention that supported up to four levels of categorization before you identified a particular user's file: five qualifiers of up to 8 characters each, plus the four dots separating them, come to exactly 44 characters.

 (HLQ)                               (LLQ)

AAAAAAAA.BBBBBBBB.CCCCCCCC.DDDDDDDD.EEEEEEEE
\______/ \______/ \______/ \______/ \______/
    |        |        |        |        |
    |        |        |        |        User's data set name
    |        |        |        |
    |        |        |        User's category
    |        |        |
    |        |        Organization-specific category
    |        |
    |        Organization-specific category
    |
    User ID

So, a developer working on a bunch of assembly language programs might store assembly language source files in a (partitioned) data set named A131072.SALES.ENG.CMS.ASMLIB.

ASIDE. This is where file extensions come from. If we still interpreted filenames the way IBM mainframes do, MS-DOS's or CP/M's 8.3 filenames would technically be misnomers: you'd have an 8-character "qualifier" and a 3-character name!

OS/360, in its myriad of different flavors, plus all the ancillary operating systems, agreed on the disk layout. Each storage device consisted of some space for bootstrapping, followed by a run of consecutive blocks holding an array of directory entries, and then a ton of free space. The directory entries, collectively, were known as a "volume table of contents," or VTOC.

As you can imagine, with several different, yet cooperating, operating systems, nobody wants to write, rewrite, and rewrite again a set of filesystem drivers for accessing data on volumes. The on-disk layout for OS/360-compatible volumes therefore had to be very simple.

Data sets were identified by a flat, 44-byte label that conformed to a set of conventions that were enforced by the host OS:

Each qualifier must start with an alphabetic character, but subsequent characters can be alphanumeric, or $, @, or -.
No qualifier can exceed 8 characters.
No qualifier can be zero in length (so, no adjacent dots in a data set name).
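These conventions are easy to check mechanically. The following Python sketch encodes only the rules listed above, plus the 44-byte overall limit; real IBM systems differ in some details (for instance, which "national" characters are accepted), so treat it as an illustration of the listed rules rather than a reimplementation of IBM's. The function names are hypothetical:

```python
import string

# Per the rules above: a qualifier starts with a letter; later characters
# may be letters, digits, or one of $, @, -.
FIRST_CHARS = set(string.ascii_letters)
LATER_CHARS = FIRST_CHARS | set(string.digits) | set("$@-")

def valid_qualifier(q: str) -> bool:
    """1 to 8 characters, alphabetic first character, permitted characters after."""
    return (1 <= len(q) <= 8
            and q[0] in FIRST_CHARS
            and all(ch in LATER_CHARS for ch in q[1:]))

def valid_dsname(name: str) -> bool:
    """Whole label fits in 44 bytes and every dot-separated qualifier is valid.

    Adjacent dots yield an empty qualifier, which valid_qualifier rejects.
    """
    return len(name) <= 44 and all(valid_qualifier(q) for q in name.split("."))
```

For example, the data set name A131072.SALES.ENG.CMS.ASMLIB from the text passes, while A..B (an empty qualifier) and 1ABC (a leading digit) do not.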

Data sets were further identified by type, logical record length, record organization, and a slew of other parameters.

I'm sure there was a rationale for these design decisions; however, for my needs, this is a little bit too much complexity. So, I decided to take the S/360 filesystem, strip it to its bare minimum concepts to make a working system, and use only that for the Kestrel-2. The first four iterations of the filesystem are lost to the mists of time; however, the fifth iteration is relatively stable. This became SL5.

SL5 Layout

As a high-level overview, an SL5 volume looks like this:

Sectors	Purpose
0-1	Boot sectors
2-31	15KB of uFSD and/or 2nd-stage bootstrap
32-51	VTOC (this example allows up to 160 VTOC entries, but can be any size needed)
52...	Free space and file extents

Sectors 0 and 1 hold a 1024-byte bootstrap program. Together, these two sectors serve the same purpose as the MBR on a disk formatted for IBM PC-compatible computers. The main difference, of course, is that there is no partition table. If partitioning is desired, SL5 is assumed to reside inside a GUID Partition Table (GPT) partition; otherwise, it assumes flat access to the storage medium.

Sectors 2 through 31 hold a program of up to 15KiB, which serves either as a 2nd-stage bootstrap routine or, if small enough, as a kernel image. STS/K2 fits entirely in 12KiB, and so doesn't require a 2nd-stage bootstrap. It remains to be seen how big FSE/DX will be; however, I strongly suspect it will also fit within the 15KiB allotment.

Starting at sector 32, we find the Volume Table of Contents (VTOC). This structure serves the role of a root directory, mapping filenames to regions of storage. It's composed of a vector of sectors, each of which in turn holds a vector of eight mappings. The example shown allows for 160 VTOC entries; however, the VTOC may be any size needed, as long as the $DIR entry describing it is configured correctly.

The first sector of the VTOC holds some significance. In it, we typically find four special entries:

A volume label, naming this volume explicitly, and providing information on volume size and free space.
$IPL provides a mapping for sectors 0 and 1, thus reserving those sectors against accidental overwrite.
$SYS provides a mapping for sectors 2 through at most 31, thus reserving the 2nd-stage bootstrap and/or kernel space against accidental overwrite.
$DIR provides a mapping for sectors 32 through the last sector reserved for the VTOC. This prevents the VTOC itself from being accidentally overwritten.

These four files need not appear in any special order; SL5 merely requires that the volume label and $DIR appear somewhere in the first VTOC sector. This gives a boot loader a cheap way to discover both how big the volume is and, within that, how big the VTOC is. $IPL and $SYS can appear anywhere in the VTOC.
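To illustrate why this is cheap, here is a Python sketch of what a boot loader or uFSD might do with the first VTOC sector, using the entry layouts described under VTOC Directory Entries below. Two assumptions to flag loudly: the 16-bit fields are taken to be little-endian (this chapter doesn't state a byte order), and the function names are hypothetical:

```python
ENTRY_SIZE = 64
TYPE_FILE, TYPE_VOLUME_LABEL = 1, 2

def word(entry: bytes, offset: int) -> int:
    # ASSUMPTION: 16-bit fields are little-endian; the layout tables don't say.
    return int.from_bytes(entry[offset:offset + 2], "little")

def scan_first_vtoc_sector(sector: bytes):
    """Return (last_volume_sector, vtoc_first, vtoc_last) from the first VTOC sector.

    Only this one sector is examined; entries may appear in any order.
    """
    volume_last = dir_extent = None
    for i in range(0, len(sector), ENTRY_SIZE):
        entry = sector[i:i + ENTRY_SIZE]
        kind = word(entry, 48) & 0x000F      # entry type lives in the low 4 bits
        name = entry[1:1 + entry[0]]         # length-prefixed name
        if kind == TYPE_VOLUME_LABEL:
            volume_last = word(entry, 52)    # last sector of the whole volume
        elif kind == TYPE_FILE and name == b"$DIR":
            # $DIR's primary extent is the VTOC itself.
            dir_extent = (word(entry, 50), word(entry, 52))
    if volume_last is None or dir_extent is None:
        raise ValueError("not an SL5 volume: label or $DIR missing from first VTOC sector")
    return (volume_last, *dir_extent)
```

With the VTOC extent in hand, a uFSD can read the remaining VTOC sectors and search them for whatever other files it needs.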

Note. This differs from the STS/K2 implementations of SL5, which erroneously required $IPL and $SYS to reside in the first sector as well. The original intent was to support OS/2-style micro-filesystem drivers (or uFSDs), and loading a single sector would have made it easy for a uFSD to locate and bootstrap the OS kernel. As it turns out, this assumption proved false: a loader can simply read the first 16KiB of the medium (sectors 0 through 31) into RAM and run it. This revision of the documentation therefore relaxes the rules somewhat. However, the volume label and $DIR are still required to be in the first sector, so that a uFSD has enough information to scan the rest of the VTOC for other files it might need while booting, without needing the full weight of a complete SL5 filesystem implementation.

Note. The STS/K2 implementation of SL5 did not require the volume label to be present at all. STS/K2 always referred to the SD/MMC card in the only supported slot simply as volume "SYS:", since it was definitely the boot volume. FSE/DX may support multiple devices in the future, as well as non-SD/MMC mass storage. Additionally, STS/K2 kind of assumed a volume was always at least 32MB in size. Again, this may not always be the case when you have multiple kinds of supported devices. For this reason, all new SL5 volumes must include a volume label entry in the first VTOC sector.

After the VTOC, space may be used as required to hold files. Free space is maintained in a linked list, whose head is recorded in the volume label.

VTOC Directory Entries

A single directory entry consumes 64 bytes. Since a sector comprises 512 bytes, a single VTOC sector holds 8 entries.
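Because entries are a fixed 64 bytes and each sector holds exactly eight of them, locating a given VTOC entry needs only a division by 8 (a shift) and a remainder (a mask). A minimal Python sketch, assuming the VTOC starts at sector 32 as in the layout above and that entries are numbered from 0 in VTOC order (a numbering this document doesn't itself define); the helper names are hypothetical:

```python
SECTOR_SIZE = 512
ENTRIES_PER_SECTOR = 8
ENTRY_SIZE = SECTOR_SIZE // ENTRIES_PER_SECTOR   # 64 bytes

def vtoc_capacity(first_sector: int, last_sector: int) -> int:
    """Number of directory entries in a VTOC spanning first..last, inclusive."""
    return (last_sector - first_sector + 1) * ENTRIES_PER_SECTOR

def locate_entry(index: int, vtoc_first: int = 32):
    """Return (absolute sector, byte offset within that sector) of entry `index`."""
    sector, slot = divmod(index, ENTRIES_PER_SECTOR)
    return vtoc_first + sector, slot * ENTRY_SIZE
```

For example, the 160-entry VTOC from the layout example occupies exactly 20 sectors, and its last entry sits at byte 448 of the twentieth sector.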

SL5 supports filenames up to 47 bytes long, stored in UTF-8 (so a name containing multi-byte characters holds fewer than 47 characters). The first byte of the directory entry holds the filename length in bytes. Unused filename bytes remain undefined; for future compatibility, ignore them when reading, but write NULs ($00) when creating new entries.
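The following Python sketch builds and reads the 48-byte name portion of an entry according to these rules: the length byte counts UTF-8 bytes, trailing bytes are ignored on read, and they are zero-filled on write. The helper names are hypothetical:

```python
MAX_NAME = 47

def encode_name(name: str) -> bytes:
    """Build bytes 0..47 of a directory entry: length byte plus NUL-padded UTF-8 name."""
    raw = name.encode("utf-8")
    if len(raw) > MAX_NAME:
        raise ValueError(f"name is {len(raw)} bytes; SL5 allows at most {MAX_NAME}")
    # Write NULs into the unused tail, as recommended for new entries.
    return bytes([len(raw)]) + raw + bytes(MAX_NAME - len(raw))

def decode_name(entry: bytes) -> str:
    """Read the name, ignoring whatever follows it in the 47-byte field."""
    length = entry[0]
    if length > MAX_NAME:
        raise ValueError("corrupt entry: name length exceeds 47")
    return entry[1:1 + length].decode("utf-8")
```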

The final 16 bytes of a directory entry (offsets 48 through 63) hold administrative information for the entry.

Empty Slots

Offset	Size	Subfields	Description
0	1		Unused.
1	47		Unused.
48	2	.... .... .... 0000	VTOC entry type.
		0000 0000 0000 ....	Zero.
50	2		Zero.
52	2		Zero.
54	2		Zero.
56	2		Zero.
58	2		Zero.
60	2		Zero.
62	2		Zero.

The size of a VTOC is typically fixed, but it's unlikely that every available VTOC entry will be in use. An empty slot marks an entry that a new file may claim.
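Claiming a slot then amounts to scanning the VTOC for the first entry whose type field is zero. A Python sketch, again assuming little-endian 16-bit fields and using a hypothetical function name:

```python
ENTRY_SIZE = 64
TYPE_EMPTY = 0

def find_empty_slot(vtoc_sectors):
    """Return (sector_index, slot) of the first empty VTOC entry, or None if full.

    vtoc_sectors: the 512-byte sectors making up the VTOC, in order.
    ASSUMPTION: the type word at offset 48 is little-endian; only its low
    4 bits carry the entry type.
    """
    for s, sector in enumerate(vtoc_sectors):
        for slot in range(len(sector) // ENTRY_SIZE):
            base = slot * ENTRY_SIZE
            type_word = int.from_bytes(sector[base + 48:base + 50], "little")
            if type_word & 0x000F == TYPE_EMPTY:
                return s, slot
    return None
```

This only finds the slot; a real implementation would then fill in the entry and write the modified sector back to the volume.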

Volume Labels

Offset	Size	Subfields	Description
0	1		Name length. Must fall between 0 and 47 inclusive.
1	47		Name, stored in UTF-8.
48	2	.... .... .... 0010	VTOC entry type.
		0000 0000 0000 ....	Zero.
50	2		Sector of first node in free-list.
52	2		Last sector of the volume as a whole.
54	2		Zero.
56	2		Zero.
58	2		Zero.
60	2		Zero.
62	2		Zero.

STS/K2 currently doesn't do anything with volume labels; I originally wanted to use them for volume names on the command-line interface (e.g., WorkDisk:), à la Tripos. In FSE/DX, volume labels take on additional importance: they not only name the volume for the human user, but also help the filing system implementation keep track of free space.

Files

Offset	Size	Subfields	Description
0	1		Name length. Must fall between 0 and 47, inclusive.
1	47		Name, in UTF-8.
48	2	.... .... .... 0001	VTOC entry type.
		0000 0000 0000 ....	Zero.
50	2		Starting sector of the primary extent.
52	2		Last sector of the primary extent. The length (in sectors) of the extent is last-start+1.
54	2		Secondary extent length (in sectors). 0 means no secondary extents are permitted.
56	2		If non-zero, the starting sector of a secondary extent.
58	2		If non-zero, the starting sector of a secondary extent.
60	2		If non-zero, the starting sector of a secondary extent.
62	2		If non-zero, the starting sector of a secondary extent.

Files are allocated with a primary extent (it's not possible to allocate a file with a zero-length allocation). This allocation consists of a contiguous run of sectors. From SL5's point of view, a file always consists of an integral number of sectors.

New to FSE is the notion of a secondary extent length. When a write first carries a file beyond its primary extent, a new extent (of the length specified by this field) is allocated from the free list and recorded in the first secondary extent field. If further writes exceed even this secondary extent, another secondary extent is allocated and recorded in the next secondary extent field.
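Reading a file at an arbitrary position therefore needs only arithmetic on its directory entry, with no chain to walk. The Python sketch below maps a 0-based file-relative sector number to an absolute volume sector. It assumes secondary extents are filled in order starting from the field at offset 56, which matches the growth rule just described; the function and parameter names are hypothetical:

```python
def file_sector_to_volume_sector(n, primary_start, primary_last,
                                 secondary_len, secondaries):
    """Map file-relative sector n (0-based) to an absolute volume sector.

    secondaries: the four secondary extent fields (offsets 56..62), where 0
    means "not allocated". Returns None if n lies beyond the file's current
    allocation, i.e. writing there would require a new secondary extent.
    """
    primary_len = primary_last - primary_start + 1
    if n < primary_len:
        return primary_start + n
    if secondary_len == 0:
        return None                      # no secondary extents permitted
    index, within = divmod(n - primary_len, secondary_len)
    if index >= len(secondaries) or secondaries[index] == 0:
        return None
    return secondaries[index] + within
```

For example, a file whose primary extent is sectors 100 through 109 and whose secondary extent length is 4 keeps file sector 10 at the start of its first secondary extent.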

A file may have a primary extent of any size, plus up to four secondary extents. All secondary extents share the same size, which may be up to 65535 sectors. Thus, were it not for its use of 16-bit sector addresses, which cap an entire volume at 65536 sectors (32MiB), SL5 would allow a single file to grow to just under 160MiB (five extents of 65535 sectors at 512 bytes each, or 167,769,600 bytes).

STS/K2 V1 and V2 always set the secondary extent length to 0, indicating that secondary extents are neither used nor supported.

SL5 Filesystem