Book:Storage


Storage overview

Operating systems need persistent storage (user data that survives reboot cycles) and Genode is no exception.

On a desktop OS (as opposed to a server OS), that ought to include metadata/attributes (so-called "extended" attributes). Without them, one has to resort to scary kludges to encode file types (3-letter extensions, sigh), thumbnails/icons (littering the FS with .info files, yuck), spatial-mode window positioning of desktop folders and documents, app resources, simple file comments, email/image/EXIF/audio/video/PDF attributes, etc. See the last section for hints regarding metadata storage.

Following the trend of the most recent modern OSes, Genode pushes the envelope with truly flexible storage concepts that go way beyond "everything is a file" claims. Read on.

File systems

The Genode framework provides file systems through so-called VFS plugins. As a system integrator you have a choice of where to locate each plugin...

Plugins placed in your component's vfs config will be accessed directly.

Plugins placed in the vfs server will be accessed remotely: file access requests initiated by your component/app are routed to "jump" through a local VFS plugin (namely, the Vfs::Fs_file_system plugin); that plugin forwards the calls through an RPC session to the remote server, which serves them with its own, local instance of the intended VFS plugin.

The client/server model allows several application components to access the same file system concurrently (the server "multiplexes" file access).

The app-local model lets you do away with the vfs server, but (for block-based FSes) only one component at a time can claim access to a given partition.

The client/server scenario further splits into sub cases:

The app-local model offers choices too:

Genode used to provide file system plug-ins directly in LibC, but this is deprecated -- see this message for some background and history.

In the IPC client/server scenario, the client will route file requests to the VFS server along these lines (a sketch; the names are illustrative):

  <config>
    <vfs>
      <dir name="fs"> <fs/> </dir>
    </vfs>
    ...
  </config>
  <route>
    <service name="File_system"> <child name="vfs"/> </service>
    ...
  </route>
In the app-local scenario, the <vfs> node would instantiate the plugin directly, instead of forwarding to a server which owns an instance of the plugin.
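
For comparison, a sketch of that app-local variant (the plugin choice and the route are illustrative):

  <config>
    <vfs>
      <dir name="fs"> <fatfs/> </dir>
    </vfs>
    ...
  </config>
  <route>
    <!-- the plugin itself needs a Block session, not a File_system session -->
    <service name="Block"> <child name="part_blk"/> </service>
    ...
  </route>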

The VFS plugins can be for ext2, NTFS, and other block-based file systems, or for a RAM disk and other 'virtual' file systems.

Disk blocks

More software layers are involved in the case of block-based VFS plugins, i.e. those backed by a spinning hard disk or other mass storage, or by a read-only medium like an optical drive.

There exist several block-device drivers to handle certain types of hard disks at a raw/physical level:

The hard disk will probably be partitioned into several volumes (partitions). This calls for using part_blk, which consumes the driver's Block session and itself provides one Block session per partition:

<start name="part_blk"..>

Putting it all together

server layers: ahci_drv (raw Block device) -> part_blk (Block, one session per partition) -> vfs server with its FS plugin (File_system service)

client layers: app -> libc/VFS -> <fs/> plugin (Vfs::Fs_file_system) -> File_system session to the server

VFS built-in plugins

As can be seen in os/src/lib/vfs/file_system_factory.cc, some file systems are supported out of the box, ready to be instantiated.

	_add_builtin_fs<Vfs::Block_file_system>();
	_add_builtin_fs<Vfs::Fs_file_system>();  // remote access (i.e. the VFS plugin is in another component) instead of the VFS plugin being local to this component
	_add_builtin_fs<Vfs::Inline_file_system>();
	_add_builtin_fs<Vfs::Log_file_system>();  // libc: /dev/log
	_add_builtin_fs<Vfs::Null_file_system>();  // libc: /dev/null
	_add_builtin_fs<Vfs::Ram_file_system>();
	_add_builtin_fs<Vfs::Rom_file_system>();
	_add_builtin_fs<Vfs::Rtc_file_system>();
	_add_builtin_fs<Vfs::Symlink_file_system>();
	_add_builtin_fs<Vfs::Tar_file_system>();
	_add_builtin_fs<Vfs::Terminal_file_system>();
	_add_builtin_fs<Vfs::Zero_file_system>();  // libc: /dev/zero

The Block_file_system and (more typically) Fs_file_system ones come into play for accessing hard disk drives and the like.
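
As a sketch, the two corresponding config nodes look like this (the blkdev name is illustrative):

  <vfs>
    <!-- remote: forwards requests over a File_system session (Fs_file_system) -->
    <fs/>
    <!-- direct: exposes a Block session as a single file (Block_file_system) -->
    <dir name="dev"> <block name="blkdev"/> </dir>
  </vfs>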

The implementations of the others, the so-called 'virtual' FSes, show the great flexibility offered by the VFS and friends: implementing e.g. /dev/zero is just a matter of subclassing Vfs::Single_file_system and adding a few lines of code.

More at Storage:VfsBuiltIns

VFS external plugins

In addition to built-ins, plugins can be 'external', i.e. they live in shared objects loaded at runtime (again, the scenario can leverage a vfs server, or do everything locally in the client component, with a similar syntax in both cases). External plugins are resolved in Global_file_system_factory::_library_name() following the naming convention "vfs_%s.lib.so", where %s is replaced with the file-system name.

E.g. if the component specifies in its XML configuration:

  <vfs>
    <dir name="foo">
      <bfs/>
    </dir>
  </vfs>

The VFS library will look up a built-in file-system plugin with the name "bfs". If the lookup among the built-ins fails, it will look for a shared object named "vfs_bfs.lib.so" and attempt to load it as an add-on to the component (vfs server or client component, depending on the scenario).

Also see the bottom of Genodians - Genode's VFS #1: The basics for info on the <import> feature (provided by the vfs_import.lib.so VFS plug-in), which allows populating a directory tree -- for example a RAM fs node, which would otherwise start life blank.
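
A minimal sketch of the idea (the file and directory names are made up):

  <vfs>
    <ram/>
    <import>
      <!-- at startup, vfs_import copies this content into the (blank) RAM fs -->
      <dir name="etc">
        <inline name="greeting">hello</inline>
      </dir>
    </import>
  </vfs>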

Other features

Additionally, to improve R/W performance, Genode provides a block cache server, located at os/src/server/blk_cache.
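
Presumably it slots between the driver and its clients like any other Block server; a sketch (the RAM quota and routing are assumptions):

  <start name="blk_cache">
    <resource name="RAM" quantum="8M"/>
    <provides><service name="Block"/></provides>
    <route>
      <!-- the cache is itself a Block client of the driver -->
      <service name="Block"> <child name="ahci_drv"/> </service>
      ...
    </route>
  </start>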

Finally, about syncing data to disk: the 'File_system::Session' interface has a 'sync' RPC function (to make the file system write back its internal caches) and FUSE provides a Fuse::sync_fs() function.

Advanced topic : Adding support for a new block-based File System driver

Case study: making Genode support BFS

There is no support for BFS (the BeOS/Haiku file system) out of the box in Genode. Let's look into porting the Haiku FUSE implementation of BFS: it has existed for years and can be compiled on Linux. How do we translate that support to Genode?

Up until Genode 19.05, doing so involved adapting the Genode fuse_fs library (which works for ext2 and NTFS), so that it works with BFS. See ChiselApp / genode-haiku / bfs-on-genode.

In a future version of Genode, the preferred way will involve adapting a (to be done) Genode fuse VFS server plug-in, so that it works with BFS.

More at Storage:Extra:BFS


Layers

That (block-based FS) means using a "stack" of three components plus the app(s): a block driver (e.g. ahci_drv), the part_blk partition server, and a file-system server.

File system and concurrency

The file system layer is where the developer is faced with several different choices. One decision to make is whether to host the file system in ...
  1. the app itself (with a <vfs><rump..> .. or <vfs><fatfs/>... config)
  2. a stand-alone, non-multiplexing component (<start name="ext2fs_fuse_fs"....)
  3. a vfs 'daemon' (<vfs><rump...>) to multiplex accesses.

Option one means the app will need exclusive access to the block device -- no other app will be able to read/write on that partition.

Option two means essentially the same thing (apps could gain access sequentially, if not simultaneously, on the condition of having transient "sessions", which is probably not trivial to implement).

Option three means several apps can concurrently access the same partition/volume.
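
For illustration, a sketch of options one and three side by side (names are illustrative; section 3 below shows a fuller server-side config):

  <!-- option 1: app-local plugin, exclusive access to the Block session -->
  <start name="my_app">
    <config> <vfs> <rump fs="ext2fs" ram="16M"/> </vfs> </config>
    <route> <service name="Block"> <child name="part_blk"/> </service> ... </route>
  </start>

  <!-- option 3: the app routes to a shared vfs server via <fs/> -->
  <start name="my_app">
    <config> <vfs> <fs/> </vfs> </config>
    <route> <service name="File_system"> <child name="vfs"/> </service> ... </route>
  </start>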

File system types

Several popular file systems are provided, like FAT32, ext2, NTFS, etc. Each of them can be provided by several different implementations: there are several ways to provide FAT32, or ext2, and so on. With some grepping through the docs, one who was not around in the early days of Genode can gain some understanding as to what's what:
(release notes 14.02): (fuse) write support on ext2 is declared as an experimental feature. In hindsight it is clear why: FUSE is primarily being used for accessing file systems not found in the Linux kernel. So it shines with supporting NTFS but less so with file systems that are well supported by the Linux kernel. Coincidentally, when we came to this realization, we stumbled upon the wonderful work of Antti Kantee on so-called rump kernels (..)

2. Block VFS files (*local* per app)

This describes how to set up AHCI (or USB mass storage) files, read/written directly by each app. (Open question: when the writes are flushed back to disk, are they safely synchronized between apps/components if several components access the same disk? Who is doing the "multiplexing"? Presumably there is no multiplexing, as a block device only has one client, and the client works synchronously.)
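
A sketch of such an app-local block setup (the route is illustrative; /dev/blkdev matches the device path seen in the logs further down this page):

  <config>
    <vfs>
      <dir name="dev"> <block name="blkdev"/> </dir>
    </vfs>
  </config>
  <route>
    <service name="Block"> <child name="ahci_drv"/> </service>
    ...
  </route>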

Seems there are several mechanisms for local access to file systems:

FUSE-based libc plugins:

From https://github.com/genodelabs/genode-world/tree/master/run :

Mounting:

(from release notes 13.11): As Genode does not use the normal mount mechanism employed by the original FUSE implementation, the start-up code, normally implemented in the main routine of a FUSE file-system server, needs to be provided by the FUSE file system port itself. The FUSE file system uses the new _libc_block_ plugin described in Section 'New C-runtime plugin for accessing block devices' to direct requests

Example run file:

(r.notes): For quickly trying out the new FUSE-based file systems, there are ready-to-use run scripts located at _libports/run/libc_fuse_ext2.run_ and _libc_fuse_exfat.run_ respectively. Before using those run scripts, make sure to prepare the packages "exfat" and "fuse-ext2" as found in the _libports_ repository.

3. Block VFS files (shared server)

This describes how to set up AHCI (or USB mass storage) files, served by a "vfs daemon", so that file changes made by one app will be "seen" by other apps.

Uses:

Also see:

FUSE-based file-system servers:

The partition probing currently hardcodes those partition types (as per fsprobe.h):

Run file:

	import_from_depot genodelabs/src/vfs

	<start name="my_application">
		<config>
			<vfs>
				<fs/>
			</vfs>
			...
		</config>
	</start>

<start name="ahci_drv"> <config> <policy label_prefix="vfs" device="0" writeable="yes"/> ...
<start name="vfs" caps="200"> <resource name="RAM" quantum="10M" /> <provides><service name="File_system"/></provides> <config> <vfs> <fatfs/> <dir name="dev"> <log/> </dir> </vfs> <policy label_prefix="test-libc_vfs" writeable="yes"/> </config> </start>"

====> looks like this ONLY supports FAT, no other file systems, unfortunately... And vfs accesses ahci directly, instead of going through a partition parser (part_blk). EDIT: one may also use rump_fs within vfs, adding "<rump fs="ext2fs" ram="64M" writeable="yes"/>" instead of <fatfs/>...
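
So an ext2 variant of the above vfs server, this time routed through part_blk, might look like this (a sketch; quotas and labels are assumptions):

  <start name="vfs" caps="200">
    <resource name="RAM" quantum="74M"/>
    <provides><service name="File_system"/></provides>
    <config>
      <vfs> <rump fs="ext2fs" ram="64M" writeable="yes"/> </vfs>
      <policy label_prefix="test-libc_vfs" writeable="yes"/>
    </config>
    <route>
      <service name="Block"> <child name="part_blk"/> </service>
      ...
    </route>
  </start>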

4. Rump FS ?

A file server that's a 'protocol stack' rather than a 'resource multiplexer' ?

See dde_rump/src/server/rump_fs
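
Its config selects the file-system type; a sketch (the policy syntax is assumed to match other File_system servers):

  <start name="rump_fs">
    <resource name="RAM" quantum="32M"/>
    <provides><service name="File_system"/></provides>
    <config fs="ext2fs">
      <policy label_prefix="my_app" root="/" writeable="yes"/>
    </config>
  </start>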

Build components:

Supported file systems:

More could be supported? e.g. https://github.com/rumpkernel/wiki/wiki/Info%3A-Available-rump-kernel-components lists rumpfs_ntfs

Example run file:

===> Fails to build, with a couple hundred errors/warnings (conflicting definition of typedef id_t, etc.)

5. FUSE ?

?

https://github.com/genodelabs/genode-world/blob/master/src/lib/fuse/fuse.cc

6. FAT-fs server ?

See libports/src/server/fatfs_fs

Port of some third-party monolithic code for reading FAT32 partitions. It got Genode going initially, until FUSE and Rump FS were implemented (replacing it?).

7. Pass-through to underlying Linux

"lx_fs" : for Genode apps compiled as linux binaries, rather than against the Genode ABI.

Uses "server/lx_block", "server/lwext4_fs" ..etc


Storage : "extended" attributes

Some stream-of-thought paragraphs about replacing BFS attributes...

FUSE hooks include not only open/close/read/write but also setxattr, getxattr, etc. That's of interest since FUSE supports NTFS, ext2, and BFS, which all have some form of extended attributes:

NTFS

NTFS has "ADS" attributes. From https://www.tuxera.com/community/ntfs-3g-manual/ :
Alternate Data Streams (ADS)
NTFS stores all data in streams. Every file has exactly one unnamed
data stream and can have many named data streams. The size of a
file is the size of its unnamed data stream. By default, ntfs-3g will only
read the unnamed data stream. By using the options “streams_interface=windows”
(not possible with lowntfs-3g), you will be able to read any named data streams,
simply by specifying the stream’s name after a colon. For example:
cat some.mp3:artist

The Genode port might need to /enable/ the ADS feature(?) though. See libports/src/lib/ntfs-3g/init.cc:

(*ctx)->streams = NF_STREAMS_INTERFACE_NONE;

That variable appears to accept one of three values, affecting the behavior of ntfs_fuse_parse_path():

NF_STREAMS_INTERFACE_NONE,	/* No access to named data streams. */
NF_STREAMS_INTERFACE_XATTR,	/* Map named data streams to xattrs. */
NF_STREAMS_INTERFACE_WINDOWS,	/* "file:stream" interface. */

Also see various lines like this one:

#ifdef HAVE_SETXATTR

Genode Makefile(s):

  1. libports/lib/mk/libntfs-3g.mk
  2. libports/src/server/fuse_fs/ntfs-3g/target.mk

FUSE build: make server/fuse_fs/ntfs-3g

==> ntfs-3g_fuse_fs won't run as-is on Genode, due to the missing implementation of fcntl(F_SETLK..):

(init -> fs) Error: fcntl(): command 12 not supported - vfs

==> so comment that call out in unix_io.c:

/*
	if (fcntl(DEV_FD(dev), F_SETLK, &flk)) {
		err = errno;
		ntfs_log_perror("Failed to %s lock '%s'", NDevReadOnly(dev) ? 
				"read" : "write", dev->d_name);
		if (close(DEV_FD(dev)))
			ntfs_log_perror("Failed to close '%s'", dev->d_name);
		goto err_out;
	}
*/

==> ntfs-3g_fuse_fs STILL does not run though:

(init -> fs) Error: libc suspend() called from non-user context (0x118edfa) - aborting

==> turns out one has to comment out logging.c's ntfs_log.handler() call, for some reason. And then ntfs crashes, the same way as ext2:

(init -> fs) libc_fuse_ntfs-3g: try to mount /dev/blkdev...
no RM attachment (READ pf_addr=0x100 pf_ip=0x118cd07 from pager_object: pd='init -> fs' thread='ep') 
page fault, pd='init -> fs' thread='ep' cpu=0 ip=0x118cd07 address=0x100 stack pointer=0xa04fef18 qualifiers=0x4 irUwp reason=1

==> my investigation so far leads me to ::read()... Yet part_blk is (presumably) able to read() from the ahci block device, since it reports partitions correctly. So the FUBAR here must be small rather than big, hopefully...

==> turns out that Component::construct() was calling the obligatory "with_libc()" for static ctor initialization, but NOT for the component construction itself ?! Added the missing bit, and now the NTFS FUSE server runs.

NTFS finally works today, reading directory listing and files! The secret sauce was:

That was a /lot/ of bitrot...

ext2

Ext2 has "xattr" attributes.

Haiku's "bootstrap" build process linux has to interface with ext2 xattrs; e.g. fs_attr_xattr.h says:

// the namespace all attributes live in
static const char* kAttributeNamespace = "user.haiku.";

Haiku's bootstrap does the same on Darwin (i.e. macOS), see fs_attr_bsdxattr.h

There is a "triaging" cpp which does this:

#	if defined(HAIKU_HOST_PLATFORM_LINUX)
#		include "fs_attr_xattr.h"
#	elif defined(HAIKU_HOST_PLATFORM_FREEBSD)
#		include "fs_attr_extattr.h"
#	elif defined(HAIKU_HOST_PLATFORM_DARWIN)
#		include "fs_attr_bsdxattr.h"
#	else

talking of which -- FreeBSD, not ext2:

Back to ext2:

On Genode/FUSE however, fuse-ext2.c goes...

	.setxattr       = NULL,
	.getxattr       = NULL,
	.listxattr      = NULL,
	.removexattr    = NULL,

Maybe RumpFS/ext2 is not crippled, though? Maybe RumpFS/UFS is also worth a look, for that matter? (See contrib/dde_rump..../...extattr.h, vfs_xattr.c, and xattr.h.)

==> ext2_fuse_fs repeatedly crashes (in ext2fs_open()) after the message "try to mount /dev/blkdev..." :-(

==> see above (NTFS) for a fix!

BFS

It looks like Haiku's FUSE implementation is not direct fuse-to-bfs, but goes indirectly through a (stubbed) VFS layer: fuse-to-haiku-vfs-to-bfs.

Steps:

As to fuse.cpp:

Fails to init though:

init -> fs_bfs fuse_bfs: init_kernel()
init -> fs_bfs Warning: rtc not configured, returning 0
init -> fs_bfs fuse_bfs: try mounting /dev/blkdev...
init -> fs_bfs  Block.stat: /blkdev
init -> fs_bfs  Block.stat: /blkdev
init -> fs_bfs libc Vfs_plugin::open </dev/blkdev> fd -1 flags 2
init -> fs_bfs  Block : open: /blkdev
init -> fs_bfs  Block.stat: /blkdev
init -> fs_bfs  Block.stat: /blkdev
init -> fs_bfs Warning: unsupported ioctl (request=0x40046480)
init -> fs_bfs Warning: unsupported ioctl (request=0x40046482)
init -> fs_bfs Warning: unsupported ioctl (request=0x40046483)
Warning: unresolvable exception 0, pd 'init -> fs_bfs', thread 'ep', cpu 0, ip=0x105bcb6 no signal handler
init -> fs_bfs Error: Uncaught exception of type 'Genode::Ipc_error'
init -> fs_bfs Warning: abort called - thread: main

Tracing:

Resulting patches in Haiku's BFS:

Resulting patches in Genode and Genode-world(?) (2x):

misc

https://www.haiku-os.org/guides/building/configure/use-xattr/ recommends NTFS-3g

Linux has had FUSE-BFS support ever since https://git.haiku-os.org/haiku/commit/?id=18128d58dca3e03fb850fc52d1b5f7992d6dd02d hrev31409

direct-to-HoG NTFS ?

Crazy idea: use this customization of ntfs-3g https://git.haiku-os.org/haiku/tree/src/add-ons/kernel/file_systems/ntfs/attributes.h directly in HoG? HoG has no VFS/kernel interface though, so there would be a need for "glue" code..


Storage : FS APIs

Files and directories can be accessed via well-known functions/methods provided by the libc, or via flavors of Genode-native classes:

Trouble-shooting:

The former can be fixed by setting the inode to 1:

+			e->inode = 1; /* inode 0 is a pending unlink */
			if ((fatfs_file_info.fattrib & AM_DIR) == AM_DIR)
				e->type = Directory_entry::TYPE_DIRECTORY;
			...

Under the hood : File_system

Here's a look at what happens when using the File_system class (a client component talking to an FS server). The following is based on LOG tracing obtained with:

The client executes a recursive listing (opendir, readdir, ...) of the "/" hierarchy, with the FAT32 file system being mounted at /fs:

init -> fatfs_fs --- Starting Fatfs_fs ---
init -> CommandCenter_HoG libc Vfs_plugin::open </dev/null> fd 0 flags 0
init -> CommandCenter_HoG libc Vfs_plugin::open </dev/log> fd 1 flags 1
init -> CommandCenter_HoG libc Vfs_plugin::open </dev/log> fd 2 flags 1
init -> CommandCenter_HoG Error: Open dir: /
init -> CommandCenter_HoG libc Vfs_plugin::open </> fd -1 flags 4
init -> CommandCenter_HoG     vfs.getdirentries into buf 0x2046a0 bytesize 4096
init -> CommandCenter_HoG     vfs.getdirentries: returns 264 including inode: 1 and name = fs
init -> CommandCenter_HoG     vfs.getdirentries into buf 0x2047a8 bytesize 3832
init -> CommandCenter_HoG     vfs.getdirentries: returns 264 including inode: 1 and name = dev
init -> CommandCenter_HoG     vfs.getdirentries into buf 0x2048b0 bytesize 3568
init -> CommandCenter_HoG     vfs.getdirentries: returns 264 including inode: 191920 and name = background.jpeg
init -> CommandCenter_HoG     vfs.getdirentries into buf 0x2049b8 bytesize 3304
init -> CommandCenter_HoG Type:4 entry: fs/
init -> CommandCenter_HoG Error:   Open dir: //fs
init -> CommandCenter_HoG libc Vfs_plugin::open </fs> fd -1 flags 4
init -> CommandCenter_HoG   --- ClientFS opendir </>
init -> fatfs_fs Warning: fat +++ dir() RPC request received for: /
init -> fatfs_fs Warning: fat - Directory ctor
  (at this point fatfs_fs needs to access the HDD, and thus needs to wait for ahci to be started):
init -> ahci_drv --- Starting AHCI driver ---
init -> ahci_drv              #0: ATA
init -> ahci_drv              #1: off (unknown device signature)
init -> ahci_drv              #2: off (unknown device signature)
init -> ahci_drv              #3: off (unknown device signature)
init -> ahci_drv              #4: off (unknown device signature)
init -> ahci_drv              #5: off (unknown device signature)
init -> ahci_drv read-only session opened at device 0 for 'part_blk -> '
init -> part_blk Partition 1: LBA 40 (84 blocks) type: '21686148-6449-6e6f-744e-656564454649' name: 'BIOSBOOT'
init -> part_blk Partition 2: LBA 124 (3284 blocks) type: 'c12a7328-f81f-11d2-ba4b-00a0c93ec93b' name: 'GRUB2'
init -> part_blk Partition 3: LBA 3536 (44032 blocks) type: '0fc63daf-8483-4772-8e79-3d69d8477de4' name: 'GENODE'
init -> part_blk session opened at partition 2 for 'fatfs_fs -> 0'
init -> fatfs_fs Warning: fat +++ f_opendir_res: 0
init -> fatfs_fs Warning: fat  +++ returning a Dir Handle with id: 0
init -> CommandCenter_HoG   --- ClientFS opendir : OPENDIR_OK
init -> CommandCenter_HoG   --- ClientFS queue sync..
init -> fatfs_fs Warning: fatFS _process_packet_operation 4 len 0 offset 18446744073709551615
init -> fatfs_fs Warning: fatFS _process_packet_op: packet acknowledged!--------
init -> CommandCenter_HoG   --- ClientFS handle_ack()
init -> CommandCenter_HoG     vfs.getdirentries into buf 0x2056c0 bytesize 4096
init -> CommandCenter_HoG   --- ClientFS num_dirent() </>
init -> fatfs_fs Warning:   fat-main.cc -  stat(us):
init -> fatfs_fs Warning:   fat -  STATUS: all done for / numents: 2 entries found, status.size: 544 st.mode 16384 (MODE_DIRECTORY=16384
..
init -> fatfs_fs Warning:   fat-main.cc -  stat(us):
init -> fatfs_fs Warning:   fat -  STATUS: all done for / numents: 2 entries found, status.size: 544 st.mode 16384 (MODE_DIRECTORY=16384
init -> CommandCenter_HoG   --- ClientFS num_dirent: returning 2
init -> CommandCenter_HoG   Type:4 entry: boot/
..
init -> CommandCenter_HoG libc Vfs_plugin::open </fs/boot> fd -1 flags 4
init -> CommandCenter_HoG   --- ClientFS directory()
init -> fatfs_fs Warning:   fat-main.cc -  stat(us):
init -> fatfs_fs Warning:   fat -  STATUS: all done for /boot numents: 1 entries found, status.size: 272 st.mode 16384 (MODE_DIRECTORY=16384
init -> CommandCenter_HoG   --- ClientFS opendir </boot>
init -> fatfs_fs Warning: fat +++ dir() RPC request received for: /boot
init -> fatfs_fs Warning: fat - Directory ctor
init -> fatfs_fs Warning: fat +++ f_opendir_res: 0
init -> fatfs_fs Warning: fat  +++ returning a Dir Handle with id: 1

Under the hood: layers and their source files:

One outlier is libports/src../fuse.cc, which contains the oft-used fill_dir() function (called by xxx ?)

Under the hood: client calling a FUSE server via opendir():

Under the hood: client calling a FUSE server via fopen():