
A12

A12 is a remote network protocol for interactive, realtime multimedia applications. It has been designed as the network equivalent of the local display server API and IPC system, [SHMIF], used by ARCAN. To achieve this it adds extensions for supporting confidentiality, integrity, discovery and adaptive compression.

This document provides an informal introduction to implementing and using the protocol and documents its security considerations, along with an overview of the tools and support libraries that exist to leverage it today.

Table Of Contents

  1. Introduction
  2. Dependencies
  3. Authentication and Cryptography
  4. Commands
  5. Streaming Transfers
    1. Video
    2. Audio
    3. Binary
    4. Text
  6. Event Model
    1. Input
    2. Target Commands
    3. External Hints
  7. Example Flow and Lifecycle
  8. Directory Extension
    1. File Transfer
    2. FAP Format
  9. Discovery Extension
  10. Security
  11. Tools and Reference Implementations
  12. Future Changes
  13. Acknowledgements
  14. References

Introduction

There are few protocols around for 'remote desktop'-like applications, and none that cover the needs of a modern desktop cooperating with mobile devices while taking the long legacy of misaligned features into account.

Instead of new protocols surfacing, there is a long list of proprietary extensions to existing protocols like [RFC6143] and RFC4254. Other options like SPICE see little development and have drifted into specialised niches such as Virtual Machine monitors, or, like [MSRDP], are complex to implement correctly.

The A12 protocol described in this document seeks to remedy that situation.

Dependencies

Many of the primitives used are from other established algorithms and protocols. The dependencies that MUST be present are:

[x25519] for public key cryptography.
[CHACHA20] as stream cipher.
[BLAKE3] for MAC construction, key derivation, hashing for cache management and integrity.
[ZSTD] for generic compression.

It is also RECOMMENDED that [H264] is present for video compression, but an implementation MUST provide work-arounds for its absence.

All integral types are in network byte order.

Authentication and Cryptography

Every packet except for the first has the same outer frame format:

|--------------------------|
| 16 octet MAC             |
|--------------------------| ---
| u64 sequence number      |  |
| 1 byte type              |  |   encrypted
|--------------------------|  |
| [ type dependent block ] |  |
|--------------------------| ---

The first packet has the MAC truncated to 8 octets, and an 8 octet cryptographically secure pseudorandom number used as a nonce for key derivation.

There are 5 possible packet types, covered in their respective sections:

Control Command (1) in Section 4, Commands
Event (2) in Section 6, Event Model
Video Data (3), Audio Data (4) and Binary Data (5) in Section 5, Streaming Transfers
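As an illustration of the outer frame layout, the following Python sketch splits a packet into its outer fields. It assumes that decryption and MAC verification have already been handled elsewhere; the function and constant names are hypothetical and not taken from any reference implementation.

```python
import struct

MAC_SIZE = 16  # octets; the first packet uses a truncated MAC and is not covered here
PACKET_TYPES = {1: "control", 2: "event", 3: "video", 4: "audio", 5: "binary"}

def split_outer_frame(packet: bytes):
    """Split an already decrypted packet into (mac, sequence, type name, body)."""
    if len(packet) < MAC_SIZE + 9:
        raise ValueError("packet shorter than the outer frame header")
    mac = packet[:MAC_SIZE]
    # u64 sequence number followed by the 1 byte type, network byte order
    sequence, ptype = struct.unpack_from(">QB", packet, MAC_SIZE)
    if ptype not in PACKET_TYPES:
        raise ValueError(f"unknown packet type {ptype}")
    return mac, sequence, PACKET_TYPES[ptype], packet[MAC_SIZE + 9:]
```

The type-dependent block is returned untouched so that it can be handed to a per-type parser.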

Key Derivation

The key derivation used for the authentication packet is as follows:

kA   = H(message = 'arcan-a12 init-packet', passphrase, nonce)
kMac = H(kA)
kCl  = H(kMac)
kSrv = H(kCl)

It is a [HKDF] style scheme, but using [BLAKE3] in KDF mode.

Unless another passphrase has been agreed upon, it MUST be set to the default 'SETECASTRONOMY'. In controlled environments where there is a pre-existing secure communication channel, the passphrase can be swapped to a limited use one as needed.

When using a 3rd party rendezvous to establish a connection between a source and a sink, the 3rd party will generate one in order for the two to authenticate the public keys used. See the 'Directory extension' section for this.
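The derivation chain can be sketched in Python as follows. Two things here are assumptions rather than protocol facts: BLAKE3 is not part of the Python standard library, so keyed BLAKE2s is used purely as a runnable stand-in, and the way the message, passphrase and nonce are combined is a guess since the text does not fix it. Keys produced by this sketch will not interoperate with a real A12 endpoint.

```python
import hashlib

DEFAULT_PASSPHRASE = b"SETECASTRONOMY"

def H(data: bytes, key: bytes = b"") -> bytes:
    # Stand-in only: the protocol uses BLAKE3 in KDF mode, not BLAKE2s.
    return hashlib.blake2s(data, key=key, digest_size=32).digest()

def derive_init_keys(nonce: bytes, passphrase: bytes = DEFAULT_PASSPHRASE):
    """Return (kMac, kCl, kSrv) for the authentication packet."""
    # Assumption: the context message keys the hash, and the passphrase
    # and nonce are concatenated as its input.
    context_key = H(b"arcan-a12 init-packet")
    k_a = H(passphrase + nonce, key=context_key)
    k_mac = H(k_a)    # kMac = H(kA)
    k_cl = H(k_mac)   # kCl  = H(kMac)
    k_srv = H(k_cl)   # kSrv = H(kCl)
    return k_mac, k_cl, k_srv
```

Because each key is a hash of the previous one, both ends arrive at the same three keys from nothing more than the passphrase and the nonce carried in the first packet.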

kMac is used to calculate the MAC for each packet according to:

MAC = H(previous_MAC | packet_octets)

with packet_octets starting after the MAC field of the packet and continuing through the length of the packet.
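A minimal sketch of the chained MAC follows. Keyed BLAKE2s again stands in for BLAKE3, which is an assumption made only to keep the example runnable with the standard library, and the value of previous_MAC for the very first packet is left to the caller because the text does not specify it.

```python
import hashlib
import hmac

MAC_SIZE = 16  # octets

def next_mac(k_mac: bytes, previous_mac: bytes, packet_octets: bytes) -> bytes:
    # MAC = H(previous_MAC | packet_octets), keyed with kMac.
    # Stand-in: BLAKE2s replaces the BLAKE3 used by the protocol.
    h = hashlib.blake2s(key=k_mac, digest_size=MAC_SIZE)
    h.update(previous_mac)
    h.update(packet_octets)
    return h.digest()

def verify_packet(k_mac: bytes, previous_mac: bytes, packet: bytes):
    """Return (valid, mac) where mac becomes previous_MAC for the next packet."""
    received, octets = packet[:MAC_SIZE], packet[MAC_SIZE:]
    expected = next_mac(k_mac, previous_mac, octets)
    # Constant-time comparison avoids leaking how many octets matched.
    return hmac.compare_digest(received, expected), expected
```

Chaining on the previous MAC means a packet that verifies out of order, or after a dropped packet, will fail the check.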

The reduced 8-round variant of [CHACHA20] is used as per the recommendations in [TOOMUCHCRYPTO]. This is done in order to allow a lower tier of hardware without acceleration to still get reasonable throughput.

Each side initiates the ChaCha8 state machine using the tuple {kCl, nonce} for the client end, and {kSrv, nonce} for the server end.

The reason this setup is used before initiating actual key exchange and derivation according to x25519 is to ensure that there is no reliable fingerprint in the initial packet exchange, as well as for enabling passphrase preauthentication of unknown public keys.

Before a connection is completely authenticated, the only packet type MUST be 'Control' (=1). It is the only packet type accepted in a preauthenticated state and only the HELLO command is permitted. See the control section for details on its general structure.

The first command sent, first from client to server and then as a matching reply from server to client, is 'HELLO' (=0). The fields used are as follows:

version-major: u8
version-minor: u8
mode: u8
kpub: u8[32]
role: u8
petname: u8[16]

The version fields SHOULD be pegged to the corresponding version of the arcan-shmif build, if present, to assist with debugging the other end. In the final release of this protocol the version will be set to 1 major 0 minor and incremented according to [SEMVER], should any critical change be necessary after the fact.

The 'mode' field specifies the authentication mode desired, and MUST be one of the following:

  1. no-exchange. Keep using the current derived keys for all communication. This is NOT RECOMMENDED unless mandated by the legal environment.

  2. x25519-direct. Start x25519 exchange using the provided kpub. The other end will respond in kind.

  3. x25519-ephemeral. The public key provided is a temporary one.

The purpose of the x25519-ephemeral mode is to establish a more secure channel before transmitting the actual public keys. This forces an aggressor to actively perform a man-in-the-middle attack in order to harvest the actual public keys for tracking and correlation across sessions.

In that mode, both sides will treat the other side's ephemeral key as known, then transition the mode to x25519 (=1) and repeat the HELLO command, this time with the real public keys.

The 'role' can be set to either:

1 = Source
2 = Sink
3 = Probe
4 = Directory

Directory is specifically used for "Directory Extension" mode covered in Section 8. Connecting a Source to a Source or a Sink to a Sink MUST be prohibited and result in connection shutdown.

Probe is used to indicate that there is no intention of performing any data exchange after the authentication handshake. Its purpose is to determine the role and availability of the node at the other end, as part of checking the state of the mapping between a local keystore and known addresses.
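The HELLO fields and the role pairing rule can be sketched as below. The sketch only packs the command-specific fields in the order listed; where they sit inside the control packet, and the helper names, are assumptions. The mode value is passed through as an integer without interpretation.

```python
import struct

ROLE_SOURCE, ROLE_SINK, ROLE_PROBE, ROLE_DIRECTORY = 1, 2, 3, 4
HELLO_FORMAT = ">BBB32sB16s"  # major, minor, mode, kpub, role, petname

def pack_hello(major, minor, mode, kpub, role, petname=b""):
    if len(kpub) != 32:
        raise ValueError("kpub must be exactly 32 octets")
    if len(petname) > 16:
        raise ValueError("petname must fit in 16 octets")
    # struct pads a short petname with NUL bytes up to 16 octets
    return struct.pack(HELLO_FORMAT, major, minor, mode, kpub, role, petname)

def roles_compatible(local_role: int, remote_role: int) -> bool:
    # Source-to-Source and Sink-to-Sink MUST result in connection shutdown.
    if local_role == remote_role and local_role in (ROLE_SOURCE, ROLE_SINK):
        return False
    return True
```

A receiver would call roles_compatible with its own role and the role found in the peer's HELLO, and shut the connection down on a False result.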

When the public keys have been authenticated on each side, the key derivation process is repeated using the established X25519 shared secret:

kA = H(message = 'arcan-a12 init-packet', shared_secret, nonce)

With encode, decode and MAC keys derived as covered previously.

Rekeying

After a successfully authenticated connection, the server end holds 'rekeying' ownership. The current owner may, at any time, issue a REKEY command. This transfers ownership of the REKEY command over to the other end.

To do this, the owner first generates a new ephemeral X25519 keypair and passes the new public key as payload to the REKEY command together with a nonce in the entropy part of the command header. It sends this packet, then rotates keys for outbound use.

The new shared secret is calculated using the new private key together with the last known public key of the other endpoint. The outbound cipher and HMAC state is reset to this together with the nonce attached to the command packet.

The new MAC key is taken from H(message = 'arcan-a12 rekey', shared secret).

It is RECOMMENDED that the server performs the initial REKEY early, and that further passing of the REKEY back and forth is latched to some trigger, e.g. after a certain number of bytes of cipherstream has been consumed.

After a REKEY, old keymaterial MUST be discarded safely. The RECOMMENDED way to do this is to generate the new keymaterial and hasher/cipher state into the same memory that the old material consumed.

Commands

Every control packet type has a fixed size of 128 octets, with any extra octets padded with noise or zero.

The fields of a control packet are as follows:

  -----------------------
   last-seen : u64
   entropy   : u8[8]
   ch-id     : u8
   command   : u8
  -----------------------
   command-specific data
  -----------------------

Last-seen provides the sequence number of the latest seen packet from the other end, or zero if no packet has been received yet. The drift window (last-sent - last-seen) SHOULD inform encoding heuristics and latency compensation.

Entropy is 8 octets of cryptographically secure pseudorandom numbers.

Channel-id is set to the active channel that the command applies to, which will be zero unless additional channels have been negotiated. The command value and command-specific fields will be covered in the remainder of this section.

If the channel referenced by ID is invalid and refers to a previously closed channel, the command should be discarded and processing should continue as normal.
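The control packet layout can be sketched as follows. The sketch assumes that the fixed 128 octets refer to the type-dependent block (excluding the outer MAC, sequence number and type), and it pads with zeroes; both choices and all names are illustrative only.

```python
import os
import struct

CONTROL_SIZE = 128
CONTROL_HEADER = ">Q8sBB"  # last-seen, entropy, ch-id, command

def build_control(last_seen, channel, command, payload=b""):
    header = struct.pack(CONTROL_HEADER, last_seen, os.urandom(8), channel, command)
    room = CONTROL_SIZE - len(header)
    if len(payload) > room:
        raise ValueError("command-specific data does not fit")
    # pad the remaining octets with zeroes (noise is equally permitted)
    return header + payload + bytes(room - len(payload))

def parse_control(block: bytes):
    if len(block) != CONTROL_SIZE:
        raise ValueError("control packets are always 128 octets")
    last_seen, _entropy, channel, command = struct.unpack_from(CONTROL_HEADER, block)
    return last_seen, channel, command, block[struct.calcsize(CONTROL_HEADER):]
```

For example, build_control(41, 0, 7, stream_id_bytes) would produce a PING on channel zero, leaving 110 octets for command-specific data.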

Command 0: HELLO

This command was covered in Section 3.

Command 1: SHUTDOWN

last-words : u8[32]

This terminates a channel with an optional short message describing the reason for termination, if any.

Command 2 : DEFINE-CHANNEL

id        : u8
type      : u8
direction : u8

This creates another communication channel. Every channel can be a recipient of commands, and may contain between zero and three ongoing data streams: one for video, one for audio and one for binary. Audio and Video are unidirectional, with the direction established on channel allocation.

Channel allocation SHOULD be paired with, and triggered by, secondary events from user interaction. While possible, it is not expected to be called arbitrarily.

Because of these secondary events (see Section 6), there is no provision for collision avoidance in channel allocation, should both sides decide to define the same channel identifier within a collision window.

It is RECOMMENDED to split the ID namespace such that the source uses odd-numbered identifiers and the sink uses even-numbered ones, but this is merely a precaution.

The type value serves as a hint about the intended use in the local windowing system. It is covered under the 'REGISTER' event in the Target Commands part of the event model.

Command 3 : STREAM-CANCEL

id     : u32
reason : u8
type   : u8

This cancels an ongoing stream on the channel. Id carries the identifier provided in the corresponding DEFINE-VIDEO-STREAM, DEFINE-AUDIO-STREAM or DEFINE-BINARY-STREAM command. Reason can be:

0 - Undesired

The sink is no longer interested in the contents of the stream and the source MUST stop sending over the channel as soon as this command is received.

1 - Unhandled Format

The sink is not capable of decoding the stream contents due to an incompatibility with the encoding scheme present. This can happen at any point during decoding. The source SHOULD attempt to re-open the stream with a more compatible codec, even if this might be raw pixel stream deltas compressed with the REQUIRED Zstd compression option.

2 - Already Known

The source already has the contents of the stream available locally. This is a possible outcome for certain binary transfers of assets that can persist across connections, such as files used for text typeface.

Command 4 : DEFINE-VIDEO-STREAM

This command is described in Section 5.1, Video.

Command 5 : DEFINE-AUDIO-STREAM

This command is described in Section 5.2, Audio.

Command 6 : DEFINE-BINARY-STREAM

This command is described in Section 5.3, Binary.

Command 7 : PING

stream-id : u32

This command can be used by either source or sink for a channel and it is RECOMMENDED that it is sent periodically both as connection keep-alive and to assist each side with congestion window size tracking. The stream ID field references the last known completed stream, if any.

Command 8 : REKEY

This command is described in Section 3, Authentication.

Command 9 .. 14 : DIRECTORY EXTENSION

These command numbers are reserved for the directory extension. Their values and use are described in Section 8.

Streaming Transfers

As covered in Command 2, DEFINE-CHANNEL, each channel is a container for one unidirectional audio stream, one unidirectional video stream and one bidirectional binary stream. To initiate a stream on a channel, the appropriate end issues a corresponding DEFINE-VIDEO-STREAM, DEFINE-AUDIO-STREAM or DEFINE-BINARY-STREAM command, followed by interleaving data packets of the same type.

The data packets (3 VSTREAM-DATA, 4 ASTREAM-DATA, 5 BSTREAM-DATA) all have the same header fields:

channel-id : u8
stream-id  : u32
length     : u16

followed by 'length' continuous bytes of data. It is the full header and variable data block that is used to calculate and verify the message authentication code as per Section 3, Authentication and Cryptography.

The implementation may implement a number of strategies for chunking and interleaving a stream, informed by current congestion window size, abstract window type and event flow.
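One simple chunking strategy is sketched below: a payload is cut into fixed-size pieces, each prefixed with the shared data packet header. The chunk size would in practice be informed by the congestion window; here it is just a parameter, and the function name is hypothetical.

```python
import struct

DATA_HEADER = ">BIH"  # channel-id, stream-id, length (network byte order)
MAX_CHUNK = 0xFFFF    # the u16 length field bounds a single data packet

def chunk_stream(channel_id, stream_id, data, chunk_size=MAX_CHUNK):
    """Yield header + payload blocks covering 'data' in order."""
    if not 0 < chunk_size <= MAX_CHUNK:
        raise ValueError("chunk size must fit the u16 length field")
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield struct.pack(DATA_HEADER, channel_id, stream_id, len(chunk)) + chunk
```

Because the function is a generator, a sender can pull one chunk at a time and interleave chunks from several channels between calls.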

Video

A video frame transfer is initiated with a DEFINE-VIDEO-STREAM command (4), followed by a number of vstream data packets (packet type 3). It is recommended that those data packets are interleaved with other ongoing stream and command transfers, with priority given to the channel with most recent user interaction and activity focus through the VIEWPORT event (see Section 6).

The fields of the 'DEFINE-VIDEO-STREAM' command are as follows:

id                : u32
format            : u8
surface width     : u16
surface height    : u16
x                 : u16
y                 : u16
frame width       : u16
frame height      : u16
flags             : u8
compressed size   : u32
uncompressed size : u32
commit            : u8
four-cc           : u8[4]

ID is a source defined identifier. It is local to the channel the stream is being defined on and MUST NOT collide with other streams defined on the same channel. It is RECOMMENDED that this is tracked locally per channel and incremented each time a stream is defined. An implementation MUST NOT permit multiple streams of the same type in flight without being explicitly cancelled.

A single stream can be used to convey a number of image frames, and only needs to be redefined if the dimensions of the backing store change.

Format defines the encoding method for the data being sent:

0  : 32-bit, R8G8B8A8 with linear alpha
1  : 24-bit, linear full-opaque R8G8B8
2  : 16-bit, linear R5G6B5
5  : H264 stream
7  : ZSTD compressed TPACK block
8  : ZSTD compressed full frame
9  : ZSTD compressed delta frame
10 : Passthrough stream

The 3,4,6 format values are deprecated but kept allocated to retain compatibility with dated implementations still using them. It is RECOMMENDED that any encountered unhandled format value triggers a STREAM-CANCEL command with unhandled format (1) as reason for cancellation.

The 'TPACK' format is described in Section 5.4, Text.

If format is set to passthrough (10) the four-cc field SHOULD contain the fourCC encoded identifier of the encoder type, if known. This is used to permit an opaque bitstream link with hardware encoders where the protocol implementation might lack access to specifics due to hardware, security and architectural segmentation.

All region and surface dimensions are in buffer order with the origin in the upper-left corner.

These are further modified by the 'flags' bitmap of possible processing hints:

1 : origo-lower-left

This bit is set if the decompressed buffer has an inverted row order and should be flipped later in the processing pipeline.

If the format is of the known raw (0,1,2) or compressed raw (8,9) types, the x, y, frame width and frame height fields specify the affected region of the defined surface. Multiple updates can be sent in sequence and changes accumulate at the receiving sink end. Updates MUST NOT be passed on locally until a stream with the 'commit' field set to a non-zero value has been received.

An implementation MUST calculate the uncompressed size based on the format and surface dimensions and compare the calculated value against the received uncompressed size field before allocating any decompression buffer space. If the calculated value does not match the received one, it MUST reject the stream by issuing a STREAM-CANCEL command.
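The size check can be sketched for the raw formats as below. The bytes-per-pixel values follow directly from the format list; which dimensions the caller passes in (full surface or frame region) and the function names are assumptions of this sketch, and the compressed formats are deliberately left out because their underlying pixel layout is not stated here.

```python
# raw formats only: 0 = R8G8B8A8, 1 = R8G8B8, 2 = R5G6B5
BYTES_PER_PIXEL = {0: 4, 1: 3, 2: 2}

def expected_raw_size(fmt: int, width: int, height: int) -> int:
    if fmt not in BYTES_PER_PIXEL:
        raise ValueError(f"format {fmt} is not a raw pixel format")
    return width * height * BYTES_PER_PIXEL[fmt]

def accept_stream(fmt: int, width: int, height: int, announced_size: int) -> bool:
    """Return False when the stream should be answered with STREAM-CANCEL."""
    # Compare before allocating anything based on the announced size.
    return expected_raw_size(fmt, width, height) == announced_size
```

Running the comparison before any allocation keeps a hostile peer from dictating buffer sizes through the uncompressed size field.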

Audio

An audio frame transfer is initiated with a DEFINE-AUDIO-STREAM command (5), followed by a number of astream data packets (packet type 4).

The fields of the 'DEFINE-AUDIO-STREAM' command are as follows:

id                : u32
channels          : u8
encoding          : u8
nsamples          : u16
rate              : u32

The following encodings are supported:

signed 16-bit (0)

Binary

A binary 'blob' transfer is initiated with a DEFINE-BINARY-STREAM command (6), followed by a number of bstream data packets (packet type 5).

The fields of the 'DEFINE-BINARY-STREAM' command are as follows:

stream-id         : u32
size              : u64
type              : u8
token-id          : u32
checksum          : u8[16]
compressed        : u8

Stream identifier shares namespace with audio and video streams. It MUST be unique. It is SUGGESTED that they are allocated through a shared incremental counter.

The size field covers how many bytes should be transferred in total, or 0 if the stream is continuous. In that case, completion and progress notification is conveyed over the STREAMSTATUS event.

The type MUST be one of the following:

state (0)           event trigger: STATE-IN, STATE-OUT
bchunk (1)          event trigger: BCHUNKSTATE, BCHUNK-IN, BCHUNK-OUT
font (2)            event trigger: FONTHINT
font-secondary (3)  event trigger: FONTHINT
debug (4)
appl (5), appl-controller (6) (See DIRECTORY extension)

The token ID is a custom identifier used to pair the ongoing stream with a queued event in the outer desktop.

The checksum, if known, should use BLAKE3 in unkeyed hash mode. Its purpose is for the other end to check for a locally cached version, and issue a STREAM-CANCEL command if a matching one exists.
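The field layout and the cache check can be sketched as follows. Treating an all-zero checksum as "not known" is an assumption of this sketch, as are the helper names; computing the BLAKE3 digest itself is left out since it is not available in the Python standard library.

```python
import struct

DEFINE_BINARY = ">IQBI16sB"  # stream-id, size, type, token-id, checksum, compressed
BINARY_TYPES = {"state": 0, "bchunk": 1, "font": 2, "font-secondary": 3,
                "debug": 4, "appl": 5, "appl-controller": 6}
UNKNOWN_CHECKSUM = bytes(16)  # assumption: all zeroes marks an unknown checksum

def pack_define_binary(stream_id, size, kind, token_id,
                       checksum=UNKNOWN_CHECKSUM, compressed=False):
    if len(checksum) != 16:
        raise ValueError("checksum must be 16 octets")
    return struct.pack(DEFINE_BINARY, stream_id, size, BINARY_TYPES[kind],
                       token_id, checksum, int(compressed))

def should_cancel(define_payload: bytes, local_cache: set) -> bool:
    """True when a cached copy exists and STREAM-CANCEL (Already Known) applies."""
    checksum = struct.unpack_from(DEFINE_BINARY, define_payload)[4]
    return checksum != UNKNOWN_CHECKSUM and checksum in local_cache
```

Here local_cache is simply a set of 16 octet digests for content the receiver already holds, such as previously transferred fonts.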

Text

A channel can be used to provide formatted text as a specially encoded 'TPACK' video stream. These are always compressed with ZSTD, with values encoded in little-endian.

Each frame starts with a 16 octet frame header:

data-size        : u32
line-count       : u16
cell-count       : u16
scroll-direction : u8
frame-flags      : u16
background-colour: u8[4]
cursor-state     : u8
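The frame header can be decoded as in the sketch below. It assumes the buffer has already been ZSTD-decompressed and that the fields are packed in the listed order without padding, which adds up to the stated 16 octets; the tuple and function names are hypothetical.

```python
import struct
from collections import namedtuple

TPACK_HEADER = "<IHHBH4sB"  # little-endian, unlike the rest of the protocol
TpackHeader = namedtuple(
    "TpackHeader",
    "data_size line_count cell_count scroll_direction "
    "frame_flags background cursor_state",
)

def parse_tpack_header(buf: bytes) -> TpackHeader:
    if len(buf) < struct.calcsize(TPACK_HEADER):
        raise ValueError("buffer shorter than the 16 octet frame header")
    return TpackHeader(*struct.unpack_from(TPACK_HEADER, buf))
```

The background field is returned as four raw octets so that the caller can decide how to interpret the colour channels.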

Each line contains:

start-line       : u16
cell-count       : u16
cell-offset      : u16
content-dir      : u8 ?!
scroll-dir       : u8 ?!
line-state       : u8

Followed by cell-count of cells:

Event Model

Event type packets have a fixed 128 byte size. The categories and types are a filtered subset of those present in SHMIF. Naming and numbering conventions are kept to match existing consumers of SHMIF. This is intended to provide an easier path for integrating with local applications and windowing systems.

Gaps in the numbering mark events that have a reasonable local use but conflict with the networked case.

Each packet has a 1 byte category selector:

category : u8

The PERMITTED category values are:

input-device    : 2
target-command  : 16
external-hint   : 64

An implementation MUST block/warn, discard/warn or terminate if a value from a non-permitted category is found as this suggests a routing or filtering issue with other users of SHMIF.
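A minimal category filter might look like the sketch below. It assumes the category selector is the first octet of the 128 byte event block and chooses the "terminate" option by raising an exception; both are illustrative choices rather than requirements.

```python
EVENT_SIZE = 128
PERMITTED_CATEGORIES = {2: "input-device", 16: "target-command", 64: "external-hint"}

def classify_event(block: bytes) -> str:
    if len(block) != EVENT_SIZE:
        raise ValueError("event packets have a fixed 128 byte size")
    category = block[0]  # assumption: the selector leads the event block
    if category not in PERMITTED_CATEGORIES:
        # a non-permitted value suggests a routing or filtering issue upstream
        raise ValueError(f"non-permitted event category {category}")
    return PERMITTED_CATEGORIES[category]
```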

Input

Event category 2 is used for input events. This is most commonly provided when a user is interacting with a window that has been provided over a channel.

These have a frame format of:

input-kind     : u32
device-kind    : u32
datatype       : u32
label          : u8[16]
flags          : u8
device-id      : u16
device-subid   : u16
segment-token  : u32
sample-ts      : u64

Input kind and device kind are hints as to device and sampling origin, with datatype specifying the layout of the remaining bytes in the event packet.

Input kind MUST be one of the following values:

button      : 0
axis-motion : 1
touch       : 2
status      : 3
eyes        : 4

Device kind MUST be one of the following values:

keyboard        : 1
mouse           : 2
game-controller : 4
touch-display   : 8
led-controller  : 16
eyetracker      : 32
status          : 64

These are laid out as a bitmask both for internal routing uses, and the INPUTMASK events that can be used to disable forwarding of several device categories.

The label is a custom, short, ASCII encoded tag. This is used to pair with LABELHINT events sent by the source in order to convey suggested binding and to allow the outer windowing system to reliably rebind or reroute.

Flags is a bitmap indicating whether the event sample is part of a gesture (& 1), or is associated with input access or routing entering (& 2) or leaving (& 4) a surface active state.

Device ID is a source-local non-unique identifier to distinguish between one device or another, and subid for devices with multiple associated input sources.

The segment token is normally set to zero, but can be used to reference a segment bound on some channel when manually rerouting, forwarding or synthesizing input events.

The sample-ts timestamp is a monotonic clock in microseconds updated when the sample was generated, for comparison against previous samples from the same channel.
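The common input fields can be decoded as in the following sketch. It assumes the fields are packed in the listed order without padding, in network byte order, and that the caller has already stripped the category byte; the dictionary keys and function name are hypothetical.

```python
import struct

INPUT_HEADER = ">III16sBHHIQ"  # 45 octets in total
INPUT_KINDS = {0: "button", 1: "axis-motion", 2: "touch", 3: "status", 4: "eyes"}

def parse_input_header(event: bytes) -> dict:
    """Decode the shared input fields; datatype-specific bytes are in 'payload'."""
    (kind, device, datatype, label, flags,
     dev_id, sub_id, token, sample_ts) = struct.unpack_from(INPUT_HEADER, event)
    if kind not in INPUT_KINDS:
        raise ValueError(f"invalid input kind {kind}")
    return {
        "kind": INPUT_KINDS[kind],
        "device": device,
        "datatype": datatype,
        "label": label.rstrip(b"\x00").decode("ascii"),
        "gesture": bool(flags & 1),
        "entering": bool(flags & 2),
        "leaving": bool(flags & 4),
        "device_id": dev_id,
        "device_subid": sub_id,
        "segment_token": token,
        "sample_ts": sample_ts,  # microseconds on a monotonic clock
        "payload": event[struct.calcsize(INPUT_HEADER):],
    }
```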

If the datatype is specified as ANALOG (=1):

relative : u8
count    : u8
samples  : d16[4]

Relative defines if the values provided in samples are relative to their previously defined value (starting at 0), count how many (MUST be larger than zero and less than- or equal to- 4).

If the datatype is specified as DIGITAL (=2):

active : u8

If active is set to 1, the button is being held; if it is set to 0, the button has been released.

If the datatype is specified as TRANSLATED (=4):

codepoint : u8[5]
active    : u8
scancode  : u8
symbol    : u32
modifiers : u16

Codepoint refers to a single, 0 terminated UTF-8 encoded unicode codepoint, or zero if there is no available translation for the event.

If active is set to 1, the translated input has been activated (rising); otherwise it has been released (falling).

The scancode is a device-local reference for the button input which triggered the event and SHOULD be considered a last resort for case by case compatibility.

The symbol is a segment type relative lookup table index. It is RECOMMENDED that the default table used is that of [SDL2] due to the range of platforms it has been verified against. It is SUGGESTED that for segment types such as WAYLAND and X11, the <X11/keysymdef.h> table is used as per [XLIBREF].

Modifiers is a bitmask, with the following bit allocation:

LEFT-SHIFT    : 1
RIGHT-SHIFT   : 2
LEFT-CONTROL  : 3
RIGHT-CONTROL : 4
LEFT-ALT      : 5
RIGHT-ALT     : 6
LEFT-META     : 7
RIGHT-META    : 8
NUMLOCK       : 9
CAPSLOCK      : 10
MODE          : 11
REPEAT        : 12

The REPEAT modifier indicates that the event is an oscillating input and the timestamp/congestion state SHOULD be considered before forwarding in order to avoid accidental oscillations due to network conditions.

If the datatype is specified as TOUCH (=8):

active         : u8
x, y           : d16
pressure, size : f32
tilt-x, tilt-y : d16
tool           : u8

If the datatype is specified as EYES (=16):

head position  : f32[3]
head angle     : f32[3]
gaze-region    : f32[4]
user-present   : u8

Target

Target command events are authoritative instructions flowing from sink to source. Their numbering and allocations have evolved organically, with gaps in event value caused by deprecation or being masked due to poor translation from a local to network processing model. Values not present in this set MUST transition the connection to a terminal state.

Most of these require little intervention on the protocol level, but are expected to have a meaningful translation to the local windowing system.

EXIT (1)

The exit event means that the channel will be severed. No further event processing will be considered in either direction. This SHOULD result in a COMMAND-CLOSE on the channel.

FRAMESKIP (2)

framecount : s32

The frameskip event means that only every 'framecount' frames should be sent. This is useful for fast-forward stepping through contents. This can be implemented either at the protocol layer or in the local windowing system, IF it supports such a feature.

RESET (9)

level : s32

The RESET event means that the internal state of the source should change due to a request from a user or an error in the local windowing system. The level MUST be one of:

  0 : Soft
  1 : Hard
  2 : Recovery
  3 : Reconnect

Soft means that content and application state should return to as close to the initial state as possible. Hard extends Soft to also include renegotiation of additional resources such as fonts. Recovery extends Hard with the annotation that any and all previously accumulated state has been lost. Reconnect extends Hard, but content preferences may also be different as the backing connection may have migrated to another windowing environment.

PAUSE (10)

The PAUSE event means that all events other than RESET, UNPAUSE or EXIT MUST be ignored or discarded.

UNPAUSE (11)

The UNPAUSE event cancels out the restrictions from a previous PAUSE event.

SEEKTIME (12)

mode      : s32
timestamp : f32

The SEEKTIME event indicates that if the data source has a seekable notion of temporally dependent content, it SHOULD seek as close to the desired time as possible.

The mode MUST be one of:

0 : Relative
1 : Absolute

Relative timestamp value is relative to the current content position and the value is in discrete monotonic ticks on some local reference clock.

Absolute position is a floating point percentage in the 0..1 range, with 0 meaning the start of the stream and 1 the end of the stream.

SEEKCONTENT (13)

mode : s32
    mode = 0 (relative)
        dx : s32
        dy : s32
        dz : s32
    mode = 1 (absolute)
         x : f32
         y : f32
         z : f32

The SEEKCONTENT event MAY be used if the source has previously issued a CONTENTHINT event indicating that there is spatial content which does not fit the current window dimensions.

The absolute coordinate defines the upper left corner as a 0..1 encoded percentage of the current window dimensions.

DISPLAYHINT (14)

width       : s32
height      : s32
hint        : s32
layout      : s32
density     : f32
cell-width  : d32
cell-height : d32
token       : d64

The DISPLAYHINT event indicates to the source which dimensions it will be presented at. If these differ from the ones that the source has defined in its video stream, this means that the contents MAY be scaled to fit.

The tuple [cell-width, cell-height] are feedback to TPACK encoded channels about the nominal cell dimensions based on the currently active font and desired text size.

STREAMSET (16)

identifier : d32

The STREAMSET event MAY be sent to a source that has previously notified that there are alternate data streams for viewing the content through a STREAMINFO event. The provided identifier SHOULD be a value in the 0..n range provided in that event.

ATTENUATE (17)

gain : f32

The ATTENUATE event MAY be sent to a source to request that the input gain on any audio presented on the channel SHOULD be lowered to the gain value (within 0..1 range) before being passed as ASTREAM packets.

REQFAIL (20)

cookie : u32

The REQFAIL event MUST be passed in response to a BCHUNKSTATE or SEGREQ command that could not be fulfilled due to constraints in the local windowing system.

GRAPHMODE (23)

group: u32
color: u8[3]

The GRAPHMODE event is used to communicate preferred colors used to prepare VSTREAM transfers, depending on constraints passed in the local windowing system.

If the 8th bit is not set, it refers to the foreground colour. If the 8th bit is set for group, and the group value permits separate BACKGROUND/ FOREGROUND colours, the event refers to the BACKGROUND colour of the group.

The permitted group values are:

PRIMARY(2)    : base colour (FOREGROUND, REFERENCE)
SECONDARY(3)  : alternate colour, contrast to PRIMARY (FOREGROUND, REFERENCE)
BACKGROUND(4) : background colour, (BACKGROUND, REFERENCE)
TEXT(5)       : default content text (FOREGROUND, BACKGROUND)
CURSOR(6)     : input caret colour (FOREGROUND)
ALTCURSOR(7)  : input caret colour in locked/modal state (FOREGROUND)
HIGHLIGHT(8)  : text marked for user attention (FOREGROUND, BACKGROUND)
LABEL(9)      : text used for UI elements (FOREGROUND, BACKGROUND)
WARNING(10)   : text used to alert the user to a moderately severe problem
                (BACKGROUND, FOREGROUND)
ERROR(11)     : text used to alert the user to a severe problem
                (BACKGROUND, FOREGROUND)
ALERT(12)     : text used to alert the user towards immediate attention
                (BACKGROUND, FOREGROUND)
REFERENCE(13) : links to files or Internet URLs
                (BACKGROUND, FOREGROUND)
INACTIVE(14)  : text used for UI elements that cannot be accessed
                (BACKGROUND, FOREGROUND)
UI(15)        : text used for generic UI elements

The values from 16 to 31 are used for a reference palette matching the display attributes from VT100 descending terminals, in ascending order:

BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, LIGHT GREY, DARK GREY, LIGHT RED, LIGHT GREEN, LIGHT YELLOW, LIGHT BLUE, LIGHT MAGENTA, LIGHT CYAN.
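Splitting a received group value into its colour group and foreground/background selector can be sketched as below. The sketch assumes that "the 8th bit" means the bit with value 0x80 and that the remaining low bits carry the group number; it does not check whether the group actually permits a separate background colour.

```python
BACKGROUND_BIT = 0x80  # assumption: "8th bit" counted from 1, i.e. value 128

def decode_group(group: int):
    """Return (group number, 'foreground' or 'background')."""
    slot = "background" if group & BACKGROUND_BIT else "foreground"
    return group & 0x7F, slot
```

For instance, a group value of 5 selects the TEXT foreground colour, while the same value with the bit set selects the TEXT background colour.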

MESSAGE (24)

message : u8[78]

The message event SHOULD be used sparingly for domain specific workarounds, as well as short-form content on CLIPBOARD channel types.

The message field MAY be padded with NUL bytes but MUST NOT exceed the fixed length. For longer binary transfers, the BINARY-STREAM command and BINARY packets SHOULD be used in response to BCHUNKSTATE commands.

FONTHINT (25)

size-mm      : f32
hint         : u32
continuation : u32

FONTHINT is used to suggest desired properties of source text rasterization. It is combined with DISPLAYHINT in order to resolve to font-local formats.

If size is set to 0, the size is unchanged from previous values or some implementation defined default.

If there is an accompanying font file in a format supported by RFC8081, it should be transmitted through a BINARY-STREAM command and BINARY packets directly following the event.

The continuation field is set to (1) if the font transfer should append as fallback to glyphs not present in previous transfers.

GEOHINT (26)

latitude     : f32
longitude    : f32
elevation    : f32
country      : u8[4]
spoken-lang  : u8[4]
written-lang : u8[4]

The GEOHINT event is used to suggest parameters for supporting localisation and positioning, for sources which can adapt to such features.

The values for the country field follow ISO-3166-1 alpha-3 with NUL byte termination. The values for the spoken-lang and written-lang follow ISO-639-2 alpha-3 with NUL byte termination.

OUTPUTHINT (27)

max-width        : u32
max-height       : u32
vertical-refresh : u32
min-width        : u32
identifier       : u32
variable-min     : f32
variable-step    : f32

The OUTPUTHINT event is used to provide details about the physical displays that the segment source is mapped to and SHOULD be provided prior to the DISPLAYHINT which covers how the segment is presented.

ACTIVATE (28)

The ACTIVATE event is provided to terminate the set of events provided in the initial burst when a new channel has been mapped.

ANCHORHINT (30)

relative-x : s32
relative-y : s32
relative-z : s32
source     : s64
parent     : s64
namespace  : u32

The ANCHORHINT event is used to relay information about positioning in local sink windowing system. The relative position values are to some global anchor if parent is not referenced.

The source field is set to a segment token if the event relays information about other windows that the target channel has a pre-established relationship to.

If 'namespace' is set to 1, the source and parent fields reference source-provided identifiers instead of sink-provided segment identifiers.

External

External events are descriptive events from source to sink. They MAY affect behaviour on sink processing, but any actions are implementation-defined by the local windowing system.

Values not present in this set MUST transition the connection to a terminal state.

MESSAGE (0)

data       : u8[78]
multipart  : u8

This corresponds to the MESSAGE command event, with the notable change that if multipart is set to 1 the MESSAGE is a continuation of the previous one and should be merged together at a discrete UTF-8 boundary.
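A minimal sketch of how a sender might split a longer text so that no UTF-8 codepoint is cut in half. The function name `split_message` is hypothetical, and the flag assignment follows a literal reading of the paragraph above (continuations carry multipart = 1). Whether a terminating NUL byte must be reserved inside the 78 byte field is not covered here; pass `limit=77` if so.

```python
def split_message(text, limit=78):
    """Split text into (data, multipart) pairs without cutting a codepoint."""
    if limit < 4:
        raise ValueError("limit must fit the longest UTF-8 sequence")
    data = text.encode("utf-8")
    parts = []
    start = 0
    while start < len(data):
        end = min(start + limit, len(data))
        # Back up while 'end' points at a UTF-8 continuation byte (10xxxxxx).
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        # multipart = 1 marks a continuation of the previous event.
        parts.append((data[start:end], 0 if start == 0 else 1))
        start = end
    return parts

text = "a" + "é" * 100   # 201 bytes once encoded
parts = split_message(text)
```

The receiver can concatenate the data fields in arrival order and decode the result once the sequence is complete.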

IDENT (2)

data       : u8[78]
multipart  : u8

IDENT changes the dynamic identity of the segment bound window on the channel. This is used for ancillary tags to the immutable name provided in REGISTER, such as the name of a currently open document.

STREAMINFO (6)

identifier : u8
kind       : u8
language   : u8[4]

This indicates that there are alternate media streams that can be switched to without creating a new segment or channel. The STREAMSET command is used to activate the stream specified by identifier.

The kind field can be one of:

0 - Audio
1 - Video
2 - Text
3 - Overlay

The values for the language field follow ISO-639-2 alpha-3 with NUL byte termination.

STREAMSTATUS (7)

time-string  : u8[9]
time-limit   : u8[9]
completion   : f32
streaming    : u8
frame-number : u32

The STREAMSTATUS event MAY be used to convey metadata about the ongoing VSTREAM transfer on the channel.

The time-string and time-limit fields are NUL terminated 7-bit ASCII in the HH:MM:SS format showing the current time and total runtime length if streaming is set to 0 and the time is known.

If the time is unknown but the frame count is known, the completion field is set to a value in the 0..1 range estimating the percentage (frame-number / total-frames).

If there is no time information, the streaming field MUST be set to 1.

The frame-number field is the sequential monotonic counter of the frame position in the stream.
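The rules above can be sketched as follows. The little-endian layout, the helper names and the choice to leave completion at 0 when the time is known are assumptions of this example rather than requirements from the text.

```python
import struct

LAYOUT = "<9s9sfBI"  # time-string, time-limit, completion, streaming, frame-number

def hms(seconds):
    """Format seconds as NUL terminated 7-bit ASCII HH:MM:SS (9 bytes)."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours > 99:
        raise ValueError("duration does not fit HH:MM:SS")
    return ("%02d:%02d:%02d" % (hours, minutes, secs)).encode("ascii") + b"\x00"

def pack_streamstatus(frame, position=None, length=None, total_frames=None):
    if position is not None and length is not None:
        # Time is known: streaming = 0 and both time fields are filled in.
        return struct.pack(LAYOUT, hms(position), hms(length), 0.0, 0, frame)
    # No time information: streaming MUST be 1; estimate from frame count if known.
    completion = frame / total_frames if total_frames else 0.0
    return struct.pack(LAYOUT, bytes(9), bytes(9), completion, 1, frame)

timed = pack_streamstatus(1500, position=61, length=3600)
untimed = pack_streamstatus(250, total_frames=1000)
```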

STATESIZE (8)

size : u32
type : u32

The STATESIZE event is provided to inform the Sink end that the segment supports saving and restoring state through an accompanying BSTREAM and STATE-IN or STATE-OUT target event.

The type field is a custom weak identifier used when there are multiple segments that support state management independently.

SEGMENT-REQUEST (10)

identifier : u32
width      : u16
height     : u16
x-offset   : s16
y-offset   : s16
direction  : u8
hints      : u8
kind       : u8

The SEGMENT-REQUEST event is used to request the local windowing system to create a new window that the A12 transport can map to a new channel.

The identifier is a caller-chosen value to be paired with a REQFAIL if the request cannot be completed.

The width and height fields specify the preferred initial dimensions, with x-offset and y-offset the relative position to the parent the request event is sent through.

The kind field is a type hint about the purpose of the new window. The set of permitted values are:

Arcan             : 1
Media             : 2
Terminal          : 3
Sensor            : 4
Game              : 5
Application       : 6
Browser           : 7
Virtual Machine   : 8
Stereoscopic      : 9
Popup             : 10
Icon              : 11
Titlebar          : 12
Cursor            : 13
Accessibility     : 14
Clipboard         : 15
Widget            : 16
Text-UI           : 17
Service           : 18
X11               : 19
Wayland           : 20
Handover          : 21
Audio             : 22
Debug             : 255

The generic ones with fitting translations in most windowing systems would be Media (low interactivity, high asymmetric throughput), Application (generic option when nothing else matches), Game (latency over fidelity, timing sensitive and highly interactive), Browser (complex security model), Virtual Machine (resizes act as expensive display events, input is device native), Popup (short-lived, recurring contents with grab and focus semantics) and Terminal (TPACK format buffers, cell oriented layout, sizing and input binning).

Audio is for providing multiple positioned channels to mix with those of the parents, and VIEWPORT events become a way of spatially positioning the audio.

Icon, Titlebar and Cursor are used to subdivide the parent for custom decoration and alternate visual identity purposes.

Accessibility and Debug are used to annotate contents of the parent.

The direction field is a window management hint to suggest how the window should align to the parent, possibly affecting its size.

The permitted values are:

0 : don't care
1 : split and position to the left
2 : split and position to the right
3 : split and position above
4 : split and position below
5 : attach to the left
6 : attach to the right
7 : attach above
8 : attach below
9 : set as embedded tab
10 : set as embedded inside window canvas
11 : replace parent until closed

CURSORHINT (12)

name : u8[78]

CURSORHINT changes the mouse cursor that the windowing system should apply when a mouse cursor is over the surface. The suggested 'name' SHOULD match one of:

wait, forbidden, grabhint, crosshair, hand, zoom-in, zoom-out, help,
context-menu, typefield, datafield, vertical-datafield, cell, alias, drag,
drag-drop, drag-reject, sizeall, west, east, north, south, west-east,
north-south, north-west, south-west, north-east, south-east,
north-west-south-east, south-west-north-east

With the extensions of:

default, hidden, hidden-rel, hidden-abs

These are used for temporarily disabling the cursor and specifying the preferred sample format (movement delta or window local coordinates).

This mechanism is also used to warp the cursor when VIEWPORT reanchoring a CURSOR segment is not preferable or possible:

hidden-hot:x,y, input:x,y, warp:x,y

With the values encoded in the name.
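A sink could separate plain cursor names from the coordinate-carrying forms roughly as below. This is a sketch only: the function name is hypothetical, and it assumes that x and y are written as decimal integers directly after the colon.

```python
ENCODED = ("hidden-hot", "input", "warp")

def parse_cursor_name(name):
    """Return (kind, None) for plain names or (kind, (x, y)) for encoded ones."""
    kind, sep, args = name.partition(":")
    if not sep:
        return kind, None
    if kind not in ENCODED:
        raise ValueError("unexpected coordinates on %r" % kind)
    x, y = (int(value) for value in args.split(","))
    return kind, (x, y)

plain = parse_cursor_name("crosshair")
warp = parse_cursor_name("warp:120,48")
```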

VIEWPORT (13)

x                   : s32
y                   : s32
width               : u32
height              : u32
parent-identifier   : u32
border              : u8[4]
embedded            : u8
invisible           : u8
focus               : u8
anchor-edge         : u8
anchor-rectangle    : u8
external-identifier : u32

The VIEWPORT event is used to request reanchoring relative to a parent on behalf of the originating channel bound segment OR another one assuming the correct token identifier can be provided in the external-identifier field.

The border field annotates the number of pixels at the top, left, right and bottom edges of the segment that make up a border. This border can be cropped away, or used as a drag trigger when decorations are composited into the surface.

If anchor edge is set to 1, the x, y are relative to a specific edge of the parent. The permitted values are:

0       : any
1       : Upper-left
2       : Upper-right
3       : Upper-center
4       : Center-left
5       : Center
6       : Center-right
7       : Lower-left
8       : Lower-center
9       : Lower-right

If anchor-rectangle is set to 1 the width and height fields are applied to the anchor edge to specify a region of the edge to choose from.

CONTENT-STATE (14)

x-position  : f32
y-position  : f32
x-size      : f32
y-size      : f32
cell-width  : u8
cell-height : u8
min-width   : u8
min-height  : u8
max-width   : u8
max-height  : u8

The CONTENT-STATE event is used to indicate where the current VIEWPORT exists for a window where only parts of the contents are visible. This allows the SEEKCONTENT command to suggest that the active region should change, and allows the sink end to provide UI controls, such as scrollbars, to assist interaction.

The x-position, y-position, x-size and y-size are in the 0..1 floating point range showing the relative percentage in regards to the full (1) window.

The cell-width and cell-height fields provide resizing alignment hints in surface pixels for content to be provided without cropping outside tile boundaries.

The min-width, min-height, max-width, max-height provide resize constraints for how big or small the local rasterisation can handle without causing visual artifacts.

LABELHINT (15)

label       : u8[16]
initial     : u16
description : u8[53]
symbol      : u8[5]
subid       : u16
datatype    : u8
modifiers   : u16

LABELHINT provides annotation tags for Input events and default keybindings for the local window system to provide user assistance with avoiding conflicting keybindings.

The label is a NUL byte terminated ASCII string using the restricted set [a..Z0..9_] for the label field of the input event indicating the LABELHINT an input corresponds to.

Description is a short NUL byte terminated UTF-8 encoded GEOHINT language adjusted description of what the input does.

The symbol is a single NUL byte terminated UTF-8 encoded UNICODE codepoint that can be used as a visual reference for the label in a user interface.

The subid, datatype and modifiers fields correspond to the default bindings for the Input as per the corresponding Input event.

REGISTER (16)

title : u8[64]
type  : u8
guid  : u64[2]

The REGISTER event is only valid PRIOR to receiving an ACTIVATE event. The title field is a NUL byte terminated UTF-8 encoded immutable user presentable text identifier.

The type field corresponds to the set of types described in the SEGMENT-REQUEST event.

The GUID is a 128-bit value packed as two 64-bit values as per [RFC4122] for use with remembering windowing system local properties across sessions.
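The guid field (u64[2], 16 bytes in total) could be filled from a UUID as sketched here. Which half goes first and the little-endian encoding of each word are assumptions of this example.

```python
import struct
import uuid

MASK64 = 0xFFFFFFFFFFFFFFFF

def pack_guid(value):
    """Split a UUID into two 64-bit words (high word first, assumed)."""
    return struct.pack("<QQ", value.int >> 64, value.int & MASK64)

def unpack_guid(raw):
    high, low = struct.unpack("<QQ", raw)
    return uuid.UUID(int=(high << 64) | low)

original = uuid.UUID("12345678-1234-5678-1234-567812345678")
packed = pack_guid(original)
restored = unpack_guid(packed)
```

A source would typically generate the value once (for instance with `uuid.uuid4()`) and persist it, so that the sink can recognise the same window across sessions.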

ALERT (17)

message   : u8[78]
multipart : u8

The ALERT event is used to provide a notification of content changes of immediate importance to the user. The message is multipart terminated (0) and UTF8-encoded based on preferences from the latest received GEOHINT command.

BCHUNKSTATE (19)

size           : u64
input          : u8
hint           : u8
stream         : u8
extension      : u8[64]
identifier     : u32

The BCHUNKSTATE event is used to announce the capability, or the loss of a previously announced capability, for handling BINARY-STREAM with paired BCHUNK-IN, BCHUNK-OUT commands.

The hint field suggests the context for triggering the commands, with permitted values being one out of the following:

1      : immediate
2      : all-data
3      : multipart
4      : cursor

The immediate value suggests that any windowing system local facility for picking files, such as a save/open dialog, should be triggered immediately.

The all-data value indicates that the source has facilities for parsing and managing any arbitrary octet stream and the extensions field MUST be ignored.

The multipart value indicates that the BCHUNKSTATE should be appended to any previously received BCHUNKSTATE events.

The cursor value indicates that the event is paired with a cursor drag-and-drop action.

The extension is a set of UTF8-encoded, NUL byte terminated and ; separated elements of accepted file extensions as a reduced type model.
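For illustration, the extension field could be built and read back as follows. The helper names are hypothetical, and padding the unused part of the 64 byte field with NUL bytes is an assumption.

```python
def pack_extensions(extensions, width=64):
    """Join extensions with ';' and NUL pad to the fixed field width."""
    raw = ";".join(extensions).encode("utf-8")
    if len(raw) >= width:  # keep room for at least one terminating NUL
        raise ValueError("extension list does not fit the field")
    return raw.ljust(width, b"\x00")

def unpack_extensions(field):
    """Read up to the first NUL byte and split on ';'."""
    raw = field.split(b"\x00", 1)[0]
    return [item for item in raw.decode("utf-8").split(";") if item]

field = pack_extensions(["png", "jpg", "svg"])
names = unpack_extensions(field)
```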

The identifier field MUST be used with a REQFAIL command following an immediate hinted BCHUNKSTATE event.

INPUTMASK (22)

device    : u32
type      : u32

The INPUTMASK is used to SUGGEST that the sink exclude a subset of input events from being passed. The device field is a bitmask of the corresponding device kind, and type is the input type to be excluded. These are described in Input.

Example Flow and Lifecycle

The following example works through a full session between a source and a sink from authentication to interaction and data exchange with recovery from unhandled compression.

The direction for connection initiation can be swapped based on context and needs.

  1. Source binds TCP port 6680 and listens for inbound connections.
  2. Sink generates ephemeral keypair and connects to the IP and PORT of the Source.
  3. Sink sends the ephemeral-round HELLO command with version number and role.
  4. Source receives the HELLO command, verifies version number and that the role is a Sink.
  5. Source generates ephemeral keypair, sends an ephemeral HELLO command and derives the ephemeral session keys.
  6. Sink receives the HELLO command and derives the ephemeral session keys.
  7. The HELLO handshake is repeated using the derived session keys, providing the real public keys.
  8. Source performs local window system dependent operation to access the software to share and sends a REGISTER event.
  9. Sink provides the initial set of target commands, terminating with an ACTIVATE event.
  10. Source sends DEFINE-VIDEO-STREAM with parameters according to the software that has been shared and marks the encoding as H264. It then starts compressing video frames as they are received from the local software.
  11. Sink receives the DEFINE-VSTREAM command, notes that it doesn't support the compression format used and sends a STREAM-CANCEL command with the reason that the format is unsupported. It discards any video frame packets belonging to the stream that may be in flight.
  12. Source receives the STREAM-CANCEL and resubmits the frame in the lossless ZSTD compressed full/delta frame format.
  13. Sink receives the stream, unpacks into the local windowing system, ensuring that the window dimensions match those of the received frame.
  14. When Sink receives device input or events with a matching translation in the event model from the local windowing system, it repackages them and formats as Event packets.
  15. Source unpacks and forwards event packages as they arrive, making sure to quickly convert any frames they might produce.
  16. When the user decides to close the window, the Sink sends a SHUTDOWN command on the channel and both ends close the TCP socket.
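The format fallback in steps 10 to 12 can be sketched as a small negotiation loop. The example is synchronous and ignores packets in flight; the function names and the tuple representation of commands are hypothetical and only meant to show the control flow.

```python
def sink_reply(codec, supported):
    """Return None to accept, or a (command, reason) pair to cancel."""
    if codec in supported:
        return None
    return ("STREAM-CANCEL", "unsupported format")

def negotiate(preference, supported):
    """Try formats in order of preference, recording the exchanged commands."""
    log = []
    for codec in preference:
        log.append(("DEFINE-VIDEO-STREAM", codec))
        reply = sink_reply(codec, supported)
        if reply is None:
            return codec, log
        log.append(reply)
    raise RuntimeError("no mutually supported video format")

# The source prefers H264, the sink only handles the lossless ZSTD format.
codec, log = negotiate(["H264", "ZSTD"], supported={"ZSTD"})
```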

Directory Extension

The 'role' specified during authentication can, as mentioned, be source, sink, probe or directory. The directory one is an extension to the base protocol which adds alternate context interpretation of some events, as well as adding a handful of new commands.

This extension is experimental, and some commands may be modified with revisions to this document.

The purpose of the directory is to work as a rendezvous for discovery, state storage, transform and messaging for your fleet of a12 capable clients.

A client connected to a directory with the 'sink' role MAY be permitted to list as an available data sink. Consequently, a client connected with the 'source' role MAY be permitted to list as an available data source.

A client connected as probe may use the LIST command (9):

notify : u8

If notify is set to 1, the directory MAY send updates to the index of available directory resources at any time.

The directory server SHOULD reply to a LIST with zero or more DIRECTORY-STATE (10) and zero or more DIRECTORY-DISCOVER (11) commands.

DIRECTORY-STATE (10) contains the following fields:

identifier  : u16
reserved-1  : u16
reserved-2  : u16
checksum    : u8[4]
size        : u64
name        : u8[16]
description : u8[94]

Identifier is a directory-local identifier for an appl. An appl is a set of Lua scripts and ancillary resources packaged according to the FAP format below. The Identifier (0) is reserved and it is RECOMMENDED that the directory allocate identifiers incrementally from 1 and onwards for available appls.
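The DIRECTORY-STATE fields add up to 128 bytes and could be decoded as in this sketch. Little-endian packing and NUL padded UTF-8 text fields are assumptions, and the appl name used in the sample is made up.

```python
import struct
from collections import namedtuple

# identifier, reserved-1, reserved-2, checksum, size, name, description
DIRSTATE = struct.Struct("<HHH4sQ16s94s")
DirectoryState = namedtuple(
    "DirectoryState", "identifier checksum size name description")

def text(raw):
    """Decode a fixed width field up to its first NUL byte."""
    return raw.split(b"\x00", 1)[0].decode("utf-8")

def unpack_directory_state(raw):
    ident, _r1, _r2, checksum, size, name, desc = DIRSTATE.unpack(raw)
    return DirectoryState(ident, checksum, size, text(name), text(desc))

raw = DIRSTATE.pack(1, 0, 0, b"\xde\xad\xbe\xef", 4096, b"demo", b"sample appl")
state = unpack_directory_state(raw)
```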

DIRECTORY-DISCOVER (11) contains the following fields:

role       : u8
state      : u8
petname    : u8[16]
public-key : u8[32]

With role indicating 0 for a source, 1 for a sink and 2 for a linked directory. The state field is set to 0 if the entity has been added, and 1 if it has been removed.

The petname matches the petname provided when the other end negotiated its connection as per the HELLO command. It is possible for the directory to provide multiple entries for the same petname using different public-keys. This is useful when there is on-demand load balancing provisioning of sources.

To access an announced source or sink, the DIRECTORY-OPEN (12) command can be used:

mode      : u8
public-key: u8[32]

This requests that the directory server provide a connection to the source, sink or linked directory with a public key matching the one provided. Three different modes are supported and can be provided as a bitmap:

1 = inbound
2 = outbound
4 = tunnel.

The mode chosen depends on reachability of the two endpoints and the selection is up to the discretion of the directory implementation. Of particular note is the tunnel mode (4), where the pre-existing directory connection will be used to tunnel a connection between the two.
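Since the modes form a bitmap, a client can offer several at once. A small sketch, with the class name `OpenMode` being hypothetical:

```python
from enum import IntFlag

class OpenMode(IntFlag):
    INBOUND = 1
    OUTBOUND = 2
    TUNNEL = 4

# A client behind NAT might accept an outbound connection or a tunnel.
mode = OpenMode.OUTBOUND | OpenMode.TUNNEL
wire_value = int(mode)          # value for the 'mode : u8' field
decoded = OpenMode(wire_value)  # directory side interpretation
```

The directory can then test individual bits, for example `OpenMode.TUNNEL in decoded`, before choosing how to connect the two endpoints.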

The directory server MUST respond to an OPEN command with a 'DIRECTORY-OPENED' command (13). This command carries the following fields:

status     : u8
address    : u8[46]
port       : u16
secret     : u8[12]
public-key : u8[32]

Status can be one of:

0 indicating failure
1 direct inbound connection
2 direct outbound connection
3 tunnel

The connection can fail (status = 0) if there is no negotiable solution for the two endpoints to reach each other, or if either of them has disconnected while the request was made.

The address field will carry an ASCII encoded IPv4 or IPv6 address in the case of an inbound or outbound connection, or a directory-local tunnel ID if the directory server will act as a tunnel.

The provided public-key is the key the other end will use to initiate the connection, and the secret will be used as passphrase for authenticating the initial HELLO command. The directory MUST use a UNIQUE cryptographically secure pseudorandom number generator for generating the secret.
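A sketch of producing and reading a DIRECTORY-OPENED payload. Little-endian packing, a NUL padded address field and the use of raw random bytes for the secret are assumptions; the text above only requires a cryptographically secure generator, which Python's `secrets` module provides through the operating system.

```python
import ipaddress
import secrets
import struct

OPENED = struct.Struct("<B46sH12s32s")  # status, address, port, secret, public-key

def make_opened(status, address, port, public_key):
    secret = secrets.token_bytes(12)  # fresh CSPRNG output for every request
    return OPENED.pack(status, address.encode("ascii"), port, secret, public_key)

def parse_opened(raw):
    status, address, port, secret, public_key = OPENED.unpack(raw)
    host = address.split(b"\x00", 1)[0].decode("ascii")
    if status in (1, 2):  # direct connection: the field carries an IP address
        host = ipaddress.ip_address(host)
    return {"status": status, "host": host, "port": port,
            "secret": secret, "public_key": public_key}

# 2001:db8::1 is a documentation address; the all-zero key is a placeholder.
raw = make_opened(1, "2001:db8::1", 6680, bytes(32))
opened = parse_opened(raw)
```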

The negotiated tunnel identifier corresponds to a channel, and transfers to/from the other end come as BINARY packets across that channel. Any other activity on that channel MUST be ignored.

To terminate a tunnel relay session, any of the three parties (SOURCE, SINK, DIRECTORY) issues a DROP-TUNNEL (14) command with the matching identifier as the payload.

File Transfer

The DEFINE-BINARY-STREAM command is used to initiate a binary transfer as per Section 5.3, Binary. For regular file transfers, the pairing event used to transfer metadata and as a trigger for creating the stream is BCHUNKSTATE, which MUST carry a namespace selection identifier, a request identifier, a unique name as the 'extensions' part of the event and desired direction.

The namespace identifier corresponds to (0 = private) or a valid APPLID provided as a response or notification following the LIST command. Any name starting with a dot '.' is reserved for protocol use.

The name '.index' SHALL be used to transfer a list of files available in the namespace. The format for the .index is encoded as a number of line separated entries using UTF-8 encoded key[=value] with : as separator between keys.
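A minimal parser for the .index layout described above. The key names in the sample are illustrative only, and the sketch assumes that neither keys nor values contain ':' since no escaping rule is given here.

```python
def parse_index(text):
    """Parse line separated entries of ':' separated key[=value] items."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        entry = {}
        for item in line.split(":"):
            key, sep, value = item.partition("=")
            entry[key] = value if sep else None  # bare keys carry no value
        entries.append(entry)
    return entries

sample = "name=readme.txt:size=120\nname=logo.png:size=2048:readonly\n"
index = parse_index(sample)
```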

If a requested name does not exist (for download), or there is insufficient permission (for upload or download), the directory server MUST respond with a TARGET_COMMAND_REQFAIL event with the corresponding request identifier.

If the request is permitted, the directory server MUST initiate the transfer through the DEFINE-BINARY-STREAM command.

The '.appl' name is reserved for accessing the directory server appl store in order to retrieve or update the client side portion of an appl.

FAP format

FAP - (Format, Arcan, Package) is used to package an appl as described in the DIRECTORY-STATE command. These use the same key/value encoding scheme as with .index files, where each entry MUST contain a 'path' key, a 'name' key and a 'size' key.

The 'size' key value MUST be set to a string encoded value of the number of bytes belonging to path/name to consume from the bytestream unencoded. This will be followed by a new entry until there are no more bytes left to consume.
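Reading such a package could look like the sketch below. It assumes that each entry header is a single line terminated by a newline and that the raw bytes start directly after it; the file names in the sample are made up.

```python
def read_fap(blob):
    """Return {path/name: contents} from a FAP style byte stream."""
    files = {}
    pos = 0
    while pos < len(blob):
        eol = blob.index(b"\n", pos)  # assumed entry terminator
        items = blob[pos:eol].decode("utf-8").split(":")
        entry = dict(item.partition("=")[::2] for item in items)
        for key in ("path", "name", "size"):
            if key not in entry:
                raise ValueError("entry is missing the %r key" % key)
        size = int(entry["size"])  # string encoded byte count
        start = eol + 1
        body = blob[start:start + size]
        if len(body) != size:
            raise ValueError("stream ends before the entry is complete")
        files[entry["path"] + "/" + entry["name"]] = body
        pos = start + size  # the next entry follows immediately
    return files

blob = (b"path=scripts:name=main.lua:size=12\nprint('hi')\n"
        b"path=images:name=dot.bin:size=3\n\x00\x01\x02")
files = read_fap(blob)
```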

Discovery Extension

A12 has an optional broadcast domain discovery protocol. It is intended to work inside the message domain of a network of directory servers, in the broadcast domain of an IPv4 network and multicast domain of an IPv6 network.

Discovery here means establishing a network path to an entity where a previous authenticated relationship exists. The entity that should be 'discovered' issues a beacon, and entities that should affirm knowledge of this entity reply to a beacon given certain prerequisites.

Beacon

The beacon consists of an 8 byte NONCE, taken from a CSPRNG source, followed by one digest per X25519 public key, stacked together as (NONCE, H(NONCE, Kpub1) .. H(NONCE, Kpubn)).

The beacon is sent in the broadcast domain. The sender then waits for 1 second and sends a second beacon as (NONCE+1, H(NONCE+1, Kpub1) .. H(NONCE+1, Kpubn)). This provides a 'proof of elapsed time' that is more expensive for the source of the beacon to calculate than for the recipient to verify.

The recipient of a beacon sweeps its keystore, looking for matches to H(NONCE, Kpub) to pair to a known petname-Kpub pair.

A number of Kpub identities can be packed in the same beacon as the authentication setup and reference tooling encourages differentiation of multiple keypairs to one identity.
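The beacon construction and keystore sweep can be sketched as below. Several details are assumptions of this example and not taken from the text: BLAKE2s from the Python standard library stands in for H (the Dependencies section lists BLAKE3 for hashing), the digest length is 32 bytes, H hashes the nonce followed by the key, and NONCE+1 treats the nonce as a little-endian 64-bit integer.

```python
import hashlib
import secrets

DIGEST = 32  # assumed digest length

def h(nonce, kpub):
    # Stand-in hash, NOT the BLAKE3 construction of a real implementation.
    return hashlib.blake2s(nonce + kpub).digest()

def make_beacon(nonce, kpubs):
    """NONCE followed by one digest per public key."""
    return nonce + b"".join(h(nonce, kpub) for kpub in kpubs)

def next_nonce(nonce):
    value = (int.from_bytes(nonce, "little") + 1) % 2**64
    return value.to_bytes(8, "little")

def match_beacon(beacon, keystore):
    """Return the petnames whose known public key appears in the beacon."""
    nonce, body = beacon[:8], beacon[8:]
    digests = {body[i:i + DIGEST] for i in range(0, len(body), DIGEST)}
    return [pet for pet, kpub in keystore.items() if h(nonce, kpub) in digests]

# Placeholder 32 byte values instead of real X25519 public keys.
keystore = {"laptop": bytes([1]) * 32, "phone": bytes([2]) * 32}
nonce = secrets.token_bytes(8)
first = make_beacon(nonce, [keystore["phone"]])
second = make_beacon(next_nonce(nonce), [keystore["phone"]])
```

A recipient would run `match_beacon` on both beacons and only act once the pair has arrived with the expected delay between them.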

Beacon Response

A device that sees a valid beacon pair with a valid timeout checks its known keystore and calculates H(NONCE, Kpub) for a match. If there is one, it calculates the H(NONCE, Self.Kpub) for the public key used to establish the identity in the past to the device in question.

It sweeps the keystore for any Kpub that match, and can use that to initiate a direct connection and/or alert outer user interfaces that the paired petname has been discovered.

This scheme ensures that it takes more time to calculate a beacon pair over a keyset than it takes to verify it, with no amplification to bytes in flight on an attempt to spoof.

Tools and Reference Implementations

The repository at fossil.arcan-fe.com contains libraries and command-line tools which act as the reference implementation for the protocol.

The components involved are:

Future Changes

The following additions are planned, primarily to the Directory extension:

Acknowledgements

Parts of this work were funded by the NGI0 Entrust fund administered by NLnet and supported through the European Commission 'Next Generation Internet' Programme.

Work on the specification and reference tooling has been provided by Bjorn Stahl and Valts Liepins.

References

[HKDF]: https://eprint.iacr.org/2010/264 "Cryptographic Extraction and Key Derivation: The HKDF Scheme", Proceedings of CRYPTO 2010, Krawczyk, H.

[BLAKE3]: https://www.ietf.org/id/draft-aumasson-blake3-00.html "The BLAKE3 Hashing Framework", Aumasson, J-P. Neves, S. O'Connor, J., Wilcox, Z.

[ZSTD]: https://datatracker.ietf.org/doc/html/rfc8878 "Zstandard Compression and the 'application/zstd' Media Type", Colette, Y.

[H264]: https://www.loc.gov/preservation/digital/formats/fdd/fdd000081.shtml "MPEG-4, Advanced Video Coding (Part 10) (H.264)"

[CHACHA20]: https://datatracker.ietf.org/doc/html/rfc7539 "ChaCha20 and Poly1305 for IETF Protocols", Nir, Y., Langley, A.

[TOOMUCHCRYPTO]: https://eprint.iacr.org/2019/1492 "Too Much Crypto", Aumasson, J-P

[SHMIF]: https://arcan-fe.com/2024/11/21/a-deeper-dive-into-the-shmif-ipc-system "A deeper dive into the SHMIF IPC system", Stahl, B.

[XLIBREF]: "XLIB Reference Manual R5", Nye, A.

[RFC4254]: https://datatracker.ietf.org/doc/html/rfc4254 "The Secure Shell (SSH) Connection Protocol", Ylonen, T., Lonvick, C.

[RFC6143]: https://datatracker.ietf.org/doc/html/rfc6143 "The Remote Framebuffer Protocol", Richardson, T., Levine, J.

[RFC4122]: https://datatracker.ietf.org/doc/html/rfc4122 "A Universally Unique IDentifier (UUID) URN Namespace", Leach, P., Mealing, M., Saltz, R.

[RFC8081]: https://datatracker.ietf.org/doc/html/rfc8081 "The 'font' Top-Level Media Type", Lilley, C.

[MSRDP]: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-rdpbcgr "Remote Desktop Protocol"