Extended QOA Format v1.1
========================
Last updated 25 June 2025

The Extended QOA Format (XQAF or XQA) is an audio format derived from the
original [Quite OK Audio Format](https://qoaformat.org/).  Its goal is to add
metadata so that QOA becomes a more usable everyday format for music and audio
listening.

The recommended file extension for Extended QOA Format audio files is ".xqa".

All values in this document are BIG ENDIAN unless otherwise noted.  This is to
match the original QOA specifications.

[XQATool](https://nanako.mooo.com/fossil/xqatool/) is the official
reference tool for XQAF.  It can also encode/decode/convert normal QOA
files.

## Overall Structure

If the file is meant to be streamed over a network, then this **SHOULD** be the
layout of the file:

```
[Header]
[Tag Data]
[Raw QOA Data]
```

However, the `[Tag Data]` can appear anywhere else *except* within the
`[Raw QOA Data]` section.  If the `[Tag Data]` appears within the
`[Raw QOA Data]` section, then the file **MUST** be considered invalid.
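
The overlap rule above can be checked from the four offset and size fields described in the Header section. The following is a minimal sketch, not taken from XQATool; the function names `ranges_overlap` and `layout_is_valid` are hypothetical, and it assumes the header fields have already been read into plain integers.

```python
def ranges_overlap(a_offset: int, a_size: int, b_offset: int, b_size: int) -> bool:
    """Return True if byte ranges [a, a + a_size) and [b, b + b_size) intersect."""
    return a_offset < b_offset + b_size and b_offset < a_offset + a_size


def layout_is_valid(qoa_offset: int, qoa_size: int,
                    tag_offset: int, tag_size: int) -> bool:
    """Check that the Tag Data (if any) lies outside the Raw QOA Data."""
    if tag_offset == 0:
        # A Tag Offset of zero means no tag is present, so nothing can overlap.
        return True
    return not ranges_overlap(qoa_offset, qoa_size, tag_offset, tag_size)
```

Both the streaming layout (tag before the audio) and the non-streaming layout (tag after the audio) pass this check; only a tag that starts or ends inside the audio data is rejected.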

If the file is not meant to be streamed over a network, then you **MAY** use
either the streaming layout above, or the following layout:

```
[Header]
[Raw QOA Data]
[Tag Data]
```

Other layouts are also valid, but not recommended.

## Header

Extended QOA Format files have a larger header than original QOA files:

```
[HEADER START]
  [4 Bytes] - Magic Bytes
    **MUST** be the characters "XQAF" in ASCII (so the bytes $58 $51 $41 $46).

  [2 Bytes] - Version
    First byte is the major version, second byte is the minor version (e.g. the
    bytes $02 $01 is version 2.1).
                   
  [4 Bytes] - Flags
    32-bit unsigned integer.  Flags for the Extended QOA Format file.

  [3 Bytes] - Sample Rate
    24-bit unsigned integer.  The output sample rate for the Raw QOA Data.
    **MUST** be between 1 and 16777215, inclusive.

  [4 Bytes] - Total Samples
    32-bit unsigned integer.  Total number of samples in the Raw QOA Data.  A
    value of zero must be considered invalid.

  [1 Byte] - Channels
    8-bit unsigned integer.  The number of channels for the Raw QOA Data.
    **MUST** be between 1 and 255, inclusive.

  [4 Bytes] - Offset to QOA Data
    32-bit unsigned integer.  Offset to the Raw QOA Data relative to
    [HEADER START].  A data offset less than 34 (or an offset less than 46 when
    the 64-bit field flag is set) must be considered invalid.

  [4 Bytes] - QOA Size
    32-bit unsigned integer.  Total size of the Raw QOA Data, in bytes.  A size
    of zero must be considered invalid.

  [4 Bytes] - Tag Offset
    32-bit unsigned integer.  Offset to the tag data relative to [HEADER START].
    **MUST** be zero if no metadata tag is present.
    
    When this is not zero, then an offset less than 34 (or an offset less than
    46 when the 64-bit field flag is set) must be considered invalid.
    Additionally, when the tag offset is non-zero, then the tag data must not
    overlap the QOA data, or else the file must be considered invalid.

  [4 Bytes] - Tag Size
    32-bit unsigned integer.  Total size of the tag data, in bytes. **MUST** be
    zero if no metadata tag is present.  When the tag data is compressed, this
    is the total size of the compressed Vorbis Comment data, otherwise it is
    the total size of the uncompressed Vorbis Comment data.
[HEADER END]
```
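
The field list above adds up to 34 bytes, or 46 bytes when the 64-bit field flag is set. As an illustration only, here is a minimal Python sketch of reading and validating that header. The names `XqaHeader` and `parse_header` are hypothetical and not part of the format or of XQATool, and the sketch leaves out the reserved-bit and overlap checks.

```python
import struct
from typing import NamedTuple

FLAG_64BIT_FIELDS = 1 << 31  # Bit 31 of Flags: offset/size fields are 64-bit


class XqaHeader(NamedTuple):  # hypothetical container, not part of the format
    major: int
    minor: int
    flags: int
    sample_rate: int
    total_samples: int
    channels: int
    qoa_offset: int
    qoa_size: int
    tag_offset: int
    tag_size: int


def parse_header(data: bytes) -> XqaHeader:
    """Parse and validate the big-endian header at the start of `data`."""
    if len(data) < 34 or data[:4] != b"XQAF":
        raise ValueError("missing magic bytes or truncated header")
    major, minor, flags = struct.unpack_from(">BBI", data, 4)
    # struct has no 24-bit code, so the Sample Rate is read by hand.
    sample_rate = int.from_bytes(data[10:13], "big")
    total_samples, channels = struct.unpack_from(">IB", data, 13)

    wide = bool(flags & FLAG_64BIT_FIELDS)
    header_size = 46 if wide else 34
    if len(data) < header_size:
        raise ValueError("truncated header")
    # Tag Size stays 32-bit even when the other three fields are widened.
    layout = ">QQQI" if wide else ">IIII"
    qoa_offset, qoa_size, tag_offset, tag_size = struct.unpack_from(layout, data, 18)

    if sample_rate == 0 or total_samples == 0 or channels == 0:
        raise ValueError("sample rate, total samples and channels must be non-zero")
    if qoa_offset < header_size or qoa_size == 0:
        raise ValueError("invalid QOA data offset or size")
    if tag_offset != 0 and tag_offset < header_size:
        raise ValueError("tag offset points inside the header")
    return XqaHeader(major, minor, flags, sample_rate, total_samples, channels,
                     qoa_offset, qoa_size, tag_offset, tag_size)
```

Reading the three fixed-width groups separately keeps the 24-bit Sample Rate out of the `struct` format strings, which have no code for 3-byte integers.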

### Flags

* Bit 0 (the LSB): When 1, the file that follows this one should begin playing
  immediately after this track ends (i.e. "gapless playback").
* Bit 1: Tag data is compressed with ZStandard.
* Bits 2 through 30 (inclusive): Reserved.  These **MUST** be 0 in format
  versions 1.0 and 1.1, otherwise the file **MUST** be considered invalid.
* Bit 31: When this is 1, the following header fields are 64-bit unsigned
  integers instead of 32-bit unsigned integers: Offset to QOA Data, QOA Size,
  Tag Offset.  The notable exception is Tag Size, which always remains 32-bit.
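
The bit assignments above translate directly into masks. This is a small sketch of decoding the Flags field; the constant names and the `decode_flags` function are hypothetical, chosen only for this example.

```python
FLAG_GAPLESS = 1 << 0          # Bit 0: gapless playback into the next file
FLAG_TAG_COMPRESSED = 1 << 1   # Bit 1: Tag Data is compressed with ZStandard
FLAG_64BIT_FIELDS = 1 << 31    # Bit 31: offset/size fields are 64-bit
RESERVED_MASK = 0x7FFFFFFC     # Bits 2 through 30, inclusive


def decode_flags(flags: int) -> dict:
    """Split the 32-bit Flags value into booleans, rejecting reserved bits."""
    if flags & RESERVED_MASK:
        raise ValueError("reserved flag bits are set; file is invalid")
    return {
        "gapless": bool(flags & FLAG_GAPLESS),
        "tag_compressed": bool(flags & FLAG_TAG_COMPRESSED),
        "wide_fields": bool(flags & FLAG_64BIT_FIELDS),
    }
```

The reserved-bit check applies to format versions 1.0 and 1.1 as stated above; a decoder that also handles later versions would need to make it version-dependent.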

## Tag Data

Extended QOA Format files **MUST** use Vorbis Comments for their metadata tags.
Unlike some formats, the framing bit must not be set (just like in FLAC).

The following tag definitions **SHOULD** be followed for Extended QOA Format
files:

* `title` - The primary title of the piece of work.
* `subtitle` - Secondary title of the piece of work.
* `artist` - The primary performer(s) of the work.
* `album` - The album name.
* `date` - Either the year the work was released, or the date it was released.
* `genre` - The genre of the work.
* `track` - Either an integer in base 10 indicating the track number; or a
  string in the format "x/y" where x and y are both integers, where x is the
  track number and y is the total number of tracks of the album that contains
  the work.
* `composer` - The composers of the work.
* `arranger` - The arranger of the work.
* `comment` - User comment(s).
* `copyright` - The copyright information for the work.
* `language` - The language(s) used within the work.
* `metadata_block_picture` - An associated image for the work.  See
  [here](https://wiki.xiph.org/index.php/VorbisComment#Cover_art) for more
  information.
* `replaygain_track_gain` - The gain to apply to the track.  See the section on
  [ReplayGain here](https://wiki.xiph.org/index.php/VorbisComment#Replay_Gain).
* `replaygain_track_peak` - The peak value of the track.  See the section on
  [ReplayGain here](https://wiki.xiph.org/index.php/VorbisComment#Replay_Gain).
* `replaygain_album_gain` - The gain to apply to the album.  See the section on
  [ReplayGain here](https://wiki.xiph.org/index.php/VorbisComment#Replay_Gain).
* `replaygain_album_peak` - The peak value of the album.  See the section on
  [ReplayGain here](https://wiki.xiph.org/index.php/VorbisComment#Replay_Gain).
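
The `track` tag is the only entry above with two accepted forms, so a reader of the tag has to handle both. A minimal sketch, using a hypothetical helper named `parse_track`:

```python
from typing import Optional, Tuple


def parse_track(value: str) -> Tuple[int, Optional[int]]:
    """Return (track_number, total_tracks); total_tracks is None if absent."""
    number, separator, total = value.partition("/")
    # int() raises ValueError if either part is not a base-10 integer.
    return int(number), (int(total) if separator else None)
```

For example, `parse_track("3")` yields a track number with no total, while `parse_track("3/12")` yields both values.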
  
The Vorbis Comment data **MAY** be compressed in its entirety before writing it
to an Extended QOA Format file.  When it is, bit 1 in the Flags field of the
Header **MUST** be set to a value of 1 (see the Flags section above).  Compliant
decoders **MUST** support compressed Vorbis Comment sections.

Note that the Vorbis Comment data uses little endian integers internally.
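
Because the tag block is little endian while the rest of the file is big endian, it is easy to read it with the wrong byte order. The sketch below parses an uncompressed Vorbis Comment block with no framing bit. The function name `parse_vorbis_comment` is hypothetical. The sketch assumes the caller has already decompressed the data when bit 1 of the Flags field is set; ZStandard decompression itself is not shown.

```python
import struct
from typing import Dict, List, Tuple


def parse_vorbis_comment(block: bytes) -> Tuple[str, Dict[str, List[str]]]:
    """Parse an uncompressed Vorbis Comment block into (vendor, tags)."""
    pos = 0

    def read_string() -> str:
        nonlocal pos
        (length,) = struct.unpack_from("<I", block, pos)  # little endian
        pos += 4
        if pos + length > len(block):
            raise ValueError("truncated Vorbis Comment data")
        text = block[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    (count,) = struct.unpack_from("<I", block, pos)
    pos += 4

    tags: Dict[str, List[str]] = {}
    for _ in range(count):
        key, _sep, value = read_string().partition("=")
        # Vorbis Comment field names are case-insensitive, and a field may
        # appear more than once, so values are collected in a list.
        tags.setdefault(key.lower(), []).append(value)
    return vendor, tags
```

Keeping every value in a list preserves repeated fields, such as several `artist` entries, instead of silently keeping only the last one.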

## Raw QOA Data

The Raw QOA Data format is very closely based on, but not identical to, the
original QOA format.  The main difference is that there is no File Header, and
the Frame Header does not include the number of channels or sample rate.
Everything else works the same as the original QOA format.  The decoding and
encoding processes are the same.

The Raw QOA Data format consists of Frames, Frame Headers, LMS State Tables,
Slice Lists, and Slices.

The main data structure is the Frame, which uses the following format, where N
is the number of channels:

```
[4 Bytes] - Frame Header
[16*N Bytes] - LMS State Table
[X Bytes] - Slice List
```

The Frame Header uses this format:

```
[2 Bytes] - Frame Samples
  16-bit unsigned integer.  The number of samples in this Frame.
[2 Bytes] - Frame Size
  16-bit unsigned integer.  The size of this Frame in bytes, including the
  Frame Header.
```
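
Since each Frame Header carries the size of its own Frame, a reader can walk the Raw QOA Data frame by frame without decoding any audio. A minimal sketch, using a hypothetical generator named `iter_frames` that takes the Raw QOA Data as a `bytes` object:

```python
import struct
from typing import Iterator, Tuple


def iter_frames(raw: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, frame_samples, frame_size) for each Frame in `raw`."""
    pos = 0
    while pos < len(raw):
        # Both fields are big-endian 16-bit unsigned integers.
        frame_samples, frame_size = struct.unpack_from(">HH", raw, pos)
        # A Frame Size below 4 could not even hold the header and would
        # make this loop spin forever, so it is rejected here.
        if frame_size < 4 or pos + frame_size > len(raw):
            raise ValueError(f"bad Frame Size at offset {pos}")
        yield pos, frame_samples, frame_size
        pos += frame_size
```

The offsets it yields are relative to the start of the Raw QOA Data, not to the start of the file.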

The LMS State Table uses this format:

```
[8 Bytes] - History
  Array of four 16-bit signed integers.  The LMS State histories (most recent
  is last).
[8 Bytes] - Weights
  Array of four 16-bit signed integers.  The LMS State weights (most recent is
  last).
```

The Slice List uses this format:

```
[256 * Num Channels] - Slices
  An array of 8-byte Slices, 256 per channel.  There may be fewer than 256 per
  channel if this is the last Frame.
```

Each Slice uses this format:

```
[4 Bits] - SF_Quant
  The Quantized Scalefactor.

[60 Bits] - Residuals
  Array of 20 3-bit values, each one containing a quantized residual.
```
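
A Slice is exactly 64 bits, so it can be read as one big-endian integer and split with shifts. The sketch below assumes the bit order of the original QOA format, where the scalefactor occupies the top 4 bits and the first residual sits in the next most significant 3 bits. The function name `unpack_slice` is hypothetical.

```python
from typing import List, Tuple


def unpack_slice(slice_bytes: bytes) -> Tuple[int, List[int]]:
    """Split an 8-byte Slice into (sf_quant, [20 quantized residuals])."""
    if len(slice_bytes) != 8:
        raise ValueError("a Slice is always 8 bytes wide")
    value = int.from_bytes(slice_bytes, "big")
    sf_quant = value >> 60  # top 4 bits
    # Residual i occupies bits (59 - 3*i) down to (57 - 3*i).
    residuals = [(value >> (57 - 3 * i)) & 0x7 for i in range(20)]
    return sf_quant, residuals
```

This only unpacks the quantized values; turning them into samples still requires the dequantization and LMS prediction steps of the original QOA decoder.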

Each Frame except the last **MUST** contain exactly 256 Slices per channel.  The
last Frame may contain between 1 and 256 (inclusive) Slices per channel.  The
last Slice (for each channel) in the last Frame may contain fewer than 20
samples; the Slice still **MUST** be 8 bytes wide, and the unused samples
**MUST** be zeroed out.

A valid Extended QOA Format file **MUST** have at least one frame. Each Frame
**MUST** contain at least one channel, and **MUST** contain at least one sample.

## Decoder Considerations

Decoders for Extended QOA Format files **MUST** support at least 2 channels, and
**SHOULD** support at least 8 channels.  Channel data is interleaved per Slice,
so for example, a two-channel stereo file would have:

```
slice[0] = L, slice[1] = R, slice[2] = L, slice[3] = R …
```

Channel layouts for channel counts 1 through 8 are:

1. Mono
2. L, R
3. L, R, C
4. FL, FR, B/SL, B/SR
5. FL, FR, C, B/SL, B/SR
6. FL, FR, C, LFE, B/SL, B/SR
7. FL, FR, C, LFE, B, SL, SR
8. FL, FR, C, LFE, BL, BR, SL, SR
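
The layouts above and the per-Slice interleaving can be expressed as a lookup table plus a modulo. This is a sketch only; `CHANNEL_LAYOUTS`, `slice_channel` and `slice_label` are hypothetical names, and the labels are copied verbatim from the list above.

```python
CHANNEL_LAYOUTS = {
    1: ("Mono",),
    2: ("L", "R"),
    3: ("L", "R", "C"),
    4: ("FL", "FR", "B/SL", "B/SR"),
    5: ("FL", "FR", "C", "B/SL", "B/SR"),
    6: ("FL", "FR", "C", "LFE", "B/SL", "B/SR"),
    7: ("FL", "FR", "C", "LFE", "B", "SL", "SR"),
    8: ("FL", "FR", "C", "LFE", "BL", "BR", "SL", "SR"),
}


def slice_channel(slice_index: int, channels: int) -> int:
    """Channel index that a Slice belongs to, given per-Slice interleaving."""
    return slice_index % channels


def slice_label(slice_index: int, channels: int) -> str:
    """Layout label for a Slice, for channel counts 1 through 8."""
    return CHANNEL_LAYOUTS[channels][slice_channel(slice_index, channels)]
```

Channel counts above 8 have no layout defined in this document, so `slice_label` raises `KeyError` for them while `slice_channel` still works.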

## Format Revision History

* **v1.1**
  * **June 25th, 2025:** Mention XQATool since it's the official reference tool.
  * **June 23rd, 2025:** Clarified that certain header offset/size fields should
    be considered invalid.  These were implicit before.  No change to the
    format.  Fixed typo regarding the framing bit.
  * **April 23rd, 2025:** Removed the Seek Table.
* **v1.0**
  * **April 6th, 2025:** Initial format release.