
Extended QOA Format v1.1

Last updated 23 June 2025

The Extended QOA Format (XQAF or XQA) is an audio format derived from the original Quite OK Audio Format (QOA). It adds metadata to QOA to make it a more usable everyday format for music and audio listening.

The recommended file extension for Extended QOA Format audio files is ".xqa".

All values in this document are BIG ENDIAN unless otherwise noted, matching the original QOA specification.

XQATool is the official reference tool for XQAF. It can also encode/decode/convert normal QOA files.

Overall Structure

If the file is meant to be streamed over a network, then this SHOULD be the layout of the file:

[Header]
[Tag Data]
[Raw QOA Data]

However, the [Tag Data] can appear anywhere after the [Header] except within the [Raw QOA Data] section. If the [Tag Data] appears within the [Raw QOA Data] section, the file MUST be considered invalid.
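The overlap rule can be checked by treating the tag data and the Raw QOA Data as byte ranges. The Python sketch below assumes the offsets and sizes have already been read from the header and that a tag offset of zero means no tag is present; `regions_overlap` and `tag_placement_is_valid` are hypothetical helpers for illustration, not part of XQATool.

```python
# Hypothetical helpers for illustration; not XQATool's API.

def regions_overlap(start_a: int, size_a: int, start_b: int, size_b: int) -> bool:
    """True if the half-open byte ranges [start, start + size) share any byte."""
    if size_a == 0 or size_b == 0:
        return False  # an empty range cannot overlap anything
    return start_a < start_b + size_b and start_b < start_a + size_a


def tag_placement_is_valid(qoa_offset: int, qoa_size: int,
                           tag_offset: int, tag_size: int) -> bool:
    """Check that the tag data (if any) does not overlap the Raw QOA Data."""
    if tag_offset == 0:
        return True  # assumption: a tag offset of zero means "no metadata tag"
    return not regions_overlap(qoa_offset, qoa_size, tag_offset, tag_size)


# Streaming layout: 34-byte header, 100 bytes of tags, then the QOA data.
print(tag_placement_is_valid(134, 1000, 34, 100))  # True
# A tag that starts inside the QOA data makes the file invalid.
print(tag_placement_is_valid(34, 1000, 500, 10))   # False
```

Using half-open ranges means a tag that ends exactly where the QOA data begins, as in the streaming layout, is not treated as overlapping.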

If the file is not meant to be streamed over a network, then you MAY use either the streaming layout above, or the following layout:

[Header]
[Raw QOA Data]
[Tag Data]

Other layouts are also valid, but not recommended. Examples of other valid layouts:

[Header]
[Raw QOA Data]
[Tag Data]

[Header]
[Tag Data]
[Raw QOA Data]

Header

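Extended QOA Format files have a larger header than the original QOA format: 34 bytes, or 46 bytes when the 64-bit field flag (bit 31 of the Flags field) is set.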

[HEADER START]
  [4 Bytes] - Magic Bytes
    **MUST** be the characters "XQAF" in ASCII (so the bytes $58 $51 $41 $46).

  [2 Bytes] - Version
    First byte is the major version, second byte is the minor version (e.g. the
    bytes $02 $01 is version 2.1).
                   
  [4 Bytes] - Flags
    32-bit unsigned integer.  Flags for the Extended QOA Format file; see the
    Flags section below.

  [3 Bytes] - Sample Rate
    24-bit unsigned integer.  The output sample rate for the Raw QOA Data.
    **MUST** be between 1 and 16777215, inclusive.

  [4 Bytes] - Total Samples
    32-bit unsigned integer.  Total number of samples in the Raw QOA Data.  A
    value of zero must be considered invalid.

  [1 Byte] - Channels
    8-bit unsigned integer.  The number of channels for the Raw QOA Data.
    **MUST** be between 1 and 255, inclusive.

  [4 Bytes] - Offset to QOA Data
    32-bit unsigned integer (64-bit when the 64-bit field flag is set).  Offset
    to the Raw QOA Data relative to [HEADER START].  A data offset less than 34
    (or an offset less than 46 when the 64-bit field flag is set) must be
    considered invalid.

  [4 Bytes] - QOA Size
    32-bit unsigned integer (64-bit when the 64-bit field flag is set).  Total
    size of the Raw QOA Data, in bytes.  A size of zero must be considered
    invalid.

  [4 Bytes] - Tag Offset
    32-bit unsigned integer (64-bit when the 64-bit field flag is set).  Offset
    to the tag data relative to [HEADER START].  **MUST** be zero if no metadata
    tag is present.

    When this is not zero, an offset less than 34 (or an offset less than 46
    when the 64-bit field flag is set) must be considered invalid.
    Additionally, when the tag offset is non-zero, the tag data must not
    overlap the QOA data; otherwise the file must be considered invalid.

  [4 Bytes] - Tag Size
    32-bit unsigned integer, even when the 64-bit field flag is set.  Total
    size of the tag data, in bytes.  **MUST** be zero if no metadata tag is
    present.  When the tag data is compressed, this is the total size of the
    compressed Vorbis Comment data; otherwise it is the total size of the
    uncompressed Vorbis Comment data.
[HEADER END]
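A minimal header reader might look like the following Python sketch. It assumes the whole header is already in memory, reads the Flags field before the offset fields because bit 31 decides their width, and applies the zero-value and minimum-offset checks listed above. The `XqaHeader` class and `parse_header` function are hypothetical names for illustration; the sketch does not check the tag/QOA overlap rule.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# Hypothetical reader for illustration; not XQATool's API.
FLAG_64BIT_FIELDS = 1 << 31  # bit 31 of the Flags field
HEADER_SIZE_32 = 34          # 4+2+4+3+4+1+4+4+4+4
HEADER_SIZE_64 = 46          # three fields widen from 4 to 8 bytes


@dataclass
class XqaHeader:
    version: Tuple[int, int]
    flags: int
    sample_rate: int
    total_samples: int
    channels: int
    qoa_offset: int
    qoa_size: int
    tag_offset: int
    tag_size: int


def parse_header(data: bytes) -> XqaHeader:
    """Parse an XQAF header from the start of `data`; all fields are big-endian."""
    if len(data) < HEADER_SIZE_32 or data[:4] != b"XQAF":
        raise ValueError("not an Extended QOA Format header")
    major, minor = data[4], data[5]
    (flags,) = struct.unpack_from(">I", data, 6)
    # struct has no 24-bit format code, so the sample rate is decoded by hand.
    sample_rate = int.from_bytes(data[10:13], "big")
    total_samples, channels = struct.unpack_from(">IB", data, 13)

    wide = bool(flags & FLAG_64BIT_FIELDS)
    header_size = HEADER_SIZE_64 if wide else HEADER_SIZE_32
    if len(data) < header_size:
        raise ValueError("truncated header")
    # Offset to QOA Data, QOA Size and Tag Offset widen; Tag Size stays 32-bit.
    fmt = ">QQQI" if wide else ">IIII"
    qoa_offset, qoa_size, tag_offset, tag_size = struct.unpack_from(fmt, data, 18)

    if sample_rate == 0 or total_samples == 0 or channels == 0 or qoa_size == 0:
        raise ValueError("zero sample rate, sample count, channel count or QOA size")
    if qoa_offset < header_size or (tag_offset != 0 and tag_offset < header_size):
        raise ValueError("offset points into the header")
    if tag_offset == 0 and tag_size != 0:
        raise ValueError("tag size must be zero when no tag is present")
    return XqaHeader((major, minor), flags, sample_rate, total_samples, channels,
                     qoa_offset, qoa_size, tag_offset, tag_size)


example = (b"XQAF" + bytes([1, 1]) + struct.pack(">I", 0)
           + (44100).to_bytes(3, "big")
           + struct.pack(">IBIIII", 44100, 2, 34, 1000, 0, 0))
print(parse_header(example).channels)  # 2
```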

Flags

Bit 0 (the LSB): When 1, the next file should begin playing immediately after this track ends, without a gap (i.e. "gapless playback").
Bit 1: When 1, the tag data is compressed with Zstandard.
Bits 2 through 30 (inclusive): Reserved. These MUST be 0 in format versions 1.0 and 1.1; otherwise the file MUST be considered invalid.
Bit 31: When 1, the following header fields are 64-bit unsigned integers instead of 32-bit unsigned integers: Offset to QOA Data, QOA Size, and Tag Offset. The notable exception is Tag Size, which always remains 32-bit. Setting this bit increases the header size from 34 to 46 bytes.
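The bit assignments above can be decoded with simple masks. In this Python sketch, `decode_flags` and `header_size` are hypothetical helpers, and the reserved-bit check is applied only to versions 1.0 and 1.1 because the rule above is stated only for those versions.

```python
# Hypothetical helpers for illustration; not XQATool's API.
FLAG_GAPLESS = 1 << 0         # bit 0: next file plays with no gap
FLAG_TAG_ZSTD = 1 << 1        # bit 1: tag data is Zstandard-compressed
FLAG_64BIT_FIELDS = 1 << 31   # bit 31: three offset/size fields are 64-bit
RESERVED_MASK = 0x7FFFFFFC    # bits 2 through 30, inclusive


def decode_flags(flags: int, version: tuple) -> dict:
    """Split the 32-bit Flags field into named booleans."""
    # Reserved bits are only required to be zero for versions 1.0 and 1.1.
    if version in ((1, 0), (1, 1)) and flags & RESERVED_MASK:
        raise ValueError(f"reserved flag bits set: {flags & RESERVED_MASK:#010x}")
    return {
        "gapless": bool(flags & FLAG_GAPLESS),
        "tag_zstd": bool(flags & FLAG_TAG_ZSTD),
        "wide_fields": bool(flags & FLAG_64BIT_FIELDS),
    }


def header_size(flags: int) -> int:
    """34 bytes normally, 46 bytes when bit 31 widens three fields to 64 bits."""
    return 46 if flags & FLAG_64BIT_FIELDS else 34


print(decode_flags(0b11, (1, 1)))
# {'gapless': True, 'tag_zstd': True, 'wide_fields': False}
```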

Tag Data

Extended QOA Format files MUST use Vorbis Comments for their metadata tags. As in FLAC, and unlike in Ogg Vorbis streams, the framing bit must not be set.

The following tag definitions SHOULD be followed for Extended QOA Format files:

title - The primary title of the piece of work.
subtitle - Secondary title of the piece of work.
artist - The primary performer(s) of the work.
album - The album name.
date - Either the year the work was released, or the date it was released.
genre - The genre of the work.
track - Either a base-10 integer giving the track number, or a string of the form "x/y", where x and y are both integers: x is the track number and y is the total number of tracks on the album that contains the work.
composer - The composers of the work.
arranger - The arranger of the work.
comment - User comment(s).
copyright - The copyright information for the work.
language - The language(s) used within the work.
metadata_block_picture - An associated image for the work. See here for more information.
replaygain_track_gain - The gain to apply to the track. See the section on ReplayGain here.
replaygain_track_peak - The peak value of the track. See the section on ReplayGain here.
replaygain_album_gain - The gain to apply to the album. See the section on ReplayGain here.
replaygain_album_peak - The peak value of the album. See the section on ReplayGain here.
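As an example of applying these definitions, the two accepted forms of the `track` tag can be parsed with a strict pattern. This Python sketch is a hypothetical helper; rejecting whitespace and non-ASCII digits is a design choice made here, not something the definitions above require.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper for illustration; not XQATool's API.
# "x" or "x/y", where x and y are base-10 integers (ASCII digits only).
_TRACK_RE = re.compile(r"([0-9]+)(?:/([0-9]+))?")


def parse_track(value: str) -> Tuple[int, Optional[int]]:
    """Return (track number, total tracks or None) for a `track` tag value."""
    match = _TRACK_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed track tag: {value!r}")
    number, total = match.groups()
    return int(number), (int(total) if total is not None else None)


print(parse_track("7"))     # (7, None)
print(parse_track("3/12"))  # (3, 12)
```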

The Vorbis Comment data MAY be compressed in its entirety before writing it to an Extended QOA Format file. When it is, bit 1 in the Flags field of the Header MUST be set to a value of 1 (see the Flags section above). Compliant decoders MUST support compressed Vorbis Comment sections.

Note that the Vorbis Comment data uses little endian integers internally.
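The following Python sketch serializes and parses Vorbis Comment data, assuming the same layout FLAC uses for its VORBIS_COMMENT block: a 32-bit little-endian vendor string length, the UTF-8 vendor string, a 32-bit little-endian comment count, then each comment as a 32-bit little-endian length followed by a UTF-8 "name=value" string, with no framing bit. Zstandard compression is left out; when bit 1 of Flags is set, the tag bytes would have to be decompressed first. Both function names are hypothetical.

```python
import struct
from typing import List, Tuple

# Hypothetical helpers for illustration; assumes the FLAC-style layout.


def build_vorbis_comment(vendor: str, comments: List[Tuple[str, str]]) -> bytes:
    """Serialize Vorbis Comment data without a framing bit."""
    out = bytearray()
    vendor_bytes = vendor.encode("utf-8")
    out += struct.pack("<I", len(vendor_bytes)) + vendor_bytes  # little-endian
    out += struct.pack("<I", len(comments))
    for name, value in comments:
        entry = f"{name}={value}".encode("utf-8")
        out += struct.pack("<I", len(entry)) + entry
    return bytes(out)  # no trailing framing bit


def parse_vorbis_comment(data: bytes) -> Tuple[str, List[Tuple[str, str]]]:
    """Inverse of build_vorbis_comment; field names are lowercased on read."""
    pos = 0

    def read_u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    def read_string() -> str:
        nonlocal pos
        length = read_u32()
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("truncated Vorbis Comment data")
        pos += length
        return chunk.decode("utf-8")

    vendor = read_string()
    comments = []
    for _ in range(read_u32()):
        name, sep, value = read_string().partition("=")
        if not sep:
            raise ValueError("comment is missing '='")
        comments.append((name.lower(), value))
    return vendor, comments
```

For uncompressed tags, the length of the serialized bytes is what the Tag Size field would hold. Lowercasing names on read is a design choice here: Vorbis Comment field names are case-insensitive, and the tag list above uses lowercase.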

Raw QOA Data

The Raw QOA Data format is very closely based on, but not identical to, the original QOA format. The main difference is that there is no File Header, and the Frame Header does not include the number of channels or sample rate. Everything else works the same as the original QOA format. The decoding and encoding processes are the same.

The Raw QOA Data consists of a sequence of Frames. Each Frame contains a Frame Header, an LMS State Table, and a Slice List made up of Slices.

The main data structure is the Frame, which uses the following format:

[4 Bytes] - Frame Header
[16*N Bytes] - LMS State Table, where N is the number of channels
[X Bytes] - Slice List, where X is 8 times the number of Slices in the Frame

The Frame Header uses this format:

[2 Bytes] - Frame Samples
  16-bit unsigned integer.  The number of samples in this Frame.
[2 Bytes] - Frame Size
  16-bit unsigned integer.  The size of this Frame in bytes, including the
  Frame Header.
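Because every part of a Frame has a fixed size, the expected Frame Size can be computed from the sample count and compared with the stored value. The Python sketch below assumes, as in the original QOA format, that Frame Samples counts samples per channel; `parse_frame_header` and `expected_frame_size` are hypothetical helpers. A full stereo Frame (256 Slices of 20 samples per channel) comes to 4 + 16*2 + 8*256*2 = 4132 bytes.

```python
import struct
from typing import Tuple

# Hypothetical helpers for illustration; not XQATool's API.
FRAME_HEADER_SIZE = 4      # Frame Samples (u16) + Frame Size (u16)
LMS_STATE_SIZE = 16        # per channel: 4 x int16 history + 4 x int16 weights
SLICE_SIZE = 8             # 4-bit SF_Quant + 20 x 3-bit residuals = 64 bits
SAMPLES_PER_SLICE = 20
MAX_SLICES_PER_CHANNEL = 256


def parse_frame_header(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (frame_samples, frame_size), both big-endian u16."""
    frame_samples, frame_size = struct.unpack_from(">HH", data, pos)
    return frame_samples, frame_size


def expected_frame_size(frame_samples: int, channels: int) -> int:
    """Frame size in bytes, ASSUMING frame_samples counts samples per channel."""
    if not 1 <= frame_samples <= SAMPLES_PER_SLICE * MAX_SLICES_PER_CHANNEL:
        raise ValueError("a Frame holds 1 to 5120 samples per channel")
    # Ceiling division: a partially filled last Slice still occupies 8 bytes.
    slices_per_channel = (frame_samples + SAMPLES_PER_SLICE - 1) // SAMPLES_PER_SLICE
    return (FRAME_HEADER_SIZE
            + LMS_STATE_SIZE * channels
            + SLICE_SIZE * slices_per_channel * channels)


print(expected_frame_size(5120, 2))  # 4132
print(expected_frame_size(21, 1))    # 36
```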

The LMS State Table uses this format:

[8 Bytes] - History
  Array of four 16-bit signed integers.  The LMS State histories (most recent
  is last).
[8 Bytes] - Weights
  Array of four 16-bit signed integers.  The LMS State weights (most recent is
  last).

The Slice List uses this format:

[8 * 256 * Num Channels Bytes] - Slices
  An array of 8-byte Slices, 256 per channel.  The last Frame may contain
  fewer than 256 Slices per channel.

Each Slice uses this format:

[4 Bits] - SF_Quant
  The Quantized Scalefactor.

[60 Bits] - Residuals
  Array of 20 3-bit values, each one containing a quantized residual.
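The Python sketch below decodes one Slice following the QOA reference decoder: the Slice is read as a 64-bit big-endian integer, SF_Quant is its top 4 bits, and the 20 residuals follow from the most to the least significant bits. The scalefactor and dequantization values are taken from the original QOA format (scalefactors round((s + 1)^2.75), steps 0.75, 2.5, 4.5 and 7 with alternating signs, rounded half away from zero). `decode_slice` and its in-place update of `history` and `weights` are assumptions of this sketch; the LMS steps are inlined so the block stands alone.

```python
from typing import List

# Hypothetical decoder for illustration; tables come from the original QOA format.
SCALEFACTORS = (1, 7, 21, 45, 84, 138, 211, 304,
                421, 562, 731, 928, 1157, 1419, 1715, 2048)
# Steps 0.75, -0.75, 2.5, -2.5, 4.5, -4.5, 7, -7 expressed in quarters.
DQT_QUARTERS = (3, -3, 10, -10, 18, -18, 28, -28)


def dequantize(sf_quant: int, quantized: int) -> int:
    """round(scalefactor * step), rounding halves away from zero."""
    scaled = SCALEFACTORS[sf_quant] * DQT_QUARTERS[quantized]  # 4x the true value
    magnitude = (abs(scaled) + 2) // 4
    return magnitude if scaled >= 0 else -magnitude


def clamp_s16(value: int) -> int:
    return max(-32768, min(32767, value))


def decode_slice(slice_bytes: bytes, history: List[int], weights: List[int],
                 count: int = 20) -> List[int]:
    """Decode `count` samples from one 8-byte Slice; updates LMS lists in place."""
    bits = int.from_bytes(slice_bytes, "big")
    sf_quant = bits >> 60  # top 4 bits
    samples = []
    for i in range(count):
        quantized = (bits >> (57 - 3 * i)) & 0x7  # residuals follow, MSB first
        predicted = sum(w * h for w, h in zip(weights, history)) >> 13
        residual = dequantize(sf_quant, quantized)
        sample = clamp_s16(predicted + residual)
        samples.append(sample)
        # LMS update, as in the QOA reference decoder.
        delta = residual >> 4
        for j in range(4):
            weights[j] += -delta if history[j] < 0 else delta
        history[:] = history[1:] + [sample]
    return samples


print(decode_slice(bytes(8), [0, 0, 0, 0], [0, 0, 0, 0], count=3))  # [1, 1, 1]
```

The `count` parameter covers the last Slice of a channel, which may hold fewer than 20 samples.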

Each Frame except the last MUST contain exactly 256 Slices per channel. The last Frame may contain between 1 and 256 (inclusive) Slices per channel. The last Slice (for each channel) in the last Frame may contain fewer than 20 samples; the Slice still MUST be 8 bytes wide, and the unused samples MUST be zeroed out.

A valid Extended QOA Format file MUST have at least one frame. Each Frame MUST contain at least one channel, and MUST contain at least one sample.

Decoder Considerations

Decoders for Extended QOA Format files MUST support at least 2 channels, and SHOULD support at least 8 channels. Channel data is interleaved per Slice, so for example, a two-channel stereo file would have:

slice[0] = L, slice[1] = R, slice[2] = L, slice[3] = R …

Channel layouts for channel counts 1 through 8 are:

1: Mono
2: L, R
3: L, R, C
4: FL, FR, B/SL, B/SR
5: FL, FR, C, B/SL, B/SR
6: FL, FR, C, LFE, B/SL, B/SR
7: FL, FR, C, LFE, B, SL, SR
8: FL, FR, C, LFE, BL, BR, SL, SR
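Because channels are interleaved per Slice, Slice i in a Frame belongs to channel i mod N. The Python sketch below encodes the layout table above and groups a Frame's Slices by channel; `CHANNEL_LAYOUTS`, `channel_of_slice` and `split_slices` are hypothetical names for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration; speaker order as listed above.
CHANNEL_LAYOUTS: Dict[int, Tuple[str, ...]] = {
    1: ("Mono",),
    2: ("L", "R"),
    3: ("L", "R", "C"),
    4: ("FL", "FR", "B/SL", "B/SR"),
    5: ("FL", "FR", "C", "B/SL", "B/SR"),
    6: ("FL", "FR", "C", "LFE", "B/SL", "B/SR"),
    7: ("FL", "FR", "C", "LFE", "B", "SL", "SR"),
    8: ("FL", "FR", "C", "LFE", "BL", "BR", "SL", "SR"),
}


def channel_of_slice(slice_index: int, channels: int) -> int:
    """Slices are interleaved, so Slice i belongs to channel i mod channels."""
    return slice_index % channels


def split_slices(slices: Sequence[bytes], channels: int) -> List[List[bytes]]:
    """Group a Frame's interleaved Slice List into one list per channel."""
    per_channel: List[List[bytes]] = [[] for _ in range(channels)]
    for index, slice_bytes in enumerate(slices):
        per_channel[channel_of_slice(index, channels)].append(slice_bytes)
    return per_channel


print(CHANNEL_LAYOUTS[6][channel_of_slice(9, 6)])  # LFE
```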

Format Revision History

v1.1
- June 25th, 2025: Mention XQATool since it's the official reference tool.
- June 23rd, 2025: Clarified which values of certain header offset/size fields must be considered invalid. These rules were implicit before. No change to the format. Fixed typo regarding the framing bit.
- April 23rd, 2025: Removed the Seek Table.
v1.0
- April 6th, 2025: Initial format release.