Extended QOA Format v1.1
========================
Last updated 23 June 2025
The Extended QOA Format (XQAF or XQA) is an audio format derived from the
original [Quite OK Audio Format](https://qoaformat.org/), with the goal of
adding additional metadata to turn QOA into a more usable everyday audio format
for music/audio listening.
The recommended file extension for Extended QOA Format audio files is ".xqa".
All values in this document are BIG ENDIAN unless otherwise noted. This is to
match the original QOA specifications.
[XQATool](https://nanako.mooo.com/fossil/xqatool/) is the official
reference tool for XQAF. It can also encode/decode/convert normal QOA
files.
## Overall Structure
If the file is meant to be streamed over a network, then this **SHOULD** be the
layout of the file:
```
[Header]
[Tag Data]
[Raw QOA Data]
```
However, the `[Tag Data]` can appear anywhere else *except* within the
`[Raw QOA Data]` section. If the `[Tag Data]` appears within the
`[Raw QOA Data]` section, then the file **MUST** be considered invalid.
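
The placement rule above amounts to an interval check on the offsets and sizes stored in the Header. The following is a minimal sketch, assuming both regions are treated as half-open byte ranges; the function names `regions_overlap` and `tag_placement_is_valid` are hypothetical and not part of the format.

```python
def regions_overlap(start_a: int, size_a: int, start_b: int, size_b: int) -> bool:
    """Return True if two half-open byte ranges [start, start + size) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a


def tag_placement_is_valid(qoa_offset: int, qoa_size: int,
                           tag_offset: int, tag_size: int) -> bool:
    """Check that the Tag Data never sits inside the Raw QOA Data."""
    if tag_offset == 0:
        # A zero tag offset means no tag is present, so there is nothing to check.
        return True
    return not regions_overlap(qoa_offset, qoa_size, tag_offset, tag_size)
```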
If the file is not meant to be streamed over a network, then you **MAY** use
either the streaming layout above, or the following layout:
```
[Header]
[Raw QOA Data]
[Tag Data]
```
Other layouts are also valid, but not recommended. Examples of other valid
layouts:
```
[Header]
[Raw QOA Data]
[Tag Data]
```
```
[Header]
[Tag Data]
[Raw QOA Data]
```
## Header
Extended QOA Format files have a larger header than original QOA files.
```
[HEADER START]
[4 Bytes] - Magic Bytes
**MUST** be the characters "XQAF" in ASCII (so the bytes $58 $51 $41 $46).
[2 Bytes] - Version
First byte is the major version, second byte is the minor version (e.g. the
bytes $02 $01 is version 2.1).
[4 Bytes] - Flags
32-bit unsigned integer. Flags for the Extended QOA Format file.
[3 Bytes] - Sample Rate
24-bit unsigned integer. The output sample rate for the Raw QOA Data.
**MUST** be between 1 and 16777215, inclusive.
[4 Bytes] - Total Samples
32-bit unsigned integer. Total number of samples in the Raw QOA Data. A
value of zero must be considered invalid.
[1 Byte] - Channels
8-bit unsigned integer. The number of channels for the Raw QOA Data.
**MUST** be between 1 and 255, inclusive.
[4 Bytes] - Offset to QOA Data
32-bit unsigned integer. Offset to the Raw QOA Data relative to
[HEADER START]. A data offset less than 34 (or an offset less than 46 when
the 64-bit field flag is set) must be considered invalid.
[4 Bytes] - QOA Size
32-bit unsigned integer. Total size of the Raw QOA Data, in bytes. A size
of zero must be considered invalid.
[4 Bytes] - Tag Offset
32-bit unsigned integer. Offset to the tag data relative to [HEADER START].
**MUST** be zero if no metadata tag is present.
When this is not zero, then an offset less than 34 (or an offset less than
46 when the 64-bit field flag is set) must be considered invalid.
Additionally, when the tag offset is non-zero, then the tag data must not
overlap the QOA data, or else the file must be considered invalid.
[4 Bytes] - Tag Size
32-bit unsigned integer. Total size of the tag data, in bytes. **MUST** be
zero if no metadata tag is present. When the tag data is compressed, this
is the total size of the compressed Vorbis Comment data, otherwise it is
the total size of the uncompressed Vorbis Comment data.
[HEADER END]
```
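
As an illustration of the layout above, here is a minimal Python sketch of a header parser. It is not taken from XQATool; the names `XqaHeader` and `parse_header` are hypothetical, and only the minimum-offset and non-zero checks listed above are applied (reserved flag bits and the tag/QOA overlap rule are not checked here).

```python
import struct
from typing import NamedTuple

FLAG_64BIT_FIELDS = 1 << 31  # bit 31 of the Flags field
PREFIX_FMT = ">4sBBI3sIB"    # magic, major, minor, flags, rate, samples, channels
PREFIX_SIZE = struct.calcsize(PREFIX_FMT)  # 18 bytes


class XqaHeader(NamedTuple):
    version: tuple
    flags: int
    sample_rate: int
    total_samples: int
    channels: int
    qoa_offset: int
    qoa_size: int
    tag_offset: int
    tag_size: int


def parse_header(data: bytes) -> XqaHeader:
    """Parse and validate the header at the start of `data`."""
    if len(data) < PREFIX_SIZE:
        raise ValueError("truncated header")
    magic, major, minor, flags, rate_raw, samples, channels = struct.unpack_from(
        PREFIX_FMT, data, 0)
    if magic != b"XQAF":
        raise ValueError("bad magic bytes")
    sample_rate = int.from_bytes(rate_raw, "big")  # 24-bit big-endian value

    # Bit 31 widens both offsets and the QOA size to 64 bits; Tag Size stays 32-bit.
    tail_fmt = ">QQQI" if flags & FLAG_64BIT_FIELDS else ">IIII"
    header_size = PREFIX_SIZE + struct.calcsize(tail_fmt)  # 34 or 46
    if len(data) < header_size:
        raise ValueError("truncated header")
    qoa_offset, qoa_size, tag_offset, tag_size = struct.unpack_from(
        tail_fmt, data, PREFIX_SIZE)

    if sample_rate == 0 or samples == 0 or channels == 0 or qoa_size == 0:
        raise ValueError("zero value in a field that must be non-zero")
    if qoa_offset < header_size:
        raise ValueError("QOA data offset points inside the header")
    if tag_offset != 0 and tag_offset < header_size:
        raise ValueError("tag offset points inside the header")
    return XqaHeader((major, minor), flags, sample_rate, samples, channels,
                     qoa_offset, qoa_size, tag_offset, tag_size)
```

The header size is derived from the format strings rather than hard-coded, so the 34-byte and 46-byte minimums fall out of the same code path.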
### Flags
* Bit 0 (the LSB): When 1, the next file should begin playing immediately
after this one, with no gap in between (i.e. "gapless playback").
* Bit 1: Tag data is compressed with ZStandard.
* Bits 2 through 30 (inclusive): Reserved. These **MUST** be 0 in format
versions 1.0 and 1.1, otherwise the file **MUST** be considered invalid.
* Bit 31: When this is 1, the following header fields are 64-bit unsigned
integers instead of 32-bit unsigned integers: Offset to QOA Data, QOA Size,
Tag Offset. The notable exception is Tag Size, which always remains 32-bit.
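
The bit assignments above can be decoded with a few masks. This is a small sketch only; the constant names and the `decode_flags` helper are hypothetical, and the reserved-bit check applies to format versions 1.0 and 1.1 as described above.

```python
FLAG_GAPLESS = 1 << 0          # bit 0: gapless playback into the next file
FLAG_TAG_COMPRESSED = 1 << 1   # bit 1: tag data is compressed with ZStandard
FLAG_64BIT_FIELDS = 1 << 31    # bit 31: offsets and QOA size are 64-bit
RESERVED_MASK = 0x7FFFFFFC     # bits 2 through 30, inclusive


def decode_flags(flags: int) -> dict:
    """Split the 32-bit Flags value into named booleans; reject reserved bits."""
    if flags & RESERVED_MASK:
        raise ValueError("reserved flag bits are set")
    return {
        "gapless": bool(flags & FLAG_GAPLESS),
        "tag_compressed": bool(flags & FLAG_TAG_COMPRESSED),
        "wide_fields": bool(flags & FLAG_64BIT_FIELDS),
    }
```

For example, `decode_flags(0x80000002)` reports a compressed tag and 64-bit fields, with gapless playback off.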
## Tag Data
Extended QOA Format files **MUST** use Vorbis Comments for their metadata tags.
Unlike some formats, the framing bit must not be set (just like in FLAC).
The following tag definitions **SHOULD** be followed for Extended QOA Format
files:
* `title` - The primary title of the piece of work.
* `subtitle` - Secondary title of the piece of work.
* `artist` - The primary performer(s) of the work.
* `album` - The album name.
* `date` - Either the year the work was released, or the date it was released.
* `genre` - The genre of the work.
* `track` - Either an integer in base 10 indicating the track number; or a
string in the format "x/y" where x and y are both integers, where x is the
track number and y is the total number of tracks of the album that contains
the work.
* `composer` - The composers of the work.
* `arranger` - The arranger of the work.
* `comment` - User comment(s).
* `copyright` - The copyright information for the work.
* `language` - The language(s) used within the work.
* `metadata_block_picture` - An associated image for the work. See
[here](https://wiki.xiph.org/index.php/VorbisComment#Cover_art) for more
information.
* `replaygain_track_gain` - The gain to apply to the track. See the section on
[ReplayGain here](https://wiki.xiph.org/index.php/VorbisComment#Replay_Gain).
* `replaygain_track_peak` - The peak value of the track. See the section on
[ReplayGain here](https://wiki.xiph.org/index.php/VorbisComment#Replay_Gain).
* `replaygain_album_gain` - The gain to apply to the album. See the section on
[ReplayGain here](https://wiki.xiph.org/index.php/VorbisComment#Replay_Gain).
* `replaygain_album_peak` - The peak value of the album. See the section on
[ReplayGain here](https://wiki.xiph.org/index.php/VorbisComment#Replay_Gain).
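
The two accepted forms of the `track` tag can be told apart by splitting on the slash. The sketch below assumes track numbers are non-negative ASCII decimal integers, which the definition above does not state explicitly; `parse_track` is a hypothetical helper name.

```python
def parse_track(value: str):
    """Return (track_number, total_tracks) where total_tracks may be None."""
    number, sep, total = value.partition("/")

    def is_decimal(text: str) -> bool:
        # Assumption: only plain ASCII digits count as a base-10 integer.
        return text.isascii() and text.isdigit()

    if not is_decimal(number) or (sep and not is_decimal(total)):
        raise ValueError(f"malformed track tag: {value!r}")
    return int(number), (int(total) if sep else None)
```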
The Vorbis Comment data **MAY** be compressed in its entirety before writing it
to an Extended QOA Format file. When it is, bit 1 in the Flags field of the
Header **MUST** be set to a value of 1 (see the Flags section above). Compliant
decoders **MUST** support compressed Vorbis Comment sections.
Note that the Vorbis Comment data uses little endian integers internally.
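
To illustrate the little endian layout, here is a minimal sketch of a reader for an uncompressed Vorbis Comment block without a framing bit. It follows the standard Vorbis Comment structure (vendor string, comment count, then length-prefixed `NAME=value` entries). The function name `parse_vorbis_comments` is hypothetical, and the sketch assumes any ZStandard decompression has already been done by the caller.

```python
import struct


def parse_vorbis_comments(block: bytes):
    """Return (vendor, tags) from an uncompressed Vorbis Comment block."""
    pos = 0

    def read_u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from("<I", block, pos)  # little endian
        pos += 4
        return value

    def read_string() -> str:
        nonlocal pos
        length = read_u32()
        if pos + length > len(block):
            raise ValueError("string runs past the end of the tag data")
        text = block[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    tags = {}
    for _ in range(read_u32()):
        name, sep, value = read_string().partition("=")
        if not sep:
            raise ValueError("comment entry has no '=' separator")
        # Field names are case-insensitive, so they are normalised to lowercase.
        tags.setdefault(name.lower(), []).append(value)
    return vendor, tags
```

Values are collected in lists because a field name may legitimately appear more than once, for example several `artist` entries.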
## Raw QOA Data
The Raw QOA Data format is very closely based on, but not identical to, the
original QOA format. The main difference is that there is no File Header, and
the Frame Header does not include the number of channels or sample rate.
Everything else works the same as the original QOA format. The decoding and
encoding processes are the same.
The Raw QOA Data format consists of Frames, Frame Headers, LMS State Tables,
Slice Lists, and Slices.
The main data structure is the Frame, which uses the following format:
```
[4 Bytes] - Frame Header
[16*N Bytes] - LMS State Table
[X Bytes] - Slice List
```
The Frame Header uses this format:
```
[2 Bytes] - Frame Samples
16-bit unsigned integer. The number of samples in this Frame.
[2 Bytes] - Frame Size
16-bit unsigned integer. The size of this Frame in bytes, including the
Frame Header.
```
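
Reading the Frame Header is a single big endian unpack. A minimal sketch follows; `parse_frame_header` is a hypothetical name, and the zero-sample check reflects the requirement that every Frame contain at least one sample.

```python
import struct


def parse_frame_header(data: bytes, offset: int = 0):
    """Return (frame_samples, frame_size) read from the 4-byte Frame Header."""
    frame_samples, frame_size = struct.unpack_from(">HH", data, offset)
    if frame_samples == 0:
        raise ValueError("a Frame must contain at least one sample")
    return frame_samples, frame_size
```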
The LMS State Table uses this format:
```
[8 Bytes] - History
Array of four 16-bit signed integers. The LMS State histories (most recent
is last).
[8 Bytes] - Weights
Array of four 16-bit signed integers. The LMS State weights (most recent is
last).
```
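
The table can be read as one 16-byte entry per channel. The sketch below assumes that the N in `16*N Bytes` is the channel count and that entries are stored in channel order; `parse_lms_states` is a hypothetical helper name.

```python
import struct


def parse_lms_states(data: bytes, offset: int, channels: int):
    """Return a list of (history, weights) pairs, one per channel."""
    states = []
    for ch in range(channels):
        # Eight big endian signed 16-bit values: four history, then four weights.
        values = struct.unpack_from(">8h", data, offset + 16 * ch)
        states.append((list(values[:4]), list(values[4:])))
    return states
```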
The Slice List uses this format:
```
[256 * Num Channels] - Slices
An array of Slices. May hold fewer than 256 Slices per channel if this is
the last Frame.
```
Each Slice uses this format:
```
[4 Bits] - SF_Quant
The Quantized Scalefactor.
[60 Bits] - Residuals
Array of 20 3-bit values, each one containing a quantized residual.
```
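
A Slice can be unpacked by treating its 8 bytes as one 64-bit big endian integer. The sketch below assumes the same bit order as the original QOA format, where the scalefactor occupies the top four bits and the first residual follows in the next most significant bits; `unpack_slice` is a hypothetical name.

```python
def unpack_slice(raw: bytes):
    """Return (sf_quant, residuals) for one 8-byte Slice."""
    if len(raw) != 8:
        raise ValueError("a Slice is always 8 bytes wide")
    word = int.from_bytes(raw, "big")
    sf_quant = word >> 60  # top 4 bits
    # Residual i occupies bits (59 - 3*i) down to (57 - 3*i).
    residuals = [(word >> (57 - 3 * i)) & 0x7 for i in range(20)]
    return sf_quant, residuals
```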
Each Frame except the last **MUST** contain exactly 256 Slices per channel. The
last Frame may contain between 1 and 256 (inclusive) Slices per channel. The last
Slice (for each channel) in the last Frame may contain fewer than 20 samples; the
Slice still **MUST** be 8 bytes wide, and the unused samples **MUST** be zeroed
out.
A valid Extended QOA Format file **MUST** have at least one frame. Each Frame
**MUST** contain at least one channel, and **MUST** contain at least one sample.
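
The rules above determine how many Frames a file has and how large each one is. The following sketch works this out under two assumptions that this document does not state outright: that Total Samples and Frame Samples count samples per channel (as in the original QOA format), and that the LMS State Table holds one 16-byte entry per channel. The name `frame_layout` is hypothetical.

```python
SLICE_SAMPLES = 20       # samples per Slice
SLICES_PER_FRAME = 256   # Slices per channel in every Frame except the last


def frame_layout(total_samples: int, channels: int):
    """Yield (frame_samples, slices_per_channel, frame_size) for each Frame."""
    full_frame = SLICE_SAMPLES * SLICES_PER_FRAME  # 5120 samples
    remaining = total_samples
    while remaining > 0:
        frame_samples = min(remaining, full_frame)
        # Round up: a partly filled final Slice still takes a full 8 bytes.
        slices = -(-frame_samples // SLICE_SAMPLES)
        # 4-byte Frame Header + 16 bytes of LMS state and 8 bytes per Slice per channel.
        frame_size = 4 + 16 * channels + 8 * slices * channels
        yield frame_samples, slices, frame_size
        remaining -= frame_samples
```

For a stereo file with 5130 samples per channel, this yields one full Frame of 4132 bytes followed by a final Frame of 52 bytes holding a single, partly filled Slice per channel.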
## Decoder Considerations
Decoders for Extended QOA Format files **MUST** support at least 2 channels, and
**SHOULD** support at least 8 channels. Channel data is interleaved per Slice,
so for example, a two-channel stereo file would have:
```
slice[0] = L, slice[1] = R, slice[2] = L, slice[3] = R …
```
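
The interleaving shown above can be undone by walking the Slice List in 8-byte steps and assigning each Slice to its channel by index. A minimal sketch, with `split_slices_by_channel` as a hypothetical name:

```python
def split_slices_by_channel(slice_list: bytes, channels: int):
    """Group the 8-byte Slices of one Frame by channel, preserving order."""
    if channels < 1 or len(slice_list) % (8 * channels) != 0:
        raise ValueError("Slice List length does not match the channel count")
    per_channel = [[] for _ in range(channels)]
    for index in range(len(slice_list) // 8):
        # Slice k belongs to channel k mod channels.
        per_channel[index % channels].append(slice_list[index * 8:(index + 1) * 8])
    return per_channel
```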
Channel layouts for channel counts 1 through 8 are:
1. Mono
2. L, R
3. L, R, C
4. FL, FR, B/SL, B/SR
5. FL, FR, C, B/SL, B/SR
6. FL, FR, C, LFE, B/SL, B/SR
7. FL, FR, C, LFE, B, SL, SR
8. FL, FR, C, LFE, BL, BR, SL, SR
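
A decoder might keep the layouts above in a lookup table keyed by channel count. This is only a sketch; `CHANNEL_LAYOUTS` and `channel_labels` are hypothetical names, and the labels are copied from the list above.

```python
CHANNEL_LAYOUTS = {
    1: ("Mono",),
    2: ("L", "R"),
    3: ("L", "R", "C"),
    4: ("FL", "FR", "B/SL", "B/SR"),
    5: ("FL", "FR", "C", "B/SL", "B/SR"),
    6: ("FL", "FR", "C", "LFE", "B/SL", "B/SR"),
    7: ("FL", "FR", "C", "LFE", "B", "SL", "SR"),
    8: ("FL", "FR", "C", "LFE", "BL", "BR", "SL", "SR"),
}


def channel_labels(channels: int):
    """Return the speaker labels for a channel count, or None if unspecified."""
    return CHANNEL_LAYOUTS.get(channels)
```

Returning `None` for counts above 8 leaves it to the caller to decide how to present channels that have no defined layout.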
## Format Revision History
* **v1.1**
  * **June 25th, 2025:** Mention XQATool since it's the official reference tool.
  * **June 23rd, 2025:** Clarified that certain header offset/size fields should
    be considered invalid. These were implicit before. No change to the
    format. Fixed typo regarding the framing bit.
  * **April 23rd, 2025:** Removed the Seek Table.
* **v1.0**
  * **April 6th, 2025:** Initial format release.