# RSConf - A data format somewhere between JSON and hjson

Author: Remilia Scarlet<br/>
Version: 1.1<br/>
Last Updated: 25 September 2025

## Introduction

RSConf has a few more features than JSON that make it nicer for config files,
such as comments, not needing commas, and unquoted keys.  But it's less
strict and has fewer features than hjson to keep parsing simple.

It was created to fill a niche and scratch an itch for a nice, easy-to-use
config format that doubles as a data serialization format.  JSON is nice and
(usually) easy to visually parse as a human, but is too strict with its syntax,
and doesn't have comments.  YAML is nice, but has far too many bells and
whistles, leading to all sorts of strange edge cases, and a large space for
possible security problems.  TOML is ugly, and I just don't like it.  Hjson is
nice, but has a few too many unnecessary features for my liking, and is a bit
too flexible.  Thus, RSConf was born to fill the hole left by these other
formats.

As a high-level overview, RSConf is like JSON, but:

* Keys don't need quoting if they do not contain spaces.
* Commas don't need to come after values/key-value pairs unless there are
  multiple values/key-value pairs on the same line.
* Toplevel braces (`{` and `}`) can be omitted if the toplevel is an object.
* Integers and floats are differentiated.
* Integers can be in hex, octal, or binary.  Floats can be in decimal or
  "scientific notation" (and accept either `e` or `d` for their character).
* The null value is called `nil`.
* Comments are allowed and use semicolons (`;`).
* Explicit minimum limits for integers and floats.
* Stricter whitespace.

It is also a lot like hjson, except:

* All strings are multi-line strings; no separate syntax.
* No quoteless strings.
* Integers can be in hex, octal, or binary.  Floats can be in decimal or
  "scientific notation" (and accept either `e` or `d` for their character).
* The null value is called `nil`.
* Comments are allowed and use semicolons (`;`).
* No multi-line comments.
* Explicit minimum limits for integers and floats.
* Slightly stricter whitespace.

Some well-formatted example RSConf data:

```
;;;;
;;;; Example Document
;;;;

some-object: {
  values: [1, 2, 3] ;; Recommended syntax for short arrays

  ;; Better syntax for larger arrays, or ones with long values.
  values-2: [
    "foo"
    "bar"
    "baz"
  ]

  name: "test"
  sub-obj: {
    id: 1, ;; Comma here is optional
    enabled: false ;; or "true"
    something-else: nil ;; Null values are represented with "nil"
  }
}
```

### Notation Used

This spec document uses Common Lisp-style hexadecimal/octal/binary numbers.
This means #xA7 instead of 0xA7, #o25 instead of 025, and #b1011 instead of
0b1011.

## Specification

The name of the format is RSConf, with that capitalization.  It is pronounced
"Arr-Ess-Konf" and stands for Remilia Scarlet's Config Format.

### Encoding

Files *MUST* be UTF-8 encoded without a byte-order mark.  No other encoding is
valid.  The two recommended file extensions are .rsconf and .rsc.

The following Unicode code points are considered whitespace (note the absence of
#x0D, Return): #x0A (newline), #x20 (space), #x09 (tab), #x200B (zero width
space), #xA0 (non-breaking space), #x2007 (figure space), #x1680 (ogham space
mark), #x2000 (en quad), #x2001 (em quad), #x2002 (en space),
#x2003 (em space), #x2004 (three-per-em space), #x2005 (four-per-em space),
#x2006 (five-per-em space), #x2008 (punctuation space), #x2009 (thin space),
#x200A (hair space), #x202F (narrow no-break space), #x205F (medium
mathematical space), #x3000 (ideographic space).
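
As an illustration, the whitespace list above can be written down as a lookup
set.  This is only a minimal sketch in Python; the names `RSCONF_WHITESPACE`
and `is_rsconf_whitespace` are made up for this example and are not part of
the specification.

```python
# Code points the spec lists as whitespace.  Note that #x0D (return) is absent.
RSCONF_WHITESPACE = frozenset({
    0x0A, 0x20, 0x09, 0x200B, 0xA0, 0x2007, 0x1680,
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
    0x2008, 0x2009, 0x200A, 0x202F, 0x205F, 0x3000,
})


def is_rsconf_whitespace(ch: str) -> bool:
    """Return True if the single character `ch` is RSConf whitespace."""
    return ord(ch) in RSCONF_WHITESPACE
```

An explicit set is used instead of Python's `str.isspace()`, since that method
follows Unicode's own definition of whitespace rather than the list above.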

### Line Endings

Line endings must be "UNIX-style", meaning only the newline character (#x0A).
Any return character (#x0D) or form feed (page) character (#x0C) outside of a
string must raise an error and must not be counted as whitespace.  All the
whitespace characters listed under Encoding are counted as whitespace and
ignored outside of strings.
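
A possible way to enforce this rule is a single pass that tracks whether the
scanner is currently inside a string.  The sketch below is an assumption about
how an implementation might do it, not a required algorithm; the function name
`check_control_characters` is hypothetical.  It also treats comments as being
"outside of a string", which is one reading of the rule above.

```python
def check_control_characters(text: str) -> None:
    """Raise ValueError on #x0D or #x0C found outside of a string."""
    in_string = False
    in_comment = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character, whatever it is
            elif ch == '"':
                in_string = False
        else:
            if ch in "\r\f":
                raise ValueError(
                    f"illegal character U+{ord(ch):04X} at offset {i}"
                )
            if in_comment:
                if ch == "\n":
                    in_comment = False
            elif ch == ";":
                in_comment = True
            elif ch == '"':
                in_string = True
        i += 1
```

For example, `check_control_characters('a: "x\r\ny"\n')` passes because the
return character is inside a string, while `check_control_characters("a: 1\r\n")`
raises an error.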

### Comments

There are only single-line comments, which always start with one or more
semicolons and continue to a newline (#x0A).  Lisp-style comments, where the
number of semicolons varies with the depth, are recommended.  Comments **MUST NOT**
contain parsing directives or similar.  Conforming implementations **MUST NOT**
allow parsing directives, or they cannot claim to be conforming to the RSConf
specification, nor claim to support RSConf.  RSConf explicitly disallows any
sort of data or metadata within comments that can be used during or after
parsing.  Comments are there for human eyes only.
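
Since comments carry no information for the parser, an implementation can
simply discard them.  The following Python sketch shows one way this could be
done; the function name `strip_comments` is hypothetical, and real parsers
would more likely skip comments while tokenizing rather than in a separate
pass.  Semicolons inside strings are left alone.

```python
def strip_comments(text: str) -> str:
    """Remove ;-comments, keeping the newline that ends each comment."""
    out = []
    in_string = False
    in_comment = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_comment:
            if ch == "\n":
                in_comment = False
                out.append(ch)
        elif in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                i += 1
                out.append(text[i])  # keep the escaped character as-is
            elif ch == '"':
                in_string = False
        elif ch == ";":
            in_comment = True
        else:
            if ch == '"':
                in_string = True
            out.append(ch)
        i += 1
    return "".join(out)
```

Nothing from the comment text is returned or stored anywhere, which matches
the rule that comments are for human eyes only.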

### Scalar Values

There are five types of scalar values: integers, floats, strings, booleans, and
the null value.

Integer values can be in decimal (12345), hexadecimal (#xA7, #xDEADBEEF), octal
(#o64, #o23), or binary (#b1100101, #b1011).  They start with a digit, or a #
to indicate a specific radix, and continue to the end of the line, a comma
(#x2C), a comment character, or other whitespace.

Implementing parsers must support integer values that can be represented as
64-bit signed integers.  Greater limits are allowed.
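
As a rough illustration, an already-isolated integer token could be converted
like this in Python.  The function name `parse_integer` is hypothetical.  The
sketch assumes the radix prefixes are lowercase and does not handle a leading
sign, since neither is spelled out above.  Python integers are arbitrary
precision, so the 64-bit minimum is met automatically.

```python
import string

# Allowed digit characters for each radix.
_DIGITS = {16: string.hexdigits, 8: "01234567", 2: "01", 10: string.digits}
# Radix prefixes (assumed to be lowercase only).
_PREFIXES = {"#x": 16, "#o": 8, "#b": 2}


def parse_integer(token: str) -> int:
    """Convert an RSConf integer token such as '#xA7' or '12345' to an int."""
    radix = _PREFIXES.get(token[:2], 10)
    digits = token[2:] if radix != 10 else token
    # Check digits by hand: Python's int() would also accept things like
    # underscores, signs, and surrounding whitespace.
    if not digits or any(ch not in _DIGITS[radix] for ch in digits):
        raise ValueError(f"invalid integer: {token!r}")
    return int(digits, radix)
```

With this sketch, `parse_integer("#xA7")` gives 167, `parse_integer("#o25")`
gives 21, and `parse_integer("#b1011")` gives 11.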

Floats can only use decimal or "scientific notation".  Decimal format starts
with one or more digits, followed by a period (character #x2E).  After the
period there may be additional decimal digits.  The float continues to the end
of the line, a comma, a comment character, or other whitespace.

Floats can use "scientific notation" in the form `X.Ye+Z` or `X.Ye-Z` or `X.YeZ`
if desired.  The letter `d` can be used in place of `e` as well, and they are
case-insensitive.  Implementing parsers must support values that can be
represented with *at least* 50 bits of precision and an 8-bit exponent; larger
limits are allowed.
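
A float token could be recognized with a small regular expression, as in the
Python sketch below.  The function name `parse_float` is hypothetical.  The
sketch assumes the period is required even in "scientific notation" (as in
the `X.YeZ` forms above) and does not handle a leading sign.  Python's `float`
is an IEEE 754 double, which exceeds the minimum limits given here.

```python
import re

# Digits, a period, optional digits, then an optional exponent marked by
# e/E/d/D with an optional sign.
_FLOAT_RE = re.compile(r"[0-9]+\.[0-9]*(?:[eEdD][+-]?[0-9]+)?")


def parse_float(token: str) -> float:
    """Convert an RSConf float token such as '1.5' or '2.5d3' to a float."""
    if not _FLOAT_RE.fullmatch(token):
        raise ValueError(f"invalid float: {token!r}")
    # Python only understands 'e' as the exponent marker.
    return float(token.replace("d", "e").replace("D", "e"))
```

For example, `parse_float("2.5d3")` and `parse_float("2.5e3")` both give
2500.0.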

Deserialization of RSConf data can still place stricter or looser restrictions
on the integer and float values as needed.  For example, deserialization code
can place a constraint on an integer key so that it must contain an 8-bit
unsigned integer.  But parsers must still accept values up to at least the
minimum limits described above.
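
The 8-bit example above might look like this on the deserialization side.
This is only a sketch; the function name `require_uint8` and the key name used
below are hypothetical.  The parser has already accepted the value as a full
integer, and the constraint is applied afterwards.

```python
def require_uint8(key: str, value: object) -> int:
    """Check that an already-parsed value fits in an 8-bit unsigned integer."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{key}: expected an integer")
    if not 0 <= value <= 255:
        raise ValueError(f"{key}: {value} does not fit in 8 unsigned bits")
    return value
```

Here `require_uint8("channel", 200)` returns 200, while a value of 256 is
rejected by the application code, not by the parser.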

String values must be quoted.  They start with a double quote (character #x22)
and continue to another double quote.  This means all strings are "multi-line"
strings.

The backslash character (`\`, #x5C) is used for escaping characters in a string.
Double quotes that appear within strings must be escaped, e.g. the string
`"hello \"world\""`.

Unicode characters may optionally be escaped with `\u{X}`, where `X` is always
in hexadecimal and is the Unicode code point.  It must be at least one hex digit
long.

Given this, the only special escapes that do anything within strings are for
double-quotes (`\"`) and for Unicode characters (`\u{X}`).  All other escapes
simply produce the character they escape, e.g. `"foo\abar"` results in the same
string as `"fooabar"`.
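
The escape rules can be sketched in Python as follows.  The function name
`unescape` is hypothetical, and it takes the text between the surrounding
double quotes.  Two details are assumptions of this sketch: a `\u` that is not
followed by `{` is treated like any other escape and produces `u`, and code
points above #x10FFFF are rejected.

```python
import string


def unescape(body: str) -> str:
    """Resolve backslash escapes in the text between a string's quotes."""
    out = []
    i = 0
    n = len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("dangling backslash at end of string")
        nxt = body[i + 1]
        if nxt == "u" and body[i + 2 : i + 3] == "{":
            end = body.find("}", i + 3)
            digits = body[i + 3 : end] if end != -1 else ""
            if not digits or any(c not in string.hexdigits for c in digits):
                raise ValueError("malformed \\u{X} escape")
            code_point = int(digits, 16)
            if code_point > 0x10FFFF:
                raise ValueError("code point out of range")
            out.append(chr(code_point))
            i = end + 1
        else:
            out.append(nxt)  # any other escape yields the character itself
            i += 2
    return "".join(out)
```

So `unescape(r"foo\abar")` gives `fooabar`, and `unescape(r"\u{41}")` gives
`A`.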

Booleans are the symbols `true` or `false`; these are case-sensitive and must be
lower-case.

The null value is the symbol `nil`.  This is also case-sensitive and must be
lower-case.
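
In a Python implementation, these three symbols could map onto the language's
own values with a plain, case-sensitive lookup.  This is a sketch only; the
function name `parse_symbol` is hypothetical.

```python
_SYMBOLS = {"true": True, "false": False, "nil": None}


def parse_symbol(token: str):
    """Convert 'true', 'false', or 'nil' to the matching Python value."""
    if token not in _SYMBOLS:
        raise ValueError(f"unknown symbol: {token!r}")
    return _SYMBOLS[token]
```

Because the lookup is exact, spellings such as `True` or `NIL` are rejected.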

### Composite Values

There are two types of composite values in RSConf: objects and arrays.

An object starts with an open brace (`{`, character #x7B) and ends with a close
brace (`}`, character #x7D).  Within these braces are zero or more key-value
pairs.

Arrays start with an open bracket (`[`, character #x5B) and end with a close
bracket (`]`, character #x5D).  Within these brackets are zero or more values
(never key-value pairs).

### Keys and Values

Keys can be quoted or unquoted.  For an unquoted key, the key name starts with a
non-whitespace character and continues until whitespace or a colon (character
#x3A) is reached.  A newline (#x0A) is **not** permitted in an unquoted key
between the key name and a colon; a colon must appear on the same line as the
unquoted key name it terminates.  All whitespace before an unquoted key name is
ignored.  All whitespace except for the newline (#x0A) after the key name and
before the colon is ignored.

Quoted keys are treated like string scalars, and thus all characters within them
are accepted.  All whitespace before the opening double-quote is ignored.  All
whitespace except for newlines (#x0A) after the closing double-quote is ignored.
A quoted key **cannot** have a newline character between its closing quote and
the colon.

Key names cannot be an empty string (`""`).  Key names are always
case-sensitive.
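
To make the unquoted key rules concrete, here is a Python sketch that reads
one unquoted key and its colon from a given position.  The function name
`read_unquoted_key` is hypothetical, and quoted keys are deliberately left
out.  The whitespace set repeats the list from the Encoding section.

```python
WHITESPACE = frozenset(map(chr, [
    0x0A, 0x20, 0x09, 0x200B, 0xA0, 0x2007, 0x1680,
    0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
    0x2008, 0x2009, 0x200A, 0x202F, 0x205F, 0x3000,
]))


def read_unquoted_key(text: str, pos: int = 0) -> tuple[str, int]:
    """Return (key, index just past the colon), starting the scan at `pos`."""
    n = len(text)
    # All whitespace before the key name is ignored.
    while pos < n and text[pos] in WHITESPACE:
        pos += 1
    start = pos
    # The key name runs until whitespace or a colon.
    while pos < n and text[pos] != ":" and text[pos] not in WHITESPACE:
        pos += 1
    key = text[start:pos]
    if not key:
        raise ValueError("key names cannot be empty")
    # Whitespace other than a newline may sit between the key and the colon.
    while pos < n and text[pos] in WHITESPACE and text[pos] != "\n":
        pos += 1
    if pos >= n or text[pos] != ":":
        raise ValueError(f"expected ':' on the same line as key {key!r}")
    return key, pos + 1
```

For the input `"  key-name : 1"` this returns the key `key-name` and the
index of the character right after the colon.  An input such as
`"name\n: 1"` is rejected because the colon is on the next line.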

It is recommended that you use unquoted keys whenever possible.  Lowercase
skewer-case names (e.g. `key-name`, `this-is-a-key`) that don't need quoting are
highly recommended, but not required.  Underscores in key names are allowed, but
highly discouraged.

The value associated with the key comes after the colon character.  All
whitespace after the colon and before the value is ignored.

Values, whether as part of a key-value pair or as a value within an array, may
optionally be terminated with a comma (character #x2C).  This comma **MUST** be
on the same line as the value it terminates; all whitespace except for a newline
(character #x0A) is ignored between the value and a comma.

As an alternative, multiple key-value pairs and values may exist on the same
line in both objects and arrays, but **MUST** be separated by commas.  If no
additional keys come after the final value, then the trailing comma (character
#x2C) is optional.

Empty values, and multiple commas in succession (regardless of whitespace
between them), are not allowed and must signal an error.
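
The comma rules above can be checked on a simplified token stream.  In the
Python sketch below, the input is assumed to be a list of tokens from inside
one object or array, where `","` is a comma, `"\n"` is a newline, and any
other string stands for a complete value or key-value pair.  Both this token
model and the function name `check_separators` are hypothetical.

```python
def check_separators(tokens: list[str]) -> None:
    """Raise ValueError if the commas and newlines break the separator rules."""
    prev = None  # last non-newline token kind: "value", ",", or None
    newline_since_prev = False
    for tok in tokens:
        if tok == "\n":
            newline_since_prev = True
            continue
        if tok == ",":
            # Covers empty values and multiple commas in succession.
            if prev != "value":
                raise ValueError("comma without a preceding value")
            if newline_since_prev:
                raise ValueError("comma must be on the same line as its value")
        elif prev == "value" and not newline_since_prev:
            raise ValueError(
                "values on the same line must be separated by a comma"
            )
        prev = "," if tok == "," else "value"
        newline_since_prev = False
```

With this model, `["1", ",", "2", ",", "3"]` and `["1", "\n", "2"]` are both
accepted, while `["1", "2"]` and `["1", ",", "\n", ","]` are rejected.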

### Documents

The toplevel is called the "document".  A document must consist of a single
object or a single array.  When a document consists of an object, then the
toplevel braces (`{` and `}`) may optionally be omitted and it is assumed all
key-value pairs are part of the toplevel object.

Duplicate keys (more than one key with the same name at the same level) are
allowed, but later keys override earlier keys.  It's highly recommended that a
warning be presented in some way when duplicate keys at the same level are
detected, but this is not required.
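
One possible way to handle duplicates when building an object is shown in the
Python sketch below.  The function name `build_object` is hypothetical, and
the use of Python's `warnings` module is just one way to present the
recommended warning.

```python
import warnings


def build_object(pairs):
    """Build a dict from (key, value) pairs; later duplicates win."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            # The warning is recommended by the spec, not required.
            warnings.warn(
                f"duplicate key {key!r}: later value overrides earlier one"
            )
        obj[key] = value
    return obj
```

For the pairs `("a", 1)`, `("b", 2)`, `("a", 3)`, the result maps `a` to 3 and
`b` to 2, and one warning is emitted.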

## Changelog

* v1.1, 25 Sep 2025: Changed how escapes work so that they match other
  languages.
