Brush

Check-in [1a694ab3c2]

Overview
Comment: A few updates
SHA1: 1a694ab3c2f07227ee792d5bb6b894cc5024802b
User & Date: andy 2021-12-26 17:42:20.948
Context
2021-12-27 00:40:47 Begin updating grammar Leaf check-in: ed36f3fb28 user: andy tags: trunk
2021-12-26 17:42:20 A few updates check-in: 1a694ab3c2 user: andy tags: trunk
2021-12-26 17:42:09 Roll back one overzealous instance of $$ check-in: 7446a62a1f user: andy tags: trunk
Changes
Changes to doc/concepts.md.
Old version of doc/concepts.md, lines 27-99:

 - [Script](#script)
 - [Substitution](#substitution)
 - [Indexing](#indexing)
 - [Expression](#expression)

# <a name="word"></a> Word <a href="#table_of_contents" style="font-size: small">[top]</a>

The fundamental data unit is the word.

All of the following are words:

- value of a variable
- argument to a command
- return value of a command
- result of an expression
- component of a compound word
- any value at all, anywhere

Conceptually, words are immutable, and all words are strings.  Any attempt to
change a word's value really only replaces it with a new word.

To improve performance, unshared words can be directly modified in place, and
their implementation is optimized according to type, where a word's type is
determined by how the word is used.  These internal details are visible only at
the C API level, not the script level.

The term "word" was chosen because of the analogy to machine words.  A machine
word is typically operated on as a unit, yet can be split into its constituent
bits and bytes.  A Brush word is likewise typically operated on as a unit, yet
can be divided.  The difference between machine words and Brush words is that
machine words are a fixed size whereas Brush words are variable and can in fact
contain other words.  Another analogy is to natural language.  Words initially
seem to be the atomic building blocks of sentences, yet upon further examination
they are revealed to be made up of letters, morphemes, syllables, and stems.

## <a name="word_type"></a> Word type <a href="#table_of_contents" style="font-size: small">[top]</a>

All words are strings, but that is simply the common denominator between all
types.  Many specialized types exist.  Word type is a flexible concept, and it
varies freely throughout the execution of a program.

Some examples of word types:

- string
- integer
- blob
- real number
- reference
- glob expression
- regular expression
- list
- set
- map


## <a name="string"></a> String <a href="#table_of_contents" style="font-size: small">[top]</a>

A string is a sequence of Unicode characters.


Internally, strings are encoded using UTF-8 with two modifications:

- NUL is represented as `0xc0 0x80`
- `0x00` is appended




Aside from NUL as described above, denormalized characters are not allowed.

Surrogate pairs are not used.

## <a name="blob"></a> Blob <a href="#table_of_contents" style="font-size: small">[top]</a>

A blob is a sequence of arbitrary 8-bit bytes.  "Blob" is short for "binary
large object", though of course blobs can be any size.

## <a name="reference"></a> Reference <a href="#table_of_contents" style="font-size: small">[top]</a>







New version of doc/concepts.md, lines 27-105:

 - [Script](#script)
 - [Substitution](#substitution)
 - [Indexing](#indexing)
 - [Expression](#expression)

# <a name="word"></a> Word <a href="#table_of_contents" style="font-size: small">[top]</a>

In the Brush programming language, the fundamental data unit is the word.

All of the following are examples of words:

- value of a variable
- argument to a command
- return value of a command
- a number
- result of an expression
- component of a compound word
- any value at all, anywhere

Conceptually, words are immutable, and all words are strings.  Any attempt to
change a word's value only replaces it with a new word.

To improve performance, unshared words can be directly modified in place, and
their implementation is optimized according to type, where a word's type is
determined by how the word is used.  These internal details are visible only at
the C API level, not the script level.

The term "word" was chosen because of the analogy to machine words.  A machine
word is typically operated on as a unit, yet can be split into its constituent
bits and bytes.  A Brush word is likewise typically operated on as a unit, yet
can be divided.  The difference between machine words and Brush words is that
machine words are a fixed size whereas Brush words are variable and can in fact
contain other words.  Another analogy is to natural language.  Words initially
seem to be the atomic building blocks of sentences, yet upon further examination
they are revealed to be made up of letters, morphemes, syllables, and stems.

## <a name="word_type"></a> Word type <a href="#table_of_contents" style="font-size: small">[top]</a>

All words are strings, but that is simply because strings are the common
denominator between all types.  Many specialized types exist.  Word type is a
flexible concept, and it varies freely throughout the execution of a program.

Here is a list giving some examples of word types:

- string
- integer
- blob
- real number
- reference
- glob expression
- regular expression
- list
- set
- map
- script

## <a name="string"></a> String <a href="#table_of_contents" style="font-size: small">[top]</a>

A string is a sequence of zero or more Unicode characters.  Unicode characters
have 21-bit code points ranging from 0 through hexadecimal `0x1fffff`.

Internally, strings are encoded using UTF-8 with two modifications:

- NUL (code point 0) is represented as the two-byte sequence `0xc0 0x80`
- `0x00` is appended to the end of each string

These two modifications make encoded strings backward compatible with classic
NUL-terminated strings, even if the string contains embedded NULs.

Aside from NUL as described above, denormalized characters are not allowed.

Surrogate pairs and UTF-16 are not used.
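
As a rough illustration of the two encoding modifications described above, here is a minimal Python sketch. It is not Brush's C implementation; the function names `encode_brush_string` and `decode_brush_string` are hypothetical and exist only for this example. Python itself limits code points to `0x10ffff`, so the sketch does not cover the full 21-bit range.

```python
def encode_brush_string(text: str) -> bytes:
    """Encode text as UTF-8, store NUL as 0xc0 0x80, and append 0x00.

    Hypothetical helper for illustration only.
    """
    # In standard UTF-8 the byte 0x00 only ever encodes code point 0,
    # so a plain byte replacement is safe here.
    body = text.encode("utf-8").replace(b"\x00", b"\xc0\x80")
    return body + b"\x00"


def decode_brush_string(data: bytes) -> str:
    """Reverse encode_brush_string (hypothetical helper)."""
    if not data.endswith(b"\x00"):
        raise ValueError("missing terminating 0x00")
    body = data[:-1]
    if b"\x00" in body:
        raise ValueError("embedded NUL must be stored as 0xc0 0x80")
    # 0xc0 never occurs in standard UTF-8, so this pair can only be
    # the two-byte NUL representation.
    return body.replace(b"\xc0\x80", b"\x00").decode("utf-8")
```

Because the only `0x00` byte in the encoded form is the terminator, C code that stops at the first NUL byte still sees the whole string, embedded NULs included.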

## <a name="blob"></a> Blob <a href="#table_of_contents" style="font-size: small">[top]</a>

A blob is a sequence of arbitrary 8-bit bytes.  "Blob" is short for "binary
large object", though of course blobs can be any size.

## <a name="reference"></a> Reference <a href="#table_of_contents" style="font-size: small">[top]</a>
Old version of doc/concepts.md, lines 109-145:

- vector
- range
- stride

## <a name="list"></a> List <a href="#table_of_contents" style="font-size: small">[top]</a>

A list is a compound word containing zero or more component words in a linear
sequence.  Words are addressed by their zero-based numerical index, which is
known as vectored indexing.

## <a name="set"></a> Set <a href="#table_of_contents" style="font-size: small">[top]</a>

A set is a list that is being accessed with the \[set\] command.  Sets provide
fast exact-match searches, known as keyed indexing.  Behind the scenes, a
critbit tree is used to optimize access.






## <a name="map"></a> Map <a href="#table_of_contents" style="font-size: small">[top]</a>

A map is a list containing an even number of words, alternating between key
words and their associated value words.  The map commands and keyed indexing
operators are used to rapidly perform exact-match searches over the key words.






# <a name="object"></a> Object <a href="#table_of_contents" style="font-size: small">[top]</a>

An object is a map with any number of variable keys and an optional attributes
key.

The variable keys associate variable names with references to their value words.
Empty string is not an allowed variable name, nor can variable names begin with
any of the five special scope search prefix characters `'!:.^`.

The attributes key word is empty string, and its value word is a map associating
various attribute names with values.  The `type` attribute determines the object
type, and other attributes vary by type.  Custom object types can be defined.
The attributes key may be omitted, in which case the object is simply a scope.

## <a name="scope"></a> Scope <a href="#table_of_contents" style="font-size: small">[top]</a>







New version of doc/concepts.md, lines 115-162:

- vector
- range
- stride

## <a name="list"></a> List <a href="#table_of_contents" style="font-size: small">[top]</a>

A list is a compound word containing zero or more component words in a linear
sequence.  Words are addressed by their zero-based numerical index.  This is
known as vectored indexing.

## <a name="set"></a> Set <a href="#table_of_contents" style="font-size: small">[top]</a>

A set is a list that is being accessed with the `[set]` commands.  Sets provide
fast exact-match searches, known as keyed indexing.  Behind the scenes, a
critbit tree is used to optimize access and provide other fast operations such
as sorting, minimum, and maximum.

Sets cannot contain duplicate keys.  Should a list containing duplicate words be
accessed via the `[set]` commands, duplicates will be ignored, and only the
final instance of each duplicate word will be treated as a key in the set.

## <a name="map"></a> Map <a href="#table_of_contents" style="font-size: small">[top]</a>

A map is a list containing an even number of words, alternating between key
words and their associated value words.  The `[map]` commands and keyed indexing
operators are used to rapidly perform exact-match searches over the key words.

As with sets, if a map contains duplicate keys, only the final (highest-indexed)
instance of any given duplicate key is accessible via `[map]` or keyed indexing
operators.  Earlier duplicates will continue to be present in the list and
string representations but will be ignored by all map accesses.
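
The duplicate-key rule above can be sketched in Python as follows. This is only a model of the described semantics, not of Brush's implementation: the document mentions a critbit tree for sets and says map searches are rapid, while this sketch uses a plain linear scan, and `map_get` is a hypothetical name.

```python
def map_get(words, key):
    """Look up key in a flat [key, value, key, value, ...] list.

    Hypothetical helper: the highest-indexed instance of a duplicate
    key wins, and earlier instances are ignored.
    """
    if len(words) % 2 != 0:
        raise ValueError("a map needs an even number of words")
    # Walk the key positions from the end so the final instance is found first.
    for i in range(len(words) - 2, -1, -2):
        if words[i] == key:
            return words[i + 1]
    raise KeyError(key)


words = ["a", "1", "b", "2", "a", "3"]
print(map_get(words, "a"))  # the later "a" entry shadows the earlier one
print(len(words))           # the earlier duplicate is still in the list
```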

# <a name="object"></a> Object <a href="#table_of_contents" style="font-size: small">[top]</a>

An object is a map with any number of variable keys and an optional attributes
key.

The variable keys associate variable names with references to their value words.
Empty string is not an allowed variable name, nor can variable names begin with
a digit `0-9`.  Variable names may consist of ASCII characters `0-9a-zA-Z_` as
well as any non-ASCII characters (i.e. code points 128 and greater).

The attributes key word is empty string, and its value word is a map associating
various attribute names with values.  The `type` attribute determines the object
type, and other attributes vary by type.  Custom object types can be defined.
The attributes key may be omitted, in which case the object is simply a scope.

## <a name="scope"></a> Scope <a href="#table_of_contents" style="font-size: small">[top]</a>
Old version of doc/concepts.md, lines 181-194:

A task is a paused command invocation.

Threads appear to execute simultaneously but (on single-core systems) may
actually be taking turns, preemptively and automatically scheduled by the
operating system.  In contrast, tasks expressly take turns and are cooperatively
and manually scheduled by the Brush program.



## <a name="thread"></a> Thread <a href="#table_of_contents" style="font-size: small">[top]</a>

A thread is a simultaneous execution sequence.

Threads are compartmentalized, and each thread runs in its own interpreter.
Thus, threads are a specialized form of interpreter.







New version of doc/concepts.md, lines 198-213:

A task is a paused command invocation.

Threads appear to execute simultaneously but (on single-core systems) may
actually be taking turns, preemptively and automatically scheduled by the
operating system.  In contrast, tasks expressly take turns and are cooperatively
and manually scheduled by the Brush program.

Tasks are also known as coroutines because they are cooperating subroutines.
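
To make the cooperative scheduling idea concrete, here is a small Python analogy using generators. It says nothing about Brush's own task commands; `counter` and `round_robin` are hypothetical names, and Python generators merely stand in for paused command invocations.

```python
def counter(name, limit):
    """A task that pauses after each step by yielding a label."""
    for i in range(limit):
        yield f"{name}:{i}"  # pause here and hand control back


def round_robin(tasks):
    """Resume each task in turn until every task has finished."""
    results = []
    pending = list(tasks)
    while pending:
        task = pending.pop(0)
        try:
            results.append(next(task))  # explicitly resume the task
        except StopIteration:
            continue  # task finished, do not reschedule it
        pending.append(task)
    return results


print(round_robin([counter("a", 2), counter("b", 1)]))
# prints ['a:0', 'b:0', 'a:1']
```

The program, not the operating system, decides when each task runs, which is the contrast with threads drawn above.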

## <a name="thread"></a> Thread <a href="#table_of_contents" style="font-size: small">[top]</a>

A thread is a simultaneous execution sequence.

Threads are compartmentalized, and each thread runs in its own interpreter.
Thus, threads are a specialized form of interpreter.
Old version of doc/concepts.md, lines 224-238:

subsequent words are the arguments to the command.

There are numerous supported ways of typing words in a script.  Each method is
known as a word constructor.  Most word constructors produce a single word, but
the expansion and comment constructors produce multiple or zero words,
respectively.  The first character of each word determines its word constructor.

:Syntax     |:Type   |:Comment
-------------------------------------------------------------------------------
**x**         | Bare   | Allows substitution, treats whitespace as a delimiter
`"`**x**`"`   | Quoted | Allows substitution, inhibits whitespace processing
`{`**x**`}`   | Braced | Inhibits both substitution and whitespace processing
`[`**x**`]`   | Script | Allows nesting, value is the result of the script
`(`**x**`)`   | List |Allows substitution and nesting, preserves word boundaries
`&`**x**      | Reference      | Creates a reference to a variable







New version of doc/concepts.md, lines 243-257:

subsequent words are the arguments to the command.

There are numerous supported ways of typing words in a script.  Each method is
known as a word constructor.  Most word constructors produce a single word, but
the expansion and comment constructors produce multiple or zero words,
respectively.  The first character of each word determines its word constructor.

:Syntax       |:Type   |:Comment
-------------------------------------------------------------------------------
**x**         | Bare   | Allows substitution, treats whitespace as a delimiter
`"`**x**`"`   | Quoted | Allows substitution, inhibits whitespace processing
`{`**x**`}`   | Braced | Inhibits both substitution and whitespace processing
`[`**x**`]`   | Script | Allows nesting, value is the result of the script
`(`**x**`)`   | List |Allows substitution and nesting, preserves word boundaries
`&`**x**      | Reference      | Creates a reference to a variable
Old version of doc/concepts.md, lines 255-305:

within some parts of reference words or other substitutions.

:Syntax        |:Type                          |:Comment
--------------------------------------------------------------------------------
`$`**x**       | Simple variable substitution  | Literal variable name
`$"`**x**`"`   | Computed variable substitution| Name allows nested substitution
`${`**x**`}`   | Expression substitution       |
`$$"`**x**`"`  | Index-ready quoted word       | Allows nested substitution
`$${`**x**`}`  | Index-ready braced word       | Inhibits nested substitution
`$$(`**x**`)`  | Index-ready list word         | Keeps internal word boundaries
`$$[`**x**`]`  | Index-ready script substitution | Shorthand for `$$"[`**x**`]"`
`$[`**x**`]`   | Script substitution           |
`\`**x**       | Backslash substitution        | **x** is `[abefnrtv]`
`\`**x**       | Backslash quoting             | **x** is `[^abefnrtuvx0-7\n]`
`\`**nlws**    | Line wrap                     |
`\`**o**       | Octal 3-bit character         | **o** is `[0-7]`
`\`**oo**      | Octal 6-bit character         | **o** is `[0-7]`
`\`**Ooo**     | Octal 8-bit character      | **O** is `[0-3]`, **o** is `[0-7]`
`\x`**hh**     | Hexadecimal 8-bit character   | **h** is `[0-9a-fA-F]`
`\u`**hhhhhh** | Hexadecimal 21-bit character  | **h** is `[0-9a-fA-F]`

In the above table, "**nlws**" refers to a newline followed by any number of
non-newline whitespace characters.  When `\`**nlws** appears within
`"`quotes`"`, it is replaced with a single space.  Otherwise, it is treated as a
word delimiter.

Computed variable substitution is also used to protect special characters in the
variable name that would otherwise be interpreted specially: `^'*!.(){}@`.  Any
other special character may also be protected by preceding it with `\`backslash.

All substitutions starting with `$` permit indexing, described in the next
section.

The `$$` forms allows index operators to be applied to arbitrary words, even
words including other substitutions, as well as to the result of a script
substitution.  This feature allows index operators to be used without storing
values in single-use temporary variables.

The backslash substitution replacements are listed below:

:Sequence |:Replacement |:Description
---------------------------------------------------
`\a`      | `\x07`      | Audible alert
`\b`      | `\x08`      | Backspace
`\e`      | `\x1b`      | Escape
`\f`      | `\x0c`      | Form feed
`\n`      | `\x0a`      | Line feed, a.k.a. newline
`\r`      | `\x0d`      | Carriage return
`\t`      | `\x09`      | Horizontal tab
`\v`      | `\x0b`      | Vertical tab








New version of doc/concepts.md, lines 274-313:

within some parts of reference words or other substitutions.

:Syntax        |:Type                          |:Comment
--------------------------------------------------------------------------------
`$`**x**       | Simple variable substitution  | Literal variable name
`$"`**x**`"`   | Computed variable substitution| Name allows nested substitution
`${`**x**`}`   | Expression substitution       |
`$[`**x**`]`   | Script substitution           |
`$(`**x y z**`)`|List substitution             |
`\`**x**       | Backslash substitution        | **x** is `[abBefnrtv]`
`\`**x**       | Backslash quoting             | **x** is `[^abefnrtuvx0-7\n]`
`\`**nlws**    | Line wrap                     |
`\`**o**       | Octal 3-bit character         | **o** is `[0-7]`
`\`**oo**      | Octal 6-bit character         | **o** is `[0-7]`
`\`**Ooo**     | Octal 8-bit character      | **O** is `[0-3]`, **o** is `[0-7]`
`\x`**hh**     | Hexadecimal 8-bit character   | **h** is `[0-9a-fA-F]`
`\u`**Hhhhhh** | Hexadecimal 21-bit character| **H** is `[01]`, **h** is `[0-9a-fA-F]`

In the above table, "**nlws**" refers to a newline followed by any number of
non-newline whitespace characters.  When `\`**nlws** appears within
`"`quotes`"`, it is replaced with a single space.  Otherwise, it is treated as a
word delimiter.





All substitutions starting with `$` permit indexing, described in the next
section.






The backslash substitution replacements are listed below:

:Sequence |:Replacement |:Description
---------------------------------------------------
`\a`      | `\x07`      | Audible alert
`\b`      | `\x08`      | Backspace
`\B`      | `\x5c`      | Backslash
`\e`      | `\x1b`      | Escape
`\f`      | `\x0c`      | Form feed
`\n`      | `\x0a`      | Line feed, a.k.a. newline
`\r`      | `\x0d`      | Carriage return
`\t`      | `\x09`      | Horizontal tab
`\v`      | `\x0b`      | Vertical tab
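
The replacement table above maps directly onto a lookup table. The following Python sketch handles only these single-letter sequences and leaves octal, hexadecimal, line wrap, and backslash quoting untouched; it is an illustration, not Brush's parser, and `substitute_simple_escapes` is a hypothetical name.

```python
# Single-letter backslash sequences, copied from the table above.
BACKSLASH_REPLACEMENTS = {
    "a": "\x07",  # audible alert
    "b": "\x08",  # backspace
    "B": "\x5c",  # backslash
    "e": "\x1b",  # escape
    "f": "\x0c",  # form feed
    "n": "\x0a",  # line feed
    "r": "\x0d",  # carriage return
    "t": "\x09",  # horizontal tab
    "v": "\x0b",  # vertical tab
}


def substitute_simple_escapes(text: str) -> str:
    """Replace the single-letter sequences; copy everything else as-is."""
    out = []
    i = 0
    while i < len(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if text[i] == "\\" and nxt in BACKSLASH_REPLACEMENTS:
            out.append(BACKSLASH_REPLACEMENTS[nxt])
            i += 2  # consume the backslash and the letter
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```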