Diff

Differences From Artifact [79126e01f5]:

To Artifact [94299d7020]:


# Hashes: Fossil Artifact Identification

All artifacts in Fossil are identified by a unique hash, currently using
[the SHA3 algorithm by default][hpol], but historically using the SHA1
algorithm. Therefore, there are two full-length hash formats used by
Fossil:


| Algorithm | Raw Bits | Hex ASCII Bytes |
|-----------|----------|-----------------|
| SHA3-256  | 256      | 64              |
| SHA1      | 160      | 40              |


There are many types of artifacts in Fossil: commits (a.k.a. check-ins),
tickets, ticket comments, wiki articles, forum postings, file data
belonging to check-ins, etc. ([More info...](./concepts.wiki#artifacts)).
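The two full-length formats in the table above differ only in length, so a complete hash can be classified by counting its hexadecimal digits. The following sketch uses Python's standard `hashlib` to confirm those sizes. The helper `hash_algorithm` is hypothetical, not a Fossil API, and it assumes its input is an unabbreviated hash.

```python
import hashlib
import string

# Full-length hash sizes from the table: each hex digit encodes 4 bits,
# so 256 bits -> 64 digits and 160 bits -> 40 digits.
ALGORITHM_BY_HEX_DIGITS = {256 // 4: "SHA3-256", 160 // 4: "SHA1"}


def hash_algorithm(full_hash: str) -> str:
    """Name the algorithm implied by a full-length hex hash (hypothetical helper)."""
    if not full_hash or any(c not in string.hexdigits for c in full_hash):
        raise ValueError("not a hexadecimal string")
    try:
        return ALGORITHM_BY_HEX_DIGITS[len(full_hash)]
    except KeyError:
        raise ValueError(
            f"{len(full_hash)} hex digits is not a full-length hash"
        ) from None


data = b"example artifact content"  # arbitrary bytes, not a real artifact
sha3 = hashlib.sha3_256(data).hexdigest()
sha1 = hashlib.sha1(data).hexdigest()

print(len(sha3), hash_algorithm(sha3))  # 64 SHA3-256
print(len(sha1), hash_algorithm(sha1))  # 40 SHA1
```

This helper rejects a shorter string such as `abc123` because that can only be a prefix. Resolving a prefix requires the repository's list of hashes, not just the string itself.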

There is a loose hierarchy of terms used instead of “hash” in various
parts of the Fossil UI, terms we try to use consistently, though we have







A unique prefix of a VERSION hash is itself a VERSION. That is, if your
repository has exactly one commit artifact with a hash prefix of
“abc123”, then that is a valid version string as long as it remains
unambiguous.

## <a id="uvh"></a>UUIDs: An Unfortunate Historical Artifact

Historically, Fossil incorrectly used the term “[UUID][uuid]” where it
should use the term “artifact hash” instead. There are two primary
problems with miscalling Fossil artifact hashes UUIDs:

1. UUIDs are always 128 bits in length (32 hex ASCII bytes), making
   them shorter than any actual Fossil artifact hash.

2. Artifact hashes are necessarily highly pseudorandom blobs, but only
   [version 4 UUIDs][v4] are pseudorandom in the same way. Other UUID
   types have non-random meanings for certain subgroups of the bits,
   restrictions that Fossil artifact hashes do not meet.

Therefore, no Fossil hash can ever be a proper UUID.

Nevertheless, there are several places in Fossil where we still use the
term UUID, primarily for backwards compatibility:


### Repository DB Schema

Almost all of these uses flow from the `blob.uuid` table column. This is
a key lookup column in the most important persistent Fossil DB table, so
it influences broad swaths of the Fossil internals.

Someday we may rename this column and those it has influenced (e.g.
`purgeitem.uuid`, `shun.uuid`, and `ticket.tkt_uuid`) by making Fossil
detect the outdated schema and silently upgrade it, coincident with
updating all of the SQL in Fossil that refers to these columns. Until
then, Fossil will continue to have “UUID” all through its internals.

In order to avoid needless terminology conflicts, Fossil code that
refers to these misnamed columns also uses some variant of “UUID.” For
example, C code that refers to SQL result data on `blob.uuid` usually
calls the variable `zUuid`. Another example is the internal function
`uuid_to_rid()`. Until and unless we decide to rename these DB columns,
we will keep these associated internal identifiers unchanged.

You may have local SQL code that digs into the repository DB using these
column names. If so, be warned: we are not inclined to consider the
existence of such code sufficient reason to avoid renaming the columns.
The Fossil repository DB schema is not considered an external user
interface, and internal interfaces are subject to change at any time. We
suggest switching to a more stable API: the JSON API, `/timeline.rss`,
TH1, etc.

There are also some temporary tables that misuse “UUID” in this way
(`description.uuid`, `timeline.uuid`, `xmark.uuid`, etc.). There’s a good
chance we’ll fix these before we fix the on-disk DB schema since no
other code can depend on them.


### TH1 Scripting Interfaces

Some [TH1](./th1.md) interfaces use “UUID” where they actually mean some
kind of hash. One example is the `$tkt_uuid` variable, available via TH1
when [customizing Fossil’s ticket system][ctkt].

Because this is considered a public programming interface, we are
unwilling to unilaterally rename such TH1 variables, even though they
are “wrong.” For now, we are simply documenting the misuse. Later, we
may provide a parallel interface (e.g. `$tkt_hash` in this case) and
drop mention of the old interface from the documentation, but still
support it.


### JSON API Parameters and Outputs

The JSON API frequently misuses the term “UUID” in the same sort of way,
most commonly in [artifact][jart] and [timeline][jtim] APIs. As with the
prior case, we can’t fix these without breaking code that uses the JSON
API as originally designed, so our solutions are the same: document the
misuse here for now, then possibly provide a backwards-compatible fix
later.


### `manifest.uuid`

If you have [the `manifest` setting][mset] enabled, Fossil writes a file
called `manifest.uuid` at the root of the check-out tree containing the
commit hash for the current checked-out version. Because this is a
public interface, we are unwilling to rename the file for correctness.


[cin]:  ./checkin_names.wiki
[ctkt]: ./custom_ticket.wiki
[hpol]: ./hashpolicy.wiki
[jart]: ./json-api/api-artifact.md
[jtim]: ./json-api/api-timeline.md
[mset]: /help?cmd=manifest
[tvb]:  ./branching.wiki
[uuid]: https://en.wikipedia.org/wiki/Universally_unique_identifier
[v4]:   https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random)


# Hashes: Fossil Artifact Identification

All artifacts in Fossil are identified by a unique hash, currently using
[the SHA3 algorithm by default][hpol], but historically using the SHA1
algorithm. Therefore, there are two full-length hash formats used by
Fossil:

<table border="1" cellspacing="0" cellpadding="10">
<tr><th>Algorithm<th>Raw Bits<th>Hexadecimal digits
<tr><td>SHA3-256<td>256<td>64
<tr><td>SHA1<td>160<td>40
</table>

There are many types of artifacts in Fossil: commits (a.k.a. check-ins),
tickets, ticket comments, wiki articles, forum postings, file data
belonging to check-ins, etc. ([More info...](./concepts.wiki#artifacts)).

There is a loose hierarchy of terms used instead of “hash” in various
parts of the Fossil UI, terms we try to use consistently, though we have









A unique prefix of a VERSION hash is itself a VERSION. That is, if your
repository has exactly one commit artifact with a hash prefix of
“abc123”, then that is a valid version string as long as it remains
unambiguous.
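The rule above amounts to a prefix search that must find exactly one match. Here is a minimal Python sketch over an in-memory list. The helper `resolve_prefix` and the sample hashes are made up for illustration. The sketch does not model Fossil's own lookup, including any minimum prefix length or the handling of non-commit artifacts.

```python
from typing import Sequence


def resolve_prefix(prefix: str, commit_hashes: Sequence[str]) -> str:
    """Expand a hash prefix to the one commit hash it matches (hypothetical helper)."""
    prefix = prefix.lower()  # assumes stored hashes are lowercase hex
    matches = [h for h in commit_hashes if h.startswith(prefix)]
    if not matches:
        raise LookupError(f"no commit hash starts with {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"{prefix!r} is ambiguous: {len(matches)} commits match")
    return matches[0]


# Made-up 40-digit hashes for illustration.
commits = [
    "abc123" + "0" * 34,
    "abc456" + "1" * 34,
    "def789" + "2" * 34,
]

print(resolve_prefix("abc1", commits) == commits[0])  # True
try:
    resolve_prefix("abc", commits)
except LookupError as exc:
    print(exc)  # 'abc' is ambiguous: 2 commits match
```

This example also shows why a prefix is valid only while it stays unambiguous. Adding a commit whose hash also begins with `abc1` would turn the first call into an error.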



## <a id="uvh"></a>Unconventional Use Of The Term "UUID"

"UUID" is an acronym for "Universally Unique Identifier". Hashes
generated by SHA1 or SHA3-256 are universally unique (in practice,
if not in theory) and each one identifies a particular artifact, so
it seems reasonable to refer to artifact hashes as UUIDs.
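The "in practice" qualifier can be made concrete with the standard birthday-bound approximation, p ~ 1 - exp(-n(n-1) / 2^(b+1)), for n values drawn uniformly from 2^b possibilities. The sketch below assumes that hash outputs behave like uniform random values. The figure of one billion artifacts is an arbitrary illustration, not a measured repository size.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound estimate of P(any two of n uniform b-bit values collide)."""
    exponent = n_items * (n_items - 1) / 2 ** (bits + 1)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


n = 10**9  # illustrative artifact count
for name, bits in (("SHA1", 160), ("SHA3-256", 256)):
    print(f"{name}: {collision_probability(n, bits):.1e}")
```

For SHA1, the estimate is on the order of 10^-31. For SHA3-256, it is vastly smaller. This ignores deliberate collision attacks, which are a separate concern from accidental collisions.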

However, the term UUID has acquired a much stricter meaning than its
name alone implies. Purists insist that UUIDs must be *exactly* 128 bits,
that they must be displayed in a particular hexadecimal format that includes
dashes at prescribed intervals, and that they must have four particular bits
set aside to indicate the "type" of UUID. Fossil artifact hashes do not
comply with any of these supplemental requirements, and so are not UUIDs
in the strictest sense of the word. But the artifact hashes in Fossil are
literally "universally unique identifiers", and so they are sometimes
called "UUIDs" anyhow.
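Python's standard `uuid` module enforces the strict definition described above, which makes it a convenient illustration. It rejects a 40-digit SHA1 hash outright. Truncating a hash to 128 bits lets it parse and gives it the dashed display format, but its version and variant fields just hold whatever bits the hash happened to contain. The input bytes here are arbitrary, not a real Fossil artifact.

```python
import hashlib
import uuid

sha1_hex = hashlib.sha1(b"example artifact content").hexdigest()  # 40 hex digits

# A standard UUID is exactly 128 bits (32 hex digits), so this is rejected.
try:
    uuid.UUID(hex=sha1_hex)
except ValueError as exc:
    print("rejected:", exc)

# Truncating to 128 bits parses, but the "type" bits are just hash output,
# so the reported variant and version are whatever those bits happen to encode.
truncated = uuid.UUID(hex=sha1_hex[:32])
print(truncated)  # canonical 8-4-4-4-12 dashed form
print(truncated.variant, truncated.version)
```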

Some readers are greatly annoyed by Fossil's use of "UUID" in its most
literal sense. To those readers, the designer apologizes, and seeks their
mercy by noting that when the term "UUID" first began to be used by Fossil,
only SHA1 was supported and so all the artifact hashes were 160 bits, making
them close to, if not exactly, in compliance with the rigid definition
of the term. For his misuse of the term "UUID", the designer has been
frequently rebuked.
Some efforts have been made, over the ensuing years, to avoid and replace
"UUID" in newer code and documentation.
But it does not seem like such a serious issue as to require an immediate
purge of the term from existing documentation, code, and database schemas,
as some have suggested. Hence, the unconventional use of the term "UUID"
lingers on in Fossil. Let new readers beware.

Places where the non-conforming use of "UUID" persists in Fossil are
discussed in the sections below.




### Repository DB Schema

Almost all remaining uses of the term "UUID" in Fossil derive
from the `blob.uuid` table column. This is
a key lookup column in the most important persistent Fossil DB table, so
it influences broad swaths of the Fossil internals.

It is theoretically possible to rename this column and those it has
influenced (e.g. `purgeitem.uuid`, `shun.uuid`, and `ticket.tkt_uuid`)
by making Fossil detect the outdated schema and silently upgrade it,
coincident with updating all of the SQL in Fossil that refers to these
columns. But that is a large and error-prone edit that does not
serve any pressing need, and so is unlikely to happen any time soon.
Hence, Fossil will likely continue to have “UUID” all through its internals.

In order to avoid needless terminology conflicts, Fossil code that
refers to these columns also uses some variant of “UUID.” For
example, C code that refers to SQL result data on `blob.uuid` usually
calls the variable `zUuid`. Another example is the internal function
`uuid_to_rid()`. Until and unless the columns are renamed,
these associated identifiers will likely also go unchanged.
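The following Python sketch makes this naming pattern concrete using a toy in-memory table. It has a column named `uuid` that actually stores artifact hashes, and a lookup whose name and `z_uuid` parameter echo `uuid_to_rid()` and `zUuid`. The two-column table, the helper `uuid_to_rid_sketch`, and the sample hashes are assumptions for illustration only. This is not Fossil's schema or its C implementation.

```python
import sqlite3
from typing import Optional

# Toy stand-in for the repository's blob table, reduced to the two columns
# this example needs; the real Fossil schema is larger and may differ.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE NOT NULL)")
db.executemany(
    "INSERT INTO blob(rid, uuid) VALUES (?, ?)",
    [(1, "abc123" + "0" * 34), (2, "def456" + "1" * 58)],  # made-up hashes
)


def uuid_to_rid_sketch(conn: sqlite3.Connection, z_uuid: str) -> Optional[int]:
    """Look up the record ID for a full artifact hash (hypothetical Python analogue)."""
    row = conn.execute("SELECT rid FROM blob WHERE uuid = ?", (z_uuid,)).fetchone()
    return row[0] if row is not None else None


print(uuid_to_rid_sketch(db, "def456" + "1" * 58))  # 2
```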

You may have local SQL code that digs into the repository DB using these
column names. If so, be warned: we are not inclined to consider the
existence of such code sufficient reason to avoid renaming the columns.
The Fossil repository DB schema is not considered an external user
interface, and internal interfaces are subject to change at any time. We
suggest switching to a more stable API: the JSON API, `/timeline.rss`,
TH1, etc.







### TH1 Scripting Interfaces

Some [TH1](./th1.md) interfaces use “UUID” where they actually mean some
kind of hash. One example is the `$tkt_uuid` variable, available via TH1
when [customizing Fossil’s ticket system][ctkt].

Because this is considered a public programming interface, we are
unwilling to unilaterally rename such TH1 variables, even though they
are "wrong". For now, we are simply documenting the unconventional
terminology.




### JSON API Parameters and Outputs

The JSON API frequently uses the term “UUID” in the same sort of way,
most commonly in [artifact][jart] and [timeline][jtim] APIs. As with the
prior case, we can’t fix these without breaking code that uses the JSON
API as originally designed, so our solutions are the same: document the
unconventional usage.



### `manifest.uuid`

If you have [the `manifest` setting][mset] enabled, Fossil writes a file
called `manifest.uuid` at the root of the check-out tree containing the
commit hash for the current checked-out version. Because this is a
public interface, we are unwilling to rename the file for correctness.
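Scripts that need the current check-out's commit hash can read this file directly. The following is a minimal Python sketch. It assumes the file holds a single lowercase full-length hash, optionally followed by whitespace. The helper name `read_checkout_hash` is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# A full-length Fossil hash: 40 (SHA1) or 64 (SHA3-256) lowercase hex digits.
FULL_HASH = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def read_checkout_hash(checkout_root: str) -> Optional[str]:
    """Return the hash stored in manifest.uuid, or None if the file is absent."""
    path = Path(checkout_root) / "manifest.uuid"
    if not path.is_file():
        return None  # e.g. the manifest setting is not enabled
    text = path.read_text(encoding="ascii").strip()
    if not FULL_HASH.fullmatch(text):
        raise ValueError(f"unexpected contents in {path}: {text!r}")
    return text
```

When run from the top of a check-out, `read_checkout_hash(".")` returns the commit hash as a string. It returns `None` if the file is not there.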


[cin]:  ./checkin_names.wiki
[ctkt]: ./custom_ticket.wiki
[hpol]: ./hashpolicy.wiki
[jart]: ./json-api/api-artifact.md
[jtim]: ./json-api/api-timeline.md
[mset]: /help?cmd=manifest
[tvb]:  ./branching.wiki