Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
| Comment: | initial ports of static .html to static /doc .wiki |
|---|---|
| Downloads: | Tarball | ZIP archive |
| Timelines: | family | ancestors | descendants | both | trunk |
| Files: | files | file ages | folders |
| SHA1: |
d87ca60c58c949a020ca1be8653079e9 |
| User & Date: | stephan 2008-05-15 20:25:46.000 |
Context
|
2008-05-15
| ||
| 20:26 | minor refactorings to the wiki commands check-in: bfb4d414dd user: stephan tags: trunk | |
| 20:25 | initial ports of static .html to static /doc .wiki check-in: d87ca60c58 user: stephan tags: trunk | |
| 16:58 | Add the "/doc" method on the server. check-in: 7351b6346d user: drh tags: trunk | |
Changes
Added www/build.wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 | <h1>Installing Fossil</h1> <p>This page describes how to build and install Fossil. The whole process is designed to be very easy.</p> <h2>0.0 Using A Precompiled Binary</h2> <p>You can skip steps 1.0 and 2.0 below by downloading a <a href="http://www.fossil-scm.org/download.html">FIXME precompiled binary</a> appropriate for your platform. If you use a precompiled binary jump immediate to step 3.0.</p> <h2>1.0 Obtaining The Source Code</h2> <p>Fossil is self-hosting, so you can obtain a ZIP archive containing a snapshot of the latest version directly from fossil's own fossil repository. Follow these steps:</p> <ol> <li><p>Pointer your webbrowser at <a href="http://www.fossil-scm.org/fossil/login"> http://www.fossil-scm.org/fossil/login</a>.</p></li> <li><p>Log in as anonymous. The password is shown on screen. The reason for requiring this login is to prevent spiders from walking the entire website, downloading ZIP archives of every historical version, and thereby soaking up all our bandwidth.</p></li> <li><p>Click on the <a href="http://www.fossil-scm.org/fossil/timeline">Timeline</a> or <a href="http://www.fossil-scm.org/fossil/leaves">Leaves</a> link at the top of the page.</p></li> <li><p>Select a version of of fossil you want to download. Click on its link. Note that you must successfully log in as "anonymous" in step 1 above in order to see the link to the detailed version information.</p></li> <li><p>On the version information page, click on the "Zip Archive" link. This link will build a ZIP archive of the complete source code and download it to your browser.</p></li> </ol> <h2>2.0 Compiling</h2> <p>Follow these steps to compile:</p> <ol> <li value="6"> <p>Create a directory to hold the source code. Then unzip the ZIP archive you downloaded into that directory. You should be in the top-level folder of that directory</p></li> <li><p><b>(Optional:)</b> Edit the Makefile to set it up like you want. You probably do not need to do anything. Do not be intimidated: There are only 5 variables in the makefile that can be changed. The whole Makefile is only a few dozen lines long and most of those lines are comments.</p> <li><p>Type "<b>make</b>" </ol> <h2>3.0 Installing</h2> <ol> <li value="9"> <p>The finished binary is named "fossil". Put this binary in a directory that is somewhere on your PATH environment variable. It does not matter where.</p> <li> <p><b>(Optional:)</b> To uninstall, just delete the binary.</p> </ol> |
Added www/concepts.wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 |
<h1 align="center">
Fossil Concepts
</h1>
<h2>1.0 Introduction</h2>
<p>
<a href="index.html">Fossil</a> is a
<a href="http://en.wikipedia.org/wiki/Software_configuration_management">
software configuration management</a> system.
Fossil is software that is designed to control and track the
development of a software project and to record the history
of the project.
There are many such systems in use today. Fossil strives to
distinguish itself from the others by being extremely simple
to setup and operate.</p>
<p>This document is intended as a quick introduction to the concepts
behind fossil.</p>
<h2>2.0 Composition Of A Project</h2>
<img src="concept1.gif" align="right" hspace="10">
<p>A software project normally consists of a "source tree".
A source tree is a hierarchy of files that are used to generate
the end product. The source tree changes over time as the
software grows and expands and as features are added and bugs
are fixed. A snapshot of the source tree at any point in time
is called a "version" or "revision" or a "baseline" of the product.
In fossil, we use the name "baseline".</p>
<p>A "repository" is a database that contains copies of all historical
versions or baselines for a project. Baselines are normally stored in the
repository in a highly space-efficient compressed format (delta encoding).
But that is an implementation detail that you the user need not worry over.
Think of the repository as a safe place where all your old baselines are
securely stored away and available for retrieval whenever you need
them.</p>
<p>A repository in fossil is a single file on your disk. This file
might be rather large (dozens or hundreds of megabytes for a large
or long running project) but it is nevertheless just a file. You
can move it around, rename it, write it out to a memory stick, or
do anything else you normally do with files.</p>
<p>Each source tree that is controlled by fossil is associated with
a single repository on the local disk drive. You can tie two or more
source trees to a single repository if you want (though one
tree per repository is the most common configuration.) So a
single repository can be associated with many source trees, but
each source tree is associated with only one repository.</p>
<p>Fossil source trees may not overlap. A fossil source tree is identified
by a file named "_FOSSIL_" in the root directory of the source tree. Every
file that is a sibling of _FOSSIL_ and every file in every subfolder is
considered potentially a part of the source tree. The _FOSSIL_ file
contains (among other things) the pathname of the repository with which
the source tree is associated. On the other hand, the repository has
no record of its source trees. So you are free to delete a source tree
or move it around without consequence. But if you move or rename or
delete a repository, then any source trees associated with that repository
will no longer be able to locate their repository and will stop working.</p>
<p>When multiple developers are working on the same project, each
developer typically has his or her own local repository and an associated
source tree in which to work. Developers share their work by
"syncing" the content of their local repositories either directly
or through a central server. Changes can "push" from the local
repository into a remote repository. Or changes can "pull" from a
remote repository into a local repository. Or one can do a "sync"
which is a shortcut for doing both a push and a pull at the same time.
Fossil also has the concept of "cloning". A "clone" is like a "pull",
except that instead of beginning with an existing local repository,
a clone begins with nothing and creates a new local repository that
is a duplicate of a remote repository.</p>
<p>Communication between repositories is via HTTP. Remote
repositories are identified by URL. You can also point a webbrowser
at a repository and get human-readable status, history, and tracking
information about the project.</p>
<h3>2.1 Identification Of Artifacts</h3>
<p>A particular version of a particular file is called an "artifact".
Each artifact has a universally unique name which is the
<a href="http://en.wikipedia.org/wiki/SHA">SHA1</a> hash of the content
of that file expressed as 40 characters of lower-case hexadecimal. Such
a hash is referred to as the Universally Unique Identifier or UUID
for the artifact. The SHA1 algorithm is created with the purpose of
providing a highly forgery-resistent identifier for a file. Given any
file it is simple to find the UUID for that file. But given a
UUID it is computationally intractable to generate a file that will
have that UUID.</p>
<p>UUIDs look something like this:</p>
<blockquote><b>
6089f0b563a9db0a6d90682fe47fd7161ff867c8<br>
59712614a1b3ccfd84078a37fa5b606e28434326<br>
19dbf73078be9779edd6a0156195e610f81c94f9<br>
b4104959a67175f02d6b415480be22a239f1f077<br>
997c9d6ae03ad114b2b57f04e9eeef17dcb82788
</b></blockquote>
<p>When referring to an artifact using fossil, you can use a unique
prefix of the UUID that is four characters or longer. This saves
a lot of typing. When displaying UUIDs, fossil will usually only
show the first 10 digits since that is normally enough to uniquely
identify a file.</p>
<p>Changing (or adding or removing) a single byte in a file results
in a completely different UUID. And since the UUID is the name of
the artifact, making any change to a file results in a new artifact.
In this way, artifacts are immutable.</p>
<p>A repository is really just an unordered collection of
artifacts. New artifacts can be added to the repository, but
existing artifacts can never be removed. Fossil is designed in
such a way that it can be handed a set of artifacts in any
order and it can figure out the relationship between those
artifacts and reconstruct the complete development history of
a software project.</p>
<h3>2.2 Manifests</h3>
<p>At the root of a source tree is a special file called the
"manifest". The manifest is a listing of all other files in
that source tree. The manifest contains the (complete) UUID
of the file and the name of the file as it appears on disk,
and thus serves as a mapping from UUID to disk name. The UUID
of the manifest is the UUID that identifies a baseline. When
you look at a "timeline" of changes in fossil, the UUID associated
with each check-in or commit is really just the UUID of the
manifest for that baseline.</p>
<p>Fossil automatically generates a manifest whenever you "commit"
a new baseline. So this is not something that you, the developer,
need to worry with. The format of a manifest is intentionally
designed to be simple to parse, so that if
you want to read and interpret a manifest, either by hand or
with a script, that is easy to do. But you will probably never
need to do so.</p>
<p>In addition to identifying all files in the baseline, a
manifest also contains a check-in comment, the date and time
when the baseline was established, who created the baseline,
and links to other baselines from which the current baseline
is derived. There is also a couple of checksums used to verify
the integrity of the baseline. And the whole manifest might
be PGP clearsigned.</p>
<h3>2.3 Key concepts</h3>
<ul>
<li>A <b>baseline</b> is a set of files arranged
in a hierarchy.</li>
<li>A <b>repository</b> keeps a record of historical baselines.</li>
<li>Repositories share their changes using <b>push</b>, <b>pull</b>,
<b>sync</b>, and <b>clone</b>.</li>
<li>A particular version of a particular file is an <b>artifact</b>
that is identified by a <b>UUID</b>.</li>
<li>Artifacts tracked by fossil are inherently immutable.</li>
<li>Fossil automatically generates a <b>manifest</b> file that identifies
every artifact in a baseline.</li>
<li>The UUID of the manifest is the UUID of the baseline.</li>
</ul>
<h2>3.0 Fossil - The Program</h2>
<p>Fossil is software. The implementation of fossil is in the form
of a single executable named "fossil". To install fossil on your system,
all you have to do is obtain a copy of this one executable file (either
by downloading a precompiled version or compiling it yourself) and then
putting that file somewhere on your PATH.</p>
<p>Fossil is completely self-contained. It is not necessary to
install any other software in order to use fossil. You do <u>not</u> need
CVS, gzip, diff, rsync, Python, Perl, Tcl, Java, apache, PostgreSQL, MySQL,
SQLite, patch, or any similar software on your system in order to use
fossil effectively. You will want to have some kind of text editor
for entering check-in comments. Fossil will use whatever text editor
is identified by your VISUAL environment variable. Fossil will also
use GPG to clearsign your manifests if you happen to have it installed,
but fossil will skip that step if GPG missing from your system.
You can optionally set up fossil to use external "diff" programs,
though a perfectly functional "diff" algorithm is built it and works
find for most people.</p>
<p>To uninstall fossil, simply delete the executable.</p>
<p>To upgrade an older version of fossil to a newer version, just
replace the old executable with the new one. You might need to
run a one-time command to restructure your repositories after
an upgrade. Check the instructions that come with the upgrade
for details.</p>
<p>To use fossil, simply type the name of executable in your
shell, followed by one of the various built-in commands and
arguments appropriate for that command. For example:</p>
<blockquote><b>
fossil help
</b></blockquote>
<p>In the next section, when we say things like "use the <b>help</b>
command" we mean to use the command name "help" as the first
token after the name of the fossil executable, as shown above.</p>
<h2>4.0 Workflow</h2>
<img src="concept2.gif" align="right" hspace="10">
<ol>
<li><p>
Establish a local repository using either the <b>new</b> command
to start a new project, or the <b>clone</b> command to make a clone
of a repository for an existing project.
</p></li>
<li><p>
Establish one or more source trees by changing your working directory
to where you want the root of the source tree to be, then issuing
the <b>open</b> command with the name of the repository file as its
argument.
</p></li>
<li><p>
Use the <b>update</b> command followed by a UUID to cause your
source tree to change to the baseline identified by that UUID.
The <b>timeline</b> or <b>leaves</b> commands might help you to
identify an appropriate baseline.
</p></li>
<li><p>
Edit the code. Add new files to the source tree using the <b>add</b>
command. Omit files from future baselines using the <b>rm</b> command.
(Even when you remove files from future baselines, those files continue
to exist in historical baselines.) Test your changes.
</p></li>
<li><p>
Create a new baseline using the <b>commit</b> command. You will be prompted
for a check-in comment and also for your GPG key if you have GPG installed.
The commit copies the edits you have made in your local source
tree into your local repository.
</p></li>
<li><p>
Share your changes with others using the <b>push</b> command.
Push causes the edits you committed into your local repository to be
pushed out into other repositories.
</p></li>
<li><p>
When your coworkers make their own changes, you can pull those changes
into your local repository using the <b>pull</b> command. Note that
the pull command only pulls the changes into your local repository,
not into your local source tree.
</p></li>
<li><p>
After the changes of others are in your local repository, you
can move them into your local source tree using <b>update</b>. If
you have made parallel
changes, you can merge your changes together with your coworkers changes
by do an <b>update</b> to your latest baseline, then doing a
<b>merge</b> with your coworkers latest baseline. After your
verify that the merged code is still functional, you can <b>commit</b>
a new baseline that contains both yours and your coworkers changes
and then push the new baseline back to your coworker.
</p></li>
<li><p>
Repeat all of the above until you have generated great software.
</p></li>
</ol>
<h3>4.1 Variations</h3>
<p>The <b>settings</b> lets you view and modify various operating
properties of fossil. Among the available settings is "autosync"
mode. When autosync is enabled, the push and pull of content from
your local server is largely automated. Whenever you use the <b>update</b>
command, fossil first does a <b>pull</b> to see if other users have
perhaps added new baselines to the central repository. When you
<b>commit</b>, fossil also does a <b>pull</b> and issues a warning
if your check-in would cause a fork. After a <b>commit</b>, fossil
automatically does a <b>push</b> to send your changes up to the
central server.</p>
<p>With autosync enabled, fossil works like
<a href="http://www.nongnu.org/cvs/">CVS</a> or
<a href="http://subversion.tigris.org/">Subversion</a>.
When autosync disabled, fossil works more like
<a href="http://monotone.ca/">Monotone</a>,
<a href="http://git.or.cz">GIT</a>, or
<a href="http://www.selenic.com/mercurial/wiki/">Mercurial</a>.
The fun thing about fossil is that it will work either
way, depending on your needs of the moment. You can freely switch
between these operating modes using commands like:</p>
<blockquote>
<b>fossil setting autosync off<br />
fossil setting autosync on</b>
</blockquote>
<p>For additional information about autosync and other settings
using the <b>help</b> command.</p>
<h2>5.0 Setting Up A Fossil Server</h2>
<p>With other configuration management software, setting up a server is
a lot of work and normally takes time, patience, and a lot of system
knowledge. Fossil is designed to avoid this frustration. Setting up
a server with fossil is ridiculously easy. You have three options:</p>
<ol>
<li><p><b>Setting up a stand-alone server</b></p>
<p>From within your source tree just use the <b>server</b> command and
fossil will start listening for incoming requests on TCP port 8080.
You can point your webbrowser at <a href="http://localhost:8080/">
http://localhost:8080/</a> and begin exploring. Or your coworkers
can do pushes or pulls against your server. Use the <b>--port</b>
option to the server command to specify a different TCP port. If
you do not have a local source tree, use the <b>-R</b> command-line
option to specify the repository file.</p>
<p>A stand-alone server is a great way to set of transient connections
between coworkers for doing quick pushes or pulls. But you can also
set up a permanent stand-alone server if you prefer. Just make
arrangements for fossil to be launched with appropriate arguments
after every reboot.</p>
</li>
<li><p><b>Setting up a CGI server</b></p>
<p>If you have a webserver running on your machine already, you can
set up fossil to be run from CGI. Simply create an executable script
that looks something like this:</p>
<blockquote><pre>
#!/usr/local/bin/fossil
repository: /home/me/bigproject.fossil
</pre></blockquote>
<p>Edit this script to use whatever pathnames are appropriate for
your project. Then point your webbrowser at the script and off you
go.</p></li>
<li><p><b>Setting up an inetd server</b></p>
<p>If you have inetd or xinetd running on your system, you can set
those services up to launch fossil to deal with inbound TCP/IP connections
on whatever port you want. Set up inetd or xinetd to launch fossil
like this:</p>
<blockquote><pre>
/usr/local/bin/fossil http /home/me/bigproject.fossil
</pre></blockquote>
<p>As before, change the filenames to whatever is appropriate for
your system. You can have fossil run as any user that has write
permission on the repository and on the directory that contains the
repository. But it is safer to run fossil as root. When fossil
sees that it is running as root, it automatically puts itself into
a <a href="http://en.wikipedia.org/wiki/Chroot">chroot jail</a> and
drops all privileges prior to reading any information from the client.
Since fossil is a stand-alone program, you do not need to put anything
in the chroot jail with fossil in order for it to do its job.</p>
</li>
</ol>
|
Added www/delta_encoder_algorithm.wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 | <h1 align="center"> Fossil Delta Encoding Algorithm </h1> <p>A key component for the efficient storage of multiple revisions of a file in fossil repositories is the use of delta-compression, i.e. to store only the changes between revisions instead of the whole file.</p> <p>This document describes the encoding algorithm used by Fossil to generate deltas. It is targeted at developers working on either <a href="index.html">fossil</a> itself, or on tools compatible with it. The exact format of the generated byte-sequences, while in general not necessary to understand encoder operation, can be found in the companion specification titled "<a href="delta_format.html">Fossil Delta Format</a>". </p> <p>The entire algorithm is inspired by <a href="http://samba.anu.edu.au/rsync/">rsync</a>.</p> <a name="argresparam"><h2>1.0 Arguments, Results, and Parameters</h2> <p>The encoder takes two byte-sequences as input, the "original", and the "target", and returns a single byte-sequence containing the "delta" which transforms the original into the target upon its application.</p> <p>Note that the data of a "byte-sequence" includes its length, i.e. the number of bytes contained in the sequence.</p> <p>The algorithm has one parameter named "NHASH", the size of the "sliding window" for the "rolling hash", in bytes. These two terms are explained in the next section. The value of this parameter has to be a power of two for the algorithm to work. For Fossil the value of this parameter is set to "16".</p> <a name="operation"><h2>2.0 Operation</h2> <p>The algorithm is split into three phases which generate the <a href="delta_format.html#header">header</a>, <a href="delta_format.html#slist">segment list</a>, and <a href="delta_format.html#trailer">trailer</a> of the delta, per its general <a href="delta_format.html#structure">structure</a>.</p> <p>The two phases generating header and trailer are not covered here as their implementation trivially follows directly from the specification of the <a href="delta_format.html">delta format</a>.</p> <p>This leaves the segment-list. Its generation is done in two phases, a pre-processing step operating on the "original" byte-sequence, followed by the processing of the "target" byte-sequence using the information gathered by the first step.</p> <a name="preprocessing"><h3>2.1 Preprocessing the original</h3> <p>A major part of the processing of the "target" is to find a range in the "original" which contains the same content as found at the current location in the "target".</p> <p>A naive approach to this would be to search the whole "original" for such content. This however is very inefficient as it would search the same parts of the "original" over and over. What is done instead is to sample the "original" at regular intervals, compute signatures for the sampled locations and store them in a hash table keyed by these signatures.</p> <p>That is what happens in this step. The following processing step can then the compute signature for its current location and then has to search only a narrow set of locations in the "original" for possible matches, namely those which have the same signature.</p> <p>In detail:</p> <ol> <li>The "original" is split into chunks of NHASH bytes. Note that a partial chunk of less than NHASH bytes at the end of "original" is ignored. </li> <li>The <a href="#rollhash">rolling hash</a> of each chunk is computed. </li> <li>A hashtable is filled, mapping from the hashes of the chunks to the list of chunk locations having this hash. </li> </ol> <a name="processing"><h3>2.1 Processing the target</h3> <p>This, the main phase of the encoder, processes the target in a loop from beginning to end. The state of the encoder is captured by two locations, the "base" and the "slide". "base" points to the first byte of the target for which no delta output has been generated yet, and "slide" is the location of the window used to look in the "origin" for commonalities. This window is NHASH bytes long.</p> <p>Initially both "base" and "slide" point to the beginning of the "target". In each iteration of the loop the encoder decides whether to <ul> <li>emit a single instruction to <a href="delta_format.html#copyrange">copy a range</a>, or </li> <li>emit two instructions, first to <a href="delta_format.html#insertlit">insert a literal</a>, then to <a href="delta_format.html#copyrange">copy a range</a>, or </li> <li>move the window forward one byte. </li> </ul> </p> <img src="encode10.gif" align="right" hspace="10"> <p>To make this decision the encoder first computes the hash value for the NHASH bytes in the window and then looks at all the locations in the "origin" which have the same signature. This part uses the hash table created by the pre-processing step to effiently find these locations.</p> <p>For each of the possible candidates the encoder finds the maximal range of bytes common to both "origin" and "target", going forward and backward from "slide" in the "target", and the candidate location in the "origin". This search is constrained on the side of the "target" by the "base" (backward search), and the end of the "target" (forward search), and on the side of the "origin" by the beginning and end of the "origin", respectively.</p> <p>There are input files for which the hash chains generated by the pre-processing step can become very long, leading to long search times and affecting the performance of the delta generator. To limit the effect such long chains can have the actual search for candidates is bounded, looking at most N candidates. Currently N is set to 250.</p> <p>From the ranges for all the candidates the best (= largest) common range is taken and it is determined how many bytes are needed to encode the bytes between the "base" and the end of that range. If the range extended back to the "base" then this can be done in a single copy instruction. Otherwise, i.e if there is a gap between the "base" and the beginning of the range then two instructions are needed, one to insert the bytes in the gap as a literal, and a copy instruction for the range itself. The general situation at this point can be seen in the picture to the right.</p> <p>If the number of bytes needed to encode both gap (if present), and range is less than the number of bytes we are encoding the encoder will emit the necessary instructions as described above, set "base" and "slide" to the end of the encoded range and start the next iteration at that point.</p> <p>If, on the other hand, the encoder either did not find candidate locations in the origin, or the best range coming out of the search needed more bytes to encode the range than there were bytes in the range, then no instructions are emitted and the window is moved one byte forward. The "base" is left unchanged in that case.</p> <p>The processing loop stops at one of two conditions: <ol> <li>The encoder decided to move the window forward, but the end of the window reached the end of the "target". </li> <li>After the emission of instructions the new "base" location is within NHASH bytes of end of the "target", i.e. there are no more than at most NHASH bytes left. </li> </ol> </p> <p>If the processing loop left bytes unencoded, i.e. "base" not exactly at the end of the "target", as is possible for both end conditions, then one last insert instruction is emitted to put these bytes into the delta.<p> <a name="exceptions"><h2>3.0 Exceptions</h2> <p>If the "original" is at most NHASH bytes long no compression of changes is possible, and the segment-list of the delta consists of a single literal which contains the entire "target".</p> <p>This is actually equivalent to the second end condition of the processing loop described in the previous section, just checked before actually entering the loop.</p> <a name="rollhash"><h2>4.0 The rolling hash</h2> <p>The rolling hash described below and used to compute content signatures was chosen not only for good hashing properties, but also to enable the easy (incremental) recalculation of its value for a sliding window, i.e. where the oldest byte is removed from the window and a new byte is shifted in.<p> <a name="rhdef"><h3>4.1 Definition</h3> <p>Assuming an array Z of NHASH bytes (indexing starting at 0) the hash V is computed via</p> <p align=center><table><tr><td> <p><img src="encode1.gif" align="center"></p> <p><img src="encode2.gif" align="center"></p> <p><img src="encode3.gif" align="center"></p> </td></tr></table></p> where A and B are unsigned 16-bit integers (hence the <u>mod</u>), and V is a 32-bit unsigned integer with B as MSB, A as LSB. <a name="rhincr"><h3>4.2 Incremental recalculation</h3> <p>Assuming an array Z of NHASH bytes (indexing starting at 0) with hash V (and components A and B), the dropped byte <img src="encode4.gif" align="center">, and the new byte <img src="encode5.gif" align="center"> , the new hash can be computed incrementally via: </p> <p align=center><table><tr><td> <p><img src="encode6.gif" align="center"></p> <p><img src="encode7.gif" align="center"></p> <p><img src="encode8.gif" align="center"></p> </td></tr></table></p> <p>For A, the regular sum, it can be seen easily that this the correct way recomputing that component.</p> <p>For B, the weighted sum, note first that <img src="encode4.gif" align="center"> has the weight NHASH in the sum, so that is what has to be removed. Then adding in <img src="encode9.gif" align="center"> adds one weight factor to all the other values of Z, and at last adds in <img src="encode5.gif" align="center"> with weight 1, also generating the correct new sum</p> |
Added www/delta_format.wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 |
<h1 align="center">
Fossil Delta Format
</h1>
<p>A key component for the efficient storage of multiple revisions of
a file in fossil repositories is the use of delta-compression, i.e. to
store only the changes between revisions instead of the whole
file.</p>
<p>This document describes the format used to encode such changes,
also known as "delta". It is targeted at developers working on either
<a href="index.html">fossil</a> itself, or on tools compatible with
it.</p>
<a name="structure"><h2>1.0 Structure</h2>
<img src="delta1.gif" align="left" hspace="10">
<p>A delta consists of three parts, a "header", a "trailer", and a
"segment-list" between them.</p>
<p>Both header and trailer provide information about the target
helping the decoder, and the segment-list describes how the target can
be constructed from the original.</p>
<a name="header"><h3>1.1 Header</h3>
<img src="delta6.gif" align="left" hspace="10">
<p>The header consists of a single number followed by a newline
character (ASCII 0x0a). The number is the length of the target in
bytes.</p>
<p>This means that, given a delta, the decoder can compute the size of
the target (and allocate any necessary memory based on that) by simply
reading the first line of the delta and decoding the number found
there. In other words, before it has to decode everything else.</p>
<a name="trailer"><h3>1.2 Trailer</h3>
<img src="delta5.gif" align="left" hspace="10">
<p>The trailer consists of a single number followed by a semicolon (ASCII
0x3b). This number is a checksum of the target and can be used by a
decoder to verify that the delta applied correctly, reconstructing the
target from the original.</p>
<p>The checksum is computed by treating the target as a series of
32-bit integer numbers (MSB first), and summing these up, modulo
2^32-1. A target whose length is not a multiple of 4 is padded with
0-bytes (ASCII 0x00) at the end.</p>
<p>By putting this information at the end of the delta a decoder has
it available immediately after the target has been reconstructed
fully.</p>
<a name="slist"><h3>1.3 Segment-List</h3>
<img src="delta2.gif" align="left" hspace="10">
<p>The segment-list of a delta describes how to create the target from
the original by a combination of inserting literal byte-sequences and
copying ranges of bytes from the original. This is there the
compression takes place, by encoding the large common parts of
original and target in small copy instructions.</p>
<p>The target is constructed from beginning to end, with the data
generated by each instruction appended after the data of all previous
instructions, with no gaps.</p>
<a name="insertlit"><h4>1.3.1 Insert Literal</h4>
<p>A literal is specified by two elements, the size of the literal in
bytes, and the bytes of the literal itself.</p>
<img src="delta4.gif" align="left" hspace="10">
<p>The length is written first, followed by a colon character (ASCII
0x3a), followed by the bytes of the literal.</p>
<a name="copyrange"><h4>1.3.2 Copy Range</h4>
<p>A range to copy is specified by two numbers, the offset of the
first byte in the original to copy, and the size of the range, in
bytes. The size zero is special, its usage indicates that the range
extends to the end of the original.</p>
<img src="delta3.gif" align="left" hspace="10">
<p>The length is written first, followed by an "at" character (ASCII
0x40), then the offset, followed by a comma (ASCII 0x2c).</p>
<a name="intcoding"><h2>2.0 Encoding of integers</h2>
<p>
The format currently handles only 32 bit integer numbers. They are
written base-64 encoded, MSB first, and without leading
"0"-characters, except if they are significant (i.e. 0 => "0").
</p>
<p>
The base-64 coding is described in
<a href="http://www.ietf.org/rfc/rfc3548.txt">RFC 3548</a>.
</p>
<a name="examples"><h2>3.0 Examples</h2>
<a name="examplesint"><h3>3.1 Integer encoding</h3>
<table border=1>
<tr>
<th>Value</th>
<th>Encoding</th>
</tr>
<tr>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>6246</td>
<td>1Xb</td>
</tr>
<tr>
<td>-1101438770</td>
<td>2zMM3E</td>
</tr>
</table>
<a name="examplesdelta"><h3>3.2 Delta encoding</h3>
<p>An example of a delta using the specified encoding is:</p>
<table border=1><tr><td><pre>
1Xb
4E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;</pre>
</td></tr></table>
<p>This can be taken apart into the following parts:</p>
<table border=1>
<tr><th>What </th> <th>Encoding </th><th>Meaning </th><th>Details</th></tr>
<tr><td>Header</td> <td>1Xb </td><td>Size </td><td> 6246 </td></tr>
<tr><td>S-List</td> <td>4E@0, </td><td>Copy </td><td> 270 @ 0 </td></tr>
<tr><td> </td> <td>2:th </td><td>Literal </td><td> 2 'th' </td></tr>
<tr><td> </td> <td>FN@4C, </td><td>Copy </td><td> 983 @ 268 </td></tr>
<tr><td> </td> <td>6:scenda </td><td>Literal </td><td> 6 'scenda' </td></tr>
<tr><td> </td> <td>1B@Jd, </td><td>Copy </td><td> 75 @ 1256 </td></tr>
<tr><td> </td> <td>6:scenda </td><td>Literal </td><td> 6 'scenda' </td></tr>
<tr><td> </td> <td>5x@Kt, </td><td>Copy </td><td> 380 @ 1336 </td></tr>
<tr><td> </td> <td>6:pieces </td><td>Literal </td><td> 6 'pieces' </td></tr>
<tr><td> </td> <td>79@Qt, </td><td>Copy </td><td> 457 @ 1720 </td></tr>
<tr><td> </td> <td>F: Example: eskil</td><td>Literal </td><td> 15 ' Example: eskil'</td></tr>
<tr><td> </td> <td>~E@Y0, </td><td>Copy </td><td> 4046 @ 2176 </td></tr>
<tr><td>Trailer</td><td>2zMM3E </td><td>Ckecksum</td><td> -1101438770 </td></tr>
</table>
<p>The unified diff behind the above delta is</p>
<table border=1><tr><td><pre>
bluepeak:(761) ~/Projects/Tcl/Fossil/Devel/devel > diff -u ../DELTA/old ../DELTA/new
--- ../DELTA/old 2007-08-23 21:14:40.000000000 -0700
+++ ../DELTA/new 2007-08-23 21:14:33.000000000 -0700
@@ -5,7 +5,7 @@
* If the server does not have write permission on the database
file, or on the directory containing the database file (and
- it is thus unable to update database because it cannot create
+ it is thus unable to update the database because it cannot create
a rollback journal) then it currently fails silently on a push.
It needs to return a helpful error.
@@ -27,8 +27,8 @@
* Additional information displayed for the "vinfo" page:
+ All leaves of this version that are not included in the
- decendant list. With date, user, comment, and hyperlink.
- Leaves in the decendant table should be marked as such.
+ descendant list. With date, user, comment, and hyperlink.
+ Leaves in the descendant table should be marked as such.
See the compute_leaves() function to see how to find all
leaves.
+ Add file diff links to the file change list.
@@ -37,7 +37,7 @@
* The /xfer handler (for push, pull, and clone) does not do
delta compression. This results in excess bandwidth usage.
- There are some code in xfer.c that are sketches of ideas on
+ There are some pieces in xfer.c that are sketches of ideas on
how to do delta compression, but nothing has been implemented.
* Enhancements to the diff and tkdiff commands in the cli.
@@ -45,7 +45,7 @@
single file. Allow diffs against any two arbitrary versions,
not just diffs against the current check-out. Allow
configuration options to replace tkdiff with some other
- visual differ of the users choice.
+ visual differ of the users choice. Example: eskil.
* Ticketing interface (expand this bullet)
</pre></td></tr></table>
<a name="notes"><h2>Notes</h2>
<ul>
<li>Pure text files generate a pure text delta.
</li>
<li>Binary files generate a delta that may contain some binary data.
</li>
<li>Instead of putting special instructions for general compression
into the delta-format itself, specifically the segment-list, like
run-length encoding of literals, etc. it was considered to be much
more sensible to keep the various concern separate and use a general
compression library, like <a href="http://www.zlib.net">zlib</a>, to
compress the full delta after its generation.
</li>
</ul>
|
Added www/fileformat.wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 | <h1 align="center"> Fossil File Formats </h1> <p> The global state of a fossil repository is determined by an unordered set of artifacts. An artifact might be a source code file, the text of a wiki page, part of a trouble ticket, or one of several special control artifacts used to show the relationships between other artifacts within the project. Artifacts can be text or binary. </p> <p> Each artifact in the repository is named by its SHA1 hash. No prefixes or meta information is added to a artifact before its hash is computed. The name of a artifact in the repository is exactly the same SHA1 hash that is computed by sha1sum on the file as it exists in your source tree.</p> <p> Some artifacts have a particular format which gives them special meaning to fossil. Fossil recognizes:</p> <ul> <li> Manifests </li> <li> Clusters </li> <li> Control Artifacts </li> <li> Wiki Pages </li> <li> Ticket Changes </li> </ul> <p>These five artifact types are described in the sequel.</p> <h2>1.0 The Manifest</h2> <p>A manifest defines a baseline or version of the project source tree. The manifest contains a list of artifacts for each file in the project and the corresponding filenames, as well as information such as parent baselines, the name of the programmer who created the baseline, the date and time when the baseline was created, and any check-in comments associated with the baseline.</p> <p> Any artifact in the repository that follows the syntactic rules of a manifest is a manifest. Note that a manifest can be both a real manifest and also a content file, though this is rare. </p> <p> A manifest is a text file. Newline characters (ASCII 0x0a) separate the file into "cards". Each card begins with a single character "card type". Zero or more arguments may follow the card type. All arguments are separated from each other and from the card-type character by a single space character. There is no surplus white space between arguments and no leading or trailing whitespace except for the newline character that acts as the card separator. </p> <p> All cards of the manifest occur in strict sorted lexicographical order. No card may be duplicated. The entire manifest may be PGP clear-signed, but otherwise it may contain no additional text or data beyond what is described here. </p> <p> Allowed cards in the manifest are as follows: </p> <blockquote> <b>C</b> <i>checkin-comment</i><br> <b>D</b> <i>time-and-date-stamp</i><br> <b>F</b> <i>filename</i> <i>SHA1-hash</i><br> <b>P</b> <i>SHA1-hash</i>+<br> <b>R</b> <i>repository-checksum</i><br> <b>U</b> <i>user-login</i><br> <b>Z</b> <i>manifest-checksum</i> </blockquote> <p> A manifest must have exactly one C-card. The sole argument to the C-card is a check-in comment that describes the check-in that the manifest defines. The check-in comment is text. The following escape sequences are applied to the text: A space (ASCII 0x20) is represented as "\s" (ASCII 0x5C, 0x73). A newline (ASCII 0x0a) is "\n" (ASCII 0x6C, x6E). A backslash (ASCII 0x5C) is represented as two backslashes "\\". Apart from space and newline, no other whitespace characters are allowed in the check-in comment. Nor are any unprintable characters allowed in the comment. </p> <p> A manifest must have exactly one D-card. The sole argument to the D-card is a date-time stamp in the ISO8601 format. The date and time should be in coordinated universal time (UTC). The format is: </p> <blockquote> <i>YYYY</i><b>-</b><i>MM</i><b>-</b><i>DD</i><b>T</b><i>HH</i><b>:</b><i>MM</i><b>:</b><i>SS</i> </blockquote> <p> A manifest has zero or more F-cards. Each F-card defines a file (other than the manifest itself) which is part of the baseline that the manifest defines. There are two arguments. The first argment is the pathname of the file in the baseline relative to the root of the project file hierarchy. No ".." or "." directories are allowed within the filename. Space characters are escaped as in C-card comment text. Backslash characters and newlines are not allowed within filenames. The directory separator character is a forward slash (ASCII 0x2F). The second argument to the F-card is the full 40-character lower-case hexadecimal SHA1 hash of the content artifact. </p> <p> A manifest has zero or one P-cards. Most manifests have one P-card. The P-card has a varying number of arguments that defines other manifests from which the current manifest is derived. Each argument is an 40-character lowercase hexadecimal SHA1 of the predecessor manifest. All arguments to the P-card must be unique to that line. The first predecessor is the direct ancestor of the manifest. Other arguments define manifests with which the first was merged to yield the current manifest. Most manifests have a P-card with a single argument. The first manifest in the project has no ancestors and thus has no P-card. </p> <p> A manifest may optionally have a single R-card. The R-card has a single argument which is the MD5 checksum of all files in the baseline except the manifest itself. The checksum is expressed as 32-characters of lowercase hexadecimal. The checksum is computed as follows: For each file in the baseline (except for the manifest itself) in strict sorted lexicographical order, take the pathname of the file relative to the root of the repository, append a single space (ASCII 0x20), the size of the file in ASCII decimal, a single newline character (ASCII 0x0A), and the complete text of the file. Compute the MD5 checksum of the the result. </p> <p> Each manifest has a single U-card. The argument to the U-card is the login of the user who created the manifest. The login name is encoded using the same character escapes as is used for the check-in comment argument to the C-card. </p> <p> A manifest has an option Z-card as its last line. The argument to the Z-card is a 32-character lowercase hexadecimal MD5 hash of all prior lines of the manifest up to and including the newline character that immediately preceeds the "Z". The Z-card is just a sanity check to prove that the manifest is well-formed and consistent. </p> <h2>2.0 Clusters</h2> <p> A cluster is a artifact that declares the existance of other artifacts. Clusters are used during repository synchronization to help reduce network traffic. </p> <p> Clusters follow a syntax that is very similar to manifests. A Cluster is a line-oriented text file. Newline characters (ASCII 0x0a) separate the artifact into cards. Each card begins with a single character "card type". Zero or more arguments may follow the card type. All arguments are separated from each other and from the card-type character by a single space character. There is no surplus white space between arguments and no leading or trailing whitespace except for the newline character that acts as the card separator. All cards of a cluter occur in strict sorted lexicographical order. No card may be duplicated. The cluster may not contain additional text or data beyond what is described here. Unlike manifests, clusters are never PGP signed. </p> <p> Allowed cards in the cluster are as follows: </p> <blockquote> <b>M</b> <i>uuid</i><br /> <b>Z</b> <i>checksum</i> </blockquote> <p> A cluster contains one or more "M" cards followed by a single "Z" line. Each M card has a single argument which is the UUID of another artifact in the repository. The Z card work exactly like the Z card of a manifest. The argument to the Z card is the lower-case hexadecimal representation of the MD5 checksum of all prior cards in the cluster. Note that the Z card is required on a cluster. </p> <h2>3.0 Control Artifacts</h2> <p> Control artifacts are used to assign properties to other artifacts within the repository. The basic format of a control artifact is the same as a manifest or cluster. A control artifact is a text files divided into cards by newline characters. Each card has a single-character card type followed by arguments. Spaces separate the card type and the arguments. No surplus whitespace is allowed. All cards must occur in strict lexigraphical order. </p> <p> Allowed cards in a control artifact are as follows: </p> <blockquote> <b>D</b> <i>time-and-date-stamp</i><br /> <b>T</b> (<b>+</b>|<b>-</b>|<b>*</b>)<i>tag-name uuid ?value?</i><br /> <b>Z</b> <i>checksum</i><br /> </blockquote> <p> A control artifact must have one D card and one Z card and one or more or more T cards. No other cards or other text is allowed in a control artifact. Control artifacts might be PGP clearsigned.</p> <p>The D card and the Z card of a control artifact are the same as in a manifest.</p> <p>The T card represents a "tag" or property that is applied to some other artifact. The T card has two or three values. The second argument is the 40 character lowercase UUID of the artifact to which the tag is to be applied. The first value is the tag name. The first character of the tag is either "+", "-", or "*". A "+" means the tag should be added to the artifact. The "-" means the tag should be removed. The "*" character means the tag should be added to the artifact and all direct decendants (but not branches) of the artifact down to but not including the first decendant that contains a more recent "-" tag with the same name. The optional third argument is the value of the tag. A tag without a value is a boolean.</p> <p>When two or more tags with the same name are applied to the same artifact, the tag with the latest (most recent) date is used.</p> <p>Some tags have special meaning. The "comment" tag when applied to a baseline will override the check-in comment of that baseline for display purposes.</p> <h2>4.0 Wiki Pages</h2> <p>A wiki page is an artifact with a format similar to manifests, clusters, and control artifacts. The artifact is divided into cards by newline characters. The format of each card is as in manifests, clusters, and control artifacts. Wiki artifacts accept the following card types:</p> <blockquote> <b>D</b> <i>time-and-date-stamp</i><br /> <b>L</b> <i>wiki-title</i><br /> <b>P</b> <i>parent-uuid</i>+<br /> <b>U</b> <i>user-name</i><br /> <b>W</b> <i>size</i> <b>\n</b> <i>text</i> <b>\n</b><br /> <b>Z</b> <i>checksum</i> </blockquote> <p>The D card is the date and time when the wiki page was edited. The P card specifies the parent wiki pages, if any. The L card gives the name of the wiki page. The U card specifies the login of the user who made this edit to the wiki page. The Z card is the usual checksum over the either artifact.</p> <p>The W card is used to specify the text of the wiki page. The argument to the W card is an integer which is the number of bytes of text in the wiki page. That text follows the newline character that terminates the W card. The wiki text is always followed by one extra newline.</p> <h2>5.0 Ticket Changes</h2> <p>A ticket-change artifact represents a change to a trouble ticket. The following cards are allowed on a ticket change artifact:</p> <blockquote> <b>D</b> <i>time-and-date-stamp</i><br /> <b>J</b> ?<b>+</b>?<i>name value</i><br /> <b>K</b> <i>ticket-uuid</i><br /> <b>U</b> <i>user-name</i><br /> <b>Z</b> <i>checksum</i> </blockquote> <p> The D card is the usual date and time stamp and represents the point in time when the change was entered. The U card is the login of the programmer who entered this change. The Z card is the checksum over the entire artifact.</p> <p> Every ticket has a UUID. The ticket to which this change is applied is specified by the K card. A ticket exists if it contains one or more changes. The first "change" to a ticket is what brings the ticket into existance.</p> <p> J cards specify changes to "fields" of the ticket. Each fossil server has a ticket configuration which specifies the fields its understands. This is not a limit on the fields that can appear on the J cards, however. If a J card specifies a field that a particular fossil server does not recognize, then that J card is simply ignored.</p> <p> The first argument of the J card is the field name. The second value is the field value. If the field name begins with "+" then the value is appended to the prior value. Otherwise, the value on the J card replaces any previous value of the field. The field name and value are both encoded using the character escapes defined for the C card of a manifest. </p> |
Changes to www/index.wiki.
|
| | > | > > > | | | > | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | <h1>Fossil - A Software Configuration Management System</h1> <p> This is a preliminary homepage for a new software configuration management system called "Fossil". The system is <a href="http://www.fossil-scm.org/fossil/timeline">self-hosting</a> on <a href="http://www.hwaci.com/cgi-bin/fossil/timeline">two separate servers</a>. You can download the lastest sources compile it yourself using the instructions below. Or you can grab <a href="http://www.fossil-scm.org/download.html">precompiled binaries</a>. </p> <p>Design Goals For Fossil:</p> <ul> <li>Supports disconnected, distributed development (like <a href="http://kerneltrap.org/node/4982">git</a>, <a href="http://www.monotone.ca/">monotone</a>, <a href="http://www.selenic.com/mercurial/wiki/index.cgi">mercurial</a>, or <a href="http://www.bitkeeper.com/">bitkeeper</a>) or client/server operation (like <a href="http://www.nongnu.org/cvs/">CVS</a> or <a href="http://subversion.tigris.org/">subversion</a>), or operations on local repositories, or all three at the same time</li> <li>Integrated bug tracking and wiki, along the lines of <a href="http://www.cvstrac.org/">CVSTrac</a> and <a href="http://www.edgewall.com/trac/">Trac</a>.</li> <li>Built-in web interface that supports deep archaeological digs through historical source code.</li> <li>All network communication via <a href="http://en.wikipedia.org/wiki/HTTP">HTTP</a> |
| ︙ | ︙ | |||
39 40 41 42 43 44 45 | <li>Trivial to setup and administer</li> <li>Files and versions are identified by their <a href="http://en.wikipedia.org/wiki/SHA-1">SHA1</a> signature.</a> Any unique prefix is sufficient to identify a file or version - usually the first 4 or 5 characters suffice.</li> <li>The file format is trival and requires nothing more complex than a text editor and the "sha1sum" command-line utility to decode.</li> | | | | | | | | | | | | | > > > > > > > > > > > > > > > > > > > > | 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 |
<li>Trivial to setup and administer</li>
<li>Files and versions are identified by their
<a href="http://en.wikipedia.org/wiki/SHA-1">SHA1</a> signature.</a>
Any unique prefix is sufficient to identify a file
or version - usually the first 4 or 5 characters suffice.</li>
<li>The file format is trival and requires nothing more complex
than a text editor and the "sha1sum" command-line utility to decode.</li>
<li>Automatic <a href="selfcheck.wiki">self-check</a>
on repository changes makes it exceedingly
unlikely that data will ever be lost because of a software bug.</li>
</ul>
<p>Objectives Of Fossil:</p>
<ul>
<li>Fossil should be ridiculously easy to
<a href="build.wiki">install</a> and
<a href="quickstart.wiki">operate</a>.</li>
<li>With fossil, it should be possible (and
<a href="quickstart.wiki#serversetup">easy</a>) to set up a project
on an inexpensive shared-hosting ISP
(example: <a href="http://www.he.net/hosting.html">Hurricane Electric</a>)
that provides nothing more than web space and CGI capability.
Here is <a href="http://www.hwaci.com/cgi-bin/fossil/timeline">a demo</a>.</li>
<li>Fossil should provide in-depth historical and status information about the
project through a web interface</li>
<li>The integration of <a href="http://wiki.org/wiki.cgi?WhatIsWiki">Wiki</a>
and the ability to safely support anonymous check-in are features sometimes
described as
<a href="http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html">Web 2.0</a>.
Fossil attempts to better capture "collective intelligence" and
"the wisdom of crowds" by opening up write access to the masses.</li>
</ul>
<p>Other Links:</p>
<ul>
<li>The <a href="concepts.wiki">concepts</b> behind fossil</li>
<li><a href="build.wiki">Building And Installing</a></li>
<li><a href="quickstart.wiki">Quick Start</a> guide to using fossil
<li><a href="pop.wiki">Principles Of Operation</a></li>
<li>The <a href="selfcheck.wiki">automatic self-check</a> mechanism
helps insure project integrity.</li>
<li>The <a href="fileformat.wiki">file format</a> used by every content
file stored in the repository.</li>
<li>The <a href="delta_format.wiki">format of deltas</a> used to
efficiently store changes between file revisions.</li>
<li>The <a href="delta_encoder_algorithm.html">encoder algorithm</a> used to
efficiently generate deltas.</li>
<li>The <a href="sync.wiki">synchronization protocol</a>.</li>
<li>There is a <a href="http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users">
mailing list</a> available for discussing fossil issues.</li>
<li>The <a href="http://www.fossil-scm.org/fossil/wiki">self-hosting
fossil wiki</a>, capable of hosting static pages and community-editable wiki
content.</li>
</ul>
<p>Competing Projects:</p>
<ul>
<li><a href="http://www.ditrack.org/">DITrace</a>
- A Distributed Issue Tracker</li>
<li><a href="http://www.distract.wellquite.org/">DisTract</a>
- Another distributed issue tracker based on
<a href="http://www.monotone.ca/">monotone</a>.</li>
<li><a href="http://www.monotone.ca/">Monotone</a> - distributed
SCM in a single-file executable with a single-file SQLite
database repository.</li>
</ul>
|
Added www/pop.wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 | <h1>Principles Of Operation</h1> <p> This page attempts to define the foundational principals upon which Fossil is built. </p> <ul> <li><p>A project consists of source files, wiki pages, and trouble tickets, and control files (collectively "artifacts"). All historical copies of all artifacts are saved. The project maintains an audit trail.</p></li> <li><p>A project resides in one or more repositories. Each repository is administered and operates independently of the others.</p></li> <li><p>Each repository has both global and local state. The global state is common to all repositories (or at least has the potential to be shared in common when the repositories are fully synchronized). The local state for each repository is private to that repository. The global state represents the content of the project. The local state identifies the authorized users and access policies for a particular repository.</p></li> <li><p>The global state of a repository is an unordered collection of artifacts. Each artifact is named by its SHA1 hash encoded in lowercase hexadecimal. In many contexts, the name can be abbreviated to a unique prefix. A five- or six-character prefix usually suffices to uniquely identify a file.</p></li> <li><p>Because artifacts are named by their SHA1 hash, all artifacts are immutable. Any change to the content of a artifact also changes the hash that forms the artifacts name, thus creating a new artifact. Both the old original version of the artifact and the new change are preserved under different names.</p></li> <li><p>It is theoretically possible for two artifacts with different content to share the same hash. But finding two such artifacts is so incredibly difficult and unlikely that we consider it to be an impossibility.</p></li> <li><p>The signature of an artifact is the SHA1 hash of the artifact itself, exactly as it would appear in a disk file. No prefix or meta-information about the artifact is added before computing the hash. So you can always find the SHA1 signature of a file by using the "sha1sum" command-line utility.</p></li> <li><p>The artifacts that comprise the global state of a repository are the complete global state of that repository. The SQLite database that holds the repository contains additional information about linkages between artifacts, but all of that added information can be discarded and reconstructed by rescanning the content artifacts.</p></li> <li><p>Two repositories for the same project can synchronize their global states simply by sharing artifacts. The local state of repositories is not normally synchronized or shared.</p></li> <li><p>Every baseline has a special file at the top-level named "manifest" which is an index of all other files in that baseline. The manifest is automatically created and maintained by the system.</p></li> <li><p>The <a href="fileformat.html">file formats</a> used by Fossil are all very simple so that with access to the original content files, one can easily reconstruct the content of a baseline without the need for any special tools or software.</p></li> |
Added www/quickstart.wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 |
<h1 align="center">Fossil Quick Start</h1>
<p>This is a guide to get you started using fossil quickly
and painlessly.</p>
<h2>Installing</h2><blockquote>
<p>Fossil is a single self-contained C program. You need to
either download a <a href="download.wiki">FIXME precompiled binary</a>
or <a href="build.wiki">build it yourself</a> from sources.
Install fossil by putting the fossil binary
someplace on your PATH environment variable.</p>
</blockquote>
<h2>Cloning A Existing Repository</h2>
<blockquote>
<p>Use this command:</p>
<blockquote>
<b>fossil clone</b> <i>URL repository-filename</i>
</blockquote>
<p>The <i>URL</i> above is the http URL for the fossil repository
you want to clone. You can call the new repository anything you
want - there are no naming restrictions. As an example, you can
clone the fossil repository this way:</p>
<blockquote>
<b>fossil clone http://www.fossil-scm.org/fossil myclone.fsl</b>
</blockquote>
<p>Note: If you are behind a restrictive firewall, you might need
to <a href="#proxy">specify an HTTP proxy</a> to use.</p>
</blockquote><h2>Starting A New Project</h2><blockquote>
<p>To start a new project with fossil, create a new empty repository
this way:</p>
<blockquote>
<b>fossil new </b><i> repository-filename</i>
</blockquote>
</blockquote><h2>Configuring Your Local Repository</h2><blockquote>
<p>When you create a new repository, either by cloning an existing
project or create a new project of your own, you usually want to do some
local configuration. This is accomplished using a webbrowser. First
start a fossil webserver like this:</p>
<blockquote>
<b>fossil server </b><i> repository-filename</i>
</blockquote>
<p>This creates a mini-webserver listening on port 8080. You can
specify a different port using the <b>-port</b> option on the command-line.
After the server is running, point your webbrowser at
http://localhost:8080/ and start configuring.</p>
<p>By default, fossil does not require a login for HTTP connections
coming in from the IP loopback address 127.0.0.1. You can, and perhaps
should, change this after you create a few users.</p>
<p>When you are finished configuring, just press Control-C or use
the <b>kill</b> command to shut down the mini-server.</p>
</blockquote><h2>Checking Out A Local Tree</h2><blockquote>
<p>To work on a project in fossil, you need to check out a local
copy of the source tree. Create the directory you want to be
the root of your tree and cd into that directory. Then
to this:</p>
<blockquote>
<b>fossil open </b><i> repository-filename</i>
</blockquote>
<p>This leaves you with the original (empty) version of the tree
checked out. To get to the latest version, also do this:</p>
<blockquote>
<b>fossil update</b>
</blockquote>
<p>From anywhere underneath the root of your local tree, you
can type commands like the following to find out the status of
your local tree:</p>
<blockquote>
<b>fossil info</b><br>
<b>fossil status</b><br>
<b>fossil changes</b><br>
<b>fossil timeline</b><br>
<b>fossil leaves</b><br>
<b>fossil ls</b><br>
<b>fossil branches</b><br>
</blockquote>
</blockquote><h2>Making Changes</h2><blockquote>
<p>To add new files to your project, or remove old files, use these
commands:</p>
<blockquote>
<b>fossil add</b> <i>file...</i><br>
<b>fossil rm</b> <i>file...</i>
</blockquote>
<p>You can also edit files freely. Once you are ready to commit
your changes, type:</p>
<blockquote>
<b>fossil commit</b>
</blockquote>
<p>You will be prompted for check-in comments using whatever editor
is specified by your VISUAL or EDITOR environment variable. If you
have GPG installed, you may be prompted for your GPG passphrase so
that the check-in can be signed with your GPG signature. After
this your changes will be checked in.</p>
</blockquote><h2>Sharing Changes</h2><blockquote>
<p>The changes you <b>commit</b> are only on your local repository.
To share those changes with other repositories, do:</p>
<blockquote>
<b>fossil push</b> <i>URL</i>
</blockquote>
<p>Where <i>URL</i> is the http: URL of the server repository you
want to share your changes with. If you omit the <i>URL</i> argument,
fossil will use whatever server you most recently synced with.</p>
<p>The <b>push</b> command only sends your changes to others. To
Receive changes from others, use <b>pull</b>. Or go both ways at
once using <b>sync</b>:</p>
<blockquote>
<b>fossil pull</b> <i>URL</i><br>
<b>fossil sync</b> <i>URL</i>
</blockquote>
<p>When you pull in changes from others, they go into your repository,
not into your checked-out local tree. To get the changes into your
local tree, use <b>update</b>:</p>
<blockquote>
<b>fossil update</b> <i>UUID</i>
</blockquote>
<p>The <i>UUID</i> is some unique abbreviation to the 40-character
version ID. If you omit the <i>UUID</i> fossil moves you to the
leaf version of the branch your are currently on. If your branch
has multiple leaves, you get an error - you'll have to specify the
leaf you want using a <i>UUID</i> argument.</p>
</blockquote><h2>Branching And Merging</h2><blockquote>
<p>You can create branches by doing multiple commits off of the
same base version. To merge to branches back together, first
<b>update</b> to the leaf of one branch. Then do a <b>merge</b>
of the leaf of the other branch:</p>
<blockquote>
<b>fossil merge</b> <i>UUID</i>
</blockquote>
<p>Test to make sure your merge didn't mess up the code, then
<b>commit</b> and possibly also <b>push</b> your changes. Remember
that nobody else can see your changes until you <b>commit</b> and
if other are using a different repository you will also need to
<b>push</b>.</p>
<a name="serversetup">
</blockquote><h2>Setting Up A Server</h2><blockquote>
<p>The easiest way to set up a server is:</p>
<blockquote>
<b>fossil server</b> <i>repository-filename</i>
</blockquote>
<p>You can omit the <i>repository-filename</i> if you are within
a checked-out local tree. This server uses port 8080 by default
but you can specify a different port using the <b>-port</b> command.</p>
<p>Command-line servers like this are useful when two people want
to share a repository on temporary or ad-hoc basis. For a more
permanent installation, you should use either the CGI server or the
inetd server. To use the CGI server, create a CGI script that
looks something like this:</p>
<blockquote><b>
#!/usr/local/bin/fossil<br>
repository: /home/proj1/repos1.fsl
</b></blockquote>
<p>Adjust the paths in this CGI script to match your installation.
Now point clients at the CGI script. That's all there is to it!</p>
<p>You can also run fossil off of inetd or xinetd. For an inetd
installation, make an entry in /etc/inetd.conf that looks something
like this:</p>
<blockquote><b>
80 stream tcp nowait.1000 root /usr/bin/fossil \<br>
/usr/bin/fossil http /home/proj1/repos1.fsl
</b></blockquote>
<p>Adjust the paths to suit your installation, of course. Notice that
fossil runs as root. This is not required - you can run it as an
unprivileged user. But it is more secure to run fossil as root.
When you do run fossil as root, it automatically puts itself in a
chroot jail in the same directory as the repository, then drops
root privileges prior to reading any information from the request.</p>
</blockquote><a name="proxy"></a><h2>HTTP Proxies</h2><blockquote>
<p>If you are behind a restrictive firewall that requires you to use
an HTTP proxy to reach the internet, then you can configure the proxy
in three different ways. You can tell fossil about your proxy using
a command-line option on commands that use the network,
<b>sync</b>, <b>clone</b>, <b>push</b>, and <b>pull</b>.</p>
<blockquote>
<b>fossil clone </b><i>URL</i> <b>--proxy</b> <i>Proxy-URL</i>
</blockquote>
<p>It is annoying to have to type in the proxy URL every time you
sync your project, though, so you can make the proxy configuration
persistent using the <b>setting</b> command:</p>
<blockquote>
<b>fossil setting proxy </b><i>Proxy-URL</i>
</blockquote>
<p>Or, you can set the "<b>http_proxy</b>" environment variable:</p>
<blockquote>
<b>export http_proxy=</b><i>Proxy-URL</i>
</blockquote>
<p>To stop using the proxy, do:</p>
<blockquote>
<b>fossil setting proxy off</b>
</blockquote>
<p>Or unset the environment variable. The fossil setting for the
HTTP proxy takes precedence over the environment variable and the
command-line option overrides both. If you have an persistent
proxy setting that you want to override for a one-time sync, that
is easily done on the command-line. For example, to sync with
a co-workers repository on your LAN, you might type:</p>
<blockquote>
<b>fossil sync http://192.168.1.36:8080/ --proxy off</b>
</blockquote>
</blockquote><h2>More Hints</h2><blockquote>
<p>Try these commands:</p>
<blockquote><b>
fossil help<br>
fossil test-commands
</b></blockquote>
<p>Explore and have fun!</p>
</blockquote>
|
Added www/selfcheck.wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 | <h1 align="center"> Fossil Repository Integrity Self-Checks </h1> <p> Even though fossil is a relatively new project and still contains many bugs, it is designed with features to give it a high level of integrity so that you can have confidence that you will not lose your files. This note describes the defensive measures that fossil uses to help prevent file loss due to bugs. </p> <p><i>Follow-up as of 2007-11-24:</i> Fossil has been hosting itself and several other projects for months now. Many bugs have been encountered. But, thanks in large part to the defensive measures described here, no data has been lost. The integrity checks are doing their job well.</p> <h2>Atomic Check-ins With Rollback</h2> <p> The fossil repository is an <a href="http://www.sqlite.org/">SQLite version 3</a> database file. SQLite is very mature and stable and has been in wide-spread use for many years, so we have little worries that it might cause repository corruption. SQLite databases do not corrupt even if a program or system crash or power failure occurs in the middle of the update. If some kind of crash does occur in the middle of a change, then all the changes are rolled back the next time that the database is accessed. </p> <p> A check-in operation in fossil makes many changes to the repository database. But all these changes happen within a single transaction. If something goes wrong in the middle of the commit, then the transaction is rolled back and the database is unchanged. </p> <h2>Verification Of Delta Encodings Prior To Transaction Commit</h2> <p> The content files that comprise the global state of a fossil respository are stored in the repository as a tree. The leaves of the tree are stored as zlib-compressed BLOBs. Interior nodes are deltas from their decendants. A lot of encoding is going on. There is zlib-compression which is relatively well-tested but still might cause corruption if used improperly. And there is the relatively new delta-encoding mechanism designed expressly for fossil. We want to make sure that bugs in these encoding mechanisms do not lead to loss of data. </p> <p> To increase our confidence that everything in the repository is recoverable, fossil makes sure it can extract an exact replicate of every content file that it changes just prior to transaction commit. So during the course of check-in (or other repository operation) many different files in the repository might be modified. Some files are simply compressed. Other files are delta encoded and then compressed. While all this is going on, fossil makes a record of every file that is encoded and the SHA1 hash of the original content of that file. Then just before transaction commit, fossil re-extracts the original content of all files that were written, computes the SHA1 checksum again, and verifies that the checksums match. If anything does not match up, an error message is printed and the transaction rolls back. </p> <p> So, in other words, fossil always checks to make sure it can re-extract a file before it commits a change to that file. Hence bugs in fossil are unlikely to corrupt the repository in a way that prevents us from extracting historical versions of files. </p> <h2>Checksum Over All Files In A Baseline</h2> <p> Manifest artifacts that define a baseline have two fields (the R-card and Z-card) that record MD5 hashs of the manifest itself and of all other files in the manifest. Prior to any check-in commit, these checksums are verified to ensure that the baseline checked in agrees exactly with what is on disk. Similarly, the repository checksum is verified after a checkout to make sure that the entire repository was checked out correctly. Note that these added checks use a different hash (MD5 instead of SHA1) in order to avoid common-mode failures in the hash algorithm implementation. </p> |
Added www/sync.wiki.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 |
<h1 align="center">The Fossil Sync Protocol</h1>
<p>Fossil supports commands <b>push</b>, <b>pull</b>, and <b>sync</b>
for transferring information from one repository to another. The
command is run on the client repository. A URL for the server repository
is specified as part of the command. This document describes what happens
behind the scenes in order to synchronize the information on the two
repositories.</p>
<h2>1.0 Transport</h2>
<p>All communication between client and server is via HTTP requests.
The server is listening for incoming HTTP requests. The client
issues one or more HTTP requests and receives replies for each
request.</p>
<p>The server might be running as an independent server
using the <b>server</b> command, or it might be launched from
inetd or xinetd using the <b>http</b> command. Or the server might
be launched from CGI. The details of how the server is configured
to "listen" for incoming HTTP requests is immaterial. The important
point is that the server is listening for requests and the client
is the issuer of the requests.</p>
<p>A single push, pull, or sync might involve multiple HTTP requests.
The client maintains state between all requests. But on the server
side, each request is independent. The server does not preserve
any information about the client from one request to the next.</p>
<h3>1.1 Server Identification</h3>
<p>The server is identified by a URL argument that accompanies the
push, pull, or sync command on the client. (As a convenience to
users, the URL can be omitted on the client command and the same URL
from the most recent push, pull, or sync will be reused. This saves
typing in the common case where the client does multiple syncs to
the same server.)</p>
<p>The client modifies the URL by appending the method name "<b>/xfer</b>"
to the end. For example, if the URL specified on the client command
line is</p>
<blockquote>
http://fossil-scm.hwaci.com/fossil
</blockquote>
<p>Then the URL that is really used to do the synchronization will
be:</p>
<blockquote>
http://fossil-scm.hwaci.com/fossil/xfer
</blockquote>
<h3>1.2 HTTP Request Format</h3>
<p>The client always sends a POST request to the server. The
general format of the POST request is as follows:</p>
<blockquote><pre>
POST /fossil/xfer HTTP/1.0
Host: fossil-scm.hwaci.com:80
Content-Type: application/x-fossil
Content-Length: 4216
<i>content...</i>
</pre></blockquote>
<p>In the example above, the pathname given after the POST keyword
on the first line is a copy of the URL pathname. The Host: parameter
is also taken from the URL. The content type is always either
"application/x-fossil" or "application/x-fossil-debug". The "x-fossil"
content type is the default. The only difference is that "x-fossil"
content is compressed using zlib whereas "x-fossil-debug" is sent
uncompressed.</p>
<p>A typical reply from the server might look something like this:</p>
<blockquote><pre>
HTTP/1.0 200 OK
Date: Mon, 10 Sep 2007 12:21:01 GMT
Connection: close
Cache-control: private
Content-Type: application/x-fossil; charset=US-ASCII
Content-Length: 265
<i>content...</i>
</pre></blockquote>
<p>The content type of the reply is always the same as the content type
of the request.</p>
<h2>2.0 Fossil Synchronization Content</h2>
<p>A synchronization request between a client and server consists of
one or more HTTP requests as described in the previous section. This
section details the "x-fossil" content type.</p>
<h3>2.1 Line-oriented Format</h3>
<p>The x-fossil content type consists of zero or more "cards". Cards
are separate by the newline character ("\n"). Leading and trailing
whitespace on a card is ignored. Blank cards are ignored.</p>
<p>Each card is divided into zero or more space separated tokens.
The first token on each card is the operator. Subsequent tokens
are arguments. The set of operators understood by servers is slightly
different from the operators understood by clients, though the two
are very similar.</p>
<h3>2.2 Login Cards</h3>
<p>Every message from client to server begins with one or more login
cards. Each login card has the following format:</p>
<blockquote>
<b>login</b> <i>userid nonce signature</i>
</blockquote>
<p>The userid is the name of the user that is requesting service
from the server. The nonce is the SHA1 hash of the remainder of
the message - all text that follows the newline character that
terminates the login card. The signature is the SHA1 hash of
the concatenation of the nonce and the users password.</p>
<p>For each login card, the server looks up the user and verifies
that the nonce matches the SHA1 hash of the remainder of the
message. It then checks the signature hash to make sure the
signature matches. If everything
checks out, then the client is granted all privileges of the
specified user.</p>
<p>Privileges are cumulative. There can be multiple successful
login cards. The session privileges are the bit-wise OR of the
privileges of each individual login.</p>
<h3>2.3 File Cards</h3>
<p>Repository content records or files are transferred using
a "file" card. File cards come in two different formats depending
on whether the file is sent directly or as a delta from some
other file.</p>
<blockquote>
<b>file</b> <i>uuid size</i> <b>\n</b> <i>content</i><br>
<b>file</b> <i>uuid delta-uuid size</i> <b>\n</b> <i>content</i>
</blockquote>
<p>File cards are different from all other cards in that they
followed by in-line "payload" data. The content of the file
or the file delta consists of the first <i>size</i> bytes of the
x-fossil content that immediately follow the newline that
terminates the file card. No other cards have this characteristic.
</p>
<p>The first argument of a file card is the UUID of the file that
is being transferred. The UUID is the lower-case hexadecimal
representation of the SHA1 hash of the entire file content.
The last argument of the file card is the number of bytes of
payload that immediately follow the file card. If the file
card has only two arguments, that means the payload is the
complete content of the file. If the file card has three
arguments, then the payload is a delta and second argument is
the UUID of another file that is the source of the delta.</p>
<p>File cards are sent in both directions: client to server and
server to client. A delta might be sent before the source of
the delta, so both client and server should remember deltas
and be able to apply them when their source arrives.</p>
<h3>2.4 Push and Pull Cards</h3>
<p>Among of the first cards in a client-to-server message are
the push and pull cards. The push card tell the server that
the client is pushing content. The pull card tell the server
that the client wants to pull content. In the event of a sync,
both cards are sent. The format is as follows:</p>
<blockquote>
<b>push</b> <i>servercode projectcode</i><br>
<b>pull</b> <i>servercode projectcode</i>
</blockquote>
<p>The <i>servercode</i> argument is the repository ID for the
client. The server will only allow the transaction to proceed
if the servercode is different from its own servercode. This
prevents a sync-loop. The <i>projectcode</i> is the identifier
of the software project that the client repository contains.
The projectcode for the client and server must match in order
for the transaction to proceed.</p>
<p>The server will also send a push card back to the client
during a clone. This is how the client determines what project
code to put in the new repository it is constructing.</p>
<h3>2.5 Clone Cards</h3>
<p>A clone card works like a pull card in that it is sent from
client to server in order to tell the server that the client
wants to pull content. But unlike the pull card, the clone
card has no arguments.</p>
<blockquote>
<b>clone</b>
</blockquote>
<p>In response to a clone message, the server also sends the client
a push message so that the client can discover the projectcode for
this project.</p>
<h3>2.6 Igot Cards</h3>
<p>An igot card can be sent from either client to server or from
server to client in order to indicate that the sender holds a copy
of a particular file. The format is:</p>
<blockquote>
<b>igot</b> <i>uuid</i>
</blockquote>
<p>The argument of the igot card is the UUID of the file that
the sender possesses.
The receiver of an igot card will typically check to see if
it also holds the same file and if not it will request the file
using a gimme card in either the reply or in the next message.</p>
<h3>2.7 Gimme Cards</h3>
<p>A gimme card is sent from either client to server or from server
to client. The gimme card asks the receiver to send a particular
file back to the sender. The format of a gimme card is this:</p>
<blockquote>
<b>gimme</b> <i>uuid</i>
</blockquote>
<p>The argument to the gimme card is the UUID of the file that
the sender wants. The receiver will typically respond to a
gimme card by sending a file card in its reply or in the next
message.</p>
<h3>2.8 Cookie Cards</h3>
<p>A cookie card can be used by a server to record a small amount
of state information on a client. The server sends a cookie to the
client. The client sends the same cookie back to the server on
its next request. The cookie card has a single argument which
is its payload.</p>
<blockquote>
<b>cookie</b> <i>payload</i>
</blockquote>
<p>The client is not required to return the cookie to the server on
its next request. Or the client might send a cookie from a different
server on the next request. So the server must not depend on the
cookie and the server must structure the cookie payload in such
a way that it can tell if the cookie it sees is its own cookie or
a cookie from another server. (Typically the server will embed
its servercode as part of the cookie.)</p>
<h3>2.9 Error Cards</h3>
<p>If the server discovers anything wrong with a request, it generates
an error card in its reply. When the client sees the error card,
it displays an error message to the user and aborts the sync
operation. An error card looks like this:</p>
<blockquote>
<b>error</b> <i>error-message</i>
</blockquote>
<p>The error message is English text that is encoded in order to
be a single token.
A space (ASCII 0x20) is represented as "\s" (ASCII 0x5C, 0x73). A
newline (ASCII 0x0a) is "\n" (ASCII 0x6C, x6E). A backslash
(ASCII 0x5C) is represented as two backslashes "\\". Apart from
space and newline, no other whitespace characters nor any
unprintable characters are allowed in
the error message.</p>
<h3>2.10 Unknown Cards</h3>
<p>If either the client or the server sees a card that is not
described above, then it generates an error and aborts.</p>
<h2>3.0 Phantoms And Clusters</h2>
<p>When a repository knows that a file exists and knows the UUID of
that file, but it does not know the file content, then it stores that
file as a "phantom". A repository will typically create a phantom when
it receives an igot card for a file that it does not hold or when it
receives a file card that references a delta source that it does not
hold. When a server is generating its reply or when a client is
generating a new request, it will usually send gimme cards for every
phantom that it holds.</p>
<p>A cluster is a special file that tells of the existence of other
files. Any file in the repository that follows the syntactic rules
of a cluster is considered a cluster.</p>
<p>A cluster is a line oriented file. Each line of a cluster
is a card. The cards are separated by the newline ("\n") character.
Each card consists of a single character card type, a space, and a
single argument. No extra whitespace and no trailing or leading
whitespace is allowed. All cards in the cluster must occur in
strict lexicographical order.</p>
<p>A cluster consists of one or more "M" cards followed by a single
"Z" card. Each M card holds an argument which is a UUID for a file
in the repository. The Z card has a single argument which is the
lower-case hexadecimal representation of the MD5 checksum of all
preceding M cards up to and included the newline character that
occurred just before the Z that starts the Z card.</p>
<p>Any file that does not match the specifications of a cluster
exactly is not a cluster. There must be no extra whitespace in
the file. There must be one or more M cards. There must be a
single Z card with a correct MD5 checksum. And all cards must
be in strict lexicographical order.</p>
<h3>3.1 The Unclustered Table</h3>
<p>Every repository maintains a table named "<b>unclustered</b>"
which records the identity of every file and phantom it holds that is not
mentioned in a cluster. The entries in the unclustered table can
be thought of as leaves on a tree of files. Some of the unclustered
files will be clusters. Those clusters may contain other clusters,
which might contain still more clusters, and so forth. Beginning
with the files in the unclustered table, one can follow the chain
of clusters to find every file in the repository.</p>
<h2>4.0 Synchronization Strategies</h2>
<h3>4.1 Pull</h3>
<p>A typical pull operation proceeds as shown below. Details
of the actual implementation may very slightly but the gist of
a pull is captured in the following steps:</p>
<ol>
<li>The client sends login and pull cards.
<li>The client sends a cookie card if it has previously received a cookie.
<li>The client sends gimme cards for every phantom that it holds.
<hr>
<li>The server checks the login password and rejects the session if
the user does not have permission to pull.
<li>If the number entries in the unclustered table on the server is
greater than 100, then the server constructs a new cluster file to
cover all those unclustered entries.
<li>The server sends file cards for every gimme card it received
from the client.
<li>The server sends ihave cards for every file in its unclustered
table that is not a phantom.
<hr>
<li>The client adds the content of file cards to its repository.
<li>The client creates a phantom for every ihave card in the server reply
that mentions a file that the client does not possess.
<li>The client creates a phantom for the delta source of file cards when
the delta source is a file that the client does not possess.
</ol>
<p>These ten steps represent a single HTTP round-trip request.
The first three steps are the processing that occurs on the client
to generate the request. The middle four steps are processing
that occurs on the server to interpret the request and generate a
reply. And the last three steps are the processing that the
client does to interpret the reply.</p>
<p>During a pull, the client will keep sending HTTP requests
until it holds all files that exist on the server.</p>
<p>Note that the server tries
to limit the size of its reply message to something reasonable
(usually about 1MB) so that it might stop sending file cards as
described in step (6) if the reply becomes too large.</p>
<p>Step (5) is the only way in which new clusters can be created.
By only creating clusters on the server, we hope to minimize the
amount of overlap between clusters in the common configuration where
there is a single server and many clients. The same synchronization
protocol will continue to work even if there are multiple servers
or if servers and clients sometimes change roles. The only negative
effects of these unusual arrangements is that more than the minimum
number of clusters might be generated.</p>
<h3>4.2 Push</h3>
<p>A typical push operation proceeds roughly as shown below. As
with a pull, the actual implementation may vary slightly.</p>
<ol>
<li>The client sends login and push cards.
<li>The client sends file cards for any files that it holds that have
never before been pushed - files that come from local check-ins.
<li>If this is the second or later cycle in a push, then the
client sends file cards for any gimme cards that the server sent
in the previous cycle.
<li>The client sends igot cards for every file in its unclustered table
that is not a phantom.
<hr>
<li>The server checks the login and push cards and issues an error if
anything is amiss.
<li>The server accepts file cards from the client and adds those files
to its repository.
<li>The server creates phantoms for igot cards that mention files it
does not possess or for file cards that mention delta source files that
it does not possess.
<li>The server issues gimme cards for all phantoms.
<hr>
<li>The client remembers the gimme cards from the server so that it
can generate file cards in reply on the next cycle.
</ol>
<p>As with a pull, the steps of a push operation repeat until the
server knows all files that exist on the client. Also, as with
pull, the client attempts to keep the size of the request from
growing too large by suppressing file cards once the
size of the request reaches 1MB.</p>
<h3>4.3 Sync</h3>
<p>A sync is just a pull and a push that happen at the same time.
The first three steps of a pull are combined with the first five steps
of a push. Steps (4) through (7) of a pull are combined with steps
(5) through (8) of a push. And steps (8) through (10) of a pull
are combined with step (9) of a push.</p>
<h2>5.0 Summary</h2>
<p>Here are the key points of the synchronization protocol:</p>
<ol>
<li>The client sends one or more PUSH HTTP requests to the server.
The request and reply content type is "application/x-fossil".
<li>HTTP request content is compressed using zlib.
<li>The content of request and reply consists of cards with one
card per line.
<li>Card formats are:
<ul>
<li> <b>login</b> <i>userid nonce signature</i>
<li> <b>push</b> <i>servercode projectcode</i>
<li> <b>pull</b> <i>servercode projectcode</i>
<li> <b>clone</b>
<li> <b>file</b> <i>uuid size</i> <b>\n</b> <i>content</i>
<li> <b>file</b> <i>uuid delta-uuid size</i> <b>\n</b> <i>content</i>
<li> <b>igot</b> <i>uuid</i>
<li> <b>gimme</b> <i>uuid</i>
<li> <b>cookie</b> <i>cookie-text</i>
<li> <b>error</b> <i>error-message</i>
</ul>
<li>Phantoms are files that a repository knows exist but does not possess.
<li>Clusters are files that contain the UUIDs of other files.
<li>Clusters are created automatically on the server during a pull.
<li>Repositories keep track of all files that are not named in any
cluster and send igot messages for those files.
<li>Repositories keep track of all the phantoms they hold and send
gimme messages for those files.
</ol>
|