Fossil: Artifact [7086f1ead6]

Artifact 7086f1ead626268f41b7759fe4d4a15823ceefc6236661ba8aeeb5cd0138bb36:

Wiki page [Signing and verification of artifacts] by george 2022-06-06 17:19:56.
D 2022-06-06T17:19:56.723
L Signing\sand\sverification\sof\sartifacts
N text/x-markdown
P 54f4b558770904f6b8c6d8212ea1614d896560a4cc0fb12240bda19c4f16c702
U george
W 18064
This document aims to bring about more ubiquitous,
seamless and useful signing and verification of artifacts.

**This is a draft!**  
It is incomplete.
It sketches out a few possible solutions.
These solutions try to balance flexibility and complexity.

<a id="toc"></a>
Table of contents:

 * [Agenda and context](#context)
 * [Identity model](#identity)
 * [Auxiliary definitions](#defs)
 * [Trust model](#trust)

<a id="context"></a>
Agenda and context
==================

The main point is to enable strong authenticity not just for
"[data in transit][]" but also for "[data at rest][]".
This would enable some interesting features:

 * Trust could be decoupled from centralized [CA][]s
   (reliance on which is nearly inevitable for [TLS][]).
    
 * Trust could be decoupled from online-managed secret keys
   (as the "[cold wallets][]" of some cryptocurrencies do).
   
 * Trust could be maintained even when there is a need
   to use (as yet) unconventional transports such as
   [GNUnet](https://gnunet.org),
   [Freenet](https://freenetproject.org),
   [IPFS](https://ipfs.io),
   [Dat](https://datprotocol.github.io/how-dat-works/),
   [NNCP](http://www.nncpgo.org),
   [Pigeon post](https://en.wikipedia.org/wiki/Pigeon_post),
   [Sneakernet](https://en.wikipedia.org/wiki/Sneakernet)

 * If a [Fossil repository is repurposed as a document]
   (forum:/forumpost/2ac0171524104616)
   then this document gets digital signatures "for free".

The above would make Fossil a robust *distributed* system that
by design cannot be surpassed by *any* "server-based" service
(e.g. [GitHub](forum:/forumpost/37617508a6e893a9) and the like).

The idea is not new. [Monotone](https://www.monotone.ca)
(which is [a kind of ancestor of Fossil](forum:/forumpost/ec56963bc602a700de))
automatically signs every commit.
But Monotone seems to be orphaned and does not support
all the goodies provided by Fossil
(like customizable WebUI and Tickets, Wiki, Forum and so on).  
The optimal way to implement the feature in Fossil is not obvious,
so lengthy discussions about the details can easily be anticipated.

Some related topics have already arisen at the Forum:

 * [I2P anonymous protocol want to use Fossil](forum:/forumpost/98b1ca585cfa1f14d329)
 * [Pull requests](forum:/forumpost/ae37ac8428518bb1d7e95300279b84824d0762b58)
 * [Private messages, private keys, and beyond](forum:/forumpost/dbe4ea86a863517e4c7c)
 * [Guest logins](forum:/forumpost/9a32a4d643fda22c7634522ce90f36c49ba464fb7)
 * [Rationale of PGP signing disabled by default](forum:/forumpost/4873706c64745ea4e25d)
 * [Public-key authentication design](forum:/forumpost/c58d4f5de9bb6d)
 * [Toward more-useful commit signing](forum:/forumpost/1edaf0bfea6bd0)
 * [Please reserve a place within structural artifacts for
   Fossil-managed digital signatures](forum:/forumpost/0e84c4bf331c624b)

<a id="noteworthy"></a>
Some more recent noteworthy opinions on the topic:

[**ravbc** on 2020-10-22](forum:/forumpost/ab15bd8812b8def8):

> IMHO, there is no easy escape from distributing
> public keys within a repository

[**offray** on 2020-10-23](forum:/forumpost/84ab1f8aa255529b):

> I really like the idea of having public keys uploaded to the repository
> and signed by others in it.

[**wyoung** on 2020-12-13](forum:/forumpost/c6ebe5cccd650b5e):

> All you can do is establish a PKI standard
> within the set of repos you do control.

[**wyoung** on 2021-09-18](forum:/forumpost/f78af1c7407f47ce):

> It is quite unlikely that your Fossil server has a wild assortment of PGP keys

[**george** on 2022-05-29](forum:/forumpost/0e84c4bf331c624b):

> Fossil 2.19 should accept [structural artifacts][struct] with signatures
> in some prominent (yet undecided) format...

[data in transit]: https://en.wikipedia.org/wiki/Data_in_transit
[data at rest]: https://en.wikipedia.org/wiki/Data_at_rest
[TLS]: https://en.wikipedia.org/wiki/Transport_Layer_Security
[CA]: https://en.wikipedia.org/wiki/Certificate_authority
[cold wallets]: https://en.wikipedia.org/wiki/Cold_wallet
[struct]: /doc/2022-05-28/www/fileformat.wiki#structural

<a id="identity"></a>
Identity model
==============

**Identity** is a cryptographically sound avatar of a human being.
[Identity is distinguished by the public key of its **main keypair**]
(^
  This is essential. It enables the same identity to participate in
  different projects, even though the owner of that identity previously was 
  registered and is participating in these projects under different UserIDs.
  See also [forum post ae37ac84285](forum:/forumpost/ae37ac8428518bb).
),
which is referred to either directly (for signature schemes with short
public keys, such as Ed25519) or through its hash
(for signature schemes with long public keys, such as Ed448 and RSA).
In both cases a [human-friendly variant of base32 encoding][^base32]
is used in order to prevent confusion with artifacts' UUIDs and also to
facilitate verbal transfers (in the context of signing parties and the like).

An identity does not expire, but it can be explicitly **abrogated**.
An identity's *main key* may be used to claim that it was *compromised*
or [intentionally *destroyed*.][^indestroy]
The *main key* may also be used to declare a [**trusted revoker**][^revoker]:
a public key that is authorized to claim that the identity's *main key*
is *lost*, *destroyed* or *compromised*.
A *lost* claim may be reversed using the identity's *main key*, whereas
in the latter two cases the whole identity is permanently *abrogated*.
A *trusted revoker* may be a key that is under exclusive control of
identity's owner or may be a *main key* of some other identity.
In both cases authorization of the *trusted revoker* may have an expiration
time set and may also be limited to just some of the claims (for example,
only "*lost*" and "*destroyed*" claims may be authorized).
A *trusted revoker* need not be public unless it is used.
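
A minimal sketch of how the authorization of a *trusted revoker* might be represented, assuming a hypothetical `RevokerCertificate` record. The class, field and function names are illustrative only and do not describe an existing Fossil format.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import FrozenSet, Optional


class KeyClaim(Enum):
    """The three claims a revoker may make about a main key."""
    LOST = "lost"
    DESTROYED = "destroyed"
    COMPROMISED = "compromised"


@dataclass(frozen=True)
class RevokerCertificate:
    """Hypothetical record signed by the identity's main key."""
    revoker_key: str                     # public key of the trusted revoker
    allowed: FrozenSet[KeyClaim]         # claims the revoker may make
    expires: Optional[datetime] = None   # None means no expiration time

    def authorizes(self, claim: KeyClaim, when: datetime) -> bool:
        """True if the revoker may make `claim` at time `when`."""
        if self.expires is not None and when >= self.expires:
            return False
        return claim in self.allowed


def abrogates(claim: KeyClaim) -> bool:
    """'destroyed' and 'compromised' abrogate the identity for good;
    'lost' does not."""
    return claim in (KeyClaim.DESTROYED, KeyClaim.COMPROMISED)
```

For example, a certificate limited to `LOST` and `DESTROYED` rejects a `COMPROMISED` claim from the same revoker, and rejects every claim once its expiration time has passed.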

A set of projects that are relevant for a particular identity
will be denoted as identity's **context**.

Identity's *context* is partitioned into *workspaces*. This means
that a **workspace** is a subset of projects relevant for that identity,
and that at any moment of time any two *workspaces* do not overlap.
However, projects may be added to or removed from *workspaces* over time.
Identity's *context* may consist of just a single *workspace*.
Similarly a *workspace* may consist of just a single project.

A person may have just one identity or may choose to maintain
several identities (perhaps with different organization of workspaces).
[It is advisable to have as few identities as is reasonable]
(^
  The underlying conjecture is that this should help to improve
  the connectedness of the global [Web of Trust][WoT].
);
this model tries to be sufficiently flexible in order to permit that.

**Workspace subkeys** are used for general-purpose signing of structural
artifacts (check-ins, posts, ticket changes, wiki edits etc.).
Each *workspace subkey* is limited in scope to a particular *workspace*
and must be neither used nor propagated outside of that *workspace*.
Each *workspace subkey* has an expiration date set.
(^
  Whenever a *workspace subkey* is introduced, prolonged or rotated
  there is an upper bound for the eligible lifespan.
  The exact optimal lifespan depends on the *workspace*.
  A lifespan of 14 months is suggested as a hard-coded maximum.
)
A *workspace subkey* may be used to revoke itself.  
In the following a "**subkey**" or "**work key**" means
a short form of "*workspace subkey*".

A particular identity at any moment of time may have just one
active *workspace subkey* within any *workspace*.
In other words, several *workspace subkeys* of a particular identity
must not be used simultaneously within any project.
If such a clash is observed then the identity should be
treated as misbehaving and suspicious.
(^
  It may be tempting to allow several simultaneous *workspace subkeys*
  within a project. In that case different devices could use
  a dedicated *workspace subkey*.
  Thus if a leak of the corresponding secret key should occur then it would
  be possible to identify (and fix) the device that permitted that leak.
  However, it looks like a significant complication of the model,
  which for the time being seems neither necessary nor desirable.
)
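
The two subkey rules above (a bounded lifespan and at most one active subkey per *workspace*) can be checked mechanically. The sketch below assumes a hypothetical `Subkey` record whose `ends` field is the effective end of use (expiry, or an earlier revocation or rotation). None of these names come from Fossil itself.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations
from typing import List, Tuple

MAX_LIFESPAN_MONTHS = 14  # the hard-coded maximum suggested in the text


@dataclass(frozen=True)
class Subkey:
    key_id: str           # hypothetical identifier of the subkey
    workspace: str        # hypothetical workspace name
    introduced: datetime  # start of use
    ends: datetime        # expiry, or earlier revocation/rotation


def latest_allowed_expiry(introduced: datetime) -> datetime:
    """Return `introduced` plus MAX_LIFESPAN_MONTHS calendar months."""
    month_index = introduced.month - 1 + MAX_LIFESPAN_MONTHS
    # The day is clamped to 28 so that the result is always a valid date.
    return introduced.replace(
        year=introduced.year + month_index // 12,
        month=month_index % 12 + 1,
        day=min(introduced.day, 28),
    )


def lifespan_ok(key: Subkey) -> bool:
    """True if the subkey ends after it starts and within the maximum."""
    return key.introduced < key.ends <= latest_allowed_expiry(key.introduced)


def find_clashes(keys: List[Subkey]) -> List[Tuple[str, str]]:
    """Return pairs of subkeys that are active at once in one workspace."""
    clashes = []
    for a, b in combinations(keys, 2):
        same_workspace = a.workspace == b.workspace
        overlap = a.introduced < b.ends and b.introduced < a.ends
        if same_workspace and overlap:
            clashes.append((a.key_id, b.key_id))
    return clashes
```

A non-empty result from `find_clashes` for the subkeys of one identity corresponds to the "misbehaving and suspicious" case described above.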

It is assumed that the safety of the *main key* is maintained
on a higher level than the safety of the *workspace subkeys*;
and that safety of *trusted revoker(s)* (if any) is somewhere in between.

[WoT]: https://en.wikipedia.org/wiki/Web_of_trust

<a id="defs"></a>
Auxiliary definitions
=====================

 * **Key**  
   Either a secret or a public key, depending on the context.

 * **Owner** of a key  
   A person who generated a keypair (presumably in a secure environment).

 * **Signed artifact**  
   A structural artifact with a cryptographically valid digital signature.
   
 * **Legitimate** artifact  
   A signed artifact that was created according to the conscious desire
   of a key's owner.

 * **Counterfeited** artifact  
   A signed artifact that was created without conscious consent
   or desire of a key's owner.

 * **Leak** of a key  
   A copy of a secret key is or has been accessible to someone other than
   the owner (who may remain unaware of that).

 * Key **compromise**  
   Probability of a key's leak is not negligible.

 * Key is **lost**  
   Owner is unable to retrieve a copy of a secret key.
   Usual reasons include the loss of the storage medium, a forgotten passphrase,
   or the inability to gather enough shares of a distributed secret.

 * Key is **destroyed**  
   It is guaranteed that neither owner nor anybody else will ever be able
   to retrieve a copy of a secret key.
   A breakthrough in cryptanalysis (for example, the discrete-log problem
   being broken) doesn't count as "retrieving".

 * **Claim**  
   A proposition signed by an identity's *main key*. The identity that
   makes (signs) a claim will be referred to as the claim's **source**.

 * **Unitary claim**  
	Is one of the following:  
	
	 * introduction, expiration, prolongation, rotation or revocation
	   of identity's *workspace subkey*;

	 * *abrogation* of the *source* by itself;

	 * certificate of the *trusted revoker*.

 * **[Binary][^] claim**  
	Represents a quantified proposition about some other identity;
	this other identity will be referred to as
	*claim*'s [**destination**][^destination].  
	*Abrogation* of the *destination* by a *trusted revoker*
	may be viewed as [a special case of a *binary claim*][^sbc],
	provided that a *trusted revoker* is equal to the *source*.

 * Identity is **inhibited**  
   Owner may be unable to create or to propagate the full set of legitimate
   artifacts; this may be caused by the  
    1. loss of secret key(s)
    1. lack of infrastructure
    1. blackmail
    1. gag order
    1. health conditions
    1. owner's death

 * Identity is **disconnected**  
   Identity is *inhibited* or the owner may be unable to *receive* the full set
   of artifacts generated within all projects relevant to that identity.

 * Identity is **disintegrated**  
   *Counterfeiting* of artifacts signed by identity's *workspace subkey*
   either has already occurred or is anticipated.

 * Identity is **stolen**  
   Loss of exclusive control over the identity's *main key*.

<a id="trust"></a>
Trust model
===========

The system should try to answer the **ultimate inquiry** from a user:  

> Is this particular signed artifact a *legitimate* one or *counterfeited*?

The answer to this question is guaranteed to be "it is legitimate"
if and only if at the moment when that signed artifact was created

 * identity wasn't *inhibited* and
 * identity had exclusive control over the corresponding secret key

The reality is more complicated because often there is a bit of uncertainty.
The system should derive a probabilistic answer based on the estimates
of probabilities for the values of the above predicates.

That calculation might use reasoning about the possible [temporal][^] sequence
of events and also the *claims* from identities within the relevant project(s).

Propositions within *binary claims* fall into one of three categories:

 1. **Connectedness**  
    This is quantified as **ERL** (short for *expected response lag*)
    which estimates the typical duration of information [roundtrips][^].  
    It sums up durations that are needed for
     
     * *workspace*'s new information to reach *claim*'s *destination*,
     * *destination* to understand this information and prepare a response,
     * that response to reach a substantial part of *workspace*'s participants.
    
 1. **Integrity**  
    Encapsulates safety of a particular *workspace subkey*
    and also willingness of the *destination*'s owner to revoke or rotate
    a *subkey* immediately upon the discovery of *key compromise*.  
    
    This is quantified as a transient probability that a signed artifact
    is *legitimate*. It is a triple of scalars, where each scalar
    estimates aforementioned probability for a certain moment:

     * right after a signed artifact has been received,
     * a moment that is [two *ERL*s later][^when],
     * a moment that is [five *ERL*s after an artifact was received][^when]

    If the corresponding *workspace subkey* is revoked then
    all these probabilities are invalidated.
 
 1. **Trustworthiness**  
    Estimate of the trust that the *source* puts into *binary claims*
    signed by the *destination*.  
    
    This is quantified as an integer in the range `[-3;+3]` which
    represents a bias of a *claim*'s *destination* relative to its *source*
    on the abstract axis "*trustworthiness*".
    This abstract axis encapsulates and integrates three very different
    characteristics of a human being:

    * safety  
      — ability to prevent *counterfeiting* of *claims*
        (through a leak of the *main key* in particular);  
        &emsp; this aggregates
        
        * severity of threats
        * willingness to resist
        * resources for defense (such as skills, laws, money, etc.)
    
    * perspicacity  
      — ability to deduce the truth;
        about other *identities* in particular.
   
    * honesty  
      — intolerance to the falsity of one's own propositions;
        one's own *claims* in particular.

    The integer values of 0, ±1, ±2 and ±3 may be interpreted as
    "same", "slightly", "noticeably" and "much" respectively.

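The *integrity* triple and the *ERL*-based timing described above can be illustrated with a short sketch. Two assumptions are made here that the draft does not fix: the response delay follows an Erlang-2 distribution whose mean is one *ERL* (this reproduces the 91% figure from the footnote), and the probability between the three sampled moments is approximated by linear interpolation. The function names are hypothetical.

```python
import math
from typing import Tuple


def erlang2_cdf(t_in_erl: float) -> float:
    """P(response received within t), with t measured in ERLs.

    Assumes an Erlang-2 delay with a mean of one ERL, i.e. rate = 2 / ERL.
    """
    if t_in_erl <= 0:
        return 0.0
    x = 2.0 * t_in_erl  # rate * t
    return 1.0 - math.exp(-x) * (1.0 + x)


def integrity_at(triple: Tuple[float, float, float], t_in_erl: float) -> float:
    """Approximate the probability of legitimacy at time t (in ERLs).

    `triple` holds the estimates for t = 0, t = 2 ERL and t = 5 ERL.
    Linear interpolation between them is an assumption of this sketch.
    """
    p0, p2, p5 = triple
    if t_in_erl <= 0:
        return p0
    if t_in_erl <= 2:
        return p0 + (p2 - p0) * t_in_erl / 2.0
    if t_in_erl <= 5:
        return p2 + (p5 - p2) * (t_in_erl - 2.0) / 3.0
    return p5
```

Under the same assumptions `erlang2_cdf(5)` is roughly 0.9995, which is one way to motivate the choice of five *ERL*s for the last element of the triple.
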
A *claim* with a proposition about *trustworthiness* will be referred to as
a **t-claim**. A *t-claim* is propagated to all
projects that are relevant for both the *source* and the *destination*.
*T-claims* form a global "social graph".

A *claim* with propositions about *connectedness* and *integrity*
will be referred to as a **ci-claim**.
A *ci-claim* is propagated to all projects that 

 * are relevant for both ends of the *claim*, and that
 * belong to the corresponding *workspace*
   (the one that the propositions are about).

For a given signed artifact it is possible to estimate its *legitimacy*
provided that the "social graph" contains a path from the identity that makes
an inquiry to the identity that signed that artifact.  
The probability that a signed artifact is *legitimate* may be computed for
an arbitrary moment in time as a weighted average of approximated *integrities*
from the available *ci-claim*s.  
The aforementioned weights are derived from the *t-claims*
using a computation over the underlying "social graph". This computation
starts from the identity who makes an *inquiry* and computes weights of
other identities in a [BFS-like][^BFS] manner, until the author of the
artifact is reached.
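
One possible shape of that computation is sketched below. The draft does not specify how weights are derived from *t-claims*, so the rule used here is purely an assumption: a bias in `[-3;+3]` is mapped linearly onto a factor in `[0, 1]`, and an identity inherits the weight of the identity that first reaches it, multiplied by that factor. All names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# source identity -> {destination identity: bias in [-3, +3]}
TClaims = Dict[str, Dict[str, int]]


def bias_to_factor(bias: int) -> float:
    """Assumed mapping: -3 -> 0.0, 0 -> 0.5, +3 -> 1.0."""
    return (bias + 3) / 6.0


def bfs_weights(t_claims: TClaims, inquirer: str, author: str) -> Dict[str, float]:
    """Breadth-first walk over t-claims, starting from the inquirer.

    Stops expanding once the author is taken from the queue.
    """
    weights = {inquirer: 1.0}
    queue = deque([inquirer])
    while queue:
        source = queue.popleft()
        if source == author:
            break
        for dest, bias in t_claims.get(source, {}).items():
            if dest not in weights:  # first discovery wins
                weights[dest] = weights[source] * bias_to_factor(bias)
                queue.append(dest)
    return weights


def legitimacy(t_claims: TClaims,
               ci_integrities: List[Tuple[str, float]],
               inquirer: str,
               author: str) -> Optional[float]:
    """Weighted average of integrities claimed about the author's subkey.

    `ci_integrities` holds (source identity, integrity estimate) pairs.
    Returns None when no path to the author exists or no source has weight.
    """
    weights = bfs_weights(t_claims, inquirer, author)
    if author not in weights:
        return None
    total = sum(weights.get(src, 0.0) for src, _ in ci_integrities)
    if total == 0:
        return None
    weighted = sum(weights.get(src, 0.0) * p for src, p in ci_integrities)
    return weighted / total
```

With the hypothetical graph `{"alice": {"bob": 3, "carol": 0}, "bob": {"dave": 0}, "carol": {"dave": 3}}`, an inquiry by `alice` about an artifact signed by `dave` gives `bob` twice the weight of `carol`, so their *ci-claims* about `dave` are averaged in a 2:1 ratio.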

<a id="footnotes"></a>
Footnotes
=========

[^base32]:
  Something like [Crockford's Base32
  encoding](https://en.wikipedia.org/wiki/Base32#Crockford.27s_Base32).

[^revoker]:
  It is not yet clear which word is more appropriate: "trusted" or "designated".

[^indestroy]:
  This is a bit speculative because the signing of the
  "intentionally destroyed" *claim* has to precede
  the actual destruction of the last copy of a secret key;
  and that actual destruction may fail silently.

[^Binary]:
  It's unclear which word is more appropriate: "binary", "pairwise" or some other.

[^destination]:
  It's unclear which word is more appropriate: "destination", "target" or some other.

[^sbc]:
  This special case of a binary claim may be viewed as
  a claim about *trustworthiness*.

[^temporal]:
  The notion of "when" is rather complicated for a distributed system
  without a single source of trusted timestamps.
  The only thing that can be guaranteed is that the knowledge
  of the output of a *secure* hash function can not precede
  the knowledge of the corresponding input.

[^roundtrips]:
  The notion of "roundtrip" is blurry if there is no central server.
  In that case it is more about dissipation of information in
  "both directions" through the network of retransmitters
  (not all of which are necessarily participants of a project).

[^when]:
  The exact values of these delays are debatable. It is assumed that two *ERL*s
  might be enough for the *destination* to react to an impersonation,
  and five *ERL*s might be enough for a reaction from a *trusted revoker*
  or other participants of the *workspace*.  
  If the delay is modeled by an [Erlang-2 distribution][ErlangK]
  with a mean of one *ERL*,
  then two *ERL*s give a 91% probability that a response has been received.

[^BFS]:
  [Breadth-first search](https://en.wikipedia.org/wiki/Breadth-first_search).
  Proceeds like an expanding concentric wave on the water.

[ErlangK]: https://en.wikipedia.org/wiki/Erlang_distribution#Erlang-k
Z a378d1b7bd41bf1b3df0f43278499a15