
blob(n) 1 doc "Blob. General content storage with deduplication"

Name

blob - Blob - Base class, common API

Synopsis

  • package require blob

Description

Welcome to the Blob project, written by Andreas Kupries.

For availability please read Blob - How To Get The Sources.

While this package's name suggests that it is the public entry point of the system, it is not. It is internal, providing the base class for all the other packages, which implement the actual storage backends.

The following sections are therefore of interest only to developers intending to extend or modify the system. Everybody else can skip this document.

Public API

This section lists and describes all the public methods of a proper and functional blob store. Some of them have to be implemented by the derived class for a specific kind of storage; these are marked as abstract methods.

Note further that not all of the public methods are for general use.

<instance> put-string blob

This method adds the string blob to the instance and returns the blob's uuid as the result of the method. Adding the same string multiple times stores it only on the first call, and all invocations return the same uuid.
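A minimal sketch of the deduplication contract, assuming a hypothetical in-memory TclOO class `toystore` that is not part of the blob package. The uuid is a CRC-32 of the UTF-8 encoded content, used only because `zlib crc32` is built into Tcl 8.6; it is not collision resistant and merely stands in for whatever digest a real store computes. The `new` method models the method of that name described below.

```tcl
# Hypothetical in-memory store illustrating put-string deduplication.
# Requires Tcl 8.6 (TclOO and zlib are built in).
oo::class create toystore {
    variable blobs isnew

    constructor {} {
        set blobs [dict create]
        set isnew 0
    }

    # Placeholder uuid: CRC-32 of the UTF-8 bytes. Not collision
    # resistant; it only stands in for a real content digest.
    method Uuid {blob} {
        format %08x [zlib crc32 [encoding convertto utf-8 $blob]]
    }

    method put-string {blob} {
        set uuid [my Uuid $blob]
        # Store the content only if this uuid has not been seen yet.
        set isnew [expr {![dict exists $blobs $uuid]}]
        if {$isnew} {
            dict set blobs $uuid $blob
        }
        return $uuid
    }

    # Whether the last put-string call actually added a blob.
    method new {} { return $isnew }

    method size {} { return [dict size $blobs] }
}

set store [toystore new]
set u1 [$store put-string "hello world"]
set n1 [$store new]    ;# 1: first call stored the content
set u2 [$store put-string "hello world"]
set n2 [$store new]    ;# 0: duplicate, nothing was added
```

Both calls yield the same uuid because the key is derived from the content alone, which is what makes the second call a no-op.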

<instance> put-file path

This method adds the (binary) string found in the file at the specified path to the instance and returns the blob's uuid as the result of the method. Adding the same content multiple times stores it only on the first call, and all invocations return the same uuid.

<instance> put-channel chan

This method adds the (binary) string found in the channel chan to the instance and returns the blob's uuid as the result of the method. Adding the same content multiple times stores it only on the first call, and all invocations return the same uuid.

The content is read from chan once, starting at the current location. After the call the channel is positioned at EOF. Note that the caller has to close the channel.
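The put-* methods differ only in where the content comes from. One plausible layering, sketched with hypothetical procs over a store modelled as a dict (not the package's actual code): put-file opens the file, delegates to put-channel, and closes the channel it opened, while put-channel reads from the current position to EOF in binary mode and leaves the channel open for its caller. CRC-32 again stands in for a real content digest.

```tcl
# Hypothetical procs over a store modelled as a dict (uuid -> content).
# CRC-32 is only a placeholder for a real content digest.
proc put-string {storeVar blob} {
    upvar 1 $storeVar store
    set uuid [format %08x [zlib crc32 $blob]]
    if {![dict exists $store $uuid]} {
        dict set store $uuid $blob
    }
    return $uuid
}

# Reads from the channel's current position up to EOF, in binary mode.
# The channel is left open: closing it is the caller's job.
proc put-channel {storeVar chan} {
    upvar 1 $storeVar store
    fconfigure $chan -translation binary
    return [put-string store [read $chan]]
}

# Opens its own channel, so it is also responsible for closing it.
proc put-file {storeVar path} {
    upvar 1 $storeVar store
    set chan [open $path r]
    try {
        set uuid [put-channel store $chan]
    } finally {
        close $chan
    }
    return $uuid
}

set store [dict create]

set chan [file tempfile path]
fconfigure $chan -translation binary
puts -nonewline $chan "headerPAYLOAD"
seek $chan 6 start              ;# position just after "header"
set u1 [put-channel store $chan]
set atEof [eof $chan]           ;# 1: consumed up to EOF
close $chan                     ;# put-channel left it open

set out [open $path w]
fconfigure $out -translation binary
puts -nonewline $out "PAYLOAD"
close $out
set u2 [put-file store $path]   ;# same content: same uuid, stored once
file delete $path
```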

<instance> new

This method returns a boolean value indicating whether the last call to one of the put-* methods actually added a new blob (true) or not (false).

<instance> get-string uuid

This method locates the contents of blob uuid and returns them as the result of the method. An error is thrown if uuid is not known to the instance.

This is an abstract method. Derived classes have to implement it.

<instance> get-channel uuid

This method locates the contents of blob uuid and returns a channel containing it as the result of the method. An error is thrown if uuid is not known to the instance.

The returned channel is read-only, binary, and positioned at the beginning of the blob content. No assurances are made about the ability to seek the channel. It is the responsibility of the caller to close the channel after use.

<instance> get-file uuid

This method locates the contents of blob uuid and returns the path of a file containing it as the result of the method. An error is thrown if uuid is not known to the instance.

The returned file should be considered transient. It is owned by the caller and can be moved, modified, and deleted at will. It is the responsibility of the caller to delete the file after use.

<instance> store-to-file uuid path

This method locates the contents of blob uuid and stores them into the file with the specified path. Any previous content of the file is overwritten by this operation. The result of the method is the empty string. An error is thrown if uuid is not known to the instance.

<instance> remove uuid

This method locates the blob uuid and removes it from the instance. The result of the method is the empty string. An error is thrown if uuid is not known to the instance.

This is an abstract method. Derived classes have to implement it.

<instance> clear

This method removes all blobs from the instance. After the call the instance is empty. The result of the method is the empty string.

This is an abstract method. Derived classes have to implement it.

<instance> size

This method determines the number of blobs found in the instance and returns that number as the result of the method.

This is an abstract method. Derived classes have to implement it.

<instance> names ?pattern...?

This method determines the uuids of all blobs found in the store which match one or more of the specified glob patterns, and returns a list containing them.
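A sketch of the matching rule over a plain list of uuids (the proc names-matching is hypothetical): a uuid is reported if it matches at least one glob pattern, and only once even when it matches several. Treating an empty pattern list like the single pattern * is an assumption; this document does not specify that case.

```tcl
# Hypothetical helper: the uuids matching one or more glob patterns.
proc names-matching {uuids args} {
    # Assumption: no patterns at all behaves like the pattern "*".
    if {[llength $args] == 0} {
        set args [list *]
    }
    set result {}
    foreach pattern $args {
        lappend result {*}[lsearch -all -inline -glob $uuids $pattern]
    }
    # A uuid matching several patterns is reported only once.
    return [lsort -unique $result]
}

set known {3fa1 3fb2 7c00 a9e4}
puts [names-matching $known 3f* *00]    ;# 3fa1 3fb2 7c00
puts [names-matching $known 3f* 3fa*]   ;# 3fa1 3fb2
```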

<instance> exists uuid

This method returns a boolean value indicating whether the blob uuid is known to the instance (true) or not (false).

This is an abstract method. Derived classes have to implement it.

<instance> push to ?uuids?
<instance> push-async donecmd to ?uuids?

This method copies the blobs specified by the list uuids from the instance to the peer to, which has to be an object exporting the same API as documented for blob. The result of the method is the empty string.

If no uuids are specified, the operation pushes all blobs found in the instance, as if uuids had been specified as "*".

Note that the elements of uuids are interpreted as glob patterns.
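The copy semantics of push can be sketched with stores modelled as dicts mapping uuid to content. The proc push-sketch and the dict model are hypothetical, not the package's implementation, which talks to the peer through its methods rather than a shared variable: every blob of the instance matching one of the patterns is sent to the peer, unless the peer already has it.

```tcl
# Hypothetical model: a store is a dict mapping uuid -> content.
# Copies matching blobs from the dict in fromVar to the dict in toVar.
proc push-sketch {fromVar toVar {uuids *}} {
    upvar 1 $fromVar from $toVar to
    foreach pattern $uuids {
        # Elements of uuids are glob patterns, not literal uuids.
        dict for {uuid content} [dict filter $from key $pattern] {
            # Skip blobs the peer already has (cf. the peer's exists).
            if {![dict exists $to $uuid]} {
                dict set to $uuid $content
            }
        }
    }
    return
}

set local  [dict create aa01 alpha aa02 beta bb03 gamma]
set remote [dict create aa02 beta]
push-sketch local remote {aa*}
puts [lsort [dict keys $remote]]   ;# aa01 aa02 (bb03 not matched)
```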

In the push-async form the execution is done through the event-loop, invoking the command prefix donecmd when the operation completes, with no additional arguments.
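A minimal sketch of the asynchronous calling convention, using the hypothetical procs push-async-sketch and PushStep (not the package's code) over dict-modelled stores held in global variables, since the work continues after the caller's frame is gone. The call returns at once; one blob is copied per idle callback, and donecmd, a command prefix, is finally invoked with no additional arguments.

```tcl
# Hypothetical: expand the glob patterns, then copy one blob per
# event-loop turn. fromVar/toVar name global uuid -> content dicts.
proc push-async-sketch {donecmd fromVar toVar {patterns *}} {
    upvar #0 $fromVar from
    set uuids {}
    foreach pattern $patterns {
        lappend uuids {*}[dict keys $from $pattern]
    }
    after idle [list PushStep $donecmd $fromVar $toVar [lsort -unique $uuids]]
    return
}

proc PushStep {donecmd fromVar toVar uuids} {
    upvar #0 $fromVar from $toVar to
    if {[llength $uuids] == 0} {
        # Done: run the command prefix without additional arguments.
        {*}$donecmd
        return
    }
    set rest [lassign $uuids uuid]
    if {![dict exists $to $uuid]} {
        dict set to $uuid [dict get $from $uuid]
    }
    # Yield to the event loop before handling the next blob.
    after idle [list PushStep $donecmd $fromVar $toVar $rest]
}

set ::local  [dict create aa01 alpha aa02 beta]
set ::remote [dict create]
set ::done 0
push-async-sketch {set ::done 1} ::local ::remote
set sizeBefore [dict size $::remote]   ;# 0: nothing copied yet
vwait ::done
```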

<instance> pull from ?uuids?
<instance> pull-async donecmd from ?uuids?

This method copies the blobs specified by the list of uuids from the specified peer from to the instance. The peer has to be an object exporting the same API as documented for blob. The result of the method is the empty string.

If no uuids are specified, the operation pulls all blobs found in the peer, as if uuids had been specified as "*".

Note that the elements of uuids are interpreted as glob patterns.

In the pull-async form the execution is done through the event-loop, invoking the command prefix donecmd when the operation completes, with no additional arguments.

<instance> sync with ?uuids?
<instance> sync-async donecmd with ?uuids?

This method exchanges the blobs specified by the list of uuids with the specified peer with, copying from and to the instance, as needed. The peer has to be an object exporting the same API as documented for blob. The result of the method is the empty string.

If no uuids are specified, the operation exchanges all blobs found in the instance and the peer, as if uuids had been specified as "*".

Note that the elements of uuids are interpreted as glob patterns.

In the sync-async form the execution is done through the event-loop, invoking the command prefix donecmd when the operation completes, with no additional arguments.
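Under the same kind of hypothetical dict model, pull is push with the roles of instance and peer swapped, and sync can be viewed as a pull followed by a push over the same patterns. The procs below are illustrative only; the package's own sync may order or batch the transfers differently.

```tcl
# Hypothetical model: a store is a dict mapping uuid -> content.
proc copy-missing {fromVar toVar patterns} {
    upvar 1 $fromVar from $toVar to
    foreach pattern $patterns {
        dict for {uuid content} [dict filter $from key $pattern] {
            if {![dict exists $to $uuid]} {
                dict set to $uuid $content
            }
        }
    }
    return
}

# push: instance -> peer.
proc push-sketch {localVar peerVar {uuids *}} {
    upvar 1 $localVar local $peerVar peer
    copy-missing local peer $uuids
}

# pull: peer -> instance, i.e. push with the roles swapped.
proc pull-sketch {localVar peerVar {uuids *}} {
    upvar 1 $localVar local $peerVar peer
    copy-missing peer local $uuids
}

# sync: copy in both directions, as needed.
proc sync-sketch {localVar peerVar {uuids *}} {
    upvar 1 $localVar local $peerVar peer
    pull-sketch local peer $uuids
    push-sketch local peer $uuids
}

set a [dict create u1 one u2 two]
set b [dict create u2 two u3 three]
sync-sketch a b
puts [lsort [dict keys $a]]   ;# u1 u2 u3
puts [lsort [dict keys $b]]   ;# u1 u2 u3
```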

<instance> ihave-for-string uuids src
<instance> ihave-for-file uuids src
<instance> ihave-for-chan uuids src

These methods pull the blobs matching the uuid glob patterns in uuids from src, the instance command of a blob store providing at least the method names and the get-* method indicated by the suffix (get-string, get-file, or get-channel). The result of these methods is the empty string.

<instance> ihave-async-string donecmd uuids src
<instance> ihave-async-file donecmd uuids src
<instance> ihave-async-chan donecmd uuids src

These are the asynchronous forms of the ihave-for-* methods above. They yield to the event-loop and invoke the donecmd on completion, without any additional arguments.

<instance> iwant-as-string uuids dst
<instance> iwant-as-file uuids dst
<instance> iwant-as-chan uuids dst

These methods push the blobs matching the uuid glob patterns in uuids to dst, the instance command of a blob store providing at least the method exists and the put-* method indicated by the suffix (put-string, put-file, or put-channel). The result of these methods is the empty string.

<instance> iwant-async-string donecmd uuids dst
<instance> iwant-async-file donecmd uuids dst
<instance> iwant-async-chan donecmd uuids dst

These are the asynchronous forms of the iwant-as-* methods above. They yield to the event-loop and invoke the donecmd on completion, without any additional arguments.

<instance> iexchange-for-string uuids peer
<instance> iexchange-for-file uuids peer
<instance> iexchange-for-chan uuids peer

These methods exchange the blobs matching the uuid glob patterns in uuids with peer, the instance command of a blob store providing at least the methods exists and names, plus the put-* and get-* methods indicated by the suffix. The result of these methods is the empty string.

<instance> iexchange-async-string donecmd uuids peer
<instance> iexchange-async-file donecmd uuids peer
<instance> iexchange-async-chan donecmd uuids peer

These are the asynchronous forms of the iexchange-for-* methods above. They yield to the event-loop and invoke the donecmd on completion, without any additional arguments.

API to implement

This section lists and describes all the methods a derived class has to override to be a proper and functional blob store. This is not quite a subset of the methods listed in the Public API above, because it also contains a number of private and semi-private methods.

<instance> get-string uuid
<instance> remove uuid
<instance> clear
<instance> size
<instance> exists uuid

These are the public methods a derived class has to implement to become a functional blob store. Their detailed descriptions can be found in section Public API above.

<instance> PreferedPut

This method is used by the standard implementations of push, pull, and sync to decide which of the get-* and/or put-* methods to use for the transfer of blobs between the two instances.

The derived class has to implement it and return one of string, file, or chan.
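PreferedPut answers with a transfer mode, but the matching get-* and put-* methods spell the channel variant as channel rather than chan. A hypothetical helper, transfer-methods (not part of the package), that maps the mode to the method pair a transfer could use and rejects any other answer; how the standard push, pull, and sync combine the preferences of both stores is not specified here.

```tcl
# Hypothetical helper: the get-*/put-* pair for a PreferedPut mode.
proc transfer-methods {mode} {
    switch -exact -- $mode {
        string  { return {get-string put-string} }
        file    { return {get-file put-file} }
        chan    { return {get-channel put-channel} }
        default {
            error "PreferedPut returned \"$mode\", expected string, file, or chan"
        }
    }
}

lassign [transfer-methods chan] getter putter
puts "$getter -> $putter"   ;# get-channel -> put-channel
```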

<instance> enter-string uuid blob

This method is used by the standard method put-string to enter the blob with uuid into the instance.

The derived class has to implement it and return a boolean value indicating whether the blob is new (true), or not (false).

This is a semi-private method. While it is public due to its name, nearly no user has a reason to use it directly, and every reason not to, because of the danger of corrupting the internals of the store in question. In particular, calling this method with a uuid and a blob that does not match that uuid is a recipe for failures that are likely difficult to debug.

Why make it public, then? For the cases where it actually is useful. Currently the only class in the project which uses this API from outside a store is blob::cache. It uses the method to hand incoming blobs directly to the backend without incurring the cost of recomputing the uuid, which can be substantial.

<instance> enter-file uuid path

This method is used by the standard method put-file to enter the blob found in the file at path with uuid into the instance.

The derived class has to implement it and return a boolean value indicating whether the blob is new (true), or not (false).

This is a semi-private method. See the previous method for an explanation of why it is public despite its dangers.

<instance> Names ?pattern?

This method is used by the standard method names to search the instance for blobs whose uuid matches the pattern.

The derived class has to implement it and return a list with the matching uuids.
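A sketch of how these pieces fit together, assuming a TclOO-based design and using two hypothetical classes, toybase and toymemory, rather than the package's real classes, whose definitions this document does not show. The base implements put-string and names on top of enter-string and Names; the derived class supplies the storage. TclOO exports methods whose names start with a lowercase letter and hides those starting with an uppercase letter, which matches the public enter-string versus the private Names and PreferedPut. CRC-32 stands in for a real digest.

```tcl
# Hypothetical base class: generic public methods built on hooks
# that a derived class must supply (enter-string, Names).
oo::class create toybase {
    variable isnew
    constructor {} { set isnew 0 }

    method put-string {blob} {
        # Placeholder uuid; a real store would use a proper digest.
        set uuid [format %08x [zlib crc32 [encoding convertto utf-8 $blob]]]
        set isnew [my enter-string $uuid $blob]
        return $uuid
    }
    method new {} { return $isnew }

    method names {args} {
        # Assumption: no patterns behaves like "*".
        if {![llength $args]} { set args [list *] }
        set result {}
        foreach pattern $args {
            lappend result {*}[my Names $pattern]
        }
        return [lsort -unique $result]
    }
}

# Hypothetical derived class: an in-memory backend implementing the
# abstract methods plus the semi-private and private hooks.
oo::class create toymemory {
    superclass toybase
    variable blobs
    constructor {} {
        next
        set blobs [dict create]
    }

    # Semi-private (exported: lowercase name). Trusts the given uuid.
    method enter-string {uuid blob} {
        if {[dict exists $blobs $uuid]} { return 0 }
        dict set blobs $uuid $blob
        return 1
    }
    method enter-file {uuid path} {
        set chan [open $path r]
        fconfigure $chan -translation binary
        set blob [read $chan]
        close $chan
        return [my enter-string $uuid $blob]
    }

    method get-string {uuid} {
        if {![dict exists $blobs $uuid]} { error "unknown blob $uuid" }
        return [dict get $blobs $uuid]
    }
    method remove {uuid} {
        if {![dict exists $blobs $uuid]} { error "unknown blob $uuid" }
        dict unset blobs $uuid
        return
    }
    method clear {} {
        set blobs [dict create]
        return
    }
    method size {} { return [dict size $blobs] }
    method exists {uuid} { return [dict exists $blobs $uuid] }

    # Private (not exported: uppercase names).
    method Names {pattern} { return [dict keys $blobs $pattern] }
    method PreferedPut {} { return string }
}

set store [toymemory new]
set u [$store put-string "hello"]
set first [$store new]     ;# 1: stored
$store put-string "hello"
set second [$store new]    ;# 0: already present
```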

API hooks

This section lists the base class methods a derived class may override. While they have implementations, these are generic and may not be as efficient as what the derived class can achieve with full access to its own data structures.

<instance> TempFile

The standard implementation of method get-file uses this method to get a path to a temp file it can return to the user.

The standard implementation of this method returns a standard tempfile, as per the fileutil::tempfile command. Derived classes can re-implement it to make their own choices regarding the location of the temp files to return.

Package blob::fs is an example of this, returning temp files located under the base directory the blob store is configured with.
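A sketch of the hook pattern with hypothetical classes: a base get-file that asks TempFile for a path and fills it via store-to-file, and a derived store that overrides TempFile to place temp files under its own base directory, in the spirit of blob::fs. The built-in `file tempfile` stands in for fileutil::tempfile so the example needs no tcllib, and that get-file is composed exactly this way is an assumption.

```tcl
# Hypothetical base: get-file built on the TempFile hook.
oo::class create filebase {
    method TempFile {} {
        # Stand-in for fileutil::tempfile: create, close, keep the path.
        close [file tempfile path]
        return $path
    }
    method get-file {uuid} {
        set path [my TempFile]
        my store-to-file $uuid $path
        return $path
    }
}

# Hypothetical derived store: blobs in memory, temp files under $base.
oo::class create dirstore {
    superclass filebase
    variable base blobs
    constructor {dir} {
        set base $dir
        set blobs [dict create]
    }
    method preload {uuid blob} { dict set blobs $uuid $blob }
    method store-to-file {uuid path} {
        # dict get raises an error for an unknown uuid.
        set data [dict get $blobs $uuid]
        set chan [open $path w]
        fconfigure $chan -translation binary
        puts -nonewline $chan $data
        close $chan
        return
    }
    # Override of the hook: temp files go under the base directory.
    method TempFile {} {
        close [file tempfile path [file join $base tmp]]
        return $path
    }
}

close [file tempfile probe]
set dir [file join [file dirname $probe] dirstore-[pid]]
file delete $probe
file mkdir $dir

set store [dirstore new $dir]
$store preload u1 "payload"
set path [$store get-file u1]
set chan [open $path r]
fconfigure $chan -translation binary
set content [read $chan]
close $chan
file delete $path   ;# the caller owns the returned file
```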

Push and pull overview

Bugs, Ideas, Feedback

Both the package(s) and this documentation will undoubtedly contain bugs and other problems. Please report such at Blob Tickets.

Please also report any ideas you may have for enhancements of either package(s) and/or documentation.

Keywords

blob, blob storage, content deduplication, content storage, deduplication, storage