atom(n) 1.1 doc "Atom. Packages for string interning and deduplication"

Name

atom - Atom - Base class, common API

Description

Welcome to the Atom project, written by Andreas Kupries.

For availability please read Atom - How To Get The Sources.

Despite its name, this package is not the public entry point of the system. It is internal and provides the base class for all the other packages, which implement the actual storage backends.

The following sections are therefore of interest only to developers intending to extend or modify the system. Everybody else can skip this document.

API to implement

This section lists and describes all the methods a derived class has to override to be a proper and functional string storage:

<instance> id string

This method's implementation has to add the specified string to the instance, and return its unique numeric identifier as the result of the method.

Multiple calls of this method for the same string have to return the same identifier.

<instance> str id

This method's implementation has to map the specified numeric id back to its string, and return that string as the result of the method.

An error must be thrown if the id is not known to the instance.
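The contract of id and str can be illustrated with a minimal in-memory sketch. The class name memstore, its two dictionaries, and the choice to number identifiers from 0 are assumptions made for this example only; none of them are taken from the Atom packages. The superclass declaration is left out, because this document does not name the base class command.

```tcl
package require Tcl 8.6-

# Hypothetical in-memory backend, for illustration only.
oo::class create memstore {
    # ids:  dictionary string -> id
    # strs: dictionary id -> string
    variable ids strs

    constructor {} {
        set ids  {}
        set strs {}
    }

    # Intern a string. Repeated calls for the same string
    # return the same identifier.
    method id {string} {
        if {![dict exists $ids $string]} {
            # Assumption: identifiers are allocated as 0, 1, 2, ...
            set new [dict size $ids]
            dict set ids  $string $new
            dict set strs $new $string
        }
        return [dict get $ids $string]
    }

    # Map an identifier back to its string. Unknown ids are an error.
    method str {id} {
        if {![dict exists $strs $id]} {
            return -code error "Unknown id \"$id\""
        }
        return [dict get $strs $id]
    }
}

set store [memstore new]
puts [$store id hello]   ;# 0
puts [$store id world]   ;# 1
puts [$store id hello]   ;# 0, same string, same identifier
puts [$store str 1]      ;# world
```

Keeping both directions of the mapping in their own dictionary makes each lookup a single dictionary access, at the cost of storing every string twice.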

<instance> names

This method's implementation has to return a list of all strings which have been interned into the instance.

<instance> exists string

This method's implementation has to test if the specified string has been interned into the instance and return a boolean flag as the result of the method.

The result has to be true if the string is known, and false otherwise.

<instance> exists-id id

This method's implementation has to test if the specified numeric id is known to the instance and return a boolean flag as the result of the method.

The result has to be true if the id is known, and false otherwise.

<instance> size

This method's implementation has to return the number of interned strings known to the instance.
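The four query methods names, exists, exists-id, and size are thin wrappers when the backend keeps its mappings in dictionaries. The following is a sketch under that assumption. The class memstore and its variables are hypothetical, and id is included only in a reduced form so that there is something to query.

```tcl
package require Tcl 8.6-

# Hypothetical backend, reduced to what the query methods need.
oo::class create memstore {
    variable ids strs

    constructor {} {
        set ids  {}
        set strs {}
    }

    # Reduced interning, present only to populate the instance.
    method id {string} {
        if {![dict exists $ids $string]} {
            set new [dict size $ids]
            dict set ids  $string $new
            dict set strs $new $string
        }
        return [dict get $ids $string]
    }

    # List of all interned strings.
    method names {} {
        return [dict keys $ids]
    }

    # Boolean flag: is the string interned?
    method exists {string} {
        return [dict exists $ids $string]
    }

    # Boolean flag: is the identifier known?
    method exists-id {id} {
        return [dict exists $strs $id]
    }

    # Number of interned strings.
    method size {} {
        return [dict size $ids]
    }
}
```

None of these methods modify the instance. In particular, exists must not intern the string it is asked about.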

<instance> map string id

This method's implementation has to add the specified string to the instance and force an association with the specified numeric id.

The result of the method has to be the empty string.

An error has to be thrown if the id is already used for a different string.

<instance> clear

This method's implementation has to drop all string/id mappings from the instance. After this method has run the instance must be empty.

The result of this method must be the empty string.
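A possible shape for map and clear is sketched below, again with a hypothetical dictionary-based class memstore. The text above only requires an error when the id is already used for a different string. Rejecting a string that already has a different id is an extra assumption of this sketch, made to keep both dictionaries consistent.

```tcl
package require Tcl 8.6-

# Hypothetical backend showing only map, clear, and size.
oo::class create memstore {
    variable ids strs

    constructor {} {
        set ids  {}
        set strs {}
    }

    # Force the association between string and id.
    method map {string id} {
        if {[dict exists $strs $id] && ([dict get $strs $id] ne $string)} {
            return -code error "Id \"$id\" is already used for a different string"
        }
        # Assumption of this sketch, not stated in the documentation:
        # a string already bound to another id is rejected as well.
        if {[dict exists $ids $string] && ([dict get $ids $string] ne $id)} {
            return -code error "String is already mapped to a different id"
        }
        dict set ids  $string $id
        dict set strs $id $string
        return   ;# empty string
    }

    # Drop all mappings. The instance is empty afterwards.
    method clear {} {
        set ids  {}
        set strs {}
        return   ;# empty string
    }

    method size {} {
        return [dict size $ids]
    }
}
```

A backend that offers both id and map has to make sure that identifiers allocated by id never collide with identifiers forced through map. The sketch sidesteps that question by not implementing id at all.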

API hooks

This section lists the base class methods a derived class may override. While they have implementations, these are generic and may not be as efficient as an implementation in the derived class, which has full access to its own data structures.

<instance> serialize

This method's implementation has to serialize the content of the instance, i.e. the string-to-id map, and return it as the result of the method.

The result has to be a Tcl dictionary with the strings as keys and the associated identifiers as the values.
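A generic serialization needs nothing beyond the names and id methods described earlier. The procedure below is a sketch of that idea and not the base class's actual code. The name generic_serialize is hypothetical, and store stands for any instance command implementing the API.

```tcl
# Build the string -> id dictionary using only the public API.
proc generic_serialize {store} {
    set serial {}
    foreach s [$store names] {
        # For an already interned string, id only looks up the identifier.
        dict set serial $s [$store id $s]
    }
    return $serial
}
```

This costs one method call per string. A derived class holding the mapping in a dictionary of its own could return that dictionary directly, which is the kind of shortcut overriding the method allows.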

<instance> deserialize serialization

This method's implementation has to take a serialization as generated by method serialize and add it to the instance. It has to use the semantics of method map for this, to preserve the exact string/id mapping found in the input. On the flip side, this means that existing mappings may interfere. In that case an error has to be thrown.

The result of the method has to be the empty string.
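Expressed through the public API, deserialization is a loop over the dictionary that calls map for every entry. This is a sketch only. The name generic_deserialize is hypothetical, and store is assumed to be an instance command implementing map as described above.

```tcl
# Add a serialization to the instance, keeping the exact ids.
proc generic_deserialize {store serialization} {
    dict for {s id} $serialization {
        # map throws if the id is already used for a different string.
        $store map $s $id
    }
    return   ;# empty string
}
```

Note that this sketch is not atomic. If map throws partway through, the entries processed before the conflict stay in the instance.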

<instance> load serialization

This method's implementation has to take a serialization as generated by method serialize and have it replace the previous content of the instance.

The result of the method has to be the empty string.
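Replacing the content can be sketched as clear followed by a map call per entry. Because the instance is empty when the entries are added, existing mappings cannot interfere. The name generic_load is hypothetical, and store is assumed to implement clear and map as described above.

```tcl
# Replace the content of the instance with a serialization.
proc generic_load {store serialization} {
    $store clear
    dict for {s id} $serialization {
        $store map $s $id
    }
    return   ;# empty string
}
```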

<instance> merge serialization

This method's implementation has to take a serialization as generated by method serialize and add the strings it contains to the instance, per the semantics of method id.

The ids found in the serialization do not matter and have to be ignored.

The result of the method has to be the empty string.
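Merging only looks at the keys of the serialization and interns each of them through id. The following sketch uses the hypothetical name generic_merge and assumes that store implements id as described above.

```tcl
# Intern all strings of a serialization, ignoring the ids it carries.
proc generic_merge {store serialization} {
    foreach s [dict keys $serialization] {
        # The instance chooses the identifier; the result is not needed here.
        $store id $s
    }
    return   ;# empty string
}
```

Unlike deserialize, this cannot fail because of conflicting identifiers. The price is that the identifiers in the instance may differ from those in the serialization.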

Bugs, Ideas, Feedback

Both the package(s) and this documentation will undoubtedly contain bugs and other problems. Please report them at Atom Tickets.

Please also report any ideas you may have for enhancements of either package(s) and/or documentation.

Keywords

deduplication, interning, storage, string deduplication, string interning, string storage