Artifact 8da8313b2c8e81a7f4e3f3ea0d4d1f9d9dba23c9118bdcec18675a55be547a36:
- File www/remizstd.md — part of check-in [9a0995ca4b] at 2022-03-10 00:24:16 on branch trunk — Finish web documentation. (user: alexa size: 20540)
title: RemiZstd Manual
author: Remilia Scarlet
date: March 2022
...
Introduction
RemiZstd is a set of Nim bindings for the ZStandard C library. They are essentially a port of the Crystal bindings by Didactic Drunk, which can be found here: https://github.com/didactic-drunk/zstd.cr
If you want an alternative set of bindings, check out https://github.com/wltsmrz/nim_zstd
Features
Some features include:
- Two APIs: a simple context-based one, and one for (de)compressing to/from streams.
- API calls allow reusable buffers to reduce GC overhead.
- export ZSTD_CLEVEL=1 sets the default compression level, just like the zstd command line utilities.
- Probably pretty fast
- Custom dictionaries (not well tested)
Getting Set Up
You will need the ZStandard development files on your system. Something like one of the following should work on Linux:
- slackpkg install zstd
- pacman -S zstd
- apt-get install libzstd-dev
- yum install libzstd-devel
Once you have the ZStandard development files installed, you will then need to prepare the repository for use. Note that I use neither Git nor Mercurial, so the process here may seem a bit out of the ordinary. However, in the end, you can still use Nimble just like with other Nim packages.
- Clone this repository manually using Fossil.
- Change into the cloned directory.
- Run nimble install to install the library.
Usage
To import all of the library (minus the low-level FFI stuff), you can just use import remizstd. Or, you can import just what you need:
- remizstd/common: Custom dictionaries and some common types/methods/values
- remizstd/compress: Context-based compression
- remizstd/decompress: Context-based decompression
- remizstd/compstream: Compressing streams
- remizstd/decompstream: Decompression streams
- remizstd/libffi: Low-level C bindings
Full example:
import std/streams
import remizstd/[compstream, decompstream]
# Generate the test file
let filename = "/tmp/somefile.txt"
writeFile(filename, "This is a test file for remizstd :D")
# Compress to a file stream
var outFile = newFileStream("/tmp/somefile.txt.zst", fmWrite)
assert(not isNil(outFile))
withCompressStream(outFile, 9, cio): # Compression level 9
  cio.syncClose = true
  cio.write(readFile(filename))
# We set cio.syncClose to true, so outFile is automatically closed as well
# Decompress from one file stream to another file stream
let cfilename = "/tmp/somefile.txt.zst"
let dfilename = "/tmp/somefile.txt.orig"
var inFile = newFileStream(cfilename, fmRead)
outFile = newFileStream(dfilename, fmWrite)
assert(not isNil(outFile))
withDecompressStream(inFile, dio):
  outFile.write(dio.readAll())
# We didn't set dio.syncClose to true, so inFile stays open; close both files manually
inFile.close()
outFile.close()
assert(readFile(filename) == readFile(dfilename))
Basic Streaming API Usage
The streaming API is used to compress data to a stream, or decompress data from a stream. It also includes a few templates that provide useful constructs that will automatically handle closing of the (de)compression stream for you.
Example usage of the streaming API:
import std/streams
import remizstd/[compstream]
# Compression to a string stream
var dest = newStringStream()
withCompressStream(dest, cio):
  cio.write("This is some test data")
Basic Context-based API Usage
The context-based API uses wrappers around the lower-level ZstdCCtx
pointers. These allow you to allocate a (de)compression context once, then
re-use it for successive operations. This will generally be more friendly
towards your system's memory.
Note: Contexts are mostly just an optimization for speed and resource usage. They do not change the compression ratio.
Note: In multi-threaded environments, use a separate context for each thread to allow parallel execution.
Example usage of the context-based API:
import remizstd/[compress, decompress]
let buf = "this is a test buffer"
# Compression using a context
var
  cctx = newCompressCtx(1) # Compression level of 1
  cbuf = cctx.compress(buf)
# Decompression using a context
var
  dctx = newDecompressCtx()
  dbuf = dctx.decompress(cbuf)
assert(dbuf == buf)
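The reuse described above can be sketched as follows: one compression context and one decompression context are allocated once and then shared across several buffers. This is only a minimal illustration using the procs documented in this manual; the sample messages and the variable names are made up for the example.

```nim
import remizstd/[compress, decompress]

# Hypothetical sample data, used only for this illustration
let messages = ["first message", "second message", "third message"]

# Allocate each context once...
let cctx = newCompressCtx(3) # Compression level of 3
let dctx = newDecompressCtx()

# ...then reuse them for every buffer
var roundTripped: seq[string]
for msg in messages:
  let packed = cctx.compress(msg)
  roundTripped.add(dctx.decompress(packed))

assert(roundTripped == @messages)
```

In a multi-threaded program, each thread would create its own pair of contexts rather than sharing these.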
API Reference
remizstd Module
[]{#libversion}
func libVersion(): string
Returns the current version of libzstd.
remizstd/common Module
Types
[]{#compressionlevel}
type CompressionLevel* = cint
A compression level.
[]{#zstderr}
type ZStdError = object of CatchableError
All errors in remizstd are of this type, or a subtype.
[]{#compressdictobj}
type CompressDictObj = object
A wrapper for a ZstdCDict pointer. You should always use
CompressDict instead.
[]{#compressdict}
type CompressDict = ref CompressDictObj
A managed wrapper for a ZstdCDict pointer.
[]{#decompressdictobj}
type DecompressDictObj = object
A wrapper for a ZstdDDict pointer. You should always use
DecompressDict instead.
[]{#decompressdict}
type DecompressDict = ref DecompressDictObj
A managed wrapper for a ZstdDDict pointer.
[]{#dict}
type Dict = ref object of RootObj
An in-memory representation of a custom dictionary. It has no exported fields.
Globals and Constants
[]{#defaultcomplevel}
var DefaultCompLevel: CompressionLevel
The default compression level. When the bindings are first loaded, this will
check to see if the ZSTD_CLEVEL environment variable is set. If it is, it
uses the integer value from that variable. Otherwise this defaults to 3.
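As a rough illustration of the lookup described above, the following sketch resolves a default level from ZSTD_CLEVEL using only the Nim standard library. The name resolveDefaultLevel and the fallback for malformed values are assumptions made for this example; they are not taken from the remizstd sources.

```nim
import std/[os, strutils]

const FallbackLevel = 3 # The documented default when ZSTD_CLEVEL is not set

proc resolveDefaultLevel(): int =
  ## Hypothetical helper that mirrors the documented behaviour.
  let raw = getEnv("ZSTD_CLEVEL")
  if raw.len == 0:
    return FallbackLevel
  try:
    result = parseInt(raw.strip())
  except ValueError:
    # Assumption: a value that is not an integer falls back to the default
    result = FallbackLevel

echo "Default compression level: ", resolveDefaultLevel()
```

Running a program with ZSTD_CLEVEL=7 in its environment would make this sketch report 7; with the variable unset it reports 3.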
[]{#mincomplevel}
var MinCompLevel*: CompressionLevel
The lowest supported compression level. This will be set when the program is
started using a call to the lower level min_c_level().
[]{#maxcomplevel}
var MaxCompLevel*: CompressionLevel
The highest supported compression level. This will be set when the program is
started using a call to the lower level max_c_level().
Procedures and Functions
[]{#newdict}
proc newDict*(buf: string, level: CompressionLevel = DefaultCompLevel): Dict
Creates a new custom dictionary using buf as the dictionary data.
[]{#dictid}
proc dictID*(d: Dict): cuint
Returns the ID of the Dict.
remizstd/compress Module
Types
[]{#compressctxobj}
type CompressCtxObj* = object
A wrapper and some state around ZstdCCtx used to compress data. You should
use CompressCtx instead.
[]{#compressctx}
type CompressCtx* = ref CompressCtxObj
A managed context used to compress data.
Procedures and Functions
[]{#setdict-compressctx}
proc setDict*(ctx: CompressCtx, d: Dict)
Sets the custom dictionary on a CompressCtx.
[]{#newcompressctx-level}
proc newCompressCtx*(level: CompressionLevel, dict: Dict = nil): CompressCtx
Creates a new context for compressing data at the given compression level. You can optionally pass a custom dictionary to this to use for compression.
Example showing a custom dictionary:
import remizstd/[common, compress]
let myCompLevel = CompressionLevel(7)
# Load and create custom dictionary with compression level 7
let dictData = readFile("/path/to/dict/file.dat")
var dictionary = newDict(dictData, myCompLevel)
# Compression using a context
var
  cctx = newCompressCtx(myCompLevel, dictionary)
  cbuf = cctx.compress("Just some test data to compress, could be binary instead")
[]{#newcompressctx}
proc newCompressCtx*(dict: Dict = nil): CompressCtx
Creates a new context for compressing data using the DefaultCompLevel. You can optionally pass a custom dictionary to this to use for compression.
[]{#compress}
proc compress*(ctx: CompressCtx, src: string): string
Compresses src using the given context, then returns the compressed data.
Example:
import remizstd/compress
# Compression using a context
var
  cctx = newCompressCtx(1) # Compression level of 1
  cbuf = cctx.compress("this is some test data")
# cbuf now holds the compressed data
[]{#level-compressctx}
func level*(ctx: CompressCtx): CompressionLevel
Returns the current compression level for the context.
[]{#setlevel-compressctx}
proc setLevel*(ctx: CompressCtx, level: CompressionLevel)
Changes the context's compression level.
[]{#threads-compressctx}
func threads*(ctx: CompressCtx): int
Returns the number of threads the context will use for compression.
[]{#setthreads-compressctx}
proc setThreads*(ctx: CompressCtx, num: int)
Sets the number of threads the context will use for compression.
[]{#checksum-compressctx}
func checksum*(ctx: CompressCtx): bool {.inline.}
Returns true if the checksum flag is set in the context, or false otherwise.
[]{#setchecksum-compressctx}
proc setChecksum*(ctx: CompressCtx, val: bool) {.inline.}
Changes whether or not the checksum flag is set in the context.
[]{#compressbound}
func compressBound*(ctx: CompressCtx, size: int): int
Given data of length size, this returns the maximum compressed size in the
worst-case single-pass scenario.
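To give a feel for how large that bound is, here is a pure-Nim sketch of the arithmetic behind libzstd's ZSTD_COMPRESSBOUND macro, as I understand it from zstd.h. The proc name compressBoundSketch is hypothetical, and it is an assumption that compressBound() reports the same figure; real code should call compressBound() itself.

```nim
proc compressBoundSketch(srcSize: Natural): int =
  ## Hypothetical helper: worst-case compressed size for srcSize input bytes.
  const blockSize = 128 shl 10 # 128 KiB
  # Inputs smaller than one block get a little extra margin
  let smallInputMargin =
    if srcSize < blockSize: (blockSize - srcSize) shr 11
    else: 0
  result = srcSize + (srcSize shr 8) + smallInputMargin

echo compressBoundSketch(1000)
```

The bound is always slightly larger than the input, so a buffer of that size can hold the output even when the data is incompressible.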
[]{#memsize-compressctx}
func memsize*(ctx: CompressCtx): csize_t
Returns the size of the wrapped ZstdCCtx in memory.
remizstd/decompress Module
Types
[]{#decompressctxobj}
type DecompressCtxObj* = object
A wrapper and some state around ZstdDCtx used to decompress data. You should
use DecompressCtx instead.
[]{#decompressctx}
type DecompressCtx* = ref DecompressCtxObj
A managed context used to decompress data.
[]{#framesizeunknownerror}
type FrameSizeUnknownError* = object of ZstdError
Indicates that the library cannot automatically determine the destination size. Instead, you should use the streaming API in remizstd/decompstream.
See decompress().
Procedures and Functions
[]{#setdict-decompressctx}
proc setDict*(ctx: DecompressCtx, d: Dict)
Sets the custom dictionary on a decompression context.
[]{#newdecompressctx}
proc newDecompressCtx*(dict: Dict = nil): DecompressCtx
Creates a new context for decompressing data. You can optionally pass a custom dictionary to this to use for decompression.
[]{#framecontentsize}
func frameContentSize*(ctx: DecompressCtx, src: string): (uint64, bool)
Attempts to determine the size of src after decompression. If successful,
this returns the size of src after decompression and true. If this cannot
determine the size, this returns zero and false.
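A short sketch of how the returned pair might be used: check the flag first, then pass the reported size on to decompress(). It relies only on the procs documented in this manual; the payload and variable names are invented for the example.

```nim
import remizstd/[compress, decompress]

# Hypothetical payload, used only for this illustration
let payload = "some payload that gets compressed in one shot"

let cctx = newCompressCtx(1)
let dctx = newDecompressCtx()
let packed = cctx.compress(payload)

let (size, known) = dctx.frameContentSize(packed)
if known:
  # Pass the size explicitly so decompress() does not need to look it up again
  let restored = dctx.decompress(packed, int(size))
  assert(restored == payload)
else:
  echo "The frame does not record its size; use remizstd/decompstream instead"
```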
[]{#decompress}
proc decompress*(ctx: DecompressCtx, src: string, size = 0): string
Decompresses src and returns the decompressed data. If size is zero, then
this attempts to determine the correct size for the return value by calling
frameContentSize(). If size is zero and this cannot
determine the size of the return value automatically, this will raise a
FrameSizeUnknownError.
Example showing compression, then decompression:
import remizstd/[compress, decompress]
let buf = "this is a test buffer"
# Compression using a context
var
  cctx = newCompressCtx(1) # Compression level of 1
  cbuf = cctx.compress(buf)
# Decompression using a context
var
  dctx = newDecompressCtx()
  dbuf = dctx.decompress(cbuf)
assert(dbuf == buf)
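Since decompress() raises FrameSizeUnknownError when the size cannot be determined, one possible pattern is to catch that error and fall back to the streaming API. The helper name decompressAny is hypothetical, and this is only a sketch built from the signatures documented in this manual.

```nim
import std/streams
import remizstd/[decompress, decompstream]

proc decompressAny(data: string): string =
  ## Hypothetical helper: try the context-based API first, then fall back
  ## to the streaming API when the frame does not record its size.
  let dctx = newDecompressCtx()
  try:
    result = dctx.decompress(data)
  except FrameSizeUnknownError:
    let strm = newDecompressStream(newStringStream(data))
    try:
      result = strm.readAll()
    finally:
      strm.close()
```

Creating a context on every call keeps the sketch short; a long-running program would keep the context around and reuse it.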
[]{#memsize-decompressctx}
func memsize*(ctx: DecompressCtx): csize_t
Returns the size of the wrapped ZstdDCtx in memory.
remizstd/compstream Module
Types
[]{#compressstream}
type
  CompressStream* = ref object of RootObj
    syncClose*: bool
Represents a stream that uncompressed data can be written to in order to be
compressed. The syncClose field controls whether or not the underlying stream
is closed when the close() proc is called.
A CompressStream essentially wraps CompressCtx to provide a
stream-like API.
Note: This does not currently implement the full API as found in std/streams.
[]{#level-compressstream}
func level*(strm: CompressStream): CompressionLevel {.inline.}
Returns the current compression level for this stream.
[]{#setlevel-compressstream}
proc setLevel*(strm: CompressStream, level: CompressionLevel) {.inline.}
Sets the compression level for this stream.
[]{#compressstream-threads}
func threads*(strm: CompressStream): int {.inline.}
Returns the number of threads this stream will use for compression.
[]{#compressstream-setthreads}
proc setThreads*(strm: CompressStream, num: int) {.inline.}
Sets the number of threads this stream will use for compression.
[]{#compressstream-checksum}
func checksum*(strm: CompressStream): bool {.inline.}
Returns true if the checksum flag is used by the underlying compression
context, or false otherwise.
[]{#compressstream-setchecksum}
proc setChecksum*(strm: CompressStream, val: bool) {.inline.}
Sets whether or not the underlying context uses the checksum flag.
[]{#compressstream-closed}
func closed*(strm: CompressStream): bool {.inline.}
Returns true if the stream is closed, or false otherwise. This does not
consider the underlying stream.
[]{#newcompressstream-level}
proc newCompressStream*(io: Stream, level: CompressionLevel, outputBufSize = 0): CompressStream
Creates a new stream for compression using the given compression level. Data
written to this stream will be compressed and output to io. If
outputBufSize is zero, then the default output stream size as reported by
libzstd will be used.
[]{#newcompressstream}
proc newCompressStream*(io: Stream): CompressStream
Creates a new stream for compression using the
DefaultCompLevel. Data written to this stream will be
compressed and output to io.
[]{#write}
proc write*(strm: CompressStream, buf: string) {.inline.}
Compresses buf and writes the compressed data to the underlying stream.
[]{#compressstream-close}
proc close*(strm: CompressStream)
Closes the compression stream. If the syncClose field is true, then the
underlying stream is also closed, otherwise it remains open.
Templates
[]{#withcompressstream}
template withCompressStream*(io, strmVar, forms: untyped)
Creates a CompressStream bound to strmVar that will write
compressed data to io, then executes forms inside of a block. The
compression stream will use the DefaultCompLevel. This
ensures that close() is called when the block exits.
Example:
import std/streams
import remizstd/[compstream]
# Compression to a string stream
var dest = newStringStream()
withCompressStream(dest, cio):
  cio.write("This is some test data")
template withCompressStream*(io, level, strmVar, forms: untyped)
Creates a CompressStream bound to strmVar that will write
compressed data to io using the given compression level, then executes forms
inside of a block. This ensures that close() is called
when the block exits.
[]{#withcompressstream-level-buffer}
template withCompressStream*(io, level, outputBufSize, strmVar, forms: untyped)
Creates a CompressStream bound to strmVar that will write
compressed data to io using the given compression level, then executes forms
inside of a block. This also sets the size of the output buffer. This ensures
that close() is called when the block exits.
remizstd/decompstream Module
Types
[]{#decompressstream}
type
  DecompressStream* = ref object of RootObj
    syncClose*: bool
Represents a stream that decompressed data can be read from. The compressed
input is read from the underlying stream. The syncClose field controls whether
or not the underlying stream is closed when the close() proc is called.
A DecompressStream essentially wraps DecompressCtx to
provide a stream-like API.
Note: This does not currently implement the full API as found in std/streams.
Procedures and Functions
[]{#dict-decompressstream}
func dict*(strm: DecompressStream): Dict {.inline.}
Returns the custom dictionary assigned to the underlying decompression context.
[]{#setdict-decompressstream}
proc setDict*(strm: DecompressStream, d: Dict) {.inline.}
Sets the custom dictionary assigned to the underlying decompression context.
[]{#newdecompressstream}
proc newDecompressStream*(io: Stream, dict: Dict = nil, inputBufSize = 0): DecompressStream
Creates a new DecompressStream. Compressed data is read from io and
decompressed as it is read from this stream. If inputBufSize is zero, then
the default input buffer size as reported by libzstd will be used. dict may
optionally be a custom decompression dictionary.
[]{#setio}
proc setIO*(strm: DecompressStream, newStream: Stream)
Sets a new underlying stream to read compressed data from. This re-opens the DecompressStream.
[]{#read-num}
proc read*(strm: DecompressStream, num: Natural): (string, int)
Decompresses up to num bytes of data, then returns the decompressed data and
the actual number of bytes that was decompressed.
This will return an empty string and zero when the end of the compressed data is reached.
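When the decompressed data is too large to hold in memory at once, read() can be called in a loop until it reports zero bytes. The following is a sketch under that reading of the description above; copyDecompressed and the 64 KiB chunk size are hypothetical choices for the example.

```nim
import std/streams
import remizstd/decompstream

proc copyDecompressed(src, dest: Stream, chunkSize: Natural = 64 * 1024) =
  ## Hypothetical helper: decompresses src into dest one chunk at a time.
  let strm = newDecompressStream(src)
  try:
    while true:
      let (chunk, count) = strm.read(chunkSize)
      if count == 0:
        break # End of the compressed data
      # Assumption: chunk holds exactly count bytes
      dest.write(chunk)
  finally:
    # syncClose is left untouched here, so src is assumed to stay open
    strm.close()
```

The caller remains responsible for closing both src and dest.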
[]{#read-dest}
proc read*(strm: DecompressStream, dest: var string): (string, int)
Reads up to len(dest) bytes of data into dest, then returns the actual
number of bytes that was decompressed into dest.
This will return zero when the end of the compressed data is reached.
[]{#readall}
proc readAll*(strm: DecompressStream, bufferSize: Natural = 1024 * 1024): string
Decompresses all data from the stream and returns it.
[]{#close-decompressstream}
proc close*(strm: DecompressStream)
Closes the decompression stream. If the syncClose field is true, then the
underlying stream is also closed, otherwise it will remain open.
Templates
[]{#withdecompressstream}
template withDecompressStream*(io, strmVar, forms: untyped)
Creates a DecompressStream bound to strmVar that will
read compressed data from io, then executes forms inside of a block. This
ensures that close() is called when the block exits.
Example, decompressing between streams:
import std/streams
import remizstd/[decompstream]
# Decompress a file stream into a string stream
var src = newFileStream("/path/to/compressed.zstd", fmRead)
var dest = newStringStream()
withDecompressStream(src, dio):
  dest.write(dio.readAll())
Example, decompressing to a string:
import std/streams
import remizstd/[decompstream]
# Decompress a file stream into a string
var src = newFileStream("/path/to/compressed.zstd", fmRead)
withDecompressStream(src, dio):
  echo "Decompressed data:"
  echo dio.readAll()
[]{#withdecompressstream-buffer}
template withDecompressStream*(io, inputBufSize, strmVar, forms: untyped)
Creates a DecompressStream with a custom buffer size bound
to strmVar that will read compressed data from io, then executes forms
inside of a block. This ensures that close() is
called when the block exits.
[]{#withdecompressstream-buffer}
template withDecompressStream*(io, dict, inputBufSize, strmVar, forms: untyped)
Creates a DecompressStream with a custom dictionary and
buffer size bound to strmVar that will read compressed data from io, then
executes forms inside of a block. This ensures that
close() is called when the block exits.