
marpatcl_devguide(n) 1 doc "Marpa/Tcl, a binding to libmarpa"

Name

marpatcl_devguide - Marpa/Tcl - The Developer's Guide

Description

Welcome to Marpa/Tcl, a Tcl binding to the "libmarpa" parsing engine.

If you have not done so already, please read the document Marpa/Tcl - Introduction to Marpa/Tcl first. It provides an overview of the whole system.

Audience

This document is a guide targeted at all developers working on the internals of Marpa/Tcl, i.e. maintainers fixing bugs, developers adding functionality, etc.

Please read

  1. Marpa/Tcl - How To Get The Sources and

  2. Marpa/Tcl - The Installer's Guide

first, if that was not done already. Here we assume that the sources are already available in a directory of the reader's choice, and that the reader not only knows how to build and install them, but also has all the prerequisites needed to actually do so. The guide to the sources, in particular, also explains which source code management system is used, where to find it, how to set it up, etc.

Developing for Marpa/Tcl

System Architecture

The system can be split into roughly six sections, as seen in the figure below. The seventh, highlighted in green, is libmarpa itself, which is technically outside of the system.

[Figure: architecture]

In more detail:

Applications

At the top we have the marpa-gen application. It integrates and uses all of the packages to provide a parser generator reading grammars specified using SLIF and producing results in a variety of formats.

SLIF

The packages supporting the SLIF language for grammar specifications. This is a very close sibling to the SLIF language used by Marpa::R2, the current stable version of the Perl binding.

It can be further divided into groups for parsing SLIF, the semantics for translating a parse into a container, a container for SLIF grammars, and the processing of literals (strings and character classes).

Generators

The packages for producing a variety of formats from a SLIF grammar container. Further divided into serialization of containers as Tcl code, parsers and lexers based on the Tcl and C runtimes, and reconstruction of SLIF from a container.

rt-Tcl

The package marpa::runtime::tcl. It implements a parse engine in Tcl. This uses marpa::c, which is a thin wrapper around libmarpa.

rt-C

The package marpa::runtime::c. It implements a parse engine in C, directly on top of libmarpa, without wrappers.

Unicode data and utilities

The marpa::unicode package. It provides access to named character classes, case folding, decoding and encoding of codepoints from and to various representations, character class operations, etc.
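To make "character class operations" more concrete, here is a minimal sketch in Python. It assumes a character class is held as a list of inclusive codepoint ranges. The names normalize, negate and union are hypothetical and chosen for this illustration only. They are not the API of marpa::unicode, whose actual representation may differ.

```python
MAX_CODEPOINT = 0x10FFFF  # highest Unicode codepoint


def normalize(ranges):
    """Sort inclusive (lo, hi) ranges and merge overlapping or adjacent ones."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def negate(ranges):
    """Return the ranges covering every codepoint NOT in the given class."""
    result, start = [], 0
    for lo, hi in normalize(ranges):
        if lo > start:
            result.append((start, lo - 1))
        start = hi + 1
    if start <= MAX_CODEPOINT:
        result.append((start, MAX_CODEPOINT))
    return result


def union(a, b):
    """Combine two classes into one normalized class."""
    return normalize(list(a) + list(b))


digits = [(0x30, 0x39)]
lower = [(0x61, 0x7A)]
print(union(lower, digits))   # [(48, 57), (97, 122)]
print(negate(digits))         # [(0, 47), (58, 1114111)]
```

Keeping classes normalized makes negation a single pass over the ranges, and negating twice yields the normalized original.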

libmarpa

Jeffrey Kegler's base library. It implements an Earley parser incorporating the improvements by John Aycock and Nigel Horspool, and by Joop Leo. This is the foundation for the rest of the system.

The graph of package dependencies is too large to show in one piece. It is shown per section instead, in the documentation of each section:

  1. Marpa/Tcl - Application.

  2. Marpa/Tcl - SLIF.

  3. Marpa/Tcl - Generation.

  4. Marpa/Tcl - Runtime/Tcl.

  5. Marpa/Tcl - Runtime/C.

  6. Marpa/Tcl - Unicode Data & Utilities.

Directory structure

Helpers
"tools/"

This directory contains helper scripts.

"tools/utf-viewer.tcl"

This helper reads a file containing UTF-8 encoded Unicode text and prints the contents in decoded form, especially showing the construction of multi-byte characters.

"tools/unidata.tcl"

This helper reads the Unicode tables stored in directory "unidata/" and generates a mix of Tcl and C data structures for use within Marpa/Tcl. The integration point is package marpa::unicode (directory "unicode/").

"generated/"

The directory where "tools/unidata.tcl" places the generated files.

"unidata/"

The directory where "tools/unidata.tcl" reads the Unicode tables from.

"bootstrap/"

This directory contains the specifications for SLIF and literal grammars, and the helpers needed to regenerate their parsers.

"bootstrap/marpa-tcl-slif/slif"

SLIF specification of the SLIF grammar.

"bootstrap/marpa-tcl-slif/literal"

SLIF specification of the Literal grammar.

"bootstrap/remeta"
"bootstrap/remeta-tcl"

These helper applications regenerate the SLIF and literal parsers from their grammars. The first variant generates C-based parsers, the other Tcl-based parsers.

Documentation
"doc/"

This directory contains the documentation sources. The texts are written in doctools format, whereas the figures are written for tklib's diagram package and its dia application.

"embedded/"

This directory contains the documentation converted to regular manpages (nroff) and HTML.

It is called embedded because these files, while derived, are part of the fossil repository, i.e. embedded into it. This enables fossil to access and display these files when serving the repository's web interface. The "Command Reference" link at https://core.tcl-lang.org/akupries/marpa is, for example, accessing the generated HTML.

Examples
"languages/"

This directory contains several worked examples of SLIF grammars for various languages, parsers generated for them, and the infrastructure to build and test them.

Each example resides in its own subdirectory, named after the language it implements a parser for.

We currently have examples for

  1. JSON

  2. (Tcllib) doctools

  3. heredoc

  4. min-dt

heredoc is a general demonstration of how `here` documents can be implemented using stop markers and post-lexeme events.

min-dt is a reduced form of doctools, used to work out the general shape of vset and include processing via stop markers and lexeme events.
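The general idea behind combining a stop marker with a post-lexeme event can be sketched outside of Marpa/Tcl. The Python simulation below is an assumption-laden illustration, not code taken from "languages/heredoc" and not the runtime's API: the `<<TAG` syntax, the function scan, and the variables stop and resume are all hypothetical. When the scanner has read a `<<TAG` lexeme, the simulated event handler extracts the body from the following lines and places a stop marker at the end of the current line. On reaching that marker, the scanner jumps over the text that was already consumed.

```python
import re

HEREDOC_START = re.compile(r"<<(\w+)")  # lexeme announcing a here document
WORD = re.compile(r"[^\s<]+")           # any other run of visible characters


def scan(text):
    """Tokenize text, lifting here-document bodies out of the line stream."""
    tokens = []
    pos = 0
    stop = None    # stop marker: offset at which the scanner has to pause
    resume = None  # offset at which scanning continues after the pause
    while pos < len(text):
        if pos == stop:
            # Stop marker reached (end of the announcing line): emit the
            # newline, then jump over the bodies that were already consumed.
            tokens.append(("newline", "\n"))
            pos, stop, resume = resume, None, None
            continue
        match = HEREDOC_START.match(text, pos)
        if match:
            # Simulated post-lexeme event, fired right after <<TAG was read.
            tag = match.group(1)
            end_of_line = text.index("\n", match.end())
            body_start = end_of_line + 1 if stop is None else resume
            closer = re.compile(r"^%s$" % re.escape(tag), re.MULTILINE)
            found = closer.search(text, body_start)
            if found is None:
                raise ValueError("unterminated here document <<%s" % tag)
            tokens.append(("heredoc", text[body_start:found.start()]))
            stop = end_of_line
            resume = min(found.end() + 1, len(text))
            pos = match.end()
            continue
        if text[pos] == "\n":
            tokens.append(("newline", "\n"))
            pos += 1
        elif text[pos].isspace():
            pos += 1
        else:
            match = WORD.match(text, pos)
            end = match.end() if match else pos + 1
            tokens.append(("word", text[pos:end]))
            pos = end
    return tokens


for token in scan("print <<A and <<B ;\nalpha\nA\nbeta\nB\nnext\n"):
    print(token)
```

Each body appears in the token stream at the place of its `<<TAG` marker, while the words following the marker on the same line are still scanned in order. Several here documents announced on one line are read one after the other.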

Package Code, General structure
"c/"

Files for the marpa::c package. It provides a very thin class-based wrapper around the data structures of libmarpa. The entrypoint is "marpa_c.tcl".

"gen-common/"

The implementations of

  1. marpa::gen::runtime::c,

  2. marpa::gen::runtime::tcl,

  3. marpa::gen::remask, and

  4. marpa::gen::reformat

The first two provide the shared code for the main generator packages handling creation of parsers and lexers for Tcl and C runtimes. The other two are also shared code, at a lower level.

"gen-format/"

The main generator packages, all placed under the namespace marpa::gen::format:

clex-critcl

Lexer using the C runtime, embedded into Tcl via critcl.

cparse-critcl

Parser using the C runtime, embedded into Tcl via critcl.

cparse-raw

Parser using the C runtime, raw C, no embedding.

gc-compact

Like gc, but with minimal whitespace.

gc-c

Like gc, but reduced as if targeted at the C runtime.

gc-tcl

Like gc, but reduced as if targeted at the Tcl runtime.

gc

Container serialization as nested Tcl dictionary.

slif

Reconstructed SLIF.

tlex

Lexer using the Tcl runtime.

tparse

Parser using the Tcl runtime.

"runtime-c/"

Files for the marpa::runtime::c package. The entrypoint is "marpa_runtime_c.tcl".

Note that the two runtimes have very similar internal architecture.

"runtime-tcl/"

Files for the marpa::runtime::tcl package. The entrypoint is "pkg_entry.tcl".

Note that the two runtimes have very similar internal architecture.

"slif-container/"

SLIF grammar support. Provides the package marpa::slif::container, to hold parsed grammars. The entrypoint is "pkg_entry.tcl".

"slif-literal/"

SLIF grammar support. Provides the packages

  1. marpa::slif::literal::parser

  2. marpa::slif::literal::norm

  3. marpa::slif::literal::parse

  4. marpa::slif::literal::reduce::2c4tcl

  5. marpa::slif::literal::reduce::2tcl

  6. marpa::slif::literal::redux

  7. marpa::slif::literal::util

These are helper packages dealing with literals, from parsing through normalization to backend-specific reduction. The parser core is generated from a SLIF specification.

"slif-parser/"

SLIF grammar support. Provides the package marpa::slif::parser, to translate SLIF text into an abstract syntax tree (AST). The entrypoint is "pkg_entry.tcl".

Note: This parser is itself generated from a SLIF specification, and can be used to bootstrap further changes to the SLIF specification.

"slif-precedence.tcl"

SLIF grammar support. Provides the package marpa::slif::precedence. This is a helper package containing the algorithm used to rewrite a set of grammar rules with precedence into an equivalent set of rules without precedence.
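The kind of rewrite involved can be sketched as follows. This Python code is a simplified illustration of the common layering technique and is not the algorithm implemented by marpa::slif::precedence. The function rewrite_precedence, the `E[k]` symbol naming, and the input layout are hypothetical. Levels are listed from tightest to loosest. Each alternative carries an associativity of "left", "right" or "group". As a further assumption of this sketch, the tightest level wraps around to the loosest one when it needs a "next tighter" symbol.

```python
def rewrite_precedence(lhs, levels):
    """Rewrite prioritized alternatives of `lhs` into plain (lhs, rhs) rules.

    levels: list of precedence levels, tightest first; each level is a list
    of (rhs_tuple, assoc) pairs with assoc in {"left", "right", "group"}.
    """
    n = len(levels)

    def sym(k):
        # Helper symbol for looseness k; 0 is the loosest level.
        return "%s[%d]" % (lhs, k)

    rules = [(lhs, (sym(0),))]
    # Every level can fall through to the next tighter one.
    rules += [(sym(k), (sym(k + 1),)) for k in range(n - 1)]
    for index, alternatives in enumerate(levels):
        k = n - 1 - index
        current, tighter = sym(k), sym((k + 1) % n)
        for rhs, assoc in alternatives:
            spots = [i for i, s in enumerate(rhs) if s == lhs]
            new_rhs = list(rhs)
            for i in spots:
                if assoc == "group":
                    new_rhs[i] = sym(0)      # e.g. inside parentheses
                elif (assoc == "left" and i == spots[0]) or \
                        (assoc == "right" and i == spots[-1]):
                    new_rhs[i] = current     # recursion allowed on this side
                else:
                    new_rhs[i] = tighter     # other operands bind tighter
            rules.append((current, tuple(new_rhs)))
    return rules


levels = [
    [(("Number",), "left"), (("(", "E", ")"), "group")],
    [(("E", "*", "E"), "left")],
    [(("E", "+", "E"), "left")],
]
for rule_lhs, rule_rhs in rewrite_precedence("E", levels):
    print(rule_lhs, "::=", " ".join(rule_rhs))
```

For this input the sketch produces, among others, `E[0] ::= E[0] + E[1]` and `E[1] ::= E[1] * E[2]`, which encode that `*` binds tighter than `+` and that both associate to the left.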

"slif-semantics/"

SLIF grammar support. Provides the package marpa::slif::semantics, to translate grammars represented by an AST (parse result) into a container. The entrypoint is "pkg_entry.tcl".

"unicode/"

The files for package marpa::unicode. This package also includes the files under "generated/". If these files do not exist at build time, the "tools/unidata.tcl" helper is automatically invoked to generate them.

"util/"

Files for the marpa::util package, a set of general utilities. The entrypoint is "pkg_entry.tcl".

Bugs, Ideas, Feedback

This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such at the Marpa/Tcl Tracker. Please also report any ideas for enhancements to the package or its documentation.

Keywords

aycock, character classes, context free grammar, document processing, earley, horspool, joop leo, lexing, libmarpa, nigel horspool, parsing, regex, table parsing