Artifact d28b2ca9f495a9dc0323ceaf02b79014ae39da1b:

File parse.txt — part of check-in [a7c8af4793] at 2015-03-01 21:21:37 on branch spjuth-parse — Added info text (user: pspjuth size: 3179)
~Introduction

Parsing Tcl code is useful for a number of applications such as syntax
checking, cross referencing code, macro expansion or a code beautifier.

Parsing Tcl from Tcl is currently possible but rather cumbersome and
feels a bit strange when you are so close to the master of Tcl parsing,
the Tcl core itself.

To keep things simple, it is enough to expose Tcl_ParseCommand and
Tcl_ParseExpr to scripts; everything else can be built at the script level.

Note from dgp: spjuth, please leave room in your TIP for extensions like
those on the dgp-refactor branch.

~Rationale

~~Inputs and outputs

Input to parsecommand: the script to parse.
Output from parsecommand: a list of tokens.

Output: whether a parse error occurred, and the error message if so.
This could be reported as an error exception or another non-OK return
code. However, a parser is expected to meet invalid input, so every
caller would need awkward catch code; some other mechanism is preferred.
An empty token list cannot signal an error either, since an empty script
is valid and parses to an empty list.

Input: a start index, so that a script containing several commands can
be parsed one command at a time.
Output: the end index, to be used as the start index of the next call.

~Specification

~~tcl::parsecommand

tcl::parsecommand ?-comment? ?-all? ?-append? ?-start index? ?-error errorVar? script tokenVar

Returns the index where the parse ended, or -1 if an error occurred.
If an error occurs, the error message is stored in errorVar, if specified.

The return value can be used as the -start value in a subsequent call to
tcl::parsecommand, which simplifies parsing an entire script.

A list of tokens is stored in tokenVar. Normally this is a one-element list
holding a command token, or an empty list when no command was found.

If -comment is set, a comment token is added when a comment precedes the
command; otherwise comments are ignored. In this case tokenVar normally
receives a two-element list: one comment token and one command token.

If -all is set, whitespace, semicolon and comment tokens are added as needed
to make it possible to reproduce the script from the token tree.

If -append is set, tokens are appended to the existing contents of
tokenVar instead of replacing them.
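As a rough illustration of -all together with -append, the sketch below rebuilds a script by concatenating the string field of its top-level tokens. Since tcl::parsecommand does not exist yet, the token list is written by hand: the type names "semicolon" and "space" are placeholders, subtokens are left empty, and the proc name reassemble is hypothetical. It also assumes that with -all the top-level tokens cover the script without gaps or overlaps, and checks that assumption as it goes.

```tcl
# Hypothetical top-level tokens for {set a 1; set b 2}, as they might be
# accumulated by repeated calls using -all -append. The "semicolon" and
# "space" type names are placeholders; subtokens are omitted.
set tokens [list \
    [list command {set a 1} 0 7 {}] \
    [list semicolon {;} 7 1 {}] \
    [list space { } 8 1 {}] \
    [list command {set b 2} 9 7 {}]]

# Rebuild the script from the string field of each top-level token.
# Assumption: with -all, each token starts where the previous one ended.
proc reassemble {tokens} {
    set result ""
    set expected 0
    foreach token $tokens {
        lassign $token type str index length
        if {$index != $expected} {
            error "gap or overlap at index $index (expected $expected)"
        }
        append result $str
        set expected [expr {$index + $length}]
    }
    return $result
}

puts [reassemble $tokens]   ;# prints: set a 1; set b 2
```

The contiguity check is what would let a beautifier or macro expander trust that nothing from the original script was dropped.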

~~Token

A token is a list with the elements:
type string index length subtokenlist

type: token type, such as command/word/var/text
string: the text of the entire token
index: start index of the token within the script
length: length of the token
subtokenlist: a list of tokens; empty if there are no subtokens
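Because a token is an ordinary list, the tree can be walked with standard list commands. The sketch below collects the names of all substituted variables. The tree is hand-built for the script {set a $b}, since tcl::parsecommand does not exist yet, and the proc name collectVars is hypothetical. It assumes, as for TCL_TOKEN_VARIABLE in the C Tcl_Token API, that the first subtoken of a var token is a text token holding the variable name.

```tcl
# Hypothetical token tree for the script {set a $b}, hand-built in the
# proposed format: type string index length subtokenlist.
set script {set a $b}
set tokens [list \
    [list command $script 0 8 [list \
        [list word set 0 3 [list [list text set 0 3 {}]]] \
        [list word a 4 1 [list [list text a 4 1 {}]]] \
        [list word {$b} 6 2 [list \
            [list var {$b} 6 2 [list [list text b 7 1 {}]]]]]]]]

# Collect the names of all variables substituted anywhere in a token list.
proc collectVars {tokens} {
    set names {}
    foreach token $tokens {
        lassign $token type str index length subtokens
        if {$type eq "var"} {
            # Assumption: the first subtoken of a var token is a text
            # token holding the variable name, as in the C Tcl_Token API.
            lappend names [lindex $subtokens 0 1]
        }
        # Recurse so that nested words and substitutions are visited too.
        lappend names {*}[collectVars $subtokens]
    }
    return $names
}

puts [collectVars $tokens]   ;# prints: b
```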

Length and string are redundant: the length is the length of the string,
and the string can be extracted from the script using index and length.
Length is cheaper to return, though, and a variant might include the
string only for some tokens, e.g. leaves, to save time. Including the
length keeps the structure flexible.
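To make the redundancy concrete, a small helper (the name tokenString is hypothetical) recovers a token's string from the original script using only its index and length. It assumes index and length count characters, so they can be passed straight to string range; the C-level Tcl_Token records a start pointer and a size in bytes, so an implementation would need to convert.

```tcl
# Recover a token's string from the script using its index and length.
# Assumption: index and length are character counts, not byte counts.
proc tokenString {script token} {
    lassign $token type str index length
    # string range takes an inclusive end index, hence the -1.
    return [string range $script $index [expr {$index + $length - 1}]]
}

set script {set a $b}
# Hand-built word token for "$b", which starts at index 6 and is 2 long.
set token [list word {$b} 6 2 {}]
puts [tokenString $script $token]          ;# prints: $b
# Conversely, the length can be recomputed from the string field.
puts [string length [lindex $token 1]]     ;# prints: 2
```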

~~tcl::parseexpr

tcl::parseexpr ?-all? expr tokenVar

~Examples

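With these commands, parsing a whole script could look something like this:
proc parseScript {script} {
    set len [string length $script]
    set i 0
    set tokens {}
    while {$i < $len} {
        # Parse the next command from index $i. The tokens are appended
        # to the "tokens" variable and the end index is returned.
        set next [tcl::parsecommand -append -start $i -error msg $script tokens]
        if {$next < 0} {
            # The parse failed; msg holds the error message.
            return -code error "parse error after index $i: $msg"
        }
        set i $next
    }
    return $tokens
}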

~Other possibilities

Control depth: return only command tokens, or descend down to word tokens?
Automatically parse an entire script.