The www project
Installation
The www package comes packaged as a set of Tcl modules. The latest check-in can be downloaded in two formats:
- Tarball: www.tar.gz
- ZIP archive: www.zip
For people unfamiliar with the Tcl module system: Modules will only be found by a package require command if they are placed in a directory mentioned in the list of module paths. The current list can be retrieved using tcl::tm::path list. Alternatively, the path where the www module is located can be added to the list of module paths using tcl::tm::path add $dir.
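The module system doesn't search sub-directories for top-level modules; a sub-directory only maps to the namespace part of a package name (www/http2-1.0.tm provides www::http2). The top-level www directory of the archive must therefore be omitted when copying the package to the target directory. The final structure should look as follows (version numbers may differ):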
    <directory on the list of module paths>
    ├─ www-2.3.tm
    └─ www
       ├─ digest-2.0.tm
       ├─ http2-1.0.tm
       ├─ proxypac-1.0.tm
       ├─ socks-1.0.tm
       └─ websocket-1.0.1.tm
The tests and doc directories, as well as the license.terms file, do not need to be copied.
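The way the module path and this directory layout work together can be tried out without installing anything. The following is a minimal sketch, not part of the www distribution: it builds a throw-away directory containing a hypothetical package named demo (a stand-in for www) with one sub-module, adds that directory to the module path, and loads both modules. The directory name and the demo package are assumptions made purely for illustration.

```tcl
# Build a throw-away module directory that mirrors the layout above.
# "demo" is a hypothetical stand-in for the real www package.
set dir [file join [pwd] tm-demo-[pid]]
file mkdir [file join $dir demo]

# Top-level module: <dir>/demo-1.0.tm provides package "demo" 1.0
set f [open [file join $dir demo-1.0.tm] w]
puts $f {namespace eval demo {proc hello {} {return "hello from demo"}}}
close $f

# Sub-module: <dir>/demo/sub-1.0.tm provides package "demo::sub" 1.0
set f [open [file join $dir demo sub-1.0.tm] w]
puts $f {namespace eval demo::sub {proc hello {} {return "hello from demo::sub"}}}
close $f

# Register the directory only if it is not on the module path yet.
if {$dir ni [tcl::tm::path list]} {
    tcl::tm::path add $dir
}

# Name and version are derived from the file names.
set v1 [package require demo]
set v2 [package require demo::sub]
puts "demo $v1, demo::sub $v2"
puts [demo::hello]
puts [demo::sub::hello]

# Clean up the temporary directory.
file delete -force $dir
```

With the real package, only the tcl::tm::path add and package require www steps are needed, using the directory into which the files were copied.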
Introduction
The www package was conceived to make dealing with web resources in Tcl much easier than before. The common method for working with the HTTP protocol in Tcl used to be the http package. However, the http package only performs a very low-level request/response transaction. Tcl script developers had to deal with all the error handling and higher-level protocol interactions every time they used the package, which makes the http package quite cumbersome to use. On top of that, the package has a number of issues. The www package has been written from scratch to avoid those issues.
To demonstrate the benefits, here's an example of how to get a web page with the www package:
    package require www
    if {[catch {www get http://www.tcl.tk/} data]} {
        puts stderr "Failed to get the web page"
    } else {
        puts $data
    }
For anyone who has used the http package, the advantages should be obvious: No need to deal with tokens, three possible levels of errors, or cleaning up. There are also a number of less obvious advantages: Redirects are automatically followed, and cookies are collected and provided back to the site in the next call as applicable. Proxies, encrypted connections(1), and authentication(2) are also handled automatically. In case of transient errors, the request is retried a few times. If any fatal errors are encountered while fetching the resource, this is indicated using the standard Tcl exception reporting methods. This means that normal exception handling using the Tcl catch and try commands can be applied.
Metadata
Under certain circumstances it may be necessary to access the metadata of the request, for example to get the content type of the returned data or to provide a more detailed error message. This information can be obtained from the return options:
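    # The third argument to catch receives the return options (the metadata)
    if {[catch {www get http://www.tcl.tk/} data meta]} {
        puts stderr "Failed to get the web page:\
            [dict get $meta status code] [dict get $meta status reason]"
    } else {
        if {[string match {text/*} [dict get $meta headers content-type]]} {
            # Textual content: print it
            puts $data
        } else {
            # Anything else: save it to a file named after the last URI component
            # ($downloads must hold the target directory; it is not set here)
            set name [file tail [dict get $meta uri]]
            set f [open [file join $downloads $name] wb]
            puts -nonewline $f $data
            close $f
        }
    }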
All headers are listed under the metadata 'headers' key using lowercase names. HTTP field names are case-insensitive, so this makes it easy to locate the desired header using the dict exists and dict get commands. The headers metadata field is actually a key-value list, to accommodate headers that occur multiple times. But for most situations dict operations on this data will work just fine. The www header command is provided to properly deal with the headers list.
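The difference between treating the headers as a dict and as a key-value list only shows up when a header occurs more than once. The sketch below uses plain Tcl on a made-up headers list (the values are assumptions, not output of the www package) to show that dict get only sees the last occurrence, while walking the list finds all of them.

```tcl
# Hypothetical headers list with a repeated set-cookie header.
set headers {content-type text/html set-cookie a=1 set-cookie b=2}

# dict operations work fine for headers that occur once ...
puts [dict get $headers content-type]   ;# prints "text/html"

# ... but for repeated keys only the last value is visible.
puts [dict get $headers set-cookie]     ;# prints "b=2"

# Walking the key-value list retrieves every occurrence.
set cookies {}
foreach {name value} $headers {
    if {$name eq "set-cookie"} {
        lappend cookies $value
    }
}
puts $cookies                            ;# prints "a=1 b=2"
```

This is the situation in which the www header command mentioned above is the better choice; see its manual page for the exact syntax.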
In case of errors, the different subcommands generate detailed error types that can be used in a try command to distinguish different situations:
    try {
        www get http://www.tcl.tk/
    } on ok {data} {
        puts $data
    } trap {WWW CODE 4XX 404} {} {
        puts stderr "Page not found"
    } trap {WWW CODE 4XX} {err meta} {
        puts stderr "Client error: [dict get $meta status code]"
    } trap {WWW CODE 5XX} {err meta} {
        puts stderr "Server error: [dict get $meta status code]"
    } trap {WWW CODE} {err meta} {
        puts stderr "Unexpected response code: [dict get $meta status code]"
    } trap {WWW URL} {err} {
        puts stderr "Invalid URL: [string toupper $err 0 0]"
    } trap {WWW} {err} {
        puts stderr "Failed to get the web page: [string toupper $err 0 0]"
    }
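The order of the trap clauses matters: try runs the first handler whose pattern is a prefix of the error code, so the most specific patterns must come first. The following sketch illustrates this without any network access. The fakeFetch and classify procedures are hypothetical helpers, and the error codes merely imitate the ones shown in the example above.

```tcl
# Hypothetical stand-in for a failing "www get": throws an error code
# shaped like the ones in the example above, e.g. {WWW CODE 4XX 404}.
proc fakeFetch {code} {
    throw [list WWW CODE [string index $code 0]XX $code] "simulated HTTP $code"
}

# Map a simulated status code to a short description.
proc classify {code} {
    try {
        fakeFetch $code
    } trap {WWW CODE 4XX 404} {} {
        return "not found"
    } trap {WWW CODE 4XX} {} {
        return "client error"
    } trap {WWW CODE 5XX} {} {
        return "server error"
    } trap {WWW} {} {
        return "other www error"
    }
}

puts [classify 404]   ;# prints "not found"
puts [classify 403]   ;# prints "client error"
puts [classify 503]   ;# prints "server error"
```

If the {WWW CODE 4XX} clause were listed before {WWW CODE 4XX 404}, the more specific handler would never run.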
Background operation
The www command does not provide a special way to run an HTTP operation in the background. Since coroutines were added to Tcl, there is no need for such facilities anymore: commands such as the www command can simply be executed from within a coroutine to the same effect.
The www command is in fact completely non-blocking. A DNS delay during the socket setup, in particular, would normally block an application. In case of DNS problems, that can amount to several seconds. The www package avoids this problem by delegating the socket connection setup to a separate thread. While that thread is waiting for the connection to complete, fail, or time out, the main thread can continue to service events generated by the application. When the connection completes successfully, the channel is transferred to the main thread(3).
If the www command is invoked outside of a coroutine, it will use the vwait command to wait for the operation to complete. Because multiple vwait calls will nest, it is advisable to call the www command only from within a coroutine in all but the simplest applications.
Cookies
If a web site returns one or more cookies, they are kept in an SQLite database. In subsequent requests the appropriate cookies are returned to the server. The package handles cookie expiry and will only send secure cookies over secure connections. By default, an in-memory SQLite database is used, causing all cookies to be lost when the application terminates. But a cookie file can be configured, in which case persistent cookies will be saved on disk for reuse across application restarts.
Extensions
In addition to the base www package, this project also hosts a number of extensions for the www package.
- WebSocket: The www::websocket package adds the ability to connect to a WebSocket server. The www websocket command returns an object command that can be used for communications over the WebSocket connection.
- HTTP/2: The www::http2 package adds support for the HTTP/2 protocol(4). This package handles HTTP/2 for both http and https URIs. But because the major web browsers have decided to only support HTTP/2 over TLS (h2), there may not be many web sites that support clear text HTTP/2 (h2c).
- Proxy auto-configuration: It is quite common for a corporate proxy server to provide the proxy rules by means of a proxy auto-configuration (PAC) file. A PAC file is a file containing a special JavaScript function that determines whether web browser requests can go directly to the destination or should be forwarded to a web proxy server. The www::proxypac package adds a proxy filter option for handling PAC files.
Note 1: Requires the Tcl TLS package.
Note 2: Digest authentication requires the tcllib md5 package.
Note 3: Requires at least Tcl 8.6.11 on Linux.
Note 4: Requires Tcl TLS patch e1f9a21c67 to enable HTTP/2 over TLS (h2).