reduce-historical: Artifact [95654e9f28]

Artifact 95654e9f28738fe5ab0545f9a732d38a739032aa5895d313794d27a0d6cddb0d:

File r34.1/doc/scope.tex — part of check-in [f2fda60abd] at 2011-09-02 18:13:33 on branch master — Some historical releases purely for archival purposes
git-svn-id: https://svn.code.sf.net/p/reduce-algebra/code/trunk/historical@1375 2bfe0521-f11c-4a00-b80e-6202646ff360 (user: arthurcnorman@users.sourceforge.net, size: 56591) [annotate] [blame] [check-ins using] [more...]
\documentstyle[11pt,reduce]{article}
\title{SCOPE, a Source-Code Optimization PackagE for REDUCE}
\date{}
\author{J.A. van Hulzen \\ Twente University, The Netherlands \\
Email: infhvh@cs.utwente.nl}
\begin{document}
\maketitle
\index{SCOPE package !}

A survey of the strategy behind and the facilities of SCOPE, a
Source-Code Optimization PackagE for {\REDUCE} is given. We avoid a
detailed discussion of the different algorithms and concentrate on the
user aspects of the package.  Examples of straightforward and more
advanced usage are shown.  A combined use of GENTRAN and SCOPE is not
yet discussed in this preliminary version of the SCOPE manual.
\index{GENTRAN ! with SCOPE package}

\section{Introduction}\label{SCOPE:intro}

An important application of computer algebra systems is the generation
of code for numerical purposes via automatic or semiautomatic program
generation. GENTRAN~\cite{Gates:85,Gates:84} a flexible general-purpose
package, was especially developed to assist in such a task, when using
MACSYMA or {\REDUCE}.

\index{optimization}
Attendant to automatic program generation is the problem of automatic
source-code optimization. This is a crucial aspect because code
generated from symbolic computations often tends to be made up of
lengthy arithmetic expressions. One of our test examples contained,
for instance, 20534 additions and subtractions, 4174 multiplications,
12473 integer exponentiations and 7990 other operations, such as
function calls.  These lengthy expressions will be grouped together in
blocks of straight-line code in a program for numerical purposes. The
main objective of source-code optimization is to minimize the number
of (elementary) arithmetic operations in such blocks.  This form of
optimization is often helpful in reducing redundancy in expressions.
Simplification algorithms applied on expressions viewed as entities,
neither guarantee complete structure preservation nor allow
improvements inside expressions by renaming subexpressions.

\index{optimizing compilers}
Optimizing compilers ought to deal effectively and efficiently with
the average, hand coded program. The enormous, arithmetic intensive
expressions, easily producable by a computer algebra system, fall
outside the range of the FORTRAN programs, once analyzed and discussed
by Knuth~\cite{Knuth:71}. He suggested that optimization of the arithmetic in
such a program is slightly overdone. This may explain why even in
reasonably recent literature, such as~\cite{Aho:86}, optimization of arithmetic
code is hardly discussed. The DAG models, usually employed for
optimization of arithmetic code, hardly allow the application of any
algebraic identity (see for instance~\cite{Gonzales}). These models often force
constants to act as if they were indeterminates and powers as objects
requiring function calls, i.e. they force to think of $2a\ +\ 3b$ and
$4 a \ +\ 6b$ or of $a^2$, $a^{4}$ and $a^{6}$ as being different
entities.  Our optimization strategy however, requires the validity of
some elementary algebraic laws. We employ heuristic techniques to
reduce the arithmetic complexity of the given representation of a set
of input expressions $ {\rm E}_in$, thus producing a set of output
expressions ${\rm E}_out$. The optimized version of the earlier
mentioned test example contains ``only'' 4316 additions and
subtractions, 4919 multiplications, 13 integer exponentiations and 60
other operations.  ${\rm E}_{in}$ and ${\rm E}_{out}$ define blocks of
code, which would compute the same exact values for the same exact
inputs, thus implicitly proving the correctness of the underlying
software.  Obviously the use of ${\rm E}_{out}$ saves a considerable
amount of execution time in comparison with the application of ${\rm
E}_{in}$.  Johnson et al~\cite{Johnson:79} suggest that such
transformations do not destabilize the computations.  However this is only
apparent after a careful error analysis of both ${\rm E}_{in}$ and ${\rm
E}_{out}$.  In view of the size of both ${\rm E}_{in}$ and ${\rm E}_{out}$
such an analysis has to be automatized as well.  Work in this direction is
in progress.

The current version of SCOPE, our Source-Code Optimization PackagE, is
written in RLISP. It can be used as an extension of {\REDUCE}.  It
allows to subject almost any set of proper {\REDUCE} assignment
\index{common subexpressions (cse)} \index{cse (common subexpressions)}
statements to a heuristic search for common (sub)expressions (cse's).
The output is obtained as a sequence of assignment statements, by
default given in {\REDUCE} syntax.

The first version of the package was designed to optimize the
description of {\REDUCE}-statements, generated by
NETFORM~\cite{Smit:81,Smit:82}. This
version was tailored to a restrictive class of problems, occurring
mainly in electrical network theory, thus implying that the right-hand
sides (rhs's) in the input were limited to elements of ${{\rm {\bf }}
Z}_2$[V], where V is a set of identifiers.  The second
version~\cite{vanHulzen:83}
allowed rhs's from {\bf Z}[V]. For both versions the validity of the
commutative and the associative law was assumed.  A third version
evolved from the latter package by allowing to apply the distributive
law, i.e. by replacing (sub)expressions like $a.b\ +\ a.c$ by $a.(b\
+\ c)$ whenever possible.  But the range of possible applications of
this version was really enlarged by redefining V as a set of kernels,
implying that, at least by that time, almost any proper {\REDUCE}
expression could function as a rhs.  The mathematical capabilities of
this version are shortly summarized in~\cite{Wang:84}, in the context of code
generation and optimization for finite-element analysis.  It is used
\index{GENTRAN ! with SCOPE package}
in combination with GENTRAN, for the construction of Jacobians and
Hessians~\cite{Heuvel:89} and also in experiments with strategies for code
vectorization~\cite{Goldman:89}. It still assumes constant coefficients to be
elements of {\bf Z}.  The user-interface of the present version relies
on some GENTRAN facilities.

In~\cite{vanHulzen:81,vanHulzen:83} we described the overall
optimization strategy used for
SCOPE as a composite function ${{\rm R}^{-1}}\ \circ\ {{\rm T}}\
\circ\ {{\rm R}}$.  The function R defines how to store the input
${{\rm E}}_{0}$ in an expression data base ${{\rm D}}_{0}$. This
${{\rm D}}_{0}$ is formed by two matrix structures and a function
table.  The incidence matrices represent ${{\rm E}}_{0}$, a set of
arithmetic expressions, in a two-dimensional structure where the rows
represent expression or subexpression references and the columns
represent identifier references such as variable and function names.
The function names are taken from the function table, consisting of a
list of pairs of function applications occurring in ${{\rm E}}_0$, and
system selected names functioning as their placeholders during the
optimization process. Arguments of functions are similarly entered in
the matrix structures when ever relevant.  A given subexpression will
be entered in one of two types of incidence matrices, one for sums and
one for products, depending on the nature of the arithmetic operation
at the top level of the expression.  The two matrices are correlated
by auxiliary predecessor-successor information at the row level for
every subexpression reference. The actual entries in the matrices are
either multiplicative numerical coefficients for the sums matrix or
powers for the products matrix.  The inverse function ${{\rm
R}}^{{-1}}$ defines the output production.  The function T defines the
optimization process itself. It essentially consists of a heuristic
remodeling of the (extendable) matrices in combination with storing
information required for a fast retrieval and correct insertion of the
detected cse's in the output.  This is accomplished by an iteratively
applied search, resulting in a stepwise reduction of the arithmetic
complexity of the input set, using an extended version of Breuer's
\index{Breuer's Algorithm}
grow factor algorithm~\cite{Breuer:69,vanHulzen:81,vanHulzen:83}.
It is applied until no further profit
is gained, i.e. until the reduction in arithmetic complexity stops.
Before producing output, a finishing touch can be performed to further
reduce the arithmetic complexity with some locally applied techniques.
The overall process can be summarized as follows: $$ {{\rm R}}\ :\
{{{\rm E}}_0}\ \to\ ({{{\rm D}}_0},{{{\rm profit}}_0}) $$ $$ {{{\rm
T}}_{\beta}}\ :\ ({{{\rm D}}_i},{{{\rm profit}}_i})\ \to\ ({{{\rm
D}}_{{i+1}}},{{{\rm profit}}_{{i+1}}})\ ,\ {{\rm i}}\ =\ 0,..., \lambda
- 1.  $$ $$ {{\rm F}}\ :\ ({{{\rm D}}_{\lambda}},{{{\rm
profit}}_{\lambda}})\ \to\ {D_{\lambda}} $$ $$ {{{\rm R}}^{{-1}}}\ :\
{D_{\lambda}}\ \to\ {{{\rm E}}_{\lambda}} $$ ${{\rm D}}_{0}$ is
created as a result of an R-application performed on input ${{\rm
E}}_{0}$.  The termination condition depends on some profit criterion
related to the arithmetic complexity of the latest version of the
input, ${{{\rm D}}_i}$. Hence we assume ${{{\rm profit}}_i}\ =\ true$
for $i\ =\ 0,..., \lambda -1$ and ${{{\rm profit}}_\lambda}\ =\
false$.  The function T is composite as well, and defined by ${{\rm
T}}\ =\ {{\rm F}}\ \circ\ {{{{\rm T}}_{\beta}}^{\lambda}}$. ${{\rm
T}}_{\beta}$ defines one iteration step, i.e. one application of the
extended version of Breuer's algorithm. The function F defines a
finishing touch, resulting in the final version $D_{\lambda}$ of
${{\rm D}}_{0}$, used to produce the output ${{\rm E}}_{\lambda}$. We
omit a further discussion of the different algorithms used for
optimization; this can be found in~\cite{vanHulzen:81,vanHulzen:83},
for instance.

The present version makes use of some GENTRAN facilities to translate
its input into LISP prefix forms. This approach can be seen as a form
of preprocessing, i.e. ${{\rm E}}_{0}$, the input for R can be
considered as a list of {\bf setq}-applications

The GENTRAN-SCOPE Interface, allows other preprocessing activities.
We introduced the optional use of GENTRAN's {\bf declare}-statement,
\index{DECLARE statement ! GENTRAN}
thus allowing specification of the type of some or all of the lhs's and of
the identifiers used to construct the rhs's. In addition to the
prefixlist, a list of declarations in the Target Language can be
produced, based on default assumptions concerning untyped lhs's and
identifiers in the input.  This facility is based on the use of
GENTRAN's symbol table.

Before optimizing rhs's it might be attractive to rewrite them using a
\index{Horner's Rule}
generalized form of Horner's Rule. We designed such a command, which
does not necessarily have to be used in the context of SCOPE. It can
operate on a set of assignment statements and it can deliver the
result in the form of a sequence of prefix forms, defining the
rewritten statements. Subjecting such a sequence of prefix forms to a
SCOPE application implies that the GENTRAN approach is not directly
applicable. The GENTRAN := and :=: assignment operators define literal
translation or rhs-simplification, respectively. Therefore we extended
our Interface with special facilities, allowing SCOPE to accept the
result of the application of such a command literally. Besides the
{\bf g}(eneralized) {\bf horner} (rule) we have a command,
generalizing the impact of the {\bf structr}-command to a set of
assignment statements.

We discuss and illustrate a straightforward use of SCOPE in
section~\ref{SCOPE:basic} In section~\ref{SCOPE:pre} we introduce the
special commands {\bf ghorner} and {\bf gstructr} and show how to use
them, also in combination with SCOPE.  We use section~\ref{SCOPE:decl} to
discuss the declaration facilities and section~\ref{SCOPE:files} to show
the different file-handling possibilities and modes of operation.
Section!\ref{SCOPE:future} discusses future work.  Guidelines for
installing the package are given in the final section.

\section{Source-Code Optimization : The Basic Facilities}\label{SCOPE:basic}
\subsection{The Strategy}

Before illustrating the effect of applying SCOPE, we shortly describe
the operations, covered by the functions ${{\rm T}}_{\beta}$ and F,
mentioned in the previous section.

The function R accepts assignment statements given in prefix form.  We
can divide these forms in three categories using their leading
operator. We distinguish between PLUS, TIMES and OTHER-operators.
Leaving aside the OTHER-operators for awhile, we reduce the structure
of possible rhs's to those of not necessarily expanded multivariate
polynomials with integer coefficients. Assuming the leading operator
\ttindex{PLUS}
is PLUS, the operands, being terms of a polinomial (for instance $3a\
+\ 2b\ +\ 3 {b^2} c (3a\ +\ 2b){(c\ +\ d)^2}$), can either be
primitive or composite.  A primitive term is an integer, an identifier
or the product of an integer and an identifier.  Hence the primitive
terms of a sum form an (eventually empty) linear expression ($3a\ +\
2b$).  Composite terms are products, which cannot be qualified as a
primitive term ($3 {b^2} c (3a\ +\ 2b) {(c\ +\ d)}^{2}$) Like sums,
\ttindex{TIMES}
prefix forms with a TIMES-operator, can have a primitive and/or
composite part. The primitive part of a product is an (eventually
empty) power product(${b^2} c$).  The composite part is a product of
sums and/or powers of sums ($(3a\ +\ 2b) {(c\ +\ d)^2}$).  Observe
that our expression-structure discussed so far is still too simple.
\ttindex{EXPT}
Powers of sums have EXPT as their leading operator (${(c\ +\
d)}^{2}$).  Similarly, a product can have a integer coefficient ($3
{b^2} c$).

This description suggests, as already indicated in section~\ref{SCOPE:intro},
that we
can consider any set of rhs's as being built with linear expressions
and power products only. This allows to map such a set onto two
incidence matrices: One defining the linear expressions, using the
coefficients, and another defining the power products, using the
exponents. The rows of these matrices can be associated with the
(sub)expressions under consideration and the columns with the
identifiers, used to construct these expressions. This is why we need
to assume the validity of the commutative and associative law. To be
able to retrieve the structure of the assignment statements forming
the input set, we need to combine additional information with the rows
and columns of these matrices. Essential is, for instance, storage of
the exponents of sums and of the coefficients of products.  Equally
important is storage of the lhs's, which are the rhs-recognizers.
Details are given in~\cite{vanHulzen:83}. Example~\ref{ex:2.2.1} on
page~\pageref{ex:2.2.1} and example~\ref{ex:2.2.2} on page~\pageref{ex:2.2.2}
provide illustrations of these data structures.

When introducing kernels, i.e. when assuming the set of
OTHER-operators to be not empty, we have to store lists of
non-commutable arguments.  Therefore a function table of pairs is
made, formed by the kernels and their internally created names. These
names are entered in the matrices as new identifiers. The arguments of
such a kernel can be arbitrary {\REDUCE}-expressions, which also have to
be incorporated in the matrices.  Hence the function table is created
recursively.

\index{cse (common subexpressions)}
What is a cse and how do we locate its occurrences?  A (sub)expression
is common when it occurs repeatedly in the input. The occurrences are,
as part of the input, distributed over the matrices, and shown as
equivalent integer (sub)patterns.  In fact, we repeatedly search for
completely dense (sub)matrices of rank 1.  The expression $2a\ +\ 3c$
is a cse of ${e_1} \ =\ 2a\ +\ 4b\ +\ 3c$ and ${e_2}\ =\ 4a\ +\ 6c\ +\
5d$, representable by (2,4,3,0) and (4,0,6,5), respectively.  We
indeed have to assume commutativity, so as to be able to produce new
patterns (2,0,3,0,0), (0,4,0,0,1) and (0,0,0,5,2), representing $s\ =\
2a\ +\ 3c$, ${e_1}\ =\ 4b\ +\ s$ and ${e_2}\ =\ 5d\ +\ 2s$,
respectivily, and thus saving one addition and one multiplication.
Such an additive cse can be a factor in a composite (sub)product,
which in turn can be reduced to a primitive product, when the cse is
replaced by a new symbol.  Therefore an essential part of an
optimization step is regrouping of information. This migration of
information between the matrices is performed if the Breuer-searches
are temporarily completed.  After this regrouping the distributive law
is applied, eventually also leading to a further regrouping. If at
least one of these actions leads to a rearrangement of information the
function ${\rm T} _{\beta}$ is again applied.  Observe that this
${{\rm T}}_{\beta}$ is also a composite function.  In view of the
iterative character of the optimization process we always accept
minimal profits.

A similar search is performed to detect multiplicative cse's, for
instance occuring in ${e_1}\ =\ {a^2} {b^4} {c^3}$ and ${e_2}\ =\
{a^4} {c^6} {d^5}$.  However, given a power product $\prod_{i=1}^m
{x_i}^{{a}_i}$, any product $\prod_{i=1}^m {x_i}^{{b}_i}$, such that
some ${b_i}\ \ {a_i}$, for i = 1(1)m, can function as a cse.  We
therefore extend the search for multiplicative cse's by employing this
property, and as indicated in~\cite{vanHulzen:83}.

The function F -defining the finishing touch- performs one-row and/or
one-column searches. Once the extended Breuer-searches do not lead to
further reduction in the arithmetic complexity we try -applying it- to
improve what is left.  The integer coefficients in (sub)sums can have,
possibly locally, a gcd, which can be factored out.  One-column
operations serve to discover and properly replace integer multiples of
identifiers.  As part of the output-process we subject all
exponentiations left - at most one for each identifier - to an
addition chain algorithm.  Another action, covered by F is therefore
replacement by a new symbol of those (sub)sums, which are raised to
some integer power.

\subsection{The Facilities}

{\REDUCE} allows, roughly speaking, two modes of operation: {\tt ON EXP}
or {\tt OFF EXP}.  The first alternative is the default setting leading to
expanded forms.  The latter gives unexpanded forms, as discussed by Hearn
in some detail~\cite{Hearn:85,Hearn:86}.  It is obvious that the {\tt OFF
EXP} setting is in \index{EXP switch} general preferable over the {\tt ON
EXP} setting when attempting to optimize the description of a set of
assignment statements.

Starting a {\REDUCE} session gives the initial state. All switches
have their initial value: {\tt ON EXP, PERIOD} and {\tt OFF FORT}, for
instance.  When loading SCOPE we create a new operating environment,
without disturbing the current state of the system.

The result of an application of SCOPE can be influenced by the use of
certain {\REDUCE}- or SCOPE-switches. The influence of {\tt EXP} is obvious.
\index{ACINFO switch} \index{echo ! in SCOPE}
By default the switch {\tt ACINFO} is turned on. This guarantees an echo of
the form in which the assignment statements are consumed by SCOPE. It
also guarantees tables with the numbers of arithmetic operations,
occuring in ${{{\rm E}}_0}$ and ${{\rm E}}_{\lambda}$, respectively,
to be printed.  Some switches are available to obtain information
about the process itself.  They were introduced to assist in debugging
\index{tracing ! SCOPE package} \index{PRIMAT switch}
and testing.  {\tt PRIMAT} can be used to visualize both ${{\rm D}}_{0}$ and
\index{PRIALL switch}
$D_{\lambda}$.  {\tt PRIALL} is a switch which combines not only the effect
of {\tt ACINFO} and {\tt PRIMAT}, but also allows to obtain timings of the
different sub-algorithms of SCOPE.

Output is by default given in {\REDUCE} syntax, but FORTRAN syntax is
\index{PREFIX switch}
possible in the usual way. The switch {\tt PREFIX} can be used to obtain the
prefixlist itself as output.

\index{OPTIMIZE command}
A SCOPE action is easily performed.  Either the command ``{\bf
optimize} $<$ object$>$;'' or the command ``{\bf optimize}
$<$object$>$ {\bf iname} $<$cse-prefix$>$;'' suffices.  The
$<$object$>$ to be elaborated is either one assignment statement or a
list of such statements, all obeying the GENTRAN rules.  The
$<$cse-prefix$>$ is an identifier, used to generate the cse-names, by
extending it with an integer part. The {\bf gensym}-function is
applied when the {\bf iname}-extension is omitted.

We now illustrate the use of SCOPE through some small examples, by
showing parts of {\REDUCE} sessions.

\example\label{ex:2.2.1}
\index{SCOPE package ! example}

The multivariate polynomial Z is a sum of 6 composite terms. These
terms, monomials, are constant multiples of primitive products.  A
picture of ${{\rm D}}_{0}$ is shown after the input echo. The
sums-matrix consists of only one row, identifiable by its Fa(the)r Z,
the lhs. Its exponent is given in the E(xponent or )C(oefficient)
field. The 6 monomials are stored in the products-matrix. The
coefficients are stored in the EC-fields and the predecessor row
index, 0, is given in the Far-field. Before the $D_{\lambda}$ picture
is given the effect of the optimization process, the output and the
operator counts are shown. The optimized form of Z is obtained by
applying the distributive law. The output also shows applications of
an addition chain algorithm (\cite{Knuth:80} 441-466) as part of ${{\rm
R}}^{{-1}}$, although its use in example~\ref{ex:2.2.3} is more apparent.

Observe that the output illustrates the heuristic character of the
optimization process: In this particular case the rhs can be written
as a polynomial in S3, thus saving one extra multiplication.

{\small
\begin{verbatim}
on primat$

optimize z:=a^2*b^2+10*a^2*m^6+a^2*m^2+2*a*b*m^4+
            2*b^2*m^6+b^2*m^2 iname s;

      2  2       2  6    2  2          4      2  6    2  2
Z := A *B  + 10*A *M  + A *M  + 2*A*B*M  + 2*B *M  + B *M



Sumscheme :

   || EC|Far
- ------------
  0||  1| Z
- ------------


Productscheme :

   |  0  1  2| EC|Far
- ---------------------
  1|     2  2|  1| 0
  2|  6     2| 10| 0
  3|  2     2|  1| 0
  4|  4  1  1|  2| 0
  5|  6  2   |  2| 0
  6|  2  2   |  1| 0
- ---------------------
0  : M
1  : B
2  : A


Number of operations in the input is:

Number of (+,-)-operations : 5
Number of (*)-operations : 10
Number of integer exponentiations : 11
Number of other operations : 0
S0 := B*A
S4 := M*M
S8 := B*B
S1 := S4*S8
S9 := A*A
S2 := S4*S9
S3 := S4*S4
Z := S1 + S2 + S0*(2*S3 + S0) + S3*(2*S1 + 10*S2)


Number of operations after optimization is:

Number of (+,-)-operations : 5
Number of (*)-operations : 12
Number of integer exponentiations : 0
Number of other operations : 0



Sumscheme :

   |  0  3  4  5| EC|Far
- ------------------------
  0|        1  1|  1| Z
 15|        2 10|  1| 14
 17|  2  1      |  1| 16
- ------------------------
0  : S3
3  : S0
4  : S1
5  : S2

Productscheme :

   |  8  9 10 11 17 18 19 20| EC|Far
- ------------------------------------
  7|                    1  1|  1| S0
  8|  1                 2   |  1| S1
  9|  1                    2|  1| S2
 10|  2                     |  1| S3
 11|                 2      |  1| S4
 14|     1                  |  1| 0
 16|              1         |  1| 0
- ------------------------------------
8  : S4
9  : S3
10 : S2
11 : S1
17 : S0
18 : M
19 : B
20 : A
\end{verbatim}}

\example\label{ex:2.2.2} \index{SCOPE package ! example}

The input echo below shows the literal copy of the first assignment,
in accordance with the GENTRAN := operator.  The second assignment,
again in accordance with the GENTRAN operator ::=:, has a rhs in
expanded form.
\newline
The ${{\rm D}}_{0}$ picture shows that during parsing string matching
of kernels in prefix form already contributes to optimization : S2 =
C*X + D and S3 =SIN(S2) are stored once.

Application of the distributive law gives the original structure of
A(1,1) back.

{\small
\begin{verbatim}
on primat$
operator a$
k:=j:=1$
u:=c*x+d$
v:=sin(u)$

optimize {a(k,j) := v*(v^2*cos(u)^2+u),
          a(k,j) ::=:v*(v^2*cos(u)^2+u)} iname s;


              2       2
A(K,J) := V*(V *COS(U)  + U)

                 2             3
A(1,1) := COS(C*X + D) *SIN(C*X + D)  + SIN(C*X + D)*C*X
          + SIN(C*X + D)*D
\end{verbatim}
\newpage
\begin{verbatim}
Sumscheme :

   |  7  8| EC|Far
- ------------------
  1|  1   |  1| 0
  3|      |  1| A(1,1)
  5|     1|  1| S2
- ------------------
7  : U
8  : D

Productscheme :

   |  0  1  2  3  4  5  6| EC|Far
- ---------------------------------
  0|                    1|  1| A(K,J)
  2|                 2  2|  1| 1
  4|     3  2            |  1| 3
  6|           1  1      |  1| 5
  7|     1     1  1      |  1| 3
  8|  1  1               |  1| 3
- ---------------------------------
0  : D
1  : S3=SIN(S2)
2  : S1=COS(S2)
3  : X
4  : C
5  : S0=COS(U)
6  : V

Number of operations in the input is:

Number of (+,-)-operations : 7
Number of (*)-operations : 10
Number of integer exponentiations : 4
Number of other operations : 5

S6 := COS(U)*V
S9 := S6*S6
A(K,J) := V*(U + S9)
S2 := D + X*C
S3 := SIN(S2)
S7 := S3*COS(S2)
S8 := S7*S7
A(1,1) := S3*(S2 + S8)

Number of operations after optimization is:

Number of (+,-)-operations : 3
Number of (*)-operations : 7
Number of integer exponentiations : 0
Number of other operations : 3

Sumscheme :

   |  2 12 13| EC|Far
- ---------------------
  1|     1   |  1| 0
  3|         |  1| A(1,1)
  5|        1|  1| S2
 11|  1      |  1| 10
- ---------------------
2  : S2
12 : U
13 : D

Productscheme :

   |  0  1  5  6  7  8  9 10 11| EC|Far
- ---------------------------------------
  0|                          1|  1| A(K,J)
  2|     2                     |  1| 1
  4|  2                        |  1| 11
  9|                 1  1      |  1| 5
 10|           1               |  1| 3
 13|                       1  1|  1| S6
 14|           1  1            |  1| S7
- ---------------------------------------
0  : S7
1  : S6
5  : D
6  : S3=SIN(S2)
7  : COS(S2)
8  : X
9  : C
10 : COS(U)
11 : V
\end{verbatim}}

\example\label{ex:2.2.3}
\index{SCOPE package ! example}

The effect is shown of a finishing touch application, in combination
with FORTRAN output.  During output preparation {\tt S0} is
rewritten, using the earlier mentioned addition chain algorithm.

{\small
\begin{verbatim}
on fort$
off acinfo,period$

optimize z:=96*a+18*b+9*c+3*d+6*e+18*f+6*g+5*h+5*k+3)^13
            iname s;

      S0=5*(H+K)+3*(3*C+D+1+6*(B+F)+2*(A+E+G))
      S4=S0*S0
      S3=S0*S4
      S2=S3*S3
      S1=S2*S2
      Z=S0*S1
\end{verbatim}}

\example\label{ex:2.2.4}
\index{SCOPE package ! example}

Recovery of repeatedly occurring integer multiples of identifiers,
as part of the finishing touch, is illustrated. The switch {\tt ACINFO}
is turned off.

\begin{verbatim}
optimize {x:=3*a*p,
          y:=3*a*q,
          z:=6*a*r+2*b*p,
          u:=6*a*d+2*b*q,
          v:=9*a*c+4*b*d,
          w:=4*b} iname s;

S1 := 3*A
X := S1*P
Y := S1*Q
S2 := 6*A
S3 := 2*B
Z := S3*P + S2*R
U := S3*Q + S2*D
S0 := 4*B
V := S0*D + 9*A*C
W := S0
\end{verbatim}

\example\label{ex:2.2.5}
\index{SCOPE package ! example}

The effect of {\tt ON EXP} or {\tt OFF EXP} on the result of a
SCOPE-application is now shown by optimizing the representation of the
determinant of a symmetric (3,3) matrix M.  Besides differences in
computing time we also observe that the arithmetic complexity of the
optimized version of the expanded representation of the determinant is
about the same as the not optimized form of the unexpanded representation.

{\small
\begin{verbatim}
matrix M(3,3)$

m(1,1):=18*cos(q3)*cos(q2)*m30*p^2-sin(q3)^2*j30y+sin(q3)^2*j30z-
        9*sin(q3)^2*m30*p^2+j1oy+j30y+m10*p^2+18*m30*p^2$
m(2,1):=
m(1,2):=9*cos(q3)*cos(q2)*m30*p^2-sin(q3)^2*j30y+sin(q3)^2*j30z-
        9*sin(q3)^2*m30*p^2+j30y+9*m30*p^2$
m(3,1):=
m(1,3):=-9*sin(q3)*sin(q2)*m30*p^2$
m(2,2):=-sin(q3)^2*j30y+sin(q3)^2*j30z-9*sin(q3)^2*m30*p^2+j30y+
        9*m30*p^2$
m(3,2):=
m(2,3):=0$
m(3,3):=9*m30*p^2+j30x$

optimize detm:=:det(M) iname s;

                   4        2  6    3             4        2  4    2
DETM := 729*SIN(Q3) *SIN(Q2) *P *M30  + 81*SIN(Q3) *SIN(Q2) *P *M30 *

                 4        2  4    2                   2        2  6
J30Y - 81*SIN(Q3) *SIN(Q2) *P *M30 *J30Z - 729*SIN(Q3) *SIN(Q2) *P *

   3             2        2  4    2                   2  6    3
M30  - 81*SIN(Q3) *SIN(Q2) *P *M30 *J30Y - 729*SIN(Q3) *P *M30  - 81*

       2  6    2                 2  4    2                  2  4    2
SIN(Q3) *P *M30 *M10 - 81*SIN(Q3) *P *M30 *J30Y + 81*SIN(Q3) *P *M30

                  2  4    2                  2  4    2
*J30Z - 81*SIN(Q3) *P *M30 *J1OY - 81*SIN(Q3) *P *M30 *J30X - 9*

       2  4                         2  4                         2  4
SIN(Q3) *P *M30*J30Y*M10 + 9*SIN(Q3) *P *M30*J30Z*M10 - 9*SIN(Q3) *P

                         2  2                          2  2
*M30*M10*J30X - 9*SIN(Q3) *P *M30*J30Y*J1OY - 9*SIN(Q3) *P *M30*J30Y*

                2  2                          2
J30X + 9*SIN(Q3) *P *M30*J30Z*J1OY + 9*SIN(Q3) *P *M30*J30Z*J30X - 9*

       2  2                        2  2                        2  2
SIN(Q3) *P *M30*J1OY*J30X - SIN(Q3) *P *J30Y*M10*J30X + SIN(Q3) *P *

                       2                         2
J30Z*M10*J30X - SIN(Q3) *J30Y*J1OY*J30X + SIN(Q3) *J30Z*J1OY*J30X - 

           2        2  6    3             2        2  4    2
729*COS(Q3) *COS(Q2) *P *M30  - 81*COS(Q3) *COS(Q2) *P *M30 *J30X + 

     6    3       6    2           4    2            4    2
729*P *M30  + 81*P *M30 *M10 + 81*P *M30 *J30Y + 81*P *M30 *J1OY + 81

  4    2           4                   4                   2
*P *M30 *J30X + 9*P *M30*J30Y*M10 + 9*P *M30*M10*J30X + 9*P *M30*J30Y

           2                    2                  2
*J1OY + 9*P *M30*J30Y*J30X + 9*P *M30*J1OY*J30X + P *J30Y*M10*J30X + 

J30Y*J1OY*J30X

Number of operations in the input is:

Number of (+,-)-operations : 36
Number of (*)-operations : 148
Number of integer exponentiations : 84
Number of other operations : 32
S0 := SIN(Q3)
S30 := S0*S0
S1 := SIN(Q2)
S34 := S1*S1
S35 := P*P
S7 := S35*M30
S33 := S7*S7
S5 := S33*J30Y
S6 := S30*S7
S8 := S30*M10
S49 := COS(Q2)*COS(Q3)
S9 := S49*S49
S11 := S34*S30*S30
S22 := S35*S7
S14 := S30*J30Z
S19 := S35*J30X
S23 := J30X*J10Y
S31 := S33*S7
S47 := 81*S33*J30X
S39 :=  - S47 - S23*J30Y - 81*S33*J1OY
S40 :=  - 81*S30*S5 - 729*S33*S6
S45 := 9*S6*J30Z
S46 := 9*S6
S48 := 81*S5
DETM := S48 + S40 - S39 + 729*S31 + ( - J1OY - J30X)*(9*(S6*J30Y - S7

      *J30Y) - S45) + (J30Z - J30Y)*(9*S22*S8 + S19*S8) + 9*(M10 - S8

   )*(S22*J30X + 9*S22*S7) + M10*J30Y*(9*S22 + S19) + S23*(S14 + 9*S7

    - S46) + S39*S30 + S31*(729*(S11 - S9)) + S34*(S40 - S46*S45) - 

S47*S9 + 81*S33*S14 + S48*S11


Number of operations after optimization is:

Number of (+,-)-operations : 29
Number of (*)-operations : 58
Number of integer exponentiations : 0
Number of other operations : 4


off exp$

optimize detm:=:det(M) iname s;

             2                           2                   2
DETM := ((9*P *M30 + J30Y - J30Z)*SIN(Q3)  - (18*M30 + M10)*P  - 18

                    2                         2
  *COS(Q3)*COS(Q2)*P *M30 - J30Y - J1OY)*((9*P *M30 + J30Y - 

                   2      2                 2
      J30Z)*SIN(Q3)  - 9*P *M30 - J30Y)*(9*P *M30 + J30X) - 

     2                            2                      2
((9*P  *M30 + J30Y - J30Z)*SIN(Q3)  - 9*COS(Q3)*COS(Q2)*P *M30 - 

   2             2     2                      2
9*P * M30 - J30Y) *(9*P *M30 +J30X) + 81*((9*P *M30+J30Y - J30Z)*

          2      2                    2        2  4    2
   SIN(Q3)  - 9*P *M30 - J30Y)*SIN(Q3) *SIN(Q2) *P *M30


Number of operations in the input is:

Number of (+,-)-operations : 24
Number of (*)-operations : 42
Number of integer exponentiations : 21
Number of other operations : 10

S0 := SIN(Q3)
S9 := S0*S0
S8 := P*P
S5 := S8*M30
S6 := S5*COS(Q2)*COS(Q3)
S15 := 9*S5
S13 := (S15 + J30Y - J30Z)*S9
S14 := S13 - S15 - J30Y
S3 := S14 - 9*S6
S4 := SIN(Q2)
DETM := (S15 + J30X)*(S14*(S13 - 18*S6 - J30Y - J1OY - S8*(18*M30 +

  M10)) - S3*S3) + 9*S15*S14*S9*S5*S4*S4


Number of operations after optimization is:

Number of (+,-)-operations : 13
Number of (*)-operations : 20
Number of integer exponentiations : 0
Number of other operations : 4
\end{verbatim}}

We can also use this example to show that correctness of the results
\index{NAT switch}
can easily be verified. When turning off the switch {\tt NAT} and storing
the result of a SCOPE application in a file, it is of course possible
to read the result in again. But we then operate in a normal
{\REDUCE}-like way. This implies that all cse-names are automatically
replaced by their values.  We show the ``correctness'' of SCOPE by
scoring the optimized version of the expanded form of the determinant
of M, called detm1 in file out1 and the result of a SCOPE-application
on the unexpanded form, detm2, in file out2, followed by reading both
files and by subtracting detm2 from detm1, resulting in the value 0.
This is of course an ad hoc correctness-proof for one specific
example. It is in fact another way of testing the code of the package.
So, assuming SCOPE is loaded and the matrix M is known to the system,
all we have to do is:
{\small
\begin{verbatim}
2: off acinfo,nat$

3: out out1$
4: optimize detm1:=:det(M) iname s;
5: write "end$"$
6: shut "out1"$

7: off exp$

8: out out2$
9: optimize detm2:=:det(M) iname t;
10: write "end$"$
11: shut out2$

12: on nat$

13: in out1;

S0 := SIN(Q3)$
S30 := S0*S0$
S1 := SIN(Q2)$
S34 := S1*S1$
S35 := P*P$
S7 := S35*M30$
S33 := S7*S7$
S5 := S33*J30Y$
S6 := S30*S7$
S8 := S30*M10$
S49 := COS(Q2)*COS(Q3)$
S9 := S49*S49$
S11 := S34*S30*S30$
S22 := S35*S7$
S14 := S30*J30Z$
S19 := S35*J30X$
S23 := J30X*J1OY$
S31 := S33*S7$
S47 := 81*S33*J30X$
S39 :=  - S47 - S23*J30Y - 81*S33*J1OY$
S40 :=  - 81*S30*S5 - 729*S33*S6$
S45 := 9*S6*J30Z$
S46 := 9*S6$
S48 := 81*S5$
DETM1 := 
S48 + S40 - S39 + 729*S31 + ( - J1OY - J30X)*(9*(S6*J30Y - S7*J30Y) -

S45) + (J30Z - J30Y)*(9*S22*S8 + S19*S8) + 9*(M10 - S8)*(S22*J30X + 9

*S22*S7) + M10*J30Y*(9*S22 + S19) + S23*(S14 + 9*S7 - S46) + S39*S30

+ S31*(729*(S11 - S9)) + S34*(S40 - S46*S45) - S47*S9 + 81*S33*S14 +

S48*S11$

end$

14: in out2;

T0 := SIN(Q3)$
T9 := T0*T0$
T8 := P*P$
T5 := T8*M30$
T6 := T5*COS(Q2)*COS(Q3)$
T15 := 9*T5$
T13 := (T15 + J30Y - J30Z)*T9$
T14 := T13 - T15 - J30Y$
T3 := T14 - 9*T6$
T4 := SIN(Q2)$
DETM2 := (T15 + J30X)*(T14*(T13 - 18*T6 - J30Y - J1OY - T8*(18*M30 +

 M10)) - T3*T3) + 9*T15*T14*T9*T5*T4*T4$

end$

15: detm1-detm2;

0
\end{verbatim}
}

\example\label{ex:2.2.6}
\index{SCOPE package ! example}

This example serves to show how SCOPE deals with rational exponents.
All rational exponents of a variable are collected. The least common
multiple lcm of the denominators of these rational exponents is
computed and the variable is replaced by a possibly newly selected
variable name, denoting the variable raised to the power 1/lcm. This
facility is only efficient for what we believe to be problems
occurring in computational practice. This is easily verified by
extending the sum we are elaborating here with some extra terms.

Producing FORTRAN-output shows an implied danger, due to a shortcoming
in GENTRAN. This rational exponent will in practice act as if it were
0.

This example is also used to show the effect of turning on the switch
{\tt PRIALL}.
{\small
\begin{verbatim}
on fort,priall$

optimize z:=:for j:=2:6 sum q^(1/j) iname s;


      1/6    1/5    1/4    1/3
Z := Q    + Q    + Q    + Q    + SQRT(Q)

Sumscheme :

   || EC|Far
- ------------
  0||  1| Z
- ------------

Productscheme :

   |  0| EC|Far
- ---------------
  1| 10|  1| 0
  2| 12|  1| 0
  3| 15|  1| 0
  4| 20|  1| 0
  5| 30|  1| 0
- ---------------
0  : Q


Number of operations in the input is:

Number of (+,-)-operations : 4
Number of (*)-operations : 0
Number of integer exponentiations : 0
Number of other operations : 5


Time: 2992 ms

Breuer search :
Time: 867 ms

Removal of different names for identical cse's :
Time: 17 ms

Change Scheme :
Time: 0 ms

Local Factorization :
Time: 34 ms

Breuer search :
Time: 204 ms

Removal of different names for identical cse's :
Time: 0 ms

Change Scheme :
Time: 17 ms

Local Factorization :
Time: 0 ms

Breuer search :
Time: 187 ms

Removal of different names for identical cse's :
Time: 0 ms

Change Scheme :
Time: 17 ms

Local Factorization :
Time: 0 ms

Breuer search :
Time: 119 ms

Removal of different names for identical cse's :
Time: 0 ms

Change Scheme :
Time: 17 ms

Local Factorization :
Time: 0 ms

Additional optimization during finishing touch :
Time: 34 ms


      Q=Q**(1/60)
      S7=Q*Q
      S6=S7*Q
      S4=S7*S6
      S2=S4*S4
      S1=S7*S2
      S0=S6*S1
      S3=S4*S0
      Z=S3+S0+S1+S2+S3*S2



Number of operations after optimization is:

Number of (+,-)-operations : 4
Number of (*)-operations : 8
Number of integer exponentiations : 0
Number of other operations : 1
\end{verbatim}
\newpage
\begin{verbatim}
Sumscheme :

   |  3  4  5  6| EC|Far
- ------------------------
  0|  1  1  1  1|  1| Z
- ------------------------
3  : S3
4  : S0
5  : S1
6  : S2


Productscheme :

   |  9 10 12 13 14 15 16 22| EC|Far
- ------------------------------------
  5|           1  1         |  1| 0
  6|     1           1      |  1| S0
  7|  1           1         |  1| S1
  8|        2               |  1| S2
  9|        1           1   |  1| S3
 10|  1  1                  |  1| S4
 12|  1                    1|  1| S6
 13|                       2|  1| S7
- ------------------------------------
9  : S7
10 : S6
12 : S4
13 : S3
14 : S2
15 : S1
16 : S0
22 : Q

Time: 459 ms
\end{verbatim}
}

\section{Preprocessing Possibilities}\label{SCOPE:pre}

It may happen that structure is obviously visible in the rhs's of a
set of assignment statements, which we want to optimize.  One can
think of a set of partial derivatives of products.  Or one may
consider the application of Horner-rules.  Such facilities may be
attractive, independent of the question if a SCOPE-application will be
performed on its result.  Therefore we first discuss these facilities
and show their effect, again by using simple examples, before we
continue with a combined use of SCOPE and these possibilities.

The first alternative demands a generalized {\bf structr}-command.  We
implemented such a facility. Its syntax is straightforward: ``{\bf
\index{GSTRUCTR command}
gstructr} $<$object$>$ {\bf name} $<$id$>$;'' The $<$object$>$ to be
elaborated is one assignment statement or a set of such statements,
separated by semicolons and grouped together between the special
symbols $<<$ and $>>$. Instead of a statement a matrix name is also
allowed. Then all non-zero matrix elements are incorporated in the
search for obvious cse's. The $<$id$>$ of the optional {\bf name}-part,
being an identifier, is used to identify the subexpressions, produced
via the application of a {\bf gstructr} command. When the switch
\index{ALGPRI switch}
{\tt ALGPRI} is on -the default setting- the output is given in {\REDUCE}
syntax, while turning it off leads to output in prefix form.  The
latter is employed by the function R, used to store SCOPE-input in
${{\rm D}}_{0}$.  It is also possible to get FORTRAN-output by turning
\index{PERIOD switch} \index{FORTRAN switch}
off the switch {\tt PERIOD} and turning on the switch {\tt FORTRAN}.  The input
\index{EXP switch}
remains unchanged when the switch {\tt EXP} is on.

\example\label{ex:3.1}
\index{SCOPE package ! example}

We show part of a {\REDUCE} session.
{\small
\begin{verbatim}
off exp$

matrix a(2,2)$

a(1,1) := x+y+z$
a(1,2) := x*y$
a(2,1) := (x+y)*x*y$
a(2,2) := (x+2*y+3)^3-x$

on fort$
off period$

load struct$

gstructr << a;
            b:=(x+y)^2;
            c:=(x+y)*(y+z);
            d:=(x+2*y)*(y+z)*(z+x)^2
         >> name v;

      V1=X+Y+Z
      A(1,1)=V1
      A(1,2)=X*Y
      V2=X+Y
      A(2,1)=V2*X*Y
      V3=X+2*Y+3
      V4=V3**3-X
      A(2,2)=V4
      B=V2**2
      V5=Y+Z
      C=V2*V5
      V6=X+2*Y
      V7=X+Z
      D=V6*V7**2*V5

\end{verbatim}
}

Observe that V1, V3, V4, V6 and V7 only occur once in this result of a
{\bf gstructr}-application. When applied as part of a SCOPE-operation
these redundancies will be removed before the actual optimization
process is performed, as shown in example~\ref{ex:3.3}.

\index{GHORNER command} \index{Horner's Rule}
The syntax for the {\bf ghorner}-command is very similar.  The
application of a Horner-rule assumes an ordering of the identifiers.
We allow instead of the {\bf name}-part of the {gstructr} command an
optional {\bf vorder} $<$list of id.s$>$ extension.  The $<$list of
id.s$>$ consists of at least one identifier. This list overrules, in
the order given, the current identifier ordering of the system. The
rhs's are considered as polynomials in the leftmost element of the
{\bf vorder}-list. The thus created coefficients are in turn
considered as polynomials in the second element of this list. And so
on. When the {\bf vorder}-extension is omitted the current system
identifier ordering is applied.  The internal switch {\tt ALGPRI} is again
\index{ALGPRI switch}
applicable and has the same meaning as for {\bf gstructr}.

Some optimizing compilers apply Horner-rules when possible. Our
optimization strategy is based on an all levels, all expressions
search. This contradicts the Horner-mechanism. To avoid destabilizing
side-effects of Horner-rule applications we decided to bring such a
facility under user-control.

\example\label{ex:3.2}
\index{SCOPE package ! example}

Some Taylor-expansions are shown.
\newpage
{\small
\begin{verbatim}
algebraic procedure taylor(fx,x,x0,n);
sub(x=x0,fx)+for k:=1:n sum
        (sub(x=x0,df(fx,x,k))*(x-x0)^k/
      (if k<3 then k else for j:=2:k product j))$
let x^4=0,y^7=0$

f1:=(taylor(e^x,x,0,4)*taylor(cos y,y,0,6))^2;


             3  6       3  4        3  2        3       2  6       2
F1 :=  - (8*X *Y  - 60*X *Y  + 180*X *Y  - 180*X  + 12*X *Y  - 90*X *

    4        2  2        2         6         4          2
   Y  + 270*X *Y  - 270*X  + 12*X*Y  - 90*X*Y  + 270*X*Y  -

              6       4        2
   270*X + 6*Y  - 45*Y  + 135*Y  - 135)/135

load horner$

ghorner << f1:=f1;
           g1:=taylor(e^x,x,0,4);
           h1:=taylor(cos y,y,0,6);
           f1:=(g1*h1)^2 >> vorder y,x;

                                            2
F1 := ((135 + X*(270 + X*(270 + X*180))) + Y *(( - 135 + X*( - 270 +

                             2
  X*( - 270 + X*(-180)))) + Y *((45 + X*(90 + X*(90 + X

                  2
        *60))) + Y *( - 6 + X*( - 12 + X*( - 12 + X*(-8

        )))))))/135

       6 + X*(6 + X*(3 + X))
G1 := -----------------------
                 6

              2            2        2
       720 + Y *( - 360 + Y *(30 + Y *(-1)))
H1 := ---------------------------------------
                    720


                             2            2         2           2   2
      (6 + X*(6 + X*(3 + X)))  * (-720 + Y *(360 + Y  ( - 30 + Y )))
F1 := ---------------------------------------------------------------
                                  18662400
\end{verbatim}
}

Both commands can be used inside an {\bf optimize}-command. We advise
to compile both facilities and SCOPE separately (see also
section~\ref{SCOPE:install} on page~\pageref{SCOPE:install}).

To be able to order the application of either a {\bf gstructr}-command or
a {\bf ghorner}-rewrite instruction inside the definition of a
SCOPE-operation we have to extend the rules given in
section~\label{SCOPE:2.2}.  The permissible structures for the
$<$object$>$'s to be elaborated by SCOPE are simply extended with
syntactically correct {\bf ghorner}-and {\bf gstructr}-commands.  Hence
the structure of an {\bf optimize}-command is not altered, as is shown by
the following two examples.

\example\label{ex:3.3}
\index{SCOPE package ! example}

We show the effect of an application of the {\bf optimize}-command on the
{\bf gstructr}-command of example~\ref{ex:3.1}.  Observe that the
cse-names produced during optimization begin with an S, while {\bf
gstructr} created names start with a V.
{\small
\begin{verbatim}
on fort,acinfo$
off exp,period$

optimize gstructr << a;
                     b:=(x+y)^2;
                     c:=(x+y)*(y+z);
                     d:=(x+2*y)*(y+z)*(z+x)^2
                  >> name v
  iname s;


A(1,1) := X + Y + Z

A(1,2) := X*Y

V2 := X + Y

A(2,1) := V2*X*Y

            3
A(2,2) := (X + 2*Y + 3)  - X

       2
B := V2

V5 := Y + Z

C := V2*V5

                      2
D := (X + 2*Y)*(X + Z) *V5

Number of operations in the input is:

Number of (+,-)-operations : 9
Number of (*)-operations : 8
Number of integer exponentiations : 3
Number of other operations : 0


      S5=X+Z
      A(1,1)=S5+Y
      S8=Y*X
      A(1,2)=S8
      V2=X+Y
      A(2,1)=S8*V2
      S6=X+2*Y
      S4=S6+3
      A(2,2)=S4*S4*S4-X
      B=V2*V2
      V5=Y+Z
      C=V5*V2
      D=S6*S5*S5*V5


Number of operations after optimization is:

Number of (+,-)-operations : 7
Number of (*)-operations : 10
Number of integer exponentiations : 0
Number of other operations : 0
\end{verbatim}
}

\example\label{ex:3.4}
\index{SCOPE package ! example}

For completeness we also show how to use the Horner facilities inside
an {\bf optimize} command. Due to the structure of the method, we
operate internally on expanded forms, both representations of h1, and
thus also of the corresponding prefix representations used to built
${{\rm D}}_{0}$ slightly differ. The consequences are visualized in
the results of the SCOPE application.
{\small
\begin{verbatim}
load scope$

optimize ghorner <<h1:=taylor(cos y,y,0,6);
       f1:=(taylor(e^x,x,0,4)*h1)^2>> vorder y,x
  iname s;


              2            2        2
       720 + Y *( - 360 + Y *(30 + Y *(-1)))
H1 := ---------------------------------------
                         720

                         2        2         2           2   2
      (6+X*(6 + X*(3+X))) *(-720+Y *(360 + Y *( - 30 + Y )))
F1 := -------------------------------------------------------
                           18662400

Number of operations in the input is: 

Number of (+,-)-operations : 9
Number of (*)-operations : 8
Number of integer exponentiations : 8
Number of other operations : 2


S6 := Y*Y
       720 + S6*(S6*(30 - 1*S6) - 360)
H1 := ---------------------------------
                    720

S7 := (S6*(360 + S6*(S6 - 30)) - 720)*(6 + X*(6 + X*(3 + X)))

         S7*S7
F1 := ----------
       18662400


Number of operations after optimization is:

Number of (+,-)-operations : 9
Number of (*)-operations : 10
Number of integer exponentiations : 0
Number of other operations : 2
\end{verbatim}
}

\section{Generation of Declarations}\label{SCOPE:decl}

The GENTRAN {\bf declare}-statement can also be used as an optional
extension of the {\bf optimize}-command.  An illustration of this facility
is given in example~\ref{ex:4.1}.  The syntax of such a statement is in
accordance with the GENTRAN-rules~\cite{Gates:85}
% (see page~\pageref{explicit:type}.
We also use the symbol table of GENTRAN.
During parsing, the declared identifiers and/or array- and matrix names
are entered in the symbol table.  Once optimization is finished all
relevant information for completing the declarations is known, and
collected in the prefixlist, which is used for output-production.  This
prefixlist is employed to decide which not yet typed identifiers and
system selected cse-names have to be entered in the symbol table.  We make
use of already known information, expression-structure and the normal
hierarchy in data types.  The strategy to achieve this is essentially
based on chapter 6 of~\cite{Aho:86}.  Once this table is completed a list
of declarations is produced if the \index{OPTDECS switch} switch {\tt
OPTDECS} is turned on.  SCOPE-output is by default given in {\REDUCE}
syntax.  Alternative output is obtained by assigning a \ttindex{Optlang*}
relevant value to the global identifier {\tt Optlang!*}.  This causes
GENTRAN to take over the output preparation, as shown in:

\example\label{ex:4.1}
\index{SCOPE package ! example}

{\small
\begin{verbatim}
on optdecs$
off acinfo$
optlang!*:='fortran$

optimize {x(i+1):=a(i+1,i-1)+b(i),y(i-1):=a(i-1,i+1)-b(i)}
  iname s
  declare << a(4,4),x(4),y(5):real; b(5):integer >>;


      INTEGER B(5),I,S1,S3
      DOUBLE PRECISION A(4,4),S4,X(4),Y(5)
      S1=I+1
      S3=I-1
      S4=B(I)
      X(S1)=A(S1,S3)+S4
      Y(S3)=A(S3,S1)-S4

optlang!*:='c$
optimize {x(i+1):=a(i+1,i-1)+b(i),y(i-1):=a(i-1,i+1)-b(i)}
  iname s
  declare << a(4,4),x(4),y(5):real; b(5):integer >>;


LONG B[6],I,S1,S3;
DOUBLE A[5][5],S4,X[5],Y[6];
{
    S1=I+1;
    S3=I-1;
    S4=B[I];
    X[S1]=A[S1][S3]+S4;
    Y[S3]=A[S3][S1]-S4;
}

optlang!*:='pascal$

optimize {x(i+1):=a(i+1,i-1)+b(i),y(i-1):=a(i-1,i+1)-b(i)}
  iname s
  declare << a(4,4),x(4),y(5):real; b(5):integer >>;


VAR
    B[0..5],I,S1,S3: INTEGER;
    A[0..4,0..4],S4,X[0..4],Y[0..5]: REAL;
BEGIN
    S1:=I+1;
    S3:=I-1;
    S4:=B[I];
    X[S1]:=A[S1,S3]+S4;
    Y[S3]:=A[S3,S1]-S4
END;

optlang!*:='ratfor$

optimize {x(i+1):=a(i+1,i-1)+b(i),y(i-1):=a(i-1,i+1)-b(i)}
  iname s
  declare << a(4,4),x(4),y(5):real; b(5):integer >>;


INTEGER B(5),I,S1,S3
DOUBLE PRECISION A(4,4),S4,X(4),Y(5)
{
    S1=I+1
    S3=I-1
    S4=B(I)
    X(S1)=A(S1,S3)+S4
    Y(S3)=A(S3,S1)-S4
}

%%% The following command restores the initial situation. %%%

optlang!*:='nil$
\end{verbatim}
}

\section{File Management and Optimization Strategies}\label{SCOPE:files}

Another alternative for the $<$object$>$'s to be optimized is formed
by the sequence {\bf in} ${{\rm file}}_{1}$, ${{\rm file}}_{2}$, ...,
${{\rm file}}_{n}$, $n\ \ge\ 1$.  Each of these files is assumed to
contain one or a list of more assignment statements, obeying the
GENTRAN-assignment rules.  A SCOPE application results in a unified
sequence of assignment statements in the target language. This is
illustrated by the following example, where each file $f_i$ contains one
assignment statement of the form $e_i$ := some expression.

\example\label{ex:5.1}
\index{SCOPE package ! example}
{\small
\begin{verbatim}

3: optimize in f1,f2,f3 iname s;


                               2
                2       (X + Y)            8      2     2
       2*(SIN(X) - COS(E        )+3*COS(X)) *(X+Y) + 4*Y + 4*Y
E1 := ---------------------------------------------------------
                          3*X + 2*Y

\end{verbatim}
\newpage
\begin{verbatim}
E2 := (4*

                          2
           2       (X + Y)            2      3     2
    (SIN(X) - COS(E        )+2*COS(X)) *(X+Y) +(4*X  - 4*Y)

              2
   - 6*X)/(8*X  + 3*Y - 2*X)

                         2
                  (X + Y)
E3 := (4*SIN(COS(E        )) + SIN(X + Y) +

           2           2
       (4*X  - X + 2*Y) )/(3*Y + F(X,G( - COS(X))))


Number of operations in the input is:

Number of (+,-)-operations : 21
Number of (*)-operations : 20
Number of integer exponentiations : 12
Number of other operations : 16


S3 := SIN(X)
S7 := X + Y
S6 := S7*S7
           S6
S4 := COS(E  )
S8 := COS(X)
S28 := S3*S3 - S4
S2 := S28 + 3*S8
S36 := S2*S2
S35 := S36*S36
S30 := 2*Y
S9 := S30 + 3*X
       2*(2*Y + S30*Y + S6*S35*S35)
E1 := ------------------------------
                   S9
S12 := S28 + 2*S8
S29 := 4*X*X
S27 := S29 - X
S31 := 3*Y
       S29 - 2*S9 + 4*S6*S12*S12*S7
E2 := ------------------------------
               S31 + 2*S27
S18 := S30 + S27
       4*SIN(S4) + SIN(S7) + S18*S18
E3 := -------------------------------
           S31 + F(X,G( - S8))



Number of operations after optimization is:

Number of (+,-)-operations : 15
Number of (*)-operations : 24
Number of integer exponentiations : 0
Number of other operations : 11

\end{verbatim}
}

However a switch is available for stepwise performing the optimization
of a set of assignment statements, distributed over different files.
When turning on this {\tt AGAIN} switch the finishing touch is not
\index{AGAIN switch}
done.  Moreover, the system is instructed to save relevant internal
information in combination with the result of the present optimization
run. The thus extended output is assumed to be stored in a file. When
the optimization task is continued during another session this file is
assumed to be read before all other remaining files.  This mode of
operation is illustrated in

\example\label{ex:5.2}
\index{SCOPE package ! example}

{\small
\begin{verbatim}
2: off acinfo$
3: in again$
4: out f5$
5: optimize in f1,f2 iname s;
6: write "end$"$
7: shut f5$

8: off again$
9: on acinfo$
10: optimize in f5,f3 iname t;

S7 := X + Y
\end{verbatim}
\newpage
\begin{verbatim}
        2
S6 := S7

S8 := COS(X)

             2        S6
S18 := SIN(X)  - COS(E  )

S9 := 3*X + 2*Y

                2                    8
       4*Y + 4*Y  + 2*S6*(S18 + 3*S8)
E1 := ---------------------------------
                   S9

        2
S15 := X

                                       2
       4*S15 - 2*S9 + 4*S6*(S18 + 2*S8) *S7
E2 := --------------------------------------
             8*S15 - 2*X + 3*Y

                         2
                  (X + Y)
E3 := (4*SIN(COS(E        )) + SIN(X + Y) +

           2           2
       (4*X  - X + 2*Y) )/(3*Y + F(X,G( - COS(X))))

Number of operations in the total input, i.e. in the 2 input sets is:

Number of (+,-)-operations : 22
Number of (*)-operations : 20
Number of integer exponentiations : 13
Number of other operations : 17


T17 := X + Y
T16 := T17*T17
S8 := COS(X)
T1 := SIN(X)
           T16
T2 := COS(E   )
S18 := T1*T1 - T2
T28 := 2*Y
S9 := T28 + 3*X
T6 := S18 + 3*S8
T36 := T6*T6
T35 := T36*T36
       2*(2*Y + T28*Y + T35*T35*T16)
E1 := -------------------------------
                   S9
S15 := X*X
T9 := S18 + 2*S8
T30 := 4*S15
T26 := T30 - X
T29 := 3*Y
       T30 - 2*S9 + 4*T17*T9*T9*T16
E2 := ------------------------------
               T29 + 2*T26
T19 := T28 + T26
       4*SIN(T2) + SIN(T17) + T19*T19
E3 := --------------------------------
             T29 + F(X,G( - S8))



Number of operations after optimization is:

Number of (+,-)-operations : 15
Number of (*)-operations : 24
Number of integer exponentiations : 0
Number of other operations : 11
\end{verbatim}
}

Since the construction of declarations in combination with some
optimization activity is based on a quite specific use of GENTRAN's
symbol table, one has to operate carefully when optimizing input in
different sessions. A correct list of declarations is only guaranteed,
when the last optimization-command is extended with the required
declaration-information.

\section{Some Possible Shortcomings and Future Versions}\label{SCOPE:future}

The present version of SCOPE may have some shortcomings and possibly also
some inefficiencies.  However, since we are working on a second version,
as stated in~\cite{vanHulzen:90}, we do not have the intention to largely
modify the present version.  However, we intend to improve one special
aspect of the present SCOPE-version:  The combined use of SCOPE and
GENTRAN.  This preliminary version of the manual will shortly be extended
with the description of these combined features.

Bugs and obvious deficiencies will of course be removed.

\section*{Acknowledgements}

The many discussions I had over the past years with Barbara L. Gates,
Victor V. Goldman, Anthony C. Hearn, Jaap Smit and Paul S. Wang about
the symbolic-numeric aspects of computer algebra have been very
stimulating and valuable. They also contributed to the present status
of SCOPE.

Completion of the code would have been impossible without the dedicated
assistance of my students and the frequent discussions we had.
I certainly want to mention Ben Hulshof, Pim van den Heuvel, Marcel van
Heerwaarden, Anco Smit, Johan de Boer and Jan Verheul.

\section*{How to install the Code}\label{SCOPE:install}
\index{SCOPE package ! installation}
The code consists of a number of modules, collected in five files. Two of
these modules play a special role and can best be compiled separately:
gstructr, defining the {\bf gstructr} facilities, and ghorner, containing
the code for the Horner-rules.

The other modules form SCOPE. Since ${{\rm D}}_{0}$ and all operations
on it and on its later versions ${{\rm D}}_{i}$ are defined using {\bf
smacros's} it is essential to read in the module cosmac, containing
these {\bf smacro's}, first. Since we also use part of the GENTRAN
code care have to be taken that GENTRAN is loaded when compiling the
code.
\bibliography{scope}
\bibliographystyle{plain}
\end{document}