Artifact 95654e9f28738fe5ab0545f9a732d38a739032aa5895d313794d27a0d6cddb0d:
- File
r34.1/doc/scope.tex
— part of check-in
[f2fda60abd]
at
2011-09-02 18:13:33
on branch master
— Some historical releases purely for archival purposes
git-svn-id: https://svn.code.sf.net/p/reduce-algebra/code/trunk/historical@1375 2bfe0521-f11c-4a00-b80e-6202646ff360 (user: arthurcnorman@users.sourceforge.net, size: 56591) [annotate] [blame] [check-ins using] [more...]
\documentstyle[11pt,reduce]{article} \title{SCOPE, a Source-Code Optimization PackagE for REDUCE} \date{} \author{J.A. van Hulzen \\ Twente University, The Netherlands \\ Email: infhvh@cs.utwente.nl} \begin{document} \maketitle \index{SCOPE package !} A survey of the strategy behind and the facilities of SCOPE, a Source-Code Optimization PackagE for {\REDUCE} is given. We avoid a detailed discussion of the different algorithms and concentrate on the user aspects of the package. Examples of straightforward and more advanced usage are shown. A combined use of GENTRAN and SCOPE is not yet discussed in this preliminary version of the SCOPE manual. \index{GENTRAN ! with SCOPE package} \section{Introduction}\label{SCOPE:intro} An important application of computer algebra systems is the generation of code for numerical purposes via automatic or semiautomatic program generation. GENTRAN~\cite{Gates:85,Gates:84} a flexible general-purpose package, was especially developed to assist in such a task, when using MACSYMA or {\REDUCE}. \index{optimization} Attendant to automatic program generation is the problem of automatic source-code optimization. This is a crucial aspect because code generated from symbolic computations often tends to be made up of lengthy arithmetic expressions. One of our test examples contained, for instance, 20534 additions and subtractions, 4174 multiplications, 12473 integer exponentiations and 7990 other operations, such as function calls. These lengthy expressions will be grouped together in blocks of straight-line code in a program for numerical purposes. The main objective of source-code optimization is to minimize the number of (elementary) arithmetic operations in such blocks. This form of optimization is often helpful in reducing redundancy in expressions. Simplification algorithms applied on expressions viewed as entities, neither guarantee complete structure preservation nor allow improvements inside expressions by renaming subexpressions. \index{optimizing compilers} Optimizing compilers ought to deal effectively and efficiently with the average, hand coded program. The enormous, arithmetic intensive expressions, easily producable by a computer algebra system, fall outside the range of the FORTRAN programs, once analyzed and discussed by Knuth~\cite{Knuth:71}. He suggested that optimization of the arithmetic in such a program is slightly overdone. This may explain why even in reasonably recent literature, such as~\cite{Aho:86}, optimization of arithmetic code is hardly discussed. The DAG models, usually employed for optimization of arithmetic code, hardly allow the application of any algebraic identity (see for instance~\cite{Gonzales}). These models often force constants to act as if they were indeterminates and powers as objects requiring function calls, i.e. they force to think of $2a\ +\ 3b$ and $4 a \ +\ 6b$ or of $a^2$, $a^{4}$ and $a^{6}$ as being different entities. Our optimization strategy however, requires the validity of some elementary algebraic laws. We employ heuristic techniques to reduce the arithmetic complexity of the given representation of a set of input expressions $ {\rm E}_in$, thus producing a set of output expressions ${\rm E}_out$. The optimized version of the earlier mentioned test example contains ``only'' 4316 additions and subtractions, 4919 multiplications, 13 integer exponentiations and 60 other operations. ${\rm E}_{in}$ and ${\rm E}_{out}$ define blocks of code, which would compute the same exact values for the same exact inputs, thus implicitly proving the correctness of the underlying software. Obviously the use of ${\rm E}_{out}$ saves a considerable amount of execution time in comparison with the application of ${\rm E}_{in}$. Johnson et al~\cite{Johnson:79} suggest that such transformations do not destabilize the computations. However this is only apparent after a careful error analysis of both ${\rm E}_{in}$ and ${\rm E}_{out}$. In view of the size of both ${\rm E}_{in}$ and ${\rm E}_{out}$ such an analysis has to be automatized as well. Work in this direction is in progress. The current version of SCOPE, our Source-Code Optimization PackagE, is written in RLISP. It can be used as an extension of {\REDUCE}. It allows to subject almost any set of proper {\REDUCE} assignment \index{common subexpressions (cse)} \index{cse (common subexpressions)} statements to a heuristic search for common (sub)expressions (cse's). The output is obtained as a sequence of assignment statements, by default given in {\REDUCE} syntax. The first version of the package was designed to optimize the description of {\REDUCE}-statements, generated by NETFORM~\cite{Smit:81,Smit:82}. This version was tailored to a restrictive class of problems, occurring mainly in electrical network theory, thus implying that the right-hand sides (rhs's) in the input were limited to elements of ${{\rm {\bf }} Z}_2$[V], where V is a set of identifiers. The second version~\cite{vanHulzen:83} allowed rhs's from {\bf Z}[V]. For both versions the validity of the commutative and the associative law was assumed. A third version evolved from the latter package by allowing to apply the distributive law, i.e. by replacing (sub)expressions like $a.b\ +\ a.c$ by $a.(b\ +\ c)$ whenever possible. But the range of possible applications of this version was really enlarged by redefining V as a set of kernels, implying that, at least by that time, almost any proper {\REDUCE} expression could function as a rhs. The mathematical capabilities of this version are shortly summarized in~\cite{Wang:84}, in the context of code generation and optimization for finite-element analysis. It is used \index{GENTRAN ! with SCOPE package} in combination with GENTRAN, for the construction of Jacobians and Hessians~\cite{Heuvel:89} and also in experiments with strategies for code vectorization~\cite{Goldman:89}. It still assumes constant coefficients to be elements of {\bf Z}. The user-interface of the present version relies on some GENTRAN facilities. In~\cite{vanHulzen:81,vanHulzen:83} we described the overall optimization strategy used for SCOPE as a composite function ${{\rm R}^{-1}}\ \circ\ {{\rm T}}\ \circ\ {{\rm R}}$. The function R defines how to store the input ${{\rm E}}_{0}$ in an expression data base ${{\rm D}}_{0}$. This ${{\rm D}}_{0}$ is formed by two matrix structures and a function table. The incidence matrices represent ${{\rm E}}_{0}$, a set of arithmetic expressions, in a two-dimensional structure where the rows represent expression or subexpression references and the columns represent identifier references such as variable and function names. The function names are taken from the function table, consisting of a list of pairs of function applications occurring in ${{\rm E}}_0$, and system selected names functioning as their placeholders during the optimization process. Arguments of functions are similarly entered in the matrix structures when ever relevant. A given subexpression will be entered in one of two types of incidence matrices, one for sums and one for products, depending on the nature of the arithmetic operation at the top level of the expression. The two matrices are correlated by auxiliary predecessor-successor information at the row level for every subexpression reference. The actual entries in the matrices are either multiplicative numerical coefficients for the sums matrix or powers for the products matrix. The inverse function ${{\rm R}}^{{-1}}$ defines the output production. The function T defines the optimization process itself. It essentially consists of a heuristic remodeling of the (extendable) matrices in combination with storing information required for a fast retrieval and correct insertion of the detected cse's in the output. This is accomplished by an iteratively applied search, resulting in a stepwise reduction of the arithmetic complexity of the input set, using an extended version of Breuer's \index{Breuer's Algorithm} grow factor algorithm~\cite{Breuer:69,vanHulzen:81,vanHulzen:83}. It is applied until no further profit is gained, i.e. until the reduction in arithmetic complexity stops. Before producing output, a finishing touch can be performed to further reduce the arithmetic complexity with some locally applied techniques. The overall process can be summarized as follows: $$ {{\rm R}}\ :\ {{{\rm E}}_0}\ \to\ ({{{\rm D}}_0},{{{\rm profit}}_0}) $$ $$ {{{\rm T}}_{\beta}}\ :\ ({{{\rm D}}_i},{{{\rm profit}}_i})\ \to\ ({{{\rm D}}_{{i+1}}},{{{\rm profit}}_{{i+1}}})\ ,\ {{\rm i}}\ =\ 0,..., \lambda - 1. $$ $$ {{\rm F}}\ :\ ({{{\rm D}}_{\lambda}},{{{\rm profit}}_{\lambda}})\ \to\ {D_{\lambda}} $$ $$ {{{\rm R}}^{{-1}}}\ :\ {D_{\lambda}}\ \to\ {{{\rm E}}_{\lambda}} $$ ${{\rm D}}_{0}$ is created as a result of an R-application performed on input ${{\rm E}}_{0}$. The termination condition depends on some profit criterion related to the arithmetic complexity of the latest version of the input, ${{{\rm D}}_i}$. Hence we assume ${{{\rm profit}}_i}\ =\ true$ for $i\ =\ 0,..., \lambda -1$ and ${{{\rm profit}}_\lambda}\ =\ false$. The function T is composite as well, and defined by ${{\rm T}}\ =\ {{\rm F}}\ \circ\ {{{{\rm T}}_{\beta}}^{\lambda}}$. ${{\rm T}}_{\beta}$ defines one iteration step, i.e. one application of the extended version of Breuer's algorithm. The function F defines a finishing touch, resulting in the final version $D_{\lambda}$ of ${{\rm D}}_{0}$, used to produce the output ${{\rm E}}_{\lambda}$. We omit a further discussion of the different algorithms used for optimization; this can be found in~\cite{vanHulzen:81,vanHulzen:83}, for instance. The present version makes use of some GENTRAN facilities to translate its input into LISP prefix forms. This approach can be seen as a form of preprocessing, i.e. ${{\rm E}}_{0}$, the input for R can be considered as a list of {\bf setq}-applications The GENTRAN-SCOPE Interface, allows other preprocessing activities. We introduced the optional use of GENTRAN's {\bf declare}-statement, \index{DECLARE statement ! GENTRAN} thus allowing specification of the type of some or all of the lhs's and of the identifiers used to construct the rhs's. In addition to the prefixlist, a list of declarations in the Target Language can be produced, based on default assumptions concerning untyped lhs's and identifiers in the input. This facility is based on the use of GENTRAN's symbol table. Before optimizing rhs's it might be attractive to rewrite them using a \index{Horner's Rule} generalized form of Horner's Rule. We designed such a command, which does not necessarily have to be used in the context of SCOPE. It can operate on a set of assignment statements and it can deliver the result in the form of a sequence of prefix forms, defining the rewritten statements. Subjecting such a sequence of prefix forms to a SCOPE application implies that the GENTRAN approach is not directly applicable. The GENTRAN := and :=: assignment operators define literal translation or rhs-simplification, respectively. Therefore we extended our Interface with special facilities, allowing SCOPE to accept the result of the application of such a command literally. Besides the {\bf g}(eneralized) {\bf horner} (rule) we have a command, generalizing the impact of the {\bf structr}-command to a set of assignment statements. We discuss and illustrate a straightforward use of SCOPE in section~\ref{SCOPE:basic} In section~\ref{SCOPE:pre} we introduce the special commands {\bf ghorner} and {\bf gstructr} and show how to use them, also in combination with SCOPE. We use section~\ref{SCOPE:decl} to discuss the declaration facilities and section~\ref{SCOPE:files} to show the different file-handling possibilities and modes of operation. Section!\ref{SCOPE:future} discusses future work. Guidelines for installing the package are given in the final section. \section{Source-Code Optimization : The Basic Facilities}\label{SCOPE:basic} \subsection{The Strategy} Before illustrating the effect of applying SCOPE, we shortly describe the operations, covered by the functions ${{\rm T}}_{\beta}$ and F, mentioned in the previous section. The function R accepts assignment statements given in prefix form. We can divide these forms in three categories using their leading operator. We distinguish between PLUS, TIMES and OTHER-operators. Leaving aside the OTHER-operators for awhile, we reduce the structure of possible rhs's to those of not necessarily expanded multivariate polynomials with integer coefficients. Assuming the leading operator \ttindex{PLUS} is PLUS, the operands, being terms of a polinomial (for instance $3a\ +\ 2b\ +\ 3 {b^2} c (3a\ +\ 2b){(c\ +\ d)^2}$), can either be primitive or composite. A primitive term is an integer, an identifier or the product of an integer and an identifier. Hence the primitive terms of a sum form an (eventually empty) linear expression ($3a\ +\ 2b$). Composite terms are products, which cannot be qualified as a primitive term ($3 {b^2} c (3a\ +\ 2b) {(c\ +\ d)}^{2}$) Like sums, \ttindex{TIMES} prefix forms with a TIMES-operator, can have a primitive and/or composite part. The primitive part of a product is an (eventually empty) power product(${b^2} c$). The composite part is a product of sums and/or powers of sums ($(3a\ +\ 2b) {(c\ +\ d)^2}$). Observe that our expression-structure discussed so far is still too simple. \ttindex{EXPT} Powers of sums have EXPT as their leading operator (${(c\ +\ d)}^{2}$). Similarly, a product can have a integer coefficient ($3 {b^2} c$). This description suggests, as already indicated in section~\ref{SCOPE:intro}, that we can consider any set of rhs's as being built with linear expressions and power products only. This allows to map such a set onto two incidence matrices: One defining the linear expressions, using the coefficients, and another defining the power products, using the exponents. The rows of these matrices can be associated with the (sub)expressions under consideration and the columns with the identifiers, used to construct these expressions. This is why we need to assume the validity of the commutative and associative law. To be able to retrieve the structure of the assignment statements forming the input set, we need to combine additional information with the rows and columns of these matrices. Essential is, for instance, storage of the exponents of sums and of the coefficients of products. Equally important is storage of the lhs's, which are the rhs-recognizers. Details are given in~\cite{vanHulzen:83}. Example~\ref{ex:2.2.1} on page~\pageref{ex:2.2.1} and example~\ref{ex:2.2.2} on page~\pageref{ex:2.2.2} provide illustrations of these data structures. When introducing kernels, i.e. when assuming the set of OTHER-operators to be not empty, we have to store lists of non-commutable arguments. Therefore a function table of pairs is made, formed by the kernels and their internally created names. These names are entered in the matrices as new identifiers. The arguments of such a kernel can be arbitrary {\REDUCE}-expressions, which also have to be incorporated in the matrices. Hence the function table is created recursively. \index{cse (common subexpressions)} What is a cse and how do we locate its occurrences? A (sub)expression is common when it occurs repeatedly in the input. The occurrences are, as part of the input, distributed over the matrices, and shown as equivalent integer (sub)patterns. In fact, we repeatedly search for completely dense (sub)matrices of rank 1. The expression $2a\ +\ 3c$ is a cse of ${e_1} \ =\ 2a\ +\ 4b\ +\ 3c$ and ${e_2}\ =\ 4a\ +\ 6c\ +\ 5d$, representable by (2,4,3,0) and (4,0,6,5), respectively. We indeed have to assume commutativity, so as to be able to produce new patterns (2,0,3,0,0), (0,4,0,0,1) and (0,0,0,5,2), representing $s\ =\ 2a\ +\ 3c$, ${e_1}\ =\ 4b\ +\ s$ and ${e_2}\ =\ 5d\ +\ 2s$, respectivily, and thus saving one addition and one multiplication. Such an additive cse can be a factor in a composite (sub)product, which in turn can be reduced to a primitive product, when the cse is replaced by a new symbol. Therefore an essential part of an optimization step is regrouping of information. This migration of information between the matrices is performed if the Breuer-searches are temporarily completed. After this regrouping the distributive law is applied, eventually also leading to a further regrouping. If at least one of these actions leads to a rearrangement of information the function ${\rm T} _{\beta}$ is again applied. Observe that this ${{\rm T}}_{\beta}$ is also a composite function. In view of the iterative character of the optimization process we always accept minimal profits. A similar search is performed to detect multiplicative cse's, for instance occuring in ${e_1}\ =\ {a^2} {b^4} {c^3}$ and ${e_2}\ =\ {a^4} {c^6} {d^5}$. However, given a power product $\prod_{i=1}^m {x_i}^{{a}_i}$, any product $\prod_{i=1}^m {x_i}^{{b}_i}$, such that some ${b_i}\ \ {a_i}$, for i = 1(1)m, can function as a cse. We therefore extend the search for multiplicative cse's by employing this property, and as indicated in~\cite{vanHulzen:83}. The function F -defining the finishing touch- performs one-row and/or one-column searches. Once the extended Breuer-searches do not lead to further reduction in the arithmetic complexity we try -applying it- to improve what is left. The integer coefficients in (sub)sums can have, possibly locally, a gcd, which can be factored out. One-column operations serve to discover and properly replace integer multiples of identifiers. As part of the output-process we subject all exponentiations left - at most one for each identifier - to an addition chain algorithm. Another action, covered by F is therefore replacement by a new symbol of those (sub)sums, which are raised to some integer power. \subsection{The Facilities} {\REDUCE} allows, roughly speaking, two modes of operation: {\tt ON EXP} or {\tt OFF EXP}. The first alternative is the default setting leading to expanded forms. The latter gives unexpanded forms, as discussed by Hearn in some detail~\cite{Hearn:85,Hearn:86}. It is obvious that the {\tt OFF EXP} setting is in \index{EXP switch} general preferable over the {\tt ON EXP} setting when attempting to optimize the description of a set of assignment statements. Starting a {\REDUCE} session gives the initial state. All switches have their initial value: {\tt ON EXP, PERIOD} and {\tt OFF FORT}, for instance. When loading SCOPE we create a new operating environment, without disturbing the current state of the system. The result of an application of SCOPE can be influenced by the use of certain {\REDUCE}- or SCOPE-switches. The influence of {\tt EXP} is obvious. \index{ACINFO switch} \index{echo ! in SCOPE} By default the switch {\tt ACINFO} is turned on. This guarantees an echo of the form in which the assignment statements are consumed by SCOPE. It also guarantees tables with the numbers of arithmetic operations, occuring in ${{{\rm E}}_0}$ and ${{\rm E}}_{\lambda}$, respectively, to be printed. Some switches are available to obtain information about the process itself. They were introduced to assist in debugging \index{tracing ! SCOPE package} \index{PRIMAT switch} and testing. {\tt PRIMAT} can be used to visualize both ${{\rm D}}_{0}$ and \index{PRIALL switch} $D_{\lambda}$. {\tt PRIALL} is a switch which combines not only the effect of {\tt ACINFO} and {\tt PRIMAT}, but also allows to obtain timings of the different sub-algorithms of SCOPE. Output is by default given in {\REDUCE} syntax, but FORTRAN syntax is \index{PREFIX switch} possible in the usual way. The switch {\tt PREFIX} can be used to obtain the prefixlist itself as output. \index{OPTIMIZE command} A SCOPE action is easily performed. Either the command ``{\bf optimize} $<$ object$>$;'' or the command ``{\bf optimize} $<$object$>$ {\bf iname} $<$cse-prefix$>$;'' suffices. The $<$object$>$ to be elaborated is either one assignment statement or a list of such statements, all obeying the GENTRAN rules. The $<$cse-prefix$>$ is an identifier, used to generate the cse-names, by extending it with an integer part. The {\bf gensym}-function is applied when the {\bf iname}-extension is omitted. We now illustrate the use of SCOPE through some small examples, by showing parts of {\REDUCE} sessions. \example\label{ex:2.2.1} \index{SCOPE package ! example} The multivariate polynomial Z is a sum of 6 composite terms. These terms, monomials, are constant multiples of primitive products. A picture of ${{\rm D}}_{0}$ is shown after the input echo. The sums-matrix consists of only one row, identifiable by its Fa(the)r Z, the lhs. Its exponent is given in the E(xponent or )C(oefficient) field. The 6 monomials are stored in the products-matrix. The coefficients are stored in the EC-fields and the predecessor row index, 0, is given in the Far-field. Before the $D_{\lambda}$ picture is given the effect of the optimization process, the output and the operator counts are shown. The optimized form of Z is obtained by applying the distributive law. The output also shows applications of an addition chain algorithm (\cite{Knuth:80} 441-466) as part of ${{\rm R}}^{{-1}}$, although its use in example~\ref{ex:2.2.3} is more apparent. Observe that the output illustrates the heuristic character of the optimization process: In this particular case the rhs can be written as a polynomial in S3, thus saving one extra multiplication. {\small \begin{verbatim} on primat$ optimize z:=a^2*b^2+10*a^2*m^6+a^2*m^2+2*a*b*m^4+ 2*b^2*m^6+b^2*m^2 iname s; 2 2 2 6 2 2 4 2 6 2 2 Z := A *B + 10*A *M + A *M + 2*A*B*M + 2*B *M + B *M Sumscheme : || EC|Far - ------------ 0|| 1| Z - ------------ Productscheme : | 0 1 2| EC|Far - --------------------- 1| 2 2| 1| 0 2| 6 2| 10| 0 3| 2 2| 1| 0 4| 4 1 1| 2| 0 5| 6 2 | 2| 0 6| 2 2 | 1| 0 - --------------------- 0 : M 1 : B 2 : A Number of operations in the input is: Number of (+,-)-operations : 5 Number of (*)-operations : 10 Number of integer exponentiations : 11 Number of other operations : 0 S0 := B*A S4 := M*M S8 := B*B S1 := S4*S8 S9 := A*A S2 := S4*S9 S3 := S4*S4 Z := S1 + S2 + S0*(2*S3 + S0) + S3*(2*S1 + 10*S2) Number of operations after optimization is: Number of (+,-)-operations : 5 Number of (*)-operations : 12 Number of integer exponentiations : 0 Number of other operations : 0 Sumscheme : | 0 3 4 5| EC|Far - ------------------------ 0| 1 1| 1| Z 15| 2 10| 1| 14 17| 2 1 | 1| 16 - ------------------------ 0 : S3 3 : S0 4 : S1 5 : S2 Productscheme : | 8 9 10 11 17 18 19 20| EC|Far - ------------------------------------ 7| 1 1| 1| S0 8| 1 2 | 1| S1 9| 1 2| 1| S2 10| 2 | 1| S3 11| 2 | 1| S4 14| 1 | 1| 0 16| 1 | 1| 0 - ------------------------------------ 8 : S4 9 : S3 10 : S2 11 : S1 17 : S0 18 : M 19 : B 20 : A \end{verbatim}} \example\label{ex:2.2.2} \index{SCOPE package ! example} The input echo below shows the literal copy of the first assignment, in accordance with the GENTRAN := operator. The second assignment, again in accordance with the GENTRAN operator ::=:, has a rhs in expanded form. \newline The ${{\rm D}}_{0}$ picture shows that during parsing string matching of kernels in prefix form already contributes to optimization : S2 = C*X + D and S3 =SIN(S2) are stored once. Application of the distributive law gives the original structure of A(1,1) back. {\small \begin{verbatim} on primat$ operator a$ k:=j:=1$ u:=c*x+d$ v:=sin(u)$ optimize {a(k,j) := v*(v^2*cos(u)^2+u), a(k,j) ::=:v*(v^2*cos(u)^2+u)} iname s; 2 2 A(K,J) := V*(V *COS(U) + U) 2 3 A(1,1) := COS(C*X + D) *SIN(C*X + D) + SIN(C*X + D)*C*X + SIN(C*X + D)*D \end{verbatim} \newpage \begin{verbatim} Sumscheme : | 7 8| EC|Far - ------------------ 1| 1 | 1| 0 3| | 1| A(1,1) 5| 1| 1| S2 - ------------------ 7 : U 8 : D Productscheme : | 0 1 2 3 4 5 6| EC|Far - --------------------------------- 0| 1| 1| A(K,J) 2| 2 2| 1| 1 4| 3 2 | 1| 3 6| 1 1 | 1| 5 7| 1 1 1 | 1| 3 8| 1 1 | 1| 3 - --------------------------------- 0 : D 1 : S3=SIN(S2) 2 : S1=COS(S2) 3 : X 4 : C 5 : S0=COS(U) 6 : V Number of operations in the input is: Number of (+,-)-operations : 7 Number of (*)-operations : 10 Number of integer exponentiations : 4 Number of other operations : 5 S6 := COS(U)*V S9 := S6*S6 A(K,J) := V*(U + S9) S2 := D + X*C S3 := SIN(S2) S7 := S3*COS(S2) S8 := S7*S7 A(1,1) := S3*(S2 + S8) Number of operations after optimization is: Number of (+,-)-operations : 3 Number of (*)-operations : 7 Number of integer exponentiations : 0 Number of other operations : 3 Sumscheme : | 2 12 13| EC|Far - --------------------- 1| 1 | 1| 0 3| | 1| A(1,1) 5| 1| 1| S2 11| 1 | 1| 10 - --------------------- 2 : S2 12 : U 13 : D Productscheme : | 0 1 5 6 7 8 9 10 11| EC|Far - --------------------------------------- 0| 1| 1| A(K,J) 2| 2 | 1| 1 4| 2 | 1| 11 9| 1 1 | 1| 5 10| 1 | 1| 3 13| 1 1| 1| S6 14| 1 1 | 1| S7 - --------------------------------------- 0 : S7 1 : S6 5 : D 6 : S3=SIN(S2) 7 : COS(S2) 8 : X 9 : C 10 : COS(U) 11 : V \end{verbatim}} \example\label{ex:2.2.3} \index{SCOPE package ! example} The effect is shown of a finishing touch application, in combination with FORTRAN output. During output preparation {\tt S0} is rewritten, using the earlier mentioned addition chain algorithm. {\small \begin{verbatim} on fort$ off acinfo,period$ optimize z:=96*a+18*b+9*c+3*d+6*e+18*f+6*g+5*h+5*k+3)^13 iname s; S0=5*(H+K)+3*(3*C+D+1+6*(B+F)+2*(A+E+G)) S4=S0*S0 S3=S0*S4 S2=S3*S3 S1=S2*S2 Z=S0*S1 \end{verbatim}} \example\label{ex:2.2.4} \index{SCOPE package ! example} Recovery of repeatedly occurring integer multiples of identifiers, as part of the finishing touch, is illustrated. The switch {\tt ACINFO} is turned off. \begin{verbatim} optimize {x:=3*a*p, y:=3*a*q, z:=6*a*r+2*b*p, u:=6*a*d+2*b*q, v:=9*a*c+4*b*d, w:=4*b} iname s; S1 := 3*A X := S1*P Y := S1*Q S2 := 6*A S3 := 2*B Z := S3*P + S2*R U := S3*Q + S2*D S0 := 4*B V := S0*D + 9*A*C W := S0 \end{verbatim} \example\label{ex:2.2.5} \index{SCOPE package ! example} The effect of {\tt ON EXP} or {\tt OFF EXP} on the result of a SCOPE-application is now shown by optimizing the representation of the determinant of a symmetric (3,3) matrix M. Besides differences in computing time we also observe that the arithmetic complexity of the optimized version of the expanded representation of the determinant is about the same as the not optimized form of the unexpanded representation. {\small \begin{verbatim} matrix M(3,3)$ m(1,1):=18*cos(q3)*cos(q2)*m30*p^2-sin(q3)^2*j30y+sin(q3)^2*j30z- 9*sin(q3)^2*m30*p^2+j1oy+j30y+m10*p^2+18*m30*p^2$ m(2,1):= m(1,2):=9*cos(q3)*cos(q2)*m30*p^2-sin(q3)^2*j30y+sin(q3)^2*j30z- 9*sin(q3)^2*m30*p^2+j30y+9*m30*p^2$ m(3,1):= m(1,3):=-9*sin(q3)*sin(q2)*m30*p^2$ m(2,2):=-sin(q3)^2*j30y+sin(q3)^2*j30z-9*sin(q3)^2*m30*p^2+j30y+ 9*m30*p^2$ m(3,2):= m(2,3):=0$ m(3,3):=9*m30*p^2+j30x$ optimize detm:=:det(M) iname s; 4 2 6 3 4 2 4 2 DETM := 729*SIN(Q3) *SIN(Q2) *P *M30 + 81*SIN(Q3) *SIN(Q2) *P *M30 * 4 2 4 2 2 2 6 J30Y - 81*SIN(Q3) *SIN(Q2) *P *M30 *J30Z - 729*SIN(Q3) *SIN(Q2) *P * 3 2 2 4 2 2 6 3 M30 - 81*SIN(Q3) *SIN(Q2) *P *M30 *J30Y - 729*SIN(Q3) *P *M30 - 81* 2 6 2 2 4 2 2 4 2 SIN(Q3) *P *M30 *M10 - 81*SIN(Q3) *P *M30 *J30Y + 81*SIN(Q3) *P *M30 2 4 2 2 4 2 *J30Z - 81*SIN(Q3) *P *M30 *J1OY - 81*SIN(Q3) *P *M30 *J30X - 9* 2 4 2 4 2 4 SIN(Q3) *P *M30*J30Y*M10 + 9*SIN(Q3) *P *M30*J30Z*M10 - 9*SIN(Q3) *P 2 2 2 2 *M30*M10*J30X - 9*SIN(Q3) *P *M30*J30Y*J1OY - 9*SIN(Q3) *P *M30*J30Y* 2 2 2 J30X + 9*SIN(Q3) *P *M30*J30Z*J1OY + 9*SIN(Q3) *P *M30*J30Z*J30X - 9* 2 2 2 2 2 2 SIN(Q3) *P *M30*J1OY*J30X - SIN(Q3) *P *J30Y*M10*J30X + SIN(Q3) *P * 2 2 J30Z*M10*J30X - SIN(Q3) *J30Y*J1OY*J30X + SIN(Q3) *J30Z*J1OY*J30X - 2 2 6 3 2 2 4 2 729*COS(Q3) *COS(Q2) *P *M30 - 81*COS(Q3) *COS(Q2) *P *M30 *J30X + 6 3 6 2 4 2 4 2 729*P *M30 + 81*P *M30 *M10 + 81*P *M30 *J30Y + 81*P *M30 *J1OY + 81 4 2 4 4 2 *P *M30 *J30X + 9*P *M30*J30Y*M10 + 9*P *M30*M10*J30X + 9*P *M30*J30Y 2 2 2 *J1OY + 9*P *M30*J30Y*J30X + 9*P *M30*J1OY*J30X + P *J30Y*M10*J30X + J30Y*J1OY*J30X Number of operations in the input is: Number of (+,-)-operations : 36 Number of (*)-operations : 148 Number of integer exponentiations : 84 Number of other operations : 32 S0 := SIN(Q3) S30 := S0*S0 S1 := SIN(Q2) S34 := S1*S1 S35 := P*P S7 := S35*M30 S33 := S7*S7 S5 := S33*J30Y S6 := S30*S7 S8 := S30*M10 S49 := COS(Q2)*COS(Q3) S9 := S49*S49 S11 := S34*S30*S30 S22 := S35*S7 S14 := S30*J30Z S19 := S35*J30X S23 := J30X*J10Y S31 := S33*S7 S47 := 81*S33*J30X S39 := - S47 - S23*J30Y - 81*S33*J1OY S40 := - 81*S30*S5 - 729*S33*S6 S45 := 9*S6*J30Z S46 := 9*S6 S48 := 81*S5 DETM := S48 + S40 - S39 + 729*S31 + ( - J1OY - J30X)*(9*(S6*J30Y - S7 *J30Y) - S45) + (J30Z - J30Y)*(9*S22*S8 + S19*S8) + 9*(M10 - S8 )*(S22*J30X + 9*S22*S7) + M10*J30Y*(9*S22 + S19) + S23*(S14 + 9*S7 - S46) + S39*S30 + S31*(729*(S11 - S9)) + S34*(S40 - S46*S45) - S47*S9 + 81*S33*S14 + S48*S11 Number of operations after optimization is: Number of (+,-)-operations : 29 Number of (*)-operations : 58 Number of integer exponentiations : 0 Number of other operations : 4 off exp$ optimize detm:=:det(M) iname s; 2 2 2 DETM := ((9*P *M30 + J30Y - J30Z)*SIN(Q3) - (18*M30 + M10)*P - 18 2 2 *COS(Q3)*COS(Q2)*P *M30 - J30Y - J1OY)*((9*P *M30 + J30Y - 2 2 2 J30Z)*SIN(Q3) - 9*P *M30 - J30Y)*(9*P *M30 + J30X) - 2 2 2 ((9*P *M30 + J30Y - J30Z)*SIN(Q3) - 9*COS(Q3)*COS(Q2)*P *M30 - 2 2 2 2 9*P * M30 - J30Y) *(9*P *M30 +J30X) + 81*((9*P *M30+J30Y - J30Z)* 2 2 2 2 4 2 SIN(Q3) - 9*P *M30 - J30Y)*SIN(Q3) *SIN(Q2) *P *M30 Number of operations in the input is: Number of (+,-)-operations : 24 Number of (*)-operations : 42 Number of integer exponentiations : 21 Number of other operations : 10 S0 := SIN(Q3) S9 := S0*S0 S8 := P*P S5 := S8*M30 S6 := S5*COS(Q2)*COS(Q3) S15 := 9*S5 S13 := (S15 + J30Y - J30Z)*S9 S14 := S13 - S15 - J30Y S3 := S14 - 9*S6 S4 := SIN(Q2) DETM := (S15 + J30X)*(S14*(S13 - 18*S6 - J30Y - J1OY - S8*(18*M30 + M10)) - S3*S3) + 9*S15*S14*S9*S5*S4*S4 Number of operations after optimization is: Number of (+,-)-operations : 13 Number of (*)-operations : 20 Number of integer exponentiations : 0 Number of other operations : 4 \end{verbatim}} We can also use this example to show that correctness of the results \index{NAT switch} can easily be verified. When turning off the switch {\tt NAT} and storing the result of a SCOPE application in a file, it is of course possible to read the result in again. But we then operate in a normal {\REDUCE}-like way. This implies that all cse-names are automatically replaced by their values. We show the ``correctness'' of SCOPE by scoring the optimized version of the expanded form of the determinant of M, called detm1 in file out1 and the result of a SCOPE-application on the unexpanded form, detm2, in file out2, followed by reading both files and by subtracting detm2 from detm1, resulting in the value 0. This is of course an ad hoc correctness-proof for one specific example. It is in fact another way of testing the code of the package. So, assuming SCOPE is loaded and the matrix M is known to the system, all we have to do is: {\small \begin{verbatim} 2: off acinfo,nat$ 3: out out1$ 4: optimize detm1:=:det(M) iname s; 5: write "end$"$ 6: shut "out1"$ 7: off exp$ 8: out out2$ 9: optimize detm2:=:det(M) iname t; 10: write "end$"$ 11: shut out2$ 12: on nat$ 13: in out1; S0 := SIN(Q3)$ S30 := S0*S0$ S1 := SIN(Q2)$ S34 := S1*S1$ S35 := P*P$ S7 := S35*M30$ S33 := S7*S7$ S5 := S33*J30Y$ S6 := S30*S7$ S8 := S30*M10$ S49 := COS(Q2)*COS(Q3)$ S9 := S49*S49$ S11 := S34*S30*S30$ S22 := S35*S7$ S14 := S30*J30Z$ S19 := S35*J30X$ S23 := J30X*J1OY$ S31 := S33*S7$ S47 := 81*S33*J30X$ S39 := - S47 - S23*J30Y - 81*S33*J1OY$ S40 := - 81*S30*S5 - 729*S33*S6$ S45 := 9*S6*J30Z$ S46 := 9*S6$ S48 := 81*S5$ DETM1 := S48 + S40 - S39 + 729*S31 + ( - J1OY - J30X)*(9*(S6*J30Y - S7*J30Y) - S45) + (J30Z - J30Y)*(9*S22*S8 + S19*S8) + 9*(M10 - S8)*(S22*J30X + 9 *S22*S7) + M10*J30Y*(9*S22 + S19) + S23*(S14 + 9*S7 - S46) + S39*S30 + S31*(729*(S11 - S9)) + S34*(S40 - S46*S45) - S47*S9 + 81*S33*S14 + S48*S11$ end$ 14: in out2; T0 := SIN(Q3)$ T9 := T0*T0$ T8 := P*P$ T5 := T8*M30$ T6 := T5*COS(Q2)*COS(Q3)$ T15 := 9*T5$ T13 := (T15 + J30Y - J30Z)*T9$ T14 := T13 - T15 - J30Y$ T3 := T14 - 9*T6$ T4 := SIN(Q2)$ DETM2 := (T15 + J30X)*(T14*(T13 - 18*T6 - J30Y - J1OY - T8*(18*M30 + M10)) - T3*T3) + 9*T15*T14*T9*T5*T4*T4$ end$ 15: detm1-detm2; 0 \end{verbatim} } \example\label{ex:2.2.6} \index{SCOPE package ! example} This example serves to show how SCOPE deals with rational exponents. All rational exponents of a variable are collected. The least common multiple lcm of the denominators of these rational exponents is computed and the variable is replaced by a possibly newly selected variable name, denoting the variable raised to the power 1/lcm. This facility is only efficient for what we believe to be problems occurring in computational practice. This is easily verified by extending the sum we are elaborating here with some extra terms. Producing FORTRAN-output shows an implied danger, due to a shortcoming in GENTRAN. This rational exponent will in practice act as if it were 0. This example is also used to show the effect of turning on the switch {\tt PRIALL}. {\small \begin{verbatim} on fort,priall$ optimize z:=:for j:=2:6 sum q^(1/j) iname s; 1/6 1/5 1/4 1/3 Z := Q + Q + Q + Q + SQRT(Q) Sumscheme : || EC|Far - ------------ 0|| 1| Z - ------------ Productscheme : | 0| EC|Far - --------------- 1| 10| 1| 0 2| 12| 1| 0 3| 15| 1| 0 4| 20| 1| 0 5| 30| 1| 0 - --------------- 0 : Q Number of operations in the input is: Number of (+,-)-operations : 4 Number of (*)-operations : 0 Number of integer exponentiations : 0 Number of other operations : 5 Time: 2992 ms Breuer search : Time: 867 ms Removal of different names for identical cse's : Time: 17 ms Change Scheme : Time: 0 ms Local Factorization : Time: 34 ms Breuer search : Time: 204 ms Removal of different names for identical cse's : Time: 0 ms Change Scheme : Time: 17 ms Local Factorization : Time: 0 ms Breuer search : Time: 187 ms Removal of different names for identical cse's : Time: 0 ms Change Scheme : Time: 17 ms Local Factorization : Time: 0 ms Breuer search : Time: 119 ms Removal of different names for identical cse's : Time: 0 ms Change Scheme : Time: 17 ms Local Factorization : Time: 0 ms Additional optimization during finishing touch : Time: 34 ms Q=Q**(1/60) S7=Q*Q S6=S7*Q S4=S7*S6 S2=S4*S4 S1=S7*S2 S0=S6*S1 S3=S4*S0 Z=S3+S0+S1+S2+S3*S2 Number of operations after optimization is: Number of (+,-)-operations : 4 Number of (*)-operations : 8 Number of integer exponentiations : 0 Number of other operations : 1 \end{verbatim} \newpage \begin{verbatim} Sumscheme : | 3 4 5 6| EC|Far - ------------------------ 0| 1 1 1 1| 1| Z - ------------------------ 3 : S3 4 : S0 5 : S1 6 : S2 Productscheme : | 9 10 12 13 14 15 16 22| EC|Far - ------------------------------------ 5| 1 1 | 1| 0 6| 1 1 | 1| S0 7| 1 1 | 1| S1 8| 2 | 1| S2 9| 1 1 | 1| S3 10| 1 1 | 1| S4 12| 1 1| 1| S6 13| 2| 1| S7 - ------------------------------------ 9 : S7 10 : S6 12 : S4 13 : S3 14 : S2 15 : S1 16 : S0 22 : Q Time: 459 ms \end{verbatim} } \section{Preprocessing Possibilities}\label{SCOPE:pre} It may happen that structure is obviously visible in the rhs's of a set of assignment statements, which we want to optimize. One can think of a set of partial derivatives of products. Or one may consider the application of Horner-rules. Such facilities may be attractive, independent of the question if a SCOPE-application will be performed on its result. Therefore we first discuss these facilities and show their effect, again by using simple examples, before we continue with a combined use of SCOPE and these possibilities. The first alternative demands a generalized {\bf structr}-command. We implemented such a facility. Its syntax is straightforward: ``{\bf \index{GSTRUCTR command} gstructr} $<$object$>$ {\bf name} $<$id$>$;'' The $<$object$>$ to be elaborated is one assignment statement or a set of such statements, separated by semicolons and grouped together between the special symbols $<<$ and $>>$. Instead of a statement a matrix name is also allowed. Then all non-zero matrix elements are incorporated in the search for obvious cse's. The $<$id$>$ of the optional {\bf name}-part, being an identifier, is used to identify the subexpressions, produced via the application of a {\bf gstructr} command. When the switch \index{ALGPRI switch} {\tt ALGPRI} is on -the default setting- the output is given in {\REDUCE} syntax, while turning it off leads to output in prefix form. The latter is employed by the function R, used to store SCOPE-input in ${{\rm D}}_{0}$. It is also possible to get FORTRAN-output by turning \index{PERIOD switch} \index{FORTRAN switch} off the switch {\tt PERIOD} and turning on the switch {\tt FORTRAN}. The input \index{EXP switch} remains unchanged when the switch {\tt EXP} is on. \example\label{ex:3.1} \index{SCOPE package ! example} We show part of a {\REDUCE} session. {\small \begin{verbatim} off exp$ matrix a(2,2)$ a(1,1) := x+y+z$ a(1,2) := x*y$ a(2,1) := (x+y)*x*y$ a(2,2) := (x+2*y+3)^3-x$ on fort$ off period$ load struct$ gstructr << a; b:=(x+y)^2; c:=(x+y)*(y+z); d:=(x+2*y)*(y+z)*(z+x)^2 >> name v; V1=X+Y+Z A(1,1)=V1 A(1,2)=X*Y V2=X+Y A(2,1)=V2*X*Y V3=X+2*Y+3 V4=V3**3-X A(2,2)=V4 B=V2**2 V5=Y+Z C=V2*V5 V6=X+2*Y V7=X+Z D=V6*V7**2*V5 \end{verbatim} } Observe that V1, V3, V4, V6 and V7 only occur once in this result of a {\bf gstructr}-application. When applied as part of a SCOPE-operation these redundancies will be removed before the actual optimization process is performed, as shown in example~\ref{ex:3.3}. \index{GHORNER command} \index{Horner's Rule} The syntax for the {\bf ghorner}-command is very similar. The application of a Horner-rule assumes an ordering of the identifiers. We allow instead of the {\bf name}-part of the {gstructr} command an optional {\bf vorder} $<$list of id.s$>$ extension. The $<$list of id.s$>$ consists of at least one identifier. This list overrules, in the order given, the current identifier ordering of the system. The rhs's are considered as polynomials in the leftmost element of the {\bf vorder}-list. The thus created coefficients are in turn considered as polynomials in the second element of this list. And so on. When the {\bf vorder}-extension is omitted the current system identifier ordering is applied. The internal switch {\tt ALGPRI} is again \index{ALGPRI switch} applicable and has the same meaning as for {\bf gstructr}. Some optimizing compilers apply Horner-rules when possible. Our optimization strategy is based on an all levels, all expressions search. This contradicts the Horner-mechanism. To avoid destabilizing side-effects of Horner-rule applications we decided to bring such a facility under user-control. \example\label{ex:3.2} \index{SCOPE package ! example} Some Taylor-expansions are shown. \newpage {\small \begin{verbatim} algebraic procedure taylor(fx,x,x0,n); sub(x=x0,fx)+for k:=1:n sum (sub(x=x0,df(fx,x,k))*(x-x0)^k/ (if k<3 then k else for j:=2:k product j))$ let x^4=0,y^7=0$ f1:=(taylor(e^x,x,0,4)*taylor(cos y,y,0,6))^2; 3 6 3 4 3 2 3 2 6 2 F1 := - (8*X *Y - 60*X *Y + 180*X *Y - 180*X + 12*X *Y - 90*X * 4 2 2 2 6 4 2 Y + 270*X *Y - 270*X + 12*X*Y - 90*X*Y + 270*X*Y - 6 4 2 270*X + 6*Y - 45*Y + 135*Y - 135)/135 load horner$ ghorner << f1:=f1; g1:=taylor(e^x,x,0,4); h1:=taylor(cos y,y,0,6); f1:=(g1*h1)^2 >> vorder y,x; 2 F1 := ((135 + X*(270 + X*(270 + X*180))) + Y *(( - 135 + X*( - 270 + 2 X*( - 270 + X*(-180)))) + Y *((45 + X*(90 + X*(90 + X 2 *60))) + Y *( - 6 + X*( - 12 + X*( - 12 + X*(-8 )))))))/135 6 + X*(6 + X*(3 + X)) G1 := ----------------------- 6 2 2 2 720 + Y *( - 360 + Y *(30 + Y *(-1))) H1 := --------------------------------------- 720 2 2 2 2 2 (6 + X*(6 + X*(3 + X))) * (-720 + Y *(360 + Y ( - 30 + Y ))) F1 := --------------------------------------------------------------- 18662400 \end{verbatim} } Both commands can be used inside an {\bf optimize}-command. We advise to compile both facilities and SCOPE separately (see also section~\ref{SCOPE:install} on page~\pageref{SCOPE:install}). To be able to order the application of either a {\bf gstructr}-command or a {\bf ghorner}-rewrite instruction inside the definition of a SCOPE-operation we have to extend the rules given in section~\label{SCOPE:2.2}. The permissible structures for the $<$object$>$'s to be elaborated by SCOPE are simply extended with syntactically correct {\bf ghorner}-and {\bf gstructr}-commands. Hence the structure of an {\bf optimize}-command is not altered, as is shown by the following two examples. \example\label{ex:3.3} \index{SCOPE package ! example} We show the effect of an application of the {\bf optimize}-command on the {\bf gstructr}-command of example~\ref{ex:3.1}. Observe that the cse-names produced during optimization begin with an S, while {\bf gstructr} created names start with a V. {\small \begin{verbatim} on fort,acinfo$ off exp,period$ optimize gstructr << a; b:=(x+y)^2; c:=(x+y)*(y+z); d:=(x+2*y)*(y+z)*(z+x)^2 >> name v iname s; A(1,1) := X + Y + Z A(1,2) := X*Y V2 := X + Y A(2,1) := V2*X*Y 3 A(2,2) := (X + 2*Y + 3) - X 2 B := V2 V5 := Y + Z C := V2*V5 2 D := (X + 2*Y)*(X + Z) *V5 Number of operations in the input is: Number of (+,-)-operations : 9 Number of (*)-operations : 8 Number of integer exponentiations : 3 Number of other operations : 0 S5=X+Z A(1,1)=S5+Y S8=Y*X A(1,2)=S8 V2=X+Y A(2,1)=S8*V2 S6=X+2*Y S4=S6+3 A(2,2)=S4*S4*S4-X B=V2*V2 V5=Y+Z C=V5*V2 D=S6*S5*S5*V5 Number of operations after optimization is: Number of (+,-)-operations : 7 Number of (*)-operations : 10 Number of integer exponentiations : 0 Number of other operations : 0 \end{verbatim} } \example\label{ex:3.4} \index{SCOPE package ! example} For completeness we also show how to use the Horner facilities inside an {\bf optimize} command. Due to the structure of the method, we operate internally on expanded forms, both representations of h1, and thus also of the corresponding prefix representations used to built ${{\rm D}}_{0}$ slightly differ. The consequences are visualized in the results of the SCOPE application. {\small \begin{verbatim} load scope$ optimize ghorner <<h1:=taylor(cos y,y,0,6); f1:=(taylor(e^x,x,0,4)*h1)^2>> vorder y,x iname s; 2 2 2 720 + Y *( - 360 + Y *(30 + Y *(-1))) H1 := --------------------------------------- 720 2 2 2 2 2 (6+X*(6 + X*(3+X))) *(-720+Y *(360 + Y *( - 30 + Y ))) F1 := ------------------------------------------------------- 18662400 Number of operations in the input is: Number of (+,-)-operations : 9 Number of (*)-operations : 8 Number of integer exponentiations : 8 Number of other operations : 2 S6 := Y*Y 720 + S6*(S6*(30 - 1*S6) - 360) H1 := --------------------------------- 720 S7 := (S6*(360 + S6*(S6 - 30)) - 720)*(6 + X*(6 + X*(3 + X))) S7*S7 F1 := ---------- 18662400 Number of operations after optimization is: Number of (+,-)-operations : 9 Number of (*)-operations : 10 Number of integer exponentiations : 0 Number of other operations : 2 \end{verbatim} } \section{Generation of Declarations}\label{SCOPE:decl} The GENTRAN {\bf declare}-statement can also be used as an optional extension of the {\bf optimize}-command. An illustration of this facility is given in example~\ref{ex:4.1}. The syntax of such a statement is in accordance with the GENTRAN-rules~\cite{Gates:85} % (see page~\pageref{explicit:type}. We also use the symbol table of GENTRAN. During parsing, the declared identifiers and/or array- and matrix names are entered in the symbol table. Once optimization is finished all relevant information for completing the declarations is known, and collected in the prefixlist, which is used for output-production. This prefixlist is employed to decide which not yet typed identifiers and system selected cse-names have to be entered in the symbol table. We make use of already known information, expression-structure and the normal hierarchy in data types. The strategy to achieve this is essentially based on chapter 6 of~\cite{Aho:86}. Once this table is completed a list of declarations is produced if the \index{OPTDECS switch} switch {\tt OPTDECS} is turned on. SCOPE-output is by default given in {\REDUCE} syntax. Alternative output is obtained by assigning a \ttindex{Optlang*} relevant value to the global identifier {\tt Optlang!*}. This causes GENTRAN to take over the output preparation, as shown in: \example\label{ex:4.1} \index{SCOPE package ! example} {\small \begin{verbatim} on optdecs$ off acinfo$ optlang!*:='fortran$ optimize {x(i+1):=a(i+1,i-1)+b(i),y(i-1):=a(i-1,i+1)-b(i)} iname s declare << a(4,4),x(4),y(5):real; b(5):integer >>; INTEGER B(5),I,S1,S3 DOUBLE PRECISION A(4,4),S4,X(4),Y(5) S1=I+1 S3=I-1 S4=B(I) X(S1)=A(S1,S3)+S4 Y(S3)=A(S3,S1)-S4 optlang!*:='c$ optimize {x(i+1):=a(i+1,i-1)+b(i),y(i-1):=a(i-1,i+1)-b(i)} iname s declare << a(4,4),x(4),y(5):real; b(5):integer >>; LONG B[6],I,S1,S3; DOUBLE A[5][5],S4,X[5],Y[6]; { S1=I+1; S3=I-1; S4=B[I]; X[S1]=A[S1][S3]+S4; Y[S3]=A[S3][S1]-S4; } optlang!*:='pascal$ optimize {x(i+1):=a(i+1,i-1)+b(i),y(i-1):=a(i-1,i+1)-b(i)} iname s declare << a(4,4),x(4),y(5):real; b(5):integer >>; VAR B[0..5],I,S1,S3: INTEGER; A[0..4,0..4],S4,X[0..4],Y[0..5]: REAL; BEGIN S1:=I+1; S3:=I-1; S4:=B[I]; X[S1]:=A[S1,S3]+S4; Y[S3]:=A[S3,S1]-S4 END; optlang!*:='ratfor$ optimize {x(i+1):=a(i+1,i-1)+b(i),y(i-1):=a(i-1,i+1)-b(i)} iname s declare << a(4,4),x(4),y(5):real; b(5):integer >>; INTEGER B(5),I,S1,S3 DOUBLE PRECISION A(4,4),S4,X(4),Y(5) { S1=I+1 S3=I-1 S4=B(I) X(S1)=A(S1,S3)+S4 Y(S3)=A(S3,S1)-S4 } %%% The following command restores the initial situation. %%% optlang!*:='nil$ \end{verbatim} } \section{File Management and Optimization Strategies}\label{SCOPE:files} Another alternative for the $<$object$>$'s to be optimized is formed by the sequence {\bf in} ${{\rm file}}_{1}$, ${{\rm file}}_{2}$, ..., ${{\rm file}}_{n}$, $n\ \ge\ 1$. Each of these files is assumed to contain one or a list of more assignment statements, obeying the GENTRAN-assignment rules. A SCOPE application results in a unified sequence of assignment statements in the target language. This is illustrated by the following example, where each file $f_i$ contains one assignment statement of the form $e_i$ := some expression. \example\label{ex:5.1} \index{SCOPE package ! example} {\small \begin{verbatim} 3: optimize in f1,f2,f3 iname s; 2 2 (X + Y) 8 2 2 2*(SIN(X) - COS(E )+3*COS(X)) *(X+Y) + 4*Y + 4*Y E1 := --------------------------------------------------------- 3*X + 2*Y \end{verbatim} \newpage \begin{verbatim} E2 := (4* 2 2 (X + Y) 2 3 2 (SIN(X) - COS(E )+2*COS(X)) *(X+Y) +(4*X - 4*Y) 2 - 6*X)/(8*X + 3*Y - 2*X) 2 (X + Y) E3 := (4*SIN(COS(E )) + SIN(X + Y) + 2 2 (4*X - X + 2*Y) )/(3*Y + F(X,G( - COS(X)))) Number of operations in the input is: Number of (+,-)-operations : 21 Number of (*)-operations : 20 Number of integer exponentiations : 12 Number of other operations : 16 S3 := SIN(X) S7 := X + Y S6 := S7*S7 S6 S4 := COS(E ) S8 := COS(X) S28 := S3*S3 - S4 S2 := S28 + 3*S8 S36 := S2*S2 S35 := S36*S36 S30 := 2*Y S9 := S30 + 3*X 2*(2*Y + S30*Y + S6*S35*S35) E1 := ------------------------------ S9 S12 := S28 + 2*S8 S29 := 4*X*X S27 := S29 - X S31 := 3*Y S29 - 2*S9 + 4*S6*S12*S12*S7 E2 := ------------------------------ S31 + 2*S27 S18 := S30 + S27 4*SIN(S4) + SIN(S7) + S18*S18 E3 := ------------------------------- S31 + F(X,G( - S8)) Number of operations after optimization is: Number of (+,-)-operations : 15 Number of (*)-operations : 24 Number of integer exponentiations : 0 Number of other operations : 11 \end{verbatim} } However a switch is available for stepwise performing the optimization of a set of assignment statements, distributed over different files. When turning on this {\tt AGAIN} switch the finishing touch is not \index{AGAIN switch} done. Moreover, the system is instructed to save relevant internal information in combination with the result of the present optimization run. The thus extended output is assumed to be stored in a file. When the optimization task is continued during another session this file is assumed to be read before all other remaining files. This mode of operation is illustrated in \example\label{ex:5.2} \index{SCOPE package ! example} {\small \begin{verbatim} 2: off acinfo$ 3: in again$ 4: out f5$ 5: optimize in f1,f2 iname s; 6: write "end$"$ 7: shut f5$ 8: off again$ 9: on acinfo$ 10: optimize in f5,f3 iname t; S7 := X + Y \end{verbatim} \newpage \begin{verbatim} 2 S6 := S7 S8 := COS(X) 2 S6 S18 := SIN(X) - COS(E ) S9 := 3*X + 2*Y 2 8 4*Y + 4*Y + 2*S6*(S18 + 3*S8) E1 := --------------------------------- S9 2 S15 := X 2 4*S15 - 2*S9 + 4*S6*(S18 + 2*S8) *S7 E2 := -------------------------------------- 8*S15 - 2*X + 3*Y 2 (X + Y) E3 := (4*SIN(COS(E )) + SIN(X + Y) + 2 2 (4*X - X + 2*Y) )/(3*Y + F(X,G( - COS(X)))) Number of operations in the total input, i.e. in the 2 input sets is: Number of (+,-)-operations : 22 Number of (*)-operations : 20 Number of integer exponentiations : 13 Number of other operations : 17 T17 := X + Y T16 := T17*T17 S8 := COS(X) T1 := SIN(X) T16 T2 := COS(E ) S18 := T1*T1 - T2 T28 := 2*Y S9 := T28 + 3*X T6 := S18 + 3*S8 T36 := T6*T6 T35 := T36*T36 2*(2*Y + T28*Y + T35*T35*T16) E1 := ------------------------------- S9 S15 := X*X T9 := S18 + 2*S8 T30 := 4*S15 T26 := T30 - X T29 := 3*Y T30 - 2*S9 + 4*T17*T9*T9*T16 E2 := ------------------------------ T29 + 2*T26 T19 := T28 + T26 4*SIN(T2) + SIN(T17) + T19*T19 E3 := -------------------------------- T29 + F(X,G( - S8)) Number of operations after optimization is: Number of (+,-)-operations : 15 Number of (*)-operations : 24 Number of integer exponentiations : 0 Number of other operations : 11 \end{verbatim} } Since the construction of declarations in combination with some optimization activity is based on a quite specific use of GENTRAN's symbol table, one has to operate carefully when optimizing input in different sessions. A correct list of declarations is only guaranteed, when the last optimization-command is extended with the required declaration-information. \section{Some Possible Shortcomings and Future Versions}\label{SCOPE:future} The present version of SCOPE may have some shortcomings and possibly also some inefficiencies. However, since we are working on a second version, as stated in~\cite{vanHulzen:90}, we do not have the intention to largely modify the present version. However, we intend to improve one special aspect of the present SCOPE-version: The combined use of SCOPE and GENTRAN. This preliminary version of the manual will shortly be extended with the description of these combined features. Bugs and obvious deficiencies will of course be removed. \section*{Acknowledgements} The many discussions I had over the past years with Barbara L. Gates, Victor V. Goldman, Anthony C. Hearn, Jaap Smit and Paul S. Wang about the symbolic-numeric aspects of computer algebra have been very stimulating and valuable. They also contributed to the present status of SCOPE. Completion of the code would have been impossible without the dedicated assistance of my students and the frequent discussions we had. I certainly want to mention Ben Hulshof, Pim van den Heuvel, Marcel van Heerwaarden, Anco Smit, Johan de Boer and Jan Verheul. \section*{How to install the Code}\label{SCOPE:install} \index{SCOPE package ! installation} The code consists of a number of modules, collected in five files. Two of these modules play a special role and can best be compiled separately: gstructr, defining the {\bf gstructr} facilities, and ghorner, containing the code for the Horner-rules. The other modules form SCOPE. Since ${{\rm D}}_{0}$ and all operations on it and on its later versions ${{\rm D}}_{i}$ are defined using {\bf smacros's} it is essential to read in the module cosmac, containing these {\bf smacro's}, first. Since we also use part of the GENTRAN code care have to be taken that GENTRAN is loaded when compiling the code. \bibliography{scope} \bibliographystyle{plain} \end{document}