Artifact 6a857ab8b6b86477c9e947714f9856403995b9e686518d65891b7f0e395242fb:
- File
psl-1983/3-1/doc/implementation-guide.mss
— part of check-in
[eb17ceb7f6]
at
2020-04-21 19:40:01
on branch master
— Add Reduce 3.0 to the historical section of the archive, and some more
files relating to version sof PSL from the early 1980s. Thanks are due to
Paul McJones and Nelson Beebe for these, as well as to all the original
authors.git-svn-id: https://svn.code.sf.net/p/reduce-algebra/code/historical@5328 2bfe0521-f11c-4a00-b80e-6202646ff360 (user: arthurcnorman@users.sourceforge.net, size: 134467) [annotate] [blame] [check-ins using] [more...]
@make(article) @Case(Draft, 1 <@device(Omnitech)>, else <@device(LPT)> ) @Comment{ For use with the final versions } @Style(WidowAction=warn) @Style(Hyphenation Off) @comment(on) @Style(DoubleSided no) @comment(yes) @style(Spacing 1, LeftMargin 1.2 Inch) @comment[See G:MSS-junk.MSS] @use(Bibliography "<griss.docs>mtlisp.bib") @comment{ Font related stuff } @Define(OP,FaceCode Y,TabExport)@comment{ used for indicating opcodes in C-macros } @modify(enumerate,numbered=<@a. @,@i. >, spread 1) @modify(itemize,spread 1) @modify(description,leftmargin +2.0 inch,indent -2.0 inch) @LibraryFile(PSLMacrosNames) @LibraryFile(SpecialCharacters) @comment{ The logos and other fancy macros } @PageHeading(Left "Utah Symbolic Computation Group", Right "May 1982", Line "Operating Note No. xx" ) @set(page=1) @newpage() @Begin(TitlePage) @begin(TitleBox) @MajorHeading(@PSL Implementation Guide) @Heading(M. L. Griss, E. Benson, R. Kessler, S. Lowder, G. Q. Maguire, Jr. and J. W. Peterson) Utah Symbolic Computation Group Computer Science Department University of Utah Salt Lake City, Utah 84112 (801)-581-5017 Last Update: @value(date) @end(TitleBox) @begin(abstract) This note describes the steps involved in bringing PSL up on a new machine. It combines information from the previous BOOTSTRAP, LAP, CMACRO and TEST guides. @end(abstract) @center[ File: @Value(SourceFile) Printed: @value(date)] @copyrightnotice(Griss, Benson, Lowder, Maguire and Peterson) @begin(ResearchCredit) Work supported in part by the National Science Foundation under Grant No. MCS80-07034, and by Livermore Lawrence Laboratories under Subcontract No. 7752601, IBM and HP. @end(ResearchCredit) @end(TitlePage) @pageheading(Left "Implementation Guide", Center "@value(date)", Right "Page @Value(Page)" ) @comment{@pageheading(Even,Left "Page @Value(Page)", Right "Operating Note No. xx" )} @set(page=1) @newpage() @section(Introduction) This document describes the techniques used to implement PSL on a new machine. This note assumes that the reader has some familiarity with the basic strategy of @PSL implementation (see the 1982 LISP Conference Paper on PSL, UCP-83), and has also read the papers on the @PSL Portable @xlisp compiler (Griss and Hearn, "Software Practice and Experience", and Griss, Hearn and Benson, 1982 Compiler Conference). Also see the compiler chapter (19) of the @PSL manual@cite[Griss81]. Finally, a basic understanding of how to use PSL and LISP is required@cite[Griss81]. In order to explain a new PSL implementation, we will first describe the PSL compilation model, hopefully providing some insight into the various steps involved in the transformation of PSL sources into code executable on the target machine. @comment{May want to add a description of each section to follow} The initial level of transformation takes the RLISP format and translates it into LISP for those source files that are written in RLISP format; those files already in LISP may be directly input into the system (see the figure below). The LISP code is then compiled into instructions for an Abstract Lisp Machine (ALM). The ALM is a general-purpose register machine designed for its ease as a target for compilation@cite(Griss81b) in which temporary variables are allocated in a block of locations on a @ei[stack]. The ALM instructions are expressed in LAP format (LISP Assembly Program) which consists of a list whose first element is the ALM opecode followed by zero or more ALM operands which are ALM addressing modes. The ALM format is (ALMopcode ALMoperand ... ALMoperand). The ALMopcode is a macro referred to as a CMACRO and the addressing modes of the ALMoperands are referred to as ANYRegs. The ALM instructions are macro expanded into instructions for the Target Lisp Machine (TLM). TLM instructions have the same LAP format, except the operators are now TLM operators and the operands are TLM addressing modes. From here, a number of alternate routes are possible for the final code generation. So far the LISP or RLISP has transformed into into a set of TLM instructions that can take one of three paths. @begin(enumerate) Fist, the TLM instructions can be printed out as Target Machine Assembly code (ASM) for assembly on the target machine. This route is followed in the initial phases of the PSL implementation process to produce code for the target machine. Secondly, a file of the target machine code can be produced in a format that can be loaded directly into a running PSL system. This process is called FASLing, producing a FASt Load format file. Finally, the TLM code can be assembled and deposited directly into memopry of the running PSL system. This is basically analogous to the process used to load in a FASL file produced above except the code is not written to or read from a FASL file. @end(enumerate) This process is illustrated below: @begin(verbatim,leftmargin 0,group) .-----------------. Rlisp: Procedure SelectOne x; | RLISP input code| x := car x; `-----------------' v .------. | LISP | Lisp: (de selectone (x) `------' (setq x (car x))) v .----------. | Compiler | `----------' v .------------------------. ALM: (!*entry selectone expr 1) |ALM instructions in LAP | (!*alloc 0) | format | (!*move (car (reg 1)) `------------------------' (reg 1)) v (!*exit 0) .----------. | Pass1Lap | `----------' | v .---------------------. TLM: [68000 code] | TLM instructions in | (Fullword 1) Count of Args | LAP format. | (!*Entry selectone expr 1) `---------------------' (movea!.l (indirect | | (reg 1)) (reg 1)) | v (rts) | .------------. | | TLM to ASM | | | converter | | `------------' | v | .-------------------. ASM: dc.l 1 | | | movea.l (a1),a1 | | Asm code suitable | rts | | for TM assembler | | `-------------------' v .--------------. .-----------------. | LAP resident |----->| Resident binary | | assembler | | `-----------------' +--------------+ | .------------. `-->| FASL files | `------------' @end(verbatim) In summary, here is an overview of the steps necessary to implement PSLon your target machine. More details will be given in the following sections. @begin(enumerate) Prelimaries: @begin(enumerate) Believe in yourself. Choose the host machine. Test file transfer. @end(enumerate) Decide how to map the ALM architecture to the TLM. Implement the TLM to ASM. Implement the ALM to TLM. Build the Cross Compiler and test. Run Cmacro Tests. Build Bare PSL. Implement a resident TLM assembler. Implement FASL. Bootstrap the compiler. @end(enumerate) @section(Overview of the Abstract LISP Machine) The abstract machine is really a class of related machines rather than a single fixed machine (such as PASCAL P-code, or some true @xlisp machines). The exact set of @CMACRO@XS, the number of registers, etc@. are under the control of parameters, flags and compiler code-generator patterns defined for the specific machine. This flexibility permits the match between the compilation model and the target machine to be better set, producing better code. Therefore, the exact set and meaning of @CMACRO@XS are not fixed by this definition; rather, they form an adjustable @dq[convention] between the compilation and @CMACRO/Assembly phase. The compiler itself is defined in PC:COMPILER.RED@Foot[dir: represents a logical directory name, in this PC: stands for <PSL.Comp> under Tops-20 or /psl/comp under UNIX.] and is augmented by machine-specific files, described later. The ABSTRACT LISP MACHINE (ALM) used by our compiler has the following characteristics. @begin(enumerate) There are 15 general purpose registers, 1 ..@. 15; and a stack for call/return addresses. Locals and temporaries variables are allocated on the stack by allocating a frame of temporaries large enough to hold them all, not by the use of push and pop instructions. The function calling mechanism loads N args into 1 ..@. N, and then transfers to the function entry point, pushing the return address onto the stack if necessary. The functions result is returned in register 1. Each procedure is responsible to save any values it needs on stack; small procedures often do not use the stack at all. The following is a brief lisp of all the ALM opcodes (CMACROS). @begin(verbatim) (!*ALLOC nframe:integer) (!*ASHIFT dest:any-alterable source:any) (!*CALL name:id) (!*DEALLOC nframe:integer) (!*EXIT nframe:integer) (!*FIELD operand:any-alterable starting-bit:integer bit-length:integer) (!*FOREIGNLINK name:id type:id number-of-arguments:integer) (!*FREERSTR l:nonlocalvars-list) (!*JCALL name:id) (!*JUMP label:any) (!*JUMPEQ label:any source1:any source2:any) (!*JUMPINTYPE label:any source1:any type-name:id) (!*JUMPNOTEQ label:any source1:any source2:any) (!*JUMPNOTINTYPE label:any source1:any type-name:id) (!*JUMPNOTTYPE label:any source1:any type-name:id) (!*JUMPON source:any lower-bound:integer upper-bound:integer l:label-list) (!*JUMPTYPE label:any source1:any type-name:id) (!*JUMPWGEQ label:any source1:any source2:any) (!*JUMPWGREATERP label:any source1:any source2:any) (!*JUMPWITHIN label:any lower-bound:integer upper-bound:integer) (!*JUMPWLEQ label:any source1:any source2:any) (!*JUMPWLESSP label:any source1:any source2:any) (!*LAMBIND r:registers-list l:nonlocalvars-list) (!*LBL label:tagged-label) (!*LINK name:id type:id number-of-arguments:integer) (!*LINKE nframe:integer name:id type:id number-of-arguments:integer) (!*LOC dest:any-alterable source:any) (!*MKITEM inf:any-alterable tag:any) (!*MOVE source:any dest:any-alterable) (!*POP dest:any-alterable) (!*PROGBIND l:nonlocalvars-list) (!*PUSH source:any) (!*PUTFIELD source:any dest:any-alterable starting-bit:integer bit-length:integer) (!*SIGNEDFIELD operand:any-alterable starting-bit:integer bit-length:integer) (!*WAND dest:any-alterable source:any) (!*WDIFFERENCE dest:any-alterable source:any) (!*WMINUS dest:any-alterable source:any) (!*WNOT dest:any-alterable source:any) (!*WOR dest:any-alterable source:any) (!*WPLUS2 dest:any-alterable source:any) (!*WSHIFT dest:any-alterable source:any) (!*WTIMES2 dest:any-alterable source:any) (!*WXOR dest:any-alterable source:any) (LABELGEN tag:id) (LABELREF tag:id) (!*CERROR message:any) (FULLWORD [exp:wconst-expression]) (HALFWORD [exp:wconst-expression]) (BYTE [exp:wconst-expression]) (STRING s:string) (FLOAT f:float) @end(verbatim) ALM operand forms ("addressing" modes) @begin(verbatim) (FLUID name:id) (!$FLUID name:id) (GLOBAL name:id) (!$GLOBAL name:id) (WVAR name:id) (WARRAY name:id) (WSTRING name:id) (WCONST expr:wconst-expression) (IMMEDIATE wconst-expression:any) (QUOTE s-exp:s-expression) (LABEL l:id) (MEMORY base:any offset:wconst-expression) (CAR base:any) (CDR base:any) (FRAME n:integer) (REG reg-descriptor:{integer,id}) (LIT [any-instruction-or-label:{list,id}]) (LABELGEN tag:id) (LABELREF tag:id) (IDLOC symbol:id) @end(verbatim) @end(enumerate) @Section(System Overview for Bootstrapping) Currently PSL is half bootstrapped from a complete PSL system on a host machine. At the moment only the Decsystem 20 and the VAX 750 can be used as hosts; shortly we expect the Apollo and HP9836 to be also usuable. If you have a choice for your host machine, one important consideration will be the ease in shipping code between the host and target. It is worth taking the time initially to be sure this pathway is as smooth and troublefree as possible. The need for easy file transfers is derived from the half bootstrap method and the iterative nature of developing and debugging the tables used in the ALM to TLM transformation. The size of the transferred files will be in the range of 1 to 70 KBytes. Having a fast network or a tape transfer from host to target is worth considering in the beginning of a PSL implementation. The first major step in the implementation will be to modify the host PSL to become a cross compiler, turning lisp or rlisp into the target machines assembly language. @SubSection(Overview of the Cross Compiler) Three modules are created, compiled and loaded into a host PSL to transform it into a cross compiler. @begin(enumerate) The first module will be xxx-comp.red (we will use XXX to represent the name of the target machine, like DEC20, VAX, etc.); a file containing patterns used by the compiler to control which ALM instructions are emitted for certain instructions. Basically it is used in LISP to ALM transformations and initially will only require you to copy the same file used on your host machine. The second module will be xxx-cmac.sl. This file contains the tables(CMacroPatternTables) used to convert ALM opcodes to TLM opcodes, the tables used to convert ALM addressingmodes into TLM addressingmodes (ANYREGS), and some miscellaneous required opencoded functions. The last module, xxx-asm, consists of two files, xxx-asm.red and xxx-data-machine.red. The first file, xxx-asm.red, specifies the necessary formats, costants, and procedures for converting TLM instructions into the host's actual assembly language. The file, xxx-data-machine.red, provides constants for describing to the compiler some of the specific choices for what registers to use and how the lisp item will be used in the machine words. @end(enumerate) All of these modules are compiled and loaded into a host PSL to turn it into the cross compiler. The next few sections will try to describe to the reader how these three modules are actually designed and built from the bottom up. It will be worth getting a listing of these modules for your host machine and also for a machine most similar to your target machine, if available. @Section(Designing the TLM instruction format). The implementor must decide first the specifics of the TLM instruction format patterned around the form (TLMopcode TLMoperand ... TLMoperand). The TLM to ASM translation occurs in a parallel manner. (TLMopcode TLMoperand TLMoperand) TLM format. | | | ASMopcode ASMoperand ASMoperand Some ASM format. The closer the ASM format approaches the TLM format the better. However in some cases this will not be possible and the reader must devise a scheme. Take a look at the case studies for some ideas of ways to handle some of these issues. TLM opcodes are usually passed through unchanged to the ASM code. However the TLM operands will require extensive changes. [Mention terminal operands!!!]. The TLM operands are of the form (addressingmode value-expression). The addressingmode is a tag which will direct what procedures will be used to convert and print the ASM operands. The reader should pick these addressingmode names to closely match the addressingmodes of the target machine. Some examples of these would be (immediate ...), (indirect ...), (displacement ...), or (indexed ...). Here again the case studies will give you some information for proceeding. [Mention CRAY mismatch of TLM]. @Section(Implementing the TLM to ASM conversion) You can begin by creating the xxx-data-machine.red file and begin to add some definitions. First pick a name for your system, anything representative will do like the name of its operating system or its manufacturers identifier. Some examples are dec20, vax, apollo, or m68000. @begin[verbatim] fluid '(system_list!*); system_list!* := '(MC68000 Chipmunk HP9836); @end[verbatim] The next step is quite important. You must decide how you are going to implement the LISP item on the target machine. The LISP item consists of 2 or three fields; each field having a position and size in the machines item picked by the implementor. All LISP items must have a tag field and an INFormation field and some implementations have a garbage collector field. The tag field must be at least 5 bits long@Foot[Nineteen (19) different tags are presently used.] and the inf field should be large enough to hold a target machine address. Some implementations, such as the Vax, will choose an inf smaller than the largest address possible on the machine and will have to mask tag bits out when using the inf field as an address. This does cause problems and should be avoided if possible. If space allows it the INF field may be larger to allow larger numeric operands to be stored in registers. Currently PSL provides two different garbage collection methods, one of which should be chosen (or a new one developed if needed). One is a two-space copying collector, which requires no extra garbage collection bits, but is very wasteful of space and is best for a virtual memory machine (in fact, there are two copies of the heap). The other is a one space compacting collector, and requires at least one bit for marking, and ideally additional bits for relocation (sometimes, these extra bits can be stored in a separate bit table). Naturally these fields may be larger to make their accessing easier, like aligning on a byte boundary. Once you have decided upon how the LISP item will be implemented on the machine you can begin filling in the constant definitions for the xxx-data-machine.red file. When numbering bits in a machine word, we have settled upon the convention that the most significant bit is zero and counts up to the max-1 bit. The current constants are @begin(verbatim) TagStartingBit TagBitLength InfStartingBit InfBitLength AddressingUnitsPerItem CharactersPerWord BitsPerWord AddressingUnitsPerFunctionCell StackDirection and optionally GCStartingBit GCBitLength @end(verbatim) The following figure illustrates the positions of these constants: @begin(verbatim) .-----------------------------------------. | TAG | [gc] | INF | `-----------------------------------------' FILL IN LATER @end(verbatim) Some other decisions that must be made include: @begin(enumerate) Which and how many registers to dedicate as the compiler-allocated @ei[Registers]; How large an integer will be supported in the @xlisp item; How many tags are to be supported How to implement the recursion stack and check for stack overflow (either using an explicit test, or some machine-interrupt); How to pack and unpack strings; @Comment{PSL must have explicitly tagged items, and the current allocator is a simple linear model, so this is not relevant. Whether to have a heterogeneous heap, multiple heaps, a @ei[page] per type, or whatever;} @Comment{This is also not relevant. Pairs are the same on all machines. How pairs are referenced, i.e. does the pointer to a pair point to the first element, to the second element, are the pairs allocated separately in parallel areas, or is there some type of CDR coding being done.} @end(enumerate) The next step is to implement the tables that accept the ALM form and emits assembly code for the target machine. Most of the program is machine-independent (using PC:LAP-TO-ASM.RED), and an @dq[xxxx-ASM.RED] file is to be written. We have the following already written as a guide: @DEC20 @dq[MACRO], @VAX750 @UNIX @dq[as], @68000 for @apollo and WICAT, and CRAY CTSS CIVIC. The main problem is to emit the correct format, such as: placement of tabs, commas, spaces, parentheses; renaming symbols (certain legal @xlisp IDs are not legal in some assemblers); and determining how and where to place EXTERNAL, ENTRY and GLOBAL declarations, how to declare and reserve blocks of storage, and how to overcome certain problems involved with large files and restrictions on addressing modes and relocation. Finally, the ALM to ASM needs to be tested. This is usually accomplished by Hand-coding some small test routines, and then convert from ALM to machine X assembly code, assemble, and run. This checks the final details of required Prologues and Epilogues@Foot[Prologues and Epilogues contain operating system-specific standard module headers and trailers.], understanding of the instruction set, and so on. Suggested LAP tests are described @ei[generically], but will have to be translated by the implementor into machine-dependent LAP for machine X, and depending on the flavor of assembler and LAP, other tests will have to be devised by the implementor. This is a good time to investigate how Assembly coded routine can call (and be called) by the most common language used on machine X (such as FORTRAN, PASCAL, C, etc.). This "Foreign" language can be used for initial operating system support. @section(Implementing the ALM instructions) The ALM instructions consists of a set of operations and their addressing mode operands. These ALM instructions are commonly referred to as CMACRO's and the addressing modes are ANYREG's. The purpose of this part of the PSL implementation is to implement the functionality of each ALM instruction in terms of other ALM instructions and TLM instructions. The ability to recursively define the ALM instructions in terms of other ALM instructions is a benefit because it greatly decreases the amount of code required to implement a particular instruction. For example, a good technique in designing the ALM instructions is to carefully implement the !*MOVE instruction (to distinguish ALM instructions, they generally have a !* in the front of their name) to efficiently handle transfer between any possible locations (memory to register, stack frame to memory, etc.). Then when implementing another instruction, the code for moving the actual operands to locations necessary for the TLM instruction can be accomplished using a recursive call to the !*MOVE ALM instruction. The important tasks of the implementor are to @begin(enumerate) Carefully examine the instruction set and architecture of the TLM to see which instruction (instructions) correspond to each ALM CMACRO; Decide how to map the ALM registers and addressing modes onto the TLM registers and addressing modes (some will map one-to-one, others will take some thought, and a sequence of actions); Decide on a set of classifications of the TLM modes that distinguish which of a related set of TLM opcodes should be used to implement a particular ALM opcode, and write predicates that examine ALM and TLM modes to decide which class they are in; Write tables to map ALM modes into TLM modes, using these predicates, and then ALM opcodes into a (sequence of) TLM opcodes with the correct TLM modes. @end(enumerate) @subsection(Mechanics of ALM Instruction Definition) Before we get into the description of the ALM instructions, we must first define the table-driven pattern matching approach used to implement them. This approach allows definition of an ALM instruction in terms of a pattern predicate which is used to match the operands of the ALM instruction and a body that may consist of a mixture of ALM instructions (for recursive decomposition) and TLM instructions (for direct code generation). This is exactly analogous to the COND construct in LISP. Just like COND, any number of predicate/body pairs may be included in the expansion of an ALM instruction. Also, the order of the pairs is quite important (since they are compared in order from first to last). Typically, the most specific predicates are described first followed by gradually more and more general ones. The table definition for a specific ALM instruction is compiled into a single procedure. The instruction name must then be flagged with 'MC to indicate that it is a legal ALM instruction. The pattern table itself must then be stored under the indicator 'CMACROPATTERNTABLE on the ALM instruction property list. To simplify this process, the DefCmacro Macro has been defined: @begin(verbatim) (DefCMacro ALMInstructionName (pred1 body1) (pred2 body2) ... lastbody) @end(verbatim) Each ALM instruction is defined with a set number of arguments and the predicates are used to compare the types and/or values of the arguments. A predicate need not test all arguments, with non-tested arguments defaulting to T for a value. For example, one could define the following patterns: @begin(verbatim) Predicate Body (DefCMacro ALMInst ((FOOP) (Body1)) ((FEEP BARP) (Body2)) ((ANYP) (Body3)) (Body4)) @end(verbatim) Note that this looks almost exactly like the LISP operation COND. The one difference lies with the Body4 in the above example, which has no predicate and will always be evaluated if all others fail (Similar to the final 'T case in a Cond without the T). This last predicate/body pair may NOT have a predicate. If it doesn't, it will be evaluted just like the body. [!!Future change - CERROR on the default case, and make the defined use ANYP for his default case] The predicate functions are automatically passed one argument which is the ALM operand in the position of the test. So, in the above example, FOOP is passed the first operand and BARP is passed the second, after failure in the FOOP test. The body can be thought of as an implicit PROGN that contains a set of ALM and TLM instructions. These instructions then reference the various operands as ARGONE, ARGTWO, ARGTHREE, etc. using lexical ordering in the instruction. For example, if an ALM instruction mapped directly to a TLM one, it may be defined as: @begin(verbatim) ((FOOP BARP) (TLMOperator ARGONE ARGTWO)) @end(verbatim) Or, it may map into a number of ALM and TLM instructions: @begin(verbatim) ((FEEP) (ALMOperator ARGONE Something) (TLMOperator Something ARGTWO) (ALMOperator Something ARGONE)) @end(verbatim) Notice that even though the predicates only test the first operand ARGONE, the other operands may be referenced in the body. Also, "Something" can be thought of as a kind of constant operand (like a particular register, an integer constant, a memory location or whatever). In order to facilitate more complicated instructions within the body, we must now introduce a number of other features. First, suppose that you wish to include code generation time constants within the body. This can be accomplished by placing on the property of a variable name, 'WCONST with its value being the desired constant. Then when the variable is encountered in the instruction expansion, it will be replaced by the value on its property list under the 'WCONST indicator. A useful function to perform this operation would be: @begin(verbatim) (DE MakeReferencedConst (ConstName ConstValue) (Put ConstName 'WCONST ConstValue)) @end(verbatim) Therefore, if you perform a (MakeReferencedConst 'TAGPOSITION 10) then the body may reference TAGPOSITION directly: @begin(verbatim) ((FOOP) (ALMOperator ARGONE TAGPOSITION)) @end(verbatim) Now, that we have constants, it is sometimes desirable to have constant expressions. As long as all of the operands are either direct or referenced constants, the expression can be evaluated in an ALM or TLM instruction (the function may also be called if it doesn't have any operands). For example, the following could be imbedded within an instruction body: @begin(verbatim) (Plus2 (Foo 35 TagPosition) WordWidth) @end(verbatim) The system also provides for an alias mechanism, so you can map one name into another. This is accomplished by placing on the property of the alias, the name of the acutal function under the property DOFN. Thus, if you wanted to map FEE into PLUS2, you would simply: (Put 'FEE 'DOFN 'PLUS2). Therefore, another useful function would be: @begin(verbatim) (DE Alias (AliasFunction ActualFunction) (Put AliasFunction 'DOFN ActualFunction)) @end(verbatim) Sometimes in the process of generating the TLM instructions, it is necessary to make use of a temporary label (i.e. to generate a forward branch). This can be accomplished by referencing TEMPLABEL (just like a reference to ARGONE), which will create a label name consistent with a particular body. For example: @begin(verbatim) ((FOOP) (Test ARGONE) (GO (Label TEMPLABEL)) (Operate ARGONE ARGTWO) (Label TEMPLABEL)) @end(verbatim) Notice that even if the label references are separated by recursive ALM instructions, it will still create a unique reference to the label in both places. There is another mechanism to accomplish the same task in a more general fashion, that allows referencing of multiple labels. This mechanism is used with two functions: @begin(description) LabelGen@\This function takes one argument and returns a generated label. The argument and label are stored on an A-List for later reference. The argument may be any atom. LabelRef@\Look up the argument on the label's A-List and return the associated label. @end(description) An example of the use of these two functions is: @begin(verbatim) ((FOOP) (Label (LabelGen 'L1)) (Test ARGONE) (Go (LabelGen 'L2)) (Operator ARGTWO)) (Go (LabelRef 'L1)) (Label (LabelRef 'L2))) @end(verbatim) Finally, if the need arises to be able to call a function within an ALM instruction expansion. This can be accomplished by using the ANYREG mechanism. It is important to know that this technique will not work for a function call within a TLM instruction, only in the recursive expansion of an ALM instruction (there is no method for calling a function within a TLM instruction). (Note: ANYREG's will be explained in detail later, but the mechanism can be used to call a function). The technique is to first define the function that you wish to call, with one extra argument (the first one) that will be ignored. Then define an anyreg function that calls your function. For example, suppose you want a function that returns an associated register based upon a register argument (with the association stored in an A-List). The code would be implemented as follows: @begin(verbatim) (De GetOtherRegFunction (DummyArgument RegName) (Assoc RegName '((A1 S3) (A2 S2) (A3 S1)))) (DefAnyReg GetOtherReg GetOtherRegFunction) @end(verbatim) Then the pattern that may use the function would be: @begin(verbatim) ((FOOP) (ALMOperator (GetOtherReg ARGONE) (GetOtherReg ARGTWO))) @end(Verbatim) [Future Change - Implement a technique so if it is necessary for a random function to be called, all one has to do is define it and flag it as something appropriate - like 'ALMRandomFunction] @subsection(@ANYREG and @CMACRO patterns) Certain of the ALM operands are @ei[tagged] with a very special class of functions thought of as extended addressing modes; these @ANYREG@xs are essentially Pseudo instructions, indicating computations often done by the addressing hardware (such as field extract, indexing, multiple indexing, offset from certain locations, etc.). For example, the @xlisp operations CAR and CDR often are compiled in one instruction, accessing a field of a word or item. Using @ANYREG in this case, CAR and CDR are done as part of some other operations. In most cases, the @ANYREG feature is reserved for operations/addressing modes usable with most instructions. In some cases, the @ANYREG is too complicated to be done in one instruction, so its expansion emits some code to @ei[simplify] the requested addressing operation and returns a simpler addressing mode. The main thing is all desired computations are done using 1 or zero registers, hence the name @dq[@ANYREG]. The @ANYREG@xs have an associated function and possible table, with the name of the function under the property 'ANYREGRESOLUTIONFUNCTION and the pattern under 'ANYREGPATTERNTABLE. Just like the DefCMacro macro has been defined to aid ALM instruction description, the macro DefAnyReg has been provided to help set up these associations: @begin(verbatim) (DEFANYREG anyregname anyregfunction (pred1 body1) (pred2 body2) ... lastbody) @end(verbatim) As you can see, the structure of a DefAnyReg is exactly the same as DefCMacro, except an additional operand AnyRegFunction must be supplied. When an AnyReg is found in the instruction expansion, the function is called with two or more arguments: @begin(enumerate) Temp Register - Since the anyreg must perform its operation using zero or one register, this is the register that it may use to perform its task. (CAVEAT: The current implementation provides either (Reg T1) or (Reg T2) as the temporary register in all cases except one. That is when the anyreg is the source of a move and the destination is a register. In that case, the destination register is passed as the temporary. This can cause a problem if any part of the anyreg requires the destination to first be a source. [Future change - Eliminate this problem used in move and always pass in T1 or T2]). Source - This is the actual body of the anyreg. It may be referenced within the AnyRegPatternTable as SOURCE. ArgTwo - Only one anyreg (Memory) currently has more than two arguments. If they are desired, this third argument may be referenced by ARTTWO. @end(enumerate) A defect in the current system is that the pattern predicates following the anyreg function may not test the Temporary Register. This is quite inconsistent, since the function definition must consider the operand, while the pattern table must ignore it. [Future change - Fix This problem] @subsection(ALM Instruction Expansion) Now that we understand the mechanics of defining ALM instructions and anyreg tables we need to explore the order of expansion of the instructions. The compiler emits ALM instructions, with the operands being legal ALM "addressing" modes. These instructions are collected in a list and passed to the Pass1Lap function. Pass1Lap looks at each instruction and attempts to simplify it. It looks on the property of the opcode and checks to see if it has been flagged with 'MC. If so, it calls the function of the same name with the operands unchanged. Most ALM expansion functions first apply the function @begin(verbatim) ResolveOperand(Reg, Source) @end(verbatim) to each operand, passing a temporary register as the first argument, REG. This resolution process converts ALM operand forms into TLM operand forms i.e, legal addressing modes of the TLM. After each operand has been "resolved", the CMACRO pattern table is used, and the resulting LIST of CMACROS processed recursively. This is what is accomplished in the three functions: @begin(verbatim) EXPAND1OPERANDCMACRO(Arg1,Name) EXPAND2OPERANDCMACRO(Arg1,ARg2,Name) EXPAND4OPERANDCMACRO(Arg1,ARg2,Arg3,Arg4,Name) @end(verbatim) which first resolves the arguments using the available registers and then calls the routine (CMACROPATTERNEXPAND) which finds the pattern table of the Name argument (ALM instruction) stored on the property list under the indicator 'CMACROPATTERNTABLE. For example, (de !*WPlus2 (Arg1 Arg2) (Expand2OperandCMacro Arg1 Arg2 '!*WPlus2)) Only the (!*MOVE s d) ALM opcode tries to be smarter about temporary regs: d:=RESOLVEOPERAND('(Reg t2),d) If d is a register, then RESOLVEOPERAND(d,S) else RESOLVEOPERAND('(REG t1),s); [Future change - This should be changed in the future] Recall also that Processing an arugment with RESOLVEOPERAND may require other CMACRO's to be emitted first, to "simplify" the complex addressing mode; each Operand is free to destroy/modify its given register. For example, note how register t1 is reused below to resolve multiple CAR's and CDR's into MOVE's and simpler CAR's and CDR's: (!*MOVE (CAR (CAR x)) d) => (!*MOVE (CAR x) (REG t1)) (!*MOVE (CAR (REG t1)) d) (!*MOVE (CAR (CAR(reg 1))) (CDR (CDR (reg 2)))) => (!*MOVE (CDR (reg 2)) (REG t2)) (!*MOVE (CAR (REG 1)) (REG t1)) (!*MOVE (CAR (reg t1)) (CDR (reg t2))) Therefore, typically the operands are first processed before the ALM instruction table is used. AnyReg processing works the same way as with the ALM instructions. The operands are first resolved by calling the ResolveOperand function and then ExpandOneArgumentAnyReg (or TwoArgument) is called to process the pattern table. This has also been combined into a single function: OneOperandAnyReg and TwoOperandAnyReg. [[WARNING - There is an inconsistency in the naming here. For CMacro expansion the combined functions are called EXPANDxOPERANDCMACRO where for anyregs it is ONEOPERANDANYREG. BE CAREFUL!!!!!!! Another inconsistency is that CMacros are flagged with 'MC, which AnyRegs are not flagged]] @paragraph(ResolveOperand) The ResolveOperand function takes two arguments, a temporary register and the source to resolve. It performs the following resolution, in the order given: @begin(Description) an ID@\cals ResolveWConst on the ID; number or string@\returned unchanged; (OP s)@\If OP is flagged 'TerminalOperand, it is returned as is. (OP s)@\If OP is an @anyreg (has an 'AnyregResolutionFunction), it is applied to (Register s). (OP s)@\Otherwise, it is examined to see if it is a WCONST expression. @end(description) The function ResolveWConst tests its operand to see if it is a constant or constant expression, and returns its value. It performs the following resolution: @begin(description) (WCONST number)@\returns the number ID@\If WCONST indicator is on the ID's property, the associated number is returned otherwise the ID is returned. Expression@\Each operand is tested to determine if it can be resolved as a WCONST and if so, the function is applied to all of the operands (ANY FUNCTION CAN BE CALLED) @end(description) ?????Insert some SUMMARY USING THE FOLLOWING???????? Most ANYREGS use OneOperandAnyReg, ie recursively process arguments inside out (CAR anyreg), (CDR anyreg), etc % (de AnyRegCAR(R S) (OneOperandAnyReg R S 'CAR)) % (defAnyReg CAR AnyRegCar ....) Those that do not permit anyregs as args, use ExpandOneOperandAnyReg eg, (QUOTE s), (WCONST w), (WVAR v), (REG r) or flag name as TERMINALOPERAND to pass direct to ASM so here is a simple WCONST expression. As long as args are WCONSTEVALUABEL themselves, any function can be applied: @section(Predicates) Provided in the common machine independent files are a number of useful predicates. Those include: [[[[List the predicates provided in common-predicates]]]] Each of the following predicates expects one argument; call it X: @begin(Description) RegisterP@\(EqCAR X 'REG) tests for any register AnyP@\ Always T, used as filler EqTP@\ (equal X T) MinusOneP@\(equal X -1) InternallyCallableP@\Check if legal to make a fast internal call. Essentially checks the following: @begin(format) [(or !*FastLinks % all calls Fastlinks? (and !*R2I (memq X EntryPoints!*)) % or specially declared (FlagP X 'InternalFunction) (FlagP X 'FastLink)))] @end(format) AddressConstantP@\(or (NumberP X) (EqCar X 'Immediate))) @end(Description) @section(Standard ANYREGS) The following are the basic @ANYREG functions, which in many cases look for an AnyregTable: @begin(Description) @B[ID]@\@B[Flagged] CAR@\OneOperandAnyreg, 'CAR table@comment{ need to explain all of these tables - particularly the WVar table } CDR@\OneOperandAnyreg, 'CDR table QUOTE@\ExpandOneArgumentAnyreg, 'QUOTE table WVAR@\ExpandOneArgumentAnyreg, 'WVar table REG@\ExpandOneArgumentAnyreg, 'REG table WCONST@\OneOperandAnyreg, 'WConst table, default normally just SOURCE. FRAME@\ExpandOneArgumentAnyreg, computes offset from stack pointer, and passes this (in bytes) to 'FRAME table FRAMESIZE (Register)@\Computes (NAlloc!* @Value(Times) AddressingUnitsPerItem) to give size of frame to any special code needing it. MEMORY (Register Source ArgTwo)@\Used to compute indexed memory access: TwoOperandAnyreg, Look for 'MEMORY table. LABEL@\Flags a label, does no processing. @end(Description) The implementor of @PSL for any particular machine is free to add additional @ANYREG@xs (addressing modes), that are emitted as part of @CMACRO@XS by machine specific compiler patterns or COMPFNs. IMMEDIATE is a tag used to @ei[suggest] address or immediate constant. @subsection(Some AUXILLIARY Operand Modes for the TLM) Each of the following functions expects one argument; call it X: @begin(Description) UnImmediate@\If X @Value(Eq)(Immediate Y), removes tag to get Y. ExtraReg@\Converts argument X into Access to ArgumentBlock[X-LastActualReg] QUOTE@\Compiles X into a constant. If !*ImmediateQuote is T, returns an ITEM for object, else emits ITEM into a memory location, returns its address. @end(Description) Note @CMACRO@XS (flagged 'MC) are first expanded, then the PASS1PSEUDO@xs. This means the @CMACRO@XS are able to insert and manage TAGS that are removed or modified by final PASS1PSEUDO. @section(more junk) @i[Implement the Compiler Patterns and Tables]. This requires selecting certain alternative routes and parameterizations allowed by the compiler, trying to improve the match between the Abstract @PSL machine used by the compiler and the target architecture X. Mostly this phase is reserved for optimization, but the basic tables have to be installed to map @xlisp function names to corresponding @cmacro names and select the Compiler functions (COMPFNs and OPENFNs) to be used for each construct. This file, @dq[xxxx-COMP.RED], is usually copied from one of the existing machines and modified as needed. Most of the modifications relate to the legality of certain addressing combinations. These tables are briefly described in the Compiler chapter of the manual, but currently this task is still somewhat "arcane".@comment{ There needs to be some mention of what the usual modifications are! } @i[Build and Test the CROSS Compiler]. Now compile a series of LAP (mostly @CMACRO tests), @xlisp and @syslisp files to X assembly code, link and run. As the tests proceed, certain small I/O and function calling procedures are written in LAP. A common way to do I/O is to implement a @ei[Foreign Function]-calling protocol, used from @xlisp to call functions according to FORTRAN, PASCAL, C or other useful conventions. Calls in compiled @xlisp/@syslisp code to function names flagged with the 'FOREIGN-FUNCTION flag are called with a non-@xlisp protocol. This permits a standard I/O library to be called and allows simple routines to be written in another language. The purpose of this separate function-calling mechanism is to allow the @xlisp system to use the most efficient calling method possible, compatible with the needs of @syslisp and @xlisp. This method is not necessarily the most flexible, general, or safe method and need not be used by other languages. However, to allow the @xlisp/@syslisp system to call upon existing routines, particularly system-provided services, this additional function-calling mechanism should be provided. Some care needs to be taken to preserve and restore registers appropriately. @chapter(Test Series) In order to accomplish the PSL bootstrap with a minimum of fuss, a carefully graded set of tests is being developed, to help pinpoint each error as rapidly as possible. This section describes the current status of the test files. The first phase requires the coding of an initial machine dependent I/O package and its testing using a familar system language. Then the code-generator macros can be succesively tested, making calls on this I/O package as needed. Following this is a series of graded SYSLISP files, each relying on the correct working of a large set of SYSLISP constructs. At the end of this sequence, a fairly complete "mini-LISP" is obtained. At last the complete PSL interpreter is bootstrapped, and a variety of PSL functional and timing tests are run. @section(Basic I/O Support) The test suite requires a package of I/O routines to read and print characters, and print integers. These support routines are usually written in a "foreign" language (call it "F"), such as PASCAL, C or FORTRAN; they could also be coded in LAP, using CMACROs to call operating system commands, if simple enough. (E.g., JSYS's on DEC-20, Traps on 68000, etc.). These routines typically are limited to using the user's terminal/console for input and output. Later steps in the bootstraping sequence introduce a more complete stream based I/O module, with file-IO. On some systems, it is appropriate to have a main routine written in "F" which initializes various things, and then calls the "LISP" entry point; on others, it is better to have "LISP" as the main routine, and have it call the initialization routines itself. In any event, it is best to first write a MAIN routine in "F", have it call a subroutine (called, say TEST), which then calls the basic I/O routines to test them. The documentation for the operating system should be consulted to determine the subroutine calling conventions. Often, the "F" compiler has an "ASSEMBLY Listing switch", which can be turned on to see how the standard "F" to "F" calling sequence is constructed, and to give some useful guidance to writing correct assembly code. This can also be misleading, if the assembler switch only shows part of the assembly code, thus the user is cautioned to examine both the code and the documentation. On directory PT: (which stands for /psl/tests or <PSL.TESTS>), or its subdirectories, we have a number of sample I/O packages, written in various languages: PASCAL, FORTRAN, C and DEC20 assembly code. Each has been used successfully with some PSL bootstrap. The primitives provided in these files are often named XXX-yyyy, where XXX is the machine name, and yyyy is the primitive, provided that these are legal symbols. Of course, the name XXX-yyyy may have to be changed to conform to "F" and the associated linker symbol conventions. Each name XXX-yyyy will be flagged as a "ForeignFunction", and called by a non-LISP convention. The following is a brief description of each primitive, and its use. For uniformity we assume each "foreign" primitive gets a single integer argument, which it may use, ignore, or change (VAR c:integer in PASCAL). @Comment{Is this assumed to be a WORD size quantity, i.e. on the 68000 a 32 bit quantity or can it be a small integer???} The following routines ("yyyy") in LISP, will be associated with the corresponding "foreign" routine "XXX-yyyy" in an appropriate way: @begin(description) init()@\Called once to set up I/O channels, open devices, print welcome message, initialize timer. Quit()@\Called to terminate execution; may close all open files. PutC(C)@\C is the ASCII equivalent of a character, and is printed out without line termination (I/O buffering may be needed). C=EOL=10 (ASCII LF) @Comment{does this mean that the character should appear right away, or can it wait till the EOL is sent???} will be used to signal end-of-line, C=EOF=26 (ASCII SUB) will be used to signal end of file. GetC()@\Returns the ASCII equivalent of the next input character; C=EOL=10 for end of line, and C=EOF=26 for end of file. Note it is assumed that GetC does not echo the character. TimC()@\Returns the runtime since the start of this program, in milli-seconds, unless micro-seconds is more appropriate. For testing purposes this routine could also print out the time since last called. PutINT(C)@\Print C as an integer, until a SYSLISP based Integer printer that calls XXX-PutC works. This function is used to print integers in the initial tests before the full I/O implementation is ready. @comment{Err(C)@\Called in test code if an error occurs, and prints C as an error number. It should then call Quit() .} @end(description) The following functions will probably need to be defined in LAP, using either the ALM (cmacro level ) or machine specific (TLM) level: @begin(description) !%Store!-Jcall(Code-Address,Storage-Address)@\The Storage-Address is the address of the slot in the SYMFNC table where a jump instruction to the Code-Address must be stored. This implements a compiled call to a compiled function. You may have to insert padding or legal code to make the code match the call to the compiled code. The LAP for the Dec20 is: @begin(verbatim) LAP '((!*entry !%Store!-Jcall Expr 2) % CodeAddress, Storage Address (!*alloc 0) (!*WOR (reg 1) 8#254000000000) % Load a JRST in higher-bits (!*MOVE (reg 1) (memory (reg 2) (wconst 0))) (!*EXIT 0)); @end(verbatim) !%Copy!-Function!-Cell(From-Address,To-Address)@\Copies the SYMFNC cell located at the From-Address to the SYMFNC cell located at the To-Address. If your machine has the SYMFNC cell the same width as that of MEMORY, the following code used on the Dec-20 will work: @begin(verbatim) LAP '((!*entry !%copy!-function!-cell Expr 2) % from to (!*alloc 0) (!*move (memory (reg 1) (Wconst 0)) (memory (reg 2) (wconst 0))) (!*exit 0)); @end(verbatim) UndefinedFunction()@\In general, we think of the storage of the number of arguments in a register (Reg NargReg) and the index of the called function in a register (Reg LinkReg). This function must store the linkage register in the fluid UndefnCode!* and the Narg register in the fluid UndefnNarg!*. Finally, it must !*JCALL to the UndefinedFunctionAux. The following code implements this function in a manner that is portable across all machines that use the LinkReg and NargReg as real register: @begin(verbatim) FLUID '(UndefnCode!* UndefnNarg!*); LAP '((!*ENTRY UndefinedFunction expr 0) % No alloc 0 ? and no LINKE % because we don't want to % change LinkReg. (!*Move (reg LinkReg) (Fluid UndefnCode!*)) (!*Move (reg NargReg) (Fluid UndefnNarg!*)) (!*JCALL UndefinedFunctionAux) ); @end(verbatim) Flag(Dummy1,Dummy2)@\A call to this function is automatically generated by the compiler, but is never used. So, you must implement this function to call your error routine if it is actually called (This function will be redefined in a later test). The code for the Dec-20 is portable except the linkage to the Machine Dependent Error routine Err20: @begin(verbatim) LAP '((!*ENTRY FLAG expr 2) (!*alloc 0) (!*MOVE 2 (REG 1)) (!*LINKE 0 Err20 Expr 1) ); @end(verbatim) @end(description) Finally, the following three functions must be implemented to allow arithmetic operations of sufficient length. @begin(description) LongTimes(Arg1,Arg2)@\Compute the product of Arg1 and Arg2 and return: @begin(verbatim) procedure LongTimes(x,y); x*y; @end(verbatim) LongDiv(Arg1,Arg2)@\Compute the quotient of Arg1 and Arg2 and return the value: @begin(verbatim) procedure LongDiv(x,y); x/y; @end(verbatim) LongRemainder(Arg1,Arg2)@\Compute the Remainder of Arg1 with respect to Arg2: @begin(verbatim) procedure LongRemainder(x,y); Remainder(x,y); @end(verbatim) @end(description) As a simple test of these routines implement in "F" the following. Based on the "MainEntryPointName!*" set in XXX-ASM.RED, and the decision as to whether the Main routine is in "F" or in "LISP", XXX-MAIN() is the main routine or first subroutine called: @begin(verbatim) % MAIN-ROUTINE: CALL XXX-INIT(0); CALL XXX-MAIN(0); CALL XXX-QUIT(0); % XXX-MAIN(DUMMY): INTEGER DUMMY,C; CALL XXX-PUTI(1); % Print a 1 for first test CALL XXX-PUTC(10); % EOL to flush line CALL XXX-PUTI(2); % Second test CALL XXX-PUTC(65); % A capital "A" CALL XXX-PUTC(66); % A capital "B" CALL XXX-PUTC(97); % A lowercase "a" CALL XXX-PUTC(98); % A lowercase "b" CALL XXX-PUTC(10); % EOL to flush line CALL XXX-PUTI(3); % Third test, type "AB<cr>" CALL XXX-GETC(C); CALL XXX-PUTC(C); % Should print A65 CALL XXX-PUTI(C); CALL XXX-GETC(C); CALL XXX-PUTC(C); % Should print B66 CALL XXX-PUTI(C); CALL XXX-GETC(C); CALL XXX-PUTI(C); % should print 10 and EOL CALL XXX-PUTC(C); CALL XXX-PUTI(4); % Last Test CALL XXX-ERR(100); CALL XXX-PUTC(26); % EOF to flush buffer CALL XXX-QUIT(0); % END @end(verbatim) For examples, see PT20:20IO.MAC for DEC-20 version, PHP:HP.TEXT for HP9836 PASCAL version, PCR:shell for CRAY fortran version. @section(LAP-TO-ASM and CMACRO Tests) After the basic XXX-ASM.RED file has been written and the XXX-CROSS.EXE has been built, and seems to be working, an exhastive set of CMACRO tests should be run. The emitted code should be carefully examined, and the XXX-CMAC.SL adjusted as seems necessary. Part of the CMACRO tests are to ensure that !*MOVEs in and out of the registers, and the ForeignFunction calling mechanism work. The goal of this test, and the following few sections is to guide you in getting the first piece of ALM code to translate to TLM form, correctly assemble, and finally execute on the target machine. There are a large number of details to worry about, and one will have to come back and refine decisions a number of times. Some of the decisions you will have to make are based on incomplete information, and are based on an interaction of the ALM model, LISP usage statistics and unknown oddities of the target machine. In many cases, you will have to make the decision just to proceed to get the skeleton together, and then immediately come back to fix the code. The first major milestone will be to set up enough of the basic cross-compiler to be able to translate and assemble the following file, called PT:MAIN0.RED: @begin(verbatim) % MAIN0.RED - A "trivial" file of ALM level LAP to test % basic set of tools: LAP-TO-ASM mostly, % and CMACROs LAP '((!*ENTRY DummyFunctionDefinition Expr 1) (!*ALLOC 0) (!*MOVE (REG 1) (REG 2)) (!*EXIT 0)); END; @end(verbatim) It consists of a single procedure, written in LAP using only 4 CMACROs, each quite simple. Notice the procedure defined has a "long" name, which may have to be mapped to a simpler symbol (for your assembler) by a routine in your xxx-ASM.RED file. The !*ENTRY cmacro is actually handled by LAP itself, so there are 3 CMACROs to be written: @Begin(description) (!*ALLOC n)@\Issues instructions to allocate a frame of n items on the stack. May also have to issue instructions to check stack overflow if the system hardware does not. For some machines, with n=0, no code is emitted, while for others, !*ALLOC is a good place to establish certain registers for the code body. (On the CRAY, the call instruction puts the return address in a register, which get saved on the stack in the !*ALLOC). (!*MOVE source dest)@\Issue code to move the contents of source to the destination. In the MAIN0 example, a register to register move is desired. ALM (REG 1) and (REG 2) are almost always allocated to real TLM registers. An "anyreg" for the REG mapping will have to be written. (!*EXIT n)@\Issues code to clean up the stack, by removing the frame that was allocated by a corresponding (!*ALLOC n), and then returns to the caller, whose address was saved on the stack (usually) by an appropriate TLM instruction. (On CRAY, the return address is restored to the special register). @end(description) Here is an example of the processing of this file on the DEC-20. On the DEC20 we produce 2 files, the CODE-FILE and the DATA-FILE: @begin(verbatim) CODE-FILE, MAIN0.MAC DATA-FILE, DMAIN0.MAC @end(verbatim) In summary, here are the initial steps you will have to follow, with some indication of the decisions you will have to make: @begin(description) Decide on PSL Item layout@\How many bits for the tag; should there be a GC field; will the tag have to be masked out when the INF field is used as an address; should the fields be aligned to byte, word or other boundaries to make TAG and INF access faster; Decide on TLM register use@\Some registers will be used for the ALM registers (rest simulated by memory locations), some used for CMACRO temporaries, some for Target OS interface or addressibility, some for Linkage registers and some for the stack. Stack Implementation@\Should the LISP stack be same as system stack; can we use stack hardware; how about stack overflow; which way should stack grow; ALM needs to access elements inside the stack relative to the stack pointer; the stack pointer needs to be accessible so that the GC and other things can access and examine elements. @end(description) @section(More details on Arcitecture mapping) Need to explain why currently 1 tags used, expect more or less in future. Perhaps explain which tests are MOST important so at least those can be done efficiently, even if others encoded in a funny wya. Mention idea that in future may want to put (say) 3 bits of tag in lower word, force double or quadword alignment, and put rest of tag in object. Mention how some data-types are immediate, others point into memory, and some already have headers. Mention possibel user-defind extension types. Need to clarify how ALM registers are used so can be mapped to TLM or memory. Need to explain Stack registers, CMACRO temporary registers, link registers. Need to explain relative importance of certain CMACROs and order in which they should be written and debugged. Make a CMACRO test file to be examined by hand, to be assembled, and maybe even run. Need to give more detailed steps on how to get MAIN1 running; seems like a BIG step. Perhaps break down into smaller MAIN0, just to get off the ground. (Ie, might not execute, but should assemble). Give a check list of steps. Explain that at first, just get all pieces together, then can fill in details once the skeleton is correct, and flesh out stubs. Explain data-file versus code-file model. @section(SysLisp Tests) This set of tests involve the compilation to target assmbly code, the linking and execution of a series of increasingly more complex tests. The tests are organized as a set of modules, called by a main driver. Two of these files are machine dependent, associating convenient LISP names and calling conventions with the "Foreign" XXX-yyyy function, define basic data-spaces, define external definitions of them for inclusion, and also provide the appropriate MAIN routine, if needed. These files should probably be put on a separte subdirectory of PT: (e.g., PT20:, PT68:, etc.) The machine dependent files are: @begin(description) XXX-HEADER.RED@\Is a machine dependent "main" include file, read into each MAINn.RED file, to define the data-spaces needed, and perhaps define a main routine in LAP, and have the appropriate XXX-MAIN call the "FirstCall" function, used to start the body of the test. Also included are the interface routines to the "F" coded I/O package. providing a set of LISP entry-points to the XXX-yyy functions. This should be copied and edited for the new target machine as needed. Notice that in most cases, it simply defines "procedure yyyy(x); XXX-yyyy(x);", relying on "ForeignFunction" declaration of XXX-yyyy. XXX-TEST-GLOBAL-DATA.RED@\This contains a series of external declarations to correspond to the Global Data definitions in the above header file file. It is automatically included in all but the MAINn module via the "GlobalDataFileName!*" option of XXX-ASM.RED. @end(description) The machine independent test files and drivers are: @begin(description) MAIN1.RED@\Is a very simple driver, that calls Getc and Putc, does a few tests. It does an 'IN "XXX-HEADER.RED";'. The "FirstCall" procedure then calls "init", uses "putc" to print AB on one line. It should then print factorial 10, and some timings for 1000 calls on Factorial 9 and Tak(18,12,6). Build by itself, and run with IO. @Comment{This seems to hide the assumption that 10! can be done in the integer size of the test implementation.??? } SUB2.RED@\Defines a simple print function, to print ID's, Integer's, Strings and Dotted pairs in terms of repeated calls on PutC. Defines PRIN1, PRIN2, PRINT, PRIN2T, TERPRI and a few other auxilliary print functions used in other tests. Tries to print "nice" list notation. MAIN2.RED@\Tests printing and access to strings. It peforms most of the useful string operations, printing messages to verify that they function properly. Uses Prin2String to print a greeting, solicit a sequence of characters to be input, terminated by "#". Watch how end-of-line is handled. Then Print is called, to check that TAG's are correctly recognized, by printing a LISP integer, an ID and 2 dotted pairs. Requires SUB2 and IO modules. Finally, it tests the undefined function calling mechanism to verify that it does print out an error message. Therefore, the UndefinedFunction routine must be defined in xxx-header by this test 2. SUB3.RED@\Defines a mini-allocator, with the functions GtHEAP, GtSTR, GtVECT, GtCONS, Cons, XCons, NCons, MkVect and MkString. Requires primitives in SUB2 module. MAIN3.RED@\First Executes a Casetest, trying a variety of Branches and Defaults in the case staement. There are a number of calls on Ctest with an integer from -1 to 12; Ctest tries to classify its argument using a case statement. ConsTest simply calls the mini-allocator version of CONS to build up a list and then prints it. Requires SUB2, SUB3 and IO modules. SUB4.RED@\Defines a mini-reader, with InitRead, RATOM and READ. It has the facilities to convert case input, using the !*RAISE switch (and the SetRaise function). This mini-READ does not yet read vectors. Requires SUB3, SUB2, and IO modules. MAIN4.RED@\First, this test checks to see that EQSTR works. Then it tests FindId to see if it can find Identifiers known to exist. After that, it tests to see if new Id's can be found and then found in the same place. Then a test loop is created that calls RATOM, printing the internal representation of each token. Type in a series of id's, integer's, string's etc. Watch that the same ID goes to same place. When the user types a Q, it should go into a READ-PRINT loop. You should type in a variety of S-Expressions, checking that they are correctly printed. Once again, you should finally type a Q to exit. Requires SUB3, SUB2 and IO modules. SUB5.RED@\Defines a mini-EVAL. Does not permit user defined functions. Can eval ID's, numbers, and simple forms. No LAMBDA expressions can be applied. FEXPR Functions known are: QUOTE, SETQ, COND, PROGN and WHILE. The Nexpr LIST is also known. Can call any compiled EXPR, with the standard 15 arguments. Requires SUB4, SUB3, SUB2 and I/O. MAIN5.RED@\Starts a mini-READ-EVAL-PRINT loop, to which random simple forms may be input and evaluated. When ready, input (TESTSERIES) to test PUT, GET and REMPROP. Then an undefined function is called to test the UNDEFINED function mechanism. Requires SUB5, SUB4, SUB3, SUB2 and IO modules. Note that input ID's are case raised (!*RAISE has been set to T by default) so input can be in in lowercase for built-in functions. Terminates on Q input. SUB6.RED@\Defines a more extensive set of primitives to support the EVAL, including LAMBDA expressions, and user defined EXPR, FEXPR, NEXPR and MACRO functions. This is a complete model of PSL, but has a restriced set of the PSL functions present. Can call any compiled or interpreted function. Requires SUB5, SUB4, SUB3, SUB2 and I/O. MAIN6.RED@\Tests the full PSL BINDING modules (PI:BINDING.RED and PT:P-FAST-BINDER.RED). Call the (TESTSERIES) routine to do a test of Binding, the Interpretive LAMBDA expression evaluator, and binding in compiled functions. Requires SUB6,SUB5, SUB4, SUB3, SUB2 and IO modules. !*RAISE is once again on. Terminates on Q input. SUB7.RED@\A set of routines to define a minimal file-io package, loading the machine independent files: PT:SYSTEM-IO.RED and PT:IO-DATA.RED, and a machine dependent file XXX-SYSTEM-IO.RED. The latter file defines primitives to OPEN and CLOSE files, and read and write RECORDS of some size. The following definitions are used in the routines: @begin(verbatim) FileDescriptor: A machine dependent word to references an open file. FileName: A Lisp string @end(verbatim) @begin(description) SYSCLEARIO()@\Called by Cleario to do any machine specific initialization needed, such as clearing buffers, initialization tables, setting interrupt characters, etc. SysOpenRead(Channel,FileName)@\Open FileName for input and return a file descriptor used in later references to the file. Channel may be used to index a table of "unit" numbers in FORTRAN-like systems. SysOpenWrite(Channel,FileName)@\Open FileName for Output and return a file descriptor used in later references to the file. Channel may be used to index a table of "unit" numbers in FORTRAN-like systems. SysReadRec(FileDescriptor,StringBuffer)@\Read from the FileDescriptor, a record into the StringBuffer. Return the length of the string read. SysWriteRec (FileDescriptor, StringToWrite, StringLength)@\ StringLength characters from StringToWrite from the first position. SysClose (FileDescriptor)@\Close FileDescriptor, allowing it to be reused. SysMaxBuffer(FileDesc)@\Return a number to allocate the file-buffer as a string; this should be maximum for this descriptor. @end(description) RDS, WRS, OPEN, CLOSE, DSKIN and TYPEFILE are defined. MAIN7.RED@\Starts the LISP READ-EVAL-PRINT loop tested before, and now permits the user to test io. Call (IOTEST). Other functions to try are (OPEN "foo" 'OUTPUT), (WRS n), (RDS n) etc. [Now the GETC and PUTC IO routines in XXX-HEADER will finally call the file-oriented IndependentReadChar and IndependentWriteChar]. Also includes the standard PSL-TIMER.RED (described below), which can be invoked by doing (DSKIN "PT:TIME-PSL.SL"). Since the garbage collector not yet present, may run out of space. FIELD.RED@\A a set of extensive tests of the Field and Shift functions. Needs a WCONST BitsPerWord defined in XXX-HEADER.RED. Build by itself, and execute with the IO support. @end(description) Test set "n" is run by using a set of command files to set up a multi-module program. These files are stored on the approriate subdirectory (PT20: for the DEC20). Note that each module usually produces 2-3 files ("code", "data" and "init") @begin(Enumerate) First Connect to the Test subdirectory for XXX: @verbatim[ @@CONN PTxxx:] Then initialize a fresh symbol table for program MAINn, MAINn.SYM: @verbatim[ @@MIC FRESH MAINn] Now successively compile each module, SUB2..SUBn @verbatim[ @@MIC MODULE SUB2,MAINn @@MIC MODULE SUB3,MAINn @@MIC MODULE SUBn,MAINn] Now compile the MAIN program itself @verbatim[ @@MIC PROGRAM MAINn] As appropriate, compile or assemble the output "F" language modules (after shipping to the remote machine, removing tabs, etc..). Then "link" the modules, with the XXX-IO support, and execute. On the DEC-20, the @verbatim[ @@EX @@MAINn.CMD] command files are provided as a guide] Rather than including output from some older test runs, we insist that you run the tests yourself on the HOST machine to be absolutley sure of what output they produce, and what input is expected. Also, if errors occur during testing, the examination of the HOST tests will help. This will also help as additonal tests are added by new implementors. @end(enumerate) @section(Mini PSL Tests) The next step is to start incorporating portions of the PSL kernel into the test series (the "full" Printer, the "full" reader, the "full" Allocator, the "full" Eval, etc.), driving each with more comprehensive tests. Most of these should just "immediately" run. There some peices of Machine specific code that have to be written (in LAP or SYSLISP), to do channel I/O, replacing the simple XXX-IO; to do fast APPLY; Fluid Binding and Arithmetic. This set of tests will help check these peices out before getting involved with large files. @section(Full PSL Tests) Now that PSL seems to be running, a spectrum of functional tests and timing tests should be run to catch any oversights, missing modules or bugs, and as a guide to optimization. The following tests exist: @Description[ PSLTEST.SL@\A fairly comprehensive test of the Standard LISP subset of PSL. Do (DSKIN "pt:psltest.sl"). There are a few tests of the error mechanism that have to be "pushed" through for a full test. MATHLIB.TST@\A series of tests of MATHLIB. First LAOD MATHLIB; into RLISP, then do IN "MATHLIB.TST"; . PSL-TIMER.SL, TIME-PSL.SL@\A standard timimg test covering PSL basics. Compile PSL-TIMER.SL into kernel, or with resident compiler, then (LAPIN "PT:TIME-PSL.TEST"). ] @section(Stabilize Basic PSL) Finally, compile the kernel modules of @PSL, link with the additional machine-dependent modules, and @PSL (hopefully) comes right up@Foot[Presently an unlikely possibility, as the system may still change arbitrarily from under the implementor!]. Additional work is underway to develop a much more comprehensive test set, that will not change while the implementor is proceeding with the bootstrap; unfortunately, @PSL is still undergoing continuous development at Utah, resulting in some "out-of-phase" communication problems. After the basic interpreter is working, additional modules can also be compiled from @xlisp to X and linked with the kernel. The most common of these might be the @RLISP parser and even the @REDUCE@cite[Hearn73] computer algebra system@Comment{???or should this be symbolic algebra system??? }. As more files are compiled to machine X and linked, the task becomes more tedious. At this point, we need to consider the bootstrap of the @ei[Resident] Compiler, LAP and fast-loader (FASL). The most common way to build and maintain large @PSL programs is to build the kernel @PSL with a resident FASLIN for loading fast-load files, and then compile required modules to FASL (xxxx.b) files. A @PSL-based system is built by loading the appropriate FASL files, and then saving the @dq[core] image as an executable file. On some machines this is easy; on others it is quite hard; see the discussions below. These additional steps are: @begin(enumerate) @i[Implement Resident LAP]. Using an existing LAP.RED as a guide, write a table-driven program that does the actual assembly of code written in LAP form for machine X, to the appropriate bit-patterns; the details of this process are discussed at length in @dq[Reading, Writing and Testing LAP]@cite[Griss82h]. @PSL provides many tools to make this task quite easy, but the process is still very machine dependent. Future work may lead to the use of an architectural description language. @i[Test LAP]. The depositing of bit-patterns into BPS@Foot[BPS is Binary Program Space. The name BPS is a remnant of @xlisp 1.6. The desire to have a separate code space is based on the desire to @ei<not> relocate compiled code.] needs to be checked. Check also that procedures can be constructed with LAP, compile LAP into the kernel, and assemble some small files. @i[Implement FASLIN]. FASLIN requires some binary I/O and other small support procedures described in a separate section below. @i[Implement FASLOUT]. Once LAP works, the FASLOUT process seems quite simple, requiring only the Binary I/O etc@. used by FASLIN. It should be possible to get xxxx-FASLOUT working on an existing @PSL, and cross-FASL for machine X. This has not yet been tested. When it works, FASLIN could be made part of the @PSL kernel very early on. @i[Test FASL files]. Check that FASL files can be easily written and read. @Comment{What kind of tests should be done??? This "easily written and read" sounds like apple pie, but it would seem that a piece of SYSLISP could be written that would give the FASL mechanism a good work out, perhaps two pieces with cross references to one another. } @i[Implement and test Core saving]. Determine how to save the image of an executing program, so that it can be restarted. We only require that it be restarted at the beginning, not where it was when it was saved. We usually change the MAIN entry function to call an appropriate TopLoop. See the more extensive discussion below. @foot[Actually, the only part which must be saved is the impure data part; the pure data section, the pure code section and the control stack need not be preserved - however, if only the impure data part is saved, the restart mechanism must map the pure data and code back in. For an example of programs which do selective dumping see EMACS MKDUMP and @interlisp SYSOUT. @Comment{We probably need to think about some way of loading the libraries similar to EMACS, such that it is easy to reload the libraries (particularly if they remain pure).}] @end(enumerate) @chapter(DETAILED REFERENCE MATERIAL) @section(Details on the ALM Operand forms) The following are references to a variety of memory locations: In the current implementation the following 4 reference the same location, the SYMVAL cell of the associated ID. This is the contents of the location SYMVAL+AddressingUnitsPerItem*IDLOC(id): @begin(verbatim) (FLUID name:id) (!$FLUID name:id) (GLOBAL name:id) (!$GLOBAL name:id) @end(verbatim) @begin(description) (WVAR name:id)@\This references the contents of the static location named by the ID. @end(description) The following are all constants, either absolute bit-patterns, or address expressions. @begin(description) (WARRAY name:id)@\Address of the base of a static array (WSTRING name:id)@\Address of the base of a static string (WCONST expr:wconst-expression)@\Any constant expression, either numeric, a declared constant, addresses of thinsg that could also be passed as WARRAY or WSTRING, or other expressions that can be handled by the TLM assembler. (IMMEDIATE wconst-expression:any)@\Really only introduced as a "tag" to make later processing easier; a constant is either an explict constant or (IMMEDIATE expression). This is default TLM mode wrapped when RESOLVEOPERAND is "unsure". We are confused about the differences between WConsts and Immediates in some cases. (QUOTE s-exp:s-expression)@\Is the constant bit-pattern representing a tagged PSL item. (LABEL l:id)@\Reference to a local location (symbol) in the current set of ALM instructions, processed in a single call to LAP, usually a single function. (MEMORY base:any offset:wconst-expression)@\This is the basic ALM "indexing" operation, and represents the contents of the location (base)+offset. (CAR base:any)@\Reference the contents of the ITEM pointed at by INF(base). It is assumed that base is actually a PAIR (not checked). In principle this is sort of like (MEMORY (INF base) (WCONST 0)). (CDR base:any)@\Refernce the contents of the ITEM pointed at by INF(base). It is assumed that base is actually a PAIR (not checked). In principle this is sort of like (MEMORY (INF base) (WCONST AddressingUnitsPerItem)). (FRAME n:integer)@\Contents of the n'th location in the current stack frame. In most versions of the ALM, there is an explicit register, (REG ST), which points at the base of the frame. The stack grows in some direction determined by features on the TLM, so that this could in principle be expressed as (MEMORY (reg ST) (WCONST (times StackDirection -1 AddressingUnitsPerItem (SUB1 n)))) (REG reg-descriptor:{integer,id})@\Reference to an ALM register. (LIT [any-instruction-or-label:{list,id}])@\Plants the instruction sequence elswhere, and leaves a reference to its start. Essetially equivalent to (label g), with g starting a block of the instructions, in "literal" space. (LABELGEN tag:id)@\A mechnism (with LABELREF) to generate and reference a label local to a particular CMACRO pattern. Meant mostly for implementing conditional jumps of various kinds. (LABELREF tag:id)@\Reference a label that was assigned to the Tag. @end(description) The following set of ALM instruction forms are used to define constant data which is intermixed with instructions. @begin(description) (FULLWORD [exp:wconst-expression])@\The expressions are deposited in successive "words" (item-sized units). (HALFWORD [exp:wconst-expression])@)\The expressions are deposited in succesive halfwords (two per item-sized unit). (BYTE [exp:wconst-expression])@\The expressions are deposited in successive "bytes" (character-sized units). (STRING s:string)@\The ASCII values of the characters of the string are deposited in successive bytes, terminated by a zero byte. (FLOAT f:float)@\The 2 word bit pattern for the floating point number is deposited. @end(description) These must be processed by the TLM to ASM translator (and later by the resident assmbler). @subsection(Standard @CMACRO@xs) The following are the basic @CMACRO@XS; additional @CMACRO@XS are of course frequently added either to aid in writing the @CMACRO@XS (a @CMACRO @ei[subroutine]), or to aid some aspect of the machine-specific details. Recall that each @CMACRO returns a list of LAP instructions (which are simpler to generate code for, although it may be a more complex list of operations) representing the appropriate expansion of this @CMACRO (these may also call other @CMACRO@XS). These instructions are then recursively processed by the @CMACRO expander (i.e@. LAP). The !*MOVE @CMACRO is very commonly used for this purpose, to get a @ei[general] operand into a register, so the particular @CMACRO can operate on it. The following @CMACRO@XS deal with function ENTRY, EXIT and function call: @begin(Description) !*Entry((FunctionName FunctionType NumberOfArguments)@\Normally the user does not code this @CMACRO, since it is processed completely by LAP itself. It is used to indicate the start of a function (or entry point within a function). Normally just plants a label corresponding to FunctionName. !*Exit (N)@\Exits (@dq[returns]) from procedure, deallocating N items, as needed. N corresponds to the N items allocated by !*Alloc, see below. !*Link (FunctionName FunctionType NumberOfArguments)@\If FunctionName is flagged 'FOREIGNFUNCTION, emit a call (!*ForeignLink FunctionName FunctionType NumberOfArguments), else emit a (!*Call FunctionName). This is the basic function call macro. It assumes the appropriate number of arguments are in the registers (previously loaded) in the registers, @w[(REG 1) ... (REG n)]. We currently do not check either NumberOfArguments or FunctionType, so a simpler @CMACRO, !*CALL is provided for basic function call. !*Call (FunctionName)@\Basic or @dq[Standard] function call. Checks to see if FunctionName has an 'OPENCODE property, and returns the stored instruction list if any. Otherwise it looks for an appropriate pattern table stored by DEFCMACRO under 'CMACROPATTERNTABLE, as described above. !*LinkE (DeAllocCount FunctionName FunctionType NumberOfArguments)@\An @dq[exit] call. Emitted when the caller does not need to examine the result, but returns it directly. The !*LinkE @CMACRO does not save the return address, so a return from the called function is not to this caller, but to the previous !*LINK. Essentially deallocates the frame (if any), does either an ordinary !*ForeignCall and then !*Exit(0), or does a !*JCALL which does no return address saving. !*JCall (FunctionName)@\First checks for an EXITOPENCODE table, then for an OPENCODE table (followed by a normal return, !*EXIT(0)) or looks for the general '!*JCALL table. The generated code is supposed to call the function without saving a return address, essentially a JUMP. !*ForeignLink (FunctionName FunctionType NumberOfArguments)@\ This is the basic linkage to a foreign function. It assumes the appropriate number of arguments are in the registers (previously loaded) in the registers, @w[(REG 1) ... (REG n)]. It then pushes the arguments on a stack, or moves them to a global location, as appropriate and transfers to the ForeignFunction in an appropriate manner (REWRITE). Some care must be taken in interfacing to the LISP world, with cleanup on return. @end(description) The following @CMACRO@XS handle the allocation and deallocation of a Frame of temporary items on the stack, used for argument saving, PROG local variables, etc. @Begin(description) !*Alloc (N)@\Allocates a frame of N @Value(Times) AddressingUnitsPerItem units by adjusting the stack (generally increasing it) by using a stack operation that invokes an overflow signal, if any. Otherwise the stack register should be compared against an appropriate UpperBound. It passes N @Value(Times) AddressingUnitsPerItem to the pattern, to be used for indexing or displacement. Note some stacks grow in the @ei[negative] direction, and this is a major source of @CMACRO errors. Currently, there is a major problem, that this MACRO may not be called recursively. FIX in the future. !*DeAlloc (N)@\Decrement stack by N @Value(Times) AddressingUnitsPerItem units, deallocating the temporary FRAME. Passes N*AddressingUnitsPerItem to the pattern. @end(Description) The following @CMACRO@XS deal with the binding and unbinding of FLUID variables used as Lambda or Prog parameters. They are usually quite complex to code. The basic idea is to follow the call on a Lambind or Progbind procedure by a compact table of Fluid addresses or offsets. The call may have to be special, and @ei[internal], so that the support code (usually hand-coded in LAP) can pick up and process each entry in the compact table. @begin(Description) !*LamBind(Registers FluidsList)@\Registers is of the form @w[(REGISTERS (REG a) (REG b) ... (REG c))], and FluidsList is of the form @w[(NONLOCALVARS (FLUID f) ...)]. The intent of this @CMACRO is to save the current value of each Fluid in the list on the Binding Stack, paired with the Fluid name. Then the value in the corresponding register is stored into the Value cell. Later unbinding by !*FreeRstr or the Catch and Throw mechanism, restores the saved value. !*ProgBind (FluidsList)@\Emitted for Fluid variables in Prog parameter lists. Idea is as above, but stores a NIL in the value cell after saving the old contents. Usually implemented as @w[(!*LamBind '(REGISTERS) FluidsList))], but may be able to use a more compact table. !*FreeRstr (FluidsList)@\Restores the old values of the fluids. Since we use a special binding stack with Fluid names stored on it, we really only need the number to unbind. [Perhaps we should use !*UnBind(N) to make this decision explicit.] @end(Description) Data-moving @CMACRO@XS. Most of the work is done by !*MOVE, with some PUSH/POP optimizations if the !*MOVE is close to an !*ALLOC or !*DEALLOC. Other data moving may be done in conjuction some of the operations, such as !*WAND, !*WOR, !*WPLUS2, !*WMINUS, etc. @begin(Description) !*Move (Source Destination)@\The major work horse. Generates code to move SOURCE to DESTINATION. Uses (REG t1) and (REG t2) as temporary registers if needed. First simplifies destination (@ei[Anyreg resolution]), using (REG t1) as a temporary if needed. It then simplifies the SOURCE, using the as temporary either the destination (if a register), or (REG t2). Finally, the !*MOVE table is used. !*Push (Arg1)@\Emitted during peep hole optimization to replace a pair !*ALLOC(1) and !*MOVE(arg1,(FRAME 1)). This is a very common optimization. !*Pop (Arg1)@\Emitted during the peep hole phase to replace the common pair !*MOVE((FRAME 1),Arg1), followed by !*DEALLOC(1). This modifies the argument ARG1. @end(Description) The JUMP @CMACRO@XS are given the label as the first operand, but they pass the label as the third (and last) argument to the pattern (usually as ARGTHREE) after resolving the other arguments. The label is tagged (LABEL Label). @begin(Description) @begin(group) !*Lbl (Label)@\This @CMACRO is emitted when a label is inserted in the generated code. Its body is usually trivial, but can be more complex if some form of short and long jump optimization is attempted. @hinge !*Jump (Label)@\Emit code to jump to Label. Label often involves memory. @hinge !*JumpEQ (Label Arg1 Arg2)@\Generate code to JUMP if Arg1 EQ Arg2. Used for @xlisp EQ and @syslisp WEQ. @hinge !*JumpNotEQ (Label Arg1 Arg2)@\Generate code to JUMP if not(Arg1 EQ Arg2). Used for @xlisp EQ and @syslisp WEQ. @hinge !*JumpWLessP (Label Arg1 Arg2)@\Generate code to JUMP if Arg1 @Value(LT) Arg2. Used for @syslisp WLESSP. @hinge !*JumpWGreaterP (Label Arg1 Arg2)@\Generate code to JUMP if Arg1 @Value(GT) Arg2. Used for @syslisp WGREATERP. @hinge !*JumpWLEQ (Label Arg1 Arg2)@\Generate code to JUMP if Arg1 @Value(LTE) Arg2. Used for @syslisp WLEQ. !*JumpWGEQ (Label Arg1 Arg2)@\Generate code to JUMP if Arg1 @Value(GTE) Arg2. Used for @syslisp WGEQ. !*JumpType (Label Arg TypeTag)@\Generate code to JUMP if TAG(Arg) @Value(Eq) TypeTag. The TypeTags are small integers, defined in the xxxx-Data-Machine file. This @CMACRO is emitted for opencoded Type checking, such as IDP(x), etc. It should be implemented very efficiently. Instead of extracting the TAG and comparing with the small integer, it may be easier just to mask the INF portion of Arg, and compare with a shifted version of TypeTag (previously saved, of course). @hinge !*JumpNotType (Label Arg TypeTag)@\Generate code to JUMP if not(TAG(Arg) @Value(Eq) TypeTag). See comments above. @hinge !*JumpInType (Label Arg TypeTag)@\Generate code to JUMP if Tag(Arg) is in the range @w([0 ... TypeTag,NegInt]). This is used to support the numeric Types, which are encoded as 0,...M, and -1 for negative Inums. Thus NumberP, FixP, etc@. have to test a range. Note that NegInt is tested specially. @hinge !*JumpNotInType (Label Arg TypeTag)@\Generate code to JUMP if Tag(Arg) is not in the range @w([0 ... TypeTag, NegInt]). See above comment. @hinge !*JumpOn (Register LowerBound UpperBound LabelList)@\Used to support the CASE statement. This is usually written by hand and no pattern is used. It tests if Register is in range LowerBound @value[Lte] Register @value[Lte] UpperBound; if so, it jumps to the appropriate label in labellist, using (Register @value[MinusSign] LowerBound) as the index. If not in range, it Jumps to a label planted at the end of the label table. In some implementations, the label table has to be a jump table. @hinge !*JumpWithin (Label LowerBound UpperBound)@\This is also used to support the CASE statement, in the situation where the overall label range is large, and there are many sub-ranges. This generates code to JUMP to Label if LowerBound @value(LTE) (REG 1) @value(LTE) UpperBound. A default version uses !*JumpWLessP and !*JumpWLeq tests. [Perhaps should be modified to use ANY reg]. @end(group) @end(Description) The following @CMACRO@XS perform simple computations on their arguments. Binary operations take two arguments, (Dest Source), and leave the result in DEST. @begin(description) !*MkItem (Arg1 Arg2)@\Computes Arg1 @Value(Eq) Item(Arg1,Arg2); construct an Item into Arg1 from the tag in Arg1 and Information part in ARg2. May have to shift and mask both Arg1 and Arg2. Equivalent to !*WOR(!*Wshift(Arg1,24),!*Wand(Arg2,16#FFFFFF)) on the 68000 [This may actually use a stored preshifted version of the tag]. [[[[[Check the ORDER!!!! and use parameters rather than 24 and fffff]]]]]] !*WPlus2 (Arg1 Arg2)@\Compute Arg1 @Value(Eq) Arg1 + Arg2. Look for special cases of 1, -1, 0, etc. Note on the 68000 it checks for a small integer, i.e. -8..8 since these are done with a @dq[QUICK] instruction. [Ignore overflow?] !*WDifference (Arg1 Arg2)@\Compute Arg1 @Value(Eq) Arg1-Arg2. Look for special cases of 1, -1, 0, etc. !*WTimes2 (Arg1 Arg2)@\Compute Arg1 @Value(Eq) Arg1*Arg2. It first looks to see if Arg2 is constant and a power of 2. If so, it emits a corresponding !*Ashift(Arg1,PowerOfTwo Arg2). This check for special cases is in the pattern. !*AShift (Arg1 Arg2)@\Shift Arg1 by Arg2, using Arithmetic shift. Used to support !*WTIMES2. Should do appropriate Sign Extend. !*WShift (Arg1 Arg2)@\Shift Arg1 by Arg2, logically, doing 0 fill. !*WAnd (Arg1 Arg2)@\Arg1 @Value(Eq) Arg1 AND Arg2. BitWise AND, each bit of Arg1 is 1 only if BOTH corresponding bits of Arg1 and Arg2 are 1. !*WOr (Arg1 Arg2)@\Arg1 @Value(Eq) Arg1 OR Arg2. BitWise OR. !*WXOr (Arg1 Arg2)@\Arg1 @Value(Eq) Arg1 Xor Arg2. !*WMinus (Arg1 Arg2)@\Arg1 @Value(Eq) @Value(MinusSign) Arg2. !*WNot (Arg1 Arg2)@\Arg1 @Value(Eq) Logical NOT Arg2. !*Loc (Arg1 Arg2)@\Arg1 @Value(Eq) Address (Arg2). @end(description) The following are important optimizations, that may be initially implemented as procedures: @begin(description) !*Field (Arg1 Arg2 Arg3 Arg4)@\Arg1 @Value(Eq) Extract Field of Arg2 starting at Bit Arg3, of Length Arg4. Bits are numbered 0...Size(Word)@Value(MinusSign)1. The most significant bit is numbered 0 in our model. There is an assumption that Arg3 Arg4 are constants. !*SignedField (Arg1 Arg2 Arg3 Arg4)@\Arg1 @Value(Eq) Extract Field of Arg2 starting at Bit Arg3, or Length Arg4. Bits are numbered 0...Size(Word)@Value(MinusSign)1. The field is to be sign extended into Arg1. !*PutField (Arg1 Arg2 Arg3 Arg4)@\Deposit into Arg1 a field of Arg2 starting at Bit Arg3, or Length Arg4. Bits are numbered 0...Size(Word)@Value(MinusSign)1. @end(Description) @section(Organization of the Compiler and Assembler Source Files) The code is organized as a set of common files kept on the PC: directory, augmented by machine-specific files kept on other directories@Foot[These generally have logical names of the form PxxxC: where xxx is the root name of the directories for a given machine/OS implementation.]. The @dq[skeletal] common files and machine-specific files (mostly kept as compiled FASL files) make up the CROSS compiler and assembler. The machine-specific files customize the compiler for the specific target machine and assembler (currently we compile for @DEC20, @VAX750, @Apollo, @WICAT, and Cray-1). @subsection(Common Files) The machine-independent part of compiler is kept as PL:COMPILER.B@Foot[PL: is <PSL.LAP> or ~psl/lap.], built by PC:COMPILER.CTL. It consists of the files: @begin(description) PC:COMPILER.RED@\The basic compiler PC:COMP-DECLS.RED@\Common declarations configuring the compiler: installing the compiler specific functions, such as PA1FNs, COMPFNs, OPENFNS etc. These are described in the compiler chapter. PC:PASS-1-LAP.SL@\Basic PASS1 of @CMACRO/LAP process. PC:ANYREG-CMACRO.SL@\The @CMACRO and @anyreg pattern matcher and support functions. PC:COMMON-CMACROS.SL@\Standard or default @CMACRO@xs and @anyreg@xs used by most implementations. PC:COMMON-PREDICATES.SL@\Useful predicates to aid in writing the @CMACRO@xs. @end(Description) In addition, the following file is needed: @Begin(Description) PC:LAP-TO-ASM.RED@\Standard functions to convert LAP into machine-dependent assembly code. @end(Description) @subsection(Machine-Specific Files) For machine xxxx, the files: @begin(description) xxxx-COMP.RED@\Machine-Specific Compiler Patterns and Function installations. This file may have some special @CMACRO support in it@Foot{This is the case of extending the abstract machine for a particular implementation.}. xxxx-CMAC.SL@\Machine-Specific @CMACRO@xs and @anyreg@xs. xxxx-ASM.RED@\Definition of FORMATS, and special addressing mode conversion functions, declaration Pseudos, etc. xxxx-DATA-MACHINE.RED@\Smacros and constants to define @syslisp macros needed for the implementation. This file associates @syslisp functions with @CMACRO@xs for special cases. @end(description) Finally, during the compilation of XXXX- user files, the following two files: @begin(description) xxxx:GLOBAL-DATA.Red@\Describes GLOBAL symbols used everywhere. @end(description) @subsection(Building the CROSS Compiler) [For the moment, see the distribution guide for the Host machine]. @section(Design of LAP Format) The argument to the function LAP is a list of lists and atoms. The lists are instructions, pseudo-ops and @cmacro@xs, and the atoms are labels which are used to refer to positions in the code. Note these need not be IDs, but can also be strings, saving on ID space. Instructions should be of the form @w[(@i(opcode) . @i(operands))], where @i(opcode) is a mnemonic for an opcode, and @i(operands) is a list of operands. Each operand should be either an integer, which represents an immediate integer operand, a label, or a list of the form @w[(@i(mode) . @i(suboperands))]. A @i(mode) is an addressing mode, such as INDEXED or INDIRECT on the PDP-10, and DISPLACEMENT, DEFERRED, AUTOINCREMENT, etc@. for the VAX-11. REG must exist on all machines; others will be chosen as appropriate for the system. Remember that these are mainly used for @cmacro expansions rather than for writing code, so choose names for mnemonic value rather than brevity. @i(Suboperands) may also be operands, or they may be specific to the mode, e.g@. register names.@comment(more on @xlisp specific ones, QUOTE and FLUID) See also the READING/WRITING/TESTING of LAP operating note@cite[Griss82h]. @comment[We have a LOT to write here!] @subsection(Addressing Modes) @subsection(Register Designators) @subsection(Labels) @subsection(Storage Pseudos) @section(Implement LAP-TO-ASM) @SubSection(Needed Values) Values must be given for: @begin(description) MainEntryPointName!*@\An ID which is the main procedure name. NumericRegisterNames!*@\A vector of the symbolic names for the compiler registers. @end(description) In addition, each of the registers (as IDs) must be declared, using DefList to provide the string name of the register and flagging the property list of the ID with 'RegisterName. @subsection(Tables) The list ForeignExternList!* is used to remember each of the foreign functions that has been called in the course of a module so that the proper externs can be emitted. @SubSection(Printing routines) A number of routines which are used to print the strings, constants, etc@. are listed as follows: @begin(format) PrintString(S) PrintByte!,(X) TruncateString(S,n) PrintByteList(L) PrintByte(X) PrintHalfWordList(L) PrintHalfWord(X) PrintHalfWords(X) PrintOpcode(X) SpecialActionForMainEntryPoint() PrintNumericOperand(X) @end(format) @subsection(Symbol Mapping) The function ASMSymbolP(X) must be written to check whether a @Xlisp ID is also a legal symbol for the target assembler. @Subsection(Formats) The following formats must be declared to tell the LAP-TO-ASM routines how to print objects and the format of file names to use: CodeFileNameFormat!*, DataFileNameFormat!*, LabelFormat!*, CommentFormat!*, ExportedDeclarationFormat!*, ExternalDeclarationFormat!*, FullWordFormat!*, HalfWordFormat!*, ReserveDataBlockFormat!*, ReserveZeroBlockFormat!*, DefinedFunctionCellFormat!*, UndefinedFunctionCellInstructions!*, and the description for how to construct an item (for MkItem). @section(Independent Compilation) In order to maintain the PSL kernel as a set of reasonable sized modules (about 15) a method to permit (semi-)independent translation from LISP (or RLISP) to TLM assembly format was devised. This method records information about symbols and structures defined in one module and needed in another in a file called the SYM file. When a set of modules is to be assembled into a program, a fresh SYM file is allocated (usually called XXX-PSL.SYM or "Program-name.SYM"). Then as each module, MMM.RED is translated, the SYM file is first read in to initialize various SYMBOL counters. After the translation is complete an updated SYM file is written for the next step. When all modules are tranlated, a last (MAIN) module is translated, and some of the data information gathered in the SYM file is converted into global data declarations in the assembly file. Each module, MMM.RED (perhaps described by a MMM.BUILD file), is converted into 3 files, and updates to the SYM file: @begin(description) Code-File@\Contains the actual instructions for the procedues in the MMM file. May also contain "read-only" data, such as some strings or s-expressions. Typically called something like MMM.asm Data-file@\Contains data-objects that may get changed, typically WVAR and WARRAYs. This file typically called DMMM.asm or MMMd.asm. Init-file@\Contains S-expressions that were not compilable procedures found in the MMM.red file. Typically FLUID declarations, SETQ's and PUT's dominate this sort of code. This file will be read-in by the executing PSL after basic INITCODE is executed. Typically called MMM.INIT. @end(description) The .SYM file data structures are updated. These structures are: @begin(description) Startup-Sexpressions@\Certain s-expressions must be evaluated during INITCODE, before the .INIT files can be read. These are collected into a single procedure, and compiled as INITCODE in the MAIN module. This is the (SAVEFORCOMPILATION (QUOTE ...)) expression in the SYM file. ID list@\New IDs encountered in this file are added to a list of IDs in ID# order. IDs are referred to by ID#; list is called ORDEREDIDLIST!*. NEXTIDNUMBER!*@\The next ID# that will be allocated to the next new ID. STRINGGENSYM!*@\A string representing the last generated symbol-name. Used for internal labels, and external names that are too complex. Individual ID descriptors@\Each ID is now "installed" with a set of PUT's, indicating its ID#, the assembly symbol that is its entry point, if it is a WCONST, WVAR ,WARRAY etc. for example: @begin(Verbatim) (PUT 'INFBITLENGTH 'SCOPE 'EXTERNAL) % An exported WCONST (PUT 'INFBITLENGTH 'ASMSYMBOL 'NIL) % no symbol allocated (PUT 'INFBITLENGTH 'WCONST '18) % Its compile time value (PUT 'STACKUPPERBOUND 'SCOPE 'EXTERNAL) % An exported WVAR (PUT 'STACKUPPERBOUND 'ASMSYMBOL '"L2041") % The Assembly SYMBOL (PUT 'STACKUPPERBOUND 'WVAR 'STACKUPPERBOUND) % Type of VAR (PUT 'TWOARGDISPATCH 'ENTRYPOINT '"L1319") % An internal FUNCTION and its Assembly % SYMBOL (PUT 'RELOAD 'ENTRYPOINT 'RELOAD) % A simple entry point, not renamed (PUT 'RELOAD 'IDNUMBER '552) % Its ID number. SYMFNC(552)-> % JUMP RELOAD (PUT 'CADR 'ENTRYPOINT 'CADR) % Another simple entry point (PUT 'CADR 'IDNUMBER '229) (PUT 'LIST2STRING 'ENTRYPOINT '"L0059") % Entry point, renamed because too long % SYMFNC(147)->JUMP L0059 (PUT 'LIST2STRING 'IDNUMBER '147) (PUT 'SPECIALRDSACTION!* 'IDNUMBER '598) % A Global variable, INITIALLY NIL (FLAG '(SPECIALRDSACTION!*) 'NILINITIALVALUE) (PUT 'GLOBALLOOKUP 'ENTRYPOINT '"L3389") (PUT 'GLOBALLOOKUP 'IDNUMBER '772) (PUT 'CLEARCOMPRESSCHANNEL 'ENTRYPOINT '"L2793") (PUT 'CLEARCOMPRESSCHANNEL 'IDNUMBER '678) @end(Verbatim) @end(description) The contents of SYMFNC are filled in during the translation of the MAIN module, and JUMPs to the entrypoints of symbols that have them are filled in. Other symbols get a JUMP to the UndefinedFunction Entry point. In general, individual modules can be retranslated, since the information they generate is initially taken from the SYM file (ensuring that ID's and SYMBOLS get the same IDNUMBER and ENTRYPOINT as before). The procedure is to translate the desired model (modules) again, replacing the CODE-FILE, DATE-FILE and INIT-FILE previously produced, and also to retranslate the MAIN module, since additonal symbols S-expressions etc may have been produced, and therefor need to be converted into INIOTCODE or HEAP or SYMBOL data. @subsection(Data Pseudos) The following are pseudo operations (from the @68000 version) which must have a procedure to implement them in xxxx-ASM.RED: HalfWord, Deferred, Displacement, Indexed, Immediate, Iconst, AutoIncrement, AutoDecrement, Absolute, and ForeignEntry. @section(Configure the Compiler) This is still somewhat arcane. Basically, the compiler tables that select the COMPFN's and OPENFN's and patterns need to be installed. The most common method of doing this is to start from the xxxx-COMP.RED file most like the target machine X@Foot[It is still the case that you need a compiler wizard to help you with this as the details are still changing and often undocumented, with a lot of "You have to do this, to do that, but ..."]. [Effort is required to describe this more clearly] @Section(Write the Additional LAP Modules) A variety of small LAP routines are required for I/O, system interface, core-saving, efficient function-linkage, variable binding, etc. Some of these are described in the following System Dependent Section. Others are: @subsection(Apply-LAP) These procedures are rather important, and unfortunately tricky to write. They are used to enable compiled-code to call interpreted code and vice versa. When they are used, the registers R1...Rn have the arguments loaded in them, so SYSLISP can't be used. The routines are CodeApply(codePtr,Arglst), CodeEvalApply(CodePtr,Arglst), BindEval(Formals,Args), CompileCallingInterpreted(IdOfFunction), FastApply(), and UndefinedFunction(). These are partially described in SYSLISP, and written in LAP with mostly @CMACRO@XS@Foot[See P20:APPLY-LAP.RED and PV:APPLY-LAP.RED.]. Need to discuss tricks in more detail, devise a set of tests. @subsection(Fast-Bind) This consists of efficient routines written in LAP (using mostly @CMACRO@xs) to BIND and UNBIND fluid variables. The specifics depend on how the !*LAMBIND, !*PROGBIND and !*FREERESTR @CMACRO@xs are implemented. In general, a machine specific "fast-call" is used, rather than the more general recursive LISP call, and a list of ID numbers and values ( NIL or register numbers) are passed in a block. The FASTBIND routine uses the ID number to find the current value of the ID, and saves the ID number and this value on the binding stack. Then NIL (for PROGBIND), or the register value (for LAMBIND) is installed in SYMVAL(ID#). Note that the compiler registers R1...Rn should not be changed, so either they have to be saved, or other "hidden" registers have to be used. Since some hidden registers may be used in the implementation of certain @CMACRO@xs, care has to be exercized. FASTUNBIND is usually simpler, since all it needs is a number of @W[(ID# . Old-value)] pairs to pop off the Binding stack, and restore @Foot[See P20:FAST-BINDER.RED or PV:FAST-BINDER.RED for some ideas.]. @SECTION(System Dependent Primitives) The following set of functions are needed to complete the system-dependent part of @PSL: @subsection(System-dependent input and output) @PSL uses a one-character-at-a-time stream model for I/O. I/O channels are just small integers in a range from 0 to 32 (32 was chosen for no particular reason and could easily be increased if desired). They are used as indices to the WArrays ReadFunction, WriteFunction and CloseFunction, which contain the names (as @xlisp items) of the functions to be called. Thus a stream is an object with a set of operations, buffer(s), and static vaiables associated with it. The current implementation of streams uses parallel vectors for each of the operations that can be associated with a stream. The Channel Number is used as an index into these vectors. For example, the standard input channel is 0@Foot[This corresponds to the @UNIX STDIO channel "stdin".] thus ReadFunction[0] contains 'TerminalInputHandler, which is a function used to get a character from the terminal. The system-dependent file input and output functions are responsible for associating these channels with @ei[file pointers] or @ei[JFNs] or whatever is appropriate to your system. These functions must also perform any buffering required. We have been lucky so far because the @UNIX and Tops-20 systems have single character primitives@Foot[Thus the operating system hides the buffering.]. The reading function is responsible for echoing characters if the flag !*ECHO is T. It may not be appropriate for a read function to echo characters. For example, the "disk" reading function does echoing, while the reader used to implement the @b[Compress] function does not. The read function should return the ASCII code for a line feed (EOL) character to indicate an end of line (or "newline"). This may require that the ASCII code for carriage return be ignored when read, not returned. The VAX UNIX version of SYSTEM-IO.RED (stored on PV:@Foot[PV: is <PSL.VAX-Interp> or ~benson/psl/vax-interp.]) is the simplest, since the UNIX STDIO library is so close to this model. This is a good starting point for a new version. It also uses the file PSLIO.C, which contains the array @w[@Value(UnderScore)FILEPOINTEROFCHANNEL], used for channel allocation. The function @b(ClearIO) is called at system-startup time and when the function RESET is called. It should do all dynamic initialization of the system, but should not close any open files. Static initialization of slots in the function arrays is done in the system-dependent file IO-DATA.RED, and the array used for channel allocation should also have initialized slots for the channels used for terminal input (STDIN!* = 0), terminal output (STDOUT!* = 1) and channels 2 thru 4, used by BLDMSG, COMPRESS/EXPLODE and FLATSIZE. The variable ERROUT!* should have a terminal output channel associated with it. This may be shared with STDOUT!* as in the @Dec20, or be associated with a separate error diagnostic stream, as on the VAX. Channel allocation is handled by the system-dependent part of I/O, so when the @Xlisp function Open calls the function @b(SystemOpenFileSpecial) for a non-file-oriented I/O stream, it should just mark a free channel as being in use and return it. @b(SystemMarkAsClosedChannel) does the opposite, returning a channel to the pool of available ones. @b(SystemOpenFileForInput) and @b(SystemOpenFileForOutput) each takes a string as an argument and should return a channel and set appropriate functions in the corresponding slots in ReadFunction, WriteFunction and CloseFunction. If a file cannot be opened, a continuable error should be generated whose error form is (OPEN @dq[file name] 'TYPE), where TYPE is either INPUT or OUTPUT. Terminal output should be unbuffered if possible. If it must be buffered, it should be flushed when terminal input is done and when EOLs are written. Terminal input should be line buffered, using line editing facilities provided by the operating system if possible. The terminal input routine is responsible for the display of the variable PromptString!*, using a @PSL channel for output if desired, as the VAX version does. The @Dec20 terminal input routine uses a line editing facility that redisplays the prompt and previously typed characters when a Control-R is typed. End of file on input is indicated by returning a character which is CHAR EOF, Control-Z (ASCII 26) on the @Dec20 and Control-D (ASCII 4) on UNIX. This can be changed to any control character. The file SCAN-TABLE.RED will contain the CharConst definition for EOF, and a copy of LispScanTable!* with an 11 (delimiter) in that position. @subsection(Terminate Execution) The function QUIT(); terminates execution. It should probably close open files, perhaps restore system state to "standard" if special I/O capabilities were enabled. On some systems, execution can continue after the QUIT() at the next instruction, using a system command such as START or CONTINUE; on others, the core-image cannot be continued or restarted (see DUMPLISP(), below). On the DEC-20, the HALTF jsys is used, and execution can be continued. On the VAX under UNIX, a Stop signal (18) is sent via the "kill(0,18)" call. This also can be continued under Berkeley 4.1 UNIX. See the file SYSTEM-EXTRAS.RED on PV: and P20: @subsection(Date and Time) The function TIMC(); is supposed to return the run-time in milliseconds. This time should be from the start of this core-image, rather than JOB or SYSTEM time. It is used to time execution of functions. Return it as a full-word, untagged integer in register 1. On the DEC-20, we use the RUNTM jsys, on the VAX the C call on "times" is used, and multipled by 17, to get 1/1020'ths of a second. While not yet required, a TIMR() to get REAL, or WALL, time may be useful@Foot[See TIMC.RED on P20: and PV:.]. The DATE(); function is supposed to return a Tagged @XLISP string containing the current date. No particular format is currently assumed, and the string is used to create welcome messages, etc. Later developments may require a standard for TIMESTAMPS on files, and may also require a CLOCK-time function. The Allocator function GtSTR(nbytes) may be useful to get a fresh string to copy the string returned by a system call into. The string should be 0-terminated. The DEC-20 uses ODTIM, and "writes" to the string in "6-jun-82" format. On the VAX, the "ctime" call is used, and the result "shuffled" into the same format as the DEC-20@Foot[See SYSTEM-EXTRAS.RED on PV: and P20:]. @subsection(ReturnAddressP) The function RETURNADDRESSP(x); supports the backtrace mechanism, and is supposed to check that the instruction before the supposed address X, is in fact a legal CALL instruction. It is used to scan the stack, looking for return addresses@Foot[Very TRICKY, see SYSTEM-EXTRAS.RED on PV: and P20:]. @subsection(Interrupt Handler) Also very crude at present; on the DEC-20, written as a loadable module, P20:20-INTERRUPT.RED, using the JSYS package. This enables CTRL-G, CTRL-T, some stack and arithmetic overflows, binding them to some sort of Throw or Error routine. On the VAX, the file PV:TRAP.RED defines some signal setup, and InitializeInterrupts routine, and is included in the kernel. It associates each trap with a STDERROR call with a given message. Not yet standardized. We really should "bind" all trappable interupts to an appropriate THROW('!$SIGNAL!$,n), and indicate whether to treat as a Fatal Error, a Continuable Error, or not an Error at all. @subsection(Core Image Saving) A way in which @PSL (and most @XLISP@xs) get used involves the ability to load @XLISP and FASL code into an executing @PSL, saving this augmented "core-image" in a named file for subsequent restart later. Some Operating Systems permit a running program to be saved into an executable file, and then restarted from the beginning; others permit the saved program to be continued at the instruction following the call to the SAVE routine. Some operating systems do not normally permit or encourage the saving of a running program into an executable file, and there is a lot of work to be done. The model currently used in @PSL is that a call on DUMPLISP(); does the following (this is based on VAX and DEC-20 experience, and could change as Apollo and CRAY are completed): @begin(enumerate) calls RECLAIM(); to compact the heap, or move the upper heap into the lower heap. @Comment{How is it told that this is a cleanup reclaim that is to put the results in the "lower" heap???} makes some system calls to free unused space, decreasing the executable image; space is returned from HEAP, BPS and STACK. the core-image is saved in a file, whose name is the string in the global variable, DumpFileName!* (this string may have to be passed to the system routine, similar to I/O, using a small peice of LAP as interface, or using the Foreign function protocol); execution continues without leaving the running program; to terminate, the QUIT(); function must be called explicitly [this may not be possible on some systems, and may require a change in the model, or a machine specific restriction]. the saved executable file will restart "from-the-top", i.e. by calling the machine specific "startup" function defined in MAIN-START.RED, which calls initialization functions CLEARBINDINGS(), CLEARIO(), INITIALIZEINTERRUPTS(), etc. Then the Startup function calls MAIN();, which can be redefined by the user before calling DUMPLISP();. MAIN() typically calls StandardLISP() or RLISP(), or some other TopLoop. This startup function also has a @XLISP accesible name, RESET. @end(Enumerate) On some machines, the core-image will automatically start "from-the-top", unless effort is expended to change the "restart-vector" (e.g@. the TOPS-20 SSAVE jsys on the DEC-20); on others, an explicit LINKE CALL (a JUMP) to RESET should be included after the core-save call, to ensure execution of RESET (e.g@. the CTSS DROPFILE call on the CRAY-1). On the VAX under UNIX, a new function UNEXEC was written in C, to convert an executing program back into "a.out" format. See the files MAIN-START.RED and DUMPLISP.RED on P20: and PV:, and the preliminary documentation on the @apollo MAP_CODE.TXT, on PD:. @section(How LAP/TLM assembler works) @Section(How the LAP works) This discription of how the resident assembler (LAP) works is taken from the 68000 implementations. Refer to the diagram below to aid the understanding of this description. ALM instructions are passed into the procedure called LAP. The first thing LAP does is to pass them through the procedure PASS1LAP to transform ALM into TLM. The TLM is handed to OptimizeBranches to check to see if long branches are needed. OptimizeBranches is responsible for computing the offset of each label from the beginning of the function. A list called BranchAndLabelAlist is created which stores the labels and their offsets from the start of the code for this function. Upon the exit from OptimizeBranches the user may turn on the flag "PGWD" and will be able to see the current state of the code. If the code is to be compiled into memory and not fasled to a file then BPS space is allocated. Now the code make take one of three parallel paths. If the code is a label then it is ignored. If the instruction is an instance of !*Entry then the instruction is passed to the procedure SaveEntry to establish the address of the entry point of the code. On all other cases the instruction is passed to the procedure deposit instruction. This is often a good procedure to trace when debugging lap so that one can see what is actually heading off to be depsoited. Once the code has passed through one of the above three paths, the function defineEntries is called which loads the new code pointer into the function cell in the SYMFNC table. Following this the code pointer is tagged as code and returned as the result value of the function LAP. The following details are provideed as a guide to writing your own assembler. Consderation should be give to @begin(enumerate) Regular vs Irregular Machines Templates to Assemble Portions of Instruction Variable Length Instructions Alignment Problems Data Psuedos @xlisp Specific Pseudos @end(enumerate) @section(How do opcodes get defined for the LAP assembly process) There are three procedures used to define the opcodes. The first is DefineOpcode which defines, sets the necessary properties on the opcode's property list, for 680000 opcodes that have no ,byte,word, or long variants. The second function is DefineOpcodes (notice it is simply the plural of the first function) which defines an opcode with variants for byte,word, and long mode. And third is the function DefineCCOpcodes which sets up the properties for all the condition codes. @Section(Description of DefineOpcode) The function DefineOpcode an have three, four, or five arguments. They are defined to be: @begin(enumerate) The opcode name or id. The base 2 value of the opcode, only the constant bits in the opcodes binary value are given initially, the varible fields of an opcode are ORed into the word later. These are all two bytes long. This is tagged on a functions property list as its OpcodeValue. The function to be used to assemble this opcode, referred to on the property list by a functions InstructionDepositFunction. The forth field if present represents the mode to be used with this instruction: either byte, word, or long mode. The default is always word mode. This value is stored on the property list under the tag of Size. The fifth field is the number of bytes that the instruction will take up in the resulting binary code. Generally, only instructions that take no arguments will have this field filled in. This value is stored on the property list under the tag of InstructionLength. @end(enumerate) DefOpcode finally calls the function EvDefopcode which puts all the properties on the property list. @Section(How the Function DefOpcodes works) This function works just like the previous function DefOpcode except that it takes one less field, the size field which tells how the opcode will be used: byte, word, or long. This procedure will define an opcode for each case. For example if an opcode name is move then an id with associated property list will be created for move.b, move.w, and move.l. @Section(How the procedure DefCCOpcodes Works) This function was written just to save typing in all the cases of opcodes that use the condition codes. It does that same thing as DefOpcode above but for each condition code variant of an opcode. @section(Ok so what happens in a functions instruction depositfunction??) The opcode and oprands are selected out of the list and if the operands are not normal then they are passed throught the function effective address which classifies then as to the 68000 convention of register and mode. Purpose: convert an operand from symbolic to numeric form. Returns: Addressing mode in the range 0..7 -------------------------------------------------- M68K addressing modes (from appendix B of the M68K User's Manual) Addressing Mode Mode Reg Valid Modes* Assembler Data MEM Cont Alter Syntax Data Register Direct 000 reg no. X - - X Dn Address Register Direct 001 reg no. - - - X An Addr Reg Indirect 010 reg no. X X X X (An) with PostIncrement 011 reg no. X X - X (An)+ with PreDecrement 100 reg no. X X - X -(An) with Displacement 101 reg no. X X X X d(An) with Index 110 reg no. X X X X d(An,Ri) Absolute Short 111 000 X X X X xxxx Absolute Long 111 001 X X X X xxxxxxxx PC with Displacement 111 010 X X X - d(PC) PC with Index 111 011 X X X - d(PC,Ri) Immediate 111 100 X X - - #xxxxxxxx * = Valid Addressing modes for each type of Addressing Category Data - used to refer to data operands Mem = Memory - used to refer to memory operands Cont = Control - used to refer to memory operands without an associated size Alter = Alterable - used to refer to alterable (writeable) operands -------------------------------------------------- Operand is of the form: case 1: numeric immediate data or (immediate x) case 2: non-numeric atom a local label, which uses PC with displacement case 3: (reg x) x is a number or symbolic register name case 4: (deferred (reg x)) address register indirect in Motorola jargon case 5: (autoincrement (reg x)) address register indirect with postincrement case 6: (autodecrement (reg x)) address register indirect with predecrement case 7: (displacement (reg x) n) if (reg x) is an A reg then if n is 0 then (deferred (reg x)) else address register indirect with displacement else if (reg x) is a D reg then address register indirect with index, using A6 (zero) case 8: (indexed (reg x) (displacement (reg y) n)) address register indirect with index case 9+: various Lisp addressing modes, all of which are absolute long addresses The value returned by this function is the mode field of the instruction for the operand. In addition, the fluid variables OperandRegisterNumber!* and OperandExtension!* will be set. If there are no words to follow, OperandExtension!* will be set to NIL. Otherwise, possible values of OperandExtension!* are: number or (immediate exp) immediate data (number) 16-bit signed displacement non-numeric atom pc relative label (displacement reg disp) index extension word other absolute long, i.e. LISP addressing mode LAP is a complete assembly form and can be used by @xlisp programmers to write any legal assembly code@Foot{There is no real guarantee that the entire set of machine opcodes is supported by the LAP. An implementor may have chosen to implement only those constructs used by the compiler-produced code or explicitly used in hand written LAP. The reason for this partial implementation is that many modern processors have included operations to facilitate @ei[high level language compilation], which often seem to be less than useful.} @section(Binary FAST Loader,FASL) [Explain FASL in general] [Explain essential problem, relocation of machine addresses and LISP ids] [Give big-picture of FASL] [Find MAGUIREs pictures of FASL blocks or regenerate ] This section is a guide to the internal workings of faslout and then faslin. The user begins the faslout procedure by calling the procedure faslout with a string that does not have the extension (because it will add the appropriate binary extension for you). However, when fasling in, the file name requires the binary extension [Change this inconsistency]. Inside the procedure faslout, the file name is assigned to the fluid variable ModuleName!*. Depending upon the setting of the flag !*Quiet_Faslout, the system will either print out a greeting message or not. Next, an output binary file is opened using the argument file name. It will return the channel number to a fluid variable CodeOut!*. CodeFileHeader is called to put in a header in the output file. CodeFileHeader writes out a word consisting of the Fasl Magic Number (currently set to 99). This magic word is used to check consistency between old and current fasl format files (an error is given upon fasling in the file if there is not a 99 as the first word). Therefore, the system must consistently modify that number when a new fasl format is produced. To continue, we need to understand the allocation that takes place within the Binary Program Space (BPS). The BPS is a large, non-collected space that contains compiled code, warrays, the string assocaited with interned ID's, constant data in fasl files, etc. Space is allocated from both ends of the space. Compiled code is allocated from the bottom (using NextBPS as a pointer) and warrays are allocated from the top (using LastBPS as the pointer). When an allocation is attempted, the desired size is checked to see if it will cause LastBPS and NextBPS to cross; if it will, an error message will be printed. The next step is to allocate 2/3 or the remaining BPS from the top. @begin(verbatim,leftmargin 0) .----------------------------. | | | WArrays | | | | | Last_BPS>|----------------------------| <-FaslBlockEnd!* ---. | Code | | | | | | | | | | 2/3 |============================| <-CodeBase!* | | Bit Table | | |============================| <-BitTableBase!* ---' | | | | Next_BPS>|----------------------------| | | | | | | `----------------------------' Binary Program Space @end(verbatim) The procedure AllocateFaslSpaces will setup the following fluid variables. FaslBlockEnd!* will be the address to the top of the available space for this particular allocation. BitTableBase!* points to the beginning of the BitTable. CurrentOffset!* keeps a pointer into the codespace of this allocation to the next available point to add more code. BitTableOffset!* is a running pointer to the current location in the BitTable where the next entry will go. CodeBase!* is the base pointer to the beginning of the code segment for this allocation. MaxFaslOffset!* is the max size of the codespace allowed for this implementation. OrderedIDList!* keeps record of the ID's as they are added. NextIDNumber!* is a base number used just in fasl files to indicate which IDs are local and which are global. It is assumed that there will never be more than 2048 pre-allocated ID's, currently there are 129. The first 128 preallocated IDs are ASCII codes(0-127) and the last one is NIL(128). Everything is now setup to begin fasling PSL code out to the file. The remainder of the faslout procedure sets up three more fluid variables. !*DEFN is set to T which indicates that you are not going to do normal evaluation from the top loop and from files such as using the functions IN and DSKIN. DFPRINT!* signals that DFPRINT!* is now used as the printing function. The procedure used will be DFPRINTFasl!*. !*WritingFaslFile is set to T to let the system know that fasling out is goping on as opposed to compiling code directly into memory inside the PSL system. @subsection(Binary I/O and File Format) @u[Current FASL file format:] Check accuracy, this was PC:fasl-file.Specs @begin(description) Word@\Magic number (currently 99).@comment{ Why the magic number 99??? } Word@\Number of local IDs. Block@\Local ID names, in order, in regular @xlisp format (string size followed by block of chars).@comment{ need to specify that the string size is given as a word, and the character counts is interms of bytes} Word@\Size of code segment in words. Word@\Offset in addressing units of initialization procedure. Block@\Code segment. Word@\Size of bit table in words (redundant, could be eliminated). Block@\Bit table. @end(description) @subsection(Relocation/Bit Table) Describes how to adjust addresses and ID numbers in previous Code Segment. [Should add GENSYM generator option.] This is a block of 2 bit items, one for each \addressing unit/ in the code block.@comment{ Are we committed to two bits forever? } @begin(description) 0@\Don't relocate at this offset. 1@\Relocate the word at this offset in the code segment. 2@\Relocate the (halfword on VAX, right half on 20) at this offset. @comment[Can this be generalized some more????] 3@\Relocate the info field of the @xlisp item at this offset. @end(description) The data referred to by relocation entries in the bit table are split into tag and info fields. The tag field specifies the type of relocation to be done:@comment{ Where is this data stored??? } @begin(description) 0@\Add the code base to the info part. 1@\Replace the local ID number in the info part by its global ID number. 2@\Replace the local ID number in the info part by the location of its value cell. 3@\Replace the local ID number in the info part by the location of its function cell. @end(description) Local ID numbers begin at 2048@comment{why this magic number???}, to allow for statically allocated ID numbers (those which will be the same at compile time and load time). @subsection(Internal Functions) [IS there any special handling of these, or restrictions] @subsection(Foreign Functions, Externs, etc) [Explain why cant do in FASL now. Need to do run-time look up of LOADER symbols, and use in LAP/FASL part of things. Will need to add extra RELOC types to FASL]. @subsection(Init Code) [Explain how executable -sexpressions that are not procedure definitions are gathered into a single LISP procedure, compiled, and given name, sort of !*!*FASL-INIRTCODE!*!*, or some such. Is called as last action of LOAD. Explain current restriction on FASL initcode size, suggest soluitions] @subsection(Annotated FASL file example) @begin(verbatim) *Annotated version of a dump* procedure adder(x); begin scalar y; y:=x; return y+1; end; Dump of "trythis.b" 000000: 0020 0001 E7DF FEDF 0000 0080 0000 00A0 000010: 1800 0000 0000 0000 0000 0000 0000 0000 000020: 0000 0080 0000 0063 16#63 is the magic number which indicates that is a FASL file 0000 0003 Number of local IDs 0000 0004 The first ID, in the form Length of String, String name 000030: 4144 4445 ADDER 5200 0000 0000 0003 Second ID, 3 (+1) characters "ADD1" 4144 4431 ADD1 000040: 0000 0000 0000 0007 Third ID, 7 (+1) characters of "PUTENTRY" 5055 5445 PUTENTRY 4E54 5259 000050: 0000 0000 0000 0003 Fourth ID, 3 (+1) characters "EXPR" 4558 5052 EXPR 0000 0000 000060: 0000 000A CodeSize = 10 words 0000 000A Offset of INIT function -------------------- Code Block 2649 MOVEA.L A1,A3 2449 MOVEA.L A1,A2 4EF9 C000 JMP C000 0801 ^ Relocate Function cell (ID.1 call on "ADD1") 000070: 0801 ---------- The init code 267C 0000 0000 MOVEA.L #0,A3 247A 0010 MOVEA.L 10(pc),A2 227A 0008 MOVEA.L 8(pc),A1 000080: 4EF9 C000 0802 JMP C000 0802 ^ Relocate Function cell (ID.2 = "PUTENTRY") FE40 0800 (ID.0 the procedure ^ Relocate ID number name "ADDER") FE40 0803 (ID.3 the procedure ^ Relocate ID number type "EXPR") 0000 -------------------- Bit Table Section 000090: 0000 0003 Length of Bit table in words -------------------- Bit Table 0004 0000 : 0000 0000 0000 0100 0000 0000 0000 0000 ^ = Relocate Word 0000 040C : 0000 0000 0000 0000 0000 0100 0000 1100 Relocate Word ^ ^ Relocate Inf------------' 0C00 0000 : 0000 1100 0000 0000 0000 0000 0000 0000 ^ Relocate Inf @end(verbatim) [Explain how to use a BDUMP routine to examine this] @subsection(Binary I/O) The following functions are needed for FASLIN and FASLOUT: @i(BinaryOpenRead(Filename:string):system-channel) This should take a filename and open it so that binary input can be done. The value returned is used only by the other functions in this group, and so can be whatever is appropriate on your system. @i(BinaryOpenWrite(Filename:string):system-channel) Similar to BinaryOpenRead, open a file for binary output. @i(BinaryClose(SChn:system-channel):none returned) SChn is the value returned by BinaryOpenRead or BinaryOpenWrite. The file is closed. @i(BinaryRead(SChn:system-channel):word) One word (i.e. Lisp item sized quantity) is read from the binary file. On the Dec-20 this is done using the @i(BIN) jsys with the file opened in 36-bit mode using a 36-bit byte pointer. The VAX Unix implementation uses @i(getw) from the stdio library. @i(BinaryReadBlock(SChn:system-channel, A:word-address, S:integer):none returned) S words are read from the binary file and deposited starting at the word address A. The Dec-20 version uses the @i(SIN) jsys and VAX Unix uses the @i(fread) function. @i(BinaryWrite(SChn:system-channel, W:word):none returned) One word is written to the binary file. On the Dec-20 this is done using the @i(BOUT) jsys with the file opened in 36-bit mode using a 36-bit byte pointer. The VAX Unix implementation uses @i(putw) from the stdio library. @i(BinaryWriteBlock(SChn:system-channel, A:word-address, S:integer):none returned) S words starting at the word address A are written to the binary file. The Dec-20 version uses the @i(SOUT) jsys and VAX Unix uses the @i(fwrite) function. @i(BitTable(A:word-address, B:bit-table-offset):integer) This is similar to @i(Byte) and @i(HalfWord), except that a 2-bit unit is being extracted. A is a word address, the base of a table of 2-bit entries. The one B entries from the beginning is returned. @i(PutBitTable(A:word-address, B:bit-table-offset, I:integer):) Analagous to @i(PutByte) and @i(PutHalfWord), except that a 2-bit unit is being deposited. A is a word address, the base of a table of 2-bit entries. The low-order 2 bits of the integer I are stored at offset B. [Explain how to test Binary I/O, in test N] @subsection(Miscellaneous) To use EMODE/NMODE and PRLISP on some systems, a "raw" I/O mode may be required. See the PBIN, PBOUT, CHARSININPUTBUFFER, ECHOON and ECHOOFF functions in EMOD2:RAWIO.RED and SYSTEM-EXTRAS.RED. Some sort of system-call, fork or similar primitives are useful, clearly system dependent. See the JSYS and EXEC package on P20:, the SYSTEM call in PV:SYSTEM-EXTRAS.RED (written in C as a Foreign Function), or the SYSCALL on the APOLLO. This set is not yet standardized.