Welcome to the Datatron 205 and 220 Blog

This blog is a companion to T J Sawyer's Web Page that outlines the history of the Burroughs Datatron 205 and Paul Kimpel's incredible 205 and 220 emulators. Please visit those sites and return here to post any comments.

Saturday, August 8, 2015

Knuth's EASY Assembler

It is difficult to appreciate today, but there was a time when compilers were unheard of and assemblers were held in suspicion. In the late-1940s and early/mid-1950s, Real Programmers wrote raw, seething machine code. At first they did this because assemblers did not yet exist, but later because, well, they were Real Programmers, and things like mnemonic operator codes, symbolic addresses, resolution of forward references, and ease of modification were for sissies.

This is Paul Kimpel again, with an update on the retro-205 emulator and progress towards reconstruction of Donald Knuth's Algol-58 compiler for the ElectroData/Burroughs Datatron 205. This post focuses on the assemblers for the 205 he wrote to develop that compiler.


Origin of the EASY Assembler


In 1960, Knuth contracted with Burroughs Corporation in Pasadena, California, to write an Algol compiler for the 205. The specification for Algol 60 had not yet been published, so this compiler was to be for the preliminary version of the language, known as Algol-58.

The work was to be done that summer, between the time of Knuth's graduation from Case Institute of Technology (now Case Western Reserve University) in Cleveland, Ohio, and his start of graduate studies at the California Institute of Technology in Pasadena. Knuth discusses the project briefly in his Oral History for the Computer History Museum. The transcript from the 1985 Burroughs B5000 Conference, as well as Richard Waychoff's Stories of the B5000... memoir, also mention the project.

At that point in time, compilers had to be written in assembly language, although the B5000 project was to turn that notion on its head just a year or so later. Burroughs supported an assembler for the 205, named "STAR 0". There was also the "Shell" assembler developed by Shell Research. For reasons that are not clear, Knuth decided to use neither of these, and instead to write his own assembler for the 205, calling it EASY, the "Elegant Assembly System for the 205."

In notes for a later variant of this assembler, MEASY (described later in this post), Knuth indicated that the "elegant" in the title should be taken in its mathematical connotation of "short and sweet." Indeed, EASY is short -- the entire program is only 569 lines, of which more than half represent the constant data to initialize the symbol table and Cardatron format bands. The executing code, including the run-time loader the assembler emits, is little more than 250 instructions.

Reconstructing the Algol-58 compiler is a major goal of this project, but in order to do that, we first needed to reconstruct either EASY or MEASY. Eventually I did both of them. I am presently using MEASY for the compiler, but MEASY requires a tape drive. Magnetic tape has only recently been implemented in the retro-205 emulator, and as I write this, has not yet been officially released. When I started working on the assemblers back in March of this year, all I had working were card and paper tape devices. EASY is a card-based assembler, so I started with it, and later used EASY to bootstrap MEASY.


Using EASY


EASY is easy to use. You place its load deck in the reader followed by the source deck to be assembled, press the INPUT SETUP button on the Cardatron Control Unit to initialize the processor to load from cards, press START on the reader, and then either START on the Supervisory Console or CONT on the Control Console. Assembly terminates when EASY senses an END pseudo-instruction in the card deck.

EASY reads its source from cards and punches its output to cards. It does not generate a printed listing. The 205's Cardatron subsystem controls the card equipment, which was typically an IBM 089 collator for input and IBM 523 punch for output. As described in an earlier post, the Cardatron requires a numeric punch in a selected column of each card to specify the format band under which the card will be interpreted. The format band determines how columns from the card will be translated into decimal words before being sent to the 205 memory drum. Thus, EASY requires a "1" punch in the first column of each of its source cards to select input format band 1. That format band is part of the assembler object code, and is loaded to the Cardatron by EASY during its initialization.

Interestingly, each input card generates one output card. The left side of an output card holds the original source text; the right side holds the generated object code. EASY copies the "1" punch into the first column of the output card, but also places a "6" punch in the second column. Format band 6 is hard-wired in the Cardatron. It is used to load all-numeric data, and in particular, object code. Therefore, by changing the setting in the Cardatron Input Unit to select format from column 1, you read the output deck back in as source for another assembly. By changing the format selection to column 2, you load the object code from the deck to run the program. Sweet.

The numeric word in columns 70-80 of an output card is the assembled word to be loaded into memory. The numeric word in columns 59-69 is a card read instruction with a sign digit of 6. The Cardatron reads data backwards from the end of the card, thus under format band 6, the word in columns 70-80 is read first and stored at the effective memory address for the current card-read instruction. Since the second word read from columns 59-60 has a sign of 6, it is interpreted as an instruction to be executed rather than data to be stored. That sign value terminates the data transfer from the current card (ignoring the remainder of the card) and executes the new card-read instruction to read the next card into the effective address of that instruction.

Thus, when reading the output deck under format band 6, each card bootstraps the next one and specifies the address at which the object word on the next card will be loaded. Programs generally begin with a START pseudo-operator to specify the initial assembly address. Since a START operator does not generate a word of object code, the output card for it has the bootstrapping card-read instruction for the next card in columns 70-80, serving to set the address for the word to be read from the next card. Hence, it does not matter what address is used for the initial card-read instruction to load the deck -- the executable instructions embedded in the deck will relocate to the correct address automatically.

For a more detailed example of how loading code with format band 6 works, see this description of a short card-load program that clears the 205 memory drum.


Programming with EASY


Traditionally, assembly languages specify one computer instruction per line. Most assemblers require at least three fields on each line:
  • A field for a symbolic label. This symbol becomes defined as the memory address at which that line is assembled.
  • A field for an operation code. This holds a mnemonic for the type of instruction or data on that line. Many assemblers also have pseudo-instructions to control operation of the assembler. Lines with pseudo-instructions generally do not generate object code.
  • A field for the operand address or addresses used by the instruction. Usually this references the symbolic label for some other line in the program. A major task of an assembler is to assign addresses to these symbolic labels as they are defined in the first column of lines in the program, and to apply those addresses to the operand fields of other lines where the labels are referenced. Most assemblers also support some sort of "address arithmetic" for the operand addresses, e.g., specifying an address so many locations before or after that for a symbolic label.
  • Any remaining text on the line after the operand(s) is generally ignored, and considered to be a comment.
EASY follows this general idea, but uses more columns to simplify the parsing and processing of instructions. In particular, the specification of operand addresses is spread across four fields, as discussed below. With the exception of the START and END pseudo-instructions, each line of input generates one word of assembled object code. There is no provision for "comment lines" as with most assemblers.

Input cards to EASY must have the following format:
1       10        20        30        40        50        60
.........|.........|.........|.........|.........|.........|
1   P LLLLL SCCCCIIAAAA OOOOO YYYYY PP T XXXXXXXXXXXXXXX
Column 1: must contain a "1" to satisfy the Cardatron's requirement for format band selection.

Column 5: Program-point label: Program points can be thought of as local, reusable labels. Programs typically have numerous references to nearby locations, especially in branch instructions to implement control flow. These locations are often referenced only once or twice, so rather than require a globally-unique symbol for each of these briefly-used references, program points provide short labels the programmer can reuse multiple times throughout a program, but in different local scopes. EASY provides nine program-point symbols, 1-9. Placing one of these in column 5 labels that line with that program point. Other lines can then refer to the nearest program point before or after their location, but only the nearest one -- see the discussion for columns 37-38 below. Once a program-point label is reused, earlier definitions of it fall out of scope.

A number of assembly languages for Burroughs machines supported program-point labels. I first encountered them in the assembler for the B3500, and always wondered where they originated. In researching the Shell assembler, I found this footnote in its reference manual:1
The idea of the program point address is apparently that of Melvin Conway of the Case Institute of Technology. It was incorporated into SOAP III, an assembler for the IBM 650, by Donald E. Knuth, also of Case Institute. See Case Institute Report 1005.
This explains why program points were in EASY, but also note that the Shell assembler predated EASY by at least two years.


Columns 7-11: Symbolic location label: the traditional label field for an assembly line. Labels defined in this field must be unique within the program. Labels are limited to five characters, since the 205 stored five alphanumeric characters per word.

Columns 13-23: Numeric word: this field can be used for several purposes, including specifying literal values, relative addresses, and additional digits in instruction words. The exact effect depends on what is specified in other fields on the line. To understand this, the 11-digit 205 instruction word can be thought of as containing four sub-fields: SCCCCIIAAAA, where S is the sign digit, CCCC are the so-called "control digits" used to specify breakpoints and by some I/O instructions for unit selection and control, II is the two-digit operation code, and AAAA is the operand address. Any values in these sub-fields on the card become the default values in corresponding fields of the assembled word, with blanks being interpreted as zeroes. The operation code (II) and operand address (AAAA) fields can be modified by other fields on the line, as discussed in the following.

Columns 25-29: Symbolic operation code: If this field is non-blank, the value of the symbol in that field, modulo 100, replaces the II digits (columns 18-19) in the assembled instruction word. The symbol table in EASY is initialized with a set of operation mnemonics and their corresponding numeric codes. Knuth chose a set of mnemonics somewhat different from those used by other 205 assemblers, apparently to be consistent with the assembler mnemonics for the Burroughs 220, a later machine with a very similar architecture:

EASY & MEASY Assembler Operation Codes

In addition, there are two pseudo-operators, START, the operand address for which specifies the next location at which instructions will be assembled, and END, which indicates the end of the program. The operand address for END specifies the location at which the loader routine will be placed during program load.

Columns 31-35: Symbolic operand address: If this field is non-blank, it modifies the operand address digits (AAAA, columns 20-23, which are most often blank or zero), in the following ways:
  • If the symbolic label has been previously defined (i.e., its address is already known), that address is added to the value for AAAA, modulo 10,000. This is a way to address a location relative to that of a symbolic label. To address relatively in a negative direction, the value in AAAA must be the 10s-complement of the desired relative offset (e.g., 9999 equates to -1).
  • If the symbolic label has not been previously defined (i.e., it's a forward reference), the value in AAAA is replaced by a link to any previous instructions referencing the same forward label. The reason for this forward-reference linkage is that EASY is a one-pass assembler. It does not resolve forward references at assembly time. Instead, such references are linked backwards through the address fields of all those instructions referencing a given forward address, with a zero address terminating the chain. At the end of the assembly, EASY emits a short loader routine and a table of head addresses for the forward-reference chains. Using this table, the loader works backward through the forward-reference chains and resolves the forward references each time the program is loaded.
Columns 37-38: Program-point address: This field is ignored if the symbolic operand address in columns 31-35 is non-blank. Otherwise, the first column in this field is interpreted as a numeric digit, and the second column indicates how that digit will modify the value of AAAA (which again is most often zero), thus:
  • n+: The current location address plus n words is added to the value in AAAA, modulo 10,000.
  • n-: The current location address minus n words is added to the value in AAAA, modulo 10,000.
  • nB: The location address of the immediately-prior program-point label n is added to the value in AAAA, modulo 10,000.
  • nF: The location address of the immediately-next program-point label n will eventually replace AAAA when it is resolved by the loader, as described above for forward symbolic operand addresses.
Column 40: High-speed loop tag: This column, if non-blank, must be in the range 4-7. When present, this digit eventually replaces the high-order digit in the address that is finally resolved into the AAAA field, causing that address to reference a location in one of the four, 20-word, high-speed loops on the 205 memory drum. A common 205 coding practice was to assemble instructions into locations in main memory and then transfer them using one of the "blocking" instructions to a high-speed loop for execution, where they would run almost ten times faster. Use of the loop tag allows instructions to be coded as if they were going to run from main memory, but the tag modifies the operand address to be relative to the loop where the code will actually run.

Columns 42-56: Remarks: These columns are ignored by the assembler, and can be used for comments.

Columns 59-80: Loadable code: These columns are ignored by the assembler on input cards. On output cards, they hold the assembled object word and card-read instruction for loading the assembled program into memory, as described earlier.

 Any data on the card in columns other than the ones cited above is dropped by the Cardatron and never even reaches the 205 memory.

The following snippet of code from EASY (starting at line 183 in the source) illustrates most of the features described above:
1                  4000 START
1     TABLE             STC   HOLD       SYMBOL TABLE
1                       CAD         4F   SEARCH ROUTINE.
1                     4 SLT
1                       STA   EXIT       STORE EXIT.
1                       CAD   HOLD       GENERATE
1                       MUL   10101      STARTING
1                       STC   TEMP       PLACE, AND
1                     3 SLT              SEARCH FOR
1                       STA   TEMP       EMPTY OR
1                       LDB   TEMP       UNUSED PLACE
1   1       1      1900 CAD
1                       CNZ         2F
1                       CAD   HOLD       NOT IN TABLE
1     EXIT
1     10101  1  1  1001
1   3                   CAD   EXIT       IF SYMBOL IN
1                       ADD   TWO        TABLE, ADD 2
1                       STA   TEMP       TO EXIT LINE
1           1      2900 CAD              AND DISPLAY
1     TEMP                               EQUIVALENT


Reconstructing EASY


Fortunately, a listing of the EASY assembler survives in the papers Knuth donated to the Computer History Museum in 2005. Brief documentation for the program, accompanying a shorter listing (lacking the symbol table constants and Cardatron format bands), is also available at that site.

Since the source code existed only in the form of scanned listings, I had to transcribe it manually. Then, because EASY was written in itself, the next problem was to find a way to bootstrap the assembler source into 205 object code so it could be executed.

Both listings cited above include the assembled object code to the right of the source text. Even better, the shorter listing accompanying the documentation showed the forward references resolved to their final addresses. Having the forward references already resolved allowed me to write an off-line script to extract the object code directly from the transcribed text without having to resolve the forward references myself.

After transcribing the assembler source and carefully proofreading it, I extracted the object code from the right-most columns of the card images and reformatted those words into a standard seven-words-per-card deck that can be loaded using Cardatron format band 6. This deck has the same format as the short card-load program cited earlier. Using this load deck, I then assembled EASY with itself and compared the assembly output against my transcription. After a couple of cycles of correction and reassembly, I had an accurate transcription of EASY that would generate itself.

There were several anomalies in the reconstruction process that bear mentioning, though.
  1. The unconditional branch instruction on line 129 (address 1733) is shown on the listing as assembling to 0000 00 0000. It should be 0000 20 4756. This was probably an error introduced during whatever process originally resolved the forward-reference addresses for the listing.
  2. The START pseudo-instruction on line 304 is shown on the listing as "20 START", which would imply the next word should be assembled at location 0020. The effective address is actually 1478, however. I eventually learned that suppression of leading zeroes in numeric fields is something that was done by the IBM card equipment, not the 205 or Cardatron, and those machines suppressed any character that did not have a numeric 1-9 punch in its card code. There is actually a "0+" (or equivalently, "0&") for the program-point address in columns 37-38 on this line. Since those characters do not have a numeric punch in their card encoding,  they were both zero-suppressed on the listing and printed as blanks. The presence of "0+" in the program-point address field changes the effective address of the START instruction from 0020 to the current address plus 20, or 1458+20=1478.
  3. At line 425 (address 2712), the symbol table word for the BF7 operator (code 37, Block From Loop 7000) was shown on the listing as 0 4246 86 0007. The character codes in that word translate to BF6 instead, so I corrected the word to 0 4246 87 0007. This is very likely a day-one bug in the program, but was probably never encountered as BF7 is rarely used.
  4. At lines 558-561 (addresses 1203-1206) in the initialization code for the program, the listing shows four Block To Loop instructions:
        1220 BT4
        1260 BT5
        1240 BT6
        1280 BT7

    These each transfer 20-word blocks from the indicated main memory addresses to the 4000, 5000, 6000, and 7000 high-speed loops, respectively. Of course, when these are executed, they destroy whatever was previously in those loops. Alas, EASY had assembled very critical code and data into each of those loops (see lines 183-250 in the listing), so the initial attempts to run the assembler bombed. These BTn instructions made absolutely no sense, as nothing had been assembled into the 1220-1299 address range. I resolved the problem simply by changing the BTn instructions to BFn, which transferred words from the loops to unused memory addresses instead of the reverse. Later, I learned that MEASY uses all four high-speed loops for its loader, and any code assembled into loop addresses will not be loaded by that routine. My best guess at this point is that the version of EASY in the listing we have had been modified for assembly by MEASY, and that there is a missing piece that took care of loading the loops.
  5. I modified the load deck to replace the HLT 1221 instruction that terminates the loader with an unconditional branch to the program's entry point (address 1200). This allows EASY to begin execution automatically after it is loaded.
  6. I also modified the load deck to impose "reload lockout" on the reading of the last card of the load deck. This allows the source deck to be assembled to follow immediately after the load deck, by preventing the Cardatron from reading the first card of the source deck before EASY has a chance to load the format bands to the Cardatron.
  7. The listing I transcribed did not have the mnemonics or operation codes for the four magnetic tape instructions. I added these to both the transcribed source and to the load deck.
  8. As will become clear from the discussion of the symbol table below, EASY requires that memory be cleared prior to an assembly. It does not do this itself, so the short memory-clear program cited above has been prepended to the load deck mentioned in the next paragraph.

The transcribed source file, load deck, a sample output deck, and an annotated listing are all available from the software repository folder on our GitHub project site.


How EASY Works


The 205 was strictly a word-oriented machine. It had no native character manipulation facilities, and it being rather slow to begin with, free-form parsing of text was quite expensive. What the 205 did have, though, was the Cardatron, which was quite good at extracting columns from cards and transforming them into numeric words for the 205's memory. EASY took heavy advantage of this capability. The numerous fields of its input card format were designed to allow the Cardatron to do essentially all of the parsing of text on a card, converting each of the fields described in the "Programming with EASY" section above to a full word in memory as part of a single card-read instruction.

EASY reads each input card into locations 6005-6018 under control of Cardatron format band 1. You can examine the codes for that format band at memory locations 1400-1428 in the source.

The 6000-series addresses reference one of the 20-word high-speed loops on the memory drum. All loop addresses are congruent modulo 20, which means that 6005 6015, 6135, 6675, 6815, etc., all address the same word on the drum [corrected 2015-12-08]. The layout of words in this loop is designed to facilitate both analysis of the columns on the card and formatting of the output card.

In the following, keep in mind that for both input and output, the Cardatron processes card columns in descending order, but reads and writes to memory in ascending order, so the fields in memory are in reverse order of those on the card:
  • Word 6000: the assembled word, but with its sign digit set to zero. This word will be punched into columns 71-80 on the output card.
  • Word 6001: the sign (in the low-order digit of the word) to be applied to the assembled word This will be punched into column 70 on the output card.
  • Word 6002: current value of the location counter (address of the instruction being assembled). This will be punched into columns 66-69 of the output card as the address field for the following card-read instruction.
  • Word 6003: skeleton card-read instruction (0 0810 44 xxxx) that will be punched into columns 60-69 on the output card. At load time, this will cause the next card in the output deck to be read; the address (xxxx, from word 6002 above) will specify the location into which the assembled word on that card will be stored.
  • Word 6004: a literal 6 applied by the Cardatron output band to the card-read instruction above and punched into column 59. This word is also used to supply a literal 6 elsewhere in the assembler (e.g., at 1732).
  • Word 6005: third word of comment field (columns 52-56).
  • Word 6006: second word of comment field (columns 47-51).
  • Word 6007: first word of comment field ( columns 42-46).
  • Word 6008: high-speed loop tag from column 40 of the card.
  • Word 6009: second program-point address reference character from column 38 on the card (+, -, F, B).
  • Word 6010: first program-point address reference character from column 37 (0-9).
  • Word 6011: symbolic address field from columns 31-15 on the card.
  • Word 6012: symbolic op code field from columns 25-29 on the card.
  • Word 6013: numeric address field from columns 20-23 on the card.
  • Word 6014: numeric op code field from columns 18-19 on the card.
  • Word 6015: numeric control digits from columns 14-17 on the card.
  • Word 6016: numeric sign from column 13 on the card.
  • Word 6017: symbolic label from columns 7-11 on the card.
  • Word 6018: program-point label from column 5 on the card.
  • Word 6019: literal 1600000999. The low-order four digits are used by the TABLE routine during symbol-table search wraparound to reload the B register. The high-order 16 gets punched into columns 1-2 in the output card to be used as Cardatron input format band select digits when the assembled deck is read back in.
Some fields are read numerically by the Cardatron, viz columns 5, 13, 14-17, 18-19, 20-23, 37, and 40. The rest of the fields are read alphanumerically. Blank columns read numerically are interpreted as zeroes. The internal code for a space character is decimal 00, so it is simple for EASY to determine if one of the symbolic fields is blank by checking its corresponding memory word for zero.

The numeric word from columns 13-23 becomes the default value for the assembled word. Literals can be specified simply by coding their value in these columns and leaving the remaining fields on the card blank. Coding a 1 in column 13 sets the sign digit in the assembled word, which for instructions indicates the operand address should be modified by the value of the B register prior to execution of the instruction.

EASY uses the program-point and symbol label fields on the left side of the card to create symbol table entries and detect when a forward reference to a symbol has been satisfied. If the symbolic operation code, symbolic operand address, program-point address reference, or high-speed loop tag are specified, their respective values are applied to the corresponding sub-fields in the assembled word.

The symbol table occupies memory locations 1900-3899, providing for a total of 1000 symbols. The table is in two parts: the first 1000 words hold the alphanumeric symbols, one symbol per word, left-justified over zeroes (i.e., blanks). The corresponding word 1000 locations higher in memory holds the value of the symbol, typically an address or instruction operation code. The table is initialized with the mnemonic operator codes and their numeric equivalents. The mnemonics have 7 added to their symbol value, probably to distinguish them from user-specified symbols having the same name. In effect, this places the operator mnemonics in a separate name space from the other symbols.

Symbols in the table are located by the TABLE routine, which resides in the 4000 and 5000 high-speed loops, using a hashing technique. The assembly routines that handle symbolic labels, mnemonic operation codes, and symbolic operand addresses call TABLE passing the target symbol in the A register. EASY multiplies the target symbol by 1001001001. This produces a 20-digit result in the A and R registers. EASY then extracts the high-order three digits of the R register as the modulus of the hash, producing an initial probe offset into the symbol words in addresses 1900-2899 of the table.

The hash modulus is loaded into the B register, which acts as an index register. The word at location 1900+B is loaded into the A register. If it is zero, that position in the symbol table is presently unused. This also means the target symbol is not in the table, and is therefore undefined. TABLE exits back to the caller one word after the address where it was called, with the target symbol restored to A and the offset to the unused symbol table entry in B.

If the symbol word in the table is not zero, TABLE subtracts the target symbol from it. If the result of that subtraction is zero, the symbol has been found, so TABLE loads the corresponding value of the symbol from 2900+B into A and returns to the caller three words after the address where it was called.

If the target symbol does not match the current symbol table entry, a hash collision has occurred.  TABLE decrements B to test against the prior symbol in the table. The process in the above two paragraphs then repeats until either the symbol is found or an unused symbol table entry is encountered. If B decrements below zero, its value is reset to 999 to continue searching from the top of the table. Alas, if the symbol table fills up, TABLE will loop endlessly if it searches for an undefined symbol. There is no recovery from this situation.

Since words of all zeroes signify an unused entry in the symbol table, EASY will not work properly if at least addresses 1900-3959 in memory are not cleared to zero initially. Ensuring this is something that must be done prior to loading the program.

Program-point labels are stored in memory locations after the symbol table. Backward program points are stored beginning at 3900, indexed by the program-point number. Forward program points are stored beginning at 3950. Forward references to program points are handled in the same manner as for symbolic labels. Only the value word for a program point is stored, as the symbols are just the digits 1-9 and implicitly index the table entries.

Once all the fields of a card have been processed, the values of the words in the 6000 loop define the content of the output card to be punched. Cardatron output format 1 (at locations 1436-1457) defines the mapping of these words to columns on the output card, which is punched by the CWR instruction at location 7018.

A card containing the END pseudo-operator signals the end of input to the assembly. The operand address on this card specifies the memory location where the loader program will be placed when the output cards are loaded for execution.

The loader occupies 33 words plus one word for each forward-reference chain generated during the assembly. The loader is coded in locations 1761-1790 within EASY. The routine at 1740-1760 copies this code, at six words per card, to the output deck. Following the cards for the loader, this routine emits a table of chain descriptor words, again six words per card. As with the cards generated by the assembly process, a seventh word on each card (in columns 4-14) is a card-read instruction that bootstraps the next card in the deck. On the final card, that card-read instruction is replaced by a branch to the start of the loader.

Thus, when an output deck is loaded under format band 6, the assembled words go to their assigned locations, the loader and chain descriptor words go to locations starting with the one specified on the END card, and the last card of the deck automatically branches to the loader routine. After fixing the forward-reference chains, the loader halts with 1221 in the operand address register, and the program is ready to run. At this point the 205 operator would normally use one of the control panels to enter a branch instruction to the program's entry point.

Each chain descriptor word processed by the loader has its sign digit set to 1. The low-order four digits of this word, if non-zero, are the address of the last word in a forward-reference chain. The next four higher-order digits are the forward address to be applied to each of the words in the chain. As mentioned previously, each word in the chain has the address of the prior word in the chain, except for the last word, which has an address field of zero.

The foregoing is a highly condensed description of how EASY operates, with many significant details missing. I was interested in how this assembler worked, so studied it in detail and prepared an annotated listing of the program. If you are interested in a deeper understanding of EASY, you may find that to be a good place to start.


The MEASY Assembler


At some point during the summer of 1960, Knuth rewrote EASY to take advantage of magnetic tape, calling the new assembler MEASY ("Modified EASY Assembly System for the Burroughs 205"). It is not difficult to understand why he did so. The Algol-58 compiler eventually grew to more than 3600 cards, so each assembly with EASY would punch another 3600 cards, plus whatever was required for the loader. For those who remember punched cards, 3600 is almost two boxes worth.

Running under the retro-205 emulator within Google Chrome, EASY assembles the Algol-58 compiler in 41 minutes and punches 3737 cards. To load the compiler so it can be run, all of those cards need to be read back in again, which including the time to resolve forward addresses, takes another 17-18 minutes. Assembling to magnetic tape is faster and eliminates the waste in card stock on each assembly. Loading the assembled compiler from tape takes only a little over four minutes. When you may only get one time slot on the computer per day, this level of time savings could have a significant effect on productivity.

MEASY is not just a clone of EASY that has been patched to output to tape. It uses the same input card format and symbol table structure, and internally many of the coding techniques are similar, but it appears to be a complete rewrite. Like EASY, it is a one-pass assembler and relies on a loader to resolve forward-address references, but the output format is nothing like EASY's, and the loader works entirely differently than the one for cards. MEASY does not punch cards. Instead, it produces a printed listing on the Cardatron's IBM 407 tabulator.

Magnetic tape on the 205 is formatted in fixed, 20-word blocks. The output tape produced by MEASY has the loader routine in the first three blocks, followed by blocks of assembled code. Within a block of assembled code, each assembled word is represented by a pair of words:
  • The second word is the assembled word to be stored in memory.
  • The first word has two four-digit fields. The low-order four digits will be non-zero if assembly of this word resolved a forward-address reference. The value of this field is the address of the last word in the forward-reference chain, with subsequent words in the chain linked through their address fields, as with EASY. The next higher-order four digits are the address at which the second word should be stored. This is also the address that must be applied to any words in the forward-reference chain. 
After the last block of assembled code, there is one additional block with its first word having a value of 9999999999. This signals the end of the program to the loader.
Running MEASY is a little more complicated than EASY. The assembler can be loaded either from cards (if assembled with EASY) or from tape (if assembled with itself). MEASY requires a scratch tape on magnetic tape unit 4 to receive the assembled output.

Loading from cards works the same as for EASY. As with EASY, I modified the load deck to replace the HLT 1221 at the end of the loader with a CUB (30, Change Unconditional and Block) instruction to the assembler's entry point, 1048. This instruction transfers 20 words from main memory to the 7000 loop, starting with the operand address. It then branches to the mod-20 congruent address in the loop (7008 in this case).

Loading from tape requires three steps. First, the three-block loader must be read from tape into memory at address 1000. This can be done with a magnetic tape read instruction of the form 0 0340 40 1000. Second, the loader must be initiated with a CUB to its entry point address at 1045 (0 0000 30 1045). Once the loader completes, it will HLT 7555. The final step is to initiate the assembler with a CUB 1048. This sequence of instructions could be entered manually from the Control Console keyboard, punched onto a paper tape, or punched onto a single card to be read with format band 6.

Reconstructing MEASY was done in a fashion similar to EASY. I transcribed the entire contents of the scanned listing, both source text and assembled object code. From that transcription, I created a card deck of the source text and assembled that with EASY, comparing the assembled output back to my transcription, applying corrections, and reassembling until the transcription and output matched.

Next, I extracted the MEASY object code from the EASY assembly output using an off-line script to produce a loadable card deck. Once magnetic tape devices were implemented in the emulator, I used that load deck to assemble MEASY with itself, this time checking the assembly output listing against the original scan. Once everything agreed, I had an accurate transcription, and executable versions of MEASY that could be loaded either from cards or tape.

As with EASY, the transcribed MEASY source file, assembly output listing, load deck, output deck from assembly with EASY, and output tape image from assembly with itself are all available from the software repository folder on our GitHub project site.

There are two anomalies in these materials. First, I transcribed MEASY using the same format as I had for EASY. The object code is in a different position on the MEASY listing than it is on the EASY listing. Second, the assembly output listing has been post-processed by an off-line script to apply leading-zero suppression to the numeric fields. Zero suppression was done by the IBM 407, and an equivalent facility has not yet been implemented in the emulator. The raw assembly output is also available in the project repository.


Why This Assembler?


If Burroughs had its STAR 0 assembler, and Shell Research had made its much more sophisticated assembler for the 205 available, why would Knuth have gone to the trouble to write an entirely different one on his own? I don't know the answer, but I can speculate a little.

First, it is likely that Knuth had never even seen a 205 prior to arriving in Pasadena early in the summer of 1960, let alone had any direct experience with one. He says in his oral history cited above that he had "learned about the 205 machine language" while still at Case. It's possible he was not aware of the other two assemblers, or had insufficient information on them to want to risk relying on them for such a short-term project.

Second, Knuth already had some experience with assemblers at Case Institute of Technology, and as cited above, had modified the SOAP III assembler for the IBM 650 to support program-point labels. It's possible he had some ideas about how an assembler could be constructed, and wanted to try those. It's also possible he wrote EASY as a warm-up exercise, to validate his understanding of the 205 and refine coding techniques for it. While the 205 and 650 had superficially similar architectures, programming to deal with the 205's high-speed loops was entirely different than that for addressing and flow control in the 650.

Third, it is clear from his comments in the oral history cited above that he had started working on the Algol-58 compiler before he got to Pasadena, and was coding portions of it at least by the time he was driving out to California. In order to do that, he had to have an assembly notation to code, and much of the notation in his draft coding form for the compiler closely resembles that of EASY.

Regardless of his reasons, his implementation of not one, but two 205 assemblers for the Algol compiler project worked, and no doubt helped him meet the project's ambitious schedule. Not only that, it turned out these relatively simple assemblers were quite fast. As Richard Waychoff recalls in his memoir, comparing assembly of Knuth's Algol-58 compiler with that of the 205 FORTRAN compiler he and Lloyd Turner were developing at the same time:
Our compilers were both punched on cards and were the same size. We had written ours in STAR 0, the only assembler that Burroughs supported on the 205. [...] Our compiler took one hour and 45 minutes to assemble. The first week of don’s project he spent in writing his own assembler. He could assemble his compiler in 45 minutes. We were green with envy. I am sure that don used only half the computer time that Lloyd and I used.
Running in the retro-205 emulator under Google Chrome, EASY can assemble its 574-card self in six minutes and 15 seconds. That is a rate of almost 92 cards/minute, which is pretty impressive, considering the assembler is ultimately limited by the 100 card/minute speed of the IBM 523 punch. The assembly of the Algol-58 compiler with EASY cited above -- producing 3737 cards in 41 minutes -- yields a similar rate.

Knuth comments in his notes for the MEASY assembler that it operates at about 120 cards/minute. Our experience with it in the emulator is nearly the same -- it compiles the Algol-58 compiler in approximately 30 minutes.

For whatever reasons drove Knuth to write his own assemblers, we are fortunate today that he did so, and that the source code for them has survived. The most reasonable alternative for him to use instead was probably the STAR 0 assembler. Nothing of that assembler appears to exist any longer -- not even a reference manual -- so if the Algol-58 compiler had been written in STAR 0, we would have been faced with the prospect of reverse-engineering that assembler from thin air.

Regardless, reconstruction of EASY and MEASY has been an interesting sideline to the overall goal of reconstructing of the Algol-58 compiler.


In Conclusion


As mentioned earlier, completion of the MEASY reconstruction has had to wait on implementation of magnetic tape in the retro-205 emulator. That work is now complete, and the next release of the emulator, scheduled shortly, will have tape support. Transcription of the Algol-58 source code was completed in June, and much progress has been made towards making it run.

The next blog posts will detail the magnetic tape implementation and the status of the compiler's reconstruction.




____________
1 Shell Assembly Program for the Burroughs 205 Electronic Data Processing System, Burroughs Corporation Bulletin 3038, March 1960, Part I, page 2. Originally published as EPR Memorandum Report 37 by the Shell Development Company, Exploration and Production Research Division, Houston, Texas, September 1958.

No comments: