Rather simple, instructional, educational and illustrative web program source codes in Knuth's WEB literate programming language?

Here is a small hello.web that doesn't use many of WEB's features:

@* Introduction.
This program takes an integer $n$ as input, and prints ``Hello world'' $n$ times.

@p program HELLO(input, output);
var
   n: integer;
   i: integer;
begin
   read(n);
   for i := 1 to n do
   begin
      writeln('Hello world');
   end;
end.

@* Index. Not much to it. Everything occurs in section 1.

After weave hello.web followed by tex hello.tex, the resulting typeset output starts like:

[Image: first page of the typeset hello program]


For more details, read the WEB manual, available with texdoc webman or online.


Instead of “Hello World”, how about a small-ish program that was specifically intended to illustrate WEB?

In the November 1981 issue of TUGboat, Knuth mentioned that he was developing a system called WEB. Then in the next (March 1982) issue, he published “Fixed-Point Glue Setting: An Example of WEB”, saying

I will soon be publishing a complete manual about WEB, but in the meantime I think it will be useful to have an example of a fairly short piece of code written in "web" form. Therefore I have prepared the accompanying program, which also serves another function: ...

You can read the program itself with texdoc glue, or here. The glue.web that you need is here.

I was able to test this glue.web just now with another Pascal compiler (fpc), by first running tangle glue.web to generate glue.p, then adding {$mode ISO} to the top of glue.p (so that when the program says “integer”, a 32-bit type is used rather than a 16-bit type), then running fpc glue.p to generate the glue binary. (I tried web2js but it crashed.)

Then to test the binary, put the following input (which consists of 8 test cases, each a pair of lines, followed by 0) in a file (or type it in at the terminal). This input is the same as that used by Knuth.

200000
30000 40000 50000 60000 0
2000
30000 40000 50000 60000 0
1000000000
8000000 -9000000 8000000 4000 7000000 0
100
8000000 -9000000 8000000 4000 7000000 0
1000000000
800 -900 800 400 700 0
1000000000
800 -900 800 400 -700 0
65555 
-200 199 0
1
60000 -59999 90000 0
0

The output should be (matching that published in TUGboat):

Test data set number 1:
  Glue ratio is 1.1111 (0,14,18205)
               30000          33334
               40000          44445
               50000          55557
               60000          66668
 Totals       180000         200004 (versus 200000)
Test data set number 2:
  Glue ratio is 0.0111 (0,21,23302)
               30000            333
               40000            444
               50000            555
               60000            666
 Totals       180000           1998 (versus 2000)
Test data set number 3:
  Glue ratio is 71.4101 (8,0,18281)
             8000000      571281250
            -9000000     -642686836
             8000000      571281250
                4000         274215
             7000000      499857383
 Totals     14004000     1000007262 (versus 1000000000)
Test data set number 4:
  Glue ratio is 0.0000 (8,24,30670)
             8000000             57
            -9000000            -64
             8000000             57
                4000              0
             7000000             49
 Totals     14004000             99 (versus 100)
Test data set number 5:
  Glue ratio is 2x2x2x2x2x2x8681.0000 (-6,1,17362)
                 800      444467200
                -900     -500025600
                 800      444467200
                 400      222233600
                 700      388908800
 Totals         1800     1000051200 (versus 1000000000)
Test data set number 6:
! Excessive glue.
  Glue ratio is 2x2x2x2x2x2x2x0.0000 (-6,0,0)
                 800              0
                -900              0
                 800              0
                 400              0
                -700              0
 Totals          400              0 (versus 1000000000)
Test data set number 7:
Invalid data (nonpositive sum); this set rejected.
Test data set number 8:
  Glue ratio is 0.0000 (1,30,23861)
               60000              0
              -59999              0
               90000              1
 Totals        90001              1 (versus 1)
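
The triple printed after each glue ratio is not explained above, but the numbers in the table are consistent with reading it as (a, b, c): shift the natural width right by a bits (left if a is negative), multiply by c, then shift right by b bits, truncating toward zero. The following Python sketch is only an inference from the printed output, not a transcription of glue.web, and the name scaled_width is made up for illustration.

```python
def scaled_width(x, a, b, c):
    """Apply a fixed-point glue ratio (a, b, c) to the integer width x.

    Assumed reading, inferred from the sample output only:
    result = trunc(trunc(x / 2**a) * c / 2**b), where a < 0 means
    "multiply by 2**(-a)" instead of dividing.
    """
    sign = -1 if x < 0 else 1
    magnitude = abs(x)
    # Work on the magnitude so that rounding is toward zero, as with Pascal's div.
    magnitude = magnitude >> a if a >= 0 else magnitude << -a
    return sign * ((magnitude * c) >> b)


# Rows taken from test data sets 1, 3, 5 and 8 above.
print(scaled_width(30000, 0, 14, 18205))
print(scaled_width(-9000000, 8, 0, 18281))
print(scaled_width(800, -6, 1, 17362))
print(scaled_width(90000, 1, 30, 23861))
```

These four calls reproduce the values 33334, -642686836, 444467200 and 1 from the second column of the output.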

To illustrate WEB a bit better, here is a rather elaborate version of the hello-world program that uses most of the main features of WEB: named and unnamed sections; simple, numeric, and parametric macros; some formatting and indexing controls; and, most exotic of all, the string pool.

All that the program does is print “Hello world” N times after reading a number N (so there's nothing mathematically or algorithmically non-trivial to add difficulty, unlike the GLUE or PRIMES programs), but it is split up into modules, and uses double-quoted strings that (TANGLE writes and) the program reads from the string pool file, maintaining the strings in an array of characters as TeX and other Knuth programs do.

Here is the program (also uploaded here):

% A Hello-world program.
\def\title{Hello}
% Hack to change link colours (if used with pdfwebmac)
\def\BlueGreen{\pdfsetcolor{\cmykRed}}


@* Introduction.
This program takes an integer $n$ as input and prints ``Hello world'' $n$ times.
(There is no Pascal code in this section.)



@ We give part of the program here, and it will continue later.

@p
@<Compiler directives@>
program HELLO(input, output);
var
  @<global variables@>



@ What global variables do we need? For one thing, we need the $n$.

@<global...@>=
  @!n: integer;



@ For |integer| variables to be treated as 32-bit by the Pascal compiler,
on FPC we need a special compiler directive.

@<Compiler di...@> = @{@&$mode iso@}
@^system dependencies@>

@* The string pool and file I/O.
The WEB feature of string pools was designed at a time when Pascal compilers
did not have good support for strings. Now it may be no longer necessary, but to
illustrate the feature we will maintain a string pool.

More specifically, we will maintain a large array of characters, named |str|.
All characters of all strings from the string pool go into this array: the $n$th
string occupies the positions from |str_start[n]| to |str_start[n+1] - 1|
(inclusive) in this array, where |str_start| is an auxiliary array of integers.
Also, the number of strings currently in the string pool is stored in an integer
variable called |str_count|.

By convention, the first $256$ strings are the one-character (one-byte) strings.
For this program we don't need too many additional strings. In fact we need just
a few strings, but we'll support $10$ strings with a total of $1000$ characters.

@d max_strings = 256 + 10
@d max_total_string_length = 1000

@<global var...@>=
  @!str: array[0..max_total_string_length-1] of char;
  @!str_start: array[0..max_strings-1] of integer;
  @!str_count: integer;



@ To use this string pool, we have a procedure that reads out characters from it
one-by-one. Specifically, |print(k)| prints the $k$th string, and |println| and
|printnl| are convenience macros.

@d println(#) == begin print(#); writeln; end
@d printnl(#) == begin writeln; print(#); end
@p
procedure print(n: integer);
var
  i: integer;
begin
  @{ writeln('For ', n, ' will print characters from ', str_start[n], ' to ', str_start[n + 1] - 1); @}
  for i := str_start[n] to str_start[n + 1] - 1 do
  begin
    write(str[i]);
  end;
end;



@ We'll have a procedure to populate this array by reading from the pool file,
but unfortunately that means we need to figure out file input. How this is done
depends on the Pascal compiler. In FPC, a file of characters can be declared as
a variable of type |TextFile|, initialized with |Assign| and |Reset|, then read
with |read|.
@^system dependencies@>

@p
procedure initialize_str_array;
var
  pool_file: TextFile;
  x, y: char;  { for the first two digits on each line}
  @!length: integer;
  i: integer;
begin
  str_count := 0;
  str_start[0] := 0;
  for i := 0 to 255 do
  begin
    str[i] := chr(i);
    str_start[i + 1] := str_start[i] + 1;
    str_count := str_count + 1;
  end;
  Assign(pool_file, 'hello.pool');
  Reset(pool_file);
  while not eof(pool_file) do
  begin
    read(pool_file, x, y);
    if x = '*' then @<check pool checksum@>
    else begin
      length := 10 * (ord(x) - "0") + ord(y) - "0";
      str_start[str_count + 1] := str_start[str_count] + length;
      for i := str_start[str_count] to str_start[str_count + 1] - 1 do
      begin
        read(pool_file, str[i]);
      end;
      readln(pool_file);
      str_count := str_count + 1;
    end
  end;
end;



@ To ensure that the pool file hasn't been modified since tangle was run, we can
use the @@\$ (= |@t\AT!\$@>| = at-sign, dollar-sign) feature. We can reuse
(abuse?) the |y| and |length| variables for reading characters and maintaining
the checksum read from the file.

@<check pool...@> =
begin
  length := ord(y) - "0";
  while not eof(pool_file) do
  begin
    read(pool_file, y);
    if ("0" <= ord(y)) and (ord(y) <= "9") then
      length := length * 10 + (ord(y) - "0");
  end;
  if length <> @$ then
  begin
     writeln('Corrupted pool file: got length: ', length : 1, '; rerun tangle and recompile.');
     Halt(1);
  end
end



@* Main program.
Apart from |n|, we also need an |i| to loop over.

@<glob...@> =
  i: integer;



@ Here finally is the ``main'' block of the program.

@p
begin
  initialize_str_array;
  print("How many times should I say hello? ");
  read(n);
  printnl("OK, here are your "); write(n : 1); println(" hellos: ");
  for i := 1 to n do
  begin
    println("Hello, world!");
  end;
  print("There, said hello "); write(n : 1); println(" times.");
end.



@* Index. If you're reading the woven output, you'll see the index here.

Running tangle (to get hello.p and hello.pool) and then a Pascal compiler shows the program working correctly:

% tangle hello.web                                                     
This is TANGLE, Version 4.5 (TeX Live 2018)
*1*5*9*11
Writing the output file
Done.
6 strings written to string pool file.
(No errors were found.)

% cat hello.pool                                    
35How many times should I say hello? 
18OK, here are your 
09 hellos: 
13Hello, world!
18There, said hello 
07 times.
*332216284

% fpc hello.p                      
Free Pascal Compiler version 3.0.4 [2018/10/02] for x86_64
Copyright (c) 1993-2017 by Florian Klaempfl and others
Target OS: Darwin for x86_64
Compiling hello.p
Assembling (pipe) hello.s
Linking hello
26 lines compiled, 0.1 sec

% echo 5 | ./hello 
How many times should I say hello? 
OK, here are your 5 hellos: 
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
There, said hello 5 times.
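
For readers who would rather not trace the Pascal, the pool-file layout shown by cat hello.pool (a two-digit length, then that many characters, with a final line holding * and the checksum) and the str / str_start arrays can be sketched in Python. This is a rough re-expression of initialize_str_array, not part of the WEB program; load_pool and get_string are hypothetical names, and the pool text is pasted in from the transcript above.

```python
POOL_TEXT = (
    "35How many times should I say hello? \n"
    "18OK, here are your \n"
    "09 hellos: \n"
    "13Hello, world!\n"
    "18There, said hello \n"
    "07 times.\n"
    "*332216284\n"
)


def load_pool(text):
    """Return (chars, str_start, checksum) for the given pool-file text."""
    # Strings 0..255 are the one-character strings, as in the WEB program.
    chars = [chr(i) for i in range(256)]
    str_start = list(range(257))
    checksum = None
    for line in text.split("\n"):
        if not line:
            continue
        if line[0] == "*":  # last line: the checksum that TANGLE wrote
            checksum = int(line[1:])
            break
        length = int(line[:2])  # two-digit length prefix
        chars.extend(line[2:2 + length])
        str_start.append(str_start[-1] + length)
    return chars, str_start, checksum


def get_string(chars, str_start, k):
    """String number k occupies chars[str_start[k] .. str_start[k+1] - 1]."""
    return "".join(chars[str_start[k]:str_start[k + 1]])


chars, str_start, checksum = load_pool(POOL_TEXT)
print(len(str_start) - 1, "strings, checksum", checksum)
print(repr(get_string(chars, str_start, 256)))
```

Here string number 256 is the first multi-character string in the pool, which is the number TANGLE substitutes for "How many times should I say hello? " in hello.p.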

And running weave:

weave hello && sed -i=.bak "s/webmac/pdfwebmac/" hello.tex && pdftex hello.tex

produces a 7-page typeset PDF, which is the form in which the program is usually “supposed” to be read.


Some additional sources of information.

In 1987, Knuth gave a series of lectures on mathematical writing; these also often discussed computer science writing. He devoted two lectures to literate programming, naturally using WEB (CWEB was also developed in 1987, but the lectures may well predate it).

While the whole series consists of twenty-one lectures, they are grouped by subject and are relatively self-contained; you do not need to watch all of them to understand the two relevant here:

Literate programming (1) - https://www.youtube.com/watch?v=U8LttJ1rvWI
Literate programming (2) - https://www.youtube.com/watch?v=ObxmXC2NCMA

He goes through a few WEB programs written by his students and critiques them, all while discussing various features of WEB and literate programming in general.


But that's not all. There was a different series of lectures in 1982 on ``The internal details of TeX82''; the first few of these often touch upon WEB, but the first is the most relevant. I would recommend watching the whole series if you're interested in, well, the internal details of TeX82.
(But do note some anachronisms; while most of the details haven't changed, there is at least one big discrepancy between the TeX82 discussed in the lectures and the TeX82 we know today in the existence of a \chcode primitive. This single primitive was used where either \catcode or \mathcode would be used now (see error #395 in the errorlog). Oh, and the recordings of the computer's display are largely illegible, but if you're familiar with WEB and TeX then you should still be able to follow along.)


@ShreevatsaR pointed out in a comment that the differences between the TeX82 lectured about and the TeX82 we know are not necessarily trivial, and linked to the tex.web version (the banner said version -.25) as of the time of the lectures. I went to go read some of it and realized that, even if you're somewhat familiar with WEB and TeX, it could prove extremely difficult to follow along using the WEB source (especially without a woven, typeset version).

So let's go through some parts of it, comparing it with TeX version 3.14159265 (henceforth "OldTeX" is version -0.25 as presented in the lectures and "TeX" is the modern form). If you want your line numbers to match, copy and paste the code from the website rather than saving the entire page; the latter would require removing all of the HTML entities and markup from the code (and there's a lot of code).

First, of course, there's the first 61 lines:

COMMENT ⓧ   VALID 00057 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00006 00002  % This program is copyright 1982 by D. E. Knuth all rights are reserved.
C00010 00003  @* \[1] Introduction.
C00042 00004  @* \[2] The character set.
C00056 00005  @* \[3] Input and output.
C00079 00006  @* \[4] String handling.
C00095 00007  @* \[5] On-line and off-line printing.
C00111 00008  @* \[6] Reporting errors.
C00133 00009  @* \[7] Arithmetic with scaled dimensions.
C00148 00010  @* \[8] Packed data.
C00158 00011  @* \[9] Dynamic memory allocation.
C00175 00012  @* \[10] Data structures for boxes and their friends.
C00211 00013  @* \[11] Memory layout.
C00224 00014  @* \[12] Displaying boxes.
C00243 00015  @* \[13] Destroying boxes.
C00247 00016  @* \[14] Copying boxes.
C00253 00017  @* \[15] The command codes.
C00267 00018  @* \[16] The semantic nest.
C00280 00019  @* \[17] The table of equivalents.
C00332 00020  @* \[18] The hash table.
C00351 00021  @* \[19] Saving and restoring equivalents.
C00372 00022  @* \[20] Token lists.
C00384 00023  @* \[21] Introduction to the syntactic routines.
C00391 00024  @* \[22] Input stacks and states.
C00423 00025  @* \[23] Maintaining the input stacks.
C00430 00026  @* \[24] Getting the next token.
C00459 00027  @* \[25] Expanding user macros.
C00481 00028  @* \[26] Basic scanning subroutines.
C00531 00029  @* \[27] Building token lists.
C00544 00030  @* \[28] File names.
C00572 00031  @* \[29] Font metric data.
C00628 00032  @* \[30] Device-independent file format.
C00665 00033  @* \[31] Shipping pages out.
C00721 00034  @* \[32] Packaging.
C00749 00035  @* \[33] Data structures for math mode.
C00778 00036  @* \[34] Subroutines for math mode.
C00799 00037  @* \[35] Typesetting math formulas.
C00853 00038  @* \[36] Alignment.
C00902 00039  @* \[37] Breaking paragraphs into lines.
C00963 00040  @* \[38] Breaking paragraphs into lines, continued.
C00992 00041  @* \[39] Pre-hyphenation.
C01004 00042  @* \[40] Post-hyphenation.
C01020 00043  @* \[41] Hyphenation.
C01038 00044  @* \[42] Initializing the hyphenation tables.
C01067 00045  @* \[43] Breaking vertical lists into pages.
C01084 00046  @* \[44] The page builder.
C01131 00047  @* \[45] The chief executive.
C01157 00048  @* \[46] Building boxes and lists.
C01218 00049  @* \[47] Building math lists.
C01267 00050  @* \[48] Conditional processing.
C01281 00051  @* \[49] Mode-independent processing.
C01325 00052  @* \[50] Dumping and undumping the tables.
C01350 00053  @* \[51] The main program.
C01362 00054  @* \[52] Debugging.
C01367 00055  @* \[53] Extensions.
C01389 00056  @* \[54] System-dependent changes.
C01390 00057  @* \[55] Index.
C01391 ENDMK
Cⓧ;

At SAIL, the main system text editor (and this is referenced often in the lectures) was page-oriented; there was also a line editor with ex-like functionality. Pages are marked with the formfeed character, which may display as a square with FF, and can be entered in a terminal using control-L (which usually functions as a clear command if typed at the shell). Interestingly, the GNU C coding style guide recommends dividing source code with formfeed characters, but I've personally never seen this in any code past 1990.

Regardless, this "header" just lists the pages of the file and their first lines. In an editor which supports pagination, one could easily jump to Part 30 by jumping to page 32. It can be ignored. You can safely replace all formfeeds with nothing and not change anything important.

Next we have the limbo section, which I'll divide into parts.

% This program is copyright 1982 by D. E. Knuth; all rights are reserved.
% Please don't make any changes to this file unless you are D. E. Knuth!
% Version 0 is fully implemented but not yet fully tested, so beware of bugs.

% Here is TeX material that gets inserted after \input webhdr
\def\hang{\hangindent 3em\ \unskip\!}
\def\textindent#1{\hangindent 2.5em\noindent\hbox to 2.5em{\hss#1 }\!}
\def\at{@@} % use for an at sign
\chcode@@=13 \def@@{\penalty999\ } % ties words together
\def\TeX{T\hbox{\hskip-.1667em\lower.424ex\hbox{E}\hskip-.125em X}}
\font b=cmr9 \def\mc{\:b} % medium caps for names like PASCAL
\def\PASCAL{{\mc PASCAL}}
\def\ph{{\mc PASCAL-H}}
\font L=manfnt % font used for the METAFONT logo
\def\MF{{\:L META}\-{\:L FONT}}
\def\<#1>{$\langle#1\rangle$}
\def\kern{\penalty100000\hskip}

For comparison (omitting the copyright comments), here is the corresponding code in the canonical tex.web:

% Here is TeX material that gets inserted after \input webmac
\def\hang{\hangindent 3em\noindent\ignorespaces}
\def\hangg#1 {\hang\hbox{#1 }}
\def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces}
\font\ninerm=cmr9
\let\mc=\ninerm % medium caps for names like SAIL
\def\PASCAL{Pascal}
\def\ph{\hbox{Pascal-H}}
\def\pct!{{\char`\%}} % percent sign in ordinary text
\font\logo=logo10 % font used for the METAFONT logo
\def\MF{{\logo META}\-{\logo FONT}}
\def\<#1>{$\langle#1\rangle$}
\def\section{\mathhexbox278}

Well, the first difference is that we're \inputting webhdr and not webmac. This is simple: in TeX78 and seemingly OldTeX, it appears that the .tex files containing the definitions for a format have filenames that end in hdr.tex instead of mac.tex. Examples: manmac.tex was manhdr.tex, and the corresponding file to modern-day taocpmac.tex (an illegal filename at SAIL because it's longer than ten characters) was acphdr.tex. And naturally webmac.tex was webhdr.tex, though for a short time.
(I suspect that this convention is related to a similar one for SAIL source code (see TEXHDR.SAI), and that it was changed to distance TeX from SAIL and make it more independent; but this is just a hypothesis.)

Then come the definitions of \hang, \hangg (absent in OldTeX), and \textindent. These are used for itemizing; presumably finer control was desired later. In OldTeX and TeX78, \! was a primitive that appears to have been synonymous with TeX's \ignorespaces.
I'm not sure exactly what the purpose of \ \unskip\! is. \␣ was a "forced space" as it is in TeX. \unskip is listed in TeX and METAFONT, New Directions in Typesetting as a "recent addition", and deletes the most recent glue. The expression thus seems to mean <space><remove the space><suppress following spaces>, which doesn't make sense to me.

The next definitions in OldTeX set up \at to produce an at sign, then make the at sign an active character (remember that any @@ in a WEB file is unconditionally converted to a single at sign) and define it to produce a non-breaking space or tie; essentially, @ did what ~ does. Note that in plain TeX, ~ uses a penalty of 10000 as opposed to @'s 999. The change of @'s category code is accomplished with \chcode; as I stated above, this is essentially a combination of \catcode and \mathcode. One difference between OldTeX and TeX78 shows up here: in TeX78, the expression would be \chcode@@=12 (category codes started at zero instead of one).

The reason for TeX's \def\pct!... is that webmac.tex defines \% to be a percent sign in the \tt font; webhdr.tex does not. OldTeX defines \TeX itself because that macro is not part of basic, the default format of OldTeX and TeX78 (analogous to TeX's plain).

Note that \PASCAL was probably retained just for compatibility, even though the name eventually came to be set in lowercase.

Font handling (in terms of defining fonts and switching to them) was actually surprisingly complicated in OldTeX and TeX78. TeX78 font names were single characters, not control sequences; the actual internal font number to which they refer is determined by taking the last five bits of the ASCII code corresponding to that character, limiting the number of addressable fonts to 32. Later this was extended to 256. Suffice it to say that \font b=cmr9 and then \:b is the same as \font\b=cmr9 and then \b. I'm not familiar with any further details.
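
Taking that description at face value, the mapping from a one-character font name to an internal font number can be written in a few lines of Python. This is a sketch of the rule as stated here, not code from TeX78, and font_number is a hypothetical name.

```python
def font_number(name):
    """Internal font number for a one-character font name:
    the low five bits of its ASCII code (so only 0..31 are possible)."""
    if len(name) != 1 or ord(name) > 127:
        raise ValueError("font name must be a single ASCII character")
    return ord(name) & 0b11111


for name in "abBL":
    print(name, font_number(name))
```

Under this rule b and B both map to font number 2, which is one way to see where the 32-font limit comes from.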

Now we can move on. The remainder of the limbo parts of both versions shouldn't present too many difficulties.

OldTeX

\def\(#1){} % this is used to make module names sort themselves better
\def\9#1{} % this is used for sort keys in the index via @:sort key}{entry@>

\outer\def\N#1. \[#2]#3.{\par\mark{#1}\vfill\eject % beginning of starred module
  \gdef\position{\:a#2\:ux\:a\topmark} % for part numbers
  \xdef\rhead{\uppercase{\!#3}}
  \sendcontents{\Z{\]#2]#3}{#1}{\count1}}
  \Q\noindent{\bf#1.\quad\!#3.\quad}\!}

\def\title{\TeX82}
\def\contentspagenumber{1}
\def\topofcontents{\hsize 5.5in
  \topspace 0pt plus 1fil minus 1fil
  \def\]##1]{\hbox to 1in{\hfil##1.\ }}
  }
\def\botofcontents{\vskip 0pt plus 1fil minus 1fil\setpage\let\]=\let}
\def\lheader{\hbox to1.5em{\:a\hss\count0}\:m\qquad\rhead\hfill\title\qquad
  \position} % top line on left-hand pages
\def\rheader{\position\:m\qquad\title\hfill\rhead\qquad
  \hbox to1.5em{\:a\hss\count0}} % top line on right-hand pages
\setcount0 \contentspagenumber
\topofcontents
\ctrline{(replace this page by the contents page printed later)}
\botofcontents
\mark{1}\eject

TeX

\def\(#1){} % this is used to make section names sort themselves better
\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>

\outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section
  \def\rhead{PART #2:\uppercase{#3}} % define running headline
  \message{*\modno} % progress report
  \edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next
  \ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces}
\let\?=\relax % we want to be able to \write a \?

\def\title{\TeX82}
\def\topofcontents{\hsize 5.5in
  \vglue 0pt plus 1fil minus 1.5in
  \def\?##1]{\hbox to 1in{\hfil##1.\ }}
  }
\def\botofcontents{\vskip 0pt plus 1fil minus 1.5in}
\pageno=3

The TeX and METAFONT sources patch \N (all WEB versions), to modify the running header and the table of contents (for aesthetic reasons). OldTeX and TeX go in different directions for this.

I must admit I'm not sufficiently well-versed in WEAVE to explain the usage of \() and \9. \9 is internal to WEAVE; it isn't used anywhere in tex.web. \(), on the other hand, is frequently used within module names that share a common prefix with several others (this isn't a comment on WEAVE's internal data structures or how it searches by prefix, but just an observation).
For instance, here are four module names as they appear in order within a single section (namely 453):

@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$...@>=
@<Scan for \(u)units that are internal dimensions;
  |goto attach_sign| with |cur_val| set if found@>;
@<Scan for \(m)\.{mu} units and |goto attach_fraction|@>;
@<Scan for \(a)all other units and adjust |cur_val| and |f| accordingly;
  |goto done| in the case of scaled points@>

The three that share "Scan for" have \(), but "Scan units..." doesn't. Again, the exact use and meaning of this are not clear yet to me; perhaps someone more informed can chime in.

Wow; now we can actually get into some code! For every single difference between the source for OldTeX and the source for TeX, I've made a gist that simply contains the output of running diff oldtex.web tex.web. You'll see that there are, of course, thousands of changes; most of them are typographical corrections/changes, where "typographical" refers to both the text of the code and the typography of the woven TeX output. There's also generally more indexing in TeX. (I think that, unfortunately, the indexing commands often make the code somewhat difficult to read; this may be an issue for people just getting into WEB.)

But here's an extremely important kind of difference, prevalent throughout the entire source: the character set. These days, something incompatible with ASCII is quite rare, but at the time of SAIL's installation it was all too prevalent. This is why all WEB programs that do any input and output and are expected to be portable would convert the input into an internal, ASCII-like format. SAIL's character set is an extension of ASCII, adding many mathematically useful characters. Here's an excerpt of the source of OldTeX, after removing WEB commands and changing the indentation to enhance clarity:

@<Accumulate the constant...@>=
loop begin 
  if (cur_tok<zero_token+radix)∧(cur_tok≥zero_token)∧(cur_tok≤zero_token+9) then 
    d←cur_tok-zero_token
  else if (radix=16)∧(cur_tok≤A_token+5)∧(cur_tok≥A_token) then
    d←cur_tok-A_token+10
  else
    goto done;
  vacuous←false;
  if (cur_val≥m)∧((cur_val>m)∨(d>7)∨(radix≠10)) then
    begin if OK_so_far then
      begin print_nl("! Number too big");
        help2("I can only go up to 2147483647='17777777777=""7FFFFFFF,")
             ("so I'm using that number instead of yours.");
        error; cur_val←infinity; OK_so_far←false;
      end;
    end
  else cur_val←cur_val*radix+d;
  get_nc_token;
  end;
done:

The characters ∧, ≥, ≤, ←, ≠, and ∨ are all outside standard ASCII. To have TeX run on as many systems as possible, and to make the porting process as painless as possible, these evidently couldn't be used; and the modern tex.web is indeed pure ASCII. The TeX78 language also made heavy use of SAIL's character set; for example, a SAIL-specific character was used where & is now when aligning. This was also done away with. Reading the source of TeX and the TeXbook, one gets the idea that Knuth was somewhat resentful at having to limit everything to the "inferior" standard ASCII.

To change the WEB code from SAIL's character set to standard ASCII, a SAIL program was written by David R. Fuchs, which automatically performs the conversion. This program was called UNDEK. That name is amusing already, but even more so when you consider that the initial prototype of the TANGLE component of the WEB system was named UNDOC!
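
To give an idea of the kind of substitution involved, here is a small Python sketch that rewrites the six SAIL characters seen in the excerpt above into ordinary ASCII Pascal. It is not UNDEK (which was a SAIL program I have not seen); the table covers only these six characters, and to_ascii_pascal is a made-up name.

```python
# Only the SAIL characters that occur in the excerpt; the real character set had more.
SAIL_TO_ASCII = str.maketrans({
    "←": ":=",   # assignment
    "∧": "and",
    "∨": "or",
    "≥": ">=",
    "≤": "<=",
    "≠": "<>",
})


def to_ascii_pascal(line):
    """Replace SAIL operator characters with their ASCII Pascal spellings."""
    return line.translate(SAIL_TO_ASCII)


print(to_ascii_pascal("d←cur_tok-zero_token"))
print(to_ascii_pascal("if (cur_val≥m)∧((cur_val>m)∨(d>7)∨(radix≠10)) then"))
```

No spaces are added around and and or; standard Pascal accepts (a)and(b) as written when the operands are parenthesized.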

At any rate, the final form of the @<Accumulate the constant...@> module is thus:

@<Accumulate the constant...@>=
loop begin
  if (cur_tok<zero_token+radix)and(cur_tok>=zero_token)and
     (cur_tok<=zero_token+9) then
    d:=cur_tok-zero_token
  else
    if radix=16 then
      if (cur_tok<=A_token+5)and(cur_tok>=A_token) then
        d:=cur_tok-A_token+10
      else
        if (cur_tok<=other_A_token+5)and(cur_tok>=other_A_token) then
          d:=cur_tok-other_A_token+10
        else
          goto done
    else
      goto done;
  vacuous:=false;
  if (cur_val>=m)and((cur_val>m)or(d>7)or(radix<>10)) then
    begin if OK_so_far then
      begin print_err("Number too big");
      help2("I can only go up to 2147483647='17777777777=""7FFFFFFF,")
           ("so I'm using that number instead of yours.");
      error; cur_val:=infinity; OK_so_far:=false;
      end;
    end
  else cur_val:=cur_val*radix+d;
  get_x_token;
  end;
done:

(I really agree with Kernighan's points about semicolons in Section 5 of his article critiquing Pascal; I honestly don't know if the quadruply-nested if statement is correctly indented.)

The biggest issue (definitely more difficult to deal with than the character set) is the lack of a printed version of the woven source of OldTeX. Throughout the lectures, reference is frequently made to exact module numbers. Here is an approximation to a solution: simply search (after removing formfeeds) for an at sign at the beginning of a line, followed by either a space or an asterisk; this can be represented in a common notation for pattern matching as ^@( |\*). Then by going to the nth result you effectively go to the nth module. It's not a very satisfactory solution, but oh well.
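
That search can be scripted. The Python sketch below removes the formfeeds, applies the pattern ^@( |\*) line by line, and reports the line on which the nth match (and so, approximately, the nth module) starts. The function names and the tiny SAMPLE text are invented for illustration; to use it on the real thing you would read the contents of the OldTeX tex.web file instead.

```python
import re

# An at sign at the start of a line, followed by a space or an asterisk.
MODULE_START = re.compile(r"^@( |\*)", re.MULTILINE)


def module_line_numbers(web_source):
    """Return the 1-based line number at which each module starts."""
    text = web_source.replace("\f", "")  # drop the page marks first
    return [text.count("\n", 0, m.start()) + 1
            for m in MODULE_START.finditer(text)]


def line_of_module(web_source, n):
    """Line number of the nth module (n counted from 1)."""
    return module_line_numbers(web_source)[n - 1]


SAMPLE = (
    "% limbo material\n"
    "\f@* \\[1] Introduction.\n"
    "Some text.\n"
    "@p\n"
    "begin end.\n"
    "\f@ A second module.\n"
    "@<Glob...@>=\n"
    "x: integer;\n"
)
print(module_line_numbers(SAMPLE))
```

Lines beginning with @p or @< are deliberately not counted, since they continue a module rather than start one.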

(Okay that was more of an information dump than a walkthrough, but now it's documented.)
