How to make an e-TeX WebAssembly with Jim Fowler's WEB/TeX pascal to WASM compiler web2js?

Your increase of the pool size leads to additional memory requirements. So you do not need any other changes to eTeX; you just have to increase the provided memory. In your JavaScript versions, the amount of memory is set in the "compiler". For your settings you would need 32906 pages of memory, but there is an implementation limit at 32767 pages. Luckily, you can avoid this problem by using smaller values.
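To put these page counts in perspective: a WebAssembly memory page is 64 KiB (65536 bytes), so 32767 pages is just under 2 GiB and 32906 pages is slightly more than that. The following is a small sketch of the arithmetic; the helper names are made up for illustration, and the limit is simply the figure quoted above, not a constant read from web2js.

```javascript
// A WebAssembly memory page is 64 KiB.
const PAGE_BYTES = 64 * 1024;
// Implementation limit quoted above (an assumption taken from the text).
const PAGE_LIMIT = 32767;

// Convert a number of pages into bytes.
function pagesToBytes(pages) {
  return pages * PAGE_BYTES;
}

// True if a request for `pages` pages stays within the limit.
function fitsLimit(pages) {
  return pages <= PAGE_LIMIT;
}

for (const pages of [32906, PAGE_LIMIT]) {
  const mib = (pagesToBytes(pages) / (1024 * 1024)).toFixed(1);
  console.log(`${pages} pages = ${mib} MiB, fits: ${fitsLimit(pages)}`);
}
```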

So we need to change some of the constants from etex.web. This doesn't mean that your etex.ch is "wrong" and that you need a "right" one. Actually, the license of etex.ch would forbid such modifications (at least without changing the name). Instead, you should write a system-dependent etex.sys file which you can pass to tangle later.

So first get copies of tex.web and etex.ch, then run

tie -m etex.web tex.web etex.ch

to get etex.web. Now you need a change file with your new constants; for example, save the following as etex.sys:

eTeX compatible constants for web2js

@x
@<Constants...@>=
@!mem_max=30000; {greatest index in \TeX's internal |mem| array;
  must be strictly less than |max_halfword|;
  must be equal to |mem_top| in \.{INITEX}, otherwise |>=mem_top|}
@!mem_min=0; {smallest index in \TeX's internal |mem| array;
  must be |min_halfword| or more;
  must be equal to |mem_bot| in \.{INITEX}, otherwise |<=mem_bot|}
@!buf_size=500; {maximum number of characters simultaneously present in
  current lines of open files and in control sequences between
  \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|}
@!error_line=72; {width of context lines on terminal error messages}
@!half_error_line=42; {width of first lines of contexts in terminal
  error messages; should be between 30 and |error_line-15|}
@!max_print_line=79; {width of longest text lines output; should be at least 60}
@!stack_size=200; {maximum number of simultaneous input sources}
@!max_in_open=6; {maximum number of input files and error insertions that
  can be going on simultaneously}
@!font_max=75; {maximum internal font number; must not exceed |max_quarterword|
  and must be at most |font_base+256|}
@!font_mem_size=20000; {number of words of |font_info| for all fonts}
@!param_size=60; {maximum number of simultaneous macro parameters}
@!nest_size=40; {maximum number of semantic levels simultaneously active}
@!max_strings=3000; {maximum number of strings; must not exceed |max_halfword|}
@!string_vacancies=8000; {the minimum number of characters that should be
  available for the user's control sequences and font names,
  after \TeX's own error messages are stored}
@!pool_size=32000; {maximum number of characters in strings, including all
  error messages and help texts, and the names of all fonts and
  control sequences; must exceed |string_vacancies| by the total
  length of \TeX's own strings, which is currently about 23000}
@!save_size=600; {space for saving values outside of current group; must be
  at most |max_halfword|}
@!trie_size=8000; {space for hyphenation patterns; should be larger for
  \.{INITEX} than it is in production versions of \TeX}
@!trie_op_size=500; {space for ``opcodes'' in the hyphenation patterns}
@!dvi_buf_size=800; {size of the output buffer; must be a multiple of 8}
@!file_name_size=40; {file names shouldn't be longer than this}
@!pool_name='TeXformats:TEX.POOL                     ';
  {string of length |file_name_size|; tells where the string pool appears}
@.TeXformats@>

@ Like the preceding parameters, the following quantities can be changed
at compile time to extend or reduce \TeX's capacity. But if they are changed,
it is necessary to rerun the initialization program \.{INITEX}
@.INITEX@>
to generate new tables for the production \TeX\ program.
One can't simply make helter-skelter changes to the following constants,
since certain rather complex initialization
numbers are computed from them. They are defined here using
\.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to
emphasize this distinction.

@d mem_bot=0 {smallest index in the |mem| array dumped by \.{INITEX};
  must not be less than |mem_min|}
@d mem_top==30000 {largest index in the |mem| array dumped by \.{INITEX};
  must be substantially larger than |mem_bot|
  and not greater than |mem_max|}
@y
@<Constants...@>=
@!mem_max=200000; {greatest index in \TeX's internal |mem| array;
  must be strictly less than |max_halfword|;
  must be equal to |mem_top| in \.{INITEX}, otherwise |>=mem_top|}
@!mem_min=0; {smallest index in \TeX's internal |mem| array;
  must be |min_halfword| or more;
  must be equal to |mem_bot| in \.{INITEX}, otherwise |<=mem_bot|}
@!buf_size=5000; {maximum number of characters simultaneously present in
  current lines of open files and in control sequences between
  \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|}
@!error_line=72; {width of context lines on terminal error messages}
@!half_error_line=42; {width of first lines of contexts in terminal
  error messages; should be between 30 and |error_line-15|}
@!max_print_line=79; {width of longest text lines output; should be at least 60}
@!stack_size=1000; {maximum number of simultaneous input sources}
@!max_in_open=6; {maximum number of input files and error insertions that
  can be going on simultaneously}
@!font_max=75; {maximum internal font number; must not exceed |max_quarterword|
  and must be at most |font_base+256|}
@!font_mem_size=20000; {number of words of |font_info| for all fonts}
@!param_size=60; {maximum number of simultaneous macro parameters}
@!nest_size=40; {maximum number of semantic levels simultaneously active}
@!max_strings=60000; {maximum number of strings; must not exceed |max_halfword|}
@!string_vacancies=300000; {the minimum number of characters that should be
  available for the user's control sequences and font names,
  after \TeX's own error messages are stored}
@!pool_size=350000; {maximum number of characters in strings, including all
  error messages and help texts, and the names of all fonts and
  control sequences; must exceed |string_vacancies| by the total
  length of \TeX's own strings, which is currently about 23000}
@!save_size=600; {space for saving values outside of current group; must be
  at most |max_halfword|}
@!trie_size=8000; {space for hyphenation patterns; should be larger for
  \.{INITEX} than it is in production versions of \TeX}
@!trie_op_size=500; {space for ``opcodes'' in the hyphenation patterns}
@!dvi_buf_size=800; {size of the output buffer; must be a multiple of 8}
@!file_name_size=40; {file names shouldn't be longer than this}
@!pool_name='TeXformats:TEX.POOL                     ';
  {string of length |file_name_size|; tells where the string pool appears}
@.TeXformats@>

@ Like the preceding parameters, the following quantities can be changed
at compile time to extend or reduce \TeX's capacity. But if they are changed,
it is necessary to rerun the initialization program \.{INITEX}
@.INITEX@>
to generate new tables for the production \TeX\ program.
One can't simply make helter-skelter changes to the following constants,
since certain rather complex initialization
numbers are computed from them. They are defined here using
\.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to
emphasize this distinction.

@d mem_bot=0 {smallest index in the |mem| array dumped by \.{INITEX};
  must not be less than |mem_min|}
@d mem_top==200000 {largest index in the |mem| array dumped by \.{INITEX};
  must be substantially larger than |mem_bot|
  and not greater than |mem_max|}
@z

@x
@d min_quarterword=0 {smallest allowable value in a |quarterword|}
@d max_quarterword=255 {largest allowable value in a |quarterword|}
@d min_halfword==0 {smallest allowable value in a |halfword|}
@d max_halfword==65535 {largest allowable value in a |halfword|}
@y
@d min_quarterword=0 {smallest allowable value in a |quarterword|}
@d max_quarterword=255 {largest allowable value in a |quarterword|}
@d min_halfword==0 {smallest allowable value in a |halfword|}
@d max_halfword==16777215 {largest allowable value in a |halfword|}
@z
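
Before running tangle, it can be worth checking the new values against the constraints stated in the comments of the change file. The following is a minimal sketch of such a check. The function checkConstants and the object layout are made up for illustration and are not part of web2js or tangle, and 23000 is only the approximate length of TeX's own strings quoted in the comment above (e-TeX's string pool may be somewhat larger).

```javascript
// Hypothetical sanity check for TeX compile-time constants.
// The rules are taken from the comments quoted in the change file above.
function checkConstants(c) {
  const problems = [];
  const check = (ok, message) => { if (!ok) problems.push(message); };

  check(c.mem_max < c.max_halfword, 'mem_max must be strictly less than max_halfword');
  check(c.mem_min >= c.min_halfword, 'mem_min must be min_halfword or more');
  check(c.mem_bot >= c.mem_min, 'mem_bot must not be less than mem_min');
  check(c.mem_top <= c.mem_max, 'mem_top must not be greater than mem_max');
  check(c.buf_size <= c.max_halfword, 'buf_size must not exceed max_halfword');
  check(c.max_strings <= c.max_halfword, 'max_strings must not exceed max_halfword');
  check(c.save_size <= c.max_halfword, 'save_size must be at most max_halfword');
  check(c.font_max <= c.max_quarterword, 'font_max must not exceed max_quarterword');
  check(c.dvi_buf_size % 8 === 0, 'dvi_buf_size must be a multiple of 8');
  check(c.half_error_line >= 30 && c.half_error_line <= c.error_line - 15,
    'half_error_line should be between 30 and error_line-15');
  check(c.max_print_line >= 60, 'max_print_line should be at least 60');
  // Approximate: the real requirement depends on the actual pool file.
  check(c.pool_size - c.string_vacancies >= 23000,
    'pool_size must exceed string_vacancies by about 23000');
  return problems;
}

// The values from the etex.sys shown above.
const etexSys = {
  mem_max: 200000, mem_min: 0, mem_bot: 0, mem_top: 200000,
  buf_size: 5000, error_line: 72, half_error_line: 42, max_print_line: 79,
  font_max: 75, max_strings: 60000, string_vacancies: 300000, pool_size: 350000,
  save_size: 600, dvi_buf_size: 800,
  min_quarterword: 0, max_quarterword: 255, min_halfword: 0, max_halfword: 16777215,
};

console.log(checkConstants(etexSys));
// With the original max_halfword the first rule is violated,
// which is why the second change block is needed.
console.log(checkConstants({ ...etexSys, max_halfword: 65535 }));
```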

Now you can run tangle:

tangle -underline etex.web etex.sys

You get the files etex.p and etex.pool.

Of course web2js will still look for tex.pool, but you can just change

filename = "tex.pool";

into

filename = "etex.pool";

in both header.js and library.js.

Now let's try

node compile.js etex.p

Similar to your original experiment, we get

[...]

Need 41 of memory

Now 41 is significantly less than 32906; in particular, it is below 32767. So we can just allocate more memory. This needs to be done consistently in four files: in index.js, initex.js, tex.js and pascal/program.js, change

var pages = 20;

into

var pages = 50;

(Probably 41 would be enough, but 50 looks nicer.)

Now we can try

node compile.js etex.p

again. This time it actually works! You could now use node initex.js to get a plain TeX format, but we actually want eTeX. So get yourself copies of etex.src, etexdefs.lib and language.def, and change

library.setInput("\nplain \\dump\n\n"

in initex.js into

library.setInput("\n*etex \\dump\n\n"

Here, the asterisk * is important: it enables the "extended mode". Also change &plain into &etex in the same file to preload etex.

Then

node initex.js

generates an e-TeX format etex.fmt and a memory dump, which can be used with

node tex.js

I managed to get a LaTeX format working with web2js, though with some caveats.

Here's a working (for me) sequence of steps.

  1. Get web2js: either download the zip file and unzip, or run

    git clone https://github.com/kisonecat/web2js.git
    
  2. Get tex.web: download using your browser, or run:

    wget http://mirrors.ctan.org/systems/knuth/dist/tex/tex.web
    
  3. Get etex.ch: download using your browser, or run:

    wget -O etex.ch 'https://tug.org/svn/texlive/trunk/Build/source/texk/web2c/etexdir/etex.ch?revision=32727&view=co'
    
  4. Tie them together:

    tie -m mytex.web tex.web etex.ch
    
  5. Make the following modifications to the resulting file (or you can use the “proper” way involving etex.sys etc., as in the answer by Marcel Krüger):

    @!mem_max=30000; {greatest index in \TeX's  |   @!mem_max=400000; {greatest index in \TeX'
    @!stack_size=200; {maximum number of simul  |   @!stack_size=1000; {maximum number of simu
    @!max_in_open=6; {maximum number of input   |   @!max_in_open=15; {maximum number of input
    @!max_strings=3000; {maximum number of str  |   @!max_strings=60000; {maximum number of st
    @!string_vacancies=8000; {the minimum numb  |   @!string_vacancies=300000; {the minimum nu
    @!pool_size=32000; {maximum number of char  |   @!pool_size=350000; {maximum number of cha
    @!trie_size=8000; {space for hyphenation p  |   @!trie_size=600000; {space for hyphenation
    @!trie_op_size=500; {space for ``opcodes''  |   @!trie_op_size=10000; {space for ``opcodes
    @d mem_top==30000 {largest index in the |m  |   @d mem_top==400000 {largest index in the |
    @d hash_size=2100 {maximum number of contr  |   @d hash_size=15000 {maximum number of cont
    @d hyph_size=307 {another prime; the numbe  |   @d hyph_size=2003 {another prime; the numb
    for i:=0 to @'37 do xchr[i]:=' ';           |   for i:=0 to @'37 do xchr[i]:=chr(i);
    for i:=@'177 to @'377 do xchr[i]:=' ';      |   for i:=@'177 to @'377 do xchr[i]:=chr(i);
    @d max_quarterword=255 {largest allowable   |   @d max_quarterword=65535 {largest allowabl
    @d max_halfword==65535 {largest allowable   |   @d max_halfword==16777215 {largest allowab
    

    These were determined mostly empirically, by bumping up the ones I got errors about. The change in the xchr assignments is as per the discussion at another question.

  6. Correspondingly, edit the four files index.js, initex.js, pascal/program.js and tex.js to change var pages = 20; to var pages=290;. (Actually, while playing with this I created a file commonMemory.js containing

    module.exports = { commonPages: function() { return 290; } };
    

    and used var pages = require('./commonMemory').commonPages(); or ... But that was just convenient while determining this number 290, and you don't have to do that.)

  7. Edit library.js: inside function reset, change this block:

        files.push({
          filename: filename,
          position: 0,
          descriptor: fs.openSync(filename,'r'),
        });
    

    to

        let basename = filename.slice(filename.lastIndexOf('/') + 1);
        const {spawnSync} = require('child_process');
        let realFilename = spawnSync('kpsewhich', [filename]).stdout.toString().trim();
        if (realFilename == '') {
            // try again with basename
            realFilename = spawnSync('kpsewhich', [basename]).stdout.toString().trim();
            if (realFilename == '') {
                // Give up, just create empty file
                spawnSync('touch', [basename]);
                realFilename = basename;
                console.log(`For filename #${filename}# created empty #${basename}#`);
            } else {
                console.log(`Found filename #${filename}# via basename at #${realFilename}#`);
            }
        } else {
            console.log(`Found filename #${filename}# at #${realFilename}#`);
        }
    
        files.push({
          filename: filename,
          position: 0,
          descriptor: fs.openSync(realFilename,'r'),
        });
    

    The idea is that, since creating a LaTeX format file loads zillions of files, some of which aren't even distributed with TeX Live, we make the file lookup hook into kpsewhich to find all those files, and just use an empty file if one is not found. For what it's worth, these were the files that were not found and for which empty files were used: babel-latex.cfg, il2enc.dfu, omlenc.dfu, omxenc.dfu, uenc.dfu.

  8. Edit initex.js to dump the LaTeX format instead of plain (and again when doing the core dump):

    -library.setInput("\nplain \\dump\n\n",
    +library.setInput("\n*latex.ltx \\dump\n\n",
    

    and

    -library.setInput("\n&plain\n\n",
    +library.setInput("\n&latex\n\n",
    
  9. Replace the contents of sample.tex with a LaTeX sample. For example, you can use (from here):

    \documentclass{article}
    \title{Cartesian closed categories and the price of eggs}
    \author{Jane Doe}
    \date{September 1994}
    \begin{document}
       \maketitle
       Hello world!
    \end{document}
    
  10. Get web2js dependencies and build its Pascal parser:

    npm install
    npm run-script build
    
  11. Build everything: from WEB (via TANGLE) to Pascal (via web2js) to WASM to loading and dumping format file and memory dump and then running TeX:

    tangle -underline mytex.web && \
    mv -f mytex.pool tex.pool && \
    node compile.js mytex.p && \
    node initex.js && \
    node tex.js
    

Note that sample.dvi has been created successfully and looks ok. So we have a working LaTeX format. You can try editing sample.tex and re-running node tex.js to typeset various LaTeX documents (to DVI).
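
If you want a quick automated check in addition to looking at the output in a viewer, you can test whether sample.dvi has the outer shape of a DVI file: it starts with the pre opcode (byte 247) followed by the format identifier 2, and it ends with padding bytes of value 223. The sketch below checks only that; the function name looksLikeDvi is made up, and passing the check does not prove the page contents are right.

```javascript
const fs = require('fs');

// Rough plausibility check for a DVI file:
// first byte is `pre` (247), second byte is the DVI format id (2),
// and the file ends with the 223 padding bytes that follow the postamble.
function looksLikeDvi(bytes) {
  return bytes.length >= 8 &&
    bytes[0] === 247 &&
    bytes[1] === 2 &&
    bytes[bytes.length - 1] === 223;
}

// Usage: run in the directory where node tex.js wrote sample.dvi.
if (fs.existsSync('sample.dvi')) {
  console.log('sample.dvi looks like DVI:', looksLikeDvi(fs.readFileSync('sample.dvi')));
}
```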

Caveats:

  • Because of those missing files that were substituted, it's possible that hyphenation patterns for non-English languages, or those particular font encodings, may not work correctly. But I could not find these files even in the TeX Live sources so I'm not sure what they're supposed to contain, or whether they're expected to be empty anyway.

  • The first revision of this answer has a way to build a LaTeX format without increasing max_quarterword / max_halfword, or increasing the number of memory pages granted on the JS side. That came at the cost of not loading most languages' hyphenation patterns, and also is not sufficient for loading heavy-weight packages like TikZ. The current revision does not have those issues.
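
To see which of the substituted files from step 7 are missing on your own installation, you can ask kpsewhich directly. The following is a sketch under the assumption that kpsewhich is on your PATH; the helper names are made up, and the lookup function is passed in as a parameter so that the logic can be tried without a TeX installation.

```javascript
const { spawnSync } = require('child_process');

// Ask kpsewhich for a file. Returns '' if the file is not found
// or if kpsewhich itself is not available.
function kpsewhich(name) {
  const result = spawnSync('kpsewhich', [name]);
  return result.stdout ? result.stdout.toString().trim() : '';
}

// Return the names for which `lookup` finds nothing.
function findMissing(names, lookup = kpsewhich) {
  return names.filter((name) => lookup(name) === '');
}

// The files for which empty placeholders were created in step 7.
const substituted = ['babel-latex.cfg', 'il2enc.dfu', 'omlenc.dfu', 'omxenc.dfu', 'uenc.dfu'];
console.log(findMissing(substituted));
```

Run this from a directory that does not contain the empty placeholder files; otherwise kpsewhich may find them in the current directory and report them as present.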