Why are spaces needed in the end of string array?

It was a "feature" of standard Pascal that there was no universal "string" type; every length was a separate type. That is, "string of length 40" is a different type from "string of length 50", and a function can be declared to accept arguments of one type or the other, not both. (In fact Pascal had no string type at all, only arrays, and the same problem applies to them: arrays of different lengths are different types.)

This used to be one of the most annoying things about programming with Pascal, and it was the main thing that Brian Kernighan (the "K" in "K&R" of the C programming language) complained about in his “Why Pascal is Not My Favorite Programming Language” (written around the same time that Knuth was writing TeX in Pascal).

This inconvenience is the main reason that TeX does its own string handling (see Part 4, §38 onwards, in the program). Mostly, when it needs strings, it just uses an offset into a giant str_pool array of a pre-declared size: string n means whatever characters are in positions str_start[n] to str_start[n+1] - 1 of the str_pool array.
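To make the str_pool idea concrete, here is a minimal C sketch. It assumes small, arbitrary pool sizes and does no overflow checking. The names str_pool, str_start, pool_ptr, str_ptr, append_char, make_string and length echo TeX's, but these are simplified stand-ins rather than TeX's actual code. The helper pool_string is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/* Simplified TeX-style string pool; sizes are arbitrary for this sketch. */
#define POOL_SIZE 1000
#define MAX_STRINGS 100

static char str_pool[POOL_SIZE];          /* all string characters, back to back */
static size_t str_start[MAX_STRINGS + 1]; /* str_start[n]: where string n begins */
static size_t pool_ptr = 0;               /* first unused position in str_pool   */
static int str_ptr = 0;                   /* number of the next string to create */

/* Append one character to the string currently being built (no bounds check). */
static void append_char(char c) {
    str_pool[pool_ptr++] = c;
}

/* Close the string being built and return its number. */
static int make_string(void) {
    str_ptr++;
    str_start[str_ptr] = pool_ptr;
    return str_ptr - 1;
}

/* Length of string n: positions str_start[n] .. str_start[n+1] - 1. */
static size_t length(int n) {
    return str_start[n + 1] - str_start[n];
}

/* Hypothetical helper for this sketch: copy a C string into the pool. */
static int pool_string(const char *s) {
    while (*s) append_char(*s++);
    return make_string();
}
```

Storing "foo.tex" and then "TEX.POOL" gives strings 0 and 1. String 1 starts at pool position 7, directly after the seven characters of string 0. No terminators or padding are involved, because lengths come from consecutive str_start entries.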

But occasionally it needs to use strings that are actually arrays in Pascal, to pass to system calls or assign to variables for example. If you look at your string constant

pool_name='TeXformats:TEX.POOL                     ';

the type of pool_name is not "string", but something like "an array of 40 chars". Later when pool_name is used (§51), in

name_of_file := pool_name;

the variable that it assigns to is of type (see §26)

name_of_file : packed array [1 .. file_name_size] of char;

i.e. it's also an array of 40 characters, which is why the assignment is possible. This name_of_file is then passed to the system routines reset and rewrite, as seen in the next section §27 (and as explained in Marcel Krüger's answer).

Elsewhere in the program you'll see routines that convert TeX strings (i.e. consecutive positions of the str_pool array, as mentioned above) into Pascal strings, specifically into the Pascal string name_of_file. The pack_file_name of §519 that you mention is one of them.

The idea is that if a filename has been stored as a TeX string (e.g. when the user writes \input foo.tex, the str_pool array would contain the 7 characters foo.tex), then TeX would set name_length appropriately (here, 7) and then call this pack_file_name procedure. That procedure assigns the Pascal string name_of_file to be foo.tex followed by spaces, so that reset(..., name_of_file, ...) can act on that string. If the unused positions were not cleared by filling them with spaces, they would still hold leftover characters from an earlier, longer filename. Then reset (as implemented by the Pascal-H runtime) would get an incorrect filename and try to open it.
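The leftover-characters problem can be shown with a small C sketch. It models name_of_file as a 40-character array with no terminator. It also assumes, following the remark from TeX: The Program quoted below, that the runtime simply skips every blank in the whole array. Both pack_name and runtime_view are hypothetical helpers written for this illustration, not TeX or Pascal-H routines.

```c
#include <string.h>

#define FILE_NAME_SIZE 40   /* TeX's default file_name_size */

/* Models a Pascal "packed array [1..40] of char": fixed size, no terminator. */
static char name_of_file[FILE_NAME_SIZE];
static int name_length;

/* Copy len characters of s into name_of_file. If pad is nonzero, fill the
   rest with blanks; otherwise leave the old contents of the array in place. */
static void pack_name(const char *s, int len, int pad) {
    if (len > FILE_NAME_SIZE) len = FILE_NAME_SIZE;
    memcpy(name_of_file, s, (size_t)len);
    name_length = len;
    if (pad)
        memset(name_of_file + len, ' ', (size_t)(FILE_NAME_SIZE - len));
}

/* Assumed runtime behaviour: read the whole array, skip every blank,
   and produce the C string the runtime would then try to open. */
static void runtime_view(char out[FILE_NAME_SIZE + 1]) {
    int k = 0;
    for (int i = 0; i < FILE_NAME_SIZE; i++)
        if (name_of_file[i] != ' ') out[k++] = name_of_file[i];
    out[k] = '\0';
}
```

First pack TeXformats:TEX.POOL, then pack foo.tex without padding. The runtime then sees foo.texats:TEX.POOL, because the tail of the longer name survives. With padding it sees just foo.tex.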

[Note: Actually the format of file names was different and very inconsistent then; see Part 28: File names, §511 onwards. File names at Stanford had a "name", "extension", and "area" (which might be something like [1, DEK]), so pack_file_name needs to combine these different TeX strings into a single Pascal string. But we can ignore that complication; the example of foo.tex is enough for this explanation.]
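For the three-part case, here is a C sketch modeled loosely on the shape of pack_file_name(n, a, e). It concatenates area, name and extension into the fixed array and silently drops anything past file_name_size. It then records the clipped length and blank-pads the rest. Plain C strings stand in for TeX's pool strings, so this is an approximation of the idea rather than a transcription of §519.

```c
#include <string.h>

#define FILE_NAME_SIZE 40

static char name_of_file[FILE_NAME_SIZE];
static int name_length;

/* Sketch: area + name + ext into a fixed, blank-padded array.
   Characters beyond FILE_NAME_SIZE are counted but not stored. */
static void pack_file_name(const char *name, const char *area, const char *ext) {
    const char *parts[3] = { area, name, ext };
    int k = 0;                               /* characters seen so far */
    for (int p = 0; p < 3; p++)
        for (const char *c = parts[p]; *c; c++) {
            k++;
            if (k <= FILE_NAME_SIZE) name_of_file[k - 1] = *c;
        }
    name_length = (k <= FILE_NAME_SIZE) ? k : FILE_NAME_SIZE;
    /* Blank out whatever is left after the packed name. */
    memset(name_of_file + name_length, ' ', (size_t)(FILE_NAME_SIZE - name_length));
}
```

A name longer than 40 characters is truncated rather than rejected. That is one consequence of the fixed-size design discussed below.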

In TeX: The Program there is a remark about this (directly after name_of_file is defined):

The Pascal-H compiler with which the present version of TeX was prepared has extended the rules of Pascal in a very convenient way. To open file f, we can write
reset(f, name, '/O')      for input;
rewrite(f, name, '/O')    for output.
The 'name' parameter, which is of type packed array [<any>] of char, stands for the name of the external file that is being opened for input or output. Blank spaces that might appear in name are ignored.

Of course, this only seems to explain why the spaces are allowed, but why are they necessary? The <any> in packed array [<any>] of char has to be filled with a constant defined at compile time, so the filename string has a length fixed at compile time. But TeX does not want to restrict itself to filenames of exactly one length. So by default it defines a sufficiently large constant (40), which becomes the maximum length of a filename. A filename is then written into the first bytes of the array. The remaining bytes should be ignored, so they are filled with blanks.

Of course you might ask why Pascal-H requires such a fixed length. I can only guess, but most likely this is to simplify porting to different systems. A variable-sized string is relatively complicated to pass around: it often has to be stored on the heap, different programming languages have different conventions for storing the length, and the operating system might also enforce its own conventions when passing the filename. A fixed-size buffer, on the other hand, is just a block of memory. It can be passed around freely and easily allocated on the stack. System-specific code written in any language can then parse it and transform it into the right format.
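To illustrate that guess (it is a guess, not documented Pascal-H behaviour), here is a C sketch. A 40-character name wrapped in a struct can be copied by value and kept on the stack with no heap allocation or length header. A hypothetical system-specific routine, to_c_name, then converts it into the NUL-terminated form a C-based system expects. Here it drops only trailing blanks, which is one possible convention. The helper make_fixed is likewise hypothetical.

```c
#include <string.h>

#define FILE_NAME_SIZE 40

/* A fixed-size name is just a block of memory: it can live on the stack
   and be copied by value, with no heap allocation or stored length. */
struct fixed_name { char c[FILE_NAME_SIZE]; };

/* Hypothetical helper: build a blank-padded fixed name (truncating if too long). */
static struct fixed_name make_fixed(const char *s) {
    struct fixed_name n;
    size_t len = strlen(s);
    if (len > FILE_NAME_SIZE) len = FILE_NAME_SIZE;
    memset(n.c, ' ', FILE_NAME_SIZE);
    memcpy(n.c, s, len);
    return n;
}

/* Hypothetical system-specific routine: strip trailing blanks and
   append the '\0' terminator a C-based operating system expects. */
static void to_c_name(struct fixed_name n, char out[FILE_NAME_SIZE + 1]) {
    int len = FILE_NAME_SIZE;
    while (len > 0 && n.c[len - 1] == ' ') len--;
    memcpy(out, n.c, (size_t)len);
    out[len] = '\0';
}
```

Because the struct is copied by value, changing one copy after assignment does not affect the other. Each copy can be handed to different code without sharing any storage.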

Also, as Phelype mentioned, this often isn't very relevant for actual usage of TeX, because it is in the system-specific part of the TeX sources. In Web2C, for example, it is replaced by an implementation based on dynamically allocated, C-style null-terminated strings of length up to maxint. No spaces are written at the end of the names; instead, the code writes the \0 terminator there.
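The Web2C-style idea can be sketched in C. The buffer is allocated with exactly enough room for the packed name plus one byte, and that byte holds '\0' instead of trailing blanks. The function pack_c_name is hypothetical and only illustrates the approach; it is not Web2C's actual code.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: pack area + name + ext into a freshly allocated C string with no
   fixed maximum length. Returns NULL if allocation fails; the caller frees. */
static char *pack_c_name(const char *name, const char *area, const char *ext,
                         size_t *name_length) {
    size_t la = strlen(area), ln = strlen(name), le = strlen(ext);
    char *buf = malloc(la + ln + le + 1);
    if (buf == NULL) return NULL;
    memcpy(buf, area, la);
    memcpy(buf + la, name, ln);
    memcpy(buf + la + ln, ext, le);
    buf[la + ln + le] = '\0';   /* terminator instead of trailing blanks */
    *name_length = la + ln + le;
    return buf;
}
```

A 100-character name plus ".tex" yields a 104-character string. The fixed-array versions above would have cut it off at 40 characters.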
