What conditions must be met for a file to be a text file as defined by POSIX?

  1. Must a text file be a regular file? In the above excerpt it does not explicitly say the file must be a regular file

    No; the excerpt even specifically notes standard input as a potential text file. Other standard utilities, such as make, specifically use the character special file /dev/null as a text file.

  2. Can a file be considered a text file if it contains one character and one character only (i.e., a single character that isn't terminated with a newline)?

    That character must be a <newline>, or this isn't a line, and so the file it's in isn't a text file. A file containing exactly byte 0A is a single-line text file. An empty line is a valid line.

  3. In the above excerpt, it makes reference to "lines". I found four definitions with line in their name: "Empty Line", "Display Line", "Incomplete Line" and "Line". Am I supposed to infer that they mean "Line" because of their omission of "Empty", "Display" and "Incomplete"

    It's not really an inference, it's just what it says. The word "line" has been given a contextually-appropriate definition and so that's what it's talking about.

  4. Can I safely infer that if a file is empty, it is not a text file because it does not contain one or more characters?

    An empty file consists of zero (or more) lines and is thus a text file.

  5. Does the "zero" in "zero or more lines" mean that a file can still be considered a text file if it contains one or more characters that are not terminated with newline?

    No, these characters are not organised into lines.

  6. Does "zero or more lines" mean that once a single "Line" (0 or more characters plus a terminating newline) comes into play, that it becomes illegal for the last line to be an "Incomplete Line" (one or more non-newline characters at the end of a file)?

    It's not illegal, it's just not a text file. A utility requiring a text file to be given to it may behave adversely if given that file instead.

  7. Does "none [no line] can exceed {LINE_MAX} bytes in length, including the newline character" mean that there is a limitation to the number of characters allowed in any given "Line" in a text file?

    Yes.

This definition only sets bounds on what a text-based utility (for example, grep) is guaranteed to accept, nothing more. Utilities are free to accept input more liberally, and in practice they often do. Conversely, a utility is permitted to use a fixed-size buffer to process a line, to assume a newline appears before that buffer is full, and so on. You may be reading too much into things.
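
As a rough illustration, the conditions discussed above (zero or more complete lines, no NUL bytes, no line longer than {LINE_MAX} bytes) can be checked with a small shell function. This is only a sketch under some assumptions: the name is_text_file is made up for this example, it only handles a readable regular file (not a stream such as standard input), and it relies on the awk implementation coping with whatever line lengths it is given.

```shell
# Hypothetical helper: exit status 0 if file $1 looks like a POSIX text file.
# $2 is the line-length limit in bytes (assumption: defaults to `getconf LINE_MAX`).
is_text_file() {
    file=$1
    limit=${2:-$(getconf LINE_MAX)}

    # This sketch only handles files it can read.
    [ -r "$file" ] || return 2

    # Zero bytes means zero lines, which the definition allows.
    [ -s "$file" ] || return 0

    # No NUL bytes: deleting them must not change the byte count.
    total=$(( $(wc -c < "$file") ))
    without_nul=$(( $(tr -d '\000' < "$file" | wc -c) ))
    [ "$without_nul" -eq "$total" ] || return 1

    # The last byte must be a <newline>; otherwise the file ends in an incomplete line.
    [ "$(tail -c 1 "$file" | od -An -tx1 | tr -d ' \n')" = 0a ] || return 1

    # No line may exceed the limit, counted in bytes and including the <newline>.
    LC_ALL=C awk -v limit="$limit" \
        'length($0) + 1 > limit { bad = 1; exit } END { exit bad + 0 }' "$file"
}
```

For example, `is_text_file notes.txt && echo "text file"` prints the message only when all three checks pass (notes.txt being a placeholder name). A failing check says nothing about whether a particular utility will cope with the file anyway.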


As defined by POSIX, a text file is (basically):

A file that contains characters organized into zero or more lines.

It is useful to also include these definitions:

3.92 Character String

A contiguous sequence of characters terminated by and including the first null byte.

3.195 Incomplete Line

A sequence of one or more non-<newline> characters at the end of the file.

3.206 Line

A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

3.243 Newline Character (<newline>)

A character that in the output stream indicates that printing should start at the beginning of the next line. It is the character designated by '\n' in the C language. It is unspecified whether this character is the exact sequence transmitted to an output device by the system to accomplish the movement to the next line.

3.247 NUL

A character with all bits set to zero.

Note that a "Text File" shall not contain NUL bytes.


So:

  1. Must a text file be a regular file?
    No, it does not need to be. A "text file" is defined in terms of what it contains when read. If a file contains "zero or more lines" it is a text file. Some files, like /dev/stdin, may yield a text file on one read and not on the next.
  2. Can a file be considered a text file if it contains one character and one character only … ?
    No, that's an incomplete line (3.195).
    A text file shall contain only complete lines, never an "Incomplete Line".
  3. Am I supposed to infer that they mean "Line" … ?
    Yes, you should.
  4. Can I safely infer that if a file is empty, it is not a text file … ?
    No, an empty file (zero characters) is a valid "text file".
    From above: …zero or more lines…. Zero lines (zero characters) is a valid "Text file".
  5. … considered a text file if it contains one or more characters that are not terminated with newline?
    No, an "Incomplete Line" is not (technically) a valid "Line".
  6. Does the "zero" in "zero or more lines" mean that a file can still be considered a text file if it contains one or more characters that are not terminated with newline?
    No, an incomplete line is not a "Line". A text file shall not have incomplete lines.

  7. … there a limitation to the number of characters allowed in any given "Line" in a text file … ?
    Yes, no more than {LINE_MAX} bytes (as opposed to characters) shall be allowed in any given line of a valid "text file".
    The value of {LINE_MAX} is given in the file <limits.h>
    (also read Sensible line buffer size in C?):

    {LINE_MAX}
    Unless otherwise noted, the maximum length, in bytes, of a utility's input line (either standard input or another file), when the utility is described as processing text files. The length includes room for the trailing <newline>.
    Minimum Acceptable Value: {_POSIX2_LINE_MAX}

    For a GNU-based system there is no set limit (except memory):

    Macro: int LINE_MAX
    The largest text line that the text-oriented POSIX.2 utilities can support. (If you are using the GNU versions of these utilities, then there is no actual limit except that imposed by the available virtual memory, but there is no way that the library can tell you this.)

    It seems to be defined in posix2_lim.h to be 2048 (at least for 64-bit GNU/Linux systems):

    $ grep -ri 'POSIX2_LINE_MAX' /usr/include/ 
    
    /usr/include/x86_64-linux-gnu/bits/xopen_lim.h:#define NL_LANGMAX       _POSIX2_LINE_MAX
    /usr/include/x86_64-linux-gnu/bits/posix2_lim.h:#define _POSIX2_LINE_MAX                2048
    /usr/include/x86_64-linux-gnu/bits/posix2_lim.h:#define LINE_MAX                _POSIX2_LINE_MAX
    

    It can also be found using the POSIX utility getconf:

    $ getconf LINE_MAX
    2048
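
Since the limit in item 7 is counted in bytes rather than characters, a quick way to compare a file against it is to measure the longest line in the C locale. The following is a minimal sketch, not a definitive tool: the function names longest_line_bytes and fits_line_max are invented for this example, and it assumes the file already consists of complete lines (awk also counts a trailing incomplete line as a record).

```shell
# Hypothetical helper: print the byte length of the longest line in file $1,
# counting the terminating <newline>.
longest_line_bytes() {
    # LC_ALL=C makes awk's length() count bytes rather than multibyte characters.
    LC_ALL=C awk '{ n = length($0) + 1; if (n > max) max = n } END { print max + 0 }' "$1"
}

# Hypothetical helper: succeed if no line in file $1 exceeds the limit $2,
# which defaults to the value reported by `getconf LINE_MAX`.
fits_line_max() {
    [ "$(longest_line_bytes "$1")" -le "${2:-$(getconf LINE_MAX)}" ]
}
```

A line holding two 2-byte UTF-8 characters is reported as 5 bytes here (4 bytes plus the newline), even though it holds only two characters. Running `fits_line_max somefile` with no second argument compares against the system's own {LINE_MAX} (somefile being a placeholder name).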
    

Related: Why should text files end with a newline?
