/usr/bin/ptx: Can you provide a use case or two?

Apparently, it was used to index the Unix Reference manual in the olden days.

In the References below, the Wikipedia article explains what a permuted index is (also called KWIC, or "Keyword in context") and ends with this cryptic remark:

Books composed of many short sections with their own descriptive headings, most notably collections of manual pages, often ended with a permuted index section, allowing the reader to easily find a section by any word from its heading. This practice is no longer common.

Further searching turned up the remaining articles in the References, which explain in more detail how the Unix man pages used a permuted index. It seems the main problem they were dealing with was that the man pages had no continuous page numbering.
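The idea behind such an index is easy to sketch. The Python below is a minimal, hypothetical illustration, not how ptx or the old manual tooling actually worked: the function names `permuted_index` and `render` and the sample headings (made up, loosely modeled on man page NAME lines) are assumptions for this example. Every significant word of a heading becomes a sort key, and each heading is listed once per keyword, with the keywords lined up in one column and the page reference after them.

```python
def permuted_index(entries, stop_words=frozenset()):
    """Return one (left, right, ref) row per significant word of each title.

    `right` starts with the keyword, so sorting by keyword lists every
    title under each of its words, which is what a permuted index does.
    """
    rows = []
    for title, ref in entries:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stop_words:
                continue  # skip words not worth indexing, e.g. "and"
            left = " ".join(words[:i])
            right = " ".join(words[i:])  # keyword plus the rest of the title
            rows.append((word.lower(), left, right, ref))
    rows.sort(key=lambda row: (row[0], row[3]))  # alphabetical by keyword
    return [(left, right, ref) for _, left, right, ref in rows]


def render(rows):
    """Right-align the left context so all keywords start in one column."""
    lw = max((len(left) for left, _, _ in rows), default=0)
    rw = max((len(right) for _, right, _ in rows), default=0)
    return [f"{left:>{lw}}  {right:<{rw}}  {ref}" for left, right, ref in rows]


if __name__ == "__main__":
    # Hypothetical sample headings with man-page-style references.
    pages = [
        ("copy files and directories", "cp(1)"),
        ("list directory contents", "ls(1)"),
        ("move files", "mv(1)"),
    ]
    for line in render(permuted_index(pages, stop_words={"and"})):
        print(line)
```

Because each row carries a reference such as `ls(1)` rather than a page number, an index like this works even when the pages are not numbered continuously. The `stop_words` set plays the role of an ignore list; GNU ptx accepts one through its `-i` option.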

From what I gather, the practice of using a permuted index is now arcane and obsolete.

References

Key Word in Context
Reading a Permuted Index
Definition: permuted index
Unearthed Arcana: Reading a Permuted Index

@Joseph R.'s accepted answer with the history is good, but let's look at how it might be used.

ptx generates a permuted term index ("ptx") from text. An example is easiest to understand:

$ cat input
a
b
c

$ ptx -A -w 25 < input
:1:            a b c
:2:        a   b c
:3:      a b   c

         ^^^^  ^ ^^^^-words to the keyword's right
         |     +-the keyword itself
         +-words to the keyword's left

Each output line shows one word from the input (the keyword), with all the keywords aligned in a single column and the words surrounding each one shown to its left and right. The first keyword is "a". It occurs on line one and is followed by "b" and "c" to its right. The second keyword is "b", which occurs on line two with "a" to its left and "c" to its right. Finally, "c" occurs on line three and is preceded by "a" and "b".

Using this, you can find the line number of any word in a text, along with the words around it. This sounds a lot like grep, eh? The difference is that ptx understands the structure of text in logical units of words and sentences, which makes its contextual output more relevant than grep's when dealing with English text.

Let's compare ptx and grep, using the first paragraph of James Ellroy's American Tabloid:

$ cat text
America was never innocent. We popped our cherry on the boat over and looked back with no regrets. You can’t ascribe our fall from grace to any single event or set of circumstances. You can’t lose what you lacked at conception.

Here's grep (with the color-highlighted matches manually replaced by surrounding slashes, /like this/):

$ grep -ni you text
1:America was never innocent. We popped our cherry on the boat over and looked back with no regrets. /You/ can’t ascribe our fall from grace to any single event or set of circumstances. /You/ can’t lose what /you/ lacked at conception.

Here's ptx:

$ ptx -Afo <(echo you) text
text:1:        /back with no regrets.   You can’t ascribe our fall/
text:1:     /or set of circumstances.   You can’t lose what you/
text:1:      /. You can’t lose what   you lacked at conception.

Because grep is line-oriented and this paragraph is all one line, grep can only print the whole paragraph back, so its output is far less concise and helpful than ptx's, which shows just the few words around each occurrence.
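To see the mechanics, here is a rough Python approximation of what the ptx runs above do. It is a sketch under stated assumptions, not ptx's actual algorithm: it treats any run of non-whitespace as a word, uses a fixed number of words on each side (the hypothetical `context` parameter) instead of ptx's character width `-w`, and ignores sentence ends entirely. As in the a/b/c example, context may run across line breaks while each hit is still tagged with the line it came from; the optional `keywords` list plays a role similar to ptx's `-o` word list.

```python
import re

WORD = re.compile(r"\S+")  # assumption: a "word" is any run of non-whitespace
PUNCT = ".,;:!?\"'()"      # punctuation ignored when comparing to keywords


def kwic(lines, keywords=None, context=4):
    """Return (line_number, left, keyword, right) for each indexed word.

    Words are gathered from all lines first, so the left/right context can
    run across line breaks; each word still remembers its own line number.
    If `keywords` is given, only those words (case-insensitive) are indexed.
    """
    words = []  # list of (line_number, word)
    for number, line in enumerate(lines, start=1):
        words.extend((number, w) for w in WORD.findall(line))

    wanted = None if keywords is None else {k.lower() for k in keywords}
    rows = []
    for i, (number, word) in enumerate(words):
        if wanted is not None and word.strip(PUNCT).lower() not in wanted:
            continue
        left = " ".join(w for _, w in words[max(0, i - context):i])
        right = " ".join(w for _, w in words[i + 1:i + 1 + context])
        rows.append((number, left, word, right))
    return rows


if __name__ == "__main__":
    width = 10
    for number, left, word, right in kwic(["a", "b", "c"]):
        # Right-align the left context so the keywords form one column.
        print(f":{number}: {left:>{width}}   {word} {right}")
```

Run on the Ellroy paragraph with `keywords=["you"]` and `context=4`, this yields roughly the same left and right words as the ptx output above, which is exactly the context grep's whole-line match cannot isolate.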

You might find this collection of examples interesting:

Pattern Matching and Permuted Term Indexing with Command Line Tools in Linux
