How to count all characters including spaces?

This is probably the best you can get:

pdftotext document.pdf -enc UTF-8 - | wc -m

For DVI files one can use

catdvi -e UTF-8 -s document.dvi | wc -m

(Thanks to Bob for pointing out the -enc option and catdvi.)
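
One caveat with both pipelines: wc -m counts every character it is given. That includes the newline at the end of each extracted line and the form feed that pdftotext writes after each page. It also decides what a character is from the current locale; in a C/POSIX locale it may count bytes, so accented letters would count as two or more. The sketch below uses a hypothetical helper, count_chars, which is not part of any tool. It reports the raw count alongside a count with newlines and form feeds removed. The figure for the running text usually lies somewhere between the two, depending on how many line breaks stand for an inter-word space.

```bash
# Report two character counts for text piped in on stdin:
#   all:                 everything wc -m sees (spaces, newlines, and the
#                        form feed pdftotext writes after each page)
#   without line breaks: the same text with newlines and form feeds removed
# wc -m follows the current locale; use a UTF-8 locale for non-ASCII text.
count_chars() {
  local tmp all body
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  # $(( ... )) strips the padding some wc implementations print
  all=$(( $(wc -m < "$tmp") ))
  body=$(( $(tr -d '\n\f' < "$tmp" | wc -m) ))
  rm -f "$tmp"
  printf 'all: %s\nwithout line breaks: %s\n' "$all" "$body"
}

# Usage (needs pdftotext or catdvi installed):
#   pdftotext -enc UTF-8 document.pdf - | count_chars
#   catdvi -e UTF-8 -s document.dvi | count_chars
```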


How does detex file.tex | wc -m work for you? detex removes all the TeX macros, and wc -m returns the number of characters remaining. Given that there's no maths, this should be a good enough proxy for the number of characters in the output file.

This obviously won't count things like running headers or other automatically generated text. For that, I guess you'd need to parse the .dvi as Bruno Le Floch suggested in the comments.
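
detex keeps the line structure and indentation of the .tex source. As a result, wc -m also counts source newlines, blank lines between paragraphs, and indentation, none of which appear in the typeset text. TeX itself treats a single end-of-line as a space and collapses consecutive spaces into one. Squeezing every run of whitespace to a single space before counting should therefore get closer to what is printed. A minimal sketch, with tex_text_chars as a hypothetical helper name:

```bash
# Count characters after turning every run of whitespace (spaces, tabs,
# newlines, blank lines) into a single space, roughly mimicking how TeX
# treats spaces and line ends in the source. A leading or trailing space
# may remain, so the result can be one or two characters high.
tex_text_chars() {
  # tr maps all whitespace to ' ' and -s squeezes the repeats;
  # $(( ... )) strips any padding wc prints around the number.
  echo $(( $(tr -s '[:space:]' ' ' | wc -m) ))
}

# Usage (needs detex installed):
#   detex file.tex | tex_text_chars
```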


A completely different approach would be to use the stdpage package. It creates 'standard pages' of 30 lines with 60 characters each (you can change these values, of course). The approach dates from the time when people used typewriters to write the manuscripts they handed in to their publishers. Some publishers still ask for standard pages today and pay per standard page.
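
A standard page holds a fixed number of characters (30 × 60 = 1800 by default). A character count from one of the pipelines above can therefore be turned into a rough estimate of standard pages. The sketch below uses a hypothetical helper, std_pages, and rounds any partial page up to a whole one. It is only an estimate. stdpage typesets the text and counts real lines, so the short last line of each paragraph takes up a full line, and the typeset figure usually comes out higher.

```bash
# Rough standard-page estimate from a character count.
# Arguments: chars [lines per page, default 30] [chars per line, default 60]
# Uses integer ceiling division, so any partial page counts as a whole page.
std_pages() {
  local chars=$1 lines=${2:-30} per_line=${3:-60}
  local per_page=$(( lines * per_line ))
  echo $(( (chars + per_page - 1) / per_page ))
}

# Usage:
#   std_pages 5400          # 30 x 60 characters per page
#   std_pages 5400 30 50    # e.g. with 50 characters per line
```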

The stdpage package lets you switch between ragged and justified lines, and you can turn hyphenation and line numbers on or off. In the best case, using it is as simple as adding

\usepackage[linenumbers,lines=30,chars=50,noindent]{stdpage}

to your preamble. Because this package changes the line spacing and fonts, you will have to adapt the rest of your preamble (I had to remove a couple of packages). I personally hand in two PDFs: one with standard pages, and a second one with the same text set in a nicer font, with hyphenation, microtype and so on.
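
With stdpage loaded, every page of the output is meant to be one standard page. The page count of that PDF is therefore essentially the standard-page figure, title pages or other front matter aside. Assuming poppler's (or xpdf's) pdfinfo is installed, one way to read the count off is to pull the Pages: line out of its report. The helper pdf_pages and the file name below are hypothetical.

```bash
# Print the number after "Pages:" in pdfinfo's report, which contains a
# line such as "Pages:          12".
pdf_pages() {
  awk '/^Pages:/ { print $2 }'
}

# Usage (hypothetical file name):
#   pdfinfo manuscript-stdpage.pdf | pdf_pages
```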
