How to print the number of characters in each line of a text file

I've tried the other answers listed above, but they are far from decent solutions for large files, especially once a single line occupies more than ~1/4 of available RAM.

Both bash and awk slurp the entire line, even though for this problem it's not needed. Bash will error out once a line is too long, even if you have enough memory.

I've implemented an extremely simple, fairly unoptimized Python script that, when tested with large files (~4 GB per line), doesn't slurp and handles them far better than the solutions given.

If this is time-critical production code, you can rewrite the ideas in C or optimize the read call (instead of reading only a single byte at a time), after testing that this is indeed the bottleneck; a chunked-read sketch follows the script below.

The code assumes the line terminator is a linefeed character, which is a good assumption on Unix, but YMMV on classic Mac OS/Windows. Make sure the file ends with a linefeed, or the last line's character count will be skipped.

from sys import stdin

counter = 0
while True:
    byte = stdin.buffer.read(1)  # one byte at a time: a whole line is never held in memory
    if not byte:                 # empty read means EOF
        break
    counter += 1
    if byte == b'\n':            # end of line; counter includes the linefeed itself
        print(counter - 1)       # so report one less
        counter = 0
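
If profiling shows the one-byte reads are indeed the bottleneck, here is a minimal sketch of the chunked-read idea mentioned above. The 64 KiB block size is an arbitrary choice, and it keeps the same trailing-linefeed assumption as the script above:

from sys import stdin

CHUNK = 64 * 1024  # arbitrary block size; tune after profiling

counter = 0  # bytes of the current line seen so far
while True:
    chunk = stdin.buffer.read(CHUNK)
    if not chunk:  # EOF
        break
    start = 0
    while True:
        nl = chunk.find(b'\n', start)
        if nl == -1:
            # no more linefeeds here; carry the remainder over
            counter += len(chunk) - start
            break
        # line length = bytes carried over from earlier chunks
        # plus the bytes before this linefeed
        print(counter + nl - start)
        counter = 0
        start = nl + 1

Both versions read from standard input, e.g. python3 count.py < file (count.py is just a placeholder name).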

Here is an example using xargs. Note that echo appends a newline (so wc -c over-counts by one), and substituting % directly into the sh -c string breaks on lines containing shell metacharacters; passing the line as a positional argument avoids both problems (GNU xargs is assumed for -d):

$ xargs -d '\n' -I{} sh -c 'printf %s "$1" | wc -c' sh {} < file

while IFS= read -r line; do echo ${#line}; done < abc.txt

It is POSIX, so it should work everywhere.

Edit: Added -r as suggested by William.

Edit: Beware of Unicode handling. Bash and zsh, with a correctly set locale, will show the number of codepoints, but dash will show bytes, so you have to check what your shell does. And then there are many other possible definitions of length in Unicode anyway, so it depends on what you actually want (see the short illustration after these notes).

Edit: Prefix with IFS= to avoid losing leading and trailing spaces.
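
To make the codepoints-versus-bytes distinction above concrete, here is a small Python illustration (the sample string is arbitrary):

s = 'naïve'                    # 5 codepoints
print(len(s))                  # 5: what bash/zsh count with a UTF-8 locale
print(len(s.encode('utf-8')))  # 6: what dash counts, since ï takes 2 bytes in UTF-8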


Use Awk.

awk '{ print length }' abc.txt

Tags:

Unix

Shell

Awk

Sed