Get line number from byte offset

In your example,

001
002
003
004

byte number 8 is the second newline, not the 0 on the next line.
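The two ways of counting can be checked with a short Python sketch. It assumes the example file is exactly the four lines above, each ending in a newline. It compares byte number 8 counted from 1 (as this answer counts) with offset 8 counted from 0 (as Python indexing and seek() count).

```python
# Assumed contents of the example file: four lines, each ending in a newline.
data = b"001\n002\n003\n004\n"

# Counting from 1, byte number 8 is data[7]: the second newline.
print(repr(data[7:8]))  # b'\n'
# Counting from 0, offset 8 is data[8]: the first character of line 3.
print(repr(data[8:9]))  # b'0'
```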

The following gives you the number of complete lines in the first $b bytes:

$ dd if=data.in bs=1 count="$b" | wc -l

It will report 2 with b set to 8 and it will report 1 with b set to 7.
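For comparison, the pipeline boils down to counting newline bytes among the first $b bytes of the file. The sketch below is an illustrative Python equivalent, not part of the original answer. The helper name newlines_in_prefix is hypothetical, and it takes the file contents as bytes rather than a file name so it can be tried directly on the example.

```python
def newlines_in_prefix(data: bytes, b: int) -> int:
    """Count newline bytes in the first b bytes, like `dd bs=1 count=b | wc -l`."""
    return data[:b].count(b"\n")

example = b"001\n002\n003\n004\n"
print(newlines_in_prefix(example, 8))  # 2
print(newlines_in_prefix(example, 7))  # 1
```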

As used here, dd reads $b blocks of 1 byte each from the file data.in.

As "icarus" rightly points out in the comments below, using bs=1 is inefficient, since dd then issues a separate read for every byte. In this particular case it is more efficient to swap bs and count:

$ dd if=data.in bs="$b" count=1 | wc -l

This will have the same effect as the first dd command, but will read only one block of $b bytes.

The wc utility counts newlines, and a "line" in Unix is always terminated by a newline. So the above command will still say 2 for any value of b from 8 up to 11; at 12 it includes the next newline. The result you are looking for is therefore whatever number the above pipeline reports, plus 1.
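The "plus 1" rule can be written as a small Python function. This is a sketch with a hypothetical name, line_of_offset. Because it counts only the newlines strictly before position b, it treats b as a 0-based offset: the byte at offset b is the one right after the first b bytes.

```python
def line_of_offset(data: bytes, b: int) -> int:
    """1-based line number of the byte at 0-based offset b.

    Counts the newlines among the first b bytes (what the dd | wc -l
    pipeline reports) and adds 1 for the line that byte b is on.
    """
    return data[:b].count(b"\n") + 1

example = b"001\n002\n003\n004\n"
# Offsets 8 to 11 hold "003" and its newline, so all of them are on line 3.
print([line_of_offset(example, b) for b in range(8, 12)])  # [3, 3, 3, 3]
```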

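This will obviously also count any stray newline bytes in the binary blob that precedes the ASCII text in your file. If you knew where the ASCII part starts, you could add skip="$offset" to the first (bs=1) dd command, where $offset is the number of bytes to skip into the file; $b then counts from that point. Note that skip is measured in input blocks, so with bs="$b" it would skip $offset blocks of $b bytes each (GNU dd's iflag=skip_bytes makes it count bytes instead).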
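A Python sketch of the same idea, assuming you already know the offset start at which the text part begins. The blob and the function name text_line_of_offset below are hypothetical. Only the newlines between start and the target offset are counted, so newline bytes in the binary part are ignored, and line numbers are relative to the start of the text part.

```python
def text_line_of_offset(data: bytes, start: int, b: int) -> int:
    """Line number, counted from `start`, of the byte at 0-based offset b.

    Newline bytes before `start` (e.g. in a binary header) are ignored.
    """
    if b < start:
        raise ValueError("offset lies before the start of the text part")
    return data[start:b].count(b"\n") + 1

# Hypothetical file: a 5-byte binary blob containing a stray newline,
# followed by the example text.
blob = b"\x00\n\xff\x01\x02"
data = blob + b"001\n002\n003\n004\n"
print(text_line_of_offset(data, len(blob), len(blob) + 8))  # 3
```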


Currently there is no dedicated tool for this, although it can be done fairly easily in Python:

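#!/usr/bin/env python3
import sys

path = sys.argv[1]
offset = int(sys.argv[2])  # 0-based byte offset

newlines = 0
remaining = offset
# Binary mode: the file may contain non-text bytes, and we count bytes,
# not characters.
with open(path, 'rb') as fd:
    # Count the newlines among the bytes before the offset, reading in
    # 64 KiB chunks rather than one byte at a time.
    while remaining > 0:
        chunk = fd.read(min(remaining, 65536))
        if not chunk:  # end of file reached before the offset
            break
        newlines += chunk.count(b'\n')
        remaining -= len(chunk)

# The byte at the offset lies on the line after the last newline counted.
print(newlines + 1)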

Usage is simple:

line4byte.py <FILE> <BYTE>

Test run:

$ cat input.txt
001
002
003
004
$ chmod +x ./line4byte.py
$ ./line4byte.py input.txt 8
3

This is a very quick and simple script with no error checking: for example, it does not check that the file is non-empty or that the offset lies within the file.


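Track the bytes seen and print the current line number as soon as the running total reaches the given offset. The offset counts from 1 here, so 8 (the second newline in the example) is reported as line 2: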

perl -E '$off=shift;while(<>){$sum+=length;if($sum>=$off){say $.;exit}}' 8 file
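The same running-total idea, sketched in Python for comparison (the function name line_for_offset is hypothetical). Like the Perl version, it reads line by line and stops at the first line whose cumulative length reaches the offset, so the offset counts from 1. Reading from a binary stream makes len() count bytes rather than characters.

```python
import io
from typing import BinaryIO, Optional

def line_for_offset(stream: BinaryIO, offset: int) -> Optional[int]:
    """Return the 1-based line holding 1-based byte `offset`, or None past EOF."""
    total = 0
    for lineno, line in enumerate(stream, start=1):
        total += len(line)  # bytes seen so far, including this line's newline
        if total >= offset:
            return lineno
    return None

example = io.BytesIO(b"001\n002\n003\n004\n")
print(line_for_offset(example, 8))  # 2
```

For a real file, pass a handle opened with open(path, "rb").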

Or at length:

#!/usr/bin/env perl
use strict;
use warnings;
die "Usage: $0 offset file|-\n" if @ARGV != 2;
my $offset = shift;
shift if $ARGV[0] eq '-';
my $sum;
while (readline) {
    $sum += length;
    if ($sum >= $offset) {
        print "$.\n";
        exit;
    }
}
exit 1;