Can a Perl program know the line number where __DATA__ begins?

Files don't actually have lines; they're just sequences of bytes. The OS doesn't even offer a way to read a line from a file, so it has no concept of line numbers.

Perl, on the other hand, does keep track of a line number for each handle. It is accessed via the $. variable.
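As a minimal sketch of that per-handle counter, the following reads from two in-memory handles and records what $. reports after each step. The helper name line_counters and the sample text are made up for illustration; input_line_number comes from the core IO::Handle module.

```perl
use strict;
use warnings;
use IO::Handle;   # provides input_line_number()

# Hypothetical helper: reads from two in-memory handles and returns what
# the line counter reports at each step.
sub line_counters {
    my $text_a = "a1\na2\na3\n";
    my $text_b = "b1\n";
    open(my $fh_a, '<', \$text_a) or die("Can't open in-memory file: $!\n");
    open(my $fh_b, '<', \$text_b) or die("Can't open in-memory file: $!\n");

    my $discard = <$fh_a>;        # First line of handle A.
    $discard = <$fh_a>;           # Second line of handle A.
    my $after_a = $.;             # «$.» reflects the handle last read: A.

    $discard = <$fh_b>;           # First line of handle B.
    my $after_b = $.;             # «$.» now reflects B's own counter.

    # A's counter was not disturbed by reading from B.
    my $a_again = $fh_a->input_line_number;

    return ($after_a, $after_b, $a_again);
}

printf "%d %d %d\n", line_counters();   # prints "2 1 2"
```

Each handle keeps its own count, and $. simply reports the count of whichever handle was accessed last.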

However, the Perl handle DATA is created from a file descriptor that has already been moved to the start of the data (it's the same file descriptor Perl itself uses to load and parse the file), so there's no record of how many lines have already been read. Line 1 of DATA is therefore the first line after __DATA__.

To correct the line count, one must seek back to the start of the file, and read it line by line until the file handle is back at the same position it started.

#!/usr/bin/perl
use strict;
use warnings qw( all );

use Fcntl qw( SEEK_SET );

# Counts the lines that precede the current file position without relying on «$.».
# Sets «$.» to that count, so the next line read gets its true line number, and returns the count.
# Returns «undef» without touching «$.» if the handle doesn't support tell/seek.
# Resets «$.» to «0» and returns «undef» if the scan can't reach the original position.
# The handle is left pointing to the same position as when this was called, or this dies.
sub fix_line_number {
   my ($fh) = @_;
   ( my $initial_pos = tell($fh) ) >= 0
      or return undef;
   seek($fh, 0, SEEK_SET)
      or return undef;

   # No lines have been read yet. tell/seek made $fh the handle «$.» refers to.
   $. = 0;
   while (<$fh>) {
      ( my $pos = tell($fh) ) >= 0
         or last;

      if ($pos >= $initial_pos) {
         if ($pos > $initial_pos) {
            seek($fh, $initial_pos, SEEK_SET)
               or die("Can't reset handle: $!\n");
         }

         return $.;
      }
   }

   seek($fh, $initial_pos, SEEK_SET)
      or die("Can't reset handle: $!\n");

   $. = 0;
   return undef;
}

my $prefix = fix_line_number(\*DATA) ? "" : "+";

while (<DATA>) {
   printf "%s:%s: %s", __FILE__, "$prefix$.", $_;
}

__DATA__
foo
bar
baz

Output:

$ ./a.pl
./a.pl:47: foo
./a.pl:48: bar
./a.pl:49: baz

$ perl <( cat a.pl )
/dev/fd/63:+1: foo
/dev/fd/63:+2: bar
/dev/fd/63:+3: baz

Perl keeps track of the file and line at which each symbol is created. A symbol is normally created when the parser/compiler first encounters it. But if __DATA__ is encountered before DATA is otherwise created, this will create the symbol. We can take advantage of this to set the line number associated with the file handle in DATA.

For the case where the Package::DATA handle is not used in Package.pm itself, the line number of the __DATA__ token could be obtained via B::GV->LINE on the DATA handle:

$ cat Foo.pm
package Foo;

1;
__DATA__
good
bad
$ perl -I. -MFoo -MB -e '
   my $ln = B::svref_2object(\*Foo::DATA)->LINE;
   warn "__DATA__ at line $ln\n";
   Foo::DATA->input_line_number($ln);
   while(<Foo::DATA>){ die "no good" unless /good/ }
'
__DATA__ at line 4
no good at -e line 5, <DATA> line 6.

In the case where the DATA handle is referenced in the file itself, a possible kludge would be to use an @INC hook:

$ cat DH.pm
package DH;

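# @INC hook: locates the module itself so that every source line passes through
# a filter that counts lines and bytes for as long as the parser keeps reading.
unshift @INC, sub {
        my ($sub, $fname) = @_;
        for(@INC){
                # Code-ref entries (including this hook) stringify to paths that fail to open.
                if(open my $fh, '<', my $fpath = "$_/$fname"){
                        $INC{$fname} = $fpath;
                        # Return an empty source prefix, the handle, and a per-line source filter.
                        return \'', $fh, sub {
                                our (%ln, %pos);
                                # $_ holds the line just read; the last one seen is the __DATA__ line.
                                if($_){ $pos{$fname} += length; ++$ln{$fname} }
                        }
                }
        }
};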
$ cat Bar.pm
package Bar;

print while <DATA>;

1;
__DATA__
good
bad
$ perl -I. -MDH -MBar -e '
    my $fn = "Bar.pm";
    warn "__DATA__ at line $DH::ln{$fn} pos $DH::pos{$fn}\n";
    seek Bar::DATA, $DH::pos{$fn}, 0;
    Bar::DATA->input_line_number($DH::ln{$fn});
    while (<Bar::DATA>){ die "no good" unless /good/ }
'
good
bad
__DATA__ at line 6 pos 47
no good at -e line 6, <DATA> line 8.

Just for the sake of completeness, in the case where you do have control over the file, all of this can easily be done with:

print "$.: $_" while <DATA>;
BEGIN { our $ln = __LINE__ + 1; DATA->input_line_number($ln) }
__DATA__
...

You can also use the first B::GV solution, provided that you reference the DATA handle via an eval:

use B;
my ($ln, $data) = eval q{B::svref_2object(\*DATA)->LINE, \*DATA}; die $@ if $@;
$data->input_line_number($ln);
print "$.: $_" while <$data>;
__DATA__
...

None of these solutions assume that the source files are seekable (unless you want to read DATA more than once, as I did in the second example), and none of them try to reparse your files.


On systems that support the /proc/<pid> virtual filesystem (e.g., Linux), you can do:

# find the file where <DATA> handle is read from
my $DATA_FILE = readlink("/proc/$$/fd/" . fileno(*DATA));

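# find the line where DATA begins
open my $THIS, "<", $DATA_FILE or die "Can't open $DATA_FILE: $!\n";
my @THIS = <$THIS>;
# zero-based index of the __DATA__ line (its line number is $DATA_LINE + 1)
my ($DATA_LINE) = grep { $THIS[$_] =~ /^__DATA__\b/ } 0 .. $#THIS;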
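The grep above yields a zero-based index into the slurped file. A minimal sketch of the same scan as a reusable function that reads line by line and returns a 1-based line number might look like the following. The name find_data_line is hypothetical, and the sketch assumes the first line starting with __DATA__ is the real marker (a here-doc containing such a line earlier in the file would fool it, as it would the grep).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based line number of the first line that
# starts with the __DATA__ token, or undef if no such line is found.
sub find_data_line {
    my ($fh) = @_;
    my $line_num = 0;             # Own counter, so «$.» isn't relied upon.
    while (my $line = <$fh>) {
        ++$line_num;
        return $line_num if $line =~ /^__DATA__\b/;
    }
    return undef;
}

# Demonstration on an in-memory "source file" (sample text is made up).
my $source = "print while <DATA>;\n\n__DATA__\nfoo\n";
open(my $src_fh, '<', \$source) or die("Can't open in-memory file: $!\n");
my $marker_line = find_data_line($src_fh);
print "__DATA__ is on line $marker_line; data starts on line ", $marker_line + 1, "\n";
```

With a real script, the handle would instead be opened on the path obtained from readlink, as in the snippet above.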
