How to cut a file starting from the line in which a certain pattern occurs?

You should be able to do it by truncating the file in place, without writing a new copy of the file as sed -i, perl -i, ed or gawk -i inplace would. With perl:

find . -name '*.txt' -type f -exec perl -ne '
  BEGIN{@ARGV=map{"+<$_"}@ARGV} # open files in read+write mode in the
                                # while(<>) loop implied by -n
  if (/END DATA/) {
    seek ARGV,-length,1; # back to beginning of matching line
    print ARGV "NEW END\n";
    truncate ARGV, tell ARGV;
    close ARGV; # skip to next file
  }' {} +

That minimises I/O, as perl stops reading as soon as it finds a match, and NEW END\n is the only thing it writes. It also writes in place, so the file's metadata (ownership, permissions, ACLs, sparseness...) is preserved and hard links are not broken.

With -exec {} + we also minimise the number of perl invocations.


It sounds like the sequence of commands you're looking for is

/END DATA/,$d
.a
NEW END
.
wq

or as a one-liner

printf '%s\n' '/END DATA/,$d' '.a' 'NEW END' '.' 'wq'

(For testing, you can replace wq with ,p to print the resulting buffer instead of writing it back.)

For example, given

$ cat file
Data 1
Data 2
something_unimportant_here END DATA
Rubbish 1
Rubbish 2

then

$ printf '%s\n' '/END DATA/,$d' '.a' 'NEW END' '.' 'wq' | ed -s file

gives

$ cat file
Data 1
Data 2
NEW END
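
Since the question is about several files, here is a rough sketch of how the delete/append/write sequence could be applied in a loop. The helper name ed_cut and the demo file names are hypothetical. The grep guard is my own addition: without it, ed reports an error for files where the search for END DATA fails. The demo only runs if ed is installed.

```shell
#!/bin/sh
# ed_cut FILE...: hypothetical helper that feeds the same ed commands
# to each file that actually contains the marker.
ed_cut() {
  for f in "$@"; do
    # Skip files without the marker so ed is not run on a failing search.
    grep -q 'END DATA' "$f" || continue
    printf '%s\n' '/END DATA/,$d' '.a' 'NEW END' '.' 'wq' | ed -s "$f"
  done
}

# Demo on scratch files (assumed names), only if ed is available.
if command -v ed >/dev/null 2>&1; then
  demo=$(mktemp -d)
  printf '%s\n' 'Data 1' 'Data 2' 'something_unimportant_here END DATA' \
    'Rubbish 1' 'Rubbish 2' > "$demo/file"
  printf '%s\n' 'no marker here' > "$demo/other"
  ed_cut "$demo/file" "$demo/other"
  cat "$demo/file"
fi
```

Unlike the in-place truncation approaches, ed reads each whole file into its buffer, so this is better suited to files of moderate size.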

With GNU grep and GNU sed:

grep -lZ 'END DATA' *.txt | xargs -0 sed -i -e '/END DATA/,${//i foo' -e 'd}'

Here *.txt assumes that all your files are in the current directory and have a .txt extension. If you need to search for files recursively, GNU grep also supports the -r/-R options.

/END DATA/,$ is the range of lines to operate on: from the first line matching END DATA to the last line of the file.

//i foo: the empty regex // reuses the previously used regex, i.e. /END DATA/, and the i command inserts the new ending marker (foo here) before that line.

Since the text of the i command extends to the end of the line, a separate -e option is used for the d command, which deletes all lines in the range.

Alternatively, you can use the following, but sed must then be given one file at a time (hence -n1), because Q quits sed as soon as it is reached:

grep -lZ 'END DATA' *.txt | xargs -0 -n1 sed -i -e '/END DATA/{i foo' -e 'Q}'
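
To compare the two variants on scratch data, something like the sketch below can be used. It assumes GNU grep and GNU sed, as above. The directory layout and file names are made up for the demo, and NEW END is used as the inserted marker instead of foo.

```shell
#!/bin/sh
# Two identical scratch directories, one per variant (assumed names).
demo=$(mktemp -d)
mkdir "$demo/range" "$demo/quit"
for d in range quit; do
  printf '%s\n' 'Data 1' 'x END DATA' 'Rubbish 1' > "$demo/$d/a.txt"
  printf '%s\n' 'Data 2' 'y END DATA' 'Rubbish 2' > "$demo/$d/b.txt"
  printf '%s\n' 'no marker' > "$demo/$d/c.txt"
done

# Variant 1: delete the range; one sed invocation handles all files.
(cd "$demo/range" &&
  grep -lZ 'END DATA' *.txt |
  xargs -0 sed -i -e '/END DATA/,${//i NEW END' -e 'd}')

# Variant 2: quit at the marker; needs one sed invocation per file.
(cd "$demo/quit" &&
  grep -lZ 'END DATA' *.txt |
  xargs -0 -n1 sed -i -e '/END DATA/{i NEW END' -e 'Q}')

# Both directories should now have the same content.
diff -r "$demo/range" "$demo/quit" && echo "both variants agree"
cat "$demo/range/a.txt"
```

The second variant stops reading each file at the marker, which may matter for large files. The first one reads every file to the end but needs fewer sed processes.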
