Delete lines before and after a match in bash (with sed or awk)?

An awk one-liner may do the job:

awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];}{a[NR]=$0}END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}' file

Test:

kent$  cat file
######
foo
D28/10/2011
T-3.48
PINITIAL BALANCE
M
x
bar
######
this line will be kept
here
comes
PINITIAL BALANCE
again
blah
this line will be kept too
########

kent$  awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];}{a[NR]=$0}END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}' file
######
foo
bar
######
this line will be kept
this line will be kept too
########

Some explanation:

  awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];}  # on a match, record the line numbers of the match and of the 2 lines on either side in array "d"
      {a[NR]=$0}                                         # save every line in array "a", indexed by line number
      END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}     # finally, print only the lines whose numbers are not in "d"
      ' file                                             # your input file
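
If the pattern or the number of lines on each side needs to vary, the same idea can be wrapped in a small shell function. This is only a sketch: the function name delete_around and the awk variables pat, b and a are my own, not part of the answer above, and the pattern is passed with -v, so backslashes in it would be interpreted by awk.

```shell
# Hypothetical helper: delete_around PATTERN BEFORE AFTER [FILE...]
# Drops every line matching PATTERN, plus BEFORE lines above it and
# AFTER lines below it. Reads stdin when no FILE is given.
delete_around() {
    pattern=$1 before=$2 after=$3
    shift 3
    awk -v pat="$pattern" -v b="$before" -v a="$after" '
        $0 ~ pat { for (x = NR - b; x <= NR + a; x++) d[x] }  # mark line numbers to drop
        { line[NR] = $0 }                                      # buffer every line
        END { for (i = 1; i <= NR; i++) if (!(i in d)) print line[i] }
    ' "$@"
}

# Small demo: prints keep1 and keep2
printf '%s\n' keep1 a b 'PINITIAL BALANCE' c d keep2 |
    delete_around 'PINITIAL BALANCE' 2 2
```

Like the one-liner, this keeps the whole input in memory, which is fine for small files.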

sed will do it:

sed '/\n/!N;/\n.*\n/!N;/\n.*\n.*PINITIAL BALANCE/{$d;N;N;d};P;D'

It works this way:

  • if the pattern space holds only one line, sed appends the next one
  • if it holds only two lines, sed appends a third
  • if it matches LINE + LINE + LINE containing PINITIAL BALANCE, sed appends the two following lines, deletes everything and starts over
  • if not, it prints the first line of the pattern space, deletes it and starts over without clearing the rest of the pattern space
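
The steps above can be lined up with the script by spreading it over several lines with comments. This is only a sketch of the same commands, assuming GNU sed (which prints the pattern space when N runs out of input); the wrapper name sed_delete_around is made up for the example.

```shell
# Hypothetical wrapper around the sed commands from the answer.
sed_delete_around() {
    sed '
        # one line in the pattern space: append the next one
        /\n/!N
        # two lines in the pattern space: append a third
        /\n.*\n/!N
        # third line matches: on the last input line just delete,
        # otherwise append two more lines, delete all five and start over
        /\n.*\n.*PINITIAL BALANCE/{
            $d
            N
            N
            d
        }
        # no match: print the first line, remove it, restart with the rest
        P
        D
    ' "$@"
}

# Small demo: prints keep1, keep2 and keep3
printf '%s\n' keep1 a b 'PINITIAL BALANCE' c d keep2 keep3 | sed_delete_around
```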

To handle the pattern appearing on the very first line, modify the script:

sed '1{/PINITIAL BALANCE/{N;N;d}};/\n/!N;/\n.*\n/!N;/\n.*\n.*PINITIAL BALANCE/{$d;N;N;d};P;D'

However, it fails if another PINITIAL BALANCE appears among the lines that are going to be deleted. Then again, not every other solution handles that case either =)
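
To see that limitation concretely, here is a small check with two matches only two lines apart. It is a sketch, assuming GNU sed; the function names sample, with_sed and with_awk are invented for the demo, and the awk command is the buffering one-liner from the top of the page.

```shell
# Two matches on lines 4 and 6, so their +-2 ranges overlap.
sample() {
    printf '%s\n' keep1 a b 'PINITIAL BALANCE' c 'PINITIAL BALANCE' d e keep2
}

# sed: the second match is swallowed by the first deletion,
# so the two lines after it are never removed.
with_sed() {
    sed '/\n/!N;/\n.*\n/!N;/\n.*\n.*PINITIAL BALANCE/{$d;N;N;d};P;D'
}

# awk: line numbers are collected first, so overlapping ranges simply merge.
with_awk() {
    awk '/PINITIAL BALANCE/{for(x=NR-2;x<=NR+2;x++)d[x];}{a[NR]=$0}END{for(i=1;i<=NR;i++)if(!(i in d))print a[i]}'
}

sample | with_sed    # with GNU sed: keep1, d, e, keep2
echo ---
sample | with_awk    # keep1, keep2
```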


For such a task, I would probably reach for a more advanced tool like Perl:

perl -ne 'push @x, $_;
          if (@x > 4) {
              if ($x[2] =~ /PINITIAL BALANCE/) { undef @x }
              else { print shift @x }
          }
          END { print @x }' input-file > output-file

This removes 5 lines from the input file: the 2 lines before the match, the matched line, and the 2 lines after it. To change the total number of lines removed, modify @x > 4 (which gives a 5-line window). To change which line of the window has to match, modify $x[2] (the third line of the window, so the two lines before the match are removed along with it).
