How to remove specific numbers from a txt file with SED or AWK?

It is fairly straightforward with AWK, because AWK does nothing by default, so we only need to tell it when to act: print the ID at the beginning of the line, if there is one. (Printing $1 assumes the ID is followed by whitespace, so that it is the first field.)

/^[0-9]+-[0-9]+\.[0-9]+\.501\.[0-9]+/{
    print $1
}
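
To try this out, the program can be passed to awk directly on the command line. This is only a minimal sketch: the sample input below is made up for illustration and assumes the ID is the first whitespace-separated field.

```shell
# Hypothetical sample input: two lines that start with an ID, one that does not.
input='0011-15.2016.501.0012 some trailing text
no id on this line
0011-125.2013.501.0012 more text'

# Print the first field only for lines that start with a matching ID.
out=$(printf '%s\n' "$input" |
  awk '/^[0-9]+-[0-9]+\.[0-9]+\.501\.[0-9]+/ { print $1 }')

printf '%s\n' "$out"
```

The line without an ID produces no output, because no rule matches it.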

With sed it's a little different, because by default sed prints every line. (At least that's how these tools have been working for me.) First, we need to invoke sed as sed -n to suppress that automatic printing. Then we can use a substitution that keeps only the ID:

s/^\([0-9]\+-[0-9]\+\.[0-9]\+\.501\.[0-9]\+\).*$/\1/p

We need the p at the end to tell sed to print the result when the pattern matched. Your particular sed expression is a no-op because it replaces every match with itself and prints everything else as it was.
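
Put together, a full invocation might look like the sketch below. The sample input is hypothetical. Note that \+ is a GNU extension in basic regular expressions, so this version uses the POSIX interval \{1,\} instead, which should behave the same way.

```shell
# Hypothetical sample input, made up for illustration.
input='0011-15.2016.501.0012 some trailing text
no id on this line
010010-26.2010.501.0026;other;fields'

# -n suppresses automatic printing; the trailing p prints only lines
# where the substitution succeeded. \{1,\} means "one or more", like \+.
out=$(printf '%s\n' "$input" |
  sed -n 's/^\([0-9]\{1,\}-[0-9]\{1,\}\.[0-9]\{1,\}\.501\.[0-9]\{1,\}\).*$/\1/p')

printf '%s\n' "$out"
```

Unlike the AWK version, this does not depend on whitespace after the ID: everything after the captured group is matched by .* and dropped.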


It does work, but it doesn't change anything, or rather it replaces each match with itself. With a very small modification to this code you can get what you want:

sed -n 's/\([0-9]*\-[0-9]*\.[0-9]*\.501\.[0-9]*\).*/\1/p'

Notice three things:

  • the -n switch, which tells sed not to print anything by default
  • .* after the group captured with \(...\), so the rest of the line is matched and dropped
  • p as the last flag of the s command, which prints the line if a substitution was made
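
The effect of each of these three points can be seen in isolation with a small experiment. The input line and the first command are hypothetical: the command only imitates the general shape of an expression that replaces a match with itself.

```shell
# Hypothetical one-line input, made up for illustration.
line='0011-15.2016.501.0012 trailing text'
re='\([0-9]*-[0-9]*\.[0-9]*\.501\.[0-9]*\)'

# Match replaced by itself, no -n: the line comes out unchanged.
noop=$(printf '%s\n' "$line" | sed "s/$re/\1/")

# With .* after the group, the rest of the line is consumed and dropped.
trimmed=$(printf '%s\n' "$line" | sed -n "s/$re.*/\1/p")

# With -n but without the p flag, nothing is printed at all.
silent=$(printf '%s\n' "$line" | sed -n "s/$re.*/\1/")

printf 'noop:    %s\ntrimmed: %s\nsilent:  [%s]\n' "$noop" "$trimmed" "$silent"
```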

Result:

010010-26.2010.501.0026
0011-15.2016.501.0012
0011-125.2013.501.0012

BTW, you can simplify this a little by adding -E to use extended regular expressions, which lets you drop the backslashes in front of the capturing-group parentheses:

sed -E -n 's/([0-9]*-[0-9]*\.[0-9]*\.501\.[0-9]*).*/\1/p'
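
As a quick check that the two spellings are equivalent, the sketch below runs both on the same hypothetical input (made up for illustration) and stores the results for comparison.

```shell
# Hypothetical sample input, made up for illustration.
input='0011-125.2013.501.0012 trailing text
a line without an ID'

# Basic regular expressions: the group is written \( ... \).
bre=$(printf '%s\n' "$input" |
  sed -n 's/\([0-9]*-[0-9]*\.[0-9]*\.501\.[0-9]*\).*/\1/p')

# Extended regular expressions (-E): the group is written ( ... ).
ere=$(printf '%s\n' "$input" |
  sed -E -n 's/([0-9]*-[0-9]*\.[0-9]*\.501\.[0-9]*).*/\1/p')

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```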

Both versions work on the webpage mentioned.