How to sed -e 's///' everything except a specific pattern?

You might be better off using grep -o in this case:

grep -oP '\B%[0-9]{1,3}\b' inputfile

This assumes that your version of grep supports Perl-compatible regular expressions (-P). Otherwise, use the basic-regex form:

grep -o '\B%[0-9]\{1,3\}\b' inputfile
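
As a quick check, here is a minimal sketch of the non-PCRE form run against the two sample lines from the heredoc further down this thread. It assumes GNU grep, since \B and \b are GNU extensions in basic regular expressions; the variable names sample and tokens are only for illustration.

```shell
# Sample lines reused from the heredoc example later in the thread.
sample='    1: [18x14] [history 1/2000, 268 bytes] %3
    2: [18x14] [history 1/2000, 268 bytes] %4 (active)'

# -o prints only the matched text, one match per output line.
tokens=$(printf '%s\n' "$sample" | grep -o '\B%[0-9]\{1,3\}\b')
printf '%s\n' "$tokens"
```

On this input it should print %3 and %4, each on its own line.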

Using GNU sed, one could transliterate spaces to newlines and get the desired lines:

sed 'y/ /\n/' inputfile | sed '/^%[0-9]\{1,\}/!d'
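
One thing worth knowing about this variant: the second sed keeps the whole space-delimited word, not just the token. A small sketch with a made-up input line (the line and variable names are hypothetical, and GNU sed is assumed as above):

```shell
# Hypothetical input: the second token is followed by a comma.
line='pane %3 and pane %7, both idle'

# First sed: turn every space into a newline, so each word sits on its own line.
# Second sed: delete every line that does not start with % followed by a digit.
words=$(printf '%s\n' "$line" | sed 'y/ /\n/' | sed '/^%[0-9]\{1,\}/!d')
printf '%s\n' "$words"
```

The comma survives here (the second output line is "%7,"), so this approach fits best when tokens are always surrounded by spaces.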

$ sed 's/^.*\(%[0-9]\+\).*$/\1/' input

This assumes that every line contains exactly one of those %123 tokens.

The \( \) metacharacters mark a capture group, which is then referenced in the substitution via the \1 back-reference. ^ and $ match the beginning and end of a line.
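
To see the capture group at work, and why the one-token-per-line assumption matters, here is a rough sketch assuming GNU sed (\+ is a GNU extension in basic regular expressions). The second input line is made up for illustration.

```shell
one='[history 1/2000, 268 bytes] %4 (active)'
two='moved %12 to %345 today'   # hypothetical line with two tokens

# The leading .* is greedy, so it swallows everything up to the LAST token;
# \1 then holds only that last token.
first=$(printf '%s\n' "$one" | sed 's/^.*\(%[0-9]\+\).*$/\1/')
second=$(printf '%s\n' "$two" | sed 's/^.*\(%[0-9]\+\).*$/\1/')
printf '%s\n%s\n' "$first" "$second"
```

The first line yields %4; the second yields only %345, and %12 is lost.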

Otherwise you can pre-filter the input, e.g.:

$ grep '%[0-9]\+' input | sed 's/^.*\(%[0-9]\+\).*$/\1/'

(when not all lines contain such a token)
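
As an alternative to the grep pre-filter (my own sketch, not part of the command above), sed can do the filtering itself: -n turns off automatic printing and the p flag on s prints only the lines where the substitution actually happened. GNU sed is assumed because of \+, and the sample input is made up.

```shell
input='1: [18x14] %3
a line without any token
2: [18x14] %4 (active)'

# -n: print nothing by default; p: print a line only if s replaced something.
only=$(printf '%s\n' "$input" | sed -n 's/^.*\(%[0-9]\+\).*$/\1/p')
printf '%s\n' "$only"
```

The line without a token is dropped rather than passed through unchanged.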

Another variant:

$ sed 's/\(%[0-9]\+\)/\n\1\n/g' input | grep '%[0-9]'

(when a line may contain multiple of those tokens)

Here the sed stage of the pipe inserts line breaks directly before and after each token, and the grep stage then drops every line that is not a %123 token.
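
A small sketch of the multi-token case, using a made-up input line. It assumes GNU sed, which accepts \n in the replacement and \+ in the pattern.

```shell
# Hypothetical line carrying three tokens.
line='swap %1 with %22 then %333'

# sed isolates every token on its own line; grep keeps only those lines.
multi=$(printf '%s\n' "$line" | sed 's/\(%[0-9]\+\)/\n\1\n/g' | grep '%[0-9]')
printf '%s\n' "$multi"
```

Each of the three tokens comes out on its own line, in input order.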


When working with sed it's almost always advisable to:

/address then/s/earch/replace/

There are two reasons for this. The first is that, over many lines, /addressing/ is faster: an address only has to decide whether a line matches, without working out which portion of the line to edit, so non-matching lines are ruled out sooner.

The second reason is that you can hang multiple edit operations off the same address, which makes things much easier.
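
To illustrate the second point, here is a minimal sketch of two substitutions grouped under one address with { }. The middle input line is made up, and the variable names are only for illustration.

```shell
input='    1: [18x14] [history 1/2000, 268 bytes] %3
a line without any token
    2: [18x14] [history 1/2000, 268 bytes] %4 (active)'

# Both s commands run only on lines that contain a % followed by a digit.
grouped=$(printf '%s\n' "$input" | sed '/%[0-9]/{
  s/^[^%]*//
  s/[^0-9]*$//
}')
printf '%s\n' "$grouped"
```

The two token lines are trimmed to %3 and %4, while the line without a token passes through untouched.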

Of course, in this case, given only the data you show, it makes no practical difference. Still, this is how I would do the thing you ask about:

sed '/^[^%]*\|[^0-9]*$/s///g' <<\DATA
    1: [18x14] [history 1/2000, 268 bytes] %3
    2: [18x14] [history 1/2000, 268 bytes] %4 (active)
DATA

#OUTPUT
%3
%4

In the address, it just selects all non-% characters from the beginning of the line and all non-numeric characters from the end of the line. The empty pattern in s/// reuses the regex from the address, so those characters are then removed, and that's that.

In its current form it might mangle data in unexpected ways if you feed it lines not containing a %digit combo, and that's why addressing is important. If we alter it a little:

/%[0-9]/s/^[^%]*\|[^0-9]*$//g

It gets safer and faster.
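
A rough way to see the "safer" part, assuming GNU sed (\| is a GNU extension in basic regular expressions) and a made-up line with no token in it:

```shell
# Hypothetical line that contains no %digit token.
plain='a line with no token, 268 bytes'

# Unaddressed: ^[^%]* matches the whole line, so everything is deleted.
unsafe=$(printf '%s\n' "$plain" | sed 's/^[^%]*\|[^0-9]*$//g')

# Addressed: the line has no %digit, so the s command never runs on it.
safe=$(printf '%s\n' "$plain" | sed '/%[0-9]/s/^[^%]*\|[^0-9]*$//g')

printf 'unsafe=[%s]\nsafe=[%s]\n' "$unsafe" "$safe"
```

The unaddressed form empties the line, while the addressed form leaves it as it was.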

Tags:

Sed