Grep Match and extract

With grep -o, you will have to match exactly what you want to extract. Since you don't want to extract the proto= string, you should not match it.

An extended regular expression that would match either tcp or udp followed by a slash and some non-empty alphanumeric string is

(tcp|udp)/[[:alnum:]]+

Applying this on your data:

$ grep -E -o '(tcp|udp)/[[:alnum:]]+' file
tcp/http
tcp/https
udp/dns

To make sure that we only do this on lines that start with the string proto=:

grep '^proto=' file | grep -E -o '(tcp|udp)/[[:alnum:]]+'
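The data from the question is not reproduced here, so the following is only a minimal sketch on made-up input (the temporary file and its contents, including the sent= and rcvd= columns, are assumptions for illustration). It shows what the pre-filter changes when some other line happens to contain a tcp/ or udp/ string:

```shell
# Hypothetical sample data; the real file's contents are an assumption.
tmp=$(mktemp)
printf '%s\n' \
    'proto=tcp/http  sent=144  rcvd=52' \
    'proto=tcp/https sent=20   rcvd=40' \
    'comment: udp/ntp is not a proto line' \
    'proto=udp/dns   sent=10   rcvd=8' >"$tmp"

# Without the pre-filter, the comment line also yields a match (udp/ntp).
unfiltered=$(grep -E -o '(tcp|udp)/[[:alnum:]]+' "$tmp")

# With the pre-filter, only lines starting with proto= are considered.
filtered=$(grep '^proto=' "$tmp" | grep -E -o '(tcp|udp)/[[:alnum:]]+')

printf 'unfiltered:\n%s\nfiltered:\n%s\n' "$unfiltered" "$filtered"
rm -f "$tmp"
```

On this input the unfiltered command reports four matches, the filtered pipeline three.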

With sed, removing everything up to and including the first = and then everything from the first blank character onwards:

$ sed 's/^[^=]*=//; s/[[:blank:]].*//' file
tcp/http
tcp/https
udp/dns

To make sure that we only do this on lines that start with the string proto=, you could insert the same pre-processing step with grep as above, or you could use

sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }' file

Here, we suppress the default output with the -n option, and then we trigger the substitutions and an explicit print of the line only if the line matches ^proto=.
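To see the effect of the /^proto=/ guard, here is a small sketch on hypothetical two-line input (the note= line is invented for illustration and is not from the question):

```shell
# Hypothetical input: one proto= line and one unrelated line.
input=$(printf '%s\n' 'proto=tcp/http sent=144' 'note=ignore me')

# Unguarded: the substitutions run on every line, so the unrelated line
# is cut down to "ignore" and printed as well.
unguarded=$(printf '%s\n' "$input" | sed 's/^[^=]*=//; s/[[:blank:]].*//')

# Guarded: -n suppresses the default output and p prints only proto= lines.
guarded=$(printf '%s\n' "$input" |
    sed -n '/^proto=/{ s/^[^=]*=//; s/[[:blank:]].*//; p; }')

printf 'unguarded:\n%s\nguarded:\n%s\n' "$unguarded" "$guarded"
```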


With awk, using the default field separator, and then splitting the first field on = and printing the second bit of it:

$ awk '{ split($1, a, "="); print a[2] }' file
tcp/http
tcp/https
udp/dns

To make sure that we only do this on lines that start with the string proto=, you could insert the same pre-processing step with grep as above, or you could use

awk '/^proto=/ { split($1, a, "="); print a[2] }' file

If you have GNU grep (needed for the -P option, which enables Perl-compatible regular expressions), you could use:

$ grep -oP 'proto=\K[^ ]*' file
tcp/http
tcp/https
udp/dns

Here we match the proto= string, to make sure that we are extracting the correct column, but then discard it from the output with \K, which resets the start of the reported match to that point.

The above assumes that the columns are space-separated. If tabs are also a valid separator, you would use \S to match the non-whitespace characters, so the command would be:

grep -oP 'proto=\K\S*' file

If you also want to protect against matching fields where proto= is only a substring, such as thisisnotaproto=tcp/https, you can add a word boundary with \b like so:

grep -oP '\bproto=\K\S*' file
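A quick way to compare the three patterns is to run them on made-up input. This sketch assumes GNU grep built with -P support; the two input lines (a decoy field and a tab-separated proto= field) are hypothetical:

```shell
# Hypothetical input: a decoy line, and a proto= field followed by a tab.
input=$(printf 'thisisnotaproto=tcp/https sent=20\nproto=udp/dns\tsent=10\n')

# [^ ]* stops only at a space, so it runs past the tab on the second line.
space_only=$(printf '%s\n' "$input" | grep -oP 'proto=\K[^ ]*')

# \S* stops at any whitespace, but still matches inside the decoy field.
any_space=$(printf '%s\n' "$input" | grep -oP 'proto=\K\S*')

# \b additionally requires proto= to start at a word boundary.
bounded=$(printf '%s\n' "$input" | grep -oP '\bproto=\K\S*')

printf '%s\n' "$bounded"
```

Note that \b only rules out a preceding word character (letter, digit or underscore); a field such as not-a-proto= would still match.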

Using awk:

awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input

$1 ~ "proto" ensures that we only act on lines whose first column contains proto.

sub(/proto=/, "") removes the first occurrence of proto= from the line.

print $1 prints what is left of the first column.


$ awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }' input
tcp/http
tcp/https
udp/dns
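Because $1 ~ "proto" matches proto anywhere in the first column, a stricter variant could anchor both the test and the substitution. The following is only a sketch of that idea, not part of the original answer, run on hypothetical input:

```shell
# Hypothetical input: a real proto= line and a decoy first column.
input=$(printf '%s\n' 'proto=tcp/http sent=144' \
                      'thisisnotaproto=tcp/https sent=20')

# Original command: the decoy line is processed too.
loose=$(printf '%s\n' "$input" |
    awk '$1 ~ "proto" { sub(/proto=/, ""); print $1 }')

# Anchored variant: only first columns that begin with proto= qualify,
# and the substitution is applied to $1 rather than to the whole line.
anchored=$(printf '%s\n' "$input" |
    awk '$1 ~ /^proto=/ { sub(/^proto=/, "", $1); print $1 }')

printf 'loose:\n%s\nanchored:\n%s\n' "$loose" "$anchored"
```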
