Print unmatched patterns, using grep with patterns from file

You could use grep -o to print only the matching part and use the result as patterns for a second grep -v on the original patterns.txt file:

grep -oFf patterns.txt Strings.xml | grep -vFf - patterns.txt
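To see what each stage contributes, here is a small sketch run in a scratch directory. The contents of patterns.txt are an assumption, reconstructed from the output shown further down (the original file is not reproduced here); the Strings.xml lines are the three sample entries from the question.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)

# Assumed contents of patterns.txt: one quoted name per line.
cat > "$tmpdir/patterns.txt" <<'EOF'
"Introduction"
"BananaOpinion"
"MessageToUser"
"ExitWarning"
"SomeMessage"
"Help"
EOF

cat > "$tmpdir/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# Stage 1: the patterns that do occur in Strings.xml (-o prints only the match).
matched=$(grep -oFf "$tmpdir/patterns.txt" "$tmpdir/Strings.xml")

# Stage 2: use those as fixed-string patterns (-f - reads them from stdin)
# and print the lines of patterns.txt that match none of them.
unmatched=$(grep -oFf "$tmpdir/patterns.txt" "$tmpdir/Strings.xml" |
    grep -vFf - "$tmpdir/patterns.txt")

printf '%s\n' "$unmatched"
rm -rf "$tmpdir"
```

With this sample data, stage 1 yields the three quoted names found in Strings.xml, and stage 2 leaves the other three.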

Though in this particular case you could also use join + sort:

join -t\" -v1 -j2 -o 1.1 1.2 1.3 <(sort -t\" -k2,2 patterns.txt) <(sort -t\" -k2,2 Strings.xml)
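For shells without process substitution, the same idea can be sketched with temporary files. The patterns.txt contents are again an assumption, and the portable -1 2 -2 2 spelling stands in for -j2; this is an illustration, not a drop-in replacement.

```shell
tmpdir=$(mktemp -d)

# Assumed contents of patterns.txt: one quoted name per line.
cat > "$tmpdir/patterns.txt" <<'EOF'
"Introduction"
"BananaOpinion"
"MessageToUser"
"ExitWarning"
"SomeMessage"
"Help"
EOF

cat > "$tmpdir/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# join needs both inputs sorted on the join field,
# which is field 2 when the line is split on double quotes.
sort -t'"' -k2,2 "$tmpdir/patterns.txt" > "$tmpdir/patterns.sorted"
sort -t'"' -k2,2 "$tmpdir/Strings.xml"  > "$tmpdir/strings.sorted"

# -v 1            print only lines of file 1 that have no partner in file 2
# -1 2 -2 2       join on field 2 of both files
# -o 1.1,1.2,1.3  rebuild the pattern line from its three fields
unpaired=$(join -t'"' -v 1 -1 2 -2 2 -o 1.1,1.2,1.3 \
    "$tmpdir/patterns.sorted" "$tmpdir/strings.sorted")

printf '%s\n' "$unpaired"
rm -rf "$tmpdir"
```

Unlike the grep variants, the result comes out in sorted order rather than in the order of patterns.txt.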

The best approach is probably what @don_crissti suggested, so here's a variation on the same theme:

$ grep -vf <(grep -Po 'name=\K.+?"' Strings.xml) patterns.txt
"ExitWarning"
"SomeMessage"
"Help"

This is basically the inverse of @don_crissti's approach. It uses grep with Perl Compatible Regular Expressions (-P) and the -o switch to print only the matching part of the line. The regex looks for name= and discards it from the match (\K), then lazily matches one or more characters up to the next " (.+?"). Since the first of those characters is the opening quote, the match is the quoted name, closing quote included. This yields the list of patterns present in the Strings.xml file, which is then passed as the pattern file to an inverted grep (grep -v) using process substitution (<(command)).
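The intermediate list can be inspected by running the inner grep on its own. The sketch below assumes a grep built with PCRE support (GNU grep usually is) and the same assumed patterns.txt as in the output above. It feeds the outer grep through a pipe and -f - rather than <(...), which should give the same result.

```shell
tmpdir=$(mktemp -d)

# Assumed contents of patterns.txt: one quoted name per line.
cat > "$tmpdir/patterns.txt" <<'EOF'
"Introduction"
"BananaOpinion"
"MessageToUser"
"ExitWarning"
"SomeMessage"
"Help"
EOF

cat > "$tmpdir/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# Inner command: \K drops "name=" from the reported match, so -o prints
# only the quoted name, opening and closing quote included.
present=$(grep -Po 'name=\K.+?"' "$tmpdir/Strings.xml")

# Outer command: read those names as patterns from stdin (-f -) and
# print the lines of patterns.txt that match none of them (-v).
missing=$(grep -Po 'name=\K.+?"' "$tmpdir/Strings.xml" |
    grep -vf - "$tmpdir/patterns.txt")

printf '%s\n' "$present"
rm -rf "$tmpdir"
```

Here the inner grep prints "Introduction", "BananaOpinion" and "MessageToUser", one per line and with their quotes.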


I would use cut, probably. That is, if, as it appears, you know where to expect the quoted string you're looking for.

If I do:

{   cut  -sd\" -f2 |
    grep -vFf- pat
}   <<\IN
#   <string name="Introduction">One day there was an apple that went to the market.</string>
#   <string name="BananaOpinion">Bananas are great!</string>
#   <string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
IN

...after saving my own copy of your example patterns.txt in pat and running the above command, the output is:

"ExitWarning"
"SomeMessage"
"Help"

cut prints to stdout only the second " (double-quote) -delimited -field of each input line that contains the delimiter, and -suppresses all other lines.

What cut actually passes to grep is:

Introduction
BananaOpinion
MessageToUser

grep searches its named file operand, pat, for lines which don't (-v) match any of the -Fixed strings in its pattern -file, which here is - (stdin).

If you can rely on the second "-delimited field as the one to match, then this should be an optimization over grep's -Perl mode: grep only has to match -Fixed strings, and only the short names at that, because cut does the heavy lifting, and it does it fast.
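One thing to watch for: cut hands grep the bare names without their quotes, and -F matches substrings, so a pattern whose name merely contains a matched name would be dropped too. The sketch below shows this with a hypothetical extra pattern, "IntroductionTitle", which is not part of the original example. It also shows a stricter variant that puts the quotes back with sed and adds -x for whole-line matching. The file name pat follows the answer above; its contents are assumed.

```shell
tmpdir=$(mktemp -d)

# Assumed pattern file, plus one hypothetical entry on the last line.
cat > "$tmpdir/pat" <<'EOF'
"Introduction"
"BananaOpinion"
"MessageToUser"
"ExitWarning"
"SomeMessage"
"Help"
"IntroductionTitle"
EOF

cat > "$tmpdir/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# Loose: the bare name Introduction also matches inside "IntroductionTitle".
loose=$(cut -sd\" -f2 "$tmpdir/Strings.xml" | grep -vFf - "$tmpdir/pat")

# Strict: wrap each name in quotes again, then require a whole-line match (-x).
strict=$(cut -sd\" -f2 "$tmpdir/Strings.xml" |
    sed 's/.*/"&"/' |
    grep -vxFf - "$tmpdir/pat")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmpdir"
```

With this data, only the strict variant reports "IntroductionTitle" as unmatched. If no name is a substring of another, the two behave the same.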
