Print lines in one file matching patterns in another file

Try grep -Fwf file2 file1 > out

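The -F option specifies plain string matching, so it should be faster since it doesn't have to engage the regex engine. The -w option restricts matches to whole words, and -f reads the patterns from file2, one per line.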
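To see what each flag buys you, here is a small sketch on made-up data (the file contents and the temp directory are my own assumptions, not the asker's files):

```shell
# Hypothetical sample data, invented purely for illustration.
tmp=$(mktemp -d)
printf '%s\n' 'scign000003' 'a.c' > "$tmp/file2"
printf '%s\n' 'x scign000003 1' 'x scign0000031 2' 'y abc 3' 'y a.c 4' > "$tmp/file1"

# -F: patterns are literal strings, so "a.c" does not match "abc".
# -w: a match must be a whole word, so "scign000003" does not match "scign0000031".
# -f: read the patterns from file2, one per line.
grep -Fwf "$tmp/file2" "$tmp/file1" > "$tmp/out"
cat "$tmp/out"
```

With this data only the lines "x scign000003 1" and "y a.c 4" should come out. Drop the -F and "y abc 3" would match too, because the dot would then be treated as a regex wildcard.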


Here's how to do it in awk:

awk 'NR==FNR{pats[$0]; next} $2 in pats' File2 File1

Using a 60,000 line File1 (your File1 repeated 8000 times) and a 6,000 line File2 (yours repeated 1200 times):

$ time grep -Fwf File2 File1 > ou2

real    0m0.094s
user    0m0.031s
sys     0m0.062s

$ time awk 'NR==FNR{pats[$0]; next} $2 in pats' File2 File1 > ou1

real    0m0.094s
user    0m0.015s
sys     0m0.077s

$ diff ou1 ou2

The diff prints nothing, so the two outputs are identical, and the awk is about as fast as the grep. One thing to note, though, is that the awk solution lets you pick a specific field to match on, so if anything from File2 shows up anywhere else in File1 you won't get a false match. It also matches on a whole field at a time, so if your target strings were of various lengths, "scign000003" would not match "scign0000031", for example (though the -w for grep gives similar protection for that).
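A small sketch of that difference, again on made-up data (the sample lines and temp directory are assumptions of mine; only the two commands come from above):

```shell
# Hypothetical sample data, invented purely for illustration.
tmp=$(mktemp -d)
printf '%s\n' 'scign000003' > "$tmp/File2"
printf '%s\n' 'a scign000003 x' 'scign000003 other y' 'b scign0000031 z' > "$tmp/File1"

# grep -Fw matches the pattern as a whole word anywhere on the line,
# so the line with the pattern in field 1 is printed as well.
grep -Fwf "$tmp/File2" "$tmp/File1" > "$tmp/grep_out"

# awk: while reading the first file (NR==FNR) store each line as an array key,
# then print only those lines of the second file whose field 2 is one of the keys.
awk 'NR==FNR{pats[$0]; next} $2 in pats' "$tmp/File2" "$tmp/File1" > "$tmp/awk_out"

echo "grep:"; cat "$tmp/grep_out"
echo "awk:";  cat "$tmp/awk_out"
```

Here grep should print the first two lines while awk prints only "a scign000003 x", since that's the only line with the string in the second field. Neither prints the "scign0000031" line.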

For completeness, here's the timing for the other awk solution posted elsewhere in this thread:

$ time awk 'BEGIN{i=0}FNR==NR{a[i++]=$1;next}{for(j=0;j<i;j++)if(index($0,a[j]))print $0}' File2 File1 > ou3

real    3m34.110s
user    3m30.850s
sys     0m1.263s

And here's the timing I get for the Perl script Mark posted:

$ time ./go.pl > out2

real    0m0.203s
user    0m0.124s
sys     0m0.062s