Find intersection of lines in two files

Simple comm + sort solution:

comm -12 <(sort file1) <(sort file2)
  • -12: suppress columns 1 and 2 (lines unique to FILE1 and to FILE2, respectively), so only the lines that appear in both files are printed
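A minimal sketch of the comm one-liner on made-up sample data. The file names and contents are hypothetical. Process substitution <(...) needs bash, zsh or ksh, not plain sh. Setting LC_ALL=C for both sort and comm is an added precaution so that both tools use the same collation order.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the sample files don't clobber anything
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: "apple" and "cherry" occur in both files
printf '%s\n' banana cherry apple > file1
printf '%s\n' cherry date apple   > file2

# -1 and -2 hide the "only in file1" and "only in file2" columns,
# leaving the common lines (printed in sorted order, not original order)
LC_ALL=C comm -12 <(LC_ALL=C sort file1) <(LC_ALL=C sort file2)
```

comm walks the two sorted inputs like a merge, so duplicates are paired off. A line that occurs twice in both files is printed twice. Use sort -u instead of sort if each common line should appear only once.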

In awk (note that this loads the whole first file into memory):

$ awk 'NR==FNR { lines[$0]=1; next } $0 in lines' file1 file2 
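A quick run of the one-liner on hypothetical sample data. The output follows the order of file2. A line is printed every time it occurs in file2, because lines[] only records whether a line was seen in file1, not how often.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: "a" appears twice in file2
printf '%s\n' a b c   > file1
printf '%s\n' c a a d > file2

# While reading file1 (NR==FNR), store each line as an array key;
# while reading file2, print every line that is already a key
awk 'NR==FNR { lines[$0]=1; next } $0 in lines' file1 file2
```

This prints c, a, a: file2's order, with the repeated "a" kept.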

Or, if you want to keep track of how many times a given line appears:

$ awk 'NR==FNR { lines[$0] += 1; next } lines[$0] {print; lines[$0] -= 1}' file1 file2
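On hypothetical sample data, each occurrence in file1 adds one to a counter and each printed match in file2 uses one up. A line is therefore printed as many times as the smaller of its two counts, in file2's order.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: "a" is 2x in file1 and 3x in file2,
# "b" is 1x in file1 and 2x in file2
printf '%s\n' a a b     > file1
printf '%s\n' a a a b b > file2

# Count occurrences in file1, then print a file2 line only while
# its remaining count is non-zero, decrementing on each print
awk 'NR==FNR { lines[$0] += 1; next } lines[$0] {print; lines[$0] -= 1}' file1 file2
```

This prints a, a, b: two copies of "a" and one of "b".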

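join can do this too, but it requires sorted input, so you have to sort both files first, which loses the original ordering. Also, by default join compares only the first blank-separated field of each line, not the whole line, so it gives a true line intersection only when the lines contain no blanks: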

$ join <(sort file1) <(sort file2)
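A run on hypothetical sample data where two lines share only their first field. LC_ALL=C is an added precaution so that sort and join agree on ordering.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: "apple pie" and "apple tart" are different lines
printf '%s\n' 'apple pie'  banana > file1
printf '%s\n' 'apple tart' banana > file2

# join matches on field 1 ("apple") and prints the join field followed by
# the remaining fields from file1 and then from file2
LC_ALL=C join <(LC_ALL=C sort file1) <(LC_ALL=C sort file2)
```

This prints "apple pie tart" and "banana". The first line is not a line from either file. comm, which compares whole lines, does not have this problem.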


awk 'NR==FNR { p[NR]=$0; next }
     { for (val in p) if ($0 == p[val]) { delete p[val]; print; break } }' file1 file2

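Its advantage is that it never prints an entry of file1 more than once and never compares an entry again after it has been matched. It is not necessarily the fastest, though. For every line of file2, the inner loop scans all remaining lines of file1, so the work grows with the product of the two file sizes, whereas the $0 in lines version above does a single hash lookup per line. On large files it is therefore likely to be slower.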
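A run on hypothetical sample data, using the version with break after the match. Without the break, a single line of file2 that equals several lines of file1 would be printed once per match.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: "a" occurs twice in file1, "b" twice in file2
printf '%s\n' a a b > file1
printf '%s\n' a b b > file2

# Store file1 by line number; each line of file2 consumes at most one
# matching entry (break), which is deleted so it is never compared again
awk 'NR==FNR { p[NR]=$0; next }
     { for (val in p) if ($0 == p[val]) { delete p[val]; print; break } }' file1 file2
```

This prints a, b. The second "b" in file2 finds no unmatched "b" left in file1.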


grep -Fxf file1 file2

This prints a common line once for every time it occurs in file2, so a line repeated in file2 shows up several times.
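Here -F treats each line of file1 as a fixed string rather than a regular expression, -x requires it to match a whole line, and -f reads the patterns from file1. A run on hypothetical sample data, plus one possible way (an addition, not part of the answer) to drop the repeats while keeping file2's order:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: "b" occurs twice in file2
printf '%s\n' a b     > file1
printf '%s\n' b c b a > file2

# Prints b, b, a: every matching line of file2, repeats included
grep -Fxf file1 file2

# Keep only the first occurrence of each matching line: b, a
grep -Fxf file1 file2 | awk '!seen[$0]++'
```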


For fun (should be much slower than grep):

sort -u file1 >t1
sort -u file2 >t2
sort t1 t2 | uniq -d
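The same idea can be written without the temporary files t1 and t2 by using process substitution (bash, zsh or ksh). sort -u leaves each line at most once per file, so after merging, a line appears twice only if it was in both files, and uniq -d prints one copy of each such line. The sample data below is hypothetical.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data with duplicates inside each file
printf '%s\n' a b b c > file1
printf '%s\n' b c c d > file2

# Deduplicate each file, merge the results, keep lines seen twice: b, c
sort <(sort -u file1) <(sort -u file2) | uniq -d
```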