In R, find whether two files differ

Without reading the files into memory, which helps if they are too large (TRUE means the contents are identical):

library(tools)
md5sum("file_1.txt") == md5sum("file_2.txt")
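The check above can be wrapped in a small helper. This is only a sketch: `files_differ` is a made-up name, not part of `tools`, and the size check is an extra shortcut I'm assuming is worthwhile for big files, since files of different sizes cannot have the same contents.

```r
library(tools)

# Hypothetical helper: TRUE when the two files have different contents.
files_differ <- function(path_a, path_b) {
  stopifnot(file.exists(path_a), file.exists(path_b))
  # Different sizes mean different contents, so hashing can be skipped.
  if (file.size(path_a) != file.size(path_b)) {
    return(TRUE)
  }
  # md5sum() returns a vector named by file path; unname() drops the names
  # so the result is a plain logical value.
  unname(md5sum(path_a)) != unname(md5sum(path_b))
}

# Usage with two temporary files holding the same text
f1 <- tempfile()
f2 <- tempfile()
writeLines(c("alpha", "beta"), f1)
writeLines(c("alpha", "beta"), f2)
files_differ(f1, f2)
```

Neither step loads the file contents into an R object, so memory use should stay small regardless of file size.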

The closest thing to the Unix diff command is diffr. It shows a really nice side-by-side window with all the differing lines marked in color.

library(diffr)
diffr(filename1, filename2)



I realize this is not exactly what you're asking for, but I'm posting it for the benefit of others who run into this question wanting to see the full diff and who are willing to tolerate external dependencies. In that case, diffobj will show the differences with a real diff that works on Windows, using the same algorithm as GNU diff. In this example, we compare the text of Moby Dick to a version of it with 5 lines modified:

library(diffobj)
diffFile(mob.1.txt, mob.2.txt)   # or `diffChr` if you have the data in R already


If you want something faster while still getting the locations of the differences, you can get the shortest edit script from the same package:

ses(readLines(mob.1.txt), readLines(mob.2.txt))
# [1] "1127c1127"   "2435c2435"   "6417c6417"   "13919c13919"
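If you need the line numbers as integers rather than as strings, the edit script can be parsed with base R. This is a rough sketch that assumes the entries follow the GNU diff "normal" form, i.e. a line or `start,end` range, then one of `a`, `c` or `d`, then the range in the second file. `ses_start_lines` is a hypothetical helper name, not a diffobj function.

```r
# Hypothetical helper: first affected line in file 1 for each edit command.
ses_start_lines <- function(script) {
  # Keep only the part before the a/c/d letter (the range in the first file).
  first_file <- sub("[acd].*$", "", script)
  # For a range such as "3,5" keep the starting line.
  as.integer(sub(",.*$", "", first_file))
}

script <- c("1127c1127", "2435c2435", "6417c6417", "13919c13919")
ses_start_lines(script)
# [1]  1127  2435  6417 13919
```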

Code to get the Moby Dick data (note that I didn't set a seed, so you'll get different lines):

moby.dick.url <- 'http://www.gutenberg.org/files/2701/2701-0.txt'
moby.dick.raw <- moby.dick.UC <- readLines(moby.dick.url)
to.UC <- sample(length(moby.dick.raw), 5)
moby.dick.UC[to.UC] <- toupper(moby.dick.UC[to.UC])

mob.1.txt <- tempfile()
mob.2.txt <- tempfile()

writeLines(moby.dick.raw, mob.1.txt)
writeLines(moby.dick.UC, mob.2.txt)
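If you want the same lines to be modified on every run, one option is to call `set.seed()` before `sample()`. The sketch below shows the idea on a small stand-in vector instead of the downloaded text; the seed value and the variable names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, makes sample() repeatable

# Stand-in for the downloaded text: 100 lowercase lines
original <- sprintf("line number %d of the stand-in text", 1:100)
modified <- original

# Upper-case 5 randomly chosen lines, as in the Moby Dick example
to_upper <- sample(length(original), 5)
modified[to_upper] <- toupper(modified[to_upper])

# Positions where the two versions differ
changed <- which(original != modified)
```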

Tags: Diff, R