Grep only the last x GB of a huge log file (>14 GB)?

Solution 1:

I guess you could use tail to output only the last 4 GB or so by using the -c switch:

-c, --bytes=[+]NUM
output the last NUM bytes; or use -c +NUM to output starting with byte NUM of each file
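
For example, to search only roughly the last 4 GiB (a sketch assuming GNU tail; the 4G size, file name, and pattern are placeholders):

tail -c 4G file | grep something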

You could probably do something with dd too, by skipping ahead to the offset where you want to start. Note that skip is counted in blocks of bs bytes, so to skip the first 12 GiB with a 1 MiB block size:

dd if=file bs=1024k skip=12288 | grep something
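
If your dd is a reasonably recent GNU dd, iflag=skip_bytes should let you give skip as a byte count directly and avoid the block arithmetic (treat this as a sketch that assumes the flag is available on your system):

dd if=file bs=1M skip=12G iflag=skip_bytes | grep something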

Solution 2:

I'm just posting this because some of the comments asked for it.

What I ended up using (on a 15 GB file) was the command below. It ran very fast and saved me a ton of time.

tail -f -c 14G file | grep something
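
(Note: -f keeps tail following the file for newly appended data, so the pipeline won't exit on its own; drop -f if you only want a one-shot search.)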

I also did a very rudimentary benchmark on the same file. I tested:

grep xxx file
# took forever (> 5 minutes)

dd if=file bs=1 skip=14G | grep xxx
# very fast, < 1 sec

tail -c 14g file | grep xxx
# pretty fast, < 2 sec

The tail command is just a bit shorter.

NB: the size suffixes accepted (g vs. G) differ per command (Ubuntu 15.10).


Solution 3:

This doesn't answer the title question, but it will do what you want. Use tac to reverse the file, then use grep to find your string. If the string occurs only once, or a known number of times, in the file, let it run until it finds that many occurrences. That way, if your assumption about where the string sits in the file is wrong, it will still be found. If you do want to limit the search, you can use head for that; the head command would go between the tac and the grep (see the sketch after the command below).

So the command looks like:

tac < logfile | grep myString
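
To limit how far back the search goes, here is a rough sketch (assuming GNU head and grep; the 4G figure and myString are placeholders). Since tac emits the newest lines first, capping its output caps the search depth:

tac < logfile | head -c 4G | grep myString

And if the string is expected only once, grep -m 1 stops at the first (i.e. most recent) match:

tac < logfile | grep -m 1 myString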