How do I sort very large files?

That isn't exactly a Java problem. You need to look into an efficient algorithm for sorting data that doesn't fit entirely in memory. A few adaptations to merge sort can achieve this.

Take a look at this: http://en.wikipedia.org/wiki/Merge_sort

and: http://en.wikipedia.org/wiki/External_sorting

Basically, the idea is to break the file into smaller pieces that fit in memory, sort each piece (with merge sort or any other method), and then use the merge step of merge sort to combine the sorted pieces into the new, sorted file.
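To make the split / sort / merge idea concrete, here is a minimal sketch in Java. It is not taken from any particular library: the class name ExternalSortSketch, the maxLinesPerChunk parameter, and the choice to compare whole lines lexicographically are all assumptions made for illustration. A real version would probably size chunks by bytes and use a comparator that matches your record format.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: sorts a text file line by line without loading it all.
class ExternalSortSketch {

    // One open sorted chunk plus the line currently at its front.
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(Path file) throws IOException {
            reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
            head = reader.readLine();
        }

        void advance() throws IOException {
            head = reader.readLine();
        }
    }

    static void sort(Path input, Path output, int maxLinesPerChunk) throws IOException {
        List<Path> chunks = splitIntoSortedChunks(input, maxLinesPerChunk);
        try {
            mergeChunks(chunks, output);
        } finally {
            for (Path chunk : chunks) {
                Files.deleteIfExists(chunk); // clean up the temporary chunk files
            }
        }
    }

    // Phase 1: read at most maxLinesPerChunk lines at a time, sort them in
    // memory, and write each sorted batch to its own temporary file.
    static List<Path> splitIntoSortedChunks(Path input, int maxLinesPerChunk) throws IOException {
        List<Path> chunks = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() >= maxLinesPerChunk) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
        }
        if (!buffer.isEmpty()) {
            chunks.add(writeSortedChunk(buffer));
        }
        return chunks;
    }

    private static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines); // lexicographic order (an assumption)
        Path chunk = Files.createTempFile("sort-chunk-", ".txt");
        Files.write(chunk, lines, StandardCharsets.UTF_8);
        return chunk;
    }

    // Phase 2: k-way merge. The priority queue always yields the chunk whose
    // front line is smallest, so only one line per chunk is held in memory.
    static void mergeChunks(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Run> queue = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        List<Run> opened = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path chunk : chunks) {
                Run run = new Run(chunk);
                opened.add(run);
                if (run.head != null) {
                    queue.add(run);
                }
            }
            while (!queue.isEmpty()) {
                Run smallest = queue.poll();
                out.write(smallest.head);
                out.newLine();
                smallest.advance();
                if (smallest.head != null) {
                    queue.add(smallest); // re-insert with its new front line
                }
            }
        } finally {
            for (Run run : opened) {
                run.reader.close();
            }
        }
    }
}
```

A call would look like ExternalSortSketch.sort(inputPath, outputPath, 1_000_000). Every chunk file is open at the same time during the merge, so with a very large number of chunks you would need to merge in several passes instead.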


Since your records are already in flat-file text format, you can pipe them into UNIX sort(1), e.g. sort -n -t' ' -k1,1 < input > output. It will automatically chunk the data and perform a merge sort using available memory and temporary files in /tmp. If /tmp doesn't have enough free space for those temporary files, add -T /tmpdir to the command to point sort at a directory that does.
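If the sort has to be started from a Java program, one option is to launch the same command with ProcessBuilder. This is only a sketch: the class and method names are hypothetical, it assumes a sort binary is on the PATH, and setting LC_ALL=C is my own addition to keep the ordering independent of the machine's locale. It uses -o for the output file and passes the input file as an argument instead of shell redirection.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the system sort(1) command.
class UnixSortRunner {

    // Sorts input numerically on its first space-separated field and writes
    // the result to output. Returns the exit code of sort (0 means success).
    static int sortNumericFirstField(Path input, Path output, Path tempDir)
            throws IOException, InterruptedException {
        List<String> command = Arrays.asList(
                "sort", "-n", "-t", " ", "-k1,1",
                "-T", tempDir.toString(),   // where sort puts its temporary files
                "-o", output.toString(),    // output file instead of "> output"
                input.toString());
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().put("LC_ALL", "C"); // assumption: locale-independent order
        builder.inheritIO();                      // let sort's error messages reach the console
        return builder.start().waitFor();
    }
}
```

Check the returned exit code before reading the output file, since sort reports problems such as a full temporary directory through a non-zero status.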

It's quite funny that everyone is telling you to download huge C# or Java libraries or implement merge sort yourself when you can use a tool that ships with practically every Unix-like system and has been around for decades.


You need an external merge sort to do that. Here is a Java implementation of it that sorts very large files.

Tags: Java, Sorting, File