Randomly mix the lines of a 3-million-line file

Takes only a few seconds in Python:

import random

with open('3mil.txt') as f:
    lines = f.readlines()
random.shuffle(lines)  # shuffles the list in place
with open('3mil.txt', 'w') as f:
    f.writelines(lines)
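One edge case worth guarding against: if the file's last line has no trailing newline, shuffling can move it into the middle, where it gets glued onto the next line. The sketch below wraps the same in-memory approach in a helper; the name `shuffle_file` and its `src`/`dst` parameters are hypothetical, not part of the original answer.

```python
import random


def shuffle_file(src, dst):
    """Shuffle the lines of src in memory and write them to dst.

    src and dst may be the same path, because the whole file is read
    before anything is written.
    """
    with open(src) as f:
        lines = f.readlines()
    # Make sure the last line ends with a newline; otherwise it would be
    # concatenated with whichever line lands after it.
    if lines and not lines[-1].endswith('\n'):
        lines[-1] += '\n'
    random.shuffle(lines)  # in-place shuffle, O(n)
    with open(dst, 'w') as f:
        f.writelines(lines)
```

Usage: `shuffle_file('3mil.txt', '3mil.txt')` reproduces the overwrite-in-place behaviour above.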

import random

with open('the_file') as source:
    # Pair every line with a random sort key.
    data = [(random.random(), line) for line in source]
data.sort()  # ordering by the random keys shuffles the lines
with open('another_file', 'w') as target:
    for _, line in data:
        target.write(line)

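That should do it. Three million lines will fit into most machines' memory unless the lines are HUGE: at 512 characters per line, the text alone is about 1.5 GB (3,000,000 × 512 bytes), before Python's per-object overhead.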
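The decorate-sort-undecorate idea from this answer can also be packaged as a reusable function. This is a sketch: the name `shuffle_by_random_keys` and the optional `seed` parameter (for a reproducible order) are additions, not part of the original answer. Sorting only on the random key means lines are never compared, even on the rare tie.

```python
import random


def shuffle_by_random_keys(lines, seed=None):
    """Return a new list with the lines in random order.

    Each line is tagged with a random float and the list is sorted by
    that tag alone. Passing the same seed gives the same order.
    """
    rng = random.Random(seed)
    keyed = [(rng.random(), line) for line in lines]
    keyed.sort(key=lambda pair: pair[0])  # compare keys only, never lines
    return [line for _, line in keyed]
```

Sorting costs O(n log n), whereas `random.shuffle` is O(n), so for a plain shuffle the first answer's approach is the simpler choice.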


I just tried this on a file with 4.3M lines, and the fastest option was the 'shuf' command on Linux. Use it like this:

shuf huge_file.txt -o shuffled_lines_huge_file.txt

It took 2-3 seconds to finish.
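If the lines really are too long for the whole file to fit in memory, one alternative is to keep only the byte offset of each line, shuffle those offsets, and copy the lines one at a time by seeking back into the source. The sketch below assumes `src` and `dst` are different paths; the function name `shuffle_large_file` and its `seed` parameter are hypothetical. Memory use grows with the number of lines rather than their length, but the random seeks can make it much slower than the in-memory versions, especially on spinning disks.

```python
import random


def shuffle_large_file(src, dst, seed=None):
    """Shuffle the lines of src into dst (a different path).

    Only the byte offset of each line is held in memory; lines are
    re-read from src one by one in shuffled order.
    """
    offsets = []
    with open(src, 'rb') as f:
        pos = 0
        for line in f:          # binary iteration splits on b'\n'
            offsets.append(pos)
            pos += len(line)
    random.Random(seed).shuffle(offsets)
    with open(src, 'rb') as f, open(dst, 'wb') as out:
        for off in offsets:
            f.seek(off)
            line = f.readline()
            if not line.endswith(b'\n'):
                line += b'\n'   # the last line may lack a newline
            out.write(line)
```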

Tags: Python, Vim, Random