What's the best way to read and parse a large text file over the network?

If you are reading a sequential file, you want to read it line by line over the network rather than downloading the whole thing first. That requires a transfer method capable of streaming, so check whether your I/O stack supports it before committing to an approach.
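As a minimal sketch of the streaming idea, assuming the log happens to be exposed over HTTP at a hypothetical URL (swap in whatever protocol your server actually speaks): with `requests`, `stream=True` keeps the client from buffering the whole body, and `iter_lines()` hands you lines as they arrive.

```python
import requests

def process(line):
    # Placeholder for your actual parsing logic.
    print(line)

# Hypothetical URL; the point is stream=True, which keeps requests
# from reading the entire response into memory before you see it.
url = "http://example.com/logs/app.log"

with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    # iter_lines() yields each line as it comes off the wire.
    for line in resp.iter_lines(decode_unicode=True):
        if line:  # skip blank keep-alive chunks
            process(line)
```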

Large I/O operations like this won't benefit much from multithreading, since you can probably parse the lines faster than the network can deliver them.

Your other good option is to run the log parser on the server and download only the results, for example:
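Here is a rough sketch of that idea, assuming you have ssh access to the server and your "parser" can be expressed as a remote command; the host, path, and pattern are all hypothetical.

```python
import subprocess

def handle(line):
    # Placeholder for whatever you do with each matching line.
    print(line)

# Run the filter (here just grep) on the remote machine, so only
# the matching lines ever cross the network.
cmd = ["ssh", "user@loghost", "grep", "ERROR", "/var/log/app.log"]

with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
    for line in proc.stdout:
        handle(line.rstrip("\n"))
```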


From a performance perspective, the better option is to do your parsing on the remote server. Outside of exceptional circumstances, your network speed will be the bottleneck, so limiting the amount of data you send over the wire will greatly improve performance.

This is one of the reasons so many databases use stored procedures that run at the server end.

Any improvement in parsing speed from multithreading (if there is one at all) will be swamped by the relative slowness of the network transfer.

If you're committed to transferring your files before parsing them, consider using on-the-fly compression during the transfer. There are, for example, SFTP servers that will compress on the fly. At the local end you could use something like libcurl for the client side of the transfer, which also supports on-the-fly decompression.
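As one concrete, hedged example: OpenSSH's `sftp` (and `scp`) accept `-C`, which enables ssh-level compression for the whole transfer, and over HTTP `curl --compressed` asks the server for a gzip-encoded body and decompresses it client side. A minimal sketch with a hypothetical host and paths:

```python
import subprocess

# Fetch the file with ssh-level compression enabled (-C). The host
# and both paths are hypothetical; text compresses well, so this can
# substantially cut the bytes on the wire for log files.
subprocess.run(
    ["sftp", "-C", "user@loghost:/var/log/app.log", "./app.log"],
    check=True,
)
```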