Fastest way to read large binary file in Haskell?

To elaborate on @Cubic's comment, while there's a general consensus that lazy I/O should be avoided in production code and replaced with a streaming approach, this is not directly related to performance. If you're writing a program to do some one-off processing of a large file, as long as you have a lazy I/O version running fine now, there's probably no good performance reason to convert it over to a streaming package.

In fact, streaming is more likely to add some overhead, so I suspect that a well-optimized lazy I/O solution would outperform a well-optimized streaming solution in most cases.

The main reasons for avoiding lazy I/O have been discussed previously on Stack Overflow. In a nutshell, lazy I/O makes it difficult to manage resources consistently (e.g., file handles and network sockets), makes it hard to reason about space usage (e.g., a small program change can cause your memory usage to explode), and is occasionally "unsafe" if the timing and ordering of the I/O in question matters (usually not a problem if you're just reading in one set of files and/or writing out another set of files).

Short-running utility programs for reading and/or writing large files are probably good candidates to be written in a lazy I/O style. As long as they don't have any obvious space leaks when they're run, they're probably fine.
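As a rough illustration of the lazy I/O style for such a utility, here is a minimal sketch that counts how often a byte occurs in a file. The function name countByte is made up for this example; it assumes only the bytestring package, and it relies on Data.ByteString.Lazy.readFile reading the file in chunks on demand.

```haskell
import qualified Data.ByteString.Lazy as BL
import           Data.Int  (Int64)
import           Data.Word (Word8)

-- Hypothetical helper: count occurrences of one byte in a file.
-- BL.readFile is lazy, so chunks are read only as BL.count demands them.
countByte :: Word8 -> FilePath -> IO Int64
countByte w path = do
    contents <- BL.readFile path
    -- Force the count before returning, so the whole file is consumed
    -- here rather than at some later, harder-to-predict point.
    pure $! BL.count w contents
```

Because nothing else holds on to contents, already-processed chunks can be garbage collected, so memory use should stay roughly constant. Keeping a second reference to contents elsewhere is the kind of small change that can introduce a space leak.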


With only streaming and bytestring, one can write something like:

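import           Data.ByteString (ByteString)
import qualified Data.ByteString
import           Streaming
import qualified Streaming.Prelude as S
import           System.IO

-- Read the handle in chunks of at most chunkSize bytes, yielding each
-- chunk until hGet returns an empty ByteString (end of file).
fromHandle :: Int -> Handle -> Stream (Of ByteString) IO ()
fromHandle chunkSize h =
    S.untilRight $ do bytes <- Data.ByteString.hGet h chunkSize
                      pure $ if Data.ByteString.null bytes then Right ()
                                                           else Left bytes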

This uses hGet and null from bytestring, and untilRight from streaming. You will need withFile to obtain the Handle, and the Stream must be consumed within the callback, because the handle is closed once withFile returns:

-- Copy the file to stdout, 4096 bytes at a time.
dump :: FilePath -> IO ()
dump file = withFile file ReadMode go
 where
   go :: Handle -> IO ()
   go = S.mapM_ (Data.ByteString.hPut stdout) . fromHandle 4096
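
The same chunked reader can feed other consumers. As a sketch under the same assumptions (only streaming and bytestring), the following folds over the chunks to compute a file's size without holding more than one chunk at a time. The name fileSize is made up for this example, fromHandle is repeated so the block stands alone, and withBinaryFile is used here simply because the question concerns binary files.

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as B
import           Streaming
import qualified Streaming.Prelude as S
import           System.IO

-- Chunked reader, repeated here so this example is self-contained.
fromHandle :: Int -> Handle -> Stream (Of ByteString) IO ()
fromHandle chunkSize h =
    S.untilRight $ do
        bytes <- B.hGet h chunkSize
        pure $ if B.null bytes then Right () else Left bytes

-- Hypothetical helper: sum the lengths of all chunks in the file.
-- The stream is fully consumed inside the withBinaryFile callback,
-- so the handle is still open for every read.
fileSize :: FilePath -> IO Int
fileSize file = withBinaryFile file ReadMode $ \h ->
    S.fold_ (\acc chunk -> acc + B.length chunk) 0 id (fromHandle 4096 h)
```

Only the consumer changes between dump and fileSize; the reading logic and the resource handling stay the same.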

Tags:

Haskell