Fastest way to read large binary file in Haskell?
To elaborate on @Cubic's comment, while there's a general consensus that lazy I/O should be avoided in production code and replaced with a streaming approach, this is not directly related to performance. If you're writing a program to do some one-off processing of a large file, as long as you have a lazy I/O version running fine now, there's probably no good performance reason to convert it over to a streaming package.
In fact, streaming is likely to add some overhead, so I suspect that a well-optimized lazy I/O solution would outperform a well-optimized streaming solution in most cases.
The main reasons for avoiding lazy I/O have been previously discussed on SO. In a nutshell, lazy I/O makes it difficult to consistently manage resources (e.g., file handles and network sockets), makes it hard to reason about space usage (e.g., a small program change can cause your memory usage to explode), and is occasionally "unsafe" if the timing and ordering of the I/O in question matters (usually not a problem if you're just reading in one set of files and/or writing out another set of files).
Short-running utility programs for reading and/or writing large files are probably good candidates to be written in a lazy I/O style. As long as they don't have any obvious space leaks when they're run, they're probably fine.
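As a rough illustration of what such a lazy I/O utility might look like, here is a minimal sketch using only bytestring. The function name countZeros and the task itself (counting zero bytes) are hypothetical, chosen only to show the shape of the code:

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- Hypothetical one-off job: count the zero bytes in a file.
countZeros :: FilePath -> IO Int64
countZeros path = do
  -- BL.readFile returns a lazy ByteString; chunks are read on demand.
  contents <- BL.readFile path
  -- Force the count before returning, so the whole file is consumed here.
  pure $! BL.count 0 contents
```

Because nothing else holds on to contents, already-processed chunks should be collectable as the count proceeds, which is what keeps this style workable for large files. That property is easy to lose with a small change, which is the space-usage concern mentioned above.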
With only streaming and bytestring, one can write something like:
    import Data.ByteString
    import Streaming
    import qualified Streaming.Prelude as S
    import System.IO

    fromHandle :: Int -> Handle -> Stream (Of ByteString) IO ()
    fromHandle chunkSize h = S.untilRight $ do
        bytes <- Data.ByteString.hGet h chunkSize
        pure $ if Data.ByteString.null bytes then Right () else Left bytes
This uses hGet and null from bytestring, and untilRight from streaming. You will need to use withFile to get the Handle, and consume the Stream within the callback:
    dump :: FilePath -> IO ()
    dump file = withFile file ReadMode go
      where
        go :: Handle -> IO ()
        go = S.mapM_ (Data.ByteString.hPut stdout) . fromHandle 4096
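The same pattern works for consumers other than dumping to stdout. As a sketch, assuming the fromHandle definition above (repeated here so the block compiles on its own), a hypothetical countBytes helper could total the file size chunk by chunk. The name countBytes and the 65536-byte chunk size are illustrative choices, not part of the original answer:

```haskell
import qualified Data.ByteString as B
import Streaming (Of, Stream)
import qualified Streaming.Prelude as S
import System.IO (Handle, IOMode (ReadMode), withFile)

-- Same chunked reader as above, with bytestring imported qualified.
fromHandle :: Int -> Handle -> Stream (Of B.ByteString) IO ()
fromHandle chunkSize h = S.untilRight $ do
  bytes <- B.hGet h chunkSize
  pure $ if B.null bytes then Right () else Left bytes

-- Hypothetical helper: sum the chunk lengths to get the total byte count.
-- The stream is fully consumed inside the withFile callback, so the
-- handle is still open for every read.
countBytes :: FilePath -> IO Int
countBytes file = withFile file ReadMode $ \h ->
  S.sum_ (S.map B.length (fromHandle 65536 h))
```

The important part is that the fold runs to completion before withFile returns. Returning the Stream itself out of the callback would leave it reading from a closed handle.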