What is the use of buffering in Python's built-in open() function?

Enabling buffering means that you're not directly interfacing with the OS's representation of the file via its file-system API. Instead, a chunk of data is read from the raw OS file stream into an in-memory buffer; reads are served from that buffer until it is exhausted, at which point the next chunk is fetched. In terms of the objects you get back, you'll get a BufferedIOBase object wrapping an underlying RawIOBase (which represents the raw file stream).
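You can see these layers for yourself (assuming some existing file, example.txt here, opened in binary mode):

import io

with open('example.txt', 'rb') as f:
    print(isinstance(f, io.BufferedIOBase))  # True: a BufferedReader
    print(isinstance(f.raw, io.RawIOBase))   # True: the underlying raw FileIO

# buffering=0 (only allowed in binary mode) skips the buffer entirely
with open('example.txt', 'rb', buffering=0) as f:
    print(isinstance(f, io.RawIOBase))       # True: you get the raw stream directly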

What is the benefit of this? Well, interfacing with the raw stream can have high latency, because the operating system has to fool around with physical objects like the hard disk, and this may not be acceptable in all cases. Let's say you want to read three letters from a file every 5 ms and your file is on a crusty old hard disk, or even a network file system. Instead of hitting the raw file stream every 5 ms, it is better to load a bunch of bytes from the file into an in-memory buffer and then consume them at will.

The buffer size you choose will depend on how you're consuming the data. For the example above, a buffer size of 1 char would be awful, 3 chars would be alright, and any large multiple of 3 chars that doesn't cause a noticeable delay for your users would be ideal.
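A minimal sketch of that scenario (data.bin and the exact buffer size are illustrative; Python 3.8+ for the walrus operator):

import time

# Read 3 bytes every 5 ms; the raw stream is only touched when the buffer runs dry.
with open('data.bin', 'rb', buffering=3000) as f:  # 3000 = a large multiple of 3
    while chunk := f.read(3):                      # served from the in-memory buffer
        time.sleep(0.005)                          # do something with chunk every 5 ms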


You can also check the default buffer size by reading the read-only DEFAULT_BUFFER_SIZE attribute of the io module:

import io
print(io.DEFAULT_BUFFER_SIZE)
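Typically this prints 8192. Note that for binary files opened with the default buffering, Python chooses the buffer size using a heuristic based on the underlying device's block size, falling back to io.DEFAULT_BUFFER_SIZE.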

As described in the io module documentation.