How do Node.js Streams work?

The first thing to note is: node.js streams are not limited to HTTP requests. HTTP requests / Network resources are just one example of a stream in node.js.

Streams are useful for everything that can be processed in small chunks. They allow you to process potentially huge resources in smaller chunks that fit into your RAM more easily.

Say you have a file (several gigabytes in size) and want to convert all lowercase into uppercase characters and write the result to another file. The naive approach would read the whole file using fs.readFile (error handling omitted for brevity):

fs.readFile('my_huge_file', function (err, data) {
    var convertedData = data.toString().toUpperCase();

    fs.writeFile('my_converted_file', convertedData);
});

Unfortunately this approch will easily overwhelm your RAM as the whole file has to be stored before processing it. You would also waste precious time waiting for the file to be read. Wouldn't it make sense to process the file in smaller chunks? You could start processing as soon as you get the first bytes while waiting for the hard disk to provide the remaining data:

var readStream = fs.createReadStream('my_huge_file');
var writeStream = fs.createWriteStream('my_converted_file');
readStream.on('data', function (chunk) {
    var convertedChunk = chunk.toString().toUpperCase();
    writeStream.write(convertedChunk);
});
readStream.on('end', function () {
    writeStream.end();
});

This approach is much better:

  1. You will only deal with small parts of data that will easily fit into your RAM.
  2. You start processing once the first byte arrived and don't waste time doing nothing, but waiting.

Once you open the stream node.js will open the file and start reading from it. Once the operating system passes some bytes to the thread that's reading the file it will be passed along to your application.


Coming back to the HTTP streams:

  1. The first issue is valid here as well. It is possible that an attacker sends you large amounts of data to overwhelm your RAM and take down (DoS) your service.
  2. However the second issue is even more important in this case: The network may be very slow (think smartphones) and it may take a long time until everything is sent by the client. By using a stream you can start processing the request and cut response times.

On pausing the HTTP stream: This is not done at the HTTP level, but way lower. If you pause the stream node.js will simply stop reading from the underlying TCP socket. What is happening then is up to the kernel. It may still buffer the incoming data, so it's ready for you once you finished your current work. It may also inform the sender at the TCP level that it should pause sending data. Applications don't need to deal with that. That is none of their business. In fact the sender application probably does not even realize that you are no longer actively reading!

So it's basically about being provided data as soon as it is available, but without overwhelming your resources. The underlying hard work is done either by the operating system (e.g. net, fs, http) or by the author of the stream you are using (e.g. zlib which is a Transform stream and usually bolted onto fs or net).


The below chart seems to be a pretty accurate 10.000 feet overview / diagram for the the node streams class.

It represents streams3, contributed by Chris Dickinson.

enter image description here