Introduction to Node.js streams

Streams are a cool and important part of Node.js. They are very useful because they don't exhaust memory and they make it possible for us to handle data before it has been read in its entirety. In this article I will briefly introduce streams and show some very basic examples.

What are streams?

A stream is a sequence of data elements made available over time.

What?

OK, let’s have a look at what the official Node.js website tells us about streams:

“A stream is an abstract interface for working with streaming data in Node.js.”

Not much better… In reality, streams are not as abstract as the definition makes them seem.

Imagine that we have a large file consisting of a million lines or so. If we wanted to read all the data from this file at once, our computer's memory would have to work very hard, and it would take some time before the entire content of this huge file could be displayed.

Instead of grabbing the content at once, we can get it in smaller pieces. We receive a small portion of the file, then another piece and another one and so on. We process the data as they arrive: We can log them to the console or display them in the browser.

These pieces or chunks are much smaller than the original file, so they won't have as much of an impact on memory as loading the whole content of the file at once would.

Let’s use an analogy to make the concept of streams clearer. After an exciting weekend of shopping, you place the items you want to buy on the belt at the checkout. The belt is like a stream and the products are the chunks. The guy at the till can scan the products while you are placing them on the belt, and he doesn't have to wait for you to put them all on. You place the yogurt and the capsicum on the belt (i.e. you write to the stream), while the shop assistant takes them one by one and reads their price.

Another good way to make the concept of streams more digestible is to imagine a real, winding stream in a forest. The stream seamlessly flows and carries some leaves on it. These leaves can represent the data. We can throw some more leaves on the stream or, if we see some nice leaves, we can take them off and keep them.

Streams are very common in Node.js applications. We often have to fetch and display large files on websites or handle multiple tasks simultaneously. The great thing about streams is that we can start processing data while we are still receiving it, so we don't have to wait for the entire content to arrive before we can work with it.

Types of streams in Node.js

There are four different types of streams in Node.js:

Readable - Data can be read from these streams. What does that mean? We can get or extract data from them, like we can collect the leaves from the forest stream. This is the data that usually goes into our application, where we can do whatever we want with it. We can transform it, add 3 to each chunk if they are numbers, prepend words to them etc., you get the idea. Examples of readable streams are the stream returned by fs.createReadStream or the request in an http server.

Writable - These are streams that data can be written to. In our imaginary forest it's a stream onto which we can throw leaves while enjoying the calm and the scenery. The leaves on the forest stream will be carried away; we can see how they float along with the current. An example of a writable stream is process.stdout, the standard output, which is basically the terminal. console.log, which is well known by every developer, also writes to process.stdout. When we log the values of an array to the console, we simply place the data on the standard output stream in the background and see the data in the terminal.

Duplex - We can both read from and write to these streams. Imagine that our forest stream accepts new leaves while also letting us take leaves off of it. It's easy to imagine. The checkout belt can also be compared to a duplex stream, as we can put products on it while the shop assistant takes them off. An example of a duplex stream is net.Socket, which can be used e.g. in chat applications where users can both send and receive messages.

Transform - These streams “just” do what their name suggests: they transform data. They receive data at one end, do something with it and then pass it on to the next phase. Similarly to duplex streams, they are both readable and writable. Examples are the streams of the zlib module, which compress data. The short sketch after this list shows how each of the four types typically appears in code.
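To make the four types a bit more tangible, here is a minimal sketch showing one way to get hold of each of them, using the same examples as above (bigfile.txt is just a placeholder file name):

const fs = require('fs');
const net = require('net');
const zlib = require('zlib');

const readable = fs.createReadStream('./bigfile.txt'); // Readable: we can read data from it
const writable = process.stdout;                       // Writable: we can write data to it
const duplex = new net.Socket();                       // Duplex: readable and writable
const transform = zlib.createGzip();                   // Transform: readable, writable and it changes the data passing through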

Streams are event emitters

Streams inherit from, and are instances of, the EventEmitter class.

What does this mean?

This means that we can listen to these events, and when they occur (i.e. are emitted), we can do various things with the stream. Emitting an event means that the stream lets us know that something is happening: a chunk of data has arrived or, on the contrary, no more data will come, or even that something went wrong.

For example, we can listen to the data event on readable streams. This will signal that a chunk of data has arrived and is available for us to do some work or transformation with it. If we have, say, a stream of integers going from 1 to 1 000 000, the data event will let us know that a number (the chunk) has arrived and we can do whatever we want with it. For example, we can multiply the number by 2 and log the result to the console. Once number 1 is multiplied by 2, the next number (2) arrives and we can work with it as well. Eventually, we will log all even numbers from 2 to 2 000 000. The good thing with streams is that we won’t receive those 1 million numbers at once but we get them in several parts or chunks. (Note that this is a very simplified example and things are slightly more complicated in reality.)
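Here is a minimal sketch of that idea, assuming a Node.js version that has stream.Readable.from (the numbers generator is purely illustrative):

const { Readable } = require('stream');

// A generator producing the numbers 1, 2, 3, ... up to max.
function* numbers(max) {
  for (let i = 1; i <= max; i++) yield i;
}

// Readable.from turns any iterable into a readable stream.
const numberStream = Readable.from(numbers(1000000));

// The 'data' event fires once per chunk (here: once per number).
numberStream.on('data', (num) => {
  console.log(num * 2); // 2, 4, 6, ... eventually 2 000 000
});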

Streams emit other events as well, such as open, close or error, and we can attach so-called listeners to these events (like the multiplication by 2 in the last example) and do something when these events occur. These listeners are functions which contain the logic we want to apply to the stream of data.

.pipe()

Whenever we come across a readable stream, we need to extract or read the data from it; otherwise what's the point of receiving data? Somehow we should find a way to display the data we receive or write it to a file. It would be great if we could place the data coming from a readable stream onto a writable stream.

The pipe method, available on readable streams, serves exactly this purpose and connects the source stream to the destination stream. The formula is the following:

source.pipe(destination)

The source can be a readable or a duplex stream, and the destination has to be a stream we can write to (so it can be a writable, duplex or transform stream).

“Hey, I don’t really get that.”

OK, let’s take an example. Assume that we have a readable stream, say we are reading data from a big file. This is the source, from where we receive the data. Then, we call the pipe method on this readable stream and pass a writable stream, like the standard output (process.stdout) as destination. This way the data read from the file (source, readable stream) will be logged on the terminal (destination, writable stream). We will see this example later.

We can also chain multiple pipes together:

source1.pipe(destination1).pipe(destination2).pipe(destination3)

Why would we do such a horrible thing? Take the big file example again. We read the file (source1), compress the data (destination1), encrypt it (destination2) and then display it in the terminal (destination3). This is an absolutely realistic scenario.
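As a rough sketch of what such a chain could look like (the key and IV are throwaway values for illustration only, and piping the encrypted bytes to the terminal is done purely to mirror the example above):

const fs = require('fs');
const zlib = require('zlib');
const crypto = require('crypto');

// Throwaway key and initialization vector, for illustration only.
const key = crypto.randomBytes(32);
const iv = crypto.randomBytes(16);

fs.createReadStream('./bigfile.txt')                    // source1: read the file
  .pipe(zlib.createGzip())                              // destination1: compress
  .pipe(crypto.createCipheriv('aes-256-ctr', key, iv))  // destination2: encrypt
  .pipe(process.stdout);                                // destination3: display in the terminal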

Backpressure

It’s all good so far: We have a readable stream and have put the chunks of data on a writable stream.

But what happens if the data arrives too fast for the writable stream to deal with? Imagine that people are arriving at the stadium to watch a football game. One gate can only let in one person at a time. People keep coming, but they can get into the stadium much more slowly than they arrive, so first a smaller, then a bigger crowd forms at the gate. Fans must wait and queue up before they can enter the stadium.

This situation can also occur with streams: data can accumulate when reading happens faster than writing. This is called backpressure, and it needs to be managed, otherwise memory will soon be exhausted.

Luckily, Node.js can handle this situation and knows when to pause the readable stream and give the writable one a break. When we call pipe, backpressure is managed automatically: Node.js makes sure that only as much data goes through the pipe as the writable stream can deal with.
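To give an idea of what pipe spares us from, here is a rough, simplified sketch of handling backpressure by hand (the file names are just placeholders):

const fs = require('fs');

// Hypothetical source and destination, purely for illustration.
const source = fs.createReadStream('./bigfile.txt');
const destination = fs.createWriteStream('./copy.txt');

source.on('data', (chunk) => {
  const hasRoom = destination.write(chunk); // false means the writer's buffer is full
  if (!hasRoom) {
    source.pause();                                    // stop reading for a while
    destination.once('drain', () => source.resume());  // continue once the writer has caught up
  }
});

source.on('end', () => destination.end()); // no more data, close the writable side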

After this short introduction, let’s have a look at some examples.

Readable streams with fs.createReadStream

An excellent use case of readable streams is when we have a large file which needs to be read and displayed in the browser or in the terminal. It can be a video or just a file with lots of data.

When we try to read the large file at once, the whole thing is placed in memory, which is not good: loading the file takes a long time and memory is occupied for the entire duration.
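For contrast, this is roughly what reading the file in one go looks like; the callback only runs once the entire content is sitting in memory:

const fs = require('fs');

// The whole file is buffered in memory before the callback fires.
fs.readFile('./bigfile.txt', 'utf8', (err, data) => {
  if (err) throw err;
  console.log(data.length); // we can only start working with the data at this point
});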

But when we wrap our large file into a stream, the computer won't need that much memory, and the data can be processed (e.g. displayed in the browser) while it is being read. It has to be mentioned, though, that some memory is used in this case too (the internal buffer), but its size is way smaller than it would be without streaming.

How can we achieve that?

We can use the createReadStream method available in Node.js’s fs (file system) module. The method returns a readable stream, which also inherits from the EventEmitter class.

I created bigfile.txt, which contains 100 000 lines of text. That's not very much, but it will be sufficient to demonstrate how createReadStream works. The file looks like this:

Line 1: One line in the big data file
Line 2: One line in the big data file
Line 3: One line in the big data file
Line 4: One line in the big data file
Line 5: One line in the big data file
...

It goes to Line 100 000. No, I didn’t type it line by line. I created a writable stream which managed everything in a matter of seconds. I will write about how to do this in a future post.

First we need to require the fs module and then let’s define a readable stream called readable:

const fs = require('fs');

const readable = fs.createReadStream('./bigfile.txt', {
  encoding: 'utf8'
});

The first argument of createReadStream is the path to the file. I saved bigfile.txt inside the same folder as node-streams.js, the file that contains the stream code.

The second, optional argument is an options object with a few properties. Here I used the encoding property, which ensures that data is placed on the writable stream as a string and not as a Buffer, which would be an interesting but not really human-readable format.

The second parameter can also be a string, in which case it refers to the encoding:

const readable = fs.createReadStream('./bigfile.txt', 'utf8');

With createReadStream we can read the content of bigfile.txt in chunks, which doesn't exhaust the memory.

Now we can listen to the events the readable stream emits (remember, streams are event emitters as well):

readable.on('data', (chunk) => {
  console.log('Next chunk is:');
  console.log(chunk);
});

readable.on('error', (err) => {
  console.log(err);
});

readable.on('close', () => {
  console.log('File reading finished');
});

Whenever a chunk of data arrives, the data event is emitted, and we listen to it by logging a short message (“Next chunk is:”) followed by the chunk itself.

Similarly, if something goes wrong while reading the data, the error event is emitted and we can log the error.

Finally, when the whole file has been read and the underlying file is closed, the close event is emitted, and we can acknowledge this fact by sending a note to the console.

Now that our simple readable stream is ready, we can pipe it to a writable stream, which will in this case be the terminal (process.stdout):

readable.pipe(process.stdout);

If we run the file with Node.js (node node-streams.js), we will get 100 000 lines (or whatever data you created) displayed in the terminal.

Note that the standard output is not the only writable stream we can pipe to. It’s possible to copy data from one file to another by creating a writable stream and this case will be discussed in a future article.

Displaying the content of a file in the browser

createServer in the http module of Node.js returns a new instance of http.Server, which inherits from net.Server, which is an EventEmitter. This means that it emits the usual event emitter events plus some other ones.

Let’s create a simple Node.js server:

const http = require('http');
const fs = require('fs');

const server = http.createServer();

I save this code in a file called nodejs-streams-2.js.

We require the http and the fs modules and create a server, and then store it in the expressively named variable server.

server emits the request event when a request comes in from the client (the browser); this happens when we start the server and enter the URL in the browser's address bar.

As request is an event, we can listen to it and explicitly tell what we are going to do:

server.on('request', (req, res) => {
  if (req.url === '/readable') {
    const src = fs.createReadStream('./bigfile.txt');
    src.pipe(res);
  } else {
    res.write('Data not found.');
    res.end();
  }
});

So when the request event is emitted, our callback function is called. The callback has two parameters: req stands for the request and res for the response.

req (from the client) is a readable stream and res (from the server) is a writable stream.

Let’s display the data only for a specific URL, such as /readable. If we visit the /readable URL, we create a readable stream called src with fs.createReadStream and read data from bigfile.txt. Then we pipe it to res, which is a writable stream. This results in the content of the big file being displayed in the browser.

On any other URL we display a simple “Data not found.” message using the write method available on writable streams. The end method signals that no more data will be written to the stream.

All we have left now is to listen to a port:

server.listen(3000);

Now, if we open up the terminal and enter node nodejs-streams-2.js, we will start the server, but nothing else will happen yet. Head over to the browser, enter localhost:3000, and you will see the “Data not found.” message in the browser. Now change the URL to localhost:3000/readable and the content of bigfile.txt will be read and displayed.

Again, reading 100 000 lines of text is no big deal, but imagine that you need to serve a video, which is considerably larger, or millions of lines of content. We can make good use of streams in these cases.

Conclusion

Streams are good and cool, and they are everywhere in Node.js. The concept might seem foreign at first, but with a bit of practice one can become comfortable with them.

I hope you found this post useful. If so, see you next time!