A brief introduction to the child process module

It frequently happens that we need to run another process from our application, whether it's another Node.js process or a different program. For example, if our code needs Docker to work, it might be a good idea to first check whether Docker is installed and follow different paths depending on the result. In this post, I'll give a brief overview of the child_process module, which allows us to spawn additional processes.

Node.js really shines when it needs to execute I/O (input/output) tasks, like HTTP requests or database reads and writes.

Most of these I/O tasks are handled by the event loop. But what about CPU-intensive tasks like heavy calculations, archiving or the operations of the crypto module?

These operations may take a long time to complete and, since they block the event loop, they can have a bad effect on the server's performance.

One possible solution is to fork these operations to a child process.

The child_process module

Node.js’s child_process module has several methods that let us run CPU-intensive tasks in a separate process.

Child processes are independent from their parents (the process they were forked from), which means that they won’t block the flow of the parent process and the Node.js event loop. However, a communication channel is established between the parent and the child process, so the parent process can be notified when the child has done its job.

The main child_process methods

Node.js offers four methods for creating child processes: spawn, fork, exec and execFile.

The main or default method is spawn, and the other methods are built on top of it. There are some differences between the methods, which will briefly be discussed below.

The methods come in both asynchronous and synchronous versions. In most cases, it’s recommended to use the asynchronous ones: the synchronous versions block the Node.js event loop, which defeats much of the purpose of spawning a separate process, so their use cases are limited.

Because of this, in the rest of this post, I’ll only discuss the asynchronous methods and their features.

ChildProcess

Each (asynchronous) method returns a ChildProcess instance, which inherits from EventEmitter. This means that the parent process can set up listeners which are called when certain events occur in the child process.

Examples of these events are when the child process produces some data (a result) or when it terminates after it has finished running the code it was supposed to.

Let’s now have a brief look at each method, along with some examples. More detailed explanations will follow in future posts dedicated to each of them.

spawn()

As discussed above, spawn is the default child process method. It uses streams, which makes it well suited for processing large amounts of data without running into memory limitations.

With spawn, we can run various commands, and they don’t have to be written in Node.js. Programs written in other languages can also be executed in a separate process with the help of spawn.

spawn has one mandatory parameter, the command we want to execute in the child process. The second, optional parameter is for the command arguments, and they must be written in an array. With the help of the third, also optional argument, we can change the default settings. These options will be discussed in more detail in a future post on spawn.
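Just to illustrate the shape of a call that uses all three parameters, here is a minimal sketch; the cwd option used here is only one of the many available options:

const { spawn } = require('child_process')

// command, array of command arguments, options object;
// cwd makes the command run in the given folder instead of the current one
const child = spawn('ls', ['-la'], { cwd: '/tmp' })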

Let’s have a look at a very simple Linux command in a file called spawn.js:

const { spawn } = require('child_process')

const ls = spawn('ls', ['-la'])

ls.stdout.on('data', (data) => {
  console.log(`stdout: ${ data }`)
})

ls.on('close', (code) => {
  console.log(`Process exited with code ${ code }`)
})

console.log('This will be written to the console first.')

The command we want to execute is the good old ls -la to have a look at the contents of the current folder. Let’s run node spawn.js to see the result.

-la is the argument of the ls command and as such, we have to put it in an array.

ls is a ChildProcess instance, and as such, it emits events and we can register listeners to them in the parent process (the body of the spawn.js file).

It uses streams: stdout is a readable stream on ls and it represents the child process’s standard output. We can listen to its data event, which in this case will be the result of the ls -la command, i.e. the content of the current folder.

Another event ls emits is close. This event is emitted when the streams of the child process (stdin, stdout and stderr) are closed. The first parameter of the listener (code) is the exit code of the child process. If everything goes well, this code should be 0.
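For completeness, stderr and the error event can be handled in the same spirit. Here is a small sketch, where the non-existent folder name is just a made-up argument so that ls fails:

const { spawn } = require('child_process')

const ls = spawn('ls', ['no-such-folder'])

// stderr is also a readable stream; here it carries the error message from ls
ls.stderr.on('data', (data) => {
  console.log(`stderr: ${ data }`)
})

// the error event fires if the process could not be spawned at all
ls.on('error', (err) => {
  console.log(`Failed to start the process: ${ err.message }`)
})

ls.on('close', (code) => {
  console.log(`Process exited with code ${ code }`) // non-zero, because ls failed
})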

fork()

This method is a special case of spawn, because it always creates a new Node.js process. It also returns a ChildProcess instance, and everything else works the same way as with spawn.

Its mandatory first argument is the path where the module we want to run in a child process can be found.

fork is the little brother of spawn, and the main difference is that fork can only create new Node.js processes, while spawn can run any command.
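fork also sets up a message channel between the parent and the child on top of the standard streams, which the parent can use via send and the message event. As a minimal sketch, assume a companion file called heavy.js next to our main file; both the file name and the calculation are made up for illustration:

// fork.js
const { fork } = require('child_process')

const child = fork('./heavy.js')

// messages sent by the child arrive through the message event
child.on('message', (result) => {
  console.log(`Result from the child: ${ result }`)
})

// ...and we can send data to the child with send()
child.send(1e9)

console.log('The parent is not blocked while the child is counting.')

// heavy.js
process.on('message', (limit) => {
  let sum = 0
  for (let i = 0; i < limit; i += 1) {
    sum += i
  }
  process.send(sum)
  process.exit(0)
})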

exec()

The exec method differs from the previous two in that it accepts an optional callback, which runs when the child process has terminated. With exec, we are not tied to the event style; it’s also possible to use the traditional Node.js callback style.

The first argument is the command itself. The callback, if specified, receives the error, stdout and stderr arguments. The error code (if applicable) is available on error, which is an instance of the Error object:

const { exec } = require('child_process')

const ls = exec('ls -la', (err, stdout, stderr) => {
  if (err) {
    console.log(`Error with code ${ err.code } and signal ${ err.signal }`)
  }

  console.log(`stdout: ${ stdout }`)
  console.log(`stderr: ${ stderr }`)
})

console.log('this comes first')

The output should be the content of the current folder, just like with spawn above.

Of course, the stream-based approach also works, and it will result in the same output to the console:

ls.stdout.on('data', (data) => {
  console.log(`from stdout: ${ data }`)
})

exec creates a shell and executes the given command (any command) in that shell.
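Because a shell is involved, shell features such as pipes or wildcards also work. A quick sketch with an arbitrary example command:

const { exec } = require('child_process')

// the pipe is interpreted by the shell that exec creates
exec('ls -la | wc -l', (err, stdout, stderr) => {
  if (err) {
    console.log(`Error: ${ err }`)
    return
  }

  console.log(`Number of lines in the listing: ${ stdout.trim() }`)
})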

execFile()

execFile is very similar to exec. The difference is that we have to specify the executable file or the path to that file as the first argument, and it doesn’t create a shell by default. A shell can be spawned via the options, though.

Assume we want to run a shell script called echo.sh, which is in the same directory as the file execFile is called from:

#!/bin/sh
echo 'This is displayed with execFile.'

We can refer to this file in execFile’s second argument:

const path = require('path')
const { execFile } = require('child_process')

// resolve echo.sh relative to the directory of the current file
execFile('sh', [path.join(__dirname, 'echo.sh')], (err, stdout, stderr) => {
  if (err) {
    console.log(`Error: ${ err }`)
  }

  console.log(`stdout: ${ stdout }`) // stdout: This is displayed with execFile.
  console.log(`stderr: ${ stderr }`)
})

This method is great for short outputs (like checking whether Docker is installed), and as we have seen, any executable file can be run with it.
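Coming back to the Docker example from the introduction, a sketch of such a check could look like this, assuming the docker executable is on the PATH:

const { execFile } = require('child_process')

// if docker is not on the PATH, the callback receives an ENOENT error
execFile('docker', ['--version'], (err, stdout) => {
  if (err) {
    console.log('Docker does not seem to be installed.')
    return
  }

  console.log(`Docker is installed: ${ stdout.trim() }`)
})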

Conclusion

The child_process module of Node.js allows us to run calculation-heavy and time-consuming commands in another process. This way the parent process won’t be blocked by long-running code.

The methods come in both asynchronous and synchronous versions but we’ll need the async methods most of the time. These four methods (spawn, fork, exec and execFile) were discussed above.

This concludes the short introduction to child_process. More detailed posts on each method, as well as on the ChildProcess instance, will follow soon.

Thanks for reading and see you next time.