2015-11-09

A NodeJS Webserver

Let's write our own webserver using NodeJS!

This is only a stepping-stone on the route to building a WebSockets-based chat server in this multipart series.

Part 0: A NodeJS Webserver
Part 1: Intro to WebSockets, conceptually
Part 2: Writing the client side chat code for the browser in JavaScript
Part 3: Writing the server-side chat code in NodeJS

If you want to jump right on in, get the full source on GitHub.

Prerequisites

Since we're going to be using NodeJS for this project, pop on over to that site and get 'er installed.

Basic familiarity with JavaScript is also assumed, notably comfort with asynchronous functions and callbacks.

And we'll be running the server from the command line.

What is a webserver, anyway?

An animation's worth a thousand pictures, right?

Web server and client (probably a browser) at play. The image on the left represents a big stack of disk platters, whatever those are.

A webserver is a program that sits there waiting for connections from web clients. These clients are usually web browsers, but can be a variety of things, including curl or wget. Basically, anything that speaks HTTP can connect to a web server (because the server also speaks HTTP).

Once connected, the client makes an HTTP request, and the server provides an HTTP response. For example, a client might say, "Give me goats.jpg", and the server might respond, "Here is goats.jpg," followed by the data. Or the server might respond, "404 goats.jpg not found!"
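
To make that exchange a bit more concrete, here's a rough sketch of the kind of text that goes over the wire. The helper names buildRequest() and buildResponse() are made up for this illustration, and real browsers and servers usually send more headers than this.

```javascript
// Hypothetical helpers: build the raw text of a bare-bones HTTP/1.1
// request and response. HTTP separates lines with "\r\n", and a blank
// line marks the end of the headers.
function buildRequest(path, host) {
    return 'GET ' + path + ' HTTP/1.1\r\n' +
        'Host: ' + host + '\r\n' +
        '\r\n';
}

function buildResponse(statusCode, reason, contentType, body) {
    return 'HTTP/1.1 ' + statusCode + ' ' + reason + '\r\n' +
        'Content-Type: ' + contentType + '\r\n' +
        'Content-Length: ' + Buffer.byteLength(body) + '\r\n' +
        '\r\n' +
        body;
}

// "Give me goats.jpg"
console.log(buildRequest('/goats.jpg', 'localhost:3490'));

// "404 goats.jpg not found!"
console.log(buildResponse(404, 'Not Found', 'text/plain', '404 Not Found\n'));
```

We won't be building these strings by hand in the server; NodeJS's http module takes care of it for us.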

In the simple case, the client receives the data, and the connection is closed, completing the transaction. (HTTP has a lot of features, some of which don't conform to the previous sentence, but we won't be going into those here.) The client then displays the image, or saves it, or whatever the client is programmed to do.

Once you run this webserver on your computer, you'll be able to connect to it from your web browser and have it serve web pages and images to your browser.

A quick note on the Right Thing™

Although we're implementing this in NodeJS, we're actually doing it the difficult, improper way. We're reinventing the wheel for learning and demonstration purposes.

For real life, there are libraries that help you write webservers in NodeJS, and those are what you'd be more likely to use. (See Connect and Serve Static.) But using those is too simple to learn anything from, so we'll roll our own for fun.

General attack plan

The basic approach is this:

1. Listen for incoming connections.
2. When one arrives, examine it to determine the file that is requested.
3. Make sure the file exists. If not, return a 404.
4. As a bonus, check if the file is a directory. If it is, try adding a "/index.html" to the end to see if that file exists. If not, return a 404.
5. If the file exists, look at its extension (e.g. ".html" or ".js") and determine its MIME type for the response (e.g. text/html or application/javascript).
6. Read the file into memory.
7. Write a 200 (success) response header with the MIME type, and write the file into the response.
8. Kick up your heels and have a pint for a job well-done.
9. Return to step 1.

And just keep doing that forever.

I mentioned the HTTP header, above. This is some data that's sent at the beginning of the request and response to give more information about the contents of the data, or of the connection itself. Most of what HTTP does comes down to dealing with HTTP headers.

There are a bazillion values you can set in the headers. Some are set automatically by the web server library, and some we have to set.

In this case, we're only going to set one header value for the response: Content-Type. In particular, we're going to set it to the MIME type of the file we're returning. E.g. text/html.

What this does is it allows the client (the browser) to figure out what to do with this data. Since data is data, it can be hard for the browser to tell if the data that came back is an image, or text, or XML, or JSON, or video, or audio, or... So the web server shares the data's MIME type explicitly in the Content-Type, so the browser knows if it should show an image, bring up a video player, or just render HTML text, and so on.
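
Here's a small sketch of how a client might pick apart a Content-Type value and decide what to do with the data. Both parseContentType() and describeHandling() are hypothetical helpers, not part of the server we're writing, and the parsing is simplified (it doesn't handle quoted parameter values, for one).

```javascript
// Hypothetical helper: split a Content-Type header value into its MIME
// type and any parameters (such as charset).
function parseContentType(headerValue) {
    const parts = headerValue.split(';').map(function (s) {
        return s.trim();
    });
    const result = { type: parts[0].toLowerCase(), params: {} };

    parts.slice(1).forEach(function (param) {
        const eq = param.indexOf('=');
        if (eq > 0) {
            const name = param.substring(0, eq).trim().toLowerCase();
            result.params[name] = param.substring(eq + 1).trim();
        }
    });

    return result;
}

// Roughly the kind of decision a client makes based on the type
function describeHandling(headerValue) {
    const type = parseContentType(headerValue).type;

    if (type === 'text/html') { return 'render as a web page'; }
    if (type.startsWith('image/')) { return 'show as an image'; }
    if (type.startsWith('video/')) { return 'open a video player'; }
    return 'show as text or offer to save it';
}

console.log(parseContentType('text/html; charset=utf-8'));
console.log(describeHandling('image/png'));
```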

NodeJS: synchronous vs. asynchronous

A large number of NodeJS's I/O functions come in two flavors: synchronous and asynchronous. The code is a lot simpler if you use synchronous calls because they return the result right there. Asynchronous calls return the result later in a callback, making the code more convoluted.

You do get what you pay for, however. Async calls don't tie up the NodeJS engine while they execute, unlike sync calls. So if you're going to be handling a lot of requests, async is the way to go.

Because async is Harder and Righter, we'll be coding this up all async.
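
To see the two flavors side by side, here's a little sketch that reads the same file both ways. The scratch file name is arbitrary and only exists for this demo.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Make a small scratch file to read (the name is arbitrary)
const scratchFile = path.join(os.tmpdir(),
    'sync-vs-async-demo-' + process.pid + '.txt');
fs.writeFileSync(scratchFile, 'Hello, goats!\n');

// Synchronous: the result comes back as the return value, and nothing
// else runs until the read is finished.
const syncData = fs.readFileSync(scratchFile, 'utf8');
console.log('sync read:  ' + syncData.trim());

// Asynchronous: the call returns right away, and the result shows up
// later in the callback.
fs.readFile(scratchFile, 'utf8', function (err, asyncData) {
    if (err) {
        console.log('async read failed: ' + err.code);
    } else {
        console.log('async read: ' + asyncData.trim());
    }
    fs.unlink(scratchFile, function () {});  // clean up; ignore errors
});

console.log('this line runs before the async callback does');
```

Notice that the last console.log() runs before the async callback. That's the whole point: NodeJS is free to do other work while the read is in progress.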

The Webserver, skeleton

The first part of the webserver just listens and calls a handler when a request comes in:

const http = require('http');

function httpHandler(request, response) {
    // This gets called for each web request

    // *** Magic happens in here ***
}

// Listen for requests on port 3490
http.createServer(httpHandler).listen(3490);

This will accept connections, read a request, and then... do nothing. A web browser wouldn't know what to make of it. We have to have the server deal with the request and response objects that were passed in to httpHandler().

In NodeJS, require() is the way you bring in other modules. require('http') loads the HTTP module, which is what we are going to use to write the web server.
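
Just to show the request and response objects doing something, here's a minimal sketch of a handler that answers every request with a line of plain text. The names helloHandler() and startServer() are made up for this example; the server we're building will do a lot more than this.

```javascript
const http = require('http');

// A handler that answers every request the same way
function helloHandler(request, response) {
    response.writeHead(200, { 'Content-Type': 'text/plain' });
    response.write('You asked for ' + request.url + '\n');
    response.end();
}

// Hypothetical helper: call startServer(3490) to actually start listening
function startServer(port) {
    return http.createServer(helloHandler).listen(port);
}
```

If you call startServer(3490) and visit http://localhost:3490/goats.jpg, the browser should show the path you asked for.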

Request and response

Then, in the httpHandler() code, we want to get the requested file name from the request, and try to locate it on disk. We'll write another helper function, getFilenameFromPath(), to wrap up that functionality. (Since it's also async, it'll call the onGotFilename() callback once it completes.)

const url = require('url');

function httpHandler(request, response) {

    /**
     * Called when the filename has been ascertained
     */
    function onGotFilename(err, filename) {
        // *** In here we'll actually handle the response
    }

    // Extract the part of the URL after the host:port. This is the
    // filename the browser is looking for:
    let path = url.parse(request.url).pathname;

    // Try to find the actual file associated with this path:
    getFilenameFromPath(path, onGotFilename);   // [MARK 1]
}
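
Here's a quick sketch of what that pathname extraction does with a URL that has a query string on the end. It also shows one possible alternative, the WHATWG URL class, since the NodeJS documentation marks url.parse() as a legacy API. The function names are made up, and the two approaches don't necessarily treat every odd input the same way, so test before swapping one for the other.

```javascript
const url = require('url');

// The approach used above: Node's legacy url.parse()
function pathnameLegacy(requestUrl) {
    return url.parse(requestUrl).pathname;
}

// An alternative: the WHATWG URL class. request.url has no host in it,
// so a placeholder base is supplied just to make it parseable.
function pathnameWhatwg(requestUrl) {
    return new URL(requestUrl, 'http://localhost').pathname;
}

// Both leave off the query string
console.log(pathnameLegacy('/foo/index.html?size=large'));
console.log(pathnameWhatwg('/foo/index.html?size=large'));
```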

So far, so good. Let's assume the getFilenameFromPath() call was successful, and fill in that onGotFilename() callback. This is where we're going to write the HTTP headers and response for the web server to send back to the browser.

If the err parameter is set, something went wrong (we'll assume the file wasn't found), and we need to report it. So you'll see the famous 404 being returned in that block of code, below.

But if err is unset, then we got an actual file, and the filename is in the filename parameter. We want to read the file using fs.readFile(), figure out what the MIME type is so that we can pass it out in the Content-Type header, and then write the file out to the response.

This pattern of a callback taking an err argument followed by other result arguments is a common NodeJS pattern.
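
If you haven't seen that convention before, here's a tiny sketch of it in isolation. parseJSONLater() is a made-up function that has nothing to do with the webserver; it's just the smallest thing that can either succeed or fail.

```javascript
// Hypothetical example function that follows the error-first convention
function parseJSONLater(text, callback) {
    // setImmediate() defers the work so the callback always runs
    // asynchronously, like Node's own I/O callbacks
    setImmediate(function () {
        let value;

        try {
            value = JSON.parse(text);
        } catch (e) {
            return callback(e);     // failure: err is set
        }

        callback(null, value);      // success: err is null
    });
}

parseJSONLater('{"goats": 3}', function (err, value) {
    if (err) {
        console.log('Failed: ' + err.message);
        return;
    }
    console.log('Goat count: ' + value.goats);
});

parseJSONLater('not json', function (err, value) {
    if (err) {
        console.log('Failed: ' + err.message);
        return;
    }
    console.log('Parsed: ' + value);
});
```

The caller always checks err first, and only touches the result arguments if err is empty.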

    /**
     * Called when the filename has been ascertained
     */
    function onGotFilename(err, filename) {

        /**
         * Helper function to return errors in the response
         */
        function writeError(err) {
            if (err.code == 'ENOENT') {
                // File not found
                response.writeHead(404, { 'Content-Type': 'text/plain' });
                response.write('404 Not Found\n');
                response.end();
                console.log("Not Found: " + filename);
            } else {
                // Any other error
                response.writeHead(500, { 'Content-Type': 'text/plain' });
                response.write('500 Internal Server Error\n');
                response.end();
                console.log("Internal Server Error: " + filename +
                    ": " + err.code);
            }
        }

        if (err) {
            writeError(err);
        } else {
            // No errors getting the filename, so go ahead and read it.
            fs.readFile(filename, "binary", function (err, file) {   // [MARK 2]
                if (err) {
                    writeError(err);
                } else {
                    // No errors reading the file, so write the response

                    // Get the MIME type first
                    let mimeType = getMIMEType(filename);  // [MARK 3]
                    response.writeHead(200, { 'Content-Type': mimeType }); // [MARK 4]
                    response.write(file, "binary");  // [MARK 5]
                    response.end();  // [MARK 6]
                    console.log("Sending file: " + filename);
                }
            });
        }
    }

As you can see, the general idea is to first check whether err is set, and if it is, call writeError() to return an error to the client.

Assuming there's no error, we go ahead and try to read the entire file into memory, sending it to the anonymous function callback at [MARK 2].

Of course, the call to fs.readFile() could create an error, too, for whatever reason. So we also check for that.

But assuming all went well, we now have the file data in hand, and all that's left is to write back the response to the client.

Well, almost all. We still need to be able to tell the client what type the file is. So we call getMIMEType() at [MARK 3] with the filename. It will return an appropriate MIME type for the file, such as text/html or image/png.

We need that MIME type for the Content-Type header, which we send over in the writeHead() call at [MARK 4]. The first argument is the HTTP status code. 200 is success, 404 is "not found", 500 is a general server error, and so on. You can find them all in the complete list.

After that, we write the file contents to the body of the response with write() at [MARK 5].

And finally, we complete the response by calling end() at [MARK 6]. The client gets it back and shows it to the user (or whatever it's coded to do.)
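
Reading the entire file into memory is fine for small files, but it's not the only way. As a sketch of a different technique (not what this server does), you could stream the file into the response with fs.createReadStream(). The function name sendFileStreaming() is hypothetical, and the error handling is deliberately bare-bones.

```javascript
const fs = require('fs');

// Hypothetical alternative to fs.readFile(): stream the file into the
// response instead of holding the whole thing in memory first.
function sendFileStreaming(filename, mimeType, response) {
    const stream = fs.createReadStream(filename);

    // 'open' fires once the file has been opened successfully, so it's
    // safe to commit to a 200 at that point
    stream.on('open', function () {
        response.writeHead(200, { 'Content-Type': mimeType });
        stream.pipe(response);   // pipe() ends the response when done
    });

    stream.on('error', function (err) {
        if (response.headersSent) {
            // Too late to change the status; just cut the response off
            response.destroy();
            return;
        }
        const status = (err.code === 'ENOENT') ? 404 : 500;
        response.writeHead(status, { 'Content-Type': 'text/plain' });
        response.end(status + ' error\n');
    });
}
```

The tradeoff is a little more event handling in exchange for not needing memory proportional to the file size.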

MIME types

In this server, there's a tiny function called getMIMEType(). It looks at the extension on the file name and matches it up to a MIME type.

This is really not an industrial way to do things at all, and is just here for learning purposes. There are several packages already written, such as NodeJS's mime, that are far more powerful and robust than this, and you should definitely use one of those instead. (Extra credit: convert the server to use that instead.)

But what we have here does show the basic idea.

const path = require('path');

function getMIMEType(filename) {
    // Map of the file extensions we know about to their MIME types
    let mimeTypes = {
        '.js': 'application/javascript',
        '.jpg': 'image/jpeg',
        '.png': 'image/png',
        '.html': 'text/html'
    };

    // Get the file extension, .html, .js, etc.
    let ext = path.extname(filename);

    if (ext in mimeTypes) {
        return mimeTypes[ext];
    }

    // If we don't recognize it, just return this default
    return 'text/plain';
}
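
One thing to be aware of with this kind of lookup is how path.extname() behaves on different file names. Here's a short sketch, along with a hypothetical variant of the lookup, lookupMIMEType(), that ignores the case of the extension.

```javascript
const path = require('path');

// What path.extname() returns for a few kinds of names
console.log(path.extname('index.html'));      // returns '.html'
console.log(path.extname('archive.tar.gz'));  // returns '.gz' (last one only)
console.log(path.extname('README'));          // returns '' (no extension)
console.log(path.extname('PHOTO.JPG'));       // returns '.JPG' (case is kept)

// Hypothetical variant of the lookup that ignores case
const knownTypes = {
    '.js': 'application/javascript',
    '.jpg': 'image/jpeg',
    '.png': 'image/png',
    '.html': 'text/html'
};

function lookupMIMEType(filename) {
    const ext = path.extname(filename).toLowerCase();

    // Fall back to plain text for anything we don't recognize
    return knownTypes[ext] || 'text/plain';
}

console.log(lookupMIMEType('PHOTO.JPG'));
```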

Finding a file on disk for a given request

The only piece left is the elephant in the room of getFilenameFromPath() (which you might remember from [MARK 1] above). This function does all the heavy lifting of trying to find the file on disk that corresponds to the one requested in the URL by the client.

When the client requests a URL, the path portion is the stuff at the end. For example, the path of http://www.example.com/foo/index.html is /foo/index.html, and the path of http://www.example.com/ is /. We'll be appending that path onto our "base directory" to get the path to the file on disk.

The powerhouse function is fs.stat() which takes a filename and tries to determine if it exists, what kind of file it is, how big it is, etc.
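
Here's a small sketch of fs.stat() on its own, outside the server. describePath() is a made-up helper, and the example assumes your system's temp directory exists and that the second path doesn't.

```javascript
const fs = require('fs');
const os = require('os');

// Hypothetical helper: report what kind of thing a path points to
function describePath(filename, callback) {
    fs.stat(filename, function (err, stats) {
        if (err) {
            return callback(err);
        }

        let kind = 'something else';
        if (stats.isDirectory()) {
            kind = 'directory';
        } else if (stats.isFile()) {
            kind = 'file';
        }

        callback(null, { kind: kind, size: stats.size });
    });
}

// The temp directory should exist on most systems
describePath(os.tmpdir(), function (err, info) {
    if (err) {
        console.log('stat failed: ' + err.code);
    } else {
        console.log(info.kind + ', ' + info.size + ' bytes');
    }
});

// A name that (almost certainly) doesn't exist
describePath('/no/such/file/anywhere', function (err) {
    console.log(err ? err.code : 'it exists?!');
});
```

When the file is missing, the error's code property is 'ENOENT', which is exactly what writeError() checks for when deciding to send a 404.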

Our basic approach is:

1. Convert all percent escapes (e.g. %20) and other special characters (e.g. +) in the requested path.
2. Verify that the path is in our base path (and that a malicious user can't retrieve files that are above our base path).
3. Call fs.stat() on the requested filename.
4. If error, return error.
5. If the file is a directory, attach /index.html to the end of the path, and go back to step 3.
6. If the file is a normal file, congratulations, you've delivered a healthy bouncing baby file, and the success callback can be made.
7. Otherwise something else is wrong. We return an error.

DANGER, Will Robinson! We're taking a request for a file from somewhere in the world, and then delivering that file back to whomever asked for it. There are a lot of bad people in the world. Some of them might do something like request this URL:

http://example.com/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd

%2e, in case you don't recognize it, is a period. If you do like we do in this server and decode the URL to replace those and append the result onto our base directory, it'll change the path to:

/home/beej/myserver/../../../../etc/passwd

where /etc/passwd is your password file!

And that's not something we want to hand back to the client.

We have to sterilize the path from the request to make sure it fits in our little "sandboxed" part of the file system.

Our server starts serving files from the current directory, which we will define as the "base directory". So we make sure that no one can request anything in any parent directory above the base directory.
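
The idea can be sketched in isolation like this. resolveInsideBase() is a hypothetical helper that assumes the request path has already been decoded, and it uses the POSIX flavor of the path module so the results look the same on any OS.

```javascript
const path = require('path').posix;  // POSIX rules, regardless of OS

// Hypothetical helper: resolve a decoded request path against a base
// directory, returning null if the result would land outside of it.
function resolveInsideBase(basedir, requestPath) {
    // normalize() collapses the "./" and "../" parts
    const filename = path.normalize(basedir + '/' + requestPath);

    // Comparing against basedir plus a separator keeps a sibling
    // directory such as "myserver-secret" from passing the check.
    if (filename === basedir || filename.startsWith(basedir + '/')) {
        return filename;
    }

    return null;
}

const base = '/home/beej/myserver';

console.log(resolveInsideBase(base, '/foo/index.html'));
console.log(resolveInsideBase(base, '/../../../../etc/passwd'));
console.log(resolveInsideBase(base, '/../myserver-secret/notes.txt'));
```

Only the first request resolves to a filename. The other two climb out of the base directory and get null.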

But what if the user has a symbolic link as a child directory that leads back up past the base directory? Our implementation will actually follow that. Should it? Maybe. Maybe not. In servers like Apache, you can set a "FollowSymlinks" option to decide to follow or not follow them. (Extra credit: improve our implementation to allow this kind of control.)

In general, you have to validate every piece of data in the request that you are going to respond to. Any of it can be malicious.

Without much further ado, let's take a look at the code:

const path = require('path'),
    fs = require('fs');

// Use current directory as base directory for all file serving
let basedir = process.cwd();

/**
 * Locate a filename for a specific path
 */
function getFilenameFromPath(filepath, callback) {
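    // Get all those %20s, +s, and stuff out of there. decodeURI() throws
    // on malformed escapes (e.g. a lone "%"), so treat that as Not Found:
    try {
        filepath = decodeURI(filepath.replace(/\+/g, '%20'));
    } catch (e) {
        e.code = 'ENOENT';
        return callback(e, filepath);
    }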

    // Normalize will translate all the ./ and ../ parts out of the
    // path and turn it into a plain, absolute path.
    let filename = path.normalize(basedir + path.sep + filepath);

    /**
     * Called when the fs.stat() call completes
     */
    function onStatComplete(err, stats) {
        if (err) {
            return callback(err, filename);
        }

        // If it's a directory, try looking for index.html:
        if (stats.isDirectory()) {
            filename = path.normalize(filename + path.sep + 'index.html');
            fs.stat(filename, onStatComplete);
            return;
        }

        // If the result's a file, return the name
        if (stats.isFile()) {
            return callback(null, filename);
        } else {
            return callback(new Error("Unknown file type"), filename);
        }
    }

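    // First make sure the file is still in the base directory
    // for security reasons. Compare against basedir plus a separator so
    // that a sibling like "myserver-secret" doesn't pass the prefix check:
    if (filename !== basedir && !filename.startsWith(basedir + path.sep)) {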
        // If not, 404 it
        let err = new Error("Not Found");
        err.code = 'ENOENT';
        return callback(err, filename);
    }

    // Now see if we can find the file:
    fs.stat(filename, onStatComplete);
}
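
A couple of details about decodeURI() are worth knowing: it leaves some escapes (like %2F, an encoded slash) alone, and it throws a URIError if the escapes are malformed. Here's a sketch showing both, using a made-up wrapper called safeDecodePath() that returns null on bad input.

```javascript
// Hypothetical helper: decode a request path, returning null instead of
// throwing when the escapes are malformed.
function safeDecodePath(rawPath) {
    try {
        return decodeURI(rawPath);
    } catch (e) {
        if (e instanceof URIError) {
            return null;
        }
        throw e;
    }
}

console.log(safeDecodePath('/my%20goats.html'));    // escapes are decoded
console.log(safeDecodePath('/%2e%2e/etc/passwd'));  // so are encoded dots
console.log(safeDecodePath('/a%2Fb'));              // but %2F is left alone
console.log(safeDecodePath('/100%'));               // malformed, so null
```

The second line is the reason the base directory check has to happen after decoding, not before.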

Notice the code that looks for index.html if the destination is a directory. For extra credit, modify the code to show a directory listing if the path is a directory and index.html is not found. (Most webservers offer this functionality as an option.)

And that's the webserver. Here's a link to the complete code.

To run it, unpack it, and type the following on the command line:

$ cd beej-httpserver-demo
$ node httpserver.js

If you get an error about an address already being in use, first make sure you're not already running the server in another window. If you aren't, then go into the httpserver.js source and change the port number at the top from 3490 to something else, like 4000 or 8000. You just need a port that's not currently used on your machine, and most of them are free.
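
If you'd rather have the server tell you about a busy port in plain English, one option is to listen for the server's 'error' event. This is only a sketch; explainListenError() and listenOn() are hypothetical helpers and aren't in the demo code.

```javascript
const http = require('http');

// Hypothetical helper: turn a listen() error into a friendlier message
function explainListenError(err, port) {
    if (err.code === 'EADDRINUSE') {
        return 'Port ' + port + ' is already in use. Is the server ' +
            'already running in another window?';
    }
    return 'Could not listen on port ' + port + ': ' + err.message;
}

// Hypothetical helper: start listening, and report failures through
// the server's 'error' event instead of dying with a stack trace
function listenOn(port, handler) {
    const server = http.createServer(handler);

    server.on('error', function (err) {
        console.log(explainListenError(err, port));
    });

    server.listen(port, function () {
        console.log('Listening on port ' + port);
    });

    return server;
}
```

You'd call it with something like listenOn(3490, httpHandler) in place of the createServer() line.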

And then point your browser to http://localhost:3490. It should say "Hello, World!" back!

Testing with `curl`

You can also test it from the command line if you have curl installed. If you run with the "-D -" options, it'll also dump the HTTP header to stdout so you can see it. Here's an example:

$ curl -D - http://localhost:3490
HTTP/1.1 200 OK
Content-Type: text/html
Date: Sun, 08 Nov 2015 18:33:54 GMT
Connection: keep-alive
Transfer-Encoding: chunked

<!DOCTYPE html>
<html>
        <head>
        </head>

        <body>
                <img src="goat100.png"><br>
                Hello, World!
        </body>
</html>

You can see our computed MIME type coming back in the Content-Type header, too!

Let's try getting the file goat100.png for fun:

$ curl -D - http://localhost:3490/goat100.png
HTTP/1.1 200 OK
Content-Type: image/png
Date: Sun, 08 Nov 2015 18:45:10 GMT
Connection: keep-alive
Transfer-Encoding: chunked

�PNG

IHDR���K
CiCCPICC profilexڝSwX��>��eVB����l�"#�Y��a�@Ņ�

[etc...]

Content-Type is image/png, which is what it should be. And sure enough, there's the PNG data. It's a binary format, so it looks pretty bad when it comes back out on the terminal. But there you have it.

Congratulations! You wrote your own webserver!

Continue to Part 1: Intro to WebSockets

License

The code attached to this article is licensed under the MIT open source license.

Beej's Bit Bucket

⚡ Tech and Programming Fun
