2015-11-09
A NodeJS Webserver
Let's write our own webserver using NodeJS!
This is only a stepping-stone on the route to building a WebSockets-based chat server in this multipart series.
- Part 0: A NodeJS Webserver
- Part 1: Intro to WebSockets, conceptually
- Part 2: Writing the client side chat code for the browser in JavaScript
- Part 3: Writing the server-side chat code in NodeJS
If you want to jump right on in, get the full source on GitHub.
Prerequisites
Since we're going to be using NodeJS for this project, pop on over to the NodeJS website and get 'er installed.
Basic familiarity with JavaScript is also assumed, notably comfort with asynchronous functions and callbacks.
And we'll be running the server from the command line.
What is a webserver, anyway?
An animation's worth a thousand pictures, right?
It's a program that sits there waiting for connections from web clients. These clients are usually web browsers, but can be a variety of things, including curl or wget. Basically anything that speaks the HTTP protocol can connect to a web server (because the server also speaks HTTP).
Once connected, the client makes an HTTP request, and the server provides an HTTP response. For example, a client might say, "Give me goats.jpg", and the server might respond, "Here is goats.jpg," followed by the data. Or the server might respond, "404 goats.jpg not found!"
In the simple case, the client receives the data, and the connection is closed, completing the transaction. (HTTP has a lot of features, some of which don't conform to the previous sentence, but we won't be going into those here.) The client then displays the image, or saves it, or whatever the client is programmed to do.
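To make "Give me goats.jpg" a bit more concrete: on the wire, an HTTP/1.1 request is just a few lines of text. Here's a minimal sketch using a hypothetical formatRawRequest() helper (not part of the server we're building) that builds roughly what a client might send; the host and port are placeholders.

```javascript
// Hypothetical helper (not part of the server): builds the raw text of a
// minimal HTTP/1.1 GET request, roughly what a client like curl sends.
function formatRawRequest(path, host) {
    return 'GET ' + path + ' HTTP/1.1\r\n' + // request line: method, path, version
        'Host: ' + host + '\r\n' +           // HTTP/1.1 clients must send a Host header
        '\r\n';                              // a blank line ends the headers
}

console.log(formatRawRequest('/goats.jpg', 'localhost:3490'));
```

The server's reply has the same general shape: a status line, some headers, a blank line, and then the data.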
Once you run this webserver on your computer, you'll be able to connect to it from your web browser and have it serve web pages and images to your browser.
A quick note on the Right Thing™
Although we're implementing this in NodeJS, we're actually doing it the difficult, improper way. We're reinventing the wheel for learning and demonstration purposes.
For real life, there are libraries that help with writing webservers in NodeJS, and those are what you'd be more likely to use. (See Connect and Serve Static.) But using those is too simple to learn anything from, so we'll roll our own for fun.
General attack plan
The basic approach is this:
- Listen for incoming connections.
- When one arrives, examine it to determine the file that is requested.
- Make sure the file exists. If not, return a 404.
- As a bonus, check if the file is a directory. If it is, try adding a "/index.html" to the end to see if that file exists. If not, return a 404.
- If the file exists, look at its extension (e.g. ".html" or ".js") and determine its MIME type for the response (e.g. text/html or application/javascript).
- Read the file into memory.
- Write a 200 (success) response header with the MIME type, and write the file into the response.
- Kick up your heels and have a pint for a job well done.
- Return to step 1.
And just keep doing that forever.
I mentioned HTTP headers above. A header is extra information sent at the beginning of the request and of the response, describing the data that follows or the connection itself. Much of working with HTTP comes down to dealing with headers.
There are a bazillion values you can set in the headers. Some are set automatically by the web server library, and some we have to set.
In this case, we're only going to set one header value for the response: Content-Type. In particular, we're going to set it to the MIME type of the file we're returning, e.g. text/html.
This allows the client (the browser) to figure out what to do with the data. Since data is data, it can be hard for the browser to tell if the data that came back is an image, or text, or XML, or JSON, or video, or audio, or... So the web server shares the data's MIME type explicitly in the Content-Type header, and the browser knows if it should show an image, bring up a video player, or just render HTML text, and so on.
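Here's roughly where Content-Type sits in an actual response. The hypothetical formatRawResponse() helper below is for illustration only (NodeJS's http module writes these lines for us); it also adds a Content-Length header giving the body size in bytes.

```javascript
// Hypothetical helper (illustration only; NodeJS's http module writes
// these lines for us): builds the raw text of a small HTTP/1.1 response.
function formatRawResponse(statusCode, reasonPhrase, mimeType, body) {
    return 'HTTP/1.1 ' + statusCode + ' ' + reasonPhrase + '\r\n' + // status line
        'Content-Type: ' + mimeType + '\r\n' +                    // what kind of data follows
        'Content-Length: ' + Buffer.byteLength(body) + '\r\n' +   // body size in bytes
        '\r\n' +                                                  // blank line ends the headers
        body;
}

console.log(formatRawResponse(404, 'Not Found', 'text/plain', '404 Not Found\n'));
```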
NodeJS: synchronous vs. asynchronous
A large number of NodeJS's I/O functions come in two flavors: synchronous and asynchronous. The code is a lot simpler if you use synchronous calls because they return the result right there. Asynchronous calls return the result later in a callback, making the code more convoluted.
You do get what you pay for, however. A sync call blocks NodeJS's single-threaded event loop until it finishes, so no other request can be handled in the meantime. An async call hands the work off and lets NodeJS keep serving other requests while it waits. So if you're going to be handling a lot of requests, async is the way to go.
Because async is Harder and Righter, we'll be coding this up all async.
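Here's the difference side by side, as a small sketch. It writes a scratch file (the name sync-vs-async-demo.txt is made up) to the OS temp directory and then reads it both ways:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Make a small scratch file to read (hypothetical name, in the temp dir)
const demoFile = path.join(os.tmpdir(), 'sync-vs-async-demo.txt');
fs.writeFileSync(demoFile, 'Hello, goats!');

// Synchronous: the result comes right back, but nothing else can run
// on the event loop until the read is finished.
const syncData = fs.readFileSync(demoFile, 'utf8');
console.log('sync: ' + syncData);

// Asynchronous: readFile() returns immediately; the data shows up later
// in the callback, after the code below has already run.
fs.readFile(demoFile, 'utf8', function (err, asyncData) {
    if (err) {
        console.log('read failed: ' + err.code);
        return;
    }
    console.log('async: ' + asyncData);
});

console.log('this line prints before the "async:" line');
```

The "async:" line prints last even though its readFile() call comes first: the callback can't run until the current code has finished and the read has completed.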
The Webserver, skeleton
The first part of the webserver just listens and calls a handler when a request comes in:
const http = require('http');

function httpHandler(request, response) {
    // This gets called for each web request
    // *** Magic happens in here ***
}

// Listen for requests on port 3490
http.createServer(httpHandler).listen(3490);
This will accept connections, read a request, and then... do nothing. It never sends a response, so a web browser would just sit there waiting for an answer that never comes. We have to have the server deal with the request and response objects that were passed in to httpHandler().
In NodeJS, require() is the way you bring in other modules. require('http') loads the HTTP module, which is what we are going to use to write the web server.
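As a minimal sketch of what "dealing with the response" means, here's a hypothetical placeholder handler that ignores the request and always answers with a plain-text greeting. Calling end() is what finishes the response, so the browser stops waiting. (helloHandler() and startServer() are made-up names, not part of the finished server.)

```javascript
const http = require('http');

// Hypothetical placeholder handler: ignores the request and always
// answers with a short plain-text greeting.
function helloHandler(request, response) {
    response.writeHead(200, { 'Content-Type': 'text/plain' }); // status + headers
    response.end('Hello, World!\n'); // send the body and finish the response
}

// Wraps the skeleton's createServer()/listen() call
function startServer(port) {
    return http.createServer(helloHandler).listen(port);
}

// startServer(3490);  // then visit http://localhost:3490
```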
Request and response
Then, in the httpHandler() code, we want to get the requested filename from the request, and try to locate it on disk. We'll write another helper function, getFilenameFromPath(), to wrap up that functionality. (Since it's also async, it'll call the onGotFilename() callback once it completes.)
const url = require('url');

function httpHandler(request, response) {
    /**
     * Called when the filename has been ascertained
     */
    function onGotFilename(err, filename) {
        // *** In here we'll actually handle the response
    }

    // Extract the part of the URL after the host:port. This is the
    // filename the browser is looking for. (url.parse() is a legacy API;
    // newer code uses the WHATWG URL class instead.)
    let path = url.parse(request.url).pathname;

    // Try to find the actual file associated with this path:
    getFilenameFromPath(path, onGotFilename); // [MARK 1]
}
So far, so good. Let's assume the getFilenameFromPath() call was successful, and fill in that onGotFilename() callback. This is where we're going to write the HTTP headers and response for the web server to send back to the browser.
If the err parameter is set, something went wrong (usually the file wasn't found), and we need to report it. So you'll see the famous 404 being returned in that block of code below, along with a 500 for any other kind of error.
But if err is unset, then we got an actual file, and the filename is in the filename parameter. We want to read the file using fs.readFile(), figure out what the MIME type is so that we can pass it out in the Content-Type header, and then write the file out to the response.
This pattern of a callback taking an err argument first, followed by the result arguments, is the standard NodeJS "error-first" callback convention.
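Writing your own function in this error-first style is straightforward. Here's a sketch of a hypothetical safeDecode() that wraps decodeURI() (which throws a URIError on malformed input) so the error arrives as the first callback argument instead of being thrown. process.nextTick() defers the callback so it's always asynchronous, error or not.

```javascript
// Hypothetical error-first wrapper around decodeURI(): on failure it calls
// callback(err); on success it calls callback(null, result).
function safeDecode(str, callback) {
    let decoded;
    try {
        decoded = decodeURI(str);
    } catch (err) {
        // Defer, so the callback never runs before safeDecode() returns
        return process.nextTick(callback, err);
    }
    process.nextTick(callback, null, decoded);
}

// Callers check err first, and only then use the results:
safeDecode('/goat%20pics/index.html', function (err, decoded) {
    if (err) {
        console.log('Bad path: ' + err.message);
        return;
    }
    console.log('Decoded: ' + decoded); // Decoded: /goat pics/index.html
});
```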
/**
 * Called when the filename has been ascertained
 */
function onGotFilename(err, filename) {
    /**
     * Helper function to return errors in the response
     */
    function writeError(err) {
        if (err.code === 'ENOENT') {
            // File not found
            response.writeHead(404, { 'Content-Type': 'text/plain' });
            response.write('404 Not Found\n');
            response.end();
            console.log("Not Found: " + filename);
        } else {
            // Any other error
            response.writeHead(500, { 'Content-Type': 'text/plain' });
            response.write('500 Internal Server Error\n');
            response.end();
            console.log("Internal Server Error: " + filename +
                ": " + err.code);
        }
    }

    if (err) {
        writeError(err);
    } else {
        // No errors getting the filename, so go ahead and read it.
        // With no encoding argument, readFile() hands back a raw Buffer,
        // which keeps binary files like PNGs intact.
        fs.readFile(filename, function (err, file) { // [MARK 2]
            if (err) {
                writeError(err);
            } else {
                // No errors reading the file, so write the response
                // Get the MIME type first
                let mimeType = getMIMEType(filename); // [MARK 3]
                response.writeHead(200, { 'Content-Type': mimeType }); // [MARK 4]
                response.write(file); // [MARK 5]
                response.end(); // [MARK 6]
                console.log("Sending file: " + filename);
            }
        });
    }
}
As you can see, the general idea is to first check whether err is set, and if it is, call writeError() to return an error to the client.
Assuming there's no error, we go ahead and try to read the entire file into memory, sending it to the anonymous callback function at [MARK 2].
Of course, the call to fs.readFile() could produce an error, too, for whatever reason. So we also check for that.
But assuming all went well, we now have the file data in hand, and all that's left is to write back the response to the client.
Well, almost all. We still need to be able to tell the client what type the file is. So we call getMIMEType() at [MARK 3] with the filename. It will return an appropriate MIME type for the file, such as text/html or image/png.
We need that MIME type for the Content-Type header, which we send over in the writeHead() call at [MARK 4]. The first argument is the HTTP status code. 200 is success, 404 is "not found", 500 is a general server error, and so on. You can find them all in the complete list.
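NodeJS's http module ships that list too: http.STATUS_CODES maps each standard status code to its reason phrase. A small sketch; statusBody() is a hypothetical helper that builds error bodies like the ones writeError() sends:

```javascript
const http = require('http');

// http.STATUS_CODES maps status codes to their standard reason phrases
console.log(http.STATUS_CODES[200]); // OK
console.log(http.STATUS_CODES[404]); // Not Found
console.log(http.STATUS_CODES[500]); // Internal Server Error

// Hypothetical helper: a plain-text body such as "404 Not Found\n"
function statusBody(code) {
    return code + ' ' + http.STATUS_CODES[code] + '\n';
}
```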
After that, we write the file contents to the body of the response with write() at [MARK 5].
And finally, we complete the response by calling end() at [MARK 6].
The client gets it back and shows it to the user (or does whatever it's coded to do).
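Reading the whole file into memory is fine for small pages and images, but a large video would sit entirely in RAM while it's sent. A common alternative, sketched here as a hypothetical sendFileStreamed() (not part of this server), is to open the file with fs.createReadStream() and pipe() it into the response. That sends the file in chunks and calls end() on the response for us when the file runs out.

```javascript
const fs = require('fs');

// Hypothetical alternative to readFile() + write() + end(): stream the file
// to the response in chunks instead of loading it into memory all at once.
function sendFileStreamed(filename, mimeType, response) {
    const stream = fs.createReadStream(filename);

    stream.on('open', function () {
        // The file opened, so it's safe to promise a 200 now
        response.writeHead(200, { 'Content-Type': mimeType });
        stream.pipe(response); // pipe() ends the response when the file ends
    });

    stream.on('error', function (err) {
        if (!response.headersSent) {
            // Failed before anything was sent: a normal error response works
            response.writeHead(500, { 'Content-Type': 'text/plain' });
            response.end('500 Internal Server Error\n');
        } else {
            // Failed mid-transfer: all we can do is cut the connection
            response.destroy(err);
        }
    });
}
```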
MIME types
In this server, there's a tiny function called getMIMEType(). It looks at the extension on the file name and matches it up to a MIME type.
This is really not an industrial way to do things at all, and is just here for learning purposes. There are several packages already written, such as the mime package on npm, that are far more powerful and robust than this, and you should definitely use one of those instead. (Extra credit: convert the server to use that instead.)
But what we have here does show the basic idea.
const path = require('path');

function getMIMEType(filename) {
    let mimeTypes = {
        '.js': 'application/javascript',
        '.jpg': 'image/jpeg',   // the registered type is image/jpeg, not image/jpg
        '.png': 'image/png',
        '.html': 'text/html'
    };

    // Get the file extension, .html, .js, etc.
    let ext = path.extname(filename);

    if (ext in mimeTypes) {
        return mimeTypes[ext];
    }

    // If we don't recognize it, just return this default
    return 'text/plain';
}
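If you'd rather grow the table a little before reaching for a package, here's a sketch of a variant (getMIMETypeCaseless() is a made-up name). It lower-cases the extension so GOAT100.PNG matches too, knows a few more common types, and falls back to application/octet-stream, the generic "arbitrary binary data" type, instead of text/plain. That fallback is a design choice, not what the server above does: browsers generally treat octet-stream as a download rather than trying to display unknown binary data as text.

```javascript
const path = require('path');

// A hypothetical, slightly sturdier variant of getMIMEType()
const mimeTypes = {
    '.html': 'text/html',
    '.css': 'text/css',
    '.js': 'application/javascript',
    '.json': 'application/json',
    '.txt': 'text/plain',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.png': 'image/png',
    '.gif': 'image/gif',
    '.svg': 'image/svg+xml'
};

function getMIMETypeCaseless(filename) {
    // Lower-case the extension so "GOAT100.PNG" matches ".png"
    const ext = path.extname(filename).toLowerCase();

    // hasOwnProperty() only matches keys in the table itself,
    // never ones inherited from Object.prototype
    if (Object.prototype.hasOwnProperty.call(mimeTypes, ext)) {
        return mimeTypes[ext];
    }

    // Unknown extension: the generic "some binary data" type
    return 'application/octet-stream';
}
```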
Finding a file on disk for a given request
The only piece left is the elephant in the room: getFilenameFromPath() (which you might remember from [MARK 1] above). This function does all the heavy lifting of trying to find the file on disk that corresponds to the one requested in the URL by the client.
When the client requests a URL, the path portion is the stuff at the end. For example, the path of http://www.example.com/foo/index.html is /foo/index.html, and the path of http://www.example.com/ is /.
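NodeJS's WHATWG URL class, the modern replacement for the legacy url.parse() used in httpHandler(), can pull out that path portion too. In the server, request.url holds only the path and any query string (e.g. /foo/index.html?size=big), so URL needs a base to resolve it against; the http://localhost base below is just a placeholder. Like url.parse(), it leaves percent escapes such as %20 in place, so the decoding step is still needed. getPathname() is a hypothetical helper name.

```javascript
// Hypothetical helper: extract the path portion of a URL with the WHATWG
// URL class. requestUrl may be a full URL or just a path like request.url;
// the placeholder base is only used when requestUrl is relative.
function getPathname(requestUrl) {
    return new URL(requestUrl, 'http://localhost').pathname;
}

console.log(getPathname('http://www.example.com/foo/index.html')); // /foo/index.html
console.log(getPathname('http://www.example.com/'));               // /
console.log(getPathname('/goat100.png?size=big'));                 // /goat100.png
```

One difference to keep in mind: URL resolves "." and ".." segments on its own, so getPathname('/a/../b') gives /b. That's handy, but it's no substitute for the base-directory check on the final filename.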
We'll be appending that path onto our "base directory" to get the path
to the file on disk.
The powerhouse function is fs.stat(), which takes a filename and tries to determine if it exists, what kind of file it is, how big it is, etc.
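Here's fs.stat() on its own, in a small sketch; describePath() is a hypothetical helper that reports what, if anything, lives at a path:

```javascript
const fs = require('fs');

// Hypothetical helper: describes what's at path p, error-first style
function describePath(p, callback) {
    fs.stat(p, function (err, stats) {
        if (err) {
            return callback(err); // e.g. err.code === 'ENOENT' if nothing is there
        }
        if (stats.isDirectory()) {
            return callback(null, 'directory');
        }
        if (stats.isFile()) {
            return callback(null, 'file, ' + stats.size + ' bytes');
        }
        callback(null, 'something else (device, socket, FIFO, ...)');
    });
}

describePath('.', function (err, kind) {
    console.log(err ? 'error: ' + err.code : '. is a ' + kind);
});
```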
Our basic approach is:
- Convert all percent escapes (e.g. %20) and other special characters (e.g. +) in the requested path.
- Verify that the path is in our base path (so that a malicious user can't retrieve files that are above our base path).
- Call fs.stat() on the requested filename.
- If error, return error.
- If the file is a directory, attach /index.html to the end of the path, and call fs.stat() again on the result.
- If the file is a normal file, congratulations, you've delivered a healthy bouncing baby file, and the success callback can be made.
- Otherwise something else is wrong. We return an error.
DANGER, Will Robinson! We're taking a request for a file from somewhere in the world, and then delivering that file back to whomever asked for it. There are a lot of bad people in the world. Some of them might do something like request this URL:
http://example.com/%2e%2e/%2e%2e/%2e%2e/%2e%2e/etc/passwd
%2e, in case you don't recognize it, is a period. If you do like we do in this server and decode the URL to replace those and append the result onto our base directory, it'll change the path to:
/home/beej/myserver/../../../../etc/passwd
where /etc/passwd is your password file! And that's not something we want to hand back to the client.
We have to sanitize the path from the request to make sure it fits in our little "sandboxed" part of the file system.
Our server starts serving files from the current directory, which we will define as the "base directory". So we make sure that no one can request anything in any parent directory above the base directory.
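Here's the attack and the check, sketched with the example paths from above. It uses path.posix so the results are the same on every OS, and isInsideBaseDir() is a hypothetical helper. Comparing against the base directory plus a trailing separator matters: a plain prefix test would also accept a sibling directory such as /home/beej/myserver2.

```javascript
const path = require('path').posix; // POSIX paths, so results match on any OS

// Hypothetical helper: true only if filename is basedir or lies beneath it
function isInsideBaseDir(basedir, filename) {
    return filename === basedir || filename.startsWith(basedir + '/');
}

const basedir = '/home/beej/myserver'; // example base directory from the text

// The decoded attack path collapses to /etc/passwd...
const attack = path.normalize(basedir + '/' + '/../../../../etc/passwd');
console.log(attack);                            // /etc/passwd
console.log(isInsideBaseDir(basedir, attack));  // false

// ...and a bare prefix test would be fooled by a sibling directory:
const sibling = path.normalize(basedir + '/' + '/../myserver2/secret.txt');
console.log(sibling.startsWith(basedir));       // true (wrong!)
console.log(isInsideBaseDir(basedir, sibling)); // false (right)
```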
But what if there's a symbolic link somewhere under the base directory that leads back up past it? Our implementation will actually follow that. Should it? Maybe. Maybe not. In servers like Apache, you can set the "FollowSymLinks" option to decide whether or not to follow them. (Extra credit: improve our implementation to allow this kind of control.)
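One way to approach that extra credit, sketched under the assumption that "don't follow symlinks out" means "don't serve anything whose real location is outside the base directory": resolve both paths with fs.realpath(), which follows every symbolic link, and compare the results with path.relative(). isRealPathInside() is a hypothetical helper.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: calls back with (err, true) if the real location of
// filename, after following every symlink, is inside the real basedir.
function isRealPathInside(basedir, filename, callback) {
    fs.realpath(basedir, function (err, realBase) {
        if (err) return callback(err);
        fs.realpath(filename, function (err, realFile) {
            if (err) return callback(err);
            // relative() gives "" for the same path, and starts with ".."
            // (or is absolute, e.g. another drive on Windows) when outside
            const rel = path.relative(realBase, realFile);
            const inside = rel !== '..' &&
                !rel.startsWith('..' + path.sep) &&
                !path.isAbsolute(rel);
            callback(null, inside);
        });
    });
}
```

Resolving the base directory too is deliberate: on some systems the temp or home directory is itself reached through a symlink, and comparing a resolved path against an unresolved one would give false negatives.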
In general, you have to validate every piece of data in the request that you are going to respond to. Any of it can be malicious.
Without further ado, let's take a look at the code:
const path = require('path'),
    fs = require('fs');

// Use current directory as base directory for all file serving
let basedir = process.cwd();

// basedir with exactly one trailing separator, for the security check below
let basedirWithSep = basedir.endsWith(path.sep) ? basedir : basedir + path.sep;

/**
 * Locate a filename for a specific path
 */
function getFilenameFromPath(filepath, callback) {
    // Get all those %20s, +s, and stuff out of there. decodeURI() throws
    // a URIError on a malformed escape like "%E0%A4%A", so catch it rather
    // than letting one bad request crash the whole server:
    try {
        filepath = decodeURI(filepath.replace(/\+/g, '%20'));
    } catch (err) {
        return callback(err, filepath);
    }

    // Normalize will translate all the ./ and ../ parts out of the
    // path and turn it into a plain, absolute path.
    let filename = path.normalize(basedir + path.sep + filepath);

    /**
     * Called when the fs.stat() call completes
     */
    function onStatComplete(err, stats) {
        if (err) {
            return callback(err, filename);
        }

        // If it's a directory, try looking for index.html:
        if (stats.isDirectory()) {
            filename = path.normalize(filename + path.sep + 'index.html');
            fs.stat(filename, onStatComplete);
            return;
        }

        // If the result's a file, return the name
        if (stats.isFile()) {
            return callback(null, filename);
        } else {
            return callback(new Error("Unknown file type"), filename);
        }
    }

    // First make sure the file is still in the base directory
    // for security reasons. Checking against basedir plus a separator
    // (not just basedir) stops a sibling like "/home/beej/myserver2"
    // from slipping past a plain prefix test:
    if (filename !== basedir && !filename.startsWith(basedirWithSep)) {
        // If not, 404 it
        let err = new Error("Not Found");
        err.code = 'ENOENT';
        return callback(err, filename);
    }

    // Now see if we can find the file:
    fs.stat(filename, onStatComplete);
}
Notice the code that looks for index.html if the destination is a directory. For extra credit, modify the code to show a directory listing if the path is a directory and index.html is not found. (Most webservers offer this functionality as an option.)
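Here's a sketch of the listing half of that extra credit; buildDirectoryListing() and escapeHTML() are hypothetical helpers. It assumes the request path for the directory ends in /, so the relative links resolve inside it, and it escapes names so a file called <script>.html can't inject markup into the page. You'd send the result with a Content-Type of text/html.

```javascript
const fs = require('fs');

// Minimal HTML escaping so odd file names can't inject markup
function escapeHTML(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
        .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Hypothetical helper: builds an HTML listing of dirPath.
// urlPath is the request path (assumed to end in "/").
// Calls back with (err, html).
function buildDirectoryListing(dirPath, urlPath, callback) {
    fs.readdir(dirPath, function (err, names) {
        if (err) return callback(err);
        const items = names.sort().map(function (name) {
            const href = encodeURIComponent(name);
            return '<li><a href="' + href + '">' + escapeHTML(name) + '</a></li>';
        });
        callback(null, '<!DOCTYPE html>\n<html><body>\n<h1>Index of ' +
            escapeHTML(urlPath) + '</h1>\n<ul>\n' + items.join('\n') +
            '\n</ul>\n</body></html>\n');
    });
}
```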
And that's the webserver. Here's a link to the complete code.
To run it, unpack it, and type the following on the command line:
$ cd beej-httpserver-demo
$ node httpserver.js
If you get an error about an address already being in use, first make sure you're not already running the server in another window. If you aren't, then go into the httpserver.js source and change the port number at the top from 3490 to something else, like 4000 or 8000. You just need a port that's not currently used on your machine, and most of them are free.
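If you find yourself changing the port often, one option is to read it from an environment variable instead. The sketch below assumes a variable named PORT (a common convention, not something the article's code reads) and falls back to 3490; getPort() is a hypothetical helper.

```javascript
// Hypothetical helper: use PORT from the environment if it holds a valid
// port number, otherwise fall back to the article's default of 3490.
function getPort(env) {
    const port = Number(env.PORT);
    return Number.isInteger(port) && port > 0 && port < 65536 ? port : 3490;
}

console.log('Using port ' + getPort(process.env));
```

In the server you'd then call http.createServer(httpHandler).listen(getPort(process.env)), and start it with something like PORT=8000 node httpserver.js (that syntax is for POSIX shells such as bash).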
And then point your browser to http://localhost:3490 (or whichever port you picked). It should say "Hello, World!" back!
Testing with curl
You can also test it from the command line if you have curl installed. If you run it with the "-D -" option, it'll also dump the HTTP headers to stdout so you can see them. Here's an example:
$ curl -D - http://localhost:3490
HTTP/1.1 200 OK
Content-Type: text/html
Date: Sun, 08 Nov 2015 18:33:54 GMT
Connection: keep-alive
Transfer-Encoding: chunked
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<img src="goat100.png"><br>
Hello, World!
</body>
</html>
You can see our computed MIME type coming back in the Content-Type header, too!
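If you don't have curl handy, NodeJS can play the client, too. Here's a rough sketch of a "curl -D -" stand-in built on http.get(); fetchWithHeaders() is a hypothetical name. NodeJS lower-cases header names in res.headers, so the output shows content-type rather than Content-Type.

```javascript
const http = require('http');

// Hypothetical helper: prints the status line and headers of a GET request
// (like curl -D -), then calls back with (err, { status, headers, body }).
function fetchWithHeaders(url, callback) {
    http.get(url, function (res) {
        console.log('HTTP/' + res.httpVersion + ' ' + res.statusCode + ' ' + res.statusMessage);
        for (const [name, value] of Object.entries(res.headers)) {
            console.log(name + ': ' + value);
        }
        const chunks = [];
        res.on('data', function (chunk) { chunks.push(chunk); });
        res.on('end', function () {
            callback(null, {
                status: res.statusCode,
                headers: res.headers,
                body: Buffer.concat(chunks) // raw bytes, safe for PNGs too
            });
        });
    }).on('error', callback);
}

// With the server running:
// fetchWithHeaders('http://localhost:3490/goat100.png', function (err, info) { ... });
```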
Let's try getting the file goat100.png for fun:
$ curl -D - http://localhost:3490/goat100.png
HTTP/1.1 200 OK
Content-Type: image/png
Date: Sun, 08 Nov 2015 18:45:10 GMT
Connection: keep-alive
Transfer-Encoding: chunked
�PNG
IHDR���K
CiCCPICC profilexڝSwX��>��eVB����l�"#�Y��a�@Ņ�
[etc...]
Content-Type is image/png, which is what it should be. And sure enough, there's the PNG data. It's a binary format, so it looks pretty bad when it comes out on the terminal. But there you have it.
Congratulations! You wrote your own webserver!
Continue to Part 1: Intro to WebSockets
License
The code attached to this article is licensed under the MIT open source license.