Let's write our own webserver using NodeJS!
I was going to make this entry about WebSockets, but this sort of took on a life of its own. I'll do WebSockets later. :)
Since we're going to be using NodeJS for this project, pop on over to that site and get 'er installed.
And we'll be running the server from the command line.
An animation's worth a thousand pictures, right?
A web server is a program that sits there waiting for connections from web clients. These clients are usually web browsers, but can be a variety of things, including curl or wget. Basically, anything that speaks HTTP can connect to a web server (because the server also speaks HTTP).
Once connected, the client makes an HTTP request, and the server provides an HTTP response. For example, a client might say, "Give me goats.jpg", and the server might respond, "Here is goats.jpg," followed by the data. Or the server might respond, "404 goats.jpg not found!"
In the simple case, the client receives the data, and the connection is closed, completing the transaction. (HTTP has a lot of features, some of which don't conform to the previous sentence, but we won't be going into those here.) The client then displays the image, or saves it, or whatever the client is programmed to do.
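To make that exchange a little more concrete, here's a rough sketch of what the text of a simple request might look like on the wire, along with a tiny helper that picks apart its first line. The sample string and the parseRequestLine() helper are made up for illustration; real HTTP parsing handles far more cases than this.

```javascript
// A hypothetical request, roughly as a client might send it.
// Each line ends with \r\n, and a blank line ends the headers.
const sampleRequest =
    'GET /goats.jpg HTTP/1.1\r\n' +
    'Host: localhost:3490\r\n' +
    '\r\n';

// Split the first line of a request into its three parts.
// parseRequestLine() is an illustrative helper, not part of NodeJS.
function parseRequestLine(requestText) {
    const firstLine = requestText.split('\r\n')[0];
    const [method, path, version] = firstLine.split(' ');
    return { method, path, version };
}

console.log(parseRequestLine(sampleRequest));
```

We won't need to do any of this parsing by hand in the server itself, since NodeJS's HTTP module takes care of it.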
Once you run this webserver on your computer, you'll be able to connect to it from your web browser and have it serve web pages and images to your browser.
Although we're implementing this in NodeJS, we're actually doing it the difficult, improper way. We're reinventing the wheel for learning and demonstration purposes.
For real life, there are libraries that help with writing webservers in NodeJS, and those are what you'd be more likely to use. (See Connect and Serve Static.) But using those is too simple to learn anything from, so we'll roll our own for fun.
The basic approach is this:
And just keep doing that forever.
I mentioned the HTTP header, above. This is some data that's sent at the beginning of the request and response to give more information about the contents of the data, or of the connection itself. Most of what HTTP does comes down to dealing with these headers.
There are a bazillion values you can set in the headers. Some are set automatically by the web server library, and some we have to set.
In this case, we're only going to set one header value for the response: Content-Type. In particular, we're going to set it to the MIME type of the file we're returning. E.g. Content-Type: text/html.
This allows the client (the browser) to figure out what to do with the data. Since data is data, it can be hard for the browser to tell if the data that came back is an image, or text, or XML, or JSON, or video, or audio, or... So the web server shares the data's MIME type explicitly in the Content-Type, so the browser knows if it should show an image, bring up a video player, or just render HTML text, and so on.
A large number of NodeJS's I/O functions come in two flavors: synchronous and asynchronous. The code is a lot simpler if you use synchronous calls because they return the result right there. Asynchronous calls return the result later in a callback, making the code more convoluted.
You do get what you pay for, however. Async calls don't tie up the NodeJS engine while they execute, unlike sync calls. So if you're going to be handling a lot of requests, async is the way to go.
Because async is Harder and Righter, we'll be coding this up all async.
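Here's a small sketch of the two flavors side by side, using fs.readFileSync() and fs.readFile(). The scratch file name is arbitrary and only exists so the example has something to read.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Make a small scratch file to read (the name is arbitrary).
const demoFile = path.join(os.tmpdir(), 'sync-vs-async-demo-' + process.pid + '.txt');
fs.writeFileSync(demoFile, 'Hello, goats!');

// Synchronous: the result comes straight back, but nothing else
// can run in this process until the read finishes.
const syncData = fs.readFileSync(demoFile, 'utf8');
console.log('sync: ', syncData);

// Asynchronous: the call returns right away, and the result shows
// up later in the callback.
fs.readFile(demoFile, 'utf8', function (err, asyncData) {
    if (err) {
        console.error('read failed: ' + err.message);
        return;
    }
    console.log('async:', asyncData);
});

console.log('readFile() has returned, but its callback has not run yet');
```

The last console.log() runs before the async callback does, which is exactly the out-of-order feel that makes async code harder to follow.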
The first part of the webserver just listens and calls a handler when a request comes in:
This will accept connections, read a request, and then... do nothing. A web browser wouldn't know what to make of it. We have to have the server deal with the request and response objects that were passed in to httpHandler().
In NodeJS, require() is the way you bring in other modules. require('http') loads the HTTP module, which is what we are going to use to write the web server.
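A minimal sketch of that listening skeleton might look like the following. This isn't the article's original listing; the startServer() wrapper is a name invented here, and port 3490 is the one used later on.

```javascript
const http = require('http');

const PORT = 3490; // the port number used later in this article

// Called once for every incoming request. For now it ignores the
// response object, so the client is left waiting for an answer.
function httpHandler(request, response) {
    console.log('Got a request for ' + request.url);
}

// Create the server, hand it our handler, and start listening.
// startServer() is a hypothetical wrapper so nothing starts until asked.
function startServer(port) {
    const server = http.createServer(httpHandler);
    server.listen(port, function () {
        console.log('Listening on port ' + server.address().port);
    });
    return server;
}
```

To try it, add a call to startServer(PORT) at the bottom and run the file with node. A browser pointed at it will just sit there spinning, since the handler never responds.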
Then, in the httpHandler() code, we want to get the requested file name from the request, and try to locate it on disk. We'll write another helper function, getFilenameFromPath(), to wrap up that functionality. (Since it's also async, it'll call the onGotFilename() callback once it completes.)
So far, so good. Let's assume the getFilenameFromPath() call was successful, and fill in that onGotFilename() callback. This is where we're going to write the HTTP headers and response for the web server to send back to the browser.
If the err parameter is set, something went wrong (we'll assume the file wasn't found), and we need to report it. So you'll see the famous 404 being returned in that block of code, below.
But if err is unset, then we got an actual file, and the filename is in the filename parameter. We want to read the file using fs.readFile(), figure out what the MIME type is so that we can pass it out in the Content-Type header, and then write the file out to the response.
This pattern of a callback taking an err argument followed by other result arguments is a common NodeJS pattern.
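As a quick illustration of that convention, here's a made-up helper, readJsonFile(), that reports failures through the first callback argument and results through the second. The helper and the file name are hypothetical; only fs.readFile() and JSON.parse() are real.

```javascript
const fs = require('fs');

// readJsonFile() is an illustrative helper following the NodeJS
// convention: callback(err) on failure, callback(null, result) on success.
function readJsonFile(filename, callback) {
    fs.readFile(filename, 'utf8', function (err, text) {
        if (err) {
            callback(err);          // reading failed, so no result
            return;
        }
        let parsed;
        try {
            parsed = JSON.parse(text);
        } catch (parseErr) {
            callback(parseErr);     // parsing failed, so no result
            return;
        }
        callback(null, parsed);     // no error, so err is null
    });
}

// The caller checks err first, and only then touches the result.
readJsonFile('no-such-file.json', function (err, data) {
    if (err) {
        console.log('Something went wrong: ' + err.message);
        return;
    }
    console.log('Parsed data:', data);
});
```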
As you can see, the general idea is to first check whether err is set, and if it is, call writeError() to return an error to the client.
Assuming there's no error, we go ahead and try to read the entire file into memory, sending it to the anonymous function callback at 2.
Of course, the call to fs.readFile() could create an error, too, for whatever reason. So we also check for that.
But assuming all went well, we now have the file data in hand, and all that's left is to write back the response to the client.
Well, almost all. We still need to be able to tell the client what type the file is. So we call getMimeType() (3) with the filename. It will return an appropriate MIME type for the file, such as text/html or image/jpeg.
We need that MIME type for the Content-Type header, which we send over in the writeHead() (4) call. The first argument is the HTTP status code. 200 is success, 404 is "not found", 500 is a general server error, and so on. You can find them all in the complete list.
After that, we write the file contents to the body of the response with write() (5).
And finally, we complete the response by calling end() (6). The client gets it back and shows it to the user (or whatever it's coded to do).
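Putting that flow together, a sketch of the callback might look something like this. It's a reconstruction under assumptions, not the original listing: the makeOnGotFilename() wrapper, the writeError() signature, and the tiny stand-in getMimeType() are all guesses made for the sake of a self-contained example.

```javascript
const fs = require('fs');
const path = require('path');

// Stand-in MIME helper that only knows a few types (assumed shape).
function getMimeType(filename) {
    const types = { '.html': 'text/html', '.png': 'image/png' };
    return types[path.extname(filename).toLowerCase()] || 'application/octet-stream';
}

// Stand-in for writeError(); the argument list here is a guess.
function writeError(response, statusCode, message) {
    response.writeHead(statusCode, { 'Content-Type': 'text/plain' });
    response.write(statusCode + ' ' + message + '\n');
    response.end();
}

// Build the callback for one request. The response object is captured
// in a closure so the callback can write to it later.
function makeOnGotFilename(response) {
    return function onGotFilename(err, filename) {
        if (err) {
            // We assume any error here means the file wasn't found.
            writeError(response, 404, 'Not Found');
            return;
        }

        // Read the entire file into memory, then send it.
        fs.readFile(filename, function (err, data) {
            if (err) {
                writeError(response, 500, 'Internal Server Error');
                return;
            }

            const mimeType = getMimeType(filename);
            response.writeHead(200, { 'Content-Type': mimeType });
            response.write(data);
            response.end();
        });
    };
}
```

Inside httpHandler(), the result of makeOnGotFilename(response) would be what gets passed to getFilenameFromPath() as its callback.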
In this server, there's a tiny function called getMIMEType(). It looks at the extension on the file name and matches it up to a MIME type.
This is really not an industrial way to do things at all, and is just here for learning purposes. There are several packages already written, such as NodeJS's mime, that are far more powerful and robust than this, and you should definitely use one of those instead. (Extra credit: convert the server to use that instead.)
But what we have here does show the basic idea.
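For reference, a lookup along those lines could be sketched like this. The table contents and the fallback type are choices made for this example, and the name getMimeType() is an assumption, not necessarily what the attached code uses.

```javascript
const path = require('path');

// A deliberately tiny table. Real MIME libraries know hundreds of types.
const MIME_TYPES = {
    '.html': 'text/html',
    '.txt': 'text/plain',
    '.css': 'text/css',
    '.js': 'application/javascript',
    '.json': 'application/json',
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.gif': 'image/gif'
};

// Map a file name to a MIME type based on its extension, falling back
// to a generic binary type when the extension is unknown or missing.
function getMimeType(filename) {
    const extension = path.extname(filename).toLowerCase();
    return MIME_TYPES[extension] || 'application/octet-stream';
}

console.log(getMimeType('index.html'));
console.log(getMimeType('goat100.png'));
```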
The only piece left is the elephant in the room of getFilenameFromPath() (which you might remember from 1, above). This function does all the heavy lifting of trying to find the file on disk that corresponds to the one requested in the URL by the client.
When the client requests a URL, the path portion is the stuff at the end. For example, the path of http://www.example.com/foo/index.html is /foo/index.html, and the path of http://www.example.com/ is /. We'll be appending that path onto our "base directory" to get the path to the file on disk.
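Here's one way to pull out that path and tack it onto a base directory, sketched with the built-in URL class. The helper name getPathFromUrl() and the /srv/www directory are made up for this example, and the naive join at the end has no safety checks at all.

```javascript
const path = require('path');

// In NodeJS, request.url usually holds just the path and query string,
// so a placeholder base is supplied to let the URL class parse it.
// getPathFromUrl() is an illustrative helper name.
function getPathFromUrl(requestUrl) {
    const parsed = new URL(requestUrl, 'http://localhost');
    return parsed.pathname; // the query string is dropped
}

console.log(getPathFromUrl('http://www.example.com/foo/index.html'));
console.log(getPathFromUrl('http://www.example.com/'));

// Naively appending the path to a base directory (hypothetical one here).
// This is NOT safe on its own; the request path still has to be checked.
const baseDir = '/srv/www';
console.log(path.join(baseDir, getPathFromUrl('/foo/index.html')));
```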
The powerhouse function is fs.stat() which takes a filename and tries to determine if it exists, what kind of file it is, how big it is, etc.
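A quick sketch of fs.stat() in action is below. The describePath() wrapper is invented for this example; the fs.Stats methods it calls (isFile(), isDirectory()) and the size property are part of NodeJS.

```javascript
const fs = require('fs');
const os = require('os');

// Ask the file system about a path and summarize what comes back.
// describePath() is an illustrative helper, not part of NodeJS.
function describePath(filename, callback) {
    fs.stat(filename, function (err, stats) {
        if (err) {
            // Most commonly err.code is 'ENOENT': no such file or directory.
            callback(err);
            return;
        }
        callback(null, {
            isFile: stats.isFile(),
            isDirectory: stats.isDirectory(),
            size: stats.size // in bytes
        });
    });
}

// Try it on the system's temp directory, which should exist.
describePath(os.tmpdir(), function (err, info) {
    if (err) {
        console.log('stat failed: ' + err.message);
        return;
    }
    console.log(info);
});
```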
Our basic approach is:
DANGER, Will Robinson! We're taking a request for a file from somewhere in the world, and then delivering that file back to whoever asked for it. There are a lot of bad people in the world. Some of them might do something like request this URL:
And that's not something we want to hand back to the client.
We have to sanitize the path from the request to make sure it fits in our little "sandboxed" part of the file system.
Our server starts serving files from the current directory, which we will define as the "base directory". So we make sure that no one can request anything in any parent directory above the base directory.
But what if the user has a symbolic link as a child directory that leads back up past the base directory? Our implementation will actually follow that. Should it? Maybe. Maybe not. In servers like Apache, you can set a "FollowSymlinks" option to decide to follow or not follow them. (Extra credit: improve our implementation to allow this kind of control.)
In general, you have to validate every piece of data in the request that you are going to respond to. Any of it can be malicious.
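One way to keep requests inside the base directory is sketched below, using path.resolve() and path.relative(). This is not necessarily how the attached code does it. The resolveInsideBase() name and the /srv/www directory are made up, the function assumes the path has already been URL-decoded, and it does nothing about symbolic links.

```javascript
const path = require('path');

// Return the absolute on-disk path for urlPath, or null if the result
// would land outside baseDir. resolveInsideBase() is a hypothetical name.
function resolveInsideBase(baseDir, urlPath) {
    const base = path.resolve(baseDir);

    // The leading '.' keeps urlPath relative, so its leading '/' can't
    // replace the base directory. resolve() also collapses any '..' parts.
    const candidate = path.resolve(base, '.' + path.sep + urlPath);

    // If getting from base to candidate requires going up a level,
    // the candidate has escaped the sandbox.
    const relative = path.relative(base, candidate);
    if (relative === '..' ||
        relative.startsWith('..' + path.sep) ||
        path.isAbsolute(relative)) {
        return null;
    }
    return candidate;
}

console.log(resolveInsideBase('/srv/www', '/foo/index.html'));
console.log(resolveInsideBase('/srv/www', '/../../etc/passwd')); // null
```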
Without much further ado, let's take a look at the code:
Notice the code that looks for index.html if the destination is a directory. For extra credit, modify the code to show a directory listing if the path is a directory and index.html is not found. (Most webservers offer this functionality as an option.)
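For readers without the listing in front of them, here is a rough sketch of how such a function could be put together, including the index.html fallback. It's written under assumptions: the real getFilenameFromPath() may take different arguments (this one takes the base directory explicitly) and may report errors differently.

```javascript
const fs = require('fs');
const path = require('path');

// Find the file on disk for a request path, calling back with
// (err) or (null, filename). The baseDir parameter is an assumption.
function getFilenameFromPath(baseDir, urlPath, callback) {
    const base = path.resolve(baseDir);
    const filename = path.resolve(base, '.' + path.sep + urlPath);

    // Refuse anything that resolves to a location outside the base directory.
    const relative = path.relative(base, filename);
    if (relative === '..' || relative.startsWith('..' + path.sep) ||
        path.isAbsolute(relative)) {
        process.nextTick(callback, new Error('Path is outside the base directory'));
        return;
    }

    fs.stat(filename, function (err, stats) {
        if (err) {
            callback(err); // most likely the file doesn't exist
            return;
        }
        if (stats.isFile()) {
            callback(null, filename);
            return;
        }
        if (stats.isDirectory()) {
            // For directories, look for an index.html inside.
            const indexFile = path.join(filename, 'index.html');
            fs.stat(indexFile, function (err, indexStats) {
                if (err || !indexStats.isFile()) {
                    callback(err || new Error('index.html is not a regular file'));
                    return;
                }
                callback(null, indexFile);
            });
            return;
        }
        callback(new Error('Not a regular file or directory'));
    });
}
```

The directory-listing extra credit would slot into the branch where index.html turns out to be missing.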
And that's the webserver. Here's a link to the complete code.
To run it, unpack it, and type the following on the command line:
Running the webserver
$ cd beej-httpserver-demo
$ node httpserver.js
If you get an error about an address already being in use, first make sure you're not already running the server in another window. If you aren't, then go into the httpserver.js source and change the port number at the top from 3490 to something else, like 4000 or 8000. You just need a port that's not currently used on your machine, and most of them are free.
And then point your browser to http://localhost:3490. It should say "Hello, World!" back!
$ curl -D - http://localhost:3490
HTTP/1.1 200 OK
Content-Type: text/html
Date: Sun, 08 Nov 2015 18:33:54 GMT
Connection: keep-alive
Transfer-Encoding: chunked

<!DOCTYPE html>
<html>
<head>
</head>
<body>
<img src="goat100.png"><br>
Hello, World!
</body>
</html>
You can see our computed MIME type coming back in the Content-Type header, too!
Let's try getting the file goat100.png for fun:
$ curl -D - http://localhost:3490/goat100.png
HTTP/1.1 200 OK
Content-Type: image/png
Date: Sun, 08 Nov 2015 18:45:10 GMT
Connection: keep-alive
Transfer-Encoding: chunked

�PNG IHDR���K CiCCPICC profilexڝSwX��>��eVB����l�"#�Y��a�@Ņ� [etc...]
Content-Type is image/png, which is what it should be. And sure enough, there's the PNG data. It's a binary format, so it looks pretty bad when it comes back out on the terminal. But there you have it.
Congratulations! You wrote your own webserver!
The code attached to this article is licensed under the MIT open source license.