Notes 2: Files & Directories

Corresponds to Chapter 4 in Advanced Programming in the Unix Environment.

It seems that almost everything in Unix is a file. Even things that don't appear in the directory structure are often accessed through file descriptors in some way or another. In the file system, files can be many things: symbolic links, FIFOs, device files, directories, or even normal files, amongst other things.

int stat(const char *file_name, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *file_name, struct stat *buf);

All stat() variants return information about the file "file_name", such as size, date last modified, and lots of other goodies stored in the i-node. The information is stored in a struct stat (from the Linux man page):

    struct stat {
        dev_t         st_dev;      /* device */
        ino_t         st_ino;      /* inode */
        umode_t       st_mode;     /* protection */
        nlink_t       st_nlink;    /* number of hard links */
        uid_t         st_uid;      /* user ID of owner */
        gid_t         st_gid;      /* group ID of owner */
        dev_t         st_rdev;     /* device type (if inode device) */
        off_t         st_size;     /* total size, in bytes */
        unsigned long st_blksize;  /* blocksize for filesystem I/O */
        unsigned long st_blocks;   /* number of blocks allocated */
        time_t        st_atime;    /* time of last access */
        time_t        st_mtime;    /* time of last modification */
        time_t        st_ctime;    /* time of last change */
    };

The fstat() call returns identical information, but takes a file descriptor instead of a file name, so you can use it once you have already open()'d the file. Finally, lstat() returns information about a symbolic link itself, rather than about the file it points to.

The field st_mode contains the file permissions, as well as info that describes what the file type is. A series of macros will tell you a file's type when passed a certain st_mode:

S_ISLNK(m)   True if file is a symbolic link
S_ISREG(m)   True if file is a regular file
S_ISDIR(m)   True if file is a directory
S_ISCHR(m)   True if file is a character device
S_ISBLK(m)   True if file is a block device
S_ISFIFO(m)  True if file is a FIFO
S_ISSOCK(m)  True if file is a socket
Table 1. File type determination macros from <sys/stat.h>.

Additionally, the st_mode field encodes what the file permissions are. Specifically, these are the bits that make up the "rwxrwxrwx" that you see when you do an "ls -l". The values of these bits are defined in <sys/stat.h> and are:

S_IRUSR  User has read permission
S_IWUSR  User has write permission
S_IXUSR  User has execute permission
S_IRGRP  Group has read permission
S_IWGRP  Group has write permission
S_IXGRP  Group has execute permission
S_IROTH  Other has read permission
S_IWOTH  Other has write permission
S_IXOTH  Other has execute permission
Table 2. File permissions from <sys/stat.h>.
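
To make the connection to "ls -l" concrete, here is one possible way to turn the nine permission bits into the familiar "rwxrwxrwx" string. The function name mode_to_string() is invented for this sketch, and it deliberately ignores the SUID, SGID, and sticky bits.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* mode_to_string() is a hypothetical helper, not a library call.
 * Fill out[] (at least 10 bytes) with the "rwxrwxrwx" form of the
 * permission bits in mode.  SUID, SGID, and sticky are ignored. */
void mode_to_string(mode_t mode, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    /* one letter per bit, or '-' if the bit is clear */
    for (i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';
    out[9] = '\0';
}
```

Passing the st_mode of a 0644 file should leave "rw-r--r--" in the buffer.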

Another field in struct stat worthy of mention is the st_blksize field. This contains the preferred block size for I/O on the file's filesystem. By reading the file into a buffer of that same size, you can maximize I/O throughput.
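
A minimal sketch of that idea: the function below (count_bytes() is a name invented here) asks fstat() for st_blksize, allocates a buffer of that size, and reads the whole file through it. Whether this actually buys you throughput depends on the system, so treat it as an illustration rather than a benchmark.

```c
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* count_bytes() is a hypothetical helper, not a library call.
 * Read all of path using a buffer of st_blksize bytes and return
 * the number of bytes read, or -1 on error. */
long count_bytes(const char *path)
{
    struct stat sb;
    char *buf;
    ssize_t n;
    long total = 0;
    int fd;

    if ((fd = open(path, O_RDONLY)) == -1)
        return -1;

    /* size the buffer to what the filesystem prefers */
    if (fstat(fd, &sb) == -1 || sb.st_blksize <= 0 ||
        (buf = malloc(sb.st_blksize)) == NULL) {
        close(fd);
        return -1;
    }

    while ((n = read(fd, buf, sb.st_blksize)) > 0)
        total += n;

    free(buf);
    close(fd);

    return (n == -1) ? -1 : total;
}
```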

On creation of new files

There are two other bits in the st_mode field of the struct stat which are of interest: the "set-user-ID" (SUID) and "set-group-ID" (SGID) bits. When the SUID bit is set on an executable, it tells the OS to change the effective UID of the executing process to that of the file's owner (stored in the st_uid field of the struct stat). SGID works similarly, except it sets the effective GID of the executing process to the file's group (stored in st_gid).

This is the basis for the power behind the legendary "SUID root" programs--these programs have the SUID bit set and are owned by root. Thus, they run with root privileges no matter who executes them. For instance, if you had a copy of /bin/sh which was SUID-root, you would effectively be root when you were running shell commands! Power, indeed.

When a new file is created, the owner is set to the effective user-ID of the creating process. The group of the file is either set to the effective group-ID of the creating process, or to the group of the directory if the SGID bit is set on the directory.

mode_t umask(mode_t mask);

A way to modify the permissions on a file that is being created is to set the process' umask with the umask() system call. Basically, you set the bits of the umask to the permissions you want to mask out of the file permissions.

For instance, the following snippet of code creat()s a file with 0600 ("-rw-------") permissions, even though the creat() call asks for 0666 permissions:

    /* don't set any of the following bits on file creation: */
    umask(S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH); /* 0066 */

    /* creat the file: */
    creat("bar", S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH); /* 0666 */

    /* file is now created with 0600 permissions */

On a bit level, the formula is this:

    new_mask = mode & ~umask

so the following is true of the above example:

    umask    = 000110110 = ---rw-rw- = 0066
    ~umask   = 111001001

    mode     = 110110110 = rw-rw-rw- = 0666

    new_mask = 110000000 = rw------- = 0600
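
Here is the same arithmetic as a small sketch in C, along with a function that creates a file under a given umask and reports the permissions it really ended up with. Both function names are made up for this example. Note that umask() returns the previous mask, which makes it easy to put things back the way you found them.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Both functions are hypothetical helpers, not library calls. */

/* The formula above: the bits left after the umask is applied. */
mode_t apply_umask(mode_t mode, mode_t mask)
{
    return mode & ~mask;
}

/* Create path asking for `mode` while `mask` is the process umask,
 * and return the permission bits the new file actually got, or
 * (mode_t)-1 on error.  The previous umask is restored. */
mode_t create_with_umask(const char *path, mode_t mode, mode_t mask)
{
    struct stat sb;
    mode_t old = umask(mask);  /* umask() returns the old mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);

    umask(old);

    if (fd == -1)
        return (mode_t)-1;

    if (fstat(fd, &sb) == -1) {
        close(fd);
        return (mode_t)-1;
    }
    close(fd);

    return sb.st_mode & 0777;
}
```

With the values from the example above, apply_umask(0666, 0066) works out to 0600.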

int access(const char *pathname, int mode);

Use this system call to determine if the calling process has read, write, or execute permissions for a file, or if the process is allowed to check for the existence of a file. (This last case could fail if the file is buried past a directory that the process doesn't have execute permission to.)

Simply set pathname to the name of the file to check, then set mode to one or more of the following values OR'd together:

R_OK Process can read from the file
W_OK Process can write to the file
X_OK Process can execute the file
F_OK Process can test the file's existence
Table 3. access() mode bits.

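If 0 is returned, it means you can access the file. If you don't have the requested permissions, -1 will be returned and errno will be set to EACCES. (Other errno values are possible if the call fails for a different reason; ENOENT, for instance, means the file doesn't exist.)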
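
A small sketch of a wrapper around access() follows. The name check_access() is invented here; it returns 0 when access is allowed and otherwise hands back the errno value, so the caller can tell "permission denied" apart from "no such file".

```c
#include <errno.h>
#include <unistd.h>

/* check_access() is a hypothetical helper, not a library call.
 * mode is R_OK, W_OK, and X_OK OR'd together, or F_OK.
 * Return 0 if the calling process has that access to path,
 * otherwise the errno value that explains why not. */
int check_access(const char *path, int mode)
{
    if (access(path, mode) == 0)
        return 0;

    return errno;
}
```

For example, check_access("/", R_OK | X_OK) should return 0, since anyone can read and search the root directory.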

int chmod(const char *path, mode_t mode);
int fchmod(int fildes, mode_t mode);

This system call accomplishes the same thing as the Unix utility chmod. With chmod(), you can set the file permission bits for any particular file (if the calling process owns the file). Set path to the full path of the file to change, then set mode to whatever permissions you desire. The mode can be constructed by OR'ing together the macros in Table 2, or can be the octal representation of the permission.

The following six defined macros are also useful in this situation:

S_ISUID Set the SUID bit
S_ISGID Set the SGID bit
S_ISVTX Set the Sticky Bit
S_IRWXU Give "User" read, write, and execute permission
S_IRWXG Give "Group" read, write, and execute permission
S_IRWXO Give "Other" read, write, and execute permission
Table 4. Additional mode bits for chmod().

The "sticky bit"? Well, it's a long story. In a nutshell, if the sticky bit is set on a directory, files in that directory can only be renamed or removed if the user is the owner of the file, the owner of the directory, or the superuser.

For instance, the /tmp directory often has the sticky bit set as well as world-write permissions:

drwxrwxrwt   3 root     root         2048 Aug 12 22:23 /tmp

In this case, anyone can write to the directory to their heart's content, but no one else (except the superuser) can remove their files since the sticky bit is set. Feature.

fchmod() works the same way, but is useful if you already have an open file descriptor to the file in question.
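
As an illustration, the sketch below creates a file and then uses fchmod() on the open descriptor to give it "rw-r-----" permissions, built from the Table 2 macros. The function name is made up. Unlike the mode passed to open(), the mode passed to chmod() or fchmod() is not filtered through the umask.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* create_group_readable() is a hypothetical helper.
 * Create path, set it to "rw-r-----" (0640) with fchmod(), and
 * return the permission bits fstat() then reports, or -1. */
int create_group_readable(const char *path)
{
    struct stat sb;
    int fd, result = -1;

    if ((fd = open(path, O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR)) == -1)
        return -1;

    /* same effect as chmod(path, ...), but through the open fd */
    if (fchmod(fd, S_IRUSR | S_IWUSR | S_IRGRP) == 0 &&
        fstat(fd, &sb) == 0)
        result = sb.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO);

    close(fd);

    return result;
}
```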

int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);

These functions can be used to set the owner UID and GID of a given file. Note that on some systems, these calls are only available to the superuser, as they can be used to get around disk quotas. Usage is similar to chmod(). Later, we'll discuss functions that can be used to determine the necessary numerical UID and GID for a given user or group name.

int truncate(const char *path, off_t length);
int ftruncate(int fd, off_t length);

These functions truncate a file to a given length. If the length is less than the length of the file, the data past the new length is discarded. If length is greater than the length of the file, the results are system dependent; it could be that the file is extended to the new length (SVR4) or nothing at all happens (4.3BSD).

Another way to truncate a file to zero length is to pass the O_TRUNC flag to open():

    open("foo", O_RDWR | O_TRUNC);

Not all systems implement the truncate() call, but many do.
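
A quick sketch of the shrinking case, assuming your system has ftruncate(): the (made-up) function below writes ten bytes to a file, cuts it back to the requested length, and reports the size that fstat() sees afterward.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* write_then_truncate() is a hypothetical helper.
 * Write 10 bytes to path, set the file length to `length` with
 * ftruncate(), and return the size fstat() reports, or -1. */
long write_then_truncate(const char *path, off_t length)
{
    struct stat sb;
    long size = -1;
    int fd;

    if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600)) == -1)
        return -1;

    if (write(fd, "0123456789", 10) == 10 &&
        ftruncate(fd, length) == 0 &&
        fstat(fd, &sb) == 0)
        size = (long)sb.st_size;

    close(fd);

    return size;
}
```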

int link(const char *oldpath, const char *newpath);

Creates a hard link to a file. The number of hard links to a particular file can be determined by calling stat() and checking the st_nlink field.
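
The two helpers below are a sketch for watching the link count change; their names are invented for this example. Create a file with create_empty(), call link() on it, and link_count() should go from 1 to 2. Call unlink() on either name and it should drop back to 1.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Both functions are hypothetical helpers, not library calls. */

/* Return the number of hard links to path, or -1 on error. */
long link_count(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;

    return (long)sb.st_nlink;
}

/* Create a new, empty file at path.  Return 0 on success or -1
 * on error (including when path already exists). */
int create_empty(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd == -1)
        return -1;

    return close(fd);
}
```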

int unlink(const char *pathname);

This removes a hard link to a file, and, if the hard link is the last one, it removes the file itself from disk. There is an exception: if any process has the file open, the directory entry for the file will be removed when it is unlink()'d, but the file will still be accessible to that process until it close()'s the file or exits.

One use for this trick is in creating temp files: open the file with creat() or open(), then immediately unlink() it. Now the temp file is invisible to everyone else. More importantly, if your program crashes, the temp file cleanup is automatic because the file is already unlink()'d!

Another way to unlink a file is to use the ANSI remove() function.
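
The temp file trick might look something like the sketch below. The function name open_unlinked() is made up, and the caller picks the path. O_EXCL is used so the call fails rather than clobbering a file that already exists under that name.

```c
#include <fcntl.h>
#include <unistd.h>

/* open_unlinked() is a hypothetical helper, not a library call.
 * Create path, unlink it right away, and return the still-open
 * descriptor, or -1 on error.  The name is gone, but the file
 * itself lives until the descriptor is closed. */
int open_unlinked(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

    if (fd == -1)
        return -1;

    if (unlink(path) == -1) {
        close(fd);
        return -1;
    }

    return fd;
}
```

You can read() and write() the returned descriptor as usual; the disk space is reclaimed when you close() it or the process exits.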

int rename(const char *oldpath, const char *newpath);

rename() is the basis for the Unix "mv" command. It can be used to rename a file, or to move a file to a new directory. Note that if you want to simply move a file from one directory to another but preserve the name, you must still specify the file name in the newpath (given that "foo" is a normal file, below):

	rename("foo", "/tmp/foo"); /* valid */
	rename("foo", "/tmp");     /* invalid */

More simply, the command rename("foo", "bar"); will rename the file "foo" to "bar", reasonably enough.

int symlink(const char *oldpath, const char *newpath);

The symlink() function makes a symbolic link, newpath, to another file or directory, oldpath.

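Most functions automatically follow the symlink, with the exception of lstat(), readlink(), remove(), rename(), and unlink(), all of which operate on the symlink itself. Whether chown() follows the link is system dependent: on some systems it changes the symlink itself, while on others it follows the link and a separate lchown() call is provided for changing the link.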

int readlink(const char *path, char *buf, size_t bufsiz);

If you want to find which file a symbolic link is pointing to, use this function. Given the path, the call fills the buffer buf with the file the symlink points to. Setting bufsiz to the maximum size of your buffer prevents an overflow.

The return value is the number of bytes placed in buf (that is, the length of the symlink's contents), or -1 on error. Note that the string stored in buf is not automatically null-terminated.

Sample call:

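    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[51];
        int count;

        /* ask for at most 50 bytes to leave room for the '\0' */
        if ((count = readlink("/home", buf, 50)) == -1) {
            perror("readlink");
            exit(1);
        }

        buf[count] = '\0'; /* null terminate that puppy */
        printf("/home -> %s\n", buf);

        return 0;
    }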

On my machine here at home, this produces the following:

    /home -> /usr/local/home

which is absolutely correct. (I don't have enough room in my root partition for all the stuff that's amassed in my home directory, so all the home directories are moved to my 2.2GB partition. 36% full--and to think I grew up on 92K floppies.)

int utime(const char *filename, struct utimbuf *buf);

Use this system call to set the access time and modification time of a file. The Unix touch command often uses this function.

The filename is passed as the filename argument, and the time is passed as a pointer to a struct utimbuf:

    struct utimbuf {
        time_t actime;   /* access time */
        time_t modtime;  /* modification time */
    };

Each of the time values is the number of seconds since the Epoch (often January 1, 1970). In the following sample, the times of file "foo" are set to fixed values, and the times of "bar" are set to whatever time it is now (both files already exist prior to the run):

    #include <sys/types.h>
    #include <utime.h>

    int main(void)
    {
        struct utimbuf tb;

        tb.actime = 2300000;  /* Jan 27 1970 on Linux */
        tb.modtime = 2400000; /* Jan 28 1970 */
        utime("foo", &tb);

        utime("bar", NULL); /* set the times on bar to now */

        return 0;
    }

The following are the results immediately after running the above program:

    $ date
    Wed Aug 13 15:33:52 PDT 1997
    $ ls -l foo bar   [Shows last modification time]
    -rw-r--r--   1 beej     users           0 Aug 13 15:33 bar
    -rw-r--r--   1 beej     users           0 Jan 28  1970 foo
    $ ls -lu foo bar  [Shows last access time]
    -rw-r--r--   1 beej     users           0 Aug 13 15:33 bar
    -rw-r--r--   1 beej     users           0 Jan 27  1970 foo

As previously mentioned, you can access the values for access and modification time through the struct stat fields st_atime and st_mtime, respectively.
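
To tie utime() and stat() together, here is a sketch of a touch-like function that sets both timestamps to a time of your choosing and then reads st_mtime back as a check. The name touch_at() is invented for this example, and it creates the file first if it isn't there.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <utime.h>

/* touch_at() is a hypothetical helper, not a library call.
 * Create path if it doesn't exist, set its access and
 * modification times to `when`, and return the modification
 * time stat() then reports, or -1 on error. */
long touch_at(const char *path, time_t when)
{
    struct utimbuf tb;
    struct stat sb;
    int fd;

    if ((fd = open(path, O_WRONLY | O_CREAT, 0644)) == -1)
        return -1;
    close(fd);

    tb.actime = when;
    tb.modtime = when;

    if (utime(path, &tb) == -1 || stat(path, &sb) == -1)
        return -1;

    return (long)sb.st_mtime;
}
```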

int mkdir(const char *pathname, mode_t mode);

Creates a directory of the name pathname with the permissions specified in mode. The mode is constructed by bit-wise OR'ing values from tables 2 and/or 4, just like practically every other call that deals with file permissions.

int rmdir(const char *pathname);

It's the opposite of mkdir(). The directory must be empty (containing only entries "." and "..") before it can be removed.
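
A minimal round trip through both calls might look like the sketch below. The function name is made up. It asks for "rwxr-xr-x" permissions (the umask still applies), checks with stat() that a directory really appeared, and then removes it again.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* mkdir_rmdir_roundtrip() is a hypothetical helper.
 * Create directory path, confirm that stat() sees a directory,
 * then remove it.  Return 0 on success, -1 on any failure. */
int mkdir_rmdir_roundtrip(const char *path)
{
    struct stat sb;

    if (mkdir(path, S_IRWXU | S_IRGRP | S_IXGRP |
                    S_IROTH | S_IXOTH) == -1)
        return -1;

    if (stat(path, &sb) == -1 || !S_ISDIR(sb.st_mode)) {
        rmdir(path);
        return -1;
    }

    /* the directory is still empty, so rmdir() should succeed */
    return rmdir(path);
}
```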

Reading Directories

Directories are files, just like everything else. They consist of a number of records that contain the filename, i-node number for the corresponding file, and maybe some other things. What is in there specifically is system-dependent, so those great POSIX guys designed a bunch of routines for reading directories that are platform independent. Use these whenever you want to read a directory (unless you really really know exactly what you're doing and why you're doing it).

The basic idea is this:

  1. Open the directory to read from with opendir(). This returns a pointer to a "directory stream" (DIR*) which is similar in many ways to a buffered I/O stream (FILE*), except that you use special functions to read from it.

  2. Read a directory entry (struct dirent) using the readdir() call. Optionally repeat.

  3. Close the directory stream with closedir().

DIR *opendir(const char *name);

Opens a directory for reading. The returned DIR* is used in subsequent calls.

struct dirent *readdir(DIR *dir);

This function reads an entry from the directory and returns a pointer to a static struct dirent. NULL is returned on EOF. The structure itself is implementation dependent, but looks like this under Linux:

    struct dirent {
        long            d_ino;        /* i-node number */
        __kernel_off_t  d_off;        /* kernel stuff */
        unsigned short  d_reclen;     /* more kernel stuff */
        char            d_name[256];  /* Aha!  This is the name!! */
    };

Of all the above fields, the only one that's sure to be there (and the only one you really care about) is d_name which is the null-terminated name of the file. Once you have the file name, you can do anything.

int closedir(DIR *dir);

Once you're done with the directory, close it to free up the DIR*.

long telldir(DIR *dir);

Returns the current offset in the directory. Like the library call ftell(), except for directories.

void seekdir(DIR *dir, long offset);

Seeks to a specified point in the directory stream. You should only use values returned from telldir() for your offset.
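
Here is a sketch of telldir() and seekdir() working together: save the position, read an entry, seek back, and read again. The function name is invented, and the 256-byte buffer is an assumption that matches the Linux d_name size shown earlier. The name has to be copied out because the next readdir() call may overwrite the static struct dirent.

```c
#include <string.h>
#include <sys/types.h>
#include <dirent.h>

/* reread_first_entry() is a hypothetical helper.
 * Save the position, read an entry, seekdir() back, and read
 * again.  Return 1 if both reads gave the same name, 0 if not,
 * or -1 on error. */
int reread_first_entry(const char *dirpath)
{
    char first[256];  /* assumed big enough for d_name */
    struct dirent *de;
    long pos;
    DIR *d;
    int same;

    if ((d = opendir(dirpath)) == NULL)
        return -1;

    pos = telldir(d);  /* remember where we are */

    if ((de = readdir(d)) == NULL) {
        closedir(d);
        return -1;
    }

    /* copy the name: the next readdir() may overwrite *de */
    strncpy(first, de->d_name, sizeof first - 1);
    first[sizeof first - 1] = '\0';

    seekdir(d, pos);   /* go back to the saved spot */

    de = readdir(d);
    same = (de != NULL && strcmp(first, de->d_name) == 0);

    closedir(d);

    return same;
}
```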

Directory example

Read a directory and print out the i-node number and filename (Linux):

    #include <stdio.h>
    #include <sys/types.h>
    #include <dirent.h>

    int main(void)
    {
        struct dirent *de;
        DIR *d;

        if ((d = opendir("/home")) == NULL) {
            perror("opendir");
            return 1;
        }

        /* print the i-node and name for each file: */
        while((de = readdir(d)) != NULL)
            printf("%7lu %s\n", (unsigned long)de->d_ino, de->d_name);

        closedir(d);

        return 0;
    }


Output on my machine (Ooh! Those lucky people with an account on Beej's computer!):

      65537 .
          2 ..
      67585 ftp
      83969 beej
     239617 becca
     307201 aaron
     309249 carl
     391169 bapper
     243719 sd
      10284 pberry

Those of you who got plenty of sleep last night will undoubtedly remember that "/home" is actually a symlink on my machine, and will notice that opendir() automatically followed the symlink.

void sync(void);

The sync() call schedules the writing of all unflushed buffers in the buffer cache. It doesn't wait around for this to actually occur, but returns immediately. You can be pretty sure, though, that they will be written in the next few seconds.

int fsync(int fd);

For a given file descriptor, fd, writes all unwritten blocks to disk, then returns when done. One way to get files to do this all the time automatically is to use the O_SYNC flag when open()ing them.
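
As a sketch of where fsync() fits, the function below (write_and_sync() is a made-up name) writes a string to a file and doesn't report success until fsync() has returned without error.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* write_and_sync() is a hypothetical helper, not a library call.
 * Write text to path and don't return until fsync() says the
 * data has been written out.  Return 0 on success, -1 on error. */
int write_and_sync(const char *path, const char *text)
{
    size_t len = strlen(text);
    int fd, ok;

    if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600)) == -1)
        return -1;

    /* fsync() blocks until the unwritten blocks for fd are out */
    ok = write(fd, text, len) == (ssize_t)len && fsync(fd) == 0;

    if (close(fd) == -1)
        ok = 0;

    return ok ? 0 : -1;
}
```

Opening the file with O_SYNC instead would give similar behavior on every write(), at the cost of making each one slower.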