Why is size reporting for directories different from other files?

I think the confusion comes from not knowing what a directory actually is. To clear that up, let's take a step back and examine how Unix filesystems work.

The Unix filesystem has several separate notions for addressing data on disk:

  • data blocks are the blocks on disk that hold the contents of a file.
  • inodes are special records on a filesystem, each with a numerical address (the inode number) unique within that filesystem, which contain metadata about a file such as:
    • permissions
    • access / modification times
    • size
    • pointers to the data blocks (could be a list of blocks, extents, etc)
  • filenames are hierarchical locations under a filesystem root that are mapped to inodes.

In other words, a "file" is actually composed of three different things:

  1. a PATH in the filesystem
  2. an inode with metadata
  3. data blocks pointed to by the inode

Most of the time, users imagine a file to be synonymous with "the entity associated with the filename"; it's only when you're dealing with low-level entities or the file/socket API that you think of inodes or data blocks. Directories are one of those low-level entities.

You might think that a directory is a file that contains a bunch of other files. That's only half-correct. A directory is a file that maps filenames to inode numbers. It doesn't "contain" files, only names that point to them. Think of it like a text file that contains entries like this:

  • . - inode 1234
  • .. - inode 200
  • Documents - inode 2008
  • README.txt - inode 2009

The entries above are called directory entries. They are basically mappings from filenames to inode numbers. A directory is a special file that contains directory entries.

That's a simplification, of course, but it explains the basic idea and a lot of other directory weirdness.

  • Why don't directories know the total size of their contents?
    • Because they only contain pointers to other things. To find out how much space everything "in" a directory takes, you have to iterate over its entries (which is what du does).
  • Why aren't directories ever empty?
    • Because they contain at least the . and .. entries. Thus, a proper directory will be at least as large as the smallest allocation that can hold those entries. On many filesystems (ext4 with its usual block size, for example), that is 4096 bytes.
  • Why is it that you need write permission on the parent directory when renaming a file?
    • Because you're not changing the file itself; you're changing the directory entry that points to it, and that entry lives in the parent directory.
  • Why does ls show a weird number of "links" to a directory?
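    • a directory is referenced (linked to) by its entry in its parent, by its own . entry, and by the .. entry in each of its subdirectories, so its link count is typically 2 plus the number of subdirectories.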
  • What does a hard link do and how does it differ from a symlink?
    • a hard link adds a directory entry pointing to the same inode number. Because it points to an inode number, it can only point to files in the same filesystem (inodes are local to a filesystem)
    • a symlink is a new inode whose contents are a path name. Because it refers to a path, it can point to arbitrary files anywhere in the tree, even on other filesystems.

But wait! Weird things are happening!

ls -ld somedirectory always shows the filesize to be 4096, whereas ls -l somefile shows the actual size of a file. Why?

Point of confusion 1: when we say "size" we can be referring to two things:

  • filesize, which is a number stored in the inode; and
  • allocated size, which is the number of blocks associated with the inode times the size of each block.

In general, these are not the same number. Try running stat on a regular file and you'll see this difference.

When a filesystem creates a non-empty file, it usually allocates data blocks eagerly, in groups. This is because files tend to grow and shrink unpredictably. If the filesystem only allocated exactly as many data blocks as needed to represent the file, growing or shrinking it would be slower, and fragmentation would be a serious concern. Allocating ahead means the filesystem doesn't have to keep reallocating space for every small change. The catch is that there may be a lot of space on disk that is "claimed" by files but completely unused.

What does the filesystem do with all this unused space? Nothing, until it needs to. If your filesystem optimizer tool (maybe an online optimizer running in the background, maybe part of your fsck, maybe built into the filesystem itself) decides to, it may reassign the data blocks of your files: moving used blocks, freeing unused blocks, etc.

So now we come to the difference between regular files and directories: because directories form the "backbone" of your filesystem, you can expect them to be accessed or modified frequently, so they should be optimized, and you don't want them fragmented at all. When directories are created, they always max out all their data blocks in size, even when they hold only a few directory entries. This is okay for directories because, unlike files, they are typically limited in size and growth rate.

The 4096 reported size of a directory is the "filesize" number stored in the directory's inode, not the number of entries in the directory. It isn't a fixed number: it's the maximum number of bytes that fit into the blocks allocated to the directory. Typically this is 8 × 512 = 4096 bytes allocated for any file with contents (stat reports allocated blocks in 512-byte units, so a single 4 KiB filesystem block shows up as 8 blocks). Incidentally, for directories, the filesize and the allocated size are the same. Because it's allocated as a single group, the filesystem optimizer won't move its blocks around.

As the directory grows, more data blocks are assigned to it, and it will also max out those blocks by adjusting the filesize accordingly.

And so ls and stat will show the filesize field of the directory's inode, which is set to the size of the data blocks assigned to it.

I think the initial size of an empty directory depends on the filesystem. On the ext3 and ext4 filesystems I have access to, I also get 4096-byte empty directories. On an NFS-mounted NAS of some sort, I get an 80-byte empty directory. I don't have access to a ReiserFS filesystem; the size of a newly created, empty directory there would be interesting.

Traditionally, a directory was a file with a bit set in its inode (the on-disk structure describing the file) that indicated it was a directory. That file was filled with variable-length records. Here's what /usr/include/linux/dirent.h says:

struct dirent64 {
    __u64       d_ino;
    __s64       d_off;
    unsigned short  d_reclen;
    unsigned char   d_type;
    char        d_name[256];
};

You could skip from one directory entry to the next by adding each record's d_reclen (record length) to its offset. If an entry got removed (the unlink() system call, used by the rm command), the d_reclen of the previous entry was increased to cover the removed record. Nothing "compacted" the records. It was probably simplest to report the size as the number of bytes in the disk blocks allocated to the file, rather than try to figure out how many bytes in a directory file account for all of the entries, or just up to the last entry.

These days, directories may have internal formats like B-trees or hash trees. My guess is that either it's a big performance improvement to manage directories in whole blocks, or there's "blank space" inside them similar to old-school directories, so it's hard to decide what the "real size" in bytes of a directory is, particularly one that has been in use for a while and had many files deleted and added. It's easier just to show number-of-blocks multiplied by bytes-per-block.

A file may have no blocks allocated to it at all, while a directory will always have some minimum number of blocks allocated, hence the default size; the -s flag to ls shows this difference. (Unless you're on some fancy modern filesystem that throws these notions out the window.) For example:

% mkdir testfoo
% cd testfoo/
% mkdir foodir
% touch foofile
% ln -s foofile foosln
% ls -ld foo*
drwxrwxr-x  2 jmates  jmates  512 Oct  5 19:48 foodir
-rw-rw-r--  1 jmates  jmates    0 Oct  5 19:48 foofile
lrwxrwxr-x  1 jmates  jmates    7 Oct  5 19:48 foosln -> foofile
% ls -lds foo*
8 drwxrwxr-x  2 jmates  jmates  512 Oct  5 19:48 foodir
0 -rw-rw-r--  1 jmates  jmates    0 Oct  5 19:48 foofile
0 lrwxrwxr-x  1 jmates  jmates    7 Oct  5 19:48 foosln -> foofile

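Note that the symlink here takes no blocks, despite dedicating seven bytes to the details necessary for readlink(2), how curious! (Short symlink targets are typically stored directly in the inode itself, a so-called "fast symlink".) Anyway, let's now pad foofile with a byte or two: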

% echo >> foofile a
% ls -lds foo*
8 drwxrwxr-x  2 jmates  jmates  512 Oct  5 19:48 foodir
8 -rw-rw-r--  1 jmates  jmates    2 Oct  5 19:49 foofile
0 lrwxrwxr-x  1 jmates  jmates    7 Oct  5 19:48 foosln -> foofile

And one can see that the allocated blocks for foofile have jumped to 8, despite there being only two bytes (the a and the newline that echo tacked on).

Files can also be sparse, which is another way the reported file size and the actual contents can differ, depending on how the tool interacting with the file handles the sparseness.

Also, a directory's size can increase: create many files with very long names and check, with ls -lds ., what happens to the size of the directory (and to the blocks allocated) after each new long filename is created.