"No space left on device" error despite having plenty of space, on btrfs

Solution 1:

Welcome to the world of BTRFS. It has some tantalizing features but also some infuriating issues.

First off, some info on your setup: it looks like you have four drives in a BTRFS "raid 10" volume (so all data is stored twice on different disks). This BTRFS volume is then carved up into subvolumes on different mount points. The subvolumes share a pool of disk space but have separate inode numbers and can be mounted in different places.

BTRFS allocates space in "chunks"; each chunk is allocated to a specific class, either data or metadata. What can happen (and looks like it has happened in your case) is that all free space gets allocated to data chunks, leaving no room for new metadata chunks.

It also seems (for reasons I don't fully understand) that BTRFS "runs out" of metadata space before the indicator of the proportion of metadata space used reaches 100%.

This appears to be what has happened in your case: there is lots of free space inside the data chunks, but no space left that has not been allocated to chunks, and insufficient free space in the existing metadata chunks.
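To see this for yourself, you can compare "total" (space allocated to chunks) with "used" for each class. Below is a minimal sketch, assuming the output of `btrfs filesystem df -b` (raw byte counts) is piped in on stdin. The function name `chunk_usage` is my own, not part of btrfs-progs.

```shell
#!/bin/sh
# chunk_usage (hypothetical helper): reads `btrfs filesystem df -b <mountpoint>`
# output on stdin and prints, per chunk class, how full the allocated chunks are.
# Expects lines like:  Metadata, RAID10: total=<bytes>, used=<bytes>
chunk_usage() {
    awk -F'[=,]' '
        # After splitting on "=" and ",": $1 = class, $3 = total, $5 = used
        /total=/ && $3 > 0 { printf "%s %d%%\n", $1, ($5 / $3) * 100 }
    '
}
```

Run it as root with something like `btrfs filesystem df -b /mnt/durable | chunk_usage`. A Data figure well below 100% next to a Metadata figure close to 100% matches the situation described above.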

The fix is to run a "rebalance". This will move data around so that some chunks can be returned to the "global" free pool, where they can be reallocated as metadata chunks.

btrfs balance start -dusage=5 /mnt/durable

The number after -dusage sets how aggressive the rebalance is, that is, how close to empty (as a percentage of space used) a chunk has to be to get rewritten. If the balance says it relocated 0 chunks, try again with a higher value of -dusage.
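The "try again with a higher value" step can be scripted. This is only a sketch: the function names, the list of percentages and the overridable `BTRFS` variable are my own choices, and it assumes the balance summary line has the form "Done, had to relocate N out of M chunks".

```shell
#!/bin/sh
# BTRFS can be overridden (e.g. with a stub) to try the loop without a real filesystem.
BTRFS=${BTRFS:-btrfs}

# Extract N from a summary line like "Done, had to relocate N out of M chunks".
relocated_chunks() {
    sed -n 's/.*had to relocate \([0-9][0-9]*\) out of.*/\1/p'
}

# Retry with increasing -dusage values until at least one chunk is relocated.
balance_until_progress() {
    mountpoint=$1
    for pct in 5 10 20 40; do
        out=$("$BTRFS" balance start -dusage="$pct" "$mountpoint" 2>&1) || {
            echo "$out" >&2   # balance itself failed; show why and give up
            return 1
        }
        n=$(printf '%s\n' "$out" | relocated_chunks)
        echo "dusage=$pct relocated ${n:-0} chunks"
        [ "${n:-0}" -gt 0 ] && return 0
    done
    return 1   # nothing was relocated at any threshold
}
```

As root you would call it as `balance_until_progress /mnt/durable`. It stops at the first threshold that frees something, which keeps each balance as cheap as possible.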

If the balance fails, then I would try rebooting and/or freeing up some space by removing files.

Solution 2:

Since you're running btrfs with a RAID setup, try running a balance operation.

btrfs balance start /var/opt/gitlab

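If this gives an error about not having enough space, try again with this syntax, which only touches chunks that are completely empty:

btrfs balance start -musage=0 -dusage=0 -susage=0 /var/opt/gitlab

If btrfs refuses to operate on system chunks, either add -f (the -s filter requires it) or drop -susage=0.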

Repeat this operation for each btrfs filesystem where you are seeing errors about space. If your space problem is due to the metadata not being distributed across the mirrored disks, this might free up some space for you.


Solution 3:

On my system, I added the following job in cron.monthly.

The clear_cache remount is there because of some corruption issues btrfs was having with the free space maps. (I think they finally found the cause, but the issue is so annoying that I'm willing to pay the cost of rebuilding the maps once a month.)

I ramp up the usage options to free up space gradually for larger and larger balances.

#!/bin/sh

# Balance every mounted btrfs filesystem, starting with the emptiest chunks.
for mountpoint in $(mount -t btrfs | awk '{print $3}' | sort -u)
do
    echo --------------------------
    echo "Balancing $mountpoint :"
    echo --------------------------
    echo "remount with clear_cache..."
    mount -o remount,clear_cache "$mountpoint"
    echo Before:
    /usr/sbin/btrfs fi show "$mountpoint"
    /usr/sbin/btrfs fi df "$mountpoint"
    # Raise the usage threshold gradually so the early, cheap passes free up
    # space for the larger ones that follow.
    for size in 0 1 5 10 20 30 40 50 60 70 80 90
    do
        time /usr/sbin/btrfs balance start -v -musage="$size" "$mountpoint" 2>&1
        time /usr/sbin/btrfs balance start -v -dusage="$size" "$mountpoint" 2>&1
    done
    echo After:
    /usr/sbin/btrfs fi show "$mountpoint"
    /usr/sbin/btrfs fi df "$mountpoint"
done

If you get to the point where you can't rebalance because you have insufficient space, the recommendation is to temporarily add another block device (or loopback device on another disk) of some sort to your volume for the duration of the rebalance, and then remove it.
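A rough sketch of that add/balance/remove sequence follows. The function name, the -dusage=5 filter and the overridable `BTRFS` variable are assumptions of mine. The spare device is whatever you have available, for example a loop device you set up beforehand on another disk.

```shell
#!/bin/sh
# BTRFS can be overridden (e.g. with a stub) for a dry run.
BTRFS=${BTRFS:-btrfs}

# Temporarily add a spare device, balance, then remove the device again.
#   $1 = spare block device (hypothetical, e.g. a loop device on another disk)
#   $2 = mount point of the full btrfs volume
balance_with_spare_device() {
    spare=$1
    mountpoint=$2
    "$BTRFS" device add "$spare" "$mountpoint" || return 1
    "$BTRFS" balance start -dusage=5 "$mountpoint"
    status=$?
    # Try to remove the spare even if the balance failed, so it is not left
    # in the volume; removal migrates any chunks off the device first.
    "$BTRFS" device remove "$spare" "$mountpoint" || return 1
    return "$status"
}
```

Called as root, e.g. `balance_with_spare_device /dev/loop7 /mnt/durable`, where /dev/loop7 is only a placeholder. Do not detach or delete the spare until the remove step has finished successfully, since it may hold live chunks until then.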