Why does Ubuntu's garbage collection cron job for PHP sessions take 25 minutes to run?

Solution 1:

Congratulations on having a popular web site and managing to keep it running on a virtual machine for all this time.

If you're really pulling in two million pageviews per day, then you're going to stack up a LOT of PHP sessions in the filesystem, and they're going to take a long time to delete no matter whether you use fuser or rm or a vacuum cleaner.

At this point I'd recommend you look into alternate ways to store your sessions:

  • One option is to store sessions in memcached. This is lightning fast, but if the server crashes or restarts, all your sessions are lost and everyone is logged out.
  • You can also store sessions in a database. This would be a bit slower than memcached, but the database would be persistent, and you could clear old sessions with a simple SQL query. To implement this, though, you have to write a custom session handler.
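
To give a feel for the "simple SQL query" cleanup in the database option, here is a minimal sketch. The table name `sessions` and the column `last_access` (a Unix timestamp) are hypothetical; your custom session handler decides the real schema. The function only prints the statement, so you can inspect it before wiring it to a database client.

```shell
#!/bin/sh
# Sketch only: the table "sessions" and column "last_access" (Unix timestamp)
# are hypothetical names; adapt them to your own session handler's schema.

# Print a DELETE statement that removes sessions idle for longer than
# max_lifetime seconds. "now" defaults to the current time and can be
# passed explicitly to get reproducible output.
session_cleanup_sql() {
    max_lifetime=$1
    now=${2:-$(date +%s)}
    cutoff=$((now - max_lifetime))
    printf 'DELETE FROM sessions WHERE last_access < %s;\n' "$cutoff"
}

# Possible use from cron (database name is a placeholder):
#   session_cleanup_sql 1440 | mysql your_database
```

The 1440 in the usage comment is PHP's default session.gc_maxlifetime in seconds; substitute whatever lifetime your site actually uses.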

Solution 2:

Removing fuser from the cron job should help. The job runs a fuser command (which checks whether a file is currently open) for every session file it finds, and that can easily take several minutes on a busy system with 14k sessions. This was a Debian bug (Ubuntu is based on Debian).
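
If you want to gauge how many fuser calls that means on your own server, a rough sketch like the following counts the session files. It assumes the sessions live directly in one directory (the cron job discussed in this thread uses /var/lib/php5), and you will probably need to run it as root to read that directory.

```shell
#!/bin/sh
# Count the regular files directly inside a directory. With one fuser call
# per session file, this is also the number of fuser processes that a
# single cleanup run has to start.
count_session_files() {
    dir=$1
    # NUL-delimited output keeps the count correct even for odd file names:
    # keep only the NUL bytes, then count them.
    find "$dir" -mindepth 1 -maxdepth 1 -type f -print0 | tr -cd '\0' | wc -c | tr -d ' '
}

# Example (path taken from the cron job in this thread; adjust if yours differs):
#   count_session_files /var/lib/php5
```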

Instead of memcached, you can also try tmpfs (an in-memory filesystem) for the session files. Like memcached, this loses sessions on reboot (you can work around that by backing up the directory in a shutdown script and restoring it in a startup script), but it is much easier to set up. It will not help with the fuser problem, though.
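
A minimal sketch of the backup/restore workaround might look like this. The mount line in the comment, the 256m size, and the function names are my assumptions, not something taken from a distribution script; the two directories are passed as arguments so you can try it on scratch directories first.

```shell
#!/bin/sh
# Sketch only. Mounting tmpfs needs root, and the size is an assumption:
#   mount -t tmpfs -o size=256m tmpfs /var/lib/php5
# After mounting, restore the owner and permissions the directory had before.

# Copy every file (dotfiles included) from src into dest, preserving
# permissions and modification times.
copy_sessions() {
    src=$1
    dest=$2
    mkdir -p "$dest" && cp -a "$src"/. "$dest"/
}

# Both helpers take the same arguments: <live session dir> <backup dir>.
# Call from a shutdown script: live (tmpfs) directory -> backup directory.
save_sessions()    { copy_sessions "$1" "$2"; }
# Call from a startup script, after the tmpfs mount: backup -> live directory.
restore_sessions() { copy_sessions "$2" "$1"; }
```

One caveat: a file's ctime cannot be preserved by copying, so restored session files look freshly changed to any cleanup that selects on ctime (find's -cmin), and they will live for one extra lifetime after a reboot.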


Solution 3:

So, the Memcached and database session storage options suggested by users here are both good choices to increase performance, each with their own benefits and drawbacks.

But through performance testing, I found that the huge performance cost of this session maintenance is almost entirely down to the call to fuser in the cron job. Here are the performance graphs after reverting to the Natty / Oneiric cron job, which trims old sessions with rm and does not call fuser. The switchover happens at 2:30.

[Graphs: CPU usage, elapsed IO time, disk operations]

You can see that the periodic performance degradation caused by Ubuntu's PHP session cleaning is almost entirely removed. The spikes shown in the Disk Operations graph are now much smaller in magnitude, and about as narrow as this graph can resolve, showing a small, short disruption where previously server performance was significantly degraded for 25 minutes. Extra CPU usage is entirely eliminated; this is now an IO-bound job.

(An unrelated IO job runs at 05:00 and a CPU job runs at 7:40; both cause their own spikes on these graphs.)

The modified cron job I'm now running is:

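# Keep the entry on a single line in the crontab file: cron does not support
# backslash line continuations.
# Every hour at minute 09, as root: delete session files whose ctime is older
# than the configured max lifetime, passing them to rm in batches of 200.
09 *     * * *     root   [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm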
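
To see why the xargs form is so much cheaper in process terms, here is a small sketch that counts how many commands xargs starts for a given number of inputs. It uses echo as a harmless stand-in for rm and generated numbers instead of real file names, so nothing is deleted; the function name is just for illustration.

```shell
#!/bin/sh
# Count how many commands xargs starts for n_items inputs when each command
# receives at most batch_size arguments. echo stands in for rm, so nothing
# is deleted; every echo call prints one line, so lines == commands started.
count_batches() {
    n_items=$1
    batch_size=$2
    seq 1 "$n_items" | tr '\n' '\0' | xargs -0 -r -n "$batch_size" echo | wc -l | tr -d ' '
}

# 14k stale files handed over in batches of 200:
#   count_batches 14000 200
```

With 14,000 stale files and -n 200, that works out to 70 rm processes per run, compared with one process per file for a per-file check.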