I look after a PHP-based web server which has recently been receiving rather more hits than usual, and is creaking under the strain. Investigation reveals that we have high CPU and IO wait times, with the main culprits being (drum roll please)...

httpd and mysqld

No surprises there!

On analysing the logs (naturally using freq.py), I found that just a few PHP scripts were being called very frequently, and each of them was making quite a number of SQL queries. So the first thing I did was add some new indexes to the tables that were being queried. But I know that under load, it doesn't matter how good the indexes are, the server is still going to creak - it's a single-core Pentium 4 machine, though a fairly fast one, and it only has 1 gig of RAM. Multi-tasking between Apache and MySQL is not very efficient on this system.

The best way to speed things up is to do less querying of the databases and less overall page-building processing. Normally, for a situation like this, memcached is the best fit, but unfortunately, for various reasons, that's not an option. Firstly, the customer only has ONE server, and for memcached it's best to have two or more. Secondly, the server is running a really old version of Linux - Fedora Core 4! This version is no longer supported - I'm not sure that any repositories still exist for it! Upgrading right now is simply not an option, and I have my doubts about whether the server could successfully run an up-to-date distro.

I discussed the idea of adding caching "by hand" with the site developer, but we didn't have any concrete ideas between us. The idea got put on the back-burner for the time being, and I got on with trying to find some more indexes to create. It wasn't until I was driving home from work that the way to do it came to me: if we could somehow redirect the output of the existing scripts to a file, and check whether that file was fresh enough before re-generating the HTML, we would be able to cache the data.

My original plan was to write a "front-end" script that ran the original script as a "back-end", but eventually I realised that it would be more efficient to embed the caching code near the top of each script, with a bit more right at the end. It took about half an hour to find out how to do the relevant stuff in PHP (I'm more of a Python guy, not a PHP coder, as you know).

Here is the code that goes at the top of the script. You have to work out the best place to put it based on the logic of the script you are embedding it into. See comments for a brief explanation of what is going on:

/* Callback function to allow script output to be sent to
   a cache file instead of directly to the browser */

function ob_file_callback($buffer)
{
    global $cache_file_h;
    fwrite( $cache_file_h, $buffer );
}

/* The cache file - change the name of this to reflect the script name and any parameters which will change the output */
$cache_file = '/tmp/cache/index.html';

/* Find out when the cache file was last written */
if ( file_exists( $cache_file ) ) {
    $cache_mtime = filemtime( $cache_file );
} else {
    $cache_mtime = 0; /* Definitely out of date */
}

/* Check if the cached file was written less than 2 minutes (120 seconds) ago */
if ( $cache_mtime > ( time() - 120 ) ) {
    /* Cache file is still fresh enough */
    $html = file_get_contents( $cache_file );
    print $html;
    /* A little flag so we can see if a cached version was served */
    print '<!-- cached -->';
    exit;
} else {
    /* Open a file to receive the output of this script */
    $cache_file_h = fopen( $cache_file, 'w' );
    /* Register the callback function that will write the output data to the file */
    ob_start( 'ob_file_callback' );
}
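
The comment above about naming the cache file after the script and its parameters could be handled along these lines. This is only a rough sketch: the helper name cache_file_for() is made up for illustration, and it assumes that the query string ($_GET) is the only thing that changes the output of the page. If cookies or POST data matter for your script, they would need to go into the key as well.

```php
<?php
/* Hypothetical helper (not part of the original code): build a cache
   file path from the script name and the parameters that affect the
   output of the page */
function cache_file_for($cache_dir, $script, $params)
{
    /* Sort by key so that ?a=1&b=2 and ?b=2&a=1 share one cache file */
    ksort($params);
    /* Hash the parameters so user input never ends up in the file name */
    $key = md5(serialize($params));
    return rtrim($cache_dir, '/') . '/' . basename($script, '.php') . '-' . $key . '.html';
}

/* Assumption: only the query string changes what the page looks like */
$cache_file = cache_file_for('/tmp/cache', $_SERVER['SCRIPT_NAME'], $_GET);
```

Each distinct combination of parameters then gets its own cache file, so a busy page with lots of different parameters will also use correspondingly more space in the cache directory.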

This is the part that you put right at the end of the script - after all the code and any html sections:

<?php
/* Flush output buffer and close cache file, then display the output */
ob_end_flush();
fclose( $cache_file_h );
/* Re-read file and send the contents to the client - there's probably a way to get the data straight from the buffer */
$html = file_get_contents( $cache_file );
print $html;
?>
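
On the "straight from the buffer" point in the comment above: whatever an output callback returns is what PHP sends on to the browser, so one way to do it is to have the callback return the buffer after writing it to the file. The following is a self-contained sketch rather than a drop-in replacement; the function name ob_tee_callback and the path /tmp/tee-example.html are made up for the example.

```php
<?php
/* Variant of the callback: write the buffer to the cache file and
   also return it, so PHP passes the same data on to the browser */
function ob_tee_callback($buffer)
{
    global $cache_file_h;
    fwrite( $cache_file_h, $buffer );
    return $buffer; /* The return value is what actually gets output */
}

/* Hypothetical cache file path, for illustration only */
$cache_file = '/tmp/tee-example.html';
$cache_file_h = fopen( $cache_file, 'w' );
ob_start( 'ob_tee_callback' );

/* Stands in for the page-building code of the real script */
print '<p>Hello</p>';

/* At the end of the script: flush the buffer, then close the file */
ob_end_flush();
fclose( $cache_file_h );
```

With a callback like this, the file_get_contents() and print at the end of the script should no longer be needed, because the page has already gone to the client by the time the file is closed.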

The final piece of the puzzle: once you've enabled caching in your script, you probably want to store your cache files in RAM rather than on disk, in a tmpfs filesystem:

mkdir -p /tmp/cache/
mount -t tmpfs -o size=100M,mode=0777 tmpfs /tmp/cache/
chown apache:apache /tmp/cache/

You don't *have* to do this, but it will make quite a difference to performance.
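
One thing to watch: a tmpfs mounted by hand like this is gone after a reboot, so the scripts may find the cache directory missing or not writable. A small guard such as the one sketched below could be used to skip the caching code in that case, so that pages are simply built as normal. The function name cache_dir_usable() and the $use_cache flag are made up for illustration.

```php
<?php
/* Hypothetical guard (not part of the original code): only use the
   cache when the directory exists and the web server can write to it */
function cache_dir_usable($cache_dir)
{
    return is_dir( $cache_dir ) && is_writable( $cache_dir );
}

/* Wrap the caching code in "if ( $use_cache ) { ... }" */
$use_cache = cache_dir_usable( '/tmp/cache' );
```

If you go this way, the bit at the end of the script needs the same check, otherwise it will try to close a cache file that was never opened.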

Feel free to use this code in your own projects, but if you want to pass it on in any form, it's licensed under the GPL v2 or later, except the ob_file_callback function, which I lifted from http://codeutopia.net/blog/2007/10/03/how-to-easily-redirect-php-output-to-a-file/. The method for redirecting output came from there too, but the cache logic is all my own.