Best way to track (direct) file downloads

Feel free to use :)

.htaccess:

RewriteEngine on    
RewriteRule ^(.*)\.(rar|zip|pdf)$ http://xy.com/downloads/download.php?file=$1.$2 [R,L]
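
To see which requests the rule catches, here is a small PHP sketch that applies the same pattern with preg_match() and builds the query string the redirect would carry. The helper name rewrite_target() is made up for illustration; Apache does this matching itself, and the sketch escapes the dot before the extension so that it only matches a literal ".".

```php
<?php
// Illustration only: mimic the RewriteRule pattern to see which paths it catches.
// rewrite_target() is a hypothetical helper, not part of Apache or PHP.
function rewrite_target($path)
{
    // Same pattern as the rule; "\." matches a literal dot before the extension.
    if (preg_match('/^(.*)\.(rar|zip|pdf)$/', $path, $m)) {
        // Mirrors the substitution download.php?file=$1.$2
        return 'download.php?file=' . $m[1] . '.' . $m[2];
    }
    return null; // not a tracked file type, so the rule would not fire
}
```

Calling rewrite_target('manual.pdf') gives the download.php URL for that file, while rewrite_target('index.html') returns null because the extension is not in the list.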

mysql:

CREATE TABLE `download` (
    `filename` varchar(255) NOT NULL,
    `stats` int(11) NOT NULL,
    PRIMARY KEY  (`filename`)
)

download.php

<?php

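// Connect with mysqli (the old mysql_* functions were removed in PHP 7).
$db = mysqli_connect("localhost", "name", "password", "dbname")
    or die("Sorry, can't connect to database.");
$baseDir = "/home/public_html/downloads";
$file = basename($_GET['file'] ?? '');
// realpath() returns false for missing files, so those are rejected below.
$path = realpath($baseDir . "/" . $file);

if ($path !== false && dirname($path) === $baseDir && is_file($path)) {
    if (!is_bot()) {
        // The first download inserts the row with stats = 1; later ones increment it.
        $stmt = mysqli_prepare($db, "INSERT INTO download SET filename = ?, stats = 1 ON DUPLICATE KEY UPDATE stats = stats + 1");
        mysqli_stmt_bind_param($stmt, "s", $file);
        mysqli_stmt_execute($stmt);
    }

    header("Cache-Control: public");
    header("Content-Description: File Transfer");
    header('Content-Disposition: attachment; filename="' . $file . '"');
    header("Content-Length: " . filesize($path));
    header("Content-Type: application/octet-stream");
    header("Content-Transfer-Encoding: binary");
    // Discard any buffered output so it cannot corrupt the file.
    while (ob_get_level() > 0) {
        ob_end_clean();
    }
    readfile($path);
    exit;
}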

function is_bot()
{

    $botlist = array("Teoma", "alexa", "froogle", "Gigabot", "inktomi",
    "looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory",
    "Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot",
    "crawler", "www.galaxy.com", "Googlebot", "Scooter", "Slurp",
    "msnbot", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz",
    "Baiduspider", "Feedfetcher-Google", "TechnoratiSnoop", "Rankivabot",
    "Mediapartners-Google", "Sogou web spider", "WebAlta Crawler","TweetmemeBot",
    "Butterfly","Twitturls","Me.dium","Twiceler");

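    // Some clients send no User-Agent header at all.
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    foreach ($botlist as $bot)
    {
        if (strpos($agent, $bot) !== false)
            return true;    // Is a bot
    }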

    return false;
}

?>

Source - gayadesign.com


Your Apache logs already record a lot of this, but I think what you're asking for is more control over what gets logged and when. For that you want two pages: one with the link to the file, and one that records the download and then serves the file, like so:

file_page.php

<a href="download.php?file_id=1234">Download File!</a>

download.php

<?php
// Code to track the file using PHP, whether that means storing data in a database, saving to a log, or emailing you. I'd use a DB, like so:

   session_start(); // needed before reading $_SESSION
   // Assumes $db is an open mysqli connection, e.g. from mysqli_connect().

   // Prep the vars
   $file_id = (int) $_GET['file_id']; // cast to int so it is safe in the path and the query
   $user_id = (int) $_SESSION['user_id'];
   $now = date('Y-m-d H:i:s');
   $file_path = '/files/'.$file_id.'.pdf';

   // Save data to database
   $stmt = mysqli_prepare($db, 'INSERT INTO download_log
      SET file_id = ?,
          date_downloaded = ?,
          user_id = ?');
   mysqli_stmt_bind_param($stmt, 'isi', $file_id, $now, $user_id);
   mysqli_stmt_execute($stmt);

   // Now find the file and download it
   header('Content-type: application/pdf');
   header('Content-Disposition: attachment; filename="'.$file_id.'.pdf"'); // or whatever the file name is
   readfile($file_path);

Something like that, anyway.

The page itself shows nothing, but browsers should start downloading the file as soon as it is requested.

So what I'm doing here is saving the file ID, the current datetime, and the user ID of the person downloading it (from a $_SESSION variable). You probably want to store more than that, such as the user's IP address ($_SERVER['REMOTE_ADDR']), the referrer ($_SERVER['HTTP_REFERER'], spelled with a single R) or other $_SERVER values, so you can track where the user came from, when, and what they downloaded.
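
As a rough sketch of collecting that extra information, the helper below gathers the request details into one array that you could then bind into your INSERT. The function name collect_download_info() and the array keys are placeholders I made up; rename them to match whatever columns your log table actually has.

```php
<?php
// Sketch: gather extra request details to store alongside each download.
// collect_download_info() and its array keys are hypothetical names.
function collect_download_info(array $server, $file_id, $user_id = null)
{
    return [
        'file_id'         => (int) $file_id,
        'user_id'         => $user_id,
        'date_downloaded' => date('Y-m-d H:i:s'),
        // Any of these may be missing, so fall back to null.
        'ip_address'      => $server['REMOTE_ADDR'] ?? null,
        'referer'         => $server['HTTP_REFERER'] ?? null,
        'user_agent'      => $server['HTTP_USER_AGENT'] ?? null,
    ];
}
```

In download.php you would call it as collect_download_info($_SERVER, $file_id, $_SESSION['user_id']). Passing $_SERVER in as an argument, rather than reading it inside the function, keeps the helper easy to test with made-up values.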

Good luck.