Apache Log Parser Class

The following is a PHP class that parses Apache web server access logs and returns each line’s component values in an associative array (hash). It’s ported from a CPAN module, but I’ve made my own improvements.

View Source: http://kungf.eu/wp-content/uploads/apachelogregex.class.phps
Download: http://kungf.eu/wp-content/uploads/apachelogregex.zip
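
To illustrate the idea of turning one log line into an associative array, here is a minimal stand-alone sketch. It is not the code from the download: the function name, the array keys and the fixed Common Log Format pattern are all assumptions made for the example, whereas the class itself builds its regex from a format string.

```php
<?php
// Hypothetical helper, NOT part of the downloadable class.
// Assumes the Common Log Format: %h %l %u %t "%r" %>s %b
function sketch_parse_common_log_line(string $line): ?array
{
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$/';

    $m = array();
    if (!preg_match($pattern, $line, $m)) {
        return null; // line does not look like Common Log Format
    }

    // Key names are illustrative; the real class may use different ones.
    return array(
        'host'    => $m[1],
        'ident'   => $m[2],
        'user'    => $m[3],
        'time'    => $m[4],
        'request' => $m[5],
        'status'  => (int) $m[6],
        'bytes'   => $m[7] === '-' ? 0 : (int) $m[7],
    );
}

$line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
      . '"GET /apache_pb.gif HTTP/1.0" 200 2326';
$fields = sketch_parse_common_log_line($line);
print_r($fields);
```

A fixed pattern like this only handles one log layout, which is presumably why the class derives its pattern from the configured format instead.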


Categories: Programming
  1. Bruce Robinson
    March 20th, 2007 at 17:19 | #1

    This looks really promising. It looks like smart (curly) quotes got embedded instead of the actual source, for example:

    $this->_regex_string = ”;

    Instead of what should be:

    $this->_regex_string = "";

    (hopefully that formatted properly) – anyways, could you post the actual file?

    THANKS!!

  2. Rob
    April 3rd, 2007 at 11:29 | #2

    Hi, this is excellent, just what I was looking for.

    The problem is I am quite new at this, so I don't know how to fix the dates. In my access.log the date is in this format:

    03/Apr/2007:11:59:44 +0100

    but when the script runs it outputs all the dates as

    January 1, 1970, 1:00 am

    Cheers, keep up the good work

    Rob

  3. July 27th, 2008 at 20:10 | #3

    Hi Rob (and others),

    I found the same with the date format. The reason seems to be that in the logtime_to_timestamp() method in the class the author refers to $m but the preg_match uses $matches. Simply change the preg_match line to read:


    if(!preg_match($time_format, $time, $m)

    Cheers,

    - Bob -
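
The fix described above comes down to passing preg_match() the same array that is inspected afterwards. Below is a minimal stand-alone sketch of that idea. The function name is hypothetical, the time pattern is an assumption rebuilt around the usual Apache %t layout, and this is not the class's actual logtime_to_timestamp() code. If the match array were never filled, a function like this would return null, which would be consistent with the 1970 dates reported above if the caller formats a null result as timestamp 0.

```php
<?php
// Hypothetical sketch, NOT the class's own method.
// Expects the Apache %t form, e.g. [03/Apr/2007:11:59:44 +0100]
function sketch_logtime_to_timestamp(string $time): ?int
{
    // Assumed pattern: 9 capture groups, so a full match yields 10 entries.
    $time_format = '/\[(\d{2})\/([A-Za-z]{3})\/(\d{4})'
        . ':(\d{2}):(\d{2}):(\d{2}) ([+\-])(\d{2})(\d{2})\]/';

    $m = array(); // matches
    // The third argument must be $m, the array checked by count() below.
    if (!preg_match($time_format, $time, $m) || count($m) != 10) {
        return null;
    }

    $months = array(
        'Jan' => 1, 'Feb' => 2, 'Mar' => 3, 'Apr' => 4,
        'May' => 5, 'Jun' => 6, 'Jul' => 7, 'Aug' => 8,
        'Sep' => 9, 'Oct' => 10, 'Nov' => 11, 'Dec' => 12,
    );
    if (!isset($months[$m[2]])) {
        return null; // unknown month abbreviation
    }

    // Read the fields as if they were UTC, then undo the logged offset.
    $as_utc = gmmktime(
        (int) $m[4], (int) $m[5], (int) $m[6],
        $months[$m[2]], (int) $m[1], (int) $m[3]
    );
    $offset = ((int) $m[8] * 3600 + (int) $m[9] * 60)
        * ($m[7] === '-' ? -1 : 1);

    return $as_utc - $offset;
}

$ts = sketch_logtime_to_timestamp('[03/Apr/2007:11:59:44 +0100]');
echo gmdate('Y-m-d H:i:s', $ts), "\n"; // 2007-04-03 10:59:44 (UTC)
```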

  4. Lubo
    January 18th, 2009 at 13:53 | #4

    find line 383 -
    if(!preg_match($time_format, $time, $matches)

    and replace it with

    if(!preg_match($time_format, $time, $m)

    - so $matches should be $m and it’ll work fine

    cheers

  5. March 6th, 2009 at 14:44 | #5

    Hi, thanks for this useful class. I use it for statistics.

    However, there is an error in your script that breaks the logtime_to_timestamp method:
    At line 384 (in version 1.2.1) you must pass “$m” as the parameter of “preg_match” instead of “$matches”.

    So the code will look like (at line 384):
    if(!preg_match($time_format, $time, $m)

    Have a nice day!

  6. March 30th, 2009 at 02:19 | #6

    Justin Randel suggested the following patch, which I may get around to merging some time:

    --- src/ApacheLogParser.php	2007-05-31 23:03:27.000000000 +1000
    +++ ApacheLogParser.php	2009-01-08 20:26:48.000000000 +1100
    @@ -137,12 +137,12 @@ class ApacheLogRegex {
    
             foreach(explode(' ', $this->_format) as $element)
             {
    -            $quotes = preg_match('/^\\\"/', $element) ? true : false;
    +            $quotes = preg_match('/^\\"/', $element) ? true : false;
    
                 if($quotes)
                 {
                     $element = preg_replace(
    -                    array('/^\\\"/', '/\\\"$/'),
    +                    array('/^\\"/', '/\\"$/'),
                         '',
                         $element
                     );
    @@ -333,7 +333,7 @@ class ApacheLogRegex {
                 .':([\d]{2}):([\d]{2}) ([\+\-])([\d]{2})([\d]{2})\]/';
    
             $m = array();    //matches
    -        if(!preg_match($time_format, $time, $matches)
    +        if(!preg_match($time_format, $time, $m)
                     || count($m) != 10)
                 return null;
    
    @@ -347,5 +347,3 @@ class ApacheLogRegex {
    
     } // end class ApacheLogRegex
    
    -
    -?>
    \ No newline at end of file
    
  7. April 1st, 2009 at 13:43 | #7

    hamish :

    - $quotes = preg_match('/^\\\"/', $element) ? true : false;
    + $quotes = preg_match('/^\\"/', $element) ? true : false;

    - array('/^\\\"/', '/\\\"$/'),
    + array('/^\\"/', '/\\"$/'),

    - if(!preg_match($time_format, $time, $matches)
    + if(!preg_match($time_format, $time, $m)

    Of the changes suggested above, I agree with the last. There is clearly a mistake there that stops the timestamps from being parsed. About the other two, however, I am not so sure. In both cases the pattern is matched against the quotes in the element string, e.g. \"%{User-Agent}i\". But the quotes must be escaped in httpd.conf, so we must match \". The \ (backslash) is the escape character for both PHP strings and regular expressions, so it must be escaped twice. Therefore I think the correct value here is '/^\\\\"/' with four backslashes.
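
The escaping question can be checked directly, since PHP single-quoted strings only treat \\ and \' as escapes and leave any other backslash in place. The sketch below compares the two, three and four backslash forms of the pattern against two sample elements. The sample strings and variable names are illustrative assumptions, and it does not settle which form the class needs, because that depends on whether the format string reaches the parser with its backslashes intact.

```php
<?php
// Element as written in httpd.conf, backslashes still present (assumption).
$escaped = '\"%{User-Agent}i\"';
// The same element once the backslashes have been consumed.
$bare = '"%{User-Agent}i"';

// What each single-quoted literal becomes after PHP string parsing:
$two   = '/^\\"/';    // string /^\"/   regex: a bare quote at the start
$three = '/^\\\"/';   // string /^\\"/  regex: a backslash, then a quote
$four  = '/^\\\\"/';  // string /^\\"/  the same string as $three

$same_pattern = ($three === $four);

$results = array(
    'two vs escaped'   => preg_match($two, $escaped),
    'two vs bare'      => preg_match($two, $bare),
    'three vs escaped' => preg_match($three, $escaped),
    'four vs escaped'  => preg_match($four, $escaped),
    'four vs bare'     => preg_match($four, $bare),
);

var_dump($same_pattern);
print_r($results);
```

In single-quoted strings the three and four backslash forms produce the same pattern, so the choice that changes behaviour is between those and the two backslash form from the patch, which matches a bare quote only.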