Apache Log Parser Class
The following is a PHP class that parses Apache web server access logs and returns each line’s component values in an associative array (hash). It’s ported from a CPAN module, but I’ve made my own improvements.
View source: http://kungf.eu/wp-content/uploads/apachelogregex.class.phps
Download: http://kungf.eu/wp-content/uploads/apachelogregex.zip
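To illustrate what "component values in an associative array" means, here is a minimal sketch that handles only Apache's Common Log Format (LogFormat "%h %l %u %t \"%r\" %>s %b") with one hard-coded regular expression. It is not the class offered above, which appears to build its pattern from a configurable format string; the function name parse_clf_line() and the choice of keying the result by format directive are illustrative assumptions.

```php
<?php
// Hypothetical helper for illustration only; NOT part of the class above.
// Parses one line in Apache's Common Log Format: %h %l %u %t \"%r\" %>s %b
function parse_clf_line(string $line): ?array
{
    // One capture group per directive, in format order. The request (%r)
    // is quoted, so it may contain spaces; the time (%t) is bracketed.
    $pattern = '/^(\S+) (\S+) (\S+) (\[[^\]]+\]) "([^"]*)" (\d{3}) (\S+)$/';
    $keys = ['%h', '%l', '%u', '%t', '%r', '%>s', '%b'];

    if (!preg_match($pattern, $line, $m)) {
        return null; // line does not match the expected format
    }
    // $m[0] is the whole line; $m[1..7] are the fields.
    return array_combine($keys, array_slice($m, 1));
}

$line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
print_r(parse_clf_line($line));
```

Matching on the format's own structure (quotes, brackets) rather than splitting on spaces is what lets a request line such as "GET /apache_pb.gif HTTP/1.0" come back as a single %r value.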


This looks really promising. It looks like typographic ("smart") quotes got embedded instead of the actual source, for example:
$this->_regex_string = ”;
Instead of what should be:
$this->_regex_string = "";
(hopefully that formatted properly) – anyways, could you post the actual file?
THANKS!!
Hi, this is excellent, just what I was looking for.
The problem is I am quite new at this, so I don’t know how to fix the dates. In my access.log the date is in this format:
03/Apr/2007:11:59:44 +0100
but when the script runs it outputs all the dates as
January 1, 1970, 1:00 am
Cheers, keep up the good work
Rob
Hi Rob (and others),
I found the same problem with the date format. The reason seems to be that in the class’s logtime_to_timestamp() method the author checks $m, but the preg_match() call fills $matches. Simply change the preg_match line to read:
…
if(!preg_match($time_format, $time, $m)
…
Cheers,
- Bob -
Find line 383:
if(!preg_match($time_format, $time, $matches)
and replace it with:
if(!preg_match($time_format, $time, $m)
So $matches should be $m, and it’ll work fine.
Cheers
Hi, thanks for this useful class; I use it for statistics.
However, there is an error in your script that breaks the logtime_to_timestamp method:
At line 384 (in version 1.2.1) you must pass “$m” as the third argument of “preg_match” instead of “$matches”.
So the code should look like this (at line 384):
if(!preg_match($time_format, $time, $m)
Have a nice day!
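All three comments describe the same one-variable bug. Because preg_match() fills $matches, the $m array checked afterwards stays empty, so count($m) != 10 is always true and the method returns null. A null timestamp that ends up treated as 0 is the Unix epoch, which would presumably explain Rob’s “January 1, 1970, 1:00 am” (the hour reflecting his server’s UTC offset). The sketch below is a standalone function, not the class’s actual method: the full $time_format regex is an assumption reconstructed from the fragment visible in the patch further down, and the offset handling is illustrative.

```php
<?php
// Standalone sketch of a log-time parser; NOT the class's own method.
// The full regex is an assumption: only its tail appears in the patch.
function logtime_to_timestamp(string $time): ?int
{
    static $months = ['jan' => 1, 'feb' => 2, 'mar' => 3, 'apr' => 4,
                      'may' => 5, 'jun' => 6, 'jul' => 7, 'aug' => 8,
                      'sep' => 9, 'oct' => 10, 'nov' => 11, 'dec' => 12];
    $time_format = '/\[([\d]{2})\/([a-z]{3})\/([\d]{4}):([\d]{2})'
        . ':([\d]{2}):([\d]{2}) ([\+\-])([\d]{2})([\d]{2})\]/i';

    $m = array(); // matches: the SAME variable is filled and then checked
    if (!preg_match($time_format, $time, $m) || count($m) != 10) {
        return null;
    }
    $month = $months[strtolower($m[2])] ?? null;
    if ($month === null) {
        return null; // unknown month abbreviation
    }

    // Treat the wall-clock time as UTC, then remove the zone offset:
    // 11:59:44 +0100 is 10:59:44 UTC.
    $ts = gmmktime((int) $m[4], (int) $m[5], (int) $m[6], $month, (int) $m[1], (int) $m[3]);
    $offset = ((int) $m[8] * 3600 + (int) $m[9] * 60) * ($m[7] === '-' ? -1 : 1);
    return $ts - $offset;
}

echo logtime_to_timestamp('[03/Apr/2007:11:59:44 +0100]'), "\n";
echo gmdate('Y-m-d H:i:s', logtime_to_timestamp('[03/Apr/2007:11:59:44 +0100]')), "\n";
```

Using gmmktime() and applying the offset by hand keeps the result independent of the server’s date.timezone setting; DateTime::createFromFormat() with a format such as 'd/M/Y:H:i:s O' would be an alternative.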
Justin Randel suggested the following patch, which I may get around to merging some time:
--- src/ApacheLogParser.php 2007-05-31 23:03:27.000000000 +1000
+++ ApacheLogParser.php 2009-01-08 20:26:48.000000000 +1100
@@ -137,12 +137,12 @@ class ApacheLogRegex {
 foreach(explode(' ', $this->_format) as $element)
 {
- $quotes = preg_match('/^\\\"/', $element) ? true : false;
+ $quotes = preg_match('/^\\"/', $element) ? true : false;
 if($quotes)
 {
 $element = preg_replace(
- array('/^\\\"/', '/\\\"$/'),
+ array('/^\\"/', '/\\"$/'),
 '',
 $element
 );
@@ -333,7 +333,7 @@ class ApacheLogRegex {
 .':([\d]{2}):([\d]{2}) ([\+\-])([\d]{2})([\d]{2})\]/';
 $m = array(); //matches
- if(!preg_match($time_format, $time, $matches)
+ if(!preg_match($time_format, $time, $m)
 || count($m) != 10)
 return null;
@@ -347,5 +347,3 @@ class ApacheLogRegex {
 } // end class ApacheLogRegex
-
-?>
\ No newline at end of file

Of the changes suggested above, I agree with the last one (the $m fix). There is clearly a mistake there that stops the timestamps from being parsed. About the other two, however, I am not so sure. In both cases the code is matching the quotes on the element string, e.g.:
\"%{User-Agent}i\"
But the quotes must be escaped in the httpd.conf, so we must match \". The \ (backslash) is the escape character for both PHP strings and regular expressions, so it must be escaped twice. Therefore I think the correct value here is '/^\\\\"/', with four backslashes.
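Printing the literals makes the escaping concrete. In a PHP single-quoted string only \\ and \' are escape sequences, so the original '/^\\\"/' (three backslashes) and '/^\\\\"/' (four) actually produce the identical pattern /^\\"/, which matches a literal \" at the start of the element; the patched '/^\\"/' produces /^\"/, which matches only a bare quote. The sample $element assumes the format string reaches the class with its httpd.conf backslashes intact.

```php
<?php
// In a PHP single-quoted string only \\ and \' are escape sequences;
// a backslash before any other character (here ") is kept as-is.
$three = '/^\\\"/';  // original class code (the "-" lines of the patch)
$four  = '/^\\\\"/'; // four backslashes, as suggested above
$two   = '/^\\"/';   // the patched version (the "+" lines)

echo $three, "\n"; // prints /^\\"/
echo $four, "\n";  // prints /^\\"/
echo $two, "\n";   // prints /^\"/

// Assumed input: an element from a format string that keeps the
// httpd.conf escaping, e.g. '... \"%r\" ... \"%{User-Agent}i\"'.
$element = '\"%{User-Agent}i\"';

var_dump(preg_match($four, $element)); // int(1): literal backslash + quote
var_dump(preg_match($two, $element));  // int(0): expects a bare " first

// Stripping the escaped quotes, as the class does once $quotes is true.
echo preg_replace(array('/^\\\\"/', '/\\\\"$/'), '', $element), "\n"; // %{User-Agent}i
```

Writing four backslashes is the unambiguous spelling; three only works because PHP leaves the unrecognised \" sequence untouched.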