Recent changes to bugs

junky hostnames

2005-02-15T22:29:54Z

The parser allows for symbolic hostnames, such as
www.example.com, but it gets confused if the
hostname has a dash in it.

www.ugly-name.com
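
How parse.py matches hostnames is not shown in this report, so the
following is only a sketch of a hostname pattern that accepts dashes
inside labels. The names LABEL, HOSTNAME_RE and is_hostname are made up
for illustration, and the code is written for current Python rather than
the Python 2 that scratchy targeted.

```python
import re

# One label: letters and digits, with hyphens allowed only in the interior.
LABEL = r'[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?'

# A hostname: one or more labels separated by dots.
HOSTNAME_RE = re.compile(LABEL + r'(?:\.' + LABEL + r')*')

def is_hostname(text):
    """Return True if the whole string looks like a symbolic hostname."""
    return HOSTNAME_RE.fullmatch(text) is not None

print(is_hostname('www.example.com'))
print(is_hostname('www.ugly-name.com'))
```

A pattern built only from \w and dots would stop at the dash, which
might be what is happening here.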

parsing misses authenticated users

2005-02-15T22:24:43Z

Apache uses "-" for a missing value.

parse.py's combined_format_re assumes that the
second and third fields will be missing (it looks for
"-(?P<unknown>.*?)-").

The second and third fields are for user id; in our log
files, the third field is filled in for most webDAV
requests, as these require authorization.

_testa='0.0.0.0 - first.m.last@example.com [19/Apr/2002:12:22:02 -0400] "MKCOL / HTTP/1.1" 200 0'

# Yes, we had entries with spaces in them.
_testb='0.0.0.0 - first last [19/Apr/2002:12:22:02 -0400] "DELETE / HTTP/1.1" 200 0'

# We even had a few that were entered as "",
# though I hope that was a config mistake.

I ended up using this regex for the user field:
'(?P<user2>(-)|([a-zA-Z0-9.@ _-]+)|"")'

Other characters could also appear in this field, but
none did in our logs.

Exception in parse.py with long lines

2004-03-29T19:53:57Z

Occasionally I get lines in my access_log that look like:
ip - - [27/Mar/2004:10:32:49 -0500] "SEARCH
/~P^B�^B�^B..." 414 360 "-" "-"

With ... being a lot of the same characters (a little
over 8000). This causes an exception (recursion limit
reached) in process_log when it is trying to parse the
line. I fixed this by modifying the search line as
follows:
try:
    m = self.log_format_compiled.search(line)
except Exception, e:
    print "Unparsed line ", line
    print e
    continue
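
The same idea as a standalone sketch in current Python syntax. The
search_lines helper and its max_len cutoff are assumptions added for
illustration (the report only says the junk was a little over 8000
characters); they are not part of parse.py. Current versions of the re
module should not hit a recursion limit on input like this, so the
length check is the part doing the work here.

```python
import re

def search_lines(lines, pattern, max_len=8192):
    """Split lines into (matched, skipped) without letting one bad line abort the run."""
    matched, skipped = [], []
    for line in lines:
        # Assumption: anything longer than max_len is junk, like the
        # oversized SEARCH request, so skip it before the regex runs.
        if len(line) > max_len:
            skipped.append(line)
            continue
        try:
            m = pattern.search(line)
        except RuntimeError:
            # Old versions of re raised RuntimeError when the recursion
            # limit was reached; RecursionError is a subclass of it.
            skipped.append(line)
            continue
        if m is None:
            skipped.append(line)
        else:
            matched.append(m)
    return matched, skipped

request_re = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)')
good = 'ip - - [27/Mar/2004:10:32:49 -0500] "GET /index.html HTTP/1.0" 200 360'
junk = 'ip - - [27/Mar/2004:10:32:49 -0500] "SEARCH /' + 'A' * 9000 + '" 414 360'
matched, skipped = search_lines([good, junk], request_re)
print(len(matched), len(skipped))
```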

parse.py fails with "unsubscriptable object"

2003-12-03T14:42:33Z

Hi all,
after some months of daily use, scratchy started
to fail with:
---------------------------
>> Parsing log: /var/log/apache/access.log
>> Reading: data/www.jesta.net/122003
Could not read file: data/www.jesta.net/122003
Traceback (most recent call last):
  File "parse.py", line 642, in ?
    log.process_log()
  File "parse.py", line 170, in process_log
    self.__process_useragent(useragent)
  File "parse.py", line 317, in __process_useragent
    self.__increment_hit_count('opsys', _opsys)
  File "parse.py", line 416, in __increment_hit_count
    dict = self.parsed_data[field]
TypeError: unsubscriptable object
---------------------------

I'm not good enough at Python to debug it, but if
someone can point me to a way to do it, I'll try it....
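
One possible reading of the output above, offered as a guess only: the
"Could not read file" message comes right before self.parsed_data[field]
fails, which would fit parsed_data being left as None after the failed
read. The source of parse.py is not shown here, so the function below is
a hypothetical stand-in for __increment_hit_count, written for current
Python, that tolerates a missing data structure.

```python
def increment_hit_count(parsed_data, field, key, amount=1):
    """Add amount to parsed_data[field][key], creating entries as needed."""
    if parsed_data is None:
        # Assumption: a failed read left no data structure behind,
        # so start from an empty one instead of subscripting None.
        parsed_data = {}
    counts = parsed_data.setdefault(field, {})
    counts[key] = counts.get(key, 0) + amount
    return parsed_data

data = increment_hit_count(None, 'opsys', 'Linux')
data = increment_hit_count(data, 'opsys', 'Linux')
print(data)
```

A quick check in the real code would be to print the type of
self.parsed_data just before line 416.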

Thanks
Jindra
<jindra(at) jesta[dot] net>

No reports generated

2003-07-24T20:36:34Z

Hi,
I downloaded 0.6.9. After successful execution of
parse.py (no errors printed), no reports were
generated. It seems to me that report.py failed to run
somehow, since the reports directory was not created
(should it have been?).
I tried to run report.py manually, but it complains
about not finding the data/mysite/072003 file.

Python version is 2.2.1.

I also tried 0.6.8, but no success.

tapsa

Report.py hangs with gdchart when MAX_FILE_TYPE is 0

2003-07-06T21:20:49Z

Hi,
scratchy 0.6.2 bug.
Line 161 in report.py hangs if gdchart is installed
and the MAX_FILE_TYPE list is set to be unlimited
(0). This is due to a list object of length 0. With
charts enabled, a 0 value is dangerous at chart
creation. This holds for any chart; I just happened to
set MAX_FILE_TYPE to 0.
Bye,
Attila
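
A sketch of one way to guard against this, assuming (the report does not
show line 161) that the limit is applied as a slice, so that a limit of 0
produces an empty list. The limit_items and chartable helpers are
hypothetical names, and the code is written for current Python.

```python
def limit_items(items, max_items):
    """Return at most max_items entries; 0 (or less) means no limit."""
    items = list(items)
    if max_items <= 0:
        return items
    return items[:max_items]

def chartable(items, max_items):
    """Return the data to chart, or None when there is nothing to draw."""
    selected = limit_items(items, max_items)
    return selected or None

file_types = [('html', 120), ('gif', 80), ('css', 10)]
print(chartable(file_types, 0))
print(chartable(file_types, 2))
print(chartable([], 0))
```

The caller would then skip chart creation whenever chartable() returns
None instead of handing an empty list to gdchart.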

Incorrect file type

2003-07-06T19:00:16Z

Hi,
I have found this line in my scratchy report, under
filetypes:
com/ Unknown 678 2.05 3038

This entry is parsed incorrectly. My guess is that
such entries are created when access.log contains a
line like:
GET http://a1452.g.akamaitech.net/f/1452/2731/24h/cache.xerox.com/images/world/s/spacer.gif

or something like this.
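
For a request like that, a sketch of pulling the extension from the last
path segment only, so that dots in the host name or in earlier path
segments are ignored. This is not how scratchy does it (that code is not
shown here); file_extension is a hypothetical helper built on the
standard library of current Python.

```python
import posixpath
from urllib.parse import urlsplit

def file_extension(request_target):
    """Return the lowercased extension of the requested file, or ''."""
    # urlsplit handles both plain paths and absolute (proxy-style) URLs
    # and drops any query string from the path.
    path = urlsplit(request_target).path
    name = posixpath.basename(path)
    return posixpath.splitext(name)[1].lower()

url = 'http://a1452.g.akamaitech.net/f/1452/2731/24h/cache.xerox.com/images/world/s/spacer.gif'
print(file_extension(url))
print(file_extension('/images/logo.PNG?x=1'))
```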
Some file types that could be included:
.zip, .tar.gz, .rar, .bz2, etc. - Compressed,
.wav - Sound,
.avi, .mpg, .mpeg - Movies.
Also note that there is a typo in filetypes.py (it should
be .bmp, not .bmg).
PS: I have uploaded a modified filetypes.py with the
above options.
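
Roughly what such a table could look like. The layout of the real
filetypes.py is not shown in this report, so FILE_TYPES and classify are
hypothetical; only the extensions and category names come from the list
above. Written for current Python.

```python
# Hypothetical table; the real filetypes.py may be organized differently.
FILE_TYPES = {
    '.zip': 'Compressed',
    '.tar.gz': 'Compressed',
    '.rar': 'Compressed',
    '.bz2': 'Compressed',
    '.wav': 'Sound',
    '.avi': 'Movies',
    '.mpg': 'Movies',
    '.mpeg': 'Movies',
}

def classify(filename):
    """Return the category for a file name, or 'Unknown'."""
    name = filename.lower()
    # Try longer suffixes first so 'a.tar.gz' matches '.tar.gz'.
    for suffix in sorted(FILE_TYPES, key=len, reverse=True):
        if name.endswith(suffix):
            return FILE_TYPES[suffix]
    return 'Unknown'

print(classify('backup.tar.gz'))
print(classify('intro.MPEG'))
print(classify('spacer.gif'))
```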

Attila

Country graph + table

2003-07-04T14:41:12Z

When there are many countries in the report, they don't
fit well on the screen. It would be OK if they were
laid out as in "Daily Log": the graph on top, with the
table under it.

User agent (Siets)

2003-07-04T11:03:40Z

Could not recognize useragent: SietsCrawler/0.1
Could not recognize useragent: SietsCrawler/0.1
Could not recognize useragent: SietsCrawler/0.1
Could not recognize useragent: SietsCrawler/0.1
Could not recognize useragent: SietsCrawler/0.1
Could not recognize useragent: SietsCrawler/0.1

I have a lot of these in my logs, so maybe you could add
this robot, too.
It's a Latvian search engine robot (Siets -
http://www.siets.lv).

Thanx!
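
Until the robot is added upstream, something along these lines might
work locally. How scratchy stores its robot list is not shown here, so
KNOWN_ROBOTS and robot_name are hypothetical; the only detail taken from
the log output is the SietsCrawler token. Written for current Python.

```python
# Hypothetical robot table; scratchy's real list and format are not shown here.
KNOWN_ROBOTS = {
    'sietscrawler': 'Siets',
}

def robot_name(useragent):
    """Return the robot name if the user agent contains a known token, else None."""
    ua = useragent.lower()
    for token, name in KNOWN_ROBOTS.items():
        if token in ua:
            return name
    return None

print(robot_name('SietsCrawler/0.1'))
```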

Completion time is 0

2003-07-04T11:01:08Z

Writing: /home/nix/stats/nix/052003
>> Creating hourly chart
>> Creating daily chart
>> Creating day_of_week chart
>> Creating Operating Systems chart
>> Creating File Types chart
>> Creating Browsers chart
>> creating file: /home/nix/public_html/stats/nix/052003/nix.css
Creating report for: 5/2003
>> creating file: /home/nix/public_html/stats/nix/052003/index.html
>> creating summary
>> creating file: /home/nix/public_html/stats/nix/nix.css
>> Creating summary chart

Parsed lines : 601
Completion time (seconds): 0
Traceback (most recent call last):
  File "./parse.py", line 614, in ?
    log.process_log()
  File "./parse.py", line 222, in process_log
    print "Lines per second : %d" % (total_lines / nsecs)
ZeroDivisionError: integer division or modulo by zero
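
The division fails because the run finished in under a second, so the
elapsed time came out as 0. A sketch of a guard, assuming the elapsed
time is a plain number of seconds; lines_per_second is a hypothetical
helper, not the code at line 222 of parse.py, and it is written for
current Python.

```python
import time

def lines_per_second(total_lines, elapsed_seconds):
    """Return the parse rate, or None when no measurable time has passed."""
    if elapsed_seconds <= 0:
        return None
    return total_lines / elapsed_seconds

start = time.time()
# ... parsing would happen here ...
elapsed = time.time() - start

rate = lines_per_second(601, elapsed)
if rate is None:
    print("Lines per second : n/a (run took no measurable time)")
else:
    print("Lines per second : %d" % rate)
```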