You might also try this filter script I wrote to clean up the logs before passing them to Calamaris. It removes some characters that cause Calamaris problems.
#!/usr/local/bin/gawk -f
{
    if ( $0 ~ /[^\x20-\x7E]/ ) {
        gsub( /\x20\x0C/, "" )
        gsub( /[\x00-\x1F]/, "" )
        gsub( /[\x7F-\xFF]/, "" )
    }

    if ( $5 < 0 ) { $5 *= -1 }

    if ( $2 !~ /[^0-9]/ && $5 !~ /[^0-9]/ && ( $11 == "ALLOW" || $11 == "DENY" ) ) {
        print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10
    } else {
        next
    }
}
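To see which log lines the first test in the script would touch before filtering anything, a small shell helper along these lines could be used. This is only a sketch: find_bad_lines is a made-up name, and it assumes an awk that matches bytes when run with LC_ALL=C. It prints the line number of every line containing a byte outside printable ASCII (0x20-0x7E), so tabs are reported as well.

```shell
# find_bad_lines: hypothetical helper, not part of Calamaris or Squid.
# Prints the line numbers of input lines that contain any byte outside
# the printable ASCII range (space through tilde).
# Usage: find_bad_lines test.log
find_bad_lines() {
    # LC_ALL=C makes awk treat the input as raw bytes, so a stray 0xF3
    # is matched as a single byte instead of a broken UTF-8 sequence.
    LC_ALL=C awk '/[^ -~]/ { print NR }' "$@"
}
```

If the numbers it prints include the line named in the Perl error (369578 in the message below), that line is a likely culprit.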
--
Kirk Schneider                  972-952-4645 (work)
Raytheon Corporate IT Security  214-912-8679 (cell)
[EMAIL PROTECTED]               888-431-7621 (pager)
"If you think the problem is bad now just wait until we've solved it."
-------- Original Message --------
Subject: [squid-users] Calamaris
Date: Mon, 1 Mar 2004 17:43:52 +0100
From: Endre Szekely-Bencedi <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Hello List,
I have a problem with Calamaris (v2.58).
I am using Squid 2.5.STABLE3, compiled from source, with the SmartFilter plugin. As far as I know, I have to use the squid-extended input type for this, but it produces some errors:
[EMAIL PROTECTED] logs]# date;cat test.log | /usr/local/squid/bin/calamaris -f squid-extended -F html > /var/www/html/calamaris2.html;date
Mon Mar 1 17:44:08 CET 2004
Malformed UTF-8 character (unexpected non-continuation byte 0x31, immediately after start byte 0xf3) in split at (eval 1) line 20, <> line 369578.
Malformed UTF-8 character (unexpected non-continuation byte 0x31, immediately after start byte 0xf3) in split at (eval 1) line 20, <> line 369578.
Split loop at (eval 1) line 20, <> line 369578.
Mon Mar 1 17:48:05 CET 2004
[EMAIL PROTECTED] logs]#
The generated HTML file contains:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1"></HEAD>
<BODY></BODY></HTML>
Which is an empty page.
A sample from the logfile:
1077780471.441 93 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.466 64 3.227.65.74 TCP_MISS/200 1722 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.479 72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.508 59 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.699 73 3.227.65.74 TCP_MISS/200 1585 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.713 83 3.227.65.74 TCP_MISS/200 1607 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.726 86 3.227.65.74 TCP_MISS/200 1589 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.885 256 3.227.65.74 TCP_MISS/200 726 GET http://as.fotexnet.hu/adserver.ads/153/0///937480 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.212 229 3.227.65.74 TCP_MISS/200 23713 GET http://index.hu/ad/lipton/banner1_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.298 72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.388 279 3.227.65.74 TCP_MISS/200 17697 GET http://index.hu/ad/microsoft_wss.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.439 106 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.458 47 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.480 368 3.227.65.74 TCP_MISS/200 4292 GET http://as.fotexnet.hu/adserver.ads/196/0///27236 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.643 162 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.646 144 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.673 487 3.227.65.74 TCP_MISS/200 10319 GET http://as.fotexnet.hu/adserver.ads/200/0///378158 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.799 280 3.227.65.74 TCP_MISS/200 26216 GET http://index.hu/ad/teluzoallo_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.819 122 3.227.65.74 TCP_MISS/200 216 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.824 124 3.227.65.74 TCP_MISS/200 355 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.842 136 3.227.65.74 TCP_MISS/200 1603 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.846 47 3.227.65.74 TCP_MISS/200 353 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
Am I doing something wrong?
Thanks, Endre.