Here's something similar to what you're already doing, except that it compares
the log against a file of "badwords" to look for in the URLs and then emails you
the results.

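#!/bin/sh
# filter.sh
# Grep the Squid log for lines matching any pattern in "badwords",
# format the hits with wordfilter.gawk, and mail the resulting report.
#
cd /path/to/filterscript || exit 1
grep -i -f /path/to/filterscript/badwords /var/log/squid/access.log > hits.out

/path/to/filterscript/wordfilter.gawk hits.out

/bin/mail -s "URL Filter Report" [EMAIL PROTECTED] < /path/to/filterscript/word-report

rm -f hits.out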
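If the hit list is still too long to read line by line, one possible addition (a sketch, not part of the script above) is to collapse hits.out into a count per host, so you review a few hundred domains instead of thousands of lines. It assumes the lines are in Squid's native log format with the URL in field 7; `top_hosts` and `sample_log` are hypothetical helper names and the sample lines are made up.

```sh
# top_hosts (hypothetical helper): read native-format log lines on stdin
# and print "count host" pairs, most frequent host first.
# Assumes field 7 holds the URL, as in Squid's native log format.
top_hosts() {
    awk '{
        host = $7
        sub(/^[A-Za-z]+:\/\//, "", host)   # strip a scheme such as http://
        sub(/\/.*/, "", host)              # drop the path
        sub(/:.*/, "", host)               # drop a :port suffix (e.g. CONNECT)
        print tolower(host)
    }' | sort | uniq -c | sort -rn
}

# Made-up sample lines (not real log data).
sample_log() {
    cat <<'EOF'
1213205520.123 245 10.0.0.15 TCP_MISS/200 5120 GET http://a.example.com/x jdoe DIRECT/192.0.2.1 text/html
1213205521.456 110 10.0.0.15 TCP_MISS/200 880 GET http://a.example.com/y jdoe DIRECT/192.0.2.1 image/gif
1213205522.789 300 10.0.0.16 TCP_MISS/200 2048 GET http://b.example.org/ asmith DIRECT/192.0.2.2 text/html
EOF
}

sample_log | top_hosts
```

In filter.sh this would run as `top_hosts < hits.out` before hits.out is removed; the summary could be appended to the mailed report.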


#!/bin/gawk -f
# wordfilter.gawk
# Writes one report entry per Squid native-format log line:
#   $1 = timestamp (epoch seconds), $7 = URL, $8 = username ($3 = client IP)

BEGIN {
    report = "/path/to/filterscript/word-report"
    # awk truncates the file once, when it is first opened;
    # later prints to the same name append while it stays open.
    print "URL Filter Report:" > report
    print "--------------------------------------" > report
    sp = " -> "
}

{
    print strftime("%m-%d-%Y %H:%M:%S", $1), sp, $8 > report
    print $7 > report
    print "" > report
}
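To see which column to print, a small sketch like the one below pulls the same fields out of a single log line. It assumes Squid's native access.log layout (time, elapsed, client IP, code/status, bytes, method, URL, username, hierarchy/peer, content type); `show_fields` is a hypothetical helper and the sample line is invented.

```sh
# Assumed Squid native access.log layout (one request per line):
#   $1 time    $2 elapsed  $3 client IP  $4 code/status    $5 bytes
#   $6 method  $7 URL      $8 username   $9 hierarchy/peer $10 content type

# show_fields (hypothetical helper): print time, user-or-IP, and URL for
# each log line on stdin; the argument selects the identifying field.
show_fields() {
    awk -v who="$1" '{ print $1, "->", $who, $7 }'
}

# Made-up sample line in native format (not real log data).
sample='1213205520.123    245 10.0.0.15 TCP_MISS/200 5120 GET http://example.com/page.html jdoe DIRECT/192.0.2.1 text/html'

printf '%s\n' "$sample" | show_fields 8   # identify by username
printf '%s\n' "$sample" | show_fields 3   # identify by client IP
```

Switching the report from usernames to IPs then amounts to replacing `$8` with `$3` in the first print statement of the gawk script.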



You may need to adjust the columns printed in the awk script. They're set to
report usernames ($8) rather than IPs; in Squid's native log format the client
IP is $3. Also, you'll need to create a "/path/to/filterscript/badwords" file
with the words/regexes you want to search for, one per line. Someone with better
regex skills could probably eliminate a lot of the false hits with more specific
patterns in the "badwords" file. I'm using this in addition to squidGuard and
blacklists to catch URLs that were missed, so the output isn't nearly as large
as what you're getting.
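As an illustration of how more specific patterns cut false hits, the sketch below compares a bare keyword with one that must not be flanked by letters, so "essex" and "sussex" stop matching. It assumes GNU or POSIX grep with `-E` (extended regexes) instead of the basic regexes `grep -f` uses by default; `count_hits` is a hypothetical helper and the URLs are made up.

```sh
# count_hits (hypothetical helper): count stdin lines that match any
# extended regex in the pattern file given as $1, ignoring case.
count_hits() {
    LC_ALL=C grep -i -E -c -f "$1"
}

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Made-up sample URLs (not real log data).
cat > "$tmp/urls" <<'EOF'
http://www.essex.gov.uk/schools
http://www.sussex.ac.uk/
http://sex.example.com/
http://example.com/free-sex/video
EOF

# Naive list: the bare word matches anywhere, including inside "essex".
printf '%s\n' 'sex' > "$tmp/naive"
# Stricter list: the word must not have a letter on either side.
printf '%s\n' '(^|[^[:alpha:]])sex([^[:alpha:]]|$)' > "$tmp/strict"

echo "naive:  $(count_hits "$tmp/naive" < "$tmp/urls")"    # 4 matches
echo "strict: $(count_hits "$tmp/strict" < "$tmp/urls")"   # 2 matches
```

The trade-off is that the stricter pattern also skips words glued to other letters (plurals, compounds), so those variants need their own lines in the badwords file.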

Rob



-------------------------------------
Rob Asher
Network Systems Technician
Paragould School District
(870)236-7744 Ext. 169


>>> "Steven Engebretson" <[EMAIL PROTECTED]> 6/11/2008 1:32 PM >>>
I am looking for a tool that will scan the access.log file for pornographic
sites and report the specifics back. We do not block access to any Internet
sites, but need to monitor for objectionable content.

What I am doing now is just grepping for some keywords and dumping the output
into a file. After the grep, I am manually going through about 60,000 lines of
log output, and 99% of them are false positives. Any help would be appreciated.

Thank you all.


-Steven E.


---------- 

This message has been scanned for viruses and
dangerous content by the Paragould School District
MailScanner, and is believed to be clean.
