Hi,
I am trying to develop a web-based tool to track page hit counts, user
session activity, and so on for our own sites.

I have run into some problems:

1) How can I tell whether a request targets a page or a resource?

For example, take the following three log entries (some parts removed):

#1-> [17/Sep/2010:11:38:26 +0800] "POST /test.jsp?name=test HTTP/1.1" 200 "test.jsp"
#2-> [17/Sep/2010:11:40:11 +0800] "POST /example/test.jpg HTTP/1.1" 200 "/example/test.jpg"
#3-> [17/Sep/2010:11:44:26 +0800] "POST /example/testServlet HTTP/1.1" 200 "test.jsp"

The pattern used for the above logs is: '%t "%r" %s "%U"'.

Log #1 shows a page request with a parameter; it can be used to calculate
the most frequently visited pages.

Log #2 shows a request for a resource (an image here); it can be used to
calculate the most frequently visited files.

Log #3 shows a request with no file extension (it is a servlet), but in
fact it is a page.

That is to say, they are different request types, so how can I distinguish
them in my code?
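
One possible approach, sketched below, is to classify by the file extension
of the request path: anything whose extension is in a known list of static
types counts as a resource, and everything else (including paths with no
extension, such as servlet mappings) counts as a page. The class name
RequestClassifier and the extension list are assumptions made for this
illustration, not something Tomcat provides.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class RequestClassifier {
    enum Kind { PAGE, RESOURCE }

    // Assumed list of static-resource extensions; adjust it to match the site.
    private static final Set<String> RESOURCE_EXTENSIONS = new HashSet<>(
            Arrays.asList("jpg", "jpeg", "png", "gif", "css", "js", "ico"));

    static Kind classify(String requestUri) {
        String path = requestUri;
        // Drop the query string, e.g. "?name=test".
        int q = path.indexOf('?');
        if (q >= 0) path = path.substring(0, q);
        // Drop a path parameter such as ";jsessionid=...".
        int semi = path.indexOf(';');
        if (semi >= 0) path = path.substring(0, semi);

        // Only the last path segment can carry the file extension.
        String last = path.substring(path.lastIndexOf('/') + 1);
        int dot = last.lastIndexOf('.');
        if (dot < 0) {
            return Kind.PAGE; // no extension, e.g. a servlet mapping
        }
        String ext = last.substring(dot + 1).toLowerCase(Locale.ROOT);
        return RESOURCE_EXTENSIONS.contains(ext) ? Kind.RESOURCE : Kind.PAGE;
    }

    public static void main(String[] args) {
        System.out.println(classify("/test.jsp?name=test"));
        System.out.println(classify("/example/test.jpg"));
        System.out.println(classify("/example/testServlet"));
    }
}
```

This treats the extension as a heuristic only: a servlet mapped to a URL
ending in ".jpg" would be misclassified, so the list may need to be checked
against the mappings in web.xml.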

2) Log parser.
I can read the log file line by line, but how do I extract the value of
each attribute?
They are all on one line. Should I split them with the String.split()
method? But what if a value itself contains the separator?

For example, if I use split(" ") on log #2, the value "POST
/example/test.jpg HTTP/1.1" is split as well, and this may also be
inefficient. Is there a tool that makes this easier?
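
A minimal sketch of one alternative to split(" "): match the whole line
with a regular expression that has one capturing group per field of the
pattern '%t "%r" %s "%U"', so that spaces inside the brackets and quotes
stay within their field. The class name AccessLogParser and the returned
array layout are assumptions for this illustration, and the expression
assumes that quoted values contain no double quote themselves.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogParser {
    // One capturing group per field of the pattern '%t "%r" %s "%U"'.
    private static final Pattern LINE = Pattern.compile(
            "^\\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) \"([^\"]*)\"$");

    /**
     * Returns {timestamp, method, uri, protocol, status, path},
     * or null if the line does not match the expected layout.
     */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        // %r is "METHOD URI PROTOCOL"; split only this field on spaces.
        String[] request = m.group(2).split(" ", 3);
        if (request.length != 3) {
            return null;
        }
        return new String[] {
            m.group(1), request[0], request[1], request[2], m.group(3), m.group(4)
        };
    }

    public static void main(String[] args) {
        String line = "[17/Sep/2010:11:38:26 +0800] "
                + "\"POST /test.jsp?name=test HTTP/1.1\" 200 \"test.jsp\"";
        String[] fields = parse(line);
        System.out.println(fields == null ? "no match" : String.join(" | ", fields));
    }
}
```

Because the pattern is compiled once and reused for every line, the cost
per line is a single match rather than a split followed by reassembly.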
