Hi, I am developing a web-based tool to track page hit counts, user session activity, etc. for our own sites.
I have run into two problems:

1) How can I tell whether a request targets a page or a resource? For example, take the following three log entries (some fields removed):

#1 -> [17/Sep/2010:11:38:26 +0800] "POST /test.jsp?name=test HTTP/1.1" 200 "test.jsp"
#2 -> [17/Sep/2010:11:40:11 +0800] "POST /example/test.jpg HTTP/1.1" 200 "/example/test.jpg"
#3 -> [17/Sep/2010:11:44:26 +0800] "POST /example/testServlet HTTP/1.1" 200 "test.jsp"

The pattern used for the log above is '%t "%r" %s "%U"'.

Log #1 shows a page request with a parameter; it can be used to calculate the most frequently visited pages. Log #2 shows a resource request (an image here); it can be used to calculate the most frequently requested files. Log #3 shows a request for a path with no file extension (it is a servlet), but in fact it is a page. In other words, these are different request types, so how do I distinguish them in my code?

2) Log parsing. I can read the log file line by line, but how do I extract the value of each attribute when they are all on one line? Should I split them with String.split()? What if a value itself contains the separator? For example, if I use split(" ") on log #1, the quoted value "POST /test.jsp?name=test HTTP/1.1" is split as well, and this may be inefficient. Is there a tool that makes this easier?
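A rough sketch of one way to handle both questions, assuming the server writes lines exactly in the '%t "%r" %s "%U"' format shown above. A single precompiled regular expression (java.util.regex) captures every field, including the quoted request line, so spaces inside quotes are not a problem and each line is matched in one pass. An extension-based heuristic then treats known static file types as resources and everything else (JSPs and extension-less servlet mappings such as /example/testServlet) as pages. The class name AccessLogParser, the Entry type and the RESOURCE_EXTENSIONS list are hypothetical and would need adjusting for your sites.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogParser {

    // Matches lines written with the pattern '%t "%r" %s "%U"':
    //   [timestamp] "METHOD URI PROTOCOL" status "url-path"
    // Quoted fields are captured as a whole, so their inner spaces are kept.
    private static final Pattern LINE = Pattern.compile(
            "^\\[([^\\]]+)\\] \"(\\S+) (\\S+) ([^\"]*)\" (\\d{3}) \"([^\"]*)\"$");

    // Hypothetical list of extensions treated as static resources.
    private static final Set<String> RESOURCE_EXTENSIONS = new HashSet<>(Arrays.asList(
            "jpg", "jpeg", "png", "gif", "css", "js", "ico", "pdf", "zip"));

    /** One parsed log line. */
    public static final class Entry {
        public final String time, method, uri, protocol, urlPath;
        public final int status;

        Entry(String time, String method, String uri, String protocol,
              int status, String urlPath) {
            this.time = time;
            this.method = method;
            this.uri = uri;
            this.protocol = protocol;
            this.status = status;
            this.urlPath = urlPath;
        }
    }

    /** Returns the parsed entry, or null if the line is not in the expected format. */
    public static Entry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new Entry(m.group(1), m.group(2), m.group(3), m.group(4),
                Integer.parseInt(m.group(5)), m.group(6));
    }

    /**
     * Heuristic: a URI whose last path segment ends in a known static extension
     * is a resource; anything else (JSPs, servlets without an extension) is a page.
     */
    public static boolean isResource(String uri) {
        String path = uri;
        int q = path.indexOf('?');
        if (q >= 0) {
            path = path.substring(0, q); // drop the query string
        }
        String last = path.substring(path.lastIndexOf('/') + 1);
        int dot = last.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension, e.g. /example/testServlet
        }
        String ext = last.substring(dot + 1).toLowerCase(Locale.ROOT);
        return RESOURCE_EXTENSIONS.contains(ext);
    }

    public static void main(String[] args) {
        String[] lines = {
            "[17/Sep/2010:11:38:26 +0800] \"POST /test.jsp?name=test HTTP/1.1\" 200 \"test.jsp\"",
            "[17/Sep/2010:11:40:11 +0800] \"POST /example/test.jpg HTTP/1.1\" 200 \"/example/test.jpg\"",
            "[17/Sep/2010:11:44:26 +0800] \"POST /example/testServlet HTTP/1.1\" 200 \"test.jsp\""
        };
        for (String line : lines) {
            Entry e = parse(line);
            if (e == null) {
                System.out.println("unparsed: " + line);
                continue;
            }
            // Classify on the URI from the request line (%r).
            System.out.println(e.uri + " -> " + (isResource(e.uri) ? "resource" : "page"));
        }
    }
}
```

Running main prints "/test.jsp?name=test -> page", "/example/test.jpg -> resource" and "/example/testServlet -> page". The extension list is only a heuristic: a servlet that streams an image, for instance, would still be counted as a page. If your server's log format can also record the response Content-Type, classifying on that would probably be more reliable than guessing from the URL.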