I'm analyzing a daily Apache log file. I'd like to get the number of
requests and the number of visits (distinct clients) per hour.

I managed to get the requests, but how do I get the visits?

grunt> RAW_LOGS = LOAD '<log-file>' USING TextLoader() AS (line:chararray);
grunt> LOGS_BASE = FOREACH RAW_LOGS GENERATE
  FLATTEN(
    REGEX_EXTRACT_ALL(line, '(\\S+) (\\S+) \\[(\\d{2}/\\w{3}/\\d{4})\\:(\\d{2})\\:(\\d{2})\\:(\\d{2}) (\\+\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)" (\\S+) (\\S+)')
  ) AS (
    client:   chararray,
    username: chararray,
    date: chararray,
    hour: chararray,
    minute: chararray,
    second: chararray,
    timeZone: chararray,
    request:  chararray,
    statusCode: int,
    bytesSent: chararray,
    referer:  chararray,
    userAgent: chararray,
    remoteUser: chararray,
    timeTaken: chararray
);
grunt> A = GROUP LOGS_BASE BY hour;
grunt> DESCRIBE A;
A: {group: chararray,LOGS_BASE: {(client: chararray,username:
chararray,date: chararray,hour: chararray,minute: chararray,second:
chararray,timeZone: chararray,request: chararray,statusCode: int,bytesSent:
chararray,referer: chararray,userAgent: chararray,remoteUser:
chararray,timeTaken: chararray)}}
grunt> B = FOREACH A GENERATE group AS hour, COUNT( $1 );
grunt> C = ORDER B BY hour; -- requests by hour

How can I now get the distinct count of clients per hour?
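
One possible direction is Pig's nested FOREACH with DISTINCT. A minimal,
untested sketch, assuming that a "visit" means one distinct client address
per hour and that LOGS_BASE is defined as above (the aliases BY_HOUR, VISITS
and VISITS_SORTED are made up for illustration):

```pig
-- Sketch only: assumes LOGS_BASE from the script above, and that one
-- distinct client address within an hour counts as one visit.
BY_HOUR = GROUP LOGS_BASE BY hour;
VISITS = FOREACH BY_HOUR {
    -- de-duplicate the client column inside each hourly group
    clients = DISTINCT LOGS_BASE.client;
    GENERATE group AS hour, COUNT(clients) AS visits;
};
VISITS_SORTED = ORDER VISITS BY hour; -- visits by hour
```

If something like this works, it counts each client once per hour, so it
would not tell apart several separate sessions from the same address.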

Thanks for your help!

-- 
David Riccitelli

********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network: http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
********************************************************************************