You can group by multiple keys, so perhaps:

prod_grouped = group prod by (sid, ip);
prod_hits = foreach prod_grouped generate FLATTEN(group) as (sid, ip),
    COUNT($1) as prod_hit_count;

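Note that grouping by (sid, ip) gives a hit count per (sid, ip) pair, not yet one number per sid. If the goal is the number of unique (ip, session) visitors per sid, one possible approach is to drop duplicate (sid, ip, session) rows first and then count per sid. This is an untested sketch that assumes the prod relation from the original script; the names visits, uniq_visits, by_sid, unique_hits and unique_visit_count are made up for illustration.

```pig
-- Keep only the fields that define a visit (assumes prod has sid, ip, session).
visits = foreach prod generate sid, ip, session;

-- Collapse repeated hits from the same ip and session on the same sid into one row.
uniq_visits = distinct visits;

-- Count the remaining rows per sid; each row is one unique visit.
by_sid = group uniq_visits by sid;
unique_hits = foreach by_sid generate group as sid,
    COUNT(uniq_visits) as unique_visit_count;

dump unique_hits;
```

The same count could probably also be computed with a nested distinct inside a foreach block over the relation grouped by sid, which avoids the separate distinct pass; which form performs better likely depends on how many rows each sid has.
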
On Sun, Jan 16, 2011 at 5:02 PM, Cam Bazz <[email protected]> wrote:
> Hello,
>
> I have rigged my web application so it generates some sort of custom
> access log. Each line in my access log has the ipnumber,
> sessionCookie, idOfPage.
>
> How can i count unique visits to per idOfPage?
>
> I followed the tutorial to write a script for calculating number of
> visits per idOfPage:
>
> raw = load '/home/cambazz/my.log' using PigStorage('\t');
> rawprod = filter raw by $2=='PROD';
> prod = foreach rawprod generate $0 as time, $3 as ip, $4 as session, $9 as
> sid;
> prod_grouped = group prod by sid;
> prod_hits = foreach prod_grouped generate group, COUNT($1);
> dump prod_hits;
>
> which was easy.
>
> I now want to calculate number of unique visits, where visits from
> same ip,sessionCookie counts as 1 per sid.
>
> I tried various schemes, but could not quite come up with it.
>
> Any ideas / suggestions / help greatly appreciated.
>
>
> Best Regards,
> C.B.
>
