Hello Dmitriy;

I did not know about the FLATTEN operator, so while waiting for a reply
from the mailing list I did this:

RAW = load '/home/can/my.log' using PigStorage('\t');
PROD = foreach RAW generate $3 as ip, $4 as session, $9 as sid;
--
V = group PROD by sid;
UV = group PROD by (sid,session,ip);
--
HITS = foreach V generate group, COUNT($1);
UNQ = group UV by group.sid;
UNQVISITS = foreach UNQ generate group, COUNT($1);

which does seem to work. I was wondering if I did something very wrong
in my ignorance.
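
For comparison, here is a rough sketch of how both counts might be
computed in one pass with a nested DISTINCT instead of two separate
groupings. The aliases by_sid, pairs, visitors and stats are names made
up for illustration; the path and field positions are the ones from the
script above. As far as I understand, COUNT skips tuples whose first
field is null, so lines with an empty ip would not be counted.

```pig
-- Sketch only: same input and field positions as the script above.
raw  = LOAD '/home/can/my.log' USING PigStorage('\t');
prod = FOREACH raw GENERATE $3 AS ip, $4 AS session, $9 AS sid;

-- One bag of hits per page id.
by_sid = GROUP prod BY sid;

-- For each sid: total hits, plus the number of distinct (ip, session) pairs.
stats = FOREACH by_sid {
    pairs    = prod.(ip, session);
    visitors = DISTINCT pairs;
    GENERATE group AS sid, COUNT(prod) AS hits, COUNT(visitors) AS unique_visits;
};

DUMP stats;
```

This should give the same numbers as HITS and UNQVISITS above, but in a
single relation keyed by sid.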

best regards,
-C.B.



On Mon, Jan 17, 2011 at 3:21 AM, Dmitriy Ryaboy <[email protected]> wrote:
> You can group by multiple keys, so perhaps
>
> prod_grouped = group prod by (sid, ip);
> prod_hits = foreach prod_grouped generate FLATTEN(group) as (sid, ip),
> COUNT($1) as prod_hit_count;
>
> On Sun, Jan 16, 2011 at 5:02 PM, Cam Bazz <[email protected]> wrote:
>
>> Hello,
>>
>> I have rigged my web application so it generates some sort of custom
>> access log. Each line in my access log has the ipnumber,
>> sessionCookie, idOfPage.
>>
>> How can I count unique visits per idOfPage?
>>
>> I followed the tutorial to write a script for calculating the number
>> of visits per idOfPage:
>>
>> raw = load '/home/cambazz/my.log' using PigStorage('\t');
>> rawprod = filter raw by $2=='PROD';
>> prod = foreach rawprod generate $0 as time, $3 as ip, $4 as session, $9 as sid;
>> prod_grouped = group prod by sid;
>> prod_hits = foreach prod_grouped generate group, COUNT($1);
>> dump prod_hits;
>>
>> which was easy.
>>
>> I now want to calculate the number of unique visits, where visits
>> from the same ip and sessionCookie count as 1 per sid.
>>
>> I tried various schemes, but could not quite come up with it.
>>
>> Any ideas / suggestions / help greatly appreciated.
>>
>>
>> Best Regards,
>> C.B.
>>
>
