Hi,
 
I need some help calculating running counts.
 
I have data in the following format:
ss_dates.data:
16,2013-11-10,2013-11-10
16,2013-11-10,2013-11-09
16,2013-11-10,2013-11-08
16,2013-11-10,2013-11-07
16,2013-11-10,2013-11-06
16,2013-11-10,2013-11-05
16,2013-11-10,2013-11-04
15,2013-11-03,2013-11-03
15,2013-11-03,2013-11-02
15,2013-11-03,2013-11-01
15,2013-11-03,2013-10-31
15,2013-11-03,2013-10-30
15,2013-11-03,2013-10-29
15,2013-11-03,2013-10-28
14,2013-10-27,2013-10-27
14,2013-10-27,2013-10-26
14,2013-10-27,2013-10-25

qs_data.data:
2013-11-10,abc,def
2013-11-09,abc,rpt
2013-11-06,abc,tre
2013-11-03,abc,yys
2013-10-25,abc,jsh
2013-10-27,abc,jkg

 
Pig Script:
-- Calendar: week number, week-start date, and each date in that week
ss_dts = LOAD 'ss_dates.data' USING PigStorage(',') as (nbr:int, wk_st_dt:chararray, dt:chararray);
-- Records: date, id, client
qs_rec = LOAD 'qs_data.data' USING PigStorage(',') as (dat:chararray, id:chararray, client:chararray);
-- Attach the week-start date to each record by joining on the date
jn_ss_qs = JOIN ss_dts by dt, qs_rec by dat;
gn_jn_ss_qs = FOREACH jn_ss_qs GENERATE
    ss_dts::nbr as nbr,
    ss_dts::wk_st_dt as wk_st_dt,
    ss_dts::dt as dt,
    qs_rec::id as id,
    qs_rec::client as client;
-- Count distinct clients per week
grp_ss_qs = GROUP gn_jn_ss_qs BY (wk_st_dt);
gn_grp_ss_qs = FOREACH grp_ss_qs {
    fil_lt_wsdt = FILTER gn_jn_ss_qs BY dt <= wk_st_dt;
    proj_lt_wsdt = FOREACH fil_lt_wsdt GENERATE client;
    dist_lt_wsdt = DISTINCT proj_lt_wsdt;
    GENERATE group as wk_st_dt, COUNT(dist_lt_wsdt) as clcount;
};
dump gn_grp_ss_qs;
 
This gives me the following output:
(2013-10-27,2)
(2013-11-03,1)
(2013-11-10,3)
 
These are per-week counts. Is there a way to get running counts like the following?
 
(2013-10-27,2)
(2013-11-03,3)
(2013-11-10,6)
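 
To make the goal concrete: each running count is the sum of the weekly counts up to and including that week (2, then 2 + 1 = 3, then 3 + 3 = 6). One possible way to express that in Pig is sketched below. It is untested, and it assumes the number of weeks is small enough that a CROSS of the weekly counts with themselves is acceptable. The aliases wk_cur, wk_prev, wk_pairs, wk_upto, grp_upto, and running are hypothetical names, not part of the script above.

```pig
-- Sketch only: builds on gn_grp_ss_qs (wk_st_dt, clcount) from the script above.
-- Two projections of the weekly counts, so the relation can be crossed with itself.
wk_cur  = FOREACH gn_grp_ss_qs GENERATE wk_st_dt AS wk, clcount AS cnt;
wk_prev = FOREACH gn_grp_ss_qs GENERATE wk_st_dt AS prev_wk, clcount AS prev_cnt;

-- Pair every week with every week, then keep pairs where the second week is
-- on or before the first. The dates are chararrays in yyyy-MM-dd form, so
-- string comparison matches chronological order.
wk_pairs = CROSS wk_cur, wk_prev;
wk_upto  = FILTER wk_pairs BY prev_wk <= wk;

-- For each week, sum the weekly counts of all weeks up to and including it.
grp_upto = GROUP wk_upto BY wk;
running  = FOREACH grp_upto GENERATE group AS wk_st_dt, SUM(wk_upto.prev_cnt) AS running_count;

DUMP running;
```

Note that this sums the per-week distinct counts, so a client that appears in two different weeks would be counted twice. If the running count should instead be distinct clients across all weeks so far, the filter would need to be applied to the joined records before the DISTINCT rather than to the weekly counts.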
 
Thanks,
Manoj
