BTW, is this somehow related [1]?
[1]: http://mail-archives.apache.org/mod_mbox/pig-user/201102.mbox/%3c5528d537-d05c-47d9-8bc8-cc68e236a...@yahoo-inc.com%3E

On 27 Feb 2014, at 11:20 PM, Anastasis Andronidis <andronat_...@hotmail.com> wrote:

> Yes, of course, my output is like that:
>
> (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
> (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
> (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
> (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
> (20131209,AM-02-SEUA,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
> .
> .
> .
>
> and when I put PARALLEL 1 in GROUP BY I get:
>
> (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
> (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
> (20131209,AM-02-SEUA,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
> .
> .
> .
>
>
> On 27 Feb 2014, at 10:20 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>
>> Where exactly are you getting duplicates? I'm not sure I understand your
>> question. Can you give an example please?
>>
>>
>> On Thu, Feb 27, 2014 at 11:15 AM, Anastasis Andronidis <
>> andronat_...@hotmail.com> wrote:
>>
>>> Hello everyone,
>>>
>>> I have a foreach statement and inside of it, I use an order by. After the
>>> order by, I have a UDF. Example like this:
>>>
>>>
>>> logs = LOAD 'raw_data' USING org.apache.hcatalog.pig.HCatLoader();
>>>
>>> logs_g = GROUP logs BY (date, site, profile) PARALLEL 2;
>>>
>>> service_flavors = FOREACH logs_g {
>>>     t = ORDER logs BY status;
>>>     GENERATE group.date as dates, group.site as site, group.profile as profile,
>>>         FLATTEN(MY_UDF(t)) as (generic_status);
>>> };
>>>
>>> The problem is that I get duplicate results.. I know that MY_UDF is
>>> running on mappers, but shouldn't each mapper take 1 group from the logs_g?
>>> Is something wrong with order by? I tried to add order by parallel but I
>>> get syntax errors...
>>>
>>> My problem is resolved if I put GROUP logs BY (date, site, profile)
>>> PARALLEL 1; But this is not a scalable solution. Can someone help me pls? I
>>> am using pig 0.11
>>>
>>> Cheers,
>>> Anastasis
>