I found the problem. I was using some private variables in my class, on the
assumption that Pig creates a new object of my class for every tuple it passes
in. That is of course not the case.
Sorry for the inconvenience.
Anastasis
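
For readers who hit the same symptom, here is a minimal sketch of how a reused
UDF instance can leak state between calls. It is not the original UDF, which
is not shown in this thread. LeakyUdf and StatelessUdf are hypothetical
stand-ins written in plain Java so the example runs without Pig on the
classpath; a real UDF would extend org.apache.pig.EvalFunc and receive a Tuple
in exec(). The service names are borrowed from the sample output in this
thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UdfStateDemo {

    // Hypothetical stand-in for a UDF that keeps its working data in a field.
    static class LeakyUdf {
        // Instance field: it outlives a single call when the object is reused.
        private final List<String> seen = new ArrayList<>();

        List<String> exec(List<String> bag) {
            seen.addAll(bag);              // values from earlier calls are still here
            return new ArrayList<>(seen);
        }
    }

    // Same logic, but the working data lives in a local variable.
    static class StatelessUdf {
        List<String> exec(List<String> bag) {
            List<String> seen = new ArrayList<>(); // fresh list for every call
            seen.addAll(bag);
            return seen;
        }
    }

    public static void main(String[] args) {
        List<String> firstGroup = Arrays.asList("CREAM-CE", "SRMv2");
        List<String> secondGroup = Arrays.asList("SRMv2");

        // One instance handles both groups, as assumed for a reused UDF object.
        LeakyUdf leaky = new LeakyUdf();
        leaky.exec(firstGroup);
        System.out.println(leaky.exec(secondGroup));     // [CREAM-CE, SRMv2, SRMv2]

        StatelessUdf stateless = new StatelessUdf();
        stateless.exec(firstGroup);
        System.out.println(stateless.exec(secondGroup)); // [SRMv2]
    }
}
```

Keeping per-call data in local variables, or clearing any fields at the start
of exec(), avoids carrying values over from one input tuple to the next.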
On 28 Feb 2014, at 2:07 AM, Anastasis Andronidis
wrote:
I also just found out that the bag produced by the nested ORDER BY is an
org.apache.pig.data.InternalCachedBag and not an
org.apache.pig.data.SortedDataBag. Is that how it should be?
On 28 Feb 2014, at 1:51 AM, Anastasis Andronidis
wrote:
Hi again,
I added this in my UDF:
if(!((DataBag) input.get(0)).isSorted()) {
    throw new IOException("It's not sorted");
}
And the exception is thrown. Why? I don't understand it. I specified ORDER BY
in the nested FOREACH.
Thank you for helping me btw!
On 28 Φεβ 2014, at 1:12 π
No... that wouldn't be related since you're not doing a GROUP ALL.
The `FLATTEN(MY_UDF(t))` has me a little wary. Something is possibly going
wrong in your UDF. The output of your UDF is going to be a string that is
some generic status, right? My uneducated guess is that there's a bug in
your UDF.
BTW, is this somehow related[1]?
[1]:
http://mail-archives.apache.org/mod_mbox/pig-user/201102.mbox/%3c5528d537-d05c-47d9-8bc8-cc68e236a...@yahoo-inc.com%3E
On 27 Feb 2014, at 11:20 PM, Anastasis Andronidis
wrote:
Yes, of course, my output looks like this:
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
(20131209,AM-02-SEUA,ch.
Where exactly are you getting duplicates? I'm not sure I understand your
question. Can you give an example please?
On Thu, Feb 27, 2014 at 11:15 AM, Anastasis Andronidis <
andronat_...@hotmail.com> wrote:
Hello everyone,
I have a FOREACH statement with a nested ORDER BY inside it, and after the
ORDER BY I call a UDF. For example:
logs = LOAD 'raw_data' USING org.apache.hcatalog.pig.HCatLoader();
logs_g = GROUP logs BY (date, site, profile) PARALLEL 2;
service_flavors = FOREACH logs_g {