Re: Nested foreach with order by

2014-02-28 Thread Anastasis Andronidis
I found the problem. I was using some private instance variables in my UDF class. I assumed that for every tuple it receives, Pig would create a new object of my class, but of course that is not the case. Sorry for the inconvenience, Anastasis On 28 Feb 2014, at 2:07 a.m., Anastasis Andronidis andronat_
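The pitfall described in this message can be reproduced outside Pig. The sketch below is plain Java and assumes only the behavior reported in this thread: one UDF object is reused for many input tuples. The class names (`UdfStateDemo`, `LeakyUdf`, `StatelessUdf`) are hypothetical stand-ins, not Pig's `EvalFunc` API.

```java
import java.util.List;

// Hypothetical stand-in for a Pig UDF: one instance receives many calls,
// so anything stored in an instance field survives from one tuple to the next.
public class UdfStateDemo {

    // Buggy: the running total lives in a field and is never reset,
    // so each call also sees values left over from earlier tuples.
    static class LeakyUdf {
        private long total = 0;

        long exec(List<Integer> bag) {
            for (int v : bag) {
                total += v;
            }
            return total;
        }
    }

    // Fixed: per-tuple state is kept in a local variable.
    static class StatelessUdf {
        long exec(List<Integer> bag) {
            long total = 0;
            for (int v : bag) {
                total += v;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        // One object of each class, reused for every group, as Pig reuses a UDF instance.
        LeakyUdf leaky = new LeakyUdf();
        StatelessUdf clean = new StatelessUdf();
        List<List<Integer>> groups = List.of(List.of(1, 2), List.of(3, 4));
        for (List<Integer> g : groups) {
            System.out.println(leaky.exec(g) + " vs " + clean.exec(g));
        }
        // prints "3 vs 3" and then "10 vs 7"
    }
}
```

Mutable fields are only safe for data that is meant to be shared across calls (for example, a read-only cache); anything describing the current tuple belongs in locals.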

Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Hello everyone, I have a FOREACH statement, and inside it I use an ORDER BY. After the ORDER BY, I call a UDF. Here is an example: logs = LOAD 'raw_data' USING org.apache.hcatalog.pig.HCatLoader(); logs_g = GROUP logs BY (date, site, profile) PARALLEL 2; service_flavors = FOREACH logs_g {
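The excerpt above is cut off inside the nested block, so the exact ORDER BY field and UDF are unknown. As a rough model of what a `GROUP ... BY` followed by a nested `ORDER` and a UDF call computes, here is a plain-Java sketch: group rows by key, sort each group, then hand each sorted group to a function. The row fields (`key`, `timestamp`, `status`) and the stand-in UDF are assumptions for illustration; this models the semantics only, not how Pig executes the plan.

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.Collectors;

public class NestedForeachSketch {

    // Hypothetical log row; `key` stands in for the (date, site, profile) group key.
    static final class LogRow {
        final String key;
        final long timestamp;
        final String status;

        LogRow(String key, long timestamp, String status) {
            this.key = key;
            this.timestamp = timestamp;
            this.status = status;
        }
    }

    // GROUP rows BY key, then for each group: ORDER BY timestamp, then apply the UDF.
    static <R> Map<String, R> groupSortApply(List<LogRow> rows, Function<List<LogRow>, R> udf) {
        Map<String, List<LogRow>> groups = rows.stream()
            .collect(Collectors.groupingBy(r -> r.key, TreeMap::new, Collectors.toList()));
        Map<String, R> out = new TreeMap<>();
        for (Map.Entry<String, List<LogRow>> e : groups.entrySet()) {
            List<LogRow> sorted = new ArrayList<>(e.getValue()); // fresh copy per group
            sorted.sort(Comparator.comparingLong((LogRow r) -> r.timestamp));
            out.put(e.getKey(), udf.apply(sorted));
        }
        return out;
    }

    public static void main(String[] args) {
        List<LogRow> rows = List.of(
            new LogRow("a", 3, "OK"),
            new LogRow("a", 1, "CRIT"),
            new LogRow("b", 2, "WARN"),
            new LogRow("a", 2, "OK"));
        // Stand-in UDF: joins statuses in the order the group arrives.
        Map<String, String> result = groupSortApply(rows,
            group -> group.stream().map(r -> r.status).collect(Collectors.joining(",")));
        System.out.println(result); // {a=CRIT,OK,OK, b=WARN}
    }
}
```

Because each group gets its own sorted list, any duplicated or carried-over values in the output must come from the function applied to it, which is where this thread eventually found the problem.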

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
p.m., Pradeep Gollakota pradeep...@gmail.com wrote: Where exactly are you getting duplicates? I'm not sure I understand your question. Can you give an example please? On Thu, Feb 27, 2014 at 11:15 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: Hello everyone, I have

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
BTW, is this somehow related [1]? [1]: http://mail-archives.apache.org/mod_mbox/pig-user/201102.mbox/%3c5528d537-d05c-47d9-8bc8-cc68e236a...@yahoo-inc.com%3E On 27 Feb 2014, at 11:20 p.m., Anastasis Andronidis andronat_...@hotmail.com wrote: Yes, of course, my output is like

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
? My uneducated guess is that there's a bug in your UDF. To confirm, do you get the correct result if you replace your UDF with an out-of-the-box one, e.g. COUNT? On Thu, Feb 27, 2014 at 2:21 PM, Anastasis Andronidis andronat_...@hotmail.com wrote: BTW, is this some how related[1

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
I also just found out that the bag produced by the nested ORDER BY is an org.apache.pig.data.InternalCachedBag and not an org.apache.pig.data.SortedDataBag. Is that expected? On 28 Feb 2014, at 1:51 a.m., Anastasis Andronidis andronat_...@hotmail.com wrote: Hi again, I added this in my UDF
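Rather than inferring order from the bag's runtime class, a UDF can check the order it actually observes while debugging. A Pig `DataBag` can be iterated tuple by tuple, so a generic check over any `Iterable` is enough. The sketch below is plain Java with a hypothetical helper name (`isSorted`); in a real UDF the comparator would compare whichever tuple field the nested ORDER BY uses, which is not shown in this excerpt.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class BagOrderCheck {

    // Returns true if the items come out of the iterator in non-decreasing order.
    static <T> boolean isSorted(Iterable<T> items, Comparator<? super T> cmp) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            return true; // an empty bag is trivially sorted
        }
        T prev = it.next();
        while (it.hasNext()) {
            T cur = it.next();
            if (cmp.compare(prev, cur) > 0) {
                return false; // found an element smaller than its predecessor
            }
            prev = cur;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> ordered = List.of(1L, 2L, 2L, 5L);
        List<Long> unordered = List.of(3L, 1L, 2L);
        System.out.println(isSorted(ordered, Comparator.<Long>naturalOrder()));   // true
        System.out.println(isSorted(unordered, Comparator.<Long>naturalOrder())); // false
    }
}
```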

Re: UDF logging

2014-02-18 Thread Anastasis Andronidis
side. So, you have to look for the log in the JobTracker, or in the NodeManager if you are on Hadoop 2.x, on the server side. 2014-02-16 7:02 GMT+09:00 Anastasis Andronidis andronat_...@hotmail.com: Hello, I am using Pig 0.11.0 with cdh4.5.0 I have a custom UDF and I am trying to log stuff from inside. My

UDF logging

2014-02-15 Thread Anastasis Andronidis
Hello, I am using Pig 0.11.0 with cdh4.5.0. I have a custom UDF and I am trying to log messages from inside it. My problem is that I get no logs. I tried: 1) this.log, 2) this.warn, 3) this.pigLogger, 4) creating a logger with LogFactory from Apache Commons (apache.commons). None of these worked. What am I doing wrong? Cheers

Re: Pig and Hcat

2014-02-09 Thread Anastasis Andronidis
Hello again, any comments on the subject? Cheers, Anastasis On 4 Feb 2014, at 5:36 p.m., Anastasis Andronidis andronat_...@hotmail.com wrote: Hello, I am using Apache Pig version 0.11.0-cdh4.5.0 and I want to know if there is a way to overwrite a partition in a table with Hcat. Up

Pig and Hcat

2014-02-04 Thread Anastasis Andronidis
Hello, I am using Apache Pig version 0.11.0-cdh4.5.0 and I want to know if there is a way to overwrite a partition in a table with Hcat. Up until now I have been getting errors that the partition already exists. Is there an option to overwrite it through a Pig script? Kindly, Anastasis

Re: Small files

2013-09-30 Thread Anastasis Andronidis
Hello again, any comments on this? Thanks, Anastasis On 27 Sep 2013, at 5:36 p.m., Anastasis Andronidis andronat_...@hotmail.com wrote: Hello, I am working on a very small project for my university and I have a small cluster with 2 worker nodes and 1 master node. I'm using Pig to do

Small files

2013-09-27 Thread Anastasis Andronidis
Hello, I am working on a very small project for my university and I have a small cluster with 2 worker nodes and 1 master node. I'm using Pig to do some calculations and I have a question regarding small files. I have a UDF that is reading a small input (around 200k) and correlates the data
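The excerpt is truncated before it says how the UDF reads its small (around 200k) input. One common pattern for a small side input is to load it once per UDF instance and reuse it for every call. The sketch below assumes a tab-separated `key<TAB>value` file and uses hypothetical names (`LookupUdfSketch`, `Correlator`); it takes a `Supplier<Reader>` so it runs without a cluster. How the file would actually be opened on the worker nodes (for example from HDFS or the distributed cache) is outside this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class LookupUdfSketch {

    // Hypothetical UDF-like class: reads the small side input on first use,
    // then reuses the in-memory map for every subsequent call on this instance.
    static final class Correlator {
        private final Supplier<Reader> source;
        private Map<String, String> lookup; // null until first use

        Correlator(Supplier<Reader> source) {
            this.source = source;
        }

        private Map<String, String> lookup() throws IOException {
            if (lookup == null) {
                Map<String, String> m = new HashMap<>();
                try (BufferedReader br = new BufferedReader(source.get())) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        String[] parts = line.split("\t", 2); // assumed key<TAB>value format
                        if (parts.length == 2) {
                            m.put(parts[0], parts[1]);
                        }
                    }
                }
                lookup = m;
            }
            return lookup;
        }

        // Correlates one input key with the side data.
        String exec(String key) throws IOException {
            return lookup().getOrDefault(key, "UNKNOWN");
        }
    }

    public static void main(String[] args) throws IOException {
        String data = "site1\tgroupA\nsite2\tgroupB\n";
        int[] loads = {0}; // counts how many times the side input is read
        Correlator c = new Correlator(() -> {
            loads[0]++;
            return new StringReader(data);
        });
        System.out.println(c.exec("site1"));      // groupA
        System.out.println(c.exec("site3"));      // UNKNOWN
        System.out.println("loads: " + loads[0]); // loads: 1
    }
}
```

The field here is deliberately shared across calls: it holds read-only data that is the same for every tuple, so reusing the instance saves re-reading the file.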