Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
I also just found out that the bag from the nested order by is org.apache.pig.data.InternalCachedBag and not org.apache.pig.data.SortedDataBag. Should it be like that? On 28 Feb 2014, at 1:51 AM, Anastasis Andronidis wrote: > Hi again, > > I added this in my UDF: > > if(!((DataBag) input.

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Hi again, I added this in my UDF: if(!((DataBag) input.get(0)).isSorted()) { throw new IOException("It's not sorted"); } And the exception is thrown. Why? I don't understand it. I specified ORDER BY in the nested foreach. Thank you for helping me btw! On 28 Feb 2014, at 1:12 π

Re: Nested foreach with order by

2014-02-27 Thread Pradeep Gollakota
No... that wouldn't be related since you're not doing a GROUP ALL. The `FLATTEN(MY_UDF(t))` has me a little wary. Something is possibly going wrong in your UDF. The output of your UDF is going to be a string that is some generic status, right? My uneducated guess is that there's a bug in your UDF.

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
BTW, is this somehow related [1]? [1]: http://mail-archives.apache.org/mod_mbox/pig-user/201102.mbox/%3c5528d537-d05c-47d9-8bc8-cc68e236a...@yahoo-inc.com%3E On 27 Feb 2014, at 11:20 PM, Anastasis Andronidis wrote: > Yes, of course, my output is like that: > > (20131209,AEGIS04-KG,ch.cer

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Yes, of course, my output is like that: (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE) (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE) (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2) (20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2) (20131209,AM-02-SEUA,ch.

Re: Nested foreach with order by

2014-02-27 Thread Pradeep Gollakota
Where exactly are you getting duplicates? I'm not sure I understand your question. Can you give an example please? On Thu, Feb 27, 2014 at 11:15 AM, Anastasis Andronidis < andronat_...@hotmail.com> wrote: > Hello everyone, > > I have a foreach statement and inside of it, I use an order by. After

Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Hello everyone, I have a foreach statement and inside of it, I use an order by. After the order by, I have a UDF. Example like this: logs = LOAD 'raw_data' USING org.apache.hcatalog.pig.HCatLoader(); logs_g = GROUP logs BY (date, site, profile) PARALLEL 2; service_flavors = FOREACH logs_g {

Re: Issue with cassandra-pig-thrift.transport and java.net.SocketException: Connection reset

2014-02-27 Thread Miguel Angel Martin junquera
I forgot to mention that I am using cassandra 2.04, hadoop 1.2.1 and pig 0.12. Thanks 2014-02-27 17:29 GMT+01:00 Miguel Angel Martin junquera < mianmarjun.mailingl...@gmail.com>: > Hi all, > > I am trying to do a cogroup with five relations that I load from cassandra > previously. > > In single node and local cas

Issue with cassandra-pig-thrift.transport and java.net.SocketException: Connection reset

2014-02-27 Thread Miguel Angel Martin junquera
Hi all, I am trying to do a cogroup with five relations that I previously loaded from cassandra. In a single-node, local cassandra testing environment the script works fine, but when I try to execute it in a cluster over AWS instances with only one slave in the hadoop cluster and one seed cassandra node I ha

Re: How should I insert EMPTY values into pig

2014-02-27 Thread praveenesh kumar
I am just curious why you would want to do that, because you can use the nulls from pig in the same way as empty values. In the end, if you output them back, you won't get null values, you will get empty values only. You can test this by doing a dump/STORE on the EMPTY relation. Regards Prav On

How should I insert EMPTY values into pig

2014-02-27 Thread Arpit Maheshwari
Hi All, I am trying to create an alias in pig which should read records from a csv file that contains some empty records. But Pig is treating those empty values (separated by commas) as NULL values. I used the same comma separated empty values to load data into hive tables where it loads them as

Re: pig REGEX_EXTRACT_ALL

2014-02-27 Thread ROHIT LADDHA
Yeah sure. The 'resultData' file contains: lakers nba.com 1 lakers espn.com 2 kings nhl.com 1 kings nba.com 2 (4 rows, 3 columns separated by '\t'). The commands are: p = load 'resultData' using TextLoader AS (line:chararray); q = foreach p generate flatten (REGEX_EXTRACT (line, '(.com).*',1 ) ); r = fore

Re: pig REGEX_EXTRACT_ALL

2014-02-27 Thread Nitin Pawar
Can you give an example of what the input is and what your code looks like? On Thu, Feb 27, 2014 at 5:47 PM, ROHIT LADDHA wrote: > Hi, > > How does REGEX_EXTRACT_ALL work? When I use REGEX_EXTRACT, it gives the > expected result but REGEX_EXTRACT_ALL gives an empty result most of the time, > which is not expecte

pig REGEX_EXTRACT_ALL

2014-02-27 Thread ROHIT LADDHA
Hi, how does REGEX_EXTRACT_ALL work? When I use REGEX_EXTRACT, it gives the expected result, but REGEX_EXTRACT_ALL gives an empty result most of the time, which is not expected. Regards Rohit
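
A possible explanation, offered as a sketch rather than a confirmed diagnosis: as far as I know, REGEX_EXTRACT_ALL only returns groups when the pattern matches the whole input string, while REGEX_EXTRACT is satisfied by a match anywhere in it. The plain-Java example below illustrates that difference with java.util.regex, using one row of the sample data quoted elsewhere in this thread. The class and method names are made up for illustration and are not part of Pig.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it only illustrates find() versus matches().
public class RegexMatchDemo {

    // Partial match: succeeds if the pattern occurs anywhere in the line.
    public static String firstGroupWithFind(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Whole-string match: succeeds only if the pattern covers the entire line.
    public static String firstGroupWithMatches(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "lakers\tnba.com\t1";

        // The pattern occurs inside the line, so find() succeeds.
        System.out.println(firstGroupWithFind("(.com).*", line));       // prints .com

        // The line does not start with the pattern, so matches() fails.
        System.out.println(firstGroupWithMatches("(.com).*", line));    // prints null

        // A leading .* lets the pattern cover the whole line.
        System.out.println(firstGroupWithMatches(".*(.com).*", line));  // prints .com
    }
}
```

If that assumption about REGEX_EXTRACT_ALL holds, rewriting the pattern so that it spans the full line, for example '.*(.com).*', would be the first thing to try.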