pig REGEX_EXTRACT_ALL

2014-02-27 Thread ROHIT LADDHA
Hi, how does REGEX_EXTRACT_ALL work? When I use REGEX_EXTRACT, it gives the expected result, but REGEX_EXTRACT_ALL gives an empty result most of the time, which is not expected. Regards Rohit

Re: pig REGEX_EXTRACT_ALL

2014-02-27 Thread Nitin Pawar
Can you give an example of the input and your code? On Thu, Feb 27, 2014 at 5:47 PM, ROHIT LADDHA rohitla...@gmail.com wrote: Hi, how does REGEX_EXTRACT_ALL work? When I use REGEX_EXTRACT, it gives the expected result, but REGEX_EXTRACT_ALL gives an empty result most of the time

Re: pig REGEX_EXTRACT_ALL

2014-02-27 Thread ROHIT LADDHA
Yeah sure. The 'resultData' file contains:

lakers  nba.com   1
lakers  espn.com  2
kings   nhl.com   1
kings   nba.com   2

4 rows, 3 columns separated by '\t'. The commands are:

p = load 'resultData' using TextLoader AS (line:chararray);
q = foreach p generate flatten (REGEX_EXTRACT (line, '(.com).*',1 ) );
r =
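
One possible explanation, sketched in plain Java rather than Pig. It assumes that REGEX_EXTRACT searches for the pattern anywhere in the line (like `Matcher.find()`), while REGEX_EXTRACT_ALL returns groups only when the pattern matches the entire line (like `Matcher.matches()`). That behaviour is an assumption about the Pig built-ins and should be checked against the documentation for your Pig version. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; it only imitates the assumed behaviour of the Pig built-ins.
public class RegexMatchDemo {

    // Searches for the pattern anywhere in the line; returns the requested group, or null.
    public static String findGroup(String line, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(group) : null;
    }

    // Returns all capture groups only if the pattern matches the whole line, otherwise null.
    public static List<String> matchAllGroups(String line, String regex) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> groups = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // First row of the sample file, with tab-separated columns.
        String line = "lakers\tnba.com\t1";
        System.out.println(findGroup(line, "(.com).*", 1));      // partial match anywhere
        System.out.println(matchAllGroups(line, "(.com).*"));    // pattern does not cover the line
        System.out.println(matchAllGroups(line, ".*(.com).*"));  // pattern covers the whole line
    }
}
```

Under that assumption, a pattern that covers the whole line, such as '.*(.com).*', would be needed for REGEX_EXTRACT_ALL to return anything for these rows.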

How should I insert EMPTY values into pig

2014-02-27 Thread Arpit Maheshwari
Hi All, I am trying to create an alias in Pig that reads records from a CSV file which contains some empty records. But Pig is treating those empty values (separated by commas) as NULL values. I used the same comma-separated empty values to load data into Hive tables, where it loads them as

Re: How should I insert EMPTY values into pig

2014-02-27 Thread praveenesh kumar
I am just curious why you would want to do that, because you can use the nulls from Pig in the same way as empty values. In the end, if you output them back, you won't get null values; you would get empty values only. You can test this by doing a DUMP/STORE on the EMPTY relation. Regards Prav

Issue with cassandra-pig-thrift.transport and java.net.SocketException: Connection reset

2014-02-27 Thread Miguel Angel Martin junquera
Hi all, I am trying to do a cogroup with five relations that I previously loaded from Cassandra. In a single-node, local Cassandra testing environment the script works fine, but when I try to execute it in a cluster over AWS instances with only one slave in the Hadoop cluster and one seed Cassandra node I

Re: Issue with cassandra-pig-thrift.transport and java.net.SocketException: Connection reset

2014-02-27 Thread Miguel Angel Martin junquera
I forgot to mention that I am using Cassandra 2.04, Hadoop 1.2.1 and Pig 0.12. Thanks 2014-02-27 17:29 GMT+01:00 Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com: Hi all, I am trying to do a cogroup with five relations that I previously loaded from Cassandra. In a single-node, local Cassandra

Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Hello everyone, I have a foreach statement and inside of it, I use an order by. After the order by, I have a UDF. Example like this: logs = LOAD 'raw_data' USING org.apache.hcatalog.pig.HCatLoader(); logs_g = GROUP logs BY (date, site, profile) PARALLEL 2; service_flavors = FOREACH logs_g {

Re: Nested foreach with order by

2014-02-27 Thread Pradeep Gollakota
Where exactly are you getting duplicates? I'm not sure I understand your question. Can you give an example please? On Thu, Feb 27, 2014 at 11:15 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: Hello everyone, I have a foreach statement and inside of it, I use an order by. After the

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Yes, of course, my output is like this:

(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
BTW, is this somehow related [1]? [1]: http://mail-archives.apache.org/mod_mbox/pig-user/201102.mbox/%3c5528d537-d05c-47d9-8bc8-cc68e236a...@yahoo-inc.com%3E On 27 Feb 2014, at 11:20 PM, Anastasis Andronidis andronat_...@hotmail.com wrote: Yes, of course, my output is like that:

Re: Nested foreach with order by

2014-02-27 Thread Pradeep Gollakota
No... that wouldn't be related since you're not doing a GROUP ALL. The `FLATTEN(MY_UDF(t))` has me a little wary. Something is possibly going wrong in your UDF. The output of your UDF is going to be a string that is some generic status, right? My uneducated guess is that there's a bug in your

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
Hi again, I added this in my UDF: if(!((DataBag) input.get(0)).isSorted()) { throw new IOException("It's not sorted"); } And the exception is thrown. Why? I don't understand it. I specified ORDER BY in the nested foreach. Thank you for helping me btw! On 28 Feb 2014, at 1:12

Re: Nested foreach with order by

2014-02-27 Thread Anastasis Andronidis
I also just found out that the bag from the nested order by is org.apache.pig.data.InternalCachedBag and not org.apache.pig.data.SortedDataBag. Should it be like that? On 28 Feb 2014, at 1:51 AM, Anastasis Andronidis andronat_...@hotmail.com wrote: Hi again, I added this in my UDF:
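
A way to check the actual order of the elements instead of the bag type, sketched in plain Java without any Pig classes. It assumes (this is not confirmed in the thread) that `isSorted()` describes the bag implementation rather than the order of its contents, so a bag filled in sorted order by a nested ORDER BY could still report false. `OrderCheck` and `isInOrder` are hypothetical names; in a UDF the iterable would be the input bag and the comparator would compare the sort fields.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;

// Hypothetical helper; not part of Pig.
public class OrderCheck {

    // Returns true if every element is <= its successor according to cmp.
    public static <T> boolean isInOrder(Iterable<T> items, Comparator<? super T> cmp) {
        Iterator<T> it = items.iterator();
        if (!it.hasNext()) {
            return true; // an empty sequence is trivially ordered
        }
        T previous = it.next();
        while (it.hasNext()) {
            T current = it.next();
            if (cmp.compare(previous, current) > 0) {
                return false; // found a pair that is out of order
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) {
        Comparator<String> byName = Comparator.<String>naturalOrder();
        System.out.println(isInOrder(Arrays.asList("CREAM-CE", "CREAM-CE", "SRMv2"), byName));
        System.out.println(isInOrder(Arrays.asList("SRMv2", "CREAM-CE"), byName));
    }
}
```

A check like this walks the elements once, so it tests what the UDF actually receives rather than which bag class carries it.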