Hi,
How does REGEX_EXTRACT_ALL work? When I use REGEX_EXTRACT, it gives the
expected result, but REGEX_EXTRACT_ALL returns an empty result most of the
time, which is not what I expect.
Regards
Rohit
Can you give an example of the input and your code?
(In reply to ROHIT LADDHA rohitla...@gmail.com, Thu, Feb 27, 2014 at 5:47 PM)
Yeah, sure.
The 'resultData' file contains:
lakers nba.com 1
lakers espn.com 2
kings nhl.com 1
kings nba.com 2
(4 rows, 3 columns separated by '\t')
The commands are:
p = load 'resultData' using TextLoader AS (line:chararray);
q = foreach p generate flatten (REGEX_EXTRACT (line, '(.com).*',1 ) );
r =
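A likely explanation, which I have not verified against the Pig 0.12 source and offer only as an assumption: REGEX_EXTRACT applies the pattern with Java's Matcher.find(), so it may match anywhere in the line, whereas REGEX_EXTRACT_ALL applies it with Matcher.matches(), so it must cover the entire line. The pattern '(.com).*' cannot match starting at the beginning of "lakers\tnba.com\t1", which would produce exactly the empty results described. The plain-Java sketch below reproduces the difference; extractWithFind and extractAllWithMatches are hypothetical helpers, not Pig's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFindVsMatches {

    // find()-style extraction (assumed to be what REGEX_EXTRACT does):
    // the pattern may match anywhere inside the input.
    public static String extractWithFind(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    // matches()-style extraction (assumed to be what REGEX_EXTRACT_ALL does):
    // the pattern must match the WHOLE input, otherwise nothing is returned.
    public static List<String> extractAllWithMatches(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        List<String> groups = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "lakers\tnba.com\t1"; // one row of 'resultData'
        System.out.println(extractWithFind(line, "(.com).*", 1));       // .com
        System.out.println(extractAllWithMatches(line, "(.com).*"));     // null
        System.out.println(extractAllWithMatches(line, ".*(\\.com).*")); // [.com]
    }
}
```

If the assumption holds, giving the REGEX_EXTRACT_ALL pattern a leading .* (and escaping the dot so it only matches a literal '.') should make it behave like the REGEX_EXTRACT call, as the last line of main suggests.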
Hi all,
I am trying to create an alias in Pig that reads records from a CSV
file which contains some empty values. But Pig is treating those empty
values (separated by commas) as NULL values. I used the same comma-separated
empty values to load data into Hive tables, where it loads them as
I'm just curious why you would want to do that, since you can use the
nulls from Pig in the same way as empty values. In the end, if you output
them back, you won't get null values; you will get empty values only. You
can test this by doing a DUMP/STORE on the EMPTY relation.
Regards
Prav
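Prav's point can be illustrated with a simplified model of the round trip: on load an empty field becomes null, and on store a null is written back as an empty field, so the text that comes out matches the text that went in. The load and store helpers below are hypothetical stand-ins written in plain Java, not PigStorage's actual implementation; the empty-becomes-null rule is taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class EmptyFieldRoundTrip {

    // Simplified model of loading one delimited line:
    // an empty field becomes null (assumption, per the thread above).
    public static List<String> load(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            if (i == line.length() || line.charAt(i) == delimiter) {
                String field = line.substring(start, i);
                fields.add(field.isEmpty() ? null : field);
                start = i + 1;
            }
        }
        return fields;
    }

    // Simplified model of storing a tuple: a null is written as an empty field.
    public static String store(List<String> fields, char delimiter) {
        StringJoiner joiner = new StringJoiner(String.valueOf(delimiter));
        for (String field : fields) {
            joiner.add(field == null ? "" : field);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> fields = load("a,,c", ',');
        System.out.println(fields);             // [a, null, c]
        System.out.println(store(fields, ',')); // a,,c
    }
}
```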
Hi all,
I am trying to do a COGROUP with five relations that I previously loaded from
Cassandra.
In a single-node, local Cassandra testing environment the script works fine,
but when I try to execute it on a cluster of AWS instances, with only one
slave in the Hadoop cluster and one Cassandra seed node, I
I forgot to mention: I am using Cassandra 2.04, Hadoop 1.2.1 and Pig 0.12.
Thanks
(In reply to Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com, 2014-02-27 17:29 GMT+01:00)
Hello everyone,
I have a FOREACH statement, and inside it I use an ORDER BY. After the ORDER BY,
I call a UDF. For example:
logs = LOAD 'raw_data' USING org.apache.hcatalog.pig.HCatLoader();
logs_g = GROUP logs BY (date, site, profile) PARALLEL 2;
service_flavors = FOREACH logs_g {
Where exactly are you getting duplicates? I'm not sure I understand your
question. Can you give an example please?
(In reply to Anastasis Andronidis andronat_...@hotmail.com, Thu, Feb 27, 2014 at 11:15 AM)
Yes, of course. My output looks like this:
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
BTW, is this somehow related [1]?
[1]:
http://mail-archives.apache.org/mod_mbox/pig-user/201102.mbox/%3c5528d537-d05c-47d9-8bc8-cc68e236a...@yahoo-inc.com%3E
(In reply to Anastasis Andronidis andronat_...@hotmail.com, 27 Feb 2014, at 11:20 p.m.)
No... that wouldn't be related, since you're not doing a GROUP ALL.
The `FLATTEN(MY_UDF(t))` makes me a little wary. Something may be going
wrong in your UDF. The output of your UDF is going to be a string that is
some generic status, right? My uneducated guess is that there's a bug in
your
Hi again,
I added this in my UDF:
if(!((DataBag) input.get(0)).isSorted()) {
    throw new IOException("It's not sorted");
}
And the exception is thrown. Why? I don't understand, since I specified ORDER BY in
the nested foreach.
Thank you for helping me, btw!
On 28 Feb 2014, at 1:12
I also just found out that the bag from the nested ORDER BY is
org.apache.pig.data.InternalCachedBag and not org.apache.pig.data.SortedDataBag.
Should it be like that?
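On the isSorted() question: judging only by the class names observed above, isSorted() seems to report whether the bag is a sorted-bag implementation rather than whether its tuples happen to be in order, so a bag produced by a nested ORDER BY might iterate in the right order and still return false. That is an inference from this thread, not something confirmed here. A check that does not depend on the bag class is to walk the tuples and compare the sort keys. The sketch uses plain Java lists in place of DataBag and Tuple so it runs without Pig on the classpath; isOrderedBy is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OrderCheck {

    // Walks the rows in iteration order and returns true if the value in column
    // keyIndex never decreases. Plain lists stand in for Pig's DataBag/Tuple here.
    public static boolean isOrderedBy(Iterable<? extends List<String>> rows, int keyIndex) {
        Iterator<? extends List<String>> it = rows.iterator();
        if (!it.hasNext()) {
            return true; // an empty bag is trivially ordered
        }
        String previous = it.next().get(keyIndex);
        while (it.hasNext()) {
            String current = it.next().get(keyIndex);
            if (previous.compareTo(current) > 0) {
                return false; // found a pair that is out of order
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical rows, already in ascending order of column 1.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("20131209", "CREAM-CE"),
                Arrays.asList("20131209", "SRMv2"));
        System.out.println(isOrderedBy(rows, 1)); // true
    }
}
```

Inside a real UDF you would iterate the DataBag instead and compare the fields using the same ordering as the ORDER BY; plain String comparison here is a simplification.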
(In reply to Anastasis Andronidis andronat_...@hotmail.com, 28 Feb 2014, at 1:51 a.m.)