Merge join

2011-07-19 Thread Ankur Jain
Hi all, I'm trying to do a map-side only merge join [1] in Pig using Zebra's TableLoader. (My data allows a merge join.) But I'm unable to use the TableLoader. Even a simple script that loads a table and just stores it back doesn't work - A = load 'my_input' using org.apache.hadoop.zeb…
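
For reference, a map-side merge join in Pig is requested with the 'merge' keyword on JOIN, and both inputs must already be sorted on the join key. A minimal sketch with hypothetical paths and field names (the Zebra TableLoader from the post would stand in for PigStorage here):

  A = LOAD 'sorted_left' USING PigStorage() AS (key:int, val1:chararray);
  B = LOAD 'sorted_right' USING PigStorage() AS (key:int, val2:chararray);
  -- 'merge' asks Pig for a map-side merge join; no reduce phase is needed
  C = JOIN A BY key, B BY key USING 'merge';
  STORE C INTO 'merge_join_output';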

Re: The nested depth?

2011-07-19 Thread Thejas Nair
Hi Yong, There is no limit on the levels of nesting that you can have in your data. Pig has a nested foreach which lets you manipulate a bag directly, as if it were a relation, but that works only for one level; there are plans to extend that. You can always process data with any levels of nes…
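
To illustrate one level of nesting, a minimal sketch with hypothetical field names; inside the FOREACH block the inner bag A can be filtered, ordered, or limited as if it were a relation:

  A = LOAD 'students.txt' AS (name:chararray, class:int, score:int);
  B = GROUP A BY class;
  C = FOREACH B {
      -- operate on the inner bag A as if it were a relation
      top = FILTER A BY score > 80;
      GENERATE group, COUNT(top) AS num_top;
  };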

Re: hadoop fail over

2011-07-19 Thread Gerrit Jansen van Vuuren
Hi, ZooKeeper + BookKeeper seems to be a good candidate for this. I've recently changed http://code.google.com/p/bigstreams from using Hazelcast to using ZooKeeper; it uses ZooKeeper to keep track of real-time file transfers, and I have found ZooKeeper to work great. Failover is decent and predictab…

Re: hadoop fail over

2011-07-19 Thread jagaran das
Try Avatar Node - JD

hadoop fail over

2011-07-19 Thread Thiago Veiga
Hi, I would like to create a Hadoop failover system, mainly for the master node in my Hadoop cluster. I can't use Linux HA, so I have been trying another approach. I thought ZooKeeper would be a good option, but I am not sure about that. Does someone have a suggestion? Thanks, Thiago

Re: why the foreach nested form can't work?

2011-07-19 Thread Jacob Perkins
On Tue, 2011-07-19 at 16:05 +0200, 勇胡 wrote: > How can I understand that 'A.score' is a bag? I mean that if I issue a > 'describe B' command, I can get B: {group:int, A: {name:chararray, > no:int,score:int}}. Looking at the output of describe shows that A is a bag (e.g. the '{' and '}' characters), y…
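
For context, a hedged sketch of what DESCRIBE typically prints after the GROUP in this thread (the exact format varies by Pig version; newer releases also show the inner tuple's parentheses):

  grunt> DESCRIBE B;
  B: {group: int, A: {(name: chararray, no: int, score: int)}}

The braces around A mark it as a bag of tuples, so 'A.score' projects a bag of single-field tuples rather than a plain int field.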

Re: why the foreach nested form can't work?

2011-07-19 Thread Gianmarco
2011/7/19 勇胡 > How can I understand that 'A.score' is a bag? I mean that if I issue a > 'describe B' command, I can get B: {group:int, A: {name:chararray, > no:int,score:int}}. From here, I can't get any information that 'A.score' > is > a bag, but I can see that A.score is an element of bag. >

Re: why the foreach nested form can't work?

2011-07-19 Thread 勇胡
How can I understand that 'A.score' is a bag? I mean that if I issue a 'describe B' command, I get B: {group:int, A: {name:chararray, no:int,score:int}}. From here, I can't get any indication that 'A.score' is a bag, but I can see that A.score is an element of a bag. And why, if I delete the quan…

Re: why the foreach nested form can't work?

2011-07-19 Thread Jacob Perkins
I think it's because 'A.score' is a bag but Pig needs a reference to a field in the tuples. This worked for me:

  A = LOAD 'foo.tsv' AS (name:chararray, no:int, score:int);
  B = GROUP A BY no;
  C = FOREACH B {
      D = FILTER A BY score > 80;
      GENERATE FLATTEN(D.(name, score));
  };
  DUMP C;

why the foreach nested form can't work?

2011-07-19 Thread 勇胡
Hello, I want to use a foreach statement to filter the tuples in a bag, but it didn't work. My Pig code is as follows:

  A = LOAD '/home/test/student.txt' AS (name:chararray, no:int, score:int);
  B = GROUP A BY no;
  C = FOREACH B { D = FILTER A BY A.score > 80; GENERATE D.name, D.score; }
  DUM…

The nested depth?

2011-07-19 Thread 勇胡
Hello, I know that Pig can support nested data. I want to know how many levels of nesting Pig can support: is it unlimited, or is there a limit? Thanks! Yong

Re: STORE INTO replacing contents?

2011-07-19 Thread Raghu Angadi
I noticed this too. I could not find any documentation regarding what the return code from a command like 'fs' or 'rmf' or a shell command is. Replace 'fs -rmr' with rmf for your case. Raghu. On Tue, Jul 19, 2011 at 12:27 AM, Chris Rosner wrote: > Greetings, > > I'm trying to upgrade from 0.7.0 to 0.…
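
As a concrete sketch of that suggestion (the output path and alias name are hypothetical), rmf removes the path but does not fail when it is absent, so the script works on both the first run and later reruns:

  rmf /some/output/path
  STORE my_alias INTO '/some/output/path';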

STORE INTO replacing contents?

2011-07-19 Thread Chris Rosner
Greetings, I'm trying to upgrade from 0.7.0 to 0.8.1, but am having some trouble with existing scripts. The basic problem I'm trying to solve is storing an alias into HDFS, overwriting data that may already exist at that path. I've been using this pattern in 0.7.0: fs -rmr /some/output/path/ s…