Hi all,
I'm trying to do a map-side-only merge join [1] in Pig using Zebra's
TableLoader. (My data allows a merge join.) But I'm unable to use the
TableLoader. Even a simple script that loads a table and just stores it back
doesn't work:
A = load 'my_input' using org.apache.hadoop.zeb
Hi Yong,
There is no limitation on the levels of nesting that you can have in
your data.
Pig has a nested FOREACH, which lets you manipulate a bag directly as if it
were a relation. But that works only for one level of nesting. There are plans to
extend that.
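Until then, deeper levels can be unnested one at a time with FLATTEN. A minimal sketch, assuming a hypothetical input 'nested_input' whose records hold a bag of tuples that each contain another bag (all aliases and field names here are illustrative, not from the thread):

```pig
-- Hypothetical two-level schema: each record has a bag 'outer',
-- and every tuple in 'outer' carries its own bag 'inner'.
A = LOAD 'nested_input' AS (id:int, outer:bag{t1:tuple(inner:bag{t2:tuple(x:int)})});

-- First level: one row per tuple of 'outer', giving (id, inner).
B = FOREACH A GENERATE id, FLATTEN(outer);

-- Second level: one row per tuple of 'inner', giving (id, x).
-- $1 is a positional reference to the flattened 'inner' bag in B.
C = FOREACH B GENERATE id, FLATTEN($1);

DUMP C;
```

Each FOREACH ... FLATTEN step peels off exactly one level, so the same step can be repeated for data nested more deeply.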
You can always process data with any levels of nes
Hi,
ZooKeeper + BookKeeper seems to be a good candidate for this.
I recently switched http://code.google.com/p/bigstreams from
Hazelcast to ZooKeeper; it uses ZooKeeper to keep track of real-time
file transfers, and I have found ZooKeeper to work great. Failover is decent
and predictab
Try Avatar Node
- JD
From: Thiago Veiga
To: user@pig.apache.org
Sent: Tuesday, 19 July 2011 12:22 PM
Subject: hadoop fail over
Hi,
I would like to create a Hadoop failover system, mainly for the
master node in my Hadoop cluster.
I can't use Linux HA, so
Hi,
I would like to create a Hadoop failover system, mainly for the
master node in my Hadoop cluster.
I can't use Linux HA, so I have been trying another approach.
I thought ZooKeeper would be a good option, but I am not sure about that.
Does anyone have a suggestion?
Thanks,
Thiago
On Tue, 2011-07-19 at 16:05 +0200, 勇胡 wrote:
> How can I tell that 'A.score' is a bag? I mean, if I issue a
> 'describe B' command, I get B: {group:int, A: {name:chararray,
> no:int,score:int}}.
The output of describe shows that A is a bag (e.g. the '{' and
'}' characters), y
2011/7/19 勇胡
> How can I tell that 'A.score' is a bag? I mean, if I issue a
> 'describe B' command, I get B: {group:int, A: {name:chararray,
> no:int,score:int}}. From here, I can't get any information that 'A.score'
> is a bag, but I can see that A.score is an element of a bag.
>
How can I tell that 'A.score' is a bag? I mean, if I issue a
'describe B' command, I get B: {group:int, A: {name:chararray,
no:int,score:int}}. From here, I can't get any information that 'A.score' is
a bag, but I can see that A.score is an element of a bag.
And why if I delete the quan
I think it's because 'A.score' is a bag but Pig needs a reference to a
field in the tuples. This worked for me:
A = LOAD 'foo.tsv' AS (name:chararray, no:int, score: int);
B = GROUP A BY no;
C = FOREACH B {
D = FILTER A BY score > 80;
GENERATE FLATTEN(D.(name, score));
};
DUMP C;
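To check that projecting a field out of a bag yields another bag, you could run DESCRIBE on the projection. A minimal sketch reusing the foo.tsv schema above; the alias S is hypothetical:

```pig
A = LOAD 'foo.tsv' AS (name:chararray, no:int, score:int);
B = GROUP A BY no;

-- In B, A is a bag of (name, no, score) tuples, one bag per group.
-- Projecting A.score keeps the bag and drops the other columns,
-- so the result is a bag of one-field tuples, not a single int.
S = FOREACH B GENERATE group, A.score;
DESCRIBE S;
```

In the DESCRIBE output, the score field should appear wrapped in '{' and '}', the same notation that marks A as a bag in 'describe B'. That is why 'A.score > 80' is not a per-tuple test, while 'score' inside the nested FILTER refers to the field of each tuple being filtered.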
Hello,
I want to use a FOREACH statement to filter the tuples in a bag, but it
didn't work. My Pig code is as follows:
A = LOAD '/home/test/student.txt' AS (name:chararray, no:int, score: int);
B = GROUP A BY no;
C = FOREACH B {
D = FILTER A BY A.score > 80;
GENERATE D.name, D.score;}
DUM
Hello,
I know that Pig supports nested data. How deep can the nesting go? Is it
unlimited, or is there a limit?
Thanks!
Yong
I noticed this too. I could not find any documentation regarding what the
return code of a command like 'fs', 'rmf', or a shell command is.
Replace 'fs -rmr' with rmf for your case.
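Applied to the pattern in the quoted question, that might look like the following. A minimal sketch: the LOAD line and the alias 'results' are hypothetical stand-ins, since the original script is cut off, and I'm assuming rmf behaves like 'rm -f', i.e. it does not treat a missing path as an error:

```pig
-- Hypothetical input standing in for whatever the real script computes.
results = LOAD 'hypothetical_input' AS (key:chararray, value:int);

-- Forcibly remove any previous output instead of 'fs -rmr /some/output/path/'
-- (assumed: no error if the path does not exist yet).
rmf /some/output/path/

STORE results INTO '/some/output/path/';
```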
Raghu.
On Tue, Jul 19, 2011 at 12:27 AM, Chris Rosner wrote:
> Greetings,
>
> I'm trying to upgrade from 0.7.0 to 0.
Greetings,
I'm trying to upgrade from 0.7.0 to 0.8.1, but am having some trouble
with existing scripts.
The basic problem I'm trying to solve is storing an alias into HDFS,
overwriting any data that may already exist at that path.
I've been using this pattern in 0.7.0:
fs -rmr /some/output/path/
s