Pig needs to know where your HDFS is, doesn't it? :)
http://pig.apache.org/docs/r0.7.0/setup.html#Embedded+Programs has details
on what needs to be set for embedded programs to use Pig.
Specifically, the $HADOOPDIR part.
You could also put the conf files into the classpath as Jeff pointed :)
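A minimal sketch of that classpath setup, assuming a Hadoop conf directory at /etc/hadoop/conf and a program class name of my own invention (adjust both for your install):

```shell
# Hypothetical paths -- point these at your own install.
export HADOOP_CONF_DIR=/etc/hadoop/conf
# Make core-site.xml / hdfs-site.xml visible to embedded Pig:
export PIG_CLASSPATH=$HADOOP_CONF_DIR:$PIG_CLASSPATH
# Or pass the conf dir directly when launching the embedded program:
java -cp pig.jar:$HADOOP_CONF_DIR MyEmbeddedPigProgram
```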
Thanks for the pointers Yan!
Renato M.
2010/10/27 Yan Zhou
> If you cannot change your input data generation process to generate
> input directly in Zebra, I can't see any alternative to maintaining two sets of data.
>
> Regarding generating Zebra data, Pig is simpler than raw map/reduce and the
>
You need to put the Hadoop conf files on the classpath; otherwise you will
always connect to the local file system.
On Wed, Oct 27, 2010 at 4:25 PM, Zach Bailey wrote:
>
> Hi all, facing a weird problem and wondering if anyone
> has run into this before. I've been playing with PigServer
It should already work on trunk and 0.8 branch.
Olga
-----Original Message-----
From: rakesh kothari [mailto:rkothari_...@hotmail.com]
Sent: Wednesday, October 27, 2010 5:03 PM
To: user@pig.apache.org; pig-u...@hadoop.apache.org
Subject: RE: Name of the Job in PIG
>>In Pig 0.8 to be released soon, Pig will print mapping between
>>aliases/operators and JobId.
Is this feature already implemented? I ran the latest Pig code from SVN, but it
doesn't print out the mappings. Do I need to do anything special?
When is 0.8 slated for release?
Thanks,
-Rakesh
Hi all, facing a weird problem and wondering if anyone
has run into this before. I've been playing with PigServer to programmatically
run some simple pig scripts and it does not seem to be connecting to HDFS when
I pass in ExecType.MAPREDUCE. I am running in pseudo-distributed mode.
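For reference, a minimal embedded sketch of the pattern being discussed. This assumes the Pig jars and, per the replies above, the Hadoop conf directory are both on the classpath; the input and output paths are hypothetical:

```java
// Sketch only: requires pig.jar and the Hadoop conf dir on the classpath.
// Without the conf dir, PigServer silently falls back to the local file system.
import java.io.IOException;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class EmbeddedPigExample {
    public static void main(String[] args) throws IOException {
        // MAPREDUCE mode reads fs.default.name from the Hadoop conf on the classpath
        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        pig.registerQuery("A = load '/user/me/input' as (line:chararray);");
        pig.store("A", "/user/me/output");  // triggers the job on the cluster
    }
}
```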
The easiest way to do this might be to have your loader return a
single tuple that contains a bag, with all of the tuples you want to
return in that bag. Then your next statement can be a foreach with a
flatten to turn each of those into its own record.
A = load 'foo' as (b:bag{});
B = foreach A generate flatten(b);
Hi Pig Users,
I am currently writing a UDF loader. In one of my use cases, one line in the
input stream results in multiple tuples. Has anyone encountered or solved this
issue on their end?
The current structure of the code's getNext method only returns a Tuple, but I
want it to return a List. Let me know.
If you give a name via the set command, the name will be associated with all jobs
within that script.
In Pig 0.8 to be released soon, Pig will print mapping between
aliases/operators and JobId.
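The set approach mentioned above looks like this in a script (the job name string and paths here are arbitrary examples):

```pig
-- Applies to every M/R job launched by this script.
set job.name 'daily-clickstream-load';
A = load '/data/clicks' as (url:chararray);
store A into '/data/out';
```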
Olga
-----Original Message-----
From: rakesh kothari [mailto:rkothari_...@hotmail.com]
Sent: Wednesday
Hi,
What's the best way to diagnose which M/R step Pig is executing? I was hoping
the name of the Pig job could have some relationship with the operator it is
executing.
It gets difficult to diagnose which step is running without it.
Thanks,
-Rakesh
Yes. I did that as well.
Thanks,
-Rakesh
From: te...@yahoo-inc.com
To: user@pig.apache.org; dvrya...@gmail.com; rkothari_...@hotmail.com
CC: pig-u...@hadoop.apache.org
Date: Wed, 27 Oct 2010 10:43:27 -0700
Subject: Re: Joins with OR condition
I don't understand the solution proposed by Dmitriy using 3 joins. But it can
be done using two joins and a union, as follows:
J1 = join A by prop1, B by prop1;
J2 = join A by prop2, B by prop2;
-- this filter prevents joined rows where both prop1 and prop2 match from being
-- counted twice
J2_fil = filter J2 by A::prop1 != B::prop1;
Thanks, I guess I would trip over that later on - but for this immediate
problem it doesn't help (of course, because Pig fails at the start, when I'm
not working with HBase yet).
I have tracked the error message to HBaseStorage.init() and added some
debugging info:
-
public void init()
The same way you have /etc/hadoop/conf on the classpath, you want to
put the hbase conf directory on the classpath.
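A sketch of the resulting setup, assuming HBase's conf lives in /etc/hbase/conf and a hypothetical script name (adjust both for your layout):

```shell
# Hypothetical paths; both conf dirs must be visible to Pig.
export PIG_CLASSPATH=/etc/hadoop/conf:/etc/hbase/conf:$PIG_CLASSPATH
pig -x mapreduce script_using_hbase.pig
```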
-D
On Tue, Oct 26, 2010 at 11:50 PM, Anze wrote:
>
>> ... You have all the conf files in PIG_CLASSPATH right?
>
> I think I do:
> ***
> PIG_HOME: /opt/pig/bin/..
> PIG_CONF_DIR: /op