Re: PigServer not connecting to HDFS?

2010-10-27 Thread Harsh J
Pig needs to know where your HDFS is, doesn't it? :) http://pig.apache.org/docs/r0.7.0/setup.html#Embedded+Programs details what needs to be set for embedded programs to use Pig, specifically the $HADOOPDIR part. You could also put the conf files on the classpath, as Jeff pointed out :) On Thu,

Re: Using data with Zebra

2010-10-27 Thread Renato Marroquín Mogrovejo
Thanks for the pointers Yan! Renato M. 2010/10/27 Yan Zhou > If you can not change your input data generation process to generate > input directly in Zebra, I can’t see any alternative than two sets of data. > > > > Regarding generating Zebra data, Pig is simpler than raw map/reduce and the >

Re: PigServer not connecting to HDFS?

2010-10-27 Thread Jeff Zhang
You need to put the Hadoop conf files on the classpath; otherwise you will always connect to the local file system. On Wed, Oct 27, 2010 at 4:25 PM, Zach Bailey wrote: > > Hi all, Facing a weird problem and wondering if anyone > has run into this before. I've been playing with PigServ

RE: Name of the Job in PIG

2010-10-27 Thread Olga Natkovich
It should already work on trunk and 0.8 branch. Olga -Original Message- From: rakesh kothari [mailto:rkothari_...@hotmail.com] Sent: Wednesday, October 27, 2010 5:03 PM To: user@pig.apache.org; pig-u...@hadoop.apache.org Subject: RE: Name of the Job in PIG >>In Pig 0.8 to be released

RE: Name of the Job in PIG

2010-10-27 Thread rakesh kothari
>>In Pig 0.8 to be released soon, Pig will print mapping between >>aliases/operators and JobId. Is this feature already implemented? I ran the latest PIG code from SVN but it doesn't print out the mappings. Do I need to do anything special? When is 0.8 slated for release? Thanks, -Rakesh

PigServer not connecting to HDFS?

2010-10-27 Thread Zach Bailey
Hi all, Facing a weird problem and wondering if anyone has run into this before. I've been playing with PigServer to programmatically run some simple pig scripts and it does not seem to be connecting to HDFS when I pass in ExecType.MAPREDUCE. I am running in pseudo-distrib

Re: UDF Loader - one line in input result in multiple tuples

2010-10-27 Thread Alan Gates
The easiest way to do this might be to have your loader return a single tuple that contains a bag, with all of the tuples you want to return in that bag. Then your next statement can be a foreach with a flatten to turn each of those into its own record.
A = load 'foo' as (b:bag{});
B = forea
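To illustrate the bag-then-flatten idea outside of Pig, here is a minimal Python sketch. It is not Pig code and not the loader API: `load_records` and `flatten_bags` are hypothetical names, and splitting each line on commas is only an assumed stand-in for whatever logic turns one input line into several tuples.

```python
def load_records(lines):
    """Mimic a loader that emits ONE tuple per input line.

    The single field of that tuple is a "bag" (here a list) holding
    every inner tuple derived from the line. Splitting on commas is
    an assumption made purely for illustration.
    """
    for line in lines:
        bag = [(field,) for field in line.split(",")]
        yield (bag,)


def flatten_bags(records):
    """Mimic a foreach with flatten: each tuple in the bag becomes its own record."""
    for (bag,) in records:
        for inner in bag:
            yield inner


lines = ["a,b", "c"]
rows = list(flatten_bags(load_records(lines)))
```

The loader still returns exactly one tuple per call, which is the constraint John describes; the fan-out into several records happens in the following step.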

UDF Loader - one line in input result in multiple tuples

2010-10-27 Thread John Hui
Hi Pig Users, I am currently writing a UDF loader. In one of my use cases, one line in the input stream results in multiple tuples. Has anyone encountered or solved this issue on their end? With the current structure of the code, the getNext method only returns a tuple, but I want it to return a List. Let me kn

RE: Name of the Job in PIG

2010-10-27 Thread Olga Natkovich
If you give a name via the set command, the name will be associated with all jobs within that script. In Pig 0.8, to be released soon, Pig will print the mapping between aliases/operators and JobId. Olga -Original Message- From: rakesh kothari [mailto:rkothari_...@hotmail.com] Sent: Wednesday

Name of the Job in PIG

2010-10-27 Thread rakesh kothari
Hi, What's the best way to diagnose which M/R step PIG is executing? I was hoping the name of the PIG job could have some relationship with the operator it is executing. It gets difficult to diagnose what step is running without it. Thanks, -Rakesh

RE: Joins with OR condition

2010-10-27 Thread rakesh kothari
Yes. I did that as well. Thanks, -Rakesh From: te...@yahoo-inc.com To: user@pig.apache.org; dvrya...@gmail.com; rkothari_...@hotmail.com CC: pig-u...@hadoop.apache.org Date: Wed, 27 Oct 2010 10:43:27 -0700 Subject: Re: Joins with OR condition I don’t understand the solution p

Re: Joins with OR condition

2010-10-27 Thread Thejas M Nair
I don't understand the solution proposed by Dmitriy using 3 joins. But it can be done using two joins and a union, as follows -
J1 = join A by prop1, B by prop1;
J2 = join A by prop2, B by prop2;
-- this filter prevents joined rows where both prop1, prop2 match from being counted twice
J2_fil
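The two-joins-plus-union approach can be checked on toy data with a small Python sketch. This is not Pig code, and the message above is truncated before the filter itself, so the filter condition here (keep J2 rows whose prop1 values differ) is an assumption inferred from the comment. The `join` and `join_with_or` helpers and the sample rows are hypothetical, and null handling is ignored.

```python
def join(left, right, key):
    """Inner equi-join of two lists of dicts on a single key."""
    return [(a, b) for a in left for b in right if a[key] == b[key]]


def join_with_or(left, right, key1, key2):
    """Emulate `left JOIN right ON key1 match OR key2 match`."""
    j1 = join(left, right, key1)
    j2 = join(left, right, key2)
    # Assumed filter: drop J2 pairs that J1 already produced (key1 also
    # matches), so a pair matching on both keys is counted only once.
    j2_filtered = [(a, b) for a, b in j2 if a[key1] != b[key1]]
    return j1 + j2_filtered  # the union step


# Hypothetical sample relations A and B.
A = [
    {"id": "a1", "prop1": 1, "prop2": "x"},
    {"id": "a2", "prop1": 2, "prop2": "y"},
]
B = [
    {"id": "b1", "prop1": 1, "prop2": "x"},  # matches a1 on both keys
    {"id": "b2", "prop1": 9, "prop2": "y"},  # matches a2 on prop2 only
    {"id": "b3", "prop1": 2, "prop2": "z"},  # matches a2 on prop1 only
]

result = join_with_or(A, B, "prop1", "prop2")
pairs = sorted((a["id"], b["id"]) for a, b in result)
```

Without the filter, the pair (a1, b1) would appear twice in the union, once from each join.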

Re: loading from HBase - Pig 0.7

2010-10-27 Thread Anze
Thanks, I guess I would trip over that later on - but for this immediate problem it doesn't help (of course, because Pig fails at the start, when I'm not working with HBase yet). I have tracked the error message to HBaseStorage.init() and added some debugging info: - public void init()

Re: loading from HBase - Pig 0.7

2010-10-27 Thread Dmitriy Ryaboy
The same way you have /etc/hadoop/conf on the classpath, you want to put the hbase conf directory on the classpath. -D On Tue, Oct 26, 2010 at 11:50 PM, Anze wrote: > >> ... You have all the conf files in PIG_CLASSPATH right? > > I think I do: > *** > PIG_HOME: /opt/pig/bin/.. > PIG_CONF_DIR: /op