pig connecting to hadoop - Failed to create DataStorage

2010-03-04 Thread Landon Cox
I just installed Hadoop and Pig yesterday on an Ubuntu Jaunty box. Versions: Hadoop 0.18.3-6cloudera0.3.0, Apache Pig version 0.6.0 (r910629). I have the Hadoop services running, can copy files to HDFS, and ran the test for computing PI. The problem I'm having is getting pig to recognize my

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
Actually, I think the evaluation is correct. Overriding the method to pass FALSE as the parameter enhances the experience. Secondly, in org.apache.pig.Main, grunt.exec(); is called, which in turn calls parser.parseStopOnError(); which calls parseStopOnError(false); Regards, Rohan. Dmitriy Ryaboy wrote

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Dmitriy Ryaboy
This stuff is a bit convoluted, isn't it? I think you may be right (I never use registerScript). Try an experiment? On Thu, Mar 4, 2010 at 11:20 AM, Rohan Rai wrote: > In addition > > Even > org.apache.pig.tools.pigscript.parser.PigScriptParser (.jj) > seems to tell that it's not running in batch

Re: MultiStorage-like UDF on Elastic MapReduce/S3

2010-03-04 Thread Jennie Cochran-Chinn
Amazon's extension allows one to write to and read from both S3 and HDFS, whereas the last time I checked, the non-Amazon version only allows one or the other, but not both. The MultiStorage in the regular piggybank is not written to support multiple file systems - which would be my guess

Re: User opinions needed

2010-03-04 Thread Gerrit van Vuuren
Hi, I forgot to respond, but what I was thinking is that there is a need for one function that splits a string and returns a tuple, and another that returns a bag. So if tokenize returns a bag, then yes, I agree with Bill that split should return a tuple. Am I making sense? :) - O

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
In addition: even org.apache.pig.tools.pigscript.parser.PigScriptParser (.jj) seems to say that it's not running in batch mode. Is this interpretation incorrect? Regards, Rohan. Rohan Rai wrote: That's what makes it confusing. If you look, the parameter getting passed is true, which is sameBatch o

MultiStorage-like UDF on Elastic MapReduce/S3

2010-03-04 Thread Jialong Wu
Hi, Does anyone have experience running a MultiStorage-like UDF on Elastic MapReduce? Basically we are trying to store output into multiple directories based on certain field values. We have had some success writing a UDF that extends MultiStorage in piggybank to write to HDFS, but we couldn't get the sam

Re: User opinions needed

2010-03-04 Thread Dmitriy Ryaboy
I kid, I kid... On Thu, Mar 4, 2010 at 10:34 AM, Alan Gates wrote: > > On Mar 4, 2010, at 10:19 AM, Dmitriy Ryaboy wrote: > > Thanks to Gerrit and Bill who responded. >> Unfortunately they said the exact opposite thing so we are still at an >> impasse :-). Anyone else care to venture an opini

Re: User opinions needed

2010-03-04 Thread Alan Gates
On Mar 4, 2010, at 10:19 AM, Dmitriy Ryaboy wrote: Thanks to Gerrit and Bill who responded. Unfortunately they said the exact opposite thing so we are still at an impasse :-). Anyone else care to venture an opinion? Cause if Alan and I have a committer fight, he'll win and y'all will have to

Re: User opinions needed

2010-03-04 Thread Dmitriy Ryaboy
Thanks to Gerrit and Bill who responded. Unfortunately they said the exact opposite thing so we are still at an impasse :-). Anyone else care to venture an opinion? Cause if Alan and I have a committer fight, he'll win and y'all will have to live with unordered split results :) -D On Mon, Mar 1,

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
That's what makes it confusing. If you look, the parameter getting passed is true, which is sameBatch, on which it should ideally not call setBatchOn: if (!mInteractive && !sameBatch) { setBatchOn(); } Dmitriy Ryaboy wrote: Looks like it's on automatically. Code below is from
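
The guard quoted in the message above can be tabulated with a small stand-alone sketch. It does not use Pig at all: the class name BatchGuardSketch and the method wouldCallSetBatchOn are hypothetical, and only the boolean condition on mInteractive and sameBatch is taken from the quoted snippet. It shows which flag combinations would reach setBatchOn(), assuming the guard behaves exactly as quoted.

```java
// Hypothetical stand-alone model of the quoted guard; this is not Pig source code.
public class BatchGuardSketch {

    // Mirrors the quoted condition: if (!mInteractive && !sameBatch) { setBatchOn(); }
    // Returns true when that guard would call setBatchOn().
    public static boolean wouldCallSetBatchOn(boolean interactive, boolean sameBatch) {
        return !interactive && !sameBatch;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Print the full truth table for the two flags.
        for (boolean interactive : values) {
            for (boolean sameBatch : values) {
                System.out.printf(
                        "interactive=%b sameBatch=%b -> setBatchOn called: %b%n",
                        interactive, sameBatch,
                        wouldCallSetBatchOn(interactive, sameBatch));
            }
        }
    }
}
```

When sameBatch is true, this guard never turns batch mode on by itself, which is the case described in the message. The sketch does not model whether batch mode was already enabled by an earlier call.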

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Dmitriy Ryaboy
Looks like it's on automatically. Code below is from trunk, but I don't think this changed recently. I got rid of exception handling for conciseness. In PigServer: public void registerScript(String fileName) throws IOException { GruntParser grunt = new GruntParser(new FileReader(

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
Thanks Dmitriy. Just one more question: registerScript allows registering a pig script in embedded mode. So the confusion was whether it internally tries to optimize it, or whether setBatchOn has to be called explicitly. Regards, Rohan. Dmitriy Ryaboy wrote: 1) Automatically, if you call it right. Look for

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Dmitriy Ryaboy
1) Automatically, if you call it right. Look for the setBatchOn and executeBatch methods (I may be slightly off on the method names, going off memory) 2) The optimizer moves stuff around and may be executing things in a slightly different order than what you tell it. This can mean pushing up proj

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
As an addendum: how can I play with the Logical Plan? Rohan Rai wrote: On using embedded Pig Server and registering a pig script for execution: 1) Does Multi Query Optimization happen automatically, or does it have to be explicitly enabled? 2) Logical Plan: what can one infer from it? 3) Does the Block Size (d

Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
On using embedded Pig Server and registering a pig script for execution: 1) Does Multi Query Optimization happen automatically, or does it have to be explicitly enabled? 2) Logical Plan: what can one infer from it? 3) Does the Block Size (defined in hadoop) have an effect on performance or the number of m