Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
On using the embedded Pig Server and registering a Pig script for execution: 1) Does Multi Query Optimization happen automatically, or does it have to be told so explicitly? 2) Logical Plan: what can one infer from it? 3) Does the block size (defined in Hadoop) have an effect on performance or the number of

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
As an addendum: how can I play with the Logical Plan? Rohan Rai wrote: On using the embedded Pig Server and registering a Pig script for execution: 1) Does Multi Query Optimization happen automatically, or does it have to be told so explicitly? 2) Logical Plan: what can one infer from it? 3) Does the Block Size

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Dmitriy Ryaboy
1) Automatically, if you call it right. Look for the setBatchOn and executeBatch methods (I may be slightly off on the method names, going off memory). 2) The optimizer moves stuff around and may be executing things in a slightly different order than what you tell it. This can mean pushing up

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
That's what makes it confusing. If you look, the parameter getting passed is true, which is sameBatch, in which case it should ideally not call setBatchOn: if (!mInteractive && !sameBatch) { setBatchOn(); } Dmitriy Ryaboy wrote: Looks like it's on automatically. Code below is from

Re: User opinions needed

2010-03-04 Thread Dmitriy Ryaboy
Thanks to Gerrit and Bill who responded. Unfortunately they said the exact opposite thing so we are still at an impasse :-). Anyone else care to venture an opinion? Cause if Alan and I have a committer fight, he'll win and y'all will have to live with unordered split results :) -D On Mon, Mar 1,

Re: User opinions needed

2010-03-04 Thread Alan Gates
On Mar 4, 2010, at 10:19 AM, Dmitriy Ryaboy wrote: Thanks to Gerrit and Bill who responded. Unfortunately they said the exact opposite thing so we are still at an impasse :-). Anyone else care to venture an opinion? Cause if Alan and I have a committer fight, he'll win and y'all will have to

MultiStorage-like UDF on Elastic MapReduce/S3

2010-03-04 Thread Jialong Wu
Hi, Does anyone have experience running a MultiStorage-like UDF on Elastic MapReduce? Basically we are trying to store output into multiple directories based on certain field values. We have had some success writing a UDF that extends MultiStorage in piggybank to write to HDFS, but we couldn't get the

Re: User opinions needed

2010-03-04 Thread Gerrit van Vuuren
Hi, I forgot to respond, but what I was thinking is that there is a need for one function that splits a string and returns a tuple, and another that returns a bag, so if tokenize returns a bag then yes, I agree with Bill that split should return a tuple. Am I making sense? :) -

Re: MultiStorage-like UDF on Elastic MapReduce/S3

2010-03-04 Thread Jennie Cochran-Chinn
Amazon's extension allows one to write to/read from both S3 and HDFS, whereas the last time I checked the non-Amazon version only allows one or the other, but not both. The MultiStorage in the regular piggybank is not written to support multiple file systems - which would be my guess

Re: Embedded Pig + MQO + Logical Plan

2010-03-04 Thread Rohan Rai
Actually, I think the evaluation is correct. Overriding the method to pass FALSE as the parameter improves the experience. Secondly, in org.apache.pig.Main, grunt.exec(); is called, which in turn calls parser.parseStopOnError();, which calls parseStopOnError(false); Regards, Rohan Dmitriy Ryaboy
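
The guard discussed in this thread can be checked in isolation. The following is only a minimal sketch: it assumes the two conditions are joined with a logical AND (!mInteractive && !sameBatch), and the class and method names are hypothetical stand-ins, not Pig's own code. It prints, for each combination of the two flags, whether setBatchOn() would be reached.

```java
public class BatchGuardSketch {
    // Hypothetical stand-in for the guard quoted in the thread.
    // Assumption: the conditions are combined with && (logical AND).
    static boolean wouldEnableBatch(boolean interactive, boolean sameBatch) {
        return !interactive && !sameBatch;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Enumerate all four combinations of the two flags.
        for (boolean interactive : flags) {
            for (boolean sameBatch : flags) {
                System.out.println("interactive=" + interactive
                        + " sameBatch=" + sameBatch
                        + " -> setBatchOn reached: "
                        + wouldEnableBatch(interactive, sameBatch));
            }
        }
    }
}
```

Under that assumption, batch mode is switched on only for a non-interactive run where sameBatch is false, which matches the observation that passing false to parseStopOnError matters.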

pig connecting to hadoop - Failed to create DataStorage

2010-03-04 Thread Landon Cox
I just installed Hadoop and Pig yesterday on an Ubuntu Jaunty box. Hadoop 0.18.3-6cloudera0.3.0, Apache Pig version 0.6.0 (r910629). I have the Hadoop services running, can copy files to HDFS, and ran the test for computing pi. The problem I'm having is getting pig to recognize my