On using an embedded PigServer and registering a Pig script for execution:
1) Does multi-query optimization happen automatically, or does it have to
be explicitly enabled?
2) The logical plan: what can one infer from it?
3) Does the block size (defined in Hadoop) have an effect on performance
or on the number of
In addendum:
How can I play with the logical plan?
Rohan Rai wrote:
> On using an embedded PigServer and registering a Pig script for execution:
> 1) Does multi-query optimization happen automatically, or does it have to
> be explicitly enabled?
> 2) The logical plan: what can one infer from it?
> 3) Does the block size
1) Automatically, if you call it right. Look for the setBatchOn and
executeBatch methods (I may be slightly off on the method names; I'm going
from memory).
2) The optimizer moves things around and may execute operations in a
slightly different order than the one you specified. This can mean pushing up
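For what it's worth, a minimal sketch of driving batch mode from an embedded PigServer (method names as I recall them from the PigServer API; the input/output paths and the queries themselves are made up for illustration):

```java
import java.util.List;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.executionengine.ExecJob;

public class BatchExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.MAPREDUCE);

        // Queue statements in one batch so the multi-query optimizer
        // can combine the two STOREs into fewer MapReduce jobs.
        pig.setBatchOn();
        pig.registerQuery("a = LOAD 'input' AS (f1:chararray, f2:int);");
        pig.registerQuery("b = FILTER a BY f2 > 0;");
        pig.registerQuery("STORE b INTO 'output_b';");
        pig.registerQuery("STORE a INTO 'output_a';");

        // Nothing runs until the batch is executed.
        List<ExecJob> jobs = pig.executeBatch();
        for (ExecJob job : jobs) {
            System.out.println("completed: " + job.hasCompleted());
        }
    }
}
```

Without setBatchOn(), each STORE is planned and run as soon as it is registered, so the shared LOAD would not be reused across the two jobs.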
That's what makes it confusing.
If you look, the parameter being passed in is true (that is, sameBatch),
in which case it should ideally not call setBatchOn():

if (!mInteractive && !sameBatch) {
    setBatchOn();
}
Dmitriy Ryaboy wrote:
Looks like it's on automatically.
Code below is from
Thanks to Gerrit and Bill who responded.
Unfortunately they said the exact opposite thing so we are still at an
impasse :-). Anyone else care to venture an opinion?
Cause if Alan and I have a committer fight, he'll win and y'all will have to
live with unordered split results :)
-D
On Mon, Mar 1,
On Mar 4, 2010, at 10:19 AM, Dmitriy Ryaboy wrote:
> Thanks to Gerrit and Bill who responded.
> Unfortunately they said the exact opposite thing so we are still at an
> impasse :-). Anyone else care to venture an opinion?
> Cause if Alan and I have a committer fight, he'll win and y'all will
> have to
Hi,
Does anyone have experience running a MultiStorage-like UDF on Elastic
MapReduce? Basically we are trying to store output into multiple
directories based on certain field values. We have had some success
writing a UDF that extends MultiStorage in piggybank to write to HDFS,
but we couldn't get the
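In case it helps pin the question down, here is a sketch of how such a store is wired up from an embedded PigServer (the jar path, the schema, and the field index are made up; writing to an s3 output path on EMR is exactly the part in question):

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class MultiStorageExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.MAPREDUCE);

        // Path to piggybank.jar is hypothetical.
        pig.registerJar("/path/to/piggybank.jar");

        pig.registerQuery(
            "logs = LOAD 'input' AS (country:chararray, url:chararray);");
        // MultiStorage(parentDir, splitFieldIndex): writes one
        // subdirectory of 'out' per distinct value of field 0 (country).
        pig.registerQuery(
            "STORE logs INTO 'out' USING "
            + "org.apache.pig.piggybank.storage.MultiStorage('out', '0');");
    }
}
```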
Hi,
I forgot to respond, but what I was thinking is that there may be a need
for both a function that splits a string and returns a tuple, and another
that returns a bag. So if TOKENIZE returns a bag, then yes, I agree with
Bill that split should return a tuple.
Am I making sense? :)
-
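To make the distinction concrete, a sketch from an embedded PigServer (local mode; the input file and data are made up). TOKENIZE is the existing builtin that returns a bag; the split-to-tuple function under discussion is shown only under a hypothetical name:

```java
import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class SplitVsTokenize {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);

        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");

        // TOKENIZE returns a bag of single-field tuples:
        //   'hello world' -> {(hello),(world)}
        pig.registerQuery("bags = FOREACH lines GENERATE TOKENIZE(line);");

        // A split that returns a tuple would instead give positional
        // fields: 'hello world' -> (hello, world). SPLIT_TO_TUPLE is a
        // hypothetical name for the function being discussed.
        // pig.registerQuery(
        //     "tups = FOREACH lines GENERATE SPLIT_TO_TUPLE(line);");

        Iterator<Tuple> it = pig.openIterator("bags");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```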
Amazon's extension allows one to write to/read from both S3 and HDFS,
whereas the last time I checked, the non-Amazon version only allows one
or the other, but not both. The MultiStorage in the regular piggybank
is not written to support multiple file systems, which would be my guess.
Actually, I think the evaluation is correct.
Overriding the method to pass false as the parameter enhances the experience.
Secondly, in org.apache.pig.Main, grunt.exec() is called, which in turn
calls parser.parseStopOnError(), which calls parseStopOnError(false).
Regards,
Rohan
Dmitriy Ryaboy
I just installed Hadoop and Pig yesterday on an Ubuntu Jaunty box:
Hadoop 0.18.3-6cloudera0.3.0
Apache Pig version 0.6.0 (r910629)
I have the Hadoop services running, can copy files to HDFS, and ran the
test for computing pi.
The problem I'm having is getting Pig to recognize my