On 6/28/10 5:51 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
I have a feeling that propagating schemas when known, and using them for
(de)serialization instead of reflecting every field, would also be a big
win.
Thoughts on just using Avro for the internal PigStorage?
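A minimal sketch of the difference, under stated assumptions. The first writer stands in for the generic path: it inspects each value's type at runtime and writes a type tag in front of it. The second relies on a schema that both writer and reader already know, so it writes neither the arity nor per-field tags. FieldType and SchemaSedes are hypothetical names for illustration, not Pig's or Avro's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical field types for illustration; these are not Pig's DataType constants.
enum FieldType { INT, LONG, CHARARRAY }

final class SchemaSedes {
    // Generic path: inspect every value at runtime and write a type tag before it.
    static byte[] writeTagged(Object[] tuple) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(tuple.length); // arity, since the reader knows nothing in advance
            for (Object v : tuple) {
                if (v instanceof Integer) {
                    out.writeByte(FieldType.INT.ordinal());
                    out.writeInt((Integer) v);
                } else if (v instanceof Long) {
                    out.writeByte(FieldType.LONG.ordinal());
                    out.writeLong((Long) v);
                } else if (v instanceof String) {
                    out.writeByte(FieldType.CHARARRAY.ordinal());
                    out.writeUTF((String) v);
                } else {
                    throw new IllegalArgumentException("unsupported value: " + v);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Schema path: writer and reader share the schema, so no arity or type tags are written.
    static byte[] writeWithSchema(FieldType[] schema, Object[] tuple) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < schema.length; i++) {
                switch (schema[i]) {
                    case INT: out.writeInt((Integer) tuple[i]); break;
                    case LONG: out.writeLong((Long) tuple[i]); break;
                    case CHARARRAY: out.writeUTF((String) tuple[i]); break;
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back using the same schema; the field types drive decoding, not per-value tags.
    static Object[] readWithSchema(FieldType[] schema, byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Object[] tuple = new Object[schema.length];
            for (int i = 0; i < schema.length; i++) {
                switch (schema[i]) {
                    case INT: tuple[i] = in.readInt(); break;
                    case LONG: tuple[i] = in.readLong(); break;
                    case CHARARRAY: tuple[i] = in.readUTF(); break;
                }
            }
            return tuple;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the tuple (42, 7L, "pig"), the tagged form takes 24 bytes and the schema-driven form 17. Avro's binary encoding works along similar lines: records are written without per-field type information and decoded using the schema.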
When I
I have created a wiki page that puts together some ideas that can help
improve performance by avoiding or delaying serialization/deserialization.
http://wiki.apache.org/pig/AvoidingSedes
These are ideas that don't involve changes to the optimizer. Most of them
involve changes in the load/store
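One way to delay deserialization, sketched under assumptions: a tuple keeps the raw bytes of a tab-delimited record and only splits it into fields the first time a field is read. An operator that never looks at fields, or hands the record straight to a store, never pays for parsing. LazyTextTuple is a hypothetical class, not Pig's Tuple interface, and the wiki page may describe different mechanisms.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical: not a Pig class. Wraps the raw bytes of one tab-delimited record.
final class LazyTextTuple {
    private final byte[] raw;
    private List<String> fields; // null until a field is first requested
    int parseCount = 0;          // only here to show how often parsing happens

    LazyTextTuple(byte[] raw) {
        this.raw = raw;
    }

    // A pass-through operator can use the raw bytes without any parsing.
    byte[] rawBytes() {
        return raw;
    }

    // Parses the whole record on first access, then serves fields from the cache.
    String get(int i) {
        if (fields == null) {
            parse();
        }
        return fields.get(i);
    }

    private void parse() {
        parseCount++;
        fields = new ArrayList<>();
        int start = 0;
        // '\t' (0x09) never occurs inside a multi-byte UTF-8 sequence, so splitting on bytes is safe.
        for (int j = 0; j <= raw.length; j++) {
            if (j == raw.length || raw[j] == '\t') {
                fields.add(new String(raw, start, j - start, StandardCharsets.UTF_8));
                start = j + 1;
            }
        }
    }
}
```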
I agree with Alan and Dmitriy: Pig is tightly coupled with Hadoop and
heavily influenced by its roadmap. I think it makes sense for Pig to continue
as a sub-project of Hadoop.
-Thejas
On 3/31/10 4:04 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
Over time, Pig is increasing its coupling to Hadoop
will have access
to the InputFormat instance, correct? Can it not call
InputFormat.getNext the desired number of times (which will not parse
the tuple) and then call LoadFunc.getNext to get the next parsed tuple?
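A minimal sketch of the skip-then-parse idea, under assumptions: a plain Iterator of raw records stands in for the underlying reader (in Hadoop's mapreduce API, records are advanced with RecordReader.nextKeyValue()), and a Function stands in for the load function's parsing step. SkippingSampler and its parameters are hypothetical.

```java
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical sampler: advance over raw records without parsing them,
// and parse only every skipInterval-th record into a tuple.
final class SkippingSampler<R, T> {
    private final Iterator<R> rawRecords; // stands in for the underlying record reader
    private final Function<R, T> parser;  // stands in for the load function's parsing
    private final int skipInterval;
    int parsed = 0;                       // number of records actually parsed

    SkippingSampler(Iterator<R> rawRecords, Function<R, T> parser, int skipInterval) {
        if (skipInterval < 1) {
            throw new IllegalArgumentException("skipInterval must be >= 1");
        }
        this.rawRecords = rawRecords;
        this.parser = parser;
        this.skipInterval = skipInterval;
    }

    // Returns the next sampled tuple, or null at end of input.
    T next() {
        // Step over skipInterval - 1 records without parsing them.
        for (int i = 0; i < skipInterval - 1; i++) {
            if (!rawRecords.hasNext()) {
                return null;
            }
            rawRecords.next();
        }
        if (!rawRecords.hasNext()) {
            return null;
        }
        parsed++;
        return parser.apply(rawRecords.next());
    }
}
```

With ten records and an interval of 3, only records 2, 5 and 8 are parsed; the other seven are read but never turned into tuples.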
Alan.
On Nov 3, 2009, at 4:28 PM, Thejas Nair wrote:
In the new implementation of SampleLoader subclasses (used by order-by,
skew-join, etc.) as part of the loader redesign, we are not only reading all
the input records but also parsing them as Pig tuples.
This is because the SampleLoaders are wrappers around the actual input
loaders specified in the
I could not find any documentation (in the Pig Latin manual) on what the
definition of equality of bags is (or what it should be). Does the order of
tuples in the bag matter? The definition of a bag does not imply any
ordering.
This has implications for the definition of join/cogroup/group on bags.
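A minimal sketch of the order-insensitive reading, assuming tuples are modeled as java.util.List values and a bag as a List of tuples (BagEquality is hypothetical, not Pig's DataBag): two bags are equal when they hold the same tuples with the same multiplicities. Since group/cogroup/join on bags would also need hashing, the sketch includes an order-independent hash (the sum of element hashes) that is consistent with that equality.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not Pig's DataBag: compares bags as multisets.
final class BagEquality {
    // Order-insensitive equality: same tuples, same multiplicities.
    static <T> boolean multisetEquals(List<T> a, List<T> b) {
        if (a.size() != b.size()) {
            return false;
        }
        Map<T, Integer> counts = new HashMap<>();
        for (T t : a) {
            counts.merge(t, 1, Integer::sum);
        }
        for (T t : b) {
            Integer c = counts.get(t);
            if (c == null) {
                return false; // b has a tuple that a lacks, or has it more often
            }
            if (c == 1) {
                counts.remove(t);
            } else {
                counts.put(t, c - 1);
            }
        }
        return counts.isEmpty();
    }

    // Order-sensitive equality, for comparison.
    static <T> boolean orderedEquals(List<T> a, List<T> b) {
        return a.equals(b);
    }

    // Sum of element hashes does not depend on order, so equal multisets hash equally.
    static int multisetHash(List<?> bag) {
        int h = 0;
        for (Object t : bag) {
            h += Objects.hashCode(t);
        }
        return h;
    }
}
```

Under this definition {(1,a),(2,b)} and {(2,b),(1,a)} are equal, while {(1,a),(1,a)} differs from both.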
fix it, I am not filing a jira.
-Thejas
On 11/2/09 9:19 AM, Thejas Nair te...@yahoo-inc.com wrote:
I could not find any documentation (in the Pig Latin manual) on what the
definition of equality of bags is (or what it should be). Does the order of
tuples in the bag matter? But the definition
JFlex is covered by the GPL, but the code generated by it is not. Only the
code generated by JFlex goes into pig.jar.
We can't check Jflex.jar into svn; Ivy will be set up to download it from the
Maven repository.
-Thejas
On 8/25/09 11:57 AM, Dmitriy Ryaboy dvrya...@cloudera.com wrote:
Santosh,
I think we are creating unnecessary bureaucratic hurdles here by preventing
a contrib project from having a branch. I don't see why Zebra has to use the
Pig release branch, as the new Pig release does not include it.
The decisions are supposed to help keep things open, but this seems to be
forcing
This paper seems very relevant to the proposal:
Compiled Query Execution Engine using JVM
http://www2.computer.org/portal/web/csdl/doi/10.1109/ICDE.2006.40
From the abstract:
Our experimental results on the TPC-H data set show that, despite both
engines benefiting from JIT, the compiled engine
without it? IIRC the OperatorKey includes an operator number. When looking
at explain plans, this is useful when there is more than one operator of a
given type and you want to be able to distinguish between them.
Alan.
On Mar 6, 2009, at 3:14 PM, Thejas Nair wrote:
What is the purpose of the scope string in org.apache.pig.impl.plan.OperatorKey?
Is it meant to be used if we have a Pig daemon process?
Is it OK to stop printing the scope part in explain output? It does not seem
to add value, and it makes the output more verbose.
Thanks,
Thejas