If you want to add a new operator, the right place to add the logic is
LogicalPlanBuilder.
Just a question: are you sure this code is correct? I can't understand how
it works.
cubed = foreach rel generate flatten(CubeDimensions(a, b));
cube = foreach (group rel by $0) generate
I think showing the data at every step will help.
rel:
(green,tall)
(red,short)
(green,short)
cubed:
(green,tall)
(green,)
(,tall)
(,)
(red,short)
(red,)
(,short)
(,)
(green,short)
(green,)
(,short)
(,)
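
To make the expansion step easier to follow, here is a minimal Python sketch of what flatten(CubeDimensions(a, b)) appears to do to each input tuple, judging only from the rel and cubed listings above. The function name cube_dimensions is hypothetical and this is not the UDF's code; None stands in for the empty "all" marker.

```python
from itertools import product

def cube_dimensions(row):
    """Return every copy of row with some subset of its fields set to None."""
    out = []
    # Each mask says which positions to blank out; there are 2**len(row) masks.
    for mask in product([False, True], repeat=len(row)):
        out.append(tuple(None if blank else value
                         for value, blank in zip(row, mask)))
    return out

rel = [("green", "tall"), ("red", "short"), ("green", "short")]

# flatten(...) corresponds to emitting the expanded tuples one per row.
cubed = [t for row in rel for t in cube_dimensions(row)]
for t in cubed:
    print(t)
```

Each input tuple with n dimensions turns into 2^n output tuples, so the 3 rows of rel become the 12 rows of cubed.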
cube: I did mess up typing the code in the email -- it should look
more like this:
cube =
Dmitry, a quick point on your approach...
I assume that you meant to do the following, replacing rel with cubed? If you
ran what you pasted, the second statement never references the cubed relation
that the first one produces, which may have influenced the run time.
cubed = foreach rel generate flatten(CubeDimensions(a, b));
cube =
Jon, I ran the right script, I just wrote out the wrong one in the email :-).
I also compared results of both computations to ensure correctness.
Arnab posted his slides: http://pdf.cx/44wrk
My approach is the naive approach described in slides 11-17.
D
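
The thread does not show the correctness comparison itself. As a rough sketch of how such a check could look, the Python below computes counts two ways on the sample data: by expanding every row into its null-masked copies and grouping once, and by running one separate group-by per subset of dimensions. COUNT as the aggregate and both function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

rel = [("green", "tall"), ("red", "short"), ("green", "short")]

def naive_cube_counts(rows):
    """Expand each row into all null-masked copies, then count per key."""
    counts = Counter()
    for row in rows:
        for mask in product([False, True], repeat=len(row)):
            key = tuple(None if blank else v for v, blank in zip(row, mask))
            counts[key] += 1
    return dict(counts)

def per_groupby_counts(rows):
    """Run one group-by per subset of dimensions and merge the results."""
    merged = {}
    for keep in product([True, False], repeat=len(rows[0])):
        # Project each row onto the kept dimensions and count the projections.
        grouped = Counter(tuple(v for v, k in zip(row, keep) if k)
                          for row in rows)
        for short_key, count in grouped.items():
            values = iter(short_key)
            # Pad the dropped dimensions with None so keys are comparable.
            full_key = tuple(next(values) if k else None for k in keep)
            merged[full_key] = count
    return merged

naive = naive_cube_counts(rel)
separate = per_groupby_counts(rel)
print(naive == separate)
```

On real data the same idea applies: both computations should produce identical (key, aggregate) pairs once the keys are padded to the same shape.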
On Thu, Jul 14, 2011 at 11:54 AM,
+1 to what Gianmarco said about the place to do it. See sample_clause
in LogicalPlanGenerator.g.
I tried the expanded query (2 dimensions) with 0.8. It results in only 2
MR jobs: the 1st MR job does all the computation, and the 2nd MR job just
concatenates the outputs.
In the dw world, using a single table and using null as an all marker is the
standard thing to do. In my udf I actually allow an optional string to be
passed to the constructor to denote all if null is a valid value... I'll post
the udf shortly, it's a prerequisite to LOCube.
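
A small Python sketch of why an explicit "all" marker helps when null is a legitimate value. The UDF has not been posted yet, so the function name, the all_marker parameter, and the "*ALL*" string are all hypothetical stand-ins for the optional constructor argument described above.

```python
from itertools import product

def cube_dimensions(row, all_marker=None):
    """Expand row into every copy with a subset of fields set to all_marker."""
    return [tuple(all_marker if blank else v for v, blank in zip(row, mask))
            for mask in product([False, True], repeat=len(row))]

# Here None is a real (missing) value in the first dimension of the data.
row = (None, "tall")

ambiguous = cube_dimensions(row)                     # None doubles as "all"
distinct = cube_dimensions(row, all_marker="*ALL*")  # explicit marker

print(len(set(ambiguous)))
print(len(set(distinct)))
```

With None as the marker, the rows for "first dimension is null" and "first dimension is all" collapse into the same key; with a separate marker the four combinations stay distinct.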
I suspect the
On 7/14/11 3:03 PM, Dmitriy Ryaboy wrote:
In the dw world, using a single table and using null as an all marker is the
standard thing to do
But I imagine that in the dw world, the cube results would get stored in
such a way that you can efficiently retrieve results of specific
group-bys