Re: Cubing in Pig

2011-07-14 Thread Gianmarco
If you want to add a new operator, the right place to add the logic is LogicalPlanBuilder. One question: are you sure this code is correct? I can't understand how it works.
cubed = foreach rel generate flatten(CubeDimensions(a, b));
cube = foreach (group rel by $0) generate

Re: Cubing in Pig

2011-07-14 Thread Dmitriy Ryaboy
I think showing the data at every step will help.
rel:
(green,tall)
(red,short)
(green,short)
cubed:
(green,tall) (green,) (,tall) (,)
(red,short) (red,) (,short) (,)
(green,short) (green,) (,short) (,)
cube:
I did mess up typing the code in the email -- it should look more like this:
cube =
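The expansion step shown in the data above can be sketched outside Pig. This is a minimal Python illustration, not the CubeDimensions UDF itself (which is not posted in this thread): the function name `cube_dimensions`, the use of `None` for the empty "all" slot, and the choice of a count as the aggregate are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def cube_dimensions(row, all_marker=None):
    """Expand one row into every combination of its values and the all marker.

    Hypothetical stand-in for the UDF discussed in the thread.
    A row with n dimensions yields 2**n tuples.
    """
    choices = [(value, all_marker) for value in row]
    return list(product(*choices))


# The sample relation from the message above.
rel = [("green", "tall"), ("red", "short"), ("green", "short")]

# Equivalent of: foreach rel generate flatten(CubeDimensions(a, b))
cubed = [combo for row in rel for combo in cube_dimensions(row)]

# Grouping on the expanded tuple and counting is an assumed aggregate;
# the original cube statement is cut off in the archive.
cube_counts = Counter(cubed)

for key, count in sorted(cube_counts.items(), key=str):
    print(key, count)
```

Each input row fans out to four tuples here, so the grouped relation sees twelve records for three input rows. That fan-out is the cost of the naive approach.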

Re: Cubing in Pig

2011-07-14 Thread Jonathan Coveney
Dmitriy, a quick point on your approach... I assume you meant to do the following, replacing rel with cubed? If you ran what you pasted, you never actually reference the cubed relation that you output, which may have influenced run time.
cubed = foreach rel generate flatten(CubeDimensions(a, b));
cube =

Re: Cubing in Pig

2011-07-14 Thread Dmitriy Ryaboy
Jon, I ran the right script, I just wrote out the wrong one in the email :-). I also compared results of both computations to ensure correctness. Arnab posted his slides: http://pdf.cx/44wrk My approach is the naive approach described in slides 11-17.
D
On Thu, Jul 14, 2011 at 11:54 AM,

Re: Cubing in Pig

2011-07-14 Thread Thejas Nair
+1 to what Gianmarco said about the place to do it. See sample_clause in LogicalPlanGenerator.g. I tried the expanded query (2 dimensions) with 0.8, and it results in only 2 MR jobs; all the computation is done in the 1st MR job. The 2nd MR job just concatenates the outputs

Re: Cubing in Pig

2011-07-14 Thread Dmitriy Ryaboy
In the dw world, using a single table and using null as an all marker is the standard thing to do. In my udf I actually allow an optional string to be passed to the constructor to denote all if null is a valid value... I'll post the udf shortly, it's a prerequisite to LOCube. I suspect the
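Why an alternate "all" marker matters when null is a legitimate value can be shown with a small Python sketch. This is not the UDF's code: the class below, its `expand` method, and the `"*ALL*"` string are hypothetical names chosen for illustration, with only the optional constructor argument taken from the description above.

```python
from itertools import product


class CubeDimensions:
    """Hypothetical stand-in for the UDF, with an optional all marker."""

    def __init__(self, all_marker=None):
        # By default null (None) denotes "all"; a caller may pass a string.
        self.all_marker = all_marker

    def expand(self, row):
        """Return every combination of the row's values and the all marker."""
        choices = [(value, self.all_marker) for value in row]
        return list(product(*choices))


# A row whose first dimension is genuinely null (value unknown).
row = (None, "tall")

default_rows = CubeDimensions().expand(row)
marked_rows = CubeDimensions("*ALL*").expand(row)

# With the default marker, "unknown" and "all" collapse into the same tuple.
print(sorted(set(default_rows), key=str))
# With an explicit marker, all four combinations stay distinct.
print(sorted(set(marked_rows), key=str))
```

With the default marker the four expanded tuples collapse to two distinct keys, so the rollup for "all colors" becomes indistinguishable from the group of rows with an unknown color.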

Re: Cubing in Pig

2011-07-14 Thread Thejas Nair
On 7/14/11 3:03 PM, Dmitriy Ryaboy wrote:
> In the dw world, using a single table and using null as an all marker is the standard thing to do.
But I imagine that in the dw world, the cube results would get stored in such a way that you can efficiently retrieve results of specific group-bys