Re: Multi-GroupBy-Insert optimization

2012-06-05 Thread Jan Dolinár
Hi Shan, If you happen to have a lot of repeated data (in the most general grouping), you might get some speedup from a little pre-aggregation. The following code should produce the same results as the example in your first post: FROM ( SELECT a, b, c, count(*) AS cnt FROM X GROUP BY a, b, c )
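
The idea behind the pre-aggregation can be checked on a toy data set. The sketch below is only an illustration, not the query from the thread: it uses Python's built-in sqlite3 module rather than Hive, the sample rows are made up, and the helper names direct_counts and preagg_counts are hypothetical. It assumes the later group-bys only need counts, in which case summing cnt over the pre-aggregated rows should match count(*) over the raw table.

```python
import sqlite3

# In-memory table standing in for X; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (a TEXT, b TEXT, c TEXT)")
rows = [
    ("a1", "b1", "c1"), ("a1", "b1", "c1"), ("a1", "b2", "c1"),
    ("a2", "b1", "c2"), ("a2", "b1", "c2"), ("a2", "b1", "c2"),
]
conn.executemany("INSERT INTO X VALUES (?, ?, ?)", rows)

# The most general grouping, computed once (mirrors the subquery in the post).
PRE_AGG = "(SELECT a, b, c, count(*) AS cnt FROM X GROUP BY a, b, c)"


def direct_counts(col):
    """Count rows per value of `col` straight from the raw table."""
    query = f"SELECT {col}, count(*) FROM X GROUP BY {col}"
    return dict(conn.execute(query).fetchall())


def preagg_counts(col):
    """Same counts, obtained by summing cnt over the pre-aggregated rows."""
    query = f"SELECT {col}, sum(cnt) FROM {PRE_AGG} AS t GROUP BY {col}"
    return dict(conn.execute(query).fetchall())


for col in ("a", "b", "c"):
    print(col, direct_counts(col), preagg_counts(col))
```

With heavily repeated (a, b, c) combinations the pre-aggregated set is much smaller than X, which is where any speedup would come from; with mostly distinct combinations the extra step is unlikely to help.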

Re: Multi-GroupBy-Insert optimization

2012-06-04 Thread Jan Dolinár
On 6/4/12, shan s wrote: > Thanks for the explanation, Jan. If I understand correctly, the input will be read a single time and preprocessed in some form, and this intermediate data is used for the subsequent group-bys. Not sure if my scenario will help this single step, since group-

Re: Multi-GroupBy-Insert optimization

2012-06-04 Thread shan s
Thanks for the explanation, Jan. If I understand correctly, the input will be read a single time and preprocessed in some form, and this intermediate data is used for the subsequent group-bys. Not sure if my scenario will help this single step, since group-by varies across vast entities. If

Re: Multi-GroupBy-Insert optimization

2012-06-04 Thread Jan Dolinár
> On Fri, Jun 1, 2012 at 5:25 PM, shan s wrote: >> I am using Multi-GroupBy-Insert. I was expecting a single map-reduce job which would club the group-bys together. However, it is scheduling n jobs, where n = number of group-bys. Could you please explain this behaviour? > No, it wi

Re: Multi-GroupBy-Insert optimization

2012-06-04 Thread shan s
Anyone? Thanks. On Fri, Jun 1, 2012 at 5:25 PM, shan s wrote: > I am using Multi-GroupBy-Insert. I was expecting a single map-reduce job which would club the group-bys together. However, it is scheduling n jobs, where n = number of group-bys. Could you please explain this behaviour? > Fr