Re: Multi-group-by select always scans entire table

2012-06-07 Thread Jan Dolinár
Thank you very much, Mark, for your investigation and explanations. I'm well aware that hadoop 0.7.1 is quite old code and that a newer version might perform better - that is the main reason I discussed it here instead of reporting it as a bug. For now it doesn't bother me, as I have th

Re: Multi-group-by select always scans entire table

2012-06-07 Thread Mark Grover
Hi Jan, I did some testing for this on Apache Hive 0.9 and I have boiled it down to the following: Predicate pushdown seems to work for single-insert queries using LATERAL VIEW. It also seems to work for multi-insert queries NOT using LATERAL VIEW. However, it doesn't work for multi-insert queries usi

Re: Multi-group-by select always scans entire table

2012-06-07 Thread Jan Dolinár
On 6/7/12, Mark Grover wrote: > Can you please check if predicate push down enabled changes the explain > plan on a simple inner join query like: > > select a.* from a inner join b on(a.key=b.key) where a.some_col=blah; No problem, I ran the following as you suggested (INNER JOIN didn't work for me,

Re: Multi-group-by select always scans entire table

2012-06-07 Thread Mark Grover
Hi Jan, Thanks for the analysis. Yes, it's true that optimize ppd will push predicates to be evaluated earlier. The only catch there is that predicates cannot be pushed across constructs that change the data in the query. An example of this is having a predicate (say of the form 'where Col is not N

Re: Multi-group-by select always scans entire table

2012-06-06 Thread Jan Dolinár
Hi Mark, Thanks for all your help. I tried to run a series of tests with various settings of hive.optimize.ppd and various queries (see it here http://pastebin.com/E89p9Ubx) and now I'm even more confused than before. In all cases, regardless of whether the WHERE clause asks about partitioned or regular

Re: Multi-group-by select always scans entire table

2012-06-05 Thread Mark Grover
Hi Jan, The quick answer is I don't know, but maybe someone else on the mailing list does :-) Looking at the wiki page for Lateral View (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView), there was a problem related to predicate pushdown on UDTFs (https://issues.apache.

Re: Multi-group-by select always scans entire table

2012-06-04 Thread Jan Dolinár
On Mon, Jun 4, 2012 at 7:20 PM, Mark Grover wrote: > Hi Jan, > Glad you found something workable. > > What version of Hive are you using? Could you also please check what the > value of the property hive.optimize.ppd is for you? > > Thanks, > Mark > > Hi Mark, Thanks for the reply. I'm using hive 0.

Re: Multi-group-by select always scans entire table

2012-06-04 Thread Mark Grover
1:57:25 AM Subject: Re: Multi-group-by select always scans entire table On Fri, May 25, 2012 at 12:03 PM, Jan Dolinár < dolik@gmail.com > wrote: -- see what happens when you try to perform multi-group-by query on one of the partitions EXPLAIN EXTENDED FROM partition_test LATERAL V

Re: Multi-group-by select always scans entire table

2012-05-28 Thread Jan Dolinár
On Fri, May 25, 2012 at 12:03 PM, Jan Dolinár wrote: > > -- see what happens when you try to perform multi-group-by query on one of > the partitions > EXPLAIN EXTENDED > FROM partition_test > LATERAL VIEW explode(col1) tmp AS exp_col1 > INSERT OVERWRITE DIRECTORY '/test/1' > SELECT exp_col1 >

Multi-group-by select always scans entire table

2012-05-25 Thread Jan Dolinár
Hello, I've encountered a weird issue with hive and I'm not sure if I'm doing something wrong or if it is a bug. I'm trying to do a multi-group-by select statement on a partitioned table. I want only data from one partition, therefore all the WHERE statements are exactly the same and contain only