Mungeol, I did check the # of mappers and that did not change between the two queries but when I ran a count(*) query the total execution time reduced significantly for Query1 vs Query2. Also, the amount data the query reads does change when the where clause changes. I still can't explain why one is faster over the other.
Thanks, Appan On Thu, May 14, 2015 at 4:46 PM, Mungeol Heo <mungeol....@gmail.com> wrote: > Hi, Appan. > > you can just simply check the amount of data your query reads from the > table. or the number of the mapper for running that query. > then, you can know whether it filtering or scanning all table. > Of course, it is a lazy approach. but, you can give a try. > I think query 1 should work fine. because I am using a lot of that > kind of queries and it works fine for me. > > Thanks, > mungeol > > On Fri, May 15, 2015 at 8:31 AM, Appan Thirumaligai > <appanhiv...@gmail.com> wrote: > > I agree with you Viral. I see the same behavior as well. We are on Hive > 0.13 > > for the cluster where I'm testing this. > > > > On Thu, May 14, 2015 at 2:16 PM, Viral Bajaria <viral.baja...@gmail.com> > > wrote: > >> > >> Hi Appan, > >> > >> In my experience I have seen that Query 2 does not use partition pruning > >> because it's not a straight up filtering and involves using functions > (aka > >> UDFs). > >> > >> What version of Hive are you using ? > >> > >> Thanks, > >> Viral > >> > >> > >> > >> On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai > >> <appanhiv...@gmail.com> wrote: > >>> > >>> Hi, > >>> > >>> I have a question on Hive Optimizer. I have a table with partition > >>> columns eg.,Sales partitioned by year, month, day. Assume that I have > two > >>> years worth of data on this table. I'm running two queries on this > table. > >>> > >>> Query 1: Select * from Sales where year=2015 and month = 5 and day > >>> between 1 and 7 > >>> > >>> Query 2: Select * from Sales where concat_ws('-',cast(year as > >>> string),lpad(cast(month as string),2,'0'),lpad(cast(day as > string),2,'0')) > >>> between '2015-01-01' and '2015-01-07' > >>> > >>> When I ran Explain command on the above two queries I get a Filter > >>> operation for the 2nd Query and there is no Filter Operation for the > first > >>> query. > >>> > >>> My question is: Do both queries use the partitions or is it used only > in > >>> Query 1 and for Query 2 it will be a scan of all the data? > >>> > >>> Thanks for your help. > >>> > >>> Thanks, > >>> Appan > >> > >> > > >