0.14.0

--
Abhishek

On Thu, Jan 15, 2015 at 10:43 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:
> Which Hive version are you using?
>
> On Thu, Jan 15, 2015 at 12:44 AM, Abhishek kumar <abhishekiit...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for the reply.
>>
>> I tried that, but no luck. The MapReduce job seems to be stuck (it takes
>> a lot of time for just 65 lakh, i.e. 6.5 million, HBase rows). I am
>> attaching the log file (or see http://pastebin.com/BUYDUiEu).
>>
>> My only question is why the filter push-down for the row key (*startKey*
>> and *stopKey* for the *Scanner*) is not reaching HBase. If the push-down
>> happened, HBase would resolve this Scanner very quickly, and whether or
>> not an MR job runs, the query would return very fast.
>>
>> --
>> Abhishek
>>
>> On Thu, Jan 15, 2015 at 1:59 AM, Ashutosh Chauhan <hashut...@apache.org> wrote:
>>
>>> Can you run your query with the following config:
>>>
>>> hive> set hive.fetch.task.conversion=none;
>>>
>>> and run your two queries with it? Let's see if this makes a difference.
>>> My expectation is that this will result in an MR job getting launched,
>>> so the runtimes might be different.
>>>
>>> On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar <abhishekiit...@gmail.com> wrote:
>>>
>>>> First I tried running the query:
>>>>
>>>> select * from table1 where id = 'value';
>>>>
>>>> It was very fast, as expected, since HBase returned the results very
>>>> quickly. In this case, I observed no map/reduce task getting spawned.
>>>>
>>>> Now, for the query select * from table1 where id > 'zzz', I expected
>>>> the filter push-down to happen (at least, that is what the 0.14 code
>>>> suggests). Since no rows match, HBase should again reply very fast, and
>>>> Hive should return the query's result very fast. But this is not
>>>> happening: from the datanode logs, it looks like a lot of reads are
>>>> happening (close to a full table scan of 10 GB of data). I expected the
>>>> response time to be very close to that of the first query.
>>>>
>>>> I will check the number of tasks getting launched.
>>>>
>>>> My questions are:
>>>> * Why is no filter push-down (id > 'zzz') happening for this very
>>>>   simple query?
>>>> * Since this query can be resolved entirely from HBase, will Hive
>>>>   launch map tasks? (Last time, I believe I observed no map tasks
>>>>   being launched.)
>>>>
>>>> --
>>>> Abhishek
>>>>
>>>> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan <hashut...@apache.org> wrote:
>>>>
>>>>> Hi Abhishek,
>>>>>
>>>>> How are you determining that it results in a full table scan? One way
>>>>> to ascertain that the filter got pushed down is to see how many tasks
>>>>> were launched for your query, with and without the filter. One would
>>>>> expect a lower number of splits (and thus tasks) for the query with
>>>>> the filter.
>>>>>
>>>>> Thanks,
>>>>> Ashutosh
>>>>>
>>>>> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <abhishekiit...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am using Hive 0.14, which runs over HBase (~10 GB of data). I am
>>>>>> facing slowness when querying over HBase. My query looks like the
>>>>>> following:
>>>>>>
>>>>>> select * from table1 where id > 'zzzz'; (id is the row key)
>>>>>>
>>>>>> As per the Hive code, id > 'zzz' is getting pushed to the HBase
>>>>>> scanner as 'startKey'. Given that no row keys (id) satisfy this
>>>>>> criterion, this query should be extremely fast. But Hive is taking a
>>>>>> lot of time; it looks like a full HBase table scan.
>>>>>> Can someone let me know where I am wrong in understanding the whole
>>>>>> thing?
>>>>>>
>>>>>> --
>>>>>> Abhishek
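The start-key push-down Abhishek expected can be illustrated with a toy model. This is a minimal Python sketch, not Hive's or HBase's actual code. The table is a hypothetical sorted list of byte-string row keys (HBase keeps rows sorted by key), and the names ROW_KEYS, full_scan_with_filter and range_scan are made up for illustration. The sketch contrasts two approaches: reading every row and filtering afterwards, versus turning `id > 'zzz'` into a scan start row and seeking straight to it. Because a start row is inclusive, the sketch assumes the exclusive bound is encoded by appending a zero byte, which gives the smallest key strictly greater than 'zzz'.

```python
import bisect

# Hypothetical stand-in for an HBase table: 10,000 row keys kept in sorted
# byte order, the way HBase orders rows. All of them sort before b"zzz".
ROW_KEYS = sorted(f"user{i:05d}".encode() for i in range(10_000))


def full_scan_with_filter(keys, lower):
    """Read every row and keep those with key > lower.

    Returns (matching_keys, rows_read)."""
    rows_read = 0
    matches = []
    for key in keys:
        rows_read += 1  # every row is read, whether it matches or not
        if key > lower:
            matches.append(key)
    return matches, rows_read


def range_scan(keys, lower):
    """Push `key > lower` down as an inclusive start row and seek to it.

    Assumption: the exclusive bound is encoded as `lower` followed by a
    zero byte, the smallest byte string strictly greater than `lower`.
    Returns (matching_keys, rows_read)."""
    start_row = lower + b"\x00"
    first = bisect.bisect_left(keys, start_row)  # the "seek" to the start row
    matches = keys[first:]  # no stop row, so read to the end of the table
    return matches, len(matches)


if __name__ == "__main__":
    print(full_scan_with_filter(ROW_KEYS, b"zzz")[1])  # prints 10000
    print(range_scan(ROW_KEYS, b"zzz")[1])  # prints 0
```

With the range scan, the number of rows read depends on how many keys fall inside the range, not on the size of the table. That is why a pushed-down `id > 'zzz'` over keys that all sort before 'zzz' should read nothing, while a post-scan filter still touches every row.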
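Ashutosh's suggestion to count tasks with and without the filter works because HBase-backed input formats typically create one split per region that overlaps the scan's key range. The sketch below is a simplified, hypothetical model of that overlap test. The region boundaries and helper names are assumptions, and an empty end key stands for the open-ended last region. It is loosely modeled on HBase's TableInputFormat split logic, not copied from it.

```python
# Hypothetical region layout: (start_key, end_key). As in HBase, b"" marks
# the open start of the first region and the open end of the last one.
REGIONS = [
    (b"", b"user02500"),
    (b"user02500", b"user05000"),
    (b"user05000", b"user07500"),
    (b"user07500", b""),
]


def overlapping_regions(regions, start_row=b"", stop_row=b""):
    """Return the regions whose keys intersect [start_row, stop_row).

    An empty start_row or stop_row means the scan is unbounded on that side.
    In this model each returned region becomes one input split (one map task)."""
    selected = []
    for region_start, region_end in regions:
        # The scan must begin before the region ends...
        begins_before_end = (
            start_row == b"" or region_end == b"" or start_row < region_end
        )
        # ...and must stop after the region starts.
        stops_after_start = stop_row == b"" or stop_row > region_start
        if begins_before_end and stops_after_start:
            selected.append((region_start, region_end))
    return selected


if __name__ == "__main__":
    # No row-key filter: every region is scanned.
    print(len(overlapping_regions(REGIONS)))  # prints 4
    # id > 'zzz' pushed down as a start row: only the open-ended last region.
    print(len(overlapping_regions(REGIONS, start_row=b"zzz\x00")))  # prints 1
```

In this model, a range beyond the last key still overlaps the final, open-ended region. Even a fully pushed-down `id > 'zzz'` would therefore be expected to produce one split rather than zero, so the signal to look for is a drop from many tasks to one, not to none.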