Which Hive version are you using?

On Thu, Jan 15, 2015 at 12:44 AM, Abhishek kumar <abhishekiit...@gmail.com> wrote:
> Hi,
>
> Thanks for the reply.
>
> I tried that, but no luck. The map-reduce seems to be stuck (taking a lot
> of time, just for 65 lakhs of Hbase rows). I am attaching the log file (or
> http://pastebin.com/BUYDUiEu)
>
> My only question is why the filter push-down for row-key (*startKey* and
> *stopKey* for the *Scanner*) is not happening to Hbase. If the push-down
> happens, then Hbase will resolve this Scanner very fast and no matter MR
> job runs or not, the query resolution will be very fast.
>
> --
> Abhishek
>
> On Thu, Jan 15, 2015 at 1:59 AM, Ashutosh Chauhan <hashut...@apache.org>
> wrote:
>
>> Can you run your query with following config:
>>
>> hive> set hive.fetch.task.conversion=none;
>>
>> and run your two queries with this. Lets see if this makes a difference.
>> My expectation is this will result in MR job getting launched and thus
>> runtimes might be different.
>>
>> On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar <abhishekiit...@gmail.com> wrote:
>>
>>> First I tried running the query: select * from table1 where id =
>>> 'value';
>>> It was very fast, as expected since Hbase replied the results very fast.
>>> In this case, I observed no map/reduce task getting spawned.
>>>
>>> Now, for the query, select * from table1 where id > 'zzz', I expected
>>> the filter push down to happen (at least the 0.14 code says). And since,
>>> there were no results found, so Hbase will again reply very fast and thus
>>> hive should output the query's result very fast. But, this is not
>>> happening, and from the logs of datanode, it looks like a lot of reads are
>>> happening (close to full table scan of 10GBs of data). I expected the
>>> response time to be very close to the above query's time.
>>>
>>> I will check about the number of task getting launched.
>>>
>>> My questions are:
>>> * Why there was no any filter pushdown (id > 'zzz') happening for this
>>> very simple query.
>>> * Since this query can only be resolved from HBase, will Hive launch map
>>> tasks (last time, I guess I observed no map task getting launched)
>>>
>>> --
>>> Abhishek
>>>
>>> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan <hashut...@apache.org>
>>> wrote:
>>>
>>>> Hi Abhishek,
>>>>
>>>> How are you determining its resulting in full table scan? One way to
>>>> ascertain that filter got pushed down is to see how many tasks were
>>>> launched for your query, with and without filter. One would expect lower #
>>>> of splits (and thus tasks) for query having filter.
>>>>
>>>> Thanks,
>>>> Ashutosh
>>>>
>>>> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <abhishekiit...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am using hive 0.14 which runs over hbase (having ~10 GB of data). I
>>>>> am facing issues in terms of slowness when querying over Hbase. My query
>>>>> looks like following:
>>>>>
>>>>> select * from table1 where id > 'zzzz'; (id is the row-key)
>>>>>
>>>>> As per the hive-code, id > 'zzz', is getting pushed to Hbase scanner
>>>>> as 'startKey'. Now given there are no such rows-keys (id) which satisfies
>>>>> this criteria, this query should be extremely fast. But hive is taking a
>>>>> lot of time, looks like full hbase table scan.
>>>>> Can someone let me know where am I wrong in understanding the whole
>>>>> thing?
>>>>>
>>>>> --
>>>>> Abhishek
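
For readers following the thread, the start-key / stop-key behaviour the original poster expects can be illustrated with a toy model. This is only a sketch in Python, not Hive's or HBase's actual code: `key_range` and `rows_scanned` are hypothetical helpers, the row keys are made up, and the only assumption taken from HBase is that a scan's start row is inclusive and its stop row exclusive over byte-ordered keys.

```python
from bisect import bisect_left


def key_range(op, value):
    """Map `rowkey <op> value` to (start, stop) byte bounds.

    Hypothetical helper. start is inclusive, stop is exclusive,
    and None means the scan is unbounded on that side.
    """
    key = value.encode("utf-8")
    # Appending a zero byte gives the smallest key strictly greater than `key`.
    successor = key + b"\x00"
    if op == "=":
        return key, successor
    if op == ">":
        return successor, None
    if op == ">=":
        return key, None
    if op == "<":
        return None, key
    if op == "<=":
        return None, successor
    raise ValueError(f"unsupported operator: {op}")


def rows_scanned(sorted_keys, start, stop):
    """Count the rows a range scan would touch in a sorted key list."""
    lo = 0 if start is None else bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect_left(sorted_keys, stop)
    return max(0, hi - lo)


# Made-up table of 1000 row keys, sorted the way HBase stores them.
keys = sorted(f"row{i:05d}".encode("utf-8") for i in range(1000))

pushed_down = rows_scanned(keys, *key_range(">", "zzz"))
full_scan = rows_scanned(keys, None, None)
print(pushed_down, full_scan)
```

In this model, a predicate that is pushed down as a start key beyond every stored key touches no rows, while the same query without push-down reads the whole table and filters afterwards. That difference is what the reported datanode reads would be consistent with.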