Hi, Thanks for the reply.
I tried that, but no luck. The map-reduce job seems to be stuck (it takes a very long time for just 65 lakh (6.5 million) HBase rows). I am attaching the log file (also at http://pastebin.com/BUYDUiEu).

My only question is why the row-key filter (*startKey* and *stopKey* for the *Scanner*) is not being pushed down to HBase. If the push-down happened, HBase would resolve this Scanner very quickly, and the query would return fast whether or not an MR job runs.

--
Abhishek

On Thu, Jan 15, 2015 at 1:59 AM, Ashutosh Chauhan <hashut...@apache.org> wrote:

> Can you run your query with following config:
>
> hive> set hive.fetch.task.conversion=none;
>
> and run your two queries with this. Lets see if this makes a difference.
> My expectation is this will result in MR job getting launched and thus
> runtimes might be different.
>
> On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar <abhishekiit...@gmail.com>
> wrote:
>
>> First I tried running the query: select * from table1 where id = 'value';
>> It was very fast, as expected since Hbase replied the results very fast.
>> In this case, I observed no map/reduce task getting spawned.
>>
>> Now, for the query, select * from table1 where id > 'zzz', I expected
>> the filter push down to happen (at least the 0.14 code says). And since,
>> there were no results found, so Hbase will again reply very fast and thus
>> hive should output the query's result very fast. But, this is not
>> happening, and from the logs of datanode, it looks like a lot of reads are
>> happening (close to full table scan of 10GBs of data). I expected the
>> response time to be very close to the above query's time.
>>
>> I will check about the number of task getting launched.
>>
>> My questions are:
>> * Why there was no any filter pushdown (id > 'zzz') happening for this
>> very simple query.
>> * Since this query can only be resolved from HBase, will Hive launch map
>> tasks (last time, I guess I observed no map task getting launched)
>>
>> --
>> Abhishek
>>
>> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan <hashut...@apache.org>
>> wrote:
>>
>>> Hi Abhishek,
>>>
>>> How are you determining its resulting in full table scan? One way to
>>> ascertain that filter got pushed down is to see how many tasks were
>>> launched for your query, with and without filter. One would expect lower #
>>> of splits (and thus tasks) for query having filter.
>>>
>>> Thanks,
>>> Ashutosh
>>>
>>> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <
>>> abhishekiit...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am using hive 0.14 which runs over hbase (having ~10 GB of data). I
>>>> am facing issues in terms of slowness when querying over Hbase. My query
>>>> looks like following:
>>>>
>>>> select * from table1 where id > 'zzzz'; (id is the row-key)
>>>>
>>>> As per the hive-code, id > 'zzz', is getting pushed to Hbase scanner as
>>>> 'startKey'. Now given there are no such rows-keys (id) which satisfies this
>>>> criteria, this query should be extremely fast. But hive is taking a lot of
>>>> time, looks like full hbase table scan.
>>>> Can someone let me know where am I wrong in understanding the whole
>>>> thing?
>>>>
>>>> --
>>>> Abhishek
>>>>
>>>
>>>
>>
>
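To make the expectation in this thread concrete, here is a toy sketch in Python. It is not Hive or HBase code: `full_scan` and `range_scan` are hypothetical helpers over a sorted in-memory list that stands in for the row keys, and the key format `id0000000` is made up for illustration. It only shows why a range bounded by a start and stop key should touch almost no rows when nothing matches, while filtering after a full scan touches every row.

```python
import bisect


def full_scan(sorted_keys, predicate):
    """No push-down: visit every key, then apply the filter."""
    touched = 0
    hits = []
    for key in sorted_keys:
        touched += 1
        if predicate(key):
            hits.append(key)
    return hits, touched


def range_scan(sorted_keys, start_exclusive=None, stop_exclusive=None):
    """Push-down: seek to the start key and read only up to the stop key."""
    lo = 0
    if start_exclusive is not None:
        # First position strictly greater than the start key.
        lo = bisect.bisect_right(sorted_keys, start_exclusive)
    hi = len(sorted_keys)
    if stop_exclusive is not None:
        # First position greater than or equal to the stop key.
        hi = bisect.bisect_left(sorted_keys, stop_exclusive)
    hits = sorted_keys[lo:hi]
    return hits, len(hits)


# Hypothetical row keys; all of them sort before 'zzz'.
keys = sorted(f"id{n:07d}" for n in range(100_000))

slow_hits, slow_touched = full_scan(keys, lambda k: "zzz" < k < "zzzz")
fast_hits, fast_touched = range_scan(keys, start_exclusive="zzz", stop_exclusive="zzzz")

print("full scan touched:", slow_touched)
print("range scan touched:", fast_touched)
```

Both calls return no rows, but only the range scan avoids reading the whole key space, which is the behaviour I expected from the `id > 'zzz'` query if the predicate were turned into scanner bounds.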
0: jdbc:hive2://localhost:10000> select * from events where id = 'some_id';
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : number of splits:1
INFO : Submitting tokens for job: job_local1981153761_0018
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
INFO : 2015-01-15 08:22:02,398 Stage-1 map = 0%, reduce = 0%
+--------------------------------------------------------------+-----------+--+
|                        events.values                         | events.id |
+--------------------------------------------------------------+-----------+--+
| {"eventName":"value","eventTs":"1417258870867","key2":"..."} | some_id   |
+--------------------------------------------------------------+-----------+--+
1 row selected (15.882 seconds)
INFO : 2015-01-15 08:22:03,522 Stage-1 map = 100%, reduce = 0%
INFO : Ended Job = job_local1981153761_0018
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000> select * from events where id > 'zzz' AND id < 'zzzz' limit 1;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
WARN : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO : number of splits:3
INFO : Submitting tokens for job: job_local240730237_0019
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
INFO : 2015-01-15 08:22:24,817 Stage-1 map = 0%, reduce = 0%
INFO : 2015-01-15 08:23:24,829 Stage-1 map = 0%, reduce = 0%
INFO : 2015-01-15 08:24:25,625 Stage-1 map = 0%, reduce = 0%
INFO : 2015-01-15 08:25:25,770 Stage-1 map = 0%, reduce = 0%
INFO : 2015-01-15 08:26:26,559 Stage-1 map = 0%, reduce = 0%
INFO : 2015-01-15 08:27:26,573 Stage-1 map = 0%, reduce = 0%
INFO : 2015-01-15 08:28:27,244 Stage-1 map = 0%, reduce = 0%