Re: Hive being slow

Ashutosh Chauhan Thu, 15 Jan 2015 09:14:10 -0800

Which Hive version are you using?

On Thu, Jan 15, 2015 at 12:44 AM, Abhishek kumar <abhishekiit...@gmail.com>
wrote:


> Hi,
>
> Thanks for the reply.
>
> I tried that, but no luck. The MapReduce job seems to be stuck (it is taking
> a lot of time for just 65 lakh, i.e. 6.5 million, HBase rows). I am attaching
> the log file (also at http://pastebin.com/BUYDUiEu).
>
> My only question is why the filter push-down for the row key (*startKey* and
> *stopKey* for the *Scanner*) is not reaching HBase. If the push-down
> happened, HBase would resolve this Scanner very quickly, and the query
> would return quickly whether or not an MR job runs.
>
> --
> Abhishek
>
> On Thu, Jan 15, 2015 at 1:59 AM, Ashutosh Chauhan <hashut...@apache.org>
> wrote:
>
>> Can you run your queries with the following config:
>>
>> hive> set hive.fetch.task.conversion=none;
>>
>> Run both of your queries with this setting and let's see if it makes a
>> difference. I expect this will cause an MR job to be launched, so the
>> runtimes might be different.
>>
>> On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar <abhishekiit...@gmail.com>
>> wrote:
>>
>>> First I tried running the query: select * from table1 where id =
>>> 'value';
>>> It was very fast, as expected, since HBase returned the results quickly.
>>> In this case, I observed no map/reduce task being spawned.
>>>
>>> Now, for the query select * from table1 where id > 'zzz', I expected
>>> the filter push-down to happen (at least that is what the 0.14 code
>>> suggests). Since no rows match, HBase should again reply very quickly, and
>>> so Hive should return the query's result very quickly. But this is not
>>> happening, and from the datanode logs it looks like a lot of reads are
>>> taking place (close to a full table scan of 10 GB of data). I expected the
>>> response time to be very close to that of the first query.
>>>
>>> I will check the number of tasks being launched.
>>>
>>> My questions are:
>>> * Why is no filter push-down (id > 'zzz') happening for this
>>> very simple query?
>>> * Since this query can be resolved from HBase alone, will Hive launch map
>>> tasks? (Last time, I think I observed no map task being launched.)
>>>
>>> --
>>> Abhishek
>>>
>>> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan <hashut...@apache.org>
>>> wrote:
>>>
>>>> Hi Abhishek,
>>>>
>>>> How are you determining that it is resulting in a full table scan? One
>>>> way to ascertain that the filter got pushed down is to see how many tasks
>>>> were launched for your query, with and without the filter. One would
>>>> expect a lower number of splits (and thus tasks) for the query with the
>>>> filter.
>>>>
>>>> Thanks,
>>>> Ashutosh
>>>>
>>>> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <abhishekiit...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am using Hive 0.14, which runs over HBase (holding ~10 GB of data). I
>>>>> am seeing slow queries against HBase. My query looks like the
>>>>> following:
>>>>>
>>>>> select * from table1 where id > 'zzzz';  (id is the row-key)
>>>>>
>>>>> As per the Hive code, id > 'zzz' is pushed to the HBase scanner
>>>>> as 'startKey'. Given that no row keys (id) satisfy this criterion,
>>>>> this query should be extremely fast. But Hive is taking a
>>>>> lot of time; it looks like a full HBase table scan.
>>>>> Can someone let me know where I am wrong in my understanding of the
>>>>> whole thing?
>>>>>
>>>>> --
>>>>> Abhishek
>>>>>
>>>>
>>>>
>>>
>>
>
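
The behaviour expected in the thread, a row-key predicate turned into a scan range so that HBase only touches the matching rows, can be sketched roughly as below. This is a toy model in Python, not Hive's or HBase's actual code: scan_range and rows_scanned are hypothetical helpers, the table is simulated as a sorted list of byte-string keys, and the only HBase behaviour assumed is that a scan's start row is inclusive and its stop row is exclusive.

```python
from bisect import bisect_left


def scan_range(op, value):
    """Map a row-key predicate to (start_row, stop_row).

    Hypothetical helper for illustration only. An empty byte string
    means "unbounded" on that side. start_row is inclusive and
    stop_row is exclusive.
    """
    key = value.encode("utf-8")
    # key + b"\x00" is the smallest byte string strictly greater than key.
    successor = key + b"\x00"
    if op == "=":
        return key, successor
    if op == ">=":
        return key, b""
    if op == ">":
        return successor, b""
    if op == "<":
        return b"", key
    if op == "<=":
        return b"", successor
    raise ValueError("unsupported operator: %r" % op)


def rows_scanned(sorted_keys, start_row, stop_row):
    """Return the keys a range scan would touch in a sorted key list."""
    lo = bisect_left(sorted_keys, start_row) if start_row else 0
    hi = bisect_left(sorted_keys, stop_row) if stop_row else len(sorted_keys)
    return sorted_keys[lo:hi]


# Simulated table of 1000 rows with keys row0000 .. row0999.
keys = sorted(b"row%04d" % i for i in range(1000))

# With push-down, id > 'zzz' starts past the last key and reads nothing.
print(len(rows_scanned(keys, *scan_range(">", "zzz"))))      # 0
# With push-down, id = 'row0042' reads exactly one row.
print(len(rows_scanned(keys, *scan_range("=", "row0042"))))  # 1
# Without push-down, the scan is unbounded and reads every row.
print(len(rows_scanned(keys, b"", b"")))                     # 1000
```

In this model the id > 'zzz' query reads zero rows when the range is pushed down and all rows when it is not, which matches the gap between the expected and observed behaviour described above.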
