Re: Hive footprint

Jörn Franke Wed, 20 Apr 2016 09:14:04 -0700

Hive has working indexes. However, many people overlook that a block is usually 
much larger than in a relational database, and thus do not use them correctly.
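
To put rough numbers on the block-size point, here is a minimal sketch. It assumes that matching rows are scattered uniformly and independently across the table (real data is often sorted or clustered, which helps an index), and that the smallest unit an index lets you skip is one block or page. The 100-byte row, the 8 KB page, the 128 MB block and the 0.1% selectivity are illustrative assumptions, not measurements from Hive or any particular database.

```python
def fraction_of_blocks_read(selectivity: float, rows_per_block: int) -> float:
    """Expected fraction of blocks that hold at least one matching row.

    Assumes each row matches independently with probability `selectivity`.
    """
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    if rows_per_block < 1:
        raise ValueError("rows_per_block must be at least 1")
    # A block can be skipped only if none of its rows match.
    return 1.0 - (1.0 - selectivity) ** rows_per_block


# Illustrative assumptions (not taken from the thread):
ROW_BYTES = 100                       # average row size
RDBMS_PAGE_BYTES = 8 * 1024           # a typical relational database page
HDFS_BLOCK_BYTES = 128 * 1024 * 1024  # a common HDFS block size
SELECTIVITY = 0.001                   # predicate matches 0.1% of rows

rows_per_page = RDBMS_PAGE_BYTES // ROW_BYTES
rows_per_block = HDFS_BLOCK_BYTES // ROW_BYTES

page_fraction = fraction_of_blocks_read(SELECTIVITY, rows_per_page)
block_fraction = fraction_of_blocks_read(SELECTIVITY, rows_per_block)

print(f"pages read with 8 KB pages:    {page_fraction:.1%}")
print(f"blocks read with 128 MB blocks: {block_fraction:.1%}")
```

Under these assumptions the index lets a relational database skip more than nine tenths of its pages, while practically every 128 MB block still contains a match and has to be read. An index on such large blocks pays off only when the data is laid out (for example, sorted on the indexed column) so that matches concentrate in a few blocks.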


> On 19 Apr 2016, at 09:31, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> 
> The issue is that Hive has indexes (not an index store), but they don't work, so 
> there we go. Maybe in later releases we can make use of these indexes for 
> faster queries. Hive even allows bitmap indexes on a fact table, but they are 
> never used by the CBO.
> 
> show indexes on sales;
> 
> +-----------------------+-----------------------+-----------------------+------------------------------------------+-----------------------+----------+--+
> |       idx_name        |       tab_name        |       col_names       |               idx_tab_name               |       idx_type        | comment  |
> +-----------------------+-----------------------+-----------------------+------------------------------------------+-----------------------+----------+--+
> | sales_cust_bix        | sales                 | cust_id               | oraclehadoop__sales_sales_cust_bix__     | bitmap                |          |
> | sales_channel_bix     | sales                 | channel_id            | oraclehadoop__sales_sales_channel_bix__  | bitmap                |          |
> | sales_prod_bix        | sales                 | prod_id               | oraclehadoop__sales_sales_prod_bix__     | bitmap                |          |
> | sales_promo_bix       | sales                 | promo_id              | oraclehadoop__sales_sales_promo_bix__    | bitmap                |          |
> | sales_time_bix        | sales                 | time_id               | oraclehadoop__sales_sales_time_bix__     | bitmap                |          |
> +-----------------------+-----------------------+-----------------------+------------------------------------------+-----------------------+----------+--+
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
>> On 18 April 2016 at 23:51, Marcin Tustin <mtus...@handybook.com> wrote:
>> We use a Hive-with-ORC setup now. Queries may take thousands of seconds with 
>> joins, and potentially tens of seconds with selects on very large tables. 
>> 
>> My understanding is that the goal of HBase is to provide much lower latency 
>> for queries. Obviously, this comes at the cost of not being able to perform 
>> joins. I don't actually use HBase, so I hesitate to say more about it. 
>> 
>>> On Mon, Apr 18, 2016 at 6:48 PM, Mich Talebzadeh 
>>> <mich.talebza...@gmail.com> wrote:
>>> Thanks Marcin.
>>> 
>>> What is the definition of low latency here? Are you referring to the 
>>> performance of SQL against HBase tables compared to Hive? As I understand 
>>> it, HBase is a columnar database. Would it be possible to use Hive against 
>>> ORC to achieve the same?
>>> 
>>> 
>>>> On 18 April 2016 at 23:43, Marcin Tustin <mtus...@handybook.com> wrote:
>>>> HBase has a different use case - it's for low-latency querying of big 
>>>> tables. If you combined it with Hive, you might have something nice for 
>>>> certain queries, but I wouldn't think of them as direct competitors.
>>>> 
>>>>> On Mon, Apr 18, 2016 at 6:34 PM, Mich Talebzadeh 
>>>>> <mich.talebza...@gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> I notice that Impala is rarely mentioned these days. I may be missing 
>>>>> something. However, I gather it is coming to an end now, as I don't recall 
>>>>> many use cases for it (or customers asking for it). In contrast, Hive has 
>>>>> held its ground with the addition of Spark and Tez as execution 
>>>>> engines, support for ACID and ORC, and new features in Hive 2. In addition, 
>>>>> given a good choice of database for its metastore, it scales well.
>>>>> 
>>>>> If Hive had native support for local variables and stored 
>>>>> procedures, it would be a top-notch data warehouse. Given its 
>>>>> metastore, I don't see any technical reason why it cannot support these 
>>>>> constructs.
>>>>> 
>>>>> I was recently asked to comment on migration from commercial DWs to Big 
>>>>> Data (primarily for TCO reasons) and really could not recall any better 
>>>>> candidate than Hive. Is HBase a viable alternative? Obviously, whatever 
>>>>> one decides, there is still HDFS, a good engine for Hive (it sounds like 
>>>>> many prefer Tez, although I am a Spark fan) and the ubiquitous YARN.
>>>>> 
>>>>> Let me know your thoughts.
>>>>> 
>>>> 
>>>> 
>>>> Want to work at Handy? Check out our culture deck and open roles
>>>> Latest news at Handy
>>>> Handy just raised $50m led by Fidelity
>>>> 
>>>> 
>
