Re: How useful are tools for Hive data modeling

Mich Talebzadeh Wed, 11 Nov 2020 12:00:05 -0800

Many thanks Austin.

The challenge, I have been told, is how to query a subset of the data
efficiently while avoiding a full table scan. I believe the tables are
stored as Parquet.
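
To make the goal concrete, here is a rough sketch in plain Python (not Hive
or Parquet API code) of the two usual ways a scan gets narrowed: pruning by
partition value, then skipping row groups whose min/max statistics cannot
contain the value being searched for. The names RowGroup, TABLE and query,
the partition key "day" and the sample data are all hypothetical, made up
purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group and its min/max stats."""
    min_id: int
    max_id: int
    rows: List[Tuple[int, str]]  # (id, payload)


# Hypothetical layout: partition value ("day") -> row groups stored under it.
TABLE: Dict[str, List[RowGroup]] = {
    "2020-11-10": [RowGroup(1, 3, [(1, "a"), (2, "b"), (3, "c")])],
    "2020-11-11": [
        RowGroup(4, 6, [(4, "d"), (5, "e"), (6, "f")]),
        RowGroup(7, 9, [(7, "g"), (8, "h"), (9, "i")]),
    ],
}


def query(table: Dict[str, List[RowGroup]], day: str, wanted_id: int):
    """Return (matching rows, number of row groups actually read)."""
    groups_read = 0
    matches: List[Tuple[int, str]] = []
    # Partition pruning: only the requested partition is visited at all.
    for group in table.get(day, []):
        # Min/max skipping: ignore groups whose range cannot hold wanted_id.
        if not (group.min_id <= wanted_id <= group.max_id):
            continue
        groups_read += 1
        matches.extend(row for row in group.rows if row[0] == wanted_id)
    return matches, groups_read


rows, groups_read = query(TABLE, "2020-11-11", 8)
print(rows, groups_read)
```

In this toy layout only one of the three row groups is read. Whether a real
query benefits in the same way depends on how the table is partitioned and
on how the data is sorted within the files.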


I know performance in Hive is not that great, so anything that could help
would be great.

Cheers,



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 11 Nov 2020 at 19:32, Austin Hackett <[email protected]> wrote:

> Hi Mich
>
> Hive also supports non-validated constraints (primary key, foreign key,
> etc.). Whilst I’m not too familiar with the modelling tools you mention,
> perhaps they’re able to use these for generating SQL etc.?
>
> ORC files have indexes (min, max, bloom filters) - not particularly
> relevant to the data modelling tools question, but mentioning it for
> completeness…
>
> Thanks
>
> Austin
>
>
> On 11 Nov 2020, at 17:14, Mich Talebzadeh <[email protected]>
> wrote:
>
> Many thanks Peter.
>
> On Wed, 11 Nov 2020 at 16:58, Peter Vary <[email protected]> wrote:
>
>> Hi Mich,
>>
>> Index support was removed from Hive:
>>
>>    - https://issues.apache.org/jira/browse/HIVE-21968
>>    - https://issues.apache.org/jira/browse/HIVE-18715
>>
>>
>> Thanks,
>> Peter
>>
>> On Nov 11, 2020, at 17:25, Mich Talebzadeh <[email protected]>
>> wrote:
>>
>> Hi all,
>>
>> I wrote these notes earlier this year.
>>
>> I heard today that Hive 1 does not support indexes but Hive 2 does.
>>
>> I still believe that Hive does not support indexing, as per my notes
>> below. Has this changed?
>>
>> Regards,
>>
>> Mich
>>
>> ---------- Forwarded message ---------
>> From: Mich Talebzadeh <[email protected]>
>> Date: Thu, 2 Apr 2020 at 12:17
>> Subject: How useful are tools for Hive data modeling
>> To: user <[email protected]>
>>
>>
>> Hi,
>>
>> Fundamentally, the structure of a Hive table is exposed by desc
>> formatted <TABLE> and show partitions <TABLE>.
>>
>> Hive does not support indexes in real HQL operations (I stand to be
>> corrected). So what we have are tables, partitions and clustering (AKA
>> hash partitioning).
>>
>> Hive does not support indexes because Hadoop lacks the block locality
>> necessary for indexes. So if I use a tool like Collibra, Ab Initio, etc.,
>> what advantage(s) would I gain over a simple shell script that extracts
>> table and partition definitions?
>>
>> Thanks,
>>
>>
>
