Re: New document: "How to optimize cube build"

ShaoFeng Shi Wed, 25 Jan 2017 19:35:37 -0800

Hi Alberto,

Thanks for your comments! In many cases the data is imported into Hadoop in
T+1 mode. Especially when each day's data is tens of GB, it is
reasonable to partition the Hive table by date. The question is whether it is
worth keeping a long history of data in Hive. Usually users keep only a couple
of months of data in Hive. If the number of partitions exceeds the threshold in
Hive, they can easily remove the oldest partitions or move them to another
table. I think that is common practice with Hive, and it is very good to
know that Hive 2.0 will address this.
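
As a rough sketch of that housekeeping (not taken from the doc): the small
Python script below only generates the HiveQL statements for partitions that
fall outside a retention window. The table name "flights", the partition column
"flightdate", the date format of the partition values and the 60-day window are
all assumptions for illustration; adjust them to your own table.

```python
from datetime import date, timedelta


def partitions_to_drop(partition_dates, today, keep_days=60):
    """Return the partition dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in partition_dates if d < cutoff)


def drop_statements(table, column, partition_dates, today, keep_days=60):
    """Build one HiveQL DROP PARTITION statement per expired partition."""
    return [
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({column}='{d.isoformat()}');"
        for d in partitions_to_drop(partition_dates, today, keep_days)
    ]


if __name__ == "__main__":
    # Hypothetical example: 90 daily partitions ending on 2017-01-25.
    today = date(2017, 1, 25)
    existing = [today - timedelta(days=n) for n in range(90)]
    for stmt in drop_statements("flights", "flightdate", existing, today):
        print(stmt)
```

The printed statements can be reviewed first and then run through the Hive CLI
or beeline. Moving old data to an archive table instead of dropping it would
need an extra INSERT step that is not shown here.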


2017-01-25 17:10 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:

> Be careful about partitioning by "FLIGHTDATE"
>
> From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>
> *"Option 1: Use id_date as the partition column on the Hive table. This has a
> big problem: the Hive metastore is meant for a few hundred partitions, not
> thousands (HIVE-9452 has an idea to solve this, but it isn’t in progress)*"
>
> Hive 2.0 will include a preview (only for testing) to solve this
>
> 2017-01-25 9:46 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:
>
>> Hello,
>>
>> A new document has been added on practices for cube builds. Any suggestions
>> or comments are welcome. We can update the doc later based on feedback.
>>
>> Here is the link:
>> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>


-- 
Best regards,

Shaofeng Shi 史少锋
