Hi Alberto,

Thanks for your comments! In many cases data is imported into Hadoop in T+1 mode (each day's data is loaded on the following day). Especially when each day's data is tens of GB, it is reasonable to partition the Hive table by date. The question is whether it is worth keeping a long history of data in Hive; usually users only keep a couple of months' data there. If the number of partitions exceeds the threshold Hive handles well, they can easily drop the oldest partitions or move them to another table. I think that is a common practice in Hive, and it is very good to know that Hive 2.0 will address this.
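For illustration, the retention approach above (keep roughly the last couple of months of daily partitions and drop the older ones) might look like the following Python sketch. The table name `flight_fact`, the partition column `flightdate`, and the 60-day window are hypothetical. The script only generates HiveQL `ALTER TABLE ... DROP IF EXISTS PARTITION` statements; running them (for example through the Hive CLI or Beeline) is left to you.

```python
from datetime import date, timedelta


def partitions_to_drop(partition_dates, keep):
    """Return the oldest partition dates beyond the newest `keep` ones.

    partition_dates: iterable of datetime.date, one per daily partition.
    keep: number of most recent daily partitions to retain.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(set(partition_dates))  # oldest first, duplicates removed
    excess = len(ordered) - keep
    return ordered[:excess] if excess > 0 else []


def drop_statements(table, column, dates):
    """Build one HiveQL DROP PARTITION statement per expired date.

    `table` and `column` are placeholders for your own table and
    date-partition column (hypothetical names in this sketch).
    """
    return [
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({column}='{d.isoformat()}');"
        for d in dates
    ]


if __name__ == "__main__":
    # 100 daily partitions starting 2016-10-01; keep about two months (60 days).
    start = date(2016, 10, 1)
    existing = [start + timedelta(days=i) for i in range(100)]
    expired = partitions_to_drop(existing, keep=60)
    # Show the first few generated statements for the 40 expired partitions.
    for stmt in drop_statements("flight_fact", "flightdate", expired)[:3]:
        print(stmt)
```

Sorting the dates first means the oldest partitions are always the ones selected, regardless of the order in which the metastore lists them.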
2017-01-25 17:10 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
> Be careful about partition by "FLIGHTDATE"
>
> From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>
> *"Option 1: Use id_date as partition column on Hive table. This have a big
> problem: the Hive metastore is meant for few hundred of partitions not
> thousand (Hive 9452 there is an idea to solve this isn’t in progress)*"
>
> In Hive 2.0 will be a preview (only for testing) to solve this
>
> 2017-01-25 9:46 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:
>
>> Hello,
>>
>> A new document is added for the practices of cube build. Any suggestion
>> or comment is welcomed. We can update the doc later with feedbacks;
>>
>> Here is the link:
>> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>

--
Best regards,

Shaofeng Shi 史少锋