Unsubscribe
At 2017-10-10 16:04:18, "ShaoFeng Shi" <shaofeng...@apache.org> wrote:

>The "shard-by" column is used to distribute the cube data across
>shards (each shard is an HBase region). The "shard-by" column usually needs
>to be a high-cardinality column, such as user_id or order_id, so that the
>shards come out at a similar size. A partition column's cardinality is
>usually not high enough, so it is not suggested for this purpose.
>
>The default streaming parser in Kylin accepts JSON format. If you have
>another format, you need to implement a parser for it by extending the
>StreamingParser class.
>
>2017-10-10 16:30 GMT+08:00 崔苗 <cuim...@danale.com>:
>
>>Thanks for your suggestion; we finally changed the timestamp into date
>>format by SQL and it worked. Some other questions:
>>1. What is the meaning of the 'shard by' column? Is it proper to set the
>>partition column as the 'shard by' column?
>>2. Are there limitations on the data format in Kafka when building
>>streaming cubes? We succeeded in building a streaming cube on the sample
>>data supplied by Kylin, but failed on our own data, which is in Avro
>>format, not JSON.
>>
>>At 2017-10-10 14:47:30, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>>
>>>Hi Miao,
>>>
>>>Kylin doesn't understand your time format. You need to use the standard
>>>Date format in Hive. Or you can implement your own logic with the
>>>interface "IPartitionConditionBuilder".
>>>
>>>2017-10-10 11:33 GMT+08:00 崔苗 <cuim...@danale.com>:
>>>
>>>>Well, the timestamp column was a bigint such as 1507547479434 in the
>>>>Hive table. When I defined the end time to build the cube, I found the
>>>>timestamp 1507547479434 was converted to '20171009', and the log showed
>>>>that Kylin loaded data from Hive with the condition "WHERE
>>>>(USER_REG.REG_TIME < 20171009)", so the intermediate flat Hive table
>>>>was empty. I want to know: can Kylin derive other time values like
>>>>"year_start" and "day_start" from the bigint timestamp in Hive, as it
>>>>does for a Kafka table? Or must we change the bigint timestamp into a
>>>>date format such as "2017-10-09" in Hive?
>>>>
>>>>At 2017-10-09 22:04:56, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>>>>
>>>>>Hi Miao,
>>>>>
>>>>>What is the error, where you said "kylin failed to load data from hive
>>>>>tables"?
>>>>>
>>>>>In my opinion, it is not recommended to use a timestamp as the
>>>>>partition column, since it is too fine a granularity. Usually the cube
>>>>>is partitioned by day/week/month; in some cases by the hour; in the
>>>>>streaming case it might be partitioned by the minute; but never by
>>>>>timestamp. I put some comments about this in this document:
>>>>>https://kylin.apache.org/docs21/tutorial/cube_streaming.html
>>>>>
>>>>>2017-10-09 14:27 GMT+08:00 崔苗 <cuim...@danale.com>:
>>>>>
>>>>>>Hi,
>>>>>>We want to use tables in Kafka as fact tables and tables in MySQL as
>>>>>>lookup tables, so we put all the tables into Hive and want to join
>>>>>>them as cubes.
>>>>>>
>>>>>>The time column in the fact table is a timestamp, so does Kylin 2.1
>>>>>>support timestamp for cube partitioning?
>>>>>>I found this: https://issues.apache.org/jira/browse/KYLIN-633
>>>>>>It seems Kylin already supports Timestamp for cube partitioning, but
>>>>>>when we defined a timestamp as the partition, Kylin failed to load
>>>>>>data from the Hive tables.
>>>>>>
>>>>>>Thanks in advance for your reply.
>>>>>
>>>>>--
>>>>>Best regards,
>>>>>
>>>>>Shaofeng Shi 史少锋
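
The fix the thread settles on (turning the bigint epoch-millisecond column into a day-granularity date string that the partition filter can compare against) can be sketched as below. This is only an illustration in Java of the same conversion the poster did in Hive SQL; the class and method names are hypothetical, and it assumes the bigint values are UTC epoch milliseconds:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionDate {
    // Formatter for the "yyyy-MM-dd" date string a date-typed partition
    // column would hold. Assumes the source timestamps are UTC.
    static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Convert an epoch-millisecond bigint (as stored in the Hive table)
    // to a day-granularity date string.
    static String toPartitionDate(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // The value from the thread: 1507547479434 falls on 2017-10-09 UTC.
        System.out.println(toPartitionDate(1507547479434L));
    }
}
```

In Hive SQL itself, the same conversion would presumably use something along the lines of `from_unixtime(reg_time div 1000, 'yyyy-MM-dd')`, since Hive's `from_unixtime` takes seconds rather than milliseconds.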