Unsubscribe
At 2017-10-10 16:04:18, "ShaoFeng Shi" <shaofeng...@apache.org> wrote:

>The "shard-by" column is used to distribute the cube data across
>shards (each shard is an HBase region). The "shard-by" column usually needs
>to be a high-cardinality column, such as user_id or order_id, so that the
>shards come out at a similar size. A partition column's cardinality is
>usually not high enough, so it is not suggested for this purpose.
>
>The default streaming parser in Kylin accepts JSON format. If you have
>another format, you need to implement a parser for it by extending the
>StreamingParser class.
>
>2017-10-10 16:30 GMT+08:00 崔苗 <cuim...@danale.com>:
>
>>Thanks for your suggestion; we finally changed the timestamp into date
>>format by SQL and it worked. Some other questions:
>>1. What is the meaning of the 'shard by' column? Is it proper to set the
>>partition column as the 'shard by' column?
>>2. Are there limitations on the data format in Kafka when building
>>streaming cubes? We succeeded in building a streaming cube on the sample
>>data supplied by Kylin, but failed on our own data, which is in Avro
>>format, not JSON.
>>
>>At 2017-10-10 14:47:30, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>>
>>>Hi Miao,
>>>
>>>Kylin doesn't understand your time format. You need to use the standard
>>>Date format in Hive. Or you can implement your own logic with the
>>>interface "IPartitionConditionBuilder".
>>>
>>>2017-10-10 11:33 GMT+08:00 崔苗 <cuim...@danale.com>:
>>>
>>>>Well, the timestamp column was a bigint such as 1507547479434 in the
>>>>Hive table. When I defined the end time to build the cube, I found the
>>>>timestamp 1507547479434 was converted to '20171009', and the log showed
>>>>that Kylin loaded data from Hive with the condition "WHERE
>>>>(USER_REG.REG_TIME < 20171009)", so the intermediate flat Hive table
>>>>was empty. I want to know: can Kylin derive other time values like
>>>>"year_start" and "day_start" from the bigint timestamp in Hive, as it
>>>>does for a Kafka table? Or must we change the bigint timestamp into a
>>>>date format such as "2017-10-09" in Hive?
>>>>
>>>>At 2017-10-09 22:04:56, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>>>>
>>>>>Hi Miao,
>>>>>
>>>>>What is the error, where you said "kylin failed to load data from hive
>>>>>tables"?
>>>>>
>>>>>In my opinion, it is not recommended to use a timestamp as the
>>>>>partition column, since it is too fine a granularity. Usually the cube
>>>>>is partitioned by day/week/month; in some cases by the hour; in the
>>>>>streaming case it might be partitioned by the minute; but never by
>>>>>timestamp. I put some comments about this in this document:
>>>>>https://kylin.apache.org/docs21/tutorial/cube_streaming.html
>>>>>
>>>>>2017-10-09 14:27 GMT+08:00 崔苗 <cuim...@danale.com>:
>>>>>
>>>>>>Hi,
>>>>>>We want to use tables in Kafka as fact tables and tables in MySQL as
>>>>>>lookup tables, so we put all the tables into Hive and want to join
>>>>>>them as cubes.
>>>>>>
>>>>>>The time column in the fact table is a timestamp, so does Kylin 2.1
>>>>>>support timestamp for cube partitioning?
>>>>>>I found this: https://issues.apache.org/jira/browse/KYLIN-633
>>>>>>It seems Kylin already supports Timestamp for cube partitioning, but
>>>>>>when we defined a timestamp as the partition, Kylin failed to load
>>>>>>data from the Hive tables.
>>>>>>
>>>>>>Thanks in advance for your reply.
>>>>>
>>>>>--
>>>>>Best regards,
>>>>>
>>>>>Shaofeng Shi 史少锋
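
The fix the thread settles on (turning the bigint epoch-millisecond column into a day-granularity date string that the partition filter can compare against) can be sketched as below. This is only an illustration in Java of the same conversion the poster did in Hive SQL; the class and method names are hypothetical, and it assumes the bigint values are UTC epoch milliseconds:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionDate {
    // Formatter for the "yyyy-MM-dd" date string a date-typed partition
    // column would hold. Assumes the source timestamps are UTC.
    static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Convert an epoch-millisecond bigint (as stored in the Hive table)
    // to a day-granularity date string.
    static String toPartitionDate(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // The value from the thread: 1507547479434 falls on 2017-10-09 UTC.
        System.out.println(toPartitionDate(1507547479434L));
    }
}
```

In Hive SQL itself, the same conversion would presumably use something along the lines of `from_unixtime(reg_time div 1000, 'yyyy-MM-dd')`, since Hive's `from_unixtime` takes seconds rather than milliseconds.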