How to increase split number for Fact distinct columns when using spark engine?

2018-11-04 Thread chenxi07)-
Hi all: I’m using the Spark engine to build a cube. I found that the build-time bottleneck is step #3, Extract Fact Table Distinct Columns. Looking at the Spark application, I see only two splits regardless of how large the input sequence file is. I wonder how t

How to increase split number for Fact distinct columns when using spark engine?(picture added)

2018-11-04 Thread chenxi07)-
Hi all: I’m using the Spark engine to build a cube. I found that the build-time bottleneck is step #3, Extract Fact Table Distinct Columns. Looking at the Spark application, I see only two splits regardless of how large the input sequence file is. I wonder how t

RE: How to increase split number for Fact distinct columns when using spark engine?(picture added)

2018-11-04 Thread chenxi07)-
Hi: I’m sorry, the picture is dead again. I have uploaded it as an attachment this time -- Best regards, Xi Chen From: 陈熹(chenxi07)-技术产品中心 Sent: Monday, November 5, 2018 3:04 PM To: dev@kylin.apache.org Subject: How to increase split number for Fact distinct columns when using spark

RE: How to increase split number for Fact distinct columns when using spark engine?(picture added)

2018-11-04 Thread chenxi07)-
Fact distinct columns when using spark engine?(picture added) Please check this doc: https://kylin.apache.org/docs/howto/howto_optimize_build.html On Monday, November 5, 2018 at 3:25 PM, 陈熹(chenxi07)-技术产品中心 wrote: > Hi: > >I’m sorry the picture is dead again. > >I upload it as

RE: How to increase split number for Fact distinct columns when using spark engine?(picture added)

2018-11-05 Thread chenxi07)-
ngine?(picture added) Hi Xi, The core is the same: by default, MR and Spark run one container/task per file block. If we divide the data into more splits, we get more concurrent tasks: kylin.engine.mr.mapper-input-rows=500000 On Monday, November 5, 2018 at 3:52 PM, 陈熹(chenxi07)-技术产品中心 wrote: > Hi,
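
To make the effect of kylin.engine.mr.mapper-input-rows concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the setting acts as a target number of rows per split, so the split count is roughly the total row count divided by that value. The function name and the row counts are hypothetical and for illustration only; the real split count also depends on file blocks and how the flat table was written.

```python
import math


def estimated_splits(total_rows: int, mapper_input_rows: int) -> int:
    """Estimate how many splits (and so concurrent tasks) a table would get.

    Assumption: each split holds about `mapper_input_rows` rows, which is
    how kylin.engine.mr.mapper-input-rows is described in the reply above.
    This is an approximation, not Kylin's actual code path.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if mapper_input_rows <= 0:
        raise ValueError("mapper_input_rows must be positive")
    # At least one split is always needed, even for an empty input.
    return max(1, math.ceil(total_rows / mapper_input_rows))


if __name__ == "__main__":
    # Hypothetical flat table with 10 million rows.
    for rows_per_split in (1_000_000, 500_000, 100_000):
        print(rows_per_split, estimated_splits(10_000_000, rows_per_split))
```

Under this assumption, lowering the value from 1,000,000 to 500,000 doubles the estimated number of splits for the same input.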

RE: How to increase split number for Fact distinct columns when using spark engine?(picture added)

2018-11-05 Thread chenxi07)-
: Support DrakosData Sent: Monday, November 5, 2018 4:01 PM To: dev@kylin.apache.org; 陈熹(chenxi07)-技术产品中心 Subject: Re: How to increase split number for Fact distinct columns when using spark engine?(picture added) Hi Xi Chen, I think you refer to 'kylin.engine.spark.rdd-partition-c