Hi all,
I’m using the Spark engine to build a cube.
I found that the build-time bottleneck lies in step #3, "Extract Fact
Table Distinct Columns".
When I look into the Spark application, I see there are only two splits,
regardless of how large the input sequence file is.
I wonder how t
Hi:
I’m sorry, the picture is dead again.
I have uploaded it as an attachment this time.
--
Best regards,
Xi Chen
From: 陈熹(chenxi07)-技术产品中心
Sent: Monday, November 5, 2018 3:04 PM
To: dev@kylin.apache.org
Subject: How to increase split number for Fact distinct columns when using
spark engine?
Please check this doc:
https://kylin.apache.org/docs/howto/howto_optimize_build.html
陈熹(chenxi07)-技术产品中心 wrote on Monday, November 5, 2018 at 3:25 PM:
> Hi:
>
> I’m sorry the picture is dead again.
>
> I upload it as
Hi Xi,
The core idea is the same: by default, both MR and Spark run one
container/task per file block. If we divide the data into more splits, we
will get more concurrent tasks:
kylin.engine.mr.mapper-input-rows=500000
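
As a rough illustration of why lowering this value raises concurrency (a sketch under stated assumptions, not Kylin's actual code): assuming the input is divided so that each split holds about `mapper-input-rows` rows and one task runs per split, the task count scales roughly as below. The function name and the row counts are hypothetical.

```python
import math

def estimated_tasks(total_rows: int, mapper_input_rows: int) -> int:
    """Rough estimate, assuming one task per chunk of mapper_input_rows rows."""
    if mapper_input_rows <= 0:
        raise ValueError("mapper_input_rows must be positive")
    # Even a tiny input still needs at least one task.
    return max(1, math.ceil(total_rows / mapper_input_rows))

# Hypothetical fact table of 10 million rows.
print(estimated_tasks(10_000_000, 1_000_000))  # 10
print(estimated_tasks(10_000_000, 500_000))    # 20
```

Halving the rows per split doubles the estimated number of concurrent tasks, as long as the cluster has enough free executors to run them.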
陈熹(chenxi07)-技术产品中心 wrote on Monday, November 5, 2018 at 3:52 PM:
> Hi,
From: Support DrakosData
Sent: Monday, November 5, 2018 4:01 PM
To: dev@kylin.apache.org; 陈熹(chenxi07)-技术产品中心
Subject: Re: How to increase split number for Fact distinct columns when using
spark engine?(picture added)
Hi Xi Chen,
I think you are referring to 'kylin.engine.spark.rdd-partition-c