Sounds like a YARN configuration problem.
Parallelizing is good :), but not all maps/reduces are executed at the same time.
Check configurations such as:

   - yarn.nodemanager.resource.memory-mb per node
   - yarn.nodemanager.resource.cpu-vcores per node

This can help you get started:
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html
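
As a rough illustration (the values below are placeholders, not recommendations;
size them to your own node hardware, leaving headroom for the OS and other
daemons), these per-node limits live in yarn-site.xml:

   <!-- yarn-site.xml: example values only, adjust to your nodes -->
   <property>
     <name>yarn.nodemanager.resource.memory-mb</name>
     <value>24576</value>   <!-- memory YARN may allocate on this node (24 GB) -->
   </property>
   <property>
     <name>yarn.nodemanager.resource.cpu-vcores</name>
     <value>6</value>       <!-- vcores YARN may allocate on this node -->
   </property>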

If your cluster is very small, a 256 MB block size can be too big; you can
try 128 MB instead.
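
For example (just a sketch, assuming Hadoop 2.x property names), in
hdfs-site.xml:

   <!-- hdfs-site.xml: set the default block size to 128 MB -->
   <property>
     <name>dfs.blocksize</name>
     <value>134217728</value>   <!-- 128 MB instead of 256 MB -->
   </property>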

On 27 May 2017 at 08:49, jianhui.yi <jianhui...@zhiyoubao.com> wrote:

> My model has 7 tables and a cube with 15 dimensions. The “Convert Cuboid
> Data to HFile” step starts too many maps and reduces (500+ maps, 1.4k+
> reduces), and this step consumes all the resources of the small cluster.
>
> I set these parameters in the cluster:
>
> dfs.block.size=256M
>
> hive.exec.reducers.bytes.per.reducer=1073741824
>
> hive.merge.mapfiles=true
>
> hive.merge.mapredfiles=true
>
> hive.merge.size.per.task=256M
>
>
>
> kylin_hive_conf.xml uses the default settings.
>
> Where can I tune the performance?
>
> Thanks.
>
