This sounds like a YARN configuration problem. Parallelism is good :), and not all map and reduce tasks run at the same time. Check settings such as:
- yarn.nodemanager.resource.memory-mb per node
- yarn.nodemanager.resource.cpu-vcores per node

This can help you get started: https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html

If your cluster is very small, a block size of 256 MB may be too big; you can try 128 MB.

On 27 May 2017 at 08:49, jianhui.yi <jianhui...@zhiyoubao.com> wrote:

> My model has 7 tables and a cube with 15 dimensions. The “Convert Cuboid
> Data to HFile” step starts too many maps and reduces (500+ maps, 1.4k+
> reduces), and this step consumes all the resources of the small cluster.
>
> I set these parameters in the cluster:
>
> dfs.block.size=256M
> hive.exec.reducers.bytes.per.reducer=1073741824
> hive.merge.mapfiles=true
> hive.merge.mapredfiles=true
> hive.merge.size.per.task=256M
>
> The kylin_hive_conf.xml file uses the default settings.
>
> Where can I tune performance?
>
> Thanks.
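
To make the point about the two NodeManager settings concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from YARN itself. It assumes that both memory and vcores limit how many containers fit on a node (depending on the scheduler's resource calculator, only memory may be enforced), and it ignores the ApplicationMaster container. The cluster size and container size are hypothetical; only the task counts (500 maps, 1400 reduces) come from the message above.

```python
import math


def max_concurrent_containers(nodes, node_memory_mb, node_vcores,
                              container_memory_mb, container_vcores=1):
    """Estimate how many task containers the cluster can run at once.

    node_memory_mb stands for yarn.nodemanager.resource.memory-mb and
    node_vcores for yarn.nodemanager.resource.cpu-vcores (per node).
    """
    by_memory = node_memory_mb // container_memory_mb
    by_vcores = node_vcores // container_vcores
    # The tighter of the two limits decides how many containers fit per node.
    return nodes * min(by_memory, by_vcores)


def waves(tasks, concurrent):
    """Number of rounds needed to run all tasks with the given concurrency."""
    return math.ceil(tasks / concurrent)


# Hypothetical small cluster: 3 nodes, each giving 16384 MB and 8 vcores
# to YARN, with 2048 MB containers using 1 vcore each.
slots = max_concurrent_containers(3, 16384, 8, 2048)
print("concurrent containers:", slots)
print("map waves:", waves(500, slots))
print("reduce waves:", waves(1400, slots))
```

With numbers like these, the tasks queue up and run in waves instead of all starting at once, so lowering the per-node values leaves headroom for other jobs at the cost of a longer build step.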