You need to use `spark.sql.shuffle.partitions`; `spark.default.parallelism` does not apply to Spark SQL shuffles, whose partition count defaults to 200.

// maropu
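For example, a minimal sketch (assuming a Spark 1.6-era spark-shell, where `sqlContext` is predefined; the table and column names below are hypothetical):

```scala
// Lower the reduce-task count for SQL/DataFrame shuffles at runtime.
// spark.default.parallelism has no effect here.
sqlContext.setConf("spark.sql.shuffle.partitions", "20")

// Any shuffle-producing query (GROUP BY, JOIN, ...) now runs with
// 20 reduce tasks instead of the default 200.
val counts = sqlContext.sql(
  "SELECT userid, COUNT(*) AS cnt FROM userprofile GROUP BY userid")
println(counts.rdd.partitions.length)  // 20
```

Equivalently, you can put `spark.sql.shuffle.partitions 20` in spark-defaults.conf, which also means only 20 output files are written instead of 200 (many of them empty when the data is small).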
On Fri, May 20, 2016 at 8:17 PM, <251922...@qq.com> wrote:

> Hi all.
> I set `spark.default.parallelism` to 20 in spark-defaults.conf and sent the
> file to all nodes, but the reduce task count is still the default value, 200.
> Has anyone else encountered this problem? Can anyone give some advice?
>
> ############
> [Stage 9:> (0 + 0) / 200]
> [Stage 9:> (0 + 2) / 200]
> [Stage 9:> (1 + 2) / 200]
> [Stage 9:> (2 + 2) / 200]
> #######
>
> This also results in many empty files: because my data is small, only some
> of the 200 output files contain data.
> #######
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00000
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00001
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00002
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00003
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00004
> 2016-05-20 17:01 /warehouse/dmpv3.db/datafile/tmp/output/userprofile/20160519/part-00005
> ########

--
---
Takeshi Yamamuro