Sean, I did specify the number of cores to use as follows:
... ...
val sparkConf = new SparkConf()
  .setAppName("<<< Reading HBase >>>")
  .set("spark.cores.max", "32")
val sc = new SparkContext(sparkConf)
... ...

But that does not solve the problem --- only 2 workers are allocated. I'm
using Spark 0.9 and submitting my job in YARN client mode.

Actually, setting *spark.cores.max* applies only when the job runs on a
*standalone deploy cluster* or a *Mesos cluster in "coarse-grained" sharing
mode*. Please refer to this link:
http://spark.apache.org/docs/0.9.1/configuration.html

So how do I specify the number of executors when submitting a Spark 0.9
job in YARN client mode?

2014-10-08 15:09 GMT+08:00 Sean Owen <so...@cloudera.com>:

> You do need to specify the number of executor cores to use. Executors
> are not like mappers. After all, they may do much more in their lifetime
> than just read splits from HBase, so it would not make sense to determine
> it by something that the first line of the program does.
>
> On Oct 8, 2014 8:00 AM, "Tao Xiao" <xiaotao.cs....@gmail.com> wrote:
>
>> Hi Sean,
>>
>> Do I need to specify the number of executors when submitting the job?
>> I suppose the number of executors will be determined by the number of
>> regions of the table, just as in a MapReduce job you needn't specify
>> the number of map tasks when reading from an HBase table.
>>
>> The script I used to submit my job can be seen in my second post.
>> Please refer to that.
>>
>> 2014-10-08 13:44 GMT+08:00 Sean Owen <so...@cloudera.com>:
>>
>>> How did you run your program? I don't see from your earlier post that
>>> you ever asked for more executors.
>>>
>>> On Wed, Oct 8, 2014 at 4:29 AM, Tao Xiao <xiaotao.cs....@gmail.com>
>>> wrote:
>>> > I found the reason why reading HBase was so slow. Although each
>>> > regionserver serves multiple regions for the table I'm reading, the
>>> > number of Spark workers allocated by YARN is too low. Actually, I
>>> > could see that the table has dozens of regions spread over about 20
>>> > regionservers, but only two Spark workers were allocated by YARN.
>>> > What is worse, the two workers ran one after the other, so the Spark
>>> > job lost all parallelism.
>>> >
>>> > So now the question is: why are only 2 workers allocated?
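P.S. After re-reading the "Launching Spark on YARN" page for 0.9, it looks
like yarn-client mode does not take the executor count from SparkConf at
all, but from environment variables that are read when the SparkContext
starts. If I understand the docs correctly, something like the following
should request 20 executors. This is only a sketch of what I plan to try,
not yet verified on my cluster:

    export SPARK_WORKER_INSTANCES=20   # number of executors to request
    export SPARK_WORKER_CORES=4        # cores per executor
    export SPARK_WORKER_MEMORY=2g      # memory per executor

and then in the driver, with no spark.cores.max at all:

    val sparkConf = new SparkConf()
      .setAppName("<<< Reading HBase >>>")
      .setMaster("yarn-client")
    val sc = new SparkContext(sparkConf)

The docs list a default of 2 for SPARK_WORKER_INSTANCES, which would also
explain why exactly two workers were allocated. Can anyone confirm this is
the right knob for 0.9 in yarn-client mode?

Also, regarding Sean's point that executors are independent of the HBase
splits: as I understand it, the number of *partitions* should still follow
the regions, since TableInputFormat creates one input split per region. If
that's right, a snippet like this should print roughly the region count no
matter how many executors are running (the table name is just a
placeholder):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")  // placeholder
    val hbaseRDD = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(hbaseRDD.partitions.length)  // expected: one partition per region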