Sean, I did specify the number of cores to use as follows:
... ...
val sparkConf = new SparkConf()
  .setAppName("<<< Reading HBase >>>")
  .set("spark.cores.max", "32")
val sc = new SparkContext(sparkConf)
... ...

But that does not solve the problem --- only 2 workers are allocated. I'm
using Spark 0.9 and submitting my job in YARN client mode.

Actually, setting *spark.cores.max* applies only when the job runs on a
*standalone deploy cluster* or a *Mesos cluster in "coarse-grained" sharing
mode*. Please refer to this link:
http://spark.apache.org/docs/0.9.1/configuration.html

So how do I specify the number of executors when submitting a Spark 0.9
job in YARN client mode?

2014-10-08 15:09 GMT+08:00 Sean Owen <so...@cloudera.com>:

> You do need to specify the number of executor cores to use. Executors
> are not like mappers. After all, they may do much more in their lifetime
> than just read splits from HBase, so it would not make sense to determine
> it by something that the first line of the program does.
>
> On Oct 8, 2014 8:00 AM, "Tao Xiao" <xiaotao.cs....@gmail.com> wrote:
>
>> Hi Sean,
>>
>> Do I need to specify the number of executors when submitting the job?
>> I suppose the number of executors will be determined by the number of
>> regions of the table, just as in a MapReduce job you needn't specify
>> the number of map tasks when reading from an HBase table.
>>
>> The script I used to submit my job can be seen in my second post.
>> Please refer to that.
>>
>> 2014-10-08 13:44 GMT+08:00 Sean Owen <so...@cloudera.com>:
>>
>>> How did you run your program? I don't see from your earlier post that
>>> you ever asked for more executors.
>>>
>>> On Wed, Oct 8, 2014 at 4:29 AM, Tao Xiao <xiaotao.cs....@gmail.com>
>>> wrote:
>>> > I found the reason why reading HBase was so slow. Although each
>>> > regionserver serves multiple regions for the table I'm reading, the
>>> > number of Spark workers allocated by YARN is too low. Actually, I
>>> > could see that the table has dozens of regions spread over about 20
>>> > regionservers, but only two Spark workers were allocated by YARN.
>>> > What is worse, the two workers ran one after the other, so the Spark
>>> > job lost all parallelism.
>>> >
>>> > So now the question is: why are only 2 workers allocated?
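P.S. After re-reading the "Launching Spark on YARN" page for 0.9, it looks
like yarn-client mode does not take the executor count from SparkConf at
all, but from environment variables that are read when the SparkContext
starts. If I understand the docs correctly, something like the following
should request 20 executors. This is only a sketch of what I plan to try,
not yet verified on my cluster:

    export SPARK_WORKER_INSTANCES=20   # number of executors to request
    export SPARK_WORKER_CORES=4        # cores per executor
    export SPARK_WORKER_MEMORY=2g      # memory per executor

and then in the driver, with no spark.cores.max at all:

    val sparkConf = new SparkConf()
      .setAppName("<<< Reading HBase >>>")
      .setMaster("yarn-client")
    val sc = new SparkContext(sparkConf)

The docs list a default of 2 for SPARK_WORKER_INSTANCES, which would also
explain why exactly two workers were allocated. Can anyone confirm this is
the right knob for 0.9 in yarn-client mode?

Also, regarding Sean's point that executors are independent of the HBase
splits: as I understand it, the number of *partitions* should still follow
the regions, since TableInputFormat creates one input split per region. If
that's right, a snippet like this should print roughly the region count no
matter how many executors are running (the table name is just a
placeholder):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")  // placeholder
    val hbaseRDD = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(hbaseRDD.partitions.length)  // expected: one partition per region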