Are you using a --master argument, or an equivalent config setting, when calling spark-submit? If you don't, spark-submit defaults to local mode and the whole application runs in a single JVM on the machine you submit from, regardless of what the cluster looks like.
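For comparison, an explicit submission against the standalone master would look roughly like this (the master URL is the one quoted later in the thread, the memory sizes echo Jakub's stated settings, and --total-executor-cores 12 is purely illustrative):

    spark-submit \
      --master spark://spark.master:7077 \
      --deploy-mode client \
      --driver-memory 4g \
      --executor-memory 12g \
      --total-executor-cores 12 \
      --driver-class-path spark/sqljdbc4.jar \
      --class DemoApp \
      SparkPOC.jar 10 4.3

If the job behaves differently with --master spelled out than it does when relying on spark-defaults.conf, then the defaults file is not being picked up.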
On Mon, Jul 4, 2016 at 2:27 PM Jakub Stransky <stransky...@gmail.com> wrote:

Hi Mich,

Sure, the workers are listed in the slaves file. I can see them in the Spark master UI, and after startup they are even "blocked" (reserved) for this application, but their CPU and memory consumption is close to nothing.

Thanks
Jakub

On 4 July 2016 at 18:36, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Silly question: have you added your workers to the conf/slaves file, and have you run sbin/start-slaves.sh?

When you type jps on the master node, what do you see?

The problem seems to be that the workers are ignored and Spark is essentially running in local mode.

HTH

Dr Mich Talebzadeh
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 4 July 2016 at 17:05, Jakub Stransky <stransky...@gmail.com> wrote:

Hi Mich,

I have set up the Spark default configuration in the conf directory, in spark-defaults.conf, where I specify the master, hence no need to put it on the command line:

    spark.master  spark://spark.master:7077

The same applies to driver memory, which has been increased to 4 GB, and to spark.executor.memory, set to 12 GB since the machines have 16 GB.

Jakub
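Put together, the spark-defaults.conf Jakub describes would read roughly as follows (a sketch assuming exactly the values he states; the property names themselves are standard Spark settings):

    # conf/spark-defaults.conf on the machine running spark-submit
    spark.master           spark://spark.master:7077
    spark.driver.memory    4g
    spark.executor.memory  12g

One related pitfall worth ruling out: setting spark.driver.memory through SparkConf inside the application has no effect in client mode, because the driver JVM is already running by then; the defaults file or the --driver-memory flag is the right place for it, as here.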
On 4 July 2016 at 17:44, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Jakub,

In standalone mode Spark does the resource management itself. Which version of Spark are you running?

How do you define your SparkConf() parameters, for example setMaster etc.?

From

    spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp SparkPOC.jar 10 4.3

I do not see any executor or memory allocation, so I assume you are allocating them somewhere else?

HTH

Dr Mich Talebzadeh
http://talebzadehmich.wordpress.com

On 4 July 2016 at 16:31, Jakub Stransky <stransky...@gmail.com> wrote:

Hello,

I have a Spark cluster consisting of 4 nodes in standalone mode: a master plus 3 worker nodes, with available memory, CPUs etc. configured.

I have a Spark application which is essentially an MLlib pipeline for training a classifier, in this case a RandomForest, though it could be a DecisionTree just for the sake of simplicity.

But when I submit the application to the cluster via spark-submit, it runs out of memory. Even though the executors are "taken"/created in the cluster, they are essentially doing nothing (barely any CPU or memory utilization), while the master seems to do all the work, which finally results in an OOM.

My submission is the following:

    spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp SparkPOC.jar 10 4.3

I am submitting from the master node.

By default it runs in client mode, in which the driver process stays attached to the spark-submit shell.

Do I need to set something to make the MLlib algorithms parallelized and distributed as well, or is it all driven by the parallelism of the DataFrame holding the input data?

Essentially it seems that all the work is done on the master while the rest sits idle. Any hints on what to check?

Thx
Jakub

--
Jakub Stransky
cz.linkedin.com/in/jakubstransky

--
Mathieu Longtin
1-514-803-8977
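On the last question: MLlib does not add parallelism of its own; RandomForest training is distributed across whatever partitions the input DataFrame has. If the data arrives through a single-partition source (a DataFrame read over a plain JDBC connection, as the sqljdbc4.jar here suggests, lands in one partition by default), all the work collapses onto one task. A minimal sketch of the usual fix in Scala; the column names, partition count, and the df value are illustrative, not taken from DemoApp:

    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.sql.DataFrame

    // df: training data with "label" and "features" columns (assumed names).
    def train(df: DataFrame) = {
      // Spread the rows across the cluster before fitting; a JDBC-sourced
      // DataFrame otherwise arrives as one partition and trains in one task.
      val partitioned = df.repartition(36) // ~2-3 x total executor cores, illustrative
      new RandomForestClassifier()
        .setLabelCol("label")
        .setFeaturesCol("features")
        .setNumTrees(10)
        .fit(partitioned)
    }

After such a change, the stage view in the Spark UI should show tasks running on all three workers; if it still shows a single task, the input is being materialized on the driver before training, which would also explain the driver-side OOM.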