Are you using a --master argument, or an equivalent config setting, when calling spark-submit? If you don't, spark-submit defaults to local mode and the whole application runs in a single JVM on the machine you submit from, regardless of what the cluster looks like.
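For comparison, an explicit submission against the standalone master would look roughly like this (the master URL is the one quoted later in the thread, the memory sizes echo Jakub's stated settings, and --total-executor-cores 12 is purely illustrative):

    spark-submit \
      --master spark://spark.master:7077 \
      --deploy-mode client \
      --driver-memory 4g \
      --executor-memory 12g \
      --total-executor-cores 12 \
      --driver-class-path spark/sqljdbc4.jar \
      --class DemoApp \
      SparkPOC.jar 10 4.3

If the job behaves differently with --master spelled out than it does when relying on spark-defaults.conf, then the defaults file is not being picked up.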
On Mon, Jul 4, 2016 at 2:27 PM Jakub Stransky <stransky...@gmail.com> wrote:

Hi Mich,

Sure, the workers are listed in the slaves file. I can see them in the Spark master UI, and after startup they are even "blocked" (reserved) for this application, but their CPU and memory consumption is close to nothing.

Thanks
Jakub

On 4 July 2016 at 18:36, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Silly question: have you added your workers to the conf/slaves file, and have you run sbin/start-slaves.sh?

When you type jps on the master node, what do you see?

The problem seems to be that the workers are ignored and Spark is essentially running in local mode.

HTH

Dr Mich Talebzadeh
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 4 July 2016 at 17:05, Jakub Stransky <stransky...@gmail.com> wrote:

Hi Mich,

I have set up the Spark default configuration in the conf directory, in spark-defaults.conf, where I specify the master, hence no need to put it on the command line:

    spark.master  spark://spark.master:7077

The same applies to driver memory, which has been increased to 4 GB, and to spark.executor.memory, set to 12 GB since the machines have 16 GB.

Jakub
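Put together, the spark-defaults.conf Jakub describes would read roughly as follows (a sketch assuming exactly the values he states; the property names themselves are standard Spark settings):

    # conf/spark-defaults.conf on the machine running spark-submit
    spark.master           spark://spark.master:7077
    spark.driver.memory    4g
    spark.executor.memory  12g

One related pitfall worth ruling out: setting spark.driver.memory through SparkConf inside the application has no effect in client mode, because the driver JVM is already running by then; the defaults file or the --driver-memory flag is the right place for it, as here.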
On 4 July 2016 at 17:44, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Jakub,

In standalone mode Spark does the resource management itself. Which version of Spark are you running?

How do you define your SparkConf() parameters, for example setMaster etc.?

From

    spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp SparkPOC.jar 10 4.3

I do not see any executor or memory allocation, so I assume you are allocating them somewhere else?

HTH

Dr Mich Talebzadeh
http://talebzadehmich.wordpress.com

On 4 July 2016 at 16:31, Jakub Stransky <stransky...@gmail.com> wrote:

Hello,

I have a Spark cluster consisting of 4 nodes in standalone mode: a master plus 3 worker nodes, with available memory, CPUs etc. configured.

I have a Spark application which is essentially an MLlib pipeline for training a classifier, in this case a RandomForest, though it could be a DecisionTree just for the sake of simplicity.

But when I submit the application to the cluster via spark-submit, it runs out of memory. Even though the executors are "taken"/created in the cluster, they are essentially doing nothing (barely any CPU or memory utilization), while the master seems to do all the work, which finally results in an OOM.

My submission is the following:

    spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp SparkPOC.jar 10 4.3

I am submitting from the master node.

By default it runs in client mode, in which the driver process stays attached to the spark-submit shell.

Do I need to set something to make the MLlib algorithms parallelized and distributed as well, or is it all driven by the parallelism of the DataFrame holding the input data?

Essentially it seems that all the work is done on the master while the rest sits idle. Any hints on what to check?

Thx
Jakub

--
Jakub Stransky
cz.linkedin.com/in/jakubstransky

--
Mathieu Longtin
1-514-803-8977
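On the last question: MLlib does not add parallelism of its own; RandomForest training is distributed across whatever partitions the input DataFrame has. If the data arrives through a single-partition source (a DataFrame read over a plain JDBC connection, as the sqljdbc4.jar here suggests, lands in one partition by default), all the work collapses onto one task. A minimal sketch of the usual fix in Scala; the column names, partition count, and the df value are illustrative, not taken from DemoApp:

    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.sql.DataFrame

    // df: training data with "label" and "features" columns (assumed names).
    def train(df: DataFrame) = {
      // Spread the rows across the cluster before fitting; a JDBC-sourced
      // DataFrame otherwise arrives as one partition and trains in one task.
      val partitioned = df.repartition(36) // ~2-3 x total executor cores, illustrative
      new RandomForestClassifier()
        .setLabelCol("label")
        .setFeaturesCol("features")
        .setNumTrees(10)
        .fit(partitioned)
    }

After such a change, the stage view in the Spark UI should show tasks running on all three workers; if it still shows a single task, the input is being materialized on the driver before training, which would also explain the driver-side OOM.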