Hi Aashish,

You are running in standalone mode with one node.

As I read it, you start the master and 5 workers pop up from
SPARK_WORKER_INSTANCES=5. I gather you use start-slaves.sh?
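
For reference, this is roughly how I would bring the cluster up (just a
sketch, assuming the standard sbin scripts and that spark-env.sh is already
in place):

    # start the standalone master (web UI on port 8080 by default)
    ${SPARK_HOME}/sbin/start-master.sh

    # start the workers; with SPARK_WORKER_INSTANCES=5 this launches 5 worker JVMs
    ${SPARK_HOME}/sbin/start-slaves.sh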

Now, with that number of workers and the low memory on them, the master UI
on port 8080 should show practically no memory used while idle. Also, every
worker has been allocated 1 core (SPARK_WORKER_CORE=1).
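
For example, something like this in conf/spark-env.sh (a sketch; note the
variable is SPARK_WORKER_CORES with an S, and the memory value is just an
assumption to adjust for your box):

    # conf/spark-env.sh (sketch)
    export SPARK_WORKER_INSTANCES=5   # number of worker JVMs on the node
    export SPARK_WORKER_CORES=1       # cores offered by each worker
    export SPARK_WORKER_MEMORY=2g     # memory offered by each worker (assumed value)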

Now it all depends on how you run your spark-submit job and what parameters
you pass to it.

${SPARK_HOME}/bin/spark-submit \
                --driver-memory 1G \
                --num-executors 2 \
                --executor-cores 1 \
                --executor-memory 1G \
                --master spark://<IP>:7077 \
                <your application jar> [application args]
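
Note that --num-executors is really a YARN option. As far as I know, in
standalone mode the number of executors falls out of --total-executor-cores
(spark.cores.max) divided by --executor-cores, so a sketch of the equivalent
flags would be:

                --executor-cores 1 \
                --executor-memory 1G \
                --total-executor-cores 2 \

i.e. spark.cores.max = 2 with 1 core per executor should give you 2
executors.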

What are your parameters here? From my experience, standalone mode has a
mind of its own and does not always follow what you have asked for.

If you increase the number of cores per worker, you may reduce the memory
issue, because effectively multiple tasks can run in parallel, each on a
sub-set of your data.
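
For instance, a sketch of that change (hypothetical numbers for a 16-core
box; remember to restart the workers afterwards):

    # conf/spark-env.sh
    export SPARK_WORKER_CORES=3       # each worker can now run up to 3 tasks in parallel

    # and match it when you submit
    ${SPARK_HOME}/bin/spark-submit --executor-cores 3 ...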

HTH

P.S. I don't use SPARK_MASTER_OPTS


Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 9 August 2016 at 11:21, aasish.kumar <aasish.ku...@avekshaa.com> wrote:

> Hi,
>
> I am running Spark v1.6.1 on a single machine in standalone mode, which has
> 64GB RAM and 16 cores.
>
> I have created five worker instances to get five executors, since in
> standalone mode there cannot be more than one executor per worker node.
>
> *Configuration*:
>
> SPARK_WORKER_INSTANCES 5
> SPARK_WORKER_CORE 1
> SPARK_MASTER_OPTS "-Dspark.deploy.default.Cores=5"
>
> All other configurations are left at their defaults in spark-env.sh.
>
> I am running a Spark Streaming direct Kafka job at an interval of 1 min,
> which takes data from Kafka and, after some aggregation, writes the data to
> Mongo.
>
> *Problems:*
>
> > When I start the master and slaves, it starts one master process and five
> > worker processes, each consuming only about 212 MB of RAM. When I submit
> > the job, it creates 5 executor processes and 1 job process, and the
> > memory usage grows to 8GB in total and keeps growing over time (slowly),
> > even when there is no data to process.
>
> I am also unpersisting the cached RDDs at the end, and have set
> spark.cleaner.ttl to 600, but memory is still growing.
>
> > One more thing: I have seen that SPARK-1706 has been merged, so why am I
> > unable to create multiple executors within a worker? Also, in the
> > spark-env.sh file, any configuration related to executors is documented
> > as YARN-only.
>
> I have also tried running the example programs, but I see the same problem.
>
> Any help would be greatly appreciated,
>
> Thanks
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Job-Keeps-growing-memory-over-time-tp27498.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
