Hi Sandeep,

I have not enabled checkpointing. I will try enabling checkpointing and
observe the memory pattern, but what exactly do you want to correlate with
checkpointing? I don't know much about checkpointing.
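
From the Spark Streaming programming guide, my understanding is that enabling
it looks roughly like this (a minimal sketch; the app name and checkpoint
directory below are just placeholders, not what I actually use):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("kafka-to-mongo")
// 1-minute batch interval, matching my job
val ssc = new StreamingContext(conf, Seconds(60))
// enable checkpointing by pointing the context at a reliable directory
ssc.checkpoint("/tmp/spark-checkpoints")

// ... create the direct Kafka stream, aggregate, write to Mongo ...

ssc.start()
ssc.awaitTermination()

Is that roughly what you had in mind?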


Thanks and regards,

Aashish Kumar

Software Engineer

Avekshaa Technologies (P) Ltd. | www.avekshaa.com

+91-9164495083

Performance Excellence Assured

Deloitte Technology Fast 50 India | Technology Fast 500 APAC 2014

NASSCOM Emerge 50, 2013
Express IT Awards - IT Innovation: Winner (Silver), 2015

Every 3000 sheets of A4 paper cost one tree. Please do not print this email
unless you really need to; save the environment and energy.

On Tue, Aug 9, 2016 at 5:30 PM, Sandeep Nemuri <nhsande...@gmail.com> wrote:

> Hi Aashish,
>
> Do you have checkpointing enabled? If not, can you try enabling
> checkpointing and observing the memory pattern?
>
> Thanks,
> Sandeep
>
> On Tue, Aug 9, 2016 at 4:25 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi Aashish,
>>
>> You are running in standalone mode on a single node.
>>
>> As I read it, you start the master and 5 workers pop up because of
>> SPARK_WORKER_INSTANCES=5. I gather you use start-slaves.sh?
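>>
>> That is, something along the lines of (a sketch, assuming the stock sbin
>> scripts and that spark-env.sh carries your worker settings):
>>
>> ${SPARK_HOME}/sbin/start-master.sh
>> ${SPARK_HOME}/sbin/start-slaves.sh   # starts SPARK_WORKER_INSTANCES workers on each host in conf/slaves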
>>
>> Now that is just the number of workers; given the low memory allocated to
>> them, port 8080 should show practically no memory used while idle. Also,
>> every worker has been allocated 1 core (SPARK_WORKER_CORE=1).
>>
>> Now it all depends on how you start your spark-submit job and what
>> parameters you pass to it.
>>
>> ${SPARK_HOME}/bin/spark-submit \
>>                 --driver-memory 1G \
>>                 --num-executors 2 \
>>                 --executor-cores 1 \
>>                 --executor-memory 1G \
>>                 --master spark://<IP>:7077 \
>>
>> What are your parameters here? From my experience, standalone mode has a
>> mind of its own and does not always follow what you have asked for.
>>
>> If you increase the number of cores for the workers, you may reduce the
>> memory issue, because multiple tasks can then effectively run on subsets
>> of your data.
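>>
>> For example, in conf/spark-env.sh, something along these lines (the numbers
>> are purely illustrative for a 16-core, 64GB box):
>>
>> export SPARK_WORKER_INSTANCES=2
>> export SPARK_WORKER_CORES=4
>> export SPARK_WORKER_MEMORY=8g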
>>
>> HTH
>>
>> P.S. I don't use SPARK_MASTER_OPTS
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> Disclaimer: Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 9 August 2016 at 11:21, aasish.kumar <aasish.ku...@avekshaa.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am running Spark v1.6.1 on a single machine in standalone mode, with
>>> 64 GB of RAM and 16 cores.
>>>
>>> I have created five worker instances in order to get five executors,
>>> since in standalone mode there cannot be more than one executor per
>>> worker node.
>>>
>>> Configuration:
>>>
>>> SPARK_WORKER_INSTANCES 5
>>> SPARK_WORKER_CORE 1
>>> SPARK_MASTER_OPTS "-Dspark.deploy.default.Cores=5"
>>>
>>> All other configurations are left at their defaults in spark-env.sh.
>>>
>>> I am running a Spark Streaming direct Kafka job with a 1-minute batch
>>> interval, which takes data from Kafka and, after some aggregation, writes
>>> the data to Mongo.
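>>>
>>> The job is essentially of this shape (simplified; the app name, broker,
>>> topic and the aggregation itself are placeholders, not my real code):
>>>
>>> import kafka.serializer.StringDecoder
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.streaming.{Seconds, StreamingContext}
>>> import org.apache.spark.streaming.kafka.KafkaUtils
>>>
>>> val conf = new SparkConf().setAppName("streaming-aggregation")
>>> val ssc = new StreamingContext(conf, Seconds(60))   // 1-minute batches
>>>
>>> val kafkaParams = Map("metadata.broker.list" -> "broker-host:9092")
>>> val stream = KafkaUtils.createDirectStream[String, String,
>>>   StringDecoder, StringDecoder](ssc, kafkaParams, Set("events"))
>>>
>>> val aggregated = stream
>>>   .map { case (_, value) => (value, 1L) }   // key each record
>>>   .reduceByKey(_ + _)                       // per-batch aggregation
>>>
>>> aggregated.foreachRDD { rdd =>
>>>   val count = rdd.count()   // stand-in for the real write to Mongo
>>>   println(s"aggregated records this batch: $count")
>>> }
>>>
>>> ssc.start()
>>> ssc.awaitTermination()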
>>>
>>> Problems:
>>>
>>> > When I start the master and slaves, it starts one master process and
>>> > five worker processes, each consuming only about 212 MB of RAM. When I
>>> > submit the job, it also creates 5 executor processes and 1 driver (job)
>>> > process, and the total memory usage grows to about 8 GB and keeps
>>> > growing slowly over time, even when there is no data to process.
>>>
>>> I am also unpersisting the cached RDDs at the end, and I have set
>>> spark.cleaner.ttl to 600, but the memory is still growing.
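>>>
>>> The unpersist happens inside the foreachRDD of the sketch above, roughly
>>> like this (again simplified):
>>>
>>> aggregated.foreachRDD { rdd =>
>>>   rdd.cache()
>>>   // ... write the contents of rdd to Mongo ...
>>>   rdd.unpersist()
>>> }
>>>
>>> and spark.cleaner.ttl is set to 600 (e.g. via --conf spark.cleaner.ttl=600
>>> on spark-submit).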
>>>
>>> > One more thing: I have seen that SPARK-1706 has been merged, so why am
>>> > I unable to create multiple executors within a worker? Also, in the
>>> > spark-env.sh file, every executor-related setting seems to apply to
>>> > YARN mode only.
>>>
>>> I have also tried running the example programs, but I see the same
>>> problem.
>>>
>>> Any help would be greatly appreciated,
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Job-Keeps-growing-memory-over-time-tp27498.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>
>>
>
>
> --
> Regards,
> Sandeep Nemuri
>
