Re: long GC pause during file.cache()

Nan Zhu Sun, 15 Jun 2014 13:50:34 -0700

SPARK_JAVA_OPTS is deprecated in 1.0, though it still works if you don’t mind 
the WARNING in the logs.


You can set spark.executor.extraJavaOptions in your SparkConf object instead.
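
A minimal sketch of what that could look like in a standalone application. The property name is spark.executor.extraJavaOptions; the object name, app name, and the 2g executor memory are only illustrative, and the GC flags are the ones from Hao's message below. Note that heap size should not be passed through this property (no -Xmx there); spark.executor.memory is the setting for that.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the master URL is expected to come from spark-submit.
object GcTuningExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("GcTuningExample") // illustrative name
      // GC-related JVM flags for the executors (no heap-size flags here)
      .set("spark.executor.extraJavaOptions",
           "-XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit")
      // Executor heap size is configured separately; 2g is an assumed value
      .set("spark.executor.memory", "2g")

    val sc = new SparkContext(conf)
    // ... job code goes here ...
    sc.stop()
  }
}
```

In spark-shell the SparkContext is already created for you, so I believe the same property would have to go into conf/spark-defaults.conf rather than a SparkConf built by hand.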

Best,

--  
Nan Zhu


On Sunday, June 15, 2014 at 12:13 PM, Hao Wang wrote:

> Hi, Wei
>  
> You may try setting JVM options in spark-env.sh as follows 
> to prevent or mitigate GC pauses:
>  
> export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC 
> -Xmx2g -XX:MaxPermSize=256m"
>  
> There are more options you could add; please just Google them :)  
>  
>  
> Regards,
> Wang Hao(王灏)
>  
> CloudTeam | School of Software Engineering
> Shanghai Jiao Tong University
> Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
> Email: wh.s...@gmail.com
>  
> On Sun, Jun 15, 2014 at 10:24 AM, Wei Tan <w...@us.ibm.com> wrote:
> > Hi,  
> >  
> >   I have a single-node (192G RAM) standalone Spark setup, with memory 
> > configured like this in spark-env.sh:  
> >  
> > SPARK_WORKER_MEMORY=180g  
> > SPARK_MEM=180g  
> >  
> >  
> >  In spark-shell I have a program like this:  
> >  
> > val file = sc.textFile("/localpath") //file size is 40G  
> > file.cache()  
> >  
> >  
> > val output = file.map(line => extract something from line)  
> >  
> > output.saveAsTextFile (...)  
> >  
> >  
> > When I run this program repeatedly, or cycle through file.unpersist() 
> > --> file.cache() --> output.saveAsTextFile(), the run time varies a lot, 
> > from 1 min to 3 min to 50+ min. Whenever the run time exceeds 1 min, 
> > the stage monitoring GUI shows long GC pauses (some 10+ min). 
> > When the run time is "normal", say ~1 min, no significant GC is 
> > observed. The behavior seems somewhat random.  
> >  
> > Is there any JVM tuning I should do to prevent this long GC pause from 
> > happening?  
> >  
> >  
> >  
> > I am using java-1.6.0-openjdk.x86_64, and my spark-shell process looks 
> > like this:  
> >  
> > root     10994  1.7  0.6 196378000 1361496 pts/51 Sl+ 22:06   0:12 
> > /usr/lib/jvm/java-1.6.0-openjdk.x86_64/bin/java -cp 
> > ::/home/wtan/scala/spark-1.0.0-bin-hadoop1/conf:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-core-3.2.2.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-rdbms-3.2.1.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-api-jdo-3.2.1.jar
> >  -XX:MaxPermSize=128m -Djava.library.path= -Xms180g -Xmx180g 
> > org.apache.spark.deploy.SparkSubmit spark-shell --class 
> > org.apache.spark.repl.Main  
> >  
> > Best regards,  
> > Wei  
> >  
> > ---------------------------------  
> > Wei Tan, PhD  
> > Research Staff Member  
> > IBM T. J. Watson Research Center  
> > http://researcher.ibm.com/person/us-wtan
