Yes, I think it is listed in the comments in spark-env.sh.template (didn’t check…).
Best,

--
Nan Zhu

On Sunday, June 15, 2014 at 5:21 PM, Surendranauth Hiraman wrote:

> Is SPARK_DAEMON_JAVA_OPTS valid in 1.0.0?
>
> On Sun, Jun 15, 2014 at 4:59 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:
> > SPARK_JAVA_OPTS is deprecated in 1.0, though it works fine if you don’t
> > mind the WARNING in the logs.
> >
> > You can set spark.executor.extraJavaOptions in your SparkConf object.
> >
> > Best,
> >
> > --
> > Nan Zhu
> >
> > On Sunday, June 15, 2014 at 12:13 PM, Hao Wang wrote:
> > > Hi, Wei
> > >
> > > You may try to set JVM opts in spark-env.sh as follows to prevent or
> > > mitigate GC pauses:
> > >
> > > export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -Xmx2g -XX:MaxPermSize=256m"
> > >
> > > There are more options you could add; please just Google :)
> > >
> > > Regards,
> > > Wang Hao (王灏)
> > >
> > > CloudTeam | School of Software Engineering
> > > Shanghai Jiao Tong University
> > > Address: 800 Dongchuan Road, Minhang District, Shanghai, 200240
> > > Email: wh.s...@gmail.com
> > >
> > > On Sun, Jun 15, 2014 at 10:24 AM, Wei Tan <w...@us.ibm.com> wrote:
> > > > Hi,
> > > >
> > > > I have a single-node (192G RAM) standalone Spark, with memory
> > > > configuration like this in spark-env.sh:
> > > >
> > > > SPARK_WORKER_MEMORY=180g
> > > > SPARK_MEM=180g
> > > >
> > > > In spark-shell I have a program like this:
> > > >
> > > > val file = sc.textFile("/localpath") // file size is 40G
> > > > file.cache()
> > > > val output = file.map(line => extract something from line)
> > > > output.saveAsTextFile(...)
> > > >
> > > > When I run this program again and again, or keep trying
> > > > file.unpersist() --> file.cache() --> output.saveAsTextFile(), the run
> > > > time varies a lot, from 1 min to 3 min to 50+ min. Whenever the run
> > > > time is more than 1 min, from the stage-monitoring GUI I observe a big
> > > > GC pause (some can be 10+ min). Of course, when the run time is
> > > > "normal", say ~1 min, no significant GC is observed. The behavior
> > > > seems somewhat random.
> > > >
> > > > Is there any JVM tuning I should do to prevent this long GC pause
> > > > from happening?
> > > >
> > > > I used java-1.6.0-openjdk.x86_64, and my spark-shell process is
> > > > something like this:
> > > >
> > > > root 10994 1.7 0.6 196378000 1361496 pts/51 Sl+ 22:06 0:12
> > > > /usr/lib/jvm/java-1.6.0-openjdk.x86_64/bin/java -cp
> > > > ::/home/wtan/scala/spark-1.0.0-bin-hadoop1/conf:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-core-3.2.2.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-rdbms-3.2.1.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-api-jdo-3.2.1.jar
> > > > -XX:MaxPermSize=128m -Djava.library.path= -Xms180g -Xmx180g
> > > > org.apache.spark.deploy.SparkSubmit spark-shell --class
> > > > org.apache.spark.repl.Main
> > > >
> > > > Best regards,
> > > > Wei
> > > >
> > > > ---------------------------------
> > > > Wei Tan, PhD
> > > > Research Staff Member
> > > > IBM T. J. Watson Research Center
> > > > http://researcher.ibm.com/person/us-wtan
>
> --
> SUREN HIRAMAN,
> VP TECHNOLOGY
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR
> NEW YORK, NY 10001
> O: (917) 525-2466 ext. 105
> F: 646.349.4063
> E: suren.hiraman@velos.io
> W: www.velos.io
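To make Nan's suggestion concrete: a minimal Scala sketch of wiring GC flags into the Spark 1.x config key he mentions (note the key is spark.executor.extraJavaOptions, not "extraJavaOpts"). The GC flags mirror Hao Wang's suggestion, minus the heap flags: heap size should go through spark.executor.memory instead, since Spark does not accept -Xmx inside extraJavaOptions. The SparkConf usage is shown in comments because it assumes spark-core on the classpath; the flag values themselves are illustrative, not a recommendation for any particular workload.

```scala
// Assemble executor JVM options (GC tuning only; no -Xmx here).
val gcOpts = Seq(
  "-XX:+UseConcMarkSweepGC",  // CMS collector, to shorten stop-the-world pauses
  "-XX:-UseGCOverheadLimit",  // don't abort on "GC overhead limit exceeded"
  "-XX:MaxPermSize=256m"      // permgen sizing (relevant on JDK 6/7)
).mkString(" ")

// In a driver program this would look like (assumes spark-core is available):
//   val conf = new org.apache.spark.SparkConf()
//     .setAppName("gc-tuning-example")
//     .set("spark.executor.extraJavaOptions", gcOpts)
//     .set("spark.executor.memory", "2g")   // heap size goes here, not in gcOpts
//   val sc = new org.apache.spark.SparkContext(conf)

println(gcOpts)
```

The same key can also be set per-submit with `--conf spark.executor.extraJavaOptions="..."` or in spark-defaults.conf, which avoids hard-coding JVM flags in application code.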