I am using Spark 1.6 and have noticed that the time between jobs keeps growing;
sometimes it reaches 20 minutes.
I searched for similar questions and found a close one:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-app-gets-slower-as-it-gets-executed-more-times-td1089.html#a1146

It contained something useful:
One thing to worry about is long-running jobs or shells. Currently, state
buildup of a single job in Spark is a problem, as certain state such as
shuffle files and RDD metadata is not cleaned up until the job (or shell)
exits. We have hacky ways to reduce this, and are working on a long term
solution. However, separate, consecutive jobs should be independent in terms
of performance.


On Sat, Feb 1, 2014 at 8:27 PM, 尹绪森 <[hidden email]> wrote:
Is your Spark app an iterative one? If so, your app is creating a big DAG
in every iteration. You should checkpoint it periodically, say, one
checkpoint every 10 iterations.
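
If I understand the suggestion, for an iterative RDD job the pattern would look
roughly like the sketch below; step(), the input RDD, and the checkpoint
directory are placeholders of mine, not part of my real app:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public static JavaRDD<String> runIterations(JavaSparkContext sc,
                                                JavaRDD<String> input,
                                                int iterations) {
            sc.setCheckpointDir("/tmp/spark-checkpoints"); // placeholder path
            JavaRDD<String> working = input;
            for (int i = 0; i < iterations; i++) {
                    working = step(working);  // hypothetical per-iteration transform
                    if (i % 10 == 9) {        // one checkpoint every 10 iterations
                            working.cache();      // keep the data so the checkpoint does not recompute it
                            working.checkpoint(); // written out on the next action
                            working.count();      // force the checkpoint now, cutting the DAG
                    }
            }
            return working;
    }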

I also wrote a test program; here is the code:

public static void newJob(int jobNum, SQLContext sqlContext) {
        for (int i = 0; i < jobNum; i++) {
                testJob(i, sqlContext);
        }
}

public static void testJob(int jobIndex, SQLContext sqlContext) {
        // Re-register the query result under the same table name, so every
        // job's plan is built on top of all the previous jobs' plans.
        String test_sql = "SELECT a.* FROM income a";
        DataFrame test_df = sqlContext.sql(test_sql);
        test_df.registerTempTable("income");
        test_df.cache();
        test_df.count();
        test_df.show();
}

Calling newJob(100, sqlContext) reproduces my issue: each job costs more
and more time to build. DataFrame has no lineage-cutting API like RDD's
checkpoint. Is there another way to resolve this?
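
One workaround I am considering (a rough sketch, not verified on my cluster) is
to drop to the RDD level, checkpoint there, and rebuild the DataFrame from the
checkpointed rows plus the original schema; sc.setCheckpointDir(...) must have
been called first:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SQLContext;

    // Rebuild the DataFrame from a checkpointed RDD so the new plan no
    // longer references the whole history of previous jobs.
    public static DataFrame truncateLineage(DataFrame df, SQLContext sqlContext) {
            JavaRDD<Row> rows = df.javaRDD();
            rows.cache();       // keep the rows so the checkpoint does not recompute them
            rows.checkpoint();  // requires sc.setCheckpointDir(...) beforehand
            rows.count();       // action that actually writes the checkpoint
            return sqlContext.createDataFrame(rows, df.schema());
    }

A parquet round-trip (df.write().parquet(path) followed by
sqlContext.read().parquet(path)) should have the same lineage-cutting effect,
at the cost of a disk write. Would either of these be the right approach, or is
there something better?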