Rong Tang created SPARK-25409:
---------------------------------

             Summary: Speed up Spark History at start if there are tens of 
thousands of applications.
                 Key: SPARK-25409
                 URL: https://issues.apache.org/jira/browse/SPARK-25409
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.3.1
            Reporter: Rong Tang


We have a spark history server, storing 7 days' applications. it usually has 
10K to 20K attempts.

We found that it can take hours at start up,loading/replaying the logs in 
event-logs folder.  thus, new finished applications have to wait several hours 
to be seem. So I made 2 improvements for it.
 # As we run spark on yarn. the on-going applications' information can also be 
seen via resource manager, so I introduce in a flag 
spark.history.fs.load.incomplete to say loading logs for incomplete attempts or 
not.
 # Incremental loading applications. as I said, we have more then 10K 
applications stored, it can take hours to load all of them at the first time. 
so I introduced in a config spark.history.fs.appRefreshNum to say how many 
application to load each time, then it gets a chance the check the latest 
updates.

Here are the benchmark I did.  our system has 1K incomplete application ( it 
was not cleaned up for some reason, that is another issue that I need 
investigate), and applications' log size can be gigabytes. 

 

Not load incomplete attempts.
| |Load Count|Load incomplete APPs|All attempts number|Time Cost|Increase with 
more attempts|
|1 ( current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
|2|All|No|13K|31 minutes| yes|

 

 

Limit each time how much to load.

 
| |Load Count|Load incomplete APPs|All attempts number|Worst Cost|Increase with 
more attempts|
|1 ( current implementation)|All|Yes|13K|2 hours 14 minutes|Yes|
|2|3000|Yes|13K|42 minutes except last 1.6K
(The last 1.6K attempts cost extremely long 2.5 hours)|NO|

 

 

Limit each time how many to load, and not load incomplete jobs.

 
| |Load Count|Load incomplete APPs|All attempts number|Worst Cost|Avg|Increase 
with more attempts|
|1 ( current implementation)|All|Yes|13K|2 hours 14 minutes| |Yes|
|2|3000|NO|12K|17minutes
 |10 minutes
( 41 minutes in total)|NO|

 

 
| |Load Count|Load incomplete APPs|All attempts number|Worst Cost|Avg|Increase 
with more attempts|
|1 ( current implementation)|All|Yes|20K|1 hour 52 minutes| |Yes|
|2|3000|NO|18.5K|20minutes|18 minutes
(2 hours 18 minutes in total)
 |NO|

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to