Ah, apologies, I found an existing issue, and a fix has already gone out for this in 1.3.1 and up: https://issues.apache.org/jira/browse/SPARK-6036.
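
For anyone finding this thread later, here is a minimal sketch of the ordering problem described in my earlier message quoted below. It is a toy model and not Spark's code: `StopOrderingSketch`, `finalizeLog`, `masterView` and the in-memory `Set[String]` standing in for the event log directory are all hypothetical names I made up for illustration. It only shows why the Master reports "still in progress" if it looks at the logs before the `.inprogress` suffix has been removed.

```scala
// Hypothetical, simplified model of the shutdown ordering.
// None of these names are real Spark classes or methods.
object StopOrderingSketch {
  val InProgress = ".inprogress"

  // Stand-in for EventLoggingListener.stop(): drop the suffix from the log name.
  def finalizeLog(files: Set[String], appId: String): Set[String] =
    if (files.contains(appId + InProgress)) files - (appId + InProgress) + appId
    else files

  // Stand-in for what the Master decides when it handles UnregisterApplication.
  def masterView(files: Set[String], appId: String): String =
    if (files.contains(appId)) "history UI"
    else if (files.contains(appId + InProgress)) "still in progress"
    else "no event log"

  // Strict ordering from the quoted stop() block: the scheduler (and with it
  // the AppClient) stops first, so the Master looks before the rename happens.
  def unregisterThenFinalize(appId: String): String = {
    val files = Set(appId + InProgress)
    masterView(files, appId)
  }

  // The ordering asked about in the quoted message: close the event logger
  // first, then let the Master look at the directory.
  def finalizeThenUnregister(appId: String): String = {
    val files = finalizeLog(Set(appId + InProgress), appId)
    masterView(files, appId)
  }

  def main(args: Array[String]): Unit = {
    val appId = "app-20150601160846-1914"
    println(unregisterThenFinalize(appId))
    println(finalizeThenUnregister(appId))
  }
}
```

In the real system the first case is not deterministic, because the message to the Master is asynchronous, which would explain why a few applications did get a history UI.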

On Mon, Jun 1, 2015 at 3:39 PM, Richard Marscher <rmarsc...@localytics.com> wrote:

> It looks like it is possibly a race condition between removing the
> `.inprogress` suffix and building the history UI for the application.
>
> `AppClient` sends an `UnregisterApplication(appId)` message to the
> `Master` actor, which triggers the process to look for the app's event logs.
> If they are suffixed with `.inprogress`, then it will not build out the
> history UI and will instead build the error page I've seen.
>
> Tying this together, calling SparkContext.stop() has the following block:
>
>     if (_dagScheduler != null) {
>       _dagScheduler.stop()
>       _dagScheduler = null
>     }
>     if (_listenerBusStarted) {
>       listenerBus.stop()
>       _listenerBusStarted = false
>     }
>     _eventLogger.foreach(_.stop())
>
> The DAGScheduler has a TaskScheduler, which has a SparkDeploySchedulerBackend,
> which has an AppClient. AppClient sends itself a message to stop itself,
> and, as mentioned above, it then sends a message to the Master, where it
> tries to build the history UI.
>
> Meanwhile, EventLoggingListener.stop() is where the `.inprogress` suffix
> is removed in the file system. It seems like the race condition of the Akka
> message passing to trigger the Master's building of the history UI may be
> the only reason the history UI ever gets properly set up in the first place.
> If the ordering of calls were all strict in the SparkContext.stop
> method, then you would expect the Master to always see the event logs as
> in progress.
>
> Maybe I have missed something in tracing through the code? Is there a
> reason that the eventLogger cannot be closed before the dagScheduler?
>
> Thanks,
> Richard
>
> On Mon, Jun 1, 2015 at 12:23 PM, Richard Marscher <rmarsc...@localytics.com> wrote:
>
>> Hi,
>>
>> In Spark 1.3.0 I've enabled event logging to write to an existing HDFS
>> folder on a Standalone cluster. This is generally working; all the logs are
>> being written. However, from the Master Web UI, the vast majority of
>> completed applications are labeled as not having a history:
>> http://xxx.xxx.xxx.xxx:8080/history/not-found/?msg=Application+App+is+still+in+progress.&title=Application%20history%20not%20found%20(app-20150601160846-1914)
>>
>> The log does exist though:
>>
>> # hdfs dfs -ls -R /eventLogs/app-20150601160846-1914
>>
>> -rw-rw---- 3 user group 1027848 2015-06-01 16:09 /eventLogs/app-20150601160846-1914
>>
>> and when I `cat` the file, it ends with:
>>
>> {"Event":"SparkListenerApplicationEnd","Timestamp":1433174955077}
>>
>> This seems to indicate it saw and logged the application end.
>>
>> Is there a known issue here or a workaround? Looking at the source code, I
>> might have expected these files to end in `.inprogress` given the UI error
>> message, but they don't.
>>
>> Thanks,
>> Richard