Ah, apologies, I found an existing issue and fix has already gone out for
this in 1.3.1 and up: https://issues.apache.org/jira/browse/SPARK-6036.

On Mon, Jun 1, 2015 at 3:39 PM, Richard Marscher <rmarsc...@localytics.com>
wrote:

> It looks like it is possibly a race condition between removing the
> IN_PROGRESS and building the history UI for the application.
>
> `AppClient` sends an `UnregisterApplication(appId)` message to the
> `Master` actor, which triggers the process to look for the app's eventLogs.
> If they are suffixed with `.inprogress` then it will not build out the
> history UI and instead build the error page I've seen.
>
> Tying this together, calling SparkContext.stop() has the following block:
>
>
> if (_dagScheduler != null) { _dagScheduler.stop() _dagScheduler = null }
> if (_listenerBusStarted) { listenerBus.stop() _listenerBusStarted = false
> } _eventLogger.foreach(_.stop())
> Dag Scheduler has a TaskScheduler which has a SparkDeploySchedulerBackend
> which has an AppClient. AppClient sends itself a message to stop itself,
> and like mentioned above, it then sends a message to the Master where it
> tries to build the history UI.
>
> Meanwhile, EventLoggingListener.stop() is where the `.inprogress` suffix
> is removed in the file-system. It seems like the race condition of the Akka
> message passing to trigger the Master's building of the history UI may be
> the only reason the history UI ever gets properly setup in the first place.
> Because if the ordering of calls were all strict in the SparkContext.stop
> method then you would expect the Master to always see the event logs as in
> in progress.
>
> Maybe I have missed something in tracing through the code? Is there a
> reason that the eventLogger cannot be closed before the dagScheduler?
>
> Thanks,
> Richard
>
> On Mon, Jun 1, 2015 at 12:23 PM, Richard Marscher <
> rmarsc...@localytics.com> wrote:
>
>> Hi,
>>
>> In Spark 1.3.0 I've enabled event logging to write to an existing HDFS
>> folder on a Standalone cluster. This is generally working, all the logs are
>> being written. However, from the Master Web UI, the vast majority of
>> completed applications are labeled as not having a history:
>> http://xxx.xxx.xxx.xxx:8080/history/not-found/?msg=Application+App+is+still+in+progress.&title=Application%20history%20not%20found%20(app-20150601160846-1914)
>>
>> The log does exists though:
>>
>> # hdfs dfs -ls -R /eventLogs/app-20150601160846-1914
>>
>> -rw-rw----   3 user group    1027848 2015-06-01 16:09
>> /eventLogs/app-20150601160846-1914
>>
>> and `cat` the file ends with:
>>
>> {"Event":"SparkListenerApplicationEnd","Timestamp":1433174955077}
>>
>> This seems to indicate it saw and logged the application end.
>>
>> Is there a known issue here or a workaround? Looking at the source code I
>> might have expected these files to end in `.inprogress` given the UI error
>> message, but they don't.
>>
>> Thanks,
>> Richard
>>
>
>

Reply via email to