This is great news thanks for the update! I will either wait for the
1.0 release or go and test it ahead of time from git rather than trying
to pull it out of JobLogger or creating my own SparkListener.
On 04/02/2014 06:48 PM, Andrew Or wrote:
Hi Philip,
In the upcoming release of Spark 1.0 there will be a feature that provides for exactly what you describe: capturing the information displayed on the UI in JSON.
I can appreciate the reluctance to expose something like the
JobProgressListener as a public interface. It's exactly the sort of
thing that you want to deprecate as soon as something better comes
along, and it can be a real pain when trying to maintain backwards
compatibility.
What I'd like is a way to capture the information provided on the stages
page (i.e. cluster:4040/stages via IndexPage). Looking through the
Spark code, it doesn't seem like it is possible to directly query for
specific facts such as how many tasks have succeeded or how many total
tasks there are.
Hey Phillip,
Right now there is no mechanism for this. You have to go through the
low-level listener interface.
We could consider exposing the JobProgressListener directly - I think it
has been factored nicely, so it's fairly decoupled from the UI. The
concern is that it is a semi-internal component.
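The "low-level listener interface" pattern can be sketched in miniature. The following is NOT the real Spark API — it is a toy Python illustration, with invented names (ListenerBus, ProgressListener, on_task_start, on_task_end), of what a custom listener does: a bus delivers task events, and the listener tallies succeeded vs. total tasks per stage, roughly what JobProgressListener does internally.

```python
# Toy illustration of the listener pattern (NOT the real Spark API).
# A bus delivers task events; a registered listener tallies progress.

class ProgressListener:
    """Hypothetical stand-in for a custom SparkListener."""
    def __init__(self):
        self.succeeded = {}   # stage_id -> tasks that finished OK
        self.total = {}       # stage_id -> tasks started

    def on_task_start(self, stage_id):
        self.total[stage_id] = self.total.get(stage_id, 0) + 1

    def on_task_end(self, stage_id, success):
        if success:
            self.succeeded[stage_id] = self.succeeded.get(stage_id, 0) + 1

    def progress(self, stage_id):
        """Return (succeeded, total) task counts for one stage."""
        return (self.succeeded.get(stage_id, 0), self.total.get(stage_id, 0))


class ListenerBus:
    """Delivers each event to every registered listener."""
    def __init__(self):
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def post_task_start(self, stage_id):
        for listener in self.listeners:
            listener.on_task_start(stage_id)

    def post_task_end(self, stage_id, success):
        for listener in self.listeners:
            listener.on_task_end(stage_id, success)
```

The point of the pattern is that the application registers a listener once and the framework pushes events to it, so progress tracking adds no polling overhead.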
Hi Philip,
In the upcoming release of Spark 1.0 there will be a feature that provides
for exactly what you describe: capturing the information displayed on the
UI in JSON. More details will be provided in the documentation, but for
now, anything before 0.9.1 can only go through JobLogger.scala.
Some related discussion: https://github.com/apache/spark/pull/246
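Once the UI's information is exposed as JSON, consuming it is just parsing. The sample payload and field names below (stageId, numTasks, numCompleteTasks) are assumptions made up for illustration, not Spark's documented schema.

```python
import json

# Hypothetical JSON payload resembling what a stages endpoint might return.
# Field names are illustrative assumptions, not Spark's documented schema.
SAMPLE = """
[
  {"stageId": 1, "numTasks": 100, "numCompleteTasks": 40, "numFailedTasks": 2},
  {"stageId": 2, "numTasks": 50,  "numCompleteTasks": 50, "numFailedTasks": 0}
]
"""

def stage_progress(payload):
    """Return {stageId: (completed, total)} from a JSON stage listing."""
    stages = json.loads(payload)
    return {s["stageId"]: (s["numCompleteTasks"], s["numTasks"]) for s in stages}
```

A monitoring script would fetch such a payload over HTTP and call stage_progress on the response body.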
On Tue, Apr 1, 2014 at 8:43 AM, Philip Ogren philip.og...@oracle.com wrote:
Hi DB,
Just wondering if you ever got an answer to your question about monitoring
progress - either offline or through your own investigation.
The discussion there hits on the distinction between jobs and stages.
Looking at one application, there are hundreds of stages, sometimes
thousands, depending on the data and the task. The UI seems to track
stages, and one could independently track them for such a job.
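If per-stage task counts are available (from a listener or from logs), rolling them up to job-level progress is a simple aggregation. In this sketch the job-to-stage mapping is supplied by hand as an assumption; in practice it would have to come from whatever component records which stages belong to which job.

```python
# Aggregate per-stage (completed, total) task counts into job-level progress.
# The job -> stages mapping is hand-written here for illustration; obtaining
# it is the hard part in practice, as the thread discusses.

def job_progress(job_stages, stage_counts):
    """job_stages: {job_id: [stage_id, ...]}
    stage_counts: {stage_id: (completed, total)}
    Returns {job_id: (completed, total)} summed over the job's stages."""
    out = {}
    for job_id, stages in job_stages.items():
        done = sum(stage_counts.get(s, (0, 0))[0] for s in stages)
        total = sum(stage_counts.get(s, (0, 0))[1] for s in stages)
        out[job_id] = (done, total)
    return out
```

With hundreds or thousands of stages per application, this kind of roll-up is what turns the UI's stage-oriented view into a per-job progress figure.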
You can get detailed information about each stage through the Spark
listener interface. Multiple jobs may be compressed into a single stage,
so job-wise information would be the same as Spark's.
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi