You can get a rough idea of the flow by counting the operations in your
program (map, groupBy, join etc.). Stage boundaries fall at the shuffle
operations (groupBy, join and the like), so listing those gives you an
estimate of the total number of stages, and the web UI then shows how many
stages have completed so far.
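
If you want Spark to report this directly, you can also attach a
SparkListener and count stage events yourself. A minimal sketch, assuming
the 1.x listener API and that sc is your SparkContext (the variable names
here are just for illustration):

    import java.util.concurrent.atomic.AtomicInteger
    import org.apache.spark.scheduler.{SparkListener,
      SparkListenerStageCompleted, SparkListenerStageSubmitted}

    // Register the listener before kicking off the job so no events are missed.
    val completedStages = new AtomicInteger(0)
    sc.addSparkListener(new SparkListener {
      override def onStageSubmitted(submitted: SparkListenerStageSubmitted): Unit =
        println("Submitted stage " + submitted.stageInfo.stageId +
                " (" + submitted.stageInfo.name + ")")
      override def onStageCompleted(done: SparkListenerStageCompleted): Unit =
        println("Completed stage " + done.stageInfo.stageId +
                "; stages completed so far: " + completedStages.incrementAndGet())
    })

You can also call rdd.toDebugString on your final RDD to print the lineage;
each shuffle boundary you see there becomes a separate stage.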

Thanks
Best Regards

On Wed, Feb 4, 2015 at 4:33 PM, Joe Wass <jw...@crossref.org> wrote:

> I'm sitting here looking at my application crunching gigabytes of data on
> a cluster and I have no idea if it's an hour away from completion or a
> minute. The web UI shows progress through each stage, but not how many
> stages remain. How can I automatically work out how many stages my program
> will take?
>
> My application has a slightly interesting DAG (re-use of functions that
> contain Spark transformations, persistent RDDs). Not that complex, but not
> 'step 1, step 2, step 3'.
>
> I'm guessing that if the driver program runs sequentially sending messages
> to Spark, then Spark has no knowledge of the structure of the driver
> program. Therefore it's necessary to execute it on a small test dataset and
> see how many stages result?
>
> When I set spark.eventLog.enabled = true and run on (very small) test data
> I don't get any stage messages in my STDOUT or in the log file. This is on
> a `local` instance.
>
> Did I miss something obvious?
>
> Thanks!
>
> Joe
>
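
Regarding the event log question: spark.eventLog.enabled does not write
stage messages to STDOUT; it writes JSON event files under
spark.eventLog.dir (which must already exist) so the web UI / history
server can replay finished applications. A minimal sketch, assuming Spark
1.x and an illustrative path:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("StageCountTest")
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "file:///tmp/spark-events") // must exist beforehand
    val sc = new SparkContext(conf)

After the job finishes you should find a per-application event log file in
that directory, and you can count the SparkListenerStageCompleted entries
in it.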
