But there isn't a 1-1 mapping from operations to stages, since multiple
operations are pipelined into a single stage when no shuffle is required.
To determine the number of stages in a job, you need to look for the
shuffle boundaries.
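
For example, a minimal sketch with made-up data: a chain of narrow
transformations such as map and filter is pipelined into one stage, while
reduceByKey forces a shuffle and therefore starts a new stage. Calling
toDebugString on the final RDD prints its lineage, and each extra level of
indentation in that output marks a shuffle boundary, so counting the
indentation levels tells you how many stages the job will run.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on Spark < 1.3

    object StageBoundaries {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("stage-boundaries").setMaster("local"))

        // map and filter are narrow dependencies, so they are pipelined
        // into a single stage.
        val words = sc.parallelize(Seq("a", "b", "a", "c"))
        val pairs = words.map(w => (w, 1)).filter(_._1 != "c")

        // reduceByKey requires a shuffle, so everything after it runs in a
        // new stage.
        val counts = pairs.reduceByKey(_ + _)

        // The lineage is indented one extra level at every shuffle boundary.
        println(counts.toDebugString)

        counts.collect()
        sc.stop()
      }
    }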

On Wed, Feb 4, 2015 at 11:27 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> You can get a rough idea of the flow by looking at the operations in your
> program (like map, groupBy, join, etc.). First list out the operations
> happening in your application, and then from the web UI you will be able
> to see how many of them have run so far.
>
> Thanks
> Best Regards
>
> On Wed, Feb 4, 2015 at 4:33 PM, Joe Wass <jw...@crossref.org> wrote:
>
>> I'm sitting here looking at my application crunching gigabytes of data on
>> a cluster and I have no idea if it's an hour away from completion or a
>> minute. The web UI shows progress through each stage, but not how many
>> stages remain. How can I automatically work out how many stages my
>> program will take?
>>
>> My application has a slightly interesting DAG (re-use of functions that
>> contain Spark transformations, persistent RDDs). Not that complex, but not
>> 'step 1, step 2, step 3'.
>>
>> I'm guessing that if the driver program runs sequentially sending
>> messages to Spark, then Spark has no knowledge of the structure of the
>> driver program. Is it therefore necessary to execute it on a small test
>> dataset and see how many stages result?
>>
>> When I set spark.eventLog.enabled = true and run on (very small) test
>> data, I don't get any stage messages in my STDOUT or in the log file. This
>> is on a `local` instance.
>>
>> Did I miss something obvious?
>>
>> Thanks!
>>
>> Joe
>>
>
>
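
To answer the "automatically" part of the question above: rather than
eyeballing the lineage, you can register a SparkListener and tally the
stages of each job as it is submitted. Here is a rough sketch (the class
and field names are made up for illustration; addSparkListener is a
DeveloperApi on SparkContext):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

    // Counts every stage the scheduler plans for jobs run on the
    // SparkContext this listener is registered on.
    class StageCountListener extends SparkListener {
      @volatile var totalStages = 0

      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        // stageInfos lists all stages planned for this job, including ones
        // that may later be skipped because their shuffle output is cached.
        totalStages += jobStart.stageInfos.size
      }
    }

    // Usage: register before running any actions, e.g. on a small test dataset.
    //   val listener = new StageCountListener()
    //   sc.addSparkListener(listener)
    //   ... run the job ...
    //   println("Stages submitted so far: " + listener.totalStages)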
