Yes, right now there is no way to automatically know how many stages a job
will generate. Like Mark said, RDD#toDebugString will give you some info
about the RDD DAG, and from that you can determine, based on the dependency
types (wide vs. narrow), where there is a stage boundary.
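As a rough illustration of reading stage boundaries out of the debug string, here is a minimal sketch. It assumes that lines opening a new shuffle (wide) dependency begin with "+-" after any leading spaces and "|" characters, which is how the output is usually indented. The sample text is hand-written, `estimate_stage_count` is a hypothetical helper, and the output format is meant for humans, so treat the result as an estimate only.

```python
import re

# Hand-written sample in the general shape of RDD#toDebugString output.
# Assumption: a line that opens a shuffle (wide) dependency starts with "+-"
# after any leading spaces and "|" characters.
SAMPLE_DEBUG_STRING = """\
(2) ShuffledRDD[4] at reduceByKey at wordcount.py:12 []
 +-(2) MapPartitionsRDD[3] at map at wordcount.py:11 []
    |  MapPartitionsRDD[2] at flatMap at wordcount.py:10 []
    |  input.txt HadoopRDD[0] at textFile at wordcount.py:9 []
"""

SHUFFLE_MARKER = re.compile(r"^[\s|]*\+-")


def estimate_stage_count(debug_string):
    """Estimate stages as (number of shuffle boundaries) + 1 final stage."""
    if isinstance(debug_string, bytes):  # PySpark may hand back bytes
        debug_string = debug_string.decode("utf-8")
    boundaries = sum(
        1 for line in debug_string.splitlines() if SHUFFLE_MARKER.match(line)
    )
    return boundaries + 1


print(estimate_stage_count(SAMPLE_DEBUG_STRING))
```

This counts a shuffle once per appearance in the lineage and knows nothing about stages that get skipped at run time, so it can overestimate.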
On Thu, Feb 5, 2015 at
And the Job page of the web UI will give you an idea of stages completed
out of the total number of stages for the job. That same information is
also available as JSON. Statically determining how many stages a job
logically comprises is one thing, but dynamically determining how many
stages remai
RDD#toDebugString will help.
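To make the JSON route concrete, a small sketch of turning one job's JSON into a "stages remaining" figure. The payload below is hypothetical: its field names follow what later Spark versions expose per job in the monitoring REST API and should be treated as assumptions for any given release. `stages_remaining` is an illustrative helper, not part of Spark.

```python
import json

# Hypothetical payload. The field names are assumptions modelled on the
# per-job data in later Spark monitoring REST API versions.
SAMPLE_JOB_JSON = """
{
  "jobId": 0,
  "status": "RUNNING",
  "stageIds": [0, 1, 2, 3],
  "numActiveStages": 1,
  "numCompletedStages": 2,
  "numSkippedStages": 0,
  "numFailedStages": 0
}
"""


def stages_remaining(job):
    """Stages of this job that are neither completed nor skipped."""
    total = len(job["stageIds"])
    done = job["numCompletedStages"] + job["numSkippedStages"]
    # Failed stages are not counted as done, since they may be retried.
    return total - done


job = json.loads(SAMPLE_JOB_JSON)
print(stages_remaining(job))
```

This only covers a job that has already been submitted. Jobs the program has not yet triggered do not appear, which is part of why the dynamic question is harder than the static one.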
On Thu, Feb 5, 2015 at 1:15 AM, Joe Wass wrote:
Thanks Akhil and Mark. I can of course count events (assuming I can deduce
the shuffle boundaries), but like I said the program isn't simple and I'd
have to do this manually every time I change the code. So I'd rather find a
way of doing this automatically if possible.
On 4 February 2015 at 19:41, M
But there isn't a 1-1 mapping from operations to stages since multiple
operations will be pipelined into a single stage if no shuffle is
required. To determine the number of stages in a job you really need to be
looking for shuffle boundaries.
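A toy model of the pipelining point, assuming a simple linear chain of operations. Real jobs form a DAG, the set of shuffle-inducing names here is illustrative rather than exhaustive, and some operations (join, for example) do not always shuffle. `split_into_stages` is a hypothetical helper, not a Spark API.

```python
# Assumption: a simplified, linear pipeline. The operation names listed as
# "wide" are illustrative and do not cover every shuffle-inducing operation.
WIDE_OPERATIONS = {"groupBy", "groupByKey", "reduceByKey", "join", "sortByKey"}


def split_into_stages(operations):
    """Group consecutive operations, starting a new stage at each wide one."""
    stages = [[]]
    for op in operations:
        if op in WIDE_OPERATIONS and stages[-1]:
            stages.append([])  # the shuffle closes the previous stage
        stages[-1].append(op)
    return stages


pipeline = ["textFile", "flatMap", "map", "reduceByKey",
            "filter", "sortByKey", "map"]
for number, ops in enumerate(split_into_stages(pipeline)):
    print(number, ops)
```

Seven operations collapse into three stages here, because the narrow operations between shuffles are pipelined together.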
On Wed, Feb 4, 2015 at 11:27 AM, Akhil Das wrote:
You can easily follow the flow by looking at the number of operations
in your program (like map, groupBy, join, etc.). First, list out the
operations happening in your application; then, from the web UI, you will
be able to see how many of them have happened so far.
Thank
I'm sitting here looking at my application crunching gigabytes of data on a
cluster and I have no idea if it's an hour away from completion or a
minute. The web UI shows progress through each stage, but not how many
stages remain. How can I work out how many stages my program will take
automatic