Yeah, I'm starting to think it's not possible to have dynamic stage names
at this time. But thanks for taking a look at this, Josh.
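
For the archives: the closest I got in plain Spark (outside Crunch) is
interleaving setCallSite calls between transformations. Since Spark captures
the call site when each RDD object is constructed, setting it before each
transformation labels the stages that RDD ends up in. This is a rough,
untested sketch (the file path and label strings are made up), not something
Crunch does today:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CallSiteDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("callsite-demo").setMaster("local[*]"))

    // Stages built from RDDs created while this call site is set
    // show up under this label in the Spark UI.
    sc.setCallSite("my-parse-fn")
    val parsed = sc.textFile("input.txt").map(_.trim)
    parsed.setName("parsed-rdd") // names the RDD itself, not the stage

    sc.setCallSite("my-count-fn")
    val counts = parsed.map(w => (w, 1L)).reduceByKey(_ + _)

    counts.count() // triggers the job; stages carry the labels above
    sc.clearCallSite()
    sc.stop()
  }
}
```

The catch for Crunch is that it, not user code, constructs the RDDs, so
something like this would have to happen inside SparkPipeline itself, using
each DoFn's name as the call-site string.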

On Tue, Sep 29, 2015 at 9:12 AM Josh Wills <[email protected]> wrote:

> Hey Nithin,
>
> I checked around about this -- apparently the stage name is hard-coded to
> be the call site of the code block that triggered the stage:
>
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Stage.scala
>
> Right now, we pass the names for DoFns to the RDDs we create via
> RDD.setName, but obviously that doesn't play into the stage name control.
>
> J
>
> On Mon, Sep 28, 2015 at 5:46 PM, Nithin Asokan <[email protected]>
> wrote:
>
>> I'm fairly new to Spark, and would like to understand about stage/job
>> names when using Crunch on Spark. When I submit my Spark application, I see
>> a set of stage names like *mapToPair at PGroupedTableImpl.java:108*. I
>> would like to understand whether it is possible for user code to update
>> these stage names dynamically. Perhaps, is it possible to have DoFn names
>> as stage names?
>>
>> I did a little bit of digging and the closest thing I can find to modify
>> stage name is using
>>
>> sparkContext.setCallSite(String)
>>
>> However, this updates all stage and job names to the same text. I tried
>> looking at MRPipeline's implementation to understand how job names are
>> built, and I believe for SparkPipeline, Crunch does not create a DAG, so
>> we don't create a job name.
>>
>> But does anyone with Spark expertise know if it's possible in Crunch to
>> create job/stage names based on DoFn names?
>>
>> Thank you!
>> Nithin
>>
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
