Hey Nithin, I checked around about this -- apparently the stage name is hard-coded to be the call site of the code block that triggered the stage:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Stage.scala

Right now, we pass the names of the DoFns to the RDDs we create via RDD.setName, but obviously that doesn't feed into the stage name.

J

On Mon, Sep 28, 2015 at 5:46 PM, Nithin Asokan <[email protected]> wrote:

> I'm fairly new to Spark and would like to understand stage/job names when
> using Crunch on Spark. When I submit my Spark application, I see a set of
> stage names like *mapToPair at PGroupedTableImpl.java:108*. I would like
> to understand whether user code can update these stage names dynamically.
> For instance, is it possible to have DoFn names as stage names?
>
> I did a little digging, and the closest thing I can find for modifying
> the stage name is
>
> sparkContext.setCallSite(String)
>
> However, this updates all stage and job names to the same text. I tried
> looking at MRPipeline's implementation to understand how job names are
> built, and I believe that for SparkPipeline, Crunch does not create a DAG
> and we don't create a job name.
>
> But does anyone with Spark expertise know whether it's possible in Crunch
> to create job/stage names based on DoFn names?
>
> Thank you!
> Nithin

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
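For anyone following along, here is a minimal sketch of the setCallSite approach discussed above, outside of Crunch. It assumes a local Spark runtime, and the name "MyDoFn" is purely illustrative. The point is that setCallSite applies to every job submitted after the call, so per-stage naming would require wrapping each action with setCallSite/clearCallSite:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CallSiteExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("callsite-demo").setMaster("local[*]"))

    val rdd = sc.parallelize(1 to 10).map(_ * 2)
    rdd.setName("MyDoFn") // shows up as the RDD name in the UI, not the stage name

    sc.setCallSite("MyDoFn") // jobs submitted after this point display "MyDoFn"
    val sum = rdd.reduce(_ + _)
    sc.clearCallSite() // restore default call-site capture for subsequent jobs

    println(sum)
    sc.stop()
  }
}
```

Note that this is global state on the SparkContext, which matches Nithin's observation: without the clearCallSite() between jobs, every stage in the application would carry the same label.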
