Yeah, I'm starting to think it's not possible to have dynamic stage names at this time. But thanks for taking a look at this, Josh.
On Tue, Sep 29, 2015 at 9:12 AM Josh Wills <[email protected]> wrote:

> Hey Nithin,
>
> I checked around about this-- apparently the stage name is hard-coded to
> be the call site of the code block that triggered the stage:
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Stage.scala
>
> Right now, we pass the names for DoFns to the RDDs we create via
> RDD.setName, but obviously that doesn't play into the stage name control.
>
> J
>
> On Mon, Sep 28, 2015 at 5:46 PM, Nithin Asokan <[email protected]> wrote:
>
>> I'm fairly new to Spark, and would like to understand stage/job names
>> when using Crunch on Spark. When I submit my Spark application, I see a
>> set of stage names like *mapToPair at PGroupedTableImpl.java:108*. I
>> would like to understand whether it is possible for user code to update
>> these stage names dynamically. Perhaps it is possible to have DoFn names
>> as stage names?
>>
>> I did a little bit of digging, and the closest thing I can find for
>> modifying the stage name is:
>>
>> sparkContext.setCallSite(String)
>>
>> However, this updates all stage and job names to the same text. I tried
>> looking at MRPipeline's implementation to understand how job names are
>> built, and I believe that for SparkPipeline, Crunch does not create a DAG
>> and we don't create a job name.
>>
>> But does anyone with Spark expertise know if it's possible in Crunch to
>> create job/stage names based on DoFn names?
>>
>> Thank you!
>> Nithin
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
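For anyone landing on this thread later: `setCallSite` is context-global, as noted above, but it can at least be scoped around individual actions rather than set once for the whole application. A minimal sketch in user code (the `"MyDoFn"` label is a hypothetical stand-in, and this only relabels stages for jobs triggered while the call site is set -- it doesn't hook into Crunch's DoFn names):

```java
import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CallSiteScoping {
    public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext("local", "call-site-demo");
        JavaRDD<String> words = jsc.parallelize(Arrays.asList("a", "b", "c"));

        // The call site is global to the SparkContext, so set it just
        // before triggering an action and clear it afterward. Stages for
        // the job below will show "MyDoFn" in the Spark UI instead of the
        // code call site.
        jsc.setCallSite("MyDoFn");
        long n = words.count();
        jsc.clearCallSite();

        jsc.stop();
    }
}
```

Since Crunch drives the Spark actions itself, user code would have no natural place to toggle this per-DoFn; that would need to happen inside SparkPipeline's runtime, which is essentially what the thread concludes isn't supported today.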
