How to run a job on all workers?

2014-07-09 Thread silvermast
Is it possible to run a job that assigns work to every worker in the system? My bootleg approach right now is to have a SparkListener hear whenever a block manager is added and increment a split count by 1. It then runs a Spark job with that split count and hopes that it will at least run on the newest worker.
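
A minimal sketch of that workaround, assuming the standard SparkListener API (onBlockManagerAdded and addSparkListener); the object and variable names are illustrative, not from the post:

import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockManagerAdded}

object RunOnAllWorkers {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("run-on-all-workers"))
    val blockManagers = new AtomicInteger(0)

    // Count block managers as they register; each executor (and the driver)
    // registers one, so the count roughly tracks the number of workers.
    sc.addSparkListener(new SparkListener {
      override def onBlockManagerAdded(added: SparkListenerBlockManagerAdded): Unit = {
        blockManagers.incrementAndGet()
      }
    })

    // Run a job with one partition per observed block manager and hope the
    // scheduler places at least one task on the newest worker.
    val splits = math.max(blockManagers.get(), 1)
    sc.parallelize(1 to splits, splits).foreachPartition { _ =>
      // per-worker work would go here
    }
    sc.stop()
  }
}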

Re: TaskContext stageId = 0

2014-07-09 Thread silvermast
Oh well, never mind. The problem is that ResultTask's stageId is immutable and is used to construct the Task superclass. Anyway, my solution now is to use this.id for the rddId and to gather all rddIds using a SparkListener on stage completion to clean up any activity registered for those RDDs.
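
A rough sketch of that cleanup pattern, assuming a Spark version whose StageInfo exposes rddInfos; the registry map is a hypothetical stand-in for whatever per-RDD activity is being tracked under RDD ids:

import scala.collection.concurrent.TrieMap
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

class RddCleanupListener(registry: TrieMap[Int, AnyRef]) extends SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    // Every RDD that took part in the finished stage appears in rddInfos;
    // drop whatever activity was registered under its id (RDD.id, i.e. this.id).
    stageCompleted.stageInfo.rddInfos.foreach { info =>
      registry.remove(info.id)
    }
  }
}

The listener would then be registered with sc.addSparkListener(new RddCleanupListener(registry)) before the jobs run.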

TaskContext stageId = 0

2014-07-09 Thread silvermast
Has anyone else seen this, at least in local mode? I haven't tried it on a cluster, but I'm getting frustrated that I cannot identify activity within the RDD's compute() method, whether by stageId or by rddId (available on ParallelCollectionPartition but not on ShuffledRDDPartition, and then only
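
For context, this is the kind of lookup the post is describing, written here with the TaskContext.get() accessor from later Spark versions (an assumption, not the API the poster was using); the post reports stageId coming back as 0 in this situation:

import org.apache.spark.{SparkContext, TaskContext}

def tagPartitions(sc: SparkContext): Unit = {
  sc.parallelize(1 to 100, 4).mapPartitions { iter =>
    val ctx = TaskContext.get()   // assumed available; not present in Spark 1.0
    val stageId = ctx.stageId     // the value the post found to be 0 in local mode
    val partitionId = ctx.partitionId
    iter.map(x => s"stage=$stageId partition=$partitionId value=$x")
  }.collect().foreach(println)
}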