Is it possible to run a job that assigns work to every worker in the system?
My bootleg workaround right now is to have a SparkListener hear whenever a
block manager is added and to increment a split count by 1. It then runs a
Spark job with that split count and hopes the job will at least run on the
newest worker.
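
For reference, a rough sketch of that workaround. The class and object names
here are mine, and the count is only approximate (the driver's own block
manager also fires the callback), so treat it as illustrative:

import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockManagerAdded}

class WorkerCountListener extends SparkListener {
  val splitCount = new AtomicInteger(0)

  // Bump the split count every time a new block manager registers.
  override def onBlockManagerAdded(added: SparkListenerBlockManagerAdded): Unit =
    splitCount.incrementAndGet()
}

object RunEverywhere {
  def run(sc: SparkContext, listener: WorkerCountListener): Unit = {
    val splits = math.max(listener.splitCount.get(), 1)
    // One partition per known worker, hoping the scheduler spreads the
    // tasks out far enough to hit the newest worker too.
    sc.parallelize(1 to splits, splits).foreach { _ =>
      // per-worker work goes here
    }
  }
}

The listener gets registered with sc.addSparkListener(listener) before any
workers come up.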
Oh well, never mind. The problem is that ResultTask's stageId is immutable
and is used to construct the Task superclass. Anyway, my solution now is to
use this.id for the rddId and to gather all rddIds with a SparkListener on
stage completion, cleaning up any activity registered for those RDDs.
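
Concretely, the cleanup side looks something like the sketch below, assuming a
Spark version where StageInfo exposes rddInfos. ActivityRegistry is a
hypothetical stand-in for whatever tracks per-RDD state; the this.id above is
just the RDD's own id field, read from inside its compute():

import scala.collection.concurrent.TrieMap

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

object ActivityRegistry {
  // rddId -> whatever state compute() registered for that RDD
  private val activity = TrieMap.empty[Int, AnyRef]

  def register(rddId: Int, state: AnyRef): Unit = activity.put(rddId, state)
  def cleanUp(rddId: Int): Unit = activity.remove(rddId)
}

class ActivityCleanupListener extends SparkListener {
  override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
    // The completed stage reports the RDDs that participated in it, so
    // anything registered under those ids can be dropped here.
    stage.stageInfo.rddInfos.foreach(info => ActivityRegistry.cleanUp(info.id))
  }
}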
Has anyone else seen this, at least in local mode? I haven't tried this on a
cluster, but I'm getting frustrated that I cannot identify activity within
the RDD's compute() method, whether by stageId or rddId (available on
ParallelCollectionPartition but not on ShuffledRDDPartition, and then only