I am running on a 15-node cluster and am trying to tune partitioning so the work is balanced across all nodes. I am currently using an Accumulator to track work by MAC address, but I would prefer to use identifiers the Spark environment already knows about: Executor ID and Function ID show up in the Spark UI, and Task ID and Attempt ID (assuming these work like their Hadoop counterparts) would also be useful. Does anyone know how code running inside a function can access these values? I think I have asked this list about Task ID and Attempt ID a couple of times before without getting a reply.
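To make the question concrete, here is roughly the kind of thing I am hoping works. This is only a sketch: I am assuming TaskContext.get() is callable from inside a task closure (I believe it is in Spark 1.3+), and that SparkEnv.get.executorId is a legitimate way to read the executor ID on the worker side. The object/app name here is made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkEnv, TaskContext}

// Hypothetical example app: log the task/stage/attempt/executor IDs
// from inside each partition's task.
object TaskIdProbe {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("task-id-probe"))

    // 15 partitions to mirror the 15-node cluster
    sc.parallelize(1 to 1000, 15).foreachPartition { _ =>
      val tc = TaskContext.get() // non-null only inside a running task
      println(s"stage=${tc.stageId()} " +
        s"partition=${tc.partitionId()} " +
        s"taskAttemptId=${tc.taskAttemptId()} " + // unique per attempt
        s"attemptNumber=${tc.attemptNumber()} " + // 0 on first try, 1 on retry, ...
        s"executor=${SparkEnv.get.executorId}")
    }
    sc.stop()
  }
}
```

If that is the right approach, counting records per partitionId/executorId pair would let me see the imbalance directly instead of going through the MAC-address Accumulator.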
Incidentally, the data I have collected so far suggests that my execution is not at all balanced across the nodes.