Yep exactly! I’m not sure how complicated it would be to pull off. If someone
wouldn’t mind helping to get me pointed in the right direction, I would be happy
to look into it and contribute this functionality. I imagine this would be
implemented in the scheduler codebase and there would be some s
This would be really useful, especially for Shark, where a shift in
partitioning affects all subsequent queries unless the task scheduling time
beats spark.locality.wait. This can cause overall low performance for all
subsequent tasks.
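For context, the wait mentioned above is the scheduler's locality timeout, which is tunable. A minimal sketch of the relevant setting in spark-defaults.conf (the property name is from the Spark configuration docs; the value here is purely illustrative, not a recommendation):

```
spark.locality.wait  3s
```

Lowering it makes the scheduler give up on preferred locations sooner, trading locality for scheduling latency.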
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur
We have a use case where we’d like something to execute once on each node and I
thought it would be good to ask here.
Currently we achieve this by setting the parallelism to the number of nodes and
using a mod partitioner:
val balancedRdd = sc.parallelize(
(0 until Settings.parallelism)
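The snippet above is cut off in this message. A self-contained sketch of the described approach, with a hypothetical `numNodes` standing in for `Settings.parallelism` (the action body and app name are placeholders, not from the original):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object OncePerNode {
  def main(args: Array[String]): Unit = {
    val numNodes = 4 // assumption: parallelism set to the cluster's node count
    val sc = new SparkContext(new SparkConf().setAppName("once-per-node"))

    // One Int key per node. For non-negative Int keys, HashPartitioner
    // reduces to key % numPartitions -- the "mod partitioner" described
    // above -- so each partition receives exactly one element.
    val balancedRdd = sc
      .parallelize(0 until numNodes, numNodes)
      .map(i => (i, ()))
      .partitionBy(new HashPartitioner(numNodes))

    // Run the node-local side effect once per partition (hence once per
    // node, assuming the scheduler spreads the partitions across nodes).
    balancedRdd.foreachPartition { _ =>
      // node-local work goes here
    }

    sc.stop()
  }
}
```

Note this relies on the scheduler placing one partition per node, which is exactly the scheduling guarantee the thread is asking about.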