I've wanted similar functionality too: when network-IO bound (for me it was
pulling things from S3 to HDFS) I wish there were a `.mapMachines`
api where I wouldn't have to guess at the proper partitioning of a
'driver' RDD for `sc.parallelize(1 to N, N).map( i => pull the i'th chunk
from S3 )`.
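For concreteness, here is a minimal sketch of that 'driver' RDD workaround. The names `N` and `pullChunk` are assumptions: `N` stands for the number of S3 chunks, and `pullChunk` is a hypothetical helper that copies one chunk from S3 to HDFS.

```scala
// Sketch of the 'driver' RDD workaround, assuming a SparkContext `sc`.
// N (chunk count) and pullChunk (hypothetical S3 -> HDFS copy helper)
// are placeholders, not real APIs.
val N = 100
sc.parallelize(1 to N, N)  // one partition per chunk => one task per chunk
  .foreach { i =>
    pullChunk(i)  // runs on some executor; no control over *which* machine
  }
```

The drawback, as noted above, is that you have to guess a partition count up front, and Spark gives no guarantee the N tasks spread evenly across the physical machines (and their NICs).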

On Thu, Aug 27, 2015 at 10:01 AM Young, Matthew T <matthew.t.yo...@intel.com>
wrote:

> What’s the canonical way to find out the number of physical machines in a
> cluster at runtime in Spark? I believe SparkContext.defaultParallelism will
> give me the number of cores, but I’m interested in the number of NICs.
>
> I’m writing a Spark streaming application to ingest from Kafka with the
> Receiver API and want to create one DStream per physical machine for read
> parallelism’s sake. How can I figure out at run time how many machines
> there are so I know how many DStreams to create?
>
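One workaround that was commonly suggested for this (not an official API for "number of machines", so treat it as an assumption): `SparkContext.getExecutorMemoryStatus` returns a map keyed by executor `host:port`, which includes the driver, so its size minus one approximates the executor count; with one executor per node that approximates the machine count. A sketch, assuming a `StreamingContext` `ssc` and the usual `KafkaUtils.createStream` receiver:

```scala
// Sketch only: assumes one executor per physical machine, and that
// getExecutorMemoryStatus has been populated (it includes the driver,
// hence the `- 1`).
val numMachines = math.max(ssc.sparkContext.getExecutorMemoryStatus.size - 1, 1)

// One receiver-based DStream per machine for read parallelism,
// unioned into a single stream for processing.
val streams = (1 to numMachines).map { _ =>
  KafkaUtils.createStream(ssc, zkQuorum, groupId, topicMap)
}
val unified = ssc.union(streams)
```

Note the count can be off right after startup, before all executors have registered, so you may want to wait or retry until it stabilizes.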
