Hi Spark experts, I keep running into these terms and I'm a bit confused about how they relate. Could you please help me understand them better?
For instance, I have implemented a Hadoop InputFormat to load my external data into Spark; my custom InputFormat creates a number of InputSplits. My questions are:

1. Does each InputSplit map to exactly one Spark partition? (See the sketch in the P.S. below for how I'm loading the data.)
2. If I run on YARN, how do Spark executors/tasks map to YARN containers?
3. Since I already have a set of InputSplits, do I still need to specify the number of executors to get the processing parallelized?
4. How does --executor-memory map to the memory requirement in YARN's resource request?

--
Anfernee
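P.S. For context, here is a minimal sketch of how I'm loading the data. TextInputFormat and the input path are just placeholders standing in for my custom InputFormat and its key/value classes, and the spark-submit line in the comment uses example values only:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Submitted roughly like this (example values only):
//   spark-submit --master yarn --num-executors 4 --executor-memory 2g \
//     --class InputSplitPartitions my-job.jar
object InputSplitPartitions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("inputsplit-partitions"))

    // TextInputFormat (new Hadoop API) stands in for my custom InputFormat;
    // in the real job I pass classOf[MyCustomInputFormat] and its key/value classes.
    val rdd = sc.newAPIHadoopFile(
      "hdfs:///path/to/input",        // placeholder path
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text])

    // My expectation (question 1): one Spark partition per InputSplit
    // returned by InputFormat.getSplits().
    println(s"Number of partitions: ${rdd.partitions.length}")

    sc.stop()
  }
}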
