Hi Spark experts,

I keep coming across these terms and am a bit confused by them; could you
please help me understand them better?

For instance, I have implemented a Hadoop InputFormat to load my external
data into Spark; in turn, my custom InputFormat creates a bunch of
InputSplits.
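
For context, this is roughly how I wire it up (a minimal sketch;
TextInputFormat and the input path are placeholders standing in for my
actual InputFormat, key/value classes, and data):

  import org.apache.hadoop.io.{LongWritable, Text}
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
  import org.apache.spark.{SparkConf, SparkContext}

  object CustomInputFormatExample {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("custom-input-format"))

      // Spark asks the InputFormat for its InputSplits via getSplits();
      // question 1 below is whether each of those splits becomes exactly
      // one partition of `rdd`.
      val rdd = sc.newAPIHadoopFile(
        "hdfs:///path/to/input",      // placeholder path
        classOf[TextInputFormat],     // my custom InputFormat goes here in the real job
        classOf[LongWritable],
        classOf[Text])

      println(s"number of partitions: ${rdd.partitions.length}")
      sc.stop()
    }
  }

My questions are: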

# Is it correct that each InputSplit maps to exactly one Spark partition?

# If I run on YARN, how do Spark executors/tasks map to YARN containers?

# Since I already have a bunch of InputSplits, do I still need to specify
the number of executors to get the processing parallelized?

# How does --executor-memory map to the memory requirement in YARN's
resource request? (The configuration sketch after this list shows the
settings I mean.)
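
To make questions 3 and 4 concrete, this is the kind of configuration I am
asking about: a minimal sketch, assuming the settings are set
programmatically via SparkConf. The values are placeholders; on the
spark-submit command line they correspond to --num-executors,
--executor-memory, and --executor-cores.

  import org.apache.spark.{SparkConf, SparkContext}

  object ExecutorSizingExample {
    def main(args: Array[String]): Unit = {
      // Equivalent spark-submit flags:
      //   --num-executors   -> spark.executor.instances
      //   --executor-memory -> spark.executor.memory
      //   --executor-cores  -> spark.executor.cores
      val conf = new SparkConf()
        .setAppName("yarn-sizing-question")
        .set("spark.executor.instances", "4")
        .set("spark.executor.memory", "4g")
        .set("spark.executor.cores", "2")

      val sc = new SparkContext(conf)
      // ... actual job elided ...
      sc.stop()
    }
  }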

-- 
--Anfernee
