Hi Spark experts, I keep running into these terms and I'm a bit confused about how they relate. Could you please help me understand them better?
For instance, I have implemented a Hadoop InputFormat to load my external data into Spark; my custom InputFormat creates a number of InputSplits. My questions are:

1. Does each InputSplit map to exactly one Spark partition? (See the sketch in the P.S. below for how I'm loading the data.)
2. If I run on YARN, how do Spark executors/tasks map to YARN containers?
3. Since I already have a set of InputSplits, do I still need to specify the number of executors to get the processing parallelized?
4. How does --executor-memory map to the memory requirement in YARN's resource request?

--
Anfernee
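P.S. For context, here is a minimal sketch of how I'm loading the data. TextInputFormat and the input path are just placeholders standing in for my custom InputFormat and its key/value classes, and the spark-submit line in the comment uses example values only:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Submitted roughly like this (example values only):
//   spark-submit --master yarn --num-executors 4 --executor-memory 2g \
//     --class InputSplitPartitions my-job.jar
object InputSplitPartitions {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("inputsplit-partitions"))

    // TextInputFormat (new Hadoop API) stands in for my custom InputFormat;
    // in the real job I pass classOf[MyCustomInputFormat] and its key/value classes.
    val rdd = sc.newAPIHadoopFile(
      "hdfs:///path/to/input",        // placeholder path
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text])

    // My expectation (question 1): one Spark partition per InputSplit
    // returned by InputFormat.getSplits().
    println(s"Number of partitions: ${rdd.partitions.length}")

    sc.stop()
  }
}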
