I am trying to understand the Spark architecture for my upcoming certification, but there seems to be conflicting information available.
https://stackoverflow.com/questions/47782099/what-is-the-relationship-between-tasks-and-partitions

Does Spark assign a Spark task to only a single corresponding Spark partition? In other words, is the number of Spark tasks for a job equal to the number of Spark partitions (provided, of course, there are no shuffles)?

If so, two follow-up questions:

1) Is this the reason we can get OOMs in Spark: because a partition (for example, one produced by an intermediate step like a groupBy) may not fit into RAM?

2) What is the purpose of spark.task.cpus? It does not make sense for more than one thread (or more than one CPU) to work on a single partition of data, so shouldn't this value always be 1?

Need some help. Thanks.

--
Regards,
Sreyan Chakravarty
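
P.S. For context, this is the minimal sketch I have been using to reason about question 1 (the local master, the partition count of 8, and the app name are just assumptions on my part, not anything from the docs):

import org.apache.spark.sql.SparkSession

object PartitionTaskSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-task-sketch")
      .master("local[4]")               // assumption: 4 local cores
      .config("spark.task.cpus", "1")   // the documented default; a larger value reserves that many cores per task
      .getOrCreate()

    // An RDD with an explicit number of partitions (8 is arbitrary).
    val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 8)
    println(s"partitions = ${rdd.getNumPartitions}")

    // If the mapping really is one task per partition, this single-stage
    // action should launch 8 tasks (checkable in the Spark UI).
    println(s"sum = ${rdd.map(_.toLong).reduce(_ + _)}")

    spark.stop()
  }
}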