Hi All,

Could you please help me understand how Spark determines the number of
partitions of an RDD when it is not specified explicitly?

I found the following in the documentation for files loaded from HDFS:
*The textFile method also takes an optional second argument for controlling
the number of partitions of the file. By default, Spark creates one
partition for each block of the file (blocks being 64MB by default in
HDFS), but you can also ask for a higher number of partitions by passing a
larger value. Note that you cannot have fewer partitions than blocks*
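
To make sure I read that correctly, this is roughly how I understand the API
(the HDFS path is just a placeholder and "sc" is an existing SparkContext):

    // Default: one partition per HDFS block of the file
    val rdd = sc.textFile("hdfs:///some/path/file.txt")
    println(rdd.partitions.length)

    // Asking for more partitions via the optional second argument
    val rdd100 = sc.textFile("hdfs:///some/path/file.txt", 100)
    println(rdd100.partitions.length)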

What is the rule for files loaded from a regular (non-HDFS) file system?
For instance, I have a file X replicated on 4 machines. If I load file X
into an RDD, how many partitions are created and why?
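
For reference, this is roughly what I am doing to check it (the local path
is just a placeholder, and file X exists at that path on every worker):

    // Load a file from the local file system on each node
    val rddX = sc.textFile("file:///data/fileX.txt")
    println(rddX.partitions.length)   // what determines this value?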

Thanks for your help on this
Alessandro
