Hi All,

Thanks for your answers. I have one more detail to point out.
It is now clear how the number of partitions is defined for an HDFS file. However, suppose my dataset is replicated on all the machines at the same absolute path, and each machine has, for instance, an ext3 filesystem. If I load the file into an RDD, how many partitions are defined in this case, and why?

I found that Spark chooses some number of partitions, say K. If I request a number of partitions <= K, my parameter is ignored. If I request a value K* >= K, then Spark creates K* partitions.

Thanks for your help
Alessandro

On Thu, Feb 19, 2015 at 6:27 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. *blocks being 64MB by default in HDFS*
>
> In Hadoop 2.1+, the default block size has been increased.
> See https://issues.apache.org/jira/browse/HDFS-4053
>
> Cheers
>
> On Thu, Feb 19, 2015 at 8:32 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> What file system are you using?
>>
>> If you use HDFS, the documentation you cited is pretty clear on how
>> partitions are determined.
>>
>> bq. file X replicated on 4 machines
>>
>> I don't think the replication factor plays a role w.r.t. partitions.
>>
>> On Thu, Feb 19, 2015 at 8:05 AM, Alessandro Lulli <lu...@di.unipi.it> wrote:
>>
>>> Hi All,
>>>
>>> Could you please help me understand how Spark defines the number of
>>> partitions of an RDD when it is not specified?
>>>
>>> I found the following in the documentation for a file loaded from HDFS:
>>>
>>> *The textFile method also takes an optional second argument for
>>> controlling the number of partitions of the file. By default, Spark creates
>>> one partition for each block of the file (blocks being 64MB by default in
>>> HDFS), but you can also ask for a higher number of partitions by passing a
>>> larger value. Note that you cannot have fewer partitions than blocks*
>>>
>>> What is the rule for a file loaded from the local file system?
>>> For instance, I have a file X replicated on 4 machines. If I load the
>>> file X into an RDD, how many partitions are defined, and why?
>>>
>>> Thanks for your help on this
>>> Alessandro
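P.S. The behavior described above (requests below K are ignored, requests above K are honored) matches the split-size rule in Hadoop's old FileInputFormat, which sc.textFile relies on for both HDFS and local paths: splitSize = max(minSize, min(goalSize, blockSize)), where goalSize is totalSize divided by the requested number of splits. A minimal Python sketch of that rule (the function name and the 256MB file size are illustrative, not from the thread, and the real code has extra edge cases such as slack for the last split):

```python
# Sketch of Hadoop FileInputFormat's split computation, which backs
# sc.textFile(). Illustrative only.

def num_partitions(total_size, requested_splits, block_size, min_size=1):
    # goalSize: the split size that would yield exactly the requested
    # number of splits.
    goal_size = total_size // max(requested_splits, 1)
    # computeSplitSize: max(minSize, min(goalSize, blockSize))
    split_size = max(min_size, min(goal_size, block_size))
    # Number of splits, rounding up.
    return -(-total_size // split_size)

block = 64 * 1024 * 1024   # 64MB default HDFS block
size = 256 * 1024 * 1024   # a 256MB file -> K = 4 blocks

print(num_partitions(size, 2, block))  # asking for fewer than K: still 4
print(num_partitions(size, 8, block))  # asking for more than K: 8
```

Asking for fewer splits only raises goalSize above blockSize, so the split size stays capped at the block size and you still get one partition per block; asking for more splits lowers goalSize below blockSize and is honored.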