Hi,
I checked the number of partitions with:
System.out.println("INFO: RDD with " + rdd.partitions().size() + " partitions created.");
Each split is about 100MB. I am currently loading the data from the
local file system. Would this explain the observation?
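One possible explanation, sketched under assumptions rather than confirmed: SparkContext.sequenceFile goes through the old Hadoop mapred FileInputFormat, which, as far as I know, sizes splits as max(minSize, min(goalSize, blockSize)) and lets the last chunk run up to 10% over. The local file system reports a default block size of 32MB (fs.local.block.size), so a 100MB file would be cut into 4 pieces. The class and method names below are hypothetical, and the inputs (4 files of 100MB, minPartitions = 2, minSize = 1) are assumed values, not taken from my actual configuration.

```java
// Rough model of split counting in the old org.apache.hadoop.mapred.FileInputFormat.
// SplitSketch and its method names are hypothetical; only the sizing rule is modeled.
final class SplitSketch {
    // Slack factor: the final chunk may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits produced for one file of the given length.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // the leftover bytes form the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        int files = 4;                 // assumption: the 4 "splits" are 4 files
        long fileLength = 100 * mb;    // assumption: each file is about 100MB
        int minPartitions = 2;         // assumption: value passed as minPartitions
        long blockSize = 32 * mb;      // assumption: local file system block size
        long goalSize = files * fileLength / minPartitions;
        long splitSize = computeSplitSize(goalSize, 1, blockSize);
        int partitions = files * countSplits(fileLength, splitSize);
        System.out.println("splitSize=" + splitSize + " partitions=" + partitions);
    }
}
```

Under these assumptions the model gives 4 splits per file and 16 partitions in total, which matches what I see. If that is the cause, a larger block size or a coalesce after loading should reduce the count.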
Thank you!
Best,
Wenlei
On Tue, Apr 21, 2015 at 6:28 AM, Archit Thakur <[email protected]>
wrote:
> Hi,
>
> It should generate the same number of partitions as the number of splits.
> How did you check the number of partitions? Also, please paste your file size,
> hdfs-site.xml, and mapred-site.xml here.
>
> Thanks and Regards,
> Archit Thakur.
>
> On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie <[email protected]> wrote:
>
>> Hi,
>>
>> I am wondering about the mechanism that determines the number of partitions
>> created by SparkContext.sequenceFile.
>>
>> For example, although my file has only 4 splits, Spark creates 16
>> partitions for it. Is it determined by the file size? Is there any way to
>> control it? (It looks like I can only tune minPartitions, not maxPartitions.)
>>
>> Thank you!
>>
>> Best,
>> Wenlei
>>
>>
>>
>
--
Wenlei Xie (谢文磊)
Ph.D. Candidate
Department of Computer Science
456 Gates Hall, Cornell University
Ithaca, NY 14853, USA
Email: [email protected]