Excuse me for the many mails on this thread. I found a similar issue: https://stackoverflow.com/questions/24671755/how-to-partition-a-rdd
Thanks & Regards,
Gokula Krishnan (Gokul)

On Tue, Jul 25, 2017 at 8:21 AM, Gokula Krishnan D <email2...@gmail.com> wrote:

> In addition to that, I tried to read the same file with 3000 partitions,
> but it used 3070 partitions and took more time than before. Please refer
> to the attachment.
>
> Thanks & Regards,
> Gokula Krishnan (Gokul)
>
> On Tue, Jul 25, 2017 at 8:15 AM, Gokula Krishnan D <email2...@gmail.com> wrote:
>
>> Hello All,
>>
>> I have an HDFS file with approx. 1.5 billion records in 500 part files
>> (258.2 GB total). When I executed the following, I could see that it used
>> 2290 tasks, but shouldn't it have been 500, matching the HDFS file?
>>
>> val inputFile = <HDFS File>
>> val inputRdd = sc.textFile(inputFile)
>> inputRdd.count()
>>
>> I was hoping I could do the same with fewer partitions, so I tried the
>> following:
>>
>> val inputFile = <HDFS File>
>> val inputRddNew = sc.textFile(inputFile, 500)
>> inputRddNew.count()
>>
>> But it still used 2290 tasks.
>>
>> As per the Scala doc, it is supposed to use the same number of partitions
>> as the HDFS file, i.e. 500.
>>
>> It would be great if you could throw some insight on this.
>>
>> Thanks & Regards,
>> Gokula Krishnan (Gokul)
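For what it's worth, the task count likely comes from HDFS input splits rather than the part-file count: the second argument to textFile is only a *minimum* number of partitions, and Hadoop's FileInputFormat splits each part file separately by split size (normally the HDFS block size). A minimal sketch of the split arithmetic, assuming a 128 MB block size and evenly sized part files (both assumptions, not values confirmed in the thread):

```scala
object SplitCount {
  // Rough model of FileInputFormat: each file is split independently,
  // giving about ceil(fileSize / splitSize) splits per file, where
  // splitSize is typically the HDFS block size (assumed 128 MB here).
  def splitsFor(fileSizeBytes: Long, splitSizeBytes: Long): Long =
    (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes

  def main(args: Array[String]): Unit = {
    val blockSize = 128L * 1024 * 1024
    // 258.2 GB over 500 part files => ~516 MB per file (illustrative only)
    val perFileBytes = (258.2 * 1024 * 1024 * 1024 / 500).toLong
    val totalTasks = 500 * splitsFor(perFileBytes, blockSize)
    // Several splits per part file, so the job sees thousands of tasks,
    // not 500 -- which is why minPartitions=500 changes nothing.
    println(s"approx. tasks: $totalTasks")
  }
}
```

Under this model a minPartitions of 500 is already below the natural split count, so Spark ignores it; to actually get fewer partitions after reading, something like inputRdd.coalesce(500) would be the usual route.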