Thanks for the inputs!! I passed in spark.mapred.max.split.size and spark.mapred.min.split.size set to the split size I wanted, but it didn't have any effect. I also tried passing in spark.dfs.block.size, with all the params set to the same value.
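
For reference, this is roughly how I'm passing them in (just a sketch; the size below is illustrative, and all three properties get the same value):

import org.apache.spark.sql.SparkSession;

// Sketch only: illustrative 1 GB target split size, all three keys set the same.
long targetSplitSize = 1024L * 1024 * 1024;

SparkSession spark = SparkSession.builder()
    .appName("hdfs-split-size-test")
    .config("spark.mapred.max.split.size", targetSplitSize)
    .config("spark.mapred.min.split.size", targetSplitSize)
    .config("spark.dfs.block.size", targetSplitSize)
    .getOrCreate();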
The read itself is:

JavaSparkContext.fromSparkContext(spark.sparkContext()).textFile(hdfsPath, 13);

Is there any other param that needs to be set as well?

Thanks

On Tue, Oct 10, 2017 at 4:32 AM, ayan guha <guha.a...@gmail.com> wrote:
> I have not tested this, but you should be able to pass any map-reduce-like
> conf to the underlying hadoop config... essentially you should be able to
> control split behaviour just as you would in a map-reduce program (since
> Spark uses the same input format).
>
> On Tue, Oct 10, 2017 at 10:21 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Write your own input format/datasource, or split the file yourself
>> beforehand (not recommended).
>>
>> > On 10. Oct 2017, at 09:14, Kanagha Kumar <kpra...@salesforce.com> wrote:
>> >
>> > Hi,
>> >
>> > I'm trying to read a 60GB HDFS file using spark
>> > textFile("hdfs_file_path", minPartitions).
>> >
>> > How can I control the no. of tasks by increasing the split size? With the
>> > default split size of 250 MB, several tasks are created. But I would like
>> > a specific no. of tasks to be created while reading from HDFS itself,
>> > instead of using repartition() etc.
>> >
>> > Any suggestions are helpful!
>> >
>> > Thanks
>
> --
> Best Regards,
> Ayan Guha
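
P.S. Just to confirm I understand the suggestion about passing the conf to the underlying hadoop config, is something like this what you mean? (Untested sketch; the ~5 GB target is illustrative, and the property name is the standard Hadoop FileInputFormat split-size setting.)

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Untested sketch: raise the minimum split size on the Hadoop Configuration
// that textFile's TextInputFormat reads. As far as I can tell, the old
// FileInputFormat computes splitSize = max(minSize, min(goalSize, blockSize)),
// so a large minSize should mean fewer, larger splits and hence fewer tasks.
// spark and hdfsPath are the same as in the snippets above.
long targetSplitSize = 5L * 1024 * 1024 * 1024; // illustrative: ~5 GB per split

JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
jsc.hadoopConfiguration().set(
    "mapreduce.input.fileinputformat.split.minsize",
    String.valueOf(targetSplitSize));

JavaRDD<String> lines = jsc.textFile(hdfsPath, 13);
System.out.println("partitions: " + lines.getNumPartitions());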