I have not tested this, but you should be able to pass any MapReduce-style configuration through to the underlying Hadoop config. Essentially, you can control split behaviour the same way you would in a MapReduce program, since Spark uses the same input format.
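For example, a minimal sketch of what I mean (untested; the path, app name, and the 1 GB split size are illustrative values, not from your setup). textFile() goes through Hadoop's TextInputFormat, which honours mapreduce.input.fileinputformat.split.minsize when computing splits, so raising it should give you fewer, larger tasks:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("split-size-demo"))

// Ask FileInputFormat for splits of at least 1 GB (illustrative value);
// for a ~60GB file this should come out to roughly 60 tasks.
sc.hadoopConfiguration.setLong(
  "mapreduce.input.fileinputformat.split.minsize", 1024L * 1024 * 1024)

val rdd = sc.textFile("hdfs:///path/to/60gb_file") // hypothetical path
println(s"partitions: ${rdd.getNumPartitions}")

One caveat: splits larger than the HDFS block size span multiple blocks, so you trade some data locality for fewer tasks.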
On Tue, Oct 10, 2017 at 10:21 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> Write your own input format/datasource, or split the file yourself beforehand (not recommended).
>
> > On 10. Oct 2017, at 09:14, Kanagha Kumar <kpra...@salesforce.com> wrote:
> >
> > Hi,
> >
> > I'm trying to read a 60GB HDFS file using spark textFile("hdfs_file_path", minPartitions).
> >
> > How can I control the no. of tasks by increasing the split size? With the default split size of 250 MB, several tasks are created. But I would like a specific no. of tasks to be created while reading from HDFS itself, instead of using repartition() etc.
> >
> > Any suggestions are helpful!
> >
> > Thanks

--
Best Regards,
Ayan Guha