Thanks for the inputs!! I passed in spark.mapred.max.split.size and spark.mapred.min.split.size set to the split size I wanted, but it didn't have any effect. I also tried passing in spark.dfs.block.size, with all the params set to the same value.
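
For reference, this is roughly how I'm passing them in (just a sketch; the size below is illustrative, and all three properties get the same value):

import org.apache.spark.sql.SparkSession;

// Sketch only: illustrative 1 GB target split size, all three keys set the same.
long targetSplitSize = 1024L * 1024 * 1024;

SparkSession spark = SparkSession.builder()
    .appName("hdfs-split-size-test")
    .config("spark.mapred.max.split.size", targetSplitSize)
    .config("spark.mapred.min.split.size", targetSplitSize)
    .config("spark.dfs.block.size", targetSplitSize)
    .getOrCreate();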
The read itself is:

JavaSparkContext.fromSparkContext(spark.sparkContext()).textFile(hdfsPath, 13);

Is there any other param that needs to be set as well?

Thanks

On Tue, Oct 10, 2017 at 4:32 AM, ayan guha <guha.a...@gmail.com> wrote:
> I have not tested this, but you should be able to pass any map-reduce-like
> conf to the underlying hadoop config... essentially you should be able to
> control split behaviour just as you would in a map-reduce program (since
> Spark uses the same input format).
>
> On Tue, Oct 10, 2017 at 10:21 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Write your own input format/datasource, or split the file yourself
>> beforehand (not recommended).
>>
>> > On 10. Oct 2017, at 09:14, Kanagha Kumar <kpra...@salesforce.com> wrote:
>> >
>> > Hi,
>> >
>> > I'm trying to read a 60GB HDFS file using spark
>> > textFile("hdfs_file_path", minPartitions).
>> >
>> > How can I control the no. of tasks by increasing the split size? With the
>> > default split size of 250 MB, several tasks are created. But I would like
>> > a specific no. of tasks to be created while reading from HDFS itself,
>> > instead of using repartition() etc.
>> >
>> > Any suggestions are helpful!
>> >
>> > Thanks
>
> --
> Best Regards,
> Ayan Guha
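
P.S. Just to confirm I understand the suggestion about passing the conf to the underlying hadoop config, is something like this what you mean? (Untested sketch; the ~5 GB target is illustrative, and the property name is the standard Hadoop FileInputFormat split-size setting.)

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Untested sketch: raise the minimum split size on the Hadoop Configuration
// that textFile's TextInputFormat reads. As far as I can tell, the old
// FileInputFormat computes splitSize = max(minSize, min(goalSize, blockSize)),
// so a large minSize should mean fewer, larger splits and hence fewer tasks.
// spark and hdfsPath are the same as in the snippets above.
long targetSplitSize = 5L * 1024 * 1024 * 1024; // illustrative: ~5 GB per split

JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
jsc.hadoopConfiguration().set(
    "mapreduce.input.fileinputformat.split.minsize",
    String.valueOf(targetSplitSize));

JavaRDD<String> lines = jsc.textFile(hdfsPath, 13);
System.out.println("partitions: " + lines.getNumPartitions());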