I think the reason is simply that there is no longer an explicit min-partitions argument for Hadoop InputSplits in the new Hadoop APIs. At least, I didn't see it when I glanced just now.
However, you should be able to get the same effect by setting a Hadoop Configuration property, and you can pass that Configuration through the newAPIHadoopFile method. You set it as a suggested maximum split size rather than a suggested minimum number of splits. Although I think the old config property mapred.max.split.size is still respected, you may try mapreduce.input.fileinputformat.split.maxsize instead, which appears to be the intended replacement in the new APIs. There's a rough sketch of what that could look like below the quoted message.

On Mon, Sep 15, 2014 at 9:35 PM, Eric Friedman <eric.d.fried...@gmail.com> wrote:
> sc.textFile takes a minimum # of partitions to use.
>
> Is there a way to get sc.newAPIHadoopFile to do the same?
>
> I know I can repartition() and get a shuffle. I'm wondering if there's a
> way to tell the underlying InputFormat (AvroParquet, in my case) how many
> partitions to use at the outset.
>
> What I'd really prefer is to get the partitions automatically defined based
> on the number of blocks.
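Untested sketch of the idea above. TextInputFormat, LongWritable/Text, the input path, the app name, and the 64 MB figure are all placeholders; the same Configuration should be usable with AvroParquetInputFormat or any other new-API FileInputFormat:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("split-size-example"))

// Start from the SparkContext's Hadoop configuration and cap the split size.
// 64 MB is only an illustrative value; pick it based on your block size and
// the parallelism you want.
val conf = new Configuration(sc.hadoopConfiguration)
conf.set("mapreduce.input.fileinputformat.split.maxsize", (64 * 1024 * 1024).toString)
// The deprecated key may still be honored by some InputFormats:
// conf.set("mapred.max.split.size", (64 * 1024 * 1024).toString)

// TextInputFormat is used here for illustration; substitute the InputFormat,
// key class, and value class you actually read with.
val rdd = sc.newAPIHadoopFile(
  "/path/to/input",          // placeholder path
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf)

println(s"Partitions: ${rdd.partitions.length}")

Each split becomes one partition of the resulting RDD, so a smaller maximum split size should give you more partitions without a repartition/shuffle.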