sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith("_")).get + "/*", classOf[Text], classOf[Text])

works.

On Sun, Jun 28, 2015 at 9:46 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> There isn't a setter for sc.hadoopConfiguration.
> You can directly change the value of the parameter in sc.hadoopConfiguration.
>
> However, see the note in the scaladoc:
> * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
> * plan to set some global configurations for all Hadoop RDDs.
>
> Cheers
>
> On Sun, Jun 28, 2015 at 9:34 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> val hadoopConf = new Configuration(sc.hadoopConfiguration)
>>
>> hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
>>
>> sc.hadoopConfiguration(hadoopConf)
>>
>> or
>>
>> sc.hadoopConfiguration = hadoopConf
>>
>> threw error.
>>
>> On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> sequenceFile() calls hadoopFile() where:
>>>
>>>   val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
>>>
>>> You can set the parameter in sc.hadoopConfiguration before calling
>>> sc.sequenceFile().
>>>
>>> Cheers
>>>
>>> On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>
>>>> I can do this:
>>>>
>>>>   val hadoopConf = new Configuration(sc.hadoopConfiguration)
>>>>   hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
>>>>
>>>>   sc.newAPIHadoopFile(
>>>>     path + "/*.avro",
>>>>     classOf[AvroKeyInputFormat[GenericRecord]],
>>>>     classOf[AvroKey[GenericRecord]],
>>>>     classOf[NullWritable],
>>>>     hadoopConf)
>>>>
>>>> But I can't do the same with
>>>>
>>>>   sc.sequenceFile("path", classOf[Text], classOf[Text])
>>>>
>>>> How can I achieve the same with sequenceFile?
>>>>
>>>> --
>>>> Deepak
>>
>> --
>> Deepak
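
Putting the thread's conclusion together, a minimal sketch of both approaches. This assumes a live SparkContext `sc`; the path string and the AvroKey/GenericRecord types are placeholders standing in for the poster's actual data, not something runnable as-is:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text

// Option 1: sequenceFile() has no overload that accepts a Configuration; it
// broadcasts sc.hadoopConfiguration internally. sc.hadoopConfiguration has no
// setter, but the Configuration object itself is mutable, so set the property
// on it before creating the RDD. Caveat from the scaladoc: this change is
// global and affects every Hadoop RDD created from this SparkContext afterwards.
sc.hadoopConfiguration.set(
  "mapreduce.input.fileinputformat.split.maxsize", "67108864") // 64 MB

val seqRdd = sc.sequenceFile("/path/to/table/*", classOf[Text], classOf[Text])

// Option 2 (for input formats that accept a per-call conf, like
// newAPIHadoopFile): clone the context's configuration so the override
// stays local to this one RDD instead of leaking globally.
val hadoopConf = new Configuration(sc.hadoopConfiguration)
hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
// sc.newAPIHadoopFile(path, keyClass, inputFormatClass, valueClass, hadoopConf)
```

The difference matters: `newAPIHadoopFile` takes an explicit `Configuration` argument, so a cloned conf keeps the smaller split size scoped to one read, while the `sequenceFile` route necessarily mutates the shared configuration.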