Try something like this:

def readGenericRecords(sc: SparkContext, inputDir: String, startDate: Date, endDate: Date) = {
  // assuming getInputPaths returns a non-empty list of paths
  val paths: Seq[String] = getInputPaths(inputDir, startDate, endDate)
  val job = Job.getInstance(new Configuration(sc.hadoopConfiguration))
  // add every path except the first to the job; the first one is passed to newAPIHadoopFile
  paths.drop(1).foreach(p => FileInputFormat.addInputPath(job, new Path(p)))
  // the key/value classes must match AvroKeyInputFormat: AvroKey[GenericRecord] and NullWritable
  sc.newAPIHadoopFile(paths.head,
    classOf[AvroKeyInputFormat[GenericRecord]],
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    job.getConfiguration())
}

2015-05-27 10:55 GMT+02:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>:

> def readGenericRecords(sc: SparkContext, inputDir: String, startDate:
> Date, endDate: Date) = {
>
> val path = getInputPaths(inputDir, startDate, endDate)
>
> sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
> AvroKeyInputFormat[GenericRecord]]("/A/B/C/D/D/2015/05/22/out-r-*.avro")
>
> }
>
> This is my method, can you show me where should i modify to use
> FileInputFormat ? If you add the path there what should you give while
> invoking newAPIHadoopFile
>
> On Wed, May 27, 2015 at 2:20 PM, Eugen Cepoi <cepoi.eu...@gmail.com>
> wrote:
>
>> You can do that using FileInputFormat.addInputPath
>>
>> 2015-05-27 10:41 GMT+02:00 ayan guha <guha.a...@gmail.com>:
>>
>>> What about /blah/*/blah/out*.avro?
>>> On 27 May 2015 18:08, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com> wrote:
>>>
>>>> I am doing that now.
>>>> Is there no other way ?
>>>>
>>>> On Wed, May 27, 2015 at 12:40 PM, Akhil Das <ak...@sigmoidanalytics.com>
>>>> wrote:
>>>>
>>>>> How about creating two and union [ sc.union(first, second) ] them?
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>>
>>>>> On Wed, May 27, 2015 at 11:51 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have this piece
>>>>>>
>>>>>> sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
>>>>>> AvroKeyInputFormat[GenericRecord]](
>>>>>> "/a/b/c/d/exptsession/2015/05/22/out-r-*.avro")
>>>>>>
>>>>>> that takes ("/a/b/c/d/exptsession/2015/05/22/out-r-*.avro") this as
>>>>>> input.
>>>>>>
>>>>>> I want to give a second directory as input but this is a invalid
>>>>>> syntax
>>>>>>
>>>>>> that takes ("/a/b/c/d/exptsession/2015/05/*22*/out-r-*.avro",
>>>>>> "/a/b/c/d/exptsession/2015/05/*21*/out-r-*.avro")
>>>>>>
>>>>>> OR
>>>>>>
>>>>>> ("/a/b/c/d/exptsession/2015/05/*22*/out-r-*.avro,
>>>>>> /a/b/c/d/exptsession/2015/05/*21*/out-r-*.avro")
>>>>>>
>>>>>> Please suggest.
>>>>>>
>>>>>> --
>>>>>> Deepak
>>>>
>>>> --
>>>> Deepak
>
> --
> Deepak
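
PS: getInputPaths in my snippet is only assumed to exist. In case it helps, here is a minimal sketch of what such a helper could look like. It is hypothetical: it takes java.time.LocalDate instead of Date, and it assumes the <inputDir>/yyyy/MM/dd/out-r-*.avro layout seen in the paths quoted in this thread. Adjust both to your own setup.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object InputPaths {
  // Hypothetical helper, not part of Spark or Avro: builds one glob per day
  // from startDate to endDate (both inclusive).
  // Assumption: files live under <inputDir>/yyyy/MM/dd/out-r-*.avro.
  def getInputPaths(inputDir: String, startDate: LocalDate, endDate: LocalDate): Seq[String] = {
    require(!endDate.isBefore(startDate), "endDate must not be before startDate")
    val dayFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd")
    val base = inputDir.stripSuffix("/")
    Iterator.iterate(startDate)(_.plusDays(1)) // startDate, startDate + 1 day, ...
      .takeWhile(!_.isAfter(endDate))          // stop once we pass endDate
      .map(day => s"$base/${day.format(dayFormat)}/out-r-*.avro")
      .toList
  }
}
```

Calling InputPaths.getInputPaths("/a/b/c/d/exptsession", LocalDate.of(2015, 5, 21), LocalDate.of(2015, 5, 22)) would give one glob for the 21st and one for the 22nd. The require call also guarantees at least one path, so paths.head is safe to use.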