How to give multiple directories as input?

2015-05-26 Thread ๏̯͡๏
I have this piece: sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]]( "/a/b/c/d/exptsession/2015/05/22/out-r-*.avro"), which takes "/a/b/c/d/exptsession/2015/05/22/out-r-*.avro" as its input. I want to give a second directory as input, but this is an invalid argument to this method.

Re: How to give multiple directories as input?

2015-05-27 Thread Akhil Das
How about creating two RDDs and unioning them [ sc.union(first, second) ]? Thanks, Best Regards
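Sketched out, Akhil's suggestion looks roughly like this (the 05/23 path is a hypothetical second day; it assumes a live SparkContext `sc` and the Avro imports from the original snippet, so this is an illustration rather than a runnable program):

```scala
// Read each directory separately, then union the resulting RDDs.
val first = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
  AvroKeyInputFormat[GenericRecord]]("/a/b/c/d/exptsession/2015/05/22/out-r-*.avro")
val second = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
  AvroKeyInputFormat[GenericRecord]]("/a/b/c/d/exptsession/2015/05/23/out-r-*.avro")
val both = sc.union(first, second) // one RDD covering both days
```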

Re: How to give multiple directories as input?

2015-05-27 Thread ๏̯͡๏
I am doing that now. Is there no other way?

Re: How to give multiple directories as input?

2015-05-27 Thread ayan guha
What about /blah/*/blah/out*.avro?
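ayan's glob idea can be sanity-checked outside Spark. A small sketch using java.nio's PathMatcher (the /blah/... paths are hypothetical; Hadoop applies its own, similar glob expansion when the pattern is handed to newAPIHadoopFile, so this only illustrates the matching behavior):

```scala
import java.nio.file.{FileSystems, Paths}

// "*" matches within a single path segment, so one pattern can cover
// every date directory under a common prefix.
val matcher = FileSystems.getDefault.getPathMatcher("glob:/blah/*/blah/out*.avro")

// Hypothetical day directories; the same pattern matches both.
val may22 = Paths.get("/blah/2015-05-22/blah/out-r-00000.avro")
val may23 = Paths.get("/blah/2015-05-23/blah/out-r-00001.avro")
```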

Re: How to give multiple directories as input?

2015-05-27 Thread Eugen Cepoi
You can do that using FileInputFormat.addInputPath.
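A further option, not raised in the thread but documented for SparkContext.newAPIHadoopFile: the path argument may be a comma-separated list of paths, because Spark hands it to FileInputFormat. Building that string from the computed directories is trivial (the 05/23 path is a hypothetical second day):

```scala
// newAPIHadoopFile passes this string on to FileInputFormat,
// which splits it on commas into separate input paths.
def joinInputPaths(paths: Seq[String]): String = paths.mkString(",")

val paths = Seq(
  "/a/b/c/d/exptsession/2015/05/22/out-r-*.avro",
  "/a/b/c/d/exptsession/2015/05/23/out-r-*.avro")
```

The call site would then read sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]](joinInputPaths(paths)).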

Re: How to give multiple directories as input?

2015-05-27 Thread ๏̯͡๏
def readGenericRecords(sc: SparkContext, inputDir: String, startDate: Date, endDate: Date) = {
  val path = getInputPaths(inputDir, startDate, endDate)
  sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]]("/A/B/C/D/D/2015/05/22/out-r-*.avro")
}

Re: How to give multiple directories as input?

2015-05-27 Thread Eugen Cepoi
Try something like this:

def readGenericRecords(sc: SparkContext, inputDir: String, startDate: Date, endDate: Date) = {
  // assuming a list of paths
  val paths: Seq[String] = getInputPaths(inputDir, startDate, endDate)
  val job = Job.getInstance(new Configuration(sc.hadoopConfiguration)
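The mail is cut off by the archive. A hedged reconstruction of how this approach would typically continue: register each directory on the Job with addInputPath, then read through the job's configuration with newAPIHadoopRDD. Here getInputPaths is the poster's own helper, assumed to return a Seq[String]; this is a sketch, not Eugen's original code, and it needs a live SparkContext plus the Spark, Hadoop, and Avro dependencies on the classpath.

```scala
import java.util.Date

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.spark.SparkContext

def readGenericRecords(sc: SparkContext, inputDir: String, startDate: Date, endDate: Date) = {
  val paths: Seq[String] = getInputPaths(inputDir, startDate, endDate) // assumed helper
  val job = Job.getInstance(new Configuration(sc.hadoopConfiguration))
  // Register every directory on the job; FileInputFormat accumulates them.
  paths.foreach(p => FileInputFormat.addInputPath(job, new Path(p)))
  // Read using the job's configuration, which now carries all input paths.
  sc.newAPIHadoopRDD(
    job.getConfiguration,
    classOf[AvroKeyInputFormat[GenericRecord]],
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable])
}
```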