Hi Pradeep,

>>> Looks like what I was suggesting doesn't work. :/

I guess you mean putting the comma-separated paths into one string and passing it to the existing API (SparkContext#textFile). As you found, that does not work. I suggest creating a new API, SparkContext#textFiles, that accepts an array of strings. I have already implemented a simple patch and it works.
On Thu, Nov 12, 2015 at 10:17 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> Looks like what I was suggesting doesn't work. :/
>
> On Wed, Nov 11, 2015 at 4:49 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>> Yes, that's what I suggest. TextInputFormat supports multiple inputs, so on the Spark side we just need to provide an API for that.
>>
>> On Thu, Nov 12, 2015 at 8:45 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>>> IIRC, TextInputFormat supports an input path that is a comma-separated list. I haven't tried this, but I think you should just be able to do sc.textFile("file1,file2,...")
>>>
>>> On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>> I know these workarounds, but wouldn't it be more convenient and straightforward to use SparkContext#textFiles?
>>>>
>>>> On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>> For more than a small number of files, you'd be better off using SparkContext#union instead of RDD#union. That will avoid building up a lengthy lineage.
>>>>>
>>>>> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com> wrote:
>>>>>> Hey Jeff,
>>>>>> Do you mean reading from multiple text files? In that case, as a workaround, you can use the RDD#union() (or ++) method to concatenate multiple RDDs. For example:
>>>>>>
>>>>>> val lines1 = sc.textFile("file1")
>>>>>> val lines2 = sc.textFile("file2")
>>>>>>
>>>>>> val rdd = lines1 union lines2
>>>>>>
>>>>>> regards,
>>>>>> --Jakob
>>>>>>
>>>>>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>> Users can use HDFS glob syntax to specify multiple inputs, but sometimes it is not convenient to do that. Not sure why there is no SparkContext#textFiles API; it should be easy to implement.
>>>>>>> I'd love to create a ticket and contribute a patch for that, if there is no other consideration that I'm not aware of.
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang
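[Editor's note: a minimal sketch of what such a textFiles helper could look like, built on the existing public SparkContext#textFile and SparkContext#union APIs (the latter, as Mark notes, keeps the lineage flat). The name `textFiles` and this implementation are hypothetical illustrations, not Jeff's actual patch:]

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: read several text files into a single RDD[String].
// Uses SparkContext#union on the whole sequence, rather than chaining
// RDD#union calls, so the lineage does not grow with the number of files.
def textFiles(sc: SparkContext, paths: Seq[String]): RDD[String] = {
  require(paths.nonEmpty, "at least one input path is required")
  sc.union(paths.map(path => sc.textFile(path)))
}

// Usage (paths are placeholders):
// val rdd = textFiles(sc, Seq("hdfs:///data/file1", "hdfs:///data/file2"))
```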