Didn't notice that I can pass a comma separated list of paths to the existing API (SparkContext#textFile), so there's no need for a new API. Thanks all.
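For example, in spark-shell (where sc is already defined; the paths below are just placeholders), a single comma separated string works with the existing API:

val lines = sc.textFile("/data/part1.txt,/data/part2.txt")
// The comma separated string is handed to Hadoop's TextInputFormat, which
// splits it into individual input paths, so the resulting RDD contains the
// lines of both files.
println(lines.count())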
On Thu, Nov 12, 2015 at 10:24 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> Hi Pradeep
>
> >>> Looks like what I was suggesting doesn't work. :/
>
> I guess you mean putting the comma separated paths into one string and
> passing it to the existing API (SparkContext#textFile). That should not
> work. I suggest creating a new API, SparkContext#textFiles, that accepts
> an array of strings. I have already implemented a simple patch and it
> works.
>
> On Thu, Nov 12, 2015 at 10:17 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>
>> Looks like what I was suggesting doesn't work. :/
>>
>> On Wed, Nov 11, 2015 at 4:49 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Yes, that's what I suggest. TextInputFormat supports multiple inputs,
>>> so on the Spark side we just need to provide an API for that.
>>>
>>> On Thu, Nov 12, 2015 at 8:45 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>>>
>>>> IIRC, TextInputFormat supports an input path that is a comma separated
>>>> list. I haven't tried this, but I think you should just be able to do
>>>> sc.textFile("file1,file2,...")
>>>>
>>>> On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> I know these workarounds, but wouldn't it be more convenient and
>>>>> straightforward to use SparkContext#textFiles?
>>>>>
>>>>> On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>>
>>>>>> For more than a small number of files, you'd be better off using
>>>>>> SparkContext#union instead of RDD#union. That will avoid building up
>>>>>> a lengthy lineage.
>>>>>>
>>>>>> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey Jeff,
>>>>>>> Do you mean reading from multiple text files? In that case, as a
>>>>>>> workaround, you can use the RDD#union() (or ++) method to
>>>>>>> concatenate multiple RDDs. For example:
>>>>>>>
>>>>>>> val lines1 = sc.textFile("file1")
>>>>>>> val lines2 = sc.textFile("file2")
>>>>>>>
>>>>>>> val rdd = lines1 union lines2
>>>>>>>
>>>>>>> regards,
>>>>>>> --Jakob
>>>>>>>
>>>>>>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Users can use the HDFS glob syntax to specify multiple inputs, but
>>>>>>>> sometimes that is not convenient. Not sure why there's no
>>>>>>>> SparkContext#textFiles API; it should be easy to implement. I'd
>>>>>>>> love to create a ticket and contribute it if there's no other
>>>>>>>> consideration that I'm not aware of.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards
>>>>>>>>
>>>>>>>> Jeff Zhang
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>
> --
> Best Regards
>
> Jeff Zhang

--
Best Regards

Jeff Zhang
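For more than a couple of files, Mark's SparkContext#union suggestion looks roughly like this (again a minimal sketch in spark-shell, with sc predefined and placeholder paths):

val paths = Seq("/data/part1.txt", "/data/part2.txt", "/data/part3.txt")
// Build one RDD per path and union them in a single call; unlike chaining
// RDD#union, this keeps the lineage flat instead of adding one level per file.
val combined = sc.union(paths.map(p => sc.textFile(p)))
println(combined.count())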