Yes, that's what I suggest. TextInputFormat supports multiple inputs, so on the Spark side we just need to provide an API for that.
On Thu, Nov 12, 2015 at 8:45 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:

> IIRC, TextInputFormat supports an input path that is a comma separated
> list. I haven't tried this, but I think you should just be able to do
> sc.textFile("file1,file2,...")
>
> On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> I know these workarounds, but wouldn't it be more convenient and
>> straightforward to use SparkContext#textFiles?
>>
>> On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>
>>> For more than a small number of files, you'd be better off using
>>> SparkContext#union instead of RDD#union. That will avoid building up a
>>> lengthy lineage.
>>>
>>> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com> wrote:
>>>
>>>> Hey Jeff,
>>>> Do you mean reading from multiple text files? In that case, as a
>>>> workaround, you can use the RDD#union() (or ++) method to concatenate
>>>> multiple RDDs. For example:
>>>>
>>>> val lines1 = sc.textFile("file1")
>>>> val lines2 = sc.textFile("file2")
>>>>
>>>> val rdd = lines1 union lines2
>>>>
>>>> regards,
>>>> --Jakob
>>>>
>>>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> Although users can use the HDFS glob syntax to read multiple inputs,
>>>>> sometimes it is not convenient to do that. Not sure why there's no API
>>>>> like SparkContext#textFiles; it should be easy to implement. I'd love to
>>>>> create a ticket and contribute a patch if there's no other consideration
>>>>> that I don't know about.
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
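[Editor's note: the two workarounds discussed in the thread can be sketched as follows. This is a minimal sketch assuming a live SparkContext `sc`; "file1" and "file2" are placeholder paths, not real files from the thread.]

```scala
// 1. Comma-separated paths: the underlying Hadoop TextInputFormat accepts a
//    comma-separated list of input paths, so one textFile call reads both.
val combined = sc.textFile("file1,file2")

// 2. SparkContext#union over per-file RDDs: for many files this is preferable
//    to chaining RDD#union, since it avoids building up a lengthy lineage.
val rdds = Seq("file1", "file2").map(path => sc.textFile(path))
val unioned = sc.union(rdds)
```

Either way yields a single RDD[String] of all lines; a hypothetical SparkContext#textFiles, as proposed above, would essentially wrap the second pattern.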