IIRC, TextInputFormat supports an input path that is a comma-separated list. I haven't tried this, but I think you should just be able to do sc.textFile("file1,file2,...")
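For what it's worth, a quick sketch of that comma-separated approach (untested; the file names are placeholders, and the actual sc.textFile call is commented out since it needs a live SparkContext):

```scala
// Hadoop's TextInputFormat treats commas in the input path string as
// separators, so several paths can be joined into one string and passed
// to a single textFile call.
val paths = Seq("file1", "file2", "file3").mkString(",")

// Untested sketch -- should read all three files into one RDD:
// val lines = sc.textFile(paths)
```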
On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> I know these workarounds, but wouldn't it be more convenient and
> straightforward to have SparkContext#textFiles?
>
> On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> For more than a small number of files, you'd be better off using
>> SparkContext#union instead of RDD#union. That will avoid building up a
>> lengthy lineage.
>>
>> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com>
>> wrote:
>>
>>> Hey Jeff,
>>> Do you mean reading from multiple text files? In that case, as a
>>> workaround, you can use the RDD#union() (or ++) method to concatenate
>>> multiple RDDs. For example:
>>>
>>> val lines1 = sc.textFile("file1")
>>> val lines2 = sc.textFile("file2")
>>>
>>> val rdd = lines1 union lines2
>>>
>>> regards,
>>> --Jakob
>>>
>>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> Users can specify multiple inputs with HDFS glob syntax, but sometimes
>>>> that is not convenient. I'm not sure why there is no
>>>> SparkContext#textFiles API; it should be easy to implement. I'd be
>>>> happy to create a ticket and contribute it, if there is no other
>>>> consideration that I'm not aware of.
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang

--
Best Regards

Jeff Zhang
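To illustrate Mark's point: chaining RDD#union adds one lineage step per file, while SparkContext#union (which takes a whole Seq of RDDs) combines them in a single step. A hedged sketch below, with placeholder file names; the Spark calls are commented out since they need a live SparkContext:

```scala
// Placeholder list of input paths for illustration.
val files = Seq("file1", "file2", "file3")

// Sketch: build one RDD per file, then combine them all at once with
// SparkContext#union rather than folding with RDD#union, which would
// build a lineage chain as deep as the number of files.
// val rdds = files.map(sc.textFile(_))
// val combined = sc.union(rdds)
```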