Yep, as I just found out, you can also provide sc.textFile() with a comma-delimited string of all the files you want to load.
For example:

sc.textFile('/path/to/file1,/path/to/file2')

So once you have your list of files, concatenate their paths like that and pass the single string to textFile().

Nick

On Mon, Apr 28, 2014 at 7:23 PM, Pat Ferrel <pat.fer...@gmail.com> wrote:

> sc.textFile(URI) supports reading multiple files in parallel but only with
> a wildcard. I need to walk a dir tree, match a regex to create a list of
> files, then I’d like to read them into a single RDD in parallel. I
> understand these could go into separate RDDs then a union RDD can be
> created. Is there a way to create a single RDD from a URI list?
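
For the workflow described in the question (walk a directory tree, match a regex, load the matches into one RDD), here is a minimal sketch of building the comma-delimited string. It assumes the files live on a local or mounted filesystem that os.walk can see and that no path contains a comma. The helper name comma_delimited_paths, the root path, and the pattern are hypothetical, chosen only for illustration.

```python
import os
import re


def comma_delimited_paths(root, pattern):
    """Walk `root` and return the matching file paths joined by commas."""
    regex = re.compile(pattern)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match the regex against the file name only, not the full path.
            if regex.search(name):
                matches.append(os.path.join(dirpath, name))
    # Sort so the resulting string is the same from run to run.
    return ",".join(sorted(matches))


# Hypothetical usage, assuming an existing SparkContext named `sc`:
# rdd = sc.textFile(comma_delimited_paths('/path/to/root', r'\.log$'))
```

If nothing matches, the helper returns an empty string, so it is probably worth checking for that before calling textFile().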