sc.textFile(URI) supports reading multiple files in parallel, but as far as I can tell only when the URI contains a wildcard. I need to walk a directory tree, match file names against a regex to build a list of files, and then read all of those files into a single RDD in parallel. I understand that each file could be read into its own RDD and the results combined into a union RDD. Is there a way to create a single RDD directly from a list of URIs?
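To make the question concrete, here is a minimal PySpark sketch of the two approaches. The helper names (`find_files`, `read_via_union`, `read_via_comma_list`) are hypothetical and only for illustration. It assumes the files sit on a local filesystem that `os.walk` can see (HDFS or S3 would need a different listing step), that `sc` is an existing SparkContext, and that no path contains a comma. The second reader relies on `textFile` accepting a comma-separated list of paths, which I believe it does because the path string is passed on to Hadoop's input format, but treat that as an assumption to verify on your Spark version.

```python
import os
import re


def find_files(root, pattern):
    """Walk `root` recursively; return sorted paths whose file name matches `pattern`."""
    regex = re.compile(pattern)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match against the bare file name, not the full path.
            if regex.search(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def read_via_union(sc, paths):
    """One RDD per file, combined into a single RDD with SparkContext.union."""
    return sc.union([sc.textFile(p) for p in paths])


def read_via_comma_list(sc, paths):
    """Single textFile call on a comma-separated path list (assumed to be supported)."""
    return sc.textFile(",".join(paths))
```

Usage would look something like `rdd = read_via_comma_list(sc, find_files("/data/logs", r"^part-\d+\.txt$"))`, where the directory and regex are placeholders. Both readers expect a non-empty path list.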
- File list read into single RDD Pat Ferrel