I didn't notice that I can pass comma-separated paths to the existing API
(SparkContext#textFile), so there is no need for a new API. Thanks all.
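For reference, a minimal sketch of the comma-separated form mentioned above. The object name, the helper readCommaSeparated, and the local master setting are assumptions made for illustration; only SparkContext#textFile comes from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CommaSeparatedPaths {
  // Hypothetical helper: joins the paths with commas (no spaces) and hands
  // the single string to the existing SparkContext#textFile.
  // Assumes none of the paths contains a comma itself.
  def readCommaSeparated(sc: SparkContext, paths: Seq[String]): RDD[String] =
    sc.textFile(paths.mkString(","))

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("comma-separated-paths").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Input paths are taken from the command line.
      val lines = readCommaSeparated(sc, args.toSeq)
      println(lines.count())
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell, where sc already exists, the same thing is just sc.textFile("file1,file2").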
On Thu, Nov 12, 2015 at 10:24 AM, Jeff Zhang wrote:
> Hi Pradeep
>
> >>> Looks like what I was suggesting doesn't work. :/
> I guess you
In addition, if you have more than two text files, you can just put them
into a Seq and use "reduce(_ ++ _)".
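A minimal sketch of that suggestion, assuming a local master for a standalone run. The object name and the helper readAll are hypothetical; the calls themselves (textFile, ++, reduce) are the ones named in the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ReduceUnion {
  // Hypothetical helper: one RDD per path, then pairwise RDD#++ via reduce.
  // Seq#reduce throws on an empty Seq, so callers must pass at least one path.
  def readAll(sc: SparkContext, paths: Seq[String]): RDD[String] =
    paths.map(p => sc.textFile(p)).reduce(_ ++ _)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("reduce-union").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Input paths are taken from the command line.
      println(readAll(sc, args.toSeq).count())
    } finally {
      sc.stop()
    }
  }
}
```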
Best Regards,
Shixiong Zhu
2015-11-11 10:21 GMT-08:00 Jakob Odersky:
> Hey Jeff,
> Do you mean reading from multiple text files? In that case, as a
> workaround,
Hey Jeff,
Do you mean reading from multiple text files? In that case, as a
workaround, you can use the RDD#union() (or ++) method to concatenate
multiple rdds. For example:
val lines1 = sc.textFile("file1")
val lines2 = sc.textFile("file2")
val rdd = lines1 union lines2
regards,
--Jakob
On 11
Users can already use the HDFS glob syntax to read multiple inputs, but
sometimes that is not convenient. I'm not sure why there is no
SparkContext#textFiles API. It should be easy to implement. I'd love to
create a ticket and contribute it if there are no other considerations.
Hi Pradeep
>>> Looks like what I was suggesting doesn't work. :/
I guess you mean putting comma-separated paths into one string and passing it
to the existing API (SparkContext#textFile). It should not work. I suggest
creating a new API, SparkContext#textFiles, that accepts an array of strings. I have
already
I know these workarounds, but wouldn't it be more convenient and
straightforward to use SparkContext#textFiles?
On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra
wrote:
> For more than a small number of files, you'd be better off using
> SparkContext#union instead of
For more than a small number of files, you'd be better off using
SparkContext#union instead of RDD#union. That will avoid building up a
lengthy lineage.
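A minimal sketch of the SparkContext#union variant, assuming a local master for a standalone run. The object name and the helper readAll are hypothetical. All per-file RDDs are passed to a single union call instead of being chained pairwise.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ContextUnion {
  // Hypothetical helper: build one RDD per path, then combine them with a
  // single SparkContext#union call rather than repeated RDD#union.
  def readAll(sc: SparkContext, paths: Seq[String]): RDD[String] =
    sc.union(paths.map(p => sc.textFile(p)))

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("context-union").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Input paths are taken from the command line.
      println(readAll(sc, args.toSeq).count())
    } finally {
      sc.stop()
    }
  }
}
```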
On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky wrote:
> Hey Jeff,
> Do you mean reading from multiple text files? In
Yes, that's what I suggest. TextInputFormat supports multiple inputs, so on
the Spark side we just need to provide an API for that.
On Thu, Nov 12, 2015 at 8:45 AM, Pradeep Gollakota
wrote:
> IIRC, TextInputFormat supports an input path that is a comma separated
> list. I