Didn't notice that I can pass a comma-separated path to the existing API
(SparkContext#textFile), so there's no need for a new API. Thanks all.
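For reference, this is roughly what worked; the paths below are just placeholders:

// Assuming an existing SparkContext `sc`; paths are hypothetical.
// textFile hands the string to Hadoop's FileInputFormat, which splits on commas.
val rdd = sc.textFile("hdfs:///data/part1.txt,hdfs:///data/part2.txt")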
On Thu, Nov 12, 2015 at 10:24 AM, Jeff Zhang wrote:
Hi Pradeep
>>> Looks like what I was suggesting doesn't work. :/
I guess you mean putting the comma-separated paths into one string and passing
it to the existing API (SparkContext#textFile). It should not work. I suggest
creating a new API, SparkContext#textFiles, that accepts an array of strings. I
have already implemented it.
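A rough sketch of what such a helper could look like, built on the existing primitives (illustrative only, not an actual Spark API):

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch of the proposed SparkContext#textFiles, written as a plain helper:
// take an array of paths and union the per-path RDDs into one RDD[String].
def textFiles(sc: SparkContext, paths: Array[String]): RDD[String] =
  sc.union(paths.map(p => sc.textFile(p)))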
Looks like what I was suggesting doesn't work. :/
On Wed, Nov 11, 2015 at 4:49 PM, Jeff Zhang wrote:
Yes, that's what I suggest. TextInputFormat supports multiple inputs, so on the
Spark side we just need to provide an API for that.
On Thu, Nov 12, 2015 at 8:45 AM, Pradeep Gollakota wrote:
IIRC, TextInputFormat supports an input path that is a comma-separated
list. I haven't tried this, but I think you should just be able to do
sc.textFile("file1,file2,...")
On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang wrote:
I know these workarounds, but wouldn't it be more convenient and
straightforward to use SparkContext#textFiles?
On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra wrote:
For more than a small number of files, you'd be better off using
SparkContext#union instead of RDD#union. That will avoid building up a
lengthy lineage.
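A quick sketch of the difference, assuming `sc` is a SparkContext and the paths are made up:

// Chained RDD#union adds one union step per file, so the lineage grows
// with the number of inputs:
val paths = Seq("hdfs:///logs/day1", "hdfs:///logs/day2", "hdfs:///logs/day3")
val chained = paths.map(sc.textFile(_)).reduce(_ union _)

// SparkContext#union builds a single UnionRDD over all of them, keeping
// the lineage flat:
val flat = sc.union(paths.map(sc.textFile(_)))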
In addition, if you have more than two text files, you can just put them
into a Seq and use "reduce(_ ++ _)".
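For example (file names are placeholders, `sc` is an existing SparkContext):

val rdds = Seq(sc.textFile("file1"), sc.textFile("file2"), sc.textFile("file3"))
val combined = rdds.reduce(_ ++ _)  // ++ is an alias for RDD#union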
Best Regards,
Shixiong Zhu
2015-11-11 10:21 GMT-08:00 Jakob Odersky:
Hey Jeff,
Do you mean reading from multiple text files? In that case, as a
workaround, you can use the RDD#union() (or ++) method to concatenate
multiple RDDs. For example:
val lines1 = sc.textFile("file1")
val lines2 = sc.textFile("file2")
val rdd = lines1 union lines2
regards,
--Jakob
Although users can use the HDFS glob syntax to specify multiple inputs,
sometimes it is not convenient to do that. Not sure why there's no
SparkContext#textFiles API; it should be easy to implement. I'd love to
create a ticket and contribute that if there's no other consideration.
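For reference, the glob workaround mentioned above looks roughly like this (paths are made up):

// Globs are resolved by Hadoop's FileSystem#globStatus, so wildcards and
// {a,b} alternation are supported.
val logs = sc.textFile("hdfs:///logs/2015-11-*/part-*")
val sets = sc.textFile("hdfs:///data/{train,test}/*.txt")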