IIRC, TextInputFormat supports an input path that is a comma-separated list. I haven't tried this, but I think you should just be able to do sc.textFile("file1,file2,...")
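For what it's worth, a quick sketch of that comma-separated approach (untested; the file names are placeholders, and the actual sc.textFile call is commented out since it needs a live SparkContext):

```scala
// Hadoop's TextInputFormat treats commas in the input path string as
// separators, so several paths can be joined into one string and passed
// to a single textFile call.
val paths = Seq("file1", "file2", "file3").mkString(",")

// Untested sketch -- should read all three files into one RDD:
// val lines = sc.textFile(paths)
```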
On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> I know these workarounds, but wouldn't it be more convenient and
> straightforward to have SparkContext#textFiles?
>
> On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> For more than a small number of files, you'd be better off using
>> SparkContext#union instead of RDD#union. That will avoid building up a
>> lengthy lineage.
>>
>> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com>
>> wrote:
>>
>>> Hey Jeff,
>>> Do you mean reading from multiple text files? In that case, as a
>>> workaround, you can use the RDD#union() (or ++) method to concatenate
>>> multiple RDDs. For example:
>>>
>>> val lines1 = sc.textFile("file1")
>>> val lines2 = sc.textFile("file2")
>>>
>>> val rdd = lines1 union lines2
>>>
>>> regards,
>>> --Jakob
>>>
>>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> Users can specify multiple inputs with HDFS glob syntax, but sometimes
>>>> that is not convenient. I'm not sure why there is no
>>>> SparkContext#textFiles API; it should be easy to implement. I'd be
>>>> happy to create a ticket and contribute it, if there is no other
>>>> consideration that I'm not aware of.
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang

--
Best Regards

Jeff Zhang
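To illustrate Mark's point: chaining RDD#union adds one lineage step per file, while SparkContext#union (which takes a whole Seq of RDDs) combines them in a single step. A hedged sketch below, with placeholder file names; the Spark calls are commented out since they need a live SparkContext:

```scala
// Placeholder list of input paths for illustration.
val files = Seq("file1", "file2", "file3")

// Sketch: build one RDD per file, then combine them all at once with
// SparkContext#union rather than folding with RDD#union, which would
// build a lineage chain as deep as the number of files.
// val rdds = files.map(sc.textFile(_))
// val combined = sc.union(rdds)
```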