Better to use coalesce(1) instead of repartition(1): coalesce merges existing partitions without a full shuffle, while repartition always shuffles the data.
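A minimal sketch of both approaches, assuming a running SparkContext `sc` and the HDFS path from the thread below (the output directory names are illustrative):

```scala
// Word count written out as a single part file.
// coalesce(1) merges the existing partitions into one without a
// full shuffle; repartition(1) would shuffle all the data first.
val textFile = sc.textFile("Sample.txt")
val counts = textFile.flatMap(line => line.split(" "))
               .map(word => (word, 1))
               .reduceByKey(_ + _)
counts.coalesce(1).saveAsTextFile("hdfs://master:8020/user/abc")

// If the data has already been collected to the driver as an
// Array[(String, Int)], turn it back into an RDD before saving:
val x = counts.collect()
sc.parallelize(x, 1).saveAsTextFile("hdfs://master:8020/user/abc_single")
```

Note that a single partition means the entire output flows through one task, so this is only sensible when the result is small enough to fit comfortably on one executor.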

On Fri, Oct 20, 2017 at 9:47 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> Use counts.repartition(1).save......
> Hth
>
>
> On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" <usopao...@gmail.com> wrote:
>
> Actually, when I run following code,
>
>   val textFile = sc.textFile("Sample.txt")
>   val counts = textFile.flatMap(line => line.split(" "))
>                  .map(word => (word, 1))
>                  .reduceByKey(_ + _)
>
>
> It saves the results as multiple part files, like part-00000 and
> part-00001. I want to collect all of them into one file.
>
>
> 2017-10-20 16:43 GMT+03:00 Marco Mistroni <mmistr...@gmail.com>:
>
>> Hi
>>  Could you just create an rdd/df out of what you want to save and store
>> it in hdfs?
>> Hth
>>
>> On Oct 20, 2017 9:44 AM, "Uğur Sopaoğlu" <usopao...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> In word count example,
>>>
>>>   val textFile = sc.textFile("Sample.txt")
>>>   val counts = textFile.flatMap(line => line.split(" "))
>>>                  .map(word => (word, 1))
>>>                  .reduceByKey(_ + _)
>>>  counts.saveAsTextFile("hdfs://master:8020/user/abc")
>>>
>>> I want to write the collection *counts*, which is used in the code above,
>>> to HDFS, so
>>>
>>> val x = counts.collect()
>>>
>>> Actually I want to write *x* to HDFS. But Spark requires an RDD to write
>>> something to HDFS.
>>>
>>> How can I write an Array[(String, Int)] to HDFS?
>>>
>>>
>>> --
>>> Uğur
>>>
>>
>
>
> --
> Uğur Sopaoğlu
>
>
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net