Another thing: could it be a permission problem? It creates the whole directory structure (in red):

/tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001

so I am guessing not.
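To rule permissions out properly, something like the following should show whether a plain HDFS write from this client hits the same problem (just a sketch using the Hadoop FileSystem API; the HdfsWriteTest class name and test file name are placeholders, the URI is the same one my job uses):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteTest {
        public static void main(String[] args) throws Exception {
            // Same NameNode URI the Spark job uses.
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://sandbox.hortonworks.com:8020"),
                    new Configuration());

            // Show owner and permissions of the target directory.
            FileStatus st = fs.getFileStatus(new Path("/tmp"));
            System.out.println("owner=" + st.getOwner()
                    + " perms=" + st.getPermission());

            // Write a small file directly. A permission problem would fail
            // here with an AccessControlException; a zero-byte file plus the
            // same replication error would point at HDFS or networking instead.
            Path p = new Path("/tmp/hdfs-write-test.txt");
            try (FSDataOutputStream out = fs.create(p)) {
                out.writeBytes("hello hdfs\n");
            }
            System.out.println("wrote " + p + ", len="
                    + fs.getFileStatus(p).getLen());
        }
    }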
On Tue, May 5, 2015 at 7:27 PM, Sudarshan Murty <njmu...@gmail.com> wrote:

> You are most probably right. I assumed others may have run into this.
> When I try to put the files in there, it creates a directory structure
> with the part-00000 and part-00001 files, but these files are of size 0 -
> no content. The client error and the server logs show the error message
> below, which seems to indicate that the system is aware that a datanode
> exists but excludes it from the operation. So it does not look like a
> network partition, and Ambari indicates that HDFS is in good health with
> one NN, one SNN, one DN.
> I am unable to figure out what the issue is.
> Thanks for your help.
>
> On Tue, May 5, 2015 at 6:39 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> What happens when you try to put files into your HDFS from the local
>> filesystem? It looks like an HDFS issue rather than a Spark thing.
>>
>> On 6 May 2015 05:04, "Sudarshan" <njmu...@gmail.com> wrote:
>>
>>> I have searched all replies to this question and not found an answer.
>>>
>>> I am running standalone Spark 1.3.1 and Hortonworks' HDP 2.2 VM, side
>>> by side, on the same machine, and am trying to write the output of the
>>> wordcount program into HDFS (it works fine writing to a local file,
>>> /tmp/wordcount).
>>>
>>> The only line I added to the wordcount program is (where 'counts' is
>>> the JavaPairRDD):
>>>
>>> counts.saveAsTextFile("hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");
>>>
>>> When I check in HDFS at that location (/tmp), here is what I find:
>>>
>>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000000_2/part-00000
>>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001
>>>
>>> and *both part-0000[01] files are of size 0*.
>>>
>>> The wordcount client output error is:
>>>
>>> [Stage 1:> (0 + 2) / 2]
>>> 15/05/05 14:40:45 WARN DFSClient: DataStreamer Exception
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001
>>> *could only be replicated to 0 nodes instead of minReplication (=1).
>>> There are 1 datanode(s) running and 1 node(s) are excluded in this
>>> operation.*
>>>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3447)
>>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:642)
>>>
>>> I tried this with Spark 1.2.1 as well; same error.
>>> I have plenty of space on the DFS.
>>> The NameNode, Secondary NameNode, and the one DataNode are all healthy.
>>>
>>> Any hint as to what the problem may be?
>>> Thanks in advance.
>>> Sudarshan
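P.S. For anyone who wants to reproduce this: the program is essentially the standard Java wordcount. A sketch against the Spark 1.3 Java API follows (the WordCount class name and input-path argument are placeholders; the saveAsTextFile line is the one quoted above):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class WordCount {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setAppName("wordcount"));

            JavaRDD<String> lines = sc.textFile(args[0]); // input path
            JavaPairRDD<String, Integer> counts = lines
                    // In the Spark 1.x Java API, flatMap returns an Iterable.
                    .flatMap(line -> Arrays.asList(line.split(" ")))
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey((a, b) -> a + b);

            // Writing to /tmp/wordcount on the local filesystem works;
            // only this HDFS destination produces the empty part files.
            counts.saveAsTextFile(
                    "hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");

            sc.stop();
        }
    }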