Another thing: could it be a permission problem? It creates the whole directory structure (in red):

/tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001

so I am guessing not.
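To rule permissions out properly, something like the following should show whether a plain HDFS write from this client hits the same problem (just a sketch using the Hadoop FileSystem API; the HdfsWriteTest class name and test file name are placeholders, the URI is the same one my job uses):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteTest {
        public static void main(String[] args) throws Exception {
            // Same NameNode URI the Spark job uses.
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://sandbox.hortonworks.com:8020"),
                    new Configuration());

            // Show owner and permissions of the target directory.
            FileStatus st = fs.getFileStatus(new Path("/tmp"));
            System.out.println("owner=" + st.getOwner()
                    + " perms=" + st.getPermission());

            // Write a small file directly. A permission problem would fail
            // here with an AccessControlException; a zero-byte file plus the
            // same replication error would point at HDFS or networking instead.
            Path p = new Path("/tmp/hdfs-write-test.txt");
            try (FSDataOutputStream out = fs.create(p)) {
                out.writeBytes("hello hdfs\n");
            }
            System.out.println("wrote " + p + ", len="
                    + fs.getFileStatus(p).getLen());
        }
    }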
On Tue, May 5, 2015 at 7:27 PM, Sudarshan Murty <njmu...@gmail.com> wrote:

> You are most probably right. I assumed others may have run into this.
> When I try to put the files in there, it creates a directory structure
> with the part-00000 and part-00001 files, but these files are of size 0 -
> no content. The client error and the server logs show the error message
> below, which seems to indicate that the system is aware that a datanode
> exists but excludes it from the operation. So it does not look like a
> network partition, and Ambari indicates that HDFS is in good health with
> one NN, one SNN, one DN.
> I am unable to figure out what the issue is.
> Thanks for your help.
>
> On Tue, May 5, 2015 at 6:39 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> What happens when you try to put files into your HDFS from the local
>> filesystem? It looks like an HDFS issue rather than a Spark thing.
>>
>> On 6 May 2015 05:04, "Sudarshan" <njmu...@gmail.com> wrote:
>>
>>> I have searched all replies to this question and not found an answer.
>>>
>>> I am running standalone Spark 1.3.1 and Hortonworks' HDP 2.2 VM, side
>>> by side, on the same machine, and am trying to write the output of the
>>> wordcount program into HDFS (it works fine writing to a local file,
>>> /tmp/wordcount).
>>>
>>> The only line I added to the wordcount program is (where 'counts' is
>>> the JavaPairRDD):
>>>
>>> counts.saveAsTextFile("hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");
>>>
>>> When I check in HDFS at that location (/tmp), here is what I find:
>>>
>>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000000_2/part-00000
>>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001
>>>
>>> and *both part-0000[01] files are of size 0*.
>>>
>>> The wordcount client output error is:
>>>
>>> [Stage 1:> (0 + 2) / 2]
>>> 15/05/05 14:40:45 WARN DFSClient: DataStreamer Exception
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>> /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001
>>> *could only be replicated to 0 nodes instead of minReplication (=1).
>>> There are 1 datanode(s) running and 1 node(s) are excluded in this
>>> operation.*
>>>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
>>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3447)
>>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:642)
>>>
>>> I tried this with Spark 1.2.1 as well; same error.
>>> I have plenty of space on the DFS.
>>> The NameNode, Secondary NameNode, and the one DataNode are all healthy.
>>>
>>> Any hint as to what the problem may be?
>>> Thanks in advance.
>>> Sudarshan
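P.S. For anyone who wants to reproduce this: the program is essentially the standard Java wordcount. A sketch against the Spark 1.3 Java API follows (the WordCount class name and input-path argument are placeholders; the saveAsTextFile line is the one quoted above):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class WordCount {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setAppName("wordcount"));

            JavaRDD<String> lines = sc.textFile(args[0]); // input path
            JavaPairRDD<String, Integer> counts = lines
                    // In the Spark 1.x Java API, flatMap returns an Iterable.
                    .flatMap(line -> Arrays.asList(line.split(" ")))
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey((a, b) -> a + b);

            // Writing to /tmp/wordcount on the local filesystem works;
            // only this HDFS destination produces the empty part files.
            counts.saveAsTextFile(
                    "hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");

            sc.stop();
        }
    }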