I have searched all replies to this question and not found an answer.

I am running standalone Spark 1.3.1 and Hortonworks' HDP 2.2 VM side by side on the same machine, and I am trying to write the output of the wordcount program into HDFS (it works fine when writing to a local file, /tmp/wordcount). The only line I added to the wordcount program is (where 'counts' is the JavaPairRDD):

    counts.saveAsTextFile("hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");

When I check that location (/tmp) in HDFS, here is what I find:

    /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000000_2/part-00000
    /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001

and both part-000[01] are zero-size files.

The wordcount client output error is:

    [Stage 1:> (0 + 2) / 2]
    15/05/05 14:40:45 WARN DFSClient: DataStreamer Exception
    org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3447)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:642)

I tried this with Spark 1.2.1 and got the same error. I have plenty of space on the DFS, and the NameNode, Secondary NameNode, and the one DataNode are all healthy.

Any hint as to what the problem may be? Thanks in advance.

Sudarshan
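For context, here is a minimal sketch of the wordcount program in question, assuming the standard Spark 1.x Java API; the class name, input argument handling, and intermediate variable names (everything except 'counts' and the saveAsTextFile() call) are illustrative, not the exact original code. It needs a running Spark installation (and a reachable HDFS) to execute.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the input text (path passed on the command line).
        JavaRDD<String> lines = sc.textFile(args[0]);

        // Split lines into words; in Spark 1.x, flatMap returns an Iterable.
        JavaRDD<String> words =
            lines.flatMap(line -> Arrays.asList(line.split(" ")));

        // Count occurrences of each word.
        JavaPairRDD<String, Integer> counts = words
            .mapToPair(word -> new Tuple2<>(word, 1))
            .reduceByKey((a, b) -> a + b);

        // The one line added: write to HDFS instead of the local filesystem.
        counts.saveAsTextFile("hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");

        sc.stop();
    }
}
```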
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-to-save-output-of-Spark-program-to-HDFS-tp22774.html Sent from the Apache Spark User List mailing list archive at Nabble.com.