I have searched all replies to this question and not found an answer.

I am running standalone Spark 1.3.1 and Hortonworks' HDP 2.2 VM side by side on the same machine, and I am trying to write the output of the wordcount program into HDFS (it works fine when writing to a local file, /tmp/wordcount). The only line I added to the wordcount program is (where 'counts' is the JavaPairRDD):

    counts.saveAsTextFile("hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");

When I check that location (/tmp) in HDFS, here is what I find:

    /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000000_2/part-00000
    /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001

and both part-000[01] are zero-size files.

The wordcount client output error is:

    [Stage 1:> (0 + 2) / 2]
    15/05/05 14:40:45 WARN DFSClient: DataStreamer Exception
    org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_000001_3/part-00001 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1550)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3447)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:642)

I tried this with Spark 1.2.1 and got the same error. I have plenty of space on the DFS, and the NameNode, Secondary NameNode, and the one DataNode are all healthy.

Any hint as to what the problem may be? Thanks in advance.

Sudarshan
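For context, here is a minimal sketch of the wordcount program in question, assuming the standard Spark 1.x Java API; the class name, input argument handling, and intermediate variable names (everything except 'counts' and the saveAsTextFile() call) are illustrative, not the exact original code. It needs a running Spark installation (and a reachable HDFS) to execute.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the input text (path passed on the command line).
        JavaRDD<String> lines = sc.textFile(args[0]);

        // Split lines into words; in Spark 1.x, flatMap returns an Iterable.
        JavaRDD<String> words =
            lines.flatMap(line -> Arrays.asList(line.split(" ")));

        // Count occurrences of each word.
        JavaPairRDD<String, Integer> counts = words
            .mapToPair(word -> new Tuple2<>(word, 1))
            .reduceByKey((a, b) -> a + b);

        // The one line added: write to HDFS instead of the local filesystem.
        counts.saveAsTextFile("hdfs://sandbox.hortonworks.com:8020/tmp/wordcount");

        sc.stop();
    }
}
```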
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-to-save-output-of-Spark-program-to-HDFS-tp22774.html Sent from the Apache Spark User List mailing list archive at Nabble.com.