DistCp doesn't handle non-existent paths correctly
--------------------------------------------------
Key: HADOOP-8229
URL: https://issues.apache.org/jira/browse/HADOOP-8229
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 1.0.0, 0.20.2
Reporter: Jakob Homan

Assume /user/jhoman/blork doesn't exist:

{noformat}
[tardis]$ hadoop distcp 'hftp://nn:50070/user/jhoman/blork' /tmp/plork
12/03/29 22:04:33 INFO tools.DistCp: srcPaths=[hftp://nn:50070/user/jhoman/blork]
12/03/29 22:04:33 INFO tools.DistCp: destPath=/tmp/plork
[Fatal Error] :1:173: XML document structures must start and end within the same entity.
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: invalid xml directory content
        at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:427)
        at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:432)
        at org.apache.hadoop.hdfs.HftpFileSystem.getFileStatus(HftpFileSystem.java:461)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:768)
        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:636)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
        at org.apache.hadoop.hdfs.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:421)
        ... 9 more
{noformat}

This is because the ListPathsServlet hits an NPE when it calls nnproxy.getFileInfo(path); on the non-existent path and just bails, leaving the resulting XML unformed.

--
This message is automatically generated by JIRA.
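
For illustration only, here is a minimal sketch of the failure mode described in this report and one possible way to guard against it. It is not the actual ListPathsServlet code and not a proposed patch. The class ListingSketch, the in-memory FILES map, the helper getFileInfo, and the element names listing, file, and error are all hypothetical stand-ins. The only assumption carried over from the report is that the lookup returns null for a path that does not exist. The idea is to check for null before using the result, and to close the root element in a finally block so the client always receives a well-formed document.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class ListingSketch {
    // Hypothetical stand-in for the namenode: maps existing paths to file lengths.
    static final Map<String, Long> FILES = new HashMap<>();
    static {
        FILES.put("/user/jhoman/data", 42L);
    }

    // Hypothetical lookup; assumed to return null when the path does not exist.
    static Long getFileInfo(String path) {
        return FILES.get(path);
    }

    // Minimal escaping so the path can be placed in an XML attribute.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Builds the listing document. The root element is closed in a finally
    // block, so the output stays well-formed even if the body fails.
    static String renderListing(String path) {
        StringBuilder xml = new StringBuilder();
        xml.append("<listing path=\"").append(escape(path)).append("\">");
        try {
            Long length = getFileInfo(path);
            if (length == null) {
                // Report the missing path instead of dereferencing null.
                xml.append("<error message=\"File does not exist\"/>");
            } else {
                xml.append("<file length=\"").append(length).append("\"/>");
            }
        } finally {
            xml.append("</listing>");
        }
        return xml.toString();
    }

    // Returns true if a standard JAXP parser accepts the document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String missing = renderListing("/user/jhoman/blork");
        System.out.println(missing);
        System.out.println("well-formed: " + isWellFormed(missing));
    }
}
```

With a response shaped like this, a client could parse the document and turn the error element into a "file not found" result, rather than failing with a SAXParseException on a truncated document.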