Sergey Shelukhin created HADOOP-15403:
-----------------------------------------

             Summary: FileInputFormat recursive=false fails instead of ignoring the directories.
                 Key: HADOOP-15403
                 URL: https://issues.apache.org/jira/browse/HADOOP-15403
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Sergey Shelukhin


We are trying to create a split in Hive that will only read files in a directory and not subdirectories.
That fails with the below error.
Given how this error comes about (two pieces of code interact, one explicitly adding directories to results without failing, and one failing on any directories in results), this seems like a bug.

{noformat}
Caused by: java.io.IOException: Not a file: file:/,...warehouse/simple_to_mm_text/delta_0000001_0000001_0000
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
{noformat}

This code, when recursion is disabled, adds directories to results:
{noformat}
if (recursive && stat.isDirectory()) {
  result.dirsNeedingRecursiveCalls.add(stat);
} else {
  result.locatedFileStatuses.add(stat);
}
{noformat}

However, the getSplits code after that computes the size like this:
{noformat}
long totalSize = 0;                           // compute total size
for (FileStatus file: files) {                // check we have valid files
  if (file.isDirectory()) {
    throw new IOException("Not a file: "+ file.getPath());
  }
  totalSize += 
{noformat}
which would always fail combined with the above code.
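The interaction described above can be reproduced without Hadoop on the classpath. The following is a minimal sketch, not the actual Hadoop code: `Stat` is a hypothetical stand-in for `FileStatus`, `list` and `totalSize` only mirror the two quoted snippets, and `totalSizeIgnoringDirs` is an assumed illustration of the "ignore directories" behavior the summary asks for, not a committed fix.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class NonRecursiveListingSketch {

    /** Hypothetical stand-in for org.apache.hadoop.fs.FileStatus. */
    static final class Stat {
        final String path;
        final boolean directory;
        final long length;

        Stat(String path, boolean directory, long length) {
            this.path = path;
            this.directory = directory;
            this.length = length;
        }
    }

    /**
     * Mirrors the quoted listing branch: with recursive == false the
     * condition is never true, so directories land in the results.
     */
    static List<Stat> list(List<Stat> children, boolean recursive) {
        List<Stat> located = new ArrayList<>();
        for (Stat stat : children) {
            if (recursive && stat.directory) {
                // The real code queues the directory for a recursive call;
                // this sketch simply leaves it out of the results.
                continue;
            }
            located.add(stat);
        }
        return located;
    }

    /** Mirrors the quoted size loop: any directory in the input is fatal. */
    static long totalSize(List<Stat> files) throws IOException {
        long total = 0;
        for (Stat file : files) {
            if (file.directory) {
                throw new IOException("Not a file: " + file.path);
            }
            total += file.length;
        }
        return total;
    }

    /** Assumed alternative for illustration: skip directories instead of failing. */
    static long totalSizeIgnoringDirs(List<Stat> files) {
        long total = 0;
        for (Stat file : files) {
            if (!file.directory) {
                total += file.length;
            }
        }
        return total;
    }
}
```

Passing a list that holds one file and one subdirectory through `list(children, false)` and then `totalSize` raises the same "Not a file" IOException as in the stack trace, while `totalSizeIgnoringDirs` returns the size of the file alone.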