Sergey Shelukhin created HADOOP-15403:
-----------------------------------------

             Summary: FileInputFormat recursive=false fails instead of ignoring 
the directories.
                 Key: HADOOP-15403
                 URL: https://issues.apache.org/jira/browse/HADOOP-15403
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Sergey Shelukhin


We are trying to create a split in Hive that reads only the files directly 
inside a directory, not those in its subdirectories.
This fails with the error below.
The error arises from two pieces of code interacting: one explicitly adds 
directories to the results without failing, and the other fails on any 
directory in the results. That looks like a bug.

{noformat}
Caused by: java.io.IOException: Not a file: file:/,...warehouse/simple_to_mm_text/delta_0000001_0000001_0000
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
{noformat}

When recursion is disabled, the listing code adds directories to the results:
{noformat}
if (recursive && stat.isDirectory()) {
  result.dirsNeedingRecursiveCalls.add(stat);
} else {
  result.locatedFileStatuses.add(stat);
}
{noformat}
However, the getSplits code that runs afterwards computes the total size like this:
{noformat}
long totalSize = 0;                           // compute total size
for (FileStatus file: files) {                // check we have valid files
  if (file.isDirectory()) {
    throw new IOException("Not a file: "+ file.getPath());
  }
  totalSize +=
{noformat}
Combined with the code above, this fails whenever recursion is disabled and 
the input directory contains a subdirectory.
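
The interaction can be reproduced without Hadoop on the classpath. The sketch 
below is only an illustration: `SplitSketch`, `Stat`, `sampleChildren`, and the 
`totalSizeSkippingDirs` variant are hypothetical names, `Stat` is a simplified 
stand-in for `FileStatus`, and summing a `length` field is an assumption 
because the quoted snippet is cut off at `totalSize +=`. Skipping directories 
is one possible behavior matching the issue summary, not necessarily the fix 
Hadoop should adopt.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two code paths; no Hadoop classes are used.
class SplitSketch {

    // Simplified stand-in for FileStatus (assumption, not the Hadoop class).
    static final class Stat {
        final String path;
        final boolean directory;
        final long length;

        Stat(String path, boolean directory, long length) {
            this.path = path;
            this.directory = directory;
            this.length = length;
        }
    }

    // Mirrors the listing snippet: with recursive=false, directories fall
    // into the else branch and end up in the list of "files".
    static List<Stat> listStatus(List<Stat> children, boolean recursive,
                                 List<Stat> dirsNeedingRecursiveCalls) {
        List<Stat> files = new ArrayList<>();
        for (Stat stat : children) {
            if (recursive && stat.directory) {
                dirsNeedingRecursiveCalls.add(stat);
            } else {
                files.add(stat);
            }
        }
        return files;
    }

    // Mirrors the size check: any directory in the list aborts the call.
    static long totalSizeStrict(List<Stat> files) throws IOException {
        long totalSize = 0;
        for (Stat file : files) {
            if (file.directory) {
                throw new IOException("Not a file: " + file.path);
            }
            totalSize += file.length; // assumed; the quoted snippet is truncated here
        }
        return totalSize;
    }

    // Hypothetical alternative: ignore directories instead of failing.
    static long totalSizeSkippingDirs(List<Stat> files) {
        long totalSize = 0;
        for (Stat file : files) {
            if (file.directory) {
                continue;
            }
            totalSize += file.length;
        }
        return totalSize;
    }

    // Two plain files and one subdirectory, all made-up names and sizes.
    static List<Stat> sampleChildren() {
        return Arrays.asList(
            new Stat("base/000000_0", false, 10),
            new Stat("base/000001_0", false, 20),
            new Stat("base/subdir", true, 0));
    }

    public static void main(String[] args) {
        List<Stat> dirs = new ArrayList<>();
        List<Stat> files = listStatus(sampleChildren(), false, dirs);
        try {
            totalSizeStrict(files);
        } catch (IOException e) {
            System.out.println("strict check failed: " + e.getMessage());
        }
        System.out.println("skipping dirs: " + totalSizeSkippingDirs(files));
    }
}
```

With `recursive` set to `false`, the subdirectory is returned alongside the 
two files, so the strict check throws while the skipping variant sums only the 
file lengths.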



