Has anyone seen anything like this? Google searches turned up nothing, so I
thought I'd ask here, and then file a JIRA if no one thinks I'm doing it wrong.

If I ALTER a particular table partitioned by three columns once, it works. The
second time it works too, but it reports moving a directory to the Trash that
doesn't exist (still, this doesn't kill it). The third time I ALTER the table,
it crashes, because the partition's directory structure has by then been
rewritten into an invalid order.
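
For reference, the table is partitioned by log_type first, then dt, then hour,
so the on-disk layout should be log_type=.../dt=.../hour=... Here's a minimal
reconstruction of the DDL, just to show the partition ordering (illustrative
only: the real data columns are elided and the exact types are from memory;
RCFile is implied by the BlockMergeTask in the stack trace below):

*DDL (sketch):*
CREATE TABLE bidtmp (
  raw_line STRING   -- placeholder; real data columns elided
)
PARTITIONED BY (log_type STRING, dt STRING, hour INT)
STORED AS RCFILE;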

Here's the nearly-full output of the 2nd and 3rd runs. The ALTER statement is
exactly the same both times (I just press UP ARROW):


*HQL, 2nd Run:*
hive (analytics)> alter table bidtmp partition (log_type='bidder',dt='2014-05-01',hour=11) concatenate ;


*Output:*
Starting Job = job_1412894367814_0017, Tracking URL = ....application_1412894367814_0017/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1412894367814_0017
Hadoop job information for null: number of mappers: 97; number of reducers: 0
2014-10-13 20:28:23,143 null map = 0%,  reduce = 0%
2014-10-13 20:28:36,042 null map = 1%,  reduce = 0%, Cumulative CPU 49.69 sec
...
2014-10-13 20:31:56,415 null map = 99%,  reduce = 0%, Cumulative CPU 812.65 sec
2014-10-13 20:31:57,458 null map = 100%,  reduce = 0%, Cumulative CPU 813.88 sec
MapReduce Total cumulative CPU time: 13 minutes 33 seconds 880 msec
Ended Job = job_1412894367814_0017
Loading data to table analytics.bidtmp partition (log_type=bidder, dt=2014-05-01, hour=11)
rmr: DEPRECATED: Please use 'rm -r' instead.
Moved: '.../apps/hive/warehouse/analytics.db/bidtmp/*dt=2014-05-01/hour=11/log_type=bidder*' to trash at: .../user/hdfs/.Trash/Current
*// (note the bold-faced path doesn't exist; the partition is specified as log_type first, then dt, then hour)*
Partition analytics.bidtmp*{log_type=bidder, dt=2014-05-01, hour=11}* stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]
*(here, the partition ordering is correct!)*
MapReduce Jobs Launched:
Job 0: Map: 97   Cumulative CPU: 813.88 sec   HDFS Read: 30298871932 HDFS Write: 28746848923 SUCCESS
Total MapReduce CPU Time Spent: 13 minutes 33 seconds 880 msec
OK
Time taken: 224.128 seconds
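
I haven't pasted an 'ls' here, but checking which of the two layouts actually
exists on disk after each run would pin down what the concatenate is doing
(paths elided the same way as above):

*Check (sketch):*
# layout the partition definition implies
hdfs dfs -ls .../apps/hive/warehouse/analytics.db/bidtmp/log_type=bidder/dt=2014-05-01/hour=11
# layout the Moved/FileNotFoundException messages refer to
hdfs dfs -ls .../apps/hive/warehouse/analytics.db/bidtmp/dt=2014-05-01/hour=11/log_type=bidder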


*HQL, 3rd Run:*
hive (analytics)> alter table bidtmp partition (log_type='bidder',dt='2014-05-01',hour=11) concatenate ;


*Output:*
java.io.FileNotFoundException: File does not exist: .../apps/hive/warehouse/analytics.db/bidtmp/dt=2014-05-01/hour=11/log_type=bidder
*(because it should be log_type=.../dt=.../hour=... - not this order)*
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:419)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
        at org.apache.hadoop.hive.ql.io.rcfile.merge.BlockMergeTask.execute(BlockMergeTask.java:214)
        at org.apache.hadoop.hive.ql.exec.DDLTask.mergeFiles(DDLTask.java:511)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:458)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: .../apps/hive/warehouse/analytics.db/bidtmp/dt=2014-05-01/hour=11/log_type=bidder)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
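
If I need to recover the partition, I'd probably start by checking where the
metastore thinks it lives, then point it back at the correct directory. This is
a sketch, untested; it assumes the partition's Location was rewritten to the
dt-first path and that the concatenated data is still at (or can be moved back
to) the log_type-first directory:

*Diagnosis/recovery (sketch, untested):*
-- check the Location: line in the output
describe formatted bidtmp partition (log_type='bidder', dt='2014-05-01', hour=11);
-- if it shows the dt-first path, repoint it
alter table bidtmp partition (log_type='bidder', dt='2014-05-01', hour=11)
  set location '.../apps/hive/warehouse/analytics.db/bidtmp/log_type=bidder/dt=2014-05-01/hour=11';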

––
*Tim Ellis:* 510-761-6610
