I already tried the following... On the internet I found the hint to set this configuration to solve the problem:
hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
But doing so, I just get a RuntimeException:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.io.HiveInputFormat
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:333)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
13/04/18 15:37:14 ERROR exec.ExecDriver: Exception: org.apache.hadoop.hive.ql.io.HiveInputFormat
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
13/04/18 15:37:14 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
From: shrikanth shankar [mailto:[email protected]]
Sent: Thursday, April 18, 2013 17:32
To: [email protected]
Subject: Re: Hive query problem on S3 table
Tim,
Could you try doing
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
before running the query?
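For reference, the whole sequence in one Hive session would look roughly like this (a sketch; the table name and query are placeholders, and the comment reflects that the trace above goes through CombineHiveInputFormat, which this setting overrides for the session):

```sql
-- Override the combining input format for this session only;
-- the FileNotFoundException above is thrown from
-- CombineFileInputFormat.getSplits, which this bypasses.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Then run the query in the same session, e.g.:
SELECT COUNT(*) FROM testtable;
```

Note the value must be the bare class name with no trailing whitespace; a stray character there typically surfaces as a class-loading RuntimeException.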
Shrikanth
On Apr 18, 2013, at 8:09 AM, Tim Bittersohl wrote:
Thanks for your answer. I tested the program with an s3n setup and unfortunately got the same error behavior...
From: Dean Wampler [mailto:[email protected]]
Sent: Thursday, April 18, 2013 16:25
To: [email protected]
Subject: Re: Hive query problem on S3 table
I'm not sure what's happening here, but one suggestion: use s3n://...
instead of s3://... The "new" version is supposed to provide better
performance.
dean
On Thu, Apr 18, 2013 at 8:43 AM, Tim Bittersohl <[email protected]> wrote:
Hi,
I just found out that I don't have to change the default file system of
Hadoop; only the location in the CREATE TABLE command has to be changed:
CREATE EXTERNAL TABLE testtable(nyseVal STRING, cliVal STRING, dateVal STRING, number1Val STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TextFile LOCATION "s3://hadoop-bucket/data/"
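For completeness, the AWS credentials for the s3/s3n filesystems are configured in Hadoop's core-site.xml (a sketch, assuming the standard Hadoop 1.x / CDH4 property names; the key values are placeholders):

```xml
<!-- core-site.xml: credentials for the s3n filesystem.
     Use fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey
     instead when the table location uses the s3:// scheme. -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```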
But when I try to access the table with a command that creates a Hadoop job,
I get the following error:
13/04/18 15:29:36 ERROR security.UserGroupInformation: PriviledgedActionException as:tim (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:807)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:411)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:377)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1091)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1083)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:993)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:946)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:946)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:920)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
13/04/18 15:29:36 ERROR exec.Task: Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /data/NYSE_daily.txt)'
java.io.FileNotFoundException: File does not exist: /data/NYSE_daily.txt
    [same stack trace as above]
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
13/04/18 15:29:36 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
On the internet I found the hint to set this configuration to solve the
problem:
hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
But doing so, I just get a RuntimeException:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.io.HiveInputFormat
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:333)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:644)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:628)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
13/04/18 15:37:14 ERROR exec.ExecDriver: Exception: org.apache.hadoop.hive.ql.io.HiveInputFormat
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
13/04/18 15:37:14 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
I'm using the Cloudera 0.10.0-cdh4.2.0 version of the Hive libraries.
Greetings
Tim Bittersohl
Software Engineer
Innoplexia GmbH
Mannheimer Str. 175
69123 Heidelberg
Tel.: +49 (0) 6221 7198033
Mobiltel.: +49 (0) 160 99186759
Fax: +49 (0) 6221 7198034
Web: www.innoplexia.com
Sitz: 69123 Heidelberg, Mannheimer Str. 175 - Steuernummer 32494/62606 -
USt. IdNr.: DE 272 871 728 - Geschäftsführer: Prof. Dr. Herbert Schuster
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com