Hi,
I'm trying to import several tables from Oracle into Hive with Sqoop, but
I'm getting some errors that I don't understand.
Here is my command:
sqoop import --connect jdbc:oracle:thin:@my.db.server:1521/xx --username
user --password password --create-hive-table --hive-import --table
schema.table_xx
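(Side note: I know the log below warns about passing the password on the command line; as I understand it, the same import can be run with -P so Sqoop prompts for the password instead. A sketch, with the same placeholder connection values as above:)

```shell
# Same import, but prompting for the password (-P) rather than
# passing it inline with --password
sqoop import \
  --connect jdbc:oracle:thin:@my.db.server:1521/xx \
  --username user -P \
  --create-hive-table --hive-import \
  --table schema.table_xx
```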
The first error is this one:
Please set $HBASE_HOME to the root of your HBase installation.
13/06/17 15:36:40 WARN tool.BaseSqoopTool: Setting your password on the
command-line is insecure. Consider using -P instead.
13/06/17 15:36:40 INFO tool.BaseSqoopTool: Using Hive-specific delimiters
for output. You can override
13/06/17 15:36:40 INFO tool.BaseSqoopTool: delimiters with
--fields-terminated-by, etc.
13/06/17 15:36:40 INFO manager.SqlManager: Using default fetchSize of 1000
13/06/17 15:36:40 INFO tool.CodeGenTool: Beginning code generation
13/06/17 15:36:41 INFO manager.OracleManager: Time zone has been set to GMT
13/06/17 15:36:41 INFO manager.SqlManager: Executing SQL statement: SELECT
t.* FROM KPI.ENTITE t WHERE 1=0
13/06/17 15:36:41 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
/usr/local/hadoop
Note:
/tmp/sqoop-hduser/compile/85a6dcface4ca6ca28091ed383edce2e/KPI_ENTITE.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/17 15:36:42 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-hduser/compile/85a6dcface4ca6ca28091ed383edce2e/KPI.ENTITE.jar
13/06/17 15:36:42 INFO manager.OracleManager: Time zone has been set to GMT
13/06/17 15:36:42 WARN manager.OracleManager: The table KPI.ENTITE contains
a multi-column primary key. Sqoop will default to the column CO_SOCIETE
only for this job.
13/06/17 15:36:42 INFO manager.OracleManager: Time zone has been set to GMT
13/06/17 15:36:42 WARN manager.OracleManager: The table KPI.ENTITE contains
a multi-column primary key. Sqoop will default to the column CO_SOCIETE
only for this job.
13/06/17 15:36:42 INFO mapreduce.ImportJobBase: Beginning import of
KPI.ENTITE
13/06/17 15:36:42 INFO manager.OracleManager: Time zone has been set to GMT
13/06/17 15:36:44 INFO db.DataDrivenDBInputFormat: BoundingValsQuery:
SELECT MIN(CO_SOCIETE), MAX(CO_SOCIETE) FROM KPI.ENTITE
13/06/17 15:36:44 INFO mapred.JobClient: Running job: job_201306171456_0005
13/06/17 15:36:45 INFO mapred.JobClient: map 0% reduce 0%
13/06/17 15:36:56 INFO mapred.JobClient: map 25% reduce 0%
13/06/17 15:37:40 INFO mapred.JobClient: map 50% reduce 0%
13/06/17 15:38:00 INFO mapred.JobClient: map 75% reduce 0%
13/06/17 15:38:08 INFO mapred.JobClient: map 100% reduce 0%
13/06/17 15:38:09 INFO mapred.JobClient: Job complete: job_201306171456_0005
13/06/17 15:38:09 INFO mapred.JobClient: Counters: 18
13/06/17 15:38:09 INFO mapred.JobClient: Job Counters
13/06/17 15:38:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=151932
13/06/17 15:38:09 INFO mapred.JobClient: Total time spent by all
reduces waiting after reserving slots (ms)=0
13/06/17 15:38:09 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
13/06/17 15:38:09 INFO mapred.JobClient: Launched map tasks=4
13/06/17 15:38:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/06/17 15:38:09 INFO mapred.JobClient: File Output Format Counters
13/06/17 15:38:09 INFO mapred.JobClient: Bytes Written=26648
13/06/17 15:38:09 INFO mapred.JobClient: FileSystemCounters
13/06/17 15:38:09 INFO mapred.JobClient: HDFS_BYTES_READ=462
13/06/17 15:38:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=244596
13/06/17 15:38:09 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=26648
13/06/17 15:38:09 INFO mapred.JobClient: File Input Format Counters
13/06/17 15:38:09 INFO mapred.JobClient: Bytes Read=0
13/06/17 15:38:09 INFO mapred.JobClient: Map-Reduce Framework
13/06/17 15:38:09 INFO mapred.JobClient: Map input records=339
13/06/17 15:38:09 INFO mapred.JobClient: Physical memory (bytes)
snapshot=171716608
13/06/17 15:38:09 INFO mapred.JobClient: Spilled Records=0
13/06/17 15:38:09 INFO mapred.JobClient: CPU time spent (ms)=3920
13/06/17 15:38:09 INFO mapred.JobClient: Total committed heap usage
(bytes)=65011712
13/06/17 15:38:09 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=1492393984
13/06/17 15:38:09 INFO mapred.JobClient: Map output records=339
13/06/17 15:38:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=462
13/06/17 15:38:09 INFO mapreduce.ImportJobBase: Transferred 26,0234 KB in
86,6921 seconds (307,3869 bytes/sec)
13/06/17 15:38:09 INFO mapreduce.ImportJobBase: Retrieved 339 records.
13/06/17 15:38:09 INFO manager.OracleManager: Time zone has been set to GMT
13/06/17 15:38:09 INFO manager.SqlManager: Executing SQL statement: SELECT
t.* FROM KPI.ENTITE t WHERE 1=0
13/06/17 15:38:09 WARN hive.TableDefWriter: Column CO_SOCIETE had to be
cast to a less precise type in Hive
13/06/17 15:38:09 INFO hive.HiveImport: Removing temporary files from
import process: hdfs://localhost:54310/user/hduser/KPI.ENTITE/_logs
13/06/17 15:38:09 INFO hive.HiveImport: Loading uploaded data into Hive
13/06/17 15:38:11 INFO hive.HiveImport: WARNING:
org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
files.
13/06/17 15:38:12 INFO hive.HiveImport: Logging initialized using
configuration in
jar:file:/usr/local/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
13/06/17 15:38:12 INFO hive.HiveImport: Hive history
file=/tmp/hduser/hive_job_log_hduser_201306171538_49452696.txt
13/06/17 15:38:14 INFO hive.HiveImport: FAILED: Error in metadata:
java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
13/06/17 15:38:14 INFO hive.HiveImport: FAILED: Execution Error, return
code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
13/06/17 15:38:14 ERROR tool.ImportTool: Encountered IOException running
import job: java.io.IOException: Hive exited with status 1
at
org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:364)
at
org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:314)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:226)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
I don't understand this: the M/R job completes successfully, but right
after it I get an I/O error.
When I run SHOW TABLES in Hive, there are no tables.
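(For what it's worth, I assume the data from the first run could be checked directly in HDFS with something like the following, using the /user/hduser/KPI.ENTITE path that appears in the "Removing temporary files" log line above:)

```shell
# List the HDFS directory the first Sqoop run wrote to
# (path taken from the hdfs://localhost:54310/user/hduser/KPI.ENTITE log line)
hadoop fs -ls /user/hduser/KPI.ENTITE
```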
But when I retry the Sqoop command, I get this error:
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
13/06/17 15:41:51 WARN tool.BaseSqoopTool: Setting your password on the
command-line is insecure. Consider using -P instead.
13/06/17 15:41:51 INFO tool.BaseSqoopTool: Using Hive-specific delimiters
for output. You can override
13/06/17 15:41:51 INFO tool.BaseSqoopTool: delimiters with
--fields-terminated-by, etc.
13/06/17 15:41:51 INFO manager.SqlManager: Using default fetchSize of 1000
13/06/17 15:41:51 INFO tool.CodeGenTool: Beginning code generation
13/06/17 15:42:15 INFO manager.OracleManager: Time zone has been set to GMT
13/06/17 15:42:15 INFO manager.SqlManager: Executing SQL statement: SELECT
t.* FROM KPI.ENTITE t WHERE 1=0
13/06/17 15:42:15 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
/usr/local/hadoop
Note:
/tmp/sqoop-hduser/compile/10cd05e9146a878654b1155df5be7765/KPI_ENTITE.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/17 15:42:16 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-hduser/compile/10cd05e9146a878654b1155df5be7765/KPI.ENTITE.jar
13/06/17 15:42:16 INFO manager.OracleManager: Time zone has been set to GMT
13/06/17 15:42:16 WARN manager.OracleManager: The table KPI.ENTITE contains
a multi-column primary key. Sqoop will default to the column CO_SOCIETE
only for this job.
13/06/17 15:42:16 INFO manager.OracleManager: Time zone has been set to GMT
13/06/17 15:42:16 WARN manager.OracleManager: The table KPI.ENTITE contains
a multi-column primary key. Sqoop will default to the column CO_SOCIETE
only for this job.
13/06/17 15:42:16 INFO mapreduce.ImportJobBase: Beginning import of
KPI.ENTITE
13/06/17 15:42:16 INFO manager.OracleManager: Time zone has been set to GMT
13/06/17 15:42:17 INFO mapred.JobClient: Cleaning up the staging area
hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201306171456_0006
13/06/17 15:42:17 ERROR security.UserGroupInformation:
PriviledgedActionException as:hduser
cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
KPI.ENTITE already exists
13/06/17 15:42:17 ERROR tool.ImportTool: Encountered IOException running
import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output
directory KPI.ENTITE already exists
at
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:949)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at
org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:173)
at
org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:151)
at
org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:221)
at
org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:545)
at
org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:380)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:403)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:22
The output says that the output directory already exists.
But the Hive command SHOW TABLES still gives me zero tables!
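(I suppose I could remove the leftover directory before retrying, something like the command below, using the KPI.ENTITE directory name from the FileAlreadyExistsException above, but I'd really like to understand why the Hive load fails in the first place.)

```shell
# Remove the stale output directory left behind by the first run,
# so a retry doesn't hit FileAlreadyExistsException
# (-rmr is the recursive-remove form on Hadoop 1.x)
hadoop fs -rmr KPI.ENTITE
```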
Thanks for your help ;-)
--
Jérôme