Hi,

The hcatalog options are more powerful, but in my case, since I want to use Sqoop together with Atlas, I have to use --hive-import, which is not compatible with the hcatalog options.

Regards
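For anyone hitting the same multi-level constraint while stuck on --hive-import, one possible two-step workaround is to have Sqoop write plain files to HDFS and issue the multi-key load yourself, since Hive's LOAD DATA accepts several partition columns even though Sqoop's generated load statement does not. This is a minimal, untested sketch: the staging path, the WHERE filter, and the pre-created table user1_3p are illustrative, not from the thread (and whether this preserves the Atlas lineage that --hive-import provides is a separate question):

    # Step 1: plain HDFS import, no Hive integration involved.
    sqoop import \
      --connect jdbc:postgresql://localhost:7432/test_db \
      --driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
      --table user1 \
      --where "birth_date::date = date '2016-08-16'" \
      --fields-terminated-by '\001' \
      --target-dir /tmp/sqoop_staging/user1/2016-08-16 \
      --delete-target-dir \
      --num-mappers 1

    # Step 2: load into a multi-level static partition by hand. user1_3p must
    # already exist, PARTITIONED BY (year STRING, month STRING, day STRING),
    # with a matching field delimiter ('\001' is Hive's default).
    hive -e "LOAD DATA INPATH '/tmp/sqoop_staging/user1/2016-08-16'
    INTO TABLE test_db.user1_3p
    PARTITION (year='2016', month='08', day='16')"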
From: Venkat Ranganathan [mailto:[email protected]]
Sent: Tuesday, August 16, 2016 18:27
To: [email protected]
Subject: Re: how to create multi level partition in hive using sqoop

--hive-import has several limitations (only the data types of Hive 0.7 are supported, only text format, etc.). Use the --hcatalog import options instead; they handle multiple partitions (including combinations of static and dynamic partitions).

Venkat

From: BONNET Benjamin <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, August 16, 2016 at 8:06 AM
To: "[email protected]" <[email protected]>
Subject: RE: how to create multi level partition in hive using sqoop

Hi,

The statement below works because you have a one-level partition (by year). Unfortunately it cannot work for a multi-level partition, even if you write to only one multi-level partition (year='2016', month='08', day='16'): Sqoop generates a load statement that can deal with only one partitioning field. Handling several partitioning fields would be a nice enhancement (I needed it, too).

Regards.

From: Mahebub Sayyed [mailto:[email protected]]
Sent: Tuesday, August 16, 2016 15:23
To: [email protected]
Subject: Re: how to create multi level partition in hive using sqoop

The command below works fine:

sqoop import --connect jdbc:postgresql://localhost:7432/test_db \
--driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
--table user1 \
--fields-terminated-by '\001' \
--lines-terminated-by '\012' \
--hive-import \
--hive-database test_db \
--hive-table user1 \
--hive-partition-key year \
--hive-partition-value '2016' \
--verbose \
--delete-target-dir

This is the log while importing data:

16/08/16 13:18:08 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE IF NOT EXISTS `tets_db`.`user1` ( `id` INT, `name` STRING, `birth_date` STRING) COMMENT 'Imported by sqoop on 2016/08/16 13:18:08' PARTITIONED BY (year STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
16/08/16 13:18:08 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://xxxxx:8020/user/xxx/user1' INTO TABLE `test_db`.`user1` PARTITION (year='2016')
16/08/16 13:18:08 INFO hive.HiveImport: Loading uploaded data into Hive

And the created table looks like:

hive> describe user1;
OK
id                      int
name                    string
birth_date              string
year                    string

# Partition Information
# col_name              data_type               comment

year                    string
Time taken: 0.103 seconds, Fetched: 9 row(s)

hive> select * from user1;
OK
1       mahebub 2016-08-14 14:03:44.33542       2016
2       sayed   2016-08-14 14:04:04.248561      2016
Time taken: 0.132 seconds, Fetched: 2 row(s)
hive>

On Tue, Aug 16, 2016 at 4:12 PM, Boglarka Egyed <[email protected]> wrote:

Hi Mahebub,

Based on the exception log, the problem could also be caused by the Hive metastore not being set up properly. Is it configured in hive-site.xml? Also, is hive-site.xml on the classpath?

Documentation for Hive metastore configuration: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
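A quick way to check both of Bogi's points from the shell. This is a hedged sketch: /etc/hive/conf is a common default layout, and the environment variables are one typical way to make the Hive client configuration visible to Sqoop's HCatalog support; paths vary by distribution:

    # Confirm a metastore URI is actually configured.
    grep -A 1 'hive.metastore.uris' /etc/hive/conf/hive-site.xml

    # Put the Hive client configuration on Sqoop's classpath (illustrative
    # locations; adjust for your installation).
    export HIVE_HOME=/usr/lib/hive
    export HCAT_HOME=/usr/lib/hive-hcatalog
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/etc/hive/conf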
On Tue, Aug 16, 2016 at 2:36 PM, Markus Kemper <[email protected]> wrote:

Hello Mahebub,

The exception you are observing might suggest one of the following:
* The table does not exist in the database
* The table name is case sensitive and is somehow not being found
* The database user does not have permission to see the table

What is the output of the following commands?

sqoop list-tables --connect jdbc:postgresql://localhost:7432/test_db --driver org.postgresql.Driver --username pgadmin --password pgadmin@1234

sqoop eval --connect jdbc:postgresql://localhost:7432/test_db --driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 --query "select count(*) from user1"

Markus Kemper
Customer Operations Engineer
www.cloudera.com

On Tue, Aug 16, 2016 at 7:07 AM, Mahebub Sayyed <[email protected]> wrote:

Hello Boglarka and Markus,

Thanks for the reply. This is my sqoop command:

sqoop import --connect jdbc:postgresql://localhost:7432/test_db \
--driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
--table user1 \
--fields-terminated-by '\001' \
--lines-terminated-by '\012' \
--hcatalog-database test \
--hcatalog-table user1 \
--hcatalog-partition-keys year,month,day \
--hcatalog-partition-values '2016,08,15' \
--verbose

But I am getting this error:

ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: NoSuchObjectException(message:test.user1 table not found)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:343)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:783)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: NoSuchObjectException(message:test.user1 table not found)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:34980)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:34948)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:34879)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1214)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1200)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1201)
at org.apache.hive.hcatalog.common.HCatUtil.getTable(HCatUtil.java:180)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
... 14 more
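The NoSuchObjectException says the metastore has no table user1 in database test; note also that the earlier working import used test_db, while this command passes --hcatalog-database test. Two hedged ways to make the target exist, where the column types are assumed from the earlier describe output and --create-hcatalog-table is the documented Sqoop flag for letting the import create the table:

    # Option A: pre-create the partitioned table that HCatalog expects.
    hive -e "CREATE DATABASE IF NOT EXISTS test;
    CREATE TABLE IF NOT EXISTS test.user1 (
      id INT,
      name STRING,
      birth_date STRING)
    PARTITIONED BY (year STRING, month STRING, day STRING)
    STORED AS TEXTFILE;"

    # Option B: add --create-hcatalog-table to the sqoop import command above,
    # so Sqoop creates the table itself on the first run.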
On Mon, Aug 15, 2016 at 4:35 PM, Markus Kemper <[email protected]> wrote:

Hello Mahebub,

Bogi is correct (and a great answer, by the way). To the best of my knowledge, the following rules apply to Sqoop and Hive partitioning:

1. With static partitions you can use either the --hive-import or the --hcatalog options
2. With dynamic partitions you can only use the --hcatalog options

Example (static):

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
--table t1 --columns c1,c2 --where "p1 = 1" --num-mappers 1 \
--hive-import --hive-database default --hive-table t1_partition \
--hive-partition-key <col> --hive-partition-value <value>

Example (dynamic):

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
--table t1 --hcatalog-database default --hcatalog-table t1_partitioned \
--num-mappers 1 --verbose --where "c1 > 1" --hive-partition-value <col>

Markus Kemper
Customer Operations Engineer
www.cloudera.com

On Mon, Aug 15, 2016 at 9:19 AM, Boglarka Egyed <[email protected]> wrote:

Hi Mahebub,

Unfortunately, --hive-partition-key and --hive-partition-value require each Sqoop statement to import into a single Hive partition; there is currently no support for Hive auto-partitioning. If a data set is to be imported into multiple partitions of a table, a separate Sqoop statement is needed for each partition. However, with --hcatalog-partition-keys and --hcatalog-partition-values you can specify multiple static partition key/value pairs. Please find the details in the User Guide: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_sqoop_hcatalog_integration

Best Regards,
Bogi

On Mon, Aug 15, 2016 at 9:51 AM, Mahebub Sayyed <[email protected]> wrote:

I need to create/import a Hive table with three partition levels (year/month/day) using Sqoop. I have looked at --hive-partition-key and --hive-partition-value; using these parameters I have created a year partition like this:

--hive-partition-key year --hive-partition-value '2016'

My question is how to pass multiple values for partition-key and partition-value to create partitions like year/month/day.

--
Regards,
Mahebub Sayyed
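To illustrate Bogi's point above that --hive-import needs one Sqoop statement per partition, here is a hedged shell sketch of looping over partitions. The single partition key dt, the table name user1_by_day, the date list, and the WHERE filter are all illustrative, not from the thread:

    for day in 2016-08-14 2016-08-15 2016-08-16; do
      sqoop import \
        --connect jdbc:postgresql://localhost:7432/test_db \
        --driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
        --table user1 \
        --where "birth_date::date = date '${day}'" \
        --hive-import \
        --hive-database test_db \
        --hive-table user1_by_day \
        --hive-partition-key dt \
        --hive-partition-value "${day}" \
        --num-mappers 1
      # each iteration loads exactly one static partition (dt=<day>)
    done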
