Hi,

The hcatalog options are more powerful, but in my case, since I want to use Sqoop together with Atlas, I have to use --hive-import, which is not compatible with the hcatalog options.

Regards
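For anyone hitting the same multi-level constraint while stuck on --hive-import, one possible two-step workaround is to have Sqoop write plain files to HDFS and issue the multi-key load yourself, since Hive's LOAD DATA accepts several partition columns even though Sqoop's generated load statement does not. This is a minimal, untested sketch: the staging path, the WHERE filter, and the pre-created table user1_3p are illustrative, not from the thread (and whether this preserves the Atlas lineage that --hive-import provides is a separate question):

    # Step 1: plain HDFS import, no Hive integration involved.
    sqoop import \
      --connect jdbc:postgresql://localhost:7432/test_db \
      --driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
      --table user1 \
      --where "birth_date::date = date '2016-08-16'" \
      --fields-terminated-by '\001' \
      --target-dir /tmp/sqoop_staging/user1/2016-08-16 \
      --delete-target-dir \
      --num-mappers 1

    # Step 2: load into a multi-level static partition by hand. user1_3p must
    # already exist, PARTITIONED BY (year STRING, month STRING, day STRING),
    # with a matching field delimiter ('\001' is Hive's default).
    hive -e "LOAD DATA INPATH '/tmp/sqoop_staging/user1/2016-08-16'
    INTO TABLE test_db.user1_3p
    PARTITION (year='2016', month='08', day='16')"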
From: Venkat Ranganathan [mailto:[email protected]]
Sent: Tuesday, August 16, 2016 18:27
To: [email protected]
Subject: Re: how to create multi level partition in hive using sqoop

--hive-import has several limitations (only the data types of Hive 0.7 are supported, only text format, etc.). Use the --hcatalog import options instead; they handle multiple partitions (including combinations of static and dynamic partitions).

Venkat

From: BONNET Benjamin <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, August 16, 2016 at 8:06 AM
To: "[email protected]" <[email protected]>
Subject: RE: how to create multi level partition in hive using sqoop

Hi,

The statement below works because you have a one-level partition (by year). Unfortunately it cannot work for a multi-level partition, even if you write to only one multi-level partition (year='2016', month='08', day='16'): Sqoop generates a load statement that can deal with only one partitioning field. Handling several partitioning fields would be a nice enhancement (I needed it, too).

Regards.

From: Mahebub Sayyed [mailto:[email protected]]
Sent: Tuesday, August 16, 2016 15:23
To: [email protected]
Subject: Re: how to create multi level partition in hive using sqoop

The command below works fine:

sqoop import --connect jdbc:postgresql://localhost:7432/test_db \
--driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
--table user1 \
--fields-terminated-by '\001' \
--lines-terminated-by '\012' \
--hive-import \
--hive-database test_db \
--hive-table user1 \
--hive-partition-key year \
--hive-partition-value '2016' \
--verbose \
--delete-target-dir

This is the log while importing data:

16/08/16 13:18:08 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE IF NOT EXISTS `tets_db`.`user1` ( `id` INT, `name` STRING, `birth_date` STRING) COMMENT 'Imported by sqoop on 2016/08/16 13:18:08' PARTITIONED BY (year STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
16/08/16 13:18:08 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://xxxxx:8020/user/xxx/user1' INTO TABLE `test_db`.`user1` PARTITION (year='2016')
16/08/16 13:18:08 INFO hive.HiveImport: Loading uploaded data into Hive

And the created table looks like:

hive> describe user1;
OK
id                      int
name                    string
birth_date              string
year                    string

# Partition Information
# col_name              data_type               comment

year                    string
Time taken: 0.103 seconds, Fetched: 9 row(s)

hive> select * from user1;
OK
1       mahebub 2016-08-14 14:03:44.33542       2016
2       sayed   2016-08-14 14:04:04.248561      2016
Time taken: 0.132 seconds, Fetched: 2 row(s)
hive>

On Tue, Aug 16, 2016 at 4:12 PM, Boglarka Egyed <[email protected]> wrote:

Hi Mahebub,

Based on the exception log, the problem could also be caused by the Hive metastore not being set up properly. Is it configured in hive-site.xml? Also, is hive-site.xml on the classpath?

Documentation for Hive metastore configuration: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
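A quick way to check both of Bogi's points from the shell. This is a hedged sketch: /etc/hive/conf is a common default layout, and the environment variables are one typical way to make the Hive client configuration visible to Sqoop's HCatalog support; paths vary by distribution:

    # Confirm a metastore URI is actually configured.
    grep -A 1 'hive.metastore.uris' /etc/hive/conf/hive-site.xml

    # Put the Hive client configuration on Sqoop's classpath (illustrative
    # locations; adjust for your installation).
    export HIVE_HOME=/usr/lib/hive
    export HCAT_HOME=/usr/lib/hive-hcatalog
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/etc/hive/conf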
On Tue, Aug 16, 2016 at 2:36 PM, Markus Kemper <[email protected]> wrote:

Hello Mahebub,

The exception you are observing might suggest one of the following:
* The table does not exist in the database
* The table name is case sensitive and is somehow not being found
* The database user does not have permission to see the table

What is the output of the following commands?

sqoop list-tables --connect jdbc:postgresql://localhost:7432/test_db --driver org.postgresql.Driver --username pgadmin --password pgadmin@1234

sqoop eval --connect jdbc:postgresql://localhost:7432/test_db --driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 --query "select count(*) from user1"

Markus Kemper
Customer Operations Engineer
www.cloudera.com

On Tue, Aug 16, 2016 at 7:07 AM, Mahebub Sayyed <[email protected]> wrote:

Hello Boglarka and Markus,

Thanks for the reply. This is my sqoop command:

sqoop import --connect jdbc:postgresql://localhost:7432/test_db \
--driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
--table user1 \
--fields-terminated-by '\001' \
--lines-terminated-by '\012' \
--hcatalog-database test \
--hcatalog-table user1 \
--hcatalog-partition-keys year,month,day \
--hcatalog-partition-values '2016,08,15' \
--verbose

But I am getting this error:

ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: NoSuchObjectException(message:test.user1 table not found)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:343)
at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:783)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: NoSuchObjectException(message:test.user1 table not found)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:34980)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:34948)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:34879)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1214)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1200)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1201)
at org.apache.hive.hcatalog.common.HCatUtil.getTable(HCatUtil.java:180)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:86)
at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
... 14 more
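The NoSuchObjectException says the metastore has no table user1 in database test; note also that the earlier working import used test_db, while this command passes --hcatalog-database test. Two hedged ways to make the target exist, where the column types are assumed from the earlier describe output and --create-hcatalog-table is the documented Sqoop flag for letting the import create the table:

    # Option A: pre-create the partitioned table that HCatalog expects.
    hive -e "CREATE DATABASE IF NOT EXISTS test;
    CREATE TABLE IF NOT EXISTS test.user1 (
      id INT,
      name STRING,
      birth_date STRING)
    PARTITIONED BY (year STRING, month STRING, day STRING)
    STORED AS TEXTFILE;"

    # Option B: add --create-hcatalog-table to the sqoop import command above,
    # so Sqoop creates the table itself on the first run.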
On Mon, Aug 15, 2016 at 4:35 PM, Markus Kemper <[email protected]> wrote:

Hello Mahebub,

Bogi is correct (and a great answer, by the way). To the best of my knowledge, the following rules apply to Sqoop and Hive partitioning:

1. With static partitions you can use either the --hive-import or the --hcatalog options
2. With dynamic partitions you can only use the --hcatalog options

Example (static):

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
--table t1 --columns c1,c2 --where "p1 = 1" --num-mappers 1 \
--hive-import --hive-database default --hive-table t1_partition \
--hive-partition-key <col> --hive-partition-value <value>

Example (dynamic):

sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
--table t1 --hcatalog-database default --hcatalog-table t1_partitioned \
--num-mappers 1 --verbose --where "c1 > 1" --hive-partition-value <col>

Markus Kemper
Customer Operations Engineer
www.cloudera.com

On Mon, Aug 15, 2016 at 9:19 AM, Boglarka Egyed <[email protected]> wrote:

Hi Mahebub,

Unfortunately, --hive-partition-key and --hive-partition-value require each Sqoop statement to import into a single Hive partition; there is currently no support for Hive auto-partitioning. If a data set is to be imported into multiple partitions of a table, a separate Sqoop statement is needed for each partition. However, with --hcatalog-partition-keys and --hcatalog-partition-values you can specify multiple static partition key/value pairs. Please find the details in the User Guide: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_sqoop_hcatalog_integration

Best Regards,
Bogi

On Mon, Aug 15, 2016 at 9:51 AM, Mahebub Sayyed <[email protected]> wrote:

I need to create/import a Hive table with three partition levels (year/month/day) using Sqoop. I have looked at --hive-partition-key and --hive-partition-value; using these parameters I have created a year partition like this:

--hive-partition-key year --hive-partition-value '2016'

My question is how to pass multiple values for partition-key and partition-value to create partitions like year/month/day.

--
Regards,
Mahebub Sayyed
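To illustrate Bogi's point above that --hive-import needs one Sqoop statement per partition, here is a hedged shell sketch of looping over partitions. The single partition key dt, the table name user1_by_day, the date list, and the WHERE filter are all illustrative, not from the thread:

    for day in 2016-08-14 2016-08-15 2016-08-16; do
      sqoop import \
        --connect jdbc:postgresql://localhost:7432/test_db \
        --driver org.postgresql.Driver --username pgadmin --password pgadmin@1234 \
        --table user1 \
        --where "birth_date::date = date '${day}'" \
        --hive-import \
        --hive-database test_db \
        --hive-table user1_by_day \
        --hive-partition-key dt \
        --hive-partition-value "${day}" \
        --num-mappers 1
      # each iteration loads exactly one static partition (dt=<day>)
    done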
