[ https://issues.apache.org/jira/browse/SQOOP-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arvind Prabhakar updated SQOOP-312:
-----------------------------------

    Fix Version/s: 1.4.0

> Support for hive dynamic partitions with SQOOP import
> -----------------------------------------------------
>
>                 Key: SQOOP-312
>                 URL: https://issues.apache.org/jira/browse/SQOOP-312
>             Project: Sqoop
>          Issue Type: New Feature
>            Reporter: Bejoy KS
>             Fix For: 1.4.0
>
>
> Currently, to populate dynamic partitions of a Hive table using Sqoop
> import, we need to perform the following steps:
> 1. Analyze the database table and identify the distinct values of the
> column to be partitioned on.
> 2. If the column has n distinct values, create n different Sqoop import
> commands, each with a where clause that selects the rows for one value,
> along with --hive-partition-key <key-name/column name> and
> --hive-partition-value <value-string/column value>.
> This approach becomes a bottleneck for large tables that span millions
> of rows. Such tables should be partitioned in Hive, and there could be
> at least 300 to 500 partitions, i.e. 300 to 500 Sqoop imports.
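A minimal Python sketch of the manual approach described above, assuming the distinct values from step 1 are already known (for example from a SELECT DISTINCT on the column). It only builds the argument lists and does not run anything. The JDBC URL, table names and the helper names sql_literal and per_partition_imports are hypothetical; the flags themselves (--connect, --table, --where, --hive-import, --hive-table, --hive-partition-key, --hive-partition-value) are existing Sqoop import options.

```python
from typing import List, Sequence


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def per_partition_imports(
    connect: str,
    table: str,
    hive_table: str,
    partition_column: str,
    values: Sequence[str],
) -> List[List[str]]:
    """Build one `sqoop import` argument list per distinct partition value.

    Hypothetical helper mirroring the manual steps: n distinct values
    means n separate imports, each filtered by its own where clause.
    """
    commands = []
    for value in values:
        commands.append([
            "sqoop", "import",
            "--connect", connect,
            "--table", table,
            # Select only the rows that belong to this partition value.
            "--where", f"{partition_column} = {sql_literal(value)}",
            "--hive-import",
            "--hive-table", hive_table,
            # Load the selected rows into the matching Hive partition.
            "--hive-partition-key", partition_column,
            "--hive-partition-value", value,
        ])
    return commands
```

Each list could be passed to subprocess.run(cmd, check=True). With 300 to 500 distinct values this still means 300 to 500 separate Sqoop jobs, each scanning the source table with its own where clause, which is exactly the cost the issue is complaining about.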
> We are currently working around this with the following tweak:
> 1. Sqoop import the whole database table into a non-partitioned Hive table.
> 2. Manually create a partitioned Hive table.
> 3. Use HiveQL to load the data from the non-partitioned Hive table into
> the corresponding partitions of the partitioned Hive table.
> We would like Sqoop import to accept parameters that perform these steps
> within Sqoop itself.
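A minimal sketch of steps 2 and 3 of the workaround, generating the HiveQL as plain strings. It assumes step 1 has already loaded the data into a non-partitioned staging table (for example with sqoop import --hive-import --hive-table <staging table>) and that the column names and types are known. The function name and all table and column names are hypothetical. It relies on Hive's dynamic partition insert, which needs hive.exec.dynamic.partition=true and, when no static partition value is given, hive.exec.dynamic.partition.mode=nonstrict; the partition column must be the last column in the SELECT list.

```python
from typing import List, Sequence, Tuple


def dynamic_partition_hiveql(
    staging_table: str,
    target_table: str,
    columns: Sequence[Tuple[str, str]],
    partition_column: Tuple[str, str],
) -> List[str]:
    """Return HiveQL statements that copy a flat table into a partitioned one.

    Hypothetical helper: `columns` lists (name, type) pairs of the staging
    table; `partition_column` is the (name, type) to partition by.
    """
    part_name, part_type = partition_column
    # The partition column is declared in PARTITIONED BY, not as a data column.
    data_cols = [(name, typ) for name, typ in columns if name != part_name]
    col_defs = ", ".join(f"{name} {typ}" for name, typ in data_cols)
    # Dynamic partition values are taken from the trailing SELECT column.
    select_list = ", ".join([name for name, _ in data_cols] + [part_name])
    return [
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        f"CREATE TABLE {target_table} ({col_defs}) "
        f"PARTITIONED BY ({part_name} {part_type})",
        f"INSERT OVERWRITE TABLE {target_table} PARTITION ({part_name}) "
        f"SELECT {select_list} FROM {staging_table}",
    ]
```

Because Hive routes each row to its partition from the value in the last SELECT column, the load is a single query rather than one import per value. With hundreds of partitions, Hive's limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may also need to be raised.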

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
