[
https://issues.apache.org/jira/browse/SQOOP-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arvind Prabhakar updated SQOOP-312:
-----------------------------------
Fix Version/s: 1.4.0
> Support for hive dynamic partitions with SQOOP import
> -----------------------------------------------------
>
> Key: SQOOP-312
> URL: https://issues.apache.org/jira/browse/SQOOP-312
> Project: Sqoop
> Issue Type: New Feature
> Reporter: Bejoy KS
> Fix For: 1.4.0
>
>
> Currently, to populate the partitions of a Hive table using Sqoop
> import, we need to perform the following steps:
> 1. Analyze the db table and identify the distinct values of the column
> to be partitioned on.
> 2. If there are n distinct values for the column, create n different
> Sqoop import commands, each with a where clause that selects the data
> for one value, along with
> --hive-partition-key <key-name/column name> and --hive-partition-value
> <value-string/column value>.
> This approach becomes a bottleneck for larger tables that span
> millions of rows. Such tables should be partitioned in Hive, and there could
> be at least 300 to 500 partitions, i.e. 300 to 500 Sqoop imports.
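
To make the cost of the per-value approach concrete, here is a minimal Python sketch that builds one `sqoop import` command per distinct partition value. The JDBC URL, table names, column name, and the helper `build_partition_imports` are all hypothetical; only the Sqoop options themselves (`--connect`, `--table`, `--where`, `--hive-import`, `--hive-table`, `--hive-partition-key`, `--hive-partition-value`) are taken from Sqoop. It only assembles the commands and does not run them.

```python
import shlex


def build_partition_imports(jdbc_url, db_table, hive_table, partition_col, values):
    """Return one `sqoop import` argument list per distinct partition value."""
    commands = []
    for value in values:
        # Naive SQL quoting, for illustration only: double any single quotes.
        escaped = str(value).replace("'", "''")
        where = f"{partition_col} = '{escaped}'"
        commands.append([
            "sqoop", "import",
            "--connect", jdbc_url,
            "--table", db_table,
            "--where", where,
            "--hive-import",
            "--hive-table", hive_table,
            "--hive-partition-key", partition_col,
            "--hive-partition-value", str(value),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical connection string, tables, and partition values.
    cmds = build_partition_imports(
        "jdbc:mysql://dbhost/sales", "orders", "orders_part",
        "country", ["US", "IN", "DE"],
    )
    for argv in cmds:
        print(shlex.join(argv))
```

With 300 to 500 distinct values the list holds 300 to 500 separate commands, each of which is its own import job. In practice the partition column may also need to be left out of the imported columns; that detail is omitted here.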
> We are currently overcoming this hurdle with the following workaround:
> 1. Sqoop import the whole db table into a non-partitioned Hive table.
> 2. Manually create a partitioned Hive table.
> 3. Use HiveQL to load the data from the non-partitioned Hive table into the
> corresponding partitions of the partitioned Hive table.
> We would like Sqoop import to accept some parameters that perform the
> above steps within Sqoop itself.
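
The three-step workaround described in the issue relies on Hive's dynamic partition insert. The sketch below generates the HiveQL for steps 2 and 3, assuming step 1 has already loaded a non-partitioned staging table. The helper `dynamic_partition_hql` and all table and column names are hypothetical; the two `SET` properties and the `INSERT OVERWRITE ... PARTITION (col)` form are standard Hive.

```python
def dynamic_partition_hql(staging_table, target_table, data_cols, partition_col):
    """Build HiveQL that copies a staging table into a partitioned table.

    data_cols:     list of (name, hive_type) for the non-partition columns.
    partition_col: (name, hive_type) of the partition column.
    """
    part_name, part_type = partition_col
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in data_cols)
    # Hive expects the dynamic partition column last in the SELECT list.
    select_list = ", ".join([name for name, _ in data_cols] + [part_name])
    return [
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        f"CREATE TABLE IF NOT EXISTS {target_table} ({col_defs}) "
        f"PARTITIONED BY ({part_name} {part_type})",
        f"INSERT OVERWRITE TABLE {target_table} PARTITION ({part_name}) "
        f"SELECT {select_list} FROM {staging_table}",
    ]


if __name__ == "__main__":
    # Hypothetical staging and target tables.
    statements = dynamic_partition_hql(
        "orders_staging", "orders_part",
        [("id", "INT"), ("amount", "DOUBLE")],
        ("country", "STRING"),
    )
    for stmt in statements:
        print(stmt + ";")
```

A single insert statement replaces the hundreds of per-value imports, which is the behaviour the request asks Sqoop to provide directly. With several hundred partitions, Hive limits such as `hive.exec.max.dynamic.partitions.pernode` may also need to be raised.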