Thanks for the response.  

I was thinking of using OraOop to automatically import Oracle partitions into Hive 
partitions.  But, based on the conversation below, I just learned it's not possible. 
 

From an automation perspective, I think running one Sqoop job per partition and 
creating the same partition in Hive is the better option.  
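
A minimal sketch of that one-job-per-partition approach, combining the -Doraoop.import.partitions option David describes below with the --hive-partition-key and --hive-partition-value options Gwen mentions. The connection string, user, table names and Hive partition column are hypothetical placeholders, and using the Oracle partition name as the Hive partition value is an assumption. The script only prints the commands; it has not been run against a real cluster, so whether --direct and --hive-import combine as desired on a given Sqoop version should be checked first.

```shell
#!/bin/sh
# Sketch only: prints one Sqoop import command per Oracle partition
# instead of executing it. All values below are hypothetical placeholders.
JDBC_URL="jdbc:oracle:thin:@//dbhost:1521/ORCL"
ORA_USER="SCOTT"
ORA_TABLE="OracleTableName"
HIVE_TABLE="hive_table_name"
HIVE_PART_KEY="ora_partition"   # assumed Hive partition column name

# Build the import command for a single Oracle partition.
# Assumption: the Hive partition value is the Oracle partition name.
build_import_cmd() {
  part="$1"
  printf 'sqoop import -Doraoop.import.partitions=%s --direct --connect %s --username %s -P --table %s --hive-import --hive-table %s --hive-partition-key %s --hive-partition-value %s\n' \
    "$part" "$JDBC_URL" "$ORA_USER" "$ORA_TABLE" "$HIVE_TABLE" "$HIVE_PART_KEY" "$part"
}

# One job per partition; unquoted names are upper-cased by OraOop,
# so they are written in upper case here.
for p in PARTITIONA PARTITIONB; do
  build_import_cmd "$p"
done
```

Replacing the printf with an actual invocation would run the jobs one after another; the list of partition names would still have to be supplied by hand or read from the Oracle data dictionary.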

Gwen/David:  Yes, importing Oracle partitions into Hive partitions would be a 
good feature to have.  Any idea why there have been no commits to OraOop since 2012?

Regards,
Venkat

-----Original Message-----
From: Gwen Shapira [mailto:[email protected]] 
Sent: Tuesday, August 05, 2014 6:24 PM
To: [email protected]
Subject: Re: Import Partitions from Oracle to Hive Partitions

Having OraOop automatically handle partitions in Hive will be a cool feature. I 
agree that this will be limited to OraOop for now.

On Tue, Aug 5, 2014 at 5:08 PM, David Robson <[email protected]> 
wrote:
> Yes now that you mention Sqoop is limited to one partition in Hive I do 
> remember that! I would think we could modify Sqoop to create subfolders for 
> each partition - instead of how it now creates a separate file for each 
> partition? This would probably be limited to the direct (OraOop) connector as 
> it is aware of partitions (existing connector doesn't read data dictionary 
> directly).
>
> In the meantime Venkat - you could look at the option I mentioned - then 
> manually move the files into separate folders - at least you'll have each 
> partition in a separate file rather than spread throughout all files. The 
> other thing you could look at is the option below - you could run one Sqoop 
> job per partition:
>
> Specify The Partitions To Import
>
> -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>
> Imports PartitionA and PartitionB of OracleTableName.
>
> Notes:
> * You can enclose an individual partition name in double quotes to 
>   retain the letter case or if the name has special characters.
>   -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>   If the partition name is not double quoted then its name will be 
>   automatically converted to upper case, PARTITIONB for above.
> * When using double quotes the entire list of partition names must be 
>   enclosed in single quotes.
> * If the last partition name in the list is double quoted then there 
>   must be a comma at the end of the list.
>   -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>
> Name each partition to be included. There is no facility to provide a range 
> of partition names.
>
> There is no facility to define sub partitions. The entire partition is 
> included/excluded as per the filter.
>
>
> -----Original Message-----
> From: Gwen Shapira [mailto:[email protected]]
> Sent: Wednesday, 6 August 2014 8:44 AM
> To: [email protected]
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Hive expects a directory for each partition, so getting data with OraOop will 
> require some post-processing: copying files into properly named directories and 
> adding the new partitions to a Hive table.
>
> Sqoop has the --hive-partition-key and --hive-partition-value, but this 
> assumes that all the data sqooped will fit into a single partition.
>
>
> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <[email protected]> 
> wrote:
>> Hi Venkat,
>>
>>
>>
>> I’m not sure what this will do in regards to Hive partitions – I’ll 
>> test it out when I get into the office and get back to you. But this 
>> option will make it so there is one file for each Oracle partition – 
>> which might be of interest to you.
>>
>>
>>
>> Match Hadoop Files to Oracle Table Partitions
>>
>>
>>
>> -Doraoop.chunk.method={ROWID|PARTITION}
>>
>>
>>
>> To import data from a partitioned table in such a way that the 
>> resulting HDFS folder structure in Hadoop will match the table’s 
>> partitions, set the chunk method to PARTITION. The alternative 
>> (default) chunk method is ROWID.
>>
>>
>>
>> Notes:
>>
>> * For the number of Hadoop files to match the number of Oracle 
>>   partitions, set the number of mappers to be greater than or equal to 
>>   the number of partitions.
>>
>> * If the table is not partitioned then value PARTITION will lead to 
>>   an error.
>>
>>
>>
>> David
>>
>>
>>
>>
>>
>> From: Venkat, Ankam [mailto:[email protected]]
>> Sent: Wednesday, 6 August 2014 3:56 AM
>> To: '[email protected]'
>> Subject: Import Partitions from Oracle to Hive Partitions
>>
>>
>>
>> I am trying to import partitions from an Oracle table to Hive partitions.
>>
>>
>>
>> Can somebody provide the syntax using the regular JDBC connector and 
>> the OraOop connector?
>>
>>
>>
>> Thanks in advance.
>>
>>
>>
>> Regards,
>>
>> Venkat
>>
>>
>>
>>
