Hi Brenden,

OraOop can create a partitioned table for you - it will be partitioned by the 
date of the load (one new partition for each export), then sub partitioned by 
mapper (so 24 mappers would have 24 sub partitions). The easiest way is to use 
the template table parameter - so here is the command I used - in this example 
TST_PRODUCT is my existing non-partitioned table. TST_PRODUCT_EXP is a table 
that does not exist and will be created by OraOop - it will have the same 
columns as TST_PRODUCT - but will add the columns I mentioned for partitioning. 
You can find more information about it in the documentation: 
https://github.com/QuestSoftwareTCD/OracleSQOOPconnector/blob/master/docs/oraoopuserguide.pdf?raw=true

sqoop export -Dmapred.map.tasks.speculative.execution=false 
-Doraoop.template.table=tst_product -Doraoop.partitioned=true --connect 
jdbc:oracle:thin:@//hostname1:1521/service --username username --password 
password --export-dir tst_product --table tst_product_exp --num-mappers 24

Depending on your use case the partitioning option may or may not work for you 
- but it is good in that it reduces the load on the Oracle database by a lot as 
you can see - and also massively improves loading time.

If you want to generate the TST_PRODUCT table I use in my testing you can run 
the OraOop integration tests - which will generate it with 10,000 rows by 
default (you can change the parameter to be any number of rows you like). There 
is also a command line interface to the integration tests which we use to 
create tables for benchmarking purposes. The source code is available here: 
https://github.com/QuestSoftwareTCD/OracleSQOOPconnector

Let me know how you go - I do the benchmarking on our Oracle products to make 
sure they scale correctly so would be happy to help you out if you have any 
issues.

David

From: Brenden Cobb [mailto:[email protected]]
Sent: Friday, 20 June 2014 12:44 AM
To: [email protected]
Subject: Re: Sqoop to Oracle transfer rates

Thanks David that is very informative. May I ask what partitioning method you 
utilized?

I've got Oracle licensing covered so that should not be an issue.

From: David Robson 
<[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Date: Wednesday, June 18, 2014 7:43 PM
To: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: RE: Sqoop to Oracle transfer rates

Hi Brenden,

I did some benchmarking a while ago on a 62858MB folder. The best I achieved 
with 24 mappers was 867 seconds (72.5MB/s). This was on a 4 node Hadoop cluster 
that was on some old servers we had so I'm sure with some better hardware you 
could get a lot faster.

I'll attach some graphs which you might find interesting - basically you can 
see OraOop reduces load on the DB a little bit with a few optimizations - but 
once you use direct path and partition exchange loading the benefits are 
massive. Of course this requires you to have the relevant Oracle licenses so 
may not be an option.

David

From: Brenden Cobb [mailto:[email protected]]
Sent: Thursday, 19 June 2014 6:25 AM
To: [email protected]<mailto:[email protected]>
Subject: Sqoop to Oracle transfer rates

Looking for some benchmarks on Sqoop (or Oraoop) exports.

Appreciate if anyone feels like sharing some metrics. Rough numbers for MB/s 
would suffice, perhaps number of mappers used. I'm working specifically with 
Oracle exports, but other RDBMS export rates would be enlightening as well.

Thanks

Reply via email to