Hi Brenden, OraOop can create a partitioned table for you - it will be partitioned by the date of the load (one new partition for each export), then sub partitioned by mapper (so 24 mappers would have 24 sub partitions). The easiest way is to use the template table parameter - so here is the command I used - in this example TST_PRODUCT is my existing non-partitioned table. TST_PRODUCT_EXP is a table that does not exist and will be created by OraOop - it will have the same columns as TST_PRODUCT - but will add the columns I mentioned for partitioning. You can find more information about it in the documentation: https://github.com/QuestSoftwareTCD/OracleSQOOPconnector/blob/master/docs/oraoopuserguide.pdf?raw=true
sqoop export -Dmapred.map.tasks.speculative.execution=false -Doraoop.template.table=tst_product -Doraoop.partitioned=true --connect jdbc:oracle:thin:@//hostname1:1521/service --username username --password password --export-dir tst_product --table tst_product_exp --num-mappers 24 Depending on your use case the partitioning option may or may not work for you - but it is good in that it reduces the load on the Oracle database by a lot as you can see - and also massively improves loading time. If you want to generate the TST_PRODUCT table I use in my testing you can run the OraOop integration tests - which will generate it with 10,000 rows by default (you can change the parameter to be any number of rows you like). There is also a command line interface to the integration tests which we use to create tables for benchmarking purposes. The source code is available here: https://github.com/QuestSoftwareTCD/OracleSQOOPconnector Let me know how you go - I do the benchmarking on our Oracle products to make sure they scale correctly so would be happy to help you out if you have any issues. David From: Brenden Cobb [mailto:[email protected]] Sent: Friday, 20 June 2014 12:44 AM To: [email protected] Subject: Re: Sqoop to Oracle transfer rates Thanks David that is very informative. May I ask what partitioning method you utilized? I've got Oracle licensing covered so that should not be an issue. From: David Robson <[email protected]<mailto:[email protected]>> Reply-To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Date: Wednesday, June 18, 2014 7:43 PM To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Subject: RE: Sqoop to Oracle transfer rates Hi Brenden, I did some benchmarking a while ago on a 62858MB folder. The best I achieved with 24 mappers was 867 seconds (72.5MB/s). This was on a 4 node Hadoop cluster that was on some old servers we had so I'm sure with some better hardware you could get a lot faster. I'll attach some graphs which you might find interesting - basically you can see OraOop reduces load on the DB a little bit with a few optimizations - but once you use direct path and partition exchange loading the benefits are massive. Of course this requires you to have the relevant Oracle licenses so may not be an option. David From: Brenden Cobb [mailto:[email protected]] Sent: Thursday, 19 June 2014 6:25 AM To: [email protected]<mailto:[email protected]> Subject: Sqoop to Oracle transfer rates Looking for some benchmarks on Sqoop (or Oraoop) exports. Appreciate if anyone feels like sharing some metrics. Rough numbers for MB/s would suffice, perhaps number of mappers used. I'm working specifically with Oracle exports, but other RDBMS export rates would be enlightening as well. Thanks
