I solved this problem. It was a simple fix: -Doracle.row.fetch.size=1000
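For anyone who hits the same thing, here is roughly what the working command looks like. It is just the original command from the quoted message below with the property added as a generic -D argument right after the tool name; treat it as a sketch, since I haven't re-verified every other flag, and the -Xmx override is probably unnecessary once the fetch size is lowered. My guess is that the generic --fetch-size option never reaches the direct (OraOop) connector, which reads oracle.row.fetch.size instead.

# same command as in the quoted message, with oracle.row.fetch.size added
sqoop import -Doracle.row.fetch.size=1000 \
  -Dmapred.child.java.opts=-Xmx4000M \
  -Dmapred.map.max.attempts=1 \
  --connect jdbc:oracle:thin:@ldap://myhost:389/somedb,cn=OracleContext,dc=mycom,dc=com \
  --username myusername --password mypassword \
  --table mydb.mytable --columns "COL1, COL2, COL50" \
  --hive-import --hive-database myhivedb --hive-table myhivetable \
  --hive-partition-key "ds" --hive-partition-value "20160607" \
  --create-hive-table --null-string "" --null-non-string "" \
  --direct -m 16 --delete-target-dir --target-dir /tmp/sqoop_test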
The default was 5000, and since I was grabbing a lot of columns it was taking up more memory than the JVM could handle. I had been using the --fetch-size option, but that wasn't helping.

On Thu, Jun 9, 2016 at 8:51 AM, Mark Libucha <[email protected]> wrote:

> Hi, I can’t keep from running out of JVM heap when trying to import a
> large Oracle table with the direct flag set. I can import successfully with
> smaller tables.
>
> Stack trace in the mapper log shows:
>
> 2016-06-09 14:59:21,266 FATAL [main] org.apache.hadoop.mapred.YarnChild:
> Error running child : java.lang.OutOfMemoryError: Java heap space
>
> and (the subsequent and probably irrelevant?)
>
> Caused by: java.sql.SQLException: Protocol violation: [8, 1]
>
> The line that gets printed to stdout just before the job runs:
>
> 16/06/09 15:21:57 INFO oracle.OraOopDataDrivenDBInputFormat: The table
> being imported by sqoop has 80751872 blocks that have been divided into
> 5562 chunks which will be processed in 16 splits. The chunks will be
> allocated to the splits using the method : ROUNDROBIN
>
> I’ve tried adding this to the command: -Dmapred.child.java.opts=-Xmx4000M
> but that doesn’t help. I've also tried increasing/decreasing the number of
> splits.
>
> The full command looks like this:
>
> sqoop import -Dmapred.child.java.opts=-Xmx4000M
> -Dmapred.map.max.attempts=1 --connect
> jdbc:oracle:thin:@ldap://myhost:389/somedb,cn=OracleContext,dc=mycom,dc=com
> --username myusername --password mypassword --table mydb.mytable --columns
> "COL1, COL2, COL50" --hive-partition-key "ds" --hive-partition-value
> "20160607" --hive-database myhivedb --hive-table myhivetable --hive-import
> --null-string "" --null-non-string "" --direct --create-hive-table -m 16
> --delete-target-dir --target-dir /tmp/sqoop_test
>
> Thanks for any suggestions.
>
> Mark
