Hi, I keep running out of JVM heap when trying to import a large Oracle table with the --direct flag set. Imports of smaller tables succeed.
The mapper log shows this stack trace:

    2016-06-09 14:59:21,266 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space

followed by this (subsequent, and probably irrelevant?) cause:

    Caused by: java.sql.SQLException: Protocol violation: [8, 1]

This line is printed to stdout just before the job runs:

    16/06/09 15:21:57 INFO oracle.OraOopDataDrivenDBInputFormat: The table being imported by sqoop has 80751872 blocks that have been divided into 5562 chunks which will be processed in 16 splits. The chunks will be allocated to the splits using the method : ROUNDROBIN

I've tried adding -Dmapred.child.java.opts=-Xmx4000M to the command, but that doesn't help. I've also tried increasing and decreasing the number of splits. The full command looks like this:

    sqoop import \
      -Dmapred.child.java.opts=-Xmx4000M \
      -Dmapred.map.max.attempts=1 \
      --connect jdbc:oracle:thin:@ldap://myhost:389/somedb,cn=OracleContext,dc=mycom,dc=com \
      --username myusername \
      --password mypassword \
      --table mydb.mytable \
      --columns "COL1, COL2, COL50" \
      --hive-partition-key "ds" \
      --hive-partition-value "20160607" \
      --hive-database myhivedb \
      --hive-table myhivetable \
      --hive-import \
      --null-string "" \
      --null-non-string "" \
      --direct \
      --create-hive-table \
      -m 16 \
      --delete-target-dir \
      --target-dir /tmp/sqoop_test
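In case it matters, here is the variant I'm planning to try next. My understanding (which may well be wrong) is that on a YARN cluster the old mapred.child.java.opts setting can be silently ignored when the cluster config already sets mapreduce.map.java.opts, so my heap override may never have reached the mappers. The Hadoop 2 property names below, and the 5120/4096 values, are my own guesses rather than anything I've confirmed:

    # Same import, but with the Hadoop 2 / YARN property names:
    # mapreduce.map.java.opts sets the map-task JVM heap, and
    # mapreduce.map.memory.mb sets the YARN container size, which needs
    # headroom above the heap. mapreduce.map.maxattempts replaces the
    # deprecated mapred.map.max.attempts.
    # (I've also seen -Doracle.row.fetch.size mentioned for the OraOop
    # direct connector as a way to shrink the per-round-trip row buffer,
    # but I haven't confirmed it applies here.)
    sqoop import \
      -Dmapreduce.map.memory.mb=5120 \
      -Dmapreduce.map.java.opts=-Xmx4096m \
      -Dmapreduce.map.maxattempts=1 \
      --connect jdbc:oracle:thin:@ldap://myhost:389/somedb,cn=OracleContext,dc=mycom,dc=com \
      --username myusername \
      --password mypassword \
      --table mydb.mytable \
      --columns "COL1, COL2, COL50" \
      --hive-partition-key "ds" \
      --hive-partition-value "20160607" \
      --hive-database myhivedb \
      --hive-table myhivetable \
      --hive-import \
      --null-string "" \
      --null-non-string "" \
      --direct \
      --create-hive-table \
      -m 16 \
      --delete-target-dir \
      --target-dir /tmp/sqoop_test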
Thanks for any suggestions.

Mark