I have found that you must specify these 6 parameters for incremental imports to work: --null-string '\\N' --null-non-string '\\N' --fields-terminated-by ‘\001' --input-null-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by ‘\001'
I believe that the first 3 control how output files (created by Sqoop) are delimited and the second 3 control how input files are delimited. Since an incremental import merges existing files, they are treated like input. From: Manikandan R Reply-To: "[email protected]<mailto:[email protected]>" Date: Wednesday, May 20, 2015 at 1:52 PM To: "[email protected]<mailto:[email protected]>" Subject: Re: Merge failed - timestamp column with null values Hello Swati, Thanks for your reply. I am not using --class-name in sqoop command. Here is my sqoop action in oozie <action name="sqoop-saved-job"> <sqoop xmlns="uri:oozie:sqoop-action:0.2"> <job-tracker>${jobTracker}</job-tracker> <name-node>${nameNode}</name-node> <job-xml>/tmp/sqoop-site.xml</job-xml> <arg>job</arg> <arg>--create</arg> <arg>${dbName}-${tableName}-sync-job</arg> <arg>--</arg> <arg>import</arg> <arg>--connect</arg> <arg>jdbc:mysql://${dbHost}/${dbName}</arg> <arg>--username</arg> <arg>root</arg> <arg>--password-file</arg> <arg>/tmp/.password</arg> <arg>--table</arg> <arg>${tableName}</arg> <arg>--incremental</arg> <arg>${incrementalMode}</arg> <arg>--merge-key</arg> <arg>${mergeKey}</arg> <arg>--check-column</arg> <arg>${checkColumn}</arg> <arg>--last-value</arg> <arg>${lastValue}</arg> <arg>--target-dir</arg> <arg>/data/${dbName}/${stgPrefix}_${tableName}</arg> <arg>--fields-terminated-by</arg> <arg>\001</arg> <arg>--null-string</arg> <arg>\\N</arg> <arg>--null-non-string</arg> <arg>\\N</arg> <arg>${directOption}</arg> </sqoop> <ok to="sqoop-run-or-saved-job-check" /> <error to="sqoop-run-or-saved-job-check" /> </action> and here is the exception - Error: java.lang.RuntimeException: Can't parse input data: '\N' at dim_scd_table.__loadFromFields(dim_scd_table.java:473) at dim_scd_table.parse(dim_scd_table.java:391) at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53) at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] at java.sql.Timestamp.valueOf(Timestamp.java:202) at dim_scd_table.__loadFromFields(dim_scd_table.java:455) Table name is dim_scd_table. It has scd_end_date column of "timestamp" datatype. When this column has NULL value, am getting the above exception. Please let me know on this. Thanks, Mani On Wed, May 20, 2015 at 10:49 PM, Swati Ambulkar -X (sambulka - PERSISTENT SYSTEMS INC at Cisco) <[email protected]<mailto:[email protected]>> wrote: Can you paste your sqoop command please? Are you generating your class with –class-name option? : Once you do that you should see some code for handling timestamp column similar to listed below. Here if it encounters \N __cur_str.length() will not be 0 and it will go through else part and you can check if this is what is failing for you. __cur_str = __it.next(); if (__cur_str.equals("null") || __cur_str.length() == 0) { this.STARTTIME = null; } else { this.STARTTIME = java.sql.Timestamp.valueOf(__cur_str); } __cur_str = __it.next(); if (__cur_str.equals("null") || __cur_str.length() == 0) { this.ENDTIME = null; } else { this.ENDTIME = java.sql.Timestamp.valueOf(__cur_str); } You can direct sqoop to use null string (“”) for –null-non-string option. options.add("--null-string"); options.add(""); options.add("--null-non-string"); options.add(""); This would put null string in imported row and then the abovementioned check should timestamp column value to null. Thanks, Swati From: Manikandan R [mailto:[email protected]<mailto:[email protected]>] Sent: Wednesday, May 20, 2015 12:02 AM To: [email protected]<mailto:[email protected]> Subject: Merge failed - timestamp column with null values Hello Everyone, I am trying to push incremental updates from mysql to hdfs using sqoop import command with Mergekey option and incremental mode as "lastmodified". My table has some timestamp columns. I don't see any problems as long as timestamp columns has some values. But, Problem arises only when it has NULL values. I copied the below exception from my logs. Also, Incase of Non-timestamp columns having null values, there is no issues. Error: java.lang.RuntimeException: Can't parse input data: '\N' at dim_scd_table.__loadFromFields(dim_scd_table.java:473) at dim_scd_table.parse(dim_scd_table.java:391) at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53) at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] at java.sql.Timestamp.valueOf(Timestamp.java:202) at dim_scd_table.__loadFromFields(dim_scd_table.java:455) ... 11 more Kindly let me know on this. Thanks, Mani
