I have found that you must specify these 6 parameters for incremental imports 
to work:
   --null-string '\\N'
   --null-non-string '\\N'
   --fields-terminated-by ‘\001'
   --input-null-string '\\N'
   --input-null-non-string '\\N'
   --input-fields-terminated-by ‘\001'

I believe that the first 3 control how output files (created by Sqoop) are 
delimited and the second 3 control how input files are delimited.
Since an incremental import merges existing files, they are treated like input.

From: Manikandan R
Reply-To: "[email protected]<mailto:[email protected]>"
Date: Wednesday, May 20, 2015 at 1:52 PM
To: "[email protected]<mailto:[email protected]>"
Subject: Re: Merge failed - timestamp column with null values

Hello Swati,

Thanks for your reply.

I am not using --class-name in sqoop command.

Here is my sqoop action in oozie

        <action name="sqoop-saved-job">
                <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                        <job-tracker>${jobTracker}</job-tracker>
                        <name-node>${nameNode}</name-node>
                        <job-xml>/tmp/sqoop-site.xml</job-xml>
                        <arg>job</arg>
                        <arg>--create</arg>
                        <arg>${dbName}-${tableName}-sync-job</arg>
                        <arg>--</arg>
                        <arg>import</arg>
                        <arg>--connect</arg>
                        <arg>jdbc:mysql://${dbHost}/${dbName}</arg>
                        <arg>--username</arg>
                        <arg>root</arg>
                        <arg>--password-file</arg>
                        <arg>/tmp/.password</arg>
                        <arg>--table</arg>
                        <arg>${tableName}</arg>
                        <arg>--incremental</arg>
                        <arg>${incrementalMode}</arg>
                        <arg>--merge-key</arg>
                        <arg>${mergeKey}</arg>
                        <arg>--check-column</arg>
                        <arg>${checkColumn}</arg>
                        <arg>--last-value</arg>
                        <arg>${lastValue}</arg>
                        <arg>--target-dir</arg>
                        <arg>/data/${dbName}/${stgPrefix}_${tableName}</arg>
                        <arg>--fields-terminated-by</arg>
                        <arg>\001</arg>
                        <arg>--null-string</arg>
                        <arg>\\N</arg>
                        <arg>--null-non-string</arg>
                        <arg>\\N</arg>
                        <arg>${directOption}</arg>
                </sqoop>

                <ok to="sqoop-run-or-saved-job-check" />
                <error to="sqoop-run-or-saved-job-check" />
        </action>

and here is the exception -

 Error: java.lang.RuntimeException: Can't parse input data: '\N'
  at dim_scd_table.__loadFromFields(dim_scd_table.java:473)
  at dim_scd_table.parse(dim_scd_table.java:391)
  at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
  at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
  Caused by: java.lang.IllegalArgumentException: Timestamp format must be 
yyyy-mm-dd hh:mm:ss[.fffffffff]
  at java.sql.Timestamp.valueOf(Timestamp.java:202)
  at dim_scd_table.__loadFromFields(dim_scd_table.java:455)

Table name is dim_scd_table. It has scd_end_date column of "timestamp" 
datatype. When this column has NULL value, am getting the above exception.

Please let me know on this.

Thanks,
Mani

On Wed, May 20, 2015 at 10:49 PM, Swati Ambulkar -X (sambulka - PERSISTENT 
SYSTEMS INC at Cisco) <[email protected]<mailto:[email protected]>> wrote:
Can you paste your sqoop command please?

Are you generating your class with –class-name option? :

Once you do that you should see some code for handling timestamp column similar 
to listed below. Here if it encounters \N __cur_str.length() will not be 0 and 
it will go through else part and you can check if this is what is failing for 
you.

    __cur_str = __it.next();
    if (__cur_str.equals("null") || __cur_str.length() == 0) { this.STARTTIME = 
null; } else {
      this.STARTTIME = java.sql.Timestamp.valueOf(__cur_str);
    }

    __cur_str = __it.next();
    if (__cur_str.equals("null") || __cur_str.length() == 0) { this.ENDTIME = 
null; } else {
      this.ENDTIME = java.sql.Timestamp.valueOf(__cur_str);
    }

You can direct sqoop to use null string (“”) for –null-non-string option.
        options.add("--null-string");
        options.add("");

        options.add("--null-non-string");
        options.add("");

This would put null string in imported row and then the abovementioned check 
should timestamp column value to null.

Thanks,
Swati

From: Manikandan R [mailto:[email protected]<mailto:[email protected]>]
Sent: Wednesday, May 20, 2015 12:02 AM
To: [email protected]<mailto:[email protected]>
Subject: Merge failed - timestamp column with null values

Hello Everyone,

I am trying to push incremental updates from mysql to hdfs using sqoop import 
command with Mergekey option and incremental mode as "lastmodified".

My table has some timestamp columns. I don't see any problems as long as 
timestamp columns has some values. But, Problem arises only when it has NULL 
values. I copied the below exception from my logs. Also, Incase of 
Non-timestamp columns having null values, there is no issues.

Error: java.lang.RuntimeException: Can't parse input data: '\N'
  at dim_scd_table.__loadFromFields(dim_scd_table.java:473)
  at dim_scd_table.parse(dim_scd_table.java:391)
  at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
  at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
  Caused by: java.lang.IllegalArgumentException: Timestamp format must be 
yyyy-mm-dd hh:mm:ss[.fffffffff]
  at java.sql.Timestamp.valueOf(Timestamp.java:202)
  at dim_scd_table.__loadFromFields(dim_scd_table.java:455)
  ... 11 more

Kindly let me know on this.

Thanks,
Mani

  • Merge faile... Manikandan R
    • RE: Me... Swati Ambulkar -X (sambulka - PERSISTENT SYSTEMS INC at Cisco)
      • Re... Manikandan R
        • ... Michael Arena
          • ... Manikandan R
            • ... Manikandan R

Reply via email to