Daniel Voros created SQOOP-3288:
-----------------------------------

             Summary: Incremental import's upper bound ignores session time 
zone in Oracle
                 Key: SQOOP-3288
                 URL: https://issues.apache.org/jira/browse/SQOOP-3288
             Project: Sqoop
          Issue Type: Bug
          Components: connectors/oracle
    Affects Versions: 1.4.7
            Reporter: Daniel Voros
            Assignee: Daniel Voros


At the moment we're using [{{SELECT SYSDATE FROM 
dual}}|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L652]
 when getting current time from Oracle.

SYSDATE returns the underlying operating system's current time, while 
CURRENT_TIMESTAMP uses the session time zone. This could lead to problems 
during incremental imports *when Oracle's time zone is different from the OS*.

Consider the following scenario when Oracle is configured to {{+0:00}}, while 
the OS is {{+5:00}}:
||Oracle time||OS time||Event||
|2:00|7:00|{{sqoop import --last-value 1:00 ...}} => imports {{[1:00, 7:00)}}|
|2:30|7:30|{{update ... set last_updated = current_timestamp ...}} => set to 
{{2:30}} *Won't be imported!*|
|3:00|8:00|{{sqoop import --last-value 7:00 ...}} => imports {{[7:00, 8:00)}}|

This way records updated within 5 hours after the last sqoop import won't get 
imported.

Please note, that the example above assumes, that the user/administrator who's 
updating the Oracle table will use the current session time of Oracle when 
setting the "last updated" column of the table.

I think the solution is to use CURRENT_TIMESTAMP instead of SYSDATE. Other 
connection managers, like MySQL or PostgreSQL are using that as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to