Support boundary query on the command line
------------------------------------------

                 Key: SQOOP-331
                 URL: https://issues.apache.org/jira/browse/SQOOP-331
             Project: Sqoop
          Issue Type: New Feature
          Components: tools
    Affects Versions: 1.4.0
            Reporter: Jarek Jarcec Cecho
            Assignee: Jarek Jarcec Cecho


It would be nice if Sqoop could accept, from the command line, a query that 
fetches the minimal and maximal values used for creating splits in 
DataDrivenDBInputFormat.

Normally Sqoop generates the query that gets the minimal and maximal values 
for creating splits in the following form: SELECT min($split_by_column), 
max($split_by_column) FROM $table WHERE $cmd_where. In my use case, I needed 
to import only a portion of the data, with ranges based on the split_by_column 
that I had already preselected and that are available in a special table 
holding the data ranges and the corresponding primary key values. So my 
auto-generated query looked like this: SELECT min(id), max(id) FROM table 
WHERE id >= min_id AND id <= max_id. That query is redundant, since the bounds 
are already known, and it only creates unnecessary load on the database 
server. It would be nice to supply my own boundary query that uses the extra 
table with the data ranges.
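
To make the difference concrete, here is a minimal sketch in Python (not Sqoop 
code) that builds both query strings. The first function follows the form 
quoted above. The second is only an illustration: the table name 
import_ranges and its columns min_id, max_id and range_id are hypothetical 
names, not part of Sqoop or of my actual schema.

```python
from typing import Optional


def default_boundary_query(split_by: str, table: str,
                           where: Optional[str] = None) -> str:
    """Build a min/max query in the form described above."""
    query = f"SELECT min({split_by}), max({split_by}) FROM {table}"
    if where:
        query += f" WHERE {where}"
    return query


def range_table_boundary_query(range_table: str, range_id: int) -> str:
    """Read precomputed bounds from a (hypothetical) range table.

    Assumes the table has columns min_id, max_id and range_id;
    these names are illustrative only.
    """
    return (f"SELECT min_id, max_id FROM {range_table} "
            f"WHERE range_id = {range_id}")


# The auto-generated query scans the imported table itself.
generated = default_boundary_query("id", "table", "id >= 100 AND id <= 200")

# A user-supplied query could instead read a single row of known bounds.
custom = range_table_boundary_query("import_ranges", 7)

print(generated)
print(custom)
```

The second query touches only the small range table, which is the kind of 
boundary query I would like to be able to pass on the command line.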

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
