Support boundary query on the command line
------------------------------------------
Key: SQOOP-331
URL: https://issues.apache.org/jira/browse/SQOOP-331
Project: Sqoop
Issue Type: New Feature
Components: tools
Affects Versions: 1.4.0
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
It would be nice if Sqoop allowed the user to specify, from the command line,
the query that fetches the minimum and maximum values used to create splits in
DataDrivenDBInputFormat.
Normally Sqoop generates the query that gets the minimum and maximum values
for creating splits in the following form: SELECT min($split_by_column),
max($split_by_column) FROM $table WHERE $cmd_where. In my use case, I needed to
import only a portion of the data, with ranges based on the split_by_column
that I had already preselected and that are available in a special table
holding the data ranges and the corresponding primary key values. So my
auto-generated query looked like this: SELECT min(id), max(id) FROM table WHERE
id >= min_id AND id <= max_id. That query is obviously useless, since the
bounds are already known, and it just creates unnecessary load on the database
server. It would be nice to be able to supply my own boundary query that uses
the extra table with the data ranges.
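
To make the request concrete, here is a minimal Python sketch of the behaviour
described above. It is not Sqoop's actual code: the function names, the
`orders` and `import_ranges` tables, and the `range_name` column are all
hypothetical and exist only for illustration. It builds the default min/max
query in the form quoted above and shows how a user-supplied boundary query
could take its place.

```python
def default_boundary_query(table, split_by, where=None):
    """Build the min/max query in the form quoted in this issue.

    Illustrative only; this is not how Sqoop is implemented internally.
    """
    query = f"SELECT min({split_by}), max({split_by}) FROM {table}"
    if where:
        query += f" WHERE {where}"
    return query


def choose_boundary_query(table, split_by, where=None, custom=None):
    """Prefer a user-supplied boundary query; otherwise fall back to the default."""
    return custom if custom else default_boundary_query(table, split_by, where)


# Hypothetical query that reads precomputed bounds from a small ranges table
# instead of scanning the (large) source table.
CUSTOM_QUERY = (
    "SELECT min_id, max_id FROM import_ranges WHERE range_name = 'batch_42'"
)

if __name__ == "__main__":
    # Default behaviour: scans the source table for bounds that are already known.
    print(choose_boundary_query("orders", "id", "id >= 1000 AND id <= 2000"))
    # Requested behaviour: the user's own query is used as-is.
    print(choose_boundary_query("orders", "id", custom=CUSTOM_QUERY))
```

The only design point here is that a custom query, when given, replaces the
generated one entirely, so the database never has to compute min/max over the
source table.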