[
https://issues.apache.org/jira/browse/SQOOP-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098262#comment-13098262
]
Arvind Prabhakar commented on SQOOP-331:
----------------------------------------
Thanks for the excellent patch Jarek. It is almost ready for commit except for
a few things:
* The case for free-form query needs to be handled. This would be in the
{{DataDrivenImportJob}} class within the {{else}} block below your current
modifications. Here is a sample diff that I cooked up quickly to convey this
point:
{noformat}
--- src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java
(revision 1165478)
+++ src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java
(working copy)
@@ -153,15 +153,26 @@
DataDrivenDBInputFormat.setInput(job, DBWritable.class,
mgr.escapeTableName(tableName), whereClause,
mgr.escapeColName(splitByCol), sqlColNames);
+
+ // If user specified boundary query on the command line propagate it to
+ // the job
+ if(options.getBoundaryQuery() != null) {
+ DataDrivenDBInputFormat.setBoundingQuery(job.getConfiguration(),
+ options.getBoundaryQuery());
+ }
} else {
// Import a free-form query.
String inputQuery = options.getSqlQuery();
String sanitizedQuery = inputQuery.replace(
DataDrivenDBInputFormat.SUBSTITUTE_TOKEN, " (1 = 1) ");
- String inputBoundingQuery =
- mgr.getInputBoundsQuery(splitByCol, sanitizedQuery);
+ String inputBoundingQuery = options.getBoundaryQuery();
+
if (inputBoundingQuery == null) {
+ mgr.getInputBoundsQuery(splitByCol, sanitizedQuery);
+ }
+
+ if (inputBoundingQuery == null) {
inputBoundingQuery = "SELECT MIN(" + splitByCol + "), MAX("
+ splitByCol + ") FROM (" + sanitizedQuery + ") AS t1";
}
{noformat}
* Second: since you have introduced a new command line option, it is necessary
that the userguide and man pages be updated. These are located under
{{src/docs}} directory and can be built using {{ant docs}} target. In order to
build them though, you would need to have {{asciidoc}} isntalled on your
machine.
* Bonus nit: there is a checkstyle violation in ImportTool:530 where the line
is longer than 80 characters.
Apart from that everything looks great. Some suggestions going forward:
* Usually when introducing a new functionality, it is required to have at least
one test that exercises that functionality. The test you have added is good but
does not really exercise the functionality.
* We use Apache Review Board (https://reviews.apache.org/) to post reviews for
patches that are longer than a few lines. This helps the reviewers give
contextual feedback where necessary.
Please let me know if you have any questions for me on these suggestions.
> Support boundary query on the command line
> ------------------------------------------
>
> Key: SQOOP-331
> URL: https://issues.apache.org/jira/browse/SQOOP-331
> Project: Sqoop
> Issue Type: New Feature
> Components: tools
> Affects Versions: 1.4.0
> Reporter: Jarek Jarcec Cecho
> Assignee: Jarek Jarcec Cecho
> Attachments: SQOOP-331.patch
>
>
> It would be nice if the sqoop would have ability to specify query that will
> fetch minimal and maximal value for creating splits in
> DataDrivenDBInputFormat from the command line.
> Normally sqoop will generate query to get maximal and minimal value for
> creating splits in following form: SELECT min($split_by_column),
> max($split_by_column) FROM $table WHERE $cmd_where. In my use case, I needed
> to import only portion of data with ranges based on the split_by_column that
> I already have preselected and that are available in special table that holds
> data ranges and appropriate primary key values. So my auto generated query
> looked like this: SELECT min(id), max(id) FROM table WHERE id => min_id and
> id <= max_id. That query is obviously useless and is just creating
> unnecessary load on the database server. It would be nice to supply my own
> boundary query that will use the extra table with data ranges.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira