[ 
https://issues.apache.org/jira/browse/SQOOP-331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108434#comment-13108434
 ] 

[email protected] commented on SQOOP-331:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1946/
-----------------------------------------------------------

(Updated 2011-09-20 08:29:02.745328)


Review request for Sqoop and Arvind Prabhakar.


Changes
-------

I've added Arvind to the reviewer list.


Summary
-------

I've incorporated all of Arvind's suggestions (hopefully :-)).


This addresses bug SQOOP-331.
    https://issues.apache.org/jira/browse/SQOOP-331


Diffs
-----

  /src/docs/man/import-args.txt 1171925 
  /src/docs/user/import.txt 1171925 
  /src/java/com/cloudera/sqoop/SqoopOptions.java 1171925 
  /src/java/com/cloudera/sqoop/manager/SqlManager.java 1171925 
  /src/java/com/cloudera/sqoop/mapreduce/DataDrivenImportJob.java 1171925 
  /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1171925 
  /src/java/com/cloudera/sqoop/tool/ImportTool.java 1171925 
  /src/test/com/cloudera/sqoop/TestSqoopOptions.java 1171925 

Diff: https://reviews.apache.org/r/1946/diff


Testing
-------

I'm still having trouble creating meaningful tests for this patch. I've come 
up with two different approaches, but I wasn't able to get either of them 
running:

1) Use the boundary query to limit the imported data (like "select 1, 2"). This 
is a totally wrong usage of this parameter, but I thought it might be fine for 
testing purposes. Unfortunately, the underlying code uses this query only when 
it creates more than one map task, and I was not able to force it to create 
more than one. That makes sense, because the -m parameter is also only a hint.

2) Parse logs. Fortunately, the class responsible for creating splits prints 
the boundary query it used, so it is possible to parse those logs and look for 
that query. But I'm not sure how this can be done in a proper fashion.
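One possible shape for approach 2, as a rough sketch only: attach a handler to 
the logger, run the import, and search the captured messages for the boundary 
query. Everything here is an assumption made for illustration: the class name 
BoundaryQueryLogCapture, the "BoundingValsQuery: " prefix, and the use of 
java.util.logging as a stand-in for whatever logging backend the split code 
actually writes to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical test helper: records log messages and extracts the boundary query. */
class BoundaryQueryLogCapture extends Handler {
  // Assumed prefix of the log line printed by the split-creating class.
  static final String PREFIX = "BoundingValsQuery: ";

  private final List<String> messages = new ArrayList<>();

  @Override
  public synchronized void publish(LogRecord record) {
    if (record != null && record.getMessage() != null) {
      messages.add(record.getMessage());
    }
  }

  @Override
  public void flush() { }

  @Override
  public void close() { }

  /** Returns the text after the assumed prefix from the first matching message. */
  synchronized Optional<String> findBoundaryQuery() {
    for (String msg : messages) {
      int idx = msg.indexOf(PREFIX);
      if (idx >= 0) {
        return Optional.of(msg.substring(idx + PREFIX.length()).trim());
      }
    }
    return Optional.empty();
  }

  /** Attaches a capture handler to the named logger, runs the action, then detaches. */
  static Optional<String> capture(String loggerName, Runnable action) {
    Logger logger = Logger.getLogger(loggerName);
    BoundaryQueryLogCapture handler = new BoundaryQueryLogCapture();
    Level oldLevel = logger.getLevel();
    logger.setLevel(Level.ALL);
    logger.addHandler(handler);
    try {
      action.run();
    } finally {
      // Restore the logger so other tests are not affected.
      logger.removeHandler(handler);
      logger.setLevel(oldLevel);
    }
    return handler.findBoundaryQuery();
  }
}
```

A test could then wrap the import run in capture(...) and compare the returned 
string with the query passed on the command line. This still depends on the 
exact log format, so it would break if the message text changes.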

Any ideas are welcome.

Jarcec


Thanks,

Jarek



> Support boundary query on the command line
> ------------------------------------------
>
>                 Key: SQOOP-331
>                 URL: https://issues.apache.org/jira/browse/SQOOP-331
>             Project: Sqoop
>          Issue Type: New Feature
>          Components: tools
>    Affects Versions: 1.4.0
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>         Attachments: SQOOP-331.patch
>
>
> It would be nice if Sqoop could accept, on the command line, a query that 
> fetches the minimum and maximum values used for creating splits in 
> DataDrivenDBInputFormat.
> Normally Sqoop generates the query that gets the minimum and maximum values 
> for creating splits in the following form: SELECT min($split_by_column), 
> max($split_by_column) FROM $table WHERE $cmd_where. In my use case, I needed 
> to import only a portion of the data, with ranges based on the 
> split_by_column that I had already preselected and that are available in a 
> special table holding data ranges and the corresponding primary key values. 
> So my auto-generated query looked like this: SELECT min(id), max(id) FROM 
> table WHERE id >= min_id AND id <= max_id. That query is obviously useless 
> and just creates unnecessary load on the database server. It would be nice 
> to supply my own boundary query that uses the extra table with data ranges.
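
To make the request quoted above concrete, here is a minimal sketch of the 
choice between the generated query and a user-supplied one. It is not Sqoop's 
actual code: the class BoundaryQuerySketch and both method names are 
hypothetical, and only the shape of the generated query is taken from the 
issue description.

```java
/** Hypothetical illustration of choosing between a default and a custom boundary query. */
class BoundaryQuerySketch {

  /** Builds the generated form: SELECT min(col), max(col) FROM table [WHERE cond]. */
  static String defaultBoundaryQuery(String table, String splitBy, String where) {
    StringBuilder sb = new StringBuilder();
    sb.append("SELECT min(").append(splitBy).append("), max(")
      .append(splitBy).append(") FROM ").append(table);
    if (where != null && !where.trim().isEmpty()) {
      sb.append(" WHERE ").append(where);
    }
    return sb.toString();
  }

  /** Prefers the user-supplied query when one is given, else falls back to the default. */
  static String chooseBoundaryQuery(String userQuery, String table,
                                    String splitBy, String where) {
    if (userQuery != null && !userQuery.trim().isEmpty()) {
      return userQuery.trim();
    }
    return defaultBoundaryQuery(table, splitBy, where);
  }
}
```

With a custom query, the min/max scan over the main table can be replaced by a 
cheap lookup in the table that already holds the ranges, for example a query 
against a made-up table such as import_ranges that returns the stored minimum 
and maximum ids.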

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
