[
https://issues.apache.org/jira/browse/SQOOP-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116636#comment-13116636
]
[email protected] commented on SQOOP-342:
-----------------------------------------------------
bq. On 2011-09-23 02:04:19, Arvind Prabhakar wrote:
bq. > Thanks for taking this up Jarek. This is a very important functionality
extension for Sqoop. Looking at the code change, I feel that you have tried to
implement the super-set functionality such that the user can:
bq. >
bq. > * Either map SQL types directly to Java/Hive types, or
bq. > * Map specific columns to Java/Hive types.
bq. >
bq. > Between these two, I feel that the later is more relevant use-case for
Sqoop consumers. Mapping SQL types to Java/Hive types can be done by extending
the Manager and that in itself is not as flexible for the user as the other
option of mapping specific columns to a data type. Even when considering the
option to map columns to specific data types, the user may not necessarily know
what column names Sqoop will use. If these column names do not match, the
default mapping will be used silently and that could lead to other problems.
bq. >
bq. > Therefore I suggest the following:
bq. > * Introduce a new option that tells Sqoop to generate a mapping file for
the job. This file could be a java properties file that contains the names of
columns as read by Sqoop and their default mappings and does not run the actual
job.
bq. > * Introduce another new option that tells Sqoop to use a given mapping
file for the job.
bq. >
bq. > So the typical workflow would be - if you want to run an import you
would do the following:
bq. > * run: sqoop import --connect .... --genrate-mapping-only
/path/to/mapping-file
bq. > * manually modify the mapping file to override the default types where
necessary
bq. > * run: sqoop import --connect .... --use-mapping-file
/path/to/mapping-file
bq. >
bq. > What do you think about this approach?
Hi Arvind,
thank you for your review. I just noticed that I did not include basic help for
my changes, so obviously you have to dig it into source code to find out what
and how is working. I'm sorry for that. I'll not forgot to include basic help
next time.
I did not realized that SQL types can be mapped by extending Manager class. My
original goal here was to offer user chance to change any type mapping out of
the box, without any source code changes. SQL type mapping can be simulated by
the column mapping (on precondition that user do know names of of all
"problematic" columns), so I don't have problems to implement only the second
part as you suggested.
I think that in most cases the column names are known or user can force them to
be known (for example by using "as": SELECT bla-lba-bla AS known_value), so I
would say that user can pass the column names without reading the mapping in a
normal situation. I don't see a problem to add a option that would generate
mapping file that user can read to find out the names used by sqoop, but I
would prefer to pass new mapping on command line rather than in separate file.
Reason for that is than most of the times, I'm executing sqoop using oozie and
in this case it's not easy to guarantee that given property file is located on
all nodes.
What do you think?
Jarcec
- Jarek
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1975/#review2032
-----------------------------------------------------------
On 2011-09-20 08:28:11, Jarek Jarcec wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1975/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-20 08:28:11)
bq.
bq.
bq. Review request for Sqoop and Arvind Prabhakar.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. This is not fully featured patch yet, it's more only preview of what I
have in my mind when I created the bug and how would I image to solve it. I
would like to check with community whether this is acceptable solution and if
so, I'll finish the patch.
bq.
bq. Things that are missing and I'll add them if this way will be accepted:
bq. * Tests
bq. * Documentation
bq. * Supporting for type names (so that user don't have to type the integer
constants on command line)
bq.
bq. Any feedback will be greatly appreciated.
bq.
bq.
bq. This addresses bug sqoop-342.
bq. https://issues.apache.org/jira/browse/sqoop-342
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. /src/java/com/cloudera/sqoop/SqoopOptions.java 1172203
bq. /src/java/com/cloudera/sqoop/hive/HiveTypes.java 1172203
bq. /src/java/com/cloudera/sqoop/hive/TableDefWriter.java 1172203
bq. /src/java/com/cloudera/sqoop/manager/ConnManager.java 1172203
bq. /src/java/com/cloudera/sqoop/manager/SqlManager.java 1172203
bq. /src/java/com/cloudera/sqoop/mapreduce/JdbcExportJob.java 1172203
bq. /src/java/com/cloudera/sqoop/orm/ClassWriter.java 1172203
bq. /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1172203
bq.
bq. Diff: https://reviews.apache.org/r/1975/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Jarek
bq.
bq.
> Allow user to override sqoop type mapping
> -----------------------------------------
>
> Key: SQOOP-342
> URL: https://issues.apache.org/jira/browse/SQOOP-342
> Project: Sqoop
> Issue Type: New Feature
> Components: codegen, hive-integration
> Reporter: Jarek Jarcec Cecho
> Assignee: Jarek Jarcec Cecho
> Priority: Minor
>
> It would be nice if user would be able to override any type implicit sqoop
> mapping (from SQL to Java or Hive) on command line.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira