> On 2011-09-23 02:04:19, Arvind Prabhakar wrote: > > Thanks for taking this up Jarek. This is a very important functionality > > extension for Sqoop. Looking at the code change, I feel that you have tried > > to implement the super-set functionality such that the user can: > > > > * Either map SQL types directly to Java/Hive types, or > > * Map specific columns to Java/Hive types. > > > > Between these two, I feel that the later is more relevant use-case for > > Sqoop consumers. Mapping SQL types to Java/Hive types can be done by > > extending the Manager and that in itself is not as flexible for the user as > > the other option of mapping specific columns to a data type. Even when > > considering the option to map columns to specific data types, the user may > > not necessarily know what column names Sqoop will use. If these column > > names do not match, the default mapping will be used silently and that > > could lead to other problems. > > > > Therefore I suggest the following: > > * Introduce a new option that tells Sqoop to generate a mapping file for > > the job. This file could be a java properties file that contains the names > > of columns as read by Sqoop and their default mappings and does not run the > > actual job. > > * Introduce another new option that tells Sqoop to use a given mapping file > > for the job. > > > > So the typical workflow would be - if you want to run an import you would > > do the following: > > * run: sqoop import --connect .... --genrate-mapping-only > > /path/to/mapping-file > > * manually modify the mapping file to override the default types where > > necessary > > * run: sqoop import --connect .... --use-mapping-file /path/to/mapping-file > > > > What do you think about this approach? > > Jarek Jarcec wrote: > Hi Arvind, > thank you for your review. I just noticed that I did not include basic > help for my changes, so obviously you have to dig it into source code to find > out what and how is working. I'm sorry for that. I'll not forgot to include > basic help next time. > > I did not realized that SQL types can be mapped by extending Manager > class. My original goal here was to offer user chance to change any type > mapping out of the box, without any source code changes. SQL type mapping can > be simulated by the column mapping (on precondition that user do know names > of of all "problematic" columns), so I don't have problems to implement only > the second part as you suggested. > > I think that in most cases the column names are known or user can force > them to be known (for example by using "as": SELECT bla-lba-bla AS > known_value), so I would say that user can pass the column names without > reading the mapping in a normal situation. I don't see a problem to add a > option that would generate mapping file that user can read to find out the > names used by sqoop, but I would prefer to pass new mapping on command line > rather than in separate file. Reason for that is than most of the times, I'm > executing sqoop using oozie and in this case it's not easy to guarantee that > given property file is located on all nodes. > > What do you think? > > Jarcec
Thanks Jarek. I agree with your assessment that you could use projections to tie down the column names, although it will be come more verbose for users who simply want to import a table. I understand your preference of specifying it on the command line for Oozie integration - that makes sense. Overall - I think your approach to this is good and we can implement it that way. The one suggestion I have is that if a mapping cannot be applied, it should raise an exception that fails the job, as opposed to defaulting to the built-in mapping. For example if a user specifies a column name that does not exist - that should be an error. - Arvind ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1975/#review2032 ----------------------------------------------------------- On 2011-09-20 08:28:11, Jarek Jarcec wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/1975/ > ----------------------------------------------------------- > > (Updated 2011-09-20 08:28:11) > > > Review request for Sqoop and Arvind Prabhakar. > > > Summary > ------- > > This is not fully featured patch yet, it's more only preview of what I have > in my mind when I created the bug and how would I image to solve it. I would > like to check with community whether this is acceptable solution and if so, > I'll finish the patch. > > Things that are missing and I'll add them if this way will be accepted: > * Tests > * Documentation > * Supporting for type names (so that user don't have to type the integer > constants on command line) > > Any feedback will be greatly appreciated. > > > This addresses bug sqoop-342. > https://issues.apache.org/jira/browse/sqoop-342 > > > Diffs > ----- > > /src/java/com/cloudera/sqoop/SqoopOptions.java 1172203 > /src/java/com/cloudera/sqoop/hive/HiveTypes.java 1172203 > /src/java/com/cloudera/sqoop/hive/TableDefWriter.java 1172203 > /src/java/com/cloudera/sqoop/manager/ConnManager.java 1172203 > /src/java/com/cloudera/sqoop/manager/SqlManager.java 1172203 > /src/java/com/cloudera/sqoop/mapreduce/JdbcExportJob.java 1172203 > /src/java/com/cloudera/sqoop/orm/ClassWriter.java 1172203 > /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1172203 > > Diff: https://reviews.apache.org/r/1975/diff > > > Testing > ------- > > > Thanks, > > Jarek > >
