> On 2011-09-23 02:04:19, Arvind Prabhakar wrote:
> > Thanks for taking this up Jarek. This is a very important functionality 
> > extension for Sqoop. Looking at the code change, I feel that you have tried 
> > to implement the super-set functionality such that the user can:
> > 
> > * Either map SQL types directly to Java/Hive types, or
> > * Map specific columns to Java/Hive types.
> > 
> > Between these two, I feel that the later is more relevant use-case for 
> > Sqoop consumers. Mapping SQL types to Java/Hive types can be done by 
> > extending the Manager and that in itself is not as flexible for the user as 
> > the other option of mapping specific columns to a data type. Even when 
> > considering the option to map columns to specific data types, the user may 
> > not necessarily know what column names Sqoop will use. If these column 
> > names do not match, the default mapping will be used silently and that 
> > could lead to other problems.
> > 
> > Therefore I suggest the following:
> > * Introduce a new option that tells Sqoop to generate a mapping file for 
> > the job. This file could be a java properties file that contains the names 
> > of columns as read by Sqoop and their default mappings and does not run the 
> > actual job.
> > * Introduce another new option that tells Sqoop to use a given mapping file 
> > for the job.
> > 
> > So the typical workflow would be - if you want to run an import you would 
> > do the following:
> > * run: sqoop import --connect .... --genrate-mapping-only 
> > /path/to/mapping-file
> > * manually modify the mapping file to override the default types where 
> > necessary
> > * run: sqoop import --connect .... --use-mapping-file /path/to/mapping-file
> > 
> > What do you think about this approach?
> 
> Jarek Jarcec wrote:
>     Hi Arvind,
>     thank you for your review. I just noticed that I did not include basic 
> help for my changes, so obviously you have to dig it into source code to find 
> out what and how is working. I'm sorry for that. I'll not forgot to include 
> basic help next time.
>     
>     I did not realized that SQL types can be mapped by extending Manager 
> class. My original goal here was to offer user chance to change any type 
> mapping out of the box, without any source code changes. SQL type mapping can 
> be simulated by the column mapping (on precondition that user do know names 
> of of all "problematic" columns), so I don't have problems to implement only 
> the second part as you suggested.
>     
>     I think that in most cases the column names are known or user can force 
> them to be known (for example by using "as": SELECT bla-lba-bla AS 
> known_value), so I would say that user can pass the column names without 
> reading the mapping in a normal situation. I don't see a problem to add a 
> option that would generate mapping file that user can read to find out the 
> names used by sqoop, but I would prefer to pass new mapping on command line 
> rather than in separate file. Reason for that is than most of the times, I'm 
> executing sqoop using oozie and in this case it's not easy to guarantee that 
> given property file is located on all nodes.
>     
>     What do you think?
>     
>     Jarcec

Thanks Jarek. I agree with your assessment that you could use projections to 
tie down the column names, although it will be come more verbose for users who 
simply want to import a table. I understand your preference of specifying it on 
the command line for Oozie integration - that makes sense. 

Overall - I think your approach to this is good and we can implement it that 
way. The one suggestion I have is that if a mapping cannot be applied, it 
should raise an exception that fails the job, as opposed to defaulting to the 
built-in mapping. For example if a user specifies a column name that does not 
exist - that should be an error.


- Arvind


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1975/#review2032
-----------------------------------------------------------


On 2011-09-20 08:28:11, Jarek Jarcec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1975/
> -----------------------------------------------------------
> 
> (Updated 2011-09-20 08:28:11)
> 
> 
> Review request for Sqoop and Arvind Prabhakar.
> 
> 
> Summary
> -------
> 
> This is not fully featured patch yet, it's more only preview of what I have 
> in my mind when I created the bug and how would I image to solve it. I would 
> like to check with community whether this is acceptable solution and if so, 
> I'll finish the patch.
> 
> Things that are missing and I'll add them if this way will be accepted:
> * Tests
> * Documentation
> * Supporting for type names (so that user don't have to type the integer 
> constants on command line)
> 
> Any feedback will be greatly appreciated.
> 
> 
> This addresses bug sqoop-342.
>     https://issues.apache.org/jira/browse/sqoop-342
> 
> 
> Diffs
> -----
> 
>   /src/java/com/cloudera/sqoop/SqoopOptions.java 1172203 
>   /src/java/com/cloudera/sqoop/hive/HiveTypes.java 1172203 
>   /src/java/com/cloudera/sqoop/hive/TableDefWriter.java 1172203 
>   /src/java/com/cloudera/sqoop/manager/ConnManager.java 1172203 
>   /src/java/com/cloudera/sqoop/manager/SqlManager.java 1172203 
>   /src/java/com/cloudera/sqoop/mapreduce/JdbcExportJob.java 1172203 
>   /src/java/com/cloudera/sqoop/orm/ClassWriter.java 1172203 
>   /src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java 1172203 
> 
> Diff: https://reviews.apache.org/r/1975/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jarek
> 
>

Reply via email to