Hi Felix,

We've seen similar behaviour in the past when the data itself contains Hive special characters such as newline characters. Would you mind trying your import with --hive-drop-import-delims to see if it helps?
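
A minimal sketch of what that could look like, built from the parameters quoted further down in this thread. The connection string, credentials, table name and target directory are the placeholders from the original message. Leaving out --direct is an assumption on my part, since direct mode may not honour --hive-drop-import-delims. The snippet only prints the command so it can be reviewed before running.

```shell
# Dry-run sketch: builds the argument list and prints it instead of running sqoop.
# All values are the placeholders from the original message in this thread.
# --direct is deliberately omitted (assumption: it may conflict with
# --hive-drop-import-delims).
set -- import \
  --connect 'jdbc:mysql://backup.general.db/general?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull' \
  --username xxxxx \
  --password xxxxx \
  --hive-import \
  --hive-overwrite \
  --hive-drop-import-delims \
  -m 23 \
  --hive-table profile_felix_test17 \
  --split-by id \
  --target-dir /tests/sqoop/general/profile_felix_test \
  --query 'select * from Profile WHERE $CONDITIONS'

# To run the import for real, replace the line below with: sqoop "$@"
echo sqoop "$@"
```

If dropping the characters loses information you need, --hive-delims-replacement substitutes a string of your choice instead of removing them.
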
Jarcec

On Wed, Mar 20, 2013 at 11:27:58PM -0400, Felix GV wrote:
> Hello,
>
> I'm trying to import a full table from MySQL to Hadoop/Hive. It works with
> certain parameters, but when I try to do an ETL that's somewhat more
> complex, I start getting bogus rows in my resulting table.
>
> This works:
>
> sqoop import \
> --connect 'jdbc:mysql://backup.general.db/general?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull' \
> --username xxxxx \
> --password xxxxx \
> --hive-import \
> --hive-overwrite \
> -m 23 \
> --direct \
> --hive-table profile_felix_test17 \
> --split-by id \
> --table Profile
>
> But if I use a --query instead of a --table, then I start getting bogus
> records (and by that, I mean rows that have a non-sensically high primary
> key that doesn't exist in my source database and null for the rest of the
> cells).
>
> The output I get with the above query is not exactly the way I want it.
> Using --query, I can get the data in the format I want (by transforming
> some stuff inside MySQL), but then I also get the bogus rows, which pretty
> much makes the Hive table unusable.
>
> I tried various combinations of parameters and it's hard to pin-point
> exactly what causes the problem, so it could be more intricate than my
> above simplistic description. That being said, removing --table and adding
> the following params definitely breaks it:
>
> --target-dir /tests/sqoop/general/profile_felix_test \
> --query "select * from Profile WHERE \$CONDITIONS"
>
> (Ultimately, I want to use a query that's more complex than this, but even
> a simple query like this breaks...)
>
> Any ideas why this would happen and how to solve it?
>
> Is this the kind of problem that Sqoop2's cleaner architecture intends to
> solve?
>
> I use CDH 4.2, BTW.
>
> Thanks :) !
>
> --
> Felix
