I seem to be getting proper output with the above parameters, BTW. I'll try to re-integrate the rest of my more complex ETL query into that Sqoop job...
Thanks :) !

--
Felix

On Thu, Mar 21, 2013 at 12:47 AM, Felix GV <[email protected]> wrote:

> Thanks for your response Jarek :)
>
> I've started a new import run with --hive-drop-import-delims added and
> --direct removed (since the two are mutually exclusive), we'll see how it
> goes.
>
> Going to sleep now. I'll report back tomorrow :)
>
> --
> Felix
>
>
> On Thu, Mar 21, 2013 at 12:42 AM, Jarek Jarcec Cecho <[email protected]> wrote:
>
>> Hi Felix,
>> we've seen similar behaviour in the past when the data itself contains
>> Hive special characters like new line characters. Would you mind trying
>> your import with --hive-drop-import-delims to see if it helps?
>>
>> Jarcec
>>
>> On Wed, Mar 20, 2013 at 11:27:58PM -0400, Felix GV wrote:
>> > Hello,
>> >
>> > I'm trying to import a full table from MySQL to Hadoop/Hive. It works
>> > with certain parameters, but when I try to do an ETL that's somewhat
>> > more complex, I start getting bogus rows in my resulting table.
>> >
>> > This works:
>> >
>> > sqoop import \
>> >   --connect 'jdbc:mysql://backup.general.db/general?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull' \
>> >   --username xxxxx \
>> >   --password xxxxx \
>> >   --hive-import \
>> >   --hive-overwrite \
>> >   -m 23 \
>> >   --direct \
>> >   --hive-table profile_felix_test17 \
>> >   --split-by id \
>> >   --table Profile
>> >
>> > But if I use a --query instead of a --table, then I start getting bogus
>> > records (and by that, I mean rows that have a nonsensically high primary
>> > key that doesn't exist in my source database and null for the rest of
>> > the cells).
>> >
>> > The output I get with the above query is not exactly the way I want it.
>> > Using --query, I can get the data in the format I want (by transforming
>> > some stuff inside MySQL), but then I also get the bogus rows, which
>> > pretty much makes the Hive table unusable.
>> >
>> > I tried various combinations of parameters and it's hard to pinpoint
>> > exactly what causes the problem, so it could be more intricate than my
>> > above simplistic description. That being said, removing --table and
>> > adding the following params definitely breaks it:
>> >
>> > --target-dir /tests/sqoop/general/profile_felix_test \
>> > --query "select * from Profile WHERE \$CONDITIONS"
>> >
>> > (Ultimately, I want to use a query that's more complex than this, but
>> > even a simple query like this breaks...)
>> >
>> > Any ideas why this would happen and how to solve it?
>> >
>> > Is this the kind of problem that Sqoop2's cleaner architecture intends
>> > to solve?
>> >
>> > I use CDH 4.2, BTW.
>> >
>> > Thanks :) !
>> >
>> > --
>> > Felix
>> >
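For anyone hitting the same symptom: the failure mode Jarcec describes can be sketched in a few lines. This is not Sqoop code, just an illustrative Python model of a text-file import. Hive's default TEXTFILE layout uses `\n` as the row terminator and `\x01` (^A) as the field separator, so a column value that contains a newline gets split into an extra "row" whose first cell is garbage and whose remaining cells come back NULL.

```python
# Illustrative sketch (not Sqoop itself): why an embedded newline in a
# source column produces bogus rows in a Hive text table.

FIELD_SEP = "\x01"  # Hive's default field separator for TEXTFILE tables

def to_hive_text(rows):
    """Serialize rows the way a plain text-file import would."""
    return "\n".join(FIELD_SEP.join(str(c) for c in row) for row in rows)

def parse_hive_text(blob):
    """Parse the way Hive reads a TEXTFILE table: one record per line."""
    return [line.split(FIELD_SEP) for line in blob.split("\n")]

clean = [(1, "alice"), (2, "bob")]
dirty = [(1, "alice"), (2, "bio line one\nline two")]  # newline inside a cell

print(len(parse_hive_text(to_hive_text(clean))))   # 2 rows, as expected
print(len(parse_hive_text(to_hive_text(dirty))))   # 3 rows: one is bogus

# Stripping delimiter characters from the data (conceptually what
# --hive-drop-import-delims does) restores one row per record:
scrubbed = [
    (k, str(v).replace("\n", "").replace("\r", "").replace("\x01", ""))
    for k, v in dirty
]
print(len(parse_hive_text(to_hive_text(scrubbed))))  # 2 rows again
```

That also explains why the bogus primary keys look nonsensically high: the trailing half of a split record is being interpreted as the key column.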

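On the `$CONDITIONS` placeholder itself: the rough idea is that each mapper runs its own copy of the free-form query with `$CONDITIONS` replaced by a range predicate over the `--split-by` column. A simplified Python sketch of that splitting (the `split_conditions` helper is hypothetical, not Sqoop's API):

```python
# Simplified sketch of how a free-form --query import is parallelized:
# $CONDITIONS becomes a per-mapper range predicate on the split column.

def split_conditions(col, lo, hi, num_mappers):
    """Generate one WHERE clause per mapper, covering [lo, hi]."""
    size = (hi - lo + 1) // num_mappers
    clauses = []
    for i in range(num_mappers):
        start = lo + i * size
        # Last mapper absorbs any remainder so the full range is covered.
        end = hi if i == num_mappers - 1 else start + size - 1
        clauses.append(f"{col} >= {start} AND {col} <= {end}")
    return clauses

query = "select * from Profile WHERE $CONDITIONS"
for cond in split_conditions("id", 1, 100, 4):
    print(query.replace("$CONDITIONS", cond))
# Prints four queries covering id ranges 1-25, 26-50, 51-75, 76-100.
```

This is why `--query` imports require both a literal `$CONDITIONS` in the WHERE clause and a `--split-by` column when running with more than one mapper.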