Jack,

Just a thought... but have you tried using --target-dir?
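With --hive-import, Sqoop first lands the data in an HDFS staging directory and only then loads it into the Hive warehouse, so giving each job its own --target-dir should keep parallel imports from stepping on each other under /user/username. Here's a minimal sketch of a single import; the connect string, credentials, table name, and paths are hypothetical placeholders:

  # hypothetical source db, user, table, and staging path; substitute your own
  sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl_user -P \
      --table customers \
      --hive-import \
      --hive-table lab.customers \
      --target-dir /user/etl_user/staging/customers \
      -m 4

Note that Sqoop refuses to run if the target directory already exists, so clean it up between runs (or use --delete-target-dir if your version supports it). A sketch of the parallel submission itself follows below your quoted message.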
-Abe

On Mon, Mar 2, 2015 at 12:24 PM, Jack Arenas <[email protected]> wrote:
> Hi team,
>
> I'm building an ETL tool that requires me to pull a bunch of tables from a
> database into HDFS, and I'm currently doing this sequentially using Sqoop.
> I figured it might be faster to submit the Sqoop jobs in parallel, that is,
> with a predefined thread pool (currently trying 8), because it took about
> two hours to ingest 150 tables of various sizes (frankly, not very big
> tables, as this is a POC). Sequentially this works fine, but as soon as I
> add parallelism, roughly 75% of my Sqoop jobs fail. It's not that they
> don't ingest any data; rather, the data gets stuck in the staging area
> (i.e. /user/username) instead of landing in the proper Hive table location
> (i.e. /user/username/Hive/Lab). Has anyone experienced this before? I
> figure I may be able to spin up a separate process that moves the data
> from the staging area into the Hive table area, but I'm not sure whether
> that process would simply need to move the files or whether there is more
> involved.
>
> Thanks!
>
> Specs: HDP 2.1, Sqoop 1.4.4.2
>
> Cheers,
> Jack
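As for submitting the jobs in parallel, one way to cap concurrency from the shell rather than a thread pool is xargs -P, with a distinct staging (and codegen) directory per table. This is an untested sketch; tables.txt (one table name per line), the connect string, and all paths are hypothetical:

  # run up to 8 imports at once, one staging dir per table
  # --password-file needs Sqoop 1.4.4+; otherwise use an options file
  # separate --outdir values are a guess at avoiding codegen collisions
  cat tables.txt | xargs -P 8 -I {} sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl_user \
      --password-file /user/etl_user/.sqoop.pw \
      --table {} \
      --hive-import \
      --hive-table lab.{} \
      --target-dir /user/etl_user/staging/{} \
      --outdir /tmp/sqoop-codegen/{} \
      -m 4

If jobs still fail with data stranded in staging even with distinct directories, the Hive load step itself is probably what's failing, and the log of one failed job should show the exact error. Moving the files into the warehouse path by hand can work, but since --hive-import also creates the table for you, a manual move would leave you to handle the Hive DDL yourself as well.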
