Jack,

Just a thought... but have you tried using --target-dir?
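
Something along these lines, one job per table, so each parallel import writes to its own scratch directory instead of everything landing under /user/username. The connect string, credentials, and table names here are just placeholders for whatever your setup uses:

    sqoop import \
      --connect jdbc:mysql://dbhost/sourcedb \
      --username etl_user -P \
      --table CUSTOMERS \
      --target-dir /user/username/sqoop_staging/CUSTOMERS \
      --hive-import \
      --hive-table customers \
      -m 1

With a unique --target-dir per table, your eight concurrent jobs shouldn't step on each other's staging files before the Hive load step runs.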

-Abe

On Mon, Mar 2, 2015 at 12:24 PM, Jack Arenas <[email protected]> wrote:

> Hi team,
>
> I'm building an ETL tool that requires me to pull a bunch of tables from a
> db into HDFS, and I'm currently doing this sequentially using Sqoop. I
> figured it might be faster to submit the Sqoop jobs in parallel, that is,
> with a predefined thread pool (currently trying 8), because it took about
> two hours to ingest 150 tables of various sizes, frankly not very big
> tables, as this is a POC. Sequentially this works fine, but as soon as I
> add parallelism, roughly 75% of my Sqoop jobs fail. I'm not saying they
> don't ingest any data, simply that the data gets stuck in the staging area
> (i.e. /user/username) instead of landing in the proper Hive table location
> (i.e. /user/username/Hive/Lab). Has anyone experienced this before? I
> figure I may be able to kick off a separate process that moves the data
> from the staging area into the Hive table area, but I'm not sure whether
> that process would simply need to move the files or if there is more involved.
>
> Thanks!
>
> Specs: HDP 2.1, Sqoop 1.4.4.2
>
> Cheers,
> Jack
>
>
