Max number of connections for concurrent sqoop jobs

Per Ullberg Tue, 03 May 2016 00:27:08 -0700

Hi,

I have a bunch of sqoop jobs that import data from the same RDBMS. The RDBMS
has a maximum number of allowed connections, and we want to make sure that
we never exceed that number at any given time.


So far we have solved this by using the YARN capacity scheduler (2.6.0)
and configuring a queue in such a way that the max containers granted is lower
than the connection limit. This, however, is hard to maintain, since all the
queue arguments are relative to the current global capacity of the cluster.

Basically:
Adding more nodes --> more containers granted to the queue --> sqoop jobs
exceed the allowed number of concurrent connections
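
A minimal sketch of this effect, using made-up numbers and a simplified model:
it assumes uniformly sized containers, one RDBMS connection per container, and
a queue cap expressed as a percentage of the whole cluster. The connection
limit, the percentage, and the cluster sizes are all hypothetical, and the
real scheduler works with memory/vcore resources rather than container counts.

```python
import math

# Hypothetical values, for illustration only.
CONNECTION_LIMIT = 40          # max connections the RDBMS allows
QUEUE_MAX_CAPACITY_PCT = 10.0  # queue cap as a percentage of the cluster


def max_queue_containers(total_cluster_containers: int, max_capacity_pct: float) -> int:
    """Upper bound on containers the queue may hold (simplified model)."""
    return math.floor(total_cluster_containers * max_capacity_pct / 100)


# The same percentage yields more containers as the cluster grows.
for cluster_containers in (200, 300, 500):
    granted = max_queue_containers(cluster_containers, QUEUE_MAX_CAPACITY_PCT)
    status = "ok" if granted < CONNECTION_LIMIT else "exceeds connection limit"
    print(f"{cluster_containers} cluster containers -> {granted} in queue ({status})")
```

With these numbers the queue stays below the limit at 200 and 300 cluster
containers, but not at 500, even though the queue configuration is unchanged.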

Is there a simpler way to group and throttle a bunch of sqoop jobs together
so they don't exhaust the RDBMS we import from?

FYI: We schedule the sqoop jobs with oozie

best regards
/Pelle

-- 

*Per Ullberg*
Tech Lead
Odin - Uppsala

Klarna AB
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com
