Hi,

I have a bunch of Sqoop jobs that import data from the same RDBMS. The RDBMS has a maximum number of allowed connections, and we want to make sure that we never exceed that number at any given time.

So far we have solved this with the YARN Capacity Scheduler (2.6.0), configuring a queue so that the maximum number of containers granted to it is lower than the connection limit. This is hard to maintain, however, since all the queue settings are relative to the current total capacity of the cluster. Basically:

Adding more nodes --> more containers granted to the queue --> Sqoop jobs exceed the allowed number of concurrent connections

Is there a simpler way to group and throttle a set of Sqoop jobs together so that they don't exhaust the RDBMS we import from?

FYI: We schedule the Sqoop jobs with Oozie.

Best regards
/Pelle

--
*Per Ullberg*
Tech Lead Odin - Uppsala
Klarna AB
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com
