Sqoop2 scheduler pool support

Scott Kuehn Mon, 29 Feb 2016 10:34:19 -0800

Does sqoop2 provide a mechanism to configure jobs to run in ad-hoc
scheduler pools? By ad-hoc, I mean a scheduler pool that is not necessarily
the same as the pool configured in the sqoop2 server's mapred-site.xml.


The use case is to limit cluster-wide Sqoop access to a particular FROM
resource. While the throttling extractor mechanics are useful for
preventing a single job from saturating the resource, they cannot limit
aggregate access across jobs. I'd like to allocate a YARN scheduler pool
that caps the vcores and memory available to jobs accessing a particularly
sensitive database. A subset of Sqoop2 jobs would be configured to run in
this pool, whereas other Sqoop2 jobs would fall back to the default pool
configured for the Sqoop2 server.
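
To make the request concrete, a pool of this kind could be sketched as
follows, assuming the YARN Fair Scheduler is in use. The queue name and the
limits are hypothetical placeholders, not values from an actual cluster.

```xml
<?xml version="1.0"?>
<!-- Fair Scheduler allocation file (sketch only). -->
<allocations>
  <!-- Hypothetical queue reserved for jobs that read the sensitive database. -->
  <queue name="sqoop_sensitive_db">
    <!-- Aggregate cap across all jobs in the queue; values are illustrative. -->
    <maxResources>8192 mb, 4 vcores</maxResources>
    <!-- Optional limit on concurrently running applications; also illustrative. -->
    <maxRunningApps>2</maxRunningApps>
  </queue>
</allocations>
```

A MapReduce job is normally directed to such a queue through the Hadoop
property mapreduce.job.queuename. Whether Sqoop2 allows that property to be
set per job is the open question.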

A glance at the code and at some recent configuration work
<https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+Config+as+Top+Level+Entity>
suggests this functionality isn't available today. I'd be interested to hear
whether that is the case, and whether any reasonable workarounds exist. I'm
using Apache Sqoop 1.99.6-RC2.
