At this time sqoop2 does not provide a mechanism to configure a job's 
scheduler pool, nor a way to pass arbitrary configuration through to the 
MapReduce job.

I am not sure that configuring a scheduler pool is something we would want to 
prompt for specifically in the shell, but I can definitely see the use case for 
passing through job-specific MapReduce configuration.
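
For reference, if such a pass-through existed, the property it would 
ultimately need to set on the submitted job is the standard Hadoop queue 
property. Purely as an illustration (the pool name below is made up), it 
would amount to something like this mapred-site.xml-style snippet, which 
today can only be set server-wide:

  <property>
    <name>mapreduce.job.queuename</name>
    <!-- hypothetical pool name; currently this can only come from the
         server-wide mapred-site.xml that the sqoop2 server reads -->
    <value>my-pool</value>
  </property>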

Please feel free to open a JIRA for this feature request.

Thanks,
Abe


> On Feb 29, 2016, at 10:33 AM, Scott Kuehn <[email protected]> wrote:
> 
> Does sqoop2 provide a mechanism to configure jobs to run in ad-hoc scheduler 
> pools? By ad-hoc, I mean a scheduler pool that is not necessarily the same as 
> the pool configured in the sqoop2 server's mapred-site.xml.
> 
> The use case is to limit cluster-wide sqoop access to a particular FROM 
> resource. While the throttling extractor mechanics are useful for preventing 
> a single job from saturating the resource, this mechanism cannot limit 
> aggregate resource access across jobs. I'd like to allocate a YARN scheduler 
> pool that caps the vcores and RAM available for jobs accessing the 
> particularly sensitive database. A subset of sqoop2 jobs would be configured 
> to run in this pool, whereas other sqoop2 jobs would fall back to the default 
> pool configured for the sqoop2 server.
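> 
> To make that concrete (assuming the YARN Fair Scheduler, with a made-up 
> pool name and made-up limits), I have an allocation along these lines in 
> mind in fair-scheduler.xml:
> 
>   <allocations>
>     <queue name="sensitive-db-pool">
>       <!-- hypothetical caps on aggregate resources for the sqoop2 jobs
>            that read from the sensitive database -->
>       <maxResources>32768 mb,16 vcores</maxResources>
>     </queue>
>   </allocations>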
> 
> A glance at the code and some recent configuration work 
> <https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+Config+as+Top+Level+Entity>
>  suggests this functionality isn't available today. I'm interested to hear if 
> this is the case, and whether or not any reasonable workarounds exist. I'm 
> using Apache Sqoop 1.99.6-RC2.
> 
