Hi Stamatis,

I agree that [compactor.]*hive.compactor.job.queue* is a better solution, as Hive now also supports query-based compaction, not only MR. However, I think this needs to be backward compatible!
What do you think about logic similar to this:

--- a/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java	2022-02-07 10:31:28.000000000 +0100
+++ b/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java	2022-02-07 10:33:25.000000000 +0100
@@ -145,10 +145,19 @@
     overrideMRProps(job, t.getParameters()); // override MR properties from tblproperties if applicable
     if (ci.properties != null) {
       overrideTblProps(job, t.getParameters(), ci.properties);
     }
 
+    // make queue configuration backward compatible:
+    // at this point overrideMRProps and overrideTblProps have already consolidated
+    // the final table properties, so we only need to read them from TABLE_PROPS
+    String queueNameLegacy =
+        (new StringableMap(job.get(TABLE_PROPS))).toProperties().getProperty("compactor.mapred.job.queue.name");
+    if (queueNameLegacy != null && queueNameLegacy.length() > 0) {
+      HiveConf.setVar(job, ConfVars.COMPACTOR_JOB_QUEUE, queueNameLegacy);
+    }
+
     String queueName = HiveConf.getVar(job, ConfVars.COMPACTOR_JOB_QUEUE);
     if (queueName != null && queueName.length() > 0) {
       job.setQueueName(queueName);
     }
 

Of course, this could be guarded by a new config if needed, such as hive.compaction.queue.name.use.legacy or similar.

FYI: we might also want to check the legacy config not only for *"compactor.mapred.job.queue.name"* but also for *"compactor.mapreduce.job.queuename"*, as the former was already on the deprecated list, as pointed out by Peter Vary.

Please also note that the change introduced by HIVE-25595 is currently not compatible with the new config, as it was developed for the old compactor.mapred... property:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java#L31
This also needs to be handled, for both the new property name and backward compatibility.
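To make the backward-compatible lookup concrete, here is a minimal, self-contained sketch of resolving the compaction queue from the per-table properties and the global setting. The class and method names (CompactionQueueResolver, resolve) are hypothetical and not part of Hive; plain maps stand in for JobConf/HiveConf, and the precedence order (new per-table property, then the two legacy names with compactor.mapreduce.job.queuename first, then the global hive.compactor.job.queue) is an assumption, not something agreed in this thread. Keeping the lookup in one place like this is one way CompactorMR and the HIVE-25595 StatsUpdater path could honour the same names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper (not part of Hive): resolves the YARN queue for a compaction
 * from table properties and the global setting, honouring legacy per-table names.
 */
final class CompactionQueueResolver {

  // Per-table properties, checked in this ASSUMED order of precedence.
  static final List<String> TABLE_LEVEL_KEYS = List.of(
      "compactor.hive.compactor.job.queue",  // new per-table property (HIVE-20723)
      "compactor.mapreduce.job.queuename",   // legacy, current MR property name
      "compactor.mapred.job.queue.name");    // legacy, deprecated MR property name

  // Global property proposed as the single supported global setting.
  static final String GLOBAL_KEY = "hive.compactor.job.queue";

  private CompactionQueueResolver() {}

  /** Returns the first non-empty value found, or empty if no queue is configured. */
  public static Optional<String> resolve(Map<String, String> tableProps,
                                         Map<String, String> globalConf) {
    for (String key : TABLE_LEVEL_KEYS) {
      String value = tableProps.get(key);
      if (value != null && !value.isEmpty()) {
        return Optional.of(value);
      }
    }
    String global = globalConf.get(GLOBAL_KEY);
    return (global == null || global.isEmpty()) ? Optional.empty() : Optional.of(global);
  }

  public static void main(String[] args) {
    Map<String, String> global = Map.of(GLOBAL_KEY, "maintenance");

    // A table still using the old, pre-HIVE-20723 property.
    Map<String, String> legacyTable = Map.of("compactor.mapred.job.queue.name", "legacy_q");
    // A table carrying both the old and the new property.
    Map<String, String> mixedTable = new HashMap<>(legacyTable);
    mixedTable.put("compactor.hive.compactor.job.queue", "new_q");

    System.out.println(resolve(legacyTable, global).orElse("<default>")); // legacy_q
    System.out.println(resolve(mixedTable, global).orElse("<default>"));  // new_q
    System.out.println(resolve(Map.of(), global).orElse("<default>"));    // maintenance
    System.out.println(resolve(Map.of(), Map.of()).orElse("<default>"));  // <default>
  }
}
```

Note that this sketch checks the new per-table property first, whereas the patch above lets any non-empty compactor.mapred.job.queue.name overwrite hive.compactor.job.queue; which order is preferable is still an open question.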
R,
Janos

On 2022/01/31 09:50:49 Stamatis Zampetakis wrote:
> Hi all,
>
> This email is an attempt to converge on which Hive/Tez/MR properties
> someone should use in order to schedule a compaction on specific queues.
> For those who are not familiar with how queues are used, the YARN capacity
> scheduler documentation [1] gives the general idea.
>
> Using specific queues for compaction jobs is necessary to be able to
> efficiently allocate resources for maintenance tasks (compaction) and
> production workloads. Hive provides various ways to control the queues used
> by the compactor, and there have been various tickets with improvements and
> fixes in this area (see list below).
>
> The granularity at which we can select queues for compactions (all tables
> vs. per table) currently depends on which compactor is in use (MR vs.
> query-based) and boils down to the following properties:
>
> Global configuration:
> * hive.compactor.job.queue
> * mapred.job.queue.name
> * tez.queue.name
>
> Per table/statement configuration (table properties):
> * compactor.mapred.job.queue.name (before HIVE-20723)
> * compactor.hive.compactor.job.queue (after HIVE-20723)
>
> Things are a bit blurred with respect to which properties someone should use
> to achieve the desired result. Some changes, such as HIVE-20723, raise
> backward compatibility concerns, and other changes seem to have a larger
> impact than the one they were specifically designed for. For example, after
> HIVE-25595, MapReduce queue properties can have an impact on the compactor
> queues even when Tez is in use.
>
> In order to avoid confusion and ensure long-term support of these queue
> selection features, we should clarify which of the above properties should
> be used.
>
> Given the current situation, I would propose to officially support only the
> following:
> * hive.compactor.job.queue
> * compactor.hive.compactor.job.queue
> and align the implementation based on these (if necessary). In other words,
> Hive users should not use mapred.job.queue.name and tez.queue.name
> explicitly, at least when it comes to the compactor. Hive should set them
> transparently (as already happens in various places) based on
> [compactor.]hive.compactor.job.queue.
>
> What do people think? Are there other ideas?
>
> Best,
> Stamatis
>
> [1]
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>
> HIVE-11997: Add ability to send Compaction Jobs to specific queue
> HIVE-13354: Add ability to specify Compaction options per table and per request
> HIVE-20723: Allow per table specification of compaction yarn queue
> HIVE-24781: Allow to use custom queue for query based compaction
> HIVE-25801: Custom queue settings is not honoured by Query based compaction StatsUpdater
> HIVE-25595: Custom queue settings is not honoured by compaction StatsUpdater
>
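The proposal in the quoted message, where users set only [compactor.]hive.compactor.job.queue and Hive derives the engine-specific queue keys itself, might look roughly like the following sketch. The names (CompactorQueuePropagation, withEngineQueue, Engine) are hypothetical, and plain maps stand in for the Hadoop/Hive configuration objects. For MR it writes mapreduce.job.queuename, the current Hadoop name that the deprecated mapred.job.queue.name maps to; for Tez it writes tez.queue.name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: derive the engine-specific queue key from one Hive-level value. */
final class CompactorQueuePropagation {

  // Assumed: a compaction runs either as an MR job or as Tez queries.
  enum Engine { MR, TEZ }

  private CompactorQueuePropagation() {}

  /**
   * Returns a copy of conf with the resolved compactor queue written to the key
   * the given engine reads, so users never set mapred.job.queue.name or
   * tez.queue.name themselves.
   */
  public static Map<String, String> withEngineQueue(Map<String, String> conf,
                                                    Engine engine, String queue) {
    Map<String, String> result = new HashMap<>(conf);
    if (queue == null || queue.isEmpty()) {
      return result; // nothing configured: leave the engine's defaults untouched
    }
    switch (engine) {
      case MR:
        result.put("mapreduce.job.queuename", queue); // non-deprecated MR queue key
        break;
      case TEZ:
        result.put("tez.queue.name", queue);
        break;
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> base = Map.of("hive.compactor.job.queue", "maintenance");
    String queue = base.get("hive.compactor.job.queue");

    System.out.println(withEngineQueue(base, Engine.TEZ, queue).get("tez.queue.name"));         // maintenance
    System.out.println(withEngineQueue(base, Engine.MR, queue).get("mapreduce.job.queuename")); // maintenance
    System.out.println(withEngineQueue(base, Engine.TEZ, "").containsKey("tez.queue.name"));    // false
  }
}
```

Because each engine's queue key is derived from the single Hive-level value whenever a queue is configured, a Tez code path would not depend on MR queue properties, which is the kind of cross-engine side effect mentioned for HIVE-25595.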