Hi Stamatis,

The proposal seems reasonable to me. I think that setting the two properties you mention should lead to the same result regardless of the execution engine in use.
In addition, I also agree that we should deprecate the per-execution engine properties.

Best regards,
Alessandro

On Mon, 31 Jan 2022 at 10:51, Stamatis Zampetakis <zabe...@gmail.com> wrote:

> Hi all,
>
> This email is an attempt to converge on which Hive/Tez/MR properties
> someone should use in order to schedule a compaction on specific queues.
> For those who are not familiar with how queues are used the YARN capacity
> scheduler documentation [1] gives the general idea.
>
> Using specific queues for compaction jobs is necessary to be able to
> efficiently allocate resources for maintenance tasks (compaction) and
> production workloads. Hive provides various ways to control the queues used
> by the compactor and there have been various tickets with improvements and
> fixes in this area (see list below).
>
> The granularity we can select queues for compactions (all tables vs. per
> table) currently depends on which compactor is in use (MR vs Query based)
> and boils down to the following properties:
>
> Global configuration:
> * hive.compactor.job.queue
> * mapred.job.queue.name
> * tez.queue.name
>
> Per table/statement configuration (table properties):
> * compactor.mapred.job.queue.name (before HIVE-20723)
> * compactor.hive.compactor.job.queue (after HIVE-20723)
>
> Things are a bit blurred with respect to what properties someone should
> use to achieve the desired result. Some changes, such as HIVE-20723, raise
> backward compatibility concerns and other changes seem to have a larger
> impact than the one specifically designed for. For example, after
> HIVE-25595, map reduce queue properties can have an impact on the compactor
> queues even when Tez is in use.
>
> In order to avoid confusion and ensure long term support of these queue
> selection features we should clarify which of the above properties should
> be used.
>
> Given the current situation, I would propose to officially support only
> the following:
> * hive.compactor.job.queue
> * compactor.hive.compactor.job.queue
> and align the implementation based on these (if necessary). In other
> words, Hive users should not use mapred.job.queue.name and tez.queue.name
> explicitly at least when it comes to the compactor. Hive should set them
> transparently (as it happens now in various places) based on
> [compactor.]hive.compactor.job.queue.
>
> What do people think? Are there other ideas?
>
> Best,
> Stamatis
>
> [1]
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>
> HIVE-11997: Add ability to send Compaction Jobs to specific queue
> HIVE-13354: Add ability to specify Compaction options per table and per
> request
> HIVE-20723: Allow per table specification of compaction yarn queue
> HIVE-24781: Allow to use custom queue for query based compaction
> HIVE-25801: Custom queue settings is not honoured by Query based
> compaction StatsUpdater
> HIVE-25595: Custom queue settings is not honoured by compaction
> StatsUpdater
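
To make the proposed behaviour concrete, here is a minimal sketch of how the queue could be resolved from the two supported properties and then applied to the engine-specific ones. It is not Hive's actual implementation: the function names are hypothetical, configurations are modelled as plain dictionaries, and the rule that the table property takes precedence over the global one is an assumption that the proposal does not state explicitly.

```python
# Illustrative sketch only, not Hive code. Property names come from the
# thread; function names and the precedence rule are assumptions.

GLOBAL_KEY = "hive.compactor.job.queue"
TABLE_KEY = "compactor.hive.compactor.job.queue"
ENGINE_KEYS = ("mapred.job.queue.name", "tez.queue.name")


def resolve_compaction_queue(global_conf, table_props):
    """Return the queue for a compaction, or None if none is configured.

    Assumption: a non-empty table property overrides the global setting.
    """
    queue = table_props.get(TABLE_KEY) or global_conf.get(GLOBAL_KEY)
    return queue or None


def build_job_conf(global_conf, table_props):
    """Copy the global configuration and set the engine queue properties.

    Both engine-specific keys receive the same resolved value, so the
    outcome does not depend on which execution engine runs the job.
    """
    job_conf = dict(global_conf)
    queue = resolve_compaction_queue(global_conf, table_props)
    if queue:
        for key in ENGINE_KEYS:
            job_conf[key] = queue
    return job_conf
```

In this sketch, any user-supplied value for mapred.job.queue.name or tez.queue.name is overwritten whenever a compactor queue is configured, which mirrors the suggestion that users should not set those properties explicitly for the compactor.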