Hi Stamatis,
the proposal seems reasonable to me.

I think that setting the two properties you mention should lead to the same
result regardless of the underlying execution engine in use.

I also agree that we should deprecate the per-execution-engine properties
(mapred.job.queue.name and tez.queue.name) for this purpose.
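
To make the expected behavior concrete, here is a minimal sketch in Python of how I read the proposal. It is not Hive's actual code: the function names are hypothetical, and the rule that the table property overrides the global one is my assumption rather than something stated in the proposal.

```python
# Hypothetical sketch of the proposed queue resolution, not Hive's real code.
GLOBAL_KEY = "hive.compactor.job.queue"
TABLE_KEY = "compactor.hive.compactor.job.queue"
# Engine-specific properties that Hive would set transparently.
ENGINE_KEYS = ("mapred.job.queue.name", "tez.queue.name")


def resolve_compaction_queue(hive_conf, table_props):
    """Return the queue for a compaction, or None if none is configured.

    Assumption: the per-table property takes precedence over the global one.
    """
    return table_props.get(TABLE_KEY) or hive_conf.get(GLOBAL_KEY) or None


def compaction_job_conf(hive_conf, table_props):
    """Build the compaction job configuration without mutating hive_conf."""
    job_conf = dict(hive_conf)
    queue = resolve_compaction_queue(hive_conf, table_props)
    if queue:
        # Same result whichever engine runs the job: both engine
        # properties are derived from the single resolved queue.
        for key in ENGINE_KEYS:
            job_conf[key] = queue
    return job_conf
```

With this reading, a user-supplied tez.queue.name or mapred.job.queue.name is overwritten for compaction jobs whenever one of the two supported properties is set, so the engine in use no longer affects the outcome.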

Best regards,
Alessandro

On Mon, 31 Jan 2022 at 10:51, Stamatis Zampetakis <zabe...@gmail.com> wrote:

> Hi all,
>
> This email is an attempt to converge on which Hive/Tez/MR properties
> someone should use in order to schedule a compaction on specific queues.
> For those who are not familiar with how queues are used, the YARN capacity
> scheduler documentation [1] gives the general idea.
>
> Using specific queues for compaction jobs is necessary to be able to
> efficiently allocate resources for maintenance tasks (compaction) and
> production workloads. Hive provides various ways to control the queues used
> by the compactor and there have been various tickets with improvements and
> fixes in this area (see list below).
>
> The granularity at which we can select queues for compactions (all tables
> vs. per table) currently depends on which compactor is in use (MR vs.
> query-based) and boils down to the following properties:
>
> Global configuration:
> * hive.compactor.job.queue
> * mapred.job.queue.name
> * tez.queue.name
>
> Per table/statement configuration (table properties):
> * compactor.mapred.job.queue.name (before HIVE-20723)
> * compactor.hive.compactor.job.queue (after HIVE-20723)
>
> Things are a bit blurred with respect to what properties someone should
> use to achieve the desired result. Some changes, such as HIVE-20723, raise
> backward compatibility concerns, and other changes seem to have a wider
> impact than the one they were designed for. For example, after
> HIVE-25595, MapReduce queue properties can have an impact on the compactor
> queues even when Tez is in use.
>
> In order to avoid confusion and ensure long term support of these queue
> selection features we should clarify which of the above properties should
> be used.
>
> Given the current situation, I would propose to officially support only
> the following:
> * hive.compactor.job.queue
> * compactor.hive.compactor.job.queue
> and align the implementation with these (if necessary). In other
> words, Hive users should not set mapred.job.queue.name and tez.queue.name
> explicitly, at least when it comes to the compactor. Hive should set them
> transparently (as it happens now in various places) based on
> [compactor.]hive.compactor.job.queue.
>
> What do people think? Are there other ideas?
>
> Best,
> Stamatis
>
> [1]
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>
> HIVE-11997: Add ability to send Compaction Jobs to specific queue
> HIVE-13354: Add ability to specify Compaction options per table and per
> request
> HIVE-20723: Allow per table specification of compaction yarn queue
> HIVE-24781: Allow to use custom queue for query based compaction
> HIVE-25801: Custom queue settings is not honoured by Query based
> compaction StatsUpdater
> HIVE-25595: Custom queue settings is not honoured by compaction
> StatsUpdater
>
