RE: [DISCUSS] Properties for scheduling compactions on specific queues

Janos Kovacs Mon, 07 Feb 2022 02:49:49 -0800

Hi Stamatis,

I agree that [compactor.]hive.compactor.queue.name is the better solution,
since Hive now supports query-based compaction as well as MR.
That said, I think this needs to be backward compatible!


What do you think about a logic similar to this:

--- a/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java   2022-02-07 10:31:28.000000000 +0100
+++ b/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java   2022-02-07 10:33:25.000000000 +0100
@@ -145,10 +145,19 @@
     overrideMRProps(job, t.getParameters()); // override MR properties from tblproperties if applicable
     if (ci.properties != null) {
       overrideTblProps(job, t.getParameters(), ci.properties);
     }

+    // Keep the queue configuration backward compatible.
+    // At this point overrideMRProps and overrideTblProps have already consolidated
+    // the final value, so we only need to read it from TABLE_PROPS in the job conf.
+    String queueNameLegacy =
+      (new StringableMap(job.get(TABLE_PROPS))).toProperties().getProperty("compactor.mapred.job.queue.name");
+    if (queueNameLegacy != null && queueNameLegacy.length() > 0) {
+      HiveConf.setVar(job, ConfVars.COMPACTOR_JOB_QUEUE, queueNameLegacy);
+    }
+
     String queueName = HiveConf.getVar(job, ConfVars.COMPACTOR_JOB_QUEUE);
     if (queueName != null && queueName.length() > 0) {
       job.setQueueName(queueName);
     }


Of course, this could be guarded by a new config if needed, like
hive.compaction.queue.name.use.legacy or whatever...
FYI: we might also want to check the legacy config not only for
"compactor.mapred.job.queue.name" but also for
"compactor.mapreduce.job.queuename", as the first one was already on the
deprecated list, as pointed out by Peter Vary.
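
To make the lookup order concrete, here is a rough, standalone sketch of the
fallback idea, not actual Hive code. The class CompactorQueueResolver, the
resolve method and the useLegacy flag are hypothetical names made up for
illustration. It assumes that a legacy table property wins when it is present
(as in the diff above), and the order in which the two legacy names are
checked is an arbitrary choice:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; this class does not exist in Hive.
final class CompactorQueueResolver {
  // Per-table property as named in the original proposal in this thread.
  static final String NEW_TBL_KEY = "compactor.hive.compactor.job.queue";
  // Legacy per-table properties; the check order is an assumption.
  static final List<String> LEGACY_TBL_KEYS = Arrays.asList(
      "compactor.mapred.job.queue.name",
      "compactor.mapreduce.job.queuename");

  private CompactorQueueResolver() {}

  // Returns the queue to use, or empty if nothing is configured.
  // useLegacy stands in for a switch like hive.compaction.queue.name.use.legacy.
  static Optional<String> resolve(Map<String, String> tblProps,
                                  String globalQueue,
                                  boolean useLegacy) {
    if (useLegacy) {
      // Mirrors the diff: a legacy value, if set, overrides everything else.
      for (String key : LEGACY_TBL_KEYS) {
        String legacy = tblProps.get(key);
        if (isSet(legacy)) {
          return Optional.of(legacy);
        }
      }
    }
    String perTable = tblProps.get(NEW_TBL_KEY);
    if (isSet(perTable)) {
      return Optional.of(perTable);
    }
    // Fall back to the global setting (hive.compactor.job.queue).
    return isSet(globalQueue) ? Optional.of(globalQueue) : Optional.empty();
  }

  private static boolean isSet(String value) {
    return value != null && !value.isEmpty();
  }
}
```

With the flag off, only the new per-table property and the global value are
consulted, so existing tables that still carry the legacy properties would
silently fall back to the global queue. That is the case the flag is meant to
make explicit.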

Please also note that the change introduced by HIVE-25595 is currently not
compatible with the new config as it was developed for the old
compactor.mapred... property:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java#L31
This also needs to be handled - for both the new prop name and backward
compatibility.

R, Janos


On 2022/01/31 09:50:49 Stamatis Zampetakis wrote:
> Hi all,
>
> This email is an attempt to converge on which Hive/Tez/MR properties
> someone should use in order to schedule a compaction on specific queues.
> For those who are not familiar with how queues are used the YARN capacity
> scheduler documentation [1] gives the general idea.
>
> Using specific queues for compaction jobs is necessary to be able to
> efficiently allocate resources for maintenance tasks (compaction) and
> production workloads. Hive provides various ways to control the queues used
> by the compactor and there have been various tickets with improvements and
> fixes in this area (see list below).
>
> The granularity at which we can select queues for compactions (all tables vs. per
> table) currently depends on which compactor is in use (MR vs Query based)
> and boils down to the following properties:
>
> Global configuration:
> * hive.compactor.job.queue
> * mapred.job.queue.name
> * tez.queue.name
>
> Per table/statement configuration (table properties):
> * compactor.mapred.job.queue.name (before HIVE-20723)
> * compactor.hive.compactor.job.queue (after HIVE-20723)
>
> Things are a bit blurred with respect to what properties someone should use
> to achieve the desired result. Some changes, such as HIVE-20723, raise
> backward compatibility concerns and other changes seem to have a larger
> impact than the one specifically designed for. For example, after
> HIVE-25595, map reduce queue properties can have an impact on the compactor
> queues even when Tez is in use.
>
> In order to avoid confusion and ensure long term support of these queue
> selection features we should clarify which of the above properties should
> be used.
>
> Given the current situation, I would propose to officially support only the
> following:
> * hive.compactor.job.queue
> * compactor.hive.compactor.job.queue
> and align the implementation based on these (if necessary). In other words,
> Hive users should not use mapred.job.queue.name and tez.queue.name
> explicitly at least when it comes to the compactor. Hive should set them
> transparently (as it happens now in various places) based on
> [compactor.]hive.compactor.job.queue.
>
> What do people think? Are there other ideas?
>
> Best,
> Stamatis
>
> [1] https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>
> HIVE-11997: Add ability to send Compaction Jobs to specific queue
> HIVE-13354: Add ability to specify Compaction options per table and per
> request
> HIVE-20723: Allow per table specification of compaction yarn queue
> HIVE-24781: Allow to use custom queue for query based compaction
> HIVE-25801: Custom queue settings is not honoured by Query based compaction
> StatsUpdater
> HIVE-25595: Custom queue settings is not honoured by compaction StatsUpdater
>
