For those interested, I found a way to launch the LLAP application 
master and daemons on separate, targeted machines. It was inspired by an 
article I found [1] and implemented using YARN node labels [2] and placement 
constraints [3], together with a modification to scripts/llap/yarn/templates.py. 
Here are the basic instructions:

1. Configure YARN to enable placement constraints and node labels. You can 
either use two node labels, or one node label plus the default partition. The 
machines intended to run the daemons must have a label associated with them. 
If you use two node labels, you must set the default node label expression of 
the queue you submit LLAP to, to the label of the machine that will run the 
application master; note that this affects every other application submitted 
to that queue. If you use only one label, the machine that will run the AM 
must be accessible from the DEFAULT_PARTITION, and it will not be specifically 
targeted if more than one machine is accessible from the DEFAULT_PARTITION, so 
this setup is recommended only if you have a single machine dedicated to 
application masters, as in my case.
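
For reference, the configuration for this step looks roughly like the 
following. This is only a sketch assuming the Capacity Scheduler; the label 
name "llap_daemons", the queue name "llap", the hostname, and the label store 
path are placeholders to adapt to your cluster (the property names themselves 
are from the Hadoop docs):

```shell
# In yarn-site.xml, enable node labels and the scheduler-side
# placement constraint handler:
#   yarn.node-labels.enabled = true
#   yarn.node-labels.fs-store.root-dir = hdfs:///yarn/node-labels
#   yarn.resourcemanager.placement-constraints.handler = scheduler

# Create the label for the daemon machines ("llap_daemons" is a
# hypothetical name) and attach it to each daemon host:
yarn rmadmin -addToClusterNodeLabels "llap_daemons"
yarn rmadmin -replaceLabelsOnNode "daemon-host-1:45454=llap_daemons"

# In capacity-scheduler.xml, let the LLAP queue access the label and
# give it capacity on that partition (queue path "root.llap" assumed):
#   yarn.scheduler.capacity.root.llap.accessible-node-labels = llap_daemons
#   yarn.scheduler.capacity.root.llap.accessible-node-labels.llap_daemons.capacity = 100
```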

2. Modify scripts/llap/yarn/templates.py like so:

#SNIP

          "APP_ROOT": "<WORK_DIR>/app/install/",
          "APP_TMP_DIR": "<WORK_DIR>/tmp/"
        }
      },
      "placement_policy": {
        "constraints": [
          {
            "type": "ANTI_AFFINITY",
            "scope": "NODE",
            "target_tags": [
              "llap"
            ],
            "node_partitions": [
              "<INSERT LLAP DAEMON NODE LABEL HERE>"
            ]
          }
        ]
      }
    }
  ],
  "kerberos_principal" : {

#SNIP

Note that ANTI_AFFINITY means only one daemon will be spawned per machine, 
but that should be the desired behaviour anyway. Read more about it in [3].

3. Launch LLAP as usual with the hive --service llap command.
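
No change is needed to the launch itself; for completeness, a typical 
invocation looks like the following. The sizing values here are just 
placeholders, not recommendations — use whatever instance count, daemon size, 
cache size, and executor count match your cluster:

```shell
# Standard LLAP launch; all sizing values below are placeholders.
# The generated YARN service package will carry the placement_policy
# added to templates.py in step 2.
hive --service llap --name llap0 \
     --instances 6 --size 32g --cache 16g --executors 8 --xmx 24g
```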

Hope this helps someone!
Aaron

[1] https://www.gresearch.com/blog/article/hive-llap-in-practice-sizing-setup-and-troubleshooting/
[2] https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
[3] https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html

On 2023/03/22 10:19:57 Aaron Grubb wrote:
> Hi all,
>
> I have a Hadoop cluster (3.3.4) with 6 nodes of equal resource size that run 
> HDFS and YARN and 1 node with lower resources which only runs YARN that I use 
> for Hive AMs, the LLAP AM, Spark AMs and Hive file merge containers. The HDFS 
> nodes are set up such that the queue for LLAP on the YARN NodeManager is 
> allocated resources exactly equal to what the LLAP daemons consume. However, 
> when I need to re-launch LLAP, I currently have to stop the NodeManager 
> processes on each HDFS node, then launch LLAP to guarantee that the 
> application master ends up on the YARN-only machine, then start the 
> NodeManager processes again to let the daemons start spawning on the nodes. 
> This used to not be a problem because only Hive/LLAP was using YARN but now 
> we've started using Spark in my company and I'm in a position where if LLAP 
> happens to crash, I would need to wait for Spark jobs to finish before I can 
> re-launch LLAP, which would put our ETL processes behind, potentially to 
> unacceptable delays. I could allocate 1 vcore and 1024mb memory extra for the 
> LLAP queue on each machine, however that would mean I have 5 vcores and 5gb 
> RAM being reserved and unused at all times, so I was wondering if there's a 
> way to specify which node to launch the LLAP AM on, perhaps through YARN node 
> labels similar to the Spark "spark.yarn.am.nodeLabelExpression" 
> configuration? Or even a way to specify the node machine through a different 
> mechanism? My Hive version is 3.1.3.
>
> Thanks,
> Aaron
>
