Re: [pve-devel] [PATCH ha-manager v2 04/26] rules: introduce location rule plugin

2025-06-20 Thread Daniel Kral
On 6/20/25 18:17, Jillian Morgan wrote: On Fri, Jun 20, 2025 at 10:32 AM Daniel Kral wrote: Add the location rule plugin to allow users to specify node affinity constraints for independent services. Location rules must specify one or more services, one or more nodes with optional priorities (t

[pve-devel] [PATCH ha-manager v2 03/26] introduce rules base plugin

2025-06-20 Thread Daniel Kral
Add a rules base plugin to allow users to specify different kinds of HA rules in a single configuration file, which put constraints on the HA Manager's behavior. Rule checkers can be registered for every plugin with the register_check(...) method and are used for checking the feasibility of the ru

[pve-devel] [PATCH ha-manager v2 17/26] test: ha tester: add test cases for strict negative colocation rules

2025-06-20 Thread Daniel Kral
Add test cases for strict negative colocation rules, i.e. where services must be kept on separate nodes. These verify the behavior of the services in strict negative colocation rules in case of a failover of the node of one or more of these services in the following scenarios: 1. 2 neg. colocated

[pve-devel] [PATCH docs v2 4/5] update static files to include ha resources failback flag

2025-06-20 Thread Daniel Kral
Signed-off-by: Daniel Kral --- This patch is more of a showcase of how the static files changed. changes since v1: - NEW! api-viewer/apidata.js | 14 ++ ha-manager.1-synopsis.adoc | 8 ha-resources-opts.adoc | 4 3 files changed, 26 insertions(+) diff --g

[pve-devel] [PATCH ha-manager v2 23/26] api: introduce ha rules api endpoints

2025-06-20 Thread Daniel Kral
Add CRUD API endpoints for HA rules, which assert whether the given properties for the rules are valid and will not make the existing rule set infeasible. Disallowing changes to the rule set via the API, which would make this and other rules infeasible, makes it safer for users of the HA Manager t

Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules

2025-06-20 Thread Jillian Morgan
Daniel, Firstly I want to say thank you very, very, very much! This extensive work obviously took a lot of time and effort. I feel like one of my Top-5 gripes with Proxmox (after moving from oVirt) will finally be resolved by this new feature. Next, however, I would like to add my two cents to th

Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules

2025-06-20 Thread DERUMIER, Alexandre via pve-devel
>>1) Having "location" and "colocation" rules is, I think, going to be >>unnecessarily confusing for people. While it isn't too complicated to >>glean >>the distinction once having read the descriptions of them (and I had >>to go >>read the descriptions), they don't convey imm

Re: [pve-devel] [PATCH ha-manager v2 04/26] rules: introduce location rule plugin

2025-06-20 Thread Jillian Morgan
On Fri, Jun 20, 2025 at 10:32 AM Daniel Kral wrote: > Add the location rule plugin to allow users to specify node affinity > constraints for independent services. > > Location rules must specify one or more services, one or more nodes with > optional priorities (the default is 0), and a strictness

Re: [pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules

2025-06-20 Thread Daniel Kral
On 6/20/25 16:31, Daniel Kral wrote: Changelog - Just noticed that I missed one detail that might be beneficial to know, so following the patch changes is easier: - migrate ha groups internally in the HA Manager to ha location rules, so that internally these can already be replaced

[pve-devel] [PATCH ha-manager v2 18/26] test: ha tester: add test cases for strict positive colocation rules

2025-06-20 Thread Daniel Kral
Add test cases for strict positive colocation rules, i.e. where services must be kept on the same node together. These verify the behavior of the services in strict positive colocation rules in case of a failover of their assigned nodes in the following scenarios: 1. 2 pos. colocated services in a

Re: [pve-devel] [PATCH common v2 01/32] schema: parse property string: support skipping keys

2025-06-20 Thread Fabian Grünbichler
On June 18, 2025 3:01 pm, Fiona Ebner wrote: > In certain situations like restoring a backup, it can be useful to > skip certain properties. This allows to drop outdated properties from > the schema while still being able to parse property strings that > contain them, but without allowing all addit

Re: [pve-devel] superseded: [RFC cluster/ha-manager 00/16] HA colocation rules

2025-06-20 Thread Daniel Kral
Superseded by: https://lore.proxmox.com/pve-devel/20250620143148.218469-1-d.k...@proxmox.com/ On 3/25/25 16:12, Daniel Kral wrote: This RFC patch series is a draft for the implementation to allow users to specify colocation rules (or affinity/anti-affinity) for the HA Manager, so that two or mo

[pve-devel] [PATCH docs v2 2/5] update static files to include ha rules api endpoints

2025-06-20 Thread Daniel Kral
Signed-off-by: Daniel Kral --- This patch is more of a showcase of how the static files changed. changes since v1: - NEW! api-viewer/apidata.js | 363 + ha-manager.1-synopsis.adoc | 138 ++ 2 files changed, 501 insertions(+) diff --git a/a

[pve-devel] [PATCH ha-manager v2 14/26] manager: apply colocation rules when selecting service nodes

2025-06-20 Thread Daniel Kral
Add a mechanism to the node selection subroutine, which enforces the colocation rules defined in the rules config. The algorithm makes in-place changes to the set of nodes in such a way that the final set contains only the nodes on which the colocation rules allow the service to run, depending on
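The in-place narrowing of the candidate node set described in this snippet can be illustrated with a minimal sketch. It is written in Python rather than the patch's Perl, it only covers strict rules, and the function name `apply_colocation_rules` and its parameters are hypothetical, not taken from the patch.

```python
def apply_colocation_rules(nodes, together_nodes, separate_nodes):
    """Narrow the candidate set `nodes` in place (illustrative assumption only).

    together_nodes: nodes already used by positively colocated services
    separate_nodes: nodes already used by negatively colocated services
    """
    # Positive colocation: if the partner services already run somewhere,
    # only those nodes remain candidates.
    if together_nodes:
        nodes.intersection_update(together_nodes)
    # Negative colocation: drop every node that hosts a service
    # this one must be kept apart from.
    nodes.difference_update(separate_nodes)


candidates = {"node1", "node2", "node3"}
apply_colocation_rules(candidates, together_nodes={"node2"}, separate_nodes={"node3"})
```

After the call, `candidates` holds only the nodes that satisfy both kinds of rule; an empty set would mean no node is allowed.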

[pve-devel] [PATCH docs v2 3/5] update static files to include use-location-rules feature flag

2025-06-20 Thread Daniel Kral
Signed-off-by: Daniel Kral --- This patch is more of a showcase of how the static files changed. changes since v1: - NEW! api-viewer/apidata.js | 9 - datacenter.cfg.5-opts.adoc | 6 +- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/api-viewer/apidata.js b/ap

[pve-devel] [PATCH ha-manager v2 26/26] api: services: check for colocations for service motions

2025-06-20 Thread Daniel Kral
The HA Manager already handles positive and negative colocations for individual service migration, but the information about these is only redirected to the HA environment's logger, i.e., for production usage these messages are redirected to the HA Manager node's syslog. Therefore, add checks when

[pve-devel] [PATCH ha-manager v2 12/26] manager: apply location rules when selecting service nodes

2025-06-20 Thread Daniel Kral
Replace the HA group mechanism with the functionally equivalent location rules' get_location_preference(...), which enforces the location rules defined in the rules config. This allows the $groups parameter to be replaced with the $rules parameter in select_service_node(...) as all

[pve-devel] [PATCH ha-manager v2 21/26] manager: handle negative colocations with too many services

2025-06-20 Thread Daniel Kral
select_service_node(...) in 'none' mode usually returns no node only if negative colocations specify more services than there are nodes available. In these cases, the services cannot be separated as there are no more nodes left, so they are put in the error state for now. Signed-off-by: Daniel Kral --- This is
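The condition described here is a counting argument: services that must all run on different nodes need at least as many nodes as there are services. A minimal Python sketch, with the hypothetical helper name `can_separate` (not part of the patch):

```python
def can_separate(services, available_nodes):
    """Return True if each negatively colocated service can get its own node.

    Illustrative assumption: every service may run on every available node.
    """
    return len(set(services)) <= len(set(available_nodes))


# Three services that must be kept apart, but only two nodes are online.
separable = can_separate(["vm:101", "vm:102", "vm:103"], ["node1", "node2"])
```

Here `separable` is `False`, which corresponds to the case where no node can be selected for at least one service.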

[pve-devel] [PATCH ha-manager v2 19/26] test: ha tester: add test cases in more complex scenarios

2025-06-20 Thread Daniel Kral
Add test cases where colocation rules are used with the static utilization scheduler and the rebalance on start option enabled. These verify the behavior in the following scenarios: - 7 services with intertwined colocation rules in a 3 node cluster; 1 node failing - 3 neg. colocated services in

[pve-devel] [PATCH manager v2 3/5] ui: ha: hide ha groups if use-location-rules is enabled

2025-06-20 Thread Daniel Kral
Remove the HA Groups entry from the datacenter's config tabs if the use-location-rules feature flag is enabled. As changing the use-location-rules feature flag doesn't automatically reload the web interface, show an empty message if the HA Groups page is still open. Remove the 'ha-groups' from th

[pve-devel] [PATCH manager v2 2/5] ui: add use-location-rules feature flag

2025-06-20 Thread Daniel Kral
Add 'use-location-rules' feature flag to the datacenter options input panel to control the behavior of the HA Manager, API endpoints, and web interface to either use and show HA Groups (disabled), or use and show HA Location rules (enabled). The util helper is used in following patches to control

[pve-devel] [PATCH ha-manager v2 20/26] test: add test cases for rules config

2025-06-20 Thread Daniel Kral
Add test cases to verify that the rule checkers correctly identify and remove ill-defined location and colocation rules from the rules: - Set defaults when reading location and colocation rules - Dropping location rules, which specify the same service multiple times - Dropping colocation rules, wh

[pve-devel] [PATCH manager v2 5/5] ui: ha: add ha rules components and menu entry

2025-06-20 Thread Daniel Kral
Add components for basic CRUD operations on the HA rules and for viewing potential errors from contradictory HA rules; these operations are currently only possible by manually editing the file. The feature flag 'use-location-rules' controls whether location rules can be created from the web interface.

[pve-devel] [PATCH docs v2 1/5] ha: config: add section about ha rules

2025-06-20 Thread Daniel Kral
Add a section about how to create and modify HA rules, describing their use cases and documenting their common and plugin-specific properties. As of now, HA Location rules are controlled by the feature flag 'use-location-rules' in the datacenter config to replace HA Groups. Signed-off-by: Daniel Kral

[pve-devel] [PATCH manager v2 4/5] ui: ha: adapt resources components if use-location-rules is enabled

2025-06-20 Thread Daniel Kral
Remove the group selector from the Resources grid view and edit window and replace it with the 'failback' field if the use-location-rules feature flag is enabled. Signed-off-by: Daniel Kral --- changes since v1: - NEW! www/manager6/ha/ResourceEdit.js | 27 ++- www/ma

[pve-devel] [PATCH ha-manager v2 25/26] api: groups, services: assert use-location-rules feature flag

2025-06-20 Thread Daniel Kral
Assert whether certain properties are allowed to be passed for the HA groups and HA services API endpoints depending on whether the use-location-rules feature flag is enabled or disabled. Signed-off-by: Daniel Kral --- changes since v1: - NEW! src/PVE/API2/HA/Groups.pm| 20 +

[pve-devel] [PATCH manager v2 1/5] api: ha: add ha rules api endpoints

2025-06-20 Thread Daniel Kral
Signed-off-by: Daniel Kral --- changes since v1: - NEW! PVE/API2/HAConfig.pm | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/PVE/API2/HAConfig.pm b/PVE/API2/HAConfig.pm index 35f49cbb..d29211fb 100644 --- a/PVE/API2/HAConfig.pm +++ b/PVE/API2/HAConfig.pm @@ -12,6 +

[pve-devel] [PATCH ha-manager v2 04/26] rules: introduce location rule plugin

2025-06-20 Thread Daniel Kral
Add the location rule plugin to allow users to specify node affinity constraints for independent services. Location rules must specify one or more services, one or more nodes with optional priorities (the default is 0), and a strictness, which is either * 0 (loose): services MUST be located on o

[pve-devel] [PATCH ha-manager v2 15/26] manager: handle migrations for colocated services

2025-06-20 Thread Daniel Kral
Make positively colocated services migrate to the same target node as the manually migrated service and prevent a service from being manually migrated to a node that contains negatively colocated services. The log information here is only redirected to the HA Manager node's syslog, so user-facing end

[pve-devel] [PATCH ha-manager v2 05/26] rules: introduce colocation rule plugin

2025-06-20 Thread Daniel Kral
Add the colocation rule plugin to allow users to specify inter-service affinity constraints. Colocation rules must specify two or more services and a colocation affinity. The inter-service affinity of colocation rules must be either * together (positive): keeping services together, or * separa

[pve-devel] [PATCH ha-manager v2 11/26] manager: migrate ha groups to location rules in-memory

2025-06-20 Thread Daniel Kral
Migrate the currently configured HA groups to HA Location rules in-memory if the use-location-rules feature flag isn't set, so that they can be applied as such in the next patches and therefore replace HA groups internally. Also ignore location rules written to the rules config if the use-location

[pve-devel] [PATCH ha-manager v2 16/26] sim: resources: add option to limit start and migrate tries to node

2025-06-20 Thread Daniel Kral
Add an option to the VirtFail's name to allow the start and migrate fail counts to only apply on a certain node number with a specific naming scheme. This allows a slightly more elaborate test type, e.g. where a service can start on one node (or any other in that case), but fails to start on a spe

[pve-devel] [PATCH ha-manager v2 10/26] resources: introduce failback property in service config

2025-06-20 Thread Daniel Kral
Add the failback property in the service config, which is functionally equivalent to the negation of the HA group's nofailback property. It is set to be enabled by default as the HA group's nofailback property was disabled by default. Signed-off-by: Daniel Kral --- changes since v1: - NEW!

[pve-devel] [PATCH ha-manager v2 08/26] manager: read and update rules config

2025-06-20 Thread Daniel Kral
Read the rules configuration in each round and update the canonicalized rules configuration if there were any changes since the last round, to reduce the number of times the rule set is verified. Signed-off-by: Daniel Kral --- changes since v1: - only read and canonicalize rules here... intro

[pve-devel] [RFC common/cluster/ha-manager/docs/manager v2 00/40] HA colocation rules

2025-06-20 Thread Daniel Kral
This is a follow-up to the previous RFC patch series for the HA colocation rules feature, which allow users to specify colocation rules (or affinity/anti-affinity) for the HA Manager, so that two or more services are either kept together or apart with respect to each other. Changelog - I

[pve-devel] [PATCH cluster v2 3/3] datacenter config: introduce feature flag for location rules

2025-06-20 Thread Daniel Kral
Add a feature flag 'use-location-rules', which is used to control the behavior of how the HA WebGUI interface and HA API endpoints handle HA Groups and HA Location rules. If the flag is not set, HA Location rules shouldn't be able to be created or modified, but only allow their behavior to be repr

[pve-devel] [PATCH cluster v2 2/3] datacenter config: make pve-ha-shutdown-policy optional

2025-06-20 Thread Daniel Kral
If there are other properties in the HA config hash, these cannot be set without also giving a value for shutdown_policy, which is unnecessary as it already has a default value. Therefore, make it optional. Signed-off-by: Daniel Kral --- changes since v1: - NEW! src/PVE/DataCenterConfig.pm

[pve-devel] [PATCH docs v2 5/5] update static files to include ha service motion return value schema

2025-06-20 Thread Daniel Kral
Signed-off-by: Daniel Kral --- This patch is more of a showcase of how the static files changed. changes since v1: - NEW! api-viewer/apidata.js | 28 ++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/api-viewer/apidata.js b/api-viewer/apidata.js index

[pve-devel] [PATCH cluster v2 1/3] cfs: add 'ha/rules.cfg' to observed files

2025-06-20 Thread Daniel Kral
Signed-off-by: Daniel Kral --- changes since v1: - only rebased on master src/PVE/Cluster.pm | 1 + src/pmxcfs/status.c | 1 + 2 files changed, 2 insertions(+) diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm index 3b1de57..9ec4f66 100644 --- a/src/PVE/Cluster.pm +++ b/src/PVE/Cluster.

[pve-devel] [PATCH ha-manager v2 24/26] cli: expose ha rules api endpoints to ha-manager cli

2025-06-20 Thread Daniel Kral
Expose the HA rules API endpoints through the CLI in its own subcommand. The names of the subsubcommands are chosen to be consistent with the other commands provided by the ha-manager CLI for services and groups. The properties specified for the 'rules config' command are chosen to reflect the co

[pve-devel] [PATCH ha-manager v2 22/26] config: prune services from rules if services are deleted from config

2025-06-20 Thread Daniel Kral
Remove services from rules, where these services are used, if they are removed by delete_service_from_config(...), which is called by the services' delete API endpoint and possibly external callers, e.g. if the service is removed externally. If all of the rules' services have been removed, the rul

[pve-devel] [PATCH ha-manager v2 13/26] usage: add information about a service's assigned nodes

2025-06-20 Thread Daniel Kral
This will be used to retrieve the nodes on which a service is currently putting load and whose resources it is using, when dealing with colocation rules in select_service_node(...). For example, a migrating service in a negative colocation will need to block other negatively colocated services to migrat

[pve-devel] [PATCH ha-manager v2 06/26] rules: add global checks between location and colocation rules

2025-06-20 Thread Daniel Kral
Add checks that detect infeasible colocation rules, i.e. rules whose services are already restricted by their location rules in such a way that the colocation rules cannot be satisfied or cannot reasonably be proven to be satisfiable. Positive colocation rule services need to have at least one common node to
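The "at least one common node" requirement for positive colocation can be sketched as a set intersection over the nodes each service is allowed on. This is a minimal Python illustration, assuming strict location rules that limit each service to a fixed node set; `positive_colocation_feasible` and the data layout are hypothetical and not the patch's Perl implementation.

```python
def positive_colocation_feasible(allowed_nodes, services):
    """Check that all services of a positive colocation rule share a node.

    allowed_nodes: dict mapping service id -> set of nodes permitted by its
                   (assumed strict) location rule
    services:      service ids named in the positive colocation rule
    """
    node_sets = [set(allowed_nodes[sid]) for sid in services]
    if not node_sets:
        return False
    # The rule is only satisfiable if the intersection is non-empty.
    return bool(set.intersection(*node_sets))


allowed = {
    "vm:101": {"node1", "node2"},
    "vm:102": {"node2", "node3"},
    "vm:103": {"node3"},
}
feasible_pair = positive_colocation_feasible(allowed, ["vm:101", "vm:102"])
feasible_all = positive_colocation_feasible(allowed, ["vm:101", "vm:102", "vm:103"])
```

In this example the first two services share `node2`, so that rule is feasible, while adding the third service leaves no common node.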

[pve-devel] [PATCH ha-manager v2 07/26] config, env, hw: add rules read and parse methods

2025-06-20 Thread Daniel Kral
Adds methods to the HA environment to read and write the rules configuration file for the different environment implementations. Signed-off-by: Daniel Kral --- changes since v1: - reorder use statements - use property isolation for the rules plugin - introduce `read_and_check_rules_co

[pve-devel] [PATCH ha-manager v2 02/26] manager: improve signature of select_service_node

2025-06-20 Thread Daniel Kral
As the signature of select_service_node(...) has become rather long already, make it more compact by retrieving service- and affinity-related data directly from the service state in $sd and introduce a $mode parameter to distinguish the behaviors of $try_next and $best_scored, which have already be

[pve-devel] [PATCH ha-manager v2 01/26] tree-wide: make arguments for select_service_node explicit

2025-06-20 Thread Daniel Kral
Explicitly state all the parameters at all call sites for select_service_node(...) to clarify in which states these are. The call site in next_state_recovery(...) sets $best_scored to 1, as it should find the next best node when recovering from the failed node $current_node. All references to $bes

[pve-devel] [PATCH common v2 1/1] introduce HashTools module

2025-06-20 Thread Daniel Kral
Add a new package PVE::HashTools to provide helpers for common operations done on hashes. These initial helper subroutines implement basic set operations done on hash sets, i.e. hashes with elements set to a true value, e.g. 1. Signed-off-by: Daniel Kral --- changes since v1: - moved from pv
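The idea of a "hash set" (a hash whose member keys map to a true value) and the basic set operations on it can be illustrated with a minimal sketch. It uses Python dicts in place of Perl hashes, and the function names below are hypothetical; they are not the names exported by PVE::HashTools.

```python
def members(hash_set):
    """Keys with a truthy value are the members of the hash set."""
    return {key for key, value in hash_set.items() if value}


def set_union(a, b):
    return {key: 1 for key in members(a) | members(b)}


def set_intersect(a, b):
    return {key: 1 for key in members(a) & members(b)}


def set_difference(a, b):
    return {key: 1 for key in members(a) - members(b)}


online = {"node1": 1, "node2": 1, "node3": 1}
preferred = {"node2": 1, "node4": 1}
usable = set_intersect(online, preferred)
```

Here `usable` contains only `node2`, the single key present with a true value in both inputs.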

[pve-devel] partially-applied: [PATCH-SERIES common/qemu-server v2 00/32] preparation for switch to blockdev

2025-06-20 Thread Fabian Grünbichler
applied most of the patches, except for 6, 28/29 and 31/32. 6 is dropped altogether, the other 4 can be included in the switch to blockdev to avoid affecting the block graph before that happens. folded in a small fixup for 7/8, and committed the changed expected output for tests. thanks a lot

[pve-devel] [PATCH] iscsi: fix excessive connection test spam on storage monitoring

2025-06-20 Thread Stelios Vailakakis
Hi all, This patch addresses excessive "connection lost" and "connection reset" log spam on iSCSI targets caused by Proxmox storage monitoring performing TCP connection tests every 10 seconds, even when iSCSI sessions are active. The issue appears as continuous log entries on iSCSI targets: "ctld

Re: [pve-devel] [PATCH qemu-server v2 18/32] vm start/commandline: activate volumes before config_to_command()

2025-06-20 Thread Fabian Grünbichler
On June 18, 2025 3:01 pm, Fiona Ebner wrote: > With '-blockdev', it is necessary to activate the volumes to generate > the command line, because it can be necessary to check whether the > volume is a block device or a regular file. > > Do not deactivate after commandline generation for 'qm showcmd

Re: [pve-devel] [PATCH qemu-server v2 08/32] drive: remove geometry options gone since QEMU 3.1

2025-06-20 Thread Fabian Grünbichler
On June 18, 2025 3:01 pm, Fiona Ebner wrote: > It was not possible to start a QEMU instance with these options set > since QEMU version 3.1, QEMU commit b24ec3c462 ("block: Remove > deprecated -drive geometry options") and thus also not to take a > backup. It is still possible to restore an old bac

Re: [pve-devel] [PATCH qemu-server v2 08/32] drive: remove geometry options gone since QEMU 3.1

2025-06-20 Thread Fiona Ebner
On 20.06.25 at 13:03, Fabian Grünbichler wrote: > On June 18, 2025 3:01 pm, Fiona Ebner wrote: >> It was not possible to start a QEMU instance with these options set >> since QEMU version 3.1, QEMU commit b24ec3c462 ("block: Remove >> deprecated -drive geometry options") and thus also not to take