Hi Levin,

Thanks for the update.

Can you share more information about the two VMs?

Kind regards,
Wei

On Mon, Jun 30, 2025 at 7:57 PM Levin Ng <levindec...@gmail.com> wrote:

> Hi Wei,
>
> I’ve finally identified two VMs that were repeatedly causing the CPU
> overcommit ratio to be recreated, which prevented the host from rejoining
> the management server. I deleted the offending VMs and recreated them from
> a template.
>
> Regards,
> Levin
> On 1 Jul 2025 at 01:30 +0800, Levin Ng <levindec...@gmail.com>, wrote:
> > Hi Wei,
> >
> > Today, after restarting the management server, I got the same error for the
> > same host that rejoined last time. Do you have any hints?
> >
> > 2025-07-01 01:11:19,119 ERROR [c.c.a.m.ClusteredAgentManagerImpl]
> > (AgentConnectTaskPool-126:[ctx-ef0bee48]) (logid:08898435) Monitor
> > ComputeCapacityListener says there is an error in the connect process for
> > 125 due to Duplicate key cpuOvercommitRatio (attempted merging values 12
> > and 12) java.lang.IllegalStateException: Duplicate key cpuOvercommitRatio
> > (attempted merging values 12 and 12)
> >
> > Regards,
> > Levin
> >
> > On 28 Jun 2025 at 16:49 +0800, Levin Ng <levindec...@gmail.com>, wrote:
> > > Hi Wei,
> > >
> > > I searched the user_vm_details and vm_instance tables by host_id, but I
> > > couldn’t find any duplicate records. I shut down the running VMs on those
> > > hosts, removed the hosts, and let the agents rejoin ACS. The problem is
> > > gone; thanks again for your help! The recent ACS upgrade has been really
> > > frustrating.
> > >
> > > Regards,
> > > Levin
> > > On 28 Jun 2025 at 16:34 +0800, Wei ZHOU <ustcweiz...@gmail.com>, wrote:
> > > > Can you also check user_vm_details for the VMs running on the host?
> > > >
> > > >
> > > > -Wei
> > > >
> > > > On Sat, Jun 28, 2025 at 10:04 AM Levin Ng <levindec...@gmail.com> wrote:
> > > >
> > > > > Hi Wei,
> > > > >
> > > > > Thanks again. The problematic cluster (cluster_id 7) contains only one
> > > > > cpuOvercommitRatio row. Any ideas?
> > > > >
> > > > > Regards,
> > > > > Levin
> > > > >
> > > > > MariaDB [cloud]> select * from cluster_details;
> > > > > +----+------------+-----------------------+-------+
> > > > > | id | cluster_id | name                  | value |
> > > > > +----+------------+-----------------------+-------+
> > > > > |  1 |          1 | memoryOvercommitRatio | 1.0   |
> > > > > |  2 |          1 | cpuOvercommitRatio    | 1.0   |
> > > > > |  3 |          2 | memoryOvercommitRatio | 1.0   |
> > > > > |  4 |          2 | cpuOvercommitRatio    | 1.0   |
> > > > > |  5 |          3 | memoryOvercommitRatio | 1.0   |
> > > > > |  6 |          3 | cpuOvercommitRatio    | 1.0   |
> > > > > |  7 |          4 | memoryOvercommitRatio | 1.0   |
> > > > > |  8 |          4 | cpuOvercommitRatio    | 1.0   |
> > > > > |  9 |          5 | memoryOvercommitRatio | 1.0   |
> > > > > | 10 |          5 | cpuOvercommitRatio    | 1.0   |
> > > > > | 11 |          6 | memoryOvercommitRatio | 1.0   |
> > > > > | 12 |          6 | cpuOvercommitRatio    | 1.0   |
> > > > > | 13 |          7 | memoryOvercommitRatio | 1.0   |
> > > > > | 14 |          7 | cpuOvercommitRatio    | 12    |
> > > > > | 15 |          7 | resourceHAEnabled     | false |
> > > > > | 16 |          8 | memoryOvercommitRatio | 1.3   |
> > > > > | 17 |          8 | cpuOvercommitRatio    | 15.0  |
> > > > > | 18 |          9 | memoryOvercommitRatio | 1.3   |
> > > > > | 19 |          9 | cpuOvercommitRatio    | 15.0  |
> > > > > | 20 |         10 | memoryOvercommitRatio | 1.3   |
> > > > > | 21 |         10 | cpuOvercommitRatio    | 15.0  |
> > > > > | 22 |         11 | memoryOvercommitRatio | 1.0   |
> > > > > | 23 |         11 | cpuOvercommitRatio    | 12.0  |
> > > > > +----+------------+-----------------------+-------+
> > > > > 23 rows in set (0.001 sec)
> > > > >
> > > > > MariaDB [cloud]> desc cluster_details;
> > > > > +------------+---------------------+------+-----+---------+----------------+
> > > > > | Field      | Type                | Null | Key | Default | Extra          |
> > > > > +------------+---------------------+------+-----+---------+----------------+
> > > > > | id         | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
> > > > > | cluster_id | bigint(20) unsigned | NO   | MUL | NULL    |                |
> > > > > | name       | varchar(255)        | NO   | MUL | NULL    |                |
> > > > > | value      | varchar(255)        | NO   |     | NULL    |                |
> > > > > +------------+---------------------+------+-----+---------+----------------+
> > > > > 4 rows in set (0.005 sec)
> > > > >
> > > > > On 28 Jun 2025 at 15:54 +0800, Wei ZHOU <ustcweiz...@gmail.com>, wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Maybe check cluster_details to see whether any cluster has multiple
> > > > > > records with the same name, "cpuOvercommitRatio".
> > > > > >
> > > > > >
> > > > > > -Wei
> > > > > >
> > > > > > On Sat, Jun 28, 2025 at 9:37 AM Levin Ng <levindec...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I’m having trouble after the 4.20.1 upgrade: some of the existing hosts
> > > > > > > are not able to reconnect to the ACS management server, and I found the
> > > > > > > following error in the log. Does anyone have an idea how to resolve it?
> > > > > > > Thank you very much.
> > > > > > >
> > > > > > > 2025-06-28 15:30:49,259 ERROR [c.c.a.m.ClusteredAgentManagerImpl]
> > > > > > > (AgentConnectTaskPool-1092:[ctx-99bfb3dd]) (logid:b354f521) Monitor
> > > > > > > ComputeCapacityListener says there is an error in the connect process for
> > > > > > > 110 due to Duplicate key cpuOvercommitRatio (attempted merging values 12
> > > > > > > and 12) java.lang.IllegalStateException: Duplicate key cpuOvercommitRatio
> > > > > > > (attempted merging values 12 and 12)
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Levin
> > > > > > >
> > > > >
>
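
The wording of the error in these logs, "Duplicate key cpuOvercommitRatio (attempted merging values 12 and 12)", matches the IllegalStateException that Java's Collectors.toMap throws (JDK 9 and later) when two stream elements produce the same key and no merge function is supplied. That would explain why identical values (12 and 12) still fail: the check is on the key, not the value, so any two detail rows named cpuOvercommitRatio that reach the same map are enough. Whether ComputeCapacityListener actually collects details this way is an assumption, not something confirmed from the CloudStack source. The minimal sketch below (Java 16+ for the record type) reproduces the message and shows two alternatives: listing the repeated names, and tolerating them with a merge function. Detail, toMapStrict, toMapKeepFirst and duplicateNames are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class DuplicateDetailDemo {
    // Hypothetical stand-in for one name/value row (as in cluster_details or
    // user_vm_details); this is NOT a CloudStack class.
    record Detail(String name, String value) {}

    // Strict collection: Collectors.toMap without a merge function throws
    // IllegalStateException as soon as a name appears twice, even when the
    // two values are equal.
    static Map<String, String> toMapStrict(List<Detail> details) {
        return details.stream()
                .collect(Collectors.toMap(Detail::name, Detail::value));
    }

    // Tolerant collection: the merge function keeps the first value seen
    // instead of throwing.
    static Map<String, String> toMapKeepFirst(List<Detail> details) {
        return details.stream()
                .collect(Collectors.toMap(Detail::name, Detail::value,
                        (first, second) -> first));
    }

    // Returns every name that occurs more than once, sorted alphabetically.
    static List<String> duplicateNames(List<Detail> details) {
        Map<String, Long> counts = details.stream()
                .collect(Collectors.groupingBy(Detail::name, TreeMap::new,
                        Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative rows only; not taken from the database output above.
        List<Detail> details = List.of(
                new Detail("cpuOvercommitRatio", "12"),
                new Detail("memoryOvercommitRatio", "1.0"),
                new Detail("cpuOvercommitRatio", "12"));

        try {
            toMapStrict(details);
        } catch (IllegalStateException e) {
            // Prints: Duplicate key cpuOvercommitRatio (attempted merging values 12 and 12)
            System.out.println(e.getMessage());
        }
        System.out.println(duplicateNames(details));                         // [cpuOvercommitRatio]
        System.out.println(toMapKeepFirst(details).get("cpuOvercommitRatio")); // 12
    }
}
```

In the database itself, the equivalent check is to group the detail rows by name (per cluster, or per VM for user_vm_details) and look for any group with more than one row.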