Hello Mosharaf,

Right. So libvirt is unable to communicate with Ceph for some reason, then.
Could you also tell us what `ceph -s` and `ceph -W cephadm` say? Do you see any
abnormalities?

Please also confirm your CloudStack version. AFAIK, the Read Balancer just
changes the primary OSDs for the PGs, so the issue might be related to it, or
it could be something else entirely: a problem with your monitors, or OSDs
flapping, either of which could make your Ceph pools unavailable to clients.
Please share your monitor map as well.
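
For reference, here is roughly what I'd run; a quick sketch, assuming you have
access to a node with the admin keyring (adjust the greps as needed):

# ceph mon dump
# ceph osd dump | grep -E 'flags|pg_upmap'
# ceph log last 50 | grep -i 'marked down'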

PS: We had a similar issue with Quincy where our OSDs were flapping
continuously, marking healthy ones down as well. The issue was with our Intel
E810 series NICs, whose `ice` driver couldn't cope under heavy network load
(40-50+ Gbps and a huge PPS). We eventually solved it, and I'm trying to
correlate it with your problem here.

Thanks,
Jayanth
________________________________
From: Mosharaf Hossain <mosharaf.hoss...@bol-online.com>
Sent: Tuesday, September 19, 2023 10:58:06 PM
To: Jayanth Reddy <jayanthreddy5...@gmail.com>
Cc: users@cloudstack.apache.org <users@cloudstack.apache.org>; Andrija Panic 
<andrija.pa...@gmail.com>; Product Development | BEXIMCO IT 
<p...@bol-online.com>
Subject: Re: CloudStack agent can't connect to upgraded CEPH Cluster

Hello Reddy
virsh secret-list shows output, but virsh pool-list doesn't return anything and
seems stuck.
[image.png]

Regards
Mosharaf Hossain
Manager, Product Development
IT Division

Bangladesh Export Import Company Ltd.

Level-8, SAM Tower, Plot #4, Road #22, Gulshan-1, Dhaka-1212,Bangladesh

Tel: +880 9609 000 999, +880 2 5881 5559, Ext: 14191, Fax: +880 2 9895757

Cell: +8801787680828, Email: mosharaf.hoss...@bol-online.com, Web:
www.bol-online.com


On Tue, Sep 19, 2023 at 9:05 PM Jayanth Reddy 
<jayanthreddy5...@gmail.com> wrote:
Hello Mosharaf,

I also see that you've created a thread on the Ceph-users mailing list 
regarding this. Did you get a chance to disable the Read Balancer as one of the 
devs suggested?

On the CloudStack end, to see if libvirt has issues communicating with Ceph,
please try executing the below continuously on your hosts:

# virsh pool-list
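
For example, a simple loop like this makes a hang easy to spot; the 10-second
timeout and 5-second interval are just suggestions:

# while true; do timeout 10 virsh pool-list || echo "hung at $(date)"; sleep 5; done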

Please let me know if it freezes or sometimes doesn't return any response.
AFAIK, there shouldn't be any compatibility issues, as one of my CloudStack
deployments (v4.18.0.0) is running with Reef 18.2.0. My guess is it has
something to do with the Read Balancer alone. Please also share your hosts'
information and I'll see if I can reproduce.
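
Specifically, something like the following would help; I'm assuming
Ubuntu/Debian hosts here, so adjust the package query for your distro:

# uname -r; lsb_release -ds
# dpkg -l | grep -E 'ceph-common|librbd'
# virsh version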

Thanks,
Jayanth

________________________________
From: Simon Weller <siwelle...@gmail.com>
Sent: Tuesday, September 19, 2023 8:25:17 PM
To: users@cloudstack.apache.org <users@cloudstack.apache.org>
Cc: Andrija Panic <andrija.pa...@gmail.com>; Product Development | BEXIMCO IT
<p...@bol-online.com>
Subject: Re: CloudStack agent can't connect to upgraded CEPH Cluster

Mosharaf,

Did you upgrade the Ceph client on your hosts as well?

What does "ceph -s" report? Is your cluster healthy?

Do you have any logs that indicate OSDs are disconnecting?

I'm not very familiar with the new read balancer feature in Reef. Can you
disable it and see if your performance improves?
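
If it was applied via the new pg-upmap-primary mappings, I believe something
like the following should list and then remove them, though I haven't used the
feature myself, so please double-check against the Reef docs first:

ceph osd dump | grep pg_upmap_primary
ceph osd rm-pg-upmap-primary <pg-id>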

-Si

On Tue, Sep 19, 2023 at 1:25 AM Mosharaf Hossain <
mosharaf.hoss...@bol-online.com> wrote:

> Hello Andrija
>
> CloudStack's host list was stable prior to the incident, but host statuses
> are now fluctuating continuously. Some hosts are initially marked as
> disconnected, but after a while they transition back to a connected state.
>
>
>
>
> [image: image.png]
>
> Using virsh, we get the VM status on cshost1 as below:
> root@cshost1:~# virsh list
>  Id    Name           State
> -------------------------------
>  10    i-14-597-VM    running
>  61    r-757-VM       running
>  69    i-24-767-VM    running
>  76    r-71-VM        running
>  82    i-24-797-VM    running
>  113   r-335-VM       running
>  128   r-577-VM       running
>  148   i-14-1151-VM   running
>  164   i-2-1253-VM    running
>
>
> Regards
> Mosharaf Hossain
> Manager, Product Development
> IT Division
>
> Bangladesh Export Import Company Ltd.
>
> Level-8, SAM Tower, Plot #4, Road #22, Gulshan-1, Dhaka-1212,Bangladesh
>
> Tel: +880 9609 000 999, +880 2 5881 5559, Ext: 14191, Fax: +880 2 9895757
>
> Cell: +8801787680828, Email: mosharaf.hoss...@bol-online.com, Web:
> www.bol-online.com
>
>
>
> On Mon, Sep 18, 2023 at 12:43 PM Andrija Panic 
> <andrija.pa...@gmail.com>
> wrote:
>
>> Hi,
>>
>> the message "(Agent-Handler-1:null) (logid:) Connection with libvirtd is
>> broken: invalid connection pointer in virConnectGetVersion" is a false
>> alarm and does NOT mean there's any actual error.
>>
>> I can see that the ACS agent sees different storage pools - namely
>> "daab90ad-42d3-3c48-a9e4-b4c3c7fcdc84" and
>> "a2d455c6-68cb-303f-a7fa-287e62a5be9c" - and I don't see any explicit error
>> message about these 2 (both RBD/Ceph) pools.
>>
>> Also, I can see that the CloudStack agent says it's connected to the mgmt
>> host - which means that all pools are in place (otherwise the agent would
>> not connect).
>>
>> 1. Are your KVM hosts all green (Connected/Up) when checking in the
>> CloudStack UI?
>> 2. You can always use virsh to list the pools and see if they are there.
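>>
>> For instance (--all also lists pools that are defined but not currently
>> active):
>>
>> # virsh pool-list --all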
>>
>> Best,
>>
>> On Wed, 13 Sept 2023 at 13:54, Mosharaf Hossain <
>> mosharaf.hoss...@bol-online.com> wrote:
>>
>>> Hello Folks
>>> We've recently performed an upgrade on our Cephadm cluster, transitioning
>>> from Ceph Quincy to Reef. However, following the manual implementation of
>>> a read balancer in the Reef cluster, we've experienced a significant
>>> slowdown in client I/O operations within the Ceph cluster, affecting both
>>> client bandwidth and overall cluster performance.
>>>
>>> This slowdown has resulted in unresponsiveness across all virtual machines
>>> within the cluster, despite the fact that the cluster exclusively utilizes
>>> SSD storage.
>>>
>>> In the CloudStack agent log, we are seeing that libvirt can't connect to
>>> the Ceph pool, and the following error messages are generated:
>>>
>>> 2023-09-13 16:57:51,660 INFO  [cloud.agent.Agent] (Agent-Handler-4:null)
>>> (logid:) Lost connection to host: 10.10.11.61. Attempting reconnection
>>> while we still have 1 command in progress.
>>> 2023-09-13 16:57:51,661 INFO  [utils.nio.NioClient]
>>> (Agent-Handler-4:null)
>>> (logid:) NioClient connection closed
>>> 2023-09-13 16:57:51,662 INFO  [cloud.agent.Agent] (Agent-Handler-4:null)
>>> (logid:) Reconnecting to host:10.10.11.62
>>> 2023-09-13 16:57:51,662 INFO  [utils.nio.NioClient]
>>> (Agent-Handler-4:null)
>>> (logid:) Connecting to 10.10.11.62:8250
>>> 2023-09-13 16:57:51,663 INFO  [utils.nio.Link] (Agent-Handler-4:null)
>>> (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
>>> 2023-09-13 16:57:51,779 INFO  [utils.nio.NioClient]
>>> (Agent-Handler-4:null)
>>> (logid:) SSL: Handshake done
>>> 2023-09-13 16:57:51,779 INFO  [utils.nio.NioClient]
>>> (Agent-Handler-4:null)
>>> (logid:) Connected to 10.10.11.62:8250
>>> 2023-09-13 16:57:51,815 INFO  [utils.linux.KVMHostInfo]
>>> (Agent-Handler-1:null) (logid:) Fetching CPU speed from command "lscpu".
>>> 2023-09-13 16:57:51,836 INFO  [utils.linux.KVMHostInfo]
>>> (Agent-Handler-1:null) (logid:) Command [lscpu | grep -i 'Model name' |
>>> head -n 1 | egrep -o '[[:digit:]].[[:digit:]]+GHz' | sed 's/GHz//g']
>>> resulted in the value [2100] for CPU speed.
>>> 2023-09-13 16:57:51,900 INFO  [kvm.storage.LibvirtStorageAdaptor]
>>> (Agent-Handler-1:null) (logid:) Attempting to create storage pool
>>> e205cf5f-ea32-46c7-ba18-d18f62772b80 (Filesystem) in libvirt
>>> 2023-09-13 16:57:51,901 ERROR [kvm.resource.LibvirtConnection]
>>> (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken:
>>> invalid
>>> connection pointer in virConnectGetVersion
>>> 2023-09-13 16:57:51,903 INFO  [kvm.storage.LibvirtStorageAdaptor]
>>> (Agent-Handler-1:null) (logid:) Found existing defined storage pool
>>> e205cf5f-ea32-46c7-ba18-d18f62772b80, using it.
>>> 2023-09-13 16:57:51,904 INFO  [kvm.storage.LibvirtStorageAdaptor]
>>> (Agent-Handler-1:null) (logid:) Trying to fetch storage pool
>>> e205cf5f-ea32-46c7-ba18-d18f62772b80 from libvirt
>>> 2023-09-13 16:57:51,924 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
>>> (logid:) Process agent startup answer, agent id = 0
>>> 2023-09-13 16:57:51,924 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
>>> (logid:) Set agent id 0
>>> 2023-09-13 16:57:51,955 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
>>> (logid:) Startup Response Received: agent id = 0
>>> 2023-09-13 16:57:52,047 INFO  [kvm.storage.LibvirtStorageAdaptor]
>>> (agentRequest-Handler-5:null) (logid:e396a97c) Attempting to create
>>> storage
>>> pool daab90ad-42d3-3c48-a9e4-b4c3c7fcdc84 (RBD) in libvirt
>>> 2023-09-13 16:57:52,050 INFO  [kvm.storage.LibvirtStorageAdaptor]
>>> (agentRequest-Handler-5:null) (logid:e396a97c) Found existing defined
>>> storage pool daab90ad-42d3-3c48-a9e4-b4c3c7fcdc84, using it.
>>> 2023-09-13 16:57:52,050 INFO  [kvm.storage.LibvirtStorageAdaptor]
>>> (agentRequest-Handler-5:null) (logid:e396a97c) Trying to fetch storage
>>> pool
>>> daab90ad-42d3-3c48-a9e4-b4c3c7fcdc84 from libvirt
>>> 2023-09-13 16:57:52,161 INFO  [kvm.storage.LibvirtStorageAdaptor]
>>> (agentRequest-Handler-1:null) (logid:e396a97c) Attempting to create
>>> storage
>>> pool a2d455c6-68cb-303f-a7fa-287e62a5be9c (RBD) in libvirt
>>> 2023-09-13 16:57:52,163 WARN  [kvm.storage.LibvirtStorageAdaptor]
>>> (agentRequest-Handler-1:null) (logid:e396a97c) Storage pool
>>> a2d455c6-68cb-303f-a7fa-287e62a5be9c was not found running in libvirt.
>>> Need
>>> to create it.
>>> 2023-09-13 16:57:52,164 INFO  [kvm.storage.LibvirtStorageAdaptor]
>>> (agentRequest-Handler-1:null) (logid:e396a97c) Didn't find an existing
>>> storage pool a2d455c6-68cb-303f-a7fa-287e62a5be9c by UUID, checking for
>>> pools with duplicate paths
>>> 2023-09-13 16:57:56,780 INFO  [cloud.agent.Agent] (Agent-Handler-4:null)
>>> (logid:) Connected to the host: 10.10.11.62
>>> ^C
>>>
>>>
>>> Kindly guide us to move forward.
>>>
>>>
>>>
>>> Regards
>>> Mosharaf Hossain
>>> Manager, Product Development
>>> IT Division
>>>
>>> Bangladesh Export Import Company Ltd.
>>>
>>> Level-8, SAM Tower, Plot #4, Road #22, Gulshan-1, Dhaka-1212,Bangladesh
>>>
>>> Tel: +880 9609 000 999, +880 2 5881 5559, Ext: 14191, Fax: +880 2 9895757
>>>
>>> Cell: +8801787680828, Email: mosharaf.hoss...@bol-online.com, Web:
>>> www.bol-online.com
>>>
>>
>>
>> --
>>
>> Andrija Panić
>>
>
