StorPool have looked into it and determined that 'fencing' is causing the 
corruption: we are seeing VM instances running on two hosts at the same time. 
Here is a log excerpt:

Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Ovm3Investigator could not find VM[User|i-2-393-VM]
Jul 16 12:37:04 server25311 java[962152]: DEBUG [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Fencing off VM that we don't know the state of
Jul 16 12:37:04 server25311 java[962152]: DEBUG [c.c.o.h.OvmFencer] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Don't know how to fence non Ovm hosts KVM
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Fencer OvmFenceBuilder returned null
Jul 16 12:37:04 server25311 java[962152]: DEBUG [c.c.h.o.r.Ovm3FenceBuilder] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Don't know how to fence non Ovm3 hosts KVM
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Fencer Ovm3FenceBuilder returned null
Jul 16 12:37:04 server25311 java[962152]: DEBUG [c.c.h.ManagementIPSystemVMInvestigator] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) Unable to find a management nic, cannot ping this system VM, unable to determine state of VM[User|i-2-393-VM] returning null
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) ManagementIPSysVMInvestigator could not find VM[User|i-2-393-VM]
Jul 16 12:37:04 server25311 java[962152]: DEBUG [c.c.h.Ovm3Investigator] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) isVmAlive: CTXDC02 on qcloud-s1-p1-c1-kvm3
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) Ovm3Investigator could not find VM[User|i-2-393-VM]
Jul 16 12:37:04 server25311 java[962152]: DEBUG [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) Fencing off VM that we don't know the state of
Jul 16 12:37:04 server25311 java[962152]: DEBUG [c.c.o.h.OvmFencer] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) Don't know how to fence non Ovm hosts KVM
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) Fencer OvmFenceBuilder returned null
Jul 16 12:37:04 server25311 java[962152]: DEBUG [c.c.h.o.r.Ovm3FenceBuilder] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) Don't know how to fence non Ovm3 hosts KVM
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) Fencer Ovm3FenceBuilder returned null
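
For anyone wanting to check their own management server log for the same 
pattern, entries like the ones above can be grouped by HA work item to show 
which investigators could not find the VM and which fencers returned null. 
This is only a rough sketch, not anything shipped with CloudStack: the 
function name `summarise` and the regular expressions are assumptions derived 
solely from the message formats visible in this excerpt, and the sample 
entries are taken from the excerpt with each entry on a single line.

```python
import re
from collections import defaultdict

# Sample entries copied from the excerpt, one log entry per line.
SAMPLE_LOG = """\
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Ovm3Investigator could not find VM[User|i-2-393-VM]
Jul 16 12:37:04 server25311 java[962152]: DEBUG [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Fencing off VM that we don't know the state of
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Fencer OvmFenceBuilder returned null
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-eb111af1 work-670) (logid:07a47ffd) Fencer Ovm3FenceBuilder returned null
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) ManagementIPSysVMInvestigator could not find VM[User|i-2-393-VM]
Jul 16 12:37:04 server25311 java[962152]: INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f405f7dd work-669) (logid:5cd9b357) Fencer OvmFenceBuilder returned null
"""

# Assumed patterns, based only on the lines shown in this thread.
WORK_RE = re.compile(r"\(HA-Worker-\d+:ctx-\w+ (work-\d+)\)")
FENCER_NULL_RE = re.compile(r"Fencer (\w+) returned null")
INVESTIGATOR_RE = re.compile(r"(\w+) could not find (VM\[[^\]]+\])")


def summarise(log_text):
    """Group investigator misses and null fencer results by HA work item."""
    summary = defaultdict(
        lambda: {"investigators_failed": [], "fencers_null": []}
    )
    for line in log_text.splitlines():
        work = WORK_RE.search(line)
        if not work:
            continue  # not an HA worker entry
        entry = summary[work.group(1)]
        fencer = FENCER_NULL_RE.search(line)
        if fencer:
            entry["fencers_null"].append(fencer.group(1))
        investigator = INVESTIGATOR_RE.search(line)
        if investigator:
            entry["investigators_failed"].append(investigator.group(1))
    return dict(summary)


if __name__ == "__main__":
    for work_id, details in sorted(summarise(SAMPLE_LOG).items()):
        print(work_id, details)
```

A work item with entries under `fencers_null` and no successful fencer is one 
where the VM's state was unknown and nothing actually fenced it, which is the 
situation shown in the excerpt.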



Gary Dixon
Technical Consultant
T:  0161 537 4980
W: www.quadris.co.uk
The information contained in this e-mail from Quadris may be confidential and 
privileged for the private use of the named recipient.  The contents of this 
e-mail may not necessarily represent the official views of Quadris.  If you 
have received this information in error you must not copy, distribute or take 
any action or reliance on its contents.  Please destroy any hard copies and 
delete this message.
-----Original Message-----
From: Simon Weller <swel...@ena.com.INVALID>
Sent: 20 July 2022 22:10
To: users@cloudstack.apache.org
Subject: Re: Virtual Router filesystem corruption

Gary,

No prob with the info, thanks for providing it.

Since you're using Storpool, I'd suggest you reach out to them on this directly 
and see whether they have any information that could be helpful.

There was an issue a while ago (StorPool actually reported it) where a kernel 
commit introduced a bug that caused file corruption. That was back in about 
2018:
https://storpool.com/blog/beware-silent-data-corruption-discovered-in-linux-kernels-4-10-4-17/

I believe ACS 4.15.x uses Debian 10.5 (Buster) for the VR images (dates to 
August 2020). That release is based on kernel 4.19.0-10.
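
As a quick way to compare a kernel release string against the range named in 
that blog post's title (4.10 to 4.17), something like the following could be 
used. It is a sketch under assumptions: the inclusive boundaries are read off 
the post title only and have not been checked against the actual kernel 
commits, and the helper names are made up for illustration.

```python
import re

# Assumption: range taken from the StorPool post title ("kernels 4.10-4.17"),
# treated as inclusive on both ends. Not verified against kernel changelogs.
REPORTED_MIN = (4, 10)
REPORTED_MAX = (4, 17)


def kernel_major_minor(release):
    """Extract (major, minor) from a release string such as '4.19.0-10'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_reported_range(release):
    """True if the release falls inside the range named in the post title."""
    return REPORTED_MIN <= kernel_major_minor(release) <= REPORTED_MAX


if __name__ == "__main__":
    # 4.19.0-10 is the kernel mentioned above for the Debian 10.5 VR image.
    print(in_reported_range("4.19.0-10"))
```

On a running VR or host, the string to pass in would be the output of 
`uname -r`.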

-Si






________________________________
From: Gary Dixon <gary.di...@quadris.co.uk.INVALID>
Sent: Wednesday, July 20, 2022 3:00 PM
To: users@cloudstack.apache.org <users@cloudstack.apache.org>
Subject: Re: Virtual Router filesystem corruption

EXTERNAL EMAIL: This message originated outside of ENA. Use caution when 
clicking links, opening attachments, or complying with requests. Click the 
"Phish Alert Report" button above the email, or contact MIS, regarding any 
suspicious message.


Hi Si

Sure. Sorry for the lack of info. First time posting on the forum.
We are using the KVM hypervisor on Ubuntu 20.04 hosts. Primary storage is 
StorPool. Let me know if you need more info.

Best regards
Gary

________________________________
From: Simon Weller <swel...@ena.com.INVALID>
Sent: Wednesday, July 20, 2022 8:55:22 PM
To: users@cloudstack.apache.org <users@cloudstack.apache.org>
Subject: Re: Virtual Router filesystem corruption

Gary,

Can you provide some information about the OS, underlying hypervisor and 
primary storage in use?

-Si
________________________________
From: Gary Dixon <gary.di...@quadris.co.uk.INVALID>
Sent: Wednesday, July 20, 2022 11:15 AM
To: users@cloudstack.apache.org <users@cloudstack.apache.org>
Subject: Virtual Router filesystem corruption

Hi All



We are seeing ext4 filesystem corruption on a number of virtual routers 
recently, and manually running fsck doesn’t appear to help at all in fixing the 
issue (corrupt inode bitmap).

We end up having to restart the associated VPC with cleanup enabled to build a 
new VR. Is this a common issue with ACS 4.15.1? Or are there specific 
circumstances causing the VR filesystem corruption that we could perhaps 
mitigate?



Kind regards



Gary
