Hello,

I am coming back to the initial problem reported in this thread: since I updated to OpenVZ release 7.0.14 (136) | Virtuozzo Linux release 7.8.0 (609), I am also seeing containers (CTs) in a corrupted state.

I don't see the exact same errors that Kevin Drysdale mentions below (ploop/fsck), but I am unable to enter certain CTs, nor can I stop them:

[root@olb ~]# prlctl stop trans8
Stopping the CT...
Failed to stop the CT: PRL_ERR_VZCTL_OPERATION_FAILED (Details: Cannot lock the Container
)

[root@olb ~]# prlctl enter trans8
Unable to get init pid
enter into CT failed

exited from CT 02faecdd-ddb6-42eb-8103-202508f18256

For the CTs that fail to enter or stop, I noticed that there is a second mount whose path ends in /dump/Dump/.criu.cgyard.4EJB8c:

[root@olb ~]# df -H |grep 02faecdd-ddb6-42eb-8103-202508f18256
/dev/ploop53152p1          11G    2,2G  7,7G  23% /vz/root/02faecdd-ddb6-42eb-8103-202508f18256
none                      537M       0  537M   0% /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/dump/Dump/.criu.cgyard.4EJB8c
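
To look for the same symptom across a whole node, something along these lines might help. It is only a minimal sketch: it assumes that the leftover mount always has ".criu.cgyard." in its mount point, as in the df output above, and the function name list_cgyard_mounts is made up for illustration.

```shell
# List mount points that look like leftover CRIU cgroup yards.
# Reads a mounts table (default: /proc/mounts); field 2 is the mount point.
# Assumption: the stray mount always contains ".criu.cgyard." in its path.
list_cgyard_mounts() {
    awk '$2 ~ /\/\.criu\.cgyard\./ { print $2 }' "${1:-/proc/mounts}"
}
```

Calling list_cgyard_mounts without arguments reads /proc/mounts; the CT UUID is part of each printed path, so the output shows which containers are affected.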


[root@olb ~]# prlctl list | grep 02faecdd-ddb6-42eb-8103-202508f18256
{02faecdd-ddb6-42eb-8103-202508f18256}  running 157.159.196.17  CT isptrans8

I rebooted the whole hardware node. Here is the related vzctl.log since the reboot:

2020-07-06T15:10:38+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Removing the stale lock file /vz/lock/02faecdd-ddb6-42eb-8103-202508f18256.lck
2020-07-06T15:10:38+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Restoring the Container ...
2020-07-06T15:10:38+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Mount image: /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd
2020-07-06T15:10:38+0200 : Opening delta /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd/root.hds
2020-07-06T15:10:38+0200 : Opening delta /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd/root.hds
2020-07-06T15:10:38+0200 : Opening delta /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd/root.hds
2020-07-06T15:10:38+0200 : Adding delta dev=/dev/ploop53152 img=/vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd/root.hds (rw)
2020-07-06T15:10:39+0200 : Mounted /dev/ploop53152p1 at /vz/root/02faecdd-ddb6-42eb-8103-202508f18256 fstype=ext4 data=',balloon_ino=12'
2020-07-06T15:10:39+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Container is mounted
2020-07-06T15:10:40+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Setting permissions for image=/vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd
2020-07-06T15:10:40+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Configure memguarantee: 0%
2020-07-06T15:18:12+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Unable to get init pid
2020-07-06T15:18:12+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : enter into CT failed
2020-07-06T15:19:49+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Cannot lock the Container
2020-07-06T15:25:33+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Unable to get init pid
2020-07-06T15:25:33+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : enter into CT failed

On another CT that fails to enter or stop, the log shows the same kind of entries, plus "Error (criu/...)" messages:

2020-07-06T15:10:38+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Restoring the Container ...
2020-07-06T15:10:38+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Mount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
2020-07-06T15:10:38+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
2020-07-06T15:10:39+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
2020-07-06T15:10:39+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
2020-07-06T15:10:39+0200 : Adding delta dev=/dev/ploop36049 img=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds (rw)
2020-07-06T15:10:41+0200 : Mounted /dev/ploop36049p1 at /vz/root/4ae48335-5b63-475d-8629-c8d742cb0ba0 fstype=ext4 data=',balloon_ino=12'
2020-07-06T15:10:41+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Container is mounted
2020-07-06T15:10:41+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Setting permissions for image=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
2020-07-06T15:10:41+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Configure memguarantee: 0%
2020-07-06T15:10:57+0200 vzeventd : Run: /etc/vz/vzevent.d/ve-stop id=4ae48335-5b63-475d-8629-c8d742cb0ba0
2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (03.038774) Error (criu/util.c:666): exited, status=4
2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (14.446513)      1: Error (criu/files.c:230): Empty list on file desc id 0x1f(5)
2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (14.446518)      1: Error (criu/files.c:231): BUG at criu/files.c:231
2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (15.589529) Error (criu/cr-restore.c:1612): 7130 killed by signal 11: Segmentation fault
2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (15.604550) Error (criu/cr-restore.c:2614): Restoring FAILED.
2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : The restore log was saved in /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/dump/Dump/restore.log
2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : criu exited with rc=17
2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Unmount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd (190)
2020-07-06T15:10:57+0200 : Unmounting file system at /vz/root/4ae48335-5b63-475d-8629-c8d742cb0ba0
2020-07-06T15:11:31+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Container is unmounted
2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Failed to restore the Container
2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Restoring the Container ...
2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Mount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
2020-07-06T15:11:31+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
2020-07-06T15:11:31+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
2020-07-06T15:11:31+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
2020-07-06T15:11:31+0200 : Adding delta dev=/dev/ploop36049 img=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds (rw)
2020-07-06T15:11:31+0200 : Mounted /dev/ploop36049p1 at /vz/root/4ae48335-5b63-475d-8629-c8d742cb0ba0 fstype=ext4 data=',balloon_ino=12'
2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Container is mounted
2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Setting permissions for image=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Configure memguarantee: 0%
2020-07-06T15:14:18+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Unable to get init pid
2020-07-06T15:14:18+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : enter into CT failed

And in prl-disp.log:

07-06 15:10:30.797 F /virtuozzo:4836:4836/ register CT: 4ae48335-5b63-475d-8629-c8d742cb0ba0
07-06 15:10:38.717 F /disp:4836:6163/ Processing command 'DspCmdVmStartEx' 1036 for CT uuid='{4ae48335-5b63-475d-8629-c8d742cb0ba0}'
07-06 15:10:38.738 I /virtuozzo:4836:6234/ /usr/sbin/vzctl resume 4ae48335-5b63-475d-8629-c8d742cb0ba0
07-06 15:10:48.542 I /disp:4836:5196/ vzevent: state=6, envid=4ae48335-5b63-475d-8629-c8d742cb0ba0
07-06 15:10:57.364 I /disp:4836:5196/ vzevent: state=8, envid=4ae48335-5b63-475d-8629-c8d742cb0ba0
07-06 15:10:57.475 I /disp:4836:5196/ vzevent: state=12, envid=4ae48335-5b63-475d-8629-c8d742cb0ba0
07-06 15:11:31.161 F /virtuozzo:4836:6234/ /usr/sbin/vzctl utility failed: /usr/sbin/vzctl resume 4ae48335-5b63-475d-8629-c8d742cb0ba0 [6]
Mount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
Setting permissions for image=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
Unmount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd (190)
The restore log was saved in /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/dump/Dump/restore.log
07-06 15:11:31.162 I /virtuozzo:4836:6234/ /usr/sbin/vzctl start 4ae48335-5b63-475d-8629-c8d742cb0ba0

Is this related to the update? How can I re-enable those CTs?

Thanks.





On 29/06/2020 at 12:30, Kevin Drysdale wrote:
Hello,

After updating one of our OpenVZ VPS hosting nodes at the end of last week, we've started to have issues with corruption apparently occurring inside containers.  Issues of this nature have never affected the node previously, and there do not appear to be any hardware issues that could explain this.

Specifically, a few hours after updating, we began to see containers experiencing errors such as this in the logs:

[90471.678994] EXT4-fs (ploop35454p1): error count since last fsck: 25
[90471.679022] EXT4-fs (ploop35454p1): initial error at time 1593205255: ext4_ext_find_extent:904: inode 136399
[90471.679030] EXT4-fs (ploop35454p1): last error at time 1593232922: ext4_ext_find_extent:904: inode 136399
[95189.954569] EXT4-fs (ploop42983p1): error count since last fsck: 67
[95189.954582] EXT4-fs (ploop42983p1): initial error at time 1593210174: htree_dirblock_to_tree:918: inode 926441: block 3683060
[95189.954589] EXT4-fs (ploop42983p1): last error at time 1593276902: ext4_iget:4435: inode 1849777
[95714.207432] EXT4-fs (ploop60706p1): error count since last fsck: 42
[95714.207447] EXT4-fs (ploop60706p1): initial error at time 1593210489: ext4_ext_find_extent:904: inode 136272
[95714.207452] EXT4-fs (ploop60706p1): last error at time 1593231063: ext4_ext_find_extent:904: inode 136272

Shutting the containers down and manually mounting and e2fsck'ing their filesystems did clear these errors, but each of the containers (which were mostly used for running Plesk) had widespread issues with corrupt or missing files after the fsck's completed, necessitating their being restored from backup.

Concurrently, we also began to see messages like this appearing in /var/log/vzctl.log, which again have never appeared at any point prior to this update being installed:

/var/log/vzctl.log:2020-06-26T21:05:19+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288448/root.hdd/root.hds' is sparse
/var/log/vzctl.log:2020-06-26T21:09:41+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288450/root.hdd/root.hds' is sparse
/var/log/vzctl.log:2020-06-26T21:16:22+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288451/root.hdd/root.hds' is sparse
/var/log/vzctl.log:2020-06-26T21:19:57+0100 : Error in fill_hole (check.c:240): Warning: ploop image '/vz/private/8288452/root.hdd/root.hds' is sparse
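
For anyone wanting to check their own images independently of ploop, a rough sketch follows. It assumes GNU stat and treats a file as sparse when its allocated bytes are fewer than its apparent size. This is a generic heuristic, not the check ploop itself performs, and a compressing filesystem would trip it too. The function names are made up for illustration.

```shell
# Return 0 (true) when the allocated bytes are fewer than the apparent size.
# Arguments: apparent size in bytes, allocated block count, bytes per block.
is_sparse_sizes() {
    [ $(( $2 * $3 )) -lt "$1" ]
}

# Print the path of every given file that looks sparse (GNU stat assumed).
report_sparse() {
    for f in "$@"; do
        size=$(stat -c %s "$f") || continue   # apparent size in bytes
        blocks=$(stat -c %b "$f")             # number of allocated blocks
        unit=$(stat -c %B "$f")               # size in bytes of each such block
        if is_sparse_sizes "$size" "$blocks" "$unit"; then
            echo "$f"
        fi
    done
}
```

With the paths from the warnings above, it could be run as: report_sparse /vz/private/*/root.hdd/root.hds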

The basic procedure we follow when updating our nodes is as follows:

1. Update the standby node we keep spare for this process
2. vzmigrate all containers from the live node being updated to the standby node
3. Update the live node
4. Reboot the live node
5. vzmigrate the containers from the standby node back to the live node they originally came from
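
Steps 2 and 5 above amount to a loop of the following shape. This is a hedged sketch rather than the script actually used: the function name migrate_all and the DRY_RUN switch are invented for illustration, and it assumes that vzmigrate accepts a destination followed by a container ID.

```shell
# Migrate every container ID read from stdin to the given destination node.
# With DRY_RUN=1 the commands are only printed, not executed.
migrate_all() {
    dest=$1
    while read -r ctid; do
        [ -n "$ctid" ] || continue            # skip blank lines
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "vzmigrate $dest $ctid"
        else
            # </dev/null keeps vzmigrate from consuming the list on stdin
            vzmigrate "$dest" "$ctid" </dev/null ||
                echo "migration of $ctid failed" >&2
        fi
    done
}
```

A dry run could look like this, where standby.example.com is a placeholder hostname and vzlist -H -o ctid is assumed to print one container ID per line: vzlist -H -o ctid | DRY_RUN=1 migrate_all standby.example.com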

The only tool that has touched these containers is therefore 'vzmigrate' itself, so I'm at something of a loss to explain how the root.hdd images for these containers came to contain sparse gaps.  We have never created sparse images, as we have always been aware that OpenVZ does not support them for a container's hard drive image.  And the fact that these images have suddenly become sparse at the same time as they have started to exhibit filesystem corruption is somewhat concerning.

We can restore all affected containers from backups, but I wanted to get in touch with the list to see if anyone else at any other site has experienced these or similar issues after applying the 7.0.14 (136) update.

Thank you,
Kevin Drysdale.




_______________________________________________
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


