>
>
> Message: 8
> Date: Thu, 28 Jan 2016 16:42:55 +0500
> From: Nick Knutov <m...@knutov.com>
> To: OpenVZ users <users@openvz.org>
> Subject: Re: [Users] simfs challenge
> Message-ID: <56a9febf.5010...@knutov.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Hello,
>
> One of the big reasons to prefer simfs over ploop is the disk space
> overhead in ploop after using snapshots (for backups, for example).
> It can be really huge - we have one CT which takes 120GB instead of 60GB
> (df -h inside) after daily backups with the vzpbackup (*) tool. That is
> after(!) ploop merge & compact.
>
> Do you plan to make some improvements for this case?
>
> (*) https://github.com/andreasfaerber/vzpbackup
>
>

Nick, I believe that ploop snapshots are actually write logs (delta files), not
separate disk images.  Please, someone, correct me if I'm wrong.  If it is a
write log, it can't really be much more space-efficient than it is now without
committing the write log, which is what happens when you merge/delete the
snapshot.
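
For reference, the lifecycle I'm describing maps onto the usual vzctl snapshot
commands, roughly like this (CT 4155 and the UUID are just placeholders, and
the exact sub-commands/flags are from memory, so check vzctl(8) before relying
on them):

[root@intvz03 /]# vzctl snapshot 4155                      # start the write log: new writes go to a fresh top delta
[root@intvz03 /]# vzctl snapshot-list 4155                 # show the snapshots/deltas hanging off the CT
[root@intvz03 /]# vzctl snapshot-delete 4155 --id <uuid>   # merge the delta back into the base image ("commit" the log)
[root@intvz03 /]# vzctl compact 4155                       # then try to reclaim unused blocks in the base image

Until that snapshot-delete/merge happens, everything written since the snapshot
sits in the delta on top of the base image.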

As soon as the write log is started, i.e. the first snapshot is taken, any new
data being written goes to the write log.  Certainly this is how it behaves on
my systems.  After the snapshot is created, deleting data within the CT does
not free anything; all that happens is that a record saying the data is to be
deleted is appended to the write log.  You can see how this produces the
inefficiency you describe.  For example, if a user creates, deletes and then
recreates large files in the volume after the initial snapshot has been taken,
all of those created files are written to the write log instead of the base
disk image.  In this way you can greatly exceed the space allocated to the
disk image.

We fell foul of this.  In our case, one user with 40GB of space available in
their 100GB CT created 500GB+ of writes in a very short time through normal
(for them) use.  The /vz volume filled, and every other user lost the ability
to write to their CTs because they were all on the same snapshot schedule, so
all data was at that point being written to write logs.  Unfortunately, there
was nothing on the /vz volume I could remove to free space that was not a
ploop image or a snapshot.  Long story short, I had to do a hard reset on the
node and we had our first unscheduled VZ outage in 6 years.  Entirely my
fault: I didn't fully grasp the implications of the snapshot architecture, and
I didn't have sufficiently granular alerting in place for a volume that I had
previously been confident no CT could fill.  In addition, this user's workload
was a once-per-month thing and didn't show up in my analyses prior to
implementing snapshots.

I can easily replicate this by repeatedly creating and then deleting a large
file inside the container:
root@ct4155:/# cp bigblob bigblob2
root@ct4155:/# rm bigblob2
root@ct4155:/# cp bigblob bigblob2
root@ct4155:/# rm bigblob2
and so on.
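
If anyone wants to reproduce it without typing, the same churn can be
scripted; a rough sketch (bigblob is just any large file already in the CT):

root@ct4155:/# while true; do cp bigblob bigblob2; rm -f bigblob2; done

Each pass rewrites the same blocks, but every rewrite lands in the write log
rather than reusing space in the base image.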

Simultaneously checking the size of the write log:

[root@intvz03 4155]# ls -lh root.hdd/root.hdd.{230a7ec7-3f75-4a14-9db4-0c79bf3eea52}
-rw------- 1 root root 3.0G Jan 29 08:15 root.hdd/root.hdd.{230a7ec7-3f75-4a14-9db4-0c79bf3eea52}
[root@intvz03 4155]# ls -lh root.hdd/root.hdd.{230a7ec7-3f75-4a14-9db4-0c79bf3eea52}
-rw------- 1 root root 3.8G Jan 29 08:15 root.hdd/root.hdd.{230a7ec7-3f75-4a14-9db4-0c79bf3eea52}
[root@intvz03 4155]# ls -lh root.hdd/root.hdd.{230a7ec7-3f75-4a14-9db4-0c79bf3eea52}
-rw------- 1 root root 4.6G Jan 29 08:15 root.hdd/root.hdd.{230a7ec7-3f75-4a14-9db4-0c79bf3eea52}
[root@intvz03 4155]# ls -lh root.hdd/root.hdd.{230a7ec7-3f75-4a14-9db4-0c79bf3eea52}
-rw------- 1 root root 5.5G Jan 29 08:15 root.hdd/root.hdd.{230a7ec7-3f75-4a14-9db4-0c79bf3eea52}
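
Rather than re-running ls by hand, watching the CT's private area on the node
shows the top delta growing in near real time (path as on my node; adjust to
suit):

[root@intvz03 4155]# watch -n 5 "ls -lh root.hdd/"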


Basically, as soon as you create your first snapshot, you may be in a race
against time to prevent a CT from using up the rest of the volume.  I don't
see how I can safely leave snapshots in place on a ploop system and still
ensure that individual CTs do not impact other CTs in the way that happened to
us above.  All I can do is provide enough headroom and remove the snapshots
immediately after the backup has completed.  I may move to per-CT LVM, though
that does not provide any kind of graceful failure mode for the offending CT.
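
For what it's worth, the stop-gap I'm planning is exactly that: delete the
snapshot the moment the backup finishes, plus a cron'd free-space check on /vz
that shouts well before the volume is actually full.  A rough sketch only -
the threshold and the mail address are obviously site-specific assumptions:

#!/bin/sh
# Warn when /vz usage crosses a threshold, before a runaway delta can fill it.
THRESHOLD=85
USED=$(df -P /vz | awk 'NR==2 { gsub("%","",$5); print $5 }')
if [ "$USED" -ge "$THRESHOLD" ]; then
    echo "/vz is ${USED}% full - check for runaway ploop deltas" \
        | mail -s "vz space warning on $(hostname)" ops@example.com
fi

It doesn't fix the underlying behaviour, but it would at least have bought me
time to delete a snapshot before the volume filled.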

Regards,

Simon
_______________________________________________
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users
