On 07/10/2014 12:50 PM, Pavel Snajdr wrote:
> On 07/10/2014 12:32 PM, Pavel Odintsov wrote:
>> Could you share your patches to vzmigrate and vzctl?
>
> We don't have any. Where vzctl/vzmigrate didn't satisfy our needs, we went around these utilities and let vpsAdmin on the hwnode manage things.
>
> You can take a look here:
>
> https://github.com/vpsfreecz/vpsadmind
>
> I wouldn't recommend that anyone outside of our organization use vpsAdmin yet, as the 2.0 transition to a self-describing RESTful API is still underway. As soon as it's finished and well documented, I'll post a note here as well.
>
> The 2.0 version will be primarily controlled via a CLI tool, which autogenerates itself from the API description.
>
> A running version of the API can be seen here:
>
> https://api.vpsfree.cz/v1/
>
> GitHub repos:
>
> https://github.com/vpsfreecz/vpsadminapi (the API)
> https://github.com/vpsfreecz/vpsadminctl (the CLI tool)
>
> https://github.com/vpsfreecz/vpsadmind (daemon run on the hwnode)
> https://github.com/vpsfreecz/vpsadmindctl (CLI tool to control the daemon)
>
> https://github.com/vpsfreecz/vpsadmin
>
> The last repo is vpsAdmin 1.x, which all the 2.0 components still require to run. It's a pain to get this running yourself, but stay tuned: once we get rid of 1.x and document 2.0 properly, it's going to be a great thing.
>
> /snajpa
>
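By the way, since the API above is self-describing, you can already poke at it without any client - e.g. fetch the root document (assuming here that it lists the available resources, which is what the self-description is for):

curl -s https://api.vpsfree.cz/v1/
# the exact output is whatever the API's self-description defines;
# the vpsadminctl CLI mentioned above generates itself from that same description
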
Though, if you don't mind managing things via a web interface, vpsAdmin 1.x can be installed through these scripts:

https://github.com/vpsfreecz/vpsadmininstall

/snajpa

>>
>> On Thu, Jul 10, 2014 at 2:25 PM, Pavel Odintsov <[email protected]> wrote:
>>> Thank you for your answers! It's really useful information.
>>>
>>> On Thu, Jul 10, 2014 at 2:08 PM, Pavel Snajdr <[email protected]> wrote:
>>>> On 07/10/2014 11:35 AM, Pavel Odintsov wrote:
>>>>>> Not true, IO limits are working as they should (if we're talking vzctl set --iolimit/--iopslimit). I've kicked the ZoL guys around to add IO accounting support, so it is there.
>>>>>
>>>>> Can you share tests with us? For standard folders like simfs, these limits work badly in a large number of cases.
>>>>
>>>> If you can give me concrete tests to run, sure, I'm curious to see if you're right - then we'd have something concrete to fix :)
>>>>
>>>>>
>>>>>> How? ZFS doesn't have a limit on the number of files (2^48 isn't really a limit).
>>>>>
>>>>> Is it OK when your customer creates a billion small files on a 10 GB VPS and you then try to archive it for backup? On a slow disk system it's a real nightmare, because of the huge number of disk operations, which kills your I/O.
>>>>
>>>> zfs snapshot <dataset>@<snapname>
>>>> zfs send <dataset>@<snapname> > your-file, or | ssh backuper zfs recv <backupdataset>
>>>>
>>>> That's done on the block level. No need to run rsync anymore; it's a lot faster this way.
>>>>
>>>>>
>>>>>> Why? ZFS send/receive is able to do a bit-by-bit identical copy of the FS. I thought the point of migration is for the CT not to notice any change; I don't see why the inode numbers should change.
>>>>>
>>>>> Do you have a really working zero-downtime vzmigrate on ZFS?
>>>>
>>>> Nope, vzmigrate isn't zero downtime. Since vzctl/vzmigrate don't support ZFS, we're implementing this our own way in vpsAdmin, which in its 2.0 re-implementation will go open source under the GPL.
>>>>
>>>>>
>>>>>> How exactly? I haven't seen a problem with any userspace software, other than MySQL defaulting to AIO (it falls back to an older method), which ZFS doesn't support (*yet*, they have it in their plans).
>>>>>
>>>>> I'm speaking about MySQL primarily. I have thousands of containers and I can't tune MySQL to another mode for all customers; it's impossible.
>>>>
>>>> As I said, this is under development and will improve.
>>>>
>>>>>
>>>>>> The L2ARC cache is really smart.
>>>>>
>>>>> Yep, fine, I knew. But can you account L2ARC cache usage per customer? OpenVZ can do it via a flag:
>>>>> sysctl -a | grep pagecache_isola
>>>>> ubc.pagecache_isolation = 0
>>>>
>>>> I can't account for caches per CT, but I haven't had any need to do so.
>>>>
>>>> L2ARC != ARC. ARC is in system RAM; L2ARC is intended to be on SSD for the content of ARC that is the least significant in case of low memory - it gets pushed from ARC to L2ARC.
>>>>
>>>> ARC has two primary lists of cached data - most frequently used and most recently used - and these two lists are divided by a boundary marking which data can be pushed away in a low-memory situation.
>>>>
>>>> It doesn't happen, as with the Linux VFS cache, that you copy one big file and it pushes out all of the other useful data.
>>>>
>>>> Thanks to this distinction between MRU and MFU, ARC achieves far better hitrates.
>>>>
>>>>>
>>>>> But one customer can eat almost all of the L2ARC cache and displace other customers' data.
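A note on watching this in practice: ZFS on Linux exposes the global ARC and L2ARC counters in /proc/spl/kstat/zfs/arcstats, so you can at least see the overall hit rates and the MRU/MFU split - as said above, there is no per-CT breakdown. A rough example; the field names are from the 0.6.x releases:

# overall ARC efficiency and the MRU/MFU split
grep -E '^(hits|misses|mru_hits|mfu_hits) ' /proc/spl/kstat/zfs/arcstats

# L2ARC usage
grep -E '^l2_(hits|misses|size) ' /proc/spl/kstat/zfs/arcstats
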
>>>>
>>>> Yes, but ZFS keeps track of what's being used, so useful data can't be pushed away that easily; things naturally balance themselves due to the way the ARC mechanism works.
>>>>
>>>>>
>>>>> I'm not against ZFS, but I'm against using ZFS as the underlying system for containers. We caught ~100 kernel bugs with simfs on EXT4 when customers did some strange things.
>>>>
>>>> I haven't encountered any problems, especially with vzquota disabled (no need for it; ZFS has its own quotas, which never need to be recalculated the way vzquota's do).
>>>>
>>>>>
>>>>> But ext4 has a few thousand developers and they fix these issues ASAP, while ZFS on Linux has only 3-5 developers, which is VERY slow. Because of this I recommend using ext4 with ploop, because that solution is rock stable, or ZFS with ZVOLs and ext4 on top, because that solution is more reliable and more predictable than placing containers directly on ZFS datasets.
>>>>
>>>> ZFS itself is a stable and mature filesystem; it first shipped in production with Solaris in 2006. And it's still being developed upstream as OpenZFS; that code is shared between the primary version - Illumos - and the ports - FreeBSD, OS X, Linux.
>>>>
>>>> So what really needed, and still is being, developed is the way ZFS runs under the Linux kernel, but with the recent release of 0.6.3 things have gotten mature enough to be used in production without any fears. Of course, no software is without bugs, but I can say with absolute certainty that ZFS will never eat your data; the only problem you can encounter is with memory management, which is done really differently in Linux than in ZFS's original habitat - Solaris.
>>>>
>>>> /snajpa
>>>>
>>>>>
>>>>> On Thu, Jul 10, 2014 at 1:08 PM, Pavel Snajdr <[email protected]> wrote:
>>>>>> On 07/10/2014 10:34 AM, Pavel Odintsov wrote:
>>>>>>> Hello!
>>>>>>>
>>>>>>> Your scheme is fine, but you can't divide I/O load with the blkio cgroup (ioprio/iolimit/iopslimit) between different folders; between different ZVOLs you can.
>>>>>>
>>>>>> Not true, IO limits are working as they should (if we're talking vzctl set --iolimit/--iopslimit). I've kicked the ZoL guys around to add IO accounting support, so it is there.
>>>>>>
>>>>>>>
>>>>>>> I can imagine the following problems for a per-folder scheme:
>>>>>>> 1) You can't limit the number of inodes in different folders (ZFS doesn't have an inode limit like ext4, but a huge number of files in a container could break the node;
>>>>>>
>>>>>> How? ZFS doesn't have a limit on the number of files (2^48 isn't really a limit).
>>>>>>
>>>>>>> http://serverfault.com/questions/503658/can-you-set-inode-quotas-in-zfs)
>>>>>>> 2) Problems with the system cache, which is shared by all containers on the HWN.
>>>>>>
>>>>>> This exactly isn't a problem but a *HUGE* benefit; you'd need to see it in practice :) The Linux VFS cache is really dumb in comparison to ARC. ARC's hitrates just can't be achieved with what Linux currently offers.
>>>>>>
>>>>>>> 3) Problems with live migration, because you _would_ have to change inode numbers on different nodes.
>>>>>>
>>>>>> Why? ZFS send/receive is able to do a bit-by-bit identical copy of the FS. I thought the point of migration is for the CT not to notice any change; I don't see why the inode numbers should change.
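To make the send/receive part above concrete - this is roughly what a block-level copy of one CT's dataset to another node looks like (dataset and host names are placeholders; this shows only the plain zfs mechanism, not how vpsAdmin orchestrates it):

# initial full copy while the CT keeps running
zfs snapshot vz/private/101@copy-1
zfs send vz/private/101@copy-1 | ssh node2 zfs recv -F vz/private/101

# later, send only what changed since the first snapshot
zfs snapshot vz/private/101@copy-2
zfs send -i @copy-1 vz/private/101@copy-2 | ssh node2 zfs recv -F vz/private/101

The same stream can of course go into a file instead of over ssh, which is the backup case mentioned above.
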
>>>>>>
>>>>>>> 4) ZFS behaviour with Linux software is very STRANGE in some cases (DIRECT_IO).
>>>>>>
>>>>>> How exactly? I haven't seen a problem with any userspace software, other than MySQL defaulting to AIO (it falls back to an older method), which ZFS doesn't support (*yet*, they have it in their plans).
>>>>>>
>>>>>>> 5) ext4 has good support from vzctl (fsck, resize2fs).
>>>>>>
>>>>>> Yeah, but ext4 sucks big time. At least in my use case.
>>>>>>
>>>>>> We've implemented most of the vzctl create/destroy/etc. functionality in our vpsAdmin software instead.
>>>>>>
>>>>>> Guys, can I ask you to keep your minds open instead of fighting with pointless arguments? :) Give ZFS a try and then decide for yourselves.
>>>>>>
>>>>>> I think the community would benefit greatly if ZFS weren't fought as something alien in the Linux world, which in my experience is what every Linux zealot I talk to about ZFS does. That's just not fair. It's primarily about technology, about the best tool for the job. If we could implement something like this in Linux without ties to CDDL and possibly Oracle patents, that would be awesome, yet nobody has done such a thing yet. BTRFS is nowhere near ZFS when it comes to running larger-scale deployments, and in some regards I don't think it will ever match ZFS, just looking at the way it's been designed.
>>>>>>
>>>>>> I'm not trying to flame here; I'm trying to open you guys up to the fact that there really is a better alternative than what you're currently seeing. And if it has some technological drawbacks like the ones you're trying to point out, instead of treating them as something that can't be changed (and thus everyone should use "your best solution(tm)"), try to think of ways to change them for the better.
>>>>>>
>>>>>>>
>>>>>>> My thinking here is like the simfs vs. ploop comparison: http://openvz.org/images/f/f3/Ct_in_a_file.pdf
>>>>>>
>>>>>> Again, you have to see ZFS doing its magic in production under really heavy load, otherwise you won't understand. The arbitrary benchmarks I've seen show ZFS is slower than ext4, but they aren't tuned for the kind of use case I'm talking about.
>>>>>>
>>>>>> /snajpa
>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 10, 2014 at 12:06 PM, Pavel Snajdr <[email protected]> wrote:
>>>>>>>> On 07/09/2014 06:58 PM, Kir Kolyshkin wrote:
>>>>>>>>> On 07/08/2014 11:54 PM, Pavel Snajdr wrote:
>>>>>>>>>> On 07/08/2014 07:52 PM, Scott Dowdle wrote:
>>>>>>>>>>> Greetings,
>>>>>>>>>>>
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> (offtopic) We cannot use ZFS. Unfortunately, a NAS with something like Nexenta is too expensive for us.
>>>>>>>>>>>
>>>>>>>>>>> From what I've gathered from a few presentations, ZFS on Linux (http://zfsonlinux.org/) is as stable as, but more performant than, it is on the OpenSolaris forks... so you can build your own if you can spare the people to learn the best practices.
>>>>>>>>>>>
>>>>>>>>>>> I don't have a use for ZFS myself, so I'm not really advocating it.
>>>>>>>>>>>
>>>>>>>>>>> TYL,
>>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> we run tens of OpenVZ nodes (bigger boxes: 256G RAM, 12+ cores, at least 90 CTs). We used to run ext4+flashcache, but ext4 has proven to be a bottleneck.
That was the primary motivation behind ploop as far as I know.
>>>>>>>>>>
>>>>>>>>>> We switched to ZFS on Linux around the time ploop was announced, and I haven't had second thoughts since. ZFS really *is*, in my experience, the best filesystem there is at the moment for this kind of deployment - especially if you use dedicated SSDs for the ZIL and L2ARC, although the latter is less important. You will know what I'm talking about when you try this on boxes with lots of CTs doing LAMP load - databases and their synchronous writes are the real problem, which ZFS with a dedicated ZIL device solves.
>>>>>>>>>>
>>>>>>>>>> Also there is the ARC caching, which is smarter than the Linux VFS cache - we're able to achieve about a 99% hitrate about 99% of the time, even under high loads.
>>>>>>>>>>
>>>>>>>>>> Having said all that, I recommend that everyone give ZFS a chance, but I'm aware this is yet more out-of-mainline code and that doesn't suit everyone that well.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Are you using per-container ZVOLs or something else?
>>>>>>>>
>>>>>>>> That would mean I'd need to put another filesystem on top of ZFS, which would in turn mean adding another unnecessary layer of indirection. ZFS is pooled storage, like BTRFS; we're giving one dataset to each container.
>>>>>>>>
>>>>>>>> vzctl tries to move the VE_PRIVATE folder around, so we had to add one more directory to put the VE_PRIVATE data into (see the first ls).
>>>>>>>>
>>>>>>>> Example from production:
>>>>>>>>
>>>>>>>> [[email protected]]
>>>>>>>> ~ # zpool status vz
>>>>>>>>   pool: vz
>>>>>>>>  state: ONLINE
>>>>>>>>   scan: scrub repaired 0 in 1h24m with 0 errors on Tue Jul 8 16:22:17 2014
>>>>>>>> config:
>>>>>>>>
>>>>>>>>         NAME          STATE     READ WRITE CKSUM
>>>>>>>>         vz            ONLINE       0     0     0
>>>>>>>>           mirror-0    ONLINE       0     0     0
>>>>>>>>             sda       ONLINE       0     0     0
>>>>>>>>             sdb       ONLINE       0     0     0
>>>>>>>>           mirror-1    ONLINE       0     0     0
>>>>>>>>             sde       ONLINE       0     0     0
>>>>>>>>             sdf       ONLINE       0     0     0
>>>>>>>>           mirror-2    ONLINE       0     0     0
>>>>>>>>             sdg       ONLINE       0     0     0
>>>>>>>>             sdh       ONLINE       0     0     0
>>>>>>>>         logs
>>>>>>>>           mirror-3    ONLINE       0     0     0
>>>>>>>>             sdc3      ONLINE       0     0     0
>>>>>>>>             sdd3      ONLINE       0     0     0
>>>>>>>>         cache
>>>>>>>>           sdc5        ONLINE       0     0     0
>>>>>>>>           sdd5        ONLINE       0     0     0
>>>>>>>>
>>>>>>>> errors: No known data errors
>>>>>>>>
>>>>>>>> [[email protected]]
>>>>>>>> ~ # zfs list
>>>>>>>> NAME             USED  AVAIL  REFER  MOUNTPOINT
>>>>>>>> vz               432G  2.25T    36K  /vz
>>>>>>>> vz/private       427G  2.25T   111K  /vz/private
>>>>>>>> vz/private/101  17.7G  42.3G  17.7G  /vz/private/101
>>>>>>>> <snip>
>>>>>>>> vz/root          104K  2.25T   104K  /vz/root
>>>>>>>> vz/template     5.38G  2.25T  5.38G  /vz/template
>>>>>>>>
>>>>>>>> [[email protected]]
>>>>>>>> ~ # zfs get compressratio vz/private/101
>>>>>>>> NAME            PROPERTY       VALUE  SOURCE
>>>>>>>> vz/private/101  compressratio  1.38x  -
>>>>>>>>
>>>>>>>> [[email protected]]
>>>>>>>> ~ # ls /vz/private/101
>>>>>>>> private
>>>>>>>>
>>>>>>>> [[email protected]]
>>>>>>>> ~ # ls /vz/private/101/private/
>>>>>>>> aquota.group  aquota.user  b  bin  boot  dev  etc  git  home  lib
>>>>>>>> <snip>
>>>>>>>>
>>>>>>>> [[email protected]]
>>>>>>>> ~ # cat /etc/vz/conf/101.conf | grep -P "PRIVATE|ROOT"
>>>>>>>> VE_ROOT="/vz/root/101"
>>>>>>>> VE_PRIVATE="/vz/private/101/private"
>>>>>>>>
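To complete the picture, creating a new CT on such a layout boils down to roughly the following. This is only a sketch - CTID 102, the template name and the quota value are made up, and in production vpsAdmin drives these steps rather than plain vzctl:

# per-CT dataset with its own quota and compression (the ZFS quota replaces vzquota)
zfs create -o quota=60G -o compression=lz4 vz/private/102

# the extra 'private' subdirectory keeps vzctl from moving the dataset's mountpoint around
vzctl create 102 --layout simfs --private /vz/private/102/private \
    --root /vz/root/102 --ostemplate centos-6-x86_64

# run with vzquota disabled for this CT, as mentioned earlier in the thread
echo 'DISK_QUOTA="no"' >> /etc/vz/conf/102.conf
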
>>>
>>> --
>>> Sincerely yours, Pavel Odintsov
>>
>
_______________________________________________
Users mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/users
