On Fri, Jul 15, 2011 at 5:19 AM, Jim Klimov <jimkli...@cos.ru> wrote:
> 2011-07-15 11:10, phil.har...@gmail.com wrote:
>>
>> If you clone zones from a golden image using ZFS cloning, you get fast,
>> efficient dedup for free. Sparse root always was a horrible hack!
>
> Sounds like a holy war is flaming up ;)
>
> From what I heard, sparse root zones with shared common
> system libraries allowed to save not only on disk space but
> also on RAM. Can't vouch, never tested extensively myself.

There may be some benefit to that, but I'd argue that most of the time
it is not much. Using what is surely an imperfect way of measuring,
I took a look at a zone on a Solaris 10 box that I happen to be logged
into. I found it is using about 52 MB of memory in mappings of
executables and libraries. By disabling webconsole (a Java program
that has an RSS of 100+ MB) the shared mappings drop to 40 MB.

# cd /proc
# pmap -xa * | grep r.x | grep -v ' anon ' | grep -v ' stack ' |
    grep -v ' heap ' | sort -u | nawk '{ t+= $3 } END { print t / 1024, "MB" }'
pmap: cannot examine 22427: system process
40.3281 MB

If you are running the same large application (large executable +
libraries resident in memory) in many zones, you may have additional
benefit.

Solaris 10 was released in 2005, meaning that sparse root zones were
conceived sometime in the years leading up to that. Since then,
entry-level servers have gone from 1 - 2 GB of memory (e.g. a V210 or
V240) to 12 - 16+ GB of memory (X2270 M2, T3-1). Further, large
systems tend to have NUMA characteristics that challenge the logic of
trying to maintain only one copy of hot read-only executable pages.

It just doesn't make sense to constrain the design of zones around
something that is going to save 0.3% of the memory of an entry-level
server (about 40 MB out of 12 GB). Even in 2005, I'm not so sure it
was a strong argument.

Disk space is another issue. Jim does a fine job of describing the
issues around that.

> Cloning of golden zones is of course used in our systems.
> But this approach breaks badly upon any major system
> update (e.g. LiveUpgrade to a new release) - many of the
> binaries change, and you either suddenly have the zones
> (wanting to) consume many gigabytes of disk space which
> are not there on a small rpool or a busy data pool, or you
> have to make a new golden image, clone a new set of
> zones and reinstall/migrate all applications and settings.
>
> True, this is a no-brainer for zones running a single task
> like an /opt/tomcat directory which can be tossed around
> to any OS, but becomes tedious for software with many
> packages and complicated settings, especially if (in some
> extremity) it was homebrewed and/or home-compiled and
> unpackaged ;)
>
> I am not the first (or probably last) to write about the
> inconvenience of zone upgrades, which lose the cloning benefit,
> and much of the same is true for upgrading cloned/deduped VM
> golden images as well, where the golden image is just some common
> baseline OS but the clones all run different software. And it is
> this different software which makes them useful and unique,
> and too distinct to maintain a dozen golden images efficiently
> (i.e. there might be just 2 or 3 clones of each gold).
>
> But in general, the problem is there - you either accept that
> your OS images in effect won't be deduped, much or at all,
> after some lifespan involving OS upgrades, or you don't
> update them often (which may be unacceptable for security
> and/or paranoia types of deployments), or you use some
> trickery to update frequently and not lose much disk space,
> such as automation of software and configs migration
> from one clone (of old gold) to another clone (of new gold).
>
> Dedup was a promising variant in this context, unless it
> kills performance and/or stability... which was the subject
> of this thread, with Edward's research into performance
> of current dedup implementation (and perhaps some
> baseline to test whether real improvements appear in
> the future).
>
> And in terms of performance there's some surprise in
> Edward's findings regarding e.g. reads from the deduped
> data. For infrequent maintenance (e.g. monthly upgrades)
> zoneroots (OS image part) would be read-mostly and write
> performance of dedup may not matter much.
> If the updates must pour in often for whatever reason, then write
> and delete performance of dedup may begin to matter.
>
> Sorry about straying the discussion into zones - they,
> their performance and coping with changes introduced
> during lifetime (see OS upgrades), are one good example
> for discussion of dedup, and it's one application which
> may be commonly useful on any server or workstation,
> not only on hardware built for dedicated storage.
>
> Sparse-root vs. full-root zones, or disk images of VMs;
> are they stuffed in one rpool or spread between rpool and
> data pools - that detail is not actually the point of the thread.
>
> Actual usability of dedup for savings and gains on these
> tasks (preferably working also on low-mid-range boxes,
> where adding a good enterprise SSD would double the
> server cost - not only on those big good systems with
> tens of GB of RAM), and hopefully simplifying the system
> configuration and maintenance - that is indeed the point
> in question.
>
> //Jim

--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss