Oh, I agree. Much of the duplication described is clearly the result of "bad design" in many of our systems. After all, most of an OS can be served off the network (diskless systems etc.). But much of the dupe I'm talking about has less to do with failing to use the most efficient system administration tricks. Rather, it comes from the fact that software (e.g. Samba) is used by people, and people don't always do things efficiently.
Case in point: students in one of our courses were hitting their quota, growing by around 8GB per day. Rather than simply agree that "these kids need more space," we had a look at the files. Turns out just about every student had copied a 600MB file into their own directory. Another student had created it to be used as a "template" for many of their projects. Nobody understood that they could use the file right where it sat. Nope. 7GB of dupe data. And these students are even familiar with our practice of putting "class media" on a read-only share (these files serve as similar "templates" for their own projects - you can create a full video project with just a few MB in your "project file" this way).

So, while much of the situation is caused by "bad data management," there aren't always systems we can employ to prevent it. Done right, dedup can certainly be "worth it" for my operations. Yes, teaching the user the "right thing" is useful, but that user isn't there to learn how to "manage data" for my benefit. They're there to learn how to be filmmakers, journalists, speech pathologists, etc.

Charles

On 7/7/08 9:24 PM, "Bob Friesenhahn" <[EMAIL PROTECTED]> wrote:

> On Mon, 7 Jul 2008, Mike Gerdts wrote:
>>
>> As I have considered deduplication for application data I see several
>> things happen in various areas.
>
> You have provided an excellent description of gross inefficiencies in
> the way systems and software are deployed today, resulting in massive
> duplication. Massive duplication is used to ease service deployment
> and management. Most of this massive duplication is not technically
> necessary.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss