Oh, I agree.  Much of the duplication described is clearly the result of
"bad design" in many of our systems.  After all, most of an OS can be served
off the network (diskless systems, etc.).  But much of the duplication I'm
talking about has less to do with failing to use the most efficient system
administration tricks.  Rather, it's about the fact that software (e.g. Samba)
is used by people, and people don't always do things efficiently.

Case in point:  students in one of our courses were hitting their quota, with
usage growing by around 8GB per day.  Rather than simply agreeing that "these
kids need more space," we had a look at the files.  It turns out just about
every student had copied the same 600MB file into their own directory,
because another student had created it as a "template" for many of their
projects.  Nobody understood that they could use the file right where it sat.
Nope.  That was 7GB of duplicate data.  And these students are even familiar
with our practice of putting "class media" on a read-only share (those files
serve as similar "templates" for their own projects; you can create a full
video project with just a few MB in your "project file" this way).
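For anyone wanting to run the same kind of check, here's a minimal sketch of
finding whole-file duplicates by content hash.  It's a hypothetical
illustration, not necessarily how we found ours: the function names are made
up, and it assumes you point it at a local directory tree such as the
students' home directories.  It groups files by size first, so only files
that could possibly match get hashed, then groups same-size files by SHA-256.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map digest -> list of paths for files under root with identical content.

    Only groups with more than one member are returned.  Symlinks are skipped.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(dupes):
    """Bytes that would be freed if each duplicate group were stored once."""
    return sum(os.path.getsize(paths[0]) * (len(paths) - 1)
               for paths in dupes.values())
```

Something like `reclaimable_bytes(find_duplicates(root))` gives a rough
figure for how much space whole-file copies are wasting.  Note that this only
catches identical whole files; it won't see partial overlap between files.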

So, while much of the situation is caused by "bad data management," there
aren't always systems we can employ to prevent it.  Done right, dedup can
certainly be "worth it" for my operations.  Yes, teaching users the "right
thing" is useful, but they aren't there to learn how to "manage data" for my
benefit.  They're there to learn how to be filmmakers, journalists, speech
pathologists, etc.

Charles


On 7/7/08 9:24 PM, "Bob Friesenhahn" <[EMAIL PROTECTED]> wrote:

> On Mon, 7 Jul 2008, Mike Gerdts wrote:
>> 
>> As I have considered deduplication for application data I see several
>> things happen in various areas.
> 
> You have provided an excellent description of gross inefficiencies in
> the way systems and software are deployed today, resulting in massive
> duplication.  Massive duplication is used to ease service deployment
> and management.  Most of this massive duplication is not technically
> necessary.


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss