> And would drive storage requirements through the roof!!

The interesting part is, Nathan, you're probably wrong.

First, though: some of my contacts in the enterprise gladly spent millions 
on third-party applications running on Microsoft to do exactly that.
[But we all know that SUN is famous for almost always missing the departing 
train.]

I have no proof for what I state, but my hypothesis is just the opposite.
That increasing backup frequency requires more storage, I think we can all 
agree on. It is therefore my assumption that, at some moment in time,
1. cron-like backup jobs will consume more and more resources (undoubtedly), and
2. as the density of cron-like (time-sliced) backups increases, the amount of 
metadata is going to exceed the amount of actual changes (see the sketch below).
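
A minimal back-of-the-envelope model of that crossover point, in Python, with 
purely assumed numbers for the per-snapshot metadata cost and the rate of 
change:

    # Toy model: when does fixed per-snapshot metadata overtake the
    # change data actually captured? All numbers are assumptions.
    META_PER_SNAPSHOT = 256 * 1024      # bytes of metadata per snapshot (assumed)
    CHANGE_RATE = 50 * 1024 * 1024      # bytes changed per hour (assumed)

    for interval_min in (1440, 60, 10, 1):        # daily, hourly, 10-min, 1-min
        change = CHANGE_RATE * interval_min / 60  # change data per snapshot
        print(f"every {interval_min:4d} min: "
              f"metadata/change = {META_PER_SNAPSHOT / change:.3f}")

    # Metadata wins once the interval drops below META/RATE hours:
    print(f"crossover at ~{META_PER_SNAPSHOT / CHANGE_RATE * 60:.2f} min")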

Of course, you are right that it depends on the applications. Still, I would 
guess that, very roughly, an hourly backup (TimeMachine) is already close to 
that point, at least on an average home box. (Of course, you don't include 
/tmp in *any* such observations!)
The disadvantages of something like TimeMachine are manifold; the worst is 
that it works at the level of files.
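
To make the file-level problem concrete, with made-up numbers: a file-level 
scheme (link-and-copy, TimeMachine-style) has to re-store a whole file when a 
single byte changes, while a block-level scheme stores only the affected 
block. The 128 KiB block size below is an assumption (it happens to be the 
ZFS default recordsize):

    # File-level vs block-level cost of a one-byte change. Illustrative only.
    FILE_SIZE  = 1 * 1024**3    # a 1 GiB file (assumed)
    BLOCK_SIZE = 128 * 1024     # 128 KiB blocks (assumed)

    file_level_cost  = FILE_SIZE    # the whole file is copied again
    block_level_cost = BLOCK_SIZE   # only the containing block is stored

    print(f"file level : {file_level_cost / 1024**2:8.1f} MiB")
    print(f"block level: {block_level_cost / 1024**2:8.3f} MiB")
    print(f"overhead   : {file_level_cost // block_level_cost}x")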

Someone mentioned that doing it through the application was more useful. 
Welcome to the 20th century. The argument is wrong, by the way, since such 
logs (RDBMS transaction logs) are high-level, usually human-readable append 
operations on the file system, made through the application and the operating 
system.
Yes, they *are* useful, very useful, for a limited number of specific 
operations. But the argument fades very much within the context we discuss 
here: 'backup' is not 'archive'; it has to be comprehensive. So doing the 
task of CDP at that level is suicide for system resources (cycles) as well 
as for storage space.
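
A rough illustration of the cost, with hypothetical sizes: a human-readable 
log entry describing a change is much larger than the bytes it actually 
changes on disk, and it covers only that one application's data:

    # An RDBMS-style log describes a change textually; the physical change
    # may be a handful of bytes. Sizes are illustrative assumptions.
    log_entry = ("2007-11-05 12:34:56 UPDATE accounts "
                 "SET balance = 1042 WHERE id = 7;\n")
    physical_change = (1042).to_bytes(4, "little")  # a 4-byte integer column

    print(f"log entry      : {len(log_entry.encode())} bytes")
    print(f"physical change: {len(physical_change)} bytes")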

Back to my hypothesis: increasing backup frequency increases the amount of 
data stored (my hypothesis: metadata beyond 'change data'), while offering 
'only' a near-CDP experience. Once the time slots are done away with, the 
only data to be stored is the actual 'change data', meaning that the amount 
of metadata is greatly reduced, and one can achieve real CDP. A prerequisite, 
though, is probably that 'file change' is not tracked on a system-wide level, 
but local to the change, i.e. the 'write' operation. And, of course, it is 
obligatory to do this at the block level, incrementally, you-name-it.
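
A toy version of such a per-write journal, assuming a hypothetical hook into 
the write path (names and interface are made up for illustration):

    # Toy block-level CDP journal: each write stores only the changed bytes
    # plus a few bytes of metadata; replaying up to any sequence number
    # reconstructs the state at that point -- no time slots involved.
    from itertools import count

    class CdpJournal:
        def __init__(self, size: int):
            self.size = size
            self.journal = []        # (seq, offset, data) tuples
            self._seq = count(1)

        def write(self, offset: int, data: bytes) -> int:
            seq = next(self._seq)
            self.journal.append((seq, offset, data))
            return seq

        def state_at(self, upto_seq: int) -> bytearray:
            dev = bytearray(self.size)
            for seq, offset, data in self.journal:
                if seq > upto_seq:
                    break
                dev[offset:offset + len(data)] = data
            return dev

    j = CdpJournal(size=64)
    s1 = j.write(0, b"hello")
    j.write(0, b"HELLO")
    assert bytes(j.state_at(s1)[:5]) == b"hello"  # recover any past state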

I challenge you to follow up on this matter. My interest arose from a 
presentation I'll be giving shortly at a conference. To me, when preparing, 
it was obvious that ZFS can do this (CDP). I just wanted to make sure; 
surprise, surprise.
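
For reference, the near-CDP baseline I had in mind: dense ZFS snapshots on a 
timer, driven here from Python ('tank/home' is a made-up dataset name; 'zfs 
snapshot' itself is the real command):

    # Near-CDP via frequent copy-on-write snapshots -- cheap, but still
    # time-sliced, so it is not true CDP.
    import subprocess, time

    DATASET = "tank/home"    # hypothetical dataset
    INTERVAL = 60            # seconds between snapshots

    while True:
        stamp = time.strftime("%Y%m%d-%H%M%S")
        subprocess.run(["zfs", "snapshot", f"{DATASET}@cdp-{stamp}"],
                       check=True)
        time.sleep(INTERVAL)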
Now I am really interested in proving my hypothesis: that, regardless of the 
technology involved, the 'change data', i.e. the amount of actual changes to 
be stored for CDP, is below the amount of data (and resources) required for 
near-CDP at the level of files.

Contact me offline if you feel like sponsoring this research.

Uwe