Hi Alan,

Commercial tools promise this ability. How do they get the block-to-file
mapping to do the restore? I was looking for a way to do that so I could
do the same using LVM snapshots.
You cannot go from block to file. To start with, when restoring, the block may
already have been reused for another file.

I suppose they use something like inotify (or their own virtual file system driver over a real file system, like NFS or a loop fs) to learn about changed blocks, then find which file each block belongs to and save this info in their backup catalog. If the changed block is filesystem (or md device, or LVM) metadata, they have to understand this and either log the change appropriately or ignore it, as it's not file data.

I can imagine something like this working, and even how to program it. And I'm a little scared of some backup tool monitoring my file accesses all the time. ;-)

So I won't find anything similar among open source tools, not even a kernel API to help me if I want to implement it myself?

You can go file to block list, but that's only for some file systems and
not really reliable except for an unmounted snapshot.
As long as the goal is to capture the data, I can't see why it couldn't be done in a reliable way. I'm not saying it would be trivial. But all file changes have to go through the kernel, even if they are kept in memory before going to the disk, so it should be possible for a daemon to be notified about all changes and get the data. It's just a matter of having a kernel API. I suppose inotify would be it.

But LVM snapshots are a "whole" disk. If I try to back them up using dd
or rsync, it is the same as a full backup. How do I back up just the
blocks changed in the snapshot and later restore them (of course after
restoring the full volume, or to a mirror)?
What the snapshot gives you is an atomic copy of the file system so you
can do a full file system copy, or backup the snapshot without the stuff
underneath changing. It's basically a way to get an unmounted, out of use
copy cheaply that you can then use for stuff.
No questions about this. I want to move further. Doing a dump or an rsync from a snapshot of a multi-TB filesystem costs the same as doing it from the original volume. I want to devise a faster way to do this without sacrificing reliability.

Correct - the only way to check any copy is valid is by comparing the original to the copy. That in fact (plus clever magic) is how rsync works, so in effect the way to check if an rsync copy is valid is to try and rsync it again. Doing a set of sha or md5sums on the two sides and comparing the output now and then ought to provide a further check.
That means more time spent on what's already too slow. There could be an rsync or DRBD tool that calculates, stores and sends hashes on the fly, so the remote copy could be checked on its own.
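
To make the checksum comparison above concrete, here is a minimal sketch in Python. It assumes both trees are reachable as local paths (for a remote copy the digests would have to be computed on each side and exchanged) and that only regular files matter. The function names hash_file, tree_digests and compare_trees are mine, made up for illustration, not from any existing tool.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = hash_file(full)
    return digests


def compare_trees(original, copy):
    """Return (missing, extra, differing) relative paths, each sorted."""
    a = tree_digests(original)
    b = tree_digests(copy)
    missing = sorted(set(a) - set(b))      # in original, not in copy
    extra = sorted(set(b) - set(a))        # in copy, not in original
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, differing
```

Three empty lists mean the copy matches. Note this reads every byte on both sides, which is exactly the cost complained about above; saving the output of tree_digests at backup time would at least let the copy be checked later without touching the original.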

There has to be a better way to restore a few TB of backup consisting of
lots of small files. :-(
Is the issue backing up or restoring?
The main issue is backing up every day, even many times a day. But for me there's no value in a speedy backup which I cannot restore reliably, not just from the computer standpoint. Someone (people) has to find which backup sets are needed to do the restore. They need to be able to check these backup sets before or during the restore.

  If it is backing up then it may be
possible to work out which blocks are different between two snapshots and
transfer just those.

How? Can anyone on the list provide hints?
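
The only way I can sketch myself is brute force: read two snapshots (or images of them) side by side and keep the blocks that differ. This is a minimal Python sketch, not how LVM or any commercial tool does it. It assumes both inputs have the same size and can be opened as ordinary files or block devices; the names changed_blocks and apply_blocks and the 4096-byte block size are my own choices for illustration.

```python
def changed_blocks(old_path, new_path, block_size=4096):
    """Yield (block_index, new_data) for every block that differs.

    Assumes both inputs have the same length.
    """
    with open(old_path, "rb") as old, open(new_path, "rb") as new:
        index = 0
        while True:
            a = old.read(block_size)
            b = new.read(block_size)
            if not a and not b:
                break  # reached the end of both inputs
            if a != b:
                yield index, b
            index += 1


def apply_blocks(target_path, blocks, block_size=4096):
    """Write the (block_index, data) pairs into an existing target."""
    with open(target_path, "r+b") as target:
        for index, data in blocks:
            target.seek(index * block_size)
            target.write(data)
```

This cuts what has to be transferred and stored, but not what has to be read: both snapshots are still scanned from start to end. A real solution would need the list of changed blocks from the layer that already tracks them.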

I don't know the innards of the LVM layer well
enough to know if there is a clever way to do that. I'm also not sure it
would help if the blocks are scattered about as it would still be a lot
of seeking.
That "clever way" seems to be what commercial tools promise, but they don't tell me what they use: which kernel API, their own driver, or whether they work only with this or that network storage... :-( I don't trust anything if I can't understand how it works. All the "magical" solutions I found previously proved to be no solution at all.

I'm seeing that the file tree walk takes too long, just to find that most files weren't changed, even when relying on last modification time, so if I could get a list of blocks to back up it should be faster (fewer disk seeks).
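
For reference, the kind of walk I mean looks roughly like the following Python sketch. It is only an illustration of the approach, not the code of any backup tool I use; the function name changed_since and the idea of passing the previous backup time as a Unix timestamp are assumptions of mine.

```python
import os


def changed_since(root, last_backup_time):
    """Return paths of regular files modified after last_backup_time.

    last_backup_time is a Unix timestamp (seconds since the epoch).
    """
    changed = []
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    if st.st_mtime > last_backup_time:
                        changed.append(entry.path)
    return changed
```

Every file still costs one stat call even when nothing changed, so with millions of small files the walk itself dominates the backup time.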

It shouldn't be too hard to implement a daemon using inotify and some queueing strategy to deal with changed file blocks, add metadata, then compress and send them elsewhere. On the same machine, if I read a changed block I should get its correct data, even if it wasn't synced yet. But I can't find anyone who did this as open source, so maybe there's some problem I haven't seen yet. And it would take me too long to implement and debug by myself. Any developers out there looking for alpha testers for their new, revolutionary backup tool? ;-)
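
As a starting point for such a daemon, here is a minimal, Linux-only Python sketch that calls inotify through ctypes. The helper names watch_directory and read_events are made up for illustration; the constants are the values from the kernel's inotify header. Two limits are worth knowing up front: inotify reports which file changed, not which blocks or byte ranges, and a watch covers a single directory, not its subdirectories.

```python
import ctypes
import ctypes.util
import os
import struct

# Event mask bits from <sys/inotify.h>.
IN_CLOSE_WRITE = 0x00000008
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

# struct inotify_event header: int wd; uint32 mask, cookie, len.
_EVENT_HEADER = struct.Struct("iIII")

_libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6",
                    use_errno=True)
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p,
                                    ctypes.c_uint32]


def watch_directory(path,
                    mask=IN_CLOSE_WRITE | IN_CREATE | IN_DELETE | IN_MOVED_TO):
    """Start watching one directory (not recursive); return the inotify fd."""
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    if _libc.inotify_add_watch(fd, os.fsencode(path), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def read_events(fd):
    """Block until events arrive, then return a list of (mask, name)."""
    buf = os.read(fd, 65536)
    events = []
    offset = 0
    while offset < len(buf):
        _wd, mask, _cookie, name_len = _EVENT_HEADER.unpack_from(buf, offset)
        offset += _EVENT_HEADER.size
        # The name is NUL-padded up to name_len bytes.
        name = buf[offset:offset + name_len].split(b"\0", 1)[0]
        offset += name_len
        events.append((mask, os.fsdecode(name)))
    return events
```

A daemon would call read_events in a loop and queue each file name whose mask has IN_CLOSE_WRITE set. It would still have to add watches for every subdirectory and cope with events lost when the kernel queue overflows, which may be part of why this is harder than it looks.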

[]s, Fernando Lozano
