Chris,
> It is a counter and a back reference. With Yan Zheng's new format
> work, the limit is not 2^64.
That means that there is one back reference for every use of the block?
Where is this back reference stored? (I'm asking because if one back
reference for every copy is stored, it can obviously […]
Hello Chris,
> > My question is now, how often can a block in btrfs be referenced?
> The exact answer depends on if we are referencing it from a single
> file or from multiple files. But either way it is roughly 2^32.
could you please explain to me what underlying data structure is used to
monitor […]
Hello Heinz,
> Hi, during the last half year I thought a little bit about doing dedup
> for my backup program: not only with fixed blocks (which is
> implemented), but with moving blocks (with all offsets in a file: 1
> byte, 2 bytes, ...). That means I have to have *lots* of comparisons
> (size
Hello Jan,
* Jan-Frode Myklebust [090504 20:20]:
> "thin or shallow clones" sounds more like sparse images. I believe
> "linked clones" is the word for running multiple virtual machines off
> a single gold image. Ref, the "VMware View Composer" section of:
not exactly. VMware has one golden image […]
Ric,
> I would not categorize it as offline, but just not as inband (i.e., you can
> run a low priority background process to handle dedup).
> Offline windows are extremely rare in production sites these days and
> it could take a very long time to do dedup at the block level over a
> large file
Hello Andrey,
> As far as I understand, VMware already ships this "gold image" feature
> (as they call it) for Windows environments and claims it to be very
> efficient.
they call it "thin or shallow clones" and ship it with desktop
virtualization (one VM per thin-client user) and for VMware lab […]
Hello Ric,
> (1) Block level or file level dedup?
what is the difference between the two?
> (2) Inband dedup (during a write) or background dedup?
I think inband dedup is way too intensive on resources (memory) and also
would kill every performance benchmark. So I think the offline dedup is
the […]
Hello Chris,
> Your database should know, and the ioctl could check to see if the
> source and destination already point to the same thing before doing
> anything expensive.
I see.
> > So, if I only have file, offset, len and not the block number, is there
> > a way from userland to tell if two
Hello Chris,
> But, in your ioctls you want to deal with [file, offset, len], not
> directly with block numbers. COW means that blocks can move around
> without you knowing, and some of the btrfs internals will COW files in
> order to relocate storage.
> So, what you want is a dedup file (or fil[…]
Hello Chris,
> You can start with the code documentation section on
> http://btrfs.wiki.kernel.org
I read through this and at the moment one question comes to mind:
http://btrfs.wiki.kernel.org/images-btrfs/7/72/Chunks-overview.png
Looking at this picture, when I'm going to implement the dedup […]
Hello Chris,
> They are, but only the crc32c are stored today.
maybe crc32c is good enough to identify duplicated blocks; I mean, we
only need a hint, and the dedup ioctl does the double checking. I will
write a perl script tomorrow, compare the results to the one that uses
md5, and report back.
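For what it's worth, the hint-then-verify idea can be sketched in a few lines (Python rather than perl; `BLOCK_SIZE` and `find_candidates` are my own names, and the fixed 4 KiB block size is an assumption): bucket block offsets by crc32, and only buckets with more than one member become candidates for the dedup ioctl to double-check.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed: blocksize equals the 4 KiB page size

def find_candidates(path):
    """Group block offsets by crc32; only groups with more than
    one member are worth handing to the (verifying) dedup ioctl."""
    index = defaultdict(list)
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            index[zlib.crc32(block)].append(offset)
            offset += BLOCK_SIZE
    return {crc: offs for crc, offs in index.items() if len(offs) > 1}
```

The weak checksum only narrows the search; false positives are expected and harmless because the ioctl (or a bytewise compare) makes the final call.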
Hello,
> > - Implement a system call that reports all checksums and unique
> > block identifiers for all stored blocks.
> This would require storing the larger checksums in the filesystem. It
> is much better done in the dedup program.
I think I misunderstood something here. I
Hello Heinz,
> I wrote a backup tool which uses dedup, so I know a little bit about
> the problem and the performance impact if the checksums are not in
> memory (optionally in that tool).
> http://savannah.gnu.org/projects/storebackup
> Dedup really helps a lot - I think more than I could imagine […]
Hello,
* Thomas Glanzmann [090428 22:10]:
> exactly. And if there is a way to retrieve the already calculated
> checksums from kernel land, then it would be possible to implement a
> "system call" that gives the kernel a hint of a possible duplicated
> block (like provid[…]
Hello,
> Not today. The sage developers sent a patch to make an ioctl for this,
> but since it was hard coded to crc32c I haven't taken it yet.
could you send me the patch? I would love to make it work for arbitrary
checksums and resubmit.
Thomas
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message […]
Hello Heinz,
> It's not only cpu time, it's also memory. You need 32 bytes for each 4k
> block. It needs to be in RAM for performance reasons.
exactly and that is not going to scale.
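Heinz's figures make the scaling problem easy to put a number on. A back-of-the-envelope sketch (my arithmetic, assuming his 32 bytes of in-memory state per 4 KiB block):

```python
BLOCK = 4 * 1024        # bytes per filesystem block
ENTRY = 32              # bytes of in-memory checksum state per block
data = 1 * 1024 ** 4    # 1 TiB of data to deduplicate

blocks = data // BLOCK  # 268,435,456 blocks
ram = blocks * ENTRY    # 8 GiB of RAM just for the checksum table
print(ram // 1024 ** 3, "GiB")
```

So a single terabyte already wants 8 GiB of RAM for the table alone, and it grows linearly with the data.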
Thomas
Hello Chris,
> Yes, but for the purposes of dedup, it's not exactly what you want.
> You want an index by checksum, and the current btrfs code indexes by
> logical byte number in the disk.
that would be good for online dedup, but in practice that is not going
to work, or at least I don't see how.
> So you
Hello Michael,
> I'd start with a crc32 and/or MD5 to find candidate blocks, then do a
> bytewise comparison before actually merging them. Even the risk of an
> accidental collision is too high, and considering there are plenty of
> birthday-style MD5 attacks it would not be extraordinarily difficult […]
Hello Chris,
> Right now the blocksize can only be the same as the page size. For
> this external dedup program you have in mind, you could use any
> multiple of the page size.
perfect. Exactly what I need.
> Three days is probably not quite enough ;) I'd honestly prefer the
> dedup happen entirely […]
Hello,
> It is possible, there's room in the metadata for about 4k of
> checksum for each 4k of data. The initial btrfs code used sha256, but
> the real limiting factor is the CPU time used.
I see. There are very efficient md5 implementations out there,
especially if the code is written […]
Hello Chris,
> > Is there a checksum for every block in btrfs?
> Yes, but they are only crc32c.
I see, is it easily possible to exchange that for sha-1 or md5?
> > Is it possible to retrieve these checksums from userland?
> Not today. The sage developers sent a patch to make an ioctl for
> this, but since it was hard coded to crc32c I haven't taken it yet.
Hello,
> I wouldn't rely on crc32: it is not a strong hash.
> Such deduplication can lead to various problems,
> including security ones.
sure thing. Did you think of replacing crc32 with sha1 or md5? Is this
even possible (is there enough space reserved so that the change can be
done without cha[…]
Hello Tomasz,
> Did you just compare checksums, or did you also compare the data "bit
> after bit" if the checksums matched?
no, I just used the md5 checksum. And even if I hit a hash collision,
which is highly unlikely, it still gives a good rough estimate.
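"Highly unlikely" is easy to quantify with the birthday bound (my arithmetic, and it covers accidental collisions only; the deliberate MD5 attacks mentioned elsewhere in the thread are a separate matter):

```python
# Birthday bound: the chance of any accidental collision among n
# uniformly random 128-bit digests is roughly n**2 / 2**129.
n = 2 ** 28                  # ~268 million blocks (1 TiB at 4 KiB/block)
p = n ** 2 / 2.0 ** 129      # = 2**-73, on the order of 1e-22
```

For honest data this is negligible; it is attacker-supplied blocks that make the bytewise verification step non-optional.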
Thomas
Hello,
I have a few more questions about this:
- Is there a checksum for every block in btrfs?
- Is it possible to retrieve these checksums from userland?
- Is it possible to use a blocksize of 4 or 8 kbyte with btrfs?
To get a bit more specific: If it is relatively easy to
Chris,
what blocksizes can I choose with btrfs? Do you think that it is
possible for an outsider like me to submit patches to btrfs which enable
dedup in three full-time days?
Thomas
Hello Chris,
> There is a btrfs ioctl to clone individual files, and this could be used
> to implement an online dedup. But, since it is happening from userland,
> you can't lock out all of the other users of a given file.
> So, the dedup application would be responsible for making sure a given
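The clone ioctl Chris describes can be invoked from userland roughly like this. A sketch only: the struct layout and ioctl number below are as I read them from kernel headers of that era (BTRFS_IOC_CLONE_RANGE, magic 0x94, command 13), so double-check them against your tree before relying on this.

```python
import fcntl
import struct

# struct btrfs_ioctl_clone_range_args (assumed layout):
#   __s64 src_fd; __u64 src_offset; __u64 src_length; __u64 dest_offset;
ARGS_FMT = "qQQQ"  # 32 bytes on a 64-bit build
# _IOW(0x94, 13, 32-byte struct)
BTRFS_IOC_CLONE_RANGE = (
    (1 << 30) | (struct.calcsize(ARGS_FMT) << 16) | (0x94 << 8) | 13
)

def clone_range(dst_fd, src_fd, src_off, length, dst_off):
    """Ask btrfs to make dst share src's extents (no data is copied).
    Both descriptors must be on the same btrfs filesystem, and the
    caller owns the locking caveat Chris describes: nothing stops
    another writer from racing this call."""
    args = struct.pack(ARGS_FMT, src_fd, src_off, length, dst_off)
    fcntl.ioctl(dst_fd, BTRFS_IOC_CLONE_RANGE, args)
```

Because the sharing happens via COW extents, a later write to either file simply breaks the sharing for the written range rather than corrupting the other copy.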
Hello,
I would like to know if it would be possible to implement the following
feature in btrfs:
Have an online filesystem check which accounts for possible duplicated
data blocks (maybe with the help of already implemented checksums: Are
these checksums for the whole file or block-based?) and de[…]