Quoting Oliver Mattos on Thu, Dec 11 00:18:
>
> > It would be interesting to see how many duplicate *blocks* there are
> > across the filesystem, agnostic to files...
Here is my contribution. It's a Perl script that goes through every
block (of various block sizes) on a device and prints out sum…
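A minimal sketch of that kind of block scanner (untested; MD5, the
4096-byte default block size, and keeping every digest in memory are all
assumptions, so it only suits modestly sized devices):

#!/usr/bin/perl
# Hash every fixed-size block on a device (or in a large file) and
# report how many blocks are exact duplicates of an earlier block.
use strict;
use warnings;
use Digest::MD5 qw(md5);

my $path = shift or die "usage: $0 <device-or-file> [blocksize]\n";
my $bs   = shift || 4096;

open my $fh, '<:raw', $path or die "open $path: $!\n";
my %seen;
my ($total, $dups) = (0, 0);
while (my $n = read($fh, my $buf, $bs)) {
    last if $n < $bs;                  # ignore the trailing partial block
    $total++;
    $dups++ if $seen{ md5($buf) }++;   # digests are kept in RAM
}
close $fh;
printf "%d blocks of %d bytes, %d duplicates (%.2f%%)\n",
    $total, $bs, $dups, $total ? 100 * $dups / $total : 0;

Running it with a few block sizes (say 1024, 4096, 16384) shows how the
duplicate ratio drops as blocks grow.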
On Thu, 11 Dec 2008 8:19:03 am Ray Van Dolson wrote:
> I'm not sure why this hasn't caught on, but as soon as a solid and fast
> implementation of it exists in the Linux world I really think it can
> catch on for VM datastores. I know we've hollered at Sun as to why
> they haven't rolled it out…
On Wed, 2008-12-10 at 17:53 +0000, Oliver Mattos wrote:
> > > 2) Keep a tree of checksums for data blocks, so that a bit of data can
> > > be located by its checksum. Whenever a data block is about to be
> > > written check if the block matches any known block, and if it does then
> > > don't bother duplicating the data on disk. I suspect this option may…
> Neat. Thanks much. It'd be cool to output the results of each of your
> hashes to a database so you can get a feel for how many duplicate
> blocks there are across files as well.
>
> I'd like to run this in a similar setup on all my VMware VMDK files and
> get an idea of how much space savings…
On Thu, Dec 11, 2008 at 03:42:58AM +0000, Oliver Mattos wrote:
> Here is a script to locate duplicate data WITHIN files:
>
> On some test file sets of binary data with no duplicated files, about 3%
> of the data blocks were duplicated, and about 0.1% of the data blocks
> were nulls. The data was mainly ELF and Win32 binaries plus some random
> game data, office documents…
Here is a script to locate duplicate data WITHIN files:
On some test file sets of binary data with no duplicated files, about 3%
of the data blocks were duplicated, and about 0.1% of the data blocks
were nulls. The data was mainly ELF and Win32 binaries plus some random
game data, office documents…
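The same hashing loop, applied per file, reproduces the WITHIN-files
numbers above; this untested sketch also counts all-zero blocks (the
4096-byte block size is arbitrary):

#!/usr/bin/perl
# For each file named on the command line, count blocks that repeat
# within that same file, and blocks that are all zeroes.
use strict;
use warnings;
use Digest::MD5 qw(md5);

my $bs   = 4096;
my $null = md5("\0" x $bs);            # digest of an all-zero block

for my $file (@ARGV) {
    open my $fh, '<:raw', $file or do { warn "open $file: $!\n"; next };
    my %seen;
    my ($total, $dups, $nulls) = (0, 0, 0);
    while (my $n = read($fh, my $buf, $bs)) {
        last if $n < $bs;
        my $d = md5($buf);
        $total++;
        $nulls++ if $d eq $null;
        $dups++  if $seen{$d}++;
    }
    close $fh;
    printf "%s: %d blocks, %d duplicated, %d null\n",
        $file, $total, $dups, $nulls;
}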
> It would be interesting to see how many duplicate *blocks* there are
> across the filesystem, agnostic to files...
>
> Is this something your script does, Oliver?
My script doesn't exist yet, although once written it would, yes. I was
thinking of just making a BASH script and using dd to extract…
On Wed, Dec 10, 2008 at 01:57:54PM -0800, Tracy Reed wrote:
> On Wed, Dec 10, 2008 at 09:42:16PM +0000, Oliver Mattos spake thusly:
> > I'm considering writing that script to test on my ext3 disk just to see
> > how much duplicate wasted data I really have.
>
> Check out the fdupes command. In Fedora 8 it is in the yum repo as
> fdupes-1.40-10.fc8
On Wed, 2008-12-10 at 13:57 -0800, Tracy Reed wrote:
> On Wed, Dec 10, 2008 at 09:42:16PM +0000, Oliver Mattos spake thusly:
> > I'm considering writing that script to test on my ext3 disk just to see
> > how much duplicate wasted data I really have.
>
> Check out the fdupes command. In Fedora 8 it is in the yum repo as
> fdupes-1.40-10.fc8
On Wed, Dec 10, 2008 at 09:42:16PM +0000, Oliver Mattos spake thusly:
> I'm considering writing that script to test on my ext3 disk just to see
> how much duplicate wasted data I really have.
Check out the fdupes command. In Fedora 8 it is in the yum repo as
fdupes-1.40-10.fc8
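For a quick feel for the numbers, something like the following prints a
summary instead of the full duplicate list (-r recurses into
directories; -m/--summarize is an assumption about your fdupes version,
so check the man page; the path is just an example):

fdupes -r -m /home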
--
Tracy Reed
http…
I see quite a few uses for this, and while it looks like the kernel-mode
automatic de-dup-on-write code might carry a performance cost, require
disk format changes, and be controversial, it sounds like the user-mode
utility could be implemented today.
It looks like a simple script could do the job -
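For instance, at whole-file granularity, a rough, untested sketch: it
hardlinks byte-identical files, which is only safe when the copies
really are interchangeable, since a write through one name then shows up
under all of them (the --link flag and the MD5 choice are my own
assumptions here):

#!/usr/bin/perl
# Rough user-mode dedup at whole-file granularity: hash every regular
# file under the given directories and report duplicates; with --link,
# replace each duplicate with a hardlink to the first copy seen.
use strict;
use warnings;
use File::Find;
use Digest::MD5;

my $link = grep { $_ eq '--link' } @ARGV;
my @dirs = grep { $_ ne '--link' } @ARGV
    or die "usage: $0 [--link] dir...\n";

my %by_digest;                         # digest => [paths]
find(sub {
    return unless -f $_ && !-l $_;     # regular files only, no symlinks
    open my $fh, '<:raw', $_ or return;
    my $d = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    push @{ $by_digest{$d} }, $File::Find::name;
}, @dirs);

for my $paths (values %by_digest) {
    next unless @$paths > 1;
    my ($keep, @dups) = @$paths;
    for my $dup (@dups) {
        print "duplicate: $dup == $keep\n";
        # NOTE: unlink+link is not atomic, and a checksum match should
        # really be confirmed by a byte compare before doing this.
        if ($link) {
            unlink $dup and link $keep, $dup
                or warn "relink $dup: $!\n";
        }
    }
}

This only reclaims whole-file duplicates like the a.iso/b.iso case;
block-level sharing, as discussed elsewhere in the thread, would still
need filesystem support.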
I lost the original post so I'm jumping in at the wrong thread-point :)
Someone mentioned that the primary usage of de-dup is in the backup
realm. True perhaps currently, but de-dup IMO is *the* killer app in
the world of virtualization and is a huge reason why we're picking
> NetApp at work to back…
On Wed, 2008-12-10 at 13:07 -0700, Anthony Roberts wrote:
> > When a direct read comparison is required before sharing blocks, it
> > is probably best done by a stand-alone utility, since we don't want
> > to wait for a read of a full extent every time we want to write one.
>
> Can a stand-alone…
> > 2) Keep a tree of checksums for data blocks, so that a bit of data can
> > be located by its checksum. Whenever a data block is about to be
> > written check if the block matches any known block, and if it does then
> > don't bother duplicating the data on disk. I suspect this option may…
>
De-duplication is useful in data backup systems because of the high level of
data redundancy, but I'm not sure whether it is necessary for a
general-purpose fs. If you really want to do so, I would suggest the latter.
File-level de-dup can be implemented in a user-level application.
As I go…
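To make option 2 concrete, here is a toy user-space model of that write
path (in-memory only; MD5 and the data-structure names are arbitrary
choices, and a byte compare guards against checksum collisions; real
filesystem code would keep the index on disk and cope with COW):

#!/usr/bin/perl
# Toy model of dedup-on-write: a block store indexed by checksum.
# write_block() returns a block id; identical payloads share one id.
use strict;
use warnings;
use Digest::MD5 qw(md5);

my @blocks;                            # id => block contents
my %index;                             # checksum => [candidate ids]

sub write_block {
    my ($data) = @_;
    my $sum = md5($data);
    # The checksum only nominates candidates; a byte compare decides,
    # because two different blocks can share a checksum.
    for my $id (@{ $index{$sum} || [] }) {
        return $id if $blocks[$id] eq $data;   # share the existing block
    }
    push @blocks, $data;                       # genuinely new data
    push @{ $index{$sum} }, $#blocks;
    return $#blocks;
}

# Three writes, two identical: only two blocks end up stored.
my @ids = map { write_block($_) } ('A' x 4096, 'B' x 4096, 'A' x 4096);
print "ids: @ids; blocks stored: ", scalar(@blocks), "\n";   # ids: 0 1 0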
On Tue, 2008-12-09 at 22:48 +, Oliver Mattos wrote:
> Hi,
>
> Say I download a large file from the net to /mnt/a.iso. I then download
> the same file again to /mnt/b.iso. These files now have the same
> content, but are stored twice since the copies weren't made with the bcp
> utility.
>
Hi all,
On Tue, Dec 9, 2008 at 10:48 PM, Oliver Mattos
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> Say I download a large file from the net to /mnt/a.iso. I then download
> the same file again to /mnt/b.iso. These files now have the same
> content, but are stored twice since the copies weren't made with the bcp
> utility.
Hi,
Say I download a large file from the net to /mnt/a.iso. I then download
the same file again to /mnt/b.iso. These files now have the same
content, but are stored twice since the copies weren't made with the bcp
utility.
The same occurs if a directory tree with duplicate files (created with
b