Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-05-04 Thread Scott Steagall
On 05/04/2010 09:29 AM, Kyle McDonald wrote:
> On 3/2/2010 10:15 AM, Kjetil Torgrim Homme wrote:
>> "valrh...@gmail.com"  writes:
>>
>>   
>>> I have been using DVDs for small backups here and there for a decade
>>> now, and have a huge pile of several hundred. They have a lot of
>>> overlapping content, so I was thinking of feeding the entire stack
>>> into some sort of DVD autoloader, which would just read each disk, and
>>> write its contents to a ZFS filesystem with dedup enabled. [...] That
>>> would allow me to consolidate a few hundred CDs and DVDs onto probably
>>> a terabyte or so, which could then be kept conveniently on a hard
>>> drive and archived to tape.
>>> 
>> it would be inconvenient to make a dedup copy on harddisk or tape, you
>> could only do it as a ZFS filesystem or ZFS send stream.  it's better to
>> use a generic tool like hardlink(1), and just delete files afterwards
>> with
>>
>>   
> There is a Perl script that has been floating around the internet for years
> that will convert copies of files on the same FS to hardlinks (sorry, I don't
> have the name handy). So you don't need ZFS. Once this is done you can
> even recreate an ISO and burn it back to DVD (possibly merging hundreds
> of CDs onto one DVD (or BD!)). The script can also delete the
> duplicates, but there isn't much control over which one it keeps - for
> backups you may really want to keep the earliest (or latest?) backup the
> file appeared in.

I've used "Dirvish" http://www.dirvish.org/ and rsync to do just
that...worked great!

Scott

> 
> Using ZFS dedup is an interesting way of doing this. However, archiving
> the result may be hard. If you use different datasets (FSes) for each
> backup, you can only send one dataset at a time (since you can only
> snapshot at the dataset level). Won't that 'undo' the deduping?
>
> If you instead put all the backups in one dataset, then the snapshot can
> theoretically contain the deduped data. I'm not clear on whether
> 'send'ing it will preserve the deduping or not - or if it's up to the
> receiving dataset to recognize matching blocks. If the dedup is in the
> stream, then you may be able to write the stream to a DVD or BD.
>
> Still, if you save enough space that you can add the required level of
> redundancy, you could just leave it on disk and chuck the DVDs. Not
> sure I'd do that, but it might let me put the media in the basement
> instead of the closet, or on the desk next to me.
> 
>   -Kyle
> 
> 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-05-04 Thread Kyle McDonald
On 3/2/2010 10:15 AM, Kjetil Torgrim Homme wrote:
> "valrh...@gmail.com"  writes:
>
>   
>> I have been using DVDs for small backups here and there for a decade
>> now, and have a huge pile of several hundred. They have a lot of
>> overlapping content, so I was thinking of feeding the entire stack
>> into some sort of DVD autoloader, which would just read each disk, and
>> write its contents to a ZFS filesystem with dedup enabled. [...] That
>> would allow me to consolidate a few hundred CDs and DVDs onto probably
>> a terabyte or so, which could then be kept conveniently on a hard
>> drive and archived to tape.
>> 
> it would be inconvenient to make a dedup copy on harddisk or tape, you
> could only do it as a ZFS filesystem or ZFS send stream.  it's better to
> use a generic tool like hardlink(1), and just delete files afterwards
> with
>
>   
There is a Perl script that has been floating around the internet for years
that will convert copies of files on the same FS to hardlinks (sorry, I don't
have the name handy). So you don't need ZFS. Once this is done you can
even recreate an ISO and burn it back to DVD (possibly merging hundreds
of CDs onto one DVD (or BD!)). The script can also delete the
duplicates, but there isn't much control over which one it keeps - for
backups you may really want to keep the earliest (or latest?) backup the
file appeared in.
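
Roughly, such a script boils down to something like the following shell
sketch (hypothetical and untested - not the actual Perl script, and it
assumes GNU md5sum plus file names without embedded newlines):

  #!/bin/sh
  # Replace byte-identical copies under $TOP with hardlinks to the
  # first copy encountered (files are grouped by sorting on checksum).
  TOP=/backups
  find "$TOP" -type f -print0 | xargs -0 md5sum | sort |
  while read sum file; do
      if [ "$sum" = "$prevsum" ]; then
          ln -f "$prevfile" "$file"   # duplicate: link it to the first copy
      else
          prevsum=$sum; prevfile=$file
      fi
  done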

Using ZFS dedup is an interesting way of doing this. However, archiving
the result may be hard. If you use different datasets (FSes) for each
backup, you can only send one dataset at a time (since you can only
snapshot at the dataset level). Won't that 'undo' the deduping?

If you instead put all the backups in one dataset, then the snapshot can
theoretically contain the deduped data. I'm not clear on whether
'send'ing it will preserve the deduping or not - or if it's up to the
receiving dataset to recognize matching blocks. If the dedup is in the
stream, then you may be able to write the stream to a DVD or BD.

Still, if you save enough space that you can add the required level of
redundancy, you could just leave it on disk and chuck the DVDs. Not
sure I'd do that, but it might let me put the media in the basement
instead of the closet, or on the desk next to me.

  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-03-04 Thread Dan Pritts
On Tue, Mar 02, 2010 at 05:35:07PM -0800, R.G. Keen wrote:
> And as to automation for reading: I recently ripped and archived my entire CD 
> collection, some 500 titles. Not the same issue in terms of data, but much 
> the same in terms of needing to load/unload the disks. I went as far as to 
> think of getting/renting an autoloader, but I found that it was much more 
> efficient to keep a stack by my desk and swap disks when the ripper beeped at 
> me. This was a very low priority task in my personal stack, but over a  few 
> weeks, there were enough beeps and minutes to swap the disks out. 

I did something very similar, but with over 1000 CDs.  If you can scare
up an external DVD drive, use it too - that way you'll have to change
discs half as many times.

danno
--
Dan Pritts, Sr. Systems Engineer
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224

Internet2 Spring Member Meeting
April 26-28, 2010 - Arlington, Virginia
http://events.internet2.edu/2010/spring-mm/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-03-02 Thread R.G. Keen
This is meant with the sincerest of urges to help. 

I have a similar situation, and pondered much the same issues. However, I'm
extremely short of time as it is. I decided that my needs would be best served
by leaving the data on those backup DVDs and CDs in case I needed it. The "in
case I need it" is something that hasn't happened to me for over fifteen years
now, largely because I'm careful with what I'm working on at the moment and
back it up to other disks.

You might first want to install DVDisaster to scan those old disks and see if
they're still self-consistent. Some of them may not be readable at all, or only
partially readable.
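
If you do end up pulling everything off anyway, here's a rough command-line
sketch of reading each disc to an image while logging unreadable sectors
(using GNU ddrescue purely as an illustration; device and file names are
placeholders):

  ddrescue -b 2048 /dev/sr0 disc001.iso disc001.map
  # any sectors ddrescue couldn't read are recorded in disc001.map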

And as to automation for reading: I recently ripped and archived my entire CD
collection, some 500 titles. Not the same issue in terms of data, but much the
same in terms of needing to load/unload the disks. I went as far as to think of
getting/renting an autoloader, but I found that it was much more efficient to
keep a stack by my desk and swap disks when the ripper beeped at me. This was a
very low-priority task in my personal stack, but over a few weeks, there were
enough beeps and minutes to swap the disks out.

It's very tempting to use a neato tool - and zfs is a major neat one! - when 
there's a task to be done. However, sometimes just scratching away at a task a 
little at a time is almost as fast and much cheaper. 

Now, how did you say you set up dedup? 8-)
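
For what it's worth, a minimal sketch of that setup (pool, device and dataset
names are placeholders) - note that only data written after the dedup
property is set gets deduplicated:

  zpool create tank mirror c0t0d0 c0t1d0
  zfs create -o dedup=on tank/dvds
  zpool get dedupratio tank    # check the savings after copying discs in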
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-03-02 Thread Toby Thain


On 2-Mar-10, at 4:31 PM, valrh...@gmail.com wrote:


> Freddie: I think you understand my intent correctly.
>
> This is not about a perfect backup system. The point is that I have
> hundreds of DVDs that I don't particularly want to sort out, but
> they are pretty useless from a management standpoint in their
> current form. ZFS + dedup would be the way to at least get them all
> in one place, where at least I can search, etc.---which is pretty
> much impossible on a stack of disks.
>
> I also don't want file-level dedup, as a lot of these disks are an
> "oh, it's the end of the day; I'm going to burn what I worked on
> today, so if my computer dies I won't be completely stuck on this
> project..."

Wow, you are going to like snapshots and redundancy a whole lot
better as a solution to that.

--Toby

> File-level dedup would be a nightmare to sort out, because of lots
> of incremental changes---exactly the point of block-level dedup.
>
> This is not an organized archive at all; I just want to consolidate
> a bunch of old disks, on the off chance they could be useful, and
> do it without investing much time.
>
> So does anyone know of an autoloader solution that would do this?
> --
> This message posted from opensolaris.org


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-03-02 Thread Lori Alt

On 03/ 2/10 11:48 AM, Freddie Cash wrote:
> On Tue, Mar 2, 2010 at 7:15 AM, Kjetil Torgrim Homme <kjeti...@linpro.no> wrote:
>
>> "valrh...@gmail.com" <valrh...@gmail.com> writes:
>>
>>> I have been using DVDs for small backups here and there for a decade
>>> now, and have a huge pile of several hundred. They have a lot of
>>> overlapping content, so I was thinking of feeding the entire stack
>>> into some sort of DVD autoloader, which would just read each disk, and
>>> write its contents to a ZFS filesystem with dedup enabled. [...] That
>>> would allow me to consolidate a few hundred CDs and DVDs onto probably
>>> a terabyte or so, which could then be kept conveniently on a hard
>>> drive and archived to tape.
>>
>> it would be inconvenient to make a dedup copy on harddisk or tape, you
>> could only do it as a ZFS filesystem or ZFS send stream.  it's better to
>> use a generic tool like hardlink(1), and just delete files afterwards
>> with
>
> Why would it be inconvenient?  This is pretty much exactly what ZFS +
> dedupe is perfect for.
>
> Since dedupe is pool-wide, you could create individual filesystems for
> each DVD.  Or use just 1 filesystem with sub-directories.  Or just one
> filesystem with snapshots after each DVD is copied over top.
>
> The data would be dedupe'd on write, so you would only have 1 copy of
> unique data.
>
> To save it to tape, just "zfs send" it, and save the stream file.


Stream dedup is largely independent of on-disk dedup.  If the content is 
dedup'ed on disk but you don't specify -D to 'zfs send', the dedup'ed 
data will be re-expanded in the stream.  Even if the content is NOT 
dedup'ed on disk, the -D option will cause the blocks to be dedup'ed in 
the stream.


One advantage to using them both is that the 'zfs send -D' processing 
doesn't need to recalculate the block checksums if they already exist on 
disk.  This speeds up the send stream generation code by a lot.
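
A minimal sketch of what that looks like in practice (pool, dataset and 
file names are placeholders):

  zfs snapshot tank/dvds@consolidated
  zfs send -D tank/dvds@consolidated > /backup/dvds.zsend   # deduplicated stream
  zfs receive tank/restored < /backup/dvds.zsend            # re-expand it later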


Also, in response to another comment about the send stream format not 
being recommended for archiving: that all depends on how you intend to 
use the send stream in the future.  The format IS supported going 
forward, and future versions of zfs will continue to be capable of 
reading older send stream formats (the zfs(1M) man page has been 
modified to clarify this now).


Lori


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-03-02 Thread valrh...@gmail.com
Freddie: I think you understand my intent correctly. 

This is not about a perfect backup system. The point is that I have hundreds of 
DVDs that I don't particularly want to sort out, but they are pretty useless 
from a management standpoint in their current form. ZFS + dedup would be the 
way to at least get them all in one place, where at least I can search, 
etc.---which is pretty much impossible on a stack of disks.

I also don't want file-level dedup, as a lot of these disks are an "oh, it's the 
end of the day; I'm going to burn what I worked on today, so if my computer 
dies I won't be completely stuck on this project..." File-level dedup would be 
a nightmare to sort out, because of lots of incremental changes---exactly the 
point of block-level dedup.

This is not an organized archive at all; I just want to consolidate a bunch of 
old disks, on the off chance they could be useful, and do it without investing 
much time.

So does anyone know of an autoloader solution that would do this?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-03-02 Thread Kjetil Torgrim Homme
Freddie Cash  writes:

> Kjetil Torgrim Homme  wrote:
>
> it would be inconvenient to make a dedup copy on harddisk or tape,
> you could only do it as a ZFS filesystem or ZFS send stream.  it's
> better to use a generic tool like hardlink(1), and just delete
> files afterwards with
>
> Why would it be inconvenient?  This is pretty much exactly what ZFS +
> dedupe is perfect for.

the duplication is not visible, so it's still a wilderness of duplicates
when you navigate the files.

> Since dedupe is pool-wide, you could create individual filesystems for
> each DVD.  Or use just 1 filesystem with sub-directories.  Or just one
> filesystem with snapshots after each DVD is copied over top.
>
> The data would be dedupe'd on write, so you would only have 1 copy of
> unique data.

for this application, I don't think the OP *wants* COW if he changes one
file.  he'll want the duplicates to be kept in sync, not diverging (in
contrast to storage for VMs, for instance).

with hardlinks, it is easier to identify duplicates and handle them
however you like.  if there is a reason for the duplicate access paths
to your data, you can keep them.  I would want to straighten the mess
out, though, rather than keep it intact as closely as possible.
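
for instance (a rough illustration), once duplicates are hardlinked you can
list every multiply-linked file, grouped by inode, with something like:

  find . -type f -links +1 -printf '%i %p\n' | sort -n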

> To save it to tape, just "zfs send" it, and save the stream file.

the zfs stream format is not recommended for archiving.

> ZFS dedupe would also work better than hardlinking files, as it works
> at the block layer, and will be able to dedupe partial files.

yes, but for the most part this will be negligible.  copies of growing
files, like log files, or perhaps your novel written as a stream of
consciousness, will benefit.  unrelated partially identical files are
rare.

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-03-02 Thread Freddie Cash
On Tue, Mar 2, 2010 at 7:15 AM, Kjetil Torgrim Homme wrote:

> "valrh...@gmail.com"  writes:
>
> > I have been using DVDs for small backups here and there for a decade
> > now, and have a huge pile of several hundred. They have a lot of
> > overlapping content, so I was thinking of feeding the entire stack
> > into some sort of DVD autoloader, which would just read each disk, and
> > write its contents to a ZFS filesystem with dedup enabled. [...] That
> > would allow me to consolidate a few hundred CDs and DVDs onto probably
> > a terabyte or so, which could then be kept conveniently on a hard
> > drive and archived to tape.
>
> it would be inconvenient to make a dedup copy on harddisk or tape, you
> could only do it as a ZFS filesystem or ZFS send stream.  it's better to
> use a generic tool like hardlink(1), and just delete files afterwards
> with
>
Why would it be inconvenient?  This is pretty much exactly what ZFS +
dedupe is perfect for.

Since dedupe is pool-wide, you could create individual filesystems for each
DVD.  Or use just 1 filesystem with sub-directories.  Or just one filesystem
with snapshots after each DVD is copied over top.
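
A rough sketch of the one-filesystem-plus-snapshots variant (names are
placeholders; dedup must already be enabled on the dataset):

  zfs create -o dedup=on tank/dvds
  # for each disc: copy its contents in, then snapshot
  rsync -a /media/cdrom/ /tank/dvds/disc001/
  zfs snapshot tank/dvds@disc001
  zpool get dedupratio tank    # how much space dedup is saving so far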

The data would be dedupe'd on write, so you would only have 1 copy of unique
data.

To save it to tape, just "zfs send" it, and save the stream file.

ZFS dedupe would also work better than hardlinking files, as it works at the
block layer, and will be able to dedupe partial files.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-03-02 Thread Kjetil Torgrim Homme
"valrh...@gmail.com"  writes:

> I have been using DVDs for small backups here and there for a decade
> now, and have a huge pile of several hundred. They have a lot of
> overlapping content, so I was thinking of feeding the entire stack
> into some sort of DVD autoloader, which would just read each disk, and
> write its contents to a ZFS filesystem with dedup enabled. [...] That
> would allow me to consolidate a few hundred CDs and DVDs onto probably
> a terabyte or so, which could then be kept conveniently on a hard
> drive and archived to tape.

it would be inconvenient to make a dedup copy on harddisk or tape, you
could only do it as a ZFS filesystem or ZFS send stream.  it's better to
use a generic tool like hardlink(1), and just delete files afterwards
with

  find . -type f -links +1 -exec rm {} \;

(untested!  notice that using xargs or -exec rm {} + will wipe out all
copies of your duplicate files, so don't do that!)

  http://linux.die.net/man/1/hardlink

perhaps this is more convenient:
  http://netdial.caribe.net/~adrian2/fdupes.html
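
from memory, typical fdupes usage is roughly this (check its man page
before relying on the flags):

  fdupes -r /backups      # list groups of duplicate files, recursively
  fdupes -rd /backups     # prompt for which copy to keep, delete the rest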

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-03-01 Thread Thomas Burgess
On Mon, Mar 1, 2010 at 11:48 PM, valrh...@gmail.com wrote:

> One of the most useful things I've found with ZFS dedup (way to go Jeff
> Bonwick and Co.!) is the ability to consolidate backups. I had six different
> complete backups of all of my files spread out over various hard drives, and
> dedup allowed me to consolidate them into something that took less than twice the
> space of the original. I was thrilled when I saw this the first time.
>
> This led me to another idea: I have been using DVDs for small backups here
> and there for a decade now, and have a huge pile of several hundred. They
> have a lot of overlapping content, so I was thinking of feeding the entire
> stack into some sort of DVD autoloader, which would just read each disk, and
> write its contents to a ZFS filesystem with dedup enabled. Even if the
> autoloader had to run on Windows or Linux, I could just use a mounted drive
> to achieve the same ends. That would allow me to consolidate a few hundred
> CDs and DVDs onto probably a terabyte or so, which could then be kept
> conveniently on a hard drive and archived to tape. Does anyone know of a DVD
> autoloader that would allow me to do this easily, and if someone might be
> willing to rent one to me (I'm in the Boston area)? I only need to do this
> once.
> --
>


This would be a kick ass project to try to make with spare parts.  I might
even try it now that you bring it up.


> This message posted from opensolaris.org
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss