Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2010-06-15 Thread Jim Klimov
Hi all

I wonder if there has been any new development on this matter over the past 6
months.

Today I pondered the idea of a ZFS-aware mv, capable of doing zero read/write of
file data when moving files between datasets of one pool.

This seems like the (z)cp idea proposed in this thread, and seems like a trivial
job for Sun - who have all the APIs and functional implementations for cloning and
dedup as a means to reference the same block from different files.

Such an extension to cp should be cheaper than generic dedup and useful for
copying any templated file sets. I thought of local zones first, but most
people may init them by packages (though zoneadm says it is copying thousands
of files), so /etc/skel might be a better example of the use case - though
nearly useless ;)

jim
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread Per Baatrup
Michael,

Your explanation is 100% correct: I am concerned about the effort when managing
quite large files, e.g. 500MB.

In my specific case we have DVD/Blu-ray chapter files of 500MB - 2GB (parts of a
movie) that are concatenated into a complete movie (3-20GB).

From my point of view (large files) it is not so important whether there is a 
minor issue with handling the last block in terms of disk space efficiency.

--Per
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread Jeffry Molanus
Actually, I asked about this a while ago, only I called it file-level cloning.
Consider: you have 100 VMs and you want to clone just one?

BTRFS added a specialized ioctl() call to make the FS aware that it has to
clone; this obviously saves copy time and dedup time.
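
For concreteness, a minimal sketch of such a clone call - assuming Linux and
btrfs, using the clone ioctl (BTRFS_IOC_CLONE; the same request was later
generalized as FICLONE). Error handling is trimmed and the paths are
illustrative:

  /* clone.c -- share src's extents with dst; no data is copied. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  #ifndef FICLONE
  #define FICLONE _IOW(0x94, 9, int)   /* same request as BTRFS_IOC_CLONE */
  #endif

  int main(int argc, char **argv)
  {
      if (argc != 3) { fprintf(stderr, "usage: clone src dst\n"); return 1; }
      int src = open(argv[1], O_RDONLY);
      int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (src < 0 || dst < 0) { perror("open"); return 1; }
      /* Ask the filesystem to reference src's blocks from dst (COW). */
      if (ioctl(dst, FICLONE, src) < 0) { perror("ioctl(FICLONE)"); return 1; }
      return 0;
  }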

Regards, Jeffry 

 -----Original Message-----
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Roland Rambau
 Sent: Thursday, 3 December 2009 16:25
 To: Per Baatrup
 CC: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] file concatenation with ZFS copy-on-write
 
 gang,
 
 actually a simpler version of that idea would be a zcp:
 
 if I just cp a file, I know that all blocks of the new file
 will be duplicates; so the cp could take full advantage of
 the dedup without a need to check/read/write any actual data
 
-- Roland
 
 Per Baatrup wrote:
  dedup operates on the block level leveraging the existing ZFS
 checksums. Read "What to dedup: Files, blocks, or bytes?" here:
 http://blogs.sun.com/bonwick/entry/zfs_dedup

  The trick should be that the zcat userland app already knows that it
 will generate duplicate files, so data reads and writes could be avoided
 altogether.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread Per Baatrup
I was thinking in the same direction about the efficiency of the offset 
calculations. Trying to get into the ZFS source code to understand this part, 
but did not have time to get there yet.
This issue may be a showstopper for the proposal as it would restrict the 
functionality to quite rare cases.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread sgheeren

Darren J Moffat wrote:

Per Baatrup wrote:
I would like to concatenate N files into one big file taking 
advantage of ZFS copy-on-write semantics so that the file 
concatenation is done without actually copying any (large amount of) 
file content.

  cat f1 f2 f3 f4 f5 > f15

Is this already possible when source and target are on the same ZFS 
filesystem?

I am looking into the ZFS source code to understand if there are 
sufficient (private) interfaces to make a simple "zcat -o f15 f1 f2 
f3 f4 f5" userland application in C code. Does anybody have advice on 
this?


The answer to this is likely deduplication, which ZFS now has.

The reason dedup should help here is that after the 'cat', f15 will be 
made up of blocks that match the blocks of f1 f2 f3 f4 f5.

Copy-on-write isn't what helps you here; it is dedup.


Well, to be precise, dedup is implemented on top of the COW features in
ZFS's block allocator :) So yes, COW helps: it is the actual
optimization feature.
However, for this use case, it is DEDUP that obviates the need for any
'special case' handling of this specific type of job, because DEDUP
generalizes the detection of re-usable storage blocks.

On all accounts: yes, yes and yes :)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread sgheeren

Per Baatrup wrote:

I would like to concatenate N files into one big file taking advantage of 
ZFS copy-on-write semantics so that the file concatenation is done without 
actually copying any (large amount of) file content.

  cat f1 f2 f3 f4 f5 > f15

Is this already possible when source and target are on the same ZFS filesystem?

I am looking into the ZFS source code to understand if there are sufficient (private) 
interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" userland application 
in C code. Does anybody have advice on this?

TIA
Per
  

You are right that a lot of blocks should be re-usable. This is
essentially ZFS's (new) dedup feature, so why bother writing complicated
(non-POSIX) userland extensions when you already have the storage
optimization in recent versions of ZFS...?

Also, what you are proposing would in an fs implementation not actually be
at the file level but (normally) at the block allocation level. This idea will
break down badly unless your 'files' are all aligned and in multiples of
the blocksize (which is dynamic/configurable in ZFS).

Lastly, you might post at zfs-code :)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread sgheeren
After reading all the comments it appears that there may be a 'real' 
problem with unaligned block sizes that DEDUP simply will not handle.


What you seem to be after, then, is the opposite of sparse files, 
'virtual files' that can be chained together as a linked list of 
_fragments_ of allocation blocks as well as full allocation blocks. This 
could then be leveraged by a specialized concatenation driver (userland) 
to avoid realigning the blocks and missing the 'opportunity' to DEDUP or 
COW the existing blocks.


As always in computing, a specialized per-use-case driver will be able 
to yield the best optimizations. However, there will be a balance point, 
since obviously an optimization based on leaving parts of allocation 
blocks unused is not going to be healthy for, say:

cat s1 s2 s3 s4 ... s999 > all_s_files
rm s*

Where s1...s999 all use (much) less than a block size. All the 'gain' in 
DEDUP is quickly offset by the enormous waste of block space after 
deletion of the constituent files.


Per Baatrup wrote:

I would like to concatenate N files into one big file taking advantage of 
ZFS copy-on-write semantics so that the file concatenation is done without 
actually copying any (large amount of) file content.

  cat f1 f2 f3 f4 f5 > f15

Is this already possible when source and target are on the same ZFS filesystem?

I am looking into the ZFS source code to understand if there are sufficient (private) 
interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" userland application 
in C code. Does anybody have advice on this?

TIA
Per
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread Per Baatrup Petersen

Thank you for the feedback Michael.

"zcat" was my acronym for a special ZFS-aware version of cat, and I did 
not know that it was an existing command. Simply forgot to check. Should 
rename it to "zfscat" or something similar.


Kind regards
Per


Michael Schuster wrote:

Per Baatrup wrote:

dedup operates on the block level leveraging the existing ZFS
checksums. Read "What to dedup: Files, blocks, or bytes?" here:
http://blogs.sun.com/bonwick/entry/zfs_dedup

The trick should be that the zcat userland app already knows that it
will generate duplicate files, so data reads and writes could be avoided
altogether.


you'd probably be better off avoiding zcat - it's been in use since 
almost forever; from the man page:

  zcat
     The zcat utility writes to standard output the uncompressed
     form of files that have been compressed using compress. It
     is the equivalent of uncompress -c. Input files are not
     affected.

:-)

cheers
Michael


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread Bob Friesenhahn

On Fri, 4 Dec 2009, Jeffry Molanus wrote:


Actually, I asked about this a while ago, only I called it file-level cloning. 
Consider: you have 100 VMs and you want to clone just one?

BTRFS added a specialized ioctl() call to make the FS aware that it 
has to clone; this obviously saves copy time and dedup time.


The best things that I see in Solaris for efficiently 
copying/concatenating files are the functions sendfile() and 
sendfilev() in libsendfile.  Unfortunately, these are not portable 
interfaces, and the amount of data which can be copied in one call is 
limited by the maximum value supported by the size_t type.  A 64-bit 
program should be able to request sending the content of a large 
file into another large file in one call, but a 32-bit program would 
need to use multiple calls.


If Solaris sendfile is similar to the sendfile in other OSs, the data 
copy is done in kernel space (with potentially zero-copy for the read) 
but the data still needs to be copied from/to the filesystem.
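
For reference, a hedged sketch of this approach, assuming the Solaris
sendfile(3EXT) interface from libsendfile (compile with -lsendfile). It
appends each input to the output in-kernel but, as noted above, every byte is
still read and rewritten - no block sharing:

  /* sfcat.c -- usage: sfcat out in1 [in2 ...] */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/sendfile.h>
  #include <sys/stat.h>
  #include <unistd.h>

  static int append_file(int out, const char *path)
  {
      int in = open(path, O_RDONLY);
      struct stat st;
      if (in < 0 || fstat(in, &st) < 0) return -1;
      off_t off = 0;
      while (off < st.st_size)          /* sendfile() advances 'off' */
          if (sendfile(out, in, &off, (size_t)(st.st_size - off)) < 0) {
              close(in);
              return -1;
          }
      close(in);
      return 0;
  }

  int main(int argc, char **argv)
  {
      if (argc < 3) { fprintf(stderr, "usage: sfcat out in1 ...\n"); return 1; }
      int out = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
      if (out < 0) { perror("open"); return 1; }
      for (int i = 2; i < argc; i++)
          if (append_file(out, argv[i]) < 0) { perror(argv[i]); return 1; }
      return 0;
  }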


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread Richard Elling

On Dec 4, 2009, at 2:21 AM, Jeffry Molanus wrote:

Actually, I asked about this a while ago only called it file-level  
cloning. Consider you have 100VM's and you want to clone just one?


In my experience, cloning is done for basic provisioning, so how would 
you get to the case where you could not clone any particular VM?
 -- richard



BTRFS added a specialized ioctl() call to make the FS aware that it 
has to clone; this obviously saves copy time and dedup time.


Regards, Jeffry


-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Roland Rambau
Sent: Thursday, 3 December 2009 16:25
To: Per Baatrup
CC: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] file concatenation with ZFS copy-on-write


gang,

actually a simpler version of that idea would be a zcp:

if I just cp a file, I know that all blocks of the new file
will be duplicates; so the cp could take full advantage of
the dedup without a need to check/read/write any actual data

  -- Roland

Per Baatrup wrote:

dedup operates on the block level leveraging the existing ZFS
checksums. Read "What to dedup: Files, blocks, or bytes?" here:
http://blogs.sun.com/bonwick/entry/zfs_dedup

The trick should be that the zcat userland app already knows that it
will generate duplicate files, so data reads and writes could be
avoided altogether.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread Richard Elling
The way I see it, a filename is a handle to a specific set of blocks. For 
applications that can handle multiple files, no worries. For applications 
that can't (inferring DVD players?) I sense that a file system is probably 
not the best place to fix the tail-block issue. This affects all file 
systems, because all of them must use at least 512 bytes for a physical block.

Suppose we consider a shortcut, say a symbolic link with multiple sources. 
When read, it will appear to the application as a single file, but be 
comprised of the concatenated contents of multiple files, respecting the 
proper EOFs. This could work as long as the files are read-only. Would that 
be too much of a constraint?

 -- richard

On Dec 3, 2009, at 5:23 AM, sgheeren wrote:

After reading all the comments it appears that there may be a 'real'  
problem with unaligned block sizes that DEDUP simply will not handle.


What you seem to be after, then, is the opposite of sparse files,  
'virtual files' that can be chained together as a linked list of  
_fragments_ of allocation blocks as well as full allocation blocks.  
This could then be leveraged by a specialized concatenation driver  
(userland) to avoid realigning the blocks and missing the  
'opportunity' to DEDUP or COW the existing blocks.


As always in computing, a specialized per-use-case driver will be 
able to yield the best optimizations. However, there will be a 
balance point, since obviously an optimization based on leaving 
parts of allocation blocks unused is not going to be healthy for, say:

   cat s1 s2 s3 s4 ... s999 > all_s_files
   rm s*

Where s1...s999 all use (much) less than a block size. All the  
'gain' in DEDUP is quickly offset by the enormous waste of block  
space after deletion of the constituent files.


Per Baatrup wrote:
I would like to concatenate N files into one big file taking 
advantage of ZFS copy-on-write semantics so that the file 
concatenation is done without actually copying any (large amount 
of) file content.

  cat f1 f2 f3 f4 f5 > f15

Is this already possible when source and target are on the same ZFS 
filesystem?

I am looking into the ZFS source code to understand if there are 
sufficient (private) interfaces to make a simple "zcat -o f15 f1 
f2 f3 f4 f5" userland application in C code. Does anybody have 
advice on this?


TIA
Per



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-04 Thread Jeffry Molanus

 In my experience, cloning is done for basic provisioning, so how would
 you get
 to the case where you could not clone any particular VM?
   -- richard

Well, a situation where this might come in handy is when you have your typical 
ISP that has multiple ESX hosts with multiple datastores. ESX has limits on 
how many datastores it can have (16, I believe?), so cloning filesystems over 
and over will only get you so far. Or a VDI environment for schools, for 
instance? Instead of cloning a complete zfs fs, you can clone the 
freshmen-gold.vmdk once for each newly enrolled student.

Let's assume the scenario of the school: you have an NFS export containing gold 
images with different pre-installed applications or whatever. How would you 
rapidly deploy 500 new gold images? Copy them 500 times? If you clone them on 
the ESX side, you would also have to copy them. Moreover, why copy-then-dedup 
if you can prevent the dedup process altogether? Since the dedup process is 
inline, it could affect the storage performance as it goes along.

Regards, Jeffry
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
I would like to concatenate N files into one big file taking advantage of 
ZFS copy-on-write semantics so that the file concatenation is done without 
actually copying any (large amount of) file content.

  cat f1 f2 f3 f4 f5 > f15

Is this already possible when source and target are on the same ZFS filesystem?

I am looking into the ZFS source code to understand if there are sufficient 
(private) interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" userland 
application in C code. Does anybody have advice on this?

TIA
Per
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Darren J Moffat

Per Baatrup wrote:

I would like to concatenate N files into one big file taking advantage of 
ZFS copy-on-write semantics so that the file concatenation is done without 
actually copying any (large amount of) file content.

  cat f1 f2 f3 f4 f5 > f15

Is this already possible when source and target are on the same ZFS filesystem?

I am looking into the ZFS source code to understand if there are sufficient (private) 
interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" userland application 
in C code. Does anybody have advice on this?


The answer to this is likely deduplication, which ZFS now has.

The reason dedup should help here is that after the 'cat', f15 will be 
made up of blocks that match the blocks of f1 f2 f3 f4 f5.

Copy-on-write isn't what helps you here; it is dedup.

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Peter Tribble
On Thu, Dec 3, 2009 at 12:08 PM, Darren J Moffat
darr...@opensolaris.org wrote:
 Per Baatrup wrote:

 I would like to concatenate N files into one big file taking advantage
 of ZFS copy-on-write semantics so that the file concatenation is done
 without actually copying any (large amount of) file content.

   cat f1 f2 f3 f4 f5 > f15

 Is this already possible when source and target are on the same ZFS
 filesystem?

 I am looking into the ZFS source code to understand if there are sufficient
 (private) interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5"
 userland application in C code. Does anybody have advice on this?

 The answer to this is likely deduplication which ZFS now has.

 The reason dedup should help here is that after the 'cat' f15 will be made
 up of blocks that match the blocks of f1 f2 f3 f4 f5.

Is that likely to happen? dedup is at the block level, so the blocks in f2 
will only match the same data in f15 if they're aligned, which is only going 
to happen if f1 ends on a block boundary.

Besides, you still have to read all the data off the disk, manipulate it, 
and write it all back.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, Darren J Moffat wrote:


The answer to this is likely deduplication, which ZFS now has.

The reason dedup should help here is that after the 'cat', f15 will be made up 
of blocks that match the blocks of f1 f2 f3 f4 f5.

Copy-on-write isn't what helps you here; it is dedup.


Isn't this only true if the file sizes are such that the concatenated 
blocks are perfectly aligned on the same zfs block boundaries they 
used before?  This seems unlikely to me.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Erik Ableson
On 3 Dec. 2009, at 13:29, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:



On Thu, 3 Dec 2009, Darren J Moffat wrote:


The answer to this is likely deduplication, which ZFS now has.

The reason dedup should help here is that after the 'cat', f15 will 
be made up of blocks that match the blocks of f1 f2 f3 f4 f5.

Copy-on-write isn't what helps you here; it is dedup.


Isn't this only true if the file sizes are such that the  
concatenated blocks are perfectly aligned on the same zfs block  
boundaries they used before?  This seems unlikely to me.


It's also worth noting that if the block alignment works out for the  
dedup, the actual write traffic will be trivial, consisting only of  
pointer references, so the heavy lifting will be the read operations.


Much depends on the contents of the files: fixed-size binary blobs 
that align nicely with 16/32/64k boundaries, versus variable-sized text 
files.


Regards,

Erik Ableson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
dedup operates on the block level leveraging the existing ZFS checksums. Read 
"What to dedup: Files, blocks, or bytes?" here: 
http://blogs.sun.com/bonwick/entry/zfs_dedup

The trick should be that the zcat userland app already knows that it will 
generate duplicate files, so data reads and writes could be avoided altogether.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Michael Schuster

Per Baatrup wrote:

dedup operates on the block level leveraging the existing ZFS
checksums. Read "What to dedup: Files, blocks, or bytes?" here:
http://blogs.sun.com/bonwick/entry/zfs_dedup

The trick should be that the zcat userland app already knows that it
will generate duplicate files, so data reads and writes could be avoided
altogether.


you'd probably be better off avoiding zcat - it's been in use since 
almost forever; from the man page:

  zcat
     The zcat utility writes to standard output the uncompressed
     form of files that have been compressed using compress. It
     is the equivalent of uncompress -c. Input files are not
     affected.

:-)

cheers
Michael
--
Michael Schuster    http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
"zcat" was my acronym for a special ZFS-aware version of cat, and the name was 
obviously a big mistake, as I did not know it was an existing command and simply 
forgot to check.

Should rename it to "zfscat" or something similar?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Roland Rambau

gang,

actually a simpler version of that idea would be a zcp:

if I just cp a file, I know that all blocks of the new file
will be duplicates; so the cp could take full advantage of
the dedup without a need to check/read/write any actual data

  -- Roland

Per Baatrup wrote:

dedup operates on the block level leveraging the existing ZFS checksums. Read 
"What to dedup: Files, blocks, or bytes?" here: http://blogs.sun.com/bonwick/entry/zfs_dedup

The trick should be that the zcat userland app already knows that it will 
generate duplicate files, so data reads and writes could be avoided altogether.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread michael schuster

Roland Rambau wrote:

gang,

actually a simpler version of that idea would be a zcp:

if I just cp a file, I know that all blocks of the new file
will be duplicates; so the cp could take full advantage of
the dedup without a need to check/read/write any actual data


I think they call it 'ln' ;-) and that even works on ufs.

Michael
--
Michael Schuster http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Darren J Moffat

Bob Friesenhahn wrote:

On Thu, 3 Dec 2009, Darren J Moffat wrote:


The answer to this is likely deduplication, which ZFS now has.

The reason dedup should help here is that after the 'cat', f15 will be 
made up of blocks that match the blocks of f1 f2 f3 f4 f5.

Copy-on-write isn't what helps you here; it is dedup.


Isn't this only true if the file sizes are such that the concatenated 
blocks are perfectly aligned on the same zfs block boundaries they used 
before?  This seems unlikely to me.


Yes that would be the case.

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
Actually 'ln -s source target' would not be the same as 'zcp source target', as 
writing to the source file after the operation would change the target file as 
well, whereas for zcp this would only change the source file, due to the 
copy-on-write semantics of ZFS.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread michael schuster

Per Baatrup wrote:

Actually 'ln -s source target' would not be the same as 'zcp source target',
as writing to the source file after the operation would change the
target file as well, whereas for zcp this would only change the source
file, due to the copy-on-write semantics of ZFS.


I actually was thinking of creating a hard link (without the -s option), 
but your point is valid for hard and soft links.


cheers
Michael
--
Michael Schuster http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Seth

michael schuster wrote:

Roland Rambau wrote:

gang,

actually a simpler version of that idea would be a zcp:

if I just cp a file, I know that all blocks of the new file
will be duplicates; so the cp could take full advantage of
the dedup without a need to check/read/write any actual data


I think they call it 'ln' ;-) and that even works on ufs.

Michael

+1

More and more it sounds like an optimization that will either

A. not add much over dedup

or

B. have value only in specific situations - and completely misbehave in 
other situations (even the same situations after passage of time)


Why not just make a special-purpose application (completely user-land) 
for it? I know, 'ln' is remotely kin to this idea, but 'ln' is POSIX and 
people know what to expect.
What you'd practically need to do is whip up a vfs layer that exposes 
the underlying blocks of a filesystem and possibly names them by their 
SHA256 or MD5 hash. Then you'd need (another?) vfs abstraction that 
allows 'virtual' files to be assembled from these blocks in multiple 
independent chains.


I know there is already a fuse implementation of the first vfs driver 
(the name evades me, but I think it was something like chunkfs[1]) and 
one could at least whip up a reasonable read-only Proof-of-Concept of 
the second part.


The reason _I_ wouldn't do that is because I'm already happy with e.g.:

   mkfifo /var/run/my_part_collector
   (while true; do cat /local/data/my_part_* > /var/run/my_part_collector; done) &

   wc -l /var/run/my_part_collector

The equivalent of this could be (better) expressed in C, perl or any 
language of your choice. I believe this is all POSIX.
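
For illustration, a rough POSIX C equivalent of the shell loop above - a
sketch only, reusing the same illustrative paths, with most error handling
elided:

  /* collector.c -- serve the concatenation of my_part_* through a FIFO. */
  #include <fcntl.h>
  #include <glob.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int main(void)
  {
      const char *fifo = "/var/run/my_part_collector";
      mkfifo(fifo, 0600);                      /* EEXIST is fine here */
      for (;;) {
          int out = open(fifo, O_WRONLY);      /* blocks until a reader opens */
          if (out < 0) return 1;
          glob_t g;
          if (glob("/local/data/my_part_*", 0, NULL, &g) == 0) {
              for (size_t i = 0; i < g.gl_pathc; i++) {
                  int in = open(g.gl_pathv[i], O_RDONLY);
                  char buf[65536];
                  ssize_t n;
                  while (in >= 0 && (n = read(in, buf, sizeof buf)) > 0)
                      write(out, buf, (size_t)n);
                  if (in >= 0) close(in);
              }
              globfree(&g);
          }
          close(out);                          /* reader sees EOF */
      }
  }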
  

[1] The reason this exists is obviously for backup and synchronization 
implementations: it will make it possible to back up files using rsync 
when the encryption key is not available to the backup process (with an 
ECB-mode crypto algo); it should make it 'simple' to synchronize one's 
large monolithic files with e.g. Amazon S3 cloud storage etc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, Erik Ableson wrote:


Much depends on the contents of the files: fixed-size binary blobs that align 
nicely with 16/32/64k boundaries, versus variable-sized text files.


Note that the default zfs block size is 128K, so that will 
therefore be the default dedup block size.


Most files are less than 128K and occupy a short tail block so 
concatenating them will not usually enjoy the benefits of 
deduplication.


It is not wise to riddle zfs with many special-purpose features since 
zfs would then be encumbered by these many features, which tend to 
defeat future improvements.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Darren J Moffat

Bob Friesenhahn wrote:

On Thu, 3 Dec 2009, Erik Ableson wrote:


Much depends on the contents of the files: fixed-size binary blobs 
that align nicely with 16/32/64k boundaries, versus variable-sized text 
files.


Note that the default zfs block size is 128K and so that will therefore 
be the default dedup block size.


Most files are less than 128K and occupy a short tail block so 
concatenating them will not usually enjoy the benefits of deduplication.


Most?  I think that is a bit of a sweeping statement.  I know of some 
environments where most files are multiple gigabytes in size, and 
others where 1K is the upper bound of the file size.

So I don't think you can say at all that most files are < 128K.

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Jason King
On Thu, Dec 3, 2009 at 9:58 AM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
 On Thu, 3 Dec 2009, Erik Ableson wrote:

 Much depends on the contents of the files: fixed-size binary blobs that
 align nicely with 16/32/64k boundaries, versus variable-sized text files.

 Note that the default zfs block size is 128K, so that will therefore be
 the default dedup block size.

 Most files are less than 128K and occupy a short tail block so concatenating
 them will not usually enjoy the benefits of deduplication.

 It is not wise to riddle zfs with many special-purpose features since zfs
 would then be encumbered by these many features, which tend to defeat future
 improvements.

Well, it could be done in a way such that it could be fs-agnostic
(perhaps extending /bin/cat with a new flag such as -o outputfile, or
detecting if stdout is a file vs a tty, though corner cases might get
tricky).   If a particular fs supported such a feature, it could take
advantage of it, but if it didn't, it could fall back to doing a
read+append.  Sort of like how mv figures out if the source and target
are on the same or different filesystems and acts accordingly.
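
A sketch of that fallback shape, with the filesystem hook deliberately
hypothetical - no such concatenation call exists in ZFS; fs_concat() is
invented purely to show the structure:

  #include <errno.h>
  #include <unistd.h>

  /* Hypothetical fs-level hook: 0 on success, -1 with errno == ENOTSUP
   * when the filesystem cannot share blocks for concatenation. */
  extern int fs_concat(int out_fd, int in_fd);

  static int append_by_copy(int out, int in)
  {
      char buf[1 << 16];
      ssize_t n;
      while ((n = read(in, buf, sizeof buf)) > 0)
          if (write(out, buf, (size_t)n) != n)
              return -1;
      return (n < 0) ? -1 : 0;
  }

  int cat_append(int out, int in)
  {
      if (fs_concat(out, in) == 0)
          return 0;                     /* fs shared the blocks; no copy */
      if (errno != ENOTSUP)
          return -1;
      return append_by_copy(out, in);   /* portable read+append fallback */
  }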

There are a few use cases I've encountered where having this would
have been _very_ useful (usually when trying to get large crashdumps
to Sun quickly).  In general, it would allow one to manipulate very
large files by breaking them up into smaller subsets while still
having the end result be a single file.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Bob Friesenhahn

On Thu, 3 Dec 2009, Jason King wrote:


Well, it could be done in a way such that it could be fs-agnostic
(perhaps extending /bin/cat with a new flag such as -o outputfile, or
detecting if stdout is a file vs a tty, though corner cases might get
tricky).   If a particular fs supported such a feature, it could take
advantage of it, but if it didn't, it could fall back to doing a
read+append.  Sort of like how mv figures out if the source and target
are on the same or different filesystems and acts accordingly.


The most common way that I concatenate files into a larger file is by 
using a utility such as 'tar', which outputs a different format.  I 
rarely use 'cat' to concatenate files.


If it is desired to concatenate files in a way which works best for 
deduplication, then a tar-like format can be invented which takes care 
to always start new file output on a filesystem block boundary.  With 
zfs deduplication this should be faster and take less space than 
compressing the entire result, as long as the output is stored in the 
same pool.  If output is written to a destination filesystem which 
uses a different block size, then the ideal block size will be that of 
the destination filesystem, so that large archive files can still be 
usefully deduplicated.
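
As a sketch of what such a format might look like - assuming a 128K record
size and an invented, toy header layout (one block holding just the member
name):

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  #define BLK (128 * 1024)              /* match the zfs recordsize */

  static char zeros[BLK];               /* implicit zero padding */

  /* Pad the archive so the next member starts on a block boundary. */
  static int pad_to_block(int out, off_t *pos)
  {
      off_t rem = *pos % BLK;
      if (rem == 0) return 0;
      if (write(out, zeros, (size_t)(BLK - rem)) < 0) return -1;
      *pos += BLK - rem;
      return 0;
  }

  int archive_member(int out, off_t *pos, const char *path)
  {
      static char hdr[BLK];             /* toy header: just the name */
      memset(hdr, 0, sizeof hdr);
      snprintf(hdr, sizeof hdr, "%s", path);
      if (write(out, hdr, sizeof hdr) < 0) return -1;
      *pos += BLK;

      int in = open(path, O_RDONLY);
      if (in < 0) return -1;
      static char buf[BLK];
      ssize_t n;
      while ((n = read(in, buf, sizeof buf)) > 0) {
          if (write(out, buf, (size_t)n) < 0) { close(in); return -1; }
          *pos += n;
      }
      close(in);
      if (n < 0) return -1;
      return pad_to_block(out, pos);    /* keep the next member aligned */
  }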


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Roland Rambau

Michael,

michael schuster wrote:

Roland Rambau wrote:

gang,

actually a simpler version of that idea would be a zcp:

if I just cp a file, I know that all blocks of the new file
will be duplicates; so the cp could take full advantage of
the dedup without a need to check/read/write any actual data


I think they call it 'ln' ;-) and that even works on ufs.


quite similar, but with a critical difference:

with hard links, any modifications through either link are
seen by both links, since it stays a single file (note that
editors like vi do an implicit cp; they do NOT update the
original file)

That zcp (actually it should be just a feature of 'cp')
would be blockwise copy-on-write. It would have exactly
the same semantics as cp but just avoid any data movement,
since we can easily predict what the effect of a cp followed
by a dedup should be.

  -- Roland




--

**
Roland Rambau Platform Technology Team
Principal Field Technologist  Global Systems Engineering
Phone: +49-89-46008-2520  Mobile:+49-172-84 58 129
Fax:   +49-89-46008-  mailto:roland.ram...@sun.com
**
Sitz der Gesellschaft: Sun Microsystems GmbH,
Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028;  Geschäftsführer:
Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates:   Martin Häring
*** UNIX * /bin/sh  FORTRAN **
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
Roland,

Clearly an extension of cp would be very nice when managing large files.
Today we are relying heavily on snapshots for this, but this requires discipline 
in storing files in separate zfs'es, avoiding snapshotting too many files that 
change frequently.

The reason I was speaking about cat instead of cp is that, in addition to 
copying a single file, I would also like to concatenate several files into a 
single file. Can this be accomplished with your (z)cp?

--Per
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread A Darren Dunham
On Thu, Dec 03, 2009 at 09:36:23AM -0800, Per Baatrup wrote:

 The reason I was speaking about cat in stead of cp is that in
 addition to copying a single file I would like also to concatenate
 several files into a single file. Can this be accomplished with your
 (z)cp?

Unless you have special data formats, I think it's unlikely that the
last ZFS block in the file will be exactly full.  But to append without
copying, you'd need some way of ignoring a portion of the data in a
non-final ZFS block and stitching together the bytestream.  I don't
think that's possible with the ZFS layout.

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Nicolas Williams
On Thu, Dec 03, 2009 at 03:57:28AM -0800, Per Baatrup wrote:
 I would like to concatenate N files into one big file taking
 advantage of ZFS copy-on-write semantics so that the file
 concatenation is done without actually copying any (large amount of)
 file content.

   cat f1 f2 f3 f4 f5 > f15

 Is this already possible when source and target are on the same ZFS
 filesystem?

 I am looking into the ZFS source code to understand if there are
 sufficient (private) interfaces to make a simple "zcat -o f15 f1 f2
 f3 f4 f5" userland application in C code. Does anybody have advice on
 this?

There have been plenty of answers already.

Quite aside from dedup, the fact that all blocks in a file must have the
same uncompressed size means that if any of f2..f5 have different block
sizes from f1, or any of f1..f5's last blocks are partial then ZFS could
not perform this concatenation as efficiently as you wish.

In other words: dedup _is_ what you're looking for...

...but also ZFS most likely could not do any better with any other, more
specific non-dedup solution.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Roland Rambau

Per,

Per Baatrup wrote:

Roland,

Clearly an extension of cp would be very nice when managing large files.
Today we are relying heavily on snapshots for this, but this requires discipline 
in storing files in separate zfs'es, avoiding snapshotting too many files that 
change frequently.

The reason I was speaking about cat instead of cp is that, in addition to copying a 
single file, I would also like to concatenate several files into a single file. 
Can this be accomplished with your (z)cp?


No - zcp is a simpler case than what you proposed, and that's why
I pointed it out as a discussion case.  ( And it is clearly NOT
the same as 'ln'. )

Btw. I would be surprised to hear that this can be implemented
with current APIs; you would need a call like (my fantasy here)
write_existing_block(), where the data argument is not a pointer
to a buffer in memory but instead a reference to an already-existing
data block in the pool. Based on such a call ( and a corresponding one
for read that returns those references in the pool ) IMHO an implementation
of the commands would be straightforward ( the actual work would be
in the implementation of those calls ).

This can certainly be done - I just doubt it already exists.
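
To make the fantasy concrete, one possible shape for such an interface -
every identifier below is hypothetical; nothing like it exists in the ZFS
APIs being discussed:

  #include <sys/types.h>

  typedef struct zblk_ref zblk_ref_t;   /* opaque handle to a pool block */

  /* Hypothetical: return references to the blocks backing fd. */
  extern int read_block_refs(int fd, zblk_ref_t *refs, int max, int *nrefs);

  /* Hypothetical: append an already-existing pool block to fd, bumping
   * its reference count instead of writing data (Roland's
   * write_existing_block()). */
  extern int write_existing_block(int fd, const zblk_ref_t *ref);

  /* zcp/zcat core: move references, never data. */
  int append_by_reference(int src, int dst)
  {
      zblk_ref_t refs[256];
      int n;
      if (read_block_refs(src, refs, 256, &n) < 0)  /* assume file fits */
          return -1;
      for (int i = 0; i < n; i++)
          if (write_existing_block(dst, &refs[i]) < 0)
              return -1;
      return 0;
  }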

  -- Roland


--

**
Roland Rambau Platform Technology Team
Principal Field Technologist  Global Systems Engineering
Phone: +49-89-46008-2520  Mobile:+49-172-84 58 129
Fax:   +49-89-46008-  mailto:roland.ram...@sun.com
**
Sitz der Gesellschaft: Sun Microsystems GmbH,
Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028;  Geschäftsführer:
Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates:   Martin Häring
*** UNIX * /bin/sh  FORTRAN **
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
if any of f2..f5 have different block sizes from f1

This restriction does not sound so bad to me if this only refers to changes to 
the blocksize of a particular ZFS filesystem, or to copying between different 
ZFSes in the same pool. This can probably be managed with a -f switch on the 
userland app to force the copy when it would otherwise fail.

any of f1..f5's last blocks are partial

Does this mean that f1,f2,f3,f4 need to be exact multiples of the ZFS 
blocksize? This is a severe restriction that will fail except in very special 
cases.
Is this related to the disk format, or is it a restriction in the implementation? 
(Do you know where to look in the source code?)

...but also ZFS most likely could not do any better with any other, more
specific non-dedup solution

Probably lots of I/O traffic and digest calculation+lookups could be saved, as 
we already know it will be a duplicate.
(In our case the files are gigabyte-sized.)

--Per
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Per Baatrup
Btw. I would be surprised to hear that this can be implemented
with current APIs;

I agree. However, it looks like an opportunity to dive into the ZFS source code.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Nicolas Williams
On Thu, Dec 03, 2009 at 12:44:16PM -0800, Per Baatrup wrote:
 if any of f2..f5 have different block sizes from f1
 
 This restriction does not sound so bad to me if this only refers to
 changes to the blocksize of a particular ZFS filesystem or to copying
 between different ZFSes in the same pool. This can probably be managed
 with a -f switch on the userland app to force the copy when it would
 otherwise fail.

Why expose such details?

If you have dedup on and if the file blocks and sizes align then

cat f1 f2 f3 f4 f5 > f6

will do the right thing and consume only space for new metadata.

If the file blocks and sizes do not align then

cat f1 f2 f3 f4 f5 > f6

will still work correctly.

Or do you mean that you want a way to do that cat ONLY if it would
consume no new space for data?  (That might actually be a good
justification for a ZFS cat command, though I think, too, that one could
script it.)

 any of f1..f5's last blocks are partial
 
 Does this mean that f1,f2,f3,f4 need to be exact multiples of the ZFS
 blocksize? This is a severe restriction that will fail except in very
 special cases.

Say f1 is 1MB, f2 is 128KB, f3 is 510 bytes, f4 is 514 bytes, and f5 is
10MB, and the recordsize for their containing datasets is 128KB, then
the new file will consume 10MB + 128KB more than f1..f5 did, but 1MB +
128KB will be de-duplicated.

This is not really a severe restriction.  To make ZFS do better than
that would require much extra metadata and complexity in the filesystem
that users who don't need to do space-efficient file concatenation (most
users, that is) won't want to pay for.
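
A small model of that arithmetic - a sketch assuming, per the above, that
whole 128KB records dedup only while the target offset is still
record-aligned, and that the first short tail misaligns everything that
follows:

  #include <stdint.h>
  #include <stdio.h>

  #define RECSZ (128 * 1024)

  int main(void)
  {
      uint64_t sizes[] = { 1048576, 131072, 510, 514, 10485760 }; /* f1..f5 */
      uint64_t dedup = 0;
      int aligned = 1;
      for (int i = 0; i < 5; i++) {
          if (aligned)                       /* full records, still aligned */
              dedup += (sizes[i] / RECSZ) * RECSZ;
          if (sizes[i] % RECSZ)              /* short tail breaks alignment */
              aligned = 0;
      }
      /* Prints 1179648, i.e. the 1MB + 128KB figure quoted above. */
      printf("dedupable: %llu bytes\n", (unsigned long long)dedup);
      return 0;
  }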

 Is this related to the disk format or is it restriction in the
 implrmentation? (do you know where to look in the source code?).

Both.

 ...but also ZFS most likely could not do any better with any other, more
 specific non-dedup solution
 
 Probably lots of I/O traffic and digest calculation+lookups could be
 saved, as we already know it will be a duplicate.  (In our case the
 files are gigabyte-sized.)

ZFS hashes, and records hashes of blocks, not sub-blocks.  Look at my
above example.  To efficiently dedup the concatenation of the 10MB of f5
would require being able to have something like sub-block pointers.
Alternatively, if you want a concatenation-specific feature, ZFS would
have to have a metadata notion of concatenation, but then the Unix way
of concatenating files couldn't be used for this, since the necessary
context is lost in the I/O redirection.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Daniel Carosone
  Isn't this only true if the file sizes are such that the concatenated 
  blocks are perfectly aligned on the same zfs block boundaries they used 
  before?  This seems unlikely to me.
 
 Yes that would be the case.

While eagerly awaiting b128 to appear in IPS, I have been giving this issue 
(block size and alignment vs dedup) some thought recently.  I have a different, 
but sufficiently similar, scenario where the effectiveness of dedup will depend 
heavily on this factor.

For this case, though, the alignment question for short tails is relatively 
easily dealt with.  The key is that the record size of the file is up to 128k 
and may be shorter depending on various circumstances, such as the write 
pattern used.

To simplify, let us assume that the original files were all written quickly and 
sequentially, that is that they have n 128k blocks, plus a shorter tail.   When 
concatenating them, it should be sufficient to write out the target file in 
128k chunks from the source, then the first tail, then issue an fsync before 
moving on to the chunks from the second file.  
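
A sketch of that write pattern, assuming a 128k recordsize and using fsync()
as the per-file barrier described above (whether this reliably reproduces the
record layout is exactly the open question in the list below):

  #include <fcntl.h>
  #include <unistd.h>

  #define RECSZ (128 * 1024)

  /* Append one source file in full records plus its tail, then fsync
   * so the tail block is committed before the next file's data can be
   * coalesced into it. */
  int append_reblocked(int out, const char *path)
  {
      int in = open(path, O_RDONLY);
      if (in < 0) return -1;
      static char buf[RECSZ];
      ssize_t n;
      while ((n = read(in, buf, RECSZ)) > 0)
          if (write(out, buf, (size_t)n) != n) { close(in); return -1; }
      close(in);
      return (n < 0) ? -1 : fsync(out);
  }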

If the source files were not written in this pattern (e.g. log files, 
accumulating small varying-size writes), the best thing to do is to rewrite 
those in place as well, with the same pattern as being written to the joined 
file.  This can also have an improvement on compression efficiency, by allowing 
larger block sizes than the original.

Issues/questions:
 * This is an optimistic method of alignment; is there any mechanism to get 
stronger results - i.e., to know the size of each record of the original, or to 
produce a specific record size/alignment on output?
 * There's already the very useful seek interface for finding holes and data; 
perhaps something similar is useful here. Or a direct-I/O-related option to 
read, that can return short reads only up to the end of the current record?
 * Perhaps a pause of some kind (to wait for the txg to close) is also 
necessary, to ensure the tail doesn't get combined with new data and reblocked?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread A Darren Dunham
On Thu, Dec 03, 2009 at 12:44:16PM -0800, Per Baatrup wrote:
 any of f1..f5's last blocks are partial
 Does this mean that f1,f2,f3,f4 need to be exact multiples of the ZFS
 blocksize? This is a severe restriction that will fail except in very
 special cases.  Is this related to the disk format or is it a
 restriction in the implementation? (Do you know where to look in the
 source code?)

I'm sure it's related to the FS structure.  How do you find a particular
point in a file quickly?  You don't read up to that point, you want to
go to it directly.  To do so, you have to know how the file is indexed.
If every block contains the same amount of data, this is a simple math
equation.  If some blocks have more or less data, then you have to keep
track of them and their size.  I doubt ZFS has any space or ability to
include non-full blocks in the middle of a file.
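
Spelled out, the "simple math equation" for fixed-size blocks - a
constant-time offset-to-block lookup, which is precisely what variable-size
interior blocks would break:

  #include <stdint.h>

  typedef struct { uint64_t block; uint32_t within; } blockpos_t;

  /* With uniform blocks, any byte offset maps directly to a block
   * index plus an offset inside that block -- no per-block size
   * table to consult. */
  blockpos_t locate(uint64_t offset, uint32_t blocksize)
  {
      blockpos_t p = { offset / blocksize, (uint32_t)(offset % blocksize) };
      return p;
  }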

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] file concatenation with ZFS copy-on-write

2009-12-03 Thread Michael Schuster

Nicolas Williams wrote:

On Thu, Dec 03, 2009 at 12:44:16PM -0800, Per Baatrup wrote:

if any of f2..f5 have different block sizes from f1

This restriction does not sound so bad to me if this only refers to
changes to the blocksize of a particular ZFS filesystem or to copying
between different ZFSes in the same pool. This can probably be managed
with a -f switch on the userland app to force the copy when it would
otherwise fail.


Why expose such details?

If you have dedup on and if the file blocks and sizes align then

cat f1 f2 f3 f4 f5 > f6

will do the right thing and consume only space for new metadata.


I think Per's concern was not only with the space consumed but also the effort 
involved in the process (think large files); if I read his emails 
correctly, he'd like what amounts to manipulation of metadata only, to have 
the data blocks of what were originally 5 files end up in one file; the 
traditional concat operation will cause all the data to be read and written 
back, at which point dedup will kick in - but by then most of the processing 
has already been spent. (Per, please correct/comment)


Michael
--
Michael Schuster    http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss