[zfs-discuss] SSDs with a SCSI SCA interface?
Hey folks. I've looked around quite a bit, and I can't find something like this: I have a bunch of older systems which use Ultra320 SCA hot-swap connectors for their internal drives. (e.g. v20z and similar) I'd love to be able to use modern flash SSDs with these systems, but I have yet to find someone who makes anything that would fit the bill. I need either: (a) a SSD with an Ultra160/320 parallel interface (I can always find an interface adapter, so I'm not particular about whether it's a 68-pin or SCA) (b) a SAS or SATA to UltraSCSI adapter (preferably with a SCA interface) -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] b128a available w/deduplication
> Dennis Clarke wrote:
>>> FYI,
>>> OpenSolaris b128a is available for download or image-update from the
>>> dev repository. Enjoy.
>>
>> I thought that dedupe has been out for weeks now ?
>
> The source has, yes. But what Richard was referring to was the
> respun build now available via IPS.

Oh, sorry. Thought I had missed something. I hadn't :-)

I'm now on version 22 for ZFS and am not even entirely sure what that is:

# uname -a
SunOS europa 5.11 snv_129 sun4u sparc SUNW,UltraAX-i2
# zpool upgrade -v
This system is currently running ZFS pool version 22.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties

For more information on a particular version, including supported releases, see:
http://www.opensolaris.org/os/community/zfs/version/N
Where 'N' is the version number.

HOWEVER, that URL no longer works for N > 19 and in fact, the entire URL has changed to:
http://hub.opensolaris.org/bin/view/Community+Group+zfs/22

-- Dennis Clarke dcla...@opensolaris.ca <- Email related to the open source Solaris dcla...@blastwave.org <- Email related to open source for Solaris ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Nicolas Williams wrote: On Thu, Dec 03, 2009 at 12:44:16PM -0800, Per Baatrup wrote: if any of f2..f5 have different block sizes from f1 This restriction does not sound so bad to me if this only refers to changes to the blocksize of a particular ZFS filesystem or copying between different ZFSes in the same pool. This can properly be managed with a "-f" switch on the userlan app to force the copy when it would fail. Why expose such details? If you have dedup on and if the file blocks and sizes align then cat f1 f2 f3 f4 f5 > f6 will do the right thing and consume only space for new metadata. I think Per's concern was not only with the space consumed but also the effort involved in the process (think large files); if I read his emails correctly, he'd like what amounts to a metadata-only manipulation, so that the data blocks of what were originally 5 files end up in one file; the traditional concat operation will cause all the data to be read and written back, at which point dedup kicks in, so most of the processing effort has already been spent by then. (Per, please correct/comment) Michael -- Michael Schuster  http://blogs.sun.com/recursion Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] mpt errors on snv 127
I eventually performed a few more tests, adjusting some zfs tuning options which had no effect, and trying the itmpt driver which someone had said would work, and regardless my system would always freeze quite rapidly in snv 127 and 128a. Just to double check my hardware, I went back to the opensolaris 2009.06 release version, and everything is working fine. The system has been running a few hours and copied a lot of data and not had any trouble, mpt syslog events, or iostat errors. One thing I found interesting, and I don't know if it's significant or not, is that under the recent builds and under 2009.06, I had run "echo '::interrupts' | mdb -k" to check the interrupts used. (I don't have the printout handy for snv 127+, though). I have a dual port gigabit Intel 1000 P PCI-e card, which shows up as e1000g0 and e1000g1. In snv 127+, each of my e1000g devices shares an IRQ with my mpt devices (mpt0, mpt1) on the IRQ listing, whereas in opensolaris 2009.06, all 4 devices are on different IRQs. I don't know if this is significant, but most of my testing when I encountered errors was data transfer via the network, so it could have potentially been interfering with the mpt drivers when it was on the same IRQ. The errors did seem to be less frequent when the server I was copying from was linked at 100 instead of 1000 (one of my tests), but that is as likely to be a result of the slower zpool throughput as it is to be related to the network traffic. I'll probably stay with 2009.06 for now since it works fine for me, but I can try a newer build again once some more progress is made in this area and people want to see if its fixed (this machine is mainly to backup another array so it's not too big a deal to test later when the mpt drivers are looking better and wipe again in the event of problems) Chad On Tue, Dec 01, 2009 at 03:06:31PM -0800, Chad Cantwell wrote: > To update everyone, I did a complete zfs scrub, and it it generated no errors > in iostat, and I have 4.8T of > data on the filesystem so it was a fairly lengthy test. The machine also has > exhibited no evidence of > instability. If I were to start copying a lot of data to the filesystem > again though, I'm sure it would > generate errors and crash again. > > Chad > > > On Tue, Dec 01, 2009 at 12:29:16AM -0800, Chad Cantwell wrote: > > Well, ok, the msi=0 thing didn't help after all. A few minutes after my > > last message a few errors showed > > up in iostat, and then in a few minutes more the machine was locked up > > hard... Maybe I will try just > > doing a scrub instead of my rsync process and see how that does. > > > > Chad > > > > > > On Tue, Dec 01, 2009 at 12:13:36AM -0800, Chad Cantwell wrote: > > > I don't think the hardware has any problems, it only started having > > > errors when I upgraded OpenSolaris. > > > It's still working fine again now after a reboot. Actually, I reread one > > > of your earlier messages, > > > and I didn't realize at first when you said "non-Sun JBOD" that this > > > didn't apply to me (in regards to > > > the msi=0 fix) because I didn't realize JBOD was shorthand for an > > > external expander device. Since > > > I'm just using baremetal, and passive backplanes, I think the msi=0 fix > > > should apply to me based on > > > what you wrote earlier, anyway I've put > > > set mpt:mpt_enable_msi = 0 > > > now in /etc/system and rebooted as it was suggested earlier. I've > > > resumed my rsync, and so far there > > > have been no errors, but it's only been 20 minutes or so. 
I should have > > > a good idea by tomorrow if this > > > definitely fixed the problem (since even when the machine was not > > > crashing it was tallying up iostat errors > > > fairly rapidly) > > > > > > Thanks again for your help. Sorry for wasting your time if the > > > previously posted workaround fixes things. > > > I'll let you know tomorrow either way. > > > > > > Chad > > > > > > On Tue, Dec 01, 2009 at 05:57:28PM +1000, James C. McPherson wrote: > > > > Chad Cantwell wrote: > > > > >After another crash I checked the syslog and there were some different > > > > >errors than the ones > > > > >I saw previously during operation: > > > > ... > > > > > > > > >Nov 30 20:59:13 the-vault LSI PCI device (1000,) not > > > > >supported. > > > > ... > > > > >Nov 30 20:59:13 the-vault mpt_config_space_init failed > > > > ... > > > > >Nov 30 20:59:15 the-vault mpt_restart_ioc failed > > > > > > > > > > > > >Nov 30 21:33:02 the-vault fmd: [ID 377184 daemon.error] SUNW-MSG-ID: > > > > >PCIEX-8000-8R, TYPE: Fault, VER: 1, SEVERITY: Major > > > > >Nov 30 21:33:02 the-vault EVENT-TIME: Mon Nov 30 21:33:02 PST 2009 > > > > >Nov 30 21:33:02 the-vault PLATFORM: System-Product-Name, CSN: > > > > >System-Serial-Number, HOSTNAME: the-vault > > > > >Nov 30 21:33:02 the-vault SOURCE: eft, REV: 1.16 > > > > >Nov 30 21:33:02 the-vault EVENT-ID: > > > > >7886cc0d-4760-60b2-e0
Re: [zfs-discuss] ZIL corrupt, not recoverable even with logfix
It was created on AMD64 FreeBSD with 8.0RC2 (which was version 13 of ZFS, iirc). At some point I knocked it out (export) somehow; I don't remember doing so intentionally. So I can't do commands like zpool replace since there are no pools. It says it was last used by the FreeBSD box, but the FreeBSD box does not show it with the "zpool status" command. I'm going down tomorrow to work on it again, and I'm going to try 8.0 Release AMD64 FreeBSD (I've already tried i386 AMD64 FreeBSD 8.0 Release) and OpenSolaris dev-127. I was just hoping there was some way I'm missing to mount it read-only (I have tried "zpool import -f -o readonly=yes" but that doesn't work either.) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] b128a available w/deduplication
On Fri, Dec 4 at 1:12, Dennis Clarke wrote: FYI, OpenSolaris b128a is available for download or image-update from the dev repository. Enjoy. I thought that dedupe has been out for weeks now ? Dedupe has been out, but there were some accounting issues scheduled to be fixed in 128. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] b128a available w/deduplication
Dennis Clarke wrote: FYI, OpenSolaris b128a is available for download or image-update from the dev repository. Enjoy. I thought that dedupe has been out for weeks now ? The source has, yes. But what Richard was referring to was the respun build now available via IPS. cheers, James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] b128a available w/deduplication
> FYI, > OpenSolaris b128a is available for download or image-update from the > dev repository. Enjoy. I thought that dedupe has been out for weeks now ? Dennis ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] b128a available w/deduplication
FYI, OpenSolaris b128a is available for download or image-update from the dev repository. Enjoy. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
On Thu, Dec 3, 2009 at 8:02 PM, steven wrote: > It will work in a standard 8x or 16x slot. The bracket is backward. Not one > for subtlety, I took the bracket off, grabbed some pliers, and reversed all > the bends. Not exactly ideal... but I was then able to get it in the case > and get some screw tension on it to hold it snugly to the case. > > I had some problems with getting the card to initialize at first. One MB > would simply not allow me to run the card in the x16 slot, even with onboard > video, even with a generic pci video card. > > Another motherboard I had, an asus-- don't recall the model, would allow it > to work. I am using an old Geforce2 PCI card for video. > > I recently picked up a pair of Intel SASUC8I. I was able to flash them with the LSI IT firmware for the 3081, and they appear to work just fine. I haven't done extensive testing, but booting off a livecd, it sees the disks just fine, and loads a driver for them. -- --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Supermicro AOC-USAS-L8i
It will work in a standard 8x or 16x slot. The bracket is backward. Not one for subtlety, I took the bracket off, grabbed some pliers, and reversed all the bends. Not exactly ideal... but I was then able to get it in the case and get some screw tension on it to hold it snugly to the case. I had some problems with getting the card to initialize at first. One MB would simply not allow me to run the card in the x16 slot, even with onboard video, even with a generic pci video card. Another motherboard I had, an asus-- don't recall the model, would allow it to work. I am using an old Geforce2 PCI card for video. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] possible mega_sas issue sol10u8 (Re: Workaround for mpt timeouts in snv_127)
Tru Huynh wrote: follow up, another crash today. On Mon, Nov 30, 2009 at 11:35:07AM +0100, Tru Huynh wrote: 1) OS SunOS xargos.bis.pasteur.fr 5.10 Generic_141445-09 i86pc i386 i86pc You should be logging a support call for this issue. James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in clusters
Robert Milkowski wrote: Robert Milkowski wrote: Robert Milkowski wrote: Hi, When deploying ZFS in cluster environment it would be nice to be able to have some SSDs as local drives (not on SAN) and when pool switches over to the other node zfs would pick up the node's local disk drives as L2ARC. To better clarify what I mean lets assume there is a 2-node cluster with 1sx 2540 disk array. Now lets put 4x SSDs in each node (as internal/local drives). Now lets assume one zfs pool would be created on top of a lun exported from 2540. Now 4x local SSDs could be added as L2ARC but because they are not visible on a 2nd node when cluster does failover it should be able to pick up the ssd's which are local to the other node. L2ARC doesn't contain any data which is critical to pool so it doesn't have to be shared between node. SLOG would be a whole different story and generally it wouldn't be possible. But L2ARC should be. Perhaps a scenario like below should be allowed: node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node1-ssd3 node-1-ssd4 node-1# zpool export mysql node-2# zpool import mysql node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node2-ssd3 node-2-ssd4 This is assuming that pool can be imported when some of its slog devices are not accessible. That way the pool always would have some L2ARC/SSDs not accessible but would provide L2ARC cache on each node with local SSDs. Actually it looks like it already works like that! A pool imports with its cache device unavailable just fine. Then I added another cache device. And I can still import it with the first one available but not the 2nd one. zpool status complains of course but other than that it seems to be working fine. Any thought? Ooo. That's a scenario I hadn't thought about. Right now, I'm doing something similar on the cheap: I have an iSCSI LUN (big ass SATA Raidz2) mounted on host A, and am using a spare 15k SAS drive locally as the L2ARC. When I export it and import it to another host, with a identical disk in the same location (.e.g. c1t1d0), I've done a 'zpool remove/add', since they write different ZFS signatures on the cache drive. Works like a champ. Given that I want to use the same device location (e.g. c1t1d0) on both hosts, is there a way I can somehow add both as cache devices, and have ZFS tell them apart by the ID signature? That is, on Host A, I do this: # zpool create tank cache c1t1d0 # zpool export tank Then, on Host B, I'm currently doing: # zpool import tank # zpool remove tank c1t1d0 # zpool add tank cache c1t1d0 I'd obviously like to figure some way that I don't need to do the 'zpool add/remove' Robert's idea looks great, but I'm assuming that all the SSD devices have different drive locations. What I need is some way of telling ZFS to use a device X as a cache device, based on it's ZFS signature, rather than it's physical device location, as that location might (in the past) be used by another vdev. Theoretically, I'd like to do something like this: hostA# zpool create tank hostA# zpool add tank cache c1t1d0 hostA# zpool export tank hostB# zpool import tank hostB# zpool add tank cache c1t1d0 And from then on, I just import/export between the two hosts, and it auto-picks the correct c1t1d0 drive. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
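For illustration, the failover-time dance Erik describes can be scripted today roughly as follows (pool name "tank" and cache device c1t1d0 are taken from his example; adjust to the real names):

zpool import tank
zpool remove tank c1t1d0        # drop the cache vdev that carries the other node's SSD label
zpool add tank cache c1t1d0     # re-add the same path, now pointing at this node's local SSD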
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On Thu, Dec 03, 2009 at 12:44:16PM -0800, Per Baatrup wrote: > >any of f1..f5's last blocks are partial > Does this mean that f1,f2,f3,f4 needs to be exact multiplum of the ZFS > blocksize? This is a severe restriction that will fail unless in very > special cases. Is this related to the disk format or is it > restriction in the implrmentation? (do you know where to look in the > source code?). I'm sure it's related to the FS structure. How do you find a particular point in a file quickly? You don't read up to that point, you want to go to it directly. To do so, you have to know how the file is indexed. If every block contains the same amount of data, this is a simple math equation. If some blocks have more or less data, then you have to keep track of them and their size. I doubt ZFS has any space or ability to include non-full blocks in the middle of a file. -- Darren ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
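To make the "simple math equation" concrete, here is the lookup for a file made of equal-sized blocks, with example values only:

offset=1300000        # byte we want to read
blocksize=131072      # fixed block size (128k)
echo "record $((offset / blocksize)), byte $((offset % blocksize)) within that record"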
Re: [zfs-discuss] zpool import - device names not always updated?
Thank you Cindy for your reply! On 3 dec 2009, at 18.35, Cindy Swearingen wrote: > A bug might exist but you are building a pool based on the ZFS > volumes that are created in another pool. This configuration > is not supported and possible deadlocks can occur. I had absolutely no idea that ZFS volumes weren't supported as ZFS containers. Where can I find information about what is and what isn't supported for ZFS volumes? > If you can retry this example without building a pool on another > pool, like using files to create a pool and can reproduce this, > then please let me know. I retried it with files instead, and it then worked exactly as expected. (Also, it no longer magically remembered the locations of previously found volumes in other directories during import, with or without the sleeps.) I don't know if it is of interest to anyone, but I'll include the reworked file-based test below. /ragge

#!/bin/bash
set -e
set -x
mkdir /d
mkfile 1g /d/f1
mkfile 1g /d/f2
zpool create pool mirror /d/f1 /d/f2
zpool status pool
zpool export pool
mkdir /d/subdir1
mkdir /d/subdir2
mv /d/f1 /d/subdir1/
mv /d/f2 /d/subdir2/
zpool import -d /d/subdir1
zpool import -d /d/subdir2
zpool import -d /d/subdir1 -d /d/subdir2 pool
zpool status pool
# cleanup - remove the "# DELETEME_" part
# DELETEME_zpool destroy pool
# DELETEME_rm -rf /d

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Quota information from nfs mounting linux client
We are using zfs (solaris 10u9) to serve disk to a couple of hundred linux clients via nfs. We would like users on the linux clients to be able to monitor their disk space on the zfs file system. They do not have shell accounts on the fileserver. Is the quota information on the fileserver (user and group) available to be read by a user program without privileged access on a remote host (the linux client)? Where would documentation be? Thanks. Sent from my BlackBerry device This e-mail may contain confidential, personal and/or health information (information which may be subject to legal restrictions on use, retention and/or disclosure) for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this e-mail in error, please contact the sender and delete all copies. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
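For reference, the per-user and per-group numbers live in ZFS properties on the server side; a rough sketch of the queries follows (dataset "tank/home" and user "alice" are made-up examples, and these commands still require access to the fileserver, so they show what data exists rather than how to expose it to unprivileged NFS clients):

zfs userspace tank/home                           # per-user space used and quota, if any
zfs get userused@alice,userquota@alice tank/home  # the same numbers for a single user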
Re: [zfs-discuss] L2ARC in clusters
Robert Milkowski wrote: Robert Milkowski wrote: Robert Milkowski wrote: When deploying ZFS in cluster environment it would be nice to be able to have some SSDs as local drives (not on SAN) and when pool switches over to the other node zfs would pick up the node's local disk drives as L2ARC. Any thought? The 7310/7410 uses this type of configuration, so obviously it works. When in doubt, just think What Would Fishworks Do? Wes Felter ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
> > Isn't this only true if the file sizes are such that the concatenated > > blocks are perfectly aligned on the same zfs block boundaries they used > > before? This seems unlikely to me. > > Yes that would be the case. While eagerly awaiting b128 to appear in IPS, I have been giving this issue (block size and alignment vs dedup) some thought recently. I have a different, but sufficiently similar, scenario where the effectiveness of dedup will depend heavily on this factor. For this case, though, the alignment question for short tails is relatively easily dealt with. The key is that the record size of the file is "up to 128k" and may be shorter depending on various circumstances, such as the write pattern used. To simplify, let us assume that the original files were all written quickly and sequentially, that is that they have n 128k blocks, plus a shorter tail. When concatenating them, it should be sufficient to write out the target file in 128k chunks from the source, then the first tail, then issue an fsync before moving on to the chunks from the second file. If the source files were not written in this pattern (e.g. log files, accumulating small varying-size writes), the best thing to do is to rewrite those "in place" as well, with the same pattern as being written to the joined file. This can also have an improvement on compression efficiency, by allowing larger block sizes than the original. Issues/questions: * This is an optimistic method of alignment, is there any mechanism to get stronger results - ie, to know the size of each record of the original, or to produce specific record size/alignment on output? * There's already the very useful seek interface for finding holes and data, perhaps something similar is useful here. Or a direct io related option to read, that can return short reads only up to the end of the current record? * Perhaps a pause of some kind (to wait for the txg to close) is also necessary, to ensure the tail doesn't get combined with new data and reblocked? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
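A crude sketch of the rewrite pattern described above, assuming 128k-recordsize datasets and accepting a plain sync as a blunt stand-in for a per-file fsync (f1..f5 and f6 are example names):

rm -f f6
for f in f1 f2 f3 f4 f5; do
    dd if="$f" bs=128k >> f6 2>/dev/null   # full 128k writes, then the short tail
    sync                                   # let the tail settle before the next file starts
done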
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On Thu, Dec 03, 2009 at 12:44:16PM -0800, Per Baatrup wrote: > >if any of f2..f5 have different block sizes from f1 > > This restriction does not sound so bad to me if this only refers to > changes to the blocksize of a particular ZFS filesystem or copying > between different ZFSes in the same pool. This can properly be managed > with a "-f" switch on the userlan app to force the copy when it would > fail. Why expose such details? If you have dedup on and if the file blocks and sizes align then cat f1 f2 f3 f4 f5 > f6 will do the right thing and consume only space for new metadata. If the file blocks and sizes do not align then cat f1 f2 f3 f4 f5 > f6 will still work correctly. Or do you mean that you want a way to do that cat ONLY if it would consume no new space for data? (That might actually be a good justification for a ZFS cat command, though I think, too, that one could script it.) > >any of f1..f5's last blocks are partial > > Does this mean that f1,f2,f3,f4 needs to be exact multiplum of the ZFS > blocksize? This is a severe restriction that will fail unless in very > special cases. Say f1 is 1MB, f2 is 128KB, f3 is 510 bytes, f4 is 514 bytes, and f5 is 10MB, and the recordsize for their containing datasets is 128KB, then the new file will consume 10MB + 128KB more than f1..f5 did, but 1MB + 128KB will be de-duplicated. This is not really "a severe restriction". To make ZFS do better than that would require much extra metadata and complexity in the filesystem that users who don't need to do space-efficient file concatenation (most users, that is) won't want to pay for. > Is this related to the disk format or is it restriction in the > implrmentation? (do you know where to look in the source code?). Both. > >...but also ZFS most likely could not do any better with any other, more > >specific non-dedup solution > > Properly lots of I/O traffic, digest calculation+lookups, could be > saved as we already know it will be a duplicate. (In our case the > files are gigabyte sizes) ZFS hashes, and records hashes of blocks, not sub-blocks. Look at my above example. To efficiently dedup the concatenation of the 10MB of f5 would require being able to have something like "sub-block pointers". Alternatively, if you want a concatenation-specific feature ZFS would have to have a metadata notion of concatentation, but then the Unix way of concatenating files couldn't be used for this since the necessary context is lost in the I/O redirection. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
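Nico's "one could script it" might look roughly like this: concatenate only if every input except the last is an exact multiple of the dataset recordsize (128k assumed here), so the result can dedup fully against the originals (f1..f5 and f6 are example names):

rs=131072
ok=1
for f in f1 f2 f3 f4; do                      # every input except the last
    sz=$(ls -l "$f" | awk '{print $5}')
    [ $((sz % rs)) -ne 0 ] && ok=0
done
if [ $ok -eq 1 ]; then
    cat f1 f2 f3 f4 f5 > f6                   # tails align, dedup can reuse the data blocks
else
    echo "tails would not align; cat would allocate new data blocks" >&2
fi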
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
>Btw. I would be surprised to hear that this can be implemented >with current APIs; I agree. However it looks like an opportunity to dive into the Z-source code. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
>if any of f2..f5 have different block sizes from f1 This restriction does not sound so bad to me if this only refers to changes to the blocksize of a particular ZFS filesystem or copying between different ZFSes in the same pool. This can probably be managed with a "-f" switch on the userland app to force the copy when it would otherwise fail. >any of f1..f5's last blocks are partial Does this mean that f1,f2,f3,f4 need to be exact multiples of the ZFS blocksize? This is a severe restriction that will fail except in very special cases. Is this related to the disk format or is it a restriction in the implementation? (Do you know where to look in the source code?) >...but also ZFS most likely could not do any better with any other, more specific non-dedup solution Probably lots of I/O traffic and digest calculation+lookups could be saved, as we already know it will be a duplicate. (In our case the files are gigabytes in size.) --Per -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Petabytes on a budget - blog
Just thought I would let everybody know I saw one at a local ISP yesterday. They hadn't started testing; the metal had only arrived the day before and they were waiting for the drives to arrive. They had also changed the design to give it more network capacity. I will try to find out more as the customer progresses. >Interesting blog: >http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/ > -- Trevor Pretty | Technical Account Manager | T: +64 9 639 0652 | M: +64 21 666 161 Eagle Technology Group Ltd. Gate D, Alexandra Park, Greenlane West, Epsom Private Bag 93211, Parnell, Auckland www.eagle.co.nz This email is confidential and may be legally privileged. If received in error please destroy and immediately notify us. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] EON ZFS Storage 0.59.5 based on snv 125 released!
Embedded Operating system/Networking (EON), RAM based live ZFS NAS appliance is released on Genunix! Many thanks to Al Hopper and Genunix.org for download hosting and serving the opensolaris community. EON ZFS storage is available in 32/64-bit CIFS and Samba versions:

EON 64-bit x86 CIFS ISO image version 0.59.5 based on snv_125
* eon-0.595-125-64-cifs.iso
* MD5: a21c0b6111803f95c29e421af96ee016
* Size: ~90Mb
* Released: Thursday 3-December-2009

EON 64-bit x86 Samba ISO image version 0.59.5 based on snv_125
* eon-0.595-125-64-smb.iso
* MD5: 4678298f0152439867d218987c3ec20e
* Size: ~103Mb
* Released: Thursday 3-December-2009

EON 32-bit x86 CIFS ISO image version 0.59.5 based on snv_125
* eon-0.595-125-32-cifs.iso
* MD5: 4b76893c3363d46fad34bf7d0c23548c
* Size: ~57Mb
* Released: Thursday 3-December-2009

EON 32-bit x86 Samba ISO image version 0.59.5 based on snv_125
* eon-0.595-125-32-smb.iso
* MD5: f478a8ea9228f16dc1bd93adae03d200
* Size: ~70Mb
* Released: Thursday 3-December-2009

EON 64-bit x86 CIFS ISO image version 0.59.5 based on snv_125 (NO HTTP)
* eon-0.595-125-64-cifs-min.iso
* MD5: c7b9ec5c487302c1aa97363eb440fe00
* Size: ~85Mb
* Released: Thursday 3-December-2009

EON 64-bit x86 Samba ISO image version 0.59.5 based on snv_125 (NO HTTP)
* eon-0.595-125-64-smb-min.iso
* MD5: a33f34506f05070ffc554de7beaafd4d
* Size: ~98Mb
* Released: Thursday 3-December-2009

New/Changes/Fixes:
- removed iscsitgd and replaced it with COMSTAR (iscsit, stmf)
- added SUNWhd to image vs being in the binary kit.
- added rsync to image vs being in the binary kit.
- added nge, yge and yukonx drivers.
- added (/etc/inet/hosts, /etc/default/init) to /mnt/eon0/.backup (TIMEZONE and hostname change fix)
- fixed typo entry /mnt/eon0/.exec zpool -a to zpool import -a
- eon rebooting at grub (since snv_122) in ESXi, Fusion and various versions of VMware workstation. This is related to bug 6820576. Workaround: at grub press e and add "-B disable-pcieb=true" on the end of the kernel line.
-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Per, Per Baatrup schrieb: Roland, Clearly an extension of "cp" would be very nice when managing large files. Today we are relying heavily on snapshots for this, but this requires disipline on storing files in separate zfs'es avioding to snapshot too many files that changes frequently. The reason I was speaking about "cat" in stead of "cp" is that in addition to copying a single file I would like also to concatenate several files into a single file. Can this be accomplished with your "(z)cp"? No - "zcp" is a simpler case than what you proposed, and thats why I pointed it out as a discussion case. ( And it is clearly NOT the same as 'ln'. ) Btw. I would be surprised to hear that this can be implemented with current APIs; you would need a call like (my fantasy here) "write_existing_block()" where the data argument is not a pointer to a buffer in memory but instead a reference to an already existing data block in the pool. Based on such a call ( and a corresponding one for read that returns those references in the pool ) IMHO an implementation of the commands would be straight forward ( the actual work would be in the implementation of those calls ). This can certainly been done - I just doubt it already exists. -- Roland -- ** Roland Rambau Platform Technology Team Principal Field Technologist Global Systems Engineering Phone: +49-89-46008-2520 Mobile:+49-172-84 58 129 Fax: +49-89-46008- mailto:roland.ram...@sun.com ** Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten Amtsgericht München: HRB 161028; Geschäftsführer: Thomas Schröder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring *** UNIX * /bin/sh FORTRAN ** ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] possible mega_sas issue sol10u8 (Re: Workaround for mpt timeouts in snv_127)
follow up, another crash today. On Mon, Nov 30, 2009 at 11:35:07AM +0100, Tru Huynh wrote: > 1) OS > SunOS xargos.bis.pasteur.fr 5.10 Generic_141445-09 i86pc i386 i86pc > > it's only sharing though NFS v3 to linux clients running > 20x CentOS-5 x86_64 2.6.18-164.6.1.el5 x86_64/i386 > 78x CentOS-3 x86_64/ia32e/i386 > > 2) usual logs: > /var/adm/messages > -> nothing still empty > > 3) fmdump -ev > /var/fm/fmd/errlog is empty same > 7) not tried yet > reboot -d to force a dump failed (not returned from sync) reboot -dfn failed at 98% of the dump (I could not catch the reason, screen blanked too fast) > > 9) from the #irc channel, I will keep a screen running with: [...@xargos ~]$ ps -ef UID PID PPID CSTIME TTY TIME CMD root 0 0 0 Nov 29 ? 3:16 sched root 1 0 0 Nov 29 ? 0:00 /sbin/init root 2 0 0 Nov 29 ? 0:00 pageout root 3 0 0 Nov 29 ? 20:04 fsflush root 154 1 0 Nov 29 ? 0:00 /usr/lib/picl/picld root 7 1 0 Nov 29 ? 0:04 /lib/svc/bin/svc.startd root 9 1 0 Nov 29 ? 0:08 /lib/svc/bin/svc.configd daemon 152 1 0 Nov 29 ? 0:03 /usr/lib/crypto/kcfd tru 2258 2226 0 Nov 30 pts/7 0:00 /usr/bin/bash root 409 408 0 Nov 29 ? 0:00 /usr/sadm/lib/smc/bin/smcboot root 142 1 0 Nov 29 ? 0:01 /usr/lib/sysevent/syseventd root 429 1 0 Nov 29 ? 0:00 sh /opt/MegaRaidStorageManager/Framework/startup.sh root57 1 0 Nov 29 ? 0:00 /sbin/dhcpagent root64 1 0 Nov 29 ? 0:00 devfsadmd root 208 1 0 Nov 29 ? 0:00 /lib/svc/method/iscsid daemon 306 1 0 Nov 29 ? 0:00 /usr/sbin/rpcbind root 146 1 0 Nov 29 ? 0:12 /usr/sbin/nscd root 2228 2226 0 Nov 30 pts/2 0:08 zpool iostat -v 60 root 332 7 0 Nov 29 ? 0:00 /usr/lib/saf/sac -t 300 root 145 1 0 Nov 29 ? 0:00 /usr/lib/power/powerd root 226 1 0 Nov 29 ? 0:10 /usr/lib/inet/xntpd root 394 332 0 Nov 29 ? 0:00 /usr/lib/saf/ttymon root 262 1 0 Nov 29 ? 0:00 /usr/sbin/cron root 366 1 0 Nov 29 ? 0:00 /usr/lib/utmpd noaccess 673 1 0 Nov 29 ? 3:04 /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4 root 349 7 0 Nov 29 console 0:00 /usr/lib/saf/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -h -p xarg daemon 315 1 0 Nov 29 ? 0:00 /usr/lib/nfs/statd daemon 317 1 0 Nov 29 ? 0:01 /usr/lib/nfs/nfsmapid root 552 1 0 Nov 29 ? 0:01 /usr/sfw/sbin/snmpd daemon 324 1 0 Nov 29 ? 0:00 /usr/lib/nfs/lockd root 431 1 0 Nov 29 ? 0:05 /usr/sbin/syslogd tru 695 689 0 Nov 29 pts/1 0:00 -bash root 367 1 0 Nov 29 ? 0:00 /usr/lib/autofs/automountd root 365 1 0 Nov 29 ? 0:02 /usr/lib/inet/inetd start root 369 367 0 Nov 29 ? 0:01 /usr/lib/autofs/automountd root 430 429 0 Nov 29 ? 3:26 ../jre/bin/java -classpath ../jre/lib/rt.jar:../jre/lib/jsse.jar:../jre/lib/jce root 408 1 0 Nov 29 ? 0:00 /usr/sadm/lib/smc/bin/smcboot root 410 408 0 Nov 29 ? 0:00 /usr/sadm/lib/smc/bin/smcboot root 2234 2226 0 Nov 30 pts/5 0:15 intrstat 60 tru 2236 2226 0 Nov 30 pts/6 0:06 vmstat 60 tru 689 688 0 Nov 29 ? 0:01 /usr/lib/ssh/sshd root 594 1 0 Nov 29 ? 4:25 /usr/sbin/lsi_mrdsnmpagent -c /etc/sma/snmp/snmpd.conf tru 2232 2226 0 Nov 30 pts/4 0:13 prstat 60 tru 2225 695 0 Nov 30 pts/1 0:01 screen root 443 1 0 Nov 29 ? 0:00 /usr/lib/ssh/sshd root 688 443 0 Nov 29 ? 0:00 /usr/lib/ssh/sshd root 541 1 0 Nov 29 ? 0:03 /usr/lib/sendmail -bd -q15m -C /etc/mail/local.cf smmsp 537 1 0 Nov 29 ? 0:00 /usr/lib/sendmail -Ac -q15m root 2565 1 0 Nov 30 ? 0:06 /usr/local/bin/mrmonitord tru 2226 2225 0 Nov 30 ? 0:05 screen tru 3988 3982 0 15:33:51 pts/11 0:00 prstat root 498 1 0 Nov 29 ? 0:00 /usr/sbin/vold -f /etc/vold.conf root 509 1 0 Nov 29 ? 
0:04 /usr/lib/fm/fmd/fmd tru 2230 2226 0 Nov 30 pts/3 0:06 iostat -xn 60 tru 3967 3966 0 15:33:36 ? 0:00 /usr/lib/ssh/sshd root 522 1 0 Nov 29 ? 0:00 /usr/lib/nfs/mountd daemon 524 1 0 Nov 29 ? 9:45 /usr/lib/
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On Thu, Dec 03, 2009 at 03:57:28AM -0800, Per Baatrup wrote: > I would like to to concatenate N files into one big file taking > advantage of ZFS copy-on-write semantics so that the file > concatenation is done without actually copying any (large amount of) > file content. > cat f1 f2 f3 f4 f5 > f15 > Is this already possible when source and target are on the same ZFS > filesystem? > > Am looking into the ZFS source code to understand if there are > sufficient (private) interfaces to make a simple "zcat -o f15 f1 f2 > f3 f4 f5" userland application in C code. Does anybody have advice on > this? There have been plenty of answers already. Quite aside from dedup, the fact that all blocks in a file must have the same uncompressed size means that if any of f2..f5 have different block sizes from f1, or any of f1..f5's last blocks are partial then ZFS could not perform this concatenation as efficiently as you wish. In other words: dedup _is_ what you're looking for... ...but also ZFS most likely could not do any better with any other, more specific non-dedup solution. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] L2ARC re-uses new device if it is in the same "place"
Hi, mi...@r600:/rpool/tmp# zpool status test pool: test state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 /rpool/tmp/f1 ONLINE 0 0 0 errors: No known data errors lets add a cache device: mi...@r600:/rpool/tmp# zfs create -V 100m rpool/tmp/ssd2 mi...@r600:/rpool/tmp# zpool add test cache /dev/zvol/dsk/rpool/tmp/ssd2 mi...@r600:/rpool/tmp# zpool status test pool: test state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM testONLINE 0 0 0 /rpool/tmp/f1 ONLINE 0 0 0 cache /dev/zvol/dsk/rpool/tmp/ssd2 ONLINE 0 0 0 errors: No known data errors mi...@r600:/rpool/tmp# now lets export the pool, re-create the zvol and then import the pool again: mi...@r600:/rpool/tmp# zpool export test mi...@r600:/rpool/tmp# zfs destroy rpool/tmp/ssd2 mi...@r600:/rpool/tmp# zfs create -V 100m rpool/tmp/ssd2 mi...@r600:/rpool/tmp# zpool import -d /rpool/tmp/ test mi...@r600:/rpool/tmp# zpool status test pool: test state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM testONLINE 0 0 0 /rpool/tmp/f1 ONLINE 0 0 0 cache /dev/zvol/dsk/rpool/tmp/ssd2 ONLINE 0 0 0 errors: No known data errors mi...@r600:/rpool/tmp# No complaint here... I'm not entirely sure that it should behave that way - in some circumstances it could be risky. For example what if zvol/ssd/disk which is used on one server as a cache device has the same path on another server and then a pool is imported there? Would l2arc just blindly start using it as a cache device and overwriting some other data? Shouldn't l2arc devices have a label/signature or at least use uuid of a disk and during import be checked if it is the same device? Or maybe it does and there is some other issue here with re-creating zvol... btw: x86, snv_127 -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
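As a data point for the label/signature question, the vdev labels that do exist on a cache device can be inspected with zdb; the path below is the example zvol from the test above:

zdb -l /dev/zvol/dsk/rpool/tmp/ssd2    # dumps the vdev labels, including pool and vdev GUIDs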
Re: [zfs-discuss] L2ARC in clusters
Robert Milkowski wrote: Robert Milkowski wrote: Hi, When deploying ZFS in cluster environment it would be nice to be able to have some SSDs as local drives (not on SAN) and when pool switches over to the other node zfs would pick up the node's local disk drives as L2ARC. To better clarify what I mean lets assume there is a 2-node cluster with 1sx 2540 disk array. Now lets put 4x SSDs in each node (as internal/local drives). Now lets assume one zfs pool would be created on top of a lun exported from 2540. Now 4x local SSDs could be added as L2ARC but because they are not visible on a 2nd node when cluster does failover it should be able to pick up the ssd's which are local to the other node. L2ARC doesn't contain any data which is critical to pool so it doesn't have to be shared between node. SLOG would be a whole different story and generally it wouldn't be possible. But L2ARC should be. Perhaps a scenario like below should be allowed: node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node1-ssd3 node-1-ssd4 node-1# zpool export mysql node-2# zpool import mysql node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node2-ssd3 node-2-ssd4 This is assuming that pool can be imported when some of its slog devices are not accessible. That way the pool always would have some L2ARC/SSDs not accessible but would provide L2ARC cache on each node with local SSDs. Actually it looks like it already works like that! A pool imports with its cache device unavailable just fine. Then I added another cache device. And I can still import it with the first one available but not the 2nd one. zpool status complains of course but other than that it seems to be working fine. Any thought? -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On Thu, Dec 03, 2009 at 09:36:23AM -0800, Per Baatrup wrote: > The reason I was speaking about "cat" in stead of "cp" is that in > addition to copying a single file I would like also to concatenate > several files into a single file. Can this be accomplished with your > "(z)cp"? Unless you have special data formats, I think it's unlikely that the last ZFS block in the file will be exactly full. But to append without copying, you'd need some way of ignoring a portion of the data in a non-final ZFS block and stitching together the bytestream. I don't think that's possible with the ZFS layout. -- Darren ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in clusters
Robert Milkowski wrote: Robert Milkowski wrote: Hi, When deploying ZFS in cluster environment it would be nice to be able to have some SSDs as local drives (not on SAN) and when pool switches over to the other node zfs would pick up the node's local disk drives as L2ARC. To better clarify what I mean lets assume there is a 2-node cluster with 1sx 2540 disk array. Now lets put 4x SSDs in each node (as internal/local drives). Now lets assume one zfs pool would be created on top of a lun exported from 2540. Now 4x local SSDs could be added as L2ARC but because they are not visible on a 2nd node when cluster does failover it should be able to pick up the ssd's which are local to the other node. L2ARC doesn't contain any data which is critical to pool so it doesn't have to be shared between node. SLOG would be a whole different story and generally it wouldn't be possible. But L2ARC should be. Perhaps a scenario like below should be allowed: node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node1-ssd3 node-1-ssd4 node-1# zpool export mysql node-2# zpool import mysql node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node2-ssd3 node-2-ssd4 This is assuming that pool can be imported when some of its slog devices are not accessible. That way the pool always would have some L2ARC/SSDs not accessible but would provide L2ARC cache on each node with local SSDs. btw: mi...@r600:/rpool/tmp# mkfile 200m f1 mi...@r600:/rpool/tmp# mkfile 100m s1 mi...@r600:/rpool/tmp# zpool create test /rpool/tmp/f1 mi...@r600:/rpool/tmp# zpool status test pool: test state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 /rpool/tmp/f1 ONLINE 0 0 0 errors: No known data errors mi...@r600:/rpool/tmp# zpool add test cache /rpool/tmp/s1 cannot add to 'test': cache device must be a disk or disk slice mi...@r600:/rpool/tmp# is there a reason why a cache device can't be set-up on a file like for other vdevs? mi...@r600:/rpool/tmp# zfs create -V 100m rpool/tmp/ssd1 mi...@r600:/rpool/tmp# zpool add test cache /dev/zvol/rdsk/rpool/tmp/ssd1 cannot use '/dev/zvol/rdsk/rpool/tmp/ssd1': must be a block device or regular file mi...@r600:/rpool/tmp# zpool add test cache /dev/zvol/dsk/rpool/tmp/ssd1 mi...@r600:/rpool/tmp# So when I try to add a cache device on-top of a file I get an error that a cache device must be a disk or a disk slice, so when I try to add a cache device on a rdsk I get an error that it bust be a block device or regular file which suggest a regular file should work... (dsk works fine). -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in clusters
Robert Milkowski wrote: Hi, When deploying ZFS in cluster environment it would be nice to be able to have some SSDs as local drives (not on SAN) and when pool switches over to the other node zfs would pick up the node's local disk drives as L2ARC. To better clarify what I mean lets assume there is a 2-node cluster with 1sx 2540 disk array. Now lets put 4x SSDs in each node (as internal/local drives). Now lets assume one zfs pool would be created on top of a lun exported from 2540. Now 4x local SSDs could be added as L2ARC but because they are not visible on a 2nd node when cluster does failover it should be able to pick up the ssd's which are local to the other node. L2ARC doesn't contain any data which is critical to pool so it doesn't have to be shared between node. SLOG would be a whole different story and generally it wouldn't be possible. But L2ARC should be. Perhaps a scenario like below should be allowed: node-1# zpool add mysql cache node-1-ssd1 node-1-ssd2 node1-ssd3 node-1-ssd4 node-1# zpool export mysql node-2# zpool import mysql node-2# zpool add mysql cache node-2-ssd1 node-2-ssd2 node2-ssd3 node-2-ssd4 This is assuming that pool can be imported when some of its slog devices are not accessible. That way the pool always would have some L2ARC/SSDs not accessible but would provide L2ARC cache on each node with local SSDs. btw: mi...@r600:/rpool/tmp# mkfile 200m f1 mi...@r600:/rpool/tmp# mkfile 100m s1 mi...@r600:/rpool/tmp# zpool create test /rpool/tmp/f1 mi...@r600:/rpool/tmp# zpool status test pool: test state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 /rpool/tmp/f1 ONLINE 0 0 0 errors: No known data errors mi...@r600:/rpool/tmp# zpool add test cache /rpool/tmp/s1 cannot add to 'test': cache device must be a disk or disk slice mi...@r600:/rpool/tmp# is there a reason why a cache device can't be set-up on a file like for other vdevs? -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] L2ARC in clusters
Hi, When deploying ZFS in a cluster environment it would be nice to be able to have some SSDs as local drives (not on SAN), and when the pool switches over to the other node zfs would pick up that node's local disk drives as L2ARC. To better clarify what I mean, let's assume there is a 2-node cluster with 1x 2540 disk array. Now let's put 4x SSDs in each node (as internal/local drives). Now let's assume one zfs pool would be created on top of a lun exported from the 2540. Now the 4x local SSDs could be added as L2ARC, but because they are not visible on the 2nd node, when the cluster does a failover it should be able to pick up the SSDs which are local to the other node. L2ARC doesn't contain any data which is critical to the pool, so it doesn't have to be shared between nodes. SLOG would be a whole different story and generally it wouldn't be possible. But L2ARC should be. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Roland, Clearly an extension of "cp" would be very nice when managing large files. Today we are relying heavily on snapshots for this, but this requires discipline in storing files in separate zfs'es, avoiding snapshotting too many files that change frequently. The reason I was speaking about "cat" instead of "cp" is that in addition to copying a single file I would also like to concatenate several files into a single file. Can this be accomplished with your "(z)cp"? --Per -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import - device names not always updated?
Hi Ragnar, A bug might exist but you are building a pool based on the ZFS volumes that are created in another pool. This configuration is not supported and possible deadlocks can occur. If you can retry this example without building a pool on another pool, like using files to create a pool and can reproduce this, then please let me know. Thanks, Cindy On 12/01/09 17:57, Ragnar Sundblad wrote: It seems that device names aren't always updated when importing pools if devices have moved. I am not sure if this is only an cosmetic issue or if it could actually be a real problem - could it lead to the device not being found at a later import? /ragge (This is on snv_127.) I ran the following script: #!/bin/bash set -e set -x zfs create -V 1G rpool/vol1 zfs create -V 1G rpool/vol2 zpool create pool mirror /dev/zvol/dsk/rpool/vol1 /dev/zvol/dsk/rpool/vol2 zpool status pool zpool export pool zfs create rpool/subvol1 zfs create rpool/subvol2 zfs rename rpool/vol1 rpool/subvol1/vol1 zfs rename rpool/vol2 rpool/subvol2/vol2 zpool import -d /dev/zvol/dsk/rpool/subvol1 sleep 1 zpool import -d /dev/zvol/dsk/rpool/subvol2 sleep 1 zpool import -d /dev/zvol/dsk/rpool/subvol1 pool zpool status pool And got the output below. I have annotated it with ### remarks. # bash zfs-test.bash + zfs create -V 1G rpool/vol1 + zfs create -V 1G rpool/vol2 + zpool create pool mirror /dev/zvol/dsk/rpool/vol1 /dev/zvol/dsk/rpool/vol2 + zpool status pool pool: pool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM pool ONLINE 0 0 0 mirror-0ONLINE 0 0 0 /dev/zvol/dsk/rpool/vol1 ONLINE 0 0 0 /dev/zvol/dsk/rpool/vol2 ONLINE 0 0 0 errors: No known data errors + zpool export pool + zfs create rpool/subvol1 + zfs create rpool/subvol2 + zfs rename rpool/vol1 rpool/subvol1/vol1 + zfs rename rpool/vol2 rpool/subvol2/vol2 + zpool import -d /dev/zvol/dsk/rpool/subvol1 pool: pool id: 13941781561414544058 state: DEGRADED status: One or more devices are missing from the system. action: The pool can be imported despite missing or damaged devices. The fault tolerance of the pool may be compromised if imported. see: http://www.sun.com/msg/ZFS-8000-2Q config: pool DEGRADED mirror-0DEGRADED /dev/zvol/dsk/rpool/subvol1/vol1 ONLINE /dev/zvol/dsk/rpool/vol2 UNAVAIL cannot open ### Note that it can't find vol2 - which is expected. + sleep 1 ### The sleep here seems to be necessary for vol1 to magically be ### found in the next zpool import. + zpool import -d /dev/zvol/dsk/rpool/subvol2 pool: pool id: 13941781561414544058 state: ONLINE action: The pool can be imported using its name or numeric identifier. config: pool ONLINE mirror-0ONLINE /dev/zvol/dsk/rpool/vol1 ONLINE /dev/zvol/dsk/rpool/subvol2/vol2 ONLINE ### Note that it says vol1 is ONLINE, under it's old path, though it actually has moved + sleep 1 + zpool import -d /dev/zvol/dsk/rpool/subvol1 pool + zpool status pool pool: pool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM pool ONLINE 0 0 0 mirror-0ONLINE 0 0 0 /dev/zvol/dsk/rpool/subvol1/vol1 ONLINE 0 0 0 /dev/zvol/dsk/rpool/vol2 ONLINE 0 0 0 errors: No known data errors ### Note that vol2 has it old path shown! 
### Interestingly, if you then + zpool export pool + zpool import -d /dev/zvol/dsk/rpool/subvol2 pool ### vol2's path gets updated too: + zpool status pool pool: pool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM pool ONLINE 0 0 0 mirror-0ONLINE 0 0 0 /dev/zvol/dsk/rpool/subvol1/vol1 ONLINE 0 0 0 /dev/zvol/dsk/rpool/subvol2/vol2 ONLINE 0 0 0 errors: No known data errors ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZIL corrupt, not recoverable even with logfix
Was the zpool originally created by a FreeBSD operating system or by an OpenSolaris operating system? And what version of FreeBSD, SXCE, or OpenSolaris Indiana was it originally created by? The reason I'm asking this is because there are different versions of ZFS in different versions of OpenSolaris, so if you take a newer-version zpool and try to mount it on an older version of OpenSolaris, it won't mount. The last time I tried it, a long time ago, ZFS in FreeBSD was pretty unstable and still under heavy development, which was the sole reason I migrated my storage server with my important data on it to OpenSolaris, and it has been rock solid stable since. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
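One quick way to see whether a pool-version mismatch is in play is to compare versions on each OS that is supposed to import the pool, for example:

zpool upgrade -v | head -1    # pool version this kernel is running/supports
zpool import                  # scans for importable pools; a pool of a newer version
                              # should be reported as incompatible rather than imported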
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Michael, michael schuster schrieb: Roland Rambau wrote: gang, actually a simpler version of that idea would be a "zcp": if I just cp a file, I know that all blocks of the new file will be duplicates; so the cp could take full advantage for the dedup without a need to check/read/write anz actual data I think they call it 'ln' ;-) and that even works on ufs. quite similar but with a critical difference: with hard links any modifications through either link are seen by both links, since it stays a single file (note that editors like vi do an implicit cp, they do NOT update the original file ) That "zcp" ( actually it should be just a feature of 'cp' ) would be blockwise copy-on-write. It would have exactly the same semantics as cp but just avoid any data movement, since we can easily predict what the effect of a cp followed by a dedup should be. -- Roland -- ** Roland Rambau Platform Technology Team Principal Field Technologist Global Systems Engineering Phone: +49-89-46008-2520 Mobile:+49-172-84 58 129 Fax: +49-89-46008- mailto:roland.ram...@sun.com ** Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten Amtsgericht München: HRB 161028; Geschäftsführer: Thomas Schröder, Wolfgang Engels, Wolf Frenkel Vorsitzender des Aufsichtsrates: Martin Häring *** UNIX * /bin/sh FORTRAN ** ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
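A two-minute illustration of the distinction being drawn here (file names are examples):

echo one > f1
ln f1 f_link        # hard link: f1 and f_link are the same file
cp f1 f_copy        # copy: an independent file
echo two >> f_link
cat f1              # prints "one" and "two" - the write is visible through f1 as well
cat f_copy          # still prints just "one"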
Re: [zfs-discuss] Separate Zil on HDD ?
On 12/03/09 09:21, mbr wrote: Hello, Bob Friesenhahn wrote: On Thu, 3 Dec 2009, mbr wrote: What about the data that were on the ZILlog SSD at the time of failure, is a copy of the data still in the machines memory from where it can be used to put the transaction to the stable storage pool? The intent log SSD is used as 'write only' unless the system reboots, in which case it is used to support recovery. The system memory is used as the write path in the normal case. Once the data is written to the intent log, then the data is declared to be written as far as higher level applications are concerned. thank you Bob for the clarification. So I don't need a mirrored ZILlog for security reasons, all the information is still in memory and will be used from there by default if only the ZILlog SSD fails. Mirrored log devices are advised to improve reliablity. As previously mentioned, if during writing a log device fails or is temporarily full then we use the main pool devices to chain the log blocks. If we get read errors when trying to replay the intent log (after a crash/power fail) then the admin is given the option to ignore the log and continue or somehow fix the device (eg re-attach) and then retry. Multiple log devices would provide extra reliability here. We do not look in memory for the log records if we can't get the records from the log blocks. If the intent log SSD fails and the system spontaneously reboots, then data may be lost. I can live with the data loss as long as the machine comes up with the faulty ZILlog SSD but otherwise without probs and with a clean zpool. The log records are not required for consistency of the pool (it's not a journal). Has the following error no consequences? Bug ID 6538021 Synopsis Need a way to force pool startup when zil cannot be replayed State 3-Accepted (Yes, that is a problem) Link http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021 Er that bug should probably be closed as a duplicate. We now have this functionality. Michael. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Separate Zil on HDD ?
On Thu, 3 Dec 2009, mbr wrote: Has the following error no consequences? Bug ID 6538021 Synopsis Need a way to force pool startup when zil cannot be replayed State 3-Accepted (Yes, that is a problem) Link http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021 I don't know the status of this, but it does make sense to require the user to explicitly choose to corrupt/lose data in the storage pool. It could be that the log device is just temporarily missing and can be restored, so zfs should not do this by default. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On Thu, 3 Dec 2009, Jason King wrote: Well it could be done in a way such that it could be fs-agnostic (perhaps extending /bin/cat with a new flag such as -o outputfile, or detecting if stdout is a file vs tty, though corner cases might get tricky). If a particular fs supported such a feature, it could take advantage of it, but if it didn't, it could fall back to doing a read+append. Sort of like how mv figures out if the source & target are the same or different filesystems and acts accordingly. The most common way that I concatenate files into a larger file is by using a utility such as 'tar', which outputs a different format. I rarely use 'cat' to concatenate files. If it is desired to concatenate files in a way which works best for deduplication, then a tar-like format can be invented which takes care to always start new file output on a filesystem block boundary. With zfs deduplication this should be faster and take less space than compressing the entire result as long as the output is stored in the same pool. If output is written to a destination filesystem which uses a different block size, then the ideal block size will be that of the destination filesystem so that large archive files can still be usefully deduplicated. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
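To make the block-boundary idea concrete, here is a minimal sketch (not an existing tool; the 128K recordsize, file names, and output path are assumptions) that pads each input out to the next 128K boundary as it is appended, so every original file starts on a ZFS block boundary in the output. A real tar-like format would also record the original sizes so the padding could be stripped on extraction:

#!/bin/sh
# concatenate inputs, padding each one to the next 128K boundary (assumed recordsize)
BS=131072
OUT=/tank/data/archive.cat
: > "$OUT"
for f in f1 f2 f3 f4 f5; do
    cat "$f" >> "$OUT"
    size=`ls -l "$OUT" | awk '{print $5}'`
    pad=`expr \( $BS - $size % $BS \) % $BS`
    if [ "$pad" -gt 0 ]; then
        dd if=/dev/zero bs=1 count=$pad >> "$OUT" 2>/dev/null
    fi
done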
Re: [zfs-discuss] Separate Zil on HDD ?
Hello, Bob Friesenhahn wrote: On Thu, 3 Dec 2009, mbr wrote: What about the data that were on the ZILlog SSD at the time of failure, is a copy of the data still in the machine's memory from where it can be used to put the transaction to the stable storage pool? The intent log SSD is used as 'write only' unless the system reboots, in which case it is used to support recovery. The system memory is used as the write path in the normal case. Once the data is written to the intent log, then the data is declared to be written as far as higher level applications are concerned. Thank you, Bob, for the clarification. So I don't need a mirrored ZILlog for security reasons; all the information is still in memory and will be used from there by default if only the ZILlog SSD fails. If the intent log SSD fails and the system spontaneously reboots, then data may be lost. I can live with the data loss as long as the machine comes up with the faulty ZILlog SSD but otherwise without probs and with a clean zpool. Has the following error no consequences? Bug ID 6538021 Synopsis Need a way to force pool startup when zil cannot be replayed State 3-Accepted (Yes, that is a problem) Link http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6538021 Michael. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On Thu, Dec 3, 2009 at 9:58 AM, Bob Friesenhahn wrote: > On Thu, 3 Dec 2009, Erik Ableson wrote: >> >> Much depends on the contents of the files. Fixed size binary blobs that >> align nicely with 16/32/64k boundaries, or variable sized text files. > > Note that the default zfs block size is 128K and so that will therefore be > the default dedup block size. > > Most files are less than 128K and occupy a short tail block so concatenating > them will not usually enjoy the benefits of deduplication. > > It is not wise to riddle zfs with many special-purpose features since zfs > would then be encumbered by these many features, which tend to defeat future > improvements. Well it could be done in a way such that it could be fs-agnostic (perhaps extending /bin/cat with a new flag such as -o outputfile, or detecting if stdout is a file vs tty, though corner cases might get tricky). If a particular fs supported such a feature, it could take advantage of it, but if it didn't, it could fall back to doing a read+append. Sort of like how mv figures out if the source & target are the same or different filesystems and acts accordingly. There are a few use cases I've encountered where having this would have been _very_ useful (usually when trying to get large crashdumps to Sun quickly). In general, it would allow one to manipulate very large files by breaking them up into smaller subsets while still having the end result be a single file. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
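As a tiny illustration of the stdout-detection part of that idea, a wrapper could branch on whether standard output is a terminal before choosing a strategy; everything below is plain shell, and the fs-aware fast path is purely hypothetical (today both branches simply fall back to an ordinary cat):

#!/bin/sh
# hypothetical cat wrapper: choose a strategy based on where stdout goes
if [ -t 1 ]; then
    exec cat "$@"   # stdout is a terminal: stream normally
fi
# stdout is redirected (likely a file): an fs-aware block-sharing path
# could be attempted here; absent one, fall back to an ordinary read+append
exec cat "$@"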
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Bob Friesenhahn wrote: On Thu, 3 Dec 2009, Erik Ableson wrote: Much depends on the contents of the files. Fixed size binary blobs that align nicely with 16/32/64k boundaries, or variable sized text files. Note that the default zfs block size is 128K and so that will therefore be the default dedup block size. Most files are less than 128K and occupy a short tail block so concatenating them will not usually enjoy the benefits of deduplication. Most? I think that is a bit of a sweeping statement. I know of some environments where "most" files are multiple gigabytes in size and others where 1K is the upper bound of the file sizes. So I don't think you can say at all that "Most" files are < 128K. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On Thu, 3 Dec 2009, Erik Ableson wrote: Much depends on the contents of the files. Fixed size binary blobs that align nicely with 16/32/64k boundaries, or variable sized text files. Note that the default zfs block size is 128K and so that will therefore be the default dedup block size. Most files are less than 128K and occupy a short tail block so concatenating them will not usually enjoy the benefits of deduplication. It is not wise to riddle zfs with many special-purpose features since zfs would then be encumbered by these many features, which tend to defeat future improvements. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
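For anyone checking this on their own datasets, the block size in question is the recordsize property, which is per filesystem and can be inspected or changed (a change only affects blocks written afterwards); the dataset name below is hypothetical:

# zfs get recordsize tank/data
# zfs set recordsize=64K tank/data

Also note that a file smaller than the recordsize is stored in a single block sized to the file, which is why the short-tail-block point above matters for small files.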
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
michael schuster wrote: Roland Rambau wrote: gang, actually a simpler version of that idea would be a "zcp": if I just cp a file, I know that all blocks of the new file will be duplicates; so the cp could take full advantage of the dedup without a need to check/read/write any actual data I think they call it 'ln' ;-) and that even works on ufs. Michael +1 More and more it sounds like an optimization that will either A. not add much over dedup or B. have value only in specific situations - and completely misbehave in other situations (even the same situations after passage of time) Why not just make a special-purpose application (completely user-land) for it? I know, 'ln' is remotely kin of this idea, but 'ln' is POSIX and people know what to expect. What you'd practically need to do is whip up a vfs layer that exposes the underlying blocks of a filesystem and possibly name them by their SHA256 or MD5 hash. Then you'd need (another?) vfs abstraction that allows 'virtual' files to be assembled from these blocks in multiple independent chains. I know there is already a fuse implementation of the first vfs driver (the name evades me, but I think it was something like chunkfs[1]) and one could at least whip up a reasonable read-only Proof-of-Concept of the second part. The reason _I_ wouldn't do that is because I'm already happy with e.g.:
mkfifo /var/run/my_part_collector
(while true; do cat /local/data/my_part_* > /var/run/my_part_collector; done)&
wc -l /var/run/my_part_collector
The equivalent of this could be (better) expressed in C, perl or any language of your choice. I believe this is all POSIX. [1] The reason this exists is obviously for backup and synchronization implementations: it will make it possible to back up files using rsync when the encryption key is not available to the backup process (with an ECB-mode crypto algorithm); it should make it 'simple' to synchronize one's large monolithic files with e.g. Amazon S3 cloud storage etc. etc. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Per Baatrup wrote: Actually 'ln -s source target' would not be the same "zcp source target" as writing to the source file after the operation would change the target file as well where as for "zcp" this would only change the source file due to copy-on-write semantics of ZFS. I actually was thinking of creating a hard link (without the -s option), but your point is valid for hard and soft links. cheers Michael -- Michael Schuster http://blogs.sun.com/recursion Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Separate Zil on HDD ?
On Thu, 3 Dec 2009, mbr wrote: What about the data that were on the ZILlog SSD at the time of failure, is a copy of the data still in the machine's memory from where it can be used to put the transaction to the stable storage pool? The intent log SSD is used as 'write only' unless the system reboots, in which case it is used to support recovery. The system memory is used as the write path in the normal case. Once the data is written to the intent log, then the data is declared to be written as far as higher level applications are concerned. If the intent log SSD fails and the system spontaneously reboots, then data may be lost. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Actually 'ln -s source target' would not be the same as "zcp source target", since writing to the source file after the operation would change the target file as well, whereas for "zcp" this would only change the source file due to the copy-on-write semantics of ZFS. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Bob Friesenhahn wrote: On Thu, 3 Dec 2009, Darren J Moffat wrote: The answer to this is likely deduplication which ZFS now has. The reason dedup should help here is that after the 'cat' f15 will be made up of blocks that match the blocks of f1 f2 f3 f4 f5. Copy-on-write isn't what helps you here it is dedup. Isn't this only true if the file sizes are such that the concatenated blocks are perfectly aligned on the same zfs block boundaries they used before? This seems unlikely to me. Yes, that would be the case. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Roland Rambau wrote: gang, actually a simpler version of that idea would be a "zcp": if I just cp a file, I know that all blocks of the new file will be duplicates; so the cp could take full advantage of the dedup without a need to check/read/write any actual data I think they call it 'ln' ;-) and that even works on ufs. Michael -- Michael Schuster http://blogs.sun.com/recursion Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
gang, actually a simpler version of that idea would be a "zcp": if I just cp a file, I know that all blocks of the new file will be duplicates; so the cp could take full advantage of the dedup without a need to check/read/write any actual data -- Roland Per Baatrup wrote: "dedup" operates on the block level leveraging the existing ZFS checksums. Read "What to dedup: Files, blocks, or bytes" here http://blogs.sun.com/bonwick/entry/zfs_dedup The trick should be that the zcat userland app already knows that it will generate duplicate files so data reads and writes could be avoided altogether. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
"zcat" was my acronym for a special ZFS aware version of "cat" and the name was obviously a big mistake as I did not know it was an existing command and simply forgot to check. Should rename if to "zfscat" or something similar? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Per Baatrup wrote: "dedup" operates on the block level leveraging the existing FFS checksums. Read "What to dedup: Files, blocks, or bytes" here http://blogs.sun.com/bonwick/entry/zfs_dedup The trick should be that the zcat userland app already knows that it will generate duplicate files so data read and writes could be avoided all together. you'd probably be better off avoiding "zcat" - it's been in use since almost forever, from the man-page: zcat The zcat utility writes to standard output the uncompressed form of files that have been compressed using compress. It is the equivalent of uncompress-c. Input files are not affected. :-) cheers Michael -- Michael Schusterhttp://blogs.sun.com/recursion Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
"dedup" operates on the block level leveraging the existing FFS checksums. Read "What to dedup: Files, blocks, or bytes" here http://blogs.sun.com/bonwick/entry/zfs_dedup The trick should be that the zcat userland app already knows that it will generate duplicate files so data read and writes could be avoided all together. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On 3 Dec 2009, at 13:29, Bob Friesenhahn wrote: On Thu, 3 Dec 2009, Darren J Moffat wrote: The answer to this is likely deduplication which ZFS now has. The reason dedup should help here is that after the 'cat' f15 will be made up of blocks that match the blocks of f1 f2 f3 f4 f5. Copy-on-write isn't what helps you here it is dedup. Isn't this only true if the file sizes are such that the concatenated blocks are perfectly aligned on the same zfs block boundaries they used before? This seems unlikely to me. It's also worth noting that if the block alignment works out for the dedup, the actual write traffic will be trivial, consisting only of pointer references, so the heavy lifting will be the read operations. Much depends on the contents of the files. Fixed size binary blobs that align nicely with 16/32/64k boundaries, or variable sized text files. Regards, Erik Ableson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Separate Zil on HDD ?
Hello, Edward Ned Harvey wrote: Yes, I have SSD for ZIL. Just one SSD. 32G. But if this is the problem, then you'll have the same poor performance on the local machine that you have over NFS. So I'm curious to see if you have the same poor performance locally. The ZIL does not need to be reliable; if it fails, the ZIL will begin writing to the main storage, and performance will suffer until the new SSD is put into production. I am also planning to install an SSD as ZILlog. Is it really true that there are no problems if the ZILlog fails and there is no mirror of the ZILlog? What about the data that were on the ZILlog SSD at the time of failure, is a copy of the data still in the machine's memory from where it can be used to put the transaction to the stable storage pool? What if the machine reboots after the SSD has failed? The ZFS Best Practices Guide recommends mirroring the log: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pool_Performance_Considerations Mirroring the log device is highly recommended. Protecting the log device by mirroring will allow you to access the storage pool even if a log device has failed. Failure of the log device may cause the storage pool to be inaccessible if you are running the Solaris Nevada release prior to build 96 and a release prior to the Solaris 10 10/09 release. For more information, see CR 6707530. http://bugs.opensolaris.org/view_bug.do?bug_id=6707530 No probs with that if I use Sol10U8? Regards, Michael. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On Thu, 3 Dec 2009, Darren J Moffat wrote: The answer to this is likely deduplication which ZFS now has. The reason dedup should help here is that after the 'cat' f15 will be made up of blocks that match the blocks of f1 f2 f3 f4 f5. Copy-on-write isn't what helps you here it is dedup. Isn't this only true if the file sizes are such that the concatenated blocks are perfectly aligned on the same zfs block boundaries they used before? This seems unlikely to me. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
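One way to sanity-check the alignment concern for a concrete set of files is to compare each file's size against the dataset's recordsize; a rough sketch, where the dataset and file names are assumptions and zfs get -p is used to obtain the raw byte value:

#!/bin/sh
# report which inputs do not end on a recordsize boundary (hypothetical names)
RS=`zfs get -Hp -o value recordsize tank/data`
for f in f1 f2 f3 f4; do
    size=`ls -l "$f" | awk '{print $5}'`
    if [ `expr $size % $RS` -ne 0 ]; then
        echo "$f: $size bytes, not a multiple of $RS"
    fi
done

Any file flagged here (other than the last one concatenated) would break the alignment for everything that follows it.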
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Peter Tribble wrote: On Thu, Dec 3, 2009 at 12:08 PM, Darren J Moffat wrote: Per Baatrup wrote: I would like to concatenate N files into one big file taking advantage of ZFS copy-on-write semantics so that the file concatenation is done without actually copying any (large amount of) file content. cat f1 f2 f3 f4 f5 > f15 Is this already possible when source and target are on the same ZFS filesystem? Am looking into the ZFS source code to understand if there are sufficient (private) interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" userland application in C code. Does anybody have advice on this? The answer to this is likely deduplication which ZFS now has. The reason dedup should help here is that after the 'cat' f15 will be made up of blocks that match the blocks of f1 f2 f3 f4 f5. Is that likely to happen? dedup is at the block level, so the blocks in f2 will only match the same data in f15 if they're aligned, which is only going to happen if f1 ends on a block boundary. Correct, you will only get the maximum benefit if the source files end on a block boundary. Which is why I said "likely deduplication". Besides, you still have to read all the data off the disk, manipulate it, and write it all back. Yep. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
On Thu, Dec 3, 2009 at 12:08 PM, Darren J Moffat wrote: > Per Baatrup wrote: >> >> I would like to to concatenate N files into one big file taking advantage >> of ZFS copy-on-write semantics so that the file concatenation is done >> without actually copying any (large amount of) file content. >> cat f1 f2 f3 f4 f5 > f15 >> Is this already possible when source and target are on the same ZFS >> filesystem? >> >> Am looking into the ZFS source code to understand if there are sufficient >> (private) interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" >> userland application in C code. Does anybody have advice on this? > > The answer to this is likely deduplication which ZFS now has. > > The reason dedup should help here is that after the 'cat' f15 will be made > up of blocks that match the blocks of f1 f2 f3 f4 f5. Is that likely to happen? dedup is at the block level, so the blocks in f2 will only match the same data in f15 if they're aligned, which is only going to happen if f1 ends on a block boundary. Besides, you still have to read all the data off the disk, manipulate it, and write it all back. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] file concatenation with ZFS copy-on-write
Per Baatrup wrote: I would like to concatenate N files into one big file taking advantage of ZFS copy-on-write semantics so that the file concatenation is done without actually copying any (large amount of) file content. cat f1 f2 f3 f4 f5 > f15 Is this already possible when source and target are on the same ZFS filesystem? Am looking into the ZFS source code to understand if there are sufficient (private) interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" userland application in C code. Does anybody have advice on this? The answer to this is likely deduplication which ZFS now has. The reason dedup should help here is that after the 'cat' f15 will be made up of blocks that match the blocks of f1 f2 f3 f4 f5. Copy-on-write isn't what helps you here; it is dedup. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
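For anyone trying this out, dedup is enabled per dataset and its effect shows up at the pool level; a minimal sketch with hypothetical pool and dataset names (the pool must be at a version that supports dedup, i.e. version 21 or later):

# zfs set dedup=on tank/data
# cat f1 f2 f3 f4 f5 > /tank/data/f15
# zpool get dedupratio tank

When the block sizes and boundaries line up as discussed in the replies above, the dedup ratio rises instead of the pool consuming new space for f15's data blocks.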
[zfs-discuss] file concatenation with ZFS copy-on-write
I would like to concatenate N files into one big file taking advantage of ZFS copy-on-write semantics so that the file concatenation is done without actually copying any (large amount of) file content. cat f1 f2 f3 f4 f5 > f15 Is this already possible when source and target are on the same ZFS filesystem? Am looking into the ZFS source code to understand if there are sufficient (private) interfaces to make a simple "zcat -o f15 f1 f2 f3 f4 f5" userland application in C code. Does anybody have advice on this? TIA Per -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Separate Zil on HDD ?
On Wed, Dec 02, 2009 at 03:57:47AM -0800, Brian McKerr wrote: > I previously had a linux NFS server that I had mounted 'ASYNC' and, as one > would expect, NFS performance was pretty good getting close to 900gb/s. Now > that I have moved to opensolaris, NFS performance is not very good, I'm > guessing mainly due to the 'SYNC' nature of NFS. I've seen various threads > and most point at 2 options; > > 1. Disable the ZIL > 2. Add independent log device/s We have experienced the same performance penalty using NFS over ZFS. The issue is indeed caused by the synchronous nature of NFS over ZFS. More precisely, it is caused by the fact that ZFS promises correct behaviour while e.g. a linux NFS server (using async) does not. The issue is described in great detail at http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine If you want the same behaviour as you had with your Linux NFS server, you can disable the ZIL. Doing so should give the same guarantees as the linux NFS service. The big issue with disabling the ZIL is that it is system-wide. Although it could be an acceptable tradeoff for one filesystem, it is not necessarily a good system-wide setting. That is why I think the option to disable the ZIL should be per-filesystem (which I think should be possible because a ZIL is actually kept per-filesystem). As for adding HDDs as ZIL devices, I'd advise against it. We have tried this and the performance decreased. Using SSDs as the ZIL is probably the way to go. A final option is to accept the situation as it is, arguing that you have traded performance for increased reliability. Regards, Auke -- Auke Folkerts University of Amsterdam ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
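For completeness, the system-wide switch being referred to on builds of that era was the zil_disable tunable. A sketch of the usual way it was set, offered only as an illustration of why the setting is global: it turns off synchronous semantics for every dataset on the host and is generally discouraged:

# echo "set zfs:zil_disable = 1" >> /etc/system   # takes effect at the next boot
# reboot

The tunable was reportedly only consulted when a dataset is mounted, so even a live change (e.g. via mdb -kw) required a remount or reboot before it had any effect.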