Re: [zfs-discuss] ZFS compression on Clearcase

2010-02-05 Thread Darren J Moffat

On 4 Feb 2010, at 16:35, Bob Friesenhahn wrote:


 On Thu, 4 Feb 2010, Darren J Moffat wrote:

 Thanks - IBM basically haven't tested ClearCase with ZFS compression; 
therefore, they don't currently support it. That may change in the future, but as 
things stand my customer cannot use compression. I have asked IBM for roadmap info 
to find out whether/when it will be supported.


 That is FUD generation in my opinion and being overly cautious.  The whole 
point of the POSIX interfaces to a filesystem is that applications don't actually 
care how the filesystem stores their data.


 Clearcase itself implements a versioning filesystem so perhaps it is not 
being overly cautious.  Compression could change aspects such as how free space is 
reported.

I'd also like to echo Bob's observations here. Darren's FUD is

 based on limited experience of ClearCase, I expect ...

I do know how ClearCase works and it works *above* the POSIX layer in 
ZFS - at the VFS layer (and higher).  [I've debugged Solaris crash dumps 
with the ClearCase kernel modules loaded in them in the past.]


By FUD I don't mean it is wrong, but without information about a bug or 
observed undesirable behaviour it is coming across as Fear that there 
could be problems.  Basically we need more data.


What I was pointing out is that, because of the layer at which ClearCase 
works, there should be no problems - I'm not saying there aren't any, just 
that I don't see where they would be.


If there are problems with ZFS then bugs should be logged; leaving 
statements like "ISV x doesn't support using feature f of ZFS" standing is 
harmful to the ISV's product and to ZFS when there is no bug logged or data 
about why there is a problem.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unionfs help

2010-02-05 Thread Joerg Schilling
Nicolas Williams nicolas.willi...@sun.com wrote:

 There's no unionfs for Solaris.

 (For those of you who don't know, unionfs is a BSDism and is a
 pseudo-filesystem which presents the union of two underlying
 filesystems, but with all changes being made only to one of the two
 filesystems.  The idea is that one of the underlying filesystems cannot
 be modified through the union, with all changes made through the union
 going to the other one.)

...and it seems that the ideas for this FS were taken from TFS 
(the Translucent File System), which appeared in SunOS as early as 1986.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS send/recv checksum transmission

2010-02-05 Thread grarpamp
Are the sha256/fletcher[x]/etc checksums sent to the receiver along
with the other data/metadata? And checked upon receipt of course.
Do they chain all the way back to the uberblock or to some calculated
transfer specific checksum value?
The idea is to carry through the integrity checks wherever possible.
Whether done as close as within the same zpool, or miles away.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to get a list of changed files between two snapshots?

2010-02-05 Thread Jesus Cea
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 02/03/2010 04:35 PM, Andrey Kuzmin wrote:
 At zfs_send level there are no files, just DMU objects (modified in
 some txg which is the basis for changed/unchanged decision).

It would be awesome if zfs send had an option to show the files
changed (with offsets), and mode/directory changes (showing the before and 
after data).

As is, zfs send is nice but you need ZFS on both sides. I would
love an rsync-like tool that could avoid scanning 20 million files
just to find a couple of small changes (or none at all).

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
El amor es poner tu felicidad en la felicidad de otro - Leibniz
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQCVAwUBS2wc2Zlgi5GaxT1NAQKtRgP/dVBF8xfGPRRcq5tpKBQTW7C1aCiHzMhV
0Sxu2lWY7Fcl7+se5O2YINYYVFWF7dA+Rh0yr2dAQDNTbe0CfwRxt3BKjS+nsjvH
GFW7cBOD+Zg7tt3nrVaYf7fg86ZssR9rTDj56fRycdA2rzfpnIgjP0bYoZczo6Lx
9DdiopUHaec=
=RkVb
-END PGP SIGNATURE-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to get a list of changed files between two snapshots?

2010-02-05 Thread Jesus Cea
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 02/04/2010 05:10 AM, Matthew Ahrens wrote:
 This is RFE 6425091, "want 'zfs diff' to list files that have changed
 between snapshots", which covers both file & directory changes, and file
 removal/creation/renaming.  We actually have a prototype of zfs diff.
 Hopefully someday we will finish it up...

Can't wait! :-))

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
El amor es poner tu felicidad en la felicidad de otro - Leibniz
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQCVAwUBS2we+5lgi5GaxT1NAQJbzQP9FuwJAFNP+7m+kIHG0Tx4ksDUwrD8g+UD
8dYSjsymNANml1St39vlLUyG9czz2jt/9HR+fw6ERc4lJI+omlZx9eUMy6f3nVyP
GcPpReVE5yMoDUZuhWJwu2fJLvcxzQl6yTSN/J+CVKGeIAJeR6TDWV6Z7UbxmgRA
Oc/qN9f70hg=
=H9sA
-END PGP SIGNATURE-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Keeping resilverscrubbing time persistently

2010-02-05 Thread Jesus Cea
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

When a scrub/resilver finishes, you see the date and time in zpool
status. But this information doesn't persist across reboots.

It would be nice to be able to see the date and the time it took to scrub the
pool, even if you reboot your machine :).

PS: I am talking about Solaris 10 U8.

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
El amor es poner tu felicidad en la felicidad de otro - Leibniz
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQCVAwUBS2wgAZlgi5GaxT1NAQIFYQQAiLuQilN1BiqxlQv9P/94fIy2BUg+YnSx
Liknb7kaM7YOayZUsTm7a8whG+wfQ5yNIjLAXQ0/pMbVNPZHP5eYKGt42USPIyIV
t8no7s33cAlqTIW/JcZ2JqLEkTQ4EJ5vFigFWnEcV7CzQo8b4xiUK3jaV2FfN1zb
QE1IKlYu52Q=
=fItU
-END PGP SIGNATURE-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool disk replacing fails

2010-02-05 Thread Mark J Musante

On Fri, 5 Feb 2010, Alexander M. Stetsenko wrote:



  NAME        STATE     READ WRITE CKSUM
  mypool      DEGRADED     0     0     0
    mirror    DEGRADED     0     0     0
      c1t4d0  DEGRADED     0     0    28  too many errors
      c1t5d0  ONLINE       0     0     0


I think your best bet is to do 'zpool detach mypool c1t4d0' followed by a 
'zpool attach mypool c1t5d0 c1t4d0'.
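
In other words, something along these lines (a sketch only -- double-check the
device names against your own 'zpool status' output before running anything):

  zpool detach mypool c1t4d0           # drop the degraded half of the mirror
  zpool attach mypool c1t5d0 c1t4d0    # re-attach it; ZFS resilvers it from c1t5d0
  zpool status mypool                  # watch the resilver progress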



Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Keeping resilverscrubbing time persistently

2010-02-05 Thread Peter Schow
On Fri, Feb 05, 2010 at 02:41:35PM +0100, Jesus Cea wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 When a scrub/resilver finishes, you see the date and time in zpool
 status. But this information doesn't persist across reboots.
 
 Would be nice being able to see the date and time it took to scrub the
 pool, even if you reboot your machine :).
 
 PS: I am talking about Solaris 10 U8.

This is likely covered by this RFE:

   6878281  zpool should store the time of last scrub/resilver and 
other zpool status info in pool properties

   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6878281
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Marty Scholes
 Was my raidz2 performance comment above correct?  That the write speed
 is that of the slowest disk?  That is what I believe I have read.

 You are sort-of-correct that it's the write speed of the slowest disk.

My experience is not in line with that statement.  RAIDZ will write a complete 
stripe plus parity (RAIDZ2 - two parities, etc.).  The write speed of the 
entire stripe will be brought down to that of the slowest disk, but only for 
its portion of the stripe.  In the case of a 5 spindle RAIDZ2, 1/3 of the 
stripe will be written to each of three disks and parity info on the other two 
disks.  The throughput would be 3x the slowest disk for read or write.

 Mirrored drives will be faster, especially for
 random I/O. But you sacrifice storage for that
 performance boost.

Is that really true?  Even after glancing at the code, I don't know if zfs 
overlaps mirror reads across devices.  Watching my rpool mirror leads me to 
believe that it does not.  If true, then mirror reads would be no faster than a 
single disk.  Mirror writes are no faster than the slowest disk.

As a somewhat related rant, there seems to be confusion about mirror IOPS vs. 
RAIDZ[123] IOPS.  Assuming mirror reads are not overlapped, then a mirror vdev 
will read and write at roughly the same throughput and IOPS as a single disk 
(ignoring bus and cpu constraints).

Also ignoring bus and cpu constraints, a RAIDZ[123] vdev will read and write at 
roughly the same throughput of a single disk, multiplied by the number of data 
drives: three in the config being discussed.  Also, a RAIDZ[123] vdev will have 
IOPS performance similar to that of a single disk.

A stack of mirror vdevs will, of course, perform much better than a single 
mirror vdev in terms of throughput and IOPS.

A stack of RAIDZ[123] vdevs will also perform much better than a single 
RAIDZ[123] vdev in terms of throughput and IOPS.

RAIDZ tends to have more CPU overhead and provides more flexibility in choosing 
the optimal data-to-redundancy ratio.

Many read IOPS problems can be mitigated by L2ARC, even a set of small, fast 
disk drives.  Many write IOPS problems can be mitigated by a dedicated ZIL 
device (slog).

My anecdotal conclusions backed by zero science,
Marty
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Robert Milkowski

On 05/02/2010 04:11, Edward Ned Harvey wrote:

Data in raidz2 is striped so that it is split across multiple disks.
 

Partial truth.
Yes, the data is on more than one disk, but it's a parity hash, requiring
computation overhead and a write operation on each and every disk.  It's not
simply striped.  Whenever you read or write, you need to access all the
disks (or a bunch of 'em) and use compute cycles to generate the actual data
stream.  I don't know enough about the underlying methods of calculating and
distributing everything to say intelligently *why*, but I know this:

   


Well, that's not entirely true. When reading from raidz2 (non-degraded) 
you don't need to re-compute any hashes except for a standard fs block 
checksum which zfs checks regardless of underlying redundancy.




In this (sequential) sense it is faster than a single disk.
 

Whenever I benchmark raid5 versus a mirror, the mirror is always faster.
Noticeably and measurably faster, as in 50% to 4x faster.  (50% for a single
disk mirror versus a 6-disk raid5, and 4x faster for a stripe of mirrors, 6
disks with the capacity of 3, versus a 6-disk raid5.)  Granted, I'm talking
about raid5 and not raidz.  There is possibly a difference there, but I
don't think so.

   

Actually, there is.
One difference is that when writing to a raid-z{1|2} pool, compared to a 
raid-10 pool, you should get better throughput if at least 4 drives are 
used. Basically this is because in RAID-10 the maximum write throughput you 
can get is the aggregate throughput of half the number of disks used, and 
only assuming there are no other bottlenecks between the OS and the disks, 
especially as you have to take into account that mirroring doubles the 
bandwidth requirements. In the case of RAID-Zn you have some extra overhead 
for writing the parity, but other than that you should get a write 
throughput closer to T-N (where T is the total number of disks and N is the 
RAID-Z level) instead of T/2 in RAID-10.


See 
http://milek.blogspot.com/2006/04/software-raid-5-faster-tha_114588672235104990.html



--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Recover ZFS Array after OS Crash?

2010-02-05 Thread J
Hi all,

I'm building a whole new server system for my employer, and I really want to 
use OpenSolaris as the OS for the new file server.  One thing is keeping me 
back, though: is it possible to recover a ZFS RAID array after the OS crashes?  
I've spent hours with Google to no avail

To be more descriptive, I plan to have a Raid 1 array for the OS, and then I 
will need 3 additional Raid5/RaidZ/etc arrays for data archiving, backups and 
other purposes.  There is plenty of documentation on how to recover an array if 
one of the drives in the array fails, but what if the OS crashes?  Since ZFS is 
a software-based RAID, if the OS crashes is it even possible to recover any of 
the arrays?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Bob Friesenhahn

On Fri, 5 Feb 2010, Rob Logan wrote:


well, let's look at Intel's offerings... RAM is faster than AMD's
at 1333MHz DDR3, and one gets ECC and a thermal sensor for $10 over non-ECC


Intel's RAM is faster because it needs to be.  It is wise to see the 
role that architecture plays in total performance.



Now, this gets one to 8G ECC easily... AMD's unfair advantage is all those
RAM slots on their multi-die motherboards... A slow AMD CPU with 64G of RAM
might be better depending on your working set / dedup requirements.


With the AMD CPU, the memory will run cooler and be cheaper. 
Regardless, for zfs, memory is more important than raw CPU 
performance.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover ZFS Array after OS Crash?

2010-02-05 Thread A Darren Dunham
On Fri, Feb 05, 2010 at 08:35:15AM -0800, J wrote:
 To be more descriptive, I plan to have a Raid 1 array for the OS, and
 then I will need 3 additional Raid5/RaidZ/etc arrays for data
 archiving, backups and other purposes.  There is plenty of
 documentation on how to recover an array if one of the drives in the
 array fails, but what if the OS crashes?  Since ZFS is a
 software-based RAID, if the OS crashes is it even possible to recover
 any of the arrays?

Sure, because the ZFS configuration is stored within the pool, not in
the OS.

Just install a new OS, attach the disks, and do a 'zpool import' to find
the importable pools.
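
Roughly like this (a sketch; 'tank' is just an example pool name):

  zpool import             # scan the attached disks and list importable pools
  zpool import tank        # import the pool found above, by name
  zpool import -f tank     # only if it complains the pool was last used by the old OS
  zpool status tank        # verify the pool and all its vdevs are healthy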

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Autoreplace property not accounted ?

2010-02-05 Thread Francois

Hi list,

I'm seeing strange behaviour with the autoreplace property. It is set to off by 
default, which is fine - I want to manage disk replacement manually, so the 
default of off matches my need.


# zpool get autoreplace mypool
NAME    PROPERTY     VALUE    SOURCE
mypool  autoreplace  off      default

Then I added 2 spare disks.

spares
  c1t18d0  AVAIL
  c1t19d0  AVAIL

Ok, fine.

Then I had failures with one disk of the pool, and I can see the following in 
the logs:



DESC: The number of I/O errors associated with a ZFS device exceeded
acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more 
information.
AUTO-RESPONSE: The device has been offlined and marked as faulted.  An 
attempt will be made to activate a hot spare if available.

---

This is where my problem occurs: ZFS automatically replaced the faulted 
disk with a spare, even with autoreplace=off!


# zpool status
  pool: mypool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
 scrub: resilver completed after 0h0m with 0 errors on Thu Feb  4 
00:10:25 2010

config:

NAME   STATE READ WRITE CKSUM
mypool DEGRADED 0 0 0
  mirror   ONLINE   0 0 0
c0t2d0 ONLINE   0 0 0
c0t3d0 ONLINE   0 0 0
c0t4d0 ONLINE   0 0 0
c0t5d0 ONLINE   0 0 0
  mirror   DEGRADED 0 0 0
c0t6d0 ONLINE   0 0 0
c0t7d0 ONLINE   0 0 0
spare  DEGRADED 4 0 0
  c1t8d0   FAULTED  326 0  too many errors
  c1t18d0  ONLINE   0 0 4  56K resilvered
c1t9d0 ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c1t10d0ONLINE   0 0 0
c1t11d0ONLINE   0 0 0
c1t12d0ONLINE   0 0 0
c1t13d0ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c1t14d0ONLINE   0 0 0
c1t15d0ONLINE   0 0 0
c1t16d0ONLINE   0 0 0
c1t17d0ONLINE   0 0 0
cache
  c2d0 ONLINE   0 0 0
  c3d0 ONLINE   0 0 0
spares
  c1t18d0  INUSE currently in use
  c1t19d0  AVAIL

errors: No known data errors


Any idea why this was done automatically?

solaris 10U8 Generic_141445-09 - zpool version 15 - zfs version 4


Thx for your answers.

--
Francois
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Rob Logan


 if zfs overlaps mirror reads across devices.

it does... I have one very old disk in this mirror, and
when I attach another element one can see more reads going
to the faster disks... this paste isn't from right after the attach
but from since the reboot, but one can still see that the reads are
load-balanced depending on the response of the elements
in the vdev.

13 % zpool iostat -v
                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
rpool         7.01G   142G      0      0  1.60K  1.44K
  mirror      7.01G   142G      0      0  1.60K  1.44K
    c9t1d0s0      -      -      0      0    674  1.46K
    c9t2d0s0      -      -      0      0    687  1.46K
    c9t3d0s0      -      -      0      0    720  1.46K
    c9t4d0s0      -      -      0      0    750  1.46K


but I also support your conclusions.

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Autoreplace property not accounted ?

2010-02-05 Thread Cindy Swearingen

Hi Francois,

The autoreplace property works independently of the spare
feature.

Spares are activated automatically when a device in the main
pool fails.

Thanks,

Cindy

On 02/05/10 09:43, Francois wrote:

Hi list,

I've a strange behaviour with autoreplace property. It is set to off by 
default, ok. I want to manually manage disk replacement so default off 
matches my need.


# zpool get autoreplace mypool
NAME   PROPERTY VALUESOURCE
mypool  autoreplace  off  default

Then I added 2 spare disks.

spares
  c1t18d0  AVAIL
  c1t19d0  AVAIL

Ok, fine.

Then I had failures with 1 disk of the pool and can see in logs the 
following :



DESC: The number of I/O errors associated with a ZFS device exceeded
acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more 
information.
AUTO-RESPONSE: The device has been offlined and marked as faulted.  An 
attempt will be made to activate a hot spare if available.

---

This is where my problem occurs , zfs automatically replaced faulted 
disk by a spare ! even with autoreplace=off


# zpool status
  pool: mypool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
 scrub: resilver completed after 0h0m with 0 errors on Thu Feb  4 
00:10:25 2010

config:

NAME   STATE READ WRITE CKSUM
mypool DEGRADED 0 0 0
  mirror   ONLINE   0 0 0
c0t2d0 ONLINE   0 0 0
c0t3d0 ONLINE   0 0 0
c0t4d0 ONLINE   0 0 0
c0t5d0 ONLINE   0 0 0
  mirror   DEGRADED 0 0 0
c0t6d0 ONLINE   0 0 0
c0t7d0 ONLINE   0 0 0
spare  DEGRADED 4 0 0
  c1t8d0   FAULTED  326 0  too many errors
  c1t18d0  ONLINE   0 0 4  56K resilvered
c1t9d0 ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c1t10d0ONLINE   0 0 0
c1t11d0ONLINE   0 0 0
c1t12d0ONLINE   0 0 0
c1t13d0ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c1t14d0ONLINE   0 0 0
c1t15d0ONLINE   0 0 0
c1t16d0ONLINE   0 0 0
c1t17d0ONLINE   0 0 0
cache
  c2d0 ONLINE   0 0 0
  c3d0 ONLINE   0 0 0
spares
  c1t18d0  INUSE currently in use
  c1t19d0  AVAIL

errors: No known data errors


Any idea why it has been done automatically ?

solaris 10U8 Generic_141445-09 - zpool version 15 - zfs version 4


Thx for your answers.

--
Francois
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Autoreplace property not accounted ?

2010-02-05 Thread Tim Cook
On Fri, Feb 5, 2010 at 12:11 PM, Cindy Swearingen
cindy.swearin...@sun.comwrote:

 Hi Francois,

 The autoreplace property works independently of the spare
 feature.

 Spares are activated automatically when a device in the main
 pool fails.

 Thanks,

 Cindy


 On 02/05/10 09:43, Francois wrote:

 Hi list,

 I've a strange behaviour with autoreplace property. It is set to off by
 default, ok. I want to manually manage disk replacement so default off
 matches my need.

 # zpool get autoreplace mypool
 NAME   PROPERTY VALUESOURCE
 mypool  autoreplace  off  default

 Then I added 2 spare disks.

spares
  c1t18d0  AVAIL
  c1t19d0  AVAIL

 Ok, fine.

 Then I had failures with 1 disk of the pool and can see in logs the
 following :

 
 DESC: The number of I/O errors associated with a ZFS device exceeded
 acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more
 information.
 AUTO-RESPONSE: The device has been offlined and marked as faulted.  An
 attempt will be made to activate a hot spare if available.
 ---

 This is where my problem occurs , zfs automatically replaced faulted disk
 by a spare ! even with autoreplace=off

 # zpool status
  pool: mypool
  state: DEGRADED
 status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
 action: Replace the faulted device, or use 'zpool clear' to mark the
 device
repaired.
  scrub: resilver completed after 0h0m with 0 errors on Thu Feb  4 00:10:25
 2010
 config:

NAME   STATE READ WRITE CKSUM
mypool DEGRADED 0 0 0
  mirror   ONLINE   0 0 0
c0t2d0 ONLINE   0 0 0
c0t3d0 ONLINE   0 0 0
c0t4d0 ONLINE   0 0 0
c0t5d0 ONLINE   0 0 0
  mirror   DEGRADED 0 0 0
c0t6d0 ONLINE   0 0 0
c0t7d0 ONLINE   0 0 0
spare  DEGRADED 4 0 0
  c1t8d0   FAULTED  326 0  too many errors
  c1t18d0  ONLINE   0 0 4  56K resilvered
c1t9d0 ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c1t10d0ONLINE   0 0 0
c1t11d0ONLINE   0 0 0
c1t12d0ONLINE   0 0 0
c1t13d0ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c1t14d0ONLINE   0 0 0
c1t15d0ONLINE   0 0 0
c1t16d0ONLINE   0 0 0
c1t17d0ONLINE   0 0 0
cache
  c2d0 ONLINE   0 0 0
  c3d0 ONLINE   0 0 0
spares
  c1t18d0  INUSE currently in use
  c1t19d0  AVAIL

 errors: No known data errors


 Any idea why it has been done automatically ?

 solaris 10U8 Generic_141445-09 - zpool version 15 - zfs version 4


 Thx for your answers.

 --
 Francois
 ___




I think it might be helpful to explain exactly what that means.  I'll give
it a shot; feel free to correct my mistake(s).  Francois: when you have
autoreplace on, what it means is that if you remove the bad drive and put a new
one into the same slot, it will automatically be brought into the pool in place
of the old one (no manual 'zpool replace' needed).  It has nothing to do with
hot-spare activation.  To do what you're trying to do, you shouldn't add drives
as hot spares at all.  If you want a cold spare, put it in the system and just
leave it unassigned.
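
A sketch of both halves of that, using the device names from your status output
(double-check them on your own system first):

  zpool remove mypool c1t19d0            # take the disk out of the hot-spare list
  zpool status mypool                    # it no longer appears under 'spares'

  # ...later, when a disk actually fails, replace it by hand with the cold spare:
  zpool replace mypool c1t8d0 c1t19d0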

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover ZFS Array after OS Crash?

2010-02-05 Thread J
Ah, I see!
Simple, easy, and saves me hundreds on HW-based RAID controllers ^_^

Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance

2010-02-05 Thread Miles Nordin
 pr == Peter Radig pe...@radig.de writes:
 ls == Lutz Schumann presa...@storageconcepts.de writes:

pr I was expecting a good performance from the X25-E, but was
pr really suprised that it is that good (only 1.7 times slower
pr than it takes with ZIL completely disabled). So I will use the
pr X25-E as ZIL device on my box and will not consider disabling
pr ZIL at all to improve NFS performance.

According to Lutz's posting here ~2010-01-10, the X25-M may not actually
be functioning as a ZIL unless you disable its write cache with
'hdadm'.  He said he found that normal hard drives respect cache-flush
commands in the stream, but the Intel X25-M does not.  However, both do
respect disabling the write cache.

ls r...@nexenta:/volumes# hdadm write_cache off c3t5

ls  c3t5 write_cache disabled

You might want to repeat his test with X25-E.  If the X25-E is also
dropping cache flush commands (it might!), you may be, compared to
disabling the ZIL, slowing down your pool for no reason, and making it
more fragile as well since an exported pool with a dead ZIL cannot be
imported.


pgpdmhmYd4Yxq.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance

2010-02-05 Thread Bob Friesenhahn

On Fri, 5 Feb 2010, Miles Nordin wrote:


   ls r...@nexenta:/volumes# hdadm write_cache off c3t5

   ls  c3t5 write_cache disabled

You might want to repeat his test with X25-E.  If the X25-E is also
dropping cache flush commands (it might!), you may be, compared to
disabling the ZIL, slowing down your pool for no reason, and making it
more fragile as well since an exported pool with a dead ZIL cannot be
imported.


Others have tested the X25-E and found that with its cache enabled, it 
does drop flushed writes, but it is clearly not such a gaping chasm as 
the X25-M.  Some time has passed, so there is the possibility that the 
X25-E firmware has improved (or will).  If Sun offers an X25-E based 
device for use as an slog, you can be sure that it has been qualified 
for this purpose, and it may contain modified firmware.


The 'E' stands for Extreme and not Enterprise as some tend to 
believe.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance

2010-02-05 Thread Andrey Kuzmin
On Fri, Feb 5, 2010 at 10:55 PM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
 On Fri, 5 Feb 2010, Miles Nordin wrote:

   ls r...@nexenta:/volumes# hdadm write_cache off c3t5

   ls  c3t5 write_cache disabled

 You might want to repeat his test with X25-E.  If the X25-E is also
 dropping cache flush commands (it might!), you may be, compared to
 disabling the ZIL, slowing down your pool for no reason, and making it
 more fragile as well since an exported pool with a dead ZIL cannot be
 imported.

 Others have tested the X25-E and found that with its cache enabled, it does
 drop flushed writes, but is clearly not such a gaping chasm as the X25-M.
  Some time has passed so there is the possibility that X25-E firmware has
 (or will) improve.  If Sun offers an X25-E based device for use as an slog,
 you can be sure that its has been qualified for this purpose, and may
 contain modified firmware.

 The 'E' stands for Extreme and not Enterprise as some tend to believe.

Exactly. It would therefore be very interesting to hear about performance
from anyone using a (real) enterprise SSD (which these days spells STEC) as
an slog.

Regards,
Andrey


 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Miles Nordin
 b == Brian  broco...@vt.edu writes:

 b (4) Hold backups from windows machines, mac (time machine),
 b linux.

for time machine you will probably find yourself using COMSTAR and the
GlobalSAN iSCSI initiator because Time Machine does not seem willing
to work over NFS.  Otherwise, for Macs you should definitely use NFS,
and you should definitely use the automounter, and you should use it
with the 'net' option (let Mac OS pick where to mount the fs) if you
have hierarchical mounts.

Anyway for time machine you cannot use NFS.  I'm using:

 * snv_130
 * globalSAN_4.0.0.197_BETA-20091110
 * Mac OS X 10.5.latest

and it seems to basically work for the last ~1month.  I've no reason
to believe these versions are special but suggest you get the BETA
globalsan and not the stable one.
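
The COMSTAR side of that setup is roughly as follows (a sketch, assuming the
stmf and iscsi/target services are already enabled; the zvol name and size are
just examples, and the GUID is a placeholder for whatever sbdadm prints):

  zfs create -V 200G tank/tmbackup                # backing zvol for Time Machine
  sbdadm create-lu /dev/zvol/rdsk/tank/tmbackup   # create a logical unit; note the GUID it prints
  stmfadm add-view 600144f0XXXXXXXXXXXXXXXXXXXX   # expose that LU (substitute the real GUID)
  itadm create-target                             # create an iSCSI target for globalSAN to log in to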

for linux, if you mount Linux NFS filesystems from Solaris you need to
use '-o sec=sys' to avoid everything showing up as guest, due to a
weird corner case that I think eventually got fixed on one side or the
other but probably hasn't percolated through all the stable branches
yet.

If you mount Solaris NFS filesystems from Linux, you may want to use
'-o noacl' because Solaris NFS fabricates ACL's and feeds them to
Linux even when you haven't made any, leading to annoying '+' signs in
'ls -l' and sometimes weird, unnecessary permissions problems.  This
happens even with NFSv3. :( What's even stupider, busybox 'mount'
doesn't seem to support the noacl flag which cost me an extra couple
hours getting an NFS-rooted system to boot.  I like the idea of
smoothly transitioning to a more advanced permissions system, but IMHO
the whole mess just goes to show you, let people who've been mucking
about with Windows touch anything else in your codebase, and their
brains are so warped by the influence of that platform on their
thinking they make a ponderous mess of it and then chant ``this
shouldn't be happening'' over and over.
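
Concretely, the two mount flavours look something like this (a sketch; the
hostnames and paths are made up):

  # on the Solaris side, mounting a Linux NFS export:
  mount -F nfs -o sec=sys linuxbox:/export/data /mnt/data

  # on the Linux side, mounting a Solaris NFS export:
  mount -t nfs -o noacl solbox:/export/home /mnt/home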

 b (5) Be an iSCSI target for several different Virtual Boxes.

I've been using plain statically-allocated (not dynamic) .VDIs on ZFS
filesystems.  I've not been using zvols nor any iSCSI yet.  If you do
the latter two, I suggest comparing performance with the former
one---there are rumors that some cache-flush knobs may need tuning.

Also, in general, when you yank the cord the integrity of a physical
machine's filesystems is guaranteed, but the same is *not* true of a
virtual machine when its host's cord is yanked.  It's supposed to be
true when you force-virtual-powerdown the guest, but not when you yank
the host's cord, because the same knobs were twisted to compromise
integrity for performance.  The compromise is probably the right one
provided you can work around it, by for example snapshotting the guest
so you can roll back if there's corruption, and keeping oft-changing
files that can't be rolled back outside the guest, using either guest
services' shared folders on Windows or NFS on Unix.

 b Function 4 will use compression and deduplication.  Function 5
 b will use deduplication.

I've not dared to use dedup yet.  In particular the DDT needs to fit
in RAM (or maybe L2ARC) to avoid performance degradations so severe
you may find yourself painted into a corner (e.g., 'zfs destroy' runs for
a week, forcing you to give up, 'zfs send' the non-deduped filesystems
elsewhere, destroy the pool, and restore from backup).  I'm not sure a
dedicated DDT vdev is the best idea, but that's discussed here:

 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913566

What's missing to my view is a way to manage it: if overgrown DDT can,
in effect, trash the pool by making maintenance commands take forever,
then there's got to be a way to watch the size of the DDT, maybe even
cap it and disable dedup if it overgrows.  That said I haven't tried
it so I'm talking out my ass.

Also gzip compression does not sound like it works well---suggest lzjb
instead---but this might be fixed in 6586537, 6806882, or by this fix
which sounds like a fairly big deal:

 http://arc.opensolaris.org/caselog/PSARC/2009/615/mail

so I would say gzip may be worth another try now but definitely be
ready to fall back to lzjb and convert with zfs send | zfs recv.
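
The fall-back conversion is straightforward -- something like this (a sketch;
the dataset names are only examples):

  zfs set compression=lzjb tank                        # datasets created under tank now inherit lzjb
  zfs snapshot tank/data@convert
  zfs send tank/data@convert | zfs recv tank/data-lzjb # rewrites every block into a fresh, lzjb dataset
  zfs rename tank/data tank/data-old                   # once happy, swap the datasets
  zfs rename tank/data-lzjb tank/data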

anyway...seems many things are really improving drastically since a
year ago, and thank god for the list!


pgpCAgC0MCdCz.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover ZFS Array after OS Crash?

2010-02-05 Thread Toby Thain


On 5-Feb-10, at 11:35 AM, J wrote:


Hi all,

I'm building a whole new server system for my employer, and I  
really want to use OpenSolaris as the OS for the new file server.   
One thing is keeping me back, though: is it possible to recover a  
ZFS Raid Array after the OS crashes?  I've spent hours with Google  
to avail


To be more descriptive, I plan to have a Raid 1 array for the OS,  
and then I will need 3 additional Raid5/RaidZ/etc arrays for data  
archiving, backups and other purposes.  There is plenty of  
documentation on how to recover an array if one of the drives in  
the array fails, but what if the OS crashes?  Since ZFS is a  
software-based RAID, if the OS crashes is it even possible to  
recover any of the arrays?



Being a software system it is inherently more recoverable than  
hardware RAID (the latter is probably only going to be readable on  
exactly the same configuration, and if the constellations are aligned  
just right, and the black rooster has crowed four times, etc).


As Darren says, you can simply take either or both sides of the  
mirror and boot or access the pool on another ZFS-capable system.


It doesn't even have to use the same interfaces; last week I built a  
new Solaris 10 web server and migrated pool data from one half of a  
ZFS pool from the old server, connected by USB/SATA adapter. This  
kind of flexibility (not to mention data integrity) just isn't there  
with HW RAID.


--Toby


--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance

2010-02-05 Thread Ray Van Dolson
On Fri, Feb 05, 2010 at 11:55:12AM -0800, Bob Friesenhahn wrote:
 On Fri, 5 Feb 2010, Miles Nordin wrote:
 
 ls r...@nexenta:/volumes# hdadm write_cache off c3t5
 
 ls  c3t5 write_cache disabled
 
  You might want to repeat his test with X25-E.  If the X25-E is also
  dropping cache flush commands (it might!), you may be, compared to
  disabling the ZIL, slowing down your pool for no reason, and making it
  more fragile as well since an exported pool with a dead ZIL cannot be
  imported.
 
 Others have tested the X25-E and found that with its cache enabled, it 
 does drop flushed writes, but is clearly not such a gaping chasm as 
 the X25-M.  Some time has passed so there is the possibility that 
 X25-E firmware has (or will) improve.  If Sun offers an X25-E based 
 device for use as an slog, you can be sure that its has been qualified 
 for this purpose, and may contain modified firmware.
 
 The 'E' stands for Extreme and not Enterprise as some tend to 
 believe.
 

I missed out on this thread.  How would these dropped flushed writes
manifest themselves?  Something in the logs, or just worsened
performance?

Ray
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Identifying firmware version of SATA controller (LSI)

2010-02-05 Thread Ray Van Dolson
Trying to track down why our two Intel X-25E's are spewing out
Write/Retryable errors when being used as a ZIL (mirrored).  The
system is running a LSI1068E controller with LSISASx36 expander
(box built by Silicon Mechanics).

The drives are fairly new, and it seems odd that both of the pair would
start showing errors at the same time

I'm trying to figure out where I can find the firmware version of the LSI
controller... are the bootup messages the only place I could expect to
see this?  prtconf and prtdiag both don't appear to give firmware
information.

We have another nearly identical box that isn't showing these errors
which is why I want to compare firmware versions... the boot logs on
the good server have been rotated out so I can't find a Firmware
number for the mpt0 device in its logs to compare with.

Solaris 10 U8 x86.

Thanks,
Ray
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Brandon High
On Fri, Feb 5, 2010 at 12:20 PM, Miles Nordin car...@ivy.net wrote:
 for time machine you will probably find yourself using COMSTAR and the
 GlobalSAN iSCSI initiator because Time Machine does not seem willing
 to work over NFS.  Otherwise, for Macs you should definitely use NFS,

Slightly off-topic ...

You can make Time Machine work with CIFS or NFS mounts by setting a
system preference.

The command is:
defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1

I've had some success trying to get my father-in-law's system to back
up to a drobo with this. It was working last time I was by his house,
but I'm not sure if it's still working.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread c.hanover
Two things, mostly related, that I'm trying to find answers to for our security 
team.

Does this scenario make sense:
* Create a filesystem at /users/nfsshare1, user uses it for a while, asks for 
the filesystem to be deleted
* New user asks for a filesystem and is given /users/nfsshare2.  What are the 
chances that they could use some tool or other to read unallocated blocks to 
view the previous user's data?

Related to that, when files are deleted on a ZFS volume over an NFS share, how 
are they wiped out?  Are they zeroed or anything?  Same question for destroying 
ZFS filesystems: does the data lie around in any way?  (That's largely answered 
by the first scenario.)

If the data is retrievable in any way, is there a way to a) securely destroy a 
filesystem, or b) securely erase empty space on a filesystem?

I know in some sense those questions don't apply in the way they would to, say, 
ext3, since a filesystem doesn't have a block until a file is written.

Sorry if these questions aren't worded well.  I've been in meetings for the 
last couple hours.

-
Cameron Hanover
chano...@umich.edu

Chaos was the law of nature.  Order was the dream of man.
--Henry Brooks Adams





smime.p7s
Description: S/MIME cryptographic signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Identifying firmware version of SATA controller (LSI)

2010-02-05 Thread Marion Hakanson
rvandol...@esri.com said:
 I'm trying to figure out where I can find the firmware on the LSI
 controller... are the bootup messages the only place I could expect to see
 this?  prtconf and prtdiag both don't appear to give firmware information. 
 . . .
 Solaris 10 U8 x86.

The raidctl command is your friend; it is also useful for updating firmware
if you choose to do so.  You can also find the revisions in
the output of 'prtconf -Dv'; search for 'firm' in the long list.
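
For example (a sketch; the controller number will differ on your box):

  raidctl -l                   # list the RAID controllers the system sees
  raidctl -l 1                 # per-controller detail; see raidctl(1M) for firmware-related options
  prtconf -Dv | grep -i firm   # alternative: scan the device tree for firmware strings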

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread Frank Cusack

On 2/5/10 3:49 PM -0500 c.hanover wrote:

Two things, mostly related, that I'm trying to find answers to for our
security team.

Does this scenario make sense:
* Create a filesystem at /users/nfsshare1, user uses it for a while, asks
for the filesystem to be deleted * New user asks for a filesystem and is
given /users/nfsshare2.  What are the chances that they could use some
tool or other to read unallocated blocks to view the previous user's data?


Over NFS?  none.


Related to that, when files are deleted on a ZFS volume over an NFS
share, how are they wiped out?  Are they zeroed or anything.  Same
question for destroying ZFS filesystems, does the data lay about in any
way?  (That's largely answered by the first scenario.)


In both cases the data is still on disk.


If the data is retrievable in any way, is there a way to a) securely
destroy a filesystem, or b) securely erase empty space on a filesystem.


Someone else will have to answer that.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance

2010-02-05 Thread Miles Nordin
 rvd == Ray Van Dolson rvandol...@esri.com writes:
 ak == Andrey Kuzmin andrey.v.kuz...@gmail.com writes:

   rvd I missed out on this thread.  How would these dropped flushed
   rvd writes manifest themselves?  

presumably corrupted databases, lost mail, or strange NFS behavior
after the server reboots when the clients do not.  But the actual test
to which I referred is benchmark-like and didn't observe any of those
things.  If you read my post I gave you Lutz's name and the date he
posted and also linked to the msgid in my message's header, so go read
for yourself!

A good point, though, is that drives with lying write caches are still
okay if your box reboots because of a kernel panic, just not if it
loses power, so they're not worthless.

ak performance from anyone using (real) enterprise SSD (which now
ak spells STEC) as slog.

I wonder how ACARD would do also since it is 1/5th the cost, or if
Seagate Pulsar will behave correctly.  STEC coming in at more
expensive than DRAM is like a sucker-premium you pay because no one
else has their act together.  And according to the test Lutz did the
X25-M (and probably also -E?) are okay so long as you disable the
write cache, though you have to do it at every boot, and 'hdadm' is
not bundled.

It would also be nice to convince anandtech and friends to yank power
cords, too, to confirm that write flushes issued in their tests are
actually obeyed, and to redo the io/s test with write cache disabled
if the device lies, so that we actually have comparable numbers.  If
they would do that, the $ value of a supercap would become obvious.


pgp2YcU6ajqw3.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread Miles Nordin
 ch == c hanover chano...@umich.edu writes:

ch is there a way to a) securely destroy a filesystem,

AIUI zfs crypto will include this, some day, by forgetting the key.

but for SSDs, zfs above a zvol, or zfs above a SAN that may do
snapshots without your consent, I think it's just logically not a
solvable problem, period, unless you have a writeable keystore
outside the vdev structure.


pgpLXQTl8372Y.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread Nicolas Williams
On Fri, Feb 05, 2010 at 03:49:15PM -0500, c.hanover wrote:
 Two things, mostly related, that I'm trying to find answers to for our
 security team.
 
 Does this scenario make sense:
 * Create a filesystem at /users/nfsshare1, user uses it for a while,
 asks for the filesystem to be deleted
 * New user asks for a filesystem and is given /users/nfsshare2.  What
 are the chances that they could use some tool or other to read
 unallocated blocks to view the previous user's data?

If the tool isn't accessing the raw disks, then the answer is no
chance.  (There's no way to access the raw disks over NFS.)

 Related to that, when files are deleted on a ZFS volume over an NFS
 share, how are they wiped out?  Are they zeroed or anything.  Same
 question for destroying ZFS filesystems, does the data lay about in
 any way?  (That's largely answered by the first scenario.)

Deleting a file does not guarantee that data blocks are released:
snapshots might exist that retain references to the data blocks of a
file that is being deleted.  Nor are blocks wiped when released.

 If the data is retrievable in any way, is there a way to a) securely
 destroy a filesystem, or b) securely erase empty space on a
 filesystem.

When ZFS crypto ships you'll be able to securely destroy encrypted
datasets.  Until then the only form of secure erasure is to destroy the
pool and then wipe the individual disks.
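
In practice that means something like this (a sketch only; the pool and device
names are examples, and this is obviously irreversibly destructive):

  zpool destroy tank                                # tear down the pool
  dd if=/dev/zero of=/dev/rdsk/c1t2d0s0 bs=1024k    # overwrite the slice ZFS used, repeat for each disk
  # format(1M)'s analyze/purge option is another way to do the per-disk overwrite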

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread Nicolas Williams
On Fri, Feb 05, 2010 at 04:41:08PM -0500, Miles Nordin wrote:
  ch == c hanover chano...@umich.edu writes:
 
 ch is there a way to a) securely destroy a filesystem,
 
 AIUI zfs crypto will include this, some day, by forgetting the key.

Right.

 but for SSD, zfs above a zvol, or zfs above a SAN that may do
 snapshots without your consent, I think it's just logically not a
 solveable problem, period, unless you have a writeable keystore
 outside the vdev structure.

IIRC ZFS crypto will store encrypted blocks in the L2ARC and ZIL, so
forgetting the key is sufficient to obtain a high degree of security.

ZFS crypto over zvols and what not presents no additional problems.
However, if your passphrase is guessable then the key might be
recoverable even after it's forgotten.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread c.hanover
In our particular case, there won't be snapshots of destroyed filesystems (I 
create the snapshots, and destroy them with the filesystem).
I'm not too sure on the particulars of NFS/ZFS, but would it be possible to 
create a 1GB file without writing any data to it, and then use a hex editor to 
access the data stored on those blocks previously?  Any chance someone could 
make any kind of sense of the contents (allocated in the same order they were 
before, or what have you)?

ZFS crypto will be nice when we get either NFSv4 or NFSv3 w/krb5 for over the 
wire encryption.  Until then, not much point.

-
Cameron Hanover
chano...@umich.edu

Our integrity sells for so little, but it is all we really have. It is the 
very last inch of us, but within that inch, we are free.
--Valerie (V for Vendetta)

On Feb 5, 2010, at 4:36 PM, Nicolas Williams wrote:

 On Fri, Feb 05, 2010 at 03:49:15PM -0500, c.hanover wrote:
 Two things, mostly related, that I'm trying to find answers to for our
 security team.
 
 Does this scenario make sense:
 * Create a filesystem at /users/nfsshare1, user uses it for a while,
 asks for the filesystem to be deleted
 * New user asks for a filesystem and is given /users/nfsshare2.  What
 are the chances that they could use some tool or other to read
 unallocated blocks to view the previous user's data?
 
 If the tool isn't accessing the raw disks, then the answer is no
 chance.  (There's no way to access the raw disks over NFS.)
 
 Related to that, when files are deleted on a ZFS volume over an NFS
 share, how are they wiped out?  Are they zeroed or anything.  Same
 question for destroying ZFS filesystems, does the data lay about in
 any way?  (That's largely answered by the first scenario.)
 
 Deleting a file does not guarantee that data blocks are released:
 snapshots might exist that retain references to the data blocks of a
 file that is being deleted.  Nor are blocks wiped when released.
 
 If the data is retrievable in any way, is there a way to a) securely
 destroy a filesystem, or b) securely erase empty space on a
 filesystem.
 
 When ZFS crypto ships you'll be able to securely destroy encrypted
 datasets.  Until then the only form of secure erasure is to destroy the
 pool and then wipe the individual disks.
 
 Nico
 -- 
 
 



smime.p7s
Description: S/MIME cryptographic signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Hybrid storage ... thing

2010-02-05 Thread Brandon High
I saw this in /. and thought I'd point it out to this list. It appears
to act as an L2 cache for a single drive, in theory providing better
performance.

http://www.silverstonetek.com/products/p_contents.php?pno=HDDBOOSTarea

-B

-- 
Brandon High : bh...@freaks.com
Indecision is the key to flexibility.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread Frank Cusack

On 2/5/10 5:08 PM -0500 c.hanover wrote:

 would it be possible to
create a 1GB file without writing any data to it, and then use a hex
editor to access the data stored on those blocks previously?


No, not over NFS and also not locally.  You'd be creating a sparse file,
which doesn't allocate space on disk for any filesystem (not just zfs).
So when you read it back, you get back all 0s.  The only way to actually
allocate the space on disk is to write to it, and then of course you
read back the data you wrote, not what was previously there.
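
You can see this for yourself (a sketch; the path is an example, and mkfile's
-n flag creates the file without allocating its blocks):

  mkfile -n 1g /tank/fs/sparse     # a 1GB file with no blocks actually written
  od -c /tank/fs/sparse | head     # reads back nothing but zeros
  du -h /tank/fs/sparse            # shows that almost no space is allocated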
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread Nicolas Williams
On Fri, Feb 05, 2010 at 05:08:02PM -0500, c.hanover wrote:
 In our particular case, there won't be snapshots of destroyed
 filesystems (I create the snapshots, and destroy them with the
 filesystem).

OK.

 I'm not too sure on the particulars of NFS/ZFS, but would it be
 possible to create a 1GB file without writing any data to it, and then
 use a hex editor to access the data stored on those blocks previously?

Absolutely not.

That is, you can create a 1GB file without writing to it, but it will
appear to contain all zeros.

 Any chance someone could make any kind of sense of the contents
 (allocated in the same order they were before, or what have you)?

No.  See above.

 ZFS crypto will be nice when we get either NFSv4 or NFSv3 w/krb5 for
 over the wire encryption.  Until then, not much point.

You can use NFS with krb5 over the wire encryption _now_.
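
A minimal sketch of that, assuming Kerberos is already configured on both ends
(the dataset and paths are just examples):

  # server side: share the dataset with Kerberos privacy (encryption on the wire)
  zfs set sharenfs=sec=krb5p,rw tank/users

  # client side:
  mount -F nfs -o sec=krb5p server:/tank/users /mnt/users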

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread c.hanover
On Feb 5, 2010, at 5:19 PM, Nicolas Williams wrote:

 ZFS crypto will be nice when we get either NFSv4 or NFSv3 w/krb5 for
 over the wire encryption.  Until then, not much point.
 
 You can use NFS with krb5 over the wire encryption _now_.
 
 Nico
 -- 

I know, that's just something I'm working out the particulars of before we 
decide if/when we want to offer it in production.  I've got it working to some 
extent now.

-
Cameron Hanover
chano...@umich.edu

Tact is for people who aren't witty enough to be sarcastic.




smime.p7s
Description: S/MIME cryptographic signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hybrid storage ... thing

2010-02-05 Thread Adam Leventhal
 I saw this in /. and thought I'd point it out to this list. It appears
 to act as a L2 cache for a single drive, in theory providing better
 performance.
 
 http://www.silverstonetek.com/products/p_contents.php?pno=HDDBOOSTarea

It's a neat device, but the notion of a hybrid drive is nothing new. As
with any block-based caching, this device has no notion of the semantic
meaning of a given block so there's only so much intelligence it can bring
to bear on the problem.

Adam

--
Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send/recv checksum transmission

2010-02-05 Thread Richard Elling
On Feb 5, 2010, at 3:11 AM, grarpamp wrote:
 Are the sha256/fletcher[x]/etc checksums sent to the receiver along
 with the other data/metadata?

No. Checksums are made on the records, and there could be a different
record size for the sending and receiving file systems. The stream itself
is checksummed with fletcher4.

 And checked upon receipt of course.

Of course.

 Do they chain all the way back to the uberblock or to some calculated
 transfer specific checksum value?

I suppose one could say a calculated transfer fletcher4 checksum value.

 The idea is to carry through the integrity checks wherever possible.
 Whether done as close as within the same zpool, or miles away.

yes.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Ross Walker

On Feb 5, 2010, at 10:49 AM, Robert Milkowski mi...@task.gda.pl wrote:


Actually, there is.
One difference is that when writing to a raid-z{1|2} pool compared  
to raid-10 pool you should get better throughput if at least 4  
drives are used. Basically it is due to the fact that in RAID-10 the  
maximum you can get in terms of write throughput is a total  
aggregated throughput of half the number of used disks and only  
assuming there are no other bottlenecks between the OS and disks  
especially as you need to take into account that you are double the  
bandwidth requirements due to mirroring. In case of RAID-Zn you have  
some extra overhead for writing additional checksum but other than  
that you should get a write throughput closer to of T-N (where N is  
a RAID-Z level) instead of T/2 in RAID-10.


That hasn't been my experience with raidz. I get a max read and write  
IOPS of the slowest drive in the vdev.


Which makes sense because each write spans all drives and each read  
spans all drives (except the parity drives) so they end up having the  
performance characteristics of a single drive.


Now if you have enough drives you can create multiple raidz vdevs to  
get the IOPS up, but you need a lot more drives than what multiple  
mirror vdevs can provide, IOPS-wise, with the same number of spindles.


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-05 Thread Frank Cusack

You might also want to note that with traditional filesystems, the
'shred' utility will securely erase data, but no tools like that
will work for zfs.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send/recv checksum transmission

2010-02-05 Thread grarpamp
  No. Checksums are made on the records, and there could be a different
  record size for the sending and receiving file systems.

Oh. So there's a zfs read to ram somewhere, which checks the sums on disk.
And then entirely new stream checksums are made while sending it all off
to the pipe.

I see the bit about different zfs block sizes perhaps preventing use of the
actual on-disk checksums in the transfer itself... and thereby the chain back
to the uberblock. Thanks for that part.

 The stream itself is checksummed with fletcher4.
  I suppose one could say a calculated transfer fletcher4 checksum value.

Hmm, is that configurable? Say, to match the checksum being
used on the filesystem itself... i.e. sha256? It would seem odd to
send with fewer bits than what is used on disk.

 The idea is to carry through the integrity checks wherever possible.
 Whether done as close as within the same zpool, or miles away.
  yes.

Was thinking that plaintext ethernet/wan and even some of the 'weaker'
ssl algorithms would be candidates to back with sha256 in a transfer.
Not really needed for a 'within the box only' unix pipe though.
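
One way to approximate that today, without any change to the stream
format, is an out-of-band digest around a staged stream. Host names,
port and paths below are hypothetical, and the netcat listener syntax
varies between netcat flavors:

  # sending side
  zfs send tank/data@snap > /var/tmp/data.zsend
  digest -a sha256 /var/tmp/data.zsend      # record this value
  nc recvhost 9999 < /var/tmp/data.zsend

  # receiving side
  nc -l 9999 > /var/tmp/data.zsend
  digest -a sha256 /var/tmp/data.zsend      # must match the sender
  zfs recv backup/data < /var/tmp/data.zsend
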
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send/recv checksum transmission

2010-02-05 Thread Richard Elling
On Feb 5, 2010, at 7:20 PM, grarpamp wrote:
 No. Checksums are made on the records, and there could be a different
 record size for the sending and receiving file systems.
 
 Oh. So there's a zfs read to ram somewhere, which checks the sums on disk.
 And then entirely new stream checksums are made while sending it all off
 to the pipe.
 
 I see the bit about different zfs block sizes perhaps preventing use of the
 actual on-disk checksums in the transfer itself... and thereby the chain back
 to the uberblock. Thanks for that part.
 
 The stream itself is checksummed with fletcher4.
 I suppose one could say a calculated transfer fletcher4 checksum value.
 
 Hmm, is that configurable? Say, to match the checksum being
 used on the filesystem itself... i.e. sha256? It would seem odd to
 send with fewer bits than what is used on disk.

Do you expect the same errors in the pipe as you do on disk?

 The idea is to carry through the integrity checks wherever possible.
 Whether done as close as within the same zpool, or miles away.
 yes.
 
 Was thinking that plaintext ethernet/wan and even some of the 'weaker'
 ssl algorithms would be candidates to back with sha256 in a transfer.
 Not really needed for a 'within the box only' unix pipe though.

most folks use ssh.
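
The usual pattern looks something like this (pool, dataset and host
names are hypothetical):

  zfs send tank/data@snap | ssh backuphost zfs recv -F backup/data

The ssh MAC then gives an end-to-end integrity check over the wire, on
top of the fletcher4 carried in the stream itself.
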
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send/recv checksum transmission

2010-02-05 Thread grarpamp
 Hmm, is that configurable? Say, to match the checksum being
 used on the filesystem itself... i.e. sha256? It would seem odd to
 send with fewer bits than what is used on disk.

 Was thinking that plaintext ethernet/wan and even some of the 'weaker'
 ssl algorithms

 Do you expect the same errors in the pipe as you do on disk?

Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk]
is assumed to handle data with integrity. So say netcat is used as transport,
zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv,
and your wire takes some undetected/uncorrected hits, and the hits also
happen to make it past fletcher4... it kind of nullifies the SA's assumption
that sha256 would be used throughout all zfs operations.

I didn't see any mention in the man page that checksums are indeed used
in send/recv operations...

In any case, at least something is used over the bare wire :)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Rob Logan

 Intel's RAM is faster because it needs to be.
I'm confused how AMD's dual-channel, two-way interleaved
128-bit DDR2-667 into an on-CPU controller is faster than
Intel's Lynnfield dual-channel, rank- and channel-interleaved
DDR3-1333 into an on-CPU controller.
http://www.anandtech.com/printarticle.aspx?i=3634

 With the AMD CPU, the memory will run cooler and be cheaper. 
Cooler, yes, but only $2 more per gig for 2x the bandwidth?

http://www.newegg.com/Product/Product.aspx?Item=N82E16820139050
http://www.newegg.com/Product/Product.aspx?Item=N82E16820134652

And if one uses all 16 slots, that 667MHz DIMM runs at 533MHz
with AMD. The same is true for Lynnfield: with registered DDR3,
one only gets 800MHz with all 6 slots populated (single or dual rank).

 Regardless, for zfs, memory is more important than raw CPU 
Agreed! But everything must be balanced.

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send/recv checksum transmission

2010-02-05 Thread Richard Elling
On Feb 5, 2010, at 8:09 PM, grarpamp wrote:

 Hmm, is that configurable? Say, to match the checksum being
 used on the filesystem itself... i.e. sha256? It would seem odd to
 send with fewer bits than what is used on disk.
 
 Was thinking that plaintext ethernet/wan and even some of the 'weaker'
 ssl algorithms
 
 Do you expect the same errors in the pipe as you do on disk?
 
 Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk]
 is assumed to handle data with integrity. So say netcat is used as transport,
 zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv,
 and your wire takes some undetected/uncorrected hits, and the hits also
 happen to make it past fletcher4... it kind of nullifies the SA's assumption
 that sha256 would be used throughout all zfs operations.

Hold it right there, fella.  SHA256 is not used for everything in ZFS, so
expecting it to be will set the stage for disappointment.  You can
set the data to be checksummed with SHA256.
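
That is a per-dataset property; a minimal example, with a hypothetical
dataset name:

  zfs set checksum=sha256 tank/data
  zfs get checksum tank/data

Note that only blocks written after the change use sha256; previously
written blocks keep whatever checksum they were written with.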

 I didn't see any mention in the man page that checksums are indeed used
 in send/recv operations...

It is an implementation detail.  But if you can make the case for
why it is required to be inside the protocol, rather than its transport,
then please file an RFE.

 In any case, at least something is used over the bare wire :)

Lots of things are used on the bare wire and there are many
hops along the way. This is another good reason to use ssh, or
some other end-to-end verification mechanism. UNIX pipes are
a great invention! :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send/recv checksum transmission

2010-02-05 Thread grarpamp
 Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk]
 is assumed to handle data with integrity. So say netcat is used as transport,
 zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv,
 and your wire takes some undetected/uncorrected hits, and the hits also
 happen to make it past fletcher4... it kind of nullifies the SA's assumption
 that sha256 would be used throughout all zfs operations.

  Hold it right there, fella.  SHA256 is not used for everything ZFS,

Well, ok, and to my limited knowledge... zfs set checksum=sha256 only
covers user-written data [POSIX file metadata, file contents, directory
structure, ZVOL blocks] and not necessarily any zfs filesystem internals.

 You can set the data to be checksummed with SHA256.

Definitely, as indeed set above :)

 I didn't see any mention in the man page that checksums are indeed used
 in send/recv operations...

  It is an implementation detail.  But if you can make the case for
  why it is required to be inside the protocol, rather than its transport,
  then please file an RFE.

The case had to have been made previously to include fletcher4 in the
zfs send/recv protocol. So sha256 would just be an update to the user's
options, similar to how fletcher4 became an available on-disk update to
fletcher2, and raidz2/raidz3 to raidz1, etc.

Was really only looking to see what, if anything, was currently used in
the protocol, not actually proposing an update. Now I know :)

Transport is certainly always up to the user: pipe/netcat/ssh/rsh/pigeon

 In any case, at least something is used over the bare wire :)
 UNIX pipes are a great invention! :-)

Yeah, I suppose a pipe to ssh has enough bits to catch things these days.
Netcat might be different, hence the value of at least fletcher4 as
already implemented.

debug1: kex: server->client aes128-ctr hmac-sha1 none
debug1: kex: client->server aes128-ctr hmac-sha1 none
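
For completeness, the MAC and cipher can also be pinned per connection;
which algorithm names are available depends on the installed OpenSSH
(host and dataset names below are hypothetical):

  zfs send tank/data@snap | \
      ssh -o MACs=hmac-sha1 -o Ciphers=aes128-ctr backuphost \
      zfs recv -F backup/data

  # or just check what a given connection negotiates
  ssh -v backuphost true 2>&1 | grep 'kex:'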

Thanks.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss