Re: [zfs-discuss] Compellant announces zNAS

2010-04-29 Thread Thommy M. Malmström
What operating system does it run?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] MPT issues strikes back

2010-04-29 Thread Bruno Sousa
Hi Mark,

I also had some SSD drives in this machine, but I have taken them out and
the problem still occurs...
Regarding the bug, well, it seems to be related to the usage of xVM, and
since I don't use it, it probably won't make any difference to this
particular server...

Anyway, thanks for the tip; I will try to understand what's wrong
with this machine.
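
One way to tie the /devices path in such a warning back to a cXtYdZ disk
name, and to see whether one particular drive is racking up errors, is
sketched below (the grep pattern is only a placeholder for the controller
path in the actual message):

  # ls -l /dev/rdsk/*s2 | grep 'pci1028'     (disks under that controller)
  # iostat -En                               (per-device error counters)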

Bruno

On 27-4-2010 16:41, Mark Ogden wrote:
 Bruno Sousa on Tue, Apr 27, 2010 at 09:16:08AM +0200 wrote:
   
 Hi all,

 Yet another story regarding mpt issues, and in order to make a long
 story short: every time a Dell R710 running snv_134 logs the information
  scsi: [ID 107833 kern.warning] WARNING:
 /p...@0,0/pci8086,3...@4/pci1028,1...@0 (mpt0): , the system freezes and
 only a hard-reset fixes the issue.

 Is there any sort of parameter to be used to minimize/avoid this issue?

 
 We had the same problem on an X4600; it turned out to be a bad
 SSD and/or connection at the location listed in the error message. 

 Since removing that drive, we have not encountered that issue. 

 You might want to look at

 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6894775
  too.


 -Mark

   
 Machine specs :

 Dell R710, 16 GB memory, 2 Intel Quad-Core E5506
 SunOS san01 5.11 snv_134 i86pc i386 i86pc Solaris
 Dell Integrated SAS 6/i Controller ( mpt0 Firmware version v0.25.47.0
 (IR) ) with 2 disks attached without raid


 Thanks in advance,
 Bruno




 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
   




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migrate ZFS volume to new pool

2010-04-29 Thread Jim Horng
 Why would you recommend a spare for raidz2 or raidz3?
  -- richard

A spare is there to minimize reconstruction time.  Remember that a vdev cannot 
start resilvering until a spare disk is available, and with disks as big as they 
are today, resilvering can take many hours.  I would rather have the disk finish 
resilvering before I even get the chance to replace the bad disk than risk more 
disks failing before the vdev has had a chance to resilver. 

This is especially important if the file system is not at a location with 
24-hour staff.
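
For what it's worth, a hot spare can be added to an existing pool at any
time so that it is already sitting there when a disk dies; a minimal
sketch (pool and device names are only examples):

  # zpool add tank spare c2t5d0     (shared by all vdevs in the pool)
  # zpool status tank               (the disk shows up under 'spares')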
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Compellant announces zNAS

2010-04-29 Thread Cyril Plisko
2010/4/29 Thommy M. Malmström thommy.m.malmst...@gmail.com:
 What operating system does it run?

Nexenta I believe.


-- 
Regards,
Cyril
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Compellant announces zNAS

2010-04-29 Thread Phil Harman
That screen shot looks very much like Nexenta 3.0 with different
branding. Elsewhere, The Register confirms it's OpenSolaris.


On 29 Apr 2010, at 07:35, Thommy M. Malmström thommy.m.malmst...@gmail.com 
wrote:



What operating system does it run?
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Bruno Sousa
Indeed the scrub seems to take too many resources from a live system.
For instance, I have a server with 24 disks (SATA 1TB) serving as an NFS
store to a linux machine holding user mailboxes. I have around 200
users, with maybe 30-40% of them active at the same time.
As soon as the scrub process kicks in, the linux box starts to give messages
like "nfs server not available" and the users start to complain that
Outlook gives connection timeouts. As soon as the scrub
process stops, everything returns to normal.
So for me it's a real issue that the scrub takes so many
resources of the system, making it pretty much unusable. In my case I
did a workaround, where basically I zfs send/receive from this
server to another server and the scrub process now runs on the
second server.
I don't know if this is such a good idea, given that I don't know
for sure if the scrub process on the secondary machine will be useful
in case of data corruption... but so far so good, and it's probably
better than nothing.
I still remember, before ZFS, that any good RAID controller would have a
background consistency check task, and such a task could be assigned a
priority, like low, medium, high... going back to ZFS, what's
the possibility of getting this feature as well?


Just out of curiosity, do the Sun OpenStorage appliances, or Nexenta
based ones, have any scrub task enabled by default? I would like to get
some feedback from users that run ZFS appliances regarding the impact of
running a scrub on their appliances.


Bruno

On 28-4-2010 22:39, David Dyer-Bennet wrote:
 On Wed, April 28, 2010 10:16, Eric D. Mudama wrote:
   
 On Wed, Apr 28 at  1:34, Tonmaus wrote:
 
  Zfs scrub needs to access all written data on all disks and is usually
  disk-seek or disk I/O bound, so it is difficult to keep it from hogging
  the disk resources.  A pool based on mirror devices will behave much
  more nicely while being scrubbed than one based on RAIDz2.
 
 Experience seconded entirely. I'd like to repeat that I think we
 need more efficient load balancing functions in order to keep
 housekeeping payload manageable. Detrimental side effects of scrub
 should not be a decision point for choosing certain hardware or
 redundancy concepts in my opinion.
   
  While there may be some possible optimizations, I'm sure everyone
 would love the random performance of mirror vdevs, combined with the
 redundancy of raidz3 and the space of a raidz1.  However, as in all
 systems, there are tradeoffs.
 
 The situations being mentioned are much worse than what seem reasonable
 tradeoffs to me.  Maybe that's because my intuition is misleading me about
 what's available.  But if the normal workload of a system uses 25% of its
 sustained IOPS, and a scrub is run at low priority, I'd like to think
 that during a scrub I'd see a little degradation in performance, and that
 the scrub would take 25% or so longer than it would on an idle system. 
 There's presumably some inefficiency, so the two loads don't just add
 perfectly; so maybe another 5% lost to that?  That's the big uncertainty. 
 I have a hard time believing in 20% lost to that.

 Do you think that's a reasonable outcome to hope for?  Do you think ZFS is
 close to meeting it?

 People with systems that live at 75% all day are obviously going to have
 more problems than people who live at 25%!

   



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Roy Sigurd Karlsbakk
I got this hint from Richard Elling, but haven't had time to test it much. 
Perhaps someone else could help? 

roy 

 Interesting. If you'd like to experiment, you can change the limit of the 
 number of scrub I/Os queued to each vdev. The default is 10, but that 
 is too close to the normal limit. You can see the current scrub limit via: 
 
 # echo zfs_scrub_limit/D | mdb -k 
 zfs_scrub_limit: 
 zfs_scrub_limit:10 
 
 you can change it with: 
 # echo zfs_scrub_limit/W0t2 | mdb -kw 
 zfs_scrub_limit:0xa = 0x2 
 
 # echo zfs_scrub_limit/D | mdb -k 
 zfs_scrub_limit: 
 zfs_scrub_limit:2 
 
 In theory, this should help your scenario, but I do not believe this has 
 been exhaustively tested in the lab. Hopefully, it will help. 
 -- richard 


- Bruno Sousa bso...@epinfante.com skrev: 


Indeed the scrub seems to take too many resources from a live system. 
For instance, I have a server with 24 disks (SATA 1TB) serving as an NFS store to a 
linux machine holding user mailboxes. I have around 200 users, with maybe 
30-40% of them active at the same time. 
As soon as the scrub process kicks in, the linux box starts to give messages like 
"nfs server not available" and the users start to complain that Outlook 
gives connection timeouts. As soon as the scrub process stops, 
everything returns to normal. 
So for me it's a real issue that the scrub takes so many resources of 
the system, making it pretty much unusable. In my case I did a workaround, 
where basically I zfs send/receive from this server to another server and 
the scrub process now runs on the second server. 
I don't know if this is such a good idea, given that I don't know for 
sure if the scrub process on the secondary machine will be useful in case of 
data corruption... but so far so good, and it's probably better than nothing. 
I still remember, before ZFS, that any good RAID controller would have a 
background consistency check task, and such a task could be assigned a 
priority, like low, medium, high... going back to ZFS, what's the possibility 
of getting this feature as well? 


Just out of curiosity, do the Sun OpenStorage appliances, or Nexenta based ones, 
have any scrub task enabled by default? I would like to get some feedback from 
users that run ZFS appliances regarding the impact of running a scrub on their 
appliances. 


Bruno 

On 28-4-2010 22:39, David Dyer-Bennet wrote: 

On Wed, April 28, 2010 10:16, Eric D. Mudama wrote: 

On Wed, Apr 28 at  1:34, Tonmaus wrote: 



Zfs scrub needs to access all written data on all disks and is usually
disk-seek or disk I/O bound, so it is difficult to keep it from hogging
the disk resources.  A pool based on mirror devices will behave much
more nicely while being scrubbed than one based on RAIDz2.

Experience seconded entirely. I'd like to repeat that I think we
need more efficient load balancing functions in order to keep
housekeeping payload manageable. Detrimental side effects of scrub
should not be a decision point for choosing certain hardware or
redundancy concepts in my opinion.

While there may be some possible optimizations, I'm sure everyone
would love the random performance of mirror vdevs, combined with the
redundancy of raidz3 and the space of a raidz1.  However, as in all
systems, there are tradeoffs.

The situations being mentioned are much worse than what seem reasonable
tradeoffs to me.  Maybe that's because my intuition is misleading me about
what's available.  But if the normal workload of a system uses 25% of its
sustained IOPS, and a scrub is run at low priority, I'd like to think
that during a scrub I'd see a little degradation in performance, and that
the scrub would take 25% or so longer than it would on an idle system. 
There's presumably some inefficiency, so the two loads don't just add
perfectly; so maybe another 5% lost to that?  That's the big uncertainty. 
I have a hard time believing in 20% lost to that.

Do you think that's a reasonable outcome to hope for?  Do you think ZFS is
close to meeting it?

People with systems that live at 75% all day are obviously going to have
more problems than people who live at 25%! 

___ 
zfs-discuss mailing list 
zfs-discuss@opensolaris.org 
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore

2010-04-29 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Euan Thoms
 
 I'm looking for a way to backup my entire system, the rpool zfs pool to
 an external HDD so that it can be recovered in full if the internal HDD
 fails. Previously with Solaris 10 using UFS I would use ufsdump and
 ufsrestore, which worked so well, I was very confident with it. Now ZFS
 doesn't have an exact replacement of this so I need to find a best
 practice to replace it.
 
 I'm guessing that I can format the external HDD as a pool called
 'backup' and zfs send -R ... | zfs receive ... to it. What I'm not
 sure about is how to restore. Back in the days of UFS, I would boot off
 the Solaris 10 CD in single user mode to a command prompt, partition the HDD
 with the correct slices, format it, mount it and ufsrestore the entire
 filesystem. With zfs, I don't know what I'm doing. Can I just make a
 pool called rpool and zfs send/receive it back?

An excellent question.  One which many people would never bother to explore,
but important nonetheless.

I have not tested this, so I'll encourage testing it and coming back to say
how it went:

I would install Solaris or OpenSolaris just as you did the first time.  That
way, the bootloader, partition tables, etc, are all configured for you
automatically.  (Just restoring the filesystem is not enough.)

Then I'd boot from the CD, and zfs send | zfs receive, from the external
backup disk to the actual rpool.  Thus replacing the entire filesystem.

You should test this, because I am only like 90% certain it will work.
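
For reference, the backup half that Euan describes could look something
along these lines (pool and snapshot names are only examples; the -u on
receive, which keeps the received filesystems from being mounted over the
live ones, is worth checking against your zfs man page):

  # zpool create backup c5t0d0      (the external disk)
  # zfs snapshot -r rpool@backup1
  # zfs send -R rpool@backup1 | zfs receive -Fdu backup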

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Robert Milkowski

On 28/04/2010 21:39, David Dyer-Bennet wrote:


The situations being mentioned are much worse than what seem reasonable
tradeoffs to me.  Maybe that's because my intuition is misleading me about
what's available.  But if the normal workload of a system uses 25% of its
sustained IOPS, and a scrub is run at low priority, I'd like to think
that during a scrub I'd see a little degradation in performance, and that
the scrub would take 25% or so longer than it would on an idle system.
There's presumably some inefficiency, so the two loads don't just add
perfectly; so maybe another 5% lost to that?  That's the big uncertainty.
I have a hard time believing in 20% lost to that.

   


Well, it's not that easy, as there are many other factors you need to 
take into account.
For example, how many I/Os are you allowing to be queued per device? This 
might affect latency for your application.


Or if you have a disk array with its own cache - just by doing a scrub you 
might be pushing other entries out of the cache, which might impact the 
performance of your application.


Then there might be a SAN in between, and so on.

I'm not saying there is no room for improvement here. All I'm saying is 
that it is not as easy a problem as it seems.


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Compellant announces zNAS

2010-04-29 Thread Robert Milkowski

On 29/04/2010 07:57, Phil Harman wrote:
That screen shot looks very much like Nexenta 3.0 with a different 
branding. Elsewhere, The Register confirms it's OpenSolaris.




Well, it looks like it is running Nexenta, which is based on OpenSolaris.
But it is not the OpenSolaris *distribution*.

--
Robert Milkowski
http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Tomas Ögren
On 29 April, 2010 - Roy Sigurd Karlsbakk sent me these 10K bytes:

 I got this hint from Richard Elling, but haven't had time to test it much. 
 Perhaps someone else could help? 
 
 roy 
 
  Interesting. If you'd like to experiment, you can change the limit of the 
  number of scrub I/Os queued to each vdev. The default is 10, but that 
  is too close to the normal limit. You can see the current scrub limit via: 
  
  # echo zfs_scrub_limit/D | mdb -k 
  zfs_scrub_limit: 
  zfs_scrub_limit:10 
  
  you can change it with: 
  # echo zfs_scrub_limit/W0t2 | mdb -kw 
  zfs_scrub_limit:0xa = 0x2 
  
  # echo zfs_scrub_limit/D | mdb -k 
  zfs_scrub_limit: 
  zfs_scrub_limit:2 
  
  In theory, this should help your scenario, but I do not believe this has 
  been exhaustively tested in the lab. Hopefully, it will help. 
  -- richard 

If I'm reading the code right, it's only used when creating a new vdev
(import, zpool create, maybe at boot).. So I took an alternate route:

http://pastebin.com/hcYtQcJH

(spa_scrub_maxinflight used to be 0x46 (70 decimal) due to 7 devices *
zfs_scrub_limit(10) = 70..)
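
The paste isn't reproduced here, but poking the per-pool value directly
with mdb would presumably look something like this (ADDR/FIELDADDR are
placeholders, and this assumes spa_scrub_maxinflight is a field of spa_t,
which is what the 7 devices * zfs_scrub_limit arithmetic suggests):

  # echo ::spa | mdb -k                       (spa_t address of the pool)
  # echo 'ADDR::print -a spa_t spa_scrub_maxinflight' | mdb -k
  # echo 'FIELDADDR/Z 0t14' | mdb -kw         (use /W if the field is 32-bit)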

With these lower numbers, our pool is much more responsive over NFS..

 scrub: scrub in progress for 0h40m, 0.10% done, 697h29m to go

Might take a while though. We've taken periodic snapshots and have
snapshots from 2008, which have probably fragmented the pool beyond
sanity or something..

 - Bruno Sousa bso...@epinfante.com skrev: 
 
 
 Indeed the scrub seems to take too many resources from a live system. 
 For instance, I have a server with 24 disks (SATA 1TB) serving as an NFS store to 
 a linux machine holding user mailboxes. I have around 200 users, with maybe 
 30-40% of them active at the same time. 
 As soon as the scrub process kicks in, the linux box starts to give messages like 
 "nfs server not available" and the users start to complain that Outlook 
 gives connection timeouts. As soon as the scrub process stops, 
 everything returns to normal. 
 So for me it's a real issue that the scrub takes so many resources of 
 the system, making it pretty much unusable. In my case I did a workaround, 
 where basically I zfs send/receive from this server to another server 
 and the scrub process now runs on the second server. 
 I don't know if this is such a good idea, given that I don't know for 
 sure if the scrub process on the secondary machine will be useful in case of 
 data corruption... but so far so good, and it's probably better than nothing. 
 I still remember, before ZFS, that any good RAID controller would have a 
 background consistency check task, and such a task could be assigned a 
 priority, like low, medium, high... going back to ZFS, what's the 
 possibility of getting this feature as well? 
 
 
 Just out of curiosity, do the Sun OpenStorage appliances, or Nexenta based 
 ones, have any scrub task enabled by default? I would like to get some 
 feedback from users that run ZFS appliances regarding the impact of running a 
 scrub on their appliances. 
 
 
 Bruno 
 
 On 28-4-2010 22:39, David Dyer-Bennet wrote: 
 
 On Wed, April 28, 2010 10:16, Eric D. Mudama wrote: 
 
 On Wed, Apr 28 at  1:34, Tonmaus wrote: 
 
 
 
 Zfs scrub needs to access all written data on all disks and is usually
 disk-seek or disk I/O bound, so it is difficult to keep it from hogging
 the disk resources.  A pool based on mirror devices will behave much
 more nicely while being scrubbed than one based on RAIDz2.
 
 Experience seconded entirely. I'd like to repeat that I think we
 need more efficient load balancing functions in order to keep
 housekeeping payload manageable. Detrimental side effects of scrub
 should not be a decision point for choosing certain hardware or
 redundancy concepts in my opinion.
 
 While there may be some possible optimizations, I'm sure everyone
 would love the random performance of mirror vdevs, combined with the
 redundancy of raidz3 and the space of a raidz1.  However, as in all
 systems, there are tradeoffs.
 
 The situations being mentioned are much worse than what seem reasonable
 tradeoffs to me.  Maybe that's because my intuition is misleading me about
 what's available.  But if the normal workload of a system uses 25% of its
 sustained IOPS, and a scrub is run at low priority, I'd like to think
 that during a scrub I'd see a little degradation in performance, and that
 the scrub would take 25% or so longer than it would on an idle system. 
 There's presumably some inefficiency, so the two loads don't just add
 perfectly; so maybe another 5% lost to that?  That's the big uncertainty. 
 I have a hard time believing in 20% lost to that.
 
 Do you think that's a reasonable outcome to hope for?  Do you think ZFS is
 close to meeting it?
 
 People with systems that live at 75% all day are obviously going to have
 more problems than people who live at 25%! 
 
 

Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Tomas Ögren
On 29 April, 2010 - Tomas Ögren sent me these 5,8K bytes:

 On 29 April, 2010 - Roy Sigurd Karlsbakk sent me these 10K bytes:
 
  I got this hint from Richard Elling, but haven't had time to test it much. 
  Perhaps someone else could help? 
  
  roy 
  
   Interesting. If you'd like to experiment, you can change the limit of the 
   number of scrub I/Os queued to each vdev. The default is 10, but that 
   is too close to the normal limit. You can see the current scrub limit 
   via: 
   
   # echo zfs_scrub_limit/D | mdb -k 
   zfs_scrub_limit: 
   zfs_scrub_limit:10 
   
   you can change it with: 
   # echo zfs_scrub_limit/W0t2 | mdb -kw 
   zfs_scrub_limit:0xa = 0x2 
   
   # echo zfs_scrub_limit/D | mdb -k 
   zfs_scrub_limit: 
   zfs_scrub_limit:2 
   
   In theory, this should help your scenario, but I do not believe this has 
   been exhaustively tested in the lab. Hopefully, it will help. 
   -- richard 
 
 If I'm reading the code right, it's only used when creating a new vdev
 (import, zpool create, maybe at boot).. So I took an alternate route:
 
 http://pastebin.com/hcYtQcJH
 
 (spa_scrub_maxinflight used to be 0x46 (70 decimal) due to 7 devices *
 zfs_scrub_limit(10) = 70..)
 
 With these lower numbers, our pool is much more responsive over NFS..

But taking snapshots is quite bad.. A single recursive snapshot over
~800 filesystems took about 45 minutes, with NFS operations taking 5-10
seconds.. Snapshots usually take 10-30 seconds..

  scrub: scrub in progress for 0h40m, 0.10% done, 697h29m to go

 scrub: scrub in progress for 1h41m, 2.10% done, 78h35m to go

This is chugging along..

The server is a Fujitsu RX300 with a Quad Xeon 1.6GHz, 6G ram, 8x400G
SATA through a U320SCSI-SATA box - Infortrend A08U-G1410, Sol10u8.
Should have enough oompf, but when you combine snapshot with a
scrub/resilver, sync performance gets abysmal.. Should probably try
adding a ZIL when u9 comes, so we can remove it again if performance
goes crap.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore

2010-04-29 Thread Cindy Swearingen

Hi Euan,

For full root pool recovery see the ZFS Administration Guide, here:

http://docs.sun.com/app/docs/doc/819-5461/ghzvz?l=en&a=view

Recovering the ZFS Root Pool or Root Pool Snapshots

Additional scenarios and details are provided in the ZFS troubleshooting
wiki. The link is here but the site is not responding at the moment:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

Check back here later today.

Thanks,

Cindy

On 04/28/10 23:02, Euan Thoms wrote:

I'm looking for a way to backup my entire system, the rpool zfs pool to an 
external HDD so that it can be recovered in full if the internal HDD fails. 
Previously with Solaris 10 using UFS I would use ufsdump and ufsrestore, which 
worked so well, I was very confident with it. Now ZFS doesn't have an exact 
replacement of this so I need to find a best practice to replace it.

I'm guessing that I can format the external HDD as a pool called 'backup' and zfs 
send -R ... | zfs receive ... to it. What I'm not sure about is how to restore. 
Back in the days of UFS, I would boot off the Solaris 10 CD in single user mode to a command 
prompt, partition the HDD with the correct slices, format it, mount it and ufsrestore the entire 
filesystem. With zfs, I don't know what I'm doing. Can I just make a pool called rpool 
and zfs send/receive it back?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Compellant announces zNAS

2010-04-29 Thread Andrey Kuzmin
I believe the name is Compellent Technologies,
http://www.google.com/finance?q=NYSE:CML.
Regards,
Andrey




On Wed, Apr 28, 2010 at 5:54 AM, Richard Elling
richard.ell...@richardelling.com wrote:
 Today, Compellant announced their zNAS addition to their unified storage
 line. zNAS uses ZFS behind the scenes.
 http://www.compellent.com/Community/Blog/Posts/2010/4/Compellent-zNAS.aspx

 Congrats Compellant!
  -- richard

 ZFS storage and performance consulting at http://www.RichardElling.com
 ZFS training on deduplication, NexentaStor, and NAS performance
 Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com





 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Richard Elling
On Apr 29, 2010, at 5:52 AM, Tomas Ögren wrote:

 On 29 April, 2010 - Tomas Ögren sent me these 5,8K bytes:
 
 On 29 April, 2010 - Roy Sigurd Karlsbakk sent me these 10K bytes:
 
 I got this hint from Richard Elling, but haven't had time to test it much. 
 Perhaps someone else could help? 
 
 roy 
 
 Interesting. If you'd like to experiment, you can change the limit of the 
 number of scrub I/Os queued to each vdev. The default is 10, but that 
 is too close to the normal limit. You can see the current scrub limit via: 
 
 # echo zfs_scrub_limit/D | mdb -k 
 zfs_scrub_limit: 
 zfs_scrub_limit:10 
 
 you can change it with: 
 # echo zfs_scrub_limit/W0t2 | mdb -kw 
 zfs_scrub_limit:0xa = 0x2 
 
 # echo zfs_scrub_limit/D | mdb -k 
 zfs_scrub_limit: 
 zfs_scrub_limit:2 
 
 In theory, this should help your scenario, but I do not believe this has 
 been exhaustively tested in the lab. Hopefully, it will help. 
 -- richard 
 
 If I'm reading the code right, it's only used when creating a new vdev
 (import, zpool create, maybe at boot).. So I took an alternate route:
 
 http://pastebin.com/hcYtQcJH
 
 (spa_scrub_maxinflight used to be 0x46 (70 decimal) due to 7 devices *
 zfs_scrub_limit(10) = 70..)
 
 With these lower numbers, our pool is much more responsive over NFS..
 
 But taking snapshots is quite bad.. A single recursive snapshot over
 ~800 filesystems took about 45 minutes, with NFS operations taking 5-10
 seconds.. Snapshots usually take 10-30 seconds..
 
 scrub: scrub in progress for 0h40m, 0.10% done, 697h29m to go
 
 scrub: scrub in progress for 1h41m, 2.10% done, 78h35m to go
 
 This is chugging along..
 
 The server is a Fujitsu RX300 with a Quad Xeon 1.6GHz, 6G ram, 8x400G
 SATA through a U320SCSI-SATA box - Infortrend A08U-G1410, Sol10u8.

slow disks == poor performance

 Should have enough oompf, but when you combine snapshot with a
 scrub/resilver, sync performance gets abysmal.. Should probably try
 adding a ZIL when u9 comes, so we can remove it again if performance
 goes crap.

A separate log will not help.  Try faster disks.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Tomas Ögren
On 29 April, 2010 - Richard Elling sent me these 2,5K bytes:

  With these lower numbers, our pool is much more responsive over NFS..
  
  But taking snapshots is quite bad.. A single recursive snapshot over
  ~800 filesystems took about 45 minutes, with NFS operations taking 5-10
  seconds.. Snapshots usually take 10-30 seconds..
  
  scrub: scrub in progress for 0h40m, 0.10% done, 697h29m to go
  
  scrub: scrub in progress for 1h41m, 2.10% done, 78h35m to go
  
  This is chugging along..
  
  The server is a Fujitsu RX300 with a Quad Xeon 1.6GHz, 6G ram, 8x400G
  SATA through a U320SCSI-SATA box - Infortrend A08U-G1410, Sol10u8.
 
 slow disks == poor performance

I know they're not fast, but it shouldn't take 10-30 seconds to
create a directory. They do perfectly well in all combinations, except
when a scrub comes along (or sometimes when a snapshot feels like taking
45 minutes instead of 4.5 seconds). iostat says the disks aren't 100%
busy, the storage box itself doesn't seem to be busy, yet with zfs they
go downhill in some conditions..

  Should have enough oompf, but when you combine snapshot with a
  scrub/resilver, sync performance gets abysmal.. Should probably try
  adding a ZIL when u9 comes, so we can remove it again if performance
  goes crap.
 
 A separate log will not help.  Try faster disks.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, NFS, and ACL issues

2010-04-29 Thread Mary Ellen Fitzpatrick
I set up the share and mounted it on the linux client; permissions did not
carry over from the zfs share.



hecate:~ zfs create zp-ext/test/mfitzpat
hecate:/zp-ext/test zfs get sharenfs zp-ext/test/mfitzpat
NAME  PROPERTY  VALUE SOURCE
zp-ext/test/mfitzpat  sharenfs  oninherited from zp-ext
hecate:/zp-ext/test chown -R mfitzpat:umass mfitzpat

updated auto.home on linux client(nona-man)
test-rw,hard,intr   hecate:/zp-ext/test

nona-man:/# cd /fs/test
nona-man:/fs/test# ls -l
total 3
drwxr-xr-x+ 2 root root 2 Apr 29 11:15 mfitzpat

Permissions did not carry over from the zfs share.
Willing to test/try the next step.

Mary  Ellen




Cindy Swearingen wrote:

Hi Mary Ellen,

We were looking at this problem and are unsure what the problem is...

To rule out NFS as the root cause, could you create and share a test ZFS 
file system without any ACLs to see if you can access the data from the

Linux client?

Let us know the result of your test.

Thanks,

Cindy
On 04/28/10 12:54, Mary Ellen Fitzpatrick wrote:
  
New to Solaris/ZFS and having a difficult time getting ZFS, NFS and ACLs 
all working together properly.   I am trying to access/use zfs shared 
filesystems on a linux client. When I access the dirs/files on the linux 
client, my permissions do not carry over, nor do the newly created 
files, and I can not create new files/dirs.   The permissions/owner on 
the zfs share are set so the owner (mfitzpat) is allowed to do 
everything, but the permissions are not carrying over via NFS to the linux 
client.    I have googled/read and can not get it right.   I think this 
has something to do with NFSv4, but I can not figure it out.


Any help appreciated
Mary Ellen

Running Solaris10 5/09 (u7) on a SunFire x4540 (hecate) with ZFS and zfs 
shares automounted to Centos5 client (nona-man).
Running NIS on nona-man(Centos5) and hecate (zfs) is a client.  All 
works well.


I have created the following zfs filesystems to share and have sharenfs=on
hecate:/zp-ext/spartans/umass zfs get sharenfs
zp-ext/spartans/umass   sharenfs  oninherited from 
zp-ext/spartans
zp-ext/spartans/umass/mfitzpat  sharenfs  oninherited from 
zp-ext/spartans


set up inheritance:
hecate:/zp-ext/spartans/umass zfs set aclinherit=passthrough 
zp-ext/spartans/umass
hecate:/zp-ext/spartans/umass zfs set aclinherit=passthrough 
zp-ext/spartans/umass/mfitzpat
hecate:/zp-ext/spartans/umass zfs set aclmode=passthrough 
zp-ext/spartans/umass
hecate:/zp-ext/spartans/umass zfs set aclmode=passthrough 
zp-ext/spartans/umass/mfitzpat


Set owner:group:
hecate:/zp-ext/spartans/umass chown mfitzpat:umass mfitzpat
hecate:/zp-ext/spartans/umass ls -l
total 5
drwxr-xr-x   2 mfitzpat umass  2 Apr 28 13:18 mfitzpat

Permissions:
hecate:/zp-ext/spartans/umass ls -dv mfitzpat
drwxr-xr-x   2 mfitzpat umass  2 Apr 28 14:06 mfitzpat
0:owner@::deny
1:owner@:list_directory/read_data/add_file/write_data/add_subdirectory
/append_data/write_xattr/execute/write_attributes/write_acl
/write_owner:allow
2:group@:add_file/write_data/add_subdirectory/append_data:deny
3:group@:list_directory/read_data/execute:allow

4:everyone@:add_file/write_data/add_subdirectory/append_data/write_xattr

/write_attributes/write_acl/write_owner:deny
5:everyone@:list_directory/read_data/read_xattr/execute/read_attributes
/read_acl/synchronize:allow

I can access, create/delete files/dirs on the zfs system and permissions 
hold.

[mfitz...@hecate mfitzpat]$ touch foo
[mfitz...@hecate mfitzpat]$ ls -l
total 1
-rw-r--r--   1 mfitzpat umass  0 Apr 28 14:18 foo

When I try to access the dir/files on the linux client, my permissions 
do not carry over, nor do the newly created files, and I can not create 
new files/dirs.

[mfitz...@nona-man umass]$ ls -l
drwxr-xr-x+ 2 root root 2 Apr 28 13:18 mfitzpat

[mfitz...@nona-man mfitzpat]$ pwd
/fs/umass/mfitzpat
[mfitz...@nona-man mfitzpat]$ ls
[mfitz...@nona-man mfitzpat]$





--
Thanks
Mary Ellen


Mary Ellen FitzPatrick
Systems Analyst 
Bioinformatics

Boston University
24 Cummington St.
Boston, MA 02215
office 617-358-2771
cell 617-797-7856 
mfitz...@bu.edu


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to clear invisible, partially received snapshots?

2010-04-29 Thread Andrew Daugherity
I currently use zfs send/recv for onsite backups [1], and am configuring
it for replication to an offsite server as well.  I did an initial full
send, and then a series of incrementals to bring the offsite pool up to
date.

During one of these transfers, the offsite server hung, and I had to
power-cycle it.  It came back up just fine, except that the snapshot it
was receiving when it hung appeared to be both present and nonexistent,
depending on which command was run.

'zfs recv' complained that the target snapshot already existed, but it
did not show up in the output of 'zfs list', and 'zfs destroy' said it
did not exist.

I ran a scrub, which did not find any errors; nor did it solve the
problem.  I discovered some useful commands with zdb [2], and found more
info:

zdb -d showed the snapshot, with an unusual name:
Dataset backup/ims/%zfs-auto-snap_daily-2010-04-22-1900 [ZPL], ID 6325,
cr_txg 28137403, 2.62T, 123234 objects

As opposed to a normal snapshot: 
Dataset backup/i...@zfs-auto-snap_daily-2010-04-21-1900 [ZPL], ID 5132,
cr_txg 27472350, 2.61T, 123200 objects

I then attempted 
'zfs destroy backup/ims/%zfs-auto-snap_daily-2010-04-22-1900', but it
still said the dataset did not exist.

Finally I exported the pool, and after importing it, the snapshot was
gone, and I could receive the snapshot normally.

Is there a way to clear a partial snapshot without an export/import
cycle?


Thanks,

Andrew

[1]
http://mail.opensolaris.org/pipermail/zfs-discuss/2009-December/034554.html
[2] http://www.cuddletech.com/blog/pivot/entry.php?id=980

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, NFS, and ACL issues

2010-04-29 Thread Cindy Swearingen

Hi Mary Ellen,

I'm not really qualified to help you troubleshoot this problem.
Other community members on this list have wrestled with similar
problems and I hope they will comment...

Your Linux client doesn't seem to be suffering from the "nobody"
problem, because you see mfitzpat on nona-man, so UID/GIDs are
being translated correctly.

This issue has come up often enough that I will start tracking
this in our troubleshooting wiki as soon as we get more feedback.

Thanks,

Cindy
On 04/29/10 09:23, Mary Ellen Fitzpatrick wrote:
I set up the share and mounted it on the linux client; permissions did not
carry over from the zfs share.



hecate:~ zfs create zp-ext/test/mfitzpat
hecate:/zp-ext/test zfs get sharenfs zp-ext/test/mfitzpat
NAME  PROPERTY  VALUE SOURCE
zp-ext/test/mfitzpat  sharenfs  oninherited from zp-ext
hecate:/zp-ext/test chown -R mfitzpat:umass mfitzpat

updated auto.home on linux client(nona-man)
test-rw,hard,intr   hecate:/zp-ext/test

nona-man:/# cd /fs/test
nona-man:/fs/test# ls -l
total 3
drwxr-xr-x+ 2 root root 2 Apr 29 11:15 mfitzpat

Permissions did not carry over from the zfs share.
Willing to test/try the next step.

Mary  Ellen




Cindy Swearingen wrote:

Hi Mary Ellen,

We were looking at this problem and are unsure what the problem is...

To rule out NFS as the root cause, could you create and share a test 
ZFS file system without any ACLs to see if you can access the data 
from the

Linux client?

Let us know the result of your test.

Thanks,

Cindy
On 04/28/10 12:54, Mary Ellen Fitzpatrick wrote:
 
New to Solaris/ZFS and having a difficult time getting ZFS, NFS and 
ACLs all working together properly.   I am trying to access/use zfs shared 
filesystems on a linux client. When I access the dirs/files on the 
linux client, my permissions do not carry over, nor do the newly 
created files, and I can not create new files/dirs.   The 
permissions/owner on the zfs share are set so the owner (mfitzpat) is 
allowed to do everything, but the permissions are not carrying over via 
NFS to the linux client.    I have googled/read and can not get it 
right.   I think this has something to do with NFSv4, but I can not 
figure it out.


Any help appreciated
Mary Ellen

Running Solaris10 5/09 (u7) on a SunFire x4540 (hecate) with ZFS and 
zfs shares automounted to Centos5 client (nona-man).
Running NIS on nona-man(Centos5) and hecate (zfs) is a client.  All 
works well.


I have created the following zfs filesystems to share and have 
sharenfs=on

hecate:/zp-ext/spartans/umass zfs get sharenfs
zp-ext/spartans/umass   sharenfs  oninherited 
from zp-ext/spartans
zp-ext/spartans/umass/mfitzpat  sharenfs  oninherited 
from zp-ext/spartans


set up inheritance:
hecate:/zp-ext/spartans/umass zfs set aclinherit=passthrough 
zp-ext/spartans/umass
hecate:/zp-ext/spartans/umass zfs set aclinherit=passthrough 
zp-ext/spartans/umass/mfitzpat
hecate:/zp-ext/spartans/umass zfs set aclmode=passthrough 
zp-ext/spartans/umass
hecate:/zp-ext/spartans/umass zfs set aclmode=passthrough 
zp-ext/spartans/umass/mfitzpat


Set owner:group:
hecate:/zp-ext/spartans/umass chown mfitzpat:umass mfitzpat
hecate:/zp-ext/spartans/umass ls -l
total 5
drwxr-xr-x   2 mfitzpat umass  2 Apr 28 13:18 mfitzpat

Permissions:
hecate:/zp-ext/spartans/umass ls -dv mfitzpat
drwxr-xr-x   2 mfitzpat umass  2 Apr 28 14:06 mfitzpat
0:owner@::deny

1:owner@:list_directory/read_data/add_file/write_data/add_subdirectory

/append_data/write_xattr/execute/write_attributes/write_acl
/write_owner:allow
2:group@:add_file/write_data/add_subdirectory/append_data:deny
3:group@:list_directory/read_data/execute:allow

4:everyone@:add_file/write_data/add_subdirectory/append_data/write_xattr

/write_attributes/write_acl/write_owner:deny

5:everyone@:list_directory/read_data/read_xattr/execute/read_attributes

/read_acl/synchronize:allow

I can access, create/delete files/dirs on the zfs system and 
permissions hold.

[mfitz...@hecate mfitzpat]$ touch foo
[mfitz...@hecate mfitzpat]$ ls -l
total 1
-rw-r--r--   1 mfitzpat umass  0 Apr 28 14:18 foo

When I try to access the dir/files on the linux client, my 
permissions do not carry over, nor do the newly created files, and I 
can not create new files/dirs.

[mfitz...@nona-man umass]$ ls -l
drwxr-xr-x+ 2 root root 2 Apr 28 13:18 mfitzpat

[mfitz...@nona-man mfitzpat]$ pwd
/fs/umass/mfitzpat
[mfitz...@nona-man mfitzpat]$ ls
[mfitz...@nona-man mfitzpat]$






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Question about du and compression

2010-04-29 Thread Roy Sigurd Karlsbakk
Hi all

Is there a good way to do a du that tells me how much data is there in case I 
want to move it to, say, a USB drive? Most filesystems don't have compression, 
but we're using it on (most of) our zfs filesystems, and it can be troublesome 
for someone who wants to copy a set of data somewhere and then finds it's twice 
as big as reported by du.

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases adequate and relevant synonyms exist in 
Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about du and compression

2010-04-29 Thread Tomas Ögren
On 29 April, 2010 - Roy Sigurd Karlsbakk sent me these 1,2K bytes:

 Hi all
 
 Is there a good way to do a du that tells me how much data is there in
 case I want to move it to, say, a USB drive? Most filesystems don't
 have compression, but we're using it on (most of) our zfs filesystems,
 and it can be troublesome for someone who wants to copy a set of data
 somewhere and then finds it's twice as big as reported by du.

GNU du has --apparent-size which reports the file size instead of how
much disk space it uses.. compression and sparse files will make this
differ, and you can't really tell them apart.
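
For example, on a compressed dataset the two numbers can be compared like
this (assuming GNU du is the one being run, e.g. /usr/gnu/bin/du or gdu on
Solaris):

  $ du -sh /tank/data                     (space allocated, after compression)
  $ du -sh --apparent-size /tank/data     (logical size, what a plain copy needs)

Multiplying a filesystem's zfs 'referenced' value by its 'compressratio'
also gives a rough estimate of the uncompressed size.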

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Katzke, Karl
  The server is a Fujitsu RX300 with a Quad Xeon 1.6GHz, 6G ram, 8x400G
  SATA through a U320SCSI-SATA box - Infortrend A08U-G1410, Sol10u8.

 slow disks == poor performance

  Should have enough oompf, but when you combine snapshot with a
  scrub/resilver, sync performance gets abysmal.. Should probably try
  adding a ZIL when u9 comes, so we can remove it again if performance
  goes crap.

 A separate log will not help.  Try faster disks.

We're seeing the same thing in Sol10u8 with both 300GB 15k rpm SAS disks 
in-board on a Sun x4250 and an external chassis with 1TB 7200 rpm SATA disks 
connected via SAS. Faster disks aren't the problem; there's a fundamental issue 
with ZFS [iscsi;nfs;cifs] share performance under scrub & resilver. 

-K 

--- 
Karl Katzke
Systems Analyst II
TAMU DRGS




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs inherit vs. received properties

2010-04-29 Thread Brandon High
I'm seeing some weird behavior on b133 with 'zfs inherit' that seems
to conflict with what the docs say. According to the man page it
"clears the specified property, causing it to be inherited from an
ancestor", but that's not the behavior I'm seeing.

For example:

basestar:~$ zfs get compress tank/export/vmware
NAME                PROPERTY     VALUE  SOURCE
tank/export/vmware  compression  gzip   local
basestar:~$ zfs get compress tank/export/vmware/delusional
NAME                           PROPERTY     VALUE  SOURCE
tank/export/vmware/delusional  compression  on     received
bh...@basestar:~$ pfexec zfs inherit compress tank/export/vmware/delusional
basestar:~$ zfs get compress tank/export/vmware/delusional
NAME                           PROPERTY     VALUE  SOURCE
tank/export/vmware/delusional  compression  on     received

Is this a bug in inherit, or is the documentation off?

-B

--
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migrate ZFS volume to new pool

2010-04-29 Thread Bob Friesenhahn

On Wed, 28 Apr 2010, Jim Horng wrote:


Why would you recommend a spare for raidz2 or raidz3?


A spare is there to minimize reconstruction time.  Remember that a 
vdev cannot start resilvering until a spare disk is 
available, and with disks as big as they are today, resilvering 
can take many hours.  I would rather have the disk finish resilvering 
before I even get the chance to replace the bad disk than risk more 
disks failing before the vdev has had a chance to resilver.


Would your opinion change if the disks you used took 7 days to 
resilver?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Bob Friesenhahn

On Thu, 29 Apr 2010, Roy Sigurd Karlsbakk wrote:


While there may be some possible optimizations, i'm sure everyone
would love the random performance of mirror vdevs, combined with the
redundancy of raidz3 and the space of a raidz1.  However, as in all
systems, there are tradeoffs.


In my opinion periodic scrubs are most useful for pools based on 
mirrors, or raidz1, and much less useful for pools based on raidz2 or 
raidz3.  It is useful to run a scrub at least once on a well-populated 
new pool in order to validate the hardware and OS, but otherwise, the 
scrub is most useful for discovering bit-rot in singly-redundant 
pools.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-04-29 Thread Ian Collins

On 04/30/10 10:35 AM, Bob Friesenhahn wrote:

On Thu, 29 Apr 2010, Roy Sigurd Karlsbakk wrote:


While there may be some possible optimizations, i'm sure everyone
would love the random performance of mirror vdevs, combined with the
redundancy of raidz3 and the space of a raidz1.  However, as in all
systems, there are tradeoffs.


In my opinion periodic scrubs are most useful for pools based on 
mirrors, or raidz1, and much less useful for pools based on raidz2 or 
raidz3.  It is useful to run a scrub at least once on a well-populated 
new pool in order to validate the hardware and OS, but otherwise, the 
scrub is most useful for discovering bit-rot in singly-redundant pools.



I agree.

I look after an x4500 with a pool of raidz2 vdevs that I can't run 
scrubs on due to the dire impact on performance. That's one reason I'd 
never use raidz1 in a real system.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migrate ZFS volume to new pool

2010-04-29 Thread Jim Horng
 Would your opinion change if the disks you used took
 7 days to resilver?
 
 Bob

That only makes a stronger case that a hot spare is absolutely needed.
It also makes a strong case for choosing raidz3 over raidz2, as well as for 
building each vdev from a smaller number of disks.
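
As a sketch of that kind of layout (device names purely illustrative):
two narrower raidz3 vdevs plus a shared hot spare, rather than one wide
raidz2:

  # zpool create tank \
      raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
      raidz3 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
      spare  c3t0d0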
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience

2010-04-29 Thread Edward Ned Harvey
I finally got it, I think.  Somebody (with deep and intimate knowledge of
ZFS development) please tell me if I've been hitting the crack pipe too
hard.  But ...

 

Part 1 of this email:

Netapp snapshot security flaw.  Inherent in their implementation of
.snapshot directories. 

 

Part 2 of this email:

How ZFS could do this, much better.

 

(#1)

Netapp snapshot security flaw.  Inherent in their implementation of
.snapshot directories. 

 

(as root)

# mkdir -p a/b/c

# echo secret info  a/b/c/info.txt

# chmod 777 a

# chmod 700 a/b

# chmod 777 a/b/c

# chmod 666 a/b/c/info.txt

# rsh netappfiler snap create vol0 test

creating snapshot...

# echo public info  a/b/c/info.txt

# mv a/b/c a/c

 

(as a normal user)

$ cat a/c/info.txt

public info

$ cat a/c/.snapshot/test/info.txt

secret info

 

D'oh!!!  By changing permissions in the present filesystem, the normal user
has been granted access to restricted information in the past.

 

 

(#2)

How ZFS could do this, much better.

 

First let it be said, ZFS doesn't have this security flaw.  (Kudos.)

 

But also let it be said, the user experience of having the .snapshot always
conveniently locally available, is a very positive thing.  Even if you
rename and move some directory all over the place like crazy, with zillions
of snapshots being taken in all those locations, when you look in that
directory's .snapshot, you still have access to *all* the previous snapshots
of that directory, regardless of what that directory was formerly named, or
where in the directory tree it was linked.

 

In short, the user experience of .snapshot is more user friendly.  But the
.zfs style snapshot requires less development complexity and therefore
immune to this sort of flaw.

 

So here's the idea, in which ZFS could provide the best of both worlds:

 

Each inode contain a link count.  In most cases, each inode has a link count
of 1, but of course that can't be assumed.  It seems trivially simple to me,
that along with the link count in each inode, the filesystem could also
store a list of which inodes link to it.  If link count is 2, then there's a
list of 2 inodes, which are the parents of this inode.

 

In which case, it would be trivially easy to walk back up the whole tree,
almost instantly identifying every combination of paths that could possibly
lead to this inode, while simultaneously correctly handling security
concerns about bypassing security of parent directories and everything.
Once the absolute path is generated, if the user doesn't have access to that
path, then the user simply doesn't get that particular result returned to
them.

 

It seems too perfect and too simple.  Instead of a one-directional directed
graph, simply make a bidirectional.  There's no significant additional
overhead as far as I can tell.  It seems like it would even be easy.

 

By doing this, it will be very easy for zhist (or anything else) to
instantly produce all the names of all the snapshot versions of any file or
directory, even if that filename has been changing over time ... even if that
file is hardlinked in more than one directory path ... 

 

Then ZFS has a technique, different from .snapshot directories, which
perform more simply, more reliably, more securely than the netapp
implementation.  This technique works equally well for files or directories
(unlike the netapp method.)  And there is no danger of legal infringement
upon any netapp invention.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, NFS, and ACL issues

2010-04-29 Thread Paul B. Henson
On Thu, 29 Apr 2010, Mary Ellen Fitzpatrick wrote:

 hecate:/zp-ext/test zfs get sharenfs zp-ext/test/mfitzpat
[...]
 hecate:/zp-ext/test chown -R mfitzpat:umass mfitzpat
[...]
 test-rw,hard,intr   hecate:/zp-ext/test
[...]
 drwxr-xr-x+ 2 root root 2 Apr 29 11:15 mfitzpat

Unless I'm missing something, you chown'd the filesystem
zp-ext/test/mfitzpat but you mounted the filesystem zp-ext/test; hence
you're seeing the mount point for the mfitzpat filesystem in the
zp-ext/test filesystem over NFS, not the actual zp-ext/test/mfitzpat
filesystem.

Pending the availability of mirror mounts


(http://hub.opensolaris.org/bin/download/Project+nfs-namespace/files/mm-PRS-open.html)

you need to mount each ZFS filesystem you're exporting via NFS separately.
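
In autofs terms that means giving the child filesystem its own mount, e.g.
a multi-mount map entry along these lines (untested sketch, using the
paths from earlier in the thread):

  test  -rw,hard,intr  /          hecate:/zp-ext/test \
                       /mfitzpat  hecate:/zp-ext/test/mfitzpat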


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot versus Netapp - Security and convenience

2010-04-29 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
 
 Each inode contain a link count.  It seems trivially
 simple to me, that along with the link count in each inode, the
 filesystem could also store a list of which inodes link to it.

Others may have better ideas for implementation.  But at least for a
starting point, here's how I imagine this:

The goal is to be always able to instantly locate all the previous snapshot
versions of any file or directory, regardless of whether or not that
filename, directory name, or any path leading up to that file or directory
may have ever changed.  An additional goal is to obey security.  Don’t give
the user any information that they couldn't have found by other (slower)
means.  In this described scenario, both goals have been achieved.

Currently, there's a .zfs directory, which is not a real directory.  By
default, it's hidden until you explicitly try to access it by name.

Inside the .zfs directory, there's presently a snapshot directory, and
nothing else.  Let's suppose my system has several snapshots.  snap1,
snap2, snap3, ...  Then these appear as
/tank/.zfs/snapshot/{snap1,snap2,snap3,...}  And inside there, are all the
subdirectories which lead to all the files.

Let there be also, an inodes directory next to the snapshot directory.  
/tank/.zfs/snapshot
/tank/.zfs/inodes

Whenever a snap is created, let it be listed under both snapshot and
inodes
/tank/.zfs/snapshot/{snap1,snap2,snap3,...}
/tank/.zfs/inodes/{snap1,snap2,snap3,...}

If you simply ls /tank/.zfs/inodes/snap1 then you see nothing.  The system
will not generate a list of every single inode in the whole filesystem; that
would be crazy.

But, just as the .zfs directory was hidden and appears upon attempted
access, let there be text files, whose names are inode numbers, and these
text files only appear upon attempted access.
ls /tank/.zfs/inodes/snap1
(no result)
cat /tank/.zfs/inodes/snap1/12345
(gives the following results)
/tank/.zfs/snapshot/snap1/foo/bar/baz  (which is the abs path to the
file having inode 12345)

And so, a mechanism has been created, so a user can do this:
ls -i /tank/exports/home/jbond/somefile.txt
12345
cat /tank/.zfs/inodes/snap1/12345
(result is:  exports/home/jbond/Some-File.TXT)
Thus, we have identified the former name of somefile.txt and ...
cat /tank/.zfs/snapshot/snap1/exports/home/jbond/Some-File.TXT

Note: the above ls -i ; cat process is slightly tedious.
I don't expect many users to do this directly.
But I would happily automate and simplify this process
by coding zhist to utilize this technique automatically.

User could:
zhist ls somefile.txt
Result would be:
/tank/.zfs/snapshot/snap1/exports/home/jbond/Some-File.TXT

And of course, once the command-line version of zhist
can do that, there's no obstacle preventing the GUI frontend.

One important note:
Since you're doing a reverse mapping, from inode number to path name, it's
important to obey filesystem security.  Fortunately, the process of
generating absolute path names from an inode number is handled by kernel,
and only after the complete absolute pathname has been generated, is
anything returned to the user.  Which means the kernel has the opportunity
to test whether or not the user would have access to ls the specified
inode by pathname, before returning that pathname to the user.  In other
words, if the user couldn't get that pathname via find
/tank/.zfs/snapshot/snap1 -inum 12345 then the user could not get that
pathname via .zfs/inodes either.

The only difference is that the find command could run for a very long
time, yet the .zfs/inodes directory returns that same result nearly
instantly.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore

2010-04-29 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Cindy Swearingen
 
 For full root pool recovery see the ZFS Administration Guide, here:
 
 http://docs.sun.com/app/docs/doc/819-5461/ghzvz?l=ena=view
 
 Recovering the ZFS Root Pool or Root Pool Snapshots

Unless I misunderstand, I think the intent of the OP question is how to do
bare metal recovery after some catastrophic failure.  In this situation,
recovery is much more complex than what the ZFS Admin Guide says above.  You
would need to boot from CD, and partition and format the disk, then create a
pool, and create a filesystem, and zfs send | zfs receive into that
filesystem, and finally install the boot blocks.  Only some of these steps
are described in the ZFS Admin Guide, because simply expanding the rpool is
a fundamentally easier thing to do.

Even though I think I could do that ... I don't have a lot of confidence in
it, and I can certainly imagine some pesky little detail being a problem.

This is why I suggested the technique of:
Reinstall the OS just like you did when you first built your machine, before
the catastrophe.  It doesn't even matter if you make the same selections you
made before (IP address, package selection, authentication method, etc) as
long as you're choosing to partition and install the bootloader like you did
before.

This way, you're sure the partitions, format, pool, filesystem, and
bootloader are all configured properly.
Then boot from CD again, and zfs send | zfs receive to overwrite your
existing rpool.

And as far as I know, that will take care of everything.  But I only feel
like 90% confident that would work.
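
For the record, a minimal sketch of that recovery path, assuming the backup
was received into an external pool named 'backup' as discussed earlier in
the thread (snapshot, BE and device names are only examples):

  # zpool import -f backup                 (the external backup pool)
  # zpool import -f -R /a rpool            (the freshly reinstalled root pool)
  # zfs send -R backup@backup1 | zfs receive -Fdu rpool
  # zpool set bootfs=rpool/ROOT/opensolaris rpool
  # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0

If the receive complains about child datasets that already exist from the
fresh install, they may need to be destroyed first; and since the fresh
install already put GRUB on the disk, the installgrub step is mostly
belt-and-braces.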

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Panic when deleting a large dedup snapshot

2010-04-29 Thread Brandon High
I tried destroying a large (710GB) snapshot from a dataset that had
been written with dedup on. The host locked up almost immediately; there
wasn't a stack trace on the console and the host required a power cycle,
but it seemed to reboot normally. Once up, the snapshot was still there.
I was able to get a dump from this. The data was written
with b129, and the system is currently at b134.

I tried destroying it again, and the host started behaving badly.
'less' would hang, and there were several zfs-auto-snapshot processes
that were over an hour old, and the 'zfs snapshot' processes were
stuck on the first dataset of the pool. Eventually the host became
unusable and I rebooted again.

The host seems to be fine now, and is currently running a scrub.

Any ideas on how to avoid this in the future? I'm no longer using
dedup due to performance issues with it, which implies that the DDT is
very large.

bh...@basestar:~$ pfexec zdb -DD tank
DDT-sha256-zap-duplicate: 5339247 entries, size 348 on disk, 162 in core
DDT-sha256-zap-unique: 1479972 entries, size 1859 on disk, 1070 in core
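
Back of the envelope, assuming the "in core" figures above are bytes per
DDT entry, that table alone comes to roughly:

  $ echo '5339247*162 + 1479972*1070' | bc
  2448528054

i.e. about 2.3 GiB of DDT that has to be walked (and ideally kept in
ARC/L2ARC) while the snapshot's blocks are freed.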

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot versus Netapp - Security and convenience

2010-04-29 Thread Peter Jeremy
On 2010-Apr-30 10:24:14 +0800, Edward Ned Harvey solar...@nedharvey.com wrote:
Each inode contain a link count.  In most cases, each inode has a
link count of 1, but of course that can't be assumed.  It seems
trivially simple to me, that along with the link count in each inode,
the filesystem could also store a list of which inodes link to it.
If link count is 2, then there's a list of 2 inodes, which are the
parents of this inode.

I'm not sure exactly what you are trying to say here but I don't
think it will work.

In a Unix FS (UFS or ZFS), a directory entry contains a filename and a
pointer to an inode.  The inode itself contains a count of the number
of directory entries that point to it and pointers to the actual data.
There is currently no provision for a reverse link back to the
directory.

I gather you are suggesting that the inode be extended to contain a
list of the inode numbers of all directories that contain a filename
referring to that inode.  Whilst I agree that this would simplify
inode to filename mapping and provide an alternate mechanism for
checking file permissions, I think you are glossing over the issue of
how/where to store these links.

Whilst files can have a link count of 1 (I'm not sure if this is true
in most cases), they can have up to 32767 links.  Where is this list
of (up to) 32767 parent inodes going to be stored?

In which case, it would be trivially easy to walk back up the whole
tree, almost instantly identifying every combination of paths that
could possibly lead to this inode, while simultaneously correctly
handling security concerns about bypassing security of parent
directories and everything.

Whilst it's trivially easy to get from the file to the list of
directories containing that file, actually getting from one directory
to its parent is less so: A directory containing N sub-directories has
N+2 links.  Whilst the '.' link is easy to identify (it points to its
own inode), distinguishing between the name of this directory in its
parent and the '..' entries in its subdirectories is rather messy
(requiring directory scans) unless you mandate that the reference to
the parent directory is in a fixed location (ie 1st or 2nd entry in
the parent inode list).

It seems too perfect and too simple.  Instead of a one-directional
directed graph, simply make a bidirectional.  There's no significant
additional overhead as far as I can tell.  It seems like it would
even be easy.

Well, you need to find somewhere to store up to 32K inode numbers,
whilst having minimal space overhead for small numbers of links.  Then
you will need to patch the vnode operations underlying creat(),
link(), unlink(), rename(), mkdir() and rmdir() to manage the
backlinks (taking into account transactional consistency).

-- 
Peter Jeremy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss