Re: [zfs-discuss] Compellant announces zNAS
What operating system does it run?
Re: [zfs-discuss] MPT issues strike back
Hi Mark,

I also had some SSD drives in this machine, but I have taken them out and the problem still occurs. Regarding the bug, it seems to be related to usage of xVM, and since I don't use xVM, it probably makes no difference to this particular server. Anyway, thanks for the tip; I will try to understand what's wrong with this machine.

Bruno

On 27-4-2010 16:41, Mark Ogden wrote:

Bruno Sousa on Tue, Apr 27, 2010 at 09:16:08AM +0200 wrote:

Hi all,

Yet another story regarding mpt issues. To make a long story short: every time a Dell R710 running snv_134 logs

  scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,3...@4/pci1028,1...@0 (mpt0):

the system freezes, and only a hard reset fixes it. Is there any sort of parameter that could be used to minimize/avoid this issue?

We had the same problem on an X4600; it turned out to be a bad SSD and/or a bad connection at the location listed in the error message. Since removing that drive, we have not encountered the issue. You might want to look at http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=7acda35c626180d9cda7bd1df451?bug_id=6894775 too.

-Mark

Machine specs: Dell R710, 16 GB memory, 2 Intel quad-core E5506, SunOS san01 5.11 snv_134 i86pc i386 i86pc, Dell Integrated SAS 6/i controller (mpt0, firmware v0.25.47.0 (IR)) with 2 disks attached, without RAID.

Thanks in advance,
Bruno
Re: [zfs-discuss] Migrate ZFS volume to new pool
Why would you recommend a spare for raidz2 or raidz3? -- richard

A spare is there to minimize reconstruction time, because a vdev cannot start resilvering until a replacement disk is available, and with disks as big as they are today, resilvering takes many hours. I would rather have the disk finish resilvering before I even get a chance to replace the bad disk than risk more disks failing before it has a chance to resilver. This is especially important if the file system is not at a location with 24-hour staff.
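For reference, adding a hot spare is a one-line operation; a minimal sketch, assuming a pool named tank and an unused disk c1t5d0:

  # add c1t5d0 as a hot spare, then verify it under the "spares" section
  # zpool add tank spare c1t5d0
  # zpool status tank

On Solaris, the zfs-retire FMA agent then activates the spare automatically when a device faults, so resilvering onto it starts without operator intervention.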
Re: [zfs-discuss] Compellant announces zNAS
2010/4/29 Thommy M. Malmström thommy.m.malmst...@gmail.com: What operating system does it run?

Nexenta I believe.

-- Regards, Cyril
Re: [zfs-discuss] Compellant announces zNAS
That screen shot looks very much like Nexenta 3.0 with a different branding. Elsewhere, The Register confirms it's OpenSolaris.

On 29 Apr 2010, at 07:35, Thommy M. Malmström thommy.m.malmst...@gmail.com wrote: What operating system does it run?
Re: [zfs-discuss] Performance drop during scrub?
Indeed, the scrub seems to take too many resources from a live system. For instance, I have a server with 24 disks (1 TB SATA) serving as an NFS store for a Linux machine holding user mailboxes. I have around 200 users, with maybe 30-40% of them active at the same time. As soon as the scrub process kicks in, the Linux box starts to give messages like "nfs server not available" and the users start to complain that Outlook gives connection timeouts. As soon as the scrub process stops, everything returns to normal. So for me it's a real issue that the scrub takes so many resources that the system becomes pretty much unusable.

In my case I did a workaround: I zfs send/receive from this server to another server, and the scrub process now runs on the second server. I don't know if this is such a good idea, given that I don't know for sure whether the scrub on the secondary machine will be useful in case of data corruption... but so far so good, and it's probably better than nothing.

I still remember, before ZFS, that any good RAID controller had a background consistency-check task, and such a task could be assigned a priority like low, medium, or high. Coming back to ZFS: what's the possibility of getting this feature as well? Just out of curiosity, do the Sun OpenStorage appliances, or Nexenta-based ones, have a scrub task enabled by default? I would like to get some feedback from users that run ZFS appliances regarding the impact of running a scrub on their appliances.

Bruno

On 28-4-2010 22:39, David Dyer-Bennet wrote:

On Wed, April 28, 2010 10:16, Eric D. Mudama wrote:

On Wed, Apr 28 at 1:34, Tonmaus wrote:

ZFS scrub needs to access all written data on all disks and is usually disk-seek or disk-I/O bound, so it is difficult to keep it from hogging the disk resources. A pool based on mirror vdevs will behave much more nicely while being scrubbed than one based on RAIDz2.

Experience seconded entirely. I'd like to repeat that I think we need more efficient load-balancing functions in order to keep the housekeeping payload manageable. Detrimental side effects of scrub should not be a decision point for choosing certain hardware or redundancy concepts, in my opinion.

While there may be some possible optimizations, I'm sure everyone would love the random performance of mirror vdevs, combined with the redundancy of raidz3 and the space of a raidz1. However, as in all systems, there are tradeoffs.

The situations being mentioned are much worse than what seem like reasonable tradeoffs to me. Maybe that's because my intuition is misleading me about what's available. But if the normal workload of a system uses 25% of its sustained IOPS, and a scrub is run at low priority, I'd like to think that during a scrub I'd see a little degradation in performance, and that the scrub would take 25% or so longer than it would on an idle system. There's presumably some inefficiency, so the two loads don't just add perfectly; maybe another 5% lost to that? That's the big uncertainty. I have a hard time believing in 20% lost to that. Do you think that's a reasonable outcome to hope for? Do you think ZFS is close to meeting it? People with systems that live at 75% all day are obviously going to have more problems than people who live at 25%!
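A minimal sketch of the replication workaround Bruno describes, with hypothetical pool, snapshot, and host names (tank, standby):

  # take a recursive snapshot and send the increment to the standby box
  $ zfs snapshot -r tank@repl-20100429
  $ zfs send -R -i tank@repl-20100428 tank@repl-20100429 | ssh standby pfexec zfs receive -Fdu tank
  # scrub the replica instead of the live pool
  $ ssh standby pfexec zpool scrub tank

The caveat he raises is real: a scrub of the replica verifies the replica's media, not the primary's, so silent corruption on the primary would still go unnoticed until those blocks are next read there.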
Re: [zfs-discuss] Performance drop during scrub?
I got this hint from Richard Elling, but haven't had time to test it much. Perhaps someone else could help?

roy

Interesting. If you'd like to experiment, you can change the limit on the number of scrub I/Os queued to each vdev. The default is 10, but that is too close to the normal limit. You can see the current scrub limit via:

  # echo zfs_scrub_limit/D | mdb -k
  zfs_scrub_limit:
  zfs_scrub_limit:        10

and you can change it with:

  # echo zfs_scrub_limit/W0t2 | mdb -kw
  zfs_scrub_limit:        0xa     =       0x2
  # echo zfs_scrub_limit/D | mdb -k
  zfs_scrub_limit:
  zfs_scrub_limit:        2

In theory this should help your scenario, but I do not believe it has been exhaustively tested in the lab. Hopefully it will help.

-- richard

- Bruno Sousa bso...@epinfante.com skrev: [Bruno's message and the scrub-tradeoffs discussion, quoted in full earlier in this thread]
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Euan Thoms

I'm looking for a way to back up my entire system, the rpool zfs pool, to an external HDD so that it can be recovered in full if the internal HDD fails. Previously, with Solaris 10 using UFS, I would use ufsdump and ufsrestore, which worked so well that I was very confident with it. ZFS doesn't have an exact replacement, so I need to find a best practice to replace it.

I'm guessing that I can format the external HDD as a pool called 'backup' and zfs send -R ... | zfs receive ... to it. What I'm not sure about is how to restore. Back in the days of UFS, I would boot off the Solaris 10 CD into a single-user-mode command prompt, partition the HDD with the correct slices, format it, mount it, and ufsrestore the entire filesystem. With zfs, I don't know what I'm doing. Can I just make a pool called rpool and zfs send/receive it back?

An excellent question, one which many people would never bother to explore, but important nonetheless. I have not tested this, so I'll encourage testing it and coming back to say how it went: I would install Solaris or OpenSolaris just as you did the first time. That way, the bootloader, partition tables, etc. are all configured for you automatically. (Just restoring the filesystem is not enough.) Then I'd boot from the CD and zfs send | zfs receive from the external backup disk to the actual rpool, thus replacing the entire filesystem. You should test this, because I am only about 90% certain it will work.
Re: [zfs-discuss] Performance drop during scrub?
On 28/04/2010 21:39, David Dyer-Bennet wrote:

The situations being mentioned are much worse than what seem like reasonable tradeoffs to me. Maybe that's because my intuition is misleading me about what's available. But if the normal workload of a system uses 25% of its sustained IOPS, and a scrub is run at low priority, I'd like to think that during a scrub I'd see a little degradation in performance, and that the scrub would take 25% or so longer than it would on an idle system. There's presumably some inefficiency, so the two loads don't just add perfectly; maybe another 5% lost to that? That's the big uncertainty. I have a hard time believing in 20% lost to that.

Well, it's not that easy, as there are many other factors you need to take into account. For example, how many I/Os are you allowing to be queued per device? That can affect latency for your application. Or if you have a disk array with its own cache, just by doing a scrub you might be pushing other entries out of that cache, which might impact the performance of your application. Then there might be a SAN, and so on. I'm not saying there is no room for improvement here; all I'm saying is that it is not as easy a problem as it seems.

-- Robert Milkowski http://milek.blogspot.com
Re: [zfs-discuss] Compellant announces zNAS
On 29/04/2010 07:57, Phil Harman wrote: That screen shot looks very much like Nexenta 3.0 with a different branding. Elsewhere, The Register confirms it's OpenSolaris.

Well, it looks like it is running Nexenta, which is based on OpenSolaris. But it is not the OpenSolaris *distribution*.

-- Robert Milkowski http://milek.blogspot.com
Re: [zfs-discuss] Performance drop during scrub?
On 29 April, 2010 - Roy Sigurd Karlsbakk sent me these 10K bytes:

I got this hint from Richard Elling, but haven't had time to test it much. Perhaps someone else could help? [Richard's zfs_scrub_limit mdb instructions, quoted in full earlier in this thread]

If I'm reading the code right, zfs_scrub_limit is only used when creating a new vdev (import, zpool create, maybe at boot), so I took an alternate route: http://pastebin.com/hcYtQcJH (spa_scrub_maxinflight used to be 0x46 (70 decimal) due to 7 devices * zfs_scrub_limit(10) = 70.)

With these lower numbers, our pool is much more responsive over NFS.

  scrub: scrub in progress for 0h40m, 0.10% done, 697h29m to go

It might take a while, though. We've taken periodic snapshots and have snapshots from 2008, which has probably fragmented the pool beyond sanity or something.

- Bruno Sousa bso...@epinfante.com skrev: [earlier messages quoted in full; trimmed]
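The pastebin contents aren't reproduced here, but since spa_scrub_maxinflight is a per-pool field of the in-kernel spa structure rather than a global tunable, the adjustment presumably looked something like the following mdb session. This is an untested sketch with hypothetical addresses; the field name matches the builds discussed here:

  # find the spa address for the pool
  # echo '::spa' | mdb -k
  ADDR                 STATE NAME
  ffffff01ce8b9580     ACTIVE tank
  # locate the field and its current value (7 vdevs * zfs_scrub_limit 10 = 70)
  # echo 'ffffff01ce8b9580::print -a spa_t spa_scrub_maxinflight' | mdb -k
  ffffff01ce8b9740 spa_scrub_maxinflight = 0x46
  # write a lower cap, e.g. 2 per vdev * 7 vdevs = 14 (64-bit write)
  # echo 'ffffff01ce8b9740/Z0t14' | mdb -kw

As with any live mdb -kw poke, a typo can panic the box, so try it on a non-production system first.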
Re: [zfs-discuss] Performance drop during scrub?
On 29 April, 2010 - Tomas Ögren sent me these 5,8K bytes:

[my earlier message about zfs_scrub_limit and spa_scrub_maxinflight, quoted in full above]

With these lower numbers, our pool is much more responsive over NFS. But taking snapshots is quite bad: a single recursive snapshot over ~800 filesystems took about 45 minutes, with NFS operations taking 5-10 seconds. Snapshots usually take 10-30 seconds.

  scrub: scrub in progress for 0h40m, 0.10% done, 697h29m to go
  scrub: scrub in progress for 1h41m, 2.10% done, 78h35m to go

This is chugging along. The server is a Fujitsu RX300 with a Quad Xeon 1.6GHz, 6G RAM, 8x400G SATA through a U320 SCSI-SATA box (Infortrend A08U-G1410), on Sol10u8. It should have enough oomph, but when you combine a snapshot with a scrub/resilver, sync performance gets abysmal. We should probably try adding a ZIL when u9 comes, so we can remove it again if performance goes to crap.

/Tomas
-- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
Hi Euan,

For full root pool recovery see the ZFS Administration Guide, here:

http://docs.sun.com/app/docs/doc/819-5461/ghzvz?l=en&a=view

"Recovering the ZFS Root Pool or Root Pool Snapshots"

Additional scenarios and details are provided in the ZFS troubleshooting wiki. The link is below, but the site is not responding at the moment; check back later today:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

Thanks, Cindy

On 04/28/10 23:02, Euan Thoms wrote: [original question quoted in full earlier in this thread]
Re: [zfs-discuss] Compellant announces zNAS
I believe the name is Compellent Technologies: http://www.google.com/finance?q=NYSE:CML

Regards, Andrey

On Wed, Apr 28, 2010 at 5:54 AM, Richard Elling richard.ell...@richardelling.com wrote:

Today, Compellant announced their zNAS addition to their unified storage line. zNAS uses ZFS behind the scenes. http://www.compellent.com/Community/Blog/Posts/2010/4/Compellent-zNAS.aspx

Congrats Compellant! -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
Re: [zfs-discuss] Performance drop during scrub?
On Apr 29, 2010, at 5:52 AM, Tomas Ögren wrote:

[the zfs_scrub_limit / spa_scrub_maxinflight discussion, quoted in full above]

This is chugging along. The server is a Fujitsu RX300 with a Quad Xeon 1.6GHz, 6G RAM, 8x400G SATA through a U320 SCSI-SATA box (Infortrend A08U-G1410), Sol10u8.

slow disks == poor performance

Should have enough oomph, but when you combine snapshot with a scrub/resilver, sync performance gets abysmal. Should probably try adding a ZIL when u9 comes, so we can remove it again if performance goes to crap.

A separate log will not help. Try faster disks.

-- richard
ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
Re: [zfs-discuss] Performance drop during scrub?
On 29 April, 2010 - Richard Elling sent me these 2,5K bytes:

slow disks == poor performance

I know they're not fast, but it shouldn't take 10-30 seconds to create a directory. They do perfectly well in all combinations, except when a scrub comes along (or sometimes when a snapshot feels like taking 45 minutes instead of 4.5 seconds). iostat says the disks aren't 100% busy, and the storage box itself doesn't seem to be busy, yet with zfs they go downhill under some conditions.

A separate log will not help. Try faster disks.

/Tomas
-- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] ZFS, NFS, and ACLs Issues
I set up the share and mounted it on the Linux client; permissions did not carry over from the zfs share.

  hecate:~ zfs create zp-ext/test/mfitzpat
  hecate:/zp-ext/test zfs get sharenfs zp-ext/test/mfitzpat
  NAME                  PROPERTY  VALUE  SOURCE
  zp-ext/test/mfitzpat  sharenfs  on     inherited from zp-ext
  hecate:/zp-ext/test chown -R mfitzpat:umass mfitzpat

Updated auto.home on the Linux client (nona-man):

  test  -rw,hard,intr  hecate:/zp-ext/test

  nona-man:/# cd /fs/test
  nona-man:/fs/test# ls -l
  total 3
  drwxr-xr-x+ 2 root root 2 Apr 29 11:15 mfitzpat

Permissions did not carry over from the zfs share. Willing to test/try the next step.

Mary Ellen

Cindy Swearingen wrote:

Hi Mary Ellen,

We were looking at this problem and are unsure what the problem is... To rule out NFS as the root cause, could you create and share a test ZFS file system without any ACLs, to see if you can access the data from the Linux client? Let us know the result of your test.

Thanks, Cindy

On 04/28/10 12:54, Mary Ellen Fitzpatrick wrote:

New to Solaris/ZFS and having a difficult time getting ZFS, NFS, and ACLs all working together properly. I'm trying to access/use zfs-shared filesystems on a Linux client. When I access the dirs/files on the Linux client, my permissions do not carry over, nor do newly created files, and I cannot create new files/dirs. The permissions/owner on the zfs share are set so that the owner (mfitzpat) is allowed to do everything, but the permissions are not carrying over via NFS to the Linux client. I have googled/read and cannot get it right. I think this has something to do with NFSv4, but I cannot figure it out. Any help appreciated.

Mary Ellen

Running Solaris 10 5/09 (u7) on a SunFire x4540 (hecate) with ZFS, with zfs shares automounted on a CentOS 5 client (nona-man). Running NIS on nona-man (CentOS 5), and hecate (zfs) is a client. All works well.

I have created the following zfs filesystems to share, with sharenfs=on:

  hecate:/zp-ext/spartans/umass zfs get sharenfs
  zp-ext/spartans/umass           sharenfs  on  inherited from zp-ext/spartans
  zp-ext/spartans/umass/mfitzpat  sharenfs  on  inherited from zp-ext/spartans

Set up inheritance:

  hecate:/zp-ext/spartans/umass zfs set aclinherit=passthrough zp-ext/spartans/umass
  hecate:/zp-ext/spartans/umass zfs set aclinherit=passthrough zp-ext/spartans/umass/mfitzpat
  hecate:/zp-ext/spartans/umass zfs set aclmode=passthrough zp-ext/spartans/umass
  hecate:/zp-ext/spartans/umass zfs set aclmode=passthrough zp-ext/spartans/umass/mfitzpat

Set owner:group:

  hecate:/zp-ext/spartans/umass chown mfitzpat:umass mfitzpat
  hecate:/zp-ext/spartans/umass ls -l
  total 5
  drwxr-xr-x 2 mfitzpat umass 2 Apr 28 13:18 mfitzpat

Permissions:

  hecate:/zp-ext/spartans/umass ls -dv mfitzpat
  drwxr-xr-x 2 mfitzpat umass 2 Apr 28 14:06 mfitzpat
  0:owner@::deny
  1:owner@:list_directory/read_data/add_file/write_data/add_subdirectory
    /append_data/write_xattr/execute/write_attributes/write_acl
    /write_owner:allow
  2:group@:add_file/write_data/add_subdirectory/append_data:deny
  3:group@:list_directory/read_data/execute:allow
  4:everyone@:add_file/write_data/add_subdirectory/append_data/write_xattr
    /write_attributes/write_acl/write_owner:deny
  5:everyone@:list_directory/read_data/read_xattr/execute/read_attributes
    /read_acl/synchronize:allow

I can access and create/delete files/dirs on the zfs system, and permissions hold:

  [mfitz...@hecate mfitzpat]$ touch foo
  [mfitz...@hecate mfitzpat]$ ls -l
  total 1
  -rw-r--r-- 1 mfitzpat umass 0 Apr 28 14:18 foo

When I try to access the dirs/files on the Linux client, my permissions do not carry over, nor do newly created files, and I cannot create new files/dirs:

  [mfitz...@nona-man umass]$ ls -l
  drwxr-xr-x+ 2 root root 2 Apr 28 13:18 mfitzpat
  [mfitz...@nona-man mfitzpat]$ pwd
  /fs/umass/mfitzpat
  [mfitz...@nona-man mfitzpat]$ ls
  [mfitz...@nona-man mfitzpat]$

-- Thanks, Mary Ellen
Mary Ellen FitzPatrick
Systems Analyst, Bioinformatics
Boston University
24 Cummington St. Boston, MA 02215
office 617-358-2771 cell 617-797-7856
mfitz...@bu.edu
[zfs-discuss] How to clear invisible, partially received snapshots?
I currently use zfs send/recv for onsite backups [1], and am configuring it for replication to an offsite server as well. I did an initial full send, and then a series of incrementals to bring the offsite pool up to date. During one of these transfers the offsite server hung, and I had to power-cycle it. It came back up just fine, except that the snapshot it was receiving when it hung appeared to be both present and nonexistent, depending on which command was run. 'zfs recv' complained that the target snapshot already existed, but it did not show up in the output of 'zfs list', and 'zfs destroy' said it did not exist.

I ran a scrub, which did not find any errors; nor did it solve the problem. I discovered some useful commands with zdb [2], and found more info: zdb -d showed the snapshot, with an unusual name:

  Dataset backup/ims/%zfs-auto-snap_daily-2010-04-22-1900 [ZPL], ID 6325, cr_txg 28137403, 2.62T, 123234 objects

as opposed to a normal snapshot:

  Dataset backup/i...@zfs-auto-snap_daily-2010-04-21-1900 [ZPL], ID 5132, cr_txg 27472350, 2.61T, 123200 objects

I then attempted 'zfs destroy backup/ims/%zfs-auto-snap_daily-2010-04-22-1900', but it still said the dataset did not exist. Finally I exported the pool, and after importing it, the snapshot was gone and I could receive the snapshot normally.

Is there a way to clear a partial snapshot without an export/import cycle?

Thanks, Andrew

[1] http://mail.opensolaris.org/pipermail/zfs-discuss/2009-December/034554.html
[2] http://www.cuddletech.com/blog/pivot/entry.php?id=980
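For anyone hitting the same state, the diagnosis and workaround Andrew describes condense to the following, using his pool name (backup); the %-prefixed dataset name is the marker of an interrupted receive:

  # the partially received dataset shows up in zdb but not in zfs list
  # zdb -d backup | grep %
  Dataset backup/ims/%zfs-auto-snap_daily-2010-04-22-1900 [ZPL], ...
  # an export/import cycle clears the leftover receive state
  # zpool export backup
  # zpool import backup

Whether the partial state can be cleared without the export/import cycle is exactly the open question.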
Re: [zfs-discuss] ZFS, NFS, and ACLs Issues
Hi Mary Ellen,

I'm not really qualified to help you troubleshoot this problem. Other community members on this list have wrestled with similar problems, and I hope they will comment...

Your Linux client doesn't seem to be suffering from the "nobody" problem, because you see mfitzpat on nona-man, so UIDs/GIDs are being translated correctly.

This issue has come up often enough that I will start tracking it in our troubleshooting wiki as soon as we get more feedback.

Thanks, Cindy

On 04/29/10 09:23, Mary Ellen Fitzpatrick wrote: [test results and original message quoted in full earlier in this thread]
[zfs-discuss] Question about du and compression
Hi all

Is there a good way to do a du that tells me how much data is actually there, in case I want to move it to, say, a USB drive? Most filesystems don't have compression, but we're using it on (most of) our zfs filesystems, and it can be troublesome for someone who wants to copy a set of data somewhere to find it's twice as big as du reported.

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive application of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] Question about du and compression
On 29 April, 2010 - Roy Sigurd Karlsbakk sent me these 1,2K bytes:

Is there a good way to do a du that tells me how much data is there in case I want to move it to, say, a USB drive?

GNU du has --apparent-size, which reports the file sizes instead of how much disk space they use. Compression and sparse files will both make this differ, and you can't really tell the two apart.

/Tomas
-- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
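To make the difference concrete, here is a small sketch with a hypothetical compressed dataset tank/data mounted at /tank/data (the sizes shown are illustrative):

  # space actually allocated, after compression
  $ du -sh /tank/data
  12G     /tank/data
  # the dataset's average compression ratio
  $ zfs get -H -o value compressratio tank/data
  1.95x
  # GNU du: logical (apparent) sizes, closer to what a copy will need
  $ du -sh --apparent-size /tank/data
  23G     /tank/data

Multiplying the allocated size by compressratio gives roughly the same estimate, with the caveat Tomas notes: sparse files inflate --apparent-size but are not reflected in compressratio.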
Re: [zfs-discuss] Performance drop during scrub?
The server is a Fujitsu RX300 with a Quad Xeon 1.6GHz, 6G RAM, 8x400G SATA through a U320 SCSI-SATA box (Infortrend A08U-G1410), Sol10u8.

slow disks == poor performance

Should have enough oomph, but when you combine snapshot with a scrub/resilver, sync performance gets abysmal. Should probably try adding a ZIL when u9 comes, so we can remove it again if performance goes to crap.

A separate log will not help. Try faster disks.

We're seeing the same thing in Sol10u8 with both 300 GB 15k-rpm SAS disks in-board on a Sun x4250 and an external chassis of 1 TB 7200-rpm SATA disks connected via SAS. Faster disks aren't the problem; there's a fundamental issue with ZFS [iscsi;nfs;cifs] share performance under scrub/resilver.

-K
---
Karl Katzke
Systems Analyst II
TAMU DRGS
[zfs-discuss] zfs inherit vs. received properties
I'm seeing some weird behavior on b133 with 'zfs inherit' that seems to conflict with what the docs say. According to the man page, it "clears the specified property, causing it to be inherited from an ancestor", but that's not the behavior I'm seeing. For example:

  basestar:~$ zfs get compress tank/export/vmware
  NAME                PROPERTY     VALUE  SOURCE
  tank/export/vmware  compression  gzip   local
  basestar:~$ zfs get compress tank/export/vmware/delusional
  NAME                           PROPERTY     VALUE  SOURCE
  tank/export/vmware/delusional  compression  on     received
  bh...@basestar:~$ pfexec zfs inherit compress tank/export/vmware/delusional
  basestar:~$ zfs get compress tank/export/vmware/delusional
  NAME                           PROPERTY     VALUE  SOURCE
  tank/export/vmware/delusional  compression  on     received

Is this a bug in inherit, or is the documentation off?

-B
-- Brandon High : bh...@freaks.com
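Received properties arrived with the send/receive work around b128, and zfs get can show the received value as its own column; a quick diagnostic, assuming a build with received-property support:

  $ zfs get -o name,property,value,received,source compression tank/export/vmware/delusional

If the RECEIVED column is populated, that received value is what the dataset appears to be falling back to here instead of the ancestor's gzip setting. For contrast with plain 'zfs inherit', there is also 'zfs inherit -S', which explicitly reverts a property to its received value.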
Re: [zfs-discuss] Migrate ZFS volume to new pool
On Wed, 28 Apr 2010, Jim Horng wrote:

Why would you recommend a spare for raidz2 or raidz3? A spare is to minimize the reconstruction time, because a vdev cannot start resilvering until a spare disk is available, and with disks as big as they are today, resilvering takes many hours. I'd rather have the disk finish resilvering before I get the chance to replace the bad disk than risk more disks failing before it has a chance to resilver.

Would your opinion change if the disks you used took 7 days to resilver?

Bob
-- Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Performance drop during scrub?
On Thu, 29 Apr 2010, Roy Sigurd Karlsbakk wrote:

While there may be some possible optimizations, I'm sure everyone would love the random performance of mirror vdevs, combined with the redundancy of raidz3 and the space of a raidz1. However, as in all systems, there are tradeoffs.

In my opinion, periodic scrubs are most useful for pools based on mirrors or raidz1, and much less useful for pools based on raidz2 or raidz3. It is useful to run a scrub at least once on a well-populated new pool in order to validate the hardware and OS, but otherwise the scrub is most useful for discovering bit-rot in singly-redundant pools.

Bob
-- Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Performance drop during scrub?
On 04/30/10 10:35 AM, Bob Friesenhahn wrote:

In my opinion, periodic scrubs are most useful for pools based on mirrors or raidz1, and much less useful for pools based on raidz2 or raidz3. It is useful to run a scrub at least once on a well-populated new pool in order to validate the hardware and OS, but otherwise the scrub is most useful for discovering bit-rot in singly-redundant pools.

I agree. I look after an x4500 with a pool of raidz2 vdevs that I can't run scrubs on, due to the dire impact on performance. That's one reason I'd never use raidz1 in a real system.

-- Ian.
Re: [zfs-discuss] Migrate ZFS volume to new pool
Would your opinion change if the disks you used took 7 days to resilver? Bob

That would only make a stronger case that a hot spare is absolutely needed. It would also make a strong case for choosing raidz3 over raidz2, and for building vdevs from a smaller number of disks.
[zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
I finally got it, I think. Somebody (with deep and intimate knowledge of ZFS development) please tell me if I've been hitting the crack pipe too hard. But...

Part 1 of this email: Netapp snapshot security flaw, inherent in their implementation of .snapshot directories.
Part 2 of this email: How ZFS could do this, much better.

(#1) Netapp snapshot security flaw, inherent in their implementation of .snapshot directories.

(as root)
  # mkdir -p a/b/c
  # echo secret info > a/b/c/info.txt
  # chmod 777 a
  # chmod 700 a/b
  # chmod 777 a/b/c
  # chmod 666 a/b/c/info.txt
  # rsh netappfiler snap create vol0 test
  creating snapshot...
  # echo public info > a/b/c/info.txt
  # mv a/b/c a/c

(as a normal user)
  $ cat a/c/info.txt
  public info
  $ cat a/c/.snapshot/test/info.txt
  secret info

D'oh!!! By changing permissions in the present filesystem, the normal user has been granted access to restricted information in the past.

(#2) How ZFS could do this, much better.

First let it be said, ZFS doesn't have this security flaw. (Kudos.) But let it also be said, the user experience of having the .snapshot always conveniently locally available is a very positive thing. Even if you rename and move some directory all over the place like crazy, with zillions of snapshots being taken in all those locations, when you look in that directory's .snapshot, you still have access to *all* the previous snapshots of that directory, regardless of what that directory was formerly named or where in the directory tree it was linked. In short, the user experience of .snapshot is more user friendly, but the .zfs style snapshot requires less development complexity and is therefore immune to this sort of flaw. So here's the idea, in which ZFS could provide the best of both worlds:

Each inode contains a link count. In most cases, each inode has a link count of 1, but of course that can't be assumed. It seems trivially simple to me that, along with the link count in each inode, the filesystem could also store a list of which inodes link to it. If the link count is 2, then there's a list of 2 inodes, which are the parents of this inode. In which case, it would be trivially easy to walk back up the whole tree, almost instantly identifying every combination of paths that could possibly lead to this inode, while simultaneously correctly handling security concerns about bypassing the security of parent directories and everything. Once the absolute path is generated, if the user doesn't have access to that path, then the user simply doesn't get that particular result returned to them.

It seems too perfect and too simple. Instead of a one-directional directed graph, simply make it bidirectional. There's no significant additional overhead as far as I can tell. It seems like it would even be easy. By doing this, it will be very easy for zhist (or anything else) to instantly produce all the names of all the snapshot versions of any file or directory, even if that filename has been changing over time, and even if that file is hardlinked in more than one directory path.

Then ZFS has a technique, different from .snapshot directories, which performs more simply, more reliably, and more securely than the netapp implementation. This technique works equally well for files or directories (unlike the netapp method). And there is no danger of legal infringement upon any netapp invention.
Re: [zfs-discuss] ZFS, NFS, and ACLs Issues
On Thu, 29 Apr 2010, Mary Ellen Fitzpatrick wrote:

  hecate:/zp-ext/test zfs get sharenfs zp-ext/test/mfitzpat
  [...]
  hecate:/zp-ext/test chown -R mfitzpat:umass mfitzpat
  [...]
  test  -rw,hard,intr  hecate:/zp-ext/test
  [...]
  drwxr-xr-x+ 2 root root 2 Apr 29 11:15 mfitzpat

Unless I'm missing something, you chown'd the filesystem zp-ext/test/mfitzpat, but you mounted the filesystem zp-ext/test; hence over NFS you're seeing the mount point for the mfitzpat filesystem inside the zp-ext/test filesystem, not the actual zp-ext/test/mfitzpat filesystem. Pending the availability of mirror mounts (http://hub.opensolaris.org/bin/download/Project+nfs-namespace/files/mm-PRS-open.html), you need to mount each ZFS filesystem you're exporting via NFS separately.

-- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
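In concrete terms, the fix is to point the automount entry at the child filesystem itself rather than at its parent; a sketch using the names from above:

  # auto.home entry on nona-man: mount the mfitzpat filesystem directly
  mfitzpat  -rw,hard,intr  hecate:/zp-ext/test/mfitzpat

Once the client mounts the child filesystem, ls should show the mfitzpat:umass ownership instead of the empty, root-owned mountpoint directory that lives in the parent.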
Re: [zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey

Each inode contains a link count. It seems trivially simple to me that, along with the link count in each inode, the filesystem could also store a list of which inodes link to it.

Others may have better ideas for implementation, but at least as a starting point, here's how I imagine this:

The goal is to always be able to instantly locate all the previous snapshot versions of any file or directory, regardless of whether that filename, directory name, or any path leading up to that file or directory may have ever changed. An additional goal is to obey security: don't give the user any information they couldn't have found by other (slower) means. In the scenario described below, these goals have been achieved.

Currently there's a .zfs directory, which is not a real directory. By default it's hidden until you explicitly try to access it by name. Inside the .zfs directory there's presently a snapshot directory, and nothing else. Let's suppose my system has several snapshots: snap1, snap2, snap3, ... Then these appear as

  /tank/.zfs/snapshot/{snap1,snap2,snap3,...}

and inside there are all the subdirectories which lead to all the files.

Let there also be an inodes directory next to the snapshot directory:

  /tank/.zfs/snapshot
  /tank/.zfs/inodes

Whenever a snap is created, let it be listed under both snapshot and inodes:

  /tank/.zfs/snapshot/{snap1,snap2,snap3,...}
  /tank/.zfs/inodes/{snap1,snap2,snap3,...}

If you simply ls /tank/.zfs/inodes/snap1, you see nothing; the system will not generate a list of every single inode in the whole filesystem (that would be crazy). But, just as the .zfs directory is hidden and appears upon attempted access, let there be text files whose names are inode numbers, and these text files only appear upon attempted access:

  $ ls /tank/.zfs/inodes/snap1
  (no result)
  $ cat /tank/.zfs/inodes/snap1/12345
  /tank/.zfs/snapshot/snap1/foo/bar/baz
  (which is the absolute path to the file having inode 12345)

And so a mechanism has been created so a user can do this:

  $ ls -i /tank/exports/home/jbond/somefile.txt
  12345
  $ cat /tank/.zfs/inodes/snap1/12345
  (result is: exports/home/jbond/Some-File.TXT)

Thus we have identified the former name of somefile.txt, and:

  $ cat /tank/.zfs/snapshot/snap1/exports/home/jbond/Some-File.TXT

Note: the above ls -i; cat process is slightly tedious. I don't expect many users to do this directly, but I would happily automate and simplify the process by coding zhist to use this technique automatically. The user could run:

  $ zhist ls somefile.txt

and the result would be:

  /tank/.zfs/snapshot/snap1/exports/home/jbond/Some-File.TXT

And of course, once the command-line version of zhist can do that, there's no obstacle preventing the GUI frontend.

One important note: since you're doing a reverse mapping from inode number to path name, it's important to obey filesystem security. Fortunately, the process of generating absolute path names from an inode number is handled by the kernel, and only after the complete absolute pathname has been generated is anything returned to the user. This means the kernel has the opportunity to test whether the user could ls the specified inode by pathname before returning that pathname to the user. In other words, if the user couldn't get that pathname via

  $ find /tank/.zfs/snapshot/snap1 -inum 12345

then the user could not get that pathname via .zfs/inodes either. The only difference is that the find command could run for a very long time, yet the .zfs/inodes directory returns the same result nearly instantly.
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Cindy Swearingen

For full root pool recovery see the ZFS Administration Guide, here: http://docs.sun.com/app/docs/doc/819-5461/ghzvz?l=en&a=view "Recovering the ZFS Root Pool or Root Pool Snapshots"

Unless I misunderstand, I think the intent of the OP's question is how to do bare-metal recovery after some catastrophic failure. In this situation, recovery is much more complex than what the ZFS Admin Guide says above. You would need to boot from CD, partition and format the disk, create a pool, create a filesystem, zfs send | zfs receive into that filesystem, and finally install the boot blocks. Only some of these steps are described in the ZFS Admin Guide, because simply expanding the rpool is a fundamentally easier thing to do. Even though I think I could do all that, I don't have a lot of confidence in it, and I can certainly imagine some pesky little detail being a problem. This is why I suggested the following technique:

Reinstall the OS just like you did when you first built your machine, before the catastrophe. It doesn't even matter if you make the same selections you made before (IP address, package selection, authentication method, etc.), as long as you partition and install the bootloader as you did before. This way you're sure the partitions, format, pool, filesystem, and bootloader are all configured properly. Then boot from CD again, and zfs send | zfs receive to overwrite your existing rpool. As far as I know, that will take care of everything. But I only feel about 90% confident that would work.
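For comparison, the procedure in the Admin Guide chapter Cindy cites runs roughly along these lines; a condensed, untested sketch with assumed device and BE names (c1t0d0s0, rpool/ROOT/snv_134), booted from the installation media:

  # recreate the root pool on the freshly formatted slice
  # zpool create -f -o failmode=continue -R /a -m legacy rpool c1t0d0s0
  # restore the saved stream into the new pool
  # cat /backup/rpool.snap | zfs receive -Fdu rpool
  # tell the pool which dataset to boot from
  # zpool set bootfs=rpool/ROOT/snv_134 rpool
  # reinstall the boot blocks (x86; use installboot on SPARC)
  # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0

The slice still has to be created with format/fdisk first, which is exactly the step the reinstall-then-restore approach above sidesteps.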
[zfs-discuss] Panic when deleting a large dedup snapshot
I tried destroying a large (710 GB) snapshot from a dataset that had been written with dedup on. The host locked up almost immediately; there wasn't a stack trace on the console and the host required a power cycle, but it seemed to reboot normally. Once up, the snapshot was still there. I was able to get a dump from this. The data was written with b129, and the system is currently at b134.

I tried destroying it again, and the host started behaving badly. 'less' would hang, there were several zfs-auto-snapshot processes that were over an hour old, and the 'zfs snapshot' processes were stuck on the first dataset of the pool. Eventually the host became unusable and I rebooted again. The host seems to be fine now, and is currently running a scrub.

Any ideas on how to avoid this in the future? I'm no longer using dedup due to performance issues with it, and the DDT is still very large:

  bh...@basestar:~$ pfexec zdb -DD tank
  DDT-sha256-zap-duplicate: 5339247 entries, size 348 on disk, 162 in core
  DDT-sha256-zap-unique: 1479972 entries, size 1859 on disk, 1070 in core

-B
-- Brandon High : bh...@freaks.com
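Back-of-the-envelope from that zdb output: the "in core" figures are average bytes per entry, so holding the whole table in memory needs roughly

  $ echo '(5339247*162 + 1479972*1070)/1024/1024' | bc
  2335

i.e. about 2.3 GB of RAM (or L2ARC) just for the DDT. That is consistent with destroys of deduped snapshots grinding a host down once the table no longer fits in memory, though this is a rough reading of the numbers, not a measurement.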
Re: [zfs-discuss] ZFS snapshot versus Netapp - Security and convenience
On 2010-Apr-30 10:24:14 +0800, Edward Ned Harvey solar...@nedharvey.com wrote:

Each inode contains a link count. In most cases, each inode has a link count of 1, but of course that can't be assumed. It seems trivially simple to me that, along with the link count in each inode, the filesystem could also store a list of which inodes link to it. If the link count is 2, then there's a list of 2 inodes, which are the parents of this inode.

I'm not sure exactly what you are trying to say here, but I don't think it will work. In a Unix FS (UFS or ZFS), a directory entry contains a filename and a pointer to an inode. The inode itself contains a count of the number of directory entries that point to it, and pointers to the actual data. There is currently no provision for a reverse link back to the directory. I gather you are suggesting that the inode be extended to contain a list of the inode numbers of all directories that contain a filename referring to that inode. Whilst I agree that this would simplify inode-to-filename mapping and provide an alternate mechanism for checking file permissions, I think you are glossing over the issue of how/where to store these links. Whilst files can have a link count of 1 (I'm not sure if this is true in most cases), they can have up to 32767 links. Where is this list of (up to) 32767 parent inodes going to be stored?

In which case, it would be trivially easy to walk back up the whole tree, almost instantly identifying every combination of paths that could possibly lead to this inode, while simultaneously correctly handling security concerns about bypassing security of parent directories and everything.

Whilst it's trivially easy to get from the file to the list of directories containing that file, actually getting from one directory to its parent is less so: a directory containing N subdirectories has N+2 links. Whilst the '.' link is easy to identify (it points to its own inode), distinguishing between the name of this directory in its parent and the '..' entries in its subdirectories is rather messy (requiring directory scans) unless you mandate that the reference to the parent directory is in a fixed location (i.e., the 1st or 2nd entry in the parent inode list).

It seems too perfect and too simple. Instead of a one-directional directed graph, simply make it bidirectional. There's no significant additional overhead as far as I can tell. It seems like it would even be easy.

Well, you need to find somewhere to store up to 32K inode numbers whilst having minimal space overhead for small numbers of links. Then you will need to patch the vnode operations underlying creat(), link(), unlink(), rename(), mkdir() and rmdir() to manage the backlinks (taking into account transactional consistency).

-- Peter Jeremy