Re: [zfs-discuss] Large scale performance query
Phil,

We recently built a large configuration on a 4-way Xeon server with eight 4U 24-bay JBODs. We are using 2x LSI SAS6160 SAS switches so we can easily expand the storage in the future.

1) If you are planning to expand your storage, you should consider using an LSI SAS switch for easy future expansion.
2) We carefully pick one HD from each JBOD to create each RAIDZ2 vdev, so we can lose two JBODs at the same time while the data is still accessible (a sketch of this layout follows below). It is good to know you have the same idea.
3) Sequential read/write is currently limited by the 10G NIC. Local storage can easily hit 1500MB/s+ with even a small number of HDs. Again, 10G is the bottleneck.
4) I recommend you use native SAS HDs in a large-scale system if possible. Native SAS HDs work better.
5) We are using DSM to locate failed disks and monitor the FRUs of the JBODs: http://dataonstorage.com/dsm

I hope the above points can help. The configuration is similar to configuration 3 in the following link:
http://dataonstorage.com/dataon-solutions/lsi-6gb-sas-switch-sas6160-storage.html

Technical Specs:
DNS-4800 4-way Intel Xeon 7550 server with 256G RAM
2x LSI 9200-8E HBA
2x LSI SAS6160 SAS Switch
8x DNS-1600 4U 24-bay JBOD (dual I/O with MPxIO) with 2TB Seagate SAS HDs, RAIDZ2
STEC ZeusRAM for ZIL
Intel 320 SSD for L2ARC
10G NIC

Rocky

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Phil Harrison
Sent: Sunday, July 24, 2011 11:34 PM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Large scale performance query

Hi All,

Hoping to gain some insight from some people who have done large-scale systems before. I'm hoping to get some performance estimates, suggestions and/or general discussion/feedback. I cannot discuss the exact specifics of the purpose but will go into as much detail as I can.

Technical Specs:
216x 3TB 7K3000 HDDs
24x 9-drive RAIDZ3
4x JBOD chassis (45 bay)
1x server (36 bay)
2x AMD 12-core CPUs
128GB ECC RAM
2x 480GB SSD cache
10Gbit NIC

Workloads: Mainly streaming compressed data. That is, pulling compressed data in a sequential manner; however, there could be multiple streams happening at once, making it somewhat random. We are hoping to have 5 clients pull 500Mbit sustained.

Considerations: The main reason RAIDZ3 was chosen was so we can distribute the parity across the JBOD enclosures. With this method, even if an entire JBOD enclosure is taken offline the data is still accessible.

Questions: How do we manage the physical locations of such a vast number of drives? I have read this (http://blogs.oracle.com/eschrock/entry/external_storage_enclosures_in_solaris) and am hoping someone can shed some light on whether SES2 enclosure identification has worked for them (the enclosures are SES2). What kind of performance would you expect from this setup? I know we can multiply the base IOPS by 24, but what about max sequential read/write?

Thanks,
Phil
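For reference, the "one disk from each JBOD per RAIDZ2 vdev" layout in point 2 could be built along these lines; the device names are hypothetical and only the first two vdevs are shown:

# one disk from each of the eight JBODs goes into every RAIDZ2 vdev, so any
# two enclosures can go offline and each vdev has lost at most two disks
zpool create tank \
  raidz2 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0 c8t0d0 \
  raidz2 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 c8t1d0
# ...and so on for the remaining rows of disks

The same principle applies to the 9-drive RAIDZ3 groups in Phil's design: as long as no enclosure contributes more than three drives to any one vdev, a whole enclosure can drop without taking the vdev down.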
Re: [zfs-discuss] SSD vs hybrid drive - any advice?
Bullshit. I just got an OCZ Vertex 3, and the first fill was 450-500MB/s. Second and subsequent fills are at half that speed. I'm quite confident that it's due to the flash erase cycle that's needed, and if stuff can be TRIMmed (and thus flash-erased as well), the speed would be regained. Overwriting a previously used block requires a flash erase, and if that can be done in the background when the timing is not critical, instead of just before you can actually write the block you want, performance will increase.

I think TRIM is needed both for flash (for speed) and for thin provisioning; ZFS will dirty all of the volume even though only a small part of the volume is used at any particular time. That makes ZFS more or less unusable with thin provisioning; support for TRIM would fix that if the underlying volume management supports TRIM.

Casper
[zfs-discuss] zfs send/receive and ashift
Does anyone know if it's OK to do zfs send/receive between zpools with different ashift values?

-- Andrew Gabriel
Re: [zfs-discuss] SSD vs hybrid drive - any advice?
On Tue, Jul 26, 2011 at 3:28 PM, casper@oracle.com wrote:
Overwriting a previously used block requires a flash erase [snip]

Shouldn't modern SSD controllers be smart enough already that they know:
- if there's a request to overwrite a sector, then the old data on that sector is no longer needed
- allocate a clean sector from the pool of available sectors (part of the wear-leveling mechanism)
- clear the old sector, and add it to the pool (possibly done as a background operation)

It seems to be the case with SandForce-based SSDs. That would pretty much let the SSD work just fine even without TRIM (like when used under HW RAID).

-- Fajar
Re: [zfs-discuss] SSD vs hybrid drive - any advice?
Shouldn't modern SSD controllers be smart enough already [snip]

That is possibly not sufficient. If ZFS writes bytes to every sector, even though the pool is not full, the controller cannot know where to reclaim the data. If it uses spare sectors, then it can map them to the new data and add the overwritten sectors to the free pool. With TRIM, it gets more blocks to reuse and it has more time to erase them, making the SSD faster.

Casper
Re: [zfs-discuss] zfs send/receive and ashift
On 07/26/11 10:14, Andrew Gabriel wrote:
Does anyone know if it's OK to do zfs send/receive between zpools with different ashift values?

The ZFS send stream is at the DMU layer; at this layer the data is uncompressed and decrypted - i.e. exactly how the application wants it. The ashift is a vdev-layer concept - i.e. below the DMU layer. There is nothing in the send stream format that knows what an ashift actually is.

-- Darren J Moffat
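Since the stream carries DMU-level records rather than device-level blocks, a plain send/receive between the two pools should be fine and the receiving pool will simply lay the data out with its own ashift. A minimal sketch, with hypothetical pool and dataset names:

# send from an ashift=9 pool and receive into an ashift=12 pool
zfs snapshot oldpool/data@migrate
zfs send oldpool/data@migrate | zfs receive newpool/data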
Re: [zfs-discuss] zfs send/receive and ashift
The ZFS send stream is at the DMU layer; at this layer the data is uncompressed and decrypted - i.e. exactly how the application wants it.

Will even the data compressed/encrypted by ZFS be decrypted? If that is true, will there be any CPU overhead? And does ZFS send/receive tunneled over ssh become the only way to encrypt the data transmission?

Thanks.

Fred
Re: [zfs-discuss] zfs send/receive and ashift
On 07/26/11 11:28, Fred Liu wrote:
Will even the data compressed/encrypted by ZFS be decrypted?

Yes, which is exactly what I said. All data as seen by the DMU is decrypted and decompressed; the DMU layer is what the ZPL layer is built on top of, so it has to be that way.

If that is true, will there be any CPU overhead?

There is always some overhead for doing decryption and decompression; the question is really whether you can detect it and, if you can, whether it matters. If you are running Solaris on processors with built-in support for AES (e.g. SPARC T2, T3 or Intel with AES-NI) the overhead is reduced significantly in many cases. For many people, getting the data from disk takes more time than doing the transform to get back your plaintext. In some of the testing I did I found that gzip decompression can be more significant to a workload than doing the AES decryption. So basically yes, of course, but does it actually matter?

And does ZFS send/receive tunneled over ssh become the only way to encrypt the data transmission?

That isn't the only way.

-- Darren J Moffat
Re: [zfs-discuss] zfs send/receive and ashift
Yes, which is exactly what I said. All data as seen by the DMU is decrypted and decompressed; the DMU layer is what the ZPL layer is built on top of, so it has to be that way.

Understood. Thank you. ;-)

There is always some overhead for doing decryption and decompression; the question is really whether you can detect it and, if you can, whether it matters. [snip] So basically yes, of course, but does it actually matter?

It depends on how big the delta is. It does matter if the data backup cannot be finished within the required backup window when people use zfs send/receive to do mass data backups. BTW, adding a somewhat off-topic question -- will the NDMP protocol in Solaris do decompression and decryption? Thanks.

And does ZFS send/receive tunneled over ssh become the only way to encrypt the data transmission?

That isn't the only way.

Any alternatives, if you don't mind? ;-) Thanks.

Fred
Re: [zfs-discuss] zfs send/receive and ashift
On 26-07-11 12:56, Fred Liu wrote:
Any alternatives, if you don't mind? ;-)

VPNs, openssl piped over netcat, a password-protected zip file,... ;) ssh would be the most practical, probably.

-- No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
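A rough sketch of the openssl-over-netcat idea (hostnames, port, snapshot and key file are all hypothetical, and netcat listen flags vary between implementations):

# on the receiving host: listen, decrypt, receive
nc -l 9999 | openssl enc -d -aes-256-cbc -pass file:/path/to/keyfile | zfs receive -F backup/data

# on the sending host: send, encrypt, push over the wire
zfs send tank/data@today | openssl enc -aes-256-cbc -pass file:/path/to/keyfile | nc backuphost 9999

The ssh route is a single pipeline and authenticates both ends:

zfs send tank/data@today | ssh backuphost zfs receive -F backup/data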
Re: [zfs-discuss] zfs send/receive and ashift
On 07/26/11 11:56, Fred Liu wrote:
It depends on how big the delta is. It does matter if the data backup cannot be finished within the required backup window when people use zfs send/receive to do mass data backups.

The only way you will know whether decrypting and decompressing causes a problem in that case is if you try it on your systems. I seriously doubt it will, unless the system is already heavily CPU bound and your backup window is already very tight.

BTW, adding a somewhat off-topic question -- will the NDMP protocol in Solaris do decompression and decryption? Thanks.

My understanding of the NDMP protocol is that it would be a translator that did that; it isn't part of the core protocol. The way I would do it is to use a T10000C tape drive and have it do the compression and encryption of the data.
http://www.oracle.com/us/products/servers-storage/storage/tape-storage/t1c-tape-drive-292151.html
The alternative is to have the node in your NDMP network that does the writing to the tape do the compression and encryption of the data stream before putting it on the tape.

And does ZFS send/receive tunneled over ssh become the only way to encrypt the data transmission?
That isn't the only way.
Any alternatives, if you don't mind? ;-)

For starters, SSL/TLS (which is what the Oracle ZFSSA provides for replication) or IPsec are possibilities as well; it depends on what risk you are trying to protect against and what the transport layer is. But basically it is not provided by ZFS itself; it is up to the person building the system to secure the transport layer used for ZFS send. You could also write directly to a T10k encrypting tape drive.

-- Darren J Moffat
Re: [zfs-discuss] SSD vs hybrid drive - any advice?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha
Shouldn't modern SSD controllers be smart enough already that they know:
- if there's a request to overwrite a sector, then the old data on that sector is no longer needed

In the present state of the world, somebody I know in the trade describes SSDs as "the pimple on the butt of the elephant" when it comes to flash manufacturing. In other words, mobile devices account for a huge majority (something like 90%) of the flash produced in the world, SSDs are something like 4%, and for some reason (I don't know why) there's a benefit to optimizing on 8k pages.

Which means no. If you overwrite a sector of an SSD, that does not mean you can erase the page, because you can only erase the whole page, and the disk can only interact with the OS using 4k blocks or smaller. So there's a minimum of 2 logical blocks per page in the SSD. When you trim a block, only half of the page gets marked as free. Eventually the controller needs to read half a page from page A, half a page from page B, write them both to a blank page C, and then erase pages A and B.

- allocate a clean sector from pool of available sectors (part of wear-leveling mechanism)
- clear the old sector, and add it to the pool (possibly done in background operation)

The complexity here is much larger... Of all the storage pages in the SSD, some are marked used, some are marked unused, some are erased, and some are not erased. You can only write to a sector if it's both unused and erased. Each sector takes half a page. You can write to an individual sector, but you cannot erase an individual sector. At the OS interface, only sectors are logically addressed, but internally the controller must map those to physical halves of pages. So the controller maintains a completely arbitrary lookup table so any sector can map to any sector or page. When the OS requests to overwrite some sector, the controller will actually write to some formerly unused sector, remap it, and mark the old one as unused. Later in the background, if the other half of the page is also unused, the page will be erased.

Does that clarify anything?
[zfs-discuss] Adding mirrors to an existing zfs-pool
G'Day,

- zfs pool with 4 disks (from Clariion A)
- must migrate to Clariion B (so I created 4 disks of the same size, available for zfs)

The zfs pool has no mirrors. My idea was to attach the 4 new disks from Clariion B to the 4 disks which are still in the pool - and later remove the original 4 disks. In all the examples I only found how to create a new pool with mirrors, but no example of how to add a mirror disk for each disk to a pool without mirrors.

- Is it possible to attach a disk to each disk in the pool? (They have different sizes, so I have to attach exactly the correct disk from Clariion B to the matching original disk from Clariion A.)
- Can I later remove the disks from Clariion A, with the pool intact and users able to keep working with the pool?

Sorry for the beginner questions. Thanks for the help.
Re: [zfs-discuss] Adding mirrors to an existing zfs-pool
Bernd W. Hennig wrote:
[original question snipped]

Depends on a few things... What OS are you running, and what release/update or build? What's the RAID layout of your pool (zpool status output)?

-- Andrew Gabriel
Re: [zfs-discuss] SSD vs hybrid drive - any advice?
On Mon, July 25, 2011 10:03, Orvar Korvar wrote:
There is at least a common perception (misperception?) that devices cannot process TRIM requests while they are 100% busy processing other tasks.
Just to confirm: SSD disks can do TRIM while processing other tasks?

Processing the request just means flagging the blocks, though, right? And the actual benefits only accrue if the garbage collection / block reshuffling background tasks get a chance to run?

-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
[zfs-discuss] Entire client hangs every few seconds
Hi all-

We've been experiencing a very strange problem for two days now. We have three clients (Linux boxes) connected to a ZFS box (Nexenta) via iSCSI. Every few seconds (seemingly at random), iostat shows the clients go from a normal 80K+ IOPS to zero. It lasts up to a few seconds and then things are fine again. When that happens, I/O on the local disks stops too, even the totally unrelated ones. How can that be? All three clients show the same pattern and everything was fine prior to Sunday. Nothing has changed on either the clients or the server. The ZFS box is not even close to being saturated, nor is the network. We don't even know where to start... any advice?

Ian
Re: [zfs-discuss] Adding mirrors to an existing zfs-pool
From: Cindy Swearingen cindy.swearin...@oracle.com
Date: Tue, 26 Jul 2011 08:54:38 -0600
To: Bernd W. Hennig consult...@hennig-consulting.com

Hi Bernd,

If you are talking about attaching 4 new disks to a non-redundant pool with 4 disks, and then you want to detach the previous disks, then yes, this is possible and a good way to migrate to new disks. The new disks must be of equivalent size or larger than the original disks.

See the hypothetical example below. If you mean something else, then please provide your zpool status output.

Thanks,

Cindy

# zpool status tank
  pool: tank
 state: ONLINE
  scan: resilvered 1018K in 0h0m with 0 errors on Fri Jul 22 15:54:52 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          c4t1d0    ONLINE       0     0     0
          c4t2d0    ONLINE       0     0     0
          c4t3d0    ONLINE       0     0     0
          c4t4d0    ONLINE       0     0     0

# zpool attach tank c4t1d0 c6t1d0
# zpool attach tank c4t2d0 c6t2d0
# zpool attach tank c4t3d0 c6t3d0
# zpool attach tank c4t4d0 c6t4d0

The above syntax will create 4 mirrored pairs of disks. Attach each new disk, wait for it to resilver, attach the next disk, resilver, and so on. I would scrub the pool after resilvering is complete, and check fmdump to ensure all new devices are operational. When all the disks are replaced and the pool is operational, detach the original disks.

# zpool detach tank c4t1d0
# zpool detach tank c4t2d0
# zpool detach tank c4t3d0
# zpool detach tank c4t4d0

On 07/26/11 00:33, Bernd W. Hennig wrote:
[original question snipped]
Re: [zfs-discuss] Entire client hangs every few seconds
To add to that... iostat on the client boxes shows the connection to always be around 98% util, topping out at 100% whenever it hangs. The same clients are connected to another ZFS server with much lower specs and a smaller number of slower disks; it performs much better and rarely gets past 5% util. They share the same network.
Re: [zfs-discuss] Entire client hangs every few seconds
Ian,

Did you enable DeDup?

Rocky
Re: [zfs-discuss] Adding mirrors to an existing zfs-pool
Hi,

It is better to just create a new pool on array B, then use cpio to copy the data.

On 7/26/11, Bernd W. Hennig consult...@hennig-consulting.com wrote:
[original question snipped]

-- Hung-Sheng Tsao, Ph.D. laot...@gmail.com http://laotsao.wordpress.com
Re: [zfs-discuss] Entire client hangs every few seconds
No dedup. The hiccups started around 2am on Sunday while (obviously) nobody was interacting with either the clients or the server. It's been running for months (as is) without any problem. My guess is that it's a defective hard drive that, instead of failing outright, just stutters. Or maybe it's the cache. We disabled the SLOG with no effect, but we haven't tried with the L2ARC.
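One way to test the stuttering-drive theory is to watch per-device service times on the ZFS box while a stall is happening. A rough sketch with standard Solaris/Nexenta tools:

# per-device latency: a single disk whose asvc_t spikes (or %b pegs at 100)
# while the others sit idle during a stall is a good suspect
iostat -xn 1

# look for retryable transport/device errors and check overall pool health
fmdump -eV | tail -50
zpool status -v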
[zfs-discuss] recover zpool with a new installation
Hi all,

I lost my storage because rpool doesn't boot. I tried to recover, but OpenSolaris says to destroy and re-create. My rpool is installed on a flash drive, and my pool (with my data) is on other disks. My question is: is it possible to reinstall OpenSolaris on a new flash drive, without touching my pool of disks, and then recover this pool?

Thanks. Regards,

-- Roberto Scudeller
Re: [zfs-discuss] SSD vs hybrid drive - any advice?
On Tue, Jul 26, 2011 at 7:51 AM, David Dyer-Bennet d...@dd-b.net wrote:
Processing the request just means flagging the blocks, though, right? And the actual benefits only accrue if the garbage collection / block reshuffling background tasks get a chance to run?

I think that's right. TRIM just gives hints to the garbage collector that sectors are no longer in use. When the GC runs, it can more easily find flash blocks that aren't used, or combine several mostly-empty blocks and erase or otherwise free them for reuse later.

-B

-- Brandon High : bh...@freaks.com
[zfs-discuss] ZFS resilvering loop from hell
I'm on S11E 150.0.1.9 and I replaced one of the drives, and the pool seems to be stuck in a resilvering loop. I performed a 'zpool clear' and 'zpool scrub' and it just complains that the drives I didn't replace are degraded because of too many errors. Oddly, the replaced drive is reported as being fine. The CKSUM counts get up to about 108 or so by the time the resilver completes. I'm now trying to evacuate the pool onto another pool, however the zfs send/receive is dying 380GB into sending the first dataset. Here is some output. Any help or insights will be welcome.

Thanks,
cfs

  pool: dpool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jul 26 15:03:32 2011
        63.4G scanned out of 5.02T at 6.81M/s, 212h12m to go
        15.1G resilvered, 1.23% done
config:

        NAME        STATE     READ WRITE CKSUM
        dpool       DEGRADED     0     0     6
          raidz1-0  DEGRADED     0     0    12
            c9t0d0  DEGRADED     0     0     0  too many errors
            c9t1d0  DEGRADED     0     0     0  too many errors
            c9t3d0  DEGRADED     0     0     0  too many errors
            c9t2d0  ONLINE       0     0     0  (resilvering)

errors: Permanent errors have been detected in the following files:

        metadata:0x0
        [redacted list of 20 files, mostly in the same directory]
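Since the send of the first dataset keeps dying partway through, and zfs send on this release cannot resume a partial stream, one workaround is to move the data one dataset at a time so a failure only costs the piece in flight. A sketch, with hypothetical dataset and pool names:

# snapshot everything once, then send each child dataset separately
zfs snapshot -r dpool@evac
zfs send dpool/data1@evac | zfs receive -F newpool/data1
zfs send dpool/data2@evac | zfs receive -F newpool/data2
# note: a stream containing one of the files listed under "Permanent errors"
# may still abort; those files may have to come from backup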
Re: [zfs-discuss] Entire client hangs every few seconds
This is actually a recently known problem, and a fix for it is in the 3.1 version, which should be available any minute now, if it isn't already. The problem has to do with some allocations which are sleeping, and jobs in the ZFS subsystem getting backed up behind some other work. If you have adequate system memory, you are less likely to see this problem, I think.

- Garrett

On Tue, 2011-07-26 at 08:29 -0700, Rocky Shek wrote:
Ian, Did you enable DeDup?
[rest of quoted thread snipped]
Re: [zfs-discuss] Entire client hangs every few seconds
Hi Garrett-

Is it something that could happen at any time on a system that has been working fine for a while? That system has 256G of RAM; I think "adequate" is not a concern here :) We'll try 3.1 as soon as we can download it.

Ian
Re: [zfs-discuss] recover zpool with a new installation
Hi Roberto,

Yes, you can reinstall the OS on another disk and, as long as the OS install doesn't touch the other pool's disks, your previous non-root pool should be intact. After the install is complete, just import the pool.

Thanks,

Cindy

On 07/26/11 10:49, Roberto Scudeller wrote:
[original question snipped]
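The import step would look something like this after the fresh install (pool name hypothetical; -f is needed if the pool was never exported from the old installation):

# list the pools visible on the attached disks, then import the data pool
zpool import
zpool import -f tank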
Re: [zfs-discuss] Adding mirrors to an existing zfs-pool
On Tue, Jul 26, 2011 at 1:33 PM, Bernd W. Hennig consult...@hennig-consulting.com wrote:
- Is it possible to attach a disk to each disk in the pool?

From man zpool:

     zpool attach [-f] pool device new_device

         Attaches new_device to an existing zpool device. The existing
         device cannot be part of a raidz configuration. If device is not
         currently part of a mirrored configuration, device automatically
         transforms into a two-way mirror of device and new_device. If
         device is part of a two-way mirror, attaching new_device creates
         a three-way mirror, and so on. In either case, new_device begins
         to resilver immediately.

         -f    Forces use of new_device, even if it appears to be in use.
               Not all devices can be overridden in this manner.

- Can I later remove the disks from Clariion A, with the pool intact and users able to keep working with the pool?

     zpool detach pool device

         Detaches device from a mirror. The operation is refused if there
         are no other valid replicas of the data.

If you're using raidz, you can't use zpool attach. Your best bet in this case is zpool replace.

     zpool replace [-f] pool old_device [new_device]

         Replaces old_device with new_device. This is equivalent to
         attaching new_device, waiting for it to resilver, and then
         detaching old_device.

         The size of new_device must be greater than or equal to the
         minimum size of all the devices in a mirror or raidz
         configuration.

         new_device is required if the pool is not redundant. If
         new_device is not specified, it defaults to old_device. This form
         of replacement is useful after an existing disk has failed and
         has been physically replaced. In this case, the new disk may have
         the same /dev path as the old device, even though it is actually
         a different disk. ZFS recognizes this.

         -f    Forces use of new_device, even if it appears to be in use.
               Not all devices can be overridden in this manner.

-- Fajar
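Applied to the Clariion migration, a replace-based version might look like this (pool and device names hypothetical); do one disk at a time and let each resilver finish before starting the next:

# swap each Clariion A LUN for its matching Clariion B LUN
zpool replace mypool c3t0d0 c5t0d0
zpool status mypool        # wait until resilvering completes
zpool replace mypool c3t1d0 c5t1d0
# ...repeat for the remaining disks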
Re: [zfs-discuss] Entire client hangs every few seconds
Are the disk active lights typically ON when this happens?

On Tue, Jul 26, 2011 at 3:27 PM, Garrett D'Amore garr...@damore.org wrote:
[quoted thread snipped]
Re: [zfs-discuss] recover zpool with a new installation
On Tue, Jul 26, 2011 at 1:14 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote:
Yes, you can reinstall the OS on another disk and, as long as the OS install doesn't touch the other pool's disks, your previous non-root pool should be intact. After the install is complete, just import the pool.

You can also use the Live CD or Live USB to access your pool or possibly fix your existing installation. You will have to force the zpool import with either a reinstall or a Live boot.

-B

-- Brandon High : bh...@freaks.com
Re: [zfs-discuss] SSD vs hybrid drive - any advice?
On 2011-Jul-26 17:24:05 +0800, Fajar A. Nugraha w...@fajar.net wrote:
Shouldn't modern SSD controllers be smart enough already that they know:
- if there's a request to overwrite a sector, then the old data on that sector is no longer needed

ZFS never does update-in-place, and UFS only does update-in-place for metadata and where the application forces update-in-place. This means there will generally (always for ZFS) be a delay between when a filesystem frees (is no longer interested in the contents of) a sector and when it overwrites that sector. Without TRIM support, an overwrite is the only signal an SSD gets that the contents of a sector are no longer needed. Which, in turn, means there is a pool of sectors that the FS knows are unused but the SSD doesn't - and is therefore forced to preserve. Since an overwrite almost never matches the erase page, this increases wear on the SSD, because it is forced to rewrite unwanted data in order to free up pages for erasure to support external write requests. It also reduces performance for several reasons:
- The SSD has to unnecessarily copy data - which takes time.
- The space recovered by each erasure is effectively reduced by the amount of rewritten data, so more time-consuming erasures are needed for a given external write load.
- The pools of unused-but-not-erased and erased (available) sectors are smaller, increasing the probability that an external write will require a synchronous erase cycle to complete.

- allocate a clean sector from pool of available sectors (part of wear-leveling mechanism)

As above, in the absence of TRIM, the pool will be smaller (and more likely to be empty).

- clear the old sector, and add it to the pool (possibly done in background operation)

Otherwise a sector could never be rewritten.

It seems to be the case with sandforce-based SSDs. That would pretty much let the SSD work just fine even without TRIM (like when used under HW raid).

Better SSDs mitigate the problem by having more hidden space (keeping the available pool larger to reduce the probability of a synchronous erase being needed) and higher performance (masking the impact of the additional internal writes and erasures). If TRIM support were available then performance would still improve. This means you either get better system performance from the same SSD, or you can get the same system performance from a lower-performance (cheaper) SSD.

-- Peter Jeremy