[zfs-discuss] raidz2 + spare or raidz3 and no spare for nine 1.5 TB SATA disks?
The only other zfs pool in my system is a mirrored rpool (two 500 GB disks). This is for my own personal use, so it's not as if the data is mission critical in some sort of production environment. The advantage I can see in going with raidz2 + spare over raidz3 and no spare is that I would spend much less time running in a degraded state when a drive fails (I'd have to RMA the drive and most likely wait a week or more for a replacement). The disadvantage of raidz2 + spare is the event of a triple disk failure. That is most likely not going to occur with 9 disks, but it certainly is possible: if 3 disks fail before one can be rebuilt onto the spare, the data will be lost. So I guess the main question I have is: how much of a performance hit is noticed when a raidz3 array is running in a degraded state? Thanks - Jack -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
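For anyone weighing the same tradeoff, here is a back-of-envelope sketch of the triple-failure risk. This is hedged: it assumes independent drive failures and an illustrative 1% chance that any given drive dies during one rebuild/RMA window (that figure is made up, not from the thread; substitute your own):

```shell
# Probability that 3 or more of 9 drives fail within one rebuild window,
# computed as a binomial tail. p = 0.01 is an illustrative assumption.
awk 'function choose(n, k,    r, i) { r = 1; for (i = 1; i <= k; i++) r = r * (n - i + 1) / i; return r }
BEGIN {
    n = 9; p = 0.01; sum = 0
    for (k = 3; k <= n; k++)
        sum += choose(n, k) * p^k * (1 - p)^(n - k)
    printf "P(>=3 of %d drives fail) = %.2e\n", n, sum
}'
```

With the hypothetical 1% figure this lands on the order of 1 in 12,000 per incident, whereas raidz3 still has a parity disk to spare while a single failed disk is out for RMA.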
Re: [zfs-discuss] raidz2 + spare or raidz3 and no spare for nine 1.5 TB SATA disks?
Thanks. Looks like I'll be using raidz3.
[zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Howdy, I upgraded to snv_128a from snv_125 because I wanted to do some dedup testing :). I have two zfs pools: rpool and vault. I upgraded my vault zpool version and turned on dedup on the dataset vault/shared_storage. I also turned on gzip compression on this dataset. Before I turned on dedup, I made a new dataset, vault/shared_storage_temp, and copied all the data to it (just in case something crazy happened to my dedup'd dataset, since dedup is new). I then removed all data on my dedup'd dataset and copied all the data back from my temp dataset.

After I realized my space savings wasn't going to be that great, I decided to delete the vault/shared_storage dataset:

zfs destroy vault/shared_storage

This hung and couldn't be killed. I force rebooted my system, and then I couldn't boot into Solaris; it hung at reading the zfs config. I booted into single user mode (multiple times), and any zfs or zpool command froze. I then rebooted into my snv_125 environment. As it should, it ignored my vault zpool, since its version is higher than it can understand. I forced a zpool export of vault and rebooted; I could then boot back into snv_128, and zpool import listed the vault pool. However, I cannot import it by name or identifier: the command hangs, as do any additional zfs or zpool commands, and I cannot kill or kill -9 the processes.

Is there anything I can do to get my pool imported? I haven't done much troubleshooting at all on OpenSolaris; I'd be happy to run any suggested commands and provide output. Thank you for the assistance.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I waited about 20 minutes or so. I'll try your suggestions tonight. I didn't look at iostat. I just figured it was hung after waiting that long, but now that I know it can take a very long time, I will watch it and make sure it's doing something. Thanks. I'll post my results either tonight or tomorrow morning.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
The pool is roughly 4.5 TB (raidz1, 4 x 1.5 TB disks). I didn't attempt to destroy the pool, only a dataset within the pool. The dataset is/was about 1.2 TB.

System specs:
Intel Q6600 (2.4 GHz quad core)
4 GB RAM
2x 500 GB drives in a zfs mirror (rpool)
4x 1.5 TB drives in a zfs raidz1 array (vault)

The 1.5 TB drives are attached to a PCI SATA card (Silicon Image); the rpool drives are using the integrated SATA ports. Please let me know if you need further specs. And thank you.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
It's been about 45 minutes now since I started trying to import the pool. I see disk activity (see below). What concerns me is that my free memory keeps shrinking as time goes on; I now have 185 MB free out of 4 GB (and 2 GB of swap free). Hope this doesn't exhaust all my memory and freeze my box. I'll post updated information later.

iostat output (taken every 5 seconds):

                     extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   29.8    0.0  1894.7     0.0  0.0  0.6    0.0   20.2   0  60 c3d0
   28.8    0.0  1843.3     0.0  0.0  0.4    0.0   12.6   0  36 c3d1
   31.2    0.0  1984.3     0.0  0.0  0.7    0.0   21.0   0  65 c4d0
   29.0    0.0  1830.9     0.0  0.0  0.3    0.0   11.5   0  33 c4d1
Tue Dec 8 17:34:15 CST 2009
 cpu
 us sy wt id
  1  1  0 97
                     extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   29.8    0.0  1881.9     0.0  0.0  0.6    0.0   20.6   0  61 c3d0
   32.6    0.0  2086.3     0.0  0.0  0.4    0.0   11.7   0  38 c3d1
   30.2    0.0  1932.7     0.0  0.0  0.6    0.0   20.2   0  61 c4d0
   30.2    0.0  1932.7     0.0  0.0  0.4    0.0   12.4   0  37 c4d1
Tue Dec 8 17:34:20 CST 2009
 cpu
 us sy wt id
  1  1  0 98
                     extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   27.2    0.0  1728.2     0.0  0.0  0.6    0.0   20.3   0  55 c3d0
   30.2    0.0  1932.8     0.0  0.0  0.4    0.0   13.0   0  39 c3d1
   30.8    0.0  1958.6     0.0  0.0  0.7    0.0   21.5   0  66 c4d0
   31.6    0.0  2009.8     0.0  0.0  0.4    0.0   11.3   0  36 c4d1
Tue Dec 8 17:34:25 CST 2009
 cpu
 us sy wt id
  1  1  0 98
                     extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   30.8    0.0  1971.2     0.0  0.0  0.6    0.0   20.4   0  63 c3d0
   29.2    0.0  1868.8     0.0  0.0  0.4    0.0   13.0   0  38 c3d1
   30.2    0.0  1932.8     0.0  0.0  0.6    0.0   20.3   0  61 c4d0
   30.2    0.0  1920.2     0.0  0.0  0.4    0.0   12.1   0  37 c4d1
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ah, good to know! I'm learning all kinds of stuff here :)

The command (zpool import) is still running and I'm still seeing disk activity. Any rough idea how long this command should take? It looks like each disk is being read at a rate of 1.5-2 megabytes per second. Going worst case, assuming each disk is 1572864 megs (the 1.5 TB disks are actually smaller than this due to the 'rounding' drive manufacturers do) and a 2 megs/sec read rate per disk, that means at most I should hopefully have to wait:

1572864 (megs) / 2 (megs/second) / 60 (seconds/minute) / 60 (minutes/hour) / 24 (hours/day) = 9.1 days

Again, I don't know if the zpool import is looking at the entire contents of the disks, or what exactly it's doing, but I'm hoping that would be the 'maximum' I'd have to wait for this command to finish :)
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
The server just went "almost" totally unresponsive :( I can still hear the disks thrashing. If I press keys on the keyboard, my login screen will not show up. I had a VNC session hang and can no longer get back in. I can try to ssh to the server, and I get prompted for my username and password, but it will not drop me to a prompt:

login as: redshirt
Using keyboard-interactive authentication.
Password:

It just hangs there. I also run VirtualBox with an Ubuntu VM, and that VM went unresponsive as well.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I just hard-rebooted my server. I'm moving my VM off to my laptop so it can continue to run :) Then, if it "freezes" again I'll just let it sit, as I did hear the disks thrashing.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok. When searching for how to do that, I see that it requires a modification to /etc/system. I'm thinking I'll limit it to 1 GB, so the entry (which must be in hex) appears to be:

set zfs:zfs_arc_max = 0x40000000

Then I'll reboot the server and try the import again. Thanks for the continued assistance.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Upon further research, it appears I need to limit both ncsize and arc_max. I think I'll use:

set ncsize = 0x30000000
set zfs:zfs_arc_max = 0x10000000

That should give me a max of 1 GB used between the two. If I should be using different values (or other settings), please let me know :)
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Yikes, posted too soon. I don't want to set my ncsize that high!!! (I was thinking the entry was bytes of memory, but it's a number of entries.)

set ncsize = 25
set zfs:zfs_arc_max = 0x40000000

Now THIS should hopefully make it so the process can only take around 1 GB of RAM.
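For reference, the zfs_arc_max byte values in /etc/system are just megabyte counts written out as hex byte counts; a quick way to generate them (the sizes below are for illustration):

```shell
# Print candidate zfs_arc_max settings as hex byte counts.
for mb in 256 768 1024; do
    printf 'set zfs:zfs_arc_max = 0x%x  # %d MB\n' $((mb * 1024 * 1024)) "$mb"
done
```

So 1 GB comes out as 0x40000000, 768 MB as 0x30000000, and 256 MB as 0x10000000.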
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, I have started the zpool import again. Looking at iostat, it looks like I'm getting comparable read speeds (possibly a little slower):

                     extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c6d0
   26.4    0.0  1689.6     0.0  0.0  0.6    0.0   21.9   0  58 c3d0
   28.0    0.0  1792.0     0.0  0.0  0.4    0.0   14.0   0  39 c3d1
   29.0    0.0  1856.0     0.0  0.0  0.6    0.0   22.0   0  64 c4d0
   27.6    0.0  1766.4     0.0  0.0  0.4    0.0   13.2   0  37 c4d1
Tue Dec 8 23:44:50 CST 2009
 cpu
 us sy wt id
  0  0  0 99
                     extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c5t1d0
    0.0   13.8     0.0    54.6  0.0  0.0    1.4    2.4   0   3 c5d0
    0.6   12.6    26.2    54.6  0.0  0.0    0.4    1.3   0   2 c6d0
   28.2    0.0  1804.8     0.0  0.0  0.6    0.0   21.9   0  62 c3d0
   28.6    0.0  1817.8     0.0  0.0  0.4    0.0   13.8   0  40 c3d1
   27.2    0.0  1740.8     0.0  0.0  0.6    0.0   21.0   0  57 c4d0
   27.4    0.0  1741.0     0.0  0.0  0.4    0.0   12.9   0  35 c4d1

I also have a ton of free RAM now. I think I could bump up my settings quite a bit, but I'm not going to (at least not yet). I'm heading to bed. I'll post an update tomorrow.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> On Tue, Dec 8, 2009 at 6:36 PM, Jack Kielsmeier wrote:
> [...]
>
> I submitted a bug a while ago about this:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6855208
>
> I'll escalate since I have a support contract. But yes, I see this as
> a serious bug. I thought my machine had locked up entirely as well; it
> took about 2 days to finish a destroy on a volume about 12TB in size.
>
> --
> Brent Jones
> br...@servuhome.net

Thanks for escalating this.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> zpool import done! Back online.
>
> Total downtime for 4TB pool was about 8 hours; don't
> know how much of this was completing the destroy
> transaction.

Lucky you! :) My box has gone totally unresponsive again :( I cannot even ping it now, and I can't hear the disks thrashing.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I have disabled all 'non-important' processes (gdm, ssh, vnc, etc). I am now starting this process locally on the server via the console with about 3.4 GB free of RAM. I still have my entries in /etc/system for limiting how much RAM zfs can use.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> I have disabled all 'non-important' processes (gdm,
> ssh, vnc, etc). I am now starting this process
> locally on the server via the console with about 3.4
> GB free of RAM.
>
> I still have my entries in /etc/system for limiting
> how much RAM zfs can use.

Going on 10 hours now, still importing. Still at just under 2 MB/s read speed on each disk in the pool.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> [...]
>
> Going on 10 hours now, still importing. Still at just
> under 2 MB/s read speed on each disk in the pool.

And now it's frozen again; it has been frozen for 10 minutes now. I had iostat running on the console. At the time of the freeze, it started writing to the zfs pool disks; previous to that, it had been all reads. The console cursor is still blinking at least, so it's not a hard lock. I'm just gonna let it sit for a while and see what happens.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ah that could be it! This leaves me hopeful, as it looks like that bug says it'll eventually finish!
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
My import is still going (I hope; I can't confirm, since my system appears to be totally locked except for the little blinking console cursor). It's been well over a day. I'm less hopeful now, but will still let it "do its thing" for another couple of days.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
It's been over 72 hours since my last import attempt. The system is still non-responsive. No idea if it's doing anything.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
My system was pingable again; unfortunately, I had disabled all services such as ssh. My console was still hung, but I was wondering if I had hung USB crap (since I use a USB keyboard and everything had been hung for days). I force rebooted, and the pool was not imported :(. I started the process off again, this time with remote services enabled, and I'm telling myself not to touch the sucker for 7 days. We'll see if that lasts :)
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Thanks. I've decided now to only post when:

1) I have my zfs pool back, or
2) I give up

I should note that there are periods of time where I can ping my server (rarely), but most of the time not. I have not been able to ssh into it, and the console is hung (minus the little blinking cursor). I'm going to let this "run" until the end of the week. If I don't have my zpool back by then, I'm guessing I never will.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> On Dec 15, 2009, at 5:50, Jack Kielsmeier wrote:
> [...]
>
> Don't give up! Let's wait a bit longer and if it
> doesn't work we'll see what can be done.
>
> Regards
> Victor

Ah, thanks. As long as there is stuff to try, I won't give up. I miss being able to use my server, but I'll live :)

I should note that I can live with losing the data that I had on my pool. While I would prefer recovering it, I can stand to lose it.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> Jack,
>
> We'd like to get a crash dump from this system to determine the root
> cause of the system hang. You can get a crash dump from a live system
> like this:
>
> # savecore -L
> dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
>  0:18 100% done
> 100% done: 49953 pages dumped, dump succeeded
> savecore: System dump time: Wed Dec 16 10:37:51 2009
> savecore: Saving compressed system crash dump in /var/crash/v120-brm-08/vmdump.0
> savecore: Decompress the crash dump with
> 'savecore -vf /var/crash/v120-brm-08/vmdump.0'
>
> It won't impact the running system.
>
> Then, upload the crash dump file by following these instructions:
>
> http://wikis.sun.com/display/supportfiles/Sun+Support+Files+-+Help+and+Users+Guide
>
> Let us know when you get it uploaded.
>
> Thanks,
> Cindy
>
> On 12/15/09 00:25, Jack Kielsmeier wrote:
> [...]

I'd be glad to do this, but I have a question: if the dump needs to happen while the system is hanging, how can I run the dump? :) I cannot ssh in, and my console is completely unresponsive.

Would running the dump help at all when my system is not hung? If so, I can hard reboot the server and run said command. I could also run the dump when I first start the zpool import. My system does not hang until stuff is being written to the pool, and it takes several hours for that to happen (last time it was something like 14 hours of reads; then, looking at iostat, mass writes started to happen, and the freeze always happens exactly when writes to the disks get very busy). When iostat is refreshing every 5 seconds, I only get one output that shows writes before it freezes.

Thanks
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I'll see what I can do. I have a busy couple of days, so it may not be until Friday until I can spend much time on this. Thanks
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, my console is 100% completely hung; I'm not gonna be able to enter any commands when it freezes. I can't even get the numlock light to change its status. This time I even plugged in a PS/2 keyboard instead of USB, thinking maybe it was USB dying during the hang, but not so.

I have hard rebooted my system again. I'm going to set up a script that will continuously run savecore; after 10 runs, I'll reset the bounds file. Hopefully by doing it this way, I'll get a savecore right as the system starts to go unresponsive. I'll post the script I'll be running here shortly after I write it.

Also, as far as using 'sync' goes, I'm not sure what exactly I would do there.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, this is the script I am running (as a background process). This script doesn't matter much; it's just here for reference, as I'm running into problems just running the savecore command while the zpool import is running.

#!/bin/bash
count=1
rm /var/crash/opensol/bounds
/usr/bin/savecore -L
while [ 1 ]
do
        if [ $count == 10 ]
        then
                count=1
                rm /var/crash/opensol/bounds
        fi
        savecore -L
        count=`expr $count + 1`
done

opensol was the name of the system before I renamed it to wd40; crash data is still set to be put in /var/crash/opensol.

I have started another zpool import of the vault pool:

root@wd40:~# zpool import
  pool: vault
    id: 4018273146420816291
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        vault       ONLINE
          raidz1-0  ONLINE
            c3d0    ONLINE
            c3d1    ONLINE
            c4d0    ONLINE
            c4d1    ONLINE
root@wd40:~# zpool import 4018273146420816291 &
[1] 1093

After starting the import, savecore -L no longer finishes:

root@wd40:/var/adm# savecore -L
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 0:05 100% done
100% done: 153601 pages dumped, dump succeeded

It should be saying that it's saving to /var/crash/opensol/, but instead it just hangs and never returns me to a prompt. Previous to running zpool import, the savecore command took anywhere from 10-15 seconds to finish. If I cd to /var/crash/opensol, there is no new file created. I tried firing off savecore again; same result. A ps listing shows the savecore commands:

root@wd40:/var/crash/opensol# ps -ef | grep savecore
    root  1092  1061   0 22:27:55 ?           0:01 savecore -L
    root  1134  1083   0 22:33:28 pts/3       0:00 grep savecore
    root  1113   787   0 22:30:23 ?           0:01 savecore -L

(One of these is from the script I was running when I started the import manually; the other is from when I just ran the savecore -L command by itself.) I cannot kill these processes, even with a kill -9.

I then hard rebooted my server yet again (as it hangs if it's in the process of a zpool import). After the reboot, all I did was ssh in, disable gdm, run my zpool import command, and try another savecore (this time not using my script above first, just a simple savecore -L as root from the command line); once again it hangs:

root@wd40:~# zpool import 4018273146420816291 &
[1] 783
root@wd40:~# savecore -L
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 0:05 100% done
100% done: 138876 pages dumped, dump succeeded
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ah! Ok, I will give this a try tonight! Thanks.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, I have started my import after using the -k flag on my kernel line (I just did a test dump using this method to make sure it works ok, and it does).

I have also added the following to my /etc/system file and rebooted:

set snooping=1

According to this page: http://developers.sun.com/solaris/articles/manage_core_dump.html

"Sometimes, the system will hang without any response even when you use kmdb or OBP. In this case, use the "deadman timer." The deadman timer allows the OS to force a kernel panic in the event of a system hang. This feature is available on x86 and SPARC systems. Add the following line to /etc/system and reboot so the deadman timer will be enabled."

So this will force a kernel panic. I'll wait for the system to hang and give it a try. Again, thanks for all the help.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Ok, dump uploaded!

Thanks for your upload
Your file has been stored as "/cores/redshirt-vmdump.0" on the Supportfiles service.
Size of the file (in bytes): 1743978496.
The file has a cksum of: 2878443682.

It's about 1.7 GB compressed!
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I don't mean to sound ungrateful (because I really do appreciate all the help I have received here), but I am really missing the use of my server. Over Christmas, I want to be able to use my laptop (right now, it's acting as a server for some of the things my OpenSolaris server did). This means I will need to get my server back up and running in full working order by then. All the data that I lost is unimportant data, so I'm not really missing anything there. Again, I do appreciate all the help, but I'm going to "give up" if no solution can be found in the next couple of days. This is simply because I want to be able to use my hardware. What I plan on doing is simply formatting each disk that was part of the bad pool and creating a new one.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I still haven't given up :) I moved my Virtual Machines to my main rig (which gets rebooted often, so this is 'not optimal' to say the least) :)

I have since upgraded to snv_129. I noticed that even if Time Slider / auto snapshots are disabled, a zpool command still gets generated every 15 minutes. Since all zpool/zfs commands freeze during the import, I'd have hundreds of hung zpool processes. I stopped this by commenting out all jobs on the zfssnap crontab as well as the auto-snap cleanup job on root's crontab. This did nothing to resolve my issue, but I figured I should note it. I'd copy and paste the exact jobs, but my server is once again hung.

I'm going to upgrade my server (new motherboard that supports more than 4 GB of RAM). I'll have double the RAM; perhaps there is some sort of RAM issue going on. I really wanted to get 16 GB of RAM, but my own personal budget will not allow it :)
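A hedged sketch of the comment-out approach described above (the zfssnap crontab name comes from the thread; the sample cron entry is made up for illustration). Commenting lines out rather than deleting them makes the snapshot jobs easy to restore after the import finishes:

```shell
# Prefix '#' to every line that isn't already a comment.
comment_out() { sed 's/^[^#]/#&/'; }

# Against the real crontab this would be something like (run as root,
# Solaris syntax), then install the result back for that user:
#   crontab -l zfssnap | comment_out
# Demonstration on a made-up entry:
echo '0,15,30,45 * * * * /usr/lib/zfs-auto-snapshot frequent' | comment_out
```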
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Just wondering: how much RAM is in your system?
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
It sounds like you have less data on yours; perhaps that is why yours freezes faster. Whatever mine is doing during the import, it now reads my disks for nearly 24 hours and then starts writing to them. The reads start out fast, then they just sit, going at something like 20 KB/second on each disk in my raidz1 pool. As soon as it's done reading whatever it's reading, it starts to write, and that is when the freeze happens. I think the folks from Sun who have been assisting here are on holiday break; I'm guessing there won't be further assistance from them until after the first of the year.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
Here is iostat output of my disks being read:

    r/s   w/s   kr/s  kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
   45.3   0.0   27.6   0.0   0.0   0.6     0.0    13.3   0  60  c3d0
   44.3   0.0   27.0   0.0   0.0   0.3     0.0     7.7   0  34  c3d1
   43.5   0.0   27.4   0.0   0.0   0.5     0.0    12.6   0  55  c4d0
   41.1   0.0   24.9   0.0   0.0   0.3     0.0     8.0   0  33  c4d1

Very, very slow. It didn't used to take as long to freeze for me, but every time I restart the process, the 'reading' portion of the zpool import seems to take much longer.
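As a quick sanity check, the kr/s column from that iostat output can be totaled to see the aggregate read rate during the import (a small sketch using the four r/s and kr/s values above):

```shell
# Sum the kr/s (KB read per second) column from the iostat sample above.
printf '45.3 27.6\n44.3 27.0\n43.5 27.4\n41.1 24.9\n' |
  awk '{kr += $2} END {print kr " KB/s total"}'
# -> 106.9 KB/s total across all four raidz1 disks
```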
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
One thing that bugged me is that I could not ssh to my box as myself while a zpool import is running; it just hangs after accepting my password, and I had to convert root from a role to a user and ssh in as root. I now know why: when I log in, /usr/sbin/quota gets called. That must run a zfs or zpool command to get quota information, which hangs during an import.
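Since the hang comes from the quota check at login, one hedged workaround is to comment it out of /etc/profile until the pool is healthy (the line below is how it appears on my install; check yours before editing anything):

```shell
# Hedged sketch: a sed pass that comments out the /usr/sbin/quota call,
# shown here against a sample line rather than the live /etc/profile.
line='/usr/sbin/quota'
echo "$line" | sed 's|^/usr/sbin/quota|# &|'
# -> # /usr/sbin/quota
```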
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I got my pool back! I did a rig upgrade (new motherboard, processor, and 8 GB of RAM), re-installed OpenSolaris 2009.06, upgraded to snv_130, and did the import. The import only took about 4 hours! I have a hunch that I was previously running into some sort of issue with not having enough RAM. Of course, that's just a guess.
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I should note that my import command was: zpool import -f vault
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> Yeah, still no joy on getting my pool back. I think I might have to
> try grabbing another server with a lot more memory and slapping the
> HBA and the drives in that. Can ZFS deal with a controller change?

Just some more info that may help: after I upgraded to 8 GB of RAM, I did not limit the amount of RAM ZFS can take. So if you are doing any kind of limiting in /etc/system, you may want to take that out.
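For reference, the sort of /etc/system limiting being discussed looks like the fragment below. The 4 GB cap is purely illustrative; zfs:zfs_arc_max is the standard ARC-size tunable, and a reboot is needed for /etc/system changes to take effect.

```
* /etc/system fragment: cap the ZFS ARC at 4 GB (value is in bytes).
* Remove or comment out a line like this to let ZFS use all available RAM.
set zfs:zfs_arc_max = 0x100000000
```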
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
> That's the thing, the drive lights aren't blinking, but I was
> thinking maybe the writes are going so slow that it's possible they
> aren't registering. And since I can't keep a running iostat, I can't
> tell if anything is going on. I can however get into kmdb; is there
> something in there that can monitor storage activity or anything?
> Probably not, but it's worth asking.
>
> Oh, and for the other guys, was your ZIL on an SSD or in the pool? My
> ZIL is on a 30 GB SSD from OCZ and my l2arc is on another SSD of the
> same type. I'm wondering if your ZILs are in the pool and therefore
> helping your recovery where I may be hitting a simultaneous bug.

No SSDs in my system; my ZIL is in the pool.
Re: [zfs-discuss] $100 SSD = >5x faster dedupe
Are you using the SSD for l2arc, zil, or both?
Re: [zfs-discuss] $100 SSD = >5x faster dedupe
> Just l2arc. Guess I can always repartition later.
>
> mike
>
> On Sun, Jan 3, 2010 at 11:39 AM, Jack Kielsmeier wrote:
> > Are you using the SSD for l2arc or zil or both?

This is good to know. I have considered purchasing an SSD, but have not wanted to make the investment without knowing how much it would help. It is suggested not to put the ZIL on a device external to the disks in the pool unless you mirror the ZIL device; this is to prevent data loss if the ZIL device dies. With L2ARC, no such redundancy is needed. So, with a $100 SSD, if you can get 8x the performance out of your dedup'd dataset and you don't have to worry about "what if the device fails", I'd call that an awesome investment.
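The mirrored-ZIL and L2ARC layout mentioned above can be sketched with standard zpool syntax; this is just an illustration, and the device names are hypothetical placeholders:

```shell
# Hedged sketch: add a mirrored log (ZIL) device so a single SSD failure
# cannot lose in-flight synchronous writes, plus one cache (L2ARC)
# device, which needs no redundancy. Device names are placeholders.
# zpool add vault log mirror c5t0d0 c5t1d0
# zpool add vault cache c5t2d0
```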
Re: [zfs-discuss] $100 SSD = >5x faster dedupe
> On Sun, 3 Jan 2010, Jack Kielsmeier wrote:
> > It is suggested not to put zil on a device external to the disks in
> > the pool unless you mirror the zil device. This is suggested to
> > prevent data loss if the zil device dies.
>
> The reason why it is suggested that the intent log reside in the same
> chassis where the pool disks live is so that it is easier to move the
> pool to a different computer if need be. This is an independent
> problem from mirroring. If the intent log device is easily moved to
> the new computer, then there is not really a problem.
>
> Bob Friesenhahn

Ah, it looks like the information I was recalling about the ZIL is now outdated: http://bugs.opensolaris.org/view_bug.do?bug_id=6707530 (this now appears to be fixed, and has been fixed for some time, since snv_96).
Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled
I'm thinking that the issue is simply with zfs destroy, not with dedup or compression. Yesterday I decided to do some iSCSI testing: I created a new 1 TB dataset in my pool, with neither compression nor dedup. After copying about 700 GB of data from my Windows box (NTFS on top of the iSCSI disk), I decided I didn't want to use it, so I attempted to delete the dataset. Once again, the command froze. I removed the ZFS cache file and am now trying to import my pool... again. This time, the memory fills up QUICKLY; I hit 8 GB used in about an hour, then the box completely freezes. iostat shows each of my disks being read at about 10 MB/s up until the freeze. It does not matter if I limit the l2arc size in /etc/system; the behavior is the same.
[zfs-discuss] ZFS import hangs with over 66000 context switches shown in top
Howdy All, I made a 1 TB ZFS volume within a 4.5 TB zpool called vault for testing iSCSI. Both dedup and compression were off. After my tests, I issued a zfs destroy to remove the volume. This command hung. After 5 hours, I hard rebooted into single-user mode and removed my ZFS cache file (I had to do this in order to boot at all; with the cache file present, my system would hang at reading the ZFS config). Now I cannot import my pool, as the box always hangs after about 30 minutes. It's not a complete hang: I can still ping the box, but I cannot do anything. The keyboard is still responsive, but the server does nothing with any input I make, and I cannot ssh to the box. The only thing I can do is hard reboot. At first I thought I was running out of RAM, because the hang always happened right when my free RAM hit 0 (still with swap available), but I've made tweaks to /etc/system and now I get freezes with over a gig of RAM free (8 GB total in the box). The strange thing is, the context switches shown in top skyrocket from about 2,000-6,000 to over 66,000 just before the freeze. Would anyone know why they would skyrocket like that? If I do a ps -ef before the freeze, there is a normal number of processes running. I have also tried a zpool import -f vault using a snv_130 live CD, as well as a zpool import -fFX vault; the same thing happens. System specs: snv_130, AMD Phenom 925, 8 GB DDR2 RAM, 2x 500 GB mirrored rpool drives, 4x 1.5 TB vault raidz1 drives. I have let the import run for over 24 hours with no luck. Thanks for the assistance.
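The cache-file removal described above looks roughly like this (assuming the stock cache file location, /etc/zfs/zpool.cache; shown as a sketch, not something to run blindly):

```shell
# Hedged sketch: from single-user mode, move the pool cache file aside so
# boot stops trying to open the damaged pool at "reading zfs config".
# mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad
# reboot
# ...then after boot, attempt the manual import:
# zpool import -f vault
```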
Re: [zfs-discuss] ZFS import hangs with over 66000 context switches shown in top
I should also mention that once the "lock" starts, the disk activity light on my case stays busy for a bit (1-2 minutes max), then does nothing.
Re: [zfs-discuss] (snv_129, snv_130) can't import zfs pool
Just curious if anything has happened here. I had a similar issue that was solved by upgrading from 4 GB to 8 GB of RAM. I now have the issue again, and my box hard-locks when doing the import after about 30 minutes (this time not using dedup, but using iSCSI). I debated upgrading to 16 GB of RAM, but can't justify the cost. Hoping some sort of bug is found and fixed in a future release so that I may get my 4.5 TB pool back.
Re: [zfs-discuss] (snv_129, snv_130) can't import zfs pool
I'd like to thank Tim and Cindy at Sun for providing me with a new zfs binary that fixed my issue. I was able to get my zpool back! Hurray! Thank you.