Re: [zfs-discuss] what is zfs doing during a log resilver?
On Thu, Sep 2, 2010 at 10:18 AM, Jeff Bacon wrote:

> So, when you add a log device to a pool, it initiates a resilver.
>
> What is it actually doing, though? Isn't the slog a copy of the
> in-memory intent log? Wouldn't it just simply replicate the data that's
> in the other log, checked against what's in RAM? And presumably there
> isn't that much data in the slog, so there isn't that much to check?
>
> Or is it just doing a generic resilver for the sake of argument because
> you changed something?

Good question. Here it takes a little over 1 hour to resilver a 32GB SSD
in a mirror. I've always wondered what exactly it was doing, since it was
supposed to be 30 seconds' worth of data. It also generates lots of
checksum errors.

-- 
Giovanni Tirloni
gtirl...@sysdroid.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
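[For reference, the operation being discussed looks like the following; the pool and device names here are made up, and the behaviour requires a live system to observe:]

```shell
# Attach a dedicated slog device to an existing pool; on the builds
# discussed in this thread, this kicks off a resilver.
# (Pool "tank" and device "c4t0d0s0" are hypothetical examples.)
zpool add tank log c4t0d0s0

# Watch the resilver progress and duration.
zpool status -v tank

# After it completes, check for the checksum errors mentioned above;
# -x reports only pools with problems.
zpool status -x
```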
Re: [zfs-discuss] pool died during scrub
This may or may not be helpful, and I don't run a RAID, but I do have an
external USB drive where I've created a pool for rsync backups and to
import snapshots, and the current status of the pool is UNAVAIL
insufficient replicas, as yours shows above. I've found I can get it back
online by turning on the drive and then using 'zpool clear poolname' (in
your case srv, and without quotes of course). It just might work for you,
though I'm running OpenSolaris snv_134 and your situation isn't quite the
same.

Cia W

Jeff Bacon wrote:
> ny-fs4(71)# zpool import
>   pool: srv
>     id: 6111323963551805601
>  state: UNAVAIL
> status: The pool was last accessed by another system.
> action: The pool cannot be imported due to damaged devices or data.
>    see: http://www.sun.com/msg/ZFS-8000-EY
> config:
>
>         srv            UNAVAIL  insufficient replicas
>         logs
>           srv          UNAVAIL  insufficient replicas
>             mirror     ONLINE
>               c3t0d0s4 ONLINE   <- box doesn't even have a c3
>               c0t0d0s4 ONLINE   <- what it's looking at - leftover from who knows what
>
>   pool: srv
>     id: 9515618289022845993
>  state: UNAVAIL
> status: One or more devices are missing from the system.
> action: The pool cannot be imported. Attach the missing devices and try again.
>    see: http://www.sun.com/msg/ZFS-8000-6X
> config:
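[The recovery sequence suggested above, spelled out; this is a sketch only, it assumes the missing device is back online, and the pool name srv and the numeric id come from the thread:]

```shell
# With the external drive powered back on, clear the pool's error state
# and try the import again.
zpool clear srv
zpool import srv

# If two pools share the name (as in the output above), list importable
# pools and import the specific one by its numeric id instead.
zpool import
zpool import 9515618289022845993
```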
Re: [zfs-discuss] ZFS with SAN's and HA
Hi Michael,

Have a look at this blog/white paper
http://blogs.sun.com/TF/entry/new_white_paper_practicing_solaris
for an example of how to use an iSCSI target from a NAS device as
storage. You can just replace the Tomcat/MySQL HA services with HA NFS
and you have what you are looking for.

/peter

On 8/27/10 11:25, Michael Dodwell wrote:
> Lao,
>
> I had a look at HAStoragePlus etc., and from what I understand that's
> to mirror local storage across 2 nodes for services to be able to
> access, 'DRBD style'. Having read through the documentation on the
> Oracle site, the cluster software from what I gather is about how to
> cluster services together (Oracle/Apache etc.), and again any
> documentation I've found on storage is about how to duplicate local
> storage to multiple hosts for HA failover. I can't really see anything
> on clustering services to use shared storage/ZFS pools.
Re: [zfs-discuss] ZFS offline ZIL corruption not detected
On 26/08/2010 15:42, David Magda wrote:
> Does a scrub go through the slog and/or L2ARC devices, or only the
> "primary" storage components?

A scrub traverses datasets including the ZIL, so the scrub will read
(and if needed resilver) a slog device too:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/dmu_traverse.c

A scrub does not traverse an L2ARC device, because we hold in-memory
checksums (in the ARC header) for everything on the cache devices; if we
get a checksum failure on read, we remove the L2ARC cached entry and
read from the main pool again. The L2ARC cache devices are purely
caches; there is NEVER data on them that isn't already on the main pool
devices.

-- 
Darren J Moffat
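[In command terms, what the above means; pool name "tank" is a made-up example:]

```shell
# Start a scrub: this traverses the datasets, which includes the ZIL,
# and therefore reads (and can repair) the slog device -- but it never
# touches the L2ARC cache device, which is verified lazily on read.
zpool scrub tank

# Check scrub progress and any repairs made.
zpool status -v tank
```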
Re: [zfs-discuss] possible ZFS-related panic?
Hi Marion,

I'm not the right person to analyze your panic stack, but a quick search
says the "page_sub: bad arg(s): pp" panic string might be associated with
a bad CPU or a page locking problem.

I would recommend running CPU/memory diagnostics on this system.

Thanks,

Cindy

On 09/02/10 20:31, Marion Hakanson wrote:

Folks,

Has anyone seen a panic traceback like the following? This is
Solaris-10u7 on a Thumper, acting as an NFS server. The machine was up
for nearly a year; I added a dataset to an existing pool, set
compression=on for the first time on this system, loaded some data in
there (via "rsync"), then mounted it to the NFS client. The first data
was written by the client itself in a 10pm cron job, and the system
crashed at 10:02pm as below:

panic[cpu2]/thread=fe8000f5cc60: page_sub: bad arg(s): pp 872b5610, *ppp 0

  fe8000f5c470 unix:mutex_exit_critical_size+20219 ()
  fe8000f5c4b0 unix:page_list_sub_pages+161 ()
  fe8000f5c510 unix:page_claim_contig_pages+190 ()
  fe8000f5c600 unix:page_geti_contig_pages+44b ()
  fe8000f5c660 unix:page_get_contig_pages+c2 ()
  fe8000f5c6f0 unix:page_get_freelist+1a4 ()
  fe8000f5c760 unix:page_create_get_something+95 ()
  fe8000f5c7f0 unix:page_create_va+2a1 ()
  fe8000f5c850 unix:segkmem_page_create+72 ()
  fe8000f5c8b0 unix:segkmem_xalloc+60 ()
  fe8000f5c8e0 unix:segkmem_alloc_vn+8a ()
  fe8000f5c8f0 unix:segkmem_alloc+10 ()
  fe8000f5c9c0 genunix:vmem_xalloc+315 ()
  fe8000f5ca20 genunix:vmem_alloc+155 ()
  fe8000f5ca90 genunix:kmem_slab_create+77 ()
  fe8000f5cac0 genunix:kmem_slab_alloc+107 ()
  fe8000f5caf0 genunix:kmem_cache_alloc+e9 ()
  fe8000f5cb00 zfs:zio_buf_alloc+1d ()
  fe8000f5cb50 zfs:zio_compress_data+ba ()
  fe8000f5cba0 zfs:zio_write_compress+78 ()
  fe8000f5cbc0 zfs:zio_execute+60 ()
  fe8000f5cc40 genunix:taskq_thread+bc ()
  fe8000f5cc50 unix:thread_start+8 ()

syncing file systems... done
. . .
Unencumbered by more than a gut feeling, I disabled compression on the
dataset, and we've gotten through two nightly runs of the same NFS
client job without crashing, but of course we would technically have to
wait for nearly a year before we've exactly replicated the original
situation (:-).

Unfortunately the dump slice was slightly too small; we were just short
of enough space to capture the whole 10GB crash dump. I did get savecore
to write something out, and I uploaded it to the Oracle support site,
but it gives "scat" too much indigestion to be useful to the engineer
I'm working with. They have not found any matching bugs so far, so I
thought I'd ask a slightly wider audience here.

Thanks and regards,

Marion
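[The workaround Marion describes, for anyone in the same spot; the dataset name below is hypothetical, since the thread doesn't give the real one:]

```shell
# Turn compression back off on the suspect dataset and confirm;
# "tank/export" is a made-up example name.
zfs set compression=off tank/export
zfs get compression tank/export

# Note this only affects newly written blocks; data already written
# compressed stays compressed on disk.
```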