Re: [zfs-discuss] what is zfs doing during a log resilver?

2010-09-03 Thread Giovanni Tirloni
On Thu, Sep 2, 2010 at 10:18 AM, Jeff Bacon wrote:

> So, when you add a log device to a pool, it initiates a resilver.
>
> What is it actually doing, though? Isn't the slog a copy of the
> in-memory intent log? Wouldn't it just simply replicate the data that's
> in the other log, checked against what's in RAM? And presumably there
> isn't that much data in the slog so there isn't that much to check?
>
> Or is it just doing a generic resilver for the sake of argument because
> you changed something?
>

Good question. Here it takes a little over an hour to resilver a 32GB SSD in a
mirror. I've always wondered what exactly it is doing, since the slog is only
supposed to hold about 30 seconds' worth of data. It also generates lots of
checksum errors.
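
For reference, the operation under discussion is roughly the following (pool
and device names here are hypothetical):

  # zpool attach tank c4t1d0 c4t2d0
      (attach a second device to the existing slog, turning it into a
       mirror; this is the kind of operation that kicks off the resilver)
  # zpool status -v tank
      (shows resilver progress and the per-device READ/WRITE/CKSUM counters)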

-- 
Giovanni Tirloni
gtirl...@sysdroid.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pool died during scrub

2010-09-03 Thread Cia Watson

This may or may not be helpful, and I don't run a RAID setup, but I do have an
external USB drive where I've created a pool for rsync backups and for importing
snapshots. The current status of that pool is UNAVAIL / insufficient replicas,
just as yours shows above. I've found I can get it back online by turning the
drive on and then running 'zpool clear poolname' (in your case srv, and without
quotes of course).

It just might work for you, though I'm running OpenSolaris snv_134 and your
situation isn't quite the same.
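
For what it's worth, what I do on my box looks roughly like this (substitute
your pool name, srv, for mine):

  # zpool clear srv
      (clears the device errors recorded for the pool once the drive is
       powered on and visible again)
  # zpool status srv
      (check whether the pool has come back ONLINE)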

Cia W



Jeff Bacon wrote:

> ny-fs4(71)# zpool import


  pool: srv
    id: 6111323963551805601
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        srv             UNAVAIL  insufficient replicas
        logs
        srv             UNAVAIL  insufficient replicas
          mirror        ONLINE
            c3t0d0s4    ONLINE   <-- box doesn't even have a c3
            c0t0d0s4    ONLINE   <-- what it's looking at - leftover
                                     from who knows what

  pool: srv
    id: 9515618289022845993
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported.  Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with SAN's and HA

2010-09-03 Thread Peter Karlsson

Hi Michael,

Have a look at this blog post / white paper,
http://blogs.sun.com/TF/entry/new_white_paper_practicing_solaris, for an
example of how to use an iSCSI target from a NAS device as storage. You
can just replace the Tomcat/MySQL HA services with HA NFS and you have
what you are looking for.
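
Very roughly, and only as a sketch (the resource and group names below are
made up, and you should check the HAStoragePlus and HA-NFS docs for the
exact properties required), the cluster side looks something like:

  # clresourcetype register SUNW.HAStoragePlus
  # clresourcegroup create nfs-rg
  # clresource create -g nfs-rg -t SUNW.HAStoragePlus \
        -p Zpools=sharedpool hasp-rs
  # ... register SUNW.nfs and create the NFS resource in the same group,
      with a dependency on hasp-rs ...
  # clresourcegroup online -M nfs-rg

HAStoragePlus imports the shared zpool on whichever node currently owns the
resource group, so the pool and the NFS service on top of it fail over as a
unit.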


/peter

On 8/27/10 11:25, Michael Dodwell wrote:

Lao,

I had a look at HAStoragePlus etc. and, from what I understand, that's for
mirroring local storage across 2 nodes so that services can access it
'DRBD style'.

Having read through the documentation on the Oracle site, the cluster
software, from what I gather, is about clustering services together
(Oracle/Apache etc.), and again any documentation I've found on storage is
about duplicating local storage to multiple hosts for HA failover. I can't
really see anything on clustering services to use shared storage/ZFS pools.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS offline ZIL corruption not detected

2010-09-03 Thread Darren J Moffat

On 26/08/2010 15:42, David Magda wrote:

Does a scrub go through the slog and/or L2ARC devices, or only the
"primary" storage components?


A scrub traverses the datasets, including the ZIL, so the scrub will read
(and, if needed, resilver) a slog device too.


http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/dmu_traverse.c
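
In practice that just means the slog shows up in the normal scrub workflow,
e.g. (pool name hypothetical):

  # zpool scrub tank
  # zpool status -v tank
      (the log mirror is listed, and repaired if needed, along with the
       data vdevs)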

A scrub does not traverse an L2ARC device because we hold in-memory
checksums (in the ARC header) for everything on the cache devices; if we
get a checksum failure on read, we remove the L2ARC cached entry and read
from the main pool again.   The L2ARC cache devices are purely caches:
there is NEVER data on them that isn't already on the main pool devices.
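
If you want to watch that happening from the outside, the ARC kstats carry
counters for it; the statistic names below come from the arcstats kstat and
are meant only as an illustration of what to look at:

  # kstat -p zfs:0:arcstats:l2_cksum_bad zfs:0:arcstats:l2_io_error
      (a non-zero l2_cksum_bad count means cached entries failed their
       checksum on read and were quietly re-fetched from the main pool)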


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] possible ZFS-related panic?

2010-09-03 Thread Cindy Swearingen

Hi Marion,

I'm not the right person to analyze your panic stack, but a quick
search says the "page_sub: bad arg(s): pp" panic string might be
associated with a bad CPU or a page-locking problem.

I would recommend running CPU/memory diagnostics on this system.


Thanks,

Cindy

On 09/02/10 20:31, Marion Hakanson wrote:

Folks,

Has anyone seen a panic traceback like the following?  This is Solaris 10u7
on a Thumper, acting as an NFS server.  The machine had been up for nearly a
year.  I added a dataset to an existing pool, set compression=on for the
first time on this system, loaded some data in there (via "rsync"), and
then mounted it on the NFS client.
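
(For the record, the sequence was essentially the following; the pool and
dataset names here are placeholders rather than the real ones:

  # zfs create bigpool/newdata
  # zfs set compression=on bigpool/newdata
  # rsync -a /source/ /bigpool/newdata/
  # zfs set sharenfs=on bigpool/newdata

and then the dataset was mounted from the NFS client.)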

The first data was written by the client itself in a 10pm cron-job, and
the system crashed at 10:02pm as below:

panic[cpu2]/thread=fe8000f5cc60: page_sub: bad arg(s): pp 
872b5610, *ppp 0


fe8000f5c470 unix:mutex_exit_critical_size+20219 ()
fe8000f5c4b0 unix:page_list_sub_pages+161 ()
fe8000f5c510 unix:page_claim_contig_pages+190 ()
fe8000f5c600 unix:page_geti_contig_pages+44b ()
fe8000f5c660 unix:page_get_contig_pages+c2 ()
fe8000f5c6f0 unix:page_get_freelist+1a4 ()
fe8000f5c760 unix:page_create_get_something+95 ()
fe8000f5c7f0 unix:page_create_va+2a1 ()
fe8000f5c850 unix:segkmem_page_create+72 ()
fe8000f5c8b0 unix:segkmem_xalloc+60 ()
fe8000f5c8e0 unix:segkmem_alloc_vn+8a ()
fe8000f5c8f0 unix:segkmem_alloc+10 ()
fe8000f5c9c0 genunix:vmem_xalloc+315 ()
fe8000f5ca20 genunix:vmem_alloc+155 ()
fe8000f5ca90 genunix:kmem_slab_create+77 ()
fe8000f5cac0 genunix:kmem_slab_alloc+107 ()
fe8000f5caf0 genunix:kmem_cache_alloc+e9 ()
fe8000f5cb00 zfs:zio_buf_alloc+1d ()
fe8000f5cb50 zfs:zio_compress_data+ba ()
fe8000f5cba0 zfs:zio_write_compress+78 ()
fe8000f5cbc0 zfs:zio_execute+60 ()
fe8000f5cc40 genunix:taskq_thread+bc ()
fe8000f5cc50 unix:thread_start+8 ()

syncing file systems... done
. . .

Unencumbered by more than a gut feeling, I disabled compression on
the dataset, and we've gotten through two nightly runs of the same
NFS client job without crashing, but of course we would technically
have to wait for nearly a year before we've exactly replicated the
original situation (:-).

Unfortunately the dump slice was slightly too small; we were just short
of enough space to capture the whole 10GB crash dump.  I did get savecore
to write something out, and I uploaded it to the Oracle support site, but it
gives "scat" too much indigestion to be useful to the engineer I'm working
with.  They have not found any matching bugs so far, so I thought I'd ask a
slightly wider audience here.

Thanks and regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
