Dennis Clarke wrote:
whoa whoa ... just one bloody second .. whoa ..
That looks like a real nasty bug description there.
What are the details on that? Is this particular to a given system or
controller config or something like that, or are we talking global to Solaris
10 Update 2 everywhere ?? :-(
That's a good question. Looking at the internal evaluation, it appears
scrubs can be a little too aggressive.
Perhaps one of the ZFS engineers can comment, Jeff?
I am curious about the "fix delivered" state as well. Looks like it's
been fixed in snv_36, but I wonder if there will be a patch available.
--joe
Bug ID: 6355416
Synopsis: zpool scrubbing consumes all memory, system hung
Category: kernel
Subcategory: zfs
State: 10-Fix Delivered <<-- in a patch somewhere ?
Description:
On a 6800 domain with 8G of RAM I created a zpool using a single 18G drive
and on that pool created a file system and a zvol. The zvol was filled with
data.
# zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
pool                   11.0G  5.58G  9.00K  /pool
pool/fs                   8K  5.58G     8K  /pool/fs
pool/[EMAIL PROTECTED]      0      -     8K  -
pool/root              11.0G  5.58G  11.0G  -
pool/[EMAIL PROTECTED]   783K      -  11.0G  -
#
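The setup above would have been created with commands roughly like the
following. This is a hypothetical reconstruction: the device name, zvol name
and size, and snapshot names were not given in the report (the snapshot names
were scrubbed to [EMAIL PROTECTED]), so all of those are placeholders. The
commands are echoed rather than executed, so the sequence is safe to read
through:

```shell
#!/bin/sh
# Dry-run sketch of the reported setup; run() echoes commands, not executes.
run() { echo "+ $*"; }

run zpool create pool c1t0d0        # single 18G drive (device name assumed)
run zfs create pool/fs              # the file system
run zfs create -V 11g pool/root     # the zvol (name/size assumed from listing)
run zfs snapshot pool/fs@snap       # snapshot names scrubbed in the report
run zfs snapshot pool/root@snap
```

The zvol was then filled with data before the second disk was attached.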
I then attached a second 18G drive to the pool and all seemed well. After a
few minutes, however, the system ground to a halt. No response from the
keyboard.
When I aborted the system it failed to dump because the dump device was too
small. On rebooting it did not make it into multi-user.
Booting with milestone=none and then bringing it up by hand, I could see it
hung doing zfs mount -a.
Booting with milestone=none again, I was able to export the pool, and then the
system would come up into multiuser. Any attempt to import the pool would
hang the system, with vmstat showing it consumed all available memory.
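The recovery path described here (boot with the none milestone so nothing
mounts, export the pool by hand, then proceed to multiuser) can be sketched as
below. The boot argument and svcadm usage are standard Solaris SMF practice;
the pool name comes from the report. Commands are echoed, not executed:

```shell
#!/bin/sh
# Dry-run sketch of the milestone=none recovery; run() echoes, not executes.
run() { echo "+ $*"; }

# At the OBP prompt: boot -m milestone=none
run zpool export pool       # detach the pool so zfs mount -a cannot hang
run svcadm milestone all    # then bring the system up to multiuser
```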
With the pool exported I reinstalled the system with a larger dump device
and then imported the pool. The same hang occurred, but this time I got
the crash dump.
Dumps can be found here:
/net/enospc.uk/export/esc/pts-crashdumps/zfs_nomemory
Dump 0 is from a stock build 72a; dump 1 is from my workspace and had
KMF_AUDIT set. The only change in my workspace is to the isp driver.
::kmausers gives:
365010944 bytes for 44557 allocations with data size 8192:
kmem_cache_alloc+0x148
segkmem_xalloc+0x40
segkmem_alloc+0x9c
vmem_xalloc+0x554
vmem_alloc+0x214
kmem_slab_create+0x44
kmem_slab_alloc+0x3c
kmem_cache_alloc+0x148
kmem_zalloc+0x28
zio_create+0x3c
zio_vdev_child_io+0xc4
vdev_mirror_io_start+0x1ac
spa_scrub_cb+0xe4
traverse_segment+0x2e8
traverse_more+0x7c
362520576 bytes for 44253 allocations with data size 8192:
kmem_cache_alloc+0x148
segkmem_xalloc+0x40
segkmem_alloc+0x9c
vmem_xalloc+0x554
vmem_alloc+0x214
kmem_slab_create+0x44
kmem_slab_alloc+0x3c
kmem_cache_alloc+0x148
kmem_zalloc+0x28
zio_create+0x3c
zio_read+0x54
spa_scrub_io_start+0x88
spa_scrub_cb+0xe4
traverse_segment+0x2e8
traverse_more+0x7c
241177600 bytes for 376840 allocations with data size 640:
kmem_cache_alloc+0x88
kmem_zalloc+0x28
zio_create+0x3c
zio_vdev_child_io+0xc4
vdev_mirror_io_done+0x254
taskq_thread+0x1a0
209665920 bytes for 327603 allocations with data size 640:
kmem_cache_alloc+0x88
kmem_zalloc+0x28
zio_create+0x3c
zio_read+0x54
spa_scrub_io_start+0x88
spa_scrub_cb+0xe4
traverse_segment+0x2e8
traverse_more+0x7c
I have attached the full output.
If I am quick I can detach the disk and then export the pool before the
system grinds to a halt. Reimporting the pool, I can then access the data.
Attaching the disk again results in the system using all the memory again.
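That workaround amounts to racing the scrub: pull the newly attached mirror
half before memory is exhausted, export to stop the traversal, then reimport.
A dry-run sketch, with the second disk's device name assumed (it was not given
in the report); commands are echoed, not executed:

```shell
#!/bin/sh
# Dry-run sketch of the workaround; run() echoes commands, not executes.
run() { echo "+ $*"; }

run zpool detach pool c1t1d0   # pull the newly attached disk (name assumed)
run zpool export pool          # stop the scrub traversal before memory runs out
run zpool import pool          # data is accessible again after reimport
```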
Date Modified: 2005-11-25 09:03:07 GMT+00:00
Work Around:
Suggested Fix:
Evaluation:
Fixed by patch:
Integrated in Build: snv_36
Duplicate of:
Related Change Request(s):6352306 6384439 6385428
Date Modified: 2006-03-23 23:58:15 GMT+00:00
Public Summary:
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss