Dennis Clarke wrote:
whoa whoa ... just one bloody second .. whoa ..
That looks like a real nasty bug description there.
What are the details on that? Is this particular to a given system or
controller config or something like that, or are we talking global to Solaris
10 Update 2 everywhere ?? :-(
That's a good question. Looking at the internal evaluation, it appears
scrubs can be a little too aggressive.
Perhaps one of the ZFS engineers can comment, Jeff?
I am curious about the "fix delivered" state as well. Looks like it's
been fixed in snv_36, but I wonder if there will be a patch available.
--joe
Bug ID: 6355416
Synopsis: zpool scrubbing consumes all memory, system hung
Category: kernel
Subcategory: zfs
State: 10-Fix Delivered <<-- in a patch somewhere ?
Description:
On a 6800 domain with 8G of RAM I created a zpool using a single 18G drive
and on that pool created a file system and a zvol. The zvol was filled with
data.
# zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
pool                   11.0G  5.58G  9.00K  /pool
pool/fs                   8K  5.58G     8K  /pool/fs
pool/[EMAIL PROTECTED]      0      -     8K  -
pool/root              11.0G  5.58G  11.0G  -
pool/[EMAIL PROTECTED]   783K      -  11.0G  -
#
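The setup above would have been created with commands roughly like the
following. This is a hypothetical reconstruction: the device name, zvol name
and size, and snapshot names were not given in the report (the snapshot names
were scrubbed to [EMAIL PROTECTED]), so all of those are placeholders. The
commands are echoed rather than executed, so the sequence is safe to read
through:

```shell
#!/bin/sh
# Dry-run sketch of the reported setup; run() echoes commands, not executes.
run() { echo "+ $*"; }

run zpool create pool c1t0d0        # single 18G drive (device name assumed)
run zfs create pool/fs              # the file system
run zfs create -V 11g pool/root     # the zvol (name/size assumed from listing)
run zfs snapshot pool/fs@snap       # snapshot names scrubbed in the report
run zfs snapshot pool/root@snap
```

The zvol was then filled with data before the second disk was attached.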
I then attached a second 18G drive to the pool and all seemed well. After a
few minutes, however, the system ground to a halt. No response from the
keyboard.
When I aborted the system it failed to dump because the dump device was too
small. On rebooting it did not make it into multi-user.
Booting with milestone=none and then bringing it up by hand, I could see it
hung doing zfs mount -a.
Booting with milestone=none again, I was able to export the pool, and then the
system would come up into multiuser. Any attempt to import the pool would
hang the system, with vmstat showing it consumed all available memory.
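The recovery path described here (boot with the none milestone so nothing
mounts, export the pool by hand, then proceed to multiuser) can be sketched as
below. The boot argument and svcadm usage are standard Solaris SMF practice;
the pool name comes from the report. Commands are echoed, not executed:

```shell
#!/bin/sh
# Dry-run sketch of the milestone=none recovery; run() echoes, not executes.
run() { echo "+ $*"; }

# At the OBP prompt: boot -m milestone=none
run zpool export pool       # detach the pool so zfs mount -a cannot hang
run svcadm milestone all    # then bring the system up to multiuser
```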
With the pool exported I reinstalled the system with a larger dump device
and then imported the pool. The same hang occurred, but this time I got
the crash dump.
Dumps can be found here:
/net/enospc.uk/export/esc/pts-crashdumps/zfs_nomemory
Dump 0 is from a stock build 72a; dump 1 is from my workspace and had
KMF_AUDIT set. The only change in my workspace is to the isp driver.
::kmausers gives:
365010944 bytes for 44557 allocations with data size 8192:
kmem_cache_alloc+0x148
segkmem_xalloc+0x40
segkmem_alloc+0x9c
vmem_xalloc+0x554
vmem_alloc+0x214
kmem_slab_create+0x44
kmem_slab_alloc+0x3c
kmem_cache_alloc+0x148
kmem_zalloc+0x28
zio_create+0x3c
zio_vdev_child_io+0xc4
vdev_mirror_io_start+0x1ac
spa_scrub_cb+0xe4
traverse_segment+0x2e8
traverse_more+0x7c
362520576 bytes for 44253 allocations with data size 8192:
kmem_cache_alloc+0x148
segkmem_xalloc+0x40
segkmem_alloc+0x9c
vmem_xalloc+0x554
vmem_alloc+0x214
kmem_slab_create+0x44
kmem_slab_alloc+0x3c
kmem_cache_alloc+0x148
kmem_zalloc+0x28
zio_create+0x3c
zio_read+0x54
spa_scrub_io_start+0x88
spa_scrub_cb+0xe4
traverse_segment+0x2e8
traverse_more+0x7c
241177600 bytes for 376840 allocations with data size 640:
kmem_cache_alloc+0x88
kmem_zalloc+0x28
zio_create+0x3c
zio_vdev_child_io+0xc4
vdev_mirror_io_done+0x254
taskq_thread+0x1a0
209665920 bytes for 327603 allocations with data size 640:
kmem_cache_alloc+0x88
kmem_zalloc+0x28
zio_create+0x3c
zio_read+0x54
spa_scrub_io_start+0x88
spa_scrub_cb+0xe4
traverse_segment+0x2e8
traverse_more+0x7c
I have attached the full output.
If I am quick I can detach the disk and then export the pool before the
system grinds to a halt. Reimporting the pool, I can then access the data.
Attaching the disk again results in the system using all the memory again.
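That workaround amounts to racing the scrub: pull the newly attached mirror
half before memory is exhausted, export to stop the traversal, then reimport.
A dry-run sketch, with the second disk's device name assumed (it was not given
in the report); commands are echoed, not executed:

```shell
#!/bin/sh
# Dry-run sketch of the workaround; run() echoes commands, not executes.
run() { echo "+ $*"; }

run zpool detach pool c1t1d0   # pull the newly attached disk (name assumed)
run zpool export pool          # stop the scrub traversal before memory runs out
run zpool import pool          # data is accessible again after reimport
```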
Date Modified: 2005-11-25 09:03:07 GMT+00:00
Work Around:
Suggested Fix:
Evaluation:
Fixed by patch:
Integrated in Build: snv_36
Duplicate of:
Related Change Request(s):6352306 6384439 6385428
Date Modified: 2006-03-23 23:58:15 GMT+00:00
Public Summary:
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss