Re: [zfs-discuss] [dtrace-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

2009-09-23 Thread Jim Leonard
 The only thing that jumps out at me is the ARC size - 53.4GB, or most of
 your 64GB of RAM.  This in-and-of-itself is not necessarily a bad thing -
 if there are no other memory consumers, let ZFS cache data in the ARC.
 But if something is coming along to flush dirty ARC pages periodically

The workload is a set of 50 python processes, each receiving a stream of data 
via TCP/IP.  The processes run until they notice something interesting in the 
stream (sorry I can't be more specific), then they connect to a server via 
TCP/IP and issue a command or two.  Log files are written that take up about 
50 MB per day per process.  It's relatively low-traffic.
 
 I found what looked to be an applicable bug:
   CR 6699438 zfs induces crosscall storm under heavy mapped sequential
   read workload
 but the stack signature for the above bug is different than yours, and it
 doesn't sound like your workload is doing mmap'd sequential reads.  That
 said, I would be curious to know if your workload used mmap(), versus
 read/write?

I asked and they couldn't say.  It's Python, so I think it's unlikely.
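
I can try to verify with DTrace, though; something along these lines should 
count any mmap(2) calls those processes make (just a sketch, assuming they 
show up with an execname of "python"):

  dtrace -n 'syscall::mmap*:entry /execname == "python"/ { @[probefunc] = count(); }'

Of course that would only catch new mappings being created, not anything the 
processes mapped at startup.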

 For the ZFS folks just seeing this, here's the stack frame:
 
   unix`xc_do_call+0x8f
   unix`xc_wait_sync+0x36
   unix`x86pte_invalidate_pfn+0x135
   unix`hat_pte_unmap+0xa9
   unix`hat_unload_callback+0x109
   unix`hat_unload+0x2a
   unix`segkmem_free_vn+0x82
   unix`segkmem_zio_free+0x10
   genunix`vmem_xfree+0xee
   genunix`vmem_free+0x28
   genunix`kmem_slab_destroy+0x80
   genunix`kmem_slab_free+0x1be
   genunix`kmem_magazine_destroy+0x54
   genunix`kmem_depot_ws_reap+0x4d
   genunix`taskq_thread+0xbc
   unix`thread_start+0x8
 
 Let's see what the fsstat and zpool iostat data look like when this
 starts happening.

Both are unremarkable, I'm afraid.  Here's the fsstat from when it starts 
happening:

  new  name  name  attr  attr lookup rddir  read  read write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0    75     0      0     0     0     0    10 1.25M zfs
    0     0     0    83     0      0     0     0     0     7  896K zfs
    0     0     0    78     0      0     0     0     0    13 1.62M zfs
    0     0     0   229     0      0     0     0     0    29 3.62M zfs
    0     0     0   217     0      0     0     0     0    28 3.37M zfs
    0     0     0   212     0      0     0     0     0    26 3.03M zfs
    0     0     0   151     0      0     0     0     0    18 2.07M zfs
    0     0     0   184     0      0     0     0     0    31 3.41M zfs
    0     0     0   187     0      0     0     0     0    32 2.74M zfs
    0     0     0   219     0      0     0     0     0    24 2.61M zfs
    0     0     0   222     0      0     0     0     0    29 3.29M zfs
    0     0     0   206     0      0     0     0     0    29 3.26M zfs
    0     0     0   205     0      0     0     0     0    19 2.26M zfs
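
For what it's worth, next time it happens I can also sample the ARC size 
alongside the fsstat output, to see whether the ARC is actually being shrunk 
when the cross-call storm hits; something simple like

  kstat -p zfs:0:arcstats:size 5

(or an "echo ::arc | mdb -k" before and after) should show whether a reap of 
the zio buffers lines up with the storm.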


[zfs-discuss] zpool import hangs the entire server (please help; data included)

2009-07-03 Thread Jim Leonard
As the subject says, I can't import a seemingly okay raidz pool, and I really 
need to, as it holds some information that is newer than the last backup 
cycle :-(  I'm really in a bind; I hope someone can help...

Background:  A drive in a four-slice pool failed (I have to use slices due to a 
motherboard BIOS limitation; EFI labels cause POST to choke).  I exported the 
pool, powered down, replaced the drive, and now the entire server locks up when 
I attempt an import.  At first I suspected the zpool.cache was fubar'd, so I 
did an export to clear it:

--begin--
r...@fortknox:~# zpool export vault
cannot open 'vault': no such pool
===end===
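
(Side note: if the cachefile were still suspect at this point, I assume I 
could bypass it and scan the devices directly with something like

  zpool import -d /dev/dsk vault

but I haven't gone down that road yet.)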

Okay, so the .cache is good.  And it can somehow see there is a pool there, but 
when trying to import:

--begin--
r...@fortknox:~# zpool import
  pool: vault
id: 12084546386451079719
 state: DEGRADED
status: One or more devices are offlined.
action: The pool can be imported despite missing or damaged devices.  The
fault tolerance of the pool may be compromised if imported.
config:

vault DEGRADED
  raidz1  DEGRADED
c8t0d0s6  ONLINE
c8t1d0s6  ONLINE
c9t0d0s6  OFFLINE
c9t1d0s6  ONLINE
r...@fortknox:~# zpool import vault
===end===

...it hangs indefinitely, entire server locked up (although it responds to 
pings).  But I know the info is there because zdb -l /dev/dsk/c8t0d0s6 shows:

--begin--
r...@fortknox:~# zdb -l /dev/dsk/c8t0d0s6

LABEL 0

version=14
name='vault'
state=1
txg=689703
pool_guid=12084546386451079719
hostid=4288054
hostname='fortknox'
top_guid=18316851491481709534
guid=9202175319063431582
vdev_tree
    type='raidz'
    id=0
    guid=18316851491481709534
    nparity=1
    metaslab_array=23
    metaslab_shift=35
    ashift=9
    asize=6000749838336
    is_log=0
    children[0]
        type='disk'
        id=0
        guid=9202175319063431582
        path='/dev/dsk/c8t0d0s6'
        devid='id1,s...@sata_st31500341as9vs0n9kw/g'
        phys_path='/p...@0,0/pci1462,7...@7/d...@0,0:g'
        whole_disk=0
    children[1]
        type='disk'
        id=1
        guid=14662350669876577780
        path='/dev/dsk/c8t1d0s6'
        devid='id1,s...@sata_st31500341as9vs20p51/g'
        phys_path='/p...@0,0/pci1462,7...@7/d...@1,0:g'
        whole_disk=0
    children[2]
        type='disk'
        id=2
        guid=12094645433779503688
        path='/dev/dsk/c9t0d0s6'
        devid='id1,s...@sata_st31500341as9vs1l8vy/g'
        phys_path='/p...@0,0/pci1462,7...@8/d...@0,0:g'
        whole_disk=0
        DTL=179
        offline=1
        faulted=1
    children[3]
        type='disk'
        id=3
        guid=15554931888608113584
        path='/dev/dsk/c9t1d0s6'
        devid='id1,s...@sata_st31500341as9vs232h8/g'
        phys_path='/p...@0,0/pci1462,7...@8/d...@1,0:g'
        whole_disk=0

LABEL 1

version=14
name='vault'
state=1
txg=689703
pool_guid=12084546386451079719
hostid=4288054
hostname='fortknox'
top_guid=18316851491481709534
guid=9202175319063431582
vdev_tree
    type='raidz'
    id=0
    guid=18316851491481709534
    nparity=1
    metaslab_array=23
    metaslab_shift=35
    ashift=9
    asize=6000749838336
    is_log=0
    children[0]
        type='disk'
        id=0
        guid=9202175319063431582
        path='/dev/dsk/c8t0d0s6'
        devid='id1,s...@sata_st31500341as9vs0n9kw/g'
        phys_path='/p...@0,0/pci1462,7...@7/d...@0,0:g'
        whole_disk=0
    children[1]
        type='disk'
        id=1
        guid=14662350669876577780
        path='/dev/dsk/c8t1d0s6'
        devid='id1,s...@sata_st31500341as9vs20p51/g'
        phys_path='/p...@0,0/pci1462,7...@7/d...@1,0:g'
        whole_disk=0
    children[2]
        type='disk'
        id=2
        guid=12094645433779503688
        path='/dev/dsk/c9t0d0s6'
        devid='id1,s...@sata_st31500341as9vs1l8vy/g'
        phys_path='/p...@0,0/pci1462,7...@8/d...@0,0:g'
        whole_disk=0
        DTL=179
        offline=1
        faulted=1
    children[3]
        type='disk'