[zfs-discuss] multiple crashes upon boot after upgrading build 134 to 138, 139 or 140

2010-05-25 Thread Steve Gonczi
Greetings,

I see repeatable crashes on some systems after upgrading. The signature is
always the same:

operating system: 5.11 snv_139 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=ff00175f88c0 addr=0 
occurred in module "genunix" due to a NULL pointer dereference

list_remove+0x1b(ff03e19339f0, ff03e0814640)
zfs_acl_release_nodes+0x34(ff03e19339c0)
zfs_acl_free+0x16(ff03e19339c0)
zfs_znode_free+0x5e(ff03e17fa600)
zfs_zinactive+0x9b(ff03e17fa600)
zfs_inactive+0x11c(ff03e17f8500, ff03ee867528, 0)
fop_inactive+0xaf(ff03e17f8500, ff03ee867528, 0)
vn_rele_dnlc+0x6c(ff03e17f8500)
dnlc_purge+0x175()
nfs_idmap_args+0x5e(ff00175f8c38)
nfssys+0x1e1(12, 8047dd8)

The stack always looks like the above; the vnode involved is sometimes a file,
sometimes a directory. For example, I have seen the /boot/acpi directory and the
/kernel/drv/amd64/acpi_driver file in the vnode's path field.
 
Looking at the data, I notice that z_acl.list_head indicates a single member
in the list (presuming that is the case because list_prev and list_next point
to the same address):

(ff03e19339c0)::print zfs_acl_t
{
z_acl_count = 0x6
z_acl_bytes = 0x30
z_version = 0x1
z_next_ace = 0xff03e171d210
z_hints = 0
z_curr_node = 0xff03e0814640
z_acl = {
list_size = 0x40
list_offset = 0
list_head = {
list_next = 0xff03e0814640
list_prev = 0xff03e0814640
}
}

This member's next pointer is bad (sometimes zero, sometimes a low number,
e.g. 0x10). The NULL pointer crash happens while trying to follow the
list_prev pointer:

 0xff03e0814640::print zfs_acl_node_t
{
z_next = {
list_next = 0
list_prev = 0
}
z_acldata = 0xff03e10b6230
z_allocdata = 0xff03e171d200
z_allocsize = 0x30
z_size = 0x30
z_ace_count = 0x6
z_ace_idx = 0x2
}


This is a repeating pattern: there always seems to be a single zfs_acl_node in
the list, with NULL or garbage list_next and list_prev pointers.
For example, in another instance of this crash, the zfs_acl_node looks like this:

::stack
list_remove+0x1b(ff03e10d24f0, ff03e0fc9a00)
zfs_acl_release_nodes+0x34(ff03e10d24c0)
zfs_acl_free+0x16(ff03e10d24c0)
zfs_znode_free+0x5e(ff03e10cc200)
zfs_zinactive+0x9b(ff03e10cc200)
zfs_inactive+0x11c(ff03e1281840, ff03ea5c7010, 0)
fop_inactive+0xaf(ff03e1281840, ff03ea5c7010, 0)
vn_rele_dnlc+0x6c(ff03e1281840)
dnlc_purge+0x175()
nfs_idmap_args+0x5e(ff001811ac38)
nfssys+0x1e1(12, 8047dd8)
_sys_sysenter_post_swapgs+0x149()
> ::status
...
panic message: BAD TRAP: type=e (#pf Page fault) rp=ff001811a8c0 addr=10 
occurred in module "genunix" due to a NULL pointer dereference

>  ff03e0fc9a00::print zfs_acl_node_t
{
z_next = {
list_next = 0xff03e10e1cd9
list_prev = 0x10
}
z_acldata = 0
z_allocdata = 0xff03e10cb5d0
z_allocsize = 0x30
z_size = 0x30
z_ace_count = 0x6
z_ace_idx = 0x2
}

The crash here looks the same to me, and list_next / list_prev are again garbage.
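
For context, this is roughly what the generic list code does when unlinking a
node (a simplified sketch of the common list implementation, not the verbatim
source). With a list_prev of 0 or 0x10, the very first store faults at exactly
the addresses reported in the panics above:

/*
 * Simplified sketch of the doubly-linked list removal path.
 * list_remove() converts the object pointer into its embedded list_node_t
 * (via the list's list_offset) and then calls this to unlink it.
 */
typedef struct list_node {
        struct list_node *list_next;
        struct list_node *list_prev;
} list_node_t;

static void
list_remove_node(list_node_t *node)
{
        /*
         * If node->list_prev is NULL (or garbage such as 0x10), this first
         * store dereferences that bad pointer -- matching the panic
         * addresses addr=0 and addr=0x10 seen above.
         */
        node->list_prev->list_next = node->list_next;
        node->list_next->list_prev = node->list_prev;
        node->list_next = node->list_prev = NULL;
}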

Has anybody seen this?
Am I skipping too many builds when image-updating?
I am hoping someone who knows this code will chime in.

Steve


Re: [zfs-discuss] multiple crashes upon boot after upgrading build 134 to 138, 139 or 140

2010-05-25 Thread Steve Gonczi
As I look at this further, I am convincing myself that this should really be
caught by an assert.
(I am running release builds, so asserts do not fire.)

I think in a debug build, I should be seeing the !list_empty()  assert in:
 
void
list_remove(list_t *list, void *object)
{
        list_node_t *lold = list_d2l(list, object);
        ASSERT(!list_empty(list));
        ASSERT(lold->list_next != NULL);
        list_remove_node(lold);
}
 

I suspect this may be a race.

Assuming there is no other interfering thread, this crash could never happen:

static void
zfs_acl_release_nodes(zfs_acl_t *aclp)
{
        zfs_acl_node_t *aclnode;

        while (aclnode = list_head(&aclp->z_acl)) {
                list_remove(&aclp->z_acl, aclnode);
                zfs_acl_node_free(aclnode);
        }
        aclp->z_acl_count = 0;
        aclp->z_acl_bytes = 0;
}

list_head() does a list_empty() check and returns NULL on an empty list.
So if we got past that, list_remove() should never find an empty list; perhaps
there is interference from another thread.
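
For reference, the list primitives involved look roughly like this (a
simplified, self-contained sketch of the common list code, with the field
layout condensed -- not the verbatim onnv source):

#include <stddef.h>

typedef struct list_node {
        struct list_node *list_next;
        struct list_node *list_prev;
} list_node_t;

typedef struct list {
        size_t          list_size;
        size_t          list_offset;    /* offset of the list_node_t inside the object */
        list_node_t     list_head;
} list_t;

/* A list is empty when its head node points back at itself. */
#define list_empty(a)   ((a)->list_head.list_next == &(a)->list_head)

/* Convert an embedded list_node_t back to the containing object. */
#define list_object(a, node) \
        ((void *)(((char *)(node)) - (a)->list_offset))

/* Returns NULL for an empty list, otherwise the first object on it. */
void *
list_head(list_t *list)
{
        if (list_empty(list))
                return (NULL);
        return (list_object(list, list->list_head.list_next));
}

So the only way list_remove() can see an empty (or mangled) list right after
list_head() returned a node is if the list changed underneath us.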


Re: [zfs-discuss] multiple crashes upon boot after upgrading build 134 to 138, 139 or 140

2010-05-26 Thread Steve Gonczi
More info:

The crashes go away just by swapping in a faster CPU with more horsepower.
On one box where the crash consistently happened (a slow two-core CPU),
I no longer see the crash after swapping in a quad core.


[zfs-discuss] ZFS performance drop with new Xeon 55xx and 56xx CPUs

2010-08-11 Thread Steve Gonczi
Greetings,

I am seeing an unexplained performance drop using the above CPUs,
on a fairly up-to-date build (late 145).
Basically, the system seems to be 98% idle, spending most of its time in this
stack:

  unix`i86_mwait+0xd
  unix`cpu_idle_mwait+0xf1
  unix`idle+0x114
  unix`thread_start+0x8
   455645

Most CPUs seem to be idling most of the time, sitting on the mwait instruction.
There is no lock contention and nothing is waiting on I/O; I am at a loss to
explain what this system is doing.
(I am monitoring the system with lockstat, mpstat, and prstat.) Despite the
predominantly idle system, I see some latency reported by prstat microstate
accounting on the zfs threads.

This is a fairly beefy box: 24GB of memory, 16 CPUs.
A local zfs send | receive should be getting at least 100MB/s,
and I am only getting 5-10MB/s.
I see some Intel errata on the 55xx series Xeons concerning the monitor/mwait
instructions, which could conceivably cause missed wake-ups or mis-reported
mwait status.

Is anybody else seeing this?


Re: [zfs-discuss] What is ZFS metadata in regard to caching and redundant storage policy?

2011-07-07 Thread Steve Gonczi

Hi Jim, 

Non-metadata is the level-zero (leaf) blocks of those ZFS objects that store
user payload (plain files, volumes and the like).

Everything else is metadata. 

The place to look is dmu.c. 

Any object type enumerated in dmu_ot that is flagged TRUE is metadata in its
entirety, including its payload (i.e., its level-zero blocks).

Example: 
const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES] = {
        { byteswap_uint8_array, TRUE,  "unallocated" },
        { zap_byteswap,         TRUE,  "object directory" },
        { zfs_oldacl_byteswap,  TRUE,  "ZFS V0 ACL" },
        ...
        { byteswap_uint8_array, FALSE, "ZFS plain file" },
        { zap_byteswap,         TRUE,  "ZFS directory" },
        { zap_byteswap,         TRUE,  "ZFS master node" },
        { zap_byteswap,         TRUE,  "ZFS delete queue" },
        { byteswap_uint8_array, FALSE, "zvol object" },
        ...
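
A minimal sketch of how that flag gets consulted (object_type_is_metadata() is
a hypothetical helper for illustration, not an actual DMU function; it assumes
dmu_ot[] and its ot_metadata field are visible via sys/dmu.h, as in the onnv
source of that era):

#include <sys/dmu.h>    /* dmu_object_type_t, dmu_ot[], DMU_OT_NUMTYPES */

/*
 * Hypothetical helper: an object type is treated as metadata when its
 * dmu_ot[] entry has ot_metadata == TRUE, in which case even its
 * level-zero (payload) blocks count as metadata.
 */
static boolean_t
object_type_is_metadata(dmu_object_type_t ot)
{
        if (ot >= DMU_OT_NUMTYPES)
                return (B_FALSE);
        return (dmu_ot[ot].ot_metadata);
}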
Cheers, 

Steve Gonczi 

- Original Message -
> I just found that I do not exactly know: What is ZFS metadata in regard 
> to caching and redundant storage policy? What sort of blocks does it 
> include and what - doesn't? 



Re: [zfs-discuss] arc_no_grow is set to 1 and never set back to 0

2012-01-04 Thread Steve Gonczi
The interesting bit is what happens inside arc_reclaim_needed(), that is, how
it arrives at the conclusion that there is memory pressure.

Maybe we could trace arg0, which gives the offset at which we left the
function. That would pinpoint which return path arc_reclaim_needed() took.
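
For context, the return paths in question look roughly like this (condensed
from memory of the onnv-gate arc.c of that era, linked in the quoted message
below; not the verbatim source, and it relies on kernel globals such as
freemem, lotsfree and needfree, so it is not standalone-compilable):

/*
 * Condensed sketch of arc_reclaim_needed(): each distinct "return (1)"
 * corresponds to a different notion of memory pressure, so the return
 * offset reported by an fbt return probe (arg0) tells us which check fired.
 */
static int
arc_reclaim_needed(void)
{
        uint64_t extra = desfree;       /* reclaim a little early */

        if (needfree)                   /* pageout has asked for pages */
                return (1);

        if (freemem < lotsfree + needfree + extra)
                return (1);             /* close to the pageout scanner's range */

        if (availrmem < swapfs_minfree + swapfs_reserve + extra)
                return (1);             /* anon reservations getting tight */

#if defined(__i386)
        /* 32-bit kernels can exhaust kernel heap before physical memory */
        if (btop(vmem_size(heap_arena, VMEM_FREE)) <
            (btop(vmem_size(heap_arena, VMEM_FREE | VMEM_ALLOC)) >> 2))
                return (1);
#endif
        return (0);
}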

Steve 


- Original Message -

Well, it looks like the only place this gets changed is in the
arc_reclaim_thread for OpenSolaris. I suppose you could dtrace it to see what
is going on and investigate what the return code of arc_reclaim_needed is.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#2089

maybe

dtrace -n 'fbt:zfs:arc_reclaim_needed:return { trace(arg1) }'

Dave


Re: [zfs-discuss] Resilver restarting several times

2012-05-12 Thread Steve Gonczi
Jim, 

This makes sense. 
fmdump -eV reported that your original drive was experiencing repeated
read failures (SCSI command code 0x28, i.e. READ(10)).


Steve 

- Original Message -
 
Well, I decided to bite the bullet and kick the original 
disk from the pool after replacing it with the spare, and 
to say the least, speed prognosis looks a lot better now, 
about two times better in zpool's resilver progress/estimate 
and by up to an order of magnitude better at iostat speeds: 
 