On 06/06/2012 04:55 PM, Richard Elling wrote:
> On Jun 6, 2012, at 12:48 AM, Sašo Kiselkov wrote:
> 
>> So I have this dual 16-core Opteron Dell R715 with 128G of RAM attached
>> to a SuperMicro disk enclosure with 45 2TB Toshiba SAS drives (via two
>> LSI 9200 controllers and MPxIO) running OpenIndiana 151a4 and I'm
>> occasionally seeing a storm of xcalls on one of the 32 VCPUs (>100000
>> xcalls a second).
> 
> That isn't much of a storm; I've seen > 1M xcalls in some cases...

Well, it does make one of the cores 100% busy for around 10-15 seconds,
so cross-calls are being processed at the maximum rate that core can
sustain. I'd call that a sign of "something bad(tm)" going on.
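
A rough sketch of how to confirm where that core's time is going, by
sampling kernel stacks on the affected CPU only (CPU id 31 is taken
from the mpstat sample below; the 15-second window just matches the
observed burst length):

dtrace -n 'profile-997 /cpu == 31 && arg0/ {@[stack()]=count();} tick-15s {exit(0);}'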

>> The machine is pretty much idle, only receiving a
>> bunch of multicast video streams and dumping them to the drives (at a
>> rate of ~40MB/s). At an interval of roughly 1-2 minutes I get a storm of
>> xcalls that completely eat one of the CPUs, so the mpstat line for the
>> CPU looks like:
>>
>> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>  31    0   0 102191     1    0    0    0    0    0    0     0    0 100   0   0
>>
>> 100% busy in the system processing cross-calls. When I tried dtracing
>> this issue, I found that this is the most likely culprit:
>>
>> dtrace -n 'sysinfo:::xcalls {@[stack()]=count();}'
>>   unix`xc_call+0x46
>>   unix`hat_tlb_inval+0x283
>>   unix`x86pte_inval+0xaa
>>   unix`hat_pte_unmap+0xed
>>   unix`hat_unload_callback+0x193
>>   unix`hat_unload+0x41
>>   unix`segkmem_free_vn+0x6f
>>   unix`segkmem_zio_free+0x27
>>   genunix`vmem_xfree+0x104
>>   genunix`vmem_free+0x29
>>   genunix`kmem_slab_destroy+0x87
>>   genunix`kmem_slab_free+0x2bb
>>   genunix`kmem_magazine_destroy+0x39a
>>   genunix`kmem_depot_ws_reap+0x66
>>   genunix`taskq_thread+0x285
>>   unix`thread_start+0x8
>> 3221701
>>
>> This happens in the sched (pid 0) process. My fsstat output looks like this:
>>
>> # fsstat /content 1
>> new  name   name  attr  attr lookup rddir  read read  write write
>> file remov  chng   get   set    ops   ops   ops bytes   ops bytes
>>    0     0     0   664     0    952     0     0     0   664 38.0M /content
>>    0     0     0   658     0    935     0     0     0   656 38.6M /content
>>    0     0     0   660     0    946     0     0     0   659 37.8M /content
>>    0     0     0   677     0    969     0     0     0   676 38.5M /content
>>
>> What's even more puzzling is that this apparently originates entirely
>> outside of userland: prstat(1M) shows no change in the CPU usage of
>> any process while the xcall storm is happening, only an increase in
>> the load average of +1.00 (the busy CPU).
> 
> What exactly is the workload doing?

As I wrote above, it is just receiving multicast video streams and
writing them to disk files, nothing else. The fsstat lines above show
that: a pure write load.

> Local I/O, iSCSI, NFS, or CIFS?

Purely local I/O via the two LSI SAS controllers, nothing else.

>> I Googled and found that
>> http://mail.opensolaris.org/pipermail/dtrace-discuss/2009-September/008107.html
>> seems to describe an issue identical to mine; however, it was never
>> resolved at the time, which worries me about putting this kind of
>> machine into production use.
>>
>> Could some ZFS guru please tell me what's going on in segkmem_zio_free?
> 
> It is freeing memory.

Yes, but why is this causing a ton of cross-calls?
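
If it helps narrow it down, here is a sketch of how I could try to
attribute the reaping to specific kmem caches (this assumes the fbt
probe on kmem_depot_ws_reap is available on this kernel and that
dtrace can resolve kmem_cache_t, so treat it as a guess):

dtrace -n 'fbt::kmem_depot_ws_reap:entry {@[stringof(args[0]->cache_name)]=count();}'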

>> When I disable the writers to the /content filesystem, this issue goes
>> away, so it obviously has something to do with disk I/O. Thanks!
> 
> Not directly related to disk I/O bandwidth. Can be directly related to other
> use, such as deletions -- something that causes frees.

When I'm not writing to disk it doesn't happen, so my guess is that it
indeed has something to do with (perhaps) ZFS freeing txg buffers or
something similar...
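
To test that guess I may also try to correlate the storms with ARC/zio
cache reaping, roughly like this (the function names are assumptions
based on my reading of the illumos source, so this is only a sketch):

dtrace -n 'fbt::arc_kmem_reap_now:entry {@r["arc_kmem_reap_now"]=count();}
fbt::kmem_cache_reap_now:entry {@c[stringof(args[0]->cache_name)]=count();}
tick-10s {printa(@r); printa(@c); trunc(@r); trunc(@c);}'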

> Depending on the cause, there can be some tuning that applies for large
> memory machines, where large is >= 96 GB.
>  -- richard

I'll try loading the machine with dd(1) to the maximum to see whether
the access patterns of my software have something to do with it.
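
Roughly along these lines (the file names and sizes are just
placeholders):

# four parallel sequential writers onto the same /content filesystem
for i in 1 2 3 4; do
        dd if=/dev/zero of=/content/ddtest.$i bs=1024k count=8192 &
done
wait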

Cheers,
--
Saso
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
