Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-03 Thread Jeff Victor
Hello Gael,

On Mon, Mar 2, 2009 at 10:08 PM, Gael wrote:
> Hello
>
> We have a zone running SAS with CPU capping enabled by means of a processor
> set, because a few of its processes too often use quite a bit of CPU.

Is that zone assigned to a resource pool, or is it using the
dedicated-cpus feature?

> When the process is running (chewing 100% of its pset), the frame's NIC (the
> server is an E2900 with a ce interface) drops 20-30% of its packets, which is
> causing a headache.

My first guess is that the NIC's interrupts are going to a CPU that the
zone is using, and the CPU doesn't have enough power to run the zone's
workload *and* be an effective NIC interrupt handler.

Please run the "intrstat" command as root in the global zone, to
determine which CPU is handling interrupts for that NIC. Also, check
which CPU(s) that zone can use.

Please let us know what you learn from those.
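
For anyone following along, the checks above could be scripted roughly as
follows. This is only a sketch: the zone name "myzone" and the single
5-second sample are assumptions, and by default the script just prints the
commands (DRY_RUN=1) so it can be reviewed before being run as root in the
global zone.

```shell
#!/bin/sh
# Sketch: see which CPUs service device interrupts and which CPUs the
# zone's processor set owns. Meant for the global zone, as root.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 on the
# Solaris host to actually execute them.
DRY_RUN=${DRY_RUN:-1}
ZONE=${ZONE:-myzone}    # hypothetical zone name, replace with your own

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

show_interrupt_placement() {
    run intrstat 5 1               # per-device, per-CPU interrupt activity, one 5-second sample
    run psrset -i                  # processor sets and the CPUs assigned to each
    run zonecfg -z "$ZONE" info    # zone configuration, including any pool or dedicated-cpu settings
}

show_interrupt_placement
```

If the CPU that intrstat shows handling the ce interrupts is also a member
of the zone's pset, that would match the guess above.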


> Doesn't appear to be a network load issue. Not a lot happening there visibly.
>
> With Solaris 10 u4 or u6, what elegant way would you recommend to avoid the
> disruption caused by a single zone?
>
> Regards
>
> --
> Gael
>
>
> ___
> zones-discuss mailing list
> zones-discuss@opensolaris.org
>



-- 
--JeffV
___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-03 Thread Gael
Many thanks to Bob Netherton and Jeff for their quick help on this painful
issue.
The solution was to use psrset -f on the heavily used pset, which disables
interrupt handling on the processors in that set.

It is fully supported, and it is the recommended fix when CPU starvation keeps
interrupts from being serviced in time, so that they get lost. Credit goes to
Rickey Weisner for this tip.

I have monitored that zone today for multiple hours without seeing any
packet loss while it was cranking up its cpu usage...

Jeff, following a previous mail today, as a fervent customer ;), I would
love to see that feature directly accessible through the zone configuration,
to avoid having to create a script and a dirty workaround to enable it
on boot. Is there an RFE # out there that I can be added to through Sun
Support? I have a case open on this issue.
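
For the archive, the boot-time workaround mentioned above might look
something like the sketch below. The pset ID of 1 is purely an assumption
(confirm the real ID with psrset -i once the zone is up), and the script
defaults to a dry run that only prints the command. psrset -n reverses the
change if needed.

```shell
#!/bin/sh
# Sketch: stop the CPUs in one processor set from servicing device
# interrupts, so a busy zone cannot starve the NIC's interrupt handling.
# Intended to be called from a boot script in the global zone, as root.
PSET_ID=${PSET_ID:-1}   # hypothetical pset ID, check `psrset -i`
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = execute it

disable_pset_interrupts() {
    # Accept only a non-empty, purely numeric processor set ID.
    case "$1" in
        ''|*[!0-9]*)
            echo "invalid pset id: $1" >&2
            return 1
            ;;
    esac
    if [ "$DRY_RUN" = "1" ]; then
        echo "psrset -f $1"
    else
        psrset -f "$1"      # disable interrupts on all processors in the set
    fi
}

disable_pset_interrupts "$PSET_ID"
```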

Will continue to monitor the situation for a few days, and if I see anything
wrong, I will update this thread.

Again, thanks!

Regards




-- 
Gael Martinez

Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-03 Thread Jeff Victor
On Tue, Mar 3, 2009 at 8:39 PM, Gael wrote:
>
> Many thanks to Bob Netherton and Jeff for their quick help on this painful
> issue.
> The solution was to use psrset -f on the heavily used pset.
> It is fully supported, and it is the recommended fix when CPU starvation keeps
> interrupts from being serviced in time, so that they get lost. Credit goes to
> Rickey Weisner for this tip.
>
> I have monitored that zone today for multiple hours without seeing any
> packet loss while it was cranking up its cpu usage...
> Jeff, following a previous mail today, as a fervent customer ;), I would
> love to see that feature directly accessible through the zone configuration,
> to avoid having to create a script and a dirty workaround to enable it
> on boot. Is there an RFE # out there that I can be added to through Sun
> Support? I have a case open on this issue.

Yes, the CR is 6199531 - "Device interrupts not bound to cpus
configured within a nonglobal zone"

Please ask your contact in Sun Service to add an SR for you.

> Will continue to monitor the situation for a few days, and if I see anything
> wrong, I will update this thread.
> Again, thanks!
> Regards



-- 
--JeffV