> At 6 threads, load is spread fairly evenly at 30-70% on each of 24 cores.

This is because, in the case of your single-host loopback test, the data
doesn't have to traverse the full stack, and the iperf threads are free to
get scheduled pretty evenly across all your cores. In fact, because of TCP
Fusion, it won't even have to go down to layer 3. No network interrupt
handling is eating cycles on a single core (or a handful of cores, if you
increased the interrupt vectors).
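
If you want to confirm that on the real 10GbE path it's interrupt and
protocol handling that is pinning that one core, something like the
following should show it (a rough sketch; exact output and column names
vary):

# intrstat 1    (per-device interrupt time, broken down per CPU)
# mpstat 1      (watch the intr/ithr and sys columns on the busy core)

On the two-host test, intrstat should show the ixgbe interrupts landing on
one CPU (or a few, if you bump rx_queue_number/tx_queue_number), while on
the loopback test mpstat should show mostly usr/sys time spread across the
cores.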

Nick



On 21 July 2014 15:44, Chris Ferebee via smartos-discuss <[email protected]> wrote:

> Keith,
>
> My understanding is that iperf runs parallel threads when launched with
> the -P # option. I'm using this one:
>
> # ./iperf -v
> iperf version 2.0.5 (08 Jul 2010) pthreads
>
> The test results I posted earlier showed that the total throughput
> reported by iperf is slightly lower when running 10 threads in parallel
> between the two servers (-P 10, SUM = 2.51 Gbit/s) than when running a
> single thread (2.93 Gbit/s).
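>
> For reference, the runs were roughly along these lines (<receiver-ip> is a
> placeholder for the second server's address):
>
> # ./iperf -s                       (on the receiving server)
> # ./iperf -c <receiver-ip> -P 10   (on the sending server)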
>
> On the sender side, while running the tests with a single thread, one CPU
> core goes to about 85% utilization, with two others at around 15-25% (usr).
>
> With 3 or more threads, one core is pegged at 100% and a few others reach
> around 10-40%.
>
> Furthermore, to test your hypothesis, I tried running the iperf server and
> client on the same machine (the one with the dual Xeon E5-2620s).
>
> Total throughput reaches a maximum of about 200 Gbit/s at 6 threads,
> remaining fairly constant when the number of threads is increased further.
>
> At 6 threads, load is spread fairly evenly at 30-70% on each of 24 cores.
> At 100 threads, total throughput is still about 170 Gbit/s, with all cores
> maxed out at 97%+.
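>
> Concretely, these loopback runs look roughly like this (127.0.0.1 is
> illustrative; connecting to one of the host's own addresses behaves the
> same for this purpose):
>
> # ./iperf -s
> # ./iperf -c 127.0.0.1 -P 6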
>
> To me, this looks like the single-threaded bottleneck is somewhere further
> down the stack. What do you think?
>
> Thanks,
> Chris
>
>
> On 21.07.2014 at 15:51, Keith Wesolowski via smartos-discuss <[email protected]> wrote:
>
> > On Mon, Jul 21, 2014 at 01:31:05PM +0200, Chris Ferebee via
> > smartos-discuss wrote:
> >
> >> Observing CPU utilization during the test using mpstat, I see that all
> >> cores but one are mostly idle, and one core goes to 100% utilization,
> >> even when running iperf with a single thread.
> >>
> >> Nick suggested that based on this, I should try increasing
> >> rx_queue_number and tx_queue_number for the ixgbe driver. AFAICS, I
> >> would need to do that in /kernel/drv/ixgbe.conf, which in turn means I
> >> need to do something like
> >
> > A more likely hypothesis to test would be that the single-threaded
> > generation of data to be placed on the network is your limiting factor.
> >


