Ian,
I agree that something is wrong. The whole point of using those fancy Intel
NICs should be to reduce the CPU load, right?
Here is a technote from HELIOS, makers of the enterprise AFP server EtherShare,
about 10GbE tuning:
<http://www.helios.de/web/EN/support/TI/154.html>
They say that tuning should usually not be necessary unless you are running
10GbE on both ends of a connection.
Oh, and look at this. It sounds like X520 CPU load issues have been a problem
in FreeBSD too:
<http://unix.derkeiler.com/Mailing-Lists/FreeBSD/performance/2012-01/msg00021.html>
Unfortunately, I can't dig into this much further right now, but please keep us
in the loop if you solve things on your end!
Best,
Chris
On 09.08.2014, at 01:29, Ian Collins via smartos-discuss
<[email protected]> wrote:
> Chris Ferebee wrote:
>> Ian,
>>
>> Right now I'm fighting with my Finder/AFP/netatalk/getcwd() performance
>> issues, which are a minefield, so the 10GbE slowdowns are the least of my
>> worries. But here is what I did find out.
>>
>> It helps to tune the following TCP stack parameters:
>>
>> # ndd -set /dev/tcp tcp_recv_hiwat 400000
>> # ndd -set /dev/tcp tcp_xmit_hiwat 400000
>> # ndd -set /dev/tcp tcp_max_buf 2097152
>> # ndd -set /dev/tcp tcp_cwnd_max 16777216
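>>
>> (Note that ndd settings don't survive a reboot, so put them in a boot-time
>> script, or use the ipadm equivalents where your release has them, and verify
>> the values afterwards, e.g.:)
>>
>> # ndd -get /dev/tcp tcp_recv_hiwat
>> # ndd -get /dev/tcp tcp_xmit_hiwat
>> # ndd -get /dev/tcp tcp_max_buf
>> # ndd -get /dev/tcp tcp_cwnd_max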
>>
>> Still, I can max out one CPU core at 100% by running a small number of
>> netperf or iperf threads in parallel (it doesn't really matter which). The
>> other cores stay mostly idle.
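>>
>> (Roughly what I ran, for reference -- the address is just a placeholder, and
>> the exact flags don't matter much; mpstat in a second terminal shows one core
>> pegged while the rest stay idle:)
>>
>> # iperf -s                            (on the receiver)
>> # iperf -c 192.168.100.1 -P 4 -t 60   (on the sender, 4 parallel streams)
>> # mpstat 1                            (in another terminal: usr/sys/idl per core)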
>>
>> After tuning the parameters as above, I was seeing about 3 Gbit/s throughput
>> over the X520 with several threads.
>>
>> I think the bottleneck is in the ixgbe driver, because running the same
>> tests on localhost gives about 200 Gbit/s throughput, so the threads
>> producing and consuming the data are definitely not at fault.
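>>
>> (The localhost numbers come from the same iperf run pointed at loopback:)
>>
>> # iperf -c 127.0.0.1 -P 4 -t 60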
>>
>> Considering comments by Nick Perry and others, I suspect it would be worth
>> trying to increase the ixgbe driver's rx_queue_number and tx_queue_number
>> via /kernel/drv/ixgbe.conf. As I recall, the maximum number of queues
>> depends on your hardware revision and is either 8 or 16 depending on the
>> Intel part number, while the default is 1. As I understand it, this would
>> allow the driver load to be spread across multiple cores.
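>>
>> (Untested, but I'd expect the change to look something like this in
>> /kernel/drv/ixgbe.conf, followed by a reboot -- please check the property
>> names against the ixgbe.conf shipped with your platform before relying on
>> them:)
>>
>> # use more RX/TX rings so the load can spread over more cores
>> rx_queue_number = 8;
>> tx_queue_number = 8;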
>>
>
> I'm still very skeptical about all these tweaks. I have tried them, and I
> still use them for streaming ZFS over 1GbE networks.
>
> Running a test application of mine (without changing any TCP settings) that
> loads a big file (typically something incompressible, such as a video file)
> into RAM and then writes it multiple times to disk, I get a peak throughput
> of >700 MB/s to an NFS share, using Intel X540 10GbE cards to a Solaris 11
> host. The overall transfer is limited by the pool's write capacity
> (~400 MB/s long term).
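>
> (The tool itself is custom, but conceptually it is not far from a dd loop
> like this, with bigfile.mov and the NFS mount point as placeholders:)
>
> # dd if=bigfile.mov of=/dev/null bs=1024k                    (pull the file into RAM)
> # dd if=bigfile.mov of=/net/s11host/tank/test.out bs=1024k   (repeat and time each pass)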
>
> --
> Ian.