Hi Murali,

Q. Since you mention two hypervisors - what is the physical network topology
between these two servers? What theoretical link rates would be attainable?
Here is the topology

Iperf endpoints are on two different hypervisors.

+-----------+ +-------------+      +-----------+ +-------------+
| Linux VM1 | | BSD 13 VM 1 |      | Linux VM2 | | BSD 13 VM 2 |
+-----------+ +-------------+      +-----------+ +-------------+
|     ESX Hypervisor 1      |      |     ESX Hypervisor 2      |
+---------------------------+      +---------------------------+
              |                                  |
              +--- 10G link via an L2 switch ----+


The NIC is of 10G capacity on both ESX servers, and it has the config below.


So, when both VMs run on the same Hypervisor, maybe with another VM to simulate 
the 100ms delay, can you attain a lossless baseline scenario?


Expected throughput for a 16 MB socket buffer at 100 ms RTT:
(16 MB * 8 bits/byte / 1024) / 0.1 s = 1.25 Gbps

So theoretically we should see close to 1.25 Gbps of bitrate, and we see Linux
reaching close to this number.
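The window-limited arithmetic above can be sanity-checked with a one-liner
(pure arithmetic, using the same 1024 MB/Gbit convention as the calculation):

```shell
# Throughput ceiling = socket buffer / RTT
# 16 MB window, 100 ms RTT, 1 Gbit = 1024 Mbit
awk 'BEGIN { printf "%.2f Gbps\n", 16 * 8 / 1024 / 0.1 }'
# prints: 1.25 Gbps
```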

Under no loss, yes.


But BSD is not able to do that.


Q. Did you run iperf3? Did the transmitting endpoint report any retransmissions 
between Linux or FBSD hosts?

Yes, we used iperf3. I see Linux doing fewer retransmissions than BSD. On BSD,
the best performance was around 600 Mbps bitrate, with around 32K
retransmissions; on Linux, the best performance was around 1.15 Gbps bitrate,
with only about 2K retransmissions. So, as you pointed out, the number of
retransmissions on BSD could be the real issue here.
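To put those retransmit counts in proportion, here is a rough estimate of the
fraction of segments retransmitted in each run. The 60 s test duration and
1448-byte MSS are my assumptions, not numbers from the actual runs - plug in
your real values:

```shell
# Rough retransmission rate from the reported iperf3 numbers.
# ASSUMPTIONS: 60 s test duration, 1448-byte MSS (adjust to your runs).
awk 'BEGIN {
  mss_bits = 1448 * 8                    # bits per TCP segment (assumed MSS)
  t        = 60                          # test duration in seconds (assumed)
  bsd_segs   = 600e6  * t / mss_bits     # segments sent in the ~600 Mbps run
  linux_segs = 1.15e9 * t / mss_bits     # segments sent in the ~1.15 Gbps run
  printf "BSD:   %.2f%% retransmitted\n", 32000 / bsd_segs   * 100
  printf "Linux: %.2f%% retransmitted\n",  2000 / linux_segs * 100
}'
```

Under those assumptions this works out to roughly 1% of segments retransmitted
on BSD versus about 0.03% on Linux - a ~30x difference in loss rate, not just
a 16x difference in raw counts.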

There are other cc modules available, but I believe one major deviation is
that Linux can perform mechanisms like HyStart, ACK every packet when the
client detects slow start, and perform pacing to achieve more uniform packet
transmissions.

I think the next step would be to find out which queue those packet discards
are coming from (the external switch? the delay generator? the vSwitch? the
Ethernet stack inside the VM?).

Or alternatively, provide your ESX hypervisors with vastly more link speed to
rule out any L2-induced packet drops - provided your delay generator is not
itself the source when momentarily overloaded.

Is there a way to reduce this packet loss by fine-tuning some parameters, e.g. the ring buffer or other areas?

Finding where these arise (by looking at queue and port counters) would be the
next step. But this is not really my specific area of expertise beyond
high-level, vendor-independent observations.
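For the guest-visible part of that, a few standard starting points on the
FreeBSD VM (the vSwitch and physical-switch counters need the hypervisor's or
switch's own tooling, e.g. esxtop on ESXi; exact counter names vary by NIC
driver):

```shell
# TCP-level retransmit and drop statistics inside the FreeBSD guest:
netstat -s -p tcp | grep -i -e retrans -e drop
# Per-interface error/drop counters (Ierrs/Oerrs/Drop columns):
netstat -i
# NIC/driver queue drop counters, if the driver exposes them via sysctl:
sysctl -a | grep -i drop
```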

Switching to other cc modules may give some more insights. But again, I suspect 
that momentary (microsecond) burstiness of BSD may be causing this 
significantly higher loss rate.

TCP RACK would be another option. That stack has pacing, more fine-grained
timing, the RACK loss-recovery mechanisms, etc. Maybe that helps reduce the
packet drops observed by iperf and, consequently, yields a higher overall
throughput.
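For reference, on FreeBSD 13 the RACK stack ships as a loadable module, but it
needs a kernel built with "options TCPHPTS" (it is not enabled in the stock
GENERIC kernel on 13.x). A sketch, assuming such a kernel:

```shell
# Load the RACK TCP stack module (requires a kernel with "options TCPHPTS"):
kldload tcp_rack
# Make RACK the default stack for new TCP connections:
sysctl net.inet.tcp.functions_default=rack
# Confirm which stacks are registered and which one is the default:
sysctl net.inet.tcp.functions_available
```

Running the same iperf3 test after switching stacks would show directly
whether pacing and RACK recovery reduce the retransmission count.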


