Hi Hunter

I run an ESXi host on a USDT system and use a USB3 LAN dongle to give me a separate network for user/management traffic so I can use the onboard one for iSCSI. This was done following the article here:

https://www.virtuallyghetto.com/2016/03/working-usb-ethernet-adapter-nic-for-esxi.html

I note that the USB interface can drop packets all the time. That is not a big problem if the protocols can handle it, and RDP etc. suffer no real issues. But running something like TotalNetworkMonitor on a VM there, you do see that up to 50% or so of the ping packets in its probes are lost.

Could be that you are seeing a similar behaviour where the protocol doesn't handle lost packets too well...
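
One rough way to check an interface for this kind of loss from a Linux host is to pull the loss figure out of a ping summary. This is only a sketch: the function name loss_pct is made up for illustration, and it assumes the iputils-style summary line ("N% packet loss").

```shell
#!/bin/sh
# Hypothetical helper (not from the original posts): read ping output on
# stdin and print just the packet-loss percentage from the summary line.
# Assumes an iputils-style summary containing "<number>% packet loss".
loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example use (target address is a placeholder, pick a host on the
# network behind the USB adapter):
#   ping -c 50 192.0.2.1 | loss_pct
```

Running it against the same target through the onboard NIC and through the USB adapter gives two numbers that can be compared directly.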

regards
Dave



On Fri, 20 Jul 2018 03:15:34 +0100, Hunter Goatley <[email protected]> wrote:

Here's where we stand on our cluster communications errors: nothing we did worked. We tried different ports on the switch. We tried forcing 1Gbps. We tried forcing the port down to 10 Mbps. That actually seemed to help slightly, in that we only lost communications every 63 seconds or so, instead of every 15--60 seconds. But it would lose and re-establish connection to the cluster every 63 seconds.

So I decided to try setting up and using a TAP device, just to see what would happen.

Using the dedicated Ethernet card, it made no difference. It still lost communications every 63 seconds.

When I say dedicated Ethernet card, I probably should have stated earlier that it's a USB -> Ethernet device plugged into the system. I don't know what brand or model, but I can find out, if anyone wants to know.

So I decided to try tunneling through the "real" Ethernet port used by the Linux system. After figuring out what to do for the missing tunctl command under CentOS, I was able to set up a tunnel, and I did "attach xq tap:tap0". I then booted the system and wonder of wonders, miracle of miracles, it was seven minutes into the boot (yes, it takes a long time, mounting a slew of disks that needed to be rebuilt) before it lost communications. But it re-established them immediately, and as of my typing this, it has been twenty-nine minutes since that happened. No further drops. Normally, I wouldn't think twenty-nine minutes is enough to prove anything, but when it was dropping every 15--63 seconds for two solid days, this sounds like a fix to me.

So what does it mean? One thing it suggests is that the USB Ethernet device may be buggy or bad. I mean, it seems to work OK for TCP/IP communications, etc, but it sure sounds like it may be the part responsible for the problems. Especially since tunneling through the built-in Ethernet card seems to work and tunneling through the USB device did not.

These are the commands I used to set up the tap device for CentOS:
brctl addbr br0
ifconfig eno1 0.0.0.0          # eno1 is the host's Ethernet device
ifconfig br0 XXX.XX.XX.XX up   # the IP address of the host system
brctl addif br0 eno1
brctl setfd br0 0
#tunctl -t tap0
ip tuntap add tap0 mode tap    # Replacement for tunctl on CentOS 7
brctl addif br0 tap0
ifconfig tap0 up

I then just did "attach xq tap:tap0" in the init file. I guess I should set up a special MAC address, but I haven't yet, and so far, nothing seems amiss.
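
For hosts without bridge-utils and net-tools (which provide brctl and ifconfig), the same bridge-and-tap steps can probably be done with iproute2 alone. What follows is a sketch under assumptions, not a tested recipe: the variable names HOST_IF, HOST_ADDR and DRY_RUN and the helper functions are invented here, the address is a documentation placeholder, and the forward_delay option assumes the installed iproute2 accepts bridge options on "ip link add". By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: iproute2 equivalents of the brctl/ifconfig steps above.
# HOST_IF, HOST_ADDR and DRY_RUN are hypothetical names for this example.
HOST_IF="${HOST_IF:-eno1}"               # the host's Ethernet device
HOST_ADDR="${HOST_ADDR:-192.0.2.10/24}"  # placeholder; use the host's own address
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_tap_bridge() {
    run ip link add name br0 type bridge forward_delay 0  # brctl addbr + setfd 0
    run ip addr flush dev "$HOST_IF"                      # ifconfig eno1 0.0.0.0
    run ip link set dev "$HOST_IF" master br0             # brctl addif br0 eno1
    run ip addr add "$HOST_ADDR" dev br0                  # host address moves to br0
    run ip link set dev br0 up
    run ip tuntap add dev tap0 mode tap                   # tunctl replacement
    run ip link set dev tap0 master br0                   # brctl addif br0 tap0
    run ip link set dev tap0 up                           # ifconfig tap0 up
}

setup_tap_bridge
```

Reviewing the printed commands first is deliberate: moving the address from the physical interface to the bridge interrupts the host's network connection on that interface, so it is safer to apply from a console than over SSH.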

While I thought having a dedicated Ethernet device would be the simplest thing, I can live with tunneling it through the shared Ethernet device, especially since it works and the former does not. ;-)

Thank you for all of your input over the past couple of days, and thank you for all of your work on SIMH!

Hunter




--
_______________________________________________
Simh mailing list
[email protected]
http://mailman.trailing-edge.com/mailman/listinfo/simh
