FYI, just changing the host Ethernet port seems to have alleviated our issues with UNMATCHED datagrams. We saw something virtually identical to what Ralf reported.

[451886.660655] EtherCAT 0: Domain 0: Working counter changed to 0/13.
[451886.660663] EtherCAT 0: Domain 1: Working counter changed to 0/14.
[451887.168147] EtherCAT WARNING: Datagram cea4900c (domain0-0-main) was SKIPPED 44 times.
[451887.168154] EtherCAT WARNING: Datagram cea49c0c (domain1-332-main) was SKIPPED 44 times.
[451887.492141] EtherCAT WARNING 0: 1 datagram TIMED OUT!
[451887.492148] EtherCAT WARNING 0: 731 datagrams UNMATCHED!
[451887.661361] EtherCAT 0: Domain 0: Working counter changed to 13/13.
[451887.661369] EtherCAT 0: Domain 1: Working counter changed to 14/14.

In our case the Advantech UNO industrial PC has 4 Ethernet ports built into it. Only the first port, the one built into the motherboard, exhibits the issue. It shows up as Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection. That part appears to be just a PHY, so I assume the MAC is in the Intel Corporation 6 Series/C200 Series Chipset.

The other 3 ports are PCI Express MAC/PHYs; they show up as Intel Corporation 82574L Gigabit Network. Those 3 ports do not exhibit the UNMATCHED datagram issue.

With ethtool -k, the only difference I see between the 82579LM and the three 82574L ports is rx-vlan-filter: off for the 82579LM. rx/tx-checksumming is on for all adapters.

FYI, registers 0x300 and 0x310 remained 0 after the UNMATCHED datagram error occurred.

Ralf, I suggest you look into changing the NIC.

On Mon, 2016-07-04 at 08:29 +0200, Ralf Roesch wrote:
> We also are fighting with this type of problem on a customer laser
> cutting machine.
> Occasionally we see errors like this:
> [122501.934306] EtherCAT 0: Domain 0: Working counter changed to 0/9.
> [122501.934346] EtherCAT 0: Domain 1: Working counter changed to 0/9.
> [122502.320449] EtherCAT WARNING 0: 5 datagrams TIMED OUT!
> [122502.935224] EtherCAT 0: Domain 0: Working counter changed to 9/9.
> [122502.935265] EtherCAT 0: Domain 1: Working counter changed to 9/9.
>
> This was the reason I modified the ethercat command line tool for
> extended diagnostics regarding several ESC error registers.
>
> Attached you will find a patch which might help you.
> After applying it and building the ethercat command line tool, it will
> provide a new command "diag".
>   * Shortly after your ethercat master has been started
>     successfully, call:
>       ethercat diag -r
>     This will reset all slaves' ESC error registers, including the Lost
>     Link Counter Register and the RX Error Counter Register.
>   * If you detect an UNMATCHED and TIMEOUT error (sometimes
>     after hours or days), call:
>       ethercat diag
>     If you are lucky you will find one or more ESC errors
>     displayed on your console.
> For a better understanding of the displayed errors, you should look at
> the picture
> http://www.automation.com/images/article/ethercat/Figure14.jpg
> (part of
> http://www.automation.com/automation-news/article/diagnostics-with-ethercat-part-4).
>
> Would be happy about any kind of feedback.
>
> @Henry: which type of drives do you use?
>
> Regards,
> Ralf
>
> On Mon Jul 04 2016 05:19:58 GMT+0200 (CEST), Graeme Foot
> <graeme.f...@touchcut.com> wrote:
>
> > The only time we've had issues like that has been due to either a dodgy
> > network cable or an RJ45 plug getting a bit grubby. First thing I usually
> > do is unplug/replug all the plugs a few times to clean up the connections.
> > If it persists then I start looking for bad cables.
> >
> > Another option is that there is an occasional noisy process causing noise
> > on one of the links.
> >
> > Once or twice (only on non-ethercat machines so far) we've had cables that
> > were in drag chains wearing out, where it showed a problem when at a
> > specific position of the drag chain.
> >
> > You could track down if it's a problem with a link between two particular
> > slaves by checking each slave's Link Lost Counter and CRC Bad Counter values.
> >   - Lost Link Counter Register (0x0310:0x0313)
> >   - RX Error Counter Register (0x0300:0x0307)
> >
> > This link describes some of the diagnostics:
> > http://www.automation.com/automation-news/article/diagnostics-with-ethercat-part-4
> >
> > I think you can set the above registers to zero after the fieldbus is up
> > and running, then you can check them if a problem occurs.
> >
> > Haven't actually done it yet myself, so would be interested to see if it
> > helps you.
> >
> > Regards,
> > Graeme.
> >
> > -----Original Message-----
> > From: etherlab-users [mailto:etherlab-users-boun...@etherlab.org] On Behalf
> > Of Henry Bausley
> > Sent: Saturday, 2 July 2016 5:56 a.m.
> > To: etherlab-users@etherlab.org
> > Subject: [etherlab-users] Intermittent Large number of datagrams UNMATCHED
> >
> > We have an etherlab 1.5.2 kernel mode application running in xenomai
> > 2.4.6 on Ubuntu 14.04.1 Desktop that will, on rare occasions, get a large
> > number of datagrams UNMATCHED. It occurs at random times and relatively
> > rarely, but when it occurs it can result in disaster as we are running a
> > large number of servos in torque mode.
> >
> > For example, we can run 24 hours a day for 5 days continuously and then
> > get a message like the one below.
> >
> > [591785.735172] EtherCAT WARNING 0: 616 datagrams UNMATCHED!
> >
> > I am struggling as to where to look. Is this something in our app or a
> > known bug in the stack?
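
Since these warnings can be days apart, it may help to tally them from a saved kernel log rather than reading it by eye. Below is a minimal Python sketch, assuming the log has been saved as plain text (for example from dmesg) and that the warnings look like the lines quoted in this thread. The regular expression and the summarize helper are my own illustration and are not part of the EtherCAT master tools.

```python
import re
from collections import Counter

# Matches master warnings in the form quoted in this thread, e.g.
#   [451887.492148] EtherCAT WARNING 0: 731 datagrams UNMATCHED!
#   [451887.492141] EtherCAT WARNING 0: 1 datagram TIMED OUT!
# The pattern is an assumption based on those samples only.
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+EtherCAT WARNING \d+: "
    r"(?P<count>\d+) datagrams? (?P<kind>UNMATCHED|TIMED OUT)!"
)


def summarize(log_text):
    """Sum the UNMATCHED and TIMED OUT datagram counts found in log_text."""
    totals = Counter()
    for match in WARNING_RE.finditer(log_text):
        totals[match.group("kind")] += int(match.group("count"))
    return totals


# Warning lines copied from the messages in this thread.
SAMPLE = """\
[451887.492141] EtherCAT WARNING 0: 1 datagram TIMED OUT!
[451887.492148] EtherCAT WARNING 0: 731 datagrams UNMATCHED!
[122502.320449] EtherCAT WARNING 0: 5 datagrams TIMED OUT!
[591785.735172] EtherCAT WARNING 0: 616 datagrams UNMATCHED!
"""

if __name__ == "__main__":
    for kind, total in sorted(summarize(SAMPLE).items()):
        print(f"{kind}: {total}")
```

Run on the sample lines above, this prints "TIMED OUT: 6" and "UNMATCHED: 1347". SKIPPED and working counter messages are deliberately ignored; the pattern would need extending to cover them.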
_______________________________________________
etherlab-users mailing list
etherlab-users@etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-users