----- On 18 Feb 2019, at 13:43, Jan Kiszka jan.kis...@siemens.com wrote:

> On 18.02.19 13:36, Per Oberg via Xenomai wrote:
> > Hello list

> > I have this issue where my e1000e network card gets into some kind of cyclic
> > hardware reset during operation. The weird thing is that this only happens
> > when I let systemd start the application. If it's started manually it always
> > works as intended.

> > I am running Xenomai 3.0.7 with a linux-4.9.38 kernel and I use the network
> > connection in Linux non-rt mode. I use systemd and NetworkManager.

> > I do realize that once I get into the reset it will continue resetting
> > because I keep flooding the buffers. My issue is that it -never- happens
> > when I start my process manually, only when systemd starts it. Because the
> > network goes down quite badly I cannot log in and disable the service once
> > it happens and therefore I cannot really try starting it manually after
> > letting the network recover.

> > There is some information from Intel in [1] below. There is talk about a
> > power management function, the EEPROM etc. They specifically write:

> > "82573(V/L/E) TX Unit Hang Messages
> > Several adapters with the 82573 chipset display "TX unit hang" messages
> > during normal operation with the e1000 driver. The issue appears both with
> > TSO enabled and disabled, and is caused by a power management function that
> > is enabled in the EEPROM. Early releases of the chipsets to vendors had the
> > EEPROM bit that enabled the feature. After the issue was discovered newer
> > adapters were released with the feature disabled in the EEPROM."


> > I also read something about disabling GRO/TSO/GSO that helped some people.

> > My questions to the list are:

> > 1. Have you guys any experience with this?
> > 2. Would I be better off using the RTnet drivers?
> > 3. What could cause the issue to trigger only when run by systemd? (I thought
> > about timing issues and NetworkManager, but how do I debug this?)

> > [1]
> > https://serverfault.com/questions/193114/linux-e1000e-intel-networking-driver-problems-galore-where-do-i-start

> > Thoughts anyone?

> Are you giving Linux enough time to work (no 100% RT domination of any core
> for hundreds of milliseconds or longer)?

I am not sure yet. I have a logging function that reports back to me when I
lose samples. Losing samples currently makes the software try to catch up, and
this means 100% CPU until it does. I do see this being logged around the time
it resets, but I'm not sure if it's much worse than "usual". If for some reason
the hardware reset happens because Linux gets starved, I can easily see this
going cyclic.
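
One way to keep the catch-up from monopolizing a core could be to bound it. The
following is only a minimal sketch, not the actual application code: it assumes
a periodic loop driven by an absolute deadline in nanoseconds, and the names
`advance_deadline` and `MAX_CATCHUP` are hypothetical. If the loop is only a
few periods late it catches up back to back; if it is further behind, it drops
the missed samples and realigns the deadline with the current time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit: how many whole periods of lag may be caught up
 * back to back before samples are dropped instead. */
#define MAX_CATCHUP 2

/*
 * Advance the absolute deadline by one period and decide how to handle lag.
 *
 * Returns the number of samples dropped (0 if none). After a drop, the
 * deadline is the latest period boundary that is not later than now_ns,
 * so at most one more cycle runs without sleeping.
 */
int64_t advance_deadline(int64_t *deadline_ns, int64_t now_ns,
                         int64_t period_ns)
{
    assert(period_ns > 0);

    *deadline_ns += period_ns;

    /* On time: the caller sleeps until the new deadline. */
    if (now_ns <= *deadline_ns)
        return 0;

    /* Whole periods by which the loop lags behind the new deadline. */
    int64_t behind = (now_ns - *deadline_ns) / period_ns;

    /* Small lag: allow a bounded catch-up without dropping anything. */
    if (behind <= MAX_CATCHUP)
        return 0;

    /* Large lag: skip the missed periods instead of spinning at 100% CPU. */
    *deadline_ns += behind * period_ns;
    return behind;
}
```

In a real loop the caller would then sleep until `*deadline_ns`, for example
with `clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, ...)`, and log the
returned count as lost samples. That sleep is what gives Linux time to run.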

Per Öberg 
