Re: [E1000-devel] OOM in secondary cgroup leading to networking loss

2018-07-31 Thread Àbéjídé Àyodélé
> Is it always the OOM errors followed by the Tx timeout? Yes, I believe I have the dmesg from one of the earlier incidents; I can clean that up and make it public if you want. > Is it an actual serial connection or is it something like serial over LAN? Serial over LAN > Do you know if you h

Re: [E1000-devel] OOM in secondary cgroup leading to networking loss

2018-07-31 Thread Àbéjídé Àyodélé
Hi Alex, Thanks for responding! > Have you been seeing this as a reproducible issue or is this something that has only occurred once? It has occurred 3 times so far. We usually stop the hosts we see the issue on for a long while, which is probably why we don't see it that frequently; it has occur

Re: [E1000-devel] OOM in secondary cgroup leading to networking loss

2018-07-31 Thread Àbéjídé Àyodélé
> There shouldn't be any need. Basically what you want to check for is to make sure those logs have the same pattern with OOM errors followed by the rcu_sched warning about detecting a CPU stall. If that is the case that is the most likely root cause for the Tx hangs that are being reported
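
The check described in the quoted text above (OOM errors followed by an rcu_sched CPU stall warning) could be sketched roughly as below. This is only an illustration, not something from the thread: the function name `oom_then_stall`, the `window` parameter, and the two regular expressions are assumptions, and the exact wording of kernel messages varies between kernel versions.

```python
import re

# Assumed message fragments; adjust to match what your kernel actually logs.
OOM_RE = re.compile(r"out of memory|oom-killer", re.IGNORECASE)
STALL_RE = re.compile(r"rcu_sched.*stall", re.IGNORECASE)


def oom_then_stall(lines, window=200):
    """Return (oom_index, stall_index) pairs of 0-based line indexes where an
    rcu_sched stall warning appears within `window` lines after the most
    recent OOM message."""
    hits = []
    last_oom = None
    for i, line in enumerate(lines):
        if OOM_RE.search(line):
            # Remember the latest OOM message seen so far.
            last_oom = i
        elif (STALL_RE.search(line)
              and last_oom is not None
              and i - last_oom <= window):
            hits.append((last_oom, i))
            # Require a fresh OOM message before reporting another pair.
            last_oom = None
    return hits
```

It can be run over a saved log by passing any iterable of lines, for example the lines of a text file holding the dmesg output. An empty result means the pattern was not found under these assumed message formats, not that the logs are clean.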

Re: [E1000-devel] OOM in secondary cgroup leading to networking loss

2018-07-31 Thread Alexander Duyck
On Mon, Jul 30, 2018 at 4:43 PM, Àbéjídé Àyodélé wrote: >> Is it always the OOM errors followed by the Tx timeout? > > Yes, I believe I have the dmesg from one of the earlier incidents, I can > clean that up and make it public if you want. There shouldn't be any need. Basically what you want to c

Re: [E1000-devel] OOM in secondary cgroup leading to networking loss

2018-07-30 Thread Alexander Duyck
On Mon, Jul 30, 2018 at 1:25 PM, Àbéjídé Àyodélé wrote: > Hi Alex, > > Thanks for responding! > >> Have you been seeing this as a >> reproducible issue or is this something that has only occurred once? > > It has occurred 3 times so far, we usually stop the hosts we see the issue > on for a long w

Re: [E1000-devel] OOM in secondary cgroup leading to networking loss

2018-07-30 Thread Alexander Duyck
On Sat, Jul 28, 2018 at 5:44 PM, Àbéjídé Àyodélé wrote: > Hi friends, > > On one of our machines at work, we observed a sequence of events starting > from an OOM in a secondary cgroup which ends up in the bond interface being > down for a period of up to 12 seconds. Below is some piece of dmesg ab

[E1000-devel] OOM in secondary cgroup leading to networking loss

2018-07-30 Thread Àbéjídé Àyodélé
Hi friends, On one of our machines at work, we observed a sequence of events starting from an OOM in a secondary cgroup and ending with the bond interface being down for a period of up to 12 seconds. Below is an excerpt of dmesg from when the bond interface went down: [Wed Jul 25 19:20:45 2018]