Mr Cascardo,
If this is at a customer site, there is a workaround available for you.
In the file /etc/modprobe.d/mlx4_core.conf, add the line:
options mlx4_core internal_err_reset=0
(run "modinfo mlx4_core" to see a description of the module parameter).
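A minimal sketch of applying that line from a shell, assuming a POSIX sh and root privileges. The helper name add_mlx4_workaround is hypothetical and only for illustration; the path and the option line are the ones given above.

```shell
#!/bin/sh
# Sketch only: persist the workaround line in the modprobe configuration.
# add_mlx4_workaround is a hypothetical helper name, not part of any package.
add_mlx4_workaround() {
    # Default to the path named above; an alternate path may be passed as $1.
    conf="${1:-/etc/modprobe.d/mlx4_core.conf}"
    line="options mlx4_core internal_err_reset=0"
    # Append the option only if the exact line is not already present,
    # so running the helper twice does not duplicate it.
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Usage (as root):
#   add_mlx4_workaround
#   modinfo mlx4_core    # shows the description of internal_err_reset
```

Options in /etc/modprobe.d are applied when the module is loaded, so the setting should only take effect after mlx4_core is reloaded or the system is rebooted.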
Setting this parameter to 0 in the modprobe co
On Wednesday 29 February 2012 16:47, Jack Morgenstein wrote:
> Some comments.
>
> 1. Mr Cascardo's solution is only partial, and does not cover all the problem
> cases. He simply uncovered one of several examples of what lack-of-sync will
> do when removing a device.
> Mr. Cascardo found
On Tuesday 28 February 2012 22:46, David Miller wrote:
> From: Thadeu Lima de Souza Cascardo
> Date: Tue, 28 Feb 2012 17:34:38 -0300
>
> > On Tue, Feb 28, 2012 at 02:30:51PM -0500, David Miller wrote:
> >> From: Thadeu Lima de Souza Cascardo
> >> Date: Tue, 28 Feb 2012 15:36:16 -0300
> >>
> >>
From: Thadeu Lima de Souza Cascardo
Date: Tue, 28 Feb 2012 17:34:38 -0300
> On Tue, Feb 28, 2012 at 02:30:51PM -0500, David Miller wrote:
>> From: Thadeu Lima de Souza Cascardo
>> Date: Tue, 28 Feb 2012 15:36:16 -0300
>>
>> > When an EEH happens, the catas poll code will try to restart the devic
On Tue, Feb 28, 2012 at 02:30:51PM -0500, David Miller wrote:
> From: Thadeu Lima de Souza Cascardo
> Date: Tue, 28 Feb 2012 15:36:16 -0300
>
> > When an EEH happens, the catas poll code will try to restart the device,
> > removing it and adding it back again. The EEH code will try to do the
> > s
From: Thadeu Lima de Souza Cascardo
Date: Tue, 28 Feb 2012 15:36:16 -0300
> When an EEH happens, the catas poll code will try to restart the device,
> removing it and adding it back again. The EEH code will try to do the
> same. One of the threads ends up accessing memory that was freed by the
> o
When an EEH happens, the catas poll code will try to restart the device,
removing it and adding it back again. The EEH code will try to do the
same. One of the threads ends up accessing memory that was freed by the
other thread and we get a crash.
The EEH backtrace:
<4>Call Trace:
<4>[c0007fff