Re: [REGRESSION] mlx5: Driver remove during hot unplug is broken

2020-07-10 Thread Niklas Schnelle
On 7/9/20 8:34 PM, Parav Pandit wrote:
> On 7/9/2020 3:36 PM, Niklas Schnelle wrote:
>>
>> On 7/8/20 5:44 PM, Parav Pandit wrote:
>> ... snip ..
>>
>> As is the patch above fixes the dereference but results in the same
>> completion error
>> as current 5.8-rc4
>
> Below patch should hopefull…

Re: [REGRESSION] mlx5: Driver remove during hot unplug is broken

2020-07-09 Thread Parav Pandit
On 7/9/2020 3:36 PM, Niklas Schnelle wrote:
>
> On 7/8/20 5:44 PM, Parav Pandit wrote:
> ... snip ..
>>
>> It is likely because events_cleanup() freed the memory using kvfree() that
>> health recovery context is trying to access in notifier chain.
>>
>> While reviewing I see few more errors a…

Re: [REGRESSION] mlx5: Driver remove during hot unplug is broken

2020-07-09 Thread Niklas Schnelle
On 7/8/20 5:44 PM, Parav Pandit wrote:
... snip ..
>
> It is likely because events_cleanup() freed the memory using kvfree() that
> health recovery context is trying to access in notifier chain.
>
> While reviewing I see few more errors as below.
> (a) mlx5_pci_err_detected() invokes mlx5_d…
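A minimal userspace sketch of the ordering problem described in the quoted analysis (event memory released by a cleanup step while health recovery work can still reach it). It assumes a simplified single-threaded model in which a "work queue" is just a count of pending items; `struct dev`, `run_health_work()`, `drain_health_work()`, `cleanup_events()` and `remove_dev()` are hypothetical names for illustration only, not the mlx5 driver's functions, and `free()` merely stands in for `kvfree()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the event state freed by an events cleanup
 * step; not the real mlx5 structure. */
struct events {
    int notified;             /* bumped by each queued health work item */
};

/* Hypothetical device context for this model. */
struct dev {
    struct events *events;    /* memory released during cleanup */
    int pending_work;         /* health recovery items still queued */
};

/* One queued health recovery item. It dereferences d->events, which is
 * what a notifier-chain walk would touch; the assert catches the case
 * where that memory has already been released. */
static void run_health_work(struct dev *d)
{
    assert(d->events != NULL);
    d->events->notified++;
}

/* Model of draining a work queue: run every pending item to completion. */
static void drain_health_work(struct dev *d)
{
    while (d->pending_work > 0) {
        run_health_work(d);
        d->pending_work--;
    }
}

/* Release the event state. Refuses (returns false) while work is still
 * queued, since that work would otherwise run against freed memory. */
static bool cleanup_events(struct dev *d)
{
    if (d->pending_work > 0)
        return false;
    free(d->events);
    d->events = NULL;
    return true;
}

/* Teardown in the safe order: drain the queued work first, then free. */
static bool remove_dev(struct dev *d)
{
    drain_health_work(d);
    return cleanup_events(d);
}
```

In this model, calling `cleanup_events()` while items are still pending is rejected rather than leaving them with a dangling pointer, and `remove_dev()` shows the drain-then-free order. The point is only the ordering; how the real driver sequences its drain and cleanup calls is what the thread itself is working out.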

RE: [REGRESSION] mlx5: Driver remove during hot unplug is broken

2020-07-08 Thread Parav Pandit
> From: Niklas Schnelle
> Sent: Wednesday, July 8, 2020 5:14 PM
> Hi Parav, Hi Shay,
>
> On 7/8/20 12:43 PM, Parav Pandit wrote:
> > Hi Niklas,
> >
> ... snip ...
> >
> > Sorry for my late response.
> > Yes, this looks good and I also found same in my analysis.
> > With latest code mlx5_p…

Re: [REGRESSION] mlx5: Driver remove during hot unplug is broken

2020-07-08 Thread Niklas Schnelle
Hi Parav, Hi Shay,

On 7/8/20 12:43 PM, Parav Pandit wrote:
> Hi Niklas,
> ... snip ...
>
> Sorry for my late response.
> Yes, this looks good and I also found same in my analysis.
> With latest code mlx5_pci_close() already does drain_health_wq(), so the
> additional call in remove_one() is…

RE: [REGRESSION] mlx5: Driver remove during hot unplug is broken

2020-07-08 Thread Parav Pandit
Hi Niklas,

> From: Niklas Schnelle
> Sent: Monday, June 15, 2020 3:32 PM
>
> Hello Saeed,
>
> On 6/13/20 12:01 AM, Saeed Mahameed wrote:
> > On Fri, 2020-06-12 at 15:09 +0200, Niklas Schnelle wrote:
> >> Hello Parav, Hello Saeed,
> >>
> ... snip ...
> >>
> >> So without really knowing anything…

Re: [REGRESSION] mlx5: Driver remove during hot unplug is broken

2020-06-15 Thread Niklas Schnelle
Hello Saeed,

On 6/13/20 12:01 AM, Saeed Mahameed wrote:
> On Fri, 2020-06-12 at 15:09 +0200, Niklas Schnelle wrote:
>> Hello Parav, Hello Saeed,
>> ... snip ...
>>
>> So without really knowing anything about these functions I would
>> guess that with the device still registered the drained
>> queu…

Re: [REGRESSION] mlx5: Driver remove during hot unplug is broken

2020-06-12 Thread Saeed Mahameed
On Fri, 2020-06-12 at 15:09 +0200, Niklas Schnelle wrote:
> Hello Parav, Hello Saeed,
>
> our CI system for IBM Z Linux found a hang[0] when hot unplugging a
> ConnectX-4 Lx VF from a z/VM guest
> in Linus' current tree and added during the merge window.
> Sadly it didn't happen all the time which…