Re: [PATCH] nvme: hwmon: fix crash on device teardown

2021-01-11 Thread Daniel Wagner
On Mon, Jan 11, 2021 at 05:00:18PM +0100, Hannes Reinecke wrote: > Yeah, using the controller node for devm allocations is quite dodgy. > Does this one help? > > diff --git a/drivers/nvme/host/hwmon.c b/drivers/nvme/host/hwmon.c > index 6fdd07fb3001..7260af028cf7 100644 > --- a/drivers/nvme/host/h

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2021-01-11 Thread Christoph Hellwig
We could just stop using devm and do explicit resource management. Btw, for the next patch please cc the author of the nvme-hwmon code.

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2021-01-11 Thread Hannes Reinecke
On 1/5/21 10:45 AM, Daniel Wagner wrote: On Mon, Jan 04, 2021 at 06:06:10PM -0300, Enzo Matsumiya wrote: @Daniel maybe try tweaking your tests to use a smaller controller loss timeout (-l option)? I do this on my tests because the default value kicks in about 30min after hot-removal -- i.e. you

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2021-01-05 Thread Daniel Wagner
On Mon, Jan 04, 2021 at 06:06:10PM -0300, Enzo Matsumiya wrote: > @Daniel maybe try tweaking your tests to use a smaller controller > loss timeout (-l option)? I do this on my tests because the default > value kicks in about 30min after hot-removal -- i.e. you > have to actually wait for the timeou

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2021-01-04 Thread Enzo Matsumiya
Thank you all for the feedback. I'll re-evaluate my patch and resubmit it soon. The problem clearly exists as I can easily reproduce it on my test setup, but the strategy to fix it needs some rework apparently. @Daniel maybe try tweaking your tests to use a smaller controller loss timeout (-l op

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2020-12-30 Thread Daniel Wagner
On Wed, Dec 30, 2020 at 04:16:53PM +0100, Daniel Wagner wrote: > I've attached the hwmon to the nvme object, which fixes the obvious > problem. But I still don't see any 'DEVRES REM' message. This would > indicate the nvme object never really goes away... more debugging... Stupid me, the nvme devi

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2020-12-30 Thread Daniel Wagner
On Wed, Dec 30, 2020 at 03:38:05PM +0100, Daniel Wagner wrote: > I've enabled CONFIG_DEVRES_DEBUG and see only 'DEVRES ADD' message. If I > read it correctly the problem is that the resource is attached to the ctl > devm object and not for the nvme devm object. I've attached the hwmon to the nvme

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2020-12-30 Thread Daniel Wagner
On Fri, Dec 11, 2020 at 03:12:53PM +0100, Hannes Reinecke wrote: > So why do we have to deallocate the hwmon attributes? > And why on reset? And who's re-creating them after reset, seeing that > 'initialized' should be true? nvmet: adding nsid 1 to subsystem blktests-subsystem-1 nvmet: creating

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2020-12-30 Thread Daniel Wagner
Hi Chaitanya, On Thu, Dec 10, 2020 at 04:35:14AM +, Chaitanya Kulkarni wrote: > Enzo, > On 12/9/20 13:39, Enzo Matsumiya wrote: > > Fix a possible NULL pointer dereference when trying to read > > hwmon sysfs entries associated to NVMe-oF devices that were > > hot-removed or disconnected. > > >

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2020-12-11 Thread Hannes Reinecke
On 12/9/20 10:32 PM, Enzo Matsumiya wrote: Fix a possible NULL pointer dereference when trying to read hwmon sysfs entries associated to NVMe-oF devices that were hot-removed or disconnected. Unregister the NVMe hwmon device upon controller teardown (nvme_stop_ctrl()). Signed-off-by: Enzo Matsu

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2020-12-10 Thread Keith Busch
On Wed, Dec 09, 2020 at 06:32:27PM -0300, Enzo Matsumiya wrote: > +void nvme_hwmon_exit(struct nvme_ctrl *ctrl) > +{ > + hwmon_device_unregister(ctrl->dev); > +} The hwmon registration uses the devm_ version, so don't we need to use the devm_hwmon_device_unregister() here?

Re: [PATCH] nvme: hwmon: fix crash on device teardown

2020-12-09 Thread Chaitanya Kulkarni
Enzo, On 12/9/20 13:39, Enzo Matsumiya wrote: > Fix a possible NULL pointer dereference when trying to read > hwmon sysfs entries associated to NVMe-oF devices that were > hot-removed or disconnected. > > Unregister the NVMe hwmon device upon controller teardown > (nvme_stop_ctrl()). > > Signed-off