Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Jason Gunthorpe
On Tue, Mar 30, 2021 at 02:00:43PM -0700, Dan Williams wrote: > Ok, so another case where I agree the instability exists but does not > matter in practice in this case because the cxl_memdev_ioctl() read > side is prepared for the state change to leak into the down_read() > section. There's no mea

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Dan Williams
On Tue, Mar 30, 2021 at 12:52 PM Jason Gunthorpe wrote: > > On Tue, Mar 30, 2021 at 12:43:15PM -0700, Dan Williams wrote: > > > Ok, so this is the disagreement. Strict adherence to the model where > > it does not matter in practice. > > The basic problem is that it is hard to know if it does not m

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Jason Gunthorpe
On Tue, Mar 30, 2021 at 12:43:15PM -0700, Dan Williams wrote: > Ok, so this is the disagreement. Strict adherence to the model where > it does not matter in practice. The basic problem is that it is hard to know if it does not matter in practice because you never know what the compiler might deci

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Dan Williams
On Tue, Mar 30, 2021 at 12:26 PM Jason Gunthorpe wrote: > > On Tue, Mar 30, 2021 at 12:00:23PM -0700, Dan Williams wrote: > > > > > > IMHO this can't use 'dev->kobj.state_in_sysfs' as the RCU protected > > > > > data. > > > > > > > > This usage of srcu is functionally equivalent to replacing > >

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Jason Gunthorpe
On Tue, Mar 30, 2021 at 12:00:23PM -0700, Dan Williams wrote: > > > > IMHO this can't use 'dev->kobj.state_in_sysfs' as the RCU protected > > > > data. > > > > > > This usage of srcu is functionally equivalent to replacing > > > srcu_read_lock() with down_read() and the shutdown path with: > > >

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Dan Williams
On Tue, Mar 30, 2021 at 10:54 AM Jason Gunthorpe wrote: > > On Tue, Mar 30, 2021 at 10:31:15AM -0700, Dan Williams wrote: > > On Tue, Mar 30, 2021 at 10:03 AM Jason Gunthorpe wrote: > > > > > > On Tue, Mar 30, 2021 at 09:05:29AM -0700, Dan Williams wrote: > > > > > > > > If you can't clearly poin

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Jason Gunthorpe
On Tue, Mar 30, 2021 at 10:31:15AM -0700, Dan Williams wrote: > On Tue, Mar 30, 2021 at 10:03 AM Jason Gunthorpe wrote: > > > > On Tue, Mar 30, 2021 at 09:05:29AM -0700, Dan Williams wrote: > > > > > > If you can't clearly point to the *data* under RCU protection it is > > > > being used wrong. >

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Dan Williams
On Tue, Mar 30, 2021 at 10:03 AM Jason Gunthorpe wrote: > > On Tue, Mar 30, 2021 at 09:05:29AM -0700, Dan Williams wrote: > > > > If you can't clearly point to the *data* under RCU protection it is > > > being used wrong. > > > > Agree. > > > > The data being protected is the value of > > dev->kob

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Jason Gunthorpe
On Tue, Mar 30, 2021 at 09:05:29AM -0700, Dan Williams wrote: > > If you can't clearly point to the *data* under RCU protection it is > > being used wrong. > > Agree. > > The data being protected is the value of > dev->kobj.state_in_sysfs. The So where is that read under: + idx = srcu_re

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Dan Williams
On Tue, Mar 30, 2021 at 8:47 AM Jason Gunthorpe wrote: > > On Tue, Mar 30, 2021 at 08:37:19AM -0700, Dan Williams wrote: > > On Tue, Mar 30, 2021 at 4:16 AM Jason Gunthorpe wrote: > > > > > > On Mon, Mar 29, 2021 at 07:47:49PM -0700, Dan Williams wrote: > > > > > > > @@ -1155,21 +1175,12 @@ stati

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Jason Gunthorpe
On Tue, Mar 30, 2021 at 08:37:19AM -0700, Dan Williams wrote: > On Tue, Mar 30, 2021 at 4:16 AM Jason Gunthorpe wrote: > > > > On Mon, Mar 29, 2021 at 07:47:49PM -0700, Dan Williams wrote: > > > > > @@ -1155,21 +1175,12 @@ static void cxlmdev_unregister(void *_cxlmd) > > > struct cxl_memdev

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Dan Williams
On Tue, Mar 30, 2021 at 4:16 AM Jason Gunthorpe wrote: > > On Mon, Mar 29, 2021 at 07:47:49PM -0700, Dan Williams wrote: > > > @@ -1155,21 +1175,12 @@ static void cxlmdev_unregister(void *_cxlmd) > > struct cxl_memdev *cxlmd = _cxlmd; > > struct device *dev = &cxlmd->dev; > > > > -

Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-30 Thread Jason Gunthorpe
On Mon, Mar 29, 2021 at 07:47:49PM -0700, Dan Williams wrote: > @@ -1155,21 +1175,12 @@ static void cxlmdev_unregister(void *_cxlmd) > struct cxl_memdev *cxlmd = _cxlmd; > struct device *dev = &cxlmd->dev; > > - percpu_ref_kill(&cxlmd->ops_active); > cdev_device_del(&cxlmd-

[PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

2021-03-29 Thread Dan Williams
The percpu_ref to gate whether cxl_memdev_ioctl() is free to use the driver context (@cxlm) to issue I/O is overkill, implemented incorrectly (missing a device reference before accessing the percpu_ref), and the complexities of shutting down a percpu_ref contributed to a bug in the error unwind in