[PATCH] xen/xenbus: document will_handle argument for xenbus_watch_path()

2024-01-12 Thread SeongJae Park
Commit 2e85d32b1c86 ("xen/xenbus: Add 'will_handle' callback support in
xenbus_watch_path()") added the will_handle argument to xenbus_watch_path()
and its wrapper, xenbus_watch_pathfmt(), but didn't document it in the
kerneldoc comments of the functions.  This causes warnings that were
reported by the kernel test robot.  Add the documentation to fix them.
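
For readers unfamiliar with the two callbacks, the following is a minimal
user-space sketch of the idea, not the xenbus implementation.  The struct,
the function names and the example paths are all hypothetical, and treating
a NULL will_handle as "queue everything" is an assumption of the sketch.  It
only shows that will_handle decides whether an event is worth queuing, while
callback processes the events that pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct xenbus_watch; not the kernel definition. */
struct demo_watch {
	/* Decides whether an event is worth queuing; NULL means "queue all". */
	bool (*will_handle)(struct demo_watch *w, const char *path);
	/* Processes an event that passed the will_handle check. */
	void (*callback)(struct demo_watch *w, const char *path);
	unsigned int handled;	/* number of events the callback has seen */
};

/* Filter first, then dispatch; a real implementation would queue instead. */
bool demo_deliver(struct demo_watch *w, const char *path)
{
	if (w->will_handle && !w->will_handle(w, path))
		return false;	/* dropped before it is ever queued */
	w->callback(w, path);
	return true;
}

/* Example filter: only paths ending in "state" are of interest. */
bool demo_only_state(struct demo_watch *w, const char *path)
{
	size_t len = strlen(path);

	(void)w;
	return len >= 5 && strcmp(path + len - 5, "state") == 0;
}

/* Example callback: just count the delivered events. */
void demo_count(struct demo_watch *w, const char *path)
{
	(void)path;
	w->handled++;
}

/* Self-check: one event passes the filter, one is dropped. */
void demo_watch_check(void)
{
	struct demo_watch w = { demo_only_state, demo_count, 0 };

	assert(demo_deliver(&w, "device/vbd/51712/state"));
	assert(!demo_deliver(&w, "device/vbd/51712/mode"));
	assert(w.handled == 1);
}
```

Calling demo_watch_check() from a test program exercises both the accepted
and the dropped case.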

Fixes: 2e85d32b1c86 ("xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()")
Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202401121154.fi8jdgun-...@intel.com/
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_client.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index d4b251925796..d539210b39d8 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -116,14 +116,16 @@ EXPORT_SYMBOL_GPL(xenbus_strstate);
  * @dev: xenbus device
  * @path: path to watch
  * @watch: watch to register
+ * @will_handle: events queuing determine callback
  * @callback: callback to register
  *
  * Register a @watch on the given path, using the given xenbus_watch structure
- * for storage, and the given @callback function as the callback.  Return 0 on
- * success, or -errno on error.  On success, the given @path will be saved as
- * @watch->node, and remains the caller's to free.  On error, @watch->node will
- * be NULL, the device will switch to %XenbusStateClosing, and the error will
- * be saved in the store.
+ * for storage, @will_handle function as the callback to determine if each
+ * event need to be queued, and the given @callback function as the callback.
+ * Return 0 on success, or -errno on error.  On success, the given @path will
+ * be saved as @watch->node, and remains the caller's to free.  On error,
+ * @watch->node will be NULL, the device will switch to %XenbusStateClosing,
+ * and the error will be saved in the store.
  */
 int xenbus_watch_path(struct xenbus_device *dev, const char *path,
  struct xenbus_watch *watch,
@@ -156,16 +158,18 @@ EXPORT_SYMBOL_GPL(xenbus_watch_path);
  * xenbus_watch_pathfmt - register a watch on a sprintf-formatted path
  * @dev: xenbus device
  * @watch: watch to register
+ * @will_handle: events queuing determine callback
  * @callback: callback to register
  * @pathfmt: format of path to watch
  *
  * Register a watch on the given @path, using the given xenbus_watch
- * structure for storage, and the given @callback function as the callback.
- * Return 0 on success, or -errno on error.  On success, the watched path
- * (@path/@path2) will be saved as @watch->node, and becomes the caller's to
- * kfree().  On error, watch->node will be NULL, so the caller has nothing to
- * free, the device will switch to %XenbusStateClosing, and the error will be
- * saved in the store.
+ * structure for storage, @will_handle function as the callback to determine if
+ * each event need to be queued, and the given @callback function as the
+ * callback.  Return 0 on success, or -errno on error.  On success, the watched
+ * path (@path/@path2) will be saved as @watch->node, and becomes the caller's
+ * to kfree().  On error, watch->node will be NULL, so the caller has nothing
+ * to free, the device will switch to %XenbusStateClosing, and the error will
+ * be saved in the store.
  */
 int xenbus_watch_pathfmt(struct xenbus_device *dev,
 struct xenbus_watch *watch,
-- 
2.39.2




[PATCH for-stable-5.10.y] xen-blkfront: Cache feature_persistent value before advertisement

2022-09-06 Thread SeongJae Park
commit fe8f65b018effbf473f53af3538d0c1878b8b329 upstream.

Xen blkfront advertises its support of the persistent grants feature
when it is first setting up and when resuming, in 'talk_to_blkback()'.
Then, blkback reads the advertised value when it connects with blkfront,
decides whether it will use the persistent grants feature, and
advertises its decision to blkfront.  Blkfront reads blkback's
decision and makes its own decision on the use of the feature.

Commit 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter
when connect"), however, made blkfront read the parameter for
disabling the advertisement, namely 'feature_persistent', when it
negotiates, not when it advertises.  Therefore blkfront advertises
without reading the parameter.  As the field for caching the parameter
value is zero-initialized, blkfront always advertises the feature as
disabled, so the persistent grants feature is always disabled.

This commit fixes the issue by making blkfront cache the parameter
just before the advertisement.
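
As an illustration only, the ordering problem can be reproduced in a small
user-space sketch.  The names below (demo_info, demo_feature_persistent and
the demo_* functions) are hypothetical stand-ins, not the driver's code; the
sketch assumes nothing beyond what is described above: a zero-initialized
cache field that is advertised either before or after it is filled.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the module parameter; the driver sets it via module_param(). */
bool demo_feature_persistent = true;

/* Hypothetical, trimmed-down stand-in for struct blkfront_info. */
struct demo_info {
	unsigned int feature_persistent_parm:1;	/* cached parameter value */
	unsigned int feature_persistent:1;	/* negotiation result */
};

/* Buggy order: advertise first; the cache is filled only at negotiation. */
unsigned int demo_advertise_buggy(const struct demo_info *info)
{
	return info->feature_persistent_parm;	/* still zero-initialized */
}

/* Fixed order: cache the parameter right before advertising it. */
unsigned int demo_advertise_fixed(struct demo_info *info)
{
	info->feature_persistent_parm = demo_feature_persistent;
	return info->feature_persistent_parm;
}

/* Use the feature only if we advertised it and the backend agreed. */
void demo_negotiate(struct demo_info *info, unsigned int backend_value)
{
	info->feature_persistent =
		info->feature_persistent_parm && backend_value;
}

/* Self-check: same parameter value, different advertisement. */
void demo_order_check(void)
{
	struct demo_info buggy = { 0 }, fixed = { 0 };

	assert(demo_advertise_buggy(&buggy) == 0);
	assert(demo_advertise_fixed(&fixed) == 1);
}
```

With the parameter left at its default of true, the buggy order still
advertises 0, which is the behavior this patch removes.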

Fixes: 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter when connect")
Cc:  # 5.10.x
Reported-by: Marek Marczykowski-Górecki 
Signed-off-by: SeongJae Park 
Tested-by: Marek Marczykowski-Górecki 
Reviewed-by: Juergen Gross 
Link: https://lore.kernel.org/r/20220831165824.94815-4...@kernel.org
Signed-off-by: Juergen Gross 
---

This patch is a manual backport of the upstream commit on the 5.10.y
kernel.  Please note that this patch can be applied on the latest 5.10.y
only after the preceding patch[1] is applied.

[1] https://lore.kernel.org/stable/20220906132819.016040...@linuxfoundation.org/

 drivers/block/xen-blkfront.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9d5460f6e0ff..6f33d62331b1 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1852,6 +1852,12 @@ static void free_info(struct blkfront_info *info)
kfree(info);
 }
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent,
+   "Enables the persistent grants feature");
+
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_blkback(struct xenbus_device *dev,
   struct blkfront_info *info)
@@ -1943,6 +1949,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
message = "writing protocol";
goto abort_transaction;
}
+   info->feature_persistent_parm = feature_persistent;
err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
info->feature_persistent_parm);
if (err)
@@ -2019,12 +2026,6 @@ static int negotiate_mq(struct blkfront_info *info)
return 0;
 }
 
-/* Enable the persistent grants feature. */
-static bool feature_persistent = true;
-module_param(feature_persistent, bool, 0644);
-MODULE_PARM_DESC(feature_persistent,
-   "Enables the persistent grants feature");
-
 /**
  * Entry point to this code when a new device is created.  Allocate the basic
  * structures and the ring buffer for communication with the backend, and
@@ -2394,7 +2395,6 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
blkfront_setup_discard(info);
 
-   info->feature_persistent_parm = feature_persistent;
if (info->feature_persistent_parm)
info->feature_persistent =
!!xenbus_read_unsigned(info->xbdev->otherend,
-- 
2.25.1




Re: [PATCH v2 0/3] xen-blk{front,back}: Fix the broken semantic and flow of feature-persistent

2022-08-31 Thread SeongJae Park
On Wed, 31 Aug 2022 16:58:21 + SeongJae Park  wrote:

> Changes from v1
> (https://lore.kernel.org/xen-devel/20220825161511.94922-1...@kernel.org/)
> - Fix the wrong feature_persistent caching position of blkfront
> - Set blkfront's feature_persistent field setting with simple '&&'
>   instead of 'if' (Pratyush Yadav)
> 
> This patchset fixes misuse of the 'feature-persistent' advertisement
> semantic (patches 1 and 2), and the wrong timing of the
> 'feature_persistent' value caching, which made persistent grants feature
> always disabled.

Please note that I have a problem in my test setup and was therefore unable
to fully test this patchset.  I am posting it anyway, as the impact of the
bug is not trivial (persistent grants are always disabled), and to make it
easier for others to test my proposed fix.  I hope to get someone's test
results or code review of this patchset even before I fix my test setup problem.

Juergen, I didn't add your 'Reviewed-by:'s to the first two patches of this
series because I changed some of the description to make clear which bug
and commit each patch is really fixing.  Specifically, I wordsmithed the
wording and changed the 'Fixes:' tag.  The code change is almost the same,
though.


Thanks,
SJ

> 
> SeongJae Park (3):
>   xen-blkback: Advertise feature-persistent as user requested
>   xen-blkfront: Advertise feature-persistent as user requested
>   xen-blkfront: Cache feature_persistent value before advertisement
> 
>  drivers/block/xen-blkback/common.h |  3 +++
>  drivers/block/xen-blkback/xenbus.c |  6 --
>  drivers/block/xen-blkfront.c   | 20 
>  3 files changed, 19 insertions(+), 10 deletions(-)
> 
> -- 
> 2.25.1
> 



[PATCH v2 3/3] xen-blkfront: Cache feature_persistent value before advertisement

2022-08-31 Thread SeongJae Park
Xen blkfront advertises its support of the persistent grants feature
when it is first setting up and when resuming, in 'talk_to_blkback()'.
Then, blkback reads the advertised value when it connects with blkfront,
decides whether it will use the persistent grants feature, and
advertises its decision to blkfront.  Blkfront reads blkback's
decision and makes its own decision on the use of the feature.

Commit 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter
when connect"), however, made blkfront read the parameter for
disabling the advertisement, namely 'feature_persistent', when it
negotiates, not when it advertises.  Therefore blkfront advertises
without reading the parameter.  As the field for caching the parameter
value is zero-initialized, blkfront always advertises the feature as
disabled, so the persistent grants feature is always disabled.

This commit fixes the issue by making blkfront cache the parameter
just before the advertisement.

Fixes: 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter when connect")
Cc:  # 5.10.x
Reported-by: Marek Marczykowski-Górecki 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkfront.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index dfae08115450..35b9bcad9db9 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1759,6 +1759,12 @@ static int write_per_ring_nodes(struct xenbus_transaction xbt,
return err;
 }
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent,
+   "Enables the persistent grants feature");
+
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_blkback(struct xenbus_device *dev,
   struct blkfront_info *info)
@@ -1850,6 +1856,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
message = "writing protocol";
goto abort_transaction;
}
+   info->feature_persistent_parm = feature_persistent;
err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
info->feature_persistent_parm);
if (err)
@@ -1919,12 +1926,6 @@ static int negotiate_mq(struct blkfront_info *info)
return 0;
 }
 
-/* Enable the persistent grants feature. */
-static bool feature_persistent = true;
-module_param(feature_persistent, bool, 0644);
-MODULE_PARM_DESC(feature_persistent,
-   "Enables the persistent grants feature");
-
 /*
  * Entry point to this code when a new device is created.  Allocate the basic
  * structures and the ring buffer for communication with the backend, and
@@ -2284,7 +2285,6 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
blkfront_setup_discard(info);
 
-   info->feature_persistent_parm = feature_persistent;
if (info->feature_persistent_parm)
info->feature_persistent =
!!xenbus_read_unsigned(info->xbdev->otherend,
-- 
2.25.1




[PATCH v2 2/3] xen-blkfront: Advertise feature-persistent as user requested

2022-08-31 Thread SeongJae Park
The advertisement of the persistent grants feature (writing
'feature-persistent' to xenbus) should mean only the availability of
the feature, not the decision to use it.  However, commit
74a852479c68 ("xen-blkfront: add a parameter for disabling of persistent
grants") made a field of blkfront, which was a place for saving only the
negotiation result, serve yet another purpose: caching the
'feature_persistent' parameter value.  As a result, the advertisement,
which should follow only the parameter value, becomes inconsistent.

This commit fixes the misuse of the semantic by making blkfront save
the parameter value in a separate place and advertise the support based
only on the saved value.

Fixes: 74a852479c68 ("xen-blkfront: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Suggested-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkfront.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 8e56e69fb4c4..dfae08115450 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -213,6 +213,9 @@ struct blkfront_info
unsigned int feature_fua:1;
unsigned int feature_discard:1;
unsigned int feature_secdiscard:1;
+   /* Connect-time cached feature_persistent parameter */
+   unsigned int feature_persistent_parm:1;
+   /* Persistent grants feature negotiation result */
unsigned int feature_persistent:1;
unsigned int bounce:1;
unsigned int discard_granularity;
@@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
goto abort_transaction;
}
err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
-   info->feature_persistent);
+   info->feature_persistent_parm);
if (err)
dev_warn(&info->xbdev->dev,
 "writing persistent grants feature to xenbus");
@@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
blkfront_setup_discard(info);
 
-   if (feature_persistent)
+   info->feature_persistent_parm = feature_persistent;
+   if (info->feature_persistent_parm)
info->feature_persistent =
!!xenbus_read_unsigned(info->xbdev->otherend,
   "feature-persistent", 0);
-- 
2.25.1




[PATCH v2 0/3] xen-blk{front,back}: Fix the broken semantic and flow of feature-persistent

2022-08-31 Thread SeongJae Park
Changes from v1
(https://lore.kernel.org/xen-devel/20220825161511.94922-1...@kernel.org/)
- Fix the wrong feature_persistent caching position of blkfront
- Set blkfront's feature_persistent field setting with simple '&&'
  instead of 'if' (Pratyush Yadav)

This patchset fixes misuse of the 'feature-persistent' advertisement
semantic (patches 1 and 2), and the wrong timing of the
'feature_persistent' value caching, which made persistent grants feature
always disabled.

SeongJae Park (3):
  xen-blkback: Advertise feature-persistent as user requested
  xen-blkfront: Advertise feature-persistent as user requested
  xen-blkfront: Cache feature_persistent value before advertisement

 drivers/block/xen-blkback/common.h |  3 +++
 drivers/block/xen-blkback/xenbus.c |  6 --
 drivers/block/xen-blkfront.c   | 20 
 3 files changed, 19 insertions(+), 10 deletions(-)

-- 
2.25.1




[PATCH v2 1/3] xen-blkback: Advertise feature-persistent as user requested

2022-08-31 Thread SeongJae Park
The advertisement of the persistent grants feature (writing
'feature-persistent' to xenbus) should mean only the availability of
the feature, not the decision to use it.  However, commit
aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent
grants") made a field of blkback, which was a place for saving only the
negotiation result, serve yet another purpose: caching the
'feature_persistent' parameter value.  As a result, the advertisement,
which should follow only the parameter value, becomes inconsistent.

This commit fixes the misuse of the semantic by making blkback save the
parameter value in a separate place and advertise the support based
only on the saved value.
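
To make the difference between "availability" and "decision" concrete, here
is a rough user-space model of the handshake.  It is a sketch under
simplifying assumptions, not the drivers' code: demo_end and both handshake
functions are hypothetical, the xenbus reads and writes are reduced to plain
assignments, and the "after" variant assumes each end already holds its
saved parameter value when it advertises.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one end (blkfront or blkback); not a kernel struct. */
struct demo_end {
	bool parm;			/* user's 'feature_persistent' setting */
	unsigned int advertised;	/* value written to xenbus */
	unsigned int negotiated;	/* whether the feature will be used */
};

/* Before: each end advertises its negotiation result field. */
void demo_handshake_before(struct demo_end *front, struct demo_end *back)
{
	front->advertised = front->negotiated;	/* not negotiated yet: 0 */
	back->negotiated = back->parm && front->advertised;
	back->advertised = back->negotiated;
	front->negotiated = front->parm && back->advertised;
}

/* After: each end advertises the saved parameter value only. */
void demo_handshake_after(struct demo_end *front, struct demo_end *back)
{
	front->advertised = front->parm;
	back->negotiated = back->parm && front->advertised;
	back->advertised = back->parm;
	front->negotiated = front->parm && back->advertised;
}

/* Self-check: both users enable the feature in both variants. */
void demo_handshake_check(void)
{
	struct demo_end f1 = { true, 0, 0 }, b1 = { true, 0, 0 };
	struct demo_end f2 = { true, 0, 0 }, b2 = { true, 0, 0 };

	demo_handshake_before(&f1, &b1);
	assert(!f1.negotiated && !b1.negotiated);	/* never enabled */

	demo_handshake_after(&f2, &b2);
	assert(f2.negotiated && b2.negotiated);		/* enabled on both ends */
}
```

In the "before" variant neither end ever advertises the feature, because each
waits for the other; in the "after" variant the feature is used exactly when
both parameters are set.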

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Suggested-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/common.h | 3 +++
 drivers/block/xen-blkback/xenbus.c | 6 --
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index bda5c815e441..a28473470e66 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -226,6 +226,9 @@ struct xen_vbd {
sector_t size;
unsigned int flush_support:1;
unsigned int discard_secure:1;
+   /* Connect-time cached feature_persistent parameter value */
+   unsigned int feature_gnt_persistent_parm:1;
+   /* Persistent grants feature negotiation result */
unsigned int feature_gnt_persistent:1;
unsigned int overflow_max_grants:1;
 };
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index ee7ad2fb432d..c0227dfa4688 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -907,7 +907,7 @@ static void connect(struct backend_info *be)
xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
 
err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
-   be->blkif->vbd.feature_gnt_persistent);
+   be->blkif->vbd.feature_gnt_persistent_parm);
if (err) {
xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
 dev->nodename);
@@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be)
return -ENOSYS;
}
 
-   blkif->vbd.feature_gnt_persistent = feature_persistent &&
+   blkif->vbd.feature_gnt_persistent_parm = feature_persistent;
+   blkif->vbd.feature_gnt_persistent =
+   blkif->vbd.feature_gnt_persistent_parm &&
xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
 
blkif->vbd.overflow_max_grants = 0;
-- 
2.25.1




Re: [PATCH 2/2] xen-blkfront: Advertise feature-persistent as user requested

2022-08-31 Thread SeongJae Park
Hi Pratyush,

On Wed, 31 Aug 2022 15:50:45 + Pratyush Yadav  wrote:

> On 25/08/22 04:15PM, SeongJae Park wrote:
> > Commit e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter
> > when connect") made blkback to advertise its support of the persistent
> > grants feature only if the user sets the 'feature_persistent' parameter
> > of the driver and the frontend advertised its support of the feature.
> > However, following commit 402c43ea6b34 ("xen-blkfront: Apply
> > 'feature_persistent' parameter when connect") made the blkfront to work
> > in the same way.  That is, blkfront also advertises its support of the
> > persistent grants feature only if the user sets the 'feature_persistent'
> > parameter of the driver and the backend advertised its support of the
> > feature.
> > 
> > Hence blkback and blkfront will never advertise their support of the
> > feature but wait until the other advertises the support, even though
> > users set the 'feature_persistent' parameters of the drivers.  As a
> > result, the persistent grants feature is disabled always regardless of
> > the 'feature_persistent' values[1].
> > 
> > The problem comes from the misuse of the semantic of the advertisement
> > of the feature.  The advertisement of the feature should means only
> > availability of the feature not the decision for using the feature.
> > However, current behavior is working in the wrong way.
> > 
> > This commit fixes the issue by making the blkfront advertises its
> > support of the feature as user requested via 'feature_persistent'
> > parameter regardless of the otherend's support of the feature.
> > 
> > [1] 
> > https://lore.kernel.org/xen-devel/bd818aba-4857-bc07-dc8a-e9b2f8c5f...@suse.com/
> > 
> > Fixes: 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter 
> > when connect")
> > Cc:  # 5.10.x
> > Reported-by: Marek Marczykowski-Górecki 
> > Suggested-by: Juergen Gross 
> > Signed-off-by: SeongJae Park 
> > ---
> >  drivers/block/xen-blkfront.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> > index 8e56e69fb4c4..dfae08115450 100644
> > --- a/drivers/block/xen-blkfront.c
> > +++ b/drivers/block/xen-blkfront.c
> > @@ -213,6 +213,9 @@ struct blkfront_info
> > unsigned int feature_fua:1;
> > unsigned int feature_discard:1;
> > unsigned int feature_secdiscard:1;
> > +   /* Connect-time cached feature_persistent parameter */
> > +   unsigned int feature_persistent_parm:1;
> > +   /* Persistent grants feature negotiation result */
> > unsigned int feature_persistent:1;
> > unsigned int bounce:1;
> > unsigned int discard_granularity;
> > @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
> > goto abort_transaction;
> > }
> > err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
> > -   info->feature_persistent);
> > +   info->feature_persistent_parm);
> > if (err)
> > dev_warn(>dev,
> >  "writing persistent grants feature to xenbus");
> > @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct 
> > blkfront_info *info)
> > if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
> > blkfront_setup_discard(info);
> >  
> > -   if (feature_persistent)
> > +   info->feature_persistent_parm = feature_persistent;
> 
> Same question as before. Why not just use feature_persistent directly?

Same answer as before, due to the possible race[1].

[1] https://lore.kernel.org/linux-block/20200922111259.GJ19254@Air-de-Roger/

> 
> > +   if (info->feature_persistent_parm)
> > info->feature_persistent =
> > !!xenbus_read_unsigned(info->xbdev->otherend,
> >"feature-persistent", 0);
> 
> Aside: IMO this would look nicer as below:
> 
>   info->feature_persistent = feature_persistent && 
> !!xenbus_read_unsigned();

Agreed, that would also make the code more consistent with the blkback side
code.

I will make the change in the next version of this patchset.


Thanks,
SJ



Re: [PATCH 1/2] xen-blkback: Advertise feature-persistent as user requested

2022-08-31 Thread SeongJae Park
Hi Pratyush,

On Wed, 31 Aug 2022 15:47:50 + Pratyush Yadav  wrote:

> Hi,
> 
> On 25/08/22 04:15PM, SeongJae Park wrote:
> > Commit e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter
> > when connect") made blkback to advertise its support of the persistent
> > grants feature only if the user sets the 'feature_persistent' parameter
> > of the driver and the frontend advertised its support of the feature.
> > However, following commit 402c43ea6b34 ("xen-blkfront: Apply
> > 'feature_persistent' parameter when connect") made the blkfront to work
> > in the same way.  That is, blkfront also advertises its support of the
> > persistent grants feature only if the user sets the 'feature_persistent'
> > parameter of the driver and the backend advertised its support of the
> > feature.
> > 
> > Hence blkback and blkfront will never advertise their support of the
> > feature but wait until the other advertises the support, even though
> > users set the 'feature_persistent' parameters of the drivers.  As a
> > result, the persistent grants feature is disabled always regardless of
> > the 'feature_persistent' values[1].
> > 
> > The problem comes from the misuse of the semantic of the advertisement
> > of the feature.  The advertisement of the feature should means only
> > availability of the feature not the decision for using the feature.
> > However, current behavior is working in the wrong way.
> > 
> > This commit fixes the issue by making the blkback advertises its support
> > of the feature as user requested via 'feature_persistent' parameter
> > regardless of the otherend's support of the feature.
> > 
> > [1] 
> > https://lore.kernel.org/xen-devel/bd818aba-4857-bc07-dc8a-e9b2f8c5f...@suse.com/
> > 
> > Fixes: e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter 
> > when connect")
> > Cc:  # 5.10.x
> > Reported-by: Marek Marczykowski-Górecki 
> > Suggested-by: Juergen Gross 
> > Signed-off-by: SeongJae Park 
> > ---
> >  drivers/block/xen-blkback/common.h | 3 +++
> >  drivers/block/xen-blkback/xenbus.c | 6 --
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/block/xen-blkback/common.h 
> > b/drivers/block/xen-blkback/common.h
> > index bda5c815e441..a28473470e66 100644
> > --- a/drivers/block/xen-blkback/common.h
> > +++ b/drivers/block/xen-blkback/common.h
> > @@ -226,6 +226,9 @@ struct xen_vbd {
> > sector_tsize;
> > unsigned intflush_support:1;
> > unsigned intdiscard_secure:1;
> > +   /* Connect-time cached feature_persistent parameter value */
> > +   unsigned intfeature_gnt_persistent_parm:1;
> > +   /* Persistent grants feature negotiation result */
> > unsigned intfeature_gnt_persistent:1;
> > unsigned intoverflow_max_grants:1;
> >  };
> > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > b/drivers/block/xen-blkback/xenbus.c
> > index ee7ad2fb432d..c0227dfa4688 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -907,7 +907,7 @@ static void connect(struct backend_info *be)
> > xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> >  
> > err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
> > -   be->blkif->vbd.feature_gnt_persistent);
> > +   be->blkif->vbd.feature_gnt_persistent_parm);
> > if (err) {
> > xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
> >  dev->nodename);
> > @@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be)
> > return -ENOSYS;
> > }
> >  
> > -   blkif->vbd.feature_gnt_persistent = feature_persistent &&
> > +   blkif->vbd.feature_gnt_persistent_parm = feature_persistent;
> 
> If feature_gnt_persistent_parm is always going to be equal to 
> feature_persistent, then why introduce it at all? Why not just use 
> feature_persistent directly? This way you avoid adding an extra flag 
> whose purpose is not immediately clear, and you also avoid all the mess 
> with setting this flag at the right time.

Mainly because the parameter would otherwise have to be read twice (once for
the advertisement, and once later, just before the negotiation, to check
whether we advertised it), and the user might change the parameter value
between the two reads.

For the detailed sequence of the possible race, you could refer to the prior
conversation[1].

[1] https://lore.kernel.org/linux-block/20200922111259.GJ19254@Air-de-Roger/
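
A minimal user-space sketch of that race, for illustration only: demo_param
stands in for the writable module parameter, and demo_dev and the demo_*
functions are hypothetical names.  The point of caching is that the value is
sampled once per connection and every later decision uses that one sample.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the writable module parameter (mode 0644 in the driver). */
bool demo_param;

/* Hypothetical per-device state holding the one-time snapshot. */
struct demo_dev {
	unsigned int parm:1;
};

/* Variant A: read the live parameter at each use. */
unsigned int demo_advertise_live(void)
{
	return demo_param;
}

unsigned int demo_negotiate_live(unsigned int otherend)
{
	return demo_param && otherend;
}

/* Variant B: snapshot once, then use only the snapshot. */
unsigned int demo_advertise_cached(struct demo_dev *d)
{
	d->parm = demo_param;
	return d->parm;
}

unsigned int demo_negotiate_cached(const struct demo_dev *d,
				   unsigned int otherend)
{
	return d->parm && otherend;
}

/* Self-check: the user flips the parameter between the two steps. */
void demo_race_check(void)
{
	struct demo_dev d = { 0 };
	unsigned int adv, neg;

	demo_param = false;
	adv = demo_advertise_live();
	demo_param = true;		/* user changes the parameter */
	neg = demo_negotiate_live(1);
	assert(adv == 0 && neg == 1);	/* uses a feature it never advertised */

	demo_param = false;
	adv = demo_advertise_cached(&d);
	demo_param = true;		/* same change, now harmless */
	neg = demo_negotiate_cached(&d, 1);
	assert(adv == 0 && neg == 0);	/* decision matches the advertisement */
}
```

With the live reads, the end can decide to use the feature although it
advertised it as unavailable; with the snapshot, the two steps always agree.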


Thanks,
SJ

> 
> > +   blkif->vbd.feature_gnt_persistent =
> > +   blkif->vbd.feature_gnt_persistent_parm &&
> > xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
> >  
> > blkif->vbd.overflow_max_grants = 0;
> > -- 
> > 2.25.1
> > 
> > 
> 



Re: [PATCH 2/2] xen-blkfront: Advertise feature-persistent as user requested

2022-08-26 Thread SeongJae Park
On Fri, 26 Aug 2022 21:20:39 + SeongJae Park  wrote:

> Hi Max,
> 
> On Fri, 26 Aug 2022 14:26:58 + Maximilian Heyne  wrote:
> 
> > On Thu, Aug 25, 2022 at 04:15:11PM +0000, SeongJae Park wrote:
> > > 
> > > Commit e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter
> > > when connect") made blkback to advertise its support of the persistent
> > > grants feature only if the user sets the 'feature_persistent' parameter
> > > of the driver and the frontend advertised its support of the feature.
> > > However, following commit 402c43ea6b34 ("xen-blkfront: Apply
> > > 'feature_persistent' parameter when connect") made the blkfront to work
> > > in the same way.  That is, blkfront also advertises its support of the
> > > persistent grants feature only if the user sets the 'feature_persistent'
> > > parameter of the driver and the backend advertised its support of the
> > > feature.
> > > 
> > > Hence blkback and blkfront will never advertise their support of the
> > > feature but wait until the other advertises the support, even though
> > > users set the 'feature_persistent' parameters of the drivers.  As a
> > > result, the persistent grants feature is disabled always regardless of
> > > the 'feature_persistent' values[1].
> > > 
> > > The problem comes from the misuse of the semantic of the advertisement
> > > of the feature.  The advertisement of the feature should means only
> > > availability of the feature not the decision for using the feature.
> > > However, current behavior is working in the wrong way.
> > > 
> > > This commit fixes the issue by making the blkfront advertises its
> > > support of the feature as user requested via 'feature_persistent'
> > > parameter regardless of the otherend's support of the feature.
> > > 
> > > [1] 
> > > https://lore.kernel.org/xen-devel/bd818aba-4857-bc07-dc8a-e9b2f8c5f...@suse.com/
> > > 
> > > Fixes: 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter 
> > > when connect")
> > > Cc:  # 5.10.x
> > > Reported-by: Marek Marczykowski-Górecki 
> > > Suggested-by: Juergen Gross 
> > > Signed-off-by: SeongJae Park 
> > > ---
> > >  drivers/block/xen-blkfront.c | 8 ++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> > > index 8e56e69fb4c4..dfae08115450 100644
> > > --- a/drivers/block/xen-blkfront.c
> > > +++ b/drivers/block/xen-blkfront.c
> > > @@ -213,6 +213,9 @@ struct blkfront_info
> > > unsigned int feature_fua:1;
> > > unsigned int feature_discard:1;
> > > unsigned int feature_secdiscard:1;
> > > +   /* Connect-time cached feature_persistent parameter */
> > > +   unsigned int feature_persistent_parm:1;
> > > +   /* Persistent grants feature negotiation result */
> > > unsigned int feature_persistent:1;
> > > unsigned int bounce:1;
> > > unsigned int discard_granularity;
> > > @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device 
> > > *dev,
> > > goto abort_transaction;
> > > }
> > > err = xenbus_printf(xbt, dev->nodename, "feature-persistent", 
> > > "%u",
> > > -   info->feature_persistent);
> > > +   info->feature_persistent_parm);
> > > if (err)
> > > dev_warn(>dev,
> > >  "writing persistent grants feature to xenbus");
> > > @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct 
> > > blkfront_info *info)
> > > if (xenbus_read_unsigned(info->xbdev->otherend, 
> > > "feature-discard", 0))
> > > blkfront_setup_discard(info);
> > > 
> > > -   if (feature_persistent)
> > > +   info->feature_persistent_parm = feature_persistent;
> > 
> > I think setting this here is too late because "feature-persistent" was 
> > already
> > written to xenstore via talk_to_blkback but with default 0. So during the
> > connect blkback will not see that the guest supports the feature and falls 
> > back
> > to no persistent grants.
> > 
> > Tested only this patch with some hacky dom0 kern

> > > +   unsigned int feature_persistent_parm:1;
> > > +   /* Persistent grants feature negotiation result */
> > > unsigned int feature_persistent:1;
> > > unsigned int bounce:1;
> > > unsigned int discard_granularity;
> > > @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device 
> > > *dev,
> > > goto abort_transaction;
> > > }
> > > err = xenbus_printf(xbt, dev->nodename, "feature-persistent", 
> > > "%u",
> > > -   info->feature_persistent);
> > > +   info->feature_persistent_parm);
> > > if (err)
> > > > dev_warn(&dev->dev,
> > >  "writing persistent grants feature to xenbus");
> > > @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct 
> > > blkfront_info *info)
> > > if (xenbus_read_unsigned(info->xbdev->otherend, 
> > > "feature-discard", 0))
> > > blkfront_setup_discard(info);
> > > 
> > > -   if (feature_persistent)
> > > +   info->feature_persistent_parm = feature_persistent;
> > 
> > I think setting this here is too late because "feature-persistent" was 
> > already
> > written to xenstore via talk_to_blkback but with default 0. So during the
> > connect blkback will not see that the guest supports the feature and falls 
> > back
> > to no persistent grants.
> > 
> > Tested only this patch with some hacky d

Re: [PATCH 2/2] xen-blkfront: Advertise feature-persistent as user requested

2022-08-26 Thread SeongJae Park
Hi Max,

On Fri, 26 Aug 2022 14:26:58 + Maximilian Heyne  wrote:

> On Thu, Aug 25, 2022 at 04:15:11PM +0000, SeongJae Park wrote:
> > 
> > Commit e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter
> > when connect") made blkback to advertise its support of the persistent
> > grants feature only if the user sets the 'feature_persistent' parameter
> > of the driver and the frontend advertised its support of the feature.
> > However, following commit 402c43ea6b34 ("xen-blkfront: Apply
> > 'feature_persistent' parameter when connect") made the blkfront to work
> > in the same way.  That is, blkfront also advertises its support of the
> > persistent grants feature only if the user sets the 'feature_persistent'
> > parameter of the driver and the backend advertised its support of the
> > feature.
> > 
> > Hence blkback and blkfront will never advertise their support of the
> > feature but wait until the other advertises the support, even though
> > users set the 'feature_persistent' parameters of the drivers.  As a
> > result, the persistent grants feature is disabled always regardless of
> > the 'feature_persistent' values[1].
> > 
> > The problem comes from the misuse of the semantic of the advertisement
> > of the feature.  The advertisement of the feature should means only
> > availability of the feature not the decision for using the feature.
> > However, current behavior is working in the wrong way.
> > 
> > This commit fixes the issue by making the blkfront advertises its
> > support of the feature as user requested via 'feature_persistent'
> > parameter regardless of the otherend's support of the feature.
> > 
> > [1] 
> > https://lore.kernel.org/xen-devel/bd818aba-4857-bc07-dc8a-e9b2f8c5f...@suse.com/
> > 
> > Fixes: 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter 
> > when connect")
> > Cc:  # 5.10.x
> > Reported-by: Marek Marczykowski-Górecki 
> > Suggested-by: Juergen Gross 
> > Signed-off-by: SeongJae Park 
> > ---
> >  drivers/block/xen-blkfront.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> > index 8e56e69fb4c4..dfae08115450 100644
> > --- a/drivers/block/xen-blkfront.c
> > +++ b/drivers/block/xen-blkfront.c
> > @@ -213,6 +213,9 @@ struct blkfront_info
> > unsigned int feature_fua:1;
> > unsigned int feature_discard:1;
> > unsigned int feature_secdiscard:1;
> > +   /* Connect-time cached feature_persistent parameter */
> > +   unsigned int feature_persistent_parm:1;
> > +   /* Persistent grants feature negotiation result */
> > unsigned int feature_persistent:1;
> > unsigned int bounce:1;
> > unsigned int discard_granularity;
> > @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
> > goto abort_transaction;
> > }
> > err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
> > -   info->feature_persistent);
> > +   info->feature_persistent_parm);
> > if (err)
> > dev_warn(&dev->dev,
> >  "writing persistent grants feature to xenbus");
> > @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct 
> > blkfront_info *info)
> > if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 
> > 0))
> > blkfront_setup_discard(info);
> > 
> > -   if (feature_persistent)
> > +   info->feature_persistent_parm = feature_persistent;
> 
> I think setting this here is too late because "feature-persistent" was already
> written to xenstore via talk_to_blkback but with default 0. So during the
> connect blkback will not see that the guest supports the feature and falls 
> back
> to no persistent grants.
> 
> Tested only this patch with some hacky dom0 kernel that doesn't have the patch
> from your series yet. Will do more testing next week.

Thank you for testing!  And you're right, this patch doesn't fix the issue
completely.  That is, commit 402c43ea6b34 ("xen-blkfront: Apply
'feature_persistent' parameter when connect") introduced two bugs.  One is the
misuse of the semantics of the advertisement, which this patch fixes.  The
second bug, which you found here, is caching the parameter in the wrong place.

In detail, blkfront does the advertisement b

Re: “Backend has not unmapped grant” errors

2022-08-25 Thread SeongJae Park
Hi Juergen,


Thank you for the quick and nice reply!

On Thu, 25 Aug 2022 08:20:33 +0200 Juergen Gross  wrote:

> 
[...]
> 
> Could you please send it as two proper patches (one for each driver) with
> the correct "Fixes:" tags?

Sure, just posted:
https://lore.kernel.org/xen-devel/20220825161511.94922-2...@kernel.org/


Thanks,
SJ

> 
> 
> Juergen



[PATCH 0/2] xen-blk{front,back}: Advertise feature-persistent as user requested

2022-08-25 Thread SeongJae Park
Commit e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter
when connect") made blkback advertise its support of the persistent
grants feature only if the user sets the 'feature_persistent' parameter
of the driver and the frontend advertised its support of the feature.
However, the following commit 402c43ea6b34 ("xen-blkfront: Apply
'feature_persistent' parameter when connect") made blkfront work
in the same way.  That is, blkfront also advertises its support of the
persistent grants feature only if the user sets the 'feature_persistent'
parameter of the driver and the backend advertised its support of the
feature.

Hence blkback and blkfront never advertise their support of the
feature; each waits until the other advertises it first, even though
users set the 'feature_persistent' parameters of the drivers.  As a
result, the persistent grants feature is always disabled regardless of
the 'feature_persistent' values[1].

The problem comes from misusing the semantics of the feature
advertisement.  The advertisement should mean only that the feature is
available, not that the decision to use the feature has been made.
However, the current behavior treats it as the latter.

This patchset fixes the issue by making both blkback and blkfront
advertise their support of the feature as the user requested via the
'feature_persistent' parameter, regardless of the otherend's support of
the feature.
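
To illustrate the difference between advertising and negotiating, below
is a minimal userspace sketch, not driver code.  The 'struct side' type
and the 'handshake()' helper are hypothetical names made up for this
illustration; 'advertised' stands for the value each driver writes to
its "feature-persistent" xenstore node.

```c
#include <stdbool.h>

/* Hypothetical model of one end (frontend or backend) of the protocol. */
struct side {
	bool parm;		/* user's 'feature_persistent' parameter */
	bool advertised;	/* value written to "feature-persistent" */
	bool negotiated;	/* whether persistent grants are really used */
};

/*
 * Model one connection.  With 'fixed' unset, each side advertises the
 * negotiation result (parm && peer's advertisement), as the two "Apply
 * 'feature_persistent' parameter when connect" commits did.  With
 * 'fixed' set, each side advertises its parameter alone and only the
 * negotiation result depends on the peer.
 *
 * Returns true if both sides end up using persistent grants.
 */
static bool handshake(bool front_parm, bool back_parm, bool fixed)
{
	struct side front = { .parm = front_parm };
	struct side back = { .parm = back_parm };

	if (fixed) {
		front.advertised = front.parm;
		back.advertised = back.parm;
	} else {
		/* back.advertised is still false here, so nobody starts */
		front.advertised = front.parm && back.advertised;
		back.advertised = back.parm && front.advertised;
	}

	front.negotiated = front.parm && back.advertised;
	back.negotiated = back.parm && front.advertised;

	return front.negotiated && back.negotiated;
}
```

In this model, 'handshake(true, true, false)' never enables the feature
because each side waits for the other, while 'handshake(true, true,
true)' does, and either side can still turn the feature off by clearing
its parameter.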

[1] 
https://lore.kernel.org/xen-devel/bd818aba-4857-bc07-dc8a-e9b2f8c5f...@suse.com/

SeongJae Park (2):
  xen-blkback: Advertise feature-persistent as user requested
  xen-blkfront: Advertise feature-persistent as user requested

 drivers/block/xen-blkback/common.h | 3 +++
 drivers/block/xen-blkback/xenbus.c | 6 --
 drivers/block/xen-blkfront.c   | 8 ++--
 3 files changed, 13 insertions(+), 4 deletions(-)

-- 
2.25.1




[PATCH 2/2] xen-blkfront: Advertise feature-persistent as user requested

2022-08-25 Thread SeongJae Park
Commit e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter
when connect") made blkback advertise its support of the persistent
grants feature only if the user sets the 'feature_persistent' parameter
of the driver and the frontend advertised its support of the feature.
However, the following commit 402c43ea6b34 ("xen-blkfront: Apply
'feature_persistent' parameter when connect") made blkfront work
in the same way.  That is, blkfront also advertises its support of the
persistent grants feature only if the user sets the 'feature_persistent'
parameter of the driver and the backend advertised its support of the
feature.

Hence blkback and blkfront never advertise their support of the
feature; each waits until the other advertises it first, even though
users set the 'feature_persistent' parameters of the drivers.  As a
result, the persistent grants feature is always disabled regardless of
the 'feature_persistent' values[1].

The problem comes from misusing the semantics of the feature
advertisement.  The advertisement should mean only that the feature is
available, not that the decision to use the feature has been made.
However, the current behavior treats it as the latter.

This commit fixes the issue by making blkfront advertise its
support of the feature as the user requested via the 'feature_persistent'
parameter, regardless of the otherend's support of the feature.

[1] 
https://lore.kernel.org/xen-devel/bd818aba-4857-bc07-dc8a-e9b2f8c5f...@suse.com/

Fixes: 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter when connect")
Cc:  # 5.10.x
Reported-by: Marek Marczykowski-Górecki 
Suggested-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkfront.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 8e56e69fb4c4..dfae08115450 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -213,6 +213,9 @@ struct blkfront_info
 	unsigned int feature_fua:1;
 	unsigned int feature_discard:1;
 	unsigned int feature_secdiscard:1;
+	/* Connect-time cached feature_persistent parameter */
+	unsigned int feature_persistent_parm:1;
+	/* Persistent grants feature negotiation result */
 	unsigned int feature_persistent:1;
 	unsigned int bounce:1;
 	unsigned int discard_granularity;
@@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
 		goto abort_transaction;
 	}
 	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
-			info->feature_persistent);
+			info->feature_persistent_parm);
 	if (err)
 		dev_warn(&dev->dev,
 			 "writing persistent grants feature to xenbus");
@@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
 	if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
 		blkfront_setup_discard(info);
 
-	if (feature_persistent)
+	info->feature_persistent_parm = feature_persistent;
+	if (info->feature_persistent_parm)
 		info->feature_persistent =
 			!!xenbus_read_unsigned(info->xbdev->otherend,
 					       "feature-persistent", 0);
-- 
2.25.1




[PATCH 1/2] xen-blkback: Advertise feature-persistent as user requested

2022-08-25 Thread SeongJae Park
Commit e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter
when connect") made blkback advertise its support of the persistent
grants feature only if the user sets the 'feature_persistent' parameter
of the driver and the frontend advertised its support of the feature.
However, the following commit 402c43ea6b34 ("xen-blkfront: Apply
'feature_persistent' parameter when connect") made blkfront work
in the same way.  That is, blkfront also advertises its support of the
persistent grants feature only if the user sets the 'feature_persistent'
parameter of the driver and the backend advertised its support of the
feature.

Hence blkback and blkfront never advertise their support of the
feature; each waits until the other advertises it first, even though
users set the 'feature_persistent' parameters of the drivers.  As a
result, the persistent grants feature is always disabled regardless of
the 'feature_persistent' values[1].

The problem comes from misusing the semantics of the feature
advertisement.  The advertisement should mean only that the feature is
available, not that the decision to use the feature has been made.
However, the current behavior treats it as the latter.

This commit fixes the issue by making blkback advertise its support
of the feature as the user requested via the 'feature_persistent'
parameter, regardless of the otherend's support of the feature.

[1] 
https://lore.kernel.org/xen-devel/bd818aba-4857-bc07-dc8a-e9b2f8c5f...@suse.com/

Fixes: e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter when connect")
Cc:  # 5.10.x
Reported-by: Marek Marczykowski-Górecki 
Suggested-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/common.h | 3 +++
 drivers/block/xen-blkback/xenbus.c | 6 --
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index bda5c815e441..a28473470e66 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -226,6 +226,9 @@ struct xen_vbd {
 	sector_t		size;
 	unsigned int		flush_support:1;
 	unsigned int		discard_secure:1;
+	/* Connect-time cached feature_persistent parameter value */
+	unsigned int		feature_gnt_persistent_parm:1;
+	/* Persistent grants feature negotiation result */
 	unsigned int		feature_gnt_persistent:1;
 	unsigned int		overflow_max_grants:1;
 };
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index ee7ad2fb432d..c0227dfa4688 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -907,7 +907,7 @@ static void connect(struct backend_info *be)
 	xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
 
 	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
-			be->blkif->vbd.feature_gnt_persistent);
+			be->blkif->vbd.feature_gnt_persistent_parm);
 	if (err) {
 		xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
 				 dev->nodename);
@@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be)
 		return -ENOSYS;
 	}
 
-	blkif->vbd.feature_gnt_persistent = feature_persistent &&
+	blkif->vbd.feature_gnt_persistent_parm = feature_persistent;
+	blkif->vbd.feature_gnt_persistent =
+		blkif->vbd.feature_gnt_persistent_parm &&
 		xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
 
 	blkif->vbd.overflow_max_grants = 0;
-- 
2.25.1




Re: “Backend has not unmapped grant” errors

2022-08-24 Thread SeongJae Park
+ Roger

On Wed, 24 Aug 2022 17:44:42 + SeongJae Park  wrote:

> Hello,
> 
> On Wed, 24 Aug 2022 08:02:40 +0200 Juergen Gross  wrote:
> 
> > 
> > [-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --]
> > 
> > On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> > > FWIW, I hit this issue twice already in this week CI run, while it never
> > > happened before. The difference compared to previous run is Linux
> > > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The
> > > only related commits I see there are three commits indeed related to
> > > persistent grants:
> > > 
> > >c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when 
> > > connect
> > >ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when 
> > > connect
> > >7304be4c985d xen-blkback: fix persistent grants negotiation
> > > 
> > > But none of the commit messages suggests intentional disabling it
> > > without explicit request for doing so. I did not requested disabling it
> > > in toolstack (although I have set backend as "trusted" - XSA-403).
> > > I have confirmed it's the frontend version that matters. Running older
> > > frontend kernel with 5.15.61 backend results in persistent grants
> > > enabled (and both frontend and backend xenstore "feature-persistent"
> > > entries are "1" in this case).
> > 
> > This is a mess.
> > 
> > I think the main problem seems to be that the feature negotiation process
> > isn't specified in a sane way.
> > 
> >  From the blkif.h header:
> > 
> > Backend-side:
> >   * feature-persistent
> >   *  Values: 0/1 (boolean)
> >   *  Default Value:  0
> >   *  Notes: 7
> >   *
> >   *  A value of "1" indicates that the backend can keep the grants used
> >   *  by the frontend driver mapped, so the same set of grants should be
> >   *  used in all transactions. The maximum number of grants the backend
> >   *  can map persistently depends on the implementation, but ideally it
> >   *  should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this
> >   *  feature the backend doesn't need to unmap each grant, preventing
> >   *  costly TLB flushes. The backend driver should only map grants
> >   *  persistently if the frontend supports it. If a backend driver 
> > chooses
> >   *  to use the persistent protocol when the frontend doesn't support 
> > it,
> >   *  it will probably hit the maximum number of persistently mapped 
> > grants
> >   *  (due to the fact that the frontend won't be reusing the same 
> > grants),
> >   *  and fall back to non-persistent mode. Backend implementations may
> >   *  shrink or expand the number of persistently mapped grants without
> >   *  notifying the frontend depending on memory constraints (this might
> >   *  cause a performance degradation).
> > 
> > Frontend-side:
> >   * feature-persistent
> >   *  Values: 0/1 (boolean)
> >   *  Default Value:  0
> >   *  Notes: 7, 8, 9
> >   *
> >   *  A value of "1" indicates that the frontend will reuse the same 
> > grants
> >   *  for all transactions, allowing the backend to map them with write
> >   *  access (even when it should be read-only). If the frontend hits the
> >   *  maximum number of allowed persistently mapped grants, it can 
> > fallback
> >   *  to non persistent mode. This will cause a performance degradation,
> >   *  since the the backend driver will still try to map those grants
> >   *  persistently. Since the persistent grants protocol is compatible 
> > with
> >   *  the previous protocol, a frontend driver can choose to work in
> >   *  persistent mode even when the backend doesn't support it.
> > 
> > Those definitions don't make clear, which side is the one to decide whether
> > the feature should be used or not. In my understanding the related drivers
> > should just advertise their setting (the _ability_ to use the feature), and
> > it should be used only if both sides have written a "1".
> > 
> > With above patches applied, the frontend will set 'feature-persistent' in
> > Xenstore only, if the backend has done so, but the backend will set it
> > only, if the frontend has done it. This results in persistent grants
> > always being disabled.
> 
> Sorry for making the

Re: “Backend has not unmapped grant” errors

2022-08-24 Thread SeongJae Park
Hello,

On Wed, 24 Aug 2022 08:02:40 +0200 Juergen Gross  wrote:

> 
> [-- Attachment #1.1.1: Type: text/plain, Size: 4312 bytes --]
> 
> On 24.08.22 02:20, Marek Marczykowski-Górecki wrote:
> > FWIW, I hit this issue twice already in this week CI run, while it never
> > happened before. The difference compared to previous run is Linux
> > 5.15.57 vs 5.15.61. The latter reports persistent grants disabled. The
> > only related commits I see there are three commits indeed related to
> > persistent grants:
> > 
> >c98e956ef489 xen-blkfront: Apply 'feature_persistent' parameter when 
> > connect
> >ef26b5d530d4 xen-blkback: Apply 'feature_persistent' parameter when 
> > connect
> >7304be4c985d xen-blkback: fix persistent grants negotiation
> > 
> > But none of the commit messages suggests intentional disabling it
> > without explicit request for doing so. I did not requested disabling it
> > in toolstack (although I have set backend as "trusted" - XSA-403).
> > I have confirmed it's the frontend version that matters. Running older
> > frontend kernel with 5.15.61 backend results in persistent grants
> > enabled (and both frontend and backend xenstore "feature-persistent"
> > entries are "1" in this case).
> 
> This is a mess.
> 
> I think the main problem seems to be that the feature negotiation process
> isn't specified in a sane way.
> 
>  From the blkif.h header:
> 
> Backend-side:
>   * feature-persistent
>   *  Values: 0/1 (boolean)
>   *  Default Value:  0
>   *  Notes: 7
>   *
>   *  A value of "1" indicates that the backend can keep the grants used
>   *  by the frontend driver mapped, so the same set of grants should be
>   *  used in all transactions. The maximum number of grants the backend
>   *  can map persistently depends on the implementation, but ideally it
>   *  should be RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST. Using this
>   *  feature the backend doesn't need to unmap each grant, preventing
>   *  costly TLB flushes. The backend driver should only map grants
>   *  persistently if the frontend supports it. If a backend driver chooses
>   *  to use the persistent protocol when the frontend doesn't support it,
>   *  it will probably hit the maximum number of persistently mapped grants
>   *  (due to the fact that the frontend won't be reusing the same grants),
>   *  and fall back to non-persistent mode. Backend implementations may
>   *  shrink or expand the number of persistently mapped grants without
>   *  notifying the frontend depending on memory constraints (this might
>   *  cause a performance degradation).
> 
> Frontend-side:
>   * feature-persistent
>   *  Values: 0/1 (boolean)
>   *  Default Value:  0
>   *  Notes: 7, 8, 9
>   *
>   *  A value of "1" indicates that the frontend will reuse the same grants
>   *  for all transactions, allowing the backend to map them with write
>   *  access (even when it should be read-only). If the frontend hits the
>   *  maximum number of allowed persistently mapped grants, it can fallback
>   *  to non persistent mode. This will cause a performance degradation,
>   *  since the the backend driver will still try to map those grants
>   *  persistently. Since the persistent grants protocol is compatible with
>   *  the previous protocol, a frontend driver can choose to work in
>   *  persistent mode even when the backend doesn't support it.
> 
> Those definitions don't make clear, which side is the one to decide whether
> the feature should be used or not. In my understanding the related drivers
> should just advertise their setting (the _ability_ to use the feature), and
> it should be used only if both sides have written a "1".
> 
> With above patches applied, the frontend will set 'feature-persistent' in
> Xenstore only, if the backend has done so, but the backend will set it
> only, if the frontend has done it. This results in persistent grants
> always being disabled.

Sorry for making the mess, and thank you for the kind report and detailed
explanation of the problem.

> 
> This is wrong, as the value written should not reflect the current state
> of the interface. That state should be set according to both sides' value,
> probably a cached one on the blkback side (using a new flag for caching it,
> not the current state).

Agreed.  So, I think the issue comes from the fact that we are using one field,
which was meant to hold only the negotiation result, for yet another
purpose: caching the parameter value.  As a result, the advertisement, which
should follow only the parameter value, becomes inconsistent.

How about simply adding another field for the caching purpose, so that the
advertisement can be done regardless of the negotiation?  For example:

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index bda5c815e441..a28473470e66 100644
--- 

[PATCH v4 0/3] xen-blk{back,front}: Fix two bugs in 'feature_persistent'

2022-07-15 Thread SeongJae Park
The introduction of the 'feature_persistent' parameter caused two bugs.
The first one is a wrong overwrite of 'vbd->feature_gnt_persistent' in
'blkback', caused by caching the parameter value at the wrong place.  The
second one is an unintended behavioral change that could break setups
where the frontend's or backend's persistent grants support changes
dynamically.  This patchset fixes both issues.

Changes from v3
(https://lore.kernel.org/xen-devel/20220715175521.126649-1...@kernel.org/)
- Split 'blkback' patch for each of the two issues
- Add 'Reported-by: Andrii Chepurnyi '

Changes from v2
(https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/)
- Keep the behavioral change of v1
- Update blkfront's counterpart to follow the changed behavior
- Update documents for the changed behavior

Changes from v1
(https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/)
- Avoid the behavioral change
  (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/)
- Rebase on latest xen/tip/linux-next
- Re-work by SeongJae Park 
- Cc stable@

Maximilian Heyne (1):
  xen-blkback: Apply 'feature_persistent' parameter when connect

SeongJae Park (2):
  xen-blkback: fix persistent grants negotiation
  xen-blkfront: Apply 'feature_persistent' parameter when connect

 .../ABI/testing/sysfs-driver-xen-blkback  |  2 +-
 .../ABI/testing/sysfs-driver-xen-blkfront |  2 +-
 drivers/block/xen-blkback/xenbus.c| 20 ---
 drivers/block/xen-blkfront.c  |  4 +---
 4 files changed, 11 insertions(+), 17 deletions(-)

-- 
2.25.1




[PATCH v4 3/3] xen-blkfront: Apply 'feature_persistent' parameter when connect

2022-07-15 Thread SeongJae Park
In some use cases[1], the backend is created while the frontend doesn't
support the persistent grants feature, but later the frontend can be
changed to support the feature and reconnect.  In the past, 'blkback'
enabled the persistent grants feature in this case, since it
unconditionally checked whether the frontend supports the feature on
every connect ('connect_ring()') and decided whether to use persistent
grants accordingly.

However, commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") mistakenly changed the behavior.
It made the frontend feature support check not be repeated once
'feature_persistent' was seen as 'false' or the frontend was seen
not to support persistent grants.

A similar behavioral change was made to 'blkfront' by commit 74a852479c68
("xen-blkfront: add a parameter for disabling of persistent grants").
This commit makes the parameter take effect on every connect, so that
the previous behavior of 'blkfront' is restored.
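
The behavioral difference can be sketched in userspace as below.  This
is only a simplified model under assumptions, not the driver code:
'struct front_model' and all function names are hypothetical, and
'backend_supports' stands for the backend's "feature-persistent"
xenstore value read at connect time.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bit of 'struct blkfront_info'. */
struct front_model {
	bool feature_persistent;
};

/* Old behavior: the parameter is cached once, at probe time. */
static void old_probe(struct front_model *f, bool parm)
{
	f->feature_persistent = parm;
}

/* Old behavior: the check is skipped once the field became false. */
static void old_connect(struct front_model *f, bool backend_supports)
{
	if (f->feature_persistent)
		f->feature_persistent = backend_supports;
}

/* New behavior: the parameter itself is consulted on every connect. */
static void new_connect(struct front_model *f, bool parm,
			bool backend_supports)
{
	if (parm)
		f->feature_persistent = backend_supports;
}

/* Backend lacks support on first connect, gains it before reconnect. */
static bool old_reconnect_result(void)
{
	struct front_model f = { .feature_persistent = false };

	old_probe(&f, true);
	old_connect(&f, false);
	old_connect(&f, true);	/* skipped: field is already false */
	return f.feature_persistent;
}

static bool new_reconnect_result(void)
{
	struct front_model f = { .feature_persistent = false };

	new_connect(&f, true, false);
	new_connect(&f, true, true);	/* re-evaluated on reconnect */
	return f.feature_persistent;
}
```

In this model the old flow stays without persistent grants after the
reconnect, while the new flow picks up the backend's newly advertised
support.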

[1] 
https://lore.kernel.org/xen-devel/CAJwUmVB6H3iTs-C+U=v-pwJB7-_ZRHPxHzKRJZ22xEPW7z8a=g...@mail.gmail.com/

Fixes: 74a852479c68 ("xen-blkfront: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkfront.c| 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 7f646c58832e..4d36c5a10546 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -15,5 +15,5 @@ KernelVersion:  5.10
 Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
-that this option only takes effect on newly created frontends.
+that this option only takes effect on newly connected frontends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 3646c0cae672..4e763701b372 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1988,8 +1988,6 @@ static int blkfront_probe(struct xenbus_device *dev,
 	info->vdevice = vdevice;
 	info->connected = BLKIF_STATE_DISCONNECTED;
 
-	info->feature_persistent = feature_persistent;
-
 	/* Front end dir is a number, which is used as the id. */
 	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
 	dev_set_drvdata(&dev->dev, info);
@@ -2283,7 +2281,7 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
 	if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
 		blkfront_setup_discard(info);
 
-	if (info->feature_persistent)
+	if (feature_persistent)
 		info->feature_persistent =
 			!!xenbus_read_unsigned(info->xbdev->otherend,
 					       "feature-persistent", 0);
-- 
2.25.1




[PATCH v4 1/3] xen-blkback: fix persistent grants negotiation

2022-07-15 Thread SeongJae Park
The persistent grants feature can be used only when both the backend and
the frontend support it.  The feature was always supported by
'blkback', but commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") introduced a parameter for
disabling it at runtime.

To keep the parameter from being updated while 'blkback' is using it,
the commit caches the parameter into 'vbd->feature_gnt_persistent' in
'xen_vbd_create()', then checks whether the guest also supports the
feature, and finally updates the field in 'connect_ring()'.

However, 'connect_ring()' could be called before 'xen_vbd_create()', so
a later execution of 'xen_vbd_create()' can wrongly overwrite
'vbd->feature_gnt_persistent' with 'true'.  As a result, 'blkback' could
try to use the persistent grants feature even if the guest doesn't
support it.

This commit fixes the issue by moving the parameter value caching to
'xen_blkif_alloc()', which allocates the 'blkif'.  Because that struct
embeds the 'vbd' object, which 'connect_ring()' uses later,
'xen_blkif_alloc()' is always called before 'connect_ring()' and is
therefore the right and safe place to do the caching.
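
The ordering problem can be modeled in userspace as below.  This is a
simplified sketch under assumptions rather than the driver code: the
'model_*' functions and 'struct vbd_model' are hypothetical stand-ins
for 'xen_blkif_alloc()', 'xen_vbd_create()', 'connect_ring()' and
'struct xen_vbd', and the parameter is assumed to keep its default
value ('true').

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bit of 'struct xen_vbd'. */
struct vbd_model {
	bool feature_gnt_persistent;
};

/* Models the module parameter with its default value. */
static const bool feature_persistent_param = true;

/* Old caching place: runs whenever the vbd is created. */
static void model_vbd_create_old(struct vbd_model *vbd)
{
	vbd->feature_gnt_persistent = feature_persistent_param;
}

/* New caching place: runs once, when the blkif is allocated. */
static void model_blkif_alloc(struct vbd_model *vbd)
{
	vbd->feature_gnt_persistent = feature_persistent_param;
}

/* Negotiation: keep the feature only if the frontend supports it. */
static void model_connect_ring(struct vbd_model *vbd, bool frontend_supports)
{
	if (vbd->feature_gnt_persistent)
		vbd->feature_gnt_persistent = frontend_supports;
}

/* Old flow where connect_ring() happens to run before vbd creation. */
static bool old_order_result(bool frontend_supports)
{
	struct vbd_model vbd = { .feature_gnt_persistent = false };

	model_connect_ring(&vbd, frontend_supports);
	model_vbd_create_old(&vbd);	/* overwrites the negotiation result */
	return vbd.feature_gnt_persistent;
}

/* New flow: caching at allocation always precedes connect_ring(). */
static bool new_order_result(bool frontend_supports)
{
	struct vbd_model vbd = { .feature_gnt_persistent = false };

	model_blkif_alloc(&vbd);
	model_connect_ring(&vbd, frontend_supports);
	return vbd.feature_gnt_persistent;
}
```

In this model, 'old_order_result(false)' ends up with the feature
enabled although the frontend doesn't support it, while
'new_order_result()' simply follows the frontend's support.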

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Signed-off-by: Maximilian Heyne 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 97de13b14175..16c6785d260c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -157,6 +157,11 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
return 0;
 }
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent, "Enables the persistent grants feature");
+
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
struct xen_blkif *blkif;
@@ -181,6 +186,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
__module_get(THIS_MODULE);
INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
+   blkif->vbd.feature_gnt_persistent = feature_persistent;
+
return blkif;
 }
 
@@ -472,12 +479,6 @@ static void xen_vbd_free(struct xen_vbd *vbd)
vbd->bdev = NULL;
 }
 
-/* Enable the persistent grants feature. */
-static bool feature_persistent = true;
-module_param(feature_persistent, bool, 0644);
-MODULE_PARM_DESC(feature_persistent,
-   "Enables the persistent grants feature");
-
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
  unsigned major, unsigned minor, int readonly,
  int cdrom)
@@ -520,8 +521,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
if (bdev_max_secure_erase_sectors(bdev))
vbd->discard_secure = true;
 
-   vbd->feature_gnt_persistent = feature_persistent;
-
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
-- 
2.25.1
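
The ordering problem described in the commit message above can be modelled in a
few lines of standalone C.  This is only a sketch under assumptions: the
`*_model` types and `model_*` functions are hypothetical stand-ins, not the
real blkback code, and the zero-initialised struct stands in for a freshly
allocated 'blkif'.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the blkback structures. */
struct vbd_model { bool feature_gnt_persistent; };
struct blkif_model { struct vbd_model vbd; };

static bool feature_persistent = true;	/* models the module parameter */

/* Placement after this patch: cache the parameter at blkif allocation. */
static void model_blkif_alloc(struct blkif_model *blkif)
{
	blkif->vbd.feature_gnt_persistent = feature_persistent;
}

/* Placement before this patch: cache the parameter at vbd creation. */
static void model_vbd_create_old(struct blkif_model *blkif)
{
	blkif->vbd.feature_gnt_persistent = feature_persistent;
}

/* Negotiation: keep the feature only if the frontend advertises it too. */
static void model_connect_ring(struct blkif_model *blkif, bool frontend_supports)
{
	if (blkif->vbd.feature_gnt_persistent)
		blkif->vbd.feature_gnt_persistent = frontend_supports;
}

/* Problematic order: connect_ring() first, then the old xen_vbd_create(). */
bool buggy_order(bool frontend_supports)
{
	struct blkif_model blkif = { { false } };

	model_connect_ring(&blkif, frontend_supports);
	model_vbd_create_old(&blkif);	/* overwrites the negotiated value */
	return blkif.vbd.feature_gnt_persistent;
}

/* Order with this patch: allocation always precedes connect_ring(). */
bool fixed_order(bool frontend_supports)
{
	struct blkif_model blkif = { { false } };

	model_blkif_alloc(&blkif);
	model_connect_ring(&blkif, frontend_supports);
	return blkif.vbd.feature_gnt_persistent;
}

/* Self-check of the model; call it from main() or a test harness. */
void check_ordering_model(void)
{
	assert(buggy_order(false));	/* feature wrongly left enabled */
	assert(!fixed_order(false));	/* frontend lacks support: disabled */
	assert(fixed_order(true));	/* both sides support it: enabled */
}
```

With a frontend that does not support persistent grants, `buggy_order(false)`
ends with the flag set, while `fixed_order(false)` leaves it clear.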




[PATCH v4 2/3] xen-blkback: Apply 'feature_persistent' parameter when connect

2022-07-15 Thread SeongJae Park
From: Maximilian Heyne 

In some use cases[1], the backend is created while the frontend doesn't
support the persistent grants feature, but later the frontend can be
changed to support the feature and reconnect.  In the past, 'blkback'
enabled the persistent grants feature in that case, since it
unconditionally checked whether the frontend supports persistent grants
on every connect ('connect_ring()') and decided accordingly whether to
use persistent grants.

However, commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") mistakenly changed the behavior.
It made the frontend feature support check not be repeated once
'feature_persistent' was seen as 'false' or the frontend was seen not to
support persistent grants.

This commit makes the parameter take effect on every connect, so that
the previous workflow works again as expected.

[1] https://lore.kernel.org/xen-devel/CAJwUmVB6H3iTs-C+U=v-pwJB7-_ZRHPxHzKRJZ22xEPW7z8a=g...@mail.gmail.com/

Reported-by: Andrii Chepurnyi 
Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Signed-off-by: Maximilian Heyne 
Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkback | 2 +-
 drivers/block/xen-blkback/xenbus.c | 9 +++--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 7faf719af165..fac0f429a869 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -42,5 +42,5 @@ KernelVersion:  5.10
 Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
-that this option only takes effect on newly created backends.
+that this option only takes effect on newly connected backends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 16c6785d260c..ee7ad2fb432d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -186,8 +186,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
__module_get(THIS_MODULE);
INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
-   blkif->vbd.feature_gnt_persistent = feature_persistent;
-
return blkif;
 }
 
@@ -1086,10 +1084,9 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   if (blkif->vbd.feature_gnt_persistent)
-   blkif->vbd.feature_gnt_persistent =
-   xenbus_read_unsigned(dev->otherend,
-   "feature-persistent", 0);
+
+   blkif->vbd.feature_gnt_persistent = feature_persistent &&
+   xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
 
blkif->vbd.overflow_max_grants = 0;
 
-- 
2.25.1
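
The difference between the sticky check and the per-connect check described in
the commit message above can be sketched in standalone C.  The function names
here are hypothetical and only model the two conditions shown in the diff; they
are not part of blkback.  The scenario is the one from the report: a first
frontend without persistent grants support, followed by a reconnect from a
frontend that supports it.

```c
#include <assert.h>
#include <stdbool.h>

static bool feature_persistent = true;	/* models the module parameter */

/*
 * Check as it stood since aac8a70db24b: the cached flag is only
 * re-evaluated while it is still true, so once it drops to false it
 * stays false for later connects.
 */
static bool connect_sticky(bool cached, bool frontend_supports)
{
	if (cached)
		cached = frontend_supports;
	return cached;
}

/* Check with this patch: both inputs are read again on every connect. */
static bool connect_fresh(bool frontend_supports)
{
	return feature_persistent && frontend_supports;
}

/* First frontend lacks support, second frontend has it (sticky check). */
bool sticky_after_reconnect(void)
{
	bool flag = feature_persistent;		/* cached before connecting */

	flag = connect_sticky(flag, false);	/* first frontend */
	flag = connect_sticky(flag, true);	/* reconnecting frontend */
	return flag;
}

/* Same two connects with the per-connect check. */
bool fresh_after_reconnect(void)
{
	bool flag;

	flag = connect_fresh(false);		/* first frontend */
	flag = connect_fresh(true);		/* reconnecting frontend */
	return flag;
}

/* Self-check of the model; call it from main() or a test harness. */
void check_reconnect_model(void)
{
	assert(!sticky_after_reconnect());	/* feature stays off */
	assert(fresh_after_reconnect());	/* feature is enabled again */
}
```

In this model the sticky variant never re-enables the feature after the first
connect turned it off, which matches the broken workflow, while the per-connect
variant enables it as soon as a supporting frontend connects.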




Re: [PATCH v3 0/2] Fix persistent grants negotiation with a behavior change

2022-07-15 Thread SeongJae Park
Hi all,

On Fri, 15 Jul 2022 18:12:26 + SeongJae Park  wrote:

> Hi all,
> 
> On Fri, 15 Jul 2022 17:55:19 + SeongJae Park  wrote:
> 
> > The first patch of this patchset fixes 'feature_persistent' parameter
> > handling in 'blkback' to respect the frontend's persistent grants
> > support always.  The fix makes a behavioral change, so the second patch
> > makes the counterpart of 'blkfront' to consistently follow the behavior
> > change.
> 
> I made the behavior change as requested by Andrii[1].  I therefore made similar
> behavior change to blkfront and Cc-ed stable for the second change, too.

Now I realize that commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") introduced two issues.  One is what Max
reported with his patch, and the second is an unintended behavioral change
that broke Andrii's use case.

That is, Andrii's use case should have had no problem at all before the
introduction of 'feature_persistent', as at that time 'blkback' checked whether
the frontend supports persistent grants for every 'reconnect()' and enabled
them if so.  However, the introduction of the parameter made it behave
differently.

Yes, we intended the parameter to take effect on newly created devices.
But, as it breaks user workflows, this should be fixed.  The same goes for the
'blkfront' side 'feature_persistent'.

> 
> To make the change history clear and reduce the stable side overhead, however,
> it might be better to apply the v2, which don't make behavior change but only
> fix the issue, Cc stable@ for it, make the behavior change commits for both
> blkback and blkfront, update the documents, and don't Cc stable@ for the
> behavior change and documents update commits.

I'd say having one patch for each issue would be the right way to go, and all
fixes should Cc stable@.

> 
> One downside of that would be that it will make a behavioral difference in
> pre-5.19.x and post-5.19.x.

Fixing the unintended behavioral change should also be considered a fix, and
therefore should be merged into stable@, so the above concern is not valid.

I will send the next spin soon.


Thanks,
SJ

[...]



Re: [PATCH v3 0/2] Fix persistent grants negotiation with a behavior change

2022-07-15 Thread SeongJae Park
Hi all,

On Fri, 15 Jul 2022 17:55:19 + SeongJae Park  wrote:

> The first patch of this patchset fixes 'feature_persistent' parameter
> handling in 'blkback' to respect the frontend's persistent grants
> support always.  The fix makes a behavioral change, so the second patch
> makes the counterpart of 'blkfront' to consistently follow the behavior
> change.

I made the behavior change as requested by Andrii[1].  I therefore made a similar
behavior change to blkfront and Cc-ed stable for the second change, too.

To make the change history clear and reduce the stable side overhead, however,
it might be better to apply v2, which doesn't make the behavior change but only
fixes the issue, Cc stable@ for it, make the behavior change commits for both
blkback and blkfront, update the documents, and not Cc stable@ for the
behavior change and document update commits.

One downside of that would be that it will make a behavioral difference between
pre-5.19.x and post-5.19.x.

I think both downsides are not critical, so I posted this patchset in this shape.
If anyone prefers some changes, please let me know, though.

[1] https://lore.kernel.org/xen-devel/CAJwUmVB6H3iTs-C+U=v-pwJB7-_ZRHPxHzKRJZ22xEPW7z8a=g...@mail.gmail.com/


Thanks,
SJ

> 
> Changes from v2
> (https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/)
> - Keep the behavioral change of v1
> - Update blkfront's counterpart to follow the changed behavior
> - Update documents for the changed behavior
> 
> Changes from v1
> (https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/)
> - Avoid the behavioral change
>   (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/)
> - Rebase on latest xen/tip/linux-next
> - Re-work by SeongJae Park 
> - Cc stable@
> 
> 
> 
> Maximilian Heyne (1):
>   xen, blkback: fix persistent grants negotiation
> 
> SeongJae Park (1):
>   xen-blkfront: Apply 'feature_persistent' parameter when connect
> 
>  Documentation/ABI/testing/sysfs-driver-xen-blkback  | 2 +-
>  Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
>  drivers/block/xen-blkback/xenbus.c  | 9 +++--
>  drivers/block/xen-blkfront.c| 4 +---
>  4 files changed, 6 insertions(+), 11 deletions(-)
> 
> -- 
> 2.25.1



[PATCH 1/2] xen, blkback: fix persistent grants negotiation

2022-07-15 Thread SeongJae Park
From: Maximilian Heyne 

Suppose dom0 supports persistent grants but the guest does not.
Then, when attaching a block device during runtime of the guest, dom0
will enable persistent grants for this newly attached block device:

  $ xenstore-ls -f | grep 20674 | grep persistent
  /local/domain/0/backend/vbd/20674/768/feature-persistent = "0"
  /local/domain/0/backend/vbd/20674/51792/feature-persistent = "1"

Here disk 768 was attached during guest creation while 51792 was
attached at runtime. If the guest had advertised the persistent
grant feature, there would be a xenstore entry like:

  /local/domain/20674/device/vbd/51792/feature-persistent = "1"

Persistent grants are also used when the guest tries to access the disk,
which can be seen when enabling log stats:

  $ echo 1 > /sys/module/xen_blkback/parameters/log_stats
  $ dmesg
  xen-blkback: (20674.xvdf-0): oo   0  |  rd0  |  wr0  |  f0 |  ds0 | pg:1/1056

The "pg: 1/1056" shows that one persistent grant is used.

Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling
of persistent grants") vbd->feature_gnt_persistent was set in
connect_ring. After the commit it was intended to be initialized in
xen_vbd_create and then set according to the guest feature availability
in connect_ring. However, with a running guest, connect_ring might be
called before xen_vbd_create and vbd->feature_gnt_persistent will be
incorrectly initialized. xen_vbd_create will overwrite it with the value
of feature_persistent regardless of whether the guest actually supports
persistent grants.

With this commit, vbd->feature_gnt_persistent is set only in
connect_ring and this is the only use of the module parameter
feature_persistent. This avoids races when the module parameter changes
during the block attachment process.

Note that vbd->feature_gnt_persistent doesn't need to be initialized in
xen_vbd_create. Its next use is in connect, which can only be called
once connect_ring has initialized the rings. xen_update_blkif_status
checks for this.

Please also note that this commit makes a behavioral change to the
parameter.  That is, before this commit the parameter took effect only
on newly created backends, but after this commit it takes effect on
every new connection.  Therefore, this commit also updates the
document.

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Signed-off-by: Maximilian Heyne 
Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkback | 2 +-
 drivers/block/xen-blkback/xenbus.c | 9 +++--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 7faf719af165..fac0f429a869 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -42,5 +42,5 @@ KernelVersion:  5.10
 Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
-that this option only takes effect on newly created backends.
+that this option only takes effect on newly connected backends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 97de13b14175..874b846fb622 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -520,8 +520,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
if (bdev_max_secure_erase_sectors(bdev))
vbd->discard_secure = true;
 
-   vbd->feature_gnt_persistent = feature_persistent;
-
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
@@ -1087,10 +1085,9 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   if (blkif->vbd.feature_gnt_persistent)
-   blkif->vbd.feature_gnt_persistent =
-   xenbus_read_unsigned(dev->otherend,
-   "feature-persistent", 0);
+
+   blkif->vbd.feature_gnt_persistent = feature_persistent &&
+   xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
 
blkif->vbd.overflow_max_grants = 0;
 
-- 
2.25.1




[PATCH v3 0/2] Fix persistent grants negotiation with a behavior change

2022-07-15 Thread SeongJae Park
The first patch of this patchset fixes the 'feature_persistent' parameter
handling in 'blkback' so that it always respects the frontend's persistent
grants support.  The fix makes a behavioral change, so the second patch
makes the 'blkfront' counterpart follow the same behavior change.

Changes from v2
(https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/)
- Keep the behavioral change of v1
- Update blkfront's counterpart to follow the changed behavior
- Update documents for the changed behavior

Changes from v1
(https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/)
- Avoid the behavioral change
  (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/)
- Rebase on latest xen/tip/linux-next
- Re-work by SeongJae Park 
- Cc stable@



Maximilian Heyne (1):
  xen, blkback: fix persistent grants negotiation

SeongJae Park (1):
  xen-blkfront: Apply 'feature_persistent' parameter when connect

 Documentation/ABI/testing/sysfs-driver-xen-blkback  | 2 +-
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkback/xenbus.c  | 9 +++--
 drivers/block/xen-blkfront.c| 4 +---
 4 files changed, 6 insertions(+), 11 deletions(-)

-- 
2.25.1




[PATCH 2/2] xen-blkfront: Apply 'feature_persistent' parameter when connect

2022-07-15 Thread SeongJae Park
The previous commit made xen-blkback's 'feature_persistent' parameter
take effect not only on newly created backends but also on every
reconnected backend.  This commit makes xen-blkfront's counterpart
parameter work in the same manner and updates the document to avoid any
confusion due to inconsistent behavior of same-named parameters.

Cc:  # 5.10.x
Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkfront.c| 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 7f646c58832e..4d36c5a10546 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -15,5 +15,5 @@ KernelVersion:  5.10
 Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
-that this option only takes effect on newly created frontends.
+that this option only takes effect on newly connected frontends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 3646c0cae672..4e763701b372 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1988,8 +1988,6 @@ static int blkfront_probe(struct xenbus_device *dev,
info->vdevice = vdevice;
info->connected = BLKIF_STATE_DISCONNECTED;
 
-   info->feature_persistent = feature_persistent;
-
/* Front end dir is a number, which is used as the id. */
info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
dev_set_drvdata(&dev->dev, info);
@@ -2283,7 +2281,7 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
blkfront_setup_discard(info);
 
-   if (info->feature_persistent)
+   if (feature_persistent)
info->feature_persistent =
!!xenbus_read_unsigned(info->xbdev->otherend,
   "feature-persistent", 0);
-- 
2.25.1
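
The frontend-side check changed in the diff above can be modelled in standalone
C as follows.  This is a sketch under assumptions: `frontend_model`,
`gather_backend_features()` and `negotiate()` are hypothetical names, the
one-bit field is an assumption of the model, and the `backend_value` argument
stands in for the value read from the backend's "feature-persistent" xenstore
node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device frontend state. */
struct frontend_model {
	unsigned int feature_persistent:1;	/* assumed one-bit flag */
};

static bool feature_persistent = true;	/* models the module parameter */

/*
 * Models the patched check: the module parameter is consulted at
 * connect time, and the backend's value is normalised to 0 or 1
 * with '!!' before it is stored in the one-bit field.
 */
static void gather_backend_features(struct frontend_model *info,
				    unsigned int backend_value)
{
	if (feature_persistent)
		info->feature_persistent = !!backend_value;
}

/* Run one negotiation on a freshly zeroed device state. */
unsigned int negotiate(bool param, unsigned int backend_value)
{
	struct frontend_model info = { 0 };

	feature_persistent = param;
	gather_backend_features(&info, backend_value);
	return info.feature_persistent;
}

/* Self-check of the model; call it from main() or a test harness. */
void check_frontend_model(void)
{
	assert(negotiate(true, 1) == 1);	/* both sides agree */
	assert(negotiate(true, 0) == 0);	/* backend lacks support */
	assert(negotiate(false, 1) == 0);	/* parameter turned off */
	assert(negotiate(true, 2) == 1);	/* non-zero value normalised */
}
```

Because the parameter is read inside the connect-time helper rather than cached
at probe time, changing it between two connects changes the outcome of the
second negotiation in this model.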




Re: [PATCH v2] xen-blkback: fix persistent grants negotiation

2022-07-15 Thread SeongJae Park
Hello,


Oleksandr, thank you for Cc-ing Andrii.  Andrii, thank you for the comment!

On Fri, 15 Jul 2022 15:00:10 +0300 Andrii Chepurnyi 
 wrote:

> [-- Attachment #1: Type: text/plain, Size: 5237 bytes --]
> 
> Hello All,
> 
> I faced the mentioned issue recently and just to bring more context here is
> our setup:
> We use pvblock backend for Android guest. It starts using u-boot with
> pvblock support(which frontend doesn't support the persistent grants
> feature), later it loads and starts the Linux kernel(which frontend
> supports the persistent grants feature). So in total, we have sequent two
> different frontends reconnection, the first of which doesn't support
> persistent grants.
> So the original patch [1] perfectly solves the original issue and provides
> the ability to use persistent grants after the reconnection when Linux
> frontend which supports persistent grants comes into play.
> At the same time [2] will disable the persistent grants feature for the
> first and second frontend.

Thank you for this great explanation of your situation.

> Is it possible to keep [1]  as is?

Yes.  My concerns about Max's original patch[1] were the conflicting behavior
description in the document[1] and the different behavior of the blkfront-side
'feature_persistent' parameter.  I will post Max's patch again with patches for
the blkfront behavior change and document updates.

[1] https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/


Thanks,
SJ

> 
> [1]
> https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/
> [2] https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/
> 
> Best regards,
> Andrii
> 
> On Fri, Jul 15, 2022 at 1:15 PM Oleksandr  wrote:
> 
> >
> > On 15.07.22 01:44, SeongJae Park wrote:
> >
> >
> > Hello all.
> >
> > Adding Andrii Chepurnyi to CC who have played with the use-case which
> > required reconnect recently and faced some issues with
> > feature_persistent handling.
[...]



[PATCH v2] xen-blkback: fix persistent grants negotiation

2022-07-14 Thread SeongJae Park
The persistent grants feature can be used only when both the backend and
the frontend support it.  The feature was always supported by
'blkback', but commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") introduced a parameter for disabling
it at runtime.

To keep the parameter from being updated while 'blkback' is using it,
that commit caches the parameter in 'vbd->feature_gnt_persistent' in
'xen_vbd_create()', then checks whether the guest also supports the
feature, and finally updates the field in 'connect_ring()'.

However, 'connect_ring()' could be called before 'xen_vbd_create()', so
a later execution of 'xen_vbd_create()' can wrongly overwrite
'vbd->feature_gnt_persistent' with 'true'.  As a result, 'blkback' could
try to use the persistent grants feature even if the guest doesn't
support it.

This commit fixes the issue by moving the parameter value caching to
'xen_blkif_alloc()', which allocates the 'blkif'.  Because the struct
embeds the 'vbd' object that 'connect_ring()' uses later,
'xen_blkif_alloc()' should run before 'connect_ring()' and is therefore
the right and safe place to do the caching.

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Signed-off-by: Maximilian Heyne 
Signed-off-by: SeongJae Park 
---

Changes from v1[1]
- Avoid the behavioral change[2]
- Rebase on latest xen/tip/linux-next
- Re-work by SeongJae Park 
- Cc stable@

[1] https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/
[2] https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/

 drivers/block/xen-blkback/xenbus.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 97de13b14175..16c6785d260c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -157,6 +157,11 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
return 0;
 }
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent, "Enables the persistent grants feature");
+
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
struct xen_blkif *blkif;
@@ -181,6 +186,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
__module_get(THIS_MODULE);
INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
+   blkif->vbd.feature_gnt_persistent = feature_persistent;
+
return blkif;
 }
 
@@ -472,12 +479,6 @@ static void xen_vbd_free(struct xen_vbd *vbd)
vbd->bdev = NULL;
 }
 
-/* Enable the persistent grants feature. */
-static bool feature_persistent = true;
-module_param(feature_persistent, bool, 0644);
-MODULE_PARM_DESC(feature_persistent,
-   "Enables the persistent grants feature");
-
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
  unsigned major, unsigned minor, int readonly,
  int cdrom)
@@ -520,8 +521,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
if (bdev_max_secure_erase_sectors(bdev))
vbd->discard_secure = true;
 
-   vbd->feature_gnt_persistent = feature_persistent;
-
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
-- 
2.25.1




[PATCH v2] xen-blk{back,front}: Update contact points for buffer_squeeze_duration_ms and feature_persistent

2022-04-20 Thread SeongJae Park
SeongJae is currently listed as a contact point for some blk{back,front}
features, but he will not work on Xen for a while.  This commit
therefore updates the contact point to his colleague, Maximilian, who
understands the context and is actively working with the features now.

Signed-off-by: SeongJae Park 
Signed-off-by: Maximilian Heyne 
Acked-by: Roger Pau Monné 
---

Changes from v1
(https://lore.kernel.org/xen-devel/20220301144628.2858-1...@kernel.org/)
- Add Acked-by from Roger

 Documentation/ABI/testing/sysfs-driver-xen-blkback  | 4 ++--
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index a74dfe52dd76..7faf719af165 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -29,7 +29,7 @@ Description:
 What:   /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
 Date:   December 2019
 KernelVersion:  5.6
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 When memory pressure is reported to blkback this option
 controls the duration in milliseconds that blkback will not
@@ -39,7 +39,7 @@ Description:
 What:   /sys/module/xen_blkback/parameters/feature_persistent
 Date:   September 2020
 KernelVersion:  5.10
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
 that this option only takes effect on newly created backends.
diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 61fd173fabfe..7f646c58832e 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -12,7 +12,7 @@ Description:
 What:   /sys/module/xen_blkfront/parameters/feature_persistent
 Date:   September 2020
 KernelVersion:  5.10
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
 that this option only takes effect on newly created frontends.
-- 
2.25.1




[PATCH] xen-blk{back,front}: Update contact points for buffer_squeeze_duration_ms and feature_persistent

2022-03-01 Thread SeongJae Park
SeongJae is currently listed as a contact point for some blk{back,front}
features, but he will not work on Xen for a while.  This commit
therefore updates the contact point to his colleague, Maximilian, who
understands the context and is actively working with the features now.

Signed-off-by: SeongJae Park 
Signed-off-by: Maximilian Heyne 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkback  | 4 ++--
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index a74dfe52dd76..7faf719af165 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -29,7 +29,7 @@ Description:
 What:   /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
 Date:   December 2019
 KernelVersion:  5.6
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 When memory pressure is reported to blkback this option
 controls the duration in milliseconds that blkback will not
@@ -39,7 +39,7 @@ Description:
 What:   /sys/module/xen_blkback/parameters/feature_persistent
 Date:   September 2020
 KernelVersion:  5.10
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
 that this option only takes effect on newly created backends.
diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 61fd173fabfe..7f646c58832e 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -12,7 +12,7 @@ Description:
 What:   /sys/module/xen_blkfront/parameters/feature_persistent
 Date:   September 2020
 KernelVersion:  5.10
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
 that this option only takes effect on newly created frontends.
-- 
2.17.1




Re: [PATCH] xen, blkback: fix persistent grants negotiation

2022-01-21 Thread SeongJae Park
On Tue, 11 Jan 2022 13:26:50 +0100 "Roger Pau Monné"  
wrote:

> On Tue, Jan 11, 2022 at 11:50:32AM +, Durrant, Paul wrote:
> > On 11/01/2022 11:11, Roger Pau Monné wrote:
> > > On Thu, Jan 06, 2022 at 09:10:13AM +, Maximilian Heyne wrote:
> > > > Given dom0 supports persistent grants but the guest does not.
> > > > Then, when attaching a block device during runtime of the guest, dom0
> > > > will enable persistent grants for this newly attached block device:
> > > > 
> > > >$ xenstore-ls -f | grep 20674 | grep persistent
> > > >/local/domain/0/backend/vbd/20674/768/feature-persistent = "0"
> > > >/local/domain/0/backend/vbd/20674/51792/feature-persistent = "1"
> > > 
> > > The mechanism that we use to advertise persistent grants support is
> > > wrong. 'feature-persistent' should always be set if the backend
> > > supports persistent grant (like it's done for other features in
> > > xen_blkbk_probe). The usage of the feature depends on whether both
> > > parties support persistent grants, and the xenstore entry printed by
> > > blkback shouldn't reflect whether persistent grants are in use, but
> > > rather whether blkback supports the feature.
> > > 
> > > > 
> > > > Here disk 768 was attached during guest creation while 51792 was
> > > > attached at runtime. If the guest would have advertised the persistent
> > > > grant feature, there would be a xenstore entry like:
> > > > 
> > > >/local/domain/20674/device/vbd/51792/feature-persistent = "1"
> > > > 
> > > > Persistent grants are also used when the guest tries to access the disk
> > > > which can be seen when enabling log stats:
> > > > 
> > > >$ echo 1 > /sys/module/xen_blkback/parameters/log_stats
> > > >$ dmesg
> > > >xen-blkback: (20674.xvdf-0): oo   0  |  rd0  |  wr0  |  f
> > > > 0 |  ds0 | pg:1/1056
> > > > 
> > > > The "pg: 1/1056" shows that one persistent grant is used.
> > > > 
> > > > Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling
> > > > of persistent grants") vbd->feature_gnt_persistent was set in
> > > > connect_ring. After the commit it was intended to be initialized in
> > > > xen_vbd_create and then set according to the guest feature availability
> > > > in connect_ring. However, with a running guest, connect_ring might be
> > > > called before xen_vbd_create and vbd->feature_gnt_persistent will be
> > > > incorrectly initialized. xen_vbd_create will overwrite it with the value
> > > > of feature_persistent regardless whether the guest actually supports
> > > > persistent grants.
> > > > 
> > > > With this commit, vbd->feature_gnt_persistent is set only in
> > > > connect_ring and this is the only use of the module parameter
> > > > feature_persistent. This avoids races when the module parameter changes
> > > > during the block attachment process.
> > > > 
> > > > Note that vbd->feature_gnt_persistent doesn't need to be initialized in
> > > > xen_vbd_create. It's next use is in connect which can only be called
> > > > once connect_ring has initialized the rings. xen_update_blkif_status is
> > > > checking for this.
> > > > 
> > > > Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of 
> > > > persistent grants")
> > > > Signed-off-by: Maximilian Heyne 
> > > > ---
> > > >   drivers/block/xen-blkback/xenbus.c | 9 +++--
> > > >   1 file changed, 3 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > > > b/drivers/block/xen-blkback/xenbus.c
> > > > index 914587aabca0c..51b6ec0380ca4 100644
> > > > --- a/drivers/block/xen-blkback/xenbus.c
> > > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > > @@ -522,8 +522,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
> > > > blkif_vdev_t handle,
> > > > if (q && blk_queue_secure_erase(q))
> > > > vbd->discard_secure = true;
> > > > -   vbd->feature_gnt_persistent = feature_persistent;
> > > > -
> > > > pr_debug("Successful creation of handle=%04x (dom=%u)\n",
> > > > handle, blkif->domid);
> > > > return 0;
> > > > @@ -1090,10 +1088,9 @@ static int connect_ring(struct backend_info *be)
> > > > xenbus_dev_fatal(dev, err, "unknown fe protocol %s", 
> > > > protocol);
> > > > return -ENOSYS;
> > > > }
> > > > -   if (blkif->vbd.feature_gnt_persistent)
> > > > -   blkif->vbd.feature_gnt_persistent =
> > > > -   xenbus_read_unsigned(dev->otherend,
> > > > -   "feature-persistent", 0);
> > > > +
> > > > +   blkif->vbd.feature_gnt_persistent = feature_persistent &&
> > > > +   xenbus_read_unsigned(dev->otherend, 
> > > > "feature-persistent", 0);
> > > 
> > > I'm not sure it's correct to potentially read feature_persistent
> > > multiple times like it's done here.
> > > 
> > > A device can be disconnected and re-attached multiple times, and that
> > > implies 

Re: [PATCH] xen, blkback: fix persistent grants negotiation

2022-01-06 Thread SeongJae Park
From: SeongJae Park 

On Thu, 6 Jan 2022 09:10:13 + Maximilian Heyne  wrote:

> Given dom0 supports persistent grants but the guest does not.
> Then, when attaching a block device during runtime of the guest, dom0
> will enable persistent grants for this newly attached block device:
> 
>   $ xenstore-ls -f | grep 20674 | grep persistent
>   /local/domain/0/backend/vbd/20674/768/feature-persistent = "0"
>   /local/domain/0/backend/vbd/20674/51792/feature-persistent = "1"
> 
> Here disk 768 was attached during guest creation while 51792 was
> attached at runtime. If the guest had advertised the persistent
> grant feature, there would be a xenstore entry like:
> 
>   /local/domain/20674/device/vbd/51792/feature-persistent = "1"
> 
> Persistent grants are also used when the guest tries to access the disk
> which can be seen when enabling log stats:
> 
>   $ echo 1 > /sys/module/xen_blkback/parameters/log_stats
>   $ dmesg
>   xen-blkback: (20674.xvdf-0): oo   0  |  rd0  |  wr0  |  f0 |  
> ds0 | pg:1/1056
> 
> The "pg: 1/1056" shows that one persistent grant is used.
> 
> Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling
> of persistent grants") vbd->feature_gnt_persistent was set in
> connect_ring. After the commit it was intended to be initialized in
> xen_vbd_create and then set according to the guest feature availability
> in connect_ring. However, with a running guest, connect_ring might be
> called before xen_vbd_create and vbd->feature_gnt_persistent will be
> incorrectly initialized. xen_vbd_create will overwrite it with the value
> of feature_persistent regardless of whether the guest actually supports
> persistent grants.
> 
> With this commit, vbd->feature_gnt_persistent is set only in
> connect_ring and this is the only use of the module parameter
> feature_persistent. This avoids races when the module parameter changes
> during the block attachment process.
> 
> Note that vbd->feature_gnt_persistent doesn't need to be initialized in
> xen_vbd_create. Its next use is in connect which can only be called
> once connect_ring has initialized the rings. xen_update_blkif_status is
> checking for this.
> 
> Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of 
> persistent grants")
> Signed-off-by: Maximilian Heyne 

Thank you for this patch!

Reviewed-by: SeongJae Park 

Also, I guess this tag is needed?

Cc:  # 5.10.x


Thanks,
SJ
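
The negotiation that the patch moves into connect_ring can be modeled in
user space roughly as follows.  This is only a sketch under stated
assumptions: struct vbd_model and negotiate_persistent() are names made
up for illustration, the static feature_persistent variable stands in for
the module parameter, and the frontend_advertises argument replaces the
xenbus_read_unsigned() call on the frontend's "feature-persistent" node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the blkback module parameter. */
static bool feature_persistent = true;

/* Hypothetical, trimmed-down model of struct xen_vbd. */
struct vbd_model {
	bool feature_gnt_persistent;
};

/*
 * Models the single assignment done in connect_ring() after the patch:
 * persistent grants are used only if the backend allows them AND the
 * frontend advertised them.  'frontend_advertises' stands in for
 * xenbus_read_unsigned(dev->otherend, "feature-persistent", 0).
 */
static void negotiate_persistent(struct vbd_model *vbd,
				 unsigned int frontend_advertises)
{
	assert(vbd);
	vbd->feature_gnt_persistent = feature_persistent &&
		frontend_advertises;
}
```

Unlike the replaced "if (blkif->vbd.feature_gnt_persistent)" form, the
result here does not depend on whatever value the field held before, so a
later reconnect can turn the feature back on.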

> ---
>  drivers/block/xen-blkback/xenbus.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/xenbus.c 
> b/drivers/block/xen-blkback/xenbus.c
> index 914587aabca0c..51b6ec0380ca4 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -522,8 +522,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
> blkif_vdev_t handle,
>   if (q && blk_queue_secure_erase(q))
>   vbd->discard_secure = true;
>  
> - vbd->feature_gnt_persistent = feature_persistent;
> -
>   pr_debug("Successful creation of handle=%04x (dom=%u)\n",
>   handle, blkif->domid);
>   return 0;
> @@ -1090,10 +1088,9 @@ static int connect_ring(struct backend_info *be)
>   xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
>   return -ENOSYS;
>   }
> - if (blkif->vbd.feature_gnt_persistent)
> - blkif->vbd.feature_gnt_persistent =
> - xenbus_read_unsigned(dev->otherend,
> - "feature-persistent", 0);
> +
> + blkif->vbd.feature_gnt_persistent = feature_persistent &&
> + xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
>  
>   blkif->vbd.overflow_max_grants = 0;
>  
> -- 
> 2.32.0
> 



Re: [PATCH linux-next] drivers/xen/xenbus/xenbus_client.c: fix bugon.cocci warnings

2021-08-25 Thread SeongJae Park
From: SeongJae Park 

On Tue, 24 Aug 2021 23:24:51 -0700 CGEL  wrote:

> From: Jing Yangyang 
> 
> Use BUG_ON instead of an if condition followed by BUG.
> 
> Generated by: scripts/coccinelle/misc/bugon.cocci
> 
> Reported-by: Zeal Robot 
> Signed-off-by: Jing Yangyang 

Reviewed-by: SeongJae Park 


Thanks,
SJ
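
For context, the kind of rewrite that bugon.cocci suggests has the
general shape shown below.  This is a user-space sketch, not the code
from xenbus_client.c (the thread elides the diff): the BUG() and BUG_ON()
definitions are simplified stand-ins for the kernel macros, and the
checked_len_*() helpers are invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Simplified user-space stand-ins for the kernel macros (assumption). */
#define BUG() \
	do { \
		fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
		abort(); \
	} while (0)
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)

/* Before: open-coded "if (cond) BUG();" (hypothetical helper). */
static int checked_len_before(int len)
{
	if (len < 0)
		BUG();
	return len;
}

/* After: the same check folded into BUG_ON() (hypothetical helper). */
static int checked_len_after(int len)
{
	BUG_ON(len < 0);
	return len;
}
```

Both helpers behave the same way: a non-negative length is returned
unchanged, and a negative one aborts the program.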

[...]



Re: [PATCH 1/2] xen/blkback: turn the cache purge LRU interval into a parameter

2020-10-15 Thread SeongJae Park
On Thu, 15 Oct 2020 16:24:15 +0200 Roger Pau Monne  wrote:

> Assume that reads and writes to the variable will be atomic. The worst
> that could happen is that one of the LRU intervals is not calculated
> properly if a partially written value is read, but that would only be
> a transient issue.
> 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: Jens Axboe 
> Cc: Boris Ostrovsky 
> Cc: SeongJae Park 
> Cc: xen-devel@lists.xenproject.org
> Cc: linux-bl...@vger.kernel.org
> Cc: J. Roeleveld 
> Cc: Jürgen Groß 
> ---
>  Documentation/ABI/testing/sysfs-driver-xen-blkback | 10 ++
>  drivers/block/xen-blkback/blkback.c|  9 ++---
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> index ecb7942ff146..776f25d335ca 100644
> --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> @@ -35,3 +35,13 @@ Description:
>  controls the duration in milliseconds that blkback will not
>  cache any page not backed by a grant mapping.
>  The default is 10ms.
> +
> +What:   /sys/module/xen_blkback/parameters/lru_internval
> +Date:   October 2020
> +KernelVersion:  5.10
> +Contact:Roger Pau Monné 
> +Description:
> +The LRU mechanism to clean the lists of persistent grants 
> needs
> +to be executed periodically. This parameter controls the time
> +interval between consecutive executions of the purge 
> mechanism
> +is set in ms.

I think noting the default value (100ms) here would be better.


Thanks,
SeongJae Park



Re: [PATCH] xen-blkback: add a parameter for disabling of persistent grants

2020-09-24 Thread SeongJae Park
On Wed, 23 Sep 2020 16:09:30 -0400 Konrad Rzeszutek Wilk 
 wrote:

> On Tue, Sep 22, 2020 at 09:01:25AM +0200, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > Persistent grants feature provides high scalability.  On some small
> > systems, however, it could incur data copy overhead[1] and thus it is
> > required to be disabled.  But, there is no option to disable it.  For
> > the reason, this commit adds a module parameter for disabling of the
> > feature.
> 
> Would it be better suited to have it per guest?

The latest version of this patchset[1] supports blkfront side disablement.
Could that partially solve your concern?

[1] https://lore.kernel.org/xen-devel/20200923061841.20531-1-sjp...@amazon.com/


Thanks,
SeongJae Park



[PATCH v4 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-23 Thread SeongJae Park
From: SeongJae Park 

The persistent grants feature provides high scalability.  On some small
systems, however, it could incur data copy overheads[1] and thus needs
to be disabled.  But there is no option to disable it.  For this
reason, this commit adds a module parameter for disabling the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Signed-off-by: Anthony Liguori 
Signed-off-by: SeongJae Park 
Reviewed-by: Juergen Gross 
Acked-by: Roger Pau Monné 
---
 .../ABI/testing/sysfs-driver-xen-blkback  |  9 
 drivers/block/xen-blkback/xenbus.c| 22 ++-
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index ecb7942ff146..ac2947b98950 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -35,3 +35,12 @@ Description:
 controls the duration in milliseconds that blkback will not
 cache any page not backed by a grant mapping.
 The default is 10ms.
+
+What:   /sys/module/xen_blkback/parameters/feature_persistent
+Date:   September 2020
+KernelVersion:  5.10
+Contact:SeongJae Park 
+Description:
+Whether to enable the persistent grants feature or not.  Note
+that this option only takes effect on newly created backends.
+The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b9aa5d1ac10b..8fc34211dc8b 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -474,6 +474,12 @@ static void xen_vbd_free(struct xen_vbd *vbd)
vbd->bdev = NULL;
 }
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent,
+   "Enables the persistent grants feature");
+
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
  unsigned major, unsigned minor, int readonly,
  int cdrom)
@@ -519,6 +525,8 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
if (q && blk_queue_secure_erase(q))
vbd->discard_secure = true;
 
+   vbd->feature_gnt_persistent = feature_persistent;
+
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
@@ -906,7 +914,8 @@ static void connect(struct backend_info *be)
 
xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
 
-   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
+   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
+   be->blkif->vbd.feature_gnt_persistent);
if (err) {
xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
 dev->nodename);
@@ -1067,7 +1076,6 @@ static int connect_ring(struct backend_info *be)
 {
struct xenbus_device *dev = be->dev;
struct xen_blkif *blkif = be->blkif;
-   unsigned int pers_grants;
char protocol[64] = "";
int err, i;
char *xspath;
@@ -1093,9 +1101,11 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
-  0);
-   blkif->vbd.feature_gnt_persistent = pers_grants;
+   if (blkif->vbd.feature_gnt_persistent)
+   blkif->vbd.feature_gnt_persistent =
+   xenbus_read_unsigned(dev->otherend,
+   "feature-persistent", 0);
+
blkif->vbd.overflow_max_grants = 0;
 
/*
@@ -1118,7 +1128,7 @@ static int connect_ring(struct backend_info *be)
 
pr_info("%s: using %d queues, protocol %d (%s) %s\n", dev->nodename,
 blkif->nr_rings, blkif->blk_protocol, protocol,
-pers_grants ? "persistent grants" : "");
+blkif->vbd.feature_gnt_persistent ? "persistent grants" : "");
 
ring_page_order = xenbus_read_unsigned(dev->otherend,
   "ring-page-order", 0);
-- 
2.17.1




[PATCH v4 3/3] xen-blkfront: Apply changed parameter name to the document

2020-09-23 Thread SeongJae Park
From: SeongJae Park 

Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor
parameter") changed the name of the module parameter for the maximum
amount of segments in indirect requests but missed updating the
document.  This commit updates the document.

Signed-off-by: SeongJae Park 
Reviewed-by: Juergen Gross 
Acked-by: Roger Pau Monné 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront 
b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 9c31334cb2e6..28008905615f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -1,4 +1,4 @@
-What:   /sys/module/xen_blkfront/parameters/max
+What:   /sys/module/xen_blkfront/parameters/max_indirect_segments
 Date:   June 2013
 KernelVersion:  3.11
 Contact:Konrad Rzeszutek Wilk 
-- 
2.17.1




Re: [PATCH v3 3/3] xen-blkfront: Apply changed parameter name to the document

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 16:44:25 +0200 "Roger Pau Monné"  
wrote:

> On Tue, Sep 22, 2020 at 04:27:39PM +0200, Jürgen Groß wrote:
> > On 22.09.20 16:15, SeongJae Park wrote:
> > > From: SeongJae Park 
> > > 
> > > Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor
> > > parameter") changed the name of the module parameter for the maximum
> > > amount of segments in indirect requests but missed updating the
> > > document.  This commit updates the document.
> > > 
> > > Signed-off-by: SeongJae Park 
> > 
> > Reviewed-by: Juergen Gross 
> 
> Does this need to be backported to stable branches?

I don't think so, as this is not a bug affecting users?


Thanks,
SeongJae Park



[PATCH v3 0/3] xen-blk(back|front): Let users disable persistent grants

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

The persistent grants feature provides high scalability.  On some small
systems, however, it could incur data copy overheads[1] and thus needs
to be disabled.  But there is no option to disable it.  For this
reason, this patchset adds module parameters for disabling the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Baseline and Complete Git Trees
===============================

The patches are based on the v5.9-rc6.  You can also clone the complete
git tree:

$ git clone git://github.com/sjp38/linux -b pgrants_disable_v3

A web view is also available:
https://github.com/sjp38/linux/tree/pgrants_disable_v3

Patch History
=============

Changes from v2
(https://lore.kernel.org/linux-block/20200922105209.5284-1-sjp...@amazon.com/)
- Avoid race conditions (Roger Pau Monné)

Changes from v1
(https://lore.kernel.org/linux-block/20200922070125.27251-1-sjp...@amazon.com/)
- use 'bool' parameter type (Jürgen Groß)
- Let blkfront also disable the feature from its side
  (Roger Pau Monné)
- Avoid unnecessary xenbus_printf (Roger Pau Monné)
- Update frontend parameter doc


SeongJae Park (3):
  xen-blkback: add a parameter for disabling of persistent grants
  xen-blkfront: add a parameter for disabling of persistent grants
  xen-blkfront: Apply changed parameter name to the document

 .../ABI/testing/sysfs-driver-xen-blkback  |  9 
 .../ABI/testing/sysfs-driver-xen-blkfront | 11 +-
 drivers/block/xen-blkback/xenbus.c| 22 ++-
 drivers/block/xen-blkfront.c  | 20 -
 4 files changed, 50 insertions(+), 12 deletions(-)

-- 
2.17.1




Re: [PATCH v2 2/3] xen-blkfront: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 14:11:32 +0200 "Jürgen Groß"  wrote:

> On 22.09.20 12:52, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > Persistent grants feature provides high scalability.  On some small
> > systems, however, it could incur data copy overheads[1] and thus it is
> > required to be disabled.  It can be disabled from blkback side using a
> > module parameter, 'feature_persistent'.  But, it is impossible from
> > blkfront side.  For the reason, this commit adds a blkfront module
> > parameter for disabling of the feature.
> > 
> > [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability
> > 
> > Signed-off-by: SeongJae Park 
> > ---
> >   .../ABI/testing/sysfs-driver-xen-blkfront |  9 ++
> >   drivers/block/xen-blkfront.c  | 28 +--
> >   2 files changed, 29 insertions(+), 8 deletions(-)
> > 
[...]
> > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> > index 91de2e0755ae..49c324f377de 100644
> > --- a/drivers/block/xen-blkfront.c
> > +++ b/drivers/block/xen-blkfront.c
> > @@ -149,6 +149,13 @@ static unsigned int xen_blkif_max_ring_order;
> >   module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, 
> > 0444);
> >   MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used 
> > for the shared ring");
> >   
> > +/* Enable the persistent grants feature. */
> > +static bool feature_persistent = true;
> > +module_param(feature_persistent, bool, 0644);
> > +MODULE_PARM_DESC(feature_persistent,
> > +   "Enables the persistent grants feature");
> > +
> > +
> >   #define BLK_RING_SIZE(info)   \
> > __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages)
> >   
> > @@ -1866,11 +1873,13 @@ static int talk_to_blkback(struct xenbus_device 
> > *dev,
> > message = "writing protocol";
> > goto abort_transaction;
> > }
> > -   err = xenbus_printf(xbt, dev->nodename,
> > -   "feature-persistent", "%u", 1);
> > -   if (err)
> > > -   dev_warn(&dev->dev,
> > -"writing persistent grants feature to xenbus");
> > +   if (feature_persistent) {
> > +   err = xenbus_printf(xbt, dev->nodename,
> > +   "feature-persistent", "%u", 1);
> > +   if (err)
> > > +   dev_warn(&dev->dev,
> > +"writing persistent grants feature to xenbus");
> > +   }
> >   
> > err = xenbus_transaction_end(xbt, 0);
> > if (err) {
> > @@ -2316,9 +2325,12 @@ static void blkfront_gather_backend_features(struct 
> > blkfront_info *info)
> > if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
> > blkfront_setup_discard(info);
> >   
> > -   info->feature_persistent =
> > -   !!xenbus_read_unsigned(info->xbdev->otherend,
> > -  "feature-persistent", 0);
> > +   if (feature_persistent)
> > +   info->feature_persistent =
> > +   !!xenbus_read_unsigned(info->xbdev->otherend,
> > +  "feature-persistent", 0);
> > +   else
> > +   info->feature_persistent = 0;
> >   
> > indirect_segments = xenbus_read_unsigned(info->xbdev->otherend,
> > "feature-max-indirect-segments", 0);
> > 
> 
> Here you have the same problem as in blkback: feature_persistent could
> change its value between the two tests.

Yes, indeed.  I will fix this in the next version.


Thanks,
SeongJae Park



Re: [PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 13:35:30 +0200 "Jürgen Groß"  wrote:

> On 22.09.20 13:26, SeongJae Park wrote:
> > On Tue, 22 Sep 2020 13:12:59 +0200 "Roger Pau Monné"  
> > wrote:
> > 
> >> On Tue, Sep 22, 2020 at 12:52:07PM +0200, SeongJae Park wrote:
> >>> From: SeongJae Park 
> >>>
> >>> Persistent grants feature provides high scalability.  On some small
> >>> systems, however, it could incur data copy overheads[1] and thus it is
> >>> required to be disabled.  But, there is no option to disable it.  For
> >>> the reason, this commit adds a module parameter for disabling of the
> >>> feature.
> >>>
> >>> [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability
> >>>
> >>> Signed-off-by: Anthony Liguori 
> >>> Signed-off-by: SeongJae Park 
> >>> ---
> >>>   .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
> >>>   drivers/block/xen-blkback/xenbus.c| 28 ++-
> >>>   2 files changed, 30 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> >>> b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> >>> index ecb7942ff146..ac2947b98950 100644
> >>> --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> >>> +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> >>> @@ -35,3 +35,12 @@ Description:
> >>>   controls the duration in milliseconds that blkback will 
> >>> not
> >>>   cache any page not backed by a grant mapping.
> >>>   The default is 10ms.
> >>> +
> >>> +What:   /sys/module/xen_blkback/parameters/feature_persistent
> >>> +Date:   September 2020
> >>> +KernelVersion:  5.10
> >>> +Contact:SeongJae Park 
> >>> +Description:
> >>> +Whether to enable the persistent grants feature or not.  
> >>> Note
> >>> +that this option only takes effect on newly created 
> >>> backends.
> >>> +The default is Y (enable).
> >>> diff --git a/drivers/block/xen-blkback/xenbus.c 
> >>> b/drivers/block/xen-blkback/xenbus.c
> >>> index b9aa5d1ac10b..8a95ddd08b13 100644
> >>> --- a/drivers/block/xen-blkback/xenbus.c
> >>> +++ b/drivers/block/xen-blkback/xenbus.c
> >>> @@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device *dev)
> >>>   
> >>>   /* ** Connection ** */
> >>>   
> >>> +/* Enable the persistent grants feature. */
> >>> +static bool feature_persistent = true;
> >>> +module_param(feature_persistent, bool, 0644);
> >>> +MODULE_PARM_DESC(feature_persistent,
> >>> + "Enables the persistent grants feature");
> >>> +
> >>>   /*
> >>>* Write the physical details regarding the block device to the store, 
> >>> and
> >>>* switch to Connected state.
> >>> @@ -906,11 +912,15 @@ static void connect(struct backend_info *be)
> >>>   
> >>>   xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> >>>   
> >>> - err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
> >>> - if (err) {
> >>> - xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
> >>> -  dev->nodename);
> >>> - goto abort;
> >>> + if (feature_persistent) {
> >>> + err = xenbus_printf(xbt, dev->nodename, "feature-persistent",
> >>> + "%u", feature_persistent);
> >>> + if (err) {
> >>> + xenbus_dev_fatal(dev, err,
> >>> + "writing %s/feature-persistent",
> >>> + dev->nodename);
> >>> + goto abort;
> >>> + }
> >>>   }
> >>>   
> >>>   err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
> >>> @@ -1093,8 +1103,12 @@ static int connect_ring(struct backend_info *be)
> >>>   xenbus_dev_fatal(dev, err, "unknown fe protocol %s", 
> >>> pr

Re: [PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 13:35:11 +0200 "Roger Pau Monné"  
wrote:

> On Tue, Sep 22, 2020 at 01:26:38PM +0200, SeongJae Park wrote:
> > On Tue, 22 Sep 2020 13:12:59 +0200 "Roger Pau Monné"  
> > wrote:
> > 
> > > On Tue, Sep 22, 2020 at 12:52:07PM +0200, SeongJae Park wrote:
> > > > From: SeongJae Park 
> > > > 
> > > > Persistent grants feature provides high scalability.  On some small
> > > > systems, however, it could incur data copy overheads[1] and thus it is
> > > > required to be disabled.  But, there is no option to disable it.  For
> > > > the reason, this commit adds a module parameter for disabling of the
> > > > feature.
> > > > 
> > > > [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability
> > > > 
> > > > Signed-off-by: Anthony Liguori 
> > > > Signed-off-by: SeongJae Park 
> > > > ---
> > > >  .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
> > > >  drivers/block/xen-blkback/xenbus.c| 28 ++-
> > > >  2 files changed, 30 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> > > > b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > > > index ecb7942ff146..ac2947b98950 100644
> > > > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > > > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > > > @@ -35,3 +35,12 @@ Description:
> > > >  controls the duration in milliseconds that blkback 
> > > > will not
> > > >  cache any page not backed by a grant mapping.
> > > >  The default is 10ms.
> > > > +
> > > > +What:   /sys/module/xen_blkback/parameters/feature_persistent
> > > > +Date:   September 2020
> > > > +KernelVersion:  5.10
> > > > +Contact:SeongJae Park 
> > > > +Description:
> > > > +Whether to enable the persistent grants feature or 
> > > > not.  Note
> > > > +that this option only takes effect on newly created 
> > > > backends.
> > > > +The default is Y (enable).
> > > > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > > > b/drivers/block/xen-blkback/xenbus.c
> > > > index b9aa5d1ac10b..8a95ddd08b13 100644
> > > > --- a/drivers/block/xen-blkback/xenbus.c
> > > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > > @@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device 
> > > > *dev)
> > > >  
> > > >  /* ** Connection ** */
> > > >  
> > > > +/* Enable the persistent grants feature. */
> > > > +static bool feature_persistent = true;
> > > > +module_param(feature_persistent, bool, 0644);
> > > > +MODULE_PARM_DESC(feature_persistent,
> > > > +   "Enables the persistent grants feature");
> > > > +
> > > >  /*
> > > >   * Write the physical details regarding the block device to the store, 
> > > > and
> > > >   * switch to Connected state.
> > > > @@ -906,11 +912,15 @@ static void connect(struct backend_info *be)
> > > >  
> > > > xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> > > >  
> > > > -   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", 
> > > > "%u", 1);
> > > > -   if (err) {
> > > > -   xenbus_dev_fatal(dev, err, "writing 
> > > > %s/feature-persistent",
> > > > -dev->nodename);
> > > > -   goto abort;
> > > > +   if (feature_persistent) {
> > > > +   err = xenbus_printf(xbt, dev->nodename, 
> > > > "feature-persistent",
> > > > +   "%u", feature_persistent);
> > > > +   if (err) {
> > > > +   xenbus_dev_fatal(dev, err,
> > > > +   "writing %s/feature-persistent",
> > > > +   dev->nodename);
> > > > +   goto abort;
> > > > +   }
> > > > 

Re: [PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 13:12:59 +0200 "Roger Pau Monné"  
wrote:

> On Tue, Sep 22, 2020 at 12:52:07PM +0200, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > Persistent grants feature provides high scalability.  On some small
> > systems, however, it could incur data copy overheads[1] and thus it is
> > required to be disabled.  But, there is no option to disable it.  For
> > the reason, this commit adds a module parameter for disabling of the
> > feature.
> > 
> > [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability
> > 
> > Signed-off-by: Anthony Liguori 
> > Signed-off-by: SeongJae Park 
> > ---
> >  .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
> >  drivers/block/xen-blkback/xenbus.c| 28 ++-
> >  2 files changed, 30 insertions(+), 7 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> > b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > index ecb7942ff146..ac2947b98950 100644
> > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > @@ -35,3 +35,12 @@ Description:
> >  controls the duration in milliseconds that blkback will not
> >  cache any page not backed by a grant mapping.
> >  The default is 10ms.
> > +
> > +What:   /sys/module/xen_blkback/parameters/feature_persistent
> > +Date:   September 2020
> > +KernelVersion:  5.10
> > +Contact:SeongJae Park 
> > +Description:
> > +Whether to enable the persistent grants feature or not.  
> > Note
> > +that this option only takes effect on newly created 
> > backends.
> > +The default is Y (enable).
> > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > b/drivers/block/xen-blkback/xenbus.c
> > index b9aa5d1ac10b..8a95ddd08b13 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device *dev)
> >  
> >  /* ** Connection ** */
> >  
> > +/* Enable the persistent grants feature. */
> > +static bool feature_persistent = true;
> > +module_param(feature_persistent, bool, 0644);
> > +MODULE_PARM_DESC(feature_persistent,
> > +   "Enables the persistent grants feature");
> > +
> >  /*
> >   * Write the physical details regarding the block device to the store, and
> >   * switch to Connected state.
> > @@ -906,11 +912,15 @@ static void connect(struct backend_info *be)
> >  
> > xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> >  
> > -   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
> > -   if (err) {
> > -   xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
> > -dev->nodename);
> > -   goto abort;
> > +   if (feature_persistent) {
> > +   err = xenbus_printf(xbt, dev->nodename, "feature-persistent",
> > +   "%u", feature_persistent);
> > +   if (err) {
> > +   xenbus_dev_fatal(dev, err,
> > +   "writing %s/feature-persistent",
> > +   dev->nodename);
> > +   goto abort;
> > +   }
> > }
> >  
> > err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
> > @@ -1093,8 +1103,12 @@ static int connect_ring(struct backend_info *be)
> > xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
> > return -ENOSYS;
> > }
> > -   pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
> > -  0);
> > +   if (feature_persistent)
> > +   pers_grants = xenbus_read_unsigned(dev->otherend,
> > +   "feature-persistent", 0);
> > +   else
> > +   pers_grants = 0;
> > +
> 
> Sorry for not realizing earlier, but looking at it again I think you
> need to cache the value of feature_persistent when it's first used in
> the blkback state data, so that it's consistent.
> 
> What would happen for example with the following flow (assume a
> persistent grants enabled frontend):
> 
> feature_persistent = false
> 
> connect(...)
> feature-persistent is not written to xenstore
> 
> User changes feature_persistent = true
> 
> connect_ring(...)
> pers_grants = true, because feature-persistent is set unconditionally
> by the frontend and feature_persistent variable is now true.
> 
> Then blkback will try to use persistent grants and the whole
> connection will malfunction because the frontend won't.

Ah, you're right.  I should have caught this earlier but didn't, sorry.

> 
> The other option is to prevent changing the variable when there are
> blkback instances already running.

I think storing the option value in xenstore would be simpler.  That said, if
you prefer this way, please let me know.


Thanks,
SeongJae Park
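
The flow described in the review can be reproduced with a small
user-space model.  The sketch below is not the blkback code: struct
backend_model and all of the functions are hypothetical names, the static
feature_persistent variable stands in for the runtime-writable module
parameter, and the xenstore accesses are reduced to plain fields.  It
contrasts re-reading the parameter at each step with taking one snapshot
when the backend is created.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the runtime-writable module parameter. */
static bool feature_persistent;

/* Hypothetical per-backend state; only the fields needed here. */
struct backend_model {
	bool cached_persistent;	/* snapshot of the parameter */
	bool advertised;	/* what connect() wrote to xenstore */
	bool uses_pers_grants;	/* what connect_ring() decided */
};

/* Racy variant: each step re-reads the global parameter. */
static void connect_racy(struct backend_model *be)
{
	assert(be);
	be->advertised = feature_persistent;
}

static void connect_ring_racy(struct backend_model *be, bool frontend_flag)
{
	assert(be);
	be->uses_pers_grants = feature_persistent && frontend_flag;
}

/* Cached variant: take one snapshot, then consult only the snapshot. */
static void backend_create(struct backend_model *be)
{
	assert(be);
	be->cached_persistent = feature_persistent;
}

static void connect_cached(struct backend_model *be)
{
	assert(be);
	be->advertised = be->cached_persistent;
}

static void connect_ring_cached(struct backend_model *be, bool frontend_flag)
{
	assert(be);
	be->uses_pers_grants = be->cached_persistent && frontend_flag;
}
```

If the parameter is flipped from false to true between the connect and
connect_ring steps, the racy variant ends up using persistent grants it
never advertised, while the cached variant stays consistent.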



[PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

The persistent grants feature provides high scalability.  On some small
systems, however, it could incur data copy overheads[1] and thus needs
to be disabled.  But there is no option to disable it.  For this
reason, this commit adds a module parameter for disabling the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Signed-off-by: Anthony Liguori 
Signed-off-by: SeongJae Park 
---
 .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
 drivers/block/xen-blkback/xenbus.c| 28 ++-
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index ecb7942ff146..ac2947b98950 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -35,3 +35,12 @@ Description:
 controls the duration in milliseconds that blkback will not
 cache any page not backed by a grant mapping.
 The default is 10ms.
+
+What:   /sys/module/xen_blkback/parameters/feature_persistent
+Date:   September 2020
+KernelVersion:  5.10
+Contact:SeongJae Park 
+Description:
+Whether to enable the persistent grants feature or not.  Note
+that this option only takes effect on newly created backends.
+The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b9aa5d1ac10b..8a95ddd08b13 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent,
+   "Enables the persistent grants feature");
+
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
@@ -906,11 +912,15 @@ static void connect(struct backend_info *be)
 
xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
 
-   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
-   if (err) {
-   xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
-dev->nodename);
-   goto abort;
+   if (feature_persistent) {
+   err = xenbus_printf(xbt, dev->nodename, "feature-persistent",
+   "%u", feature_persistent);
+   if (err) {
+   xenbus_dev_fatal(dev, err,
+   "writing %s/feature-persistent",
+   dev->nodename);
+   goto abort;
+   }
}
 
err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
@@ -1093,8 +1103,12 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
-  0);
+   if (feature_persistent)
+   pers_grants = xenbus_read_unsigned(dev->otherend,
+   "feature-persistent", 0);
+   else
+   pers_grants = 0;
+
blkif->vbd.feature_gnt_persistent = pers_grants;
blkif->vbd.overflow_max_grants = 0;
 
-- 
2.17.1




[PATCH v2 3/3] xen-blkfront: Apply changed parameter name to the document

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor
parameter") changed the name of the module parameter for the maximum
amount of segments in indirect requests but missed updating the
document.  This commit updates the document.

Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 9c31334cb2e6..28008905615f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -1,4 +1,4 @@
-What:   /sys/module/xen_blkfront/parameters/max
+What:   /sys/module/xen_blkfront/parameters/max_indirect_segments
 Date:   June 2013
 KernelVersion:  3.11
 Contact:Konrad Rzeszutek Wilk 
-- 
2.17.1




[PATCH v2 0/3] xen-blk(back|front): Let users disable persistent grants

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

The persistent grants feature provides high scalability.  On some small
systems, however, it could incur data copy overheads[1] and thus needs
to be disabled.  But there is no option to disable it.  For this reason,
this patchset adds module parameters for disabling the feature in both
blkback and blkfront.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Baseline and Complete Git Trees
===============================

The patches are based on v5.9-rc6.  You can also clone the complete
git tree:

$ git clone git://github.com/sjp38/linux -b pgrants_disable_v2

A web view is also available:
https://github.com/sjp38/linux/tree/pgrants_disable_v2

Patch History
=============

Changes from v1
(https://lore.kernel.org/linux-block/20200922070125.27251-1-sjp...@amazon.com/)
- use 'bool' parameter type (Jürgen Groß)
- Let blkfront also disable the feature from its side
  (Roger Pau Monné)
- Avoid unnecessary xenbus_printf (Roger Pau Monné)
- Update frontend parameter doc

SeongJae Park (3):
  xen-blkback: add a parameter for disabling of persistent grants
  xen-blkfront: add a parameter for disabling of persistent grants
  xen-blkfront: Apply changed parameter name to the document

 .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
 .../ABI/testing/sysfs-driver-xen-blkfront | 11 +++-
 drivers/block/xen-blkback/xenbus.c| 28 ++-
 drivers/block/xen-blkfront.c  | 28 +--
 4 files changed, 60 insertions(+), 16 deletions(-)

-- 
2.17.1




Re: [Xen-devel] [PATCH v13 0/5] xenbus/backend: Add memory pressure handler callback

2020-01-13 Thread SeongJae Park
On Mon, 13 Jan 2020 10:55:07 +0100 "Roger Pau Monné" wrote:

> On Mon, Jan 13, 2020 at 10:49:52AM +0100, SeongJae Park wrote:
> > Every patch of this patchset got at least one 'Reviewed-by' or 'Acked-by'
> > from appropriate maintainers by last Wednesday, and after that, got no
> > comment yet.  May I ask for some more comments?
> 
> I'm not sure why more comments are needed, patches have all the
> relevant Acks and will be pushed in due time unless someone has
> objections.
> 
> Please be patient and wait at least until the next merge window, these
> patches are not bug fixes so pushing them now would be wrong.

Ok, I will.  Thank you for your quick and nice reply.


Thanks,
SeongJae Park

> 
> Roger.
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v13 0/5] xenbus/backend: Add memory pressure handler callback

2020-01-13 Thread SeongJae Park
Every patch of this patchset got at least one 'Reviewed-by' or 'Acked-by' from
appropriate maintainers by last Wednesday, and after that, got no comment yet.
May I ask for some more comments?


Thanks,
SeongJae Park

On Wed, 18 Dec 2019 19:37:13 +0100 SeongJae Park  wrote:

> Granting pages consumes backend system memory.  In systems configured
> with insufficient spare memory for those pages, it can cause a memory
> pressure situation.  However, finding the optimal amount of the spare
> memory is challenging for large systems having dynamic resource
> utilization patterns.  Also, such a static configuration might lack
> flexibility.
> 
> To mitigate such problems, this patchset adds a memory reclaim callback
> to 'xenbus_driver' (patch 1) and then introduces a lock for race
> condition avoidance (patch 2).  After that, patch 3 applies the callback
> mechanism to mitigate the problem in 'xen-blkback'.  The fourth and
> fifth patches are trivial cleanups; those fix nits we found during the
> development of this patchset.
> 
> Note that patches 1, 4, and 5 are not changed since v9.
> 
> 
> Base Version
> 
> 
> This patch is based on v5.4.  A complete tree is also available at my
> public git repo:
> https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v13
> 
> 
> Patch History
> -
> 
> Changes from v12
> (https://lore.kernel.org/xen-devel/20191218104232.9606-1-sjp...@amazon.com/)
>  - Do not unnecessarily disable interrupts (suggested by Juergen)
>  - Hold lock from xenbus side (suggested by Juergen)
> 
> Changes from v11
> (https://lore.kernel.org/xen-devel/20191217160748.693-2-sjp...@amazon.com/)
>  - Fix wrong trylock use (reported by Juergen)
>  - Merge patch 3 and 4 (suggested by Juergen)
>  - Update test result
> 
> Changes from v10
> (https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
>  - Fix race condition (reported by SeongJae, suggested by Juergen)
> 
> Changes from v9
> (https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
>  - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
>  - Update the commit message for overhead test of the 2nd path
> 
> Changes from v8
> (https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
>  - Drop 'Reviewed-by: Juergen' from the second patch
>(suggested by Roger Pau Monné)
>  - Update contact of the new module param to SeongJae Park
>
>(suggested by Roger Pau Monné)
>  - Wordsmith the description of the parameter
>(suggested by Roger Pau Monné)
>  - Fix dumb bugs
>(suggested by Roger Pau Monné)
>  - Move module param definition to xenbus.c and reduce the number of
>lines for this change
>(suggested by Roger Pau Monné)
>  - Add a comment for the new callback, reclaim_memory, as other
>callbacks also have
>  - Add another trivial cleanup of xenbus.c file (4th patch)
> 
> Changes from v7
> (https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
>  - Update sysfs-driver-xen-blkback for new parameter
>(suggested by Roger Pau Monné)
>  - Use per-xen_blkif buffer_squeeze_end instead of global variable
>(suggested by Roger Pau Monné)
> 
> Changes from v6
> (https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
>  - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
>  - Constify a variable (suggested by Roger Pau Monné)
>  - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
>  - More wordsmith of the commit message (suggested by Roger Pau Monné)
> 
> Changes from v5
> (https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
>  - Wordsmith the commit messages (suggested by Roger Pau Monné)
>  - Change the reclaim callback return type (suggested by Roger Pau
>Monné)
>  - Change the type of the blkback squeeze duration variable
>(suggested by Roger Pau Monné)
>  - Add a patch for removal of unnecessary static variable name prefixes
>(suggested by Roger Pau Monné)
>  - Fix checkpatch.pl warnings
> 
> Changes from v4
> (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
>  - Remove domain id parameter from the callback (suggested by Juergen
>Gross)
>  - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)
> 
> Changes from v3
> (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
>  - Add general callback in xen_driver and use it (suggested by Juergen
>Gross)
> 
> Changes from v2
> (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
>  - Rename the module parameter and variables for brevity
>  

Re: [Xen-devel] [PATCH v13 3/5] xen/blkback: Squeeze page pools if a memory pressure is detected

2020-01-02 Thread SeongJae Park
Hello Roger,

Sorry if I'm disturbing your vacation.  If you have already come back to work,
may I ask your opinion about this patch?

On Wed, 18 Dec 2019 19:37:16 +0100 SeongJae Park  wrote:

> From: SeongJae Park 
> 
> Each `blkif` has a free pages pool for the grant mapping.  The size of
> the pool starts from zero and is increased on demand while processing
> the I/O requests.  If current I/O requests handling is finished or 100
> milliseconds has passed since last I/O requests handling, it checks and
> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> 
> Therefore, host administrators can cause memory pressure in blkback by
> attaching a large number of block devices and inducing I/O.  Such
> problematic situations can be avoided by limiting the maximum number of
> devices that can be attached, but finding the optimal limit is not so
> easy.  An improperly set limit can result in memory pressure or
> resource underutilization.  This commit avoids such problematic
> situations by squeezing the pools (returns every free page in the pool
> to the system) for a while (users can set this duration via a module
> parameter) if memory pressure is detected.
> 
> Discussions
> ===========
> 
> The original shrinking mechanism of `blkback` returns to the system only
> those pages in the pool that are not currently used by `blkback`, in
> other words, the pages that are not mapped with granted pages.  Because
> this commit changes only the shrink limit and still uses the same
> freeing mechanism, it does not touch pages which are currently mapping
> grants.
> 
> Once memory pressure is detected, this commit keeps the squeezing limit
> for a user-specified time duration.  The duration should be neither too
> long nor too short.  If it is too long, the overhead incurred by
> squeezing can reduce the I/O performance.  If it is too short,
> `blkback` will not
> free enough pages to reduce the memory pressure.  This commit sets the
> value as `10 milliseconds` by default because it is a short time in
> terms of I/O while it is a long time in terms of memory operations.
> Also, as the original shrinking mechanism works for at least every 100
> milliseconds, this could be a somewhat reasonable choice.  I also tested
> other durations (refer to the below section for more details) and
> confirmed that 10 milliseconds is the one that works best with the test.
> That said, the proper duration depends on actual configurations and
> workloads.  That's why this commit allows users to set the duration as a
> module parameter.
> 
> Memory Pressure Test
> ====================
> 
> To show how well this commit mitigates the memory pressure situation, I
> configured a test environment on a xen-running virtualization system.
> On the `blkfront` running guest instances, I attach a large number of
> network-backed volume devices and induce I/O to those.  Meanwhile, I
> measure the number of pages that are swapped in (pswpin) and out (pswpout)
> on the `blkback` running guest.  The test ran twice, once for the
> `blkback` before this commit and once for that after this commit.  As
> shown below, this commit has dramatically reduced the memory pressure:
> 
>         pswpin  pswpout
> before  76,672  185,799
> after      867    3,967
> Optimal Aggressive Shrinking Duration
> -------------------------------------
> 
> To find the best squeezing duration, I repeated the test with three
> different durations (1ms, 10ms, and 100ms).  The results are as below:
> 
> duration (ms)  pswpin  pswpout
> 1              707     5,095
> 10             867     3,967
> 100            362     3,348
> 
> As expected, the memory pressure decreases as the duration increases,
> but the reduction slows down from `10ms`.  Based on these results, I
> chose 10ms as the default duration.
> 
> Performance Overhead Test
> =========================
> 
> This commit could incur I/O performance degradation under severe memory
> pressure because the squeezing will require more page allocations per
> I/O.  To show the overhead, I artificially made a worst-case squeezing
> situation and measured the I/O performance of a `blkfront` running
> guest.
> 
> For the artificial squeezing, I set the `blkback.max_buffer_pages` using
> the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
> test, I set the value to `1024` and `0`.  The `1024` is the default
> value.  Setting the value to `0` is the same as always doing the
> squeezing (worst-case).
> 
> If the underlying block device is slow enough, the squeezing overhead
> could be hidden.  For this reason, I use a fast block device, namely the
> rbd[1]:
> 
> # xl block-attach 

[Xen-devel] [PATCH v13 5/5] xen/blkback: Consistently insert one empty line between functions

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

The number of empty lines between functions in xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place only one empty line.

Acked-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 24172c180f5f..c7f820db190a 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
device_remove_file(&dev->dev, &dev_attr_physical_device);
 }
 
-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
handle, blkif->domid);
return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
@@ -572,6 +572,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info
if (err)
dev_warn(&dev->dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
  struct backend_info *be, int state)
 {
@@ -656,7 +657,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -748,7 +748,6 @@ static void backend_changed(struct xenbus_watch *watch,
}
 }
 
-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -823,7 +822,6 @@ static void frontend_changed(struct xenbus_device *dev,
}
 }
 
-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -846,7 +844,6 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
-- 
2.17.1



[Xen-devel] [PATCH v13 4/5] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

A few static variables in blkback have the 'xen_blkif_' prefix, though it
is unnecessary for static variables.  This commit removes such prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,

[Xen-devel] [PATCH v13 0/5] xenbus/backend: Add memory pressure handler callback

2019-12-18 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and then introduces a lock for race
condition avoidance (patch 2).  After that, patch 3 applies the callback
mechanism to mitigate the problem in 'xen-blkback'.  The fourth and
fifth patches are trivial cleanups; those fix nits we found during the
development of this patchset.

Note that patches 1, 4, and 5 are not changed since v9.


Base Version
------------

This patch is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v13


Patch History
-------------

Changes from v12
(https://lore.kernel.org/xen-devel/20191218104232.9606-1-sjp...@amazon.com/)
 - Do not unnecessarily disable interrupts (suggested by Juergen)
 - Hold lock from xenbus side (suggested by Juergen)

Changes from v11
(https://lore.kernel.org/xen-devel/20191217160748.693-2-sjp...@amazon.com/)
 - Fix wrong trylock use (reported by Juergen)
 - Merge patch 3 and 4 (suggested by Juergen)
 - Update test result

Changes from v10
(https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
 - Fix race condition (reported by SeongJae, suggested by Juergen)

Changes from v9
(https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
 - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
 - Update the commit message for overhead test of the 2nd path

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs
   (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change
   (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as other
   callbacks also have
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau
   Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen
   Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch


SeongJae Park (5):
  xenbus/backend: Add memory pressure handler callback
  xenbus/backend: Protect xenbus callback with lock
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Consistently insert one empty line between functions

 .../ABI/testing/sysfs-driver-xen-blkback  |

[Xen-devel] [PATCH v13 1/5] xenbus/backend: Add memory pressure handler callback

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved for more
sophisticated handling of general pressures.  For example, it would be
possible to monitor the memory consumption of each device and issue the
release requests only to devices which are causing the pressure.  Also,
the callback could be extended to handle not only memory, but general
resources.  Nevertheless, this version of the implementation defers such
sophisticated goals as future work.

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1



[Xen-devel] [PATCH v13 3/5] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  When handling of the current I/O requests finishes,
or 100 milliseconds have passed since the last I/O requests were
handled, `blkback` checks and shrinks the pool so that it does not
exceed the size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not so
easy.  An improperly set limit can result in memory pressure or
resource underutilization.  This commit avoids such problematic
situations by squeezing the pools (returning every free page in the pool
to the system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.
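
The squeezing idea can be pictured with a small user-space sketch.  It is
not the patch itself: `blkif_sketch`, `tick_before()`, `start_squeeze()`
and `shrink_limit()` are hypothetical names, and a plain tick counter
stands in for jiffies.  The sketch assumes that a pressure notification
records a deadline (called `buffer_squeeze_end` here) and that each
shrink pass uses a limit of 0 until that deadline passes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-blkif state; only the field used here. */
struct blkif_sketch {
	unsigned long buffer_squeeze_end; /* tick at which squeezing stops */
};

/* Wrap-safe "a is before b" on an unsigned tick counter. */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Assumed reaction to memory pressure: squeeze for duration_ticks from now. */
static void start_squeeze(struct blkif_sketch *blkif, unsigned long now,
			  unsigned long duration_ticks)
{
	blkif->buffer_squeeze_end = now + duration_ticks;
}

/* Pool size limit to apply on this shrink pass. */
static int shrink_limit(const struct blkif_sketch *blkif, unsigned long now,
			int max_buffer_pages)
{
	/* While squeezing, keep no free pages; otherwise use the normal limit. */
	return tick_before(now, blkif->buffer_squeeze_end) ? 0 : max_buffer_pages;
}
```

With a deadline set at tick 110, a shrink pass at tick 105 would use a
limit of 0, while a pass at tick 110 or later would fall back to
`max_buffer_pages`.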

Discussions
===========

The original shrinking mechanism of `blkback` returns to the system only
the pages in the pool that are not currently used by `blkback`, in other
words, the pages that are not mapped with granted pages.  Because this
commit changes only the shrink limit but still uses the same freeing
mechanism, it does not touch pages which are currently mapping grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by the
squeezing can reduce the I/O performance.  If it is too short, `blkback`
will not free enough pages to reduce the memory pressure.  This commit
sets the value to `10 milliseconds` by default because it is a short
time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism runs at least
every 100 milliseconds, this could be a somewhat reasonable choice.  I
also tested other durations (refer to the section below for more
details) and confirmed that 10 milliseconds is the one that works best
with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set the duration as a module parameter.

Memory Pressure Test
====================

To show how well this commit mitigates the memory pressure situation, I
configured a test environment on a Xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages swapped in (pswpin) and out (pswpout) on
the `blkback` running guest.  The test ran twice, once for the
`blkback` before this commit and once for that after this commit.  As
shown below, this commit has dramatically reduced the memory pressure:

        pswpin  pswpout
before  76,672  185,799
after      867    3,967

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

duration (ms)  pswpin  pswpout
1                 707    5,095
10                867    3,967
100               362    3,348

As expected, the memory pressure decreases as the duration increases,
but the reduction becomes slow from `10ms`.  Based on these results, I
chose 10ms as the default duration.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  `1024` is the default value.
Setting the value to `0` is equivalent to squeezing all the time (the
worst case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For that reason, I use a fast block device, namely the
rbd[1]:

# xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

$ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
 bs=4k count=$((256*512)); sync; done

The results are as below.  'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

max_pgs   Min   Max   Median  Avg    Stddev
0         417   423   420     419.4  2.5099801
1024      414   425   416     417.8  4.4384682
No difference proven at 95.0% confidence

In short, even worst case squeezing

[Xen-devel] [PATCH v13 2/5] xenbus/backend: Protect xenbus callback with lock

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

A driver's 'reclaim_memory' callback can race with 'probe' or 'remove'
because it will be called whenever memory pressure is detected.  To
avoid such a race, this commit embeds a spinlock in each 'xenbus_device'
and makes 'xenbus' hold the lock while the corresponding callbacks are
running.
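
The locking scheme can be illustrated with a user-space sketch.  It is
only an approximation under stated assumptions: a C11 `atomic_flag`
stands in for the kernel spinlock, and `device_sketch`,
`sketch_trylock()`, `sketch_unlock()` and `try_reclaim()` are
hypothetical names.  The point it shows is that the reclaim path only
tries the lock and skips the device when 'probe' or 'remove' holds it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for 'xenbus_device'; only what the sketch needs. */
struct device_sketch {
	atomic_flag reclaim_lock; /* stand-in for the per-device spinlock */
	int reclaim_calls;        /* counts how often reclaim actually ran */
};

/* Returns true if the lock was free and is now held by the caller. */
static bool sketch_trylock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void sketch_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Reclaim path: never wait for the lock.  If probe/remove holds it, skip
 * this device.  Returns true if the reclaim work ran.
 */
static bool try_reclaim(struct device_sketch *dev)
{
	if (!sketch_trylock(&dev->reclaim_lock))
		return false;
	dev->reclaim_calls++; /* a driver's reclaim callback would run here */
	sketch_unlock(&dev->reclaim_lock);
	return true;
}
```

Because the reclaim side never blocks, a long-running 'probe' or
'remove' only causes a skipped reclaim attempt for that device rather
than a stall in the memory pressure path.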

Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe.c |  8 +++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 10 --
 include/xen/xenbus.h  |  1 +
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index 5b471889d723..9ed556ba4fd4 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -232,7 +232,9 @@ int xenbus_dev_probe(struct device *_dev)
return err;
}
 
+   spin_lock(&dev->reclaim_lock);
err = drv->probe(dev, id);
+   spin_unlock(&dev->reclaim_lock);
if (err)
goto fail;
 
@@ -260,8 +262,11 @@ int xenbus_dev_remove(struct device *_dev)
 
free_otherend_watch(dev);
 
-   if (drv->remove)
+   if (drv->remove) {
+   spin_lock(&dev->reclaim_lock);
drv->remove(dev);
+   spin_unlock(&dev->reclaim_lock);
+   }
 
free_otherend_details(dev);
 
@@ -472,6 +477,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
goto fail;
 
dev_set_name(&xendev->dev, "%s", devname);
+   spin_lock_init(&xendev->reclaim_lock);
 
/* Register with generic device framework. */
err = device_register(&xendev->dev);
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index 7e78ebef7c54..bc61372e00a1 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
 static int backend_reclaim_memory(struct device *dev, void *data)
 {
const struct xenbus_driver *drv;
+   struct xenbus_device *xdev;
 
if (!dev->driver)
return 0;
drv = to_xenbus_driver(dev->driver);
-   if (drv && drv->reclaim_memory)
-   drv->reclaim_memory(to_xenbus_device(dev));
+   if (drv && drv->reclaim_memory) {
+   xdev = to_xenbus_device(dev);
+   if (!spin_trylock(&xdev->reclaim_lock))
+   return 0;
+   drv->reclaim_memory(xdev);
+   spin_unlock(&xdev->reclaim_lock);
+   }
return 0;
 }
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index c861cfb6f720..45cd61cb6e86 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -76,6 +76,7 @@ struct xenbus_device {
enum xenbus_state state;
struct completion down;
struct work_struct work;
+   spinlock_t reclaim_lock;
 };
 
 static inline struct xenbus_device *to_xenbus_device(struct device *dev)
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock

2019-12-18 Thread SeongJae Park
On Wed, 18 Dec 2019 16:11:51 +0100 "Jürgen Groß"  wrote:

> On 18.12.19 15:40, SeongJae Park wrote:
> > On Wed, 18 Dec 2019 14:30:44 +0100 "Jürgen Groß"  wrote:
> > 
> >> On 18.12.19 13:42, SeongJae Park wrote:
> >>> On Wed, 18 Dec 2019 13:27:37 +0100 "Jürgen Groß"  wrote:
> >>>
> >>>> On 18.12.19 11:42, SeongJae Park wrote:
> >>>>> From: SeongJae Park 
> >>>>>
> >>>>> 'reclaim_memory' callback can race with a driver code as this callback
> >>>>> will be called from any memory pressure detected context.  To deal with
> >>>>> the case, this commit adds a spinlock in the 'xenbus_device'.  Whenever
> >>>>> 'reclaim_memory' callback is called, the lock of the device which passed
> >>>>> to the callback as its argument is locked.  Thus, drivers registering
> >>>>> their 'reclaim_memory' callback should protect the data that might race
> >>>>> with the callback with the lock by themselves.
> >>>>
> >>>> Any reason you don't take the lock around the .probe() and .remove()
> >>>> calls of the backend (xenbus_dev_probe() and xenbus_dev_remove())? This
> >>>> would eliminate the need to do that in each backend instead.
> >>>
> >>> First of all, I would like to keep the critical section as small as
> >>> possible.  With my small test, I could see slightly increasing memory
> >>> pressure as the critical section becomes wider.  Also, some drivers might
> >>> share the data their 'reclaim_memory' callback touches with other
> >>> functions.  I think only the driver owners can know what data is shared
> >>> and what is the minimum critical section to protect it.
> >>
> >> But this kind of serialization can still be added on top.
> > 
> > I'm still worrying about the unnecessarily large critical section, but it
> > might be small enough to be ignored.  If no others have strong objection, I
> > will take the lock around the '->probe()' and '->remove()'.
> 
> The lock is per device, so contention is possible only for the
> reclaim case. In case probe or remove are running reclaim will have
> nothing to free (in probe case nothing is allocated yet, in remove
> case everything should be freed anyway). So the larger critical section
> is no problem at all IMO.

Agreed.  I think I was worried about a problem that does not really exist.

> 
> >> And with the trylock in the reclaim path I believe you can even avoid
> >> the irq variants of the spinlock. But I might be wrong, so you should
> >> try that with lockdep enabled. If it is working there is no harm done
> >> when making the critical section larger, as memory allocations will
> >> work as before.
> > 
> > Yes, you're right.  I will try test with lockdep.
> 
> Thanks,

Good news, lockdep says it's okay :)

Will post next version soon.


Thanks,
SeongJae Park

> 
> 
> Juergen


Re: [Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock

2019-12-18 Thread SeongJae Park
On Wed, 18 Dec 2019 14:30:44 +0100 "Jürgen Groß"  wrote:

> On 18.12.19 13:42, SeongJae Park wrote:
> > On Wed, 18 Dec 2019 13:27:37 +0100 "Jürgen Groß"  wrote:
> > 
> >> On 18.12.19 11:42, SeongJae Park wrote:
> >>> From: SeongJae Park 
> >>>
> >>> 'reclaim_memory' callback can race with a driver code as this callback
> >>> will be called from any memory pressure detected context.  To deal with
> >>> the case, this commit adds a spinlock in the 'xenbus_device'.  Whenever
> >>> 'reclaim_memory' callback is called, the lock of the device which passed
> >>> to the callback as its argument is locked.  Thus, drivers registering
> >>> their 'reclaim_memory' callback should protect the data that might race
> >>> with the callback with the lock by themselves.
> >>
> >> Any reason you don't take the lock around the .probe() and .remove()
> >> calls of the backend (xenbus_dev_probe() and xenbus_dev_remove())? This
> >> would eliminate the need to do that in each backend instead.
> > 
> > First of all, I would like to keep the critical section as small as
> > possible.  With my small test, I could see slightly increasing memory
> > pressure as the critical section becomes wider.  Also, some drivers might
> > share the data their 'reclaim_memory' callback touches with other
> > functions.  I think only the driver owners can know what data is shared and
> > what is the minimum critical section to protect it.
> 
> But this kind of serialization can still be added on top.

I'm still worrying about the unnecessarily large critical section, but it might
be small enough to be ignored.  If no others have strong objection, I will take
the lock around the '->probe()' and '->remove()'.

> 
> And with the trylock in the reclaim path I believe you can even avoid
> the irq variants of the spinlock. But I might be wrong, so you should
> try that with lockdep enabled. If it is working there is no harm done
> when making the critical section larger, as memory allocations will
> work as before.

Yes, you're right.  I will try test with lockdep.


Thanks,
SeongJae Park

> 
> 
> Juergen


Re: [Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock

2019-12-18 Thread SeongJae Park
On Wed, 18 Dec 2019 13:27:37 +0100 "Jürgen Groß"  wrote:

> On 18.12.19 11:42, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > 'reclaim_memory' callback can race with a driver code as this callback
> > will be called from any memory pressure detected context.  To deal with
> > the case, this commit adds a spinlock in the 'xenbus_device'.  Whenever
> > 'reclaim_memory' callback is called, the lock of the device which passed
> > to the callback as its argument is locked.  Thus, drivers registering
> > their 'reclaim_memory' callback should protect the data that might race
> > with the callback with the lock by themselves.
> 
> Any reason you don't take the lock around the .probe() and .remove()
> calls of the backend (xenbus_dev_probe() and xenbus_dev_remove())? This
> would eliminate the need to do that in each backend instead.

First of all, I would like to keep the critical section as small as possible.
With my small test, I could see slightly increasing memory pressure as the
critical section becomes wider.  Also, some drivers might share the data their
'reclaim_memory' callback touches with other functions.  I think only the
driver owners can know what data is shared and what is the minimum critical
section to protect it.

If you think differently or I am missing something, please let me know.


Thanks,
SeongJae Park

> 
> 
> Juergen


[Xen-devel] [PATCH v12 5/5] xen/blkback: Consistently insert one empty line between functions

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

The number of empty lines between functions in xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place only one empty line.

Acked-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 20045827a391..453f97dd533d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
device_remove_file(&dev->dev, &dev_attr_physical_device);
 }
 
-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
handle, blkif->domid);
return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
@@ -575,6 +575,7 @@ static void xen_blkbk_discard(struct xenbus_transaction 
xbt, struct backend_info
if (err)
dev_warn(&dev->dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
  struct backend_info *be, int state)
 {
@@ -663,7 +664,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -755,7 +755,6 @@ static void backend_changed(struct xenbus_watch *watch,
}
 }
 
-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -830,7 +829,6 @@ static void frontend_changed(struct xenbus_device *dev,
}
 }
 
-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -855,7 +853,6 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
-- 
2.17.1



[Xen-devel] [PATCH v12 4/5] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

A few static variables in blkback have a 'xen_blkif_' prefix, though it
is unnecessary for static variables.  This commit removes such prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,

[Xen-devel] [PATCH v12 2/5] xenbus/backend: Protect xenbus callback with lock

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

The 'reclaim_memory' callback can race with driver code, as this
callback can be called from any context in which memory pressure is
detected.  To deal with this case, this commit adds a spinlock to
'xenbus_device'.  Whenever the 'reclaim_memory' callback is called, the
lock of the device which is passed to the callback as its argument is
held.  Thus, drivers registering a 'reclaim_memory' callback should
themselves use the lock to protect the data that might race with the
callback.

Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe.c |  1 +
 drivers/xen/xenbus/xenbus_probe_backend.c | 11 +--
 include/xen/xenbus.h  |  2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index 5b471889d723..b86393f172e6 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
goto fail;
 
dev_set_name(&xendev->dev, "%s", devname);
+   spin_lock_init(&xendev->reclaim_lock);
 
/* Register with generic device framework. */
err = device_register(&xendev->dev);
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index 7e78ebef7c54..e862cb932cc4 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -251,12 +251,19 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
 static int backend_reclaim_memory(struct device *dev, void *data)
 {
const struct xenbus_driver *drv;
+   struct xenbus_device *xdev;
+   unsigned long flags;
 
if (!dev->driver)
return 0;
drv = to_xenbus_driver(dev->driver);
-   if (drv && drv->reclaim_memory)
-   drv->reclaim_memory(to_xenbus_device(dev));
+   if (drv && drv->reclaim_memory) {
+   xdev = to_xenbus_device(dev);
+   if (!spin_trylock_irqsave(&xdev->reclaim_lock, flags))
+   return 0;
+   drv->reclaim_memory(xdev);
+   spin_unlock_irqrestore(&xdev->reclaim_lock, flags);
+   }
return 0;
 }
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index c861cfb6f720..d9468313061d 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -76,6 +76,8 @@ struct xenbus_device {
enum xenbus_state state;
struct completion down;
struct work_struct work;
+   /* 'reclaim_memory' callback is called while this lock is acquired */
+   spinlock_t reclaim_lock;
 };
 
 static inline struct xenbus_device *to_xenbus_device(struct device *dev)
-- 
2.17.1



[Xen-devel] [PATCH v12 3/5] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  When handling of the current I/O requests finishes,
or 100 milliseconds have passed since the last I/O requests were
handled, `blkback` checks and shrinks the pool so that it does not
exceed the size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not so
easy.  An improperly set limit can result in memory pressure or
resource underutilization.  This commit avoids such problematic
situations by squeezing the pools (returning every free page in the pool
to the system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.

Discussions
===========

The original shrinking mechanism of `blkback` returns to the system only
the pages in the pool that are not currently used by `blkback`, in other
words, the pages that are not mapped with granted pages.  Because this
commit changes only the shrink limit but still uses the same freeing
mechanism, it does not touch pages which are currently mapping grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by the
squeezing can reduce the I/O performance.  If it is too short, `blkback`
will not free enough pages to reduce the memory pressure.  This commit
sets the value to `10 milliseconds` by default because it is a short
time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism runs at least
every 100 milliseconds, this could be a somewhat reasonable choice.  I
also tested other durations (refer to the section below for more
details) and confirmed that 10 milliseconds is the one that works best
with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set the duration as a module parameter.

Memory Pressure Test
====================

To show how well this commit mitigates the memory pressure situation, I
configured a test environment on a Xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages swapped in (pswpin) and out (pswpout) on
the `blkback` running guest.  The test ran twice, once for the
`blkback` before this commit and once for that after this commit.  As
shown below, this commit has dramatically reduced the memory pressure:

        pswpin  pswpout
before  76,672  185,799
after      867    3,967

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

duration (ms)  pswpin  pswpout
1                 707    5,095
10                867    3,967
100               362    3,348

As expected, the memory pressure decreases as the duration increases,
but the reduction becomes slow from `10ms`.  Based on these results, I
chose 10ms as the default duration.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  `1024` is the default value.
Setting the value to `0` is equivalent to squeezing all the time (the
worst case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For that reason, I use a fast block device, namely the
rbd[1]:

# xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

$ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
 bs=4k count=$((256*512)); sync; done

The results are as below.  'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

max_pgs   Min   Max   Median  Avg    Stddev
0         417   423   420     419.4  2.5099801
1024      414   425   416     417.8  4.4384682
No difference proven at 95.0% confidence

In short, even worst case squeezing

[Xen-devel] [PATCH v12 1/5] xenbus/backend: Add memory pressure handler callback

2019-12-18 Thread SeongJae Park
From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.
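
The dispatch described above can be sketched in user-space C.  This is
only an illustration under assumptions: `driver_sketch`, `dev_sketch`,
`reclaim_all()` and `drop_free_pages()` are hypothetical names, and a
plain array stands in for the devices on the backend bus.  It shows an
optional per-driver callback being invoked for every bound device whose
driver provides one.

```c
#include <stddef.h>

struct dev_sketch;

/* Hypothetical stand-in for 'xenbus_driver' with an optional callback. */
struct driver_sketch {
	void (*reclaim_memory)(struct dev_sketch *dev); /* may be NULL */
};

/* Hypothetical stand-in for a device on the backend bus. */
struct dev_sketch {
	const struct driver_sketch *driver; /* NULL if no driver is bound */
	int free_pages;                     /* memory the driver could release */
};

/*
 * On memory pressure, ask every bound device whose driver implements the
 * callback to release memory.  Returns the number of callbacks invoked.
 */
static int reclaim_all(struct dev_sketch *devs, size_t n)
{
	int called = 0;

	for (size_t i = 0; i < n; i++) {
		const struct driver_sketch *drv = devs[i].driver;

		if (drv && drv->reclaim_memory) {
			drv->reclaim_memory(&devs[i]);
			called++;
		}
	}
	return called;
}

/* Example callback: give up every cached free page. */
static void drop_free_pages(struct dev_sketch *dev)
{
	dev->free_pages = 0;
}
```

Drivers that do not set the callback, and devices without a bound
driver, are simply skipped, so existing backends would keep working
unchanged in this model.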

Note that the callback facility could be improved for more
sophisticated handling of general pressure.  For example, it would be
possible to monitor the memory consumption of each device and issue the
release requests only to the devices which are causing the pressure.
Also, the callback could be extended to handle not only memory, but
general resources.  Nevertheless, this version of the implementation
defers such sophisticated goals as future work.

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
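
The dispatch added by this patch can be modelled outside the kernel.  The
following is a minimal userspace sketch in C, not the kernel code: the
mock_* types and functions are hypothetical stand-ins for 'struct device',
'struct xenbus_driver', and the bus_for_each_dev() walk, and
mock_release_pool() is an invented example callback.  It shows the two
properties the patch relies on: devices without a bound driver or without a
'reclaim_memory' callback are skipped, and the count hook always reports 0
because it is used only as a pressure signal.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types in the patch. */
struct mock_device;

struct mock_driver {
	/* Optional: drivers with nothing to give back leave this NULL. */
	void (*reclaim_memory)(struct mock_device *dev);
};

struct mock_device {
	const struct mock_driver *driver;	/* NULL until a driver is bound */
	unsigned long pooled_pages;
};

/* Per-device step, mirroring the checks in backend_reclaim_memory(). */
static int mock_reclaim_one(struct mock_device *dev)
{
	if (!dev->driver)
		return 0;
	if (dev->driver->reclaim_memory)
		dev->driver->reclaim_memory(dev);
	return 0;
}

/*
 * Stand-in for the shrinker's count hook: visit every device and report
 * 0 objects, since the hook serves only to detect memory pressure.
 */
static unsigned long mock_shrink_count(struct mock_device *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		mock_reclaim_one(&devs[i]);
	return 0;
}

/* Invented example callback: give every pooled page back. */
static void mock_release_pool(struct mock_device *dev)
{
	dev->pooled_pages = 0;
}
```

Calling mock_shrink_count() on an array holding one device with the
callback, one without it, and one unbound device empties only the first
device's pool.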

[Xen-devel] [PATCH v12 0/5] xenbus/backend: Add memory pressure handler callback

2019-12-18 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and then introduces a lock for race
condition avoidance (patch 2).  After that, patch 3 applies the callback
mechanism to mitigate the problem in 'xen-blkback'.  The fourth and
fifth patches are trivial cleanups; they fix nits we found during the
development of this patchset.

Note that patches 1, 4, and 5 have not changed since v9.


Base Version
------------

This patchset is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v12


Patch History
-------------

Changes from v11
(https://lore.kernel.org/xen-devel/20191217160748.693-2-sjp...@amazon.com/)
 - Fix wrong trylock use (reported by Juergen)
 - Merge patch 3 and 4 (suggested by Juergen)
 - Update test result

Changes from v10
(https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
 - Fix race condition (reported by SeongJae, suggested by Juergen)

Changes from v9
(https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
 - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
 - Update the commit message for overhead test of the 2nd patch

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs
   (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change
   (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as the other
   callbacks have
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau
   Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen
   Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch


SeongJae Park (5):
  xenbus/backend: Add memory pressure handler callback
  xenbus/backend: Protect xenbus callback with lock
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Consistently insert one empty line between functions

 .../ABI/testing/sysfs-driver-xen-blkback  | 10 +
 drivers/block/xen-blkback/blkback.c   | 42 +--
 drivers/block/xen-blkback/common.h|  1 +
 drivers/block/xen-blkback/xenbus.c| 37 +---
 drivers/

Re: [Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 18:10:19 +0100 "Jürgen Groß"  wrote:

> On 17.12.19 17:24, SeongJae Park wrote:
> > On Tue, 17 Dec 2019 17:13:42 +0100 "Jürgen Groß"  wrote:
> > 
> >> On 17.12.19 17:07, SeongJae Park wrote:
> >>> From: SeongJae Park 
> >>>
> >>> 'reclaim_memory' callback can race with a driver code as this callback
> >>> will be called from any memory pressure detected context.  To deal with
> >>> the case, this commit adds a spinlock in the 'xenbus_device'.  Whenever
> >>> 'reclaim_memory' callback is called, the lock of the device which passed
> >>> to the callback as its argument is locked.  Thus, drivers registering
> >>> their 'reclaim_memory' callback should protect the data that might race
> >>> with the callback with the lock by themselves.
> >>>
> >>> Signed-off-by: SeongJae Park 
> >>> ---
> >>>drivers/xen/xenbus/xenbus_probe.c |  1 +
> >>>drivers/xen/xenbus/xenbus_probe_backend.c | 10 --
> >>>include/xen/xenbus.h  |  2 ++
> >>>3 files changed, 11 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/xen/xenbus/xenbus_probe.c 
> >>> b/drivers/xen/xenbus/xenbus_probe.c
> >>> index 5b471889d723..b86393f172e6 100644
> >>> --- a/drivers/xen/xenbus/xenbus_probe.c
> >>> +++ b/drivers/xen/xenbus/xenbus_probe.c
> >>> @@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
> >>>   goto fail;
> >>>
> >>>   dev_set_name(&xendev->dev, "%s", devname);
> >>> + spin_lock_init(&xendev->reclaim_lock);
> >>>
> >>>   /* Register with generic device framework. */
> >>>   err = device_register(&xendev->dev);
> >>> diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> >>> b/drivers/xen/xenbus/xenbus_probe_backend.c
> >>> index 7e78ebef7c54..516aa64b9967 100644
> >>> --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> >>> +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> >>> @@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct 
> >>> notifier_block *notifier,
> >>>static int backend_reclaim_memory(struct device *dev, void *data)
> >>>{
> >>>   const struct xenbus_driver *drv;
> >>> + struct xenbus_device *xdev;
> >>> + unsigned long flags;
> >>>
> >>>   if (!dev->driver)
> >>>   return 0;
> >>>   drv = to_xenbus_driver(dev->driver);
> >>> - if (drv && drv->reclaim_memory)
> >>> - drv->reclaim_memory(to_xenbus_device(dev));
> >>> + if (drv && drv->reclaim_memory) {
> >>> + xdev = to_xenbus_device(dev);
> >>> + spin_trylock_irqsave(&xdev->reclaim_lock, flags);
> >>
> >> You need spin_lock_irqsave() here. Or maybe spin_lock() would be fine,
> >> too? I can't see a reason why you'd want to disable irqs here.
> > 
> > I needed to disable irqs here as this is called from the memory shrinker
> > context.
> 
> Okay.
> 
> > 
> > Also, used 'trylock' because the 'probe()' and 'remove()' code of the driver
> > might include memory allocation.  And the xen-blkback actually does.  If the
> > allocation shows a memory pressure during the allocation, it will trigger
> > this shrinker callback again and then deadlock.
> 
> In that case you need to either return when you didn't get the lock or

Yes, it should.  Cannot believe how I posted this code.  Seems I made some
terrible mistake while formatting patches.  Anyway, I will make it return if it
fails to acquire the lock in the next version.


Thanks,
SeongJae Park

> 
> - when obtaining the lock during probe() and remove() set a variable
>   containing the current cpu number
> - and reset that to e.g. NR_CPUS before releasing the lock again
> - in the shrinker callback do trylock, and if you didn't get the lock
>   test whether the cpu-variable above is set to your current cpu and
>   continue only if yes; if not, redo the trylock
> 
> 
> Juergen
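
The fix agreed on above (return when the lock cannot be taken) can be
sketched in userspace C.  This is an illustration under assumptions, not the
posted patch: a C11 atomic_flag stands in for the 'reclaim_lock' spinlock,
and every mock_* name is hypothetical.  The point it shows is that the
reclaim path must check the result of the trylock and back off, so that a
reclaim triggered while probe() or remove() holds the lock neither blocks
nor enters the critical section.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a device carrying a reclaim lock. */
struct mock_xdev {
	atomic_flag reclaim_lock;	/* stands in for spinlock_t reclaim_lock */
	unsigned int reclaim_calls;	/* how often the callback really ran */
};

static void mock_xdev_init(struct mock_xdev *xdev)
{
	atomic_flag_clear(&xdev->reclaim_lock);
	xdev->reclaim_calls = 0;
}

/* Returns true when the lock was free and is now held by the caller. */
static bool mock_trylock(struct mock_xdev *xdev)
{
	return !atomic_flag_test_and_set(&xdev->reclaim_lock);
}

static void mock_unlock(struct mock_xdev *xdev)
{
	atomic_flag_clear(&xdev->reclaim_lock);
}

/*
 * Reclaim path: if the lock is busy, skip this device instead of
 * waiting.  Ignoring the trylock result would run the callback without
 * holding the lock, which is the bug pointed out in the review.
 */
static bool mock_reclaim(struct mock_xdev *xdev)
{
	if (!mock_trylock(xdev))
		return false;
	xdev->reclaim_calls++;
	mock_unlock(xdev);
	return true;
}
```

With the lock free, mock_reclaim() runs the callback; while another path
holds the lock it returns false and leaves the device untouched.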

[Xen-devel] [PATCH 1/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

I thought separate patches would be easier to review, but it seems that
was my mistake.  As Juergen asked, I merged them again and am posting the
result here.  I also dropped Roger's Reviewed-by.


Thanks,
SeongJae Park


 >8 ---
Subject: [PATCH 1/3] xen/blkback: Squeeze page pools if a memory pressure is
 detected

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  When handling of the current I/O requests finishes,
or 100 milliseconds have passed since the last I/O request handling, it
checks and shrinks the pool so that it does not exceed the size limit,
`max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not
easy.  An improperly set limit can result in memory pressure or
resource underutilization.  This commit avoids such problematic
situations by squeezing the pools (returning every free page in the pool
to the system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only
the pages in the pool that are not currently used by `blkback`, in
other words, the pages that are not mapped with granted pages.  Because
this commit changes only the shrink limit and still uses the same
freeing mechanism, it does not touch pages that are currently mapping
grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by
squeezing can reduce the I/O performance.  If it is too short, `blkback`
will not free enough pages to reduce the memory pressure.  This commit
sets the value to `10 milliseconds` by default because it is a short
time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism runs at least
every 100 milliseconds, this could be a somewhat reasonable choice.  I
also tested other durations (refer to the below section for more
details) and confirmed that 10 milliseconds is the one that works best
with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set the duration as a module parameter.
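
The squeezing window described above can be sketched as a small userspace
model in C.  It is only an illustration: time is counted in plain
milliseconds instead of jiffies, and the mock_* names and the two constants
are hypothetical stand-ins for 'buffer_squeeze_end', 'max_buffer_pages'
(1024), and the 10ms default duration.  A detected pressure moves the end of
the window forward; each periodic shrink pass then uses a limit of 0 inside
the window and the normal limit outside it.

```c
#include <stdbool.h>

/* Hypothetical model: time is in milliseconds rather than jiffies. */
struct mock_blkif {
	unsigned long squeeze_end_ms;	/* models buffer_squeeze_end */
	unsigned int free_pages;	/* pages currently sitting in the pool */
};

static const unsigned int mock_max_buffer_pages = 1024;
static const unsigned int mock_squeeze_duration_ms = 10;

/* Wraparound-safe "a is before b", the same idea as time_before(). */
static bool mock_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Memory pressure seen at now_ms: open (or extend) the squeeze window. */
static void mock_reclaim_memory(struct mock_blkif *blkif, unsigned long now_ms)
{
	blkif->squeeze_end_ms = now_ms + mock_squeeze_duration_ms;
}

/* One periodic shrink pass: the limit drops to 0 inside the window. */
static void mock_shrink_pass(struct mock_blkif *blkif, unsigned long now_ms)
{
	unsigned int limit = mock_time_before(now_ms, blkif->squeeze_end_ms) ?
			     0 : mock_max_buffer_pages;

	if (blkif->free_pages > limit)
		blkif->free_pages = limit;
}
```

Only free pages are counted in this model, which matches the point made
above that pages currently mapping grants are never touched.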

Memory Pressure Test
====================

To show how well this commit mitigates the memory pressure situation, I
configured a test environment on a xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages that swapped in (pswpin) and out (pswpout)
on the `blkback` running guest.  The test ran twice, once for the
`blkback` before this commit and once for that after this commit.  As
shown below, this commit has dramatically reduced the memory pressure:

                pswpin  pswpout
    before      76,672  185,799
    after          212    3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

    duration (ms)   pswpin  pswpout
    1                  852    6,424
    10                 212    3,325
    100                203    3,340

As expected, the memory pressure decreased as the duration increased,
but the reduction leveled off from `10ms`.  Based on these results, I
chose 10ms as the default duration.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  The `1024` is the default
value.  Setting the value to `0` is the same as squeezing all the time
(worst case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For this reason, I use a fast block device, namely the
rbd[1]:

# xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

$ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
 bs=4k count=$((256*512)); sync; done

The resu

Re: [Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 17:13:42 +0100 "Jürgen Groß"  wrote:

> On 17.12.19 17:07, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > 'reclaim_memory' callback can race with a driver code as this callback
> > will be called from any memory pressure detected context.  To deal with
> > the case, this commit adds a spinlock in the 'xenbus_device'.  Whenever
> > 'reclaim_memory' callback is called, the lock of the device which passed
> > to the callback as its argument is locked.  Thus, drivers registering
> > their 'reclaim_memory' callback should protect the data that might race
> > with the callback with the lock by themselves.
> > 
> > Signed-off-by: SeongJae Park 
> > ---
> >   drivers/xen/xenbus/xenbus_probe.c |  1 +
> >   drivers/xen/xenbus/xenbus_probe_backend.c | 10 --
> >   include/xen/xenbus.h  |  2 ++
> >   3 files changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/xen/xenbus/xenbus_probe.c 
> > b/drivers/xen/xenbus/xenbus_probe.c
> > index 5b471889d723..b86393f172e6 100644
> > --- a/drivers/xen/xenbus/xenbus_probe.c
> > +++ b/drivers/xen/xenbus/xenbus_probe.c
> > @@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
> > goto fail;
> >   
> > dev_set_name(&xendev->dev, "%s", devname);
> > +   spin_lock_init(&xendev->reclaim_lock);
> >   
> > /* Register with generic device framework. */
> > err = device_register(&xendev->dev);
> > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> > b/drivers/xen/xenbus/xenbus_probe_backend.c
> > index 7e78ebef7c54..516aa64b9967 100644
> > --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> > @@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct 
> > notifier_block *notifier,
> >   static int backend_reclaim_memory(struct device *dev, void *data)
> >   {
> > const struct xenbus_driver *drv;
> > +   struct xenbus_device *xdev;
> > +   unsigned long flags;
> >   
> > if (!dev->driver)
> > return 0;
> > drv = to_xenbus_driver(dev->driver);
> > -   if (drv && drv->reclaim_memory)
> > -   drv->reclaim_memory(to_xenbus_device(dev));
> > +   if (drv && drv->reclaim_memory) {
> > +   xdev = to_xenbus_device(dev);
> > +   spin_trylock_irqsave(&xdev->reclaim_lock, flags);
> 
> You need spin_lock_irqsave() here. Or maybe spin_lock() would be fine,
> too? I can't see a reason why you'd want to disable irqs here.

I needed to disable irqs here as this is called from the memory shrinker context.

Also, used 'trylock' because the 'probe()' and 'remove()' code of the driver
might include memory allocation.  And the xen-blkback actually does.  If the
allocation shows a memory pressure during the allocation, it will trigger this
shrinker callback again and then deadlock.


Thanks,
SeongJae Park

> 
> > +   drv->reclaim_memory(xdev);
> > +   spin_unlock_irqrestore(&xdev->reclaim_lock, flags);
> > +   }
> > return 0;
> >   }
> >   
> > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> > index c861cfb6f720..d9468313061d 100644
> > --- a/include/xen/xenbus.h
> > +++ b/include/xen/xenbus.h
> > @@ -76,6 +76,8 @@ struct xenbus_device {
> > enum xenbus_state state;
> > struct completion down;
> > struct work_struct work;
> > +   /* 'reclaim_memory' callback is called while this lock is acquired */
> > +   spinlock_t reclaim_lock;
> >   };
> >   
> >   static inline struct xenbus_device *to_xenbus_device(struct device *dev)
> > 
> 
> 
> Juergen
> 

Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 09:30:32 +0100 SeongJae Park  wrote:

> On Tue, 17 Dec 2019 09:16:47 +0100 "Jürgen Groß"  wrote:
> 
> > On 17.12.19 08:59, SeongJae Park wrote:
> > > On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß"  wrote:
> > > 
> > >> On 16.12.19 20:48, SeongJae Park wrote:
> > >>> On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
> > >>>
> > >>>> On 16.12.19 17:15, SeongJae Park wrote:
> > >>>>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  
> > >>>>> wrote:
> > >>>>>
> > >>>>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> From: SeongJae Park 
> > >>>>>>>
> > >>>>> [...]
> > >>>>>>> --- a/drivers/block/xen-blkback/xenbus.c
> > >>>>>>> +++ b/drivers/block/xen-blkback/xenbus.c
> > >>>>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct 
> > >>>>>>> xenbus_device *dev,
> > >>>>>>> }
> > >>>>>>> 
> > >>>>>>> 
> > >>>>>>> +/* Once a memory pressure is detected, squeeze free page pools for 
> > >>>>>>> a while. */
> > >>>>>>> +static unsigned int buffer_squeeze_duration_ms = 10;
> > >>>>>>> +module_param_named(buffer_squeeze_duration_ms,
> > >>>>>>> +   buffer_squeeze_duration_ms, int, 0644);
> > >>>>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > >>>>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> > >>>>>>> detected");
> > >>>>>>> +
> > >>>>>>> +/*
> > >>>>>>> + * Callback received when the memory pressure is detected.
> > >>>>>>> + */
> > >>>>>>> +static void reclaim_memory(struct xenbus_device *dev)
> > >>>>>>> +{
> > >>>>>>> +   struct backend_info *be = dev_get_drvdata(&dev->dev);
> > >>>>>>> +
> > >>>>>>> +   be->blkif->buffer_squeeze_end = jiffies +
> > >>>>>>> +   msecs_to_jiffies(buffer_squeeze_duration_ms);
> > >>>>>>
> > >>>>>> This callback might race with 'xen_blkbk_probe()'.  The race could
> > >>>>>> result in __NULL dereferencing__, as 'xen_blkbk_probe()' sets
> > >>>>>> '->blkif' after it links 'be' to the 'dev'.  Please _don't merge_
> > >>>>>> this patch now!
> > >>>>>>
> > >>>>>> I will do more tests and share results.  Meanwhile, if you have any
> > >>>>>> opinion, please let me know.
> > >>>
> > >>> I reduced system memory and attached a bunch of devices in a short time
> > >>> so that memory pressure occurs while device attachments are ongoing.
> > >>> Under this circumstance, I was able to see the race.
> > >>>
> > >>>>>
> > >>>>> Not only '->blkif', but 'be' itself also could be NULL.  As similar
> > >>>>> concurrency issues could exist in other drivers in their own way, I
> > >>>>> suggest changing the reclaim callback ('->reclaim_memory') to be
> > >>>>> called for each driver instead of each device.  Then, each driver
> > >>>>> could deal with its concurrency issues by itself.
> > >>>>
> > >>>> Hmm, I don't like that. This would need to be changed back in case we
> > >>>> add per-guest quota.
> > >>>
> > >>> Extending this callback in that way would be still not too hard.  We 
> > >>> could use
> > >>> the argument to the callback.  I would keep the argument of the 
> > >>> callback to
> > >>> 'struct device *' as is, and will add a comment saying 'NULL' value of 
> > >>> the
> > >>> argumen

[Xen-devel] [PATCH v11 6/6] xen/blkback: Consistently insert one empty line between functions

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

The number of empty lines between functions in the xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place only one empty line.

Acked-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 20045827a391..453f97dd533d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
device_remove_file(&dev->dev, &dev_attr_physical_device);
 }
 
-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
handle, blkif->domid);
return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
@@ -575,6 +575,7 @@ static void xen_blkbk_discard(struct xenbus_transaction 
xbt, struct backend_info
if (err)
dev_warn(&dev->dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
  struct backend_info *be, int state)
 {
@@ -663,7 +664,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -755,7 +755,6 @@ static void backend_changed(struct xenbus_watch *watch,
}
 }
 
-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -830,7 +829,6 @@ static void frontend_changed(struct xenbus_device *dev,
}
 }
 
-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -855,7 +853,6 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
-- 
2.17.1


[Xen-devel] [PATCH v11 5/6] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

A few static variables in blkback have the 'xen_blkif_' prefix, though
it is unnecessary for static variables.  This commit removes such
prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,

[Xen-devel] [PATCH v11 4/6] xen/blkback: Protect 'reclaim_memory()' with 'reclaim_lock'

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

The 'reclaim_memory()' callback of blkback could race with
'xen_blkbk_probe()' and 'xen_blkbk_remove()'.  In that case, an
incompletely linked 'backend_info' and 'blkif' might be exposed to the
callback, leading to failures including NULL dereference.  This commit
fixes the problem by applying the 'reclaim_lock' protection to those.

Note that this commit is separated for review purposes only.  As the
previous commit might result in a race condition and might confuse
bisection, please squash this commit into the previous commit if
possible.

Signed-off-by: SeongJae Park 

---
 drivers/block/xen-blkback/xenbus.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 4f6ea4feca79..20045827a391 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -492,6 +492,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
+   unsigned long flags;
 
pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
 
@@ -504,6 +505,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
be->backend_watch.node = NULL;
}
 
+   spin_lock_irqsave(&dev->reclaim_lock, flags);
dev_set_drvdata(&dev->dev, NULL);
 
if (be->blkif) {
@@ -512,6 +514,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
/* Put the reference we set in xen_blkif_alloc(). */
xen_blkif_put(be->blkif);
}
+   spin_unlock_irqrestore(&dev->reclaim_lock, flags);
 
return 0;
 }
@@ -597,6 +600,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
int err;
struct backend_info *be = kzalloc(sizeof(struct backend_info),
  GFP_KERNEL);
+   unsigned long flags;
 
/* match the pr_debug in xen_blkbk_remove */
pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
@@ -607,6 +611,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return -ENOMEM;
}
be->dev = dev;
+   spin_lock_irqsave(&dev->reclaim_lock, flags);
dev_set_drvdata(&dev->dev, be);
 
be->blkif = xen_blkif_alloc(dev->otherend_id);
@@ -614,8 +619,10 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
err = PTR_ERR(be->blkif);
be->blkif = NULL;
xenbus_dev_fatal(dev, err, "creating block interface");
+   spin_unlock_irqrestore(&dev->reclaim_lock, flags);
goto fail;
}
+   spin_unlock_irqrestore(&dev->reclaim_lock, flags);
 
err = xenbus_printf(XBT_NIL, dev->nodename,
"feature-max-indirect-segments", "%u",
@@ -838,6 +845,10 @@ static void reclaim_memory(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
 
+   /* Device is registered but not probed yet */
+   if (!be)
+   return;
+
be->blkif->buffer_squeeze_end = jiffies +
msecs_to_jiffies(buffer_squeeze_duration_ms);
 }
-- 
2.17.1


[Xen-devel] [PATCH v11 3/6] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  When handling of the current I/O requests finishes,
or 100 milliseconds have passed since the last I/O request handling, it
checks and shrinks the pool so that it does not exceed the size limit,
`max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not
easy.  An improperly set limit can result in memory pressure or
resource underutilization.  This commit avoids such problematic
situations by squeezing the pools (returning every free page in the pool
to the system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only
those pages in the pool that are not currently used by `blkback`, in
other words, the pages that are not mapped to granted pages.  Because
this commit changes only the shrink limit and still uses the same
freeing mechanism, it does not touch pages that are currently mapping
grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by the
squeezing can reduce the I/O performance.  If it is too short, `blkback`
will not free enough pages to reduce the memory pressure.  This commit
sets the value to `10 milliseconds` by default because it is a short
time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism runs at least
every 100 milliseconds, this could be a somewhat reasonable choice.  I
also tested other durations (refer to the section below for more
details) and confirmed that 10 milliseconds is the one that works best
with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set the duration as a module parameter.
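
The squeezing window described above can be sketched in plain user-space
C.  This is only a minimal model, not the blkback code: the struct and
function names (`pool_cfg`, `pool_state`, `on_memory_pressure`,
`shrink_limit`, `shrink_pool`) are hypothetical, and time is passed in
as milliseconds instead of being read from jiffies.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the module parameters. */
struct pool_cfg {
    unsigned int max_buffer_pages;    /* normal shrink limit */
    unsigned int squeeze_duration_ms; /* how long to squeeze after pressure */
};

struct pool_state {
    uint64_t squeeze_end_ms; /* squeeze while now < squeeze_end_ms */
    unsigned int free_pages; /* pool pages that are not mapping grants */
};

/* Called when memory pressure is detected: open a squeezing window. */
void on_memory_pressure(struct pool_state *st, const struct pool_cfg *cfg,
                        uint64_t now_ms)
{
    st->squeeze_end_ms = now_ms + cfg->squeeze_duration_ms;
}

/* Limit used by the shrink step: 0 inside the window, else the normal one. */
unsigned int shrink_limit(const struct pool_state *st,
                          const struct pool_cfg *cfg, uint64_t now_ms)
{
    return now_ms < st->squeeze_end_ms ? 0 : cfg->max_buffer_pages;
}

/* Shrink the pool down to the current limit; returns the pages freed. */
unsigned int shrink_pool(struct pool_state *st, const struct pool_cfg *cfg,
                         uint64_t now_ms)
{
    unsigned int limit = shrink_limit(st, cfg, now_ms);
    unsigned int freed = 0;

    if (st->free_pages > limit) {
        freed = st->free_pages - limit;
        st->free_pages = limit;
    }
    return freed;
}
```

In this model only the limit changes during the window; the freeing step
itself is the same one used outside the window, which mirrors the point
made above that pages mapping grants are never touched.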

Memory Pressure Test
====================

To show how well this commit mitigates the memory pressure, I
configured a test environment on a Xen-based virtualization system.
On the guest instances running `blkfront`, I attached a large number of
network-backed volume devices and induced I/O to them.  Meanwhile, I
measured the number of pages swapped in (pswpin) and out (pswpout)
on the guest running `blkback`.  The test ran twice, once with the
`blkback` before this commit and once with the one after this commit.
As shown below, this commit has dramatically reduced the memory
pressure:

        pswpin  pswpout
before  76,672  185,799
after      212    3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

duration (ms)  pswpin  pswpout
1                 852    6,424
10                212    3,325
100               203    3,340

As expected, the memory pressure decreased as the duration increased,
but the reduction leveled off at `10ms`.  Based on these results, I
chose 10ms as the default duration.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a guest running
`blkfront`.

For the artificial squeezing, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  The `1024` is the default
value.  Setting the value to `0` is equivalent to always doing the
squeezing (worst case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For that reason, I used a fast block device, namely a
RAM disk (brd)[1]:

# xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I ran a simple `dd` command 5 times
directly against the device as below and collected the 'MB/s' results.

$ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
 bs=4k count=$((256*512)); sync; done

The results are as below.  'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

max_pgs   Min   Max   Median  Avg    Stddev
0         417   423   420     419.4  2.5099801
1024      414   425   416     417.8  4.4384682
No difference proven at 95.0% confidence

In short, even worst case squeezing

[Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

The 'reclaim_memory' callback can race with driver code, as the callback
can be called from any context in which memory pressure is detected.  To
deal with this, this commit adds a spinlock to 'xenbus_device'.  Whenever
the 'reclaim_memory' callback is called, the lock of the device that is
passed to the callback as its argument is held.  Thus, drivers
registering a 'reclaim_memory' callback should themselves take the lock
to protect the data that might race with the callback.
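
The locking contract described above can be modeled in user space.  The
following is only a sketch under stated assumptions: a C11 `atomic_flag`
stands in for the kernel spinlock, all `model_*` names are hypothetical,
and the caller skips the callback when the lock is busy, which is one
possible policy rather than necessarily what this patch does.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a device with a per-device lock. */
struct model_device {
    atomic_flag reclaim_lock; /* stands in for the kernel spinlock */
    void *drvdata;            /* driver-private data, NULL until probed */
    int squeezes;             /* how often the callback did real work */
};

void model_init(struct model_device *dev)
{
    atomic_flag_clear(&dev->reclaim_lock);
    dev->drvdata = NULL;
    dev->squeezes = 0;
}

/* Returns true if the lock was free and is now held by the caller. */
bool model_trylock(struct model_device *dev)
{
    return !atomic_flag_test_and_set_explicit(&dev->reclaim_lock,
                                              memory_order_acquire);
}

void model_unlock(struct model_device *dev)
{
    atomic_flag_clear_explicit(&dev->reclaim_lock, memory_order_release);
}

/* Driver callback: always runs with the lock held by its caller. */
static void model_reclaim_cb(struct model_device *dev)
{
    if (!dev->drvdata) /* registered but not probed yet */
        return;
    dev->squeezes++;
}

/* Bus side: run the callback only if the lock could be taken. */
bool model_reclaim(struct model_device *dev)
{
    if (!model_trylock(dev))
        return false; /* driver is busy, e.g. in the middle of probing */
    model_reclaim_cb(dev);
    model_unlock(dev);
    return true;
}
```

A driver's probe path would call `model_trylock()` (or a blocking
equivalent) before linking its private data and `model_unlock()` once the
data is fully set up, so the callback never observes a half-linked state.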

Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe.c |  1 +
 drivers/xen/xenbus/xenbus_probe_backend.c | 10 --
 include/xen/xenbus.h  |  2 ++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 5b471889d723..b86393f172e6 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
goto fail;
 
dev_set_name(&xendev->dev, "%s", devname);
+   spin_lock_init(&xendev->reclaim_lock);
 
/* Register with generic device framework. */
err = device_register(&xendev->dev);
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index 7e78ebef7c54..516aa64b9967 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
 static int backend_reclaim_memory(struct device *dev, void *data)
 {
const struct xenbus_driver *drv;
+   struct xenbus_device *xdev;
+   unsigned long flags;
 
if (!dev->driver)
return 0;
drv = to_xenbus_driver(dev->driver);
-   if (drv && drv->reclaim_memory)
-   drv->reclaim_memory(to_xenbus_device(dev));
+   if (drv && drv->reclaim_memory) {
+   xdev = to_xenbus_device(dev);
+   spin_trylock_irqsave(&xdev->reclaim_lock, flags);
+   drv->reclaim_memory(xdev);
+   spin_unlock_irqrestore(&xdev->reclaim_lock, flags);
+   }
return 0;
 }
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index c861cfb6f720..d9468313061d 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -76,6 +76,8 @@ struct xenbus_device {
enum xenbus_state state;
struct completion down;
struct work_struct work;
+   /* 'reclaim_memory' callback is called while this lock is acquired */
+   spinlock_t reclaim_lock;
 };
 
 static inline struct xenbus_device *to_xenbus_device(struct device *dev)
-- 
2.17.1



[Xen-devel] [PATCH v11 0/6] xenbus/backend: Add a memory pressure handler callback

2019-12-17 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and then introduces a lock for race
condition avoidance (patch 2).  Those two patches could be merged into
one patch if necessary.

The third patch applies the callback mechanism to mitigate the problem
in 'xen-blkback', but it does not yet use the race condition
protection.  The following change (patch 4) applies the race protection
mechanism to blkback.  Patches 3 and 4 are separated only for review
convenience.  I highly recommend merging them into one patch, as a tree
with only patch 3 applied might confuse bisecting.

The fifth and sixth patches are trivial cleanups; those fix nits we
found during the development of this patchset.

Note that patches 1, 3, 5, and 6 are the same as in the previous
version.  I put the changes of this version into separate commits (only
the second and fourth patches) to make review more comfortable.  The
next version of this patchset will merge the third and fourth patches.


Base Version
------------

This patchset is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v11


Patch History
-------------

Changes from v10
(https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
 - Fix race condition (reported by SeongJae, suggested by Juergen)

Changes from v9
(https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
 - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
 - Update the commit message for overhead test of the 2nd patch

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs
   (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change
   (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as other
   callbacks also have
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau
   Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen
   Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch


SeongJae Park (6):
  xenbus/backend: Add memory pressure handler callback
  xenbus/backend: Protect xenbus callb

[Xen-devel] [PATCH v11 1/6] xenbus/backend: Add memory pressure handler callback

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved to handle general
pressure in more sophisticated ways.  For example, it would be
possible to monitor the memory consumption of each device and issue the
release requests only to the devices causing the pressure.  Also, the
callback could be extended to handle not only memory, but general
resources.  Nevertheless, this version of the implementation defers such
sophisticated goals as future work.
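
The dispatch described above (walk every device, call the optional
callback of its driver) can be sketched as a small user-space model.
The `model_*` types, `release_all`, and `notify_memory_pressure` are
hypothetical names used only for illustration; the kernel code in the
diff below relies on the device core instead of a plain array.

```c
#include <stddef.h>

struct model_dev;

/* A driver may or may not provide a reclaim callback. */
struct model_driver {
    const char *name;
    void (*reclaim_memory)(struct model_dev *dev); /* optional, may be NULL */
};

struct model_dev {
    const struct model_driver *driver; /* NULL if no driver is bound */
    unsigned long cached_pages;        /* memory the driver could release */
};

/* Example callback: voluntarily release everything that is cached. */
void release_all(struct model_dev *dev)
{
    dev->cached_pages = 0;
}

/*
 * On memory pressure, ask every bound driver that has a callback to
 * release memory.  Returns the number of callbacks that were invoked.
 */
size_t notify_memory_pressure(struct model_dev *devs, size_t n)
{
    size_t called = 0;

    for (size_t i = 0; i < n; i++) {
        const struct model_driver *drv = devs[i].driver;

        if (!drv || !drv->reclaim_memory)
            continue; /* unbound device or driver without the callback */
        drv->reclaim_memory(&devs[i]);
        called++;
    }
    return called;
}
```

Devices without a bound driver and drivers without the callback are
simply skipped, so adding the callback stays opt-in for each backend.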

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1



Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 12:39:15 +0100 "Roger Pau Monné" wrote:

> On Mon, Dec 16, 2019 at 08:48:03PM +0100, SeongJae Park wrote:
> > On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
> > 
> > > On 16.12.19 17:15, SeongJae Park wrote:
> > > > On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  
> > > > wrote:
> > > > 
> > > >> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  
> > > >> wrote:
> > > >>
> > > >>> From: SeongJae Park 
> > > >>>
> > > > [...]
> > > >>> --- a/drivers/block/xen-blkback/xenbus.c
> > > >>> +++ b/drivers/block/xen-blkback/xenbus.c
> > > >>> @@ -824,6 +824,24 @@ static void frontend_changed(struct 
> > > >>> xenbus_device *dev,
> > > >>>   }
> > > >>>   
> > > >>>   
> > > >>> +/* Once a memory pressure is detected, squeeze free page pools for a 
> > > >>> while. */
> > > >>> +static unsigned int buffer_squeeze_duration_ms = 10;
> > > >>> +module_param_named(buffer_squeeze_duration_ms,
> > > >>> + buffer_squeeze_duration_ms, int, 0644);
> > > >>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > > >>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> > > >>> detected");
> > > >>> +
> > > >>> +/*
> > > >>> + * Callback received when the memory pressure is detected.
> > > >>> + */
> > > >>> +static void reclaim_memory(struct xenbus_device *dev)
> > > >>> +{
> > > >>> + struct backend_info *be = dev_get_drvdata(&dev->dev);
> > > >>> +
> > > >>> + be->blkif->buffer_squeeze_end = jiffies +
> > > >>> + msecs_to_jiffies(buffer_squeeze_duration_ms);
> > > >>
> > > >> This callback might race with 'xen_blkbk_probe()'.  The race could 
> > > >> result in
> > > >> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it 
> > > >> links
> > > >> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> > > >>
> > > >> I will do more test and share results.  Meanwhile, if you have any 
> > > >> opinion,
> > > >> please let me know.
> > 
> > I reduced system memory and attached bunch of devices in short time so that
> > memory pressure occurs while device attachments are ongoing.  Under this
> > circumstance, I was able to see the race.
> > 
> > > > 
> > > > Not only '->blkif', but 'be' itself also could be a NULL.  As similar
> > > > concurrency issues could be in other drivers in their way, I suggest to 
> > > > change
> > > > the reclaim callback ('->reclaim_memory') to be called for each driver 
> > > > instead
> > > > of each device.  Then, each driver could be able to deal with its 
> > > > concurrency
> > > > issues by itself.
> > > 
> > > Hmm, I don't like that. This would need to be changed back in case we
> > > add per-guest quota.
> > 
> > Extending this callback in that way would be still not too hard.  We could 
> > use
> > the argument to the callback.  I would keep the argument of the callback to
> > 'struct device *' as is, and will add a comment saying 'NULL' value of the
> > argument means every devices.  As an example, xenbus would pass NULL-ending
> > array of the device pointers that need to free its resources.
> > 
> > After seeing this race, I am now also thinking it could be better to 
> > delegate
> > detailed control of each device to its driver, as some drivers have some
> > complicated and unique relation with its devices.
> > 
> > > 
> > > Wouldn't a get_device() before calling the callback and a put_device()
> > > afterwards avoid that problem?
> > 
> > I didn't used the reference count manipulation operations because other 
> > similar
> > parts also didn't.  But, if there is no implicit reference count guarantee, 
> > it
> > seems those operations are indeed necessary.
> > 
> > That said, as get/put operations only adjust the reference count, those will
> > not make the callback to wait until the linking of the 'backend' and 
> > 'blkif' to
> > the device (xen_blkbk_probe()) is finished.  Thus, t

Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 09:16:47 +0100 "Jürgen Groß"  wrote:

> On 17.12.19 08:59, SeongJae Park wrote:
> > On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß"  wrote:
> > 
> >> On 16.12.19 20:48, SeongJae Park wrote:
> >>> On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
> >>>
> >>>> On 16.12.19 17:15, SeongJae Park wrote:
> >>>>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  
> >>>>> wrote:
> >>>>>
> >>>>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  
> >>>>>> wrote:
> >>>>>>
> >>>>>>> From: SeongJae Park 
> >>>>>>>
> >>>>> [...]
> >>>>>>> --- a/drivers/block/xen-blkback/xenbus.c
> >>>>>>> +++ b/drivers/block/xen-blkback/xenbus.c
> >>>>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct 
> >>>>>>> xenbus_device *dev,
> >>>>>>> }
> >>>>>>> 
> >>>>>>> 
> >>>>>>> +/* Once a memory pressure is detected, squeeze free page pools for a 
> >>>>>>> while. */
> >>>>>>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>>>>>> +module_param_named(buffer_squeeze_duration_ms,
> >>>>>>> + buffer_squeeze_duration_ms, int, 0644);
> >>>>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>>>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> >>>>>>> detected");
> >>>>>>> +
> >>>>>>> +/*
> >>>>>>> + * Callback received when the memory pressure is detected.
> >>>>>>> + */
> >>>>>>> +static void reclaim_memory(struct xenbus_device *dev)
> >>>>>>> +{
> >>>>>>> + struct backend_info *be = dev_get_drvdata(&dev->dev);
> >>>>>>> +
> >>>>>>> + be->blkif->buffer_squeeze_end = jiffies +
> >>>>>>> + msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>>>>>
> >>>>>> This callback might race with 'xen_blkbk_probe()'.  The race could 
> >>>>>> result in
> >>>>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it 
> >>>>>> links
> >>>>>> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> >>>>>>
> >>>>>> I will do more test and share results.  Meanwhile, if you have any 
> >>>>>> opinion,
> >>>>>> please let me know.
> >>>
> >>> I reduced system memory and attached bunch of devices in short time so 
> >>> that
> >>> memory pressure occurs while device attachments are ongoing.  Under this
> >>> circumstance, I was able to see the race.
> >>>
> >>>>>
> >>>>> Not only '->blkif', but 'be' itself also could be a NULL.  As similar
> >>>>> concurrency issues could be in other drivers in their way, I suggest to 
> >>>>> change
> >>>>> the reclaim callback ('->reclaim_memory') to be called for each driver 
> >>>>> instead
> >>>>> of each device.  Then, each driver could be able to deal with its 
> >>>>> concurrency
> >>>>> issues by itself.
> >>>>
> >>>> Hmm, I don't like that. This would need to be changed back in case we
> >>>> add per-guest quota.
> >>>
> >>> Extending this callback in that way would be still not too hard.  We 
> >>> could use
> >>> the argument to the callback.  I would keep the argument of the callback 
> >>> to
> >>> 'struct device *' as is, and will add a comment saying 'NULL' value of the
> >>> argument means every devices.  As an example, xenbus would pass 
> >>> NULL-ending
> >>> array of the device pointers that need to free its resources.
> >>>
> >>> After seeing this race, I am now also thinking it could be better to 
> >>> delegate
> >>> detailed control of each device to its driver, as some drivers have some
> >>> complicated and unique relation with its devices.
> >>>

Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß"  wrote:

> On 16.12.19 20:48, SeongJae Park wrote:
> > On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
> > 
> >> On 16.12.19 17:15, SeongJae Park wrote:
> >>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  
> >>> wrote:
> >>>
> >>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  
> >>>> wrote:
> >>>>
> >>>>> From: SeongJae Park 
> >>>>>
> >>> [...]
> >>>>> --- a/drivers/block/xen-blkback/xenbus.c
> >>>>> +++ b/drivers/block/xen-blkback/xenbus.c
> >>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device 
> >>>>> *dev,
> >>>>>}
> >>>>>
> >>>>>
> >>>>> +/* Once a memory pressure is detected, squeeze free page pools for a 
> >>>>> while. */
> >>>>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>>>> +module_param_named(buffer_squeeze_duration_ms,
> >>>>> +   buffer_squeeze_duration_ms, int, 0644);
> >>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> >>>>> detected");
> >>>>> +
> >>>>> +/*
> >>>>> + * Callback received when the memory pressure is detected.
> >>>>> + */
> >>>>> +static void reclaim_memory(struct xenbus_device *dev)
> >>>>> +{
> >>>>> +   struct backend_info *be = dev_get_drvdata(&dev->dev);
> >>>>> +
> >>>>> +   be->blkif->buffer_squeeze_end = jiffies +
> >>>>> +   msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>>>
> >>>> This callback might race with 'xen_blkbk_probe()'.  The race could 
> >>>> result in
> >>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it 
> >>>> links
> >>>> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> >>>>
> >>>> I will do more test and share results.  Meanwhile, if you have any 
> >>>> opinion,
> >>>> please let me know.
> > 
> > I reduced system memory and attached bunch of devices in short time so that
> > memory pressure occurs while device attachments are ongoing.  Under this
> > circumstance, I was able to see the race.
> > 
> >>>
> >>> Not only '->blkif', but 'be' itself also could be a NULL.  As similar
> >>> concurrency issues could be in other drivers in their way, I suggest to 
> >>> change
> >>> the reclaim callback ('->reclaim_memory') to be called for each driver 
> >>> instead
> >>> of each device.  Then, each driver could be able to deal with its 
> >>> concurrency
> >>> issues by itself.
> >>
> >> Hmm, I don't like that. This would need to be changed back in case we
> >> add per-guest quota.
> > 
> > Extending this callback in that way would be still not too hard.  We could 
> > use
> > the argument to the callback.  I would keep the argument of the callback to
> > 'struct device *' as is, and will add a comment saying 'NULL' value of the
> > argument means every devices.  As an example, xenbus would pass NULL-ending
> > array of the device pointers that need to free its resources.
> > 
> > After seeing this race, I am now also thinking it could be better to 
> > delegate
> > detailed control of each device to its driver, as some drivers have some
> > complicated and unique relation with its devices.
> > 
> >>
> >> Wouldn't a get_device() before calling the callback and a put_device()
> >> afterwards avoid that problem?
> > 
> > I didn't used the reference count manipulation operations because other 
> > similar
> > parts also didn't.  But, if there is no implicit reference count guarantee, 
> > it
> > seems those operations are indeed necessary.
> > 
> > That said, as get/put operations only adjust the reference count, those will
> > not make the callback to wait until the linking of the 'backend' and 
> > 'blkif' to
> > the device (xen_blkbk_probe()) is finished.  Thus, the race could still 
> > happen.
> > Or, am I missing something?
> 
> No, I think we need a xenbus lock per device which will need to be
> taken in xen_blkbk_probe(), xenbus_dev_remove() and while calling the
> callback.

I also agree that locking should be used in the end.  But, as each driver
manages its devices and resources in its own way, it could have its own unique
race conditions.  And, each unique race condition might have its own efficient
way to synchronize it.  Therefore, I think the synchronization should be done
by each driver, not by xenbus, and thus we should make the callback be called
per-driver.


Thanks,
SeongJae Park

> 
> 
> Juergen
> 


Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:

> On 16.12.19 17:15, SeongJae Park wrote:
> > On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  wrote:
> > 
> >> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  wrote:
> >>
> >>> From: SeongJae Park 
> >>>
> > [...]
> >>> --- a/drivers/block/xen-blkback/xenbus.c
> >>> +++ b/drivers/block/xen-blkback/xenbus.c
> >>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device 
> >>> *dev,
> >>>   }
> >>>   
> >>>   
> >>> +/* Once a memory pressure is detected, squeeze free page pools for a 
> >>> while. */
> >>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>> +module_param_named(buffer_squeeze_duration_ms,
> >>> + buffer_squeeze_duration_ms, int, 0644);
> >>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> >>> detected");
> >>> +
> >>> +/*
> >>> + * Callback received when the memory pressure is detected.
> >>> + */
> >>> +static void reclaim_memory(struct xenbus_device *dev)
> >>> +{
> >>> + struct backend_info *be = dev_get_drvdata(&dev->dev);
> >>> +
> >>> + be->blkif->buffer_squeeze_end = jiffies +
> >>> + msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>
> >> This callback might race with 'xen_blkbk_probe()'.  The race could result 
> >> in
> >> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it 
> >> links
> >> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> >>
> >> I will do more test and share results.  Meanwhile, if you have any opinion,
> >> please let me know.

I reduced the system memory and attached a bunch of devices in a short time so
that memory pressure occurs while the device attachments are ongoing.  Under
these circumstances, I was able to see the race.

> > 
> > Not only '->blkif', but 'be' itself also could be a NULL.  As similar
> > concurrency issues could be in other drivers in their way, I suggest to 
> > change
> > the reclaim callback ('->reclaim_memory') to be called for each driver 
> > instead
> > of each device.  Then, each driver could be able to deal with its 
> > concurrency
> > issues by itself.
> 
> Hmm, I don't like that. This would need to be changed back in case we
> add per-guest quota.

Extending this callback in that way would still not be too hard.  We could use
the argument to the callback.  I would keep the argument of the callback as
'struct device *', and would add a comment saying that a 'NULL' value of the
argument means every device.  As an example, xenbus would pass a
NULL-terminated array of pointers to the devices that need to free their
resources.

After seeing this race, I am now also thinking it could be better to delegate
detailed control of each device to its driver, as some drivers have
complicated and unique relations with their devices.

> 
> Wouldn't a get_device() before calling the callback and a put_device()
> afterwards avoid that problem?

I didn't use the reference count manipulation operations because other similar
parts also didn't.  But, if there is no implicit reference count guarantee, it
seems those operations are indeed necessary.

That said, as the get/put operations only adjust the reference count, they will
not make the callback wait until the linking of the 'backend' and 'blkif' to
the device (xen_blkbk_probe()) is finished.  Thus, the race could still happen.
Or, am I missing something?

I also modified the code to do 'get_device()' and 'put_device()' as you
suggested and tested it, but the race was still reproducible.
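
The argument above can be illustrated with a small single-threaded model.
It is only a sketch with hypothetical names (`model_xdev`, `model_get`,
`probe_step_link`, and so on), and it replays the probe steps one by one
instead of running real concurrent threads: taking a reference keeps the
object alive, but it does not change what the callback observes between
the two probe steps.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_blkif { unsigned long buffer_squeeze_end; };
struct model_backend { struct model_blkif *blkif; };
struct model_xdev {
    int refcount;             /* plain int: single-threaded model only */
    struct model_backend *be; /* driver data, NULL until probe links it */
};

void model_get(struct model_xdev *d) { d->refcount++; }
void model_put(struct model_xdev *d) { d->refcount--; }

/* Probe step 1: link 'be' to the device while 'blkif' is still unset. */
void probe_step_link(struct model_xdev *d, struct model_backend *be)
{
    be->blkif = NULL;
    d->be = be;
}

/* Probe step 2: the block interface becomes available later. */
void probe_step_alloc(struct model_xdev *d, struct model_blkif *blkif)
{
    d->be->blkif = blkif;
}

/*
 * What an unguarded callback would hit if it ran right now.  The
 * reference taken here pins the device, but the result is the same
 * with or without it.
 */
bool reclaim_would_crash(struct model_xdev *d)
{
    bool crash;

    model_get(d);
    crash = (d->be == NULL) || (d->be->blkif == NULL);
    model_put(d);
    return crash;
}
```

In this model the callback is unsafe both before step 1 and between the
two steps, regardless of the reference count, which matches the test
result reported above.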


Thanks,
SeongJae Park

> 
> 
> Juergen


Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  wrote:

> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  wrote:
> 
> > From: SeongJae Park 
> > 
[...]
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
> >  }
> >  
> >  
> > +/* Once a memory pressure is detected, squeeze free page pools for a while. */
> > +static unsigned int buffer_squeeze_duration_ms = 10;
> > +module_param_named(buffer_squeeze_duration_ms,
> > +   buffer_squeeze_duration_ms, int, 0644);
> > +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > +"Duration in ms to squeeze pages buffer when a memory pressure is detected");
> > +
> > +/*
> > + * Callback received when the memory pressure is detected.
> > + */
> > +static void reclaim_memory(struct xenbus_device *dev)
> > +{
> > +   struct backend_info *be = dev_get_drvdata(&dev->dev);
> > +
> > +   be->blkif->buffer_squeeze_end = jiffies +
> > +   msecs_to_jiffies(buffer_squeeze_duration_ms);
> 
> This callback might race with 'xen_blkbk_probe()'.  The race could result in
> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it links
> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> 
> I will do more tests and share the results.  Meanwhile, if you have any
> opinion, please let me know.

Not only '->blkif' but 'be' itself could also be NULL.  As other drivers could
have similar concurrency issues of their own, I suggest changing the reclaim
callback ('->reclaim_memory') to be called once per driver instead of once per
device.  Then each driver can deal with its concurrency issues by itself.

For blkback, we could reuse the global-variable-based approach, similar to
v7[1] of this patchset.  As the callback is now called once per driver instead
of once per device, the timeout will not be set redundantly.
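
A minimal userspace sketch of that suggestion, under stated assumptions: the
names (`model_driver`, `model_driver_reclaim`, `model_notify_drivers`,
`model_squeeze_end`) are hypothetical, and the callback signature is invented
for the model rather than taken from any version of the patchset.  It shows a
callback invoked once per driver that records a single, driver-wide squeeze
deadline and never touches per-device state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one deadline shared by every device of the driver. */
static unsigned long model_squeeze_end;

/*
 * Per-driver callback: takes no device argument, so it cannot touch
 * per-device state that a concurrent probe may still be setting up.
 */
static void model_driver_reclaim(unsigned long now, unsigned long duration)
{
	model_squeeze_end = now + duration;
}

struct model_driver {
	/* Optional; NULL when the driver has nothing to release. */
	void (*reclaim_memory)(unsigned long now, unsigned long duration);
};

/* Calls the optional callback once per driver; returns the number called. */
static int model_notify_drivers(const struct model_driver *drvs, size_t n,
				unsigned long now, unsigned long duration)
{
	int called = 0;
	size_t i;

	assert(drvs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (drvs[i].reclaim_memory) {
			drvs[i].reclaim_memory(now, duration);
			called++;
		}
	}
	return called;
}
```

Because the deadline is written once per driver, attaching more devices does
not multiply the number of writes.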


Thanks,
SeongJae Park

[1] https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/

> 
> 
> Thanks,
> SeongJae Park
> 
> > +}
> > +
> >  /* ** Connection ** */
> >  
> >  
> > @@ -1115,7 +1133,8 @@ static struct xenbus_driver xen_blkbk_driver = {
> > .ids  = xen_blkbk_ids,
> > .probe = xen_blkbk_probe,
> > .remove = xen_blkbk_remove,
> > -   .otherend_changed = frontend_changed
> > +   .otherend_changed = frontend_changed,
> > +   .reclaim_memory = reclaim_memory,
> >  };
> >  
> >  int xen_blkif_xenbus_init(void)
> > -- 
> > 2.17.1
> > 
> 


Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  wrote:

> From: SeongJae Park 
> 
> Each `blkif` has a free pages pool for the grant mapping.  The size of
> the pool starts from zero and is increased on demand while processing
> the I/O requests.  If current I/O requests handling is finished or 100
> milliseconds has passed since last I/O requests handling, it checks and
> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> 
> Therefore, host administrators can cause memory pressure in blkback by
> attaching a large number of block devices and inducing I/O.  Such
> problematic situations can be avoided by limiting the maximum number of
> devices that can be attached, but finding the optimal limit is not so
> easy.  Improper set of the limit can results in memory pressure or a
> resource underutilization.  This commit avoids such problematic
> situations by squeezing the pools (returns every free page in the pool
> to the system) for a while (users can set this duration via a module
> parameter) if memory pressure is detected.
> 
> Discussions
> ===========
> 
> The `blkback`'s original shrinking mechanism returns only pages in the
> pool which are not currently be used by `blkback` to the system.  In
> other words, the pages that are not mapped with granted pages.  Because
> this commit is changing only the shrink limit but still uses the same
> freeing mechanism it does not touch pages which are currently mapping
> grants.
> 
> Once memory pressure is detected, this commit keeps the squeezing limit
> for a user-specified time duration.  The duration should be neither too
> long nor too short.  If it is too long, the squeezing incurring overhead
> can reduce the I/O performance.  If it is too short, `blkback` will not
> free enough pages to reduce the memory pressure.  This commit sets the
> value as `10 milliseconds` by default because it is a short time in
> terms of I/O while it is a long time in terms of memory operations.
> Also, as the original shrinking mechanism works for at least every 100
> milliseconds, this could be a somewhat reasonable choice.  I also tested
> other durations (refer to the below section for more details) and
> confirmed that 10 milliseconds is the one that works best with the test.
> That said, the proper duration depends on actual configurations and
> workloads.  That's why this commit allows users to set the duration as a
> module parameter.
> 
> Memory Pressure Test
> ====================
> 
> To show how this commit fixes the memory pressure situation well, I
> configured a test environment on a xen-running virtualization system.
> On the `blkfront` running guest instances, I attach a large number of
> network-backed volume devices and induce I/O to those.  Meanwhile, I
> measure the number of pages that swapped in (pswpin) and out (pswpout)
> on the `blkback` running guest.  The test ran twice, once for the
> `blkback` before this commit and once for that after this commit.  As
> shown below, this commit has dramatically reduced the memory pressure:
> 
>         pswpin  pswpout
> before  76,672  185,799
> after      212    3,325
> 
> Optimal Aggressive Shrinking Duration
> -------------------------------------
> 
> To find a best squeezing duration, I repeated the test with three
> different durations (1ms, 10ms, and 100ms).  The results are as below:
> 
> duration(ms)  pswpin  pswpout
> 1             852     6,424
> 10            212     3,325
> 100           203     3,340
> 
> As expected, the memory pressure has decreased as the duration is
> increased, but the reduction stopped from the `10ms`.  Based on this
> results, I chose the default duration as 10ms.
> 
> Performance Overhead Test
> =========================
> 
> This commit could incur I/O performance degradation under severe memory
> pressure because the squeezing will require more page allocations per
> I/O.  To show the overhead, I artificially made a worst-case squeezing
> situation and measured the I/O performance of a `blkfront` running
> guest.
> 
> For the artificial squeezing, I set the `blkback.max_buffer_pages` using
> the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
> test, I set the value to `1024` and `0`.  The `1024` is the default
> value.  Setting the value as `0` is same to a situation doing the
> squeezing always (worst-case).
> 
> If the underlying block device is slow enough, the squeezing overhead
> could be hidden.  For the reason, I use a fast block device, namely the
> rbd[1]:
> 
> # xl block-attach guest phy:/dev/ram0 xvdb w
> 
> For the I/O performance measurement, I run a simple `dd` command 5 times
> directly to

[Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free page pool for grant mapping.  The pool starts
empty and grows on demand while I/O requests are processed.  When
handling of the current I/O requests finishes, or when 100 milliseconds
have passed since the last I/O request was handled, `blkback` checks the
pool and shrinks it so that it does not exceed the size limit,
`max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
situations can be avoided by limiting the maximum number of devices
that can be attached, but finding the optimal limit is not easy.  A
poorly chosen limit results in either memory pressure or resource
underutilization.  This commit avoids such situations by squeezing the
pools (returning every free page in the pool to the system) for a while
(users can set this duration via a module parameter) when memory
pressure is detected.

Discussions
===========

The original shrinking mechanism of `blkback` returns to the system only
those pages in the pool that `blkback` is not currently using, in other
words, the pages that are not mapped to granted pages.  Because this
commit changes only the shrink limit and keeps the same freeing
mechanism, it does not touch pages that are currently mapping grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified duration.  The duration should be neither too long
nor too short.  If it is too long, the overhead incurred by squeezing
can reduce I/O performance.  If it is too short, `blkback` will not
free enough pages to relieve the memory pressure.  This commit sets the
default to `10 milliseconds` because that is a short time in terms of
I/O but a long time in terms of memory operations.  Also, as the
original shrinking mechanism runs at least every 100 milliseconds, this
is a reasonable choice.  I also tested other durations (see the
sections below for details) and confirmed that 10 milliseconds works
best in that test.  That said, the proper duration depends on the
actual configuration and workload, which is why this commit lets users
set the duration as a module parameter.
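
The squeezing rule described above can be sketched in plain C as follows.
This is a userspace model under stated assumptions, not the driver code: the
helper names (`tick_before`, `pool_limit`, `shrink_pool`) are hypothetical, and
`tick_before` only imitates the wrap-safe comparison that a free-running tick
counter needs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Wrap-safe "a is before b" for a free-running tick counter.  A userspace
 * stand-in written for this sketch, not the kernel's own macro.
 */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Pool size limit to apply at time 'now': 0 while squeezing is in effect. */
static int pool_limit(unsigned long now, unsigned long squeeze_end,
		      int max_buffer_pages)
{
	assert(max_buffer_pages >= 0);
	return tick_before(now, squeeze_end) ? 0 : max_buffer_pages;
}

/* Number of free pages left in the pool after shrinking to 'limit'. */
static int shrink_pool(int free_pages, int limit)
{
	return free_pages > limit ? limit : free_pages;
}
```

While `now` is before the squeeze deadline, the limit is zero and every free
page is returned; afterwards the usual `max_buffer_pages` limit applies again.
Pages that are in use are not part of `free_pages` and are never touched.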

Memory Pressure Test
====================

To show how well this commit relieves memory pressure, I configured a
test environment on a virtualization system running Xen.  On the guest
instances running `blkfront`, I attach a large number of network-backed
volume devices and induce I/O to them.  Meanwhile, I measure the number
of pages swapped in (pswpin) and out (pswpout) on the guest running
`blkback`.  The test ran twice, once with `blkback` before this commit
and once with `blkback` after it.  As shown below, this commit
dramatically reduced the memory pressure:

        pswpin  pswpout
before  76,672  185,799
after      212    3,325
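
To put the table above in relative terms, a small helper (the name
`reduction_percent` is hypothetical) computes the percentage reduction from
the reported counts.  With the numbers above it gives roughly 99.7% fewer
pages swapped in and roughly 98.2% fewer pages swapped out.

```c
#include <assert.h>

/* Percentage by which 'after' is lower than 'before'. */
static double reduction_percent(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}
```

For example, `reduction_percent(76672, 212)` covers pswpin and
`reduction_percent(185799, 3325)` covers pswpout.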

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

duration(ms)  pswpin  pswpout
1             852     6,424
10            212     3,325
100           203     3,340

As expected, the memory pressure decreased as the duration increased,
but the reduction levelled off from `10ms`.  Based on these results, I
chose 10ms as the default duration.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and to `0`.  `1024` is the default
value.  Setting the value to `0` is equivalent to squeezing all the
time (the worst case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For that reason, I use a fast block device, namely the
rbd[1]:

# xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

$ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
 bs=4k count=$((256*512)); sync; done

The results are as below.  'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

max_pgs  Min  Max  Median  Avg    Stddev
0        417  423  420     419.4  2.5099801
1024     414  425  416     417.8  4.4384682
No difference proven at 95.0% confidence
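
As a rough cross-check of the "No difference proven" line, the summary
statistics above can be fed into a two-sample Student's t-test with pooled
variance.  This is an assumption: the message does not say which tool or test
produced the table.  The helper name `pooled_t_squared` is hypothetical, and it
returns t squared so that no square root is needed.  With 5 runs per setting
there are 8 degrees of freedom, for which the two-sided 95% critical value
is 2.306.

```c
#include <assert.h>

/* Square of Student's t statistic for two samples, using pooled variance. */
static double pooled_t_squared(double avg1, double sd1, int n1,
			       double avg2, double sd2, int n2)
{
	double pooled_var;
	double diff = avg1 - avg2;

	assert(n1 > 1 && n2 > 1);
	pooled_var = ((n1 - 1) * sd1 * sd1 + (n2 - 1) * sd2 * sd2)
		     / (n1 + n2 - 2);
	return diff * diff / (pooled_var * (1.0 / n1 + 1.0 / n2));
}
```

For the two rows above, t squared is about 0.49 (t is about 0.70), well below
2.306 squared, which is consistent with no significant difference at this
sample size.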

In short, even worst case squeezing

[Xen-devel] [PATCH v10 4/4] xen/blkback: Consistently insert one empty line between functions

2019-12-16 Thread SeongJae Park
From: SeongJae Park 

The number of empty lines between functions in the xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place only one empty line.

Acked-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 4f6ea4feca79..dc0ea123c74c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
device_remove_file(&dev->dev, &dev_attr_physical_device);
 }
 
-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
handle, blkif->domid);
return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
@@ -572,6 +572,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info
if (err)
dev_warn(&dev->dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
  struct backend_info *be, int state)
 {
@@ -656,7 +657,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -748,7 +748,6 @@ static void backend_changed(struct xenbus_watch *watch,
}
 }
 
-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -823,7 +822,6 @@ static void frontend_changed(struct xenbus_device *dev,
}
 }
 
-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -844,7 +842,6 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
-- 
2.17.1



[Xen-devel] [PATCH v10 3/4] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-16 Thread SeongJae Park
From: SeongJae Park 

A few static variables in blkback have the 'xen_blkif_' prefix, though it
is unnecessary for static variables.  This commit removes such prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,

[Xen-devel] [PATCH v10 1/4] xenbus/backend: Add memory pressure handler callback

2019-12-16 Thread SeongJae Park
From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved to handle pressure in
more sophisticated ways.  For example, it would be possible to monitor
the memory consumption of each device and issue the release requests
only to the devices that are causing the pressure.  Also, the callback
could be extended to handle not only memory but resources in general.
Nevertheless, this version of the implementation defers such
sophisticated goals to future work.
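
The dispatch described above can be sketched in userspace C.  This is a
model only: the `sketch_*` names are hypothetical stand-ins for the bus,
driver, and device objects, and a plain array takes the place of the bus
iteration.  It shows the two conditions under which a device is skipped: it
has no driver bound, or its driver does not provide the optional callback.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_device;

struct sketch_driver {
	/* Optional; NULL when the driver has nothing to release. */
	void (*reclaim_memory)(struct sketch_device *dev);
};

struct sketch_device {
	const struct sketch_driver *driver;  /* NULL while unbound */
	int reclaim_calls;                   /* bookkeeping for the model */
};

static void sketch_reclaim(struct sketch_device *dev)
{
	dev->reclaim_calls++;
}

/* Ask each bound device's driver to release memory; returns callbacks made. */
static int sketch_reclaim_all(struct sketch_device *devs, size_t n)
{
	int called = 0;
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct sketch_driver *drv = devs[i].driver;

		if (drv && drv->reclaim_memory) {
			drv->reclaim_memory(&devs[i]);
			called++;
		}
	}
	return called;
}
```

Because the callback is optional, drivers that do not implement it are
unaffected by the new hook.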

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1



[Xen-devel] [PATCH v10 0/4] xenbus/backend: Add a memory pressure handler callback

2019-12-16 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in
'xen-blkback' (patch 2).  The third and fourth patches are trivial
cleanups.

Base Version
------------

This patchset is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v10


Patch History
-------------

Changes from v9
(https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
 - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
 - Update the commit message for the overhead test of the 2nd patch

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs
   (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change
   (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as the other
   callbacks have one
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmithing of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau
   Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen
   Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (4):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Consistently insert one empty line between functions

 .../ABI/testing/sysfs-driver-xen-blkback  | 10 +
 drivers/block/xen-blkback/blkback.c   | 42 +--
 drivers/block/xen-blkback/common.h|  1 +
 drivers/block/xen-blkback/xenbus.c| 26 +---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 ++
 include/xen/xenbus.h  |  1 +
 6 files changed, 86 insertions(+), 26 deletions(-)

-- 
2.17.1



Re: [Xen-devel] [PATCH v9 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
On Mon, 16 Dec 2019 10:37:55 +0100 "Roger Pau Monné" wrote:

> On Fri, Dec 13, 2019 at 03:35:44PM +0000, SeongJae Park wrote:
> > Each `blkif` has a free pages pool for the grant mapping.  The size of
> > the pool starts from zero and is increased on demand while processing
> > the I/O requests.  If current I/O requests handling is finished or 100
> > milliseconds has passed since last I/O requests handling, it checks and
> > shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> > 
> > Therefore, host administrators can cause memory pressure in blkback by
> > attaching a large number of block devices and inducing I/O.  Such
> > problematic situations can be avoided by limiting the maximum number of
> > devices that can be attached, but finding the optimal limit is not so
> > easy.  Improper set of the limit can results in memory pressure or a
> > resource underutilization.  This commit avoids such problematic
> > situations by squeezing the pools (returns every free page in the pool
> > to the system) for a while (users can set this duration via a module
> > parameter) if memory pressure is detected.
> > 
> > Discussions
> > ===========
> > 
> > The `blkback`'s original shrinking mechanism returns only pages in the
> > pool which are not currently be used by `blkback` to the system.  In
> > other words, the pages that are not mapped with granted pages.  Because
> > this commit is changing only the shrink limit but still uses the same
> > freeing mechanism it does not touch pages which are currently mapping
> > grants.
> > 
> > Once memory pressure is detected, this commit keeps the squeezing limit
> > for a user-specified time duration.  The duration should be neither too
> > long nor too short.  If it is too long, the squeezing incurring overhead
> > can reduce the I/O performance.  If it is too short, `blkback` will not
> > free enough pages to reduce the memory pressure.  This commit sets the
> > value as `10 milliseconds` by default because it is a short time in
> > terms of I/O while it is a long time in terms of memory operations.
> > Also, as the original shrinking mechanism works for at least every 100
> > milliseconds, this could be a somewhat reasonable choice.  I also tested
> > other durations (refer to the below section for more details) and
> > confirmed that 10 milliseconds is the one that works best with the test.
> > That said, the proper duration depends on actual configurations and
> > workloads.  That's why this commit allows users to set the duration as a
> > module parameter.
> > 
> > Memory Pressure Test
> > ====================
> > 
> > To show how this commit fixes the memory pressure situation well, I
> > configured a test environment on a xen-running virtualization system.
> > On the `blkfront` running guest instances, I attach a large number of
> > network-backed volume devices and induce I/O to those.  Meanwhile, I
> > measure the number of pages that swapped in (pswpin) and out (pswpout)
> > on the `blkback` running guest.  The test ran twice, once for the
> > `blkback` before this commit and once for that after this commit.  As
> > shown below, this commit has dramatically reduced the memory pressure:
> > 
> >         pswpin  pswpout
> > before  76,672  185,799
> > after      212    3,325
> > 
> > Optimal Aggressive Shrinking Duration
> > -------------------------------------
> > 
> > To find a best squeezing duration, I repeated the test with three
> > different durations (1ms, 10ms, and 100ms).  The results are as below:
> > 
> > duration(ms)  pswpin  pswpout
> > 1             852     6,424
> > 10            212     3,325
> > 100           203     3,340
> > 
> > As expected, the memory pressure has decreased as the duration is
> > increased, but the reduction stopped from the `10ms`.  Based on this
> > results, I chose the default duration as 10ms.
> > 
> > Performance Overhead Test
> > =========================
> > 
> > This commit could incur I/O performance degradation under severe memory
> > pressure because the squeezing will require more page allocations per
> > I/O.  To show the overhead, I artificially made a worst-case squeezing
> > situation and measured the I/O performance of a `blkfront` running
> > guest.
> > 
> > For the artificial squeezing, I set the `blkback.max_buffer_pages` using
> > the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
> > test, I set the value to `1024` and

[Xen-devel] [PATCH v9 1/4] xenbus/backend: Add memory pressure handler callback

2019-12-13 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved to handle pressure in
more sophisticated ways.  For example, it would be possible to monitor
the memory consumption of each device and issue the release requests
only to the devices that are causing the pressure.  Also, the callback
could be extended to handle not only memory but resources in general.
Nevertheless, this version of the implementation defers such
sophisticated goals to future work.

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1
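
The dispatch added by the patch above can be tried out in userspace.  The
following is only a minimal sketch under stated assumptions: every
'demo_*' name is a hypothetical stand-in (for 'xenbus_driver',
'xenbus_device' and the bus_for_each_dev() walk), not a kernel API.  It
shows the two properties the patch relies on: the callback is optional per
driver, and one device without a driver or callback never stops the walk.

```c
#include <assert.h>
#include <stddef.h>

struct demo_device;

/* Hypothetical stand-in for struct xenbus_driver. */
struct demo_driver {
	const char *name;
	/* Optional: drivers that cache nothing leave this NULL. */
	void (*reclaim_memory)(struct demo_device *dev);
};

/* Hypothetical stand-in for struct xenbus_device. */
struct demo_device {
	const struct demo_driver *driver;	/* NULL while unbound */
	unsigned int cached_pages;
};

/* Example callback: drop everything the device caches. */
static void demo_drop_cache(struct demo_device *dev)
{
	dev->cached_pages = 0;
}

/*
 * Same shape as backend_reclaim_memory() in the patch: skip unbound
 * devices and drivers without the callback, and always return 0 so the
 * walk continues to the next device.
 */
static int demo_reclaim_one(struct demo_device *dev)
{
	if (!dev->driver)
		return 0;
	if (dev->driver->reclaim_memory)
		dev->driver->reclaim_memory(dev);
	return 0;
}

/* Stand-in for the bus walk; returns the number of devices visited. */
static size_t demo_reclaim_all(struct demo_device *devs, size_t n)
{
	size_t i;

	assert(devs || n == 0);
	for (i = 0; i < n; i++)
		demo_reclaim_one(&devs[i]);
	return n;
}
```

Calling demo_reclaim_all() on an array that mixes a caching driver, a
driver without the callback and an unbound device only changes the first
one.  Keeping the callback 'void' matches the patch: the caller has no use
for a per-device result because the shrinker is used only as a pressure
signal.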


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v9 4/4] xen/blkback: Consistently insert one empty line between functions

2019-12-13 Thread SeongJae Park
The number of empty lines between functions in xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place exactly one empty line between functions.

Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 4f6ea4feca79..dc0ea123c74c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
device_remove_file(&dev->dev, &dev_attr_physical_device);
 }
 
-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
handle, blkif->domid);
return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
@@ -572,6 +572,7 @@ static void xen_blkbk_discard(struct xenbus_transaction 
xbt, struct backend_info
if (err)
dev_warn(&dev->dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
  struct backend_info *be, int state)
 {
@@ -656,7 +657,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -748,7 +748,6 @@ static void backend_changed(struct xenbus_watch *watch,
}
 }
 
-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -823,7 +822,6 @@ static void frontend_changed(struct xenbus_device *dev,
}
 }
 
-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -844,7 +842,6 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
-- 
2.17.1



[Xen-devel] [PATCH v9 3/4] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-13 Thread SeongJae Park
A few static variables in blkback have the 'xen_blkif_' prefix, though it
is unnecessary for static variables.  This commit removes such prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
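
The purge sizing touched by the purge_persistent_gnt() hunk above is easy
to get wrong because of the integer division, so here is a small userspace
sketch of the arithmetic.  'demo_num_clean' and 'DEMO_LRU_PERCENT_CLEAN'
are hypothetical names, and the value 5 is an assumption for illustration
(the hunk uses LRU_PERCENT_CLEAN without showing its definition).

```c
#include <assert.h>

/* Assumed value; the patch hunk does not show the real definition. */
#define DEMO_LRU_PERCENT_CLEAN 5

/* Number of persistent grants the purge pass would try to clean. */
static unsigned int demo_num_clean(unsigned int persistent_gnt_c,
				   unsigned int max_pgrants,
				   int overflow_max_grants)
{
	unsigned int num_clean;

	/* Below the limit, or exactly at it without overflow: no purge. */
	if (persistent_gnt_c < max_pgrants ||
	    (persistent_gnt_c == max_pgrants && !overflow_max_grants))
		return 0;

	/* Division happens first: 1056 / 100 == 10, so 10 * 5 == 50. */
	num_clean = (max_pgrants / 100) * DEMO_LRU_PERCENT_CLEAN;
	num_clean = persistent_gnt_c - max_pgrants + num_clean;
	return num_clean < persistent_gnt_c ? num_clean : persistent_gnt_c;
}

/* Worked values for the default max_persistent_grants of 1056. */
static void demo_check_num_clean(void)
{
	assert(demo_num_clean(1000, 1056, 0) == 0);
	assert(demo_num_clean(1056, 1056, 0) == 0);
	assert(demo_num_clean(1056, 1056, 1) == 50);
	assert(demo_num_clean(1100, 1056, 1) == 94);
}
```

With the assumed 5 percent, the slack is 50 grants rather than 52 (5% of
1056 rounded down), because the division by 100 truncates before the
multiplication.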
  

[Xen-devel] [PATCH v9 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-13 Thread SeongJae Park
   40.12  3.1752165
No difference proven at 95.0% confidence

On the fast block device


max_pgs   Min   Max   Median  Avg    Stddev
0         417   423   420     419.4  2.5099801
1024      414   425   416     417.8  4.4384682
No difference proven at 95.0% confidence

In short, even worst-case squeezing on a ramdisk-based fast block device
causes no visible performance degradation.  Please note that this is just
a very simple and minimal test.  On systems using super-fast block
devices and a special I/O workload, the results might be different.  If
you have any doubt, test on your machine with your workload to find the
optimal squeezing duration for you.

[1] https://aws.amazon.com/ebs/
[2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html

Signed-off-by: SeongJae Park 
---
 .../ABI/testing/sysfs-driver-xen-blkback  | 10 +
 drivers/block/xen-blkback/blkback.c   |  7 +--
 drivers/block/xen-blkback/common.h|  1 +
 drivers/block/xen-blkback/xenbus.c| 21 ++-
 4 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 4e7babb3ba1f..f01224231f3f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -25,3 +25,13 @@ Description:
 allocated without being in use. The time is in
 seconds, 0 means indefinitely long.
 The default is 60 seconds.
+
+What:   /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
+Date:   December 2019
+KernelVersion:  5.5
+Contact:SeongJae Park 
+Description:
+When memory pressure is reported to blkback this option
+controls the duration in milliseconds that blkback will not
+cache any page not backed by a grant mapping.
+The default is 10ms.
diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..79f677aeb5cc 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -656,8 +656,11 @@ int xen_blkif_schedule(void *arg)
ring->next_lru = jiffies + 
msecs_to_jiffies(LRU_INTERVAL);
}
 
-   /* Shrink if we have more than xen_blkif_max_buffer_pages */
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   /* Shrink the free pages pool if it is too large. */
+   if (time_before(jiffies, blkif->buffer_squeeze_end))
+   shrink_free_pagepool(ring, 0);
+   else
+   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..536c84f61fed 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -319,6 +319,7 @@ struct xen_blkif {
/* All rings for this device. */
struct xen_blkif_ring   *rings;
unsigned intnr_rings;
+   unsigned long   buffer_squeeze_end;
 };
 
 struct seg_buf {
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b90dbcd99c03..4f6ea4feca79 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
 }
 
 
+/* Once a memory pressure is detected, squeeze free page pools for a while. */
+static unsigned int buffer_squeeze_duration_ms = 10;
+module_param_named(buffer_squeeze_duration_ms,
+   buffer_squeeze_duration_ms, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration_ms,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+/*
+ * Callback received when the memory pressure is detected.
+ */
+static void reclaim_memory(struct xenbus_device *dev)
+{
+   struct backend_info *be = dev_get_drvdata(&dev->dev);
+
+   be->blkif->buffer_squeeze_end = jiffies +
+   msecs_to_jiffies(buffer_squeeze_duration_ms);
+}
+
 /* ** Connection ** */
 
 
@@ -1115,7 +1133,8 @@ static struct xenbus_driver xen_blkbk_driver = {
.ids  = xen_blkbk_ids,
.probe = xen_blkbk_probe,
.remove = xen_blkbk_remove,
-   .otherend_changed = frontend_changed
+   .otherend_changed = frontend_changed,
+   .reclaim_memory = reclaim_memory,
 };
 
 int xen_blkif_xenbus_init(void)
-- 
2.17.1
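
The squeeze window introduced by the patch above can be modelled in
userspace as follows.  This is a sketch under assumptions, not the driver
code: the 'demo_*' names are hypothetical, time is a plain tick counter
instead of jiffies, and demo_time_before() only imitates the wrap-safe
comparison idea of the kernel's time_before().

```c
#include <assert.h>

/* Wrap-safe "a is before b" on a free-running tick counter. */
static int demo_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the per-device state in struct xen_blkif. */
struct demo_blkif {
	unsigned long buffer_squeeze_end;	/* in ticks */
};

/* What the reclaim_memory() callback does: open a window starting now. */
static void demo_start_squeeze(struct demo_blkif *blkif, unsigned long now,
			       unsigned long duration_ticks)
{
	blkif->buffer_squeeze_end = now + duration_ticks;
}

/* Target size of the free page pool at tick 'now'. */
static unsigned int demo_pool_target(const struct demo_blkif *blkif,
				     unsigned long now,
				     unsigned int max_buffer_pages)
{
	if (demo_time_before(now, blkif->buffer_squeeze_end))
		return 0;		/* squeezing: keep no free pages */
	return max_buffer_pages;	/* normal operation */
}

/* The window keeps working when the tick counter wraps around. */
static void demo_check_squeeze(void)
{
	struct demo_blkif blkif = { 0 };
	unsigned long now = (unsigned long)-5;	/* 5 ticks before the wrap */

	demo_start_squeeze(&blkif, now, 10);
	assert(blkif.buffer_squeeze_end == 5);
	assert(demo_pool_target(&blkif, now, 1024) == 0);
	assert(demo_pool_target(&blkif, now + 9, 1024) == 0);
	assert(demo_pool_target(&blkif, now + 10, 1024) == 1024);
}
```

Storing only the end of the window keeps the callback cheap: it writes one
word per device, and the existing scheduling loop picks the pool target on
its next pass without any extra locking or timers.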



[Xen-devel] [PATCH v9 0/4] xenbus/backend: Add a memory pressure handler callback

2019-12-13 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in
'xen-blkback' (patch 2).  The third and fourth patches are trivial
cleanups of variable names and of empty lines in xenbus.c, respectively.

Base Version
------------

This patch is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v9


Patch History
-------------

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park 
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs
   (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change
   (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as the other
   callbacks have
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (4):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Consistently insert one empty line between functions

 .../ABI/testing/sysfs-driver-xen-blkback  | 10 +
 drivers/block/xen-blkback/blkback.c   | 42 +--
 drivers/block/xen-blkback/common.h|  1 +
 drivers/block/xen-blkback/xenbus.c| 26 +---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 ++
 include/xen/xenbus.h  |  1 +
 6 files changed, 86 insertions(+), 26 deletions(-)

-- 
2.17.1



Re: [Xen-devel] [PATCH v8 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-13 Thread SeongJae Park
s `0` is same to a situation doing the
> > squeezing always (worst-case).
> > 
> > For the I/O performance measurement, I run a simple `dd` command 5 times
> > as below and collect the 'MB/s' results.
> > 
> > $ for i in {1..5}; do dd if=/dev/zero of=file \
> >  bs=4k count=$((256*512)); sync; done
> > 
> > If the underlying block device is slow enough, the squeezing overhead
> > could be hidden.  For the reason, I do this test for both a slow block
> > device and a fast block device.  I use a popular cloud block storage
> > service, ebs[1] as a slow device and the ramdisk block device[2] for the
> > fast device.
> > 
> > The results are as below.  'max_pgs' represents the value of the
> > `blkback.max_buffer_pages` parameter.
> > 
> > On the slow block device
> > 
> > 
> > max_pgs   Min   Max   Median  Avg    Stddev
> > 0         38.7  45.8  38.7    40.12  3.1752165
> > 1024      38.7  45.8  38.7    40.12  3.1752165
> > No difference proven at 95.0% confidence
> > 
> > On the fast block device
> > 
> > 
> > max_pgs   Min   Max   Median  Avg    Stddev
> > 0         417   423   420     419.4  2.5099801
> > 1024      414   425   416     417.8  4.4384682
> > No difference proven at 95.0% confidence
> > 
> > In short, even worst case squeezing on ramdisk based fast block device
> > makes no visible performance degradation.  Please note that this is just
> > a very simple and minimal test.  On systems using super-fast block
> > devices and a special I/O workload, the results might be different.  If
> > you have any doubt, test on your machine with your workload to find the
> > optimal squeezing duration for you.
> > 
> > [1] https://aws.amazon.com/ebs/
> > [2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html
> > 
> > Reviewed-by: Juergen Gross 
> 
> You should likely have dropped Juergen RB, since you made some
> non-trivial changes.

Yes, I will!

> 
> > Signed-off-by: SeongJae Park 
> > ---
> >  .../ABI/testing/sysfs-driver-xen-blkback  |  9 
> >  drivers/block/xen-blkback/blkback.c   | 22 +--
> >  drivers/block/xen-blkback/common.h|  2 ++
> >  drivers/block/xen-blkback/xenbus.c| 11 +-
> >  4 files changed, 41 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> > b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > index 4e7babb3ba1f..a74a6d513c9f 100644
> > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > @@ -25,3 +25,12 @@ Description:
> >  allocated without being in use. The time is in
> >  seconds, 0 means indefinitely long.
> >  The default is 60 seconds.
> > +
> > +What:   
> > /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
> > +Date:   December 2019
> > +KernelVersion:  5.5
> > +Contact:Roger Pau Monné 
> 
> I think you should be the contact for this feature, you are the one
> that implemented it :).
> 
> > +Description:
> > +How long the block backend buffers release every free 
> > pages in
> > +those under memory pressure.  The time is in milliseconds.
> 
> "When memory pressure is reported to blkback this option controls the
> duration in milliseconds that blkback will not cache any page not
> backed by a grant mapping. The default is 10ms."

Great, will change!

> 
> > +The default is 10 milliseconds.
> > diff --git a/drivers/block/xen-blkback/blkback.c 
> > b/drivers/block/xen-blkback/blkback.c
> > index fd1e19f1a49f..26606c4896fd 100644
> > --- a/drivers/block/xen-blkback/blkback.c
> > +++ b/drivers/block/xen-blkback/blkback.c
> > @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct 
> > persistent_gnt *persistent_gnt)
> > HZ * xen_blkif_pgrant_timeout);
> >  }
> >  
> > +/* Once a memory pressure is detected, squeeze free page pools for a 
> > while. */
> > +static unsigned int buffer_squeeze_duration_ms = 10;
> > +module_param_named(buffer_squeeze_duration_ms,
> > +   buffer_squeeze_duration_ms, int, 0644);
> > +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > +&

[Xen-devel] [PATCH v8 3/3] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-13 Thread SeongJae Park
A few static variables in blkback have the 'xen_blkif_' prefix, though it
is unnecessary for static variables.  This commit removes such prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 26606c4896fd..85ff629a7546 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
@@ -249,7 +248,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -412,14 +411,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -614,8 +612,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -675,7 +672,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -902,7 +899,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
continue;
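
The persistent_gnt_timeout() check renamed in the patch above combines two
rules: a timeout of 0 means "never expire", and the age is computed by
unsigned subtraction so that counter wraparound does not matter.  A
userspace sketch, assuming a hypothetical tick rate 'DEMO_HZ' and a
hypothetical helper name 'demo_gnt_timed_out':

```c
#include <assert.h>

#define DEMO_HZ 250UL	/* assumed tick rate, for this example only */

/* Non-zero when an unused grant has been idle for timeout_secs or more. */
static int demo_gnt_timed_out(unsigned long now, unsigned long last_used,
			      unsigned int timeout_secs)
{
	/* 0 disables the timeout: an unused grant is kept indefinitely. */
	return timeout_secs && (now - last_used >= DEMO_HZ * timeout_secs);
}

/* Worked values for the default timeout of 60 seconds. */
static void demo_check_timeout(void)
{
	assert(demo_gnt_timed_out(15000, 0, 60) == 1);
	assert(demo_gnt_timed_out(14999, 0, 60) == 0);
	assert(demo_gnt_timed_out(15000, 0, 0) == 0);
	/* last_used taken 100 ticks before the counter wrapped. */
	assert(demo_gnt_timed_out(14900, (unsigned long)-100, 60) == 1);
}
```

The last check is the interesting one: 'now' is numerically smaller than
'last_used', yet the unsigned difference is still the true age of 15000
ticks.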
 

[Xen-devel] [PATCH v8 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-13 Thread SeongJae Park
   40.12  3.1752165
No difference proven at 95.0% confidence

On the fast block device


max_pgs   Min   Max   Median  Avg    Stddev
0         417   423   420     419.4  2.5099801
1024      414   425   416     417.8  4.4384682
No difference proven at 95.0% confidence

In short, even worst-case squeezing on a ramdisk-based fast block device
causes no visible performance degradation.  Please note that this is just
a very simple and minimal test.  On systems using super-fast block
devices and a special I/O workload, the results might be different.  If
you have any doubt, test on your machine with your workload to find the
optimal squeezing duration for you.

[1] https://aws.amazon.com/ebs/
[2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html

Reviewed-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 .../ABI/testing/sysfs-driver-xen-blkback  |  9 
 drivers/block/xen-blkback/blkback.c   | 22 +--
 drivers/block/xen-blkback/common.h|  2 ++
 drivers/block/xen-blkback/xenbus.c| 11 +-
 4 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 4e7babb3ba1f..a74a6d513c9f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -25,3 +25,12 @@ Description:
 allocated without being in use. The time is in
 seconds, 0 means indefinitely long.
 The default is 60 seconds.
+
+What:   /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
+Date:   December 2019
+KernelVersion:  5.5
+Contact:Roger Pau Monné 
+Description:
+How long the block backend buffers release every free pages in
+those under memory pressure.  The time is in milliseconds.
+The default is 10 milliseconds.
diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..26606c4896fd 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct 
persistent_gnt *persistent_gnt)
HZ * xen_blkif_pgrant_timeout);
 }
 
+/* Once a memory pressure is detected, squeeze free page pools for a while. */
+static unsigned int buffer_squeeze_duration_ms = 10;
+module_param_named(buffer_squeeze_duration_ms,
+   buffer_squeeze_duration_ms, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration_ms,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+static unsigned long buffer_squeeze_end;
+
+void xen_blkbk_update_buffer_squeeze_end(struct xen_blkif *blkif)
+{
+   blkif->buffer_squeeze_end = jiffies +
+   msecs_to_jiffies(buffer_squeeze_duration_ms);
+}
+
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
 {
unsigned long flags;
@@ -656,8 +671,11 @@ int xen_blkif_schedule(void *arg)
ring->next_lru = jiffies + 
msecs_to_jiffies(LRU_INTERVAL);
}
 
-   /* Shrink if we have more than xen_blkif_max_buffer_pages */
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   /* Shrink the free pages pool if it is too large. */
+   if (time_before(jiffies, buffer_squeeze_end))
+   shrink_free_pagepool(ring, 0);
+   else
+   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..ba653126177d 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -319,6 +319,7 @@ struct xen_blkif {
/* All rings for this device. */
struct xen_blkif_ring   *rings;
unsigned intnr_rings;
+   unsigned long   buffer_squeeze_end;
 };
 
 struct seg_buf {
@@ -383,6 +384,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
 void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
+void xen_blkbk_update_buffer_squeeze_end(struct xen_blkif *blkif);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
  struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b90dbcd99c03..09fe6cb5c4ea 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -824,6 +824,14 @@ static void frontend_changed(struct xenbu

[Xen-devel] [PATCH v8 1/3] xenbus/backend: Add memory pressure handler callback

2019-12-13 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved to handle general
pressure in more sophisticated ways.  For example, it would be
possible to monitor the memory consumption of each device and issue the
release requests only to the devices that are causing the pressure.
Also, the callback could be extended to handle not only memory, but
general resources.  Nevertheless, this version of the implementation
defers such sophisticated goals to future work.

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+	const struct xenbus_driver *drv;
+
+	if (!dev->driver)
+		return 0;
+	drv = to_xenbus_driver(dev->driver);
+	if (drv && drv->reclaim_memory)
+		drv->reclaim_memory(to_xenbus_device(dev));
+	return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+				struct shrink_control *sc)
+{
+	bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+			backend_reclaim_memory);
+	return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+	.count_objects = backend_shrink_memory_count,
+	.seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
 	register_xenstore_notifier(&xenstore_notifier);
 
+	if (register_shrinker(&backend_memory_shrinker))
+		pr_warn("shrinker registration failed\n");
+
 	return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
 	struct device_driver driver;
 	int (*read_otherend_details)(struct xenbus_device *dev);
 	int (*is_ready)(struct xenbus_device *dev);
+	void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1
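
The dispatch pattern in the patch above (walk every device, skip the ones
with no bound driver or no callback, call the optional hook on the rest) can
be sketched in plain userspace C.  This is only an illustration under stated
assumptions: the toy_* types and functions are hypothetical stand-ins, not
kernel APIs, and the loop stands in for what bus_for_each_dev() does in the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_device;

struct toy_driver {
	const char *name;
	/* Optional hook: drivers with nothing to release leave this NULL. */
	void (*reclaim_memory)(struct toy_device *dev);
};

struct toy_device {
	const struct toy_driver *driver;	/* NULL while no driver is bound */
	unsigned int cached_pages;
};

/* Example hook: drop everything the device has cached. */
static void toy_drop_cache(struct toy_device *dev)
{
	dev->cached_pages = 0;
}

/*
 * Ask every bound device whose driver implements the hook to release
 * memory.  Returns how many devices were notified.
 */
static size_t toy_reclaim_all(struct toy_device *devs, size_t ndevs)
{
	size_t notified = 0;

	assert(devs != NULL || ndevs == 0);
	for (size_t i = 0; i < ndevs; i++) {
		const struct toy_driver *drv = devs[i].driver;

		/* Same two guards as the patch: no driver, or no hook. */
		if (!drv || !drv->reclaim_memory)
			continue;
		drv->reclaim_memory(&devs[i]);
		notified++;
	}
	return notified;
}
```

Keeping the hook optional means existing drivers need no change; only a
driver that actually holds reclaimable memory has to fill it in.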


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-13 Thread SeongJae Park
On Fri, Dec 13, 2019 at 10:33 AM Jürgen Groß  wrote:
>
> On 13.12.19 10:27, Roger Pau Monné wrote:
> > On Thu, Dec 12, 2019 at 05:06:58PM +0100, SeongJae Park wrote:
> >> On Thu, 12 Dec 2019 16:27:57 +0100 "Roger Pau Monné" wrote:
> >>
> >>>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> >>>> index fd1e19f1a49f..98823d150905 100644
> >>>> --- a/drivers/block/xen-blkback/blkback.c
> >>>> +++ b/drivers/block/xen-blkback/blkback.c
> >>>> @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
> >>>>            HZ * xen_blkif_pgrant_timeout);
> >>>>   }
> >>>>
> >>>> +/* Once a memory pressure is detected, squeeze free page pools for a while. */
> >>>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>>> +module_param_named(buffer_squeeze_duration_ms,
> >>>> +          buffer_squeeze_duration_ms, int, 0644);
> >>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>>> +"Duration in ms to squeeze pages buffer when a memory pressure is detected");
> >>>> +
> >>>> +static unsigned long buffer_squeeze_end;
> >>>> +
> >>>> +void xen_blkbk_reclaim_memory(struct xenbus_device *dev)
> >>>> +{
> >>>> +  buffer_squeeze_end = jiffies +
> >>>> +  msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>>
> >>> I'm not sure this is fully correct. This function will be called for
> >>> each blkback instance, but the timeout is stored in a global variable
> >>> that's shared between all blkback instances. Shouldn't this timeout be
> >>> stored in xen_blkif so each instance has its own local variable?
> >>>
> >>> Or else in the case you have 1k blkback instances the timeout is
> >>> certainly going to be longer than expected, because each call to
> >>> xen_blkbk_reclaim_memory will move it forward.
> >>
> >> Agreed.  I think the extended timeout would not make a visible
> >> performance difference, though, because the time the 1k-iteration loop
> >> takes would be short enough to be ignored compared to the
> >> millisecond-scale duration.
> >>
> >> I took this approach because I wanted to minimize such structural
> >> changes as far as I can, as this is just a point fix rather than the
> >> ultimate solution.  That said, it is not fully correct and is very
> >> confusing.  Another colleague of mine also pointed this out in internal
> >> review.  The correct solution would be to add a variable to the struct
> >> as you suggested, or to avoid the duplicated update of the variable by
> >> initializing the variable once the squeezing duration passes.  I would
> >> prefer the latter way, as it is more straightforward and still does not
> >> introduce a structural change.  For example, it might be like below:
> >>
> >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> >> index f41c698dd854..6856c8ef88de 100644
> >> --- a/drivers/block/xen-blkback/blkback.c
> >> +++ b/drivers/block/xen-blkback/blkback.c
> >> @@ -152,8 +152,9 @@ static unsigned long buffer_squeeze_end;
> >>
> >>   void xen_blkbk_reclaim_memory(struct xenbus_device *dev)
> >>   {
> >> -   buffer_squeeze_end = jiffies +
> >> -   msecs_to_jiffies(buffer_squeeze_duration_ms);
> >> +   if (!buffer_squeeze_end)
> >> +   buffer_squeeze_end = jiffies +
> >> +   msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>   }
> >>
> >>   static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
> >> @@ -669,10 +670,13 @@ int xen_blkif_schedule(void *arg)
> >>  }
> >>
> >>  /* Shrink the free pages pool if it is too large. */
> >> -   if (time_before(jiffies, buffer_squeeze_end))
> >> +   if (time_before(jiffies, buffer_squeeze_end)) {
> >>  shrink_free_pagepool(ring, 0);
> >> -   else
> >> +   } else {
> >> +   if (unlikely(buffer_squeeze_end))
> &g
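
The difference between the two update policies discussed in this message
(every instance pushing a shared deadline forward, versus only the first
caller arming it) can be sketched with a small userspace simulation.  This
is a hedged illustration, not the code from the patch: struct squeeze_state,
arm_always(), arm_once() and simulate() are hypothetical names, the tick
counter stands in for jiffies, and tick_before() only mirrors the
signed-difference idea behind the kernel's time_before().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Wraparound-safe "a is earlier than b".  Relies on the usual
 * two's-complement conversion from unsigned long to long.
 */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical model of one shared squeeze deadline; 0 means "not armed". */
struct squeeze_state {
	unsigned long end;
};

/* Variant A: every call pushes the deadline forward. */
static void arm_always(struct squeeze_state *s, unsigned long now,
		       unsigned long duration)
{
	s->end = now + duration;
}

/*
 * Variant B: only the first call arms the deadline.  Caveat of using 0 as
 * the sentinel: a deadline that wraps to exactly 0 looks unarmed.
 */
static void arm_once(struct squeeze_state *s, unsigned long now,
		     unsigned long duration)
{
	if (!s->end)
		s->end = now + duration;
}

/*
 * Deliver ncalls notifications, one per tick starting at 'start', and
 * return the deadline that results.
 */
static unsigned long simulate(void (*arm)(struct squeeze_state *,
					  unsigned long, unsigned long),
			      unsigned long start, unsigned long ncalls,
			      unsigned long duration)
{
	struct squeeze_state s = { 0 };

	assert(arm != NULL);
	for (unsigned long i = 0; i < ncalls; i++)
		arm(&s, start + i, duration);
	return s.end;
}
```

With 1000 notifications starting at tick 100 and a duration of 10 ticks,
arm_always() leaves the deadline at 1109 while arm_once() leaves it at 110,
which is the drift Roger describes for the 1k-instance case.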

Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-12 Thread SeongJae Park
On Thu, 12 Dec 2019 16:23:17 +0100 "Roger Pau Monné" wrote:

> > On Thu, 12 Dec 2019 12:42:47 +0100 "Roger Pau Monné" wrote:
> > > > On the slow block device
> > > > 
> > > > 
> > > > max_pgs   Min    Max    Median  Avg     Stddev
> > > > 0         38.7   45.8   38.7    40.12   3.1752165
> > > > 1024      38.7   45.8   38.7    40.12   3.1752165
> > > > No difference proven at 95.0% confidence
> > > > 
> > > > On the fast block device
> > > > 
> > > > 
> > > > max_pgs   Min    Max    Median  Avg     Stddev
> > > > 0         417    423    420     419.4   2.5099801
> > > > 1024      414    425    416     417.8   4.4384682
> > > > No difference proven at 95.0% confidence
> > > 
> > > This is intriguing, as it seems to prove that the usage of a cache of
> > > free pages is irrelevant performance wise.
> > > 
> > > The pool of free pages was introduced long ago, and it's possible that
> > > recent improvements to the balloon driver have made such a pool useless,
> > > at which point it could be removed instead of worked around.
> > 
> > I guess the grant page allocation overhead in this test scenario is really
> > small.  In the absence of memory pressure, fragmentation, and NUMA
> > imbalance, the latency of the page allocation ('get_page()') is very
> > short, as it will succeed in the fast path.
> 
> The allocation of the pool of free pages involves more than get_page,
> it uses gnttab_alloc_pages which in the worst case will allocate a
> page and balloon it out issuing one hypercall.
> 
> > A few years ago, I measured the page allocation latency on my machine.
> > Roughly speaking, it was about 1us in the best case, 100us in the worst
> > case, and 5us on average.  Please keep in mind that the measurement was
> > not designed and performed in a rigorous way, so the results could
> > include profiling overhead.  With that caveat, let's simply trust the
> > numbers and ignore the latency of the block layer, blkback itself
> > (including the grant mapping), and everything else, including context
> > switches and cache misses, except the allocation.  In other words,
> > suppose that the grant page allocation is the only source of overhead.
> > It would then be able to achieve 1 million IOPS (4KB * 1MIOPS = 4 GB/s)
> > in the best case, 200 thousand IOPS (800 MB/s) on average, and 10
> > thousand IOPS (40 MB/s) in the worst case.  Based on this coarse
> > calculation, I think the test results are reasonable.
> > 
> > This also means that the effect of blkback's free pages pool might be
> > visible when the page allocation fast path fails.  Nevertheless, it
> > would also be hard to measure that at the micro level unless the
> > measurement is well designed and controlled.
> > 
> > > 
> > > Do you think you could perform some more tests (as pointed out above
> > > against the block device to skip the fs overhead) and report back the
> > > results?
> > 
> > To be honest, I'm not sure whether additional tests are really necessary,
> > because I think the `dd` test and the explanation of its results already
> > make some sense and provide a minimal proof of concept.  Also, this
> > change is a fallback for the memory pressure situation, which is an error
> > path from some point of view.  Such an erroneous situation might not
> > happen frequently, and if it is not resolved in a short time, something
> > much worse (e.g., an OOM kill of the user space xen control processes)
> > than temporary I/O performance degradation could happen.  Thus, I'm not
> > sure whether such detailed performance measurement is necessary for this
> > rare error handling change.
> 
> Right, my main concern is that we seem to be adding duck tape so
> things don't fall apart, but if such cache is really not beneficial
> from a performance PoV I would rather see it go away than adding more
> stuff to it in order to work around corner cases like memory
> starvation.

Right, if the cache is really giving no benefit, it would be much better to
simply remove it.  However, as mentioned before, I'm not sure that it is
entirely useless.  Maybe we could do some more detailed tests to find out,
but that would be out of the scope of this patch.

> 
> Anyway, I guess we can take such change, but long term we need to look
> into fixing grants to not use ballooned pages, and figure out if the
> blkback free page cache is really useful or not.

Totally agreed.


Thanks,
SeongJae Park

> 
> Thanks, Roger.
> 
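
The back-of-envelope estimate quoted earlier in this message (1us, 5us and
100us per allocation, 4KB per request) can be reproduced with a few lines of
C.  This is only a sketch of the arithmetic under the same simplifying
assumption made in the message, namely that page allocation is the sole
per-request cost; the function names are hypothetical and decimal units
(1 MB = 1000 KB) are assumed to match the quoted figures.

```c
#include <assert.h>

/* Requests per second if each request costs exactly latency_us microseconds. */
static double iops_from_latency_us(double latency_us)
{
	assert(latency_us > 0.0);
	return 1e6 / latency_us;
}

/* Throughput in MB/s (1 MB = 1000 KB) for a given request size in KB. */
static double mbps_from_iops(double iops, double request_kb)
{
	return iops * request_kb / 1000.0;
}
```

For 4KB requests this gives 4000 MB/s at 1us, 800 MB/s at 5us and 40 MB/s at
100us, matching the 4 GB/s, 800 MB/s and 40 MB/s figures in the message.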

