Re: [Xen-devel] [PATCH v13 3/5] xen/blkback: Squeeze page pools if a memory pressure is detected

2020-01-02 Thread SeongJae Park
Hello Roger,

Sorry if I'm disturbing your vacation.  If you have already come back to work,
may I ask your opinion about this patch?

On Wed, 18 Dec 2019 19:37:16 +0100 SeongJae Park  wrote:

> From: SeongJae Park 
> 
> Each `blkif` has a free pages pool for the grant mapping.  The size of
> the pool starts from zero and is increased on demand while processing
> I/O requests.  If the current I/O request handling is finished, or 100
> milliseconds have passed since the last I/O request was handled, the
> pool is checked and shrunk so that it does not exceed the size limit,
> `max_buffer_pages`.
> 
> Therefore, host administrators can cause memory pressure in blkback by
> attaching a large number of block devices and inducing I/O.  Such
> problematic situations can be avoided by limiting the maximum number of
> devices that can be attached, but finding the optimal limit is not
> easy.  An improperly set limit can result in memory pressure or
> resource underutilization.  This commit avoids such problematic
> situations by squeezing the pools (returning every free page in the
> pool to the system) for a while (users can set the duration via a
> module parameter) if memory pressure is detected.
> 
> Discussions
> ===
> 
> The `blkback`'s original shrinking mechanism returns to the system only
> those pages in the pool which are not currently being used by
> `blkback`, that is, the pages that are not mapped to granted pages.
> Because this commit changes only the shrink limit and still uses the
> same freeing mechanism, it does not touch pages which are currently
> mapping grants.
> 
> Once memory pressure is detected, this commit keeps the squeezing limit
> for a user-specified time duration.  The duration should be neither too
> long nor too short.  If it is too long, the overhead incurred by the
> squeezing can reduce the I/O performance.  If it is too short, `blkback`
> will not free enough pages to reduce the memory pressure.  This commit
> sets the default value to `10 milliseconds`, because that is a short
> time in terms of I/O while being a long time in terms of memory
> operations.  Also, as the original shrinking mechanism works at least
> once every 100 milliseconds, this seems a somewhat reasonable choice.
> I also tested
> other durations (refer to the below section for more details) and
> confirmed that 10 milliseconds is the one that works best with the test.
> That said, the proper duration depends on actual configurations and
> workloads.  That's why this commit allows users to set the duration as a
> module parameter.
> 
> Memory Pressure Test
> 
> 
> To show how well this commit fixes the memory pressure situation, I
> configured a test environment on a Xen-running virtualization system.
> On the `blkfront`-running guest instances, I attached a large number of
> network-backed volume devices and induced I/O to them.  Meanwhile, I
> measured the number of pages swapped in (pswpin) and out (pswpout) on
> the `blkback`-running guest.  The test ran twice, once with `blkback`
> before this commit and once with `blkback` after it.  As shown below,
> this commit has dramatically reduced the memory pressure:
> 
>          pswpin  pswpout
> before   76,672  185,799
> after       867    3,967
> 
> Optimal Aggressive Shrinking Duration
> -
> 
> To find the best squeezing duration, I repeated the test with three
> different durations (1ms, 10ms, and 100ms).  The results are as below:
> 
> duration  pswpin  pswpout
> 1            707    5,095
> 10           867    3,967
> 100          362    3,348
> 
> As expected, the memory pressure decreases as the duration increases,
> but the reduction slows down from `10ms` on.  Based on these results, I
> chose 10ms as the default duration.
> 
> Performance Overhead Test
> =
> 
> This commit could incur I/O performance degradation under severe memory
> pressure because the squeezing will require more page allocations per
> I/O.  To show the overhead, I artificially made a worst-case squeezing
> situation and measured the I/O performance of a `blkfront`-running
> guest.
> 
> For the artificial squeezing, I set the `blkback.max_buffer_pages` using
> the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
> test, I set the value to `1024` and `0`.  `1024` is the default
> value.  Setting the value to `0` is the same as doing the squeezing
> always (the worst case).
> 
> If the underlying block device is slow enough, the squeezing overhead
> could be hidden.  For that reason, I used a fast block device, namely the
> rbd[1]:
> 
> # x
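
To make the mechanism above concrete, here is a minimal sketch of the
squeezing idea.  It is not the actual blkback code: the pool type and the
free_one_page() helper are hypothetical stand-ins, while the names
buffer_squeeze_end, buffer_squeeze_duration_ms, and max_buffer_pages follow
the names used in this patchset.

	/* Sketch only -- not the actual blkback code. */
	static unsigned int buffer_squeeze_duration_ms = 10;	/* module param */

	struct free_page_pool {				/* hypothetical type */
		unsigned int num_free_pages;
		unsigned long buffer_squeeze_end;	/* in jiffies */
	};

	/* Called from the memory pressure handler callback: open a squeeze
	 * window of buffer_squeeze_duration_ms milliseconds. */
	static void squeeze(struct free_page_pool *pool)
	{
		pool->buffer_squeeze_end = jiffies +
			msecs_to_jiffies(buffer_squeeze_duration_ms);
	}

	/* The existing periodic shrink path: while the squeeze window is
	 * open, treat the pool size limit as zero, so that every free page
	 * (i.e., every page not currently mapping a grant) is returned to
	 * the system. */
	static void shrink_free_pagepool(struct free_page_pool *pool)
	{
		unsigned int limit =
			time_before(jiffies, pool->buffer_squeeze_end) ?
			0 : max_buffer_pages;

		while (pool->num_free_pages > limit)
			free_one_page(pool);	/* hypothetical helper */
	}

The pswpin/pswpout numbers in the test above are the fields of the same
names in /proc/vmstat; for example, the 'before' row of the first table
corresponds to:

	# grep -E '^pswp(in|out)' /proc/vmstat
	pswpin 76672
	pswpout 185799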

Re: [Xen-devel] [PATCH v13 0/5] xenbus/backend: Add memory pressure handler callback

2020-01-13 Thread SeongJae Park
Every patch of this patchset got at least one 'Reviewed-by' or 'Acked-by' from
the appropriate maintainers by last Wednesday, and has received no further
comments since then.  May I ask for some more comments?


Thanks,
SeongJae Park

On Wed, 18 Dec 2019 19:37:13 +0100 SeongJae Park  wrote:

> Granting pages consumes backend system memory.  In systems configured
> with insufficient spare memory for those pages, it can cause a memory
> pressure situation.  However, finding the optimal amount of the spare
> memory is challenging for large systems having dynamic resource
> utilization patterns.  Also, such a static configuration might lack
> flexibility.
> 
> To mitigate such problems, this patchset adds a memory reclaim callback
> to 'xenbus_driver' (patch 1) and then introduces a lock to avoid race
> conditions (patch 2).  After that, patch 3 applies the callback
> mechanism to mitigate the problem in 'xen-blkback'.  The fourth and
> fifth patches are trivial cleanups; those fix nits we found during the
> development of this patchset.
> 
> Note that patches 1, 4, and 5 are not changed since v9.
> 
> 
> Base Version
> 
> 
> This patchset is based on v5.4.  A complete tree is also available at my
> public git repo:
> https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v13
> 
> 
> Patch History
> -
> 
> Changes from v12
> (https://lore.kernel.org/xen-devel/20191218104232.9606-1-sjp...@amazon.com/)
>  - Do not unnecessarily disable interrupts (suggested by Juergen)
>  - Hold lock from xenbus side (suggested by Juergen)
> 
> Changes from v11
> (https://lore.kernel.org/xen-devel/20191217160748.693-2-sjp...@amazon.com/)
>  - Fix wrong trylock use (reported by Juergen)
>  - Merge patch 3 and 4 (suggested by Juergen)
>  - Update test result
> 
> Changes from v10
> (https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
>  - Fix race condition (reported by SeongJae, suggested by Juergen)
> 
> Changes from v9
> (https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
>  - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
>  - Update the commit message for overhead test of the 2nd path
> 
> Changes from v8
> (https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
>  - Drop 'Reviewed-by: Juergen' from the second patch
>(suggested by Roger Pau Monné)
>  - Update contact of the new module param to SeongJae Park
>(suggested by Roger Pau Monné)
>  - Wordsmith the description of the parameter
>(suggested by Roger Pau Monné)
>  - Fix dumb bugs
>(suggested by Roger Pau Monné)
>  - Move module param definition to xenbus.c and reduce the number of
>lines for this change
>(suggested by Roger Pau Monné)
>  - Add a comment for the new callback, reclaim_memory, as other
>callbacks also have
>  - Add another trivial cleanup of xenbus.c file (4th patch)
> 
> Changes from v7
> (https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
>  - Update sysfs-driver-xen-blkback for new parameter
>(suggested by Roger Pau Monné)
>  - Use per-xen_blkif buffer_squeeze_end instead of global variable
>(suggested by Roger Pau Monné)
> 
> Changes from v6
> (https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
>  - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
>  - Constify a variable (suggested by Roger Pau Monné)
>  - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
>  - More wordsmith of the commit message (suggested by Roger Pau Monné)
> 
> Changes from v5
> (https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
>  - Wordsmith the commit messages (suggested by Roger Pau Monné)
>  - Change the reclaim callback return type (suggested by Roger Pau
>Monné)
>  - Change the type of the blkback squeeze duration variable
>(suggested by Roger Pau Monné)
>  - Add a patch for removal of unnecessary static variable name prefixes
>(suggested by Roger Pau Monné)
>  - Fix checkpatch.pl warnings
> 
> Changes from v4
> (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
>  - Remove domain id parameter from the callback (suggested by Juergen
>Gross)
>  - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)
> 
> Changes from v3
> (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
>  - Add general callback in xen_driver and use it (suggested by Juergen
>Gross)
> 
> Changes from v2
> (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3

Re: [Xen-devel] [PATCH v13 0/5] xenbus/backend: Add memory pressure handler callback

2020-01-13 Thread SeongJae Park
On Mon, 13 Jan 2020 10:55:07 +0100 "Roger Pau Monné"  
wrote:

> On Mon, Jan 13, 2020 at 10:49:52AM +0100, SeongJae Park wrote:
> > Every patch of this patchset got at least one 'Reviewed-by' or 'Acked-by' 
> > from
> > appropriate maintainers by last Wednesday, and after that, got no comment 
> > yet.
> > May I ask some more comments?
> 
> I'm not sure why more comments are needed, patches have all the
> relevant Acks and will be pushed in due time unless someone has
> objections.
> 
> Please be patient and wait at least until the next merge window; these
> patches are not bug fixes, so pushing them now would be wrong.

Ok, I will.  Thank you for your quick and nice reply.


Thanks,
SeongJae Park

> 
> Roger.
> 


[PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

The persistent grants feature provides high scalability.  On some small
systems, however, it can incur data copy overheads[1], so it should be
possible to disable the feature; currently, however, there is no option
for that.  This commit therefore adds a module parameter for disabling
the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Signed-off-by: Anthony Liguori 
Signed-off-by: SeongJae Park 
---
 .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
 drivers/block/xen-blkback/xenbus.c| 28 ++-
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index ecb7942ff146..ac2947b98950 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -35,3 +35,12 @@ Description:
 controls the duration in milliseconds that blkback will not
 cache any page not backed by a grant mapping.
 The default is 10ms.
+
+What:   /sys/module/xen_blkback/parameters/feature_persistent
+Date:   September 2020
+KernelVersion:  5.10
+Contact:SeongJae Park 
+Description:
+Whether to enable the persistent grants feature or not.  Note
+that this option only takes effect on newly created backends.
+The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b9aa5d1ac10b..8a95ddd08b13 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent,
+   "Enables the persistent grants feature");
+
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
@@ -906,11 +912,15 @@ static void connect(struct backend_info *be)
 
xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
 
-   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
-   if (err) {
-   xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
-dev->nodename);
-   goto abort;
+   if (feature_persistent) {
+   err = xenbus_printf(xbt, dev->nodename, "feature-persistent",
+   "%u", feature_persistent);
+   if (err) {
+   xenbus_dev_fatal(dev, err,
+   "writing %s/feature-persistent",
+   dev->nodename);
+   goto abort;
+   }
}
 
err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
@@ -1093,8 +1103,12 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
-  0);
+   if (feature_persistent)
+   pers_grants = xenbus_read_unsigned(dev->otherend,
+   "feature-persistent", 0);
+   else
+   pers_grants = 0;
+
blkif->vbd.feature_gnt_persistent = pers_grants;
blkif->vbd.overflow_max_grants = 0;
 
-- 
2.17.1
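
For reference, since the parameter above is declared with mode 0644, it can
also be flipped at runtime through the sysfs path documented in the ABI file
above, affecting only backends created afterwards.  A hypothetical session:

	# cat /sys/module/xen_blkback/parameters/feature_persistent
	Y
	# echo 0 > /sys/module/xen_blkback/parameters/feature_persistent
	# cat /sys/module/xen_blkback/parameters/feature_persistent
	N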




[PATCH v2 3/3] xen-blkfront: Apply changed parameter name to the document

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor
parameter") changed the name of the module parameter for the maximum
number of segments in indirect requests but missed updating the
document.  This commit updates the document.

Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront 
b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 9c31334cb2e6..28008905615f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -1,4 +1,4 @@
-What:   /sys/module/xen_blkfront/parameters/max
+What:   /sys/module/xen_blkfront/parameters/max_indirect_segments
 Date:   June 2013
 KernelVersion:  3.11
 Contact:Konrad Rzeszutek Wilk 
-- 
2.17.1




[PATCH v2 0/3] xen-blk(back|front): Let users disable persistent grants

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

The persistent grants feature provides high scalability.  On some small
systems, however, it can incur data copy overheads[1], so it should be
possible to disable the feature; currently, however, there is no option
for that.  This patchset therefore adds module parameters for disabling
the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Baseline and Complete Git Trees
===

The patches are based on v5.9-rc6.  You can also clone the complete
git tree:

$ git clone git://github.com/sjp38/linux -b pgrants_disable_v2

The web is also available:
https://github.com/sjp38/linux/tree/pgrants_disable_v2

Patch History
=

Changes from v1
(https://lore.kernel.org/linux-block/20200922070125.27251-1-sjp...@amazon.com/)
- use 'bool' parameter type (Jürgen Groß)
- Let blkfront also disable the feature from its side
  (Roger Pau Monné)
- Avoid unnecessary xenbus_printf (Roger Pau Monné)
- Update frontend parameter doc

SeongJae Park (3):
  xen-blkback: add a parameter for disabling of persistent grants
  xen-blkfront: add a parameter for disabling of persistent grants
  xen-blkfront: Apply changed parameter name to the document

 .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
 .../ABI/testing/sysfs-driver-xen-blkfront | 11 +++-
 drivers/block/xen-blkback/xenbus.c| 28 ++-
 drivers/block/xen-blkfront.c  | 28 +--
 4 files changed, 60 insertions(+), 16 deletions(-)

-- 
2.17.1




Re: [PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 13:12:59 +0200 "Roger Pau Monné"  
wrote:

> On Tue, Sep 22, 2020 at 12:52:07PM +0200, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > Persistent grants feature provides high scalability.  On some small
> > systems, however, it could incur data copy overheads[1] and thus it is
> > required to be disabled.  But, there is no option to disable it.  For
> > the reason, this commit adds a module parameter for disabling of the
> > feature.
> > 
> > [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability
> > 
> > Signed-off-by: Anthony Liguori 
> > Signed-off-by: SeongJae Park 
> > ---
> >  .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
> >  drivers/block/xen-blkback/xenbus.c| 28 ++-
> >  2 files changed, 30 insertions(+), 7 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> > b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > index ecb7942ff146..ac2947b98950 100644
> > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > @@ -35,3 +35,12 @@ Description:
> >  controls the duration in milliseconds that blkback will not
> >  cache any page not backed by a grant mapping.
> >  The default is 10ms.
> > +
> > +What:   /sys/module/xen_blkback/parameters/feature_persistent
> > +Date:   September 2020
> > +KernelVersion:  5.10
> > +Contact:SeongJae Park 
> > +Description:
> > +Whether to enable the persistent grants feature or not.  
> > Note
> > +that this option only takes effect on newly created 
> > backends.
> > +The default is Y (enable).
> > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > b/drivers/block/xen-blkback/xenbus.c
> > index b9aa5d1ac10b..8a95ddd08b13 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device *dev)
> >  
> >  /* ** Connection ** */
> >  
> > +/* Enable the persistent grants feature. */
> > +static bool feature_persistent = true;
> > +module_param(feature_persistent, bool, 0644);
> > +MODULE_PARM_DESC(feature_persistent,
> > +   "Enables the persistent grants feature");
> > +
> >  /*
> >   * Write the physical details regarding the block device to the store, and
> >   * switch to Connected state.
> > @@ -906,11 +912,15 @@ static void connect(struct backend_info *be)
> >  
> > xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> >  
> > -   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
> > -   if (err) {
> > -   xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
> > -dev->nodename);
> > -   goto abort;
> > +   if (feature_persistent) {
> > +   err = xenbus_printf(xbt, dev->nodename, "feature-persistent",
> > +   "%u", feature_persistent);
> > +   if (err) {
> > +   xenbus_dev_fatal(dev, err,
> > +   "writing %s/feature-persistent",
> > +   dev->nodename);
> > +   goto abort;
> > +   }
> > }
> >  
> > err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
> > @@ -1093,8 +1103,12 @@ static int connect_ring(struct backend_info *be)
> > xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
> > return -ENOSYS;
> > }
> > -   pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
> > -  0);
> > +   if (feature_persistent)
> > +   pers_grants = xenbus_read_unsigned(dev->otherend,
> > +   "feature-persistent", 0);
> > +   else
> > +   pers_grants = 0;
> > +
> 
> Sorry for not realizing earlier, but looking at it again I think you
> need to cache the value of feature_persistent when it's first used in
> the blkback state data, so that it's consistent.
> 
> What would happen for example with the following flow (assume a
> persistent grants enabled frontend):
> 
> feature_persistent = false
> 
> connect(...)
> feature-persistent is not written to xenstore
> 
> User changes feature_persistent = true
> 
> connect_ring(...)
> pers_grants = true, because feature-persistent is set unconditionally
> by the frontend and feature_persistent variable is now true.
> 
> Then blkback will try to use persistent grants and the whole
> connection will malfunction because the frontend won't.

Ah, you're right.  I should have caught this earlier but didn't, sorry.

> 
> The other option is to prevent changing the variable when there are
> blkback instances already running.

I think storing the option value in xenstore would be simpler.  That said, if
you prefer this way, please let me know.


Thanks,
SeongJae Park
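
For context, the race Roger describes comes from reading a writable module
parameter at two different points in time.  A minimal sketch of the caching
approach that the later versions of this patch (v4, below) end up adopting:
sample the parameter once, and only consult the cached copy afterwards.

	/* At backend device creation: cache the current parameter value,
	 * so connect() and connect_ring() can never observe two different
	 * values. */
	vbd->feature_gnt_persistent = feature_persistent;

	/* Later, in connect_ring(): only AND in the frontend's support. */
	if (blkif->vbd.feature_gnt_persistent)
		blkif->vbd.feature_gnt_persistent =
			xenbus_read_unsigned(dev->otherend,
					     "feature-persistent", 0);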



Re: [PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 13:35:11 +0200 "Roger Pau Monné"  
wrote:

> On Tue, Sep 22, 2020 at 01:26:38PM +0200, SeongJae Park wrote:
> > On Tue, 22 Sep 2020 13:12:59 +0200 "Roger Pau Monné"  
> > wrote:
> > 
> > > On Tue, Sep 22, 2020 at 12:52:07PM +0200, SeongJae Park wrote:
> > > > From: SeongJae Park 
> > > > 
> > > > Persistent grants feature provides high scalability.  On some small
> > > > systems, however, it could incur data copy overheads[1] and thus it is
> > > > required to be disabled.  But, there is no option to disable it.  For
> > > > the reason, this commit adds a module parameter for disabling of the
> > > > feature.
> > > > 
> > > > [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability
> > > > 
> > > > Signed-off-by: Anthony Liguori 
> > > > Signed-off-by: SeongJae Park 
> > > > ---
> > > >  .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
> > > >  drivers/block/xen-blkback/xenbus.c| 28 ++-
> > > >  2 files changed, 30 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> > > > b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > > > index ecb7942ff146..ac2947b98950 100644
> > > > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > > > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > > > @@ -35,3 +35,12 @@ Description:
> > > >  controls the duration in milliseconds that blkback 
> > > > will not
> > > >  cache any page not backed by a grant mapping.
> > > >  The default is 10ms.
> > > > +
> > > > +What:   /sys/module/xen_blkback/parameters/feature_persistent
> > > > +Date:   September 2020
> > > > +KernelVersion:  5.10
> > > > +Contact:SeongJae Park 
> > > > +Description:
> > > > +Whether to enable the persistent grants feature or 
> > > > not.  Note
> > > > +that this option only takes effect on newly created 
> > > > backends.
> > > > +The default is Y (enable).
> > > > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > > > b/drivers/block/xen-blkback/xenbus.c
> > > > index b9aa5d1ac10b..8a95ddd08b13 100644
> > > > --- a/drivers/block/xen-blkback/xenbus.c
> > > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > > @@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device 
> > > > *dev)
> > > >  
> > > >  /* ** Connection ** */
> > > >  
> > > > +/* Enable the persistent grants feature. */
> > > > +static bool feature_persistent = true;
> > > > +module_param(feature_persistent, bool, 0644);
> > > > +MODULE_PARM_DESC(feature_persistent,
> > > > +   "Enables the persistent grants feature");
> > > > +
> > > >  /*
> > > >   * Write the physical details regarding the block device to the store, 
> > > > and
> > > >   * switch to Connected state.
> > > > @@ -906,11 +912,15 @@ static void connect(struct backend_info *be)
> > > >  
> > > > xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> > > >  
> > > > -   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", 
> > > > "%u", 1);
> > > > -   if (err) {
> > > > -   xenbus_dev_fatal(dev, err, "writing 
> > > > %s/feature-persistent",
> > > > -dev->nodename);
> > > > -   goto abort;
> > > > +   if (feature_persistent) {
> > > > +   err = xenbus_printf(xbt, dev->nodename, 
> > > > "feature-persistent",
> > > > +   "%u", feature_persistent);
> > > > +   if (err) {
> > > > +   xenbus_dev_fatal(dev, err,
> > > > +   "writing %s/feature-persistent",
> > > > +   dev->nodename);
> > > > +   goto abort;
> > > > +   }
> > > > 

Re: [PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 13:35:30 +0200 "Jürgen Groß"  wrote:

> On 22.09.20 13:26, SeongJae Park wrote:
> > On Tue, 22 Sep 2020 13:12:59 +0200 "Roger Pau Monné"  
> > wrote:
> > 
> >> On Tue, Sep 22, 2020 at 12:52:07PM +0200, SeongJae Park wrote:
> >>> From: SeongJae Park 
> >>>
> >>> Persistent grants feature provides high scalability.  On some small
> >>> systems, however, it could incur data copy overheads[1] and thus it is
> >>> required to be disabled.  But, there is no option to disable it.  For
> >>> the reason, this commit adds a module parameter for disabling of the
> >>> feature.
> >>>
> >>> [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability
> >>>
> >>> Signed-off-by: Anthony Liguori 
> >>> Signed-off-by: SeongJae Park 
> >>> ---
> >>>   .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
> >>>   drivers/block/xen-blkback/xenbus.c| 28 ++-
> >>>   2 files changed, 30 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> >>> b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> >>> index ecb7942ff146..ac2947b98950 100644
> >>> --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> >>> +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> >>> @@ -35,3 +35,12 @@ Description:
> >>>   controls the duration in milliseconds that blkback will 
> >>> not
> >>>   cache any page not backed by a grant mapping.
> >>>   The default is 10ms.
> >>> +
> >>> +What:   /sys/module/xen_blkback/parameters/feature_persistent
> >>> +Date:   September 2020
> >>> +KernelVersion:  5.10
> >>> +Contact:SeongJae Park 
> >>> +Description:
> >>> +Whether to enable the persistent grants feature or not.  
> >>> Note
> >>> +that this option only takes effect on newly created 
> >>> backends.
> >>> +The default is Y (enable).
> >>> diff --git a/drivers/block/xen-blkback/xenbus.c 
> >>> b/drivers/block/xen-blkback/xenbus.c
> >>> index b9aa5d1ac10b..8a95ddd08b13 100644
> >>> --- a/drivers/block/xen-blkback/xenbus.c
> >>> +++ b/drivers/block/xen-blkback/xenbus.c
> >>> @@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device *dev)
> >>>   
> >>>   /* ** Connection ** */
> >>>   
> >>> +/* Enable the persistent grants feature. */
> >>> +static bool feature_persistent = true;
> >>> +module_param(feature_persistent, bool, 0644);
> >>> +MODULE_PARM_DESC(feature_persistent,
> >>> + "Enables the persistent grants feature");
> >>> +
> >>>   /*
> >>>* Write the physical details regarding the block device to the store, 
> >>> and
> >>>* switch to Connected state.
> >>> @@ -906,11 +912,15 @@ static void connect(struct backend_info *be)
> >>>   
> >>>   xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> >>>   
> >>> - err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
> >>> - if (err) {
> >>> - xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
> >>> -  dev->nodename);
> >>> - goto abort;
> >>> + if (feature_persistent) {
> >>> + err = xenbus_printf(xbt, dev->nodename, "feature-persistent",
> >>> + "%u", feature_persistent);
> >>> + if (err) {
> >>> + xenbus_dev_fatal(dev, err,
> >>> + "writing %s/feature-persistent",
> >>> + dev->nodename);
> >>> + goto abort;
> >>> + }
> >>>   }
> >>>   
> >>>   err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
> >>> @@ -1093,8 +1103,12 @@ static int connect_ring(struct backend_info *be)
> >>>   xenbus_dev_fatal(dev, err, "unknown fe protocol %s", 
> >>> protoco

Re: [PATCH v2 2/3] xen-blkfront: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 14:11:32 +0200 "Jürgen Groß"  wrote:

> On 22.09.20 12:52, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > Persistent grants feature provides high scalability.  On some small
> > systems, however, it could incur data copy overheads[1] and thus it is
> > required to be disabled.  It can be disabled from blkback side using a
> > module parameter, 'feature_persistent'.  But, it is impossible from
> > blkfront side.  For the reason, this commit adds a blkfront module
> > parameter for disabling of the feature.
> > 
> > [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability
> > 
> > Signed-off-by: SeongJae Park 
> > ---
> >   .../ABI/testing/sysfs-driver-xen-blkfront |  9 ++
> >   drivers/block/xen-blkfront.c  | 28 +--
> >   2 files changed, 29 insertions(+), 8 deletions(-)
> > 
[...]
> > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> > index 91de2e0755ae..49c324f377de 100644
> > --- a/drivers/block/xen-blkfront.c
> > +++ b/drivers/block/xen-blkfront.c
> > @@ -149,6 +149,13 @@ static unsigned int xen_blkif_max_ring_order;
> >   module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, 
> > 0444);
> >   MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used 
> > for the shared ring");
> >   
> > +/* Enable the persistent grants feature. */
> > +static bool feature_persistent = true;
> > +module_param(feature_persistent, bool, 0644);
> > +MODULE_PARM_DESC(feature_persistent,
> > +   "Enables the persistent grants feature");
> > +
> > +
> >   #define BLK_RING_SIZE(info)   \
> > __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages)
> >   
> > @@ -1866,11 +1873,13 @@ static int talk_to_blkback(struct xenbus_device 
> > *dev,
> > message = "writing protocol";
> > goto abort_transaction;
> > }
> > -   err = xenbus_printf(xbt, dev->nodename,
> > -   "feature-persistent", "%u", 1);
> > -   if (err)
> > -   dev_warn(&dev->dev,
> > -"writing persistent grants feature to xenbus");
> > +   if (feature_persistent) {
> > +   err = xenbus_printf(xbt, dev->nodename,
> > +   "feature-persistent", "%u", 1);
> > +   if (err)
> > +   dev_warn(&dev->dev,
> > +"writing persistent grants feature to xenbus");
> > +   }
> >   
> > err = xenbus_transaction_end(xbt, 0);
> > if (err) {
> > @@ -2316,9 +2325,12 @@ static void blkfront_gather_backend_features(struct 
> > blkfront_info *info)
> > if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
> > blkfront_setup_discard(info);
> >   
> > -   info->feature_persistent =
> > -   !!xenbus_read_unsigned(info->xbdev->otherend,
> > -  "feature-persistent", 0);
> > +   if (feature_persistent)
> > +   info->feature_persistent =
> > +   !!xenbus_read_unsigned(info->xbdev->otherend,
> > +  "feature-persistent", 0);
> > +   else
> > +   info->feature_persistent = 0;
> >   
> > indirect_segments = xenbus_read_unsigned(info->xbdev->otherend,
> > "feature-max-indirect-segments", 0);
> > 
> 
> Here you have the same problem as in blkback: feature_persistent could
> change its value between the two tests.

Yes, indeed.  I will fix this in the next version.


Thanks,
SeongJae Park



[PATCH v3 0/3] xen-blk(back|front): Let users disable persistent grants

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

The persistent grants feature provides high scalability.  On some small
systems, however, it can incur data copy overheads[1], so it should be
possible to disable the feature; currently, however, there is no option
for that.  This patchset therefore adds module parameters for disabling
the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Baseline and Complete Git Trees
===

The patches are based on v5.9-rc6.  You can also clone the complete
git tree:

$ git clone git://github.com/sjp38/linux -b pgrants_disable_v3

The web is also available:
https://github.com/sjp38/linux/tree/pgrants_disable_v3

Patch History
=

Changes from v2
(https://lore.kernel.org/linux-block/20200922105209.5284-1-sjp...@amazon.com/)
- Avoid race conditions (Roger Pau Monné)

Changes from v1
(https://lore.kernel.org/linux-block/20200922070125.27251-1-sjp...@amazon.com/)
- use 'bool' parameter type (Jürgen Groß)
- Let blkfront also disable the feature from its side
  (Roger Pau Monné)
- Avoid unnecessary xenbus_printf (Roger Pau Monné)
- Update frontend parameter doc


SeongJae Park (3):
  xen-blkback: add a parameter for disabling of persistent grants
  xen-blkfront: add a parameter for disabling of persistent grants
  xen-blkfront: Apply changed parameter name to the document

 .../ABI/testing/sysfs-driver-xen-blkback  |  9 
 .../ABI/testing/sysfs-driver-xen-blkfront | 11 +-
 drivers/block/xen-blkback/xenbus.c| 22 ++-
 drivers/block/xen-blkfront.c  | 20 -
 4 files changed, 50 insertions(+), 12 deletions(-)

-- 
2.17.1




Re: [PATCH v3 3/3] xen-blkfront: Apply changed parameter name to the document

2020-09-22 Thread SeongJae Park
On Tue, 22 Sep 2020 16:44:25 +0200 "Roger Pau Monné"  
wrote:

> On Tue, Sep 22, 2020 at 04:27:39PM +0200, Jürgen Groß wrote:
> > On 22.09.20 16:15, SeongJae Park wrote:
> > > From: SeongJae Park 
> > > 
> > > Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor
> > > parameter") changed the name of the module parameter for the maximum
> > > amount of segments in indirect requests but missed updating the
> > > document.  This commit updates the document.
> > > 
> > > Signed-off-by: SeongJae Park 
> > 
> > Reviewed-by: Juergen Gross 
> 
> Does this need to be backported to stable branches?

I don't think so, as this is not a bug affecting users?


Thanks,
SeongJae Park



[PATCH v4 1/3] xen-blkback: add a parameter for disabling of persistent grants

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

The persistent grants feature provides high scalability.  On some small
systems, however, it can incur data copy overheads[1], so it should be
possible to disable the feature; currently, however, there is no option
for that.  This commit therefore adds a module parameter for disabling
the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Signed-off-by: Anthony Liguori 
Signed-off-by: SeongJae Park 
Reviewed-by: Juergen Gross 
Acked-by: Roger Pau Monné 
---
 .../ABI/testing/sysfs-driver-xen-blkback  |  9 
 drivers/block/xen-blkback/xenbus.c| 22 ++-
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index ecb7942ff146..ac2947b98950 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -35,3 +35,12 @@ Description:
 controls the duration in milliseconds that blkback will not
 cache any page not backed by a grant mapping.
 The default is 10ms.
+
+What:   /sys/module/xen_blkback/parameters/feature_persistent
+Date:   September 2020
+KernelVersion:  5.10
+Contact:SeongJae Park 
+Description:
+Whether to enable the persistent grants feature or not.  Note
+that this option only takes effect on newly created backends.
+The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b9aa5d1ac10b..8fc34211dc8b 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -474,6 +474,12 @@ static void xen_vbd_free(struct xen_vbd *vbd)
vbd->bdev = NULL;
 }
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent,
+   "Enables the persistent grants feature");
+
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
  unsigned major, unsigned minor, int readonly,
  int cdrom)
@@ -519,6 +525,8 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
if (q && blk_queue_secure_erase(q))
vbd->discard_secure = true;
 
+   vbd->feature_gnt_persistent = feature_persistent;
+
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
@@ -906,7 +914,8 @@ static void connect(struct backend_info *be)
 
xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
 
-   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
+   err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
+   be->blkif->vbd.feature_gnt_persistent);
if (err) {
xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
 dev->nodename);
@@ -1067,7 +1076,6 @@ static int connect_ring(struct backend_info *be)
 {
struct xenbus_device *dev = be->dev;
struct xen_blkif *blkif = be->blkif;
-   unsigned int pers_grants;
char protocol[64] = "";
int err, i;
char *xspath;
@@ -1093,9 +1101,11 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
-  0);
-   blkif->vbd.feature_gnt_persistent = pers_grants;
+   if (blkif->vbd.feature_gnt_persistent)
+   blkif->vbd.feature_gnt_persistent =
+   xenbus_read_unsigned(dev->otherend,
+   "feature-persistent", 0);
+
blkif->vbd.overflow_max_grants = 0;
 
/*
@@ -1118,7 +1128,7 @@ static int connect_ring(struct backend_info *be)
 
pr_info("%s: using %d queues, protocol %d (%s) %s\n", dev->nodename,
 blkif->nr_rings, blkif->blk_protocol, protocol,
-pers_grants ? "persistent grants" : "");
+blkif->vbd.feature_gnt_persistent ? "persistent grants" : "");
 
ring_page_order = xenbus_read_unsigned(dev->otherend,
   "ring-page-order", 0);
-- 
2.17.1




[PATCH v4 3/3] xen-blkfront: Apply changed parameter name to the document

2020-09-22 Thread SeongJae Park
From: SeongJae Park 

Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor
parameter") changed the name of the module parameter for the maximum
number of segments in indirect requests but missed updating the
document.  This commit updates the document.

Signed-off-by: SeongJae Park 
Reviewed-by: Juergen Gross 
Acked-by: Roger Pau Monné 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront 
b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 9c31334cb2e6..28008905615f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -1,4 +1,4 @@
-What:   /sys/module/xen_blkfront/parameters/max
+What:   /sys/module/xen_blkfront/parameters/max_indirect_segments
 Date:   June 2013
 KernelVersion:  3.11
 Contact:Konrad Rzeszutek Wilk 
-- 
2.17.1




Re: [PATCH] xen-blkback: add a parameter for disabling of persistent grants

2020-09-23 Thread SeongJae Park
On Wed, 23 Sep 2020 16:09:30 -0400 Konrad Rzeszutek Wilk 
 wrote:

> On Tue, Sep 22, 2020 at 09:01:25AM +0200, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > Persistent grants feature provides high scalability.  On some small
> > systems, however, it could incur data copy overhead[1] and thus it is
> > required to be disabled.  But, there is no option to disable it.  For
> > the reason, this commit adds a module parameter for disabling of the
> > feature.
> 
> Would it be better suited to have it per guest?

The latest version of this patchset[1] supports blkfront-side disablement.
Could that partially solve your concern?

[1] https://lore.kernel.org/xen-devel/20200923061841.20531-1-sjp...@amazon.com/


Thanks,
SeongJae Park



Re: [PATCH 1/2] xen/blkback: turn the cache purge LRU interval into a parameter

2020-10-15 Thread SeongJae Park
On Thu, 15 Oct 2020 16:24:15 +0200 Roger Pau Monne  wrote:

> Assume that reads and writes to the variable will be atomic.  The worst
> that could happen is that one of the LRU intervals is not calculated
> properly if a partially written value is read, but that would only be
> a transient issue.
> 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: Jens Axboe 
> Cc: Boris Ostrovsky 
> Cc: SeongJae Park 
> Cc: xen-devel@lists.xenproject.org
> Cc: linux-bl...@vger.kernel.org
> Cc: J. Roeleveld 
> Cc: Jürgen Groß 
> ---
>  Documentation/ABI/testing/sysfs-driver-xen-blkback | 10 ++
>  drivers/block/xen-blkback/blkback.c|  9 ++---
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> index ecb7942ff146..776f25d335ca 100644
> --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> @@ -35,3 +35,13 @@ Description:
>  controls the duration in milliseconds that blkback will not
>  cache any page not backed by a grant mapping.
>  The default is 10ms.
> +
> +What:   /sys/module/xen_blkback/parameters/lru_interval
> +Date:   October 2020
> +KernelVersion:  5.10
> +Contact:Roger Pau Monné 
> +Description:
> +The LRU mechanism to clean the lists of persistent grants
> +needs to be executed periodically.  This parameter controls
> +the time interval, in ms, between consecutive executions of
> +the purge mechanism.

I think noting the default value (100ms) here would be better.


Thanks,
SeongJae Park
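
For reference, with the patch above the purge interval becomes runtime
tunable in the same way as the other blkback knobs.  A hypothetical session,
assuming the parameter is writable and the 100ms default mentioned above:

	# cat /sys/module/xen_blkback/parameters/lru_interval
	100
	# echo 50 > /sys/module/xen_blkback/parameters/lru_interval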



Re: [PATCH linux-next] drivers/xen/xenbus/xenbus_client.c: fix bugon.cocci warnings

2021-08-24 Thread SeongJae Park
From: SeongJae Park 

On Tue, 24 Aug 2021 23:24:51 -0700 CGEL  wrote:

> From: Jing Yangyang 
> 
> Use BUG_ON instead of an if condition followed by BUG.
> 
> Generated by: scripts/coccinelle/misc/bugon.cocci
> 
> Reported-by: Zeal Robot 
> Signed-off-by: Jing Yangyang 

Reviewed-by: SeongJae Park 


Thanks,
SJ

[...]



[PATCH] xen-blk{back,front}: Update contact points for buffer_squeeze_duration_ms and feature_persistent

2022-03-01 Thread SeongJae Park
SeongJae is currently listed as a contact point for some blk{back,front}
features, but he will not be working on Xen for a while.  This commit
therefore updates the contact point to his colleague, Maximilian, who
understands the context and is actively working with the features now.

Signed-off-by: SeongJae Park 
Signed-off-by: Maximilian Heyne 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkback  | 4 ++--
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index a74dfe52dd76..7faf719af165 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -29,7 +29,7 @@ Description:
 What:   /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
 Date:   December 2019
 KernelVersion:  5.6
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 When memory pressure is reported to blkback this option
 controls the duration in milliseconds that blkback will not
@@ -39,7 +39,7 @@ Description:
 What:   /sys/module/xen_blkback/parameters/feature_persistent
 Date:   September 2020
 KernelVersion:  5.10
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
 that this option only takes effect on newly created backends.
diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront 
b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 61fd173fabfe..7f646c58832e 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -12,7 +12,7 @@ Description:
 What:   /sys/module/xen_blkfront/parameters/feature_persistent
 Date:   September 2020
 KernelVersion:  5.10
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
 that this option only takes effect on newly created frontends.
-- 
2.17.1




Re: [PATCH] xen, blkback: fix persistent grants negotiation

2022-01-06 Thread SeongJae Park
From: SeongJae Park 

On Thu, 6 Jan 2022 09:10:13 + Maximilian Heyne  wrote:

> Suppose dom0 supports persistent grants but the guest does not.
> Then, when attaching a block device during runtime of the guest, dom0
> will enable persistent grants for this newly attached block device:
> 
>   $ xenstore-ls -f | grep 20674 | grep persistent
>   /local/domain/0/backend/vbd/20674/768/feature-persistent = "0"
>   /local/domain/0/backend/vbd/20674/51792/feature-persistent = "1"
> 
> Here disk 768 was attached during guest creation while 51792 was
> attached at runtime. If the guest would have advertised the persistent
> grant feature, there would be a xenstore entry like:
> 
>   /local/domain/20674/device/vbd/51792/feature-persistent = "1"
> 
> Persistent grants are also used when the guest tries to access the disk
> which can be seen when enabling log stats:
> 
>   $ echo 1 > /sys/module/xen_blkback/parameters/log_stats
>   $ dmesg
>   xen-blkback: (20674.xvdf-0): oo   0  |  rd0  |  wr0  |  f0 |  
> ds0 | pg:1/1056
> 
> The "pg: 1/1056" shows that one persistent grant is used.
> 
> Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling
> of persistent grants") vbd->feature_gnt_persistent was set in
> connect_ring. After the commit it was intended to be initialized in
> xen_vbd_create and then set according to the guest feature availability
> in connect_ring. However, with a running guest, connect_ring might be
> called before xen_vbd_create and vbd->feature_gnt_persistent will be
> incorrectly initialized. xen_vbd_create will overwrite it with the value
> of feature_persistent regardless of whether the guest actually supports
> persistent grants.
> 
> With this commit, vbd->feature_gnt_persistent is set only in
> connect_ring and this is the only use of the module parameter
> feature_persistent. This avoids races when the module parameter changes
> during the block attachment process.
> 
> Note that vbd->feature_gnt_persistent doesn't need to be initialized in
> xen_vbd_create. It's next use is in connect which can only be called
> once connect_ring has initialized the rings. xen_update_blkif_status is
> checking for this.
> 
> Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of 
> persistent grants")
> Signed-off-by: Maximilian Heyne 

Thank you for this patch!

Reviewed-by: SeongJae Park 

Also, I guess this tag is needed?

Cc:  # 5.10.x


Thanks,
SJ

> ---
>  drivers/block/xen-blkback/xenbus.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/xenbus.c 
> b/drivers/block/xen-blkback/xenbus.c
> index 914587aabca0c..51b6ec0380ca4 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -522,8 +522,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
> blkif_vdev_t handle,
>   if (q && blk_queue_secure_erase(q))
>   vbd->discard_secure = true;
>  
> - vbd->feature_gnt_persistent = feature_persistent;
> -
>   pr_debug("Successful creation of handle=%04x (dom=%u)\n",
>   handle, blkif->domid);
>   return 0;
> @@ -1090,10 +1088,9 @@ static int connect_ring(struct backend_info *be)
>   xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
>   return -ENOSYS;
>   }
> - if (blkif->vbd.feature_gnt_persistent)
> - blkif->vbd.feature_gnt_persistent =
> - xenbus_read_unsigned(dev->otherend,
> - "feature-persistent", 0);
> +
> + blkif->vbd.feature_gnt_persistent = feature_persistent &&
> + xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
>  
>   blkif->vbd.overflow_max_grants = 0;
>  
> -- 
> 2.32.0
> 
> 
> 
> 



Re: [PATCH] xen, blkback: fix persistent grants negotiation

2022-01-21 Thread SeongJae Park
On Tue, 11 Jan 2022 13:26:50 +0100 "Roger Pau Monné"  
wrote:

> On Tue, Jan 11, 2022 at 11:50:32AM +, Durrant, Paul wrote:
> > On 11/01/2022 11:11, Roger Pau Monné wrote:
> > > On Thu, Jan 06, 2022 at 09:10:13AM +, Maximilian Heyne wrote:
> > > > Given dom0 supports persistent grants but the guest does not.
> > > > Then, when attaching a block device during runtime of the guest, dom0
> > > > will enable persistent grants for this newly attached block device:
> > > > 
> > > >$ xenstore-ls -f | grep 20674 | grep persistent
> > > >/local/domain/0/backend/vbd/20674/768/feature-persistent = "0"
> > > >/local/domain/0/backend/vbd/20674/51792/feature-persistent = "1"
> > > 
> > > The mechanism that we use to advertise persistent grants support is
> > > wrong. 'feature-persistent' should always be set if the backend
> > > supports persistent grant (like it's done for other features in
> > > xen_blkbk_probe). The usage of the feature depends on whether both
> > > parties support persistent grants, and the xenstore entry printed by
> > > blkback shouldn't reflect whether persistent grants are in use, but
> > > rather whether blkback supports the feature.
> > > 
> > > > 
> > > > Here disk 768 was attached during guest creation while 51792 was
> > > > attached at runtime. If the guest would have advertised the persistent
> > > > grant feature, there would be a xenstore entry like:
> > > > 
> > > >/local/domain/20674/device/vbd/51792/feature-persistent = "1"
> > > > 
> > > > Persistent grants are also used when the guest tries to access the disk
> > > > which can be seen when enabling log stats:
> > > > 
> > > >$ echo 1 > /sys/module/xen_blkback/parameters/log_stats
> > > >$ dmesg
> > > >xen-blkback: (20674.xvdf-0): oo   0  |  rd0  |  wr0  |  f
> > > > 0 |  ds0 | pg:1/1056
> > > > 
> > > > The "pg: 1/1056" shows that one persistent grant is used.
> > > > 
> > > > Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling
> > > > of persistent grants") vbd->feature_gnt_persistent was set in
> > > > connect_ring. After the commit it was intended to be initialized in
> > > > xen_vbd_create and then set according to the guest feature availability
> > > > in connect_ring. However, with a running guest, connect_ring might be
> > > > called before xen_vbd_create and vbd->feature_gnt_persistent will be
> > > > incorrectly initialized. xen_vbd_create will overwrite it with the value
> > > > of feature_persistent regardless whether the guest actually supports
> > > > persistent grants.
> > > > 
> > > > With this commit, vbd->feature_gnt_persistent is set only in
> > > > connect_ring and this is the only use of the module parameter
> > > > feature_persistent. This avoids races when the module parameter changes
> > > > during the block attachment process.
> > > > 
> > > > Note that vbd->feature_gnt_persistent doesn't need to be initialized in
> > > > xen_vbd_create. It's next use is in connect which can only be called
> > > > once connect_ring has initialized the rings. xen_update_blkif_status is
> > > > checking for this.
> > > > 
> > > > Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of 
> > > > persistent grants")
> > > > Signed-off-by: Maximilian Heyne 
> > > > ---
> > > >   drivers/block/xen-blkback/xenbus.c | 9 +++--
> > > >   1 file changed, 3 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > > > b/drivers/block/xen-blkback/xenbus.c
> > > > index 914587aabca0c..51b6ec0380ca4 100644
> > > > --- a/drivers/block/xen-blkback/xenbus.c
> > > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > > @@ -522,8 +522,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
> > > > blkif_vdev_t handle,
> > > > if (q && blk_queue_secure_erase(q))
> > > > vbd->discard_secure = true;
> > > > -   vbd->feature_gnt_persistent = feature_persistent;
> > > > -
> > > > pr_debug("Successful creation of handle=%04x (dom=%u)\n",
> > > > handle, blkif->domid);
> > > > return 0;
> > > > @@ -1090,10 +1088,9 @@ static int connect_ring(struct backend_info *be)
> > > > xenbus_dev_fatal(dev, err, "unknown fe protocol %s", 
> > > > protocol);
> > > > return -ENOSYS;
> > > > }
> > > > -   if (blkif->vbd.feature_gnt_persistent)
> > > > -   blkif->vbd.feature_gnt_persistent =
> > > > -   xenbus_read_unsigned(dev->otherend,
> > > > -   "feature-persistent", 0);
> > > > +
> > > > +   blkif->vbd.feature_gnt_persistent = feature_persistent &&
> > > > +   xenbus_read_unsigned(dev->otherend, 
> > > > "feature-persistent", 0);
> > > 
> > > I'm not sure it's correct to potentially read feature_persistent
> > > multiple times like it's done here.
> > > 
> > > A device can be disconnected and re-attached multiple times, and that
> > > implies multiple

[PATCH v2] xen-blkback: fix persistent grants negotiation

2022-07-14 Thread SeongJae Park
The persistent grants feature can be used only when both the backend and
the frontend support it.  The feature was always supported by 'blkback',
but commit aac8a70db24b ("xen-blkback: add a parameter for disabling of
persistent grants") introduced a parameter for disabling it at runtime.

To avoid the parameter being updated while 'blkback' is using it, that
commit caches the parameter value in 'vbd->feature_gnt_persistent' in
'xen_vbd_create()', then checks whether the guest also supports the
feature and finally updates the field in 'connect_ring()'.

However, 'connect_ring()' can be called before 'xen_vbd_create()', so a
later execution of 'xen_vbd_create()' can wrongly overwrite
'vbd->feature_gnt_persistent' with 'true'.  As a result, 'blkback' can
try to use the persistent grants feature even if the guest doesn't
support it.

This commit fixes the issue by moving the parameter value caching to
'xen_blkif_alloc()', which allocates the 'blkif'.  Because that struct
embeds the 'vbd' object that 'connect_ring()' will use later,
'xen_blkif_alloc()' is always called before 'connect_ring()' and is
therefore the right and safe place to do the caching.

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent 
grants")
Cc:  # 5.10.x
Signed-off-by: Maximilian Heyne 
Signed-off-by: SeongJae Park 
---

Changes from v1[1]
- Avoid the behavioral change[2]
- Rebase on latest xen/tip/linux-next
- Re-work by SeongJae Park 
- Cc stable@

[1] https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/
[2] https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/

 drivers/block/xen-blkback/xenbus.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 97de13b14175..16c6785d260c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -157,6 +157,11 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
return 0;
 }
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent, "Enables the persistent grants feature");
+
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
struct xen_blkif *blkif;
@@ -181,6 +186,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
__module_get(THIS_MODULE);
INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
+   blkif->vbd.feature_gnt_persistent = feature_persistent;
+
return blkif;
 }
 
@@ -472,12 +479,6 @@ static void xen_vbd_free(struct xen_vbd *vbd)
vbd->bdev = NULL;
 }
 
-/* Enable the persistent grants feature. */
-static bool feature_persistent = true;
-module_param(feature_persistent, bool, 0644);
-MODULE_PARM_DESC(feature_persistent,
-   "Enables the persistent grants feature");
-
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
  unsigned major, unsigned minor, int readonly,
  int cdrom)
@@ -520,8 +521,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
if (bdev_max_secure_erase_sectors(bdev))
vbd->discard_secure = true;
 
-   vbd->feature_gnt_persistent = feature_persistent;
-
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
-- 
2.25.1
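
To make the ordering issue concrete, here is a hypothetical reconnect
sequence that triggers the bug fixed above, given a frontend without
persistent grants support and feature_persistent=true (the function names
are real, the trace itself is illustrative):

	connect_ring()			/* can run first on reconnect */
		vbd.feature_gnt_persistent = false;	/* guest lacks support */
	xen_vbd_create()		/* runs later */
		vbd.feature_gnt_persistent = feature_persistent; /* true again! */
	connect()
		/* advertises and uses persistent grants -- mismatch */

By moving the caching into xen_blkif_alloc(), which necessarily runs before
connect_ring(), the overwriting assignment disappears from this path.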




Re: [PATCH v2] xen-blkback: fix persistent grants negotiation

2022-07-15 Thread SeongJae Park
Hello,


Oleksandr, thank you for Cc-ing Andrii.  Andrii, thank you for the comment!

On Fri, 15 Jul 2022 15:00:10 +0300 Andrii Chepurnyi 
 wrote:

> Hello All,
> 
> I faced the mentioned issue recently, and just to bring more context, here is
> our setup:
> We use the pvblock backend for an Android guest.  It starts using u-boot with
> pvblock support (whose frontend doesn't support the persistent grants
> feature); later it loads and starts the Linux kernel (whose frontend
> supports the persistent grants feature).  So in total, we have two
> different frontends connecting in sequence, the first of which doesn't
> support persistent grants.
> So the original patch [1] perfectly solves the original issue and provides
> the ability to use persistent grants after the reconnection, when the Linux
> frontend, which supports persistent grants, comes into play.
> At the same time, [2] would disable the persistent grants feature for both
> the first and the second frontend.

Thank you for this great explanation of your situation.

> Is it possible to keep [1]  as is?

Yes.  My concerns about Max's original patch[1] were the conflicting behavior
description in the document and the inconsistent behavior of the
blkfront-side 'feature_persistent' parameter.  I will post Max's patch again
together with patches for the blkfront behavior change and the document
updates.

[1] https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/


Thanks,
SJ

> 
> [1]
> https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/
> [2] https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/
> 
> Best regards,
> Andrii
> 
> On Fri, Jul 15, 2022 at 1:15 PM Oleksandr  wrote:
> 
> >
> > On 15.07.22 01:44, SeongJae Park wrote:
> >
> >
> > Hello all.
> >
> > Adding Andrii Chepurnyi to CC who have played with the use-case which
> > required reconnect recently and faced some issues with
> > feature_persistent handling.
[...]



[PATCH 2/2] xen-blkfront: Apply 'feature_persistent' parameter when connect

2022-07-15 Thread SeongJae Park
The previous commit made xen-blkback's 'feature_persistent' parameter take
effect not only for newly created backends but also for every reconnected
backend.  This commit makes xen-blkfront's counterpart parameter work in the
same manner, and updates the document to avoid any confusion due to
inconsistent behavior of the same-named parameters.

Cc:  # 5.10.x
Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkfront.c| 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 7f646c58832e..4d36c5a10546 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -15,5 +15,5 @@ KernelVersion:  5.10
 Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
-that this option only takes effect on newly created frontends.
+that this option only takes effect on newly connected frontends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 3646c0cae672..4e763701b372 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1988,8 +1988,6 @@ static int blkfront_probe(struct xenbus_device *dev,
info->vdevice = vdevice;
info->connected = BLKIF_STATE_DISCONNECTED;
 
-   info->feature_persistent = feature_persistent;
-
/* Front end dir is a number, which is used as the id. */
info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
dev_set_drvdata(&dev->dev, info);
@@ -2283,7 +2281,7 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
blkfront_setup_discard(info);
 
-   if (info->feature_persistent)
+   if (feature_persistent)
info->feature_persistent =
!!xenbus_read_unsigned(info->xbdev->otherend,
   "feature-persistent", 0);
-- 
2.25.1




[PATCH v3 0/2] Fix persistent grants negotiation with a behavior change

2022-07-15 Thread SeongJae Park
The first patch of this patchset fixes 'feature_persistent' parameter
handling in 'blkback' to always respect the frontend's persistent grants
support.  The fix makes a behavioral change, so the second patch makes the
counterpart of 'blkfront' consistently follow the behavior change.

Changes from v2
(https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/)
- Keep the behavioral change of v1
- Update blkfront's counterpart to follow the changed behavior
- Update documents for the changed behavior

Changes from v1
(https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/)
- Avoid the behavioral change
  (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/)
- Rebase on latest xen/tip/linux-next
- Re-work by SeongJae Park 
- Cc stable@



Maximilian Heyne (1):
  xen, blkback: fix persistent grants negotiation

SeongJae Park (1):
  xen-blkfront: Apply 'feature_persistent' parameter when connect

 Documentation/ABI/testing/sysfs-driver-xen-blkback  | 2 +-
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkback/xenbus.c  | 9 +++--
 drivers/block/xen-blkfront.c| 4 +---
 4 files changed, 6 insertions(+), 11 deletions(-)

-- 
2.25.1




[PATCH 1/2] xen, blkback: fix persistent grants negotiation

2022-07-15 Thread SeongJae Park
From: Maximilian Heyne 

Suppose dom0 supports persistent grants but the guest does not.
Then, when attaching a block device during runtime of the guest, dom0
will enable persistent grants for this newly attached block device:

  $ xenstore-ls -f | grep 20674 | grep persistent
  /local/domain/0/backend/vbd/20674/768/feature-persistent = "0"
  /local/domain/0/backend/vbd/20674/51792/feature-persistent = "1"

Here disk 768 was attached during guest creation while 51792 was
attached at runtime. If the guest would have advertised the persistent
grant feature, there would be a xenstore entry like:

  /local/domain/20674/device/vbd/51792/feature-persistent = "1"

Persistent grants are also used when the guest tries to access the disk
which can be seen when enabling log stats:

  $ echo 1 > /sys/module/xen_blkback/parameters/log_stats
  $ dmesg
  xen-blkback: (20674.xvdf-0): oo   0  |  rd0  |  wr0  |  f0 |  ds   0 | pg: 1/1056

The "pg: 1/1056" shows that one persistent grant is used.

Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling
of persistent grants") vbd->feature_gnt_persistent was set in
connect_ring. After the commit it was intended to be initialized in
xen_vbd_create and then set according to the guest feature availability
in connect_ring. However, with a running guest, connect_ring might be
called before xen_vbd_create and vbd->feature_gnt_persistent will be
incorrectly initialized: xen_vbd_create will overwrite it with the value
of feature_persistent regardless of whether the guest actually supports
persistent grants.

With this commit, vbd->feature_gnt_persistent is set only in
connect_ring and this is the only use of the module parameter
feature_persistent. This avoids races when the module parameter changes
during the block attachment process.

Note that vbd->feature_gnt_persistent doesn't need to be initialized in
xen_vbd_create. Its next use is in connect, which can only be called
once connect_ring has initialized the rings; xen_update_blkif_status
checks for this.

Please also note that this commit makes a behavioral change to the
parameter.  That is, before this commit the parameter took effect only
for newly created backends, while after this commit it takes effect on
every new connection.  Therefore, this commit also updates the
document.

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Signed-off-by: Maximilian Heyne 
Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkback | 2 +-
 drivers/block/xen-blkback/xenbus.c | 9 +++--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 7faf719af165..fac0f429a869 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -42,5 +42,5 @@ KernelVersion:  5.10
 Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
-that this option only takes effect on newly created backends.
+that this option only takes effect on newly connected backends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 97de13b14175..874b846fb622 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -520,8 +520,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
if (bdev_max_secure_erase_sectors(bdev))
vbd->discard_secure = true;
 
-   vbd->feature_gnt_persistent = feature_persistent;
-
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
@@ -1087,10 +1085,9 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   if (blkif->vbd.feature_gnt_persistent)
-   blkif->vbd.feature_gnt_persistent =
-   xenbus_read_unsigned(dev->otherend,
-   "feature-persistent", 0);
+
+   blkif->vbd.feature_gnt_persistent = feature_persistent &&
+   xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
 
blkif->vbd.overflow_max_grants = 0;
 
-- 
2.25.1




Re: [PATCH v3 0/2] Fix persistent grants negotiation with a behavior change

2022-07-15 Thread SeongJae Park
Hi all,

On Fri, 15 Jul 2022 17:55:19 + SeongJae Park  wrote:

> The first patch of this patchset fixes 'feature_persistent' parameter
> handling in 'blkback' to always respect the frontend's persistent grants
> support.  The fix makes a behavioral change, so the second patch makes the
> counterpart of 'blkfront' consistently follow the behavior change.

I made the behavior change as requested by Andrii[1].  I therefore made a
similar behavior change to blkfront and Cc-ed stable for the second change,
too.

To make the change history clear and reduce the stable side overhead, however,
it might be better to apply v2, which doesn't make the behavior change but
only fixes the issue, Cc stable@ for it, make the behavior change commits for
both blkback and blkfront, update the documents, and not Cc stable@ for the
behavior change and document update commits.

One downside of that would be that it will make a behavioral difference
between pre-5.19.x and post-5.19.x.

I think neither downside is critical, so I posted this patchset in this shape.
If anyone prefers other changes, please let me know, though.

[1] 
https://lore.kernel.org/xen-devel/CAJwUmVB6H3iTs-C+U=v-pwJB7-_ZRHPxHzKRJZ22xEPW7z8a=g...@mail.gmail.com/


Thanks,
SJ

> 
> Changes from v2
> (https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/)
> - Keep the behavioral change of v1
> - Update blkfront's counterpart to follow the changed behavior
> - Update documents for the changed behavior
> 
> Changes from v1
> (https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/)
> - Avoid the behavioral change
>   (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/)
> - Rebase on latest xen/tip/linux-next
> - Re-work by SeongJae Park 
> - Cc stable@
> 
> 
> 
> Maximilian Heyne (1):
>   xen, blkback: fix persistent grants negotiation
> 
> SeongJae Park (1):
>   xen-blkfront: Apply 'feature_persistent' parameter when connect
> 
>  Documentation/ABI/testing/sysfs-driver-xen-blkback  | 2 +-
>  Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
>  drivers/block/xen-blkback/xenbus.c  | 9 +++--
>  drivers/block/xen-blkfront.c| 4 +---
>  4 files changed, 6 insertions(+), 11 deletions(-)
> 
> -- 
> 2.25.1



Re: [PATCH v3 0/2] Fix persistent grants negotiation with a behavior change

2022-07-15 Thread SeongJae Park
Hi all,

On Fri, 15 Jul 2022 18:12:26 + SeongJae Park  wrote:

> Hi all,
> 
> On Fri, 15 Jul 2022 17:55:19 + SeongJae Park  wrote:
> 
> > The first patch of this patchset fixes 'feature_persistent' parameter
> > handling in 'blkback' to always respect the frontend's persistent grants
> > support.  The fix makes a behavioral change, so the second patch makes
> > the counterpart of 'blkfront' consistently follow the behavior change.
> 
> I made the behavior change as requested by Andrii[1].  I therefore made a
> similar behavior change to blkfront and Cc-ed stable for the second
> change, too.

Now I realize that commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") introduced two issues.  One is what Max
reported with his patch, and the second one is an unintended behavioral
change that broke Andrii's use case.

That is, Andrii's use case should have had no problem at all before the
introduction of 'feature_persistent', as at that time 'blkback' checked
whether the frontend supports persistent grants on every reconnect and
enabled the feature if so.  However, the introduction of the parameter made
it behave differently.

Yes, we intended the parameter to take effect only for newly created devices.
But, as it breaks user workflows, this should be fixed.  The same goes for
the 'blkfront'-side 'feature_persistent'.

> 
> To make the change history clear and reduce the stable side overhead,
> however, it might be better to apply v2, which doesn't make the behavior
> change but only fixes the issue, Cc stable@ for it, make the behavior
> change commits for both blkback and blkfront, update the documents, and
> not Cc stable@ for the behavior change and document update commits.

I'd say having one patch for each issue would be the right way to go, and all
fixes should Cc stable@.

> 
> One downside of that would be that it will make a behavioral difference
> between pre-5.19.x and post-5.19.x.

The unintended behavioral fix should also be considered a fix and therefore
should be merged into stable@, so the above concern is not valid.

I will send the next spin soon.


Thanks,
SJ

[...]



[PATCH v4 2/3] xen-blkback: Apply 'feature_persistent' parameter when connect

2022-07-15 Thread SeongJae Park
From: Maximilian Heyne 

In some use cases[1], the backend is created while the frontend doesn't
support the persistent grants feature, but later the frontend can be
changed to support the feature and reconnect.  In the past, 'blkback'
enabled the persistent grants feature in such cases, since it
unconditionally checked whether the frontend supports the persistent
grants feature on every connect ('connect_ring()') and decided whether
it should use persistent grants or not.

However, commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") has mistakenly changed the behavior.
It made the frontend feature support check not be repeated once
'feature_persistent' was found to be 'false', i.e., once the frontend
was seen to not support persistent grants.

This commit changes the behavior of the parameter to take effect on
every connect, so that the previous workflow can work again as expected.

[1] https://lore.kernel.org/xen-devel/CAJwUmVB6H3iTs-C+U=v-pwJB7-_ZRHPxHzKRJZ22xEPW7z8a=g...@mail.gmail.com/

Reported-by: Andrii Chepurnyi 
Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Signed-off-by: Maximilian Heyne 
Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkback | 2 +-
 drivers/block/xen-blkback/xenbus.c | 9 +++--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 7faf719af165..fac0f429a869 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -42,5 +42,5 @@ KernelVersion:  5.10
 Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
-that this option only takes effect on newly created backends.
+that this option only takes effect on newly connected backends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 16c6785d260c..ee7ad2fb432d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -186,8 +186,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
__module_get(THIS_MODULE);
INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
-   blkif->vbd.feature_gnt_persistent = feature_persistent;
-
return blkif;
 }
 
@@ -1086,10 +1084,9 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   if (blkif->vbd.feature_gnt_persistent)
-   blkif->vbd.feature_gnt_persistent =
-   xenbus_read_unsigned(dev->otherend,
-   "feature-persistent", 0);
+
+   blkif->vbd.feature_gnt_persistent = feature_persistent &&
+   xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
 
blkif->vbd.overflow_max_grants = 0;
 
-- 
2.25.1




[PATCH v4 3/3] xen-blkfront: Apply 'feature_persistent' parameter when connect

2022-07-15 Thread SeongJae Park
In some use cases[1], the backend is created while the frontend doesn't
support the persistent grants feature, but later the frontend can be
changed to support the feature and reconnect.  In the past, 'blkback'
enabled the persistent grants feature in such cases, since it
unconditionally checked whether the frontend supports the persistent
grants feature on every connect ('connect_ring()') and decided whether
it should use persistent grants or not.

However, commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") has mistakenly changed the behavior.
It made the frontend feature support check not be repeated once
'feature_persistent' was found to be 'false', i.e., once the frontend
was seen to not support persistent grants.

A similar behavioral change was made to 'blkfront' by commit
74a852479c68 ("xen-blkfront: add a parameter for disabling of
persistent grants").  This commit changes the behavior of the parameter
to take effect on every connect, so that the previous behavior of
'blkfront' can be restored.

[1] https://lore.kernel.org/xen-devel/CAJwUmVB6H3iTs-C+U=v-pwJB7-_ZRHPxHzKRJZ22xEPW7z8a=g...@mail.gmail.com/

Fixes: 74a852479c68 ("xen-blkfront: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Signed-off-by: SeongJae Park 
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkfront.c| 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 7f646c58832e..4d36c5a10546 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -15,5 +15,5 @@ KernelVersion:  5.10
 Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
-that this option only takes effect on newly created frontends.
+that this option only takes effect on newly connected frontends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 3646c0cae672..4e763701b372 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1988,8 +1988,6 @@ static int blkfront_probe(struct xenbus_device *dev,
info->vdevice = vdevice;
info->connected = BLKIF_STATE_DISCONNECTED;
 
-   info->feature_persistent = feature_persistent;
-
/* Front end dir is a number, which is used as the id. */
info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
dev_set_drvdata(&dev->dev, info);
@@ -2283,7 +2281,7 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
blkfront_setup_discard(info);
 
-   if (info->feature_persistent)
+   if (feature_persistent)
info->feature_persistent =
!!xenbus_read_unsigned(info->xbdev->otherend,
   "feature-persistent", 0);
-- 
2.25.1




[PATCH v4 1/3] xen-blkback: fix persistent grants negotiation

2022-07-15 Thread SeongJae Park
The persistent grants feature can be used only when both the backend
and the frontend support it.  The feature was always supported by
'blkback', but commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") has introduced a parameter for
disabling it at runtime.

To avoid the parameter being updated while in use by 'blkback', the
commit caches the parameter into 'vbd->feature_gnt_persistent' in
'xen_vbd_create()', then checks whether the guest also supports the
feature, and finally updates the field in 'connect_ring()'.

However, 'connect_ring()' could be called before 'xen_vbd_create()', so
a later execution of 'xen_vbd_create()' can wrongly overwrite
'vbd->feature_gnt_persistent' with the parameter value.  As a result,
'blkback' could try to use the 'persistent grants' feature even if the
guest doesn't support the feature.

This commit fixes the issue by moving the parameter value caching to
'xen_blkif_alloc()', which allocates the 'blkif'.  Because the struct
embeds the 'vbd' object that 'connect_ring()' will use later,
'xen_blkif_alloc()' is guaranteed to run before 'connect_ring()' and is
therefore the right and safe place to do the caching.

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc:  # 5.10.x
Signed-off-by: Maximilian Heyne 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 97de13b14175..16c6785d260c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -157,6 +157,11 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
return 0;
 }
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent, "Enables the persistent grants feature");
+
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
struct xen_blkif *blkif;
@@ -181,6 +186,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
__module_get(THIS_MODULE);
INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
+   blkif->vbd.feature_gnt_persistent = feature_persistent;
+
return blkif;
 }
 
@@ -472,12 +479,6 @@ static void xen_vbd_free(struct xen_vbd *vbd)
vbd->bdev = NULL;
 }
 
-/* Enable the persistent grants feature. */
-static bool feature_persistent = true;
-module_param(feature_persistent, bool, 0644);
-MODULE_PARM_DESC(feature_persistent,
-   "Enables the persistent grants feature");
-
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
  unsigned major, unsigned minor, int readonly,
  int cdrom)
@@ -520,8 +521,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
if (bdev_max_secure_erase_sectors(bdev))
vbd->discard_secure = true;
 
-   vbd->feature_gnt_persistent = feature_persistent;
-
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
-- 
2.25.1




[PATCH v4 0/3] xen-blk{back,front}: Fix two bugs in 'feature_persistent'

2022-07-15 Thread SeongJae Park
The introduction of 'feature_persistent' created two bugs.  The first one is
a wrong overwrite of 'vbd->feature_gnt_persistent' in 'blkback' due to a
wrong parameter value caching position, and the second one is an unintended
behavioral change that could break previously working dynamic
frontend/backend persistent feature negotiation.  This patchset fixes the
issues.

Changes from v3
(https://lore.kernel.org/xen-devel/20220715175521.126649-1...@kernel.org/)
- Split 'blkback' patch for each of the two issues
- Add 'Reported-by: Andrii Chepurnyi '

Changes from v2
(https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/)
- Keep the behavioral change of v1
- Update blkfront's counterpart to follow the changed behavior
- Update documents for the changed behavior

Changes from v1
(https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/)
- Avoid the behavioral change
  (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/)
- Rebase on latest xen/tip/linux-next
- Re-work by SeongJae Park 
- Cc stable@

Maximilian Heyne (1):
  xen-blkback: Apply 'feature_persistent' parameter when connect

SeongJae Park (2):
  xen-blkback: fix persistent grants negotiation
  xen-blkfront: Apply 'feature_persistent' parameter when connect

 .../ABI/testing/sysfs-driver-xen-blkback  |  2 +-
 .../ABI/testing/sysfs-driver-xen-blkfront |  2 +-
 drivers/block/xen-blkback/xenbus.c| 20 ---
 drivers/block/xen-blkfront.c  |  4 +---
 4 files changed, 11 insertions(+), 17 deletions(-)

-- 
2.25.1




[PATCH v2] xen-blk{back,front}: Update contact points for buffer_squeeze_duration_ms and feature_persistent

2022-04-20 Thread SeongJae Park
SeongJae is currently listed as a contact point for some blk{back,front}
features, but he will not work on XEN for a while.  This commit
therefore updates the contact point to his colleague, Maximilian, who
understands the context and is actively working with the features now.

Signed-off-by: SeongJae Park 
Signed-off-by: Maximilian Heyne 
Acked-by: Roger Pau Monné 
---

Changes from v1
(https://lore.kernel.org/xen-devel/20220301144628.2858-1...@kernel.org/)
- Add Acked-by from Roger

 Documentation/ABI/testing/sysfs-driver-xen-blkback  | 4 ++--
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index a74dfe52dd76..7faf719af165 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -29,7 +29,7 @@ Description:
 What:   /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
 Date:   December 2019
 KernelVersion:  5.6
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 When memory pressure is reported to blkback this option
 controls the duration in milliseconds that blkback will not
@@ -39,7 +39,7 @@ Description:
 What:   /sys/module/xen_blkback/parameters/feature_persistent
 Date:   September 2020
 KernelVersion:  5.10
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
 that this option only takes effect on newly created backends.
diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 61fd173fabfe..7f646c58832e 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -12,7 +12,7 @@ Description:
 What:   /sys/module/xen_blkfront/parameters/feature_persistent
 Date:   September 2020
 KernelVersion:  5.10
-Contact:SeongJae Park 
+Contact:Maximilian Heyne 
 Description:
 Whether to enable the persistent grants feature or not.  Note
 that this option only takes effect on newly created frontends.
-- 
2.25.1




[Xen-devel] [PATCH] xen/blkback: Avoid unmapping unmapped grant pages

2019-11-26 Thread SeongJae Park
From: SeongJae Park 

For each I/O request, blkback first maps the foreign pages for the
request to its local pages.  If an allocation of a local page for the
mapping fails, it should unmap every mapping already made for the
request.

However, blkback's handling mechanism for the allocation failure does
not mark the remaining foreign pages as unmapped.  Therefore, the unmap
function merely tries to unmap every valid grant page for the request,
including the pages not mapped due to the allocation failure.  On a
system that fails the allocation frequently, this problem leads to the
following kernel crash.

  [  372.012538] BUG: unable to handle kernel NULL pointer dereference at 0001
  [  372.012546] IP: [] gnttab_unmap_refs.part.7+0x1c/0x40
  [  372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
  [  372.012562] Oops: 0002 [#1] SMP
  [  372.012566] Modules linked in: act_police sch_ingress cls_u32
  ...
  [  372.012746] Call Trace:
  [  372.012752]  [] gnttab_unmap_refs+0x34/0x40
  [  372.012759]  [] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
  ...
  [  372.012802]  [] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
  ...
  Decompressing Linux... Parsing ELF... done.
  Booting the kernel.
  [0.00] Initializing cgroup subsys cpuset

This commit fixes this problem by marking the grant pages of the given
request that weren't mapped due to the allocation failure as invalid.
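
To illustrate the idea, here is a minimal userspace sketch (only
BLKBACK_INVALID_HANDLE and the skip check mirror the driver; everything
else is simplified for illustration):

#include <stdio.h>

#define BLKBACK_INVALID_HANDLE	(~0U)
#define NUM_PAGES		4

/* simplified stand-in for the unmap path: skip never-mapped slots */
static void unmap_pages(unsigned int handles[], int num)
{
	for (int i = 0; i < num; i++) {
		if (handles[i] == BLKBACK_INVALID_HANDLE)
			continue;	/* never mapped: nothing to undo */
		printf("unmapping handle %u\n", handles[i]);
	}
}

int main(void)
{
	unsigned int handles[NUM_PAGES] = { 7, 8, 0, 0 };
	int last_map = 2;	/* page allocation failed after two maps */

	/* the fix: mark the never-mapped tail of the request as invalid */
	for (int i = last_map; i < NUM_PAGES; i++)
		handles[i] = BLKBACK_INVALID_HANDLE;

	/*
	 * Without the marking above, the stale slots (here 0) would be
	 * treated as valid grant handles by the unmap path, which in the
	 * driver ends in the crash shown above.
	 */
	unmap_pages(handles, NUM_PAGES);
	return 0;
}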

Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")

Signed-off-by: SeongJae Park 
Reviewed-by: David Woodhouse 
Reviewed-by: Maximilian Heyne 
Reviewed-by: Paul Durrant 
---
 drivers/block/xen-blkback/blkback.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..3666afa639d1 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -936,6 +936,8 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
 out_of_memory:
pr_alert("%s: out of memory\n", __func__);
put_free_pages(ring, pages_to_gnt, segs_to_map);
+   for (i = last_map; i < num; i++)
+   pages[i]->handle = BLKBACK_INVALID_HANDLE;
return -ENOMEM;
 }
 
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 0/2] xen/blkback: Aggressively shrink page pools if a memory pressure is detected

2019-12-04 Thread SeongJae Park
In short, even worst-case aggressive pool shrinking makes no visible
performance degradation.  I think this is due to the slow speed of the
I/O.  In other words, the additional page allocation overhead is hidden
under the much slower I/O time.

SeongJae Park (2):
  xen/blkback: Aggressively shrink page pools if a memory pressure is
detected
  blkback: Add a module parameter for aggressive pool shrinking duration

 drivers/block/xen-blkback/blkback.c | 35 +++--
 1 file changed, 33 insertions(+), 2 deletions(-)

-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/2] xen/blkback: Aggressively shrink page pools if a memory pressure is detected

2019-12-04 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  If the current I/O requests handling is finished or
100 milliseconds has passed since the last I/O requests handling, it
checks and shrinks the pool to not exceed the size limit,
`max_buffer_pages`.

Therefore, `blkfront` running guests can cause memory pressure in the
`blkback` running guest by attaching an arbitrarily large number of
block devices and inducing I/O.  This commit avoids such problematic
situations by shrinking the pools aggressively (beyond the limit) for a
while (one millisecond) if memory pressure is detected.

Discussions
===

The shrinking mechanism returns to the system only the pages in the
pool which are not currently being used by blkback.  In other words,
the pages that will be shrunk are not mapped with foreign pages.
Because this commit is changing only the shrink limit but uses the
shrinking mechanism as is, it does not introduce security issues such
as improper unmappings.

This commit keeps the aggressive shrinking limit for one millisecond
from the time the last memory pressure was detected.  The duration
should be neither too short nor too long.  If it is too long, the free
pages pool shrinking overhead can reduce the I/O performance.  If it is
too short, blkback will not free enough pages to reduce the memory
pressure.  I believe that one millisecond is a short duration in terms
of I/O while it is a long duration in terms of memory operations.
Also, as the original shrinking mechanism works every 100 milliseconds,
this 1 millisecond could be a somewhat reasonable choice.  Also, this
duration worked well in our testing environment simulating the memory
pressure situation (described in detail below).
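
The squeeze window can be sketched as below (illustrative kernel-style
pseudo-code; the helper name buffer_pages_limit() is assumed for this
sketch, while xen_blk_mem_pressure_end and max_buffer_pages come from
the patch):

/* while within the pressure window, squeeze the free pages pool */
static unsigned int buffer_pages_limit(void)
{
	if (time_before(jiffies, xen_blk_mem_pressure_end))
		return 0;		/* aggressive shrinking */
	return max_buffer_pages;	/* normal limit */
}

The shrinker callback only refreshes xen_blk_mem_pressure_end; the
actual freeing still happens on the existing shrink path described
above.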

Memory Pressure Test


To show whether this commit fixes the above-mentioned memory pressure
situation well, I configured a test environment.  On the `blkfront`
running guest instances of a virtualized environment, I attach an
arbitrarily large number of network-backed volume devices and induce
I/O to those.  Meanwhile, I measure the number of pages swapped in and
out on the `blkback` running guest.  The test ran twice, once for the
`blkback` before this commit and once for that after this commit.

Roughly speaking, this commit has reduced those numbers 130x (pswpin)
and 34x (pswpout) as below:

        pswpin  pswpout
before  76,672  185,799
after   587     5,402

Performance Overhead Test
=

This commit could incur I/O performance degradation under memory
pressure because the aggressive shrinking will require more page
allocations.  To show the overhead, I artificially made an aggressive
pages pool shrinking situation and measured the I/O performance of a
`blkfront` running guest.

For the artificial shrinking, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  We set
the value to `1024` and `0`.  The `1024` is the default value.  Setting
the value to `0` incurs the worst-case aggressive shrinking stress.

For the I/O performance measurement, I use a simple `dd` command.

Default Performance
---

[dom0]# echo 1024 >  /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8702 s, 38.7 MB/s

Worst-case Performance
--

[dom0]# echo 0 >  /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s

In short, even worst-case aggressive pool shrinking makes no visible
performance degradation.  I think this is due to the slow speed of the
I/O.  In other words, the additional page allocation overhead is hidden
under the much slower I/O time.

[Xen-devel] [PATCH 2/2] blkback: Add a module parameter for aggressive pool shrinking duration

2019-12-04 Thread SeongJae Park
From: SeongJae Park 

As discussed in the previous commit ("xen/blkback: Aggressively shrink
page pools if a memory pressure is detected"), the aggressive pool
shrinking duration should be carefully selected:
``If it is too long, free pages pool shrinking overhead can reduce the
I/O performance.  If it is too short, blkback will not free enough pages
to reduce the memory pressure.``

That said, the proper duration would depend on the given configurations
and workloads.  For that reason, this commit allows users to set it via
a module parameter interface.

Signed-off-by: SeongJae Park 
Suggested-by: Amit Shah 
---
 drivers/block/xen-blkback/blkback.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index aa1a127093e5..88c011300ee9 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -137,9 +137,13 @@ module_param(log_stats, int, 0644);
 
 /*
  * Once a memory pressure is detected, keep aggressive shrinking of the free
- * page pools for this time (msec)
+ * page pools for this time (milliseconds)
  */
-#define AGGRESSIVE_SHRINKING_DURATION  1
+static int xen_blkif_aggressive_shrinking_duration = 1;
+module_param_named(aggressive_shrinking_duration,
+   xen_blkif_aggressive_shrinking_duration, int, 0644);
+MODULE_PARM_DESC(aggressive_shrinking_duration,
+"Duration to do aggressive shrinking when a memory pressure is detected");
 
 static unsigned long xen_blk_mem_pressure_end;
 
@@ -147,7 +151,7 @@ static unsigned long blkif_shrink_count(struct shrinker *shrinker,
struct shrink_control *sc)
 {
xen_blk_mem_pressure_end = jiffies +
-   msecs_to_jiffies(AGGRESSIVE_SHRINKING_DURATION);
+   msecs_to_jiffies(xen_blkif_aggressive_shrinking_duration);
return 0;
 }
 
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure

2019-12-09 Thread SeongJae Park
Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  If the current I/O requests handling is finished or
100 milliseconds has passed since the last I/O requests handling, it
checks and shrinks the pool to not exceed the size limit,
`max_buffer_pages`.

Therefore, `blkfront` running guests can cause memory pressure in the
`blkback` running guest by attaching a large number of block devices
and inducing I/O.  System administrators can avoid such problematic
situations by limiting the maximum number of devices each guest can
attach.  However, finding the optimal limit is not so easy.  An
improperly set limit can result in memory pressure or resource
underutilization.  This commit avoids such problematic situations by
squeezing the pools (returning every free page in the pool to the
system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.


Base Version


This patch is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_aggressive_shrinking_v3


Patch History
-

Changes from v2 
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity (aggressive
   shrinking -> squeezing)

Changes from v1 
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily` (suggested
   by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (1):
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c | 35 +++--
 1 file changed, 33 insertions(+), 2 deletions(-)

-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 1/1] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-09 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  If the current I/O requests handling is finished or
100 milliseconds has passed since the last I/O requests handling, it
checks and shrinks the pool to not exceed the size limit,
`max_buffer_pages`.

Therefore, `blkfront` running guests can cause memory pressure in the
`blkback` running guest by attaching a large number of block devices
and inducing I/O.  System administrators can avoid such problematic
situations by limiting the maximum number of devices each guest can
attach.  However, finding the optimal limit is not so easy.  An
improperly set limit can result in memory pressure or resource
underutilization.  This commit avoids such problematic situations by
squeezing the pools (returning every free page in the pool to the
system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.

Discussions
===

The `blkback`'s original shrinking mechanism returns to the system only
the pages in the pool which are not currently being used by `blkback`.
In other words, the pages are not mapped with foreign pages.  Because
this commit changes only the shrink limit but uses the mechanism as is,
it does not introduce security issues related to improper mappings.

Once a memory pressure is detected, this commit keeps the squeezing
limit for a user-specified time duration.  The duration should be
neither too long nor too short.  If it is too long, the squeezing
incurring overhead can reduce the I/O performance.  If it is too short,
`blkback` will not free enough pages to reduce the memory pressure.
This commit sets the value as `10 milliseconds` by default because it is
a short time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism works for at
least every 100 milliseconds, this could be a somewhat reasonable
choice.  I also tested other durations (refer to the below section for
more details) and confirmed that 10 milliseconds is the one that works
best with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set it to their optimal value via the module parameter.
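
For example, the duration could be adjusted at runtime as below (shown
for illustration; the parameter file name here follows the interface as
eventually documented, and may differ in this version of the patch):

[dom0]# echo 10 > /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms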

Memory Pressure Test


To show how this commit fixes the memory pressure situation well, I
configured a test environment on a xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages swapped in and out on the `blkback`
running guest.  The test ran twice, once for the `blkback` before this
commit and once for that after this commit.  As shown below, this commit
has dramatically reduced the memory pressure:

        pswpin  pswpout
before  76,672  185,799
after   212     3,325

Optimal Aggressive Shrinking Duration
-

To find a best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

duration    pswpin  pswpout
1           852     6,424
10          212     3,325
100         203     3,340

As expected, the memory pressure has decreased as the duration is
increased, but the reduction stopped from `10ms` on.  Based on these
results, I chose the default duration as 10ms.

Performance Overhead Test
=

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  We set
the value to `1024` and `0`.  The `1024` is the default value.  Setting
the value to `0` is the same as a situation where the squeezing is
always applied (the worst case).

For the I/O performance measurement, I use a simple `dd` command.

Default Performance
---

[dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
131072+0 records in
131072+0 records out
5368709

Re: [Xen-devel] [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure

2019-12-09 Thread SeongJae Park
On   Mon, 9 Dec 2019 10:39:02 +0100  Juergen  wrote:

>On 09.12.19 09:58, SeongJae Park wrote:
>> Each `blkif` has a free pages pool for the grant mapping.  The size of
>> the pool starts from zero and is increased on demand while processing
>> the I/O requests.  If current I/O requests handling is finished or 100
>> milliseconds has passed since last I/O requests handling, it checks and
>> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
>>
>> Therefore, `blkfront` running guests can cause a memory pressure in the
>> `blkback` running guest by attaching a large number of block devices and
>> inducing I/O.
>
>I'm having problems to understand how a guest can attach a large number
>of block devices without those having been configured by the host admin
>before.
>
>If those devices have been configured, dom0 should be ready for that
>number of devices, e.g. by having enough spare memory area for ballooned
>pages.

As mentioned in the original message below, administrators _can_ avoid this
problem, but finding the optimal configuration is hard, especially if the
number of guests is large.

    System administrators can avoid such problematic situations by limiting
    the maximum number of devices each guest can attach.  However, finding
    the optimal limit is not so easy.  An improperly set limit can result
    in memory pressure or resource underutilization.


Thanks,
SeongJae Park

>
>So either I'm missing something here or your reasoning for the need of
>the patch is wrong.
>
>
>Juergen
>

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure

2019-12-09 Thread SeongJae Park
On Mon, 9 Dec 2019 11:15:22 +0100 "Jürgen Groß"  wrote:

>On 09.12.19 10:46, Durrant, Paul wrote:
>>> -Original Message-
>>> From: Jürgen Groß 
>>> Sent: 09 December 2019 09:39
>>> To: Park, Seongjae ; ax...@kernel.dk;
>>> konrad.w...@oracle.com; roger@citrix.com
>>> Cc: linux-bl...@vger.kernel.org; linux-ker...@vger.kernel.org; Durrant,
>>> Paul ; sj38.p...@gmail.com; xen-
>>> de...@lists.xenproject.org
>>> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory
>>> pressure
>>>
>>> On 09.12.19 09:58, SeongJae Park wrote:
>>>> Each `blkif` has a free pages pool for the grant mapping.  The size of
>>>> the pool starts from zero and is increased on demand while processing
>>>> the I/O requests.  If current I/O requests handling is finished or 100
>>>> milliseconds has passed since last I/O requests handling, it checks and
>>>> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
>>>>
>>>> Therefore, `blkfront` running guests can cause a memory pressure in the
>>>> `blkback` running guest by attaching a large number of block devices and
>>>> inducing I/O.
>>>
>>> I'm having problems to understand how a guest can attach a large number
>>> of block devices without those having been configured by the host admin
>>> before.
>>>
>>> If those devices have been configured, dom0 should be ready for that
>>> number of devices, e.g. by having enough spare memory area for ballooned
>>> pages.
>>>
>>> So either I'm missing something here or your reasoning for the need of
>>> the patch is wrong.
>>>
>>
>> I think the underlying issue is that persistent grant support is hogging 
>> memory in the backends, thereby compromising scalability. IIUC this patch is 
>> essentially a band-aid to get back to the scalability that was possible 
>> before persistent grant support was added. Ultimately the right answer 
>> should be to get rid of persistent grants support and use grant copy, but 
>> such a change is clearly more invasive and would need far more testing.
>
>Persistent grants are hogging ballooned pages, which is equivalent to
>memory only in case of the backend's domain memory being equal or
>rather near to its max memory size.
>
>So configuring the backend domain with enough spare area for ballooned
>pages should make this problem much less serious.
>
>Another problem in this area is the amount of maptrack frames configured
>for a driver domain, which will limit the number of concurrent foreign
>mappings of that domain.

Right, similar problems from other backends are possible.

>
>So instead of having a blkback specific solution I'd rather have a
>common callback for backends to release foreign mappings in order to
>enable a global resource management.

This patch is also based on a common callback, namely the shrinker callback
system.  As the shrinker callback is designed for general memory pressure
handling, I thought it is the right one to use.  Other backends having
similar problems can use it in their own way.


Thanks,
SeongJae Park


>
>
>Juergen
>

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure

2019-12-09 Thread SeongJae Park
On Mon, 9 Dec 2019 12:08:10 +0100 "Jürgen Groß"  wrote:

>On 09.12.19 11:52, SeongJae Park wrote:
>> On Mon, 9 Dec 2019 11:15:22 +0100 "Jürgen Groß"  wrote:
>>
>>> On 09.12.19 10:46, Durrant, Paul wrote:
>>>>> -Original Message-
>>>>> From: Jürgen Groß 
>>>>> Sent: 09 December 2019 09:39
>>>>> To: Park, Seongjae ; ax...@kernel.dk;
>>>>> konrad.w...@oracle.com; roger@citrix.com
>>>>> Cc: linux-bl...@vger.kernel.org; linux-ker...@vger.kernel.org; Durrant,
>>>>> Paul ; sj38.p...@gmail.com; xen-
>>>>> de...@lists.xenproject.org
>>>>> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory
>>>>> pressure
>>>>>
>>>>> On 09.12.19 09:58, SeongJae Park wrote:
>>>>>> Each `blkif` has a free pages pool for the grant mapping.  The size of
>>>>>> the pool starts from zero and is increased on demand while processing
>>>>>> the I/O requests.  If current I/O requests handling is finished or 100
>>>>>> milliseconds has passed since last I/O requests handling, it checks and
>>>>>> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
>>>>>>
>>>>>> Therefore, `blkfront` running guests can cause a memory pressure in the
>>>>>> `blkback` running guest by attaching a large number of block devices and
>>>>>> inducing I/O.
>>>>>
>>>>> I'm having problems to understand how a guest can attach a large number
>>>>> of block devices without those having been configured by the host admin
>>>>> before.
>>>>>
>>>>> If those devices have been configured, dom0 should be ready for that
>>>>> number of devices, e.g. by having enough spare memory area for ballooned
>>>>> pages.
>>>>>
>>>>> So either I'm missing something here or your reasoning for the need of
>>>>> the patch is wrong.
>>>>>
>>>>
>>>> I think the underlying issue is that persistent grant support is hogging 
>>>> memory in the backends, thereby compromising scalability. IIUC this patch 
>>>> is essentially a band-aid to get back to the scalability that was possible 
>>>> before persistent grant support was added. Ultimately the right answer 
>>>> should be to get rid of persistent grants support and use grant copy, but 
>>>> such a change is clearly more invasive and would need far more testing.
>>>
>>> Persistent grants are hogging ballooned pages, which is equivalent to
>>> memory only in case of the backend's domain memory being equal or
>>> rather near to its max memory size.
>>>
>>> So configuring the backend domain with enough spare area for ballooned
>>> pages should make this problem much less serious.
>>>
>>> Another problem in this area is the amount of maptrack frames configured
>>> for a driver domain, which will limit the number of concurrent foreign
>>> mappings of that domain.
>>
>> Right, similar problems from other backends are possible.
>>
>>>
>>> So instead of having a blkback specific solution I'd rather have a
>>> common callback for backends to release foreign mappings in order to
>>> enable a global resource management.
>>
>> This patch is also based on a common callback, namely the shrinker callback
>> system.  As the shrinker callback is designed for general memory pressure
>> handling, I thought it is the right one to use.  Other backends having
>> similar problems can use it in their own way.
>
> But this is addressing memory shortage only and it is acting globally.
>
> What I'd like to have in some (maybe distant) future is a way to control
> resource usage per guest. Why would you want to throttle performance of
> all guests instead of only the one causing the pain by hogging lots of
> resources?

Good point.  I was also concerned about the performance fairness at first, but
settled on this ugly but simple solution mainly because my worst-case
performance test (detailed in the 1st patch's commit msg) shows no visible
performance degradation, though it is a minimal test on my test environment.

Anyway, I agree with your future direction.

>
> The new backend callback should (IMO) have a domid as parameter for
> specifying which guest should be taken away resources (including the
> possibility to select "any d

[Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-09 Thread SeongJae Park
From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback
to 'xenbus_driver'.  Using this facility, 'xenbus' would be able to
monitor memory pressure and request the specific domains of the
specific backend drivers causing the given pressure to voluntarily
release their memory.

That said, this commit simply requests every driver that registered the
callback to release its memory for every domain, rather than issuing
the requests only to the drivers and domains in charge.  Such targeting
would be future work.  Also, this commit focuses on memory only.
However, it could be extended for general resources.
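
As an illustration of how a backend driver would hook into this (not
part of this patch; the callback body is assumed, and the other fields
are shown only to sketch a typical 'xenbus_driver' definition):

/* hypothetical blkback-side usage of the new 'reclaim' callback */
static unsigned int blkback_reclaim(struct xenbus_device *dev, domid_t domid)
{
	/*
	 * Release reclaimable memory (e.g. free pages pools) used for
	 * serving 'domid'; DOMID_INVALID would mean "any domain" here.
	 */
	return 0;
}

static struct xenbus_driver xen_blkbk_driver = {
	.ids			= xen_blkbk_ids,
	.probe			= xen_blkbk_probe,
	.remove			= xen_blkbk_remove,
	.otherend_changed	= frontend_changed,
	.reclaim		= blkback_reclaim,
};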

Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..cd5fd1cd8de3 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
return NOTIFY_DONE;
 }
 
+static int xenbus_backend_reclaim(struct device *dev, void *data)
+{
+   struct xenbus_driver *drv;
+   if (!dev->driver)
+   return -ENOENT;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim)
+   drv->reclaim(to_xenbus_device(dev), DOMID_INVALID);
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   xenbus_backend_reclaim);
+   return 0;
+}
+
+static struct shrinker xenbus_backend_shrinker = {
+   .count_objects = xenbus_backend_shrink_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&xenbus_backend_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..52aaf4f78400 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   unsigned (*reclaim)(struct xenbus_device *dev, domid_t domid);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v4 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-09 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  If the current I/O requests handling is finished or
100 milliseconds has passed since the last I/O requests handling, it
checks and shrinks the pool to not exceed the size limit,
`max_buffer_pages`.

Therefore, `blkfront` running guests can cause memory pressure in the
`blkback` running guest by attaching a large number of block devices
and inducing I/O.  System administrators can avoid such problematic
situations by limiting the maximum number of devices each guest can
attach.  However, finding the optimal limit is not so easy.  An
improperly set limit can result in memory pressure or resource
underutilization.  This commit avoids such problematic situations by
squeezing the pools (returning every free page in the pool to the
system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns only pages in the
pool that are not currently used by `blkback` to the system.  In other
words, the pages that are not mapped with granted pages.  Because this
commit is changing only the shrink limit but still uses the same freeing
mechanism, it does not introduce security issues related to improper
mappings.

Once memory pressure is detected, this commit keeps the squeezing
limit for a user-specified time duration.  The duration should be
neither too long nor too short.  If it is too long, the squeezing
incurring overhead can reduce the I/O performance.  If it is too short,
`blkback` will not free enough pages to reduce the memory pressure.
This commit sets the value as `10 milliseconds` by default because it is
a short time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism works for at
least every 100 milliseconds, this could be a somewhat reasonable
choice.  I also tested other durations (refer to the below section for
more details) and confirmed that 10 milliseconds is the one that works
best with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set the duration to their optimal value via the module parameter.
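
In code terms, the mechanism described above boils down to the two hunks
sketched below.  The variable names follow the later revisions of this
series, and `shrink_free_pagepool()` is blkback's existing helper that
frees pool pages above the given limit:

    /* Window end, in jiffies; moved forward whenever pressure is seen. */
    static unsigned long buffer_squeeze_end;

    /* Reclaim callback: open (or extend) the squeezing window. */
    buffer_squeeze_end = jiffies +
            msecs_to_jiffies(buffer_squeeze_duration_ms);

    /* In the request handling loop: squeeze while the window is open. */
    if (time_before(jiffies, buffer_squeeze_end))
            shrink_free_pagepool(ring, 0);  /* return every free page */
    else
            shrink_free_pagepool(ring, max_buffer_pages);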

Memory Pressure Test
====================

To show how this commit fixes the memory pressure situation well, I
configured a test environment on a xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages that swapped in and out on the `blkback`
running guest.  The test ran twice, once for the `blkback` before this
commit and once for that after this commit.  As shown below, this commit
has dramatically reduced the memory pressure:

pswpin  pswpout
before  76,672  185,799
after   212     3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

duration    pswpin  pswpout
1   852 6,424
10  212 3,325
100 203 3,340

As expected, the memory pressure decreased as the duration increased,
but the reduction stopped at `10ms`.  Based on these results, I chose
the default duration as 10ms.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  I set
the value to `1024` and `0`.  `1024` is the default value.  Setting the
value to `0` is the same as always doing the squeezing (worst-case).

For the I/O performance measurement, I use a simple `dd` command.

Default Performance
-------------------

[dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k 
count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
131072+0 records in
131072+0 records out
5368709

[Xen-devel] [PATCH v4 0/2] xenbus/backend: Add a memory pressure handler callback

2019-12-09 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in
'xen-blkback' (patch 2).

Base Version
============

This patch is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v4


Patch History
-------------

Changes from v3 
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2 
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity (aggressive
   shrinking -> squeezing)

Changes from v1 
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily` (suggested
   by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (1):
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c | 35 +++--
 1 file changed, 33 insertions(+), 2 deletions(-)

SeongJae Park (2):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c   | 23 +++--
 drivers/block/xen-blkback/common.h|  1 +
 drivers/block/xen-blkback/xenbus.c|  3 ++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h  |  1 +
 5 files changed, 56 insertions(+), 3 deletions(-)

-- 
2.17.1



Re: [Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-09 Thread SeongJae Park
On Tue, Dec 10, 2019 at 7:11 AM Jürgen Groß  wrote:
>
> On 09.12.19 20:43, SeongJae Park wrote:
> > From: SeongJae Park 
> >
> > Granting pages consumes backend system memory.  In systems configured
> > with insufficient spare memory for those pages, it can cause a memory
> > pressure situation.  However, finding the optimal amount of the spare
> > memory is challenging for large systems having dynamic resource
> > utilization patterns.  Also, such a static configuration might lacks a
> > flexibility.
> >
> > To mitigate such problems, this commit adds a memory reclaim callback to
> > 'xenbus_driver'.  Using this facility, 'xenbus' would be able to monitor
> > a memory pressure and request specific domains of specific backend
> > drivers which causing the given pressure to voluntarily release its
> > memory.
> >
> > That said, this commit simply requests every callback registered driver
> > to release its memory for every domain, rather than issueing the
> > requests to the drivers and domain in charge.  Such things would be a
> > future work.  Also, this commit focuses on memory only.  However, it
> > would be ablt to be extended for general resources.
> >
> > Signed-off-by: SeongJae Park 
> > ---
> >   drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
> >   include/xen/xenbus.h  |  1 +
> >   2 files changed, 32 insertions(+)
> >
> > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> > b/drivers/xen/xenbus/xenbus_probe_backend.c
> > index b0bed4faf44c..cd5fd1cd8de3 100644
> > --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> > @@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct 
> > notifier_block *notifier,
> >   return NOTIFY_DONE;
> >   }
> >
> > +static int xenbus_backend_reclaim(struct device *dev, void *data)
> > +{
> > + struct xenbus_driver *drv;
> > + if (!dev->driver)
> > + return -ENOENT;
> > + drv = to_xenbus_driver(dev->driver);
> > + if (drv && drv->reclaim)
> > + drv->reclaim(to_xenbus_device(dev), DOMID_INVALID);
> > + return 0;
> > +}
> > +
> > +/*
> > + * Returns 0 always because we are using shrinker to only detect memory
> > + * pressure.
> > + */
> > +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
> > + struct shrink_control *sc)
> > +{
> > + bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
> > + xenbus_backend_reclaim);
> > + return 0;
> > +}
> > +
> > +static struct shrinker xenbus_backend_shrinker = {
> > + .count_objects = xenbus_backend_shrink_count,
> > + .seeks = DEFAULT_SEEKS,
> > +};
> > +
> >   static int __init xenbus_probe_backend_init(void)
> >   {
> >   static struct notifier_block xenstore_notifier = {
> > @@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void)
> >
> >   register_xenstore_notifier(&xenstore_notifier);
> >
> > + if (register_shrinker(&xenbus_backend_shrinker))
> > + pr_warn("shrinker registration failed\n");
> > +
> >   return 0;
> >   }
> >   subsys_initcall(xenbus_probe_backend_init);
> > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> > index 869c816d5f8c..52aaf4f78400 100644
> > --- a/include/xen/xenbus.h
> > +++ b/include/xen/xenbus.h
> > @@ -104,6 +104,7 @@ struct xenbus_driver {
> >   struct device_driver driver;
> >   int (*read_otherend_details)(struct xenbus_device *dev);
> >   int (*is_ready)(struct xenbus_device *dev);
> > + unsigned (*reclaim)(struct xenbus_device *dev, domid_t domid);
>
> Can you please add a comment here regarding semantics of specifying
> DOMID_INVALID as domid?

Yes, of course.  Will do with the next version.


Thanks,
SeongJae Park

>
> Block maintainers, would you be fine with me carrying this series
> through the Xen tree?
>
>
> Juergen


Re: [Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-09 Thread SeongJae Park
On Tue, Dec 10, 2019 at 7:23 AM Jürgen Groß  wrote:
>
> On 09.12.19 20:43, SeongJae Park wrote:
> > From: SeongJae Park 
> >
> > Granting pages consumes backend system memory.  In systems configured
> > with insufficient spare memory for those pages, it can cause a memory
> > pressure situation.  However, finding the optimal amount of the spare
> > memory is challenging for large systems having dynamic resource
> > utilization patterns.  Also, such a static configuration might lacks a
> > flexibility.
> >
> > To mitigate such problems, this commit adds a memory reclaim callback to
> > 'xenbus_driver'.  Using this facility, 'xenbus' would be able to monitor
> > a memory pressure and request specific domains of specific backend
> > drivers which causing the given pressure to voluntarily release its
> > memory.
> >
> > That said, this commit simply requests every callback registered driver
> > to release its memory for every domain, rather than issueing the
> > requests to the drivers and domain in charge.  Such things would be a
> > future work.  Also, this commit focuses on memory only.  However, it
> > would be ablt to be extended for general resources.
> >
> > Signed-off-by: SeongJae Park 
> > ---
> >   drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
> >   include/xen/xenbus.h  |  1 +
> >   2 files changed, 32 insertions(+)
> >
> > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> > b/drivers/xen/xenbus/xenbus_probe_backend.c
> > index b0bed4faf44c..cd5fd1cd8de3 100644
> > --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> > @@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct 
> > notifier_block *notifier,
> >   return NOTIFY_DONE;
> >   }
> >
> > +static int xenbus_backend_reclaim(struct device *dev, void *data)
> > +{
> > + struct xenbus_driver *drv;
> > + if (!dev->driver)
> > + return -ENOENT;
> > + drv = to_xenbus_driver(dev->driver);
> > + if (drv && drv->reclaim)
> > + drv->reclaim(to_xenbus_device(dev), DOMID_INVALID);
>
> Oh, sorry for first requesting you to add the domid as a parameter,
> but now I realize this could be handled in the xenbus driver, as
> struct xenbus_device already contains the otherend_id.
>
> Would you mind dropping the parameter again, please?

Oh, I also missed it!  Will do!


Thanks,
SeongJae Park

>
>
> Juergen


[Xen-devel] [PATCH v5 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-10 Thread SeongJae Park
ed, 13.8702 s, 38.7 MB/s

Worst-case Performance
----------------------

[dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k 
count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s

In short, even worst-case squeezing causes no visible performance
degradation.  I think this is due to the slow speed of the I/O.  In
other words, the additional page allocation overhead is hidden under the
much slower I/O latency.

Nevertheless, please note that this is just a very simple and minimal
test.

Reviewed-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 23 +--
 drivers/block/xen-blkback/common.h  |  1 +
 drivers/block/xen-blkback/xenbus.c  |  3 ++-
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..4d4dba7ea721 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -142,6 +142,22 @@ static inline bool persistent_gnt_timeout(struct 
persistent_gnt *persistent_gnt)
HZ * xen_blkif_pgrant_timeout);
 }
 
+/* Once a memory pressure is detected, squeeze free page pools for a while. */
+static int xen_blkif_buffer_squeeze_duration_ms = 10;
+module_param_named(buffer_squeeze_duration_ms,
+   xen_blkif_buffer_squeeze_duration_ms, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration_ms,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+static unsigned long xen_blk_buffer_squeeze_end;
+
+unsigned xen_blkbk_reclaim(struct xenbus_device *dev)
+{
+   xen_blk_buffer_squeeze_end = jiffies +
+   msecs_to_jiffies(xen_blkif_buffer_squeeze_duration_ms);
+   return 0;
+}
+
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
 {
unsigned long flags;
@@ -656,8 +672,11 @@ int xen_blkif_schedule(void *arg)
ring->next_lru = jiffies + 
msecs_to_jiffies(LRU_INTERVAL);
}
 
-   /* Shrink if we have more than xen_blkif_max_buffer_pages */
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   /* Shrink the free pages pool if it is too large. */
+   if (time_before(jiffies, xen_blk_buffer_squeeze_end))
+   shrink_free_pagepool(ring, 0);
+   else
+   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..c0334cda79fe 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -383,6 +383,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
 void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
+unsigned xen_blkbk_reclaim(struct xenbus_device *dev);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
  struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b90dbcd99c03..de49a09e6933 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -1115,7 +1115,8 @@ static struct xenbus_driver xen_blkbk_driver = {
.ids  = xen_blkbk_ids,
.probe = xen_blkbk_probe,
.remove = xen_blkbk_remove,
-   .otherend_changed = frontend_changed
+   .otherend_changed = frontend_changed,
+   .reclaim = xen_blkbk_reclaim
 };
 
 int xen_blkif_xenbus_init(void)
-- 
2.17.1



[Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-10 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  Using this facility, 'xenbus' would be able to monitor
memory pressure and request the specific devices of the specific backend
drivers which are causing the given pressure to voluntarily release
their memory.

That said, this commit simply requests every driver that registered the
callback to release its memory for every domain, rather than issuing the
requests only to the drivers and the domains in charge.  Such things
will be done in a future change.  Also, this commit focuses on memory
only; however, it could be extended for general resources.

Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..5a5ba29e39df 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int xenbus_backend_reclaim(struct device *dev, void *data)
+{
+   struct xenbus_driver *drv;
+   if (!dev->driver)
+   return -ENOENT;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim)
+   drv->reclaim(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   xenbus_backend_reclaim);
+   return 0;
+}
+
+static struct shrinker xenbus_backend_shrinker = {
+   .count_objects = xenbus_backend_shrink_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&xenbus_backend_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..cdb075e4182f 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   unsigned (*reclaim)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1



[Xen-devel] [PATCH v5 0/2] xenbus/backend: Add a memory pressure handler callback

2019-12-10 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in
'xen-blkback' (patch 2).

Base Version
============

This patch is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v5


Patch History
-------------

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen Gross)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity (aggressive
   shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily` (suggested
   by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (2):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c   | 23 +++--
 drivers/block/xen-blkback/common.h|  1 +
 drivers/block/xen-blkback/xenbus.c|  3 ++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h  |  1 +
 5 files changed, 56 insertions(+), 3 deletions(-)

-- 
2.17.1



Re: [Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-10 Thread SeongJae Park
On Tue, Dec 10, 2019 at 11:21 AM Roger Pau Monné  wrote:
>
> On Tue, Dec 10, 2019 at 11:16:35AM +0100, Roger Pau Monné wrote:
> > On Tue, Dec 10, 2019 at 08:06:27AM +0000, SeongJae Park wrote:
> > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> > > index 869c816d5f8c..cdb075e4182f 100644
> > > --- a/include/xen/xenbus.h
> > > +++ b/include/xen/xenbus.h
> > > @@ -104,6 +104,7 @@ struct xenbus_driver {
> > > struct device_driver driver;
> > > int (*read_otherend_details)(struct xenbus_device *dev);
> > > int (*is_ready)(struct xenbus_device *dev);
> > > +   unsigned (*reclaim)(struct xenbus_device *dev);
> >
> > ... hence I wonder why it's returning an unsigned when it's just
> > ignored.
> >
> > IMO it should return an int to signal errors, and the return should be
> > ignored.
>
> Meant to write 'shouldn't be ignored' sorry.

Thanks for the good opinions and comments!  I will apply them in the
next version.


Thanks,
SeongJae Park

>
> Roger.


Re: [Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-10 Thread SeongJae Park
On Tue, 10 Dec 2019 11:16:35 +0100 "Roger Pau Monné"  
wrote:

> > Granting pages consumes backend system memory.  In systems configured
> > with insufficient spare memory for those pages, it can cause a memory
> > pressure situation.  However, finding the optimal amount of the spare
> > memory is challenging for large systems having dynamic resource
> > utilization patterns.  Also, such a static configuration might lack a
> 
> s/lack a/lack/
> 
> > flexibility.
> > 
> > To mitigate such problems, this commit adds a memory reclaim callback to
> > 'xenbus_driver'.  Using this facility, 'xenbus' would be able to monitor
> > a memory pressure and request specific devices of specific backend
> 
> s/monitor a/monitor/
> 
> > drivers which causing the given pressure to voluntarily release its
> 
> ...which are causing...
> 
> > memory.
> > 
> > That said, this commit simply requests every callback registered driver
> > to release its memory for every domain, rather than issueing the
> 
> s/issueing/issuing/
> 
> > requests to the drivers and the domain in charge.  Such things will be
> 
> I'm afraid I don't understand the "domain in charge" part of this
> sentence.
> 
> > done in a futur.  Also, this commit focuses on memory only.  However, it
> 
> ... done in a future change. Also I think the period after only should
> be removed in order to tie both sentences together.
> 
> > would be ablt to be extended for general resources.
> 
> s/ablt/able/
> 
> > 
> > Signed-off-by: SeongJae Park 
> > ---
> >  drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
> >  include/xen/xenbus.h  |  1 +
> >  2 files changed, 32 insertions(+)
> > 
> > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> > b/drivers/xen/xenbus/xenbus_probe_backend.c
> > index b0bed4faf44c..5a5ba29e39df 100644
> > --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> > @@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct 
> > notifier_block *notifier,
> > return NOTIFY_DONE;
> >  }
> >  
> > +static int xenbus_backend_reclaim(struct device *dev, void *data)
> > +{
> > +   struct xenbus_driver *drv;
> 
> Newline and const.
> 
> > +   if (!dev->driver)
> > +   return -ENOENT;
> > +   drv = to_xenbus_driver(dev->driver);
> > +   if (drv && drv->reclaim)
> > +   drv->reclaim(to_xenbus_device(dev));
> 
> You seem to completely ignore the return of the reclaim hook...
> 
> > +   return 0;
> > +}
> > +
> > +/*
> > + * Returns 0 always because we are using shrinker to only detect memory
> > + * pressure.
> > + */
> > +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
> > +   struct shrink_control *sc)
> > +{
> > +   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
> > +   xenbus_backend_reclaim);
> > +   return 0;
> > +}
> > +
> > +static struct shrinker xenbus_backend_shrinker = {
> > +   .count_objects = xenbus_backend_shrink_count,
> > +   .seeks = DEFAULT_SEEKS,
> > +};
> > +
> >  static int __init xenbus_probe_backend_init(void)
> >  {
> > static struct notifier_block xenstore_notifier = {
> > @@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void)
> >  
> > register_xenstore_notifier(&xenstore_notifier);
> >  
> > +   if (register_shrinker(&xenbus_backend_shrinker))
> > +   pr_warn("shrinker registration failed\n");
> > +
> > return 0;
> >  }
> >  subsys_initcall(xenbus_probe_backend_init);
> > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> > index 869c816d5f8c..cdb075e4182f 100644
> > --- a/include/xen/xenbus.h
> > +++ b/include/xen/xenbus.h
> > @@ -104,6 +104,7 @@ struct xenbus_driver {
> > struct device_driver driver;
> > int (*read_otherend_details)(struct xenbus_device *dev);
> > int (*is_ready)(struct xenbus_device *dev);
> > +   unsigned (*reclaim)(struct xenbus_device *dev);
> 
> ... hence I wonder why it's returning an unsigned when it's just
> ignored.
> 
> IMO it should return an int to signal errors, and the return should be
> ignored.

I first thought similarly and set the callback to return something.  However,
as this callback is called to simply notify the memory pressure and ask the
drive

Re: [Xen-devel] [PATCH v5 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-10 Thread SeongJae Park
t; after  2123,325
> > 
> > Optimal Aggressive Shrinking Duration
> > -
> > 
> > To find a best squeezing duration, I repeated the test with three
> > different durations (1ms, 10ms, and 100ms).  The results are as below:
> > 
> > durationpswpin  pswpout
> > 1   852 6,424
> > 10  212 3,325
> > 100 203 3,340
> > 
> > As expected, the memory pressure has decreased as the duration is
> > increased, but the reduction stopped from the `10ms`.  Based on this
> > results, I chose the default duration as 10ms.
> > 
> > Performance Overhead Test
> > =
> > 
> > This commit could incur I/O performance degradation under severe memory
> > pressure because the squeezing will require more page allocations per
> > I/O.  To show the overhead, I artificially made a worst-case squeezing
> > situation and measured the I/O performance of a `blkfront` running
> > guest.
> > 
> > For the artificial squeezing, I set the `blkback.max_buffer_pages` using
> > the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  We set
> > the value to `1024` and `0`.  The `1024` is the default value.  Setting
> > the value as `0` is same to a situation doing the squeezing always
> > (worst-case).
> > 
> > For the I/O performance measurement, I use a simple `dd` command.
> > 
> > Default Performance
> > ---
> > 
> > [dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages
> > [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k 
> > count=$((256*512)); sync; done
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 13.8702 s, 38.7 MB/s
> > 
> > Worst-case Performance
> > --
> > 
> > [dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages
> > [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k 
> > count=$((256*512)); sync; done
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s
> > 131072+0 records in
> > 131072+0 records out
> > 536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s
> > 
> > In short, even worst case squeezing makes no visible performance
> > degradation.
> 
> I would argue that with a ~40MB/s throughput you won't see any
> performance difference at all regardless of the size of the pool of
> free pages or the amount of persistent grants because the bottleneck is
> on the storage performance itself.
> 
> You need to test this using nullblk or some kind of fast storage, or
> else the above figures are not going to reflect any changes you make
> because they are hidden by the poor performance of the underlying
> storage.

Yes, I agree with that.  My test is just a minimal check for my
environment.  I will note the points and concerns in the commit message.

> 
> > I think this is due to the slow speed of the I/O.  In
> > other words, the additional page allocation overhead is hidden under the
> > much slower I/O latency.
> > 
> > Nevertheless, pleaset note that this is just a very simple and minimal
> > test.
> 
> I would like to add that IMO this is papering over an existing issue,
> which is how pages to be used to map grants are allocated. Grant
> mappings _shouldn't_ consume RAM pages in the first place, and IIRC
> the fact that they do is because Linux balloons out memory in order to
> re-use those pages to map grants and have a valid page struct.
> 
> A way to solve this would be to hot

[Xen-devel] [PATCH v6 0/2] xenbus/backend: Add a memory pressure handler callback

2019-12-10 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in
'xen-blkback' (patch 2).  The third patch is a trivial cleanup of
variable names.

Base Version
============

This patch is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v6


Patch History
-------------

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (2):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c   | 23 +++--
 drivers/block/xen-blkback/common.h|  1 +
 drivers/block/xen-blkback/xenbus.c|  3 ++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h  |  1 +
 5 files changed, 56 insertions(+), 3 deletions(-)

-- 
2.17.1



[Xen-devel] [PATCH v6 1/3] xenbus/backend: Add memory pressure handler callback

2019-12-10 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that it would be possible to improve the callback facility for more
sophisticated handling of general resource pressure.  For example, it
would be possible to monitor the memory consumption of each device and
issue the release requests only to the devices which are causing the
pressure.  Also, the callback could be extended to handle not only
memory, but general resources.  Nevertheless, this version of the
implementation defers such sophisticated goals to future work.

Reviewed-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..aedbe2198de5 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int xenbus_backend_reclaim(struct device *dev, void *data)
+{
+   struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim)
+   drv->reclaim(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   xenbus_backend_reclaim);
+   return 0;
+}
+
+static struct shrinker xenbus_backend_shrinker = {
+   .count_objects = xenbus_backend_shrink_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&xenbus_backend_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..196260017666 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1



[Xen-devel] [PATCH v6 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-10 Thread SeongJae Park
bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8702 s, 38.7 MB/s

Worst-case Performance
----------------------

[dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file \
   bs=4k count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s

In short, even worst-case squeezing causes no visible performance
degradation on this test machine.  I think this is due to the slow speed
of the I/O devices I used.  In other words, the additional page
allocation overhead is hidden under the much slower I/O latency.
Nevertheless, please note that this is just a very simple and minimal
test using a slow block device.  On systems using fast block devices
such as ramdisks or NVMe SSDs, the results could be very different.  In
such cases, you should control the squeezing duration via the module
parameter.
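
For example, under the parameter name this patch introduces below, the
duration could be adjusted at runtime like this (value in milliseconds):

[dom0]# echo 50 > /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms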

Reviewed-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 22 --
 drivers/block/xen-blkback/common.h  |  1 +
 drivers/block/xen-blkback/xenbus.c  |  3 ++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..b493c306e84f 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct 
persistent_gnt *persistent_gnt)
HZ * xen_blkif_pgrant_timeout);
 }
 
+/* Once a memory pressure is detected, squeeze free page pools for a while. */
+static unsigned int buffer_squeeze_duration_ms = 10;
+module_param_named(buffer_squeeze_duration_ms,
+   buffer_squeeze_duration_ms, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration_ms,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+static unsigned long buffer_squeeze_end;
+
+void xen_blkbk_reclaim(struct xenbus_device *dev)
+{
+   buffer_squeeze_end = jiffies +
+   msecs_to_jiffies(buffer_squeeze_duration_ms);
+}
+
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
 {
unsigned long flags;
@@ -656,8 +671,11 @@ int xen_blkif_schedule(void *arg)
ring->next_lru = jiffies + 
msecs_to_jiffies(LRU_INTERVAL);
}
 
-   /* Shrink if we have more than xen_blkif_max_buffer_pages */
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   /* Shrink the free pages pool if it is too large. */
+   if (time_before(jiffies, buffer_squeeze_end))
+   shrink_free_pagepool(ring, 0);
+   else
+   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..8a3195d2dca7 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -383,6 +383,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
 void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
+void xen_blkbk_reclaim(struct xenbus_device *dev);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
  struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b90dbcd99c03..b596c6e8b006 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -1115,7 +1115,8 @@ static struct xenbus_driver xen_blkbk_driver = {
.ids  = xen_blkbk_ids,
.probe = xen_blkbk_probe,
.remove = xen_blkbk_remove,
-   .otherend_changed = frontend_changed
+   .otherend_changed = frontend_changed,
+   .reclaim = xen_blkbk_reclaim,
 };
 
 int xen_blkif_xenbus_init(void)
-- 
2.17.1



[Xen-devel] [PATCH v6 3/3] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-10 Thread SeongJae Park
A few static variables in blkback have the 'xen_blkif_' prefix, though
it is unnecessary for static variables.  This commit removes such
prefixes.

Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index b493c306e84f..f690373669b8 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
@@ -249,7 +248,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -412,14 +411,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -614,8 +612,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -675,7 +672,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -902,7 +899,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
conti

Re: [Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback

2019-12-11 Thread SeongJae Park
On Wed, 11 Dec 2019 11:51:12 +0100 "Roger Pau Monné"  
wrote:

> > On Tue, 10 Dec 2019 11:16:35 +0100 "Roger Pau Monné"  
> > wrote:
> > > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> > > > index 869c816d5f8c..cdb075e4182f 100644
> > > > --- a/include/xen/xenbus.h
> > > > +++ b/include/xen/xenbus.h
> > > > @@ -104,6 +104,7 @@ struct xenbus_driver {
> > > > struct device_driver driver;
> > > > int (*read_otherend_details)(struct xenbus_device *dev);
> > > > int (*is_ready)(struct xenbus_device *dev);
> > > > +   unsigned (*reclaim)(struct xenbus_device *dev);
> > > 
> > > ... hence I wonder why it's returning an unsigned when it's just
> > > ignored.
> > > 
> > > IMO it should return an int to signal errors, and the return should be
> > > ignored.
> > 
> > I first thought similarly and set the callback to return something.  
> > However,
> > as this callback is called to simply notify the memory pressure and ask the
> > driver to free its memory as many as possible, I couldn't easily imagine 
> > what
> > kind of errors that need to be handled by its caller can occur in the 
> > callback,
> > especially because current blkback's callback implementation has no such 
> > error.
> > So, if you and others agree, I would like to simply set the return type to
> > 'void' for now and defer the error handling to a future change.
> 
> Yes, I also wondered the same, but seeing you returned an integer I
> assumed there was interest in returning some kind of value. If there's
> nothing to return let's just make it void.
> 
> > > 
> > > Also, I think it would preferable for this function to take an extra
> > > parameter to describe the resource the driver should attempt to free
> > > (ie: memory or interrupts for example). I'm however not able to find
> > > any existing Linux type to describe such resources.
> > 
> > Yes, such extention would be the right direction.  However, because there 
> > is no
> > existing Linux type to describe the type of resources to reclaim as you also
> > mentioned, there could be many different opinions about its implementation
> > detail.  In my opinion, it could be also possible to simply add another
> > callback for another resource type.  That said, because currently we have an
> > use case and an implementation for the memory pressure only, I would like to
> > let it as is for now and defer the extension as a future work, if you and
> > others have no objection.
> 
> Ack, can I please ask the callback to be named reclaim_memory or some
> such then?

Yes, I will change the name.


Thanks,
SeongJae Park

> 
> Thanks, Roger.
> 


Re: [Xen-devel] [PATCH v5 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-11 Thread SeongJae Park
On Wed, 11 Dec 2019 12:14:44 +0100 "Roger Pau Monné"  
wrote:

> 
> I see that you have already sent v6, for future iterations can you
> please wait until the conversation on the previous version has been
> settled?
> 
> I'm still replying to your replies to v5, and hence you should hold off
> sending v6 until we get some kind of conclusion/agreement.

Sorry, I was impatient.

> 
> On Wed, Dec 11, 2019 at 05:08:12AM +0100, SeongJae Park wrote:
> > On Tue, 10 Dec 2019 12:04:32 +0100 "Roger Pau Monné"  
> > wrote:
> > 
> > > > Each `blkif` has a free pages pool for the grant mapping.  The size of
> > > > the pool starts from zero and be increased on demand while processing
> > > > the I/O requests.  If current I/O requests handling is finished or 100
> > > > milliseconds has passed since last I/O requests handling, it checks and
> > > > shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> > > > 
> > > > Therefore, `blkfront` running guests can cause a memory pressure in the
> > > > `blkback` running guest by attaching a large number of block devices and
> > > > inducing I/O.
> > > 
> > > Hm, I don't think this is actually true. blkfront cannot attach an
> > > arbitrary number of devices, blkfront is just a frontend for a device
> > > that's instantiated by the Xen toolstack, so it's the toolstack the one
> > > that controls the amount of PV block devices.
> > 
> > Right, the problem can occur only if it is mis-configured so that the 
> > frontend
> > running guests can attach a large number of devices which is enough to cause
> > the memory pressure.  I tried to explain it in below paragraph, but seems 
> > above
> > paragraph is a little bit confusing.  I will wordsmith the sentence in the 
> > next
> > version.
> 
> I would word it along these lines:
> 
> "Host administrators can cause memory pressure in blkback by attaching
> a large number of block devices and inducing I/O."

Hmm, much better :)

> 
> > > 
> > > > System administrators can avoid such problematic
> > > > situations by limiting the maximum number of devices each guest can
> > > > attach.  However, finding the optimal limit is not so easy.  Improper
> > > > set of the limit can results in the memory pressure or a resource
> > > > underutilization.  This commit avoids such problematic situations by
> > > > squeezing the pools (returns every free page in the pool to the system)
> > > > for a while (users can set this duration via a module parameter) if a
> > > > memory pressure is detected.
> > > > 
> > > > Discussions
> > > > ===
> > > > 
> > > > The `blkback`'s original shrinking mechanism returns only pages in the
> > > > pool, which are not currently be used by `blkback`, to the system.  In
> > > > other words, the pages are not mapped with foreign pages.  Because this
> > > ^ that   ^ granted
> > > > commit is changing only the shrink limit but uses the mechanism as is,
> > > > this commit does not introduce improper mappings related security
> > > > issues.
> > > 
> > > That last sentence is hard to parse. I think something like:
> > > 
> > > "Because this commit is changing only the shrink limit but still uses the
> > > same freeing mechanism it does not touch pages which are currently
> > > mapping grants."
> > > 
> > > > 
> > > > Once a memory pressure is detected, this commit keeps the squeezing
> > > > limit for a user-specified time duration.  The duration should be
> > > > neither too long nor too short.  If it is too long, the squeezing
> > > > incurring overhead can reduce the I/O performance.  If it is too short,
> > > > `blkback` will not free enough pages to reduce the memory pressure.
> > > > This commit sets the value as `10 milliseconds` by default because it is
> > > > a short time in terms of I/O while it is a long time in terms of memory
> > > > operations.  Also, as the original shrinking mechanism works for at
> > > > least every 100 milliseconds, this could be a somewhat reasonable
> > > > choice.  I also tested other durations (refer to the below section for
> > > > more details) and confirmed that 10 milliseconds is the one that works
> > > > best with the test.  That said, the pr

Re: [Xen-devel] [PATCH v6 1/3] xenbus/backend: Add memory pressure handler callback

2019-12-11 Thread SeongJae Park
On Wed, 11 Dec 2019 12:46:51 +0100 "Roger Pau Monné"  
wrote:

> > Granting pages consumes backend system memory.  In systems configured
> > with insufficient spare memory for those pages, it can cause a memory
> > pressure situation.  However, finding the optimal amount of the spare
>   ^ s/the//
> > memory is challenging for large systems having dynamic resource
> > utilization patterns.  Also, such a static configuration might lack
> > flexibility.
> > 
> > To mitigate such problems, this commit adds a memory reclaim callback to
> > 'xenbus_driver'.  If a memory pressure is detected, 'xenbus' requests
>^ s/a//
> > every backend driver to volunarily release its memory.
> > 
> > Note that it would be able to improve the callback facility for more
> ^ possible
> > sophisticated handlings of general pressures.  For example, it would be
> ^ handling of resource starvation.
> > possible to monitor the memory consumption of each device and issue the
> > release requests to only devices which causing the pressure.  Also, the
> > callback could be extended to handle not only memory, but general
> > resources.  Nevertheless, this version of the implementation defers such
> > sophisticated goals as a future work.
> > 
> > Reviewed-by: Juergen Gross 
> > Signed-off-by: SeongJae Park 
> > ---
> >  drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
> >  include/xen/xenbus.h  |  1 +
> >  2 files changed, 33 insertions(+)
> > 
> > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> > b/drivers/xen/xenbus/xenbus_probe_backend.c
> > index b0bed4faf44c..aedbe2198de5 100644
> > --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> > @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct 
> > notifier_block *notifier,
> > return NOTIFY_DONE;
> >  }
> >  
> > +static int xenbus_backend_reclaim(struct device *dev, void *data)
> 
> No need for the xenbus_ prefix since it's a static function, ie:
> backend_reclaim_memory should be fine IMO.

Agreed, will change the name in the next version.

> 
> > +{
> > +   struct xenbus_driver *drv;
> 
> I've asked for this variable to be constified in v5, is it not
> possible to make it const?

Sorry, my mistake...  I was definitely in too much of a hurry.

> 
> > +
> > +   if (!dev->driver)
> > +   return 0;
> > +   drv = to_xenbus_driver(dev->driver);
> > +   if (drv && drv->reclaim)
> > +   drv->reclaim(to_xenbus_device(dev));
> > +   return 0;
> > +}
> > +
> > +/*
> > + * Returns 0 always because we are using shrinker to only detect memory
> > + * pressure.
> > + */
> > +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
> > +   struct shrink_control *sc)
> > +{
> > +   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
> > +   xenbus_backend_reclaim);
> > +   return 0;
> > +}
> > +
> > +static struct shrinker xenbus_backend_shrinker = {
> 
> I would drop the xenbus prefix, and I think it's not possible to
> constify this due to register_shrinker expecting a non-const
> parameter?

Yes, constifying it results in another compile warning.  Will drop the prefix.
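
For context, the reason constification trips a warning is visible from the
registration prototype (paraphrased from include/linux/shrinker.h of that
kernel; a sketch rather than a verbatim copy):

    /* register_shrinker() takes a non-const pointer, so passing a
     * const struct shrinker produces a discarded-qualifier warning. */
    int register_shrinker(struct shrinker *shrinker);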

> 
> > +   .count_objects = xenbus_backend_shrink_count,
> > +   .seeks = DEFAULT_SEEKS,
> > +};
> > +
> >  static int __init xenbus_probe_backend_init(void)
> >  {
> > static struct notifier_block xenstore_notifier = {
> > @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
> >  
> > register_xenstore_notifier(&xenstore_notifier);
> >  
> > +   if (register_shrinker(&xenbus_backend_shrinker))
> > +   pr_warn("shrinker registration failed\n");
> 
> Can you add a xenbus prefix to the error message? Or else it's hard to
> know which subsystem is complaining when you see such message on the
> log. ie: "xenbus: shrinker ..."

Because we have `#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt` at the beginning
of the file, the message will already carry a proper prefix.
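
For readers unfamiliar with the convention, here is a minimal sketch of how
such a definition prefixes every pr_*() message in a file (the expansion shown
is paraphrased; the actual prefix is the name the file is built under):

    /* Must be defined before printk.h (or any header that includes it). */
    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
    #include <linux/printk.h>

    /* pr_warn("shrinker registration failed\n") now expands to
     * printk(KERN_WARNING KBUILD_MODNAME ": " "shrinker registration failed\n"),
     * so the logged line reads "<modname>: shrinker registration failed".
     */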

> 
> > +
> > return 0;
> >  }
> >  subsys_initcall(xenbus_probe_backend_init);
> > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> > index 869c816d5f8c..196260017666 100644
> > --- a/include/xen/xenbus.h
> > +++ b/include/xen/xenbus.h
> > @@ -104,6 +104,7 @@ struct xenbus_driver {
> > struct device_driver driver;
> > int (*read_otherend_details)(struct xenbus_device *dev);
> > int (*is_ready)(struct xenbus_device *dev);
> > +   void (*reclaim)(struct xenbus_device *dev);
> 
> reclaim_memory (if Juergen agrees).

Okay.


Thanks,
SeongJae Park

> 
> Thanks, Roger.
> 


[Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-11 Thread SeongJae Park
On the slow block device


max_pgs   Min   Max   Median AvgStddev
0 38.7  45.8  38.7   40.12  3.1752165
1024  38.7  45.8  38.7   40.12  3.1752165
No difference proven at 95.0% confidence

On the fast block device


max_pgs   Min   Max   Median AvgStddev
0 417   423   420419.4  2.5099801
1024  414   425   416417.8  4.4384682
No difference proven at 95.0% confidence

In short, even worst-case squeezing on a ramdisk-based fast block device
causes no visible performance degradation.  Please note that this is just
a very simple and minimal test.  On systems using super-fast block
devices and a special I/O workload, the results might be different.  If
you have any doubt, test on your machine with your workload to find the
optimal squeezing duration for you.

[1] https://aws.amazon.com/ebs/
[2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html

Reviewed-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 22 --
 drivers/block/xen-blkback/common.h  |  1 +
 drivers/block/xen-blkback/xenbus.c  |  3 ++-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..98823d150905 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct 
persistent_gnt *persistent_gnt)
HZ * xen_blkif_pgrant_timeout);
 }
 
+/* Once a memory pressure is detected, squeeze free page pools for a while. */
+static unsigned int buffer_squeeze_duration_ms = 10;
+module_param_named(buffer_squeeze_duration_ms,
+   buffer_squeeze_duration_ms, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration_ms,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+static unsigned long buffer_squeeze_end;
+
+void xen_blkbk_reclaim_memory(struct xenbus_device *dev)
+{
+   buffer_squeeze_end = jiffies +
+   msecs_to_jiffies(buffer_squeeze_duration_ms);
+}
+
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
 {
unsigned long flags;
@@ -656,8 +671,11 @@ int xen_blkif_schedule(void *arg)
ring->next_lru = jiffies + 
msecs_to_jiffies(LRU_INTERVAL);
}
 
-   /* Shrink if we have more than xen_blkif_max_buffer_pages */
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   /* Shrink the free pages pool if it is too large. */
+   if (time_before(jiffies, buffer_squeeze_end))
+   shrink_free_pagepool(ring, 0);
+   else
+   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..1e0df86cb941 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -383,6 +383,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
 void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
+void xen_blkbk_reclaim_memory(struct xenbus_device *dev);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
  struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b90dbcd99c03..0477f910b018 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -1115,7 +1115,8 @@ static struct xenbus_driver xen_blkbk_driver = {
.ids  = xen_blkbk_ids,
.probe = xen_blkbk_probe,
.remove = xen_blkbk_remove,
-   .otherend_changed = frontend_changed
+   .otherend_changed = frontend_changed,
+   .reclaim_memory = xen_blkbk_reclaim_memory,
 };
 
 int xen_blkif_xenbus_init(void)
-- 
2.17.1



[Xen-devel] [PATCH v7 3/3] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-11 Thread SeongJae Park
A few static variables in blkback have the 'xen_blkif_' prefix, though it
is unnecessary for static variables.  This commit removes such prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 98823d150905..f41c698dd854 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
@@ -249,7 +248,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -412,14 +411,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -614,8 +612,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -675,7 +672,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -902,7 +899,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
conti

[Xen-devel] [PATCH v7 0/3] xenbus/backend: Add a memory pressure handler callback

2019-12-11 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in
'xen-blkback' (patch 2).  The third patch is a trivial cleanup of
variable names.

Base Version


This patchset is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v7


Patch History
-

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (3):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes

 drivers/block/xen-blkback/blkback.c   | 23 +++--
 drivers/block/xen-blkback/common.h|  1 +
 drivers/block/xen-blkback/xenbus.c|  3 ++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h  |  1 +
 5 files changed, 56 insertions(+), 3 deletions(-)

-- 
2.17.1



[Xen-devel] [PATCH v7 1/3] xenbus/backend: Add memory pressure handler callback

2019-12-11 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that it would be possible to improve the callback facility for more
sophisticated handling of general resource pressure.  For example, it
would be possible to monitor the memory consumption of each device and
issue release requests only to the devices that are causing the
pressure.  Also, the callback could be extended to handle not only
memory, but general resources.  Nevertheless, this version of the
implementation defers such sophisticated goals as future work.

Reviewed-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1



Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-12 Thread SeongJae Park
On Thu, 12 Dec 2019 12:42:47 +0100 "Roger Pau Monné"  
wrote:

> 
> Please make sure you Cc me in blkback related patches.

Sorry for forgetting you!  I will not forget again.

> 
> On Wed, Dec 11, 2019 at 06:10:15PM +, SeongJae Park wrote:
> > Each `blkif` has a free pages pool for the grant mapping.  The size of
> > the pool starts from zero and be increased on demand while processing
> ^ is
> > the I/O requests.  If current I/O requests handling is finished or 100
> > milliseconds has passed since last I/O requests handling, it checks and
> > shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> > 
> > Therefore, host administrators can cause memory pressure in blkback by
> > attaching a large number of block devices and inducing I/O.  Such
> > problematic situations can be avoided by limiting the maximum number of
> > devices that can be attached, but finding the optimal limit is not so
> > easy.  Improper set of the limit can results in the memory pressure or a
>   ^ s/the//
> > resource underutilization.  This commit avoids such problematic
> > situations by squeezing the pools (returns every free page in the pool
> > to the system) for a while (users can set this duration via a module
> > parameter) if a memory pressure is detected.
> ^ s/a//
> > 
> > Discussions
> > ===
> > 
> > The `blkback`'s original shrinking mechanism returns only pages in the
> > pool, which are not currently be used by `blkback`, to the system.  In
> 
> I think you can remove both comas in the above sentence.
> 
> > other words, the pages that are not mapped with granted pages.  Because
> > this commit is changing only the shrink limit but still uses the same
> > freeing mechanism it does not touch pages which are currently mapping
> > grants.
> > 
> > Once a memory pressure is detected, this commit keeps the squeezing
>^ s/a//

Thank you for corrections, will apply!

> > limit for a user-specified time duration.  The duration should be
> > neither too long nor too short.  If it is too long, the squeezing
> > incurring overhead can reduce the I/O performance.  If it is too short,
> > `blkback` will not free enough pages to reduce the memory pressure.
> > This commit sets the value as `10 milliseconds` by default because it is
> > a short time in terms of I/O while it is a long time in terms of memory
> > operations.  Also, as the original shrinking mechanism works for at
> > least every 100 milliseconds, this could be a somewhat reasonable
> > choice.  I also tested other durations (refer to the below section for
> > more details) and confirmed that 10 milliseconds is the one that works
> > best with the test.  That said, the proper duration depends on actual
> > configurations and workloads.  That's why this commit allows users to
> > set the duration as a module parameter.
> > 
> > Memory Pressure Test
> > 
> > 
> > To show how this commit fixes the memory pressure situation well, I
> > configured a test environment on a xen-running virtualization system.
> > On the `blkfront` running guest instances, I attach a large number of
> > network-backed volume devices and induce I/O to those.  Meanwhile, I
> > measure the number of pages that swapped in (pswpin) and out (pswpout)
> > on the `blkback` running guest.  The test ran twice, once for the
> > `blkback` before this commit and once for that after this commit.  As
> > shown below, this commit has dramatically reduced the memory pressure:
> > 
> > pswpin  pswpout
> > before  76,672  185,799
> > after  2123,325
> > 
> > Optimal Aggressive Shrinking Duration
> > -
> > 
> > To find a best squeezing duration, I repeated the test with three
> > different durations (1ms, 10ms, and 100ms).  The results are as below:
> > 
> > durationpswpin  pswpout
> > 1   852 6,424
> > 10  212 3,325
> > 100 203 3,340
> > 
> > As expected, the memory pressure has decreased as the duration is
> > increased, but the reduction stopped from the `10ms`.  Based on this
> > results, I chose the default duration as 10ms.
> > 
> > Performance Overhead Test
> > =
> > 
> > This commit could incur I/O performance degradation under severe memory
> > pressure because the squeezing will require m

Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-12 Thread SeongJae Park
On Thu, 12 Dec 2019 16:27:57 +0100 "Roger Pau Monné"  
wrote:

> > diff --git a/drivers/block/xen-blkback/blkback.c 
> > b/drivers/block/xen-blkback/blkback.c
> > index fd1e19f1a49f..98823d150905 100644
> > --- a/drivers/block/xen-blkback/blkback.c
> > +++ b/drivers/block/xen-blkback/blkback.c
> > @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct 
> > persistent_gnt *persistent_gnt)
> > HZ * xen_blkif_pgrant_timeout);
> >  }
> >  
> > +/* Once a memory pressure is detected, squeeze free page pools for a 
> > while. */
> > +static unsigned int buffer_squeeze_duration_ms = 10;
> > +module_param_named(buffer_squeeze_duration_ms,
> > +   buffer_squeeze_duration_ms, int, 0644);
> > +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > +"Duration in ms to squeeze pages buffer when a memory pressure is 
> > detected");
> > +
> > +static unsigned long buffer_squeeze_end;
> > +
> > +void xen_blkbk_reclaim_memory(struct xenbus_device *dev)
> > +{
> > +   buffer_squeeze_end = jiffies +
> > +   msecs_to_jiffies(buffer_squeeze_duration_ms);
> 
> I'm not sure this is fully correct. This function will be called for
> each blkback instance, but the timeout is stored in a global variable
> that's shared between all blkback instances. Shouldn't this timeout be
> stored in xen_blkif so each instance has it's own local variable?
> 
> Or else in the case you have 1k blkback instances the timeout is
> certainly going to be longer than expected, because each call to
> xen_blkbk_reclaim_memory will move it forward.

Agreed.  I think the extended timeout would not make a visible performance
difference, though, because the time a loop over even 1k instances takes
would be short enough to be ignored compared to the millisecond-scale
duration.

I took this approach because I wanted to minimize structural changes as far
as I could, as this is just a point fix rather than an ultimate solution.
That said, it is not fully correct and is confusing.  Another colleague of
mine also pointed this out in internal review.  A correct solution would be
either to add a variable to the struct as you suggested, or to avoid the
duplicated update of the variable by resetting it once the squeezing
duration has passed.  I would prefer the latter, as it is more
straightforward and still introduces no structural change.  For example, it
might look like below:

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index f41c698dd854..6856c8ef88de 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -152,8 +152,9 @@ static unsigned long buffer_squeeze_end;
 
 void xen_blkbk_reclaim_memory(struct xenbus_device *dev)
 {
-   buffer_squeeze_end = jiffies +
-   msecs_to_jiffies(buffer_squeeze_duration_ms);
+   if (!buffer_squeeze_end)
+   buffer_squeeze_end = jiffies +
+   msecs_to_jiffies(buffer_squeeze_duration_ms);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
@@ -669,10 +670,13 @@ int xen_blkif_schedule(void *arg)
}
 
/* Shrink the free pages pool if it is too large. */
-   if (time_before(jiffies, buffer_squeeze_end))
+   if (time_before(jiffies, buffer_squeeze_end)) {
shrink_free_pagepool(ring, 0);
-   else
+   } else {
+   if (unlikely(buffer_squeeze_end))
+   buffer_squeeze_end = 0;
shrink_free_pagepool(ring, max_buffer_pages);
+   }
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);

May I ask which way you would prefer?


Thanks,
SeongJae Park

> 
> Thanks, Roger.
> 


Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-12 Thread SeongJae Park
On Thu, 12 Dec 2019 16:23:17 +0100 "Roger Pau Monné"  
wrote:

> > On Thu, 12 Dec 2019 12:42:47 +0100 "Roger Pau Monné"  
> > wrote:
> > > > On the slow block device
> > > > 
> > > > 
> > > > max_pgs   Min   Max   Median AvgStddev
> > > > 0 38.7  45.8  38.7   40.12  3.1752165
> > > > 1024  38.7  45.8  38.7   40.12  3.1752165
> > > > No difference proven at 95.0% confidence
> > > > 
> > > > On the fast block device
> > > > 
> > > > 
> > > > max_pgs   Min   Max   Median AvgStddev
> > > > 0 417   423   420419.4  2.5099801
> > > > 1024  414   425   416417.8  4.4384682
> > > > No difference proven at 95.0% confidence
> > > 
> > > This is intriguing, as it seems to prove that the usage of a cache of
> > > free pages is irrelevant performance wise.
> > > 
> > > The pool of free pages was introduced long ago, and it's possible that
> > > recent improvements to the balloon driver had made such pool useless,
> > > at which point it could be removed instead of worked around.
> > 
> > I guess the grant page allocation overhead in this test scenario is really
> > small.  In the absence of memory pressure, fragmentation, and NUMA imbalance,
> > the latency of the page allocation ('get_page()') is very short, as it will
> > succeed in the fast path.
> 
> The allocation of the pool of free pages involves more than get_page,
> it uses gnttab_alloc_pages which in the worst case will allocate a
> page and balloon it out issuing one hypercall.
> 
> > A few years ago, I once measured the page allocation latency on my
> > machine.  Roughly speaking, it was about 1us in the best case, 100us in
> > the worst case, and 5us on average.  Please keep in mind that the
> > measurement was not designed and performed in a serious way, so the
> > results could include profiling overhead.  While keeping that in mind,
> > let's simply believe the numbers and ignore the latency of the block
> > layer, blkback itself (including the grant mapping), and anything else
> > including context switches and cache misses; in other words, suppose
> > that the grant page allocation is the only source of overhead.  Then it
> > would be possible to achieve 1 million IOPS (4KB * 1M IOPS = 4 GB/s) in
> > the best case, 200 thousand IOPS (800 MB/s) on average, and 10 thousand
> > IOPS (40 MB/s) in the worst case.  Based on this coarse calculation, I
> > think the test results are reasonable.
> > 
> > This also means that the effect of blkback's free pages pool might become
> > visible when the page allocation fast path fails.  Nevertheless, it would
> > also be hard to measure that at a micro level unless the measurement is
> > well designed and controlled.
> > 
> > > 
> > > Do you think you could perform some more tests (as pointed out above
> > > against the block device to skip the fs overhead) and report back the
> > > results?
> > 
> > To be honest, I'm not sure whether additional tests are really necessary,
> > because I think the `dd` test and the explanation of its results already
> > make some sense and provide a minimal proof of the concept.  Also, this
> > change is a fallback for the memory pressure situation, which is an error
> > path from some point of view.  Such an erroneous situation might not
> > happen frequently, and if the situation is not resolved in a short time,
> > something much worse (e.g., OOM kill of the user space Xen control
> > processes) than a temporary I/O performance degradation could happen.
> > Thus, I'm not sure whether such a detailed performance measurement is
> > necessary for this rare error handling change.
> 
> Right, my main concern is that we seem to be adding duct tape so
> things don't fall apart, but if such a cache is really not beneficial
> from a performance PoV I would rather see it go away than add more
> stuff to it in order to work around corner cases like memory
> starvation.

Right, if the cache is really giving no benefit, it would be much better to
simply remove it.  However, as mentioned before, I'm not sure whether it is
entirely useless.  Maybe we could run a more detailed test to find out, but
that would be out of scope for this patch.
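
For what it's worth, the coarse IOPS arithmetic quoted above can be
sanity-checked with a few lines of standalone C (a sketch only; the latency
figures are the rough numbers from the discussion, not new measurements):

    #include <stdio.h>

    int main(void)
    {
            /* Best, average, and worst page allocation latencies in
             * microseconds, as roughly quoted above. */
            const double latency_us[] = { 1.0, 5.0, 100.0 };
            int i;

            for (i = 0; i < 3; i++) {
                    /* One request per allocation. */
                    double iops = 1e6 / latency_us[i];
                    /* 4KB transferred per request. */
                    double mb_s = iops * 4096.0 / 1e6;
                    printf("%7.1f us -> %9.0f IOPS, %7.0f MB/s\n",
                           latency_us[i], iops, mb_s);
            }
            return 0;
    }

This prints roughly 1,000,000 IOPS / 4096 MB/s, 200,000 IOPS / 819 MB/s, and
10,000 IOPS / 41 MB/s, matching the 4 GB/s, 800 MB/s, and 40 MB/s figures
quoted above.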

> 
> Anyway, I guess we can take such change, but long term we need to look
> into fixing grants to not use ballooned pages, and figure out if the
> blkback free page cache is really useful or not.

Totally agreed.


Thanks,
SeongJae Park

> 
> Thanks, Roger.
> 


Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-13 Thread SeongJae Park
On Fri, Dec 13, 2019 at 10:33 AM Jürgen Groß  wrote:
>
> On 13.12.19 10:27, Roger Pau Monné wrote:
> > On Thu, Dec 12, 2019 at 05:06:58PM +0100, SeongJae Park wrote:
> >> On Thu, 12 Dec 2019 16:27:57 +0100 "Roger Pau Monné" 
> >>  wrote:
> >>
> >>>> diff --git a/drivers/block/xen-blkback/blkback.c 
> >>>> b/drivers/block/xen-blkback/blkback.c
> >>>> index fd1e19f1a49f..98823d150905 100644
> >>>> --- a/drivers/block/xen-blkback/blkback.c
> >>>> +++ b/drivers/block/xen-blkback/blkback.c
> >>>> @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct 
> >>>> persistent_gnt *persistent_gnt)
> >>>>HZ * xen_blkif_pgrant_timeout);
> >>>>   }
> >>>>
> >>>> +/* Once a memory pressure is detected, squeeze free page pools for a 
> >>>> while. */
> >>>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>>> +module_param_named(buffer_squeeze_duration_ms,
> >>>> +  buffer_squeeze_duration_ms, int, 0644);
> >>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> >>>> detected");
> >>>> +
> >>>> +static unsigned long buffer_squeeze_end;
> >>>> +
> >>>> +void xen_blkbk_reclaim_memory(struct xenbus_device *dev)
> >>>> +{
> >>>> +  buffer_squeeze_end = jiffies +
> >>>> +  msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>>
> >>> I'm not sure this is fully correct. This function will be called for
> >>> each blkback instance, but the timeout is stored in a global variable
> >>> that's shared between all blkback instances. Shouldn't this timeout be
> >>> stored in xen_blkif so each instance has it's own local variable?
> >>>
> >>> Or else in the case you have 1k blkback instances the timeout is
> >>> certainly going to be longer than expected, because each call to
> >>> xen_blkbk_reclaim_memory will move it forward.
> >>
> >> Agreed that.  I think the extended timeout would not make a visible
> >> performance, though, because the time that 1k-loop take would be short 
> >> enough
> >> to be ignored compared to the millisecond-scope duration.
> >>
> >> I took this way because I wanted to minimize such structural changes as 
> >> far as
> >> I can, as this is just a point-fix rather than ultimate solution.  That 
> >> said,
> >> it is not fully correct and very confusing.  My another colleague also 
> >> pointed
> >> out it in internal review.  Correct solution would be to adding a variable 
> >> in
> >> the struct as you suggested or avoiding duplicated update of the variable 
> >> by
> >> initializing the variable once the squeezing duration passes.  I would 
> >> prefer
> >> the later way, as it is more straightforward and still not introducing
> >> structural change.  For example, it might be like below:
> >>
> >> diff --git a/drivers/block/xen-blkback/blkback.c 
> >> b/drivers/block/xen-blkback/blkback.c
> >> index f41c698dd854..6856c8ef88de 100644
> >> --- a/drivers/block/xen-blkback/blkback.c
> >> +++ b/drivers/block/xen-blkback/blkback.c
> >> @@ -152,8 +152,9 @@ static unsigned long buffer_squeeze_end;
> >>
> >>   void xen_blkbk_reclaim_memory(struct xenbus_device *dev)
> >>   {
> >> -   buffer_squeeze_end = jiffies +
> >> -   msecs_to_jiffies(buffer_squeeze_duration_ms);
> >> +   if (!buffer_squeeze_end)
> >> +   buffer_squeeze_end = jiffies +
> >> +   msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>   }
> >>
> >>   static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
> >> **page)
> >> @@ -669,10 +670,13 @@ int xen_blkif_schedule(void *arg)
> >>  }
> >>
> >>  /* Shrink the free pages pool if it is too large. */
> >> -   if (time_before(jiffies, buffer_squeeze_end))
> >> +   if (time_before(jiffies, buffer_squeeze_end)) {
> >>  shrink_free_pagepool(ring, 0);
> >> -   else
> >> +   } else {
> >> +   if (unlikely(buffer_s

[Xen-devel] [PATCH v8 1/3] xenbus/backend: Add memory pressure handler callback

2019-12-13 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that it would be possible to improve the callback facility for more
sophisticated handling of general resource pressure.  For example, it
would be possible to monitor the memory consumption of each device and
issue release requests only to the devices that are causing the
pressure.  Also, the callback could be extended to handle not only
memory, but general resources.  Nevertheless, this version of the
implementation defers such sophisticated goals as future work.

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1



[Xen-devel] [PATCH v8 0/3] xenbus/backend: Add a memory pressure handler callback

2019-12-13 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in
'xen-blkback' (patch 2).  The third patch is a trivial cleanup of
variable names.

Base Version


This patchset is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v8


Patch History
-

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (3):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes

 .../ABI/testing/sysfs-driver-xen-blkback  |  9 +++
 drivers/block/xen-blkback/blkback.c   | 57 ---
 drivers/block/xen-blkback/common.h|  2 +
 drivers/block/xen-blkback/xenbus.c| 11 +++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 6 files changed, 90 insertions(+), 22 deletions(-)

-- 
2.17.1



[Xen-devel] [PATCH v8 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-13 Thread SeongJae Park
On the slow block device


max_pgs   Min   Max   Median AvgStddev
0 38.7  45.8  38.7   40.12  3.1752165
1024  38.7  45.8  38.7   40.12  3.1752165
No difference proven at 95.0% confidence

On the fast block device


max_pgs   Min   Max   Median AvgStddev
0 417   423   420419.4  2.5099801
1024  414   425   416417.8  4.4384682
No difference proven at 95.0% confidence

In short, even worst-case squeezing on a ramdisk-based fast block device
causes no visible performance degradation.  Please note that this is just
a very simple and minimal test.  On systems using super-fast block
devices and a special I/O workload, the results might be different.  If
you have any doubt, test on your machine with your workload to find the
optimal squeezing duration for you.

[1] https://aws.amazon.com/ebs/
[2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html

Reviewed-by: Juergen Gross 
Signed-off-by: SeongJae Park 
---
 .../ABI/testing/sysfs-driver-xen-blkback  |  9 
 drivers/block/xen-blkback/blkback.c   | 22 +--
 drivers/block/xen-blkback/common.h|  2 ++
 drivers/block/xen-blkback/xenbus.c| 11 +-
 4 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 4e7babb3ba1f..a74a6d513c9f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -25,3 +25,12 @@ Description:
 allocated without being in use. The time is in
 seconds, 0 means indefinitely long.
 The default is 60 seconds.
+
+What:   /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
+Date:   December 2019
+KernelVersion:  5.5
+Contact:Roger Pau Monné 
+Description:
+How long the block backend buffers release every free pages in
+those under memory pressure.  The time is in milliseconds.
+The default is 10 milliseconds.
diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..26606c4896fd 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct 
persistent_gnt *persistent_gnt)
HZ * xen_blkif_pgrant_timeout);
 }
 
+/* Once a memory pressure is detected, squeeze free page pools for a while. */
+static unsigned int buffer_squeeze_duration_ms = 10;
+module_param_named(buffer_squeeze_duration_ms,
+   buffer_squeeze_duration_ms, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration_ms,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+static unsigned long buffer_squeeze_end;
+
+void xen_blkbk_update_buffer_squeeze_end(struct xen_blkif *blkif)
+{
+   blkif->buffer_squeeze_end = jiffies +
+   msecs_to_jiffies(buffer_squeeze_duration_ms);
+}
+
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
 {
unsigned long flags;
@@ -656,8 +671,11 @@ int xen_blkif_schedule(void *arg)
ring->next_lru = jiffies + 
msecs_to_jiffies(LRU_INTERVAL);
}
 
-   /* Shrink if we have more than xen_blkif_max_buffer_pages */
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   /* Shrink the free pages pool if it is too large. */
+   if (time_before(jiffies, buffer_squeeze_end))
+   shrink_free_pagepool(ring, 0);
+   else
+   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..ba653126177d 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -319,6 +319,7 @@ struct xen_blkif {
/* All rings for this device. */
struct xen_blkif_ring   *rings;
unsigned intnr_rings;
+   unsigned long   buffer_squeeze_end;
 };
 
 struct seg_buf {
@@ -383,6 +384,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
 void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
+void xen_blkbk_update_buffer_squeeze_end(struct xen_blkif *blkif);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
  struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b90dbcd99c03..09fe6cb5c4ea 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -824,6 +824,14 @@ static void

[Xen-devel] [PATCH v8 3/3] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-13 Thread SeongJae Park
A few static variables in blkback have the 'xen_blkif_' prefix, though it
is unnecessary for static variables.  This commit removes such prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 26606c4896fd..85ff629a7546 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
@@ -249,7 +248,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -412,14 +411,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -614,8 +612,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -675,7 +672,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -902,7 +899,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
conti

Re: [Xen-devel] [PATCH v8 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-13 Thread SeongJae Park
> > The value as `0` is same to a situation doing the
> > squeezing always (worst-case).
> > 
> > For the I/O performance measurement, I run a simple `dd` command 5 times
> > as below and collect the 'MB/s' results.
> > 
> > $ for i in {1..5}; do dd if=/dev/zero of=file \
> >  bs=4k count=$((256*512)); sync; done
> > 
> > If the underlying block device is slow enough, the squeezing overhead
> > could be hidden.  For the reason, I do this test for both a slow block
> > device and a fast block device.  I use a popular cloud block storage
> > service, ebs[1] as a slow device and the ramdisk block device[2] for the
> > fast device.
> > 
> > The results are as below.  'max_pgs' represents the value of the
> > `blkback.max_buffer_pages` parameter.
> > 
> > On the slow block device
> > 
> > 
> > max_pgs   Min   Max   Median AvgStddev
> > 0 38.7  45.8  38.7   40.12  3.1752165
> > 1024  38.7  45.8  38.7   40.12  3.1752165
> > No difference proven at 95.0% confidence
> > 
> > On the fast block device
> > 
> > 
> > max_pgs   Min   Max   Median AvgStddev
> > 0 417   423   420419.4  2.5099801
> > 1024  414   425   416417.8  4.4384682
> > No difference proven at 95.0% confidence
> > 
> > In short, even worst case squeezing on ramdisk based fast block device
> > makes no visible performance degradation.  Please note that this is just
> > a very simple and minimal test.  On systems using super-fast block
> > devices and a special I/O workload, the results might be different.  If
> > you have any doubt, test on your machine with your workload to find the
> > optimal squeezing duration for you.
> > 
> > [1] https://aws.amazon.com/ebs/
> > [2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html
> > 
> > Reviewed-by: Juergen Gross 
> 
> You should likely have dropped Juergen RB, since you made some
> non-trivial changes.

Yes, I will!

> 
> > Signed-off-by: SeongJae Park 
> > ---
> >  .../ABI/testing/sysfs-driver-xen-blkback  |  9 
> >  drivers/block/xen-blkback/blkback.c   | 22 +--
> >  drivers/block/xen-blkback/common.h|  2 ++
> >  drivers/block/xen-blkback/xenbus.c| 11 +-
> >  4 files changed, 41 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
> > b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > index 4e7babb3ba1f..a74a6d513c9f 100644
> > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
> > @@ -25,3 +25,12 @@ Description:
> >  allocated without being in use. The time is in
> >  seconds, 0 means indefinitely long.
> >  The default is 60 seconds.
> > +
> > +What:   
> > /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
> > +Date:   December 2019
> > +KernelVersion:  5.5
> > +Contact:Roger Pau Monné 
> 
> I think you should be the contact for this feature, you are the one
> that implemented it :).
> 
> > +Description:
> > +How long the block backend buffers release every free 
> > pages in
> > +those under memory pressure.  The time is in milliseconds.
> 
> "When memory pressure is reported to blkback this option controls the
> duration in milliseconds that blkback will not cache any page not
> backed by a grant mapping. The default is 10ms."

Great, will change!

> 
> > +The default is 10 milliseconds.
> > diff --git a/drivers/block/xen-blkback/blkback.c 
> > b/drivers/block/xen-blkback/blkback.c
> > index fd1e19f1a49f..26606c4896fd 100644
> > --- a/drivers/block/xen-blkback/blkback.c
> > +++ b/drivers/block/xen-blkback/blkback.c
> > @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct 
> > persistent_gnt *persistent_gnt)
> > HZ * xen_blkif_pgrant_timeout);
> >  }
> >  
> > +/* Once a memory pressure is detected, squeeze free page pools for a 
> > while. */
> > +static unsigned int buffer_squeeze_duration_ms = 10;
> > +module_param_named(buffer_squeeze_duration_ms,
> > +   buffer_squeeze_duration_ms, int, 0644);
> > +MODULE_PARM_DESC(buffer_squ

[Xen-devel] [PATCH v9 0/4] xenbus/backend: Add a memory pressure handler callback

2019-12-13 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in
'xen-blkback' (patch 2).  The third patch is a trivial cleanup of
variable names, and the fourth makes the use of blank lines between
functions in xenbus.c consistent.

Base Version


This patchset is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v9


Patch History
-

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park 
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs
   (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change
   (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as other
   callbacks also have
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (4):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Consistently insert one empty line between functions

 .../ABI/testing/sysfs-driver-xen-blkback  | 10 +
 drivers/block/xen-blkback/blkback.c       | 42 +--
 drivers/block/xen-blkback/common.h        |  1 +
 drivers/block/xen-blkback/xenbus.c        | 26 +---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 ++
 include/xen/xenbus.h                      |  1 +
 6 files changed, 86 insertions(+), 26 deletions(-)

-- 
2.17.1



[Xen-devel] [PATCH v9 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-13 Thread SeongJae Park
  38.7  45.8  38.7   40.12  3.1752165
No difference proven at 95.0% confidence

On the fast block device
------------------------

max_pgs   Min   Max   Median   Avg     Stddev
0         417   423   420      419.4   2.5099801
1024      414   425   416      417.8   4.4384682
No difference proven at 95.0% confidence

In short, even worst-case squeezing on a ramdisk-based fast block device
causes no visible performance degradation.  Please note that this is just
a very simple and minimal test.  On systems using super-fast block
devices and special I/O workloads, the results might differ.  If you
have any doubt, test on your machine with your workload to find the
optimal squeezing duration for you.

[1] https://aws.amazon.com/ebs/
[2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html

Signed-off-by: SeongJae Park 
---
 .../ABI/testing/sysfs-driver-xen-blkback  | 10 +
 drivers/block/xen-blkback/blkback.c       |  7 +--
 drivers/block/xen-blkback/common.h        |  1 +
 drivers/block/xen-blkback/xenbus.c        | 21 ++-
 4 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback 
b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 4e7babb3ba1f..f01224231f3f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -25,3 +25,13 @@ Description:
 allocated without being in use. The time is in
 seconds, 0 means indefinitely long.
 The default is 60 seconds.
+
+What:   /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
+Date:   December 2019
+KernelVersion:  5.5
+Contact:SeongJae Park 
+Description:
+When memory pressure is reported to blkback this option
+controls the duration in milliseconds that blkback will not
+cache any page not backed by a grant mapping.
+The default is 10ms.
diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..79f677aeb5cc 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -656,8 +656,11 @@ int xen_blkif_schedule(void *arg)
ring->next_lru = jiffies + 
msecs_to_jiffies(LRU_INTERVAL);
}
 
-   /* Shrink if we have more than xen_blkif_max_buffer_pages */
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   /* Shrink the free pages pool if it is too large. */
+   if (time_before(jiffies, blkif->buffer_squeeze_end))
+   shrink_free_pagepool(ring, 0);
+   else
+   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..536c84f61fed 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -319,6 +319,7 @@ struct xen_blkif {
/* All rings for this device. */
struct xen_blkif_ring   *rings;
unsigned intnr_rings;
+   unsigned long   buffer_squeeze_end;
 };
 
 struct seg_buf {
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index b90dbcd99c03..4f6ea4feca79 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
 }
 
 
+/* Once a memory pressure is detected, squeeze free page pools for a while. */
+static unsigned int buffer_squeeze_duration_ms = 10;
+module_param_named(buffer_squeeze_duration_ms,
+   buffer_squeeze_duration_ms, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration_ms,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+/*
+ * Callback received when the memory pressure is detected.
+ */
+static void reclaim_memory(struct xenbus_device *dev)
+{
+   struct backend_info *be = dev_get_drvdata(&dev->dev);
+
+   be->blkif->buffer_squeeze_end = jiffies +
+   msecs_to_jiffies(buffer_squeeze_duration_ms);
+}
+
 /* ** Connection ** */
 
 
@@ -1115,7 +1133,8 @@ static struct xenbus_driver xen_blkbk_driver = {
.ids  = xen_blkbk_ids,
.probe = xen_blkbk_probe,
.remove = xen_blkbk_remove,
-   .otherend_changed = frontend_changed
+   .otherend_changed = frontend_changed,
+   .reclaim_memory = reclaim_memory,
 };
 
 int xen_blkif_xenbus_init(void)
-- 
2.17.1



[Xen-devel] [PATCH v9 4/4] xen/blkback: Consistently insert one empty line between functions

2019-12-13 Thread SeongJae Park
The number of empty lines between functions in xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place a single empty line between functions.

Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 4f6ea4feca79..dc0ea123c74c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
device_remove_file(&dev->dev, &dev_attr_physical_device);
 }
 
-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
handle, blkif->domid);
return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
@@ -572,6 +572,7 @@ static void xen_blkbk_discard(struct xenbus_transaction 
xbt, struct backend_info
if (err)
dev_warn(&dev->dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
  struct backend_info *be, int state)
 {
@@ -656,7 +657,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -748,7 +748,6 @@ static void backend_changed(struct xenbus_watch *watch,
}
 }
 
-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -823,7 +822,6 @@ static void frontend_changed(struct xenbus_device *dev,
}
 }
 
-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -844,7 +842,6 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
-- 
2.17.1



[Xen-devel] [PATCH v9 1/4] xenbus/backend: Add memory pressure handler callback

2019-12-13 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved for more sophisticated
handling of general pressure.  For example, it would be possible to
monitor the memory consumption of each device and issue the release
requests only to the devices causing the pressure.  Also, the callback
could be extended to handle not only memory, but general resources.
Nevertheless, this version of the implementation defers such
sophisticated goals to future work.
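
For reference, a backend driver opts in by setting the new field of its
'xenbus_driver'.  A minimal sketch with hypothetical names (the real
hook-up for blkback is in the second patch):

	static void foobk_reclaim_memory(struct xenbus_device *dev)
	{
		/* Release as much of this device's freeable memory
		 * (e.g., cached free pages) as possible. */
	}

	static struct xenbus_driver foobk_driver = {
		.ids = foobk_ids,
		.probe = foobk_probe,
		.otherend_changed = foobk_otherend_changed,
		.reclaim_memory = foobk_reclaim_memory,
	};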

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1



[Xen-devel] [PATCH v9 3/4] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-13 Thread SeongJae Park
A few static variables in blkback have an 'xen_blkif_' prefix, though
it is unnecessary for static variables.  This commit removes such
prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,

Re: [Xen-devel] [PATCH v9 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
On Mon, 16 Dec 2019 10:37:55 +0100 "Roger Pau Monné"  
wrote:

> On Fri, Dec 13, 2019 at 03:35:44PM +0000, SeongJae Park wrote:
> > Each `blkif` has a free pages pool for the grant mapping.  The size of
> > the pool starts from zero and is increased on demand while processing
> > the I/O requests.  If current I/O requests handling is finished or 100
> > milliseconds has passed since last I/O requests handling, it checks and
> > shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> > 
> > Therefore, host administrators can cause memory pressure in blkback by
> > attaching a large number of block devices and inducing I/O.  Such
> > problematic situations can be avoided by limiting the maximum number of
> > devices that can be attached, but finding the optimal limit is not so
> > easy.  Improper set of the limit can results in memory pressure or a
> > resource underutilization.  This commit avoids such problematic
> > situations by squeezing the pools (returns every free page in the pool
> > to the system) for a while (users can set this duration via a module
> > parameter) if memory pressure is detected.
> > 
> > Discussions
> > ===
> > 
> > The `blkback`'s original shrinking mechanism returns only pages in the
> > pool which are not currently be used by `blkback` to the system.  In
> > other words, the pages that are not mapped with granted pages.  Because
> > this commit is changing only the shrink limit but still uses the same
> > freeing mechanism it does not touch pages which are currently mapping
> > grants.
> > 
> > Once memory pressure is detected, this commit keeps the squeezing limit
> > for a user-specified time duration.  The duration should be neither too
> > long nor too short.  If it is too long, the squeezing incurring overhead
> > can reduce the I/O performance.  If it is too short, `blkback` will not
> > free enough pages to reduce the memory pressure.  This commit sets the
> > value as `10 milliseconds` by default because it is a short time in
> > terms of I/O while it is a long time in terms of memory operations.
> > Also, as the original shrinking mechanism works for at least every 100
> > milliseconds, this could be a somewhat reasonable choice.  I also tested
> > other durations (refer to the below section for more details) and
> > confirmed that 10 milliseconds is the one that works best with the test.
> > That said, the proper duration depends on actual configurations and
> > workloads.  That's why this commit allows users to set the duration as a
> > module parameter.
> > 
> > Memory Pressure Test
> > 
> > 
> > To show how this commit fixes the memory pressure situation well, I
> > configured a test environment on a xen-running virtualization system.
> > On the `blkfront` running guest instances, I attach a large number of
> > network-backed volume devices and induce I/O to those.  Meanwhile, I
> > measure the number of pages that swapped in (pswpin) and out (pswpout)
> > on the `blkback` running guest.  The test ran twice, once for the
> > `blkback` before this commit and once for that after this commit.  As
> > shown below, this commit has dramatically reduced the memory pressure:
> > 
> >         pswpin  pswpout
> > before  76,672  185,799
> > after      212    3,325
> > 
> > Optimal Aggressive Shrinking Duration
> > -
> > 
> > To find a best squeezing duration, I repeated the test with three
> > different durations (1ms, 10ms, and 100ms).  The results are as below:
> > 
> > duration  pswpin  pswpout
> > 1         852     6,424
> > 10        212     3,325
> > 100       203     3,340
> > 
> > As expected, the memory pressure has decreased as the duration is
> > increased, but the reduction stopped from the `10ms`.  Based on this
> > results, I chose the default duration as 10ms.
> > 
> > Performance Overhead Test
> > =
> > 
> > This commit could incur I/O performance degradation under severe memory
> > pressure because the squeezing will require more page allocations per
> > I/O.  To show the overhead, I artificially made a worst-case squeezing
> > situation and measured the I/O performance of a `blkfront` running
> > guest.
> > 
> > For the artificial squeezing, I set the `blkback.max_buffer_pages` using
> > the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
> > test, I set the value to `1024

[Xen-devel] [PATCH v10 1/4] xenbus/backend: Add memory pressure handler callback

2019-12-16 Thread SeongJae Park
From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved for more sophisticated
handling of general pressure.  For example, it would be possible to
monitor the memory consumption of each device and issue the release
requests only to the devices causing the pressure.  Also, the callback
could be extended to handle not only memory, but general resources.
Nevertheless, this version of the implementation defers such
sophisticated goals to future work.

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1



[Xen-devel] [PATCH v10 0/4] xenbus/backend: Add a memory pressure handler callback

2019-12-16 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of the spare
memory is challenging for large systems having dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in
'xen-blkback' (patch 2).  The third and fourth patches are trivial
cleanups.

Base Version
============

This patchset is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_squeezing_v10


Patch History
-------------

Changes from v9
(https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
 - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
 - Update the commit message for overhead test of the 2nd path

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park
   
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs
   (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change
   (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as other
   callbacks also have
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau
   Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen
   Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (4):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Consistently insert one empty line between functions

 .../ABI/testing/sysfs-driver-xen-blkback  | 10 +
 drivers/block/xen-blkback/blkback.c       | 42 +--
 drivers/block/xen-blkback/common.h        |  1 +
 drivers/block/xen-blkback/xenbus.c        | 26 +---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 ++
 include/xen/xenbus.h                      |  1 +
 6 files changed, 86 insertions(+), 26 deletions(-)

-- 
2.17.1



[Xen-devel] [PATCH v10 3/4] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-16 Thread SeongJae Park
From: SeongJae Park 

A few static variables in blkback have an 'xen_blkif_' prefix, though
it is unnecessary for static variables.  This commit removes such
prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *

[Xen-devel] [PATCH v10 4/4] xen/blkback: Consistently insert one empty line between functions

2019-12-16 Thread SeongJae Park
From: SeongJae Park 

The number of empty lines between functions in xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place a single empty line between functions.

Acked-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 4f6ea4feca79..dc0ea123c74c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
device_remove_file(&dev->dev, &dev_attr_physical_device);
 }
 
-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
handle, blkif->domid);
return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
@@ -572,6 +572,7 @@ static void xen_blkbk_discard(struct xenbus_transaction 
xbt, struct backend_info
if (err)
dev_warn(&dev->dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
  struct backend_info *be, int state)
 {
@@ -656,7 +657,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -748,7 +748,6 @@ static void backend_changed(struct xenbus_watch *watch,
}
 }
 
-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -823,7 +822,6 @@ static void frontend_changed(struct xenbus_device *dev,
}
 }
 
-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -844,7 +842,6 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  When the handling of current I/O requests is
finished, or 100 milliseconds have passed since the last I/O request
handling, blkback checks and shrinks the pool so that it does not exceed
the size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not so
easy.  An improper limit can result in memory pressure or resource
underutilization.  This commit avoids such problematic situations by
squeezing the pools (returning every free page in the pool to the
system) for a while (users can set the duration via a module parameter)
if memory pressure is detected.
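
The gist of the change is in 'xen_blkif_schedule()', as below (excerpted
and slightly simplified from this patch): while the per-blkif squeeze
deadline is in the future, the free pages pool limit is treated as zero.

	/* Shrink the free pages pool if it is too large. */
	if (time_before(jiffies, blkif->buffer_squeeze_end))
		shrink_free_pagepool(ring, 0);
	else
		shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);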

Discussions
===========

`blkback`'s original shrinking mechanism returns to the system only
those pages in the pool that are not currently used by `blkback`; in
other words, the pages that are not mapped to granted pages.  Because
this commit changes only the shrink limit and still uses the same
freeing mechanism, it does not touch pages which are currently mapping
grants.
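
For clarity, the semantics of that freeing mechanism are roughly as
below (a simplified sketch of 'shrink_free_pagepool()', not the exact
code):

	/* Free pages from the pool until at most 'num' remain.  Pages
	 * currently mapping grants are not on this list, so they are
	 * never touched. */
	static void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
	{
		struct page *page;

		while (ring->free_pages_num > num) {
			page = list_first_entry(&ring->free_pages,
						struct page, lru);
			list_del(&page->lru);
			ring->free_pages_num--;
			gnttab_free_pages(1, &page);
		}
	}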

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by the
squeezing can reduce the I/O performance.  If it is too short, `blkback`
will not free enough pages to relieve the memory pressure.  This commit
sets the default value to `10 milliseconds` because it is a short time
in terms of I/O while being a long time in terms of memory operations.
Also, as the original shrinking mechanism works at least every 100
milliseconds, this seems a somewhat reasonable choice.  I also tested
other durations (refer to the section below for more details) and
confirmed that 10 milliseconds works best with the test.  That said, the
proper duration depends on actual configurations and workloads.  That's
why this commit allows users to set the duration as a module parameter.

Memory Pressure Test
====================

To show how well this commit fixes the memory pressure situation, I
configured a test environment on a Xen-running virtualization system.
On the `blkfront`-running guest instances, I attached a large number of
network-backed volume devices and induced I/O to them.  Meanwhile, I
measured the number of pages swapped in (pswpin) and out (pswpout) on
the `blkback`-running guest.  The test ran twice, once for the `blkback`
before this commit and once for that after this commit.  As shown below,
this commit has dramatically reduced the memory pressure:

        pswpin  pswpout
before  76,672  185,799
after      212    3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

duration  pswpin  pswpout
1         852     6,424
10        212     3,325
100       203     3,340

As expected, the memory pressure decreased as the duration increased,
but the reduction stopped at `10ms`.  Based on these results, I chose
10ms as the default duration.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront`-running
guest.

For the artificial squeezing, I set `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  `1024` is the default value,
and setting the value to `0` is equivalent to always doing the squeezing
(the worst case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For this reason, I use a fast block device, namely the
rbd[1]:

# xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

$ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
 bs=4k count=$((256*512)); sync; done

The results are as below.  'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

max_pgs   Min   Max   Median   Avg     Stddev
0         417   423   420      419.4   2.5099801
1024      414   425   416      417.8   4.4384682
No difference proven at 95.0% confidence

In short, even worst-case squeezing on a ramdisk-based fast block device
causes no visible performance degradation.  Please note that this is just
a very simple and minimal test.

Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  wrote:

> From: SeongJae Park 
> 
> Each `blkif` has a free pages pool for the grant mapping.  The size of
> the pool starts from zero and is increased on demand while processing
> the I/O requests.  If current I/O requests handling is finished or 100
> milliseconds has passed since last I/O requests handling, it checks and
> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> 
> Therefore, host administrators can cause memory pressure in blkback by
> attaching a large number of block devices and inducing I/O.  Such
> problematic situations can be avoided by limiting the maximum number of
> devices that can be attached, but finding the optimal limit is not so
> easy.  Improper set of the limit can results in memory pressure or a
> resource underutilization.  This commit avoids such problematic
> situations by squeezing the pools (returns every free page in the pool
> to the system) for a while (users can set this duration via a module
> parameter) if memory pressure is detected.
> 
> Discussions
> ===
> 
> The `blkback`'s original shrinking mechanism returns only pages in the
> pool which are not currently be used by `blkback` to the system.  In
> other words, the pages that are not mapped with granted pages.  Because
> this commit is changing only the shrink limit but still uses the same
> freeing mechanism it does not touch pages which are currently mapping
> grants.
> 
> Once memory pressure is detected, this commit keeps the squeezing limit
> for a user-specified time duration.  The duration should be neither too
> long nor too short.  If it is too long, the squeezing incurring overhead
> can reduce the I/O performance.  If it is too short, `blkback` will not
> free enough pages to reduce the memory pressure.  This commit sets the
> value as `10 milliseconds` by default because it is a short time in
> terms of I/O while it is a long time in terms of memory operations.
> Also, as the original shrinking mechanism works for at least every 100
> milliseconds, this could be a somewhat reasonable choice.  I also tested
> other durations (refer to the below section for more details) and
> confirmed that 10 milliseconds is the one that works best with the test.
> That said, the proper duration depends on actual configurations and
> workloads.  That's why this commit allows users to set the duration as a
> module parameter.
> 
> Memory Pressure Test
> 
> 
> To show how this commit fixes the memory pressure situation well, I
> configured a test environment on a xen-running virtualization system.
> On the `blkfront` running guest instances, I attach a large number of
> network-backed volume devices and induce I/O to those.  Meanwhile, I
> measure the number of pages that swapped in (pswpin) and out (pswpout)
> on the `blkback` running guest.  The test ran twice, once for the
> `blkback` before this commit and once for that after this commit.  As
> shown below, this commit has dramatically reduced the memory pressure:
> 
>         pswpin  pswpout
> before  76,672  185,799
> after      212    3,325
> 
> Optimal Aggressive Shrinking Duration
> -
> 
> To find a best squeezing duration, I repeated the test with three
> different durations (1ms, 10ms, and 100ms).  The results are as below:
> 
> duration  pswpin  pswpout
> 1         852     6,424
> 10        212     3,325
> 100       203     3,340
> 
> As expected, the memory pressure has decreased as the duration is
> increased, but the reduction stopped from the `10ms`.  Based on this
> results, I chose the default duration as 10ms.
> 
> Performance Overhead Test
> =
> 
> This commit could incur I/O performance degradation under severe memory
> pressure because the squeezing will require more page allocations per
> I/O.  To show the overhead, I artificially made a worst-case squeezing
> situation and measured the I/O performance of a `blkfront` running
> guest.
> 
> For the artificial squeezing, I set the `blkback.max_buffer_pages` using
> the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
> test, I set the value to `1024` and `0`.  The `1024` is the default
> value.  Setting the value as `0` is same to a situation doing the
> squeezing always (worst-case).
> 
> If the underlying block device is slow enough, the squeezing overhead
> could be hidden.  For the reason, I use a fast block device, namely the
> rbd[1]:
> 
> # xl block-attach guest phy:/dev/ram0 xvdb w
> 
> For the I/O performance measurement, I run a simple `dd` command 5 times
> d

Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  wrote:

> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  wrote:
> 
> > From: SeongJae Park 
> > 
[...]
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
> >  }
> >  
> >  
> > +/* Once a memory pressure is detected, squeeze free page pools for a 
> > while. */
> > +static unsigned int buffer_squeeze_duration_ms = 10;
> > +module_param_named(buffer_squeeze_duration_ms,
> > +   buffer_squeeze_duration_ms, int, 0644);
> > +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > +"Duration in ms to squeeze pages buffer when a memory pressure is 
> > detected");
> > +
> > +/*
> > + * Callback received when the memory pressure is detected.
> > + */
> > +static void reclaim_memory(struct xenbus_device *dev)
> > +{
> > +   struct backend_info *be = dev_get_drvdata(&dev->dev);
> > +
> > +   be->blkif->buffer_squeeze_end = jiffies +
> > +   msecs_to_jiffies(buffer_squeeze_duration_ms);
> 
> This callback might race with 'xen_blkbk_probe()'.  The race could result in
> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it links
> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> 
> I will do more test and share results.  Meanwhile, if you have any opinion,
> please let me know.

Not only '->blkif', but 'be' itself could also be NULL.  As similar
concurrency issues could exist in other drivers in their own ways, I
suggest changing the reclaim callback ('->reclaim_memory') to be called
for each driver instead of each device.  Then, each driver would be able
to deal with its concurrency issues by itself.

For blkback, we could reuse the global-variable-based approach, similar
to the v7[1] of this patchset.  As the callback would then be called for
each driver instead of each device, the duplicated setting of the
timeout would not happen.
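
For illustration, the ordering that allows the NULL dereference is
roughly as below (a simplified sketch of 'xen_blkbk_probe()', not the
exact code):

	/* xen_blkbk_probe() */
	be = kzalloc(sizeof(*be), GFP_KERNEL);
	dev_set_drvdata(&dev->dev, be);		/* 'be' becomes visible here */
	/* If the shrinker fires at this point, reclaim_memory() finds
	 * be->blkif == NULL (or sees no drvdata at all before the line
	 * above). */
	be->blkif = xen_blkif_alloc(dev->otherend_id);	/* set only later */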


Thanks,
SeongJae Park

[1] https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/

> 
> 
> Thanks,
> SeongJae Park
> 
> > +}
> > +
> >  /* ** Connection ** */
> >  
> >  
> > @@ -1115,7 +1133,8 @@ static struct xenbus_driver xen_blkbk_driver = {
> > .ids  = xen_blkbk_ids,
> > .probe = xen_blkbk_probe,
> > .remove = xen_blkbk_remove,
> > -   .otherend_changed = frontend_changed
> > +   .otherend_changed = frontend_changed,
> > +   .reclaim_memory = reclaim_memory,
> >  };
> >  
> >  int xen_blkif_xenbus_init(void)
> > -- 
> > 2.17.1
> > 
> 


Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-16 Thread SeongJae Park
On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:

> On 16.12.19 17:15, SeongJae Park wrote:
> > On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  wrote:
> > 
> >> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  wrote:
> >>
> >>> From: SeongJae Park 
> >>>
> > [...]
> >>> --- a/drivers/block/xen-blkback/xenbus.c
> >>> +++ b/drivers/block/xen-blkback/xenbus.c
> >>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device 
> >>> *dev,
> >>>   }
> >>>   
> >>>   
> >>> +/* Once a memory pressure is detected, squeeze free page pools for a 
> >>> while. */
> >>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>> +module_param_named(buffer_squeeze_duration_ms,
> >>> + buffer_squeeze_duration_ms, int, 0644);
> >>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> >>> detected");
> >>> +
> >>> +/*
> >>> + * Callback received when the memory pressure is detected.
> >>> + */
> >>> +static void reclaim_memory(struct xenbus_device *dev)
> >>> +{
> >>> + struct backend_info *be = dev_get_drvdata(&dev->dev);
> >>> +
> >>> + be->blkif->buffer_squeeze_end = jiffies +
> >>> + msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>
> >> This callback might race with 'xen_blkbk_probe()'.  The race could result 
> >> in
> >> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it 
> >> links
> >> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> >>
> >> I will do more test and share results.  Meanwhile, if you have any opinion,
> >> please let me know.

I reduced system memory and attached a bunch of devices in a short time
so that memory pressure occurred while device attachments were ongoing.
Under these circumstances, I was able to see the race.

> > 
> > Not only '->blkif', but 'be' itself also coule be a NULL.  As similar
> > concurrency issues could be in other drivers in their way, I suggest to 
> > change
> > the reclaim callback ('->reclaim_memory') to be called for each driver 
> > instead
> > of each device.  Then, each driver could be able to deal with its 
> > concurrency
> > issues by itself.
> 
> Hmm, I don't like that. This would need to be changed back in case we
> add per-guest quota.

Extending this callback in that way would still not be too hard.  We
could use the argument of the callback: I would keep the argument type
as 'struct device *' as is, and add a comment saying that a 'NULL' value
of the argument means every device.  As an example, xenbus could pass a
NULL-terminated array of pointers to the devices that need to free their
resources.

After seeing this race, I am now also thinking it could be better to
delegate detailed control of each device to its driver, as some drivers
have complicated and unique relations with their devices.
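
A hypothetical sketch of the per-driver invocation on the xenbus side
(names and shapes are illustrative only, not a concrete proposal):

	static int backend_reclaim_memory(struct device_driver *driver,
					  void *data)
	{
		const struct xenbus_driver *drv = to_xenbus_driver(driver);

		if (drv->reclaim_memory)
			drv->reclaim_memory(NULL);	/* NULL: every device */
		return 0;
	}

	/* ...and the shrinker would iterate drivers instead of devices: */
	bus_for_each_drv(&xenbus_backend.bus, NULL, NULL,
			 backend_reclaim_memory);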

> 
> Wouldn't a get_device() before calling the callback and a put_device()
> afterwards avoid that problem?

I didn't use the reference count manipulation operations because other
similar parts also don't.  But, if there is no implicit reference count
guarantee, those operations seem indeed necessary.

That said, as the get/put operations only adjust the reference count,
they will not make the callback wait until the linking of the 'backend'
and 'blkif' to the device (in xen_blkbk_probe()) is finished.  Thus, the
race could still happen.  Or am I missing something?
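
For clarity, what I tested is roughly as below (a sketch); the get/put
pair pins the lifetime of 'dev', but does not order against the probe:

	get_device(dev);			/* keeps 'dev' from being freed */
	backend_reclaim_memory(dev, NULL);	/* may still see 'be' or
						 * 'be->blkif' unset */
	put_device(dev);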

I also modified the code to do 'get_device()' and 'put_device()' as you
suggested and tested it, but the race was still reproducible.


Thanks,
SeongJae Park

> 
> 
> Juergen


Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß"  wrote:

> On 16.12.19 20:48, SeongJae Park wrote:
> > On on, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
> > 
> >> On 16.12.19 17:15, SeongJae Park wrote:
> >>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  
> >>> wrote:
> >>>
> >>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  
> >>>> wrote:
> >>>>
> >>>>> From: SeongJae Park 
> >>>>>
> >>> [...]
> >>>>> --- a/drivers/block/xen-blkback/xenbus.c
> >>>>> +++ b/drivers/block/xen-blkback/xenbus.c
> >>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device 
> >>>>> *dev,
> >>>>>}
> >>>>>
> >>>>>
> >>>>> +/* Once a memory pressure is detected, squeeze free page pools for a 
> >>>>> while. */
> >>>>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>>>> +module_param_named(buffer_squeeze_duration_ms,
> >>>>> +   buffer_squeeze_duration_ms, int, 0644);
> >>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> >>>>> detected");
> >>>>> +
> >>>>> +/*
> >>>>> + * Callback received when the memory pressure is detected.
> >>>>> + */
> >>>>> +static void reclaim_memory(struct xenbus_device *dev)
> >>>>> +{
> >>>>> +   struct backend_info *be = dev_get_drvdata(&dev->dev);
> >>>>> +
> >>>>> +   be->blkif->buffer_squeeze_end = jiffies +
> >>>>> +   msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>>>
> >>>> This callback might race with 'xen_blkbk_probe()'.  The race could 
> >>>> result in
> >>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it 
> >>>> links
> >>>> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> >>>>
> >>>> I will do more test and share results.  Meanwhile, if you have any 
> >>>> opinion,
> >>>> please let me know.
> > 
> > I reduced system memory and attached bunch of devices in short time so that
> > memory pressure occurs while device attachments are ongoing.  Under this
> > circumstance, I was able to see the race.
> > 
> >>>
> >>> Not only '->blkif', but 'be' itself also coule be a NULL.  As similar
> >>> concurrency issues could be in other drivers in their way, I suggest to 
> >>> change
> >>> the reclaim callback ('->reclaim_memory') to be called for each driver 
> >>> instead
> >>> of each device.  Then, each driver could be able to deal with its 
> >>> concurrency
> >>> issues by itself.
> >>
> >> Hmm, I don't like that. This would need to be changed back in case we
> >> add per-guest quota.
> > 
> > Extending this callback in that way would be still not too hard.  We could 
> > use
> > the argument to the callback.  I would keep the argument of the callback to
> > 'struct device *' as is, and will add a comment saying 'NULL' value of the
> > argument means every devices.  As an example, xenbus would pass NULL-ending
> > array of the device pointers that need to free its resources.
> > 
> > After seeing this race, I am now also thinking it could be better to 
> > delegate
> > detailed control of each device to its driver, as some drivers have some
> > complicated and unique relation with its devices.
> > 
> >>
> >> Wouldn't a get_device() before calling the callback and a put_device()
> >> afterwards avoid that problem?
> > 
> > I didn't used the reference count manipulation operations because other 
> > similar
> > parts also didn't.  But, if there is no implicit reference count guarantee, 
> > it
> > seems those operations are indeed necessary.
> > 
> > That said, as get/put operations only adjust the reference count, those will
> > not make the callback to wait until the linking of the 'backend' and 
> > 'blkif' to
> > the device (xen_blkbk_probe()) is finished.  Thus, the race could still 
> > happen.
> > Or, am I missing something?
> 
> No, I think we need a xenbus lock per device which will need to be
> taken in xen_blkbk_probe(), xenbus_dev_remove() and while calling the
> callback.

I also agree that locking should be used in the end.  But, as each
driver manages its devices and resources in its own way, each could have
unique race conditions.  And, each unique race condition might have its
own efficient way to be synchronized.  Therefore, I think the
synchronization should be done by each driver, not by xenbus, and thus
we should make the callback be called per driver.
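
For reference, the per-device lock suggested above would look roughly
like below (with a hypothetical 'reclaim_lock' added to 'struct
xenbus_device', also taken in xen_blkbk_probe() and
xenbus_dev_remove()):

	mutex_lock(&xdev->reclaim_lock);
	drv = to_xenbus_driver(xdev->dev.driver);
	if (drv && drv->reclaim_memory)
		drv->reclaim_memory(xdev);
	mutex_unlock(&xdev->reclaim_lock);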


Thanks,
SeongJae Park

> 
> 
> Juergen
> 


Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 09:16:47 +0100 "Jürgen Groß"  wrote:

> On 17.12.19 08:59, SeongJae Park wrote:
> > On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß"  wrote:
> > 
> >> On 16.12.19 20:48, SeongJae Park wrote:
> >>> On on, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
> >>>
> >>>> On 16.12.19 17:15, SeongJae Park wrote:
> >>>>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  
> >>>>> wrote:
> >>>>>
> >>>>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  
> >>>>>> wrote:
> >>>>>>
> >>>>>>> From: SeongJae Park 
> >>>>>>>
> >>>>> [...]
> >>>>>>> --- a/drivers/block/xen-blkback/xenbus.c
> >>>>>>> +++ b/drivers/block/xen-blkback/xenbus.c
> >>>>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct 
> >>>>>>> xenbus_device *dev,
> >>>>>>> }
> >>>>>>> 
> >>>>>>> 
> >>>>>>> +/* Once a memory pressure is detected, squeeze free page pools for a 
> >>>>>>> while. */
> >>>>>>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>>>>>> +module_param_named(buffer_squeeze_duration_ms,
> >>>>>>> + buffer_squeeze_duration_ms, int, 0644);
> >>>>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>>>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> >>>>>>> detected");
> >>>>>>> +
> >>>>>>> +/*
> >>>>>>> + * Callback received when the memory pressure is detected.
> >>>>>>> + */
> >>>>>>> +static void reclaim_memory(struct xenbus_device *dev)
> >>>>>>> +{
> >>>>>>> + struct backend_info *be = dev_get_drvdata(&dev->dev);
> >>>>>>> +
> >>>>>>> + be->blkif->buffer_squeeze_end = jiffies +
> >>>>>>> + msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>>>>>
> >>>>>> This callback might race with 'xen_blkbk_probe()'.  The race could 
> >>>>>> result in
> >>>>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it 
> >>>>>> links
> >>>>>> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> >>>>>>
> >>>>>> I will do more test and share results.  Meanwhile, if you have any 
> >>>>>> opinion,
> >>>>>> please let me know.
> >>>
> >>> I reduced system memory and attached a bunch of devices in a short time
> >>> so that memory pressure occurs while device attachments are ongoing.
> >>> Under this circumstance, I was able to see the race.
> >>>
> >>>>>
> >>>>> Not only '->blkif', but 'be' itself could also be NULL.  As similar
> >>>>> concurrency issues could exist in other drivers in their own ways, I
> >>>>> suggest changing the reclaim callback ('->reclaim_memory') to be
> >>>>> called for each driver instead of each device.  Then, each driver
> >>>>> would be able to deal with its concurrency issues by itself.
> >>>>
> >>>> Hmm, I don't like that. This would need to be changed back in case we
> >>>> add per-guest quota.
> >>>
> >>> Extending this callback in that way would still not be too hard.  We
> >>> could use the argument to the callback.  I would keep the argument of
> >>> the callback as 'struct device *', and will add a comment saying that a
> >>> 'NULL' value of the argument means every device.  As an example, xenbus
> >>> would pass a NULL-terminated array of the device pointers that need to
> >>> free their resources.
> >>>
> >>> After seeing this race, I am now also thinking it could be better to
> >>> delegate detailed control of each device to its driver, as some drivers
> >>> have complicated and unique relations with their devices.

Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 12:39:15 +0100 "Roger Pau Monné"  
wrote:

> On Mon, Dec 16, 2019 at 08:48:03PM +0100, SeongJae Park wrote:
> > On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
> > 
> > > On 16.12.19 17:15, SeongJae Park wrote:
> > > > On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  
> > > > wrote:
> > > > 
> > > >> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  
> > > >> wrote:
> > > >>
> > > >>> From: SeongJae Park 
> > > >>>
> > > > [...]
> > > >>> --- a/drivers/block/xen-blkback/xenbus.c
> > > >>> +++ b/drivers/block/xen-blkback/xenbus.c
> > > >>> @@ -824,6 +824,24 @@ static void frontend_changed(struct 
> > > >>> xenbus_device *dev,
> > > >>>   }
> > > >>>   
> > > >>>   
> > > >>> +/* Once a memory pressure is detected, squeeze free page pools for a 
> > > >>> while. */
> > > >>> +static unsigned int buffer_squeeze_duration_ms = 10;
> > > >>> +module_param_named(buffer_squeeze_duration_ms,
> > > >>> + buffer_squeeze_duration_ms, int, 0644);
> > > >>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > > >>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> > > >>> detected");
> > > >>> +
> > > >>> +/*
> > > >>> + * Callback received when the memory pressure is detected.
> > > >>> + */
> > > >>> +static void reclaim_memory(struct xenbus_device *dev)
> > > >>> +{
> > > >>> + struct backend_info *be = dev_get_drvdata(&dev->dev);
> > > >>> +
> > > >>> + be->blkif->buffer_squeeze_end = jiffies +
> > > >>> + msecs_to_jiffies(buffer_squeeze_duration_ms);
> > > >>
> > > >> This callback might race with 'xen_blkbk_probe()'.  The race could 
> > > >> result in
> > > >> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it 
> > > >> links
> > > >> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> > > >>
> > > >> I will do more test and share results.  Meanwhile, if you have any 
> > > >> opinion,
> > > >> please let me know.
> > 
> > I reduced system memory and attached a bunch of devices in a short time so
> > that memory pressure occurs while device attachments are ongoing.  Under
> > this circumstance, I was able to see the race.
> > 
> > > > 
> > > > Not only '->blkif', but 'be' itself could also be NULL.  As similar
> > > > concurrency issues could exist in other drivers in their own ways, I
> > > > suggest changing the reclaim callback ('->reclaim_memory') to be
> > > > called for each driver instead of each device.  Then, each driver
> > > > would be able to deal with its concurrency issues by itself.
> > > 
> > > Hmm, I don't like that. This would need to be changed back in case we
> > > add per-guest quota.
> > 
> > Extending this callback in that way would still not be too hard.  We could
> > use the argument to the callback.  I would keep the argument of the
> > callback as 'struct device *', and will add a comment saying that a 'NULL'
> > value of the argument means every device.  As an example, xenbus would
> > pass a NULL-terminated array of the device pointers that need to free
> > their resources.
> > 
> > After seeing this race, I am now also thinking it could be better to
> > delegate detailed control of each device to its driver, as some drivers
> > have complicated and unique relations with their devices.
> > 
> > > 
> > > Wouldn't a get_device() before calling the callback and a put_device()
> > > afterwards avoid that problem?
> > 
> > I didn't use the reference count manipulation operations because other
> > similar parts also don't.  But, if there is no implicit reference count
> > guarantee, it seems those operations are indeed necessary.
> > 
> > That said, as get/put operations only adjust the reference count, those
> > will not make the callback wait until the linking of the 'backend' and
> > 'blkif' to the device (xen_blkbk_probe()) is finished.  Thus, the race
> > could still happen.  Or, am I missing something?

[Xen-devel] [PATCH v11 1/6] xenbus/backend: Add memory pressure handler callback

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of spare
memory is challenging for large systems with dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to
'xenbus_driver'.  If memory pressure is detected, 'xenbus' requests
every backend driver to voluntarily release its memory.

Note that the callback facility could be improved for more sophisticated
handling of general pressure.  For example, it would be possible to
monitor the memory consumption of each device and issue the release
requests only to the devices causing the pressure.  Also, the callback
could be extended to handle not only memory, but general resources.
Nevertheless, this version of the implementation defers such
sophisticated goals to future work.

Reviewed-by: Juergen Gross 
Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h  |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+   const struct xenbus_driver *drv;
+
+   if (!dev->driver)
+   return 0;
+   drv = to_xenbus_driver(dev->driver);
+   if (drv && drv->reclaim_memory)
+   drv->reclaim_memory(to_xenbus_device(dev));
+   return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+   struct shrink_control *sc)
+{
+   bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+   backend_reclaim_memory);
+   return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+   .count_objects = backend_shrink_memory_count,
+   .seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
register_xenstore_notifier(&xenstore_notifier);
 
+   if (register_shrinker(&backend_memory_shrinker))
+   pr_warn("shrinker registration failed\n");
+
return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
int (*is_ready)(struct xenbus_device *dev);
+   void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1
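
For illustration, a backend driver opts in by setting the new
'reclaim_memory' field in its 'xenbus_driver' and registering with
xenbus_register_backend() as usual.  A minimal sketch follows, with
hypothetical 'example_*' names that appear in no posted patch:

    static const struct xenbus_device_id example_ids[] = {
            { "example" },
            { "" }
    };

    /* Called when xenbus detects memory pressure; release whatever
     * per-device memory is safe to drop. */
    static void example_reclaim_memory(struct xenbus_device *dev)
    {
    }

    static struct xenbus_driver example_driver = {
            .ids = example_ids,
            .reclaim_memory = example_reclaim_memory,
            /* .probe, .remove, .otherend_changed as usual */
    };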


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v11 0/6] xenbus/backend: Add a memory pressure handler callback

2019-12-17 Thread SeongJae Park
Granting pages consumes backend system memory.  In systems configured
with insufficient spare memory for those pages, it can cause a memory
pressure situation.  However, finding the optimal amount of spare
memory is challenging for large systems with dynamic resource
utilization patterns.  Also, such a static configuration might lack
flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback
to 'xenbus_driver' (patch 1) and then introduces a lock to avoid race
conditions (patch 2).  Those two patches could be merged into one patch
if necessary.

The third patch applies the callback mechanism to mitigate the problem
in 'xen-blkback' (patch 3), but it does not yet use the race condition
mitigation.  The following change (patch 4) applies the race protection
mechanism to blkback.  Patches 3 and 4 have been separated only for
review convenience.  I highly recommend merging them into one patch, as
a tree with only patch 3 applied might confuse bisecting.

The fifth and sixth patches are trivial cleanups; those fix nits we
found during the development of this patchset.

Note that patches 1, 3, 5, and 6 are the same as in the previous
version.  I put the changes in this version into separate commits (only
the second and fourth patches) to make review more comfortable.  In
particular, the third and fourth patches should be merged into one
patch, as the third one alone might confuse bisecting.  The next version
of this patchset will merge those as well.


Base Version
============

This patchset is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v11


Patch History
-------------

Changes from v10
(https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
 - Fix race condition (reported by SeongJae, suggested by Juergen)

Changes from v9
(https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
 - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
 - Update the commit message for overhead test of the 2nd path

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
 - Drop 'Reviewed-by: Juergen' from the second patch
   (suggested by Roger Pau Monné)
 - Update contact of the new module param to SeongJae Park
   (suggested by Roger Pau Monné)
 - Wordsmith the description of the parameter
   (suggested by Roger Pau Monné)
 - Fix dumb bugs
   (suggested by Roger Pau Monné)
 - Move module param definition to xenbus.c and reduce the number of
   lines for this change
   (suggested by Roger Pau Monné)
 - Add a comment for the new callback, reclaim_memory, as other
   callbacks also have
 - Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
 - Update sysfs-driver-xen-blkback for new parameter
   (suggested by Roger Pau Monné)
 - Use per-xen_blkif buffer_squeeze_end instead of global variable
   (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
 - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
 - Constify a variable (suggested by Roger Pau Monné)
 - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
 - More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
 - Wordsmith the commit messages (suggested by Roger Pau Monné)
 - Change the reclaim callback return type (suggested by Roger Pau
   Monné)
 - Change the type of the blkback squeeze duration variable
   (suggested by Roger Pau Monné)
 - Add a patch for removal of unnecessary static variable name prefixes
   (suggested by Roger Pau Monné)
 - Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
 - Remove domain id parameter from the callback (suggested by Juergen
   Gross)
 - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
 - Add general callback in xen_driver and use it (suggested by Juergen
   Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
 - Rename the module parameter and variables for brevity
   (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
 - Adjust the description to not use the term, `arbitrarily`
   (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description,
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch


SeongJae Park (6):
  xenbus/backend: Add memory pressure handler callback
  xenbus/backend: Protect xenbus callback with lock
  xen/blkback: Squeeze page pools if a memory pressure is detected
  xen/blkback: Protect 'reclaim_memory()' with 'reclaim_lock'
  xen/blkback: Remove unnecessary static variable name prefixes
  xen/blkback: Consistently insert one empty line between functions

[Xen-devel] [PATCH v11 3/6] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  If current I/O requests handling is finished or 100
milliseconds has passed since last I/O requests handling, it checks and
shrinks the pool to not exceed the size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not so
easy.  An improperly set limit can result in memory pressure or
resource underutilization.  This commit avoids such problematic
situations by squeezing the pools (returns every free page in the pool
to the system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only
those pages in the pool which are not currently used by `blkback`; in
other words, the pages that are not mapped to granted pages.  Because
this commit changes only the shrink limit but still uses the same
freeing mechanism, it does not touch pages which are currently mapping
grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by the
squeezing can reduce the I/O performance.  If it is too short, `blkback`
will not free enough pages to reduce the memory pressure.  This commit
sets the value to `10 milliseconds` by default because it is a short
time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism works at least
every 100 milliseconds, this could be a somewhat reasonable choice.  I
also tested other durations (refer to the section below for more
details) and confirmed that 10 milliseconds is the one that works best
with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set the duration as a module parameter.

Memory Pressure Test
====================

To show how well this commit fixes the memory pressure situation, I
configured a test environment on a Xen-running virtualization system.
On the `blkfront`-running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages swapped in (pswpin) and out (pswpout) on
the `blkback`-running guest.  The test ran twice, once for the
`blkback` before this commit and once for that after this commit.  As
shown below, this commit has dramatically reduced the memory pressure:

        pswpin  pswpout
before  76,672  185,799
after      212    3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find a best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

duration  pswpin  pswpout
1         852     6,424
10        212     3,325
100       203     3,340

As expected, the memory pressure decreased as the duration increased,
but the reduction stopped at `10ms`.  Based on these results, I chose
the default duration of 10ms.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront`-running
guest.

For the artificial squeezing, I set `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  The `1024` is the default
value.  Setting the value to `0` is the same as always doing the
squeezing (worst case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For that reason, I use a fast block device, namely the
rbd[1]:

# xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

$ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
 bs=4k count=$((256*512)); sync; done

The results are as below.  'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

max_pgs   Min   Max   Median   Avg     Stddev
0         417   423   420      419.4   2.5099801
1024      414   425   416      417.8   4.4384682
No difference proven at 95.0% confidence

In short, even the worst-case squeezing made no visible I/O performance
degradation.

[Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

The 'reclaim_memory' callback can race with driver code, as this
callback can be called from any context in which memory pressure is
detected.  To deal with this, this commit adds a spinlock to
'xenbus_device'.  Whenever the 'reclaim_memory' callback is called, the
lock of the device passed to the callback as its argument is held.
Thus, drivers registering their 'reclaim_memory' callback should use
the lock themselves to protect the data that might race with the
callback.

Signed-off-by: SeongJae Park 
---
 drivers/xen/xenbus/xenbus_probe.c |  1 +
 drivers/xen/xenbus/xenbus_probe_backend.c | 10 --
 include/xen/xenbus.h  |  2 ++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index 5b471889d723..b86393f172e6 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
goto fail;
 
dev_set_name(&xendev->dev, "%s", devname);
+   spin_lock_init(&xendev->reclaim_lock);
 
/* Register with generic device framework. */
err = device_register(&xendev->dev);
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index 7e78ebef7c54..516aa64b9967 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct notifier_block 
*notifier,
 static int backend_reclaim_memory(struct device *dev, void *data)
 {
const struct xenbus_driver *drv;
+   struct xenbus_device *xdev;
+   unsigned long flags;
 
if (!dev->driver)
return 0;
drv = to_xenbus_driver(dev->driver);
-   if (drv && drv->reclaim_memory)
-   drv->reclaim_memory(to_xenbus_device(dev));
+   if (drv && drv->reclaim_memory) {
+   xdev = to_xenbus_device(dev);
+   spin_trylock_irqsave(&xdev->reclaim_lock, flags);
+   drv->reclaim_memory(xdev);
+   spin_unlock_irqrestore(&xdev->reclaim_lock, flags);
+   }
return 0;
 }
 
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index c861cfb6f720..d9468313061d 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -76,6 +76,8 @@ struct xenbus_device {
enum xenbus_state state;
struct completion down;
struct work_struct work;
+   /* 'reclaim_memory' callback is called while this lock is acquired */
+   spinlock_t reclaim_lock;
 };
 
 static inline struct xenbus_device *to_xenbus_device(struct device *dev)
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v11 4/6] xen/blkback: Protect 'reclaim_memory()' with 'reclaim_lock'

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

The 'reclaim_memory()' callback of blkback could race with
'xen_blkbk_probe()' and 'xen_blkbk_remove()'.  In that case, an
incompletely linked 'backend_info' and 'blkif' might be exposed to the
callback, resulting in bad outcomes including a NULL dereference.  This
commit fixes the problem by applying the 'reclaim_lock' protection to
those functions.

Note that this commit is separated for review purposes only.  As the
previous commit alone might introduce a race condition and confuse
bisecting, please squash this commit into the previous one if possible.

Signed-off-by: SeongJae Park 

---
 drivers/block/xen-blkback/xenbus.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 4f6ea4feca79..20045827a391 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -492,6 +492,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
+   unsigned long flags;
 
pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
 
@@ -504,6 +505,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
be->backend_watch.node = NULL;
}
 
+   spin_lock_irqsave(&dev->reclaim_lock, flags);
dev_set_drvdata(&dev->dev, NULL);
 
if (be->blkif) {
@@ -512,6 +514,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
/* Put the reference we set in xen_blkif_alloc(). */
xen_blkif_put(be->blkif);
}
+   spin_unlock_irqrestore(&dev->reclaim_lock, flags);
 
return 0;
 }
@@ -597,6 +600,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
int err;
struct backend_info *be = kzalloc(sizeof(struct backend_info),
  GFP_KERNEL);
+   unsigned long flags;
 
/* match the pr_debug in xen_blkbk_remove */
pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id);
@@ -607,6 +611,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return -ENOMEM;
}
be->dev = dev;
+   spin_lock_irqsave(&dev->reclaim_lock, flags);
dev_set_drvdata(&dev->dev, be);
 
be->blkif = xen_blkif_alloc(dev->otherend_id);
@@ -614,8 +619,10 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
err = PTR_ERR(be->blkif);
be->blkif = NULL;
xenbus_dev_fatal(dev, err, "creating block interface");
+   spin_unlock_irqrestore(&dev->reclaim_lock, flags);
goto fail;
}
+   spin_unlock_irqrestore(&dev->reclaim_lock, flags);
 
err = xenbus_printf(XBT_NIL, dev->nodename,
"feature-max-indirect-segments", "%u",
@@ -838,6 +845,10 @@ static void reclaim_memory(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
 
+   /* Device is registered but not probed yet */
+   if (!be)
+   return;
+
be->blkif->buffer_squeeze_end = jiffies +
msecs_to_jiffies(buffer_squeeze_duration_ms);
 }
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v11 5/6] xen/blkback: Remove unnecessary static variable name prefixes

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

A few static variables in blkback have the 'xen_blkif_' prefix, though
it is unnecessary for static variables.  This commit removes such
prefixes.

Reviewed-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/blkback.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 79f677aeb5cc..fbd67f8e4e4e 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -62,8 +62,8 @@
  * IO workloads.
  */
 
-static int xen_blkif_max_buffer_pages = 1024;
-module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644);
+static int max_buffer_pages = 1024;
+module_param_named(max_buffer_pages, max_buffer_pages, int, 0644);
 MODULE_PARM_DESC(max_buffer_pages,
 "Maximum number of free pages to keep in each block backend buffer");
 
@@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages,
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 1056;
-module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
+static int max_pgrants = 1056;
+module_param_named(max_persistent_grants, max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
@@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants,
  * use. The time is in seconds, 0 means indefinitely long.
  */
 
-static unsigned int xen_blkif_pgrant_timeout = 60;
-module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout,
+static unsigned int pgrant_timeout = 60;
+module_param_named(persistent_grant_unused_seconds, pgrant_timeout,
   uint, 0644);
 MODULE_PARM_DESC(persistent_grant_unused_seconds,
 "Time in seconds an unused persistent grant is allowed to "
@@ -137,9 +137,8 @@ module_param(log_stats, int, 0644);
 
 static inline bool persistent_gnt_timeout(struct persistent_gnt 
*persistent_gnt)
 {
-   return xen_blkif_pgrant_timeout &&
-  (jiffies - persistent_gnt->last_used >=
-   HZ * xen_blkif_pgrant_timeout);
+   return pgrant_timeout && (jiffies - persistent_gnt->last_used >=
+   HZ * pgrant_timeout);
 }
 
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page 
**page)
@@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring,
struct persistent_gnt *this;
struct xen_blkif *blkif = ring->blkif;
 
-   if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) {
+   if (ring->persistent_gnt_c >= max_pgrants) {
if (!blkif->vbd.overflow_max_grants)
blkif->vbd.overflow_max_grants = 1;
return -EBUSY;
@@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring 
*ring)
goto out;
}
 
-   if (ring->persistent_gnt_c < xen_blkif_max_pgrants ||
-   (ring->persistent_gnt_c == xen_blkif_max_pgrants &&
+   if (ring->persistent_gnt_c < max_pgrants ||
+   (ring->persistent_gnt_c == max_pgrants &&
!ring->blkif->vbd.overflow_max_grants)) {
num_clean = 0;
} else {
-   num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN;
-   num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants +
-   num_clean;
+   num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN;
+   num_clean = ring->persistent_gnt_c - max_pgrants + num_clean;
num_clean = min(ring->persistent_gnt_c, num_clean);
pr_debug("Going to purge at least %u persistent grants\n",
 num_clean);
@@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring)
 current->comm, ring->st_oo_req,
 ring->st_rd_req, ring->st_wr_req,
 ring->st_f_req, ring->st_ds_req,
-ring->persistent_gnt_c,
-xen_blkif_max_pgrants);
+ring->persistent_gnt_c, max_pgrants);
ring->st_print = jiffies + msecs_to_jiffies(10 * 1000);
ring->st_rd_req = 0;
ring->st_wr_req = 0;
@@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg)
if (time_before(jiffies, blkif->buffer_squeeze_end))
shrink_free_pagepool(ring, 0);
else
-   shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+   shrink_free_pagepool(ring, max_buffer_pages);
 
if (log_stats && time_after(jiffies, ring->st_print))
print_stats(ring);
@@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *

[Xen-devel] [PATCH v11 6/6] xen/blkback: Consistently insert one empty line between functions

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

The number of empty lines between functions in xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place only one empty line.

Acked-by: Roger Pau Monné 
Signed-off-by: SeongJae Park 
---
 drivers/block/xen-blkback/xenbus.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 20045827a391..453f97dd533d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
device_remove_file(&dev->dev, &dev_attr_physical_device);
 }
 
-
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
if (vbd->bdev)
@@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, 
blkif_vdev_t handle,
handle, blkif->domid);
return 0;
 }
+
 static int xen_blkbk_remove(struct xenbus_device *dev)
 {
struct backend_info *be = dev_get_drvdata(&dev->dev);
@@ -575,6 +575,7 @@ static void xen_blkbk_discard(struct xenbus_transaction 
xbt, struct backend_info
if (err)
dev_warn(&dev->dev, "writing feature-discard (%d)", err);
 }
+
 int xen_blkbk_barrier(struct xenbus_transaction xbt,
  struct backend_info *be, int state)
 {
@@ -663,7 +664,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
return err;
 }
 
-
 /*
  * Callback received when the hotplug scripts have placed the physical-device
  * node.  Read it and the mode node, and create a vbd.  If the frontend is
@@ -755,7 +755,6 @@ static void backend_changed(struct xenbus_watch *watch,
}
 }
 
-
 /*
  * Callback received when the frontend's state changes.
  */
@@ -830,7 +829,6 @@ static void frontend_changed(struct xenbus_device *dev,
}
 }
 
-
 /* Once a memory pressure is detected, squeeze free page pools for a while. */
 static unsigned int buffer_squeeze_duration_ms = 10;
 module_param_named(buffer_squeeze_duration_ms,
@@ -855,7 +853,6 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
-
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 09:30:32 +0100 SeongJae Park  wrote:

> On Tue, 17 Dec 2019 09:16:47 +0100 "Jürgen Groß"  wrote:
> 
> > On 17.12.19 08:59, SeongJae Park wrote:
> > > On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß"  wrote:
> > > 
> > >> On 16.12.19 20:48, SeongJae Park wrote:
> > >>> On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
> > >>>
> > >>>> On 16.12.19 17:15, SeongJae Park wrote:
> > >>>>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park  
> > >>>>> wrote:
> > >>>>>
> > >>>>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park  
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> From: SeongJae Park 
> > >>>>>>>
> > >>>>> [...]
> > >>>>>>> --- a/drivers/block/xen-blkback/xenbus.c
> > >>>>>>> +++ b/drivers/block/xen-blkback/xenbus.c
> > >>>>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct 
> > >>>>>>> xenbus_device *dev,
> > >>>>>>> }
> > >>>>>>> 
> > >>>>>>> 
> > >>>>>>> +/* Once a memory pressure is detected, squeeze free page pools for 
> > >>>>>>> a while. */
> > >>>>>>> +static unsigned int buffer_squeeze_duration_ms = 10;
> > >>>>>>> +module_param_named(buffer_squeeze_duration_ms,
> > >>>>>>> +   buffer_squeeze_duration_ms, int, 0644);
> > >>>>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > >>>>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is 
> > >>>>>>> detected");
> > >>>>>>> +
> > >>>>>>> +/*
> > >>>>>>> + * Callback received when the memory pressure is detected.
> > >>>>>>> + */
> > >>>>>>> +static void reclaim_memory(struct xenbus_device *dev)
> > >>>>>>> +{
> > >>>>>>> +   struct backend_info *be = dev_get_drvdata(&dev->dev);
> > >>>>>>> +
> > >>>>>>> +   be->blkif->buffer_squeeze_end = jiffies +
> > >>>>>>> +   msecs_to_jiffies(buffer_squeeze_duration_ms);
> > >>>>>>
> > >>>>>> This callback might race with 'xen_blkbk_probe()'.  The race could 
> > >>>>>> result in
> > >>>>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after 
> > >>>>>> it links
> > >>>>>> 'be' to the 'dev'.  Please _don't merge_ this patch now!
> > >>>>>>
> > >>>>>> I will do more test and share results.  Meanwhile, if you have any 
> > >>>>>> opinion,
> > >>>>>> please let me know.
> > >>>
> > >>> I reduced system memory and attached a bunch of devices in a short
> > >>> time so that memory pressure occurs while device attachments are
> > >>> ongoing.  Under this circumstance, I was able to see the race.
> > >>>
> > >>>>>
> > >>>>> Not only '->blkif', but 'be' itself could also be NULL.  As similar
> > >>>>> concurrency issues could exist in other drivers in their own ways,
> > >>>>> I suggest changing the reclaim callback ('->reclaim_memory') to be
> > >>>>> called for each driver instead of each device.  Then, each driver
> > >>>>> would be able to deal with its concurrency issues by itself.
> > >>>>
> > >>>> Hmm, I don't like that. This would need to be changed back in case we
> > >>>> add per-guest quota.
> > >>>
> > >>> Extending this callback in that way would still not be too hard.  We
> > >>> could use the argument to the callback.  I would keep the argument of
> > >>> the callback as 'struct device *', and will add a comment saying that
> > >>> a 'NULL' value of the argument means every device.

Re: [Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 17:13:42 +0100 "Jürgen Groß"  wrote:

> On 17.12.19 17:07, SeongJae Park wrote:
> > From: SeongJae Park 
> > 
> > The 'reclaim_memory' callback can race with driver code, as this callback
> > can be called from any context in which memory pressure is detected.  To
> > deal with this, this commit adds a spinlock to 'xenbus_device'.  Whenever
> > the 'reclaim_memory' callback is called, the lock of the device passed to
> > the callback as its argument is held.  Thus, drivers registering their
> > 'reclaim_memory' callback should use the lock themselves to protect the
> > data that might race with the callback.
> > 
> > Signed-off-by: SeongJae Park 
> > ---
> >   drivers/xen/xenbus/xenbus_probe.c |  1 +
> >   drivers/xen/xenbus/xenbus_probe_backend.c | 10 --
> >   include/xen/xenbus.h  |  2 ++
> >   3 files changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/xen/xenbus/xenbus_probe.c 
> > b/drivers/xen/xenbus/xenbus_probe.c
> > index 5b471889d723..b86393f172e6 100644
> > --- a/drivers/xen/xenbus/xenbus_probe.c
> > +++ b/drivers/xen/xenbus/xenbus_probe.c
> > @@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
> > goto fail;
> >   
> > dev_set_name(&xendev->dev, "%s", devname);
> > +   spin_lock_init(&xendev->reclaim_lock);
> >   
> > /* Register with generic device framework. */
> > err = device_register(&xendev->dev);
> > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> > b/drivers/xen/xenbus/xenbus_probe_backend.c
> > index 7e78ebef7c54..516aa64b9967 100644
> > --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> > @@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct 
> > notifier_block *notifier,
> >   static int backend_reclaim_memory(struct device *dev, void *data)
> >   {
> > const struct xenbus_driver *drv;
> > +   struct xenbus_device *xdev;
> > +   unsigned long flags;
> >   
> > if (!dev->driver)
> > return 0;
> > drv = to_xenbus_driver(dev->driver);
> > -   if (drv && drv->reclaim_memory)
> > -   drv->reclaim_memory(to_xenbus_device(dev));
> > +   if (drv && drv->reclaim_memory) {
> > +   xdev = to_xenbus_device(dev);
> > +   spin_trylock_irqsave(&xdev->reclaim_lock, flags);
> 
> You need spin_lock_irqsave() here. Or maybe spin_lock() would be fine,
> too? I can't see a reason why you'd want to disable irqs here.

I needed to disable irqs here as this is called from the memory shrinker
context.

Also, I used 'trylock' because the 'probe()' and 'remove()' code of the driver
might include memory allocation, and xen-blkback's actually does.  If the
allocation hits memory pressure, it will trigger this shrinker callback again
and then deadlock.
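
For illustration only, the deadlock that 'trylock' avoids would look
roughly like the following assumed call chain (a summary of the above,
not a captured trace):

    xen_blkbk_probe()
      spin_lock_irqsave(&dev->reclaim_lock, flags)
        xen_blkif_alloc()
          kmem_cache_zalloc(..., GFP_KERNEL)
            /* direct reclaim under memory pressure */
            backend_shrink_memory_count()
              backend_reclaim_memory()
                spin_lock_irqsave(&dev->reclaim_lock, ...)  /* never succeeds */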


Thanks,
SeongJae Park

> 
> > +   drv->reclaim_memory(xdev);
> > +   spin_unlock_irqrestore(&xdev->reclaim_lock, flags);
> > +   }
> > return 0;
> >   }
> >   
> > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
> > index c861cfb6f720..d9468313061d 100644
> > --- a/include/xen/xenbus.h
> > +++ b/include/xen/xenbus.h
> > @@ -76,6 +76,8 @@ struct xenbus_device {
> > enum xenbus_state state;
> > struct completion down;
> > struct work_struct work;
> > +   /* 'reclaim_memory' callback is called while this lock is acquired */
> > +   spinlock_t reclaim_lock;
> >   };
> >   
> >   static inline struct xenbus_device *to_xenbus_device(struct device *dev)
> > 
> 
> 
> Juergen
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/3] xen/blkback: Squeeze page pools if a memory pressure is detected

2019-12-17 Thread SeongJae Park
From: SeongJae Park 

I thought it would be better to review separated patches, but it seems
that was my mistake.  As Juergen asked, I merged them again and am
posting the result here.  Also, I dropped Roger's Reviewed-by.


Thanks,
SeongJae Park


---- >8 ----
Subject: [PATCH 1/3] xen/blkback: Squeeze page pools if a memory pressure is
 detected

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  If current I/O requests handling is finished or 100
milliseconds has passed since last I/O requests handling, it checks and
shrinks the pool to not exceed the size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not so
easy.  An improperly set limit can result in memory pressure or
resource underutilization.  This commit avoids such problematic
situations by squeezing the pools (returns every free page in the pool
to the system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only
those pages in the pool which are not currently used by `blkback`; in
other words, the pages that are not mapped to granted pages.  Because
this commit changes only the shrink limit but still uses the same
freeing mechanism, it does not touch pages which are currently mapping
grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by the
squeezing can reduce the I/O performance.  If it is too short, `blkback`
will not free enough pages to reduce the memory pressure.  This commit
sets the value to `10 milliseconds` by default because it is a short
time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism works at least
every 100 milliseconds, this could be a somewhat reasonable choice.  I
also tested other durations (refer to the section below for more
details) and confirmed that 10 milliseconds is the one that works best
with the test.  That said, the proper duration depends on actual
configurations and workloads.  That's why this commit allows users to
set the duration as a module parameter.

Memory Pressure Test
====================

To show how well this commit fixes the memory pressure situation, I
configured a test environment on a Xen-running virtualization system.
On the `blkfront`-running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages swapped in (pswpin) and out (pswpout) on
the `blkback`-running guest.  The test ran twice, once for the
`blkback` before this commit and once for that after this commit.  As
shown below, this commit has dramatically reduced the memory pressure:

        pswpin  pswpout
before  76,672  185,799
after      212    3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find a best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

duration  pswpin  pswpout
1         852     6,424
10        212     3,325
100       203     3,340

As expected, the memory pressure decreased as the duration increased,
but the reduction stopped at `10ms`.  Based on these results, I chose
the default duration of 10ms.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront`-running
guest.

For the artificial squeezing, I set `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  The `1024` is the default
value.  Setting the value to `0` is the same as always doing the
squeezing (worst case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For that reason, I use a fast block device, namely the
rbd[1]:

# xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

$ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
 bs=4k count=$((256*512)); sync; done

Re: [Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock

2019-12-17 Thread SeongJae Park
On Tue, 17 Dec 2019 18:10:19 +0100 "Jürgen Groß"  wrote:

> On 17.12.19 17:24, SeongJae Park wrote:
> > On Tue, 17 Dec 2019 17:13:42 +0100 "Jürgen Groß"  wrote:
> > 
> >> On 17.12.19 17:07, SeongJae Park wrote:
> >>> From: SeongJae Park 
> >>>
> >>> The 'reclaim_memory' callback can race with driver code, as this
> >>> callback can be called from any context in which memory pressure is
> >>> detected.  To deal with this, this commit adds a spinlock to
> >>> 'xenbus_device'.  Whenever the 'reclaim_memory' callback is called, the
> >>> lock of the device passed to the callback as its argument is held.
> >>> Thus, drivers registering their 'reclaim_memory' callback should use
> >>> the lock themselves to protect the data that might race with the
> >>> callback.
> >>>
> >>> Signed-off-by: SeongJae Park 
> >>> ---
> >>>drivers/xen/xenbus/xenbus_probe.c |  1 +
> >>>drivers/xen/xenbus/xenbus_probe_backend.c | 10 --
> >>>include/xen/xenbus.h  |  2 ++
> >>>3 files changed, 11 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/xen/xenbus/xenbus_probe.c 
> >>> b/drivers/xen/xenbus/xenbus_probe.c
> >>> index 5b471889d723..b86393f172e6 100644
> >>> --- a/drivers/xen/xenbus/xenbus_probe.c
> >>> +++ b/drivers/xen/xenbus/xenbus_probe.c
> >>> @@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus,
> >>>   goto fail;
> >>>
> >>>   dev_set_name(&xendev->dev, "%s", devname);
> >>> + spin_lock_init(&xendev->reclaim_lock);
> >>>
> >>>   /* Register with generic device framework. */
> >>>   err = device_register(&xendev->dev);
> >>> diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
> >>> b/drivers/xen/xenbus/xenbus_probe_backend.c
> >>> index 7e78ebef7c54..516aa64b9967 100644
> >>> --- a/drivers/xen/xenbus/xenbus_probe_backend.c
> >>> +++ b/drivers/xen/xenbus/xenbus_probe_backend.c
> >>> @@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct 
> >>> notifier_block *notifier,
> >>>static int backend_reclaim_memory(struct device *dev, void *data)
> >>>{
> >>>   const struct xenbus_driver *drv;
> >>> + struct xenbus_device *xdev;
> >>> + unsigned long flags;
> >>>
> >>>   if (!dev->driver)
> >>>   return 0;
> >>>   drv = to_xenbus_driver(dev->driver);
> >>> - if (drv && drv->reclaim_memory)
> >>> - drv->reclaim_memory(to_xenbus_device(dev));
> >>> + if (drv && drv->reclaim_memory) {
> >>> + xdev = to_xenbus_device(dev);
> >>> + spin_trylock_irqsave(&xdev->reclaim_lock, flags);
> >>
> >> You need spin_lock_irqsave() here. Or maybe spin_lock() would be fine,
> >> too? I can't see a reason why you'd want to disable irqs here.
> > 
> > I needed to disable irqs here as this is called from the memory shrinker
> > context.
> 
> Okay.
> 
> > 
> > Also, I used 'trylock' because the 'probe()' and 'remove()' code of the
> > driver might include memory allocation, and xen-blkback's actually does.
> > If the allocation hits memory pressure, it will trigger this shrinker
> > callback again and then deadlock.
> 
> In that case you need to either return when you didn't get the lock or

Yes, it should.  I cannot believe I posted this code; it seems I made some
terrible mistake while formatting the patches.  Anyway, it will return if it
fails to acquire the lock, in the next version.


Thanks,
SeongJae Park

> 
> - when obtaining the lock during probe() and remove() set a variable
>   containing the current cpu number
> - and reset that to e.g. NR_CPUS before releasing the lock again
> - in the shrinker callback do trylock, and if you didn't get the lock
>   test whether the cpu-variable above is set to your current cpu and
>   continue only if yes; if not, redo the trylock
> 
> 
> Juergen
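
A minimal sketch of the bail-out approach promised above (return when the
trylock fails), written against the v11 code; this is not the literal
next-version patch:

    static int backend_reclaim_memory(struct device *dev, void *data)
    {
            const struct xenbus_driver *drv;
            struct xenbus_device *xdev;
            unsigned long flags;

            if (!dev->driver)
                    return 0;
            drv = to_xenbus_driver(dev->driver);
            if (drv && drv->reclaim_memory) {
                    xdev = to_xenbus_device(dev);
                    /*
                     * Bail out rather than deadlock when the shrinker was
                     * entered from an allocation in probe()/remove(), which
                     * already holds this lock.
                     */
                    if (!spin_trylock_irqsave(&xdev->reclaim_lock, flags))
                            return 0;
                    drv->reclaim_memory(xdev);
                    spin_unlock_irqrestore(&xdev->reclaim_lock, flags);
            }
            return 0;
    }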

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
