Re: [Xen-devel] [PATCH v13 3/5] xen/blkback: Squeeze page pools if a memory pressure is detected
Hello Roger,

Sorry if I'm disturbing your vacation. If you have already come back to
work, may I ask your opinion about this patch?

On Wed, 18 Dec 2019 19:37:16 +0100 SeongJae Park wrote:

> From: SeongJae Park
>
> Each `blkif` has a free pages pool for the grant mapping. The size of
> the pool starts from zero and is increased on demand while processing
> the I/O requests. If the current I/O request handling is finished or
> 100 milliseconds have passed since the last I/O request handling, it
> checks and shrinks the pool so as not to exceed the size limit,
> `max_buffer_pages`.
>
> Therefore, host administrators can cause memory pressure in blkback by
> attaching a large number of block devices and inducing I/O. Such
> problematic situations can be avoided by limiting the maximum number of
> devices that can be attached, but finding the optimal limit is not so
> easy. An improperly chosen limit can result in memory pressure or
> resource underutilization. This commit avoids such problematic
> situations by squeezing the pools (returning every free page in the
> pool to the system) for a while (users can set this duration via a
> module parameter) if memory pressure is detected.
>
> Discussions
> ===========
>
> The `blkback`'s original shrinking mechanism returns to the system only
> pages in the pool which are not currently used by `blkback`; in other
> words, the pages that are not mapped with granted pages. Because this
> commit changes only the shrink limit but still uses the same freeing
> mechanism, it does not touch pages which are currently mapping grants.
>
> Once memory pressure is detected, this commit keeps the squeezing limit
> for a user-specified time duration. The duration should be neither too
> long nor too short. If it is too long, the overhead incurred by the
> squeezing can reduce the I/O performance. If it is too short, `blkback`
> will not free enough pages to reduce the memory pressure.
> This commit sets the value to `10 milliseconds` by default because it
> is a short time in terms of I/O while it is a long time in terms of
> memory operations. Also, as the original shrinking mechanism works at
> least every 100 milliseconds, this could be a somewhat reasonable
> choice. I also tested other durations (refer to the section below for
> more details) and confirmed that 10 milliseconds is the one that works
> best with the test. That said, the proper duration depends on actual
> configurations and workloads. That's why this commit allows users to
> set the duration as a module parameter.
>
> Memory Pressure Test
> ====================
>
> To show how well this commit fixes the memory pressure situation, I
> configured a test environment on a Xen-running virtualization system.
> On the `blkfront` running guest instances, I attach a large number of
> network-backed volume devices and induce I/O to those. Meanwhile, I
> measure the number of pages that swapped in (pswpin) and out (pswpout)
> on the `blkback` running guest. The test ran twice, once for the
> `blkback` before this commit and once for that after this commit. As
> shown below, this commit has dramatically reduced the memory pressure:
>
>             pswpin  pswpout
>     before  76,672  185,799
>     after      867    3,967
>
> Optimal Aggressive Shrinking Duration
> -------------------------------------
>
> To find the best squeezing duration, I repeated the test with three
> different durations (1ms, 10ms, and 100ms). The results are as below:
>
>     duration  pswpin  pswpout
>     1            707    5,095
>     10           867    3,967
>     100          362    3,348
>
> As expected, the memory pressure decreases as the duration increases,
> but the reduction becomes slow from `10ms`. Based on these results, I
> chose the default duration as 10ms.
>
> Performance Overhead Test
> =========================
>
> This commit could incur I/O performance degradation under severe memory
> pressure because the squeezing will require more page allocations per
> I/O.
> To show the overhead, I artificially made a worst-case squeezing
> situation and measured the I/O performance of a `blkfront` running
> guest.
>
> For the artificial squeezing, I set `blkback.max_buffer_pages` using
> the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In
> this test, I set the value to `1024` and `0`. The `1024` is the
> default value. Setting the value to `0` is the same as always doing
> the squeezing (the worst case).
>
> If the underlying block device is slow enough, the squeezing overhead
> could be hidden. For that reason, I use a fast block device, namely
> the rbd[1]:
>
> # x
Re: [Xen-devel] [PATCH v13 0/5] xenbus/backend: Add memory pressure handler callback
Every patch of this patchset got at least one 'Reviewed-by' or
'Acked-by' from appropriate maintainers by last Wednesday, and has
received no further comments since then. May I ask for some more
comments?

Thanks,
SeongJae Park

On Wed, 18 Dec 2019 19:37:13 +0100 SeongJae Park wrote:

> Granting pages consumes backend system memory. In systems configured
> with insufficient spare memory for those pages, it can cause a memory
> pressure situation. However, finding the optimal amount of spare
> memory is challenging for large systems having dynamic resource
> utilization patterns. Also, such a static configuration might lack
> flexibility.
>
> To mitigate such problems, this patchset adds a memory reclaim
> callback to 'xenbus_driver' (patch 1) and then introduces a lock for
> race condition avoidance (patch 2). After that, patch 3 applies the
> callback mechanism to mitigate the problem in 'xen-blkback'. The
> fourth and fifth patches are trivial cleanups; those fix nits we found
> during the development of this patchset.
>
> Note that patches 1, 4, and 5 are not changed since v9.
>
>
> Base Version
> ============
>
> This patch is based on v5.4.
> A complete tree is also available at my public git repo:
> https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v13
>
>
> Patch History
> -------------
>
> Changes from v12
> (https://lore.kernel.org/xen-devel/20191218104232.9606-1-sjp...@amazon.com/)
> - Do not unnecessarily disable interrupts (suggested by Juergen)
> - Hold lock from xenbus side (suggested by Juergen)
>
> Changes from v11
> (https://lore.kernel.org/xen-devel/20191217160748.693-2-sjp...@amazon.com/)
> - Fix wrong trylock use (reported by Juergen)
> - Merge patch 3 and 4 (suggested by Juergen)
> - Update test result
>
> Changes from v10
> (https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
> - Fix race condition (reported by SeongJae, suggested by Juergen)
>
> Changes from v9
> (https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
> - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
> - Update the commit message for overhead test of the 2nd patch
>
> Changes from v8
> (https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
> - Drop 'Reviewed-by: Juergen' from the second patch
>   (suggested by Roger Pau Monné)
> - Update contact of the new module param to SeongJae Park
>   (suggested by Roger Pau Monné)
> - Wordsmith the description of the parameter
>   (suggested by Roger Pau Monné)
> - Fix dumb bugs
>   (suggested by Roger Pau Monné)
> - Move module param definition to xenbus.c and reduce the number of
>   lines for this change
>   (suggested by Roger Pau Monné)
> - Add a comment for the new callback, reclaim_memory, as other
>   callbacks also have
> - Add another trivial cleanup of xenbus.c file (4th patch)
>
> Changes from v7
> (https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
> - Update sysfs-driver-xen-blkback for new parameter
>   (suggested by Roger Pau Monné)
> - Use per-xen_blkif buffer_squeeze_end instead of global variable
>   (suggested by Roger Pau Monné)
>
> Changes from v6
> (https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
> - Remove more unnecessary prefixes (suggested by Roger Pau Monné)
> - Constify a variable (suggested by Roger Pau Monné)
> - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
> - More wordsmith of the commit message (suggested by Roger Pau Monné)
>
> Changes from v5
> (https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
> - Wordsmith the commit messages (suggested by Roger Pau Monné)
> - Change the reclaim callback return type (suggested by Roger Pau
>   Monné)
> - Change the type of the blkback squeeze duration variable
>   (suggested by Roger Pau Monné)
> - Add a patch for removal of unnecessary static variable name prefixes
>   (suggested by Roger Pau Monné)
> - Fix checkpatch.pl warnings
>
> Changes from v4
> (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
> - Remove domain id parameter from the callback (suggested by Juergen
>   Gross)
> - Rename xen-blkback module parameter (suggested by Stefan Nuernburger)
>
> Changes from v3
> (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
> - Add general callback in xen_driver and use it (suggested by Juergen
>   Gross)
>
> Changes from v2
> (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3
Re: [Xen-devel] [PATCH v13 0/5] xenbus/backend: Add memory pressure handler callback
On Mon, 13 Jan 2020 10:55:07 +0100 "Roger Pau Monné" wrote:

> On Mon, Jan 13, 2020 at 10:49:52AM +0100, SeongJae Park wrote:
> > Every patch of this patchset got at least one 'Reviewed-by' or
> > 'Acked-by' from appropriate maintainers by last Wednesday, and after
> > that, got no comment yet. May I ask some more comments?
>
> I'm not sure why more comments are needed, patches have all the
> relevant Acks and will be pushed in due time unless someone has
> objections.
>
> Please be patient and wait at least until the next merge window, these
> patches are not bug fixes so pushing them now would be wrong.

Ok, I will. Thank you for your quick and nice reply.

Thanks,
SeongJae Park

> Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants
From: SeongJae Park

The persistent grants feature provides high scalability. On some small
systems, however, it could incur data copy overheads[1] and thus needs
to be disabled. But there is no option to disable it. For this reason,
this commit adds a module parameter for disabling the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Signed-off-by: Anthony Liguori
Signed-off-by: SeongJae Park
---
 .../ABI/testing/sysfs-driver-xen-blkback |  9 ++
 drivers/block/xen-blkback/xenbus.c       | 28 ++-
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index ecb7942ff146..ac2947b98950 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -35,3 +35,12 @@ Description:
                 controls the duration in milliseconds that blkback will not
                 cache any page not backed by a grant mapping.
                 The default is 10ms.
+
+What:           /sys/module/xen_blkback/parameters/feature_persistent
+Date:           September 2020
+KernelVersion:  5.10
+Contact:        SeongJae Park
+Description:
+                Whether to enable the persistent grants feature or not.  Note
+                that this option only takes effect on newly created backends.
+                The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index b9aa5d1ac10b..8a95ddd08b13 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device *dev)
 
 /* ** Connection ** */
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent,
+		 "Enables the persistent grants feature");
+
 /*
  * Write the physical details regarding the block device to the store, and
  * switch to Connected state.
@@ -906,11 +912,15 @@ static void connect(struct backend_info *be)
 
 	xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
 
-	err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1);
-	if (err) {
-		xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
-				 dev->nodename);
-		goto abort;
+	if (feature_persistent) {
+		err = xenbus_printf(xbt, dev->nodename, "feature-persistent",
+				    "%u", feature_persistent);
+		if (err) {
+			xenbus_dev_fatal(dev, err,
+					 "writing %s/feature-persistent",
+					 dev->nodename);
+			goto abort;
+		}
 	}
 
 	err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu",
@@ -1093,8 +1103,12 @@ static int connect_ring(struct backend_info *be)
 		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
 		return -ENOSYS;
 	}
-	pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
-					   0);
+	if (feature_persistent)
+		pers_grants = xenbus_read_unsigned(dev->otherend,
+						   "feature-persistent", 0);
+	else
+		pers_grants = 0;
+
 	blkif->vbd.feature_gnt_persistent = pers_grants;
 	blkif->vbd.overflow_max_grants = 0;
-- 
2.17.1
[PATCH v2 3/3] xen-blkfront: Apply changed parameter name to the document
From: SeongJae Park

Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor
parameter") changed the name of the module parameter for the maximum
amount of segments in indirect requests but missed updating the
document. This commit updates the document.

Signed-off-by: SeongJae Park
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 9c31334cb2e6..28008905615f 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -1,4 +1,4 @@
-What:           /sys/module/xen_blkfront/parameters/max
+What:           /sys/module/xen_blkfront/parameters/max_indirect_segments
 Date:           June 2013
 KernelVersion:  3.11
 Contact:        Konrad Rzeszutek Wilk
-- 
2.17.1
[PATCH v2 0/3] xen-blk(back|front): Let users disable persistent grants
From: SeongJae Park

The persistent grants feature provides high scalability. On some small
systems, however, it could incur data copy overheads[1] and thus needs
to be disabled. But there is no option to disable it. For this reason,
this patchset adds module parameters for disabling the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Baseline and Complete Git Trees
===============================

The patches are based on v5.9-rc6. You can also clone the complete git
tree:

    $ git clone git://github.com/sjp38/linux -b pgrants_disable_v2

The web is also available:
https://github.com/sjp38/linux/tree/pgrants_disable_v2

Patch History
=============

Changes from v1
(https://lore.kernel.org/linux-block/20200922070125.27251-1-sjp...@amazon.com/)
- Use 'bool' parameter type (Jürgen Groß)
- Let blkfront also disable the feature from its side (Roger Pau Monné)
- Avoid unnecessary xenbus_printf (Roger Pau Monné)
- Update frontend parameter doc

SeongJae Park (3):
  xen-blkback: add a parameter for disabling of persistent grants
  xen-blkfront: add a parameter for disabling of persistent grants
  xen-blkfront: Apply changed parameter name to the document

 .../ABI/testing/sysfs-driver-xen-blkback  |  9 ++
 .../ABI/testing/sysfs-driver-xen-blkfront | 11 +++-
 drivers/block/xen-blkback/xenbus.c        | 28 ++-
 drivers/block/xen-blkfront.c              | 28 +--
 4 files changed, 60 insertions(+), 16 deletions(-)

-- 
2.17.1
Re: [PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants
On Tue, 22 Sep 2020 13:12:59 +0200 "Roger Pau Monné" wrote: > On Tue, Sep 22, 2020 at 12:52:07PM +0200, SeongJae Park wrote: > > From: SeongJae Park > > > > Persistent grants feature provides high scalability. On some small > > systems, however, it could incur data copy overheads[1] and thus it is > > required to be disabled. But, there is no option to disable it. For > > the reason, this commit adds a module parameter for disabling of the > > feature. > > > > [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability > > > > Signed-off-by: Anthony Liguori > > Signed-off-by: SeongJae Park > > --- > > .../ABI/testing/sysfs-driver-xen-blkback | 9 ++ > > drivers/block/xen-blkback/xenbus.c| 28 ++- > > 2 files changed, 30 insertions(+), 7 deletions(-) > > > > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback > > b/Documentation/ABI/testing/sysfs-driver-xen-blkback > > index ecb7942ff146..ac2947b98950 100644 > > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback > > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback > > @@ -35,3 +35,12 @@ Description: > > controls the duration in milliseconds that blkback will not > > cache any page not backed by a grant mapping. > > The default is 10ms. > > + > > +What: /sys/module/xen_blkback/parameters/feature_persistent > > +Date: September 2020 > > +KernelVersion: 5.10 > > +Contact:SeongJae Park > > +Description: > > +Whether to enable the persistent grants feature or not. > > Note > > +that this option only takes effect on newly created > > backends. > > +The default is Y (enable). > > diff --git a/drivers/block/xen-blkback/xenbus.c > > b/drivers/block/xen-blkback/xenbus.c > > index b9aa5d1ac10b..8a95ddd08b13 100644 > > --- a/drivers/block/xen-blkback/xenbus.c > > +++ b/drivers/block/xen-blkback/xenbus.c > > @@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device *dev) > > > > /* ** Connection ** */ > > > > +/* Enable the persistent grants feature. 
*/ > > +static bool feature_persistent = true; > > +module_param(feature_persistent, bool, 0644); > > +MODULE_PARM_DESC(feature_persistent, > > + "Enables the persistent grants feature"); > > + > > /* > > * Write the physical details regarding the block device to the store, and > > * switch to Connected state. > > @@ -906,11 +912,15 @@ static void connect(struct backend_info *be) > > > > xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support); > > > > - err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1); > > - if (err) { > > - xenbus_dev_fatal(dev, err, "writing %s/feature-persistent", > > -dev->nodename); > > - goto abort; > > + if (feature_persistent) { > > + err = xenbus_printf(xbt, dev->nodename, "feature-persistent", > > + "%u", feature_persistent); > > + if (err) { > > + xenbus_dev_fatal(dev, err, > > + "writing %s/feature-persistent", > > + dev->nodename); > > + goto abort; > > + } > > } > > > > err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu", > > @@ -1093,8 +1103,12 @@ static int connect_ring(struct backend_info *be) > > xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol); > > return -ENOSYS; > > } > > - pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent", > > - 0); > > + if (feature_persistent) > > + pers_grants = xenbus_read_unsigned(dev->otherend, > > + "feature-persistent", 0); > > + else > > + pers_grants = 0; > > + > > Sorry for not realizing earlier, but looking at it again I think you > need to cache the value of feature_persistent when it's first used in > the blkback state data, so that it's consistent. > > What would happen for example with the following flow (assume a > persistent grants enabled frontend): > > feature_persistent = false > > connect(...) > feature-persistent is not written to xenstore > > User changes feature_persistent = true > > connect_ring(...) 
> pers_grants = true, because feature-persistent is set unconditionally
> by the frontend and the feature_persistent variable is now true.
>
> Then blkback will try to use persistent grants and the whole
> connection will malfunction because the frontend won't.

Ah, you're right. I should have caught this before but didn't; sorry.

> The other option is to prevent changing the variable when there are
> blkback instances already running.

I think storing the option value in xenstore would be simpler. That
said, if you prefer this way, please let me know.

Thanks,
SeongJae Park
Re: [PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants
On Tue, 22 Sep 2020 13:35:11 +0200 "Roger Pau Monné" wrote: > On Tue, Sep 22, 2020 at 01:26:38PM +0200, SeongJae Park wrote: > > On Tue, 22 Sep 2020 13:12:59 +0200 "Roger Pau Monné" > > wrote: > > > > > On Tue, Sep 22, 2020 at 12:52:07PM +0200, SeongJae Park wrote: > > > > From: SeongJae Park > > > > > > > > Persistent grants feature provides high scalability. On some small > > > > systems, however, it could incur data copy overheads[1] and thus it is > > > > required to be disabled. But, there is no option to disable it. For > > > > the reason, this commit adds a module parameter for disabling of the > > > > feature. > > > > > > > > [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability > > > > > > > > Signed-off-by: Anthony Liguori > > > > Signed-off-by: SeongJae Park > > > > --- > > > > .../ABI/testing/sysfs-driver-xen-blkback | 9 ++ > > > > drivers/block/xen-blkback/xenbus.c| 28 ++- > > > > 2 files changed, 30 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback > > > > b/Documentation/ABI/testing/sysfs-driver-xen-blkback > > > > index ecb7942ff146..ac2947b98950 100644 > > > > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback > > > > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback > > > > @@ -35,3 +35,12 @@ Description: > > > > controls the duration in milliseconds that blkback > > > > will not > > > > cache any page not backed by a grant mapping. > > > > The default is 10ms. > > > > + > > > > +What: /sys/module/xen_blkback/parameters/feature_persistent > > > > +Date: September 2020 > > > > +KernelVersion: 5.10 > > > > +Contact:SeongJae Park > > > > +Description: > > > > +Whether to enable the persistent grants feature or > > > > not. Note > > > > +that this option only takes effect on newly created > > > > backends. > > > > +The default is Y (enable). 
> > > > diff --git a/drivers/block/xen-blkback/xenbus.c > > > > b/drivers/block/xen-blkback/xenbus.c > > > > index b9aa5d1ac10b..8a95ddd08b13 100644 > > > > --- a/drivers/block/xen-blkback/xenbus.c > > > > +++ b/drivers/block/xen-blkback/xenbus.c > > > > @@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device > > > > *dev) > > > > > > > > /* ** Connection ** */ > > > > > > > > +/* Enable the persistent grants feature. */ > > > > +static bool feature_persistent = true; > > > > +module_param(feature_persistent, bool, 0644); > > > > +MODULE_PARM_DESC(feature_persistent, > > > > + "Enables the persistent grants feature"); > > > > + > > > > /* > > > > * Write the physical details regarding the block device to the store, > > > > and > > > > * switch to Connected state. > > > > @@ -906,11 +912,15 @@ static void connect(struct backend_info *be) > > > > > > > > xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support); > > > > > > > > - err = xenbus_printf(xbt, dev->nodename, "feature-persistent", > > > > "%u", 1); > > > > - if (err) { > > > > - xenbus_dev_fatal(dev, err, "writing > > > > %s/feature-persistent", > > > > -dev->nodename); > > > > - goto abort; > > > > + if (feature_persistent) { > > > > + err = xenbus_printf(xbt, dev->nodename, > > > > "feature-persistent", > > > > + "%u", feature_persistent); > > > > + if (err) { > > > > + xenbus_dev_fatal(dev, err, > > > > + "writing %s/feature-persistent", > > > > + dev->nodename); > > > > + goto abort; > > > > + } > > > >
Re: [PATCH v2 1/3] xen-blkback: add a parameter for disabling of persistent grants
On Tue, 22 Sep 2020 13:35:30 +0200 "Jürgen Groß" wrote: > On 22.09.20 13:26, SeongJae Park wrote: > > On Tue, 22 Sep 2020 13:12:59 +0200 "Roger Pau Monné" > > wrote: > > > >> On Tue, Sep 22, 2020 at 12:52:07PM +0200, SeongJae Park wrote: > >>> From: SeongJae Park > >>> > >>> Persistent grants feature provides high scalability. On some small > >>> systems, however, it could incur data copy overheads[1] and thus it is > >>> required to be disabled. But, there is no option to disable it. For > >>> the reason, this commit adds a module parameter for disabling of the > >>> feature. > >>> > >>> [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability > >>> > >>> Signed-off-by: Anthony Liguori > >>> Signed-off-by: SeongJae Park > >>> --- > >>> .../ABI/testing/sysfs-driver-xen-blkback | 9 ++ > >>> drivers/block/xen-blkback/xenbus.c| 28 ++- > >>> 2 files changed, 30 insertions(+), 7 deletions(-) > >>> > >>> diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback > >>> b/Documentation/ABI/testing/sysfs-driver-xen-blkback > >>> index ecb7942ff146..ac2947b98950 100644 > >>> --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback > >>> +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback > >>> @@ -35,3 +35,12 @@ Description: > >>> controls the duration in milliseconds that blkback will > >>> not > >>> cache any page not backed by a grant mapping. > >>> The default is 10ms. > >>> + > >>> +What: /sys/module/xen_blkback/parameters/feature_persistent > >>> +Date: September 2020 > >>> +KernelVersion: 5.10 > >>> +Contact:SeongJae Park > >>> +Description: > >>> +Whether to enable the persistent grants feature or not. > >>> Note > >>> +that this option only takes effect on newly created > >>> backends. > >>> +The default is Y (enable). 
> >>> diff --git a/drivers/block/xen-blkback/xenbus.c > >>> b/drivers/block/xen-blkback/xenbus.c > >>> index b9aa5d1ac10b..8a95ddd08b13 100644 > >>> --- a/drivers/block/xen-blkback/xenbus.c > >>> +++ b/drivers/block/xen-blkback/xenbus.c > >>> @@ -879,6 +879,12 @@ static void reclaim_memory(struct xenbus_device *dev) > >>> > >>> /* ** Connection ** */ > >>> > >>> +/* Enable the persistent grants feature. */ > >>> +static bool feature_persistent = true; > >>> +module_param(feature_persistent, bool, 0644); > >>> +MODULE_PARM_DESC(feature_persistent, > >>> + "Enables the persistent grants feature"); > >>> + > >>> /* > >>>* Write the physical details regarding the block device to the store, > >>> and > >>>* switch to Connected state. > >>> @@ -906,11 +912,15 @@ static void connect(struct backend_info *be) > >>> > >>> xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support); > >>> > >>> - err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1); > >>> - if (err) { > >>> - xenbus_dev_fatal(dev, err, "writing %s/feature-persistent", > >>> - dev->nodename); > >>> - goto abort; > >>> + if (feature_persistent) { > >>> + err = xenbus_printf(xbt, dev->nodename, "feature-persistent", > >>> + "%u", feature_persistent); > >>> + if (err) { > >>> + xenbus_dev_fatal(dev, err, > >>> + "writing %s/feature-persistent", > >>> + dev->nodename); > >>> + goto abort; > >>> + } > >>> } > >>> > >>> err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu", > >>> @@ -1093,8 +1103,12 @@ static int connect_ring(struct backend_info *be) > >>> xenbus_dev_fatal(dev, err, "unknown fe protocol %s", > >>> protoco
Re: [PATCH v2 2/3] xen-blkfront: add a parameter for disabling of persistent grants
On Tue, 22 Sep 2020 14:11:32 +0200 "Jürgen Groß" wrote: > On 22.09.20 12:52, SeongJae Park wrote: > > From: SeongJae Park > > > > Persistent grants feature provides high scalability. On some small > > systems, however, it could incur data copy overheads[1] and thus it is > > required to be disabled. It can be disabled from blkback side using a > > module parameter, 'feature_persistent'. But, it is impossible from > > blkfront side. For the reason, this commit adds a blkfront module > > parameter for disabling of the feature. > > > > [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability > > > > Signed-off-by: SeongJae Park > > --- > > .../ABI/testing/sysfs-driver-xen-blkfront | 9 ++ > > drivers/block/xen-blkfront.c | 28 +-- > > 2 files changed, 29 insertions(+), 8 deletions(-) > > [...] > > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c > > index 91de2e0755ae..49c324f377de 100644 > > --- a/drivers/block/xen-blkfront.c > > +++ b/drivers/block/xen-blkfront.c > > @@ -149,6 +149,13 @@ static unsigned int xen_blkif_max_ring_order; > > module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, > > 0444); > > MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used > > for the shared ring"); > > > > +/* Enable the persistent grants feature. 
*/ > > +static bool feature_persistent = true; > > +module_param(feature_persistent, bool, 0644); > > +MODULE_PARM_DESC(feature_persistent, > > + "Enables the persistent grants feature"); > > + > > + > > #define BLK_RING_SIZE(info) \ > > __CONST_RING_SIZE(blkif, XEN_PAGE_SIZE * (info)->nr_ring_pages) > > > > @@ -1866,11 +1873,13 @@ static int talk_to_blkback(struct xenbus_device > > *dev, > > message = "writing protocol"; > > goto abort_transaction; > > } > > - err = xenbus_printf(xbt, dev->nodename, > > - "feature-persistent", "%u", 1); > > - if (err) > > - dev_warn(&dev->dev, > > -"writing persistent grants feature to xenbus"); > > + if (feature_persistent) { > > + err = xenbus_printf(xbt, dev->nodename, > > + "feature-persistent", "%u", 1); > > + if (err) > > + dev_warn(&dev->dev, > > +"writing persistent grants feature to xenbus"); > > + } > > > > err = xenbus_transaction_end(xbt, 0); > > if (err) { > > @@ -2316,9 +2325,12 @@ static void blkfront_gather_backend_features(struct > > blkfront_info *info) > > if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0)) > > blkfront_setup_discard(info); > > > > - info->feature_persistent = > > - !!xenbus_read_unsigned(info->xbdev->otherend, > > - "feature-persistent", 0); > > + if (feature_persistent) > > + info->feature_persistent = > > + !!xenbus_read_unsigned(info->xbdev->otherend, > > + "feature-persistent", 0); > > + else > > + info->feature_persistent = 0; > > > > indirect_segments = xenbus_read_unsigned(info->xbdev->otherend, > > "feature-max-indirect-segments", 0); > > > > Here you have the same problem as in blkback: feature_persistent could > change its value between the two tests. Yes, indeed. I will fix this in the next version. Thanks, SeongJae Park
[PATCH v3 0/3] xen-blk(back|front): Let users disable persistent grants
From: SeongJae Park

The persistent grants feature provides high scalability. On some small
systems, however, it could incur data copy overheads[1] and thus needs
to be disabled. But there is no option to disable it. For this reason,
this patchset adds module parameters for disabling the feature.

[1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability

Baseline and Complete Git Trees
===============================

The patches are based on v5.9-rc6. You can also clone the complete git
tree:

    $ git clone git://github.com/sjp38/linux -b pgrants_disable_v3

The web is also available:
https://github.com/sjp38/linux/tree/pgrants_disable_v3

Patch History
=============

Changes from v2
(https://lore.kernel.org/linux-block/20200922105209.5284-1-sjp...@amazon.com/)
- Avoid race conditions (Roger Pau Monné)

Changes from v1
(https://lore.kernel.org/linux-block/20200922070125.27251-1-sjp...@amazon.com/)
- Use 'bool' parameter type (Jürgen Groß)
- Let blkfront also disable the feature from its side (Roger Pau Monné)
- Avoid unnecessary xenbus_printf (Roger Pau Monné)
- Update frontend parameter doc

SeongJae Park (3):
  xen-blkback: add a parameter for disabling of persistent grants
  xen-blkfront: add a parameter for disabling of persistent grants
  xen-blkfront: Apply changed parameter name to the document

 .../ABI/testing/sysfs-driver-xen-blkback  |  9 
 .../ABI/testing/sysfs-driver-xen-blkfront | 11 +-
 drivers/block/xen-blkback/xenbus.c        | 22 ++-
 drivers/block/xen-blkfront.c              | 20 -
 4 files changed, 50 insertions(+), 12 deletions(-)

-- 
2.17.1
Re: [PATCH v3 3/3] xen-blkfront: Apply changed parameter name to the document
On Tue, 22 Sep 2020 16:44:25 +0200 "Roger Pau Monné" wrote:

> On Tue, Sep 22, 2020 at 04:27:39PM +0200, Jürgen Groß wrote:
> > On 22.09.20 16:15, SeongJae Park wrote:
> > > From: SeongJae Park
> > >
> > > Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor
> > > parameter") changed the name of the module parameter for the
> > > maximum amount of segments in indirect requests but missed
> > > updating the document. This commit updates the document.
> > >
> > > Signed-off-by: SeongJae Park
> >
> > Reviewed-by: Juergen Gross
>
> Does this need to be backported to stable branches?

I don't think so, as this is not a bug affecting users.

Thanks,
SeongJae Park
[PATCH v4 1/3] xen-blkback: add a parameter for disabling of persistent grants
From: SeongJae Park Persistent grants feature provides high scalability. On some small systems, however, it could incur data copy overheads[1] and thus it is required to be disabled. But, there is no option to disable it. For the reason, this commit adds a module parameter for disabling of the feature. [1] https://wiki.xen.org/wiki/Xen_4.3_Block_Protocol_Scalability Signed-off-by: Anthony Liguori Signed-off-by: SeongJae Park Reviewed-by: Juergen Gross Acked-by: Roger Pau Monné --- .../ABI/testing/sysfs-driver-xen-blkback | 9 drivers/block/xen-blkback/xenbus.c| 22 ++- 2 files changed, 25 insertions(+), 6 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback index ecb7942ff146..ac2947b98950 100644 --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback @@ -35,3 +35,12 @@ Description: controls the duration in milliseconds that blkback will not cache any page not backed by a grant mapping. The default is 10ms. + +What: /sys/module/xen_blkback/parameters/feature_persistent +Date: September 2020 +KernelVersion: 5.10 +Contact:SeongJae Park +Description: +Whether to enable the persistent grants feature or not. Note +that this option only takes effect on newly created backends. +The default is Y (enable). diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index b9aa5d1ac10b..8fc34211dc8b 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -474,6 +474,12 @@ static void xen_vbd_free(struct xen_vbd *vbd) vbd->bdev = NULL; } +/* Enable the persistent grants feature. 
*/ +static bool feature_persistent = true; +module_param(feature_persistent, bool, 0644); +MODULE_PARM_DESC(feature_persistent, + "Enables the persistent grants feature"); + static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle, unsigned major, unsigned minor, int readonly, int cdrom) @@ -519,6 +525,8 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle, if (q && blk_queue_secure_erase(q)) vbd->discard_secure = true; + vbd->feature_gnt_persistent = feature_persistent; + pr_debug("Successful creation of handle=%04x (dom=%u)\n", handle, blkif->domid); return 0; @@ -906,7 +914,8 @@ static void connect(struct backend_info *be) xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support); - err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", 1); + err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u", + be->blkif->vbd.feature_gnt_persistent); if (err) { xenbus_dev_fatal(dev, err, "writing %s/feature-persistent", dev->nodename); @@ -1067,7 +1076,6 @@ static int connect_ring(struct backend_info *be) { struct xenbus_device *dev = be->dev; struct xen_blkif *blkif = be->blkif; - unsigned int pers_grants; char protocol[64] = ""; int err, i; char *xspath; @@ -1093,9 +1101,11 @@ static int connect_ring(struct backend_info *be) xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol); return -ENOSYS; } - pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent", - 0); - blkif->vbd.feature_gnt_persistent = pers_grants; + if (blkif->vbd.feature_gnt_persistent) + blkif->vbd.feature_gnt_persistent = + xenbus_read_unsigned(dev->otherend, + "feature-persistent", 0); + blkif->vbd.overflow_max_grants = 0; /* @@ -1118,7 +1128,7 @@ static int connect_ring(struct backend_info *be) pr_info("%s: using %d queues, protocol %d (%s) %s\n", dev->nodename, blkif->nr_rings, blkif->blk_protocol, protocol, -pers_grants ? "persistent grants" : ""); +blkif->vbd.feature_gnt_persistent ? 
"persistent grants" : ""); ring_page_order = xenbus_read_unsigned(dev->otherend, "ring-page-order", 0); -- 2.17.1
[PATCH v4 3/3] xen-blkfront: Apply changed parameter name to the document
From: SeongJae Park Commit 14e710fe7897 ("xen-blkfront: rename indirect descriptor parameter") changed the name of the module parameter for the maximum amount of segments in indirect requests but missed updating the document. This commit updates the document. Signed-off-by: SeongJae Park Reviewed-by: Juergen Gross Acked-by: Roger Pau Monné --- Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront index 9c31334cb2e6..28008905615f 100644 --- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront @@ -1,4 +1,4 @@ -What: /sys/module/xen_blkfront/parameters/max +What: /sys/module/xen_blkfront/parameters/max_indirect_segments Date: June 2013 KernelVersion: 3.11 Contact:Konrad Rzeszutek Wilk -- 2.17.1
Re: [PATCH] xen-blkback: add a parameter for disabling of persistent grants
On Wed, 23 Sep 2020 16:09:30 -0400 Konrad Rzeszutek Wilk wrote:

> On Tue, Sep 22, 2020 at 09:01:25AM +0200, SeongJae Park wrote:
> > From: SeongJae Park
> >
> > Persistent grants feature provides high scalability. On some small
> > systems, however, it could incur data copy overhead[1] and thus it is
> > required to be disabled. But, there is no option to disable it. For
> > the reason, this commit adds a module parameter for disabling of the
> > feature.
>
> Would it be better suited to have it per guest?

The latest version of this patchset[1] supports blkfront-side
disablement. Could that partially solve your concern?

[1] https://lore.kernel.org/xen-devel/20200923061841.20531-1-sjp...@amazon.com/

Thanks,
SeongJae Park
Re: [PATCH 1/2] xen/blkback: turn the cache purge LRU interval into a parameter
On Thu, 15 Oct 2020 16:24:15 +0200 Roger Pau Monne wrote: > Assume that reads and writes to the variable will be atomic. The worse > that could happen is that one of the LRU intervals is not calculated > properly if a partially written value is read, but that would only be > a transient issue. > > Signed-off-by: Roger Pau Monné > --- > Cc: Konrad Rzeszutek Wilk > Cc: Jens Axboe > Cc: Boris Ostrovsky > Cc: SeongJae Park > Cc: xen-devel@lists.xenproject.org > Cc: linux-bl...@vger.kernel.org > Cc: J. Roeleveld > Cc: Jürgen Groß > --- > Documentation/ABI/testing/sysfs-driver-xen-blkback | 10 ++ > drivers/block/xen-blkback/blkback.c| 9 ++--- > 2 files changed, 16 insertions(+), 3 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback > b/Documentation/ABI/testing/sysfs-driver-xen-blkback > index ecb7942ff146..776f25d335ca 100644 > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback > @@ -35,3 +35,13 @@ Description: > controls the duration in milliseconds that blkback will not > cache any page not backed by a grant mapping. > The default is 10ms. > + > +What: /sys/module/xen_blkback/parameters/lru_internval > +Date: October 2020 > +KernelVersion: 5.10 > +Contact:Roger Pau Monné > +Description: > +The LRU mechanism to clean the lists of persistent grants > needs > +to be executed periodically. This parameter controls the time > +interval between consecutive executions of the purge > mechanism > +is set in ms. I think noticing the default value (100ms) here would be better. Thanks, SeongJae Park
Re: [PATCH linux-next] drivers/xen/xenbus/xenbus_client.c: fix bugon.cocci warnings
From: SeongJae Park

On Tue, 24 Aug 2021 23:24:51 -0700 CGEL wrote:

> From: Jing Yangyang
>
> Use BUG_ON instead of an if condition followed by BUG.
>
> Generated by: scripts/coccinelle/misc/bugon.cocci
>
> Reported-by: Zeal Robot
> Signed-off-by: Jing Yangyang

Reviewed-by: SeongJae Park

Thanks,
SJ

[...]
[PATCH] xen-blk{back,front}: Update contact points for buffer_squeeze_duration_ms and feature_persistent
SeongJae is currently listed as a contact point for some blk{back,front}
features, but he will not be working on Xen for a while. This commit
therefore updates the contact point to his colleague, Maximilian, who
understands the context and is actively working with the features now.

Signed-off-by: SeongJae Park
Signed-off-by: Maximilian Heyne
---
 Documentation/ABI/testing/sysfs-driver-xen-blkback  | 4 ++--
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index a74dfe52dd76..7faf719af165 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -29,7 +29,7 @@ Description:
 What: /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
 Date: December 2019
 KernelVersion: 5.6
-Contact: SeongJae Park
+Contact: Maximilian Heyne
 Description:
 When memory pressure is reported to blkback this option
 controls the duration in milliseconds that blkback will not
@@ -39,7 +39,7 @@ Description:
 What: /sys/module/xen_blkback/parameters/feature_persistent
 Date: September 2020
 KernelVersion: 5.10
-Contact: SeongJae Park
+Contact: Maximilian Heyne
 Description:
 Whether to enable the persistent grants feature or not. Note
 that this option only takes effect on newly created backends.
diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 61fd173fabfe..7f646c58832e 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -12,7 +12,7 @@ Description:
 What: /sys/module/xen_blkfront/parameters/feature_persistent
 Date: September 2020
 KernelVersion: 5.10
-Contact: SeongJae Park
+Contact: Maximilian Heyne
 Description:
 Whether to enable the persistent grants feature or not.
Note that this option only takes effect on newly created frontends. -- 2.17.1
Re: [PATCH] xen, blkback: fix persistent grants negotiation
From: SeongJae Park On Thu, 6 Jan 2022 09:10:13 + Maximilian Heyne wrote: > Given dom0 supports persistent grants but the guest does not. > Then, when attaching a block device during runtime of the guest, dom0 > will enable persistent grants for this newly attached block device: > > $ xenstore-ls -f | grep 20674 | grep persistent > /local/domain/0/backend/vbd/20674/768/feature-persistent = "0" > /local/domain/0/backend/vbd/20674/51792/feature-persistent = "1" > > Here disk 768 was attached during guest creation while 51792 was > attached at runtime. If the guest would have advertised the persistent > grant feature, there would be a xenstore entry like: > > /local/domain/20674/device/vbd/51792/feature-persistent = "1" > > Persistent grants are also used when the guest tries to access the disk > which can be seen when enabling log stats: > > $ echo 1 > /sys/module/xen_blkback/parameters/log_stats > $ dmesg > xen-blkback: (20674.xvdf-0): oo 0 | rd0 | wr0 | f0 | > ds0 | pg:1/1056 > > The "pg: 1/1056" shows that one persistent grant is used. > > Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling > of persistent grants") vbd->feature_gnt_persistent was set in > connect_ring. After the commit it was intended to be initialized in > xen_vbd_create and then set according to the guest feature availability > in connect_ring. However, with a running guest, connect_ring might be > called before xen_vbd_create and vbd->feature_gnt_persistent will be > incorrectly initialized. xen_vbd_create will overwrite it with the value > of feature_persistent regardless whether the guest actually supports > persistent grants. > > With this commit, vbd->feature_gnt_persistent is set only in > connect_ring and this is the only use of the module parameter > feature_persistent. This avoids races when the module parameter changes > during the block attachment process. > > Note that vbd->feature_gnt_persistent doesn't need to be initialized in > xen_vbd_create. 
It's next use is in connect which can only be called > once connect_ring has initialized the rings. xen_update_blkif_status is > checking for this. > > Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of > persistent grants") > Signed-off-by: Maximilian Heyne Thank you for this patch! Reviewed-by: SeongJae Park Also, I guess this tag is needed? Cc: # 5.10.x Thanks, SJ > --- > drivers/block/xen-blkback/xenbus.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/drivers/block/xen-blkback/xenbus.c > b/drivers/block/xen-blkback/xenbus.c > index 914587aabca0c..51b6ec0380ca4 100644 > --- a/drivers/block/xen-blkback/xenbus.c > +++ b/drivers/block/xen-blkback/xenbus.c > @@ -522,8 +522,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, > blkif_vdev_t handle, > if (q && blk_queue_secure_erase(q)) > vbd->discard_secure = true; > > - vbd->feature_gnt_persistent = feature_persistent; > - > pr_debug("Successful creation of handle=%04x (dom=%u)\n", > handle, blkif->domid); > return 0; > @@ -1090,10 +1088,9 @@ static int connect_ring(struct backend_info *be) > xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol); > return -ENOSYS; > } > - if (blkif->vbd.feature_gnt_persistent) > - blkif->vbd.feature_gnt_persistent = > - xenbus_read_unsigned(dev->otherend, > - "feature-persistent", 0); > + > + blkif->vbd.feature_gnt_persistent = feature_persistent && > + xenbus_read_unsigned(dev->otherend, "feature-persistent", 0); > > blkif->vbd.overflow_max_grants = 0; > > -- > 2.32.0 > > > > > Amazon Development Center Germany GmbH > Krausenstr. 38 > 10117 Berlin > Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss > Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B > Sitz: Berlin > Ust-ID: DE 289 237 879 >
Re: [PATCH] xen, blkback: fix persistent grants negotiation
On Tue, 11 Jan 2022 13:26:50 +0100 "Roger Pau Monné" wrote: > On Tue, Jan 11, 2022 at 11:50:32AM +, Durrant, Paul wrote: > > On 11/01/2022 11:11, Roger Pau Monné wrote: > > > On Thu, Jan 06, 2022 at 09:10:13AM +, Maximilian Heyne wrote: > > > > Given dom0 supports persistent grants but the guest does not. > > > > Then, when attaching a block device during runtime of the guest, dom0 > > > > will enable persistent grants for this newly attached block device: > > > > > > > >$ xenstore-ls -f | grep 20674 | grep persistent > > > >/local/domain/0/backend/vbd/20674/768/feature-persistent = "0" > > > >/local/domain/0/backend/vbd/20674/51792/feature-persistent = "1" > > > > > > The mechanism that we use to advertise persistent grants support is > > > wrong. 'feature-persistent' should always be set if the backend > > > supports persistent grant (like it's done for other features in > > > xen_blkbk_probe). The usage of the feature depends on whether both > > > parties support persistent grants, and the xenstore entry printed by > > > blkback shouldn't reflect whether persistent grants are in use, but > > > rather whether blkback supports the feature. > > > > > > > > > > > Here disk 768 was attached during guest creation while 51792 was > > > > attached at runtime. If the guest would have advertised the persistent > > > > grant feature, there would be a xenstore entry like: > > > > > > > >/local/domain/20674/device/vbd/51792/feature-persistent = "1" > > > > > > > > Persistent grants are also used when the guest tries to access the disk > > > > which can be seen when enabling log stats: > > > > > > > >$ echo 1 > /sys/module/xen_blkback/parameters/log_stats > > > >$ dmesg > > > >xen-blkback: (20674.xvdf-0): oo 0 | rd0 | wr0 | f > > > > 0 | ds0 | pg:1/1056 > > > > > > > > The "pg: 1/1056" shows that one persistent grant is used. 
> > > > > > > > Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling > > > > of persistent grants") vbd->feature_gnt_persistent was set in > > > > connect_ring. After the commit it was intended to be initialized in > > > > xen_vbd_create and then set according to the guest feature availability > > > > in connect_ring. However, with a running guest, connect_ring might be > > > > called before xen_vbd_create and vbd->feature_gnt_persistent will be > > > > incorrectly initialized. xen_vbd_create will overwrite it with the value > > > > of feature_persistent regardless whether the guest actually supports > > > > persistent grants. > > > > > > > > With this commit, vbd->feature_gnt_persistent is set only in > > > > connect_ring and this is the only use of the module parameter > > > > feature_persistent. This avoids races when the module parameter changes > > > > during the block attachment process. > > > > > > > > Note that vbd->feature_gnt_persistent doesn't need to be initialized in > > > > xen_vbd_create. It's next use is in connect which can only be called > > > > once connect_ring has initialized the rings. xen_update_blkif_status is > > > > checking for this. 
> > > > > > > > Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of > > > > persistent grants") > > > > Signed-off-by: Maximilian Heyne > > > > --- > > > > drivers/block/xen-blkback/xenbus.c | 9 +++-- > > > > 1 file changed, 3 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/drivers/block/xen-blkback/xenbus.c > > > > b/drivers/block/xen-blkback/xenbus.c > > > > index 914587aabca0c..51b6ec0380ca4 100644 > > > > --- a/drivers/block/xen-blkback/xenbus.c > > > > +++ b/drivers/block/xen-blkback/xenbus.c > > > > @@ -522,8 +522,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, > > > > blkif_vdev_t handle, > > > > if (q && blk_queue_secure_erase(q)) > > > > vbd->discard_secure = true; > > > > - vbd->feature_gnt_persistent = feature_persistent; > > > > - > > > > pr_debug("Successful creation of handle=%04x (dom=%u)\n", > > > > handle, blkif->domid); > > > > return 0; > > > > @@ -1090,10 +1088,9 @@ static int connect_ring(struct backend_info *be) > > > > xenbus_dev_fatal(dev, err, "unknown fe protocol %s", > > > > protocol); > > > > return -ENOSYS; > > > > } > > > > - if (blkif->vbd.feature_gnt_persistent) > > > > - blkif->vbd.feature_gnt_persistent = > > > > - xenbus_read_unsigned(dev->otherend, > > > > - "feature-persistent", 0); > > > > + > > > > + blkif->vbd.feature_gnt_persistent = feature_persistent && > > > > + xenbus_read_unsigned(dev->otherend, > > > > "feature-persistent", 0); > > > > > > I'm not sure it's correct to potentially read feature_persistent > > > multiple times like it's done here. > > > > > > A device can be disconnected and re-attached multiple times, and that > > > implies multiple
[PATCH v2] xen-blkback: fix persistent grants negotiation
The persistent grants feature can be used only when both the backend and
the frontend support it. The feature has always been supported by
'blkback', but commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") introduced a parameter for disabling it
at runtime. To avoid the parameter being updated while 'blkback' is
using it, the commit caches the parameter into
'vbd->feature_gnt_persistent' in 'xen_vbd_create()', then checks whether
the guest also supports the feature and finally updates the field in
'connect_ring()'. However, 'connect_ring()' can be called before
'xen_vbd_create()', so a later execution of 'xen_vbd_create()' can
wrongly overwrite 'vbd->feature_gnt_persistent' with 'true'. As a
result, 'blkback' could try to use the persistent grants feature even if
the guest doesn't support it.

This commit fixes the issue by moving the parameter value caching to
'xen_blkif_alloc()', which allocates the 'blkif'. Because that struct
embeds the 'vbd' object that 'connect_ring()' uses later,
'xen_blkif_alloc()' is guaranteed to run before 'connect_ring()', and is
therefore the right and safe place to do the caching.
Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants") Cc: # 5.10.x Signed-off-by: Maximilian Heyne Signed-off-by: SeongJae Park --- Changes from v1[1] - Avoid the behavioral change[2] - Rebase on latest xen/tip/linux-next - Re-work by SeongJae Park - Cc stable@ [1] https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/ [2] https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/ drivers/block/xen-blkback/xenbus.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 97de13b14175..16c6785d260c 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -157,6 +157,11 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif) return 0; } +/* Enable the persistent grants feature. */ +static bool feature_persistent = true; +module_param(feature_persistent, bool, 0644); +MODULE_PARM_DESC(feature_persistent, "Enables the persistent grants feature"); + static struct xen_blkif *xen_blkif_alloc(domid_t domid) { struct xen_blkif *blkif; @@ -181,6 +186,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid) __module_get(THIS_MODULE); INIT_WORK(&blkif->free_work, xen_blkif_deferred_free); + blkif->vbd.feature_gnt_persistent = feature_persistent; + return blkif; } @@ -472,12 +479,6 @@ static void xen_vbd_free(struct xen_vbd *vbd) vbd->bdev = NULL; } -/* Enable the persistent grants feature. 
*/ -static bool feature_persistent = true; -module_param(feature_persistent, bool, 0644); -MODULE_PARM_DESC(feature_persistent, - "Enables the persistent grants feature"); - static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle, unsigned major, unsigned minor, int readonly, int cdrom) @@ -520,8 +521,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle, if (bdev_max_secure_erase_sectors(bdev)) vbd->discard_secure = true; - vbd->feature_gnt_persistent = feature_persistent; - pr_debug("Successful creation of handle=%04x (dom=%u)\n", handle, blkif->domid); return 0; -- 2.25.1
Re: [PATCH v2] xen-blkback: fix persistent grants negotiation
Hello,

Oleksandr, thank you for Cc-ing Andrii. Andrii, thank you for the
comment!

On Fri, 15 Jul 2022 15:00:10 +0300 Andrii Chepurnyi wrote:

> Hello All,
>
> I faced the mentioned issue recently and just to bring more context here is
> our setup:
> We use pvblock backend for Android guest. It starts using u-boot with
> pvblock support(which frontend doesn't support the persistent grants
> feature), later it loads and starts the Linux kernel(which frontend
> supports the persistent grants feature). So in total, we have sequent two
> different frontends reconnection, the first of which doesn't support
> persistent grants.
> So the original patch [1] perfectly solves the original issue and provides
> the ability to use persistent grants after the reconnection when Linux
> frontend which supports persistent grants comes into play.
> At the same time [2] will disable the persistent grants feature for the
> first and second frontend.

Thank you for this great explanation of your situation.

> Is it possible to keep [1] as is?

Yes. My concerns about Max's original patch were the conflicting
behavior description in the document and the different behavior of the
blkfront-side 'feature_persistent' parameter[1]. I will post Max's patch
again together with patches for the blkfront behavior change and
document updates.

[1] https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/

Thanks,
SJ

> [1] https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/
> [2] https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/
>
> Best regards,
> Andrii
>
> On Fri, Jul 15, 2022 at 1:15 PM Oleksandr wrote:
> >
> > On 15.07.22 01:44, SeongJae Park wrote:
> >
> > Hello all.
> >
> > Adding Andrii Chepurnyi to CC who have played with the use-case which
> > required reconnect recently and faced some issues with
> > feature_persistent handling.

[...]
[PATCH 2/2] xen-blkfront: Apply 'feature_persistent' parameter when connect
The previous commit made xen-blkback's 'feature_persistent' parameter
take effect not only for newly created backends but also for every
reconnected backend. This commit makes xen-blkfront's counterpart
parameter work in the same manner, and updates the document to avoid
confusion due to inconsistent behavior of the same-named parameters.

Cc: # 5.10.x
Signed-off-by: SeongJae Park
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkfront.c                        | 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 7f646c58832e..4d36c5a10546 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -15,5 +15,5 @@ KernelVersion: 5.10
 Contact: Maximilian Heyne
 Description:
 Whether to enable the persistent grants feature or not. Note
-that this option only takes effect on newly created frontends.
+that this option only takes effect on newly connected frontends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 3646c0cae672..4e763701b372 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1988,8 +1988,6 @@ static int blkfront_probe(struct xenbus_device *dev,
 	info->vdevice = vdevice;
 	info->connected = BLKIF_STATE_DISCONNECTED;

-	info->feature_persistent = feature_persistent;
-
 	/* Front end dir is a number, which is used as the id.
*/ info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0); dev_set_drvdata(&dev->dev, info); @@ -2283,7 +2281,7 @@ static void blkfront_gather_backend_features(struct blkfront_info *info) if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0)) blkfront_setup_discard(info); - if (info->feature_persistent) + if (feature_persistent) info->feature_persistent = !!xenbus_read_unsigned(info->xbdev->otherend, "feature-persistent", 0); -- 2.25.1
[PATCH v3 0/2] Fix persistent grants negotiation with a behavior change
The first patch of this patchset fixes 'feature_persistent' parameter
handling in 'blkback' to always respect the frontend's persistent grants
support. The fix makes a behavioral change, so the second patch makes
the 'blkfront' counterpart consistently follow the behavior change.

Changes from v2
(https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/)
- Keep the behavioral change of v1
- Update blkfront's counterpart to follow the changed behavior
- Update documents for the changed behavior

Changes from v1
(https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/)
- Avoid the behavioral change
  (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/)
- Rebase on latest xen/tip/linux-next
- Re-work by SeongJae Park
- Cc stable@

Maximilian Heyne (1):
  xen, blkback: fix persistent grants negotiation

SeongJae Park (1):
  xen-blkfront: Apply 'feature_persistent' parameter when connect

 Documentation/ABI/testing/sysfs-driver-xen-blkback  | 2 +-
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkback/xenbus.c                  | 9 +++------
 drivers/block/xen-blkfront.c                        | 4 +---
 4 files changed, 6 insertions(+), 11 deletions(-)

--
2.25.1
[PATCH 1/2] xen, blkback: fix persistent grants negotiation
From: Maximilian Heyne

Suppose dom0 supports persistent grants but the guest does not. Then,
when attaching a block device during runtime of the guest, dom0 will
enable persistent grants for this newly attached block device:

  $ xenstore-ls -f | grep 20674 | grep persistent
  /local/domain/0/backend/vbd/20674/768/feature-persistent = "0"
  /local/domain/0/backend/vbd/20674/51792/feature-persistent = "1"

Here disk 768 was attached during guest creation while 51792 was
attached at runtime. If the guest had advertised the persistent grant
feature, there would be a xenstore entry like:

  /local/domain/20674/device/vbd/51792/feature-persistent = "1"

Persistent grants are also used when the guest tries to access the disk,
which can be seen when enabling log stats:

  $ echo 1 > /sys/module/xen_blkback/parameters/log_stats
  $ dmesg
  xen-blkback: (20674.xvdf-0): oo 0 | rd 0 | wr 0 | f 0 | ds 0 | pg: 1/1056

The "pg: 1/1056" shows that one persistent grant is used.

Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling
of persistent grants") vbd->feature_gnt_persistent was set in
connect_ring. After the commit it was intended to be initialized in
xen_vbd_create and then set according to the guest feature availability
in connect_ring. However, with a running guest, connect_ring might be
called before xen_vbd_create and vbd->feature_gnt_persistent will be
incorrectly initialized. xen_vbd_create will overwrite it with the value
of feature_persistent regardless of whether the guest actually supports
persistent grants.

With this commit, vbd->feature_gnt_persistent is set only in
connect_ring, and this is the only use of the module parameter
feature_persistent. This avoids races when the module parameter changes
during the block attachment process.

Note that vbd->feature_gnt_persistent doesn't need to be initialized in
xen_vbd_create. Its next use is in connect, which can only be called
once connect_ring has initialized the rings. xen_update_blkif_status is
checking for this.
Please also note that this commit changes the behavior of the parameter.
Before this commit, the parameter took effect only for newly created
backends; after this commit, it takes effect for every new connection.
Therefore, this commit also updates the document.

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc: # 5.10.x
Signed-off-by: Maximilian Heyne
Signed-off-by: SeongJae Park
---
 Documentation/ABI/testing/sysfs-driver-xen-blkback | 2 +-
 drivers/block/xen-blkback/xenbus.c                 | 9 +++------
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 7faf719af165..fac0f429a869 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -42,5 +42,5 @@ KernelVersion: 5.10
 Contact: Maximilian Heyne
 Description:
 Whether to enable the persistent grants feature or not. Note
-that this option only takes effect on newly created backends.
+that this option only takes effect on newly connected backends.
 The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 97de13b14175..874b846fb622 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -520,8 +520,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle, if (bdev_max_secure_erase_sectors(bdev)) vbd->discard_secure = true; - vbd->feature_gnt_persistent = feature_persistent; - pr_debug("Successful creation of handle=%04x (dom=%u)\n", handle, blkif->domid); return 0; @@ -1087,10 +1085,9 @@ static int connect_ring(struct backend_info *be) xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol); return -ENOSYS; } - if (blkif->vbd.feature_gnt_persistent) - blkif->vbd.feature_gnt_persistent = - xenbus_read_unsigned(dev->otherend, - "feature-persistent", 0); + + blkif->vbd.feature_gnt_persistent = feature_persistent && + xenbus_read_unsigned(dev->otherend, "feature-persistent", 0); blkif->vbd.overflow_max_grants = 0; -- 2.25.1
Re: [PATCH v3 0/2] Fix persistent grants negotiation with a behavior change
Hi all,

On Fri, 15 Jul 2022 17:55:19 + SeongJae Park wrote:

> The first patch of this patchset fixes 'feature_persistent' parameter
> handling in 'blkback' to respect the frontend's persistent grants
> support always. The fix makes a behavioral change, so the second patch
> makes the counterpart of 'blkfront' to consistently follow the behavior
> change.

I made the behavior change as requested by Andrii[1]. I therefore made a
similar behavior change to blkfront and Cc-ed stable@ for the second
change, too.

To make the change history clear and reduce the stable side overhead,
however, it might be better to apply v2, which doesn't make the behavior
change but only fixes the issue, Cc stable@ for it, make the behavior
change commits for both blkback and blkfront, update the documents, and
not Cc stable@ for the behavior change and document update commits. One
downside of that would be a behavioral difference between pre-5.19.x and
post-5.19.x kernels.

I think neither downside is critical, so I posted this patchset in this
shape. If anyone prefers some changes, please let me know, though.
[1] https://lore.kernel.org/xen-devel/CAJwUmVB6H3iTs-C+U=v-pwJB7-_ZRHPxHzKRJZ22xEPW7z8a=g...@mail.gmail.com/ Thanks, SJ > > Changes from v2 > (https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/) > - Keep the behavioral change of v1 > - Update blkfront's counterpart to follow the changed behavior > - Update documents for the changed behavior > > Changes from v1 > (https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/) > - Avoid the behavioral change > (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/) > - Rebase on latest xen/tip/linux-next > - Re-work by SeongJae Park > - Cc stable@ > > > > Maximilian Heyne (1): > xen, blkback: fix persistent grants negotiation > > SeongJae Park (1): > xen-blkfront: Apply 'feature_persistent' parameter when connect > > Documentation/ABI/testing/sysfs-driver-xen-blkback | 2 +- > Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +- > drivers/block/xen-blkback/xenbus.c | 9 +++-- > drivers/block/xen-blkfront.c| 4 +--- > 4 files changed, 6 insertions(+), 11 deletions(-) > > -- > 2.25.1
Re: [PATCH v3 0/2] Fix persistent grants negotiation with a behavior change
Hi all,

On Fri, 15 Jul 2022 18:12:26 + SeongJae Park wrote:

> Hi all,
>
> On Fri, 15 Jul 2022 17:55:19 + SeongJae Park wrote:
>
> > The first patch of this patchset fixes 'feature_persistent' parameter
> > handling in 'blkback' to respect the frontend's persistent grants
> > support always. The fix makes a behavioral change, so the second patch
> > makes the counterpart of 'blkfront' to consistently follow the behavior
> > change.
>
> I made the behavior change as requested by Andrii[1]. I therefore made a
> similar behavior change to blkfront and Cc-ed stable@ for the second
> change, too.

Now I realize that commit aac8a70db24b ("xen-blkback: add a parameter for
disabling of persistent grants") introduced two issues. One is what Max
reported with his patch, and the second is an unintended behavioral change
that broke Andrii's use case. That is, Andrii's use case had no problem at
all before the introduction of 'feature_persistent': at that time, 'blkback'
checked whether the frontend supports persistent grants on every
'reconnect()' and enabled the feature if so. The introduction of the
parameter, however, made it behave differently. Yes, we intended the
parameter to take effect only on newly created devices. But, as it breaks
user workflows, this should be fixed. The same goes for the 'blkfront' side
'feature_persistent'.

> To make the change history clear and reduce the overhead on the stable
> side, however, it might be better to apply v2, which only fixes the issue
> without the behavior change, and Cc stable@ for it; then make the
> behavior-change commits for both blkback and blkfront, update the
> documents, and not Cc stable@ for the behavior-change and document-update
> commits.

I'd say having one patch for each issue would be the right way to go, and
all fixes should Cc stable@.

> One downside of that approach would be that it leaves a behavioral
> difference between pre-5.19.x and post-5.19.x.
The fix of the unintended behavioral change should also be considered a fix
and therefore be merged into stable@, so the above concern is not valid. I
will send the next spin soon.

Thanks,
SJ

[...]
[PATCH v4 2/3] xen-blkback: Apply 'feature_persistent' parameter when connect
From: Maximilian Heyne

In some use cases[1], the backend is created while the frontend doesn't
support the persistent grants feature, but later the frontend can be changed
to support the feature and reconnect. In the past, 'blkback' enabled the
persistent grants feature in this case, since it unconditionally checked
whether the frontend supports the feature on every connect
('connect_ring()') and decided whether to use persistent grants accordingly.

However, commit aac8a70db24b ("xen-blkback: add a parameter for disabling of
persistent grants") has mistakenly changed this behavior. It made the
frontend feature support check not be repeated once it has seen
'feature_persistent' as 'false', that is, once the frontend was found to not
support persistent grants.

This commit changes the behavior of the parameter so that it takes effect on
every connect, so that the previous workflow can work again as expected.

[1] https://lore.kernel.org/xen-devel/CAJwUmVB6H3iTs-C+U=v-pwJB7-_ZRHPxHzKRJZ22xEPW7z8a=g...@mail.gmail.com/

Reported-by: Andrii Chepurnyi
Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc: # 5.10.x
Signed-off-by: Maximilian Heyne
Signed-off-by: SeongJae Park
---
 Documentation/ABI/testing/sysfs-driver-xen-blkback | 2 +-
 drivers/block/xen-blkback/xenbus.c | 9 +++--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index 7faf719af165..fac0f429a869 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -42,5 +42,5 @@ KernelVersion:  5.10
 Contact:        Maximilian Heyne
 Description:
                 Whether to enable the persistent grants feature or not.  Note
-                that this option only takes effect on newly created backends.
+                that this option only takes effect on newly connected backends.
                 The default is Y (enable).
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 16c6785d260c..ee7ad2fb432d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -186,8 +186,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 	__module_get(THIS_MODULE);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
-	blkif->vbd.feature_gnt_persistent = feature_persistent;
-
 	return blkif;
 }
 
@@ -1086,10 +1084,9 @@ static int connect_ring(struct backend_info *be)
 		xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
 		return -ENOSYS;
 	}
-	if (blkif->vbd.feature_gnt_persistent)
-		blkif->vbd.feature_gnt_persistent =
-			xenbus_read_unsigned(dev->otherend,
-					"feature-persistent", 0);
+
+	blkif->vbd.feature_gnt_persistent = feature_persistent &&
+		xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
 
 	blkif->vbd.overflow_max_grants = 0;
--
2.25.1
[PATCH v4 3/3] xen-blkfront: Apply 'feature_persistent' parameter when connect
In some use cases[1], the backend is created while the frontend doesn't
support the persistent grants feature, but later the frontend can be changed
to support the feature and reconnect. In the past, 'blkback' enabled the
persistent grants feature in this case, since it unconditionally checked
whether the frontend supports the feature on every connect
('connect_ring()') and decided whether to use persistent grants accordingly.

However, commit aac8a70db24b ("xen-blkback: add a parameter for disabling of
persistent grants") has mistakenly changed this behavior. It made the
frontend feature support check not be repeated once it has seen
'feature_persistent' as 'false', that is, once the frontend was found to not
support persistent grants. A similar behavioral change was made to
'blkfront' by commit 74a852479c68 ("xen-blkfront: add a parameter for
disabling of persistent grants").

This commit changes the behavior of the parameter so that it takes effect on
every connect, restoring the previous behavior of 'blkfront'.

[1] https://lore.kernel.org/xen-devel/CAJwUmVB6H3iTs-C+U=v-pwJB7-_ZRHPxHzKRJZ22xEPW7z8a=g...@mail.gmail.com/

Fixes: 74a852479c68 ("xen-blkfront: add a parameter for disabling of persistent grants")
Cc: # 5.10.x
Signed-off-by: SeongJae Park
---
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkfront.c | 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 7f646c58832e..4d36c5a10546 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -15,5 +15,5 @@ KernelVersion:  5.10
 Contact:        Maximilian Heyne
 Description:
                 Whether to enable the persistent grants feature or not.  Note
-                that this option only takes effect on newly created frontends.
+                that this option only takes effect on newly connected frontends.
                 The default is Y (enable).
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 3646c0cae672..4e763701b372 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1988,8 +1988,6 @@ static int blkfront_probe(struct xenbus_device *dev,
 	info->vdevice = vdevice;
 	info->connected = BLKIF_STATE_DISCONNECTED;
 
-	info->feature_persistent = feature_persistent;
-
 	/* Front end dir is a number, which is used as the id. */
 	info->handle = simple_strtoul(strrchr(dev->nodename, '/')+1, NULL, 0);
 	dev_set_drvdata(&dev->dev, info);
@@ -2283,7 +2281,7 @@ static void blkfront_gather_backend_features(struct blkfront_info *info)
 	if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
 		blkfront_setup_discard(info);
 
-	if (info->feature_persistent)
+	if (feature_persistent)
 		info->feature_persistent =
 			!!xenbus_read_unsigned(info->xbdev->otherend,
 					"feature-persistent", 0);
--
2.25.1
[PATCH v4 1/3] xen-blkback: fix persistent grants negotiation
Persistent grants can be used only when both the backend and the frontend
support the feature. The feature has always been supported by 'blkback', but
commit aac8a70db24b ("xen-blkback: add a parameter for disabling of
persistent grants") introduced a parameter for disabling it at runtime.

To avoid the parameter being updated while 'blkback' is using it, the commit
caches the parameter value in 'vbd->feature_gnt_persistent' in
'xen_vbd_create()', then checks whether the guest also supports the feature
and finally updates the field in 'connect_ring()'. However, 'connect_ring()'
can be called before 'xen_vbd_create()', so a later execution of
'xen_vbd_create()' can wrongly overwrite 'vbd->feature_gnt_persistent' with
'true'. As a result, 'blkback' can try to use the persistent grants feature
even if the guest doesn't support it.

This commit fixes the issue by moving the caching of the parameter value to
'xen_blkif_alloc()', which allocates the 'blkif'. Because this struct embeds
the 'vbd' object that 'connect_ring()' uses later, 'xen_blkif_alloc()' is
guaranteed to run before 'connect_ring()' and is therefore the right and
safe place for the caching.

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc: # 5.10.x
Signed-off-by: Maximilian Heyne
Signed-off-by: SeongJae Park
---
 drivers/block/xen-blkback/xenbus.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 97de13b14175..16c6785d260c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -157,6 +157,11 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
 	return 0;
 }
 
+/* Enable the persistent grants feature. */
+static bool feature_persistent = true;
+module_param(feature_persistent, bool, 0644);
+MODULE_PARM_DESC(feature_persistent, "Enables the persistent grants feature");
+
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
 	struct xen_blkif *blkif;
@@ -181,6 +186,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 	__module_get(THIS_MODULE);
 	INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
+	blkif->vbd.feature_gnt_persistent = feature_persistent;
+
 	return blkif;
 }
 
@@ -472,12 +479,6 @@ static void xen_vbd_free(struct xen_vbd *vbd)
 	vbd->bdev = NULL;
 }
 
-/* Enable the persistent grants feature. */
-static bool feature_persistent = true;
-module_param(feature_persistent, bool, 0644);
-MODULE_PARM_DESC(feature_persistent,
-		 "Enables the persistent grants feature");
-
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
			  unsigned major, unsigned minor, int readonly,
			  int cdrom)
@@ -520,8 +521,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	if (bdev_max_secure_erase_sectors(bdev))
 		vbd->discard_secure = true;
 
-	vbd->feature_gnt_persistent = feature_persistent;
-
 	pr_debug("Successful creation of handle=%04x (dom=%u)\n",
		 handle, blkif->domid);
	return 0;
--
2.25.1
[PATCH v4 0/3] xen-blk{back,front}: Fix two bugs in 'feature_persistent'
The introduction of 'feature_persistent' brought two bugs. The first is a
wrong overwrite of 'vbd->feature_gnt_persistent' in 'blkback', caused by
caching the parameter value in the wrong place. The second is an unintended
behavioral change that could break previously working dynamic
frontend/backend persistent grants feature support changes. This patchset
fixes the issues.

Changes from v3
(https://lore.kernel.org/xen-devel/20220715175521.126649-1...@kernel.org/)
- Split 'blkback' patch for each of the two issues
- Add 'Reported-by: Andrii Chepurnyi '

Changes from v2
(https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/)
- Keep the behavioral change of v1
- Update blkfront's counterpart to follow the changed behavior
- Update documents for the changed behavior

Changes from v1
(https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/)
- Avoid the behavioral change
  (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/)
- Rebase on latest xen/tip/linux-next
- Re-work by SeongJae Park
- Cc stable@

Maximilian Heyne (1):
  xen-blkback: Apply 'feature_persistent' parameter when connect

SeongJae Park (2):
  xen-blkback: fix persistent grants negotiation
  xen-blkfront: Apply 'feature_persistent' parameter when connect

 .../ABI/testing/sysfs-driver-xen-blkback  | 2 +-
 .../ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 drivers/block/xen-blkback/xenbus.c        | 20 ---
 drivers/block/xen-blkfront.c              | 4 +---
 4 files changed, 11 insertions(+), 17 deletions(-)

--
2.25.1
[PATCH v2] xen-blk{back,front}: Update contact points for buffer_squeeze_duration_ms and feature_persistent
SeongJae is currently listed as a contact point for some blk{back,front}
features, but he will not be working on Xen for a while. This commit
therefore updates the contact point to his colleague, Maximilian, who
understands the context and is actively working with these features now.

Signed-off-by: SeongJae Park
Signed-off-by: Maximilian Heyne
Acked-by: Roger Pau Monné
---

Changes from v1
(https://lore.kernel.org/xen-devel/20220301144628.2858-1...@kernel.org/)
- Add Acked-by from Roger

 Documentation/ABI/testing/sysfs-driver-xen-blkback  | 4 ++--
 Documentation/ABI/testing/sysfs-driver-xen-blkfront | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback
index a74dfe52dd76..7faf719af165 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkback
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback
@@ -29,7 +29,7 @@ Description:
 What:           /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms
 Date:           December 2019
 KernelVersion:  5.6
-Contact:        SeongJae Park
+Contact:        Maximilian Heyne
 Description:
                 When memory pressure is reported to blkback this option
                 controls the duration in milliseconds that blkback will not
@@ -39,7 +39,7 @@ Description:
 What:           /sys/module/xen_blkback/parameters/feature_persistent
 Date:           September 2020
 KernelVersion:  5.10
-Contact:        SeongJae Park
+Contact:        Maximilian Heyne
 Description:
                 Whether to enable the persistent grants feature or not.  Note
                 that this option only takes effect on newly created backends.
diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkfront b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
index 61fd173fabfe..7f646c58832e 100644
--- a/Documentation/ABI/testing/sysfs-driver-xen-blkfront
+++ b/Documentation/ABI/testing/sysfs-driver-xen-blkfront
@@ -12,7 +12,7 @@ Description:
 What:           /sys/module/xen_blkfront/parameters/feature_persistent
 Date:           September 2020
 KernelVersion:  5.10
-Contact:        SeongJae Park
+Contact:        Maximilian Heyne
 Description:
                 Whether to enable the persistent grants feature or not.  Note
                 that this option only takes effect on newly created frontends.
--
2.25.1
[Xen-devel] [PATCH] xen/blkback: Avoid unmapping unmapped grant pages
From: SeongJae Park

For each I/O request, blkback first maps the foreign pages for the request
to its local pages. If an allocation of a local page for the mapping fails,
it should unmap every mapping already made for the request. However,
blkback's handling of the allocation failure does not mark the remaining
foreign pages as unmapped. Therefore, the unmap function merely tries to
unmap every valid-looking grant page for the request, including the pages
that were never mapped due to the allocation failure. On a system that fails
the allocation frequently, this problem leads to the following kernel crash:

[ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0001
[ 372.012546] IP: [] gnttab_unmap_refs.part.7+0x1c/0x40
[ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
[ 372.012562] Oops: 0002 [#1] SMP
[ 372.012566] Modules linked in: act_police sch_ingress cls_u32
...
[ 372.012746] Call Trace:
[ 372.012752] [] gnttab_unmap_refs+0x34/0x40
[ 372.012759] [] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
...
[ 372.012802] [] dispatch_rw_block_io+0x970/0x980 [xen_blkback]
...
Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[ 0.00] Initializing cgroup subsys cpuset

This commit fixes the problem by marking the grant pages of the given
request that were not mapped, due to the allocation failure, as invalid.
Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")
Signed-off-by: SeongJae Park
Reviewed-by: David Woodhouse
Reviewed-by: Maximilian Heyne
Reviewed-by: Paul Durrant
---
 drivers/block/xen-blkback/blkback.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..3666afa639d1 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -936,6 +936,8 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
  out_of_memory:
 	pr_alert("%s: out of memory\n", __func__);
 	put_free_pages(ring, pages_to_gnt, segs_to_map);
+	for (i = last_map; i < num; i++)
+		pages[i]->handle = BLKBACK_INVALID_HANDLE;
 	return -ENOMEM;
 }
--
2.17.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 0/2] xen/blkback: Aggressively shrink page pools if a memory pressure is detected
In short, even worst-case aggressive pool shrinking makes no visible
performance degradation. I think this is due to the slow speed of the I/O.
In other words, the additional page allocation overhead is hidden under the
much slower I/O.

SeongJae Park (2):
  xen/blkback: Aggressively shrink page pools if a memory pressure is
    detected
  blkback: Add a module parameter for aggressive pool shrinking duration

 drivers/block/xen-blkback/blkback.c | 35 +++--
 1 file changed, 33 insertions(+), 2 deletions(-)

--
2.17.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 1/2] xen/blkback: Aggressively shrink page pools if a memory pressure is detected
From: SeongJae Park

Each `blkif` has a free pages pool for the grant mapping. The size of the
pool starts from zero and is increased on demand while processing I/O
requests. If the current batch of I/O requests is finished, or 100
milliseconds have passed since the last I/O request handling, the pool is
checked and shrunk so that it does not exceed the size limit,
`max_buffer_pages`.

Therefore, guests running `blkfront` can cause memory pressure in the guest
running `blkback` by attaching an arbitrarily large number of block devices
and inducing I/O. This commit avoids such problematic situations by
shrinking the pools aggressively (beyond the usual limit) for a while (one
millisecond) if memory pressure is detected.

Discussions
===========

The shrinking mechanism returns to the system only pages in the pool that
are not currently being used by blkback. In other words, the pages that will
be shrunk are not mapped with foreign pages. Because this commit changes
only the shrink limit and uses the shrinking mechanism as is, it does not
introduce security issues such as improper unmappings.

This commit keeps the aggressive shrinking limit for one millisecond from
the time the last memory pressure was detected. The duration should be
neither too short nor too long. If it is too long, the free pages pool
shrinking overhead can reduce the I/O performance. If it is too short,
blkback will not free enough pages to reduce the memory pressure. I believe
that one millisecond is a short duration in terms of I/O while it is a long
duration in terms of memory operations. Also, as the original shrinking
mechanism runs every 100 milliseconds, this 1 millisecond could be a
somewhat reasonable choice. This duration also worked well in our testing
environment simulating the memory pressure situation (described in detail
below).

Memory Pressure Test
====================

To show whether this commit fixes the above mentioned memory pressure
situation well, I configured a test environment.
On the `blkfront` running guest instances of a virtualized environment, I
attach an arbitrarily large number of network-backed volume devices and
induce I/O to those. Meanwhile, I measure the number of pages swapped in and
out on the `blkback` running guest. The test ran twice, once for the
`blkback` before this commit and once for that after this commit. Roughly
speaking, this commit has reduced those numbers 130x (pswpin) and 34x
(pswpout), as below:

            pswpin   pswpout
    before  76,672   185,799
    after      587     5,402

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under memory pressure
because the aggressive shrinking will require more page allocations. To show
the overhead, I artificially made an aggressive pages pool shrinking
situation and measured the I/O performance of a `blkfront` running guest.
For the artificial shrinking, I set `blkback.max_buffer_pages` using the
`/sys/module/xen_blkback/parameters/max_buffer_pages` file. We set the value
to `1024` and to `0`. The `1024` is the default value. Setting the value to
`0` incurs the worst-case aggressive shrinking stress. For the I/O
performance measurement, I use a simple `dd` command.
Default Performance
-------------------

[dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8702 s, 38.7 MB/s

Worst-case Performance
----------------------

[dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s

In short, even worst-case aggressive pool shrinking makes no visible
performance degradation. I think this is due to the slow speed of the I/O.
In other words, the additional page allocation overhead is hidden under the
much slower I/O time.
[Xen-devel] [PATCH 2/2] blkback: Add a module parameter for aggressive pool shrinking duration
From: SeongJae Park

As discussed in the previous commit ("xen/blkback: Aggressively shrink page
pools if a memory pressure is detected"), the aggressive pool shrinking
duration should be carefully selected: ``If it is too long, free pages pool
shrinking overhead can reduce the I/O performance. If it is too short,
blkback will not free enough pages to reduce the memory pressure.`` That
said, the proper duration would depend on the given configuration and
workload. For that reason, this commit allows users to set the duration via
a module parameter interface.

Signed-off-by: SeongJae Park
Suggested-by: Amit Shah
---
 drivers/block/xen-blkback/blkback.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index aa1a127093e5..88c011300ee9 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -137,9 +137,13 @@ module_param(log_stats, int, 0644);
 
 /*
  * Once a memory pressure is detected, keep aggressive shrinking of the free
- * page pools for this time (msec)
+ * page pools for this time (milliseconds)
  */
-#define AGGRESSIVE_SHRINKING_DURATION	1
+static int xen_blkif_aggressive_shrinking_duration = 1;
+module_param_named(aggressive_shrinking_duration,
+		xen_blkif_aggressive_shrinking_duration, int, 0644);
+MODULE_PARM_DESC(aggressive_shrinking_duration,
+		"Duration to do aggressive shrinking when a memory pressure is detected");
 
 static unsigned long xen_blk_mem_pressure_end;
 
@@ -147,7 +151,7 @@ static unsigned long blkif_shrink_count(struct shrinker *shrinker,
 		struct shrink_control *sc)
 {
 	xen_blk_mem_pressure_end = jiffies +
-		msecs_to_jiffies(AGGRESSIVE_SHRINKING_DURATION);
+		msecs_to_jiffies(xen_blkif_aggressive_shrinking_duration);
 	return 0;
 }
--
2.17.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
Each `blkif` has a free pages pool for the grant mapping. The size of the
pool starts from zero and is increased on demand while processing I/O
requests. If the current batch of I/O requests is finished, or 100
milliseconds have passed since the last I/O request handling, the pool is
checked and shrunk so that it does not exceed the size limit,
`max_buffer_pages`.

Therefore, guests running `blkfront` can cause memory pressure in the guest
running `blkback` by attaching a large number of block devices and inducing
I/O. System administrators can avoid such problematic situations by limiting
the maximum number of devices each guest can attach. However, finding the
optimal limit is not easy. An improper limit can result in memory pressure
or resource underutilization. This commit avoids such problematic situations
by squeezing the pools (returning every free page in the pool to the system)
for a while (users can set the duration via a module parameter) if memory
pressure is detected.

Base Version
------------

This patch is based on v5.4.
A complete tree is also available at my public git repo:
https://github.com/sjp38/linux/tree/blkback_aggressive_shrinking_v3

Patch History
-------------

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
- Rename the module parameter and variables for brevity (aggressive
  shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
- Adjust the description to not use the term, `arbitrarily` (suggested by
  Paul Durrant)
- Specify time unit of the duration in the parameter description
  (suggested by Maximilian Heyne)
- Change default aggressive shrinking duration from 1ms to 10ms
- Merge two patches into one single patch

SeongJae Park (1):
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c | 35 +++--
 1 file changed, 33 insertions(+), 2 deletions(-)

--
2.17.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 1/1] xen/blkback: Squeeze page pools if a memory pressure is detected
From: SeongJae Park

Each `blkif` has a free pages pool for the grant mapping. The size of the
pool starts from zero and is increased on demand while processing I/O
requests. If the current batch of I/O requests is finished, or 100
milliseconds have passed since the last I/O request handling, the pool is
checked and shrunk so that it does not exceed the size limit,
`max_buffer_pages`.

Therefore, guests running `blkfront` can cause memory pressure in the guest
running `blkback` by attaching a large number of block devices and inducing
I/O. System administrators can avoid such problematic situations by limiting
the maximum number of devices each guest can attach. However, finding the
optimal limit is not easy. An improper limit can result in memory pressure
or resource underutilization. This commit avoids such problematic situations
by squeezing the pools (returning every free page in the pool to the system)
for a while (users can set the duration via a module parameter) if memory
pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only
pages in the pool that are not currently being used by `blkback`; in other
words, pages that are not mapped with foreign pages. Because this commit
changes only the shrink limit and uses the mechanism as is, it does not
introduce improper-mapping-related security issues.

Once memory pressure is detected, this commit keeps the squeezing limit for
a user-specified time duration. The duration should be neither too long nor
too short. If it is too long, the overhead incurred by the squeezing can
reduce the I/O performance. If it is too short, `blkback` will not free
enough pages to reduce the memory pressure. This commit sets the value to
`10 milliseconds` by default because it is a short time in terms of I/O
while it is a long time in terms of memory operations.
Also, as the original shrinking mechanism runs at least every 100
milliseconds, this could be a somewhat reasonable choice. I also tested
other durations (refer to the section below for more details) and confirmed
that 10 milliseconds is the one that works best with the test. That said,
the proper duration depends on actual configurations and workloads. That's
why this commit allows users to set the duration to their optimal value via
the module parameter.

Memory Pressure Test
====================

To show how well this commit fixes the memory pressure situation, I
configured a test environment on a Xen-running virtualization system. On the
`blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those. Meanwhile, I measure
the number of pages swapped in and out on the `blkback` running guest. The
test ran twice, once for the `blkback` before this commit and once for that
after this commit. As shown below, this commit has dramatically reduced the
memory pressure:

            pswpin   pswpout
    before  76,672   185,799
    after      212     3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1 ms, 10 ms, and 100 ms). The results are as below:

    duration  pswpin  pswpout
    1            852    6,424
    10           212    3,325
    100          203    3,340

As expected, the memory pressure decreased as the duration increased, but
the reduction stopped from 10 ms. Based on these results, I chose the
default duration as 10 ms.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per I/O.
To show the overhead, I artificially made a worst-case squeezing situation
and measured the I/O performance of a `blkfront` running guest. For the
artificial squeezing, I set `blkback.max_buffer_pages` using the
`/sys/module/xen_blkback/parameters/max_buffer_pages` file. We set the value
to `1024` and to `0`.
The `1024` is the default value. Setting the value to `0` is the same as
always doing the squeezing (the worst case). For the I/O performance
measurement, I use a simple `dd` command.

Default Performance
-------------------

[dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
131072+0 records in
131072+0 records out
5368709
Re: [Xen-devel] [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
On Mon, 9 Dec 2019 10:39:02 +0100 Juergen wrote: >On 09.12.19 09:58, SeongJae Park wrote: >> Each `blkif` has a free pages pool for the grant mapping. The size of >> the pool starts from zero and be increased on demand while processing >> the I/O requests. If current I/O requests handling is finished or 100 >> milliseconds has passed since last I/O requests handling, it checks and >> shrinks the pool to not exceed the size limit, `max_buffer_pages`. >> >> Therefore, `blkfront` running guests can cause a memory pressure in the >> `blkback` running guest by attaching a large number of block devices and >> inducing I/O. > >I'm having problems to understand how a guest can attach a large number >of block devices without those having been configured by the host admin >before. > >If those devices have been configured, dom0 should be ready for that >number of devices, e.g. by having enough spare memory area for ballooned >pages. As mentioned in the original message as below, administrators _can_ avoid this problem, but finding the optimal configuration is hard, especially if the number of the guests is large. System administrators can avoid such problematic situations by limiting the maximum number of devices each guest can attach. However, finding the optimal limit is not so easy. Improper set of the limit can results in the memory pressure or a resource underutilization. Thanks, SeongJae Park > >So either I'm missing something here or your reasoning for the need of >the patch is wrong. > > >Juergen > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
On Mon, 9 Dec 2019 11:15:22 +0100 "Jürgen Groß" wrote: >On 09.12.19 10:46, Durrant, Paul wrote: >>> -Original Message- >>> From: Jürgen Groß >>> Sent: 09 December 2019 09:39 >>> To: Park, Seongjae ; ax...@kernel.dk; >>> konrad.w...@oracle.com; roger@citrix.com >>> Cc: linux-bl...@vger.kernel.org; linux-ker...@vger.kernel.org; Durrant, >>> Paul ; sj38.p...@gmail.com; xen- >>> de...@lists.xenproject.org >>> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory >>> pressure >>> >>> On 09.12.19 09:58, SeongJae Park wrote: >>>> Each `blkif` has a free pages pool for the grant mapping. The size of >>>> the pool starts from zero and be increased on demand while processing >>>> the I/O requests. If current I/O requests handling is finished or 100 >>>> milliseconds has passed since last I/O requests handling, it checks and >>>> shrinks the pool to not exceed the size limit, `max_buffer_pages`. >>>> >>>> Therefore, `blkfront` running guests can cause a memory pressure in the >>>> `blkback` running guest by attaching a large number of block devices and >>>> inducing I/O. >>> >>> I'm having problems to understand how a guest can attach a large number >>> of block devices without those having been configured by the host admin >>> before. >>> >>> If those devices have been configured, dom0 should be ready for that >>> number of devices, e.g. by having enough spare memory area for ballooned >>> pages. >>> >>> So either I'm missing something here or your reasoning for the need of >>> the patch is wrong. >>> >> >> I think the underlying issue is that persistent grant support is hogging >> memory in the backends, thereby compromising scalability. IIUC this patch is >> essentially a band-aid to get back to the scalability that was possible >> before persistent grant support was added. Ultimately the right answer >> should be to get rid of persistent grants support and use grant copy, but >> such a change is clearly more invasive and would need far more testing. 
> >Persistent grants are hogging ballooned pages, which is equivalent to >memory only in case of the backend's domain memory being equal or >rather near to its max memory size. > >So configuring the backend domain with enough spare area for ballooned >pages should make this problem much less serious. > >Another problem in this area is the amount of maptrack frames configured >for a driver domain, which will limit the number of concurrent foreign >mappings of that domain. Right, similar problems from other backends are possible. > >So instead of having a blkback specific solution I'd rather have a >common callback for backends to release foreign mappings in order to >enable a global resource management. This patch is also based on a common callback, namely the shrinker callback system. As the shrinker callback is designed for the general memory pressure handling, I thought this is a right one to use. Other backends having similar problems can use this in their way. Thanks, SeongJae Park > > >Juergen > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
On Mon, 9 Dec 2019 12:08:10 +0100 "Jürgen Groß" wrote: >On 09.12.19 11:52, SeongJae Park wrote: >> On Mon, 9 Dec 2019 11:15:22 +0100 "Jürgen Groß" wrote: >> >>> On 09.12.19 10:46, Durrant, Paul wrote: >>>>> -Original Message- >>>>> From: Jürgen Groß >>>>> Sent: 09 December 2019 09:39 >>>>> To: Park, Seongjae ; ax...@kernel.dk; >>>>> konrad.w...@oracle.com; roger@citrix.com >>>>> Cc: linux-bl...@vger.kernel.org; linux-ker...@vger.kernel.org; Durrant, >>>>> Paul ; sj38.p...@gmail.com; xen- >>>>> de...@lists.xenproject.org >>>>> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory >>>>> pressure >>>>> >>>>> On 09.12.19 09:58, SeongJae Park wrote: >>>>>> Each `blkif` has a free pages pool for the grant mapping. The size of >>>>>> the pool starts from zero and be increased on demand while processing >>>>>> the I/O requests. If current I/O requests handling is finished or 100 >>>>>> milliseconds has passed since last I/O requests handling, it checks and >>>>>> shrinks the pool to not exceed the size limit, `max_buffer_pages`. >>>>>> >>>>>> Therefore, `blkfront` running guests can cause a memory pressure in the >>>>>> `blkback` running guest by attaching a large number of block devices and >>>>>> inducing I/O. >>>>> >>>>> I'm having problems to understand how a guest can attach a large number >>>>> of block devices without those having been configured by the host admin >>>>> before. >>>>> >>>>> If those devices have been configured, dom0 should be ready for that >>>>> number of devices, e.g. by having enough spare memory area for ballooned >>>>> pages. >>>>> >>>>> So either I'm missing something here or your reasoning for the need of >>>>> the patch is wrong. >>>>> >>>> >>>> I think the underlying issue is that persistent grant support is hogging >>>> memory in the backends, thereby compromising scalability. IIUC this patch >>>> is essentially a band-aid to get back to the scalability that was possible >>>> before persistent grant support was added. 
Ultimately the right answer >>>> should be to get rid of persistent grants support and use grant copy, but >>>> such a change is clearly more invasive and would need far more testing. >>> >>> Persistent grants are hogging ballooned pages, which is equivalent to >>> memory only in case of the backend's domain memory being equal or >>> rather near to its max memory size. >>> >>> So configuring the backend domain with enough spare area for ballooned >>> pages should make this problem much less serious. >>> >>> Another problem in this area is the amount of maptrack frames configured >>> for a driver domain, which will limit the number of concurrent foreign >>> mappings of that domain. >> >> Right, similar problems from other backends are possible. >> >>> >>> So instead of having a blkback specific solution I'd rather have a >>> common callback for backends to release foreign mappings in order to >>> enable a global resource management. >> >> This patch is also based on a common callback, namely the shrinker callback >> system. As the shrinker callback is designed for the general memory pressure >> handling, I thought this is a right one to use. Other backends having >> similar >> problems can use this in their way. > > But this is addressing memory shortage only and it is acting globally. > > What I'd like to have in some (maybe distant) future is a way to control > resource usage per guest. Why would you want to throttle performance of > all guests instead of only the one causing the pain by hogging lots of > resources? Good point. I was also concerned about the performance fairness at first, but settled in this ugly but simple solution mainly because my worst-case performance test (detailed in 1st patch's commit msg) shows no visible performance degradation, though it is a minimal test on my test environment. Anyway, I agree with your future direction. 
> > The new backend callback should (IMO) have a domid as parameter for > specifying which guest should be taken away resources (including the > possibility to select "any d
[Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback
From: SeongJae Park

Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. Using this facility, 'xenbus' would be able to monitor memory pressure and request the specific domains of the specific backend drivers causing the given pressure to voluntarily release memory. That said, this commit simply requests every driver that registered the callback to release its memory for every domain, rather than issuing the requests only to the drivers and domains in charge. Such selective requests would be a future work. Also, this commit focuses on memory only. However, it would be able to be extended for general resources.

Signed-off-by: SeongJae Park
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h                      |  1 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..cd5fd1cd8de3 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
 	return NOTIFY_DONE;
 }
 
+static int xenbus_backend_reclaim(struct device *dev, void *data)
+{
+	struct xenbus_driver *drv;
+	if (!dev->driver)
+		return -ENOENT;
+	drv = to_xenbus_driver(dev->driver);
+	if (drv && drv->reclaim)
+		drv->reclaim(to_xenbus_device(dev), DOMID_INVALID);
+	return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
+				struct shrink_control *sc)
+{
+	bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+			xenbus_backend_reclaim);
+	return 0;
+}
+
+static struct shrinker xenbus_backend_shrinker = {
+	.count_objects = xenbus_backend_shrink_count,
+	.seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
 	static struct notifier_block xenstore_notifier = {
@@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void)
 
 	register_xenstore_notifier(&xenstore_notifier);
 
+	if (register_shrinker(&xenbus_backend_shrinker))
+		pr_warn("shrinker registration failed\n");
+
 	return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..52aaf4f78400 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
 	struct device_driver driver;
 	int (*read_otherend_details)(struct xenbus_device *dev);
 	int (*is_ready)(struct xenbus_device *dev);
+	unsigned (*reclaim)(struct xenbus_device *dev, domid_t domid);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
--
2.17.1
[Xen-devel] [PATCH v4 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected
From: SeongJae Park

Each `blkif` has a free pages pool for the grant mapping. The size of the pool starts from zero and is increased on demand while processing the I/O requests. If current I/O requests handling is finished or 100 milliseconds has passed since last I/O requests handling, it checks and shrinks the pool to not exceed the size limit, `max_buffer_pages`.

Therefore, `blkfront` running guests can cause memory pressure in the `blkback` running guest by attaching a large number of block devices and inducing I/O. System administrators can avoid such problematic situations by limiting the maximum number of devices each guest can attach. However, finding the optimal limit is not so easy. An improperly set limit can result in memory pressure or resource underutilization. This commit avoids such problematic situations by squeezing the pools (returning every free page in the pool to the system) for a while (users can set this duration via a module parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only those pages in the pool which are not currently used by `blkback`; in other words, pages that are not mapped to foreign pages. Because this commit changes only the shrink limit but uses the mechanism as is, it does not introduce security issues related to improper mappings.

Once memory pressure is detected, this commit keeps the squeezing limit for a user-specified time duration. The duration should be neither too long nor too short. If it is too long, the overhead incurred by the squeezing can reduce the I/O performance. If it is too short, `blkback` will not free enough pages to reduce the memory pressure. This commit sets the default value to 10 milliseconds, because it is a short time in terms of I/O while it is a long time in terms of memory operations.
Also, as the original shrinking mechanism works at least every 100 milliseconds, this could be a somewhat reasonable choice. I also tested other durations (refer to the below section for more details) and confirmed that 10 milliseconds is the one that works best with the test. That said, the proper duration depends on the actual configurations and workloads. That's why this commit allows users to set the duration to their optimal value via the module parameter.

Memory Pressure Test
====================

To show how well this commit fixes the memory pressure situation, I configured a test environment on a Xen-running virtualization system. On the `blkfront` running guest instances, I attach a large number of network-backed volume devices and induce I/O to those. Meanwhile, I measure the number of pages swapped in and out on the `blkback` running guest. The test ran twice, once for the `blkback` before this commit and once for that after this commit. As shown below, this commit has dramatically reduced the memory pressure:

            pswpin  pswpout
    before  76,672  185,799
    after      212    3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three different durations (1 ms, 10 ms, and 100 ms). The results are as below:

    duration  pswpin  pswpout
           1     852    6,424
          10     212    3,325
         100     203    3,340

As expected, the memory pressure decreased as the duration increased, but the reduction stopped from `10ms`. Based on these results, I chose the default duration as 10 ms.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory pressure, because the squeezing will require more page allocations per I/O. To show the overhead, I artificially made a worst-case squeezing situation and measured the I/O performance of a `blkfront` running guest. For the artificial squeezing, I set the `blkback.max_buffer_pages` using the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. We set the value to `1024` and `0`.
The `1024` is the default value. Setting the value to `0` is the same as always doing the squeezing (worst-case). For the I/O performance measurement, I use a simple `dd` command.

Default Performance
-------------------

[dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
131072+0 records in
131072+0 records out
5368709
[Xen-devel] [PATCH v4 0/2] xenbus/backend: Add a memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in 'xen-blkback' (patch 2).

Base Version
------------

This patchset is based on v5.4. A complete tree is also available at my public git repo: https://github.com/sjp38/linux/tree/blkback_squeezing_v4

Patch History
-------------

Changes from v3 (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/):
 - Add general callback in xen_driver and use it (suggested by Juergen Gross)

Changes from v2 (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com):
 - Rename the module parameter and variables for brevity (aggressive shrinking -> squeezing)

Changes from v1 (https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/):
 - Adjust the description to not use the term, `arbitrarily` (suggested by Paul Durrant)
 - Specify time unit of the duration in the parameter description (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (1):
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c | 35 +++--
 1 file changed, 33 insertions(+), 2 deletions(-)

SeongJae Park (2):
  xenbus/backend: Add memory pressure handler callback
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c       | 23 +++--
 drivers/block/xen-blkback/common.h        |  1 +
 drivers/block/xen-blkback/xenbus.c        |  3 ++-
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h                      |  1 +
 5 files changed, 56 insertions(+), 3 deletions(-)

--
2.17.1
Re: [Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback
On Tue, Dec 10, 2019 at 7:11 AM Jürgen Groß wrote: > > On 09.12.19 20:43, SeongJae Park wrote: > > From: SeongJae Park > > > > Granting pages consumes backend system memory. In systems configured > > with insufficient spare memory for those pages, it can cause a memory > > pressure situation. However, finding the optimal amount of the spare > > memory is challenging for large systems having dynamic resource > > utilization patterns. Also, such a static configuration might lacks a > > flexibility. > > > > To mitigate such problems, this commit adds a memory reclaim callback to > > 'xenbus_driver'. Using this facility, 'xenbus' would be able to monitor > > a memory pressure and request specific domains of specific backend > > drivers which causing the given pressure to voluntarily release its > > memory. > > > > That said, this commit simply requests every callback registered driver > > to release its memory for every domain, rather than issueing the > > requests to the drivers and domain in charge. Such things would be a > > future work. Also, this commit focuses on memory only. However, it > > would be ablt to be extended for general resources. 
> > > > Signed-off-by: SeongJae Park > > --- > > drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++ > > include/xen/xenbus.h | 1 + > > 2 files changed, 32 insertions(+) > > > > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c > > b/drivers/xen/xenbus/xenbus_probe_backend.c > > index b0bed4faf44c..cd5fd1cd8de3 100644 > > --- a/drivers/xen/xenbus/xenbus_probe_backend.c > > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c > > @@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct > > notifier_block *notifier, > > return NOTIFY_DONE; > > } > > > > +static int xenbus_backend_reclaim(struct device *dev, void *data) > > +{ > > + struct xenbus_driver *drv; > > + if (!dev->driver) > > + return -ENOENT; > > + drv = to_xenbus_driver(dev->driver); > > + if (drv && drv->reclaim) > > + drv->reclaim(to_xenbus_device(dev), DOMID_INVALID); > > + return 0; > > +} > > + > > +/* > > + * Returns 0 always because we are using shrinker to only detect memory > > + * pressure. > > + */ > > +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker, > > + struct shrink_control *sc) > > +{ > > + bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, > > + xenbus_backend_reclaim); > > + return 0; > > +} > > + > > +static struct shrinker xenbus_backend_shrinker = { > > + .count_objects = xenbus_backend_shrink_count, > > + .seeks = DEFAULT_SEEKS, > > +}; > > + > > static int __init xenbus_probe_backend_init(void) > > { > > static struct notifier_block xenstore_notifier = { > > @@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void) > > > > register_xenstore_notifier(&xenstore_notifier); > > > > + if (register_shrinker(&xenbus_backend_shrinker)) > > + pr_warn("shrinker registration failed\n"); > > + > > return 0; > > } > > subsys_initcall(xenbus_probe_backend_init); > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > > index 869c816d5f8c..52aaf4f78400 100644 > > --- a/include/xen/xenbus.h > > +++ b/include/xen/xenbus.h > > @@ -104,6 +104,7 @@ 
struct xenbus_driver { > > struct device_driver driver; > > int (*read_otherend_details)(struct xenbus_device *dev); > > int (*is_ready)(struct xenbus_device *dev); > > + unsigned (*reclaim)(struct xenbus_device *dev, domid_t domid); > > Can you please add a comment here regarding semantics of specifying > DOMID_INVALID as domid? Yes, of course. Will do with the next version. Thanks, SeongJae Park > > Block maintainers, would you be fine with me carrying this series > through the Xen tree? > > > Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 1/2] xenbus/backend: Add memory pressure handler callback
On Tue, Dec 10, 2019 at 7:23 AM Jürgen Groß wrote: > > On 09.12.19 20:43, SeongJae Park wrote: > > From: SeongJae Park > > > > Granting pages consumes backend system memory. In systems configured > > with insufficient spare memory for those pages, it can cause a memory > > pressure situation. However, finding the optimal amount of the spare > > memory is challenging for large systems having dynamic resource > > utilization patterns. Also, such a static configuration might lacks a > > flexibility. > > > > To mitigate such problems, this commit adds a memory reclaim callback to > > 'xenbus_driver'. Using this facility, 'xenbus' would be able to monitor > > a memory pressure and request specific domains of specific backend > > drivers which causing the given pressure to voluntarily release its > > memory. > > > > That said, this commit simply requests every callback registered driver > > to release its memory for every domain, rather than issueing the > > requests to the drivers and domain in charge. Such things would be a > > future work. Also, this commit focuses on memory only. However, it > > would be ablt to be extended for general resources. 
> > > > Signed-off-by: SeongJae Park > > --- > > drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++ > > include/xen/xenbus.h | 1 + > > 2 files changed, 32 insertions(+) > > > > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c > > b/drivers/xen/xenbus/xenbus_probe_backend.c > > index b0bed4faf44c..cd5fd1cd8de3 100644 > > --- a/drivers/xen/xenbus/xenbus_probe_backend.c > > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c > > @@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct > > notifier_block *notifier, > > return NOTIFY_DONE; > > } > > > > +static int xenbus_backend_reclaim(struct device *dev, void *data) > > +{ > > + struct xenbus_driver *drv; > > + if (!dev->driver) > > + return -ENOENT; > > + drv = to_xenbus_driver(dev->driver); > > + if (drv && drv->reclaim) > > + drv->reclaim(to_xenbus_device(dev), DOMID_INVALID); > > Oh, sorry for first requesting you to add the domid as a parameter, > but now I realize this could be handled in the xenbus driver, as > struct xenbus_device already contains the otherend_id. > > Would you mind dropping the parameter again, please? Oh, I also missed it! Will do! Thanks, SeongJae Park > > > Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v5 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected
ed, 13.8702 s, 38.7 MB/s

Worst-case Performance
----------------------

[dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages
[instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s

In short, even the worst-case squeezing causes no visible performance degradation. I think this is due to the slow speed of the I/O; in other words, the additional page allocation overhead is hidden under the much slower I/O latency. Nevertheless, please note that this is just a very simple and minimal test.

Reviewed-by: Juergen Gross
Signed-off-by: SeongJae Park
---
 drivers/block/xen-blkback/blkback.c | 23 +-
 drivers/block/xen-blkback/common.h  |  1 +
 drivers/block/xen-blkback/xenbus.c  |  3 ++-
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..4d4dba7ea721 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -142,6 +142,22 @@ static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
 			HZ * xen_blkif_pgrant_timeout);
 }
 
+/* Once a memory pressure is detected, squeeze free page pools for a while. */
+static int xen_blkif_buffer_squeeze_duration_ms = 10;
+module_param_named(buffer_squeeze_duration_ms,
+		xen_blkif_buffer_squeeze_duration_ms, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration_ms,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+static unsigned long xen_blk_buffer_squeeze_end;
+
+unsigned xen_blkbk_reclaim(struct xenbus_device *dev)
+{
+	xen_blk_buffer_squeeze_end = jiffies +
+		msecs_to_jiffies(xen_blkif_buffer_squeeze_duration_ms);
+	return 0;
+}
+
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
 	unsigned long flags;
@@ -656,8 +672,11 @@ int xen_blkif_schedule(void *arg)
 			ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
 		}
 
-		/* Shrink if we have more than xen_blkif_max_buffer_pages */
-		shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+		/* Shrink the free pages pool if it is too large. */
+		if (time_before(jiffies, xen_blk_buffer_squeeze_end))
+			shrink_free_pagepool(ring, 0);
+		else
+			shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
 		if (log_stats && time_after(jiffies, ring->st_print))
 			print_stats(ring);
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 1d3002d773f7..c0334cda79fe 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -383,6 +383,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id);
 int xen_blkif_schedule(void *arg);
 int xen_blkif_purge_persistent(void *arg);
 void xen_blkbk_free_caches(struct xen_blkif_ring *ring);
+unsigned xen_blkbk_reclaim(struct xenbus_device *dev);
 
 int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt,
 			      struct backend_info *be, int state);
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index b90dbcd99c03..de49a09e6933 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -1115,7 +1115,8 @@ static struct xenbus_driver xen_blkbk_driver = {
 	.ids  = xen_blkbk_ids,
 	.probe = xen_blkbk_probe,
 	.remove = xen_blkbk_remove,
-	.otherend_changed = frontend_changed
+	.otherend_changed = frontend_changed,
+	.reclaim = xen_blkbk_reclaim
 };
 
 int xen_blkif_xenbus_init(void)
--
2.17.1
[Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. Using this facility, 'xenbus' would be able to monitor memory pressure and request the specific devices of the specific backend drivers causing the given pressure to voluntarily release memory. That said, this commit simply requests every driver that registered the callback to release its memory for every domain, rather than issuing the requests only to the drivers and domains in charge. Such selective requests will be done in a future change. Also, this commit focuses on memory only. However, it would be able to be extended for general resources.

Signed-off-by: SeongJae Park
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++
 include/xen/xenbus.h                      |  1 +
 2 files changed, 32 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..5a5ba29e39df 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
 	return NOTIFY_DONE;
 }
 
+static int xenbus_backend_reclaim(struct device *dev, void *data)
+{
+	struct xenbus_driver *drv;
+	if (!dev->driver)
+		return -ENOENT;
+	drv = to_xenbus_driver(dev->driver);
+	if (drv && drv->reclaim)
+		drv->reclaim(to_xenbus_device(dev));
+	return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
+				struct shrink_control *sc)
+{
+	bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+			xenbus_backend_reclaim);
+	return 0;
+}
+
+static struct shrinker xenbus_backend_shrinker = {
+	.count_objects = xenbus_backend_shrink_count,
+	.seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
 	static struct notifier_block xenstore_notifier = {
@@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void)
 
 	register_xenstore_notifier(&xenstore_notifier);
 
+	if (register_shrinker(&xenbus_backend_shrinker))
+		pr_warn("shrinker registration failed\n");
+
 	return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..cdb075e4182f 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
 	struct device_driver driver;
 	int (*read_otherend_details)(struct xenbus_device *dev);
 	int (*is_ready)(struct xenbus_device *dev);
+	unsigned (*reclaim)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
--
2.17.1
[Xen-devel] [PATCH v5 0/2] xenbus/backend: Add a memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this patchset adds a memory reclaim callback to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in 'xen-blkback' (patch 2). Base Version This patch is based on v5.4. A complete tree is also available at my public git repo: https://github.com/sjp38/linux/tree/blkback_squeezing_v5 Patch History - Changes from v4 (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/) - Remove domain id parameter from the callback (suggested by Juergen Gross) Changes from v3 (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/) - Add general callback in xen_driver and use it (suggested by Juergen Gross) Changes from v2 (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com) - Rename the module parameter and variables for brevity (aggressive shrinking -> squeezing) Changes from v1 (https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/) - Adjust the description to not use the term, `arbitrarily` (suggested by Paul Durrant) - Specify time unit of the duration in the parameter description, (suggested by Maximilian Heyne) - Change default aggressive shrinking duration from 1ms to 10ms - Merge two patches into one single patch SeongJae Park (2): xenbus/backend: Add memory pressure handler callback xen/blkback: Squeeze page pools if a memory pressure is detected drivers/block/xen-blkback/blkback.c | 23 +++-- drivers/block/xen-blkback/common.h| 1 + drivers/block/xen-blkback/xenbus.c| 3 ++- drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++ include/xen/xenbus.h | 1 + 5 files changed, 56 insertions(+), 3 
deletions(-) -- 2.17.1
Re: [Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback
On Tue, Dec 10, 2019 at 11:21 AM Roger Pau Monné wrote: > > On Tue, Dec 10, 2019 at 11:16:35AM +0100, Roger Pau Monné wrote: > > On Tue, Dec 10, 2019 at 08:06:27AM +0000, SeongJae Park wrote: > > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > > > index 869c816d5f8c..cdb075e4182f 100644 > > > --- a/include/xen/xenbus.h > > > +++ b/include/xen/xenbus.h > > > @@ -104,6 +104,7 @@ struct xenbus_driver { > > > struct device_driver driver; > > > int (*read_otherend_details)(struct xenbus_device *dev); > > > int (*is_ready)(struct xenbus_device *dev); > > > + unsigned (*reclaim)(struct xenbus_device *dev); > > > > ... hence I wonder why it's returning an unsigned when it's just > > ignored. > > > > IMO it should return an int to signal errors, and the return should be > > ignored. > > Meant to write 'shouldn't be ignored' sorry. Thanks for good opinions and comments! I will apply your comments in the next version. Thanks, SeongJae Park > > Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback
On Tue, 10 Dec 2019 11:16:35 +0100 "Roger Pau Monné" wrote: > > Granting pages consumes backend system memory. In systems configured > > with insufficient spare memory for those pages, it can cause a memory > > pressure situation. However, finding the optimal amount of the spare > > memory is challenging for large systems having dynamic resource > > utilization patterns. Also, such a static configuration might lack a > > s/lack a/lack/ > > > flexibility. > > > > To mitigate such problems, this commit adds a memory reclaim callback to > > 'xenbus_driver'. Using this facility, 'xenbus' would be able to monitor > > a memory pressure and request specific devices of specific backend > > s/monitor a/monitor/ > > > drivers which causing the given pressure to voluntarily release its > > ...which are causing... > > > memory. > > > > That said, this commit simply requests every callback registered driver > > to release its memory for every domain, rather than issueing the > > s/issueing/issuing/ > > > requests to the drivers and the domain in charge. Such things will be > > I'm afraid I don't understand the "domain in charge" part of this > sentence. > > > done in a futur. Also, this commit focuses on memory only. However, it > > ... done in a future change. Also I think the period after only should > be removed in order to tie both sentences together. > > > would be ablt to be extended for general resources. 
> > s/ablt/able/ > > > > > Signed-off-by: SeongJae Park > > --- > > drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++ > > include/xen/xenbus.h | 1 + > > 2 files changed, 32 insertions(+) > > > > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c > > b/drivers/xen/xenbus/xenbus_probe_backend.c > > index b0bed4faf44c..5a5ba29e39df 100644 > > --- a/drivers/xen/xenbus/xenbus_probe_backend.c > > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c > > @@ -248,6 +248,34 @@ static int backend_probe_and_watch(struct > > notifier_block *notifier, > > return NOTIFY_DONE; > > } > > > > +static int xenbus_backend_reclaim(struct device *dev, void *data) > > +{ > > + struct xenbus_driver *drv; > > Newline and const. > > > + if (!dev->driver) > > + return -ENOENT; > > + drv = to_xenbus_driver(dev->driver); > > + if (drv && drv->reclaim) > > + drv->reclaim(to_xenbus_device(dev)); > > You seem to completely ignore the return of the reclaim hook... > > > + return 0; > > +} > > + > > +/* > > + * Returns 0 always because we are using shrinker to only detect memory > > + * pressure. 
> > + */ > > +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker, > > + struct shrink_control *sc) > > +{ > > + bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, > > + xenbus_backend_reclaim); > > + return 0; > > +} > > + > > +static struct shrinker xenbus_backend_shrinker = { > > + .count_objects = xenbus_backend_shrink_count, > > + .seeks = DEFAULT_SEEKS, > > +}; > > + > > static int __init xenbus_probe_backend_init(void) > > { > > static struct notifier_block xenstore_notifier = { > > @@ -264,6 +292,9 @@ static int __init xenbus_probe_backend_init(void) > > > > register_xenstore_notifier(&xenstore_notifier); > > > > + if (register_shrinker(&xenbus_backend_shrinker)) > > + pr_warn("shrinker registration failed\n"); > > + > > return 0; > > } > > subsys_initcall(xenbus_probe_backend_init); > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > > index 869c816d5f8c..cdb075e4182f 100644 > > --- a/include/xen/xenbus.h > > +++ b/include/xen/xenbus.h > > @@ -104,6 +104,7 @@ struct xenbus_driver { > > struct device_driver driver; > > int (*read_otherend_details)(struct xenbus_device *dev); > > int (*is_ready)(struct xenbus_device *dev); > > + unsigned (*reclaim)(struct xenbus_device *dev); > > ... hence I wonder why it's returning an unsigned when it's just > ignored. > > IMO it should return an int to signal errors, and the return should be > ignored. I first thought similarly and set the callback to return something. However, as this callback is called to simply notify the memory pressure and ask the drive
Re: [Xen-devel] [PATCH v5 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected
t; after 2123,325 > > > > Optimal Aggressive Shrinking Duration > > - > > > > To find a best squeezing duration, I repeated the test with three > > different durations (1ms, 10ms, and 100ms). The results are as below: > > > > durationpswpin pswpout > > 1 852 6,424 > > 10 212 3,325 > > 100 203 3,340 > > > > As expected, the memory pressure has decreased as the duration is > > increased, but the reduction stopped from the `10ms`. Based on this > > results, I chose the default duration as 10ms. > > > > Performance Overhead Test > > = > > > > This commit could incur I/O performance degradation under severe memory > > pressure because the squeezing will require more page allocations per > > I/O. To show the overhead, I artificially made a worst-case squeezing > > situation and measured the I/O performance of a `blkfront` running > > guest. > > > > For the artificial squeezing, I set the `blkback.max_buffer_pages` using > > the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. We set > > the value to `1024` and `0`. The `1024` is the default value. Setting > > the value as `0` is same to a situation doing the squeezing always > > (worst-case). > > > > For the I/O performance measurement, I use a simple `dd` command. 
> > > > Default Performance > > --- > > > > [dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages > > [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k > > count=$((256*512)); sync; done > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 13.8702 s, 38.7 MB/s > > > > Worst-case Performance > > -- > > > > [dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages > > [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k > > count=$((256*512)); sync; done > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s > > 131072+0 records in > > 131072+0 records out > > 536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s > > > > In short, even worst case squeezing makes no visible performance > > degradation. > > I would argue that with a ~40MB/s throughput you won't see any > performance difference at all regardless of the size of the pool of > free pages or the amount of persistent grants because the bottleneck is > on the storage performance itself. 
> > You need to test this using nullblk or some kind of fast storage, or > else the above figures are not going to reflect any changes you make > because they are hidden by the poor performance of the underlying > storage. Yes, agree that. My test is just a minimal check for my environment. I will note the points and concerns in the commit message. > > > I think this is due to the slow speed of the I/O. In > > other words, the additional page allocation overhead is hidden under the > > much slower I/O latency. > > > > Nevertheless, pleaset note that this is just a very simple and minimal > > test. > > I would like to add that IMO this is papering over an existing issue, > which is how pages to be used to map grants are allocated. Grant > mappings _shouldn't_ consume RAM pages in the first place, and IIRC > the fact that they do is because Linux balloons out memory in order to > re-use those pages to map grants and have a valid page struct. > > A way to solve this would be to hot
[Xen-devel] [PATCH v6 0/2] xenbus/backend: Add a memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this patchset adds a memory reclaim callback to 'xenbus_driver' (patch 1) and use it to mitigate the problem in 'xen-blkback' (patch 2). The third patch is a trivial cleanup of variable names. Base Version This patch is based on v5.4. A complete tree is also available at my public git repo: https://github.com/sjp38/linux/tree/blkback_squeezing_v6 Patch History - Changes from v5 (https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/) - Wordsmith the commit messages (suggested by Roger Pau Monné) - Change the reclaim callback return type (suggested by Roger Pau Monné) - Change the type of the blkback squeeze duration variable (suggested by Roger Pau Monné) - Add a patch for removal of unnecessary static variable name prefixes (suggested by Roger Pau Monné) - Fix checkpatch.pl warnings Changes from v4 (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/) - Remove domain id parameter from the callback (suggested by Juergen Gross) - Rename xen-blkback module parameter (suggested by Stefan Nuernburger) Changes from v3 (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/) - Add general callback in xen_driver and use it (suggested by Juergen Gross) Changes from v2 (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com) - Rename the module parameter and variables for brevity (aggressive shrinking -> squeezing) Changes from v1 (https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/) - Adjust the description to not use the term, `arbitrarily` (suggested by Paul Durrant) - Specify time 
unit of the duration in the parameter description, (suggested by Maximilian Heyne) - Change default aggressive shrinking duration from 1ms to 10ms - Merge two patches into one single patch SeongJae Park (2): xenbus/backend: Add memory pressure handler callback xen/blkback: Squeeze page pools if a memory pressure is detected drivers/block/xen-blkback/blkback.c | 23 +++-- drivers/block/xen-blkback/common.h| 1 + drivers/block/xen-blkback/xenbus.c| 3 ++- drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++ include/xen/xenbus.h | 1 + 5 files changed, 56 insertions(+), 3 deletions(-) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v6 1/3] xenbus/backend: Add memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. If a memory pressure is detected, 'xenbus' requests every backend driver to volunarily release its memory. Note that it would be able to improve the callback facility for more sophisticated handlings of general pressures. For example, it would be possible to monitor the memory consumption of each device and issue the release requests to only devices which causing the pressure. Also, the callback could be extended to handle not only memory, but general resources. Nevertheless, this version of the implementation defers such sophisticated goals as a future work. Reviewed-by: Juergen Gross Signed-off-by: SeongJae Park --- drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ include/xen/xenbus.h | 1 + 2 files changed, 33 insertions(+) diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c index b0bed4faf44c..aedbe2198de5 100644 --- a/drivers/xen/xenbus/xenbus_probe_backend.c +++ b/drivers/xen/xenbus/xenbus_probe_backend.c @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier, return NOTIFY_DONE; } +static int xenbus_backend_reclaim(struct device *dev, void *data) +{ + struct xenbus_driver *drv; + + if (!dev->driver) + return 0; + drv = to_xenbus_driver(dev->driver); + if (drv && drv->reclaim) + drv->reclaim(to_xenbus_device(dev)); + return 0; +} + +/* + * Returns 0 always because we are using shrinker to only detect memory + * pressure. 
+ */ +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker, + struct shrink_control *sc) +{ + bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, + xenbus_backend_reclaim); + return 0; +} + +static struct shrinker xenbus_backend_shrinker = { + .count_objects = xenbus_backend_shrink_count, + .seeks = DEFAULT_SEEKS, +}; + static int __init xenbus_probe_backend_init(void) { static struct notifier_block xenstore_notifier = { @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void) register_xenstore_notifier(&xenstore_notifier); + if (register_shrinker(&xenbus_backend_shrinker)) + pr_warn("shrinker registration failed\n"); + return 0; } subsys_initcall(xenbus_probe_backend_init); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 869c816d5f8c..196260017666 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -104,6 +104,7 @@ struct xenbus_driver { struct device_driver driver; int (*read_otherend_details)(struct xenbus_device *dev); int (*is_ready)(struct xenbus_device *dev); + void (*reclaim)(struct xenbus_device *dev); }; static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v6 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected
bytes (537 MB) copied, 13.8737 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8702 s, 38.7 MB/s Worst-case Performance -- [dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages [instance]$ for i in {1..5}; do dd if=/dev/zero of=file \ bs=4k count=$((256*512)); sync; done 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s 131072+0 records in 131072+0 records out 536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s In short, even worst-case squeezing causes no visible performance degradation on this test machine. I think this is due to the slow speed of the I/O devices I used. In other words, the additional page allocation overhead is hidden under the much slower I/O latency. Nevertheless, please note that this is just a very simple and minimal test using a slow block device. On systems using fast block devices such as ramdisks or NVMe SSDs, the results could be very different. In such cases, you should control the squeezing duration via the module parameter. 
Reviewed-by: Juergen Gross Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 22 -- drivers/block/xen-blkback/common.h | 1 + drivers/block/xen-blkback/xenbus.c | 3 ++- 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index fd1e19f1a49f..b493c306e84f 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) HZ * xen_blkif_pgrant_timeout); } +/* Once a memory pressure is detected, squeeze free page pools for a while. */ +static unsigned int buffer_squeeze_duration_ms = 10; +module_param_named(buffer_squeeze_duration_ms, + buffer_squeeze_duration_ms, int, 0644); +MODULE_PARM_DESC(buffer_squeeze_duration_ms, +"Duration in ms to squeeze pages buffer when a memory pressure is detected"); + +static unsigned long buffer_squeeze_end; + +void xen_blkbk_reclaim(struct xenbus_device *dev) +{ + buffer_squeeze_end = jiffies + + msecs_to_jiffies(buffer_squeeze_duration_ms); +} + static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page) { unsigned long flags; @@ -656,8 +671,11 @@ int xen_blkif_schedule(void *arg) ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL); } - /* Shrink if we have more than xen_blkif_max_buffer_pages */ - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + /* Shrink the free pages pool if it is too large. 
*/ + if (time_before(jiffies, buffer_squeeze_end)) + shrink_free_pagepool(ring, 0); + else + shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 1d3002d773f7..8a3195d2dca7 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -383,6 +383,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id); int xen_blkif_schedule(void *arg); int xen_blkif_purge_persistent(void *arg); void xen_blkbk_free_caches(struct xen_blkif_ring *ring); +void xen_blkbk_reclaim(struct xenbus_device *dev); int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, struct backend_info *be, int state); diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index b90dbcd99c03..b596c6e8b006 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -1115,7 +1115,8 @@ static struct xenbus_driver xen_blkbk_driver = { .ids = xen_blkbk_ids, .probe = xen_blkbk_probe, .remove = xen_blkbk_remove, - .otherend_changed = frontend_changed + .otherend_changed = frontend_changed, + .reclaim = xen_blkbk_reclaim, }; int xen_blkif_xenbus_init(void) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v6 3/3] xen/blkback: Remove unnecessary static variable name prefixes
A few of static variables in blkback have 'xen_blkif_' prefix, though it is unnecessary for static variables. This commit removes such prefixes. Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index b493c306e84f..f690373669b8 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -62,8 +62,8 @@ * IO workloads. */ -static int xen_blkif_max_buffer_pages = 1024; -module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644); +static int max_buffer_pages = 1024; +module_param_named(max_buffer_pages, max_buffer_pages, int, 0644); MODULE_PARM_DESC(max_buffer_pages, "Maximum number of free pages to keep in each block backend buffer"); @@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages, * algorithm. */ -static int xen_blkif_max_pgrants = 1056; -module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644); +static int max_pgrants = 1056; +module_param_named(max_persistent_grants, max_pgrants, int, 0644); MODULE_PARM_DESC(max_persistent_grants, "Maximum number of grants to map persistently"); @@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants, * use. The time is in seconds, 0 means indefinitely long. 
*/ -static unsigned int xen_blkif_pgrant_timeout = 60; -module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout, +static unsigned int pgrant_timeout = 60; +module_param_named(persistent_grant_unused_seconds, pgrant_timeout, uint, 0644); MODULE_PARM_DESC(persistent_grant_unused_seconds, "Time in seconds an unused persistent grant is allowed to " @@ -137,9 +137,8 @@ module_param(log_stats, int, 0644); static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) { - return xen_blkif_pgrant_timeout && - (jiffies - persistent_gnt->last_used >= - HZ * xen_blkif_pgrant_timeout); + return pgrant_timeout && (jiffies - persistent_gnt->last_used >= + HZ * pgrant_timeout); } /* Once a memory pressure is detected, squeeze free page pools for a while. */ @@ -249,7 +248,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring, struct persistent_gnt *this; struct xen_blkif *blkif = ring->blkif; - if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) { + if (ring->persistent_gnt_c >= max_pgrants) { if (!blkif->vbd.overflow_max_grants) blkif->vbd.overflow_max_grants = 1; return -EBUSY; @@ -412,14 +411,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring) goto out; } - if (ring->persistent_gnt_c < xen_blkif_max_pgrants || - (ring->persistent_gnt_c == xen_blkif_max_pgrants && + if (ring->persistent_gnt_c < max_pgrants || + (ring->persistent_gnt_c == max_pgrants && !ring->blkif->vbd.overflow_max_grants)) { num_clean = 0; } else { - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN; - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + - num_clean; + num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN; + num_clean = ring->persistent_gnt_c - max_pgrants + num_clean; num_clean = min(ring->persistent_gnt_c, num_clean); pr_debug("Going to purge at least %u persistent grants\n", num_clean); @@ -614,8 +612,7 @@ static void print_stats(struct xen_blkif_ring *ring) current->comm, ring->st_oo_req, ring->st_rd_req, 
ring->st_wr_req, ring->st_f_req, ring->st_ds_req, -ring->persistent_gnt_c, -xen_blkif_max_pgrants); +ring->persistent_gnt_c, max_pgrants); ring->st_print = jiffies + msecs_to_jiffies(10 * 1000); ring->st_rd_req = 0; ring->st_wr_req = 0; @@ -675,7 +672,7 @@ int xen_blkif_schedule(void *arg) if (time_before(jiffies, buffer_squeeze_end)) shrink_free_pagepool(ring, 0); else - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + shrink_free_pagepool(ring, max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); @@ -902,7 +899,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring, conti
Re: [Xen-devel] [PATCH v5 1/2] xenbus/backend: Add memory pressure handler callback
On Wed, 11 Dec 2019 11:51:12 +0100 "Roger Pau Monné" wrote: > > On Tue, 10 Dec 2019 11:16:35 +0100 "Roger Pau Monné" > > wrote: > > > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > > > > index 869c816d5f8c..cdb075e4182f 100644 > > > > --- a/include/xen/xenbus.h > > > > +++ b/include/xen/xenbus.h > > > > @@ -104,6 +104,7 @@ struct xenbus_driver { > > > > struct device_driver driver; > > > > int (*read_otherend_details)(struct xenbus_device *dev); > > > > int (*is_ready)(struct xenbus_device *dev); > > > > + unsigned (*reclaim)(struct xenbus_device *dev); > > > > > > ... hence I wonder why it's returning an unsigned when it's just > > > ignored. > > > > > > IMO it should return an int to signal errors, and the return should be > > > ignored. > > > > I first thought similarly and set the callback to return something. > > However, > > as this callback is called to simply notify the memory pressure and ask the > > driver to free its memory as many as possible, I couldn't easily imagine > > what > > kind of errors that need to be handled by its caller can occur in the > > callback, > > especially because current blkback's callback implementation has no such > > error. > > So, if you and others agree, I would like to simply set the return type to > > 'void' for now and defer the error handling to a future change. > > Yes, I also wondered the same, but seeing you returned an integer I > assumed there was interest in returning some kind of value. If there's > nothing to return let's just make it void. > > > > > > > Also, I think it would preferable for this function to take an extra > > > parameter to describe the resource the driver should attempt to free > > > (ie: memory or interrupts for example). I'm however not able to find > > > any existing Linux type to describe such resources. > > > > Yes, such extention would be the right direction. 
However, because there > > is no existing Linux type to describe the type of resources to reclaim as you also > > mentioned, there could be many different opinions about its implementation > > detail. In my opinion, it could be also possible to simply add another > > callback for another resource type. That said, because currently we have a > > use case and an implementation for the memory pressure only, I would like to > > leave it as is for now and defer the extension as future work, if you and > > others have no objection. > > Ack, can I please ask the callback to be named reclaim_memory or some > such then? Yes, I will change the name. Thanks, SeongJae Park > > Thanks, Roger.
Re: [Xen-devel] [PATCH v5 2/2] xen/blkback: Squeeze page pools if a memory pressure is detected
On Wed, 11 Dec 2019 12:14:44 +0100 "Roger Pau Monné" wrote: > > I see that you have already sent v6, for future iterations can you > please wait until the conversation on the previous version has been > settled? > > I'm still replying to your replies to v5, and hence you should hold off > sending v6 until we get some kind of conclusion/agreement. Sorry, I was inpatient. > > On Wed, Dec 11, 2019 at 05:08:12AM +0100, SeongJae Park wrote: > > On Tue, 10 Dec 2019 12:04:32 +0100 "Roger Pau Monné" > > wrote: > > > > > > Each `blkif` has a free pages pool for the grant mapping. The size of > > > > the pool starts from zero and be increased on demand while processing > > > > the I/O requests. If current I/O requests handling is finished or 100 > > > > milliseconds has passed since last I/O requests handling, it checks and > > > > shrinks the pool to not exceed the size limit, `max_buffer_pages`. > > > > > > > > Therefore, `blkfront` running guests can cause a memory pressure in the > > > > `blkback` running guest by attaching a large number of block devices and > > > > inducing I/O. > > > > > > Hm, I don't think this is actually true. blkfront cannot attach an > > > arbitrary number of devices, blkfront is just a frontend for a device > > > that's instantiated by the Xen toolstack, so it's the toolstack the one > > > that controls the amount of PV block devices. > > > > Right, the problem can occur only if it is mis-configured so that the > > frontend > > running guests can attach a large number of devices which is enough to cause > > the memory pressure. I tried to explain it in below paragraph, but seems > > above > > paragraph is a little bit confusing. I will wordsmith the sentence in the > > next > > version. > > I would word it along these lines: > > "Host administrators can cause memory pressure in blkback by attaching > a large number of block devices and inducing I/O." 
Hmm, much better :) > > > > > > > > System administrators can avoid such problematic > > > > situations by limiting the maximum number of devices each guest can > > > > attach. However, finding the optimal limit is not so easy. Improper > > > > set of the limit can results in the memory pressure or a resource > > > > underutilization. This commit avoids such problematic situations by > > > > squeezing the pools (returns every free page in the pool to the system) > > > > for a while (users can set this duration via a module parameter) if a > > > > memory pressure is detected. > > > > > > > > Discussions > > > > === > > > > > > > > The `blkback`'s original shrinking mechanism returns only pages in the > > > > pool, which are not currently be used by `blkback`, to the system. In > > > > other words, the pages are not mapped with foreign pages. Because this > > > ^ that ^ granted > > > > commit is changing only the shrink limit but uses the mechanism as is, > > > > this commit does not introduce improper mappings related security > > > > issues. > > > > > > That last sentence is hard to parse. I think something like: > > > > > > "Because this commit is changing only the shrink limit but still uses the > > > same freeing mechanism it does not touch pages which are currently > > > mapping grants." > > > > > > > > > > > Once a memory pressure is detected, this commit keeps the squeezing > > > > limit for a user-specified time duration. The duration should be > > > > neither too long nor too short. If it is too long, the squeezing > > > > incurring overhead can reduce the I/O performance. If it is too short, > > > > `blkback` will not free enough pages to reduce the memory pressure. > > > > This commit sets the value as `10 milliseconds` by default because it is > > > > a short time in terms of I/O while it is a long time in terms of memory > > > > operations. 
Also, as the original shrinking mechanism works for at > > > > least every 100 milliseconds, this could be a somewhat reasonable > > > > choice. I also tested other durations (refer to the below section for > > > > more details) and confirmed that 10 milliseconds is the one that works > > > > best with the test. That said, the pr
Re: [Xen-devel] [PATCH v6 1/3] xenbus/backend: Add memory pressure handler callback
On Wed, 11 Dec 2019 12:46:51 +0100 "Roger Pau Monné" wrote: > > Granting pages consumes backend system memory. In systems configured > > with insufficient spare memory for those pages, it can cause a memory > > pressure situation. However, finding the optimal amount of the spare > ^ s/the// > > memory is challenging for large systems having dynamic resource > > utilization patterns. Also, such a static configuration might lack > > flexibility. > > > > To mitigate such problems, this commit adds a memory reclaim callback to > > 'xenbus_driver'. If a memory pressure is detected, 'xenbus' requests >^ s/a// > > every backend driver to volunarily release its memory. > > > > Note that it would be able to improve the callback facility for more > ^ possible > > sophisticated handlings of general pressures. For example, it would be > ^ handling of resource starvation. > > possible to monitor the memory consumption of each device and issue the > > release requests to only devices which causing the pressure. Also, the > > callback could be extended to handle not only memory, but general > > resources. Nevertheless, this version of the implementation defers such > > sophisticated goals as a future work. > > > > Reviewed-by: Juergen Gross > > Signed-off-by: SeongJae Park > > --- > > drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ > > include/xen/xenbus.h | 1 + > > 2 files changed, 33 insertions(+) > > > > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c > > b/drivers/xen/xenbus/xenbus_probe_backend.c > > index b0bed4faf44c..aedbe2198de5 100644 > > --- a/drivers/xen/xenbus/xenbus_probe_backend.c > > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c > > @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct > > notifier_block *notifier, > > return NOTIFY_DONE; > > } > > > > +static int xenbus_backend_reclaim(struct device *dev, void *data) > > No need for the xenbus_ prefix since it's a static function, ie: > backend_reclaim_memory should be fine IMO. 
Agreed, will change the name in the next version. > > > +{ > > + struct xenbus_driver *drv; > > I've asked for this variable to be constified in v5, is it not > possible to make it const? Sorry, my mistake... I was definitely in too much of a hurry. > > > + > > + if (!dev->driver) > > + return 0; > > + drv = to_xenbus_driver(dev->driver); > > + if (drv && drv->reclaim) > > + drv->reclaim(to_xenbus_device(dev)); > > + return 0; > > +} > > + > > +/* > > + * Returns 0 always because we are using shrinker to only detect memory > > + * pressure. > > + */ > > +static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker, > > + struct shrink_control *sc) > > +{ > > + bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, > > + xenbus_backend_reclaim); > > + return 0; > > +} > > + > > +static struct shrinker xenbus_backend_shrinker = { > > I would drop the xenbus prefix, and I think it's not possible to > constify this due to register_shrinker expecting a non-const > parameter? Yes, constifying it results in another compile warning. Will drop the prefix. > > > + .count_objects = xenbus_backend_shrink_count, > > + .seeks = DEFAULT_SEEKS, > > +}; > > + > > static int __init xenbus_probe_backend_init(void) > > { > > static struct notifier_block xenstore_notifier = { > > @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void) > > > > register_xenstore_notifier(&xenstore_notifier); > > > > + if (register_shrinker(&xenbus_backend_shrinker)) > > + pr_warn("shrinker registration failed\n"); > > Can you add a xenbus prefix to the error message? Or else it's hard to > know which subsystem is complaining when you see such message on the > log. ie: "xenbus: shrinker ..." Because we have #define `pr_fmt(fmt) KBUILD_MODNAME ": " fmt` at the beginning of the file, the message will have a proper prefix. 
> > > + > > return 0; > > } > > subsys_initcall(xenbus_probe_backend_init); > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > > index 869c816d5f8c..196260017666 100644 > > --- a/include/xen/xenbus.h > > +++ b/include/xen/xenbus.h > > @@ -104,6 +104,7 @@ struct xenbus_driver { > > struct device_driver driver; > > int (*read_otherend_details)(struct xenbus_device *dev); > > int (*is_ready)(struct xenbus_device *dev); > > + void (*reclaim)(struct xenbus_device *dev); > > reclaim_memory (if Juergen agrees). Okay. Thanks, SeongJae Park > > Thanks, Roger. > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected
5 1024 38.7 45.8 38.7 40.12 3.1752165 No difference proven at 95.0% confidence On the fast block device max_pgs Min Max Median AvgStddev 0 417 423 420419.4 2.5099801 1024 414 425 416417.8 4.4384682 No difference proven at 95.0% confidence In short, even worst case squeezing on ramdisk based fast block device makes no visible performance degradation. Please note that this is just a very simple and minimal test. On systems using super-fast block devices and a special I/O workload, the results might be different. If you have any doubt, test on your machine for your workload to find the optimal squeezing duration for you. [1] https://aws.amazon.com/ebs/ [2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html Reviewed-by: Juergen Gross Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 22 -- drivers/block/xen-blkback/common.h | 1 + drivers/block/xen-blkback/xenbus.c | 3 ++- 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index fd1e19f1a49f..98823d150905 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) HZ * xen_blkif_pgrant_timeout); } +/* Once a memory pressure is detected, squeeze free page pools for a while. 
*/ +static unsigned int buffer_squeeze_duration_ms = 10; +module_param_named(buffer_squeeze_duration_ms, + buffer_squeeze_duration_ms, int, 0644); +MODULE_PARM_DESC(buffer_squeeze_duration_ms, +"Duration in ms to squeeze pages buffer when a memory pressure is detected"); + +static unsigned long buffer_squeeze_end; + +void xen_blkbk_reclaim_memory(struct xenbus_device *dev) +{ + buffer_squeeze_end = jiffies + + msecs_to_jiffies(buffer_squeeze_duration_ms); +} + static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page) { unsigned long flags; @@ -656,8 +671,11 @@ int xen_blkif_schedule(void *arg) ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL); } - /* Shrink if we have more than xen_blkif_max_buffer_pages */ - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + /* Shrink the free pages pool if it is too large. */ + if (time_before(jiffies, buffer_squeeze_end)) + shrink_free_pagepool(ring, 0); + else + shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 1d3002d773f7..1e0df86cb941 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -383,6 +383,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id); int xen_blkif_schedule(void *arg); int xen_blkif_purge_persistent(void *arg); void xen_blkbk_free_caches(struct xen_blkif_ring *ring); +void xen_blkbk_reclaim_memory(struct xenbus_device *dev); int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, struct backend_info *be, int state); diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index b90dbcd99c03..0477f910b018 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -1115,7 +1115,8 @@ static struct xenbus_driver xen_blkbk_driver = { .ids = xen_blkbk_ids, .probe = xen_blkbk_probe, .remove = 
xen_blkbk_remove, - .otherend_changed = frontend_changed + .otherend_changed = frontend_changed, + .reclaim_memory = xen_blkbk_reclaim_memory, }; int xen_blkif_xenbus_init(void) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 3/3] xen/blkback: Remove unnecessary static variable name prefixes
A few of static variables in blkback have 'xen_blkif_' prefix, though it is unnecessary for static variables. This commit removes such prefixes. Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 98823d150905..f41c698dd854 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -62,8 +62,8 @@ * IO workloads. */ -static int xen_blkif_max_buffer_pages = 1024; -module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644); +static int max_buffer_pages = 1024; +module_param_named(max_buffer_pages, max_buffer_pages, int, 0644); MODULE_PARM_DESC(max_buffer_pages, "Maximum number of free pages to keep in each block backend buffer"); @@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages, * algorithm. */ -static int xen_blkif_max_pgrants = 1056; -module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644); +static int max_pgrants = 1056; +module_param_named(max_persistent_grants, max_pgrants, int, 0644); MODULE_PARM_DESC(max_persistent_grants, "Maximum number of grants to map persistently"); @@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants, * use. The time is in seconds, 0 means indefinitely long. 
*/ -static unsigned int xen_blkif_pgrant_timeout = 60; -module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout, +static unsigned int pgrant_timeout = 60; +module_param_named(persistent_grant_unused_seconds, pgrant_timeout, uint, 0644); MODULE_PARM_DESC(persistent_grant_unused_seconds, "Time in seconds an unused persistent grant is allowed to " @@ -137,9 +137,8 @@ module_param(log_stats, int, 0644); static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) { - return xen_blkif_pgrant_timeout && - (jiffies - persistent_gnt->last_used >= - HZ * xen_blkif_pgrant_timeout); + return pgrant_timeout && (jiffies - persistent_gnt->last_used >= + HZ * pgrant_timeout); } /* Once a memory pressure is detected, squeeze free page pools for a while. */ @@ -249,7 +248,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring, struct persistent_gnt *this; struct xen_blkif *blkif = ring->blkif; - if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) { + if (ring->persistent_gnt_c >= max_pgrants) { if (!blkif->vbd.overflow_max_grants) blkif->vbd.overflow_max_grants = 1; return -EBUSY; @@ -412,14 +411,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring) goto out; } - if (ring->persistent_gnt_c < xen_blkif_max_pgrants || - (ring->persistent_gnt_c == xen_blkif_max_pgrants && + if (ring->persistent_gnt_c < max_pgrants || + (ring->persistent_gnt_c == max_pgrants && !ring->blkif->vbd.overflow_max_grants)) { num_clean = 0; } else { - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN; - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + - num_clean; + num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN; + num_clean = ring->persistent_gnt_c - max_pgrants + num_clean; num_clean = min(ring->persistent_gnt_c, num_clean); pr_debug("Going to purge at least %u persistent grants\n", num_clean); @@ -614,8 +612,7 @@ static void print_stats(struct xen_blkif_ring *ring) current->comm, ring->st_oo_req, ring->st_rd_req, 
ring->st_wr_req, ring->st_f_req, ring->st_ds_req, -ring->persistent_gnt_c, -xen_blkif_max_pgrants); +ring->persistent_gnt_c, max_pgrants); ring->st_print = jiffies + msecs_to_jiffies(10 * 1000); ring->st_rd_req = 0; ring->st_wr_req = 0; @@ -675,7 +672,7 @@ int xen_blkif_schedule(void *arg) if (time_before(jiffies, buffer_squeeze_end)) shrink_free_pagepool(ring, 0); else - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + shrink_free_pagepool(ring, max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); @@ -902,7 +899,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring, conti
[Xen-devel] [PATCH v7 0/2] xenbus/backend: Add a memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this patchset adds a memory reclaim callback to 'xenbus_driver' (patch 1) and use it to mitigate the problem in 'xen-blkback' (patch 2). The third patch is a trivial cleanup of variable names. Base Version This patch is based on v5.4. A complete tree is also available at my public git repo: https://github.com/sjp38/linux/tree/blkback_squeezing_v7 Patch History - Changes from v6 (https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/) - Remove more unnecessary prefixes (suggested by Roger Pau Monné) - Constify a variable (suggested by Roger Pau Monné) - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné) - More wordsmith of the commit message (suggested by Roger Pau Monné) Changes from v5 (https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/) - Wordsmith the commit messages (suggested by Roger Pau Monné) - Change the reclaim callback return type (suggested by Roger Pau Monné) - Change the type of the blkback squeeze duration variable (suggested by Roger Pau Monné) - Add a patch for removal of unnecessary static variable name prefixes (suggested by Roger Pau Monné) - Fix checkpatch.pl warnings Changes from v4 (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/) - Remove domain id parameter from the callback (suggested by Juergen Gross) - Rename xen-blkback module parameter (suggested by Stefan Nuernburger) Changes from v3 (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/) - Add general callback in xen_driver and use it (suggested by Juergen Gross) Changes from v2 
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com) - Rename the module parameter and variables for brevity (aggressive shrinking -> squeezing) Changes from v1 (https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/) - Adjust the description to not use the term, `arbitrarily` (suggested by Paul Durrant) - Specify time unit of the duration in the parameter description, (suggested by Maximilian Heyne) - Change default aggressive shrinking duration from 1ms to 10ms - Merge two patches into one single patch SeongJae Park (2): xenbus/backend: Add memory pressure handler callback xen/blkback: Squeeze page pools if a memory pressure is detected drivers/block/xen-blkback/blkback.c | 23 +++-- drivers/block/xen-blkback/common.h| 1 + drivers/block/xen-blkback/xenbus.c| 3 ++- drivers/xen/xenbus/xenbus_probe_backend.c | 31 +++ include/xen/xenbus.h | 1 + 5 files changed, 56 insertions(+), 3 deletions(-) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v7 1/3] xenbus/backend: Add memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. If a memory pressure is detected, 'xenbus' requests every backend driver to volunarily release its memory. Note that it would be able to improve the callback facility for more sophisticated handlings of general pressures. For example, it would be possible to monitor the memory consumption of each device and issue the release requests to only devices which causing the pressure. Also, the callback could be extended to handle not only memory, but general resources. Nevertheless, this version of the implementation defers such sophisticated goals as a future work. Reviewed-by: Juergen Gross Signed-off-by: SeongJae Park --- drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ include/xen/xenbus.h | 1 + 2 files changed, 33 insertions(+) diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c index b0bed4faf44c..7e78ebef7c54 100644 --- a/drivers/xen/xenbus/xenbus_probe_backend.c +++ b/drivers/xen/xenbus/xenbus_probe_backend.c @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier, return NOTIFY_DONE; } +static int backend_reclaim_memory(struct device *dev, void *data) +{ + const struct xenbus_driver *drv; + + if (!dev->driver) + return 0; + drv = to_xenbus_driver(dev->driver); + if (drv && drv->reclaim_memory) + drv->reclaim_memory(to_xenbus_device(dev)); + return 0; +} + +/* + * Returns 0 always because we are using shrinker to only detect memory + * pressure. 
+ */ +static unsigned long backend_shrink_memory_count(struct shrinker *shrinker, + struct shrink_control *sc) +{ + bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, + backend_reclaim_memory); + return 0; +} + +static struct shrinker backend_memory_shrinker = { + .count_objects = backend_shrink_memory_count, + .seeks = DEFAULT_SEEKS, +}; + static int __init xenbus_probe_backend_init(void) { static struct notifier_block xenstore_notifier = { @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void) register_xenstore_notifier(&xenstore_notifier); + if (register_shrinker(&backend_memory_shrinker)) + pr_warn("shrinker registration failed\n"); + return 0; } subsys_initcall(xenbus_probe_backend_init); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 869c816d5f8c..c861cfb6f720 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -104,6 +104,7 @@ struct xenbus_driver { struct device_driver driver; int (*read_otherend_details)(struct xenbus_device *dev); int (*is_ready)(struct xenbus_device *dev); + void (*reclaim_memory)(struct xenbus_device *dev); }; static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected
On Thu, 12 Dec 2019 12:42:47 +0100 "Roger Pau Monné" wrote: > > Please make sure you Cc me in blkback related patches. Sorry for forgetting you! I will not forget again. > > On Wed, Dec 11, 2019 at 06:10:15PM +, SeongJae Park wrote: > > Each `blkif` has a free pages pool for the grant mapping. The size of > > the pool starts from zero and be increased on demand while processing > ^ is > > the I/O requests. If current I/O requests handling is finished or 100 > > milliseconds has passed since last I/O requests handling, it checks and > > shrinks the pool to not exceed the size limit, `max_buffer_pages`. > > > > Therefore, host administrators can cause memory pressure in blkback by > > attaching a large number of block devices and inducing I/O. Such > > problematic situations can be avoided by limiting the maximum number of > > devices that can be attached, but finding the optimal limit is not so > > easy. Improper set of the limit can results in the memory pressure or a > ^ s/the// > > resource underutilization. This commit avoids such problematic > > situations by squeezing the pools (returns every free page in the pool > > to the system) for a while (users can set this duration via a module > > parameter) if a memory pressure is detected. > ^ s/a// > > > > Discussions > > === > > > > The `blkback`'s original shrinking mechanism returns only pages in the > > pool, which are not currently be used by `blkback`, to the system. In > > I think you can remove both comas in the above sentence. > > > other words, the pages that are not mapped with granted pages. Because > > this commit is changing only the shrink limit but still uses the same > > freeing mechanism it does not touch pages which are currently mapping > > grants. > > > > Once a memory pressure is detected, this commit keeps the squeezing >^ s/a// Thank you for the corrections, I will apply them! > > limit for a user-specified time duration. The duration should be > > neither too long nor too short. 
If it is too long, the squeezing > > incurring overhead can reduce the I/O performance. If it is too short, > > `blkback` will not free enough pages to reduce the memory pressure. > > This commit sets the value as `10 milliseconds` by default because it is > > a short time in terms of I/O while it is a long time in terms of memory > > operations. Also, as the original shrinking mechanism works for at > > least every 100 milliseconds, this could be a somewhat reasonable > > choice. I also tested other durations (refer to the below section for > > more details) and confirmed that 10 milliseconds is the one that works > > best with the test. That said, the proper duration depends on actual > > configurations and workloads. That's why this commit allows users to > > set the duration as a module parameter. > > > > Memory Pressure Test > > > > > > To show how this commit fixes the memory pressure situation well, I > > configured a test environment on a xen-running virtualization system. > > On the `blkfront` running guest instances, I attach a large number of > > network-backed volume devices and induce I/O to those. Meanwhile, I > > measure the number of pages that swapped in (pswpin) and out (pswpout) > > on the `blkback` running guest. The test ran twice, once for the > > `blkback` before this commit and once for that after this commit. As > > shown below, this commit has dramatically reduced the memory pressure: > > > > pswpin pswpout > > before 76,672 185,799 > > after 2123,325 > > > > Optimal Aggressive Shrinking Duration > > - > > > > To find a best squeezing duration, I repeated the test with three > > different durations (1ms, 10ms, and 100ms). The results are as below: > > > > durationpswpin pswpout > > 1 852 6,424 > > 10 212 3,325 > > 100 203 3,340 > > > > As expected, the memory pressure has decreased as the duration is > > increased, but the reduction stopped from the `10ms`. Based on this > > results, I chose the default duration as 10ms. 
> > > > Performance Overhead Test > > = > > > > This commit could incur I/O performance degradation under severe memory > > pressure because the squeezing will require m
Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected
On Thu, 12 Dec 2019 16:27:57 +0100 "Roger Pau Monné" wrote: > > diff --git a/drivers/block/xen-blkback/blkback.c > > b/drivers/block/xen-blkback/blkback.c > > index fd1e19f1a49f..98823d150905 100644 > > --- a/drivers/block/xen-blkback/blkback.c > > +++ b/drivers/block/xen-blkback/blkback.c > > @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct > > persistent_gnt *persistent_gnt) > > HZ * xen_blkif_pgrant_timeout); > > } > > > > +/* Once a memory pressure is detected, squeeze free page pools for a > > while. */ > > +static unsigned int buffer_squeeze_duration_ms = 10; > > +module_param_named(buffer_squeeze_duration_ms, > > + buffer_squeeze_duration_ms, int, 0644); > > +MODULE_PARM_DESC(buffer_squeeze_duration_ms, > > +"Duration in ms to squeeze pages buffer when a memory pressure is > > detected"); > > + > > +static unsigned long buffer_squeeze_end; > > + > > +void xen_blkbk_reclaim_memory(struct xenbus_device *dev) > > +{ > > + buffer_squeeze_end = jiffies + > > + msecs_to_jiffies(buffer_squeeze_duration_ms); > > I'm not sure this is fully correct. This function will be called for > each blkback instance, but the timeout is stored in a global variable > that's shared between all blkback instances. Shouldn't this timeout be > stored in xen_blkif so each instance has it's own local variable? > > Or else in the case you have 1k blkback instances the timeout is > certainly going to be longer than expected, because each call to > xen_blkbk_reclaim_memory will move it forward. Agreed. I think the extended timeout would not make a visible performance difference, though, because the time a 1k-iteration loop takes would be short enough to be ignored compared to the millisecond-scale duration. I took this approach because I wanted to minimize structural changes as far as I could, as this is just a point fix rather than an ultimate solution. That said, it is not fully correct and is confusing. Another colleague of mine also pointed this out in internal review. 
The correct solution would be adding a variable in the struct as you suggested, or avoiding the duplicated update of the variable by reinitializing it once the squeezing duration passes. I would prefer the latter way, as it is more straightforward and still introduces no structural change. For example, it might look like below: diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index f41c698dd854..6856c8ef88de 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -152,8 +152,9 @@ static unsigned long buffer_squeeze_end; void xen_blkbk_reclaim_memory(struct xenbus_device *dev) { - buffer_squeeze_end = jiffies + - msecs_to_jiffies(buffer_squeeze_duration_ms); + if (!buffer_squeeze_end) + buffer_squeeze_end = jiffies + + msecs_to_jiffies(buffer_squeeze_duration_ms); } static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page) @@ -669,10 +670,13 @@ int xen_blkif_schedule(void *arg) } /* Shrink the free pages pool if it is too large. */ - if (time_before(jiffies, buffer_squeeze_end)) + if (time_before(jiffies, buffer_squeeze_end)) { shrink_free_pagepool(ring, 0); - else + } else { + if (unlikely(buffer_squeeze_end)) + buffer_squeeze_end = 0; shrink_free_pagepool(ring, max_buffer_pages); + } if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); May I ask which way you would prefer? Thanks, SeongJae Park > > Thanks, Roger. > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected
On Thu, 12 Dec 2019 16:23:17 +0100 "Roger Pau Monné" wrote: > > On Thu, 12 Dec 2019 12:42:47 +0100 "Roger Pau Monné" > > wrote: > > > > On the slow block device > > > > > > > > > > > > max_pgs Min Max Median AvgStddev > > > > 0 38.7 45.8 38.7 40.12 3.1752165 > > > > 1024 38.7 45.8 38.7 40.12 3.1752165 > > > > No difference proven at 95.0% confidence > > > > > > > > On the fast block device > > > > > > > > > > > > max_pgs Min Max Median AvgStddev > > > > 0 417 423 420419.4 2.5099801 > > > > 1024 414 425 416417.8 4.4384682 > > > > No difference proven at 95.0% confidence > > > > > > This is intriguing, as it seems to prove that the usage of a cache of > > > free pages is irrelevant performance wise. > > > > > > The pool of free pages was introduced long ago, and it's possible that > > > recent improvements to the balloon driver had made such pool useless, > > > at which point it could be removed instead of worked around. > > > > I guess the grant page allocation overhead in this test scenario is really > > small. In an absence of memory pressure, fragmentation, and NUMA imbalance, > > the latency of the page allocation ('get_page()') is very short, as it will > > success in the fast path. > > The allocation of the pool of free pages involves more than get_page, > it uses gnttab_alloc_pages which in the worse case will allocate a > page and balloon it out issuing one hypercall. > > > Few years ago, I once measured the page allocation latency on my machine. > > Roughly speaking, it was about 1us in best case, 100us in worst case, and > > 5us > > in average. Please keep in mind that the measurement was not designed and > > performed in serious way. Thus the results could have profile overhead in > > it, > > though. While keeping that in mind, let's simply believe the number and > > ignore > > the latency of the block layer, blkback itself (including the grant > > mapping), and anything else including context switch, cache miss, but the > > allocation. 
In other words, suppose that the grant page allocation is only > > one > > source of the overhead. It will be able to achieve 1 million IOPS (4KB * > > 1MIOPS = 4 GB/s) in the best case, 200 thousand IOPS (800 MB/s) in average, > > and > > 10 thousand IOPS (40 MB/s) in worst case. Based on this coarse > > calculation, I > > think the test results is reasonable. > > > > This also means that the effect of the blkback's free pages pool might be > > visible under page allocation fast path failure situation. Nevertheless, it > > would be also hard to measure that in micro level unless the measurement is > > well designed and controlled. > > > > > > > > Do you think you could perform some more tests (as pointed out above > > > against the block device to skip the fs overhead) and report back the > > > results? > > > > To be honest, I'm not sure whether additional tests are really necessary, > > because I think the `dd` test and the results explanation already makes some > > sense and provide the minimal proof of the concept. Also, this change is a > > fallback for the memory pressure situation, which is an error path in some > > point of view. Such errorneous situation might not happen frequently and if > > the situation is not solved in short time, something much worse (e.g., OOM > > kill > > of the user space xen control processes) than temporal I/O performance > > degradation could happen. Thus, I'm not sure whether such detailed > > performance > > measurement is necessary for this rare error handling change. > > Right, my main concern is that we seem to be adding duck tape so > things don't fall apart, but if such cache is really not beneficial > from a performance PoV I would rather see it go away than adding more > stuff to it in order to workaround corner cases like memory > starvation. Right, if the cache is really giving no benefit, it would be much better to simply remove it. However, as mentioned before, I'm not sure whether it is useless at all. 
Maybe we could run a more detailed test to find out, but that would be out of scope for this patch. > > Anyway, I guess we can take such change, but long term we need to look > into fixing grants to not use ballooned pages, and figure out if the > blkback free page cache is really useful or not. Totally agreed. Thanks, SeongJae Park > > Thanks, Roger. > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected
On Fri, Dec 13, 2019 at 10:33 AM Jürgen Groß wrote: > > On 13.12.19 10:27, Roger Pau Monné wrote: > > On Thu, Dec 12, 2019 at 05:06:58PM +0100, SeongJae Park wrote: > >> On Thu, 12 Dec 2019 16:27:57 +0100 "Roger Pau Monné" > >> wrote: > >> > >>>> diff --git a/drivers/block/xen-blkback/blkback.c > >>>> b/drivers/block/xen-blkback/blkback.c > >>>> index fd1e19f1a49f..98823d150905 100644 > >>>> --- a/drivers/block/xen-blkback/blkback.c > >>>> +++ b/drivers/block/xen-blkback/blkback.c > >>>> @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct > >>>> persistent_gnt *persistent_gnt) > >>>>HZ * xen_blkif_pgrant_timeout); > >>>> } > >>>> > >>>> +/* Once a memory pressure is detected, squeeze free page pools for a > >>>> while. */ > >>>> +static unsigned int buffer_squeeze_duration_ms = 10; > >>>> +module_param_named(buffer_squeeze_duration_ms, > >>>> + buffer_squeeze_duration_ms, int, 0644); > >>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms, > >>>> +"Duration in ms to squeeze pages buffer when a memory pressure is > >>>> detected"); > >>>> + > >>>> +static unsigned long buffer_squeeze_end; > >>>> + > >>>> +void xen_blkbk_reclaim_memory(struct xenbus_device *dev) > >>>> +{ > >>>> + buffer_squeeze_end = jiffies + > >>>> + msecs_to_jiffies(buffer_squeeze_duration_ms); > >>> > >>> I'm not sure this is fully correct. This function will be called for > >>> each blkback instance, but the timeout is stored in a global variable > >>> that's shared between all blkback instances. Shouldn't this timeout be > >>> stored in xen_blkif so each instance has it's own local variable? > >>> > >>> Or else in the case you have 1k blkback instances the timeout is > >>> certainly going to be longer than expected, because each call to > >>> xen_blkbk_reclaim_memory will move it forward. > >> > >> Agreed that. 
I think the extended timeout would not make a visible > >> performance difference, though, because the time that the 1k-loop takes would be short > >> enough > >> to be ignored compared to the millisecond-scope duration. > >> > >> I took this way because I wanted to minimize such structural changes as > >> far as > >> I can, as this is just a point-fix rather than an ultimate solution. That > >> said, > >> it is not fully correct and very confusing. Another colleague of mine also > >> pointed > >> it out in internal review. The correct solution would be to add a variable > >> in > >> the struct as you suggested, or to avoid duplicated updates of the variable > >> by > >> initializing the variable once the squeezing duration passes. I would > >> prefer > >> the latter way, as it is more straightforward and still does not introduce > >> a structural change. For example, it might be like below: > >> > >> diff --git a/drivers/block/xen-blkback/blkback.c > >> b/drivers/block/xen-blkback/blkback.c > >> index f41c698dd854..6856c8ef88de 100644 > >> --- a/drivers/block/xen-blkback/blkback.c > >> +++ b/drivers/block/xen-blkback/blkback.c > >> @@ -152,8 +152,9 @@ static unsigned long buffer_squeeze_end; > >> > >> void xen_blkbk_reclaim_memory(struct xenbus_device *dev) > >> { > >> - buffer_squeeze_end = jiffies + > >> - msecs_to_jiffies(buffer_squeeze_duration_ms); > >> + if (!buffer_squeeze_end) > >> + buffer_squeeze_end = jiffies + > >> + msecs_to_jiffies(buffer_squeeze_duration_ms); > >> } > >> > >> static inline int get_free_page(struct xen_blkif_ring *ring, struct page > >> **page) > >> @@ -669,10 +670,13 @@ int xen_blkif_schedule(void *arg) > >> } > >> > >> /* Shrink the free pages pool if it is too large. */ > >> - if (time_before(jiffies, buffer_squeeze_end)) > >> + if (time_before(jiffies, buffer_squeeze_end)) { > >> shrink_free_pagepool(ring, 0); > >> - else > >> + } else { > >> + if (unlikely(buffer_s
[Xen-devel] [PATCH v8 1/3] xenbus/backend: Add memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. If a memory pressure is detected, 'xenbus' requests every backend driver to voluntarily release its memory. Note that the callback facility could be improved for more sophisticated handling of general pressures. For example, it would be possible to monitor the memory consumption of each device and issue the release requests only to the devices which are causing the pressure. Also, the callback could be extended to handle not only memory, but general resources. Nevertheless, this version of the implementation defers such sophisticated goals as future work. Reviewed-by: Juergen Gross Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ include/xen/xenbus.h | 1 + 2 files changed, 33 insertions(+) diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c index b0bed4faf44c..7e78ebef7c54 100644 --- a/drivers/xen/xenbus/xenbus_probe_backend.c +++ b/drivers/xen/xenbus/xenbus_probe_backend.c @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier, return NOTIFY_DONE; } +static int backend_reclaim_memory(struct device *dev, void *data) +{ + const struct xenbus_driver *drv; + + if (!dev->driver) + return 0; + drv = to_xenbus_driver(dev->driver); + if (drv && drv->reclaim_memory) + drv->reclaim_memory(to_xenbus_device(dev)); + return 0; +} + +/* + * Returns 0 always because we are using shrinker to only detect memory + * pressure. 
+ */ +static unsigned long backend_shrink_memory_count(struct shrinker *shrinker, + struct shrink_control *sc) +{ + bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, + backend_reclaim_memory); + return 0; +} + +static struct shrinker backend_memory_shrinker = { + .count_objects = backend_shrink_memory_count, + .seeks = DEFAULT_SEEKS, +}; + static int __init xenbus_probe_backend_init(void) { static struct notifier_block xenstore_notifier = { @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void) register_xenstore_notifier(&xenstore_notifier); + if (register_shrinker(&backend_memory_shrinker)) + pr_warn("shrinker registration failed\n"); + return 0; } subsys_initcall(xenbus_probe_backend_init); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 869c816d5f8c..c861cfb6f720 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -104,6 +104,7 @@ struct xenbus_driver { struct device_driver driver; int (*read_otherend_details)(struct xenbus_device *dev); int (*is_ready)(struct xenbus_device *dev); + void (*reclaim_memory)(struct xenbus_device *dev); }; static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv) -- 2.17.1
[Xen-devel] [PATCH v8 0/3] xenbus/backend: Add a memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this patchset adds a memory reclaim callback to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in 'xen-blkback' (patch 2). The third patch is a trivial cleanup of variable names. Base Version This patchset is based on v5.4. A complete tree is also available at my public git repo: https://github.com/sjp38/linux/tree/blkback_squeezing_v8 Patch History - Changes from v7 (https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/) - Update sysfs-driver-xen-blkback for new parameter (suggested by Roger Pau Monné) - Use per-xen_blkif buffer_squeeze_end instead of global variable (suggested by Roger Pau Monné) Changes from v6 (https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/) - Remove more unnecessary prefixes (suggested by Roger Pau Monné) - Constify a variable (suggested by Roger Pau Monné) - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné) - More wordsmith of the commit message (suggested by Roger Pau Monné) Changes from v5 (https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/) - Wordsmith the commit messages (suggested by Roger Pau Monné) - Change the reclaim callback return type (suggested by Roger Pau Monné) - Change the type of the blkback squeeze duration variable (suggested by Roger Pau Monné) - Add a patch for removal of unnecessary static variable name prefixes (suggested by Roger Pau Monné) - Fix checkpatch.pl warnings Changes from v4 (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/) - Remove domain id parameter from the callback (suggested by Juergen Gross) - 
Rename xen-blkback module parameter (suggested by Stefan Nuernburger) Changes from v3 (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/) - Add general callback in xen_driver and use it (suggested by Juergen Gross) Changes from v2 (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com) - Rename the module parameter and variables for brevity (aggressive shrinking -> squeezing) Changes from v1 (https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/) - Adjust the description to not use the term, `arbitrarily` (suggested by Paul Durrant) - Specify time unit of the duration in the parameter description, (suggested by Maximilian Heyne) - Change default aggressive shrinking duration from 1ms to 10ms - Merge two patches into one single patch SeongJae Park (3): xenbus/backend: Add memory pressure handler callback xen/blkback: Squeeze page pools if a memory pressure is detected xen/blkback: Remove unnecessary static variable name prefixes .../ABI/testing/sysfs-driver-xen-blkback | 9 +++ drivers/block/xen-blkback/blkback.c | 57 --- drivers/block/xen-blkback/common.h| 2 + drivers/block/xen-blkback/xenbus.c| 11 +++- drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ include/xen/xenbus.h | 1 + 6 files changed, 90 insertions(+), 22 deletions(-) -- 2.17.1
[Xen-devel] [PATCH v8 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected
38.7 45.8 38.7 40.12 3.1752165 No difference proven at 95.0% confidence On the fast block device max_pgs Min Max Median Avg Stddev 0 417 423 420 419.4 2.5099801 1024 414 425 416 417.8 4.4384682 No difference proven at 95.0% confidence In short, even worst case squeezing on ramdisk based fast block device makes no visible performance degradation. Please note that this is just a very simple and minimal test. On systems using super-fast block devices and a special I/O workload, the results might be different. If you have any doubt, test on your machine with your workload to find the optimal squeezing duration for you. [1] https://aws.amazon.com/ebs/ [2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html Reviewed-by: Juergen Gross Signed-off-by: SeongJae Park --- .../ABI/testing/sysfs-driver-xen-blkback | 9 drivers/block/xen-blkback/blkback.c | 22 +-- drivers/block/xen-blkback/common.h| 2 ++ drivers/block/xen-blkback/xenbus.c| 11 +- 4 files changed, 41 insertions(+), 3 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback index 4e7babb3ba1f..a74a6d513c9f 100644 --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback @@ -25,3 +25,12 @@ Description: allocated without being in use. The time is in seconds, 0 means indefinitely long. The default is 60 seconds. + +What: /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms +Date: December 2019 +KernelVersion: 5.5 +Contact:Roger Pau Monné +Description: +How long the block backend buffers release every free pages in +those under memory pressure. The time is in milliseconds. +The default is 10 milliseconds. 
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index fd1e19f1a49f..26606c4896fd 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) HZ * xen_blkif_pgrant_timeout); } +/* Once a memory pressure is detected, squeeze free page pools for a while. */ +static unsigned int buffer_squeeze_duration_ms = 10; +module_param_named(buffer_squeeze_duration_ms, + buffer_squeeze_duration_ms, int, 0644); +MODULE_PARM_DESC(buffer_squeeze_duration_ms, +"Duration in ms to squeeze pages buffer when a memory pressure is detected"); + +static unsigned long buffer_squeeze_end; + +void xen_blkbk_update_buffer_squeeze_end(struct xen_blkif *blkif) +{ + blkif->buffer_squeeze_end = jiffies + + msecs_to_jiffies(buffer_squeeze_duration_ms); +} + static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page) { unsigned long flags; @@ -656,8 +671,11 @@ int xen_blkif_schedule(void *arg) ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL); } - /* Shrink if we have more than xen_blkif_max_buffer_pages */ - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + /* Shrink the free pages pool if it is too large. */ + if (time_before(jiffies, buffer_squeeze_end)) + shrink_free_pagepool(ring, 0); + else + shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 1d3002d773f7..ba653126177d 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -319,6 +319,7 @@ struct xen_blkif { /* All rings for this device. 
*/ struct xen_blkif_ring *rings; unsigned intnr_rings; + unsigned long buffer_squeeze_end; }; struct seg_buf { @@ -383,6 +384,7 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id); int xen_blkif_schedule(void *arg); int xen_blkif_purge_persistent(void *arg); void xen_blkbk_free_caches(struct xen_blkif_ring *ring); +void xen_blkbk_update_buffer_squeeze_end(struct xen_blkif *blkif); int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, struct backend_info *be, int state); diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index b90dbcd99c03..09fe6cb5c4ea 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -824,6 +824,14 @@ static void
[Xen-devel] [PATCH v8 3/3] xen/blkback: Remove unnecessary static variable name prefixes
A few static variables in blkback have the 'xen_blkif_' prefix, though it is unnecessary for static variables. This commit removes such prefixes. Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 26606c4896fd..85ff629a7546 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -62,8 +62,8 @@ * IO workloads. */ -static int xen_blkif_max_buffer_pages = 1024; -module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644); +static int max_buffer_pages = 1024; +module_param_named(max_buffer_pages, max_buffer_pages, int, 0644); MODULE_PARM_DESC(max_buffer_pages, "Maximum number of free pages to keep in each block backend buffer"); @@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages, * algorithm. */ -static int xen_blkif_max_pgrants = 1056; -module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644); +static int max_pgrants = 1056; +module_param_named(max_persistent_grants, max_pgrants, int, 0644); MODULE_PARM_DESC(max_persistent_grants, "Maximum number of grants to map persistently"); @@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants, * use. The time is in seconds, 0 means indefinitely long. 
*/ -static unsigned int xen_blkif_pgrant_timeout = 60; -module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout, +static unsigned int pgrant_timeout = 60; +module_param_named(persistent_grant_unused_seconds, pgrant_timeout, uint, 0644); MODULE_PARM_DESC(persistent_grant_unused_seconds, "Time in seconds an unused persistent grant is allowed to " @@ -137,9 +137,8 @@ module_param(log_stats, int, 0644); static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) { - return xen_blkif_pgrant_timeout && - (jiffies - persistent_gnt->last_used >= - HZ * xen_blkif_pgrant_timeout); + return pgrant_timeout && (jiffies - persistent_gnt->last_used >= + HZ * pgrant_timeout); } /* Once a memory pressure is detected, squeeze free page pools for a while. */ @@ -249,7 +248,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring, struct persistent_gnt *this; struct xen_blkif *blkif = ring->blkif; - if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) { + if (ring->persistent_gnt_c >= max_pgrants) { if (!blkif->vbd.overflow_max_grants) blkif->vbd.overflow_max_grants = 1; return -EBUSY; @@ -412,14 +411,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring) goto out; } - if (ring->persistent_gnt_c < xen_blkif_max_pgrants || - (ring->persistent_gnt_c == xen_blkif_max_pgrants && + if (ring->persistent_gnt_c < max_pgrants || + (ring->persistent_gnt_c == max_pgrants && !ring->blkif->vbd.overflow_max_grants)) { num_clean = 0; } else { - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN; - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + - num_clean; + num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN; + num_clean = ring->persistent_gnt_c - max_pgrants + num_clean; num_clean = min(ring->persistent_gnt_c, num_clean); pr_debug("Going to purge at least %u persistent grants\n", num_clean); @@ -614,8 +612,7 @@ static void print_stats(struct xen_blkif_ring *ring) current->comm, ring->st_oo_req, ring->st_rd_req, 
ring->st_wr_req, ring->st_f_req, ring->st_ds_req, -ring->persistent_gnt_c, -xen_blkif_max_pgrants); +ring->persistent_gnt_c, max_pgrants); ring->st_print = jiffies + msecs_to_jiffies(10 * 1000); ring->st_rd_req = 0; ring->st_wr_req = 0; @@ -675,7 +672,7 @@ int xen_blkif_schedule(void *arg) if (time_before(jiffies, buffer_squeeze_end)) shrink_free_pagepool(ring, 0); else - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + shrink_free_pagepool(ring, max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); @@ -902,7 +899,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring, conti
Re: [Xen-devel] [PATCH v8 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected
he value as `0` is the same as a situation doing the > > squeezing always (worst-case). > > > > For the I/O performance measurement, I run a simple `dd` command 5 times > > as below and collect the 'MB/s' results. > > > > $ for i in {1..5}; do dd if=/dev/zero of=file \ > > bs=4k count=$((256*512)); sync; done > > > > If the underlying block device is slow enough, the squeezing overhead > > could be hidden. For that reason, I do this test for both a slow block > > device and a fast block device. I use a popular cloud block storage > > service, ebs[1] as a slow device and the ramdisk block device[2] for the > > fast device. > > > > The results are as below. 'max_pgs' represents the value of the > > `blkback.max_buffer_pages` parameter. > > > > On the slow block device > > > > > > max_pgs Min Max Median Avg Stddev > > 0 38.7 45.8 38.7 40.12 3.1752165 > > 1024 38.7 45.8 38.7 40.12 3.1752165 > > No difference proven at 95.0% confidence > > > > On the fast block device > > > > > > max_pgs Min Max Median Avg Stddev > > 0 417 423 420 419.4 2.5099801 > > 1024 414 425 416 417.8 4.4384682 > > No difference proven at 95.0% confidence > > > > In short, even worst case squeezing on ramdisk based fast block device > > makes no visible performance degradation. Please note that this is just > > a very simple and minimal test. On systems using super-fast block > > devices and a special I/O workload, the results might be different. If > > you have any doubt, test on your machine with your workload to find the > > optimal squeezing duration for you. > > > > [1] https://aws.amazon.com/ebs/ > > [2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html > > > > Reviewed-by: Juergen Gross > > You should likely have dropped Juergen's RB, since you made some > non-trivial changes. Yes, I will! 
> > > Signed-off-by: SeongJae Park > > --- > > .../ABI/testing/sysfs-driver-xen-blkback | 9 > > drivers/block/xen-blkback/blkback.c | 22 +-- > > drivers/block/xen-blkback/common.h| 2 ++ > > drivers/block/xen-blkback/xenbus.c| 11 +- > > 4 files changed, 41 insertions(+), 3 deletions(-) > > > > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback > > b/Documentation/ABI/testing/sysfs-driver-xen-blkback > > index 4e7babb3ba1f..a74a6d513c9f 100644 > > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback > > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback > > @@ -25,3 +25,12 @@ Description: > > allocated without being in use. The time is in > > seconds, 0 means indefinitely long. > > The default is 60 seconds. > > + > > +What: > > /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms > > +Date: December 2019 > > +KernelVersion: 5.5 > > +Contact:Roger Pau Monné > I think you should be the contact for this feature, you are the one > that implemented it :). > > > +Description: > > +How long the block backend buffers release every free > > pages in > > +those under memory pressure. The time is in milliseconds. > > "When memory pressure is reported to blkback this option controls the > duration in milliseconds that blkback will not cache any page not > backed by a grant mapping. The default is 10ms." Great, will change! > > > +The default is 10 milliseconds. > > diff --git a/drivers/block/xen-blkback/blkback.c > > b/drivers/block/xen-blkback/blkback.c > > index fd1e19f1a49f..26606c4896fd 100644 > > --- a/drivers/block/xen-blkback/blkback.c > > +++ b/drivers/block/xen-blkback/blkback.c > > @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct > > persistent_gnt *persistent_gnt) > > HZ * xen_blkif_pgrant_timeout); > > } > > > > +/* Once a memory pressure is detected, squeeze free page pools for a > > while. 
*/ > > +static unsigned int buffer_squeeze_duration_ms = 10; > > +module_param_named(buffer_squeeze_duration_ms, > > + buffer_squeeze_duration_ms, int, 0644); > > +MODULE_PARM_DESC(buffer_squ
[Xen-devel] [PATCH v9 0/4] xenbus/backend: Add a memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this patchset adds a memory reclaim callback to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in 'xen-blkback' (patch 2). The third patch is a trivial cleanup of variable names. Base Version This patchset is based on v5.4. A complete tree is also available at my public git repo: https://github.com/sjp38/linux/tree/blkback_squeezing_v9 Patch History - Changes from v8 (https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/) - Drop 'Reviewed-by: Juergen' from the second patch (suggested by Roger Pau Monné) - Update contact of the new module param to SeongJae Park (suggested by Roger Pau Monné) - Wordsmith the description of the parameter (suggested by Roger Pau Monné) - Fix dumb bugs (suggested by Roger Pau Monné) - Move module param definition to xenbus.c and reduce the number of lines for this change (suggested by Roger Pau Monné) - Add a comment for the new callback, reclaim_memory, as other callbacks also have - Add another trivial cleanup of xenbus.c file (4th patch) Changes from v7 (https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/) - Update sysfs-driver-xen-blkback for new parameter (suggested by Roger Pau Monné) - Use per-xen_blkif buffer_squeeze_end instead of global variable (suggested by Roger Pau Monné) Changes from v6 (https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/) - Remove more unnecessary prefixes (suggested by Roger Pau Monné) - Constify a variable (suggested by Roger Pau Monné) - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné) - More wordsmith of the commit message 
(suggested by Roger Pau Monné) Changes from v5 (https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/) - Wordsmith the commit messages (suggested by Roger Pau Monné) - Change the reclaim callback return type (suggested by Roger Pau Monné) - Change the type of the blkback squeeze duration variable (suggested by Roger Pau Monné) - Add a patch for removal of unnecessary static variable name prefixes (suggested by Roger Pau Monné) - Fix checkpatch.pl warnings Changes from v4 (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/) - Remove domain id parameter from the callback (suggested by Juergen Gross) - Rename xen-blkback module parameter (suggested by Stefan Nuernburger) Changes from v3 (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/) - Add general callback in xen_driver and use it (suggested by Juergen Gross) Changes from v2 (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com) - Rename the module parameter and variables for brevity (aggressive shrinking -> squeezing) Changes from v1 (https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/) - Adjust the description to not use the term, `arbitrarily` (suggested by Paul Durrant) - Specify time unit of the duration in the parameter description, (suggested by Maximilian Heyne) - Change default aggressive shrinking duration from 1ms to 10ms - Merge two patches into one single patch SeongJae Park (4): xenbus/backend: Add memory pressure handler callback xen/blkback: Squeeze page pools if a memory pressure is detected xen/blkback: Remove unnecessary static variable name prefixes xen/blkback: Consistently insert one empty line between functions .../ABI/testing/sysfs-driver-xen-blkback | 10 + drivers/block/xen-blkback/blkback.c | 42 +-- drivers/block/xen-blkback/common.h| 1 + drivers/block/xen-blkback/xenbus.c| 26 +--- drivers/xen/xenbus/xenbus_probe_backend.c | 32 ++ include/xen/xenbus.h | 1 + 6 
files changed, 86 insertions(+), 26 deletions(-) -- 2.17.1
[Xen-devel] [PATCH v9 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
38.7 45.8 38.7 40.12 3.1752165 No difference proven at 95.0% confidence On the fast block device max_pgs Min Max Median Avg Stddev 0 417 423 420 419.4 2.5099801 1024 414 425 416 417.8 4.4384682 No difference proven at 95.0% confidence In short, even worst case squeezing on ramdisk based fast block device makes no visible performance degradation. Please note that this is just a very simple and minimal test. On systems using super-fast block devices and a special I/O workload, the results might be different. If you have any doubt, test on your machine with your workload to find the optimal squeezing duration for you. [1] https://aws.amazon.com/ebs/ [2] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html Signed-off-by: SeongJae Park --- .../ABI/testing/sysfs-driver-xen-blkback | 10 + drivers/block/xen-blkback/blkback.c | 7 +-- drivers/block/xen-blkback/common.h| 1 + drivers/block/xen-blkback/xenbus.c| 21 ++- 4 files changed, 36 insertions(+), 3 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback index 4e7babb3ba1f..f01224231f3f 100644 --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback @@ -25,3 +25,13 @@ Description: allocated without being in use. The time is in seconds, 0 means indefinitely long. The default is 60 seconds. + +What: /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms +Date: December 2019 +KernelVersion: 5.5 +Contact:SeongJae Park +Description: +When memory pressure is reported to blkback this option +controls the duration in milliseconds that blkback will not +cache any page not backed by a grant mapping. +The default is 10ms. 
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index fd1e19f1a49f..79f677aeb5cc 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -656,8 +656,11 @@ int xen_blkif_schedule(void *arg) ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL); } - /* Shrink if we have more than xen_blkif_max_buffer_pages */ - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + /* Shrink the free pages pool if it is too large. */ + if (time_before(jiffies, blkif->buffer_squeeze_end)) + shrink_free_pagepool(ring, 0); + else + shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 1d3002d773f7..536c84f61fed 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -319,6 +319,7 @@ struct xen_blkif { /* All rings for this device. */ struct xen_blkif_ring *rings; unsigned intnr_rings; + unsigned long buffer_squeeze_end; }; struct seg_buf { diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index b90dbcd99c03..4f6ea4feca79 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev, } +/* Once a memory pressure is detected, squeeze free page pools for a while. */ +static unsigned int buffer_squeeze_duration_ms = 10; +module_param_named(buffer_squeeze_duration_ms, + buffer_squeeze_duration_ms, int, 0644); +MODULE_PARM_DESC(buffer_squeeze_duration_ms, +"Duration in ms to squeeze pages buffer when a memory pressure is detected"); + +/* + * Callback received when the memory pressure is detected. 
+ */ +static void reclaim_memory(struct xenbus_device *dev) +{ + struct backend_info *be = dev_get_drvdata(&dev->dev); + + be->blkif->buffer_squeeze_end = jiffies + + msecs_to_jiffies(buffer_squeeze_duration_ms); +} + /* ** Connection ** */ @@ -1115,7 +1133,8 @@ static struct xenbus_driver xen_blkbk_driver = { .ids = xen_blkbk_ids, .probe = xen_blkbk_probe, .remove = xen_blkbk_remove, - .otherend_changed = frontend_changed + .otherend_changed = frontend_changed, + .reclaim_memory = reclaim_memory, }; int xen_blkif_xenbus_init(void)
[Xen-devel] [PATCH v9 4/4] xen/blkback: Consistently insert one empty line between functions
The number of empty lines between functions in xenbus.c is inconsistent. This trivial style cleanup commit fixes the file to consistently place only one empty line. Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/xenbus.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 4f6ea4feca79..dc0ea123c74c 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev) device_remove_file(&dev->dev, &dev_attr_physical_device); } - static void xen_vbd_free(struct xen_vbd *vbd) { if (vbd->bdev) @@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle, handle, blkif->domid); return 0; } + static int xen_blkbk_remove(struct xenbus_device *dev) { struct backend_info *be = dev_get_drvdata(&dev->dev); @@ -572,6 +572,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info if (err) dev_warn(&dev->dev, "writing feature-discard (%d)", err); } + int xen_blkbk_barrier(struct xenbus_transaction xbt, struct backend_info *be, int state) { @@ -656,7 +657,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev, return err; } - /* * Callback received when the hotplug scripts have placed the physical-device * node. Read it and the mode node, and create a vbd. If the frontend is @@ -748,7 +748,6 @@ static void backend_changed(struct xenbus_watch *watch, } } - /* * Callback received when the frontend's state changes. */ @@ -823,7 +822,6 @@ static void frontend_changed(struct xenbus_device *dev, } } - /* Once a memory pressure is detected, squeeze free page pools for a while. 
*/ static unsigned int buffer_squeeze_duration_ms = 10; module_param_named(buffer_squeeze_duration_ms, @@ -844,7 +842,6 @@ static void reclaim_memory(struct xenbus_device *dev) /* ** Connection ** */ - /* * Write the physical details regarding the block device to the store, and * switch to Connected state. -- 2.17.1
[Xen-devel] [PATCH v9 1/4] xenbus/backend: Add memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. If memory pressure is detected, 'xenbus' requests every backend driver to voluntarily release its memory. Note that the callback facility could be improved for more sophisticated handling of general pressure. For example, it would be possible to monitor the memory consumption of each device and issue the release requests only to the devices causing the pressure. Also, the callback could be extended to handle not only memory, but general resources. Nevertheless, this version of the implementation defers such sophisticated goals to future work. Reviewed-by: Juergen Gross Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ include/xen/xenbus.h | 1 + 2 files changed, 33 insertions(+) diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c index b0bed4faf44c..7e78ebef7c54 100644 --- a/drivers/xen/xenbus/xenbus_probe_backend.c +++ b/drivers/xen/xenbus/xenbus_probe_backend.c @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier, return NOTIFY_DONE; } +static int backend_reclaim_memory(struct device *dev, void *data) +{ + const struct xenbus_driver *drv; + + if (!dev->driver) + return 0; + drv = to_xenbus_driver(dev->driver); + if (drv && drv->reclaim_memory) + drv->reclaim_memory(to_xenbus_device(dev)); + return 0; +} + +/* + * Returns 0 always because we are using shrinker to only detect memory + * pressure. 
+ */ +static unsigned long backend_shrink_memory_count(struct shrinker *shrinker, + struct shrink_control *sc) +{ + bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, + backend_reclaim_memory); + return 0; +} + +static struct shrinker backend_memory_shrinker = { + .count_objects = backend_shrink_memory_count, + .seeks = DEFAULT_SEEKS, +}; + static int __init xenbus_probe_backend_init(void) { static struct notifier_block xenstore_notifier = { @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void) register_xenstore_notifier(&xenstore_notifier); + if (register_shrinker(&backend_memory_shrinker)) + pr_warn("shrinker registration failed\n"); + return 0; } subsys_initcall(xenbus_probe_backend_init); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 869c816d5f8c..c861cfb6f720 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -104,6 +104,7 @@ struct xenbus_driver { struct device_driver driver; int (*read_otherend_details)(struct xenbus_device *dev); int (*is_ready)(struct xenbus_device *dev); + void (*reclaim_memory)(struct xenbus_device *dev); }; static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v9 3/4] xen/blkback: Remove unnecessary static variable name prefixes
A few static variables in blkback have the 'xen_blkif_' prefix, though it is unnecessary for static variables. This commit removes such prefixes. Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 79f677aeb5cc..fbd67f8e4e4e 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -62,8 +62,8 @@ * IO workloads. */ -static int xen_blkif_max_buffer_pages = 1024; -module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644); +static int max_buffer_pages = 1024; +module_param_named(max_buffer_pages, max_buffer_pages, int, 0644); MODULE_PARM_DESC(max_buffer_pages, "Maximum number of free pages to keep in each block backend buffer"); @@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages, * algorithm. */ -static int xen_blkif_max_pgrants = 1056; -module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644); +static int max_pgrants = 1056; +module_param_named(max_persistent_grants, max_pgrants, int, 0644); MODULE_PARM_DESC(max_persistent_grants, "Maximum number of grants to map persistently"); @@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants, * use. The time is in seconds, 0 means indefinitely long. 
*/ -static unsigned int xen_blkif_pgrant_timeout = 60; -module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout, +static unsigned int pgrant_timeout = 60; +module_param_named(persistent_grant_unused_seconds, pgrant_timeout, uint, 0644); MODULE_PARM_DESC(persistent_grant_unused_seconds, "Time in seconds an unused persistent grant is allowed to " @@ -137,9 +137,8 @@ module_param(log_stats, int, 0644); static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) { - return xen_blkif_pgrant_timeout && - (jiffies - persistent_gnt->last_used >= - HZ * xen_blkif_pgrant_timeout); + return pgrant_timeout && (jiffies - persistent_gnt->last_used >= + HZ * pgrant_timeout); } static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page) @@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring, struct persistent_gnt *this; struct xen_blkif *blkif = ring->blkif; - if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) { + if (ring->persistent_gnt_c >= max_pgrants) { if (!blkif->vbd.overflow_max_grants) blkif->vbd.overflow_max_grants = 1; return -EBUSY; @@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring) goto out; } - if (ring->persistent_gnt_c < xen_blkif_max_pgrants || - (ring->persistent_gnt_c == xen_blkif_max_pgrants && + if (ring->persistent_gnt_c < max_pgrants || + (ring->persistent_gnt_c == max_pgrants && !ring->blkif->vbd.overflow_max_grants)) { num_clean = 0; } else { - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN; - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + - num_clean; + num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN; + num_clean = ring->persistent_gnt_c - max_pgrants + num_clean; num_clean = min(ring->persistent_gnt_c, num_clean); pr_debug("Going to purge at least %u persistent grants\n", num_clean); @@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring) current->comm, ring->st_oo_req, ring->st_rd_req, 
ring->st_wr_req, ring->st_f_req, ring->st_ds_req, -ring->persistent_gnt_c, -xen_blkif_max_pgrants); +ring->persistent_gnt_c, max_pgrants); ring->st_print = jiffies + msecs_to_jiffies(10 * 1000); ring->st_rd_req = 0; ring->st_wr_req = 0; @@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg) if (time_before(jiffies, blkif->buffer_squeeze_end)) shrink_free_pagepool(ring, 0); else - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + shrink_free_pagepool(ring, max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); @@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
Re: [Xen-devel] [PATCH v9 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
On Mon, 16 Dec 2019 10:37:55 +0100 "Roger Pau Monné" wrote: > On Fri, Dec 13, 2019 at 03:35:44PM +0000, SeongJae Park wrote: > > Each `blkif` has a free pages pool for the grant mapping. The size of > > the pool starts from zero and is increased on demand while processing > > the I/O requests. If current I/O requests handling is finished or 100 > > milliseconds has passed since last I/O requests handling, it checks and > > shrinks the pool to not exceed the size limit, `max_buffer_pages`. > > > > Therefore, host administrators can cause memory pressure in blkback by > > attaching a large number of block devices and inducing I/O. Such > > problematic situations can be avoided by limiting the maximum number of > > devices that can be attached, but finding the optimal limit is not so > > easy. Improper set of the limit can results in memory pressure or a > > resource underutilization. This commit avoids such problematic > > situations by squeezing the pools (returns every free page in the pool > > to the system) for a while (users can set this duration via a module > > parameter) if memory pressure is detected. > > > > Discussions > > === > > > > The `blkback`'s original shrinking mechanism returns only pages in the > > pool which are not currently be used by `blkback` to the system. In > > other words, the pages that are not mapped with granted pages. Because > > this commit is changing only the shrink limit but still uses the same > > freeing mechanism it does not touch pages which are currently mapping > > grants. > > > > Once memory pressure is detected, this commit keeps the squeezing limit > > for a user-specified time duration. The duration should be neither too > > long nor too short. If it is too long, the squeezing incurring overhead > > can reduce the I/O performance. If it is too short, `blkback` will not > > free enough pages to reduce the memory pressure. 
This commit sets the > > value as `10 milliseconds` by default because it is a short time in > > terms of I/O while it is a long time in terms of memory operations. > > Also, as the original shrinking mechanism works for at least every 100 > > milliseconds, this could be a somewhat reasonable choice. I also tested > > other durations (refer to the below section for more details) and > > confirmed that 10 milliseconds is the one that works best with the test. > > That said, the proper duration depends on actual configurations and > > workloads. That's why this commit allows users to set the duration as a > > module parameter. > > > > Memory Pressure Test > > > > > > To show how this commit fixes the memory pressure situation well, I > > configured a test environment on a xen-running virtualization system. > > On the `blkfront` running guest instances, I attach a large number of > > network-backed volume devices and induce I/O to those. Meanwhile, I > > measure the number of pages that swapped in (pswpin) and out (pswpout) > > on the `blkback` running guest. The test ran twice, once for the > > `blkback` before this commit and once for that after this commit. As > > shown below, this commit has dramatically reduced the memory pressure: > > > > pswpin pswpout > > before 76,672 185,799 > > after 212 3,325 > > > > Optimal Aggressive Shrinking Duration > > - > > > > To find a best squeezing duration, I repeated the test with three > > different durations (1ms, 10ms, and 100ms). The results are as below: > > > > duration pswpin pswpout > > 1 852 6,424 > > 10 212 3,325 > > 100 203 3,340 > > > > As expected, the memory pressure has decreased as the duration is > > increased, but the reduction stopped from the `10ms`. Based on this > > results, I chose the default duration as 10ms. 
> > > > Performance Overhead Test > > = > > > > This commit could incur I/O performance degradation under severe memory > > pressure because the squeezing will require more page allocations per > > I/O. To show the overhead, I artificially made a worst-case squeezing > > situation and measured the I/O performance of a `blkfront` running > > guest. > > > > For the artificial squeezing, I set the `blkback.max_buffer_pages` using > > the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In this > > test, I set the value to `1024
[Xen-devel] [PATCH v10 1/4] xenbus/backend: Add memory pressure handler callback
From: SeongJae Park Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. If memory pressure is detected, 'xenbus' requests every backend driver to voluntarily release its memory. Note that the callback facility could be improved for more sophisticated handling of general pressure. For example, it would be possible to monitor the memory consumption of each device and issue the release requests only to the devices causing the pressure. Also, the callback could be extended to handle not only memory, but general resources. Nevertheless, this version of the implementation defers such sophisticated goals to future work. Reviewed-by: Juergen Gross Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++ include/xen/xenbus.h | 1 + 2 files changed, 33 insertions(+) diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c index b0bed4faf44c..7e78ebef7c54 100644 --- a/drivers/xen/xenbus/xenbus_probe_backend.c +++ b/drivers/xen/xenbus/xenbus_probe_backend.c @@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier, return NOTIFY_DONE; } +static int backend_reclaim_memory(struct device *dev, void *data) +{ + const struct xenbus_driver *drv; + + if (!dev->driver) + return 0; + drv = to_xenbus_driver(dev->driver); + if (drv && drv->reclaim_memory) + drv->reclaim_memory(to_xenbus_device(dev)); + return 0; +} + +/* + * Returns 0 always because we are using shrinker to only detect memory + * pressure. 
+ */ +static unsigned long backend_shrink_memory_count(struct shrinker *shrinker, + struct shrink_control *sc) +{ + bus_for_each_dev(&xenbus_backend.bus, NULL, NULL, + backend_reclaim_memory); + return 0; +} + +static struct shrinker backend_memory_shrinker = { + .count_objects = backend_shrink_memory_count, + .seeks = DEFAULT_SEEKS, +}; + static int __init xenbus_probe_backend_init(void) { static struct notifier_block xenstore_notifier = { @@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void) register_xenstore_notifier(&xenstore_notifier); + if (register_shrinker(&backend_memory_shrinker)) + pr_warn("shrinker registration failed\n"); + return 0; } subsys_initcall(xenbus_probe_backend_init); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 869c816d5f8c..c861cfb6f720 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -104,6 +104,7 @@ struct xenbus_driver { struct device_driver driver; int (*read_otherend_details)(struct xenbus_device *dev); int (*is_ready)(struct xenbus_device *dev); + void (*reclaim_memory)(struct xenbus_device *dev); }; static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v10 0/4] xenbus/backend: Add a memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility. To mitigate such problems, this patchset adds a memory reclaim callback to 'xenbus_driver' (patch 1) and uses it to mitigate the problem in 'xen-blkback' (patch 2). The third and fourth patches are trivial cleanups. Base Version This patchset is based on v5.4. A complete tree is also available at my public git repo: https://github.com/sjp38/linux/tree/blkback_squeezing_v10 Patch History - Changes from v9 (https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/) - Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné - Update the commit message for overhead test of the 2nd patch Changes from v8 (https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/) - Drop 'Reviewed-by: Juergen' from the second patch (suggested by Roger Pau Monné) - Update contact of the new module param to SeongJae Park (suggested by Roger Pau Monné) - Wordsmith the description of the parameter (suggested by Roger Pau Monné) - Fix dumb bugs (suggested by Roger Pau Monné) - Move module param definition to xenbus.c and reduce the number of lines for this change (suggested by Roger Pau Monné) - Add a comment for the new callback, reclaim_memory, as other callbacks also have - Add another trivial cleanup of xenbus.c file (4th patch) Changes from v7 (https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/) - Update sysfs-driver-xen-blkback for new parameter (suggested by Roger Pau Monné) - Use per-xen_blkif buffer_squeeze_end instead of global variable (suggested by Roger Pau Monné) Changes from v6 (https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/) - Remove more unnecessary 
prefixes (suggested by Roger Pau Monné) - Constify a variable (suggested by Roger Pau Monné) - Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné) - More wordsmith of the commit message (suggested by Roger Pau Monné) Changes from v5 (https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/) - Wordsmith the commit messages (suggested by Roger Pau Monné) - Change the reclaim callback return type (suggested by Roger Pau Monné) - Change the type of the blkback squeeze duration variable (suggested by Roger Pau Monné) - Add a patch for removal of unnecessary static variable name prefixes (suggested by Roger Pau Monné) - Fix checkpatch.pl warnings Changes from v4 (https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/) - Remove domain id parameter from the callback (suggested by Juergen Gross) - Rename xen-blkback module parameter (suggested by Stefan Nuernburger) Changes from v3 (https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/) - Add general callback in xen_driver and use it (suggested by Juergen Gross) Changes from v2 (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com) - Rename the module parameter and variables for brevity (aggressive shrinking -> squeezing) Changes from v1 (https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/) - Adjust the description to not use the term, `arbitrarily` (suggested by Paul Durrant) - Specify time unit of the duration in the parameter description, (suggested by Maximilian Heyne) - Change default aggressive shrinking duration from 1ms to 10ms - Merge two patches into one single patch SeongJae Park (4): xenbus/backend: Add memory pressure handler callback xen/blkback: Squeeze page pools if a memory pressure is detected xen/blkback: Remove unnecessary static variable name prefixes xen/blkback: Consistently insert one empty line between functions .../ABI/testing/sysfs-driver-xen-blkback | 10 + 
drivers/block/xen-blkback/blkback.c | 42 +-- drivers/block/xen-blkback/common.h| 1 + drivers/block/xen-blkback/xenbus.c| 26 +--- drivers/xen/xenbus/xenbus_probe_backend.c | 32 ++ include/xen/xenbus.h | 1 + 6 files changed, 86 insertions(+), 26 deletions(-) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v10 3/4] xen/blkback: Remove unnecessary static variable name prefixes
From: SeongJae Park A few static variables in blkback have the 'xen_blkif_' prefix, though it is unnecessary for static variables. This commit removes such prefixes. Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 79f677aeb5cc..fbd67f8e4e4e 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -62,8 +62,8 @@ * IO workloads. */ -static int xen_blkif_max_buffer_pages = 1024; -module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644); +static int max_buffer_pages = 1024; +module_param_named(max_buffer_pages, max_buffer_pages, int, 0644); MODULE_PARM_DESC(max_buffer_pages, "Maximum number of free pages to keep in each block backend buffer"); @@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages, * algorithm. */ -static int xen_blkif_max_pgrants = 1056; -module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644); +static int max_pgrants = 1056; +module_param_named(max_persistent_grants, max_pgrants, int, 0644); MODULE_PARM_DESC(max_persistent_grants, "Maximum number of grants to map persistently"); @@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants, * use. The time is in seconds, 0 means indefinitely long. 
*/ -static unsigned int xen_blkif_pgrant_timeout = 60; -module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout, +static unsigned int pgrant_timeout = 60; +module_param_named(persistent_grant_unused_seconds, pgrant_timeout, uint, 0644); MODULE_PARM_DESC(persistent_grant_unused_seconds, "Time in seconds an unused persistent grant is allowed to " @@ -137,9 +137,8 @@ module_param(log_stats, int, 0644); static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) { - return xen_blkif_pgrant_timeout && - (jiffies - persistent_gnt->last_used >= - HZ * xen_blkif_pgrant_timeout); + return pgrant_timeout && (jiffies - persistent_gnt->last_used >= + HZ * pgrant_timeout); } static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page) @@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring, struct persistent_gnt *this; struct xen_blkif *blkif = ring->blkif; - if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) { + if (ring->persistent_gnt_c >= max_pgrants) { if (!blkif->vbd.overflow_max_grants) blkif->vbd.overflow_max_grants = 1; return -EBUSY; @@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring) goto out; } - if (ring->persistent_gnt_c < xen_blkif_max_pgrants || - (ring->persistent_gnt_c == xen_blkif_max_pgrants && + if (ring->persistent_gnt_c < max_pgrants || + (ring->persistent_gnt_c == max_pgrants && !ring->blkif->vbd.overflow_max_grants)) { num_clean = 0; } else { - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN; - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + - num_clean; + num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN; + num_clean = ring->persistent_gnt_c - max_pgrants + num_clean; num_clean = min(ring->persistent_gnt_c, num_clean); pr_debug("Going to purge at least %u persistent grants\n", num_clean); @@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring) current->comm, ring->st_oo_req, ring->st_rd_req, 
ring->st_wr_req, ring->st_f_req, ring->st_ds_req, -ring->persistent_gnt_c, -xen_blkif_max_pgrants); +ring->persistent_gnt_c, max_pgrants); ring->st_print = jiffies + msecs_to_jiffies(10 * 1000); ring->st_rd_req = 0; ring->st_wr_req = 0; @@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg) if (time_before(jiffies, blkif->buffer_squeeze_end)) shrink_free_pagepool(ring, 0); else - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + shrink_free_pagepool(ring, max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); @@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *
[Xen-devel] [PATCH v10 4/4] xen/blkback: Consistently insert one empty line between functions
From: SeongJae Park The number of empty lines between functions in the xenbus.c is inconsistent. This trivial style cleanup commit fixes the file to consistently place only one empty line. Acked-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/xenbus.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 4f6ea4feca79..dc0ea123c74c 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev) device_remove_file(&dev->dev, &dev_attr_physical_device); } - static void xen_vbd_free(struct xen_vbd *vbd) { if (vbd->bdev) @@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle, handle, blkif->domid); return 0; } + static int xen_blkbk_remove(struct xenbus_device *dev) { struct backend_info *be = dev_get_drvdata(&dev->dev); @@ -572,6 +572,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info if (err) dev_warn(&dev->dev, "writing feature-discard (%d)", err); } + int xen_blkbk_barrier(struct xenbus_transaction xbt, struct backend_info *be, int state) { @@ -656,7 +657,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev, return err; } - /* * Callback received when the hotplug scripts have placed the physical-device * node. Read it and the mode node, and create a vbd. If the frontend is @@ -748,7 +748,6 @@ static void backend_changed(struct xenbus_watch *watch, } } - /* * Callback received when the frontend's state changes. */ @@ -823,7 +822,6 @@ static void frontend_changed(struct xenbus_device *dev, } } - /* Once a memory pressure is detected, squeeze free page pools for a while. 
*/ static unsigned int buffer_squeeze_duration_ms = 10; module_param_named(buffer_squeeze_duration_ms, @@ -844,7 +842,6 @@ static void reclaim_memory(struct xenbus_device *dev) /* ** Connection ** */ - /* * Write the physical details regarding the block device to the store, and * switch to Connected state. -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
From: SeongJae Park Each `blkif` has a free pages pool for the grant mapping. The size of the pool starts from zero and is increased on demand while processing the I/O requests. When handling of the current I/O requests is finished, or 100 milliseconds have passed since the last I/O requests were handled, it checks and shrinks the pool so it does not exceed the size limit, `max_buffer_pages`. Therefore, host administrators can cause memory pressure in blkback by attaching a large number of block devices and inducing I/O. Such problematic situations can be avoided by limiting the maximum number of devices that can be attached, but finding the optimal limit is not easy. An improper limit can result in memory pressure or resource underutilization. This commit avoids such problematic situations by squeezing the pools (returning every free page in the pool to the system) for a while (users can set this duration via a module parameter) if memory pressure is detected. Discussions === The `blkback`'s original shrinking mechanism returns to the system only the pages in the pool which are not currently used by `blkback`; in other words, the pages that are not mapped to granted pages. Because this commit changes only the shrink limit but still uses the same freeing mechanism, it does not touch pages which are currently mapping grants. Once memory pressure is detected, this commit keeps the squeezing limit for a user-specified time duration. The duration should be neither too long nor too short. If it is too long, the overhead incurred by the squeezing can reduce the I/O performance. If it is too short, `blkback` will not free enough pages to reduce the memory pressure. This commit sets the value to `10 milliseconds` by default because it is a short time in terms of I/O while being a long time in terms of memory operations. Also, as the original shrinking mechanism works at least every 100 milliseconds, this is a somewhat reasonable choice. 
I also tested other durations (refer to the below section for more details) and confirmed that 10 milliseconds is the one that works best with the test. That said, the proper duration depends on actual configurations and workloads. That's why this commit allows users to set the duration as a module parameter. Memory Pressure Test To show how this commit fixes the memory pressure situation well, I configured a test environment on a xen-running virtualization system. On the `blkfront` running guest instances, I attach a large number of network-backed volume devices and induce I/O to those. Meanwhile, I measure the number of pages swapped in (pswpin) and out (pswpout) on the `blkback` running guest. The test ran twice, once for the `blkback` before this commit and once for that after this commit. As shown below, this commit has dramatically reduced the memory pressure: pswpin pswpout before 76,672 185,799 after 212 3,325 Optimal Aggressive Shrinking Duration - To find the best squeezing duration, I repeated the test with three different durations (1ms, 10ms, and 100ms). The results are as below: duration pswpin pswpout 1 852 6,424 10 212 3,325 100 203 3,340 As expected, the memory pressure has decreased as the duration is increased, but the reduction stopped at `10ms`. Based on these results, I chose the default duration as 10ms. Performance Overhead Test = This commit could incur I/O performance degradation under severe memory pressure because the squeezing will require more page allocations per I/O. To show the overhead, I artificially made a worst-case squeezing situation and measured the I/O performance of a `blkfront` running guest. For the artificial squeezing, I set the `blkback.max_buffer_pages` using the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In this test, I set the value to `1024` and `0`. The `1024` is the default value. Setting the value to `0` is the same as always doing the squeezing (worst case). 
If the underlying block device is slow enough, the squeezing overhead could be hidden. For that reason, I use a fast block device, namely the rbd[1]: # xl block-attach guest phy:/dev/ram0 xvdb w For the I/O performance measurement, I run a simple `dd` command 5 times directly to the device as below and collect the 'MB/s' results. $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \ bs=4k count=$((256*512)); sync; done The results are as below. 'max_pgs' represents the value of the `blkback.max_buffer_pages` parameter. max_pgs Min Max Median Avg Stddev 0 417 423 420 419.4 2.5099801 1024 414 425 416 417.8 4.4384682 No difference proven at 95.0% confidence In sh
Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park wrote: > From: SeongJae Park > > Each `blkif` has a free pages pool for the grant mapping. The size of > the pool starts from zero and is increased on demand while processing > the I/O requests. If current I/O requests handling is finished or 100 > milliseconds has passed since last I/O requests handling, it checks and > shrinks the pool to not exceed the size limit, `max_buffer_pages`. > > Therefore, host administrators can cause memory pressure in blkback by > attaching a large number of block devices and inducing I/O. Such > problematic situations can be avoided by limiting the maximum number of > devices that can be attached, but finding the optimal limit is not so > easy. Improper set of the limit can results in memory pressure or a > resource underutilization. This commit avoids such problematic > situations by squeezing the pools (returns every free page in the pool > to the system) for a while (users can set this duration via a module > parameter) if memory pressure is detected. > > Discussions > === > > The `blkback`'s original shrinking mechanism returns only pages in the > pool which are not currently be used by `blkback` to the system. In > other words, the pages that are not mapped with granted pages. Because > this commit is changing only the shrink limit but still uses the same > freeing mechanism it does not touch pages which are currently mapping > grants. > > Once memory pressure is detected, this commit keeps the squeezing limit > for a user-specified time duration. The duration should be neither too > long nor too short. If it is too long, the squeezing incurring overhead > can reduce the I/O performance. If it is too short, `blkback` will not > free enough pages to reduce the memory pressure. This commit sets the > value as `10 milliseconds` by default because it is a short time in > terms of I/O while it is a long time in terms of memory operations. 
> Also, as the original shrinking mechanism works at least every 100 milliseconds, this could be a somewhat reasonable choice. I also tested other durations (refer to the section below for more details) and confirmed that 10 milliseconds is the one that works best with the test. That said, the proper duration depends on actual configurations and workloads. That's why this commit allows users to set the duration as a module parameter.
>
> Memory Pressure Test
> ====================
>
> To show how well this commit fixes the memory pressure situation, I configured a test environment on a Xen-running virtualization system. On the `blkfront` running guest instances, I attach a large number of network-backed volume devices and induce I/O to those. Meanwhile, I measure the number of pages swapped in (pswpin) and out (pswpout) on the `blkback` running guest. The test ran twice, once for the `blkback` before this commit and once for that after this commit. As shown below, this commit has dramatically reduced the memory pressure:
>
>         pswpin  pswpout
> before  76,672  185,799
> after      212    3,325
>
> Optimal Aggressive Shrinking Duration
> -------------------------------------
>
> To find the best squeezing duration, I repeated the test with three different durations (1ms, 10ms, and 100ms). The results are as below:
>
> duration  pswpin  pswpout
>        1     852    6,424
>       10     212    3,325
>      100     203    3,340
>
> As expected, the memory pressure decreased as the duration increased, but the reduction stopped at `10ms`. Based on these results, I chose the default duration as 10ms.
>
> Performance Overhead Test
> =========================
>
> This commit could incur I/O performance degradation under severe memory pressure because the squeezing will require more page allocations per I/O. To show the overhead, I artificially made a worst-case squeezing situation and measured the I/O performance of a `blkfront` running guest.
> For the artificial squeezing, I set the `blkback.max_buffer_pages` using the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In this test, I set the value to `1024` and `0`. The `1024` is the default value. Setting the value to `0` is the same as a situation in which the squeezing is always done (worst case).
>
> If the underlying block device is slow enough, the squeezing overhead could be hidden. For that reason, I use a fast block device, namely the rbd[1]:
>
> # xl block-attach guest phy:/dev/ram0 xvdb w
>
> For the I/O performance measurement, I run a simple `dd` command 5 times d
Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park wrote:

> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park wrote:
>
> > From: SeongJae Park
> >
> [...]
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
> >  }
> >
> >
> > +/* Once a memory pressure is detected, squeeze free page pools for a while. */
> > +static unsigned int buffer_squeeze_duration_ms = 10;
> > +module_param_named(buffer_squeeze_duration_ms,
> > +		buffer_squeeze_duration_ms, int, 0644);
> > +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> > +"Duration in ms to squeeze pages buffer when a memory pressure is detected");
> > +
> > +/*
> > + * Callback received when the memory pressure is detected.
> > + */
> > +static void reclaim_memory(struct xenbus_device *dev)
> > +{
> > +	struct backend_info *be = dev_get_drvdata(&dev->dev);
> > +
> > +	be->blkif->buffer_squeeze_end = jiffies +
> > +		msecs_to_jiffies(buffer_squeeze_duration_ms);
>
> This callback might race with 'xen_blkbk_probe()'. The race could result in __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it links 'be' to the 'dev'. Please _don't merge_ this patch now!
>
> I will do more tests and share results. Meanwhile, if you have any opinion, please let me know.

Not only '->blkif', but 'be' itself could also be NULL. As similar concurrency issues could exist in other drivers in their own ways, I suggest changing the reclaim callback ('->reclaim_memory') to be called for each driver instead of each device. Then, each driver would be able to deal with its concurrency issues by itself. For blkback, we could reuse the global-variable-based approach, similar to the v7[1] of this patchset. As the callback is now called for each driver instead of each device, the duplicated setting of the timeout will not happen.
Thanks,
SeongJae Park

[1] https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/

>
> Thanks,
> SeongJae Park
>
> > +}
> > +
> > /* ** Connection ** */
> >
> >
> > @@ -1115,7 +1133,8 @@ static struct xenbus_driver xen_blkbk_driver = {
> >  	.ids  = xen_blkbk_ids,
> >  	.probe = xen_blkbk_probe,
> >  	.remove = xen_blkbk_remove,
> > -	.otherend_changed = frontend_changed
> > +	.otherend_changed = frontend_changed,
> > +	.reclaim_memory = reclaim_memory,
> > };
> >
> > int xen_blkif_xenbus_init(void)
> > --
> > 2.17.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
On Mon, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:

> On 16.12.19 17:15, SeongJae Park wrote:
> > On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park wrote:
> >
> >> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park wrote:
> >>
> >>> From: SeongJae Park
> >>>
> > [...]
> >>> --- a/drivers/block/xen-blkback/xenbus.c
> >>> +++ b/drivers/block/xen-blkback/xenbus.c
> >>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
> >>>  }
> >>>
> >>>
> >>> +/* Once a memory pressure is detected, squeeze free page pools for a while. */
> >>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>> +module_param_named(buffer_squeeze_duration_ms,
> >>> +		buffer_squeeze_duration_ms, int, 0644);
> >>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>> +"Duration in ms to squeeze pages buffer when a memory pressure is detected");
> >>> +
> >>> +/*
> >>> + * Callback received when the memory pressure is detected.
> >>> + */
> >>> +static void reclaim_memory(struct xenbus_device *dev)
> >>> +{
> >>> +	struct backend_info *be = dev_get_drvdata(&dev->dev);
> >>> +
> >>> +	be->blkif->buffer_squeeze_end = jiffies +
> >>> +		msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>
> >> This callback might race with 'xen_blkbk_probe()'. The race could result in __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it links 'be' to the 'dev'. Please _don't merge_ this patch now!
> >>
> >> I will do more tests and share results. Meanwhile, if you have any opinion, please let me know.

I reduced system memory and attached a bunch of devices in a short time so that memory pressure occurs while device attachments are ongoing. Under this circumstance, I was able to see the race.

> >
> > Not only '->blkif', but 'be' itself could also be NULL.
> > As similar concurrency issues could exist in other drivers in their own ways, I suggest changing the reclaim callback ('->reclaim_memory') to be called for each driver instead of each device. Then, each driver would be able to deal with its concurrency issues by itself.
>
> Hmm, I don't like that. This would need to be changed back in case we add per-guest quota.

Extending this callback in that way would still not be too hard. We could use the argument to the callback. I would keep the argument of the callback as 'struct device *', and would add a comment saying that a 'NULL' value of the argument means every device. As an example, xenbus would pass a NULL-terminated array of the device pointers that need to free their resources.

After seeing this race, I am now also thinking it could be better to delegate detailed control of each device to its driver, as some drivers have complicated and unique relations with their devices.

> Wouldn't a get_device() before calling the callback and a put_device() afterwards avoid that problem?

I didn't use the reference count manipulation operations because other similar parts also didn't. But, if there is no implicit reference count guarantee, it seems those operations are indeed necessary.

That said, as the get/put operations only adjust the reference count, they will not make the callback wait until the linking of the 'backend' and 'blkif' to the device (xen_blkbk_probe()) is finished. Thus, the race could still happen. Or, am I missing something?

I also modified the code to do 'get_device()' and 'put_device()' as you suggested and tested, but the race was still reproducible.

Thanks,
SeongJae Park

>
> Juergen
Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß" wrote: > On 16.12.19 20:48, SeongJae Park wrote: > > On on, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote: > > > >> On 16.12.19 17:15, SeongJae Park wrote: > >>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park > >>> wrote: > >>> > >>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park > >>>> wrote: > >>>> > >>>>> From: SeongJae Park > >>>>> > >>> [...] > >>>>> --- a/drivers/block/xen-blkback/xenbus.c > >>>>> +++ b/drivers/block/xen-blkback/xenbus.c > >>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device > >>>>> *dev, > >>>>>} > >>>>> > >>>>> > >>>>> +/* Once a memory pressure is detected, squeeze free page pools for a > >>>>> while. */ > >>>>> +static unsigned int buffer_squeeze_duration_ms = 10; > >>>>> +module_param_named(buffer_squeeze_duration_ms, > >>>>> + buffer_squeeze_duration_ms, int, 0644); > >>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms, > >>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is > >>>>> detected"); > >>>>> + > >>>>> +/* > >>>>> + * Callback received when the memory pressure is detected. > >>>>> + */ > >>>>> +static void reclaim_memory(struct xenbus_device *dev) > >>>>> +{ > >>>>> + struct backend_info *be = dev_get_drvdata(&dev->dev); > >>>>> + > >>>>> + be->blkif->buffer_squeeze_end = jiffies + > >>>>> + msecs_to_jiffies(buffer_squeeze_duration_ms); > >>>> > >>>> This callback might race with 'xen_blkbk_probe()'. The race could > >>>> result in > >>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it > >>>> links > >>>> 'be' to the 'dev'. Please _don't merge_ this patch now! > >>>> > >>>> I will do more test and share results. Meanwhile, if you have any > >>>> opinion, > >>>> please let me know. > > > > I reduced system memory and attached bunch of devices in short time so that > > memory pressure occurs while device attachments are ongoing. Under this > > circumstance, I was able to see the race. 
> > > >>> > >>> Not only '->blkif', but 'be' itself also coule be a NULL. As similar > >>> concurrency issues could be in other drivers in their way, I suggest to > >>> change > >>> the reclaim callback ('->reclaim_memory') to be called for each driver > >>> instead > >>> of each device. Then, each driver could be able to deal with its > >>> concurrency > >>> issues by itself. > >> > >> Hmm, I don't like that. This would need to be changed back in case we > >> add per-guest quota. > > > > Extending this callback in that way would be still not too hard. We could > > use > > the argument to the callback. I would keep the argument of the callback to > > 'struct device *' as is, and will add a comment saying 'NULL' value of the > > argument means every devices. As an example, xenbus would pass NULL-ending > > array of the device pointers that need to free its resources. > > > > After seeing this race, I am now also thinking it could be better to > > delegate > > detailed control of each device to its driver, as some drivers have some > > complicated and unique relation with its devices. > > > >> > >> Wouldn't a get_device() before calling the callback and a put_device() > >> afterwards avoid that problem? > > > > I didn't used the reference count manipulation operations because other > > similar > > parts also didn't. But, if there is no implicit reference count guarantee, > > it > > seems those operations are indeed necessary. > > > > That said, as get/put operations only adjust the reference count, those will > > not make the callback to wait until the linking of the 'backend' and > > 'blkif' to > > the device (xen_blkbk_probe()) is finished. Thus, the race could still > > happen. > > Or, am I missing something? > > No, I think we need a xenbus lock per device which will need to be > taken in xen_blkbk_probe(), xenbus_dev_remove() and while calling the > callback. I also agree that locking should be used at last. 
But, as each driver manages its devices and resources in its own way, it could have its own unique race conditions, and each unique race condition might have its own most efficient way to be synchronized. Therefore, I think the synchronization should be done by each driver, not by xenbus, and thus we should make the callback be called per driver.

Thanks,
SeongJae Park

>
> Juergen
Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
On Tue, 17 Dec 2019 09:16:47 +0100 "Jürgen Groß" wrote: > On 17.12.19 08:59, SeongJae Park wrote: > > On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß" wrote: > > > >> On 16.12.19 20:48, SeongJae Park wrote: > >>> On on, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote: > >>> > >>>> On 16.12.19 17:15, SeongJae Park wrote: > >>>>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park > >>>>> wrote: > >>>>> > >>>>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park > >>>>>> wrote: > >>>>>> > >>>>>>> From: SeongJae Park > >>>>>>> > >>>>> [...] > >>>>>>> --- a/drivers/block/xen-blkback/xenbus.c > >>>>>>> +++ b/drivers/block/xen-blkback/xenbus.c > >>>>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct > >>>>>>> xenbus_device *dev, > >>>>>>> } > >>>>>>> > >>>>>>> > >>>>>>> +/* Once a memory pressure is detected, squeeze free page pools for a > >>>>>>> while. */ > >>>>>>> +static unsigned int buffer_squeeze_duration_ms = 10; > >>>>>>> +module_param_named(buffer_squeeze_duration_ms, > >>>>>>> + buffer_squeeze_duration_ms, int, 0644); > >>>>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms, > >>>>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is > >>>>>>> detected"); > >>>>>>> + > >>>>>>> +/* > >>>>>>> + * Callback received when the memory pressure is detected. > >>>>>>> + */ > >>>>>>> +static void reclaim_memory(struct xenbus_device *dev) > >>>>>>> +{ > >>>>>>> + struct backend_info *be = dev_get_drvdata(&dev->dev); > >>>>>>> + > >>>>>>> + be->blkif->buffer_squeeze_end = jiffies + > >>>>>>> + msecs_to_jiffies(buffer_squeeze_duration_ms); > >>>>>> > >>>>>> This callback might race with 'xen_blkbk_probe()'. The race could > >>>>>> result in > >>>>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it > >>>>>> links > >>>>>> 'be' to the 'dev'. Please _don't merge_ this patch now! > >>>>>> > >>>>>> I will do more test and share results. Meanwhile, if you have any > >>>>>> opinion, > >>>>>> please let me know. 
> >>> > >>> I reduced system memory and attached bunch of devices in short time so > >>> that > >>> memory pressure occurs while device attachments are ongoing. Under this > >>> circumstance, I was able to see the race. > >>> > >>>>> > >>>>> Not only '->blkif', but 'be' itself also coule be a NULL. As similar > >>>>> concurrency issues could be in other drivers in their way, I suggest to > >>>>> change > >>>>> the reclaim callback ('->reclaim_memory') to be called for each driver > >>>>> instead > >>>>> of each device. Then, each driver could be able to deal with its > >>>>> concurrency > >>>>> issues by itself. > >>>> > >>>> Hmm, I don't like that. This would need to be changed back in case we > >>>> add per-guest quota. > >>> > >>> Extending this callback in that way would be still not too hard. We > >>> could use > >>> the argument to the callback. I would keep the argument of the callback > >>> to > >>> 'struct device *' as is, and will add a comment saying 'NULL' value of the > >>> argument means every devices. As an example, xenbus would pass > >>> NULL-ending > >>> array of the device pointers that need to free its resources. > >>> > >>> After seeing this race, I am now also thinking it could be better to > >>> delegate > >>> detailed control of each device to its dri
Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
On Tue, 17 Dec 2019 12:39:15 +0100 "Roger Pau Monné" wrote: > On Mon, Dec 16, 2019 at 08:48:03PM +0100, SeongJae Park wrote: > > On on, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote: > > > > > On 16.12.19 17:15, SeongJae Park wrote: > > > > On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park > > > > wrote: > > > > > > > >> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park > > > >> wrote: > > > >> > > > >>> From: SeongJae Park > > > >>> > > > > [...] > > > >>> --- a/drivers/block/xen-blkback/xenbus.c > > > >>> +++ b/drivers/block/xen-blkback/xenbus.c > > > >>> @@ -824,6 +824,24 @@ static void frontend_changed(struct > > > >>> xenbus_device *dev, > > > >>> } > > > >>> > > > >>> > > > >>> +/* Once a memory pressure is detected, squeeze free page pools for a > > > >>> while. */ > > > >>> +static unsigned int buffer_squeeze_duration_ms = 10; > > > >>> +module_param_named(buffer_squeeze_duration_ms, > > > >>> + buffer_squeeze_duration_ms, int, 0644); > > > >>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms, > > > >>> +"Duration in ms to squeeze pages buffer when a memory pressure is > > > >>> detected"); > > > >>> + > > > >>> +/* > > > >>> + * Callback received when the memory pressure is detected. > > > >>> + */ > > > >>> +static void reclaim_memory(struct xenbus_device *dev) > > > >>> +{ > > > >>> + struct backend_info *be = dev_get_drvdata(&dev->dev); > > > >>> + > > > >>> + be->blkif->buffer_squeeze_end = jiffies + > > > >>> + msecs_to_jiffies(buffer_squeeze_duration_ms); > > > >> > > > >> This callback might race with 'xen_blkbk_probe()'. The race could > > > >> result in > > > >> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it > > > >> links > > > >> 'be' to the 'dev'. Please _don't merge_ this patch now! > > > >> > > > >> I will do more test and share results. Meanwhile, if you have any > > > >> opinion, > > > >> please let me know. 
> > > > I reduced system memory and attached bunch of devices in short time so that > > memory pressure occurs while device attachments are ongoing. Under this > > circumstance, I was able to see the race. > > > > > > > > > > Not only '->blkif', but 'be' itself also coule be a NULL. As similar > > > > concurrency issues could be in other drivers in their way, I suggest to > > > > change > > > > the reclaim callback ('->reclaim_memory') to be called for each driver > > > > instead > > > > of each device. Then, each driver could be able to deal with its > > > > concurrency > > > > issues by itself. > > > > > > Hmm, I don't like that. This would need to be changed back in case we > > > add per-guest quota. > > > > Extending this callback in that way would be still not too hard. We could > > use > > the argument to the callback. I would keep the argument of the callback to > > 'struct device *' as is, and will add a comment saying 'NULL' value of the > > argument means every devices. As an example, xenbus would pass NULL-ending > > array of the device pointers that need to free its resources. > > > > After seeing this race, I am now also thinking it could be better to > > delegate > > detailed control of each device to its driver, as some drivers have some > > complicated and unique relation with its devices. > > > > > > > > Wouldn't a get_device() before calling the callback and a put_device() > > > afterwards avoid that problem? > > > > I didn't used the reference count manipulation operations because other > > similar > > parts also didn't. But, if there is no implicit reference count guarantee, > > it > > seems those operations are indeed necessary. > > > > That said, as get/put operations only adjust the reference count, those will > > not make the callback
[Xen-devel] [PATCH v11 1/6] xenbus/backend: Add memory pressure handler callback
From: SeongJae Park

Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility.

To mitigate such problems, this commit adds a memory reclaim callback to 'xenbus_driver'. If memory pressure is detected, 'xenbus' requests every backend driver to voluntarily release its memory.

Note that the callback facility could be improved for more sophisticated handling of general pressure. For example, it would be possible to monitor the memory consumption of each device and issue the release requests only to the devices causing the pressure. Also, the callback could be extended to handle not only memory, but general resources. Nevertheless, this version of the implementation defers such sophisticated goals as future work.

Reviewed-by: Juergen Gross
Reviewed-by: Roger Pau Monné
Signed-off-by: SeongJae Park
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 32 +++
 include/xen/xenbus.h                      |  1 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c
index b0bed4faf44c..7e78ebef7c54 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -248,6 +248,35 @@ static int backend_probe_and_watch(struct notifier_block *notifier,
 	return NOTIFY_DONE;
 }
 
+static int backend_reclaim_memory(struct device *dev, void *data)
+{
+	const struct xenbus_driver *drv;
+
+	if (!dev->driver)
+		return 0;
+	drv = to_xenbus_driver(dev->driver);
+	if (drv && drv->reclaim_memory)
+		drv->reclaim_memory(to_xenbus_device(dev));
+	return 0;
+}
+
+/*
+ * Returns 0 always because we are using shrinker to only detect memory
+ * pressure.
+ */
+static unsigned long backend_shrink_memory_count(struct shrinker *shrinker,
+				struct shrink_control *sc)
+{
+	bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
+			backend_reclaim_memory);
+	return 0;
+}
+
+static struct shrinker backend_memory_shrinker = {
+	.count_objects = backend_shrink_memory_count,
+	.seeks = DEFAULT_SEEKS,
+};
+
 static int __init xenbus_probe_backend_init(void)
 {
 	static struct notifier_block xenstore_notifier = {
@@ -264,6 +293,9 @@ static int __init xenbus_probe_backend_init(void)
 
 	register_xenstore_notifier(&xenstore_notifier);
 
+	if (register_shrinker(&backend_memory_shrinker))
+		pr_warn("shrinker registration failed\n");
+
 	return 0;
 }
 subsys_initcall(xenbus_probe_backend_init);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..c861cfb6f720 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,7 @@ struct xenbus_driver {
 	struct device_driver driver;
 	int (*read_otherend_details)(struct xenbus_device *dev);
 	int (*is_ready)(struct xenbus_device *dev);
+	void (*reclaim_memory)(struct xenbus_device *dev);
 };
 
 static inline struct xenbus_driver *to_xenbus_driver(struct device_driver *drv)
-- 
2.17.1
[Xen-devel] [PATCH v11 0/6] xenbus/backend: Add a memory pressure handler callback
Granting pages consumes backend system memory. In systems configured with insufficient spare memory for those pages, it can cause a memory pressure situation. However, finding the optimal amount of the spare memory is challenging for large systems having dynamic resource utilization patterns. Also, such a static configuration might lack flexibility.

To mitigate such problems, this patchset adds a memory reclaim callback to 'xenbus_driver' (patch 1) and then introduces a lock for race condition avoidance (patch 2). Those two patches could be merged into one patch if necessary. The third patch applies the callback mechanism to mitigate the problem in 'xen-blkback' (patch 3), but it lacks use of the race condition mitigation. The following change (patch 4) applies the race protection mechanism to the blkback. Patches 3 and 4 have been separated only for review convenience; I highly recommend merging those into one patch, as the version with only patch 3 applied might confuse bisection.

The fifth and sixth patches are trivial cleanups; those fix nits we found during the development of this patchset.

Note that patches 1, 3, 5, and 6 are the same as in the previous version. I made the changes in this version as separate commits (only the second and fourth patches) to make review more comfortable. Especially, the third and fourth patches should be merged into one patch, as the third one alone might confuse bisection. The next version of this patchset will also merge those.

Base Version
------------

This patch is based on v5.4.
A complete tree is also available at my public git repo:
https://github.com/sjp38/linux/tree/patches/blkback/buffer_squeeze/v11

Patch History
-------------

Changes from v10
(https://lore.kernel.org/xen-devel/20191216124527.30306-1-sjp...@amazon.com/)
- Fix race condition (reported by SeongJae, suggested by Juergen)

Changes from v9
(https://lore.kernel.org/xen-devel/20191213153546.17425-1-sjp...@amazon.de/)
- Add 'Reviewed-by' and 'Acked-by' from Roger Pau Monné
- Update the commit message for overhead test of the 2nd patch

Changes from v8
(https://lore.kernel.org/xen-devel/20191213130211.24011-1-sjp...@amazon.de/)
- Drop 'Reviewed-by: Juergen' from the second patch
  (suggested by Roger Pau Monné)
- Update contact of the new module param to SeongJae Park
  (suggested by Roger Pau Monné)
- Wordsmith the description of the parameter
  (suggested by Roger Pau Monné)
- Fix dumb bugs (suggested by Roger Pau Monné)
- Move module param definition to xenbus.c and reduce the number of
  lines for this change (suggested by Roger Pau Monné)
- Add a comment for the new callback, reclaim_memory, as other
  callbacks also have
- Add another trivial cleanup of xenbus.c file (4th patch)

Changes from v7
(https://lore.kernel.org/xen-devel/20191211181016.14366-1-sjp...@amazon.de/)
- Update sysfs-driver-xen-blkback for new parameter
  (suggested by Roger Pau Monné)
- Use per-xen_blkif buffer_squeeze_end instead of global variable
  (suggested by Roger Pau Monné)

Changes from v6
(https://lore.kernel.org/linux-block/20191211042428.5961-1-sjp...@amazon.de/)
- Remove more unnecessary prefixes (suggested by Roger Pau Monné)
- Constify a variable (suggested by Roger Pau Monné)
- Rename 'reclaim' into 'reclaim_memory' (suggested by Roger Pau Monné)
- More wordsmith of the commit message (suggested by Roger Pau Monné)

Changes from v5
(https://lore.kernel.org/linux-block/20191210080628.5264-1-sjp...@amazon.de/)
- Wordsmith the commit messages (suggested by Roger Pau Monné)
- Change the reclaim callback return type (suggested by Roger Pau Monné)
- Change the type of the blkback squeeze duration variable
  (suggested by Roger Pau Monné)
- Add a patch for removal of unnecessary static variable name prefixes
  (suggested by Roger Pau Monné)
- Fix checkpatch.pl warnings

Changes from v4
(https://lore.kernel.org/xen-devel/20191209194305.20828-1-sjp...@amazon.com/)
- Remove domain id parameter from the callback
  (suggested by Juergen Gross)
- Rename xen-blkback module parameter
  (suggested by Stefan Nuernburger)

Changes from v3
(https://lore.kernel.org/xen-devel/20191209085839.21215-1-sjp...@amazon.com/)
- Add general callback in xen_driver and use it
  (suggested by Juergen Gross)

Changes from v2
(https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34...@amazon.com)
- Rename the module parameter and variables for brevity
  (aggressive shrinking -> squeezing)

Changes from v1
(https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjp...@amazon.com/)
- Adjust the description to not use the term `arbitrarily`
  (suggested by Paul Durrant)
- Specify time unit of the duration in the parameter description
  (suggested by Maximilian Heyne)
- Change default aggressive shrinking duration from 1ms to 10ms
- Merge two patches into one single patch

SeongJae Park (6):
  xenbus/backend: Add memory
[Xen-devel] [PATCH v11 3/6] xen/blkback: Squeeze page pools if a memory pressure is detected
From: SeongJae Park

Each `blkif` has a free pages pool for the grant mapping. The size of the pool starts from zero and is increased on demand while processing the I/O requests. If the handling of the current I/O requests is finished or 100 milliseconds have passed since the last I/O request handling, it checks and shrinks the pool so as not to exceed the size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by attaching a large number of block devices and inducing I/O. Such problematic situations can be avoided by limiting the maximum number of devices that can be attached, but finding the optimal limit is not so easy. An improper setting of the limit can result in memory pressure or resource underutilization. This commit avoids such problematic situations by squeezing the pools (returning every free page in the pool to the system) for a while (users can set this duration via a module parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only those pages in the pool which are not currently used by `blkback`; in other words, the pages that are not mapped with granted pages. Because this commit changes only the shrink limit and still uses the same freeing mechanism, it does not touch pages which are currently mapping grants.

Once memory pressure is detected, this commit keeps the squeezing limit for a user-specified time duration. The duration should be neither too long nor too short. If it is too long, the overhead incurred by the squeezing can reduce the I/O performance. If it is too short, `blkback` will not free enough pages to reduce the memory pressure. This commit sets the value to `10 milliseconds` by default because it is a short time in terms of I/O while being a long time in terms of memory operations. Also, as the original shrinking mechanism works at least every 100 milliseconds, this could be a somewhat reasonable choice.
I also tested other durations (refer to the section below for more details) and confirmed that 10 milliseconds is the one that works best with the test. That said, the proper duration depends on actual configurations and workloads. That's why this commit allows users to set the duration as a module parameter.

Memory Pressure Test
====================

To show how well this commit fixes the memory pressure situation, I configured a test environment on a Xen-running virtualization system. On the `blkfront` running guest instances, I attach a large number of network-backed volume devices and induce I/O to those. Meanwhile, I measure the number of pages swapped in (pswpin) and out (pswpout) on the `blkback` running guest. The test ran twice, once for the `blkback` before this commit and once for that after this commit. As shown below, this commit has dramatically reduced the memory pressure:

        pswpin  pswpout
before  76,672  185,799
after      212    3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three different durations (1ms, 10ms, and 100ms). The results are as below:

duration  pswpin  pswpout
       1     852    6,424
      10     212    3,325
     100     203    3,340

As expected, the memory pressure decreased as the duration increased, but the reduction stopped at `10ms`. Based on these results, I chose the default duration as 10ms.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory pressure because the squeezing will require more page allocations per I/O. To show the overhead, I artificially made a worst-case squeezing situation and measured the I/O performance of a `blkfront` running guest.

For the artificial squeezing, I set the `blkback.max_buffer_pages` using the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In this test, I set the value to `1024` and `0`. The `1024` is the default value. Setting the value to `0` is the same as a situation in which the squeezing is always done (worst case).
If the underlying block device is slow enough, the squeezing overhead
could be hidden. For that reason, I use a fast block device, namely the
rbd[1]:

    # xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

    $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
          bs=4k count=$((256*512)); sync; done

The results are as below. 'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

    max_pgs  Min  Max  Median  Avg    Stddev
    0        417  423  420     419.4  2.5099801
    1024     414  425  416     417.8  4.4384682
    No difference proven at 95.0% confidence

In sh
[Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock
From: SeongJae Park 'reclaim_memory' callback can race with a driver code as this callback will be called from any memory pressure detected context. To deal with the case, this commit adds a spinlock in the 'xenbus_device'. Whenever 'reclaim_memory' callback is called, the lock of the device which passed to the callback as its argument is locked. Thus, drivers registering their 'reclaim_memory' callback should protect the data that might race with the callback with the lock by themselves. Signed-off-by: SeongJae Park --- drivers/xen/xenbus/xenbus_probe.c | 1 + drivers/xen/xenbus/xenbus_probe_backend.c | 10 -- include/xen/xenbus.h | 2 ++ 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 5b471889d723..b86393f172e6 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus, goto fail; dev_set_name(&xendev->dev, "%s", devname); + spin_lock_init(&xendev->reclaim_lock); /* Register with generic device framework. 
*/ err = device_register(&xendev->dev); diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c index 7e78ebef7c54..516aa64b9967 100644 --- a/drivers/xen/xenbus/xenbus_probe_backend.c +++ b/drivers/xen/xenbus/xenbus_probe_backend.c @@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct notifier_block *notifier, static int backend_reclaim_memory(struct device *dev, void *data) { const struct xenbus_driver *drv; + struct xenbus_device *xdev; + unsigned long flags; if (!dev->driver) return 0; drv = to_xenbus_driver(dev->driver); - if (drv && drv->reclaim_memory) - drv->reclaim_memory(to_xenbus_device(dev)); + if (drv && drv->reclaim_memory) { + xdev = to_xenbus_device(dev); + spin_trylock_irqsave(&xdev->reclaim_lock, flags); + drv->reclaim_memory(xdev); + spin_unlock_irqrestore(&xdev->reclaim_lock, flags); + } return 0; } diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index c861cfb6f720..d9468313061d 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -76,6 +76,8 @@ struct xenbus_device { enum xenbus_state state; struct completion down; struct work_struct work; + /* 'reclaim_memory' callback is called while this lock is acquired */ + spinlock_t reclaim_lock; }; static inline struct xenbus_device *to_xenbus_device(struct device *dev) -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v11 4/6] xen/blkback: Protect 'reclaim_memory()' with 'reclaim_lock'
From: SeongJae Park The 'reclaim_memory()' callback of blkback could race with 'xen_blkbk_probe()' and 'xen_blkbk_remove()'. In the case, incompletely linked 'backend_info' and 'blkif' might be exposed to the callback, thus result in bad results including NULL dereference. This commit fixes the problem by applying the 'reclaim_lock' protection to those. Note that this commit is separated for review purpose only. As the previous commit might result in race condition and might make bisect confuse, please squash this commit into previous commit if possible. Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/xenbus.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 4f6ea4feca79..20045827a391 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -492,6 +492,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle, static int xen_blkbk_remove(struct xenbus_device *dev) { struct backend_info *be = dev_get_drvdata(&dev->dev); + unsigned long flags; pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id); @@ -504,6 +505,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev) be->backend_watch.node = NULL; } + spin_lock_irqsave(&dev->reclaim_lock, flags); dev_set_drvdata(&dev->dev, NULL); if (be->blkif) { @@ -512,6 +514,7 @@ static int xen_blkbk_remove(struct xenbus_device *dev) /* Put the reference we set in xen_blkif_alloc(). 
*/ xen_blkif_put(be->blkif); } + spin_unlock_irqrestore(&dev->reclaim_lock, flags); return 0; } @@ -597,6 +600,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev, int err; struct backend_info *be = kzalloc(sizeof(struct backend_info), GFP_KERNEL); + unsigned long flags; /* match the pr_debug in xen_blkbk_remove */ pr_debug("%s %p %d\n", __func__, dev, dev->otherend_id); @@ -607,6 +611,7 @@ static int xen_blkbk_probe(struct xenbus_device *dev, return -ENOMEM; } be->dev = dev; + spin_lock_irqsave(&dev->reclaim_lock, flags); dev_set_drvdata(&dev->dev, be); be->blkif = xen_blkif_alloc(dev->otherend_id); @@ -614,8 +619,10 @@ static int xen_blkbk_probe(struct xenbus_device *dev, err = PTR_ERR(be->blkif); be->blkif = NULL; xenbus_dev_fatal(dev, err, "creating block interface"); + spin_unlock_irqrestore(&dev->reclaim_lock, flags); goto fail; } + spin_unlock_irqrestore(&dev->reclaim_lock, flags); err = xenbus_printf(XBT_NIL, dev->nodename, "feature-max-indirect-segments", "%u", @@ -838,6 +845,10 @@ static void reclaim_memory(struct xenbus_device *dev) { struct backend_info *be = dev_get_drvdata(&dev->dev); + /* Device is registered but not probed yet */ + if (!be) + return; + be->blkif->buffer_squeeze_end = jiffies + msecs_to_jiffies(buffer_squeeze_duration_ms); } -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
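The `if (!be) return;` guard added at the end of this patch exists because `reclaim_memory()` can run between device registration and probe, when drvdata is still NULL. A minimal user-space sketch of that guard, with hypothetical `fake_`-prefixed types standing in for `backend_info` and `xen_blkif` (this is not the kernel code itself):

```c
#include <assert.h>
#include <stddef.h>

struct fake_blkif {
	unsigned long buffer_squeeze_end;
};

struct fake_backend {
	struct fake_blkif *blkif;
};

/* Models reclaim_memory(): drvdata ('be') may be NULL if the device is
 * registered but not probed yet, so it must be checked before use.
 * Returns 1 if the squeeze window was actually armed, 0 otherwise. */
static int fake_reclaim_memory(struct fake_backend *be,
			       unsigned long now, unsigned long duration)
{
	if (!be)	/* device is registered but not probed yet */
		return 0;
	be->blkif->buffer_squeeze_end = now + duration;
	return 1;
}
```

Without the guard, the same call with a NULL `be` would dereference it; with the guard, reclaim during that window is simply a no-op, which is safe because an unprobed backend holds no buffer pages anyway.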
[Xen-devel] [PATCH v11 5/6] xen/blkback: Remove unnecessary static variable name prefixes
From: SeongJae Park A few of static variables in blkback have 'xen_blkif_' prefix, though it is unnecessary for static variables. This commit removes such prefixes. Reviewed-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/blkback.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 79f677aeb5cc..fbd67f8e4e4e 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -62,8 +62,8 @@ * IO workloads. */ -static int xen_blkif_max_buffer_pages = 1024; -module_param_named(max_buffer_pages, xen_blkif_max_buffer_pages, int, 0644); +static int max_buffer_pages = 1024; +module_param_named(max_buffer_pages, max_buffer_pages, int, 0644); MODULE_PARM_DESC(max_buffer_pages, "Maximum number of free pages to keep in each block backend buffer"); @@ -78,8 +78,8 @@ MODULE_PARM_DESC(max_buffer_pages, * algorithm. */ -static int xen_blkif_max_pgrants = 1056; -module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644); +static int max_pgrants = 1056; +module_param_named(max_persistent_grants, max_pgrants, int, 0644); MODULE_PARM_DESC(max_persistent_grants, "Maximum number of grants to map persistently"); @@ -88,8 +88,8 @@ MODULE_PARM_DESC(max_persistent_grants, * use. The time is in seconds, 0 means indefinitely long. 
*/ -static unsigned int xen_blkif_pgrant_timeout = 60; -module_param_named(persistent_grant_unused_seconds, xen_blkif_pgrant_timeout, +static unsigned int pgrant_timeout = 60; +module_param_named(persistent_grant_unused_seconds, pgrant_timeout, uint, 0644); MODULE_PARM_DESC(persistent_grant_unused_seconds, "Time in seconds an unused persistent grant is allowed to " @@ -137,9 +137,8 @@ module_param(log_stats, int, 0644); static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt) { - return xen_blkif_pgrant_timeout && - (jiffies - persistent_gnt->last_used >= - HZ * xen_blkif_pgrant_timeout); + return pgrant_timeout && (jiffies - persistent_gnt->last_used >= + HZ * pgrant_timeout); } static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page) @@ -234,7 +233,7 @@ static int add_persistent_gnt(struct xen_blkif_ring *ring, struct persistent_gnt *this; struct xen_blkif *blkif = ring->blkif; - if (ring->persistent_gnt_c >= xen_blkif_max_pgrants) { + if (ring->persistent_gnt_c >= max_pgrants) { if (!blkif->vbd.overflow_max_grants) blkif->vbd.overflow_max_grants = 1; return -EBUSY; @@ -397,14 +396,13 @@ static void purge_persistent_gnt(struct xen_blkif_ring *ring) goto out; } - if (ring->persistent_gnt_c < xen_blkif_max_pgrants || - (ring->persistent_gnt_c == xen_blkif_max_pgrants && + if (ring->persistent_gnt_c < max_pgrants || + (ring->persistent_gnt_c == max_pgrants && !ring->blkif->vbd.overflow_max_grants)) { num_clean = 0; } else { - num_clean = (xen_blkif_max_pgrants / 100) * LRU_PERCENT_CLEAN; - num_clean = ring->persistent_gnt_c - xen_blkif_max_pgrants + - num_clean; + num_clean = (max_pgrants / 100) * LRU_PERCENT_CLEAN; + num_clean = ring->persistent_gnt_c - max_pgrants + num_clean; num_clean = min(ring->persistent_gnt_c, num_clean); pr_debug("Going to purge at least %u persistent grants\n", num_clean); @@ -599,8 +597,7 @@ static void print_stats(struct xen_blkif_ring *ring) current->comm, ring->st_oo_req, ring->st_rd_req, 
ring->st_wr_req, ring->st_f_req, ring->st_ds_req, -ring->persistent_gnt_c, -xen_blkif_max_pgrants); +ring->persistent_gnt_c, max_pgrants); ring->st_print = jiffies + msecs_to_jiffies(10 * 1000); ring->st_rd_req = 0; ring->st_wr_req = 0; @@ -660,7 +657,7 @@ int xen_blkif_schedule(void *arg) if (time_before(jiffies, blkif->buffer_squeeze_end)) shrink_free_pagepool(ring, 0); else - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); + shrink_free_pagepool(ring, max_buffer_pages); if (log_stats && time_after(jiffies, ring->st_print)) print_stats(ring); @@ -887,7 +884,7 @@ static int xen_blkbk_map(struct xen_blkif_ring *
[Xen-devel] [PATCH v11 6/6] xen/blkback: Consistently insert one empty line between functions
From: SeongJae Park The number of empty lines between functions in the xenbus.c is inconsistent. This trivial style cleanup commit fixes the file to consistently place only one empty line. Acked-by: Roger Pau Monné Signed-off-by: SeongJae Park --- drivers/block/xen-blkback/xenbus.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 20045827a391..453f97dd533d 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -432,7 +432,6 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev) device_remove_file(&dev->dev, &dev_attr_physical_device); } - static void xen_vbd_free(struct xen_vbd *vbd) { if (vbd->bdev) @@ -489,6 +488,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle, handle, blkif->domid); return 0; } + static int xen_blkbk_remove(struct xenbus_device *dev) { struct backend_info *be = dev_get_drvdata(&dev->dev); @@ -575,6 +575,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info if (err) dev_warn(&dev->dev, "writing feature-discard (%d)", err); } + int xen_blkbk_barrier(struct xenbus_transaction xbt, struct backend_info *be, int state) { @@ -663,7 +664,6 @@ static int xen_blkbk_probe(struct xenbus_device *dev, return err; } - /* * Callback received when the hotplug scripts have placed the physical-device * node. Read it and the mode node, and create a vbd. If the frontend is @@ -755,7 +755,6 @@ static void backend_changed(struct xenbus_watch *watch, } } - /* * Callback received when the frontend's state changes. */ @@ -830,7 +829,6 @@ static void frontend_changed(struct xenbus_device *dev, } } - /* Once a memory pressure is detected, squeeze free page pools for a while. 
*/ static unsigned int buffer_squeeze_duration_ms = 10; module_param_named(buffer_squeeze_duration_ms, @@ -855,7 +853,6 @@ static void reclaim_memory(struct xenbus_device *dev) /* ** Connection ** */ - /* * Write the physical details regarding the block device to the store, and * switch to Connected state. -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a memory pressure is detected
On Tue, 17 Dec 2019 09:30:32 +0100 SeongJae Park wrote: > On Tue, 17 Dec 2019 09:16:47 +0100 "Jürgen Groß" wrote: > > > On 17.12.19 08:59, SeongJae Park wrote: > > > On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß" wrote: > > > > > >> On 16.12.19 20:48, SeongJae Park wrote: > > >>> On on, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote: > > >>> > > >>>> On 16.12.19 17:15, SeongJae Park wrote: > > >>>>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park > > >>>>> wrote: > > >>>>> > > >>>>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park > > >>>>>> wrote: > > >>>>>> > > >>>>>>> From: SeongJae Park > > >>>>>>> > > >>>>> [...] > > >>>>>>> --- a/drivers/block/xen-blkback/xenbus.c > > >>>>>>> +++ b/drivers/block/xen-blkback/xenbus.c > > >>>>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct > > >>>>>>> xenbus_device *dev, > > >>>>>>> } > > >>>>>>> > > >>>>>>> > > >>>>>>> +/* Once a memory pressure is detected, squeeze free page pools for > > >>>>>>> a while. */ > > >>>>>>> +static unsigned int buffer_squeeze_duration_ms = 10; > > >>>>>>> +module_param_named(buffer_squeeze_duration_ms, > > >>>>>>> + buffer_squeeze_duration_ms, int, 0644); > > >>>>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms, > > >>>>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is > > >>>>>>> detected"); > > >>>>>>> + > > >>>>>>> +/* > > >>>>>>> + * Callback received when the memory pressure is detected. > > >>>>>>> + */ > > >>>>>>> +static void reclaim_memory(struct xenbus_device *dev) > > >>>>>>> +{ > > >>>>>>> + struct backend_info *be = dev_get_drvdata(&dev->dev); > > >>>>>>> + > > >>>>>>> + be->blkif->buffer_squeeze_end = jiffies + > > >>>>>>> + msecs_to_jiffies(buffer_squeeze_duration_ms); > > >>>>>> > > >>>>>> This callback might race with 'xen_blkbk_probe()'. The race could > > >>>>>> result in > > >>>>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after > > >>>>>> it links > > >>>>>> 'be' to the 'dev'. Please _don't merge_ this patch now! 
> > >>>>>> > > >>>>>> I will do more test and share results. Meanwhile, if you have any > > >>>>>> opinion, > > >>>>>> please let me know. > > >>> > > >>> I reduced system memory and attached bunch of devices in short time so > > >>> that > > >>> memory pressure occurs while device attachments are ongoing. Under this > > >>> circumstance, I was able to see the race. > > >>> > > >>>>> > > >>>>> Not only '->blkif', but 'be' itself also coule be a NULL. As similar > > >>>>> concurrency issues could be in other drivers in their way, I suggest > > >>>>> to change > > >>>>> the reclaim callback ('->reclaim_memory') to be called for each > > >>>>> driver instead > > >>>>> of each device. Then, each driver could be able to deal with its > > >>>>> concurrency > > >>>>> issues by itself. > > >>>> > > >>>> Hmm, I don't like that. This would need to be changed back in case we > > >>>> add per-guest quota. > > >>> > > >>> Extending this callback in that way would be still not too hard. We > > >>> could use > > >>> the argument to the callback. I would keep the argument of the > > >>> callback to > > >>> 'struct device *' as is, and wi
Re: [Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock
On Tue, 17 Dec 2019 17:13:42 +0100 "Jürgen Groß" wrote: > On 17.12.19 17:07, SeongJae Park wrote: > > From: SeongJae Park > > > > 'reclaim_memory' callback can race with a driver code as this callback > > will be called from any memory pressure detected context. To deal with > > the case, this commit adds a spinlock in the 'xenbus_device'. Whenever > > 'reclaim_memory' callback is called, the lock of the device which passed > > to the callback as its argument is locked. Thus, drivers registering > > their 'reclaim_memory' callback should protect the data that might race > > with the callback with the lock by themselves. > > > > Signed-off-by: SeongJae Park > > --- > > drivers/xen/xenbus/xenbus_probe.c | 1 + > > drivers/xen/xenbus/xenbus_probe_backend.c | 10 -- > > include/xen/xenbus.h | 2 ++ > > 3 files changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/xen/xenbus/xenbus_probe.c > > b/drivers/xen/xenbus/xenbus_probe.c > > index 5b471889d723..b86393f172e6 100644 > > --- a/drivers/xen/xenbus/xenbus_probe.c > > +++ b/drivers/xen/xenbus/xenbus_probe.c > > @@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus, > > goto fail; > > > > dev_set_name(&xendev->dev, "%s", devname); > > + spin_lock_init(&xendev->reclaim_lock); > > > > /* Register with generic device framework. 
*/ > > err = device_register(&xendev->dev); > > diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c > > b/drivers/xen/xenbus/xenbus_probe_backend.c > > index 7e78ebef7c54..516aa64b9967 100644 > > --- a/drivers/xen/xenbus/xenbus_probe_backend.c > > +++ b/drivers/xen/xenbus/xenbus_probe_backend.c > > @@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct > > notifier_block *notifier, > > static int backend_reclaim_memory(struct device *dev, void *data) > > { > > const struct xenbus_driver *drv; > > + struct xenbus_device *xdev; > > + unsigned long flags; > > > > if (!dev->driver) > > return 0; > > drv = to_xenbus_driver(dev->driver); > > - if (drv && drv->reclaim_memory) > > - drv->reclaim_memory(to_xenbus_device(dev)); > > + if (drv && drv->reclaim_memory) { > > + xdev = to_xenbus_device(dev); > > + spin_trylock_irqsave(&xdev->reclaim_lock, flags); > > You need spin_lock_irqsave() here. Or maybe spin_lock() would be fine, > too? I can't see a reason why you'd want to disable irqs here. I needed to diable irq here as this is called from the memory shrinker context. Also, used 'trylock' because the 'probe()' and 'remove()' code of the driver might include memory allocation. And the xen-blkback actually does. If the allocation shows a memory pressure during the allocation, it will trigger this shrinker callback again and then deadlock. 
Thanks, SeongJae Park > > > + drv->reclaim_memory(xdev); > > + spin_unlock_irqrestore(&xdev->reclaim_lock, flags); > > + } > > return 0; > > } > > > > diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h > > index c861cfb6f720..d9468313061d 100644 > > --- a/include/xen/xenbus.h > > +++ b/include/xen/xenbus.h > > @@ -76,6 +76,8 @@ struct xenbus_device { > > enum xenbus_state state; > > struct completion down; > > struct work_struct work; > > + /* 'reclaim_memory' callback is called while this lock is acquired */ > > + spinlock_t reclaim_lock; > > }; > > > > static inline struct xenbus_device *to_xenbus_device(struct device *dev) > > > > > Juergen > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 1/3] xen/blkback: Squeeze page pools if a memory pressure is detected
From: SeongJae Park

I thought it would be better to review separated patches, but it seems
that was my mistake. As Juergen asked, I merged them again and post here.
Also, dropped Roger's reviewed-by.

Thanks,
SeongJae Park

>8 ---
Subject: [PATCH 1/3] xen/blkback: Squeeze page pools if a memory pressure is detected

Each `blkif` has a free pages pool for the grant mapping. The size of the
pool starts from zero and is increased on demand while processing the I/O
requests. If the current I/O requests handling is finished or 100
milliseconds has passed since the last I/O requests handling, it checks and
shrinks the pool so as not to exceed the size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O. Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not so
easy. An improperly set limit can result in memory pressure or resource
underutilization. This commit avoids such problematic situations by
squeezing the pools (returning every free page in the pool to the system)
for a while (users can set this duration via a module parameter) if memory
pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns to the system only the
pages in the pool which are not currently used by `blkback`; in other
words, the pages that are not mapped with granted pages. Because this
commit changes only the shrink limit but still uses the same freeing
mechanism, it does not touch pages which are currently mapping grants.

Once memory pressure is detected, this commit keeps the squeezing limit for
a user-specified time duration. The duration should be neither too long
nor too short. If it is too long, the overhead incurred by the squeezing
can reduce the I/O performance. If it is too short, `blkback` will not
free enough pages to reduce the memory pressure.
This commit sets the value to `10 milliseconds` by default because it is a
short time in terms of I/O while it is a long time in terms of memory
operations. Also, as the original shrinking mechanism works at least
every 100 milliseconds, this could be a somewhat reasonable choice. I
also tested other durations (refer to the section below for more details)
and confirmed that 10 milliseconds is the one that works best with the
test. That said, the proper duration depends on actual configurations and
workloads. That's why this commit allows users to set the duration as a
module parameter.

Memory Pressure Test
====================

To show how well this commit fixes the memory pressure situation, I
configured a test environment on a Xen-running virtualization system. On
the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those. Meanwhile, I
measure the number of pages swapped in (pswpin) and out (pswpout) on the
`blkback` running guest. The test ran twice, once for the `blkback`
before this commit and once for that after this commit. As shown below,
this commit has dramatically reduced the memory pressure:

            pswpin   pswpout
    before  76,672   185,799
    after      212     3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms). The results are as below:

    duration  pswpin  pswpout
    1            852    6,424
    10           212    3,325
    100          203    3,340

As expected, the memory pressure has decreased as the duration increased,
but the reduction stopped from the `10ms`. Based on these results, I
chose the default duration as 10ms.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O. To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running guest.
For the artificial squeezing, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In this
test, I set the value to `1024` and `0`. The `1024` is the default value.
Setting the value as `0` is the same as a situation doing the squeezing
always (worst-case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden. For that reason, I use a fast block device, namely the
rbd[1]:

    # xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

    $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
          bs=4k count=$((256*51
Re: [Xen-devel] [PATCH v11 2/6] xenbus/backend: Protect xenbus callback with lock
On Tue, 17 Dec 2019 18:10:19 +0100 "Jürgen Groß" wrote: > On 17.12.19 17:24, SeongJae Park wrote: > > On Tue, 17 Dec 2019 17:13:42 +0100 "Jürgen Groß" wrote: > > > >> On 17.12.19 17:07, SeongJae Park wrote: > >>> From: SeongJae Park > >>> > >>> 'reclaim_memory' callback can race with a driver code as this callback > >>> will be called from any memory pressure detected context. To deal with > >>> the case, this commit adds a spinlock in the 'xenbus_device'. Whenever > >>> 'reclaim_memory' callback is called, the lock of the device which passed > >>> to the callback as its argument is locked. Thus, drivers registering > >>> their 'reclaim_memory' callback should protect the data that might race > >>> with the callback with the lock by themselves. > >>> > >>> Signed-off-by: SeongJae Park > >>> --- > >>>drivers/xen/xenbus/xenbus_probe.c | 1 + > >>>drivers/xen/xenbus/xenbus_probe_backend.c | 10 -- > >>>include/xen/xenbus.h | 2 ++ > >>>3 files changed, 11 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/xen/xenbus/xenbus_probe.c > >>> b/drivers/xen/xenbus/xenbus_probe.c > >>> index 5b471889d723..b86393f172e6 100644 > >>> --- a/drivers/xen/xenbus/xenbus_probe.c > >>> +++ b/drivers/xen/xenbus/xenbus_probe.c > >>> @@ -472,6 +472,7 @@ int xenbus_probe_node(struct xen_bus_type *bus, > >>> goto fail; > >>> > >>> dev_set_name(&xendev->dev, "%s", devname); > >>> + spin_lock_init(&xendev->reclaim_lock); > >>> > >>> /* Register with generic device framework. 
*/ > >>> err = device_register(&xendev->dev); > >>> diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c > >>> b/drivers/xen/xenbus/xenbus_probe_backend.c > >>> index 7e78ebef7c54..516aa64b9967 100644 > >>> --- a/drivers/xen/xenbus/xenbus_probe_backend.c > >>> +++ b/drivers/xen/xenbus/xenbus_probe_backend.c > >>> @@ -251,12 +251,18 @@ static int backend_probe_and_watch(struct > >>> notifier_block *notifier, > >>>static int backend_reclaim_memory(struct device *dev, void *data) > >>>{ > >>> const struct xenbus_driver *drv; > >>> + struct xenbus_device *xdev; > >>> + unsigned long flags; > >>> > >>> if (!dev->driver) > >>> return 0; > >>> drv = to_xenbus_driver(dev->driver); > >>> - if (drv && drv->reclaim_memory) > >>> - drv->reclaim_memory(to_xenbus_device(dev)); > >>> + if (drv && drv->reclaim_memory) { > >>> + xdev = to_xenbus_device(dev); > >>> + spin_trylock_irqsave(&xdev->reclaim_lock, flags); > >> > >> You need spin_lock_irqsave() here. Or maybe spin_lock() would be fine, > >> too? I can't see a reason why you'd want to disable irqs here. > > > > I needed to diable irq here as this is called from the memory shrinker > > context. > > Okay. > > > > > Also, used 'trylock' because the 'probe()' and 'remove()' code of the driver > > might include memory allocation. And the xen-blkback actually does. If the > > allocation shows a memory pressure during the allocation, it will trigger > > this > > shrinker callback again and then deadlock. > > In that case you need to either return when you didn't get the lock or Yes, it should. Cannot believe how I posted this code. Seems I made some terrible mistake while formatting patches. Anyway, will return if fail to acquire the lock, in the next version. 
Thanks,
SeongJae Park

>
> - when obtaining the lock during probe() and remove() set a variable
>   containing the current cpu number
> - and reset that to e.g NR_CPUS before releasing the lock again
> - in the shrinker callback do trylock, and if you didn't get the lock
>   test whether the cpu-variable above is set to your current cpu and
>   continue only if yes; if not, redo the trylock
>
>
> Juergen
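The pattern this thread converges on — try to take the per-device lock and bail out when probe()/remove() already holds it, so that a shrinker re-entered from a memory allocation inside probe() cannot deadlock — can be sketched with a user-space mutex. This is an illustrative pthreads model, not the kernel spinlock code; the `fake_` names are hypothetical. Note the early return on trylock failure, which is exactly the fix agreed for the next version of the patch:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct fake_xenbus_device {
	pthread_mutex_t reclaim_lock;	/* models xenbus_device.reclaim_lock */
	int reclaim_calls;
};

/* Stands in for the driver's reclaim_memory() callback. */
static void fake_reclaim_memory(struct fake_xenbus_device *xdev)
{
	xdev->reclaim_calls++;
}

/* Models backend_reclaim_memory(): only run the callback if the lock
 * could be taken; if probe()/remove() holds it, skip this device rather
 * than block (blocking could deadlock when the shrinker is re-entered
 * from an allocation made under the lock). */
static bool backend_reclaim(struct fake_xenbus_device *xdev)
{
	if (pthread_mutex_trylock(&xdev->reclaim_lock) != 0)
		return false;	/* lock busy: skip reclaim for this device */
	fake_reclaim_memory(xdev);
	pthread_mutex_unlock(&xdev->reclaim_lock);
	return true;
}
```

Skipping one reclaim pass is harmless here because the callback only arms a time window; a later shrinker invocation will simply try again.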