Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-14 Thread Roger Pau Monné
On Thu, Jun 13, 2024 at 04:05:08PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 12, 2024 at 05:56:15PM +0200, Roger Pau Monné wrote:
> > Right.  AFAICT advertising "feature-barrier" and/or
> > "feature-flush-cache" could be done based on whether blkback
> > understand those commands, not on whether the underlying storage
> > supports the equivalent of them.
> > 
> > Worst case we can print a warning message once about the underlying
> > storage failing to complete flush/barrier requests, and that data
> > integrity might not be guaranteed going forward, and not propagate the
> > error to the upper layer?
> > 
> > What would be the consequence of propagating a flush error to the
> > upper layers?
> 
> If you propage the error to the upper layer you will generate an
> I/O error there, which usually leads to a file system shutdown.
> 
> > Given the description of the feature in the blkif header, I'm afraid
> > we cannot guarantee that seeing the feature exposed implies barrier or
> > flush support, since the request could fail at any time (or even from
> > the start of the disk attachment) and it would still sadly be a correct
> > implementation given the description of the options.
> 
> Well, then we could do something like the patch below, which keeps
> the existing behavior, but insolates the block layer from it and
> removes the only user of blk_queue_write_cache from interrupt
> context:

LGTM, I'm not sure there's much else we can do.

> ---
> From e6e82c769ab209a77302994c3829cf6ff7a595b8 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Thu, 30 May 2024 08:58:52 +0200
> Subject: xen-blkfront: don't disable cache flushes when they fail
> 
> blkfront always had a robust negotiation protocol for detecting a write
> cache.  Stop simply disabling cache flushes in the block layer as the
> flags handling is moving to the atomic queue limits API that needs
> user context to freeze the queue for that.  Instead handle the case
> of the feature flags cleared inside of blkfront.  This removes old
> debug code to check for such a mismatch which was previously impossible
> to hit, including the check for passthrough requests that blkfront
> never used to start with.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/xen-blkfront.c | 44 +++-
>  1 file changed, 23 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9b4ec3e4908cce..e2c92d5095ff17 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -788,6 +788,14 @@ static int blkif_queue_rw_req(struct request *req, 
> struct blkfront_ring_info *ri
>* A barrier request a superset of FUA, so we can
>* implement it the same way.  (It's also a FLUSH+FUA,
>* since it is guaranteed ordered WRT previous writes.)
> +  *
> +  * Note that can end up here with a FUA write and the
> +  * flags cleared.  This happens when the flag was
> +  * run-time disabled and raced with I/O submission in
> +  * the block layer.  We submit it as a normal write

Since blkfront no longer signals that FUA is no longer available for the
device, getting a request with FUA is not actually a race I think?

> +  * here.  A pure flush should never end up here with
> +  * the flags cleared as they are completed earlier for
> +  * the !feature_flush case.
>*/
>   if (info->feature_flush && info->feature_fua)
>   ring_req->operation =
> @@ -795,8 +803,6 @@ static int blkif_queue_rw_req(struct request *req, struct 
> blkfront_ring_info *ri
>   else if (info->feature_flush)
>   ring_req->operation =
>   BLKIF_OP_FLUSH_DISKCACHE;
> - else
> - ring_req->operation = 0;
>   }
>   ring_req->u.rw.nr_segments = num_grant;
>   if (unlikely(require_extra_req)) {
> @@ -887,16 +893,6 @@ static inline void flush_requests(struct 
> blkfront_ring_info *rinfo)
>   notify_remote_via_irq(rinfo->irq);
>  }
>  
> -static inline bool blkif_request_flush_invalid(struct request *req,
> -struct blkfront_info *info)
> -{
> - return (blk_rq_is_passthrough(req) ||
> - ((req_op(req) == REQ_OP_FLUSH) &&
> -  !info->feature_flush) ||
> - ((req->cmd_flags & REQ_FUA) &&
> -  !info->feature_fua));
> -}
> -
>  static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
> const struct blk_mq_queue_data *qd)
>  {
> @@ -908,23 +904,30 @@ static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx 
> *hc

Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-13 Thread Christoph Hellwig
On Wed, Jun 12, 2024 at 05:56:15PM +0200, Roger Pau Monné wrote:
> Right.  AFAICT advertising "feature-barrier" and/or
> "feature-flush-cache" could be done based on whether blkback
> understand those commands, not on whether the underlying storage
> supports the equivalent of them.
> 
> Worst case we can print a warning message once about the underlying
> storage failing to complete flush/barrier requests, and that data
> integrity might not be guaranteed going forward, and not propagate the
> error to the upper layer?
> 
> What would be the consequence of propagating a flush error to the
> upper layers?

If you propage the error to the upper layer you will generate an
I/O error there, which usually leads to a file system shutdown.

> Given the description of the feature in the blkif header, I'm afraid
> we cannot guarantee that seeing the feature exposed implies barrier or
> flush support, since the request could fail at any time (or even from
> the start of the disk attachment) and it would still sadly be a correct
> implementation given the description of the options.

Well, then we could do something like the patch below, which keeps
the existing behavior, but insolates the block layer from it and
removes the only user of blk_queue_write_cache from interrupt
context:

---
>From e6e82c769ab209a77302994c3829cf6ff7a595b8 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Thu, 30 May 2024 08:58:52 +0200
Subject: xen-blkfront: don't disable cache flushes when they fail

blkfront always had a robust negotiation protocol for detecting a write
cache.  Stop simply disabling cache flushes in the block layer as the
flags handling is moving to the atomic queue limits API that needs
user context to freeze the queue for that.  Instead handle the case
of the feature flags cleared inside of blkfront.  This removes old
debug code to check for such a mismatch which was previously impossible
to hit, including the check for passthrough requests that blkfront
never used to start with.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/xen-blkfront.c | 44 +++-
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9b4ec3e4908cce..e2c92d5095ff17 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -788,6 +788,14 @@ static int blkif_queue_rw_req(struct request *req, struct 
blkfront_ring_info *ri
 * A barrier request a superset of FUA, so we can
 * implement it the same way.  (It's also a FLUSH+FUA,
 * since it is guaranteed ordered WRT previous writes.)
+*
+* Note that can end up here with a FUA write and the
+* flags cleared.  This happens when the flag was
+* run-time disabled and raced with I/O submission in
+* the block layer.  We submit it as a normal write
+* here.  A pure flush should never end up here with
+* the flags cleared as they are completed earlier for
+* the !feature_flush case.
 */
if (info->feature_flush && info->feature_fua)
ring_req->operation =
@@ -795,8 +803,6 @@ static int blkif_queue_rw_req(struct request *req, struct 
blkfront_ring_info *ri
else if (info->feature_flush)
ring_req->operation =
BLKIF_OP_FLUSH_DISKCACHE;
-   else
-   ring_req->operation = 0;
}
ring_req->u.rw.nr_segments = num_grant;
if (unlikely(require_extra_req)) {
@@ -887,16 +893,6 @@ static inline void flush_requests(struct 
blkfront_ring_info *rinfo)
notify_remote_via_irq(rinfo->irq);
 }
 
-static inline bool blkif_request_flush_invalid(struct request *req,
-  struct blkfront_info *info)
-{
-   return (blk_rq_is_passthrough(req) ||
-   ((req_op(req) == REQ_OP_FLUSH) &&
-!info->feature_flush) ||
-   ((req->cmd_flags & REQ_FUA) &&
-!info->feature_fua));
-}
-
 static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
  const struct blk_mq_queue_data *qd)
 {
@@ -908,23 +904,30 @@ static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx 
*hctx,
rinfo = get_rinfo(info, qid);
blk_mq_start_request(qd->rq);
spin_lock_irqsave(&rinfo->ring_lock, flags);
-   if (RING_FULL(&rinfo->ring))
-   goto out_busy;
 
-   if (blkif_request_flush_invalid(qd->rq, rinfo->dev_info))
-   goto out_err;
+   /*
+* Check if the backend actually supports flushes.
+*
+* 

Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-12 Thread Roger Pau Monné
On Wed, Jun 12, 2024 at 05:00:30PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 12, 2024 at 10:01:18AM +0200, Roger Pau Monné wrote:
> > On Tue, Jun 11, 2024 at 07:19:10AM +0200, Christoph Hellwig wrote:
> > > blkfront always had a robust negotiation protocol for detecting a write
> > > cache.  Stop simply disabling cache flushes when they fail as that is
> > > a grave error.
> > 
> > It's my understanding the current code attempts to cover up for the
> > lack of guarantees the feature itself provides:
> 
> > So even when the feature is exposed, the backend might return
> > EOPNOTSUPP for the flush/barrier operations.
> 
> How is this supposed to work?  I mean in the worst case we could
> just immediately complete the flush requests in the driver, but
> we're really lying to any upper layer.

Right.  AFAICT advertising "feature-barrier" and/or
"feature-flush-cache" could be done based on whether blkback
understand those commands, not on whether the underlying storage
supports the equivalent of them.

Worst case we can print a warning message once about the underlying
storage failing to complete flush/barrier requests, and that data
integrity might not be guaranteed going forward, and not propagate the
error to the upper layer?

What would be the consequence of propagating a flush error to the
upper layers?

> > Such failure is tied on whether the underlying blkback storage
> > supports REQ_OP_WRITE with REQ_PREFLUSH operation.  blkback will
> > expose "feature-barrier" and/or "feature-flush-cache" without knowing
> > whether the underlying backend supports those operations, hence the
> > weird fallback in blkfront.
> 
> If we are just talking about the Linux blkback driver (I know there
> probably are a few other implementations) it won't every do that.
> I see it has code to do so, but the Linux block layer doesn't
> allow the flush operation to randomly fail if it was previously
> advertised.  Note that even blkfront conforms to this as it fixes
> up the return value when it gets this notsupp error to ok.

Yes, I'm afraid it's impossible to know what the multiple incarnations
of all the scattered blkback implementations possibly do (FreeBSD,
NetBSD, QEMU and blktap at least I know of).

> > Overall blkback should ensure that REQ_PREFLUSH is supported before
> > exposing "feature-barrier" or "feature-flush-cache", as then the
> > exposed features would really match what the underlying backend
> > supports (rather than the commands blkback knows about).
> 
> Yes.  The in-tree xen-blkback does that, but even without that the
> Linux block layer actually makes sure flushes sent by upper layers
> always succeed even when not supported.

Given the description of the feature in the blkif header, I'm afraid
we cannot guarantee that seeing the feature exposed implies barrier or
flush support, since the request could fail at any time (or even from
the start of the disk attachment) and it would still sadly be a correct
implementation given the description of the options.

Thanks, Roger.



Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-12 Thread Christoph Hellwig
On Wed, Jun 12, 2024 at 10:01:18AM +0200, Roger Pau Monné wrote:
> On Tue, Jun 11, 2024 at 07:19:10AM +0200, Christoph Hellwig wrote:
> > blkfront always had a robust negotiation protocol for detecting a write
> > cache.  Stop simply disabling cache flushes when they fail as that is
> > a grave error.
> 
> It's my understanding the current code attempts to cover up for the
> lack of guarantees the feature itself provides:

> So even when the feature is exposed, the backend might return
> EOPNOTSUPP for the flush/barrier operations.

How is this supposed to work?  I mean in the worst case we could
just immediately complete the flush requests in the driver, but
we're really lying to any upper layer.

> Such failure is tied on whether the underlying blkback storage
> supports REQ_OP_WRITE with REQ_PREFLUSH operation.  blkback will
> expose "feature-barrier" and/or "feature-flush-cache" without knowing
> whether the underlying backend supports those operations, hence the
> weird fallback in blkfront.

If we are just talking about the Linux blkback driver (I know there
probably are a few other implementations) it won't every do that.
I see it has code to do so, but the Linux block layer doesn't
allow the flush operation to randomly fail if it was previously
advertised.  Note that even blkfront conforms to this as it fixes
up the return value when it gets this notsupp error to ok.

> Overall blkback should ensure that REQ_PREFLUSH is supported before
> exposing "feature-barrier" or "feature-flush-cache", as then the
> exposed features would really match what the underlying backend
> supports (rather than the commands blkback knows about).

Yes.  The in-tree xen-blkback does that, but even without that the
Linux block layer actually makes sure flushes sent by upper layers
always succeed even when not supported.




Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-12 Thread Roger Pau Monné
On Tue, Jun 11, 2024 at 07:19:10AM +0200, Christoph Hellwig wrote:
> blkfront always had a robust negotiation protocol for detecting a write
> cache.  Stop simply disabling cache flushes when they fail as that is
> a grave error.

It's my understanding the current code attempts to cover up for the
lack of guarantees the feature itself provides:

 * feature-barrier
 *  Values: 0/1 (boolean)
 *  Default Value:  0
 *
 *  A value of "1" indicates that the backend can process requests
 *  containing the BLKIF_OP_WRITE_BARRIER request opcode.  Requests
 *  of this type may still be returned at any time with the
 *  BLKIF_RSP_EOPNOTSUPP result code.
 *
 * feature-flush-cache
 *  Values: 0/1 (boolean)
 *  Default Value:  0
 *
 *  A value of "1" indicates that the backend can process requests
 *  containing the BLKIF_OP_FLUSH_DISKCACHE request opcode.  Requests
 *  of this type may still be returned at any time with the
 *  BLKIF_RSP_EOPNOTSUPP result code.

So even when the feature is exposed, the backend might return
EOPNOTSUPP for the flush/barrier operations.

Such failure is tied on whether the underlying blkback storage
supports REQ_OP_WRITE with REQ_PREFLUSH operation.  blkback will
expose "feature-barrier" and/or "feature-flush-cache" without knowing
whether the underlying backend supports those operations, hence the
weird fallback in blkfront.

I'm unsure whether lack of REQ_PREFLUSH support is not something that
we should worry about, it seems like it was when the code was
introduced, but that's > 10y ago.

Overall blkback should ensure that REQ_PREFLUSH is supported before
exposing "feature-barrier" or "feature-flush-cache", as then the
exposed features would really match what the underlying backend
supports (rather than the commands blkback knows about).

Thanks, Roger.



Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-11 Thread Christoph Hellwig
On Tue, Jun 11, 2024 at 04:30:39PM +0900, Damien Le Moal wrote:
> On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> > blkfront always had a robust negotiation protocol for detecting a write
> > cache.  Stop simply disabling cache flushes when they fail as that is
> > a grave error.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> Looks good to me but maybe mention that removal of xlvbd_flush() as well ?
> And it feels like the "stop disabling cache flushes when they fail" part 
> should
> be a fix patch sent separately...

I'll move the patch to the front of the series to get more attention from
the maintainers, but otherwise the xlvbd_flush remova lis the really
trivial part here.



Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-11 Thread Hannes Reinecke

On 6/11/24 07:19, Christoph Hellwig wrote:

blkfront always had a robust negotiation protocol for detecting a write
cache.  Stop simply disabling cache flushes when they fail as that is
a grave error.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/xen-blkfront.c | 29 +
  1 file changed, 9 insertions(+), 20 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke  Kernel Storage Architect
h...@suse.de+49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich




Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-11 Thread Damien Le Moal
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> blkfront always had a robust negotiation protocol for detecting a write
> cache.  Stop simply disabling cache flushes when they fail as that is
> a grave error.
> 
> Signed-off-by: Christoph Hellwig 

Looks good to me but maybe mention that removal of xlvbd_flush() as well ?
And it feels like the "stop disabling cache flushes when they fail" part should
be a fix patch sent separately...

Anyway, for both parts, feel free to add:

Reviewed-by: Damien Le Moal 

> ---
>  drivers/block/xen-blkfront.c | 29 +
>  1 file changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9b4ec3e4908cce..9794ac2d3299d1 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -982,18 +982,6 @@ static const char *flush_info(struct blkfront_info *info)
>   return "barrier or flush: disabled;";
>  }
>  
> -static void xlvbd_flush(struct blkfront_info *info)
> -{
> - blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
> -   info->feature_fua ? true : false);
> - pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
> - info->gd->disk_name, flush_info(info),
> - "persistent grants:", info->feature_persistent ?
> - "enabled;" : "disabled;", "indirect descriptors:",
> - info->max_indirect_segments ? "enabled;" : "disabled;",
> - "bounce buffer:", info->bounce ? "enabled" : "disabled;");
> -}
> -
>  static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
>  {
>   int major;
> @@ -1162,7 +1150,15 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
>   info->sector_size = sector_size;
>   info->physical_sector_size = physical_sector_size;
>  
> - xlvbd_flush(info);
> + blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
> +   info->feature_fua ? true : false);
> +
> + pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
> + info->gd->disk_name, flush_info(info),
> + "persistent grants:", info->feature_persistent ?
> + "enabled;" : "disabled;", "indirect descriptors:",
> + info->max_indirect_segments ? "enabled;" : "disabled;",
> + "bounce buffer:", info->bounce ? "enabled" : "disabled;");
>  
>   if (info->vdisk_info & VDISK_READONLY)
>   set_disk_ro(gd, 1);
> @@ -1622,13 +1618,6 @@ static irqreturn_t blkif_interrupt(int irq, void 
> *dev_id)
>  info->gd->disk_name, 
> op_name(bret.operation));
>   blkif_req(req)->error = BLK_STS_NOTSUPP;
>   }
> - if (unlikely(blkif_req(req)->error)) {
> - if (blkif_req(req)->error == BLK_STS_NOTSUPP)
> - blkif_req(req)->error = BLK_STS_OK;
> - info->feature_fua = 0;
> - info->feature_flush = 0;
> - xlvbd_flush(info);
> - }
>   fallthrough;
>   case BLKIF_OP_READ:
>   case BLKIF_OP_WRITE:

-- 
Damien Le Moal
Western Digital Research




[PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail

2024-06-10 Thread Christoph Hellwig
blkfront always had a robust negotiation protocol for detecting a write
cache.  Stop simply disabling cache flushes when they fail as that is
a grave error.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/xen-blkfront.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9b4ec3e4908cce..9794ac2d3299d1 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -982,18 +982,6 @@ static const char *flush_info(struct blkfront_info *info)
return "barrier or flush: disabled;";
 }
 
-static void xlvbd_flush(struct blkfront_info *info)
-{
-   blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
- info->feature_fua ? true : false);
-   pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
-   info->gd->disk_name, flush_info(info),
-   "persistent grants:", info->feature_persistent ?
-   "enabled;" : "disabled;", "indirect descriptors:",
-   info->max_indirect_segments ? "enabled;" : "disabled;",
-   "bounce buffer:", info->bounce ? "enabled" : "disabled;");
-}
-
 static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
 {
int major;
@@ -1162,7 +1150,15 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
info->sector_size = sector_size;
info->physical_sector_size = physical_sector_size;
 
-   xlvbd_flush(info);
+   blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
+ info->feature_fua ? true : false);
+
+   pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
+   info->gd->disk_name, flush_info(info),
+   "persistent grants:", info->feature_persistent ?
+   "enabled;" : "disabled;", "indirect descriptors:",
+   info->max_indirect_segments ? "enabled;" : "disabled;",
+   "bounce buffer:", info->bounce ? "enabled" : "disabled;");
 
if (info->vdisk_info & VDISK_READONLY)
set_disk_ro(gd, 1);
@@ -1622,13 +1618,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
   info->gd->disk_name, 
op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
-   if (unlikely(blkif_req(req)->error)) {
-   if (blkif_req(req)->error == BLK_STS_NOTSUPP)
-   blkif_req(req)->error = BLK_STS_OK;
-   info->feature_fua = 0;
-   info->feature_flush = 0;
-   xlvbd_flush(info);
-   }
fallthrough;
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
-- 
2.43.0