How to display a ktime value as a trace timestamp in trace output?

2024-01-31 Thread David Howells
Hi Steven,

I have a tracepoint in AF_RXRPC that displays information about a timeout I'm
going to set.  I have the timeout in a ktime_t as an absolute time.  Is there
a way to display this in the trace output such that it looks like a trace
timestamp and can be (roughly) correlated with the displayed timestamps?

I tried subtracting ktime_get_real() - ktime_get_boottime() from it and
displaying the result, but it looked about one and a bit seconds out from the
trace timestamp.
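
That is, roughly (a sketch of what I tried; "timeout" here stands for the
absolute ktime_t in question):

	ktime_t offset = ktime_sub(ktime_get_real(), ktime_get_boottime());
	ktime_t shown  = ktime_sub(timeout, offset);	/* value I displayed */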

Thanks,
David




Re: [PATCH 02/20] filelock: add coccinelle scripts to move fields to struct file_lock_core

2024-01-17 Thread David Howells
Do we need to keep these coccinelle scripts for posterity?  Or can they just
be included in the description of the patch they generate?

David




Re: [RFC][PATCH] fix short copy handling in copy_mc_pipe_to_iter()

2022-06-14 Thread David Howells
Al Viro  wrote:

> What's wrong with
> 	p_occupancy = pipe_occupancy(head, tail);
> 	if (p_occupancy >= pipe->max_usage)
> 		return 0;
> 	else
> 		return pipe->max_usage - p_occupancy;

Because "pipe->max_usage - p_occupancy" can be negative.

post_one_notification() is limited by pipe->ring_size, not pipe->max_usage.

The idea is to leave some slack in a watch pipe that the watch_queue code can
use but userspace can't.
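
To illustrate with a sketch (the helper's shape and name are mine, not the
actual code):

	static unsigned int pipe_slack(const struct pipe_inode_info *pipe)
	{
		unsigned int p_occupancy = pipe_occupancy(pipe->head, pipe->tail);

		/* On a watch pipe, occupancy is bounded by pipe->ring_size
		 * and so can exceed pipe->max_usage; the subtraction must be
		 * guarded or it wraps.
		 */
		if (p_occupancy >= pipe->max_usage)
			return 0;
		return pipe->max_usage - p_occupancy;
	}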

David




Fix for CVE-2020-26541

2021-04-16 Thread David Howells


Hi Linus,

I posted a pull request for a fix for CVE-2020-26541:


https://lore.kernel.org/keyrings/1884195.1615482...@warthog.procyon.org.uk/
[GIT PULL] Add EFI_CERT_X509_GUID support for dbx/mokx entries

I'm guessing you're not going to pull it now for 5.12, so should I just
reissue the request in the merge window?  Also, do you want the base pulling
up to something a bit more recent than 5.11-rc4?

David



[PATCH v7] mm: Add set/end/wait functions for PG_private_2

2021-04-13 Thread David Howells
Add three functions to manipulate PG_private_2:

 (*) set_page_private_2() - Set the flag and take an appropriate reference
 on the flagged page.

 (*) end_page_private_2() - Clear the flag, drop the reference and wake up
 any waiters, somewhat analogously with end_page_writeback().

 (*) wait_on_page_private_2() - Wait for the flag to be cleared.

Wrappers will need to be placed in the netfs lib header in the patch that
adds that.

[This implements a suggestion by Linus[1] to not mix the terminology of
 PG_private_2 and PG_fscache in the mm core function]
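
For reference, a rough usage sketch (mine, not part of the patch) of how a
netfs pairs these functions:

	set_page_private_2(page);	/* page now being written to the cache */
	/* ... asynchronous cache write in flight ... */
	end_page_private_2(page);	/* done: drop the ref, wake any waiters */

	/* elsewhere, before modifying or releasing the page: */
	wait_on_page_private_2(page);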

Changes:
v7:
- Use compound_head() in all the functions to make them THP safe[6].

v5:
- Add set and end functions, calling the end function end rather than
  unlock[3].
- Keep a ref on the page when PG_private_2 is set[4][5].

v4:
- Remove extern from the declaration[2].

Suggested-by: Linus Torvalds 
Signed-off-by: David Howells 
Tested-by: Jeff Layton 
Tested-by: Dave Wysochanski 
cc: Matthew Wilcox (Oracle) 
cc: Alexander Viro 
cc: Christoph Hellwig 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/1330473.1612974...@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=xaemtkswpu1yw4uaro1r3ibs27...@mail.gmail.com/ [1]
Link: https://lore.kernel.org/r/20210216102659.ga27...@lst.de/ [2]
Link: https://lore.kernel.org/r/161340387944.1303470.7944159520278177652.st...@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539528910.286939.1252328699383291173.st...@warthog.procyon.org.uk # v4
Link: https://lore.kernel.org/r/20210321105309.gg3...@casper.infradead.org [3]
Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pum...@mail.gmail.com/ [4]
Link: https://lore.kernel.org/r/CAHk-=wjsgsrj7xwhsmq6daqiz53xa39pog+xa_wetgwbbu4...@mail.gmail.com/ [5]
Link: https://lore.kernel.org/r/20210408145057.gn2531...@casper.infradead.org/ [6]
Link: https://lore.kernel.org/r/161653788200.2770958.9517755716374927208.st...@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789066013.6155.9816857201817288382.st...@warthog.procyon.org.uk/ # v6
---
 include/linux/pagemap.h |   20 +++
 mm/filemap.c|   61 
 2 files changed, 81 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8c9947fd62f3..bb4433c98d02 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -688,6 +688,26 @@ void wait_for_stable_page(struct page *page);
 
 void page_endio(struct page *page, bool is_write, int err);
 
+/**
+ * set_page_private_2 - Set PG_private_2 on a page and take a ref
+ * @page: The page.
+ *
+ * Set the PG_private_2 flag on a page and take the reference needed for the VM
+ * to handle its lifetime correctly.  This sets the flag and takes the
+ * reference unconditionally, so care must be taken not to set the flag again
+ * if it's already set.
+ */
+static inline void set_page_private_2(struct page *page)
+{
+   page = compound_head(page);
+   get_page(page);
+   SetPagePrivate2(page);
+}
+
+void end_page_private_2(struct page *page);
+void wait_on_page_private_2(struct page *page);
+int wait_on_page_private_2_killable(struct page *page);
+
 /*
  * Add an arbitrary waiter to a page's wait queue
  */
diff --git a/mm/filemap.c b/mm/filemap.c
index 43700480d897..afe22f09960e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1432,6 +1432,67 @@ void unlock_page(struct page *page)
 }
 EXPORT_SYMBOL(unlock_page);
 
+/**
+ * end_page_private_2 - Clear PG_private_2 and release any waiters
+ * @page: The page
+ *
+ * Clear the PG_private_2 bit on a page and wake up any sleepers waiting for
+ * this.  The page ref held for PG_private_2 being set is released.
+ *
+ * This is, for example, used when a netfs page is being written to a local
+ * disk cache, thereby allowing writes to the cache for the same page to be
+ * serialised.
+ */
+void end_page_private_2(struct page *page)
+{
+   page = compound_head(page);
+   VM_BUG_ON_PAGE(!PagePrivate2(page), page);
+	clear_bit_unlock(PG_private_2, &page->flags);
+   wake_up_page_bit(page, PG_private_2);
+   put_page(page);
+}
+EXPORT_SYMBOL(end_page_private_2);
+
+/**
+ * wait_on_page_private_2 - Wait for PG_private_2 to be cleared on a page
+ * @page: The page to wait on
+ *
+ * Wait for PG_private_2 (aka PG_fscache) to be cleared on a page.
+ */
+void wait_on_page_private_2(struct page *page)
+{
+   page = compound_head(page);
+   while (PagePrivate2(page))
+   wait_on_page_bit(page, PG_private_2);
+}
+EXPORT_SYMBOL(wait_on_page_private_2);
+
+/**
+ * wait_on_page_private_2_killable - Wait for PG_private_2 to be cleared on a page
+ * @page: The page to w

[RFC PATCH 2/2] iov_iter: Drop the X argument from iterate_all_kinds() and use B instead

2021-04-09 Thread David Howells
Drop the X argument from iterate_all_kinds() and use the B argument instead
as it's always the same unless the ITER_XARRAY is handled specially.

Signed-off-by: David Howells 
---

 lib/iov_iter.c |   42 --
 1 file changed, 12 insertions(+), 30 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 93e9838c128d..144abdac11db 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -79,8 +79,8 @@
 #define iterate_xarray(i, n, __v, skip, STEP) {\
struct page *head = NULL;   \
size_t wanted = n, seg, offset; \
-   loff_t start = i->xarray_start + skip;  \
-   pgoff_t index = start >> PAGE_SHIFT;\
+   loff_t xarray_start = i->xarray_start + skip;   \
+   pgoff_t index = xarray_start >> PAGE_SHIFT; \
int j;  \
\
XA_STATE(xas, i->xarray, index);\
@@ -113,7 +113,7 @@
n = wanted - n; \
 }
 
-#define iterate_all_kinds(i, n, v, I, B, K, X) {   \
+#define iterate_all_kinds(i, n, v, I, B, K) {  \
if (likely(n)) {\
size_t skip = i->iov_offset;\
if (unlikely(i->type & ITER_BVEC)) {\
@@ -127,7 +127,7 @@
} else if (unlikely(i->type & ITER_DISCARD)) {  \
} else if (unlikely(i->type & ITER_XARRAY)) {   \
struct bio_vec v;   \
-   iterate_xarray(i, n, v, skip, (X)); \
+   iterate_xarray(i, n, v, skip, (B)); \
} else {\
const struct iovec *iov;\
struct iovec v; \
@@ -842,9 +842,7 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
0;}),
memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 v.bv_offset, v.bv_len),
-   memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
-   memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
-v.bv_offset, v.bv_len)
+   memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
)
 
iov_iter_advance(i, bytes);
@@ -927,9 +925,7 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
0;}),
memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
 v.bv_offset, v.bv_len),
-   memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
-   memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
-v.bv_offset, v.bv_len)
+   memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
)
 
iov_iter_advance(i, bytes);
@@ -1058,9 +1054,7 @@ size_t iov_iter_copy_from_user_atomic(struct page *page,
copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
 v.bv_offset, v.bv_len),
-   memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
-   memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page,
-v.bv_offset, v.bv_len)
+   memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
)
kunmap_atomic(kaddr);
return bytes;
@@ -1349,8 +1343,7 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
iterate_all_kinds(i, size, v,
(res |= (unsigned long)v.iov_base | v.iov_len, 0),
res |= v.bv_offset | v.bv_len,
-   res |= (unsigned long)v.iov_base | v.iov_len,
-   res |= v.bv_offset | v.bv_len
+   res |= (unsigned long)v.iov_base | v.iov_len
)
return res;
 }
@@ -1372,9 +1365,7 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
(size != v.bv_len ? size : 0)),
(res |= (!res ? 0 : (unsigned long)v.iov_base) |
-   (size != v.iov_len ? size : 0)),
-   (res |= (!res ? 0 : (unsigned long)v.bv_offset) |
-   (size != v.bv_len ? size : 0))
+   (size != v.iov_len ? size : 0))
);
return res;
 }
@@ -1530,8 +1521,7 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
return v.bv_len;

[RFC PATCH 1/2] iov_iter: Remove iov_iter_for_each_range()

2021-04-09 Thread David Howells
Remove iov_iter_for_each_range() as it's no longer used with the removal of
lustre.

Signed-off-by: David Howells 
---

 include/linux/uio.h |4 
 lib/iov_iter.c  |   27 ---
 2 files changed, 31 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 5f5ffc45d4aa..221c256304d4 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -295,8 +295,4 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
 int import_single_range(int type, void __user *buf, size_t len,
 struct iovec *iov, struct iov_iter *i);
 
-int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
-   int (*f)(struct kvec *vec, void *context),
-   void *context);
-
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f808c625c11e..93e9838c128d 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -2094,30 +2094,3 @@ int import_single_range(int rw, void __user *buf, size_t len,
return 0;
 }
 EXPORT_SYMBOL(import_single_range);
-
-int iov_iter_for_each_range(struct iov_iter *i, size_t bytes,
-   int (*f)(struct kvec *vec, void *context),
-   void *context)
-{
-   struct kvec w;
-   int err = -EINVAL;
-   if (!bytes)
-   return 0;
-
-   iterate_all_kinds(i, bytes, v, -EINVAL, ({
-   w.iov_base = kmap(v.bv_page) + v.bv_offset;
-   w.iov_len = v.bv_len;
-	err = f(&w, context);
-   kunmap(v.bv_page);
-   err;}), ({
-   w = v;
-	err = f(&w, context);}), ({
-   w.iov_base = kmap(v.bv_page) + v.bv_offset;
-   w.iov_len = v.bv_len;
-	err = f(&w, context);
-   kunmap(v.bv_page);
-   err;})
-   )
-   return err;
-}
-EXPORT_SYMBOL(iov_iter_for_each_range);




Re: [RFC PATCH 2/3] mm: Return bool from pagebit test functions

2021-04-09 Thread David Howells
David Howells  wrote:

> add/remove: 2/2 grow/shrink: 15/16 up/down: 408/-599 (-191)
> Function                         old     new   delta
> iomap_write_end_inline             -     128    +128

I can get rid of the iomap_write_end_inline() increase for my config by
marking it __always_inline, thereby getting:

add/remove: 1/2 grow/shrink: 15/15 up/down: 280/-530 (-250)

It seems that the decision whether or not to inline iomap_write_end_inline()
is affected by the switch to bool.

David



Re: [RFC PATCH 2/3] mm: Return bool from pagebit test functions

2021-04-09 Thread David Howells
Matthew Wilcox  wrote:

> iirc i looked at doing this as part of the folio work, and it ended up
> increasing the size of the kernel.  Did you run bloat-o-meter on the
> result of doing this?

add/remove: 2/2 grow/shrink: 15/16 up/down: 408/-599 (-191)
Function                          old     new   delta
iomap_write_end_inline              -     128    +128
try_to_free_swap                   59     179    +120
page_to_index.part                  -      36     +36
page_size                         432     456     +24
PageTransCompound                 154     175     +21
truncate_inode_pages_range        791     807     +16
invalidate_inode_pages2_range     504     518     +14
ceph_uninline_data                969     982     +13
iomap_read_inline_data.isra       129     139     +10
page_cache_pipe_buf_confirm        85      93      +8
ceph_writepages_start            3237    3243      +6
hpage_pincount_available           94      97      +3
__collapse_huge_page_isolate      768     771      +3
page_vma_mapped_walk             1070    1072      +2
PageHuge                           39      41      +2
collapse_file                    2046    2047      +1
__free_pages_ok                   449     450      +1
wait_on_page_bit_common           598     597      -1
iomap_page_release                104     103      -1
change_pte_range                  818     817      -1
pageblock_skip_persistent          45      42      -3
is_transparent_hugepage            63      60      -3
nfs_readpage                      486     482      -4
ext4_readpage_inline              155     151      -4
release_pages                     640     635      -5
ext4_write_inline_data_end        286     281      -5
ext4_mb_load_buddy_gfp            690     684      -6
afs_dir_check                     536     529      -7
page_trans_huge_map_swapcount     374     363     -11
io_uring_mmap                     199     184     -15
io_buffer_account_pin             276     259     -17
page_to_index                      50       -     -50
iomap_write_end                   375     306     -69
try_to_free_swap.part             137       -    -137
PageUptodate                      716     456    -260
Total: Before=17207139, After=17206948, chg -0.00%



Re: [RFC PATCH 2/3] mm: Return bool from pagebit test functions

2021-04-09 Thread David Howells
Matthew Wilcox  wrote:

> On Fri, Apr 09, 2021 at 11:59:17AM +0100, David Howells wrote:
> > Make functions that test page bits return a bool, not an int.  This means
> > that the value is definitely 0 or 1 if they're used in arithmetic, rather
> > than rely on test_bit() and friends to return this (though they probably
> > should).
> 
> iirc i looked at doing this as part of the folio work, and it ended up
> increasing the size of the kernel.  Did you run bloat-o-meter on the
> result of doing this?

Hmmm.  With my usual monolithic x86_64 kernel, it makes vmlinux text section
100 bytes larger (19392347 rather than 19392247).  I can look into why.

David



[RFC PATCH 3/3] mm: Split page_has_private() in two to better handle PG_private_2

2021-04-09 Thread David Howells
Split page_has_private() into two functions:

 (1) page_needs_cleanup() to find out if a page needs the ->releasepage(),
 ->invalidatepage(), etc. address space ops calling upon it.

 This returns true when either PG_private or PG_private_2 are set.

 (2) page_private_count() which returns a count of the number of refs
 contributed to a page for attached private data.

 This returns 1 if PG_private is set and 0 otherwise.
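
The intended shape is roughly this (my sketch from the description above; the
include/linux/page-flags.h hunk itself is truncated in this archive):

	static inline bool page_needs_cleanup(struct page *page)
	{
		return page->flags & (1UL << PG_private | 1UL << PG_private_2);
	}

	static inline int page_private_count(struct page *page)
	{
		return PagePrivate(page) ? 1 : 0;
	}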

I think the suggestion[1] is that PG_private_2 should just have a ref on the
page, but this isn't accounted in the same way as PG_private's ref.

Notes:

 (*) The following:

btrfs_migratepage()
iomap_set_range_uptodate()
iomap_migrate_page()
to_iomap_page()

 should probably all use PagePrivate() rather than page_has_private()
 since they're interested in what's attached to page->private when
 they're doing this, and not PG_private_2.

 It may not matter in these cases since page->private is probably NULL
 if PG_private is not set.

 (*) Do we actually need PG_private, or is it possible just to see if
 page->private is NULL?

 (*) There's a lot of "if (page_has_private()) try_to_release_page()"
     combos.  Does it make sense to pot this into an inline function?
     (See the sketch below.)
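
A hedged sketch of such a helper (name invented for illustration, not part of
this patch):

	static inline bool page_cleanup_release(struct page *page, gfp_t gfp)
	{
		/* True if the page holds no private data afterwards */
		return !page_needs_cleanup(page) ||
			try_to_release_page(page, gfp);
	}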

Signed-off-by: David Howells 
cc: Linus Torvalds 
cc: Matthew Wilcox 
cc: Christoph Hellwig 
cc: Josef Bacik 
cc: Alexander Viro 
cc: Andrew Morton 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=whwojhgemn85loh9fx-5d2-upzmv1m2zmyxvd31tkp...@mail.gmail.com/ [1]
---

 arch/s390/kernel/uv.c  |2 +-
 fs/btrfs/disk-io.c |2 +-
 fs/btrfs/inode.c   |2 +-
 fs/ext4/move_extent.c  |8 
 fs/fuse/dev.c  |2 +-
 fs/iomap/buffered-io.c |6 +++---
 fs/splice.c|2 +-
 include/linux/page-flags.h |   21 +
 include/trace/events/pagemap.h |2 +-
 mm/khugepaged.c|4 ++--
 mm/memory-failure.c|2 +-
 mm/migrate.c   |   10 +-
 mm/readahead.c |2 +-
 mm/truncate.c  |   12 ++--
 mm/vmscan.c|   12 ++--
 15 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index b2d2ad153067..09256f40cd89 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -175,7 +175,7 @@ static int expected_page_refs(struct page *page)
res++;
} else if (page_mapping(page)) {
res++;
-   if (page_has_private(page))
+   if (page_private_count(page))
res++;
}
return res;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 41b718cfea40..d95f8d4b3004 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -936,7 +936,7 @@ static int btree_migratepage(struct address_space *mapping,
 * Buffers may be managed in a filesystem specific way.
 * We must have no buffers or drop them.
 */
-   if (page_has_private(page) &&
+   if (page_needs_cleanup(page) &&
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;
return migrate_page(mapping, newpage, page, mode);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7cdf65be3707..94f038d34f16 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8333,7 +8333,7 @@ static int btrfs_migratepage(struct address_space *mapping,
if (ret != MIGRATEPAGE_SUCCESS)
return ret;
 
-   if (page_has_private(page))
+   if (PagePrivate(page))
attach_page_private(newpage, detach_page_private(page));
 
if (PagePrivate2(page)) {
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 64a579734f93..16d0a7a73191 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -329,9 +329,9 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
ext4_double_up_write_data_sem(orig_inode, donor_inode);
goto data_copy;
}
-   if ((page_has_private(pagep[0]) &&
+   if ((page_needs_cleanup(pagep[0]) &&
 !try_to_release_page(pagep[0], 0)) ||
-   (page_has_private(pagep[1]) &&
+   (page_needs_cleanup(pagep[1]) &&
 !try_to_release_page(pagep[1], 0))) {
*err = -EBUSY;
goto drop_data_sem;
@@ -351,8 +351,8 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
 
/* At this point all buffers in range are uptodate, old mapping layout
 * is no longer required, try to drop it now. */
-   if ((page_has_private(pagep[0]) &

[RFC PATCH 2/3] mm: Return bool from pagebit test functions

2021-04-09 Thread David Howells
Make functions that test page bits return a bool, not an int.  This means
that the value is definitely 0 or 1 if they're used in arithmetic, rather
than rely on test_bit() and friends to return this (though they probably
should).

Signed-off-by: David Howells 
cc: Linus Torvalds 
cc: Matthew Wilcox 
cc: Andrew Morton 
cc: linux...@kvack.org
cc: linux-fsde...@vger.kernel.org
---

 include/linux/page-flags.h |   50 ++--
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..4ff7de61b13d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -188,18 +188,18 @@ static inline struct page *compound_head(struct page *page)
return page;
 }
 
-static __always_inline int PageTail(struct page *page)
+static __always_inline bool PageTail(struct page *page)
 {
return READ_ONCE(page->compound_head) & 1;
 }
 
-static __always_inline int PageCompound(struct page *page)
+static __always_inline bool PageCompound(struct page *page)
 {
 	return test_bit(PG_head, &page->flags) || PageTail(page);
 }
 
 #definePAGE_POISON_PATTERN -1l
-static inline int PagePoisoned(const struct page *page)
+static inline bool PagePoisoned(const struct page *page)
 {
return page->flags == PAGE_POISON_PATTERN;
 }
@@ -260,7 +260,7 @@ static inline void page_init_poison(struct page *page, size_t size)
  * Macros to create function definitions for page flags
  */
 #define TESTPAGEFLAG(uname, lname, policy) \
-static __always_inline int Page##uname(struct page *page)  \
+static __always_inline bool Page##uname(struct page *page) \
 { return test_bit(PG_##lname, &policy(page, 0)->flags); }
 
 #define SETPAGEFLAG(uname, lname, policy)  \
@@ -280,11 +280,11 @@ static __always_inline void __ClearPage##uname(struct page *page) \
 { __clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define TESTSETFLAG(uname, lname, policy)  \
-static __always_inline int TestSetPage##uname(struct page *page)   \
+static __always_inline bool TestSetPage##uname(struct page *page)  \
 { return test_and_set_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define TESTCLEARFLAG(uname, lname, policy)\
-static __always_inline int TestClearPage##uname(struct page *page) \
+static __always_inline bool TestClearPage##uname(struct page *page)\
 { return test_and_clear_bit(PG_##lname, &policy(page, 1)->flags); }
 
 #define PAGEFLAG(uname, lname, policy) \
@@ -302,7 +302,7 @@ static __always_inline int TestClearPage##uname(struct page *page)  \
TESTCLEARFLAG(uname, lname, policy)
 
 #define TESTPAGEFLAG_FALSE(uname)  \
-static inline int Page##uname(const struct page *page) { return 0; }
+static inline bool Page##uname(const struct page *page) { return false; }
 
 #define SETPAGEFLAG_NOOP(uname)					\
 static inline void SetPage##uname(struct page *page) {  }
@@ -314,10 +314,10 @@ static inline void ClearPage##uname(struct page *page) {  }
 static inline void __ClearPage##uname(struct page *page) {  }
 
 #define TESTSETFLAG_FALSE(uname)   \
-static inline int TestSetPage##uname(struct page *page) { return 0; }
+static inline bool TestSetPage##uname(struct page *page) { return false; }
 
 #define TESTCLEARFLAG_FALSE(uname) \
-static inline int TestClearPage##uname(struct page *page) { return 0; }
+static inline bool TestClearPage##uname(struct page *page) { return false; }
 
 #define PAGEFLAG_FALSE(uname) TESTPAGEFLAG_FALSE(uname)		\
SETPAGEFLAG_NOOP(uname) CLEARPAGEFLAG_NOOP(uname)
@@ -393,7 +393,7 @@ PAGEFLAG_FALSE(HighMem)
 #endif
 
 #ifdef CONFIG_SWAP
-static __always_inline int PageSwapCache(struct page *page)
+static __always_inline bool PageSwapCache(struct page *page)
 {
 #ifdef CONFIG_THP_SWAP
page = compound_head(page);
@@ -473,18 +473,18 @@ __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)
 #define PAGE_MAPPING_KSM   (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
 #define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
 
-static __always_inline int PageMappingFlags(struct page *page)
+static __always_inline bool PageMappingFlags(struct page *page)
 {
return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
 }
 
-static __always_inline int PageAnon(struct page *page)
+static __always_inline bool PageAnon(struct page *page)
 {
page = compound_head(page);
return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
 }
 
-static __always_inline int __PageMovable(struct page *page)
+static __always_inline bool __PageMovab

[RFC PATCH 1/3] Make the generic bitops return bool

2021-04-09 Thread David Howells
Make the generic bitops return bool when returning the value of a tested
bit.

Signed-off-by: David Howells 
cc: Linus Torvalds 
cc: Matthew Wilcox 
cc: Akinobu Mita 
cc: Arnd Bergmann 
cc: Will Deacon 
---

 include/asm-generic/bitops/atomic.h |6 +++---
 include/asm-generic/bitops/le.h |   10 +-
 include/asm-generic/bitops/lock.h   |4 ++--
 include/asm-generic/bitops/non-atomic.h |8 
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index 0e7316a86240..9b05e8634c09 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -29,7 +29,7 @@ static __always_inline void change_bit(unsigned int nr, volatile unsigned long *
atomic_long_xor(BIT_MASK(nr), (atomic_long_t *)p);
 }
 
-static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
+static inline bool test_and_set_bit(unsigned int nr, volatile unsigned long *p)
 {
long old;
unsigned long mask = BIT_MASK(nr);
@@ -42,7 +42,7 @@ static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
return !!(old & mask);
 }
 
-static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
+static inline bool test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
 {
long old;
unsigned long mask = BIT_MASK(nr);
@@ -55,7 +55,7 @@ static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
return !!(old & mask);
 }
 
-static inline int test_and_change_bit(unsigned int nr, volatile unsigned long *p)
+static inline bool test_and_change_bit(unsigned int nr, volatile unsigned long *p)
 {
long old;
unsigned long mask = BIT_MASK(nr);
diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
index 188d3eba3ace..33355cf288f6 100644
--- a/include/asm-generic/bitops/le.h
+++ b/include/asm-generic/bitops/le.h
@@ -50,7 +50,7 @@ extern unsigned long find_next_bit_le(const void *addr,
 #error "Please fix <asm/byteorder.h>"
 #endif
 
-static inline int test_bit_le(int nr, const void *addr)
+static inline bool test_bit_le(int nr, const void *addr)
 {
return test_bit(nr ^ BITOP_LE_SWIZZLE, addr);
 }
@@ -75,22 +75,22 @@ static inline void __clear_bit_le(int nr, void *addr)
__clear_bit(nr ^ BITOP_LE_SWIZZLE, addr);
 }
 
-static inline int test_and_set_bit_le(int nr, void *addr)
+static inline bool test_and_set_bit_le(int nr, void *addr)
 {
return test_and_set_bit(nr ^ BITOP_LE_SWIZZLE, addr);
 }
 
-static inline int test_and_clear_bit_le(int nr, void *addr)
+static inline bool test_and_clear_bit_le(int nr, void *addr)
 {
return test_and_clear_bit(nr ^ BITOP_LE_SWIZZLE, addr);
 }
 
-static inline int __test_and_set_bit_le(int nr, void *addr)
+static inline bool __test_and_set_bit_le(int nr, void *addr)
 {
return __test_and_set_bit(nr ^ BITOP_LE_SWIZZLE, addr);
 }
 
-static inline int __test_and_clear_bit_le(int nr, void *addr)
+static inline bool __test_and_clear_bit_le(int nr, void *addr)
 {
return __test_and_clear_bit(nr ^ BITOP_LE_SWIZZLE, addr);
 }
diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index 3ae021368f48..0e6acd059a59 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -15,8 +15,8 @@
  * the returned value is 0.
  * It can be used to implement bit locks.
  */
-static inline int test_and_set_bit_lock(unsigned int nr,
-   volatile unsigned long *p)
+static inline bool test_and_set_bit_lock(unsigned int nr,
+volatile unsigned long *p)
 {
long old;
unsigned long mask = BIT_MASK(nr);
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..7d916f677be3 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -55,7 +55,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static inline bool __test_and_set_bit(int nr, volatile unsigned long *addr)
 {
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +74,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static inline bool __test_and_clear_bit(int nr, volatile unsigned long *addr)
 {
unsigned long 

Re: [PATCH v6 01/30] iov_iter: Add ITER_XARRAY

2021-04-09 Thread David Howells
Al Viro  wrote:

> > +#define iterate_all_kinds(i, n, v, I, B, K, X) {   \
> 
> Do you have any users that would pass different B and X?
> 
> > @@ -1440,7 +1665,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
> > return v.bv_len;
> > }),({
> > return -EFAULT;
> > -   })
> > +   }), 0
> 
> Correction - users that might get that flavour.  This one explicitly checks
> for xarray and doesn't get to iterate_... in that case.

This is the case for iterate_all_kinds(), but not for iterate_and_advance().

See _copy_mc_to_iter() for example: that can return directly out of the middle
of the loop, so the X variant must drop the rcu_read_lock(), but the B variant
doesn't need to.  You also can't just use break to get out as the X variant
has a loop within a loop to handle iteration over the subelements of a THP.

But with iterate_all_kinds(), I could just drop the X parameter and use the B
parameter for both, I think.

David



Re: [RFC][PATCH] mm: Split page_has_private() in two to better handle PG_private_2

2021-04-09 Thread David Howells
Linus Torvalds  wrote:

> >  #define PAGE_FLAGS_PRIVATE \
> > (1UL << PG_private | 1UL << PG_private_2)
>
> I think this should be re-named to be PAGE_FLAGS_CLEANUP, because I
> don't think it makes any other sense to "combine" the two PG_private*
> bits any more. No?

Sure.  Do we even want it still, or should I just fold it into
page_needs_cleanup()?  It seems to be the only place it's used.

> > +static inline int page_private_count(struct page *page)
> > +{
> > +   return test_bit(PG_private, &page->flags) ? 1 : 0;
> > +}
>
> Why is this open-coding the bit test, rather than just doing
>
> return PagePrivate(page) ? 1 : 0;
>
> instead? In fact, since test_bit() _should_ return a 'bool', I think even just
>
> return PagePrivate(page);

Sorry, yes, it should be that.  I was looking at transforming the "1 <<
PG_private" and completely overlooked that this should be PagePrivate().

> should work and give the same result, but I could imagine that some
> architecture version of "test_bit()" might return some other non-zero
> value (although honestly, I think that should be fixed if so).

Yeah.  I seem to recall that test_bit() on some arches used to return the
datum just with the other bits masked off, but I may be misremembering.

In asm-generic/bitops/non-atomic.h:

static inline int test_bit(int nr, const volatile unsigned long *addr)
{
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}

should perhaps return bool?
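
That is, roughly:

	static inline bool test_bit(int nr, const volatile unsigned long *addr)
	{
		return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
	}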

I wonder, should:

static __always_inline int PageTail(struct page *page)
static __always_inline int PageCompound(struct page *page)
static __always_inline int Page##uname(struct page *page)
static __always_inline int TestSetPage##uname(struct page *page)
static __always_inline int TestClearPage##uname(struct page *page)

also all return bool?

David



[RFC][PATCH] mm: Split page_has_private() in two to better handle PG_private_2

2021-04-08 Thread David Howells
Hi Willy, Linus,

How about this to handle the situation with PG_private_2?  I think it handles
things according to Linus's suggestion.

David
---
mm: Split page_has_private() in two to better handle PG_private_2

Split page_has_private() into two functions:

 (1) page_needs_cleanup() to find out if a page needs the ->releasepage(),
 ->invalidatepage(), etc. address space ops calling upon it.

 This returns true when either PG_private or PG_private_2 are set.

 (2) page_private_count() which returns a count of the number of refs
 contributed to a page for attached private data.

 This returns 1 if PG_private is set and 0 otherwise.

I think the suggestion[1] is that PG_private_2 should just have a ref on
the page, but this isn't accounted in the same way as PG_private's ref.

Notes:

 (*) The following:

btrfs_migratepage()
iomap_set_range_uptodate()
iomap_migrate_page()
to_iomap_page()

 should probably all use PagePrivate() rather than page_has_private()
 since they're interested in what's attached to page->private when
 they're doing this, and not PG_private_2.

 It may not matter in these cases since page->private is probably NULL
 if PG_private is not set.

 (*) Do we actually need PG_private, or is it possible just to see if
 page->private is NULL?

 (*) There's a lot of "if (page_has_private()) try_to_release_page()"
     combos.  Does it make sense to create an inline function for this?

Signed-off-by: David Howells 
Link: https://lore.kernel.org/linux-fsdevel/CAHk-=whwojhgemn85loh9fx-5d2-upzmv1m2zmyxvd31tkp...@mail.gmail.com/ [1]
---
 fs/btrfs/disk-io.c |2 +-
 fs/btrfs/inode.c   |2 +-
 fs/ext4/move_extent.c  |8 
 fs/fuse/dev.c  |2 +-
 fs/iomap/buffered-io.c |6 +++---
 fs/splice.c|2 +-
 include/linux/page-flags.h |   17 +++--
 include/trace/events/pagemap.h |2 +-
 mm/khugepaged.c|4 ++--
 mm/migrate.c   |   10 +-
 mm/readahead.c |2 +-
 mm/truncate.c  |   12 ++--
 mm/vmscan.c|   12 ++--
 13 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 41b718cfea40..d95f8d4b3004 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -936,7 +936,7 @@ static int btree_migratepage(struct address_space *mapping,
 * Buffers may be managed in a filesystem specific way.
 * We must have no buffers or drop them.
 */
-   if (page_has_private(page) &&
+   if (page_needs_cleanup(page) &&
!try_to_release_page(page, GFP_KERNEL))
return -EAGAIN;
return migrate_page(mapping, newpage, page, mode);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7cdf65be3707..94f038d34f16 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8333,7 +8333,7 @@ static int btrfs_migratepage(struct address_space *mapping,
if (ret != MIGRATEPAGE_SUCCESS)
return ret;
 
-   if (page_has_private(page))
+   if (PagePrivate(page))
attach_page_private(newpage, detach_page_private(page));
 
if (PagePrivate2(page)) {
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 64a579734f93..16d0a7a73191 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -329,9 +329,9 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
ext4_double_up_write_data_sem(orig_inode, donor_inode);
goto data_copy;
}
-   if ((page_has_private(pagep[0]) &&
+   if ((page_needs_cleanup(pagep[0]) &&
 !try_to_release_page(pagep[0], 0)) ||
-   (page_has_private(pagep[1]) &&
+   (page_needs_cleanup(pagep[1]) &&
 !try_to_release_page(pagep[1], 0))) {
*err = -EBUSY;
goto drop_data_sem;
@@ -351,8 +351,8 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
 
/* At this point all buffers in range are uptodate, old mapping layout
 * is no longer required, try to drop it now. */
-   if ((page_has_private(pagep[0]) && !try_to_release_page(pagep[0], 0)) ||
-   (page_has_private(pagep[1]) && !try_to_release_page(pagep[1], 0))) {
+	if ((page_needs_cleanup(pagep[0]) && !try_to_release_page(pagep[0], 0)) ||
+	    (page_needs_cleanup(pagep[1]) && !try_to_release_page(pagep[1], 0))) {
*err = -EBUSY;
goto unlock_pages;
}
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index c0fee830a34e..76e8ca9e47fa 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse

Re: [PATCH v6 02/30] mm: Add set/end/wait functions for PG_private_2

2021-04-08 Thread David Howells
Here's a partial change, but we still need to deal with the assumption
page_has_private() makes - that its output can be used to count the number of
refs held for PG_private *and* PG_private_2 - which isn't true for my code
here.

David
---
commit e7c28d83b84b972c3faa0dd86020548aa50eda75
Author: David Howells 
Date:   Thu Apr 8 16:33:20 2021 +0100

netfs: Fix PG_private_2 helper functions to consistently use compound_head()

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ef511364cc0c..63ca6430aef5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -699,6 +699,7 @@ void page_endio(struct page *page, bool is_write, int err);
  */
 static inline void set_page_private_2(struct page *page)
 {
+   page = compound_head(page);
get_page(page);
SetPagePrivate2(page);
 }
diff --git a/mm/filemap.c b/mm/filemap.c
index 0ce93c8799ca..46e0321ba87a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1461,6 +1461,7 @@ EXPORT_SYMBOL(end_page_private_2);
  */
 void wait_on_page_private_2(struct page *page)
 {
+   page = compound_head(page);
while (PagePrivate2(page))
wait_on_page_bit(page, PG_private_2);
 }
@@ -1481,6 +1482,7 @@ int wait_on_page_private_2_killable(struct page *page)
 {
int ret = 0;
 
+   page = compound_head(page);
while (PagePrivate2(page)) {
ret = wait_on_page_bit_killable(page, PG_private_2);
if (ret < 0)



Re: [PATCH v6 02/30] mm: Add set/end/wait functions for PG_private_2

2021-04-08 Thread David Howells
Matthew Wilcox  wrote:

> > +void end_page_private_2(struct page *page)
> > +{
> > +   page = compound_head(page);
> > +   VM_BUG_ON_PAGE(!PagePrivate2(page), page);
> > +   clear_bit_unlock(PG_private_2, &page->flags);
> > +   wake_up_page_bit(page, PG_private_2);
> 
> ... but when we try to end on a tail, we actually wake up the head ...

Question is, should I remove compound_head() here or add it into the other
functions?

David



Re: [PATCH v2 00/18] Implement RSASSA-PSS signature verification

2021-04-08 Thread David Howells
Varad Gautam  wrote:

> The test harness is available at [5].

Can you add this to the keyutils testsuite?

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git

David



Re: [PATCH v2 18/18] keyctl_pkey: Add pkey parameters slen and mgfhash for PSS

2021-04-08 Thread David Howells
Varad Gautam  wrote:

> + Opt_slen,   /* "slen=" eg. "slen=32" */

"slen" seems a bit unobvious.  Maybe "saltlen=..."?

David



[PATCH v6 30/30] afs: Use the netfs_write_begin() helper

2021-04-08 Thread David Howells
Make AFS use the new netfs_write_begin() helper to do the pre-reading
required before the write.  If successful, the helper returns with the
required page filled in and locked.  It may read more than just one page,
expanding the read to meet cache granularity requirements as necessary.

Note: A more advanced version of this could be made that does
generic_perform_write() for a whole cache granule.  This would make it
easier to avoid doing the download/read for the data to be overwritten.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/160588546422.3465195.1546354372589291098.st...@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161539563244.286939.16537296241609909980.st...@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653819291.2770958.406013201547420544.st...@warthog.procyon.org.uk/ # v5
---

 fs/afs/file.c |   19 +
 fs/afs/internal.h |1 
 fs/afs/write.c|  108 ++---
 3 files changed, 31 insertions(+), 97 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 10c6eaaac2cc..db035ae2a134 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -333,6 +333,13 @@ static void afs_init_rreq(struct netfs_read_request *rreq, 
struct file *file)
rreq->netfs_priv = key_get(afs_file_key(file));
 }
 
+static bool afs_is_cache_enabled(struct inode *inode)
+{
+   struct fscache_cookie *cookie = afs_vnode_cache(AFS_FS_I(inode));
+
+	return fscache_cookie_enabled(cookie) && !hlist_empty(&cookie->backing_objects);
+}
+
 static int afs_begin_cache_operation(struct netfs_read_request *rreq)
 {
struct afs_vnode *vnode = AFS_FS_I(rreq->inode);
@@ -340,14 +347,24 @@ static int afs_begin_cache_operation(struct netfs_read_request *rreq)
return fscache_begin_read_operation(rreq, afs_vnode_cache(vnode));
 }
 
+static int afs_check_write_begin(struct file *file, loff_t pos, unsigned len,
+struct page *page, void **_fsdata)
+{
+   struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
+
+	return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
+}
+
 static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv)
 {
key_put(netfs_priv);
 }
 
-static const struct netfs_read_request_ops afs_req_ops = {
+const struct netfs_read_request_ops afs_req_ops = {
.init_rreq  = afs_init_rreq,
+   .is_cache_enabled   = afs_is_cache_enabled,
.begin_cache_operation  = afs_begin_cache_operation,
+   .check_write_begin  = afs_check_write_begin,
.issue_op   = afs_req_issue_op,
.cleanup= afs_priv_cleanup,
 };
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index f9a692fc08f4..52157a05796a 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1045,6 +1045,7 @@ extern void afs_dynroot_depopulate(struct super_block *);
 extern const struct address_space_operations afs_fs_aops;
 extern const struct inode_operations afs_file_inode_operations;
 extern const struct file_operations afs_file_operations;
+extern const struct netfs_read_request_ops afs_req_ops;
 
 extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *);
 extern void afs_put_wb_key(struct afs_wb_key *);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index bc84c771b0fd..dc66ff15dd16 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "internal.h"
 
 /*
@@ -22,68 +24,6 @@ int afs_set_page_dirty(struct page *page)
return __set_page_dirty_nobuffers(page);
 }
 
-/*
- * Handle completion of a read operation to fill a page.
- */
-static void afs_fill_hole(struct afs_read *req)
-{
-   if (iov_iter_count(req->iter) > 0)
-   /* The read was short - clear the excess buffer. */
-   iov_iter_zero(iov_iter_count(req->iter), req->iter);
-}
-
-/*
- * partly or wholly fill a page that's under preparation for writing
- */
-static int afs_fill_page(struct file *file,
-loff_t pos, unsigned int len, struct page *page)
-{
-   struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
-   struct afs_read *req;
-   size_t p;
-   void *data;
-   int ret;
-
-   _enter(",,%llu", (unsigned long long)pos);
-
-   if (pos >= vnode->vfs_inode.i_size) {
-   p = pos & ~PAGE_MASK;
-   ASSERTCMP(p + len, <=, PAGE_SIZE);
-   data = kmap(page);
-   memset(data + p, 0, len);
-   kunmap(page);
-   return 0;
-   }
-
-   req = kzalloc(sizeof(struct afs_read), GFP_KERNEL);
-   if (!req)
-   return -ENOMEM;
-
-	refcount_set(&req->usage, 1);
-   req->vnode  = vnode;
-   req->done 

[PATCH v6 29/30] afs: Use new netfs lib read helper API

2021-04-08 Thread David Howells
Make AFS use the new netfs read helpers to implement the VM read
operations:

 - afs_readpage() now hands off responsibility to netfs_readpage().

 - afs_readpages() is gone and replaced with afs_readahead().

 - afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.
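
The resulting wrappers are roughly (sketch; the exact argument list is my
reading of the v6 netfs API, so treat it as an assumption):

	static int afs_readpage(struct file *file, struct page *page)
	{
		return netfs_readpage(file, page, &afs_req_ops, NULL);
	}

	static void afs_readahead(struct readahead_control *ractl)
	{
		netfs_readahead(ractl, &afs_req_ops, NULL);
	}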

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
  have been running async and stuff that might sleep needs punting to a
  workqueue.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.st...@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.st...@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.st...@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.st...@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.st...@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.st...@warthog.procyon.org.uk/ # v5
---

 fs/afs/Kconfig|1 
 fs/afs/file.c |  327 +
 fs/afs/fsclient.c |1 
 fs/afs/internal.h |3 
 fs/afs/write.c|7 +
 5 files changed, 88 insertions(+), 251 deletions(-)

diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig
index 1ad211d72b3b..fc8ba9142f2f 100644
--- a/fs/afs/Kconfig
+++ b/fs/afs/Kconfig
@@ -4,6 +4,7 @@ config AFS_FS
depends on INET
select AF_RXRPC
select DNS_RESOLVER
+   select NETFS_SUPPORT
help
  If you say Y here, you will get an experimental Andrew File System
  driver. It currently only supports unsecured read-only AFS access.
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 2db810467d3f..10c6eaaac2cc 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 static int afs_file_mmap(struct file *file, struct vm_area_struct *vma);
@@ -22,8 +23,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
   unsigned int length);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 
-static int afs_readpages(struct file *filp, struct address_space *mapping,
-struct list_head *pages, unsigned nr_pages);
+static void afs_readahead(struct readahead_control *ractl);
 
 const struct file_operations afs_file_operations = {
.open   = afs_open,
@@ -47,7 +47,7 @@ const struct inode_operations afs_file_inode_operations = {
 
 const struct address_space_operations afs_fs_aops = {
.readpage   = afs_readpage,
-   .readpages  = afs_readpages,
+   .readahead  = afs_readahead,
.set_page_dirty = afs_set_page_dirty,
.launder_page   = afs_launder_page,
.releasepage= afs_releasepage,
@@ -184,61 +184,17 @@ int afs_release(struct inode *inode, struct file *file)
 }
 
 /*
- * Handle completion of a read operation.
+ * Allocate a new read record.
  */
-static void afs_file_read_done(struct afs_read *req)
+struct afs_read *afs_alloc_read(gfp_t gfp)
 {
-   struct afs_vnode *vnode = req->vnode;
-   struct page *page;
-   pgoff_t index = req->pos >> PAGE_SHIFT;
-   pgoff_t last = index + req->nr_pages - 1;
-
-	XA_STATE(xas, &vnode->vfs_inode.i_mapping->i_pages, index);
-
-   if (iov_iter_count(req->iter) > 0) {
-   /* The read was short - clear the excess buffer. */
-   _debug("afterclear %zx %zx %llx/%llx",
-  req->iter->iov_offset,
-  iov_iter_count(req->iter),
-  req->actual_len, req->len);
-   iov_iter_zero(iov_iter_count(req->iter), req->iter);
-   }
-
-   rcu_read_lock();
-	xas_for_each(&xas, page, last) {
-   page_endio(page, false, 0);
-   put_page(page);
-   }
-   rcu_read_unlock();
-
-   task_io_account_read(req->len);
-   req->cleanup = NULL;
-}
-
-/*
- * Dispose of our locks and refs on the pages if the read failed.
- */
-static void afs_file_read_cleanup(struct afs_read *req)
-{
-   struct page *page;
-   pgoff_t index = req->pos >> PAGE_SHIFT;
-   pgoff_t

[PATCH v6 28/30] afs: Use the fs operation ops to handle FetchData completion

2021-04-08 Thread David Howells
Use the 'success' and 'aborted' afs_operations_ops methods and add a
'failed' method to handle the completion of an AFS.FetchData,
AFS.FetchData64 or YFS.FetchData64 RPC operation rather than directly
calling the done func pointed to by the afs_read struct from the call
delivery handler.

This means the done function will be called back on error also, not just on
successful completion.

This allows motion towards asynchronous data reception on data fetch calls
and allows any error to be handed off to the fscache read helper in the
same place as a successful completion.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/160588541471.3465195.8807019223378490810.st...@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118157260.1232039.6549085372718234792.st...@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161052647.2537118.12922380836599003659.st...@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340417106.1303470.3502017303898569631.st...@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539560673.286939.391310781674212229.st...@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653816367.2770958.5856904574822446404.st...@warthog.procyon.org.uk/ # v5
---

 fs/afs/file.c |   15 +++
 fs/afs/fs_operation.c |4 +++-
 fs/afs/fsclient.c |3 ---
 fs/afs/internal.h |1 +
 fs/afs/yfsclient.c|3 ---
 5 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index edf21c8708a3..2db810467d3f 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -254,6 +254,19 @@ void afs_put_read(struct afs_read *req)
}
 }
 
+static void afs_fetch_data_notify(struct afs_operation *op)
+{
+   struct afs_read *req = op->fetch.req;
+   int error = op->error;
+
+   if (error == -ECONNABORTED)
+   error = afs_abort_to_error(op->ac.abort_code);
+   req->error = error;
+
+   if (req->done)
+   req->done(req);
+}
+
 static void afs_fetch_data_success(struct afs_operation *op)
 {
struct afs_vnode *vnode = op->file[0].vnode;
@@ -262,6 +275,7 @@ static void afs_fetch_data_success(struct afs_operation *op)
 	afs_vnode_commit_status(op, &op->file[0]);
afs_stat_v(vnode, n_fetches);
 	atomic_long_add(op->fetch.req->actual_len, &vnode->net->n_fetch_bytes);
+   afs_fetch_data_notify(op);
 }
 
 static void afs_fetch_data_put(struct afs_operation *op)
@@ -275,6 +289,7 @@ static const struct afs_operation_ops afs_fetch_data_operation = {
.issue_yfs_rpc  = yfs_fs_fetch_data,
.success= afs_fetch_data_success,
.aborted= afs_check_for_remote_deletion,
+   .failed = afs_fetch_data_notify,
.put= afs_fetch_data_put,
 };
 
diff --git a/fs/afs/fs_operation.c b/fs/afs/fs_operation.c
index 71c58723763d..2cb0951acca6 100644
--- a/fs/afs/fs_operation.c
+++ b/fs/afs/fs_operation.c
@@ -198,8 +198,10 @@ void afs_wait_for_operation(struct afs_operation *op)
case -ECONNABORTED:
if (op->ops->aborted)
op->ops->aborted(op);
-   break;
+   fallthrough;
default:
+   if (op->ops->failed)
+   op->ops->failed(op);
break;
}
 
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 31e6b3635541..5e34f4dbd385 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -392,9 +392,6 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
break;
}
 
-   if (req->done)
-   req->done(req);
-
_leave(" = 0 [done]");
return 0;
 }
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 9629b6430a52..ee283e3ebc4d 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -742,6 +742,7 @@ struct afs_operation_ops {
void (*issue_yfs_rpc)(struct afs_operation *op);
void (*success)(struct afs_operation *op);
void (*aborted)(struct afs_operation *op);
+   void (*failed)(struct afs_operation *op);
void (*edit_dir)(struct afs_operation *op);
void (*put)(struct afs_operation *op);
 };
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index 363d6dd276c0..2b35cba8ad62 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -449,9 +449,6 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
break;
}
 
-   if (req->done)
-   req->done(req);
-
_leave(" = 0 [done]");
return 0;
 }




[PATCH v6 27/30] afs: Prepare for use of THPs

2021-04-08 Thread David Howells
As a prelude to supporting transparent huge pages, use thp_size() and
similar rather than PAGE_SIZE/SHIFT.

Further, try and frame everything in terms of file positions and lengths
rather than page indices and numbers of pages.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/160588540227.3465195.4752143929716269062.st...@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118155821.1232039.540445038028845740.st...@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161051439.2537118.15577827510426326534.st...@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340415869.1303470.6040191748634322355.st...@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539559365.286939.18344613540296085269.st...@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653815142.2770958.454490670311230206.st...@warthog.procyon.org.uk/ # v5
---

 fs/afs/dir.c  |2 
 fs/afs/file.c |8 -
 fs/afs/internal.h |2 
 fs/afs/write.c|  434 +
 4 files changed, 244 insertions(+), 202 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 8c093bfff8b6..117df15e5367 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -2083,6 +2083,6 @@ static void afs_dir_invalidatepage(struct page *page, unsigned int offset,
afs_stat_v(dvnode, n_inval);
 
/* we clean up only if the entire page is being invalidated */
-   if (offset == 0 && length == PAGE_SIZE)
+   if (offset == 0 && length == thp_size(page))
detach_page_private(page);
 }
diff --git a/fs/afs/file.c b/fs/afs/file.c
index f1e30b89e41c..edf21c8708a3 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -329,8 +329,8 @@ static int afs_page_filler(struct key *key, struct page *page)
req->vnode  = vnode;
req->key= key_get(key);
req->pos= (loff_t)page->index << PAGE_SHIFT;
-   req->len= PAGE_SIZE;
-   req->nr_pages   = 1;
+   req->len= thp_size(page);
+   req->nr_pages   = thp_nr_pages(page);
req->done   = afs_file_read_done;
req->cleanup= afs_file_read_cleanup;
 
@@ -574,8 +574,8 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset,
trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page);
clear_page_dirty_for_io(page);
 full_invalidate:
-   detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("inval"), page);
+   detach_page_private(page);
 }
 
 /*
@@ -620,8 +620,8 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 #endif
 
if (PagePrivate(page)) {
-   detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
+   detach_page_private(page);
}
 
/* indicate that the page can be released */
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 16020725cc68..9629b6430a52 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -815,8 +815,6 @@ struct afs_operation {
loff_t  pos;
loff_t  size;
loff_t  i_size;
-		pgoff_t		first;		/* first page in mapping to deal with */
-		pgoff_t		last;		/* last page in mapping to deal with */
 		bool		laundering;	/* Laundering page, PG_writeback not set */
} store;
struct {
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 4ccd2c263983..099c7dad09c5 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -94,15 +94,15 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
struct page *page;
unsigned long priv;
-   unsigned f, from = pos & (PAGE_SIZE - 1);
-   unsigned t, to = from + len;
-   pgoff_t index = pos >> PAGE_SHIFT;
+   unsigned f, from;
+   unsigned t, to;
+   pgoff_t index;
int ret;
 
-   _enter("{%llx:%llu},{%lx},%u,%u",
-  vnode->fid.vid, vnode->fid.vnode, index, from, to);
+   _enter("{%llx:%llu},%llx,%x",
+  vnode->fid.vid, vnode->fid.vnode, pos, len);
 
-   page = grab_cache_page_write_begin(mapping, index, flags);
+   page = grab_cache_page_write_begin(mapping, pos / PAGE_SIZE, flags);
if (!page)
return -ENOMEM;
 
@@ -121,19 +121,20 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
wait_on_page_fscache(page);
 #endif
 
+   index = page->index;

[PATCH v6 26/30] afs: Extract writeback extension into its own function

2021-04-08 Thread David Howells
Extract writeback extension into its own function to break up the writeback
function a bit.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/160588538471.3465195.782513375683399583.st...@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118154610.1232039.1765365632920504822.st...@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161050546.2537118.2202554806419189453.st...@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340414102.1303470.9078891484034668985.st...@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539558417.286939.2879469588895925399.st...@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653813972.2770958.12671731209438112378.st...@warthog.procyon.org.uk/ # v5
---

 fs/afs/write.c |  109 ++--
 1 file changed, 67 insertions(+), 42 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index 1b8cabf5ac92..4ccd2c263983 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -490,47 +490,25 @@ static int afs_store_data(struct afs_vnode *vnode, struct 
iov_iter *iter,
 }
 
 /*
- * Synchronously write back the locked page and any subsequent non-locked dirty
- * pages.
+ * Extend the region to be written back to include subsequent contiguously
+ * dirty pages if possible, but don't sleep while doing so.
+ *
+ * If this page holds new content, then we can include filler zeros in the
+ * writeback.
  */
-static int afs_write_back_from_locked_page(struct address_space *mapping,
-  struct writeback_control *wbc,
-  struct page *primary_page,
-  pgoff_t final_page)
+static void afs_extend_writeback(struct address_space *mapping,
+struct afs_vnode *vnode,
+long *_count,
+pgoff_t start,
+pgoff_t final_page,
+unsigned *_offset,
+unsigned *_to,
+bool new_content)
 {
-   struct afs_vnode *vnode = AFS_FS_I(mapping->host);
-   struct iov_iter iter;
struct page *pages[8], *page;
-   unsigned long count, priv;
-   unsigned n, offset, to, f, t;
-   pgoff_t start, first, last;
-   loff_t i_size, pos, end;
-   int loop, ret;
-
-   _enter(",%lx", primary_page->index);
-
-   count = 1;
-   if (test_set_page_writeback(primary_page))
-   BUG();
-
-   /* Find all consecutive lockable dirty pages that have contiguous
-* written regions, stopping when we find a page that is not
-* immediately lockable, is not dirty or is missing, or we reach the
-* end of the range.
-*/
-   start = primary_page->index;
-   priv = page_private(primary_page);
-   offset = afs_page_dirty_from(primary_page, priv);
-   to = afs_page_dirty_to(primary_page, priv);
-   trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page);
-
-   WARN_ON(offset == to);
-   if (offset == to)
-   trace_afs_page_dirty(vnode, tracepoint_string("WARN"), 
primary_page);
-
-   if (start >= final_page ||
-   (to < PAGE_SIZE && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)))
-   goto no_more;
+   unsigned long count = *_count, priv;
+   unsigned offset = *_offset, to = *_to, n, f, t;
+   int loop;
 
start++;
do {
@@ -551,8 +529,7 @@ static int afs_write_back_from_locked_page(struct 
address_space *mapping,
 
for (loop = 0; loop < n; loop++) {
page = pages[loop];
-   if (to != PAGE_SIZE &&
-   !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags))
+   if (to != PAGE_SIZE && !new_content)
break;
if (page->index > final_page)
break;
@@ -566,8 +543,7 @@ static int afs_write_back_from_locked_page(struct 
address_space *mapping,
priv = page_private(page);
f = afs_page_dirty_from(page, priv);
t = afs_page_dirty_to(page, priv);
-   if (f != 0 &&
-   !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) {
+   if (f != 0 && !new_content) {
unlock_page(page);
break;
}
@@ -593,6 +569,55 @@ static int afs_write_back_from_locked_page(struct 
address_space *mapping,
} while (start <= 

[PATCH v6 25/30] afs: Wait on PG_fscache before modifying/releasing a page

2021-04-08 Thread David Howells
PG_fscache is going to be used to indicate that a page is being written to
the cache, and that the page should not be modified or released until it's
finished.

Make afs_invalidatepage() and afs_releasepage() wait for it.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/158861253957.340223.7465334678444521655.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465832417.1377938.3571599385208729791.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588536286.3465195.13231895135369807920.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118153708.1232039.3535103645871176749.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161049369.2537118.11591934943429117060.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340412903.1303470.6424701655031380012.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539556890.286939.5873470593519458598.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653812726.2770958.18167145829938766503.st...@warthog.procyon.org.uk/
 # v5
---

 fs/afs/file.c  |9 +
 fs/afs/write.c |   10 ++
 2 files changed, 19 insertions(+)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 4a34ffaf6de4..f1e30b89e41c 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -593,6 +593,7 @@ static void afs_invalidatepage(struct page *page, unsigned 
int offset,
if (PagePrivate(page))
afs_invalidate_dirty(page, offset, length);
 
+   wait_on_page_fscache(page);
_leave("");
 }
 
@@ -610,6 +611,14 @@ static int afs_releasepage(struct page *page, gfp_t 
gfp_flags)
 
/* deny if page is being written to the cache and the caller hasn't
 * elected to wait */
+#ifdef CONFIG_AFS_FSCACHE
+   if (PageFsCache(page)) {
+   if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & 
__GFP_FS))
+   return false;
+   wait_on_page_fscache(page);
+   }
+#endif
+
if (PagePrivate(page)) {
detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 6e41b982c71b..1b8cabf5ac92 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -117,6 +117,10 @@ int afs_write_begin(struct file *file, struct 
address_space *mapping,
SetPageUptodate(page);
}
 
+#ifdef CONFIG_AFS_FSCACHE
+   wait_on_page_fscache(page);
+#endif
+
 try_again:
/* See if this page is already partially written in a way that we can
 * merge the new write with.
@@ -857,6 +861,11 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
/* Wait for the page to be written to the cache before we allow it to
 * be modified.  We then assume the entire page will need writing back.
 */
+#ifdef CONFIG_AFS_FSCACHE
+   if (PageFsCache(vmf->page) &&
+   wait_on_page_bit_killable(vmf->page, PG_fscache) < 0)
+   return VM_FAULT_RETRY;
+#endif
 
if (wait_on_page_writeback_killable(vmf->page))
return VM_FAULT_RETRY;
@@ -947,5 +956,6 @@ int afs_launder_page(struct page *page)
 
detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page);
+   wait_on_page_fscache(page);
return ret;
 }




[PATCH v6 24/30] afs: Use ITER_XARRAY for writing

2021-04-08 Thread David Howells
Use a single ITER_XARRAY iterator to describe the portion of a file to be
transmitted to the server rather than generating a series of small
ITER_BVEC iterators on the fly.  This will make it easier to implement AIO
in afs.

In theory we could maybe use one giant ITER_BVEC, but that means
potentially allocating a huge array of bio_vec structs (max 256 per page)
when in fact the pagecache already has a structure listing all the relevant
pages (radix_tree/xarray) that can be walked over.
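
For illustration, the xarray approach means describing the transmission
buffer can be as simple as (sketch only; "mapping", "pos" and "len" are
assumed to be in scope):

	struct iov_iter iter;

	iov_iter_xarray(&iter, WRITE, &mapping->i_pages, pos, len);

after which the iterator walks the cached pages covering [pos, pos + len)
directly, with no bio_vec array to allocate.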

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/153685395197.14766.16289516750731233933.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/158861251312.340223.17924900795425422532.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465828607.1377938.6903132788463419368.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588535018.3465195.14509994354240338307.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118152415.1232039.6452879415814850025.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161048194.2537118.13763612220937637316.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340411602.1303470.4661108879482218408.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539555629.286939.5241869986617154517.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653811456.2770958.7017388543246759245.st...@warthog.procyon.org.uk/
 # v5
---

 fs/afs/fsclient.c  |   50 +
 fs/afs/internal.h  |   15 +++---
 fs/afs/rxrpc.c |  103 ++--
 fs/afs/write.c |  100 ---
 fs/afs/yfsclient.c |   25 +++
 include/trace/events/afs.h |   51 --
 6 files changed, 126 insertions(+), 218 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 897b37301851..31e6b3635541 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -1055,8 +1055,7 @@ static const struct afs_call_type afs_RXFSStoreData64 = {
 /*
  * store a set of pages to a very large file
  */
-static void afs_fs_store_data64(struct afs_operation *op,
-   loff_t pos, loff_t size, loff_t i_size)
+static void afs_fs_store_data64(struct afs_operation *op)
 {
	struct afs_vnode_param *vp = &op->file[0];
struct afs_call *call;
@@ -1071,7 +1070,7 @@ static void afs_fs_store_data64(struct afs_operation *op,
if (!call)
return afs_op_nomem(op);
 
-   call->send_pages = true;
+   call->write_iter = op->store.write_iter;
 
/* marshall the parameters */
bp = call->request;
@@ -1087,47 +1086,38 @@ static void afs_fs_store_data64(struct afs_operation 
*op,
*bp++ = 0; /* unix mode */
*bp++ = 0; /* segment size */
 
-   *bp++ = htonl(upper_32_bits(pos));
-   *bp++ = htonl(lower_32_bits(pos));
-   *bp++ = htonl(upper_32_bits(size));
-   *bp++ = htonl(lower_32_bits(size));
-   *bp++ = htonl(upper_32_bits(i_size));
-   *bp++ = htonl(lower_32_bits(i_size));
+   *bp++ = htonl(upper_32_bits(op->store.pos));
+   *bp++ = htonl(lower_32_bits(op->store.pos));
+   *bp++ = htonl(upper_32_bits(op->store.size));
+   *bp++ = htonl(lower_32_bits(op->store.size));
+   *bp++ = htonl(upper_32_bits(op->store.i_size));
+   *bp++ = htonl(lower_32_bits(op->store.i_size));
 
	trace_afs_make_fs_call(call, &vp->fid);
afs_make_op_call(op, call, GFP_NOFS);
 }
 
 /*
- * store a set of pages
+ * Write data to a file on the server.
  */
 void afs_fs_store_data(struct afs_operation *op)
 {
	struct afs_vnode_param *vp = &op->file[0];
struct afs_call *call;
-   loff_t size, pos, i_size;
__be32 *bp;
 
_enter(",%x,{%llx:%llu},,",
   key_serial(op->key), vp->fid.vid, vp->fid.vnode);
 
-   size = (loff_t)op->store.last_to - (loff_t)op->store.first_offset;
-   if (op->store.first != op->store.last)
-   size += (loff_t)(op->store.last - op->store.first) << 
PAGE_SHIFT;
-   pos = (loff_t)op->store.first << PAGE_SHIFT;
-   pos += op->store.first_offset;
-
-   i_size = i_size_read(>vnode->vfs_inode);
-   if (pos + size > i_size)
-   i_size = size + pos;
-
_debug("size %llx, at %llx, i_size %llx",
-  (unsigned long long) size, (unsigned long long) pos,
-  (unsigned long long) i_size);
+  (unsigned long long)op->store.size,
+  (unsigned long long)op->store.pos,
+  (unsigned long long)op->store.i_size);
 
-   if (upper_32_bits(pos) || upper_32_bits(i_size) || upper

[PATCH v6 23/30] afs: Set up the iov_iter before calling afs_extract_data()

2021-04-08 Thread David Howells
afs_extract_data() sets up a temporary iov_iter and passes it to AF_RXRPC
each time it is called to describe the remaining buffer to be filled.

Instead:

 (1) Put an iterator in the afs_call struct.

 (2) Set the iterator for each marshalling stage to load data into the
 appropriate places.  A number of convenience functions are provided to
 this end (eg. afs_extract_to_buf()).

 This iterator is then passed to afs_extract_data().

 (3) Use the new ITER_XARRAY iterator when reading data to load directly
 into the inode's pages without needing to create a list of them.

This will allow O_DIRECT calls to be supported in future patches.
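
As an illustrative sketch of the pattern (not code from fs/afs, though the
helper names are from this patch), a delivery routine now looks something
like:

	static int example_deliver(struct afs_call *call)
	{
		int ret;

		switch (call->unmarshall) {
		case 0:
			afs_extract_to_tmp(call);	/* aim call->iter at call->tmp */
			call->unmarshall++;
			fallthrough;
		case 1:
			ret = afs_extract_data(call, false);	/* fill from the wire */
			if (ret < 0)
				return ret;
			/* ntohl(call->tmp) now holds the extracted word */
			call->unmarshall++;
			break;
		}
		return 0;
	}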

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/152898380012.11616.12094591785228251717.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/153685394431.14766.3178466345696987059.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/153999787395.866.11218209749223643998.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/154033911195.12041.3882700371848894587.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/158861250059.340223.1248231474865140653.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465827399.1377938.11181327349704960046.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588533776.3465195.3612752083351956948.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118151238.1232039.17015723405750601161.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161047240.2537118.14721975104810564022.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340410333.1303470.16260122230371140878.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539554187.286939.15305559004905459852.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653810525.2770958.4630666029125411789.st...@warthog.procyon.org.uk/
 # v5
---

 fs/afs/dir.c   |  222 +++-
 fs/afs/file.c  |  190 ++---
 fs/afs/fsclient.c  |   54 +++--
 fs/afs/internal.h  |   16 ++--
 fs/afs/write.c |   27 --
 fs/afs/yfsclient.c |   54 +++--
 6 files changed, 314 insertions(+), 249 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index d8825ce63eba..8c093bfff8b6 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -102,6 +102,35 @@ struct afs_lookup_cookie {
struct afs_fid  fids[50];
 };
 
+/*
+ * Drop the refs that we're holding on the pages we were reading into.  We've
+ * got refs on the first nr_pages pages.
+ */
+static void afs_dir_read_cleanup(struct afs_read *req)
+{
+   struct address_space *mapping = req->vnode->vfs_inode.i_mapping;
+   struct page *page;
+   pgoff_t last = req->nr_pages - 1;
+
+   XA_STATE(xas, &mapping->i_pages, 0);
+
+   if (unlikely(!req->nr_pages))
+   return;
+
+   rcu_read_lock();
+   xas_for_each(&xas, page, last) {
+   if (xas_retry(, page))
+   continue;
+   BUG_ON(xa_is_value(page));
+   BUG_ON(PageCompound(page));
+   ASSERTCMP(page->mapping, ==, mapping);
+
+   put_page(page);
+   }
+
+   rcu_read_unlock();
+}
+
 /*
  * check that a directory page is valid
  */
@@ -127,7 +156,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, 
struct page *page,
qty /= sizeof(union afs_xdr_dir_block);
 
/* check them */
-   dbuf = kmap(page);
+   dbuf = kmap_atomic(page);
for (tmp = 0; tmp < qty; tmp++) {
if (dbuf->blocks[tmp].hdr.magic != AFS_DIR_MAGIC) {
printk("kAFS: %s(%lx): bad magic %d/%d is %04hx\n",
@@ -146,7 +175,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, 
struct page *page,
		((u8 *)&dbuf->blocks[tmp])[AFS_DIR_BLOCK_SIZE - 1] = 0;
}
 
-   kunmap(page);
+   kunmap_atomic(dbuf);
 
 checked:
afs_stat_v(dvnode, n_read_dir);
@@ -157,35 +186,74 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, 
struct page *page,
 }
 
 /*
- * Check the contents of a directory that we've just read.
+ * Dump the contents of a directory.
  */
-static bool afs_dir_check_pages(struct afs_vnode *dvnode, struct afs_read *req)
+static void afs_dir_dump(struct afs_vnode *dvnode, struct afs_read *req)
 {
struct afs_xdr_dir_page *dbuf;
-   unsigned int i, j, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block);
+   struct address_space *mapping = dvnode->vfs_inode.i_mapping;
+   struct page *page;
+   unsigned int i, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block);
+   pgoff_t last = req->nr_pages - 1;
 
-   for (i = 0; i < req->nr_pages; i++)
-

[PATCH v6 22/30] afs: Log remote unmarshalling errors

2021-04-08 Thread David Howells
Log unmarshalling errors reported by the peer (ie. it can't parse what we
sent it).  Limit the maximum number of messages to 3.
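
For clarity, the cap is a simple unlocked counter:

	static int max = 0;
	int m = max;

	if (m < 3) {
		max = m + 1;
		pr_notice("kAFS: Peer reported %s failure ...\n", msg);
	}

The load and store aren't atomic, so a race might let an extra message or
two through, but that's harmless for this sort of limit and avoids the cost
of an atomic op on the error path.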

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/159465826250.1377938.16372395422217583913.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588532584.3465195.15618385466614028590.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118149739.1232039.208060911149801695.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161046033.2537118.7779717661044373273.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340409118.1303470.17812607349396199116.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539552964.286939.16503232687974398308.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653808989.2770958.11530765353025697860.st...@warthog.procyon.org.uk/
 # v5
---

 fs/afs/rxrpc.c |   34 ++
 1 file changed, 34 insertions(+)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 0ec38b758f29..ae68576f822f 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -500,6 +500,39 @@ void afs_make_call(struct afs_addr_cursor *ac, struct 
afs_call *call, gfp_t gfp)
_leave(" = %d", ret);
 }
 
+/*
+ * Log remote abort codes that indicate that we have a protocol disagreement
+ * with the server.
+ */
+static void afs_log_error(struct afs_call *call, s32 remote_abort)
+{
+   static int max = 0;
+   const char *msg;
+   int m;
+
+   switch (remote_abort) {
+   case RX_EOF: msg = "unexpected EOF";break;
+   case RXGEN_CC_MARSHAL:   msg = "client marshalling";break;
+   case RXGEN_CC_UNMARSHAL: msg = "client unmarshalling";  break;
+   case RXGEN_SS_MARSHAL:   msg = "server marshalling";break;
+   case RXGEN_SS_UNMARSHAL: msg = "server unmarshalling";  break;
+   case RXGEN_DECODE:   msg = "opcode decode"; break;
+   case RXGEN_SS_XDRFREE:   msg = "server XDR cleanup";break;
+   case RXGEN_CC_XDRFREE:   msg = "client XDR cleanup";break;
+   case -32:msg = "insufficient data"; break;
+   default:
+   return;
+   }
+
+   m = max;
+   if (m < 3) {
+   max = m + 1;
+   pr_notice("kAFS: Peer reported %s failure on %s [%pISp]\n",
+ msg, call->type->name,
+ &call->alist->addrs[call->addr_ix].transport);
+   }
+}
+
 /*
  * deliver messages to a call
  */
@@ -563,6 +596,7 @@ static void afs_deliver_to_call(struct afs_call *call)
goto out;
case -ECONNABORTED:
ASSERTCMP(state, ==, AFS_CALL_COMPLETE);
+   afs_log_error(call, call->abort_code);
goto done;
case -ENOTSUPP:
abort_code = RXGEN_OPCODE;




[PATCH v6 21/30] afs: Don't truncate iter during data fetch

2021-04-08 Thread David Howells
Don't truncate the iterator to correspond to the actual data size when
fetching the data from the server - rather, pass the length we want to read
to rxrpc.

This will allow the clear-after-read code in future to simply clear the
remaining iterator capacity rather than having to reinitialise the
iterator.
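
For illustration (assuming "iter" points to the iterator left over from the
read), the clear-after-read then reduces to:

	iov_iter_zero(iov_iter_count(iter), iter);

zeroing whatever capacity remains, with no need to work out and describe the
unfilled region again.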

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/158861249201.340223.13035445866976590375.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465825061.1377938.14403904452300909320.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588531418.3465195.10712005940763063144.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118148567.1232039.13380313332292947956.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161044610.2537118.17908520793806837792.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340407907.1303470.6501394859511712746.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539551721.286939.14655713136572200716.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653807790.2770958.14034599989374173734.st...@warthog.procyon.org.uk/
 # v5
---

 fs/afs/fsclient.c  |6 --
 fs/afs/internal.h  |6 ++
 fs/afs/rxrpc.c |   13 +
 fs/afs/yfsclient.c |6 --
 include/net/af_rxrpc.h |2 +-
 net/rxrpc/recvmsg.c|9 +
 6 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 1d95ed9dd86e..4a57c6c6f12b 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -305,8 +305,9 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
unsigned int size;
int ret;
 
-   _enter("{%u,%zu/%llu}",
-  call->unmarshall, iov_iter_count(call->iter), req->actual_len);
+   _enter("{%u,%zu,%zu/%llu}",
+  call->unmarshall, call->iov_len, iov_iter_count(call->iter),
+  req->actual_len);
 
switch (call->unmarshall) {
case 0:
@@ -343,6 +344,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
size = PAGE_SIZE - req->offset;
else
size = req->remain;
+   call->iov_len = size;
call->bvec[0].bv_len = size;
call->bvec[0].bv_offset = req->offset;
call->bvec[0].bv_page = req->pages[req->index];
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 995fef267be7..7b8306d8e81e 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -104,6 +104,7 @@ struct afs_call {
struct afs_server   *server;/* The fileserver record if fs 
op (pins ref) */
struct afs_vlserver *vlserver;  /* The vlserver record if vl op 
*/
void*request;   /* request data (first part) */
+   size_t  iov_len;/* Size of *iter to be used */
struct iov_iter def_iter;   /* Default buffer/data iterator 
*/
struct iov_iter *iter;  /* Iterator currently in use */
union { /* Convenience for ->def_iter */
@@ -1271,6 +1272,7 @@ static inline void afs_make_op_call(struct afs_operation 
*op, struct afs_call *c
 
 static inline void afs_extract_begin(struct afs_call *call, void *buf, size_t 
size)
 {
+   call->iov_len = size;
call->kvec[0].iov_base = buf;
call->kvec[0].iov_len = size;
	iov_iter_kvec(&call->def_iter, READ, call->kvec, 1, size);
@@ -1278,21 +1280,25 @@ static inline void afs_extract_begin(struct afs_call 
*call, void *buf, size_t si
 
 static inline void afs_extract_to_tmp(struct afs_call *call)
 {
+   call->iov_len = sizeof(call->tmp);
	afs_extract_begin(call, &call->tmp, sizeof(call->tmp));
 }
 
 static inline void afs_extract_to_tmp64(struct afs_call *call)
 {
+   call->iov_len = sizeof(call->tmp64);
	afs_extract_begin(call, &call->tmp64, sizeof(call->tmp64));
 }
 
 static inline void afs_extract_discard(struct afs_call *call, size_t size)
 {
+   call->iov_len = size;
	iov_iter_discard(&call->def_iter, READ, size);
 }
 
 static inline void afs_extract_to_buf(struct afs_call *call, size_t size)
 {
+   call->iov_len = size;
afs_extract_begin(call, call->buffer, size);
 }
 
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 8be709cb8542..0ec38b758f29 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -363,6 +363,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct 
afs_call *call, gfp_t gfp)
struct rxrpc_call *rxcall;
struct msghdr msg;
struct kvec iov[1];
+   size_t len;
s64 tx_total_len;
int ret;
 
@@ -466,9 +467,10 @@ v

[PATCH v6 20/30] afs: Move key to afs_read struct

2021-04-08 Thread David Howells
Stash the key used to authenticate read operations in the afs_read struct.
This will be necessary to reissue the operation against the server if a
read from the cache fails in upcoming cache changes.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/158861248336.340223.1851189950710196001.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465823899.1377938.11925978022348532049.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588529557.3465195.7303323479305254243.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118147693.1232039.13780672951838643842.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161043340.2537118.511899217704140722.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340406678.1303470.12676824086429446370.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539550819.286939.1268332875889175195.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653806683.2770958.11300984379283401542.st...@warthog.procyon.org.uk/
 # v5
---

 fs/afs/dir.c  |3 ++-
 fs/afs/file.c |   16 +---
 fs/afs/internal.h |3 ++-
 fs/afs/write.c|   12 ++--
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 17548c1faf02..d8825ce63eba 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -241,6 +241,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode 
*dvnode, struct key *key)
return ERR_PTR(-ENOMEM);
 
	refcount_set(&req->usage, 1);
+   req->key = key_get(key);
req->nr_pages = nr_pages;
req->actual_len = i_size; /* May change */
req->len = nr_pages * PAGE_SIZE; /* We can ask for more than there is */
@@ -305,7 +306,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode 
*dvnode, struct key *key)
 
	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
trace_afs_reload_dir(dvnode);
-   ret = afs_fetch_data(dvnode, key, req);
+   ret = afs_fetch_data(dvnode, req);
if (ret < 0)
goto error_unlock;
 
diff --git a/fs/afs/file.c b/fs/afs/file.c
index f1bae0b0a9c0..af6471defec3 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -198,6 +198,7 @@ void afs_put_read(struct afs_read *req)
if (req->pages != req->array)
kfree(req->pages);
}
+   key_put(req->key);
kfree(req);
}
 }
@@ -228,7 +229,7 @@ static const struct afs_operation_ops 
afs_fetch_data_operation = {
 /*
  * Fetch file data from the volume.
  */
-int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read 
*req)
+int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req)
 {
struct afs_operation *op;
 
@@ -237,9 +238,9 @@ int afs_fetch_data(struct afs_vnode *vnode, struct key 
*key, struct afs_read *re
   vnode->fid.vid,
   vnode->fid.vnode,
   vnode->fid.unique,
-  key_serial(key));
+  key_serial(req->key));
 
-   op = afs_alloc_operation(key, vnode->volume);
+   op = afs_alloc_operation(req->key, vnode->volume);
if (IS_ERR(op))
return PTR_ERR(op);
 
@@ -278,6 +279,7 @@ int afs_page_filler(void *data, struct page *page)
 * unmarshalling code will clear the unfilled space.
 */
	refcount_set(&req->usage, 1);
+   req->key = key_get(key);
req->pos = (loff_t)page->index << PAGE_SHIFT;
req->len = PAGE_SIZE;
req->nr_pages = 1;
@@ -287,7 +289,7 @@ int afs_page_filler(void *data, struct page *page)
 
/* read the contents of the file from the server into the
 * page */
-   ret = afs_fetch_data(vnode, key, req);
+   ret = afs_fetch_data(vnode, req);
afs_put_read(req);
 
if (ret < 0) {
@@ -372,7 +374,6 @@ static int afs_readpages_one(struct file *file, struct 
address_space *mapping,
struct afs_read *req;
struct list_head *p;
struct page *first, *page;
-   struct key *key = afs_file_key(file);
pgoff_t index;
int ret, n, i;
 
@@ -396,6 +397,7 @@ static int afs_readpages_one(struct file *file, struct 
address_space *mapping,
 
	refcount_set(&req->usage, 1);
req->vnode = vnode;
+   req->key = key_get(afs_file_key(file));
req->page_done = afs_readpages_page_done;
req->pos = first->index;
req->pos <<= PAGE_SHIFT;
@@ -425,11 +427,11 @@ static int afs_readpages_one(struct file *file, struct 
address_space *mapping,
} while (req->nr_pages < n);
 
if (req->nr_pages == 0) {
-   kfree(req);

[PATCH v6 19/30] afs: Print the operation debug_id when logging an unexpected data version

2021-04-08 Thread David Howells
Print the afs_operation debug_id when logging an unexpected change in the
data version.  This allows the logged message to be matched against
tracelines.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588528377.3465195.2206051235095182302.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118146111.1232039.11398082422487058312.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161042180.2537118.2471333561661033316.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340405772.1303470.3877167548944248214.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539549628.286939.15234870409714613954.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653805530.2770958.15120507632529970934.st...@warthog.procyon.org.uk/
 # v5
---

 fs/afs/inode.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 8de6f05987b4..a4bb3ac762be 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -214,11 +214,12 @@ static void afs_apply_status(struct afs_operation *op,
 
if (vp->dv_before + vp->dv_delta != status->data_version) {
		if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags))
-   pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx 
%s\n",
+   pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx %s 
(op=%x)\n",
vnode->fid.vid, vnode->fid.vnode,
(unsigned long long)vp->dv_before + 
vp->dv_delta,
(unsigned long long)status->data_version,
-   op->type ? op->type->name : "???");
+   op->type ? op->type->name : "???",
+   op->debug_id);
 
vnode->invalid_before = status->data_version;
if (vnode->status.type == AFS_FTYPE_DIR) {




[PATCH v6 18/30] afs: Pass page into dirty region helpers to provide THP size

2021-04-08 Thread David Howells
Pass a pointer to the page being accessed into the dirty region helpers so
that the size of the page can be determined in case it's a transparent huge
page.

This also required the page to be passed into the afs_page_dirty trace
point - so there's no need to specifically pass in the index or private
data as these can be retrieved directly from the page struct.
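
To illustrate the round trip with these helpers, both bounds are packed into
and recovered from page_private() at a page-size-dependent resolution (so
"from" rounds down and "to" rounds up on encode):

	priv = afs_page_dirty(page, from, to);
	set_page_private(page, priv);
	...
	f = afs_page_dirty_from(page, page_private(page));
	t = afs_page_dirty_to(page, page_private(page));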

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588527183.3465195.16107942526481976308.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118144921.1232039.11377711180492625929.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161040747.2537118.11435394902674511430.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340404553.1303470.11414163641767769882.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539548385.286939.8864598314493255313.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653804285.2770958.3497360004849598038.st...@warthog.procyon.org.uk/
 # v5
---

 fs/afs/file.c  |   20 +++
 fs/afs/internal.h  |   16 ++--
 fs/afs/write.c |   60 ++--
 include/trace/events/afs.h |   23 ++---
 4 files changed, 55 insertions(+), 64 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 314f6a9517c7..f1bae0b0a9c0 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -514,8 +514,8 @@ static void afs_invalidate_dirty(struct page *page, 
unsigned int offset,
return;
 
/* We may need to shorten the dirty region */
-   f = afs_page_dirty_from(priv);
-   t = afs_page_dirty_to(priv);
+   f = afs_page_dirty_from(page, priv);
+   t = afs_page_dirty_to(page, priv);
 
if (t <= offset || f >= end)
return; /* Doesn't overlap */
@@ -533,17 +533,17 @@ static void afs_invalidate_dirty(struct page *page, 
unsigned int offset,
if (f == t)
goto undirty;
 
-   priv = afs_page_dirty(f, t);
+   priv = afs_page_dirty(page, f, t);
set_page_private(page, priv);
-   trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page->index, 
priv);
+   trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page);
return;
 
 undirty:
-   trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page->index, 
priv);
+   trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page);
clear_page_dirty_for_io(page);
 full_invalidate:
-   priv = (unsigned long)detach_page_private(page);
-   trace_afs_page_dirty(vnode, tracepoint_string("inval"), page->index, 
priv);
+   detach_page_private(page);
+   trace_afs_page_dirty(vnode, tracepoint_string("inval"), page);
 }
 
 /*
@@ -571,7 +571,6 @@ static void afs_invalidatepage(struct page *page, unsigned 
int offset,
 static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 {
struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-   unsigned long priv;
 
_enter("{{%llx:%llu}[%lu],%lx},%x",
   vnode->fid.vid, vnode->fid.vnode, page->index, page->flags,
@@ -580,9 +579,8 @@ static int afs_releasepage(struct page *page, gfp_t 
gfp_flags)
/* deny if page is being written to the cache and the caller hasn't
 * elected to wait */
if (PagePrivate(page)) {
-   priv = (unsigned long)detach_page_private(page);
-   trace_afs_page_dirty(vnode, tracepoint_string("rel"),
-page->index, priv);
+   detach_page_private(page);
+   trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
}
 
/* indicate that the page can be released */
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 1627b1872812..fd437d4722b5 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -875,31 +875,31 @@ struct afs_vnode_cache_aux {
 #define __AFS_PAGE_PRIV_MMAPPED	0x8000UL
 #endif
 
-static inline unsigned int afs_page_dirty_resolution(void)
+static inline unsigned int afs_page_dirty_resolution(struct page *page)
 {
-   int shift = PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1);
+   int shift = thp_order(page) + PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1);
return (shift > 0) ? shift : 0;
 }
 
-static inline size_t afs_page_dirty_from(unsigned long priv)
+static inline size_t afs_page_dirty_from(struct page *page, unsigned long priv)
 {
unsigned long x = priv & __AFS_PAGE_PRIV_MASK;
 
/* The lower bound is inclusive */
-   return x << afs_page_dirty_resolution();
+   return x << afs_page_dirty_resolution(page);
 }
 
-static inline size_t afs_pa

[PATCH v6 15/30] netfs: Add a tracepoint to log failures that would be otherwise unseen

2021-04-08 Thread David Howells
Add a tracepoint to log internal failures (such as cache errors) that we
don't otherwise want to pass back to the netfs.

Signed-off-by: David Howells 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
 Link: 
https://lore.kernel.org/r/161781048813.463527.1557000804674707986.st...@warthog.procyon.org.uk/
---

 fs/netfs/read_helper.c   |   14 +-
 include/trace/events/netfs.h |   58 ++
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index cd3b61d5e192..1d3b50c5db6d 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -271,6 +271,8 @@ static void netfs_rreq_copy_terminated(void *priv, ssize_t 
transferred_or_error,
 
if (IS_ERR_VALUE(transferred_or_error)) {
		netfs_stat(&netfs_n_rh_write_failed);
+   trace_netfs_failure(rreq, subreq, transferred_or_error,
+   netfs_fail_copy_to_cache);
	} else {
		netfs_stat(&netfs_n_rh_write_done);
}
@@ -323,6 +325,7 @@ static void netfs_rreq_do_write_to_cache(struct 
netfs_read_request *rreq)
		ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len,
					       rreq->i_size);
if (ret < 0) {
+   trace_netfs_failure(rreq, subreq, ret, 
netfs_fail_prepare_write);
trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip);
continue;
}
@@ -627,6 +630,8 @@ void netfs_subreq_terminated(struct netfs_read_subrequest 
*subreq,
 
if (IS_ERR_VALUE(transferred_or_error)) {
subreq->error = transferred_or_error;
+   trace_netfs_failure(rreq, subreq, transferred_or_error,
+   netfs_fail_read);
goto failed;
}
 
@@ -996,8 +1001,10 @@ int netfs_readpage(struct file *file,
	} while (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags));
 
ret = rreq->error;
-   if (ret == 0 && rreq->submitted < rreq->len)
+   if (ret == 0 && rreq->submitted < rreq->len) {
+   trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_readpage);
ret = -EIO;
+   }
 out:
netfs_put_read_request(rreq, false);
return ret;
@@ -1069,6 +1076,7 @@ int netfs_write_begin(struct file *file, struct 
address_space *mapping,
/* Allow the netfs (eg. ceph) to flush conflicts. */
ret = ops->check_write_begin(file, pos, len, page, _fsdata);
if (ret < 0) {
+   trace_netfs_failure(NULL, NULL, ret, 
netfs_fail_check_write_begin);
if (ret == -EAGAIN)
goto retry;
goto error;
@@ -1145,8 +1153,10 @@ int netfs_write_begin(struct file *file, struct 
address_space *mapping,
}
 
ret = rreq->error;
-   if (ret == 0 && rreq->submitted < rreq->len)
+   if (ret == 0 && rreq->submitted < rreq->len) {
+   trace_netfs_failure(rreq, NULL, ret, 
netfs_fail_short_write_begin);
ret = -EIO;
+   }
netfs_put_read_request(rreq, false);
if (ret < 0)
goto error;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index e3ebeabd3852..de1c64635e42 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -47,6 +47,15 @@ enum netfs_sreq_trace {
netfs_sreq_trace_write_term,
 };
 
+enum netfs_failure {
+   netfs_fail_check_write_begin,
+   netfs_fail_copy_to_cache,
+   netfs_fail_read,
+   netfs_fail_short_readpage,
+   netfs_fail_short_write_begin,
+   netfs_fail_prepare_write,
+};
+
 #endif
 
 #define netfs_read_traces  \
@@ -81,6 +90,14 @@ enum netfs_sreq_trace {
EM(netfs_sreq_trace_write_skip, "SKIP ")\
E_(netfs_sreq_trace_write_term, "WTERM")
 
+#define netfs_failures \
+   EM(netfs_fail_check_write_begin,"check-write-begin")\
+   EM(netfs_fail_copy_to_cache,"copy-to-cache")\
+   EM(netfs_fail_read, "read") \
+   EM(netfs_fail_short_readpage,   "short-readpage")   \
+   EM(netfs_fail_short_write_begin,"short-write-begin")\
+   E_(netfs_fail_prepare_write,"prep-write")
+
 
 /*
  * Export enum symbols via user

[PATCH v6 17/30] afs: Disable use of the fscache I/O routines

2021-04-08 Thread David Howells
Disable use of the fscache I/O routines by the AFS filesystem.  It's about
to transition to passing iov_iters down, and fscache is about to have its
I/O path converted to use iov_iter, so all of that needs to change.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/158861209824.340223.1864211542341758994.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465768717.1376105.2229314852486665807.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588457929.3465195.1730097418904945578.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118143744.1232039.2727898205333669064.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161039077.2537118.7986870854927176905.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340403323.1303470.8159439948319423431.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539547167.286939.3536238932531122332.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653802797.2770958.547311814861545911.st...@warthog.procyon.org.uk/
 # v5
---

 fs/afs/file.c  |  199 ++--
 fs/afs/inode.c |2 -
 fs/afs/write.c |   10 ---
 3 files changed, 36 insertions(+), 175 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 960b64268623..314f6a9517c7 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -202,24 +202,6 @@ void afs_put_read(struct afs_read *req)
}
 }
 
-#ifdef CONFIG_AFS_FSCACHE
-/*
- * deal with notification that a page was read from the cache
- */
-static void afs_file_readpage_read_complete(struct page *page,
-   void *data,
-   int error)
-{
-   _enter("%p,%p,%d", page, data, error);
-
-   /* if the read completes with an error, we just unlock the page and let
-* the VM reissue the readpage */
-   if (!error)
-   SetPageUptodate(page);
-   unlock_page(page);
-}
-#endif
-
 static void afs_fetch_data_success(struct afs_operation *op)
 {
struct afs_vnode *vnode = op->file[0].vnode;
@@ -287,89 +269,46 @@ int afs_page_filler(void *data, struct page *page)
if (test_bit(AFS_VNODE_DELETED, >flags))
goto error;
 
-   /* is it cached? */
-#ifdef CONFIG_AFS_FSCACHE
-   ret = fscache_read_or_alloc_page(vnode->cache,
-page,
-afs_file_readpage_read_complete,
-NULL,
-GFP_KERNEL);
-#else
-   ret = -ENOBUFS;
-#endif
-   switch (ret) {
-   /* read BIO submitted (page in cache) */
-   case 0:
-   break;
-
-   /* page not yet cached */
-   case -ENODATA:
-   _debug("cache said ENODATA");
-   goto go_on;
-
-   /* page will not be cached */
-   case -ENOBUFS:
-   _debug("cache said ENOBUFS");
-
-   fallthrough;
-   default:
-   go_on:
-   req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
-   if (!req)
-   goto enomem;
-
-   /* We request a full page.  If the page is a partial one at the
-* end of the file, the server will return a short read and the
-* unmarshalling code will clear the unfilled space.
-*/
-   refcount_set(&req->usage, 1);
-   req->pos = (loff_t)page->index << PAGE_SHIFT;
-   req->len = PAGE_SIZE;
-   req->nr_pages = 1;
-   req->pages = req->array;
-   req->pages[0] = page;
-   get_page(page);
-
-   /* read the contents of the file from the server into the
-* page */
-   ret = afs_fetch_data(vnode, key, req);
-   afs_put_read(req);
-
-   if (ret < 0) {
-   if (ret == -ENOENT) {
-   _debug("got NOENT from server"
-  " - marking file deleted and stale");
-   set_bit(AFS_VNODE_DELETED, >flags);
-   ret = -ESTALE;
-   }
-
-#ifdef CONFIG_AFS_FSCACHE
-   fscache_uncache_page(vnode->cache, page);
-#endif
-   BUG_ON(PageFsCache(page));
-
-   if (ret == -EINTR ||
-   ret == -ENOMEM ||
-   ret == -ERESTARTSYS ||
-   ret == -EAGAIN)
-   goto error;
-   goto io_error;
-   }
+  

[PATCH v6 16/30] fscache, cachefiles: Add alternate API to use kiocb for read/write to cache

2021-04-08 Thread David Howells
Add an alternate API by which the cache can be accessed through a kiocb,
doing async DIO, rather than using the current API that tells the cache
where all the pages are.

The new API is intended to be used in conjunction with the netfs helper
library.  A filesystem must pick one or the other and not mix them.

Filesystems wanting to use the new API must #define FSCACHE_USE_NEW_IO_API
before #including the header.  This prevents them from continuing to use
the old API at the same time as there are incompatibilities in how the
PG_fscache page bit is used.
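
For example (sketch), a filesystem source file opting in would start:

	#define FSCACHE_USE_NEW_IO_API
	#include <linux/fscache.h>

after which the old page-granular helpers must not be used from that
filesystem at all.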

Changes:
v6:
 - Provide a routine to shape a write so that the start and length can be
   aligned for DIO[3].

v4:
 - Use the vfs_iocb_iter_read/write() helpers[1]
 - Move initial definition of fscache_begin_read_operation() here.
 - Remove a commented-out line[2]
 - Combine ki->term_func calls in cachefiles_read_complete()[2].
 - Remove explicit NULL initialiser[2].
 - Remove extern on func decl[2].
 - Put in param names on func decl[2].
 - Remove redundant else[2].
 - Fill out the kdoc comment for fscache_begin_read_operation().
 - Rename fs/fscache/page2.c to io.c to match later patches.

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Christoph Hellwig 
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210216102614.ga27...@lst.de/ [1]
Link: https://lore.kernel.org/r/20210216084230.ga23...@lst.de/ [2]
Link: 
https://lore.kernel.org/r/161781047695.463527.7463536103593997492.st...@warthog.procyon.org.uk/
 [3]
Link: 
https://lore.kernel.org/r/161118142558.1232039.17993829899588971439.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161037850.2537118.8819808229350326503.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340402057.1303470.8038373593844486698.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539545919.286939.14573472672781434757.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653801477.2770958.10543270629064934227.st...@warthog.procyon.org.uk/
 # v5
---

 fs/cachefiles/Makefile|1 
 fs/cachefiles/interface.c |5 
 fs/cachefiles/internal.h  |9 +
 fs/cachefiles/io.c|  420 +
 fs/fscache/Kconfig|1 
 fs/fscache/Makefile   |1 
 fs/fscache/internal.h |4 
 fs/fscache/io.c   |  116 +++
 fs/fscache/page.c |2 
 fs/fscache/stats.c|1 
 include/linux/fscache-cache.h |4 
 include/linux/fscache.h   |   39 
 12 files changed, 600 insertions(+), 3 deletions(-)
 create mode 100644 fs/cachefiles/io.c
 create mode 100644 fs/fscache/io.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 891dedda5905..2227dc2d5498 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -7,6 +7,7 @@ cachefiles-y := \
bind.o \
daemon.o \
interface.o \
+   io.o \
key.o \
main.o \
namei.o \
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 5efa6a3702c0..da3948fdb615 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -319,8 +319,8 @@ static void cachefiles_drop_object(struct fscache_object 
*_object)
 /*
  * dispose of a reference to an object
  */
-static void cachefiles_put_object(struct fscache_object *_object,
- enum fscache_obj_ref_trace why)
+void cachefiles_put_object(struct fscache_object *_object,
+  enum fscache_obj_ref_trace why)
 {
struct cachefiles_object *object;
struct fscache_cache *cache;
@@ -568,4 +568,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
.uncache_page   = cachefiles_uncache_page,
.dissociate_pages   = cachefiles_dissociate_pages,
.check_consistency  = cachefiles_check_consistency,
+   .begin_read_operation   = cachefiles_begin_read_operation,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index cf9bd6401c2d..4ed83aa5253b 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -150,6 +150,9 @@ extern int cachefiles_has_space(struct cachefiles_cache 
*cache,
  */
 extern const struct fscache_cache_ops cachefiles_cache_ops;
 
+void cachefiles_put_object(struct fscache_object *_object,
+  enum fscache_obj_ref_trace why);
+
 /*
  * key.c
  */
@@ -217,6 +220,12 @@ extern int cachefiles_allocate_pages(struct 
fscache_retrieval *,
 extern int cachefiles_write_page(struct fscache_storage *, struct page *);
 extern void cachefiles_uncache_page(struct fscache_object *, struct page *);
 
+/*
+ * rdwr2.c
+ */
+extern int cachefiles_begin_read_operation(str

[PATCH v6 14/30] netfs: Define an interface to talk to a cache

2021-04-08 Thread David Howells
Add an interface to the netfs helper library for reading data from the
cache instead of downloading it from the server and support for writing
data just downloaded or cleared to the cache.

The API passes an iov_iter to the cache read/write routines to indicate the
data/buffer to be used.  This is done using the ITER_XARRAY type to provide
direct access to the netfs inode's pagecache.

When the netfs's ->begin_cache_operation() method is called, this must fill
in the cache_resources in the netfs_read_request struct, including the
netfs_cache_ops used by the helper lib to talk to the cache.  The helper
lib does not directly access the cache.
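
A minimal sketch of such a method (the my_*() names are hypothetical;
fscache_begin_read_operation() is added by a later patch in this series):

	static int my_begin_cache_operation(struct netfs_read_request *rreq)
	{
		struct fscache_cookie *cookie = my_inode_cookie(rreq->inode);

		return fscache_begin_read_operation(rreq, cookie);
	}

Thereafter the helper lib performs all cache I/O through the
rreq->cache_resources.ops table that this fills in.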

Changes:
v6:
- Call trace_netfs_read() after beginning the cache op so that the cookie
  debug ID can be logged[3].
- Don't record the error from writing to the cache.  We don't want to pass
  it back to the netfs[4].
- Fix copy-to-cache subreq amalgamation to not round up as it goes along,
  otherwise it overcalculates the length of the write[5].

v5:
- Use end_page_fscache() rather than unlock_page_fscache()[2].

v4:
- Added flag to netfs_subreq_terminated() to indicate that the caller may
  have been running async and stuff that might sleep needs punting to a
  workqueue (can't use in_softirq()[1]).
- Add missing inc of netfs_n_rh_read stat.
- Move initial definition of fscache_begin_read_operation() elsewhere.
- Need to call op->begin_cache_operation() from netfs_write_begin().

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210216084230.ga23...@lst.de/ [1]
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk/ [2]
Link: 
https://lore.kernel.org/r/161781045123.463527.14533348855710902201.st...@warthog.procyon.org.uk/
 [3]
Link: 
https://lore.kernel.org/r/161781046256.463527.18158681600085556192.st...@warthog.procyon.org.uk/
 [4]
Link: 
https://lore.kernel.org/r/161781047695.463527.7463536103593997492.st...@warthog.procyon.org.uk/
 [5]
Link: 
https://lore.kernel.org/r/161118141321.1232039.8296910406755622458.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161036700.2537118.11170748455436854978.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340399569.1303470.1138884774643385730.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539542874.286939.13337898213448136687.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653799826.2770958.9015430297426331950.st...@warthog.procyon.org.uk/
 # v5
---

 fs/netfs/read_helper.c   |  239 ++
 include/linux/netfs.h|   55 ++
 include/trace/events/netfs.h |2 
 3 files changed, 295 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index da34aedea053..cd3b61d5e192 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -88,6 +88,8 @@ static void netfs_free_read_request(struct work_struct *work)
if (rreq->netfs_priv)
rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
trace_netfs_rreq(rreq, netfs_rreq_trace_free);
+   if (rreq->cache_resources.ops)
+   rreq->cache_resources.ops->end_operation(&rreq->cache_resources);
kfree(rreq);
netfs_stat_d(_n_rh_rreq);
 }
@@ -154,6 +156,34 @@ static void netfs_clear_unread(struct 
netfs_read_subrequest *subreq)
	iov_iter_zero(iov_iter_count(&iter), &iter);
 }
 
+static void netfs_cache_read_terminated(void *priv, ssize_t 
transferred_or_error,
+   bool was_async)
+{
+   struct netfs_read_subrequest *subreq = priv;
+
+   netfs_subreq_terminated(subreq, transferred_or_error, was_async);
+}
+
+/*
+ * Issue a read against the cache.
+ * - Eats the caller's ref on subreq.
+ */
+static void netfs_read_from_cache(struct netfs_read_request *rreq,
+ struct netfs_read_subrequest *subreq,
+ bool seek_data)
+{
+   struct netfs_cache_resources *cres = &rreq->cache_resources;
+   struct iov_iter iter;
+
+   netfs_stat(&netfs_n_rh_read);
+   iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages,
+   subreq->start + subreq->transferred,
+   subreq->len   - subreq->transferred);
+
+   cres->ops->read(cres, subreq->start, &iter, seek_data,
+   netfs_cache_read_terminated, subreq);
+}
+
 /*
  * Fill a subrequest region with zeroes.
  */
@@ -198,6 +228,141 @@ static void netfs_rreq_completed(struct 
netfs_read_request *rreq, bool was_async
netfs_put_read_request(rreq, was_async);
 }
 
+/*
+ * Deal

Re: [PATCH] afs: fix no return statement in function returning non-void

2021-04-08 Thread David Howells
Zheng Zengkai  wrote:

>  static int afs_dir_set_page_dirty(struct page *page)
>  {
>   BUG(); /* This should never happen. */
> + return 0;
>  }

That shouldn't be necessary.  BUG() should be marked as 'no return' to the
compiler.  What arch and compiler are you using?

David



[PATCH v6 13/30] netfs: Add write_begin helper

2021-04-08 Thread David Howells
Add a helper to do the pre-reading work for the netfs write_begin address
space op.

Changes
v6:
- Fixed a missing rreq put in netfs_write_begin()[3].
- Use DEFINE_READAHEAD()[4].

v5:
- Made the wait for PG_fscache in netfs_write_begin() killable[2].

v4:
- Added flag to netfs_subreq_terminated() to indicate that the caller may
  have been running async and stuff that might sleep needs punting to a
  workqueue (can't use in_softirq()[1]).

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210216084230.ga23...@lst.de/ [1]
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk/ [2]
Link: 
https://lore.kernel.org/r/161781042127.463527.9154479794406046987.st...@warthog.procyon.org.uk/
 [3]
Link: https://lore.kernel.org/r/1234933.1617886...@warthog.procyon.org.uk/ [4]
Link: 
https://lore.kernel.org/r/160588543960.3465195.2792938973035886168.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118140165.1232039.16418853874312234477.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161035539.2537118.15674887534950908530.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340398368.1303470.11242918276563276090.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539541541.286939.1889738674057013729.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653798616.2770958.17213315845968485563.st...@warthog.procyon.org.uk/
 # v5
---

 fs/netfs/internal.h  |2 +
 fs/netfs/read_helper.c   |  164 ++
 fs/netfs/stats.c |   11 ++-
 include/linux/netfs.h|8 ++
 include/trace/events/netfs.h |4 +
 5 files changed, 185 insertions(+), 4 deletions(-)

diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 98b6f4516da1..b7f2c4459f33 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -34,8 +34,10 @@ extern atomic_t netfs_n_rh_read_failed;
 extern atomic_t netfs_n_rh_zero;
 extern atomic_t netfs_n_rh_short_read;
 extern atomic_t netfs_n_rh_write;
+extern atomic_t netfs_n_rh_write_begin;
 extern atomic_t netfs_n_rh_write_done;
 extern atomic_t netfs_n_rh_write_failed;
+extern atomic_t netfs_n_rh_write_zskip;
 
 
 static inline void netfs_stat(atomic_t *stat)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 6d6ed30f417e..da34aedea053 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -772,3 +772,167 @@ int netfs_readpage(struct file *file,
return ret;
 }
 EXPORT_SYMBOL(netfs_readpage);
+
+static void netfs_clear_thp(struct page *page)
+{
+   unsigned int i;
+
+   for (i = 0; i < thp_nr_pages(page); i++)
+   clear_highpage(page + i);
+}
+
+/**
+ * netfs_write_begin - Helper to prepare for writing
+ * @file: The file to read from
+ * @mapping: The mapping to read from
+ * @pos: File position at which the write will begin
+ * @len: The length of the write in this page
+ * @flags: AOP_* flags
+ * @_page: Where to put the resultant page
+ * @_fsdata: Place for the netfs to store a cookie
+ * @ops: The network filesystem's operations for the helper to use
+ * @netfs_priv: Private netfs data to be retained in the request
+ *
+ * Pre-read data for a write-begin request by drawing data from the cache if
+ * possible, or the netfs if not.  Space beyond the EOF is zero-filled.
+ * Multiple I/O requests from different sources will get munged together.  If
+ * necessary, the readahead window can be expanded in either direction to a
+ * more convenient alignment for RPC efficiency or to make storage in the cache
+ * feasible.
+ *
+ * The calling netfs must provide a table of operations, only one of which,
+ * issue_op, is mandatory.
+ *
+ * The check_write_begin() operation can be provided to check for and flush
+ * conflicting writes once the page is grabbed and locked.  It is passed a
+ * pointer to the fsdata cookie that gets returned to the VM to be passed to
+ * write_end.  It is permitted to sleep.  It should return 0 if the request
+ * should go ahead; unlock the page and return -EAGAIN to cause the page to be
+ * regot; or return an error.
+ *
+ * This is usable whether or not caching is enabled.
+ */
+int netfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned int len, unsigned int flags,
+ struct page **_page, void **_fsdata,
+ const struct netfs_read_request_ops *ops,
+ void *netfs_priv)
+{
+   struct netfs_read_request *rreq;
+   struct page *page, *xpage;
+   struct inode *inode = file_inode(file);
+   unsigned 

[PATCH v6 12/30] netfs: Gather stats

2021-04-08 Thread David Howells
Gather statistics from the netfs interface that can be exported through a
seqfile.  This is intended to be called by a later patch when viewing
/proc/fs/fscache/stats.
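
The eventual export is then just a seq_file show routine over these
counters, along the lines of (sketch only; names illustrative):

	int netfs_stats_show(struct seq_file *m, void *v)
	{
		seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n",
			   atomic_read(&netfs_n_rh_readahead),
			   atomic_read(&netfs_n_rh_readpage),
			   atomic_read(&netfs_n_rh_rreq),
			   atomic_read(&netfs_n_rh_sreq));
		return 0;
	}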

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/161118139247.1232039.10556850937548511068.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161034669.2537118.2761232524997091480.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340397101.1303470.17581910581108378458.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539539959.286939.6794352576462965914.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653797700.2770958.5801990354413178228.st...@warthog.procyon.org.uk/
 # v5
---

 fs/netfs/Kconfig   |   15 +
 fs/netfs/Makefile  |3 +--
 fs/netfs/internal.h|   34 ++
 fs/netfs/read_helper.c |   23 
 fs/netfs/stats.c   |   54 
 include/linux/netfs.h  |1 +
 6 files changed, 128 insertions(+), 2 deletions(-)
 create mode 100644 fs/netfs/stats.c

diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig
index 2ebf90e6ca95..578112713703 100644
--- a/fs/netfs/Kconfig
+++ b/fs/netfs/Kconfig
@@ -6,3 +6,18 @@ config NETFS_SUPPORT
  This option enables support for network filesystems, including
  helpers for high-level buffered I/O, abstracting out read
  segmentation, local caching and transparent huge page support.
+
+config NETFS_STATS
+   bool "Gather statistical information on local caching"
+   depends on NETFS_SUPPORT && PROC_FS
+   help
+ This option causes statistical information to be gathered on local
+ caching and exported through file:
+
+   /proc/fs/fscache/stats
+
+ The gathering of statistics adds a certain amount of overhead to
+ execution as there are quite a few stats gathered, and on a
+ multi-CPU system these may be on cachelines that keep bouncing
+ between CPUs.  On the other hand, the stats are very useful for
+ debugging purposes.  Saying 'Y' here is recommended.
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 4b4eff2ba369..c15bfc966d96 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -1,6 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
-netfs-y := \
-   read_helper.o
+netfs-y := read_helper.o stats.o
 
 obj-$(CONFIG_NETFS_SUPPORT) := netfs.o
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index ee665c0e7dc8..98b6f4516da1 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -16,8 +16,42 @@
  */
 extern unsigned int netfs_debug;
 
+/*
+ * stats.c
+ */
+#ifdef CONFIG_NETFS_STATS
+extern atomic_t netfs_n_rh_readahead;
+extern atomic_t netfs_n_rh_readpage;
+extern atomic_t netfs_n_rh_rreq;
+extern atomic_t netfs_n_rh_sreq;
+extern atomic_t netfs_n_rh_download;
+extern atomic_t netfs_n_rh_download_done;
+extern atomic_t netfs_n_rh_download_failed;
+extern atomic_t netfs_n_rh_download_instead;
+extern atomic_t netfs_n_rh_read;
+extern atomic_t netfs_n_rh_read_done;
+extern atomic_t netfs_n_rh_read_failed;
+extern atomic_t netfs_n_rh_zero;
+extern atomic_t netfs_n_rh_short_read;
+extern atomic_t netfs_n_rh_write;
+extern atomic_t netfs_n_rh_write_done;
+extern atomic_t netfs_n_rh_write_failed;
+
+
+static inline void netfs_stat(atomic_t *stat)
+{
+   atomic_inc(stat);
+}
+
+static inline void netfs_stat_d(atomic_t *stat)
+{
+   atomic_dec(stat);
+}
+
+#else
 #define netfs_stat(x) do {} while(0)
 #define netfs_stat_d(x) do {} while(0)
+#endif
 
 /*/
 /*
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 799eee7f4ee6..6d6ed30f417e 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -56,6 +56,7 @@ static struct netfs_read_request *netfs_alloc_read_request(
	refcount_set(&rreq->usage, 1);
	__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
	ops->init_rreq(rreq, file);
+	netfs_stat(&netfs_n_rh_rreq);
}
 
return rreq;
@@ -88,6 +89,7 @@ static void netfs_free_read_request(struct work_struct *work)
rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
trace_netfs_rreq(rreq, netfs_rreq_trace_free);
kfree(rreq);
+	netfs_stat_d(&netfs_n_rh_rreq);
 }
 
 static void netfs_put_read_request(struct netfs_read_request *rreq, bool 
was_async)
@@ -117,6 +119,7 @@ static struct netfs_read_subrequest *netfs_alloc_subrequest(
refcount_set(&

[PATCH v6 11/30] netfs: Add tracepoints

2021-04-08 Thread David Howells
Add three tracepoints to track the activity of the read helpers:

 (1) netfs/netfs_read

 This logs entry to the read helpers and also expansion of the range in
 a readahead request.

 (2) netfs/netfs_rreq

 This logs the progress of netfs_read_request objects which track
 read requests.  A read request may be a compound of multiple
 subrequests.

 (3) netfs/netfs_sreq

 This logs the progress of netfs_read_subrequest objects, which track
 the contributions from various sources to a read request.
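
The enum values used in these tracepoints are mapped to strings for the
trace output with the usual EM()/E_() macro pair; condensed, the pattern
looks like this (a sketch of the approach, not the full header):

	#define EM(a, b) TRACE_DEFINE_ENUM(a);
	#define E_(a, b) TRACE_DEFINE_ENUM(a);
	netfs_rreq_traces;	/* export the enum symbols to userspace */
	#undef EM
	#undef E_
	#define EM(a, b) { a, b },
	#define E_(a, b) { a, b }

	/* ...which TP_printk() can then consume: */
	__print_symbolic(__entry->what, netfs_rreq_traces)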

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/161118138060.1232039.5353374588021776217.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161033468.2537118.14021843889844001905.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340395843.1303470.7355519662919639648.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539538693.286939.10171713520419106334.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653796447.2770958.1870655382450862155.st...@warthog.procyon.org.uk/
 # v5
---

 fs/netfs/read_helper.c   |   26 +
 include/linux/netfs.h|1 
 include/trace/events/netfs.h |  199 ++
 3 files changed, 226 insertions(+)
 create mode 100644 include/trace/events/netfs.h

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 30d4bf6bf28a..799eee7f4ee6 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 #include "internal.h"
+#define CREATE_TRACE_POINTS
+#include 
 
 MODULE_DESCRIPTION("Network fs support");
 MODULE_AUTHOR("Red Hat, Inc.");
@@ -84,6 +86,7 @@ static void netfs_free_read_request(struct work_struct *work)
netfs_rreq_clear_subreqs(rreq, false);
if (rreq->netfs_priv)
rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
+   trace_netfs_rreq(rreq, netfs_rreq_trace_free);
kfree(rreq);
 }
 
@@ -129,6 +132,7 @@ static void __netfs_put_subrequest(struct 
netfs_read_subrequest *subreq,
 {
struct netfs_read_request *rreq = subreq->rreq;
 
+   trace_netfs_sreq(subreq, netfs_sreq_trace_free);
kfree(subreq);
netfs_put_read_request(rreq, was_async);
 }
@@ -183,6 +187,7 @@ static void netfs_read_from_server(struct 
netfs_read_request *rreq,
  */
 static void netfs_rreq_completed(struct netfs_read_request *rreq, bool 
was_async)
 {
+   trace_netfs_rreq(rreq, netfs_rreq_trace_done);
netfs_rreq_clear_subreqs(rreq, was_async);
netfs_put_read_request(rreq, was_async);
 }
@@ -221,6 +226,8 @@ static void netfs_rreq_unlock(struct netfs_read_request 
*rreq)
iopos = 0;
subreq_failed = (subreq->error < 0);
 
+   trace_netfs_rreq(rreq, netfs_rreq_trace_unlock);
+
rcu_read_lock();
xas_for_each(, page, last_page) {
unsigned int pgpos = (page->index - start_page) * PAGE_SIZE;
@@ -281,6 +288,8 @@ static void netfs_rreq_short_read(struct netfs_read_request 
*rreq,
	__clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags);
	__set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags);
 
+   trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short);
+
netfs_get_read_subrequest(subreq);
	atomic_inc(&rreq->nr_rd_ops);
netfs_read_from_server(rreq, subreq);
@@ -296,6 +305,8 @@ static bool netfs_rreq_perform_resubmissions(struct 
netfs_read_request *rreq)
 
WARN_ON(in_interrupt());
 
+   trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit);
+
/* We don't want terminating submissions trying to wake us up whilst
 * we're still going through the list.
 */
@@ -308,6 +319,7 @@ static bool netfs_rreq_perform_resubmissions(struct 
netfs_read_request *rreq)
break;
subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
subreq->error = 0;
+   trace_netfs_sreq(subreq, 
netfs_sreq_trace_download_instead);
netfs_get_read_subrequest(subreq);
	atomic_inc(&rreq->nr_rd_ops);
netfs_read_from_server(rreq, subreq);
@@ -332,6 +344,8 @@ static bool netfs_rreq_perform_resubmissions(struct 
netfs_read_request *rreq)
  */
 static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async)
 {
+   trace_netfs_rreq(rreq, netfs_rreq_trace_assess);
+
 again:
	if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) &&
	    test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) {
@@ -422,6 +436,8 @@ vo

[PATCH v6 10/30] netfs: Provide readahead and readpage netfs helpers

2021-04-08 Thread David Howells
Add a pair of helper functions:

 (*) netfs_readahead()
 (*) netfs_readpage()

to do the work of handling a readahead or a readpage, where the page(s)
that form part of the request may be split between the local cache, the
server or just require clearing, and may be single pages and transparent
huge pages.  This is all handled within the helper.

Note that while both will read from the cache if there is data present,
only netfs_readahead() will expand the request beyond what it was asked to
do, and only netfs_readahead() will write back to the cache.

netfs_readpage(), on the other hand, is synchronous and only fetches the
page (which might be a THP) it is asked for.

The netfs gives the helper parameters from the VM, the cache cookie it
wants to use (or NULL) and a table of operations (only one of which is
mandatory; a minimal sketch of such a table follows the list below):

 (*) expand_readahead() [optional]

 Called to allow the netfs to request an expansion of a readahead
 request to meet its own alignment requirements.  This is done by
 changing rreq->start and rreq->len.

 (*) clamp_length() [optional]

 Called to allow the netfs to cut down a subrequest to meet its own
 boundary requirements.  If it does this, the helper will generate
 additional subrequests until the full request is satisfied.

 (*) is_still_valid() [optional]

 Called to find out if the data just read from the cache has been
 invalidated and must be reread from the server.

 (*) issue_op() [required]

 Called to ask the netfs to issue a read to the server.  The subrequest
 describes the read.  The read request holds information about the file
 being accessed.

 The netfs can cache information in rreq->netfs_priv.

 Upon completion, the netfs should set the error, transferred and can
 also set FSCACHE_SREQ_CLEAR_TAIL and then call
 fscache_subreq_terminated().

 (*) done() [optional]

 Called after the pages have been unlocked.  The read request is still
 pinning the file and mapping and may still be pinning pages with
 PG_fscache.  rreq->error indicates any error that has been
 accumulated.

 (*) cleanup() [optional]

 Called when the helper is disposing of a finished read request.  This
 allows the netfs to clear rreq->netfs_priv.

Netfs support is enabled with CONFIG_NETFS_SUPPORT=y.  It will be built
even if CONFIG_FSCACHE=n and in this case much of it should be optimised
away, allowing the filesystem to use it even when caching is disabled.
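
As an illustration, a minimal ops table might look like this (a sketch:
init_rreq is included because the allocator calls it unconditionally, and
example_fetch_from_server() is a hypothetical function standing in for the
filesystem's RPC machinery):

	static void example_init_rreq(struct netfs_read_request *rreq,
				      struct file *file)
	{
	}

	static void example_issue_op(struct netfs_read_subrequest *subreq)
	{
		/* Read subreq->len bytes at subreq->start from the server,
		 * then report bytes transferred or a -ve error code.
		 */
		ssize_t ret = example_fetch_from_server(subreq);

		netfs_subreq_terminated(subreq, ret, false);
	}

	static const struct netfs_read_request_ops example_req_ops = {
		.init_rreq	= example_init_rreq,
		.issue_op	= example_issue_op,
	};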

Changes:
v5:
 - Comment why netfs_readahead() is putting pages[2].
 - Use page_file_mapping() rather than page->mapping[2].
 - Use page_index() rather than page->index[2].
 - Use set_page_fscache()[3] rather then SetPageFsCache() as this takes an
   appropriate ref too[4].

v4:
 - Folded in a kerneldoc comment fix.
 - Folded in a fix for the error handling in the case that ENOMEM occurs.
 - Added flag to netfs_subreq_terminated() to indicate that the caller may
   have been running async and stuff that might sleep needs punting to a
   workqueue (can't use in_softirq()[1]).

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210216084230.ga23...@lst.de/ [1]
Link: https://lore.kernel.org/r/20210321014202.gf3...@casper.infradead.org/ [2]
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk/ [3]
Link: 
https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pum...@mail.gmail.com/
 [4]
Link: 
https://lore.kernel.org/r/160588497406.3465195.18003475695899726222.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118136849.1232039.8923686136144228724.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161032290.2537118.13400578415247339173.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340394873.1303470.6237319335883242536.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539537375.286939.16642940088716990995.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653795430.2770958.494758457372554.st...@warthog.procyon.org.uk/
 # v5
---

 fs/Kconfig |1 
 fs/Makefile|1 
 fs/netfs/Makefile  |6 
 fs/netfs/internal.h|   61 
 fs/netfs/read_helper.c |  725 
 include/linux/netfs.h  |   83 +
 6 files changed, 877 insertions(+)
 create mode 100644 fs/netfs/Makefile
 create mode 100644 fs/netfs/internal.h
 create mode 100644 fs/netfs/read_helper.c

diff --git a/fs/Kconfig b/fs/Kconfig
index a55bda4233bb..97e7b77c9309 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -125,6 +125,7 @@ source "fs/overlayfs/Kconfig&q

[PATCH v6 09/30] netfs, mm: Add set/end/wait_on_page_fscache() aliases

2021-04-08 Thread David Howells
Add set/end/wait_on_page_fscache() as aliases of
set/end/wait_page_private_2().  These allow a page to marked with
PG_fscache, the flag to be removed and waiters woken and waiting for the
flag to be cleared.  A ref on the page is also taken and dropped.
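
As an example of intended use, a netfs about to modify a page would first
serialise against any in-flight write of that page to the cache (a sketch;
the function name is hypothetical):

	static int example_flush_conflicting_write(struct page *page)
	{
		if (PageFsCache(page))
			return wait_on_page_fscache_killable(page);
		return 0;
	}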

[Linus suggested putting the fscache-themed functions into the
 caching-specific headers rather than pagemap.h[1]]

Changes:
v5:
- Mirror the changes to the core routines[2].

Signed-off-by: David Howells 
cc: Linus Torvalds 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/1330473.1612974...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=xaemtkswpu1yw4uaro1r3ibs27...@mail.gmail.com/
 [1]
Link: 
https://lore.kernel.org/r/161340393568.1303470.4997526899111310530.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539536093.286939.5076448803512118764.st...@warthog.procyon.org.uk/
 # v4
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk/ [2]
Link: 
https://lore.kernel.org/r/161653793873.2770958.12157243390965814502.st...@warthog.procyon.org.uk/
 # v5
---

 include/linux/netfs.h |   57 +
 1 file changed, 57 insertions(+)

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index cc1102040488..8479d63406f7 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -26,4 +26,61 @@
 #define TestSetPageFsCache(page)   TestSetPagePrivate2((page))
 #define TestClearPageFsCache(page) TestClearPagePrivate2((page))
 
+/**
+ * set_page_fscache - Set PG_fscache on a page and take a ref
+ * @page: The page.
+ *
+ * Set the PG_fscache (PG_private_2) flag on a page and take the reference
+ * needed for the VM to handle its lifetime correctly.  This sets the flag and
+ * takes the reference unconditionally, so care must be taken not to set the
+ * flag again if it's already set.
+ */
+static inline void set_page_fscache(struct page *page)
+{
+   set_page_private_2(page);
+}
+
+/**
+ * end_page_fscache - Clear PG_fscache and release any waiters
+ * @page: The page
+ *
+ * Clear the PG_fscache (PG_private_2) bit on a page and wake up any sleepers
+ * waiting for this.  The page ref held for PG_private_2 being set is released.
+ *
+ * This is, for example, used when a netfs page is being written to a local
+ * disk cache, thereby allowing writes to the cache for the same page to be
+ * serialised.
+ */
+static inline void end_page_fscache(struct page *page)
+{
+   end_page_private_2(page);
+}
+
+/**
+ * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
+ * @page: The page to wait on
+ *
+ * Wait for PG_fscache (aka PG_private_2) to be cleared on a page.
+ */
+static inline void wait_on_page_fscache(struct page *page)
+{
+   wait_on_page_private_2(page);
+}
+
+/**
+ * wait_on_page_fscache_killable - Wait for PG_fscache to be cleared on a page
+ * @page: The page to wait on
+ *
+ * Wait for PG_fscache (aka PG_private_2) to be cleared on a page or until a
+ * fatal signal is received by the calling task.
+ *
+ * Return:
+ * - 0 if successful.
+ * - -EINTR if a fatal signal was encountered.
+ */
+static inline int wait_on_page_fscache_killable(struct page *page)
+{
+   return wait_on_page_private_2_killable(page);
+}
+
 #endif /* _LINUX_NETFS_H */




[PATCH v6 08/30] netfs, mm: Move PG_fscache helper funcs to linux/netfs.h

2021-04-08 Thread David Howells
Move the PG_fscache related helper funcs (such as SetPageFsCache()) to
linux/netfs.h rather than linux/fscache.h as the intention is to move to a
model where they're used by the network filesystem and the helper library,
but not by fscache/cachefiles itself.

Signed-off-by: David Howells 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/161340392347.1303470.18065131603507621762.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539534516.286939.6265142985563005000.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653792959.2770958.5386546945273988117.st...@warthog.procyon.org.uk/
 # v5
---

 include/linux/fscache.h |   11 +--
 include/linux/netfs.h   |   29 +
 2 files changed, 30 insertions(+), 10 deletions(-)
 create mode 100644 include/linux/netfs.h

diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index a1c928fe98e7..1f8dc72369ee 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
 #define fscache_available() (1)
@@ -29,16 +30,6 @@
 #endif
 
 
-/*
- * overload PG_private_2 to give us PG_fscache - this is used to indicate that
- * a page is currently backed by a local disk cache
- */
-#define PageFsCache(page)  PagePrivate2((page))
-#define SetPageFsCache(page)   SetPagePrivate2((page))
-#define ClearPageFsCache(page) ClearPagePrivate2((page))
-#define TestSetPageFsCache(page)   TestSetPagePrivate2((page))
-#define TestClearPageFsCache(page) TestClearPagePrivate2((page))
-
 /* pattern used to fill dead space in an index entry */
 #define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
 
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
new file mode 100644
index ..cc1102040488
--- /dev/null
+++ b/include/linux/netfs.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Network filesystem support services.
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * See:
+ *
+ * Documentation/filesystems/netfs_library.rst
+ *
+ * for a description of the network filesystem interface declared here.
+ */
+
+#ifndef _LINUX_NETFS_H
+#define _LINUX_NETFS_H
+
+#include 
+
+/*
+ * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
+ * a page is currently backed by a local disk cache
+ */
+#define PageFsCache(page)  PagePrivate2((page))
+#define SetPageFsCache(page)   SetPagePrivate2((page))
+#define ClearPageFsCache(page) ClearPagePrivate2((page))
+#define TestSetPageFsCache(page)   TestSetPagePrivate2((page))
+#define TestClearPageFsCache(page) TestClearPagePrivate2((page))
+
+#endif /* _LINUX_NETFS_H */




[PATCH v6 07/30] netfs: Documentation for helper library

2021-04-08 Thread David Howells
Add interface documentation for the netfs helper library.

Signed-off-by: David Howells 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/161539533275.286939.6246011228676840978.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653791767.2770958.2012814194145060913.st...@warthog.procyon.org.uk/
 # v5
---

 Documentation/filesystems/index.rst |1 
 Documentation/filesystems/netfs_library.rst |  526 +++
 2 files changed, 527 insertions(+)
 create mode 100644 Documentation/filesystems/netfs_library.rst

diff --git a/Documentation/filesystems/index.rst 
b/Documentation/filesystems/index.rst
index 1f76b1cb3348..d4853cb919d2 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -53,6 +53,7 @@ filesystem implementations.
journalling
fscrypt
fsverity
+   netfs_library
 
 Filesystems
 ===
diff --git a/Documentation/filesystems/netfs_library.rst 
b/Documentation/filesystems/netfs_library.rst
new file mode 100644
index ..57a641847818
--- /dev/null
+++ b/Documentation/filesystems/netfs_library.rst
@@ -0,0 +1,526 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=
+NETWORK FILESYSTEM HELPER LIBRARY
+=
+
+.. Contents:
+
+ - Overview.
+ - Buffered read helpers.
+   - Read helper functions.
+   - Read helper structures.
+   - Read helper operations.
+   - Read helper procedure.
+   - Read helper cache API.
+
+
+Overview
+
+
+The network filesystem helper library is a set of functions designed to aid a
+network filesystem in implementing VM/VFS operations.  For the moment, that
+just includes turning various VM buffered read operations into requests to read
+from the server.  The helper library, however, can also interpose other
+services, such as local caching or local data encryption.
+
+Note that the library module doesn't link against local caching directly, so
+access must be provided by the netfs.
+
+
+Buffered Read Helpers
+=
+
+The library provides a set of read helpers that handle the ->readpage(),
+->readahead() and much of the ->write_begin() VM operations and translate them
+into a common call framework.
+
+The following services are provided:
+
+ * Handles transparent huge pages (THPs).
+
+ * Insulates the netfs from VM interface changes.
+
+ * Allows the netfs to arbitrarily split reads up into pieces, even ones that
+   don't match page sizes or page alignments and that may cross pages.
+
+ * Allows the netfs to expand a readahead request in both directions to meet
+   its needs.
+
+ * Allows the netfs to partially fulfil a read, which will then be resubmitted.
+
+ * Handles local caching, allowing cached data and server-read data to be
+   interleaved for a single request.
+
+ * Handles clearing of buffered regions that aren't on the server.
+
+ * Handles retrying of reads that failed, switching reads from the cache to the
+   server as necessary.
+
+ * In the future, this is a place that other services can be performed, such as
+   local encryption of data to be stored remotely or in the cache.
+
+From the network filesystem, the helpers require a table of operations.  This
+includes a mandatory method to issue a read operation along with a number of
+optional methods.
+
+
+Read Helper Functions
+-
+
+Three read helpers are provided::
+
+ * void netfs_readahead(struct readahead_control *ractl,
+   const struct netfs_read_request_ops *ops,
+   void *netfs_priv);
+ * int netfs_readpage(struct file *file,
+ struct page *page,
+ const struct netfs_read_request_ops *ops,
+ void *netfs_priv);
+ * int netfs_write_begin(struct file *file,
+struct address_space *mapping,
+loff_t pos,
+unsigned int len,
+unsigned int flags,
+struct page **_page,
+void **_fsdata,
+const struct netfs_read_request_ops *ops,
+void *netfs_priv);
+
+Each corresponds to a VM operation, with the addition of a couple of parameters
+for the use of the read helpers:
+
+ * ``ops``
+
+   A table of operations through which the helpers can talk to the filesystem.
+
+ * ``netfs_priv``
+
+   Filesystem private data (can be NULL).
+
+Both of these values will be stored into the read request structure.
+
+For ->readahead() and ->readpage(), the network filesystem should just jump
+into the corresponding read helper; whereas for ->write_begin(), it may be a
+little more complicated as the 
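
For ->readpage(), for instance, that jump can be as simple as (a sketch,
with example_req_ops standing in for the filesystem's real ops table):

	static int example_readpage(struct file *file, struct page *page)
	{
		return netfs_readpage(file, page, &example_req_ops, NULL);
	}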

[PATCH v6 06/30] netfs: Make a netfs helper module

2021-04-08 Thread David Howells
Make a netfs helper module to manage read request segmentation, caching
support and transparent huge page support on behalf of a network
filesystem.

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588496284.3465195.10102643717770106661.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118135638.1232039.1622182202673126285.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161031028.2537118.1213974428943508753.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340391427.1303470.14884950716721956560.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539531569.286939.18317119181653706665.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653790328.2770958.6710423217716151549.st...@warthog.procyon.org.uk/
 # v5
---

 fs/netfs/Kconfig |8 
 1 file changed, 8 insertions(+)
 create mode 100644 fs/netfs/Kconfig

diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig
new file mode 100644
index ..2ebf90e6ca95
--- /dev/null
+++ b/fs/netfs/Kconfig
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config NETFS_SUPPORT
+   tristate "Support for network filesystem high-level I/O"
+   help
+ This option enables support for network filesystems, including
+ helpers for high-level buffered I/O, abstracting out read
+ segmentation, local caching and transparent huge page support.




[PATCH v6 04/30] fs: Document file_ra_state

2021-04-08 Thread David Howells
From: Matthew Wilcox (Oracle) 

Turn the comments into kernel-doc and improve the wording slightly.

Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: David Howells 
Link: https://lore.kernel.org/r/20210407201857.3582797-3-wi...@infradead.org/
---

 include/linux/fs.h |   24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index ec8f3ddf4a6a..33831a8bda52 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -891,18 +891,22 @@ struct fown_struct {
int signum; /* posix.1b rt signal to be delivered on IO */
 };
 
-/*
- * Track a single file's readahead state
+/**
+ * struct file_ra_state - Track a file's readahead state.
+ * @start: Where the most recent readahead started.
+ * @size: Number of pages read in the most recent readahead.
+ * @async_size: Start next readahead when this many pages are left.
+ * @ra_pages: Maximum size of a readahead request.
+ * @mmap_miss: How many mmap accesses missed in the page cache.
+ * @prev_pos: The last byte in the most recent read request.
  */
 struct file_ra_state {
-   pgoff_t start;  /* where readahead started */
-   unsigned int size;  /* # of readahead pages */
-   unsigned int async_size;/* do asynchronous readahead when
-  there are only # of pages ahead */
-
-   unsigned int ra_pages;  /* Maximum readahead window */
-   unsigned int mmap_miss; /* Cache miss stat for mmap accesses */
-   loff_t prev_pos;/* Cache last read() position */
+   pgoff_t start;
+   unsigned int size;
+   unsigned int async_size;
+   unsigned int ra_pages;
+   unsigned int mmap_miss;
+   loff_t prev_pos;
 };
 
 /*




[PATCH v6 05/30] mm: Implement readahead_control pageset expansion

2021-04-08 Thread David Howells
Provide a function, readahead_expand(), that expands the set of pages
specified by a readahead_control object to encompass a revised area with a
proposed start and length.

The proposed area must include all of the old area and may be expanded yet
more by this function so that the edges align on (transparent huge) page
boundaries as allocated.

The expansion will be cut short if a page already exists in either of the
areas being expanded into.  Note that any expansion made in such a case is
not rolled back.

This will be used by fscache so that reads can be expanded to cache granule
boundaries, thereby allowing whole granules to be stored in the cache, but
there are other potential users also.
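
For instance, a cache that stores data in 256KiB granules (the granule size
here is only an assumption for illustration) might round a readahead out
like this before submitting it:

	loff_t start = round_down(readahead_pos(ractl), 256 * 1024);
	size_t len = round_up(readahead_pos(ractl) + readahead_length(ractl),
			      256 * 1024) - start;

	readahead_expand(ractl, start, len);
	/* The expansion may be cut short by an already-present page, so
	 * recheck the ractl rather than assuming the full new window.
	 */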

Changes:
v6:
- Fold in a patch from Matthew Wilcox to tell the ondemand readahead
  algorithm about the expansion so that the next readahead starts at the
  right place[2].

v4:
- Moved the declaration of readahead_expand() to a better place[1].

Suggested-by: Matthew Wilcox (Oracle) 
Signed-off-by: David Howells 
cc: Matthew Wilcox (Oracle) 
cc: Alexander Viro 
cc: Christoph Hellwig 
cc: Mike Marshall 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210217161358.gm2858...@casper.infradead.org/ 
[1]
Link: 
https://lore.kernel.org/r/159974633888.2094769.8326206446358128373.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588479816.3465195.553952688795241765.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118131787.1232039.4863969952441067985.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161028670.2537118.13831420617039766044.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340389201.1303470.14353807284546854878.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539530488.286939.18085961677838089157.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653789422.2770958.2108046612147345000.st...@warthog.procyon.org.uk/
 # v5
Link: https://lore.kernel.org/r/20210407201857.3582797-4-wi...@infradead.org/ 
[2]
---

 include/linux/pagemap.h |2 +
 mm/readahead.c  |   75 +++
 2 files changed, 77 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 9a9e558ce4c7..ef511364cc0c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -838,6 +838,8 @@ void page_cache_ra_unbounded(struct readahead_control *,
 void page_cache_sync_ra(struct readahead_control *, unsigned long req_count);
 void page_cache_async_ra(struct readahead_control *, struct page *,
unsigned long req_count);
+void readahead_expand(struct readahead_control *ractl,
+ loff_t new_start, size_t new_len);
 
 /**
  * page_cache_sync_readahead - generic file readahead
diff --git a/mm/readahead.c b/mm/readahead.c
index 2088569a947e..f02dbebf1cef 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -638,3 +638,78 @@ SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, 
size_t, count)
 {
return ksys_readahead(fd, offset, count);
 }
+
+/**
+ * readahead_expand - Expand a readahead request
+ * @ractl: The request to be expanded
+ * @new_start: The revised start
+ * @new_len: The revised size of the request
+ *
+ * Attempt to expand a readahead request outwards from the current size to the
+ * specified size by inserting locked pages before and after the current window
+ * to increase the size to the new window.  This may involve the insertion of
+ * THPs, in which case the window may get expanded even beyond what was
+ * requested.
+ *
+ * The algorithm will stop if it encounters a conflicting page already in the
+ * pagecache and leave a smaller expansion than requested.
+ *
+ * The caller must check for this by examining the revised @ractl object for a
+ * different expansion than was requested.
+ */
+void readahead_expand(struct readahead_control *ractl,
+ loff_t new_start, size_t new_len)
+{
+   struct address_space *mapping = ractl->mapping;
+   struct file_ra_state *ra = ractl->ra;
+   pgoff_t new_index, new_nr_pages;
+   gfp_t gfp_mask = readahead_gfp_mask(mapping);
+
+   new_index = new_start / PAGE_SIZE;
+
+   /* Expand the leading edge downwards */
+   while (ractl->_index > new_index) {
+   unsigned long index = ractl->_index - 1;
	struct page *page = xa_load(&mapping->i_pages, index);
+
+   if (page && !xa_is_value(page))
+   return; /* Page apparently present */
+
+   page = __page_cache_alloc(gfp_mask);
+   if (!page)
+   return;
+   if (add_to_page_cache_lru(page, map

[PATCH v6 03/30] mm/filemap: Pass the file_ra_state in the ractl

2021-04-08 Thread David Howells
From: Matthew Wilcox (Oracle) 

For readahead_expand(), we need to modify the file ra_state, so pass it
down by adding it to the ractl.  We have to do this because it's not always
the same as f_ra in the struct file that is already being passed.
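
Callers that used to write, say, DEFINE_READAHEAD(ractl, file, mapping,
index) now pass the ra_state explicitly, typically (a sketch):

	DEFINE_READAHEAD(ractl, file, &file->f_ra, mapping, index);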

Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: David Howells 
Link: https://lore.kernel.org/r/20210407201857.3582797-2-wi...@infradead.org/
---

 fs/ext4/verity.c|2 +-
 fs/f2fs/file.c  |2 +-
 fs/f2fs/verity.c|2 +-
 include/linux/pagemap.h |   20 +++-
 mm/filemap.c|4 ++--
 mm/internal.h   |7 +++
 mm/readahead.c  |   22 +++---
 7 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c
index 00e3cbde472e..07438f46b558 100644
--- a/fs/ext4/verity.c
+++ b/fs/ext4/verity.c
@@ -370,7 +370,7 @@ static struct page *ext4_read_merkle_tree_page(struct inode 
*inode,
   pgoff_t index,
   unsigned long num_ra_pages)
 {
-   DEFINE_READAHEAD(ractl, NULL, inode->i_mapping, index);
+   DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index);
struct page *page;
 
index += ext4_verity_metadata_pos(inode) >> PAGE_SHIFT;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index d26ff2ae3f5e..c1e6f669a0c4 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4051,7 +4051,7 @@ static int f2fs_ioc_set_compress_option(struct file 
*filp, unsigned long arg)
 
 static int redirty_blocks(struct inode *inode, pgoff_t page_idx, int len)
 {
-   DEFINE_READAHEAD(ractl, NULL, inode->i_mapping, page_idx);
+   DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, page_idx);
struct address_space *mapping = inode->i_mapping;
struct page *page;
pgoff_t redirty_idx = page_idx;
diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c
index 054ec852b5ea..a7beff28a3c5 100644
--- a/fs/f2fs/verity.c
+++ b/fs/f2fs/verity.c
@@ -228,7 +228,7 @@ static struct page *f2fs_read_merkle_tree_page(struct inode 
*inode,
   pgoff_t index,
   unsigned long num_ra_pages)
 {
-   DEFINE_READAHEAD(ractl, NULL, inode->i_mapping, index);
+   DEFINE_READAHEAD(ractl, NULL, NULL, inode->i_mapping, index);
struct page *page;
 
index += f2fs_verity_metadata_pos(inode) >> PAGE_SHIFT;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4a7c916abb5c..9a9e558ce4c7 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -811,20 +811,23 @@ static inline int add_to_page_cache(struct page *page,
  * @file: The file, used primarily by network filesystems for authentication.
  *   May be NULL if invoked internally by the filesystem.
  * @mapping: Readahead this filesystem object.
+ * @ra: File readahead state.  May be NULL.
  */
 struct readahead_control {
struct file *file;
struct address_space *mapping;
+   struct file_ra_state *ra;
 /* private: use the readahead_* accessors instead */
pgoff_t _index;
unsigned int _nr_pages;
unsigned int _batch_count;
 };
 
-#define DEFINE_READAHEAD(rac, f, m, i) \
-   struct readahead_control rac = {\
+#define DEFINE_READAHEAD(ractl, f, r, m, i)\
+   struct readahead_control ractl = {  \
.file = f,  \
.mapping = m,   \
+   .ra = r,\
._index = i,\
}
 
@@ -832,10 +835,9 @@ struct readahead_control {
 
 void page_cache_ra_unbounded(struct readahead_control *,
unsigned long nr_to_read, unsigned long lookahead_count);
-void page_cache_sync_ra(struct readahead_control *, struct file_ra_state *,
+void page_cache_sync_ra(struct readahead_control *, unsigned long req_count);
+void page_cache_async_ra(struct readahead_control *, struct page *,
unsigned long req_count);
-void page_cache_async_ra(struct readahead_control *, struct file_ra_state *,
-   struct page *, unsigned long req_count);
 
 /**
  * page_cache_sync_readahead - generic file readahead
@@ -855,8 +857,8 @@ void page_cache_sync_readahead(struct address_space 
*mapping,
struct file_ra_state *ra, struct file *file, pgoff_t index,
unsigned long req_count)
 {
-   DEFINE_READAHEAD(ractl, file, mapping, index);
-	page_cache_sync_ra(&ractl, ra, req_count);
+	DEFINE_READAHEAD(ractl, file, ra, mapping, index);
+	page_cache_sync_ra(&ractl, req_count);
 }
 
 /**
@@ -878,8 +880,8 @@ void page_cache_asy

[PATCH v6 02/30] mm: Add set/end/wait functions for PG_private_2

2021-04-08 Thread David Howells
Add three functions to manipulate PG_private_2:

 (*) set_page_private_2() - Set the flag and take an appropriate reference
 on the flagged page.

 (*) end_page_private_2() - Clear the flag, drop the reference and wake up
 any waiters, somewhat analogously with end_page_writeback().

 (*) wait_on_page_private_2() - Wait for the flag to be cleared.

Wrappers will need to be placed in the netfs lib header in the patch that
adds that.

[This implements a suggestion by Linus[1] to not mix the terminology of
 PG_private_2 and PG_fscache in the mm core function]
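
From the eventual user's point of view, the expected lifecycle is roughly
(a sketch; example_write_to_cache() is a hypothetical cache submission):

	/* Take a ref and mark the page as being written to the cache. */
	set_page_private_2(page);
	example_write_to_cache(page);

	/* ...and then in the cache write's completion handler: */
	end_page_private_2(page);	/* clears flag, wakes waiters, drops ref */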

Changes:
v5:
- Add set and end functions, calling the end function end rather than
  unlock[3].
- Keep a ref on the page when PG_private_2 is set[4][5].

v4:
- Remove extern from the declaration[2].

Suggested-by: Linus Torvalds 
Signed-off-by: David Howells 
cc: Matthew Wilcox (Oracle) 
cc: Alexander Viro 
cc: Christoph Hellwig 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/1330473.1612974...@warthog.procyon.org.uk/ # v1
Link: 
https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=xaemtkswpu1yw4uaro1r3ibs27...@mail.gmail.com/
 [1]
Link: https://lore.kernel.org/r/20210216102659.ga27...@lst.de/ [2]
Link: 
https://lore.kernel.org/r/161340387944.1303470.7944159520278177652.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539528910.286939.1252328699383291173.st...@warthog.procyon.org.uk
 # v4
Link: https://lore.kernel.org/r/20210321105309.gg3...@casper.infradead.org [3]
Link: 
https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pum...@mail.gmail.com/
 [4]
Link: 
https://lore.kernel.org/r/CAHk-=wjsgsrj7xwhsmq6daqiz53xa39pog+xa_wetgwbbu4...@mail.gmail.com/
 [5]
Link: 
https://lore.kernel.org/r/161653788200.2770958.9517755716374927208.st...@warthog.procyon.org.uk/
 # v5
---

 include/linux/pagemap.h |   19 +++
 mm/filemap.c|   59 +++
 2 files changed, 78 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8c9947fd62f3..4a7c916abb5c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -688,6 +688,25 @@ void wait_for_stable_page(struct page *page);
 
 void page_endio(struct page *page, bool is_write, int err);
 
+/**
+ * set_page_private_2 - Set PG_private_2 on a page and take a ref
+ * @page: The page.
+ *
+ * Set the PG_private_2 flag on a page and take the reference needed for the VM
+ * to handle its lifetime correctly.  This sets the flag and takes the
+ * reference unconditionally, so care must be taken not to set the flag again
+ * if it's already set.
+ */
+static inline void set_page_private_2(struct page *page)
+{
+   get_page(page);
+   SetPagePrivate2(page);
+}
+
+void end_page_private_2(struct page *page);
+void wait_on_page_private_2(struct page *page);
+int wait_on_page_private_2_killable(struct page *page);
+
 /*
  * Add an arbitrary waiter to a page's wait queue
  */
diff --git a/mm/filemap.c b/mm/filemap.c
index 43700480d897..788b71e8a72d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1432,6 +1432,65 @@ void unlock_page(struct page *page)
 }
 EXPORT_SYMBOL(unlock_page);
 
+/**
+ * end_page_private_2 - Clear PG_private_2 and release any waiters
+ * @page: The page
+ *
+ * Clear the PG_private_2 bit on a page and wake up any sleepers waiting for
+ * this.  The page ref held for PG_private_2 being set is released.
+ *
+ * This is, for example, used when a netfs page is being written to a local
+ * disk cache, thereby allowing writes to the cache for the same page to be
+ * serialised.
+ */
+void end_page_private_2(struct page *page)
+{
+   page = compound_head(page);
+   VM_BUG_ON_PAGE(!PagePrivate2(page), page);
	clear_bit_unlock(PG_private_2, &page->flags);
+   wake_up_page_bit(page, PG_private_2);
+   put_page(page);
+}
+EXPORT_SYMBOL(end_page_private_2);
+
+/**
+ * wait_on_page_private_2 - Wait for PG_private_2 to be cleared on a page
+ * @page: The page to wait on
+ *
+ * Wait for PG_private_2 (aka PG_fscache) to be cleared on a page.
+ */
+void wait_on_page_private_2(struct page *page)
+{
+   while (PagePrivate2(page))
+   wait_on_page_bit(page, PG_private_2);
+}
+EXPORT_SYMBOL(wait_on_page_private_2);
+
+/**
+ * wait_on_page_private_2_killable - Wait for PG_private_2 to be cleared on a 
page
+ * @page: The page to wait on
+ *
+ * Wait for PG_private_2 (aka PG_fscache) to be cleared on a page or until a
+ * fatal signal is received by the calling task.
+ *
+ * Return:
+ * - 0 if successful.
+ * - -EINTR if a fatal signal was encountered.
+ */
+int wait_on_page_private_2_killable(struct page *page)
+{
+   int ret = 0;
+
+   while (PagePrivate2(page)) {
+   ret = wait_on_page_bit_killa

[PATCH v6 01/30] iov_iter: Add ITER_XARRAY

2021-04-08 Thread David Howells
Add an iterator, ITER_XARRAY, that walks through a set of pages attached to
an xarray, starting at a given page and offset and walking for the
specified number of bytes.  The iterator supports transparent huge pages.

The iterate_xarray() macro calls the helper function with rcu_read_lock()
held.  I think that this is only a problem for iov_iter_for_each_range()
- and that returns an error for ITER_XARRAY (also, this function does not
appear to be called).

The caller must guarantee that the pages are all present and they must be
locked using PG_locked, PG_writeback or PG_fscache to prevent them from
going away or being migrated whilst they're being accessed.

This is useful for copying data from socket buffers to inodes in network
filesystems and for transferring data between those inodes and the cache
using direct I/O.

Whilst it is true that ITER_BVEC could be used instead, that would require
a bio_vec array to be allocated to refer to all the pages - which should be
redundant if inode->i_pages also points to all these pages.
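
Constructing such an iterator is a one-liner; for example, to describe len
bytes of an inode's pagecache starting at file position start, ready to be
filled from a socket (a sketch):

	struct iov_iter iter;

	iov_iter_xarray(&iter, READ, &mapping->i_pages, start, len);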

Note that older versions of this patch implemented an ITER_MAPPING instead,
which was almost the same.

Signed-off-by: David Howells 
cc: Alexander Viro 
cc: Matthew Wilcox (Oracle) 
cc: Christoph Hellwig 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/3577430.1579705...@warthog.procyon.org.uk/ # rfc
Link: 
https://lore.kernel.org/r/158861205740.340223.16592990225607814022.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539527815.286939.14607323792547049341.st...@warthog.procyon.org.uk/
 # v4
Link: 
https://lore.kernel.org/r/161653786033.2770958.14154191921867463240.st...@warthog.procyon.org.uk/
 # v5
---

 include/linux/uio.h |   11 ++
 lib/iov_iter.c  |  313 +++
 2 files changed, 301 insertions(+), 23 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 27ff8eb786dc..5f5ffc45d4aa 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -10,6 +10,7 @@
 #include 
 
 struct page;
+struct address_space;
 struct pipe_inode_info;
 
 struct kvec {
@@ -24,6 +25,7 @@ enum iter_type {
ITER_BVEC = 16,
ITER_PIPE = 32,
ITER_DISCARD = 64,
+   ITER_XARRAY = 128,
 };
 
 struct iov_iter {
@@ -39,6 +41,7 @@ struct iov_iter {
const struct iovec *iov;
const struct kvec *kvec;
const struct bio_vec *bvec;
+   struct xarray *xarray;
struct pipe_inode_info *pipe;
};
union {
@@ -47,6 +50,7 @@ struct iov_iter {
unsigned int head;
unsigned int start_head;
};
+   loff_t xarray_start;
};
 };
 
@@ -80,6 +84,11 @@ static inline bool iov_iter_is_discard(const struct iov_iter 
*i)
return iov_iter_type(i) == ITER_DISCARD;
 }
 
+static inline bool iov_iter_is_xarray(const struct iov_iter *i)
+{
+   return iov_iter_type(i) == ITER_XARRAY;
+}
+
 static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 {
return i->type & (READ | WRITE);
@@ -221,6 +230,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int 
direction, const struct bio_
 void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct 
pipe_inode_info *pipe,
size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t 
count);
+void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray 
*xarray,
+loff_t start, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f66c62aa7154..f808c625c11e 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -76,7 +76,44 @@
}   \
 }
 
-#define iterate_all_kinds(i, n, v, I, B, K) {  \
+#define iterate_xarray(i, n, __v, skip, STEP) {\
+   struct page *h

[PATCH v6 00/30] Network fs helper library & fscache kiocb API

2021-04-08 Thread David Howells
Dropped NFS support and added Ceph support.

ver #2:
  Fixed some bugs and added NFS support.

Link: 
https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pum...@mail.gmail.com/
 [1]
Link: 
https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=xaemtkswpu1yw4uaro1r3ibs27...@mail.gmail.com/
 [2]
Link: https://lore.kernel.org/r/20210216102614.ga27...@lst.de/ [3]
Link: https://lore.kernel.org/r/20210216084230.ga23...@lst.de/ [4]
Link: https://lore.kernel.org/r/20210217161358.gm2858...@casper.infradead.org/ 
[5]
Link: https://lore.kernel.org/r/20210321014202.gf3...@casper.infradead.org/ [6]
Link: https://lore.kernel.org/r/20210321105309.gg3...@casper.infradead.org/ [7]
Link: 
https://lore.kernel.org/r/161781041339.463527.18139104281901492882.st...@warthog.procyon.org.uk/
 [8]
Link: https://lore.kernel.org/r/20210407201857.3582797-1-wi...@infradead.org/ 
[9]
Link: https://lore.kernel.org/r/1234933.1617886...@warthog.procyon.org.uk/ [10]

References
==

These patches have been published for review before, firstly as part of a
larger set:

Link: 
https://lore.kernel.org/r/158861203563.340223.7585359869938129395.st...@warthog.procyon.org.uk/

Link: 
https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.st...@warthog.procyon.org.uk/

Link: 
https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.st...@warthog.procyon.org.uk/

Then as a cut-down set:

Link: 
https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.st...@warthog.procyon.org.uk/
 # v1

Link: 
https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.st...@warthog.procyon.org.uk/
 # v2

Link: 
https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.st...@warthog.procyon.org.uk/
 # v3

Link: 
https://lore.kernel.org/r/161539526152.286939.8589700175877370401.st...@warthog.procyon.org.uk/
 # v4

Link: 
https://lore.kernel.org/r/161653784755.2770958.11820491619308713741.st...@warthog.procyon.org.uk/
 # v5

Proposals/information about the design has been published here:

Link: https://lore.kernel.org/r/24942.1573667...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/2758811.1610621...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/1441311.1598547...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160655.1611012...@warthog.procyon.org.uk/

And requests for information:

Link: https://lore.kernel.org/r/3326.1579019...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/4467.1579020...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/3577430.1579705...@warthog.procyon.org.uk/

I've posted partial patches to try and help 9p and cifs along:

Link: https://lore.kernel.org/r/1514086.1605697...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/1794123.1605713...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/241017.1612263...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/270998.1612265...@warthog.procyon.org.uk/

David
---
David Howells (28):
  iov_iter: Add ITER_XARRAY
  mm: Add set/end/wait functions for PG_private_2
  mm: Implement readahead_control pageset expansion
  netfs: Make a netfs helper module
  netfs: Documentation for helper library
  netfs, mm: Move PG_fscache helper funcs to linux/netfs.h
  netfs, mm: Add set/end/wait_on_page_fscache() aliases
  netfs: Provide readahead and readpage netfs helpers
  netfs: Add tracepoints
  netfs: Gather stats
  netfs: Add write_begin helper
  netfs: Define an interface to talk to a cache
  netfs: Add a tracepoint to log failures that would be otherwise unseen
  fscache, cachefiles: Add alternate API to use kiocb for read/write to 
cache
  afs: Disable use of the fscache I/O routines
  afs: Pass page into dirty region helpers to provide THP size
  afs: Print the operation debug_id when logging an unexpected data version
  afs: Move key to afs_read struct
  afs: Don't truncate iter during data fetch
  afs: Log remote unmarshalling errors
  afs: Set up the iov_iter before calling afs_extract_data()
  afs: Use ITER_XARRAY for writing
  afs: Wait on PG_fscache before modifying/releasing a page
  afs: Extract writeback extension into its own function
  afs: Prepare for use of THPs
  afs: Use the fs operation ops to handle FetchData completion
  afs: Use new netfs lib read helper API
  afs: Use the netfs_write_begin() helper

Matthew Wilcox (Oracle) (2):
  mm/filemap: Pass the file_ra_state in the ractl
  fs: Document file_ra_state


 Documentation/filesystems/index.rst |1 +
 Documentation/filesystems/netfs_library.rst |  526 
 fs/Kconfig  |1 +
 fs/Makefile |1 +
 fs/a

[PATCH 5/5] netfs: Add a tracepoint to log failures that would be otherwise unseen

2021-04-07 Thread David Howells
Add a tracepoint to log internal failures (such as cache errors) that we
don't otherwise want to pass back to the netfs.

Signed-off-by: David Howells 
---

 fs/netfs/read_helper.c   |   14 +-
 include/trace/events/netfs.h |   58 ++
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index ce2f31d20250..762a15350242 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -271,6 +271,8 @@ static void netfs_rreq_copy_terminated(void *priv, ssize_t 
transferred_or_error,
 
if (IS_ERR_VALUE(transferred_or_error)) {
		netfs_stat(&netfs_n_rh_write_failed);
+   trace_netfs_failure(rreq, subreq, transferred_or_error,
+   netfs_fail_copy_to_cache);
} else {
		netfs_stat(&netfs_n_rh_write_done);
}
@@ -323,6 +325,7 @@ static void netfs_rreq_do_write_to_cache(struct 
netfs_read_request *rreq)
		ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len,
					       rreq->i_size);
if (ret < 0) {
+   trace_netfs_failure(rreq, subreq, ret, 
netfs_fail_prepare_write);
trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip);
continue;
}
@@ -627,6 +630,8 @@ void netfs_subreq_terminated(struct netfs_read_subrequest 
*subreq,
 
if (IS_ERR_VALUE(transferred_or_error)) {
subreq->error = transferred_or_error;
+   trace_netfs_failure(rreq, subreq, transferred_or_error,
+   netfs_fail_read);
goto failed;
}
 
@@ -996,8 +1001,10 @@ int netfs_readpage(struct file *file,
	} while (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags));
 
ret = rreq->error;
-   if (ret == 0 && rreq->submitted < rreq->len)
+   if (ret == 0 && rreq->submitted < rreq->len) {
+   trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_readpage);
ret = -EIO;
+   }
 out:
netfs_put_read_request(rreq, false);
return ret;
@@ -1074,6 +1081,7 @@ int netfs_write_begin(struct file *file, struct 
address_space *mapping,
/* Allow the netfs (eg. ceph) to flush conflicts. */
ret = ops->check_write_begin(file, pos, len, page, _fsdata);
if (ret < 0) {
+   trace_netfs_failure(NULL, NULL, ret, 
netfs_fail_check_write_begin);
if (ret == -EAGAIN)
goto retry;
goto error;
@@ -1150,8 +1158,10 @@ int netfs_write_begin(struct file *file, struct 
address_space *mapping,
}
 
ret = rreq->error;
-   if (ret == 0 && rreq->submitted < rreq->len)
+   if (ret == 0 && rreq->submitted < rreq->len) {
+   trace_netfs_failure(rreq, NULL, ret, 
netfs_fail_short_write_begin);
ret = -EIO;
+   }
netfs_put_read_request(rreq, false);
if (ret < 0)
goto error;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index e3ebeabd3852..de1c64635e42 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -47,6 +47,15 @@ enum netfs_sreq_trace {
netfs_sreq_trace_write_term,
 };
 
+enum netfs_failure {
+   netfs_fail_check_write_begin,
+   netfs_fail_copy_to_cache,
+   netfs_fail_read,
+   netfs_fail_short_readpage,
+   netfs_fail_short_write_begin,
+   netfs_fail_prepare_write,
+};
+
 #endif
 
 #define netfs_read_traces  \
@@ -81,6 +90,14 @@ enum netfs_sreq_trace {
EM(netfs_sreq_trace_write_skip, "SKIP ")\
E_(netfs_sreq_trace_write_term, "WTERM")
 
+#define netfs_failures \
+   EM(netfs_fail_check_write_begin,"check-write-begin")\
+   EM(netfs_fail_copy_to_cache,"copy-to-cache")\
+   EM(netfs_fail_read, "read") \
+   EM(netfs_fail_short_readpage,   "short-readpage")   \
+   EM(netfs_fail_short_write_begin,"short-write-begin")\
+   E_(netfs_fail_prepare_write,"prep-write")
+
 
 /*
  * Export enum symbols via userspace.
@@ -94,6 +111,7 @@ netfs_read_traces;
 netfs_rreq_traces;
 netfs_sreq_sources;
 netfs_sreq_traces;
+netfs_failures;
 
 /*
  * Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -197,6 +215,46 @@ TRACE_EVENT(netfs_sreq,
  __entry->error)
);
 
+TRACE_EVENT(netfs_failure,
+   TP_PROTO(str

[PATCH 4/5] netfs: Fix copy-to-cache amalgamation

2021-04-07 Thread David Howells
Fix the amalgamation of subrequests when copying to the cache.  We
shouldn't be rounding up the size to PAGE_SIZE as we go along as that ends
up with the composite subrequest length being too long - and this leads to
EIO from the cache write because the source iterator doesn't contain enough
data.

Instead, we only need to deal with contiguous subreqs and then ask the
cache to round off as it needs - which also means we don't have to make any
assumptions about the cache granularity.
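
As a worked example of the rounding now done in cachefiles_prepare_write()
(assuming PAGE_SIZE is 4096):

	/* A 0x200-byte write at file position 0x1300:
	 *	down    = 0x1300 - round_down(0x1300, 4096) = 0x300
	 *	*_start = 0x1000
	 *	*_len   = round_up(0x300 + 0x200, 4096)     = 0x1000
	 * i.e. exactly one whole page is written back to the cache.
	 */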

Signed-off-by: David Howells 
---

 fs/cachefiles/io.c   |   17 +
 fs/netfs/read_helper.c   |   19 +--
 include/linux/netfs.h|6 ++
 include/trace/events/netfs.h |2 ++
 4 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 620959d1e95b..b13fb45fc3f3 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -330,6 +330,22 @@ static enum netfs_read_source 
cachefiles_prepare_read(struct netfs_read_subreque
return NETFS_DOWNLOAD_FROM_SERVER;
 }
 
+/*
+ * Prepare for a write to occur.
+ */
+static int cachefiles_prepare_write(struct netfs_cache_resources *cres,
+   loff_t *_start, size_t *_len, loff_t i_size)
+{
+   loff_t start = *_start;
+   size_t len = *_len, down;
+
+   /* Round to DIO size */
+   down = start - round_down(start, PAGE_SIZE);
+   *_start = start - down;
+   *_len = round_up(down + len, PAGE_SIZE);
+   return 0;
+}
+
 /*
  * Clean up an operation.
  */
@@ -355,6 +371,7 @@ static const struct netfs_cache_ops 
cachefiles_netfs_cache_ops = {
.read   = cachefiles_read,
.write  = cachefiles_write,
.prepare_read   = cachefiles_prepare_read,
+   .prepare_write  = cachefiles_prepare_write,
 };
 
 /*
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index ad0dc01319ce..ce2f31d20250 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -293,7 +293,7 @@ static void netfs_rreq_do_write_to_cache(struct 
netfs_read_request *rreq)
	struct netfs_cache_resources *cres = &rreq->cache_resources;
struct netfs_read_subrequest *subreq, *next, *p;
struct iov_iter iter;
-   loff_t pos;
+   int ret;
 
trace_netfs_rreq(rreq, netfs_rreq_trace_write);
 
@@ -311,23 +311,22 @@ static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq)
 
	list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
/* Amalgamate adjacent writes */
-   pos = round_down(subreq->start, PAGE_SIZE);
-   if (pos != subreq->start) {
-   subreq->len += subreq->start - pos;
-   subreq->start = pos;
-   }
-   subreq->len = round_up(subreq->len, PAGE_SIZE);
-
		while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
next = list_next_entry(subreq, rreq_link);
-   if (next->start > subreq->start + subreq->len)
+   if (next->start != subreq->start + subreq->len)
break;
subreq->len += next->len;
-   subreq->len = round_up(subreq->len, PAGE_SIZE);
			list_del_init(&next->rreq_link);
netfs_put_subrequest(next, false);
}
 
+   ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len,
+  rreq->i_size);
+   if (ret < 0) {
+   trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip);
+   continue;
+   }
+
		iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages,
subreq->start, subreq->len);
 
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 2299e7662ff0..9062adfa2fb9 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -206,6 +206,12 @@ struct netfs_cache_ops {
 */
	enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
   loff_t i_size);
+
+   /* Prepare a write operation, working out what part of the write we can
+* actually do.
+*/
+   int (*prepare_write)(struct netfs_cache_resources *cres,
+loff_t *_start, size_t *_len, loff_t i_size);
 };
 
 struct readahead_control;
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index a2bf6cd84bd4..e3ebeabd3852 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -43,6 +43,7 @@ enum netfs_sreq_trace {
netfs_sreq_trace_submit,
netfs_sreq_trace_terminated,
netfs_sreq_trace_write,
+   netfs_sreq_trace_write_sk

[PATCH 3/5] netfs: Don't record the copy termination error

2021-04-07 Thread David Howells
Don't record the copy termination error in the subrequest.  We shouldn't
return it through netfs_readpage() or netfs_write_begin() as we don't let
the netfs see cache errors.

Signed-off-by: David Howells 
---

 fs/netfs/read_helper.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 8040b76da1b6..ad0dc01319ce 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -270,10 +270,8 @@ static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error,
struct netfs_read_request *rreq = subreq->rreq;
 
if (IS_ERR_VALUE(transferred_or_error)) {
-   subreq->error = transferred_or_error;
		netfs_stat(&netfs_n_rh_write_failed);
	} else {
-   subreq->error = 0;
		netfs_stat(&netfs_n_rh_write_done);
}
 




[PATCH 2/5] netfs: Call trace_netfs_read() after ->begin_cache_operation()

2021-04-07 Thread David Howells
Reorder the netfs library API functions slightly to log the netfs_read
tracepoint after calling out to the network filesystem to begin a caching
operation.  This sets rreq->cookie_debug_id so that it can be logged in
tracepoints.

Signed-off-by: David Howells 
---

 fs/netfs/read_helper.c |   23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 0066db21aa11..8040b76da1b6 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -890,15 +890,16 @@ void netfs_readahead(struct readahead_control *ractl,
rreq->start = readahead_pos(ractl);
rreq->len   = readahead_length(ractl);
 
-   netfs_stat(&netfs_n_rh_readahead);
-   trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
-netfs_read_trace_readahead);
-
if (ops->begin_cache_operation) {
ret = ops->begin_cache_operation(rreq);
if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
goto cleanup_free;
}
+
+   netfs_stat(&netfs_n_rh_readahead);
+   trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
+netfs_read_trace_readahead);
+
netfs_rreq_expand(rreq, ractl);
 
	atomic_set(&rreq->nr_rd_ops, 1);
@@ -968,9 +969,6 @@ int netfs_readpage(struct file *file,
rreq->start = page_index(page) * PAGE_SIZE;
rreq->len   = thp_size(page);
 
-   netfs_stat(&netfs_n_rh_readpage);
-   trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);
-
if (ops->begin_cache_operation) {
ret = ops->begin_cache_operation(rreq);
if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) {
@@ -979,6 +977,9 @@ int netfs_readpage(struct file *file,
}
}
 
+   netfs_stat(&netfs_n_rh_readpage);
+   trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);
+
netfs_get_read_request(rreq);
 
	atomic_set(&rreq->nr_rd_ops, 1);
@@ -1111,15 +1112,15 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
__set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, >flags);
netfs_priv = NULL;
 
-   netfs_stat(&netfs_n_rh_write_begin);
-   trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin);
-
if (ops->begin_cache_operation) {
ret = ops->begin_cache_operation(rreq);
if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
-   goto error;
+   goto error_put;
}
 
+   netfs_stat(&netfs_n_rh_write_begin);
+   trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin);
+
/* Expand the request to meet caching requirements and download
 * preferences.
 */




[PATCH 1/5] netfs: Fix a missing rreq put in netfs_write_begin()

2021-04-07 Thread David Howells
netfs_write_begin() needs to drop a ref on the read request if the network
filesystem gives an error when called to begin the caching op.

Signed-off-by: David Howells 
---

 fs/netfs/read_helper.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 3498bde035eb..0066db21aa11 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -1169,6 +1169,8 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
_leave(" = 0");
return 0;
 
+error_put:
+   netfs_put_read_request(rreq, false);
 error:
unlock_page(page);
put_page(page);




[PATCH 0/5] netfs: Fixes for the netfs lib

2021-04-07 Thread David Howells


Hi Jeff,

Here's a bunch of fixes plus a tracepoint for the netfs library.  I'm going
to roll them into other patches, but I'm posting them here for separate
review.

David
---
David Howells (5):
  netfs: Fix a missing rreq put in netfs_write_begin()
  netfs: Call trace_netfs_read() after ->begin_cache_operation()
  netfs: Don't record the copy termination error
  netfs: Fix copy-to-cache amalgamation
  netfs: Add a tracepoint to log failures that would be otherwise unseen


 fs/cachefiles/io.c   | 17 ++
 fs/netfs/read_helper.c   | 58 +++---
 include/linux/netfs.h|  6 
 include/trace/events/netfs.h | 60 
 4 files changed, 116 insertions(+), 25 deletions(-)




Re: [PATCH] net/rxrpc: Fix a use after free in rxrpc_input_packet

2021-04-01 Thread David Howells
Lv Yunlong  wrote:

> In the case RXRPC_PACKET_TYPE_DATA of rxrpc_input_packet, if
> skb_unshare(skb,..) failed, it will free the skb and return NULL.
> But if skb_unshare() return NULL, the freed skb will be used by
> rxrpc_eaten_skb(skb,..).

That's not precisely the case:

void rxrpc_eaten_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
	int n = atomic_inc_return(&rxrpc_n_rx_skbs);
trace_rxrpc_skb(skb, op, 0, n, 0, here);
}

The only thing that happens to skb here is that it's passed to
trace_rxrpc_skb(), but that doesn't dereference it either.  The *address* is
used for display purposes, but that's all.

> I see that rxrpc_eaten_skb() is used to drop a ref of skb.

It isn't.

> As the skb is already freed in skb_unshare() on error, my patch removes the
> rxrpc_eaten_skb() to avoid the uaf.

But you remove the accounting, which might lead to an assertion failure in
af_rxrpc_exit().

That said, rxrpc_eaten_skb() should probably decrement rxrpc_n_rx_skbs, not
increment it...
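
If so, the fix would be something like this (untested sketch of what I
mean, not a patch):

	void rxrpc_eaten_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
	{
		const void *here = __builtin_return_address(0);
		int n = atomic_dec_return(&rxrpc_n_rx_skbs);	/* dec, not inc */

		trace_rxrpc_skb(skb, op, 0, n, 0, here);
	}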

David



Re: [PATCH v1 0/3] KEYS: trusted: Introduce support for NXP CAAM-based trusted keys

2021-04-01 Thread David Howells
Richard Weinberger  wrote:

> On Wed, Mar 17, 2021 at 3:08 PM Ahmad Fatoum  wrote:
> > keyctl add trusted $KEYNAME "load $(cat ~/kmk.blob)" @s
>
> Is there a reason why we can't pass the desired backend name in the
> trusted key parameters?
> e.g.
> keyctl add trusted $KEYNAME "backendtype caam load $(cat ~/kmk.blob)" @s

I wonder...  Does it make sense to add a new variant of the add_key() and
keyctl_instantiate() syscalls that takes an additional parameter string,
separate from the payload blob?

	key_serial_t add_key2(const char *type, const char *description,
			      const char *params,
			      const void *payload, size_t plen,
			      key_serial_t keyring);

which could then by used, say:

	keyctl add --payload=~/kmk.blob trusted $KEYNAME "backendtype caam load" @s

This would then appear in

struct key_preparsed_payload {
	const char	*orig_description;
	char		*description;
	char		*params;	<---
	union key_payload payload;
	const void	*data;
	size_t		datalen;
	size_t		quotalen;
	time64_t	expiry;
};

params would then be NULL for add_key().

If add_key2() is not available, the --payload param gets concatenated to the
parameters string.

Might be too complicated, I guess.  Though it might make sense just to do the
concatenation inside the keyctl program.
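
In userspace terms, something like this, say (hypothetical sketch -
add_key2() doesn't exist yet and error handling is minimal; assumes
<keyutils.h>, <errno.h>, <stdlib.h> and <string.h>):

	key_serial_t add_key_with_params(const char *type, const char *desc,
					 const char *params,
					 const void *payload, size_t plen,
					 key_serial_t keyring)
	{
		size_t paramlen = strlen(params);
		key_serial_t key;
		char *buf;

		key = add_key2(type, desc, params, payload, plen, keyring);
		if (key != -1 || errno != ENOSYS)
			return key;

		/* Older kernel: concatenate "params payload" instead. */
		buf = malloc(paramlen + 1 + plen);
		if (!buf)
			return -1;
		memcpy(buf, params, paramlen);
		buf[paramlen] = ' ';
		memcpy(buf + paramlen + 1, payload, plen);
		key = add_key(type, desc, buf, paramlen + 1 + plen, keyring);
		free(buf);
		return key;
	}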

David



[PATCH] cachefiles: do not yet allow on idmapped mounts

2021-03-24 Thread David Howells
From: Christian Brauner 

Based on discussions (e.g. in [1]) my understanding of cachefiles and
the cachefiles userspace daemon is that it creates a cache on a local
filesystem (e.g. ext4, xfs etc.) for a network filesystem. The way this
is done is by writing "bind" to /dev/cachefiles and pointing it to the
directory to use as the cache.
Currently this directory can technically also be an idmapped mount but
cachefiles aren't yet fully aware of such mounts and thus don't take the
idmapping into account when creating cache entries. This could leave
users confused as the ownership of the files wouldn't match what they
expressed in the idmapping. Block cache files on idmapped mounts until
the fscache rework is done and we have ported it to support idmapped
mounts.
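
For reference, the binding sequence the daemon performs amounts to roughly
the following (sketch; command strings as per the cachefiles documentation,
error handling omitted, assumes <fcntl.h> and <unistd.h>):

	int fd = open("/dev/cachefiles", O_RDWR);

	write(fd, "dir /var/cache/fscache", 22);	/* directory to use as the cache */
	write(fd, "tag mycache", 11);
	write(fd, "bind", 4);	/* the check added below now fails this with EINVAL
				 * if the directory is on an idmapped mount */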

Signed-off-by: Christian Brauner 
Signed-off-by: David Howells 
Cc: linux-cach...@redhat.com
Link: https://lore.kernel.org/lkml/20210303161528.n3jzg66ou2wa43qb@wittgenstein 
[1]
Link: 
https://lore.kernel.org/r/20210316112257.2974212-1-christian.brau...@ubuntu.com/
 # v1
Link: 
https://listman.redhat.com/archives/linux-cachefs/2021-March/msg00044.html # v2
Link: 
https://lore.kernel.org/r/20210319114146.410329-1-christian.brau...@ubuntu.com/ 
# v3
---

 fs/cachefiles/bind.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/cachefiles/bind.c b/fs/cachefiles/bind.c
index dfb14dbddf51..38bb7764b454 100644
--- a/fs/cachefiles/bind.c
+++ b/fs/cachefiles/bind.c
@@ -118,6 +118,12 @@ static int cachefiles_daemon_add_cache(struct cachefiles_cache *cache)
cache->mnt = path.mnt;
root = path.dentry;
 
+   ret = -EINVAL;
+   if (mnt_user_ns(path.mnt) != &init_user_ns) {
+   pr_warn("File cache on idmapped mounts not supported");
+   goto error_unsupported;
+   }
+
/* check parameters */
ret = -EOPNOTSUPP;
if (d_is_negative(root) ||




[GIT PULL] cachefiles, afs: mm wait fixes

2021-03-24 Thread David Howells
Hi Linus,

Could you pull these patches from Matthew Wilcox to fix page
waiting-related issues in cachefiles and afs as extracted from his folio
series[1]:

 (1) In cachefiles, remove the use of the wait_bit_key struct to access
 something that's actually in wait_page_key format.  The proper struct
 is now available in the header, so that should be used instead.

 (2) Add a proper wait function for waiting killably on the page writeback
 flag.  This includes a recent bugfix[2] that's not in the afs code.

 (3) In afs, use the function added in (2) rather than using
 wait_on_page_bit_killable() which doesn't provide the aforementioned
 bugfix.

Notes:

 - I've included these together since they are an excerpt from a patch
   series of Willy's, but I can send the first separately from the other
   two if you'd prefer since they touch different modules.

 - The cachefiles patch could be deferred to the next merge window as
   whichever compiler is used probably *should* generate the same code for
   both structs, even with struct randomisation turned on.

 - AuriStor (auristor.com) have added certain of my branches to their
   automated AFS testing, hence the Tested-by kafs-test...@auristor.com tag
   on the patches in this set.  Is this the best way to represent this?

David

Link: https://lore.kernel.org/r/20210320054104.1300774-1-wi...@infradead.org [1]
Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7
 [2]
Link: https://lore.kernel.org/r/20210323120829.gc1719...@casper.infradead.org/ 
# v1

---
The following changes since commit 0d02ec6b3136c73c09e7859f0d0e4e2c4c07b49b:

  Linux 5.12-rc4 (2021-03-21 14:56:43 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git 
tags/afs-cachefiles-fixes-20210323

for you to fetch changes up to 75b69799610c2b909a18e709c402923ea61aedc0:

  afs: Use wait_on_page_writeback_killable (2021-03-23 20:54:37 +)


cachefiles, afs: mm wait fixes


Matthew Wilcox (Oracle) (3):
  fs/cachefiles: Remove wait_bit_key layout dependency
  mm/writeback: Add wait_on_page_writeback_killable
  afs: Use wait_on_page_writeback_killable

 fs/afs/write.c  |  3 +--
 fs/cachefiles/rdwr.c|  7 +++
 include/linux/pagemap.h |  2 +-
 mm/page-writeback.c | 16 
 4 files changed, 21 insertions(+), 7 deletions(-)



[PATCH v5 28/28] afs: Use the fscache_write_begin() helper

2021-03-23 Thread David Howells
Make AFS use the new fscache_write_begin() helper to do the pre-reading
required before the write.  If successful, the helper returns with the
required page filled in and locked.  It may read more than just one page,
expanding the read to meet cache granularity requirements as necessary.

Note: A more advanced version of this could be made that does
generic_perform_write() for a whole cache granule.  This would make it
easier to avoid doing the download/read for the data to be overwritten.
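
The core of the resulting afs_write_begin() is roughly the following
(sketch only - see the diff below for the real thing; the entry point is
the netfs lib's netfs_write_begin() with the afs_req_ops table defined in
fs/afs/file.c, and the argument order here is as I recall it):

	ret = netfs_write_begin(file, mapping, pos, len, flags,
				&page, fsdata, &afs_req_ops, netfs_priv);
	if (ret < 0)
		return ret;
	/* On success, the page comes back locked and fully read. */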

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588546422.3465195.1546354372589291098.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161539563244.286939.16537296241609909980.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/file.c |   19 +
 fs/afs/internal.h |1 
 fs/afs/write.c|  108 ++---
 3 files changed, 31 insertions(+), 97 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 99bb4649a306..cf2b664a68a5 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -334,6 +334,13 @@ static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file)
rreq->netfs_priv = key_get(afs_file_key(file));
 }
 
+static bool afs_is_cache_enabled(struct inode *inode)
+{
+   struct fscache_cookie *cookie = afs_vnode_cache(AFS_FS_I(inode));
+
+   return fscache_cookie_enabled(cookie) && !hlist_empty(&cookie->backing_objects);
+}
+
 static int afs_begin_cache_operation(struct netfs_read_request *rreq)
 {
struct afs_vnode *vnode = AFS_FS_I(rreq->inode);
@@ -341,14 +348,24 @@ static int afs_begin_cache_operation(struct netfs_read_request *rreq)
return fscache_begin_read_operation(rreq, afs_vnode_cache(vnode));
 }
 
+static int afs_check_write_begin(struct file *file, loff_t pos, unsigned len,
+struct page *page, void **_fsdata)
+{
+   struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
+
	return test_bit(AFS_VNODE_DELETED, &vnode->flags) ? -ESTALE : 0;
+}
+
 static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv)
 {
key_put(netfs_priv);
 }
 
-static const struct netfs_read_request_ops afs_req_ops = {
+const struct netfs_read_request_ops afs_req_ops = {
.init_rreq  = afs_init_rreq,
+   .is_cache_enabled   = afs_is_cache_enabled,
.begin_cache_operation  = afs_begin_cache_operation,
+   .check_write_begin  = afs_check_write_begin,
.issue_op   = afs_req_issue_op,
.cleanup= afs_priv_cleanup,
 };
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 96b33d2e3116..9f4040724318 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1045,6 +1045,7 @@ extern void afs_dynroot_depopulate(struct super_block *);
 extern const struct address_space_operations afs_fs_aops;
 extern const struct inode_operations afs_file_inode_operations;
 extern const struct file_operations afs_file_operations;
+extern const struct netfs_read_request_ops afs_req_ops;
 
 extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *);
 extern void afs_put_wb_key(struct afs_wb_key *);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index f55b48e2db29..f0f0496f1a7b 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -11,6 +11,8 @@
 #include <linux/pagemap.h>
 #include <linux/writeback.h>
 #include <linux/pagevec.h>
+#include <linux/netfs.h>
+#include <linux/fscache.h>
 #include "internal.h"
 
 /*
@@ -22,68 +24,6 @@ int afs_set_page_dirty(struct page *page)
return __set_page_dirty_nobuffers(page);
 }
 
-/*
- * Handle completion of a read operation to fill a page.
- */
-static void afs_fill_hole(struct afs_read *req)
-{
-   if (iov_iter_count(req->iter) > 0)
-   /* The read was short - clear the excess buffer. */
-   iov_iter_zero(iov_iter_count(req->iter), req->iter);
-}
-
-/*
- * partly or wholly fill a page that's under preparation for writing
- */
-static int afs_fill_page(struct file *file,
-loff_t pos, unsigned int len, struct page *page)
-{
-   struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
-   struct afs_read *req;
-   size_t p;
-   void *data;
-   int ret;
-
-   _enter(",,%llu", (unsigned long long)pos);
-
-   if (pos >= vnode->vfs_inode.i_size) {
-   p = pos & ~PAGE_MASK;
-   ASSERTCMP(p + len, <=, PAGE_SIZE);
-   data = kmap(page);
-   memset(data + p, 0, len);
-   kunmap(page);
-   return 0;
-   }
-
-   req = kzalloc(sizeof(struct afs_read), GFP_KERNEL);
-   if (!req)
-   return -ENOMEM;
-
-   refcount_set(&req->usage, 1);
-   req->vnode  = vnode;
-   req->done   = afs_fill_hole;
-   req->key= key_get(afs_file_key(file));
-   req->pos= po

[PATCH v5 27/28] afs: Use new fscache read helper API

2021-03-23 Thread David Howells
Make AFS use the new fscache read helpers to implement the VM read
operations:

 - afs_readpage() now hands off responsibility to fscache_readpage().

 - afs_readpages() is gone and replaced with afs_readahead().

 - afs_readahead() just hands off responsibility to fscache_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.
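
The address_space ops then reduce to thin wrappers, along these lines
(sketch based on the netfs lib entry points in this series; afs_req_ops is
the op table defined in fs/afs/file.c):

	static int afs_readpage(struct file *file, struct page *page)
	{
		return netfs_readpage(file, page, &afs_req_ops, NULL);
	}

	static void afs_readahead(struct readahead_control *ractl)
	{
		netfs_readahead(ractl, &afs_req_ops, NULL);
	}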

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
  have been running async and stuff that might sleep needs punting to a
  workqueue.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk [1]
Link: 
https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539561926.286939.5729036262354802339.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/Kconfig|1 
 fs/afs/file.c |  327 +
 fs/afs/fsclient.c |1 
 fs/afs/internal.h |3 
 fs/afs/write.c|7 +
 5 files changed, 88 insertions(+), 251 deletions(-)

diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig
index 1ad211d72b3b..fc8ba9142f2f 100644
--- a/fs/afs/Kconfig
+++ b/fs/afs/Kconfig
@@ -4,6 +4,7 @@ config AFS_FS
depends on INET
select AF_RXRPC
select DNS_RESOLVER
+   select NETFS_SUPPORT
help
  If you say Y here, you will get an experimental Andrew File System
  driver. It currently only supports unsecured read-only AFS access.
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 231e9fd7882b..99bb4649a306 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -14,6 +14,7 @@
 #include <linux/gfp.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/mm.h>
+#include <linux/netfs.h>
 #include "internal.h"
 
 static int afs_file_mmap(struct file *file, struct vm_area_struct *vma);
@@ -22,8 +23,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
   unsigned int length);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
 
-static int afs_readpages(struct file *filp, struct address_space *mapping,
-struct list_head *pages, unsigned nr_pages);
+static void afs_readahead(struct readahead_control *ractl);
 
 const struct file_operations afs_file_operations = {
.open   = afs_open,
@@ -48,7 +48,7 @@ const struct inode_operations afs_file_inode_operations = {
 
 const struct address_space_operations afs_fs_aops = {
.readpage   = afs_readpage,
-   .readpages  = afs_readpages,
+   .readahead  = afs_readahead,
.set_page_dirty = afs_set_page_dirty,
.launder_page   = afs_launder_page,
.releasepage= afs_releasepage,
@@ -185,61 +185,17 @@ int afs_release(struct inode *inode, struct file *file)
 }
 
 /*
- * Handle completion of a read operation.
+ * Allocate a new read record.
  */
-static void afs_file_read_done(struct afs_read *req)
+struct afs_read *afs_alloc_read(gfp_t gfp)
 {
-   struct afs_vnode *vnode = req->vnode;
-   struct page *page;
-   pgoff_t index = req->pos >> PAGE_SHIFT;
-   pgoff_t last = index + req->nr_pages - 1;
-
-   XA_STATE(xas, &vnode->vfs_inode.i_mapping->i_pages, index);
-
-   if (iov_iter_count(req->iter) > 0) {
-   /* The read was short - clear the excess buffer. */
-   _debug("afterclear %zx %zx %llx/%llx",
-  req->iter->iov_offset,
-  iov_iter_count(req->iter),
-  req->actual_len, req->len);
-   iov_iter_zero(iov_iter_count(req->iter), req->iter);
-   }
-
-   rcu_read_lock();
-   xas_for_each(&xas, page, last) {
-   page_endio(page, false, 0);
-   put_page(page);
-   }
-   rcu_read_unlock();
-
-   task_io_account_read(req->len);
-   req->cleanup = NULL;
-}
-
-/*
- * Dispose of our locks and refs on the pages if the read failed.
- */
-static void afs_file_read_cleanup(struct afs_read *req)
-{
-   struct page *page;
-   pgoff_t index = req->pos >> PAGE_SHIFT;
-   pgoff_t last = index + req->nr_pages - 1;
-
-   if (req->iter) {
-   XA_STATE(xas, >

[PATCH v5 26/28] afs: Use the fs operation ops to handle FetchData completion

2021-03-23 Thread David Howells
Use the 'success' and 'aborted' afs_operation_ops methods and add a
'failed' method to handle the completion of an AFS.FetchData,
AFS.FetchData64 or YFS.FetchData64 RPC operation rather than directly
calling the done func pointed to by the afs_read struct from the call
delivery handler.

This means the done function will be called back on error also, not just on
successful completion.

This allows motion towards asynchronous data reception on data fetch calls
and allows any error to be handed off to the fscache read helper in the
same place as a successful completion.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588541471.3465195.8807019223378490810.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118157260.1232039.6549085372718234792.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161052647.2537118.12922380836599003659.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340417106.1303470.3502017303898569631.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539560673.286939.391310781674212229.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/file.c |   15 +++
 fs/afs/fs_operation.c |4 +++-
 fs/afs/fsclient.c |3 ---
 fs/afs/internal.h |1 +
 fs/afs/yfsclient.c|3 ---
 5 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index f6282ac0d222..231e9fd7882b 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -255,6 +255,19 @@ void afs_put_read(struct afs_read *req)
}
 }
 
+static void afs_fetch_data_notify(struct afs_operation *op)
+{
+   struct afs_read *req = op->fetch.req;
+   int error = op->error;
+
+   if (error == -ECONNABORTED)
+   error = afs_abort_to_error(op->ac.abort_code);
+   req->error = error;
+
+   if (req->done)
+   req->done(req);
+}
+
 static void afs_fetch_data_success(struct afs_operation *op)
 {
struct afs_vnode *vnode = op->file[0].vnode;
@@ -263,6 +276,7 @@ static void afs_fetch_data_success(struct afs_operation *op)
afs_vnode_commit_status(op, >file[0]);
afs_stat_v(vnode, n_fetches);
	atomic_long_add(op->fetch.req->actual_len, &op->net->n_fetch_bytes);
+   afs_fetch_data_notify(op);
 }
 
 static void afs_fetch_data_put(struct afs_operation *op)
@@ -276,6 +290,7 @@ static const struct afs_operation_ops afs_fetch_data_operation = {
.issue_yfs_rpc  = yfs_fs_fetch_data,
.success= afs_fetch_data_success,
.aborted= afs_check_for_remote_deletion,
+   .failed = afs_fetch_data_notify,
.put= afs_fetch_data_put,
 };
 
diff --git a/fs/afs/fs_operation.c b/fs/afs/fs_operation.c
index 97cab12b0a6c..938e28a00101 100644
--- a/fs/afs/fs_operation.c
+++ b/fs/afs/fs_operation.c
@@ -195,8 +195,10 @@ void afs_wait_for_operation(struct afs_operation *op)
case -ECONNABORTED:
if (op->ops->aborted)
op->ops->aborted(op);
-   break;
+   fallthrough;
default:
+   if (op->ops->failed)
+   op->ops->failed(op);
break;
}
 
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 31e6b3635541..5e34f4dbd385 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -392,9 +392,6 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
break;
}
 
-   if (req->done)
-   req->done(req);
-
_leave(" = 0 [done]");
return 0;
 }
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 889f504d7308..62c1b38fa98b 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -742,6 +742,7 @@ struct afs_operation_ops {
void (*issue_yfs_rpc)(struct afs_operation *op);
void (*success)(struct afs_operation *op);
void (*aborted)(struct afs_operation *op);
+   void (*failed)(struct afs_operation *op);
void (*edit_dir)(struct afs_operation *op);
void (*put)(struct afs_operation *op);
 };
diff --git a/fs/afs/yfsclient.c b/fs/afs/yfsclient.c
index 363d6dd276c0..2b35cba8ad62 100644
--- a/fs/afs/yfsclient.c
+++ b/fs/afs/yfsclient.c
@@ -449,9 +449,6 @@ static int yfs_deliver_fs_fetch_data64(struct afs_call *call)
break;
}
 
-   if (req->done)
-   req->done(req);
-
_leave(" = 0 [done]");
return 0;
 }




[PATCH v5 25/28] afs: Prepare for use of THPs

2021-03-23 Thread David Howells
As a prelude to supporting transparent huge pages, use thp_size() and
similar rather than PAGE_SIZE/SHIFT.

Further, try and frame everything in terms of file positions and lengths
rather than page indices and numbers of pages.
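
The recurring pattern is to derive the file range from the page itself
rather than assuming a fixed page size (illustrative, mirroring the diff
below):

	loff_t pos = (loff_t)page->index << PAGE_SHIFT;	/* start of page in file */
	size_t len = thp_size(page);			/* whole compound page */
	unsigned int npages = thp_nr_pages(page);	/* 1 for an order-0 page */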

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588540227.3465195.4752143929716269062.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118155821.1232039.540445038028845740.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161051439.2537118.15577827510426326534.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340415869.1303470.6040191748634322355.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539559365.286939.18344613540296085269.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/dir.c  |2 
 fs/afs/file.c |8 -
 fs/afs/internal.h |2 
 fs/afs/write.c|  436 +
 4 files changed, 245 insertions(+), 203 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 021489497801..0dc6e1409405 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -2084,6 +2084,6 @@ static void afs_dir_invalidatepage(struct page *page, unsigned int offset,
afs_stat_v(dvnode, n_inval);
 
/* we clean up only if the entire page is being invalidated */
-   if (offset == 0 && length == PAGE_SIZE)
+   if (offset == 0 && length == thp_size(page))
detach_page_private(page);
 }
diff --git a/fs/afs/file.c b/fs/afs/file.c
index acbc21a8c80e..f6282ac0d222 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -330,8 +330,8 @@ static int afs_page_filler(struct key *key, struct page *page)
req->vnode  = vnode;
req->key= key_get(key);
req->pos= (loff_t)page->index << PAGE_SHIFT;
-   req->len= PAGE_SIZE;
-   req->nr_pages   = 1;
+   req->len= thp_size(page);
+   req->nr_pages   = thp_nr_pages(page);
req->done   = afs_file_read_done;
req->cleanup= afs_file_read_cleanup;
 
@@ -575,8 +575,8 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset,
trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page);
clear_page_dirty_for_io(page);
 full_invalidate:
-   detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("inval"), page);
+   detach_page_private(page);
 }
 
 /*
@@ -621,8 +621,8 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 #endif
 
if (PagePrivate(page)) {
-   detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
+   detach_page_private(page);
}
 
/* indicate that the page can be released */
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 4076c6ba43eb..889f504d7308 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -815,8 +815,6 @@ struct afs_operation {
loff_t  pos;
loff_t  size;
loff_t  i_size;
-   pgoff_t first;  /* first page in mapping to deal with */
-   pgoff_t last;   /* last page in mapping to deal with */
	bool laundering;	/* Laundering page, PG_writeback not set */
} store;
struct {
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 89c804bfe253..e672833c99bc 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -94,15 +94,15 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
struct page *page;
unsigned long priv;
-   unsigned f, from = pos & (PAGE_SIZE - 1);
-   unsigned t, to = from + len;
-   pgoff_t index = pos >> PAGE_SHIFT;
+   unsigned f, from;
+   unsigned t, to;
+   pgoff_t index;
int ret;
 
-   _enter("{%llx:%llu},{%lx},%u,%u",
-  vnode->fid.vid, vnode->fid.vnode, index, from, to);
+   _enter("{%llx:%llu},%llx,%x",
+  vnode->fid.vid, vnode->fid.vnode, pos, len);
 
-   page = grab_cache_page_write_begin(mapping, index, flags);
+   page = grab_cache_page_write_begin(mapping, pos / PAGE_SIZE, flags);
if (!page)
return -ENOMEM;
 
@@ -121,19 +121,20 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
wait_on_page_fscache(page);
 #endif
 
+   index = page->index;
+   from = pos - index * PAGE_SIZE;
+   to = from + len;
+
 try_again:
/* See if

[PATCH v5 24/28] afs: Extract writeback extension into its own function

2021-03-23 Thread David Howells
Extract writeback extension into its own function to break up the writeback
function a bit.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588538471.3465195.782513375683399583.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118154610.1232039.1765365632920504822.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161050546.2537118.2202554806419189453.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340414102.1303470.9078891484034668985.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539558417.286939.2879469588895925399.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/write.c |  109 ++--
 1 file changed, 67 insertions(+), 42 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index e1791de90478..89c804bfe253 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -490,47 +490,25 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter,
 }
 
 /*
- * Synchronously write back the locked page and any subsequent non-locked dirty
- * pages.
+ * Extend the region to be written back to include subsequent contiguously
+ * dirty pages if possible, but don't sleep while doing so.
+ *
+ * If this page holds new content, then we can include filler zeros in the
+ * writeback.
  */
-static int afs_write_back_from_locked_page(struct address_space *mapping,
-  struct writeback_control *wbc,
-  struct page *primary_page,
-  pgoff_t final_page)
+static void afs_extend_writeback(struct address_space *mapping,
+struct afs_vnode *vnode,
+long *_count,
+pgoff_t start,
+pgoff_t final_page,
+unsigned *_offset,
+unsigned *_to,
+bool new_content)
 {
-   struct afs_vnode *vnode = AFS_FS_I(mapping->host);
-   struct iov_iter iter;
struct page *pages[8], *page;
-   unsigned long count, priv;
-   unsigned n, offset, to, f, t;
-   pgoff_t start, first, last;
-   loff_t i_size, pos, end;
-   int loop, ret;
-
-   _enter(",%lx", primary_page->index);
-
-   count = 1;
-   if (test_set_page_writeback(primary_page))
-   BUG();
-
-   /* Find all consecutive lockable dirty pages that have contiguous
-* written regions, stopping when we find a page that is not
-* immediately lockable, is not dirty or is missing, or we reach the
-* end of the range.
-*/
-   start = primary_page->index;
-   priv = page_private(primary_page);
-   offset = afs_page_dirty_from(primary_page, priv);
-   to = afs_page_dirty_to(primary_page, priv);
-   trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page);
-
-   WARN_ON(offset == to);
-   if (offset == to)
-   trace_afs_page_dirty(vnode, tracepoint_string("WARN"), 
primary_page);
-
-   if (start >= final_page ||
-   (to < PAGE_SIZE && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)))
-   goto no_more;
+   unsigned long count = *_count, priv;
+   unsigned offset = *_offset, to = *_to, n, f, t;
+   int loop;
 
start++;
do {
@@ -551,8 +529,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
 
for (loop = 0; loop < n; loop++) {
page = pages[loop];
-   if (to != PAGE_SIZE &&
-   !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags))
+   if (to != PAGE_SIZE && !new_content)
break;
if (page->index > final_page)
break;
@@ -566,8 +543,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
priv = page_private(page);
f = afs_page_dirty_from(page, priv);
t = afs_page_dirty_to(page, priv);
-   if (f != 0 &&
-   !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) {
+   if (f != 0 && !new_content) {
unlock_page(page);
break;
}
@@ -593,6 +569,55 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
} while (start <= final_page && count < 65536);
 
 no_more:
+   *_count = count;
+   *_offset = offset;
+   *_to = to;

[PATCH v5 23/28] afs: Wait on PG_fscache before modifying/releasing a page

2021-03-23 Thread David Howells
PG_fscache is going to be used to indicate that a page is being written to
the cache, and that the page should not be modified or released until it's
finished.

Make afs_invalidatepage() and afs_releasepage() wait for it.
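
wait_on_page_fscache() is just a thin wrapper around the PG_private_2
wait - roughly this shape (sketch of the wrapper as provided by the netfs
header in this series):

	static inline void wait_on_page_fscache(struct page *page)
	{
		if (PagePrivate2(page))		/* PG_fscache aliases PG_private_2 */
			wait_on_page_bit(compound_head(page), PG_private_2);
	}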

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/158861253957.340223.7465334678444521655.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465832417.1377938.3571599385208729791.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588536286.3465195.13231895135369807920.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118153708.1232039.3535103645871176749.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161049369.2537118.11591934943429117060.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340412903.1303470.6424701655031380012.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539556890.286939.5873470593519458598.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/file.c  |9 +
 fs/afs/write.c |   10 ++
 2 files changed, 19 insertions(+)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index f1bab69e99d4..acbc21a8c80e 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -594,6 +594,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
if (PagePrivate(page))
afs_invalidate_dirty(page, offset, length);
 
+   wait_on_page_fscache(page);
_leave("");
 }
 
@@ -611,6 +612,14 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 
/* deny if page is being written to the cache and the caller hasn't
 * elected to wait */
+#ifdef CONFIG_AFS_FSCACHE
+   if (PageFsCache(page)) {
+   if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS))
+   return false;
+   wait_on_page_fscache(page);
+   }
+#endif
+
if (PagePrivate(page)) {
detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index dd4dc1c868b5..e1791de90478 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -117,6 +117,10 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
SetPageUptodate(page);
}
 
+#ifdef CONFIG_AFS_FSCACHE
+   wait_on_page_fscache(page);
+#endif
+
 try_again:
/* See if this page is already partially written in a way that we can
 * merge the new write with.
@@ -857,6 +861,11 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
/* Wait for the page to be written to the cache before we allow it to
 * be modified.  We then assume the entire page will need writing back.
 */
+#ifdef CONFIG_AFS_FSCACHE
+   if (PageFsCache(vmf->page) &&
+   wait_on_page_bit_killable(vmf->page, PG_fscache) < 0)
+   return VM_FAULT_RETRY;
+#endif
 
if (PageWriteback(vmf->page) &&
wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)
@@ -948,5 +957,6 @@ int afs_launder_page(struct page *page)
 
detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page);
+   wait_on_page_fscache(page);
return ret;
 }




[PATCH v5 22/28] afs: Use ITER_XARRAY for writing

2021-03-23 Thread David Howells
Use a single ITER_XARRAY iterator to describe the portion of a file to be
transmitted to the server rather than generating a series of small
ITER_BVEC iterators on the fly.  This will make it easier to implement AIO
in afs.

In theory we could maybe use one giant ITER_BVEC, but that means
potentially allocating a huge array of bio_vec structs (max 256 per page)
when in fact the pagecache already has a structure listing all the relevant
pages (radix_tree/xarray) that can be walked over.
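
The construction is a one-liner per store operation (illustrative; this is
the same call the netfs write-to-cache path uses):

	struct iov_iter iter;

	iov_iter_xarray(&iter, WRITE, &vnode->vfs_inode.i_mapping->i_pages,
			pos, len);
	/* Hand &iter to the RPC code; no bio_vec array is ever built. */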

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/153685395197.14766.16289516750731233933.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/158861251312.340223.17924900795425422532.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465828607.1377938.6903132788463419368.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588535018.3465195.14509994354240338307.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118152415.1232039.6452879415814850025.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161048194.2537118.13763612220937637316.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340411602.1303470.4661108879482218408.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539555629.286939.5241869986617154517.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/fsclient.c  |   50 +
 fs/afs/internal.h  |   15 +++---
 fs/afs/rxrpc.c |  103 ++--
 fs/afs/write.c |  100 ---
 fs/afs/yfsclient.c |   25 +++
 include/trace/events/afs.h |   51 --
 6 files changed, 126 insertions(+), 218 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 897b37301851..31e6b3635541 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -1055,8 +1055,7 @@ static const struct afs_call_type afs_RXFSStoreData64 = {
 /*
  * store a set of pages to a very large file
  */
-static void afs_fs_store_data64(struct afs_operation *op,
-   loff_t pos, loff_t size, loff_t i_size)
+static void afs_fs_store_data64(struct afs_operation *op)
 {
struct afs_vnode_param *vp = >file[0];
struct afs_call *call;
@@ -1071,7 +1070,7 @@ static void afs_fs_store_data64(struct afs_operation *op,
if (!call)
return afs_op_nomem(op);
 
-   call->send_pages = true;
+   call->write_iter = op->store.write_iter;
 
/* marshall the parameters */
bp = call->request;
@@ -1087,47 +1086,38 @@ static void afs_fs_store_data64(struct afs_operation *op,
*bp++ = 0; /* unix mode */
*bp++ = 0; /* segment size */
 
-   *bp++ = htonl(upper_32_bits(pos));
-   *bp++ = htonl(lower_32_bits(pos));
-   *bp++ = htonl(upper_32_bits(size));
-   *bp++ = htonl(lower_32_bits(size));
-   *bp++ = htonl(upper_32_bits(i_size));
-   *bp++ = htonl(lower_32_bits(i_size));
+   *bp++ = htonl(upper_32_bits(op->store.pos));
+   *bp++ = htonl(lower_32_bits(op->store.pos));
+   *bp++ = htonl(upper_32_bits(op->store.size));
+   *bp++ = htonl(lower_32_bits(op->store.size));
+   *bp++ = htonl(upper_32_bits(op->store.i_size));
+   *bp++ = htonl(lower_32_bits(op->store.i_size));
 
trace_afs_make_fs_call(call, >fid);
afs_make_op_call(op, call, GFP_NOFS);
 }
 
 /*
- * store a set of pages
+ * Write data to a file on the server.
  */
 void afs_fs_store_data(struct afs_operation *op)
 {
struct afs_vnode_param *vp = >file[0];
struct afs_call *call;
-   loff_t size, pos, i_size;
__be32 *bp;
 
_enter(",%x,{%llx:%llu},,",
   key_serial(op->key), vp->fid.vid, vp->fid.vnode);
 
-   size = (loff_t)op->store.last_to - (loff_t)op->store.first_offset;
-   if (op->store.first != op->store.last)
-   size += (loff_t)(op->store.last - op->store.first) << PAGE_SHIFT;
-   pos = (loff_t)op->store.first << PAGE_SHIFT;
-   pos += op->store.first_offset;
-
-   i_size = i_size_read(&vp->vnode->vfs_inode);
-   if (pos + size > i_size)
-   i_size = size + pos;
-
_debug("size %llx, at %llx, i_size %llx",
-  (unsigned long long) size, (unsigned long long) pos,
-  (unsigned long long) i_size);
+  (unsigned long long)op->store.size,
+  (unsigned long long)op->store.pos,
+  (unsigned long long)op->store.i_size);
 
-   if (upper_32_bits(pos) || upper_32_bits(i_size) || upper_32_bits(size) ||
-   upper_32_bits(pos + size))
-   return afs_fs

[PATCH v5 21/28] afs: Set up the iov_iter before calling afs_extract_data()

2021-03-23 Thread David Howells
afs_extract_data() sets up a temporary iov_iter and passes it to AF_RXRPC
each time it is called to describe the remaining buffer to be filled.

Instead:

 (1) Put an iterator in the afs_call struct.

 (2) Set the iterator for each marshalling stage to load data into the
 appropriate places.  A number of convenience functions are provided to
 this end (eg. afs_extract_to_buf()).

 This iterator is then passed to afs_extract_data().

 (3) Use the new ITER_XARRAY iterator when reading data to load directly
 into the inode's pages without needing to create a list of them.

This will allow O_DIRECT calls to be supported in future patches.
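
A typical delivery step then looks roughly like this (illustrative of the
pattern only; the helpers are the ones defined in fs/afs/internal.h):

	afs_extract_to_tmp(call);		/* next 4 bytes -> call->tmp */
	ret = afs_extract_data(call, true);	/* true: more data follows */
	if (ret < 0)
		return ret;

	size = ntohl(call->tmp);
	afs_extract_to_buf(call, size);		/* next size bytes -> call->buffer */
	ret = afs_extract_data(call, false);	/* false: that should be the lot */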

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/152898380012.11616.12094591785228251717.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/153685394431.14766.3178466345696987059.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/153999787395.866.11218209749223643998.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/154033911195.12041.3882700371848894587.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/158861250059.340223.1248231474865140653.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465827399.1377938.11181327349704960046.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588533776.3465195.3612752083351956948.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118151238.1232039.17015723405750601161.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161047240.2537118.14721975104810564022.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340410333.1303470.16260122230371140878.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539554187.286939.15305559004905459852.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/dir.c   |  222 +++-
 fs/afs/file.c  |  190 ++---
 fs/afs/fsclient.c  |   54 +++--
 fs/afs/internal.h  |   16 ++--
 fs/afs/write.c |   27 --
 fs/afs/yfsclient.c |   54 +++--
 6 files changed, 314 insertions(+), 249 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 30c769efee26..021489497801 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -103,6 +103,35 @@ struct afs_lookup_cookie {
struct afs_fid  fids[50];
 };
 
+/*
+ * Drop the refs that we're holding on the pages we were reading into.  We've
+ * got refs on the first nr_pages pages.
+ */
+static void afs_dir_read_cleanup(struct afs_read *req)
+{
+   struct address_space *mapping = req->vnode->vfs_inode.i_mapping;
+   struct page *page;
+   pgoff_t last = req->nr_pages - 1;
+
+   XA_STATE(xas, &mapping->i_pages, 0);
+
+   if (unlikely(!req->nr_pages))
+   return;
+
+   rcu_read_lock();
+   xas_for_each(&xas, page, last) {
+   if (xas_retry(&xas, page))
+   continue;
+   BUG_ON(xa_is_value(page));
+   BUG_ON(PageCompound(page));
+   ASSERTCMP(page->mapping, ==, mapping);
+
+   put_page(page);
+   }
+
+   rcu_read_unlock();
+}
+
 /*
  * check that a directory page is valid
  */
@@ -128,7 +157,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
qty /= sizeof(union afs_xdr_dir_block);
 
/* check them */
-   dbuf = kmap(page);
+   dbuf = kmap_atomic(page);
for (tmp = 0; tmp < qty; tmp++) {
if (dbuf->blocks[tmp].hdr.magic != AFS_DIR_MAGIC) {
printk("kAFS: %s(%lx): bad magic %d/%d is %04hx\n",
@@ -147,7 +176,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
		((u8 *)&dbuf->blocks[tmp])[AFS_DIR_BLOCK_SIZE - 1] = 0;
}
 
-   kunmap(page);
+   kunmap_atomic(dbuf);
 
 checked:
afs_stat_v(dvnode, n_read_dir);
@@ -158,35 +187,74 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page,
 }
 
 /*
- * Check the contents of a directory that we've just read.
+ * Dump the contents of a directory.
  */
-static bool afs_dir_check_pages(struct afs_vnode *dvnode, struct afs_read *req)
+static void afs_dir_dump(struct afs_vnode *dvnode, struct afs_read *req)
 {
struct afs_xdr_dir_page *dbuf;
-   unsigned int i, j, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block);
+   struct address_space *mapping = dvnode->vfs_inode.i_mapping;
+   struct page *page;
+   unsigned int i, qty = PAGE_SIZE / sizeof(union afs_xdr_dir_block);
+   pgoff_t last = req->nr_pages - 1;
 
-   for (i = 0; i < req->nr_pages; i++)
-   if (!afs_dir_check_page(dvnode, req->pages[i], req->actual_len))
-   goto b

[PATCH v5 20/28] afs: Log remote unmarshalling errors

2021-03-23 Thread David Howells
Log unmarshalling errors reported by the peer (ie. it can't parse what we
sent it).  Limit the maximum number of messages to 3.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/159465826250.1377938.16372395422217583913.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588532584.3465195.15618385466614028590.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118149739.1232039.208060911149801695.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161046033.2537118.7779717661044373273.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340409118.1303470.17812607349396199116.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539552964.286939.16503232687974398308.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/rxrpc.c |   34 ++
 1 file changed, 34 insertions(+)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 0ec38b758f29..ae68576f822f 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -500,6 +500,39 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
_leave(" = %d", ret);
 }
 
+/*
+ * Log remote abort codes that indicate that we have a protocol disagreement
+ * with the server.
+ */
+static void afs_log_error(struct afs_call *call, s32 remote_abort)
+{
+   static int max = 0;
+   const char *msg;
+   int m;
+
+   switch (remote_abort) {
+   case RX_EOF: msg = "unexpected EOF";break;
+   case RXGEN_CC_MARSHAL:   msg = "client marshalling";break;
+   case RXGEN_CC_UNMARSHAL: msg = "client unmarshalling";  break;
+   case RXGEN_SS_MARSHAL:   msg = "server marshalling";break;
+   case RXGEN_SS_UNMARSHAL: msg = "server unmarshalling";  break;
+   case RXGEN_DECODE:   msg = "opcode decode"; break;
+   case RXGEN_SS_XDRFREE:   msg = "server XDR cleanup";break;
+   case RXGEN_CC_XDRFREE:   msg = "client XDR cleanup";break;
+   case -32:msg = "insufficient data"; break;
+   default:
+   return;
+   }
+
+   m = max;
+   if (m < 3) {
+   max = m + 1;
+   pr_notice("kAFS: Peer reported %s failure on %s [%pISp]\n",
+ msg, call->type->name,
+ &call->alist->addrs[call->addr_ix].transport);
+   }
+}
+
 /*
  * deliver messages to a call
  */
@@ -563,6 +596,7 @@ static void afs_deliver_to_call(struct afs_call *call)
goto out;
case -ECONNABORTED:
ASSERTCMP(state, ==, AFS_CALL_COMPLETE);
+   afs_log_error(call, call->abort_code);
goto done;
case -ENOTSUPP:
abort_code = RXGEN_OPCODE;




[PATCH v5 19/28] afs: Don't truncate iter during data fetch

2021-03-23 Thread David Howells
Don't truncate the iterator to correspond to the actual data size when
fetching the data from the server - rather, pass the length we want to read
to rxrpc.

This will allow the clear-after-read code in future to simply clear the
remaining iterator capacity rather than having to reinitialise the
iterator.
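
That is, instead of rebuilding the iterator, a short read can be finished
off with something like this (illustrative; the same pattern the
read-completion code uses to pad out a hole):

	if (iov_iter_count(call->iter) > 0)
		iov_iter_zero(iov_iter_count(call->iter), call->iter);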

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/158861249201.340223.13035445866976590375.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465825061.1377938.14403904452300909320.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588531418.3465195.10712005940763063144.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118148567.1232039.13380313332292947956.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161044610.2537118.17908520793806837792.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340407907.1303470.6501394859511712746.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539551721.286939.14655713136572200716.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/fsclient.c  |6 --
 fs/afs/internal.h  |6 ++
 fs/afs/rxrpc.c |   13 +
 fs/afs/yfsclient.c |6 --
 include/net/af_rxrpc.h |2 +-
 net/rxrpc/recvmsg.c|9 +
 6 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index 1d95ed9dd86e..4a57c6c6f12b 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -305,8 +305,9 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
unsigned int size;
int ret;
 
-   _enter("{%u,%zu/%llu}",
-  call->unmarshall, iov_iter_count(call->iter), req->actual_len);
+   _enter("{%u,%zu,%zu/%llu}",
+  call->unmarshall, call->iov_len, iov_iter_count(call->iter),
+  req->actual_len);
 
switch (call->unmarshall) {
case 0:
@@ -343,6 +344,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call)
size = PAGE_SIZE - req->offset;
else
size = req->remain;
+   call->iov_len = size;
call->bvec[0].bv_len = size;
call->bvec[0].bv_offset = req->offset;
call->bvec[0].bv_page = req->pages[req->index];
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 921e7d3b2cfa..4725cfc4aaef 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -104,6 +104,7 @@ struct afs_call {
 	struct afs_server	*server;	/* The fileserver record if fs op (pins ref) */
 	struct afs_vlserver	*vlserver;	/* The vlserver record if vl op */
 	void			*request;	/* request data (first part) */
+	size_t			iov_len;	/* Size of *iter to be used */
 	struct iov_iter		def_iter;	/* Default buffer/data iterator */
 	struct iov_iter		*iter;		/* Iterator currently in use */
 	union {				/* Convenience for ->def_iter */
@@ -1271,6 +1272,7 @@ static inline void afs_make_op_call(struct afs_operation *op, struct afs_call *c
 
 static inline void afs_extract_begin(struct afs_call *call, void *buf, size_t size)
 {
+   call->iov_len = size;
call->kvec[0].iov_base = buf;
call->kvec[0].iov_len = size;
	iov_iter_kvec(&call->def_iter, READ, call->kvec, 1, size);
@@ -1278,21 +1280,25 @@ static inline void afs_extract_begin(struct afs_call *call, void *buf, size_t si
 
 static inline void afs_extract_to_tmp(struct afs_call *call)
 {
+   call->iov_len = sizeof(call->tmp);
	afs_extract_begin(call, &call->tmp, sizeof(call->tmp));
 }
 
 static inline void afs_extract_to_tmp64(struct afs_call *call)
 {
+   call->iov_len = sizeof(call->tmp64);
	afs_extract_begin(call, &call->tmp64, sizeof(call->tmp64));
 }
 
 static inline void afs_extract_discard(struct afs_call *call, size_t size)
 {
+   call->iov_len = size;
	iov_iter_discard(&call->def_iter, READ, size);
 }
 
 static inline void afs_extract_to_buf(struct afs_call *call, size_t size)
 {
+   call->iov_len = size;
afs_extract_begin(call, call->buffer, size);
 }
 
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 8be709cb8542..0ec38b758f29 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -363,6 +363,7 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
struct rxrpc_call *rxcall;
struct msghdr msg;
struct kvec iov[1];
+   size_t len;
s64 tx_total_len;
int ret;
 
@@ -466,9 +467,10 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
rxrpc_ke

[PATCH v5 17/28] afs: Print the operation debug_id when logging an unexpected data version

2021-03-23 Thread David Howells
Print the afs_operation debug_id when logging an unexpected change in the
data version.  This allows the logged message to be matched against
tracelines.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588528377.3465195.2206051235095182302.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118146111.1232039.11398082422487058312.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161042180.2537118.2471333561661033316.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340405772.1303470.3877167548944248214.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539549628.286939.15234870409714613954.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/inode.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 48519ee00eef..af6556bb3d6a 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -215,11 +215,12 @@ static void afs_apply_status(struct afs_operation *op,
 
if (vp->dv_before + vp->dv_delta != status->data_version) {
if (test_bit(AFS_VNODE_CB_PROMISED, >flags))
-   pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx 
%s\n",
+   pr_warn("kAFS: vnode modified {%llx:%llu} %llx->%llx %s 
(op=%x)\n",
vnode->fid.vid, vnode->fid.vnode,
(unsigned long long)vp->dv_before + 
vp->dv_delta,
(unsigned long long)status->data_version,
-   op->type ? op->type->name : "???");
+   op->type ? op->type->name : "???",
+   op->debug_id);
 
vnode->invalid_before = status->data_version;
if (vnode->status.type == AFS_FTYPE_DIR) {




[PATCH v5 18/28] afs: Move key to afs_read struct

2021-03-23 Thread David Howells
Stash the key used to authenticate read operations in the afs_read struct.
This will be necessary to reissue the operation against the server if a
read from the cache fails in upcoming cache changes.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/158861248336.340223.1851189950710196001.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465823899.1377938.11925978022348532049.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588529557.3465195.7303323479305254243.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118147693.1232039.13780672951838643842.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161043340.2537118.511899217704140722.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340406678.1303470.12676824086429446370.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539550819.286939.1268332875889175195.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/dir.c  |3 ++-
 fs/afs/file.c |   16 +---
 fs/afs/internal.h |3 ++-
 fs/afs/write.c|   12 ++--
 4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 714fcca9af99..30c769efee26 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -242,6 +242,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode 
*dvnode, struct key *key)
return ERR_PTR(-ENOMEM);
 
refcount_set(&req->usage, 1);
+   req->key = key_get(key);
req->nr_pages = nr_pages;
req->actual_len = i_size; /* May change */
req->len = nr_pages * PAGE_SIZE; /* We can ask for more than there is */
@@ -306,7 +307,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode 
*dvnode, struct key *key)
 
if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
trace_afs_reload_dir(dvnode);
-   ret = afs_fetch_data(dvnode, key, req);
+   ret = afs_fetch_data(dvnode, req);
if (ret < 0)
goto error_unlock;
 
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 21868bfc3a44..d23192b3b933 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -199,6 +199,7 @@ void afs_put_read(struct afs_read *req)
if (req->pages != req->array)
kfree(req->pages);
}
+   key_put(req->key);
kfree(req);
}
 }
@@ -229,7 +230,7 @@ static const struct afs_operation_ops 
afs_fetch_data_operation = {
 /*
  * Fetch file data from the volume.
  */
-int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read 
*req)
+int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req)
 {
struct afs_operation *op;
 
@@ -238,9 +239,9 @@ int afs_fetch_data(struct afs_vnode *vnode, struct key 
*key, struct afs_read *re
   vnode->fid.vid,
   vnode->fid.vnode,
   vnode->fid.unique,
-  key_serial(key));
+  key_serial(req->key));
 
-   op = afs_alloc_operation(key, vnode->volume);
+   op = afs_alloc_operation(req->key, vnode->volume);
if (IS_ERR(op))
return PTR_ERR(op);
 
@@ -279,6 +280,7 @@ int afs_page_filler(void *data, struct page *page)
 * unmarshalling code will clear the unfilled space.
 */
refcount_set(&req->usage, 1);
+   req->key = key_get(key);
req->pos = (loff_t)page->index << PAGE_SHIFT;
req->len = PAGE_SIZE;
req->nr_pages = 1;
@@ -288,7 +290,7 @@ int afs_page_filler(void *data, struct page *page)
 
/* read the contents of the file from the server into the
 * page */
-   ret = afs_fetch_data(vnode, key, req);
+   ret = afs_fetch_data(vnode, req);
afs_put_read(req);
 
if (ret < 0) {
@@ -373,7 +375,6 @@ static int afs_readpages_one(struct file *file, struct 
address_space *mapping,
struct afs_read *req;
struct list_head *p;
struct page *first, *page;
-   struct key *key = afs_file_key(file);
pgoff_t index;
int ret, n, i;
 
@@ -397,6 +398,7 @@ static int afs_readpages_one(struct file *file, struct 
address_space *mapping,
 
refcount_set(&req->usage, 1);
req->vnode = vnode;
+   req->key = key_get(afs_file_key(file));
req->page_done = afs_readpages_page_done;
req->pos = first->index;
req->pos <<= PAGE_SHIFT;
@@ -426,11 +428,11 @@ static int afs_readpages_one(struct file *file, struct 
address_space *mapping,
} while (req->nr_pages < n);
 
if (req->nr_pages == 0) {
-   kfree(req);
+   afs_put_read(req);
return 0;
}
 
-   ret = afs_fetch_data(vnode, ke

[PATCH v5 16/28] afs: Pass page into dirty region helpers to provide THP size

2021-03-23 Thread David Howells
Pass a pointer to the page being accessed into the dirty region helpers so
that the size of the page can be determined in case it's a transparent huge
page.

This also required the page to be passed into the afs_page_dirty
tracepoint - so there's no need to specifically pass in the index or
private data as these can be retrieved directly from the page struct.
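
As a worked example of the resolution calculation, assuming a 32-bit build
where __AFS_PAGE_PRIV_SHIFT is 16: a plain 4KiB page gives a shift of
0 + 12 - 15 = -3, clamped to 0, so the dirty bounds are byte-accurate,
whereas a 2MiB THP (order 9) gives 9 + 12 - 15 = 6, so the bounds are
tracked at 64-byte granularity instead.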

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588527183.3465195.16107942526481976308.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118144921.1232039.11377711180492625929.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161040747.2537118.11435394902674511430.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340404553.1303470.11414163641767769882.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539548385.286939.8864598314493255313.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/file.c  |   20 +++
 fs/afs/internal.h  |   16 ++--
 fs/afs/write.c |   60 ++--
 include/trace/events/afs.h |   23 ++---
 4 files changed, 55 insertions(+), 64 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 6d43713fde01..21868bfc3a44 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -515,8 +515,8 @@ static void afs_invalidate_dirty(struct page *page, 
unsigned int offset,
return;
 
/* We may need to shorten the dirty region */
-   f = afs_page_dirty_from(priv);
-   t = afs_page_dirty_to(priv);
+   f = afs_page_dirty_from(page, priv);
+   t = afs_page_dirty_to(page, priv);
 
if (t <= offset || f >= end)
return; /* Doesn't overlap */
@@ -534,17 +534,17 @@ static void afs_invalidate_dirty(struct page *page, 
unsigned int offset,
if (f == t)
goto undirty;
 
-   priv = afs_page_dirty(f, t);
+   priv = afs_page_dirty(page, f, t);
set_page_private(page, priv);
-   trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page->index, 
priv);
+   trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page);
return;
 
 undirty:
-   trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page->index, 
priv);
+   trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page);
clear_page_dirty_for_io(page);
 full_invalidate:
-   priv = (unsigned long)detach_page_private(page);
-   trace_afs_page_dirty(vnode, tracepoint_string("inval"), page->index, 
priv);
+   detach_page_private(page);
+   trace_afs_page_dirty(vnode, tracepoint_string("inval"), page);
 }
 
 /*
@@ -572,7 +572,6 @@ static void afs_invalidatepage(struct page *page, unsigned 
int offset,
 static int afs_releasepage(struct page *page, gfp_t gfp_flags)
 {
struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-   unsigned long priv;
 
_enter("{{%llx:%llu}[%lu],%lx},%x",
   vnode->fid.vid, vnode->fid.vnode, page->index, page->flags,
@@ -581,9 +580,8 @@ static int afs_releasepage(struct page *page, gfp_t 
gfp_flags)
/* deny if page is being written to the cache and the caller hasn't
 * elected to wait */
if (PagePrivate(page)) {
-   priv = (unsigned long)detach_page_private(page);
-   trace_afs_page_dirty(vnode, tracepoint_string("rel"),
-page->index, priv);
+   detach_page_private(page);
+   trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
}
 
/* indicate that the page can be released */
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index b626e38e9ab5..180eae8134da 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -875,31 +875,31 @@ struct afs_vnode_cache_aux {
#define __AFS_PAGE_PRIV_MMAPPED	0x8000UL
 #endif
 
-static inline unsigned int afs_page_dirty_resolution(void)
+static inline unsigned int afs_page_dirty_resolution(struct page *page)
 {
-   int shift = PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1);
+   int shift = thp_order(page) + PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1);
return (shift > 0) ? shift : 0;
 }
 
-static inline size_t afs_page_dirty_from(unsigned long priv)
+static inline size_t afs_page_dirty_from(struct page *page, unsigned long priv)
 {
unsigned long x = priv & __AFS_PAGE_PRIV_MASK;
 
/* The lower bound is inclusive */
-   return x << afs_page_dirty_resolution();
+   return x << afs_page_dirty_resolution(page);
 }
 
-static inline size_t afs_page_dirty_to(unsigned long priv)
+static inline size_t afs_page_dirty_to(struct page *page, unsigned long priv)
 {

[PATCH v5 15/28] afs: Disable use of the fscache I/O routines

2021-03-23 Thread David Howells
Disable use of the fscache I/O routines by the AFS filesystem.  It's about
to transition to passing iov_iters down and fscache is about to have its
I/O path converted to use iov_iter, so all of that needs to change.

Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux-cach...@redhat.com
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/158861209824.340223.1864211542341758994.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465768717.1376105.2229314852486665807.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588457929.3465195.1730097418904945578.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118143744.1232039.2727898205333669064.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161039077.2537118.7986870854927176905.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340403323.1303470.8159439948319423431.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539547167.286939.3536238932531122332.st...@warthog.procyon.org.uk/
 # v4
---

 fs/afs/file.c  |  199 ++--
 fs/afs/inode.c |2 -
 fs/afs/write.c |   10 ---
 3 files changed, 36 insertions(+), 175 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 85f5adf21aa0..6d43713fde01 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -203,24 +203,6 @@ void afs_put_read(struct afs_read *req)
}
 }
 
-#ifdef CONFIG_AFS_FSCACHE
-/*
- * deal with notification that a page was read from the cache
- */
-static void afs_file_readpage_read_complete(struct page *page,
-   void *data,
-   int error)
-{
-   _enter("%p,%p,%d", page, data, error);
-
-   /* if the read completes with an error, we just unlock the page and let
-* the VM reissue the readpage */
-   if (!error)
-   SetPageUptodate(page);
-   unlock_page(page);
-}
-#endif
-
 static void afs_fetch_data_success(struct afs_operation *op)
 {
struct afs_vnode *vnode = op->file[0].vnode;
@@ -288,89 +270,46 @@ int afs_page_filler(void *data, struct page *page)
if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
goto error;
 
-   /* is it cached? */
-#ifdef CONFIG_AFS_FSCACHE
-   ret = fscache_read_or_alloc_page(vnode->cache,
-page,
-afs_file_readpage_read_complete,
-NULL,
-GFP_KERNEL);
-#else
-   ret = -ENOBUFS;
-#endif
-   switch (ret) {
-   /* read BIO submitted (page in cache) */
-   case 0:
-   break;
-
-   /* page not yet cached */
-   case -ENODATA:
-   _debug("cache said ENODATA");
-   goto go_on;
-
-   /* page will not be cached */
-   case -ENOBUFS:
-   _debug("cache said ENOBUFS");
-
-   fallthrough;
-   default:
-   go_on:
-   req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
-   if (!req)
-   goto enomem;
-
-   /* We request a full page.  If the page is a partial one at the
-* end of the file, the server will return a short read and the
-* unmarshalling code will clear the unfilled space.
-*/
-   refcount_set(&req->usage, 1);
-   req->pos = (loff_t)page->index << PAGE_SHIFT;
-   req->len = PAGE_SIZE;
-   req->nr_pages = 1;
-   req->pages = req->array;
-   req->pages[0] = page;
-   get_page(page);
-
-   /* read the contents of the file from the server into the
-* page */
-   ret = afs_fetch_data(vnode, key, req);
-   afs_put_read(req);
-
-   if (ret < 0) {
-   if (ret == -ENOENT) {
-   _debug("got NOENT from server"
-  " - marking file deleted and stale");
-   set_bit(AFS_VNODE_DELETED, >flags);
-   ret = -ESTALE;
-   }
-
-#ifdef CONFIG_AFS_FSCACHE
-   fscache_uncache_page(vnode->cache, page);
-#endif
-   BUG_ON(PageFsCache(page));
-
-   if (ret == -EINTR ||
-   ret == -ENOMEM ||
-   ret == -ERESTARTSYS ||
-   ret == -EAGAIN)
-   goto error;
-   goto io_error;
-   }
+   req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
+   if (!req)
+   goto enomem;
 
-

[PATCH v5 14/28] fscache, cachefiles: Add alternate API to use kiocb for read/write to cache

2021-03-23 Thread David Howells
Add an alternate API by which the cache can be accessed through a kiocb,
doing async DIO, rather than using the current API that tells the cache
where all the pages are.

The new API is intended to be used in conjunction with the netfs helper
library.  A filesystem must pick one or the other and not mix them.

Filesystems wanting to use the new API must #define FSCACHE_USE_NEW_IO_API
before #including the header.  This prevents them from continuing to use
the old API at the same time as there are incompatibilities in how the
PG_fscache page bit is used.
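
As a sketch of how a filesystem might wire this up (the cookie accessor
here is illustrative, not part of this patch):

	#define FSCACHE_USE_NEW_IO_API
	#include <linux/fscache.h>

	static int example_begin_cache_operation(struct netfs_read_request *rreq)
	{
		return fscache_begin_read_operation(rreq,
						    example_get_cookie(rreq->inode));
	}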

Changes:
 - Use the vfs_iocb_iter_read/write() helpers[1]
 - Move initial definition of fscache_begin_read_operation() here.
 - Remove a commented-out line[2]
 - Combine ki->term_func calls in cachefiles_read_complete()[2].
 - Remove explicit NULL initialiser[2].
 - Remove extern on func decl[2].
 - Put in param names on func decl[2].
 - Remove redundant else[2].
 - Fill out the kdoc comment for fscache_begin_read_operation().
 - Rename fs/fscache/page2.c to io.c to match later patches.

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Christoph Hellwig 
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210216102614.ga27...@lst.de/ [1]
Link: https://lore.kernel.org/r/20210216084230.ga23...@lst.de/ [2]
Link: 
https://lore.kernel.org/r/161118142558.1232039.17993829899588971439.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161037850.2537118.8819808229350326503.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340402057.1303470.8038373593844486698.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539545919.286939.14573472672781434757.st...@warthog.procyon.org.uk/
 # v4
---

 fs/cachefiles/Makefile|1 
 fs/cachefiles/interface.c |5 -
 fs/cachefiles/internal.h  |9 +
 fs/cachefiles/io.c|  403 +
 fs/fscache/Kconfig|1 
 fs/fscache/Makefile   |1 
 fs/fscache/internal.h |4 
 fs/fscache/io.c   |  116 
 fs/fscache/page.c |2 
 fs/fscache/stats.c|1 
 include/linux/fscache-cache.h |4 
 include/linux/fscache.h   |   39 
 12 files changed, 583 insertions(+), 3 deletions(-)
 create mode 100644 fs/cachefiles/io.c
 create mode 100644 fs/fscache/io.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 891dedda5905..2227dc2d5498 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -7,6 +7,7 @@ cachefiles-y := \
bind.o \
daemon.o \
interface.o \
+   io.o \
key.o \
main.o \
namei.o \
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 5efa6a3702c0..da3948fdb615 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -319,8 +319,8 @@ static void cachefiles_drop_object(struct fscache_object 
*_object)
 /*
  * dispose of a reference to an object
  */
-static void cachefiles_put_object(struct fscache_object *_object,
- enum fscache_obj_ref_trace why)
+void cachefiles_put_object(struct fscache_object *_object,
+  enum fscache_obj_ref_trace why)
 {
struct cachefiles_object *object;
struct fscache_cache *cache;
@@ -568,4 +568,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
.uncache_page   = cachefiles_uncache_page,
.dissociate_pages   = cachefiles_dissociate_pages,
.check_consistency  = cachefiles_check_consistency,
+   .begin_read_operation   = cachefiles_begin_read_operation,
 };
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index cf9bd6401c2d..4ed83aa5253b 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -150,6 +150,9 @@ extern int cachefiles_has_space(struct cachefiles_cache 
*cache,
  */
 extern const struct fscache_cache_ops cachefiles_cache_ops;
 
+void cachefiles_put_object(struct fscache_object *_object,
+  enum fscache_obj_ref_trace why);
+
 /*
  * key.c
  */
@@ -217,6 +220,12 @@ extern int cachefiles_allocate_pages(struct 
fscache_retrieval *,
 extern int cachefiles_write_page(struct fscache_storage *, struct page *);
 extern void cachefiles_uncache_page(struct fscache_object *, struct page *);
 
+/*
+ * io.c
+ */
+extern int cachefiles_begin_read_operation(struct netfs_read_request *,
+  struct fscache_retrieval *);
+
 /*
  * security.c
  */
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
new file mode 100644
index ..620959d1e95b
--- /dev/null
+++ b/fs/cachefiles/io.c
@@ -0,0 +1,403 @@
+// SPDX-License-Identifier: GPL-2.0-or-la

[PATCH v5 13/28] netfs: Define an interface to talk to a cache

2021-03-23 Thread David Howells
Add an interface to the netfs helper library for reading data from the
cache instead of downloading it from the server and support for writing
data just downloaded or cleared to the cache.

The API passes an iov_iter to the cache read/write routines to indicate the
data/buffer to be used.  This is done using the ITER_XARRAY type to provide
direct access to the netfs inode's pagecache.

When the netfs's ->begin_cache_operation() method is called, this must fill
in the cache_resources in the netfs_read_request struct, including the
netfs_cache_ops used by the helper lib to talk to the cache.  The helper
lib does not directly access the cache.
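
As an abridged sketch, a cache backend supplies a table along these lines
(the method names follow the cachefiles patch; treat the exact set of ops
as illustrative):

	static const struct netfs_cache_ops cachefiles_netfs_cache_ops = {
		.end_operation	= cachefiles_end_operation,
		.read		= cachefiles_read,
		.write		= cachefiles_write,
		.prepare_read	= cachefiles_prepare_read,
	};

and ->begin_cache_operation() points rreq->cache_resources.ops at it.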

Changes
v5:
- Use end_page_fscache() rather than unlock_page_fscache()[2].

v4:
- Added flag to netfs_subreq_terminated() to indicate that the caller may
  have been running async and stuff that might sleep needs punting to a
  workqueue (can't use in_softirq()[1]).
- Add missing inc of netfs_n_rh_read stat.
- Move initial definition of fscache_begin_read_operation() elsewhere.
- Need to call op->begin_cache_operation() from netfs_write_begin().

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210216084230.ga23...@lst.de/ [1]
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk/ [2]
Link: 
https://lore.kernel.org/r/161118141321.1232039.8296910406755622458.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161036700.2537118.11170748455436854978.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340399569.1303470.1138884774643385730.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539542874.286939.13337898213448136687.st...@warthog.procyon.org.uk/
 # v4
---

 fs/netfs/read_helper.c |  241 
 include/linux/netfs.h  |   49 ++
 2 files changed, 289 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 54febf354588..3498bde035eb 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -88,6 +88,8 @@ static void netfs_free_read_request(struct work_struct *work)
if (rreq->netfs_priv)
rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
trace_netfs_rreq(rreq, netfs_rreq_trace_free);
+   if (rreq->cache_resources.ops)
+   rreq->cache_resources.ops->end_operation(&rreq->cache_resources);
kfree(rreq);
netfs_stat_d(&netfs_n_rh_rreq);
 }
@@ -154,6 +156,34 @@ static void netfs_clear_unread(struct 
netfs_read_subrequest *subreq)
iov_iter_zero(iov_iter_count(&iter), &iter);
 }
 
+static void netfs_cache_read_terminated(void *priv, ssize_t 
transferred_or_error,
+   bool was_async)
+{
+   struct netfs_read_subrequest *subreq = priv;
+
+   netfs_subreq_terminated(subreq, transferred_or_error, was_async);
+}
+
+/*
+ * Issue a read against the cache.
+ * - Eats the caller's ref on subreq.
+ */
+static void netfs_read_from_cache(struct netfs_read_request *rreq,
+ struct netfs_read_subrequest *subreq,
+ bool seek_data)
+{
+   struct netfs_cache_resources *cres = &rreq->cache_resources;
+   struct iov_iter iter;
+
+   netfs_stat(&netfs_n_rh_read);
+   iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages,
+   subreq->start + subreq->transferred,
+   subreq->len   - subreq->transferred);
+
+   cres->ops->read(cres, subreq->start, &iter, seek_data,
+   netfs_cache_read_terminated, subreq);
+}
+
 /*
  * Fill a subrequest region with zeroes.
  */
@@ -198,6 +228,144 @@ static void netfs_rreq_completed(struct 
netfs_read_request *rreq, bool was_async
netfs_put_read_request(rreq, was_async);
 }
 
+/*
+ * Deal with the completion of writing the data to the cache.  We have to clear
+ * the PG_fscache bits on the pages involved and release the caller's ref.
+ *
+ * May be called in softirq mode and we inherit a ref from the caller.
+ */
+static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq,
+ bool was_async)
+{
+   struct netfs_read_subrequest *subreq;
+   struct page *page;
+   pgoff_t unlocked = 0;
+   bool have_unlocked = false;
+
+   rcu_read_lock();
+
+   list_for_each_entry(subreq, >subrequests, rreq_link) {
+   XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE);
+
+   xas_for_each(&xas, page, (subreq->start + subreq->len - 1) / PAGE_SIZE) {
+   /* We migh

[PATCH v5 12/28] netfs: Add write_begin helper

2021-03-23 Thread David Howells
Add a helper to do the pre-reading work for the netfs write_begin address
space op.
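
A filesystem's own ->write_begin() then becomes a thin wrapper; as a
sketch, with an illustrative ops table name:

	static int example_write_begin(struct file *file,
				       struct address_space *mapping,
				       loff_t pos, unsigned int len,
				       unsigned int flags,
				       struct page **_page, void **fsdata)
	{
		return netfs_write_begin(file, mapping, pos, len, flags,
					 _page, fsdata, &example_req_ops, NULL);
	}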

Changes
v5:
- Made the wait for PG_fscache in netfs_write_begin() killable[2].

v4:
- Added flag to netfs_subreq_terminated() to indicate that the caller may
  have been running async and stuff that might sleep needs punting to a
  workqueue (can't use in_softirq()[1]).

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210216084230.ga23...@lst.de/ [1]
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk/ [2]
Link: 
https://lore.kernel.org/r/160588543960.3465195.2792938973035886168.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118140165.1232039.16418853874312234477.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161035539.2537118.15674887534950908530.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340398368.1303470.11242918276563276090.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539541541.286939.1889738674057013729.st...@warthog.procyon.org.uk/
 # v4
---

 fs/netfs/internal.h  |2 +
 fs/netfs/read_helper.c   |  167 ++
 fs/netfs/stats.c |   11 ++-
 include/linux/netfs.h|8 ++
 include/trace/events/netfs.h |4 +
 5 files changed, 188 insertions(+), 4 deletions(-)

diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 98b6f4516da1..b7f2c4459f33 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -34,8 +34,10 @@ extern atomic_t netfs_n_rh_read_failed;
 extern atomic_t netfs_n_rh_zero;
 extern atomic_t netfs_n_rh_short_read;
 extern atomic_t netfs_n_rh_write;
+extern atomic_t netfs_n_rh_write_begin;
 extern atomic_t netfs_n_rh_write_done;
 extern atomic_t netfs_n_rh_write_failed;
+extern atomic_t netfs_n_rh_write_zskip;
 
 
 static inline void netfs_stat(atomic_t *stat)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 6d6ed30f417e..54febf354588 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -772,3 +772,170 @@ int netfs_readpage(struct file *file,
return ret;
 }
 EXPORT_SYMBOL(netfs_readpage);
+
+static void netfs_clear_thp(struct page *page)
+{
+   unsigned int i;
+
+   for (i = 0; i < thp_nr_pages(page); i++)
+   clear_highpage(page + i);
+}
+
+/**
+ * netfs_write_begin - Helper to prepare for writing
+ * @file: The file to read from
+ * @mapping: The mapping to read from
+ * @pos: File position at which the write will begin
+ * @len: The length of the write in this page
+ * @flags: AOP_* flags
+ * @_page: Where to put the resultant page
+ * @_fsdata: Place for the netfs to store a cookie
+ * @ops: The network filesystem's operations for the helper to use
+ * @netfs_priv: Private netfs data to be retained in the request
+ *
+ * Pre-read data for a write-begin request by drawing data from the cache if
+ * possible, or the netfs if not.  Space beyond the EOF is zero-filled.
+ * Multiple I/O requests from different sources will get munged together.  If
+ * necessary, the readahead window can be expanded in either direction to a
+ * more convenient alignment for RPC efficiency or to make storage in the cache
+ * feasible.
+ *
+ * The calling netfs must provide a table of operations, only one of which,
+ * issue_op, is mandatory.
+ *
+ * The check_write_begin() operation can be provided to check for and flush
+ * conflicting writes once the page is grabbed and locked.  It is passed a
+ * pointer to the fsdata cookie that gets returned to the VM to be passed to
+ * write_end.  It is permitted to sleep.  It should return 0 if the request
+ * should go ahead; unlock the page and return -EAGAIN to cause the page to be
+ * regot; or return an error.
+ *
+ * This is usable whether or not caching is enabled.
+ */
+int netfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned int len, unsigned int flags,
+ struct page **_page, void **_fsdata,
+ const struct netfs_read_request_ops *ops,
+ void *netfs_priv)
+{
+   struct netfs_read_request *rreq;
+   struct page *page, *xpage;
+   struct inode *inode = file_inode(file);
+   unsigned int debug_index = 0;
+   pgoff_t index = pos >> PAGE_SHIFT;
+   int pos_in_page = pos & ~PAGE_MASK;
+   loff_t size;
+   int ret;
+
+   struct readahead_control ractl = {
+   .file   = file,
+   .mapping= mapping,
+   ._index = index,
+   ._nr_pages  = 0,
+   };
+
+retry:

[PATCH v5 11/28] netfs: Gather stats

2021-03-23 Thread David Howells
Gather statistics from the netfs interface that can be exported through a
seqfile.  This is intended to be called by a later patch when viewing
/proc/fs/fscache/stats.
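
The export itself is just a seq_file dump of the atomic counters; an
abridged sketch:

	void netfs_stats_show(struct seq_file *m)
	{
		seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n",
			   atomic_read(&netfs_n_rh_readahead),
			   atomic_read(&netfs_n_rh_readpage),
			   atomic_read(&netfs_n_rh_rreq),
			   atomic_read(&netfs_n_rh_sreq));
	}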

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/161118139247.1232039.10556850937548511068.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161034669.2537118.2761232524997091480.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340397101.1303470.17581910581108378458.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539539959.286939.6794352576462965914.st...@warthog.procyon.org.uk/
 # v4
---

 fs/netfs/Kconfig   |   15 +
 fs/netfs/Makefile  |3 +--
 fs/netfs/internal.h|   34 ++
 fs/netfs/read_helper.c |   23 
 fs/netfs/stats.c   |   54 
 include/linux/netfs.h  |1 +
 6 files changed, 128 insertions(+), 2 deletions(-)
 create mode 100644 fs/netfs/stats.c

diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig
index 2ebf90e6ca95..578112713703 100644
--- a/fs/netfs/Kconfig
+++ b/fs/netfs/Kconfig
@@ -6,3 +6,18 @@ config NETFS_SUPPORT
  This option enables support for network filesystems, including
  helpers for high-level buffered I/O, abstracting out read
  segmentation, local caching and transparent huge page support.
+
+config NETFS_STATS
+   bool "Gather statistical information on local caching"
+   depends on NETFS_SUPPORT && PROC_FS
+   help
+ This option causes statistical information to be gathered on local
+ caching and exported through file:
+
+   /proc/fs/fscache/stats
+
+ The gathering of statistics adds a certain amount of overhead to
+ execution as there are a quite a few stats gathered, and on a
+ multi-CPU system these may be on cachelines that keep bouncing
+ between CPUs.  On the other hand, the stats are very useful for
+ debugging purposes.  Saying 'Y' here is recommended.
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 4b4eff2ba369..c15bfc966d96 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -1,6 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
-netfs-y := \
-   read_helper.o
+netfs-y := read_helper.o stats.o
 
 obj-$(CONFIG_NETFS_SUPPORT) := netfs.o
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index ee665c0e7dc8..98b6f4516da1 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -16,8 +16,42 @@
  */
 extern unsigned int netfs_debug;
 
+/*
+ * stats.c
+ */
+#ifdef CONFIG_NETFS_STATS
+extern atomic_t netfs_n_rh_readahead;
+extern atomic_t netfs_n_rh_readpage;
+extern atomic_t netfs_n_rh_rreq;
+extern atomic_t netfs_n_rh_sreq;
+extern atomic_t netfs_n_rh_download;
+extern atomic_t netfs_n_rh_download_done;
+extern atomic_t netfs_n_rh_download_failed;
+extern atomic_t netfs_n_rh_download_instead;
+extern atomic_t netfs_n_rh_read;
+extern atomic_t netfs_n_rh_read_done;
+extern atomic_t netfs_n_rh_read_failed;
+extern atomic_t netfs_n_rh_zero;
+extern atomic_t netfs_n_rh_short_read;
+extern atomic_t netfs_n_rh_write;
+extern atomic_t netfs_n_rh_write_done;
+extern atomic_t netfs_n_rh_write_failed;
+
+
+static inline void netfs_stat(atomic_t *stat)
+{
+   atomic_inc(stat);
+}
+
+static inline void netfs_stat_d(atomic_t *stat)
+{
+   atomic_dec(stat);
+}
+
+#else
 #define netfs_stat(x) do {} while(0)
 #define netfs_stat_d(x) do {} while(0)
+#endif
 
 /*/
 /*
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 799eee7f4ee6..6d6ed30f417e 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -56,6 +56,7 @@ static struct netfs_read_request *netfs_alloc_read_request(
refcount_set(&rreq->usage, 1);
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
ops->init_rreq(rreq, file);
+   netfs_stat(&netfs_n_rh_rreq);
}
 
return rreq;
@@ -88,6 +89,7 @@ static void netfs_free_read_request(struct work_struct *work)
rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
trace_netfs_rreq(rreq, netfs_rreq_trace_free);
kfree(rreq);
+   netfs_stat_d(&netfs_n_rh_rreq);
 }
 
 static void netfs_put_read_request(struct netfs_read_request *rreq, bool 
was_async)
@@ -117,6 +119,7 @@ static struct netfs_read_subrequest *netfs_alloc_subrequest(
refcount_set(&subreq->usage, 2);
subreq->rreq = rreq;
netfs_get_read_request

[PATCH v5 10/28] netfs: Add tracepoints

2021-03-23 Thread David Howells
Add three tracepoints to track the activity of the read helpers:

 (1) netfs/netfs_read

 This logs entry to the read helpers and also expansion of the range in
 a readahead request.

 (2) netfs/netfs_rreq

 This logs the progress of netfs_read_request objects which track
 read requests.  A read request may be a compound of multiple
 subrequests.

 (3) netfs/netfs_sreq

 This logs the progress of netfs_read_subrequest objects, which track
 the contributions from various sources to a read request.

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/161118138060.1232039.5353374588021776217.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161033468.2537118.14021843889844001905.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340395843.1303470.7355519662919639648.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539538693.286939.10171713520419106334.st...@warthog.procyon.org.uk/
 # v4
---

 fs/netfs/read_helper.c   |   26 +
 include/linux/netfs.h|1 
 include/trace/events/netfs.h |  199 ++
 3 files changed, 226 insertions(+)
 create mode 100644 include/trace/events/netfs.h

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 30d4bf6bf28a..799eee7f4ee6 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -16,6 +16,8 @@
 #include 
 #include 
 #include "internal.h"
+#define CREATE_TRACE_POINTS
+#include 
 
 MODULE_DESCRIPTION("Network fs support");
 MODULE_AUTHOR("Red Hat, Inc.");
@@ -84,6 +86,7 @@ static void netfs_free_read_request(struct work_struct *work)
netfs_rreq_clear_subreqs(rreq, false);
if (rreq->netfs_priv)
rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
+   trace_netfs_rreq(rreq, netfs_rreq_trace_free);
kfree(rreq);
 }
 
@@ -129,6 +132,7 @@ static void __netfs_put_subrequest(struct 
netfs_read_subrequest *subreq,
 {
struct netfs_read_request *rreq = subreq->rreq;
 
+   trace_netfs_sreq(subreq, netfs_sreq_trace_free);
kfree(subreq);
netfs_put_read_request(rreq, was_async);
 }
@@ -183,6 +187,7 @@ static void netfs_read_from_server(struct 
netfs_read_request *rreq,
  */
 static void netfs_rreq_completed(struct netfs_read_request *rreq, bool 
was_async)
 {
+   trace_netfs_rreq(rreq, netfs_rreq_trace_done);
netfs_rreq_clear_subreqs(rreq, was_async);
netfs_put_read_request(rreq, was_async);
 }
@@ -221,6 +226,8 @@ static void netfs_rreq_unlock(struct netfs_read_request 
*rreq)
iopos = 0;
subreq_failed = (subreq->error < 0);
 
+   trace_netfs_rreq(rreq, netfs_rreq_trace_unlock);
+
rcu_read_lock();
xas_for_each(&xas, page, last_page) {
unsigned int pgpos = (page->index - start_page) * PAGE_SIZE;
@@ -281,6 +288,8 @@ static void netfs_rreq_short_read(struct netfs_read_request 
*rreq,
__clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags);
__set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags);
 
+   trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short);
+
netfs_get_read_subrequest(subreq);
atomic_inc(&rreq->nr_rd_ops);
netfs_read_from_server(rreq, subreq);
@@ -296,6 +305,8 @@ static bool netfs_rreq_perform_resubmissions(struct 
netfs_read_request *rreq)
 
WARN_ON(in_interrupt());
 
+   trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit);
+
/* We don't want terminating submissions trying to wake us up whilst
 * we're still going through the list.
 */
@@ -308,6 +319,7 @@ static bool netfs_rreq_perform_resubmissions(struct 
netfs_read_request *rreq)
break;
subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
subreq->error = 0;
+   trace_netfs_sreq(subreq, 
netfs_sreq_trace_download_instead);
netfs_get_read_subrequest(subreq);
atomic_inc(&rreq->nr_rd_ops);
netfs_read_from_server(rreq, subreq);
@@ -332,6 +344,8 @@ static bool netfs_rreq_perform_resubmissions(struct 
netfs_read_request *rreq)
  */
 static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async)
 {
+   trace_netfs_rreq(rreq, netfs_rreq_trace_assess);
+
 again:
if (!test_bit(NETFS_RREQ_FAILED, >flags) &&
test_bit(NETFS_RREQ_INCOMPLETE_IO, >flags)) {
@@ -422,6 +436,8 @@ void netfs_subreq_terminated(struct netfs_read_subrequest 
*subreq,
set_bit(NETFS_RREQ_WRIT

[PATCH v5 09/28] netfs: Provide readahead and readpage netfs helpers

2021-03-23 Thread David Howells
Add a pair of helper functions:

 (*) netfs_readahead()
 (*) netfs_readpage()

to do the work of handling a readahead or a readpage, where the page(s)
that form part of the request may be split between the local cache, the
server or just require clearing, and may be single pages and transparent
huge pages.  This is all handled within the helper.

Note that while both will read from the cache if there is data present,
only netfs_readahead() will expand the request beyond what it was asked to
do, and only netfs_readahead() will write back to the cache.

netfs_readpage(), on the other hand, is synchronous and only fetches the
page (which might be a THP) it is asked for.

The netfs gives the helper parameters from the VM, the cache cookie it
wants to use (or NULL) and a table of operations (only one of which is
mandatory):

 (*) expand_readahead() [optional]

 Called to allow the netfs to request an expansion of a readahead
 request to meet its own alignment requirements.  This is done by
 changing rreq->start and rreq->len.

 (*) clamp_length() [optional]

 Called to allow the netfs to cut down a subrequest to meet its own
 boundary requirements.  If it does this, the helper will generate
 additional subrequests until the full request is satisfied.

 (*) is_still_valid() [optional]

 Called to find out if the data just read from the cache has been
 invalidated and must be reread from the server.

 (*) issue_op() [required]

 Called to ask the netfs to issue a read to the server.  The subrequest
 describes the read.  The read request holds information about the file
 being accessed.

 The netfs can cache information in rreq->netfs_priv.

 Upon completion, the netfs should set the error, transferred and can
 also set NETFS_SREQ_CLEAR_TAIL and then call
 netfs_subreq_terminated().

 (*) done() [optional]

 Called after the pages have been unlocked.  The read request is still
 pinning the file and mapping and may still be pinning pages with
 PG_fscache.  rreq->error indicates any error that has been
 accumulated.

 (*) cleanup() [optional]

 Called when the helper is disposing of a finished read request.  This
 allows the netfs to clear rreq->netfs_priv.

Netfs support is enabled with CONFIG_NETFS_SUPPORT=y.  It will be built
even if CONFIG_FSCACHE=n and in this case much of it should be optimised
away, allowing the filesystem to use it even when caching is disabled.
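
A minimal sketch of the wiring, with illustrative names (only issue_op is
mandatory):

	static const struct netfs_read_request_ops example_req_ops = {
		.issue_op	= example_issue_op,
	};

	static int example_readpage(struct file *file, struct page *page)
	{
		return netfs_readpage(file, page, &example_req_ops, NULL);
	}

where example_issue_op() starts the server read and finishes by calling
netfs_subreq_terminated() with the result.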

Changes:
v5:
 - Comment why netfs_readahead() is putting pages[2].
 - Use page_file_mapping() rather than page->mapping[2].
 - Use page_index() rather than page->index[2].
 - Use set_page_fscache()[3] rather than SetPageFsCache() as this takes an
   appropriate ref too[4].

v4:
 - Folded in a kerneldoc comment fix.
 - Folded in a fix for the error handling in the case that ENOMEM occurs.
 - Added flag to netfs_subreq_terminated() to indicate that the caller may
   have been running async and stuff that might sleep needs punting to a
   workqueue (can't use in_softirq()[1]).

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210216084230.ga23...@lst.de/ [1]
Link: https://lore.kernel.org/r/20210321014202.gf3...@casper.infradead.org/ [2]
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk/ [3]
Link: 
https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pum...@mail.gmail.com/
 [4]
Link: 
https://lore.kernel.org/r/160588497406.3465195.18003475695899726222.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118136849.1232039.8923686136144228724.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161032290.2537118.13400578415247339173.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340394873.1303470.6237319335883242536.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539537375.286939.16642940088716990995.st...@warthog.procyon.org.uk/
 # v4
---

 fs/Kconfig |1 
 fs/Makefile|1 
 fs/netfs/Makefile  |6 
 fs/netfs/internal.h|   61 
 fs/netfs/read_helper.c |  725 
 include/linux/netfs.h  |   83 +
 6 files changed, 877 insertions(+)
 create mode 100644 fs/netfs/Makefile
 create mode 100644 fs/netfs/internal.h
 create mode 100644 fs/netfs/read_helper.c

diff --git a/fs/Kconfig b/fs/Kconfig
index 462253ae483a..eccbcf1e3f2e 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -125,6 +125,7 @@ source "fs/overlayfs/Kconfig"
 
 menu "Caches"
 
+source "fs/netfs/Kconfig"
 source "fs/fscache/Kconfig"

[PATCH v5 08/28] netfs, mm: Add set/end/wait_on_page_fscache() aliases

2021-03-23 Thread David Howells
Add set/end/wait_on_page_fscache() as aliases of
set/end/wait_page_private_2().  These allow a page to be marked with
PG_fscache, the flag to be removed and waiters woken and waiting for the
flag to be cleared.  A ref on the page is also taken and dropped.

[Linus suggested putting the fscache-themed functions into the
 caching-specific headers rather than pagemap.h[1]]
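
Typical pairing, as a sketch:

	set_page_fscache(page);		/* page being written to the cache */
	/* ... submit the async write to the cache ... */

	/* and in the cache I/O completion handler: */
	end_page_fscache(page);		/* clear, drop the ref, wake waiters */

with wait_on_page_fscache() (or the killable variant) used before a page
that may still be going to the cache is modified or released.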

Changes:
v5:
- Mirror the changes to the core routines[2].

Signed-off-by: David Howells 
cc: Linus Torvalds 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/1330473.1612974...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=xaemtkswpu1yw4uaro1r3ibs27...@mail.gmail.com/
 [1]
Link: 
https://lore.kernel.org/r/161340393568.1303470.4997526899111310530.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539536093.286939.5076448803512118764.st...@warthog.procyon.org.uk/
 # v4
Link: https://lore.kernel.org/r/2499407.1616505...@warthog.procyon.org.uk/ [2]
---

 include/linux/netfs.h |   57 +
 1 file changed, 57 insertions(+)

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index cc1102040488..8479d63406f7 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -26,4 +26,61 @@
 #define TestSetPageFsCache(page)   TestSetPagePrivate2((page))
 #define TestClearPageFsCache(page) TestClearPagePrivate2((page))
 
+/**
+ * set_page_fscache - Set PG_fscache on a page and take a ref
+ * @page: The page.
+ *
+ * Set the PG_fscache (PG_private_2) flag on a page and take the reference
+ * needed for the VM to handle its lifetime correctly.  This sets the flag and
+ * takes the reference unconditionally, so care must be taken not to set the
+ * flag again if it's already set.
+ */
+static inline void set_page_fscache(struct page *page)
+{
+   set_page_private_2(page);
+}
+
+/**
+ * end_page_fscache - Clear PG_fscache and release any waiters
+ * @page: The page
+ *
+ * Clear the PG_fscache (PG_private_2) bit on a page and wake up any sleepers
+ * waiting for this.  The page ref held for PG_private_2 being set is released.
+ *
+ * This is, for example, used when a netfs page is being written to a local
+ * disk cache, thereby allowing writes to the cache for the same page to be
+ * serialised.
+ */
+static inline void end_page_fscache(struct page *page)
+{
+   end_page_private_2(page);
+}
+
+/**
+ * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
+ * @page: The page to wait on
+ *
+ * Wait for PG_fscache (aka PG_private_2) to be cleared on a page.
+ */
+static inline void wait_on_page_fscache(struct page *page)
+{
+   wait_on_page_private_2(page);
+}
+
+/**
+ * wait_on_page_fscache_killable - Wait for PG_fscache to be cleared on a page
+ * @page: The page to wait on
+ *
+ * Wait for PG_fscache (aka PG_private_2) to be cleared on a page or until a
+ * fatal signal is received by the calling task.
+ *
+ * Return:
+ * - 0 if successful.
+ * - -EINTR if a fatal signal was encountered.
+ */
+static inline int wait_on_page_fscache_killable(struct page *page)
+{
+   return wait_on_page_private_2_killable(page);
+}
+
 #endif /* _LINUX_NETFS_H */




[PATCH v5 07/28] netfs, mm: Move PG_fscache helper funcs to linux/netfs.h

2021-03-23 Thread David Howells
Move the PG_fscache related helper funcs (such as SetPageFsCache()) to
linux/netfs.h rather than linux/fscache.h as the intention is to move to a
model where they're used by the network filesystem and the helper library,
but not by fscache/cachefiles itself.

Signed-off-by: David Howells 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/161340392347.1303470.18065131603507621762.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539534516.286939.6265142985563005000.st...@warthog.procyon.org.uk/
 # v4
---

 include/linux/fscache.h |   11 +--
 include/linux/netfs.h   |   29 +
 2 files changed, 30 insertions(+), 10 deletions(-)
 create mode 100644 include/linux/netfs.h

diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index a1c928fe98e7..1f8dc72369ee 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
 #define fscache_available() (1)
@@ -29,16 +30,6 @@
 #endif
 
 
-/*
- * overload PG_private_2 to give us PG_fscache - this is used to indicate that
- * a page is currently backed by a local disk cache
- */
-#define PageFsCache(page)  PagePrivate2((page))
-#define SetPageFsCache(page)   SetPagePrivate2((page))
-#define ClearPageFsCache(page) ClearPagePrivate2((page))
-#define TestSetPageFsCache(page)   TestSetPagePrivate2((page))
-#define TestClearPageFsCache(page) TestClearPagePrivate2((page))
-
 /* pattern used to fill dead space in an index entry */
 #define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
 
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
new file mode 100644
index ..cc1102040488
--- /dev/null
+++ b/include/linux/netfs.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Network filesystem support services.
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * See:
+ *
+ * Documentation/filesystems/netfs_library.rst
+ *
+ * for a description of the network filesystem interface declared here.
+ */
+
+#ifndef _LINUX_NETFS_H
+#define _LINUX_NETFS_H
+
+#include 
+
+/*
+ * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
+ * a page is currently backed by a local disk cache
+ */
+#define PageFsCache(page)  PagePrivate2((page))
+#define SetPageFsCache(page)   SetPagePrivate2((page))
+#define ClearPageFsCache(page) ClearPagePrivate2((page))
+#define TestSetPageFsCache(page)   TestSetPagePrivate2((page))
+#define TestClearPageFsCache(page) TestClearPagePrivate2((page))
+
+#endif /* _LINUX_NETFS_H */




[PATCH v5 06/28] netfs: Documentation for helper library

2021-03-23 Thread David Howells
Add interface documentation for the netfs helper library.

Signed-off-by: David Howells 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/161539533275.286939.6246011228676840978.st...@warthog.procyon.org.uk/
 # v4
---

 Documentation/filesystems/index.rst |1 
 Documentation/filesystems/netfs_library.rst |  526 +++
 2 files changed, 527 insertions(+)
 create mode 100644 Documentation/filesystems/netfs_library.rst

diff --git a/Documentation/filesystems/index.rst 
b/Documentation/filesystems/index.rst
index 1f76b1cb3348..d4853cb919d2 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -53,6 +53,7 @@ filesystem implementations.
journalling
fscrypt
fsverity
+   netfs_library
 
 Filesystems
 ===
diff --git a/Documentation/filesystems/netfs_library.rst 
b/Documentation/filesystems/netfs_library.rst
new file mode 100644
index ..57a641847818
--- /dev/null
+++ b/Documentation/filesystems/netfs_library.rst
@@ -0,0 +1,526 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=
+NETWORK FILESYSTEM HELPER LIBRARY
+=
+
+.. Contents:
+
+ - Overview.
+ - Buffered read helpers.
+   - Read helper functions.
+   - Read helper structures.
+   - Read helper operations.
+   - Read helper procedure.
+   - Read helper cache API.
+
+
+Overview
+
+
+The network filesystem helper library is a set of functions designed to aid a
+network filesystem in implementing VM/VFS operations.  For the moment, that
+just includes turning various VM buffered read operations into requests to read
+from the server.  The helper library, however, can also interpose other
+services, such as local caching or local data encryption.
+
+Note that the library module doesn't link against local caching directly, so
+access must be provided by the netfs.
+
+
+Buffered Read Helpers
+=
+
+The library provides a set of read helpers that handle the ->readpage(),
+->readahead() and much of the ->write_begin() VM operations and translate them
+into a common call framework.
+
+The following services are provided:
+
+ * Handles transparent huge pages (THPs).
+
+ * Insulates the netfs from VM interface changes.
+
+ * Allows the netfs to arbitrarily split reads up into pieces, even ones that
+   don't match page sizes or page alignments and that may cross pages.
+
+ * Allows the netfs to expand a readahead request in both directions to meet
+   its needs.
+
+ * Allows the netfs to partially fulfil a read, which will then be resubmitted.
+
+ * Handles local caching, allowing cached data and server-read data to be
+   interleaved for a single request.
+
+ * Handles clearing of bufferage that isn't on the server.
+
+ * Handles retrying of reads that failed, switching reads from the cache to the
+   server as necessary.
+
+ * In the future, this is a place that other services can be performed, such as
+   local encryption of data to be stored remotely or in the cache.
+
+From the network filesystem, the helpers require a table of operations.  This
+includes a mandatory method to issue a read operation along with a number of
+optional methods.
+
+
+Read Helper Functions
+-
+
+Three read helpers are provided::
+
+ * void netfs_readahead(struct readahead_control *ractl,
+   const struct netfs_read_request_ops *ops,
+   void *netfs_priv);
+ * int netfs_readpage(struct file *file,
+ struct page *page,
+ const struct netfs_read_request_ops *ops,
+ void *netfs_priv);
+ * int netfs_write_begin(struct file *file,
+struct address_space *mapping,
+loff_t pos,
+unsigned int len,
+unsigned int flags,
+struct page **_page,
+void **_fsdata,
+const struct netfs_read_request_ops *ops,
+void *netfs_priv);
+
+Each corresponds to a VM operation, with the addition of a couple of parameters
+for the use of the read helpers:
+
+ * ``ops``
+
+   A table of operations through which the helpers can talk to the filesystem.
+
+ * ``netfs_priv``
+
+   Filesystem private data (can be NULL).
+
+Both of these values will be stored into the read request structure.
+
+For ->readahead() and ->readpage(), the network filesystem should just jump
+into the corresponding read helper; whereas for ->write_begin(), it may be a
+little more complicated as the network filesystem might want to flush
+conflicting writes or track dirty data and needs to put the acquired

[PATCH v5 05/28] netfs: Make a netfs helper module

2021-03-23 Thread David Howells
Make a netfs helper module to manage read request segmentation, caching
support and transparent huge page support on behalf of a network
filesystem.

Signed-off-by: David Howells 
Reviewed-by: Jeff Layton 
cc: Matthew Wilcox 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: 
https://lore.kernel.org/r/160588496284.3465195.10102643717770106661.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118135638.1232039.1622182202673126285.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161031028.2537118.1213974428943508753.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340391427.1303470.14884950716721956560.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539531569.286939.18317119181653706665.st...@warthog.procyon.org.uk/
 # v4
---

 fs/netfs/Kconfig |8 
 1 file changed, 8 insertions(+)
 create mode 100644 fs/netfs/Kconfig

diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig
new file mode 100644
index ..2ebf90e6ca95
--- /dev/null
+++ b/fs/netfs/Kconfig
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config NETFS_SUPPORT
+   tristate "Support for network filesystem high-level I/O"
+   help
+ This option enables support for network filesystems, including
+ helpers for high-level buffered I/O, abstracting out read
+ segmentation, local caching and transparent huge page support.




[PATCH v5 04/28] mm: Implement readahead_control pageset expansion

2021-03-23 Thread David Howells
Provide a function, readahead_expand(), that expands the set of pages
specified by a readahead_control object to encompass a revised area with a
proposed start and length.

The proposed area must include all of the old area and may be expanded yet
more by this function so that the edges align on (transparent huge) page
boundaries as allocated.

The expansion will be cut short if a page already exists in either of the
areas being expanded into.  Note that any expansion made in such a case is
not rolled back.

This will be used by fscache so that reads can be expanded to cache granule
boundaries, thereby allowing whole granules to be stored in the cache, but
there are other potential users also.
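
As a sketch of the intended fscache use (the 256KiB granule size is
illustrative):

	loff_t start = round_down(readahead_pos(ractl), 256 * 1024);
	size_t len = round_up(readahead_pos(ractl) + readahead_length(ractl),
			      256 * 1024) - start;

	readahead_expand(ractl, start, len);

after which the caller must look at the ractl again to see how much of the
requested expansion was actually granted.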

Changes:
- Moved the declaration of readahead_expand() to a better place[1].

Suggested-by: Matthew Wilcox (Oracle) 
Signed-off-by: David Howells 
cc: Matthew Wilcox (Oracle) 
cc: Alexander Viro 
cc: Christoph Hellwig 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/20210217161358.gm2858...@casper.infradead.org/ 
[1]
Link: 
https://lore.kernel.org/r/159974633888.2094769.8326206446358128373.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588479816.3465195.553952688795241765.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118131787.1232039.4863969952441067985.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161028670.2537118.13831420617039766044.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340389201.1303470.14353807284546854878.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539530488.286939.18085961677838089157.st...@warthog.procyon.org.uk/
 # v4
---

 include/linux/pagemap.h |2 +
 mm/readahead.c  |   70 +++
 2 files changed, 72 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index da5c38864037..5c14a9365aae 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -837,6 +837,8 @@ void page_cache_sync_ra(struct readahead_control *, struct 
file_ra_state *,
unsigned long req_count);
 void page_cache_async_ra(struct readahead_control *, struct file_ra_state *,
struct page *, unsigned long req_count);
+void readahead_expand(struct readahead_control *ractl,
+ loff_t new_start, size_t new_len);
 
 /**
  * page_cache_sync_readahead - generic file readahead
diff --git a/mm/readahead.c b/mm/readahead.c
index c5b0457415be..4446dada0bc2 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -638,3 +638,73 @@ SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, 
size_t, count)
 {
return ksys_readahead(fd, offset, count);
 }
+
+/**
+ * readahead_expand - Expand a readahead request
+ * @ractl: The request to be expanded
+ * @new_start: The revised start
+ * @new_len: The revised size of the request
+ *
+ * Attempt to expand a readahead request outwards from the current size to the
+ * specified size by inserting locked pages before and after the current window
+ * to increase the size to the new window.  This may involve the insertion of
+ * THPs, in which case the window may get expanded even beyond what was
+ * requested.
+ *
+ * The algorithm will stop if it encounters a conflicting page already in the
+ * pagecache and leave a smaller expansion than requested.
+ *
+ * The caller must check for this by examining the revised @ractl object for a
+ * different expansion than was requested.
+ */
+void readahead_expand(struct readahead_control *ractl,
+ loff_t new_start, size_t new_len)
+{
+   struct address_space *mapping = ractl->mapping;
+   pgoff_t new_index, new_nr_pages;
+   gfp_t gfp_mask = readahead_gfp_mask(mapping);
+
+   new_index = new_start / PAGE_SIZE;
+
+   /* Expand the leading edge downwards */
+   while (ractl->_index > new_index) {
+   unsigned long index = ractl->_index - 1;
struct page *page = xa_load(&mapping->i_pages, index);
+
+   if (page && !xa_is_value(page))
+   return; /* Page apparently present */
+
+   page = __page_cache_alloc(gfp_mask);
+   if (!page)
+   return;
+   if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) {
+   put_page(page);
+   return;
+   }
+
+   ractl->_nr_pages++;
+   ractl->_index = page->index;
+   }
+
+   new_len += new_start - readahead_pos(ractl);
+   new_nr_pages = DIV_ROUND_UP(new_len, PAGE_SIZE);
+
+   /* Expand the trailing edge upwards */
+   while (ractl->_nr_pages < new_nr_pages) {
+   unsigned long index = ractl->_index + ractl->_nr_pages;
+   struct page *page = xa_load(&mapping->i_pages, index);
+
+   if (page && !xa_is_value(page))
+   return; /* Page apparently present */
+
+   page = __page_cache_alloc(gfp_mask);
+   if (!page)
+   return;
+   if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) {
+   put_page(page);
+   return;
+   }
+   ractl->_nr_pages++;
+   }
+}
+EXPORT_SYMBOL(readahead_expand);

[PATCH v5 03/28] mm: Add set/end/wait functions for PG_private_2

2021-03-23 Thread David Howells
Add three functions to manipulate PG_private_2:

 (*) set_page_private_2() - Set the flag and take an appropriate reference
 on the flagged page.

 (*) end_page_private_2() - Clear the flag, drop the reference and wake up
 any waiters, somewhat analogously with end_page_writeback().

 (*) wait_on_page_private_2() - Wait for the flag to be cleared.

Wrappers will need to be placed in the netfs lib header in the patch that
adds that.

[This implements a suggestion by Linus[1] to not mix the terminology of
 PG_private_2 and PG_fscache in the mm core function]

Changes:
v5:
- Add set and end functions, calling the end function end rather than
  unlock[3].
- Keep a ref on the page when PG_private_2 is set[4][5].

v4:
- Remove extern from the declaration[2].

Suggested-by: Linus Torvalds 
Signed-off-by: David Howells 
cc: Matthew Wilcox (Oracle) 
cc: Alexander Viro 
cc: Christoph Hellwig 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/1330473.1612974...@warthog.procyon.org.uk/ # v1
Link: 
https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=xaemtkswpu1yw4uaro1r3ibs27...@mail.gmail.com/
 [1]
Link: https://lore.kernel.org/r/20210216102659.ga27...@lst.de/ [2]
Link: 
https://lore.kernel.org/r/161340387944.1303470.7944159520278177652.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539528910.286939.1252328699383291173.st...@warthog.procyon.org.uk
 # v4
Link: https://lore.kernel.org/r/20210321105309.gg3...@casper.infradead.org [3]
Link: 
https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pum...@mail.gmail.com/
 [4]
Link: 
https://lore.kernel.org/r/CAHk-=wjsgsrj7xwhsmq6daqiz53xa39pog+xa_wetgwbbu4...@mail.gmail.com/
 [5]
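To illustrate the intended usage pattern (a sketch only, not part of this
patch; example_write_to_cache() is a made-up helper standing in for the
netfs/cache code):

	/* Mark the page as being written to the cache, taking a ref. */
	set_page_private_2(page);
	unlock_page(page);
	example_write_to_cache(page); /* Calls end_page_private_2() on I/O
				       * completion, dropping the ref. */

	/* Anyone wanting to modify or release the page first waits: */
	wait_on_page_private_2(page);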
---

 include/linux/pagemap.h |   19 +++
 mm/filemap.c|   59 +++
 2 files changed, 78 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 444155ae56c0..da5c38864037 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -689,6 +689,25 @@ void wait_for_stable_page(struct page *page);
 
 void page_endio(struct page *page, bool is_write, int err);
 
+/**
+ * set_page_private_2 - Set PG_private_2 on a page and take a ref
+ * @page: The page.
+ *
+ * Set the PG_private_2 flag on a page and take the reference needed for the VM
+ * to handle its lifetime correctly.  This sets the flag and takes the
+ * reference unconditionally, so care must be taken not to set the flag again
+ * if it's already set.
+ */
+static inline void set_page_private_2(struct page *page)
+{
+   get_page(page);
+   SetPagePrivate2(page);
+}
+
+void end_page_private_2(struct page *page);
+void wait_on_page_private_2(struct page *page);
+int wait_on_page_private_2_killable(struct page *page);
+
 /*
  * Add an arbitrary waiter to a page's wait queue
  */
diff --git a/mm/filemap.c b/mm/filemap.c
index 43700480d897..788b71e8a72d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1432,6 +1432,65 @@ void unlock_page(struct page *page)
 }
 EXPORT_SYMBOL(unlock_page);
 
+/**
+ * end_page_private_2 - Clear PG_private_2 and release any waiters
+ * @page: The page
+ *
+ * Clear the PG_private_2 bit on a page and wake up any sleepers waiting for
+ * this.  The page ref held for PG_private_2 being set is released.
+ *
+ * This is, for example, used when a netfs page is being written to a local
+ * disk cache, thereby allowing writes to the cache for the same page to be
+ * serialised.
+ */
+void end_page_private_2(struct page *page)
+{
+   page = compound_head(page);
+   VM_BUG_ON_PAGE(!PagePrivate2(page), page);
+   clear_bit_unlock(PG_private_2, &page->flags);
+   wake_up_page_bit(page, PG_private_2);
+   put_page(page);
+}
+EXPORT_SYMBOL(end_page_private_2);
+
+/**
+ * wait_on_page_private_2 - Wait for PG_private_2 to be cleared on a page
+ * @page: The page to wait on
+ *
+ * Wait for PG_private_2 (aka PG_fscache) to be cleared on a page.
+ */
+void wait_on_page_private_2(struct page *page)
+{
+   while (PagePrivate2(page))
+   wait_on_page_bit(page, PG_private_2);
+}
+EXPORT_SYMBOL(wait_on_page_private_2);
+
+/**
+ * wait_on_page_private_2_killable - Wait for PG_private_2 to be cleared on a 
page
+ * @page: The page to wait on
+ *
+ * Wait for PG_private_2 (aka PG_fscache) to be cleared on a page or until a
+ * fatal signal is received by the calling task.
+ *
+ * Return:
+ * - 0 if successful.
+ * - -EINTR if a fatal signal was encountered.
+ */
+int wait_on_page_private_2_killable(struct page *page)
+{
+   int ret = 0;
+
+   while (PagePrivate2(page)) {
+   ret = wait_on_page_bit_killable(page, PG_private_2);
+   if (ret < 0)
+   break;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL(wait_on_page_private_2_killable);

[PATCH v5 02/28] mm: Add wait_on_page_writeback_killable()

2021-03-23 Thread David Howells
Add a function to wait killably on the PG_writeback page flag.

Signed-off-by: David Howells 
cc: Matthew Wilcox (Oracle) 
cc: Alexander Viro 
cc: Christoph Hellwig 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
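As a usage sketch (the surrounding fault handler and its page variable are
assumed), a path that must not block uninterruptibly can now do:

	if (wait_on_page_writeback_killable(page) < 0)
		return VM_FAULT_RETRY; /* Fatal signal: back out of the fault */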
---

 include/linux/pagemap.h |1 +
 mm/page-writeback.c |   25 +
 2 files changed, 26 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 20225b067583..444155ae56c0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -683,6 +683,7 @@ static inline int wait_on_page_locked_killable(struct page 
*page)
 
 int put_and_wait_on_page_locked(struct page *page, int state);
 void wait_on_page_writeback(struct page *page);
+int wait_on_page_writeback_killable(struct page *page);
 extern void end_page_writeback(struct page *page);
 void wait_for_stable_page(struct page *page);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index eb34d204d4ee..b8bad275f94b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2833,6 +2833,31 @@ void wait_on_page_writeback(struct page *page)
 }
 EXPORT_SYMBOL_GPL(wait_on_page_writeback);
 
+/**
+ * wait_on_page_writeback_killable - Wait for a page to complete writeback
+ * @page: The page to wait on
+ *
+ * Wait for the writeback status of a page to clear or a fatal signal to occur.
+ *
+ * Return:
+ * - 0 on success.
+ * - -EINTR if a fatal signal was encountered.
+ */
+int wait_on_page_writeback_killable(struct page *page)
+{
+   int ret = 0;
+
+   while (PageWriteback(page)) {
+   trace_wait_on_page_writeback(page, page_mapping(page));
+   ret = wait_on_page_bit_killable(page, PG_writeback);
+   if (ret < 0)
+   break;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL(wait_on_page_writeback_killable);
+
 /**
  * wait_for_stable_page() - wait for writeback to finish, if necessary.
  * @page:  The page to wait on.




[PATCH v5 01/28] iov_iter: Add ITER_XARRAY

2021-03-23 Thread David Howells
Add an iterator, ITER_XARRAY, that walks through a set of pages attached to
an xarray, starting at a given page and offset and walking for the
specified amount of bytes.  The iterator supports transparent huge pages.

The iterate_xarray() macro calls the helper function with rcu_read_lock()
held.  I think that this is only a problem for iov_iter_for_each_range()
- and that returns an error for ITER_XARRAY (also, this function does not
appear to be called).

The caller must guarantee that the pages are all present and they must be
locked using PG_locked, PG_writeback or PG_fscache to prevent them from
going away or being migrated whilst they're being accessed.

This is useful for copying data from socket buffers to inodes in network
filesystems and for transferring data between those inodes and the cache
using direct I/O.

Whilst it is true that ITER_BVEC could be used instead, that would require
a bio_vec array to be allocated to refer to all the pages - which should be
redundant if inode->i_pages also points to all these pages.

Note that older versions of this patch implemented an ITER_MAPPING instead,
which was almost the same.

Signed-off-by: David Howells 
cc: Alexander Viro 
cc: Matthew Wilcox (Oracle) 
cc: Christoph Hellwig 
cc: linux...@kvack.org
cc: linux-cach...@redhat.com
cc: linux-...@lists.infradead.org
cc: linux-...@vger.kernel.org
cc: linux-c...@vger.kernel.org
cc: ceph-de...@vger.kernel.org
cc: v9fs-develo...@lists.sourceforge.net
cc: linux-fsde...@vger.kernel.org
Link: https://lore.kernel.org/r/3577430.1579705...@warthog.procyon.org.uk/ # rfc
Link: 
https://lore.kernel.org/r/158861205740.340223.16592990225607814022.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.st...@warthog.procyon.org.uk/
 # rfc
Link: 
https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.st...@warthog.procyon.org.uk/
 # v2
Link: 
https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.st...@warthog.procyon.org.uk/
 # v3
Link: 
https://lore.kernel.org/r/161539527815.286939.14607323792547049341.st...@warthog.procyon.org.uk/
 # v4
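As a sketch of the intended use (not part of this patch;
example_cache_read() is a made-up stand-in for a cache I/O routine), a
netfs can point an iterator straight at an inode's pagecache:

	struct iov_iter iter;

	/* Describe len bytes of the pagecache, starting at file position
	 * start.  The pages must be kept present by PG_locked, PG_writeback
	 * or PG_fscache for the duration.
	 */
	iov_iter_xarray(&iter, READ, &mapping->i_pages, start, len);
	ret = example_cache_read(cookie, &iter);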
---

 include/linux/uio.h |   11 ++
 lib/iov_iter.c  |  313 +++
 2 files changed, 301 insertions(+), 23 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 27ff8eb786dc..5f5ffc45d4aa 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -10,6 +10,7 @@
 #include <uapi/linux/uio.h>
 
 struct page;
+struct address_space;
 struct pipe_inode_info;
 
 struct kvec {
@@ -24,6 +25,7 @@ enum iter_type {
ITER_BVEC = 16,
ITER_PIPE = 32,
ITER_DISCARD = 64,
+   ITER_XARRAY = 128,
 };
 
 struct iov_iter {
@@ -39,6 +41,7 @@ struct iov_iter {
const struct iovec *iov;
const struct kvec *kvec;
const struct bio_vec *bvec;
+   struct xarray *xarray;
struct pipe_inode_info *pipe;
};
union {
@@ -47,6 +50,7 @@ struct iov_iter {
unsigned int head;
unsigned int start_head;
};
+   loff_t xarray_start;
};
 };
 
@@ -80,6 +84,11 @@ static inline bool iov_iter_is_discard(const struct iov_iter 
*i)
return iov_iter_type(i) == ITER_DISCARD;
 }
 
+static inline bool iov_iter_is_xarray(const struct iov_iter *i)
+{
+   return iov_iter_type(i) == ITER_XARRAY;
+}
+
 static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 {
return i->type & (READ | WRITE);
@@ -221,6 +230,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int 
direction, const struct bio_
 void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct 
pipe_inode_info *pipe,
size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t 
count);
+void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray 
*xarray,
+loff_t start, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f66c62aa7154..f808c625c11e 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -76,7 +76,44 @@
}   \
 }
 
-#define iterate_all_kinds(i, n, v, I, B, K) {  \
+#define iterate_xarray(i, n, __v, skip, STEP) {\
+   struct page *head = NULL;   \
+   size_t wanted = n, seg, offset; \
+   loff_t start = i->xarray

[PATCH v5 00/28] Network fs helper library & fscache kiocb API

2021-03-23 Thread David Howells
rnel.org/r/20210321014202.gf3...@casper.infradead.org/ [6]
Link: https://lore.kernel.org/r/20210321105309.gg3...@casper.infradead.org/ [7]

References
==

These patches have been published for review before, firstly as part of a
larger set:

Link: 
https://lore.kernel.org/r/158861203563.340223.7585359869938129395.st...@warthog.procyon.org.uk/

Link: 
https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.st...@warthog.procyon.org.uk/
Link: 
https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.st...@warthog.procyon.org.uk/

Link: 
https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.st...@warthog.procyon.org.uk/

Then as a cut-down set:

Link: 
https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.st...@warthog.procyon.org.uk/
 # v1

Link: 
https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.st...@warthog.procyon.org.uk/
 # v2

Link: 
https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.st...@warthog.procyon.org.uk/
 # v3

Link: 
https://lore.kernel.org/r/161539526152.286939.8589700175877370401.st...@warthog.procyon.org.uk/
 # v4

Proposals/information about the design has been published here:

Link: https://lore.kernel.org/r/24942.1573667...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/2758811.1610621...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/1441311.1598547...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160655.1611012...@warthog.procyon.org.uk/

And requests for information:

Link: https://lore.kernel.org/r/3326.1579019...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/4467.1579020...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/3577430.1579705...@warthog.procyon.org.uk/

The NFS parts, though not included here, have been tested by someone who's
using fscache in production:

Link: 
https://listman.redhat.com/archives/linux-cachefs/2020-December/msg0.html

I've posted partial patches to try and help 9p and cifs along:

Link: https://lore.kernel.org/r/1514086.1605697...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/1794123.1605713...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/241017.1612263...@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/270998.1612265...@warthog.procyon.org.uk/

David
---
David Howells (28):
  iov_iter: Add ITER_XARRAY
  mm: Add wait_on_page_writeback_killable()
  mm: Add set/end/wait functions for PG_private_2
  mm: Implement readahead_control pageset expansion
  netfs: Make a netfs helper module
  netfs: Documentation for helper library
  netfs, mm: Move PG_fscache helper funcs to linux/netfs.h
  netfs, mm: Add set/end/wait_on_page_fscache() aliases
  netfs: Provide readahead and readpage netfs helpers
  netfs: Add tracepoints
  netfs: Gather stats
  netfs: Add write_begin helper
  netfs: Define an interface to talk to a cache
  fscache, cachefiles: Add alternate API to use kiocb for read/write to 
cache
  afs: Disable use of the fscache I/O routines
  afs: Pass page into dirty region helpers to provide THP size
  afs: Print the operation debug_id when logging an unexpected data version
  afs: Move key to afs_read struct
  afs: Don't truncate iter during data fetch
  afs: Log remote unmarshalling errors
  afs: Set up the iov_iter before calling afs_extract_data()
  afs: Use ITER_XARRAY for writing
  afs: Wait on PG_fscache before modifying/releasing a page
  afs: Extract writeback extension into its own function
  afs: Prepare for use of THPs
  afs: Use the fs operation ops to handle FetchData completion
  afs: Use new fscache read helper API
  afs: Use the fscache_write_begin() helper


 Documentation/filesystems/index.rst |1 +
 Documentation/filesystems/netfs_library.rst |  526 +
 fs/Kconfig  |1 +
 fs/Makefile |1 +
 fs/afs/Kconfig  |1 +
 fs/afs/dir.c|  225 ++--
 fs/afs/file.c   |  483 ++--
 fs/afs/fs_operation.c   |4 +-
 fs/afs/fsclient.c   |  108 +-
 fs/afs/inode.c  |7 +-
 fs/afs/internal.h   |   59 +-
 fs/afs/rxrpc.c  |  150 +--
 fs/afs/write.c  |  659 +--
 fs/afs/yfsclient.c  |   82 +-
 fs/cachefiles/Makefile  |1 +
 fs/cachefiles/interface.c   |5 +-
 fs/cachefiles/internal.h|9 +
 fs/cachefiles/io.c  |  403 +++
 fs/fscache/Kconfig  |1 +
 fs/fscache/Makefile |1 +
 fs/fscache/

Re: [PATCH v4 02/28] mm: Add an unlock function for PG_private_2/PG_fscache

2021-03-23 Thread David Howells
David Howells  wrote:

> > > - wait_on_page_writeback(page);
> > > + if (wait_on_page_writeback_killable(page) < 0)
> > > + return VM_FAULT_RETRY | VM_FAULT_LOCKED;
> > 
> > You forgot to unlock the page.
> 
> Do I need to?  Doesn't VM_FAULT_LOCKED indicate that to the caller?  Or is it
> impermissible to do it like that?

Looks like, yes, I do need to.  VM_FAULT_LOCKED is ignored if RETRY is given.
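So it needs to be something like (sketch):

	if (wait_on_page_writeback_killable(vmf->page) < 0) {
		unlock_page(vmf->page); /* RETRY discards VM_FAULT_LOCKED */
		return VM_FAULT_RETRY;
	}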

David



Re: [PATCH v5 00/27] Memory Folios

2021-03-23 Thread David Howells
Johannes Weiner  wrote:

> So I fully agree with the motivation behind this patch. But I do
> wonder why it's special-casing the common case instead of the rare
> case. It comes at a huge cost. Short term, the churn of replacing
> 'page' with 'folio' in pretty much all instances is enormous.
> 
> And longer term, I'm not convinced folio is the abstraction we want
> throughout the kernel. If nobody should be dealing with tail pages in
> the first place, why are we making everybody think in 'folios'? Why
> does a filesystem care that huge pages are composed of multiple base
> pages internally? This feels like an implementation detail leaking out
> of the MM code. The vast majority of places should be thinking 'page'
> with a size of 'page_size()'. Including most parts of the MM itself.

I like the idea of logically separating individual hardware pages from
abstract bundles of pages by using a separate type for them - at least in
filesystem code.  I'm trying to abstract some of the handling out of the
network filesystems and into a common library plus ITER_XARRAY to insulate
those filesystems from the VM.

David



Re: [PATCH v4 02/28] mm: Add an unlock function for PG_private_2/PG_fscache

2021-03-23 Thread David Howells
Matthew Wilcox  wrote:

> On Tue, Mar 23, 2021 at 01:17:20PM +0000, David Howells wrote:
> > +++ b/fs/afs/write.c
> > @@ -846,7 +846,7 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
> >  */
> >  #ifdef CONFIG_AFS_FSCACHE
> > if (PageFsCache(page) &&
> > -   wait_on_page_bit_killable(page, PG_fscache) < 0)
> > +   wait_on_page_fscache_killable(page) < 0)
> > return VM_FAULT_RETRY;
> >  #endif
> >  
> > @@ -861,7 +861,8 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
> >  * details the portion of the page we need to write back and we might
> >  * need to redirty the page if there's a problem.
> >  */
> > -   wait_on_page_writeback(page);
> > +   if (wait_on_page_writeback_killable(page) < 0)
> > +   return VM_FAULT_RETRY | VM_FAULT_LOCKED;
> 
> You forgot to unlock the page.

Do I need to?  Doesn't VM_FAULT_LOCKED indicate that to the caller?  Or is it
impermissible to do it like that?

> Also, if you're waiting killably here, do you need to wait before you get
> the page lock?  Ditto for waiting on fscache -- do you want to do that
> before or after you get the page lock?

I'm waiting both before and after.  If I wait before, write() can go and
trample over the page between PG_writeback/PG_fscache being cleared and us
getting the lock here.  Probably I should only be waiting after locking the
page.
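Something like this, say (a sketch; wait_on_page_fscache_killable() being
the alias added in this series):

	if (lock_page_killable(page) < 0)
		return VM_FAULT_RETRY;

	/* Wait with the page lock held so that write() can't trample the
	 * page between the flag being cleared and the fault proceeding.
	 */
	if (wait_on_page_writeback_killable(page) < 0 ||
	    wait_on_page_fscache_killable(page) < 0) {
		unlock_page(page);
		return VM_FAULT_RETRY;
	}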

> Also, I never quite understood why you needed to wait for fscache
> writes to finish before allowing the page to be dirtied.  Is this a
> wait_for_stable_page() kind of situation, where the cache might be
> calculating a checksum on it?  Because as far as I can tell, once the
> page is dirty in RAM, the contents of the on-disk cache are irrelevant ...
> unless they're part of a RAID 5 checksum kind of situation.

Um.  I do want to add disconnected operation in the future and cache
encryption, but, as things currently stand, it isn't necessary because the
cache object is marked "in use" and will be discarded on rebinding after a
power loss or crash if it's still marked when it's opened again.

Also, the thought has occurred to me that I can make use of reflink copy to
handle the caching of local modifications to cached files, in which case I'd
rather have a clean copy to link from.

David



Re: [PATCH v4 02/28] mm: Add an unlock function for PG_private_2/PG_fscache

2021-03-23 Thread David Howells
David Howells  wrote:

> Matthew Wilcox  wrote:
> 
> > That also brings up that there is no set_page_private_2().  I think
> > that's OK -- you only set PageFsCache() immediately after reading the
> > page from the server.  But I feel this "unlock_page_private_2" is actually
> > "clear_page_private_2" -- ie it's equivalent to writeback, not to lock.
> 
> How about I do the following:
> 
>  (1) Add set_page_private_2() or mark_page_private_2() to set the PG_fscache_2
>  bit.  It could take a ref on the page here.
> 
>  (2) Rename unlock_page_private_2() to end_page_private_2().  It could drop
>  the ref on the page here, but that then means I can't use
>  pagevec_release().
> 
>  (3) Add wait_on_page_private_2() an analogue of wait_on_page_writeback()
>  rather than wait_on_page_locked().
> 
>  (4) Provide fscache synonyms of the above.

Perhaps something like the attached changes (they'll need merging back into
the other patches).

David
---
 include/linux/pagemap.h |   21 +-
 include/linux/netfs.h   |   54 
 fs/afs/write.c  |5 ++--
 fs/netfs/read_helper.c  |   17 +--
 mm/filemap.c|   49 +++
 mm/page-writeback.c |   25 ++
 6 files changed, 139 insertions(+), 32 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index bf05e99ce588..5c14a9365aae 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -591,7 +591,6 @@ extern int __lock_page_async(struct page *page, struct 
wait_page_queue *wait);
 extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags);
 extern void unlock_page(struct page *page);
-void unlock_page_private_2(struct page *page);
 
 /*
  * Return true if the page was successfully locked
@@ -684,11 +683,31 @@ static inline int wait_on_page_locked_killable(struct 
page *page)
 
 int put_and_wait_on_page_locked(struct page *page, int state);
 void wait_on_page_writeback(struct page *page);
+int wait_on_page_writeback_killable(struct page *page);
 extern void end_page_writeback(struct page *page);
 void wait_for_stable_page(struct page *page);
 
 void page_endio(struct page *page, bool is_write, int err);
 
+/**
+ * set_page_private_2 - Set PG_private_2 on a page and take a ref
+ * @page: The page.
+ *
+ * Set the PG_private_2 flag on a page and take the reference needed for the VM
+ * to handle its lifetime correctly.  This sets the flag and takes the
+ * reference unconditionally, so care must be taken not to set the flag again
+ * if it's already set.
+ */
+static inline void set_page_private_2(struct page *page)
+{
+   get_page(page);
+   SetPagePrivate2(page);
+}
+
+void end_page_private_2(struct page *page);
+void wait_on_page_private_2(struct page *page);
+int wait_on_page_private_2_killable(struct page *page);
+
 /*
  * Add an arbitrary waiter to a page's wait queue
  */
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 9d3fbed4e30a..2299e7662ff0 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -29,32 +29,60 @@
 #define TestClearPageFsCache(page) TestClearPagePrivate2((page))
 
 /**
- * unlock_page_fscache - Unlock a page that's locked with PG_fscache
- * @page: The page
+ * set_page_fscache - Set PG_fscache on a page and take a ref
+ * @page: The page.
  *
- * Unlocks a page that's locked with PG_fscache and wakes up sleepers in
- * wait_on_page_fscache().  This page bit is used by the netfs helpers when a
- * netfs page is being written to a local disk cache, thereby allowing writes
- * to the cache for the same page to be serialised.
+ * Set the PG_fscache (PG_private_2) flag on a page and take the reference
+ * needed for the VM to handle its lifetime correctly.  This sets the flag and
+ * takes the reference unconditionally, so care must be taken not to set the
+ * flag again if it's already set.
  */
-static inline void unlock_page_fscache(struct page *page)
+static inline void set_page_fscache(struct page *page)
 {
-   unlock_page_private_2(page);
+   set_page_private_2(page);
 }
 
 /**
- * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
+ * end_page_fscache - Clear PG_fscache and release any waiters
  * @page: The page
  *
- * Wait for the PG_fscache (PG_private_2) page bit to be removed from a page.
- * This is, for example, used to handle a netfs page being written to a local
+ * Clear the PG_fscache (PG_private_2) bit on a page and wake up any sleepers
+ * waiting for this.  The page ref held for PG_private_2 being set is released.
+ *
+ * This is, for example, used when a netfs page is being written to a local
  * disk cache, thereby allowing writes to the cache for the same page to be
  * serialised.
  *

[PATCH 3/3] afs: Use wait_on_page_writeback_killable

2021-03-23 Thread David Howells
From: Matthew Wilcox (Oracle) 

Open-coding this function meant it missed out on the recent bugfix
for waiters being woken by a delayed wake event from a previous
instantiation of the page.

[DH: Changed the patch to use vmf->page rather than the local variable "page",
 which doesn't exist yet upstream]

Fixes: 1cf7a1518aef ("afs: Implement shared-writeable mmap")
Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Christoph Hellwig 
Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux...@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-4-wi...@infradead.org
---

 fs/afs/write.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index c9195fc67fd8..eb737ed63afb 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -851,8 +851,7 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
fscache_wait_on_page_write(vnode->cache, vmf->page);
 #endif
 
-   if (PageWriteback(vmf->page) &&
-   wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)
+   if (wait_on_page_writeback_killable(vmf->page))
return VM_FAULT_RETRY;
 
if (lock_page_killable(vmf->page) < 0)




[PATCH 2/3] mm/writeback: Add wait_on_page_writeback_killable

2021-03-23 Thread David Howells
From: Matthew Wilcox (Oracle) 

This is the killable version of wait_on_page_writeback.

Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Christoph Hellwig 
Signed-off-by: David Howells 
cc: linux-...@lists.infradead.org
cc: linux...@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-3-wi...@infradead.org
---

 include/linux/pagemap.h |1 +
 mm/page-writeback.c |   16 
 2 files changed, 17 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8f4daac6eb4b..8c9947fd62f3 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -682,6 +682,7 @@ static inline int wait_on_page_locked_killable(struct page 
*page)
 
 int put_and_wait_on_page_locked(struct page *page, int state);
 void wait_on_page_writeback(struct page *page);
+int wait_on_page_writeback_killable(struct page *page);
 extern void end_page_writeback(struct page *page);
 void wait_for_stable_page(struct page *page);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index eb34d204d4ee..9e35b636a393 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2833,6 +2833,22 @@ void wait_on_page_writeback(struct page *page)
 }
 EXPORT_SYMBOL_GPL(wait_on_page_writeback);
 
+/*
+ * Wait for a page to complete writeback.  Returns -EINTR if we get a
+ * fatal signal while waiting.
+ */
+int wait_on_page_writeback_killable(struct page *page)
+{
+   while (PageWriteback(page)) {
+   trace_wait_on_page_writeback(page, page_mapping(page));
+   if (wait_on_page_bit_killable(page, PG_writeback))
+   return -EINTR;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(wait_on_page_writeback_killable);
+
 /**
  * wait_for_stable_page() - wait for writeback to finish, if necessary.
  * @page:  The page to wait on.




[PATCH 1/3] fs/cachefiles: Remove wait_bit_key layout dependency

2021-03-23 Thread David Howells
From: Matthew Wilcox (Oracle) 

Cachefiles was relying on wait_page_key and wait_bit_key being the
same layout, which is fragile.  Now that wait_page_key is exposed in
the pagemap.h header, we can remove that fragility.

A comment on the need to maintain structure layout equivalence was added by
Linus[1] and that is no longer applicable.

Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a 
page bit")
Signed-off-by: Matthew Wilcox (Oracle) 
Reviewed-by: Christoph Hellwig 
Signed-off-by: David Howells 
cc: linux-cach...@redhat.com
cc: linux...@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-2-wi...@infradead.org/
Link: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3510ca20ece0150af6b10c77a74ff1b5c198e3e2
 [1]
---

 fs/cachefiles/rdwr.c|7 +++
 include/linux/pagemap.h |1 -
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index e027c718ca01..8ffc40e84a59 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -24,17 +24,16 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, 
unsigned mode,
container_of(wait, struct cachefiles_one_read, monitor);
struct cachefiles_object *object;
struct fscache_retrieval *op = monitor->op;
-   struct wait_bit_key *key = _key;
+   struct wait_page_key *key = _key;
struct page *page = wait->private;
 
ASSERT(key);
 
_enter("{%lu},%u,%d,{%p,%u}",
   monitor->netfs_page->index, mode, sync,
-  key->flags, key->bit_nr);
+  key->page, key->bit_nr);
 
-   if (key->flags != &page->flags ||
-   key->bit_nr != PG_locked)
+   if (key->page != page || key->bit_nr != PG_locked)
return 0;
 
_debug("--- monitor %p %lx ---", page, page->flags);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 20225b067583..8f4daac6eb4b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -559,7 +559,6 @@ static inline pgoff_t linear_page_index(struct 
vm_area_struct *vma,
return pgoff;
 }
 
-/* This has the same layout as wait_bit_key - see fs/cachefiles/rdwr.c */
 struct wait_page_key {
struct page *page;
int bit_nr;




[PATCH 0/3] cachefiles, afs: mm wait fixes

2021-03-23 Thread David Howells


Here are some patches to fix page waiting-related issues in cachefiles and
afs[1]:

 (1) In cachefiles, remove the use of the wait_bit_key struct to access
 something that's actually in wait_page_key format.  The proper struct
 is now available in the header, so that should be used instead.

 (2) Add a proper wait function for waiting killably on the page writeback
 flag.  This includes a recent bugfix here (presumably commit
 c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7).

 (3) In afs, use the function added in (2) rather than using
 wait_on_page_bit_killable() which doesn't have the aforementioned
 bugfix.

 Note that I modified this to work with the upstream code where the
 page pointer isn't cached in a local variable.

The patches can be found here:


https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-fixes

David

Link: https://lore.kernel.org/r/20210320054104.1300774-1-wi...@infradead.org [1]

---
Matthew Wilcox (Oracle) (3):
  fs/cachefiles: Remove wait_bit_key layout dependency
  mm/writeback: Add wait_on_page_writeback_killable
  afs: Use wait_on_page_writeback_killable


 fs/afs/write.c  |  3 +--
 include/linux/pagemap.h |  1 +
 mm/page-writeback.c | 16 
 3 files changed, 18 insertions(+), 2 deletions(-)




Re: [PATCH v5 03/27] afs: Use wait_on_page_writeback_killable

2021-03-23 Thread David Howells
Matthew Wilcox (Oracle)  wrote:

> Open-coding this function meant it missed out on the recent bugfix

Would that be:

c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7
mm: make wait_on_page_writeback() wait for multiple pending writebacks

David


