** Attachment added: "reproduce-ceph-punch-hole-corruption.py"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2144592/+attachment/5953206/+files/reproduce-ceph-punch-hole-corruption.py

** Description changed:

  Running Ceph FS on Ubuntu 24.04 (6.8 kernel) - Ubuntu
  6.8.0-100.100-generic 6.8.12
  
  The enclosed script, reproduce-ceph-punch-hole-corruption.py, exposes an
  issue we have found: on recent kernels, CephFS silently corrupts the 16KB
  of data immediately before the requested hole when punching a hole in a
  file (the test uses fallocate()). Corruption occurs only when the hole
  touches or crosses a 4MB RADOS object boundary (4MB is the default
  stripe size).
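
For readers without the attachment, the check described above can be sketched roughly as follows. This is not the attached reproducer. It is a minimal illustration that assumes 64-bit Linux (so that off_t is 64 bits wide) and calls fallocate() through ctypes. The helper names punch_hole and guard_region_intact are hypothetical, and the real script may take further steps that this sketch omits.

```python
import ctypes
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB object size reported by the script
GUARD = 16 * 1024              # the 16KB before the hole that gets checked

# Assumption: 64-bit Linux, where off_t is a 64-bit integer.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) without changing the file size."""
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def guard_region_intact(path, hole_start, hole_len):
    """Hypothetical single test: fill two objects with 0xFF, punch a hole,
    then re-read the 16KB immediately before the hole."""
    try:
        with open(path, "w+b") as f:
            f.write(b"\xff" * (2 * OBJECT_SIZE))
            f.flush()
            os.fsync(f.fileno())
            punch_hole(f.fileno(), hole_start, hole_len)
            f.seek(hole_start - GUARD)
            return f.read(GUARD) == b"\xff" * GUARD
    finally:
        if os.path.exists(path):
            os.unlink(path)
```

On an unaffected filesystem, guard_region_intact("/some/mount/probe.bin", 4190208, 8192) would be expected to return True. The path here is a placeholder.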
  
  Execution shows the corruption:
  
  root@EdgeOS-5HB6Q54:/home/eceuser# python3 ./reproduce-ceph-punch-hole-corruption.py /Shared_DataStore/
  CephFS PUNCH_HOLE data corruption reproducer
  ============================================================
  Mount point: /Shared_DataStore/
  Object size: 4194304 (4 MiB)
  
  Tests crossing 4MB object boundary (expect FAIL on buggy kernels):
  ------------------------------------------------------------
-   FAIL  1 page before boundary, 2 pages
-         hole=[4190208, 4198400)  checked=[4173824, 4190208)
-         16384/16384 bytes read as 0x00 (expected 0xFF)
-   FAIL  2 pages before boundary, 4 pages
-         hole=[4186112, 4202496)  checked=[4169728, 4186112)
-         16384/16384 bytes read as 0x00 (expected 0xFF)
-   FAIL  4 pages before boundary, 8 pages
-         hole=[4177920, 4210688)  checked=[4161536, 4177920)
-         16384/16384 bytes read as 0x00 (expected 0xFF)
-   FAIL  ends at boundary, 2 pages
-         hole=[4186112, 4194304)  checked=[4169728, 4186112)
-         16384/16384 bytes read as 0x00 (expected 0xFF)
-   FAIL  ends at boundary, 1 page
-         hole=[4190208, 4194304)  checked=[4173824, 4190208)
-         16384/16384 bytes read as 0x00 (expected 0xFF)
+   FAIL  1 page before boundary, 2 pages
+         hole=[4190208, 4198400)  checked=[4173824, 4190208)
+         16384/16384 bytes read as 0x00 (expected 0xFF)
+   FAIL  2 pages before boundary, 4 pages
+         hole=[4186112, 4202496)  checked=[4169728, 4186112)
+         16384/16384 bytes read as 0x00 (expected 0xFF)
+   FAIL  4 pages before boundary, 8 pages
+         hole=[4177920, 4210688)  checked=[4161536, 4177920)
+         16384/16384 bytes read as 0x00 (expected 0xFF)
+   FAIL  ends at boundary, 2 pages
+         hole=[4186112, 4194304)  checked=[4169728, 4186112)
+         16384/16384 bytes read as 0x00 (expected 0xFF)
+   FAIL  ends at boundary, 1 page
+         hole=[4190208, 4194304)  checked=[4173824, 4190208)
+         16384/16384 bytes read as 0x00 (expected 0xFF)
  
  Tests NOT crossing boundary (should always PASS):
  ------------------------------------------------------------
-   PASS  within object 0
-         hole=[4161536, 4169728)  checked=[4145152, 4161536)
-   PASS  mid object 0
-         hole=[1048576, 1056768)  checked=[1032192, 1048576)
-   PASS  start of object 1
-         hole=[4194304, 4202496)  checked=[4177920, 4194304)
-   PASS  within object 1
-         hole=[5242880, 5251072)  checked=[5226496, 5242880)
+   PASS  within object 0
+         hole=[4161536, 4169728)  checked=[4145152, 4161536)
+   PASS  mid object 0
+         hole=[1048576, 1056768)  checked=[1032192, 1048576)
+   PASS  start of object 1
+         hole=[4194304, 4202496)  checked=[4177920, 4194304)
+   PASS  within object 1
+         hole=[5242880, 5251072)  checked=[5226496, 5242880)
  
  ============================================================
  Results: 4 passed, 5 failed out of 9
  
  BUG CONFIRMED: This kernel has the CephFS PUNCH_HOLE corruption bug.
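
The pass/fail pattern in the run above can be summarised arithmetically. The sketch below is only an observation drawn from the nine logged cases, not a statement about the kernel code: a case failed exactly when the half-open hole [start, end) ended at or beyond the end of the object it started in, and the checked range was always the 16384 bytes before the hole. The function names are hypothetical.

```python
OBJECT_SIZE = 4194304  # "Object size: 4194304 (4 MiB)" from the output
GUARD = 16384          # size of the checked region before each hole


def checked_range(hole_start):
    """Half-open range the script reports as checked=[...)."""
    return (hole_start - GUARD, hole_start)


def reaches_object_end(hole_start, hole_end, object_size=OBJECT_SIZE):
    """True when [hole_start, hole_end) ends at or past the end of the
    object containing hole_start (assumes hole_end > hole_start)."""
    return hole_start // object_size != hole_end // object_size


# (hole_start, hole_end, failed_on_buggy_kernel), copied from the log above.
CASES = [
    (4190208, 4198400, True),
    (4186112, 4202496, True),
    (4177920, 4210688, True),
    (4186112, 4194304, True),
    (4190208, 4194304, True),
    (4161536, 4169728, False),
    (1048576, 1056768, False),
    (4194304, 4202496, False),
    (5242880, 5251072, False),
]

for start, end, failed in CASES:
    assert reaches_object_end(start, end) == failed
```

Note that the "start of object 1" case, whose hole begins exactly on the boundary, passed. In this log, only holes that end on the boundary or extend across it were affected.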
  
- Enclosed is a patch submission detailing issue, 0001-ceph-fix-data-
- corruption-from-short-read-on-punch-hole.patch
+ Enclosed is a patch submission detailing issue (AI created): 0001-ceph-
+ fix-data-corruption-from-short-read-on-punch-hole.patch
  
  With the patch applied, the test script now passes:
  root@EdgeOS-3CD6Q54:~# python3 /home/eceuser/reproduce-ceph-punch-hole-corruption.py /Shared_DataStore/
  CephFS PUNCH_HOLE data corruption reproducer
  ============================================================
  Mount point: /Shared_DataStore/
  Object size: 4194304 (4 MiB)
  
  Tests crossing 4MB object boundary (expect FAIL on buggy kernels):
  ------------------------------------------------------------
-   PASS  1 page before boundary, 2 pages
-         hole=[4190208, 4198400)  checked=[4173824, 4190208)
-   PASS  2 pages before boundary, 4 pages
-         hole=[4186112, 4202496)  checked=[4169728, 4186112)
-   PASS  4 pages before boundary, 8 pages
-         hole=[4177920, 4210688)  checked=[4161536, 4177920)
-   PASS  ends at boundary, 2 pages
-         hole=[4186112, 4194304)  checked=[4169728, 4186112)
-   PASS  ends at boundary, 1 page
-         hole=[4190208, 4194304)  checked=[4173824, 4190208)
+   PASS  1 page before boundary, 2 pages
+         hole=[4190208, 4198400)  checked=[4173824, 4190208)
+   PASS  2 pages before boundary, 4 pages
+         hole=[4186112, 4202496)  checked=[4169728, 4186112)
+   PASS  4 pages before boundary, 8 pages
+         hole=[4177920, 4210688)  checked=[4161536, 4177920)
+   PASS  ends at boundary, 2 pages
+         hole=[4186112, 4194304)  checked=[4169728, 4186112)
+   PASS  ends at boundary, 1 page
+         hole=[4190208, 4194304)  checked=[4173824, 4190208)
  
  Tests NOT crossing boundary (should always PASS):
  ------------------------------------------------------------
-   PASS  within object 0
-         hole=[4161536, 4169728)  checked=[4145152, 4161536)
-   PASS  mid object 0
-         hole=[1048576, 1056768)  checked=[1032192, 1048576)
-   PASS  start of object 1
-         hole=[4194304, 4202496)  checked=[4177920, 4194304)
-   PASS  within object 1
-         hole=[5242880, 5251072)  checked=[5226496, 5242880)
+   PASS  within object 0
+         hole=[4161536, 4169728)  checked=[4145152, 4161536)
+   PASS  mid object 0
+         hole=[1048576, 1056768)  checked=[1032192, 1048576)
+   PASS  start of object 1
+         hole=[4194304, 4202496)  checked=[4177920, 4194304)
+   PASS  within object 1
+         hole=[5242880, 5251072)  checked=[5226496, 5242880)
  
  ============================================================
  Results: 9 passed, 0 failed out of 9
  
  All tests passed. This kernel is not affected (or the fix is applied).
  
  The following commit appears to cause the issue:
  92b6cc5d1e7c ("netfs: Add iov_iters to (sub)requests to describe various
  buffers") by David Howells, authored 2023-09-27, committed 2023-12-24.
  Merged in v6.8-rc1.
  
  This is present only in the 6.8 and 6.9 kernels. 6.10 rewrote this code
  in ee4cdf7ba857 ("netfs: Speed up buffered reading", by David Howells,
  2024-07-02, merged in v6.10), which no longer has this issue.
  
  We are asking either for the enclosed patch to be reviewed for inclusion
  in stable, or for guidance on another or better way to fix this.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2144592

Title:
  Punching hole through CephFS hosted file causes corruption when
  crossing 4MB RADOS object boundary

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2144592/+subscriptions


