** Summary changed:

- kernel panic using CIFS share smb2_push_mandatory_locks
+ kernel panic using CIFS share in smb2_push_mandatory_locks()

** Description changed:

- Description:    Ubuntu 16.04.5 LTS
- Release:        16.04
+ [Impact]
  
- Kernel: 4.15.0-36-generic #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC
- 2018 x86_64 x86_64 x86_64 GNU/Linux
+ * We got reports of a kernel crash in cifs module with the following
+ signature:
  
- Under load, getting a system crash when accessing files on an SMB3
- share. dmesg from crash attached. I can upload the crash dump if needed.
+ BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
+ IP: smb2_push_mandatory_locks+0x10e/0x3b0 [cifs]
+ PGD 0 P4D 0
+ RIP: 0010:smb2_push_mandatory_locks+0x10e/0x3b0 [cifs]
+ Call Trace:
+  cifs_oplock_break+0x12f/0x3d0 [cifs]
+  process_one_work+0x14d/0x410
+  worker_thread+0x4b/0x460
+  kthread+0x105/0x140
+ [...]
  
- Share is mounted with the following options:
+ Low-level analysis (decodecode script output and the objdump of the
+ function) revealed that we are crashing in a NULL ptr dereference when
+ trying to access "cfile->tlink"; below, a snippet of the objdump at
+ function smb2_push_mandatory_locks():
  
- 
"ro,_netdev,username=*****,password=*****,domain=*****,vers=3.02,sec=ntlmsspi,nounix,noserverino,nobrl,cache=none"
+ [...]
+ mov    0x10(%r14),%r15   # %r15 = cifsFileInfo *cfile
+ mov    0x18(%r14),%rbx   # %rbx = cifsLockInfo *li = (fdlocks->locks)
+ lea    0x18(%r14),%r12
+ mov    0x90(%r15),%rax   # %rax = struct tcon_link *tlink (cfile->tlink)
+ cmp    %r12,%rbx
+ mov    0x38(%rax),%rax   # <--- TRAP [trying to get cifs_tcon *tl_tcon]
+ [...]
  
- Dmesg points to the cifs module
+ * After discussing the issue with CIFS maintainers (Steve French and
+ Pavel Shilovsky) they suggested commit b98749cac4a6 ("CIFS: keep
+ FileInfo handle live during oplock break")
+ [http://git.kernel.org/linus/b98749cac4a6] as a fix for multiple reports
+ of this kind of crash.
  
- [ 2192.662345] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000038
- [ 2192.662407] IP: smb2_push_mandatory_locks+0x10e/0x3b0 [cifs]
+ * The fix was sent to stable kernels and is present in Ubuntu kernels
+ 5.0 and newer. We are requesting the SRU for this patch here in order to
+ fix the crashes, after reports of successful testing with the patch (see
+ below section) and since the patch is restricted to the cifs module
+ scope and accepted on linux stable.
+ 
+ * Alternatively the issue is known to be avoided when oplocks are
+ disabled using "cifs.enable_oplocks=N" module parameter.
+ 
+ [Test case]
+ 
+ * Unfortunately we cannot reproduce the issue. The patch proposed here was
+ validated by us with xfstests (instructions followed from 
+ https://wiki.samba.org/index.php/Xfstesting-cifs) and fio. Also, we
+ have a user report of test validation using LISA 
(https://github.com/LIS/LISAv2).
+ 
+ * Using xfstest with the exclusions proposed in the link above we
+ managed to get the same results as a non-patched kernel, i.e., the same
+ tests failed in both kernels, we didn't get worse results with the
+ patch. Fio also didn't show noticeable performance regression with the
+ patch.
+ 
+ [Regression potential]
+ 
+ * The patch was validated by the cifs filesystem maintainers (in fact
+ they suggested its inclusion in Ubuntu) and by the aforementioned tests;
+ also, the scope is restricted to cifs only so the likelihood of
+ regressions is considered low.
+ 
+ *Due to the nature of the code modification (add a new reference of a
+ file handler and manipulate it in different places), I consider that if
+ we have a regression it'll manifest as deadlock/blocked tasks, not
+ something more serious like crashes or data corruption.

** Description changed:

  [Impact]
  
  * We got reports of a kernel crash in cifs module with the following
  signature:
  
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
  IP: smb2_push_mandatory_locks+0x10e/0x3b0 [cifs]
  PGD 0 P4D 0
  RIP: 0010:smb2_push_mandatory_locks+0x10e/0x3b0 [cifs]
  Call Trace:
-  cifs_oplock_break+0x12f/0x3d0 [cifs]
-  process_one_work+0x14d/0x410
-  worker_thread+0x4b/0x460
-  kthread+0x105/0x140
+  cifs_oplock_break+0x12f/0x3d0 [cifs]
+  process_one_work+0x14d/0x410
+  worker_thread+0x4b/0x460
+  kthread+0x105/0x140
  [...]
  
- Low-level analysis (decodecode script output and the objdump of the
+ * Low-level analysis (decodecode script output and the objdump of the
  function) revealed that we are crashing in a NULL ptr dereference when
  trying to access "cfile->tlink"; below, a snippet of the objdump at
  function smb2_push_mandatory_locks():
  
  [...]
  mov    0x10(%r14),%r15   # %r15 = cifsFileInfo *cfile
  mov    0x18(%r14),%rbx   # %rbx = cifsLockInfo *li = (fdlocks->locks)
  lea    0x18(%r14),%r12
  mov    0x90(%r15),%rax   # %rax = struct tcon_link *tlink (cfile->tlink)
  cmp    %r12,%rbx
  mov    0x38(%rax),%rax   # <--- TRAP [trying to get cifs_tcon *tl_tcon]
  [...]
  
  * After discussing the issue with CIFS maintainers (Steve French and
  Pavel Shilovsky) they suggested commit b98749cac4a6 ("CIFS: keep
  FileInfo handle live during oplock break")
  [http://git.kernel.org/linus/b98749cac4a6] as a fix for multiple reports
  of this kind of crash.
  
  * The fix was sent to stable kernels and is present in Ubuntu kernels
  5.0 and newer. We are requesting the SRU for this patch here in order to
  fix the crashes, after reports of successful testing with the patch (see
  below section) and since the patch is restricted to the cifs module
  scope and accepted on linux stable.
  
  * Alternatively the issue is known to be avoided when oplocks are
  disabled using "cifs.enable_oplocks=N" module parameter.
  
  [Test case]
  
  * Unfortunately we cannot reproduce the issue. The patch proposed here was
- validated by us with xfstests (instructions followed from 
+ validated by us with xfstests (instructions followed from
  https://wiki.samba.org/index.php/Xfstesting-cifs) and fio. Also, we
  have a user report of test validation using LISA 
(https://github.com/LIS/LISAv2).
  
  * Using xfstest with the exclusions proposed in the link above we
  managed to get the same results as a non-patched kernel, i.e., the same
  tests failed in both kernels, we didn't get worse results with the
  patch. Fio also didn't show noticeable performance regression with the
  patch.
  
  [Regression potential]
  
  * The patch was validated by the cifs filesystem maintainers (in fact
  they suggested its inclusion in Ubuntu) and by the aforementioned tests;
  also, the scope is restricted to cifs only so the likelihood of
  regressions is considered low.
  
- *Due to the nature of the code modification (add a new reference of a
+ * Due to the nature of the code modification (add a new reference of a
  file handler and manipulate it in different places), I consider that if
  we have a regression it'll manifest as deadlock/blocked tasks, not
  something more serious like crashes or data corruption.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1795659

Title:
  kernel panic using CIFS share in smb2_push_mandatory_locks()

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1795659/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to