Public bug reported:

{forward from James Troup}:

Just to follow up on this with a little more information: we have now
reproduced this in the following scenarios:

 * Ubuntu kernel 4.4 (i.e. 16.04) and kernel 4.8 (i.e. HWE-Y)
 * With and without Bcache involved
 * With both XFS and ext4
 * With HIO driver versions 2.1.0-23 and 2.1.0-25
 * With HIO Firmware 640 and 650
 * With and without the following two patches:
   - https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial/commit/?id=7290fa97b945c288d8dd8eb8f284b98cb495b35b
   - https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial/commit/?id=901a3142db778ddb9ed6a9000ce8e5b0f66c48ba

In all cases, we applied the following two patches in order to get hio
to build at all with a 4.4 or later kernel:

  https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial/commit/?id=0abbb90372847caeeedeaa9db0f21e05ad8e9c74
  https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/xenial/commit/?id=a0705c5ff3d12fc31f18f5d3c8589eaaed1aa577

We've confirmed that we can reproduce the corruption on any machine in
Tele2's Vienna facility.

We've confirmed that, on all but one machine, the 'hio_info' command
reports the health as 'OK'.

Our most common reproducer is one of two scenarios:

 a) http://paste.ubuntu.com/23405150/

 b) http://paste.ubuntu.com/23405234/

In the last example, corruption shows up sooner as the 'count' argument
to dd is increased, and can be avoided by lowering it. For example, on
the machine I'm currently testing on, count=52450 doesn't appear to
show corruption, but a count of even 53000 shows it immediately every
time.
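
For anyone who wants to look for the count threshold without the pastes
to hand, the following is a minimal sketch of a write-then-verify check.
It is not the reproducer from the pastes above: the block size, the data
pattern and the helper names (write_pattern, read_digest, check) are all
assumptions made for illustration. Note that a read immediately after a
write may be served from the page cache, so on a real device the caches
would likely need to be dropped (e.g. via /proc/sys/vm/drop_caches)
between the write and the read for the check to exercise the SSD.

```python
import hashlib
import os
import tempfile

# Assumed block size; the value used in the original reproducer is not stated.
BLOCK_SIZE = 4096


def write_pattern(path, count, block_size=BLOCK_SIZE, seed=b"hio"):
    """Write `count` deterministic blocks to `path`, fsync the file,
    and return the SHA-256 hex digest of the data that was written."""
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        for i in range(count):
            # Derive each block from its index so that a misplaced or
            # stale block changes the digest on read-back.
            chunk = hashlib.sha256(seed + i.to_bytes(8, "little")).digest()
            block = (chunk * (block_size // len(chunk) + 1))[:block_size]
            f.write(block)
            digest.update(block)
        f.flush()
        os.fsync(f.fileno())
    return digest.hexdigest()


def read_digest(path, block_size=BLOCK_SIZE):
    """Read `path` back in blocks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def check(path, count):
    """Return True if the data read back matches what was written."""
    return write_pattern(path, count) == read_digest(path)


if __name__ == "__main__":
    # Hypothetical usage against a temporary file; on a test machine the
    # path would instead point at a file on the filesystem under test.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "testfile")
        for count in (52450, 53000):
            print(count, "ok" if check(target, count) else "MISMATCH")
```

Running check() with increasing count values against a file on the
affected filesystem would be one way to narrow down where mismatches
start on a given machine.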

I hope this helps - please let us know what further information we can
provide to debug this problem.

** Affects: linux (Ubuntu)
     Importance: High
     Assignee: Kamal Mostafa (kamalmostafa)
         Status: In Progress

** Affects: linux (Ubuntu Xenial)
     Importance: High
     Assignee: Kamal Mostafa (kamalmostafa)
         Status: In Progress

** Affects: linux (Ubuntu Yakkety)
     Importance: High
     Assignee: Kamal Mostafa (kamalmostafa)
         Status: In Progress

** Affects: linux (Ubuntu Zesty)
     Importance: High
     Assignee: Kamal Mostafa (kamalmostafa)
         Status: In Progress


** Tags: bot-stop-nagging

** Also affects: linux (Ubuntu Yakkety)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Zesty)
   Importance: High
     Assignee: Kamal Mostafa (kamalmostafa)
       Status: In Progress

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Yakkety)
       Status: New => In Progress

** Changed in: linux (Ubuntu Xenial)
       Status: New => In Progress

** Changed in: linux (Ubuntu Yakkety)
     Assignee: (unassigned) => Kamal Mostafa (kamalmostafa)

** Changed in: linux (Ubuntu Xenial)
     Assignee: (unassigned) => Kamal Mostafa (kamalmostafa)

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Yakkety)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1638700

Title:
  hio: SSD data corruption under stress test

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1638700/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
