** Description changed:

  [Impact]
  
  The mdadm package is missing the mdcheck script. This has two
  consequences:
  
  In the immediate term, that means that we get failed systemd units on
  all of our physical machines (because they have mirrored disks) as we
  upgrade them to 20.04. This raises alarms in our monitoring system as we
  monitor systemd unit failures.
  
  In the longer-term, this means that the arrays are not being checked. If
  a drive develops a bad sector, this would normally be caught by the
  checking and a good copy would be rewritten from the other side of the
  mirror. Without the checking, that will not happen. If the other drive
  (the one with the good version of the sector) dies, then that sector's
  data is lost permanently. The consequences of that depend on what that
  sector was storing, but it's not good, obviously.
  
  [Test Case]
  
  * systemctl start mdcheck_start.service
  
  * journalctl -u mdcheck_start
  -- Logs begin at Wed 2020-09-23 18:33:35 UTC, end at Wed 2020-09-23 18:40:27 UTC. --
  Sep 23 18:40:27 mdadmgroovy systemd[1]: Starting MD array scrubbing...
  Sep 23 18:40:27 mdadmgroovy systemd[1515]: mdcheck_start.service: Failed to execute command: No such file or directory
  Sep 23 18:40:27 mdadmgroovy systemd[1515]: mdcheck_start.service: Failed at step EXEC spawning /usr/share/mdadm/mdcheck: No such file or directory
  Sep 23 18:40:27 mdadmgroovy systemd[1]: mdcheck_start.service: Main process exited, code=exited, status=203/EXEC
  Sep 23 18:40:27 mdadmgroovy systemd[1]: mdcheck_start.service: Failed with result 'exit-code'.
  Sep 23 18:40:27 mdadmgroovy systemd[1]: Failed to start MD array scrubbing.
  
  * ls -altr /usr/share/mdadm/mdcheck
  ls: cannot access '/usr/share/mdadm/mdcheck': No such file or directory
  
  * dpkg -l mdadm
  ii  mdadm          4.1-5ubuntu1 amd64        tool to administer Linux MD arrays (software RAID)
  
  * dpkg -L mdadm | grep -i mdcheck
  /lib/systemd/system/mdcheck_continue.service
  /lib/systemd/system/mdcheck_continue.timer
  /lib/systemd/system/mdcheck_start.service
  /lib/systemd/system/mdcheck_start.timer
  
  [Regression Potential]
  
  * 'misc/mdcheck' will be introduced in Ubuntu for the first time, and is
  pretty young in the Debian mdadm story too (introduced on Sept 12 2020).
  
  No known fix has been added on top of it since Debian introduced it
  about 2 weeks ago.
  
  $ git log --oneline --grep="mdcheck"
  5a3db0f Install misc/mdcheck; turn on hardening; enable dh_lintian. (Closes: #960132)
  f258a5e mdcheck: improve cleanup
  ea83549 mdcheck: add some logging.
  979b1fe mdcheck: be careful when sourcing the output of "mdadm --detail --export"
  36dab45 mdcheck: don't git error if not /dev/md?* devices exist.
  868ab80 mdcheck: don't pass the '+' to "date".
  df881f7 mdcheck: new script to help with regular checks of md arrays.
  
  No new bugs related to the introduction of mdcheck have been opened
  either.
  
  From code inspection, the 'mdcheck' script seems to be harmless (at
  least at first glance). Of course, real-world testing across RAID
  types will be needed during the verification phase to confirm this
  and, if possible, running the script in debug mode (set -xv) might be
  a good idea to see its workflow in action.
  
  This change will permit 'mdcheck' to be run on the first Sunday of each
  month for 6 hours (mdcheck_start.timer: OnCalendar=Sun *-*-1..7
  1:00:00), then on every subsequent morning until the check is finished
  (mdcheck_continue.timer:OnCalendar=daily).
  
  It's not a script that one would typically run manually on a regular
  basis.
  
  The script uses 'logger' to enter messages into the system log, so we
  will have a trace of its execution (in addition to the usual systemd
  unit and timer logs).
  
  I would suggest we don't release the package in focal-updates before
- having at least one sample of a scheduled execution on the first Sunday
- of the month (October 4th ?), and have an impacted user to report
- feedback supported with logs.
+ having at least one sample of a 'natural' scheduled execution on the
+ first Sunday of the month (the next one should be October 4th), and
+ having impacted users report feedback supported with logs.
+ 
+ I think running it on Sunday is reasonable (just like fstrim, zfs
+ scrub, ...). Sunday is typically the day when cron jobs and timers run
+ this kind of maintenance.
+ 
+ One thing I would like to confirm, though maybe not a blocker for this
+ case, is that mdcheck_continue starts fine when its conditions are
+ met.
  
  [Other Info]
  
  Debian bug:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960132
  
  salsa commit:
  https://salsa.debian.org/lechner/mdadm/-/commit/5a3db0f5429fc81e0f53cbf9aa473059b74fe057
  
  [Original Description]
  
  mdcheck_start.service trying to start unexisting file
  
  root@d:~# cat /lib/systemd/system/mdcheck_start.service  | grep Exec
  ExecStart=/usr/share/mdadm/mdcheck --duration $MDADM_CHECK_DURATION
  
  root@d:~# ls -la /usr/share/mdadm/mdcheck
  ls: cannot access '/usr/share/mdadm/mdcheck': No such file or directory
  
  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: mdadm 4.1-2ubuntu3
  ProcVersionSignature: Ubuntu 5.3.0-19.20-generic 5.3.1
  Uname: Linux 5.3.0-19-generic x86_64
  ApportVersion: 2.20.11-0ubuntu8.2
  Architecture: amd64
  Date: Fri Nov 15 13:13:17 2019
  Lspci: Error: [Errno 2] No such file or directory: 'lspci': 'lspci'
  Lsusb: Error: [Errno 2] No such file or directory: 'lsusb': 'lsusb'
  MachineType: HP HP EliteBook x360 1030 G3
  ProcEnviron:
   LANG=C
   TERM=screen
   PATH=(custom, no user)
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-19-generic root=/dev/mapper/system-root ro cryptdevice=UUID=95c107ea-73d0-4206-a31c-fb0ed6d7d6a9:cryptlvm mem_sleep_default=deep
  ProcMDstat:
   Personalities :
   unused devices: <none>
  SourcePackage: mdadm
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 08/07/2019
  dmi.bios.vendor: HP
  dmi.bios.version: Q90 Ver. 01.08.01
  dmi.board.name: 8438
  dmi.board.vendor: HP
  dmi.board.version: KBC Version 14.3F.00
  dmi.chassis.asset.tag: 5CD9296RDC
  dmi.chassis.type: 31
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrQ90Ver.01.08.01:bd08/07/2019:svnHP:pnHPEliteBookx3601030G3:pvr:rvnHP:rn8438:rvrKBCVersion14.3F.00:cvnHP:ct31:cvr:
  dmi.product.family: 103C_5336AN HP EliteBook x360
  dmi.product.name: HP EliteBook x360 1030 G3
  dmi.product.sku: 5SR46ES#ACB
  dmi.sys.vendor: HP
  etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'
  initrd.files: Error: [Errno 2] No such file or directory: '/boot/initrd.img-5.3.0-19-generic'

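The [Test Case] and [Original Description] above both come down to one check: does the binary named in ExecStart= of mdcheck_start.service exist? A minimal sketch of that check follows. The function names unit_exec_path and check_unit_exec are hypothetical helpers, not part of mdadm or systemd, and the parsing only handles a simple single-line ExecStart= entry.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped by mdadm or systemd).

# Print the executable named by the first ExecStart= line of a unit file.
# Any systemd prefix characters (-, @, :, +, !) before the path are skipped.
unit_exec_path() {
    sed -n 's/^ExecStart=[-@:+!]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Report whether that executable exists and is executable.
check_unit_exec() {
    path=$(unit_exec_path "$1")
    if [ -n "$path" ] && [ -x "$path" ]; then
        echo "ok: $path"
    else
        echo "missing: $path"
        return 1
    fi
}
```

On an affected system, running `check_unit_exec /lib/systemd/system/mdcheck_start.service` would be expected to report /usr/share/mdadm/mdcheck as missing, matching the `ls` output in the report.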
** Description changed:

  [Impact]
  
  The mdadm package is missing the mdcheck script. This has two
  consequences:
  
  In the immediate term, that means that we get failed systemd units on
  all of our physical machines (because they have mirrored disks) as we
  upgrade them to 20.04. This raises alarms in our monitoring system as we
  monitor systemd unit failures.
  
  In the longer-term, this means that the arrays are not being checked. If
  a drive develops a bad sector, this would normally be caught by the
  checking and a good copy would be rewritten from the other side of the
  mirror. Without the checking, that will not happen. If the other drive
  (the one with the good version of the sector) dies, then that sector's
  data is lost permanently. The consequences of that depend on what that
  sector was storing, but it's not good, obviously.
  
  [Test Case]
  
  * systemctl start mdcheck_start.service
  
  * journalctl -u mdcheck_start
  -- Logs begin at Wed 2020-09-23 18:33:35 UTC, end at Wed 2020-09-23 18:40:27 UTC. --
  Sep 23 18:40:27 mdadmgroovy systemd[1]: Starting MD array scrubbing...
  Sep 23 18:40:27 mdadmgroovy systemd[1515]: mdcheck_start.service: Failed to execute command: No such file or directory
  Sep 23 18:40:27 mdadmgroovy systemd[1515]: mdcheck_start.service: Failed at step EXEC spawning /usr/share/mdadm/mdcheck: No such file or directory
  Sep 23 18:40:27 mdadmgroovy systemd[1]: mdcheck_start.service: Main process exited, code=exited, status=203/EXEC
  Sep 23 18:40:27 mdadmgroovy systemd[1]: mdcheck_start.service: Failed with result 'exit-code'.
  Sep 23 18:40:27 mdadmgroovy systemd[1]: Failed to start MD array scrubbing.
  
  * ls -altr /usr/share/mdadm/mdcheck
  ls: cannot access '/usr/share/mdadm/mdcheck': No such file or directory
  
  * dpkg -l mdadm
  ii  mdadm          4.1-5ubuntu1 amd64        tool to administer Linux MD arrays (software RAID)
  
  * dpkg -L mdadm | grep -i mdcheck
  /lib/systemd/system/mdcheck_continue.service
  /lib/systemd/system/mdcheck_continue.timer
  /lib/systemd/system/mdcheck_start.service
  /lib/systemd/system/mdcheck_start.timer
  
  [Regression Potential]
  
  * 'misc/mdcheck' will be introduced in Ubuntu for the first time, and is
  pretty young in the Debian mdadm story too (introduced on Sept 12 2020).
  
  No known fix has been added on top of it since Debian introduced it
  about 2 weeks ago.
  
  $ git log --oneline --grep="mdcheck"
  5a3db0f Install misc/mdcheck; turn on hardening; enable dh_lintian. (Closes: #960132)
  f258a5e mdcheck: improve cleanup
  ea83549 mdcheck: add some logging.
  979b1fe mdcheck: be careful when sourcing the output of "mdadm --detail --export"
  36dab45 mdcheck: don't git error if not /dev/md?* devices exist.
  868ab80 mdcheck: don't pass the '+' to "date".
  df881f7 mdcheck: new script to help with regular checks of md arrays.
  
  No new bugs related to the introduction of mdcheck have been opened
  either.
  
  From code inspection, the 'mdcheck' script seems to be harmless (at
  least at first glance). Of course, real-world testing across RAID
  types will be needed during the verification phase to confirm this
  and, if possible, running the script in debug mode (set -xv) might be
  a good idea to see its workflow in action.
  
  This change will permit 'mdcheck' to be run on the first Sunday of each
  month for 6 hours (mdcheck_start.timer: OnCalendar=Sun *-*-1..7
  1:00:00), then on every subsequent morning until the check is finished
  (mdcheck_continue.timer:OnCalendar=daily).
  
  It's not a script that one would typically run manually on a regular
  basis.
  
  The script uses 'logger' to enter messages into the system log, so we
  will have a trace of its execution (in addition to the usual systemd
  unit and timer logs).
  
  I would suggest we don't release the package in focal-updates before
  having at least one sample of a 'natural' scheduled execution on the
  first Sunday of the month (the next one should be October 4th), and
  having impacted users report feedback supported with logs.
  
  I think running it on Sunday is reasonable (just like fstrim, zfs
  scrub, ...). Sunday is typically the day when cron jobs and timers run
  this kind of maintenance.
  
  One thing I would like to confirm, though maybe not a blocker for this
- case, is to make sure mdcheck_continue start fine when condition are
- met.
+ case, is to make sure 'mdcheck_continue' starts fine when conditions
+ are met, since it has never been tested: 'mdcheck_start' always failed
+ because of the missing 'mdcheck' script.
  
  [Other Info]
  
  Debian bug:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960132
  
  salsa commit:
  https://salsa.debian.org/lechner/mdadm/-/commit/5a3db0f5429fc81e0f53cbf9aa473059b74fe057
  
  [Original Description]
  
  mdcheck_start.service trying to start unexisting file
  
  root@d:~# cat /lib/systemd/system/mdcheck_start.service  | grep Exec
  ExecStart=/usr/share/mdadm/mdcheck --duration $MDADM_CHECK_DURATION
  
  root@d:~# ls -la /usr/share/mdadm/mdcheck
  ls: cannot access '/usr/share/mdadm/mdcheck': No such file or directory
  
  ProblemType: Bug
  DistroRelease: Ubuntu 19.10
  Package: mdadm 4.1-2ubuntu3
  ProcVersionSignature: Ubuntu 5.3.0-19.20-generic 5.3.1
  Uname: Linux 5.3.0-19-generic x86_64
  ApportVersion: 2.20.11-0ubuntu8.2
  Architecture: amd64
  Date: Fri Nov 15 13:13:17 2019
  Lspci: Error: [Errno 2] No such file or directory: 'lspci': 'lspci'
  Lsusb: Error: [Errno 2] No such file or directory: 'lsusb': 'lsusb'
  MachineType: HP HP EliteBook x360 1030 G3
  ProcEnviron:
   LANG=C
   TERM=screen
   PATH=(custom, no user)
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-19-generic root=/dev/mapper/system-root ro cryptdevice=UUID=95c107ea-73d0-4206-a31c-fb0ed6d7d6a9:cryptlvm mem_sleep_default=deep
  ProcMDstat:
   Personalities :
   unused devices: <none>
  SourcePackage: mdadm
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 08/07/2019
  dmi.bios.vendor: HP
  dmi.bios.version: Q90 Ver. 01.08.01
  dmi.board.name: 8438
  dmi.board.vendor: HP
  dmi.board.version: KBC Version 14.3F.00
  dmi.chassis.asset.tag: 5CD9296RDC
  dmi.chassis.type: 31
  dmi.chassis.vendor: HP
  dmi.modalias: dmi:bvnHP:bvrQ90Ver.01.08.01:bd08/07/2019:svnHP:pnHPEliteBookx3601030G3:pvr:rvnHP:rn8438:rvrKBCVersion14.3F.00:cvnHP:ct31:cvr:
  dmi.product.family: 103C_5336AN HP EliteBook x360
  dmi.product.name: HP EliteBook x360 1030 G3
  dmi.product.sku: 5SR46ES#ACB
  dmi.sys.vendor: HP
  etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'
  initrd.files: Error: [Errno 2] No such file or directory: '/boot/initrd.img-5.3.0-19-generic'

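The schedule quoted in [Regression Potential] (mdcheck_start.timer: OnCalendar=Sun *-*-1..7 1:00:00) selects the one Sunday that falls within the first seven days of a month. To double-check the "October 4th" estimate, the date can be computed by hand. This is a rough sketch only: first_sunday is a hypothetical helper, it assumes GNU date (for the -d option), and it does not consult systemd at all.

```shell
#!/bin/sh
# Hypothetical helper: print the first Sunday of a month as YYYY-MM-DD,
# i.e. the only day that can match "Sun *-*-1..7".
# Usage: first_sunday YEAR MONTH   (MONTH without a leading zero)
# Assumes GNU date, which accepts -d DATE.
first_sunday() {
    year=$1
    month=$2
    for day in 1 2 3 4 5 6 7; do
        d=$(printf '%04d-%02d-%02d' "$year" "$month" "$day")
        # %u prints the ISO weekday number: 1 = Monday ... 7 = Sunday
        if [ "$(date -u -d "$d" +%u)" = 7 ]; then
            echo "$d"
            return 0
        fi
    done
    return 1
}

first_sunday 2020 10   # prints 2020-10-04
```

Where systemd is available, `systemd-analyze calendar "Sun *-*-1..7 1:00:00"` should give the authoritative next elapse time for the same expression.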
-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1852747

Title:
  mdcheck_start.service trying to start unexisting file

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1852747/+subscriptions
