Package: smartmontools
Version: 6.5+svn4324-1
Severity: wishlist
Tags: upstream

Dear Maintainer,

Using Areca's official proprietary `cli64` [0] utility to check raidset/volume status (used by Nagios/Icinga monitoring tools) conflicts with `smartctl`. Munin and smartd monitoring fail in rare, but still annoyingly frequent, cases when the `cli64` utility has the `/dev/sg2` device open with an exclusive lock at the same time.

This is an example email sent by `smartd`:

```
This message was generated by the smartd daemon running on:

   host name:  hostname
   DNS domain: my.tld

The following warning/error was logged by the smartd daemon:

Device: /dev/sg2 [areca_disk#01_enc#01], unable to open device

Device info:
WDC WD2005FBYZ-01YCBB2, S/N:WD-WMC..., WWN:5-0014ee-059..., FW:RR07, 2.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.
```

This is an example of running `cli64` and `smartctl` at the same time:

```
# cli64 rsf info &
[1] 3934
# smartctl -d areca,3 /dev/sg2 -x
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.18.0-0.bpo.1-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Unable to open /proc/scsi/sg/device_hdr for reading
do_scsi_cmnd_io with write buffer failed code = ffffffff
Unable to open /proc/scsi/sg/device_hdr for reading
do_scsi_cmnd_io with write buffer failed code = ffffffff
Unable to open /proc/scsi/sg/device_hdr for reading
do_scsi_cmnd_io with write buffer failed code = ffffffff
Smartctl open device: /dev/sg2 [areca_disk#03_enc#01] failed: Input/output error
```

`strace` shows that `cli64` opens the device with `O_EXCL`:

```
# strace -t -f -e open cli64 rsf info
...
11:07:40 open("/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:0:16/0:0:16:0/scsi_generic/sg2/dev", O_RDONLY) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
 #  Name             Disks TotalCap  FreeCap DiskChannels       State
===============================================================================
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
 1  Gold                 4 8000.0GB    0.0GB 139B               Normal
===============================================================================
GuiErrMsg<0x00>: Success.
```

In the end, this produces syslog and email spam and some missing columns in Munin graphs. It also causes problems when a configuration management system tries to detect the available drives "hidden" behind the hardware RAID adapter via `smartctl` calls in order to render the Munin/smartd configuration: the results flap because of random lock conflicts.

This would be "fixable" by retrying at least 2-3 times (probably with a delay of a second or so) when the device being inspected by smartctl is locked at that moment, or perhaps by waiting on the device, or by some similar mechanism.

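To illustrate the retry idea from the caller's side, here is a minimal Python sketch. The name `smartctl_with_retry` and its parameters are made up for this example. It assumes that a lock conflict shows up as bit 1 (value 2, "device open failed") in the smartctl exit status, which I have not confirmed for the Areca case, so treat the retry condition as a guess to adjust.

```python
import subprocess
import time

# Bit 1 of the smartctl exit status is documented as "device open failed".
# Assumption: the O_EXCL conflict with cli64 sets this bit.
OPEN_FAILED = 0x02


def smartctl_with_retry(args, attempts=3, delay=1.0,
                        run=subprocess.run, sleep=time.sleep):
    """Run smartctl with `args`, retrying while the open-failed bit is set.

    `run` and `sleep` are injectable so the logic can be exercised
    without a real device. Returns the last CompletedProcess.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run(["smartctl", *args],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT,
                     universal_newlines=True)
        if not result.returncode & OPEN_FAILED:
            break  # device could be opened; other status bits are kept as-is
        if attempt < attempts:
            sleep(delay)  # give the other tool time to release the device
    return result
```

A call such as `smartctl_with_retry(["-d", "areca,3", "/dev/sg2", "-x"])` would then try up to three times, one second apart, before giving up and returning the failing result.
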
It is possible to work around this issue in your own Salt modules by retrying the `smartctl` call multiple times, but `smartd` would still be affected.

[0] http://www.areca.us/support/s_linux/driver/cli/linuxcli_V1.15.8_180529.zip

-- System Information:
Debian Release: 9.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-0.bpo.1-amd64 (SMP w/6 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages smartmontools depends on:
ii  debianutils          4.8.1.1
ii  init-system-helpers  1.48
ii  libc6                2.24-11+deb9u3
ii  libcap-ng0           0.7.7-3+b1
ii  libgcc1              1:6.3.0-18+deb9u1
ii  libselinux1          2.6-3+b3
ii  libstdc++6           6.3.0-18+deb9u1
ii  lsb-base             9.20161125

Versions of packages smartmontools recommends:
ii  bsd-mailx [mailx]  8.1.2-0.20160123cvs-4

Versions of packages smartmontools suggests:
pn  gsmartcontrol   <none>
pn  smart-notifier  <none>

-- Configuration Files:
/etc/smartd.conf changed:
/dev/sg2 -d areca,1 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sg2 -d areca,3 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sg2 -d areca,9 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sg2 -d areca,11 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner

-- no debconf information