Peter,

First: happy new year!

I've been doing some more tests to track down the cause of this bug.
Since it looks like a kernel bug, I tried reproducing it with kernel
3.5.0, version 3.5.0-21.32~precise1. I could still reproduce the faulty
paths that multipathd was unable to remove. However, there were no
hanging processes this time and thus no kernel crash, which is an
improvement. During the test I saw the following:

LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 4:0:1:1 sdi 8:128 active ready running
| `- #:#:#:# -   #:#   active faulty running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 4:0:0:1 sdg 8:96  active ready running
  `- #:#:#:# -   #:#   active faulty running
LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 ,
size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- #:#:#:# -   #:#   failed faulty running
| `- 4:0:0:0 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- #:#:#:# -   #:#   active faulty running
  `- 4:0:1:0 sdh 8:112 active ready running

As you can see, multipathd again fails to remove the 'faulty' paths from
the device mapping. However, for some reason this didn't lead to
processes stuck in 'D' state this time. While the paths were down, the
following messages were logged repeatedly:

Jan  3 10:24:14 ealxs00161 multipathd: sdd: failed to get sysfs information
Jan  3 10:24:14 ealxs00161 multipathd: sdd: unusable path
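
For anyone who wants to see how often this repeats in their own syslog, here is a rough Python sketch that counts the "failed to get sysfs information" messages per device. The regular expression is an assumption based only on the two lines quoted above; the function name is made up for illustration, and other syslog formats (for example with a PID after "multipathd") would need an adjusted pattern.

```python
import re
from collections import Counter

# Assumption: lines look like the ones quoted in this comment, i.e.
# "... multipathd: <dev>: failed to get sysfs information".
SYSFS_FAIL_RE = re.compile(
    r"multipathd: (?P<dev>\w+): failed to get sysfs information"
)


def count_sysfs_failures(lines):
    """Return a Counter mapping device name -> number of matching messages."""
    counts = Counter()
    for line in lines:
        match = SYSFS_FAIL_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# The two log lines from this comment, used as sample input.
SAMPLE_LOG = [
    "Jan  3 10:24:14 ealxs00161 multipathd: sdd: failed to get sysfs information",
    "Jan  3 10:24:14 ealxs00161 multipathd: sdd: unusable path",
]

if __name__ == "__main__":
    for dev, n in count_sysfs_failures(SAMPLE_LOG).items():
        print(dev, n)
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG.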

So multipathd was retrying the removal, but it failed every time. After
I brought the path back up, the paths were restored and everything was
fine again:

LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 4:0:1:1 sdi 8:128 active ready running
| `- 3:0:0:1 sdc 8:32  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 4:0:0:1 sdg 8:96  active ready running
  `- 3:0:1:1 sdf 8:80  active ready running
LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 DGC,VRAID
size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 3:0:1:0 sdd 8:48  active ready running
| `- 4:0:0:0 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 3:0:0:0 sdb 8:16  active ready running
  `- 4:0:1:0 sdh 8:112 active ready running

After this, failing over again worked just fine: the paths that had
failed to be removed the last time were now removed without problems.
Both machines survived about 10 up/down test runs. I'll attach the
syslog of this run shortly.
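
In case it helps others check for the same state, below is a minimal Python sketch that scans `multipath -ll` output for the placeholder paths shown above (the ones listed as `#:#:#:#` with no device node). The regular expressions are assumptions derived only from the output pasted in this comment, and `find_orphaned_paths` is a hypothetical helper name, not part of multipath-tools.

```python
import re

# Map header line, e.g. "LUN-LOGGING (36006016...) dm-2 ,"
MAP_RE = re.compile(r"^(?P<alias>\S+) \((?P<wwid>[0-9a-f]+)\) (?P<dm>dm-\d+)")

# Path line, e.g. "| `- #:#:#:# -   #:#   active faulty running".
# The H:C:T:L and major:minor fields are either numbers or '#' placeholders.
PATH_RE = re.compile(
    r"(?P<hctl>#:#:#:#|\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+"
    r"(?P<majmin>#:#|\d+:\d+)\s+"
    r"(?P<dm_state>\S+)\s+(?P<chk_state>\S+)\s+(?P<online>\S+)\s*$"
)


def find_orphaned_paths(text):
    """Return (map alias, dm state, checker state) for each '#:#:#:#' path."""
    orphans = []
    current_map = None
    for line in text.splitlines():
        header = MAP_RE.match(line)
        if header:
            current_map = header.group("alias")
            continue
        path = PATH_RE.search(line)
        if path and path.group("hctl") == "#:#:#:#":
            orphans.append(
                (current_map, path.group("dm_state"), path.group("chk_state"))
            )
    return orphans


# The failing output from this comment, used as sample input.
SAMPLE = r"""LUN-DATABASE (36006016061e02e003cf1aca4ae07e211) dm-1 DGC,VRAID
size=200G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 4:0:1:1 sdi 8:128 active ready running
| `- #:#:#:# -   #:#   active faulty running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 4:0:0:1 sdg 8:96  active ready running
  `- #:#:#:# -   #:#   active faulty running
LUN-LOGGING (36006016061e02e000286c1adae07e211) dm-2 ,
size=20G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- #:#:#:# -   #:#   failed faulty running
| `- 4:0:0:0 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- #:#:#:# -   #:#   active faulty running
  `- 4:0:1:0 sdh 8:112 active ready running
"""

if __name__ == "__main__":
    for alias, dm_state, chk_state in find_orphaned_paths(SAMPLE):
        print(alias, dm_state, chk_state)
```

On a live system the text could come from running `multipath -ll` and feeding its output to the function. I have only tried the patterns against the listing in this comment, so other multipath-tools versions may format the lines differently.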

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1032550

Title:
  [multipath]  failed to get sysfs information

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1032550/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
