------- Comment From bbl...@de.ibm.com 2020-05-28 14:37 EDT-------
(In reply to comment #20)
> Hi Benjamin,
> if it's an issue somewhere in scsi-midlayer/block-layer/wbt wouldn't it then
> also happen with zFCP on DS8k and on other platforms?
> So far we did some testing with zFCP on DS8k (the only storage sub-system we
> have) as part of the release testing and server certification and on top we
> have constantly several zFCP systems currently running on 20.04 (probably
> smaller systems and/or with less load), but so far we haven't faced a single
> crash.
> So I'm assuming rather that it is XIV related, no?

Hey Frank,

I suspect this is a follow-on error from SCSI requests running into
timeouts and subsequently being aborted, and LUN/Target resets being sent
by the SCSI Error Handling code. Those cause abnormal request
terminations (it's rather unusual to have request timeouts) that might
cause this WBT crash. At least that is my working theory so far.

In parallel to this report, I am looking internally into why the
requests time out in the first place. But anyway, I don't think it should
crash even with the timeouts. The last test also shows that if we
disable WBT the setup doesn't seem to crash anymore, although the
timeouts are still present - it "just" slows the workload for a time,
but ultimately recovers.
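
For reference, here is a minimal sketch of one way writeback throttling
(WBT) can be switched off for a single block device. The comment does not
say how WBT was disabled in the test, so this is an assumption: it relies
on the kernel's queue/wbt_lat_usec sysfs attribute, where writing 0
disables throttling. The helper name disable_wbt and its sysfs_root
parameter are hypothetical and exist only for this illustration.

```python
from pathlib import Path


def disable_wbt(device: str, sysfs_root: str = "/sys") -> str:
    """Disable writeback throttling for one block device.

    Writes 0 to <sysfs_root>/block/<device>/queue/wbt_lat_usec and
    returns the value that was set before, so it can be restored later.
    Assumption: the running kernel exposes this attribute (WBT built in).
    """
    attr = Path(sysfs_root) / "block" / device / "queue" / "wbt_lat_usec"
    previous = attr.read_text().strip()  # remember the old latency target
    attr.write_text("0\n")               # 0 turns writeback throttling off
    return previous
```

On a real system this needs root privileges and would be called as
disable_wbt("sda"), where "sda" stands in for whichever SCSI disk is
affected. The setting does not persist across reboots.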

At this point I don't have any evidence that XIV causes this problem.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1881109

Title:
  [Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be
  triggered by scsi errors.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1881109/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
