Kernel SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2020-July/thread.html#112154
Updating status to 'In Progress'.

** Changed in: linux (Ubuntu Focal)
       Status: New => In Progress

** Changed in: ubuntu-z-systems
       Status: Triaged => In Progress

** Description changed:

+ SRU Justification:
+ ==================
+
+ [Impact]
+
+ * The Linux kernel panics due to a kernel page fault in IRQ context when
+ zfcp_erp_timeout_handler() calls zfcp_erp_notify().
+
+ [Fix]
+
+ * 936e6b85da0476dd2edac7c51c68072da9fb4ba2 936e6b85da04 "scsi: zfcp: Fix
+ panic on ERP timeout for previously dismissed ERP action"
+
+ [Test Case]
+
+ * Requires an IBM z13/z13s or LinuxONE Rockhopper/Emperor system (or
+ newer) connected to a zfcp-capable storage subsystem.
+
+ * Initiate an ERP timeout (for example by injection, or by otherwise
+ causing a slow recovery).
+
+ * Monitor the system log for any kernel panics.
+
+ [Regression Potential]
+
+ * The regression potential can be considered medium: the modification is
+ platform-specific (limited to s390x) and further limited to the zfcp
+ layer.
+
+ * Within zfcp it is further limited to the error recovery procedure (ERP)
+ of fcp and only touches zfcp_erp.c, which means the code path is mainly
+ active under error conditions.
+
+ [Other]
+
+ * The above fix was accepted upstream in v5.8-rc3, hence it will make its
+ way to groovy with kernel 5.8.
+
+ * Therefore this SRU request was submitted for bionic and focal only and
+ not for groovy.
+
+ __________
+
  Description:   zfcp: Fix panic on ERP timeout for previously dismissed ERP

  Symptom:       Linux kernel panic due to a kernel page fault in IRQ context
                 when running zfcp_erp_timeout_handler() calling
                 zfcp_erp_notify().

  Problem:       Suppose that, for unrelated reasons, FSF requests on behalf
                 of recovery are very slow and can run into the ERP timeout.
                 In the case at hand, we did adapter recovery to a large
                 degree. However, due to the slowness, a LUN open is pending,
                 so the corresponding fc_rport remains blocked. After
                 fast_io_fail_tmo we trigger close physical port recovery for
                 the port under which the LUN should have been opened. The
                 new higher order port recovery dismisses the pending LUN
                 open ERP action and dismisses the pending LUN open FSF
                 request. Such dismissal decouples the ERP action from the
                 pending corresponding FSF request by setting
                 zfcp_fsf_req->erp_action to NULL (among other things)
                 [zfcp_erp_strategy_check_fsfreq()].
                 If the ERP timeout for the pending open LUN request now runs
                 out, we must not use zfcp_fsf_req->erp_action in the ERP
                 timeout handler. This has been a problem since v4.15 commit
                 75492a51568b ("s390/scsi: Convert timers to use
                 timer_setup()"). Before that we intentionally only passed
                 zfcp_erp_action as context argument to
                 zfcp_erp_timeout_handler().
                 Note: The lifetime of the corresponding zfcp_fsf_req object
                 continues until a (late) response or an (unrelated) adapter
                 recovery.

  Solution:      Just like the regular response path ignores dismissed
                 requests [zfcp_fsf_req_complete() =>
                 zfcp_fsf_protstatus_eval() => return early], the ERP timeout
                 handler now needs to ignore dismissed requests. So simply
                 return early in the ERP timeout handler if the FSF request
                 is marked as dismissed in its status flags. To protect
                 against the race where zfcp_erp_strategy_check_fsfreq()
                 dismisses and sets zfcp_fsf_req->erp_action to NULL after
                 our previous status flag check, return early if
                 zfcp_fsf_req->erp_action is NULL. After all, the former ERP
                 action does not need to be woken up as that was already done
                 as part of the dismissal above [zfcp_erp_action_dismiss()].

  Upstream-ID:   936e6b85da0476dd2edac7c51c68072da9fb4ba2 -> kernel 5.8

  Will be integrated into groovy with kernel 5.8.
  Please check that this is also integrated into 20.04.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1887774

Title:
  [UBUNTU 20.04] zfcp: Fix panic on ERP timeout for previously dismissed ERP

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887774/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
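
For reference, the two early returns described under "Solution" in the bug description can be sketched in user-space C. This is only an illustrative model, not the upstream patch: the struct layouts, the flag values, and the names erp_timeout_handler(), erp_notify(), dismiss() and run_demo() are hypothetical stand-ins for the kernel code, and the order of the two stores in dismiss() is an assumption. The kernel handler returns void; the return value here exists only so the outcome can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; they only model the kernel status bits. */
#define FSFREQ_DISMISSED 0x1u /* models ZFCP_STATUS_FSFREQ_DISMISSED */
#define ERP_TIMEDOUT     0x2u /* models ZFCP_STATUS_ERP_TIMEDOUT */

/* Simplified stand-ins for zfcp_erp_action and zfcp_fsf_req. */
struct erp_action {
    unsigned int status;
};

struct fsf_req {
    unsigned int status;
    struct erp_action *erp_action; /* NULL once the request is dismissed */
};

/* Models zfcp_erp_notify(): record the event on the ERP action. */
static void erp_notify(struct erp_action *act, unsigned int set_mask)
{
    act->status |= set_mask;
}

/*
 * Models the dismissal: mark the request as dismissed and decouple it
 * from its ERP action. The store order is an assumption of this sketch.
 */
static void dismiss(struct fsf_req *req)
{
    req->status |= FSFREQ_DISMISSED;
    req->erp_action = NULL;
}

/*
 * Models the fixed timeout handler.
 * Returns 1 if the ERP action was notified, 0 if the timeout was ignored.
 */
static int erp_timeout_handler(struct fsf_req *req)
{
    struct erp_action *act;

    /* First guard: ignore requests already marked as dismissed. */
    if (req->status & FSFREQ_DISMISSED)
        return 0;

    /*
     * Second guard: read the pointer once into a local and test it, so a
     * dismissal that lands after the flag check cannot cause a NULL
     * dereference.
     */
    act = req->erp_action;
    if (act == NULL)
        return 0;

    erp_notify(act, ERP_TIMEDOUT);
    return 1;
}

/* Exercises the three cases; returns how many notifications were delivered. */
static int run_demo(void)
{
    struct erp_action live = { 0 }, gone = { 0 };
    struct fsf_req pending = { 0, &live };
    struct fsf_req dismissed = { 0, &gone };
    /* Models the race: the flag check passes, the pointer read sees NULL. */
    struct fsf_req raced = { 0, NULL };
    int delivered = 0;

    dismiss(&dismissed);
    delivered += erp_timeout_handler(&pending);
    delivered += erp_timeout_handler(&dismissed);
    delivered += erp_timeout_handler(&raced);

    assert(live.status & ERP_TIMEDOUT); /* only the live action was notified */
    assert(gone.status == 0);           /* the dismissed action was left alone */
    return delivered;
}
```

In this model only the still-pending request leads to a notification; the dismissed request and the raced request are both ignored, which corresponds to the two early returns the description calls for. A real kernel handler would additionally need a suitable lock-free read of the pointer, which plain user-space C does not model here.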