** Description changed:

  Hi,

  [Impact]
  Currently in focal, devices reporter recovery is enabled even if state is healthy.
+
+ [fix]
+ 402818205c9e devlink: don't do reporter recovery if the state is healthy
+ This is an upstream commit from kernel v5.5-rc1, which applies cleanly on the focal tree.
+ The commit prevents reporter recovery when the device is in a healthy state.
+ When applied, issuing
+ # devlink health recover pci/0000:05:00.0 reporter fw_fatal
+ on a healthy-state reporter returns successfully, but dmesg is clean and the recover counter does not change.

  [test case]
  1) display devlink health status
  # devlink health show pci/0000:05:00.0 reporter fw_fatal
  pci/0000:05:00.0:
    reporter fw_fatal
      state healthy error 0 recover 0 grace_period 1200000 auto_recover true

  2) perform reporter recovery using devlink,
  # devlink health recover pci/0000:05:00.0 reporter fw_fatal

  3) see that recovery was performed.
  # dmesg
  [776733.438708] mlx5_core 0000:05:00.0: mlx5_health_try_recover:316:(pid 563178): handling bad device here
  [776733.438717] mlx5_core 0000:05:00.0: mlx5_handle_bad_state:278:(pid 563178): Expected to see disabled NIC but it is full driver
  [776735.591522] mlx5_core 0000:05:00.0: mlx5_health_try_recover:328:(pid 563178): starting health recovery flow
  ...

  # devlink health show pci/0000:05:00.0 reporter fw_fatal
  pci/0000:05:00.0:
    reporter fw_fatal
      state healthy error 0 recover 1 grace_period 1200000 auto_recover true

- [fix]
- 402818205c9e devlink: don't do reporter recovery if the state is healthy
- this upstream commit from kernel v5.5-rc1 which is cleanly applied on focal tree.
- the commit prevents reporter recovery when device in healthy state.
- when applied, issuing
- # devlink health recover pci/0000:05:00.0 reporter fw_fatal
- on healthy state reporter return successfully, but dmesg is clean and recover counter do not change.
-
  [Regression Potential]
- very small as it is a very minor change, also this patch has been tested internally on upstream setups for a while and no degradation has been found.
- one obvious change is that a user cannot force devlink recovery when state is healthy but I'm not aware of such use case.
+ Very small, as it is a very minor change. This patch has also been tested internally on upstream setups for a while and no degradation has been found.
+ One obvious change is that a user can no longer force devlink recovery when the state is healthy, but I'm not aware of such a use case.

  Thanks,
  Amir
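
For anyone who wants to script the check in the test case, here is a minimal sketch in Python. It compares the `recover` counter from two captures of `devlink health show` output and assumes the plain-text layout shown in this report, that is, `reporter <name>` followed by key/value pairs. The helper names `parse_health_show` and `recovery_ran` are made up for illustration and are not part of devlink or iproute2.

```python
def parse_health_show(text):
    """Parse `devlink health show` text output into a dict.

    Assumption: the output has the layout shown in this report, i.e.
    '<dev>: reporter <name>' followed by whitespace-separated key/value pairs.
    """
    tokens = text.split()
    idx = tokens.index("reporter")
    fields = {"device": tokens[0].rstrip(":"), "reporter": tokens[idx + 1]}
    rest = tokens[idx + 2:]
    # Pair up the remaining tokens: state healthy, error 0, recover 0, ...
    for key, value in zip(rest[0::2], rest[1::2]):
        fields[key] = value
    return fields


def recovery_ran(before, after):
    """Return True if the recover counter increased between two captures."""
    return int(parse_health_show(after)["recover"]) > int(
        parse_health_show(before)["recover"]
    )


# Captures copied from the test case above (kernel without the fix).
BEFORE = (
    "pci/0000:05:00.0: reporter fw_fatal state healthy error 0 recover 0 "
    "grace_period 1200000 auto_recover true"
)
AFTER = (
    "pci/0000:05:00.0: reporter fw_fatal state healthy error 0 recover 1 "
    "grace_period 1200000 auto_recover true"
)

if __name__ == "__main__":
    # Without the fix the counter goes from 0 to 1, so this reports True.
    print("recovery ran on healthy reporter:", recovery_ran(BEFORE, AFTER))
```

On a kernel with the fix applied, the two captures should be identical for a healthy reporter, so `recovery_ran` would return False.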
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1915403

Title:
  devlink: don't do reporter recovery if the state is healthy

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1915403/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs