Hey Corey, thanks for taking a look! On Mon, 2020-09-07 at 19:34 -0500, Corey Minyard wrote: > On Mon, Sep 07, 2020 at 06:25:37PM +0200, Markus Boehme wrote: > > > > We have observed hosts with misbehaving BMCs that receive a Get Channel > > Info command but don't respond. This leads to an indefinite wait in the > > ipmi_msghandler's __scan_channels function, showing up as hung task > > messages for modprobe. > > > > Add a timeout waiting for the channel scan to complete. If the scan > > fails to complete within that time, treat that like IPMI 1.0 and only > > assume the presence of the primary IPMB channel at channel number 0. > [...] > While thinking about this, I realized an issue with these patches. > There should be timers in the lower layers that detect that the BMC does > not respond and should return an error response. This is supposed to be > guaranteed by the lower layer, you shouldn't need timers here. In fact, > if you abort with a timer here, you should get a lower layer reponds > later, causing other issues. > > So, this is wrong. If you are never getting a response, there is a bug > in the lower layer. If you are not getting the error response as > quickly as you would like, I'm not sure what to do about that. >
I see. I might not be able to get hold of more hosts behaving this way, but if I do, I'll dig deeper into why the lower layer timeouts didn't save us here. Thanks for the pointer. > > Signed-off-by: Stefan Nuernberger <s...@amazon.com> > > Signed-off-by: Markus Boehme <mark...@amazon.com> > > --- > > drivers/char/ipmi/ipmi_msghandler.c | 72 > > ++++++++++++++++++++----------------- > > 1 file changed, 39 insertions(+), 33 deletions(-) > > > > diff --git a/drivers/char/ipmi/ipmi_msghandler.c > > b/drivers/char/ipmi/ipmi_msghandler.c > > index 2a2e8b2..9de9ba6 100644 > > --- a/drivers/char/ipmi/ipmi_msghandler.c > > +++ b/drivers/char/ipmi/ipmi_msghandler.c > > @@ -3315,46 +3315,52 @@ channel_handler(struct ipmi_smi *intf, struct > > ipmi_recv_msg *msg) > > */ > > static int __scan_channels(struct ipmi_smi *intf, struct ipmi_device_id > > *id) > > { > > - int rv; > > + long rv; > > + unsigned int set; > > > > - if (ipmi_version_major(id) > 1 > > - || (ipmi_version_major(id) == 1 > > - && ipmi_version_minor(id) >= 5)) { > > - unsigned int set; > > + if (ipmi_version_major(id) == 1 && ipmi_version_minor(id) < 5) { > This is incorrect, it will not correctly handle IPMI 0.x BMCs. Yes, > they exist. Interesting! I wasn't aware of those. Searching the web doesn't turn up much and the spec doesn't mention them either. Are these pre-release implementations of the IPMI 1.0 spec or some kind of "IPMI light"? Markus Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879