Hi, Paweł, Thanks for your help. I was just starting to think the same thing, the code seemed to be trying to put all cpus into WFS. I tested your patch on the 5.12 commit that broke and also the latest 5.18 git kernel and indeed it works fine. On a quad core I can see 3 cores go into WFS and then tboot continues take bfs to s3 state. Very good!
Derek On 5/27/22 9:50 AM, Randzio, Pawel wrote: > Hi Derek, > > I wish to gladly inform you that I've fixed the bug preventing suspend with > Tboot. > > It was in the piece of code that you've highlighted - the atomic increment on > ap_wfs_count variable need have been moved into the else clause of the > following if statement, something like this: > > static int tboot_dying_cpu(unsigned int cpu) { > if (num_online_cpus() == 1) { > if (tboot_wait_for_aps(atomic_read(&ap_wfs_count))) > return -EBUSY; > } else > atomic_inc(&ap_wfs_count); > return 0; > } > > Otherwise, when the procedure shutdown all APs, BSP was also counted in as > one of them, thus creating a mismatch between what kernel had counted and > what TBOOT reported back to it. > This caused a kernel panic, which froze the machine while putting it into > suspend. > > I've sent a version of TBOOT Live Image equipped with this patch of the > kernel to our validation teams and I'm awaiting their response after the > weekend. > If I get positive results from their tests I will try and have this patch > attached to the official kernel repository and featured in versions 5.19 and > up. > > Kind regards, > Paweł > > > > > > -----Original Message----- > From: Randzio, Pawel > Sent: Friday, May 13, 2022 12:12 PM > To: 'Derek Dolney' <z...@posteo.net>; Łukasz Hawryłko <luk...@hawrylko.pl> > Cc: tboot-devel@lists.sourceforge.net > Subject: RE: [tboot-devel] suspend problem since kernel 5.15 > > Hi Derek, > > I've also been picking at that issue for the past few months and reached the > same wall as you have with the -EBUSY callback return, although your message > kind of gives me a new idea for where to look for the root cause as it seems > I have not tracked all possible callbacks. > I'm not a kernel developer either and honestly debugging that S3 issue is > troublesome to me too, to say the least. If anyone on this mailing list has > any idea how to further the debugging or even better - solve this issue, > please feel free to share ideas. > > On a side note, please add Vincent into the communication, that might speed > up the process. > Vincent may add others that could know what might be going on with that issue. > > Thanks, > Paweł > > -----Original Message----- > From: Derek Dolney <z...@posteo.net> > Sent: Thursday, May 12, 2022 6:20 PM > To: Łukasz Hawryłko <luk...@hawrylko.pl>; tboot-devel@lists.sourceforge.net > Subject: Re: [tboot-devel] suspend problem since kernel 5.15 > > I have been working on this as best I can. However, I confess that I am not a > kernel developer and have really no understanding of these tboot internals. > Nevertheless here is a brief update. Please anyone feel free to share any > ideas how to move forward to some resolution. > > I got a desktop machine with rs232 serial output running tboot and reproduced > the suspend problem that way and with this setup I can collect kernel printk > and also cpu hotplug (cpuhp) tracing output. I have also thankfully got quite > a bit of help from Vincent Donnefort who wrote the cpuhp changes (the commit > I posted) that have exposed the issue. He has been very helpful, let me try > to tell you what we have figured out. > > On suspend, I get into the tboot callback: > > static int tboot_dying_cpu(unsigned int cpu) { > atomic_inc(&ap_wfs_count); > if (num_online_cpus() == 1) { > if (tboot_wait_for_aps(atomic_read(&ap_wfs_count))) > return -EBUSY; > } > return 0; > } > > but the tboot_wait_for_aps times out for me so the callback returns EBUSY. > The problem with that happening is that there is not a rollback mechanism in > place at this point in the cpuhp sequence. So I mean from cpuhp point of > view, there is not even a mechanism to handle the tboot callback failure. > Besides that, we don't know what could be a sensible thing to do in the case > of EBUSY. What does it mean tboot is busy and what should be done about it? > Please help us to understand. > > > _______________________________________________ > tboot-devel mailing list > tboot-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/tboot-devel > --------------------------------------------------------------------- > Intel Technology Poland sp. z o.o. > ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII > Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP > 957-07-52-316 | Kapital zakladowy 200.000 PLN. > Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i > moze zawierac informacje poufne. W razie przypadkowego otrzymania tej > wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; > jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione. > This e-mail and any attachments may contain confidential material for the > sole use of the intended recipient(s). If you are not the intended recipient, > please contact the sender and delete all copies; any review or distribution > by others is strictly prohibited. > _______________________________________________ tboot-devel mailing list tboot-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/tboot-devel