I have been working on this as best I can, though I confess that I am not a kernel developer and have no real understanding of the tboot internals. Nevertheless, here is a brief update. If anyone has ideas on how to move this toward a resolution, please feel free to share them.
I set up a desktop machine with RS-232 serial output running tboot and reproduced the suspend problem there. With this setup I can collect kernel printk output as well as CPU hotplug (cpuhp) tracing output. I have also, thankfully, had quite a bit of help from Vincent Donnefort, who wrote the cpuhp changes (the commit I posted) that exposed the issue. He has been very helpful. Here is what we have figured out so far.

On suspend, I get into the tboot callback:

    static int tboot_dying_cpu(unsigned int cpu)
    {
            atomic_inc(&ap_wfs_count);
            if (num_online_cpus() == 1) {
                    if (tboot_wait_for_aps(atomic_read(&ap_wfs_count)))
                            return -EBUSY;
            }
            return 0;
    }

but tboot_wait_for_aps() times out for me, so the callback returns -EBUSY. The problem is that there is no rollback mechanism at this point in the cpuhp sequence, so from the cpuhp point of view there is no way to handle a failure of the tboot callback at all. Beyond that, we do not know what a sensible reaction to -EBUSY would be. What does it mean that tboot is busy, and what should be done about it? Please help us understand.
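To make the failure mode easier to discuss, here is a small userspace model of the callback. It is only a sketch and not the kernel code: all the model_* names, the num_in_wfs counter and the max_polls bound are my own stand-ins. In particular I am assuming that the wait polls a "parked CPUs" counter maintained by tboot and gives up after a bounded number of tries, and that a dying CPU has already left the online mask when its callback runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for kernel state; not the real tboot symbols. */
static atomic_int ap_wfs_count; /* CPUs that have run the dying callback */
static atomic_int num_in_wfs;   /* CPUs tboot reports as parked (assumed) */
static int online_cpus;         /* models num_online_cpus() */

/* Poll until the parked count equals num_aps, or give up after max_polls
 * tries. Returns 0 on success and nonzero on timeout. */
static int model_wait_for_aps(int num_aps, int max_polls)
{
    assert(max_polls >= 0);
    while (atomic_load(&num_in_wfs) != num_aps && max_polls > 0)
        max_polls--;
    return atomic_load(&num_in_wfs) != num_aps;
}

/* Same shape as the tboot_dying_cpu() callback quoted in this mail. */
static int model_dying_cpu(int max_polls)
{
    atomic_fetch_add(&ap_wfs_count, 1);
    if (online_cpus == 1) {
        if (model_wait_for_aps(atomic_load(&ap_wfs_count), max_polls))
            return -EBUSY;
    }
    return 0;
}

/* Take down every CPU except the boot CPU. Only `parked` of them ever show
 * up in num_in_wfs. Returns the first nonzero callback result, else 0. */
static int model_suspend(int cpus, int parked, int max_polls)
{
    int ret = 0;

    atomic_store(&ap_wfs_count, 0);
    atomic_store(&num_in_wfs, parked);
    online_cpus = cpus;
    while (online_cpus > 1) {
        online_cpus--; /* assumed: CPU leaves the online mask first */
        ret = model_dying_cpu(max_polls);
        if (ret)
            break;
    }
    return ret;
}
```

In this model, model_suspend(4, 3, 1000) returns 0, while model_suspend(4, 2, 1000) returns -EBUSY because one CPU never shows up as parked before the polls run out. Whether that is what actually happens on my machine is exactly what I cannot tell.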