On Tue, 11 Apr 2023 at 02:15, Norbert Braun <norb...@xrpbot.org> wrote:
>
> Hi all,
>
> I recently ran into a problem on Arch Linux ARM (32 bit) where logging
> in as root on the console would often, but not always, fail (much like
> in https://github.com/systemd/systemd/issues/17266). While investigating
> the problem, I found the following:
>
> Systemd ships with two PAM modules, pam_systemd.so and
> pam_systemd_home.so. Both of these use pam_acquire_bus_connection to
> open a connection to the system bus. pam_acquire_bus_connection opens a
> connection on the first call, then uses pam_set_data and pam_get_data to
> cache the connection object for subsequent calls. Since the namespace
> for pam_set_data/pam_get_data is shared between all PAM modules, it can
> happen that one PAM module opens the connection and another one uses it.
> In my case, pam_systemd_home.so opens the connection and sends the Hello
> message. If the root user attempts to log in, pam_systemd_home.so exits
> early and leaves the connection open, to be re-used by pam_systemd.so.
>
> This is problematic because struct sd_bus contains OrderedHashmap
> *reply_callbacks, and OrderedHashmap internally uses a global variable
> shared_hash_key. The PAM modules are statically linked with libsystemd,
> so this variable effectively exists twice in each of the two PAM
> modules. Since it is initialized to a random value, the value differs
> between the PAM modules. In the scenario above, it therefore differs
> between the sending of the Hello message and the processing of the
> reply. Thus, when the reply to the Hello message arrives, process_reply
> effectively looks for the reply cookie in a random hash bucket, and may
> or may not find it. In the latter case, this eventually leads to the
> somewhat cryptic error message: "pam_systemd(login:session): Failed to
> create session: Input/output error".
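
For anyone following along, the lookup failure described above can be reproduced in miniature. This is only a sketch, not systemd code: it assumes a toy table with two buckets and no probing, and a trivial XOR-based keyed hash standing in for the real one. The names toy_map, toy_put, toy_contains and the key values are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: two buckets, mirroring the 32 bit case described above. */
#define N_BUCKETS 2u

/* Toy table: one cookie per bucket, no probing (a simplification). */
struct toy_map {
    bool     used[N_BUCKETS];
    uint64_t cookie[N_BUCKETS];
};

/* Toy keyed hash: a stand-in for the real keyed hash, NOT what libsystemd uses. */
unsigned bucket_of(uint64_t cookie, uint64_t key) {
    return (unsigned)((cookie ^ key) % N_BUCKETS);
}

/* Insert using the hash key of the module that sends the message. */
void toy_put(struct toy_map *m, uint64_t cookie, uint64_t key) {
    assert(m != NULL);
    unsigned b = bucket_of(cookie, key);
    m->used[b] = true;
    m->cookie[b] = cookie;
}

/* Look up using the hash key of the module that processes the reply. */
bool toy_contains(const struct toy_map *m, uint64_t cookie, uint64_t key) {
    assert(m != NULL);
    unsigned b = bucket_of(cookie, key);
    return m->used[b] && m->cookie[b] == cookie;
}
```

With these toy definitions, storing cookie 1 under key 0x1234 and looking it up under key 0x1235 misses, because the lookup inspects the other bucket. Looking it up under key 0x5678 happens to hit the same bucket and succeeds. That is the "may or may not find it" behaviour, depending only on which key the second module happens to have.
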
>
> The problem is hidden on 64 bit systems, because the sizes of struct
> ordered_hashmap_entry and struct indirect_storage are such that an
> OrderedHashmap with direct storage only has a single bucket, and the
> value of shared_hash_key is therefore irrelevant. On a 32 bit system,
> however, the sizes are such that there are two buckets, and root login
> fails, with 50% probability, with the error message mentioned above. As
> expected from the above, it is possible to cause the problem to appear
> on a 64 bit system by changing "uint8_t _pad[3];" to "uint8_t _pad[19];"
> in struct indirect_storage (in src/basic/hashmap.c).
>
> After the above, another problem surfaces during cleanup: bus_free calls
> ordered_hashmap_free_free(b->reply_callbacks), which calls free on each
> value in the hashmap. However, the struct reply_callback that
> sd_bus_call_async puts into the hashmap was not individually allocated,
> but part of a larger struct sd_bus_slot. free is unhappy about that, and
> the login process finally dies with a segmentation fault, aborting the
> login attempt entirely. This problem is normally hidden by the fact that
> reply_callbacks is empty by the time that bus_free is called.
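
The cleanup problem described above comes down to calling free on a pointer into the middle of an allocation. Here is a small sketch of that layout, with hypothetical stand-ins (toy_reply_callback, toy_slot and the helper functions) rather than the real struct reply_callback and struct sd_bus_slot. It is not a proposed fix. It only shows that the embedded member's address differs from the address the allocator returned, and that a valid release has to go through the enclosing object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct reply_callback. */
struct toy_reply_callback {
    uint64_t cookie;
};

/* Hypothetical stand-in for struct sd_bus_slot. */
struct toy_slot {
    int type;                                  /* a field placed before the member */
    struct toy_reply_callback reply_callback;  /* embedded, not separately allocated */
};

/* Allocate one slot; the callback lives inside this single allocation. */
struct toy_slot *toy_slot_new(uint64_t cookie) {
    struct toy_slot *s = calloc(1, sizeof *s);
    if (s)
        s->reply_callback.cookie = cookie;
    return s;
}

/* Recover the enclosing slot from a pointer to its embedded callback. */
struct toy_slot *toy_slot_from_callback(struct toy_reply_callback *c) {
    assert(c != NULL);
    return (struct toy_slot *)((char *)c - offsetof(struct toy_slot, reply_callback));
}

/* Valid cleanup: free the pointer calloc returned, not the interior one. */
void toy_callback_release(struct toy_reply_callback *c) {
    free(toy_slot_from_callback(c));
}
```

Passing &slot->reply_callback straight to free would be undefined behaviour here, since that address was never returned by the allocator. A hashmap that frees its values generically cannot know that they are embedded like this.
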
Could you please file a bug on GitHub?