> Module Name: src > Committed By: mlelstv > Date: Sun Jul 18 06:57:28 UTC 2021 > > Modified Files: > src/sys/kern: kern_ksyms.c > > Log Message: > skip symbol tables that were unloaded again to avoid EFAULT when reading > ksyms. > > also restore TAILQ_FOREACH idiom.
This change isn't quite right: Reading past st = ksyms_last_snapshot in ksymsread, or in any context where we rely on the snapshot without holding ksyms_lock, is not allowed -- it will lead to a corrupted view of the snapshot, and possibly end up reading uninitialized memory. That's why this code didn't use TAILQ_FOREACH -- it must stop at ksyms_last_snapshot; if it gets to the end of the list, and witnesses st = NULL, then that's a bug. The code also didn't skip entries with st->sd_gone because we arrange for st->sd_nmap not to be freed until the last ksymsclose, at which point no more ksymsread can be active. If you start skipping some entries, that will cause assertions to fire about mismatched symbol table size which we computed up front in ksymsopen. I made a mistake in some previous changes: Although we defer freeing st->sd_nmap while /dev/ksyms is open, we do not defer freeing st->sd_strstart or st->sd_symstart. And it is possible for a concurrent module _unload_ to issue kobj_unload in the middle of concurrent ksymsread and free st->sd_strstart or st->sd_symstart while ksymsread is reading from them. (My last round of changes was done to fix bugs found during module _load_.) The commit message doesn't say, but I assume your change was meant to fix that bug. Unfortunately, the change doesn't completely fix it -- we could easily find ourselves in the following situation: CPU 0 CPU 1 ----- ----- read from /dev/ksyms modunload foo.kmod load st->sd_gone = false kobj_unload store st->sd_gone = true kobj_free(ko, ko->ko_symtab); copy from st->sd_symstart -> FAULT In order to avoid this situation, we either need to: 1. Make ksyms_modunload block until last /dev/ksyms close. => Easy enough: slap a condvar on ksyms_opencnt, wait for ksyms_opencnt == 0 in ksyms_modunload, notify if ksyms_opencnt == 0 in ksymsclose. => Holding /dev/ksyms open indefinitely can block module unload indefinitely; unclear if this might lead to problematic deadlock scenarios. => Bonus: Can eliminate some of the logic in ksymsclose, since ksyms_modunload will handle it synchronously before return anyway. 2. Convert ksymsread to use psref or localcount. => Takes a little more effort to convert tailq to pslist -- which requires some care because struct ksyms_symtab is shared with userland. So we still have to keep the tailq entry records (or add disgusting compat code). This is why I didn't go for pserialize in my last round of changes. => Complicated because we would only get a snapshot within a single ksymsread call, not from open to close of /dev/ksyms; this would require some kind of mechanism to persuade userland tools to restart if the snapshot has changed from read to read. => (Can't use pserialize because we have to hold the snapshot stable across copyout to userland, which may sleep, which is not allowed in pserialize read sections.) 3. Find some other way to copy the tables or something while /dev/ksyms is open so they won't go away when kobj_unload runs, or just keep copies of the strtab and symtab for ksyms that persist unload -- might cost a lot of kernel memory, probably not worth pursuing. In any case this change should otherwise be reverted, since it is no necessary (sd_symstart/sd_strstart will no longer go away during ksymsread) and it is also wrong (iteration past ksyms_last_snapshot is not allowed). I'm leaning toward option (1); other thoughts welcome.