Richard L. Hamilton wrote:
> I haven't read enough on the ELF format to be able to answer this myself:
> why can't one process an ELF file in such a way as to re-create it with
> a particular string table enlarged sufficiently to hold whatever one wants
> (not needing to reserve space for patching string table entries in the
> binary)?
>
> For that matter, if string table contents are only pointed at by structures
> that are part of the ELF format (and not by arbitrary text or data section
> contents), then why isn't it possible (if tedious) to determine whether there
> are any unused bytes in a string table? That is, if one can find the size of
> the string table, and everything pointing into it should be findable by
> navigating the various data structures, and all strings are null-terminated,
> then it should be possible to determine how many times every byte in the
> string table falls within one or more references, and what those references
> are.
>
> Don't get me wrong; being able to fix run paths at all is much needed. But
> I was hoping for either a truly smart ELF editor that could reconstitute an
> ELF file with all necessary adjustments without needing to reserve space
> for such purposes (and with the potential flexibility to make other
> interesting adjustments), or alternatively, something with ld.so.1 and crle
> such that one could set specific overriding run paths for specific execnames
> (which would at least centralize all such overrides and avoid the need to
> hack up binaries, making their checksums, md5sums, or signatures invalid).
>
> This message posted from opensolaris.org
> _______________________________________________
> tools-linking mailing list
> tools-linking at opensolaris.org
Hi Richard,

The basic answer is that the information needed to reverse a link doesn't exist in the file. The linker takes the inputs, combines sections, creates references to things by their relative distance from each other, processes relocations, and so forth, and then creates a new output. This new file contains the information needed by other linker passes to use the results, but the information about how the object was put together is gone, and can't be reliably reversed. As Rod said when I asked him exactly the same question a year ago: "How would it do that?" To oversimplify, there is no trail of bread crumbs to follow back.

This is partly for efficiency/space reasons, but also to contain the complexity of the linking format. After all, we can usually relink an object in the normal way to achieve these goals. The need to take apart, modify, and reassemble an object without starting from the sources is a fringe case, and maybe not worth a big infrastructure to support.

So we can't increase the size of the string table, because that would break all of the things in the file that refer to other things by relative offset, and we lack the information needed to find and fix those offsets.

Your other point is also true: I said that you can't tell who is referencing the strings, but clearly you can in fact take the file apart byte by byte and figure it out. Tedious, yes, but not impossible. I'm sure I don't need to convince you that it is much easier to explicitly record that information (DT_SUNW_STRPAD) than it is to figure it out from scratch, and probably more reliable.

However, there is also another consideration: time travel of objects from the future. It happens more often than you might think that a file built on a newer version of the system is copied to an older system. Suppose that new system has a new ELF feature that references a string (a new dynamic entry seems likely) that the old system doesn't know about.
A brute force search won't work in this case, because the old system won't understand the new feature, and will miss the string reference. If you take time travel into account, finding all the string references by searching the file may not be possible. It turns out that having DT_SUNW_STRPAD is not only easier to do, but also a bit more robust. Now, a time-traveling object may or may not work when it goes to the past for other reasons, but often it does work, and the linkers try hard not to be the cause of such failures.

I'll finish with some quick answers to your closing points:

> Don't get me wrong; being able to fix run paths at all is much needed. But
> I was hoping for either a truly smart ELF editor that could reconstitute an
> ELF file with all necessary adjustments without needing to reserve space
> for such purposes (and with the potential flexibility to make other
> interesting adjustments)

Sure, we'd all like that (but see above). Since ELF doesn't support it, the fixed extra space idea is a "better than nothing" answer that handles most cases we've encountered over the years. I don't mean to sell it as more than that. Simple and useful, but not perfect...

> or alternatively, something with ld.so.1 and crle such
> that one could set specific overriding run paths for specific execnames
> (which would at least centralize all such overrides and avoid the need to
> hack up binaries

The runpath editing doesn't preclude a crle extension like this, but there are downsides to that approach too:

- Every program on the system pays some overhead for having ld.so.1 examine a config file. Do you really want all the programs that don't need this "fixup" to pay that price?

- Centralized databases can be fragile and difficult to manage (think Windows registry).

Although my bias is to fix this sort of thing at the edges, I think there's room for more than one answer.
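[Editor's note: to make the offset problem above concrete, here is a toy Python sketch. It is not a real ELF parser -- the table contents and paths are invented -- but real consumers (DT_NEEDED and DT_RUNPATH entries, symbol st_name fields, and so on) do hold raw byte offsets into the string table just like this, which is why the table can't simply be grown in place.]

```python
# Toy model of an ELF-style string table: a leading NUL byte, then
# NUL-terminated entries. Illustrative only -- not real ELF parsing.

def build_strtab(strings):
    """Return (table bytes, {string: offset}) in ELF string-table style."""
    table = bytearray(b"\x00")          # offset 0 is the empty string
    offsets = {}
    for s in strings:
        offsets[s] = len(table)
        table += s.encode() + b"\x00"
    return bytes(table), offsets

def read_str(table, off):
    """Fetch the NUL-terminated string at `off`, as a consumer would."""
    return table[off:table.index(b"\x00", off)].decode()

strtab, offs = build_strtab(["/usr/lib", "libc.so.1"])

# Offsets resolve correctly against the table they were recorded for.
assert read_str(strtab, offs["/usr/lib"]) == "/usr/lib"
assert read_str(strtab, offs["libc.so.1"]) == "libc.so.1"

# Now "enlarge" the table by swapping in a longer runpath in place.
longer = b"/opt/sfw/lib\x00"
start = offs["/usr/lib"]
end = start + len(b"/usr/lib\x00")
patched = strtab[:start] + longer + strtab[end:]

# The previously recorded offset for the *other* string is now stale:
print(read_str(patched, offs["libc.so.1"]))   # prints "lib", not "libc.so.1"
```

And in a real object the damage goes further: everything laid out after the string table shifts too, so the file offsets recorded in the program headers and section headers would all need fixing -- which is exactly the information the linker no longer has.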
I've also seen proposals to use extended file attributes to do similar things, and that is another interesting idea.

> , making their checksums, md5sums, or signatures invalid).

Agreed. This might not be a huge issue, though. If an object is signed (a sophisticated concept), might we not expect its creator to get the runpath right also? (I know, I know, but still... :-) ) Or, the signing might be done after the runpath change. For example, we have discussed using this feature to fix up runpaths in some of the freeware we ship --- doing it in their makefiles can be painful, and a post-processing step to fix them up might be easier and more reliable. In a case like that, any signing would be applied to the file after it was edited, so the signatures would be valid.

In many cases, the objects are not signed, or the end user would gladly invalidate them in exchange for a working runpath. So I guess it depends on the situation.

Thanks for the great questions...

- Ali
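[Editor's note: for completeness, here is a sketch of the byte-coverage audit Richard proposes, against the same sort of toy table. Again this is illustrative Python, not real ELF parsing, and the offsets and tag names are invented. It also shows the time-travel failure mode discussed above: if the auditing tool doesn't know about a newer tag, the bytes that tag references wrongly look free.]

```python
# For each byte of a toy string table, count how many known references
# cover it. Bytes with a count of zero look like reusable slack.

def strtab_coverage(table, known_offsets):
    counts = [0] * len(table)
    counts[0] = 1                        # leading NUL: the empty string
    for off in known_offsets:
        end = table.index(b"\x00", off)  # strings are NUL-terminated
        for i in range(off, end + 1):    # cover the string and its NUL
            counts[i] += 1
    return counts

table = b"\x00/usr/lib\x00libc.so.1\x00"

# A tool that knows every referencing structure finds no slack:
full = strtab_coverage(table, [1, 10])   # e.g. a DT_RUNPATH and a DT_NEEDED
assert [i for i, c in enumerate(full) if c == 0] == []

# But drop the reference held by a hypothetical newer tag (offset 10),
# and those bytes wrongly appear unused -- reusing them would corrupt
# the object on any system that does understand the tag:
partial = strtab_coverage(table, [1])
print([i for i, c in enumerate(partial) if c == 0])  # bytes 10..19
```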