Hello Alexander,

On 10/12/20 11:30 PM, Alexander Tormasov via users wrote:
> What I found is a deadlock caused by a recursive acquisition of Linker::mutex.
> 
> - If an exception is thrown in some code (e.g., code that performs a NOVA
> syscall, in my case the attach_at() RPC call), it is propagated to and
> handled in the caller. During this processing, the _Unwind_Resume call
> inserted by gcc produces the following stack. Pay attention to
> dl_iterate_phdr():
> 
> #0  Linker::mutex () at /home/tor/gen/20.08/repos/base/src/lib/ldso/main.cc:68
> #1  0x0000000000124997 in dl_iterate_phdr (callback=0x119e7a0 
> <_Unwind_IteratePhdrCallback>, data=0x403fdde0) at 
> /home/tor/gen/20.08/repos/base/src/lib/ldso/exception.cc:41
> #2  0x000000000119fa0f in _Unwind_Find_FDE (pc=0x119dc76 <_Unwind_Resume+54>, 
> bases=bases@entry=0x403fe128) at 
> /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2-fde-dip.c:469
> #3  0x000000000119bfc3 in uw_frame_state_for 
> (context=context@entry=0x403fe080, fs=fs@entry=0x403fdec0) at 
> /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2.c:1257
> #4  0x000000000119cfe0 in uw_init_context_1 
> (context=context@entry=0x403fe080, outer_cfa=outer_cfa@entry=0x403fe2b0, 
> outer_ra=0x1000bcd 
> <Genode::Region_map::attach_at(Genode::Capability<Genode::Dataspace>, 
> unsigned long, unsigned long, long)+259>) at 
> /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2.c:1586
> #5  0x000000000119dc77 in _Unwind_Resume (exc=0x1b41a8 
> <Genode::init_cxx_heap(Genode::Env&)::initial_block+5256>) at 
> /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind.inc:235
> #6  0x0000000001000bcd in Genode::Region_map::attach_at (this=0x1304068 
> <vm_reg0+8648>, ds=..., local_addr=0x80000000, size=0x40000, offset=0x0) at 
> /home/tor/gen/20.08/repos/base/include/region_map/region_map.h:127
> 
> The code of dl_iterate_phdr():
> extern "C" int dl_iterate_phdr(int (*callback) (Phdr_info *info, size_t size,
>                                                 void *data), void *data)
> {
>     int err = 0;
>     Phdr_info info;
> 
>     Mutex::Guard guard(mutex());
> 
>     for (Object *e = obj_list_head(); e; e = e->next_obj()) {
> 
>         info.addr  = e->reloc_base();
>         info.name  = e->name();
>         info.phdr  = e->file()->phdr.phdr;
>         info.phnum = e->file()->phdr.count;
> 
>         if (verbose_exception)
>             log(e->name(), " reloc ", Hex(e->reloc_base()));
> 
>         if ((err = callback(&info, sizeof(Phdr_info), data)))
>             break;
>     }
> 
>     return err;
> }
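
For comparison, glibc on Linux exposes the same callback contract through
dl_iterate_phdr() in <link.h>, and libgcc's unwinder has traditionally used
that interface there too. The sketch below is plain Linux/glibc code, not
Genode's ldso; the helper names (print_cb, stop_cb, count_loaded_objects,
first_object_only) are made up for illustration. The glibc fields dlpi_addr,
dlpi_name, dlpi_phdr and dlpi_phnum correspond to the addr, name, phdr and
phnum fields filled in above, and a non-zero callback return stops the walk
and becomes the function's return value, mirroring the 'break' in the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <link.h>  // glibc: dl_iterate_phdr(), struct dl_phdr_info

// Prints the counterparts of Genode's Phdr_info fields for every loaded
// object; returning 0 asks dl_iterate_phdr() for the next object.
static int print_cb(struct dl_phdr_info *info, std::size_t, void *data)
{
    ++*static_cast<int *>(data);
    char const *name = (info->dlpi_name && info->dlpi_name[0] != '\0')
                       ? info->dlpi_name : "(main program)";
    std::printf("%s reloc %#lx phdr %p phnum %u\n", name,
                static_cast<unsigned long>(info->dlpi_addr),
                static_cast<void const *>(info->dlpi_phdr),
                static_cast<unsigned>(info->dlpi_phnum));
    return 0;
}

// Stops after the first object; 42 is handed back as the walk's result.
static int stop_cb(struct dl_phdr_info *, std::size_t, void *data)
{
    ++*static_cast<int *>(data);
    return 42;
}

int count_loaded_objects()
{
    int count = 0;
    dl_iterate_phdr(print_cb, &count);
    return count;
}

int first_object_only(int *visited)
{
    assert(visited != nullptr);
    *visited = 0;
    return dl_iterate_phdr(stop_cb, visited);
}
```

count_loaded_objects() lists at least the main program; first_object_only(&n)
returns 42 with n == 1, because the walk ends at the first non-zero return.
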
> 
> Pay attention: it takes the Linker::_mutex object (lock).
> 
> Inside, it calls callback(), which for the main C++ code resolves to
> _Unwind_IteratePhdrCallback from
> contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2-fde-dip.c.
> That function internally calls get_fde_encoding() and get_cie_encoding(),
> which contain a very simple line at
> /home/tor/gen/20.08/contrib/gcc-20345a83596fa42a25a85938329aea54bb4b2146/src/noux-pkg/gcc/libgcc/unwind-dw2-fde.c:300:
> 
>   p = aug + strlen ((const char *)aug) + 1; /* Skip the augmentation string.  */
> 
> strlen() is not inlined here.
> In the machine code it calls strlen@plt, which means that strlen is assumed
> to live in a shared library, so the call is normally resolved at run time by
> the dynamic linker's relocation code.
> 
> To find the target, it goes through jmp_slot@PLT, which in turn calls the
> following function from src/lib/ldso/main.cc:294:
> Elf::Addr Ld::jmp_slot(Dependency const &dep, Elf::Size index)
> {
>     Mutex::Guard guard(mutex());
> 
>     if (verbose_relocation)
> …
> 
> Pay attention: it takes the same Linker::_mutex object (lock).
> Voila! We have a recursive acquisition of the same linker mutex and thus a
> deadlock during exception processing.
> 
> The key problem here is clearly the use of the linker mutex in Genode's
> implementation of dl_iterate_phdr().
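
A minimal single-process model of the cycle described above, in standard C++
rather than against Genode's ldso. Every name here (Checked_mutex,
linker_mutex, strlen_slot, plt_resolver, iterate_phdr_model, unwind_callback)
is hypothetical. The lock is non-recursive like Linker::mutex(), but it
remembers its owner and throws on re-entry, so the nested acquisition shows
up as an exception instead of a hang. The "zR" input is just a sample
augmentation string.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for Linker::mutex(): non-recursive, but it records its
// owner and throws on re-entry by that thread, where a plain mutex would block.
class Checked_mutex
{
    std::mutex                   _mutex;
    std::atomic<std::thread::id> _owner { std::thread::id() };

  public:
    void lock()
    {
        if (_owner.load() == std::this_thread::get_id())
            throw std::logic_error("mutex re-entered by its owner (self-deadlock)");
        _mutex.lock();
        _owner.store(std::this_thread::get_id());
    }

    void unlock()
    {
        _owner.store(std::thread::id());
        _mutex.unlock();
    }
};

static Checked_mutex linker_mutex;  // models the single Linker::mutex()

static std::size_t real_strlen(char const *s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

static std::size_t plt_resolver(char const *s);

// Hypothetical PLT slot for strlen; it starts out pointing at the resolver.
static std::size_t (*strlen_slot)(char const *) = plt_resolver;

// Models Ld::jmp_slot(): lazy resolution of the slot takes the linker mutex.
static std::size_t plt_resolver(char const *s)
{
    std::lock_guard<Checked_mutex> guard(linker_mutex);
    strlen_slot = real_strlen;  // patch the slot so later calls go direct
    return real_strlen(s);
}

// Models dl_iterate_phdr(): the callback runs while the linker mutex is held.
static int iterate_phdr_model(int (*callback)(void const *), void const *data)
{
    std::lock_guard<Checked_mutex> guard(linker_mutex);
    return callback(data);
}

// Models _Unwind_IteratePhdrCallback skipping an augmentation string via strlen.
static int unwind_callback(void const *data)
{
    return static_cast<int>(strlen_slot(static_cast<char const *>(data)));
}

// Unresolved slot: the nested lock() is detected instead of hanging.
bool reproduces_self_deadlock()
{
    strlen_slot = plt_resolver;
    try {
        iterate_phdr_model(unwind_callback, "zR");
    } catch (std::logic_error const &) {
        return true;
    }
    return false;
}

// Slot already bound (as if strlen were resolved at link time): no nested lock.
int walk_with_prebound_slot()
{
    strlen_slot = real_strlen;
    return iterate_phdr_model(unwind_callback, "zR");
}
```

reproduces_self_deadlock() returns true because the resolver's lock() runs
while iterate_phdr_model() still holds linker_mutex; the lock_guard in
iterate_phdr_model() releases it again during unwinding. With the slot bound
beforehand, walk_with_prebound_slot() returns 2 without ever touching the
lock a second time.
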
> 
> So, the question is: how do we fix this?
> Maybe we need different mutexes for Ld::jmp_slot() and dl_iterate_phdr()?
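
A sketch of the "different mutexes" idea in the same hypothetical model
(phdr_mutex, jmp_slot_mutex and the other names are illustrative, not ldso
code). The nested path now takes two distinct locks, so it no longer blocks
on itself. Whether such a split is safe in the real ldso depends on what each
lock protects: if symbol lookup inside jmp_slot() reads the same object list
that dl_iterate_phdr() walks, the two paths would no longer be serialized
against each other. The split also fixes a lock order (phdr_mutex before
jmp_slot_mutex) that every other path would have to respect. A single
std::recursive_mutex would be another way to make the nested acquisition
legal in this model.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical split of the single linker lock into two independent locks.
static std::mutex phdr_mutex;      // taken by the dl_iterate_phdr() model
static std::mutex jmp_slot_mutex;  // taken by the lazy-binding model

static std::size_t real_strlen(char const *s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

static std::size_t plt_resolver(char const *s);

// Hypothetical PLT slot for strlen, unresolved until its first call.
static std::size_t (*strlen_slot)(char const *) = plt_resolver;

// Models Ld::jmp_slot() with its own lock.
static std::size_t plt_resolver(char const *s)
{
    std::lock_guard<std::mutex> guard(jmp_slot_mutex);
    strlen_slot = real_strlen;  // patch the slot
    return real_strlen(s);
}

// Models dl_iterate_phdr() with its own lock.
static int iterate_phdr_model(int (*callback)(void const *), void const *data)
{
    std::lock_guard<std::mutex> guard(phdr_mutex);
    return callback(data);
}

static int unwind_callback(void const *data)
{
    return static_cast<int>(strlen_slot(static_cast<char const *>(data)));
}

// Nested acquisition: phdr_mutex, then jmp_slot_mutex (two different locks).
int walk_with_split_locks()
{
    return iterate_phdr_model(unwind_callback, "zR");
}
```

The first call of walk_with_split_locks() resolves the slot under
jmp_slot_mutex while phdr_mutex is held; later calls go straight to
real_strlen. Both return 2.
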

The 'strlen' function should be provided by the cxx library
(repos/base/src/lib/cxx/misc.cc) at link time and therefore should not
produce a jump slot (i.e., strlen@plt) at all. So the actual problem here is
that the jump slot is created. Is there an easy way to reproduce this?

Regards,

Sebastian

_______________________________________________
Genode users mailing list
[email protected]
https://lists.genode.org/listinfo/users
