https://sourceware.org/bugzilla/show_bug.cgi?id=29189
Bug ID: 29189 Summary: dlltool delaylibs corrupt float/double arguments Product: binutils Version: 2.39 (HEAD) Status: UNCONFIRMED Severity: normal Priority: P2 Component: binutils Assignee: unassigned at sourceware dot org Reporter: strager.nds at gmail dot com Target Milestone: --- Created attachment 14120 --> https://sourceware.org/bugzilla/attachment.cgi?id=14120&action=edit semi-tested bug fix (This report was originally posted on the mailing list: https://lists.gnu.org/archive/html/bug-binutils/2022-05/msg00099.html) I am calling a function in another x64 DLL with the following C signature: int napi_create_double(void*, double, void*); The first time I call this function, the 'double' argument ends up as 1.20305e-307 inside napi_create_double, no matter what value the caller gives. The 'double' is corrupted. Calls after the first don't corrupt the 'double'. The cause is ntdll.dll, eventually called by MinGW's __delayLoadHelper2, modifying the xmm1 register: #0 0x00007ffd26ce3006 in ntdll!RtlLookupFunctionEntry () from C:\WINDOWS\SYSTEM32\ntdll.dll #1 0x00007ffd26ce05e8 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll #2 0x00007ffd26ce00a5 in ntdll!LdrGetProcedureAddressForCaller () from C:\WINDOWS\SYSTEM32\ntdll.dll #3 0x00007ffd245b53dc in KERNELBASE!GetProcAddressForCaller () from C:\WINDOWS\System32\KernelBase.dll #4 0x00007ffcd7b7ca6f in __delayLoadHelper2 (pidd=0x7ffcd7b8ba70 <__DELAY_IMPORT_DESCRIPTOR_node_napi_lib>, ppfnIATEntry=0x7ffcd7ecd134 <__imp_napi_create_double>) at C:/M/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/misc/delayimp.c:209 #5 0x00007ffcd7b717c9 in __tailMerge_node_napi_lib () from MYDLL.dll #6 0x000002ad2fe84c50 in ?? () 0x00007ffd26ce2ffb <+1051>: movups (%rdx),%xmm0 0x00007ffd26ce2ffe <+1054>: movups %xmm0,(%rsi) 0x00007ffd26ce3001 <+1057>: movsd 0x10(%rdx),%xmm1 => 0x00007ffd26ce3006 <+1062>: movsd %xmm1,0x10(%rsi) 0x00007ffd26ce300b <+1067>: mov (%rsi),%rbp 0x00007ffd26ce300e <+1070>: mov %r11,%rax 0x00007ffd26ce3011 <+1073>: lock cmpxchg %r12,0x1384d6(%rip) # 0x7ffd26e1b4f0 0x00007ffd26ce301a <+1082>: jne 0x7ffd26ce3102 <ntdll!RtlLookupFunctionEntry+1314> According to Windows x64 documentation, xmm1 is a volatile register: https://docs.microsoft.com/en-us/cpp/build/x64-software-conventions?redirectedfrom=MSDN&view=msvc-170 I think the solution is for dll's delaylib trampoline to save xmm1 on the stack before calling __delayLoadHelper2. I made a patch which does this, and it fixes the bug for my code. See attached patch. I think my patch has two problems: 1. AVX/vmovupd/ymm might not be usable on the target machine, but saving just xmm isn't enough. Should we perform a CPUID check? 2. We store unaligned with vmovupd. Storing aligned with vmovapd would be better. I haven't looked into how to align ymm registers when storing on the stack. I'd love to get this bug fixed so others don't spend two days debugging assembly code! -- You are receiving this mail because: You are on the CC list for the bug.