Re: [I] [C++] castDECIMAL_utf8 has undefined behaviour in case of invalid input value [arrow]
kou commented on issue #46708: URL: https://github.com/apache/arrow/issues/46708#issuecomment-296890 Issue resolved by pull request 46709 https://github.com/apache/arrow/pull/46709 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [C++] castDECIMAL_utf8 has undefined behaviour in case of invalid input value [arrow]
DenisTarasyuk commented on issue #46708: URL: https://github.com/apache/arrow/issues/46708#issuecomment-2958979494 > Thanks. Does it mean that this is a LLVM optimization bug? We can avoid this bug by setting `out_high`/`out_low` on invalid case, right? I think this is called "undefined behaviour". It is not llvm bug it is just something not specified in cpp specs so LLVM was free to to anything. If I am not wrong there is some function after castDECIMAL_utf8 that uses `out_high`/`out_low` and as they are primitives that function expects them to have some value. So on O3 optimization level LLVM done some inlining and backtracked generated code from that function body back to those lines I mentioned and decided that if those are primitives required to have valid values that means function that populates them should always populate them which means gdv_fn_dec_from_string should always return 0. I am not 100% sure it works like that but that is how I understand it now. Setting values to 0 fixes it for me. I have seen some documents that state even printf in return -1 block will cause LLVM to fix code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [C++] castDECIMAL_utf8 has undefined behaviour in case of invalid input value [arrow]
kou commented on issue #46708: URL: https://github.com/apache/arrow/issues/46708#issuecomment-2957831609 Thanks. Does it mean that this is a LLVM optimization bug? We can avoid this bug by setting `out_high`/`out_low` on invalid case, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [C++] castDECIMAL_utf8 has undefined behaviour in case of invalid input value [arrow]
DenisTarasyuk commented on issue #46708: URL: https://github.com/apache/arrow/issues/46708#issuecomment-2955947276 Additional information. I think this issue happens due to undefined behavior in castDECIMAL_utf8. This code does not set out_high, out_low. ``` int32_t status = gdv_fn_dec_from_string(context, in, in_length, &precision_from_str, &scale_from_str, &dec_high_from_str, &dec_low_from_str); if (status != 0) { return; } ``` And LLVM decides to optimize out some parts of code due to this behavior. In linked PR there is test that reproduces issue. Here I link some IR code. [ir_bug_raw.txt](https://github.com/user-attachments/files/20656070/ir_bug_raw.txt) is IR code with bug reproduced. This IR code '_raw' was captured before any optimisations applied by LLVM. [ir_fixed_raw.txt](https://github.com/user-attachments/files/20656068/ir_fixed_raw.txt) is non optimized IR with fix for out_high, out_low set to 0. Then I used LLVM opt command like this: `/llvm/bin/opt -passes='default' -debug-pass-manager -S ir_fixed_raw.txt -o ir_fixed_O1.txt` This produces optimised IR code. Here are original and fixed IR with O1 optimisation level. They both work fine. [ir_bug_O1.txt](https://github.com/user-attachments/files/20656069/ir_bug_O1.txt) [ir_fixed_O1.txt](https://github.com/user-attachments/files/20656071/ir_fixed_O1.txt) Here are the same but with O3 level. And here original IR has issue: [ir_bug_O3.txt](https://github.com/user-attachments/files/20656073/ir_bug_O3.txt) [ir_fixed_O3.txt](https://github.com/user-attachments/files/20656072/ir_fixed_O3.txt) The issue is here: Original: ``` %61 = call i32 @gdv_fn_dec_from_string(i64 noundef %context_ptr, ptr noundef nonnull @0, i32 noundef 3, ptr noundef nonnull %11, ptr noundef nonnull %12, ptr noundef nonnull %9, ptr noundef nonnull %10) %62 = icmp eq i32 %61, 0 call void @llvm.assume(i1 %62) call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %13) #7 ``` Fixed: ``` %61 = call i32 @gdv_fn_dec_from_string(i64 noundef %context_ptr, ptr noundef nonnull @0, i32 noundef 3, ptr noundef nonnull %11, ptr noundef nonnull %12, ptr noundef nonnull %9, ptr noundef nonnull %10) %62 = icmp eq i32 %61, 0 br i1 %62, label %63, label %castDECIMAL_utf8.exit ``` If I understand correctly in original IR LLVM just assumed that gdv_fn_dec_from_string should always return 0. So code that should have been skipped by return is now executed. This and uninitialised out_high, out_low cause SIGSEGV later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org