Re: [I] [C++] castDECIMAL_utf8 has undefined behaviour in case of invalid input value [arrow]

2025-06-12 Thread via GitHub


kou commented on issue #46708:
URL: https://github.com/apache/arrow/issues/46708#issuecomment-296890

   Issue resolved by pull request 46709
   https://github.com/apache/arrow/pull/46709


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [C++] castDECIMAL_utf8 has undefined behaviour in case of invalid input value [arrow]

2025-06-10 Thread via GitHub


DenisTarasyuk commented on issue #46708:
URL: https://github.com/apache/arrow/issues/46708#issuecomment-2958979494

   > Thanks. Does it mean that this is a LLVM optimization bug? We can avoid 
this bug by setting `out_high`/`out_low` on invalid case, right?
   
   I think this is called "undefined behaviour". It is not llvm bug it is just 
something not specified in cpp specs so LLVM was free to to anything. If I am 
not wrong there is some function after castDECIMAL_utf8 that uses 
`out_high`/`out_low` and as they are primitives that function expects them to 
have some value. So on O3 optimization level LLVM done some inlining and 
backtracked generated code from that function body back to those lines I 
mentioned and decided that if those are primitives required to have valid 
values that means function that populates them should always populate them 
which means gdv_fn_dec_from_string should always return 0. I am not 100% sure 
it works like that but that is how I understand it now.
   Setting values to 0 fixes it for me. I have seen some documents that state 
even printf in return -1 block will cause LLVM to fix code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [C++] castDECIMAL_utf8 has undefined behaviour in case of invalid input value [arrow]

2025-06-10 Thread via GitHub


kou commented on issue #46708:
URL: https://github.com/apache/arrow/issues/46708#issuecomment-2957831609

   Thanks.
   Does it mean that this is a LLVM optimization bug? We can avoid this bug by 
setting `out_high`/`out_low` on invalid case, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [C++] castDECIMAL_utf8 has undefined behaviour in case of invalid input value [arrow]

2025-06-09 Thread via GitHub


DenisTarasyuk commented on issue #46708:
URL: https://github.com/apache/arrow/issues/46708#issuecomment-2955947276

   Additional information.
   I think this issue happens due to undefined behavior in castDECIMAL_utf8. 
   This code does not set out_high, out_low.
   ```
 int32_t status =
 gdv_fn_dec_from_string(context, in, in_length, &precision_from_str, 
&scale_from_str,
&dec_high_from_str, &dec_low_from_str);
 if (status != 0) {
   return;
 }
   ```
   And LLVM decides to optimize out some parts of code due to this behavior.
   In linked PR there is test that reproduces issue.
   Here I link some IR code. 
[ir_bug_raw.txt](https://github.com/user-attachments/files/20656070/ir_bug_raw.txt)
 is IR code with bug reproduced. This IR code '_raw' was captured before any 
optimisations applied by LLVM. 
[ir_fixed_raw.txt](https://github.com/user-attachments/files/20656068/ir_fixed_raw.txt)
 is non optimized IR with fix for out_high, out_low set to 0. 
   Then I used LLVM opt command like this:
   `/llvm/bin/opt -passes='default' -debug-pass-manager -S ir_fixed_raw.txt 
-o ir_fixed_O1.txt`
   This produces optimised IR code.
   Here are original and fixed IR with O1 optimisation level. They both work 
fine.
   
[ir_bug_O1.txt](https://github.com/user-attachments/files/20656069/ir_bug_O1.txt)
   
[ir_fixed_O1.txt](https://github.com/user-attachments/files/20656071/ir_fixed_O1.txt)
   Here are the same but with O3 level. And here original IR has issue:
   
[ir_bug_O3.txt](https://github.com/user-attachments/files/20656073/ir_bug_O3.txt)
   
[ir_fixed_O3.txt](https://github.com/user-attachments/files/20656072/ir_fixed_O3.txt)
   
   The issue is here:
   
   Original:
   ```
   %61 = call i32 @gdv_fn_dec_from_string(i64 noundef %context_ptr, ptr noundef 
nonnull @0, i32 noundef 3, ptr noundef nonnull %11, ptr noundef nonnull %12, 
ptr noundef nonnull %9, ptr noundef nonnull %10)
 %62 = icmp eq i32 %61, 0
 call void @llvm.assume(i1 %62)
 call void @llvm.lifetime.start.p0(i64 24, ptr nonnull %13) #7
   ```
   Fixed:
   ```
   %61 = call i32 @gdv_fn_dec_from_string(i64 noundef %context_ptr, ptr noundef 
nonnull @0, i32 noundef 3, ptr noundef nonnull %11, ptr noundef nonnull %12, 
ptr noundef nonnull %9, ptr noundef nonnull %10)
 %62 = icmp eq i32 %61, 0
 br i1 %62, label %63, label %castDECIMAL_utf8.exit
   ```
   If I understand correctly in original IR LLVM just assumed that 
gdv_fn_dec_from_string should always return 0. So code that should have been 
skipped by return is now executed. This and uninitialised  out_high, out_low 
cause SIGSEGV later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org