Re: [I] bug: datafusion-spark substring returns wrong result for large negative start positions [datafusion]

2026-05-05 Thread via GitHub


comphead commented on issue #21510:
URL: https://github.com/apache/datafusion/issues/21510#issuecomment-4381485478

   This should be closed via #21963 and #21979.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] bug: datafusion-spark substring returns wrong result for large negative start positions [datafusion]

2026-05-05 Thread via GitHub


comphead closed issue #21510: bug: datafusion-spark substring returns wrong 
result for large negative start positions
URL: https://github.com/apache/datafusion/issues/21510





Re: [I] bug: datafusion-spark substring returns wrong result for large negative start positions [datafusion]

2026-04-29 Thread via GitHub


comphead commented on issue #21510:
URL: https://github.com/apache/datafusion/issues/21510#issuecomment-4345367116

   DataFusion would need to adopt the changes from 
https://github.com/apache/datafusion-comet/pull/4017





Re: [I] bug: datafusion-spark substring returns wrong result for large negative start positions [datafusion]

2026-04-09 Thread via GitHub


andygrove closed issue #21510: bug: datafusion-spark substring returns wrong 
result for large negative start positions
URL: https://github.com/apache/datafusion/issues/21510





[I] bug: datafusion-spark substring returns wrong result for large negative start positions [datafusion]

2026-04-09 Thread via GitHub


andygrove opened a new issue, #21510:
URL: https://github.com/apache/datafusion/issues/21510

   ### Describe the bug
   
   The `datafusion-spark` implementation of `substring` does not match Apache 
Spark behavior when a negative start position reaches back further than the 
string length. `datafusion-spark` clamps the start to position 1 and returns a 
full-length result, while Spark reduces the available length by how far before 
position 1 the start falls.
   
   This was discovered by running a PySpark validation script against the 
`.slt` test files (see #17045, #21508).
   
   ### To Reproduce
   
   The `.slt` test at 
`datafusion/sqllogictest/test_files/spark/string/substring.slt` line 138 
contains:
   
   ```sql
   SELECT substring('Spark SQL', -300, 3);
   ```
   
   The test expects `Spa`, but Apache Spark returns an empty string.
   
   ### Expected behavior
   
   `substring` should match Spark's semantics for negative start positions:
   
   | Expression | Spark result | datafusion-spark result |
   |---|---|---|
   | `substring('Spark SQL', -9, 3)` | `Spa` | `Spa` ✓ |
   | `substring('Spark SQL', -10, 3)` | `Sp` | (likely `Spa`) |
   | `substring('Spark SQL', -11, 3)` | `S` | (likely `Spa`) |
   | `substring('Spark SQL', -12, 3)` | `` (empty) | (likely `Spa`) |
   | `substring('Spark SQL', -300, 3)` | `` (empty) | `Spa` ✗ |
   
   Spark's behavior: for negative `start`, the effective 1-based position is 
`len(str) + start + 1`. When this position is before 1, the available length is 
reduced by the overshoot. When the effective position plus `length` does not 
extend past position 1, the result is empty.
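
The rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Spark or `datafusion-spark` implementation: the function name `spark_substring` is hypothetical, and the sketch ignores the 32-bit overflow handling a real implementation would need. The point it tries to show is that the end offset is computed from the unclamped start, and clamping to the beginning of the string happens only afterwards.

```python
def spark_substring(s: str, pos: int, length: int) -> str:
    """Hypothetical sketch of Spark-style substring (1-based `pos`)."""
    n = len(s)

    # Convert to a 0-based start. A negative `pos` counts back from the end
    # and is allowed to land before the beginning of the string.
    if pos > 0:
        start = pos - 1
    elif pos < 0:
        start = n + pos
    else:
        start = 0

    # Compute the exclusive end from the UNCLAMPED start, so any overshoot
    # before the first character shortens the result.
    end = start + length
    if end <= start or start >= n:
        return ""

    # Clamp last; Python slicing already caps `end` at the string length.
    return s[max(start, 0):max(end, 0)]


# The rows from the "Expected behavior" table, Spark column.
for pos in (-9, -10, -11, -12, -300):
    print(pos, repr(spark_substring("Spark SQL", pos, 3)))
```

Clamping the start to position 1 before computing the end would instead return `Spa` for every row, which is the behavior reported in this issue.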
   
   ### Additional context
   
   The same bug affects `substr` (alias for `substring`). The corresponding 
`.slt` test at line 189 also has wrong expected values for the same reason.
   
   The `.slt` expected values at lines 138 and 189 will need to be updated 
along with the implementation fix.

