Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]

2025-04-05 Thread via GitHub


kazuyukitanimura merged PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]

2025-03-26 Thread via GitHub


YanivKunda commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2013820176


##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
   protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
 
   protected def getPartitionedFile(f: FileStatusWithMetadata, p: 
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
 
   protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile] 
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable, 
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable, 
maxSplitBytes, partitionValues)

Review Comment:
   I've tried using reflection in #1573 (also adding 3.5.4 to the testing 
matrix) -
   Let's see how it fares.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]

2025-03-26 Thread via GitHub


YanivKunda commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2013670173


##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
   protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
 
   protected def getPartitionedFile(f: FileStatusWithMetadata, p: 
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
 
   protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile] 
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable, 
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable, 
maxSplitBytes, partitionValues)

Review Comment:
   That would be certainly much easier from a release-engineering perspective 
(it's also not big of a development effort) -
   but the question is whether this is required.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]

2025-03-25 Thread via GitHub


wForget commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2011644208


##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
   protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
 
   protected def getPartitionedFile(f: FileStatusWithMetadata, p: 
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
 
   protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile] 
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable, 
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable, 
maxSplitBytes, partitionValues)

Review Comment:
   We may be able to simply make them compatible through reflection



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]

2025-03-25 Thread via GitHub


YanivKunda commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2011633435


##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
   protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
 
   protected def getPartitionedFile(f: FileStatusWithMetadata, p: 
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
 
   protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile] 
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable, 
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable, 
maxSplitBytes, partitionValues)

Review Comment:
   That would be possible using minor-version-specific shims, but it will 
change the release structure -
   which assumed a single release for each spark major version.
   What do the maintainers think?
   If not, I think the latest minor should be preferred.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]

2025-03-24 Thread via GitHub


wForget commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2011249870


##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
   protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
 
   protected def getPartitionedFile(f: FileStatusWithMetadata, p: 
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
 
   protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile] 
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable, 
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable, 
maxSplitBytes, partitionValues)

Review Comment:
   This change seems incompatible with versions below 3.5.4 of 3.5.X. Should we 
try to maintain compatibility among minor versions?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]

2025-03-24 Thread via GitHub


YanivKunda commented on PR #1565:
URL: 
https://github.com/apache/datafusion-comet/pull/1565#issuecomment-2749446970

   @parthchandra I regenerated the diff file per the instructions in 
https://github.com/apache/datafusion-comet/blob/main/docs/source/contributor-guide/spark-sql-tests.md
 -
   Although they use the 3.4.3 diff file as the base (which obviously created a 
lot of rejects).
   I used the existing diff file for the previous 3.5.x minor (3.5.4) which 
resulted in no rejects.
   Regenerating it ensures it is tailored fit for the exact code base of the 
version it supports.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]

2025-03-24 Thread via GitHub


codecov-commenter commented on PR #1565:
URL: 
https://github.com/apache/datafusion-comet/pull/1565#issuecomment-2749077884

   ## 
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1565?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 56.02%. Comparing base 
[(`f09f8af`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 to head 
[(`45fbe75`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/45fbe7540dfaa290550ecdcf9747600d15660667?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   > Report is 98 commits behind head on main.
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ##   main#1565  +/-   ##
   
   - Coverage 56.12%   56.02%   -0.10% 
   + Complexity  976  907  -69 
   
 Files   119  122   +3 
 Lines 1174312208 +465 
 Branches   2251 2264  +13 
   
   + Hits   6591 6840 +249 
   - Misses 4012 4253 +241 
   + Partials   1140 1115  -25 
   ```
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1565?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
:rocket: New features to boost your workflow: 
   
   - :snowflake: [Test 
Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests, 
report on failures, and find test suite problems.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]