Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]
kazuyukitanimura merged PR #1565: URL: https://github.com/apache/datafusion-comet/pull/1565 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]
YanivKunda commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2013820176
##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
protected def getPartitionedFile(f: FileStatusWithMetadata, p:
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile]
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable,
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable,
maxSplitBytes, partitionValues)
Review Comment:
I've tried using reflection in #1573 (also adding 3.5.4 to the testing
matrix) -
Let's see how it fares.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]
YanivKunda commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2013670173
##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
protected def getPartitionedFile(f: FileStatusWithMetadata, p:
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile]
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable,
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable,
maxSplitBytes, partitionValues)
Review Comment:
That would be certainly much easier from a release-engineering perspective
(it's also not big of a development effort) -
but the question is whether this is required.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]
wForget commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2011644208
##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
protected def getPartitionedFile(f: FileStatusWithMetadata, p:
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile]
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable,
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable,
maxSplitBytes, partitionValues)
Review Comment:
We may be able to simply make them compatible through reflection
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]
YanivKunda commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2011633435
##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
protected def getPartitionedFile(f: FileStatusWithMetadata, p:
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile]
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable,
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable,
maxSplitBytes, partitionValues)
Review Comment:
That would be possible using minor-version-specific shims, but it will
change the release structure -
which assumed a single release for each spark major version.
What do the maintainers think?
If not, I think the latest minor should be preferred.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]
wForget commented on code in PR #1565:
URL: https://github.com/apache/datafusion-comet/pull/1565#discussion_r2011249870
##
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimCometScanExec.scala:
##
@@ -55,15 +55,15 @@ trait ShimCometScanExec {
protected def isNeededForSchema(sparkSchema: StructType): Boolean = false
protected def getPartitionedFile(f: FileStatusWithMetadata, p:
PartitionDirectory): PartitionedFile =
-PartitionedFileUtil.getPartitionedFile(f, p.values)
+PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
protected def splitFiles(sparkSession: SparkSession,
file: FileStatusWithMetadata,
filePath: Path,
isSplitable: Boolean,
maxSplitBytes: Long,
partitionValues: InternalRow): Seq[PartitionedFile]
=
-PartitionedFileUtil.splitFiles(sparkSession, file, isSplitable,
maxSplitBytes, partitionValues)
+PartitionedFileUtil.splitFiles(sparkSession, file, filePath, isSplitable,
maxSplitBytes, partitionValues)
Review Comment:
This change seems incompatible with versions below 3.5.4 of 3.5.X. Should we
try to maintain compatibility among minor versions?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]
YanivKunda commented on PR #1565: URL: https://github.com/apache/datafusion-comet/pull/1565#issuecomment-2749446970 @parthchandra I regenerated the diff file per the instructions in https://github.com/apache/datafusion-comet/blob/main/docs/source/contributor-guide/spark-sql-tests.md - Although they use the 3.4.3 diff file as the base (which obviously created a lot of rejects). I used the existing diff file for the previous 3.5.x minor (3.5.4) which resulted in no rejects. Regenerating it ensures it is tailored fit for the exact code base of the version it supports. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] upgraded spark 3.5.4 to 3.5.5 [datafusion-comet]
codecov-commenter commented on PR #1565: URL: https://github.com/apache/datafusion-comet/pull/1565#issuecomment-2749077884 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1565?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report All modified and coverable lines are covered by tests :white_check_mark: > Project coverage is 56.02%. Comparing base [(`f09f8af`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) to head [(`45fbe75`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/45fbe7540dfaa290550ecdcf9747600d15660667?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache). > Report is 98 commits behind head on main. Additional details and impacted files ```diff @@ Coverage Diff @@ ## main#1565 +/- ## - Coverage 56.12% 56.02% -0.10% + Complexity 976 907 -69 Files 119 122 +3 Lines 1174312208 +465 Branches 2251 2264 +13 + Hits 6591 6840 +249 - Misses 4012 4253 +241 + Partials 1140 1115 -25 ``` [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1565?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache). :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache). :rocket: New features to boost your workflow: - :snowflake: [Test Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests, report on failures, and find test suite problems. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
