[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1526809418 @dstrodtman-db We will release this feature in Spark 3.5.0. We don't have a tentative release date for Spark 3.5.0. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-08 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1500894916 Thanks all for reviewing! Merging to master.

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-08 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1500894854 Confirmed CI passed for last commit. https://github.com/HeartSaVioR/spark/runs/12606973127

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-07 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1500752714 The last update is a rebase onto the master branch, just to make sure CI is happy with the change before merging this.

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-05 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1497194248 I just added the pyspark implementation in this PR. It doesn't seem to be worthwhile to have another round of review specifically for pyspark, given that the review phase is not going

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-03 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1495445673 Just filed another JIRA ticket https://issues.apache.org/jira/browse/SPARK-43027 to support PySpark. Once we merge this in, I'll work on the PySpark side.

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-03 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1495329109 cc. @zsxwing @viirya @rangadi Could you please review this again? I feel this is very close to the final shape.

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-30 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1491105348 > What is the decision about batch support? I just added batch support in the latest commit. It needs more test coverage for batch query support, so that's why we have new

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-28 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1487880188 cc. @zsxwing @viirya @rangadi Friendly reminder.

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1486008365 Sigh, I didn't realize we had already taken a step toward the Scala API with Spark Connect. I thought it was only in PySpark. Thanks for correcting me.

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1485999022 I was wondering what the difference is between dropDuplicates and this one. I don't see dropDuplicates being handled separately. Is it because the PySpark implementation of dropDuplicates is

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1485964292 @HyukjinKwon @amaliujia Would you mind if I ask what happens with the mima check for this PR? https://github.com/HeartSaVioR/spark/actions/runs/4536405777/jobs/7993077860

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1485834545 Just added a dummy implementation.

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1485819017 The error came only from the linter: it no longer allows a PR to introduce a new public API "without adding to spark-connect". This PR intentionally postpones addressing PySpar

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-27 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1484700118 cc. @zsxwing @viirya @rangadi Please take a look. Thanks in advance!