[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-07 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1160800555
File: python/pyspark/sql/dataframe.py
@@ -3928,6 +3928,71 @@ def dropDuplicates(self, subset: Optional[List[str]] = None) -> "DataFrame": jdf = self._jdf

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-06 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1160032216
File: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -3038,6 +3025,107 @@ class Dataset[T] private[sql]( dropDuplicates(colNames) } + /** +
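
The two entries above track the new user-facing API on the PySpark DataFrame and the Scala Dataset. As a rough orientation for readers of this thread, here is a minimal Scala usage sketch, assuming the method name and string-column overloads shown in the diff hunks, and using the built-in rate source purely as a stand-in for a real event stream:

```scala
import org.apache.spark.sql.SparkSession

object DropDuplicatesWithinWatermarkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dropDuplicatesWithinWatermark-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // The rate source emits (timestamp, value); deriving a coarse key makes
    // the same key show up repeatedly within a short time span.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .withColumn("key", $"value" % 10)

    // A watermark is required: per-key dedup state is kept only until the
    // watermark passes the first occurrence's event time plus the delay.
    val deduped = events
      .withWatermark("timestamp", "10 seconds")
      .dropDuplicatesWithinWatermark("key")

    deduped.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

The intent, per the PR title and SPARK-42931, is to deduplicate events that arrive within the watermark delay of each other while letting per-key state be evicted once the watermark passes, unlike dropDuplicates keyed only on non-time columns, which keeps state on a stream indefinitely.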

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-06 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1160030393
File: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -679,6 +679,8 @@ object RemoveNoopUnion extends Rule[LogicalPlan] {

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-06 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1160027853
File: python/pyspark/sql/dataframe.py
@@ -3928,6 +3928,71 @@ def dropDuplicates(self, subset: Optional[List[str]] = None) -> "DataFrame": jdf = self._jdf

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-06 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1160029232
File: python/pyspark/sql/dataframe.py
@@ -3928,6 +3928,71 @@ def dropDuplicates(self, subset: Optional[List[str]] = None) -> "DataFrame": jdf = self._jdf

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-06 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1160025207
File: python/pyspark/sql/dataframe.py
@@ -3928,6 +3928,71 @@ def dropDuplicates(self, subset: Optional[List[str]] = None) -> "DataFrame": jdf = self._jdf

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-05 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1159091700
File: python/pyspark/sql/tests/connect/test_parity_dataframe.py
@@ -41,6 +41,11 @@ def test_observe(self): def test_observe_str(self): super().test_obser

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-31 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1154705202
File: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
@@ -980,3 +1022,65 @@ object StreamingDeduplicateExec { private val E

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-31 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1154700264
File: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
@@ -980,3 +1022,65 @@ object StreamingDeduplicateExec { private val E
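
The two entries above (together with the two 2023-03-28 entries on the same file further down) review the new stateful operator in statefulOperators.scala. The real operator keeps its state in Spark's fault-tolerant StateStore; the snippet below is only a conceptual, single-process model of the bookkeeping the feature implies, with hypothetical names (DedupWithinWatermarkModel, offer, evict): remember each key with an expiration of first-event-time plus the watermark delay, drop later arrivals of the same key while that entry exists, and evict entries once the watermark passes their expiration.

```scala
import scala.collection.mutable

// Conceptual model only: the real operator stores this per key in a
// StateStore and is driven by the query's event-time watermark.
class DedupWithinWatermarkModel(delayMs: Long) {
  private val expiresAtMs = mutable.Map[String, Long]()

  /** Returns true if the row should be emitted (first sighting of the key). */
  def offer(key: String, eventTimeMs: Long): Boolean = {
    if (expiresAtMs.contains(key)) {
      false                                     // duplicate while state exists: drop
    } else {
      expiresAtMs(key) = eventTimeMs + delayMs  // keep the key until eventTime + delay
      true
    }
  }

  /** Called as the watermark advances: evict keys whose window has passed. */
  def evict(watermarkMs: Long): Unit = {
    val expired = expiresAtMs.filter { case (_, expiry) => expiry <= watermarkMs }.keys.toList
    expiresAtMs --= expired
  }
}

object DedupWithinWatermarkModelDemo {
  def main(args: Array[String]): Unit = {
    val model = new DedupWithinWatermarkModel(delayMs = 10000L)
    println(model.offer("a", eventTimeMs = 1000L))  // true  -> emitted
    println(model.offer("a", eventTimeMs = 4000L))  // false -> dropped as duplicate
    model.evict(watermarkMs = 12000L)               // watermark passed 1000 + 10000
    println(model.offer("a", eventTimeMs = 15000L)) // true  -> earlier entry was evicted
  }
}
```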

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-30 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1153915858
File: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationWithinWatermarkSuite.scala
@@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-30 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1153915713
File: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationWithinWatermarkSuite.scala
@@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-30 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1153915130
File: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationWithinWatermarkSuite.scala
@@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-30 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1153914621
File: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationWithinWatermarkSuite.scala
@@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software
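
The four entries above review the new StreamingDeduplicationWithinWatermarkSuite. That suite is built on Spark's internal StreamTest harness; the sketch below is written in the same style purely to illustrate the behavior being tested and is not copied from the PR (the class name, test name, and concrete timestamps are illustrative):

```scala
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.functions.timestamp_seconds
import org.apache.spark.sql.streaming.StreamTest

class DedupWithinWatermarkBehaviorSketch extends StreamTest {
  import testImplicits._

  test("duplicates arriving within the watermark delay are dropped") {
    val input = MemoryStream[(String, Int)]
    val result = input.toDS()
      .withColumn("eventTime", timestamp_seconds($"_2"))
      .withWatermark("eventTime", "10 seconds")
      .dropDuplicatesWithinWatermark("_1")
      .select($"_1")

    testStream(result)(
      AddData(input, ("a", 1), ("a", 5)),  // second "a" is within 10s of the first
      CheckNewAnswer("a"),                 // only the first occurrence is emitted
      AddData(input, ("b", 30)),           // advances the watermark past 1 + 10
      CheckNewAnswer("b"),
      AddData(input, ("c", 31)),           // one more batch so eviction with the new watermark runs
      CheckNewAnswer("c"),
      AddData(input, ("a", 35)),           // "a" state has been evicted by now
      CheckNewAnswer("a")
    )
  }
}
```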

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-30 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1153675952
File: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@@ -1742,6 +1742,8 @@ class DataFrameSuite extends QueryTest Seq(Row(2, 1, 2), Row(1, 2,

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-30 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1153505179
File: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -679,6 +679,8 @@ object RemoveNoopUnion extends Rule[LogicalPlan] {

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-29 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1151482426
File: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -679,6 +679,8 @@ object RemoveNoopUnion extends Rule[LogicalPlan] {

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-28 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1151447031
File: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
@@ -980,3 +980,117 @@ object StreamingDeduplicateExec { private val E

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-28 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1151443903
File: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -679,6 +679,8 @@ object RemoveNoopUnion extends Rule[LogicalPlan] {
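
Several entries in this thread (2023-04-06 through 2023-03-28) concern the two lines added to RemoveNoopUnion in Optimizer.scala. That rule removes redundant Union branches when the result is subsequently de-duplicated, and the hunk presumably just adds the new DeduplicateWithinWatermark node to the operators the rule already recognizes (Distinct, Deduplicate). A hedged batch-mode illustration of the plan shape the rule targets, assuming that on a static Dataset the new method behaves like dropDuplicates as its API docs describe:

```scala
import org.apache.spark.sql.SparkSession

object RemoveNoopUnionIllustration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    import spark.implicits._

    val df = spark.range(0, 100).withColumn("key", $"id" % 10)

    // A self-union followed by a de-duplicating operator: the duplicated
    // branch cannot change the result, which is the pattern RemoveNoopUnion
    // simplifies.
    val unioned = df.union(df).dropDuplicatesWithinWatermark("key")

    // With DeduplicateWithinWatermark recognized by the rule, the optimized
    // logical plan should no longer contain the redundant Union branch.
    unioned.explain(true)
    spark.stop()
  }
}
```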

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-28 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1151397406
File: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
@@ -980,3 +980,117 @@ object StreamingDeduplicateExec { private val E

[GitHub] [spark] rangadi commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-28 via GitHub
rangadi commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1151374343
File: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
@@ -464,6 +469,19 @@ object UnsupportedOperationChecker exte
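
The last entry reviews the addition to UnsupportedOperationChecker, which validates streaming plans before a query starts. The snippet cuts off, but the most visible user-facing consequence of such a check would be rejecting dropDuplicatesWithinWatermark on a streaming Dataset that has no watermark defined, since per-key state could then never be evicted. A hedged sketch of that failure mode (the exact error class and message are not shown in the snippet):

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object MissingWatermarkCheckSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    import spark.implicits._

    // Streaming source with no withWatermark(...) call before the dedup.
    val stream = spark.readStream.format("rate").load()
      .withColumn("key", $"value" % 10)
      .dropDuplicatesWithinWatermark("key")

    try {
      // The unsupported-operation check runs when the query is started.
      stream.writeStream.format("console").outputMode("append").start()
    } catch {
      case e: AnalysisException =>
        println(s"Rejected as expected: ${e.getMessage}")
    } finally {
      spark.stop()
    }
  }
}
```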