This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 9e35b0067756 [SPARK-48446][SS][DOCS] Update SS doc of dropDuplicates to use the right syntax
9e35b0067756 is described below

commit 9e35b00677566c00e906b8d5168acdd6ebb953a1
Author: Yuchen Liu <yuchen....@databricks.com>
AuthorDate: Fri May 31 08:30:15 2024 +0900

    [SPARK-48446][SS][DOCS] Update SS doc of dropDuplicates to use the right syntax

    ### What changes were proposed in this pull request?

    This PR fixes the incorrect usage of `dropDuplicates` and `dropDuplicatesWithinWatermark` in the Structured Streaming Programming Guide.

    ### Why are the changes needed?

    Previously the syntax in the guide was wrong, so users would see an error when running the examples directly.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Verified that the updated examples conform to the API doc and run out of the box.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #46797 from eason-yuchen-liu/dropduplicate-doc.

    Authored-by: Yuchen Liu <yuchen....@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 docs/structured-streaming-programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index fabe7f17b78b..4c3eca6b6d55 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -2082,12 +2082,12 @@ You can deduplicate records in data streams using a unique identifier in the eve
 streamingDf = spark.readStream. ...
 # Without watermark using guid column
-streamingDf.dropDuplicates("guid")
+streamingDf.dropDuplicates(["guid"])

 # With watermark using guid and eventTime columns
 streamingDf \
   .withWatermark("eventTime", "10 seconds") \
-  .dropDuplicates("guid", "eventTime")
+  .dropDuplicates(["guid", "eventTime"])
 {% endhighlight %}
 </div>

@@ -2163,7 +2163,7 @@ streamingDf = spark.readStream. ...

 # deduplicate using guid column with watermark based on eventTime column
 streamingDf \
   .withWatermark("eventTime", "10 hours") \
-  .dropDuplicatesWithinWatermark("guid")
+  .dropDuplicatesWithinWatermark(["guid"])
 {% endhighlight %}
 </div>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
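For readers of this archive without a Spark session at hand: the point of the fix above is that PySpark's `dropDuplicates` takes a *list* of column names as its subset, not varargs strings. The keep-first-per-key behavior it documents can be sketched in plain Python (this is an illustration of the semantics only, not Spark's implementation; `drop_duplicates` and the sample rows are hypothetical):

```python
# Keep-first deduplication by a subset of columns, mimicking the
# semantics of DataFrame.dropDuplicates(["guid", "eventTime"]).
# Plain-Python sketch for illustration; not the Spark implementation.

def drop_duplicates(rows, subset):
    """Keep the first row seen for each distinct value of the subset columns."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"guid": "a", "eventTime": 1, "value": 10},
    {"guid": "a", "eventTime": 1, "value": 11},  # same (guid, eventTime) -> dropped
    {"guid": "b", "eventTime": 2, "value": 12},
]
deduped = drop_duplicates(rows, ["guid", "eventTime"])
print(len(deduped))  # 2
```

Passing the subset as a list is what the corrected guide examples do; in a streaming query Spark additionally keeps this "seen keys" set as managed state.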
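The second hunk concerns `dropDuplicatesWithinWatermark`, where a key only suppresses duplicates while its state is within the watermark delay. A rough, simplified mental model of that bounded state (hypothetical helper `dedup_within_watermark`; Spark's actual state management and expiry rules are more involved):

```python
# Simplified model of watermark-bounded deduplication: a key's state
# expires once the watermark (max event time minus delay) passes it,
# after which the same key is admitted again. Illustration only.

def dedup_within_watermark(events, delay):
    """events: list of (event_time, guid). Emits the first arrival per guid
    while that guid's state has not yet expired past the watermark."""
    state = {}           # guid -> expiry event time
    max_event_time = 0
    out = []
    for t, guid in events:
        max_event_time = max(max_event_time, t)
        watermark = max_event_time - delay
        # Evict state the watermark has moved past.
        state = {k: exp for k, exp in state.items() if exp > watermark}
        if guid not in state:
            state[guid] = t + delay
            out.append((t, guid))
    return out
```

For example, with a delay of 10, a duplicate of guid "a" at event time 5 is dropped, but a reoccurrence at event time 30 is emitted again because the watermark has expired the earlier state.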