This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 9e35b0067756 [SPARK-48446][SS][DOCS] Update SS doc of dropDuplicates to use the right syntax
9e35b0067756 is described below

commit 9e35b00677566c00e906b8d5168acdd6ebb953a1
Author: Yuchen Liu <yuchen....@databricks.com>
AuthorDate: Fri May 31 08:30:15 2024 +0900

    [SPARK-48446][SS][DOCS] Update SS doc of dropDuplicates to use the right syntax

    ### What changes were proposed in this pull request?

    This PR fixes the incorrect usage of `dropDuplicates` and `dropDuplicatesWithinWatermark` in the Structured Streaming Programming Guide.

    ### Why are the changes needed?

    Previously the syntax in the guide was wrong, so users would see an error when running the examples directly.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Verified that the updated examples conform to the API doc and run out of the box.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #46797 from eason-yuchen-liu/dropduplicate-doc.

    Authored-by: Yuchen Liu <yuchen....@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 docs/structured-streaming-programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index fabe7f17b78b..4c3eca6b6d55 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -2082,12 +2082,12 @@ You can deduplicate records in data streams using a unique identifier in the eve
 streamingDf = spark.readStream. ...
 # Without watermark using guid column
-streamingDf.dropDuplicates("guid")
+streamingDf.dropDuplicates(["guid"])

 # With watermark using guid and eventTime columns
 streamingDf \
   .withWatermark("eventTime", "10 seconds") \
-  .dropDuplicates("guid", "eventTime")
+  .dropDuplicates(["guid", "eventTime"])
 {% endhighlight %}
 </div>

@@ -2163,7 +2163,7 @@ streamingDf = spark.readStream. ...

 # deduplicate using guid column with watermark based on eventTime column
 streamingDf \
   .withWatermark("eventTime", "10 hours") \
-  .dropDuplicatesWithinWatermark("guid")
+  .dropDuplicatesWithinWatermark(["guid"])
 {% endhighlight %}
 </div>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
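For readers of this archive without a Spark session at hand: the point of the fix above is that PySpark's `dropDuplicates` takes a *list* of column names as its subset, not varargs strings. The keep-first-per-key behavior it documents can be sketched in plain Python (this is an illustration of the semantics only, not Spark's implementation; `drop_duplicates` and the sample rows are hypothetical):

```python
# Keep-first deduplication by a subset of columns, mimicking the
# semantics of DataFrame.dropDuplicates(["guid", "eventTime"]).
# Plain-Python sketch for illustration; not the Spark implementation.

def drop_duplicates(rows, subset):
    """Keep the first row seen for each distinct value of the subset columns."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"guid": "a", "eventTime": 1, "value": 10},
    {"guid": "a", "eventTime": 1, "value": 11},  # same (guid, eventTime) -> dropped
    {"guid": "b", "eventTime": 2, "value": 12},
]
deduped = drop_duplicates(rows, ["guid", "eventTime"])
print(len(deduped))  # 2
```

Passing the subset as a list is what the corrected guide examples do; in a streaming query Spark additionally keeps this "seen keys" set as managed state.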
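The second hunk concerns `dropDuplicatesWithinWatermark`, where a key only suppresses duplicates while its state is within the watermark delay. A rough, simplified mental model of that bounded state (hypothetical helper `dedup_within_watermark`; Spark's actual state management and expiry rules are more involved):

```python
# Simplified model of watermark-bounded deduplication: a key's state
# expires once the watermark (max event time minus delay) passes it,
# after which the same key is admitted again. Illustration only.

def dedup_within_watermark(events, delay):
    """events: list of (event_time, guid). Emits the first arrival per guid
    while that guid's state has not yet expired past the watermark."""
    state = {}           # guid -> expiry event time
    max_event_time = 0
    out = []
    for t, guid in events:
        max_event_time = max(max_event_time, t)
        watermark = max_event_time - delay
        # Evict state the watermark has moved past.
        state = {k: exp for k, exp in state.items() if exp > watermark}
        if guid not in state:
            state[guid] = t + delay
            out.append((t, guid))
    return out
```

For example, with a delay of 10, a duplicate of guid "a" at event time 5 is dropped, but a reoccurrence at event time 30 is emitted again because the watermark has expired the earlier state.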