luigi created SPARK-37662:
-----------------------------

             Summary: exception when handling late data with watermarking and window
                 Key: SPARK-37662
                 URL: https://issues.apache.org/jira/browse/SPARK-37662
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.2.0
         Environment: Spark v3.2.0, Scala v2.12.12
            Reporter: luigi


When I use a watermark to drop late data together with a window column for 
stateful de-duplication, the order in which the two are applied causes 
unexpected behavior.

a) The code below throws an exception stating "Couldn't find 
timestamp#58-T5000ms in [window#550-T5000ms,raid#132L,app#528]":
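{code:scala}
import org.apache.spark.sql.functions.window  // $"..." also needs spark.implicits._

// Watermark applied before the window column is derived: fails on 3.2.0.
// df stands for the streaming DataFrame, which is not shown in this report.
df.withWatermark("timestamp", "5 seconds")
  .withColumn("window", window($"timestamp", "1 hours"))
  .dropDuplicates("window", "raid", "app")
{code}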
b) When I swap the order of the watermark and the window column, as below, it 
works without any exception:
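{code:scala}
import org.apache.spark.sql.functions.window  // $"..." also needs spark.implicits._

// Window column derived before the watermark is applied: works on 3.2.0.
// df stands for the streaming DataFrame, which is not shown in this report.
df.withColumn("window", window($"timestamp", "1 hours"))
  .withWatermark("timestamp", "5 seconds")
  .dropDuplicates("window", "raid", "app")
{code}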
Please note that this issue does not occur on Spark v3.1.2.
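
For anyone who wants to try the two orderings side by side, here is a minimal sketch of a standalone job. It is not the original application: the built-in "rate" source, the synthesized raid and app columns, the object and method names, and the 30-second run time are all assumptions made for illustration, and it has not been confirmed that this exact input triggers the exception.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, window}

object WatermarkOrderSketch {

  // Assumed input: the "rate" source emits (timestamp, value) rows.
  // raid and app are synthesized only to match the column names in the report.
  def sampleEvents(spark: SparkSession): DataFrame =
    spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .withColumn("raid", col("value") % 10)
      .withColumn("app", lit("demo"))

  // Ordering (a): watermark first, then the window column.
  def watermarkThenWindow(events: DataFrame): DataFrame =
    events
      .withWatermark("timestamp", "5 seconds")
      .withColumn("window", window(col("timestamp"), "1 hours"))
      .dropDuplicates("window", "raid", "app")

  // Ordering (b): window column first, then the watermark.
  def windowThenWatermark(events: DataFrame): DataFrame =
    events
      .withColumn("window", window(col("timestamp"), "1 hours"))
      .withWatermark("timestamp", "5 seconds")
      .dropDuplicates("window", "raid", "app")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-37662-sketch")
      .getOrCreate()

    // Swap in windowThenWatermark to compare against ordering (b).
    val query = watermarkThenWindow(sampleEvents(spark))
      .writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination(30000L) // let the query run for up to 30 seconds
    spark.stop()
  }
}
```

Running the same job against Spark 3.1.2 and 3.2.0, once per ordering, should show whether the behavior differs between the two versions.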


