luigi created SPARK-37662:
-----------------------------

             Summary: exception when handling late data with watermarking and window
                 Key: SPARK-37662
                 URL: https://issues.apache.org/jira/browse/SPARK-37662
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.2.0
         Environment: spark v3.2.0
scala v2.12.12
            Reporter: luigi


When I use a watermark to block late data together with a window column for state de-duplication, the order of the two calls causes unexpected behavior.

a) The code below throws an exception stating "Couldn't find timestamp#58-T5000ms in [window#550-T5000ms,raid#132L,app#528]":
{code:java}
withWatermark("timestamp", "5 seconds").
withColumn("window", window($"timestamp", "1 hours")).
dropDuplicates("window", "raid", "app").
{code}

b) But when I switch the order of the watermark and window calls as below, it works without any exception:
{code:java}
withColumn("window", window($"timestamp", "1 hours")).
withWatermark("timestamp", "5 seconds").
dropDuplicates("window", "raid", "app").
{code}

Please note that this issue does not exist on Spark v3.1.2.
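For anyone trying to reproduce this, the following is a minimal sketch of ordering (b) as a complete program. It is not the reporter's actual job: the `rate` source, the derived `raid` and `app` columns, the object name `WatermarkDedupSketch`, the helper `dedupe`, and the console sink are all assumptions chosen only to make the fragment self-contained. Only the three chained calls come from the report.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, window}

object WatermarkDedupSketch {
  // De-duplication keys taken from the report.
  val DedupKeys: Seq[String] = Seq("window", "raid", "app")

  // Ordering (b) from the report: add the window column first,
  // then declare the watermark, then de-duplicate.
  def dedupe(events: DataFrame): DataFrame =
    events
      .withColumn("window", window(col("timestamp"), "1 hour"))
      .withWatermark("timestamp", "5 seconds")
      .dropDuplicates(DedupKeys)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-37662-sketch")
      .getOrCreate()

    // Assumption: the built-in "rate" source stands in for the real stream.
    // It emits a `timestamp` column and a long `value` column.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .withColumn("raid", col("value") % 10)                // hypothetical stand-in column
      .withColumn("app", (col("value") % 3).cast("string")) // hypothetical stand-in column

    val query = dedupe(events).writeStream
      .format("console")
      .outputMode("append")
      .start()

    // Let the query run for about 30 seconds, then shut down.
    query.awaitTermination(30000L)
    query.stop()
    spark.stop()
  }
}
```

Swapping the first two calls inside `dedupe` gives ordering (a), which according to the report fails on Spark 3.2.0 but not on 3.1.2. Whether ordering (b) still lets the watermark bound the de-duplication state is not established by the report, so it may be worth checking state size before relying on it as a workaround.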