One more caveat: unfortunately, the number is not an exact count of late input rows for streaming aggregation, because Spark performs a local (partial) aggregation first and counts dropped rows among the pre-aggregated rows rather than the raw inputs. You may want to treat the number as an indicator of whether late inputs are being dropped, rather than as an exact count.
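To make the caveat concrete, here is a toy sketch in plain Python (not the Spark API; the keys, timestamps, and watermark value are made up for illustration). It shows how a local partial aggregation merges several raw late rows into one row per key before the watermark filter runs, so the dropped-row count can be smaller than the raw late-input count:

```python
# Toy model of partial aggregation before the watermark filter.
from collections import defaultdict

WATERMARK = 100  # event-time cutoff: rows with event_time < 100 are late

# Three raw late input rows, but only two distinct keys
raw_rows = [("a", 90), ("a", 95), ("b", 80)]

# Counting on the raw input would report 3 late rows
late_raw = [row for row in raw_rows if row[1] < WATERMARK]

# Local (partial) aggregation merges rows per key before the watermark
# check, e.g. a partial count that also tracks the max event time per key
partial = defaultdict(lambda: [0, 0])  # key -> [count, max_event_time]
for key, t in raw_rows:
    partial[key][0] += 1
    partial[key][1] = max(partial[key][1], t)

# The watermark filter now sees one partial row per key; both rows are
# late, so only 2 drops are counted even though 3 raw inputs were late
late_partial = [k for k, (_, t) in partial.items() if t < WATERMARK]

print(len(late_raw), len(late_partial))  # prints: 3 2
```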
On Fri, Aug 21, 2020 at 3:31 PM Yuanjian Li <xyliyuanj...@gmail.com> wrote:

> The metrics have been added in
> https://issues.apache.org/jira/browse/SPARK-24634, but the target version
> is 3.1. Maybe you can backport it for testing since it's not a big change.
>
> Best,
> Yuanjian
>
> On Thu, Aug 20, 2020 at 9:14 PM, GOEL Rajat <rajat.g...@thalesgroup.com> wrote:
>
>> Hi All,
>>
>> I have a query if someone can please help. Is there any metric or
>> mechanism for printing the count of input records dropped due to
>> watermarking (late data count) in a stream, during a window-based
>> aggregation, in Structured Streaming? I am using Spark 3.0.
>>
>> Thanks & Regards,
>>
>> Rajat
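For anyone landing on this thread later: once the SPARK-24634 change is in place (targeted at Spark 3.1), the counter is reported per stateful operator in the streaming query progress as `numRowsDroppedByWatermark`. The sketch below uses a plain Python dict that mimics the shape of a progress report so the extraction logic can be shown without a running Spark job; the numeric values are invented, and the exact surrounding fields may differ from a real `StreamingQueryProgress`:

```python
# Illustrative progress structure; only numRowsDroppedByWatermark matters here.
sample_progress = {
    "stateOperators": [
        {
            "numRowsTotal": 120,
            "numRowsUpdated": 15,
            "numRowsDroppedByWatermark": 7,  # rows dropped as too late
        }
    ],
}

def dropped_by_watermark(progress):
    """Sum the late-row drop counter across all stateful operators."""
    return sum(
        op.get("numRowsDroppedByWatermark", 0)
        for op in progress.get("stateOperators", [])
    )

print(dropped_by_watermark(sample_progress))  # prints: 7
```

As discussed above, treat this value as a signal that late data is being dropped rather than an exact count of late input rows.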