Re: Structured Streaming metric for count of delayed/late data

2020-08-21 Thread GOEL Rajat
Thanks for pointing me to the Spark ticket and its limitations. I will try these changes. Is there any workaround for the inaccurate count, perhaps by adding an extra streaming operation to the Structured Streaming job without hurting performance too much? Regards, Rajat From: Jungtaek Lim Date:

Re: Structured Streaming metric for count of delayed/late data

2020-08-21 Thread Jungtaek Lim
One more thing: unfortunately, for streaming aggregation the number is not accurate relative to the input rows, because Spark performs a local aggregation first and counts dropped inputs based on the "pre-locally-aggregated" rows. You may want to treat the number as whether dropping inputs is happening or
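
To illustrate why the count can undershoot the real number of late input rows, here is a minimal plain-Python sketch. It does not use Spark. It assumes that "pre-locally-aggregated" means rows already collapsed by a per-partition partial aggregate, and that a row counts as late when its window end is at or before the watermark. The function names, sample rows, and watermark value are all hypothetical.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Collapse rows sharing (key, window_end) into one partial row,
    mimicking a local (per-partition) aggregate. Assumption, not Spark code."""
    partial = defaultdict(int)
    for key, window_end in rows:
        partial[(key, window_end)] += 1
    return dict(partial)


def count_dropped(partial_rows, watermark):
    """Count partial rows whose window ended at or before the watermark."""
    return sum(1 for (_, window_end) in partial_rows if window_end <= watermark)


# Hypothetical partition: five raw input rows, all late for watermark=100.
rows = [("a", 60), ("a", 60), ("a", 60), ("b", 60), ("b", 90)]

partial = partial_aggregate(rows)
late_raw_rows = sum(1 for _, window_end in rows if window_end <= 100)
reported = count_dropped(partial, watermark=100)

print(f"late raw input rows: {late_raw_rows}, rows counted as dropped: {reported}")
```

In this sketch the five late inputs collapse into three partial rows, so the counter reports fewer drops than actually occurred. A non-zero value still signals that late data is being dropped.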

Re: Structured Streaming metric for count of delayed/late data

2020-08-21 Thread Yuanjian Li
The metrics have been added in https://issues.apache.org/jira/browse/SPARK-24634, but the target version is 3.1. You could backport the change for testing, since it's not a big change. Best, Yuanjian On Thursday, August 20, 2020 at 9:14 PM, GOEL Rajat wrote: > Hi All, > > I have a query if someone can please help. Is
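
Once a build containing that change is available, the metric could be read from the query progress. The following is a minimal sketch, assuming the per-operator field is named `numRowsDroppedByWatermark` (the name used in Spark 3.1+ progress output; the thread itself does not state it) and that the progress is available as a plain dict, as PySpark's `query.lastProgress` returns. The helper name and sample data are hypothetical.

```python
def rows_dropped_by_watermark(progress):
    """Sum the dropped-row counter over all stateful operators in one
    progress report. Returns 0 if no progress has been reported yet."""
    if not progress:
        return 0
    return sum(
        op.get("numRowsDroppedByWatermark", 0)  # assumed field name
        for op in progress.get("stateOperators", [])
    )


# Hypothetical progress payload shaped like a streaming query progress dict.
sample_progress = {
    "batchId": 7,
    "stateOperators": [
        {"numRowsTotal": 120, "numRowsUpdated": 15, "numRowsDroppedByWatermark": 4},
        {"numRowsTotal": 30, "numRowsUpdated": 2, "numRowsDroppedByWatermark": 1},
    ],
}

dropped = rows_dropped_by_watermark(sample_progress)
print(f"rows dropped by watermark in this batch: {dropped}")
```

With a live query, the same helper would be called as `rows_dropped_by_watermark(query.lastProgress)` after each batch. Given the local-aggregation caveat raised in this thread, the result is best read as an indicator that drops are occurring, not as an exact count of late input rows.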