One more caveat: unfortunately, the number is not an exact count of late input rows for streaming aggregation, because Spark performs a local (partial) aggregation first and counts dropped rows among the pre-aggregated rows rather than the raw inputs. You may want to treat the number as an indicator of whether late inputs are being dropped, rather than as an exact count.
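To make the caveat concrete, here is a toy sketch in plain Python (not the Spark API; the keys, timestamps, and watermark value are made up for illustration). It shows how a local partial aggregation merges several raw late rows into one row per key before the watermark filter runs, so the dropped-row count can be smaller than the raw late-input count:

```python
# Toy model of partial aggregation before the watermark filter.
from collections import defaultdict

WATERMARK = 100  # event-time cutoff: rows with event_time < 100 are late

# Three raw late input rows, but only two distinct keys
raw_rows = [("a", 90), ("a", 95), ("b", 80)]

# Counting on the raw input would report 3 late rows
late_raw = [row for row in raw_rows if row[1] < WATERMARK]

# Local (partial) aggregation merges rows per key before the watermark
# check, e.g. a partial count that also tracks the max event time per key
partial = defaultdict(lambda: [0, 0])  # key -> [count, max_event_time]
for key, t in raw_rows:
    partial[key][0] += 1
    partial[key][1] = max(partial[key][1], t)

# The watermark filter now sees one partial row per key; both rows are
# late, so only 2 drops are counted even though 3 raw inputs were late
late_partial = [k for k, (_, t) in partial.items() if t < WATERMARK]

print(len(late_raw), len(late_partial))  # prints: 3 2
```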
On Fri, Aug 21, 2020 at 3:31 PM Yuanjian Li <xyliyuanj...@gmail.com> wrote:

> The metrics have been added in
> https://issues.apache.org/jira/browse/SPARK-24634, but the target version
> is 3.1. Maybe you can backport it for testing since it's not a big change.
>
> Best,
> Yuanjian
>
> On Thu, Aug 20, 2020 at 9:14 PM, GOEL Rajat <rajat.g...@thalesgroup.com> wrote:
>
>> Hi All,
>>
>> I have a query if someone can please help. Is there any metric or
>> mechanism for printing the count of input records dropped due to
>> watermarking (late data count) in a stream, during a window-based
>> aggregation, in Structured Streaming? I am using Spark 3.0.
>>
>> Thanks & Regards,
>>
>> Rajat
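For anyone landing on this thread later: once the SPARK-24634 change is in place (targeted at Spark 3.1), the counter is reported per stateful operator in the streaming query progress as `numRowsDroppedByWatermark`. The sketch below uses a plain Python dict that mimics the shape of a progress report so the extraction logic can be shown without a running Spark job; the numeric values are invented, and the exact surrounding fields may differ from a real `StreamingQueryProgress`:

```python
# Illustrative progress structure; only numRowsDroppedByWatermark matters here.
sample_progress = {
    "stateOperators": [
        {
            "numRowsTotal": 120,
            "numRowsUpdated": 15,
            "numRowsDroppedByWatermark": 7,  # rows dropped as too late
        }
    ],
}

def dropped_by_watermark(progress):
    """Sum the late-row drop counter across all stateful operators."""
    return sum(
        op.get("numRowsDroppedByWatermark", 0)
        for op in progress.get("stateOperators", [])
    )

print(dropped_by_watermark(sample_progress))  # prints: 7
```

As discussed above, treat this value as a signal that late data is being dropped rather than an exact count of late input rows.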