Thanks for the pointers. I will try these changes.
From: Jungtaek Lim
Date: Saturday, 22 August 2020 at 2:41 PM
To: GOEL Rajat
Cc: "user@spark.apache.org"
Subject: Re: Structured Streaming metric for count of delayed/late data
I proposed another approach which provided accurate counts.
From: Jungtaek Lim
Date: Friday, 21 August 2020 at 12:07 PM
To: Yuanjian Li
Cc: GOEL Rajat , "user@spark.apache.org"
Subject: Re: Structured Streaming metric for count of delayed/late data
One more thing to say: unfortunately, the number is not accurate compared to
the input rows on streaming aggregation, because Spark performs local
(partial) aggregation and counts dropped inputs based on the
pre-locally-aggregated rows. You may want to treat the number as an indicator
of whether dropping of inputs is happening, rather than as an exact count.
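To make the effect concrete, here is a toy illustration in plain Python (not Spark internals; the numbers are invented): 10 input rows collapse into 3 locally pre-aggregated rows, and when 2 of those partial rows fall behind the watermark, the metric reports 2 dropped rows even though they stand for 7 original inputs.

```python
# Toy illustration: why a dropped-row count taken after local
# pre-aggregation can undercount the original input rows.

# 10 input rows: (key, event_time)
inputs = [
    ("a", 1), ("a", 2), ("a", 3),            # 3 rows for key "a"
    ("b", 1), ("b", 2), ("b", 3), ("b", 4),  # 4 rows for key "b"
    ("c", 9), ("c", 10), ("c", 11),          # 3 rows for key "c"
]

# Local (partial) aggregation collapses the rows for each key into one
# partial row carrying an input-row count and the max event time.
partials = {}
for key, t in inputs:
    cnt, max_t = partials.get(key, (0, 0))
    partials[key] = (cnt + 1, max(max_t, t))

watermark = 5  # partial rows whose max event time is below this are dropped

dropped_partial_rows = 0
dropped_input_rows = 0
for key, (cnt, max_t) in partials.items():
    if max_t < watermark:
        dropped_partial_rows += 1  # what the metric counts
        dropped_input_rows += cnt  # what the user might expect

print(dropped_partial_rows)  # 2 (keys "a" and "b")
print(dropped_input_rows)    # 7 (3 + 4 original input rows)
```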
> The metrics have been added in
> https://issues.apache.org/jira/browse/SPARK-24634, but the target version
> is 3.1. Maybe you can backport for testing since it's not a big change.
>
> Best,
> Yuanjian
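For anyone who does backport it: going by the JIRA, the new counter shows up as numRowsDroppedByWatermark under stateOperators in each StreamingQueryProgress (the field name and layout here are assumed from SPARK-24634; verify against your build). A small plain-Python sketch of summing it out of a query.lastProgress-style payload:

```python
import json

def dropped_by_watermark(progress: dict) -> int:
    """Sum numRowsDroppedByWatermark across all stateful operators in one
    StreamingQueryProgress payload. Returns 0 if the field is absent,
    e.g. on builds without SPARK-24634."""
    return sum(
        op.get("numRowsDroppedByWatermark", 0)
        for op in progress.get("stateOperators", [])
    )

# Hypothetical progress payload, shaped like query.lastProgress:
progress = json.loads("""
{
  "batchId": 42,
  "stateOperators": [
    {"numRowsTotal": 100, "numRowsDroppedByWatermark": 7}
  ]
}
""")

print(dropped_by_watermark(progress))  # 7
```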
> On Thursday, 20 August 2020 at 9:14 PM, GOEL Rajat wrote:
> Hi All,
>
> I have a query, if someone can please help. Is there any metric or
> mechanism for printing the count of input records dropped due to
> watermarking (the late data count) during a window-based aggregation in
> Structured Streaming? I am using Spark 3.0.
>
> Thanks & Regards,
> Rajat
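For context, the dropping rule the question refers to works roughly like this: the watermark trails the maximum event time seen so far by the configured delay, and rows arriving with an event time behind the watermark are discarded before the windowed aggregation. A toy plain-Python sketch of those semantics (an illustration only, not Spark code):

```python
# Toy sketch of event-time watermarking semantics:
# watermark = (max event time seen so far) - delay; rows arriving with an
# event time below the watermark count as late and would be dropped.

def run(events, delay):
    max_event_time = 0
    late_count = 0
    kept = []
    for event_time, value in events:
        watermark = max_event_time - delay
        if event_time < watermark:
            late_count += 1  # would be dropped by the aggregation
        else:
            kept.append((event_time, value))
        max_event_time = max(max_event_time, event_time)
    return kept, late_count

# Events arrive out of order; the watermark delay is 5 time units.
events = [(1, "a"), (10, "b"), (3, "c"), (6, "d"), (20, "e"), (12, "f")]
kept, late = run(events, delay=5)
print(late)  # 2: event times 3 and 12 arrived behind the watermark
```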