Re: Structured Streaming metric for count of delayed/late data

2020-08-23 Thread GOEL Rajat
Thanks for the pointers. I will try these changes. From: Jungtaek Lim Date: Saturday, 22 August 2020 at 2:41 PM To: GOEL Rajat Cc: "user@spark.apache.org" Subject: Re: Structured Streaming metric for count of delayed/late data I proposed another approach which provided accurate cou

Re: Structured Streaming metric for count of delayed/late data

2020-08-22 Thread Jungtaek Lim
:07 PM > *To: *Yuanjian Li > *Cc: *GOEL Rajat , "user@spark.apache.org" < > user@spark.apache.org> > *Subject: *Re: Structured Streaming metric for count of delayed/late data > > > > One more thing to say, unfortunately, the number is not accurate compared > t

Re: Structured Streaming metric for count of delayed/late data

2020-08-21 Thread GOEL Rajat
: Friday, 21 August 2020 at 12:07 PM To: Yuanjian Li Cc: GOEL Rajat , "user@spark.apache.org" Subject: Re: Structured Streaming metric for count of delayed/late data One more thing to say, unfortunately, the number is not accurate compared to the input rows on streaming aggregation, bec

Re: Structured Streaming metric for count of delayed/late data

2020-08-20 Thread Jungtaek Lim
One more thing to say, unfortunately, the number is not accurate compared to the input rows on streaming aggregation, because Spark does local-aggregate and counts dropped inputs based on "pre-locally-aggregated" rows. You may want to treat the number as whether dropping inputs is happening or not.

Re: Structured Streaming metric for count of delayed/late data

2020-08-20 Thread Yuanjian Li
The metrics have been added in https://issues.apache.org/jira/browse/SPARK-24634, but the target version is 3.1. Maybe you can backport for testing since it's not a big change. Best, Yuanjian GOEL Rajat 于2020年8月20日周四 下午9:14写道: > Hi All, > > > > I have a query if someone can please help. Is ther

Structured Streaming metric for count of delayed/late data

2020-08-20 Thread GOEL Rajat
Hi All, I have a query if someone can please help. Is there any metric or mechanism of printing count of input records dropped due to watermarking (late data count) in a stream, during a window based aggregation, in Structured Streaming ? I am using Spark 3.0. Thanks & Regards, Rajat

Structured Streaming metric for count of delayed/late data

2020-08-20 Thread GOEL Rajat
Hi All, I have a query if someone can please help. Is there any metric or mechanism of printing count of input records dropped due to watermarking (late data count) in a stream, during a window based aggregation, in Structured Streaming ? I am using Spark 3.0. Thanks & Regards, Rajat