Thanks for pointing me to the Spark ticket and its limitations. Will try these
changes.
Is there any workaround for this limitation of the inaccurate count, maybe by
adding some additional streaming operation to the SS job, without impacting
performance too much?
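For example, would something along the lines of the sketch below be a
reasonable direction? It is only a rough sketch on my side, assuming Spark
3.0+ (for Dataset.observe), a rate source standing in for the real input, and
illustrative column names:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("drop-count-check").getOrCreate()

// Hypothetical source just for illustration; the real job would read from
// its actual streaming source.
val inputDf = spark.readStream.format("rate").load()
  .withColumn("eventTime", col("timestamp"))
  .withColumn("key", col("value") % 10)

val counted = inputDf
  // Record the raw, pre-aggregation row count of each micro-batch as a
  // named metric; it is reported in StreamingQueryProgress.observedMetrics.
  .observe("raw_input", count(lit(1)).as("inputRows"))
  .withWatermark("eventTime", "10 minutes")
  .groupBy(window(col("eventTime"), "5 minutes"), col("key"))
  .count()

// The per-batch "inputRows" value could then be compared against the state
// operator metrics to judge how much data is being dropped.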
Regards,
Rajat
From: Jungtaek Lim
One more thing to say: unfortunately, the number is not accurate compared
to the input rows on streaming aggregation, because Spark does a local
aggregation and counts dropped inputs based on the "pre-locally-aggregated"
rows. You may want to treat the number as an indicator of whether dropping
inputs is happening or not.
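For example, in a query shaped like the sketch below (column names are only
illustrative), the partial (local) aggregation collapses rows per key and
window inside each partition before the stateful operator applies the
watermark check, so many late input rows can show up as a single dropped row:

import org.apache.spark.sql.functions._

// 'events' stands for the streaming input DataFrame of the job.
val agg = events
  .withWatermark("eventTime", "10 minutes")
  .groupBy(window(col("eventTime"), "5 minutes"), col("key"))
  .count()

// E.g. 100 late rows for the same key and window arriving in one partition
// may be collapsed by the partial aggregate into one row first, so the
// dropped-rows number could report 1 rather than 100.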
The metrics have been added in
https://issues.apache.org/jira/browse/SPARK-24634, but the target version
is 3.1.
Maybe you can backport for testing since it's not a big change.
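If you do backport it, a rough sketch for reading the new counter from a
StreamingQueryListener could look like this (assuming it is exposed on
StateOperatorProgress as numRowsDroppedByWatermark, as in the 3.1 change, and
that `spark` is the active session):

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // One entry per stateful operator in the query.
    event.progress.stateOperators.foreach { op =>
      println(s"batch ${event.progress.batchId}: dropped by watermark = " +
        s"${op.numRowsDroppedByWatermark}")
    }
  }
})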
Best,
Yuanjian
GOEL Rajat wrote on Thu, Aug 20, 2020 at 9:14 PM:
> Hi All,
>
>
>
> I have a query if someone can please help. Is