Thanks everyone for joining the discussion!

> Do you have any idea what has happened in the process here?

The discussion in this PR [1] shows some details and could be helpful to 
understand the original motivation of the renaming. We do have a test case for 
guarding metrics but unfortunaly the case was also modified so the defense was 
broken.

I think the reason why both the developer and the reviewer forgot to trigger an 
discussion and gave a green pass on the change is that metrics are quite 
“trivial” to be noticed as public APIs. As mentioned by Martijn I couldn’t find 
a place noting that metrics are public APIs and should be treated carefully 
while contributing and reviewing.

IMHO three actions could be made to prevent this kind of changes in the future:

a. Add test case for metrics (which we already have in SinkMetricsITCase)
b. We emphasize that any public-interface breaking changes should be proposed 
by a FLIP or discussed in mailing list, and should be listed in the release 
note.
c. We remind contributors and reviewers about what should be considered as 
public API, and include metric names in it.

For b and c these two pages [2][3] might be proper places.

About the patch to revert this, it looks like we have a consensus on 1.16. As 
of 1.15 I think it’s worthy to trigger a minor version. I didn’t see complaints 
about this for now so it should be OK to save the situation asap. I’m with 
Xintong’s idea to treat numXXXSend as an alias of numXXXOut considering there 
could possibly some users have already adapted their system to the new naming, 
and have another internal metric for reflecting number of outgoing committable 
batches (actually the numRecordsIn of sink committer operator should be 
carrying this info already).

[1] https://github.com/apache/flink/pull/18825
[2] https://flink.apache.org/contributing/contribute-code.html
[3] https://flink.apache.org/contributing/reviewing-prs.html

Best,
Qingsheng
On Oct 10, 2022, 17:40 +0800, Xintong Song <tonysong...@gmail.com>, wrote:
> +1 for reverting these changes in Flink 1.16.
>
> For 1.15.3, can we make these metrics available via both names (numXXXOut and 
> numXXXSend)? In this way we don't break it for those who already migrated to 
> 1.15 and numXXXSend. That means we still need to change SinkWriterOperator to 
> use another metric name in 1.15.3, which IIUC is internal to Flink sink.
>
> I'm overall +1 to change numXXXOut back to its original semantics. AFAIK 
> (from meetup / flink-forward questionaires), most users do not migrate to a 
> new Flink release immediately, until the next 1-2 major releases are out.
>
> Best,
> Xintong
>
>
> > On Mon, Oct 10, 2022 at 5:26 PM Martijn Visser <martijnvis...@apache.org> 
> > wrote:
> > > Hi Qingsheng,
> > >
> > > Do you have any idea what has happened in the process here? Do we know why
> > > they were changed? I was under the impression that these metric names were
> > > newly introduced due to the new interfaces and because it still depends on
> > > each connector implementing these.
> > >
> > > Sidenote: metric names are not mentioned in the FLIP process as a public
> > > API. Might make sense to have a separate follow-up to add that to the list
> > > (I do think we should list them there).
> > >
> > > +1 for reverting this and make this change in Flink 1.16
> > >
> > > I'm not in favour of releasing a Flink 1.15.3 with this change: I think 
> > > the
> > > impact is too big for a patch version, especially given how long Flink 
> > > 1.15
> > > is already out there.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Mon, Oct 10, 2022 at 11:13 AM Leonard Xu <xbjt...@gmail.com> wrote:
> > >
> > > > Thanks Qingsheng for starting this thread.
> > > >
> > > > +1 on reverting sink metric name and releasing 1.15.3 to fix this
> > > > inconsistent behavior.
> > > >
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > 2022年10月10日 下午3:06,Jark Wu <imj...@gmail.com> 写道:
> > > >
> > > > Thanks for discovering this problem, Qingsheng!
> > > >
> > > > I'm also +1 for reverting the breaking changes.
> > > >
> > > > IIUC, currently, the behavior of "numXXXOut" metrics of the new and old
> > > > sink is inconsistent.
> > > > We have to break one of them to have consistent behavior. Sink V2 is an
> > > > evolving API which is just introduced in 1.15.
> > > > I think it makes sense to break the unstable API instead of the stable 
> > > > API
> > > > which many connectors and users depend on.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > >
> > > > On Mon, 10 Oct 2022 at 11:36, Jingsong Li <jingsongl...@gmail.com> 
> > > > wrote:
> > > >
> > > >> Thanks for driving, Qingsheng.
> > > >>
> > > >> +1 for reverting sink metric name.
> > > >>
> > > >> We often forget that metric is also one of the important APIs.
> > > >>
> > > >> +1 for releasing 1.15.3 to fix this.
> > > >>
> > > >> Best,
> > > >> Jingsong
> > > >>
> > > >> On Sun, Oct 9, 2022 at 11:35 PM Becket Qin <becket....@gmail.com> 
> > > >> wrote:
> > > >> >
> > > >> > Thanks for raising the discussion, Qingsheng,
> > > >> >
> > > >> > +1 on reverting the breaking changes.
> > > >> >
> > > >> > In addition, we might want to release a 1.15.3 to fix this and update
> > > >> the previous release docs with this known issue, so that users can 
> > > >> upgrade
> > > >> to 1.15.3 when they hit it. It would also be good to add some backwards
> > > >> compatibility tests on metrics to avoid unintended breaking changes 
> > > >> like
> > > >> this in the future.
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Jiangjie (Becket) Qin
> > > >> >
> > > >> > On Sun, Oct 9, 2022 at 10:35 AM Qingsheng Ren <re...@apache.org> 
> > > >> > wrote:
> > > >> >>
> > > >> >> Hi devs and users,
> > > >> >>
> > > >> >> I’d like to start a discussion about reverting a breaking change 
> > > >> >> about
> > > >> sink metrics made in 1.15 by FLINK-26126 [1] and FLINK-26492 [2].
> > > >> >>
> > > >> >> TL;DR
> > > >> >>
> > > >> >> All sink metrics with name “numXXXOut” defined in FLIP-33 are 
> > > >> >> replace
> > > >> by “numXXXSend” in FLINK-26126 and FLINK-26492. Considering metric 
> > > >> names
> > > >> are public APIs, this is a breaking change to end users and not 
> > > >> backward
> > > >> compatible. Also unfortunately this breaking change was not discussed 
> > > >> in
> > > >> the mailing list before.
> > > >> >>
> > > >> >> Background
> > > >> >>
> > > >> >> As defined previously in FLIP-33 (the FLIP page has been changed so
> > > >> please refer to the old version [3] ), metric “numRecordsOut” is used 
> > > >> for
> > > >> reporting the total number of output records since the sink started 
> > > >> (number
> > > >> of records written to the external system), and similarly for
> > > >> “numRecordsOutPerSecond”, “numBytesOut”, “numBytesOutPerSecond” and
> > > >> “numRecordsOutError”. Most sinks are following this naming and 
> > > >> definition.
> > > >> However, these metrics are ambiguous in the new Sink API as “numXXXOut”
> > > >> could be used by the output of SinkWriterOperator for reporting number 
> > > >> of
> > > >> Committables delivered to SinkCommitterOperator. In order to resolve 
> > > >> the
> > > >> conflict, FLINK-26126 and FLINK-26492 changed names of these metrics 
> > > >> with
> > > >> “numXXXSend”.
> > > >> >>
> > > >> >> Necessity of reverting this change
> > > >> >>
> > > >> >> - Metric names are actually public API, as end users need to 
> > > >> >> configure
> > > >> metric collecting and alerting system with metric names. Users have to
> > > >> reset all configurations related to affected metrics.
> > > >> >> - This could also affect custom and external sinks not maintained by
> > > >> Flink, which might have implemented with numXXXOut metrics.
> > > >> >> - The number of records sent to external system is way more 
> > > >> >> important
> > > >> than the number of Committables sent to SinkCommitterOperator, as the
> > > >> latter one is just an internal implementation of sink. We could have a 
> > > >> new
> > > >> metric name for the latter one instead.
> > > >> >> - We could avoid splitting the project by version (like “plz use
> > > >> numXXXOut before 1.15 and use numXXXSend after”) if we revert it ASAP,
> > > >> cosidering 1.16 is still not released for now.
> > > >> >>
> > > >> >> As a consequence, I’d like to hear from devs and users about your
> > > >> opinion on changing these metrics back to “numXXXOut”.
> > > >> >>
> > > >> >> Looking forward to your reply!
> > > >> >>
> > > >> >> [1] https://issues.apache.org/jira/browse/FLINK-26126
> > > >> >> [2] https://issues.apache.org/jira/browse/FLINK-26492
> > > >> >> [1] FLIP-33, version 18:
> > > >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883136
> > > >> >>
> > > >> >> Best,
> > > >> >> Qingsheng
> > > >>
> > > >
> > > >

Reply via email to