I chimed in initially, because admittedly, I am usually the one breaking
the api, (with Vratko telling me to stop breaking CSIT ;/ ) and I learned a
lot in the process.

1. The thing we don't have today that would have caught these endian issues
are api tests.
  a. Send in a create and verify all the fields in the _details record.
  b. Remove the magic from the vpp_papi_provider, thich "adapts" the api,
and test the defaults.
  I know the api is tested indirectly via its use, but it is not the same.

2.  The tests are frequently "massaged" to get the code through the gate.
I recently submitted https://gerrit.fd.io/r/c/vpp/+/26703 because the tests
were changed so that python prefixes (with 0'd host bits) were being used
in place of interface addresses.  I don't know if my fix is ok, but I don't
believe the existing code is currently correct.

As to Andrew's comments, above,
I agree with the decision to cherry pick to 2001 to avoid api changes,
although my understanding is that it was agreed that changes would go to
master before being cherry picked.  That was why my original suggestion was
for 2 dependent changes, one for 2001, and one based off of that to correct
the master branch.  A consumer shouldn't have to "bunny-hop" through each
version to get to the latest branch.

With respect to https://gerrit.fd.io/r/c/vpp/+/26879,  I disagree with
Ole's -2.  If you move from a 19.x branch, it is transparent.  But
moreover, the feedback to just change the handler is the issue for me.  I
understand that the implemented flag day process is heavy, but it has been
specified that the version is the semantic version, and therefore changing
the handler requires bumping the semver, which triggers the crc.  I know
the api is heavily modelled on protobuf, but if the will is not present in
the community, maybe we shouldn't use semver for the version number and
just tie the version to the git release tag.  Such a decision could of
course, be reverted in the future if the process ever becomes lighter, or
whatever may be.

Let's consider the api changing process for a moment.  There were a
number of lisp api changes made recently.
https://gerrit.fd.io/r/c/vpp/+/24663
https://gerrit.fd.io/r/c/vpp/+/26932
https://gerrit.fd.io/r/c/vpp/+/27044

The first was Jakub's.  It was substantial, changed the crc and merged.
The 2nd was mine, a trivial cleanup that as I understand it, should change
the crc and was not merged.
The last was Onong's, a trivial cleanup that did not change the crc and was
merged.

If api changing changes need to meet some threshold, we should chain them
until they meet the threshold and can be merged.  I don't think they should
be squashed to reduce the impact in case something individually needs to be
reverted.  That implies that the version bump would then be tied to the
last commit in the chain.

Thoughts?

Paul

On Fri, May 15, 2020 at 11:47 AM Christian Hopps <cho...@chopps.org> wrote:

> If I read you correctly, this bug went unnoticed, but has never seen an
> LTS release. LTS might be the only release people are actually developing
> products with. :)
>
> I also sent some mail about brokenness in the auto-generated endian
> functions. I noticed this while I was investigating a post 19.08 change
> that actually made the u8 declared enums be __packed__ (vs just being uint
> anyway), which was then breaking my endian conversion code.
>
> FWIW These changes w/ breakages that had to be RCAd and fixed were
> frustrating b/c they seemed like "cleanup" going on in the API w/o enough
> support from the API infrastructure to avoid user pain downstream. Why did
> we need any of these cleanup changes? It would be one thing if the cleanup
> was done at zero cost to the users (i.e., transparently handled by the API
> infrastructure), but that's not the case.
>
> So with that in mind (and understanding I'm only interested in LTS
> releases :) my vote is:
>
> - Back out the u8 change
> - Leave the u32 fix.
>
> More importantly:
>
> 1) Stop making enum u32->u8 changes until this can be handled better by
> the API infrastructure.
> 2) Improve the API infrastructure before making anymore of these enum (or
> any field) size changes.
>
> To accomplish 2) the generated API functions which try to do the endian
> conversion, should be fixed, and more importantly the whole thing needs to
> be made usable by providing non-callback functionality for the synchronous
> calls.
>
> When all that is done then people can actually use the generated API
> function calls, the endian stuff won't be being done by hand then, so the
> size changes will be handled automatically.
>
> Thanks,
> Chris.
>
> > On May 15, 2020, at 10:20 AM, Andrew Yourtchenko <ayour...@gmail.com>
> wrote:
> >
> > There's a very interesting couple of gerrit changes that just came in
> > today that is worth discussing,
> > and they are a perfect case study to further clarify the process - so
> > I tweaked the subject accordingly..
> > The API message itself is relatively minor, but regardless of what is
> > agreed that makes a good case study.
> >
> > Backstory:
> >
> > Once upon a time on Aug 20 2019, commit 053204ab changed
> > sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
> > conversion
> > function didn't make it there (enums are u32 by default as far as I can
> see).
> >
> > This was after the 19.08 branch pull, and it wasn't ever tackled, so
> > this (buggy) behavior ended being in 20.01, 20.01.1, and in the
> > current 20.05-rc1.
> >
> > Fast forward a bit, today I was pinged about the two changes - one for
> > master, one for stable/2001:
> >
> > https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to
> be a u8
> >
> > https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
> > htonl() call and changes the existing ("buggy")
> > behavior in the 20.01.2 - thus would silently break any API consumers
> > that coded against the previous "buggy" behavior.
> >
> > And then we have a question open about stable/2005, which "by the
> > letter" can potentially accept only the second approach, since it is
> > past the API freeze.
> >
> > Additional bit: this API is not touched in make test, so this bug had
> > slipped through.
> >
> > So there are the following options:
> >
> > 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
> > making a "silent change" in both, but making the master look closer to
> > a 19.08 format.
> >
> > 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
> > merge the master patch that shrinks the enum down to 1 byte
> >
> > 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
> > endianness of the u32 enum everywhere, but making an effective "silent
> > change" in 20.01&20.05
> >
> > 4)  merge the patch in master that shrinks the enum down to one byte,
> > and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
> > "no api changes" but at least this gets to be explicitly visible early
> > on.
> >
> > 5) under the current proposal, if the API message is defined as
> > "production" then there would need to be a new "in-progress" message
> > in master with either of the two fixes, the buggy message would need
> > to be marked as "deprecated".  And under the current proposal by the
> > 20.09 the "in-progress" message would become "production", the current
> > message would be shown as "deprecated",  to be deleted in 21.01.
> >
> > So, the feedback that I would appreciate on the above:
> >
> > 1) the least worst course of action "right now", for this particular
> > issue. I discussed this with Ole and we have different opinions, so I
> > would like the actual API users to chime in. Please tell me which is
> > the least worst option from your point of view :-)
> >
> > 2) What is the best course of action in the future. Note, this is also
> > the simpler case in that there is a way to trigger a CRC-affecting
> > change by forcing the enum to be a u8. What would have been the best
> > course of action if it was simply a missing ntohl() for a field that
> > noone complained about for 1.5 releases. Can we assume that no-one
> > used that API and "fix the representation" ?
> >
> > 3) would there be an interest in having a sort of registry of "who
> > wants to know about things related to each API" - so that if a bug
> > like this arises that requires a behavior change to fix, I could poll
> > the affected folks and maybe be able to get away with (1) or (3) from
> > above ?
> >
> > And a tweak to the process (and potentially tooling) with regards to
> > going to "production API status":
> >
> > An API that is not touched during a "make test" can not be moved
> > beyond the "in-progress" status. Adding the unit test implies the
> > commitment from the API contributor to follow the option (5) above
> > even for "trivial endianness bugs".
> >
> > Thoughts ?
> >
> > --a
> >
> >
> >
> >
> > On 5/14/20, Christian Hopps <cho...@chopps.org> wrote:
> >> API stability is borderline critical for a project like VPP I think.
> Yes, it
> >> can be used stand-alone, but real value is added by building products
> on top
> >> of it.
> >>
> >> Also important is having an API framework that allows for
> >> backward-compatible changes to the API for making improvements.
> >>
> >> I'm not sure if VPP has the latter, but if there's too much pain with
> your
> >> proposed rules I think the latter should addressed rather than
> sacrificing
> >> the former. So I support your rules.
> >>
> >> Thanks,
> >> Chris.
> >>
> >>> On May 14, 2020, at 8:28 AM, Ole Troan <otr...@employees.org> wrote:
> >>>
> >>> Andrew and I have discussed a process around API changes with the goal
> to
> >>> limit the impact on VPP consumers.
> >>> One big painpoint in tracking VPP releases for users of the VPP binary
> has
> >>> been API changes.
> >>>
> >>> We want to ensure:
> >>> - A production API never changes.
> >>>
> >>> - A production API can be deprecated with one release notice.
> >>> E.g. show_version_2 is introduced in 20.01 and show_version_1 is
> marked as
> >>> deprecated.
> >>> show_version_1 is marked with "option delete_by = "v20.05";
> >>>
> >>> - APIs marked as experimental / not released can change with no notice.
> >>> Added support for option status = "in_progress";
> >>>
> >>> This patch:
> >>> https://gerrit.fd.io/r/c/vpp/+/26881
> >>>
> >>> has a tool "crcchecker" that we intend to run as part of the verify
> job.
> >>> It will block modifications to existing production APIs.
> >>> make checkstyle-api
> >>> It also supports dumping API manifests and make API comparisions across
> >>> releases/revisions/commits.
> >>>
> >>> A production API is defined by having .api semantic major version > 0.
> >>> New messages can override that by setting status="in_progress".
> >>>
> >>> The consequence of this is that developers must maintain two (or more)
> >>> versions of an API for some time.
> >>> This is obviously painful, but at least it aligns cost and benefit
> better
> >>> than when we forced the API users to do it.
> >>>
> >>> Looking back from 17.01 onwards, it looks like we on average have done
> >>> about 50 backwards incompatible changes per release.
> >>> That's ignoring the work on API typing, which have resulted in
> >>> significantly more changes.
> >>>
> >>> The hope is that we can start v20.09 development with this process in
> >>> place.
> >>>
> >>> What do people think?
> >>>
> >>> Best regards,
> >>> Ole
> >>
> >>
> >
>
> 
>
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16411): https://lists.fd.io/g/vpp-dev/message/16411
Mute This Topic: https://lists.fd.io/mt/74228149/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to