> On May 16, 2020, at 9:52 AM, Andrew 👽 Yourtchenko <ayour...@gmail.com> wrote:
> 
> On 5/16/20, Dave Barach (dbarach) <dbar...@cisco.com> wrote:
>> API messages in network byte order made sense 10 years ago when I worked
>> with a mixed x86_64 / ppc32 system. As Damjan points out, API
>> interoperability between big-endian and little-endian systems is a boutique
>> use-case these days.
>> 
>> Timing is key. We won’t be able to cherry-pick API message handler fixes
>> across an endian-order flag-day. If we decide to switch to native byte
>> order, we’d better switch right before we pull our next LTS release.
> 
> Do I understand it right that the idea here would be to flip the
> endianness right before pulling stable/2009 branch ?

If this is done right before (or near) the branch pull for 2009 then I think 
the release candidate should be left to bake longer than normal, and API users 
encouraged to port/test to the release candidate. Obviously any bugs found 
would then be fixed on the RC. Maybe ask for a couple/few stakeholders to sign 
up to do this and then wait a reasonable amount of time for them to sign-off?

I know we use the binary APIs, I believe Netgate does as well. I'm sure there 
are others too (might be good to collect a list of these folks if one doesn't 
exist yet).

If these changes were coupled with the (more important IMO) change i've been 
(ceaselessly :) asking for in the API (non-callback sync functions) I would be 
more than willing to sign up for this.

Thanks,
Chris.

> 
> --a
> 
> 
>> 
>> FWIW... Dave
>> 
>> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Damjan Marion
>> via lists.fd.io
>> Sent: Saturday, May 16, 2020 7:23 AM
>> To: Andrew Yourtchenko <ayour...@gmail.com>
>> Cc: Christian Hopps <cho...@chopps.org>; Ole Troan <otr...@employees.org>;
>> vpp-dev <vpp-dev@lists.fd.io>; Jerome Tollet (jtollet) <jtol...@cisco.com>
>> Subject: Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev]
>> Proposal for VPP binary API stability
>> 
>> 
>> Knowing that even hard-core big-endian systems like PowerPC decided to
>> switch to Little endian and that situation where binary api will be
>> exchanged between BE and LE systems is extremely small, maybe we should
>> consider removing endian conversion from APIs and simply represent data
>> in the native form. Looks like this is fertile ground for new bugs…
>> 
>> Thoughts?
>> 
>> —
>> Damjan
>> 
>> 
>>> On 15 May 2020, at 16:20, Andrew Yourtchenko
>>> <ayour...@gmail.com<mailto:ayour...@gmail.com>> wrote:
>>> 
>>> There's a very interesting couple of gerrit changes that just came in
>>> today that is worth discussing,
>>> and they are a perfect case study to further clarify the process - so
>>> I tweaked the subject accordingly..
>>> The API message itself is relatively minor, but regardless of what is
>>> agreed that makes a good case study.
>>> 
>>> Backstory:
>>> 
>>> Once upon a time on Aug 20 2019, commit 053204ab changed
>>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>>> conversion
>>> function didn't make it there (enums are u32 by default as far as I can
>>> see).
>>> 
>>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>>> current 20.05-rc1.
>>> 
>>> Fast forward a bit, today I was pinged about the two changes - one for
>>> master, one for stable/2001:
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a
>>> u8
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>>> htonl() call and changes the existing ("buggy")
>>> behavior in the 20.01.2 - thus would silently break any API consumers
>>> that coded against the previous "buggy" behavior.
>>> 
>>> And then we have a question open about stable/2005, which "by the
>>> letter" can potentially accept only the second approach, since it is
>>> past the API freeze.
>>> 
>>> Additional bit: this API is not touched in make test, so this bug had
>>> slipped through.
>>> 
>>> So there are the following options:
>>> 
>>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>>> making a "silent change" in both, but making the master look closer to
>>> a 19.08 format.
>>> 
>>> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
>>> merge the master patch that shrinks the enum down to 1 byte
>>> 
>>> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
>>> endianness of the u32 enum everywhere, but making an effective "silent
>>> change" in 20.01&20.05
>>> 
>>> 4)  merge the patch in master that shrinks the enum down to one byte,
>>> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
>>> "no api changes" but at least this gets to be explicitly visible early
>>> on.
>>> 
>>> 5) under the current proposal, if the API message is defined as
>>> "production" then there would need to be a new "in-progress" message
>>> in master with either of the two fixes, the buggy message would need
>>> to be marked as "deprecated".  And under the current proposal by the
>>> 20.09 the "in-progress" message would become "production", the current
>>> message would be shown as "deprecated",  to be deleted in 21.01.
>>> 
>>> So, the feedback that I would appreciate on the above:
>>> 
>>> 1) the least worst course of action "right now", for this particular
>>> issue. I discussed this with Ole and we have different opinions, so I
>>> would like the actual API users to chime in. Please tell me which is
>>> the least worst option from your point of view :-)
>>> 
>>> 2) What is the best course of action in the future. Note, this is also
>>> the simpler case in that there is a way to trigger a CRC-affecting
>>> change by forcing the enum to be a u8. What would have been the best
>>> course of action if it was simply a missing ntohl() for a field that
>>> noone complained about for 1.5 releases. Can we assume that no-one
>>> used that API and "fix the representation" ?
>>> 
>>> 3) would there be an interest in having a sort of registry of "who
>>> wants to know about things related to each API" - so that if a bug
>>> like this arises that requires a behavior change to fix, I could poll
>>> the affected folks and maybe be able to get away with (1) or (3) from
>>> above ?
>>> 
>>> And a tweak to the process (and potentially tooling) with regards to
>>> going to "production API status":
>>> 
>>> An API that is not touched during a "make test" can not be moved
>>> beyond the "in-progress" status. Adding the unit test implies the
>>> commitment from the API contributor to follow the option (5) above
>>> even for "trivial endianness bugs".
>>> 
>>> Thoughts ?
>>> 
>>> --a
>>> 
>>> 
>>> 
>>> 
>>> On 5/14/20, Christian Hopps <cho...@chopps.org<mailto:cho...@chopps.org>>
>>> wrote:
>>>> API stability is borderline critical for a project like VPP I think. Yes,
>>>> it
>>>> can be used stand-alone, but real value is added by building products on
>>>> top
>>>> of it.
>>>> 
>>>> Also important is having an API framework that allows for
>>>> backward-compatible changes to the API for making improvements.
>>>> 
>>>> I'm not sure if VPP has the latter, but if there's too much pain with
>>>> your
>>>> proposed rules I think the latter should addressed rather than
>>>> sacrificing
>>>> the former. So I support your rules.
>>>> 
>>>> Thanks,
>>>> Chris.
>>>> 
>>>>> On May 14, 2020, at 8:28 AM, Ole Troan
>>>>> <otr...@employees.org<mailto:otr...@employees.org>> wrote:
>>>>> 
>>>>> Andrew and I have discussed a process around API changes with the goal
>>>>> to
>>>>> limit the impact on VPP consumers.
>>>>> One big painpoint in tracking VPP releases for users of the VPP binary
>>>>> has
>>>>> been API changes.
>>>>> 
>>>>> We want to ensure:
>>>>> - A production API never changes.
>>>>> 
>>>>> - A production API can be deprecated with one release notice.
>>>>> E.g. show_version_2 is introduced in 20.01 and show_version_1 is marked
>>>>> as
>>>>> deprecated.
>>>>> show_version_1 is marked with "option delete_by = "v20.05";
>>>>> 
>>>>> - APIs marked as experimental / not released can change with no notice.
>>>>> Added support for option status = "in_progress";
>>>>> 
>>>>> This patch:
>>>>> https://gerrit.fd.io/r/c/vpp/+/26881
>>>>> 
>>>>> has a tool "crcchecker" that we intend to run as part of the verify
>>>>> job.
>>>>> It will block modifications to existing production APIs.
>>>>> make checkstyle-api
>>>>> It also supports dumping API manifests and make API comparisions across
>>>>> releases/revisions/commits.
>>>>> 
>>>>> A production API is defined by having .api semantic major version > 0.
>>>>> New messages can override that by setting status="in_progress".
>>>>> 
>>>>> The consequence of this is that developers must maintain two (or more)
>>>>> versions of an API for some time.
>>>>> This is obviously painful, but at least it aligns cost and benefit
>>>>> better
>>>>> than when we forced the API users to do it.
>>>>> 
>>>>> Looking back from 17.01 onwards, it looks like we on average have done
>>>>> about 50 backwards incompatible changes per release.
>>>>> That's ignoring the work on API typing, which have resulted in
>>>>> significantly more changes.
>>>>> 
>>>>> The hope is that we can start v20.09 development with this process in
>>>>> place.
>>>>> 
>>>>> What do people think?
>>>>> 
>>>>> Best regards,
>>>>> Ole
>>>> 
>>>> 
>>> 
>> 

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16430): https://lists.fd.io/g/vpp-dev/message/16430
Mute This Topic: https://lists.fd.io/mt/74228149/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to