> On May 16, 2020, at 9:52 AM, Andrew 👽 Yourtchenko <ayour...@gmail.com> wrote: > > On 5/16/20, Dave Barach (dbarach) <dbar...@cisco.com> wrote: >> API messages in network byte order made sense 10 years ago when I worked >> with a mixed x86_64 / ppc32 system. As Damjan points out, API >> interoperability between big-endian and little-endian systems is a boutique >> use-case these days. >> >> Timing is key. We won’t be able to cherry-pick API message handler fixes >> across an endian-order flag-day. If we decide to switch to native byte >> order, we’d better switch right before we pull our next LTS release. > > Do I understand it right that the idea here would be to flip the > endianness right before pulling stable/2009 branch ?
If this is done right before (or near) the branch pull for 2009 then I think the release candidate should be left to bake longer than normal, and API users encouraged to port/test to the release candidate. Obviously any bugs found would then be fixed on the RC. Maybe ask for a couple/few stakeholders to sign up to do this and then wait a reasonable amount of time for them to sign-off? I know we use the binary APIs, I believe Netgate does as well. I'm sure there are others too (might be good to collect a list of these folks if one doesn't exist yet). If these changes were coupled with the (more important IMO) change i've been (ceaselessly :) asking for in the API (non-callback sync functions) I would be more than willing to sign up for this. Thanks, Chris. > > --a > > >> >> FWIW... Dave >> >> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Damjan Marion >> via lists.fd.io >> Sent: Saturday, May 16, 2020 7:23 AM >> To: Andrew Yourtchenko <ayour...@gmail.com> >> Cc: Christian Hopps <cho...@chopps.org>; Ole Troan <otr...@employees.org>; >> vpp-dev <vpp-dev@lists.fd.io>; Jerome Tollet (jtollet) <jtol...@cisco.com> >> Subject: Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] >> Proposal for VPP binary API stability >> >> >> Knowing that even hard-core big-endian systems like PowerPC decided to >> switch to Little endian and that situation where binary api will be >> exchanged between BE and LE systems is extremely small, maybe we should >> consider removing endian conversion from APIs and simply represent data >> in the native form. Looks like this is fertile ground for new bugs… >> >> Thoughts? >> >> — >> Damjan >> >> >>> On 15 May 2020, at 16:20, Andrew Yourtchenko >>> <ayour...@gmail.com<mailto:ayour...@gmail.com>> wrote: >>> >>> There's a very interesting couple of gerrit changes that just came in >>> today that is worth discussing, >>> and they are a perfect case study to further clarify the process - so >>> I tweaked the subject accordingly.. >>> The API message itself is relatively minor, but regardless of what is >>> agreed that makes a good case study. >>> >>> Backstory: >>> >>> Once upon a time on Aug 20 2019, commit 053204ab changed >>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl >>> conversion >>> function didn't make it there (enums are u32 by default as far as I can >>> see). >>> >>> This was after the 19.08 branch pull, and it wasn't ever tackled, so >>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the >>> current 20.05-rc1. >>> >>> Fast forward a bit, today I was pinged about the two changes - one for >>> master, one for stable/2001: >>> >>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a >>> u8 >>> >>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the >>> htonl() call and changes the existing ("buggy") >>> behavior in the 20.01.2 - thus would silently break any API consumers >>> that coded against the previous "buggy" behavior. >>> >>> And then we have a question open about stable/2005, which "by the >>> letter" can potentially accept only the second approach, since it is >>> past the API freeze. >>> >>> Additional bit: this API is not touched in make test, so this bug had >>> slipped through. >>> >>> So there are the following options: >>> >>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus >>> making a "silent change" in both, but making the master look closer to >>> a 19.08 format. >>> >>> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and >>> merge the master patch that shrinks the enum down to 1 byte >>> >>> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the >>> endianness of the u32 enum everywhere, but making an effective "silent >>> change" in 20.01&20.05 >>> >>> 4) merge the patch in master that shrinks the enum down to one byte, >>> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of >>> "no api changes" but at least this gets to be explicitly visible early >>> on. >>> >>> 5) under the current proposal, if the API message is defined as >>> "production" then there would need to be a new "in-progress" message >>> in master with either of the two fixes, the buggy message would need >>> to be marked as "deprecated". And under the current proposal by the >>> 20.09 the "in-progress" message would become "production", the current >>> message would be shown as "deprecated", to be deleted in 21.01. >>> >>> So, the feedback that I would appreciate on the above: >>> >>> 1) the least worst course of action "right now", for this particular >>> issue. I discussed this with Ole and we have different opinions, so I >>> would like the actual API users to chime in. Please tell me which is >>> the least worst option from your point of view :-) >>> >>> 2) What is the best course of action in the future. Note, this is also >>> the simpler case in that there is a way to trigger a CRC-affecting >>> change by forcing the enum to be a u8. What would have been the best >>> course of action if it was simply a missing ntohl() for a field that >>> noone complained about for 1.5 releases. Can we assume that no-one >>> used that API and "fix the representation" ? >>> >>> 3) would there be an interest in having a sort of registry of "who >>> wants to know about things related to each API" - so that if a bug >>> like this arises that requires a behavior change to fix, I could poll >>> the affected folks and maybe be able to get away with (1) or (3) from >>> above ? >>> >>> And a tweak to the process (and potentially tooling) with regards to >>> going to "production API status": >>> >>> An API that is not touched during a "make test" can not be moved >>> beyond the "in-progress" status. Adding the unit test implies the >>> commitment from the API contributor to follow the option (5) above >>> even for "trivial endianness bugs". >>> >>> Thoughts ? >>> >>> --a >>> >>> >>> >>> >>> On 5/14/20, Christian Hopps <cho...@chopps.org<mailto:cho...@chopps.org>> >>> wrote: >>>> API stability is borderline critical for a project like VPP I think. Yes, >>>> it >>>> can be used stand-alone, but real value is added by building products on >>>> top >>>> of it. >>>> >>>> Also important is having an API framework that allows for >>>> backward-compatible changes to the API for making improvements. >>>> >>>> I'm not sure if VPP has the latter, but if there's too much pain with >>>> your >>>> proposed rules I think the latter should addressed rather than >>>> sacrificing >>>> the former. So I support your rules. >>>> >>>> Thanks, >>>> Chris. >>>> >>>>> On May 14, 2020, at 8:28 AM, Ole Troan >>>>> <otr...@employees.org<mailto:otr...@employees.org>> wrote: >>>>> >>>>> Andrew and I have discussed a process around API changes with the goal >>>>> to >>>>> limit the impact on VPP consumers. >>>>> One big painpoint in tracking VPP releases for users of the VPP binary >>>>> has >>>>> been API changes. >>>>> >>>>> We want to ensure: >>>>> - A production API never changes. >>>>> >>>>> - A production API can be deprecated with one release notice. >>>>> E.g. show_version_2 is introduced in 20.01 and show_version_1 is marked >>>>> as >>>>> deprecated. >>>>> show_version_1 is marked with "option delete_by = "v20.05"; >>>>> >>>>> - APIs marked as experimental / not released can change with no notice. >>>>> Added support for option status = "in_progress"; >>>>> >>>>> This patch: >>>>> https://gerrit.fd.io/r/c/vpp/+/26881 >>>>> >>>>> has a tool "crcchecker" that we intend to run as part of the verify >>>>> job. >>>>> It will block modifications to existing production APIs. >>>>> make checkstyle-api >>>>> It also supports dumping API manifests and make API comparisions across >>>>> releases/revisions/commits. >>>>> >>>>> A production API is defined by having .api semantic major version > 0. >>>>> New messages can override that by setting status="in_progress". >>>>> >>>>> The consequence of this is that developers must maintain two (or more) >>>>> versions of an API for some time. >>>>> This is obviously painful, but at least it aligns cost and benefit >>>>> better >>>>> than when we forced the API users to do it. >>>>> >>>>> Looking back from 17.01 onwards, it looks like we on average have done >>>>> about 50 backwards incompatible changes per release. >>>>> That's ignoring the work on API typing, which have resulted in >>>>> significantly more changes. >>>>> >>>>> The hope is that we can start v20.09 development with this process in >>>>> place. >>>>> >>>>> What do people think? >>>>> >>>>> Best regards, >>>>> Ole >>>> >>>> >>> >>
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#16430): https://lists.fd.io/g/vpp-dev/message/16430 Mute This Topic: https://lists.fd.io/mt/74228149/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-