Very valid point. Endianness is only aspect. Accidental size changes on the 
wire is different.

But this drifted very far astray from my original question: if we keep 
shuffling the bits on wire regardless, how much of a value does a meticulous 
CRC tracking process provides ? i see it as actively harmful in that case, 
since it creates false expectations.

--a

> On 16 May 2020, at 14:02, Christian Hopps <cho...@chopps.org> wrote:
> 
> As a user of these APIs I would hugely appreciate work be done on the API 
> framework to make it more usable (don't require callbacks) fix the automatic 
> endian conversion it already generates, improve support for argument 
> additions (these are easy to support in a backward compatible way), and maybe 
> arg size changes, into non-issues, rather than punting on this particular 
> endian aspect and thinking everything is all-good.
> 
> Thanks,
> Chris.
> 
>> On May 16, 2020, at 7:49 AM, Andrew Yourtchenko <ayour...@gmail.com> wrote:
>> 
>> I was hesitating to write exactly that - so at least I am not alone in 
>> thinking this... :-)
>> 
>> With my RelMgr hat on I would be seeing this as to have a -2’d  patch in 
>> gerrit whose existence  I can announce in 20.05 release notes, which would 
>> apply to both stable/2005 branch and to master branch, with an optional 
>> add-on patch that was master specific - so that the change can be 
>> concentrated. Probably even a separate make verify job(s) that would 
>> cherrypick this on top of whatever patch is being verified - such that we 
>> have some assurance that things don’t break.
>> 
>> This patch would add a single boolean “api is big endian”, and create 
>> api_htonX/api_ntohX versions of the endianness functions which will be 
>> no-ops if that boolean is false.
>> 
>> This patch would also add a new api call “is_vpp_api_is_big_endian”, which 
>> would return a the value of the above Boolean.
>> 
>> Its presence means that the whatever abstraction layer is there needs to 
>> query it and to if it returns true, to  make htonX()/ntohX calls in the APIs 
>> to be a a conditional no-op in the same fashion as in VPP.
>> 
>> This will make for a backwards compatible change that can even be even 
>> backported to earlier branches if needs to... 
>> 
>> Sometime in July this could be going into master.
>> 
>> Then that change goes into 2009, with the default still being big-endian, as 
>> well as is committed to 2005 (it’s an api addition that doesn’t break 
>> anything else by default, and by then the folks must have seen it on master).
>> 
>> Then somewhere before 2101, say in November, we flip the default.
>> 
>> This gives two releases to iron out the kinks, and 2109 LTS ships with the 
>> default little endian API.
>> 
>> 2201 then removes the wrappers and makes is_vpp_api_is_big_endian always 
>> return 0.
>> 
>> Granted, it’s a long roadmap, but it’s not exactly a simple thing that we’d 
>> be doing and doing so in a graceful fashion will allow to avoid annoying the 
>> downstream users..
>> 
>> Also of course this implies that we MUST start having a longer term road map 
>> for strategic stuff like this, that should be published and vetted by the 
>> community.
>> 
>> Thoughts ?
>> 
>> --a
>> 
>>>> On 16 May 2020, at 13:22, Damjan Marion <dmar...@me.com> wrote:
>>> 
>>> 
>>> Knowing that even hard-core big-endian systems like PowerPC decided to 
>>> switch to Little endian and that situation where binary api will be 
>>> exchanged between BE and LE systems is extremely small, maybe we should 
>>> consider removing endian conversion from APIs and simply represent data
>>> in the native form. Looks like this is fertile ground for new bugs…
>>> 
>>> Thoughts?
>>> 
>>> — 
>>> Damjan
>>> 
>>> 
>>>> On 15 May 2020, at 16:20, Andrew Yourtchenko <ayour...@gmail.com> wrote:
>>>> 
>>>> There's a very interesting couple of gerrit changes that just came in
>>>> today that is worth discussing,
>>>> and they are a perfect case study to further clarify the process - so
>>>> I tweaked the subject accordingly..
>>>> The API message itself is relatively minor, but regardless of what is
>>>> agreed that makes a good case study.
>>>> 
>>>> Backstory:
>>>> 
>>>> Once upon a time on Aug 20 2019, commit 053204ab changed
>>>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>>>> conversion
>>>> function didn't make it there (enums are u32 by default as far as I can 
>>>> see).
>>>> 
>>>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>>>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>>>> current 20.05-rc1.
>>>> 
>>>> Fast forward a bit, today I was pinged about the two changes - one for
>>>> master, one for stable/2001:
>>>> 
>>>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a 
>>>> u8
>>>> 
>>>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>>>> htonl() call and changes the existing ("buggy")
>>>> behavior in the 20.01.2 - thus would silently break any API consumers
>>>> that coded against the previous "buggy" behavior.
>>>> 
>>>> And then we have a question open about stable/2005, which "by the
>>>> letter" can potentially accept only the second approach, since it is
>>>> past the API freeze.
>>>> 
>>>> Additional bit: this API is not touched in make test, so this bug had
>>>> slipped through.
>>>> 
>>>> So there are the following options:
>>>> 
>>>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>>>> making a "silent change" in both, but making the master look closer to
>>>> a 19.08 format.
>>>> 
>>>> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
>>>> merge the master patch that shrinks the enum down to 1 byte
>>>> 
>>>> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
>>>> endianness of the u32 enum everywhere, but making an effective "silent
>>>> change" in 20.01&20.05
>>>> 
>>>> 4)  merge the patch in master that shrinks the enum down to one byte,
>>>> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
>>>> "no api changes" but at least this gets to be explicitly visible early
>>>> on.
>>>> 
>>>> 5) under the current proposal, if the API message is defined as
>>>> "production" then there would need to be a new "in-progress" message
>>>> in master with either of the two fixes, the buggy message would need
>>>> to be marked as "deprecated".  And under the current proposal by the
>>>> 20.09 the "in-progress" message would become "production", the current
>>>> message would be shown as "deprecated",  to be deleted in 21.01.
>>>> 
>>>> So, the feedback that I would appreciate on the above:
>>>> 
>>>> 1) the least worst course of action "right now", for this particular
>>>> issue. I discussed this with Ole and we have different opinions, so I
>>>> would like the actual API users to chime in. Please tell me which is
>>>> the least worst option from your point of view :-)
>>>> 
>>>> 2) What is the best course of action in the future. Note, this is also
>>>> the simpler case in that there is a way to trigger a CRC-affecting
>>>> change by forcing the enum to be a u8. What would have been the best
>>>> course of action if it was simply a missing ntohl() for a field that
>>>> noone complained about for 1.5 releases. Can we assume that no-one
>>>> used that API and "fix the representation" ?
>>>> 
>>>> 3) would there be an interest in having a sort of registry of "who
>>>> wants to know about things related to each API" - so that if a bug
>>>> like this arises that requires a behavior change to fix, I could poll
>>>> the affected folks and maybe be able to get away with (1) or (3) from
>>>> above ?
>>>> 
>>>> And a tweak to the process (and potentially tooling) with regards to
>>>> going to "production API status":
>>>> 
>>>> An API that is not touched during a "make test" can not be moved
>>>> beyond the "in-progress" status. Adding the unit test implies the
>>>> commitment from the API contributor to follow the option (5) above
>>>> even for "trivial endianness bugs".
>>>> 
>>>> Thoughts ?
>>>> 
>>>> --a
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 5/14/20, Christian Hopps <cho...@chopps.org> wrote:
>>>>> API stability is borderline critical for a project like VPP I think. Yes, 
>>>>> it
>>>>> can be used stand-alone, but real value is added by building products on 
>>>>> top
>>>>> of it.
>>>>> 
>>>>> Also important is having an API framework that allows for
>>>>> backward-compatible changes to the API for making improvements.
>>>>> 
>>>>> I'm not sure if VPP has the latter, but if there's too much pain with your
>>>>> proposed rules I think the latter should addressed rather than sacrificing
>>>>> the former. So I support your rules.
>>>>> 
>>>>> Thanks,
>>>>> Chris.
>>>>> 
>>>>>> On May 14, 2020, at 8:28 AM, Ole Troan <otr...@employees.org> wrote:
>>>>>> 
>>>>>> Andrew and I have discussed a process around API changes with the goal to
>>>>>> limit the impact on VPP consumers.
>>>>>> One big painpoint in tracking VPP releases for users of the VPP binary 
>>>>>> has
>>>>>> been API changes.
>>>>>> 
>>>>>> We want to ensure:
>>>>>> - A production API never changes.
>>>>>> 
>>>>>> - A production API can be deprecated with one release notice.
>>>>>> E.g. show_version_2 is introduced in 20.01 and show_version_1 is marked 
>>>>>> as
>>>>>> deprecated.
>>>>>> show_version_1 is marked with "option delete_by = "v20.05";
>>>>>> 
>>>>>> - APIs marked as experimental / not released can change with no notice.
>>>>>> Added support for option status = "in_progress";
>>>>>> 
>>>>>> This patch:
>>>>>> https://gerrit.fd.io/r/c/vpp/+/26881
>>>>>> 
>>>>>> has a tool "crcchecker" that we intend to run as part of the verify job.
>>>>>> It will block modifications to existing production APIs.
>>>>>> make checkstyle-api
>>>>>> It also supports dumping API manifests and make API comparisions across
>>>>>> releases/revisions/commits.
>>>>>> 
>>>>>> A production API is defined by having .api semantic major version > 0.
>>>>>> New messages can override that by setting status="in_progress".
>>>>>> 
>>>>>> The consequence of this is that developers must maintain two (or more)
>>>>>> versions of an API for some time.
>>>>>> This is obviously painful, but at least it aligns cost and benefit better
>>>>>> than when we forced the API users to do it.
>>>>>> 
>>>>>> Looking back from 17.01 onwards, it looks like we on average have done
>>>>>> about 50 backwards incompatible changes per release.
>>>>>> That's ignoring the work on API typing, which have resulted in
>>>>>> significantly more changes.
>>>>>> 
>>>>>> The hope is that we can start v20.09 development with this process in
>>>>>> place.
>>>>>> 
>>>>>> What do people think?
>>>>>> 
>>>>>> Best regards,
>>>>>> Ole
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16429): https://lists.fd.io/g/vpp-dev/message/16429
Mute This Topic: https://lists.fd.io/mt/74228149/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to