(Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-15 Thread Andrew Yourtchenko
There's a very interesting couple of gerrit changes that just came in
today that is worth discussing,
and they are a perfect case study to further clarify the process - so
I tweaked the subject accordingly..
The API message itself is relatively minor, but regardless of what is
agreed that makes a good case study.

Backstory:

Once upon a time on Aug 20 2019, commit 053204ab changed
sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
conversion
function didn't make it there (enums are u32 by default as far as I can see).

This was after the 19.08 branch pull, and it wasn't ever tackled, so
this (buggy) behavior ended being in 20.01, 20.01.1, and in the
current 20.05-rc1.

Fast forward a bit, today I was pinged about the two changes - one for
master, one for stable/2001:

https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a u8

https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
htonl() call and changes the existing ("buggy")
 behavior in the 20.01.2 - thus would silently break any API consumers
that coded against the previous "buggy" behavior.

And then we have a question open about stable/2005, which "by the
letter" can potentially accept only the second approach, since it is
past the API freeze.

Additional bit: this API is not touched in make test, so this bug had
slipped through.

So there are the following options:

1) Merge both patches, and treat the 20.05 similar to 20.01, thus
making a "silent change" in both, but making the master look closer to
a 19.08 format.

2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
merge the master patch that shrinks the enum down to 1 byte

3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
endianness of the u32 enum everywhere, but making an effective "silent
change" in 20.01&20.05

4)  merge the patch in master that shrinks the enum down to one byte,
and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
"no api changes" but at least this gets to be explicitly visible early
on.

5) under the current proposal, if the API message is defined as
"production" then there would need to be a new "in-progress" message
in master with either of the two fixes, the buggy message would need
to be marked as "deprecated".  And under the current proposal by the
20.09 the "in-progress" message would become "production", the current
message would be shown as "deprecated",  to be deleted in 21.01.

So, the feedback that I would appreciate on the above:

1) the least worst course of action "right now", for this particular
issue. I discussed this with Ole and we have different opinions, so I
would like the actual API users to chime in. Please tell me which is
the least worst option from your point of view :-)

2) What is the best course of action in the future. Note, this is also
the simpler case in that there is a way to trigger a CRC-affecting
change by forcing the enum to be a u8. What would have been the best
course of action if it was simply a missing ntohl() for a field that
noone complained about for 1.5 releases. Can we assume that no-one
used that API and "fix the representation" ?

3) would there be an interest in having a sort of registry of "who
wants to know about things related to each API" - so that if a bug
like this arises that requires a behavior change to fix, I could poll
the affected folks and maybe be able to get away with (1) or (3) from
above ?

And a tweak to the process (and potentially tooling) with regards to
going to "production API status":

An API that is not touched during a "make test" can not be moved
beyond the "in-progress" status. Adding the unit test implies the
commitment from the API contributor to follow the option (5) above
even for "trivial endianness bugs".

Thoughts ?

--a




On 5/14/20, Christian Hopps  wrote:
> API stability is borderline critical for a project like VPP I think. Yes, it
> can be used stand-alone, but real value is added by building products on top
> of it.
>
> Also important is having an API framework that allows for
> backward-compatible changes to the API for making improvements.
>
> I'm not sure if VPP has the latter, but if there's too much pain with your
> proposed rules I think the latter should addressed rather than sacrificing
> the former. So I support your rules.
>
> Thanks,
> Chris.
>
>> On May 14, 2020, at 8:28 AM, Ole Troan  wrote:
>>
>> Andrew and I have discussed a process around API changes with the goal to
>> limit the impact on VPP consumers.
>> One big painpoint in tracking VPP releases for users of the VPP binary has
>> been API changes.
>>
>> We want to ensure:
>> - A production API never changes.
>>
>> - A production API can be deprecated with one release notice.
>> E.g. show_version_2 is introduced in 20.01 and show_version_1 is marked as
>> deprecated.
>> show_version_1 is marked with "option delete_by = "v20.05";
>>
>> - APIs marked as experimental / not released can change with no notice.
>> Added suppo

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-15 Thread Christian Hopps
If I read you correctly, this bug went unnoticed, but has never seen an LTS 
release. LTS might be the only release people are actually developing products 
with. :)

I also sent some mail about brokenness in the auto-generated endian functions. 
I noticed this while I was investigating a post 19.08 change that actually made 
the u8 declared enums be __packed__ (vs just being uint anyway), which was then 
breaking my endian conversion code.

FWIW These changes w/ breakages that had to be RCAd and fixed were frustrating 
b/c they seemed like "cleanup" going on in the API w/o enough support from the 
API infrastructure to avoid user pain downstream. Why did we need any of these 
cleanup changes? It would be one thing if the cleanup was done at zero cost to 
the users (i.e., transparently handled by the API infrastructure), but that's 
not the case.

So with that in mind (and understanding I'm only interested in LTS releases :) 
my vote is:

- Back out the u8 change
- Leave the u32 fix.

More importantly:

1) Stop making enum u32->u8 changes until this can be handled better by the API 
infrastructure.
2) Improve the API infrastructure before making anymore of these enum (or any 
field) size changes.

To accomplish 2) the generated API functions which try to do the endian 
conversion, should be fixed, and more importantly the whole thing needs to be 
made usable by providing non-callback functionality for the synchronous calls.

When all that is done then people can actually use the generated API function 
calls, the endian stuff won't be being done by hand then, so the size changes 
will be handled automatically.

Thanks,
Chris.

> On May 15, 2020, at 10:20 AM, Andrew Yourtchenko  wrote:
> 
> There's a very interesting couple of gerrit changes that just came in
> today that is worth discussing,
> and they are a perfect case study to further clarify the process - so
> I tweaked the subject accordingly..
> The API message itself is relatively minor, but regardless of what is
> agreed that makes a good case study.
> 
> Backstory:
> 
> Once upon a time on Aug 20 2019, commit 053204ab changed
> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
> conversion
> function didn't make it there (enums are u32 by default as far as I can see).
> 
> This was after the 19.08 branch pull, and it wasn't ever tackled, so
> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
> current 20.05-rc1.
> 
> Fast forward a bit, today I was pinged about the two changes - one for
> master, one for stable/2001:
> 
> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a u8
> 
> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
> htonl() call and changes the existing ("buggy")
> behavior in the 20.01.2 - thus would silently break any API consumers
> that coded against the previous "buggy" behavior.
> 
> And then we have a question open about stable/2005, which "by the
> letter" can potentially accept only the second approach, since it is
> past the API freeze.
> 
> Additional bit: this API is not touched in make test, so this bug had
> slipped through.
> 
> So there are the following options:
> 
> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
> making a "silent change" in both, but making the master look closer to
> a 19.08 format.
> 
> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
> merge the master patch that shrinks the enum down to 1 byte
> 
> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
> endianness of the u32 enum everywhere, but making an effective "silent
> change" in 20.01&20.05
> 
> 4)  merge the patch in master that shrinks the enum down to one byte,
> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
> "no api changes" but at least this gets to be explicitly visible early
> on.
> 
> 5) under the current proposal, if the API message is defined as
> "production" then there would need to be a new "in-progress" message
> in master with either of the two fixes, the buggy message would need
> to be marked as "deprecated".  And under the current proposal by the
> 20.09 the "in-progress" message would become "production", the current
> message would be shown as "deprecated",  to be deleted in 21.01.
> 
> So, the feedback that I would appreciate on the above:
> 
> 1) the least worst course of action "right now", for this particular
> issue. I discussed this with Ole and we have different opinions, so I
> would like the actual API users to chime in. Please tell me which is
> the least worst option from your point of view :-)
> 
> 2) What is the best course of action in the future. Note, this is also
> the simpler case in that there is a way to trigger a CRC-affecting
> change by forcing the enum to be a u8. What would have been the best
> course of action if it was simply a missing ntohl() for a field that
> noone complained about for 1.5 releases. Can we assume that no-one
> used that API and "f

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-15 Thread Paul Vinciguerra
I chimed in initially, because admittedly, I am usually the one breaking
the api, (with Vratko telling me to stop breaking CSIT ;/ ) and I learned a
lot in the process.

1. The thing we don't have today that would have caught these endian issues
are api tests.
  a. Send in a create and verify all the fields in the _details record.
  b. Remove the magic from the vpp_papi_provider, thich "adapts" the api,
and test the defaults.
  I know the api is tested indirectly via its use, but it is not the same.

2.  The tests are frequently "massaged" to get the code through the gate.
I recently submitted https://gerrit.fd.io/r/c/vpp/+/26703 because the tests
were changed so that python prefixes (with 0'd host bits) were being used
in place of interface addresses.  I don't know if my fix is ok, but I don't
believe the existing code is currently correct.

As to Andrew's comments, above,
I agree with the decision to cherry pick to 2001 to avoid api changes,
although my understanding is that it was agreed that changes would go to
master before being cherry picked.  That was why my original suggestion was
for 2 dependent changes, one for 2001, and one based off of that to correct
the master branch.  A consumer shouldn't have to "bunny-hop" through each
version to get to the latest branch.

With respect to https://gerrit.fd.io/r/c/vpp/+/26879,  I disagree with
Ole's -2.  If you move from a 19.x branch, it is transparent.  But
moreover, the feedback to just change the handler is the issue for me.  I
understand that the implemented flag day process is heavy, but it has been
specified that the version is the semantic version, and therefore changing
the handler requires bumping the semver, which triggers the crc.  I know
the api is heavily modelled on protobuf, but if the will is not present in
the community, maybe we shouldn't use semver for the version number and
just tie the version to the git release tag.  Such a decision could of
course, be reverted in the future if the process ever becomes lighter, or
whatever may be.

Let's consider the api changing process for a moment.  There were a
number of lisp api changes made recently.
https://gerrit.fd.io/r/c/vpp/+/24663
https://gerrit.fd.io/r/c/vpp/+/26932
https://gerrit.fd.io/r/c/vpp/+/27044

The first was Jakub's.  It was substantial, changed the crc and merged.
The 2nd was mine, a trivial cleanup that as I understand it, should change
the crc and was not merged.
The last was Onong's, a trivial cleanup that did not change the crc and was
merged.

If api changing changes need to meet some threshold, we should chain them
until they meet the threshold and can be merged.  I don't think they should
be squashed to reduce the impact in case something individually needs to be
reverted.  That implies that the version bump would then be tied to the
last commit in the chain.

Thoughts?

Paul

On Fri, May 15, 2020 at 11:47 AM Christian Hopps  wrote:

> If I read you correctly, this bug went unnoticed, but has never seen an
> LTS release. LTS might be the only release people are actually developing
> products with. :)
>
> I also sent some mail about brokenness in the auto-generated endian
> functions. I noticed this while I was investigating a post 19.08 change
> that actually made the u8 declared enums be __packed__ (vs just being uint
> anyway), which was then breaking my endian conversion code.
>
> FWIW These changes w/ breakages that had to be RCAd and fixed were
> frustrating b/c they seemed like "cleanup" going on in the API w/o enough
> support from the API infrastructure to avoid user pain downstream. Why did
> we need any of these cleanup changes? It would be one thing if the cleanup
> was done at zero cost to the users (i.e., transparently handled by the API
> infrastructure), but that's not the case.
>
> So with that in mind (and understanding I'm only interested in LTS
> releases :) my vote is:
>
> - Back out the u8 change
> - Leave the u32 fix.
>
> More importantly:
>
> 1) Stop making enum u32->u8 changes until this can be handled better by
> the API infrastructure.
> 2) Improve the API infrastructure before making anymore of these enum (or
> any field) size changes.
>
> To accomplish 2) the generated API functions which try to do the endian
> conversion, should be fixed, and more importantly the whole thing needs to
> be made usable by providing non-callback functionality for the synchronous
> calls.
>
> When all that is done then people can actually use the generated API
> function calls, the endian stuff won't be being done by hand then, so the
> size changes will be handled automatically.
>
> Thanks,
> Chris.
>
> > On May 15, 2020, at 10:20 AM, Andrew Yourtchenko 
> wrote:
> >
> > There's a very interesting couple of gerrit changes that just came in
> > today that is worth discussing,
> > and they are a perfect case study to further clarify the process - so
> > I tweaked the subject accordingly..
> > The API message itself is relatively minor, but regardless of what is
> 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-15 Thread Balaji Venkatraman via lists.fd.io
Hi Andrew,



“

An API that is not touched during a "make test" can not be moved

beyond the "in-progress" status. Adding the unit test implies the

commitment from the API contributor to follow the option (5) above

even for "trivial endianness bugs".

“



Absolutely essential! More so, after a change to the API.





Thanks!



--

Regards,

Balaji.





On 5/15/20, 7:20 AM, "vpp-dev@lists.fd.io on behalf of Andrew Yourtchenko" 
 wrote:



There's a very interesting couple of gerrit changes that just came in

today that is worth discussing,

and they are a perfect case study to further clarify the process - so

I tweaked the subject accordingly..

The API message itself is relatively minor, but regardless of what is

agreed that makes a good case study.



Backstory:



Once upon a time on Aug 20 2019, commit 053204ab changed

sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl

conversion

function didn't make it there (enums are u32 by default as far as I can 
see).



This was after the 19.08 branch pull, and it wasn't ever tackled, so

this (buggy) behavior ended being in 20.01, 20.01.1, and in the

current 20.05-rc1.



Fast forward a bit, today I was pinged about the two changes - one for

master, one for stable/2001:



https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a 
u8



https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the

htonl() call and changes the existing ("buggy")

 behavior in the 20.01.2 - thus would silently break any API consumers

that coded against the previous "buggy" behavior.



And then we have a question open about stable/2005, which "by the

letter" can potentially accept only the second approach, since it is

past the API freeze.



Additional bit: this API is not touched in make test, so this bug had

slipped through.



So there are the following options:



1) Merge both patches, and treat the 20.05 similar to 20.01, thus

making a "silent change" in both, but making the master look closer to

a 19.08 format.



2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and

merge the master patch that shrinks the enum down to 1 byte



3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the

endianness of the u32 enum everywhere, but making an effective "silent

change" in 20.01&20.05



4)  merge the patch in master that shrinks the enum down to one byte,

and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of

   "no api changes" but at least this gets to be explicitly visible early

on.



5) under the current proposal, if the API message is defined as

"production" then there would need to be a new "in-progress" message

in master with either of the two fixes, the buggy message would need

to be marked as "deprecated".  And under the current proposal by the

20.09 the "in-progress" message would become "production", the current

message would be shown as "deprecated",  to be deleted in 21.01.



So, the feedback that I would appreciate on the above:



1) the least worst course of action "right now", for this particular

issue. I discussed this with Ole and we have different opinions, so I

would like the actual API users to chime in. Please tell me which is

the least worst option from your point of view :-)



2) What is the best course of action in the future. Note, this is also

the simpler case in that there is a way to trigger a CRC-affecting

change by forcing the enum to be a u8. What would have been the best

course of action if it was simply a missing ntohl() for a field that

noone complained about for 1.5 releases. Can we assume that no-one

used that API and "fix the representation" ?



3) would there be an interest in having a sort of registry of "who

wants to know about things related to each API" - so that if a bug

like this arises that requires a behavior change to fix, I could poll

the affected folks and maybe be able to get away with (1) or (3) from

above ?



And a tweak to the process (and potentially tooling) with regards to

going to "production API status":



An API that is not touched during a "make test" can not be moved

beyond the "in-progress" status. Adding the unit test implies the

commitment from the API contributor to follow the option (5) above

even for "trivial endianness bugs".



Thoughts ?



--a









On 5/14/20, Christian Hopps  wrote:

> API stability is borderline critical for a project like VPP I think. Yes, 
it

> can be used stand-alone, but real value is added by building products on 
top

> of it.

>

> Also important is having an API framework that allows for

> backward-compatible changes to the API for making improvements.

>

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Damjan Marion via lists.fd.io

Knowing that even hard-core big-endian systems like PowerPC decided to switch 
to Little endian and that situation where binary api will be exchanged between 
BE and LE systems is extremely small, maybe we should consider removing endian 
conversion from APIs and simply represent data
in the native form. Looks like this is fertile ground for new bugs…

Thoughts?

— 
Damjan


> On 15 May 2020, at 16:20, Andrew Yourtchenko  wrote:
> 
> There's a very interesting couple of gerrit changes that just came in
> today that is worth discussing,
> and they are a perfect case study to further clarify the process - so
> I tweaked the subject accordingly..
> The API message itself is relatively minor, but regardless of what is
> agreed that makes a good case study.
> 
> Backstory:
> 
> Once upon a time on Aug 20 2019, commit 053204ab changed
> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
> conversion
> function didn't make it there (enums are u32 by default as far as I can see).
> 
> This was after the 19.08 branch pull, and it wasn't ever tackled, so
> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
> current 20.05-rc1.
> 
> Fast forward a bit, today I was pinged about the two changes - one for
> master, one for stable/2001:
> 
> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a u8
> 
> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
> htonl() call and changes the existing ("buggy")
> behavior in the 20.01.2 - thus would silently break any API consumers
> that coded against the previous "buggy" behavior.
> 
> And then we have a question open about stable/2005, which "by the
> letter" can potentially accept only the second approach, since it is
> past the API freeze.
> 
> Additional bit: this API is not touched in make test, so this bug had
> slipped through.
> 
> So there are the following options:
> 
> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
> making a "silent change" in both, but making the master look closer to
> a 19.08 format.
> 
> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
> merge the master patch that shrinks the enum down to 1 byte
> 
> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
> endianness of the u32 enum everywhere, but making an effective "silent
> change" in 20.01&20.05
> 
> 4)  merge the patch in master that shrinks the enum down to one byte,
> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
> "no api changes" but at least this gets to be explicitly visible early
> on.
> 
> 5) under the current proposal, if the API message is defined as
> "production" then there would need to be a new "in-progress" message
> in master with either of the two fixes, the buggy message would need
> to be marked as "deprecated".  And under the current proposal by the
> 20.09 the "in-progress" message would become "production", the current
> message would be shown as "deprecated",  to be deleted in 21.01.
> 
> So, the feedback that I would appreciate on the above:
> 
> 1) the least worst course of action "right now", for this particular
> issue. I discussed this with Ole and we have different opinions, so I
> would like the actual API users to chime in. Please tell me which is
> the least worst option from your point of view :-)
> 
> 2) What is the best course of action in the future. Note, this is also
> the simpler case in that there is a way to trigger a CRC-affecting
> change by forcing the enum to be a u8. What would have been the best
> course of action if it was simply a missing ntohl() for a field that
> noone complained about for 1.5 releases. Can we assume that no-one
> used that API and "fix the representation" ?
> 
> 3) would there be an interest in having a sort of registry of "who
> wants to know about things related to each API" - so that if a bug
> like this arises that requires a behavior change to fix, I could poll
> the affected folks and maybe be able to get away with (1) or (3) from
> above ?
> 
> And a tweak to the process (and potentially tooling) with regards to
> going to "production API status":
> 
> An API that is not touched during a "make test" can not be moved
> beyond the "in-progress" status. Adding the unit test implies the
> commitment from the API contributor to follow the option (5) above
> even for "trivial endianness bugs".
> 
> Thoughts ?
> 
> --a
> 
> 
> 
> 
> On 5/14/20, Christian Hopps  wrote:
>> API stability is borderline critical for a project like VPP I think. Yes, it
>> can be used stand-alone, but real value is added by building products on top
>> of it.
>> 
>> Also important is having an API framework that allows for
>> backward-compatible changes to the API for making improvements.
>> 
>> I'm not sure if VPP has the latter, but if there's too much pain with your
>> proposed rules I think the latter should addressed rather than sacrificing
>> the former. So I support your rules.
>> 
>> Thanks,
>> Chris.
>> 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Andrew Yourtchenko
I was hesitating to write exactly that - so at least I am not alone in thinking 
this... :-)

With my RelMgr hat on I would be seeing this as to have a -2’d  patch in gerrit 
whose existence  I can announce in 20.05 release notes, which would apply to 
both stable/2005 branch and to master branch, with an optional add-on patch 
that was master specific - so that the change can be concentrated. Probably 
even a separate make verify job(s) that would cherrypick this on top of 
whatever patch is being verified - such that we have some assurance that things 
don’t break.

This patch would add a single boolean “api is big endian”, and create 
api_htonX/api_ntohX versions of the endianness functions which will be no-ops 
if that boolean is false.

This patch would also add a new api call “is_vpp_api_is_big_endian”, which 
would return a the value of the above Boolean.

Its presence means that the whatever abstraction layer is there needs to query 
it and to if it returns true, to  make htonX()/ntohX calls in the APIs to be a 
a conditional no-op in the same fashion as in VPP.

This will make for a backwards compatible change that can even be even 
backported to earlier branches if needs to... 

Sometime in July this could be going into master.

Then that change goes into 2009, with the default still being big-endian, as 
well as is committed to 2005 (it’s an api addition that doesn’t break anything 
else by default, and by then the folks must have seen it on master).

Then somewhere before 2101, say in November, we flip the default.

This gives two releases to iron out the kinks, and 2109 LTS ships with the 
default little endian API.

2201 then removes the wrappers and makes is_vpp_api_is_big_endian always return 
0.

Granted, it’s a long roadmap, but it’s not exactly a simple thing that we’d be 
doing and doing so in a graceful fashion will allow to avoid annoying the 
downstream users..

Also of course this implies that we MUST start having a longer term road map 
for strategic stuff like this, that should be published and vetted by the 
community.

Thoughts ?

--a

> On 16 May 2020, at 13:22, Damjan Marion  wrote:
> 
> 
> Knowing that even hard-core big-endian systems like PowerPC decided to switch 
> to Little endian and that situation where binary api will be exchanged 
> between BE and LE systems is extremely small, maybe we should consider 
> removing endian conversion from APIs and simply represent data
> in the native form. Looks like this is fertile ground for new bugs…
> 
> Thoughts?
> 
> — 
> Damjan
> 
> 
>> On 15 May 2020, at 16:20, Andrew Yourtchenko  wrote:
>> 
>> There's a very interesting couple of gerrit changes that just came in
>> today that is worth discussing,
>> and they are a perfect case study to further clarify the process - so
>> I tweaked the subject accordingly..
>> The API message itself is relatively minor, but regardless of what is
>> agreed that makes a good case study.
>> 
>> Backstory:
>> 
>> Once upon a time on Aug 20 2019, commit 053204ab changed
>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>> conversion
>> function didn't make it there (enums are u32 by default as far as I can see).
>> 
>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>> current 20.05-rc1.
>> 
>> Fast forward a bit, today I was pinged about the two changes - one for
>> master, one for stable/2001:
>> 
>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a u8
>> 
>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>> htonl() call and changes the existing ("buggy")
>> behavior in the 20.01.2 - thus would silently break any API consumers
>> that coded against the previous "buggy" behavior.
>> 
>> And then we have a question open about stable/2005, which "by the
>> letter" can potentially accept only the second approach, since it is
>> past the API freeze.
>> 
>> Additional bit: this API is not touched in make test, so this bug had
>> slipped through.
>> 
>> So there are the following options:
>> 
>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>> making a "silent change" in both, but making the master look closer to
>> a 19.08 format.
>> 
>> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
>> merge the master patch that shrinks the enum down to 1 byte
>> 
>> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
>> endianness of the u32 enum everywhere, but making an effective "silent
>> change" in 20.01&20.05
>> 
>> 4)  merge the patch in master that shrinks the enum down to one byte,
>> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
>> "no api changes" but at least this gets to be explicitly visible early
>> on.
>> 
>> 5) under the current proposal, if the API message is defined as
>> "production" then there would need to be a new "in-progress" message
>> in master with either of the t

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Christian Hopps
As a user of these APIs I would hugely appreciate work be done on the API 
framework to make it more usable (don't require callbacks) fix the automatic 
endian conversion it already generates, improve support for argument additions 
(these are easy to support in a backward compatible way), and maybe arg size 
changes, into non-issues, rather than punting on this particular endian aspect 
and thinking everything is all-good.

Thanks,
Chris.

> On May 16, 2020, at 7:49 AM, Andrew Yourtchenko  wrote:
> 
> I was hesitating to write exactly that - so at least I am not alone in 
> thinking this... :-)
> 
> With my RelMgr hat on I would be seeing this as to have a -2’d  patch in 
> gerrit whose existence  I can announce in 20.05 release notes, which would 
> apply to both stable/2005 branch and to master branch, with an optional 
> add-on patch that was master specific - so that the change can be 
> concentrated. Probably even a separate make verify job(s) that would 
> cherrypick this on top of whatever patch is being verified - such that we 
> have some assurance that things don’t break.
> 
> This patch would add a single boolean “api is big endian”, and create 
> api_htonX/api_ntohX versions of the endianness functions which will be no-ops 
> if that boolean is false.
> 
> This patch would also add a new api call “is_vpp_api_is_big_endian”, which 
> would return a the value of the above Boolean.
> 
> Its presence means that the whatever abstraction layer is there needs to 
> query it and to if it returns true, to  make htonX()/ntohX calls in the APIs 
> to be a a conditional no-op in the same fashion as in VPP.
> 
> This will make for a backwards compatible change that can even be even 
> backported to earlier branches if needs to... 
> 
> Sometime in July this could be going into master.
> 
> Then that change goes into 2009, with the default still being big-endian, as 
> well as is committed to 2005 (it’s an api addition that doesn’t break 
> anything else by default, and by then the folks must have seen it on master).
> 
> Then somewhere before 2101, say in November, we flip the default.
> 
> This gives two releases to iron out the kinks, and 2109 LTS ships with the 
> default little endian API.
> 
> 2201 then removes the wrappers and makes is_vpp_api_is_big_endian always 
> return 0.
> 
> Granted, it’s a long roadmap, but it’s not exactly a simple thing that we’d 
> be doing and doing so in a graceful fashion will allow to avoid annoying the 
> downstream users..
> 
> Also of course this implies that we MUST start having a longer term road map 
> for strategic stuff like this, that should be published and vetted by the 
> community.
> 
> Thoughts ?
> 
> --a
> 
>> On 16 May 2020, at 13:22, Damjan Marion  wrote:
>> 
>> 
>> Knowing that even hard-core big-endian systems like PowerPC decided to 
>> switch to Little endian and that situation where binary api will be 
>> exchanged between BE and LE systems is extremely small, maybe we should 
>> consider removing endian conversion from APIs and simply represent data
>> in the native form. Looks like this is fertile ground for new bugs…
>> 
>> Thoughts?
>> 
>> — 
>> Damjan
>> 
>> 
>>> On 15 May 2020, at 16:20, Andrew Yourtchenko  wrote:
>>> 
>>> There's a very interesting couple of gerrit changes that just came in
>>> today that is worth discussing,
>>> and they are a perfect case study to further clarify the process - so
>>> I tweaked the subject accordingly..
>>> The API message itself is relatively minor, but regardless of what is
>>> agreed that makes a good case study.
>>> 
>>> Backstory:
>>> 
>>> Once upon a time on Aug 20 2019, commit 053204ab changed
>>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>>> conversion
>>> function didn't make it there (enums are u32 by default as far as I can 
>>> see).
>>> 
>>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>>> current 20.05-rc1.
>>> 
>>> Fast forward a bit, today I was pinged about the two changes - one for
>>> master, one for stable/2001:
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a 
>>> u8
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>>> htonl() call and changes the existing ("buggy")
>>> behavior in the 20.01.2 - thus would silently break any API consumers
>>> that coded against the previous "buggy" behavior.
>>> 
>>> And then we have a question open about stable/2005, which "by the
>>> letter" can potentially accept only the second approach, since it is
>>> past the API freeze.
>>> 
>>> Additional bit: this API is not touched in make test, so this bug had
>>> slipped through.
>>> 
>>> So there are the following options:
>>> 
>>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>>> making a "silent change" in both, but making the master look closer to
>>> a 19.08 format.
>>> 
>>> 2) Leave the 20.05 and 20.01 alone w

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Dave Barach via lists.fd.io
API messages in network byte order made sense 10 years ago when I worked with a 
mixed x86_64 / ppc32 system. As Damjan points out, API interoperability between 
big-endian and little-endian systems is a boutique use-case these days.

Timing is key. We won’t be able to cherry-pick API message handler fixes across 
an endian-order flag-day. If we decide to switch to native byte order, we’d 
better switch right before we pull our next LTS release.

FWIW... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
lists.fd.io
Sent: Saturday, May 16, 2020 7:23 AM
To: Andrew Yourtchenko 
Cc: Christian Hopps ; Ole Troan ; 
vpp-dev ; Jerome Tollet (jtollet) 
Subject: Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] 
Proposal for VPP binary API stability


Knowing that even hard-core big-endian systems like PowerPC decided to switch 
to Little endian and that situation where binary api will be exchanged between 
BE and LE systems is extremely small, maybe we should consider removing endian 
conversion from APIs and simply represent data
in the native form. Looks like this is fertile ground for new bugs…

Thoughts?

—
Damjan


> On 15 May 2020, at 16:20, Andrew Yourtchenko 
> mailto:ayour...@gmail.com>> wrote:
>
> There's a very interesting couple of gerrit changes that just came in
> today that is worth discussing,
> and they are a perfect case study to further clarify the process - so
> I tweaked the subject accordingly..
> The API message itself is relatively minor, but regardless of what is
> agreed that makes a good case study.
>
> Backstory:
>
> Once upon a time on Aug 20 2019, commit 053204ab changed
> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
> conversion
> function didn't make it there (enums are u32 by default as far as I can see).
>
> This was after the 19.08 branch pull, and it wasn't ever tackled, so
> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
> current 20.05-rc1.
>
> Fast forward a bit, today I was pinged about the two changes - one for
> master, one for stable/2001:
>
> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a u8
>
> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
> htonl() call and changes the existing ("buggy")
> behavior in the 20.01.2 - thus would silently break any API consumers
> that coded against the previous "buggy" behavior.
>
> And then we have a question open about stable/2005, which "by the
> letter" can potentially accept only the second approach, since it is
> past the API freeze.
>
> Additional bit: this API is not touched in make test, so this bug had
> slipped through.
>
> So there are the following options:
>
> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
> making a "silent change" in both, but making the master look closer to
> a 19.08 format.
>
> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
> merge the master patch that shrinks the enum down to 1 byte
>
> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
> endianness of the u32 enum everywhere, but making an effective "silent
> change" in 20.01&20.05
>
> 4)  merge the patch in master that shrinks the enum down to one byte,
> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
> "no api changes" but at least this gets to be explicitly visible early
> on.
>
> 5) under the current proposal, if the API message is defined as
> "production" then there would need to be a new "in-progress" message
> in master with either of the two fixes, the buggy message would need
> to be marked as "deprecated".  And under the current proposal by the
> 20.09 the "in-progress" message would become "production", the current
> message would be shown as "deprecated",  to be deleted in 21.01.
>
> So, the feedback that I would appreciate on the above:
>
> 1) the least worst course of action "right now", for this particular
> issue. I discussed this with Ole and we have different opinions, so I
> would like the actual API users to chime in. Please tell me which is
> the least worst option from your point of view :-)
>
> 2) What is the best course of action in the future. Note, this is also
> the simpler case in that there is a way to trigger a CRC-affecting
> change by forcing the enum to be a u8. What would have been the best
> course of action if it was simply a missing ntohl() for a field that
> noone complained about for 1.5 releases. Can we assume that no-one
> used that API and "fix the representation" ?
>
> 3) would there be an interest in having a sort of

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Andrew Yourtchenko
On 5/16/20, Dave Barach (dbarach)  wrote:
> API messages in network byte order made sense 10 years ago when I worked
> with a mixed x86_64 / ppc32 system. As Damjan points out, API
> interoperability between big-endian and little-endian systems is a boutique
> use-case these days.
>
> Timing is key. We won’t be able to cherry-pick API message handler fixes
> across an endian-order flag-day. If we decide to switch to native byte
> order, we’d better switch right before we pull our next LTS release.

Do I understand it right that the idea here would be to flip the
endianness right before pulling stable/2009 branch ?

--a


>
> FWIW... Dave
>
> From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion
> via lists.fd.io
> Sent: Saturday, May 16, 2020 7:23 AM
> To: Andrew Yourtchenko 
> Cc: Christian Hopps ; Ole Troan ;
> vpp-dev ; Jerome Tollet (jtollet) 
> Subject: Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev]
> Proposal for VPP binary API stability
>
>
> Knowing that even hard-core big-endian systems like PowerPC decided to
> switch to Little endian and that situation where binary api will be
> exchanged between BE and LE systems is extremely small, maybe we should
> consider removing endian conversion from APIs and simply represent data
> in the native form. Looks like this is fertile ground for new bugs…
>
> Thoughts?
>
> —
> Damjan
>
>
>> On 15 May 2020, at 16:20, Andrew Yourtchenko
>> mailto:ayour...@gmail.com>> wrote:
>>
>> There's a very interesting couple of gerrit changes that just came in
>> today that is worth discussing,
>> and they are a perfect case study to further clarify the process - so
>> I tweaked the subject accordingly..
>> The API message itself is relatively minor, but regardless of what is
>> agreed that makes a good case study.
>>
>> Backstory:
>>
>> Once upon a time on Aug 20 2019, commit 053204ab changed
>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>> conversion
>> function didn't make it there (enums are u32 by default as far as I can
>> see).
>>
>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>> current 20.05-rc1.
>>
>> Fast forward a bit, today I was pinged about the two changes - one for
>> master, one for stable/2001:
>>
>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a
>> u8
>>
>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>> htonl() call and changes the existing ("buggy")
>> behavior in the 20.01.2 - thus would silently break any API consumers
>> that coded against the previous "buggy" behavior.
>>
>> And then we have a question open about stable/2005, which "by the
>> letter" can potentially accept only the second approach, since it is
>> past the API freeze.
>>
>> Additional bit: this API is not touched in make test, so this bug had
>> slipped through.
>>
>> So there are the following options:
>>
>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>> making a "silent change" in both, but making the master look closer to
>> a 19.08 format.
>>
>> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
>> merge the master patch that shrinks the enum down to 1 byte
>>
>> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
>> endianness of the u32 enum everywhere, but making an effective "silent
>> change" in 20.01&20.05
>>
>> 4)  merge the patch in master that shrinks the enum down to one byte,
>> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
>> "no api changes" but at least this gets to be explicitly visible early
>> on.
>>
>> 5) under the current proposal, if the API message is defined as
>> "production" then there would need to be a new "in-progress" message
>> in master with either of the two fixes, the buggy message would need
>> to be marked as "deprecated".  And under the current proposal by the
>> 20.09 the "in-progress" message would become "production", the current
>> message would be shown as "deprecated",  to be deleted in 21.01.
>>
>> So, the feedback that I would appreciate on the above:
>>
>> 1) the least worst course of action "right now", for this particular
>> issue. I discussed this with Ole and we have different opinions, so I
>> would like the 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Andrew Yourtchenko
Very valid point. Endianness is only aspect. Accidental size changes on the 
wire is different.

But this drifted very far astray from my original question: if we keep 
shuffling the bits on wire regardless, how much of a value does a meticulous 
CRC tracking process provides ? i see it as actively harmful in that case, 
since it creates false expectations.

--a

> On 16 May 2020, at 14:02, Christian Hopps  wrote:
> 
> As a user of these APIs I would hugely appreciate work be done on the API 
> framework to make it more usable (don't require callbacks) fix the automatic 
> endian conversion it already generates, improve support for argument 
> additions (these are easy to support in a backward compatible way), and maybe 
> arg size changes, into non-issues, rather than punting on this particular 
> endian aspect and thinking everything is all-good.
> 
> Thanks,
> Chris.
> 
>> On May 16, 2020, at 7:49 AM, Andrew Yourtchenko  wrote:
>> 
>> I was hesitating to write exactly that - so at least I am not alone in 
>> thinking this... :-)
>> 
>> With my RelMgr hat on I would be seeing this as to have a -2’d  patch in 
>> gerrit whose existence  I can announce in 20.05 release notes, which would 
>> apply to both stable/2005 branch and to master branch, with an optional 
>> add-on patch that was master specific - so that the change can be 
>> concentrated. Probably even a separate make verify job(s) that would 
>> cherrypick this on top of whatever patch is being verified - such that we 
>> have some assurance that things don’t break.
>> 
>> This patch would add a single boolean “api is big endian”, and create 
>> api_htonX/api_ntohX versions of the endianness functions which will be 
>> no-ops if that boolean is false.
>> 
>> This patch would also add a new api call “is_vpp_api_is_big_endian”, which 
>> would return a the value of the above Boolean.
>> 
>> Its presence means that the whatever abstraction layer is there needs to 
>> query it and to if it returns true, to  make htonX()/ntohX calls in the APIs 
>> to be a a conditional no-op in the same fashion as in VPP.
>> 
>> This will make for a backwards compatible change that can even be even 
>> backported to earlier branches if needs to... 
>> 
>> Sometime in July this could be going into master.
>> 
>> Then that change goes into 2009, with the default still being big-endian, as 
>> well as is committed to 2005 (it’s an api addition that doesn’t break 
>> anything else by default, and by then the folks must have seen it on master).
>> 
>> Then somewhere before 2101, say in November, we flip the default.
>> 
>> This gives two releases to iron out the kinks, and 2109 LTS ships with the 
>> default little endian API.
>> 
>> 2201 then removes the wrappers and makes is_vpp_api_is_big_endian always 
>> return 0.
>> 
>> Granted, it’s a long roadmap, but it’s not exactly a simple thing that we’d 
>> be doing and doing so in a graceful fashion will allow to avoid annoying the 
>> downstream users..
>> 
>> Also of course this implies that we MUST start having a longer term road map 
>> for strategic stuff like this, that should be published and vetted by the 
>> community.
>> 
>> Thoughts ?
>> 
>> --a
>> 
 On 16 May 2020, at 13:22, Damjan Marion  wrote:
>>> 
>>> 
>>> Knowing that even hard-core big-endian systems like PowerPC decided to 
>>> switch to Little endian and that situation where binary api will be 
>>> exchanged between BE and LE systems is extremely small, maybe we should 
>>> consider removing endian conversion from APIs and simply represent data
>>> in the native form. Looks like this is fertile ground for new bugs…
>>> 
>>> Thoughts?
>>> 
>>> — 
>>> Damjan
>>> 
>>> 
 On 15 May 2020, at 16:20, Andrew Yourtchenko  wrote:
 
 There's a very interesting couple of gerrit changes that just came in
 today that is worth discussing,
 and they are a perfect case study to further clarify the process - so
 I tweaked the subject accordingly..
 The API message itself is relatively minor, but regardless of what is
 agreed that makes a good case study.
 
 Backstory:
 
 Once upon a time on Aug 20 2019, commit 053204ab changed
 sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
 conversion
 function didn't make it there (enums are u32 by default as far as I can 
 see).
 
 This was after the 19.08 branch pull, and it wasn't ever tackled, so
 this (buggy) behavior ended being in 20.01, 20.01.1, and in the
 current 20.05-rc1.
 
 Fast forward a bit, today I was pinged about the two changes - one for
 master, one for stable/2001:
 
 https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a 
 u8
 
 https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
 htonl() call and changes the existing ("buggy")
 behavior in the 20.01.2 - thus would silently break any API consumers
 that coded against the previous "buggy" b

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Christian Hopps


> On May 16, 2020, at 9:52 AM, Andrew 👽 Yourtchenko  wrote:
> 
> On 5/16/20, Dave Barach (dbarach)  wrote:
>> API messages in network byte order made sense 10 years ago when I worked
>> with a mixed x86_64 / ppc32 system. As Damjan points out, API
>> interoperability between big-endian and little-endian systems is a boutique
>> use-case these days.
>> 
>> Timing is key. We won’t be able to cherry-pick API message handler fixes
>> across an endian-order flag-day. If we decide to switch to native byte
>> order, we’d better switch right before we pull our next LTS release.
> 
> Do I understand it right that the idea here would be to flip the
> endianness right before pulling stable/2009 branch ?

If this is done right before (or near) the branch pull for 2009 then I think 
the release candidate should be left to bake longer than normal, and API users 
encouraged to port/test to the release candidate. Obviously any bugs found 
would then be fixed on the RC. Maybe ask for a couple/few stakeholders to sign 
up to do this and then wait a reasonable amount of time for them to sign-off?

I know we use the binary APIs, I believe Netgate does as well. I'm sure there 
are others too (might be good to collect a list of these folks if one doesn't 
exist yet).

If these changes were coupled with the (more important IMO) change i've been 
(ceaselessly :) asking for in the API (non-callback sync functions) I would be 
more than willing to sign up for this.

Thanks,
Chris.

> 
> --a
> 
> 
>> 
>> FWIW... Dave
>> 
>> From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion
>> via lists.fd.io
>> Sent: Saturday, May 16, 2020 7:23 AM
>> To: Andrew Yourtchenko 
>> Cc: Christian Hopps ; Ole Troan ;
>> vpp-dev ; Jerome Tollet (jtollet) 
>> Subject: Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev]
>> Proposal for VPP binary API stability
>> 
>> 
>> Knowing that even hard-core big-endian systems like PowerPC decided to
>> switch to Little endian and that situation where binary api will be
>> exchanged between BE and LE systems is extremely small, maybe we should
>> consider removing endian conversion from APIs and simply represent data
>> in the native form. Looks like this is fertile ground for new bugs…
>> 
>> Thoughts?
>> 
>> —
>> Damjan
>> 
>> 
>>> On 15 May 2020, at 16:20, Andrew Yourtchenko
>>> mailto:ayour...@gmail.com>> wrote:
>>> 
>>> There's a very interesting couple of gerrit changes that just came in
>>> today that is worth discussing,
>>> and they are a perfect case study to further clarify the process - so
>>> I tweaked the subject accordingly..
>>> The API message itself is relatively minor, but regardless of what is
>>> agreed that makes a good case study.
>>> 
>>> Backstory:
>>> 
>>> Once upon a time on Aug 20 2019, commit 053204ab changed
>>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>>> conversion
>>> function didn't make it there (enums are u32 by default as far as I can
>>> see).
>>> 
>>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>>> current 20.05-rc1.
>>> 
>>> Fast forward a bit, today I was pinged about the two changes - one for
>>> master, one for stable/2001:
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a
>>> u8
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>>> htonl() call and changes the existing ("buggy")
>>> behavior in the 20.01.2 - thus would silently break any API consumers
>>> that coded against the previous "buggy" behavior.
>>> 
>>> And then we have a question open about stable/2005, which "by the
>>> letter" can potentially accept only the second approach, since it is
>>> past the API freeze.
>>> 
>>> Additional bit: this API is not touched in make test, so this bug had
>>> slipped through.
>>> 
>>> So there are the following options:
>>> 
>>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>>> making a "silent change" in both, but making the master look closer to
>>> a 19.08 format.
>>> 
>>> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
>>> merge the master patch that shrinks the enum down to 1 byte
>>> 
>>> 3) Merge t

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Ole Troan
>>> API messages in network byte order made sense 10 years ago when I worked
>>> with a mixed x86_64 / ppc32 system. As Damjan points out, API
>>> interoperability between big-endian and little-endian systems is a boutique
>>> use-case these days.
>>> 
>>> Timing is key. We won’t be able to cherry-pick API message handler fixes
>>> across an endian-order flag-day. If we decide to switch to native byte
>>> order, we’d better switch right before we pull our next LTS release.
>> 
>> Do I understand it right that the idea here would be to flip the
>> endianness right before pulling stable/2009 branch ?
> 
> If this is done right before (or near) the branch pull for 2009 then I think 
> the release candidate should be left to bake longer than normal, and API 
> users encouraged to port/test to the release candidate. Obviously any bugs 
> found would then be fixed on the RC. Maybe ask for a couple/few stakeholders 
> to sign up to do this and then wait a reasonable amount of time for them to 
> sign-off?
> 
> I know we use the binary APIs, I believe Netgate does as well. I'm sure there 
> are others too (might be good to collect a list of these folks if one doesn't 
> exist yet).
> 
> If these changes were coupled with the (more important IMO) change i've been 
> (ceaselessly :) asking for in the API (non-callback sync functions) I would 
> be more than willing to sign up for this.

I am a little concerned about the benefits of switching the endian 
representation in the wire format.
The endianness and representation on the wire should not be exposed to the 
client-side or VPP API handler code.
The language bindings I am familiar with hide endian issues from the user code. 
Python and Go. I think VAPI does too.

VPP API action handlers do currently need to deal with endianness. That should 
be a relatively simple change.
I can add a new flag in the message definition in the API, and then wrap those 
action handlers with endian functions. Then we can avoid having VPP developers 
manually do endian conversions for API functions.

The current on-the-wire representation is that of packed C structures, and we 
are somewhat a victim of that. E.g. how the compiler puts various members of a 
union in memory is as far as I know not part of the C standard.
I would suggests we move towards isolating user-side and VPP-side code from 
endianess and encoding (and we can do that evolutionary) and then consider if 
we can move towards a better specified encoding format. If that's CBOR, 
Flatbuffers or something else.

Christian for your VAPI C callback issue, that I believe is purely under the 
control of the language binding.
Can I volunteer you ti fix that?

Best regards,
Ole-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16435): https://lists.fd.io/g/vpp-dev/message/16435
Mute This Topic: https://lists.fd.io/mt/74228149/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-17 Thread Jon Loeliger via lists.fd.io
On Sat, May 16, 2020 at 10:02 AM Christian Hopps  wrote:

>
> I know we use the binary APIs, I believe Netgate does as well. I'm sure
> there are others too (might be good to collect a list of these folks if one
> doesn't exist yet).
>

Indeed, Netgate uses the binary APIs extensively.

jdl
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16437): https://lists.fd.io/g/vpp-dev/message/16437
Mute This Topic: https://lists.fd.io/mt/74228149/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-18 Thread Ole Troan
> API messages in network byte order made sense 10 years ago when I worked with 
> a mixed x86_64 / ppc32 system. As Damjan points out, API interoperability 
> between big-endian and little-endian systems is a boutique use-case these 
> days.
>  
> Timing is key. We won’t be able to cherry-pick API message handler fixes 
> across an endian-order flag-day. If we decide to switch to native byte order, 
> we’d better switch right before we pull our next LTS release.

Here is a draft patch that moves endian conversion into the API infrastructure:
https://gerrit.fd.io/r/c/vpp/+/27119

If we first move all the API handlers to native endian, and leave it to the API 
infrastructure to do endian conversion, we can even allow for clients to 
negotiate what endianness they want.
That move we can do without affecting the API clients as the on-the-wire 
encoding will be the same.

Regarding that patch; I might want to hide the endian conversion inside of 
vl_api_send_msg() instead of in the REPLY macro(s).
There are some mess to tidy up, like client_index is used native endian for 
shared memory clients, but not for socket clients.

Cheers,
Ole

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16438): https://lists.fd.io/g/vpp-dev/message/16438
Mute This Topic: https://lists.fd.io/mt/74228149/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-18 Thread Andrew Yourtchenko
FYI - I have added an editorial blurb into the draft of the release notes for 
20.05, based on the discussions in this thread:

https://gerrit.fd.io/r/c/vpp/+/27128/

Please feel free to review.

--a

> On 17 May 2020, at 20:13, Jon Loeliger  wrote:
> 
> 
>> On Sat, May 16, 2020 at 10:02 AM Christian Hopps  wrote:
> 
>> 
>> I know we use the binary APIs, I believe Netgate does as well. I'm sure 
>> there are others too (might be good to collect a list of these folks if one 
>> doesn't exist yet).
> 
> Indeed, Netgate uses the binary APIs extensively.
> 
> jdl
> 
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16440): https://lists.fd.io/g/vpp-dev/message/16440
Mute This Topic: https://lists.fd.io/mt/74228149/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-