Hi Rob, all,


Thanks for the updated document.

The new version is definitely an improvement. Thanks for the work.

Please find below some comments.





1)  Critical error (§3)



IMHO, the term "critical error" mixes technical/protocol considerations
(e.g. "I can't read the update") with requirements considerations (e.g. "the
BGP session state is too degraded and I prefer shutting it down rather than
running in a degraded mode"). I find this unfortunate, as it does not help the
discussion. I'd much prefer that we distinguish the two by defining technical
levels of errors, and then defining the requirements for each level plus the
consequences/drawbacks of the decision (whether to keep or shut the session).

From the protocol standpoint, I would propose the following levels of error,
based on the protocol encoding layers: session, update, attribute.

- attribute level error: semantic or syntax error in the attribute value or 
attribute flags

- session level error: error in the update length / marker, i.e. if, after
skipping the update length, I can't find the marker of the next BGP message.

- update level error: any other error in the update message



We can further distinguish if the NLRIs can be parsed or not.
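
To make the three levels concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions, not a proposal for the draft:
the names ErrorLevel, framing_ok and classify are hypothetical, and the two
boolean inputs of classify stand in for the result of an UPDATE parser that is
not shown. The header constants (16-octet all-ones marker, 19-octet header,
4096-octet maximum) are those of RFC 4271.

```python
import struct
from enum import Enum

MARKER = b"\xff" * 16  # RFC 4271: 16 octets, all set to one
HEADER_LEN = 19        # marker (16) + length (2) + type (1)
MAX_LEN = 4096         # RFC 4271 maximum message length


class ErrorLevel(Enum):
    NONE = "none"
    ATTRIBUTE = "attribute"
    UPDATE = "update"
    SESSION = "session"


def framing_ok(stream: bytes, offset: int = 0) -> bool:
    """Check that a plausible header sits at offset and that, after skipping
    the announced length, any bytes that follow start with the marker."""
    header = stream[offset:offset + HEADER_LEN]
    if len(header) < HEADER_LEN or header[:16] != MARKER:
        return False
    (length,) = struct.unpack("!H", header[16:18])
    if not HEADER_LEN <= length <= MAX_LEN:
        return False
    following = stream[offset + length:offset + length + 16]
    # An empty or partial tail is accepted as long as it is a marker prefix.
    return MARKER.startswith(following)


def classify(stream: bytes, offset: int,
             attribute_error: bool, other_update_error: bool) -> ErrorLevel:
    """Map an error to one of the three levels (the flags are assumed inputs)."""
    if not framing_ok(stream, offset):
        return ErrorLevel.SESSION      # message boundaries are lost
    if other_update_error:
        return ErrorLevel.UPDATE       # any other error in the update
    if attribute_error:
        return ErrorLevel.ATTRIBUTE    # bad attribute value or flags
    return ErrorLevel.NONE
```

In this sketch an update whose length field does not lead to the next marker
is the only case that ends up as a session level error; everything else keeps
the message boundaries intact.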



2)  Business Requirements

In the current text, I found the requirements a bit too technically oriented. 
I'd rather add business requirements independent of the current solutions. I 
would propose:



In VPN networks, VPNs are supposed to be isolated from each other and from
other services (most notably the Internet). Hence, an error on routes/BGP
messages related to a VPN SHOULD NOT negatively impact other VPNs. Similarly,
an error on routes/BGP messages related to a non-VPN service SHOULD NOT
negatively impact the VPN service.

In Internet networks, ASes are supposed to be Autonomous. Hence an error on
routes/BGP messages originated by an AS SHOULD NOT negatively impact
destinations originated by other ASes.



By "negatively impact", we mean losing reachability for a destination (NLRI),
typically by losing all the paths in the Loc-RIB to that destination (NLRI).
Note that those paths may be learnt through multiple BGP sessions and hence the
requirement spans multiple BGP sessions. The consequence is that if the BGP
error is believed to be limited to a single BGP session (e.g. a session level
error), then in a network with redundancy, the destination is believed to be
still known through another session, and hence the session MAY be shut down
and all paths learned from that session removed. On the contrary, if the BGP
error has a chance of also being met on the redundant paths/sessions, then
the BGP session and the routes learned from that session SHOULD be preserved,
until the negative consequences are considered too important. When evaluating
those consequences, the fact that all redundant paths/sessions may suffer from
the same error and hence will inherit the same decision MUST be considered.
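
The reasoning above can be sketched as a small decision function. This is only
an illustration of the paragraph, with hypothetical names (ErrorContext,
Action, decide); how an implementation would actually estimate whether an
error is local to one session, or when the consequences become "too
important", is left open here and modelled as plain inputs. The case of an
error local to a session without any redundancy is not covered by the text, so
the sketch falls back to preserving the session in that case, which is an
assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SHUT_DOWN_SESSION = "shut down session, remove its paths"
    KEEP_SESSION = "keep session and its routes"


@dataclass(frozen=True)
class ErrorContext:
    likely_local_to_session: bool  # error believed limited to this session
    redundant_sessions: int        # other sessions carrying the destinations
    impact_too_high: bool          # operator judgement on the consequences


def decide(ctx: ErrorContext) -> Action:
    # Error limited to one session and redundancy exists: the destination
    # is believed to be still known elsewhere, so the session MAY be shut.
    if ctx.likely_local_to_session and ctx.redundant_sessions > 0:
        return Action.SHUT_DOWN_SESSION
    # Otherwise the session SHOULD be preserved until the negative
    # consequences are considered too important.
    if ctx.impact_too_high:
        return Action.SHUT_DOWN_SESSION
    return Action.KEEP_SESSION
```

Note that when the error is likely to repeat on the redundant sessions, the
same function is evaluated with the same inputs on each of them, which is why
the last branch matters.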



As an illustration, we typically seek to avoid a PE losing both of its
redundant iBGP sessions with its BGP RRs because of a single BGP error. And by
"a PE" I really mean all PEs experiencing this condition. That could easily be
10s of PEs, even 100s.



3)  Technical requirements

For a session level error, the BGP session is dead, so it needs to be shut
down (shutdown / graceful shutdown / graceful restart). If the update length
is set to the number of octets sent to the peer (or vice versa) rather than
computed based on the content of the update, there is a chance to 1) limit the
number of such session level errors and 2) increase the probability that this
error is local to that session and not likely to happen on a redundant/backup
session. There is probably a limited part of the BGP code which needs to be
hardened to reduce such unrecoverable errors. And if those errors are still
frequent, we may further propose technical solutions (e.g. replacing TCP by
SCTP, which can provide message boundaries, among other things (e.g. some
benefits of multi-sessions)).



For attribute & update level error when the NLRI can be parsed, cf 
draft-error-handling (treat as withdraw).
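
For readers less familiar with "treat as withdraw", here is a minimal sketch of
my understanding of the idea: the NLRI carried in the erroneous update are
handled as if they had been withdrawn on that session, and the session stays
up. The table layout (a per-session dictionary from prefix to path attributes)
and the name treat_as_withdraw are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List

# Hypothetical per-session table of received routes: prefix -> attributes.
AdjRibIn = Dict[str, dict]


def treat_as_withdraw(adj_rib_in: AdjRibIn, nlri: Iterable[str]) -> List[str]:
    """Remove the paths for the NLRI of an erroneous update from this
    session's table and return the prefixes that were actually removed."""
    removed = []
    for prefix in nlri:
        if adj_rib_in.pop(prefix, None) is not None:
            removed.append(prefix)
    return removed
```

Paths to the same destinations learnt through other sessions are untouched,
which is what limits the impact to the NLRI of the faulty update.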



Now let the discussion begin :). For attribute & update level errors when the
NLRI cannot be extracted, IMHO there is room for discussion and analysis of the
consequences.

"since the NLRI cannot be extracted, error handling mechanisms must be applied 
at the per-session level" (§5)

Well, IMO, this is a choice to be made rather than a "must".





If we were to skip a BGP update:

For the Internet, the worst case would probably be to miss a BGP update with a
loop in the AS path, and hence create a loop for me and my upstream ASes for
the NLRI in the missed update. How probable is this? 0 for iBGP sessions. TBE
for eBGP. Then what would be the consequences? Loss of connectivity for the
NLRI until the problem is manually solved by an AS between the origin and me,
and possible forwarding congestion for others. I'm not sure I care too much
about losing reachability to the NLRI in a faulty BGP update: most likely, if
only one BGP update (out of millions) is faulty, the reason is the origin AS
playing with a specific bit or attribute, and if they chose to play with their
update, they should bear the responsibility. This is to be compared with the
probability of losing all redundant paths (if the error is seen on the
redundant paths) and the consequences (PE, possibly all PEs, down).



For VPN, the worst case would probably be to keep a VPN label previously
allocated to VPN 1 and since re-allocated to another VPN (VPN breach, cf.
http://tools.ietf.org/html/draft-uttaro-idr-bgp-persistence-01#section-8).

Again, the pros and cons could be discussed (e.g. a possibly one-way, partial
VPN breach for some time (that basically no one can exploit) vs all VPNs/PEs
being down). IMHO, if we believe such an issue could be corrected in 30-60
minutes, I would probably favor keeping the session up.



From the lively discussions, it looks like opinions may vary depending on the
AS, people and circumstances. E.g. how failure-independent are my redundant
BGP paths? (e.g. do they use different BGP implementations?)

As such, what about defining severity levels for BGP error handling? One may
wish to accept only low severity errors, while others may be willing to accept
high severity errors (including when the NLRI cannot be found). E.g. if the
network has been down for 30 minutes while waiting for the patch, one may want
to be able to restore some service at all costs (it can't possibly be worse).
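
To illustrate what such severity levels could look like, here is a minimal
sketch. The three levels, the mapping from error type to severity, and the
names Severity, severity and keep_session are all hypothetical; the actual
levels and mapping would be for the WG to define. Session level errors are
deliberately excluded, since in that case the session is dead anyway.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def severity(level: str, nlri_parsed: bool) -> Severity:
    """Assumed mapping from (error level, NLRI parseable) to a severity."""
    if level not in ("attribute", "update"):
        raise ValueError("only attribute and update level errors are graded")
    if not nlri_parsed:
        return Severity.HIGH    # NLRI cannot be found
    if level == "attribute":
        return Severity.LOW     # NLRI known, only an attribute is bad
    return Severity.MEDIUM      # other update error, NLRI still known


def keep_session(sev: Severity, max_accepted: Severity) -> bool:
    """The operator keeps the session for errors up to the accepted level."""
    return sev <= max_accepted
```

An operator accepting only low severity errors would configure
max_accepted=Severity.LOW, while one trying to restore some service at all
costs would temporarily raise it to Severity.HIGH.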



Again, IMHO it would be good to discuss the drawbacks depending on the
situation (iBGP, eBGP; hop by hop routed, tunneled ...) in this requirement
document, to make sure that we are all on the same page, that we have
constructive discussions, and that SPs enabling revised error handling are
fully aware of the consequences.



4)  Security consideration

In §7 "security considerations" I would discuss the fact that current BGP error
handling (or a (too) strict one) could be exploited by attackers to create a
remote DoS attack.

Should we also ask for a review by the SIDR WG, since "The purpose of the SIDR
working group is to reduce vulnerabilities in the inter-domain routing system."?



Best regards,

Bruno

>-----Original Message-----

>From: idr-boun...@ietf.org [mailto:idr-boun...@ietf.org] On Behalf Of Rob

>Shakir

>Sent: Thursday, December 27, 2012 7:44 PM

>To: i...@ietf.org

>Subject: [Idr] Fwd: [GROW] I-D Action: draft-ietf-grow-ops-reqs-for-bgp-error-

>handling-06.txt

>

>Hi IDR!

>

>FYI -- please find an updated relating to a new version of draft-ietf-grow-ops-

>reqs-for-bgp-error-handling.

>

>Any comments very welcome (to me or grow@).

>

>Seasons greetings!

>r.

>

>Begin forwarded message:

>

>> From: <rob.sha...@bt.com<mailto:rob.sha...@bt.com>>

>> Subject: Re: [GROW] I-D Action: draft-ietf-grow-ops-reqs-for-bgp-error-

>handling-06.txt

>> Date: 27 December 2012 18:41:50 GMT

>> To: <internet-dra...@ietf.org<mailto:internet-dra...@ietf.org>>, 
>> <i-d-annou...@ietf.org<mailto:i-d-annou...@ietf.org>>

>> Cc: grow@ietf.org<mailto:grow@ietf.org>

>>

>> On 27/12/2012 18:35, 
>> "internet-dra...@ietf.org<mailto:internet-dra...@ietf.org>" 
>> <internet-dra...@ietf.org<mailto:internet-dra...@ietf.org>>

>> wrote:

>>

>>>

>>> A New Internet-Draft is available from the on-line Internet-Drafts

>>> directories.

>>> This draft is a work item of the Global Routing Operations Working Group

>>> of the IETF.

>>>

>>>   Title           : Operational Requirements for Enhanced Error Handling

>>> Behaviour in BGP-4

>>>   Author(s)       : Rob Shakir

>>>   Filename        : draft-ietf-grow-ops-reqs-for-bgp-error-handling-06.txt

>>>   Pages           : 19

>>>   Date            : 2012-12-27

>>

>> Hi GROW!

>>

>> This update is a fairly major re-spin of the BGP Error Handling

>> requirements draft. The technical content should be as per the previous

>> revisions however, following the ietf/RtgDir last call comments, I have

>> made the following changes:

>>

>> * Made the amendments that were discussed and there was no disagreement

>> with from our meeting in Atlanta -- this is essentially renaming the

>> Critical/Semantic error types to Critical/Non-Critical.

>>

>> * Significant de-duplication within the text including merging the

>> operational monitoring/toolset discussions into the error handling

>> sections.

>>

>> * Adoption of rfc2119 language throughout to clarify the requirements.

>>

>> * Removal of some of the discussion around more detailed justifications

>> for why particular decisions were made. I think this was useful through

>> the discussion phase of this draft, but it seems like GROW/IDR have

>> converged on a relatively stable set of requirements, so I have trimmed

>> back some of this discussion.

>>

>> I'd really welcome any further comments on this before we re-submit for

>> publication. To eke these out - Peter/Chris - can you kick off a WGLC for

>> this draft please? :-)

>>

>> Seasons greetings!

>> r.

>>

>> _______________________________________________

>> GROW mailing list

>> GROW@ietf.org<mailto:GROW@ietf.org>

>> https://www.ietf.org/mailman/listinfo/grow

>

>_______________________________________________

>Idr mailing list

>i...@ietf.org<mailto:i...@ietf.org>

>https://www.ietf.org/mailman/listinfo/idr
