[bess] Re: [**EXTERNAL**] Re: Re: Inverse multi-layer OAM

2025-04-05 Thread Shah, Himanshu
Disagree.
We have discussed the motivation (for prioritizing e2e protection over local 
protection) in the draft.
It serves the purpose without having to disable TI-LFA on each node, which is not a 
desirable option.

Thanks,
Himanshu


From: Joel Halpern 
Date: Thursday, March 20, 2025 at 10:41 AM
To: Greg Mirsky , Robert Raszuk 
Cc: Shah, Himanshu , BESS , 
[email protected] 

Subject: [**EXTERNAL**] Re: [bess] Re: Inverse multi-layer OAM

It seems rather counter-intuitive to want to try to repair things end-to-end 
faster than one expects local devices to detect local failures.  The implied 
information race conditions seem an invitation to trouble.

Yours,

Joel
On 3/19/2025 11:14 PM, Greg Mirsky wrote:
Hi Robert,
I wholeheartedly agree that local and e2e OAM are complementary tools in an 
operator's toolbox. Usually, a multi-layer OAM is constructed so that e2e 
provides the network with a safety net. In that manner, local repair of a link 
failure is expected to restore services before the failure is detected on the 
e2e level. As I understand it, the proposal uses a different scheme. According 
to it, e2e network detection is expected to be more aggressive than the 
link-level OAM. To me, that's an unusual arrangement.
As for performance monitoring, although some performance metrics can be 
measured spatially to compose e2e metrics, e2e performance monitoring is easier 
to deploy in many environments.

Regards,
Greg
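Greg's point that some performance metrics can be measured spatially (per segment) and composed into e2e metrics can be illustrated with a short sketch. It assumes one-way delay is additive across segments and that packet loss on different segments is independent; the function names are hypothetical. Jitter is deliberately left out because per-segment jitter does not compose by simple addition, which is one reason direct e2e measurement is often easier to deploy.

```python
from math import prod


def compose_delay_ms(segment_delays_ms):
    """E2E one-way delay as the sum of per-segment one-way delays.

    Assumes the segment measurements together cover the whole path,
    including time spent inside transit nodes.
    """
    return sum(segment_delays_ms)


def compose_loss_ratio(segment_loss_ratios):
    """E2E loss ratio from per-segment loss ratios.

    A packet survives end to end only if it survives every segment;
    this assumes losses on different segments are independent.
    """
    survival = prod(1.0 - p for p in segment_loss_ratios)
    return 1.0 - survival


if __name__ == "__main__":
    # Hypothetical three-segment path.
    print(compose_delay_ms([1.5, 2.0, 0.5]))
    print(round(compose_loss_ratio([0.01, 0.0, 0.02]), 4))
```

If the per-segment measurements only cover links, the composed values miss loss and delay inside transit nodes, which is the gap Robert describes.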

On Wed, Mar 19, 2025 at 11:21 PM Robert Raszuk wrote:
Hi Greg,

I am very much in support of end-to-end path assurance. And by assurance I mean 
not only e2e liveness but also e2e loss, delay, jitter, etc.

The main reason is that link-layer failure detection (even if done on every link 
in the path) does not provide any information about transit through network 
devices. And those can be subject to packet drops, selective packet drops 
(brownouts), and delay and jitter through box fabrics in distributed systems. 
So to me, even if e2e detection is slower than local link detection, it is still 
very much the preferred way to assure end-to-end path quality.

Sure, some of this is done at the application layer, but then it is done mainly 
for statistics and reporting. Doing it at the network layer opens up the 
possibility of choosing a different path (quite likely via a different provider) 
when the original path experiences issues or service degradation that are 
invisible to the endpoints with link-by-link failure detection.

I think at the end of the day those two are not really competing solutions but 
complementary ones. And of course end-to-end makes sense especially in 
deployments where you can have diverse end-to-end paths.

Cheers
Robert
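A minimal sketch of the kind of e2e path assurance Robert describes: summarize probe results per path into loss, mean delay and jitter, then pick the first path that meets the thresholds. The probe record format, the thresholds and the jitter definition (mean absolute difference between consecutive one-way delays, a simple IPDV-style measure rather than the smoothed RFC 3550 estimator) are assumptions for illustration, as are all names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Probe:
    seq: int
    sent_ms: float
    received_ms: Optional[float]  # None if the probe was lost


@dataclass
class PathStats:
    loss_ratio: float
    mean_delay_ms: float
    jitter_ms: float


def path_stats(probes: List[Probe]) -> PathStats:
    """Summarize one path's probes (one-way delay assumes synchronized clocks)."""
    if not probes:
        raise ValueError("no probes sent on this path")
    ordered = sorted(probes, key=lambda p: p.seq)
    delays = [p.received_ms - p.sent_ms for p in ordered if p.received_ms is not None]
    loss_ratio = 1.0 - len(delays) / len(probes)
    mean_delay = mean(delays) if delays else float("inf")
    # Jitter as the mean absolute change between consecutive received delays.
    jitter = mean(abs(b - a) for a, b in zip(delays, delays[1:])) if len(delays) > 1 else 0.0
    return PathStats(loss_ratio, mean_delay, jitter)


def pick_path(stats_by_path: Dict[str, PathStats], max_loss: float,
              max_delay_ms: float, max_jitter_ms: float) -> Optional[str]:
    """Return the first path (in preference order) meeting every threshold, else None."""
    for name, s in stats_by_path.items():
        if (s.loss_ratio <= max_loss and s.mean_delay_ms <= max_delay_ms
                and s.jitter_ms <= max_jitter_ms):
            return name
    return None


if __name__ == "__main__":
    primary = [Probe(0, 0.0, 10.0), Probe(1, 10.0, None),
               Probe(2, 20.0, 32.0), Probe(3, 30.0, 40.0)]
    backup = [Probe(0, 0.0, 15.0), Probe(1, 10.0, 25.0)]
    stats = {"primary": path_stats(primary), "backup": path_stats(backup)}
    print(pick_path(stats, max_loss=0.01, max_delay_ms=20.0, max_jitter_ms=5.0))
```

Here the primary path is rejected on loss even though its delay is lower, which is the kind of degradation that link-by-link detection would not surface to the endpoints.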

On Wed, Mar 19, 2025 at 4:58 AM Greg Mirsky wrote:

Hi Himanshu,

Thank you for the presentation of draft-karboubi-spring-sidlist-optimized-cs-sr 
[datatracker.ietf.org].
 If I understood your response to Ali correctly, the proposed mechanism is 
expected to use more aggressive network failure detection than the link layer. 
If that is correct, I have several questions about the multi-layer OAM:

  *   AFAIK link-layer failures are detected within 10 ms using a connectivity 
check mechanism (CCM of Y.1731 or a single-hop BFD) with a 3.3 ms interval.
  *   If the link failure is detectable within 10 ms, what detection time for 
the path, i.e., E2E connection failure detection, is suggested? What interval 
between test probes will be used in that case?
  *   Furthermore, even if the path converges around the link failure before 
the local protection is deployed, the link failure will be detected, and the 
protection mechanism will be deployed despite the Orchestrator setting up its 
recovery path in the network. If that is correct, local defect detection and 
protection are unnecessary overheads. Would you agree?



Regards,

Greg
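Greg's first two questions come down to detection-time arithmetic. The sketch below assumes BFD-style detection for both layers (transmit interval multiplied by the detect multiplier, with intervals in microseconds as in RFC 5880) and ignores propagation delay; Y.1731 CCM instead declares loss of continuity after 3.5 intervals, so its numbers differ slightly. The function names are hypothetical.

```python
def detection_time_us(tx_interval_us: int, detect_mult: int) -> int:
    """BFD-style detection time: number of missed packets times the interval."""
    return tx_interval_us * detect_mult


def max_e2e_interval_us(link_detect_us: int, e2e_detect_mult: int) -> int:
    """Largest whole-microsecond e2e probe interval whose detection time is
    strictly shorter than the link-level detection time."""
    return (link_detect_us - 1) // e2e_detect_mult


if __name__ == "__main__":
    link = detection_time_us(3_300, 3)  # 3.3 ms interval, 3 misses
    for mult in (2, 3):
        interval = max_e2e_interval_us(link, mult)
        print(f"multiplier {mult}: e2e interval <= {interval} us "
              f"(detects in {detection_time_us(interval, mult)} us vs link {link} us)")
```

Under these assumptions, an e2e session with a multiplier of 3 would need a probe interval just under 3.3 ms across the whole path, before any transmission delay is taken into account.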
___
BESS mailing list -- [email protected]
To unsubscribe send an email to [email protected]





[bess] Re: [**EXTERNAL**] Re: Re: Inverse multi-layer OAM

2025-03-22 Thread Shah, Himanshu
Copying and pasting some text from the earlier mail that got cut short:
--
We recommend that networks that want to take advantage of the eligibility 
mechanism for intent verification, especially for the fault detection scheme, 
keep the e2e fault detection timers more aggressive than the local link fault 
detection timers.
This is a better choice than turning off TI-LFA at each node.
For example: single-hop timers at a 10 ms interval with 3 misses (30 ms 
detection time), and S-BFD at a 5 ms interval, or at 10 ms with 2 misses 
(20 ms detection time). This is just an example.
It's a choice, if one wants e2e protection to take precedence over local 
protection. As I mentioned, this behavior is preferred by the 
transport-centric service providers that we have talked to.
 --

I believe this addresses your comments below.

Do note that we have successfully deployed this solution in running networks 
for multiple years.


Thanks,
Himanshu
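Himanshu's example timers can be checked with the same interval-times-multiplier arithmetic. In this sketch the profile names are made up, and the mail does not state a multiplier for the 5 ms S-BFD option, so the value 3 used for it below is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimerProfile:
    name: str
    tx_interval_ms: int
    detect_mult: int

    @property
    def detection_ms(self) -> int:
        # Detection time = transmit interval x number of consecutive misses.
        return self.tx_interval_ms * self.detect_mult


def e2e_takes_precedence(link: TimerProfile, e2e: TimerProfile) -> bool:
    """True if the e2e session declares failure before the link session
    (ignoring transmission delay along the path)."""
    return e2e.detection_ms < link.detection_ms


LINK = TimerProfile("single-hop BFD, 10 ms x 3", 10, 3)
E2E_OPTIONS = [
    TimerProfile("S-BFD, 10 ms x 2", 10, 2),
    TimerProfile("S-BFD, 5 ms x 3 (multiplier assumed)", 5, 3),
]

if __name__ == "__main__":
    for e2e in E2E_OPTIONS:
        print(e2e.name, e2e.detection_ms, "ms vs", LINK.detection_ms, "ms:",
              e2e_takes_precedence(LINK, e2e))
```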


From: Greg Mirsky 
Date: Thursday, March 20, 2025 at 11:00 AM
To: Shah, Himanshu 
Cc: Joel Halpern , Robert Raszuk , 
BESS , [email protected] 

Subject: Re: [**EXTERNAL**] Re: [bess] Re: Inverse multi-layer OAM
Hi Himanshu,
I agree with Joel that inverting multi-layer OAM is a tricky and untested
proposal. Consider the usual multi-layer OAM arrangement. Link failure
detection is within 10 ms using 3.3 ms intervals. You stressed that e2e
uses more aggressive network failure detection. Would that be based on 1 ms
intervals for multi-hop BFD? AFAIK, in the usual multi-layer OAM, the e2e
network failure detection is based on 100 ms to ensure that the local
protection mechanism can converge without firing e2e recovery. However, in
the case of the inverse multi-layer OAM you presented, it appears that both
recovery mechanisms, i.e., local and e2e, will be deployed. In my opinion,
that is inefficient, confusing, and unnecessary. Am I missing something
here?

Regards,
Greg

On Thu, Mar 20, 2025 at 10:46 AM Shah, Himanshu  wrote:

> Disagree.
>
> We have discussed the motivation (for prioritizing e2e protection over
> local protection) in the draft.
>
> It serves the purpose without having to disable TI-LFA on each node – not
> a desirable option.
>
>
>
> Thanks,
>
> Himanshu


[bess] Re: [**EXTERNAL**] Re: Re: Inverse multi-layer OAM

2025-03-19 Thread Joel Halpern
At the very least, I would expect to see some explanation of why one 
would also be running TI-LFA.  And probably a discussion of how this 
interacts with the information propagation when the local detection 
kicks in.   I can believe both points can be addressed, but it is hard 
to understand without them.


Yours,

Joel


[bess] Re: [**EXTERNAL**] Re: Re: Inverse multi-layer OAM

2025-03-19 Thread Shah, Himanshu
Inline.

Thanks,
Himanshu


From: Zafar Ali (zali) 
Date: Thursday, March 20, 2025 at 12:07 PM
To: Shah, Himanshu , Greg Mirsky 
Cc: Robert Raszuk , BESS , 
[email protected] 
, Zafar Ali (zali) 

Subject: Re: [bess] Re: [**EXTERNAL**] Re: Re: Inverse multi-layer OAM
Hi Himanshu

All the customers I spoke with want to run SR Circuit Style on a common IP 
network infrastructure (think $$$).
Hence, turning off TI-LFA is off the table.

Himanshu> Totally agree. We are not recommending disabling TI-LFA.

For e2e detection to win over local detection, you will have to beat the 
transmission delays.
You mentioned that the main goal of this workaround is to avoid the use of 
transit policies, which means that you have many hops.
Also, as Joel mentioned, think about the race condition.

Himanshu> Look, we have deployed this solution and the networks have been 
running fine, so all these concerns are somewhat theoretical.

The architecturally correct solution is to use unprotected Adj SIDs and 
transit BSIDs.

  *   Aside: With uSID, I have hardly seen the need for transit BSIDs in most 
implementations from the different vendors I am aware of.

Himanshu> There is no disagreement here either. We are NOT saying that an 
uncompressed, persistent Adj-SID-based path does not work. That absolutely 
works. But CS-SR optimized with a compressed SID list ALSO works; we have 
deployed it. Not sure what other proof is needed. It is OK for some to remain 
skeptical for theoretical or passion-based reasons; it does not change the 
facts in the field.

Thanks,
Himanshu

Thanks

Regards … Zafar

From: Shah, Himanshu 
Date: Thursday, March 20, 2025 at 11:16 AM
To: Greg Mirsky 
Cc: Robert Raszuk , BESS , 
[email protected] 

Subject: [bess] Re: [**EXTERNAL**] Re: Re: Inverse multi-layer OAM


[bess] Re: [**EXTERNAL**] Re: Re: Inverse multi-layer OAM

2025-03-19 Thread Zafar Ali (zali)
Hi Himanshu

All the customers I spoke with want to run SR Circuit Style on a common IP 
network infrastructure (think $$$).
Hence, turning off TI-LFA is off the table.

For e2e detection to win over local detection, you will have to beat the 
transmission delays.
You mentioned that the main goal of this workaround is to avoid the use of 
transit policies, which means that you have many hops.
Also, as Joel mentioned, think about the race condition.

The architecturally correct solution is to use unprotected Adj SIDs and 
transit BSIDs.

  *   Aside: With uSID, I have hardly seen the need for transit BSIDs in most 
implementations from the different vendors I am aware of.

Thanks

Regards … Zafar
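Zafar's point about transmission delay can be made concrete with a deliberately simplified race model. It assumes the link-level detector sits next to the failure, so its detection time needs no propagation term, while the e2e detector at the headend only starts counting after the last probes already in flight have arrived, approximated here as the sum of hypothetical per-hop delays. The timer values reuse Himanshu's example (30 ms link, 20 ms S-BFD); the per-hop delay and the function names are assumptions.

```python
from typing import Sequence


def e2e_detects_first(link_detect_us: int, e2e_detect_us: int,
                      per_hop_delay_us: Sequence[int]) -> bool:
    """Simplified race: the headend detects at e2e_detect + in-flight delay,
    the local node detects at link_detect. Ties go to the local detector."""
    in_flight_us = sum(per_hop_delay_us)
    return e2e_detect_us + in_flight_us < link_detect_us


def delay_budget_us(link_detect_us: int, e2e_detect_us: int) -> int:
    """In-flight delay below which the e2e session still detects first."""
    return link_detect_us - e2e_detect_us


if __name__ == "__main__":
    link, e2e = 30_000, 20_000  # microseconds
    print("budget:", delay_budget_us(link, e2e), "us")
    for hops in (4, 12, 20):
        path = [500] * hops  # hypothetical 0.5 ms per hop
        print(hops, "hops:", e2e_detects_first(link, e2e, path))
```

With these numbers the 10 ms margin between the two timers is consumed by 20 hops of 0.5 ms each, so the longer the path, the less the e2e timers can be relied on to win the race.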



[bess] Re: [**EXTERNAL**] Re: Re: Inverse multi-layer OAM

2025-03-19 Thread Shah, Himanshu
The simple explanation for allowing both protection schemes to co-exist is 
that the network must also carry other services; it is a shared resource.

I believe we have text that explains that S-BFD will resume as soon as 
TI-LFA-based protection kicks in. This is precisely why we need the 
'eligibility' construct. The primary CP is rendered "not eligible" to carry 
service traffic even when it has become active again, and it needs intent 
re-verification before it is used for service traffic.


Thanks,
Himanshu
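The eligibility behavior described above can be sketched as a small state machine. The state names, event methods and selection function are hypothetical and are not taken from the draft; the sketch only shows that a candidate path whose S-BFD session comes back up (for example, over a TI-LFA repair path) is not re-admitted for service traffic until intent has been re-verified.

```python
from enum import Enum, auto
from typing import List, Optional


class CPState(Enum):
    UP_ELIGIBLE = auto()      # active and verified to meet intent
    DOWN = auto()             # e2e (S-BFD) session down
    UP_NOT_ELIGIBLE = auto()  # session back up, intent not yet re-verified


class CandidatePath:
    """Hypothetical candidate-path model; not the draft's state machine."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = CPState.UP_ELIGIBLE

    def on_sbfd_down(self) -> None:
        self.state = CPState.DOWN

    def on_sbfd_up(self) -> None:
        # S-BFD may come back because TI-LFA repaired the path underneath;
        # the repaired path may not honor the intent, so keep it out of service.
        if self.state is CPState.DOWN:
            self.state = CPState.UP_NOT_ELIGIBLE

    def on_intent_verified(self) -> None:
        if self.state is CPState.UP_NOT_ELIGIBLE:
            self.state = CPState.UP_ELIGIBLE

    @property
    def eligible(self) -> bool:
        return self.state is CPState.UP_ELIGIBLE


def select_for_service(paths: List[CandidatePath]) -> Optional[str]:
    """Return the first eligible path in preference order, or None."""
    for path in paths:
        if path.eligible:
            return path.name
    return None


if __name__ == "__main__":
    primary, secondary = CandidatePath("primary"), CandidatePath("secondary")
    order = [primary, secondary]
    primary.on_sbfd_down()
    print(select_for_service(order))  # failover to secondary
    primary.on_sbfd_up()
    print(select_for_service(order))  # still secondary: primary not eligible
    primary.on_intent_verified()
    print(select_for_service(order))  # primary re-admitted
```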


From: Joel Halpern 
Date: Thursday, March 20, 2025 at 10:52 AM
To: Shah, Himanshu 
Cc: BESS , 
[email protected] 

Subject: Re: [**EXTERNAL**] Re: [bess] Re: Inverse multi-layer OAM

At the very least, I would expect to see some explanation of why one would also 
be running TI-LFA.  And probably a discussion of how this interacts with the 
information propagation when the local detection kicks in.   I can believe both 
points can be addressed, but it is hard to understand without them.

Yours,

Joel

[bess] Re: [**EXTERNAL**] Re: Re: Inverse multi-layer OAM

2025-03-19 Thread Greg Mirsky
Hi Himanshu,
I agree with Joel that inverting multi-layer OAM is a tricky and untested
proposal. Consider the usual multi-layer OAM arrangement. Link failure
detection is within 10 ms using 3.3 ms intervals. You stressed that e2e
uses more aggressive network failure detection. Would that be based on 1 ms
intervals for multi-hop BFD? AFAIK, in the usual multi-layer OAM, the e2e
network failure detection is based on 100 ms to ensure that the local
protection mechanism can converge without firing e2e recovery. However, in
the case of the inverse multi-layer OAM you presented, it appears that both
recovery mechanisms, i.e., local and e2e, will be deployed. In my opinion,
that is inefficient, confusing, and unnecessary. Am I missing something
here?

Regards,
Greg
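Greg's concern that both recovery mechanisms fire in the inverse arrangement can be illustrated with a deliberately simplified timeline model. It assumes local protection always fires once the link failure is detected, and that e2e recovery fires only if the e2e detection time expires before the local repair has restored the probed path; propagation delay is ignored, the 15 ms repair time is hypothetical, and the detection times come from this thread (10 ms link / 100 ms e2e conventional, 30 ms link / 20 ms S-BFD inverse).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    local_detect_ms: float  # link-level detection time at the repairing node
    local_repair_ms: float  # hypothetical time for TI-LFA to forward on the repair path
    e2e_detect_ms: float    # e2e (e.g., S-BFD) detection time at the headend


def mechanisms_fired(s: Scenario) -> List[str]:
    """Recovery actions in firing order, under a simplified model:
    local protection always fires once the link failure is detected;
    e2e recovery fires only if its timer expires before the local
    repair restores the probed path. Propagation delay is ignored."""
    events = [(s.local_detect_ms, "local (TI-LFA)")]
    restored_at = s.local_detect_ms + s.local_repair_ms
    if s.e2e_detect_ms < restored_at:
        events.append((s.e2e_detect_ms, "e2e (headend switchover)"))
    return [name for _, name in sorted(events)]


if __name__ == "__main__":
    conventional = Scenario(local_detect_ms=10, local_repair_ms=15, e2e_detect_ms=100)
    inverse = Scenario(local_detect_ms=30, local_repair_ms=15, e2e_detect_ms=20)
    print("conventional:", mechanisms_fired(conventional))
    print("inverse:", mechanisms_fired(inverse))
```

In the conventional case the e2e timer never expires because TI-LFA has already restored the path; in the inverse case the headend switches first and TI-LFA still repairs the original path afterwards, which is the double reaction Greg describes and the situation the eligibility construct is meant to handle.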
