Before doing anything else: please revert to the previous DPDK version and see 
if the issue vanishes.

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Peter Mikus via 
Lists.Fd.Io
Sent: Monday, July 30, 2018 3:02 AM
To: vpp-dev@lists.fd.io
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] CSIT - sw_interface_set_flags admin-up link-up failing

Hello vpp-dev,

I am looking for advice. We started testing VPP for the report on all LF CSIT 
testbeds (Skylakes and Haswells), and we are observing odd behavior. In each 
test we first bring both (physical) interfaces up using VAT:

      sw_interface_set_flags sw_if_index <idx> admin-up

(I also tried: sw_interface_set_flags sw_if_index <idx> admin-up link-up)

After setting all interfaces up, we check whether they are really up by calling 
"sw_interface_dump" via VAT (a loop of 30 iterations, 1 s between API calls). 
This was not an issue in the past, but recently we started seeing 
sw_interface_dump report interfaces as link-down (while admin-up).
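
For illustration, the check loop described above could look roughly like the 
sketch below. This is not the actual CSIT code: get_link_states is a 
hypothetical callback that wraps one "sw_interface_dump" call and returns a 
mapping of interface name to link state, and the sleep parameter exists only so 
the loop can be exercised without waiting. Returning the number of polls used 
also gives a rough idea of how long a link took to come up.

```python
import time
from typing import Callable, Dict, List, Tuple


def wait_for_link_up(
    get_link_states: Callable[[], Dict[str, bool]],
    retries: int = 30,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, List[str], int]:
    """Poll until every interface reports link-up, or retries run out.

    get_link_states is a hypothetical callback (an assumption, not a VPP
    API) that performs one interface dump and returns
    {interface_name: link_is_up}.
    Returns (all_up, interfaces_still_down, polls_used).
    """
    down: List[str] = []
    for attempt in range(1, retries + 1):
        states = get_link_states()
        down = sorted(name for name, up in states.items() if not up)
        # An empty dump is not treated as success.
        if states and not down:
            return True, [], attempt
        if attempt < retries:
            sleep(interval_s)
    return False, down, retries
```

On failure, the second element of the result names the interfaces that were 
still link-down after the last poll, which makes the "both / just one / none" 
cases easy to tell apart in the logs.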

Notes/symptoms:
-   Our sw_interface_dump check runs 30 times (1 s interval) in a loop.
-   Link-down is random: sometimes both interfaces are link-up, sometimes just 
one, and sometimes both links are down.
-   It is not TB related, nor cabling related: we see it on Haswells-3node in 
about 1 out of 70 tests, on Skylakes-2node in 1 out of 70, but on Skylake-3node 
in more than half of the tests.
-   Checking the state during a test ("show int") confirms that the interfaces 
are link-down, so "sw_interface_dump" is reporting the state correctly.
-   Running the CLI command "set interface state ... up" during a test does 
bring the interfaces up (but it is hard to check the timing here).
-   Mostly x520 and x710 are affected, but that is most probably a statistical 
artifact (low coverage of other NICs such as xxv710 and xl710).
-   We have seen this in master VPP as well as in rc2 VPP.
-   It is not clear when this started to happen, so bisecting would take a lot 
of time.
-   This was also spotted on VIRL and on Memif interfaces, which leads us to 
suspect that it is related to the API, not the HW.

Do you have any idea what we could check further? VPP is not crashing, so no 
core dumps are available. The issue is not 100% reproducible, which makes it 
hard to debug.

Is there a way to get a more verbose error from the API call mentioned above, 
to reveal more information?

Thank you.

Peter Mikus
Engineer - Software
Cisco Systems Limited
