The 2-minute timeout is the result of my recent change. The framework
now forks and runs the test in a child process, and if the child process
fails to send a keep-alive (sent when a test case starts), it's killed.
Otherwise there'd be no way to recover from a stuck mutex or a
deadlock.
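
For illustration, the mechanism is roughly as follows. This is only a
minimal sketch under my own assumptions, not the framework code itself:
the function name run_with_watchdog, the pipe protocol (one line per
test name, an empty line meaning "all done") and the timeout argument
are hypothetical and chosen just for this example.

```python
import os
import select
import signal


def run_with_watchdog(tests, timeout):
    """Fork a child to run `tests`; kill it if keep-alives stop arriving.

    `tests` is a list of (name, callable) pairs (hypothetical shape).
    Returns (finished, last_test_name).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send a keep-alive when each test case starts, then run it.
        os.close(read_fd)
        status = 0
        try:
            for name, func in tests:
                os.write(write_fd, (name + "\n").encode())
                func()
            os.write(write_fd, b"\n")  # empty line = all tests done
        except BaseException:
            status = 1
        finally:
            os._exit(status)  # never fall back into the parent's code

    # Parent: wait for keep-alives; the timer restarts on each message.
    os.close(write_fd)
    last_test, finished, buf = None, False, b""
    try:
        while not finished:
            ready, _, _ = select.select([read_fd], [], [], timeout)
            if not ready:
                break  # no keep-alive within `timeout` seconds
            chunk = os.read(read_fd, 4096)
            if not chunk:
                break  # EOF: child exited without reporting completion
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                if line:
                    last_test = line.decode()
                else:
                    finished = True
    finally:
        os.close(read_fd)
        if not finished:
            try:
                os.kill(pid, signal.SIGKILL)  # stuck child: kill it
            except ProcessLookupError:
                pass
        os.waitpid(pid, 0)  # reap the child
    return finished, last_test
```

In this sketch a test that hangs (say, on a deadlock) makes the call
return False together with the name of the last test that announced
itself, which is the kind of information the "last test running was"
message reports.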

Are you running the extended tests or the stock verify?

Quoting Ed Kern (ejk) (2017-08-10 00:08:19)
>    klement,
>    ok… I'll think about how to do that without too much trouble in its current
>    state..
>    in the meantime… blowing out the cpu and memory a bit changed the error…
> 
>  21:49:42 create 1k of p2p subifs                                             
>      OK
>  21:49:42 
> ==============================================================================
>  21:51:52 21:53:13,610 Timeout while waiting for child test runner process 
> (last test running was `drop rx packet not matching p2p subinterface' in 
> `/tmp/vpp-unittest-P2PEthernetIPV6-GDHSDK')!
>  21:51:52 Killing possible remaining process IDs:  19954 19962 19964
> 
>  21:45:05 PPPoE Test Case
>  21:45:05 ===================================21:48:13,778 Timeout while 
> waiting for child test runner process (last test running was `drop rx packet 
> not matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-I0REOQ')!
>  21:47:45 Killing possible remaining process IDs:  20017 20025 20027
> 
>  20:48:46 PPPoE Test Case
>  20:48:46 ===================================20:51:34,082 Timeout while 
> waiting for child test runner process (last test running was `drop rx packet 
> not matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-tQ5sP0')!
>  20:51:05 Killing possible remaining process IDs:  19919 19927 19929
> 
>    anything new/different/exciting in here?
>    Also, with the memory/cpu expansion (by roughly a third), these failures
>    happen on the order of 2-3 minutes, as opposed to 90 leading to a timeout
>    failure.
>    Since the verifies are still happily chugging along I ASSuME that this
>    drop packet check isn’t happening in that suite?
>    Ed
> 
>      On Aug 9, 2017, at 1:04 PM, Klement Sekera -X (ksekera - PANTHEON
>      TECHNOLOGIES at Cisco) <[1]ksek...@cisco.com> wrote:
>      Ed,
> 
>      it'd help if you could collect log.txt from a failed run so we could
>      peek under the hood... please see my other email in this thread...
> 
>      Thanks,
>      Klement
> 
>      Quoting Ed Kern (ejk) (2017-08-09 20:48:46)
> 
>          this is not you…or this patch…
>          the make test-debug has had a 90+% failure rate (read not 100%) for at
>          least the last 100 builds
>          (far back as my current logs go but will probably blow that out a bit now)
>          you hit the one that is seen most often… on that create 1k of p2p subifs
>          the other much less frequent is
> 
>        13:40:24 CGNAT TCP session close initiated from outside network
>                          OK
>        13:40:24 =================================================Build timed
>        out (after 120 minutes). Marking the build as failed.
> 
>          so currently I’m allocating 10000 MHz in cpu and 8G in memory for verify
>          and also for test-debug runs…
>          I'm obviously not getting (as you can see) errors about it running out of
>          memory but I wonder if that's possibly what's happening..
>          it's easy enough to blow my allocations out a bit and see if that makes a
>          difference..
>          If anyone has other ideas to try, I'm happy to give them a shot..
>          appreciate the heads up
>          Ed
> 
>            On Aug 9, 2017, at 12:07 PM, Dave Barach (dbarach)
>            <[1][2]dbar...@cisco.com> wrote:
>            Please see [2][3]https://gerrit.fd.io/r/#/c/7927, and 
>             
>            
> [3][4]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console
>             
>            The patch in question is highly unlikely to cause this failure...
>             
>             
>            14:37:11 ==============================================================================
>            14:37:11 P2P Ethernet tests
>            14:37:11 ==============================================================================
>            14:37:11 delete/create p2p subif                                    OK
>            14:37:11 create 100k of p2p subifs                                  SKIP
>            14:37:11 create 1k of p2p subifs                                    Build timed out (after 120 minutes). Marking the build as failed.
>            16:24:49 $ ssh-agent -k
>            16:24:54 unset SSH_AUTH_SOCK;
>            16:24:54 unset SSH_AGENT_PID;
>            16:24:54 echo Agent pid 84 killed;
>            16:25:07 [ssh-agent] Stopped.
>            16:25:07 Build was aborted
>            16:25:09 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
>            16:25:11 Finished: FAILURE
>             
>            Thanks… Dave
> 
>        References
> 
>          Visible links
>          1. [5]mailto:dbar...@cisco.com
>          2. [6]https://gerrit.fd.io/r/#/c/7927
>          3. 
> [7]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console
> 
> References
> 
>    Visible links
>    1. mailto:ksek...@cisco.com
>    2. mailto:dbar...@cisco.com
>    3. https://gerrit.fd.io/r/#/c/7927
>    4. 
> http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console
>    5. mailto:dbar...@cisco.com
>    6. https://gerrit.fd.io/r/#/c/7927
>    7. 
> http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev