[vpp-dev] Supporting vpp releases besides N and N-1

2018-01-26 Thread Ed Kern (ejk)

I'd like to clean up the Jenkins 'stream' list in the vpp section, removing unsupported
releases.
Note: it's more than just a cosmetic problem.

I could go into more detail, but it comes down to: "If they are not supported,
why have them included in the build infra?"

This would mean the removal from the stream of:
1707
1704
1701
1609
1606

Anyone want to stand up and represent why these should remain?

Ed
___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] Build triggering on a simple commit message update?

2018-01-25 Thread Ed Kern (ejk)


On Jan 25, 2018, at 12:59 AM, Marco Varlese <mvarl...@suse.de> wrote:

Hi Ed,

On Thu, 2018-01-25 at 00:28 +0000, Ed Kern (ejk) wrote:
hey marco,

What you're looking for (imo) is not a gerrit change. It's a Jenkins-side change:

 https://gerrit.fd.io/r/10237
Thank you for showing the path; I am not very familiar with Jenkins, etc. so 
this example is very much appreciated!


Note: I also included not running on trivial rebases, which may freak some people
out.
Yeah, possibly that should be avoided...


Well, if the goal is to reduce unnecessary cycles, I'll throw in two little bits:
1. Just because you add this to the verify set of jobs doesn't mean you also
have to remove it from the merge set
(which also runs a make test but not a VIRL run).
2. Trivial-rebase checks like this are as much religion as anything, so it
doesn't hurt to float it as well and see
if anyone explodes.



Ed

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] Build triggering on a simple commit message update?

2018-01-24 Thread Ed Kern (ejk)
hey marco,

What you're looking for (imo) is not a gerrit change. It's a Jenkins-side change:

 https://gerrit.fd.io/r/10237

Note: I also included not running on trivial rebases, which may freak some people
out.

You can't just change gerrit-trigger-patch-submitted, because the way they roll
everything up and up and up it
would change the behavior of all projects (outside of vpp) using that global
trigger.
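
Mechanically, gerrit's 'no code change' classification boils down to the diff
(patch-id) being identical between two patch sets; roughly, and with the
patch-set numbers below as pure placeholders:

# fetch two patch sets of the change and compare their patch-ids
# (refs/changes/NN/CHANGE/PS is the usual gerrit ref layout)
git fetch https://gerrit.fd.io/r/vpp refs/changes/37/10237/1
git show FETCH_HEAD | git patch-id      # prints "<patch-id> <commit>"
git fetch https://gerrit.fd.io/r/vpp refs/changes/37/10237/2
git show FETCH_HEAD | git patch-id
# identical patch-ids mean the diff itself did not change, i.e. a
# commit-message-only update or a trivial rebase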

I have no plans on advancing this patch..only pointing out one way to do it.

Ed


On Jan 24, 2018, at 4:13 AM, Marco Varlese <mvarl...@suse.de> wrote:

All,

I noticed that when a patch is updated solely for the commit message (and pushed
to gerrit), gerrit triggers a complete new build.

I wonder if it is possible (on the gerrit backend) to catch that a patch has
been submitted with only the commit message updated, and hence skip the
full verification process?

I am thinking about it since we could save (many) cycles on the build machines
across the many processes building the code on various distros.

Thoughts?


Cheers,

--
Marco V

SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg
___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] [csit-dev] gerrit 9904 VIRL verification is failing

2018-01-02 Thread Ed Kern (ejk)
Peter was close… the VIRL salt server we were pointed at did have a hard crash.

I've repointed all 3 (this should also correct the issue therbert was seeing on
virl3)
at the production list of masters and verified all three as getting a good
license response now.

Let me know if you see issues on your rechecks and I'll give it a harder reset
swat.

Ed

On Jan 2, 2018, at 10:04 AM, Marek Gradzki -X (mgradzki - PANTHEON TECHNOLOGIES
at Cisco) <mgrad...@cisco.com> wrote:

+ hc2vpp list (hc2vpp csit jobs are failing with the same error message)

Regards,
Marek

From: csit-dev-boun...@lists.fd.io [mailto:csit-dev-boun...@lists.fd.io] On Behalf Of Peter Mikus -X (pmikus -
PANTHEON TECHNOLOGIES at Cisco)
Sent: 2 January 2018 06:37
To: Ed Kern (ejk) <e...@cisco.com>
Cc: csit-...@lists.fd.io; vpp-dev@lists.fd.io
Subject: Re: [csit-dev] [vpp-dev] gerrit 9904 VIRL verification is failing

Hello Ed,

Can you please take a look? It looks like the VIRLs are trying to connect to Cisco
master machines but communication is failing (due to the company shutdown?).
Can you please describe in more detail why this connection is required?

Thanks.

Peter Mikus
Engineer – Software
Cisco Systems Limited

From: csit-dev-boun...@lists.fd.io [mailto:csit-dev-boun...@lists.fd.io] On Behalf Of Thomas F Herbert
Sent: Wednesday, December 27, 2017 3:49 PM
To: vpp-dev@lists.fd.io; csit-...@lists.fd.io
Subject: Re: [csit-dev] [vpp-dev] gerrit 9904 VIRL verification is failing


On 12/27/2017 09:02 AM, Neale Ranns (nranns) wrote:

Hi Nitin,



Hit the ‘reply’ button and post a review comment of:

  recheck



that will poke Jenkins to redo the verification.
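
For reference, the same comment can be posted from the command line over
gerrit's ssh interface (the patch-set number here is only an example):

ssh -p 29418 <user>@gerrit.fd.io gerrit review -m '"recheck"' 9904,1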



/neale



-Original Message-

From: <vpp-dev-boun...@lists.fd.io> on
behalf of "Saxena, Nitin"
<nitin.sax...@cavium.com>

Date: Wednesday, 27 December 2017 at 14:38

To: "Dave Barach (dbarach)" <dbar...@cisco.com>,
"vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>

Subject: [vpp-dev] gerrit 9904 VIRL verification is failing



Hi,



I sent a patch (https://gerrit.fd.io/r/#/c/9904/) for review in which 
"vpp-csit-verify-virl-master" job is failing.



Console logs 
(https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-csit-verify-virl-master/8798/console.log.gz)
 shows following error.





call_home\nFlmClientException: Cisco contact was not established. This may 
be temporary.\nPlease make sure the VIRL server is connected to the Internet 
and capable of reaching the configured Cisco master.\nAlso make sure that the 
minion key provided to you matches your minion ID and domain, and remains 
valid.\nCurrent status is: Last successful contact was more than 7 days 
ago.\nLast call home check result was: Call has timed out; failed to connect or 
minion key not accepted.\n"



I have also seen what looks like the same thing on VIRL3 which is currently not 
in production. I reported it yesterday to the CSIT mailing list.

https://jenkins.fd.io/job/csit-vpp-functional-master-ubuntu1604-virl/3290/



}



+ VIRL_SID[${index}]=

+ retval=1

+ '[' 1 -ne 0 ']'

+ echo 'VIRL simulation start failed on 10.30.51.29'

VIRL simulation start failed on 10.30.51.29

===



Seems like a temporary problem. What is the gerrit command to make Jenkins
start the verification again?



Thanks,

Nitin

___

vpp-dev mailing list

vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>

https://lists.fd.io/mailman/listinfo/vpp-dev





___

vpp-dev mailing list

vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>

https://lists.fd.io/mailman/listinfo/vpp-dev


--
Thomas F Herbert
NFV and Fast Data Planes
Networking Group Office of the CTO
Red Hat

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] Jenkins jobs not starting from a "clean" state?

2017-11-29 Thread Ed Kern (ejk)


On Nov 29, 2017, at 3:09 AM, Marco Varlese <mvarl...@suse.de> wrote:

Hi Ed,

On Wed, 2017-11-29 at 03:24 +0000, Ed Kern (ejk) wrote:

All the vpp verify jobs that I've looked at are still set to ‘single use
slave’, so there is no re-use..

If the job is run by Jenkins and is either ubuntu or centos, it will attempt to
pull and install:
ubuntu:
vpp-dpdk-dev
vpp-dpdk-dkms

I suppose that's exactly my point of "not-clean-state”.

Just to be clear…

Are you unhappy that it is doing those package installs before running the make
verify (or build.sh in the case of opensuse),
or do you think those package installs are breaking the build?


It will attempt to install, but since it finds those packages already installed
it obviously doesn't keep going.

Well no… it certainly should have kept going (you trimmed the log you sent, so
it doesn't include the failure or a link to a specific
job).

I was trying to ask and understand how it is possible that on a "freshly booted"
VM there are already packages installed.

Well, when either the openstack or container image is 'booted', neither dpdk
nor vpp is installed.
32k other prereqs and base packages, yes, but (tmk, with the openstack clone) not dpdk.



I suppose that's what the message "Up-to-date DPDK package already installed"
points out, am I correct?

Again, I want to be careful that we are on the same page..

DPDK is not installed in the base template image.

DPDK IS now installed as a package (on ubuntu and centos):
AFTER checkstyle,
BEFORE make verify or build.sh,
as part of the Jenkins build scripts.


Similarly, the DPDK tarball is already downloaded when the 'curl' command runs 
since the source tarball can already be found in /dpdk


So from the base template we go from no directory:
the template base image is spun up,
git clone/fetch/etc is run,
checkstyle is run
(there should be no dpdk.xxx.tar.xz anywhere at this point),
make verify or build.sh is run (I'm only speaking about verify builds), and
as part of that (I myself trip the download by doing make config in the dpdk
directory; never bothered to track down how it is tripped 'normally')
the tar.xz is pulled.

If you actually see an order different from this, I'd be curious to see it.
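
Condensed, that order is roughly the following (GERRIT_REFSPEC being the usual
Jenkins-provided variable; this is a sketch of the flow, not the actual
ci-management scripts):

git clone https://gerrit.fd.io/r/vpp && cd vpp
git fetch origin "$GERRIT_REFSPEC" && git checkout FETCH_HEAD
make checkstyle      # no dpdk tar.xz on disk at this point
# (ubuntu/centos only) the jenkins scripts install the vpp-dpdk-* packages here
make verify          # dpdk.mk trips the tar.xz download during this step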

thanks,

Ed


centos:
vpp-dpdk-devel

If running the opensuse build, or any build outside of Jenkins, it will attempt
to build it from scratch..
Right, I believe that should always be the approach tho...

(that's one of the issues you were seeing the other day)

https://gerrit.fd.io/r/#/c/9606/

Both dpdk-17.08 rev'd up, and they also added "stable" to the directory name..
and to make matters worse (only because their mirrors are hosed) the makefile
was pointed at fast.dpdk.org,
which points to three servers that return at least two different cksums… (so a
total of 3 different ones: a. pre 11/27, b. two post 11/27).
In my patch above I just changed it to static.dpdk.org,
which is slower but consistent.
Great... we could have modified my patch since yours looks pretty similar...
anyway, I abandoned mine in the hope yours gets merged sooner.


Note: ignore the cpoc failures… I'm still bumping into an OOM condition,
waiting on Vanessa to come back and bump me up:
https://rt.linuxfoundation.org/Ticket/Display.html?id=48884




On Nov 28, 2017, at 7:58 AM, Marco Varlese <mvarl...@suse.de> wrote:
To add to the reported issue below: similarly, (I think that) the DPDK tarball
can also (always) be found in the workspace, hence it is never "freshly"
downloaded either.


There is no tarball as part of vpp, so it's always getting freshly pulled… from
one location or another..

Snipped of the logs below:
---
11:00:48  Finding source for dpdk 
11:00:48  Makefile fragment found in /w/workspace/vpp-verify-master-
ubuntu1604/build-data/packages/dpdk.mk 
11:00:48  Source found in /w/workspace/vpp-verify-master-ubuntu1604/dpdk

---

On Tue, 2017-11-28 at 15:49 +0100, Marco Varlese wrote:
All,

While looking into an issue which Dave raised yesterday, I bumped into
something
which does not look right to me.

The Jenkins jobs executing the VPP builds do not start from a clean-state. I
would expect - for instance - not to find the DPDK package already installed
on
the target host which builds a GERRIT patch.

Well… it isn't… it's just installed post-checkstyle but before any of the other
primary build scripts.


I would basically like to see the
VM boot-strapping (as per the package installed by the ci-management scripts)
and nothing else.
Anything else would be installed because of the scripts we execute to
build/install VPP (as per the Jenkins jobs). Would you agree/disagree?


So this is how it's already happening with the openstack slaves, unless
something has changed.

But I will say that I disagree, and it's not how I'm doing the container base
images, and that's around
make install-dep
In that e

Re: [vpp-dev] Jenkins jobs not starting from a "clean" state?

2017-11-28 Thread Ed Kern (ejk)

All the vpp verify jobs that I've looked at are still set to ‘single use
slave’, so there is no re-use..

If the job is run by Jenkins and is either ubuntu or centos, it will attempt to
pull and install:
ubuntu:
vpp-dpdk-dev
vpp-dpdk-dkms

centos:
vpp-dpdk-devel
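
Roughly, that pre-build install step amounts to the following (package names as
above; the repo serving them is assumed to already be configured on the image):

# ubuntu slaves
sudo apt-get update && sudo apt-get -y install vpp-dpdk-dev vpp-dpdk-dkms
# centos slaves
sudo yum -y install vpp-dpdk-devel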

If running the opensuse build, or any build outside of Jenkins, it will attempt
to build it from scratch..
(that's one of the issues you were seeing the other day)

https://gerrit.fd.io/r/#/c/9606/

Both dpdk-17.08 rev'd up, and they also added "stable" to the directory name..
and to make matters worse (only because their mirrors are hosed) the makefile
was pointed at fast.dpdk.org,
which points to three servers that return at least two different cksums… (so a
total of 3 different ones: a. pre 11/27, b. two post 11/27).
In my patch above I just changed it to static.dpdk.org,
which is slower but consistent.
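
If you want to see the mirror skew yourself, comparing what the two hosts serve
is enough (tarball name below is illustrative, since they also moved things
under a 'stable' path; the dpdk makefile pins an expected md5 and fails on any
mismatch):

curl -sL https://fast.dpdk.org/rel/dpdk-17.08.tar.xz   | md5sum
curl -sL https://static.dpdk.org/rel/dpdk-17.08.tar.xz | md5sum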

Note: ignore the cpoc failures… I'm still bumping into an OOM condition,
waiting on Vanessa to come back and bump me up:
https://rt.linuxfoundation.org/Ticket/Display.html?id=48884




On Nov 28, 2017, at 7:58 AM, Marco Varlese <mvarl...@suse.de> wrote:

To add to the reported issue below: similarly, (I think that) the DPDK tarball
can also (always) be found in the workspace, hence it is never "freshly"
downloaded either.


There is no tarball as part of vpp, so it's always getting freshly pulled… from
one location or another..

Snipped of the logs below:
---
11:00:48  Finding source for dpdk 
11:00:48  Makefile fragment found in /w/workspace/vpp-verify-master-
ubuntu1604/build-data/packages/dpdk.mk 
11:00:48  Source found in /w/workspace/vpp-verify-master-ubuntu1604/dpdk

---

On Tue, 2017-11-28 at 15:49 +0100, Marco Varlese wrote:
All,

While looking into an issue which Dave raised yesterday, I bumped into
something
which does not look right to me.

The Jenkins jobs executing the VPP builds do not start from a clean-state. I
would expect - for instance - not to find the DPDK package already installed
on
the target host which builds a GERRIT patch.

Well… it isn't… it's just installed post-checkstyle but before any of the other
primary build scripts.


I would basically like to see the
VM boot-strapping (as per the package installed by the ci-management scripts)
and nothing else.
Anything else would be installed because of the scripts we execute to
build/install VPP (as per the Jenkins jobs). Would you agree/disagree?


So this is how it's already happening with the openstack slaves, unless
something has changed.

But I will say that I disagree, and it's not how I'm doing the container base
images, and that's around
make install-dep.
In that, every three days I'll 'bake into' the basic build container a run of
make install-dep.
Even though it still runs make install-dep again at build time, the actual
number of packages
that it has to pull in (and the number of external services that have to be
available) is trimmed down
to as small a number as possible.
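
As a rough sketch of that 'bake it in' step, assuming a plain docker build and
that the makefile's UNATTENDED=yes knob keeps apt from prompting (image name,
base image and schedule are made up):

docker build -t vpp-build-base:$(date +%Y%m%d) - <<'EOF'
FROM ubuntu:16.04
RUN apt-get update && apt-get -y install git make sudo ca-certificates
# pre-seed the build deps so the per-build make install-dep is mostly a no-op
RUN git clone https://gerrit.fd.io/r/vpp /tmp/vpp \
 && make -C /tmp/vpp UNATTENDED=yes install-dep \
 && rm -rf /tmp/vpp
EOF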

This is still a problem in the 'test' area, which likes to pip install python
dependencies that are not linked
to the main install-dep, but I won't rant about that here...

In the Jenkins logs I can see the below all the time:

11:00:48 make -C dpdk install-deb
11:00:48 make[1]: Entering directory '/w/workspace/vpp-verify-master-
ubuntu1604/dpdk'
11:00:48 Building IPSec-MB 0.46 library
11:00:48 ==
11:00:48  Up-to-date DPDK package already installed
11:00:48 ==
11:00:48 make[1]: Leaving directory '/w/workspace/vpp-verify-master-
ubuntu1604/dpdk'
11:00:48


Also, I would like to ask if by any chance the tarballs which the VPP build
system requires (e.g. DPDK / IPSecMB / etc.) are somehow cached somewhere
(either on the Jenkins slave or proxied), since I think the DPDK tarball
(currently pulled)
may not be coming from the real location…


Won't speak for the openstack build side, but I actually think it may be worth
doing something
similar to the install-dep above with the dpdk… installing the binaries from
nexus every few days,
so while it will check the nexus server for an update it normally wouldn't have
to actually pull or install.
Will give that some thought…

And from the previous thread:

Since the MD5 checksum on the DPDK tarball fails; to answer your
question: no, it has never happened to me to see this specific issue
before.

I don't think there's anything specific to the openSUSE setup and/or
the scripts being executed. I rather feel it is - as I said earlier -
something to do with a hiccup on the infrastructure side. The fact
that a 'recheck' made it pass, I suppose, confirms my current
theory.


Well, opensuse still has issues, 'passing' or not… namely, no artifacts are ever
generated/pushed:


01:43:40 [ssh-agent] Stopped.
01:43:40 Archiving artifacts
01:43:40 WARN: No artifacts found that match the file pattern

Re: [vpp-dev] Test job maybe not working correctly

2017-10-21 Thread Ed Kern (ejk)
This was just flat out a bug…

It should have been {os}-s.

I have mail in to helpdesk (because it's part of the Jenkins master config) to
alter the flavors from ubuntu to ubuntu1604 and centos to centos7,
so that the {os} env can be properly used.

After that's done I'll get this patched back in and we should be rolling..

Thanks for the help/catch, matt/dave.

Ed


On Oct 16, 2017, at 10:24 AM, Dave Wallace <dwallac...@gmail.com> wrote:

Matt,

Thanks for pointing this out.  There appears to be a bug in 
ci-management/jjb/vpp/vpp.yaml, where the node type is specified as 'ubuntu-s' 
instead of an os node (e.g. '{os}-basebuild-8c-32g').

- %< -
- job-template:
name: 'vpp-test-poc-verify-{stream}-{os}'

project-type: freestyle
node: 'ubuntu-s'
- %< -

@Ed, is this a bug or intentional (for testing purposes only)?

Thanks,
-daw-

On 10/16/2017 11:33 AM, Matthew Smith wrote:

Hi,

After submitting a rebased patch set this morning, I went and looked at the 
console output of one of the jenkins jobs that kicked off automatically. The 
job was named vpp-test-poc-verify-master-centos7. It caught my eye that the 
name of the job implies that it should be running on CentOS 7 but the console 
output showed that it was building deb packages. Here’s the full console log 
for that build:

https://jenkins.fd.io/job/vpp-test-poc-verify-master-centos7/255/consoleFull

Is something broken there? This seemed a little suspect to me, but I don’t have 
a full understanding of what each jenkins job is supposed to be doing or how 
they’re supposed to run.

Thanks,
-Matt Smith




___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev


___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] Test failing

2017-10-06 Thread Ed Kern (ejk)
could you throw me some example jobs?

thanks,

Ed


> On Oct 6, 2017, at 8:54 AM, Marco Varlese  wrote:
> 
> Hi all,
> 
> I have seen this many times these days...
> I wonder if it's an infra hiccup or something is really broken?
> 
> The "recheck" is becoming the norm to get a clean +1 Verified... :(
> 
> 14:32:52 14:00:32 TC04: VPP doesn't send DHCPv4 REQUEST after OFFER with wrong
> XID :: Configure DHCPv4 client on interface to TG. If server   | FAIL |
> 14:32:52 14:00:32 Expected error 'DHCP REQUEST Rx timeout' but got 'Traffic
> script execution failed'.
> 
> 
> Cheers,
> Marco
> ___
> vpp-dev mailing list
> vpp-dev@lists.fd.io
> https://lists.fd.io/mailman/listinfo/vpp-dev

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev


Re: [vpp-dev] Spurious make test failure (container POC)

2017-08-10 Thread Ed Kern (ejk)
These are NOT with verify…

Specifically, they are with test-debug, which I added as a separate run at
someone's request.. (sorry, can't remember who at this moment)

Ed


On Aug 10, 2017, at 1:07 AM, Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES
at Cisco) <ksek...@cisco.com> wrote:

The 2 minute timeout is the result of my recent change. The framework
now forks and runs the test in a child process, and if the child process
fails to send a keep-alive (sent when a test case starts), then it's
killed. Otherwise there'd be no way to recover from stuck mutex or
deadlock..

Are you running the extended tests or the stock verify?

Quoting Ed Kern (ejk) (2017-08-10 00:08:19)
  klement,
  ok…ill think about how to do that without too much trouble in its current
  state..
  in the meantime…blowing out the cpu and memory a bit changed the error……

21:49:42 create 1k of p2p subifs
  OK
21:49:42 
==
21:51:52 21:53:13,610 Timeout while waiting for child test runner process (last 
test running was `drop rx packet not matching p2p subinterface' in 
`/tmp/vpp-unittest-P2PEthernetIPV6-GDHSDK')!
21:51:52 Killing possible remaining process IDs:  19954 19962 19964

21:45:05 PPPoE Test Case
21:45:05 ===21:48:13,778 Timeout while waiting 
for child test runner process (last test running was `drop rx packet not 
matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-I0REOQ')!
21:47:45 Killing possible remaining process IDs:  20017 20025 20027

20:48:46 PPPoE Test Case
20:48:46 ===20:51:34,082 Timeout while waiting 
for child test runner process (last test running was `drop rx packet not 
matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-tQ5sP0')!
20:51:05 Killing possible remaining process IDs:  19919 19927 19929

  anything new/different/exciting in here?
  Also the memory/cpu expansion (by roughly a third) these failures happen
  in the order of 2/3 minutes as opposed to a 90 leading to timeout failure.
  Since the verifies are still happily chugging along I ASSuME that this
  drop packet check isn’t happening in that suite?
  Ed

On Aug 9, 2017, at 1:04 PM, Klement Sekera -X (ksekera - PANTHEON
TECHNOLOGIES at Cisco) <[1]ksek...@cisco.com<mailto:ksek...@cisco.com>> 
wrote:
Ed,

it'd help if you could collect log.txt from a failed run so we could
peek under the hood... please see my other email in this thread...

Thanks,
Klement

Quoting Ed Kern (ejk) (2017-08-09 20:48:46)

this is not you…or this patch…
the make test-debug has had a 90+% failure rate (read not 100%) for
  at
least the last 100 builds
(far back as my current logs go but will probably blow that out a
  bit now)
you hit the one that is seen most often… on that create 1k of p2p
  subifs
the other much less frequent is

  13:40:24 CGNAT TCP session close initiated from outside network
OK
  13:40:24 =Build timed
  out (after 120 minutes). Marking the build as failed.

so currently I’m allocating 1 MHz in cpu and 8G in memory for
  verify
and also for test-debug runs…
Im not obviously getting (as you can see) errors about it running
  out of
memory but I wonder if thats possibly whats happening..
its easy enough to blow my allocations out a bit and see if that
  makes a
difference..
If anyone has other ideas to try and happy to give them a shot..
appreciate the heads up
Ed

  On Aug 9, 2017, at 12:07 PM, Dave Barach (dbarach)
  <[1][2]dbar...@cisco.com<mailto:dbar...@cisco.com>> wrote:
  Please see [2][3]https://gerrit.fd.io/r/#/c/7927, and

  
[3][4]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console

  The patch in question is highly unlikely to cause this failure...


  14:37:11
  
==
  14:37:11 P2P Ethernet tests
  14:37:11
  
==
  14:37:11 delete/create p2p
  subif  OK
  14:37:11 create 100k of p2p
  subifsSKIP
  14:37:11 create 1k of p2p
  subifs  Build
  timed out
  (after 120 minutes). Marking the build as failed.
  16:24:49 $ ssh-agent -k
  16:24:54 unset SSH_AUTH_SOCK;
  16:24:54 unset SSH_AGENT_PID;
  16:24:54 echo Agent pid 

Re: [vpp-dev] Spurious make test failure (container POC)

2017-08-09 Thread Ed Kern (ejk)

Klement,

OK… I'll think about how to do that without too much trouble in its current
state..

In the meantime… blowing out the cpu and memory a bit changed the error……


21:49:42 create 1k of p2p subifs
  OK
21:49:42 
==
21:51:52 21:53:13,610 Timeout while waiting for child test runner process (last 
test running was `drop rx packet not matching p2p subinterface' in 
`/tmp/vpp-unittest-P2PEthernetIPV6-GDHSDK')!
21:51:52 Killing possible remaining process IDs:  19954 19962 19964



21:45:05 PPPoE Test Case
21:45:05 ===21:48:13,778 Timeout while waiting 
for child test runner process (last test running was `drop rx packet not 
matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-I0REOQ')!
21:47:45 Killing possible remaining process IDs:  20017 20025 20027



20:48:46 PPPoE Test Case
20:48:46 ===20:51:34,082 Timeout while waiting 
for child test runner process (last test running was `drop rx packet not 
matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-tQ5sP0')!
20:51:05 Killing possible remaining process IDs:  19919 19927 19929


Anything new/different/exciting in here?

Also, with the memory/cpu expansion (by roughly a third), these failures happen
on the order of 2-3 minutes in, as opposed to the ~90 that led to the timeout
failure.

Since the verifies are still happily chugging along, I ASSuME that this drop
packet check isn't happening in that suite?

Ed





On Aug 9, 2017, at 1:04 PM, Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES
at Cisco) <ksek...@cisco.com> wrote:

Ed,

it'd help if you could collect log.txt from a failed run so we could
peek under the hood... please see my other email in this thread...

Thanks,
Klement

Quoting Ed Kern (ejk) (2017-08-09 20:48:46)
  this is not you…or this patch…
  the make test-debug has had a 90+% failure rate (read not 100%) for at
  least the last 100 builds
  (far back as my current logs go but will probably blow that out a bit now)
  you hit the one that is seen most often… on that create 1k of p2p subifs
  the other much less frequent is

13:40:24 CGNAT TCP session close initiated from outside network 
  OK
13:40:24 =Build timed out 
(after 120 minutes). Marking the build as failed.

  so currently I’m allocating 1 MHz in cpu and 8G in memory for verify
  and also for test-debug runs…
  Im not obviously getting (as you can see) errors about it running out of
  memory but I wonder if thats possibly whats happening..
  its easy enough to blow my allocations out a bit and see if that makes a
  difference..
  If anyone has other ideas to try and happy to give them a shot..
  appreciate the heads up
  Ed

On Aug 9, 2017, at 12:07 PM, Dave Barach (dbarach)
<[1]dbar...@cisco.com<mailto:dbar...@cisco.com>> wrote:
Please see [2]https://gerrit.fd.io/r/#/c/7927, and


[3]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console

The patch in question is highly unlikely to cause this failure...


14:37:11

==
14:37:11 P2P Ethernet tests
14:37:11

==
14:37:11 delete/create p2p
subif  OK
14:37:11 create 100k of p2p
subifsSKIP
14:37:11 create 1k of p2p
subifs  Build timed out
(after 120 minutes). Marking the build as failed.
16:24:49 $ ssh-agent -k
16:24:54 unset SSH_AUTH_SOCK;
16:24:54 unset SSH_AGENT_PID;
16:24:54 echo Agent pid 84 killed;
16:25:07 [ssh-agent] Stopped.
16:25:07 Build was aborted
16:25:09 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
16:25:11 Finished: FAILURE

Thanks… Dave

References

  Visible links
  1. mailto:dbar...@cisco.com
  2. https://gerrit.fd.io/r/#/c/7927
  3. 
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] Spurious make test failure (container POC)

2017-08-09 Thread Ed Kern (ejk)
This is not you… or this patch…

make test-debug has had a 90+% failure rate (read: not 100%) for at least
the last 100 builds
(as far back as my current logs go, but I will probably blow that out a bit now).

You hit the one that is seen most often… on that "create 1k of p2p subifs".

The other, much less frequent one is:

13:40:24 CGNAT TCP session close initiated from outside network 
  OK
13:40:24 =Build timed out 
(after 120 minutes). Marking the build as failed.

So currently I'm allocating 1 MHz in cpu and 8G in memory for verify and
also for test-debug runs…

I'm not obviously getting (as you can see) errors about it running out of
memory, but I wonder if that's possibly what's happening..

It's easy enough to blow my allocations out a bit and see if that makes a
difference..
If anyone has other ideas to try, I'm happy to give them a shot..

appreciate the heads up

Ed




On Aug 9, 2017, at 12:07 PM, Dave Barach (dbarach) <dbar...@cisco.com> wrote:

Please see https://gerrit.fd.io/r/#/c/7927, and

http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console

The patch in question is highly unlikely to cause this failure...


14:37:11 
==
14:37:11 P2P Ethernet tests
14:37:11 
==
14:37:11 delete/create p2p subif
  OK
14:37:11 create 100k of p2p subifs  
  SKIP
14:37:11 create 1k of p2p subifs
  Build timed out (after 120 minutes). Marking the build as failed.
16:24:49 $ ssh-agent -k
16:24:54 unset SSH_AUTH_SOCK;
16:24:54 unset SSH_AGENT_PID;
16:24:54 echo Agent pid 84 killed;
16:25:07 [ssh-agent] Stopped.
16:25:07 Build was aborted
16:25:09 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
16:25:11 Finished: FAILURE

Thanks… Dave

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] [csit-dev] [FD.io Helpdesk #41921] connection interruptiones between jenkins executor and VIRL servers

2017-06-28 Thread Ed Kern (ejk) via RT

Hey Maciek,

We don't need to, and would prefer NOT to, remove those .1 addresses on those
virtual routers.

Just the addition of the statics is what's needed right now..

thanks,

Ed



> On Jun 28, 2017, at 5:07 AM, Maciek Konstantynowicz (mkonstan) 
>  wrote:
> 
> Anton and Team,
> 
> The continued interruptions of IP connectivity to/from VIRL server 
> simulations on the management subnet have been impacting both CSIT and VPP 
> project operations. We decided to temporarily remove VPP VIRL based verify 
> jobs, job/vpp-csit-verify-virl-master/, from both per vpp patch auto-trigger 
> and the voting rights - Ed W. was kind enough to prepare the required ci-mgmt patches, 
> but they are not merged yet (https://gerrit.fd.io/r/#/c/7319/, 
> https://gerrit.fd.io/r/#/c/7320/).
> 
> Before we proceed with above step, we want to do one more set of network 
> infra focused tests per yesterday exchange on #fdio-infra irc with 
> Vanessa/valderrv, Ed Kern/snergster and Mohammed/mnaser. Here quick recap:
> 
> Connectivity is affected between following the mgmt subnets added few weeks 
> back as part of  [FD.io Helpdesk #40733]:
>10.30.52.0/24
>10.30.53.0/24
>10.30.54.0/24
> 
> The high packet drop rate (50..70%) problem seems to occur sporadically, but 
> only if packets are passing thru the default gateway router that has address .1 in 
> each of the above subnets. This affects all connectivity to jenkins slaves, but 
> also between tb4 virl hosts. The problem is never observed if packets are 
> sent directly between the hosts; that works fine.
> 
> Test proposal:
> 
> Configure the router that acts as default gateway for these subnet with the 
> following static routes:
>10.30.52.0/24 at 10.30.51.28 // tb-4virl1 mgmt addr
>10.30.53.0/24 at 10.30.51.29 // tb-4virl1 mgmt addr
>10.30.54.0/24 at 10.30.51.30 // tb-4virl1 mgmt addr
>Meaning all packets to above subnets will be routed through the main 
> management IP address on respective tb4-virl host, per wiki [1].
>This will remove default gateway router from the problem domain under 
> investigation.
> 

this is all well and good
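
(Expressed in iproute2 syntax purely for illustration; the actual device in
question is the site's gateway router, not a linux host:)

ip route add 10.30.52.0/24 via 10.30.51.28
ip route add 10.30.53.0/24 via 10.30.51.29
ip route add 10.30.54.0/24 via 10.30.51.30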


> Remove following IP addresses
> from the default gateway router:
>10.30.52.1/24
>10.30.53.1/24
>10.30.54.1/24
> 

Not sure how this got in there….we want to keep these right where they are 
unless there is some reason to remove them


> Continue to advertise below routes into WAN to ensure reachability from 
> Jenkins slave and LF FD.io infra:
>10.30.52.0/24
>10.30.53.0/24
>10.30.54.0/24
> 


this is also correct...

> Could you pls advise when can these be conducted?
> 
> -Maciek
> 
> [1] 
> https://wiki.fd.io/view/CSIT/CSIT_LF_testbed#Management_VLAN_IP_Addresses_allocation
> 
>> On 21 Jun 2017, at 16:00, Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at 
>> Cisco)  wrote:
>> 
>> Hello Anton,
>> 
>> We did some checks and here are results:
>> 
>> 1. ping simulated node from the host itself - ping is OK
>> 
>> 2. ping simulated node from other host (i.e. node simulated on virl2, 
>> executing ping command on virl3) - discovered packet loss (see e-mail from 
>> Peter below)
>>- even for successful ping packet transition we can see the wide range of 
>> time - from cca 0,6ms to 45ms...
>> 
>> We are still investigating VIRL settings but do you have some hints for us?
>> 
>> Thanks,
>> Jan
>> 
>> -Original Message-
>> From: Peter Mikus -X (pmikus - PANTHEON TECHNOLOGIES at Cisco) 
>> Sent: Wednesday, June 21, 2017 15:20
>> To: Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) 
>> 
>> Subject: RE: [vpp-dev] [FD.io Helpdesk #41921] connection interruptiones 
>> between jenkins executor and VIRL servers
>> 
>> virl@t4-virl3:/home/testuser$ ping 10.30.51.127
>> PING 10.30.51.127 (10.30.51.127) 56(84) bytes of data.
>> 64 bytes from 10.30.51.127: icmp_seq=54 ttl=64 time=1.86 ms
>> ...
>> ^C
>> --- 10.30.51.127 ping statistics ---
>> 1202 packets transmitted, 193 received, 83% packet loss, time 1202345ms
>> rtt min/avg/max/mdev = 0.369/0.736/3.271/0.509 ms
>> virl@t4-virl3:/home/testuser$ ping 10.30.51.29
>> 
>> Peter Mikus
>> Engineer - Software
>> Cisco Systems Limited
>> 
>> -Original Message-
>> From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On 
>> Behalf Of Jan Gelety -X via RT
>> Sent: Tuesday, June 20, 2017 5:20 PM
>> Cc: csit-...@lists.fd.io; vpp-dev@lists.fd.io
>> Subject: Re: [vpp-dev] [FD.io Helpdesk #41921] connection interruptiones 
>> between jenkins executor and VIRL servers
>> 
>> Hello Anton,
>> 
>> Thanks for the fast response. We will check local firewall setting as you 
>> proposed.
>> 
>> Regards,
>> Jan
>> 
>> -Original Message-
>> From: Anton Baranov via RT [mailto:fdio-helpd...@rt.linuxfoundation.org]
>> Sent: Tuesday, June 20, 2017 17:13
>> To: Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) 
>> 
>> Cc: csit-...@lists.fd.io; vpp-dev@lists.fd.io
>> Subject: [FD.io Helpdesk #41921] connection interruptiones between jenkins 
>

Re: [vpp-dev] shadow build system change adding test-debug job

2017-06-14 Thread Ed Kern (ejk)

Alright Klement/Dave,

I'm a bit stuck again…

I get about a 60+% failure rate out of test-debug, even with higher than normal
cpu settings (higher than what I use just for build verify).

Always right here:


19:43:38 
==
19:43:38 ERROR: L2 FIB test 7 - flush bd_id
19:43:38 
--
19:43:38 Traceback (most recent call last):
19:43:38   File 
"/workspace/vpp-test-debug-master-ubuntu1604/test/test_l2_fib.py", line 508, in 
test_l2_fib_07
19:43:38 self.run_verify_negat_test(bd_id=1, dst_hosts=flushed)
19:43:38   File 
"/workspace/vpp-test-debug-master-ubuntu1604/test/test_l2_fib.py", line 418, in 
run_verify_negat_test
19:43:38 i.get_capture(0, timeout=timeout)
19:43:38   File 
"/workspace/vpp-test-debug-master-ubuntu1604/test/vpp_pg_interface.py", line 
240, in get_capture
19:43:38 (len(capture.res), expected_count, name))
19:43:38 Exception: Captured packets mismatch, captured 9 packets, expected 0 
packets on pg0
19:43:38
19:43:38 
==
19:43:38 ERROR: L2 FIB test 8 - flush all
19:43:38 
--
19:43:38 Traceback (most recent call last):
19:43:38   File 
"/workspace/vpp-test-debug-master-ubuntu1604/test/test_l2_fib.py", line 522, in 
test_l2_fib_08
19:43:38 self.run_verify_negat_test(bd_id=1, dst_hosts=flushed)
19:43:38   File 
"/workspace/vpp-test-debug-master-ubuntu1604/test/test_l2_fib.py", line 418, in 
run_verify_negat_test
19:43:38 i.get_capture(0, timeout=timeout)
19:43:38   File 
"/workspace/vpp-test-debug-master-ubuntu1604/test/vpp_pg_interface.py", line 
240, in get_capture
19:43:38 (len(capture.res), expected_count, name))
19:43:38 Exception: Captured packets mismatch, captured 9 packets, expected 0 
packets on pg0
19:43:38


When it fails, it's always the same two tests… always the same exception
(captured 9, expected 0).

It's so consistent in its 'death' but so intermittent in frequency that it's
freaking me out a bit…

Any thoughts?

Ed



On May 24, 2017, at 8:42 AM, Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES
at Cisco) <ksek...@cisco.com> wrote:

I know that the functional BFD tests passed so unless there is a bug in
the tests, the failures are pretty much timing issues. From my
experience the load is the culprit as the BFD tests test interactive
sessions, which need to be kept alive. The timings currently are set at
300ms and for most tests two keep-alives can be missed before the session
goes down on vpp side and asserts start failing. While this might seem
like ample time, especially on loaded systems there is a high chance
that at least one test will derp ...

I've also seen derps even on idle systems, where a select() call (used
by python in its own sleep() implementation) with timeout of 100ms returns
after 1-3 seconds.

Try running the bfd tests only (make test-all TEST=bfd) while no other tasks
are running - I think they should pass on your box just fine.

Thanks,
Klement

Quoting Ed Kern (ejk) (2017-05-24 16:27:10)
  right now its a VERY intentional mix…but depending on loading I could
  easily see this coming up if those timings are strict.
  To not dodge your question max loading on my slowest node would be 3
  concurrent builds on an Xeon™ E3-1240 v3 (4 cores @ 3.4Ghz)
yeah yeah stop laughing…..Do you have suggested or even guesstimate
  minimums in this regard…I could pretty trivially route them towards
  the larger set that I have right now if you think magic will result :)
  Ed
  PS thanks though..for whatever reason the type of errors I was getting
  didn’t naturally steer my mind towards cpu/io binding.

On May 24, 2017, at 12:57 AM, Klement Sekera -X (ksekera - PANTHEON
TECHNOLOGIES at Cisco) <[1]ksek...@cisco.com<mailto:ksek...@cisco.com>> 
wrote:
Hi Ed,

how fast are your boxes? And how many cores? The BFD tests struggle to
meet
    the aggresive timings on slower boxes...

Thanks,
Klement

Quoting Ed Kern (ejk) (2017-05-23 20:43:55)

No problem.
If anyone is curious in rubbernecking the accident that is the
  current
test-all (at least for my build system)
adding a comment of
testall
SHOULD trigger and fire it off on my end.
make it all pass and you win a beer (or beverage of your choice)
Ed

  On May 23, 2017, at 11:34 AM, Dave Wallace
  <[1][2]dwallac...@gmail.com<mailto:dwallac...@gmail.com>>
  wrote:
  Ed,

  Thanks for adding this to the shadow build system.  Real data on
  the
  cost and effectiveness of this will be most useful.

  -daw-
  On 5/23/2017 1:30 PM, Ed Kern (ejk) wrote:

   

Re: [vpp-dev] Build failures on master

2017-06-02 Thread Ed Kern (ejk)
Tomas,

Just another data point for you/whomever: it's not specific to forking..
If you do a shallow clone (--depth=1) you will get the same error on build…
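
Quick way to see it, assuming (as I believe) that version.h is stamped from
git describe output:

git clone --depth=1 https://gerrit.fd.io/r/vpp shallow-vpp
git -C shallow-vpp describe   # fatal: No names found, cannot describe anything.
git clone https://gerrit.fd.io/r/vpp full-vpp
git -C full-vpp describe      # e.g. v17.07-rc0-...-g79ea7ec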

Ed




On Jun 2, 2017, at 12:49 AM, Tomas Brännström <tomas.a.brannst...@tieto.com> wrote:

No it's the same :-(

One thing though: when I get these errors I clone from a fork of the vpp repo 
that we made. Is there some kind of git-hook that creates these version 
files/update the .deb files with a version when cloning from the main vpp 
gerrit repository, that might not fire when cloning from a fork?

/Tomas

On 1 June 2017 at 19:02, Florin Coras <fcoras.li...@gmail.com> wrote:
Hi Tomas,

That sure is weird.

(backup everything that’s not in git)
git clean -fdx
make bootstrap
make build

Do you still see the issue?

HTH,
Florin

On Jun 1, 2017, at 6:18 AM, Tomas Brännström <tomas.a.brannst...@tieto.com> wrote:

Hi
I'm getting build errors when trying to build a recent commit on the master 
branch:

/home/ubuntu/git/vpp/build-data/../src/vnet/tcp/builtin_client.c:25:29: fatal 
error: vpp/app/version.h: No such file or directory
 #include <vpp/app/version.h>
 ^
compilation terminated.

I'm building using the "extras/vagrant/build.sh" script (or 
"build-root/vagrant/build.sh" in slightly earlier versions). What's strange 
here is that I built it successfully on the exact same commit yesterday on 
another machine. The commit in question is 
79ea7ec3b3c04d334a21107818c64d70c42b99ae but I tried on the latest master as 
well.

Also, I tried to build an earlier version (git tag v17.07-rc0) and got another 
error, where the .deb files could not be built  because the "version" was 
missing. I don't have the exact printout though.

Am I missing something here? I'm trying to build in an ubuntu trusty server 
install, and it has worked before.

/Tomas

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev


___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] State of the verify jobs

2017-06-01 Thread Ed Kern (ejk)
Well, that would certainly help me poke around the repo..

Now could you link the console log from a few minutes ago and not the
one from 13 hours ago?


Ed


On Jun 1, 2017, at 8:30 AM, Ed Warnicke <hagb...@gmail.com> wrote:

We are still seeing the 'six' issue as of a few minutes ago:

https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/5699/console

Ed

On Thu, Jun 1, 2017 at 7:26 AM, Ed Kern (ejk) <e...@cisco.com> wrote:
Well, as quickly as it came, I no longer see this error on any build in the
last three hours..

One comment though, Ed: it's not clear if this is what WAS biting us here..

I have a long history with deb/apt-installed python packages (in this case
python-six) 'fighting' with pip-installed versions of packages.
Over the last couple of years alone (and varying wildly with the version of pip)
this can manifest itself in any number of ways:
a. pip not installing newer or prerequisite versions of a package because it
refuses to 'touch' (aka remove) the apt version of a package,
b. pip installing the newer version but software still pointed at or favoring
the older version,
c. pip trying and failing (but thinking it succeeded) to remove the deb-installed
version..

imho.. take one 'boat' or the other and don't straddle, otherwise you're bound
to get wet.

Ed



On Jun 1, 2017, at 7:20 AM, Ed Warnicke <hagb...@gmail.com> wrote:

Klement,

It is an interesting question.  I presume however as we are installing things 
via pip that it could be an upstream change.

Ed

On Thu, Jun 1, 2017 at 12:37 AM, Klement Sekera -X (ksekera - PANTHEON
TECHNOLOGIES at Cisco) <ksek...@cisco.com> wrote:
Hmm, system-wide modules will probably have no effect, as we use
virtualenv to install our own modules as needed as part of make test.

I wonder why this suddenly fails, when it worked before?

Quoting Ed Warnicke (2017-06-01 05:15:55)
>A brief probe patch indicates that we already have the latest python 'six'
>module installed:
>[1]https://gerrit.fd.io/r/#/c/6966/
>
> [2]http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/581/console
>
>  02:59:03 python-six is already the newest version (1.10.0-3).
>
>  Ed
>
>On Wed, May 31, 2017 at 7:38 PM, Ed Warnicke 
> <[3]hagb...@gmail.com<mailto:hagb...@gmail.com>> wrote:
>
>  We've had some turbulence in the last couple of days.
>  We are back to triggering jobs correctly, but we appear to be having an
>  issue with a missing python module that is causing issues for make test:
>  [4]https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/5699/console
>  We are missing a module named '[5]six' that provides python2 and python
>  3 compatibility. As a result several python module installs are failing:
>  19:31:29   Copying subprocess32.egg-info to
>  build/bdist.linux-x86_64/wheel/subprocess32-3.2.7-py2.7.egg-info
>  19:31:29   running install_scripts
>  19:31:29   Traceback (most recent call last):
>  19:31:29 File "", line 1, in 
>  19:31:29 File "/tmp/pip-build-Yfbhm0/subprocess32/setup.py", line
>  60, in 
>  19:31:29   main()
>  19:31:29 File "/tmp/pip-build-Yfbhm0/subprocess32/setup.py", line
>  54, in main
>  19:31:29   'Programming Language :: Python :: Implementation ::
>  CPython',
>  19:31:29 File "/usr/lib/python2.7/distutils/core.py", line 151, in
>  setup
>  19:31:29   dist.run_commands()
>  19:31:29 File "/usr/lib/python2.7/distutils/dist.py", line 953, in
>  run_commands
>  19:31:29   self.run_command(cmd)
>  19:31:29 File "/usr/lib/python2.7/distutils/dist.py", line 972, in
>  run_command
>  19:31:29   cmd_obj.run()
>  19:31:29 File
>  
> "/w/workspace/vpp-verify-master-ubuntu1604/build-root/python/virtualenv/local/lib/python2.7/site-packages/wheel/bdist_wheel.py",
>  line 235, in run
>  19:31:29   self.run_command('install')
>  19:31:29 File "/usr/lib/python2.7/distutils/cmd.py", line 326, in
>  run_command
>  19:31:29   self.distribution.run_command(command)
>  19:31:29 File "/usr/lib/python2.7/distutils/dist.py", line 972, in
>  run_command
>  19:31:29   cmd_obj.run()
>  19:31:29 File
>  
> "/w/workspace/vpp-verify-master-ubuntu1604/build-root/python/virtualenv/local/lib/python2.7/site-packages/setuptools/command/install.py",
>  line 61, in run
>  19:31:29   return orig.install.run(self)
>  19:31:29 File "/usr/lib/python2.7/distutils/command/install.py",
>  line 613, in run
>  19:31:

Re: [vpp-dev] State of the verify jobs

2017-06-01 Thread Ed Kern (ejk)
Well, as quickly as it came, I no longer see this error on any build in the
last three hours..

One comment though, Ed: it's not clear if this is what WAS biting us here..

I have a long history with deb/apt-installed python packages (in this case
python-six) 'fighting' with pip-installed versions of packages.
Over the last couple of years alone (and varying wildly with the version of pip)
this can manifest itself in any number of ways:
a. pip not installing newer or prerequisite versions of a package because it
refuses to 'touch' (aka remove) the apt version of a package,
b. pip installing the newer version but software still pointed at or favoring
the older version,
c. pip trying and failing (but thinking it succeeded) to remove the deb-installed
version..

imho.. take one 'boat' or the other and don't straddle, otherwise you're bound
to get wet.
Ed



On Jun 1, 2017, at 7:20 AM, Ed Warnicke <hagb...@gmail.com> wrote:

Klement,

It is an interesting question.  I presume however as we are installing things 
via pip that it could be an upstream change.

Ed

On Thu, Jun 1, 2017 at 12:37 AM, Klement Sekera -X (ksekera - PANTHEON
TECHNOLOGIES at Cisco) <ksek...@cisco.com> wrote:
Hmm, system-wide modules will probably have no effect, as we use
virtualenv to install our own modules as needed as part of make test.

I wonder why this suddenly fails, when it worked before?

Quoting Ed Warnicke (2017-06-01 05:15:55)
>A brief probe patch indicates that we already have the latest python 'six'
>module installed:
>[1]https://gerrit.fd.io/r/#/c/6966/
>
> [2]http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/581/console
>
>  02:59:03 python-six is already the newest version (1.10.0-3).
>
>  Ed
>
>On Wed, May 31, 2017 at 7:38 PM, Ed Warnicke 
> <[3]hagb...@gmail.com> wrote:
>
>  We've had some turbulence in the last couple of days.
>  We are back to triggering jobs correctly, but we appear to be having an
>  issue with a missing python module that is causing issues for make test:
>  [4]https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/5699/console
>  We are missing a module named '[5]six' that provides python2 and python
>  3 compatibility. As a result several python module installs are failing:
>  19:31:29   Copying subprocess32.egg-info to
>  build/bdist.linux-x86_64/wheel/subprocess32-3.2.7-py2.7.egg-info
>  19:31:29   running install_scripts
>  19:31:29   Traceback (most recent call last):
>  19:31:29 File "", line 1, in 
>  19:31:29 File "/tmp/pip-build-Yfbhm0/subprocess32/setup.py", line
>  60, in 
>  19:31:29   main()
>  19:31:29 File "/tmp/pip-build-Yfbhm0/subprocess32/setup.py", line
>  54, in main
>  19:31:29   'Programming Language :: Python :: Implementation ::
>  CPython',
>  19:31:29 File "/usr/lib/python2.7/distutils/core.py", line 151, in
>  setup
>  19:31:29   dist.run_commands()
>  19:31:29 File "/usr/lib/python2.7/distutils/dist.py", line 953, in
>  run_commands
>  19:31:29   self.run_command(cmd)
>  19:31:29 File "/usr/lib/python2.7/distutils/dist.py", line 972, in
>  run_command
>  19:31:29   cmd_obj.run()
>  19:31:29 File
>  
> "/w/workspace/vpp-verify-master-ubuntu1604/build-root/python/virtualenv/local/lib/python2.7/site-packages/wheel/bdist_wheel.py",
>  line 235, in run
>  19:31:29   self.run_command('install')
>  19:31:29 File "/usr/lib/python2.7/distutils/cmd.py", line 326, in
>  run_command
>  19:31:29   self.distribution.run_command(command)
>  19:31:29 File "/usr/lib/python2.7/distutils/dist.py", line 972, in
>  run_command
>  19:31:29   cmd_obj.run()
>  19:31:29 File
>  
> "/w/workspace/vpp-verify-master-ubuntu1604/build-root/python/virtualenv/local/lib/python2.7/site-packages/setuptools/command/install.py",
>  line 61, in run
>  19:31:29   return orig.install.run(self)
>  19:31:29 File "/usr/lib/python2.7/distutils/command/install.py",
>  line 613, in run
>  19:31:29   self.run_command(cmd_name)
>  19:31:29 File "/usr/lib/python2.7/distutils/cmd.py", line 326, in
>  run_command
>  19:31:29   self.distribution.run_command(command)
>  19:31:29 File "/usr/lib/python2.7/distutils/dist.py", line 972, in
>  run_command
>  19:31:29   cmd_obj.run()
>  19:31:29 File
>  
> "/w/workspace/vpp-verify-master-ubuntu1604/build-root/python/virtualenv/local/lib/python2.7/site-packages/setuptools/command/install_scripts.py",
>  line 17, in run
>  19:31:29   import setuptools.command.easy_install as ei
>  19:31:29 File
>  
> "/w/workspace/vpp-verify-master-ubuntu1604/build-root/python/virtualenv/local/lib/python2.7/site-packages/setuptools/command/easy_install.py",
>  line 49, in 
>  19:31:29   from setuptoo

Re: [vpp-dev] shadow build system change adding test-debug job

2017-05-24 Thread Ed Kern (ejk)
Well, sure enough… I doubled the cpu reservation and it all passed clean…
Things are pretty quiet right now, so I'll throw it up again when things are
busy, but right now I'm just happy to see a full test-all-debug pass.


thanks,

Ed


> On May 24, 2017, at 8:42 AM, Klement Sekera -X (ksekera - PANTHEON 
> TECHNOLOGIES at Cisco)  wrote:
> 
> I know that the functional BFD tests passed so unless there is a bug in
> the tests, the failures are pretty much timing issues. From my
> experience the load is the culprit as the BFD tests test interactive
> sessions, which need to be kept alive. The timings currently are set at
> 300ms and for most tests two keep-alives can be missed before the session
> goes down on vpp side and asserts start failing. While this might seem
> like ample time, especially on loaded systems there is a high chance
> that at least one test will derp ...
> 
> I've also seen derps even on idle systems, where a select() call (used
> by python in its own sleep() implementation) with timeout of 100ms returns
> after 1-3 seconds.
> 
> Try running the bfd tests only (make test-all TEST=bfd) while no other tasks
> are running - I think they should pass on your box just fine.
> 
> Thanks,
> Klement
> 
> Quoting Ed Kern (ejk) (2017-05-24 16:27:10)
>>   right now its a VERY intentional mix…but depending on loading I could
>>   easily see this coming up if those timings are strict.  
>>   To not dodge your question max loading on my slowest node would be 3
>>   concurrent builds on an Xeon™ E3-1240 v3 (4 cores @ 3.4Ghz)
>> yeah yeah stop laughing…..Do you have suggested or even guesstimate
>>   minimums in this regard…I could pretty trivially route them towards
>>   the larger set that I have right now if you think magic will result :)
>>   Ed
>>   PS thanks though..for whatever reason the type of errors I was getting
>>   didn’t naturally steer my mind towards cpu/io binding.
>> 
>> On May 24, 2017, at 12:57 AM, Klement Sekera -X (ksekera - PANTHEON
>> TECHNOLOGIES at Cisco) <[1]ksek...@cisco.com> wrote:
>> Hi Ed,
>> 
>> how fast are your boxes? And how many cores? The BFD tests struggle to
>> meet
>> the aggresive timings on slower boxes...
>> 
>> Thanks,
>> Klement
>> 
>> Quoting Ed Kern (ejk) (2017-05-23 20:43:55)
>> 
>> No problem.
>> If anyone is curious in rubbernecking the accident that is the
>>   current
>> test-all (at least for my build system)
>> adding a comment of
>> testall
>> SHOULD trigger and fire it off on my end.
>> make it all pass and you win a beer (or beverage of your choice)  
>> Ed
>> 
>>   On May 23, 2017, at 11:34 AM, Dave Wallace
>>       <[1][2]dwallac...@gmail.com>
>>   wrote:
>>   Ed,
>> 
>>   Thanks for adding this to the shadow build system.  Real data on
>>   the
>>   cost and effectiveness of this will be most useful.
>> 
>>   -daw-
>>   On 5/23/2017 1:30 PM, Ed Kern (ejk) wrote:
>> 
>>   In the vpp-dev call a couple hours ago there was a discussion of
>>   running test-debug on a regular/default? basis.
>>   As a trial I’ve added a new job to the shadow build system:
>> 
>>   vpp-test-debug-master-ubuntu1604
>> 
>>   Will do a make test-debug,  as part of verify set, as an ADDITIONAL
>>   job.
>> 
>>   I gave a couple passes with test-all but can’t ever get a clean run
>>   with test-all (errors in test_bfd and test_ip6 ).
>>   I don’t think this is unusual or unexpected.  Ill leave it to someone
>>   else to say that ‘all’ throwing failures is a good thing.
>>   I’m happy to add another job for/with test-all if someone wants to
>>   actually debug those errors.
>> 
>>   flames, comments,concerns welcome..
>> 
>>   Ed
>> 
>>   PS Please note/remember that all these tests are non-voting regardless
>>   of success or failure.
>>   ___
>>   vpp-dev mailing list
>>   [2][3]vpp-dev@lists.fd.io
>>   [3][4]https://lists.fd.io/mailman/listinfo/vpp-dev
>> 
>>   References
>> 
>> Visible links
>> 1. [5]mailto:dwallac...@gmail.com
>> 2. [6]mailto:vpp-dev@lists.fd.io
>> 3. [7]https://lists.fd.io/mailman/listinfo/vpp-dev
>> 
>> References
>> 
>>   Visible links
>>   1. mailto:ksek...@cisco.com
>>   2. mailto:dwallac...@gmail.com
>>   3. mailto:vpp-dev@lists.fd.io
>>   4. https://lists.fd.io/mailman/listinfo/vpp-dev
>>   5. mailto:dwallac...@gmail.com
>>   6. mailto:vpp-dev@lists.fd.io
>>   7. https://lists.fd.io/mailman/listinfo/vpp-dev

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] shadow build system change adding test-debug job

2017-05-24 Thread Ed Kern (ejk)
Right now it's a VERY intentional mix… but depending on loading I could easily
see this coming up if those timings are strict.

To not dodge your question: max loading on my slowest node would be 3 concurrent
builds on a Xeon™ E3-1240 v3 (4 cores @ 3.4GHz).

Yeah yeah, stop laughing….. Do you have suggested or even guesstimate minimums
in this regard? I could pretty trivially route them towards
the larger set that I have right now if you think magic will result :)

Ed

PS: thanks though.. for whatever reason the type of errors I was getting didn't
naturally steer my mind towards cpu/io binding.



On May 24, 2017, at 12:57 AM, Klement Sekera -X (ksekera - PANTHEON
TECHNOLOGIES at Cisco) <ksek...@cisco.com> wrote:

Hi Ed,

how fast are your boxes? And how many cores? The BFD tests struggle to meet
the aggressive timings on slower boxes...

Thanks,
Klement

Quoting Ed Kern (ejk) (2017-05-23 20:43:55)
  No problem.
  If anyone is curious in rubbernecking the accident that is the current
  test-all (at least for my build system)
  adding a comment of
  testall
  SHOULD trigger and fire it off on my end.
  make it all pass and you win a beer (or beverage of your choice)
  Ed

On May 23, 2017, at 11:34 AM, Dave Wallace 
<[1]dwallac...@gmail.com<mailto:dwallac...@gmail.com>>
wrote:
Ed,

Thanks for adding this to the shadow build system.  Real data on the
cost and effectiveness of this will be most useful.

-daw-
On 5/23/2017 1:30 PM, Ed Kern (ejk) wrote:

In the vpp-dev call a couple hours ago there was a discussion of running 
test-debug on a regular/default? basis.
As a trial I’ve added a new job to the shadow build system:

vpp-test-debug-master-ubuntu1604

Will do a make test-debug,  as part of verify set, as an ADDITIONAL job.


I gave a couple passes with test-all but can’t ever get a clean run with 
test-all (errors in test_bfd and test_ip6 ).
I don’t think this is unusual or unexpected.  Ill leave it to someone else to 
say that ‘all’ throwing failures is a good thing.
I’m happy to add another job for/with test-all if someone wants to actually 
debug those errors.

flames, comments,concerns welcome..

Ed

PS Please note/remember that all these tests are non-voting regardless of 
success or failure.
___
vpp-dev mailing list
[2]vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
[3]https://lists.fd.io/mailman/listinfo/vpp-dev

References

  Visible links
  1. mailto:dwallac...@gmail.com
  2. mailto:vpp-dev@lists.fd.io
  3. https://lists.fd.io/mailman/listinfo/vpp-dev

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] shadow build system change adding test-debug job

2017-05-23 Thread Ed Kern (ejk)
No problem.

If anyone is curious to rubberneck the accident that is the current test-all
(at least for my build system),
adding a comment of
testall
SHOULD trigger and fire it off on my end.
Make it all pass and you win a beer (or beverage of your choice).

Ed



On May 23, 2017, at 11:34 AM, Dave Wallace <dwallac...@gmail.com> wrote:

Ed,

Thanks for adding this to the shadow build system.  Real data on the cost and 
effectiveness of this will be most useful.

-daw-

On 5/23/2017 1:30 PM, Ed Kern (ejk) wrote:

In the vpp-dev call a couple hours ago there was a discussion of running 
test-debug on a regular/default? basis.
As a trial I’ve added a new job to the shadow build system:

vpp-test-debug-master-ubuntu1604

Will do a make test-debug,  as part of verify set, as an ADDITIONAL job.


I gave a couple passes with test-all but can’t ever get a clean run with 
test-all (errors in test_bfd and test_ip6 ).
I don’t think this is unusual or unexpected.  Ill leave it to someone else to 
say that ‘all’ throwing failures is a good thing.
I’m happy to add another job for/with test-all if someone wants to actually 
debug those errors.

flames, comments,concerns welcome..

Ed

PS Please note/remember that all these tests are non-voting regardless of 
success or failure.
___
vpp-dev mailing list
vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
https://lists.fd.io/mailman/listinfo/vpp-dev


___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

[vpp-dev] shadow build system change adding test-debug job

2017-05-23 Thread Ed Kern (ejk)

In the vpp-dev call a couple of hours ago there was a discussion of running
test-debug on a regular/default? basis.
As a trial I've added a new job to the shadow build system:

vpp-test-debug-master-ubuntu1604

It will do a make test-debug, as part of the verify set, as an ADDITIONAL job.


I gave a couple of passes with test-all but can't ever get a clean run with
test-all (errors in test_bfd and test_ip6).
I don't think this is unusual or unexpected. I'll leave it to someone else to
say that 'all' throwing failures is a good thing.
I'm happy to add another job for/with test-all if someone wants to actually
debug those errors.

Flames, comments, concerns welcome..

Ed

PS Please note/remember that all these tests are non-voting regardless of 
success or failure.
___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

[vpp-dev] Fwd: Dropping ubuntu 14.04 from all verify/merge jobs for ALL releases past/current/future

2017-05-11 Thread Ed Kern (ejk)
forwarding to the correct vpp-dev list

Begin forwarded message:

From: "Ed Kern (ejk)" <e...@cisco.com>
Subject: Dropping ubuntu 14.04 from all verify/merge jobs for ALL releases
past/current/future
Date: May 11, 2017 at 11:12:34 AM MDT
To: "vpp-dev(mailer list)" <vpp-...@cisco.com>,
"Damjan Marion (damarion)" <damar...@cisco.com>
Cc: Dave Barach <dbar...@cisco.com>, murali
Venkateshaiah <mura...@cisco.com>, John Lo
<l...@cisco.com>


TLDR   A request has been made to no longer build VPP ubuntu 14.04 verify or 
merge jobs for all releases.  Please speak
up if you object to this happening by CoB 5/15.



Hi folks,


An original request from damjan, more than a month ago, to remove ubuntu 14.04
from master (and > 17.04) was brought
to my attention to actually make happen. During discussion in the TSC, dbarach
made a comment/change that he didn't feel
any of the older builds (1606, 1609, 1701, 1704) needed automatic builds for
14.04 either at this point.
So this expanded the request (into the much easier jjb change) to just put all
of ubuntu 14.04 into the trashcan.
I was asked to send email to this list and to the individuals above to make
sure there were no objections.
If folks think this change should be communicated elsewhere, feel free to
forward.
If folks think I should allow more than 2 business days for someone to raise an
objection, please let me know.

Damjan: Folks on the TSC call also requested that you communicate this 
(possible) change to any downstream consumer
of those builds to make sure this won’t cause explosions downstream.


thanks,

Ed

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
