Re: [webkit-dev] Gated trunk, experiences from OpenStack

2013-02-05 Thread Adam Barth
Do you know how they got rid of flakiness in their tests?  We've spent
a bunch of effort fixing flaky tests (and in marking the remaining
flaky tests as flaky), but there's still a long tail of flakiness.  I
wonder if that sort of thing might be different for OpenStack if they
have a different approach to testing than we do.
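
To make "marking flaky tests as flaky" concrete, here is a minimal sketch of one way to classify tests by rerunning them. It is not how WebKit's or OpenStack's tooling works; `classify_tests`, `run_once`, and `attempts` are hypothetical names, and `run_once` stands in for whatever actually executes a single test.

```python
from typing import Callable, Dict, Iterable


def classify_tests(
    tests: Iterable[str],
    run_once: Callable[[str], bool],
    attempts: int = 5,
) -> Dict[str, str]:
    """Label each test 'pass', 'fail', or 'flaky' from repeated runs.

    `run_once` is a hypothetical callback returning True when the named
    test passes on a single run.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    verdicts: Dict[str, str] = {}
    for name in tests:
        # Collect the distinct outcomes seen across all attempts.
        outcomes = {run_once(name) for _ in range(attempts)}
        if outcomes == {True}:
            verdicts[name] = "pass"
        elif outcomes == {False}:
            verdicts[name] = "fail"
        else:
            # Both outcomes were observed, so the result is not repeatable.
            verdicts[name] = "flaky"
    return verdicts
```

A test that fails only rarely can still be labelled "pass" by a small number of attempts, which is one reason a long tail of flakiness is hard to eliminate.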

Adam


On Mon, Feb 4, 2013 at 5:14 PM, Tim Ansell mit...@mithis.com wrote:
 Hey guys,

 Last week a number of the team here at Google Sydney, including myself,
 attended the Linux.conf.au 2013 conference. The conference was a blast and the
 hot topic this year was OpenStack, an Open Source Cloud layer.

 The OpenStack project has grown from being a small project to having over
 500 active committers and continues to grow at a rapid pace. Both the
 Continuous Integration Miniconf
 (http://lca2013.linux.org.au/schedule/30102/view_talk?day=monday) and main
 conference included talks from OpenStack leaders about how they have tried
 to handle this growth and I think we can learn from their successes and
 failures. All of OpenStack's infrastructure is documented in the
 following talks: http://openstack-ci.github.com/publications/

 I pulled the following stats to see how comparable the projects are;

 OpenStack;
 (http://openstack-ci.github.com/publications/lca2013-ci/index.html#(3))

 - Over 500 Active Technical Contributors
 - As many as 200 trunk changes an hour
 - 18 (integrated) projects (and growing)

 I tried looking these up in WebKit and got the following;

 - ~200 active contributors
 - As many as ~12 trunk changes an hour
 - 1 project, but 7 target platforms

 One of the most interesting parts of OpenStack was having a gated trunk.
 From their talk;

 Before each change to the OpenStack projects is merged into the main tree,
 unit and integration tests are run on the change, and only if they pass, is
 the change merged.  We call this gating.


 There is a lot of debate about the value of a gated trunk on the internet;
 which I'm not going to repeat here. OpenStack's experience has been that it
 preserves the following properties;
 http://openstack-ci.github.com/publications/lca2013-ci/index.html#(9)

 - Ensures Code Quality
 - Protects developers
    - Devs always start from working code
 - Protects tree
    - Bad code doesn't land
 - Egalitarian
    - Process is the same for everyone
    - Process is transparent
    - Process is automated

 These are all things that came up in Eric's WebKit wishes email, especially
 the parts about having an always green tree. The egalitarian nature of the
 system also helps with trusting people as you *know* they cannot break the
 tree. This system is similar to our commit queue; however, nobody has
 privileges to bypass the queue.

 OpenStack has 18 projects which are all tightly integrated; for example, a
 change in the API in one project could break another project. For this
 reason they gate changes on test runs from all projects before allowing a
 commit to land to any of them. While WebKit is only a single project, the
 process of requiring multiple jobs to be green is similar to WebKit needing
 to support multiple platforms.

 They do point out that when this system is set up, the system has to be
 ultra repeatable and reliable;

 Once everything is automated, the projects stops if the automation does -
 http://openstack-ci.github.com/publications/lca2013-ci/index.html#(8)


 To allow this to happen, OpenStack has managed to eliminate all flaky tests
 in their suite. WebKit is not at this stage and still has a large number of
 tests which are failing and/or flaky. Luckily, WebKit has much better
 infrastructure for dealing with them and tracking them down.

 Other things they have done to try and make this process work are;

 - Like WebKit, every patch is required to have code review before being
   submitted. OpenStack requires two positive reviews before allowing a commit
   to be submitted, rather than the single one that WebKit needs.
 - Like WebKit, OpenStack has an early warning system which runs all tests as
   soon as a patch is submitted.

 The complete OpenStack test suite takes around 1 hour to run, but as they
 have more than 1 event per hour their landing system needs pipelining. They
 have developed a system called Zuul to make this happen. Before they had
 this pipeline process, changes were taking many hours to land.

 You can see their currently running system at http://zuul.openstack.org/ and
 find out more about Zuul at the following locations;

 Zuul: a Pipelining Trunk Gating System
 http://amo-probos.org/post/14

 http://mirror.linux.org.au/linux.conf.au/2013/ogv/OpenStack_Zuul.ogv


 I guess this is something we should discuss further.

 Tim 'mithro' Ansell



 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] Gated trunk, experiences from OpenStack

2013-02-05 Thread Martin Robinson
On Tue, Feb 5, 2013 at 9:28 AM, Adam Barth aba...@webkit.org wrote:
 Do you know how they got rid of flakiness in their tests?  We've spent
 a bunch of effort fixing flaky tests (and in marking the remaining
 flaky tests as flaky), but there's still a long tail of flakiness.  I
 wonder if that sort of thing might be different for OpenStack if they
 have a different approach to testing than we do.

Another useful thing is to know the number of tests in OpenStack.
WebKit has more tests than any other project I've worked on.

--Martin


Re: [webkit-dev] Gated trunk, experiences from OpenStack

2013-02-05 Thread Dirk Pranke
On Tue, Feb 5, 2013 at 9:46 AM, Martin Robinson mrobin...@webkit.org wrote:
 On Tue, Feb 5, 2013 at 9:28 AM, Adam Barth aba...@webkit.org wrote:
 Do you know how they got rid of flakiness in their tests?  We've spent
 a bunch of effort fixing flaky tests (and in marking the remaining
 flaky tests as flaky), but there's still a long tail of flakiness.  I
 wonder if that sort of thing might be different for OpenStack if they
 have a different approach to testing than we do.

 Another useful thing is to know the number of tests in OpenStack.
 WebKit has more tests than any other project I've worked on.


There are two other related aspects that make our tests flaky:

1) They're very high level integration tests (mostly), which, as they
cover large swaths of code in each test, are much more susceptible to
flakiness than method-level unit tests.

2) They weren't generally written to be run in parallel, and thus we
often have to be concerned with system-level resource contention.

-- Dirk


Re: [webkit-dev] Gated trunk, experiences from OpenStack

2013-02-05 Thread Benjamin Poulain
On Tue, Feb 5, 2013 at 12:17 PM, Dirk Pranke dpra...@chromium.org wrote:

 There are two other related aspects that make our tests flaky:

 1) They're very high level integration tests (mostly), which, as they
 cover large swaths of code in each test, are much more susceptible to
 flakiness than method-level unit tests.

 2) They weren't generally written to be run in parallel, and thus we
 often have to be concerned with system-level resource contention.


3) WebKit runs on top of large imperfect platforms. The frameworks we use
have bugs too.

Benjamin


Re: [webkit-dev] Gated trunk, experiences from OpenStack

2013-02-05 Thread Tim Ansell
On 6 February 2013 07:17, Dirk Pranke dpra...@chromium.org wrote:

 On Tue, Feb 5, 2013 at 9:46 AM, Martin Robinson mrobin...@webkit.org
 wrote:
  On Tue, Feb 5, 2013 at 9:28 AM, Adam Barth aba...@webkit.org wrote:
  Do you know how they got rid of flakiness in their tests?  We've spent
  a bunch of effort fixing flaky tests (and in marking the remaining
  flaky tests as flaky), but there's still a long tail of flakiness.  I
  wonder if that sort of thing might be different for OpenStack if they
  have a different approach to testing than we do.


From what I can see they have a pretty similar goal to us. I personally
don't know where our test flakiness comes from, so I can't really comment on
how we could fix it.


 
  Another useful thing is to know the number of tests in OpenStack.
  WebKit has more tests than any other project I've worked on.
 

 There are two other related aspects that make our tests flaky:

 1) They're very high level integration tests (mostly), which, as they
 cover large swaths of code in each test, are much more susceptible to
 flakiness than method-level unit tests.


While OpenStack doesn't have anywhere near the number of integration tests
WebKit does, it does have large integration tests. In fact, one of their
tests brings up a whole cloud stack and checks that you can operate the
cluster.


 2) They weren't generally written to be run in parallel, and thus we
 often have to be concerned with system-level resource contention.


Neither were OpenStack's originally. They made heavy use of a tool called
*testr* ( http://pypi.python.org/pypi/testrepository ) which has a mode to
automatically find when two tests are interfering with each other. testr
also has a bunch of other useful features, like only re-running tests which
are currently failing, keeping a database of test runs, and allowing stat
collection.

We too could use testr if our tests output the subunit format. The subunit
format was originally developed for Python and has excellent Python support,
so I think it should be pretty trivial to add.
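
The interference search mentioned above can be sketched as a bisection. This is only an illustration of the idea, not testr's actual implementation: `find_interfering_test` and `victim_passes_after` are hypothetical names, and the sketch assumes exactly one earlier test is responsible and that the failure reproduces every time.

```python
from typing import Callable, List, Sequence


def find_interfering_test(
    candidates: Sequence[str],
    victim_passes_after: Callable[[Sequence[str]], bool],
) -> str:
    """Bisect `candidates` down to the one test that makes the victim fail.

    `victim_passes_after(tests)` is a hypothetical callback that runs
    `tests` followed by the victim test and reports whether the victim
    passed. Assumes a single, deterministic culprit.
    """
    if not candidates:
        raise ValueError("need at least one candidate test")
    suspects: List[str] = list(candidates)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        if not victim_passes_after(half):
            # The victim still fails, so the culprit is in this half.
            suspects = half
        else:
            # Otherwise the culprit must be in the remaining tests.
            suspects = suspects[len(half):]
    return suspects[0]
```

Each step halves the suspect list, so the number of extra test runs grows with the logarithm of the number of candidate tests rather than linearly.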

Tim 'mithro' Ansell


Re: [webkit-dev] Gated trunk, experiences from OpenStack

2013-02-05 Thread Dirk Pranke
On Tue, Feb 5, 2013 at 3:34 PM, Tim Ansell mit...@mithis.com wrote:
 On 6 February 2013 07:17, Dirk Pranke dpra...@chromium.org wrote:

 On Tue, Feb 5, 2013 at 9:46 AM, Martin Robinson mrobin...@webkit.org
 wrote:
  On Tue, Feb 5, 2013 at 9:28 AM, Adam Barth aba...@webkit.org wrote:
  Do you know how they got rid of flakiness in their tests?  We've spent
  a bunch of effort fixing flaky tests (and in marking the remaining
  flaky tests as flaky), but there's still a long tail of flakiness.  I
  wonder if that sort of thing might be different for OpenStack if they
  have a different approach to testing than we do.


 From what I can see they have a pretty similar goal to us. I personally
 don't know where our test flakiness comes from, so I can't really comment on
 how we could fix it.


 
  Another useful thing is to know the number of tests in OpenStack.
  WebKit has more tests than any other project I've worked on.
 

 There are two other related aspects that make our tests flaky:

 1) They're very high level integration tests (mostly), which, as they
 cover large swaths of code in each test, are much more susceptible to
 flakiness than method-level unit tests.


 While OpenStack doesn't have anywhere near the number of integration tests
 WebKit does, it does have large integration tests. In fact, one of their
 tests brings up a whole cloud stack and checks that you can operate the
 cluster.


 2) They weren't generally written to be run in parallel, and thus we
 often have to be concerned with system-level resource contention.


 Neither were OpenStack's originally. They made heavy use of a tool called
 testr ( http://pypi.python.org/pypi/testrepository ) which has a mode to
 automatically find when two tests are interfering with each other. testr
 also has a bunch of other useful features, like only re-running tests which
 are currently failing, keeping a database of test runs, and allowing stat
 collection.


Ah, the testr isolation bisection does look interesting. I have done a
little work along those lines but haven't gotten very far.

-- Dirk


Re: [webkit-dev] Gated trunk, experiences from OpenStack

2013-02-05 Thread Tim Ansell
testr is already a great way to record test runs and collect stats on them.
I'd really love it if we could contribute some of our flakiness tooling to
testr. Flakiness plagues all projects and having awesome tools would help a
lot of people, not just us.

Tim



[webkit-dev] Gated trunk, experiences from OpenStack

2013-02-04 Thread Tim Ansell
Hey guys,

Last week a number of the team here at Google Sydney, including myself,
attended the Linux.conf.au 2013 conference. The conference was a blast and the
hot topic this year was OpenStack, an Open Source Cloud layer.

The OpenStack project has grown from being a small project to having over
500 active committers and continues to grow at a rapid pace. Both the
Continuous Integration Miniconf
(http://lca2013.linux.org.au/schedule/30102/view_talk?day=monday) and main
conference included talks from OpenStack leaders about how they have tried
to handle this growth and I think we can learn from their successes and
failures. All of OpenStack's infrastructure is documented in the
following talks: http://openstack-ci.github.com/publications/

I pulled the following stats to see how comparable the projects are;

OpenStack;
(http://openstack-ci.github.com/publications/lca2013-ci/index.html#(3))

   - Over 500 Active Technical Contributors
   - As many as 200 trunk changes an hour
   - 18 (integrated) projects (and growing)

I tried looking these up in WebKit and got the following;


   - ~200 active contributors
   - As many as ~12 trunk changes an hour
   - 1 project, but 7 target platforms

One of the most interesting parts of OpenStack was having a gated trunk.
From their talk;

 Before each change to the OpenStack projects is merged into the main tree,
 unit and integration tests are run on the change, and only if they pass, is
 the change merged.  We call this gating.


There is a lot of debate about the value of a gated trunk on the internet;
which I'm not going to repeat here. OpenStack's experience has been that it
preserves the following properties;
http://openstack-ci.github.com/publications/lca2013-ci/index.html#(9)

   - Ensures Code Quality
   - Protects developers
      - Devs always start from working code
   - Protects tree
      - Bad code doesn't land
   - Egalitarian
      - Process is the same for everyone
      - Process is transparent
      - Process is automated

These are all things that came up in Eric's WebKit wishes email, especially
the parts about having an always green tree. The egalitarian nature of the
system also helps with trusting people as you *know* they cannot break the
tree. This system is similar to our commit queue; however, nobody
has privileges to bypass the queue.

OpenStack has 18 projects which are all tightly integrated; for example, a
change in the API in one project could break another project. For this
reason they gate changes on test runs from *all* projects before allowing
a commit to land to any of them. While WebKit is only a single project, the
process of requiring multiple jobs to be green is similar to WebKit needing
to support multiple platforms.

They do point out that when this system is set up, the system has to be
ultra repeatable and reliable;

 Once everything is automated, the projects stops if the automation does -
 http://openstack-ci.github.com/publications/lca2013-ci/index.html#(8)


To allow this to happen, OpenStack has managed to eliminate all flaky
tests in their suite. WebKit is not at this stage and still has a large
number of tests which are failing and/or flaky. Luckily, WebKit has much
better infrastructure for dealing with them and tracking them down.

Other things they have done to try and make this process work are;

   - Like WebKit, every patch is required to have code review before being
   submitted. OpenStack requires two positive reviews before allowing a commit
   to be submitted, rather than the single one that WebKit needs.
   - Like WebKit, OpenStack has an early warning system which runs all
   tests as soon as a patch is submitted.

The complete OpenStack test suite takes around 1 hour to run, but as they
have more than 1 event per hour their landing system needs pipelining. They
have developed a system called Zuul to make this happen. Before they had
this pipeline process, changes were taking many hours to land.
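
As a rough illustration of why pipelining helps, here is a minimal simulation of speculative gating. It is not Zuul's actual implementation: `gate_pipeline` and `passes` are hypothetical names, and `passes` stands in for a full test run against the tree with the given changes applied. Each queued change is tested on top of the changes ahead of it, all in the same round; when one fails, it is dropped and everything behind it is retested without it.

```python
from typing import Callable, List, Sequence, Tuple


def gate_pipeline(
    queue: Sequence[str],
    passes: Callable[[Tuple[str, ...]], bool],
) -> Tuple[List[str], List[str], int]:
    """Simulate a pipelined gate.

    Returns (merged, rejected, rounds), where one round costs roughly one
    full test-suite run because the runs inside it happen in parallel.
    """
    merged: List[str] = []
    rejected: List[str] = []
    rounds = 0
    pending = list(queue)
    while pending:
        rounds += 1
        # Test change i on top of the merged tree plus the changes ahead of it.
        results = [
            passes(tuple(merged) + tuple(pending[: i + 1]))
            for i in range(len(pending))
        ]
        first_fail = next((i for i, ok in enumerate(results) if not ok), None)
        if first_fail is None:
            merged.extend(pending)
            pending = []
        else:
            # Changes ahead of the failure were tested on a valid tree.
            merged.extend(pending[:first_fail])
            rejected.append(pending[first_fail])
            # Results behind the failure assumed it would land, so retest them.
            pending = pending[first_fail + 1:]
    return merged, rejected, rounds
```

For example, with a queue of A, B, C, D where only B breaks the tests, this merges A, C, and D and rejects B in two rounds, where a strictly serial gate would need four suite-length runs.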

You can see their currently running system at http://zuul.openstack.org/ and
find out more about Zuul at the following locations;

 Zuul: a Pipelining Trunk Gating System
 http://amo-probos.org/post/14

http://mirror.linux.org.au/linux.conf.au/2013/ogv/OpenStack_Zuul.ogv


I guess this is something we should discuss further.

Tim 'mithro' Ansell