Re: [webkit-dev] Gated trunk, experiences from OpenStack

2013-02-05 Thread Tim Ansell
testr is already a great way to record test runs and collect stats on them.
I'd really love it if we could contribute some of our flakiness tooling to
testr. Flakiness plagues all projects, and having awesome tools would help a
lot of people, not just us.
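
As a rough sketch of the kind of stat I mean (this is not testr's API; the
function name and the record layout are made up for illustration), a test
can be flagged as flaky when the same revision has produced both a pass and
a failure for it:

```python
from collections import defaultdict


def flaky_tests(runs):
    """Return the sorted names of tests that look flaky.

    `runs` is a hypothetical iterable of (revision, test_name, passed)
    tuples taken from a database of recorded test runs. A test is flagged
    when at least one revision has both a passing and a failing result.
    """
    outcomes = defaultdict(set)
    for revision, test_name, passed in runs:
        outcomes[(revision, test_name)].add(bool(passed))
    return sorted(
        {test_name for (_, test_name), seen in outcomes.items() if len(seen) == 2}
    )
```

A test that fails on one revision and passes on a later one is not flagged,
since that looks like a real regression or fix rather than flakiness.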

Tim

On 6 February 2013 12:55, Dirk Pranke wrote:

> On Tue, Feb 5, 2013 at 3:34 PM, Tim Ansell wrote:
> > On 6 February 2013 07:17, Dirk Pranke wrote:
> >>
> >> On Tue, Feb 5, 2013 at 9:46 AM, Martin Robinson wrote:
> >> > On Tue, Feb 5, 2013 at 9:28 AM, Adam Barth wrote:
> >> >> Do you know how they got rid of flakiness in their tests?  We've spent
> >> >> a bunch of effort fixing flaky tests (and in marking the remaining
> >> >> flaky tests as flaky), but there's still a long tail of flakiness.  I
> >> >> wonder if that sort of thing might be different for OpenStack if they
> >> >> have a different approach to testing than we do.
> >
> >
> > From what I can see they have a pretty similar goal to us. I personally
> > don't know where our test flakiness comes from, so can't really comment on
> > how we could fix it.
> >
> >>
> >> >
> >> > Another useful thing is to know the number of tests in OpenStack.
> >> > WebKit has more tests than any other project I've worked on.
> >> >
> >>
> >> There are two other related aspects that make our tests flaky:
> >>
> >> 1) They're very high level integration tests (mostly), which, as they
> >> cover large swaths of code in each test, are much more susceptible to
> >> flakiness than method-level unit tests.
> >
> >
> > While OpenStack doesn't have anywhere near the number of integration tests
> > WebKit does, it does have large integration tests. In fact, one of their
> > tests brings up a whole cloud stack and checks that you can operate the
> > cluster.
> >
> >>
> >> 2) They weren't generally written to be run in parallel, and thus we
> >> often have to be concerned with system-level resource contention.
> >
> >
> > Neither were OpenStack's originally. They made heavy use of a tool called
> > testr ( http://pypi.python.org/pypi/testrepository ) which has a mode to
> > automatically find when two tests are interfering with each other. testr
> > also has a bunch of other useful features, like only re-running tests which
> > are currently failing, keeping a database of test runs, and allowing stat
> > collection.
> >
>
> Ah, the testr isolation bisection does look interesting. I have done a
> little work along those lines but haven't gotten very far.
>
> -- Dirk
>
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] Gated trunk, experiences from OpenStack

2013-02-05 Thread Tim Ansell
On 6 February 2013 07:17, Dirk Pranke wrote:

> On Tue, Feb 5, 2013 at 9:46 AM, Martin Robinson wrote:
> > On Tue, Feb 5, 2013 at 9:28 AM, Adam Barth wrote:
> >> Do you know how they got rid of flakiness in their tests?  We've spent
> >> a bunch of effort fixing flaky tests (and in marking the remaining
> >> flaky tests as flaky), but there's still a long tail of flakiness.  I
> >> wonder if that sort of thing might be different for OpenStack if they
> >> have a different approach to testing than we do.
>

From what I can see they have a pretty similar goal to us. I personally
don't know where our test flakiness comes from, so can't really comment on
how we could fix it.


> >
> > Another useful thing is to know the number of tests in OpenStack.
> > WebKit has more tests than any other project I've worked on.
> >
>
> There are two other related aspects that make our tests flaky:
>
> 1) They're very high level integration tests (mostly), which, as they
> cover large swaths of code in each test, are much more susceptible to
> flakiness than method-level unit tests.
>

While OpenStack doesn't have anywhere near the number of integration tests
WebKit does, it does have large integration tests. In fact, one of their
tests brings up a whole cloud stack and checks that you can operate the
cluster.


> 2) They weren't generally written to be run in parallel, and thus we
> often have to be concerned with system-level resource contention.
>

Neither were OpenStack's originally. They made heavy use of a tool called
*testr* ( http://pypi.python.org/pypi/testrepository ) which has a mode to
automatically find when two tests are interfering with each other. testr
also has a bunch of other useful features, like only re-running tests which
are currently failing, keeping a database of test runs, and allowing stat
collection.
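
To make the "find the interfering test" idea concrete, here is a minimal
sketch of a bisection. It is not testr's actual implementation: the function
name and the `run_tests` callback are hypothetical, and it assumes the victim
test passes on its own and that a single earlier test is what breaks it.

```python
def find_interfering_test(candidates, victim, run_tests):
    """Bisect `candidates` to find the one test that makes `victim` fail.

    `run_tests(tests)` is a hypothetical callback that runs the given tests
    in order in one process and returns the set of names that failed.
    """
    if victim in run_tests([victim]):
        raise ValueError("victim fails in isolation; nothing to bisect")
    if victim not in run_tests(list(candidates) + [victim]):
        raise ValueError("victim does not fail after the candidates")

    suspects = list(candidates)
    while len(suspects) > 1:
        middle = len(suspects) // 2
        first_half = suspects[:middle]
        # Keep whichever half still makes the victim fail.
        if victim in run_tests(first_half + [victim]):
            suspects = first_half
        else:
            suspects = suspects[middle:]
    return suspects[0]
```

Each round halves the suspect list, so the number of test runs grows with
the logarithm of the number of candidate tests rather than linearly.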

We too could use testr if our tests output the subunit format. The subunit
format was originally developed for Python and has excellent Python support,
so I think it should be pretty trivial to add.

Tim 'mithro' Ansell


[webkit-dev] Gated trunk, experiences from OpenStack

2013-02-04 Thread Tim Ansell
Hey guys,

Last week a number of the team here at Google Sydney, including myself,
attended the Linux.conf.au 2013 conference. The conference was a blast and
the hot topic this year was OpenStack, an open source cloud layer.

The OpenStack project has grown from being a small project to having over
500 active committers and continues to grow at a rapid pace. Both
the Continuous Integration Miniconf (
http://lca2013.linux.org.au/schedule/30102/view_talk?day=monday) and main
conference included talks from OpenStack leaders about how they have tried
to handle this growth and I think we can learn from their successes and
failures. All of OpenStack's infrastructure is documented in the
following talks: http://openstack-ci.github.com/publications/

I pulled the following stats to see how comparable the projects are.

OpenStack (
http://openstack-ci.github.com/publications/lca2013-ci/index.html#(3)):

   - Over 500 Active Technical Contributors
   - As many as 200 trunk changes an hour
   - 18 (integrated) projects (and growing)

I tried looking these up for WebKit and got the following:

   - ~200 active contributors
   - As many as ~12 trunk changes an hour
   - 1 project, but 7 target platforms

One of the most interesting parts of OpenStack was having a "gated trunk".
From their talk:

> Before each change to the OpenStack projects is merged into the main tree,
> unit and integration tests are run on the change, and only if they pass, is
> the change merged.  We call this "gating".


There is a lot of debate on the internet about the value of a gated trunk,
which I'm not going to repeat here. OpenStack's experience has been that it
provides the following properties:
http://openstack-ci.github.com/publications/lca2013-ci/index.html#(9)

   - Ensures Code Quality
   - Protects developers
      - Devs always start from working code
   - Protects tree
      - Bad code doesn't land
   - Egalitarian
      - Process is the same for everyone
      - Process is transparent
      - Process is automated

These are all things that came up in Eric's "WebKit wishes" email,
especially the parts about having an always-green tree. The egalitarian
nature of the system also helps with trusting people, as you *know* they
cannot break the tree. This system is similar to our commit queue; however,
nobody has privileges to bypass the queue.

OpenStack has 18 projects which are all tightly integrated; for example, a
change to the API in one project could break another project. For this
reason they gate changes on test runs from *all* projects before allowing
a commit to land in any of them. While WebKit is only a single project, the
process of requiring multiple jobs to be green is similar to WebKit needing
to support multiple platforms.

They do point out that when this system is set up, the system has to be
ultra repeatable and reliable:

> Once everything is automated, the projects stops if the automation does -
> http://openstack-ci.github.com/publications/lca2013-ci/index.html#(8)


To allow this to happen, OpenStack has managed to eliminate all flaky
tests in their suite. WebKit is not at this stage and still has a large
number of tests which are failing and/or flaky. Luckily, WebKit has much
better infrastructure for dealing with them and tracking them down.

Other things they have done to try and make this process work are:

   - Like WebKit, every patch is required to have code review before being
   submitted. OpenStack requires two positive reviews before allowing a commit
   to be submitted, rather than the single one that WebKit needs.
   - Like WebKit, OpenStack has an "early warning system" which runs all
   tests as soon as a patch is submitted.

The complete OpenStack test suite takes around 1 hour to run, but as they
have more than 1 change to land per hour their landing system needs
pipelining. They have developed a system called Zuul to make this happen.
Before they had this pipeline process, changes were taking many hours to
land.
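
To show why pipelining helps, here is a small simulation of a pipelined
gate as I understand the idea. It is only a sketch under my own assumptions,
not Zuul's code: `gate` and the `passes` callback are hypothetical names.
Every queued change is tested on top of all the changes ahead of it (a real
system would run these test jobs in parallel), and when one fails it is
dropped and the changes behind it are retested without it.

```python
def gate(queue, passes):
    """Simulate a pipelined gate and return (merged, rejected).

    `queue` lists change ids in approval order. `passes(base, change)` is a
    hypothetical callback that returns True when the test suite passes for
    `change` applied on top of `base`, a tuple of changes assumed merged.
    """
    merged, rejected = [], []
    pending = list(queue)
    while pending:
        # Speculatively test each change assuming everything ahead of it lands.
        results = [
            passes(tuple(merged) + tuple(pending[:i]), change)
            for i, change in enumerate(pending)
        ]
        if all(results):
            merged.extend(pending)
            pending = []
        else:
            bad = results.index(False)
            merged.extend(pending[:bad])   # changes ahead of the failure land
            rejected.append(pending[bad])  # the failing change is kicked out
            pending = pending[bad + 1:]    # the rest are retested without it
    return merged, rejected
```

When every change passes, the whole queue lands after one round of testing
(about one suite run of wall-clock time) instead of one suite run per change.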

You can see their currently running system at
http://zuul.openstack.org/ and find
out more about Zuul at the following locations:

> Zuul: a Pipelining Trunk Gating System
> http://amo-probos.org/post/14

http://mirror.linux.org.au/linux.conf.au/2013/ogv/OpenStack_Zuul.ogv


I guess this is something we should discuss further.

Tim 'mithro' Ansell