Re: [webkit-dev] Gated trunk, experiences from OpenStack
testr is already a great way to record test runs and collect stats on them. I'd really love it if we could contribute some of our flakiness stuff to testr. Flakiness plagues all projects, and having awesome tools would help a lot of people, not just us.

Tim

On 6 February 2013 12:55, Dirk Pranke wrote:
> On Tue, Feb 5, 2013 at 3:34 PM, Tim Ansell wrote:
> > On 6 February 2013 07:17, Dirk Pranke wrote:
> >> On Tue, Feb 5, 2013 at 9:46 AM, Martin Robinson wrote:
> >> > On Tue, Feb 5, 2013 at 9:28 AM, Adam Barth wrote:
> >> >> Do you know how they got rid of flakiness in their tests? We've spent
> >> >> a bunch of effort fixing flaky tests (and in marking the remaining
> >> >> flaky tests as flaky), but there's still a long tail of flakiness. I
> >> >> wonder if that sort of thing might be different for OpenStack if they
> >> >> have a different approach to testing than we do.
> >
> > From what I can see they have a pretty similar goal to us. I personally
> > don't know where our test flakiness comes from, so can't really comment on
> > how we could fix it.
> >
> >> > Another useful thing is to know the number of tests in OpenStack.
> >> > WebKit has more tests than any other project I've worked on.
> >>
> >> There are two other related aspects that make our tests flaky:
> >>
> >> 1) They're very high level integration tests (mostly), which, as they
> >> cover large swaths of code in each test, are much more susceptible to
> >> flakiness than method-level unit tests.
> >
> > While OpenStack doesn't have anywhere near the number of integration tests
> > WebKit does, it does have large integration tests. In fact, one of their
> > tests brings up a whole cloud stack and checks that you can operate the
> > cluster.
> >
> >> 2) They weren't generally written to be run in parallel, and thus we
> >> often have to be concerned with system-level resource contention.
> >
> > Neither were OpenStack's originally. They made heavy use of a tool called
> > testr ( http://pypi.python.org/pypi/testrepository ) which has a mode to
> > automatically find when two tests are interfering with each other. testr
> > also has a bunch of other useful features, like only re-running tests which
> > are currently failing and keeping a database of test runs and allowing stat
> > collection.
>
> Ah, the testr isolation bisection does look interesting. I have done a
> little work along those lines but haven't gotten very far.
>
> -- Dirk

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev
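
To make the "isolation bisection" idea mentioned above concrete, here is a minimal Python sketch of how one might search for the test that interferes with another. It is not testr's implementation. The `run_passes` hook, the test names, and the toy runner are all hypothetical, and the sketch assumes a single culprit and deterministic failures.

```python
from typing import Callable, List, Optional, Sequence


def find_interfering_test(
    earlier_tests: Sequence[str],
    victim: str,
    run_passes: Callable[[List[str]], bool],
) -> Optional[str]:
    """Bisect earlier_tests for the one test that makes victim fail.

    run_passes(order) is a hypothetical hook: it runs the named tests in
    the given order in a fresh process and returns True if the last test
    (the victim) passes. Assumes a single culprit and repeatable results.
    """
    candidates = list(earlier_tests)
    # Nothing to find if the victim fails on its own, or if it still
    # passes after every candidate has run before it.
    if not run_passes([victim]) or run_passes(candidates + [victim]):
        return None
    while len(candidates) > 1:
        half = len(candidates) // 2
        first = candidates[:half]
        if not run_passes(first + [victim]):
            candidates = first             # culprit is in the first half
        else:
            candidates = candidates[half:]  # otherwise it is in the second
    return candidates[0]


def fake_run_passes(order: List[str]) -> bool:
    # Toy stand-in for a real runner: "test_c" leaves state behind
    # that breaks "test_z" whenever it runs earlier in the same process.
    return not (order[-1] == "test_z" and "test_c" in order[:-1])


culprit = find_interfering_test(
    ["test_a", "test_b", "test_c", "test_d"], "test_z", fake_run_passes
)
print(culprit)  # prints: test_c
```

Each probe costs one partial test run, so the search needs roughly log2(N) runs for N earlier tests, which is what makes this practical for a large suite.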
Re: [webkit-dev] Gated trunk, experiences from OpenStack
On 6 February 2013 07:17, Dirk Pranke wrote:
> On Tue, Feb 5, 2013 at 9:46 AM, Martin Robinson wrote:
> > On Tue, Feb 5, 2013 at 9:28 AM, Adam Barth wrote:
> >> Do you know how they got rid of flakiness in their tests? We've spent
> >> a bunch of effort fixing flaky tests (and in marking the remaining
> >> flaky tests as flaky), but there's still a long tail of flakiness. I
> >> wonder if that sort of thing might be different for OpenStack if they
> >> have a different approach to testing than we do.

From what I can see they have a pretty similar goal to us. I personally don't know where our test flakiness comes from, so can't really comment on how we could fix it.

> > Another useful thing is to know the number of tests in OpenStack.
> > WebKit has more tests than any other project I've worked on.
>
> There are two other related aspects that make our tests flaky:
>
> 1) They're very high level integration tests (mostly), which, as they
> cover large swaths of code in each test, are much more susceptible to
> flakiness than method-level unit tests.

While OpenStack doesn't have anywhere near the number of integration tests WebKit does, it does have large integration tests. In fact, one of their tests brings up a whole cloud stack and checks that you can operate the cluster.

> 2) They weren't generally written to be run in parallel, and thus we
> often have to be concerned with system-level resource contention.

Neither were OpenStack's originally. They made heavy use of a tool called *testr* ( http://pypi.python.org/pypi/testrepository ) which has a mode to automatically find when two tests are interfering with each other. testr also has a bunch of other useful features, like only re-running tests which are currently failing and keeping a database of test runs and allowing stat collection.

We too could use testr if our tests output the subunit format. The subunit format was originally developed for Python and has excellent Python support, so I think it should be pretty trivial to add.

Tim 'mithro' Ansell
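
As an illustration of the kind of stat collection a database of test runs makes possible, here is a minimal Python sketch that flags tests with mixed outcomes. The record shape, the test names, and the `min_runs` threshold are hypothetical and are not testr's schema or API. It also assumes every recorded run was made at the same revision, so a mixed result points at flakiness rather than a regression.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (run_id, test_name, passed): a hypothetical record shape for illustration.
Record = Tuple[int, str, bool]


def flaky_tests(records: Iterable[Record], min_runs: int = 3) -> Dict[str, float]:
    """Return {test_name: failure_rate} for tests that both passed and failed."""
    outcomes: Dict[str, List[bool]] = defaultdict(list)
    for _run_id, name, passed in records:
        outcomes[name].append(passed)

    flaky: Dict[str, float] = {}
    for name, results in outcomes.items():
        passes = sum(results)
        # Mixed outcomes over enough runs: neither always green nor always red.
        if len(results) >= min_runs and 0 < passes < len(results):
            flaky[name] = 1 - passes / len(results)
    return flaky


history = [
    (1, "test_a", True), (2, "test_a", False), (3, "test_a", True), (4, "test_a", True),
    (1, "test_b", True), (2, "test_b", True), (3, "test_b", True),
    (1, "test_c", False), (2, "test_c", False), (3, "test_c", False),
]
print(flaky_tests(history))  # prints: {'test_a': 0.25}
```

A consistently failing test such as `test_c` is deliberately not reported: it is broken, not flaky, and needs different handling.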
[webkit-dev] Gated trunk, experiences from OpenStack
Hey guys,

Last week a number of the team here at Google Sydney, including myself, attended the Linux.conf.au 2013 conference. The conference was a blast and the hot topic this year was OpenStack, an open source cloud layer. The OpenStack project has grown from being a small project to having over 500 active committers and continues to grow at a rapid pace.

Both the Continuous Integration Miniconf ( http://lca2013.linux.org.au/schedule/30102/view_talk?day=monday ) and the main conference included talks from OpenStack leaders about how they have tried to handle this growth, and I think we can learn from their successes and failures. All of OpenStack's infrastructure is documented in the following talks: http://openstack-ci.github.com/publications/

I pulled the following stats to see how comparable the projects are.

OpenStack ( http://openstack-ci.github.com/publications/lca2013-ci/index.html#(3) ):
- Over 500 Active Technical Contributors
- As many as 200 trunk changes an hour
- 18 (integrated) projects (and growing)

I tried looking these up in WebKit and got the following:
- ~200 active contributors
- As many as ~12 trunk changes an hour
- 1 project, but 7 target platforms

One of the most interesting parts of OpenStack was having a "gated trunk". From their talk:

> Before each change to the OpenStack projects is merged into the main tree,
> unit and integration tests are run on the change, and only if they pass, is
> the change merged. We call this "gating".

There is a lot of debate on the internet about the value of a gated trunk, which I'm not going to repeat here. OpenStack's experience has been that it preserves the following properties ( http://openstack-ci.github.com/publications/lca2013-ci/index.html#(9) ):

- Ensures Code Quality
- Protects developers
  - Devs always start from working code
- Protects tree
  - Bad code doesn't land
- Egalitarian
  - Process is the same for everyone
  - Process is transparent
  - Process is automated

These are all things that came up in Eric's "WebKit wishes" email, especially the parts about having an always-green tree. The egalitarian nature of the system also helps with trusting people, as you *know* they cannot break the tree. This system is similar to our commit queue; however, nobody has privileges to bypass the queue.

OpenStack has 18 projects which are all tightly integrated; for example, a change to the API of one project could break another project. For this reason they gate changes on test runs from *all* projects before allowing a commit to land in any of them. While WebKit is only a single project, the process of requiring multiple jobs to be green is similar to WebKit needing to support multiple platforms.

They do point out that when this system is set up, it has to be ultra repeatable and reliable:

> Once everything is automated, the projects stops if the automation does
> - http://openstack-ci.github.com/publications/lca2013-ci/index.html#(8)

To allow this to happen, OpenStack has managed to eliminate all flaky tests in their suite. WebKit is not at this stage and still has a large number of tests which are failing and/or flaky. Luckily, WebKit has much better infrastructure for dealing with them and tracking them down.

Other things they have done to try and make this process work are:

- Like WebKit, every patch is required to have code review before being submitted. OpenStack requires two positive reviews before allowing a commit to be submitted, rather than the single one that WebKit needs.
- Like WebKit, OpenStack has an "early warning system" which runs all tests as soon as a patch is submitted.

The complete OpenStack test suite takes around 1 hour to run, but as they have more than 1 event per hour their landing system needs pipelining. They have developed a system called Zuul to make this happen. Before they had this pipeline process, commits were taking many hours to land. You can see their currently running system at http://zuul.openstack.org/ and find out more about Zuul at the following locations:

> Zuul: a Pipelining Trunk Gating System
> http://amo-probos.org/post/14

http://mirror.linux.org.au/linux.conf.au/2013/ogv/OpenStack_Zuul.ogv

I guess this is something we should discuss further.

Tim 'mithro' Ansell
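
To illustrate why pipelining matters when the suite takes about an hour and changes arrive faster than that, here is a toy Python model of a speculative gating queue. It is a sketch of the general idea only, not Zuul's code or API. The `tests_pass` hook and the change names are hypothetical, and the model assumes the test runs within one round happen in parallel, so each round costs roughly one suite run.

```python
from typing import Callable, List, Sequence, Tuple


def gate_pipelined(
    queue: Sequence[str],
    tests_pass: Callable[[List[str]], bool],
) -> Tuple[List[str], List[str], int]:
    """Return (merged, rejected, rounds) for an ordered queue of changes.

    tests_pass(changes) is a hypothetical hook: it runs the full suite on
    trunk plus the given changes applied in order, and returns True if green.
    """
    merged: List[str] = []
    rejected: List[str] = []
    pending = list(queue)
    rounds = 0
    while pending:
        rounds += 1
        # Speculatively test every pending change, each on top of trunk plus
        # all changes ahead of it. A real system would run these in parallel;
        # this simulation simply evaluates them one after another.
        results = [
            tests_pass(merged + pending[: i + 1]) for i in range(len(pending))
        ]
        if all(results):
            merged.extend(pending)
            pending = []
        else:
            first_bad = results.index(False)
            merged.extend(pending[:first_bad])   # everything ahead is green
            rejected.append(pending[first_bad])  # bad change never lands
            # Results behind the bad change included it, so re-test those.
            pending = pending[first_bad + 1:]
    return merged, rejected, rounds


merged, rejected, rounds = gate_pipelined(
    ["A", "B", "C", "D"], lambda changes: "C" not in changes
)
print(merged, rejected, rounds)  # prints: ['A', 'B', 'D'] ['C'] 2
```

In this toy run four changes are resolved in two rounds instead of four serial suite runs, and the failing change "C" never reaches trunk. The cost is wasted work: the speculative run for "D" behind "C" has to be thrown away and repeated.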