On Mon, Sep 20, 2010 at 10:09 PM, Rob Lanphier <ro...@wikimedia.org> wrote:
> This seems like a fine line of reasoning, though not one that I had
> thought was set in stone.  For earlier releases, the MediaWiki
> releases benefited from deployment being pretty close to trunk, but
> presumably the reason why that drifted was because it became
> progressively harder for us to use our production environment as the
> de facto MediaWiki testbed.

That drifted because our review system was already overloaded before
Brion left, and it collapsed completely afterward because we failed to
decentralize review properly.  Today the practice
is roughly that most employees get their code reviewed and deployed
quickly by other employees or even themselves; volunteers (and maybe
some employees) get their code reviewed by (generally) Tim whenever he
has time, which he doesn't have enough of, so their code never gets
deployed, or only once in a blue moon.  This is a terrible situation,
and we need to fix it so that all committed code is being reviewed and
deployed on a regular basis before we even consider a release, IMO.

> I'm not sure what you mean by this.  October 15 would be the branch
> point, not the release date.  Are you saying that we have to release
> to production one month before even branching off of trunk?

Yes.  There's such a huge deployment backlog that even after careful
review, there's going to be a flurry of new problems that are quickly
discovered and will have to be fixed.  I don't think it makes sense to
try backporting the inevitable flood of fixes to a separate branch.
Instead, we should wait until deployment and trunk are relatively in
sync again (we are aiming for that, right?) and then wait a while for
things to stabilize before branching.

On Tue, Sep 21, 2010 at 12:26 PM, Rob Lanphier <ro...@robla.net> wrote:
> Doesn't this kinda depend on what our priorities are and what the
> priorities of people running MediaWiki are?  There are many demands
> placed by Wikipedia that most websites don't have.  In the rest of the
> software world, high traffic websites are the *last* ones to upgrade,
> not the first.  Don't we want to get the benefit of other people using
> the software more heavily before we put it on Wikipedia?

No, because other people are in a much worse position to track down
bugs.  MediaWiki developers are mostly heavy Wikimedia users, and
Wikimedia users are much more likely to know about Bugzilla and know
where they can complain about problems.  Moreover, Wikimedia employs
(practically?) all paid MediaWiki developers.  If a third-party site
has a bunch of serious problems, its sysadmins will probably throw up
their hands and revert to an earlier version; if Wikimedia has a
problem, it's likely that it can be fixed in minutes by its employees.

Incremental deployment is a much better overall development strategy.
Back in the days when we had scaps every week or two, as soon as a
user reported a problem, we'd sometimes all say "Oh, I remember the
commit that must have caused that."  I remember one time when a user
reported a problem in #wikimedia-tech, and Brion and I had a commit
conflict due to committing the exact same fix at the same time in (I
think) something like two minutes or less -- both of us remembered the
commit that touched that problem (me because I had committed it, he
because he reviewed it) and the problem was obvious given the bug
report.  This was standard; whoever did the scap would make sure to
hang around for a few hours in #wikimedia-tech to fix any problems,
and savvy users who watched that channel would see the scap and know
to report any regressions there immediately.  There'd be only a
handful, so all of them could be fixed quickly.

Even if we didn't remember the exact commit, we'd have very few
changes to look at in the log for the relevant files before we found
the issue.  At worst, we could almost certainly just revert the
problematic commits with no conflicts.  When you have months of old
code being deployed at once, you're going to have tons of problems
crop up all at once, instead of a few at a time, and they'll be harder
to fix -- you won't remember what could have caused them, you'll have
to look over more commits to find the problem, and when you do, you
probably can't easily revert them.

Trying to use third-party sites to test the code before we deploy it
isn't feasible.  First of all, few of them will test it and fewer
still will report bugs, and that will only get worse if we release
less-tested code.  Second of all, Wikimedia will run into problems
that other sites won't, and then all the problems I discuss above are
inevitable.

I think the correct course of action is to revamp our review structure
so that we can return to the status quo ante of keeping deployment
roughly in sync with trunk.  We should aim for all commits to be
reviewed for deployment less than a week after being committed --
perhaps just immediately reverted if they're badly flawed, but still
reviewed.

Indeed, contrary to what you suggest, high-traffic websites are
usually the first and only users to deploy the software *that they
develop*.  Most such software is in fact secret, so no one else can
use it even if they wanted to.  When it is open-source, the vendor's
site is usually the first to upgrade, in my experience.
vbulletin.com/forums/ runs alphas of vBulletin before they're released
to customers, for example.  I'd be interested to know if sites like
drupal.org or phpbb.org use anything but cutting-edge versions of
their own software -- I'd bet most of them deploy betas or release
candidates, at the very least.

> I realize that this isn't how it's traditionally been done, but then
> again, I think our tradition has drifted.  Once upon a time, trunk was
> very regularly deployed in production.  Providing releases was merely
> an alternative to telling MediaWiki admins "just go checkout trunk;
> that's what we're using".  Now that we're a lot more cautious about
> what we put into production, we should question whether we still need
> to be even more cautious about what we release as MediaWiki.

I wouldn't say we're more cautious about what we put into production.
I'd say it's more like some people get their stuff put into
production, and others don't.  As far as I can tell, the difference is
mostly whether they're paid by Wikimedia.  What employees have their
code waiting in trunk for months without deployment?  What volunteers
have their code put into production on any kind of regular basis?  I
expect a few of the former exist, but only a minority of employees; and I
don't think the latter category exists at all.  Correct me if I'm
wrong, please -- I never followed the deployment branch closely.  It
includes none or almost none of my changes, so I never saw a reason
to.

On Tue, Sep 21, 2010 at 1:48 PM, Guillaume Paumier
<gpaum...@wikimedia.org> wrote:
> I can see a number of reasons to have a stable trunk (also used by
> Wikimedia websites), that contains reviewed & tested code, along with a
> development branch that /can/ be broken:
> * Developers wouldn't be afraid to commit unfinished work to the
> development branch fearing they're going to break trunk.
> * Tarballs for non-Wikimedia MediaWiki users would be more stable.
> * Updates to Wikimedia sites would happen more often.
> * Getting to a release would be easier, since it would be the result of
> many incremental changes already merged into the stable trunk.
> * Wikimedia users would probably not mind encountering small bugs &
> quirks if it's the downside of benefiting from more regular code
> updates.
>
> That said, I guess there are obvious drawbacks I'm not seeing.

The problem isn't the policy for committing to various places.  The
problem is review and deployment procedures.

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l