On 07/17/2012 08:41 PM, Rob Lanphier wrote:
> It would appear from reading this page that the only alternative to
> Gerrit that has a serious following is GitHub.  Is that the case?

We definitely need a GitHub *strategy*.  GitHub draws together tons of
open source contributors.  So we ought to address:

* pull requests.  People *will* clone our projects onto GitHub and end
up submitting pull requests there; we have to find or make tools to sync
those, or at least get notified about them and make it easy to pull them
into whatever we use. [0] [1]
* discoverability.  Having a presence on GitHub gets us publicity to a
lot of potential contributors.
* reputation.  People on GitHub want credit, in their system, for their
commits.  It'd help us to give them that somehow.
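
As a rough illustration of the "at least get notified" option for pull requests, here is a minimal Python sketch that lists open pull requests on a GitHub mirror through the public REST API [3] and turns them into one-line summaries that could be mailed or posted somewhere we already watch. The owner and repository names, the User-Agent string, and the summary format are hypothetical; the only thing taken from GitHub is the `/repos/{owner}/{repo}/pulls` endpoint and the `number`, `title`, `user.login`, and `html_url` fields of its response. This is an assumption-laden sketch, not a description of any tool we actually run.

```python
import json
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def pulls_url(owner, repo):
    """Build the URL that lists open pull requests for one repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}/pulls?state=open"


def summarize_pulls(pulls):
    """Turn the decoded JSON list of pull requests into one-line summaries."""
    return [
        f"#{pr['number']} {pr['title']} (by {pr['user']['login']}): {pr['html_url']}"
        for pr in pulls
    ]


def fetch_open_pulls(owner, repo):
    """Fetch open pull requests over HTTPS (unauthenticated, so rate-limited)."""
    request = Request(
        pulls_url(owner, repo),
        # Hypothetical User-Agent; GitHub expects some User-Agent to be sent.
        headers={"User-Agent": "pull-request-notifier-sketch"},
    )
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # "example-org" and "example-repo" are placeholders for a mirrored repo.
    for line in summarize_pulls(fetch_open_pulls("example-org", "example-repo")):
        print(line)
```

A script like this only covers notification; actually importing the commits into our own review system would still need the sync tooling discussed in [0] and [1].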

But I have a lot of reservations about using GitHub as our primary
source control and code review platform.  There's the free-as-in-freedom
issue, of course, but I'm also concerned about flexibility, account
management, fragmentation of community and duplication of tools, and
their terms of service.

== Flexibility ==
I see GitHub as kind of like a Mac.  It has a nice UI for the use case
that its creators envision.  It's fine for personal use.  And if we try
it, everything'll be great.... until we smack into an invisible brick
wall.  We'll want to work around one little thing, the way that we sneak
around various issues in Gerrit with hacks and searches and upgrades,
and if it's not in GitHub's web UI or API [3], we'll be stuck.

Right now we have our primary Git repo on our own machines, which is the
ultimate backdoor.  We have been modifying our tools, automating
certain kinds of commits (like with l10n-bot), troubleshooting by
looking at our logfiles, and generally customizing things to suit our
weird needs.  GitHub is closed source and won't let us do that.  We are
not the typical use case for GitHub.  Since we have hundreds of
extensions, each with their own repository, we would have way more
repositories and members than almost any other organization on there.
So, one example: arbitrary sortability of lists of repositories.  We
could mod Gerrit to do it, but not GitHub.  How would we centralize and
list the repositories so they're easy to browse, search, edit, follow,
and watch together?  It looks like GitHub's less suitable for that,
but I'd welcome examples of orgs that create their own sub-GitHub hubs.

The WMF used to host the MediaWiki source code on SourceForge, about 8
years ago.  We switched away for a number of reasons -- because
SourceForge was not robust and reliable enough for our needs (extended
downtime led to the actual switchover), because it didn't give us enough
flexibility and customization, and because we couldn't get the data we
wanted out of the host.

We could swap Greasemonkey scripts and the like to do a little personal
UI customization on GitHub, but we could not make improvements or share
them. With Gerrit, we've already begun forming some friendships with the
development team and have contributed several small patches back
upstream. Plus, Gerrit will provide a plugin/extension interface
(starting with the next version, 2.5) which will allow us to further
tweak it to our needs.  But we would not be able to do that with GitHub.
I can't see us actually hosting our deployment branches on GitHub; a
scenario in which we do not control the access to that is
*unacceptable*.  And the more frequently we want to deploy, and the more
entangled our source control gets into our deployment infrastructure,
the more of a pain it'll be to have our source control someplace we
can't tweak or totally trust.

== Accounts ==
By using GitHub, we would no longer be managing the user accounts. This
would make single sign-on with other Wikimedia services (especially
Labs) completely impossible.

I mentioned above that GitHub seems more meant for single FLOSS projects
than for confederations of related repositories. GitHub does not have
the concept of "groups," so granting access to collections of repos
would be a time-consuming process. GitHub does not support branch-level
permissions, either (it encourages "forking" and then merging back to
master), and that does not seem as suitable for long-term collaborative
branches.

GitHub's Terms of Service (more on that below) requires people to use
their "real" (wallet) names. Our community has many members who value
their privacy, and we currently allow them to use their pseudonyms.
(Since we control our registration process for Developer Access, we can
ensure that users are who they claim to be, to our standards.)

== Duplication of tools, fragmentation of community ==
We don't want to fragment our communication EVEN MORE.  GitHub wikis and
bug management aren't such a big deal since we can probably disable
those.  But messaging and notification .... "oh, did you say that on
GitHub? We didn't see that there."  That's already a big enough
headache, with Bugzilla and all the mailing lists and IRC channels and
talk pages and and and.  :-)

== The Terms of Service ==
GitHub's ToS/Security/Privacy policies[2] pose a few problems for our needs.

One is that people under 13 can't sign up.  I do not want to limit our
community that way.

Another is: "You may not duplicate, copy, or reuse any portion of the
HTML/CSS, Javascript, or visual design elements or concepts without
express written permission from GitHub."  Do we really want to get into
a possible situation where we have noticed a design concept or cool use
of JS on GitHub but don't feel okay reusing it in our personal or
professional projects?

And, considering our level of activity, check out this clause: "If your
bandwidth usage significantly exceeds the average bandwidth usage (as
determined solely by GitHub) of other GitHub customers, we reserve the
right to immediately disable your account or throttle your file hosting
until you can reduce your bandwidth consumption."  We simply cannot
afford to have GitHub disable our access with no notice.

== A couple open questions ==
* What's the FLOSS project on GitHub that's most like us, in terms of
size, number of unique repositories, privacy concerns, robustness needs,
and so on?  How are they dealing with these issues?
* What does GitHub Enterprise buy us?  Which of these issues would that fix?

Basically, I'm thinking, let's not put so many of our eggs in the GitHub
basket.  GitHub is fine for FLOSS projects with fewer than a hundred
repositories, ones that don't already have several communications
channels, ones where privacy is less of a concern, or ones that don't
run the sixth biggest website in the world practically right off trunk.
But we have and will have so many strange, unforeseen needs that we
should keep certain key operations on servers that we run and can hack
at will.

We do need a GitHub strategy -- to make our projects more discoverable,
make use of more contributions, and participate in the GitHub
reputational economy.  So we must figure out the right ways to mirror
and sync.  But I doubt our own long-term needs would work well with
using GitHub as our main platform.

[0] https://bugzilla.wikimedia.org/show_bug.cgi?id=38196
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=35497
[2] https://github.com/site/terms
[3] http://developer.github.com/


(Thanks to Chad and RobLa for talking through much of this with me.)

-- 
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
