On Mon, Mar 2, 2009 at 7:57 AM, Daniel Spiewak <[email protected]> wrote:
> Welcome back from lurk-mode! :-)
>
> I think this is an interesting issue beyond just Git vs SVN and how the
> project is hosted. As Assaf said, the Git repository is still synced to
> the SVN (and vice versa), so there isn't any real side-stepping going on.
> In fact, committers could use SVN directly if they so choose (without any
> negative impact on an SVN-based workflow), it's merely personal taste
> which drives everyone to Git. The interesting point though is that while
> in incubation, Vic's GitHub repo really became the "unofficially
> canonical" one. While Buildr's site did point everyone to the SVN
> *first*, the culture was such that Git was really the convention/standard
> across the board. That's not to say that SVN was discouraged in any way,
> it just wasn't used (except as the main store-point once commits were
> made).
>
> The larger issue here is what does it mean to be the "primary" source
> when all sources give the same artifacts? Is it the repository officially
> recommended by the project? Is it the repository decided by the "wisdom
> of the masses" in the development team? As solutions like Git, Mercurial
> and Bazaar catch on, I think we're going to see more and more projects
> raising this issue: what does it mean to have multiple "canonical"
> sources and when does it really cause problems?

Apache has two interesting principles.

The first: you need a "licensed and reviewed contributions" repository,
the repository blessed by Apache, so to speak, from which you cut
releases. There's a lot of wisdom of the years behind insisting on having
this repository, and also insisting that it run on Apache hardware
(meaning, guarded and monitored by Apache).

The second: all development has to be done in the open, so that everyone
can participate. It's the open development part of open source. (The
context here is the project and its community, not anything you do
downstream.)
Once code has been brought to our attention -- JIRA patch, mailing list
discussions, etc -- we continue working on it in a public forum with as
much visibility as possible.

In the SVN model this job is handled by the master repository, but what
would we do if there were only Git?

You will still have one "licensed and reviewed contributions" Git
repository, and it will still be hosted on Apache hardware, and it will
still be writable only by committers. Again, all that wisdom of the years
will take us there. The difference is, you can also have any number of
perfectly synchronized clones, so development can happen elsewhere.

Now we get into the open development question. If development can happen
anywhere, how do I keep track of all the places where it happens? And
where do I find one place where it happens, so I can start tracking it?

You need to have at least one place that everyone can point to, a common
ground, and it needs to have certain guarantees: be an accurate clone of
the contribution repo, get synchronized quickly enough, not be the
weakest link (security-wise), etc.

It doesn't have to be a single place, but having too many places could be
a problem. Where do you start? Are they all as well maintained? Will they
last long enough to be permalinks?

The second question is, once code has been brought to our attention, what
places do we have to accommodate it before it ends up in the contribution
repo? Our focus here is for everyone to be able to follow changes to that
code, say as a result of discussion on the mailing list, and for
committers to be able to pull it in as a contribution. This should be at
least as good as what we have right now; for the record, what we have
right now are JIRA patches.

In my experience, you can have as many canonical sources as you want, but
since as a project you're responsible for their quality/reliability, it
actually works better to have as few as possible.
One must be the "licensed and reviewed contribution" repo, which Apache
takes care of. For people who want to track development, view the source
code, branch off, or start contributing, which repository would you point
them to? Let's call this one "the town hall repo", the one place we get
to socialize around code.

I absolutely agree that this should be a decision for the individual
project. It doesn't have to run on Apache infrastructure, as long as it
follows certain guidelines (public access, restricted writes, timely
updates, etc); it just has to work very well for that project.

It therefore can't be Apache's official repository, because the ASF can't
govern other people's infrastructure. But if the project agrees to
supervise it for the purposes I outlined above, why can't it be the
project's primary public repository?

To put it in context, all mailing list discussions have to pass through
Apache's servers, and Apache maintains the contributing archive. If in
doubt, look there. But projects can tell people to search in other
archives, like markmail or nabble. Can one of these off-site archives be
the primary point of reference for the project? If so, why can't we do
the same for source control?

Assaf

> Daniel
>
> On Mon, Mar 2, 2009 at 3:07 AM, Martijn Dashorst <[email protected]> wrote:
>
> > On Mon, Mar 2, 2009 at 1:34 AM, Assaf Arkin <[email protected]> wrote:
> > > I'm with you in using Github as the main repository:
> >
> > As an ASF Member I must protest against the direction that you are
> > taking this project. GitHub can not be used as the main repository for
> > any ASF Project. The canonical resource for Apache project's code must
> > be hosted on Apache hardware. Since the only repository that is
> > supported by Infrastructure is SVN, you'll have to maintain the
> > primary source for your project *in* SVN. Not somewhere else, not
> > bypassing ASF authorizations, not bypassing Apache policy.
> >
> > Martijn
