Cutting out the debian-devel and ubuntu-devel-discuss lists. I think this subthread has quite a risk of fragmenting, so I'm moving it to the one list (vcs-pkg-discuss) that seems most appropriate for this thread.
I have seen a few comments about "why not dgit?" and I realise that I have yet to reply to these. Over the years I have internalised the reasons, so have found it difficult to write down my arguments in a cogent way. I'll attempt that here now, and I welcome replies that try to convince me that I'm wrong. I may have some other ideas in my head that I have failed to remember right now.

I've tried hard to get this written and posted in advance of our Ubuntu Online Summit session on this topic[4]. It took me a while though, and I realise there is not much time left for participants to read and absorb this. So sorry for the short notice.

For clarity, I'll call our system "usdi" as we don't have a better name for it yet. And one fundamental point of difference appears to be whether the git tree is "quilt popped" (usdi) or "quilt pushed with .pc removed" (dgit).

Note that an "Ubuntu merge" (our primary use case) isn't the same as a git merge. It's more like a git rebase, but formerly without the benefit of a VCS. If you're not familiar with this, see [1] for details.

Please understand that we did not set off trying to solve all use cases at once. Our goal in the design of this was not "one git to rule them all". It was more "how can we use git to solve our immediate problem?" My first effort[1] did not even have an automatically imported git tree. I did it manually from source packages.

# A technical summary of what our importer does

Our imports appear in regular git repositories (not special remotes); one per source package.

The importer (and only the importer) maintains one branch pointer per distribution and series. These branch pointers fast forward, so users can follow them easily.

When the importer runs for a given source package, it checks for any newer source uploads, imports their trees and updates the distribution+series branch pointers. These commits are equivalent to the source packages with quilt patches popped and with no .pc directory.
This isn't enforced on other commits pushed by uploaders (see below) but it wouldn't make sense to do so anyway.

An uploader can choose to push to the git repository before the importer runs against a corresponding upload. The uploader should push just a tag in the form "upload/<version>" to indicate to the importer that this is a set of logical commits that can represent the upload (instead of having to fall back on a wholesale import in one commit). The uploader should not update any branch pointers.

When the importer sees a new published source version, it will find and use this tag if it passes sanity checks. The updated distribution+series branch pointers will then have the "upload/<version>" commit in their histories as appropriate.

You can run the importer locally. If you run it against some current remote tree (perhaps in future the "official" tree), then you will end up with the same new imports as some other importer run (perhaps in the future the "official" importer) with identical commit hashes. This is quite convenient during development, since you can propose a branch against a future official branch that doesn't necessarily exist yet.

# Other parts of our workflow

We have a bunch of documentation-in-progress[2] and tooling[3] to help us manipulate the tree in common ways to help us with our use cases. I argue below why I think our model works for our use cases, so in that sense I guess that they're relevant. But you don't need to understand what we do in order to understand what we've done.

# Differences from Debian

Some unique challenges I think Ubuntu has over Debian:

1. Like Debian, we have NMUs (effectively). Unlike Debian, these happen all the time. Unlike Debian, we can't choose to switch a package maintainership over to a particular git model. In addition to non-git uploads from other Ubuntu developers, we also have to deal with non-git uploads from Debian, which is the primary source of commits and uploads for the package.
Whatever model we choose, we have to be able to deal with the fact that the primary input from the system will not contain nicely separated logical commits.

2. Being derived from Debian, we get far more benefit from git if git understands our inheritance model. 1.0-1ubuntu1 in Ubuntu should have 1.0-1 from Debian as a parent in our git model. We may be able to get away without this, since "Ubuntu merges" are primarily a rebase-based workflow for us. However, getting the inheritance graph correct allows us to automate more and more of this. Right now, we have the rebase graft points identified for us automatically. In the future, we might even be able to create a git-based "merge-o-matic" which could do "Ubuntu merges" automatically where there are no conflicts.

3. As an Ubuntu development team, we need a system that presents a unified view of all the packages we look after for both Ubuntu and Debian. I don't want new starters on our team to have to be trained on five different VCS mechanisms with five different decision trees on "how to do an Ubuntu merge". Whether the Debian maintainers for a particular package are using gbp, dgit, git-dpm, on source package version 1.0, 3.0 or using quilt, cdbs, or dpatch, we need to be able to cope with it. Ideally our workflow will be the same for everything. Some of this is unavoidable. If adding a patch, then the developer will have to understand the patch system in use. But it's no good to us to have to say "sorry, you can't use our git trees then".

One might call point 3 "universal applicability". I've heard people say about our system "if it doesn't work with dgit it's no use to me". This may be true, but note that the opposite is not. Our system works whether an upstream maintainer uses dgit or not. Some history (individual commits over what was uploaded) may be lost, but theoretically I think even that would be automatically recoverable should we spend the time on writing that feature.
It doesn't need any infrastructure support, or need us to push to a particular place, or need an "official tree"; developers can use the system regardless of any of these factors.

# quilt patches popped

For our "Ubuntu merge" use case, I think it makes sense to store into git with quilt patches popped, over any kind of quilt patches pushed model. This is because:

0. Before I begin, I realise that most of my arguments here can be mitigated with additional client-side tooling since all of the data required is present. However we developed this process incrementally. We increased our productivity from day one, even without an importer. I think it's a useful property of any data model that specialised tooling isn't required in order to make use of it.

1. Removing .pc breaks quilt. Going with my incremental theme again, our import format does not break this; users do not have to learn any additional tools. dgit punts on this with "If you want to manipulate the patch stack you probably want to be looking at tools like git-dpm". But git-dpm seems to assume a model where the maintainer team collectively decides to maintain a package in VCS with it. See point 1 under "Differences from Debian" above. I have yet to see anyone using a git-dpm model in Ubuntu where the Debian maintainer team is separate and not using git-dpm in Debian.

2. Using "git diff" against two tags will present quilt patches added or removed at least twice (three times if .pc wasn't removed). This makes for a bunch of unnecessary diff noise.

3. Cherry-picking commits that add or drop quilt patches becomes difficult, and makes a mess of the .pc directory, again breaking quilt. And how do we deal with conflicts? Admittedly the same might be said about a popped format. Conflicts would then get deferred until the developer tries to "quilt push" to make sure that all patches still apply, or get an upload rejected because they don't. I find it easier to deal with them at the end, however.
In Ubuntu, I typically deal with juggling the logical changes first. When done, I can fix up the quilt patches as needed using traditional methods. I find this easier than having to deal with quilt patch conflicts at the same time as juggling the high level items during a rebase (a very common task in my team's workflow).

3. IME, in Ubuntu it is rare to have to deal with anything but trivial changes to quilt patches. Mostly quilt involvement is in adding or dropping entire quilt patches. At least, this is the case for most of the server team merges, which were our primary use case. I've always found it easier to deal with quilt patches as a whole *using quilt*, even when working inside git. See points 2 and 3.

4. "dgit quilt-fixup" becomes unnecessary. I think its existence is a symptom of dgit's model being suboptimal. With our quilt popped model, what the developer sees is identical to the developer's mental model of a source package that isn't using a VCS, just like a regular software development source tree in a VCS is identical to the developer's mental model of that source tree without VCS.

5. No magical commits from dgit's client tooling. I don't want to see extra noise as overhead interspersed with my own commits. This makes rebasing a pain.

6. dgit's documentation says that "packing up a tree in `3.0 (quilt)' and then unpacking it does not always yield the same tree". This cannot happen if trees are always stored with quilt popped. AIUI, dgit requires fixups to make this the case before a push, so perhaps this does not impact the choice of data model. However, I think it is a further illustration that the data model itself is wrong, as a demonstration of it being denormalised. There are two forms of normalisation happening here: a) what "dgit quilt-fixup" does; b) that quilt patches are stored pushed. What I'm saying is that the necessity of a) shows that b) is not really a normal form.
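
To make point 2 above concrete, here is a toy sketch in Python. It is not taken from dgit or from our importer. It models a tree as a plain mapping from path to content, and the file names (src/foo.c, fix.patch) and the reduced .pc contents are hypothetical simplifications. It lists which paths differ between an upload without a quilt patch and the next upload that adds one, under each storage layout.

```python
def changed_paths(old, new):
    """Return the sorted paths whose content differs between two trees."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


def make_tree(patched, layout):
    """Build a toy tree (path -> content) for one upload.

    patched: whether this upload carries the quilt patch "fix.patch".
    layout:  "popped", "pushed" (patches applied, .pc removed) or
             "pushed+pc" (patches applied, .pc kept).
    All paths and contents are made up for illustration.
    """
    tree = {"src/foo.c": "original\n", "debian/patches/series": ""}
    if not patched:
        return tree
    tree["debian/patches/series"] = "fix.patch\n"
    tree["debian/patches/fix.patch"] = "-original\n+fixed\n"
    if layout in ("pushed", "pushed+pc"):
        # The patch is applied, so the change shows up a second time.
        tree["src/foo.c"] = "fixed\n"
    if layout == "pushed+pc":
        # Simplified quilt state: applied list plus a backup of the old file.
        tree[".pc/applied-patches"] = "fix.patch\n"
        tree[".pc/fix.patch/src/foo.c"] = "original\n"
    return tree


for layout in ("popped", "pushed", "pushed+pc"):
    diff = changed_paths(make_tree(False, layout), make_tree(True, layout))
    print(layout, diff)
```

In this model the popped layout changes only the files under debian/patches, the pushed layout additionally changes src/foo.c, and keeping .pc adds a third copy of the same information.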

Given the above, I don't think it makes sense, for our use cases at least, to store trees hacked to push quilt patches and delete .pc. On the contrary I think it makes more sense to store the canonical form of the tree (at least, I argue that the canonical form of the tree is the source package unpacked but without interpretation). It is always possible to add client-side tooling if you'd prefer to see that tree in some other form. And since the quilt-popped form is normal and git hashes are reproducible, this can be done consistently on the client side.

I understand that there are two sides to this. Essentially what I'm saying is that in our team's use cases, the downsides to doing it our way never come up, whereas the downsides to doing it your way always come up.

I also realise that this canonical form may make things tricky when you want to store additional information and not lose the commit hashes associated with these. I don't think this is unsurmountable though. Our importer does this by picking up and trusting the uploader's push if the trees match. The necessary state is maintained in the tree maintained by the importer (but pushed to by developers).

# dgit vs. usdi: where next?

On Tue, Nov 15, 2016 at 01:14:01AM +0000, Colin Watson wrote:
> not to mention the duplicated effort

I don't think we duplicated effort. Every step we took made us incrementally more efficient. The key things that we have spent time on do not appear to be things that dgit currently does. Perhaps we're trying to solve a different problem.

It may appear that we could have spent the same amount of effort on dgit instead, fulfilled our use case inside dgit, and given you one thing to implement inside Launchpad that does everything. But I don't think this is true. Most of the problems we solved are not things that dgit currently addresses. We would have faced additional overhead trying to resolve the impedance mismatch between dgit's model and the obvious model for our use case.

One thing I've wondered, and I think is worth exploring, is if our model could be used to present a dgit special remote, or vice versa. Both formats are in some normal form, so I think this should theoretically be possible.

[1] https://lists.ubuntu.com/archives/ubuntu-devel/2014-August/038418.html
[2] https://wiki.ubuntu.com/UbuntuDevelopment/Merging/GitWorkflow
[3] https://code.launchpad.net/~usd-import-team/usd-importer/+git/usd-importer [now includes the client tools as well as the importer]
[4] http://summit.ubuntu.com/uos-1611/meeting/22710/git-based-merge-workflow/
_______________________________________________ vcs-pkg-discuss mailing list vcs-pkg-discuss@lists.alioth.debian.org http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/vcs-pkg-discuss