Re: git workflows for general Ubuntu development

Robie Basak Tue, 15 Nov 2016 08:26:13 -0800

Cutting out the debian-devel and ubuntu-devel-discuss lists. I think
this subthread has quite a risk of fragmenting, so I'm moving it to the
one list (vcs-pkg-discuss) that seems most appropriate for this thread.


I have seen a few comments about "why not dgit?" and I realise that I
have yet to reply to these. Over the years I have internalised the
reasons, so have found it difficult to write down my arguments in a
cogent way. I'll attempt that here now, and I welcome replies that try
to convince me that I'm wrong. I may have some other ideas in my head
that I have failed to remember right now.

I've tried hard to get this written and posted in advance of our Ubuntu
Online Summit session on this topic[4]. It took me a while though, and I
realise there is not much time left for participants to read and absorb
this. So sorry for the short notice.

For clarity, I'll call our system "usdi" as we don't have a better name
for it yet. And one fundamental point of difference appears to be
whether the git tree is "quilt popped" (usdi) or "quilt pushed with .pc
removed" (dgit). Note that an "Ubuntu merge" (our primary use case)
isn't the same as a git merge. It's more like a git rebase, but formerly
without the benefit of a VCS. If you're not familiar with this, see [1]
for details.

Please understand that we did not set off trying to solve all use cases
at once. Our goal in the design of this was not "one git to rule them
all". It was more "how can we use git to solve our immediate problem?"
My first effort[1] did not even have an automatically imported git tree.
I did it manually from source packages.

# A technical summary of what our importer does

Our imports appear in regular git repositories (not special remotes);
one per source package.

The importer (and only the importer) maintains one branch pointer per
distribution and series. These branch pointers fast forward, so users
can follow them easily.
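
As a rough illustration of what "fast forward" means for anyone following
these branches: a pointer update is a fast-forward when the old tip is an
ancestor of the new tip, so nobody tracking the branch ever has to rewind.
The sketch below checks that property on a toy commit graph. The commit
names and the dictionary representation are made up for the example; this
is not the importer's code.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links.

    parents: mapping of commit id -> list of parent commit ids
    """
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def can_fast_forward(parents, old_tip, new_tip):
    """A branch may move from old_tip to new_tip without rewriting history."""
    return old_tip is None or is_ancestor(parents, old_tip, new_tip)


# Toy history: a <- b <- c on the branch, and x is a sibling of b.
toy_parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
```

Here moving a branch from "b" to "c" is a fast-forward, while moving it
from "x" to "c" is not.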

When the importer runs for a given source package, it checks for any
newer source uploads, imports their trees and updates the
distribution+series branch pointers. These commits are equivalent to the
source packages with quilt patches popped and with no .pc directory.
This isn't enforced on other commits pushed by uploaders (see below) but
it wouldn't make sense to do so anyway.

An uploader can choose to push to the git repository before the importer
runs against a corresponding upload. The uploader should push just a tag
in the form "upload/<version>" to indicate to the importer that this is
a set of logical commits that can represent the upload (instead of
having to fall back on a wholesale import in one commit). The uploader
should not update any branch pointers. When the importer sees a new
published source version, it will find and use this tag if it passes
sanity checks. The updated distribution+series branch pointers will then
have the "upload/<version>" commit in their histories as appropriate.
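
The decision described above might be sketched like this. It assumes the
only sanity check is that the tagged commit's tree equals the tree of the
published source package. That is a simplification: the text here only
says "sanity checks", and further down that the push is trusted "if the
trees match". The function and parameter names are hypothetical, and any
escaping needed to turn a version string into a valid git tag name is
ignored.

```python
def choose_import_commit(version, imported_tree, tags, tree_of):
    """Pick the commit a series branch should advance to for one upload.

    version:       published source version, e.g. "1.0-1ubuntu1"
    imported_tree: tree id of the unpacked, quilt-popped source package
    tags:          mapping of tag name -> commit id
    tree_of:       mapping of commit id -> tree id

    Returns (commit id or None, reason). None means the caller should
    fall back to a wholesale import in a single commit.
    """
    commit = tags.get("upload/" + version)
    if commit is None:
        return None, "no upload tag"
    if tree_of.get(commit) != imported_tree:
        # Assumed sanity check: the uploader's tree must match the upload.
        return None, "upload tag tree differs from published source"
    return commit, "using uploader's commits"
```

For example, with tags = {"upload/1.0-1ubuntu1": "c1"} and
tree_of = {"c1": "t1"}, an upload of 1.0-1ubuntu1 whose tree is "t1"
adopts commit "c1", while any other tree falls back to a plain import.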

You can run the importer locally. If you run it against some current
remote tree (perhaps in future the "official" tree), then you will end
up with the same new imports as some other importer run (perhaps in the
future the "official" importer) with identical commit hashes. This is
quite convenient during development, since you can propose a branch
against a future official branch that doesn't necessarily exist yet.
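
Identical hashes are achievable because a git commit id is just a SHA-1
over the commit object's contents: the tree id, the parent ids, the author
and committer lines (including timestamps), and the message. If an
importer derives all of those from the upload itself rather than from the
local user or clock, independent runs agree. How usdi actually chooses
those fields is not described here, so the identity, timestamp and message
in this sketch are placeholders.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Compute the id git would assign to a commit object with these fields."""
    lines = ["tree " + tree]
    lines += ["parent " + parent for parent in parents]
    lines += ["author " + author, "committer " + committer, "", message]
    body = "\n".join(lines).encode("utf-8")
    # Git hashes "<type> <length>\0" followed by the object body.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Placeholder inputs; a real import would take these from the upload.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree
WHO = "Example Uploader <uploader@example.com> 1479226000 +0000"

run_a = commit_id(EMPTY_TREE, [], WHO, WHO, "Import 1.0-1\n")
run_b = commit_id(EMPTY_TREE, [], WHO, WHO, "Import 1.0-1\n")
print(run_a == run_b)
```

Changing any input, such as using the time of the importer run as the
committer timestamp, changes the hash, which is why the inputs have to be
a function of the upload alone.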

# Other parts of our workflow

We have a bunch of documentation-in-progress[2] and tooling[3] for
manipulating the tree in common ways that suit our use cases. I
argue below why I think our model works for our use cases, so in that
sense I guess that they're relevant. But you don't need to understand
what we do in order to understand what we've done.

# Differences from Debian

Some unique challenges I think Ubuntu has over Debian:

1. Like Debian, we have NMUs (effectively). Unlike Debian, these happen
all the time. Unlike Debian, we can't choose to switch a package
maintainership over to a particular git model. In addition to non-git
uploads from other Ubuntu developers, we also have to deal with non-git
uploads from Debian, which is the primary source of commits and uploads
for the package. Whatever model we choose, we have to be able to deal
with the fact that the primary input from the system will not contain
nicely separated logical commits.

2. Being derived from Debian, we get far more benefit from git if git
understands our inheritance model. 1.0-1ubuntu1 in Ubuntu should have
1.0-1 from Debian as a parent in our git model. We may be able to get
away without this, since "Ubuntu merges" are primarily a rebase-based
workflow for us. However, getting the inheritance graph correct allows
us to automate more and more of this. Right now, we have the rebase
graft points identified for us automatically. In the future, we might
even be able to create a git-based "merge-o-matic" which could do
"Ubuntu merges" automatically where there are no conflicts.

3. As an Ubuntu development team, we need a system that presents a
unified view of all the packages we look after for both Ubuntu and
Debian. I don't want new starters on our team to have to be trained on
five different VCS mechanisms with five different decision trees on "how
to do an Ubuntu merge". Whether the Debian maintainers for a particular
package are using gbp, dgit, git-dpm, on source package version 1.0, 3.0
or using quilt, cdbs, or dpatch, we need to be able to cope with it.
Ideally our workflow will be the same for everything. Some of this is
unavoidable. If adding a patch, then the developer will have to
understand the patch system in use. But it's no good to us to have to
say "sorry, you can't use our git trees then".

One might call point 3 "universal applicability". I've heard people say
about our system "if it doesn't work with dgit it's no use to me". This
may be true, but note that the opposite is not. Our system works whether
an upstream maintainer uses dgit or not. Some history (individual
commits over what was uploaded) may be lost, but theoretically I think
even that would be automatically recoverable should we spend the time on
writing that feature. It doesn't need any infrastructure support, or
need us to push to a particular place, or need an "official tree";
developers can use the system regardless of any of these factors.
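
To make point 2 above concrete, here is a minimal sketch of how the
Debian parent of an Ubuntu version could be guessed, and how an "Ubuntu
merge" then maps onto a rebase. It works from the version string alone,
assuming the usual "ubuntuN" suffix convention; a real importer would more
plausibly consult debian/changelog or the imported history. The function
names are hypothetical.

```python
import re

# Assumed convention: an Ubuntu delta is marked by an "ubuntuN" (or
# "ubuntuN.M") suffix appended to the Debian version it is based on.
_UBUNTU_SUFFIX = re.compile(r"^(?P<debian>.+?)ubuntu\d+(?:\.\d+)*$")


def debian_parent_version(ubuntu_version):
    """Return the Debian version an Ubuntu version is based on, or None."""
    match = _UBUNTU_SUFFIX.match(ubuntu_version)
    return match.group("debian") if match else None


def ubuntu_merge_plan(old_ubuntu, new_debian):
    """Describe an "Ubuntu merge" as a rebase of the Ubuntu delta.

    The result corresponds to: git rebase --onto <onto> <old_base> <tip>
    """
    old_base = debian_parent_version(old_ubuntu)
    if old_base is None:
        raise ValueError("not an Ubuntu-delta version: " + old_ubuntu)
    return {"onto": new_debian, "old_base": old_base, "tip": old_ubuntu}
```

For instance, ubuntu_merge_plan("1.0-1ubuntu1", "1.1-1") identifies 1.0-1
as the graft point, so only the commits between 1.0-1 and 1.0-1ubuntu1 are
replayed on top of 1.1-1.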

# quilt patches popped

For our "Ubuntu merge" use case, I think it makes sense to store trees in
git with quilt patches popped, rather than in any kind of quilt patches
pushed model.
This is because:

0. Before I begin, I realise that most of my arguments here can be
mitigated with additional client-side tooling since all of the data
required is present. However we developed this process incrementally. We
increased our productivity from day one, even without an importer. I
think it's a useful property of any data model that specialised tooling
isn't required in order to make use of it.

1. Removing .pc breaks quilt. Going with my incremental theme again, our
import format does not break this; users do not have to learn any
additional tools. dgit punts on this with "If you want to manipulate the
patch stack you probably want to be looking at tools like git-dpm". But
git-dpm seems to assume a model where the maintainer team collectively
decides to maintain a package in VCS with it. See point 1 under
"Differences from Debian" above. I have yet to see anyone using a
git-dpm model in Ubuntu where the Debian maintainer team is separate and
not using git-dpm in Debian.

2. Using "git diff" against two tags will present quilt patches added or
removed at least twice (three times if .pc wasn't removed). This makes
for a bunch of unnecessary diff noise.

3. Cherry-picking commits that add or drop quilt patches becomes
difficult and makes a mess of the .pc directory, again breaking quilt.
And how do we deal with conflicts? Admittedly the same might be said
about a popped format. Conflicts would then get deferred until the
developer tries to "quilt push" to make sure that all patches still
apply, or get an upload rejected because they don't. I find it easier to
deal with them at the end, however. In Ubuntu, I typically deal with
juggling the logical changes first. When done, I can fix up the quilt
patches as needed using traditional methods. I find this easier than
having to deal with quilt patch conflicts at the same time as juggling
the high level items during a rebase (a very common task in my team's
workflow).

4. IME, in Ubuntu it is rare to have to deal with anything but trivial
changes to quilt patches. Mostly quilt involvement is in adding or
dropping entire quilt patches. At least, this is the case for most of
the server team merges, which were our primary use case. I've always
found it easier to deal with quilt patches as a whole *using quilt*, even
when working inside git. See points 2 and 3.

5. "dgit quilt-fixup" becomes unnecessary. I think its existence is a
symptom of dgit's model being suboptimal. With our quilt popped model,
what the developer sees is identical to the developer's mental model of
a source package that isn't using a VCS, just like a regular software
development source tree in a VCS is identical to the developer's mental
model of that source tree without VCS.

6. No magical commits from dgit's client tooling. I don't want to see
extra noise as overhead interspersed with my own commits. Such commits
make rebasing a pain.

7. dgit's documentation says that "packing up a tree in `3.0 (quilt)'
and then unpacking it does not always yield the same tree". This cannot
happen if trees are always stored with quilt popped. AIUI, dgit requires
fixups to make this the case before a push, so perhaps this does not
impact the choice of data model. However, I think it further illustrates
that the data model itself is wrong, because it shows the model to be
denormalised. There are two forms of normalisation happening
here: a) what "dgit quilt-fixup" does; b) that quilt patches are stored
pushed. What I'm saying is that the necessity of a) shows that b) is not
really a normal form.
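
The diff noise argument in point 2 can be illustrated with a toy model.
Trees are represented as plain path-to-content dictionaries, and the
package, file names and contents are invented for the example. The sketch
lists which paths differ when one upload adds a single quilt patch, under
each of the three storage forms discussed.

```python
def changed_paths(old_tree, new_tree):
    """Paths whose content differs between two trees (path -> content)."""
    return sorted(
        path
        for path in old_tree.keys() | new_tree.keys()
        if old_tree.get(path) != new_tree.get(path)
    )


# Hypothetical package with one upstream file and no patches yet.
before = {"src/main.c": "original", "debian/patches/series": ""}

# The same upload, adding fix.patch, in three storage forms.
popped = {
    **before,
    "debian/patches/series": "fix.patch\n",
    "debian/patches/fix.patch": "<diff against src/main.c>",
}
pushed = {**popped, "src/main.c": "patched"}
pushed_with_pc = {
    **pushed,
    ".pc/applied-patches": "fix.patch\n",
    ".pc/fix.patch/src/main.c": "original",
}

for name, tree in [("popped", popped), ("pushed", pushed),
                   ("pushed + .pc", pushed_with_pc)]:
    print(name, changed_paths(before, tree))
```

In the popped form the change shows up once, as the new patch file plus
its series entry. In the pushed form the patched upstream file appears as
well, and keeping .pc adds quilt's backup copy and bookkeeping on top.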

Given the above, I don't think it makes sense, for our use cases at
least, to store trees hacked to push quilt patches and delete .pc. On
the contrary I think it makes more sense to store the canonical form of
the tree (at least, I argue that the canonical form of the tree is the
source package unpacked but without interpretation). It is always
possible to add client-side tooling if you'd prefer to see that tree in
some other form. And since the quilt-popped form is normal and git
hashes are reproducible, this can be done consistently on the client
side.

I understand that there are two sides to this. Essentially what I'm
saying is that in our team's use cases, the downsides to doing it our
way never come up, whereas the downsides to doing it your way always
come up.

I also realise that this canonical form may make things tricky when you
want to store additional information and not lose the commit hashes
associated with these. I don't think this is unsurmountable though. Our
importer does this by picking up and trusting the uploader's push if the
trees match. The necessary state is kept in the tree maintained by
the importer (but pushed to by developers).

# dgit vs. usdi: where next?

On Tue, Nov 15, 2016 at 01:14:01AM +0000, Colin Watson wrote:
> not to mention the duplicated effort

I don't think we duplicated effort. Every step we took made us
incrementally more efficient. The key things that we have spent time on
do not appear to be things that dgit currently does. Perhaps we're
trying to solve a different problem.

It may appear that we could have spent the same amount of effort on dgit
instead, fulfilled our use case inside dgit, and given you one thing to
implement inside Launchpad that does everything. But I don't think this
is true. Most of the problems we solved are not things that dgit
currently addresses. We would have faced additional overhead trying to
resolve the impedance mismatch between dgit's model and the obvious
model for our use case.

One thing I've wondered, and I think is worth exploring, is if our model
could be used to present a dgit special remote, or vice versa. Both
formats are in some normal form, so I think this should theoretically be
possible.

[1] https://lists.ubuntu.com/archives/ubuntu-devel/2014-August/038418.html
[2] https://wiki.ubuntu.com/UbuntuDevelopment/Merging/GitWorkflow
[3] https://code.launchpad.net/~usd-import-team/usd-importer/+git/usd-importer
    [now includes the client tools as well as the importer]
[4] http://summit.ubuntu.com/uos-1611/meeting/22710/git-based-merge-workflow/
