usd-import beta2!

Nish Aravamudan Thu, 02 Jun 2016 18:32:43 -0700

[Adapted from an e-mail I sent earlier today to explain the latest
version of the tooling, which felt like something I should broadcast
more publicly.]


Couple of quick links first:

The importer and helper tools live at:
https://code.launchpad.net/~usd-import-team/usd-importer/+git/usd-importer

git-dsc-commit, git-reconstruct-changelog and git-merge-changelogs (as
well as xgit) are at:
https://github.com/basak/ubuntu-git-tools
[I sent a PR today to pull in the --tree-only mode for git-dsc-commit
which is needed for the importer]

The imported trees are temporarily at:
https://code.launchpad.net/~usd-import-team/+git

I say temporarily because once we feel confident in the utility and
correctness of the importer, we'll move them to
https://code.launchpad.net/~ubuntu-server-dev/+git

On to some details:

The first problem we were trying to solve was how to make merges
consistent across the server team and, ideally, easy to review for
sponsors/uploaders. Robie Basak and others came up with a git-based
workflow (documented roughly at the above github URL, and I plan on
making a wiki.ubuntu.com entry tomorrow for the same with details and
examples, including much of this e-mail). This workflow uses git (the
only additions to 'stock' git are a few commands (git-dsc-commit,
git-merge-changelogs, git-reconstruct-changelog) from
git://github.com/basak/ubuntu-git-tools.git) to effectively rebase the
ubuntu tip onto the latest debian tip. That presumed you spent the time
yourself to create a repository with commits representing those tips.
The goal for the importer is to have a canonical (little c) location
for such commits to live, per-package.

Scott Moser then asked if we could import the entire history for a
package, because he wanted to `git-blame` a given file and get useful
data. So we extended the algorithm (that Robie designed) to be more
flexible. After hitting many corner-cases during implementation, we
scrapped our complicated algorithm that produced clean trees for a clean
algorithm that produces complicated trees.

In essence, the algorithm looks at Launchpad's publishing history for
versions it hasn't seen before (which will be all of them if you
specify an empty or no local repository and aren't cloning an existing
one). For each such version, it uses
`pull-lp-source`/`pull-debian-source` equivalents and `git-dsc-commit`
to import them into the git repository. Technically it uses
lower-level commands than a proper commit (`git write-tree` and `git
commit-tree`), so that we can get the imported tree, examine it, find
its parents and then commit it. The parents for an imported tree are at
most 2:
    1) The last version imported into the same series/pocket, with some
    knowledge of how to establish a new series/pocket.
    2) The last debian/changelog entry (using debian/changelog from the
    just-imported tree) that was successfully imported.
We call these the 'publishing parent' and 'changelog parent'
respectively. Now, if we can't find either of these, we will orphan the
import, and if we only find one, we'll use what we have. But the
resulting tree looks like many git-merges, even where there are no
ubuntu-merges. This is correct by the algorithm but can confuse the
original `git-rebase` workflow, because rebase will try to replay the
git-merges and that doesn't work.
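
To make the parent selection above concrete, here is a minimal sketch
in Python. It is not the importer's code: the function names, the
dictionary arguments and the series/pocket key format are all
hypothetical, it skips the "how to establish a new series/pocket"
logic entirely, and collapsing two identical parents into one is my
own assumption. Only the `git commit-tree <tree> -p <parent> -m <msg>`
invocation is standard git.

```python
def choose_parents(series_pocket, changelog_versions, last_in_pocket, imported):
    """Return the parent commits (at most 2) for a newly imported tree.

    All arguments are hypothetical shapes chosen for this sketch:
      series_pocket:      e.g. "xenial-updates"
      changelog_versions: earlier versions listed in the imported tree's
                          debian/changelog, newest first
      last_in_pocket:     dict of series/pocket -> last imported commit
      imported:           dict of version -> commit, for imported versions
    """
    parents = []

    # Publishing parent: the last version imported into the same
    # series/pocket (new series/pocket handling is omitted here).
    publishing = last_in_pocket.get(series_pocket)
    if publishing is not None:
        parents.append(publishing)

    # Changelog parent: the newest changelog entry that was imported.
    for version in changelog_versions:
        commit = imported.get(version)
        if commit is not None:
            # Assumption: do not list the same commit twice.
            if commit not in parents:
                parents.append(commit)
            break

    # An empty list means the import is orphaned.
    return parents


def commit_tree_argv(tree, parents, message):
    """Build a `git commit-tree` command line for an already-written tree."""
    argv = ["git", "commit-tree", tree]
    for parent in parents:
        argv += ["-p", parent]
    argv += ["-m", message]
    return argv
```

With both parents found, the result is a two-parent commit, which is
why the imported history looks like a series of git-merges.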

So we have created a simple helper script (also in the same repository
as the importer) entitled (for now) 'usd-import-reconstruct-merge',
which takes a usd-import'd tree and a commitish and attempts to give
you a reconstructed sequence of linear commits that represent the same
state as the commitish. It does this by using merge-base to figure out
the common ancestor (it assumes onto is debian/sid, but a different
onto can be passed as a parameter) and then playing back the
debian/changelog looking for imported tags. It then cherry-picks those
tags from oldest to newest against the merge-base and does a quick
sanity check that the resulting commit does not differ from the
original one.
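
The changelog playback can be sketched roughly as follows, again in
Python and again only as an illustration under assumptions: the
function names are hypothetical, `imported_tags` stands in for
whatever set of imported version tags the repository holds, and
`base_version` (the version found at the merge-base) is my own way of
deciding where to stop. The real script may well do this differently.

```python
import re

# Header line of a debian/changelog entry:
#   package (version) distribution; urgency=...
HEADER = re.compile(r"^\S+ \(([^)]+)\) ")


def changelog_versions(changelog_text):
    """Return the versions in debian/changelog text, newest first."""
    versions = []
    for line in changelog_text.splitlines():
        match = HEADER.match(line)
        if match:
            versions.append(match.group(1))
    return versions


def cherry_pick_order(changelog_text, imported_tags, base_version):
    """Return versions to cherry-pick onto the merge-base, oldest first.

    Walks the changelog from newest to oldest, keeps versions that have
    an imported tag, and stops at the merge-base's version.
    """
    picks = []
    for version in changelog_versions(changelog_text):
        if version == base_version:
            break
        if version in imported_tags:
            picks.append(version)
    picks.reverse()
    return picks
```

Each version in the returned list would then be cherry-picked in
order, and the final tree compared against the original commitish.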

It tags that as reconstruct/<version>, which sort of conflicts with the
git-based workflow some have been using. In that workflow,
reconstruct/<version> is the broken-down sequence of changes including
changelog and metadata, where each commit is a single change from the
changelog. I have taken to now calling that pointer
deconstruct/<version> because it's clever.

So you would break down reconstruct/<version> into its constituent
changes (per debian/changelog for each version) and tag the resulting
end-commit as deconstruct/<version>.

That can then be broken up by our workflow into a logical/<version> tag
and then rebased onto debian/sid (or the specified onto).

Some last thoughts:

a) SRU/maintenance might become easier because you can cherry-pick
commits around to the right head, build the source package
(dpkg-buildpackage -S) from the tree directly, etc.

b) The resulting gitk graph really demonstrates what Ubuntu is, for a
specific package. And it clearly indicates when a package has not been
merged recently and when it has.

Feel free to respond with any questions or feedback!
 
-- 
Nishanth Aravamudan
Ubuntu Server
Canonical Ltd

-- 
ubuntu-server mailing list
ubuntu-server@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server
More info: https://wiki.ubuntu.com/ServerTeam
