[Adapted from an e-mail I sent earlier today to explain the latest version of the tooling, which felt like something I should broadcast more publicly.]
Couple of quick links first:

The importer and helper tools live at:
https://code.launchpad.net/~usd-import-team/usd-importer/+git/usd-importer

git-dsc-commit, git-reconstruct-changelog and git-merge-changelogs (as well as xgit) are at:
https://github.com/basak/ubuntu-git-tools
[I sent a PR today to pull in the --tree-only mode for git-dsc-commit, which is needed for the importer.]

The imported trees temporarily live at:
https://code.launchpad.net/~usd-import-team/+git

I say temporarily because once we feel confident in the utility and correctness of the importer, we'll move them to https://code.launchpad.net/~ubuntu-server-dev/+git

On to some details:

The first problem we were trying to solve was how to make merges consistent across the server team and, ideally, easy to review for sponsors/uploaders. Robie Basak and others came up with a git-based workflow. It is documented roughly at the above GitHub URL, and I plan to write a wiki.ubuntu.com entry for it tomorrow with details and examples, including much of this e-mail. The workflow uses git to effectively rebase the Ubuntu tip onto the latest Debian tip. The only additions to 'stock' git are a few commands (git-dsc-commit, git-merge-changelogs, git-reconstruct-changelog) from git://github.com/basak/ubuntu-git-tools.git. That presumed you had already spent the time to create, yourself, a repository with commits representing those tips. The goal of the importer is to provide a canonical (little c) location, per package, for such commits to live.

Scott Moser then asked if we could import the entire history of a package, because he wanted to `git-blame` a given file and get useful data. So we extended the algorithm (that Robie designed) to be more flexible. After hitting many corner cases during implementation, we scrapped our complicated algorithm that produced clean trees for a clean algorithm that produces complicated trees.
In essence, the algorithm looks at Launchpad's publishing history for versions it hasn't seen before. If no local repository (or an empty one) is specified and you aren't cloning an existing repository, that will be all of them. For each such version, it uses `pull-lp-source`/`pull-debian-source` equivalents and `git-dsc-commit` to import it into the git repository. Technically, it uses lower-level commands than a proper commit (`git write-tree` and `git commit-tree`). That way we can get the imported tree, examine it, find its parents and only then commit it.

An imported tree has at most 2 parents:

1) The last version imported into the same series/pocket, with some knowledge of how to establish a new series/pocket.
2) The last debian/changelog entry (using debian/changelog from the just-imported tree) that was successfully imported.

We call these the 'publishing parent' and the 'changelog parent', respectively. If we can't find either of these, we orphan the import (commit it with no parents). If we find only one, we use what we have.

But the resulting tree looks like it contains many git merges, even where there were no Ubuntu merges. This is correct per the algorithm. However, it can confuse the original `git-rebase` workflow, because rebase will try to replay the git merges, and that doesn't work.

So we have created a simple helper script (in the same repository as the importer), called (for now) 'usd-import-reconstruct-merge'. It takes a usd-import'd tree and a commitish and attempts to give you a reconstructed sequence of linear commits that represents the same state as the commitish. It does this in three steps:

- It uses merge-base to find the common ancestor with the onto branch. It assumes onto is debian/sid, but also accepts a parameter.
- It walks back through debian/changelog looking for imported tags.
- It cherry-picks those tags, from oldest to newest, onto the merge-base and does a quick sanity check that the resulting commit does not differ from the original one.
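To make the two parent rules concrete, here is a minimal Python sketch of the selection step. It rests on some assumptions:

- Versions are plain strings.
- Already-imported versions are tracked in a dict from version to commit id.
- The caller already knows the last commit imported into the relevant series/pocket. How a new series/pocket is established is not modelled.

`select_parents` and `commit_tree_argv` are hypothetical names, not the usd-importer's actual functions. The second one only builds the `git commit-tree` command line that would record the tree produced by `git write-tree` with the chosen parents.

```python
"""Hypothetical sketch of the importer's parent selection described above.
Function names and data shapes are illustrative assumptions, not the
usd-importer's actual code."""
from typing import Dict, List, Optional


def select_parents(
    changelog_versions: List[str],
    imported: Dict[str, str],
    last_in_series_pocket: Optional[str],
) -> List[str]:
    """Return the (at most 2) parent commits for a newly imported tree.

    changelog_versions: versions from the just-imported tree's
        debian/changelog, newest first; entry 0 is the version being imported.
    imported: maps already-imported versions to their commit ids.
    last_in_series_pocket: commit id of the last version imported into the
        same series/pocket, or None (new series/pocket not modelled here).
    """
    # 1) Publishing parent: the last import in the same series/pocket.
    publishing_parent = last_in_series_pocket

    # 2) Changelog parent: the most recent *earlier* changelog entry that
    #    was successfully imported.
    changelog_parent = None
    for version in changelog_versions[1:]:
        if version in imported:
            changelog_parent = imported[version]
            break

    # Use whatever we found; an empty list means the import is an orphan.
    # Drop the duplicate if both rules point at the same commit.
    parents: List[str] = []
    for parent in (publishing_parent, changelog_parent):
        if parent is not None and parent not in parents:
            parents.append(parent)
    return parents


def commit_tree_argv(tree: str, parents: List[str], message: str) -> List[str]:
    """Build a `git commit-tree` command for a tree from `git write-tree`."""
    argv = ["git", "commit-tree", tree]
    for parent in parents:
        argv += ["-p", parent]
    return argv + ["-m", message]


if __name__ == "__main__":
    imported = {"1.0-1": "aaa111", "1.0-1ubuntu1": "bbb222"}
    parents = select_parents(
        ["1.0-1ubuntu2", "1.0-1ubuntu1", "1.0-1"],
        imported,
        last_in_series_pocket="ccc333",
    )
    print(parents)  # ['ccc333', 'bbb222']
    print(commit_tree_argv("tree456", parents, "Import version 1.0-1ubuntu2"))
```

Running `git commit-tree` with no `-p` options creates a root commit, which corresponds to the orphaned-import case above.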
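The replay ordering used by usd-import-reconstruct-merge could be sketched as follows. This is a hypothetical illustration, not the script's code. It makes these assumptions:

- The caller has already found the version at the merge-base, for example by running `git merge-base <commitish> debian/sid` and reading that tree's debian/changelog.
- The caller passes the import tags as a dict from version to tag name.
- The `import/<version>` tag names in the example are made up for illustration.

```python
"""Hypothetical sketch of how usd-import-reconstruct-merge could choose which
import tags to replay. Names, data shapes and tag names are assumptions made
for illustration; this is not the script's actual code."""
from typing import Dict, List


def tags_to_replay(
    changelog_versions: List[str],
    merge_base_version: str,
    import_tags: Dict[str, str],
) -> List[str]:
    """Return the import tags to cherry-pick onto the merge-base, oldest first.

    changelog_versions: versions from debian/changelog at the commitish,
        newest first.
    merge_base_version: the version at the merge-base with the onto branch
        (debian/sid by default, per the description above).
    import_tags: maps each imported version to its tag name.
    """
    replay: List[str] = []
    for version in changelog_versions:
        if version == merge_base_version:
            # Everything from here back is already part of the merge-base.
            break
        tag = import_tags.get(version)
        if tag is not None:
            # Versions that were never imported have no tag and are skipped.
            replay.append(tag)
    # The changelog lists newest first; cherry-picks go oldest to newest.
    replay.reverse()
    return replay


if __name__ == "__main__":
    changelog = ["1.0-2ubuntu2", "1.0-2ubuntu1", "1.0-2", "1.0-1ubuntu1", "1.0-1"]
    tags = {v: "import/" + v for v in changelog}  # made-up tag names
    print(tags_to_replay(changelog, "1.0-2", tags))
    # ['import/1.0-2ubuntu1', 'import/1.0-2ubuntu2']
```

Each returned tag would then be cherry-picked onto the merge-base in order. One way to run the final sanity check is `git diff --quiet <commitish> HEAD`, which exits with a non-zero status when the two trees differ.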
It tags the result as reconstruct/<version>, which somewhat conflicts with the git-based workflow some of us have been using. In that workflow, reconstruct/<version> is the broken-down sequence of changes, including changelog and metadata, where each commit is a single change from the changelog. I have now taken to calling that pointer deconstruct/<version>, because it's clever. So you would break down reconstruct/<version> into its constituent changes (per debian/changelog for each version) and tag the resulting end commit as deconstruct/<version>. Our workflow can then break that up into a logical/<version> tag, which is then rebased onto debian/sid (or the specified onto branch).

Some last thoughts:

a) SRU/maintenance might become easier, because you can cherry-pick commits around to the right head, build the source package (dpkg-buildpackage -S) directly from the tree, etc.
b) The resulting gitk graph really demonstrates what Ubuntu is, for a specific package. It also clearly indicates whether or not a package has been merged recently.

Feel free to respond with any questions or feedback!

-- 
Nishanth Aravamudan
Ubuntu Server
Canonical Ltd