Reproducible Builds for recent Debian security updates
Philipp Kern asked about trying to do reproducible builds checks for recent security updates, to try to gain confidence about Debian's buildd infrastructure, given that they run builds in sid chroots which may have used or built or run a vulnerable xz-utils...

So far, I have not found any reproducibility issues; everything I tested I was able to get to build bit-for-bit identical with what is in the Debian archive.

I only tested bookworm security updates (not bullseye), and I tested the xz-utils update now present in unstable, which took a little trial and error to find the right snapshot! The build dependencies for Debian bookworm (a.k.a. stable) were *much* easier to satisfy, as it is not a moving target!

Debian bookworm security updates verified:

  cacti iwd libuv1 pdns-recursor samba composer fontforge
  knot-resolver php-dompdf-svg-lib squid yard

Not yet finished building:

  openvswitch

Did not yet try some time and disk-intensive builds:

  chromium firefox-esr thunderbird

Debian unstable updates verified:

  xz-utils

A tarball of build logs (including some failed builds) and .buildinfo files is available at:

  https://people.debian.org/~vagrant/debian-security-rebuilds.tar.zst

Some caveats:

* Notably, xz-utils has a build dependency that pulls in xz-utils, and the version used may have been a vulnerable version (partly vulnerable?), 5.6.0-0.2.

* The machine where I ran the builds had done some builds using packages from sid over the last couple of months, so it may have at some point run the vulnerable xz-utils code; this is not the absolute cleanest of checks... but it is at least some sort of data point.

* The build environment used tarballs that had usrmerge applied (as it is harder not to apply usrmerge these days), while the buildd infrastructure chroots do not have usrmerge applied. This did not appear to cause significant problems, although it pulled in a few more perl dependencies!

I used sbuild with the --chroot-mode=unshare mode.
For the xz-utils build I used some of the ideas developed in an earlier verification builds experiment:

  https://salsa.debian.org/reproducible-builds/debian-verification-build-experiment/-/blob/e003ddf19de13db2d512c25417e4bec863c3a082/sbuild-wrap#L71

It was great to try to apply Reproducible Builds to real-world uses!

live well,
  vagrant
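The "bit-for-bit identical" check above boils down to hashing the rebuilt artifacts and comparing them against the Checksums-Sha256 stanza recorded in a .buildinfo file. A minimal sketch of that comparison, not the actual tooling used for these rebuilds (the stanza in the demo is an invented miniature, not a real Debian .buildinfo):

```python
# Hedged sketch: verify rebuilt artifacts against the Checksums-Sha256
# stanza of a .buildinfo file. Not the tooling used in the post above.
import hashlib
import pathlib
import tempfile

def parse_sha256_stanza(buildinfo: str) -> dict[str, str]:
    """Map filename -> sha256 digest from a Checksums-Sha256 field."""
    hashes, active = {}, False
    for line in buildinfo.splitlines():
        if line.startswith("Checksums-Sha256:"):
            active = True
        elif active and line.startswith(" "):
            digest, _size, name = line.split()
            hashes[name] = digest
        elif active:
            break  # a non-indented line starts the next field
    return hashes

def rebuild_matches(buildinfo: str, directory: pathlib.Path) -> bool:
    """True if every artifact listed in the stanza hashes identically."""
    return all(
        hashlib.sha256((directory / name).read_bytes()).hexdigest() == digest
        for name, digest in parse_sha256_stanza(buildinfo).items()
    )

# Demo with an invented one-package stanza:
with tempfile.TemporaryDirectory() as tmp:
    deb = pathlib.Path(tmp, "hello_1.0_amd64.deb")
    deb.write_bytes(b"not a real .deb")
    stanza = (
        "Format: 1.0\n"
        "Checksums-Sha256:\n"
        f" {hashlib.sha256(deb.read_bytes()).hexdigest()} 15 {deb.name}\n"
        "Build-Origin: debian\n"
    )
    print(rebuild_matches(stanza, pathlib.Path(tmp)))  # True
```

In practice tools like debrebuild or simple sha256sum comparisons against the archive do this job; the sketch only shows the shape of the check.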
Re: Arch Linux minimal container userland 100% reproducible - now what?
John Gilmore:
> kpcyrd wrote:
>> 1) There's currently no way to tell if a package can be built offline
>> (without trying yourself).
>
> Packages that can't be built offline are not reproducible, by
> definition. They depend on outside events and circumstances
> in order for a third party to reproduce them successfully.
>
> So, fixing that in each package would be a prerequisite to making a
> reproducible Arch distro (in my opinion).

I don't agree. For example, the r-b.o [1] definition doesn't mandate who needs to archive what. We probably can agree that we mean a "verifiable path from source to binary code" (and not just repeatability, which is also sometimes meant by reproducible builds in other contexts), but beyond that the details and motivations will be different depending on who you ask.

To be clear, I don't say what you would like to see is not worthwhile. Actually, I'm very sympathetic to such archiving goals. But if Arch Linux, as kpcyrd's mails suggest, right now just wants to verify their builder output soon-ish after upload, that's fine too and can be called reproducible, in my opinion.

[1]: https://reproducible-builds.org/docs/definition/

> I don't understand why a "source tree" would store a checksum of a
> source tarball or source file, rather than storing the actual source
> tarball or source file. You can't compile a checksum.

How distros store their source code is different, due to different needs, historic circumstances, etc. And the approach of just having the packaging definition and patches and then referring to the "original" source is common, and I certainly see the advantages.

> kpcyrd wrote:
>> Specifically Gentoo and OpenBSD Ports have solutions for this that I
>> really like, they store a generated list of URLs along with a
>> cryptographic checksum in a separate file, which includes crates
>> referenced in e.g. a project's Cargo.lock.
>
> I don't know what a crate or a Cargo.lock is,

It's Cargo's (Rust's package/dependency manager) way to pin specific dependencies, including hashes of those.

> but rather than fix the problem at its source (include the source
> files), you propose to add another complex circumvention alongside the
> existing package building infrastructure? What is the advantage of
> that over merely doing the "cargo fetch" early rather than late and
> putting all the resulting source files into the Arch source package?

I'm not an Arch developer, but probably because a package source repo like [2] is much easier for them to handle than if they were to commit the source of all (transitive) dependencies [3].

[2]: https://gitlab.archlinux.org/archlinux/packaging/packages/rage-encryption
[3]: https://github.com/str4d/rage/blob/v0.10.0/Cargo.lock

(Note that Arch made the decision, currently rather unusual for a classic Linux distro, to build Rust programs with the exact dependencies upstream has defined and not to separately package those libraries.)

>> 3) All of this doesn't take BUILDINFO files into account
>
> The BUILDINFO files are part of the source distribution needed
> to reproduce the binary distribution. So they would go on the
> source ISO image.
>
>> I did some digging and downloaded the buildinfo files for each package
>> that is present in the archlinux-2024.03.01 iso
>
> Thank you for doing that digging!
>
>> Using plenty of different gcc versions looks
>> annoying, but is only an issue for bootstrapping, not for reproducible
>> builds (as long as everything is fully documented).
>
> I agree that it's annoying. It compounds the complexity of reproducing
> the build. Does Arch get some benefit from doing so?
>
> Ideally, a binary release ISO would be built with a single set of
> compiler tools. Why is Arch using a dozen compiler versions? Just to
> avoid rebuilding binary packages once the binary release's engineers
> decide what compiler is going to be this release's gold-standard
> compiler? (E.g. the one that gets installed when the user runs pacman
> to install gcc.) Or do the release-engineers never actually standardize
> on a compiler -- perhaps new ones get thrown onto some server whenever
> someone likes, and suddenly all the users who install a compiler just
> start using that one?

If you look at classic Linux distros, it's the norm to iteratively add packages to your repo and build new packages with what is in the (development) repo at that time. So a single snapshot of a repo will in nearly all cases not contain all the versions needed to reproduce the packages in that snapshot. You will find this in Arch, Debian, Fedora, ... Some other distros, like Yocto, might make different decisions, but those are rather the exception.

> It currently seems that there is no guarantee that on day X, if you
> install gcc on Arch (from the Internet) and on the same day you pull in
> the source code of pacman package Y, that it will even build with the
> Day X version of gcc. Is that true?

As described above for the
Re: Arch Linux minimal container userland 100% reproducible - now what?
kpcyrd wrote:
> 1) There's currently no way to tell if a package can be built offline
> (without trying yourself).

Packages that can't be built offline are not reproducible, by definition. They depend on outside events and circumstances in order for a third party to reproduce them successfully.

So, fixing that in each package would be a prerequisite to making a reproducible Arch distro (in my opinion).

I don't understand why a "source tree" would store a checksum of a source tarball or source file, rather than storing the actual source tarball or source file. You can't compile a checksum.

kpcyrd wrote:
> Specifically Gentoo and OpenBSD Ports have solutions for this that I
> really like, they store a generated list of URLs along with a
> cryptographic checksum in a separate file, which includes crates
> referenced in e.g. a project's Cargo.lock.

I don't know what a crate or a Cargo.lock is, but rather than fix the problem at its source (include the source files), you propose to add another complex circumvention alongside the existing package building infrastructure? What is the advantage of that over merely doing the "cargo fetch" early rather than late and putting all the resulting source files into the Arch source package?

> 3) All of this doesn't take BUILDINFO files into account

The BUILDINFO files are part of the source distribution needed to reproduce the binary distribution. So they would go on the source ISO image.

> I did some digging and downloaded the buildinfo files for each package
> that is present in the archlinux-2024.03.01 iso

Thank you for doing that digging!

> Using plenty of different gcc versions looks
> annoying, but is only an issue for bootstrapping, not for reproducible
> builds (as long as everything is fully documented).

I agree that it's annoying. It compounds the complexity of reproducing the build. Does Arch get some benefit from doing so?

Ideally, a binary release ISO would be built with a single set of compiler tools. Why is Arch using a dozen compiler versions? Just to avoid rebuilding binary packages once the binary release's engineers decide what compiler is going to be this release's gold-standard compiler? (E.g. the one that gets installed when the user runs pacman to install gcc.) Or do the release-engineers never actually standardize on a compiler -- perhaps new ones get thrown onto some server whenever someone likes, and suddenly all the users who install a compiler just start using that one?

It currently seems that there is no guarantee that on day X, if you install gcc on Arch (from the Internet) and on the same day you pull in the source code of pacman package Y, that it will even build with the Day X version of gcc. Is that true?

	John
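For readers in John's position: a Cargo.lock is a generated file that pins every crate in a project's dependency graph, transitive dependencies included, to an exact version plus a checksum of its source archive, which Cargo verifies on fetch. An invented fragment in that file's format (the package entries are illustrative and the checksum value is elided):

```toml
# Illustrative Cargo.lock excerpt (entries invented for this example).
[[package]]
name = "rage"
version = "0.10.0"
dependencies = [
 "age",
]

[[package]]
name = "age"
version = "0.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "..."
```

This is what the Gentoo/OpenBSD-style URL-plus-checksum lists mentioned earlier are derived from, and why a pre-build `cargo fetch` can be verified even though it happens over the network.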
Re: Arch Linux minimal container userland 100% reproducible - now what?
On 3/29/24 6:48 AM, John Gilmore wrote:
> Bootstrappable builds are a different thing. Worthwhile, but not what I
> was asking for. I just wanted provable reproducibility from two ISO
> images and nothing more.
>
> I was asking that a bare amd64 be able to boot from an Arch Linux
> *binary* ISO image. And then be fed a matching Arch Linux *source* ISO
> image. And that the scripts in the source image would be able to
> reproduce the binary image from its source code, running the binaries
> (like the kernel, shell, and compiler) from the binary ISO image to do
> the rebuilds (without Internet access). This should be much simpler
> than doing a bootstrap from bare metal *without* a binary ISO image.

I think this project would still be somewhat involved:

1) There's currently no way to tell if a package can be built offline (without trying yourself). Some distros have `options=(!net)`-like settings, but pacman currently doesn't. Needing network access for things like `cargo fetch` or `go mod download` is considered acceptable in Arch Linux, since these extra inputs are pinned by cryptographic hash (the PKGBUILD acts as a merkle-tree root).

Specifically Gentoo and OpenBSD Ports have solutions for this that I really like: they store a generated list of URLs along with a cryptographic checksum in a separate file, which includes crates referenced in e.g. a project's Cargo.lock. When unpacking them to the right location, the build itself does not need any additional network resources and can run fully offline.

This concept currently does not exist in pacman; one would potentially need to generate 100+ lines into the source= array of a PKGBUILD (and another 200+ lines for checksums if 2 checksum algorithms are used). This is currently considered bad style, because the PKGBUILD is supposed to be short, simple and easy to read/understand/audit.

2) The official ISO is meant for installation and maintenance, but does not contain a compiler, and I'm not sure it should.
Many of the other base-devel packages are also missing, but since you also need the build dependencies of all the packages you're using (recursively?), this should likely be its own ISO (at which point you could also include the source code, however).

3) All of this doesn't take BUILDINFO files into account. You can use Arch Linux as a source-based distro, but if you want exact matches with the official packages you would need to match the compiler version that was used for each respective package.

I did some digging and downloaded the buildinfo files for each package that is present in the archlinux-2024.03.01 iso (using the archlinux-userland-fs-cmp tool) and in total these gcc versions have been used (gcc7 being part of the usb_modeswitch build environment, but I didn't bother investigating why):

gcc7-7.4.1+20181207-3-x86_64
gcc-9.2.0-4-x86_64
gcc-9.3.0-1-x86_64
gcc-10.1.0-1-x86_64
gcc-10.1.0-2-x86_64
gcc-10.2.0-3-x86_64
gcc-10.2.0-4-x86_64
gcc-10.2.0-6-x86_64
gcc-11.1.0-1-x86_64
gcc-11.2.0-4-x86_64
gcc-12.1.0-2-x86_64
gcc-12.2.0-1-x86_64
gcc-12.2.1-1-x86_64
gcc-12.2.1-2-x86_64
gcc-12.2.1-4-x86_64
gcc-13.1.1-1-x86_64
gcc-13.1.1-2-x86_64
gcc-13.2.1-3-x86_64
gcc-13.2.1-4-x86_64
gcc-13.2.1-5-x86_64

And these versions of the Rust compiler:

rust-1:1.74.0-1-x86_64
rust-1:1.75.0-2-x86_64
rust-1:1.76.0-1-x86_64

In total, the build environment of all packages consists of 3704 different (pkgname, pkgver) tuples.

If you disregard this, the packages you build with such an ISO wouldn't match the official packages, but 2 groups with the same ISO could likely produce matching binary packages (assuming they have a way to derive a deterministic SOURCE_DATE_EPOCH value from that ISO). From there on you'd "only" need to bootstrap a path to these binary seeds, but that's also why I pointed out this is more relevant to bootstrappable builds.
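The digging described above can be approximated mechanically: Arch's .BUILDINFO files record the build environment as `installed = pkgname-pkgver-pkgrel-arch` lines, so collecting distinct toolchain versions is a small parsing job. A hedged sketch assuming that line format (it does not reimplement archlinux-userland-fs-cmp, and the excerpts are invented):

```python
# Hedged sketch: collect distinct compiler versions from the
# `installed = ...` lines of Arch Linux .BUILDINFO files.
# Assumes entries look like "installed = gcc-13.2.1-5-x86_64".
import re
from typing import Iterable

def toolchain_versions(buildinfos: Iterable[str], package: str = "gcc") -> list[str]:
    """Return sorted distinct installed entries matching `package` (gcc, gcc7, ...)."""
    pattern = re.compile(
        r"^installed = (" + re.escape(package) + r"\d*-\S+)$", re.MULTILINE
    )
    found: set[str] = set()
    for text in buildinfos:
        found.update(pattern.findall(text))
    return sorted(found)

# Two invented .BUILDINFO excerpts:
a = "format = 2\ninstalled = gcc-13.2.1-5-x86_64\ninstalled = rust-1:1.76.0-1-x86_64\n"
b = "format = 2\ninstalled = gcc7-7.4.1+20181207-3-x86_64\ninstalled = gcc-13.2.1-5-x86_64\n"
print(toolchain_versions([a, b]))
# ['gcc-13.2.1-5-x86_64', 'gcc7-7.4.1+20181207-3-x86_64']
```

Running this over every package's .BUILDINFO and then over `package="rust"` would reproduce the two version lists above.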
Using plenty of different gcc versions looks annoying, but is only an issue for bootstrapping, not for reproducible builds (as long as everything is fully documented).

> If someday an Electromagnetic Pulse weapon destroys all the running
> computers, we'd like to bootstrap the whole industry up again, without
> breadboarding 8-bit micros and manually toggling in programs. Instead,
> a chip foundry can take these two ISOs and a bare laptop out of a
> locked fire-safe, reboot the (Arch Linux) world from them, and then use
> that Linux machine to control the chip-making and chip-testing machines
> that can make more high-function chips. (This would depend on the
> chip-makers keeping good offline fireproof backups of their own
> application software -- but even if they had that, they can't reboot
> and maintain the chip foundry without working source code for their
> controller's OS.)

I'm personally not interested in this scenario. I'm aware Allan McRae is looking for funding for pacman development. Maybe somebody could sponsor development of a "build without network" feature in pacman, or
The upstream xz repository and the xz tarballs have been backdoored
https://www.openwall.com/lists/oss-security/2024/03/29/4

Exciting times.
Re: Two questions about build-path reproducibility in Debian
Hi again,

On Mon, 11 Mar 2024 at 18:24, James Addison wrote:
>
> Hi folks,
>
> On Wed, 6 Mar 2024 at 01:04, James Addison wrote:
> > [ ... snip ... ]
> >
> > The Debian bug severity descriptions[1] provide some more nuance, and that
> > reassures me that wishlist should be appropriate for most of these bugs
> > (although I'll inspect their contents before making any changes).
>
> Please find below a draft of the message I'll send to each affected bugreport.
>
> Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_
> continue to test build-path variance, at least until we decide otherwise.
>
> --- BEGIN DRAFT ---
> Because Debian builds packages from a fixed build path, customized build paths
> are _not_ currently evaluated by the 'reprotest' utility in Salsa-CI, or
> during package builds on the Reproducible Builds team's package test
> infrastructure for Debian[1].
>
> This means that this package will pass current reproducibility tests; however
> we still believe that source code and/or build steps embed the build path into
> binary package output, making it more difficult than necessary for independent
> consumers to confirm whether their local compilations produce identical binary
> artifacts.
>
> As a result, this bugreport will remain open and be assigned the 'wishlist'
> severity[2].
>
> ...
>
> [1] - https://tests.reproducible-builds.org/debian/reproducible.html
> [2] - https://www.debian.org/Bugs/Developer#severities
> --- END DRAFT ---

Most of the remaining buildpath bugs have been updated to severity 'wishlist'. Approximately thirty are still set to other severity levels, and I plan to update those with the following adjusted messaging:

--- BEGIN DRAFT ---
Control: severity -1 wishlist

Dear Maintainer,

Currently, Debian's buildd and also the Reproducible Builds team's testing infrastructure[1] both use a fixed build path when building binary packages.

This means that your package will pass current reproducibility tests; however we believe that varying the build path still produces undesirable changes in the binary package output, making it more difficult than necessary for independent consumers to check the integrity of those packages by rebuilding them themselves.

As a result, this bugreport will remain open and be re-assigned the 'wishlist' severity[2].

You can use the 'reprotest' package build utility - either locally, or as provided in Debian's Salsa continuous integration pipelines - to assist in uncovering reproducibility failures due to build-path variance.

For more information about build paths and how they can affect reproducibility, please refer to: https://reproducible-builds.org/docs/build-path/

...

[1] - https://tests.reproducible-builds.org/debian/reproducible.html
[2] - https://www.debian.org/Bugs/Developer#severities
--- END DRAFT ---

Thanks for your feedback and suggestions,
James
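The build-path problem is easy to reproduce outside of Debian packaging. As a self-contained illustration (CPython bytecode rather than a package build): compiled .pyc files record the source path in the code object, so byte-identical sources compiled from different directories produce different artifacts unless the recorded path is pinned, much like building from a fixed path or applying a compiler path-prefix map.

```python
# Illustration (not Debian tooling): the build path leaks into CPython
# .pyc files via co_filename, so identical sources compiled in two
# directories differ unless the recorded path is normalized.
import pathlib
import py_compile
import tempfile

def compile_from(workdir: pathlib.Path, normalize: bool) -> bytes:
    """Compile the same one-line module inside `workdir`; return the .pyc bytes."""
    src = workdir / "mod.py"
    src.write_text("print('hello')\n")
    pyc = py_compile.compile(
        str(src),
        # CHECKED_HASH embeds a source hash instead of a timestamp,
        # so the recorded path is the only remaining source of variation.
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
        # dfile overrides the source path recorded in the bytecode,
        # analogous to pinning or remapping the build path.
        dfile="/build/mod.py" if normalize else None,
    )
    return pathlib.Path(pyc).read_bytes()

def demo(normalize: bool) -> bool:
    """Compile identical sources from two directories; True if outputs match."""
    with tempfile.TemporaryDirectory() as tmp:
        results = []
        for name in ("build-a", "build-b"):
            d = pathlib.Path(tmp, name)
            d.mkdir()
            results.append(compile_from(d, normalize))
        return results[0] == results[1]

print(demo(normalize=False))  # False: the build path is embedded
print(demo(normalize=True))   # True: recorded path pinned to /build/mod.py
```

The same pattern shows up in C builds as DW_AT_comp_dir in debug info, which is what options like gcc's -ffile-prefix-map and a fixed buildd build path work around.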
diffoscope 262 released
Hi,

The diffoscope maintainers are pleased to announce the release of version 262 of diffoscope.

diffoscope tries to get to the bottom of what makes files or directories different. It will recursively unpack archives of many kinds and transform various binary formats into more human-readable form to compare them. It can compare two tarballs, ISO images, or PDFs just as easily.

Version 262 includes the following changes:

[ Chris Lamb ]
* Factor out Python version checking in test_zip.py. (Re: #362)
* Also skip some zip tests under 3.10.14 as well; a potential regression
  may have been backported to the 3.10.x series. The underlying cause is
  still to be investigated. (Re: #362)

## Download

Version 262 is available from Debian unstable as well as PyPI, and will shortly be available on other platforms as well. More details can be found here:

  https://diffoscope.org/

… but source tarballs may be located here:

  https://diffoscope.org/archive/

The corresponding Docker image may be run via (for example):

  $ docker run --rm -t -w $(pwd) -v $(pwd):$(pwd):ro \
      registry.salsa.debian.org/reproducible-builds/diffoscope a b

## Contribute

diffoscope is developed within the "Reproducible builds" effort.

- Git repository
  https://salsa.debian.org/reproducible-builds/diffoscope
- Docker image, e.g. registry.salsa.debian.org/reproducible-builds/diffoscope
  https://salsa.debian.org/reproducible-builds/diffoscope
- Issues and feature requests
  https://salsa.debian.org/reproducible-builds/diffoscope/issues
- Contribution instructions (e.g. to file an issue)
  https://reproducible-builds.org/contribute/salsa/

Regards,

-- 
   o
 ⬋   ⬊    Chris Lamb
o     o   reproducible-builds.org
 ⬊   ⬋
   o