Reproducible Builds for recent Debian security updates

2024-03-29 Thread Vagrant Cascadian
Philipp Kern asked about running reproducible-builds checks on recent
security updates, to gain some confidence in Debian's buildd
infrastructure, given that the buildds run builds in sid chroots which
may have used, built, or run a vulnerable xz-utils...

So far, I have not found any reproducibility issues; everything I tested
I was able to get to build bit-for-bit identical with what is in the
Debian archive.

I only tested bookworm security updates (not bullseye), and I tested the
xz-utils update now present in unstable, which took a little trial and
error to find the right snapshot! The build dependencies for Debian
bookworm (a.k.a. stable) were *much* easier to satisfy, as it is not a
moving target!


Debian bookworm security updates verified:

  cacti iwd libuv1 pdns-recursor samba composer fontforge knot-resolver
  php-dompdf-svg-lib squid yard

Not yet finished building:

  openvswitch

Did not yet try some time and disk-intensive builds:

  chromium firefox-esr thunderbird

Debian unstable updates verified:

  xz-utils


A tarball of build logs (including some failed builds) and .buildinfo
files is available at:

  https://people.debian.org/~vagrant/debian-security-rebuilds.tar.zst


Some caveats:

Notably, xz-utils has a build dependency that pulls in xz-utils, and the
version used may have been a vulnerable version (partly vulnerable?),
5.6.0-0.2.

The machine where I ran the builds had done some builds using packages
from sid over the last couple of months, so it may at some point have
run the vulnerable xz-utils code; it is not the absolute cleanest of
checks... but it is at least some sort of data point.

The build environment used tarballs that had usrmerge applied (as it is
harder not to apply usrmerge these days), while the buildd
infrastructure chroots do not have usrmerge applied. This did not
appear to cause significant problems, although it pulled in a few more
perl dependencies!


I used sbuild with --chroot-mode=unshare. For the xz-utils build I used
some of the ideas developed in an earlier verification builds
experiment:

  https://salsa.debian.org/reproducible-builds/debian-verification-build-experiment/-/blob/e003ddf19de13db2d512c25417e4bec863c3a082/sbuild-wrap#L71
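
For reference, a rebuild along these lines might look roughly like the
following (a minimal sketch: the package name and the exact comparison
step are illustrative, not the precise commands used here):

  # fetch the source package for a bookworm-security update
  # (needs a matching deb-src entry in sources.list)
  apt-get source samba/bookworm-security

  # rebuild it in an unshare chroot based on a bookworm tarball, with
  # the security suite available as an extra repository
  sbuild --chroot-mode=unshare --dist=bookworm \
    --extra-repository="deb http://security.debian.org/debian-security bookworm-security main" \
    samba_*.dsc

  # compare the sha256sums of the rebuilt .debs with those recorded in
  # the corresponding .buildinfo files from the Debian archive
  # (diffoscope helps pinpoint any differences)
  sha256sum *.deb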


It was great to try and apply Reproducible Builds to real-world uses!


live well,
  vagrant




Re: Arch Linux minimal container userland 100% reproducible - now what?

2024-03-29 Thread HW42
John Gilmore:
> kpcyrd  wrote:
>> 1) There's currently no way to tell if a package can be built offline 
>> (without trying yourself).
> 
> Packages that can't be built offline are not reproducible, by
> definition.  They depend on outside events and circumstances
> in order for a third party to reproduce them successfully.
> 
> So, fixing that in each package would be a prerequisite to making a
> reproducible Arch distro (in my opinion).

I don't agree. For example, the r-b.o [1] definition doesn't mandate who
needs to archive what. We probably can agree that we mean a "verifiable
path from source to binary code" (and not just repeatability, which is
also sometimes meant by reproducible builds in other contexts), but
beyond that the details and motivations will be different depending on
who you ask.

To be clear, I'm not saying that what you'd like to see is not
worthwhile. Actually I'm very sympathetic to such archiving goals. But
if Arch Linux, as kpcyrd's mails suggest, right now just wants to verify
their builder output soon-ish after upload, that's fine too and can be
called reproducible, in my opinion.

[1]: https://reproducible-builds.org/docs/definition/

> I don't understand why a "source tree" would store a checksum of a
> source tarball or source file, rather than storing the actual source
> tarball or source file.  You can't compile a checksum.

How distros store their source code differs, due to different needs,
historic circumstances, etc. The approach of having just the packaging
definition and patches, and then referring to the "original" source, is
common, and I certainly see the advantages.

> kpcyrd  wrote:
>> Specifically Gentoo and OpenBSD Ports have solutions for this that I 
>> really like, they store a generated list of URLs along with a 
>> cryptographic checksum in a separate file, which includes crates 
>> referenced in e.g. a project's Cargo.lock.
> 
> I don't know what a crate or a Cargo.lock is,

It's Cargo's (Rust's package/dependency manager) way to pin specific
dependencies, including hashes of those.
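
For illustration, a Cargo.lock entry looks roughly like this (the crate
name, version and checksum below are placeholders, not taken from any
real lockfile); the checksum pins the exact .crate file that cargo will
accept:

  # hypothetical entry; "checksum" is the sha256 of the crate tarball
  [[package]]
  name = "some-crate"
  version = "1.2.3"
  source = "registry+https://github.com/rust-lang/crates.io-index"
  checksum = "aaaa...64-hex-digits...ffff"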

> but rather than fix the problem at its source (include the source
> files), you propose to add another complex circumvention alongside the
> existing package building infrastructure?  What is the advantage of
> that over merely doing the "cargo fetch" early rather than late and
> putting all the resulting source files into the Arch source package?

I'm not an Arch developer, but probably because a package source repo
like [2] is much easier for them to handle than committing the source of
all (transitive) dependencies [3].

[2]: https://gitlab.archlinux.org/archlinux/packaging/packages/rage-encryption
[3]: https://github.com/str4d/rage/blob/v0.10.0/Cargo.lock

(Note that Arch made the, for a classic Linux distro currently rather
unusual, decision to build Rust programs with the exact dependencies
upstream has defined and not separately package those libraries.)

>> 3) All of this doesn't take BUILDINFO files into account
> 
> The BUILDINFO files are part of the source distribution needed
> to reproduce the binary distribution.  So they would go on the
> source ISO image.
> 
>> I did some digging and downloaded the buildinfo files for each package 
>> that is present in the archlinux-2024.03.01 iso
> 
> Thank you for doing that digging!
> 
>>   Using plenty of different gcc versions looks 
>> annoying, but is only an issue for bootstrapping, not for reproducible 
>> builds (as long as everything is fully documented).
> 
> I agree that it's annoying.  It compounds the complexity of reproducing
> the build.  Does Arch get some benefit from doing so?
> 
> Ideally, a binary release ISO would be built with a single set of
> compiler tools.  Why is Arch using a dozen compiler versions?  Just to
> avoid rebuilding binary packages once the binary release's engineers
> decide what compiler is going to be this release's gold-standard
> compiler?  (E.g. The one that gets installed when the user runs pacman
> to install gcc.)  Or do the release-engineers never actually standardize
> on a compiler -- perhaps new ones get thrown onto some server whenever
> someone likes, and suddenly all the users who install a compiler just
> start using that one?

If you look at classic Linux distros it's the norm to iteratively add
packages to your repo and build new packages with what is in the
(development) repo at this time. So a single snapshot of a repo will in
nearly all cases not contain all versions to reproduce the packages in
that snapshot.

You will find this in Arch, Debian, Fedora, etc. Some other distros,
like Yocto, might make different decisions, but those are rather the
exception.

> It currently seems that there is no guarantee that on day X, if you
> install gcc on Arch (from the Internet) and on the same day you pull in
> the source code of pacman package Y, that it will even build with the
> Day X version of gcc.  Is that true?

As described above for the 

Re: Arch Linux minimal container userland 100% reproducible - now what?

2024-03-29 Thread John Gilmore
kpcyrd  wrote:
> 1) There's currently no way to tell if a package can be built offline 
> (without trying yourself).

Packages that can't be built offline are not reproducible, by
definition.  They depend on outside events and circumstances
in order for a third party to reproduce them successfully.

So, fixing that in each package would be a prerequisite to making a
reproducible Arch distro (in my opinion).

I don't understand why a "source tree" would store a checksum of a
source tarball or source file, rather than storing the actual source
tarball or source file.  You can't compile a checksum.

kpcyrd  wrote:
> Specifically Gentoo and OpenBSD Ports have solutions for this that I 
> really like, they store a generated list of URLs along with a 
> cryptographic checksum in a separate file, which includes crates 
> referenced in e.g. a project's Cargo.lock.

I don't know what a crate or a Cargo.lock is, but rather than fix the
problem at its source (include the source files), you propose to add
another complex circumvention alongside the existing package building
infrastructure?  What is the advantage of that over merely doing the
"cargo fetch" early rather than late and putting all the resulting
source files into the Arch source package?

> 3) All of this doesn't take BUILDINFO files into account

The BUILDINFO files are part of the source distribution needed
to reproduce the binary distribution.  So they would go on the
source ISO image.

> I did some digging and downloaded the buildinfo files for each package 
> that is present in the archlinux-2024.03.01 iso

Thank you for doing that digging!

>   Using plenty of different gcc versions looks 
> annoying, but is only an issue for bootstrapping, not for reproducible 
> builds (as long as everything is fully documented).

I agree that it's annoying.  It compounds the complexity of reproducing
the build.  Does Arch get some benefit from doing so?

Ideally, a binary release ISO would be built with a single set of
compiler tools.  Why is Arch using a dozen compiler versions?  Just to
avoid rebuilding binary packages once the binary release's engineers
decide what compiler is going to be this release's gold-standard
compiler?  (E.g. The one that gets installed when the user runs pacman
to install gcc.)  Or do the release-engineers never actually standardize
on a compiler -- perhaps new ones get thrown onto some server whenever
someone likes, and suddenly all the users who install a compiler just
start using that one?

It currently seems that there is no guarantee that on day X, if you
install gcc on Arch (from the Internet) and on the same day you pull in
the source code of pacman package Y, that it will even build with the
Day X version of gcc.  Is that true?

John



Re: Arch Linux minimal container userland 100% reproducible - now what?

2024-03-29 Thread kpcyrd

On 3/29/24 6:48 AM, John Gilmore wrote:

> John Gilmore  wrote:
> Bootstrappable builds are a different thing.  Worthwhile, but not
> what I was asking for.  I just wanted provable reproducibility from two
> ISO images and nothing more.
>
> I was asking that a bare amd64 be able to boot from an Arch Linux
> *binary* ISO image.  And then be fed a matching Arch Linux *source* ISO
> image.  And that the scripts in the source image would be able to
> reproduce the binary image from its source code, running the binaries
> (like the kernel, shell, and compiler) from the binary ISO image to do
> the rebuilds (without Internet access).
>
> This should be much simpler than doing a bootstrap from bare metal
> *without* a binary ISO image.


I think this project would still be somewhat involved:

1) There's currently no way to tell if a package can be built offline 
(without trying yourself). Some distros have `options=(!net)`-like 
settings, but pacman currently doesn't. Needing network access for 
things like `cargo fetch` or `go mod download` is considered acceptable 
in Arch Linux, since these extra inputs are pinned by cryptographic hash 
(the PKGBUILD acts as a merkle-tree root).
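
For a concrete (if simplified) picture of that pattern, the relevant
PKGBUILD fragment looks roughly like the sketch below -- this is an
illustration of the convention, not a quote of any actual package:

  # simplified sketch: the only network access is the pinned fetch in
  # prepare(); Cargo.lock supplies the exact versions and hashes
  prepare() {
    cd "$pkgname-$pkgver"
    cargo fetch --locked --target "$(rustc -vV | sed -n 's/host: //p')"
  }

  build() {
    cd "$pkgname-$pkgver"
    # --frozen refuses any further network access or lockfile changes
    cargo build --frozen --release
  }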


Specifically, Gentoo and OpenBSD Ports have solutions for this that I 
really like: they store a generated list of URLs along with a 
cryptographic checksum in a separate file, which includes the crates 
referenced in e.g. a project's Cargo.lock. When these are unpacked to 
the right location, the build itself does not need any additional 
network resources and can run fully offline.


This concept currently does not exist in pacman; one would potentially 
need to generate 100+ lines into the source= array of a PKGBUILD (and 
another 200+ lines for checksums if 2 checksum algorithms are used). 
This is currently considered bad style, because the PKGBUILD is supposed 
to be short, simple and easy to read/understand/audit.
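
To make that concrete, a Gentoo/OpenBSD-style pinning approach expressed
directly in a PKGBUILD would mean something like the following
(hypothetical, heavily truncated fragment with placeholder checksums),
repeated for every crate in the lockfile:

  source=("$pkgname-$pkgver.tar.gz::$url/archive/v$pkgver.tar.gz"
          "serde-1.0.197.crate::https://crates.io/api/v1/crates/serde/1.0.197/download"
          "rand-0.8.5.crate::https://crates.io/api/v1/crates/rand/0.8.5/download"
          ...)            # one line per crate, easily 100+
  sha256sums=('aaaa...'    # upstream tarball
              'bbbb...'    # serde-1.0.197.crate
              'cccc...'    # rand-0.8.5.crate
              ...)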


2) The official ISO is meant for installation and maintenance, but does 
not contain a compiler, and I'm not sure it should. Many of the other 
base-devel packages are also missing, but since you also need the build 
dependencies of all the packages you're using (recursively?), this 
should likely be its own ISO (at which point you could, however, also 
include the source code).


3) All of this doesn't take BUILDINFO files into account: you can use 
Arch Linux as a source-based distro, but if you want exact matches with 
the official packages you would need to match the compiler version that 
was used for each respective package.


I did some digging and downloaded the buildinfo files for each package 
that is present in the archlinux-2024.03.01 iso (using the 
archlinux-userland-fs-cmp tool). In total these gcc versions have been 
used (gcc7 being part of the usb_modeswitch build environment, but I 
didn't bother investigating why):


gcc7-7.4.1+20181207-3-x86_64
gcc-9.2.0-4-x86_64
gcc-9.3.0-1-x86_64
gcc-10.1.0-1-x86_64
gcc-10.1.0-2-x86_64
gcc-10.2.0-3-x86_64
gcc-10.2.0-4-x86_64
gcc-10.2.0-6-x86_64
gcc-11.1.0-1-x86_64
gcc-11.2.0-4-x86_64
gcc-12.1.0-2-x86_64
gcc-12.2.0-1-x86_64
gcc-12.2.1-1-x86_64
gcc-12.2.1-2-x86_64
gcc-12.2.1-4-x86_64
gcc-13.1.1-1-x86_64
gcc-13.1.1-2-x86_64
gcc-13.2.1-3-x86_64
gcc-13.2.1-4-x86_64
gcc-13.2.1-5-x86_64

And these versions of the Rust compiler:

rust-1:1.74.0-1-x86_64
rust-1:1.75.0-2-x86_64
rust-1:1.76.0-1-x86_64

In total the build environment of all packages consists of 3704 
different (pkgname, pkgver) tuples.


If you disregard this, the packages you build with such an ISO wouldn't 
match the official packages, but 2 groups with the same ISO could likely 
produce matching binary packages (assuming they have a way to derive a 
deterministic SOURCE_DATE_EPOCH value from that ISO).
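
One convention -- purely illustrative, not something Arch does today --
would be to derive that value from the release date already encoded in
the ISO name:

  # e.g. both groups agree on the date in "archlinux-2024.03.01"
  export SOURCE_DATE_EPOCH=$(date --utc --date=2024-03-01 +%s)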


From there on you'd "only" need to bootstrap a path to these binary 
seeds, but that's also why I pointed out this is more relevant to 
bootstrappable builds. Using plenty of different gcc versions looks 
annoying, but is only an issue for bootstrapping, not for reproducible 
builds (as long as everything is fully documented).



> If someday an Electromagnetic Pulse weapon destroys all the running
> computers, we'd like to bootstrap the whole industry up again, without
> breadboarding 8-bit micros and manually toggling in programs.  Instead,
> a chip foundry can take these two ISOs and a bare laptop out of a locked
> fire-safe, reboot the (Arch Linux) world from them, and then use that
> Linux machine to control the chip-making and chip-testing machines that
> can make more high-function chips.  (This would depend on the
> chip-makers keeping good offline fireproof backups of their own
> application software -- but even if they had that, they can't reboot and
> maintain the chip foundry without working source code for their
> controller's OS.)


I'm personally not interested in this scenario. I'm aware Allan McRae is 
looking for funding for pacman development. Maybe somebody could sponsor 
development of a "build without network" feature in pacman, or 

The upstream xz repository and the xz tarballs have been backdoored

2024-03-29 Thread kpcyrd

https://www.openwall.com/lists/oss-security/2024/03/29/4

Exciting times


Re: Two questions about build-path reproducibility in Debian

2024-03-29 Thread James Addison via rb-general
Hi again,

On Mon, 11 Mar 2024 at 18:24, James Addison  wrote:
>
> Hi folks,
>
> On Wed, 6 Mar 2024 at 01:04, James Addison  wrote:
> > [ ... snip ...]
> >
> > The Debian bug severity descriptions[1] provide some more nuance, and that
> > reassures me that wishlist should be appropriate for most of these bugs
> > (although I'll inspect their contents before making any changes).
>
> Please find below a draft of the message I'll send to each affected bugreport.
>
> Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_
> continue to test build-path variance, at least until we decide otherwise.
>
> --- BEGIN DRAFT ---
> Because Debian builds packages from a fixed build path, customized build paths
> are _not_ currently evaluated by the 'reprotest' utility in Salsa-CI, or 
> during
> package builds on the Reproducible Builds team's package test infrastructure
> for Debian[1].
>
> This means that this package will pass current reproducibility tests; however
> we still believe that source code and/or build steps embed the build path into
> binary package output, making it more difficult than necessary for independent
> consumers to confirm whether their local compilations produce identical binary
> artifacts.
>
> As a result, this bugreport will remain open and be assigned the 'wishlist'
> severity[2].
>
> ...
>
> [1] - https://tests.reproducible-builds.org/debian/reproducible.html
>
> [2] - https://www.debian.org/Bugs/Developer#severities
> --- END DRAFT ---

Most of the remaining buildpath bugs have been updated to severity 'wishlist'.

Approximately thirty are still set to other severity levels, and I plan to
update those with the following adjusted messaging:

--- BEGIN DRAFT ---
Control: severity -1 wishlist

Dear Maintainer,

Currently, Debian's buildd and also the Reproducible Builds team's testing
infrastructure[1] both use a fixed build path when building binary packages.

This means that your package will pass current reproducibility tests; however
we believe that varying the build path still produces undesirable changes in
the binary package output, making it more difficult than necessary for
independent consumers to check the integrity of those packages by rebuilding
them themselves.

As a result, this bugreport will remain open and be re-assigned the 'wishlist'
severity[2].

You can use the 'reprotest' package build utility - either locally, or as
provided in Debian's Salsa continuous integration pipelines - to assist in
uncovering reproducibility failures due to build-path variance.

For more information about build paths and how they can affect reproducibility,
please refer to: https://reproducible-builds.org/docs/build-path/

...

[1] - https://tests.reproducible-builds.org/debian/reproducible.html

[2] - https://www.debian.org/Bugs/Developer#severities
--- END DRAFT ---
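
For completeness, a local reprotest run along the lines suggested in the
draft might look like this (the package name is a placeholder, and exact
flags may differ between reprotest versions):

  # build the package twice on the local host ("null" server), varying
  # only the build path between the two builds, then diff the results
  reprotest --variations=build_path mypackage_1.2-1.dsc -- null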

Thanks for your feedback and suggestions,
James


diffoscope 262 released 

2024-03-29 Thread Chris Lamb
Hi,

The diffoscope maintainers are pleased to announce the release of
version 262 of diffoscope.

diffoscope tries to get to the bottom of what makes files or
directories different. It will recursively unpack archives of many
kinds and transform various binary formats into more human-readable
form to compare them. It can compare two tarballs, ISO images, or PDFs
just as easily.

Version 262 includes the following changes:

  [ Chris Lamb ]
  * Factor out Python version checking in test_zip.py. (Re: #362)
  * Also skip some zip tests under 3.10.14 as well; a potential regression may
have been backported to the 3.10.x series. The underlying cause is still to
be investigated. (Re: #362)

## Download

Version 262 is available from Debian unstable as well as PyPI, and
will shortly be available on other platforms surely. More details can
be found here:

   https://diffoscope.org/

… but source tarballs may be located here:

  https://diffoscope.org/archive/

The corresponding Docker image may be run via (for example):

  $ docker run --rm -t -w $(pwd) -v $(pwd):$(pwd):ro \
  registry.salsa.debian.org/reproducible-builds/diffoscope a b
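
When installed locally instead (for example from Debian unstable or
PyPI), two artifacts can be compared directly; the file names below are
just an example:

  $ diffoscope --html report.html a.iso b.iso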


## Contribute

diffoscope is developed within the "Reproducible builds" effort.

  - Git repository
https://salsa.debian.org/reproducible-builds/diffoscope

  - Docker image, eg.
registry.salsa.debian.org/reproducible-builds/diffoscope
https://salsa.debian.org/reproducible-builds/diffoscope

  - Issues and feature requests
https://salsa.debian.org/reproducible-builds/diffoscope/issues

  - Contribution instructions (eg. to file an issue)
https://reproducible-builds.org/contribute/salsa/


Regards,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 
⬊   ⬋
  o