Re: New supply-chain security tool: backseat-signed
Friends - On Wed, Apr 03, 2024 at 05:21:40AM +0300, Adrian Bunk wrote: > It is documented that auto-generated Github tarballs for the same tag > and with the same commit ID downloaded at different times might have > different checksums. I've run into this statement before. It's annoyingly true, in part because it's typically false. Can we document a standard workaround recipe, where a script grabs the tarball, decompresses it, and then rebuilds and recompresses the contents in a way that _is_ reproducible? - Larry
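A recipe along the lines Larry asks for might look like the following minimal Python sketch. It is an illustration, not an agreed standard: the helper name `normalize_tarball` is hypothetical, and it assumes that name-sorted entries, a fixed mtime of 0, uid/gid 0, empty owner names, PAX format and gzip with a zeroed header timestamp are acceptable normalization choices.

```python
import gzip
import io
import tarfile


def normalize_tarball(data: bytes, mtime: int = 0) -> bytes:
    """Re-pack a (possibly compressed) tarball so the result is deterministic.

    Hypothetical helper; the normalization choices below are assumptions.
    """
    raw = io.BytesIO()
    # "r:*" lets tarfile auto-detect gzip, bzip2 or xz compression.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as src:
        with tarfile.open(fileobj=raw, mode="w", format=tarfile.PAX_FORMAT) as dst:
            # A stable entry order, independent of how the input was written.
            for member in sorted(src.getmembers(), key=lambda m: m.name):
                # Open a reader for the member's data (regular files only).
                fileobj = src.extractfile(member) if member.isfile() else None
                # Strip environment-dependent metadata from the header.
                member.mtime = mtime
                member.uid = member.gid = 0
                member.uname = member.gname = ""
                member.pax_headers = {}
                dst.addfile(member, fileobj)
    # Fixed header timestamp and no embedded filename in the gzip wrapper.
    return gzip.compress(raw.getvalue(), compresslevel=9, mtime=0)
```

Even with these choices, the compressed bytes still depend on the compressor (here zlib, via Python's `gzip` module) and its settings, so two parties only get identical output if they use the same implementation; comparing the uncompressed tar is less fragile. The re-packing also discards any pax global header, which is where `git archive` records the commit ID.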
Re: New supply-chain security tool: backseat-signed
On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote: >... > I figured out a somewhat straight-forward way to check if a given `git > archive` output is cryptographically claimed to be the source input of a > given binary package in either Arch Linux or Debian (or both). For Debian the proper approach would be to copy the Checksums-Sha256 of the source package into the buildinfo file; then it would not matter at all whether the tarball was generated from git or otherwise. > I believe this to be the "reproducible source tarball" thing some people > have been asking about. >... The lack of a reliably reproducible checksum when using "git archive" is the problem, and git cannot realistically provide that. Even when called with the same parameters, "git archive" executed in different environments might produce different archives for the same commit ID. It is documented that auto-generated Github tarballs for the same tag and with the same commit ID downloaded at different times might have different checksums. > This tool highlights the concept of "canonical sources", which is supposed > to give guidance on what to code review. >... How does it tell which git commit ID the tarball was generated from? Doing a code review of git sources as a tarball would be stupid; you really want the git metadata, which usually shows when, why and by whom something was changed. > https://github.com/kpcyrd/backseat-signed > > The README >... "This requires some squinting since in Debian the source tarball is commonly recompressed so only the inner .tar is compared" This doesn't sound true. > Let me know what you think. > > Happy feet, > kpcyrd cu Adrian
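To make Adrian's suggestion concrete, a verifier could read a `Checksums-Sha256` field from a deb822 document such as a .dsc (or, under his proposal, a .buildinfo), where each indented continuation line holds a SHA-256 digest, a size in bytes and a filename, and check a downloaded tarball against it. This is a minimal Python sketch assuming an unsigned, well-formed document; it ignores PGP signature wrapping and the case-insensitivity of field names, and both helper names are hypothetical. A real tool would more likely use a full deb822 parser such as the one in python-debian.

```python
import hashlib


def parse_checksums_sha256(control: str) -> dict[str, tuple[str, int]]:
    """Return {filename: (sha256_hex, size)} from a Checksums-Sha256 field.

    Hypothetical helper; assumes a well-formed, unsigned deb822 document.
    """
    entries: dict[str, tuple[str, int]] = {}
    in_field = False
    for line in control.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            # Continuation lines are indented; anything else ends the field.
            if not line.startswith((" ", "\t")):
                break
            digest, size, name = line.split()
            entries[name] = (digest, int(size))
    return entries


def verify_file(name: str, data: bytes, expected: dict[str, tuple[str, int]]) -> bool:
    """Check both the size and the SHA-256 digest recorded for `name`."""
    digest, size = expected[name]
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest
```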
New supply-chain security tool: backseat-signed
Hello, I'm going to keep this short; I've been writing a lot of text recently (which is quite exhausting, on top of my day job and all the code I wrote today afterwards. Apologies if you're still waiting for a reply in one of the other threads). I figured out a somewhat straightforward way to check if a given `git archive` output is cryptographically claimed to be the source input of a given binary package in either Arch Linux or Debian (or both). I believe this to be the "reproducible source tarball" thing some people have been asking about. As explained in the README, I believe reproducing autotools-generated tarballs isn't worth everybody's time; instead, a distribution that claims to build from source should operate on VCS snapshots rather than tarballs with 25k lines of pre-generated shell script. Arch Linux already builds a large number of packages from VCS snapshots (through auto-generated Github tarballs). Some packages have been actively converted to VCS snapshots by Arch Linux staff in response to the xz incident. This tool highlights the concept of "canonical sources", which is supposed to give guidance on what to code review. This is also why I think code signing by upstream is somewhat low priority, since the big distros can form consensus around "what's the source code" regardless. https://github.com/kpcyrd/backseat-signed The README shows how to verify that Arch Linux and Debian build cmatrix from the same source code - they may both still apply patches (which would be considered part of the build instructions), but the specified source input is the same. This tarball can also be bit-for-bit reproduced from VCS by taking a `git archive` snapshot of the v2.0 tag in the cmatrix repository. (If somebody ever tells you programming in Rust is slower, I wrote the entirety of this codebase within a few hours of a single day.) Let me know what you think. Happy feet, kpcyrd
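The README's "squinting" step, comparing only the inner .tar because Debian often recompresses the upstream tarball, could be sketched in Python roughly as below. This is a hypothetical illustration, not backseat-signed's actual implementation (which is written in Rust). Compression is detected from magic bytes, and the `--prefix` value is an assumption that must match whatever directory prefix the upstream tarball uses. As Adrian points out, `git archive` output is not guaranteed to be identical across environments, so a mismatch does not by itself prove tampering.

```python
import bz2
import gzip
import hashlib
import lzma
import subprocess


def decompress(data: bytes) -> bytes:
    """Strip the outer compression layer, detected by magic bytes (hypothetical helper)."""
    if data.startswith(b"\x1f\x8b"):
        return gzip.decompress(data)
    if data.startswith(b"\xfd7zXZ\x00"):
        return lzma.decompress(data)
    if data.startswith(b"BZh"):
        return bz2.decompress(data)
    return data  # assume an uncompressed .tar


def git_archive_tar(repo: str, ref: str, prefix: str) -> bytes:
    """Run `git archive` and return its uncompressed tar stream."""
    return subprocess.run(
        ["git", "-C", repo, "archive", "--format=tar", f"--prefix={prefix}", ref],
        check=True,
        capture_output=True,
    ).stdout


def same_inner_tar(tarball: bytes, git_tar: bytes) -> bool:
    """Compare SHA-256 of a distro tarball's inner .tar with `git archive` output."""
    inner = hashlib.sha256(decompress(tarball)).hexdigest()
    return inner == hashlib.sha256(git_tar).hexdigest()
```

With hypothetical local paths, a check might read `same_inner_tar(open("cmatrix_2.0.orig.tar.gz", "rb").read(), git_archive_tar("cmatrix", "v2.0", "cmatrix-2.0/"))`.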
Re: Arch Linux minimal container userland 100% reproducible - now what?
James Addison wrote that local storage can contain errors. I agree. > My guess is that we could get into near-unsolvable philosophical territory > along this path, but I think it's worth being skeptical of the notions that > local-storage is always trustworthy and that the network should always be > avoided. For me, the distinction is that the local storage is under the direct control of the person trying to rebuild, while the network and the servers elsewhere in the network are not. If local storage is unreliable, you can fix or replace it, and continue with your work. I am looking for reproducibility that is completely doable by the person trying to do it, at any time after they obtain a limited number of key items by any means: the bootable binary of the OS release, and what the GPL calls the "Corresponding Source". And I am very happy to be seeing lots of incremental progress along the way! John PS: I have a local archive of the source ISO images and the binary ISO images of many Ubuntu, Fedora, Debian, BSD, etc. releases. It all fits easily on a single hard disk drive, and that drive has many backups from different times. The images all have checksums that were checked when I obtained the images. The checksums are in the backups, so I can see if my copies were tampered with or merely suffered from storage degradation over time. And I can easily copy the whole thing and send you a copy, if you want one; or put it on the Internet (some of the releases are available from me now via BitTorrent). If those distros were reproducible, I could verify that each of those binary releases was untampered. Or YOU could, without my help, after you got a copy from me or from anyone. And if you suspected a binary Ken Thompson attack, you could use those releases locally at your site, as the source material for an arbitrarily intense diverse double-compilation check. Without my help, and without the help of anyone else on the Internet.
In short, making a local archive of reproducible binaries and their corresponding sources readily enables all the verifications that we are trying to make common in the world.
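John's practice of keeping checksums alongside the archived images can be automated with a short script. This Python sketch assumes checksum files in the common `sha256sum` output format (`<hex digest>  <filename>`, optionally with a `*` before the filename) stored in the same directory as the images; `sha256_file` and `check_sums` are hypothetical names. It reports missing files separately from mismatches, roughly like `sha256sum -c`.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a file in chunks so large ISO images need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def check_sums(sums_file: Path) -> dict[str, str]:
    """Check every entry of a sha256sum-style file; return {name: status}.

    Hypothetical helper; assumes the listed files sit next to sums_file.
    """
    results: dict[str, str] = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # "*" marks binary mode in sha256sum output
        target = sums_file.parent / name
        if not target.exists():
            results[name] = "MISSING"
        elif sha256_file(target) == digest.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

A mismatch only shows that the bytes changed; telling tampering apart from storage degradation still needs other evidence, such as the older backups John mentions.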
Re: Two questions about build-path reproducibility in Debian
James Addison wrote: > None of the remaining thirty-or-so (and in fact, none of the 66 updated so > far) > are usertagged both 'buildpath' and 'toolchain'. > > I would say that a few of them _are_ 'toolchain packages' -- mono, > binutils-dev > and a few others -- but for these bugs the buildpath issues are internal to > each package at build-time and do not affect the construction of other > packages in their ecosystem. You are absolutely right to distinguish between a package that is itself unreproducible and a package that is causing other packages to be unreproducible. These are very much orthogonal concepts as you imply, and a package can certainly be in both categories at once. What might be confusing to folks is that our "toolchain" usertag in the Debian BTS does not refer to a toolchain *package* in the usual Debian sense (e.g. Mono, libc, Bison, documentation generators and so on), but rather means (loosely speaking) "if this usertag is applied to a bug, it's denoting that that particular *bug* is affecting the reproducibility of other packages." Unfortunately, the tag is actually an excellent example of that general trend in tech where something is badly named on the spur of the moment, and then the name just sticks around forever due to some combination of muscle memory, inertia and, frankly, priority: as in, this metadata is not *all* that visible nor A++ important to begin with… outside of threads like this. :) Best wishes, -- Chris Lamb, reproducible-builds.org
Re: Arch Linux minimal container userland 100% reproducible - now what?
Hi John, On Fri, 29 Mar 2024 at 19:29, John Gilmore wrote: > > kpcyrd wrote: > > 1) There's currently no way to tell if a package can be built offline > > (without trying yourself). > > Packages that can't be built offline are not reproducible, by > definition. They depend on outside events and circumstances > in order for a third party to reproduce them successfully. > > So, fixing that in each package would be a prerequisite to making a > reproducible Arch distro (in my opinion). This perspective is valuable because it is certainly true that unreliable or unexpected responses from a network adapter could cause software builds to fail, be delayed, or contain errors. However I fail to see why any of those circumstances would not be equally possible in the case of equivalent responses from physically or locally attached I/O devices. A storage device could be considered a node on a local network that no other host is able to communicate with directly; and to my knowledge it's rarely the case that traffic to-and-from local storage devices is inspected for integrity by hardware/software outside of the device that it is connected to (which isn't necessarily the place that it makes sense to run those checks). My guess is that we could get into near-unsolvable philosophical territory along this path, but I think it's worth being skeptical of the notions that local-storage is always trustworthy and that the network should always be avoided. Regards, James
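One way to answer kpcyrd's "can this package be built offline?" without trying by hand is to run the build in a fresh network namespace, which contains only an unconfigured loopback interface. This Python sketch assumes a Linux host with util-linux `unshare` and unprivileged user namespaces enabled; `offline_command` and `builds_offline` are hypothetical helper names.

```python
import subprocess


def offline_command(cmd: list[str]) -> list[str]:
    """Wrap cmd so it runs in a new network namespace (no external network).

    Hypothetical helper; --map-root-user lets an unprivileged user create
    the namespace where user namespaces are enabled.
    """
    return ["unshare", "--map-root-user", "--net", "--", *cmd]


def builds_offline(cmd: list[str], cwd: str) -> bool:
    """Return True if the build command succeeds with networking cut off."""
    result = subprocess.run(offline_command(cmd), cwd=cwd, check=False)
    return result.returncode == 0
```

Note that `--map-root-user` makes the build run as root inside the namespace, which some tools refuse (Arch's `makepkg`, for instance, will not run as root), and a successful offline build says nothing yet about whether the output is reproducible.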
Re: Two questions about build-path reproducibility in Debian
Thanks, Chris, On Sun, 31 Mar 2024 at 13:01, Chris Lamb wrote: > > Hi James, > > > Approximately thirty are still set to other severity levels, and I plan to > > update those with the following adjusted messaging […] > > Looks good to me. :) > > Completely out of interest, are any of those 30 bugs tagged both > "buildpath" and "toolchain"? It's written nowhere in Policy (and I > can't remember if it's ever been discussed before), but if package X > is causing package Y to be unreproducible, I feel that has some > bearing on the severity of the bug for that issue filed against X… > completely independent of whether package X is reproducible itself or > not. :) None of the remaining thirty-or-so (and in fact, none of the 66 updated so far) are usertagged both 'buildpath' and 'toolchain'. I would say that a few of them _are_ 'toolchain packages' -- mono, binutils-dev and a few others -- but for these bugs the buildpath issues are internal to each package at build-time and do not affect the construction of other packages in their ecosystem. > Just to underscore that this is simply my curiosity before you > reassign: in the particular case of *buildpath* AND toolchain, these > should almost certainly be wishlist anyway because, as discussed, we > "aren't testing buildpath". Mostly agree. Of the bugs in Debian that _are_ usertagged both buildpath and toolchain, a few appear to have known or tested fixes, but in some cases these are awaiting maintainer/upstream support. Using a static buildpath seems like it should mitigate most concerns there, but if that were not the case, then the severity of those could perhaps be re-argued based on the quantity, popularity and importance of affected software (packaged or otherwise). Regards, James
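Chris's question, whether any bugs carry both the 'buildpath' and 'toolchain' usertags, is a simple set query once the usertags are at hand. The Python sketch below uses a hypothetical `Bug` record and made-up bug numbers; fetching real usertags from the Debian BTS is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Bug:
    """Hypothetical record of a BTS bug and its usertags."""
    number: int
    package: str
    severity: str
    usertags: set[str] = field(default_factory=set)


def tagged_with_all(bugs: list[Bug], *tags: str) -> list[Bug]:
    """Return the bugs whose usertags include every tag in `tags`."""
    wanted = set(tags)
    return [bug for bug in bugs if wanted <= bug.usertags]
```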