Re: New supply-chain security tool: backseat-signed

2024-04-02 Thread Larry Doolittle
Friends -

On Wed, Apr 03, 2024 at 05:21:40AM +0300, Adrian Bunk wrote:
> It is documented that auto-generated Github tarballs for the same tag 
> and with the same commit ID downloaded at different times might have 
> different checksums.

I've run into this statement before.  It's annoyingly true,
in part because it's typically false.

Can we document a standard workaround-recipe, where a script
grabs the tarball, decompresses it, and then rebuilds and compresses
the contents in a way that _is_ reproducible?

  - Larry
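
One possible shape for such a recipe, sketched in Python with only the
standard library. This is an illustration, not an agreed standard: the
names normalize_tar and compress_deterministic are made up here, and the
normalization choices (sort entries by name, zero the mtime and ownership,
write GNU tar format) are assumptions. Two caveats: the repack drops the
pax global header in which `git archive` records the commit ID, and gzip
output can still differ between zlib versions, so hashing the normalized
uncompressed tar is likely the more robust comparison.

```python
import gzip
import io
import tarfile


def normalize_tar(data: bytes) -> bytes:
    """Repack a (possibly compressed) tarball with fixed order and metadata."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as src:
        entries = []
        for member in src.getmembers():
            # Only regular files carry a payload; dirs and links do not.
            payload = src.extractfile(member).read() if member.isfile() else None
            entries.append((member, payload))

    # Assumption: a stable sort by path is an acceptable canonical order.
    entries.sort(key=lambda entry: entry[0].name)

    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w", format=tarfile.GNU_FORMAT) as dst:
        for member, payload in entries:
            # Drop everything that depends on when or by whom it was packed.
            member.mtime = 0
            member.uid = member.gid = 0
            member.uname = member.gname = ""
            member.pax_headers = {}
            dst.addfile(member, io.BytesIO(payload) if payload is not None else None)
    return out.getvalue()


def compress_deterministic(tar_bytes: bytes) -> bytes:
    """Gzip without the timestamp or filename fields in the header."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(tar_bytes)
    return buf.getvalue()
```

Two downloads of the same tag could then be compared by hashing
normalize_tar(download) for each, whatever the original compression was.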


Re: New supply-chain security tool: backseat-signed

2024-04-02 Thread Adrian Bunk
On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:
>...
> I figured out a somewhat straight-forward way to check if a given `git
> archive` output is cryptographically claimed to be the source input of a
> given binary package in either Arch Linux or Debian (or both).

For Debian the proper approach would be to copy Checksums-Sha256 for the
source package to the buildinfo file, and then it would not matter
whether the tarball was generated from git or otherwise.

> I believe this to be the "reproducible source tarball" thing some people
> have been asking about.
>...

The lack of a reliably reproducible checksum when using "git archive" is 
the problem, and git cannot realistically provide that.

Even when called with the same parameters, "git archive" executed in 
different environments might produce different archives for the same
commit ID.

It is documented that auto-generated Github tarballs for the same tag 
and with the same commit ID downloaded at different times might have 
different checksums.

> This tool highlights the concept of "canonical sources", which is supposed
> to give guidance on what to code review.
>...

How does it tell the git commit ID the tarball was generated from?

Doing a code review of git sources as a tarball would be stupid,
you really want the git metadata that usually shows when, why and by
whom something was changed.

> https://github.com/kpcyrd/backseat-signed
> 
> The README
>...

"This requires some squinting since in Debian the source tarball is 
 commonly recompressed so only the inner .tar is compared"

This doesn't sound true.

> Let me know what you think. 
> 
> Happy feet,
> kpcyrd

cu
Adrian
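
To make concrete what "only the inner .tar is compared" in the README
sentence quoted above would mean, here is a minimal Python sketch. It is
not taken from backseat-signed (which is written in Rust); the function
name inner_tar_sha256 is hypothetical, and detecting the wrapper by its
magic bytes is an assumption made for illustration.

```python
import bz2
import gzip
import hashlib
import lzma


def inner_tar_sha256(blob: bytes) -> str:
    """Hash the decompressed payload, ignoring the compression wrapper."""
    if blob.startswith(b"\x1f\x8b"):          # gzip magic
        inner = gzip.decompress(blob)
    elif blob.startswith(b"\xfd7zXZ\x00"):    # xz magic
        inner = lzma.decompress(blob)
    elif blob.startswith(b"BZh"):             # bzip2 magic
        inner = bz2.decompress(blob)
    else:
        inner = blob  # assumption: already an uncompressed .tar
    return hashlib.sha256(inner).hexdigest()
```

With this, a .tar.gz and a .tar.xz wrapping the same tar stream yield the
same digest, which is the comparison a recompressed tarball would still
pass.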


New supply-chain security tool: backseat-signed

2024-04-02 Thread kpcyrd

Hello,

I'm going to keep this short; I've been writing a lot of text recently,
which is quite exhausting on top of my day job and all the code I wrote
today afterwards. (Apologies if you're still waiting for a reply in one
of the other threads.)


I figured out a somewhat straight-forward way to check if a given `git 
archive` output is cryptographically claimed to be the source input of a 
given binary package in either Arch Linux or Debian (or both).


I believe this to be the "reproducible source tarball" thing some people 
have been asking about. As explained in the README, I believe 
reproducing autotools-generated tarballs isn't worth everybody's time 
and instead a distribution that claims to build from source should 
operate on VCS snapshots instead of tarballs with 25k lines of 
pre-generated shell-script. Building from VCS snapshots is already the 
case for a large number of Arch Linux packages (through auto-generated 
Github tarballs). Some packages have been actively converted to VCS 
snapshots by Arch Linux staff in response to the xz incident.


This tool highlights the concept of "canonical sources", which is 
supposed to give guidance on what to code review. This is also why I 
think code signing by upstream is somewhat low priority, since the big 
distros can form consensus around "what's the source code" regardless.


https://github.com/kpcyrd/backseat-signed

The README shows how to verify that Arch Linux and Debian build cmatrix from 
the same source code - they may both still apply patches (which would be 
considered part of the build instructions), but the specified source 
input is the same. This tarball can also be bit-for-bit reproduced from 
VCS by taking a `git archive` snapshot of the v2.0 tag in the cmatrix 
repository.


(If somebody ever tells you programming in Rust is slower, I wrote the 
entirety of this codebase within a few hours of a single day)


Let me know what you think. 

Happy feet,
kpcyrd


Re: Arch Linux minimal container userland 100% reproducible - now what?

2024-04-02 Thread John Gilmore
James Addison wrote that local storage can contain errors.  I agree.

> My guess is that we could get into near-unsolvable philosophical territory
> along this path, but I think it's worth being skeptical of the notions that
> local-storage is always trustworthy and that the network should always be
> avoided.

For me, the distinction is that the local storage is under the direct
control of the person trying to rebuild, while the network and the
servers elsewhere in the network are not.  If local storage is
unreliable, you can fix or replace it, and continue with your work.

I am looking for reproducibility that is completely doable by the person
trying to do it, at any time after they obtain a limited number of
key items by any means: the bootable binary of the OS release, and what
the GPL calls the "Corresponding Source".

And, I am very happy to be seeing lots of incremental progress along the way!

John

PS: I have a local archive of the source ISO images and the binary ISO
images of many Ubuntu, Fedora, Debian, BSD, etc releases.  It all fits
easily on a single hard disk drive, and that drive has many backups from
different times.  The images all have checksums that were checked when I
obtained the images.  The checksums are in the backups, so I can see if
my copies were tampered with or merely suffered from storage degradation
over time.

And I can easily copy the whole thing and send you a copy, if you want
one; or put it on the Internet (some of the releases are available from
me now via BitTorrent).  If those distros were reproducible, I could
verify that each of those binary releases was untampered.  Or YOU could,
without my help, after you got a copy from me or from anyone.  And if
you suspected a binary Ken Thompson attack, you could use those releases
locally at your site, as the source material for an arbitrarily intense
diverse double-compilation check.  Without my help, and without the help
of anyone else on the Internet.

In short, making a local archive of reproducible binaries and their
corresponding sources readily enables all the verifications that we are
trying to make common in the world.
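
The offline check described in the PS above (stored checksums compared
against the archived images) can be sketched as follows. This is only an
illustration: it assumes a manifest in the `sha256sum` output format
("<hex digest>  <filename>" per line) sitting next to the images, and the
function names sha256_of and verify_manifest are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large ISO images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Map each listed file name to True if it exists and its hash matches."""
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = manifest.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

A False entry only says that a copy no longer matches its recorded
checksum; comparing against an older backup of both the image and the
manifest is what would help tell tampering from storage degradation.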



Re: Two questions about build-path reproducibility in Debian

2024-04-02 Thread Chris Lamb
James Addison wrote:

> None of the remaining thirty-or-so (and in fact, none of the 66 updated so far)
> are usertagged both 'buildpath' and 'toolchain'.
>
> I would say that a few of them _are_ 'toolchain packages' -- mono, binutils-dev
> and a few others -- but for these bugs the buildpath issues are internal to
> each package at build-time and do not affect the construction of other
> packages in their ecosystem.

You are absolutely right to distinguish between a package that is
itself unreproducible and a package that is causing other packages
to be unreproducible. These are very much orthogonal concepts as you
imply, and a package can certainly be in both categories at once.

What might be confusing to folks is that our "toolchain" usertag in
the Debian BTS does not refer to a toolchain *package* in the usual,
Debian sense, i.e. Mono, libc, Bison, documentation generators and
so on. But rather that (loosely speaking) "if this usertag is applied
to a bug, it's denoting that that particular *bug* is affecting the
reproducibility of other packages."

Unfortunately, the tag is actually an excellent example of that
general trend in tech where something was badly named on the spur of
the moment, and then the name just sticks around forever due to some
combination of muscle memory, inertia and, frankly, priority: as in,
this metadata is not *all* that visible nor A++ important to begin
with… outside of threads like this. :)


Best wishes,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 
⬊   ⬋
  o


Re: Arch Linux minimal container userland 100% reproducible - now what?

2024-04-02 Thread James Addison via rb-general
Hi John,

On Fri, 29 Mar 2024 at 19:29, John Gilmore wrote:
>
> kpcyrd wrote:
> > 1) There's currently no way to tell if a package can be built offline
> > (without trying yourself).
>
> Packages that can't be built offline are not reproducible, by
> definition.  They depend on outside events and circumstances
> in order for a third party to reproduce them successfully.
>
> So, fixing that in each package would be a prerequisite to making a
> reproducible Arch distro (in my opinion).

This perspective is valuable because it is certainly true that unreliable
or unexpected responses from a network adapter could cause software builds to
fail, be delayed, or contain errors.

However I fail to see why any of those circumstances would not be equally
possible in the case of equivalent responses from physically or locally
attached I/O devices.

A storage device could be considered a node on a local network that no other
host is able to communicate with directly; and to my knowledge it's rarely the
case that traffic to-and-from local storage devices is inspected for integrity
by hardware/software outside of the device that it is connected to (which
isn't necessarily the place that it makes sense to run those checks).

My guess is that we could get into near-unsolvable philosophical territory
along this path, but I think it's worth being skeptical of the notions that
local-storage is always trustworthy and that the network should always be
avoided.

Regards,
James


Re: Two questions about build-path reproducibility in Debian

2024-04-02 Thread James Addison via rb-general
Thanks, Chris,

On Sun, 31 Mar 2024 at 13:01, Chris Lamb wrote:
>
> Hi James,
>
> > Approximately thirty are still set to other severity levels, and I plan to
> > update those with the following adjusted messaging […]
>
> Looks good to me. :)
>
> Completely out of interest, are any of those 30 bugs tagged both
> "buildpath" and "toolchain"? It's written nowhere in Policy (and I
> can't remember if it's ever been discussed before), but if package X
> is causing package Y to be unreproducible, I feel that has some
> bearing on the severity of the bug for that issue filed against X…
> completely independent of whether package X is reproducible itself or
> not.  :)

None of the remaining thirty-or-so (and in fact, none of the 66 updated so far)
are usertagged both 'buildpath' and 'toolchain'.

I would say that a few of them _are_ 'toolchain packages' -- mono, binutils-dev
and a few others -- but for these bugs the buildpath issues are internal to
each package at build-time and do not affect the construction of other
packages in their ecosystem.

> Just to underscore that this is simply my curiosity before you
> reassign: in the particular case of *buildpath* AND toolchain, these
> should almost certainly be wishlist anyway because, as discussed, we
> "aren't testing buildpath".

Mostly agree.  Of the bugs in Debian that _are_ usertagged both buildpath and
also toolchain, a few of them appear to have possible known/tested fixes, but in
some cases are awaiting maintainer/upstream support.  Using a static buildpath
seems like it should mitigate most concern there, but if that were not the case,
then the severity of those could perhaps be re-argued based on the quantity,
popularity and importance of affected software (packaged or otherwise).

Regards,
James