Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread Adrian Bunk
On Fri, Apr 05, 2024 at 01:30:51AM +0200, kpcyrd wrote:
> On 4/5/24 12:31 AM, Adrian Bunk wrote:
> > Hashes of "git archive" tarballs are anyway not stable,
> > so whatever a maintainer generates is not worse than what is on Github.
> > 
> > Any proper tooling would have to verify that the contents are equal.
> > 
> > > ...
> > > Being able to disregard the compression layer is still necessary however,
> > > because Debian (as far as I know) never takes the hash of the inner .tar
> > > file but only the compressed one. Because of this you may still need to
> > > provide `--orig ` if you want to compare with an uncompressed tar.
> > > ...
> > 
> > Right now the preferred form of source in Debian is an upstream-signed
> > release tarball, NOT anything from git.
> > 
> > An actual improvement would be to automatically and 100% reliably
> > verify that a given tarball matches the commit ID and signed git tag
> > in an upstream git tree.
> 
> I strongly disagree. I think the upstream signature is overrated.

The best we can realistically verify is that the code is from upstream.

> It's from the old mindset of code signing being the only way of securely
> getting code from upstream. Recent events have shown (instead of bothering
> upstream for signatures) it's much more important to have clarity and
> transparency what's in the code that is compiled into binaries and executed
> on our computers, instead of who we got it from.
>...

We do know that for the backdoored xz packages.

An intentional backdoor by upstream is not something we can 
realistically defend against.

The tiny part of the whole xz backdoor that was only in the tarball 
could instead also have been in git like the rest of the backdoor.

A "supply-chain security tool" that does not bring any improvement in 
this case is just snake oil.

> cheers,
> kpcyrd

cu
Adrian


Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread kpcyrd

On 4/5/24 12:31 AM, Adrian Bunk wrote:

> Hashes of "git archive" tarballs are anyway not stable,
> so whatever a maintainer generates is not worse than what is on Github.
>
> Any proper tooling would have to verify that the contents are equal.
>
> > ...
> > Being able to disregard the compression layer is still necessary however,
> > because Debian (as far as I know) never takes the hash of the inner .tar
> > file but only the compressed one. Because of this you may still need to
> > provide `--orig ` if you want to compare with an uncompressed tar.
> > ...
>
> Right now the preferred form of source in Debian is an upstream-signed
> release tarball, NOT anything from git.
>
> An actual improvement would be to automatically and 100% reliably
> verify that a given tarball matches the commit ID and signed git tag
> in an upstream git tree.


I strongly disagree. I think the upstream signature is overrated.

It's from the old mindset of code signing being the only way of securely 
getting code from upstream. Recent events have shown (instead of 
bothering upstream for signatures) it's much more important to have 
clarity and transparency what's in the code that is compiled into 
binaries and executed on our computers, instead of who we got it from. 
The entire reproducible builds effort is based on the idea of the source 
code in Debian being safe and sound to use.


If upstream refused to sign anything but pre-compiled llvm IR, I'd put 
both the IR and signature in the trash and build from source code.


If upstream wouldn't sign anything but autotools pre-processed archives 
with 25k lines of auto-generated shell scripts I'd put it next to the IR 
and build from the actual source code as well.


If upstream would only sign a tarball with files sorted in the order 
they were returned by their kernel to readdir(), I'd raise the question 
why we're having this in 2024 (and possibly suggest to use a tar with 
sorted entries).
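
For illustration, a tar with sorted entries could be produced roughly as follows. This is only a sketch and not what any upstream or backseat-signed does: the name `deterministic_tar` is made up, and zeroing mtime and ownership is an assumption about what "reproducible" should mean here. Empty directories are skipped.

```python
import io
import os
import tarfile


def deterministic_tar(root: str) -> bytes:
    """Pack the files below `root` into a tar with stable order and metadata."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # do not depend on the order readdir() happens to return
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full, root))
    paths.sort()

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # Drop everything that varies between machines or checkouts.
        info.mtime = 0
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        info.mode = 0o755 if info.mode & 0o100 else 0o644
        return info

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for rel in paths:
            tar.add(os.path.join(root, rel), arcname=rel,
                    recursive=False, filter=normalize)
    return buf.getvalue()
```

Packing the same tree twice, even after timestamps have changed, should then give byte-identical output.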


Although to be honest if this would really be the only problem we'd be 
having, I'd likely not care anymore and put my time to better use.



> Or perhaps stop using tarballs in Debian as sole permitted
> form of source.


I'd be fine with that.

cheers,
kpcyrd


Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread James McCoy
On Fri, Apr 05, 2024 at 01:31:25AM +0300, Adrian Bunk wrote:
> On Thu, Apr 04, 2024 at 09:39:51PM +0200, kpcyrd wrote:
> >...
> > I've checked both, upstreams github release page and their website[1], but
> > couldn't find any mention of .tar.xz, so I think my claim of Debian doing
> > the compression is fair.
> > 
> > [1]: https://www.vim.org/download.php
> >...
> 
> Perhaps that's a maintainer running "git archive" manually?

Yes, in whichever way git-deborig(1) is driving git archive.

Cheers,
-- 
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB


Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread Adrian Bunk
On Thu, Apr 04, 2024 at 09:39:51PM +0200, kpcyrd wrote:
>...
> I've checked both, upstreams github release page and their website[1], but
> couldn't find any mention of .tar.xz, so I think my claim of Debian doing
> the compression is fair.
> 
> [1]: https://www.vim.org/download.php
>...

Perhaps that's a maintainer running "git archive" manually?

Hashes of "git archive" tarballs are anyway not stable,
so whatever a maintainer generates is not worse than what is on Github.

Any proper tooling would have to verify that the contents are equal.
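
A minimal sketch of what such a content comparison might look like, assuming only regular files matter and each file fits in memory. The function names are hypothetical, and a top-level directory prefix (as GitHub tarballs carry) would have to be stripped first.

```python
import hashlib
import tarfile


def tar_contents(path: str) -> dict[str, str]:
    """Map each regular file in a (possibly compressed) tar to its SHA-256."""
    digests = {}
    # mode "r:*" lets tarfile detect gzip, bzip2 or xz compression on its own.
    with tarfile.open(path, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # directories, symlinks etc. are ignored in this sketch
            data = tar.extractfile(member).read()
            digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def same_contents(a: str, b: str) -> bool:
    """True if both archives hold the same file names with the same bytes."""
    return tar_contents(a) == tar_contents(b)
```

Member order, timestamps and the compression format do not influence the result, only file names and file bytes do.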

>...
> Being able to disregard the compression layer is still necessary however,
> because Debian (as far as I know) never takes the hash of the inner .tar
> file but only the compressed one. Because of this you may still need to
> provide `--orig ` if you want to compare with an uncompressed tar.
>...

Right now the preferred form of source in Debian is an upstream-signed 
release tarball, NOT anything from git.

An actual improvement would be to automatically and 100% reliably
verify that a given tarball matches the commit ID and signed git tag
in an upstream git tree.

But for that writing tooling would be the trivial part,
architectural topics like where to store the commit ID
and where to store the git tree would be the harder parts.
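
To illustrate the tooling half of this: git identifies file contents by blob ID, so a tarball can be checked against the path to blob ID pairs of a commit (which could come from `git ls-tree -r <commit>`). The sketch below assumes a SHA-1 repository and ignores file modes, symlinks and any `export-subst` or `export-ignore` attributes. All function names are hypothetical.

```python
import hashlib
import tarfile


def git_blob_id(data: bytes) -> str:
    """Object ID git assigns to a blob with this content (SHA-1 repositories)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def tar_blob_ids(path: str, strip_prefix: str = "") -> dict[str, str]:
    """Map every regular file in the tarball to its git blob ID."""
    ids = {}
    with tarfile.open(path, mode="r:*") as tar:
        for member in tar:
            if member.isfile():
                name = member.name.removeprefix(strip_prefix)
                ids[name] = git_blob_id(tar.extractfile(member).read())
    return ids


def matches_tree(path: str, expected: dict[str, str], strip_prefix: str = "") -> bool:
    """Compare a tarball against path -> blob ID pairs taken from a commit."""
    return tar_blob_ids(path, strip_prefix) == expected
```

Where the commit ID and the tree come from is exactly the part this sketch leaves open.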

Or perhaps stop using tarballs in Debian as sole permitted
form of source.

> cheers,
> kpcyrd

cu
Adrian


Re: New supply-chain security tool: backseat-signed

2024-04-04 Thread kpcyrd

On 4/3/24 4:21 AM, Adrian Bunk wrote:

> On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:
> > ...
> > I figured out a somewhat straight-forward way to check if a given `git
> > archive` output is cryptographically claimed to be the source input of a
> > given binary package in either Arch Linux or Debian (or both).
>
> For Debian the proper approach would be to copy Checksums-Sha256 for the
> source package to the buildinfo file, and there is nothing where it would
> matter whether the tarball was generated from git or otherwise.
>
> > I believe this to be the "reproducible source tarball" thing some people
> > have been asking about.
> > ...
>
> The lack of a reliably reproducible checksum when using "git archive" is
> the problem, and git cannot realistically provide that.
>
> Even when called with the same parameters, "git archive" executed in
> different environments might produce different archives for the same
> commit ID.
>
> It is documented that auto-generated Github tarballs for the same tag
> and with the same commit ID downloaded at different times might have
> different checksums.


Granted it takes some skill to take snapshots that match what github is 
generating (and there are occasional issues) but generally speaking it 
works quite well. The required command is in the README, and I encourage 
you to give it a try.


If you want something that's explicitly designed for taking reproducible 
VCS snapshots you could also consider the "Nix Archive" format[0], 
however I think more people would be in favor of agreeing on how to 
canonically derive a given git tree into a `.tar.gz` (or at least .tar) 
instead of switching Debian to the .nar file format.


[0]: https://github.com/ebkalderon/libnar

I think regular `git archive` is already pretty good. Complaining that 
it may only work in 98% of cases is, I'd say, a luxury problem 
considering the current state of things. The next paragraph is the 
bigger headache:



> > This tool highlights the concept of "canonical sources", which is supposed
> > to give guidance on what to code review.
> > ...
>
> How does it tell the git commit ID the tarball was generated from?
>
> Doing a code review of git sources as tarball would be stupid,
> you really want the git metadata that usually shows when, why and by
> whom something was changed.


It doesn't. It works like a one-way function, it can verify a given VCS 
snapshot is definitely the source code that was ingested into Debian, 
but it can't locate the source code on its own.


I don't know if Debian has this kind of provenance information 
available. To my knowledge, Debian operates on "our maintainers upload 
.tar.xz files into our archive and we take them at face value". This 
does make sense, considering not every software project uses git: some 
may develop their own VCS, and some software projects do not have any 
VCS at all, with just one person applying patches to a folder on their 
local computer and uploading .tar snapshots to a webserver every other 
month.


There are some packages that have some kind of system behind them, like 
rust-toml_0.5.11.orig.tar.gz in the Debian Archive can be expected to 
match  (although 
sometimes files get excluded from the tar upload). I'd like to 
explicitly encourage people to point me in the right direction if 
there's any existing effort of mapping debian .orig.tar.gz files to git 
tags (not necessarily bit-for-bit, but at least which commit we expect 
it to come from).



> > https://github.com/kpcyrd/backseat-signed
> >
> > The README
> > ...
>
> "This requires some squinting since in Debian the source tarball is
>  commonly recompressed so only the inner .tar is compared"
>
> This doesn't sound true.


I've updated the wording and intend to investigate this further. By 
default the relevant command even expects an exact match. For example 
this works:


```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.gz
[2024-04-04T18:45:09Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.gz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Searching in index...
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] File verified successfully
```
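
Conceptually, an exact-match lookup like this hashes the file as-is and searches the Sources index for that digest. The Python sketch below only illustrates the idea and is not how backseat-signed is implemented. The function names are made up, and it assumes the usual Sources layout where `Checksums-Sha256:` is followed by indented `<sha256> <size> <filename>` lines.

```python
import hashlib
import lzma


def sha256_entries(sources_text: str, package: str) -> set[str]:
    """Collect the Checksums-Sha256 digests listed for one source package."""
    digests = set()
    for stanza in sources_text.split("\n\n"):
        lines = stanza.splitlines()
        if f"Package: {package}" not in lines:
            continue
        in_field = False
        for line in lines:
            if line.startswith("Checksums-Sha256:"):
                in_field = True
            elif in_field and line.startswith(" "):
                digests.add(line.split()[0])  # "<sha256> <size> <filename>"
            else:
                in_field = False
    return digests


def tarball_in_sources(tarball: str, sources_xz: str, package: str) -> bool:
    """True if the file's SHA-256 is listed for `package` in the Sources index."""
    with open(tarball, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with lzma.open(sources_xz, "rt", encoding="utf-8") as f:
        return digest in sha256_entries(f.read(), package)
```

Because the digest covers the compressed file, any repacking changes it even if the files inside are untouched.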

But if I repack the .tar.gz into .tar.xz it's going to get rejected:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.xz
[2024-04-04T18:48:32Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Searching in index...
Error: Could not find source tarball with matching hash in source index
```
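
The rejection follows from which bytes get hashed. A small Python sketch of the difference, with hypothetical helper names and support for `.gz` and `.xz` only:

```python
import gzip
import hashlib
import lzma


def outer_sha256(path: str) -> str:
    """SHA-256 of the file as stored, compression layer included."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def inner_sha256(path: str) -> str:
    """SHA-256 of the decompressed .tar stream inside a .gz or .xz file."""
    opener = gzip.open if path.endswith(".gz") else lzma.open
    with opener(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```

Repacking a `.tar.gz` as `.tar.xz` leaves `inner_sha256` unchanged while `outer_sha256` differs, and only the outer one can be matched against a hash of the compressed file.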

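Being able to disregard the compression layer is still necessary 
however, because Debian (as far as I know) never takes the hash of the 
inner .tar file but only the compressed one. Because of this you may 
still need to provide `--orig ` if you want to compare with an 
uncompressed tar.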

Re: Arch Linux minimal container userland 100% reproducible - now what?

2024-04-04 Thread David A. Wheeler via rb-general



> On Apr 2, 2024, at 1:11 PM, John Gilmore wrote:
> 
> For me, the distinction is that the local storage is under the direct
> control of the person trying to rebuild, while the network and the
> servers elsewhere in the network are not.  If local storage is
> unreliable, you can fix or replace it, and continue with your work.

There are obviously many advantages to local storage.

However, if you locally record cryptographic hashes, and re-download the
bits for (say) a compiler, you could still reproduce the results
*if* the information is still available where you're downloading it from
(or can find an alternative source). The key is that "if" condition.
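
As a rough sketch of that idea: keep the hash locally, try each known source in turn, and accept only data that matches. The function names and the pluggable `fetch` parameter are assumptions for illustration, not part of any existing tool.

```python
import hashlib
import urllib.request
from typing import Callable, Iterable


def _http_get(url: str) -> bytes:
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_verified(urls: Iterable[str], expected_sha256: str,
                   fetch: Callable[[str], bytes] = _http_get) -> bytes:
    """Try each source in turn and accept the first one whose hash matches."""
    for url in urls:
        try:
            data = fetch(url)
        except OSError:
            continue  # source unavailable, try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
    raise LookupError("no source delivered data matching the recorded hash")
```

If every source is gone, the function fails, which is the availability risk discussed here. A source that returns altered data is skipped like an unavailable one.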

The risk of not having local copies is the risk of loss of availability.
However, many sites are fairly reliable. I'd hate to tell someone they
can't verify reproducible builds just because they don't (currently)
have a local copy of everything. Indeed, you want multiple verifications
of reproducible builds, and they'll have to get their data from somewhere.

It's sometimes much easier to send the source including build instructions,
information on how to download the rest, and the cryptographic hashes for
what is not bundled.

--- David A. Wheeler