Re: Arch Linux minimal container userland 100% reproducible - now what?

2024-03-28 Thread John Gilmore
John Gilmore wrote:
> It seems to me that the next step in making the Arch release ISOs
> reproducible is to have the Arch release engineering team create a
> source-code release ISO that matches each binary release ISO.  Then you
> (or anyone) could test the reproducibility of the release by having
> merely those two ISO images and a bare amd64 computer (without even an
> Internet connection).

kpcyrd wrote:
> I think this falls under "bootstrappable builds", a bare amd64 computer 
> still needs something to boot into (a CD with only source code won't do 
> the trick).

Bootstrappable builds are a different thing.  Worthwhile, but not
what I was asking for.  I just wanted provable reproducibility from two
ISO images and nothing more.

I was asking that a bare amd64 be able to boot from an Arch Linux
*binary* ISO image.  And then be fed a matching Arch Linux *source* ISO
image.  And that the scripts in the source image would be able to
reproduce the binary image from its source code, running the binaries
(like the kernel, shell, and compiler) from the binary ISO image to do
the rebuilds (without Internet access).

This should be much simpler than doing a bootstrap from bare metal
*without* a binary ISO image.
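
The final check in that scenario comes down to a byte-for-byte comparison
of the rebuilt image against the published one.  A minimal sketch in
Python, assuming the rebuild has already written an image to disk; the
function names and file paths are hypothetical and not part of any Arch
tooling:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(official_iso, rebuilt_iso):
    """True only if both images hash to the same SHA-256 digest."""
    return sha256_of(official_iso) == sha256_of(rebuilt_iso)
```

Reading in chunks keeps memory use flat even for multi-gigabyte images.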

And if your source/binary ISO images can do that, it's not just an
academic exercise in reproducibility.  It can also produce a new binary
ISO that is built from that source ISO plus a few patches (e.g. for
fixing security issues).  Or, it can "recompile-the-world" after you (or
any user) make a small change to a kernel, include file, library, or
compiler -- and show exactly how many programs compile to something
*different* as a result.  Basically, that pair of ISOs becomes a seed
that can carry forward, or fork, the whole distribution.  For anybody
who receives them.  That is the promise of free software, but the
complexity of modern distros plus the convenience of ubiquitous
Internet have inadvertently tended to undermine that promise.  Until
the reproducible builds effort!
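
Counting how many programs compile to something different after such a
change could look roughly like this.  It is only a sketch: it assumes the
"before" and "after" build outputs have been unpacked into two directory
trees, and the function names are made up for illustration:

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def changed_artifacts(before_dir, after_dir):
    """Return sorted paths whose contents differ or that exist in one tree only."""
    before = tree_digests(before_dir)
    after = tree_digests(after_dir)
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

The length of the returned list is the number of artifacts affected by
the change.  Such a count is only meaningful when the builds are
reproducible to begin with, since otherwise unrelated noise shows up as
differences.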

If someday an Electromagnetic Pulse weapon destroys all the running
computers, we'd like to bootstrap the whole industry up again, without
breadboarding 8-bit micros and manually toggling in programs.  Instead,
a chip foundry can take these two ISOs and a bare laptop out of a locked
fire-safe, reboot the (Arch Linux) world from them, and then use that
Linux machine to control the chip-making and chip-testing machines that
can make more high-function chips.  (This would depend on the
chip-makers keeping good offline fireproof backups of their own
application software -- but even if they had that, they couldn't reboot
and maintain the chip foundry without working source code for their
controller's OS.)

John



Re: Verifying reproducibility of Java builds from Maven Central

2024-03-28 Thread Arnout Engelen
On Thu, Mar 28, 2024, at 16:41, Railean, Alexander via rb-general wrote:
> I am trying to understand how someone can independently verify the 
> reproducibility of Java projects on Maven Central. Having explored the 
> repositories on Maven Central, I could not find examples where the 
> “buildinfo” file was present.
Publishing a buildinfo to Maven Central is indeed relatively uncommon.
> The archives of this mailing list pointed out examples such as 
> https://repo1.maven.org/maven2/com/typesafe/akka/akka-actor_2.13/2.6.4/akka-actor_2.13-2.6.4.buildinfo,
>  and yet my understanding is that this is not enough [but why?], hence 
> reproducible-central was created to address some sort of gap.
>  
> So far, my mental model is that:
>  • By including buildinfo in the artifacts on Maven Central, library authors 
> empower users to check for themselves if the build is reproducible or not.
>  • Reproducible-central takes it a step further and attempts to do a build 
> and then gives you a “yes/no” result.
>  
> Thus, the former makes the problem solvable in principle, whereas the latter 
> actually solves it. Is my understanding correct?

Mostly: publishing the buildinfo is optional. It is possible to have a 
reproducible build without publishing the buildinfo metadata (but you might 
need some other way to convey the requirements for your build environment). 
Indeed, reproducible-central has successfully rebuilt many artifacts that 
haven't published a buildinfo.
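
When a buildinfo is published, a user could check a locally rebuilt 
artifact against it along these lines. This is a rough sketch, not how 
reproducible-central works: it assumes the file is plain key=value text 
and that outputs are recorded under keys shaped like `outputs.<n>.filename` 
and `outputs.<n>.checksums.sha512`, which should be verified against a 
real buildinfo file before relying on it. All function names are 
hypothetical.

```python
import hashlib


def parse_buildinfo(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def recorded_outputs(entries):
    """Map each recorded output filename to its recorded SHA-512 checksum.

    Assumed key layout: outputs.<n>.filename / outputs.<n>.checksums.sha512
    """
    outputs = {}
    for key, value in entries.items():
        if key.startswith("outputs.") and key.endswith(".filename"):
            prefix = key[: -len(".filename")]
            checksum = entries.get(prefix + ".checksums.sha512")
            if checksum is not None:
                outputs[value] = checksum
    return outputs


def matches_recorded(artifact_bytes, recorded_sha512):
    """Compare a rebuilt artifact's SHA-512 with the recorded checksum."""
    return hashlib.sha512(artifact_bytes).hexdigest() == recorded_sha512.lower()
```

The parser deliberately ignores the escaping rules of full Java 
properties files, so it is only suitable for simple files.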

> Besides that, I have some additional questions:
> 1. Can you provide references to documentation that explains how to make sure 
> buildinfo ends up on Maven Central?
In the case of Akka, they/we use the 
https://github.com/raboof/sbt-reproducible-builds/ plugin for the sbt build 
tool that is used to build Akka.
> 2. Is there a tutorial that describes how to get featured on Reproducible 
> Central?
>  
>  
> I had a look at 
> https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/BUILDSPEC.md,
>  and my understanding is that this is not working for projects built on 
> Windows, because it relies on rebuild.sh, which implies one has bash. The 
> library I publish on Maven Central is built on a Windows computer – does this 
> mean that I won’t be able to list it in reproducible-builds?

Hmm, that sounds tricky. However, my experience with Java/Maven is that it is 
often possible to achieve reproducibility across operating systems: artifacts 
built on macOS can often be rebuilt on Linux and vice-versa, so perhaps the 
same is also true for Windows?


Kind regards,

-- 
Arnout Engelen
Engelen Open Source
https://engelen.eu


Verifying reproducibility of Java builds from Maven Central

2024-03-28 Thread Railean, Alexander via rb-general
Hi everybody,

I am trying to understand how someone can independently verify the 
reproducibility of Java projects on Maven Central. Having explored the 
repositories on Maven Central, I could not find examples where the "buildinfo" 
file was present.

The archives of this mailing list pointed out examples such as 
https://repo1.maven.org/maven2/com/typesafe/akka/akka-actor_2.13/2.6.4/akka-actor_2.13-2.6.4.buildinfo,
and yet my understanding is that this is not enough [but why?], hence 
reproducible-central was created to address some sort of gap.

So far, my mental model is that:

*   By including buildinfo in the artifacts on Maven Central, library 
authors empower users to check for themselves if the build is reproducible or 
not.
*   Reproducible-central takes it a step further and attempts to do a build 
and then gives you a "yes/no" result.

Thus, the former makes the problem solvable in principle, whereas the latter 
actually solves it. Is my understanding correct?

Besides that, I have some additional questions:

1. Can you provide references to documentation that explains how to make sure 
buildinfo ends up on Maven Central?

2. Is there a tutorial that describes how to get featured on Reproducible 
Central?

I had a look at 
https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/BUILDSPEC.md,
 and my understanding is that this is not working for projects built on 
Windows, because it relies on rebuild.sh, which implies one has bash. The 
library I publish on Maven Central is built on a Windows computer - does this 
mean that I won't be able to list it in reproducible-builds?

Looking forward to your feedback,

Alex



Re: Arch Linux minimal container userland 100% reproducible - now what?

2024-03-28 Thread kpcyrd

On 3/26/24 5:03 PM, Michael Schierl via rb-general wrote:

> So we can expect many year/month pairs embedded in manpages that got
> unnoticed since mostly the build happens in the same month? Or have they
> been manually vetted?


The results on reproducible.archlinux.org don't aim to guarantee the 
absence of reproducible builds issues; instead, they aim to confirm that 
the binary can be built from the given source code and build instructions 
(which is, at least for me, why I'm working on reproducible builds, 
since this means we can take the source code at face value for what's in 
the binaries).


Embedded timestamps are considered bad because they are usually a 
show-stopper for this (and timestamps with second/minute precision still 
are for us). There's a different kind of system that tries to prove the 
absence of reproducible builds issues - I've referred to this as "build 
environment fuzzing" in the past and it's the kind of thing 
tests.reproducible-builds.org does.
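
The idea behind that second kind of system can be sketched as follows: 
run the same build under deliberately different environments and see 
whether the output changes. This is only an illustration in Python; the 
two variations listed and the `build` callback are assumptions made for 
the example, and a real setup varies far more than two variables.

```python
import hashlib

# Hypothetical environment variations, chosen for illustration only.
VARIATIONS = [
    {"TZ": "UTC", "LANG": "C"},
    {"TZ": "Pacific/Kiritimati", "LANG": "fr_FR.UTF-8"},
]


def varies_with_environment(build, variations=VARIATIONS):
    """Run build(env) once per variation; True if any two outputs differ.

    `build` is a caller-supplied function that takes an environment dict
    and returns the resulting artifact as bytes.
    """
    digests = {
        hashlib.sha256(build(dict(env))).hexdigest() for env in variations
    }
    return len(digests) > 1
```

A build that never differs under such variations has not been proven 
issue-free, only shown to be insensitive to the things that were varied.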


These results also still exist for Arch Linux[1] (since 2017), and if 
you're concerned about this you could check over there, but since Arch 
Linux _integrates_ with other ecosystems (instead of re-implementing 
them like Debian tries to), some builds fail to build if the clock is 
too far off, since HTTPS certificates would be considered expired. 
There's a lot of `curl -k` going on to work around this, but e.g. cargo 
has no option to "turn off all security", so these packages simply won't 
build on there.


[1]: https://tests.reproducible-builds.org/archlinux/

In late 2019 it turned out to be easier to "do the real thing" instead 
of trying to find more workarounds, and "not having enough 
true-positives" isn't really a problem we're having at the moment. If 
you find a false-negative please shout.


If anybody is bothered by the claims Arch Linux is making they're very 
welcome to run a rebuilder with a clock that is off by 48h (this would 
be interesting to have, but still wouldn't guarantee the absence of 
other reproducible builds issues, like missing Cargo.lock files).



> Apart from Guix pushing bootstrappable builds for quite some time,
> recent builds of Freedesktop SDK (container userland mostly used for
> flatpaks) are fully bootstrapped from stage0 - except for Rust which is
> not bootstrapped via mrustc but built using the binary package from
> upstream.


Is there any public website I could look at for results? According to 
our tests, having reproducible distro tooling isn't enough because 
there's still plenty of opensource software doing silly things in their 
build processes.



> Assuming I wanted to bootstrap some (non-reproducible) Arch setup from
> Freedesktop SDK and then use it to verify the reproducible builds, what
> steps would I have to take?


If you want to bootstrap the 114 packages that are present in 
docker.io/library/archlinux from source, you would need to:


- Build any version of pacman (which is C and shell scripts, but for 
makepkg you might even get away with just the shell scripts)
- Download all 114 buildinfo files for these packages (they are 
contained inside of the package itself)
- Identify all packages and their versions that are referenced in there 
as build dependencies
- Build these packages on Freedesktop SDK with `makepkg --nodeps`, this 
disables dependency checks and simply assumes the required 
tools/compilers are going to be in $PATH - the checksums of packages 
built this way are naturally going to be different from the official 
packages but that's ok
- Use the packages you built to set up the build environment that is 
described in each buildinfo file
- Run the build with makepkg and SOURCE_DATE_EPOCH set to the value in 
the buildinfo file


This should result in exact matches of the official packages, but of 
course there are a few things that could go wrong so I cannot make any 
guarantees.
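
The buildinfo-related steps in the list above could be sketched in Python 
like this. It is a rough illustration under stated assumptions: that the 
buildinfo file consists of `key = value` lines, that build dependencies 
appear as repeated `installed` entries, and that `builddate` is the value 
to use for SOURCE_DATE_EPOCH. The function names are hypothetical and not 
part of pacman or archlinux-repro.

```python
def parse_arch_buildinfo(text):
    """Parse 'key = value' lines; repeated keys accumulate into lists."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def build_dependencies(buildinfo_texts):
    """Sorted union of every 'installed' entry across buildinfo files."""
    needed = set()
    for text in buildinfo_texts:
        needed.update(parse_arch_buildinfo(text).get("installed", []))
    return sorted(needed)


def makepkg_environment(text, base_env):
    """Copy base_env and set SOURCE_DATE_EPOCH from the 'builddate' field.

    Assumption: 'builddate' is the field that holds the epoch value.
    """
    env = dict(base_env)
    env["SOURCE_DATE_EPOCH"] = parse_arch_buildinfo(text)["builddate"][0]
    return env
```

The resulting environment dict could then be passed to whatever runs 
makepkg, for example via the `env` argument of `subprocess.run`.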


Instead of doing the last two steps you could also remove the signature 
checks in archlinux-repro[2] and populate its download cache folder with 
the packages you built yourself, archlinux-repro then takes care of the 
rest.


[2]: https://github.com/archlinux/archlinux-repro


> Has anything like that been tried for Arch? How many dependency loops
> are there in the build dependencies of the packages mentioned above, and
> can they be broken by using packages from Freedesktop SDK?


I'm not aware of anybody having tried this. There wasn't much point in 
trying without having achieved reproducible builds first.


cheers,
kpcyrd