Re: Introducing: Semantically reproducible builds

2023-06-02 Thread David A. Wheeler
> On Jun 2, 2023, at 12:26 PM, Andreas Enge wrote: > > Hello, > > Am Fri, Jun 02, 2023 at 11:39:42AM -0400 schrieb David A. Wheeler: >> I think the OSSGadget folks aren't fussed about this, because they're merely >> using this definition to explain what they're doing. > > my first reaction

Re: Introducing: Semantically reproducible builds

2023-06-02 Thread Andreas Enge
Hello, Am Fri, Jun 02, 2023 at 11:39:42AM -0400 schrieb David A. Wheeler: > I think the OSSGadget folks aren't fussed about this, because they're merely > using this definition to explain what they're doing. my first reaction also was "this is not a definition"! They are a few vague words

Re: Introducing: Semantically reproducible builds

2023-06-02 Thread David A. Wheeler
> On Jun 2, 2023, at 11:10 AM, Ed Warnicke wrote: > Please don't get me wrong, the OSSGadget folks may be doing *really* good > work. My complaint is that the definition of "Semantically Reproducible" is > effectively unusable as written above. Can it be tightened up to something > that

Re: Introducing: Semantically reproducible builds

2023-06-02 Thread Holger Levsen
hi, I was busy with the Debian Hamburg Reunion 2023 last week and the first half of this, so I only started catching up on this thread yesterday... On Fri, Jun 02, 2023 at 10:46:16AM -0400, David A. Wheeler wrote: > Fair enough. The immediate issue is to reduce confusion. > > The OSSGadget

Re: Introducing: Semantically reproducible builds

2023-06-02 Thread Ed Warnicke
> A project build is `semantically equivalent` if its build results can be either recreated exactly (a bit-for-bit [reproducible build](https://en.wikipedia.org/wiki/Reproducible_builds)), or if the differences between the release package and a rebuilt package are not expected to produce

Re: Introducing: Semantically reproducible builds

2023-06-02 Thread David A. Wheeler
> On May 31, 2023, at 10:36 AM, Ed Warnicke wrote: > > I tend to think about reproducible builds in this generalizable way: > > A build is reproducible if equivalent inputs (source, build tools, build tool > invocation, etc.) to the build result in equivalent outputs. Fair enough. The

Re: Introducing: Semantically reproducible builds

2023-05-31 Thread Ed Warnicke
I tend to think about reproducible builds in this generalizable way: A build is reproducible if equivalent inputs (source, build tools, build tool invocation, etc.) to the build result in equivalent outputs. The question then becomes: how are input equivalence and output equivalence defined?
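Ed's framing turns the question into a choice of equivalence relation on build outputs. The following is a minimal Python sketch, assuming artifacts are available as raw bytes: the strictest relation is bit-for-bit identity (checked via SHA-256 digests, which is also what lets independent rebuilders publish and compare a single hash), while a weaker relation compares outputs only after a normalization step. The function names and the `BUILD_DATE=` line format are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import Callable

def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def bit_for_bit_equivalent(a: bytes, b: bytes) -> bool:
    """Strictest output equivalence: identical bytes, hence identical digests."""
    return sha256_digest(a) == sha256_digest(b)

def equivalent_under(normalize: Callable[[bytes], bytes]) -> Callable[[bytes, bytes], bool]:
    """Build a weaker equivalence: outputs are equal after a normalization step."""
    def check(a: bytes, b: bytes) -> bool:
        return normalize(a) == normalize(b)
    return check

def drop_build_date_lines(data: bytes) -> bytes:
    """Hypothetical normalization: discard lines recording a build timestamp."""
    return b"\n".join(
        line for line in data.split(b"\n") if not line.startswith(b"BUILD_DATE=")
    )

ignore_build_date = equivalent_under(drop_build_date_lines)

if __name__ == "__main__":
    first = b"VERSION=1.0\nBUILD_DATE=2023-05-26\n"
    second = b"VERSION=1.0\nBUILD_DATE=2023-05-31\n"
    print(bit_for_bit_equivalent(first, second))  # False
    print(ignore_build_date(first, second))       # True
```

The weaker relation is only as trustworthy as its normalization: anything the normalization discards is, by construction, never checked.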

Re: Introducing: Semantically reproducible builds

2023-05-30 Thread David A. Wheeler
> On May 30, 2023, at 10:51 AM, David A. Wheeler wrote: > I'll file an issue with OSSGadget > to propose that they rename "semantically reproducible build" to > "semi-reproducible build", > but I can't guarantee that they'll change the name. Since it's

Re: Introducing: Semantically reproducible builds

2023-05-30 Thread David A. Wheeler
> On May 29, 2023, at 4:19 PM, Vagrant Cascadian > wrote: > > On 2023-05-29, Bernhard M. Wiedemann via rb-general wrote: >> >> That 'semi-' prefix should give people a good hint of what it is and if >> not, encourage them to ask for details. "sort-of-reproducible" or >>

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread Vagrant Cascadian
reproducibility locally and see what changes are necessary to fix it. Although for huge dependency chains this obviously becomes impractical... but huge dependency chains are arguably impractical in their own right (and yet painfully pervasive). Any tooling that facilitates review of software ob

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread Vagrant Cascadian
On 2023-05-29, Bernhard M. Wiedemann via rb-general wrote: > On 29/05/2023 06.10, Vagrant Cascadian wrote: >> Do such tools actually exist, or are we talking about something >> theoretical here? > > https://github.com/openSUSE/build-compare/ has been in use for 13 years. > > And strip-nondeterminism can

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread John Gilmore
David A. Wheeler wrote: > Please don't view the text above as opposing reproducible builds. > I think reproducible builds are the gold standard for countering subverted > builds, and I will continue to encourage them. > But when you can't get them (e.g., because you don't have time to patch

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread David A. Wheeler
> On May 29, 2023, at 12:41 PM, kpcyrd wrote: > > I think the pypi example and missing .gitignore file is more about "git and > pypi are both a VCS, did the author commit the same source code". It's about > "what's the canonical source code release" instead of a real build. Huh? PyPI is

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread kpcyrd
On 5/29/23 05:15, David A. Wheeler wrote: Here's an example that might clarify the threat model. It's possible that a program could look for ".gitignore" and run it if present. The source code repo might not have a .gitignore file, but the malicious package added .gitignore and filled it with a
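The threat model in the quoted example can be made concrete with a file-set comparison. This is a minimal Python sketch, not any project's actual tooling: it assumes the source file list comes from the repository (for example, the output of `git ls-files`) and that the package is a tar archive whose member paths line up with the repository paths. Real sdists usually wrap everything in a `name-version/` directory, which a real check would strip first. `tar_file_names`, `unexpected_files` and `make_tar` are hypothetical names.

```python
import io
import tarfile

def tar_file_names(archive: bytes) -> set[str]:
    """Names of regular files in a (possibly compressed) tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        return {m.name for m in tar.getmembers() if m.isfile()}

def unexpected_files(source_files: set[str], package_files: set[str]) -> list[str]:
    """Files shipped in the package that do not exist in the source repository.

    Every such file needs review, even one with a harmless-looking name such
    as ".gitignore": a program could read it and act on its contents.
    """
    return sorted(package_files - source_files)

def make_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed in-memory tar archive (example data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

if __name__ == "__main__":
    source = {"setup.py", "pkg/__init__.py"}
    package = make_tar({
        "setup.py": b"...",
        "pkg/__init__.py": b"",
        ".gitignore": b"payload",
    })
    print(unexpected_files(source, tar_file_names(package)))  # ['.gitignore']
```

A check like this only answers "which files are extra"; deciding whether an extra or differing file is harmless is the part the thread argues cannot be done reliably in general.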

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread David A. Wheeler
On Sun, 28 May 2023 21:10:36 -0700, Vagrant Cascadian wrote: > Do such tools actually exist, or are we talking about something > theoretical here? I am nervous about investing too much energy in > something without a specific, precise, working proof of concept. > > In your earlier mention

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread FC Stegerman
* FC Stegerman [2023-05-29 13:14]: [...] > > I find it hard to believe it could be so close that you can programmatically > > determine something is (probably!) mostly harmless and yet still have it > > be implausible to go all the way to make a properly reproducible build. > > > > That flies in the

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread Janneke Nieuwenhuizen
Vagrant Cascadian writes: > On 2023-05-28, David A. Wheeler wrote: >> On Sun, 28 May 2023 13:04:40 +0100, James Addison via rb-general >> wrote: >>> Thanks for sharing this. >>> >>> I think that the problem with this idea and name are: >>> >>> - That it does not allow two or more people to

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread FC Stegerman
* Vagrant Cascadian [2023-05-29 06:10]: [...] > I still expect it will be harder to actually do "semantically > reproducible builds" than "fully reproducible builds". > > To be honest, it sounds like a lot of extra work to avoid fixing things > properly... +1 > I find it hard to believe it

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread Nicolas Vigier
On Mon, 29 May 2023, Bernhard M. Wiedemann via rb-general wrote: > > I very much worry that the meaning of Reproducible Builds may gradually > > get whittled down > > I share this concern, which is why I have been calling this > semi-reproducible to distinguish it from bit-reproducible / >

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread Bernhard M. Wiedemann via rb-general
On 29/05/2023 06.10, Vagrant Cascadian wrote: Do such tools actually exist, or are we talking about something theoretical here? https://github.com/openSUSE/build-compare/ has been in use for 13 years. And strip-nondeterminism can be used to build another such tool. They will only ever be able to
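A comparison that tolerates some nondeterminism can be sketched as follows. This is an illustrative Python example only, not how build-compare or strip-nondeterminism are implemented: it compares two tar archives by member path, type, permission bits and content (or link target), while ignoring mtime, owner/group and member order. The function names are hypothetical, and which fields are safe to ignore is exactly the judgment call debated in this thread.

```python
import hashlib
import io
import tarfile

def tar_content_fingerprint(archive: bytes) -> dict[str, str]:
    """Map each member path to a digest of its type, permission bits and content.

    mtime, uid/gid, owner/group names and member order are deliberately left
    out, since they are common sources of harmless nondeterminism.
    """
    fingerprint = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar.getmembers():
            h = hashlib.sha256(member.type)          # e.g. b"0" for a regular file
            h.update(oct(member.mode).encode())      # executable bits can matter
            if member.isfile():
                h.update(tar.extractfile(member).read())
            elif member.issym() or member.islnk():
                h.update(member.linkname.encode())
            fingerprint[member.name] = h.hexdigest()
    return fingerprint

def same_ignoring_metadata(a: bytes, b: bytes) -> bool:
    """True if both archives hold the same paths with the same fingerprints."""
    return tar_content_fingerprint(a) == tar_content_fingerprint(b)

def build_example_tar(files: list[tuple[str, bytes]], mtime: int) -> bytes:
    """Build an in-memory tar archive with a fixed mtime (example data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

if __name__ == "__main__":
    first = build_example_tar([("a.txt", b"hello"), ("b.txt", b"world")], mtime=1685000000)
    second = build_example_tar([("b.txt", b"world"), ("a.txt", b"hello")], mtime=1685500000)
    print(first == second)                        # False: bytes differ
    print(same_ignoring_metadata(first, second))  # True: same paths and contents
```

Note the asymmetry: a bit-for-bit match needs no such list of ignored fields, whereas this comparison is only as sound as the list it hard-codes.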

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread Bernhard M. Wiedemann via rb-general
On 29/05/2023 05.25, David A. Wheeler wrote: If you have tips on common likely errors, please post, I think that would be of interest to many. https://github.com/openSUSE/build-compare/issues/53 https://github.com/openSUSE/build-compare/issues/33

Re: Introducing: Semantically reproducible builds

2023-05-29 Thread ahojlm
On Sun, May 28, 2023 at 09:10:36PM -0700, Vagrant Cascadian wrote: > To be honest, it sounds like a lot of extra work to avoid fixing things > properly... +1 Guessing whether some differences have any semantic effect does not look to me like a generally solvable problem. As others already

Re: Introducing: Semantically reproducible builds

2023-05-28 Thread Vagrant Cascadian
On 2023-05-28, David A. Wheeler wrote: > On Sun, 28 May 2023 13:04:40 +0100, James Addison via rb-general > wrote: >> Thanks for sharing this. >> >> I think that the problem with this idea and name are: >> >> - That it does not allow two or more people to share and confirm that >> they have

Re: Introducing: Semantically reproducible builds

2023-05-28 Thread David A. Wheeler
On Sun, 28 May 2023 08:02:18 +0200, "Bernhard M. Wiedemann via rb-general" wrote: > I agree that it is good to give it a name (I have called it > semi-reproducible before), but we should be clear on communicating the > disadvantages. Agreed. > However, while working with the tool, I

Re: Introducing: Semantically reproducible builds

2023-05-28 Thread David A. Wheeler
On Sat, 27 May 2023 15:24:25 +0200, kpcyrd wrote: > I think semantically reproducible builds is going to be more expensive > in the long run. I think my intended use case is really different from what you're expecting. In my use case, the "expense" is irrelevant. I'm primarily trying to

Re: Introducing: Semantically reproducible builds

2023-05-28 Thread David A. Wheeler
On Sun, 28 May 2023 13:04:40 +0100, James Addison via rb-general wrote: > Hi David, > > Thanks for sharing this. > > I think that the problem with this idea and name are: > > - That it does not allow two or more people to share and confirm that > they have the same build of some software.

Re: Introducing: Semantically reproducible builds

2023-05-28 Thread Clemens Lang
On Fri, May 26, 2023 at 04:06:44PM -0400, David A. Wheeler wrote: > Reproducible builds are great for showing that a package really was > built from some given source, but sometimes they're hard to do. > > If your primary goal is to determine where the major risks are from > subverted builds, I

Re: Introducing: Semantically reproducible builds

2023-05-28 Thread James Addison via rb-general
Hi David, Thanks for sharing this. I think that the problem with this idea and name are: - That it does not allow two or more people to share and confirm that they have the same build of some software. - That it does not allow tests to fail-early, catching and preventing reproducibility

Re: Introducing: Semantically reproducible builds

2023-05-28 Thread Bernhard M. Wiedemann via rb-general
I agree that it is good to give it a name (I have called it semi-reproducible before), but we should be clear on communicating the disadvantages. In openSUSE we have been working towards repeatable semantically reproducible builds for over a decade [1] using our open-build-service and a

Re: Introducing: Semantically reproducible builds

2023-05-27 Thread kpcyrd
> It's much easier (and lower cost) for software > developers to create a semantically reproducible build instead of always > creating a fully reproducible build. > Fully reproducible builds are still a gold standard for verifying > that a build has not been tampered with. > However, creating

Re: Introducing: Semantically reproducible builds

2023-05-27 Thread Eric Myhre
I could see myself supporting this. It seems appropriate for the weaker term to require more words (thereby teeing up the opportunity to point out the distinction, which will remain important to do as part of urging further progress). And this proposal does fit that criterion! Cheers! On

Introducing: Semantically reproducible builds

2023-05-26 Thread David A. Wheeler
Reproducible builds are great for showing that a package really was built from some given source, but sometimes they're hard to do. If your primary goal is to determine where the major risks are from subverted builds, I think a useful backoff is something called a "semantically reproducible