git quirk: core.autocrlf
Hi folks,

This message isn't _directly_ related to reproducible builds, but it does relate to unexpected differences in text (including, potentially, source code) checked out from git repositories, and I think that could be relevant to the audience here.

Some of the code within the Sphinx documentation generator removes carriage-return ('\r' in Python string literal escape notation) characters from input documents before checksumming them, and that part of the code puzzled me - generally any kind of content modification before checksumming seems like a code smell to me. The relevant code removes those carriage-returns so that the checksums produced are, in a sense, cross-platform compatible; that is, the 'same content' produces the same checksum whether the platform uses CRLF or LF line-endings.

Now, Python itself does include some functionality[1] to handle what it refers to as 'universal newlines'; newlines in strings are generally represented using a single '\n' character that is serialized and deserialized to CRLF or LF as platform-appropriate. This is stable, mature and well-established behaviour at this point.

That universal newline handling may cause problems in some cases if not handled carefully, but surprisingly -- at least to me -- 'git' itself also automatically converts the line-endings of files to the local platform's standard. I suppose this makes sense so that developer tooling designed for each platform works as-expected with text stored in git repositories (which, internally, store newlines using LF). However, it does mean that the checksums of files checked out from the same origin git repository can differ on different OS platforms.

Overriding this behaviour on a per-file basis is possible using .gitattributes config[2] file(s) within the repository, or alternatively a git client system can use the 'core.autocrlf' configuration setting[3] to specify the desired line-ending-conversion method.
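To make the cross-platform checksum idea concrete, here's a minimal Python sketch of the normalize-then-hash approach (illustrative only; Sphinx's actual implementation differs in detail, and the function name is mine):

```python
import hashlib

def checksum_normalized(text: str) -> str:
    # Remove carriage-returns before hashing, so that CRLF and LF
    # checkouts of the same content produce the same checksum.
    return hashlib.sha256(text.replace("\r", "").encode("utf-8")).hexdigest()

crlf = "line one\r\nline two\r\n"  # e.g. checked out with core.autocrlf=true
lf = "line one\nline two\n"        # e.g. checked out on an LF platform

# The raw bytes differ, but the normalized checksums match:
assert checksum_normalized(crlf) == checksum_normalized(lf)
```

Without the normalization step, hashing the raw bytes of each checkout would produce two different digests for the 'same' file.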
Again: this is probably slightly off-topic and perhaps not of direct relevance to anyone on the list today. However, it seems like the kind of issue that is useful to be aware of if-and-when puzzling over unexpected git content / checksum issues (situations that I _do_ expect people on this list encounter from time-to-time).

Regards,
James

[1] - https://docs.python.org/3.12/glossary.html#term-universal-newlines
[2] - https://git-scm.com/docs/gitattributes
[3] - https://git-scm.com/docs/git-config#Documentation/git-config.txt-coresafecrlf

PS: For anyone concerned that this might inadvertently expose some kind of checksumming vulnerability: I briefly worried about that after determining the line-ending behaviour to be the cause. Padding of source files with carriage-returns could be a way for bad actors to attempt to find checksum collisions, yes; but equally, newlines -- or spaces -- are available to achieve the same. Are there any languages that attempt to prevent arbitrary source code padding so that checksum-space-exploration from a known code plaintext is constrained? Golang and other languages that require or support autoformatting may be the safest bets.
Re: Arch Linux minimal container userland 100% reproducible - now what?
Hi John,

On Fri, 29 Mar 2024 at 19:29, John Gilmore wrote:
>
> kpcyrd wrote:
> > 1) There's currently no way to tell if a package can be built offline
> > (without trying yourself).
>
> Packages that can't be built offline are not reproducible, by
> definition. They depend on outside events and circumstances
> in order for a third party to reproduce them successfully.
>
> So, fixing that in each package would be a prerequisite to making a
> reproducible Arch distro (in my opinion).

This perspective is valuable because it is certainly true that unreliable or unexpected responses from a network adapter could cause software builds to fail, be delayed, or contain errors. However, I fail to see why any of those circumstances would not be equally possible in the case of equivalent responses from physically or locally attached I/O devices.

A storage device could be considered a node on a local network that no other host is able to communicate with directly; and to my knowledge it's rarely the case that traffic to-and-from local storage devices is inspected for integrity by hardware/software outside of the device that it is connected to (which isn't necessarily the place that it makes sense to run those checks).

My guess is that we could get into near-unsolvable philosophical territory along this path, but I think it's worth being skeptical of the notions that local storage is always trustworthy and that the network should always be avoided.

Regards,
James
Re: Two questions about build-path reproducibility in Debian
Thanks, Chris,

On Sun, 31 Mar 2024 at 13:01, Chris Lamb wrote:
>
> Hi James,
>
> > Approximately thirty are still set to other severity levels, and I plan to
> > update those with the following adjusted messaging […]
>
> Looks good to me. :)
>
> Completely out of interest, are any of those 30 bugs tagged both
> "buildpath" and "toolchain"? It's written nowhere in Policy (and I
> can't remember if it's ever been discussed before), but if package X
> is causing package Y to be unreproducible, I feel that has some
> bearing on the severity of the bug for that issue filed against X…
> completely independent of whether package X is reproducible itself or
> not. :)

None of the remaining thirty-or-so (and in fact, none of the 66 updated so far) are usertagged both 'buildpath' and 'toolchain'. I would say that a few of them _are_ 'toolchain packages' -- mono, binutils-dev and a few others -- but for these bugs the buildpath issues are internal to each package at build-time and do not affect the construction of other packages in their ecosystem.

> Just to underscore that this is simply my curiosity before you
> reassign: in the particular case of *buildpath* AND toolchain, these
> should almost certainly be wishlist anyway because, as discussed, we
> "aren't testing buildpath".

Mostly agree. Of the bugs in Debian that _are_ usertagged both buildpath and also toolchain, a few of them appear to have possible known/tested fixes, but in some cases are awaiting maintainer/upstream support. Using a static buildpath seems like it should mitigate most concern there, but if that were not the case, then the severity of those could perhaps be re-argued based on the quantity, popularity and importance of affected software (packaged or otherwise).

Regards,
James
Re: Two questions about build-path reproducibility in Debian
Hi again,

On Mon, 11 Mar 2024 at 18:24, James Addison wrote:
>
> Hi folks,
>
> On Wed, 6 Mar 2024 at 01:04, James Addison wrote:
> > [ ... snip ... ]
> >
> > The Debian bug severity descriptions[1] provide some more nuance, and that
> > reassures me that wishlist should be appropriate for most of these bugs
> > (although I'll inspect their contents before making any changes).
>
> Please find below a draft of the message I'll send to each affected bugreport.
>
> Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_
> continue to test build-path variance, at least until we decide otherwise.
>
> --- BEGIN DRAFT ---
> Because Debian builds packages from a fixed build path, customized build paths
> are _not_ currently evaluated by the 'reprotest' utility in Salsa-CI, or during
> package builds on the Reproducible Builds team's package test infrastructure
> for Debian[1].
>
> This means that this package will pass current reproducibility tests; however
> we still believe that source code and/or build steps embed the build path into
> binary package output, making it more difficult than necessary for independent
> consumers to confirm whether their local compilations produce identical binary
> artifacts.
>
> As a result, this bugreport will remain open and be assigned the 'wishlist'
> severity[2].
>
> ...
>
> [1] - https://tests.reproducible-builds.org/debian/reproducible.html
> [2] - https://www.debian.org/Bugs/Developer#severities
> --- END DRAFT ---

Most of the remaining buildpath bugs have been updated to severity 'wishlist'. Approximately thirty are still set to other severity levels, and I plan to update those with the following adjusted messaging:

--- BEGIN DRAFT ---
Control: severity -1 wishlist

Dear Maintainer,

Currently, Debian's buildd and also the Reproducible Builds team's testing infrastructure[1] both use a fixed build path when building binary packages.
This means that your package will pass current reproducibility tests; however we believe that varying the build path still produces undesirable changes in the binary package output, making it more difficult than necessary for independent consumers to check the integrity of those packages by rebuilding them themselves.

As a result, this bugreport will remain open and be re-assigned the 'wishlist' severity[2].

You can use the 'reprotest' package build utility - either locally, or as provided in Debian's Salsa continuous integration pipelines - to assist in uncovering reproducibility failures due to build-path variance.

For more information about build paths and how they can affect reproducibility, please refer to: https://reproducible-builds.org/docs/build-path/

...

[1] - https://tests.reproducible-builds.org/debian/reproducible.html
[2] - https://www.debian.org/Bugs/Developer#severities
--- END DRAFT ---

Thanks for your feedback and suggestions,
James
Re: Two questions about build-path reproducibility in Debian
Hi folks,

On Wed, 6 Mar 2024 at 01:04, James Addison wrote:
> [ ... snip ... ]
>
> The Debian bug severity descriptions[1] provide some more nuance, and that
> reassures me that wishlist should be appropriate for most of these bugs
> (although I'll inspect their contents before making any changes).

Please find below a draft of the message I'll send to each affected bugreport.

Note: I confused myself when writing this; in fact Salsa-CI reprotest _does_ continue to test build-path variance, at least until we decide otherwise.

--- BEGIN DRAFT ---
Because Debian builds packages from a fixed build path, customized build paths are _not_ currently evaluated by the 'reprotest' utility in Salsa-CI, or during package builds on the Reproducible Builds team's package test infrastructure for Debian[1].

This means that this package will pass current reproducibility tests; however we still believe that source code and/or build steps embed the build path into binary package output, making it more difficult than necessary for independent consumers to confirm whether their local compilations produce identical binary artifacts.

As a result, this bugreport will remain open and be assigned the 'wishlist' severity[2].

...

[1] - https://tests.reproducible-builds.org/debian/reproducible.html
[2] - https://www.debian.org/Bugs/Developer#severities
--- END DRAFT ---
Re: Two questions about build-path reproducibility in Debian
Hi Vagrant,

Narrowing in on (or perhaps nitpicking) a detail:

On Mon, 4 Mar 2024 at 20:41, Vagrant Cascadian wrote:
>
> On 2024-03-04, John Gilmore wrote:
> > Vagrant Cascadian wrote:
> >> > > to make it easier to debug other issues, although deprioritizing them
> >> > > makes sense, given buildd.debian.org now normalizes them.
> >
> > James Addison via rb-general wrote:
> >> Ok, thank you both. A number of these bugs are currently recorded at
> >> severity level 'normal'; unless told not to, I'll spend some time to
> >> double-check their details and - assuming all looks OK - will bulk
> >> downgrade them to 'wishlist' severity a week or so from now.
>
> Well, I think we should change it to "minor" rather than "wishlist"
> severity, but that may be splitting hairs; I do not find a huge amount
> of difference between debian bug severities... they are pretty much
> either critical/serious/grave and thus must be fixed, or
> normal/minor/wishlist and fixed when someone feels like it.

The Debian bug severity descriptions[1] provide some more nuance, and that reassures me that wishlist should be appropriate for most of these bugs (although I'll inspect their contents before making any changes).

Regards,
James

[1] - https://www.debian.org/Bugs/Developer#severities
Re: Two questions about build-path reproducibility in Debian
On Wed, 28 Feb 2024 at 12:06, Chris Lamb wrote:
>
> Vagrant Cascadian wrote:
>
> > There are real-world build path issues, and while it is possible to work
> > around them in various ways, I think they are still issues worth fixing
> > to make it easier to debug other issues, although deprioritizing them
> > makes sense, given buildd.debian.org now normalizes them.
>
> +1.
>
> And for this reason, I think we should keep the buildpath-related
> bugs as well. They should all be 'wishlist' priority anyway, and I
> wouldn't like to bet my hat that the usertag metadata is accurate and
> comprehensive enough to blindly close them in the first place. (We
> only really used the usertags to do some rough-and-ready statistics
> on broad issue categories.)

Ok, thank you both. A number of these bugs are currently recorded at severity level 'normal'; unless told not to, I'll spend some time to double-check their details and - assuming all looks OK - will bulk downgrade them to 'wishlist' severity a week or so from now.
Re: reprotest: inadvertent misconfiguration in salsa-ci config
Hi Chris, Vagrant,

On Tue, 27 Feb 2024 at 17:44, Vagrant Cascadian wrote:
>
> On 2024-02-27, Chris Lamb wrote:
> >> * Update reprotest to handle a single-disabled-variations-value as a
> >> special case - treating it as vary and/or emitting a warning.
>
> Well, I would broaden this to include an arbitrary number of negating
> options:
>
> --variations=-time,-build_path
>
> That seems just as invalid.
>
> The one special case I could see is "--variations=-all" where you might
> want to be normalizing as much as possible.

Hmm, yep. So when there are only subtractions, we _could_ imply that there is an implicit '+all' at the beginning of the 'variations' argument. And along that line of thinking, we could emit a warning to stderr:

    $ reprotest auto --dry-run --variations=-timezone
    Implicitly expanding variations '-timezone' to '+all,-timezone'
    ...

> > On whether to magically/transparently fix this, needless to say, it's
> > considered bad practice to change the behaviour of software that has
> > already been released — I would, as a rule, subscribe to that idea.
> > However, we should bear in mind that this idea revolves around what
> > users are *expecting*, not necessarily what the software actually
> > does.
> >
> > I say that because I hazard that all 400 usages are indeed expecting
> > that `--variations=-foo` functions the same as `--variations=all,-foo`
> > (or `--vary=-foo`), and so this proposed change would merely be
> > modifying reprotest to reflect their existing expectations. It would
> > not therefore be a violation of the "don't break existing
> > functionality" dictum.
> >
> > (Saying that, the addition of a warning that we are doing so would
> > definitely not go amiss.)
>
> Hrm. Less inclined toward this approach; expectations can shift with
> time and context and culture and whatnot. That said, I agree the current
> behavior is confusing, and we should change something explicitly, rather
> than implicitly...
Changing existing behaviours could arguably be even more problematic for cases like this where we're talking about continuous integration checks. Breaking/unbreaking unrelated CI pipelines seems like something we should be careful to avoid.

> >> * Treat removal of a variance factor from an already-empty-context
> >> as an error.
> >
> > I'm also tempted by this as well. :) How would this be experienced by
> > most DDs? Would their new pushes to Salsa now suddenly fail in the
> > reprotest job of the pipeline? If so, that's not too awful, given that
> > the prominent error message would presumably let them know precisely
> > how to fix it.
>
> I would much prefer an error message if we can correctly identify this.

That'd be nice - perhaps something like:

    Failed to parse variations: '-timezone'; did you mean '+all,-timezone'?

I've opened a merge request[1] to explore this error-treatment approach; it lacks useful error messaging so far, but I'll attempt to add that soon.

> Some possible expected behaviors to consider treating as invalid, and
> issue an error:
>
> --variations=-build_path
>
> --variations=-time,-build_path
>
> This almost makes me want to entirely deprecate --variations, and switch
> to recommending "--vary=-all,+whatever" or "--vary=-all
> --vary=+whatever" instead of ever using --variations.
>
> I'm not sure the variations syntax enables much that cannot be more
> unambiguously expressed with --vary.

I do think that supporting two command-line argument names that provide similar operations (and use similar names!) is confusing. However, I'm inclined to limit the effect of any behaviour changes here to the specific cases that we know are problematic (ref previous thoughts about CI infrastructure).

> That said, the reprotest code is a bit hairy, and I am not sure what
> sort of refactoring will be needed to make this possible. In particular,
> how --auto-build is implemented, where it systematically tests each
> variation one at a time.
> Refactoring might be needed regardless. :)

That's a neat bit of functionality in auto-build. As far as I can tell, it seems agnostic of whether the build specifications are provided by 'vary' or 'variations' -- but test coverage would be better at confirming that.

Regards,
James
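The implicit '+all' expansion discussed in this thread could be sketched roughly as follows (a hypothetical illustration in Python, not reprotest's actual code; the function name is mine):

```python
import sys

def expand_variations(spec: str) -> str:
    # If every comma-separated entry is a subtraction, assume the user
    # meant to subtract from 'all', and warn about the expansion.
    parts = [p for p in spec.split(",") if p]
    if parts and all(p.startswith("-") for p in parts):
        expanded = "+all," + ",".join(parts)
        print(f"Implicitly expanding variations '{spec}' to '{expanded}'",
              file=sys.stderr)
        return expanded
    return spec
```

With this behaviour, '--variations=-timezone' would become '+all,-timezone', matching the apparent intent of the affected configurations, while values that already contain additions would pass through unchanged.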
reprotest: inadvertent misconfiguration in salsa-ci config
Hello,

A few hundred packages that use reprotest in Salsa-CI appear to be misconfigured; the remainder of this message explains the problem, and asks for help figuring out what to do.

Context
---

The reprotest[1] utility tests reproducibility of .deb package builds by performing two comparative builds with selective differences in the environment.

As documented[2], the extent of build-env difference can be customized using the 'variations' command-line argument, which has a default value of 'all', or similarly the 'vary' argument. These arguments can be used together, and they support plus-or-minus symbols as value prefixes (+/-) to indicate whether a variance factor is being added or removed.

The reprotest command line is parsed in sequence from left-to-right, with each 'vary' argument applied like a patch -- amending existing settings -- while in contrast each 'variations' argument performs a complete reset of the variance context.

To examine/confirm reprotest's behaviour locally I can recommend its '--dry-run' argument, instructing it to print what it would do without performing any build actions.

Problem: misconfiguration case
--

Although the single argument '--variations=-timezone' could reasonably be expected to disable a single form of variance (timezone) during a test, in fact it resets the variance context to empty (it does not contain 'all'; it begins with an empty context, and then attempts a no-op removal of timezone from that). This could allow packages to succeed when they would otherwise fail if the intended level of build variation was enabled.

This misconfiguration has occurred in practice, and based on some code searches (example[3]) I believe that around 400 Debian packages are affected by this.

Resolution
--

My working assumption is that packages that have a single negative-variations entry (like the -timezone example above) intended to disable solely the named factor during reprotest testing.
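The patch-vs-reset distinction above can be modelled in a few lines of Python (a simplified sketch for illustration; the variance-factor set is abbreviated, and this is not reprotest's actual parsing code):

```python
ALL = {"time", "timezone", "build_path", "user_group"}  # abbreviated factor set

def apply_argument(context: set, spec: str, reset: bool) -> set:
    # reset=True models a '--variations' argument (discards accumulated
    # settings); reset=False models '--vary' (patches the existing context).
    if reset:
        context = set()
    for part in spec.split(","):
        name = part.lstrip("+-")
        if name == "all":
            context = set() if part.startswith("-") else set(ALL)
        elif part.startswith("-"):
            context.discard(name)  # silently a no-op if absent: the bug case
        else:
            context.add(name)
    return context

# '--variations=-timezone': reset, then a no-op removal -> nothing varies.
assert apply_argument(set(ALL), "-timezone", reset=True) == set()
# '--vary=-timezone': patch the default 'all' context -> all but timezone vary.
assert apply_argument(set(ALL), "-timezone", reset=False) == ALL - {"timezone"}
```

This models why the single argument disables all variance rather than just the named factor.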
To resolve this it seems that we could:

* Update the salsa-ci.yml files in each affected case to replace '--variations=-' with '--vary=-'.
* Update reprotest to handle a single-disabled-variations-value as a special case - treating it as vary and/or emitting a warning.
* Treat removal of a variance factor from an already-empty-context as an error.
* Radically, remove the ability for packages to customize their reprotest arguments at all.

To readers of these lists: does this analysis and set of assumptions make sense, and if so: do you prefer/recommend any of the suggested approaches, or have alternative suggestions of your own?

Thank you,
James

[1] - https://salsa.debian.org/reproducible-builds/reprotest
[2] - https://salsa.debian.org/reproducible-builds/reprotest/-/blob/6cb0328ea422e12d115737714627850745f93a71/README.rst?plain=1#L299-311
[3] - https://codesearch.debian.net/search?q=path%3Asalsa-ci.yml+SALSA_CI_REPROTEST_ARGS%3A+%27--variations%3D-build-path%27=1
Two questions about build-path reproducibility in Debian
Hi folks,

A quick recap: in July 2023, Debian's package build infrastructure (buildd) intentionally began using a fixed directory path during package builds (bug #1034424). Previously, some string randomness existed within each source build directory path.

I've two questions related to buildpaths - one relevant to the Salsa-CI team, and the other a RB-team housekeeping question:

1. [Salsa] Recently Debian's CI pipeline was reconfigured[1] to enable more variance in builds. However: I think that change also (inadvertently?) enabled buildpath variation. Is that useful and/or aligned with Debian package migration incentives[2] -- or should we disable that buildpath variance?

2. [RB] Housekeeping: we use Debian's bugtracker to record packages with buildpath-related build problems[3]. Do we want to keep those bugs open, or should we close them?

Thanks,
James

[1] - https://salsa.debian.org/salsa-ci-team/pipeline/-/merge_requests/468
[2] - "Reproducibility migration policy" @ https://lists.debian.org/debian-devel-announce/2023/12/msg3.html
[3] - https://udd.debian.org/bugs/?release=any=ign=ign=ign=7=7=only=buildpath=reproducible-builds%40lists.alioth.debian.org=1=id=asc=html#results
Re: Introducing: Semantically reproducible builds
Hi David,

Thanks for sharing this. I think that the problems with this idea and name are:

- That it does not allow two or more people to share and confirm that they have the same build of some software.
- That it does not allow tests to fail early, catching and preventing reproducibility regressions (semantic or otherwise).
- That the naming conflates with true reproducible builds, therefore creating the potential for misunderstanding among consumers.

Cheers,
James
Re: Sphinx: localisation changes / reproducibility
On Wed, 26 Apr 2023 at 18:48, Vagrant Cascadian wrote:
>
> On 2023-04-26, James Addison wrote:
> > On Tue, 18 Apr 2023 at 18:51, Vagrant Cascadian wrote:
> >> > James Addison wrote:
> >> This is why in the reproducible builds documentation on timestamps,
> >> there is a paragraph "Timestamps are best avoided":
> >>
> >> https://reproducible-builds.org/docs/timestamps/
> >>
> >> Or as I like to say "There are no timestamps quite like NO timestamps!"
> >
> > I see a parallel between the use of timestamps as a key for
> > data-lookup (as in Holger's developers-reference package), and the use
> > of locale as a similar data-lookup key (as in the case of localised
> > documentation builds).
> >
> > I'm not sure what the equivalent approach is for localisation, though.
> > Command-line software, for example, requires at least one written
> > natural-language to be usable, and as a second use case, providing
> > natural-language documentation with software is highly recommended (is
> > it part of the software? maybe not. but a sufficiently-confusing
> > poorly-translated error message could be as serious as a code-related
> > bug, I think?).
> >
> > Linking back to my recent experience with Sphinx, and from the
> > perspective of allowing-users-to-verify-their-software, I'd tend to
> > think that an ideally-produced, reproducible, localised software would
> > include _all_ available translations in the build artifact. Some of
> > that could be retrieved at runtime (gettext, for example), and some
> > could be static (file-backed HTML documentation, where runtime lookups
> > might not be so straightforward).
>
> I struggle to see the parallel.
> A timestamp is an arbitrary value based on when you built it, whereas
> the locale-rendered document should be reproducibly translated based on
> the translations you have available at the time you run whatever process
> generates the translated version of the document/binary, and regardless
> of the locale of the build environment.

Ok, I think I understand. Please check my understanding, though: I interpret your perspective as matching the ideal-world scenario that John outlined, where the SOURCE_DATE_EPOCH value has no effect at all on the output of the build.

Until then, I see both the build-time (SOURCE_DATE_EPOCH) and build-locale as inputs that do affect the output of software build systems, and believe that relevant guidance could help projects migrate towards reproducibility.

> With runtime translation, you would be desiring translation from the
> source language to the operating locale of the environment you've called
> it in... but that should still be systematic, no?

Runtime translation should be systematic, yes. So recommending that projects use runtime translation (instead of compiling-in separate source files for each language) is good advice.

> While there almost certainly might be more than one legitimate
> translation for a given work, your process for rendering it should
> really only have one particular output given a particular input
> (e.g. the source language input and the descriptions of how to translate
> it to the desired language)... barring, of course, bugs in the system
> ... or am i missing something entirely?

No, I don't think you missed anything, and I think we have the same understanding of the components. We're likely arriving from different perspectives on the problem space.
My question is approximately this: for some source software developed in a natural language that I don't read or understand, and that includes statically-built documentation (say, HTML files for example), could I determine that the distributed software (an installer file downloaded from the web, for example) recommended to me because it includes support for a natural language that I _do_ understand is identical to the one in the developers' own natural language?

(and I think that yes, it's possible: build the source to include the content from all available languages, and distribute that single copy; the translations may be better or worse in some areas, but we can all agree that it is not only the same source, but the same build of that source)

> Unless, I guess, you're using some Machine Learning model to produce
> your translations?

... well, in honesty I think that Machine Learning could -- and in many cases, perhaps should -- be encouraged towards deterministic/repeatable behaviour. But that's probably a conversation for another thread.
Re: Sphinx: localisation changes / reproducibility
On Sun, 16 Apr 2023 at 00:25, John Gilmore wrote:
>
> James Addison via rb-general wrote:
> > In general, we should be able to
> > pick two times, "s" and "t", s <= t, where "s" is the
> > source-package-retrieval time, and "t" is the build-time, and using
> > those, any two people should be able to create exactly the same
> > (bit-for-bit) documentation. I think that SOURCE_DATE_EPOCH generally
> > refers to "t".
>
> I think that SOURCE_DATE_EPOCH generally refers to the check-IN time of
> each of the source package(s) being rebuilt. You can retrieve the
> packages anytime later than that, and you can do the build at any time
> later, and SOURCE_DATE_EPOCH should not change (and the built binaries
> and docs should also not change).

When the goal is to build the software as it was available to the author at the time of code commit/check-in - and I think that that is a valid use case - then that makes sense.

(this ignores a subtlety in the hypothetical case where multiple independent software authors could be tasked with writing to a specification but without access to each others' source dependencies -- in that case there is no notion of the "software as it was available to [each] author". using FOSS licensing and publication solves most of that problem, excepting situations where individual authors are out-of-contact during software development)

Inverting the question somewhat: if a single source-base is rebuilt using two different SOURCE_DATE_EPOCH values (let's say, 1970-01-01 and 2023-04-18), then what are the expected/valid differences in the resulting output?
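For context, the conventional consumer-side pattern (per the Reproducible Builds SOURCE_DATE_EPOCH specification) is that any timestamp embedded in build output derives from that variable rather than from the wall clock -- so changing the value should change only the embedded dates, nothing else. A minimal Python sketch:

```python
import os
import time

def build_date() -> str:
    # Prefer SOURCE_DATE_EPOCH over the current time, and use UTC so the
    # result is also independent of the build host's timezone.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
    return time.strftime("%Y-%m-%d", time.gmtime(epoch))

os.environ["SOURCE_DATE_EPOCH"] = "0"           # the 1970-01-01 case above
assert build_date() == "1970-01-01"
os.environ["SOURCE_DATE_EPOCH"] = "1681776000"  # the 2023-04-18 case above
assert build_date() == "2023-04-18"
```

Under this pattern, two builds differing only in SOURCE_DATE_EPOCH should differ exactly in such derived dates.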
Re: Sphinx: localisation changes / reproducibility
On Fri, 14 Apr 2023 at 19:51, Holger Levsen wrote:
>
> Dear James,
>
> many thanks also from me for your work on this and sharing your findings here.
>
> I'm another happy sphinx user affected by those problems. :)

Thanks, Holger - I think I made a bit of a (verbose) mess of this particular bugfix attempt, but it's a learning experience I suppose :)

> somewhat related:
>
> i'm wondering whether distro-info should respect SOURCE_DATE_EPOCH:
> src:developers-reference builds different content based on the build
> date, due to using distro-info and distro-info knows that in 398 days
> trixie will be released :)))
> see
> https://tests.reproducible-builds.org/debian/rb-pkg/bookworm/arm64/diffoscope-results/developers-reference.html

Although it's probably an alternative understanding of the goal, I (possibly mis)interpret the end result of reproducible builds as: reliable and complete support for time-traveling software users.

So: it's a slow, rainy Tuesday in early Y2045, and there's a report that someone's saved game file from Y2019 doesn't load correctly. Well, we could start by retrieving the sources and building the game as it existed back in Y2019 - does it load in that version?

In practice, built software could depend on the software and tools installed locally (in a true-ish sense of location), not only time -- but by using time as a universal and ordered clothesrack (?) onto which software can be unambiguously placed (and then non-destructively and freely copied-from), we can all achieve matching software costumes if and when we want to.

To return more directly to your question, though: the release date information seems to be in distro-info-data, and I guess that that itself is also updated over time. In general, we should be able to pick two times, "s" and "t", s <= t, where "s" is the source-package-retrieval time, and "t" is the build-time, and using those, any two people should be able to create exactly the same (bit-for-bit) documentation.
I think that SOURCE_DATE_EPOCH generally refers to "t". In practice some variance of "s" is allowed for the build dependencies, in the absence of output-affecting bugs/changes relevant to the build.

If something is producing different results across builds that share the same fixed "s" and "t" (and I think that the RB dashboard indicates that that is happening for developers-reference?), then either something isn't respecting SOURCE_DATE_EPOCH, or something else in the build environment is affecting the build output.

Long story short: that's probably not very helpful -- I have retrieved a few of the sources, but I don't know where the problem is yet. I'll take more of a look soon.

(also: although it's not DFSG-compatible, a game that springs to mind in my case, albeit Y2009 or so probably, would be the original Knights of The Old Republic game. not so much a load-game problem, more of a logic error in the game data that meant that I was stuck on some planet without being able to achieve the assigned objectives. I would _probably_ still remember enough of the details to figure out replication details for that bug, given a refresher on some of the levels and objectives involved. that'll have to be a very rainy day)

Cheers,
James
Re: Sphinx: localisation changes / reproducibility
A follow-up: after doing more work to try to confirm the behaviour of the fix -- something I should have done before even starting development! -- I was confused that I couldn't replicate the original problem when using a version of the codebase _before_ my proposed fix pr#10949 was applied.

I now believe that the issue had, in fact, already been fixed between versions v4.5.0 and v5.0.0 of Sphinx - so I've offered a revert of my changes. Details about tracing the location of the existing fix (using 'git bisect') can be found here: https://github.com/sphinx-doc/sphinx/issues/9778#issuecomment-1501172176

Attempting to learn from this: I find the experience interesting because it seems that I deluded myself into believing that I'd resolved a bug -- and unfortunately drew in a bunch of other people's time doing that -- when, in fact, following good time-and-version-aware engineering practices (not always easy when keen to contribute to a project, but as demonstrated here, potentially very important) could have avoided much of that work in the first place. The original bugreport was detailed and quite clear, so I think that much of the fault here was from me not carefully reading and considering how to proceed.

There could be other learnings / recommendations (I'm mulling over some ideas related to continuous bug-checking), but I'm unable to make any clear recommendations there yet, partly because I think that bug-checking-and-verification, while valuable, is not always considered desirable or economically-beneficial work -- and partly because writing test cases to reproduce bugs (which could make continuous evaluation of stale bugs easier) often performs 80%+ (guesstimate, in my opinion) of the work to find the cause of the bug -- meaning that it's frequently worthwhile to combine with fixing the bug (in other words: there's often a significant overlap between developing a bug reproduction test case and fixing the bug).
Probably nothing new to many of the folks on this mailing list and/or seasoned software engineers generally, but I figured I'd try to document my findings :)

On Sat, 8 Apr 2023 at 11:10, James Addison wrote:
>
> Hi folks,
>
> A set of reproducible-build-related changes[1] that I've developed for
> sphinx (a documentation project generator) have been accepted for
> inclusion in v6.2.0 of sphinx.
>
> I'm optimistic that those changes can address a sizable category[2] of
> reproducible build failures related to translation of documentation
> during software builds (reproducible build testing intentionally
> varies the host LANGUAGE setting to shake out unintended sources of
> build variation, and some sphinx projects fail rb-tests due to that).
>
> However.. with the changes merged (although not yet released) I'm
> beginning to have some doubts about them.
>
> The positive effect of the changes is that I expect they'll help to
> confirm and achieve reproducibility for a good chunk of remaining
> non-reproducible software.
>
> The downside is: disabling localization -- or perhaps more accurately:
> emitting all documentation for each project using 'null'[3]
> translation -- seems like a fairly blunt, and perhaps unwelcome (for
> consumers) way to achieve reproducibility.
>
> A longer, better path to achieve reproducibility would be to support
> building documentation in _all_ available translated locales during
> Sphinx project builds (something that is not yet supported -- and with
> at least one component that I'm aware of (objects.inv) that doesn't
> seem to support multi-language content). Doing that should produce
> output artifacts that users of any supported locale can access in a
> relevant localised way, and that can be made bit-for-bit consistent.
>
> In summary: I'm writing partly optimistically because I think the
> merged changes could improve and help to confirm reproducibility of
> software during testing. But I also feel a bit conflicted about the
> way I've approached the changes and their implications, so I'm also
> keen to gather feedback and thoughts.
>
> Thank you,
> James
>
> [1] - https://github.com/sphinx-doc/sphinx/pull/10949
>
> [2] - https://tests.reproducible-builds.org/debian/issues/unstable/sphinxdoc_translations_issue.html
>
> [3] - https://docs.python.org/3/library/gettext.html#the-nulltranslations-class
Sphinx: localisation changes / reproducibility
Hi folks,

A set of reproducible-build-related changes[1] that I've developed for sphinx (a documentation project generator) have been accepted for inclusion in v6.2.0 of sphinx.

I'm optimistic that those changes can address a sizable category[2] of reproducible build failures related to translation of documentation during software builds (reproducible build testing intentionally varies the host LANGUAGE setting to shake out unintended sources of build variation, and some sphinx projects fail rb-tests due to that).

However.. with the changes merged (although not yet released) I'm beginning to have some doubts about them.

The positive effect of the changes is that I expect they'll help to confirm and achieve reproducibility for a good chunk of remaining non-reproducible software.

The downside is: disabling localization -- or perhaps more accurately: emitting all documentation for each project using 'null'[3] translation -- seems like a fairly blunt, and perhaps unwelcome (for consumers) way to achieve reproducibility.

A longer, better path to achieve reproducibility would be to support building documentation in _all_ available translated locales during Sphinx project builds (something that is not yet supported -- and with at least one component that I'm aware of (objects.inv) that doesn't seem to support multi-language content). Doing that should produce output artifacts that users of any supported locale can access in a relevant localised way, and that can be made bit-for-bit consistent.

In summary: I'm writing partly optimistically because I think the merged changes could improve and help to confirm reproducibility of software during testing. But I also feel a bit conflicted about the way I've approached the changes and their implications, so I'm also keen to gather feedback and thoughts.
Thank you,
James

[1] - https://github.com/sphinx-doc/sphinx/pull/10949

[2] - https://tests.reproducible-builds.org/debian/issues/unstable/sphinxdoc_translations_issue.html

[3] - https://docs.python.org/3/library/gettext.html#the-nulltranslations-class
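(Aside, for readers unfamiliar with the 'null'[3] translation mentioned above: a minimal sketch using Python's gettext module. A NullTranslations instance performs no catalogue lookup, so every message id passes through unchanged regardless of the host's LANGUAGE/locale settings -- which is exactly what makes the output deterministic, at the cost of never being localised.)

```python
import gettext

# A NullTranslations object has no message catalogue: gettext() simply
# returns its argument, so output is byte-identical whatever the build
# host's LANGUAGE setting happens to be.
translator = gettext.NullTranslations()

print(translator.gettext("Table of Contents"))   # Table of Contents
print(translator.ngettext("page", "pages", 1))   # page
print(translator.ngettext("page", "pages", 3))   # pages
```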
Re: alembic / sphinx puzzler
On Thu, Feb 16, 2023 at 6:17 PM Chris Lamb wrote:
> Thanks. Please feel free to quote my previous email, as well as link
> to my WIP patch.
> Let us know when you have an issue number/URL.

D'oh - unfortunately I only read these after filing the issue, thanks though. It is reported at: https://github.com/sphinx-doc/sphinx/issues/11198 (and I see you've added context there)

> Hm, isn't this just probability at work? As in, because there is 50%
> chance that the 2-item set is serialised in any given order, it's only
> going to be detected as unreproducible 50% of the time:
>
> +---------+---------+----------------+
> | Build A | Build B | Result         |
> +---------+---------+----------------+
> | a, b    | a, b    | "Reproducible" |
> +---------+---------+----------------+
> | b, a    | a, b    | Unreproducible |
> +---------+---------+----------------+
> | a, b    | b, a    | Unreproducible |
> +---------+---------+----------------+
> | b, a    | b, a    | "Reproducible" |
> +---------+---------+----------------+

I'm not sure; for an event that is truly a random binary choice, that would make sense. In this case, though, I think there may be something about the system initialization prior to the object description code running that produces a predictable, yet differing, result based on environmental factor(s).

(I'd prefer to be replying with some detailed findings as a result of experimenting with repeated attempts to generate the documentation during from-scratch builds.. I haven't gotten around to that here, though)
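(Aside: Chris's four-row table above can be sanity-checked with a quick simulation, under the assumption that each build independently emits the 2-item set in one of the two orders with equal probability -- two of the four equally-likely outcomes disagree, so the expected detection rate is 50%.)

```python
import random

# Model builds A and B as independent fair coin flips over the two
# possible serialisation orders; a rebuild comparison flags
# "unreproducible" whenever the two flips disagree.
random.seed(0)
trials = 100_000
mismatches = sum(
    (random.random() < 0.5) != (random.random() < 0.5)
    for _ in range(trials)
)
print(f"observed mismatch rate: {mismatches / trials:.3f}")
```

The observed rate converges on 0.5 as the trial count grows, matching the table.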
Re: alembic / sphinx puzzler
Hey Chris,

On Wed, Feb 15, 2023 at 7:27 PM Chris Lamb wrote:
> This change to Sphinx makes alembic reproducible:
>
> https://github.com/lamby/sphinx/commit/4ad7670c1df00f82e758aaa8a7b9aaea83b8eaba
>
> Does this patch work for you?

Yes! Thank you - that's a much better patch than an alternative approach I was working on that attempted to sort the string-typed results of processing the AST. It produces stable results for me when I vary the order of the set-within-a-tuple's elements in the input. I'll file a bug on sphinx's GitHub repository about the original issue within the next few hours.

> Why it hasn't been a problem before is rather curious to me, though.
> It may be just that it hasn't come up, but it may be because this
> value ultimately comes from a Python typing annotation. Yet at that
> point in the code, there doesn't seem to be anything special
> whatsoever about this tuple & set: it's really just a regular tuple
> and set.

Could it be that other Sphinx documentation-generation issues tend to have occluded this one? I also have to admit that I still don't understand what it is that varies (and frequently varies, apparently) across builds that exposed the problem in the first place. I'm not convinced that it's likely to be related to the memory addresses of the data structures. I had a vague theory that perhaps filesystem choice/layout could be a cause (perhaps a strange theory at first: my rationale would be that it could cause the AST to read and process files in a different order, and that that could affect the parser's state in subtle ways. I haven't even convinced myself of that possibility entirely yet, though).

Thanks again,
James
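(For anyone following along: the general technique in patches of this kind -- my paraphrase, not necessarily the literal content of Chris's commit -- is to impose a stable order on set elements before serialising them, for example:)

```python
def stable_set_repr(values: set) -> str:
    # Hash-based iteration order can vary between interpreter runs;
    # sorting the element reprs yields a deterministic serialisation.
    return "{%s}" % ", ".join(sorted(repr(v) for v in values))

# The same set always serialises identically, however it was constructed:
print(stable_set_repr({"replace", "fill"}))  # {'fill', 'replace'}
print(stable_set_repr({"fill", "replace"}))  # {'fill', 'replace'}
```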
alembic / sphinx puzzler
Hi folks,

I noticed what _seemed_ like a quick reproducible-build fix for alembic (a database migration framework written in Python). A few hours later, though, I'm still puzzled.

The problem appears in a similar pattern across various architectures in the diffoscope results for alembic -- both amd64[1] and arm64[2], for example. It looks like the variations are in sphinx-generated documentation where the ordering of collection elements -- like the items in this attribute definition[3] -- differs in the output.

My understanding is that sphinx is using Python 3.11's built-in AST parser, which doesn't provide parse-tree traversal order guarantees, during these builds - and that'd seem to make sense as a cause. Also: this might be the same issue as described in 'randomness_in_property_annotations_generated_by_sphinx'[4].

Does anyone have suggestions about how to proceed? I'll likely take more of a look again tomorrow.

Thanks,
James

(note: posting from my work email, because some of my work infrastructure uses alembic, and so I think there's a clear, if small, work-related motivation for this)

[1] - https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/alembic.html

[2] - https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/alembic.html

[3] - https://github.com/sqlalchemy/alembic/blob/a968c9d2832173ee7d5dde50c7573f7b99424c38/alembic/ddl/impl.py#L90

[4] - https://tests.reproducible-builds.org/debian/issues/unstable/randomness_in_property_annotations_generated_by_sphinx_issue.html
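(One candidate source of per-build variation worth ruling in or out: CPython randomises string hashing per process -- controlled by PYTHONHASHSEED -- which changes the iteration order of sets of strings between otherwise identical runs. A small sketch that pins different seeds to make the effect repeatable; with enough seeds you will typically observe both orderings:)

```python
import os
import subprocess
import sys

# Run the same one-liner under several pinned hash seeds; the observed
# serialisation order of the 2-element set can differ between seeds,
# just as it can differ between unpinned interpreter processes.
code = "print(list({'replace', 'fill'}))"
outputs = set()
for seed in range(8):
    result = subprocess.run(
        [sys.executable, "-c", code],
        env={**os.environ, "PYTHONHASHSEED": str(seed)},
        capture_output=True,
        text=True,
    )
    outputs.add(result.stdout.strip())
print(outputs)  # the distinct orderings observed across the seeds
```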
Re: buildinfo question
Ah, typical: while trying to figure out where functionality like this could fit into Debian, I learned that it already exists there. The 'dpkg-depcheck' and 'dpkg-genbuilddeps' utilities (both included in the 'devscripts' package) provide this kind of functionality in Debian.

On Wed, 14 Dec 2022 at 00:38, James Addison wrote:
>
> On Tue, 13 Dec 2022 at 18:15, Vagrant Cascadian
> wrote:
> >
> > It would be interesting to do something more systematic like your
> > suggestion, though I'm not aware of anything at the moment.
>
> Thanks Vagrant, that's good to know (it matches my understanding too,
> from searching around).
>
> Roughly speaking, the reason I ask is to see whether it'd be possible
> to reduce migration and maintenance burden (in terms of maintainer
> time, primarily, although perhaps arguably also compute resources) by
> providing information about no-longer-required build dependencies.
>
> I'll spend some time at the metaphorical drawing board to see whether
> something like this could make sense and how to integrate in a
> non-disruptive way if so.
>
> James
Re: buildinfo question
On Tue, 13 Dec 2022 at 18:15, Vagrant Cascadian wrote:
>
> It would be interesting to do something more systematic like your
> suggestion, though I'm not aware of anything at the moment.

Thanks Vagrant, that's good to know (it matches my understanding too, from searching around).

Roughly speaking, the reason I ask is to see whether it'd be possible to reduce migration and maintenance burden (in terms of maintainer time, primarily, although perhaps arguably also compute resources) by providing information about no-longer-required build dependencies.

I'll spend some time at the metaphorical drawing board to see whether something like this could make sense and how to integrate in a non-disruptive way if so.

James
buildinfo question
Hi folks,

As Debian's buildinfo[1] wiki page hints, it's difficult to determine whether a build dependency is genuinely required at build-time, compared to: it was required in the past, but has become dependency cruft.

I was wondering: are there reproducible-builds efforts underway (in Debian or other ecosystems) to determine the packages that were involved (first-pass approximation: at least one file belonging to the package was read from the filesystem by a child of the build process -- anything else?) during a reproducible package build?

Thanks,
James

[1] - https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles
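(For illustration of the first-pass approximation above: one approach is to trace the build's file accesses -- e.g. by running it under `strace -f -e trace=openat` -- and then map each successfully-opened path back to an owning package. A hypothetical sketch of the parsing step; the regex and sample lines here are my own simplification, not any existing tool's implementation:)

```python
import re

# Match an openat() call in strace output, capturing the path and the
# syscall return value (negative return values indicate a failed open).
STRACE_LINE = re.compile(r'openat\(.*?"([^"]+)".*\)\s*=\s*(-?\d+)')

def files_opened(strace_output: str) -> set:
    """Return the set of paths that the traced process opened successfully."""
    opened = set()
    for line in strace_output.splitlines():
        match = STRACE_LINE.search(line)
        if match and int(match.group(2)) >= 0:
            opened.add(match.group(1))
    return opened

sample = (
    'openat(AT_FDCWD, "/usr/include/stdio.h", O_RDONLY) = 3\n'
    'openat(AT_FDCWD, "/missing.h", O_RDONLY) = -1 ENOENT (No such file or directory)\n'
)
print(sorted(files_opened(sample)))  # ['/usr/include/stdio.h']
```

Mapping each resulting path to a package would then be a `dpkg -S <path>` lookup away, yielding the set of packages actually consulted during the build.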