Hi,


I saw many changes related to .pyc files last week, so I had a look. I don't understand these issues well, so here are my notes to try to understand the context ;-) I'm not requesting any change; I'm fine with the latest choices made in Fedora.

--

There are different issues:

(1) Performance regression on importing .py files
(2) Getting reproducible .rpm binaries
(3) .pyc files are not fully reproducible

In the reverse order:

(3) should be fixed in Python, I reported the bug upstream:
https://bugs.python.org/issue37596

(2) is a work in progress: Fedora builds are not reproducible yet.

(1) is the main question here.

--

First of all, the rationale for the .pyc file change in Python 3.7 is described in PEP 552:

  https://www.python.org/dev/peps/pep-0552/#rationale
  "Reproducibility is important for security"

But I'm not sure how important it is if it's only done half-way. Fedora doesn't seem to support reproducible builds yet, even if recent changes show that it's moving in that direction. Old documents about Fedora:

  https://fedoraproject.org/wiki/Reproducible_Builds
  https://github.com/kholia/ReproducibleBuilds

Debian is making good progress:

  https://wiki.debian.org/ReproducibleBuilds

OpenSUSE is also working on that:

   https://en.opensuse.org/openSUSE:Reproducible_Builds

--

On Python 3.7 and newer, if SOURCE_DATE_EPOCH is set, py_compile generates hash-based .pyc files: the .pyc files contain a hash of the source instead of a timestamp.

"The default is PycInvalidationMode.CHECKED_HASH if the SOURCE_DATE_EPOCH environment variable is set, otherwise the default is PycInvalidationMode.TIMESTAMP." says py_compile doc.

When importing a .py file, the content of the .py file is hashed and compared to the hash stored in the .pyc file.

For timestamp-based .pyc files, the .py file is not read: the source mtime (and size) recorded in the .pyc header is compared to the current mtime (and size) of the .py file, and the .pyc is only regenerated if they differ.
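
To make the difference between the two kinds of .pyc files concrete, here is a minimal sketch that compiles the same source in both modes and decodes the 16-byte .pyc header described in PEP 552. The helper name compile_and_read_header and the demo source are only for illustration; the py_compile and importlib.util calls are the documented ones.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

SOURCE = b"x = 1\n"  # arbitrary demo source, not taken from any real package


def compile_and_read_header(mode):
    """Compile SOURCE with the given invalidation mode and decode the .pyc header."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.py")
        pyc = os.path.join(tmp, "demo.pyc")
        with open(src, "wb") as f:
            f.write(SOURCE)
        py_compile.compile(src, cfile=pyc, doraise=True, invalidation_mode=mode)
        with open(pyc, "rb") as f:
            header = f.read(16)
    # Layout (PEP 552): 4 bytes magic, 4 bytes flags, 8 bytes payload.
    flags = struct.unpack("<I", header[4:8])[0]
    return {
        "hash_based": bool(flags & 0b01),    # bit 0: hash-based pyc
        "check_source": bool(flags & 0b10),  # bit 1: validate the hash on import
        "payload": header[8:16],
    }


timestamp_header = compile_and_read_header(py_compile.PycInvalidationMode.TIMESTAMP)
hash_header = compile_and_read_header(py_compile.PycInvalidationMode.CHECKED_HASH)

# Timestamp pyc: the payload is the source mtime and size (two 32-bit integers).
recorded_mtime, recorded_size = struct.unpack("<II", timestamp_header["payload"])

# Hash pyc: the payload is the 8-byte hash of the source bytes.
expected_hash = importlib.util.source_hash(SOURCE)

print("timestamp pyc:", timestamp_header["hash_based"], recorded_mtime, recorded_size)
print("hash pyc:", hash_header["hash_based"], hash_header["payload"] == expected_hash)
```

With a checked hash-based .pyc, the import system has to read and hash the .py file to validate the cache, whereas the timestamp check only needs a stat() of the .py file.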

Note: Python 3.7 also has a --check-hash-based-pycs command line option, but it seems to be meant for specific use cases. (See also the _imp.check_hash_based_pycs value.)

--

redhat-rpm-config was modified 18 days ago in Fedora to set SOURCE_DATE_EPOCH to the timestamp of the topmost changelog entry:
https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/57

OpenSUSE had issues with reproducible Python build and .pyc files:
https://bugzilla.opensuse.org/show_bug.cgi?id=1133809

For this reason, %clamp_mtime_to_source_date_epoch is still off by default (in Fedora and OpenSUSE).

--

SOURCE_DATE_EPOCH was disabled in the Python 3.7 package:

  %global source_date_epoch_from_changelog 0

because test_cmd_line_script, test_multiprocessing_main_handling and test_runpy fail if SOURCE_DATE_EPOCH is set. These tests have been fixed in Python 3.8.

--

glib2 sets the PYTHONHASHSEED=0 environment variable to work around one of the remaining bugs for reproducible .pyc files: frozensets are not written in a deterministic order in .pyc files:
https://bugzilla.redhat.com/show_bug.cgi?id=1686078

This issue should be fixed in Python, I reported the bug upstream:
https://bugs.python.org/issue37596
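
A rough way to observe this, as a sketch only: marshal a code object containing a frozenset constant in fresh interpreters with a chosen PYTHONHASHSEED. The helper marshalled_with_seed and the demo snippet are made up for illustration. With a fixed seed the output should be stable across runs; with different seeds it may or may not differ, depending on the Python version and on whether the issue above is fixed in it.

```python
import os
import subprocess
import sys

# Code run in a child interpreter: the set literal of constants is stored as a
# frozenset constant in the code object, which is then serialized with marshal
# (the same serialization used for the body of a .pyc file).
SNIPPET = (
    "import marshal, sys\n"
    "code = compile(\"x = value in {'alpha', 'beta', 'gamma', 'delta'}\", '<demo>', 'exec')\n"
    "sys.stdout.write(marshal.dumps(code).hex())\n"
)


def marshalled_with_seed(seed):
    """Return the hex-encoded marshal output produced under the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii")


first = marshalled_with_seed(0)
second = marshalled_with_seed(0)
other = marshalled_with_seed(1)

print("same seed, same bytes:", first == second)
print("different seed, same bytes:", first == other)
```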

--

If SOURCE_DATE_EPOCH is set when .pyc files are compiled, they are hash-based instead of timestamp-based. Some people consider that a performance regression, since the .py file has to be read and hashed on import. So Python was modified in Fedora to force timestamp-based .pyc files during an RPM build, that is, when the RPM_BUILD_ROOT environment variable is set.

https://github.com/fedora-python/cpython/pull/3/files
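
As I understand it, the resulting default can be sketched like this. It is only an illustration of the rule described above, not the actual patch: the function default_invalidation_mode and its honor_rpm_build_root parameter are hypothetical names.

```python
import os
from py_compile import PycInvalidationMode


def default_invalidation_mode(environ=None, honor_rpm_build_root=False):
    """Pick the default .pyc invalidation mode from environment variables.

    honor_rpm_build_root=False mirrors the rule quoted from the py_compile doc;
    honor_rpm_build_root=True adds the RPM_BUILD_ROOT exception described above
    (hypothetical sketch, not the real Fedora patch).
    """
    if environ is None:
        environ = os.environ
    if honor_rpm_build_root and environ.get("RPM_BUILD_ROOT"):
        # Inside an RPM build: keep timestamp-based pycs.
        return PycInvalidationMode.TIMESTAMP
    if environ.get("SOURCE_DATE_EPOCH"):
        return PycInvalidationMode.CHECKED_HASH
    return PycInvalidationMode.TIMESTAMP


rpm_env = {"SOURCE_DATE_EPOCH": "1562716800", "RPM_BUILD_ROOT": "/tmp/buildroot"}
print(default_invalidation_mode(rpm_env))
print(default_invalidation_mode(rpm_env, honor_rpm_build_root=True))
```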

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.