On 2021-11-29 Jia Tan wrote:
> This patch addresses the issues with reproducible builds when using
> multithreaded xz. Previously, specifying --threads=1 instead of
> --threads=[n>1] creates different output. Now, setting any number of  
> threads forces multithreading mode, even if there is only 1 worker
> thread.

This is an old problem that should have been fixed long ago.
Unfortunately I think the fix needs to be a little more complex due to
backward compatibility.

With this patch, if threading has been enabled, no further option on
the command line (except --flush-timeout) will disable threading.
Sometimes there are default options (for example, in XZ_DEFAULTS) that
enable threading and one wants to disable it in a specific situation
(like running multiple xz commands in parallel via xargs). If
--threads=1 always enables threading, memory usage will be quite a bit
higher than in non-threaded mode (non-threaded vs. threaded: 94 MiB vs.
166 MiB for the default compression level -6; 674 MiB vs. 1250 MiB for
-9).

To be backward compatible, maybe it needs extra syntax within the
--threads option or a new command line option. Both are a bit annoying
and ugly but I don't have a better idea.

Currently one-thread multi-threading is done if one specifies two or
more threads but the memory limit is so low that only one thread can be
used. In that case xz will never switch to non-threaded mode. This
ensures that the output file is always the same even if the number of
threads gets reduced.

When -T0 is used, that is broken in the sense that the threading mode
(and thus the encoded output) depends on how many hardware threads are
supported.
So perhaps -T0 should mean that multi-threaded mode must be used even
for single thread (your patch would do this too).
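
The current behavior described in the two paragraphs above can be
summarized as a small C sketch. All names in it (current_mode,
pick_mode_current and its parameters) are hypothetical and are not taken
from the xz sources; only the decision rule comes from the text.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical names for illustration; not from the xz sources.
enum current_mode { CURRENT_SINGLE_THREADED, CURRENT_MULTI_THREADED };

// requested:  value given to --threads; 0 means "auto" (-T0)
// hw_threads: number of hardware threads detected (at least 1)
// usable:     thread count left after the memory limit is applied
enum current_mode
pick_mode_current(uint32_t requested, uint32_t hw_threads, uint32_t usable)
{
	assert(hw_threads >= 1 && usable >= 1);

	// -T0 is replaced by the hardware thread count, so the mode
	// (and thus the output) differs between machines.
	if (requested == 0)
		requested = hw_threads;

	// A memory limit may lower the usable thread count to one,
	// but it never changes the mode that was already chosen.
	(void)usable;

	return requested >= 2 ? CURRENT_MULTI_THREADED
			: CURRENT_SINGLE_THREADED;
}
```

In this sketch, pick_mode_current(4, 8, 1) stays multi-threaded even
though only one thread is usable, while pick_mode_current(0, 1, 1) and
pick_mode_current(0, 8, 1) give different modes for the same -T0.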

A way to explicitly specify one-thread multi-threaded mode is still
needed but I guess it wouldn't need to be used so often if -T0 handles
it already. -T0 needs improvements in default memory usage limiting too,
and both changes could make the default behavior better.

The opposite functionality could be made available too: if the number
of threads becomes one for whatever reason, an option could tell xz to
always use single-threaded mode to get better compression and to save
RAM.
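
One possible reading of the ideas above, again only as a sketch: the
struct fields force_mt and prefer_st stand in for the extra syntax or
new options that do not exist yet, and every name below is hypothetical.
How the two new flags should interact is an assumption made here, not
something decided in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// All names here are hypothetical; none come from the xz sources.
enum proposed_mode { PROPOSED_SINGLE_THREADED, PROPOSED_MULTI_THREADED };

struct thread_opts {
	uint32_t requested; // value given to --threads; 0 means -T0
	bool force_mt;      // assumed: explicit one-thread multi-threaded mode
	bool prefer_st;     // assumed: use single-threaded mode at one thread
};

// effective: thread count actually usable after hardware detection
// and memory limiting; always at least 1.
enum proposed_mode
pick_mode_proposed(const struct thread_opts *o, uint32_t effective)
{
	assert(effective >= 1);

	// Opposite functionality: with one usable thread, prefer better
	// compression and lower RAM use over identical output.
	// Assumption: this wins if both flags are set.
	if (o->prefer_st && effective == 1)
		return PROPOSED_SINGLE_THREADED;

	// -T0 no longer depends on the hardware thread count, and
	// one-thread multi-threaded mode can be requested explicitly.
	if (o->requested == 0 || o->force_mt)
		return PROPOSED_MULTI_THREADED;

	// Backward compatible: plain --threads=1 stays single-threaded.
	return o->requested >= 2 ? PROPOSED_MULTI_THREADED
			: PROPOSED_SINGLE_THREADED;
}
```

With these rules plain --threads=1 keeps its low memory usage, -T0
gives the same mode on every machine, and only users who ask for it
get the single-threaded fallback.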

> +#include "common.h"
[...]
> // The max is from src/liblzma/common/common.h.
> hardware_threads_set(str_to_uint64("threads",
> - optarg, 0, 16384));
> + optarg, 0, LZMA_THREADS_MAX));

common.h is internal to liblzma and must not be used from xz. Maybe
LZMA_THREADS_MAX could be moved to the public API, I don't know right
now.

-- 
Lasse Collin
