On 2021-01-09 Vitaly Chikunov wrote:
> Few percent of the people report bugs or misfeatures, most just adapt
> or abandon usage. For example,
> 
> 1. We added -T0 to our rpm building system for xz call, and when it's
> failed (tests) the change is just reverted. Yes, we didn't gain speed
> up, but things are working again. How much people did the same?

Possibly many. In some ways the existence of -T0 may even have been
counter-productive since it may make people think xz is smart while it
isn't.

Also worth keeping in mind is that threaded mode in xz tends to produce 
slightly bigger output than single-threaded mode, so when maximum compression 
is wanted threading may need to be avoided anyway.

> 2. More important problem is not just with xz tool, but with liblzma
> xz mt compression.

Yes and it's even harder to fix than xz.

> (I think, they switched to zstd after all.)

I think zstd is nowadays the way to go with package managers unless
file size is very important or the users' connection speeds are known
to be somewhat low (under 10 Mbit/s). Package managers usually download
and decompress as separate steps (not parallel) so zstd's fast
decompression makes up for the extra time spent to download a slightly
bigger file. Threaded decompression with xz won't help much because
most packages are so small that threading doesn't come into play.

> We have to build and test re-build thousands of packages on many
> architectures every day, some of them got compression failures So,
> there is new solution - increase memory limit to 512MiB:

Just to clarify, I see you mean 512 MiB as reserved memory so 3.5 GiB as
the memlimit for liblzma.

> So, this is not like happy careless life out there.

I agree, I don't mean to downplay the severity of the problem.

> Only proper solution, of course, is to make liblzma mt code robust on
> per-thread memory allocation.

That would help but it's not the final answer *in general*.

It's hard to know how much memory the rest of the application (outside
liblzma) may need. If liblzma were more robust against memory
allocation failures, it would still be possible that liblzma manages to
allocate quite a bit of memory and *then* the non-liblzma part of the
application needs to allocate some memory but that fails because
liblzma already grabbed most of the address space. Thus in the end you
could still need hacks like you described. Obviously this isn't a
problem if you *know* that the application won't allocate any memory
during the compression phase.

If 32-bit address space is too limited, one way to is to run the
compressor in a separate process. It may sound like a hack but there's
a reason why 64-bit address space got common.

The new lzma_outq doesn't allocate all buffers at once. If the call to
lzma_block_encoder_init() (and the associated structure
initializations) were moved to get_thread(), it could be a start to
make the encoder tolerate allocation failures because then no input
would be copied to thread-specific buffer before the encoder
initialization has succeeded. There are some details that complicate
the idea but perhaps it should be looked at.

> My ugly and dumb hack does not meant to be merged as is, but (RFC)
> invitation to discussion and would show your attitude to the problem.

Sure. I know I can be stubborn and when memlimits are discussed I may
be extra defensive. Also, I know I have a problem that if all suggested
solutions don't look perfect enough I may too easily reject them. But I
also recognize that the problem you (and others) describe is
significant, I just don't know how to *properly* solve it.

For the xz tool, in the previous email I wondered if it could help to
make -T0 set a memlimit if and only if no limit was otherwise
specified. Then the default memlimit would be specific to the -T0 case
which is known to be silly even on 64-bit arch if one happens to have a
lot of hardware threads but relatively small amount of RAM. I wrote a
rough preliminary patch for this:

diff --git a/src/xz/args.c b/src/xz/args.c
index 9238fb3..0ad924e 100644
--- a/src/xz/args.c
+++ b/src/xz/args.c
@@ -29,6 +29,25 @@ bool opt_ignore_check = false;
 const char stdin_filename[] = "(stdin)";
 
 
+/// True if using --threads=0 to autodetect the number of threads.
+static bool threads_are_autodetected = false;
+
+
+static void
+set_memlimits_for_autodetected_threads(void)
+{
+       if (threads_are_autodetected) {
+               if (hardware_memlimit_get(MODE_COMPRESS) == UINT64_MAX)
+                       hardware_memlimit_set(95, true, false, true);
+
+               if (hardware_memlimit_get(MODE_DECOMPRESS) == UINT64_MAX)
+                       hardware_memlimit_set(95, false, true, true);
+       }
+
+       return;
+}
+
+
 /// Parse and set the memory usage limit for compression and/or decompression.
 static void
 parse_memlimit(const char *name, const char *name_percentage, char *str,
@@ -246,11 +265,14 @@ parse_real(args_info *args, int argc, char **argv)
                        suffix_set(optarg);
                        break;
 
-               case 'T':
+               case 'T': {
                        // The max is from src/liblzma/common/common.h.
-                       hardware_threads_set(str_to_uint64("threads",
-                                       optarg, 0, 16384));
+                       const uint64_t threads = str_to_uint64("threads",
+                                       optarg, 0, 16384);
+                       hardware_threads_set(threads);
+                       threads_are_autodetected = threads == 0;
                        break;
+               }
 
                // --version
                case 'V':
@@ -279,6 +301,10 @@ parse_real(args_info *args, int argc, char **argv)
 
                // --info-memory
                case OPT_INFO_MEMORY:
+                       // If --threads=0 was specified, make its effect
+                       // visible in --info-memory.
+                       set_memlimits_for_autodetected_threads();
+
                        // This doesn't return.
                        hardware_memlimit_show();
 
@@ -635,6 +661,10 @@ args_parse(args_info *args, int argc, char **argv)
        // Then from the command line
        parse_real(args, argc, argv);
 
+       // If --threads=0 was used and there's no memory usage limit set,
+       // set a limit now.
+       set_memlimits_for_autodetected_threads();
+
        // If encoder or decoder support was omitted at build time,
        // show an error now so that the rest of the code can rely on
        // that whatever is in opt_mode is also supported.

Obviously it still only works for 32-bit xz running on 64-bit kernel
because it would set the limit to 4020 MiB if there's a lot of RAM. But
with this patch it would happen without any extra arguments; using -T0
would be enough. So it's a hack but a hack that has minimal effect on
existing behavior outside -T0.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

Reply via email to