Re: [xz-devel] [PATCH v2] liblzma: Add multi-threaded decoder

Lasse Collin Sun, 24 Jan 2021 13:56:38 -0800

Hello!

I haven't made much progress with this still, I'm sorry. :-( Below are
comments about a few small details. It's not much but I will (slowly)
keep reading and testing.


I applied the outq patch too. The performance numbers you posted looked
promising.


(1) Segfault due to thr->outbuf == NULL

I changed CHUNK_SIZE to 1 to test corner cases. I used
good-1-block_header-1.xz as the test file. It can segfault in
worker_decoder() on the line calling thr->block_decoder.code(...)
because thr->outbuf is NULL (so the problem was introduced in the outq
patch). This happens because of "thr->outbuf = NULL;" later in the
function.

It looks like that it marks the outbuf finished and returns the thread
to the pool too early or forgets to set thr->state = THR_IDLE. As a
temporary workaround, I added "thr->state = THR_IDLE;" after
"thr->outbuf = NULL;".


(2) Block decoder must return LZMA_STREAM_END on success

Because of end marker and integrity check, the output buffer will be
full before the last bytes of input have been processed by the Block
decoder. Thus it is not enough to look at the input and output
positions to determine when decoding has been finished; only
LZMA_STREAM_END should be used to determine that decoding was
successful.

In theory it is OK to mark the outbuf as finished once the output is
full but for simplicity I suggest doing so (and returning the thread to
the pool) only after LZMA_STREAM_END.

I committed a new test file bad-1-check-crc32-2.xz. The last byte in
the Block (last byte of Check) is wrong. Change CHUNK_SIZE to 1 and try
"xz -t -T2 file bad-1-check-crc32-2.xz". The file must be detected to
be corrupt (LZMA_DATA_ERROR).


(3) Bad input where the whole input or output buffer cannot be used

In the old single-threaded decoding, lzma_code() will eventually return
LZMA_BUF_ERROR if the calls to lzma_code() cannot make any progress,
that is, no more input is consumed and no more output is produced. This
condition can happen with correct code if the input file is corrupt in
a certain way, for example, a truncated .xz file.

Since the no-progress detection is centralized in lzma_code(), the
internal decoders including Block decoder don't try to detect this
situation. Currently this means that worker_decoder() should detect it
to catch bad input and prevent hanging on certain malformed Blocks.
However, since the Block decoder knows both Compressed Size and
Uncompressed Size, I think I will improve Block decoder instead so
don't do anything about this for now.

I committed two test files, bad-1-lzma2-9.xz and bad-1-lzma2-10.xz. The
-9 may make worker_decoder() not notice that the Block is invalid. The
-10 makes the decoder hang. Like I said, I might fix these by changing
the Block decoder.


(4) Usage of partial_update in worker_decoder()

Terminology: main mutex means coder->mutex alias thr->coder->mutex.

In worker_decoder(), the main mutex is locked every time there is new
output available in the worker thread. partial_update is only used to
determine when to signal thr->coder->cond.

To reduce contention on the main mutex, worker_decoder() could lock it
only when
  - decoding of the Block has been finished (successfully or
unsuccessfully, that is, ret != LZMA_OK), or
  - there is new output available and partial_update is true; if
partial_update is false, thr->outbuf->pos is not touched.

This way only one worker will be frequently locking the main mutex.
However, I haven't tried it and thus don't know how much this affects
performance in practice. One possible problem might be that it may
introduce a small delay in output availability when the main thread
switches reading from the next outbuf in the list.


(5) Use of mythread_condtime_set()

In the encoder the absolute time is calculated once per lzma_code()
call. The comment in wait_for_work() in in stream_encoder_mt.c was
wrong. The reason the absolute time is calculated once per lzma_code()
call is to ensure that blocking multiple times won't make the timeout
ineffective if each blocking takes less than timeout milliseconds. So
it should be done similarly in the decoder.


(6) Use of lzma_outq_enable_partial_output()

It should be safe to call it unconditionally:

    if (thr->outbuf == coder->outq.head)
        lzma_outq_enable_partial_output(&coder->outq,
                                        thr_do_partial_update);

If outq.head is something else, it is either already finished or
partial output has already been enabled. In both cases
lzma_outq_enable_partial_output() will do nothing.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode

Re: [xz-devel] [PATCH v2] liblzma: Add multi-threaded decoder

Reply via email to