> > > This:
> > >
> > > diff --git a/src/liblzma/common/stream_decoder_mt.c 
> > > b/src/liblzma/common/stream_decoder_mt.c
> > > --- a/src/liblzma/common/stream_decoder_mt.c
> > > +++ b/src/liblzma/common/stream_decoder_mt.c
> > > @@ -786,7 +786,7 @@ read_output_and_wait(struct lzma_stream_coder *coder,
> > >                                 if (mythread_cond_timedwait(&coder->cond,
> > >                                                 &coder->mutex,
> > >                                                 wait_abs) != 0) {
> > > -                                       ret = LZMA_TIMED_OUT;
> > > +                                       ret = LZMA_OK;
> > >                                         break;
> > >                                 }
> > >                         } else {
> > >
> > > Should "fixes" it. At some point the main thread needs to check if the
> > > next thread is able to make progress or not and then return LZMA_OK so
> > > that the upper layer can figure out that no progress is made. Otherwise
> > > it stucks in the LZMA_TIMED_OUT loop.
> >
> > This fixes it just for xz, but if no timeout is specified it will
> > still deadlock.
> > I haven't looked at the code enough to understand how to fix it for both,
> > but I will start to look into that.
>
> No, it is for everyone. In the timeout case we need to check if the
> first thread in line can make progress. If we don't provide data new
> data to the thread and the thread consumed everything it had then the
> thread won't make progress. If the function gets invoked with 0 new data
> then we should return LZMA_OK at which point the upper layer (between XZ
> binary and the library) will notice that no progress is made and return
> an error to xz.

I tested this locally by setting the timeout for xz to 0 on line 57 of
src/xz/coder.c and it deadlocks on a truncated xz file. It seems like a
race condition because it deadlocks ~90% of the time in my liblzma tests.
When debugging, I can see that the main thread does not return from
lzma_code when the LZMA_FINISH action is given. When the decoder receives
LZMA_FINISH, it should probably signal to the worker threads that no more
input is coming and to exit with error if they are in the middle of a
block. Even if LZMA_FINISH is not given, we still need to avoid deadlock at
all costs. I will come up with a patch for this since it sounds like
a fun problem :)

> The above patch is not correct because if you do it as I suggeted then
> it is possible that an error is returned because the thread is slow and
> did not yet make progress.

I agree that the above patch only "fixes" the problem. Small timeouts and
slow threads will result in false positive deadlock detections.

> > I was following the conversation about the soft and hard memory limiting.
> > If a user wanted decoding to fail if it can't be done multithreaded and 
> > update
> > the memory limit as needed, that can't be done right now. It's a minor issue
> > that only matters for liblzma, but it would be nice to be able to update 
> > both
> > limits after decoding has started. I don't consider this a bug, more like
> > missing a nice to have feature. One easy solution is to add a new API
> > function to liblzma to update the soft memory limit for the multithreaded
> > decoder and do nothing / return an error on all other coders. I will add
> > a patch for this if you guys think it is a good idea.
>
> Maybe I missundertand you. But if you set memlimit_stop to the same
> value as memlimit_threading then you have either multi threaded
> decompression or none at all right?

I understand that setting memlimit_stop = memlimit_threading will result in
either threaded decoding or no decoding. The problem is only the
memlimit_stop can be updated. There is no way to update memlimit_threading
after initialization to ensure the "give me threaded decoding or none
at all" is still true. I don't think most users will care about
this, but it would be nice to provide the flexibility for those that do.

Jia Tan

Reply via email to