On 2022-04-21 Jia Tan wrote:
> The current behavior of LZMA_FINISH in the decoder is a little
> confusing because it requires calling lzma_code a few times without
> providing more input to trigger a LZMA_BUF_ERROR.

The current behavior basically ignores the use LZMA_FINISH when
determining if LZMA_BUF_ERROR should be returned. I understand that it
can be confusing since after LZMA_FINISH there is nothing a new call to
lzma_code() can do to avoid the problem. However, I don't think it's a
problem in practice:

  - Application that calls lzma_code() in a loop will just call
    lzma_code() again and eventually get LZMA_BUF_ERROR.

  - Application that does single-shot decoding without a loop tends to
    check for LZMA_STREAM_END as a success condition and treats other
    codes, including LZMA_OK, as a problem. In the worst case a less
    robust application could break if this LZMA_OK becomes
    LZMA_BUF_ERROR as the existing API doc says that LZMA_BUF_ERROR
    won't be returned immediately. The docs don't give any
    indication that LZMA_FINISH could affect this behavior.

  - An extra call or two to lzma_code() in an error condition doesn't
    matter in terms of performance.

> This patch replaces return LZMA_OK lines with:
> 
> return action == LZMA_FINISH && *out_pos != out_size ? LZMA_BUF_ERROR
> : LZMA_OK;

I don't like replacing a short statement with a copy-pasted long
statement since it is needed in so many places. A benefit of the
current approach is that the handling of LZMA_BUF_ERROR is in
lzma_code() and (most of the time) the rest of code can ignore the
problem completely.

Also, the condition *out_pos != out_size is confusing in a few places.
For example, in SEQ_STREAM_HEADER:

--- a/src/liblzma/common/stream_decoder.c
+++ b/src/liblzma/common/stream_decoder.c
@@ -118,7 +118,8 @@ stream_decode(void *coder_ptr, const lzma_allocator 
*allocator,
 
                // Return if we didn't get the whole Stream Header yet.
                if (coder->pos < LZMA_STREAM_HEADER_SIZE)
-                       return LZMA_OK;
+                       return action == LZMA_FINISH && *out_pos != out_size
+                                       ? LZMA_BUF_ERROR : LZMA_OK;
 
                coder->pos = 0;

In SEQ_STREAM_HEADER no output can be produced, only input will be
read. Still the condition checks for full output buffer which is not
only confusing but wrong: if there was an empty Stream ahead, having no
output space would be fine! In such a situation this can return LZMA_OK
even when the intention was to return LZMA_BUF_ERROR due to truncated
input. To make this work, only places that can produce output should
check if the output buffer is full.

However, I don't think the current behavior is worth changing. As you
pointed out, it is a bit weird (and I had never noticed it myself
before you mentioned it). It's not actually broken though and some
applications doing single-shot decoding might even rely on the current
behavior. Trying to change this could cause problems in rare cases and,
if not done carefully enough, introduce new bugs. So I thank you for
the patch but it won't be included.

-- 
Lasse Collin

Reply via email to