[xz-devel] BUG: liblzma: LZMA+BCJ raw decode: output truncated last word

miura@linux Fri, 03 Jul 2020 23:31:08 -0700

Hi,

This is a bug report for liblzma.


Background and situation
------------------------------

I've developed a 7-Zip extraction/archiving library written in pure Python.
It depends on Python's standard liblzma bindings (the `lzma` module).

I have now received a bug report that extracting a specific archive fails with
an unexpected error.
The test file is compressed with LZMA1+BCJ.

Issue link:  https://github.com/miurahr/py7zr/issues/178



Investigating and reproducing the problem
-------------------------------------------

It can be reproduced with the Python lzma module alone, as follows:

```
import lzma
import pathlib

def test_lzma_raw_decompressor_lzmabcj():
    # Filter chain: x86 BCJ first, then LZMA1 (which must be the last filter).
    # b']\x00\x00\x01\x00' is the 5-byte LZMA1 properties header taken from
    # the 7z archive; lzma._decode_filter_properties() is a private helper.
    filters = [
        {'id': lzma.FILTER_X86},
        lzma._decode_filter_properties(lzma.FILTER_LZMA1, b']\x00\x00\x01\x00'),
    ]
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=filters)
    with pathlib.Path('lzmabcj.bin').open('rb') as infile:
        out = decompressor.decompress(infile.read(11327))
    assert len(out) == 12800
```

It fails with an assertion error: the output length is 12796, 4 bytes less than
expected.
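The 5-byte string passed to `_decode_filter_properties()` is the standard LZMA1 properties header: the first byte packs lc, lp and pb as `(pb * 5 + lp) * 9 + lc`, and the next four bytes hold the dictionary size as a little-endian 32-bit integer. The sketch below decodes it by hand into the public filter-dict form the `lzma` module accepts, so the private helper is not needed. `lzma1_filter_from_props` is a hypothetical name used only for illustration. For `0x5D` (`]`) it yields lc=3, lp=0, pb=2 and a 64 KiB dictionary.

```python
import lzma
import struct


def lzma1_filter_from_props(props: bytes) -> dict:
    """Hypothetical helper: turn a 5-byte LZMA1 properties header into an
    lzma filter dict using only public keys (id, lc, lp, pb, dict_size)."""
    if len(props) != 5:
        raise ValueError("LZMA1 properties must be exactly 5 bytes")
    d = props[0]
    if d >= 9 * 5 * 5:  # valid ranges: lc < 9, lp < 5, pb < 5
        raise ValueError("invalid lc/lp/pb byte")
    lc = d % 9
    lp = (d // 9) % 5
    pb = d // (9 * 5)
    (dict_size,) = struct.unpack("<I", props[1:])  # little-endian uint32
    return {"id": lzma.FILTER_LZMA1, "lc": lc, "lp": lp, "pb": pb,
            "dict_size": dict_size}


lzma1 = lzma1_filter_from_props(b"]\x00\x00\x01\x00")
print(lzma1["lc"], lzma1["lp"], lzma1["pb"], lzma1["dict_size"])  # 3 0 2 65536

# Same chain as the reproducer, built without _decode_filter_properties().
filters = [{"id": lzma.FILTER_X86}, lzma1]
decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=filters)
```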

The test data was produced from a 7-Zip archive created with the p7zip utility,
keeping only the payload data.


Reproducing the problem with liblzma
----------------------------------------

Because Python's lzma module is a thin wrapper around liblzma, I suspect the
behavior comes from liblzma itself.

Here is test code for the xz project's test suite:

```
#include <stdio.h>
#include "tests.h"


static uint8_t buf[12800];
static uint8_t obuf[12800];


static void
decompress(size_t in_size)
{
    lzma_ret lzret;

    const size_t out_size = 12800;
    lzma_stream strm = LZMA_STREAM_INIT;
    strm.next_in = buf;
    strm.avail_in = in_size;
    strm.next_out = obuf;
    strm.avail_out = out_size;

    // Preset 0 gives lc=3, lp=0, pb=2 (same as the 0x5D properties byte)
    // with a 256 KiB dictionary; a dictionary larger than the 64 KiB the
    // data was encoded with does not change the decoded output.
    lzma_options_lzma opt_lzma;
    succeed(lzma_lzma_preset(&opt_lzma, 0));

    // Filter chain in encoder order: x86 BCJ, then LZMA1 (must be last).
    lzma_filter filters[3] = {
        { .id = LZMA_FILTER_X86, .options = NULL },
        { .id = LZMA_FILTER_LZMA1, .options = &opt_lzma },
        { .id = LZMA_VLI_UNKNOWN, .options = NULL },
    };

    succeed(lzma_raw_decoder(&strm, filters));

    lzret = lzma_code(&strm, LZMA_RUN);
    if (lzret == LZMA_STREAM_END) {
        expect(strm.total_in == in_size);
        expect(strm.total_out == out_size);
        lzma_end(&strm);
        return;
    }
    expect(lzret == LZMA_OK);
    expect(strm.total_in == in_size);
    expect(strm.total_out == 12796);    // (*1) actual result; this should be 12800 == out_size
    expect(strm.total_out == out_size); // (*2) XXX: fails here.
    lzma_end(&strm);
}


extern int
main(void)
{
    FILE *filp = fopen("lzmabcj.bin", "rb");
    expect(filp != NULL);
    size_t in_size = fread(buf, sizeof(uint8_t), 11327, filp);
    fclose(filp);
    decompress(in_size);
    return 0;
}
```

It reads and tries to decompress the same data that the Python test uses.


Expectation and actual result
----------------------------------

This test produces output of the same size as the Python test, i.e. assertion
(*1) passes even though it should not (the output is 4 bytes shorter than
expected), and the test aborts at line (*2), which has the correct expectation.
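One way to narrow this down from Python is to check `LZMADecompressor.eof` after decoding: it becomes True only when liblzma reports the end of the stream (`LZMA_STREAM_END`), which matches the `LZMA_OK` vs. `LZMA_STREAM_END` branch in the C test. The sketch below assumes the same filter chain as the reproducer and uses a synthetic round trip instead of lzmabcj.bin; `raw_decode` is a hypothetical helper. As far as I know, liblzma's raw LZMA1 encoder always writes an end-of-payload marker, so the round trip should reach eof, while a truncated payload should not.

```python
import lzma
from typing import Tuple

# Filter chain matching the report: x86 BCJ first, LZMA1 last, with the
# properties decoded from b']\x00\x00\x01\x00' (lc=3, lp=0, pb=2, 64 KiB).
FILTERS = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA1, "lc": 3, "lp": 0, "pb": 2, "dict_size": 1 << 16},
]


def raw_decode(payload: bytes, expected_size: int) -> Tuple[bytes, bool]:
    """Hypothetical helper: decode a raw BCJ+LZMA1 payload in one call.

    Returns the output and LZMADecompressor.eof, which is True only once
    liblzma has reported the end of the stream.
    """
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=FILTERS)
    out = dec.decompress(payload)
    if len(out) != expected_size:
        print(f"short output: {len(out)} of {expected_size} bytes, eof={dec.eof}")
    return out, dec.eof


# Synthetic round trip (12800 bytes, including 0xE8/0xE9 opcode bytes).
data = bytes(range(256)) * 50
packed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS)
out, eof = raw_decode(packed, len(data))
print(len(out), eof)

# For the reported payload (file name as in the report):
# with open("lzmabcj.bin", "rb") as f:
#     out, eof = raw_decode(f.read(11327), 12800)
```

Running `raw_decode` on lzmabcj.bin would show whether the 4 missing bytes coincide with the decoder never reaching the end of the stream.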


Download links
------------------

You can download the test data from
https://github.com/miurahr/py7zr/files/4872155/lzmabcj.bin.gz


The originally reported archive is
https://github.com/miurahr/py7zr/files/4870442/lzmabcj_3.7z.gz


Test data creation
----------------------

The test data was distilled from the original archive by stripping the 32-byte
header and the trailing data, saving an 11327-byte payload that should be
decompressed with lzma_raw_decoder().
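The stripping step can be sketched as follows, assuming the standard 7z signature header layout (6-byte signature, 2-byte version, a CRC32 of the next 20 bytes, then NextHeaderOffset, NextHeaderSize and NextHeaderCRC, 32 bytes in total) and a single packed stream that begins right after it. `extract_payload` is a hypothetical helper; the packed size (11327 here) has to come from the archive header, which this sketch does not parse.

```python
import struct
import zlib

SIGNATURE = b"7z\xbc\xaf\x27\x1c"
START_HEADER_SIZE = 32  # signature(6) + version(2) + CRC(4) + three fields(20)


def extract_payload(archive: bytes, pack_size: int) -> bytes:
    """Hypothetical helper: return the first packed stream of a .7z archive.

    Assumes the packed data starts right after the 32-byte signature header
    and that pack_size was read from the archive header by other means.
    """
    sig, _major, _minor, start_crc, next_off, _next_size, _next_crc = struct.unpack(
        "<6sBBIQQI", archive[:START_HEADER_SIZE])
    if sig != SIGNATURE:
        raise ValueError("not a 7z archive")
    # The start-header CRC covers NextHeaderOffset, NextHeaderSize, NextHeaderCRC.
    if zlib.crc32(archive[12:START_HEADER_SIZE]) != start_crc:
        raise ValueError("start header CRC mismatch")
    if pack_size > next_off:
        raise ValueError("pack_size runs past the start of the next header")
    return archive[START_HEADER_SIZE:START_HEADER_SIZE + pack_size]
```

A usage would look like `extract_payload(open("lzmabcj_3.7z", "rb").read(), 11327)`, where the file name is taken from the download link after gunzipping.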



--
Hiroshi Miura
President of OpenStreetMap Foundation Japan
email: miur...@linux.com
github: @miurahr
