Hi,

This is a bug report for liblzma.

Background and situation
------------------------------

I've developed a 7-Zip extraction/archiving library written in pure Python. It depends on the liblzma bindings in the Python standard library. I have received a bug report that extraction of a specific archive fails with an unexpected error. The test file is compressed with LZMA1+BCJ.

Issue link: https://github.com/miurahr/py7zr/issues/178

Investigation and reproducing the problem
----------------------------------------------

It can be reproduced using only Python's lzma module, as follows:

```
import lzma
import pathlib


def test_lzma_raw_decompressor_lzmabcj():
    # Filter chain: x86 BCJ filter followed by LZMA1 with the archive's properties.
    filters = []
    filters.append({'id': lzma.FILTER_X86})
    filters.append(lzma._decode_filter_properties(lzma.FILTER_LZMA1, b']\x00\x00\x01\x00'))
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_RAW, filters=filters)
    with pathlib.Path('lzmabcj.bin').open('rb') as infile:
        out = decompressor.decompress(infile.read(11327))
    # Fails: len(out) is 12796.
    assert len(out) == 12800
```

It fails with an assertion error: the output length is 12796 (4 bytes less than expected). The test data was produced from a 7-Zip archive created by the p7zip utility, reduced to only the payload data.

Reproducing the problem on liblzma
----------------------------------------

Because Python's lzma module is a thin wrapper around liblzma, I suspect this is liblzma's behavior. Here is test code for the xz project.

```
#include "tests.h"

static uint8_t buf[12800];
static uint8_t obuf[12800];

static void
decompress(size_t in_size)
{
	lzma_ret lzret;
	const size_t out_size = 12800;

	lzma_stream strm = LZMA_STREAM_INIT;
	strm.next_in = buf;
	strm.avail_in = in_size;
	strm.next_out = obuf;
	strm.avail_out = out_size;

	lzma_options_lzma opt_lzma;
	succeed(lzma_lzma_preset(&opt_lzma, 0));

	// Filter chain: x86 BCJ filter followed by LZMA1.
	lzma_filter filters[3] = {
		{ .id = LZMA_FILTER_X86, .options = NULL },
		{ .id = LZMA_FILTER_LZMA1, .options = &opt_lzma },
		{ .id = LZMA_VLI_UNKNOWN, .options = NULL },
	};
	succeed(lzma_raw_decoder(&strm, filters));

	lzret = lzma_code(&strm, LZMA_RUN);
	if (lzret == LZMA_STREAM_END) {
		expect(strm.total_in == in_size);
		expect(strm.total_out == out_size);
		lzma_end(&strm);
		return;
	}

	expect(lzret == LZMA_OK);
	expect(strm.total_in == in_size);
	expect(strm.total_out == 12796); // (*1) this should be 12800 == out_size
	expect(strm.total_out == out_size); // (*2) XXX: fails here.
	lzma_end(&strm);
}

extern int
main(void)
{
	FILE *filp = fopen("lzmabcj.bin", "rb");
	if (filp == NULL)
		return 1; // test data is missing

	size_t in_size = fread(buf, sizeof(uint8_t), 11327, filp);
	fclose(filp);

	decompress(in_size);
	return 0;
}
```

It reads and tries to decompress the same data that the Python test uses.

Expectation and actual result
----------------------------------

This test produces output of the same size as the Python test, i.e. the (*1) assertion wrongly passes (4 bytes less than expected), and the test aborts at line (*2), which holds the correct expectation.

Download links
------------------

You can download the test data from
https://github.com/miurahr/py7zr/files/4872155/lzmabcj.bin.gz

The originally reported data is
https://github.com/miurahr/py7zr/files/4870442/lzmabcj_3.7z.gz

Test Data creation
----------------------

The test data is distilled from the original data by stripping the 32-byte header and the trailers, and saving the 11327-byte payload, which should be decompressed with lzma_raw_decoder().

--
Hiroshi Miura
President of OpenStreetMap Foundation Japan
email: miur...@linux.com
github: @miurahr
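
P.S. For anyone who wants to regenerate the test data, the distillation step described under "Test Data creation" can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive script: the function name `extract_payload` is hypothetical, the original archive is assumed to be already gunzipped to `lzmabcj_3.7z`, and the packed stream is assumed to start directly after the 32-byte header and to be 11327 bytes long.

```python
import pathlib

HEADER_SIZE = 32      # header length stated in "Test Data creation"
PAYLOAD_SIZE = 11327  # payload length stated in "Test Data creation"


def extract_payload(archive: bytes, header_size: int = HEADER_SIZE,
                    payload_size: int = PAYLOAD_SIZE) -> bytes:
    """Return the packed stream, dropping the leading header and the trailers.

    Hypothetical helper: it assumes the payload directly follows the header.
    """
    payload = archive[header_size:header_size + payload_size]
    if len(payload) != payload_size:
        raise ValueError('archive is too short to contain the payload')
    return payload


if __name__ == '__main__':
    # Assumption: lzmabcj_3.7z.gz was downloaded and gunzipped beforehand.
    src = pathlib.Path('lzmabcj_3.7z')
    if src.exists():
        pathlib.Path('lzmabcj.bin').write_bytes(extract_payload(src.read_bytes()))
```

The resulting lzmabcj.bin is what both tests above read with a length of 11327 bytes.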