Dear R developers,

I have discovered a bug in the xz (LZMA) decompression code of
memDecompress(). It is only triggered when the uncompressed content is more
than three times the size of the compressed content. Here is a simple example
to reproduce it:

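  n <- 200

  # n copies of "1234567890": a long, highly compressible string
  char <- paste(replicate(n, "1234567890"), collapse="")
  char.comp <- memCompress(char, type="xz")
  char.dec <- memDecompress(char.comp, type="xz", asChar=TRUE)
  nchar(char.dec) == nchar(char)    # should be TRUE; FALSE when the bug hits

  # The same round trip on serialized (raw) data
  raw <- serialize(char, connection=NULL)
  raw.comp <- memCompress(raw, type="xz")
  raw.dec <- memDecompress(raw.comp, type="xz")
  length(raw.dec) == length(raw)    # should be TRUE; FALSE when the bug hits

  char.uns <- unserialize(raw.dec)  # fails if raw.dec is incomplete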

The root cause seems to be that lzma_code() returns LZMA_OK even if it has not
decompressed the whole input. In that case strm.avail_in is still greater than
zero. The following patch changes the corresponding if statements:

  http://www.statistik.tu-dortmund.de/~olafm/temp/memdecompress.patch

It also includes a small fix from upstream xz for an uninitialized field in
lzma_stream.

Cheers,
Olaf

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel