I quickly tried these with "XZEncDemo 2". I used preset 2 because it uses LZMAEncoderFast instead of LZMAEncoderNormal, where the negative lengths result in a crash. The performance was about the same as or worse than with the original code. I don't know why. I didn't spend much time on this, and it's possible that I messed something up.
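For context, the loops these patches touch are the match-length comparisons in the match finders. The following is a minimal sketch of the two shapes such code can take: a plain byte-by-byte helper, and a variant that first checks one byte to reject candidates that cannot beat the best length found so far. The class and method names are hypothetical, and this is not code from XZ for Java or from the patches; the "quick skip" shown here is an assumption about the general technique, not a copy of what HC4.java does.

```java
// Hypothetical illustration only; not taken from XZ for Java or the patches.
final class MatchLenSketch {
    /**
     * Byte-by-byte comparison: returns how many bytes starting at buf[a]
     * equal the bytes starting at buf[b], up to at most limit bytes.
     * The caller must ensure a + limit and b + limit stay within buf.
     */
    static int matchLen(byte[] buf, int a, int b, int limit) {
        int len = 0;
        while (len < limit && buf[a + len] == buf[b + len])
            ++len;
        return len;
    }

    /**
     * Quick-skip variant (assumed technique): a candidate can only be
     * longer than bestLen if the byte at offset bestLen matches too, so
     * that single byte is tested before doing the full comparison.
     * Returns 0 when the candidate cannot beat bestLen.
     */
    static int matchLenIfLonger(byte[] buf, int a, int b, int limit,
                                int bestLen) {
        if (bestLen >= limit || buf[a + bestLen] != buf[b + bestLen])
            return 0;
        return matchLen(buf, a, b, limit);
    }
}
```

A patch that only swaps the inner loop of matchLen for a faster function call, while keeping a check like the one in matchLenIfLonger in the caller, changes one thing at a time and is therefore easier to benchmark against the original.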
One thing that may be worth checking is that in HC4.java (and BT4.java too) the patch doesn't try to quickly skip too-short matches like the original code does. I suppose the first set of patches should only replace the byte-by-byte loops with a function call, to make the comparison as fair as possible.

These patches won't get into XZ for Java 1.9 but might be in a later version if they turn out to be good. The only remaining patch that might get into 1.9 is the LZDecoder.repeat improvements.

When you post a patch or other code, please make sure that word-wrapping is disabled in the email client, or use attachments. Thanks!

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode