On Sat, May 17, 2014 at 10:25 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> "compression" … sure.. but bmdiff? Not that I can find. BMDiff is an
> algorithm that in some situations could result in 100000x compression due
> to the way it's able to find long common runs. This is a pathological
> case though. But if you were to copy the US constitution into itself
> … 100000x… bmdiff could ideally get a 100000x compression rate.
>
> not all compression algorithms are identical.
>

The compression classes are pluggable. Exploratory patches are always
welcome! :D

Not sure I understand why you consider Byte Ordered Partitioner relevant.
Isn't what generally matters for compressibility the uniformity of data
within rows in the SSTable, not the uniformity of their row keys?

=Rob
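
To illustrate the quoted point about long common runs, here is a minimal sketch, assuming Python 3 and only its standard zlib and lzma modules. It does not implement BMDiff and does not touch Cassandra's compression classes. It only contrasts a compressor with a short match window (DEFLATE, 32 KiB) against one with a much larger dictionary (LZMA) on a block that has been copied into itself. The function name, block size, and seed are made up for illustration.

```python
import lzma
import random
import zlib


def compressed_sizes(block_size=200_000, seed=0):
    """Return raw and compressed sizes for a random block copied into itself."""
    rng = random.Random(seed)
    # Random bytes have no short-range redundancy, so the only thing a
    # compressor can exploit here is the long-range repeat of the whole block.
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    data = block + block  # "copy the document into itself"
    return {
        "raw": len(data),
        # DEFLATE can only reference the previous 32 KiB, so it cannot
        # see a repeat that starts block_size bytes earlier.
        "zlib": len(zlib.compress(data, 9)),
        # LZMA's default preset uses a multi-megabyte dictionary, which is
        # large enough to match the second copy against the first.
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    for name, size in compressed_sizes().items():
        print(f"{name}: {size} bytes")
```

Under these assumptions the zlib size should stay close to the raw size, while the lzma size should be close to a single block. That is a small-scale version of the pathological case in the quote: how much a repeat helps depends on how far back the algorithm can look.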