Hi Saeid,

We've tried to make the default compression/encoding settings a reasonable performance tradeoff for most common workloads. A couple of quick tips from my experiments:
- high-cardinality strings won't be automatically compressed by dictionaries. So, if you have a column like that whose values might contain repeated substrings (e.g. a set of URLs), then enabling LZ4 compression is a good idea.
- if you have strings with a lot of common prefixes, you might consider PREFIX_ENCODING.
- for integer types, choose the smallest size that fits your intended range, e.g. don't use int64 for storing a customer's age. On disk it will compress to about the same size, but in memory it will use a lot more space with the larger type.

Perhaps others can jump in with further recommendations based on experience.

-Todd

On Mon, May 7, 2018 at 1:45 AM, Saeid Sattari <saeid.satt...@gmail.com> wrote:

> Hi all,
>
> Folks who have used the column compression and encoding in Kudu tables:
> can you share your experiences with the performance? What type of fields
> are worse/better (IO bottleneck vs query return time,..) to compress. We
> can collect a knowledge base regarding these subjects that users can use in
> the future. Thanks.
>
> Regards,
>

--
Todd Lipcon
Software Engineer, Cloudera
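
To make the integer-width tip above a bit more concrete, here is a rough sketch in Python using only the standard library. It is not Kudu code and does not reproduce Kudu's storage format: zlib stands in for the column compression, and a simple byte-level shuffle (the hypothetical helper `shuffle_bytes`) stands in for an integer encoding that groups similar bytes together before compression. The function names and the synthetic "age" data are assumptions made purely for illustration.

```python
import random
import zlib
from array import array


def shuffle_bytes(raw: bytes, width: int) -> bytes:
    """Regroup bytes so byte 0 of every value comes first, then byte 1, etc."""
    return b"".join(raw[i::width] for i in range(width))


def column_sizes(values, typecode):
    """Return (in-memory bytes, compressed bytes) for a fixed-width column."""
    col = array(typecode, values)
    raw = col.tobytes()
    # Assumption: zlib after a byte shuffle approximates "encoded + compressed".
    packed = zlib.compress(shuffle_bytes(raw, col.itemsize))
    return len(raw), len(packed)


random.seed(0)  # fixed seed so repeated runs see the same synthetic data
ages = [random.randint(0, 99) for _ in range(100_000)]

mem8, disk8 = column_sizes(ages, "b")    # 1-byte signed integers
mem64, disk64 = column_sizes(ages, "q")  # 8-byte signed integers

print(f"int8 : {mem8} bytes in memory, {disk8} bytes compressed")
print(f"int64: {mem64} bytes in memory, {disk64} bytes compressed")
```

Under these assumptions the uncompressed int64 column is exactly eight times the size of the int8 one, while the compressed sizes should come out close to each other, because the seven extra bytes per value are all zeros and compress to almost nothing. Actual numbers in Kudu will depend on the encoding and compression configured for the column.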