I measured the dict size with TrieDictionaryForestBenchmark. If the cardinality is below 20000, the dict size stays under 802KB. If our HBase admin limits the cell size to less than 1MB and we want to keep queries fast, should we dict-encode a column only when its cardinality is below 20000?
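To make the threshold question concrete, here is a rough sketch that estimates the largest cardinality fitting under a given size limit. It uses only the benchmark points reported with KB precision (0 → 64B, 10000 → 406KB, 20000 → 802KB) and assumes the dict size grows linearly between measured points, which is an assumption on my side; the real size depends on the column values. The names `MEASURED` and `max_cardinality_for` are made up for this example and are not part of Kylin.

```python
# Benchmark points from this thread: (cardinality, dict size in KB).
# The rows reported only as "1MB"/"2MB" are too coarse to interpolate, so they are left out.
MEASURED = [(0, 64 / 1024), (10000, 406.0), (20000, 802.0)]


def max_cardinality_for(limit_kb, points=MEASURED):
    """Estimate the largest cardinality whose dict fits in limit_kb.

    Assumption: dict size grows linearly between two measured points.
    """
    if limit_kb < points[0][1]:
        return 0
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 <= limit_kb <= s1:
            return int(c0 + (limit_kb - s0) / (s1 - s0) * (c1 - c0))
    # Beyond the last precise measurement we refuse to guess.
    raise ValueError("limit is outside the measured range; do not extrapolate")


if __name__ == "__main__":
    # With the ~500KB limit mentioned by the HBase admin:
    print(max_cardinality_for(500))
```

Under these assumptions a 500KB limit corresponds to a cardinality of roughly 12000, so it would be worth re-running the benchmark with real column values before fixing a threshold.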
cardinality    dict size
0              64B
10000          406KB
20000          802KB
30000          1MB
40000          1MB
50000          1MB
60000          2MB
70000          2MB

2017-11-25 22:42 GMT+08:00 杨浩 <yangha...@gmail.com>:

> Thanks. The biggest number has been written in the "Kylin Guide", but it may
> affect query performance because of the HBase limit on KV cell size. As there
> are many cubes in Kylin, the query server fetches the dict from HBase many
> times. Our HBase admin says that if a KV size is under about 500 KB, query
> performance can be guaranteed. So the dict size should be less than 500KB in
> our env.
>
> We may choose 1 million, or half of that, as the guideline for using dict, to
> ensure query performance.
>
> 2017-11-24 22:25 GMT+08:00 ShaoFeng Shi <shaofeng...@apache.org>:
>
>> The cap is 5 million, as I remember, but it's better to keep it under
>> 1 million.
>>
>> 2017-11-24 20:33 GMT+08:00 杨浩 <yangha...@gmail.com>:
>>
>>> There are many cubes in our Kylin env. Can anyone give a number for how
>>> large the cardinality of a column can be if we want to encode it as dict?
>>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋