[ https://issues.apache.org/jira/browse/IGNITE-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksandr Polovtcev reassigned IGNITE-16102:
--------------------------------------------

    Assignee: Aleksandr Polovtcev

> Store all RocksDB partitions in a single column family.
> -------------------------------------------------------
>
>                 Key: IGNITE-16102
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16102
>             Project: Ignite
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Ivan Bessonov
>            Assignee: Aleksandr Polovtcev
>            Priority: Major
>              Labels: iep-74, ignite-3
>
> The current storage implementation puts each partition in its own column
> family. This effectively means that every partition lives in its own
> database, sharing only the WAL and some in-memory resources. Given that each
> column family has multiple files for LSM trees, the number of open file
> descriptors is larger than it needs to be.
> The idea is to have a single column family for all partitions within a
> table. We should also consider storing several tables in the same RocksDB
> instance, for similar reasons. You can think of it as cache groups in
> Ignite 2.x.
> There's also an "optimization" that is missing from the code and should be
> implemented: using key hashes as prefixes.
> h3. What should be implemented:
> First of all, the code will be heavily refactored. This will lead to
> simplifications in many places.
> Beyond that, I see the following goals:
> * the current implementation allows deriving the list of partitions from the
> list of column families. This will no longer be possible, so I suggest
> storing this list explicitly in the "meta" CF, in whatever format is
> convenient during the implementation
> * there should be a compact "tableId" representation. IgniteUUID or even
> UUID is too much, I think, but it might work as a basis.
> This problem should be discussed
> * the binary representation of keys should now include the following
> information:
> ** tableId - a fixed-length set of bytes to be used as a prefix
> ** partitionId - 2 bytes that follow the tableId. This layout allows range
> queries for specific partitions of specific tables
> ** key hash - 4 bytes. This is required to optimize key comparison time.
> Generally speaking, it's safe to assume that different keys will mostly have
> different hashes, meaning that comparing hashes will usually be enough to
> determine key inequality
> ** the actual key payload goes after all these prefixes

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
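
The key layout proposed in the issue description (tableId, then partitionId, then key hash, then payload) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Ignite implementation: the class and method names are hypothetical, the 16-byte tableId width is an assumption (the issue leaves the compact representation open), and Arrays.hashCode stands in for whatever hash function is actually chosen. It uses only java.nio and java.util, with no RocksDB calls.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper illustrating the proposed key layout; not part of Ignite. */
final class PartitionKeyLayout {
    /** Assumed fixed width of the tableId prefix; the issue leaves the real width open. */
    static final int TABLE_ID_BYTES = 16;
    /** 2 bytes for the partition ID, as stated in the issue. */
    static final int PARTITION_ID_BYTES = 2;
    /** 4 bytes for the key hash, as stated in the issue. */
    static final int HASH_BYTES = 4;
    static final int PREFIX_BYTES = TABLE_ID_BYTES + PARTITION_ID_BYTES + HASH_BYTES;

    private PartitionKeyLayout() {
    }

    /** Builds tableId | partitionId | keyHash | payload. ByteBuffer writes big-endian by default. */
    static byte[] encode(byte[] tableId, int partitionId, byte[] payload) {
        checkArgs(tableId, partitionId);
        return ByteBuffer.allocate(PREFIX_BYTES + payload.length)
            .put(tableId)
            .putShort((short) partitionId)
            .putInt(Arrays.hashCode(payload)) // placeholder hash, an assumption
            .put(payload)
            .array();
    }

    /** Prefix shared by every key of one partition of one table: tableId | partitionId. */
    static byte[] partitionPrefix(byte[] tableId, int partitionId) {
        checkArgs(tableId, partitionId);
        return ByteBuffer.allocate(TABLE_ID_BYTES + PARTITION_ID_BYTES)
            .put(tableId)
            .putShort((short) partitionId)
            .array();
    }

    /** Returns true if the encoded key starts with the given partition prefix. */
    static boolean hasPrefix(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
            && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    /** Cheap inequality check: encoded keys whose fixed-size prefixes differ cannot be equal. */
    static boolean definitelyDifferent(byte[] a, byte[] b) {
        return !Arrays.equals(a, 0, PREFIX_BYTES, b, 0, PREFIX_BYTES);
    }

    private static void checkArgs(byte[] tableId, int partitionId) {
        if (tableId.length != TABLE_ID_BYTES) {
            throw new IllegalArgumentException("tableId must be " + TABLE_ID_BYTES + " bytes");
        }
        if (partitionId < 0 || partitionId > 0xFFFF) {
            throw new IllegalArgumentException("partitionId must fit in 2 unsigned bytes");
        }
    }

    public static void main(String[] args) {
        byte[] tableId = new byte[TABLE_ID_BYTES];
        byte[] key = encode(tableId, 7, "k1".getBytes());
        System.out.println("encoded key length: " + key.length);
        System.out.println("in partition 7: " + hasPrefix(key, partitionPrefix(tableId, 7)));
    }
}
```

Because all fields are written big-endian, keys of the same table and partition are contiguous under a bytewise comparator, so the partition prefix can serve as the lower bound of a range scan. The hash comparison can only prove inequality; equal prefixes still require comparing the payloads.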