[ 
https://issues.apache.org/jira/browse/IGNITE-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev reassigned IGNITE-16102:
--------------------------------------------

    Assignee: Aleksandr Polovtcev

> Store all RocksDB partitions in a single column family.
> -------------------------------------------------------
>
>                 Key: IGNITE-16102
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16102
>             Project: Ignite
>          Issue Type: Improvement
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Ivan Bessonov
>            Assignee: Aleksandr Polovtcev
>            Priority: Major
>              Labels: iep-74, ignite-3
>
> The current storage implementation puts each partition in its own column
> family. This effectively means that every partition lives in its own
> database, sharing only the WAL and some in-memory resources. Given that each
> column family has multiple files for LSM trees, the number of open file
> descriptors is bigger than it needs to be.
> Now, the idea is to have a single column family for all partitions within a
> table. We should also consider the possibility of storing several tables in
> the same RocksDB instance, for similar reasons. You can think of it as an
> analogue of cache groups in Ignite 2.x.
> There's also an "optimization" to be implemented that is missing from the
> code - using key hashes as prefixes.
> h3. What should be implemented:
> First of all, the code will be heavily refactored. This should lead to
> simplifications in many places.
> Beyond that, I see the following list of goals to achieve:
>  * the current implementation allows deriving the list of partitions from the
> list of column families. This will no longer be possible, so I suggest
> storing this list explicitly in the "meta" CF, in whatever format is
> convenient during the implementation
>  * there should be a compact "tableId" representation. IgniteUUID or even
> UUID is too large, I think, but it might work as a basis. This problem
> should be discussed
>  * the binary representation for keys should now include the following
> information:
>  ** tableId - fixed-length set of bytes to be used as a prefix
>  ** partitionId - 2 bytes that will follow the tableId. This layout will 
> allow making range queries for specific partitions of specific tables
>  ** key hash - 4 bytes. This one is required to optimize comparison time for
> keys. Generally speaking, it's safe to assume that hashes will mostly differ
> for different keys, meaning that hashes will usually be enough to determine
> key inequality
>  ** actual key payload goes after all these prefixes
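
The key layout listed above can be sketched roughly as follows. This is only an illustration, not the planned implementation: the class and method names are hypothetical, the 4-byte tableId width is an assumption (the issue leaves the compact representation open for discussion), and Arrays.hashCode merely stands in for whatever key hash is eventually chosen.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch of the proposed key layout: tableId | partitionId | hash | payload. */
public final class PartitionKeyLayout {
    /** Assumed width; the issue does not fix the tableId size. */
    static final int TABLE_ID_BYTES = 4;
    /** 2 bytes, as stated in the issue. */
    static final int PARTITION_ID_BYTES = 2;
    /** 4 bytes, as stated in the issue. */
    static final int HASH_BYTES = 4;
    static final int PREFIX_BYTES = TABLE_ID_BYTES + PARTITION_ID_BYTES;

    /** Builds the full binary key. ByteBuffer writes big-endian by default. */
    static byte[] encodeKey(int tableId, int partitionId, byte[] keyPayload) {
        checkPartitionId(partitionId);
        return ByteBuffer.allocate(PREFIX_BYTES + HASH_BYTES + keyPayload.length)
            .putInt(tableId)
            .putShort((short) partitionId)
            .putInt(Arrays.hashCode(keyPayload)) // stand-in hash function (assumption)
            .put(keyPayload)
            .array();
    }

    /** Builds the tableId + partitionId prefix shared by every key of one partition. */
    static byte[] partitionPrefix(int tableId, int partitionId) {
        checkPartitionId(partitionId);
        return ByteBuffer.allocate(PREFIX_BYTES)
            .putInt(tableId)
            .putShort((short) partitionId)
            .array();
    }

    /** Returns true if the encoded key starts with the given partition prefix. */
    static boolean belongsTo(byte[] encodedKey, byte[] prefix) {
        return encodedKey.length >= prefix.length
            && Arrays.equals(encodedKey, 0, prefix.length, prefix, 0, prefix.length);
    }

    private static void checkPartitionId(int partitionId) {
        if (partitionId < 0 || partitionId > 0xFFFF) {
            throw new IllegalArgumentException("partitionId must fit in 2 bytes: " + partitionId);
        }
    }
}
```

Because the fields are written big-endian, all keys of one partition form a contiguous range under an unsigned bytewise ordering, which is what makes a range scan over a single partition of a single table possible. Since the hash precedes the payload, a bytewise comparison of two keys within the same partition will usually be decided within the 4 hash bytes, before the payload is touched.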



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
