Ivan Bessonov created IGNITE-17320:
--------------------------------------

             Summary: Implement B+Tree based sorted index storage
                 Key: IGNITE-17320
                 URL: https://issues.apache.org/jira/browse/IGNITE-17320
             Project: Ignite
          Issue Type: Improvement
            Reporter: Ivan Bessonov


Like in IGNITE-17318, binary tuple format is a main goal here, because that's 
what we're gonna pass into API.

Implementing a tree for indexes isn't hard by itself. I expect it to be stored 
in same partition files as raw storage, everything should be colocated in a 
single file.

What's complicated is the inlining of data into trees pages. I see it as the 
following:
 * If tuple has no offset table, we can always store the entire payload in the 
tree. This is the best case scenario, because the size is fixed and known a 
priori, we don't even need to store it before the payload.
 * If tuple has an offset table, the inline size will immediately get bigger. 
In my view, we will have to:
 ** store the size of inlined payload, that's 4 bytes
 ** store null table if it's there, that's a known amount of bytes
 ** store header and offset table:
 *** if there are non-fixed-length columns, then a single entry in offset table 
can be up to 4 bytes
 *** if there are only fixed-length columns, like ints, floats or even bitsets, 
the amount of bytes per single entry can be accurately estimated with the upper 
bound
 ** then store a good amount of the actual columns data. How much? I'd be 
generous, but then we would probably have too much space, so all of this is 
debatable:
 *** for columns with fixed size, allocate room for entire value
 *** for strings and numbers (is there something else?) we have to pre-allocate 
a reasonable amount of bytes. Like, 8, for example. I don't know, there are 
defaults somewhere in the code of Ignite 2.x, we can use them.

So, my point is, there's no new data format for inlined section of the tuple, 
we should reuse it and thus avoid many possible errors. And, if record fits 
into a tree page, there's no need in inserting it into a free list. Good!

And yes, of course, there has to be a room for row id in inlined section as 
well.

Last, meta tree is still a thing. But, we shouldn't identify indexes by their 
names, cause there's UUID id or even integer id (see IGNITE-17318).
h3. Other ideas

I don't like how durable background tasks work in Ignite 2.x, there are always 
some issues. I would prefer having a general-purposed "recycle bin" in 
partition and a background cleaner process that would clean it. Maybe this 
queue should contain other entities in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to