Sylvain Lebresne created CASSANDRA-8099:
-------------------------------------------

             Summary: Refactor and modernize the storage engine
                 Key: CASSANDRA-8099
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8099
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
             Fix For: 3.0


The current storage engine (which for this ticket I'll loosely define as "the 
code implementing the read/write path") is suffering from old age. One of the 
main problem is that the only structure it deals with is the cell, which 
completely ignores the more high level CQL structure that groups cell into 
(CQL) rows.

This leads to many inefficiencies, like the fact that during a reads we have to 
group cells multiple times (to count on replica, then to count on the 
coordinator, then to produce the CQL resultset) because we forget about the 
grouping right away each time (so lots of useless cell names comparisons in 
particular). But outside inefficiencies, having to manually recreate the CQL 
structure every time we need it for something is hindering new features and 
makes the code more complex that it should be.

Said storage engine also has tons of technical debt. To pick an example, the 
fact that during range queries we update {{SliceQueryFilter.count}} is pretty 
hacky and error prone. Or the overly complex ways {{AbstractQueryPager}} has to 
go into to simply "remove the last query result".

So I want to bite the bullet and modernize this storage engine. I propose to do 
2 main things:
# Make the storage engine more aware of the CQL structure. In practice, instead 
of having partitions be a simple iterable map of cells, it should be an 
iterable list of row (each being itself composed of per-column cells, though 
obviously not exactly the same kind of cell we have today).
# Make the engine more iterative. What I mean here is that in the read path, we 
end up reading all cells in memory (we put them in a ColumnFamily object), but 
there is really no reason to. If instead we were working with iterators all the 
way through, we could get to a point where we're basically transferring data 
from disk to the network, and we should be able to reduce GC substantially.

Please note that such refactor should provide some performance improvements 
right off the bat but it's not it's primary goal either. It's primary goal is 
to simplify the storage engine and adds abstraction that are better suited to 
further optimizations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to