Francisco Guerrero created CASSANDRASC-94:
---------------------------------------------

             Summary: Reduce filesystem calls while streaming SSTables
                 Key: CASSANDRASC-94
                 URL: https://issues.apache.org/jira/browse/CASSANDRASC-94
             Project: Sidecar for Apache Cassandra
          Issue Type: Improvement
            Reporter: Francisco Guerrero
            Assignee: Francisco Guerrero


When streaming snapshotted SSTables from Cassandra Sidecar, Sidecar performs 
multiple filesystem calls for each request:
- Traverse the data directories to determine the keyspace / table path
- Once found, determine whether the SSTable file exists under the snapshots directory
- Read the filesystem to obtain the file type and file size
- Read the requested range of the file and stream it
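
For illustration, the uncached per-request path described above can be 
sketched roughly as follows. This is a minimal sketch and not Sidecar's actual 
code: the class and method names (UncachedStreamSketch, findTableDir, 
readRange) are hypothetical, and it assumes the usual on-disk layout of 
<data dir>/<keyspace>/<table>-<table id>/snapshots/<snapshot name>/<component>.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical sketch of the uncached flow; every call to readRange
// repeats all four filesystem steps listed in the ticket.
public class UncachedStreamSketch {

    // Step 1: traverse the data directories to find the table directory.
    static Path findTableDir(List<Path> dataDirs, String keyspace, String table)
            throws IOException {
        for (Path dataDir : dataDirs) {
            Path keyspaceDir = dataDir.resolve(keyspace);
            if (!Files.isDirectory(keyspaceDir)) {
                continue;
            }
            try (Stream<Path> children = Files.list(keyspaceDir)) {
                Optional<Path> match = children
                        .filter(p -> {
                            String name = p.getFileName().toString();
                            return name.equals(table) || name.startsWith(table + "-");
                        })
                        .filter(Files::isDirectory)
                        .findFirst();
                if (match.isPresent()) {
                    return match.get();
                }
            }
        }
        throw new NoSuchFileException(keyspace + "/" + table);
    }

    static byte[] readRange(List<Path> dataDirs, String keyspace, String table,
                            String snapshot, String component,
                            long offset, int length) throws IOException {
        Path tableDir = findTableDir(dataDirs, keyspace, table);

        // Step 2: check that the component exists under the snapshots directory.
        Path file = tableDir.resolve("snapshots").resolve(snapshot).resolve(component);
        if (!Files.exists(file)) {
            throw new NoSuchFileException(file.toString());
        }

        // Step 3: read the file type and size to validate the requested range.
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        if (!attrs.isRegularFile() || offset < 0 || length < 0
                || offset + length > attrs.size()) {
            throw new IllegalArgumentException("invalid file or range: " + file);
        }

        // Step 4: read the requested range (a real server would stream it).
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            long position = offset;
            while (buffer.hasRemaining()) {
                int read = channel.read(buffer, position);
                if (read < 0) {
                    break;
                }
                position += read;
            }
            return buffer.array();
        }
    }
}
```

Only step 4 depends on the requested range; steps 1 to 3 return the same 
answer for every sub-range request against the same SSTable component.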

The number of filesystem calls is manageable when streaming a single SSTable, 
but when one or more clients read multiple SSTables, for example in the case 
of Cassandra Analytics bulk reads, hundreds to thousands of requests are 
performed, and every request repeats the filesystem calls above.

This improvement proposes introducing several caches to reduce the number of 
filesystem calls while streaming SSTables.

- *snapshot list cache*: maintains a cache of recently listed snapshot files 
under a snapshot directory. This cache avoids accessing the filesystem every 
time a bulk read client lists the snapshot directory.
- *table dir cache*: maintains a cache of recently streamed table directory 
paths. This cache avoids traversing the filesystem in search of the table 
directory on every request. Since bulk reads can stream tens to hundreds of 
SSTable components from a snapshot directory, the table directory would 
otherwise be resolved each time.
- *snapshot path cache*: maintains a cache of recently streamed snapshot 
SSTable component paths. This cache avoids resolving the snapshot SSTable 
component path repeatedly during bulk reads. Since bulk reads stream 
sub-ranges of an SSTable component, the resolution can otherwise happen 
multiple times for a single SSTable component.
- *file props cache*: maintains a cache of the FileProps of recently streamed 
files. This cache avoids re-reading and re-validating file properties during 
bulk reads, where sub-ranges of an SSTable component are streamed and the 
properties of the same file would otherwise be read multiple times.
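
As a rough illustration of the idea behind these caches, the sketch below 
shows a small bounded LRU cache that calls a loader only on a miss. The ticket 
does not say which cache implementation or eviction policy is intended, so 
everything here (StreamCachesSketch, LruCache, the loader function, the size 
bound) is a hypothetical stand-in; the loader represents whichever filesystem 
lookup is being cached (listing a snapshot directory, resolving a table 
directory or component path, or reading file properties).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a bounded, thread-safe LRU cache in front of an
// expensive lookup such as a filesystem call.
public class StreamCachesSketch {

    static final class LruCache<K, V> {
        private final Map<K, V> entries;
        private final Function<K, V> loader;

        LruCache(final int maxEntries, Function<K, V> loader) {
            this.loader = loader;
            // accessOrder = true keeps the least recently used entry eldest.
            this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        // Returns the cached value, invoking the loader only on a miss.
        synchronized V get(K key) {
            V value = entries.get(key);
            if (value == null) {
                value = loader.apply(key);
                entries.put(key, value);
            }
            return value;
        }
    }
}
```

With a cache like this keyed by the component path, a bulk read that streams 
many sub-ranges of the same SSTable component would invoke the loader once 
rather than once per request. A real implementation would likely also need 
expiry or invalidation so that entries do not outlive a cleared snapshot; 
that is left out of this sketch.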




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
