Francisco Guerrero created CASSANDRASC-94: ---------------------------------------------
Summary: Reduce filesystem calls while streaming SSTables
Key: CASSANDRASC-94
URL: https://issues.apache.org/jira/browse/CASSANDRASC-94
Project: Sidecar for Apache Cassandra
Issue Type: Improvement
Reporter: Francisco Guerrero
Assignee: Francisco Guerrero

When streaming snapshotted SSTables, Cassandra Sidecar performs multiple filesystem calls per request:

- Traverse the data directories to determine the keyspace / table path
- Once found, determine whether the SSTable file exists under the snapshots directory
- Read the filesystem to obtain the file type and file size
- Read the requested range of the file and stream it

The number of filesystem calls is manageable when streaming a single SSTable. However, when one or more clients read many SSTables, as Cassandra Analytics bulk reads do, hundreds to thousands of requests are issued, and every request repeats the calls above.

This improvement proposes introducing several caches to reduce the number of system calls while streaming SSTables:

- *snapshot list cache*: maintains a cache of recently listed snapshot files under a snapshot directory. This cache avoids accessing the filesystem every time a bulk read client lists the snapshot directory.
- *table dir cache*: maintains a cache of recently streamed table directory paths. This cache avoids traversing the filesystem in search of the table directory, for example while running bulk reads. Since a bulk read can stream tens to hundreds of SSTable components from a single snapshot directory, the cache avoids resolving the table directory for each component.
- *snapshot path cache*: maintains a cache of recently streamed snapshot SSTable component paths. This cache avoids resolving the snapshot SSTable component path again during bulk reads. Since bulk reads stream sub-ranges of an SSTable component, the path of a single component could otherwise be resolved multiple times.
- *file props cache*: maintains a cache of FileProps for recently streamed files. This cache avoids validating file properties again. For example, bulk reads stream sub-ranges of an SSTable component, so the properties of the same file would otherwise be read multiple times.
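Each of the caches above memoizes the result of a filesystem lookup that every request would otherwise repeat. The following is a minimal Java sketch of that idea for the *file props cache* case. It assumes a bounded LRU map is an acceptable stand-in for illustration. The class names `LruCache`, `FilePropsCache` and `FilePropsCacheDemo`, the `FileProps` fields, and the temp file name are all hypothetical and are not Sidecar's actual API. The sketch shows repeated sub-range requests for one SSTable component reading the file attributes from disk only once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical demo: several sub-range requests for one SSTable component.
final class FilePropsCacheDemo {
    // Creates a temporary stand-in for an SSTable component of the given size.
    static Path createTempFile(int sizeBytes) {
        try {
            Path file = Files.createTempFile("nb-1-big-Data", ".db");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[sizeBytes]);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = createTempFile(4096);
        FilePropsCache cache = new FilePropsCache(1_000);
        for (int i = 0; i < 4; i++) {
            FilePropsCache.FileProps props = cache.get(file);
            System.out.println("request " + i + ": size=" + props.size);
        }
        // Only the first request reached the filesystem: prints "filesystem reads: 1"
        System.out.println("filesystem reads: " + cache.filesystemReads.get());
    }
}

// Hypothetical bounded LRU cache that memoizes the result of a loader function.
final class LruCache<K, V> {
    private final Map<K, V> map;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the bound is exceeded
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, invoking the loader only on a cache miss.
    synchronized V get(K key, Function<K, V> loader) {
        return map.computeIfAbsent(key, loader);
    }

    // Drops an entry, e.g. when its snapshot is cleared.
    synchronized void invalidate(K key) {
        map.remove(key);
    }
}

// Hypothetical file props cache: file type and size are read once per path.
final class FilePropsCache {
    static final class FileProps {
        final boolean regularFile;
        final long size;

        FileProps(boolean regularFile, long size) {
            this.regularFile = regularFile;
            this.size = size;
        }
    }

    private final LruCache<Path, FileProps> cache;
    // Counts actual filesystem reads, for illustration only.
    final AtomicInteger filesystemReads = new AtomicInteger();

    FilePropsCache(int maxEntries) {
        this.cache = new LruCache<>(maxEntries);
    }

    FileProps get(Path path) {
        return cache.get(path, p -> {
            filesystemReads.incrementAndGet();
            try {
                BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
                return new FileProps(attrs.isRegularFile(), attrs.size());
            } catch (IOException e) {
                // computeIfAbsent stores nothing when the loader throws
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

The same `LruCache` could, in principle, back the table dir or snapshot path caches, using a different key such as a keyspace/table pair. The sketch deliberately omits time-based expiry. Snapshots can be cleared or re-taken, so a real cache would need expiry or invalidation to avoid serving stale entries. Also, the single lock around the loader serializes all cache misses, which a concurrent cache library would avoid.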