On Wed, Jul 27, 2016 at 10:56 PM, Jacob Willoughby <jake.willoug...@vivint.com> wrote: > I have been investigating cassandra to store small objects as a trivial > replacement for s3. GET/PUT/DELETE are all easy, but LIST is what is > tripping me up. > > > S3 does a hierarchical list that kinda simulates traversing folders. > > http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html > > > So say my schema is this: > > CREATE TABLE "stuff" (key BLOB PRIMARY KEY, value BLOB) > > > I know that the prefix part is easy with a ByteOrderedPartitioner (and > possibly with a secondary index in Cassandra 3.x? ). What trips me up is > the delimiter part.
I don't think either of options are good. > I have looked at a handful of open source projects that are s3 clones and > use cassandra, and they seem to do the prefix match then manually search for > the delimiter. I have looked at doing a UDA, but they also seem to send all > of the data to a single node to do the aggregation. > > > What I am hoping to do is achieve what S3 does: "List performance is not > substantially affected by the total number of keys in your bucket, nor by > the presence or absence of the prefix, marker, maxkeys, or delimiter > arguments." ( > > http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysUsingAPIs.html) > > > Is there some sort of denormalization, indexing, querying that I am missing > that might help solve this? I think if UDA's could do some summary > operation on each node before returning it then aggregating the results it > would work, but as far as I know that isn't possible. It seems like a > binary search of each partition involved in the list prefix would be a > really quick and easy way to return the first 1000 results. > > > Is this even possible using cassandra? Perhaps you could just store the objects as a simple key-value representation (like your "stuff" table above), and then separately index the components of the key (presumably in another table), using adjacency lists (https://en.wikipedia.org/wiki/Adjacency_list). -- Eric Evans john.eric.ev...@gmail.com