I'm relatively new to data modeling in Cassandra, but perhaps instead of date and last_modified in your primary key for doc_by_last_modified, just use the docId. That way you can update the last_modified and date fields against the docId. This removes the duplicate issue and obviates the need to delete the current row and add a new one; you'd simply be updating (upserting?) by the docId ...
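
As a rough sketch of what I mean (untested, and the UUID and date values below are made up purely for illustration), the table would be keyed on docId alone, and each modification becomes a single write:

```cql
-- Hypothetical variant of doc_by_last_modified, keyed by docId only.
-- Column names are taken from the original schema; the key is the assumption.
CREATE TABLE doc_by_last_modified (
    docId UUID,
    date TEXT,
    last_modified TIMEUUID,
    PRIMARY KEY ((docId))
);

-- In CQL an UPDATE on a key that does not exist yet creates the row,
-- so this one statement covers both the first write and later changes.
-- The docId and date literals here are illustrative placeholders.
UPDATE doc_by_last_modified
SET date = '2015-07-21',
    last_modified = now()
WHERE docId = 123e4567-e89b-12d3-a456-426655440000;
```

Since there is only ever one row per docId, a document cannot show up twice. The trade-off, as far as I can tell, is that rows are no longer clustered by last_modified, so this table by itself would not return documents in recently-modified order.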

Regards,
Victor

On Mon, Jul 20, 2015 at 11:59 PM, Robert Wille <rwi...@fold3.com> wrote:
> Data structures that have a recently-modified access pattern seem to be a
> poor fit for Cassandra. I’m wondering if any of you smart guys can provide
> suggestions.
>
> For the sake of discussion, lets assume I have the following tables:
>
> CREATE TABLE document (
>     docId UUID,
>     doc TEXT,
>     last_modified TIMEUUID,
>     PRIMARY KEY ((docid))
> )
>
> CREATE TABLE doc_by_last_modified (
>     date TEXT,
>     last_modified TIMEUUID,
>     docId UUID,
>     PRIMARY KEY ((date), last_modified)
> )
>
> When I update a document, I retrieve its last_modified time, delete the
> current record from doc_by_last_modified, and add a new one. Unfortunately,
> if you’d like each document to appear at most once in the
> doc_by_last_modified table, then this doesn’t work so well.
>
> Documents can get into the doc_by_last_modified table multiple times if
> there is concurrent access, or if there is a consistency issue.
>
> Any thoughts out there on how to efficiently provide recently-modified
> access to a table? This problem exists for many types of data structures,
> not just recently-modified. Any ordered data structure that can be
> dynamically reordered suffers from the same problems. As I’ve been doing
> schema design, this pattern keeps recurring. A nice way to address this
> problem has lots of applications.
>
> Thanks in advance for your thoughts
>
> Robert