I'm relatively new to data modeling in Cassandra, but perhaps instead of
date and last_modified in your primary key for doc_by_last_modified, you
could just use the docId. That way, you can update the last_modified and
date fields against the docId, which removes the duplicate issue and
obviates the need to delete the current row and add a new one. You'd simply
be updating (upserting?) by the docId.
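
As a rough, untested sketch of what I mean (the UUID and date values are
made-up placeholders, and I'm guessing at the format of your date column):

    CREATE TABLE doc_by_last_modified (
            docId UUID,
            date TEXT,
            last_modified TIMEUUID,
            PRIMARY KEY ((docId))
    );

    -- Writing to the same docId again overwrites the existing row
    -- (an upsert), so each document appears at most once.
    UPDATE doc_by_last_modified
            SET date = '2015-07-20', last_modified = now()
            WHERE docId = 123e4567-e89b-12d3-a456-426655440000;

One caveat I can see: with docId as the only key column, the table is no
longer clustered by last_modified, so reading documents in
most-recently-modified order would need some other mechanism.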

Regards,
Victor

On Mon, Jul 20, 2015 at 11:59 PM, Robert Wille <rwi...@fold3.com> wrote:

> Data structures that have a recently-modified access pattern seem to be a
> poor fit for Cassandra. I’m wondering if any of you smart guys can provide
> suggestions.
>
> For the sake of discussion, let’s assume I have the following tables:
>
> CREATE TABLE document (
>         docId UUID,
>         doc TEXT,
>         last_modified TIMEUUID,
>         PRIMARY KEY ((docId))
> );
>
> CREATE TABLE doc_by_last_modified (
>         date TEXT,
>         last_modified TIMEUUID,
>         docId UUID,
>         PRIMARY KEY ((date), last_modified)
> );
>
> When I update a document, I retrieve its last_modified time, delete the
> current record from doc_by_last_modified, and add a new one. Unfortunately,
> if you’d like each document to appear at most once in the
> doc_by_last_modified table, then this doesn’t work so well.
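>
> To make that sequence concrete, here is a rough sketch of the statements
> involved. The ? marks are bind markers for values supplied by the client;
> the new TIMEUUID is assumed to be generated client-side so the same value
> can be written to both tables, and date is presumably derived from
> last_modified.
>
> -- 1. Look up the document's current last_modified.
> SELECT last_modified FROM document WHERE docId = ?;
>
> -- 2. Remove the old row, using the old date and old last_modified.
> DELETE FROM doc_by_last_modified
>         WHERE date = ? AND last_modified = ?;
>
> -- 3. Write the new row and update the document itself.
> INSERT INTO doc_by_last_modified (date, last_modified, docId)
>         VALUES (?, ?, ?);
> UPDATE document SET doc = ?, last_modified = ?
>         WHERE docId = ?;
>
> These are separate statements, not one atomic read-modify-write.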
>
> Documents can get into the doc_by_last_modified table multiple times if
> there is concurrent access, or if there is a consistency issue.
>
> Any thoughts out there on how to efficiently provide recently-modified
> access to a table? This problem exists for many types of data structures,
> not just recently-modified. Any ordered data structure that can be
> dynamically reordered suffers from the same problems. As I’ve been doing
> schema design, this pattern keeps recurring. A nice way to address this
> problem has lots of applications.
>
> Thanks in advance for your thoughts
>
> Robert
>
>
