David,
thanks for your nice summary on this topic.

We would be very happy if cassandra would give us an option to maintain the sort order on our own (application logic). That is why it would be interesting to hear from any of the developers if it would be easily possible to add such a feature to cassandra.

Otherwise, it seems like we have to implement sth. based on strategy (a) because (b) is not feasible for us and (c) is a rather young research topic which is slowly gaining more attention.

Kind Regards
Matthias

On 10/15/2011 10:55 PM, David Jeske wrote:
Logically, whether you use cassandra or not, there is some "physics" of
sorted order structures which you should understand and dictate what is
possible.

In order to keep data sorted, a database needs to be able to see the
proper sort-order of the data "all the time" not just at insertion or
query time. When inserting a new record, it is compared with existing
records to put it in the "right place".

As a result, whether you use cassandra or a different system, I believe
you are limited to one of these strategies:

(a) Encrypt the data outside the database with non-order-preserving
encryption, and expose some "actual data" in unencrypted form for
sorting. Since the encryption ruins the sort order, some "actual data"
must be exposed to sort properly. Any data you expose, even if encoded,
would be your actual data, because otherwise it wouldn't sort in the
right order. You can limit the amount of data you expose, creating
buckets instead of proper detailed sorting. Within buckets, only the
agent capable of decrypting the data would be able to properly order the
data within a bucket.

(b) Encrypt the data inside the database. This would expose the "actual
data" to the database, allowing it to keep it in proper order. The code
to handle encryption would be handled after sort-order comparisons. The
code (and keys) for decryption would also be known to the database. The
data would need to be decryptable by the database at all times, because
the database will need to compare new data to existing data in order to
perform operations correctly.

(c) Use an order-preserving encryption scheme. If the encryption output
is in the same order as the source-data, then the database can sort on
the encrypted data and get proper sort-results. I don't know anything
about this field, but doing a google search returned the following paper...

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.8664&rep=rep1&type=pdf
<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.8664&rep=rep1&type=pdf>

I believe these three cases represent a totalogy of what is possible in
any data-storage system. So the solution you compose would involve one
or more of these schemes.

One might be tempted to generate some type of "ordinal value"
representing the sort-order of an item. However, in order for this
ordinal to be mathematically unrelated to the original data, it would
have to be generated by a system which stored a copy of the entire data,
which would then have to use one of the above three methods. (i.e. this
approach is a chicken and egg problem)

Reply via email to