Logically, whether you use cassandra or not, there is some "physics" of
sorted order structures which you should understand and dictate what is
possible.

In order to keep data sorted, a database needs to be able to see the proper
sort-order of the data "all the time" not just at insertion or query time.
When inserting a new record, it is compared with existing records to put it
in the "right place".

As a result, whether you use cassandra or a different system, I believe you
are limited to one of these strategies:

(a) Encrypt the data outside the database with non-order-preserving
encryption, and expose some "actual data" in unencrypted form for sorting.
Since the encryption ruins the sort order, some "actual data" must be
exposed to sort properly. Any data you expose, even if encoded, would be
your actual data, because otherwise it wouldn't sort in the right order. You
can limit the amount of data you expose, creating buckets instead of proper
detailed sorting. Within buckets, only the agent capable of decrypting the
data would be able to properly order the data within a bucket.

(b) Encrypt the data inside the database. This would expose the "actual
data" to the database, allowing it to keep it in proper order. The code to
handle encryption would be handled after sort-order comparisons. The code
(and keys) for decryption would also be known to the database. The data
would need to be decryptable by the database at all times, because the
database will need to compare new data to existing data in order to perform
operations correctly.

(c) Use an order-preserving encryption scheme. If the encryption output is
in the same order as the source-data, then the database can sort on the
encrypted data and get proper sort-results. I don't know anything about this
field, but doing a google search returned the following paper...


http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.8664&rep=rep1&type=pdf

I believe these three cases represent a totalogy of what is possible in any
data-storage system. So the solution you compose would involve one or more
of these schemes.

One might be tempted to generate some type of "ordinal value" representing
the sort-order of an item. However, in order for this ordinal to be
mathematically unrelated to the original data, it would have to be generated
by a system which stored a copy of the entire data, which would then have to
use one of the above three methods. (i.e. this approach is a chicken and egg
problem)

Reply via email to