Hi David,
Encrypting and decrypting data in Cassandra is not an option for us. However, I read your elaboration on best practices for adding a custom feature to Cassandra with great interest. Thank you!

Please see below for the answers to your questions.

Kind regards
Matthias

On 10/18/2011 08:53 AM, David Jeske wrote:
On Mon, Oct 17, 2011 at 2:39 AM, Matthias Pfau <p...@l3s.de
<mailto:p...@l3s.de>> wrote:

    We would be very happy if cassandra would give us an option to
    maintain the sort order on our own (application logic). That is why
    it would be interesting to hear from any of the developers if it
    would be easily possible to add such a feature to cassandra.


What you are describing above is option (b): you would do this by
building your sort-order, encryption, and decryption into Cassandra. Let
me elaborate...

The database always has to know how to compute sort order for items.
Deferring it to your code can only happen in two ways: in-process or
out-of-process. Deferring sort-order comparisons to out-of-process code
would have disastrous effects on performance, as they are used multiple
times for every single operation the database does. Therefore, short of
an application where performance is irrelevant, the feasible method to
allow your code to maintain sort-order is "option b": to build your
sort-order/encryption/decryption into the database. Cassandra would have
to initialize it at startup to read your database.
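
To give a rough feel for how often the comparison function runs, here is a minimal Python sketch. It is not Cassandra code: `compare_columns` and the generated column names are hypothetical, and an in-memory sort only stands in for the comparisons a database performs while inserting, merging, and reading.

```python
from functools import cmp_to_key

calls = 0  # number of times the comparator has been invoked


def compare_columns(a: bytes, b: bytes) -> int:
    """Hypothetical comparator: plain byte-wise ordering, with a call counter."""
    global calls
    calls += 1
    return (a > b) - (a < b)


# 1000 distinct column names in a shuffled order (7919 is coprime with 1000).
names = [f"col-{(i * 7919) % 1000:04d}".encode() for i in range(1000)]

ordered = sorted(names, key=cmp_to_key(compare_columns))
print(f"{len(names)} columns sorted with {calls} comparator calls")
```

Even this single sort calls the comparator at least once per column, and usually several times. If each call were a round trip to another process, that cost would be paid on every one of them.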

Cassandra is open-source, so you can do this work on your own right now.
Aaron's message provided some pointers.

If you do go this route, you'll probably want to separate your
sort-order-and-encryption-handler into a separate JAR, and add some code
to Cassandra to load-and-register your classes when the database starts.
You'd submit this "stable data-format plug-in-API" patch to Cassandra,
and hopefully find a way to get it accepted into the main codebase. This
would make it easier for you to update to new versions, as you would
depend only on the public API, rather than on a private fork of
Cassandra.

    Otherwise, it seems like we have to implement something based on strategy
    (a) because (b) is not feasible for us and (c) is a rather young
    research topic which is slowly gaining more attention.


Certainly (option a) is the most straightforward method if you wish to
keep your codebase completely separate from your database (whether
Cassandra or not). Whether this is an acceptable security risk or not is
up to you.

--------

Pulling back from implementation issues, I wonder if you might share a
bit more about the reason you need this functionality for your
application. Here are a few questions I'm curious about:

1) Is the data all-encrypted with a single key, or do different records
use different keys?

We are building a zero-knowledge application. The data (of this single CF) is encrypted with a single key.

2) If a single key, would adding a file/block/record-level encryption to
Cassandra solve this problem? If not, why not? Is there something
special about your encryption methods?

There is nothing special about our encryption methods, but we will never be able to encrypt or decrypt data on our server, as the keys will always remain on the clients. Therefore, we would not profit from built-in Cassandra encryption support. However, this would probably be a good feature for many other users.

3) Is the compression of the data somehow special, such that block-level
compression (either zlib, snappy, or even a custom-implemented scheme)
is not viable? If so, why?

No, we just have to compress before encryption, because encrypted data is practically incompressible and compressing afterwards would gain very little.
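
As a small illustration of why the order matters, the following Python sketch compares how well zlib compresses a repetitive plaintext and a block of random bytes of the same size. The random bytes are only a stand-in for ciphertext (good ciphertext looks random); no real encryption is performed here, and the sample record is made up.

```python
import os
import zlib

# Made-up, repetitive plaintext, as structured application data often is.
plaintext = b"timestamp=2011-10-18;user=example;status=ok\n" * 200

# Stand-in for ciphertext: random bytes of the same length (an assumption,
# used here because well-encrypted data is indistinguishable from random).
ciphertext_like = os.urandom(len(plaintext))

compressed_plain = zlib.compress(plaintext)
compressed_cipher = zlib.compress(ciphertext_like)

print("plaintext:      ", len(plaintext), "->", len(compressed_plain), "bytes")
print("ciphertext-like:", len(ciphertext_like), "->", len(compressed_cipher), "bytes")
```

The plaintext shrinks to a small fraction of its size, while the random block does not shrink at all, so compression has to happen on the client before encryption.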

4) Is there something special about the sorting that makes it hard to
expose the sort order to a database? (other than cassandra's lack of
general composite key sorting)

No, Cassandra would be able to sort the data in unencrypted form. However, as the data is encrypted, we cannot make use of Cassandra's sorting capabilities.
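
To illustrate the problem, here is a minimal Python sketch. The cipher is a toy built from the standard library purely for demonstration (it is not our actual scheme and must not be used for real data), and the key and column names are hypothetical. It shows that sorting by ciphertext bytes, which is all the server can do, yields an order unrelated to the plaintext order.

```python
import hashlib
import hmac

KEY = b"client-side key (hypothetical)"  # never leaves the client


def _keystream(nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce, and a block counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(KEY + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(plaintext: bytes) -> bytes:
    """Toy cipher (NOT secure): nonce derived from the plaintext, then XOR."""
    nonce = hmac.new(KEY, plaintext, hashlib.sha256).digest()[:16]
    stream = _keystream(nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


# Column names that are already in plaintext order.
column_names = [f"event-{i:04d}".encode() for i in range(20)]

# The server only sees ciphertext and can only order by its bytes.
stored_order = [toy_decrypt(c) for c in sorted(toy_encrypt(n) for n in column_names)]

print("plaintext order preserved:", stored_order == column_names)
```

The client gets back the same columns, but in an order that carries no meaning for the application, which is why range queries over the encrypted names do not work for us.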

Kind regards
Matthias
