We could extend the existing metadata to include a Kerberos-style token,
whichever scheme is used.  This would mean creating a producer or consumer with
a security context; session negotiation would then result in a token, which may
be a lease.  Both of our modules would authenticate and authorize the token,
then do de/encryption, each in our own way.
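
To make the token idea concrete, here is a minimal Python sketch, not a
proposal for the actual API.  Every name in it (SessionToken, issue_token,
AtRestEncryptionModule, XorDemoModule) is hypothetical.  The HMAC-signed token
only stands in for a Kerberos-style ticket, and the XOR cipher is an insecure
placeholder so the example runs on the standard library alone; a real module
would plug in its own cipher and key management.

```python
import hashlib
import hmac
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionToken:
    """Hypothetical token kept in message metadata; the expiry makes it a lease."""
    principal: str
    topic: str
    expires_at: float
    signature: bytes


def _sign(secret: bytes, principal: str, topic: str, expires_at: float) -> bytes:
    # Assumption: one shared secret signs tokens (a stand-in for a real ticket service).
    payload = repr((principal, topic, expires_at)).encode()
    return hmac.new(secret, payload, hashlib.sha256).digest()


def issue_token(secret, principal, topic, lease_seconds, now=None):
    """Stand-in for session negotiation: returns a signed, expiring token."""
    issued = time.time() if now is None else now
    expires_at = issued + lease_seconds
    return SessionToken(principal, topic, expires_at,
                        _sign(secret, principal, topic, expires_at))


class AtRestEncryptionModule(ABC):
    """Hypothetical plug-in interface: shared token checks, module-specific cipher."""

    def __init__(self, secret: bytes, acl: dict):
        self._secret = secret
        self._acl = acl  # {(principal, topic): {"READ", "WRITE"}}

    def _check(self, token, action, now=None):
        current = time.time() if now is None else now
        expected = _sign(self._secret, token.principal, token.topic, token.expires_at)
        if not hmac.compare_digest(expected, token.signature):
            raise PermissionError("token failed authentication")
        if current >= token.expires_at:
            raise PermissionError("token lease has expired")
        if action not in self._acl.get((token.principal, token.topic), set()):
            raise PermissionError(f"{token.principal} may not {action} {token.topic}")

    def encrypt(self, token, plaintext: bytes, now=None) -> bytes:
        self._check(token, "WRITE", now)
        return self._encrypt(plaintext)

    def decrypt(self, token, ciphertext: bytes, now=None) -> bytes:
        self._check(token, "READ", now)
        return self._decrypt(ciphertext)

    @abstractmethod
    def _encrypt(self, plaintext: bytes) -> bytes: ...

    @abstractmethod
    def _decrypt(self, ciphertext: bytes) -> bytes: ...


class XorDemoModule(AtRestEncryptionModule):
    """NOT secure: repeating-key XOR placeholder standing in for a real cipher."""

    def __init__(self, secret: bytes, acl: dict, key: bytes):
        super().__init__(secret, acl)
        self._key = key

    def _xor(self, data: bytes) -> bytes:
        return bytes(b ^ self._key[i % len(self._key)] for i, b in enumerate(data))

    def _encrypt(self, plaintext: bytes) -> bytes:
        return self._xor(plaintext)

    def _decrypt(self, ciphertext: bytes) -> bytes:
        return self._xor(ciphertext)
```

A producer would call issue_token(...) once per session and pass the token to
encrypt(); a consumer does the same with decrypt().  Each of our modules would
subclass AtRestEncryptionModule and override only the two cipher methods.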

Thanks,
Rob

> On Jun 10, 2014, at 4:38 PM, Todd Palino <tpal...@linkedin.com.INVALID> wrote:
> 
> Yes, I agree. There are definitely a variety of use cases that demand
> differing levels of complexity here. It comes back to enabling the
> development of at-rest encryption and making it as easy as possible to
> implement within the Kafka system. I think that this can be done with the
> concept of message metadata that can be preserved across clusters, which
> is separate from the message itself (so it’s not an overlay of a schema on
> top of the message, but rather a separate structure entirely that is
> stored with the message).
> 
> -Todd
> 
>> On 6/10/14, 3:26 PM, "Robert Withers" <robert.w.with...@gmail.com> wrote:
>> 
>> What strikes me as an opportunity is to define a pluggable at-rest
>> encryption module interface, that supports each/both of our security
>> needs.
>> 
>> Thanks,
>> Rob
>> 
>>> On Jun 10, 2014, at 4:01 PM, Todd Palino <tpal...@linkedin.com.INVALID>
>>> wrote:
>>> 
>>> The situation of production before having the consumer is definitely a
>>> good one. That’s why I wanted to take a little time before responding.
>>> Had
>>> to think about it.
>>> 
>>> I think that while we may certainly produce data before the consumer is
>>> ready, that doesn’t mean that the consumer can’t have a key pair
>>> generated
>>> for it already, so the producer could start encrypting for that consumer
>>> before it exists. This would probably work fine for lower retention
>>> periods (a week or two), but could be a little more difficult to manage
>>> if
>>> you are keeping data in Kafka longer than that. My gut reaction is that
>>> it’s better to handle it that way and keep the key pair and session key
>>> handling simple. The more we can do that, the more we can leave key
>>> management as a separate component that can be swapped out so the user
>>> can
>>> decide how it should be done.
>>> 
>>> -Todd
>>> 
>>> 
>>>> On 6/9/14, 8:16 AM, "Robert Withers" <robert.w.with...@gmail.com>
>>>> wrote:
>>>> 
>>>> Yes, that sounds familiar as I helped write (minimally) S/MIME in
>>>> squeak
>>>> (open source Smalltalk environment).  This is what I was thinking in my
>>>> alternative here, though I have a concern...
>>>> 
>>>> Production may occur before the consumer is coded and executed.  In the
>>>> analogy of mail, the mail is sent before the complete recipient list is
>>>> known.
>>>> 
>>>> This seems to mean that the private key (cert or OTP) must be stored
>>>> and
>>>> interacted with.  My feeling is that key metadata are in a system
>>>> encrypted Hbase store (session key store), for low latency reads,
>>>> rather
>>>> than a topic requiring scanning.  Store the private keys and then give
>>>> client access (producers/consumers) with the hash of the OTP.  A new
>>>> consumer comes along, create a new cert encoding the OTP hash.
>>>> 
>>>> On write, use the producer cert to send a topic hash with the msg which
>>>> would allow the broker to reuse or generate an OTP, stored in the
>>>> session
>>>> key store.
>>>> 
>>>> On read (consumer), if we have a previously run reader, use the
>>>> encrypted
>>>> hash.  If new, create consumer cert and encrypt the hash for that
>>>> session.
>>>> 
>>>> The reader/writer will pass a cert encrypted session hash.  The trick
>>>> seems to be converting hash to PK to encrypt/decrypt.  Given Kafka
>>>> resource distribution, we need system encryption for metadata and
>>>> cert-based key exchange.  This seems to mean triple encryption:
>>>> 1) client to/from broker
>>>> 2) system key/hash  mgmt/translation
>>>> 3) at-rest encryption
>>>> 
>>>> Thanks,
>>>> Rob
>>>> 
>>>>> On Jun 9, 2014, at 7:57 AM, Todd Palino <tpal...@linkedin.com.INVALID>
>>>>> wrote:
>>>>> 
>>>>> It’s the same method used by S/MIME and many other encryption
>>>>> specifications with the potential for multiple recipients. The sender
>>>>> generates a session key, and uses that key to encrypt the message. The
>>>>> session key is then encrypted once for each recipient with that
>>>>> recipient’s public key. All of the encrypted copies of the session key
>>>>> are
>>>>> then included with the encrypted message. This way, you avoid having
>>>>> to
>>>>> encrypt the message multiple times (this assumes, of course, that the
>>>>> message itself is larger than the key).
>>>>> 
>>>>> In our case, we have some options available to us. We could do that,
>>>>> and
>>>>> put all the encrypted keys in the message metadata. Or we could treat
>>>>> it
>>>>> more like a session and have the encrypted session keys in a special
>>>>> topic
>>>>> (e.g. __session_keys), much like offsets are now. When the producer
>>>>> starts
>>>>> up, they create a session key and encrypt it for each consumer with
>>>>> the
>>>>> current consumer key. The producer publishes the bundle of encrypted
>>>>> keys
>>>>> into __session_keys as a single message. The producer then publishes
>>>>> messages to the normal topic encrypted with the session key. The
>>>>> metadata
>>>>> for each of those messages would contain something like the offset into
>>>>> __session_keys to identify the bundle. This has the added benefit of
>>>>> not
>>>>> increasing the per-message data size too much.
>>>>> 
>>>>> Whenever a consumer key is invalidated, or however often the session
>>>>> key
>>>>> should be rotated, the producer would publish a new bundle. This
>>>>> maintains
>>>>> a history of session keys that can be used to decrypt any messages, so
>>>>> the
>>>>> retention on __session_keys must be at least as long as any topic
>>>>> which
>>>>> may potentially contain encrypted data. Past that point, it’s up to
>>>>> the
>>>>> consumer what they want to do with the data. A consumer like Hadoop
>>>>> might
>>>>> re-encrypt it for local storage, or store it in plaintext (depending
>>>>> on
>>>>> the security and requirements of that system).
>>>>> 
>>>>> -Todd
>>>>> 
>>>>>> On 6/8/14, 2:33 PM, "Rob Withers" <robert.w.with...@gmail.com> wrote:
>>>>>> 
>>>>>> I like the use of meta envelopes.  We did this recently, on the job,
>>>>>> as we have an envelope that specifies the type for decoding.  We
>>>>>> discussed adding the encodingType and you are suggesting adding
>>>>>> encryption metadata for that msg.  All good.
>>>>>> 
>>>>>> I don't see your OTP example.  Could you delve deeper for me, please?
>>>>>> The model I envision is internal OTP, with access to decryption
>>>>>> accessed by cert.  A double layer of security, with the internal at-
>>>>>> rest encryption being an unchanging OTP with ACL access to it as the
>>>>>> upper layer.  Are you saying it is possible to re-encrypt with new
>>>>>> keys or that there is a chain of keys over time?
>>>>>> 
>>>>>> Thanks,
>>>>>> Rob
>>>>>> 
>>>>>>> On Jun 8, 2014, at 3:06 PM, Todd Palino wrote:
>>>>>>> 
>>>>>>> I’ll agree that perhaps the “absolutely not” is not quite right.
>>>>>>> There are
>>>>>>> certainly some uses for a simpler solution, but I would still say it
>>>>>>> cannot only be encryption at the broker. This would leave many use
>>>>>>> cases
>>>>>>> for at-rest encryption out of the loop (most auditing cases for SOX,
>>>>>>> PCI,
>>>>>>> HIPAA, and other PII standards). Yes, it does add external overhead
>>>>>>> that
>>>>>>> must be managed, but it’s just the nature of the beast. We can’t
>>>>>>> solve all
>>>>>>> of the external infrastructure needed for this, but we can make it
>>>>>>> easier
>>>>>>> to use for consumers and producers by adding metadata.
>>>>>>> 
>>>>>>> There’s no need for unchanging encryption, and that’s specifically
>>>>>>> why I
>>>>>>> want to see a message envelope that will help consumers determine
>>>>>>> the
>>>>>>> encryption uses for a particular message.  You can definitely still
>>>>>>> expire
>>>>>>> keys, you just have to keep the expired keys around as long as the
>>>>>>> encrypted data stays around, and your endpoints need to know when
>>>>>>> they are
>>>>>>> decrypting data with an expired key (you might want to throw up a
>>>>>>> warning,
>>>>>>> or do something else to let the users know that it’s happening). And
>>>>>>> as
>>>>>>> someone else mentioned, there are solutions for encrypting data for
>>>>>>> multiple consumers. You can encrypt the data with an OTP, and then
>>>>>>> multiply encrypt the OTP once for each consumer and store those
>>>>>>> encrypted
>>>>>>> strings in the envelope.
>>>>>>> 
>>>>>>> -Todd
>>>>>>> 
>>>>>>>> On 6/7/14, 12:25 PM, "Rob Withers" <robert.w.with...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> At one level this makes sense to me to externalize the security
>>>>>>>> issue
>>>>>>>> to producers and consumers.  On consideration I realized that this
>>>>>>>> adds a lot of coordination requirements to the app layer across
>>>>>>>> teams
>>>>>>>> or even companies.  Another issue I feel is that you want a
>>>>>>>> specific
>>>>>>>> unchanging encryption for the data and the clients (producers/
>>>>>>>> consumers) will need to be able to decode frozen data.  If certs
>>>>>>>> are
>>>>>>>> used they cannot expire.  Also, different clients would need to use
>>>>>>>> the same cert.
>>>>>>>> 
>>>>>>>> So, your statement that it should ABSOLUTELY not include internal
>>>>>>>> encryption rings seems misplaced.  There are some customers of
>>>>>>>> kafka
>>>>>>>> that would opt to encrypt the on-disk data and key management is a
>>>>>>>> significant issue.  This is best handled internally, with key
>>>>>>>> management stored in either ZK or in a topic.  Truly, perhaps
>>>>>>>> annealing Hadoop/HBASE as a metadata store seems applicable.
>>>>>>>> 
>>>>>>>> Thanks, another 2 cents,
>>>>>>>> Rob
>>>>>>>> 
>>>>>>>>> On Jun 6, 2014, at 12:15 PM, Todd Palino wrote:
>>>>>>>>> 
>>>>>>>>> Yes, I realized last night that I needed to be clearer in what I
>>>>>>>>> was
>>>>>>>>> saying. Encryption should ABSOLUTELY not be handled server-side. I
>>>>>>>>> think
>>>>>>>>> it's a good idea to enable use of it in the consumer/producer, but
>>>>>>>>> doing
>>>>>>>>> it server side will not solve many use cases for needing
>>>>>>>>> encryption
>>>>>>>>> because the server then has access to all the keys. You could say
>>>>>>>>> that
>>>>>>>>> this eliminates the need for TLS, but TLS is pretty low-hanging
>>>>>>>>> fruit, and
>>>>>>>>> there's definitely a need for encryption of the traffic across the
>>>>>>>>> network
>>>>>>>>> even if you don't need at-rest encryption as well.
>>>>>>>>> 
>>>>>>>>> And as you mentioned, something needs to be done about key
>>>>>>>>> management.
>>>>>>>>> Storing information with the message about which key(s) was used
>>>>>>>>> is
>>>>>>>>> a good
>>>>>>>>> idea, because it allows you to know when a producer has switched
>>>>>>>>> keys.
>>>>>>>>> There are definitely some alternative solutions to that as well.
>>>>>>>>> But
>>>>>>>>> storing the keys in the broker, Zookeeper, or other systems like
>>>>>>>>> that is
>>>>>>>>> not a good idea. There needs to be a system used where the keys are only
>>>>>>>>> available to
>>>>>>>>> the producers and consumers that need them, and they only get
>>>>>>>>> access
>>>>>>>>> to
>>>>>>>>> the appropriate part of the key pair.  Even as the guy running
>>>>>>>>> Kafka
>>>>>>>>> and
>>>>>>>>> Zookeeper, I should not have access to the keys being used, and if
>>>>>>>>> data is
>>>>>>>>> encrypted I should not be able to see the cleartext.
>>>>>>>>> 
>>>>>>>>> And even if we decide not to put anything about at-rest encryption
>>>>>>>>> in the
>>>>>>>>> consumer/producer clients directly, and leave it for an exercise
>>>>>>>>> above
>>>>>>>>> that level (you have to pass the ciphertext as the message to the
>>>>>>>>> client),
>>>>>>>>> I still think there is a good case for implementing a message
>>>>>>>>> envelope
>>>>>>>>> that can store the information about which key was used, and other
>>>>>>>>> pertinent metadata, and have the ability for special applications
>>>>>>>>> like
>>>>>>>>> mirror maker to be able to preserve it across clusters. This still
>>>>>>>>> helps
>>>>>>>>> to enable the use of encryption and other features (like auditing)
>>>>>>>>> even if
>>>>>>>>> we decide it's too large a scope to fully implement.
>>>>>>>>> 
>>>>>>>>> -Todd
>>>>>>>>> 
>>>>>>>>> On 6/6/14, 10:51 AM, "Pradeep Gollakota" <pradeep...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> I'm actually not convinced that encryption needs to be handled
>>>>>>>>>> server side
>>>>>>>>>> in Kafka. I think the best solution for encryption is to handle
>>>>>>>>>> it
>>>>>>>>>> producer/consumer side just like compression. This will offload
>>>>>>>>>> key
>>>>>>>>>> management to the users and we'll still be able to leverage the
>>>>>>>>>> sendfile
>>>>>>>>>> optimization for better performance.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Jun 6, 2014 at 10:48 AM, Rob Withers
>>>>>>>>>> <robert.w.with...@gmail.com
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> On consideration, if we have 3 different access groups (1 for
>>>>>>>>>>> production
>>>>>>>>>>> WRITE and 2 consumers) they all need to decode the same
>>>>>>>>>>> encryption
>>>>>>>>>>> and
>>>>>>>>>>> so
>>>>>>>>>>> all need the same public/private key....certs won't work, unless
>>>>>>>>>>> you
>>>>>>>>>>> write
>>>>>>>>>>> a CertAuthority to build multiple certs with the same keys.
>>>>>>>>>>> Better
>>>>>>>>>>> seems
>>>>>>>>>>> to not use certs and wrap the encryption specification with an
>>>>>>>>>>> ACL
>>>>>>>>>>> capabilities for each group of access.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Jun 6, 2014, at 11:43 AM, Rob Withers wrote:
>>>>>>>>>>> 
>>>>>>>>>>> This is quite interesting to me and it is an excellent
>>>>>>>>>>> opportunity to
>>>>>>>>>>>> promote a slightly different security scheme.  Object-
>>>>>>>>>>>> capabilities are
>>>>>>>>>>>> perfect for online security and would use ACL style
>>>>>>>>>>>> authentication to
>>>>>>>>>>>> gain
>>>>>>>>>>>> capabilities filtered to those allowed resources for allow
>>>>>>>>>>>> actions
>>>>>>>>>>>> (READ/WRITE/DELETE/LIST/SCAN).  Erights.org has the
>>>>>>>>>>>> quintessential
>>>>>>>>>>>> object capabilities model and capnproto is implementing this for
>>>>>>>>>>>> C++.  I
>>>>>>>>>>>> have a java implementation at http://github.com/pauwau/pauwau
>>>>>>>>>>>> but
>>>>>>>>>>>> the
>>>>>>>>>>>> master is broken.  0.2 works, basically.  Basically a TLS
>>>>>>>>>>>> connection
>>>>>>>>>>>> with
>>>>>>>>>>>> no certificate server, it is peer to peer.  It has some
>>>>>>>>>>>> advanced
>>>>>>>>>>>> features,
>>>>>>>>>>>> but the lining of capabilities with authorization so that you
>>>>>>>>>>>> can
>>>>>>>>>>>> only
>>>>>>>>>>>> invoke correct services is extended to the secure user.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regarding non-repudiation, on disk, why not prepend a CRC?
>>>>>>>>>>>> 
>>>>>>>>>>>> Regarding on-disk encryption, multiple users/groups may need to
>>>>>>>>>>>> access,
>>>>>>>>>>>> with different capabilities.  Sounds like zookeeper needs to
>>>>>>>>>>>> store a
>>>>>>>>>>>> cert
>>>>>>>>>>>> for each class of access so that a group member can access the
>>>>>>>>>>>> decrypted
>>>>>>>>>>>> data from disk.  Use cert-based async decryption.  The only
>>>>>>>>>>>> issue is
>>>>>>>>>>>> storing
>>>>>>>>>>>> the private key in zookeeper.  Perhaps some hash magic could be
>>>>>>>>>>>> used.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for kafka,
>>>>>>>>>>>> Rob
>>>>>>>>>>>> 
>>>>>>>>>>>> On Jun 5, 2014, at 3:01 PM, Jay Kreps wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hey Joe,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I don't really understand the sections you added to the wiki.
>>>>>>>>>>>>> Can you
>>>>>>>>>>>>> clarify them?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Is non-repudiation what SASL would call integrity checks? If
>>>>>>>>>>>>> so
>>>>>>>>>>>>> don't
>>>>>>>>>>>>> SSL
>>>>>>>>>>>>> and many of the SASL schemes already support this as well
>>>>>>>>>>>>> as
>>>>>>>>>>>>> on-the-wire encryption?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Or are you proposing an on-disk encryption scheme? Is this
>>>>>>>>>>>>> actually
>>>>>>>>>>>>> needed?
>>>>>>>>>>>>> Isn't on-the-wire encryption when combined with mutual
>>>>>>>>>>>>> authentication
>>>>>>>>>>>>> and
>>>>>>>>>>>>> permissions sufficient for most uses?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On-disk encryption seems unnecessary because if an attacker
>>>>>>>>>>>>> can
>>>>>>>>>>>>> get
>>>>>>>>>>>>> root
>>>>>>>>>>>>> on
>>>>>>>>>>>>> the kafka boxes it can potentially modify Kafka to do anything
>>>>>>>>>>>>> he or
>>>>>>>>>>>>> she
>>>>>>>>>>>>> wants with data. So this seems to break any security model.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I understand the problem of a large organization not really
>>>>>>>>>>>>> having a
>>>>>>>>>>>>> trusted network and wanting to secure data transfer and limit
>>>>>>>>>>>>> and
>>>>>>>>>>>>> audit
>>>>>>>>>>>>> data access. The uses for these other things I don't totally
>>>>>>>>>>>>> understand.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also it would be worth understanding the state of other
>>>>>>>>>>>>> messaging and
>>>>>>>>>>>>> storage systems (Hadoop, dbs, etc). What features do they
>>>>>>>>>>>>> support. I
>>>>>>>>>>>>> think
>>>>>>>>>>>>> there is a sense in which you don't have to run faster than
>>>>>>>>>>>>> the
>>>>>>>>>>>>> bear,
>>>>>>>>>>>>> but
>>>>>>>>>>>>> only faster than your friends. :-)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jun 4, 2014 at 5:57 PM, Joe Stein
>>>>>>>>>>>>> <joe.st...@stealth.ly>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I like the idea of working on the spec and prioritizing. I
>>>>>>>>>>>>> will
>>>>>>>>>>>>> update
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> wiki.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> - Joestein
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Jun 4, 2014 at 1:11 PM, Jay Kreps
>>>>>>>>>>>>>> <jay.kr...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey Joe,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for kicking this discussion off! I totally agree that
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> something
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> that acts as a central message broker security is a critical
>>>>>>>>>>>>>>> feature.
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>> think
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> a number of people have been interested in this topic and
>>>>>>>>>>>>>>> several
>>>>>>>>>>>>>>> people
>>>>>>>>>>>>>>> have put effort into special purpose security efforts.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Since most of the LinkedIn folks are working on the consumer
>>>>>>>>>>>>>>> right now
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>> think
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> this would be a great project for any other interested
>>>>>>>>>>>>>>> people to
>>>>>>>>>>>>>>> take
>>>>>>>>>>>>>>> on.
>>>>>>>>>>>>>>> There are some challenges in doing these things distributed
>>>>>>>>>>>>>>> but it
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>> also
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> be a lot of fun.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I think a good first step would be to get a written plan we
>>>>>>>>>>>>>>> can all
>>>>>>>>>>>>>>> agree
>>>>>>>>>>>>>>> on for how things should work. Then we can break things down
>>>>>>>>>>>>>>> into
>>>>>>>>>>>>>>> chunks
>>>>>>>>>>>>>>> that can be done independently while still aiming at a good
>>>>>>>>>>>>>>> end
>>>>>>>>>>>>>>> state.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I had tried to write up some notes that summarized at least
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> thoughts
>>>>>>>>>>>>>> I
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> had had on security:
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> What do you think of that?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> One assumption I had (which may be incorrect) is that
>>>>>>>>>>>>>>> although
>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>> want
>>>>>>>>>>>>>> all
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> the things in your list, the two most pressing would be
>>>>>>>>>>>>>>> authentication
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> authorization, and that was all that write up covered. You
>>>>>>>>>>>>>>> have more
>>>>>>>>>>>>>>> experience in this domain, so I wonder how you would
>>>>>>>>>>>>>>> prioritize?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Those notes are really sketchy, so I think the first goal I
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>> would be to get to a real spec we can all agree on and
>>>>>>>>>>>>>>> discuss. A
>>>>>>>>>>>>>>> lot
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> the security stuff has a high human interaction element and
>>>>>>>>>>>>>>> needs to
>>>>>>>>>>>>>>> work
>>>>>>>>>>>>>>> in pretty different domains and different companies so
>>>>>>>>>>>>>>> getting
>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>> kind
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> review is important.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Jun 3, 2014 at 12:57 PM, Joe Stein
>>>>>>>>>>>>>>> <joe.st...@stealth.ly>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi, I wanted to re-ignite the discussion around Apache Kafka
>>>>>>>>>>>>>>> Security.
>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> is a huge bottleneck (non-starter in some cases) for a lot
>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> organizations
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> (due to regulatory, compliance and other requirements).
>>>>>>>>>>>>>>>> Below
>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>>> suggestions for specific changes in Kafka to accommodate
>>>>>>>>>>>>>>>> security
>>>>>>>>>>>>>>>> requirements.  This comes from what folks are doing "in the
>>>>>>>>>>>>>>>> wild"
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> workaround and implement security with Kafka as it is today
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>> what I
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> have discovered from organizations about their blockers. It
>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>> picks
>>>>>>>>>>>>>>> up
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> from the wiki (which I should have time to update later in
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> week
>>>>>>>>>>>>>>> based
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> on the below and feedback from the thread).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 1) Transport Layer Security (i.e. SSL)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This also includes client authentication in addition to in-
>>>>>>>>>>>>>>>> transit
>>>>>>>>>>>>>>> security
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> layer.  This work has been picked up here
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1477 and do
>>>>>>>>>>>>>>>> appreciate
>>>>>>>>>>>>>>>> any
>>>>>>>>>>>>>>>> thoughts, comments, feedback, tomatoes, whatever for this
>>>>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>> is a
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> pickup from the fork of the work first done here
>>>>>>>>>>>>>>>> https://github.com/relango/kafka/tree/kafka_security.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2) Data encryption at rest.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This is very important and something that can be
>>>>>>>>>>>>>>>> facilitated
>>>>>>>>>>>>>>>> within
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> wire protocol. It requires an additional map data structure
>>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>>> "encrypted [data encryption key]". With this map (either in
>>>>>>>>>>>>>>>> your
>>>>>>>>>>>>>>>> object
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> in the wire protocol) you can store the dynamically
>>>>>>>>>>>>>>>> generated
>>>>>>>>>>>>>>>> symmetric
>>>>>>>>>>>>>>> key
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> (for each message) and then encrypt the data using that
>>>>>>>>>>>>>>>> dynamically
>>>>>>>>>>>>>>>> generated key.  You then encrypt the encryption key using
>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>> public
>>>>>>>>>>>>>>> key
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> for whoever is expected to be able to decrypt the encryption
>>>>>>>>>>>>>>> key to
>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>> decrypt the message.  For each public key encrypted
>>>>>>>>>>>>>>>> symmetric
>>>>>>>>>>>>>>>> key
>>>>>>>>>>>>>>> (which
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> now the "encrypted [data encryption key]" along with which
>>>>>>>>>>>>>>>> public
>>>>>>>>>>>>>>>> key
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> was encrypted with for (so a map of [publicKey] =
>>>>>>>>>>>>>>>> encryptedDataEncryptionKey) as a chain.   Other patterns
>>>>>>>>>>>>>>>> can be
>>>>>>>>>>>>>>> implemented
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> but this is a pretty standard digital enveloping [0]
>>>>>>>>>>>>>>>> pattern
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>> 1
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> field added. Other patterns should be able to use that field
>>>>>>>>>>>>>>> to do
>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> implementation too.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 3) Non-repudiation and long term non-repudiation.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Non-repudiation is proving data hasn't changed.  This is
>>>>>>>>>>>>>>>> often (if
>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>> always) done with x509 public certificates (chained to a
>>>>>>>>>>>>>>>> certificate
>>>>>>>>>>>>>>>> authority).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Long term non-repudiation is what happens when the
>>>>>>>>>>>>>>>> certificates of
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> certificate authority are expired (or revoked) and
>>>>>>>>>>>>>>>> everything
>>>>>>>>>>>>>>>> ever
>>>>>>>>>>>>>>> signed
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> (ever) with that certificate's public key then becomes "no
>>>>>>>>>>>>>>> longer
>>>>>>>>>>>>>>> provable
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> as ever being authentic".  That is where RFC3126 [1] and
>>>>>>>>>>>>>>>> RFC3161
>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>> come
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> in (or worm drives [hardware], etc).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For either (or both) of these it is an operation of the
>>>>>>>>>>>>>>>> encryptor
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> sign/hash the data (with or without third party trusted
>>>>>>>>>>>>>>>> timestamp of
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> signing event) and encrypt that with their own private key
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> distribute
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> the results (before and after encrypting if required) along
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>> their
>>>>>>>>>>>>>>>> public key. This structure is a bit more complex but
>>>>>>>>>>>>>>>> feasible, it
>>>>>>>>>>>>>>>> is a
>>>>>>>>>>>>>>> map
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> of digital signature formats and the chain of dig sig
>>>>>>>>>>>>>>>> attestations.
>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> map's key being the method (i.e. CRC32, PKCS7 [3], XmlDigSig
>>>>>>>>>>>>>>> [4])
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> a list of map where that key is "purpose" of signature
>>>>>>>>>>>>>>>> (what
>>>>>>>>>>>>>>>> you're
>>>>>>>>>>>>>>> attesting
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> to).  As a sibling field to the list another field for
>>>>>>>>>>>>>>>> "the
>>>>>>>>>>>>>>>> attester"
>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> bytes (e.g. their PKCS12 [5] for the map of PKCS7
>>>>>>>>>>>>>>> signatures).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 4) Authorization
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> We should have a policy of "404" for data, topics,
>>>>>>>>>>>>>>>> partitions
>>>>>>>>>>>>>>>> (etc) if
>>>>>>>>>>>>>>>> authenticated connections do not have access.  In "secure
>>>>>>>>>>>>>>>> mode" any
>>>>>>>>>>>>>>>> non
>>>>>>>>>>>>>>>> authenticated connections should get a "404" type message
>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>> everything.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Knowing "something is there" is a security risk in many use
>>>>>>>>>>>>>>> cases.
>>>>>>>>>>>>>>> So
>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> you don't have access you don't even see it.  Baking "that"
>>>>>>>>>>>>>>>> into
>>>>>>>>>>>>>>>> Kafka
>>>>>>>>>>>>>>>> along with some interface for entitlement (access
>>>>>>>>>>>>>>>> management)
>>>>>>>>>>>>>>>> systems
>>>>>>>>>>>>>>>> (pretty standard) is all that I think needs to be done to
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> core
>>>>>>>>>>>>>>> project.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I want to tackle item 4 later in the year after summer after
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>> three
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> are complete.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I look forward to thoughts on this and anyone else
>>>>>>>>>>>>>>>> interested
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>> working
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> with us on these items.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> [0] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/what-is-a-digital-envelope.htm
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [1] http://tools.ietf.org/html/rfc3126
>>>>>>>>>>>>>>>> [2] http://tools.ietf.org/html/rfc3161
>>>>>>>>>>>>>>>> [3] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/pkcs-7-cryptographic-message-syntax-standar.htm
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [4] http://en.wikipedia.org/wiki/XML_Signature
>>>>>>>>>>>>>>>> [5] http://en.wikipedia.org/wiki/PKCS_12
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> /*******************************************
>>>>>>>>>>>>>>>> Joe Stein
>>>>>>>>>>>>>>>> Founder, Principal Consultant
>>>>>>>>>>>>>>>> Big Data Open Source Security LLC
>>>>>>>>>>>>>>>> http://www.stealth.ly
>>>>>>>>>>>>>>>> Twitter: @allthingshadoop
>>>>>>>>>>>>>>>> <http://www.twitter.com/allthingshadoop>
>>>>>>>>>>>>>>>> ********************************************/
> 
