We could extend the existing metadata to include a Kerberos-style token, whichever scheme is used. A producer or consumer would be created with a security context, and session negotiation would result in a token, possibly a lease. Both of our modules would authenticate and authorize the token, then do encryption/decryption, each in its own way.
Thanks,
Rob

> On Jun 10, 2014, at 4:38 PM, Todd Palino <tpal...@linkedin.com.INVALID> wrote:
>
> Yes, I agree. There are definitely a variety of use cases that demand differing levels of complexity here. It comes back to enabling the development of at-rest encryption and making it as easy as possible to implement within the Kafka system. I think that this can be done with the concept of message metadata that can be preserved across clusters, which is separate from the message itself (so it’s not an overlay of a schema on top of the message, but rather a separate structure entirely that is stored with the message).
>
> -Todd
>
>> On 6/10/14, 3:26 PM, "Robert Withers" <robert.w.with...@gmail.com> wrote:
>>
>> What strikes me as an opportunity is to define a pluggable at-rest encryption module interface that supports each/both of our security needs.
>>
>> Thanks,
>> Rob
>>
>>> On Jun 10, 2014, at 4:01 PM, Todd Palino <tpal...@linkedin.com.INVALID> wrote:
>>>
>>> The situation of production before having the consumer is definitely a good one. That’s why I wanted to take a little time before responding. Had to think about it.
>>>
>>> I think that while we may certainly produce data before the consumer is ready, that doesn’t mean that the consumer can’t have a key pair generated for it already, so the producer could start encrypting for that consumer before it exists. This would probably work fine for lower retention periods (a week or two), but could be a little more difficult to manage if you are keeping data in Kafka longer than that. My gut reaction is that it’s better to handle it that way and keep the key pair and session key handling simple. The more we can do that, the more we can leave key management as a separate component that can be swapped out so the user can decide how it should be done.
>>> >>> -Todd >>> >>> >>>> On 6/9/14, 8:16 AM, "Robert Withers" <robert.w.with...@gmail.com> >>>> wrote: >>>> >>>> Yes, that sounds familiar as I helped write (minimally) S/MIME in >>>> squeak >>>> (open source Smalltalk environment). This what I was thinking in my >>>> alternative here, though I have a concern... >>>> >>>> Production may occur before the consumer is coded and executed. In the >>>> analogy of mail, the mail is sent before the complete recipient list is >>>> known. >>>> >>>> This seems to mean that the private key (cert or OTP) must be stored >>>> and >>>> interacted with. My feeling is that key metadata are in a system >>>> encrypted Hbase store (session key store), for low latency reads, >>>> rather >>>> than a topic requiring scanning. Store the private keys and then give >>>> client access (producers/consumers) with the hash of the OTP. A new >>>> consumer comes along, create a new cert encoding the OTP hash. >>>> >>>> On write, use the producer cert to send a topic hash with the msg which >>>> would allow the broker to reuse or generate an OTP, stored in the >>>> session >>>> key store. >>>> >>>> On read (consumer), if we have a previously run reader, use the >>>> encrypted >>>> hash. If new, create consumer cert and encrypt the hash for that >>>> session. >>>> >>>> The reader/writer will pass a cert encrypted session hash. The trick >>>> seems to be converting hash to PK to encrypt/decrypt. Given Kafka >>>> resource distribution, we need system encryption for metadata and >>>> cert-based key exchange. This seems to mean triple encryption: >>>> 1) client to/from broker >>>> 2) system key/hash mgmt/translation >>>> 3) at-rest encryption >>>> >>>> Thanks, >>>> Rob >>>> >>>>> On Jun 9, 2014, at 7:57 AM, Todd Palino <tpal...@linkedin.com.INVALID> >>>>> wrote: >>>>> >>>>> It’s the same method used by S/MIME and many other encryption >>>>> specifications with the potential for multiple recipients. 
>>>>> The sender generates a session key, and uses that key to encrypt the message. The session key is then encrypted once for each recipient with that recipient’s public key. All of the encrypted copies of the session key are then included with the encrypted message. This way, you avoid having to encrypt the message multiple times (this assumes, of course, that the message itself is larger than the key).
>>>>>
>>>>> In our case, we have some options available to us. We could do that, and put all the encrypted keys in the message metadata. Or we could treat it more like a session and have the encrypted session keys in a special topic (e.g. __session_keys), much like offsets are now. When the producer starts up, it creates a session key and encrypts it for each consumer with the current consumer key. The producer publishes the bundle of encrypted keys into __session_keys as a single message. The producer then publishes messages to the normal topic encrypted with the session key. The metadata for each of those messages would contain something like the offset into __session_keys to identify the bundle. This has the added benefit of not increasing the per-message data size too much.
>>>>>
>>>>> Whenever a consumer key is invalidated, or however often the session key should be rotated, the producer would publish a new bundle. This maintains a history of session keys that can be used to decrypt any messages, so the retention on __session_keys must be at least as long as any topic which may potentially contain encrypted data. Past that point, it’s up to the consumer what they want to do with the data. A consumer like Hadoop might re-encrypt it for local storage, or store it in plaintext (depending on the security and requirements of that system).
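The multi-recipient scheme described above (one session key per producer session, encrypted once per consumer public key) can be sketched with the JDK's built-in crypto classes. This is only an illustration under assumptions that are not in the thread: the class name `SessionKeyEnvelope`, the bundle layout (consumer id mapped to wrapped key), the consumer names, and the choice of AES-GCM and RSA-OAEP are all hypothetical. Publishing the bundle to `__session_keys` and referencing its offset from the message metadata is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class SessionKeyEnvelope {
    // Algorithm choices are assumptions for this sketch, not something the thread specifies.
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";
    private static final int IV_LEN = 12;    // bytes; the usual GCM nonce size
    private static final int TAG_BITS = 128; // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Encrypt the session key once per consumer.
    // Hypothetical bundle layout: consumer id -> session key wrapped with that consumer's public key.
    static Map<String, byte[]> wrapForConsumers(SecretKey sessionKey, Map<String, PublicKey> consumerKeys)
            throws GeneralSecurityException {
        Map<String, byte[]> bundle = new LinkedHashMap<>();
        for (Map.Entry<String, PublicKey> entry : consumerKeys.entrySet()) {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.ENCRYPT_MODE, entry.getValue());
            bundle.put(entry.getKey(), rsa.doFinal(sessionKey.getEncoded()));
        }
        return bundle;
    }

    // A consumer recovers the session key from its own entry in the bundle.
    static SecretKey unwrap(byte[] wrappedKey, PrivateKey consumerPrivateKey) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA);
        rsa.init(Cipher.DECRYPT_MODE, consumerPrivateKey);
        return new SecretKeySpec(rsa.doFinal(wrappedKey), "AES");
    }

    // Encrypt the message body once with the session key; output is IV followed by ciphertext and tag.
    static byte[] encrypt(SecretKey sessionKey, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RANDOM.nextBytes(iv);
        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.ENCRYPT_MODE, sessionKey, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = aes.doFinal(plaintext);
        byte[] out = new byte[IV_LEN + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ciphertext, 0, out, IV_LEN, ciphertext.length);
        return out;
    }

    // Split off the IV and decrypt the rest with the recovered session key.
    static byte[] decrypt(SecretKey sessionKey, byte[] ivAndCiphertext) throws GeneralSecurityException {
        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.DECRYPT_MODE, sessionKey, new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_LEN));
        return aes.doFinal(ivAndCiphertext, IV_LEN, ivAndCiphertext.length - IV_LEN);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Two hypothetical consumers, each with its own key pair.
        KeyPairGenerator rsaGen = KeyPairGenerator.getInstance("RSA");
        rsaGen.initialize(2048);
        KeyPair hadoop = rsaGen.generateKeyPair();
        KeyPair audit = rsaGen.generateKeyPair();

        // Producer side: one session key, wrapped for every consumer.
        KeyGenerator aesGen = KeyGenerator.getInstance("AES");
        aesGen.init(128);
        SecretKey sessionKey = aesGen.generateKey();
        Map<String, PublicKey> consumerKeys = new LinkedHashMap<>();
        consumerKeys.put("hadoop", hadoop.getPublic());
        consumerKeys.put("audit", audit.getPublic());
        Map<String, byte[]> bundle = wrapForConsumers(sessionKey, consumerKeys);
        byte[] message = encrypt(sessionKey, "hello kafka".getBytes(StandardCharsets.UTF_8));

        // Consumer side: unwrap the session key with the private key, then decrypt the message.
        SecretKey recovered = unwrap(bundle.get("audit"), audit.getPrivate());
        System.out.println(new String(decrypt(recovered, message), StandardCharsets.UTF_8));
    }
}
```

Rotating the session key or invalidating a consumer key would amount to calling `wrapForConsumers` again with a fresh key and the current set of consumer public keys. The message body is encrypted only once regardless of how many consumers are in the bundle.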
>>>>> >>>>> -Todd >>>>> >>>>>> On 6/8/14, 2:33 PM, "Rob Withers" <robert.w.with...@gmail.com> wrote: >>>>>> >>>>>> I like the use of meta envelopes. We did this recently, on the job, >>>>>> as we have an envelope that specifies the type for decoding. We >>>>>> discussed adding the encodinType and you are suggesting adding >>>>>> encryption metadata for that msg. All good. >>>>>> >>>>>> I don't see your OTP example. Could you delve deeper for me, please? >>>>>> The model I envision is internal OTP, with access to decryption >>>>>> accessed by cert. A double layer of security, with the internal at- >>>>>> rest encryption being an unchanging OTP with ACL access to it as the >>>>>> upper layer. Are you saying it is possible to re-encrypt with new >>>>>> keys or that there is a chain of keys over time? >>>>>> >>>>>> Thanks, >>>>>> Rob >>>>>> >>>>>>> On Jun 8, 2014, at 3:06 PM, Todd Palino wrote: >>>>>>> >>>>>>> I’ll agree that perhaps the “absolutely not” is not quite right. >>>>>>> There are >>>>>>> certainly some uses for a simpler solution, but I would still say it >>>>>>> cannot only be encryption at the broker. This would leave many use >>>>>>> cases >>>>>>> for at-rest encryption out of the loop (most auditing cases for SOX, >>>>>>> PCI, >>>>>>> HIPAA, and other PII standards). Yes, it does add external overhead >>>>>>> that >>>>>>> must be managed, but it’s just the nature of the beast. We can’t >>>>>>> solve all >>>>>>> of the external infrastructure needed for this, but we can make it >>>>>>> easier >>>>>>> to use for consumers and producers by adding metadata. >>>>>>> >>>>>>> There’s no need for unchanging encryption, and that’s specifically >>>>>>> why I >>>>>>> want to see a message envelope that will help consumers determine >>>>>>> the >>>>>>> encryption uses for a particular message. 
You can definitely still >>>>>>> expire >>>>>>> keys, you just have to keep the expired keys around as long as the >>>>>>> encrypted data stays around, and your endpoints need to know when >>>>>>> they are >>>>>>> decrypting data with an expired key (you might want to throw up a >>>>>>> warning, >>>>>>> or do something else to let the users know that it’s happening). And >>>>>>> as >>>>>>> someone else mentioned, there are solutions for encrypting data for >>>>>>> multiple consumers. You can encrypt the data with an OTP, and then >>>>>>> multiply encrypt the OTP once for each consumer and store those >>>>>>> encrypted >>>>>>> strings in the envelope. >>>>>>> >>>>>>> -Todd >>>>>>> >>>>>>>> On 6/7/14, 12:25 PM, "Rob Withers" <robert.w.with...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>> At one level this makes sense to me to externalize the security >>>>>>>> issue >>>>>>>> to producers and consumers. On consideration I realized that this >>>>>>>> adds a lot of coordination requirements to the app layer across >>>>>>>> teams >>>>>>>> or even companies. Another issue I feel is that you want a >>>>>>>> specific >>>>>>>> unchanging encryption for the data and the clients (producers/ >>>>>>>> consumers) will need to be able to decode frozen data. If certs >>>>>>>> are >>>>>>>> used they cannot expire. Also, different clients would need to use >>>>>>>> the same cert. >>>>>>>> >>>>>>>> So, you statement that it should ABSOLUTELY not include internal >>>>>>>> encryption rings seems misplaced. There are some customers of >>>>>>>> kafka >>>>>>>> that would opt to encrypt the on-disk data and key management is a >>>>>>>> significant issue. This is best handled internally, with key >>>>>>>> management stored in either ZK or in a topic. Truly, perhaps >>>>>>>> annealing Hadoop/HBASE as a metadata store seems applicable. 
>>>>>>>>
>>>>>>>> Thanks, another 2 cents,
>>>>>>>> Rob
>>>>>>>>
>>>>>>>>> On Jun 6, 2014, at 12:15 PM, Todd Palino wrote:
>>>>>>>>>
>>>>>>>>> Yes, I realized last night that I needed to be clearer in what I was saying. Encryption should ABSOLUTELY not be handled server-side. I think it’s a good idea to enable use of it in the consumer/producer, but doing it server side will not solve many use cases for needing encryption because the server then has access to all the keys. You could say that this eliminates the need for TLS, but TLS is pretty low-hanging fruit, and there’s definitely a need for encryption of the traffic across the network even if you don’t need at-rest encryption as well.
>>>>>>>>>
>>>>>>>>> And as you mentioned, something needs to be done about key management. Storing information with the message about which key(s) was used is a good idea, because it allows you to know when a producer has switched keys. There are definitely some alternative solutions to that as well. But storing the keys in the broker, Zookeeper, or other systems like that is not. There needs to be a system used where the keys are only available to the producers and consumers that need them, and they only get access to the appropriate part of the key pair. Even as the guy running Kafka and Zookeeper, I should not have access to the keys being used, and if data is encrypted I should not be able to see the cleartext.
>>>>>>>>> >>>>>>>>> And even if we decide not to put anything about at-rest encryption >>>>>>>>> in the >>>>>>>>> consumer/producer clients directly, and leave it for an exercise >>>>>>>>> above >>>>>>>>> that level (you have to pass the ciphertext as the message to the >>>>>>>>> client), >>>>>>>>> I still think there is a good case for implementing a message >>>>>>>>> envelope >>>>>>>>> that can store the information about which key was used, and other >>>>>>>>> pertinent metadata, and have the ability for special applications >>>>>>>>> like >>>>>>>>> mirror maker to be able to preserve it across clusters. This still >>>>>>>>> helps >>>>>>>>> to enable the use of encryption and other features (like auditing) >>>>>>>>> even if >>>>>>>>> we decide it¹s too large a scope to fully implement. >>>>>>>>> >>>>>>>>> -Todd >>>>>>>>> >>>>>>>>> On 6/6/14, 10:51 AM, "Pradeep Gollakota" <pradeep...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I'm actually not convinced that encryption needs to be handled >>>>>>>>>> server side >>>>>>>>>> in Kafka. I think the best solution for encryption is to handle >>>>>>>>>> it >>>>>>>>>> producer/consumer side just like compression. This will offload >>>>>>>>>> key >>>>>>>>>> management to the users and we'll still be able to leverage the >>>>>>>>>> sendfile >>>>>>>>>> optimization for better performance. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Jun 6, 2014 at 10:48 AM, Rob Withers >>>>>>>>>> <robert.w.with...@gmail.com >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> On consideration, if we have 3 different access groups (1 for >>>>>>>>>>> production >>>>>>>>>>> WRITE and 2 consumers) they all need to decode the same >>>>>>>>>>> encryption >>>>>>>>>>> and >>>>>>>>>>> so >>>>>>>>>>> all need the same public/private key....certs won't work, unless >>>>>>>>>>> you >>>>>>>>>>> write >>>>>>>>>>> a CertAuthority to build multiple certs with the same keys. 
>>>>>>>>>>> Better >>>>>>>>>>> seems >>>>>>>>>>> to not use certs and wrap the encryption specification with an >>>>>>>>>>> ACL >>>>>>>>>>> capabilities for each group of access. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Jun 6, 2014, at 11:43 AM, Rob Withers wrote: >>>>>>>>>>> >>>>>>>>>>> This is quite interesting to me and it is an excelent >>>>>>>>>>> opportunity to >>>>>>>>>>>> promote a slightly different security scheme. Object- >>>>>>>>>>>> capabilities are >>>>>>>>>>>> perfect for online security and would use ACL style >>>>>>>>>>>> authentication to >>>>>>>>>>>> gain >>>>>>>>>>>> capabilities filtered to those allowed resources for allow >>>>>>>>>>>> actions >>>>>>>>>>>> (READ/WRITE/DELETE/LIST/SCAN). Erights.org has the >>>>>>>>>>>> quitenscential (??) >>>>>>>>>>>> object capabilities model and capnproto is impleemting this for >>>>>>>>>>>> C+ >>>>>>>>>>>> +. I >>>>>>>>>>>> have a java implementation at http://github.com/pauwau/pauwau >>>>>>>>>>>> but >>>>>>>>>>>> the >>>>>>>>>>>> master is broken. 0.2 works, basically. B asically a TLS >>>>>>>>>>>> connection >>>>>>>>>>>> with >>>>>>>>>>>> no certificate server, it is peer to peer. It has some >>>>>>>>>>>> advanced >>>>>>>>>>>> features, >>>>>>>>>>>> but the lining of capabilities with authorization so that you >>>>>>>>>>>> can >>>>>>>>>>>> only >>>>>>>>>>>> invoke correct services is extended to the secure user. >>>>>>>>>>>> >>>>>>>>>>>> Regarding non-repudiation, on disk, why not prepend a CRC? >>>>>>>>>>>> >>>>>>>>>>>> Regarding on-disk encryption, multiple users/groups may need to >>>>>>>>>>>> access, >>>>>>>>>>>> with different capabilities. Sounds like zookeeper needs to >>>>>>>>>>>> store a >>>>>>>>>>>> cert >>>>>>>>>>>> for each class of access so that a group member can access the >>>>>>>>>>>> decrypted >>>>>>>>>>>> data from disk. Use cert-based async decryption. The only >>>>>>>>>>>> isue is >>>>>>>>>>>> storing >>>>>>>>>>>> the private key in zookeeper. 
Perhaps some hash magic could be >>>>>>>>>>>> used. >>>>>>>>>>>> >>>>>>>>>>>> Thanks for kafka, >>>>>>>>>>>> Rob >>>>>>>>>>>> >>>>>>>>>>>> On Jun 5, 2014, at 3:01 PM, Jay Kreps wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hey Joe, >>>>>>>>>>>>> >>>>>>>>>>>>> I don't really understand the sections you added to the wiki. >>>>>>>>>>>>> Can you >>>>>>>>>>>>> clarify them? >>>>>>>>>>>>> >>>>>>>>>>>>> Is non-repudiation what SASL would call integrity checks? If >>>>>>>>>>>>> so >>>>>>>>>>>>> don't >>>>>>>>>>>>> SSL >>>>>>>>>>>>> and and many of the SASL schemes already support this as well >>>>>>>>>>>>> as >>>>>>>>>>>>> on-the-wire encryption? >>>>>>>>>>>>> >>>>>>>>>>>>> Or are you proposing an on-disk encryption scheme? Is this >>>>>>>>>>>>> actually >>>>>>>>>>>>> needed? >>>>>>>>>>>>> Isn't a on-the-wire encryption when combined with mutual >>>>>>>>>>>>> authentication >>>>>>>>>>>>> and >>>>>>>>>>>>> permissions sufficient for most uses? >>>>>>>>>>>>> >>>>>>>>>>>>> On-disk encryption seems unnecessary because if an attacker >>>>>>>>>>>>> can >>>>>>>>>>>>> get >>>>>>>>>>>>> root >>>>>>>>>>>>> on >>>>>>>>>>>>> the kafka boxes it can potentially modify Kafka to do anything >>>>>>>>>>>>> he or >>>>>>>>>>>>> she >>>>>>>>>>>>> wants with data. So this seems to break any security model. >>>>>>>>>>>>> >>>>>>>>>>>>> I understand the problem of a large organization not really >>>>>>>>>>>>> having a >>>>>>>>>>>>> trusted network and wanting to secure data transfer and limit >>>>>>>>>>>>> and >>>>>>>>>>>>> audit >>>>>>>>>>>>> data access. The uses for these other things I don't totally >>>>>>>>>>>>> understand. >>>>>>>>>>>>> >>>>>>>>>>>>> Also it would be worth understanding the state of other >>>>>>>>>>>>> messaging and >>>>>>>>>>>>> storage systems (Hadoop, dbs, etc). What features do they >>>>>>>>>>>>> support. 
I >>>>>>>>>>>>> think >>>>>>>>>>>>> there is a sense in which you don't have to run faster than >>>>>>>>>>>>> the >>>>>>>>>>>>> bear, >>>>>>>>>>>>> but >>>>>>>>>>>>> only faster then your friends. :-) >>>>>>>>>>>>> >>>>>>>>>>>>> -Jay >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jun 4, 2014 at 5:57 PM, Joe Stein >>>>>>>>>>>>> <joe.st...@stealth.ly> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I like the idea of working on the spec and prioritizing. I >>>>>>>>>>>>> will >>>>>>>>>>>>> update >>>>>>>>>>>>>> the >>>>>>>>>>>>>> wiki. >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Joestein >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jun 4, 2014 at 1:11 PM, Jay Kreps >>>>>>>>>>>>>> <jay.kr...@gmail.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hey Joe, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for kicking this discussion off! I totally agree that >>>>>>>>>>>>>>> for >>>>>>>>>>>>>> something >>>>>>>>>>>>>> >>>>>>>>>>>>>>> that acts as a central message broker security is critical >>>>>>>>>>>>>>> feature. >>>>>>>>>>>>>>> I >>>>>>>>>>>>>> think >>>>>>>>>>>>>> >>>>>>>>>>>>>>> a number of people have been interested in this topic and >>>>>>>>>>>>>>> several >>>>>>>>>>>>>>> people >>>>>>>>>>>>>>> have put effort into special purpose security efforts. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Since most the LinkedIn folks are working on the consumer >>>>>>>>>>>>>>> right now >>>>>>>>>>>>>>> I >>>>>>>>>>>>>> think >>>>>>>>>>>>>> >>>>>>>>>>>>>>> this would be a great project for any other interested >>>>>>>>>>>>>>> people to >>>>>>>>>>>>>>> take >>>>>>>>>>>>>>> on. >>>>>>>>>>>>>>> There are some challenges in doing these things distributed >>>>>>>>>>>>>>> but it >>>>>>>>>>>>>>> can >>>>>>>>>>>>>> also >>>>>>>>>>>>>> >>>>>>>>>>>>>>> be a lot of fun. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think a good first step would be to get a written plan we >>>>>>>>>>>>>>> can all >>>>>>>>>>>>>>> agree >>>>>>>>>>>>>>> on for how things should work. 
Then we can break things down >>>>>>>>>>>>>>> into >>>>>>>>>>>>>>> chunks >>>>>>>>>>>>>>> that can be done independently while still aiming at a good >>>>>>>>>>>>>>> end >>>>>>>>>>>>>>> state. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I had tried to write up some notes that summarized at least >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> thoughts >>>>>>>>>>>>>> I >>>>>>>>>>>>>> >>>>>>>>>>>>>>> had had on security: >>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> What do you think of that? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> One assumption I had (which may be incorrect) is that >>>>>>>>>>>>>>> although >>>>>>>>>>>>>>> we >>>>>>>>>>>>>>> want >>>>>>>>>>>>>> all >>>>>>>>>>>>>> >>>>>>>>>>>>>>> the things in your list, the two most pressing would be >>>>>>>>>>>>>>> authentication >>>>>>>>>>>>>> and >>>>>>>>>>>>>> >>>>>>>>>>>>>>> authorization, and that was all that write up covered. You >>>>>>>>>>>>>>> have more >>>>>>>>>>>>>>> experience in this domain, so I wonder how you would >>>>>>>>>>>>>>> prioritize? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Those notes are really sketchy, so I think the first goal I >>>>>>>>>>>>>>> would >>>>>>>>>>>>>>> have >>>>>>>>>>>>>>> would be to get to a real spec we can all agree on and >>>>>>>>>>>>>>> discuss. A >>>>>>>>>>>>>>> lot >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> the security stuff has a high human interaction element and >>>>>>>>>>>>>>> needs to >>>>>>>>>>>>>>> work >>>>>>>>>>>>>>> in pretty different domains and different companies so >>>>>>>>>>>>>>> getting >>>>>>>>>>>>>>> this >>>>>>>>>>>>>>> kind >>>>>>>>>>>>>> of >>>>>>>>>>>>>> >>>>>>>>>>>>>>> review is important. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -Jay >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Jun 3, 2014 at 12:57 PM, Joe Stein >>>>>>>>>>>>>>> <joe.st...@stealth.ly> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi,I wanted to re-ignite the discussion around Apache Kafka >>>>>>>>>>>>>>> Security. 
>>>>>>>>>>>>>>> This >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> is a huge bottleneck (non-starter in some cases) for a lot >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> organizations >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (due to regulatory, compliance and other requirements). >>>>>>>>>>>>>>>> Below >>>>>>>>>>>>>>>> are >>>>>>>>>>>>>>>> my >>>>>>>>>>>>>>>> suggestions for specific changes in Kafka to accommodate >>>>>>>>>>>>>>>> security >>>>>>>>>>>>>>>> requirements. This comes from what folks are doing "in the >>>>>>>>>>>>>>>> wild" >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> workaround and implement security with Kafka as it is today >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> also >>>>>>>>>>>>>>> what I >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> have discovered from organizations about their blockers. It >>>>>>>>>>>>>>>> also >>>>>>>>>>>>>>>> picks >>>>>>>>>>>>>>> up >>>>>>>>>>>>>> >>>>>>>>>>>>>>> from the wiki (which I should have time to update later in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> week >>>>>>>>>>>>>>> based >>>>>>>>>>>>>> >>>>>>>>>>>>>>> on the below and feedback from the thread). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1) Transport Layer Security (i.e. SSL) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This also includes client authentication in addition to in- >>>>>>>>>>>>>>>> transit >>>>>>>>>>>>>>> security >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> layer. This work has been picked up here >>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1477 and do >>>>>>>>>>>>>>>> appreciate >>>>>>>>>>>>>>>> any >>>>>>>>>>>>>>>> thoughts, comments, feedback, tomatoes, whatever for this >>>>>>>>>>>>>>>> patch. >>>>>>>>>>>>>>>> It >>>>>>>>>>>>>>> is a >>>>>>>>>>>>>> >>>>>>>>>>>>>>> pickup from the fork of the work first done here >>>>>>>>>>>>>>>> https://github.com/relango/kafka/tree/kafka_security. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2) Data encryption at rest. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This is very important and something that can be >>>>>>>>>>>>>>>> facilitated >>>>>>>>>>>>>>>> within >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> wire protocol. It requires an additional map data structure >>>>>>>>>>>>>>>> for the >>>>>>>>>>>>>>>> "encrypted [data encryption key]". With this map (either in >>>>>>>>>>>>>>>> your >>>>>>>>>>>>>>>> object >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> in the wire protocol) you can store the dynamically >>>>>>>>>>>>>>>> generated >>>>>>>>>>>>>>>> symmetric >>>>>>>>>>>>>>> key >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (for each message) and then encrypt the data using that >>>>>>>>>>>>>>>> dynamically >>>>>>>>>>>>>>>> generated key. You then encrypt the encryption key using >>>>>>>>>>>>>>>> each >>>>>>>>>>>>>>>> public >>>>>>>>>>>>>>> key >>>>>>>>>>>>>> >>>>>>>>>>>>>>> for whom is expected to be able to decrypt the encryption >>>>>>>>>>>>>>> key to >>>>>>>>>>>>>>> then >>>>>>>>>>>>>>>> decrypt the message. For each public key encrypted >>>>>>>>>>>>>>>> symmetric >>>>>>>>>>>>>>>> key >>>>>>>>>>>>>>> (which >>>>>>>>>>>>>> >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> now the "encrypted [data encryption key]" along with which >>>>>>>>>>>>>>>> public >>>>>>>>>>>>>>>> key >>>>>>>>>>>>>>> it >>>>>>>>>>>>>> >>>>>>>>>>>>>>> was encrypted with for (so a map of [publicKey] = >>>>>>>>>>>>>>>> encryptedDataEncryptionKey) as a chain. Other patterns >>>>>>>>>>>>>>>> can be >>>>>>>>>>>>>>> implemented >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> but this is a pretty standard digital enveloping [0] >>>>>>>>>>>>>>>> pattern >>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>> 1 >>>>>>>>>>>>>> >>>>>>>>>>>>>>> field added. Other patterns should be able to use that field >>>>>>>>>>>>>>> to-do >>>>>>>>>>>>>>> their >>>>>>>>>>>>>> >>>>>>>>>>>>>>> implementation too. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 3) Non-repudiation and long term non-repudiation. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Non-repudiation is proving data hasn't changed. This is >>>>>>>>>>>>>>>> often (if >>>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>> always) done with x509 public certificates (chained to a >>>>>>>>>>>>>>>> certificate >>>>>>>>>>>>>>>> authority). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Long term non-repudiation is what happens when the >>>>>>>>>>>>>>>> certificates of >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> certificate authority are expired (or revoked) and >>>>>>>>>>>>>>>> everything >>>>>>>>>>>>>>>> ever >>>>>>>>>>>>>>> signed >>>>>>>>>>>>>> >>>>>>>>>>>>>>> (ever) with that certificate's public key then becomes "no >>>>>>>>>>>>>>> longer >>>>>>>>>>>>>>> provable >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> as ever being authentic". That is where RFC3126 [1] and >>>>>>>>>>>>>>>> RFC3161 >>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>> come >>>>>>>>>>>>>> >>>>>>>>>>>>>>> in (or worm drives [hardware], etc). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For either (or both) of these it is an operation of the >>>>>>>>>>>>>>>> encryptor >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> sign/hash the data (with or without third party trusted >>>>>>>>>>>>>>>> timestap of >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> signing event) and encrypt that with their own private key >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> distribute >>>>>>>>>>>>>> >>>>>>>>>>>>>>> the results (before and after encrypting if required) along >>>>>>>>>>>>>>> with >>>>>>>>>>>>>>> their >>>>>>>>>>>>>>>> public key. This structure is a bit more complex but >>>>>>>>>>>>>>>> feasible, it >>>>>>>>>>>>>>>> is a >>>>>>>>>>>>>>> map >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> of digital signature formats and the chain of dig sig >>>>>>>>>>>>>>>> attestations. >>>>>>>>>>>>>>> The >>>>>>>>>>>>>> >>>>>>>>>>>>>>> map's key being the method (i.e. 
CRC32, PKCS7 [3], XmlDigSig >>>>>>>>>>>>>>> [4]) >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> then >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> a list of map where that key is "purpose" of signature >>>>>>>>>>>>>>>> (what >>>>>>>>>>>>>>>> your >>>>>>>>>>>>>>> attesting >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> too). As a sibling field to the list another field for >>>>>>>>>>>>>>>> "the >>>>>>>>>>>>>>>> attester" >>>>>>>>>>>>>>> as >>>>>>>>>>>>>> >>>>>>>>>>>>>>> bytes (e.g. their PKCS12 [5] for the map of PKCS7 >>>>>>>>>>>>>>> signatures). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 4) Authorization >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We should have a policy of "404" for data, topics, >>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>> (etc) if >>>>>>>>>>>>>>>> authenticated connections do not have access. In "secure >>>>>>>>>>>>>>>> mode" any >>>>>>>>>>>>>>>> non >>>>>>>>>>>>>>>> authenticated connections should get a "404" type message >>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>> everything. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Knowing "something is there" is a security risk in many uses >>>>>>>>>>>>>>> cases. >>>>>>>>>>>>>>> So >>>>>>>>>>>>>>> if >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> you don't have access you don't even see it. Baking "that" >>>>>>>>>>>>>>>> into >>>>>>>>>>>>>>>> Kafka >>>>>>>>>>>>>>>> along with some interface for entitlement (access >>>>>>>>>>>>>>>> management) >>>>>>>>>>>>>>>> systems >>>>>>>>>>>>>>>> (pretty standard) is all that I think needs to be done to >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> core >>>>>>>>>>>>>>> project. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I want to tackle item later in the year after summer after >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> other >>>>>>>>>>>>>>> three >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> are complete. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I look forward to thoughts on this and anyone else >>>>>>>>>>>>>>>> interested >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>> working >>>>>>>>>>>>>> >>>>>>>>>>>>>>> with us on these items. 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [0] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/what-is-a-digital-envelope.htm
>>>>>>>>>>>>>>>> [1] http://tools.ietf.org/html/rfc3126
>>>>>>>>>>>>>>>> [2] http://tools.ietf.org/html/rfc3161
>>>>>>>>>>>>>>>> [3] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/pkcs-7-cryptographic-message-syntax-standar.htm
>>>>>>>>>>>>>>>> [4] http://en.wikipedia.org/wiki/XML_Signature
>>>>>>>>>>>>>>>> [5] http://en.wikipedia.org/wiki/PKCS_12
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /*******************************************
>>>>>>>>>>>>>>>> Joe Stein
>>>>>>>>>>>>>>>> Founder, Principal Consultant
>>>>>>>>>>>>>>>> Big Data Open Source Security LLC
>>>>>>>>>>>>>>>> http://www.stealth.ly
>>>>>>>>>>>>>>>> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>>>>>>>>>>>>>>>> ********************************************/