Re: [DISCUSS] Kafka Security Specific Features

Rob Withers Sat, 07 Jun 2014 12:27:07 -0700

At one level, it makes sense to me to externalize the security issue to producers and consumers. On consideration, I realized that this adds a lot of coordination requirements to the app layer, across teams or even companies. Another issue, I feel, is that you want a specific, unchanging encryption for the data, and the clients (producers/consumers) will need to be able to decode frozen data. If certs are used, they cannot expire. Also, different clients would need to use the same cert.

So, your statement that it should ABSOLUTELY not include internal encryption rings seems misplaced. There are some customers of kafka that would opt to encrypt the on-disk data, and key management is a significant issue. This is best handled internally, with key management stored in either ZK or in a topic. Truly, perhaps annealing Hadoop/HBASE as a metadata store seems applicable.


Thanks, another 2 cents,
Rob

On Jun 6, 2014, at 12:15 PM, Todd Palino wrote:

Yes, I realized last night that I needed to be clearer in what I was
saying. Encryption should ABSOLUTELY not be handled server-side. I think
it's a good idea to enable use of it in the consumer/producer, but doing
it server side will not solve many use cases for needing encryption
because the server then has access to all the keys. You could say that
this eliminates the need for TLS, but TLS is pretty low-hanging fruit, and
there's definitely a need for encryption of the traffic across the network
even if you don't need at-rest encryption as well.

And as you mentioned, something needs to be done about key management.
Storing information with the message about which key(s) was used is a good
idea, because it allows you to know when a producer has switched keys.
There are definitely some alternative solutions to that as well. But
storing the keys in the broker, Zookeeper, or other systems like that is
not. There needs to be a system used where the keys are only available to
the producers and consumers that need them, and they only get access to
the appropriate part of the key pair. Even as the guy running Kafka and
Zookeeper, I should not have access to the keys being used, and if data is
encrypted I should not be able to see the cleartext.

And even if we decide not to put anything about at-rest encryption in the
consumer/producer clients directly, and leave it for an exercise above
that level (you have to pass the ciphertext as the message to the client),
I still think there is a good case for implementing a message envelope
that can store the information about which key was used, and other
pertinent metadata, and have the ability for special applications like
mirror maker to be able to preserve it across clusters. This still helps
to enable the use of encryption and other features (like auditing) even if
we decide it's too large a scope to fully implement.
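
To make the envelope idea concrete, here is a minimal sketch in Java. The layout (a version byte, a length-prefixed key id, then the opaque ciphertext) and the names MessageEnvelope, serialize and parse are assumptions made up for illustration; nothing like this is defined by Kafka or by this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical envelope layout: [version:1][keyIdLength:2][keyId][ciphertext].
 * This is an illustrative format only, not an existing Kafka message format.
 */
final class MessageEnvelope {
    private static final byte VERSION = 1;

    final String keyId;      // tells consumers which key the producer used
    final byte[] ciphertext; // stays opaque to the broker and to mirroring tools

    MessageEnvelope(String keyId, byte[] ciphertext) {
        this.keyId = keyId;
        this.ciphertext = ciphertext.clone();
    }

    /** Packs the key id and the ciphertext into one byte array. */
    byte[] serialize() {
        byte[] id = keyId.getBytes(StandardCharsets.UTF_8);
        if (id.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("key id too long");
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + id.length + ciphertext.length);
        buf.put(VERSION).putShort((short) id.length).put(id).put(ciphertext);
        return buf.array();
    }

    /** Reads the metadata back without needing any key material. */
    static MessageEnvelope parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte version = buf.get();
        if (version != VERSION) {
            throw new IllegalArgumentException("unknown envelope version " + version);
        }
        short idLength = buf.getShort();
        if (idLength < 0 || idLength > buf.remaining()) {
            throw new IllegalArgumentException("corrupt key id length");
        }
        byte[] id = new byte[idLength];
        buf.get(id);
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new MessageEnvelope(new String(id, StandardCharsets.UTF_8), payload);
    }
}
```

Because the key id travels in the clear next to the ciphertext, a tool that copies the serialized bytes unchanged preserves the metadata across clusters without ever holding a key.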

-Todd

On 6/6/14, 10:51 AM, "Pradeep Gollakota" <[email protected]> wrote:
I'm actually not convinced that encryption needs to be handled server side
in Kafka. I think the best solution for encryption is to handle it
producer/consumer side just like compression. This will offload key
management to the users and we'll still be able to leverage the sendfile
optimization for better performance.

On Fri, Jun 6, 2014 at 10:48 AM, Rob Withers <[email protected]> wrote:
On consideration, if we have 3 different access groups (1 for production
WRITE and 2 consumers) they all need to decode the same encryption and so
all need the same public/private key....certs won't work, unless you write
a CertAuthority to build multiple certs with the same keys. Better seems
to not use certs and wrap the encryption specification with ACL
capabilities for each group of access.


On Jun 6, 2014, at 11:43 AM, Rob Withers wrote:

This is quite interesting to me and it is an excellent opportunity to
promote a slightly different security scheme. Object-capabilities are
perfect for online security and would use ACL-style authentication to gain
capabilities filtered to those allowed resources for allowed actions
(READ/WRITE/DELETE/LIST/SCAN). Erights.org has the quintessential
object-capabilities model and capnproto is implementing this for C++. I
have a java implementation at http://github.com/pauwau/pauwau but the
master is broken. 0.2 works, basically. It is basically a TLS connection
with no certificate server; it is peer to peer. It has some advanced
features, but the lining of capabilities with authorization so that you can
only invoke correct services is extended to the secure user.

Regarding non-repudiation, on disk, why not prepend a CRC?

Regarding on-disk encryption, multiple users/groups may need to access,
with different capabilities. Sounds like zookeeper needs to store a cert
for each class of access so that a group member can access the decrypted
data from disk. Use cert-based async decryption. The only issue is storing
the private key in zookeeper. Perhaps some hash magic could be used.

Thanks for kafka,
Rob

On Jun 5, 2014, at 3:01 PM, Jay Kreps wrote:

Hey Joe,
I don't really understand the sections you added to the wiki. Can you
clarify them?

Is non-repudiation what SASL would call integrity checks? If so, don't SSL
and many of the SASL schemes already support this as well as
on-the-wire encryption?

Or are you proposing an on-disk encryption scheme? Is this actually needed?
Isn't on-the-wire encryption, when combined with mutual authentication and
permissions, sufficient for most uses?

On-disk encryption seems unnecessary because if an attacker can get root on
the kafka boxes it can potentially modify Kafka to do anything he or she
wants with data. So this seems to break any security model.

I understand the problem of a large organization not really having a
trusted network and wanting to secure data transfer and limit and audit
data access. The uses for these other things I don't totally understand.

Also it would be worth understanding the state of other messaging and
storage systems (Hadoop, dbs, etc). What features do they support? I think
there is a sense in which you don't have to run faster than the bear, but
only faster than your friends. :-)

-Jay


On Wed, Jun 4, 2014 at 5:57 PM, Joe Stein <[email protected]>
wrote:

I like the idea of working on the spec and prioritizing. I will update the
wiki.

- Joestein


On Wed, Jun 4, 2014 at 1:11 PM, Jay Kreps <[email protected]>
wrote:

Hey Joe,
Thanks for kicking this discussion off! I totally agree that for something
that acts as a central message broker, security is a critical feature. I
think a number of people have been interested in this topic and several
people have put effort into special purpose security efforts.

Since most of the LinkedIn folks are working on the consumer right now, I
think this would be a great project for any other interested people to take
on. There are some challenges in doing these things distributed but it can
also be a lot of fun.

I think a good first step would be to get a written plan we can all agree
on for how things should work. Then we can break things down into chunks
that can be done independently while still aiming at a good end state.

I had tried to write up some notes that summarized at least the thoughts I
had had on security:
https://cwiki.apache.org/confluence/display/KAFKA/Security

What do you think of that?

One assumption I had (which may be incorrect) is that although we want all
the things in your list, the two most pressing would be authentication and
authorization, and that was all that write-up covered. You have more
experience in this domain, so I wonder how you would prioritize?

Those notes are really sketchy, so I think the first goal I would have
would be to get to a real spec we can all agree on and discuss. A lot of
the security stuff has a high human interaction element and needs to work
in pretty different domains and different companies, so getting this kind
of review is important.

-Jay

On Tue, Jun 3, 2014 at 12:57 PM, Joe Stein <[email protected]> wrote:

Hi, I wanted to re-ignite the discussion around Apache Kafka Security. This
is a huge bottleneck (non-starter in some cases) for a lot of organizations
(due to regulatory, compliance and other requirements). Below are my
suggestions for specific changes in Kafka to accommodate security
requirements. This comes from what folks are doing "in the wild" to work
around and implement security with Kafka as it is today and also what I
have discovered from organizations about their blockers. It also picks up
from the wiki (which I should have time to update later in the week based
on the below and feedback from the thread).

1) Transport Layer Security (i.e. SSL)
This also includes client authentication in addition to the in-transit
security layer. This work has been picked up here
https://issues.apache.org/jira/browse/KAFKA-1477 and I do appreciate any
thoughts, comments, feedback, tomatoes, whatever for this patch. It is a
pickup from the fork of the work first done here
https://github.com/relango/kafka/tree/kafka_security.

2) Data encryption at rest.
This is very important and something that can be facilitated within the
wire protocol. It requires an additional map data structure for the
"encrypted [data encryption key]". With this map (either in your object or
in the wire protocol) you can store the dynamically generated symmetric key
(for each message) and then encrypt the data using that dynamically
generated key. You then encrypt the encryption key using the public key of
each party that is expected to be able to decrypt the encryption key and
then decrypt the message. Each public-key-encrypted symmetric key (which is
now the "encrypted [data encryption key]") is stored along with the public
key it was encrypted with (so a map of [publicKey] =
encryptedDataEncryptionKey), as a chain. Other patterns can be implemented,
but this is a pretty standard digital enveloping [0] pattern with only one
field added. Other patterns should be able to use that field to do their
implementation too.
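
The enveloping pattern described above can be sketched with the JDK's built-in crypto classes. This is only an illustration under stated assumptions: the class DigitalEnvelope, its method names, the choice of AES-GCM for the data and RSA-OAEP for wrapping the data key, and the use of the Base64-encoded public key as the map key are all picked for the example and are not part of the proposal.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical digital envelope: one data key per message, wrapped once per recipient. */
final class DigitalEnvelope {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";

    final byte[] iv;
    final byte[] ciphertext;
    final Map<String, byte[]> encryptedKeys; // [publicKey] = encryptedDataEncryptionKey

    private DigitalEnvelope(byte[] iv, byte[] ciphertext, Map<String, byte[]> encryptedKeys) {
        this.iv = iv;
        this.ciphertext = ciphertext;
        this.encryptedKeys = encryptedKeys;
    }

    private static String mapKey(PublicKey key) {
        return Base64.getEncoder().encodeToString(key.getEncoded());
    }

    /** Encrypts the message with a fresh AES key, then wraps that key for every recipient. */
    static DigitalEnvelope seal(byte[] plaintext, Iterable<PublicKey> recipients) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            SecretKey dataKey = generator.generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(plaintext);
            Map<String, byte[]> wrapped = new LinkedHashMap<>();
            for (PublicKey recipient : recipients) {
                Cipher rsa = Cipher.getInstance(RSA);
                rsa.init(Cipher.ENCRYPT_MODE, recipient);
                wrapped.put(mapKey(recipient), rsa.doFinal(dataKey.getEncoded()));
            }
            return new DigitalEnvelope(iv, ciphertext, wrapped);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Unwraps the data key with the caller's private key, then decrypts the message. */
    byte[] open(PublicKey myPublic, PrivateKey myPrivate) {
        byte[] wrapped = encryptedKeys.get(mapKey(myPublic));
        if (wrapped == null) {
            throw new IllegalArgumentException("no wrapped key for this recipient");
        }
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.DECRYPT_MODE, myPrivate);
            SecretKey dataKey = new SecretKeySpec(rsa.doFinal(wrapped), "AES");
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.DECRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
            return aes.doFinal(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Convenience helper for trying the sketch out. */
    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Sealing a message for two consumer groups yields one ciphertext plus a two-entry map, so each group decrypts with its own private key and no key pair has to be shared between groups.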
3) Non-repudiation and long term non-repudiation.
Non-repudiation is proving data hasn't changed. This is often (if not
always) done with x509 public certificates (chained to a certificate
authority).

Long term non-repudiation is what happens when the certificates of the
certificate authority are expired (or revoked) and everything ever signed
(ever) with that certificate's public key then becomes "no longer provable
as ever being authentic". That is where RFC3126 [1] and RFC3161 [2] come
in (or worm drives [hardware], etc).

For either (or both) of these it is an operation of the encryptor to
sign/hash the data (with or without a third-party trusted timestamp of the
signing event), encrypt that with their own private key, and distribute
the results (before and after encrypting if required) along with their
public key. This structure is a bit more complex but feasible: it is a map
of digital signature formats and the chain of dig sig attestations. The
map's key is the method (i.e. CRC32, PKCS7 [3], XmlDigSig [4]) and the
value is a list of maps where the key is the "purpose" of the signature
(what you're attesting to). As a sibling field to the list, there is
another field for "the attester" as bytes (e.g. their PKCS12 [5] for the
map of PKCS7 signatures).
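
One way to picture that structure is the Java sketch below. It is a loose illustration, not the proposed format: the class SignatureMap and its methods are hypothetical, and a plain SHA256withRSA signature from the JDK stands in for PKCS7, with the signer's encoded public key stored as "the attester" bytes.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical map of method -> (attester bytes, list of purpose -> signature maps). */
final class SignatureMap {
    static final class MethodEntry {
        final byte[] attester; // sibling field to the list; empty for CRC32
        final List<Map<String, byte[]>> signatures = new ArrayList<>();
        MethodEntry(byte[] attester) { this.attester = attester; }
    }

    final Map<String, MethodEntry> byMethod = new LinkedHashMap<>();

    private static byte[] crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return ByteBuffer.allocate(Long.BYTES).putLong(crc.getValue()).array();
    }

    void attestCrc32(String purpose, byte[] data) {
        byMethod.computeIfAbsent("CRC32", m -> new MethodEntry(new byte[0]))
                .signatures.add(Collections.singletonMap(purpose, crc32(data)));
    }

    /** Simplification: the first signer for a method becomes that method's attester. */
    void attestRsa(String purpose, byte[] data, KeyPair signer) {
        try {
            Signature signature = Signature.getInstance("SHA256withRSA");
            signature.initSign(signer.getPrivate());
            signature.update(data);
            byte[] value = signature.sign();
            byMethod.computeIfAbsent("SHA256withRSA",
                            m -> new MethodEntry(signer.getPublic().getEncoded()))
                    .signatures.add(Collections.singletonMap(purpose, value));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true only if every stored attestation matches the given data. */
    boolean verify(byte[] data) {
        try {
            for (Map.Entry<String, MethodEntry> entry : byMethod.entrySet()) {
                for (Map<String, byte[]> attestation : entry.getValue().signatures) {
                    for (byte[] value : attestation.values()) {
                        if (!check(entry.getKey(), entry.getValue().attester, data, value)) {
                            return false;
                        }
                    }
                }
            }
            return true;
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    private static boolean check(String method, byte[] attester, byte[] data, byte[] value)
            throws GeneralSecurityException {
        if (method.equals("CRC32")) {
            return Arrays.equals(crc32(data), value);
        }
        PublicKey key = KeyFactory.getInstance("RSA")
                .generatePublic(new X509EncodedKeySpec(attester));
        Signature signature = Signature.getInstance("SHA256withRSA");
        signature.initVerify(key);
        signature.update(data);
        return signature.verify(value);
    }

    /** Convenience helper for trying the sketch out. */
    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that a CRC32 entry only detects accidental change; only the signature entries tie the data to a key holder, and a trusted timestamp (RFC3161) would be an additional attestation that this sketch does not attempt.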
4) Authorization

We should have a policy of "404" for data, topics, partitions (etc) if
authenticated connections do not have access. In "secure mode" any
non-authenticated connections should get a "404" type message on
everything. Knowing "something is there" is a security risk in many use
cases. So if you don't have access you don't even see it. Baking "that"
into Kafka along with some interface for entitlement (access management)
systems (pretty standard) is all that I think needs to be done to the core
project. I want to tackle this item later in the year, after summer, after
the other three are complete.
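
The "404" policy can be sketched as a lookup that gives the same answer for "does not exist" and "exists but you may not see it". The class HiddenTopicAuthorizer and its methods are hypothetical names for this example; treating unauthenticated callers as fully allowed outside secure mode is also an assumption, since the text above only specifies the secure-mode behavior.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical authorizer that hides topics instead of reporting "access denied". */
final class HiddenTopicAuthorizer {
    enum Result { OK, NOT_FOUND }

    private final boolean secureMode;
    private final Set<String> topics = new HashSet<>();
    private final Map<String, Set<String>> grants = new HashMap<>(); // principal -> topics

    HiddenTopicAuthorizer(boolean secureMode) {
        this.secureMode = secureMode;
    }

    void createTopic(String topic) {
        topics.add(topic);
    }

    void grant(String principal, String topic) {
        grants.computeIfAbsent(principal, p -> new HashSet<>()).add(topic);
    }

    /**
     * Looks up a topic for a caller; principal is null for an unauthenticated
     * connection. A missing topic and a forbidden topic are indistinguishable.
     */
    Result lookup(String principal, String topic) {
        if (!topics.contains(topic)) {
            return Result.NOT_FOUND;
        }
        if (principal == null) {
            // Assumption: outside secure mode, unauthenticated access behaves as it does today.
            return secureMode ? Result.NOT_FOUND : Result.OK;
        }
        boolean allowed = grants.getOrDefault(principal, Collections.emptySet()).contains(topic);
        return allowed ? Result.OK : Result.NOT_FOUND;
    }
}
```

An entitlement system would sit behind grant in a real deployment; the point of the sketch is only that the caller never learns whether a hidden topic exists.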
I look forward to thoughts on this and anyone else interested in working
with us on these items.
[0] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/what-is-a-digital-envelope.htm
[1] http://tools.ietf.org/html/rfc3126
[2] http://tools.ietf.org/html/rfc3161
[3] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/pkcs-7-cryptographic-message-syntax-standar.htm
[4] http://en.wikipedia.org/wiki/XML_Signature
[5] http://en.wikipedia.org/wiki/PKCS_12

/*******************************************
Joe Stein
Founder, Principal Consultant
Big Data Open Source Security LLC
http://www.stealth.ly
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/
