7) Something like: http://i.imgur.com/R21iF.png ?

On 12/14/12 10:04 PM, David Arthur wrote:
7) Perhaps a dung beetle should be the logo, as featured in The Metamorphosis. Or maybe just a nice stylized version of the word "Kafka" (like Solr and Lucene).

On 12/14/12 4:58 PM, Johan Lundahl wrote:
Thanks for some very helpful answers!

1) Great, our needs are somewhere in the thousands of topics and we could
probably scale out the number of servers as needed.

2) The reason I would like to separate out the producer is to have as small
and simple a library to integrate in our deployment as possible. Both
conflicts and size would be worth reducing, since our apps are pretty
sensitive in practice. The best case would be to only have the
KafkaLog4jAppender as one well defined dependency but for now I'll just run
it through Proguard I think.

3) I don't know the motives or state of Jafka (
https://github.com/adyliu/jafka) but if that project aims to be protocol
compatible, maybe that could be of help.

4) Ah thanks, I didn't even consider the console producer.

5) I'll do some tests with both 0.72 and 0.8 and see what happens. I would
need to start working on the integration for real around the end of Jan or
beginning of Feb. The most important thing is that the producer has as
little impact on our application as possible. On the broker and consumer side,
it matters less since that will be in prototype mode in the beginning.

6) Very nice

7) What is the reasoning behind the Kafka name? "THE PROCESS", "Kafkaesque
complexity"? Even though I royally suck at design, perhaps I'll try to do
some sketches of something during the holidays to at least fill the void in
some presentations.

Thanks again!


On Fri, Dec 14, 2012 at 9:20 PM, Jay Kreps <[email protected]> wrote:

1. There are two kinds of limits: per server and overall. The per server
limits come from the fact that we use one directory and at least one file
per partition-replica. The normal rules of unix filesystem scalability
apply. The per server limits can be mitigated by adding more servers. The
overall limits mostly come from zookeeper, which we use for partition
metadata. Zookeeper is non-partitioned and all in memory, so this probably
puts the limit in the millions? These are the fundamental limits. More
practically, we don't have regular performance tests for very large numbers
of partitions, so it is buyer beware. So I think LinkedIn has something
like a few thousand partitions in total. If you have more than that it
should theoretically work up to the limits I described but you should try
it first--if you uncover issues we are definitely interested in fixing
them.
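To make the per-server side of this concrete, here is a back-of-envelope sketch (the figures are made up for illustration, not LinkedIn's actual numbers): each partition replica needs at least one directory and one segment file on its broker, so open-file counts scale roughly as topics x partitions x replication / brokers.

```shell
# Illustrative numbers only -- substitute your own topology.
topics=3500
partitions_per_topic=1
replication=2
brokers=10

# Each partition replica holds at least one segment file open.
files=$(( topics * partitions_per_topic * replication / brokers ))
echo "roughly ${files} partition directories (and open segment files) per broker"

# Compare against the per-process file descriptor limit:
ulimit -n
```

If the first number gets anywhere near the second, you would want to raise the fd limit or spread the partitions over more brokers.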

2. We haven't tried to separate out the client from the broker. It is
possible, of course, but no one has done it. Can I ask specifically what
problem you are interested in solving (fewer dependency conflicts? a
smaller binary?).

3. The log4j appender relies on the normal scala producer. It is possible to
rewrite the producer in java, but it would be some work. This might be a
good idea--I agree that the clients should ideally be thin and have
few dependencies. The practical problem this introduces is that code
sharing becomes a bit trickier. You are correct that the producer should no
longer depend on zookeeper.
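For reference, wiring the appender up looks roughly like this (a sketch -- the exact property names differ between 0.7.x and 0.8, so check kafka.producer.KafkaLog4jAppender in your version; the topic and broker list below are placeholders):

```properties
# log4j.properties sketch -- verify property names against your Kafka version.
log4j.rootLogger=INFO, KAFKA
log4j.appender.KAFKA=kafka.producer.KafkaLog4jAppender
log4j.appender.KAFKA.topic=app-logs
log4j.appender.KAFKA.brokerList=broker1:9092,broker2:9092
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d %p %c - %m%n
```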

4. There is no mod_kafka that I know of. There is a console producer that
will suck in file input and output kafka messages, which might work for
you. mod_kafka would be a pretty sweet project idea.
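For webserver logs specifically, a stopgap is to tail the access log into the console producer. This is a sketch -- the script name and flags are from memory of the 0.7.x distribution (0.8 changes some of them, e.g. --broker-list instead of --zookeeper), so check bin/ and the --help output in your version:

```shell
# Tail an httpd access log and publish each line as a Kafka message.
# Script name and flags may differ between 0.7.x and 0.8 -- verify locally.
tail -F /var/log/httpd/access_log | \
  bin/kafka-console-producer.sh --zookeeper localhost:2181 --topic httpd-access
```

Run under a supervisor (or in a cron-checked loop) so the pipe is restarted if either side dies.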

5. Yes, this is true. We increased the scope of 0.8 quite a bit to try to
bundle non-compatible changes together. The answer depends on your level of
risk tolerance. Right now at LinkedIn we are subjecting 0.8 to a forked
version of our production load and we are still finding plenty of issues.
We are hoping to get that stable in the next few weeks, and it will likely
take several months to completely roll over all applications to 0.8 here.
So right now it is probably safe for development only. When we have rolled
it out 100% I would feel pretty confident saying it is very solid. In
between now and then kind of depends on your risk tolerance. Perhaps one
thing we could do is give a little more of an update as this testing
progresses. It is obviously hard to give a rigorous schedule since it is
mostly unknown unknowns.

6. As of a few days ago svn is used only for the website, and that is only
because of a dependence on apache tooling.

7. There hasn't really been much of a discussion about a logo, though we
definitely need one. I offered to act as "personal programming slave" to
any of the LinkedIn designers if they would make us a nice logo. If that
approach fails maybe we should just run a 99designs contest?

Cheers,

-Jay


On Fri, Dec 14, 2012 at 4:42 AM, Johan Lundahl <[email protected]> wrote:
Hi,

I'm trying to promote Kafka for our centralized log aggregation/metrics
system and have set up a proof of concept based on 0.7.2 which seems to
work very well for our purposes, but the improvements in 0.8 look too
important for us to go live without them. After studying the presentation
material and videos I have some questions:

1) It's mentioned by Jay in one of the videos that Kafka is designed for
< 1000 topics. I understand the fundamentals of what a topic is meant to be
but are there any real system limits in regards to this? In our case, we
have around 100 clusters running our different (java only) applications
with a guesstimate average size of 40 nodes each. We have around 30
different types of logs plus some other metrics so this would give us 100
clusters * 35 types = 3500 topics. Furthermore, it's likely that the number
of clusters will increase in the future. Is this something that could cause
us trouble or is this figure of < 1000 topics just a guideline?

2) The KafkaLog4jAppender is a very convenient way for us to stream our
logs since no changes in the application code will be needed but is it
possible to build a lightweight jar with only the KafkaLog4jAppender
producer that we easily could deploy on our production servers? I'm not an
sbt expert but I could only manage to build a full package including the
broker and everything, which is a lot heavier.

3) As our applications are pure java, it would be nice to avoid the scala
runtime on the producer side. Would it be feasible to implement the
KafkaLog4jAppender in java? With 0.8, the dependency on Zookeeper should
not be needed on the producer side either, if I understand correctly?

4) How do you handle non application logs, for example webserver logs? Is
there something like an Apache httpd mod_kafka? OS metrics?

5) In general I think it's somewhat tricky to follow the status of the
different Kafka versions. It seems like 0.8 has been postponed a bit
relative to original plans but are there newer estimations of when it can
be considered "stable"? Is there a summary of the important changes for the
version?

6) I've seen a few mails recently about git migration. Would it be enough
to only use git from 0.8 or would I still need svn for anything?

7) Have there been discussions about creating a logo for Kafka? My
conceptual system diagrams look a bit empty on the Kafka parts in the
promo-slides I've made...(the same thing applies to the Storm parts)

Thanks a lot in advance for your help!


