We don't use Kafka as a broker between Storm components; it's an input from another service. That said, you can easily get over 100k operations per second from Kafka, and I'm sure that if you scaled your topic to a higher number of partitions you could eclipse that. In the application I work on, Kafka is not the bottleneck, and we haven't really had to tune it much.
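As a rough illustration of the broker-based setup discussed in the thread below, here is a sketch (not production code) of how two independent topologies could be bridged through a Kafka topic using the storm-kafka module of that era: the producing topology ends in a KafkaBolt, and each consuming topology attaches its own KafkaSpout. The topic name, ZooKeeper address, and the `FileReaderSpout`/`MyProcessingBolt` classes are placeholders for illustration only:

```java
// Sketch only: assumes storm-core and storm-kafka 0.9.x on the classpath,
// plus a running ZooKeeper/Kafka at the (placeholder) addresses shown.
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import storm.kafka.bolt.KafkaBolt;
import storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
import storm.kafka.bolt.selector.DefaultTopicSelector;

public class BridgeSketch {
    public static void main(String[] args) {
        // Publishing side: the file-reading topology ends in a KafkaBolt that
        // writes each tuple to the shared "bridge-topic". (KafkaBolt takes its
        // producer settings from the topology Config under
        // "kafka.broker.properties".)
        TopologyBuilder producer = new TopologyBuilder();
        producer.setSpout("file-spout", new FileReaderSpout()); // hypothetical
        producer.setBolt("to-kafka", new KafkaBolt<String, String>()
                .withTopicSelector(new DefaultTopicSelector("bridge-topic"))
                .withTupleToKafkaMapper(
                    new FieldNameBasedTupleToKafkaMapper<String, String>()))
            .shuffleGrouping("file-spout");

        // Subscribing side: any number of independent topologies can each
        // attach a KafkaSpout to the same topic (with a distinct consumer id)
        // and be deployed or killed without touching the producer.
        ZkHosts zk = new ZkHosts("zookeeper.example.com:2181");
        SpoutConfig cfg = new SpoutConfig(zk, "bridge-topic",
                "/kafka-bridge", "consumer-a");
        cfg.scheme = new SchemeAsMultiScheme(new StringScheme());
        TopologyBuilder consumer = new TopologyBuilder();
        // Spout parallelism should not exceed the topic's partition count.
        consumer.setSpout("from-kafka", new KafkaSpout(cfg), 4);
        consumer.setBolt("work", new MyProcessingBolt()) // hypothetical
            .shuffleGrouping("from-kafka");
    }
}
```

This addresses Jungtaek's failure-replay concern from the thread: the KafkaSpout anchors tuples against Kafka offsets, so failed tuples are replayed by re-reading the topic rather than by asking the upstream topology to resend.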
On Sun, Oct 19, 2014 at 9:34 AM, Klausen Schaefersinho <[email protected]> wrote:

> Hi,
>
> I am not a big fan of large topologies. The simple reason is that it
> increases complexity and makes it difficult to decouple components. Also,
> you cannot simply deploy or undeploy an aspect (sub-topology). So the
> approach of using an external broker seems more appealing.
>
> However, does anybody have experience with a broker-based setup? What is
> the performance penalty?
>
> Cheers,
>
> Klaus
>
> On Sun, Oct 19, 2014 at 3:27 PM, Nathan Leung <[email protected]> wrote:
>
>> I agree with Jungtaek; the preferred approach is to either merge the
>> topologies or use a broker such as Kafka.
>>
>> On Oct 19, 2014 12:12 AM, "임정택" <[email protected]> wrote:
>>
>>> How about merging the topologies into one?
>>> Though the tuple timeout would have to be set to the maximum processing
>>> time across all of the topologies, it's the only way to make this work
>>> without adding other components.
>>>
>>> By the way, ideally, supporting pub-sub between topologies would be
>>> great, but AFAIK there are many hurdles to realizing it:
>>> 1. The subscribing spout should replay tuples when a failure occurs
>>> (to guarantee message processing), but the publishing bolt has no way
>>> to help with that.
>>> 2. A spout would need a way to receive data from a bolt (over TCP),
>>> which doesn't exist yet.
>>> 3. A spout pulls data from its source when nextTuple() is called, which
>>> may not fit a pub-sub situation.
>>> 4. Pub-sub spouts/bolts should allow dynamic task registration (maybe
>>> this already exists).
>>>
>>> So I also recommend adding a message queue (Kafka, RabbitMQ, etc.)
>>> between topologies.
>>>
>>> Please correct me if I'm wrong.
>>>
>>> Regards,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Friday, October 17, 2014, Klausen Schaefersinho
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> In my Storm setup, data arrives in the form of files that I have to
>>>> read and emit in my spout. Also, my topology is very dynamic. Some
>>>> topologies run quite long, whereas others can be turned on and off
>>>> frequently. To avoid having n spouts reading from the files, I was
>>>> wondering if I could have just one topology in the cluster that reads
>>>> from the files and just emits tuples. All other topologies would then
>>>> register and "listen" to that topology.
>>>>
>>>> Cheers,
>>>>
>>>> Klaus
>>>
>>> --
>>> Name : 임 정택
>>> Blog : http://www.heartsavior.net / http://dev.heartsavior.net
>>> Twitter : http://twitter.com/heartsavior
>>> LinkedIn : http://www.linkedin.com/in/heartsavior
