I haven’t used DRPC, so I can’t speak to it. That said, Kafka is pretty awesome and can deliver some really jaw-dropping performance. If I were you, I’d consider standardizing around Kafka. If that isn’t viable, Storm topologies are directed acyclic graphs, so you can merge two streams into a single stream; the illustration on Wikipedia is pretty nice: http://en.wikipedia.org/wiki/Directed_acyclic_graph
Generally speaking, synchronous anything in distributed computing is expensive and, IMHO, to be avoided if possible. I’m a big fan of Trident. I’d use it first unless there is a need for the lower-level spout and bolt API. Trident can be used to guarantee exactly-once processing, but keep in mind that this has some external requirements. If you’re writing to a database, for example, the writes still need to be idempotent. Trident helps with this by providing batch ids, which perform better than natural keys on individual tuples, but this can still be a pain with things such as columnar stores.

Grant Overby
Software Engineer
Cisco.com

From: Hasan Riaz <[email protected]>
Reply-To: [email protected]
Date: Thursday, April 23, 2015 at 9:42 PM
To: [email protected]
Subject: Re: New to apache storm

Hello, just wanted to inquire if anyone can answer my questions. Thanks

On Tue, Apr 21, 2015 at 7:50 PM, Hasan Riaz <[email protected]> wrote:

Hello to all,

I am new to Apache Storm and have been working with it for the last month or so.
We are trying to design a topology wherein:
- A JSON message is broken up into multiple parts
- Each of these parts is processed in parallel
- The results are aggregated via a Grouping Bolt

This topology needs to work in both a synchronous and an asynchronous manner, meaning that the message can arrive synchronously via a DRPC request or asynchronously via a message queue (Kafka).

I have the following questions:
- Is there a way to achieve the above via a single topology, or would I need to have separate topologies?
- Since DRPC is deprecated, is it safe to assume that the best way to code is through the Trident abstraction?
- Using Storm primitives, is there a way to process a message exactly once?

Lastly, in order to monitor whether a topology is running, I have a script which invokes the REST API as documented at https://github.com/apache/storm/blob/master/STORM-UI-REST-API.md, reads the topology summary response and then, based on whether the topology is present or not, starts or stops the topology on a given server. Is this approach prudent? I am using monit to invoke the script.

Thanks in advance for your help
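
For what it's worth, the monitoring check described in the question might look roughly like the sketch below. It assumes the Storm UI listens on localhost:8080 and serves the topology summary at /api/v1/topology/summary with a "topologies" list whose entries carry "name" and "status" fields; the URL, the function names and the exit-code convention are illustrative choices, not anything taken from the original script.

```python
import json
import urllib.request

# Assumed Storm UI location; adjust host and port for your cluster.
SUMMARY_URL = "http://localhost:8080/api/v1/topology/summary"


def fetch_summary(url=SUMMARY_URL, timeout=5.0):
    """Fetch and decode the topology summary JSON from the Storm UI."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_running(summary, name):
    """Return True if a topology called `name` is listed with status ACTIVE."""
    for topo in summary.get("topologies", []):
        if topo.get("name") == name and topo.get("status") == "ACTIVE":
            return True
    return False


def exit_code(name, url=SUMMARY_URL):
    """Return 0 if running, 1 if absent or not active, 2 on network or decode errors."""
    try:
        summary = fetch_summary(url)
    except (OSError, ValueError):
        # Could not reach or parse the UI: report that separately from "absent".
        return 2
    return 0 if is_running(summary, name) else 1
```

A wrapper script could end with `sys.exit(exit_code("my-topology"))` (the topology name is hypothetical) so that monit can act on the exit status. Keeping "UI unreachable" distinct from "topology absent" is a deliberate choice here: otherwise a UI outage could look like a missing topology and trigger a restart that isn't needed.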

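
On the batch-id point in the reply at the top of this thread, here is a toy sketch of why a batch id makes a write idempotent. This is not the Trident API: `BatchIdStore` and its methods are hypothetical, the "database" is an in-memory dict, and it assumes that batches are applied in order and that a replayed batch carries the same contents as the first attempt.

```python
class BatchIdStore:
    """Toy key-value store that remembers the last applied batch id per key."""

    def __init__(self):
        # key -> (last_batch_id, value)
        self._rows = {}

    def add(self, key, delta, batch_id):
        """Add `delta` (the batch's total for `key`) unless this batch was already applied."""
        last_batch_id, value = self._rows.get(key, (None, 0))
        if last_batch_id == batch_id:
            # Replay of a batch that already landed: skip the update.
            return value
        value += delta
        # The batch id is stored together with the value in a single write.
        self._rows[key] = (batch_id, value)
        return value

    def get(self, key):
        """Return the current value for `key`, or 0 if the key is unknown."""
        return self._rows.get(key, (None, 0))[1]
```

With this in place, calling `store.add("clicks", 3, batch_id=1)` twice leaves the count at 3 rather than 6. One id is stored per key instead of one natural key per tuple, which is the saving mentioned above; in a real store the id and the value would need to be written atomically.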