I haven’t used DRPC, so I can’t speak to it. That said, Kafka is pretty awesome 
and can deliver some really jaw-dropping performance. If I were you, I’d consider 
standardizing around Kafka. If that isn’t viable, Storm topologies are 
directed acyclic graphs, so you can merge two streams into a single stream. 
The illustration on Wikipedia is pretty nice: 
http://en.wikipedia.org/wiki/Directed_acyclic_graph
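
To illustrate the merge idea, here is a rough sketch in plain Python (not the Storm API). The two input lists are hypothetical stand-ins for a Kafka source and a DRPC source, and `split`, `process`, `aggregate` and `run` are made-up names for the stages described in the question. The point is only that both sources can feed the same downstream steps.

```python
import json
from itertools import chain


def split(message):
    """Break one JSON object message into its top-level (key, value) parts."""
    return list(json.loads(message).items())


def process(part):
    """Placeholder per-part work: upper-case the value's string form."""
    key, value = part
    return key, str(value).upper()


def aggregate(parts):
    """Collect the processed parts of one message back into a single dict."""
    return dict(parts)


def run(*sources):
    """Merge any number of sources and push every message through one pipeline."""
    merged = chain(*sources)  # fan-in: all sources share the same downstream steps
    return [aggregate(process(p) for p in split(m)) for m in merged]
```

In a real topology the fan-in would be expressed by having one bolt subscribe to more than one upstream component; the sketch only mirrors the shape of the graph.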

Generally speaking, synchronous anything in distributed computing is expensive 
and, imho, to be avoided if possible.

I’m a big fan of Trident. I’d use it first unless there is a need for the 
lower-level spout and bolt API.

Trident can be used to guarantee exactly-once processing, but keep in mind this 
has some external requirements. If you’re writing to a database, for example, 
the writes still need to be idempotent. Trident helps with this by providing 
batch ids, which perform better than natural keys on individual tuples, but 
this can still be a pain with things such as columnar stores.
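
As a rough sketch of what an idempotent write keyed on a batch id can look like: the in-memory store below is hypothetical (the class name, the `txid` parameter name and the dict standing in for a database are all assumptions), and it assumes a replayed batch always carries the same id and the same contents. Each row remembers the id of the last batch applied to it, so a replay of that batch is skipped rather than counted twice.

```python
class TransactionalCountStore:
    """Hypothetical in-memory stand-in for a database table of counters."""

    def __init__(self):
        # key -> (count, id of the last batch applied to this key)
        self._rows = {}

    def apply_batch(self, txid, partial_counts):
        """Add a batch's per-key counts, skipping keys already updated by txid."""
        for key, delta in partial_counts.items():
            count, last_txid = self._rows.get(key, (0, None))
            if last_txid == txid:
                continue  # this batch was already applied: replay is a no-op
            self._rows[key] = (count + delta, txid)

    def get(self, key):
        """Return the current count for key, or 0 if the key is unknown."""
        return self._rows.get(key, (0, None))[0]
```

Storing one id per row is cheaper than tracking every tuple's natural key, which is the trade-off mentioned above.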

Grant Overby
Software Engineer

From: Hasan Riaz <[email protected]>
Reply-To: [email protected]
Date: Thursday, April 23, 2015 at 9:42 PM
To: [email protected]
Subject: Re: New to apache storm

Hello,
I just wanted to ask whether anyone can answer my questions.
Thanks

On Tue, Apr 21, 2015 at 7:50 PM, Hasan Riaz <[email protected]> wrote:
Hello to all,
I am new to Apache Storm and have been working with it for the last month or 
so. We are trying to design a topology wherein:
- A JSON message is broken up into multiple parts
- Each of these parts is processed in parallel
- The results are aggregated via a grouping bolt

This topology needs to work in both a synchronous and an asynchronous manner, 
meaning that the message can arrive synchronously via a DRPC request or 
asynchronously via a message queue (Kafka).

I have the following questions:
- Is there a way to achieve the above via a single topology, or would I need to 
have separate topologies?
- Since DRPC is deprecated, is it safe to assume that the best way to code is 
through the Trident abstraction?
- Using Storm primitives, is there a way to process a message exactly once?

Lastly, in order to monitor whether a topology is running, I have a script 
that invokes the REST API as documented at 
https://github.com/apache/storm/blob/master/STORM-UI-REST-API.md, reads 
the topology summary response, and then, based on whether the topology is 
present or not, starts or stops the topology on a given server. Is this 
approach prudent? I am using monit to invoke the script.
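
A minimal sketch of such a check, under stated assumptions: the Storm UI is taken to be reachable at `localhost:8080`, and the summary response is taken to be a JSON object with a top-level `topologies` list whose entries carry `name` and `status` fields, as the linked REST API document describes. The function names are hypothetical, and the parsing is kept separate from the HTTP call so it can be exercised without a cluster.

```python
import json
from urllib.request import urlopen

# Assumed Storm UI address; adjust host and port for the actual cluster.
SUMMARY_URL = "http://localhost:8080/api/v1/topology/summary"


def fetch_summary(url=SUMMARY_URL, timeout=5):
    """Fetch the topology summary from the Storm UI and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_topology_active(summary, name):
    """Return True if a topology with this name is listed with status ACTIVE."""
    return any(
        topology.get("name") == name and topology.get("status") == "ACTIVE"
        for topology in summary.get("topologies", [])
    )
```

A wrapper script could call `is_topology_active(fetch_summary(), "my-topology")` (the topology name is a placeholder) and exit non-zero when it returns False, so that monit can react to the exit status.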

Thanks in advance for your help
