Hi,

I would like to discuss possible Storm design patterns for the following 
requirement:

Given a Storm topology that is used in production for (automatic) real-time 
stream processing, the user interface requires a REST API to interactively 
(manually) run a subset of the topology and display interim results.

For simplicity let's assume the following topology:

QueueSpout -> (multiple parallel) ProcessingBolt(s) -> Join -> ReduceBolt -> 
PersistenceBolt

The user interface requires each of the ProcessingBolts to be exposed as a 
separate REST API.



Design 1:

    Deploy a separate DRPCTopology for each ProcessingBolt.

    The REST server acts as a reverse proxy that forwards requests to the 
DRPC server.
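
    To make Design 1 concrete, here is a rough sketch (plain Java, standard 
library only) of the routing the REST server would do. The class name, the 
REST paths and the DRPC function names are all made up for illustration, and 
the drpcExecute callback is only a stand-in for whatever client actually talks 
to the DRPC server; I have not wired this to Storm.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical reverse-proxy core: maps a REST path to a DRPC function name
// and forwards the request body as the DRPC argument.
public class DrpcReverseProxy {
    // Assumed mapping, one entry per ProcessingBolt / DRPC topology.
    private final Map<String, String> routes;
    // Stand-in for the real DRPC call: (functionName, args) -> result.
    private final BiFunction<String, String, String> drpcExecute;

    public DrpcReverseProxy(Map<String, String> routes,
                            BiFunction<String, String, String> drpcExecute) {
        this.routes = routes;
        this.drpcExecute = drpcExecute;
    }

    // Returns the DRPC result, or empty if no topology is registered for the
    // path (the REST layer would turn that into a 404).
    public Optional<String> handle(String path, String body) {
        String function = routes.get(path);
        if (function == null) {
            return Optional.empty();
        }
        return Optional.of(drpcExecute.apply(function, body));
    }
}
```

    The REST server itself stays stateless here: adding a ProcessingBolt 
means deploying one more DRPC topology and adding one more route.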



Design 2:

    The REST server puts a message in a priority queue with low priority, and 
subscribes for the result in Redis.

    Use OOP (e.g. a shared base class) to make all ProcessingBolts aware of 
toggles carried in the tuple. Effectively, the tuple contains toggles that 
disable all ProcessingBolts but one.

    Another toggle forwards interim results to a (Redis) PublishBolt instead 
of the ReduceBolt.
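
    A rough sketch of the toggle logic for Design 2 (plain Java, no Storm 
dependency). The toggle keys "onlyBolt" and "publishInterim" and the stream 
names "publish" and "reduce" are assumptions made up for this example; in a 
real topology the toggles would be fields of the tuple and the check would 
live in the shared base class of the ProcessingBolts.

```java
import java.util.Map;

// Hypothetical helper that a shared ProcessingBolt base class could call.
public class ToggleRouting {
    // Assumed toggle keys carried in every tuple.
    public static final String ONLY_BOLT = "onlyBolt";
    public static final String PUBLISH_INTERIM = "publishInterim";

    // A bolt processes the tuple unless a toggle names a different bolt.
    // No toggle means normal production behaviour: every bolt runs.
    public static boolean shouldProcess(String boltId, Map<String, String> toggles) {
        String only = toggles.get(ONLY_BOLT);
        return only == null || only.equals(boltId);
    }

    // Interim results go to the publish stream when the toggle is set to
    // "true", otherwise on to the ReduceBolt as usual.
    public static String outputStream(Map<String, String> toggles) {
        return Boolean.parseBoolean(toggles.get(PUBLISH_INTERIM)) ? "publish" : "reduce";
    }
}
```

    With no toggles set, production tuples take the normal path through 
every bolt; only tuples from the REST server carry the toggles.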



Design 1 Pros:

  1.  Follows the principle of an immutable stream-processing graph.
  2.  Follows the principle of preferring N simpler systems over 1 complex 
system.

Design 2 Pros:

  1.  Makes life easier for operations: one system to monitor and upgrade.
  2.  Enables a fine-grained monitoring probe that continuously monitors the 
real-time system (one subsystem at a time).
  3.  Enables customer-specific stream processing (instead of a topology per 
tenant).



Thoughts?



Itai
