Dear group,

We are currently looking for a grid computing tool suite, and Apache Storm 
seems to be a good candidate – but there are a few points I do not understand 
clearly.

1. Integration with the Spring Framework – general architecture
We have several Java Spring-based web apps running different services. One 
service computes values on a regular basis (every 30 minutes); the triggering 
is done by a Quartz scheduler. The calculation can be divided into roughly 
2 million small tasks (this number will grow in the future). There is a 
computation service which in turn uses a lot of other components, factories, 
etc. The components are configured with Spring and wired via IoC dependency 
injection (@Autowired etc.). The sketch below shows the rough shape of this 
setup.
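For context, the existing service looks roughly like this (class and method 
names are just placeholders, not our real code):

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

// Rough shape of the existing setup; names are placeholders.
@Service
public class ComputationService {

    @Autowired
    private ValueCalculator calculator; // one of many injected components

    // Triggered every 30 minutes by a Quartz job.
    public void computeAll() {
        for (long taskId : createTaskIds()) { // ~2 million small tasks
            calculator.compute(taskId);
        }
    }

    private List<Long> createTaskIds() {
        // Stands in for the real enumeration of work items.
        return LongStream.range(0, 2_000_000).boxed()
                         .collect(Collectors.toList());
    }
}

// Placeholder for one of the many autowired components.
interface ValueCalculator {
    void compute(long taskId);
}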

I wrote a Spout (generating the ~2 million tasks) and a Bolt (computing the 
values). The problem is that both the Spout and the Bolt have to work together 
with the other services, components, etc. – so they need access to the Spring 
beans. In skeleton form the Spout looks like the sketch below; the commented 
line in open() marks exactly what I do not know how to do.
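This sketch uses the org.apache.storm packages and invented names (TaskSpout, 
TaskSource); TaskSource stands for the non-serializable Spring bean I would 
need inside the Spout (the Bolt has the same problem in its prepare() method):

import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class TaskSpout extends BaseRichSpout {

    private transient SpoutOutputCollector collector;
    private transient TaskSource taskSource; // Spring bean, not serializable

    @Override
    public void open(Map<String, Object> conf, TopologyContext context,
                     SpoutOutputCollector collector) {
        this.collector = collector;
        // this.taskSource = ???  <-- how do I get the existing Spring bean here?
    }

    @Override
    public void nextTuple() {
        Long taskId = taskSource.next(); // hand out the next work item
        if (taskId != null) {
            collector.emit(new Values(taskId));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("taskId"));
    }
}

// Placeholder for the Spring service that hands out the work items.
interface TaskSource {
    Long next();
}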

My problems/questions:
a) In general: how does the Storm cluster actually work once I deploy a 
topology to it? Do I have to "register" my different Spring servers with the 
cluster so that they retrieve jobs from Storm, or does the cluster work 
completely independently of any web app?
b) I was surprised that all fields must be serializable, which my services of 
course are not. I read that I need to bind them during the "open" method, but 
how do I access the existing Spring beans there? The only idea I have so far 
is sketched after these questions.
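For b), the only approach I could come up with is a static context holder 
along the following lines (just a sketch, and I am not sure it is sound when 
the workers run in separate JVMs – which ties back to question a):

import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.stereotype.Component;

// Registered as a Spring bean so the context is injected at startup;
// Storm components could then look beans up statically in open()/prepare().
@Component
public class SpringContextHolder implements ApplicationContextAware {

    private static volatile ApplicationContext context;

    @Override
    public void setApplicationContext(ApplicationContext applicationContext) {
        context = applicationContext;
    }

    public static <T> T getBean(Class<T> type) {
        return context.getBean(type);
    }
}

The Spout's open() would then do

    this.taskSource = SpringContextHolder.getBean(TaskSource.class);

but this presumes that a Spring context is actually bootstrapped inside every 
Storm worker JVM. Is that the intended way?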

2. Best approach for running spouts at a specified time interval
As I said, a job is triggered every 30 minutes and the computation should 
start. What is the best approach to implement such a scenario with spouts? 
There is no real stream of data; the stream is actually produced by the 
spouts themselves, which generate the small tasks. What I have in mind is the 
sketch below.
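Something like this (again just an illustration with invented names, checking 
the interval inside nextTuple() and idling in between) – is this a reasonable 
pattern, or is there a better mechanism?

import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

// Emits one batch of task ids every 30 minutes, then idles.
public class IntervalTaskSpout extends BaseRichSpout {

    private static final long INTERVAL_MS = TimeUnit.MINUTES.toMillis(30);

    private transient SpoutOutputCollector collector;
    private long lastRun;    // start time of the last batch
    private long nextTaskId; // next id to emit in the current batch
    private long batchEnd;   // exclusive upper bound of the current batch

    @Override
    public void open(Map<String, Object> conf, TopologyContext context,
                     SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        long now = System.currentTimeMillis();
        if (nextTaskId >= batchEnd && now - lastRun >= INTERVAL_MS) {
            // Start a new batch; in reality the ids would come from a Spring service.
            lastRun = now;
            nextTaskId = 0;
            batchEnd = 2_000_000;
        }
        if (nextTaskId < batchEnd) {
            collector.emit(new Values(nextTaskId++));
        } else {
            Utils.sleep(100); // nothing to do until the next interval
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("taskId"));
    }
}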

Best fr
