Dear group, we are currently looking for a grid computing tool suite, and Apache Storm seems to be a good candidate – but there are a few points I do not fully understand.
1. Integration into the Spring Framework – general architecture

We have several Java Spring based web apps running different services. One service computes values on a regular basis (every 30 minutes); the triggering is done by a Quartz scheduler. The calculation can be divided into roughly 2 million small tasks (this will grow in the future). There is a computation service, which in turn uses a lot of other components, factories, etc. The different components are configured with Spring and wired via IoC dependency injection (@Autowired etc.).

I wrote a Spout (generating the ~2 million tasks) and a Bolt (computing the values). The problem is that both the Spout and the Bolt have to work together with the other services and components, so they need access to the Spring beans.

My problems/questions:

a) In general: how does the Storm cluster actually work once I deploy a topology to it? Do I have to "register" my different Spring servers with the cluster so that they retrieve jobs from Storm, or does the cluster run completely independently of any web app?

b) I was surprised that all fields of a Spout/Bolt must be serializable, which my services of course are not. I read that I should bind them during the "open" method, but how do I get hold of the existing Spring beans there?

2. Best approach for running spouts in a specified time interval

As I said, every 30 minutes a job is triggered and the computation should start. What is the best approach to model such a scenario with Spouts? There is no real stream of data – the stream is actually produced by the Spouts themselves, which generate the small job tasks.

Best fr
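P.S. To illustrate question b), here is a small self-contained sketch of what I understand so far (plain JDK only – no Storm or Spring on the classpath; the class and method names just mimic the Storm lifecycle and our beans): the bolt instance itself is Java-serialized when the topology is submitted, so a non-serializable service field has to be transient and re-bound on the worker side in a prepare()-style hook.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for one of our Spring services; deliberately NOT Serializable.
class ComputationService {
    int compute(int task) { return task * 2; }
}

// Mimics a Storm bolt: the instance is serialized at topology submission,
// so the service must be transient and re-created (or fetched from the
// Spring context) in a prepare()/open()-style lifecycle method.
class ComputationBolt implements Serializable {
    private transient ComputationService service; // skipped by serialization

    // Called on the worker after deserialization (like Bolt.prepare()).
    void prepare() {
        // In the real topology this would look the bean up in a
        // Spring ApplicationContext instead of constructing it directly.
        service = new ComputationService();
    }

    boolean serviceIsBound() { return service != null; }

    int execute(int task) { return service.compute(task); }
}

public class SerializationSketch {
    public static void main(String[] args) throws Exception {
        ComputationBolt bolt = new ComputationBolt();
        bolt.prepare(); // bound on the "submitting" side

        // Round-trip through Java serialization, as happens on submit.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(bolt);
        oos.close();
        ComputationBolt copy = (ComputationBolt) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();

        // The transient field did not survive the round trip...
        System.out.println(copy.serviceIsBound()); // prints false
        // ...so it must be re-bound on the worker side before use.
        copy.prepare();
        System.out.println(copy.execute(21)); // prints 42
    }
}
```

So my concrete question is really about the line marked with the Spring comment: what is the recommended way to obtain the existing ApplicationContext inside prepare()/open() on a worker node?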
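And for question 2, here is a sketch of the only approach I can think of so far (again plain JDK, with placeholder names; the interval is shortened for the example and emit() just collects into a list): a spout whose nextTuple() stays idle until the interval has elapsed and then emits the next batch of generated tasks. I would be glad to hear whether this is idiomatic or whether the Quartz trigger should feed the spout from outside instead.

```java
import java.util.ArrayList;
import java.util.List;

// Mimics a Storm spout that produces a batch of tasks every interval
// instead of consuming a real input stream.
class IntervalTaskSpout {
    private final long intervalMillis;
    private long lastRun = 0; // 0 => emit immediately on the first call
    private final List<Integer> emitted = new ArrayList<>();

    IntervalTaskSpout(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Storm calls nextTuple() in a loop; we only emit when the interval
    // has elapsed, otherwise we do nothing (and Storm backs off).
    void nextTuple() {
        long now = System.currentTimeMillis();
        if (now - lastRun < intervalMillis) {
            return; // not yet time for the next computation run
        }
        lastRun = now;
        // In the real spout this loop would generate the ~2 million task ids.
        for (int task = 0; task < 5; task++) {
            emit(task);
        }
    }

    void emit(int task) { emitted.add(task); } // stand-in for collector.emit(...)

    List<Integer> emitted() { return emitted; }
}

public class IntervalSpoutSketch {
    public static void main(String[] args) throws InterruptedException {
        IntervalTaskSpout spout = new IntervalTaskSpout(200); // 200 ms instead of 30 min
        spout.nextTuple();          // first call emits a batch of 5
        spout.nextTuple();          // too early, emits nothing
        Thread.sleep(250);
        spout.nextTuple();          // interval elapsed, second batch of 5
        System.out.println(spout.emitted().size()); // prints 10
    }
}
```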