My group uses Apache Storm for some near-real-time processing and also for on-demand batch processing with orchestration. Recently, someone suggested using Kafka to orchestrate between batch processing jobs/pipelines, and I don't think this is a good idea.
Given the following flows:

1. No missing IDs: Request -> BatchJobA (find all missing IDs to process for the request) -> when all done and no IDs found, BatchJobC (process existing IDs) -> notify when all done
2. Some missing IDs: Request -> BatchJobA (find all missing IDs to process for the request) -> when all done and some IDs found, BatchJobB (create missing IDs) -> BatchJobC (process existing IDs) -> notify when all done

While processing within a batch job can be parallelized, each batch job has to wait for the previous job to complete.

I don't think Apache Storm is the right tool for this kind of processing, but when you have a hammer, everything looks like a nail. Having said that, would CoordinatedBolt work for the above scenario, and if so, how? Would Trident be appropriate for this type of processing?

Thanks for your thoughts,
Andre
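
P.S. To make the ordering constraint concrete, here is a rough sketch of the flow in plain Python. It is not Storm or Kafka code, and every function name and the in-memory ID lists are placeholders I made up for illustration. It only shows the shape of the problem: work inside a stage runs in parallel, but a stage starts only after the previous one has fully finished, and the create step runs only when the find step reports missing IDs.

```python
from concurrent.futures import ThreadPoolExecutor


def find_missing_ids(request_ids, existing_ids):
    """Find step: requested IDs that do not exist yet (placeholder logic)."""
    return sorted(set(request_ids) - set(existing_ids))


def create_id(item_id):
    """Create step, one unit of work (placeholder: just echoes the ID)."""
    return item_id


def process_id(item_id):
    """Process step, one unit of work (placeholder result string)."""
    return f"processed:{item_id}"


def run_request(request_ids, existing_ids, workers=4):
    """Run the stages in order; each stage is a barrier for the next."""
    events = []  # ordered log of completed stages
    with ThreadPoolExecutor(max_workers=workers) as pool:
        missing = find_missing_ids(request_ids, existing_ids)
        events.append("find done")

        if missing:
            # list() drains the iterator, so we block here until
            # every create task has finished before moving on.
            list(pool.map(create_id, missing))
            events.append("create done")

        # At this point every requested ID exists, so process them all.
        results = list(pool.map(process_id, sorted(set(request_ids))))
        events.append("process done")

    events.append("notified")
    return events, results


if __name__ == "__main__":
    print(run_request([1, 2, 3], [1, 2, 3]))  # nothing missing
    print(run_request([1, 2, 3], [1]))        # IDs 2 and 3 missing
```

The "wait until the whole stage is done, then decide what runs next" barrier is the part I am unsure how to express cleanly in a topology.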