Hello!

I'm evaluating Storm for a project that involves processing many distinct small tasks in the following way:

- a user supplies some data source
- a spout is attached to the source and emits chunks of data into the topology
- bolts process each chunk and transform it somehow (in general reducing the number of chunks, so the number of records reaching the sink is much smaller than the number emitted by the spout)
- when all records are processed, the results are accumulated and sent back to the user

As far as I understand, a topology is supposed to be kept running forever, so I don't see an easy way to distinguish the records of one task from the records of another. Should a new topology be started for each new user task?

Thank you in advance! Links to any relevant articles are very welcome :)

--
Eugene N Dzhurinsky