Is kafka suitable for our architecture?

2014-10-09 Thread Albert Vila
Hi, I just came across Kafka while looking for ways to scale our current architecture. We currently download and process 6M documents per day from online and social media. We have a different workflow for each type of document, but some of the steps are keyword extraction, l

Re: Is kafka suitable for our architecture?

2014-10-09 Thread William Briggs
Manually managing data locality will become difficult to scale. Kafka is one potential tool you can use to help scale, but by itself, it will not solve your problem. If you need the data in near-real time, you could use a technology like Spark or Storm to stream data from Kafka and perform your pro

Re: Is kafka suitable for our architecture?

2014-10-09 Thread Christian Csar
Apart from your data locality problem, it sounds like what you want is a work queue. Kafka's consumer structure doesn't lend itself too well to that use case, as a single partition of a topic should only have one consumer instance per logical subscriber of the topic, and that consumer would not be abl
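To make the consumer-group constraint concrete, here is a simplified Python model in which each partition of a topic is given to exactly one consumer instance within a group. It uses a plain round-robin rule and is not the assignor Kafka itself ships; the partition count and the group and member names are hypothetical. With more instances than partitions the extras sit idle, while a second group (a second logical subscriber) independently receives every partition.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Round-robin partitions over the members of ONE consumer group.

    Simplified model: each partition goes to exactly one member of the
    group; members beyond the partition count receive nothing.
    """
    members = sorted(consumers)
    if not members:
        raise ValueError("a consumer group needs at least one member")
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    # List every member, including idle ones that got no partition.
    return {member: assignment[member] for member in members}


partitions = [0, 1, 2]  # hypothetical topic with 3 partitions

# One logical subscriber with 4 instances: one instance is left idle.
indexer = assign_partitions(partitions, ["idx-a", "idx-b", "idx-c", "idx-d"])
print(indexer)  # {'idx-a': [0], 'idx-b': [1], 'idx-c': [2], 'idx-d': []}

# A second logical subscriber gets its own assignment of every partition.
extractor = assign_partitions(partitions, ["kw-a", "kw-b"])
print(extractor)  # {'kw-a': [0, 2], 'kw-b': [1]}
```

The point of the model is that adding instances to a group only helps up to the partition count, which is why a topic used as a work queue needs enough partitions for the desired parallelism.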

Re: Is kafka suitable for our architecture?

2014-10-09 Thread Albert Vila
Hi, we process data in real time, and we are looking at Storm and Spark Streaming too. However, our actions are atomic and done at the document level, so I don't know whether they fit something like Storm/Spark. Regarding what you said, Christian, isn't Kafka used for scenarios like the one I described
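Since each action is done at the document level, a common pattern is to commit a consumer's offset only after a document has been processed. The sketch below is a hypothetical in-memory stand-in (a Python list for one partition's log, a dict for the committed offset), not the Kafka client API. It shows that a crash between processing and committing makes the same document be processed again on restart, so per-document steps should be safe to repeat.

```python
def consume(log, committed, process, crash_after=None):
    """Process records from `log`, starting at the committed offset.

    The offset is committed only AFTER `process` returns, so a crash
    between processing and committing causes that record to be
    processed again on restart (at-least-once behaviour).
    """
    offset = committed["offset"]
    while offset < len(log):
        process(log[offset])
        if crash_after is not None and offset == crash_after:
            # Simulated failure: the record was processed but not committed.
            raise RuntimeError(f"crash before committing offset {offset + 1}")
        offset += 1
        committed["offset"] = offset  # commit the next offset to read


log = ["doc-1", "doc-2", "doc-3"]  # hypothetical partition contents
committed = {"offset": 0}
seen = []

try:
    consume(log, committed, seen.append, crash_after=1)
except RuntimeError:
    pass  # restart from the last committed offset
consume(log, committed, seen.append)
print(seen)  # ['doc-1', 'doc-2', 'doc-2', 'doc-3']
```

Committing before processing would instead risk skipping a document after a crash; which trade-off is acceptable depends on whether the per-document steps can tolerate duplicates.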

Re: Is kafka suitable for our architecture?

2014-10-10 Thread cac...@gmail.com
Albert, you certainly can use Kafka (and it will probably work quite well); you'll just need to make sure your consumers are written to match the available options. I may not have a good picture of what you need to do. Is it that you have a stream of documents coming in and then each documen
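If documents are published with their ID as the message key, every message for a given document lands on the same partition, and processing parallelism within one consumer group is capped by the partition count. The sketch assumes a hypothetical topic with six partitions and uses CRC32 purely for illustration; Kafka's own producers use a different hash, so this mapping will not match a real cluster.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 6  # hypothetical partition count


def partition_for(doc_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a document key to a partition with a stable hash.

    Illustrative only: CRC32 stands in for whatever hash the real
    producer uses. The same key always yields the same partition.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions


# The same document always routes to the same partition.
print(partition_for("doc-42") == partition_for("doc-42"))  # True

# Spread of 6000 hypothetical document IDs across the partitions.
counts = Counter(partition_for(f"doc-{i}") for i in range(6000))
print(sorted(counts.items()))
```

Keeping a document's messages on one partition preserves their order for a single consumer, at the cost of tying throughput to how many partitions the topic has.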

Re: Is kafka suitable for our architecture?

2014-10-10 Thread Albert Vila
Some comments below.

On 10 October 2014 11:19, cac...@gmail.com wrote:
> Albert, you certainly can use Kafka (and it will probably work quite well)
> you'll just need to make sure your consumers are written to match the
> available options. I think I may not have a good picture of what you need

Re: Is kafka suitable for our architecture?

2014-10-10 Thread Christian Csar
On 10/10/2014 05:12 AM, Albert Vila wrote:
> Some comments below.
>
> On 10 October 2014 11:19, cac...@gmail.com wrote:
>
>> Albert, you certainly can use Kafka (and it will probably work quite well)
>> you'll just need to make sure your consumers are written to match the
>> available options. I