There is obviously no one-size-fits-all answer; it depends on many factors: how much data you will be ingesting, what the data source is (a firehose, a web front end, or an app that batches messages), how much processing you will do in the Storm/Kafka layer, and the rate at which you will persist data to your sink. All of these factors determine your topology. Storm and Spark are memory intensive, but if you are streaming, as would be the case with Kafka, that should not be much of an issue.
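To make the "it depends on your ingest rate" point concrete, here is a back-of-envelope sizing sketch. All figures (per-broker throughput, replication factor) are hypothetical assumptions for illustration, not numbers from this thread; real sizing needs benchmarks on your own hardware.

```python
def kafka_broker_estimate(ingest_mb_per_sec, replication_factor=2,
                          per_broker_write_mb_per_sec=50):
    """Rough minimum broker count: total replicated write load divided
    by the sustained write throughput one broker can absorb.
    per_broker_write_mb_per_sec is an assumed figure; measure your own."""
    total_write = ingest_mb_per_sec * replication_factor
    # Ceiling division: round up so the last partial broker still counts.
    return int(-(-total_write // per_broker_write_mb_per_sec))

# Example: a 20 MB/s firehose, replicated twice, is 40 MB/s of writes,
# which one assumed 50 MB/s broker could in principle absorb.
print(kafka_broker_estimate(20))   # -> 1
print(kafka_broker_estimate(100))  # -> 4
```

The same arithmetic applies to the sink side: the slowest stage (often the persistence layer) sets the machine count, which is why no fixed number like 9 can be quoted up front.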
On Sunday, March 8, 2015 11:26 PM, "Adaryl "Bob" Wakefield, MBA" <adaryl.wakefi...@hotmail.com> wrote:

> Let's say you put together a real time streaming solution using Storm, Kafka, and the necessary Zookeeper and whatever storage tech you decide. Is it true that these applications are so resource intensive that they all need to live by themselves on their own machine? Put another way, for the ingestion portion, is the minimum number of machines required here 9?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics, LLC
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData