Sorry, I've found one error: if you do NOT need any relational processing of your messages (based on historical data, or joining with other messages) and message processing is quite independent, Kafka plus Spark Streaming could be overkill.
On Tue, Apr 19, 2016 at 1:54 PM, Arkadiusz Bicz <arkadiusz.b...@gmail.com> wrote:
> The requirements look like my previous project for smart metering. We
> finally built a custom solution without Spark, Hadoop and Kafka, but that
> was 4 years ago, when I did not have experience with these technologies
> (some did not exist or were not mature).
>
> If you do need any relational processing of your messages (based on
> historical data, or joining with other messages) and message processing
> is quite independent, Kafka plus Spark Streaming could be overkill.
>
> The best approach is to check whether your data has a natural index, like
> the timestamp in metering data that arrives at a fixed frequency (every
> second), and to key your cache and disk access on it. As a cache, Alluxio
> looks most promising to me.
>
> BR,
> Arkadiusz Bicz
>
> On Tue, Apr 19, 2016 at 6:01 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>> Hi all,
>> I am looking for an architecture to ingest 10 million messages in
>> micro-batches of seconds.
>> If anyone has worked on a similar kind of architecture, can you please
>> point me to any documentation around it, such as what the architecture
>> should be and which components/big data ecosystem tools I should
>> consider?
>> The messages have to be in XML/JSON format, then go through a
>> preprocessor engine or message enhancer, and finally a processor.
>> I also thought about using a data cache for serving the data.
>> The data cache should be able to serve historical data (maybe up to 30
>> days of data) in milliseconds.
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
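The "natural index" suggestion above — using a per-second timestamp to group incoming messages into micro-batches and to key cache/disk lookups — can be sketched in plain Python. This is only an illustration of the idea, not the poster's actual system; the message shape (a dict with a `ts` epoch-seconds field) and the function name are assumptions.

```python
from collections import defaultdict


def bucket_by_second(messages):
    """Group messages into micro-batches keyed by their integer-second
    timestamp, i.e. the 'natural index' suggested in the thread.

    Each returned key can then be used to address a cache entry or a
    file on disk for that one-second batch.
    """
    batches = defaultdict(list)
    for msg in messages:
        # msg is assumed to carry an epoch-seconds timestamp in 'ts'
        batches[int(msg["ts"])].append(msg)
    return dict(batches)


# Example: three messages spanning two distinct seconds
msgs = [
    {"ts": 100.1, "value": 1},
    {"ts": 100.9, "value": 2},
    {"ts": 101.2, "value": 3},
]
batches = bucket_by_second(msgs)
```

With a fixed arrival frequency, each bucket stays a predictable size, which is what makes keyed cache and disk access practical without a full Kafka + Spark Streaming stack.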