The requirements look like my previous smart metering project. In the
end we built a custom solution without Spark, Hadoop and Kafka, but that
was 4 years ago, when I had no experience with these technologies
(some did not exist yet or were not mature).

If you do not need any relational processing of your messages (based on
historical data, or joining with other messages) and message
processing is largely independent, Kafka plus Spark Streaming could be
overkill.

It is best to check whether your data has a natural index, like the
timestamp in metering data that arrives at a fixed frequency (every
second), and use that index to access your cache and disk directly. For
the cache, Alluxio looks most promising to me.
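
To illustrate the natural-index idea, here is a minimal sketch in Python.
It is not what we built on that project; the class name TimeIndexedStore,
its parameters, and the dict standing in for disk storage are all
assumptions made up for the example. The point is only that with
fixed-frequency data the timestamp maps to a slot number by plain
arithmetic, so neither the cache nor the disk needs a search structure.

```python
from typing import Any, Dict, List, Optional, Tuple


class TimeIndexedStore:
    """Hypothetical store for fixed-interval readings addressed by timestamp."""

    def __init__(self, start_ts: int, interval_s: int = 1, cache_slots: int = 3600):
        self.start_ts = start_ts          # timestamp of slot 0 (epoch seconds)
        self.interval_s = interval_s      # readings arrive every interval_s seconds
        self.cache_slots = cache_slots
        # Ring buffer holding (slot, value) pairs for the most recent readings.
        self.cache: List[Optional[Tuple[int, Any]]] = [None] * cache_slots
        # Stand-in for disk. With fixed-size records, a real file offset
        # could be computed as slot * record_size.
        self.disk: Dict[int, Any] = {}

    def slot(self, ts: int) -> int:
        """Map a timestamp to its slot number using arithmetic only."""
        if ts < self.start_ts:
            raise ValueError("timestamp is before the start of the series")
        return (ts - self.start_ts) // self.interval_s

    def put(self, ts: int, value: Any) -> None:
        s = self.slot(ts)
        self.disk[s] = value
        idx = s % self.cache_slots
        entry = self.cache[idx]
        # Do not let a late, older reading evict a newer one from the cache.
        if entry is None or entry[0] <= s:
            self.cache[idx] = (s, value)

    def get(self, ts: int) -> Optional[Any]:
        s = self.slot(ts)
        entry = self.cache[s % self.cache_slots]
        if entry is not None and entry[0] == s:
            return entry[1]               # served from the cache
        return self.disk.get(s)           # fall back to "disk"


if __name__ == "__main__":
    store = TimeIndexedStore(start_ts=1000, interval_s=1, cache_slots=4)
    for i in range(10):
        store.put(1000 + i, i * 1.5)
    print(store.get(1009))  # recent reading, found in the cache
    print(store.get(1000))  # old reading, evicted from the cache, read from disk
```

The ring buffer keeps only the newest cache_slots readings in memory;
anything older falls through to the disk lookup, which uses the same
slot number.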

BR,
Arkadiusz Bicz

On Tue, Apr 19, 2016 at 6:01 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
> Hi all,
> I am looking for an architecture to ingest 10 million messages in
> micro-batches of seconds.
> If anyone has worked on a similar kind of architecture, can you please point
> me to any documentation on it, such as what the architecture should be and
> which components/big data ecosystem tools I should consider?
> The messages have to be in XML/JSON format, passing through a preprocessor
> engine or message enhancer and then finally a processor.
> I thought about using a data cache as well for serving the data.
> The data cache should be able to serve historical data in
> milliseconds (maybe up to 30 days of data).
> --
> Thanks
> Deepak
> www.bigdatabig.com
>
