Hello Deepak,

It is not clear what you want to do. Are you talking about Spark Streaming?
It is possible to process historical data in Spark batch mode too. You
can add a timestamp field to your XML/JSON messages. Spark documentation is at
spark.apache.org. Spark has good built-in features for processing JSON and
XML[1] messages.
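For example, here is a minimal sketch (plain Python with the stdlib json
module, not Spark itself; the field name `ingest_ts` is just an illustration)
of tagging each incoming JSON message with a timestamp so it can later be
filtered or replayed in batch mode:

```python
import json
import time

def add_timestamp(message: str) -> str:
    # Parse the incoming JSON message and attach an ingestion timestamp
    # (epoch milliseconds) so historical data can be sliced by time later.
    record = json.loads(message)
    record["ingest_ts"] = int(time.time() * 1000)
    return json.dumps(record)

stamped = add_timestamp('{"event": "click", "user": "u42"}')
print(stamped)
```

The same idea applies to XML: add a timestamp attribute or element at
ingestion time, then batch jobs can select any time window they need.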

Thanks,
Prashant Sharma

1. https://github.com/databricks/spark-xml

On Tue, Apr 19, 2016 at 10:31 AM, Deepak Sharma <deepakmc...@gmail.com>
wrote:

> Hi all,
> I am looking for an architecture to ingest tens of millions of messages in
> micro-batches of seconds.
> If anyone has worked on a similar kind of architecture, can you please
> point me to any documentation around it, e.g. what the architecture should
> be, which components/big-data ecosystem tools I should consider, etc.
> The messages have to be in XML/JSON format, passed through a preprocessor
> engine or message enhancer, and then finally a processor.
> I also thought about using a data cache for serving the data.
> The data cache should be able to serve historical data in milliseconds
> (maybe up to 30 days of data).
> --
> Thanks
> Deepak
> www.bigdatabig.com
>
>
