Hello Deepak,

It is not clear what you want to do. Are you talking about Spark Streaming? It is possible to process historical data in Spark batch mode as well. You can add a timestamp field to each XML/JSON message. The Spark documentation is at spark.apache.org. Spark has good built-in support for processing JSON and XML [1] messages.
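As a minimal sketch of the timestamp idea (plain Python for illustration; the "ingest_ts" field name is a hypothetical choice, not a Spark convention):

```python
import json
from datetime import datetime, timezone

def add_timestamp(raw_message: str) -> str:
    """Parse a raw JSON message and tag it with an ingest timestamp.

    "ingest_ts" is a hypothetical field name; any field that
    downstream batch jobs can filter on (e.g. to serve the last
    30 days of historical data) would work.
    """
    record = json.loads(raw_message)
    record["ingest_ts"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(record)

tagged = add_timestamp('{"event": "click", "user": 42}')
```

A preprocessor stage could apply this to every incoming message before handing it to Spark, so both the streaming path and later batch reprocessing can filter on the same field.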
Thanks,
Prashant Sharma

1. https://github.com/databricks/spark-xml

On Tue, Apr 19, 2016 at 10:31 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
> Hi all,
> I am looking for an architecture to ingest 10 million messages in
> micro batches of seconds.
> If anyone has worked on a similar kind of architecture, can you please
> point me to any documentation around the same, like what the
> architecture should be, which components/big data ecosystem tools
> should I consider, etc.
> The messages have to be in XML/JSON format, then a preprocessor engine
> or message enhancer, and finally a processor.
> I thought about using a data cache as well for serving the data.
> The data cache should have the capability to serve the historical data
> in milliseconds (maybe up to 30 days of data).
> --
> Thanks
> Deepak
> www.bigdatabig.com