Hi Mahender,

Did you look at this? https://www.snappydata.io/blog/the-spark-database
But I believe that most people handle this use case by using either:
- Their favorite regular RDBMS (MySQL, Postgres, Oracle, SQL Server, ...) if the data is not too big
- Their favorite NoSQL store (Cassandra, HBase) if the data is too big and needs to be distributed

Spark generally makes it easy enough to query these other databases to let you perform analytics on them. Hive and Spark have been designed as OLAP tools, not OLTP. I'm not sure which features you are seeking for your SCD, but they probably won't be part of Spark's core design.

Hope this helps,

Furcy

On 4 April 2018 at 11:29, Mahender Sarangam <[email protected]> wrote:
> Hi,
> Does anyone have a good architecture document / set of design principles
> for building a warehouse application using Spark?
>
> Is it better to create a Hive context and perform the transformations in
> HQL, or to load the files directly into a DataFrame and perform the data
> transformations there?
>
> We need to implement SCD Type 2 in Spark. Is there any good
> document/reference for building a Type 2 warehouse object?
>
> Thanks in advance
>
> /Mahender
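For what it's worth, the SCD Type 2 merge Mahender asks about is usually expressed in Spark as a join between the current dimension table and the incoming batch, expiring changed rows and appending new versions. Below is a minimal sketch of that logic in plain Python over lists of dicts (in Spark it would be a DataFrame join + union); the record layout (`key`, `value`, `valid_from`, `valid_to`, `is_current`) and the function name are illustrative conventions, not anything prescribed by Spark or Hive.

```python
def scd2_merge(current, incoming, load_date):
    """Hedged sketch of an SCD Type 2 merge.

    current   -- existing dimension rows: dicts with key, value,
                 valid_from, valid_to, is_current
    incoming  -- new snapshot rows: dicts with key, value
    load_date -- date string used to close/open version intervals
    """
    incoming_by_key = {r["key"]: r for r in incoming}
    result = []
    seen = set()
    for row in current:
        new = incoming_by_key.get(row["key"])
        if row["is_current"] and new is not None and new["value"] != row["value"]:
            # Attribute changed: close out the old version...
            result.append({**row, "valid_to": load_date, "is_current": False})
            # ...and open a new current version.
            result.append({"key": row["key"], "value": new["value"],
                           "valid_from": load_date, "valid_to": None,
                           "is_current": True})
        else:
            # Unchanged or historical row: carry it through as-is.
            result.append(row)
        seen.add(row["key"])
    # Keys never seen before get an initial open-ended version.
    for key, new in incoming_by_key.items():
        if key not in seen:
            result.append({"key": key, "value": new["value"],
                           "valid_from": load_date, "valid_to": None,
                           "is_current": True})
    return result
```

In Spark the same three branches map onto an outer join on the business key, a filter for changed current rows, and a union of the expired, reopened, and brand-new rows before overwriting the dimension table.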
