Hi Mich,

My stack is as follows:

Data sources:
* IBM MQ
* Oracle database

Pipeline:
* Kafka stores all messages from the data sources.
* Spark Streaming fetches the messages from Kafka, applies a light transformation, and writes parquet files to HDFS.
* Hive / SparkSQL / Impala query the parquet files.
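For concreteness, here is a minimal sketch of the Kafka -> Spark Streaming -> parquet leg. It assumes Spark 2.0 with the spark-streaming-kafka-0-10 direct stream; the broker address, topic name, group id and HDFS path are all made up, and the "transform" step is only a placeholder:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaToParquet {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToParquet")
        val ssc  = new StreamingContext(conf, Seconds(30))

        // Kafka consumer settings; broker, group id and topic are assumptions.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "spark-ingest",
          "auto.offset.reset"  -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("mq-events"), kafkaParams))

        stream.foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
            import spark.implicits._
            // Placeholder for the real transformation: here we just keep the raw value.
            val df = rdd.map(_.value).toDF("payload")
            df.write.mode(SaveMode.Append).parquet("hdfs:///data/landing/mq_events")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

One thing to watch with this pattern is that every micro-batch adds new part files, so some periodic compaction is usually needed before Hive/Impala query a day's worth of small files.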
Do you have any reference architecture in which HBase is a part? Please share any best practices you know of, or your favourite designs.

Thanks,
Kevin.

On Mon, Aug 29, 2016 at 5:18 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> Can you explain your particular stack?
>
> For example, what is the source of the streaming data, and what role does Spark play?
>
> Are you dealing with real time and batch, and why parquet rather than something like HBase to ingest the data in real time?
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On 28 August 2016 at 15:43, Kevin Tran <kevin...@gmail.com> wrote:
>
>> Hi,
>> Does anyone know the best practices for storing data in parquet files?
>> Does a parquet file have a size limit (e.g. 1 TB)?
>> Should we use SaveMode.Append for a long-running streaming app?
>> How should we store the files in HDFS (directory structure, ...)?
>>
>> Thanks,
>> Kevin.
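For what it's worth, re the quoted questions: as far as I know the parquet format itself imposes no hard 1 TB cap; in practice files are kept near the HDFS block size and the data is spread across partitioned directories. A minimal sketch of an append-mode, date-partitioned layout (the paths and the ingest_date column are illustrative assumptions, not anything from this thread):

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.functions.current_date

    val spark = SparkSession.builder.appName("parquet-layout").getOrCreate()

    // Source DataFrame; in the streaming job this would be each micro-batch.
    val df = spark.read.parquet("hdfs:///data/landing/mq_events")

    df.withColumn("ingest_date", current_date())  // partition key derived at write time
      .write
      .mode(SaveMode.Append)                      // only adds new part files; safe for long-running jobs
      .partitionBy("ingest_date")                 // one subdirectory per day under the table root
      .parquet("hdfs:///data/warehouse/mq_events")

    // Resulting HDFS layout (illustrative):
    //   /data/warehouse/mq_events/ingest_date=2016-08-29/part-...parquet
    //   /data/warehouse/mq_events/ingest_date=2016-08-30/part-...parquet

Hive and Impala external tables can then be pointed at the table root and pick up the date partitions.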