Hi Mich,

My stack is as follows:

Data sources:
* IBM MQ
* Oracle database

Pipeline:
* Kafka stores all messages from the data sources.
* Spark Streaming fetches the messages from Kafka, applies a light transformation, and writes parquet files to HDFS.
* Hive / SparkSQL / Impala query the parquet files.
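For concreteness, here is a minimal sketch of the Kafka -> Spark Streaming -> parquet leg. It assumes Spark 2.0 with the spark-streaming-kafka-0-10 direct stream; the broker address, topic name, group id and HDFS path are all made up, and the "transform" step is only a placeholder:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaToParquet {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToParquet")
        val ssc  = new StreamingContext(conf, Seconds(30))

        // Kafka consumer settings; broker, group id and topic are assumptions.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "spark-ingest",
          "auto.offset.reset"  -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("mq-events"), kafkaParams))

        stream.foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
            import spark.implicits._
            // Placeholder for the real transformation: here we just keep the raw value.
            val df = rdd.map(_.value).toDF("payload")
            df.write.mode(SaveMode.Append).parquet("hdfs:///data/landing/mq_events")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

One thing to watch with this pattern is that every micro-batch adds new part files, so some periodic compaction is usually needed before Hive/Impala query a day's worth of small files.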
Do you have any reference architecture in which HBase is a part? Please share any best practices you know of, or your favourite designs.

Thanks,
Kevin.

On Mon, Aug 29, 2016 at 5:18 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> Can you explain your particular stack?
>
> For example, what is the source of the streaming data, and what role does Spark play?
>
> Are you dealing with real time and batch, and why parquet rather than something like HBase to ingest the data in real time?
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On 28 August 2016 at 15:43, Kevin Tran <kevin...@gmail.com> wrote:
>
>> Hi,
>> Does anyone know the best practices for storing data in parquet files?
>> Does a parquet file have a size limit (e.g. 1 TB)?
>> Should we use SaveMode.Append for a long-running streaming app?
>> How should we store the files in HDFS (directory structure, ...)?
>>
>> Thanks,
>> Kevin.
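For what it's worth, re the quoted questions: as far as I know the parquet format itself imposes no hard 1 TB cap; in practice files are kept near the HDFS block size and the data is spread across partitioned directories. A minimal sketch of an append-mode, date-partitioned layout (the paths and the ingest_date column are illustrative assumptions, not anything from this thread):

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.functions.current_date

    val spark = SparkSession.builder.appName("parquet-layout").getOrCreate()

    // Source DataFrame; in the streaming job this would be each micro-batch.
    val df = spark.read.parquet("hdfs:///data/landing/mq_events")

    df.withColumn("ingest_date", current_date())  // partition key derived at write time
      .write
      .mode(SaveMode.Append)                      // only adds new part files; safe for long-running jobs
      .partitionBy("ingest_date")                 // one subdirectory per day under the table root
      .parquet("hdfs:///data/warehouse/mq_events")

    // Resulting HDFS layout (illustrative):
    //   /data/warehouse/mq_events/ingest_date=2016-08-29/part-...parquet
    //   /data/warehouse/mq_events/ingest_date=2016-08-30/part-...parquet

Hive and Impala external tables can then be pointed at the table root and pick up the date partitions.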