Hi all, Currently we read multiple topics from Apache Kafka and store the data in HBase (multiple tables) using Twitter Storm — one tuple is written to four different tables. We are facing performance issues with HBase, so we are replacing HBase with Parquet files and Storm with Apache Spark.
Difficulties:
1. How do we read multiple topics from Kafka using Spark?
2. One tuple belongs to multiple tables; how do we write one topic out to multiple Parquet files, with proper partitioning, using Spark?

Please help me. Thanks in advance.

--
Regards,
Mahebub
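For question 1: Spark's Kafka source (Structured Streaming) accepts a comma-separated list of topics in its `subscribe` option, so a single stream can cover all topics. A minimal PySpark sketch, assuming placeholder broker addresses and topic names, and that the `spark-sql-kafka` package is on the classpath (the Spark import is kept inside the function so the helper is usable on its own):

```python
def kafka_subscribe_option(topics):
    """Build the comma-separated topic string the Kafka source expects."""
    return ",".join(topics)


def build_multi_topic_stream(spark, brokers, topics):
    """Return a streaming DataFrame reading from all given Kafka topics.

    `spark` is an existing SparkSession; `brokers` is e.g.
    "host1:9092,host2:9092" (placeholder values).
    """
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", brokers)
        # One subscription covering every topic, e.g. "topic1,topic2,topic3"
        .option("subscribe", kafka_subscribe_option(topics))
        .load()
    )


# Example of the option value that would be passed to Kafka:
# kafka_subscribe_option(["orders", "clicks"]) -> "orders,clicks"
```

Alternatively, `subscribePattern` takes a regex if the topic names follow a naming convention.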
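For question 2: one common approach is a `foreachBatch` sink that projects each micro-batch into the columns each logical "table" needs and writes each projection to its own partitioned Parquet directory via `DataFrameWriter.partitionBy`. A sketch under assumed column names (`key`, `value_a`, `value_b`, `event_date` are hypothetical; substitute your own schema):

```python
# Hypothetical mapping: output dataset name -> columns it keeps.
TABLE_SPECS = {
    "table_a": ["key", "value_a", "event_date"],
    "table_b": ["key", "value_b", "event_date"],
}


def dataset_path(base_path, table):
    """Output directory for one logical table's Parquet dataset."""
    return f"{base_path.rstrip('/')}/{table}"


def write_batch_to_tables(batch_df, batch_id, base_path, specs=TABLE_SPECS):
    """foreachBatch handler: fan one micro-batch out to several
    Parquet datasets, each partitioned on event_date."""
    for table, cols in specs.items():
        (
            batch_df.select(*cols)
            .write
            .mode("append")
            .partitionBy("event_date")  # creates event_date=.../ subdirs
            .parquet(dataset_path(base_path, table))
        )


# Wiring it up (not executed here; `stream_df` would come from the
# Kafka source, `base` is a placeholder output root):
#
# query = (stream_df.writeStream
#          .foreachBatch(lambda df, bid:
#                        write_batch_to_tables(df, bid, base))
#          .option("checkpointLocation", f"{base}/_checkpoints")
#          .start())
```

`foreachBatch` keeps all the table writes in one streaming query; running one `writeStream` per table with a `filter`/`select` also works but re-reads Kafka once per query.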