You can use Spark SQL for that very easily. Convert the RDDs you get from the Kafka input stream into RDDs of case classes and save them as Parquet files. More information here: https://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
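A minimal sketch of that approach, assuming the Spark 1.0-era SQL API (`SQLContext.createSchemaRDD` and `SchemaRDD.saveAsParquetFile`); the `Event` case class, the `parseEvent` helper, and the HDFS output path are hypothetical placeholders for your actual JSON schema, and the Kafka wiring itself is elided:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical case class matching the fields of your JSON tuples
case class Event(id: String, payload: String)

object KafkaToParquet {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("KafkaToParquet")
    val ssc = new StreamingContext(conf, Seconds(60))
    val sqlContext = new SQLContext(ssc.sparkContext)
    // Implicit conversion: RDD[Event] -> SchemaRDD, which has saveAsParquetFile
    import sqlContext.createSchemaRDD

    // `messages` stands for the DStream[String] of JSON strings you already
    // consume from Kafka (e.g. via the kafka-spark-consumer linked below);
    // plugging it in is omitted here.
    val messages: DStream[String] = ???

    messages.foreachRDD { rdd =>
      val events = rdd.flatMap(parseEvent)   // RDD[Event]
      if (events.take(1).nonEmpty) {
        // Write one Parquet directory per micro-batch
        events.saveAsParquetFile(
          "hdfs:///data/events/batch-" + System.currentTimeMillis)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // Placeholder JSON parser -- replace with the JSON library of your choice
  def parseEvent(json: String): Option[Event] =
    scala.util.parsing.json.JSON.parseFull(json) match {
      case Some(m: Map[String, Any] @unchecked) =>
        Some(Event(m("id").toString, m("payload").toString))
      case _ => None
    }
}
```

Writing a separate directory per batch avoids overwriting earlier output; a downstream job (or Spark SQL itself) can then read the whole `hdfs:///data/events/` tree.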
On Wed, Aug 6, 2014 at 5:23 AM, Mahebub Sayyed <mahebub...@gmail.com> wrote:
> Hello,
>
> I have referred to the link "https://github.com/dibbhatt/kafka-spark-consumer"
> and have successfully consumed tuples from Kafka.
> The tuples are JSON objects, and I want to store those objects in HDFS in
> Parquet format.
>
> Please suggest a sample example for that.
> Thanks in advance.
>
>
> On Tue, Aug 5, 2014 at 11:55 AM, Dibyendu Bhattacharya <
> dibyendu.bhattach...@gmail.com> wrote:
>
>> You can try this Kafka Spark Consumer, which I recently wrote. It uses
>> the low-level Kafka consumer:
>>
>> https://github.com/dibbhatt/kafka-spark-consumer
>>
>> Dibyendu
>>
>>
>> On Tue, Aug 5, 2014 at 12:52 PM, rafeeq s <rafeeq.ec...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am new to Apache Spark and am trying to develop a Spark Streaming
>>> program to *stream data from Kafka topics and output it as Parquet files
>>> on HDFS*.
>>>
>>> Please share a *sample reference* program that streams data from Kafka
>>> topics and outputs it as Parquet files on HDFS.
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>>
>>> Rafeeq S
>>> *("What you do is what matters, not what you think or say or plan.")*
>>>
>
>
> --
> *Regards,*
> *Mahebub Sayyed*