You can use Spark SQL for that very easily. Convert the RDDs you get from the Kafka input stream into RDDs of case classes and save them as Parquet files. More information here: https://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
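A minimal sketch of that approach, assuming the Spark 1.0-era SQL API (`SQLContext.createSchemaRDD` and `SchemaRDD.saveAsParquetFile`); the `Event` case class, the `parseEvent` helper, and the HDFS output path are hypothetical placeholders for your actual JSON schema, and the Kafka wiring itself is elided:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical case class matching the fields of your JSON tuples
case class Event(id: String, payload: String)

object KafkaToParquet {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("KafkaToParquet")
    val ssc = new StreamingContext(conf, Seconds(60))
    val sqlContext = new SQLContext(ssc.sparkContext)
    // Implicit conversion: RDD[Event] -> SchemaRDD, which has saveAsParquetFile
    import sqlContext.createSchemaRDD

    // `messages` stands for the DStream[String] of JSON strings you already
    // consume from Kafka (e.g. via the kafka-spark-consumer linked below);
    // plugging it in is omitted here.
    val messages: DStream[String] = ???

    messages.foreachRDD { rdd =>
      val events = rdd.flatMap(parseEvent)   // RDD[Event]
      if (events.take(1).nonEmpty) {
        // Write one Parquet directory per micro-batch
        events.saveAsParquetFile(
          "hdfs:///data/events/batch-" + System.currentTimeMillis)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // Placeholder JSON parser -- replace with the JSON library of your choice
  def parseEvent(json: String): Option[Event] =
    scala.util.parsing.json.JSON.parseFull(json) match {
      case Some(m: Map[String, Any] @unchecked) =>
        Some(Event(m("id").toString, m("payload").toString))
      case _ => None
    }
}
```

Writing a separate directory per batch avoids overwriting earlier output; a downstream job (or Spark SQL itself) can then read the whole `hdfs:///data/events/` tree.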
On Wed, Aug 6, 2014 at 5:23 AM, Mahebub Sayyed <mahebub...@gmail.com> wrote:
> Hello,
>
> I have referred to the link "https://github.com/dibbhatt/kafka-spark-consumer"
> and have successfully consumed tuples from Kafka.
> The tuples are JSON objects, and I want to store those objects in HDFS in
> Parquet format.
>
> Please suggest a sample example for that.
> Thanks in advance.
>
>
> On Tue, Aug 5, 2014 at 11:55 AM, Dibyendu Bhattacharya <
> dibyendu.bhattach...@gmail.com> wrote:
>
>> You can try this Kafka Spark Consumer, which I recently wrote. It uses
>> the low-level Kafka consumer:
>>
>> https://github.com/dibbhatt/kafka-spark-consumer
>>
>> Dibyendu
>>
>>
>> On Tue, Aug 5, 2014 at 12:52 PM, rafeeq s <rafeeq.ec...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am new to Apache Spark and am trying to develop a Spark Streaming
>>> program to *stream data from Kafka topics and output it as Parquet files
>>> on HDFS*.
>>>
>>> Please share a *sample reference* program that streams data from Kafka
>>> topics and outputs it as Parquet files on HDFS.
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>>
>>> Rafeeq S
>>> *("What you do is what matters, not what you think or say or plan.")*
>>>
>
>
> --
> *Regards,*
> *Mahebub Sayyed*