Mahesh,

- One direction could be: create a Parquet schema, then convert and save the records to HDFS.
- This example might help:
https://github.com/massie/spark-parquet-example/blob/master/src/main/scala/com/zenfractal/SparkParquetExample.scala
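Roughly, the foreachRDD body could look like the untested sketch below. Event, parseEvent, and the output path are placeholders you'd replace with your own types and locations; it assumes Spark SQL's createSchemaRDD implicit and SchemaRDD.saveAsParquetFile from 1.0:

import org.apache.spark.rdd.RDD

// Hypothetical event type -- swap in whatever fields your Kafka payload carries.
case class Event(id: String, timestamp: Long)

// Placeholder: decode your own wire format here.
def parseEvent(bytes: Array[Byte]): Event = ???

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD // implicit RDD[Product] -> SchemaRDD

ds.foreachRDD((rdd: RDD[Array[Byte]]) => {
  // Deserialize the raw bytes into case-class records, then write the
  // batch out as Parquet, one directory per batch so nothing is overwritten.
  val events: RDD[Event] = rdd.map(parseEvent)
  events.saveAsParquetFile("hdfs:///events/batch-" + System.currentTimeMillis)
})

Writing each micro-batch to its own directory keeps the writes append-only; if lots of small files become a problem you can compact them in a later job.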
Cheers
<k/>

On Tue, Jun 17, 2014 at 12:52 PM, maheshtwc <
mahesh.padmanab...@twc-contractor.com> wrote:

> Hello,
>
> Is there an easy way to convert RDDs within a DStream into Parquet records?
> Here is some incomplete pseudo code:
>
> // Create streaming context
> val ssc = new StreamingContext(...)
>
> // Obtain a DStream of events
> val ds = KafkaUtils.createStream(...)
>
> // Get the Spark context to get to the SQL context
> val sc = ds.context.sparkContext
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
> // For each RDD
> ds.foreachRDD((rdd: RDD[Array[Byte]]) => {
>   // What do I do next?
> })
>
> Thanks,
> Mahesh
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-RDDs-to-Parquet-records-tp7762.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.