If you convert the data to a SchemaRDD, you can save it as Parquet: http://spark.apache.org/docs/latest/sql-programming-guide.html#using-parquet
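Here is a minimal sketch of that approach, filling in the `foreachRDD` body from the pseudocode quoted below. It targets the Spark 1.0-era API the guide describes: `import sqlContext.createSchemaRDD` provides the implicit RDD-to-SchemaRDD conversion, and `SchemaRDD.saveAsParquetFile` does the write. Later releases replaced SchemaRDD with DataFrame. `Event`, `parseEvent`, `StreamToParquet` and the `"id,value"` message format are hypothetical placeholders, because the thread does not say what the events look like.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Hypothetical record type: replace these fields with the schema of your events.
// It is a top-level case class so Spark SQL can infer the Parquet schema by reflection.
case class Event(id: String, value: Long)

object StreamToParquet {

  // Hypothetical decoder: assumes each message is UTF-8 text of the form "id,value".
  def parseEvent(bytes: Array[Byte]): Event = {
    val Array(id, value) = new String(bytes, "UTF-8").split(",", 2)
    Event(id.trim, value.trim.toLong)
  }

  // Writes every non-empty micro-batch of `ds` to its own Parquet directory under `outputDir`.
  def saveAsParquet(ds: DStream[Array[Byte]], sqlContext: SQLContext, outputDir: String): Unit = {
    // Brings the implicit RDD[Event] => SchemaRDD conversion into scope (Spark 1.0 API).
    import sqlContext.createSchemaRDD

    ds.foreachRDD { (rdd: RDD[Array[Byte]], time: Time) =>
      // Skip empty batches so no empty output directories are written.
      if (rdd.take(1).nonEmpty) {
        val events: RDD[Event] = rdd.map(parseEvent)
        // The batch time makes each output path distinct, so batches do not collide.
        events.saveAsParquetFile(s"$outputDir/batch-${time.milliseconds}")
      }
    }
  }
}
```

Assuming `ds` is a `DStream[Array[Byte]]` as in the pseudocode, you would call `StreamToParquet.saveAsParquet(ds, sqlContext, outputDir)` before `ssc.start()`. Writing one directory per batch keeps batches from overwriting each other, at the cost of producing many small Parquet directories.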
On Tue, Jun 17, 2014 at 11:47 PM, Padmanabhan, Mahesh (contractor) <
mahesh.padmanab...@twc-contractor.com> wrote:

> Thanks Krishna. Seems like you have to use Avro and then convert that to
> Parquet. I was hoping to directly convert RDDs to Parquet files. I’ll look
> into this some more.
>
> Thanks,
> Mahesh
>
> From: Krishna Sankar <ksanka...@gmail.com>
> Reply-To: "user@spark.apache.org" <user@spark.apache.org>
> Date: Tuesday, June 17, 2014 at 2:41 PM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: Spark streaming RDDs to Parquet records
>
> Mahesh,
>
>    - One direction could be : create a parquet schema, convert & save the
>    records to hdfs.
>    - This might help
>
>    https://github.com/massie/spark-parquet-example/blob/master/src/main/scala/com/zenfractal/SparkParquetExample.scala
>
> Cheers
> <k/>
>
>
> On Tue, Jun 17, 2014 at 12:52 PM, maheshtwc <
> mahesh.padmanab...@twc-contractor.com> wrote:
>
>> Hello,
>>
>> Is there an easy way to convert RDDs within a DStream into Parquet
>> records?
>> Here is some incomplete pseudo code:
>>
>> // Create streaming context
>> val ssc = new StreamingContext(...)
>>
>> // Obtain a DStream of events
>> val ds = KafkaUtils.createStream(...)
>>
>> // Get Spark context to get to the SQL context
>> val sc = ds.context.sparkContext
>>
>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>
>> // For each RDD
>> ds.foreachRDD((rdd: RDD[Array[Byte]]) => {
>>
>> // What do I do next?
>> })
>>
>> Thanks,
>> Mahesh
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-RDDs-to-Parquet-records-tp7762.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.