Hi all, I figured it out! The DataFrames and SQL example in the Spark Streaming docs was useful.
Best,
Vadim

On Wed, Apr 8, 2015 at 2:38 PM, Vadim Bichutskiy <vadim.bichuts...@gmail.com> wrote:
> Hi all,
>
> I am using Spark Streaming to monitor an S3 bucket for objects that
> contain JSON. I want to import that JSON into a Spark SQL DataFrame.
>
> Here's my current code:
>
> from pyspark import SparkContext, SparkConf
> from pyspark.streaming import StreamingContext
> import json
> from pyspark.sql import SQLContext
>
> conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
> sc = SparkContext(conf=conf)
> ssc = StreamingContext(sc, 30)
> sqlContext = SQLContext(sc)
>
> distFile = ssc.textFileStream("s3n://mybucket/")
> json_data = sqlContext.jsonRDD(distFile)
> json_data.printSchema()
>
> ssc.start()
> ssc.awaitTermination()
>
> I am not creating the DataFrame correctly, as I get an error:
>
> 'TransformedDStream' object has no attribute '_jrdd'
>
> Can someone help me out?
>
> Thanks,
> Vadim
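
For anyone hitting the same error: `jsonRDD()` expects an RDD, but `textFileStream()` returns a DStream, which is why the `'TransformedDStream' object has no attribute '_jrdd'` error appears. The pattern in the Spark Streaming docs' DataFrames and SQL example is to use `foreachRDD()` to get at each batch's RDD. A minimal sketch of that fix against the Spark 1.3-era API used above (the bucket name and app name are carried over from the original post):

```python
# Sketch, assuming the Spark 1.x PySpark API from the original post:
# jsonRDD() takes an RDD, so convert each micro-batch inside foreachRDD().
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 30)  # 30-second batch interval
sqlContext = SQLContext(sc)

distFile = ssc.textFileStream("s3n://mybucket/")

def process(time, rdd):
    # Each batch arrives as an RDD of JSON lines; skip empty batches.
    if not rdd.isEmpty():
        json_data = sqlContext.jsonRDD(rdd)  # now an RDD, so this works
        json_data.printSchema()

distFile.foreachRDD(process)

ssc.start()
ssc.awaitTermination()
```

Note that `jsonRDD()` was later deprecated in favor of `SparkSession.read.json()`, so on newer Spark versions the per-batch conversion would look different, but the `foreachRDD()` structure is the same.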