Hi all, I'm using Spark Streaming to monitor an S3 bucket for objects containing JSON, and I want to load that JSON into a Spark SQL DataFrame.
Here's my current code:

```python
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
import json
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 30)
sqlContext = SQLContext(sc)

distFile = ssc.textFileStream("s3n://mybucket/")
json_data = sqlContext.jsonRDD(distFile)
json_data.printSchema()

ssc.start()
ssc.awaitTermination()
```

I am not creating the DataFrame correctly, as I get this error:

```
'TransformedDStream' object has no attribute '_jrdd'
```

Can someone help me out?

Thanks,
Vadim
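From what I can tell, the problem may be that `sqlContext.jsonRDD()` expects an RDD, while `textFileStream()` returns a DStream; the actual RDDs only exist per micro-batch. Here is a sketch of what I think might work using `foreachRDD` (the `make_handler`, `process`, and `main` names are mine, and I haven't verified this against a live stream):

```python
def make_handler(sqlContext):
    """Return a foreachRDD callback bound to the given SQLContext."""
    def process(time, rdd):
        # Each micro-batch arrives as a plain RDD of JSON strings;
        # skip empty batches so jsonRDD isn't called on nothing.
        if rdd.isEmpty():
            return None
        df = sqlContext.jsonRDD(rdd)  # jsonRDD accepts an RDD, not a DStream
        df.printSchema()
        return df
    return process

def main():
    # Spark setup mirrors the original post: local[4] master, 30s batches.
    from pyspark import SparkContext, SparkConf
    from pyspark.streaming import StreamingContext
    from pyspark.sql import SQLContext

    conf = SparkConf().setAppName('MyApp').setMaster('local[4]')
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 30)
    sqlContext = SQLContext(sc)

    lines = ssc.textFileStream("s3n://mybucket/")
    # Apply the per-batch conversion instead of calling jsonRDD on the DStream.
    lines.foreachRDD(make_handler(sqlContext))
    ssc.start()
    ssc.awaitTermination()

# Call main() to start the streaming job.
```

Is this the right pattern, or is there a more direct way to get a DataFrame out of a DStream?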