Hi Folks, I’d like to find out tips on how to convert the RDDs inside a Spark Streaming DStream to a set of SchemaRDDs.
My DStream contains JSON data pushed over from Kafka, and I’d like to use SparkSQL’s JSON import function (i.e. jsonRDD) to register the JSON dataset as a table, and perform queries on it. Here’s a code snippet of my latest attempt (in Scala): … val sc = new SparkContext(conf) val ssc = new StreamingContext("local", this.getClass.getName, Seconds(1)) ssc.checkpoint("checkpoint") val stream = KafkaUtils.createStream(ssc, "localhost:2181", “group", Map(“topic" -> 10)).map(_._2) val sql = new SQLContext(sc) stream.foreachRDD(rdd => { if (rdd.count > 0) { // message received val sqlRDD = sql.jsonRDD(rdd) sqlRDD.printSchema() } else { println(“No message received") } }) … This compiles and runs when I submit it to Spark (local-mode); however, I never seem to be able to successfully see a schema printed on my console, via the “sqlRDD.printSchema()” method when Kafka is streaming my JSON messages to the “topic” topic name. I know my JSON is valid and my Kafka connection works fine, I’ve been able to print the stream messages in their raw format, just not as SchemaRDDs. Any tips? Suggestions? Thanks much, --- Rishi Verma NASA Jet Propulsion Laboratory California Institute of Technology --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org