Hi,

I am trying to read XML data from a Kafka topic and use XmlReader to convert the RDD[String] into a DataFrame conforming to a predefined schema.
One issue I am running into: after saving the final DataFrame in Avro format, most of the elements' data shows up correctly in the Avro files. However, the nested element of array type is not getting parsed properly and is loaded as null into the DataFrame, so when I save it to Avro or JSON that field is always null. I am not sure why this element is not getting parsed.

Here is the code I am using:

val kafkaValueAsStringDF = kafakDF.selectExpr(
  "CAST(key AS STRING) msgKey",
  "CAST(value AS STRING) xmlString")

var parameters = collection.mutable.Map.empty[String, String]
parameters.put("rowTag", "Book")

kafkaValueAsStringDF.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    val xmlStringDF: DataFrame = batchDF.selectExpr("xmlString")
    xmlStringDF.printSchema()
    val rdd: RDD[String] = xmlStringDF.as[String].rdd
    val relation = XmlRelation(
      () => rdd,
      None,
      parameters.toMap,
      xmlSchema)(spark.sqlContext)
    logger.info(".convert() : XmlRelation Schema ={} " + relation.schema.treeString)
  }
  .start()
  .awaitTermination()

Thanks
Sateesh

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
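For context, a common cause of null fields when parsing XML against a user-supplied schema is a repeated element declared as a plain StructType instead of an ArrayType: if the declared type does not match what spark-xml finds, it fills the field with null rather than failing. A minimal sketch of what the schema might need to look like (the Book/Author element names and field layout below are assumptions for illustration, not taken from the original post):

```scala
import org.apache.spark.sql.types._

// Hypothetical schema for a <Book> row tag whose <Author> child repeats.
// The repeated nested element is declared as ArrayType(StructType(...));
// declaring it as a bare StructType would make spark-xml load it as null.
val xmlSchema = StructType(Seq(
  StructField("Title", StringType, nullable = true),
  StructField("Author", ArrayType(StructType(Seq(
    StructField("FirstName", StringType, nullable = true),
    StructField("LastName", StringType, nullable = true)
  ))), nullable = true)
))
```

One way to check this is to let spark-xml infer the schema from a sample of the data and compare its treeString output against the predefined xmlSchema; a mismatch on the array-typed field usually shows up immediately.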