This is more a question about spark-xml, which is not part of Spark. You can ask at https://github.com/databricks/spark-xml/ but if you do, please show an example of the XML input, the schema, and the output.
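For instance, a minimal repro for a null array field usually pairs a sample document with the schema used to parse it. The sketch below is purely illustrative: the `Book`/`authors` element names and the schema are hypothetical, not taken from the original post.

```scala
// Hypothetical sample: a Book document with a repeated nested element.
// spark-xml reads <authors><author>..</author><author>..</author></authors>
// as a struct "authors" containing an array field "author"; declaring
// "authors" itself directly as an ArrayType instead of a struct wrapping
// the array is one common way to end up with null for that field.
import org.apache.spark.sql.types._

val sampleXml =
  """<Book>
    |  <title>Some Title</title>
    |  <authors>
    |    <author>Jane Doe</author>
    |    <author>John Roe</author>
    |  </authors>
    |</Book>""".stripMargin

val xmlSchema = StructType(Seq(
  StructField("title", StringType, nullable = true),
  StructField("authors",
    StructType(Seq(
      StructField("author", ArrayType(StringType), nullable = true))),
    nullable = true)
))
```

Posting something of this shape, plus the actual (null-containing) output, makes the mismatch easy to spot.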
On Tue, Jun 30, 2020 at 11:39 AM mars76 <sk_ac...@yahoo.com.invalid> wrote:
>
> Hi,
>
> I am trying to read XML data from a Kafka topic, using XmlReader to
> convert the RDD[String] into a DataFrame conforming to a predefined
> schema.
>
> One issue I am running into: after saving the final DataFrame in Avro
> format, most of the elements' data shows up in the Avro files. However,
> the nested element, which is of array type, is not parsed properly and is
> loaded into the DataFrame as null, so when I save to Avro or to JSON that
> field is always null.
>
> I am not sure why this element is not getting parsed.
>
> Here is the code I am using:
>
> val kafkaValueAsStringDF = kafakDF.selectExpr(
>   "CAST(key AS STRING) msgKey",
>   "CAST(value AS STRING) xmlString")
>
> var parameters = collection.mutable.Map.empty[String, String]
> parameters.put("rowTag", "Book")
>
> kafkaValueAsStringDF.writeStream.foreachBatch {
>   (batchDF: DataFrame, batchId: Long) =>
>     val xmlStringDF: DataFrame = batchDF.selectExpr("xmlString")
>     xmlStringDF.printSchema()
>
>     val rdd: RDD[String] = xmlStringDF.as[String].rdd
>
>     val relation = XmlRelation(
>       () => rdd,
>       None,
>       parameters.toMap,
>       xmlSchema)(spark.sqlContext)
>
>     logger.info(".convert() : XmlRelation Schema = " +
>       relation.schema.treeString)
> }
>   .start()
>   .awaitTermination()
>
> Thanks
> Sateesh
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
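A side note on the quoted code: `XmlRelation` is an internal class of spark-xml; the library's public entry points are the `XmlReader` class (which the message mentions) and the `"xml"` data source format. A sketch of the `XmlReader` route follows, hedged because the exact builder methods and signatures have changed across spark-xml versions, so the README for the version actually on the classpath is authoritative.

```scala
// Sketch only: withRowTag / withSchema / xmlRdd follow the spark-xml
// XmlReader API, but signatures vary by version; verify against the
// documentation for the release in use. "xmlSchema", "spark", and "rdd"
// refer to the values defined in the surrounding foreachBatch code.
import com.databricks.spark.xml.XmlReader

val parsedDF = new XmlReader()
  .withRowTag("Book")       // same rowTag the original code passes via options
  .withSchema(xmlSchema)    // predefined schema instead of inference
  .xmlRdd(spark, rdd)       // parse the RDD[String] of XML documents

parsedDF.printSchema()
```

Using the public reader rather than constructing the relation by hand avoids depending on internals that can change without notice between releases.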