This is really a question about spark-xml, which is not part of Spark itself.
You can ask at https://github.com/databricks/spark-xml/ but if you do,
please include an example of the XML input, the schema, and the output you get.
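
For instance, a minimal reproduction could look something like the sketch
below. The element names (Title, Authors, Author, Name) are made up for
illustration, since I don't know what your data looks like; only the "Book"
row tag comes from your message. It declares a repeated nested element as an
ArrayType inside its wrapper struct, which is, as far as I know, how
spark-xml models repeated child elements.

```scala
import org.apache.spark.sql.types._

// Hypothetical input, one Kafka message value per record:
//
// <Book>
//   <Title>Some title</Title>
//   <Authors>
//     <Author><Name>A</Name></Author>
//     <Author><Name>B</Name></Author>
//   </Authors>
// </Book>
object BookSchemaExample {
  // Type of a single repeated <Author> element (assumed shape).
  val authorType: StructType = StructType(Seq(
    StructField("Name", StringType, nullable = true)
  ))

  // Schema for one <Book> row. The wrapper <Authors> is a struct whose
  // "Author" field is the array; the wrapper itself is not the array.
  val xmlSchema: StructType = StructType(Seq(
    StructField("Title", StringType, nullable = true),
    StructField("Authors", StructType(Seq(
      StructField("Author", ArrayType(authorType, containsNull = true), nullable = true)
    )), nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    // Print the schema so it can be pasted alongside the XML sample.
    println(xmlSchema.treeString)
  }
}
```

Posting the printed schema next to a real XML sample and the resulting row
makes it much easier to see where the two disagree.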

On Tue, Jun 30, 2020 at 11:39 AM mars76 <sk_ac...@yahoo.com.invalid> wrote:
>
> Hi,
>
>   I am trying to read XML data from a Kafka topic and use XmlReader to
> convert the RDD[String] into a DataFrame conforming to a predefined schema.
>
>   One issue I am running into: after saving the final DataFrame to Avro
> format, most of the elements' data shows up in the Avro files. However, the
> nested element of array type is not parsed properly and is loaded as null
> into the DataFrame, so when I save it to Avro or to JSON that field is
> always null.
>
>   Not sure why this element is not getting parsed.
>
>
>   Here is the code I am using:
>
>
>   kafkaValueAsStringDF = kafakDF.selectExpr("CAST(key AS STRING) msgKey", "CAST(value AS STRING) xmlString")
>
>   var parameters = collection.mutable.Map.empty[String, String]
>
>   parameters.put("rowTag", "Book")
>
> kafkaValueAsStringDF.writeStream.foreachBatch {
>           (batchDF: DataFrame, batchId: Long) =>
>
>             val xmlStringDF: DataFrame = batchDF.selectExpr("xmlString")
>
>             xmlStringDF.printSchema()
>
>             val rdd: RDD[String] = xmlStringDF.as[String].rdd
>
>
>             val relation = XmlRelation(
>               () => rdd,
>               None,
>               parameters.toMap,
>               xmlSchema)(spark.sqlContext)
>
>
>             // Pass the schema as a logging argument so it fills the {} placeholder
>             logger.info(".convert() : XmlRelation Schema = {}", relation.schema.treeString)
>
> }
>         .start()
>         .awaitTermination()
>
>
> Thanks
> Sateesh
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
