Hi Folks,
I am trying to flatten a variety of XML documents using DataFrames. I'm using the spark-xml package, which automatically infers the schema and creates a DataFrame.

I do not want to hard-code any column names in the DataFrame, because I have many kinds of XML documents and each may have a much deeper nesting of child nodes. I simply want to flatten any type of XML and then write the output to a Hive table. Could you please give some expert advice on how to do this? A sample XML and the expected output are given below.

Sample XML:

    <emplist>
      <emp>
        <manager>
          <id>1</id>
          <name>foo</name>
          <subordinates>
            <clerk>
              <cid>2</cid>
              <cname>cname2</cname>
            </clerk>
            <clerk>
              <cid>3</cid>
              <cname>cname3</cname>
            </clerk>
          </subordinates>
        </manager>
      </emp>
    </emplist>

Expected output:

    id, name, clerk.cid, clerk.cname
    1, foo, 2, cname2
    1, foo, 3, cname3

Thanks,
Sreekanth Jella
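
To make the requested transformation concrete, here is a rough sketch of the flattening logic in plain Python, using only the standard library. It is not Spark or spark-xml code; it only illustrates the rule that repeated child elements become one row each and nested elements become prefixed column names, with no column name hard-coded. The names `flatten`, `prefix`, and `SAMPLE` are hypothetical, the clerk values are set to 2/cname2 and 3/cname3 so the result lines up with the expected output, and naming nested columns as `parent.leaf` is an assumption taken from the expected header (full paths would be safer if two branches share a tag name).

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical sample document, shaped like the one in the question.
SAMPLE = """
<emplist>
  <emp>
    <manager>
      <id>1</id>
      <name>foo</name>
      <subordinates>
        <clerk><cid>2</cid><cname>cname2</cname></clerk>
        <clerk><cid>3</cid><cname>cname3</cname></clerk>
      </subordinates>
    </manager>
  </emp>
</emplist>
"""


def flatten(elem, prefix=None):
    """Return a list of flat dict rows for one XML element.

    Leaf children become columns; a tag that repeats yields one row per
    occurrence; rows from different tags are combined by cross product.
    """
    # Group the direct children by tag, keeping document order.
    groups = {}
    for child in elem:
        groups.setdefault(child.tag, []).append(child)

    row_sets = []
    for tag, children in groups.items():
        rows = []
        for child in children:
            if len(child) == 0:
                # Leaf element: a single column holding its text.
                column = tag if prefix is None else f"{prefix}.{tag}"
                rows.append({column: (child.text or "").strip()})
            else:
                # Nested element: recurse, using its tag as the prefix
                # (assumed naming rule, see the note above).
                rows.extend(flatten(child, prefix=tag))
        row_sets.append(rows)

    # Combine one partial row from every tag group into a full row.
    merged = []
    for combo in itertools.product(*row_sets):
        row = {}
        for part in combo:
            row.update(part)
        merged.append(row)
    return merged


# Treat each <manager> element as one input record.
rows = [r for m in ET.fromstring(SAMPLE).iter("manager") for r in flatten(m)]

print(", ".join(rows[0]))  # id, name, clerk.cid, clerk.cname
for row in rows:
    print(", ".join(row.values()))
# 1, foo, 2, cname2
# 1, foo, 3, cname3
```

In Spark the same idea would presumably be driven by the inferred schema instead of the raw XML: walk `df.schema`, explode array columns, and select the fields of struct columns until only flat columns remain. That part is not shown here and would need to be checked against the actual DataFrames.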