Re: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Ted Dunning
XML will never die. The Cobol programmers were reincarnated and built similarly long-lasting generators of XML. If you have a schema, then it is a reasonable format for Drill to parse, if only to turn around and write to another format. On Wed, Apr 6, 2022 at 7:31 PM Paul Rogers wrote: > Hi

Re: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Lee, David
I wrote an XML to Parquet converter as well. It basically extends XML to JSON: it writes the JSON data into a memory-mapped file, reads the memory-mapped file into an Apache Arrow columnar table, and saves the Arrow table as a Parquet file. https://github.com/davlee1972/xml_to_json

Re: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Paul Rogers
Hi Luoc, First, what poor soul is asked to deal with large amounts of XML in this day and age? I thought we were past the XML madness, except in Maven and Hadoop config files. XML is much like JSON, only worse. JSON at least has well-defined types that can be gleaned from JSON syntax. With

Re: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Charles Givre
-Original Message- >> From: Ted Dunning >> Sent: Wednesday, April 6, 2022 11:48 AM >> To: dev >> Cc: u...@drill.apache.org >> Subject: Re: [DISCUSS] Add schema support for the XML format >> >> External Email: Use caution with links and attachments

Re: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Ted Dunning
saved as an array even if there is just one occurrence of > in your data. > > -Original Message- > From: Ted Dunning > Sent: Wednesday, April 6, 2022 11:48 AM > To: dev > Cc: u...@drill.apache.org > Subject: Re: [DISCUSS] Add schema support for the XML format > > Ex

RE: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Lee, David
2 11:48 AM To: dev Cc: u...@drill.apache.org Subject: Re: [DISCUSS] Add schema support for the XML format External Email: Use caution with links and attachments That example: dog > cat can also convert to ["pet":"dog", "pet":"dog"] XML is rife with prob

Re: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Ted Dunning
be both a varchar and an array of varchar > > There are a ton of gotcha(s) when dealing with XML.. > numeric vs string > scalar vs array > > -Original Message- > From: Lee, David > Sent: Wednesday, April 6, 2022 10:54 AM > To: u...@drill.apache.org; dev@drill.apache.

RE: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Lee, David
There are a ton of gotchas when dealing with XML: numeric vs. string, scalar vs. array. -Original Message- From: Lee, David Sent: Wednesday, April 6, 2022 10:54 AM To: u...@drill.apache.org; dev@drill.apache.org Subject: RE: [DISCUSS] Add schema support for the XML format I wrote somethi
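
The two gotchas named above (numeric vs. string, scalar vs. array) can be illustrated with a minimal sketch. It uses only Python's standard library and is not taken from Drill or from the converter mentioned in the thread. The function `element_to_value` and its `array_tags` parameter are hypothetical names introduced here; `array_tags` stands in for what a schema such as an XSD might tell a reader about repeating elements.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem, array_tags=frozenset()):
    """Naively convert an XML element into a dict or a string.

    array_tags is a hypothetical stand-in for schema knowledge: tag names
    that should always become arrays, however often they occur.
    """
    children = list(elem)
    if not children:
        # Leaf element: the text is always a string, because XML syntax
        # alone does not distinguish "7" the number from "7" the string.
        return elem.text
    result = {}
    for child in children:
        value = element_to_value(child, array_tags)
        if child.tag in result:
            # Repeated tag: promote the earlier scalar to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        elif child.tag in array_tags:
            # The schema hint says this tag repeats, so start with a list.
            result[child.tag] = [value]
        else:
            result[child.tag] = value
    return result


one_pet = ET.fromstring("<owner><pet>dog</pet><age>7</age></owner>")
two_pets = ET.fromstring(
    "<owner><pet>dog</pet><pet>cat</pet><age>7</age></owner>"
)

# Without a schema, the type of "pet" depends on the data in each record.
scalar_record = element_to_value(one_pet)    # "pet" is a string
array_record = element_to_value(two_pets)    # "pet" is a list of strings

# With a schema-like hint, "pet" is an array in both records.
hinted_record = element_to_value(one_pet, array_tags={"pet"})

print(scalar_record)
print(array_record)
print(hinted_record)
```

In this sketch the same `pet` element comes out as a string in one record and as a list in another unless the reader is told up front that it repeats, and `age` stays a string in every case. Resolving both ambiguities ahead of time is the kind of information a schema would be expected to provide.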

RE: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Lee, David
. Your XML file and the XSD schema file for that XML file. -Original Message- From: luoc Sent: Wednesday, April 6, 2022 5:01 AM To: u...@drill.apache.org; dev@drill.apache.org Subject: [DISCUSS] Add schema support for the XML format External Email: Use caution with links and attachments

[DISCUSS] Add schema support for the XML format

2022-04-06 Thread luoc
Hello dear driller, Before starting the topic, I would like to do a simple survey: 1. Did you know that Drill already supports XML format? 2. If yes, what is the maximum size for the XML files you normally read? 1MB, 10MB or 100MB? 3. Do you expect that reading XML will be as easy as JSON