Hello dear driller,
Before starting the topic, I would like to do a simple survey:
1. Did you know that Drill already supports XML format?
2. If yes, what is the maximum size for the XML files you normally read? 1MB,
10MB, or 100MB?
3. Do you expect that reading XML will be as easy as JSON (S
arted.
Your XML file and the XSD schema file for that XML file.
-Original Message-
From: luoc
Sent: Wednesday, April 6, 2022 5:01 AM
To: u...@drill.apache.org; dev@drill.apache.org
Subject: [DISCUSS] Add schema support for the XML format
External Email: Use caution with links and attach
varchar
There are a ton of gotchas when dealing with XML:
numeric vs string
scalar vs array
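
To make the two gotchas above concrete, here is a minimal Python sketch of a schema-less conversion. It is an illustration only, not how Drill reads XML; the `owner` and `age` element names and the `naive_convert` helper are made up for the example.

```python
import xml.etree.ElementTree as ET


def naive_convert(xml_text):
    """Convert the children of the root element to a dict without any schema.

    A tag seen once becomes a scalar; a tag seen more than once becomes a list.
    Every value stays a string because XML text carries no type information.
    """
    record = {}
    for child in ET.fromstring(xml_text):
        value = child.text
        if child.tag not in record:
            record[child.tag] = value                       # first occurrence: scalar
        elif isinstance(record[child.tag], list):
            record[child.tag].append(value)                 # third or later occurrence
        else:
            record[child.tag] = [record[child.tag], value]  # second occurrence: list
    return record


# Hypothetical sample documents that differ only in how often <pet> appears.
one_pet = naive_convert("<owner><pet>dog</pet><age>42</age></owner>")
two_pets = naive_convert("<owner><pet>dog</pet><pet>cat</pet><age>42</age></owner>")

print(one_pet)   # {'pet': 'dog', 'age': '42'}
print(two_pets)  # {'pet': ['dog', 'cat'], 'age': '42'}
```

The same field comes out as a string in one document and as a list in the other, and the age stays the string '42' rather than a number.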
-Original Message-
From: Lee, David
Sent: Wednesday, April 6, 2022 10:54 AM
To: u...@drill.apache.org; dev@drill.apache.org
Subject: RE: [DISCUSS] Add schema support for the XML format
I wrote
ill can't be both a varchar and an array of varchar
>
> There are a ton of gotchas when dealing with XML:
> numeric vs string
> scalar vs array
>
> -----Original Message-----
> From: Lee, David
> Sent: Wednesday, April 6, 2022 10:54 AM
> To: u...@drill.apache.org; de
2 11:48 AM
To: dev
Cc: u...@drill.apache.org
Subject: Re: [DISCUSS] Add schema support for the XML format
That example:
<pet>dog</pet>
<pet>cat</pet>
can also convert to ["pet":"dog", "pet":"cat"]
XML is rife with pr
array even if there is just one occurrence of
> in your data.
>
> -Original Message-
> From: Ted Dunning
> Sent: Wednesday, April 6, 2022 11:48 AM
> To: dev
> Cc: u...@drill.apache.org
> Subject: Re: [DISCUSS] Add schema support for the XML format
>
> Ex
ur data.
>>
>> -Original Message-
>> From: Ted Dunning
>> Sent: Wednesday, April 6, 2022 11:48 AM
>> To: dev
>> Cc: u...@drill.apache.org
>> Subject: Re: [DISCUSS] Add schema support for the XML format
>>
>> External Email: Use caution w
Hi Luoc,
First, what poor soul is asked to deal with large amounts of XML in this
day and age? I thought we were past the XML madness, except in Maven and
Hadoop config files.
XML is much like JSON, only worse. JSON at least has well-defined types
that can be gleaned from JSON syntax. With XML...
I wrote an XML-to-Parquet converter as well.
It basically extends XML to JSON: it writes the JSON data into a memory-mapped
file, reads the memory-mapped file into an Apache Arrow columnar table, and
saves the Arrow table as a Parquet file.
https://github.com/davlee1972/xml_to_json
https://arrow.apach
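
For readers who want to picture the first half of that pipeline, here is a rough Python sketch using only the standard library. It is not the code from the xml_to_json repository; the sample data and the `xml_to_json_lines` helper are invented for illustration, and the Arrow and Parquet steps are left out to avoid third-party dependencies.

```python
import json
import mmap
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical input: one record per child of the root element.
XML_TEXT = """<owners>
  <owner><name>Ann</name><pet>dog</pet></owner>
  <owner><name>Bob</name><pet>cat</pet></owner>
</owners>"""


def xml_to_json_lines(xml_text):
    """Turn each child of the root into one JSON document per line (UTF-8 bytes)."""
    lines = []
    for record in ET.fromstring(xml_text):
        lines.append(json.dumps({field.tag: field.text for field in record}))
    return ("\n".join(lines) + "\n").encode("utf-8")


payload = xml_to_json_lines(XML_TEXT)

# Write the JSON lines through a memory-mapped file, then read them back.
fd, path = tempfile.mkstemp(suffix=".jsonl")
try:
    os.ftruncate(fd, len(payload))  # the file needs its final size before mapping
    with mmap.mmap(fd, len(payload)) as mapped:
        mapped[:] = payload
        mapped.flush()
    with mmap.mmap(fd, len(payload), access=mmap.ACCESS_READ) as mapped:
        rows = [json.loads(line) for line in mapped[:].splitlines()]
finally:
    os.close(fd)
    os.remove(path)

print(rows)
```

In a full pipeline, a columnar reader would consume the mapped JSON lines at the point where this sketch simply parses them back into dictionaries.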
XML will never die. The COBOL programmers were reincarnated and built
similarly long-lasting generators of XML.
If you have a schema, then it is a reasonable format for Drill to parse, if
only to turn around and write to another format.
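
As a small illustration of why a schema helps, the Python sketch below fixes each field's type and cardinality up front, so a field declared as repeated is always a list even when it occurs once. The `SCHEMA` dictionary, the element names, and `convert_with_schema` are hypothetical; a real reader would derive this information from the XSD rather than hard-code it, and this is not Drill's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: tag -> (Python type, whether the element may repeat).
SCHEMA = {"pet": (str, True), "age": (int, False)}


def convert_with_schema(xml_text, schema):
    """Convert the root's children to a dict whose shape is fixed by `schema`."""
    # Repeated fields always start as an empty list, scalar fields as None.
    record = {tag: [] if repeated else None for tag, (_, repeated) in schema.items()}
    for child in ET.fromstring(xml_text):
        if child.tag not in schema:
            continue  # ignore elements the schema does not describe
        cast, repeated = schema[child.tag]
        value = cast(child.text)
        if repeated:
            record[child.tag].append(value)
        else:
            record[child.tag] = value
    return record


one_pet = convert_with_schema("<owner><pet>dog</pet><age>42</age></owner>", SCHEMA)
two_pets = convert_with_schema(
    "<owner><pet>dog</pet><pet>cat</pet><age>42</age></owner>", SCHEMA
)

print(one_pet)   # {'pet': ['dog'], 'age': 42}
print(two_pets)  # {'pet': ['dog', 'cat'], 'age': 42}
```

Both documents now produce the same column types, which is what a columnar engine needs before it can write the data out to another format.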
On Wed, Apr 6, 2022 at 7:31 PM Paul Rogers wrote:
> Hi L