Re: Use cases for DFDL

2019-11-07 Thread Charles Givre
Ok... That makes sense. Do you know if there's some documentation about that new feature? -- C
> On Nov 7, 2019, at 1:57 PM, Paul Rogers wrote:
> Hi Charles,
> Your suggestion to read the schema in each reader can work. In this case, the planner knows nothing about the schema; it is

Re: Use cases for DFDL

2019-11-07 Thread Paul Rogers
Hi Charles, Your suggestion to read the schema in each reader can work. In this case, the planner knows nothing about the schema; it is discovered at scan time, by each reader, as the file is read. Let's take a step back. Drill is designed for big data distributed processing. We might

Re: Use cases for DFDL

2019-11-07 Thread Charles Givre
@Paul, do you think a format plugin is the right way to integrate this? My thought was that we could create a folder for DFDL schemata; the format plugin could then specify which schema to use during the read, e.g.: "dfdl": { "type": "dfdl", "file": "myschema.dfdl", "extensions": ["xml"]

Re: Use cases for DFDL

2019-11-07 Thread Paul Rogers
Hi All, One thought to add is that if DFDL defines the file schema, then it would be ideal to use that schema at plan time as well as at run time. Drill's Calcite integration provides a means to do this, though I am personally a bit hazy on the details. Certainly getting the reader to work is the

Re: Use cases for DFDL

2019-11-07 Thread Charles Givre
Hi Steve, Thanks for responding... Here's how Drill reads a file: Drill uses what are called "format plugins", which basically read the file in question and map fields to column vectors. Note: Drill supports nested data structures, so a column could contain a MAP or LIST. The basic steps

Re: [DRAFT] Drill Board Report: Comments due by 2019-11-08 1200

2019-11-07 Thread Volodymyr Vysotskyi
Hi Charles, Could you please add a couple of items to the list of upcoming features? For example, I think we should mention the following improvements:
- Hive complex types support (arrays, structs, union)
- Canonical Map support
- Schema provisioning via table function
- Empty Parquet files

[GitHub] [drill] denysord88 commented on issue #1891: DRILL-7409: Moving test with big test data to the drill-test-framework.

2019-11-07 Thread GitBox
denysord88 commented on issue #1891: DRILL-7409: Moving test with big test data to the drill-test-framework. URL: https://github.com/apache/drill/pull/1891#issuecomment-551046127 @paul-rogers, this test was added to check a bug