Thanks again Charles. Is the following use case description correct?

A Daffodil extension could be created for Apache Drill so that you could parse 
any kind of data with Daffodil using a DFDL schema, and then you could use 
Apache Drill's query-like syntax and rich capabilities to query parts of that 
data, join it with other data, do analysis, etc., just as if it came from a 
database. So, instead of parsing data to XML and then using XPath to pull out 
data, you could instead parse data to Apache Drill's data representation and 
then use Drill's rich data-query capabilities to pull out data, and even combine 
it with other non-Daffodil data types. The advantage for this would be that it 
would make it very easy to enable Drill to query new data types (i.e., simply by 
using a DFDL schema) and it would enable users to easily query this data 
without having to load it into another system.

Is that correct?

/Roger
From: Charles Givre <[email protected]>
Sent: Wednesday, October 30, 2019 12:19 PM
To: Costello, Roger L. <[email protected]>
Cc: [email protected]
Subject: [EXT] Re: Use cases for DFDL

Not exactly...
I was thinking of using DFDL to enable Drill to create a schema for data that 
Drill cannot read.  If DFDL can be used to describe the schema, a plugin could 
be written for Drill that mirrors this schema and ultimately reads the data 
files.  Drill wouldn't be populating any database, but rather directly querying 
the data.

The advantage for this would be that it would make it very easy to enable Drill 
to query new data types (i.e., simply by using a DFDL schema) and it would enable 
users to easily query this data w/o having to load it into another system.  
Does that make sense?
-- C


On Oct 30, 2019, at 12:13 PM, Costello, Roger L. 
<[email protected]> wrote:

Thanks Charles. Let me see if I understand the use case correctly.

Use DFDL to parse data to populate a database and then use Apache Drill to 
query the database.

Is that correct?

/Roger

From: Charles Givre <[email protected]>
Sent: Wednesday, October 30, 2019 12:01 PM
To: [email protected]
Subject: [EXT] Re: Use cases for DFDL

To add to this discussion, I'm the PMC chair for Apache Drill.  I think a 
compelling use case for DFDL would be enabling Drill to query data based on a 
DFDL schema.  This same concept could be applied to other SQL query engines 
such as Presto and/or Impala.

IMHO, this would facilitate the analysis of data sets supported by DFDL.
-- C



On Oct 30, 2019, at 11:53 AM, Costello, Roger L. 
<[email protected]> wrote:

Thanks Mike! I updated the slide:

<image002.png>

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, October 30, 2019 11:45 AM
To: [email protected]
Subject: [EXT] Re: Use cases for DFDL

I would not pick on RDF data stores as the target.

Parsing data to populate a database (any variety) is the actual case. The fact 
that we did do one project involving RDF is why I cited that example in 
particular, but pulling data into any data store or database begins with the 
ability to parse the data and then process it into a suitable form.

This is an incomplete list, so perhaps this slide title should be "Example Use 
Cases for DFDL"?

...mikeb
________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, October 28, 2019 10:41 AM
To: [email protected]
Subject: Use cases for DFDL

Hi Folks,

I created a slide of use cases. See below. Do you agree with the slide? 
Anything you would add, delete, or change?  /Roger

<image003.png>
