Hi Charles,

this is an interesting idea, and in fact we also discussed the same matter for Calcite at ApacheCon NA. But I agree that it would be really powerful together with a complete runtime like Drill.
Julian

From: Charles Givre <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, October 30, 2019 at 19:38
To: "Costello, Roger L." <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Use cases for DFDL

+1

On Oct 30, 2019, at 2:36 PM, Costello, Roger L. <[email protected]> wrote:

Excellent! Okay, here’s the use case:

A Daffodil extension could be created for Apache Drill so that you could parse any kind of data with Daffodil using a DFDL schema, and then you could use ANSI SQL to query the data, join it with other data, do analysis, etc., just as if it came from a database. So, instead of parsing data to XML and then using XPath to pull out data, you could instead parse data to Apache Drill's data representation and then use ANSI SQL to pull out data, and even combine it with other non-Daffodil data types. The advantage of this would be that it would make it very easy to enable Drill to query new data types (i.e., simply by using a DFDL schema), and it would enable users to easily query this data without having to load it into another system.

How’s that, Charles?

/Roger

From: Charles Givre <[email protected]>
Sent: Wednesday, October 30, 2019 2:28 PM
To: Costello, Roger L. <[email protected]>
Cc: [email protected]
Subject: [EXT] Re: Use cases for DFDL

Close... One minor nit is that Drill doesn't use a "query-like" syntax. It is regular ANSI SQL. IMHO, this would be a really great collaboration between the two communities.

--C

On Oct 30, 2019, at 1:10 PM, Costello, Roger L. <[email protected]> wrote:

Thanks again Charles. Is the following use case description correct?
A Daffodil extension could be created for Apache Drill so that you could parse any kind of data with Daffodil using a DFDL schema, and then you could use Apache Drill's query-like syntax and rich capabilities to query parts of that data, join it with other data, do analysis, etc., just as if it came from a database. So, instead of parsing data to XML and then using XPath to pull out data, you could instead parse data to Apache Drill's data representation and then use Drill's rich data-query capabilities to pull out data, and even combine it with other non-Daffodil data types. The advantage of this would be that it would make it very easy to enable Drill to query new data types (i.e., simply by using a DFDL schema), and it would enable users to easily query this data without having to load it into another system.

Is that correct?

/Roger

From: Charles Givre <[email protected]>
Sent: Wednesday, October 30, 2019 12:19 PM
To: Costello, Roger L. <[email protected]>
Cc: [email protected]
Subject: [EXT] Re: Use cases for DFDL

Not exactly... I was thinking of using DFDL to enable Drill to create a schema for data that Drill cannot read. If DFDL can be used to describe the schema, a plugin could be written for Drill that mirrors this schema and ultimately reads the data files. Drill wouldn't be populating any database, but rather directly querying the data. The advantage of this would be that it would make it very easy to enable Drill to query new data types (i.e., simply by using a DFDL schema), and it would enable users to easily query this data w/o having to load it into another system.

Does that make sense?

-- C

On Oct 30, 2019, at 12:13 PM, Costello, Roger L. <[email protected]> wrote:

Thanks Charles. Let me see if I understand the use case correctly.

Use DFDL to parse data to populate a database and then use Apache Drill to query the database.

Is that correct?
/Roger

From: Charles Givre <[email protected]>
Sent: Wednesday, October 30, 2019 12:01 PM
To: [email protected]
Subject: [EXT] Re: Use cases for DFDL

To add to this discussion, I'm the PMC chair for Apache Drill. I think a compelling use case for DFDL would be enabling Drill to query data based on a DFDL schema. This same concept could be applied to other SQL query engines such as Presto and/or Impala. IMHO, this would facilitate the analysis of data sets supported by DFDL.

-- C

On Oct 30, 2019, at 11:53 AM, Costello, Roger L. <[email protected]> wrote:

Thanks Mike! I updated the slide:

<image002.png>

From: Beckerle, Mike <[email protected]>
Sent: Wednesday, October 30, 2019 11:45 AM
To: [email protected]
Subject: [EXT] Re: Use cases for DFDL

I would not pick on RDF data stores as the target. Parsing data to populate a database (any variety) is the actual case. The fact that we did one project involving RDF is why I cited that example in particular, but pulling data into any data store/database begins with the ability to parse the data and then process it into suitable form.

This is an incomplete list, so perhaps this slide title should be "Example Use Cases for DFDL"?

...mikeb

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, October 28, 2019 10:41 AM
To: [email protected] <[email protected]>
Subject: Use cases for DFDL

Hi Folks,

I created a slide of use cases. See below. Do you agree with the slide? Anything you would add, delete, or change?

/Roger

<image003.png>
