I'm also very interested in learning what theory underpins DFDL's separator, 
initiator, terminator, separatorPosition, ignoreCase, lengthKind, and related 
properties.  My impression is that DFDL is a language for describing data 
formats, using these properties as annotations on XML Schemas, and that 
Daffodil is a DFDL processor whose schema compiler and parser/unparser 
generator build a set of parser/unparser combinator objects in memory.
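As a rough, hedged illustration of the annotation idea (a simplified fragment only - a real DFDL schema must also supply many default properties, typically via a dfdl:format annotation), a comma-separated record might be described like this:

```xml
<!-- Illustrative fragment, not a complete DFDL schema -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:element name="record">
    <xs:complexType>
      <!-- DFDL properties ride along as attributes (short-form annotation) -->
      <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
        <xs:element name="name" type="xs:string" dfdl:lengthKind="delimited"/>
        <xs:element name="age"  type="xs:int"    dfdl:lengthKind="delimited"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```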
Parser combinator - Wikipedia<https://en.wikipedia.org/wiki/Parser_combinator> 
offers some insight into the evolution of parser combinators and the theory 
(recursive descent parsing with memoization and backtracking) that enables 
their functionality.
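To make the combinator idea concrete, here is a minimal sketch in Python (purely illustrative - this is not Daffodil's internal API): each parser is a function from (text, position) to either (value, new position) or None, alternatives backtrack by retrying from the same position, and a separated-sequence combinator plays a role loosely analogous to DFDL's separator property:

```python
# Minimal parser-combinator sketch (illustrative only; not Daffodil's API).
# A parser is a function: (text, pos) -> (value, new_pos), or None on failure.

def literal(s):
    """Match an exact string, like a DFDL initiator/terminator delimiter."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def alt(*parsers):
    """Try alternatives in order; a failure backtracks to the same position."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def seq(*parsers):
    """Run parsers one after another, collecting their results in a list."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def chars_until(stop):
    """Consume characters up to a delimiter (cf. dfdl:lengthKind='delimited')."""
    def parse(text, pos):
        end = text.find(stop, pos)
        end = len(text) if end == -1 else end
        return text[pos:end], end
    return parse

def sep_by(item, sep):
    """Items joined by an infix delimiter (loosely like dfdl:separator)."""
    def parse(text, pos):
        result = item(text, pos)
        if result is None:
            return [], pos
        value, pos = result
        values = [value]
        while True:
            r = seq(sep, item)(text, pos)
            if r is None:
                return values, pos
            (_, value), pos = r
            values.append(value)
    return parse

csv_row = sep_by(chars_until(","), literal(","))
print(csv_row("a,b,c", 0))  # -> (['a', 'b', 'c'], 5)
```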
Introduction to Parsers. Parsing is a surprisingly challenging... | by Chet 
Corcos | 
Medium<https://medium.com/@chetcorcos/introduction-to-parsers-644d1b5d7f3d> 
also talks a little about the theory of formal grammars (I would skip the 
second half, which talks about how to write parser combinators in JavaScript).
parsing - When to use a Parser Combinator? When to use a Parser Generator? - 
Software Engineering Stack 
Exchange<https://softwareengineering.stackexchange.com/questions/338665/when-to-use-a-parser-combinator-when-to-use-a-parser-generator>
 has a good discussion of the pros and cons of parser combinators versus 
parser generators.
I have used parser generators and they are very useful.  If you find you need 
one, I recommend ANTLR<https://www.antlr.org/> over Flex & Bison.  It can 
parse structured text or binary data, generate Java or C parser code, and is 
widely used in academia and industry.
LL(*): the foundation of the ANTLR parser generator: ACM SIGPLAN Notices: Vol 
46, No 6<https://dl.acm.org/doi/10.1145/1993316.1993548> describes the formal 
LL(*) theory underlying the ANTLR parser generator.
John

From: Roger L Costello <[email protected]>
Sent: Wednesday, September 1, 2021 9:47 AM
To: [email protected]
Subject: EXT: What theory underpins DFDL?

Hi Folks,
Lately I have been learning to create parsers using a parser tool called Flex & 
Bison. I want to see how Flex & Bison parsers compare to DFDL parsers.
I learned that Flex & Bison parsers are built on solid theory:
The earliest parsers back in the 1950s used utterly ad hoc techniques to 
analyze the syntax of the source code of the programs they were parsing. 
During the 1960s, the field got a lot of academic attention, and by the early 
1970s parsing was no longer an arcane art. In the 1970s Aho, Ullman, Knuth, 
and many others put parsing techniques solidly on their theoretical feet.
The book that I am reading said that one of the first techniques they (Aho, 
Ullman, Knuth, and others) espoused was to separate lexing (aka scanning, 
tokenizing) from parsing. Lexing built upon regular expressions, which in turn 
built upon Finite Automata (FA) theory and Nondeterministic Finite Automata 
(NFA) theory. Regular expressions and finite automata were brilliantly melded 
together by the famous Kleene Theorem, which showed that they describe exactly 
the same class of languages. Parsing built on top of the rich theory of 
grammars - Context-Free Grammars, Context-Sensitive Grammars, etc. - that 
Chomsky formulated. Here's a graphic I created depicting the foundation upon 
which Flex & Bison parsers are built:
[cid:[email protected]]
If we leave aside XML Schema which hosts DFDL, what theory underpins the set of 
DFDL properties - separator, initiator, terminator, separatorPosition, 
ignoreCase, lengthKind, etc.?
/Roger
