You are correct that Daffodil/DFDL has no built-in concept of, or access to,
file names. Daffodil doesn't know or care whether the incoming data comes from
a stream, a file, etc.
Applications, however, routinely process files, and file-name requirements are
common. Such an application generally has to concatenate the file name with the
data so that Daffodil sees both when parsing. When unparsing, the application
has to re-create the file using the name information in the data.
Handling the name in the application also lets it, for example, truncate overly
long file names or replace spaces in names with underscores.
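A minimal sketch of that application-side wrapping, in Python. It assumes a
hypothetical framing where the application puts the file name on its own first
line ahead of the file's bytes, so the DFDL schema would need a leading
newline-terminated string element for it (e.g. with dfdl:terminator="%NL;").
The helper names, the 64-character limit, and stripping directory parts from
the name are all illustrative assumptions, not anything Daffodil provides.

```python
import re
from pathlib import Path

MAX_NAME_LEN = 64  # hypothetical limit on re-created file names


def sanitize_file_name(name: str, max_len: int = MAX_NAME_LEN) -> str:
    """Replace runs of whitespace with '_', drop any directory part, truncate."""
    cleaned = Path(re.sub(r"\s+", "_", name)).name
    if cleaned in ("", ".", ".."):
        raise ValueError(f"unusable file name: {name!r}")
    return cleaned[:max_len]


def wrap_for_parse(path: Path) -> bytes:
    """Put the file name on its own first line, ahead of the file's bytes.

    Assumes the name contains no newline; the DFDL schema is expected to
    model this first line as its own element.
    """
    return path.name.encode("utf-8") + b"\n" + path.read_bytes()


def unwrap_after_unparse(data: bytes, out_dir: Path) -> Path:
    """Split the name line off unparsed bytes and re-create the file under it."""
    name_line, _, body = data.partition(b"\n")
    target = out_dir / sanitize_file_name(name_line.decode("utf-8"))
    target.write_bytes(body)
    return target
```

Where the name goes (first line, trailer, a separate element) is a choice for
the application and its schema; Daffodil only ever sees the combined bytes.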
As for the more complex validations you are looking at, I think the
multi-level table lookup you show here is beyond what is sensible to try to do
directly in DFDL.
My approach would be to write a DFDL schema generator that reads these tables
and outputs a big DFDL schema whose elements carry the specific field count for
each row, so that parsing with that DFDL schema expects exactly the right
number of items. Strings like AC2DB and ORG_UNIT would be dfdl:initiator values,
i.e. the tags that identify the corresponding elements.
Alternatively, your schema could parse based on the delimiters alone, treating
anything properly delimited as "well formed", and carry the constants needed
for validation either as constant XSD minOccurs/maxOccurs values or in
dfdl:assert (or Schematron) rules, e.g. the number of fields the ORG_UNIT
element inside AC2DB should have.
For example, if the ORG_UNIT row inside AC2DB is supposed to have 119 fields,
generate something like:
    <element name="AC2DB"
             dfdl:initiator="####:AC2DB:001:2004:">
      <complexType>
        <sequence dfdl:initiator="&lt;RD&gt;">
          <element name="ORG_UNIT"
                   dfdl:initiator="ORG_UNIT">
            <complexType>
              <sequence dfdl:separator="&lt;CD&gt;"
                        dfdl:separatorPosition="prefix">
                <element name="item"
                         minOccurs="119"
                         maxOccurs="119"
                         type="xs:string"
                         dfdl:occursCountKind="parsed"/>
              </sequence>
            </complexType>
          </element>
          ....
        </sequence>
      </complexType>
    </element>
Daffodil's regular XSD validation mode would then check minOccurs/maxOccurs for
you and report validation errors, while DFDL parsing itself would accept any
number of occurrences.
Alternatively, you could use dfdl:occursCountKind="fixed" to make the parse
fail entirely (not well formed) if the count of items is not exactly 119.
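As a rough sketch of the generator approach, the Python below emits fragments
shaped like the one above from a table of field counts, with the
occursCountKind choice ("parsed" for validation-only checking, "fixed" to make
a wrong count a parse error) as a parameter. The SPEC table, the function
names, and the literal "<RD>"/"<CD>" delimiters are hypothetical placeholders
for whatever the customer's tables actually contain. It also assumes the rest
of the schema declares the dfdl and xs prefixes and supplies a dfdl:format with
suitable defaults (such as lengthKind).

```python
from xml.sax.saxutils import quoteattr

# Hypothetical stand-in for the customer's tables:
# record name -> (record initiator, {row tag: number of fields}).
SPEC = {
    "AC2DB": ("####:AC2DB:001:2004:", {"ORG_UNIT": 119}),
}


def row_element(row, count, separator, occurs_kind, indent):
    """One row element whose 'item' children must number exactly `count`."""
    # quoteattr adds the surrounding quotes and escapes &, < and >.
    return "\n".join([
        f"{indent}<element name={quoteattr(row)} dfdl:initiator={quoteattr(row)}>",
        f"{indent}  <complexType>",
        f"{indent}    <sequence dfdl:separator={quoteattr(separator)}"
        f' dfdl:separatorPosition="prefix">',
        f'{indent}      <element name="item" type="xs:string"'
        f' minOccurs="{count}" maxOccurs="{count}"'
        f" dfdl:occursCountKind={quoteattr(occurs_kind)}/>",
        f"{indent}    </sequence>",
        f"{indent}  </complexType>",
        f"{indent}</element>",
    ])


def record_element(name, initiator, rows, occurs_kind="parsed",
                   row_initiator="<RD>", separator="<CD>"):
    """One record element (e.g. AC2DB) wrapping a row element per table entry."""
    # "parsed": any count parses; XSD validation then checks min/maxOccurs.
    # "fixed": the parser expects exactly maxOccurs items, so a wrong count
    # should make the parse fail.
    if occurs_kind not in ("parsed", "fixed"):
        raise ValueError(f"unsupported occursCountKind: {occurs_kind}")
    rows_xml = "\n".join(
        row_element(row, count, separator, occurs_kind, " " * 6)
        for row, count in rows.items()
    )
    return "\n".join([
        f"<element name={quoteattr(name)} dfdl:initiator={quoteattr(initiator)}>",
        "  <complexType>",
        f"    <sequence dfdl:initiator={quoteattr(row_initiator)}>",
        rows_xml,
        "    </sequence>",
        "  </complexType>",
        "</element>",
    ])


def generate(spec, occurs_kind="parsed"):
    """Emit one record element per entry in the spec table."""
    return "\n".join(
        record_element(name, initiator, rows, occurs_kind)
        for name, (initiator, rows) in spec.items()
    )


if __name__ == "__main__":
    print(generate(SPEC))
```

Generating the counts as constants keeps the schema itself simple; the
table-lookup logic lives in ordinary code that is rerun whenever the tables
change.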
________________________________
From: Attila Horvath
Sent: Tuesday, July 13, 2021 10:33 AM
Subject: are some specifications impossible for DFDL to implement?
ALCON
I assume there're some data validation rules that DFDL cannot implement. Case
in point, the attached "Case.pdf" is an excerpt of the customer's three (3)
data validation specifications.
I don't see how Rule #2 can/might be implemented w/ DFDL b/c the data file's
name cannot be conveyed to the schema for validation. Am I correct?
Assuming Rule #2 cannot be validated, then Rule #3 similarly cannot be
validated, as it ties into Rule #2's file-naming convention.
I also assume Rule #4 similarly cannot be validated b/c 'SUBJECT_NM' cannot be
validated against table #3, the table of table names 'TABLE_NM'.
Thx in advance,
Attila