You are correct that DFDL, and therefore Daffodil, has no built-in concept of 
a file name. Daffodil neither knows nor cares whether the arriving data comes 
from a stream, a file, etc.

Applications, however, process files routinely, and they frequently have 
requirements on file names. Such applications generally have to prepend (or 
otherwise attach) the file name to the data so that Daffodil sees both when 
parsing, and when unparsing the application has to re-create the file using 
the name information in the data.

That would allow the application to, for example, truncate overly long file 
names, or convert names with spaces in them to use underscores instead.
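
As a rough illustration of that wrapping step, here is a minimal Python sketch. 
It is not part of Daffodil; the function names, the newline-terminated header 
line, and the 32-character limit are all assumptions made up for this example, 
and it assumes file names never contain a newline. The DFDL schema would then 
need a leading element that parses that header line.

```python
from typing import Tuple

MAX_NAME_LEN = 32  # hypothetical limit, chosen only for illustration


def normalize_name(name: str, max_len: int = MAX_NAME_LEN) -> str:
    """Replace spaces with underscores, then truncate overly long names."""
    return name.replace(" ", "_")[:max_len]


def wrap_for_parse(name: str, payload: bytes) -> bytes:
    """Prepend the file name as a newline-terminated header line.

    Assumes the name itself contains no newline character.
    """
    return name.encode("utf-8") + b"\n" + payload


def unwrap_after_unparse(blob: bytes) -> Tuple[str, bytes]:
    """Split unparsed output back into a normalized file name and the contents."""
    header, _, payload = blob.partition(b"\n")
    return normalize_name(header.decode("utf-8")), payload


if __name__ == "__main__":
    blob = wrap_for_parse("my report 2021.dat", b"####:AC2DB:001:2004:...")
    name, contents = unwrap_after_unparse(blob)
    print(name, len(contents))
```

The application would hand the output of wrap_for_parse to Daffodil for 
parsing, and call unwrap_after_unparse on the unparsed bytes to decide what 
file to write.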

As for the complex kinds of validation you are looking at, I think the 
multi-level table lookup you are showing here is beyond what is sensible to 
try to do directly in DFDL.

My approach would be to write a DFDL schema generator that reads these tables, 
and outputs a big DFDL schema with elements that contain the proper specific 
field-count so that parsing with that DFDL schema will expect exactly the right 
number of items. Strings like AC2DB and ORG_UNIT would be dfdl:initiator values 
that are the tags that identify the corresponding elements.
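
A minimal sketch of such a generator, in Python, might look like the 
following. The function names and the shape of the input (a table name, its 
initiator string, and a dict of row names to field counts) are assumptions for 
illustration; how the real tables get read is left out. It emits only a single 
element declaration, not a complete schema (no xs:schema wrapper or 
dfdl:format), and it uses dfdl:occursCountKind="fixed" so the parse expects 
exactly the stated number of items.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"
ET.register_namespace("xs", XS)
ET.register_namespace("dfdl", DFDL)


def row_element(row_name: str, field_count: int) -> ET.Element:
    """Build one row element that expects exactly field_count items."""
    row = ET.Element(
        f"{{{XS}}}element",
        {"name": row_name, f"{{{DFDL}}}initiator": row_name},
    )
    ctype = ET.SubElement(row, f"{{{XS}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence", {
        f"{{{DFDL}}}separator": "<CD>",  # serialized as &lt;CD&gt;
        f"{{{DFDL}}}separatorPosition": "prefix",
    })
    ET.SubElement(seq, f"{{{XS}}}element", {
        "name": "item",
        "type": "xs:string",
        "minOccurs": str(field_count),
        "maxOccurs": str(field_count),
        f"{{{DFDL}}}occursCountKind": "fixed",
    })
    return row


def table_element(table_name: str, initiator: str, rows: dict) -> ET.Element:
    """Build the element for one table; rows maps row name -> field count."""
    table = ET.Element(
        f"{{{XS}}}element",
        {"name": table_name, f"{{{DFDL}}}initiator": initiator},
    )
    ctype = ET.SubElement(table, f"{{{XS}}}complexType")
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence",
                        {f"{{{DFDL}}}initiator": "<RD>"})
    for row_name, count in rows.items():
        seq.append(row_element(row_name, count))
    return table


if __name__ == "__main__":
    # Hypothetical table data; a real generator would read the spec tables.
    elem = table_element("AC2DB", "####:AC2DB:001:2004:", {"ORG_UNIT": 119})
    print(ET.tostring(elem, encoding="unicode"))
```

Building the output with an XML library rather than string templates means 
delimiters such as <CD> are escaped automatically.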

Or your schema could parse based on the delimiters, deeming anything properly 
delimited to be "well formed", and just contain XSD minOccurs and maxOccurs 
with constant values, or dfdl:assert (or schematron rules) for validation, 
with the constants embedded in them, such as the number of fields the AC2DB 
ORG_UNIT element should have.

So, for example, if the ORG_UNIT row inside AC2DB is supposed to have 119 
fields, then generate something like:

<element name="AC2DB"
   dfdl:initiator="####:AC2DB:001:2004:">
   <complexType>
     <sequence dfdl:initiator="&lt;RD&gt;">
        <element name="ORG_UNIT"
            dfdl:initiator="ORG_UNIT">
          <complexType>
            <sequence dfdl:separator="&lt;CD&gt;"
                   dfdl:separatorPosition="prefix">
                <element name="item"
                      minOccurs="119"
                      maxOccurs="119"
                      type="xs:string"
                      dfdl:occursCountKind="parsed"/>
       ....

Then Daffodil's regular XSD validation mode would check the 
minOccurs/maxOccurs for you and give you validation errors, while DFDL parsing 
itself would accept any number of occurrences.

Or you could use dfdl:occursCountKind="fixed" to make it fail the parse 
entirely (not well formed) if the count of items is not exactly 119.

________________________________
From: Attila Horvath <[email protected]>
Sent: Tuesday, July 13, 2021 10:33 AM
To: [email protected] <[email protected]>
Subject: are some specifications impossible for DFDL to implement?

ALCON

I assume there're some data validation rules that DFDL cannot implement. Case 
in point, attached "Case.pdf" is an excerpt of customer's three (3) data 
validation specifications.

I'm not aware how Rule #2 can/might be implemented w/ DFDL b/c schema's 
filename cannot be conveyed to the script for validation. Am I correct?

Assuming Rule #2 cannot be validated, then Rule #3 similarly cannot be 
validated as it ties into Rule #2 file naming convention.

I also assume Rule #4 similarly cannot be validated b/c 'SUBJECT_NM' cannot be 
validated w/ table #3 of table names 'TABLE_NM'.

Thx in advance,

Attila
