As suggested, I'm attempting to validate the structure of the EDIFACT
document with DFDL assertions instead of Schematron. One thing I observed
is that I need to relax *maxOccurs* (i.e., unbounded) and *minOccurs*
(i.e., 0); otherwise the assertion rules won't be evaluated, since occurrence
constraint errors are not recoverable (I imagine the same would be true
for Schematron). However, when I relax the constraints, the parsed
structure changes from:
<SegGrp-3>
  <RFF-18660>
    <C506>
      <E1153>VA</E1153>
      <E1154>UK19430839</E1154>
    </C506>
  </RFF-18660>
  <RFF-18660>
    <C506>
      <E1153>ADE</E1153>
      <E1154>00000767</E1154>
    </C506>
  </RFF-18660>
</SegGrp-3>
to
<SegGrp-3>
  <RFF>
    <C506>
      <E1153>VA</E1153>
      <E1154>UK19430839</E1154>
    </C506>
  </RFF>
</SegGrp-3>
<SegGrp-3>
  <RFF>
    <C506>
      <E1153>ADE</E1153>
      <E1154>00000767</E1154>
    </C506>
  </RFF>
</SegGrp-3>
I'd rather avoid making breaking changes to the structure, so I decided to
have two flavours of EDIFACT messages: strict and lax. A choice element
first attempts to parse the message using the strict schema and then falls
back to the lax schema if parsing with the strict one fails.
...
...
<xsd:sequence dfdl:choiceBranchKey="INVOIC">
  <xsd:choice>
    <xsd:sequence>
      <xsd:element ref="D03B:INVOIC"/>
    </xsd:sequence>
    <xsd:sequence>
      <xsd:element ref="D03B:Bad-INVOIC"/>
    </xsd:sequence>
  </xsd:choice>
</xsd:sequence>
...
...
The recoverable assertions are all defined within the *Bad-INVOIC* type
and, where possible, the occurrence constraints are relaxed within this
element type. Does this approach make sense, or do you think there might
be a better way to implement it?
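
As an illustration of the idea only (this is not the actual schema), a
relaxed group carrying a recoverable assertion might look roughly like the
sketch below. The limit of 99, the message text, and the D03B:RFF reference
are assumptions made up for the example, and the xsd, dfdl, and D03B
prefixes are assumed to be declared on the enclosing schema.

```xml
<!-- Hypothetical sketch: names, limit, and message are illustrative only -->
<xsd:element name="SegGrp-3" minOccurs="0" maxOccurs="unbounded">
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Evaluated for each occurrence; a failure is reported as a
           diagnostic and does not cause the parse to fail or backtrack -->
      <dfdl:assert failureType="recoverableError"
                   message="SegGrp-3 occurs more than 99 times"
                   test="{ dfdl:occursIndex() le 99 }"/>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:complexType>
    <xsd:sequence>
      <!-- Assumed reference to the RFF segment element -->
      <xsd:element ref="D03B:RFF"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

The occurrence bounds are left wide open so that the parser keeps going, and
the assertion takes over the job of flagging the violation.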
Claude
On Sun, Aug 13, 2023 at 12:31 PM Claude Mamo wrote:
>> Schematron is really only needed for really rich validation rules that use
>> the tree-walking capabilities of XPath to scrutinize elements wherever they
>> appear in the infoset tree.
>>
>
> I'll give it a try with dfdl:assert and see how it goes.
>
> Thanks for all the feedback!
>
> Claude
>
> On Mon, Jul 24, 2023 at 11:35 PM Mike Beckerle wrote:
>
>> Something to consider:
>>
>> I think many useful validation checks can be expressed in DFDL's
>> expression language using the dfdl:assert statement with
>> failureType='recoverableError'.
>>
>> The sorts of constraints that say if this element exists then that one can't
>> exist, or if this has a specific value then that must exist... those sorts
>> of things can usually be expressed.
>>
>> Those are run in an incremental/streaming fashion as the parser traverses
>> the data based on the schema.
>>
>> Recoverable errors from Daffodil are the same as validation errors from
>> Daffodil's internal "limited" validation. They don't guide the parse (they
>> don't cause backtracking), but come out as diagnostic warnings.
>>
>> Schematron is really only needed for really rich validation rules that
>> use the tree-walking capabilities of XPath to scrutinize elements wherever
>> they appear in the infoset tree.
>>
>>
>> On Mon, Jul 24, 2023 at 7:47 AM Steve Lawrence wrote:
>>
>>> This is correct. The way Daffodil currently implements full validation
>>> (Xerces) and custom validation (e.g. Schematron) is pretty inefficient.
>>> We create two infosets: one of the kind that the user passed to the parse
>>> function, and one that is text XML written to a ByteArrayOutputStream in
>>> memory, which is used internally for validation once the parse is
>>> completed. We do not currently stream validation.
>>>
>>> If you wanted streaming, you would probably need to create a custom
>>> InfosetOutputter, or maybe use the SAXInfosetOutputter with an XMLReader
>>> that chains/tees SAX events to custom Schematron validation.
>>>
>>> - Steve
>>>
>>> On 2023-07-22 03:29 AM, Claude Mamo wrote:
>>> > Spotted this code so presumably it's not streaming when custom or full
>>> > validation is in force:
>>> >
>>> https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/DataProcessor.scala#L345-L356
>>> >
>>> > Claude
>>> >
>>> > On Sat, Jul 22, 2023 at 8:07 AM Claude Mamo wrote:
>>> >
>>> > Hello Daffodil team,
>>> >
>>> > I'm looking into adding support for Schematron validation since we
>>> > have had many Smooks developers asking for better validation of
>>> > EDIFACT documents. One question I have is whether Schematron
>>> > validation is applied in a streaming fashion. I mean, does Daffodil
>>> > load the whole infoset into memory before applying the Schematron
>>> > rules or is Schematron validating on the fly while accumulating any
>>> > state that is required to be able to evaluate the rules?
>>> >
>>> > Thanks,
>>> >
>>> > Claude
>>> >
>>>
>>>