Roger,

The issue is that DFDL uses the first occurrence of the terminator, regardless 
of padding or trim options. This means that if you have 2 NUL characters in a 
row, the first one would terminate the string, and the second one would 
terminate a different (zero-length) string.


If you have a reasonable upper limit on how many NUL characters can appear in a 
row, you could do something like:

    dfdl:terminator="%NUL; %NUL;%NUL; %NUL;%NUL;%NUL;"

which indicates that a string can be terminated by a sequence of 1, 2, or 3 
NUL characters. DFDL will still terminate at the earliest possible position, 
but once it finds a terminator there, it uses the longest one that matches. 
Unfortunately, there is no way to specify a terminator of "1 or more NULs". 
There is %WSP+; which means "1 or more whitespace", and %WSP*; which means 
"0 or more whitespace", but these are specific to whitespace and are not a 
general construct you can use with other characters.
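
As a rough, untested sketch, this is how that terminator might look when 
dropped into the delimited version of your schema. It assumes at most three 
consecutive NULs (the limit of three is only an illustration; add longer 
alternatives if your data needs them) and that the remaining format 
properties come from your default dfdl:format. Only the dfdl:terminator value 
differs from the delimited schema in your message:

```xml
<xs:element name="input">
    <xs:complexType>
        <xs:sequence>
            <!-- Assumption: no more than 3 NULs ever appear in a row. -->
            <!-- The terminator is a space-separated list of alternatives: -->
            <!-- one NUL, two NULs, or three NULs. -->
            <xs:element name="string" type="xs:string" maxOccurs="unbounded"
                dfdl:lengthKind="delimited"
                dfdl:representation="text"
                dfdl:encoding="ISO-8859-1"
                dfdl:textTrimKind="padChar"
                dfdl:textStringPadCharacter="%NUL;"
                dfdl:textStringJustification="left"
                dfdl:terminator="%NUL; %NUL;%NUL; %NUL;%NUL;%NUL;"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>
```

Note that this sketch does not try to handle a final string that runs to 
end-of-file with no NUL after it, which your lengthPattern version allowed 
for.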

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, April 1, 2019 2:47:33 PM
To: [email protected]
Subject: Question about parsing binary input containing strings separated by 
nulls


Hello DFDL community,



My binary input file contains: string null(s) string null(s) ….



The following DFDL schema correctly parses the input file:



<xs:element name="input">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="string" type="xs:string" maxOccurs="unbounded"
                dfdl:lengthKind="pattern"
                dfdl:lengthPattern="[\x00-\xFF]+?(?=\x00([^\x00]|$))"
                dfdl:representation="text"
                dfdl:encoding="ISO-8859-1"
                dfdl:textTrimKind="padChar"
                dfdl:textStringPadCharacter="%NUL;"
                dfdl:textStringJustification="left"
                dfdl:terminator="%NUL;"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>



But why do I need dfdl:lengthPattern?



Why can’t I simply state this: the input contains an unbounded number of 
strings, each string is padded by one or more nulls or ends at the end-of-file.



Why can’t I throw out dfdl:lengthPattern and set dfdl:lengthKind to 
“delimited”? Why doesn’t the following work correctly?



<xs:element name="input">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="string" type="xs:string" maxOccurs="unbounded"
                dfdl:lengthKind="delimited"
                dfdl:representation="text"
                dfdl:encoding="ISO-8859-1"
                dfdl:textTrimKind="padChar"
                dfdl:textStringPadCharacter="%NUL;"
                dfdl:textStringJustification="left"
                dfdl:terminator="%NUL;"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>



/Roger







