The feature you want is lookahead.
In DFDL you get it with a dfdl:assert that has testKind="pattern" and a
regular expression in testPattern. The pattern is matched against the data
at the current position without consuming it.
So, for a single field, you can define it as either fixed length or
variable length depending on whether the upcoming data looks like 3
characters followed by another delimiter.

Each field can be defined this way on its own, isolated from the next, so
the schema does not become a coupled mess where every field has to be
combined with the one after it.

It's not perfect, because the separator is expressed in two places: once as
the sequence separator and again in the lookahead regex. On the other hand,
it expresses exactly the way you described the problem: "it's fixed length
if it's followed by a next field."

Something like this would go inside a sequence separated by "/":
<element name="b">
  <complexType>
    <choice>
      <sequence>
        <sequence>
          <annotation><appinfo ..>
            <!-- look ahead for 3 characters that are not a slash, CR, or
                 LF, then a slash. \R is not valid inside a character
                 class, so CR and LF are listed explicitly. -->
            <dfdl:assert testKind="pattern"
                         testPattern="[^/\r\n][^/\r\n][^/\r\n]/" />
          </appinfo></annotation>
        </sequence>
        <!-- this element named len is here to obey XSD's UPA rules. -->
        <element name="len" type="xs:unsignedInt"
                 dfdl:inputValueCalc="{ 3 }"/>
        <element name="str" type="xs:string"
                 dfdl:lengthKind="explicit" dfdl:length="{ ../len }"/>
        <!-- you could add space pad/trim to the str if you want it
             left justified -->
      </sequence>
      <element name="str" type="xs:string" dfdl:lengthKind="delimited"/>
    </choice>
  </complexType>
</element>
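
To illustrate how this stays isolated per field as the field count grows,
here is a minimal, untested sketch of the four-field case (A, B, C, D) from
the question. The type name PaddedOrLast is hypothetical, the pad/trim
properties are copied from the B-fixed-length element in the quoted schema,
and CR and LF are spelled out in the regex on the assumption that the regex
engine rejects \R inside a character class. It also assumes the xs and dfdl
prefixes and the default dfdl:format from the quoted schema, with no
targetNamespace.

```xml
<!-- Hypothetical reusable type: fixed length 3 when another field follows,
     otherwise delimited. The names PaddedOrLast, len and str are
     illustrative only. -->
<xs:complexType name="PaddedOrLast">
  <xs:choice>
    <xs:sequence>
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <!-- Lookahead: 3 characters that are not "/", CR or LF,
                 followed by "/". -->
            <dfdl:assert testKind="pattern"
                         testPattern="[^/\r\n][^/\r\n][^/\r\n]/"/>
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <!-- Computed element that keeps the two branches distinguishable
           for XSD's UPA rule. -->
      <xs:element name="len" type="xs:unsignedInt"
                  dfdl:inputValueCalc="{ 3 }"/>
      <!-- Fixed-length branch: left justified, space padded. -->
      <xs:element name="str" type="xs:string"
                  dfdl:lengthKind="explicit" dfdl:length="{ ../len }"
                  dfdl:textTrimKind="padChar"
                  dfdl:textPadKind="padChar"
                  dfdl:textStringPadCharacter="%SP;"
                  dfdl:textStringJustification="left"/>
    </xs:sequence>
    <!-- Variable-length branch: used when no further field follows. -->
    <xs:element name="str" type="xs:string" dfdl:lengthKind="delimited"/>
  </xs:choice>
</xs:complexType>

<xs:element name="Test">
  <xs:complexType>
    <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
      <xs:element name="A" type="xs:string"/>
      <!-- B and C reuse the same type; no permutations are enumerated. -->
      <xs:element name="B" type="PaddedOrLast" minOccurs="0"/>
      <xs:element name="C" type="PaddedOrLast" minOccurs="0"/>
      <xs:element name="D" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Under those assumptions, an input such as Hello/X should take the delimited
branch for B, because the lookahead finds no trailing slash, while an input
whose B is followed by another field should take the fixed-length branch.
Adding a fifth field would mean one more element line, not another level of
nested choices.
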
On Thu, Sep 28, 2023 at 9:09 AM Roger L Costello <[email protected]> wrote:
> My input is a single line consisting of three fields separated by slashes.
> The first field (A) can contain any string. The second field (B) has a
> fixed length (3); if the data does not consume the allotted 3 spaces, then
> the data is left-aligned and padded with spaces on the right. The third
> field (C) can contain any string. Here is a sample input:
>
> Hello/X  /Comment
>
> Notice the two padding spaces following X.
>
> Here is another sample input:
>
> Hello/XYZ/Comment
>
> That is all very straightforward and easily described in DFDL.
>
> Now for the complexity …
>
> The third field (C) is optional. If there is no data for the third field,
> then the data in the second field (B) does not need to be padded. So here
> is a valid input:
>
> Hello/X
>
> There is no padding following X. (Nor is there a slash separator)
>
> So, the second field (B) has a fixed length only if there is a third field
> (C).
>
> I created a DFDL schema which seems to correctly express this data format.
> See below. The approach I use is a choice for the second field:
>
> choice
>   <sequence>
>     element declaration for fixed length B
>     element declaration for C
>   </sequence>
>   element declaration for variable length B
>
> Eek! I don’t think that approach is scalable.
>
> Suppose instead of 3 fields, there are 4 fields, A, B, C, D. Suppose B, C,
> D are optional and B, C are fixed length unless there are no following
> fields then they are variable length. The choice approach quickly becomes
> untenable as all permutations must be described. Is there a better approach
> to this problem?
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>            xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            xmlns:fn="http://www.w3.org/2005/xpath-functions">
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:format alignment="1"
>                    alignmentUnits="bytes"
>                    emptyValueDelimiterPolicy="none"
>                    encoding="ASCII"
>                    encodingErrorPolicy="replace"
>                    escapeSchemeRef=""
>                    fillByte="%SP;"
>                    floating="no"
>                    ignoreCase="yes"
>                    initiatedContent="no"
>                    initiator=""
>                    leadingSkip="0"
>                    lengthKind="delimited"
>                    lengthUnits="characters"
>                    nilValueDelimiterPolicy="none"
>                    occursCountKind="implicit"
>                    outputNewLine="%CR;%LF;"
>                    representation="text"
>                    separator=""
>                    separatorSuppressionPolicy="anyEmpty"
>                    sequenceKind="ordered"
>                    textBidi="no"
>                    textPadKind="none"
>                    textTrimKind="none"
>                    trailingSkip="0"
>                    truncateSpecifiedLengthString="no"
>                    terminator=""
>                    textNumberRep="standard"
>                    textStandardBase="10"
>                    textStandardZeroRep="0"
>                    textNumberRounding="pattern"
>                    textStandardExponentRep="E"
>                    textNumberCheckPolicy="strict"/>
>     </xs:appinfo>
>   </xs:annotation>
>   <xs:element name="Test">
>     <xs:complexType>
>       <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
>         <xs:element name="A" type="xs:string" />
>         <xs:choice dfdl:choiceLengthKind="implicit">
>           <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
>             <xs:element name="B-fixed-length"
>                         dfdl:lengthKind="explicit"
>                         dfdl:length="3"
>                         dfdl:textTrimKind="padChar"
>                         dfdl:textPadKind="padChar"
>                         dfdl:textStringPadCharacter="%SP;"
>                         dfdl:textStringJustification="left">
>               <xs:simpleType>
>                 <xs:restriction base="validString">
>                   <xs:enumeration value="X"/>
>                   <xs:enumeration value="XY"/>
>                   <xs:enumeration value="XYZ"/>
>                 </xs:restriction>
>               </xs:simpleType>
>             </xs:element>
>             <xs:element name="C" type="xs:string"/>
>           </xs:sequence>
>           <xs:element name="B-variable-length">
>             <xs:simpleType>
>               <xs:restriction base="validString">
>                 <xs:enumeration value="X"/>
>                 <xs:enumeration value="XY"/>
>                 <xs:enumeration value="XYZ"/>
>               </xs:restriction>
>             </xs:simpleType>
>           </xs:element>
>         </xs:choice>
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>
>   <xs:simpleType name="validString">
>     <xs:annotation>
>       <xs:appinfo source="http://www.ogf.org/dfdl/">
>         <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
>       </xs:appinfo>
>     </xs:annotation>
>     <xs:restriction base="xs:string"/>
>   </xs:simpleType>
>
> </xs:schema>