Thank you Steve. Terrific explanation.
I tried the approach you described - dfdl:lengthKind="pattern"
dfdl:lengthPattern="ABC|AB|AC|A" - and it worked great.
I also tried using enumeration facets coupled with dfdl:checkConstraints within
a dfdl:assert:
<xs:element name="item1">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert
        test="{ dfdl:checkConstraints(.) }"
        message="The value of item1 is not one of the allowable values" />
    </xs:appinfo>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="A" />
      <xs:enumeration value="ABC" />
      <xs:enumeration value="AB" />
      <xs:enumeration value="AC" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
But that did not work. Why does that not work?
/Roger
-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Monday, July 12, 2021 2:39 PM
To: [email protected]
Subject: [EXT] Re: How to specify data with two fields, no delimiter, variable
length?
In cases like these, you need to use dfdl:lengthKind="pattern" and a regular
expression to define the length of the first item.
There are lots of different regexes you could use, depending on what kinds of
infosets you want to allow.
For example, one approach for the first item is a very strict regex that
matches exactly one of the four values, e.g.
<xs:element name="item" type="xs:string"
dfdl:lengthKind="pattern" dfdl:lengthPattern="ABC|AB|AC|A" />
With this approach, the item will get a non-zero length if the data starts with
one of those values. Otherwise the item will be the empty string. If you don't
want the empty string to be allowed, you need to add an assert that the length
is greater than zero. Also note that the order of the alternatives in the regex
matters: ABC is listed before AB, AC, and A so that the longest possibility is
tried first.
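As a rough sketch of that assert (untested, and assuming the fn prefix is bound to http://www.w3.org/2005/xpath-functions in your schema; the message text is just illustrative), it might look something like this:

```xml
<xs:element name="item" type="xs:string"
  dfdl:lengthKind="pattern" dfdl:lengthPattern="ABC|AB|AC|A">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Fail the parse when the pattern matched nothing,
           i.e. the item came back as the empty string. -->
      <dfdl:assert
        test="{ fn:string-length(.) gt 0 }"
        message="item must be one of A, AB, AC, or ABC" />
    </xs:appinfo>
  </xs:annotation>
</xs:element>
```

The assert is evaluated after the element has been parsed, so it sees the string that the pattern actually captured.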
On the other end of the spectrum, you could instead model the first item to
match as many non-digits as possible:
<xs:element name="item" type="xs:string"
dfdl:lengthKind="pattern" dfdl:lengthPattern="[^0-9]*" />
This will match any of the four allowed values, but will also match anything
else up to the first digit. So this could potentially produce infosets with an
item value of XYZ, for example. In some cases you might actually want this: we
might consider such data to be "well-formed" but not "valid". You still get an
infoset, it just isn't "valid", whereas with the first approach you can only
get a valid infoset.
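If you go the "well-formed but not valid" route, one way to express which values count as valid is to keep the lenient pattern for the length and put the allowed values in enumeration facets. This is only a sketch: it assumes your processor checks facets during validation (or via dfdl:checkConstraints) rather than during length determination, so the facets would not change how much data the item consumes.

```xml
<xs:element name="item"
  dfdl:lengthKind="pattern" dfdl:lengthPattern="[^0-9]*">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- The facets describe validity only; the length still comes
           from the pattern, so XYZ would parse but not be valid. -->
      <xs:enumeration value="A" />
      <xs:enumeration value="AB" />
      <xs:enumeration value="AC" />
      <xs:enumeration value="ABC" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```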
You'll probably also need to use regex length for matching the numeric item if
there's no delimiter after the number.
So putting it together, and using the second approach for both items, you might
do something like this:
<xs:sequence>
  <xs:element name="item1" type="xs:string"
    dfdl:lengthKind="pattern" dfdl:lengthPattern="[^0-9]*" />
  <xs:element name="item2" type="xs:int"
    dfdl:lengthKind="pattern" dfdl:lengthPattern="[0-9]*" />
</xs:sequence>
So the first item is string parsing as many non-digits as possible, and the
second is an int parsing as many digits as possible. Note that this approach
probably should have limits on the regex length in case the data is
bad/malformed. For example, if the data contained no digits, item1 would
consume the entire data. So instead of *, you might want to use a bounded
quantifier such as "{0,10}" in both regexes.
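A bounded version of the sequence might look like the sketch below. The limit of 10 is arbitrary; since the second item is described as 0-999, a tighter bound such as {1,3} for the digits may be a better fit, but that is an assumption about your data.

```xml
<xs:sequence>
  <!-- At most 10 non-digit characters, so bad data cannot make
       item1 swallow the whole input. -->
  <xs:element name="item1" type="xs:string"
    dfdl:lengthKind="pattern" dfdl:lengthPattern="[^0-9]{0,10}" />
  <!-- At most 10 digits; an empty match would then fail to
       convert to xs:int. -->
  <xs:element name="item2" type="xs:int"
    dfdl:lengthKind="pattern" dfdl:lengthPattern="[0-9]{0,10}" />
</xs:sequence>
```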
- Steve
On 7/12/21 2:05 PM, Roger L Costello wrote:
> Hi Folks,
>
> I have a data field composed of two items.
>
> The values for the first item can be enumerated:
>
> A
> ABC
> AB
> AC
>
> The value of the second item is any integer 0-999
>
> So, here is a sample data field:
>
> A250
>
> How do I parse that using DFDL? I reckon I'm stuck.
>
> /Roger
>