Thanks Mike. But that’s simply not going to work. I am writing a program which inputs an arbitrary sequence of XSD element declarations. The elements use pattern facets or enumeration facets (no separator between the elements). That XSD is scaffolding. My program must add the appropriate DFDL properties to that scaffolding.
Let me generalize my question: Write a program that inputs an arbitrary sequence of XSD element declarations. The sequence of elements represents the parts of one data field. Each part may be of fixed or variable length. There is no separator between the parts. The parts are non-nillable. The program must output the element declarations with the appropriate DFDL properties added. How can this be achieved?

From: Mike Beckerle <[email protected]>
Sent: Thursday, August 18, 2022 1:05 PM
To: [email protected]
Subject: [EXT] Re: Bug in Daffodil

Well, your first field is fixed length 2, so the boundary there is not a problem.

The second field ends where the E/W appear. I would do this with lengthKind pattern and lookahead: ".*?(?=(E|W))". Note that your digits play no role in this. Those stay in the pattern facet. This regex is looking for (but not including) the start of what comes next. In my experience this is absolutely the most common idiom for lengthKind 'pattern'. The patterns do not care at all what the value looks like, only about using lookahead to find where it must end.

The third field is followed by a hyphen, so that's a terminator.
The fourth is fixed length.
The fifth is delimited.

On Thu, Aug 18, 2022 at 12:55 PM Roger L Costello <[email protected]> wrote:

  * Is there a delimiter after the longitude like the "/"?

No. Here’s the actual sequence of elements along with their pattern facet regex or enumeration values (no separator between them):

LatitudeDegrees
    [0-9]{2}
LatitudeMinutes
    [0-9]{2}
    [0-9]{2}\.[0-9]{1}
    [0-9]{2}\.[0-9]{2}
    [0-9]{2}\.[0-9]{3}
    [0-9]{2}\.[0-9]{4}
Hemisphere
    E
    W
Hyphen
    -
LongitudeDegrees
    [0-9]{3}
LongitudeMinutes
    [0-9]{2}
    [0-9]{2}\.[0-9]{1}
    [0-9]{2}\.[0-9]{2}
    [0-9]{2}\.[0-9]{3}
    [0-9]{2}\.[0-9]{4}

Following that sequence is a slash delimiter.
This discussion has helped me to clarify the modeling problem: How to create DFDL for a data field consisting of a series of parts (with no separator between the parts), where each part may be of fixed or variable length (the parts are non-nillable)? How do you answer that?

/Roger

From: Mike Beckerle <[email protected]>
Sent: Thursday, August 18, 2022 12:43 PM
To: [email protected]
Subject: [EXT] Re: Bug in Daffodil

Is there a delimiter after the longitude like the "/"? If so then it is only the latitude field that is fixed length.

On Thu, Aug 18, 2022 at 12:40 PM Roger L Costello <[email protected]> wrote:

> You'll need to use lengthKind="pattern" in this case.

Ugh! I thought, with the use of -V limited, I had finally gotten rid of lengthKind="pattern". Now, with what you're telling me, I find myself back to the old tedious, error-prone task of ordering regexes longest-to-shortest. In fact, writing a program that examines an arbitrary set of pattern facet regexes to order them longest-to-shortest is going to be extremely difficult or even impossible. This is horrible! Is there no other solution?

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Thursday, August 18, 2022 12:24 PM
To: [email protected]
Subject: [EXT] Re: Bug in Daffodil

You'll need to use lengthKind="pattern" in this case. You could combine your pattern restrictions into a big regex of alternatives, or you could do something a little less verbose like this:

  <xs:element name="LatitudeMinutes"
    dfdl:lengthKind="pattern"
    dfdl:lengthPattern="[0-9]{2}(\.[0-9]{1,4})?" />

Matches the same thing, but is a bit more compact. The same pattern could be used for the restriction.

Alternatively, if you wanted to differentiate between well-formed/valid (i.e.
different length pattern than restriction pattern), you could even do something like this:

  <xs:element name="LatitudeMinutes"
    dfdl:lengthKind="pattern"
    dfdl:lengthPattern="[0-9]+(\.[0-9]+)?" />

So parsing would accept any decimal number with optional decimal digits, and then validation could restrict this to the appropriate number of digits using the existing facets. Note that treating it as an xs:decimal instead of xs:string might give you more control (e.g. value must be >= 0 and < 60).

The Hemisphere element would have an explicit length of 1, e.g.

  <xs:element name="Hemisphere" dfdl:lengthKind="explicit" dfdl:length="1">

On 8/18/22 12:03 PM, Roger L Costello wrote:
> Thanks Steve. Unfortunately, specifying an explicit length on each element is
> not going to work. The second element - LatitudeMinutes - can actually be 2,
> 4, 5, 6, or 7 characters in length:
>
>   <xs:element name="LatitudeMinutes">
>     <xs:simpleType>
>       <xs:restriction base="xs:string">
>         <xs:pattern value="[0-9]{2}"/>
>         <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
>         <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
>         <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
>         <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
>       </xs:restriction>
>     </xs:simpleType>
>   </xs:element>
>
> And after it are more elements. For example, following it is this element
>
>   <xs:element name="Hemisphere">
>     <xs:simpleType>
>       <xs:restriction base="xs:string">
>         <xs:enumeration value="N"/>
>         <xs:enumeration value="S"/>
>       </xs:restriction>
>     </xs:simpleType>
>   </xs:element>
>
> How to handle this situation?
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Thursday, August 18, 2022 11:54 AM
> To: [email protected]
> Subject: [EXT] Re: Bug in Daffodil
>
> You haven't specified a length of the LatitudeDegrees (or
> LatitudeMinutes). So the lengthKind is just delimited and so will end up
> delimited by the nearest enclosing delimiter, which is the /.
> So LatitudeDegrees is parsed as "2006", and things go off the rails.
>
> Instead, you want your LatitudeDegrees/Minutes elements to have
> lengthKind="explicit" with length="2", e.g.:
>
>   <xs:element name="Origin">
>     <xs:complexType>
>       <xs:sequence>
>         <xs:element name="LatitudeDegrees"
>             dfdl:lengthKind="explicit" dfdl:length="2">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:pattern value="[0-9]{2}"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>         <xs:element name="LatitudeMinutes"
>             dfdl:lengthKind="explicit" dfdl:length="2">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:pattern value="[0-9]{2}"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>
> On 8/18/22 11:08 AM, Roger L Costello wrote:
>> Hi Folks,
>>
>> Daffodil is unable to parse DFDL schemas containing two consecutive element
>> declarations, each with a simpleType which has a facet.
>>
>> With this input:
>>
>> John Doe/2006/Sally Smith
>>
>> The part of interest is the middle part - 2006 - which consists of two
>> subparts: 20 (LatitudeDegrees) and 06 (LatitudeMinutes). Each subpart is
>> constrained via XSD facets.
>>
>> I get this error message when I parse using Daffodil version 3.2.1 (using
>> the -V limited option):
>>
>> [error] Validation Error: LatitudeMinutes failed facet checks due to: facet
>> enumeration(s): 06
>>
>> Below is my DFDL schema.
>>
>> I believe this is a bug, yes? Is there a workaround?
>>
>> <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>>            xmlns:xs="http://www.w3.org/2001/XMLSchema">
>>   <xs:annotation xmlns:f="function"
>>                  xmlns:fn="http://www.w3.org/2005/xpath-functions"
>>                  xmlns:regex="regex-functions">
>>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>>       <dfdl:format alignment="1"
>>                    alignmentUnits="bytes"
>>                    emptyValueDelimiterPolicy="none"
>>                    encoding="ASCII"
>>                    encodingErrorPolicy="replace"
>>                    escapeSchemeRef=""
>>                    fillByte="%SP;"
>>                    floating="no"
>>                    ignoreCase="yes"
>>                    initiatedContent="no"
>>                    initiator=""
>>                    leadingSkip="0"
>>                    lengthKind="delimited"
>>                    lengthUnits="characters"
>>                    nilValueDelimiterPolicy="none"
>>                    occursCountKind="implicit"
>>                    outputNewLine="%CR;%LF;"
>>                    representation="text"
>>                    separator=""
>>                    separatorSuppressionPolicy="anyEmpty"
>>                    sequenceKind="ordered"
>>                    textBidi="no"
>>                    textPadKind="none"
>>                    textTrimKind="none"
>>                    trailingSkip="0"
>>                    truncateSpecifiedLengthString="no"
>>                    terminator=""
>>                    textNumberRep="standard"
>>                    textStandardBase="10"
>>                    textStandardZeroRep="0"
>>                    textNumberRounding="pattern"
>>                    textStandardExponentRep="E"
>>                    textNumberCheckPolicy="strict"/>
>>     </xs:appinfo>
>>   </xs:annotation>
>>   <xs:element name="Test">
>>     <xs:complexType>
>>       <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
>>         <xs:element name="A" type="xs:string" />
>>         <xs:element name="Origin">
>>           <xs:complexType>
>>             <xs:sequence dfdl:separator="">
>>               <xs:element name="LatitudeDegrees">
>>                 <xs:simpleType>
>>                   <xs:restriction base="xs:string">
>>                     <xs:pattern value="[0-9]{2}"/>
>>                   </xs:restriction>
>>                 </xs:simpleType>
>>               </xs:element>
>>               <xs:element name="LatitudeMinutes">
>>                 <xs:simpleType>
>>                   <xs:restriction base="xs:string">
>>                     <!--<xs:pattern value="[0-9]{2}"/>--> <!-- This also fails -->
>>                     <xs:enumeration value="06" />
>>                   </xs:restriction>
>>                 </xs:simpleType>
>>               </xs:element>
>>             </xs:sequence>
>>           </xs:complexType>
>>         </xs:element>
>>         <xs:element name="B" type="xs:string" />
>>       </xs:sequence>
>>     </xs:complexType>
>>   </xs:element>
>> </xs:schema>
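
Pulling the replies in this thread together, here is a minimal, untested sketch of how the Origin element might look once DFDL properties are added for the full sequence of parts (LatitudeDegrees through LongitudeMinutes). Several things are assumptions rather than anything stated in the thread: the dfdl:format defaults from the original schema are in scope (in particular lengthKind="delimited" and lengthUnits="characters"), Origin still sits inside the sequence with the "/" separator, the Hyphen part is kept as its own one-character element instead of being modeled as a terminator, and the pattern and enumeration facets are replaced by plain xs:string to keep the fragment short. The namespace declarations are repeated on Origin only so the fragment is well-formed on its own.

```xml
<!-- Hypothetical sketch assembled from the replies above; not run against Daffodil. -->
<xs:element name="Origin"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:complexType>
    <xs:sequence dfdl:separator="">
      <!-- Fixed length: always two digits. -->
      <xs:element name="LatitudeDegrees" type="xs:string"
          dfdl:lengthKind="explicit" dfdl:length="2"/>
      <!-- Variable length: the lookahead stops just before the hemisphere
           letter (E|W exactly as written in Mike's reply). -->
      <xs:element name="LatitudeMinutes" type="xs:string"
          dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(E|W))"/>
      <!-- One character, per Steve's reply. -->
      <xs:element name="Hemisphere" type="xs:string"
          dfdl:lengthKind="explicit" dfdl:length="1"/>
      <!-- Assumption: the literal hyphen kept as a one-character element. -->
      <xs:element name="Hyphen" type="xs:string"
          dfdl:lengthKind="explicit" dfdl:length="1"/>
      <!-- Fixed length: always three digits. -->
      <xs:element name="LongitudeDegrees" type="xs:string"
          dfdl:lengthKind="explicit" dfdl:length="3"/>
      <!-- Delimited: ends at the enclosing "/" separator. -->
      <xs:element name="LongitudeMinutes" type="xs:string"
          dfdl:lengthKind="delimited"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Only LatitudeMinutes needs a length pattern here, and that pattern describes what comes next, not the value itself, so the facet regexes never have to be ordered longest-to-shortest.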
