Thanks for the explanation, Steve. I had been looking into why things behave this way as well, and was a little confused about how the patterns in a dfdl:assert were handled.

It sounds like it is intentional that patterns can go beyond the bounds of local delimiters then, correct? I.e., in this example we have a sequence of strings that are separated by newlines, but the pattern for the first element of the sequence will read beyond the first newline. I agree that using a pattern restriction is generally preferable, but wanted to make sure that this behavior of dfdl:assert patterns reaching beyond local delimiters was intentional.

Josh

________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Monday, April 21, 2025 8:29 AM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: issue with using testPattern in an assertion

I think the issue is that with testKind="pattern", the dot wildcard character in a regex matches newlines--it behaves as if the "(?s)" flag were set on your regex. So essentially each time your regex is run it will scan the entire data stream and will always be a successful match as long as the data has an open parenthesis somewhere and ends with a close parenthesis, regardless of what line everything happens on.

One way to fix this is to not use the dot wildcard and instead use a character class to ensure your dots only match the characters you expect. There are a number of ways to do this, but if you want to match everything except newlines, you could do something like this:

  testPattern="[^\r\n]+\([^\r\n]+\)[\r\n]"

So that matches one or more non-newline characters, followed by one or more non-newline characters wrapped in parentheses, followed by a newline character.

An alternative approach, which has a number of benefits and is what I would recommend for this kind of thing, is to use an XSD pattern restriction instead of a recoverableError assertion, e.g.:

  <element name="line" ...>
    <simpleType>
      <restriction base="xs:string">
        <pattern value=".+\(.+\)" />
      </restriction>
    </simpleType>
  </element>

A pattern restriction looks only at the infoset content rather than the underlying data stream, so you don't have to worry about newlines anymore and you can use the original regular expression.

This is also nice because it's normal XSD, so other tools can be used to validate the values of the infoset, instead of relying only on Daffodil's testPattern. For example, if you add the "--validate on" option in the Daffodil CLI, it will use Xerces to validate the infoset, which outputs more verbose validation messages, such as which string failed the pattern restriction.

This is also nice in that if you don't care about validation you can just not enable the validation option, which can be useful for testing. There is no way to disable a testPattern assertion.

On 2025-04-17 02:55 PM, Mark Kozak wrote:
> Hello folks.
>
> I am reaching out for a sanity check please.
>
> I am seeing a regular expression behavior that was driving me mad, but may actually be a bug?
>
> The example below is a simplified version for illustration:
>
> The goal is to check that a line of text starts with a string and ends with another string in parentheses.
>
> Using the following data and subsequent schema, only the first line should pass validation. So I expect to see 5 validation failures. However, only the last line fails.
>
> Then just to keep things interesting, copy the first line to the end of the file, and then there are no validation failures at all.
>
> It appears that the assertion is being checked against only the last element in the sequence. Is that the intended behavior?
>
> I have tried this with 3.6 and 3.9 and get the same results both times.
>
> aaa(111)
> bbb
> (222)
> ccc(333)XXX
> ()
> (444)
>
> <element name="sample">
>   <complexType>
>     <sequence dfdl:separator="%NL;">
>       <element name="line" dfdl:lengthKind="delimited" type="xs:string"
>           dfdl:occursCountKind="implicit" maxOccurs="unbounded">
>         <annotation>
>           <appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:assert testKind="pattern" failureType="recoverableError"
>                 testPattern=".+\(.+\)" />
>           </appinfo>
>         </annotation>
>       </element>
>     </sequence>
>   </complexType>
> </element>
>
> Thank you for the help.
>
> Mark Kozak
> Director of Engineering
> Adeptus Cyber Solutions
> Adeptus-CS.com
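
For anyone who wants to see the three behaviors from this thread side by side, here is a rough Python sketch. It is only an approximation, not Daffodil itself: it assumes a testPattern is matched starting at each element's position in the data stream with dot matching newlines (as Steve describes), it uses Python's re module instead of Daffodil's regex engine, and it assumes the sample data has no trailing newline. The names DATA, line_offsets, check_stream, and check_values are made up for illustration.

```python
import re

# Sample data from the original report (assumed: no trailing newline).
DATA = "aaa(111)\nbbb\n(222)\nccc(333)XXX\n()\n(444)"

ORIGINAL = r".+\(.+\)"                      # pattern from the report
NO_NEWLINES = r"[^\r\n]+\([^\r\n]+\)[\r\n]"  # character-class variant


def line_offsets(data):
    """Return the starting offset of each newline-separated line."""
    offsets, pos = [], 0
    for line in data.split("\n"):
        offsets.append(pos)
        pos += len(line) + 1  # +1 skips the newline separator
    return offsets


def check_stream(pattern, data, flags=0):
    """Approximate an assert pattern: match from each line's start
    against the rest of the data stream, not just that line."""
    regex = re.compile(pattern, flags)
    return [regex.match(data, pos) is not None for pos in line_offsets(data)]


def check_values(pattern, data):
    """Approximate a pattern restriction: each element value on its
    own must match the pattern in full."""
    regex = re.compile(pattern)
    return [regex.fullmatch(line) is not None for line in data.split("\n")]


# Dot matches newlines: only the last line fails.
print(check_stream(ORIGINAL, DATA, re.DOTALL))
# -> [True, True, True, True, True, False]

# Character classes that exclude newlines: five failures.
print(check_stream(NO_NEWLINES, DATA))
# -> [True, False, False, False, False, False]

# Per-value check with the original regex: five failures.
print(check_values(ORIGINAL, DATA))
# -> [True, False, False, False, False, False]

# First line copied to the end: no failures at all.
print(check_stream(ORIGINAL, DATA + "\naaa(111)", re.DOTALL))
# -> [True, True, True, True, True, True, True]
```

Under those assumptions, the first and last calls reproduce what Mark observed (only the last line failing, then no failures once the first line is copied to the end), while the other two give the five failures he expected.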