RE: Bug in Daffodil; regex

2022-04-06 Thread Brutzman, Donald (Don) (CIV)
There are many slight variations among regex application settings and syntax in different programming languages. Syntax is mostly identical but YMMV. Good references include * https://www.regular-expressions.info * https://www.regex101.com * more references listed at htt

Re: Bug in Daffodil

2022-04-06 Thread Mike Beckerle
Actually, all the regex engines work similarly. First off, Daffodil simply calls the Java regex engine. It has no regex engine of its own, and the Java regex engine behaves nearly identically to JS, VB.net, etc. etc., at least at this level of detail. This site: https://myregextester.com/index.ph

RE: Daffodil error messages are awful

2022-04-06 Thread Roger L Costello
Okay, after 10 hours of effort I finally found the bug in my DFDL schema: a dfdl:lengthPattern listed the regex alternatives in the wrong order (shortest to longest instead of longest to shortest). I wish the error message had given me some clue what the problem was. This error message was ab

Re: Bug in Daffodil

2022-04-06 Thread Roger L Costello
Thanks Mike. That is contrary to the way that regexes work in XSD. For example, here I list the regex choice alternatives shortest to longest: http://www.w3.org/2001/XMLSchema> This XML document validates against the XSD:
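One likely reason the two behave differently: an XSD pattern facet must match the entire value, so the engine keeps trying alternatives until one covers the whole string, whereas a length pattern only has to match a prefix of the remaining data. The sketch below illustrates the difference using Python's re module as a stand-in (Daffodil itself calls the Java regex engine). The pattern and the value are hypothetical, not taken from the schema in this thread.

```python
import re

# Hypothetical alternatives listed shortest to longest.
short_first = re.compile(r"FOO|FOOBAR")
value = "FOOBAR"

# Prefix matching (similar in spirit to a length pattern):
# the first alternative that succeeds wins, so only "FOO" is taken.
prefix = short_first.match(value).group(0)

# Whole-value matching (similar in spirit to an XSD pattern facet):
# "FOO" alone cannot cover the value, so the engine backtracks
# and the second alternative is used.
whole = short_first.fullmatch(value).group(0)

print(prefix)
print(whole)
```

Under whole-value matching the order of alternatives does not change whether the value is accepted, which is consistent with the XSD validation described above.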

Re: Bug in Daffodil

2022-04-06 Thread Beckerle, Mike
On that page, paragraph 4 under the heading "Remember That the Regex Engine is Eager" "I already explained that the regex engine is eager. It stops searching as soon as it finds a valid match. The consequence is that in certain situations, the

Re: Bug in Daffodil

2022-04-06 Thread Roger L Costello
Hi Mike, I read the web page you referenced. I don’t see where it says that the order of regex choice alternatives matter. Would you quote the sentence that says that, please? /Roger From: Mike Beckerle Sent: Wednesday, April 6, 2022 1:48 PM To: users@daffodil.apache.org Subject: [EXT] Re: Bu

Re: Bug in Daffodil

2022-04-06 Thread Mike Beckerle
This is standard regex behavior. Order of the regex choice alternatives matters very much. Authors of regex must organize for longest matches to be attempted first. See: https://www.regular-expressions.info/alternation.html This is one of the reasons DFDL delimiters don't let you just write a reg
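A minimal sketch of why the order matters, using Python's re module as a stand-in for the Java regex engine that Daffodil calls. The alternatives, the input, and the helper name matched_prefix are hypothetical and chosen only to show the effect; they are not the FreeText pattern from the original schema.

```python
import re

# Hypothetical input: a field value followed by a "//" terminator.
data = "FOOBAR//"

# Same alternatives, opposite order.
short_first = re.compile(r"FOO|FOOBAR")
long_first = re.compile(r"FOOBAR|FOO")


def matched_prefix(pattern, text):
    """Return the text the pattern matches at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


# The engine takes the first alternative that succeeds and stops.
print(matched_prefix(short_first, data))  # FOO
print(matched_prefix(long_first, data))   # FOOBAR
```

With the shortest alternative first, only "FOO" is consumed and "BAR//" is left behind, which is the kind of situation that can surface later as left over data.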

Bug in Daffodil

2022-04-06 Thread Roger L Costello
With this input: GENTEXT/FOO/TAS// The following DFDL generates the dreaded "Left over data" error: If I reverse the regex for FreeText: Then the error goes away. This seems like a

Re: Daffodil error messages are awful

2022-04-06 Thread Mike Beckerle
Adding to Steve's response: I created ticket https://issues.apache.org/jira/browse/DAFFODIL-2686 about the poor diagnostic for left over data. As for Daffodil "does not like slash in a regex": I think you have lots of slashes going around here. "//" as terminator, "/" as prefix separator, an

Re: Daffodil error messages are awful

2022-04-06 Thread Roger L Costello
Thanks Steve. Upon closer inspection of the actual regex (I had just shown a simplified version of the actual regex) I see that this: [/A-Z]* is actually sandwiched between two [A-Z] That is, the actual regex is: [A-Z][/A-Z]*[A-Z] That avoids the problem you described. Now this DFDL:
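A small sketch of how the sandwiched pattern behaves, again using Python's re module as a stand-in for the Java engine. The sample inputs are hypothetical; only the regex itself comes from the message above.

```python
import re

# The actual regex quoted in the message.
sandwiched = re.compile(r"[A-Z][/A-Z]*[A-Z]")

# Hypothetical data: a value containing an embedded slash, then "//".
data = "FOO/BAR//"
m = sandwiched.match(data)
consumed = m.group(0)
remaining = data[m.end():]

# The trailing [A-Z] forces the greedy middle part to backtrack
# until the match ends on a letter, so the "//" is not consumed.
print(consumed)   # FOO/BAR
print(remaining)  # //

# Note: the pattern needs at least two letters, so a lone "A" before
# the terminator does not match at all.
single = sandwiched.match("A//")
print(single)     # None
```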

Re: Daffodil error messages are awful

2022-04-06 Thread Steve Lawrence
Forward slashes work in regular expressions the way you expect. The issue is that your regular expression is consuming too much data. When Daffodil evaluates the regular expression, the data Daffodil is looking at looks like this: A// Your regular expression greedily matches one or more capital
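The greedy behavior described here can be reproduced in a few lines. This is a sketch using Python's re module rather than the Java engine Daffodil actually calls; the regex is the one quoted elsewhere in this thread, and the data is the "A//" shown above. The reading that the "//" is meant to be the terminator is taken from the other replies in the thread.

```python
import re

# The length pattern from the thread: slash or uppercase letter, one or more.
length_pattern = re.compile(r"[/A-Z]+")

# The data the regex is evaluated against.
data = "A//"

m = length_pattern.match(data)
consumed = m.group(0)
remaining = data[m.end():]

# Because "/" is in the character class, the greedy "+" keeps going
# through both slashes, leaving nothing for a "//" terminator.
print(repr(consumed))   # 'A//'
print(repr(remaining))  # ''
```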

Re: Daffodil error messages are awful

2022-04-06 Thread Roger L Costello
After many hours of effort, I figured out what is causing the error. For some reason, Daffodil does not like a forward slash in a regex: dfdl:lengthPattern="[/A-Z]+" The intent of that regex is to say that the input may contain a forward slash or any uppercase letter. That regex is contained i

Re: Daffodil error messages are awful

2022-04-06 Thread Mike Beckerle
I agree that every bad error message is a bug, and any error message that is unhelpful should be reported as one. The left-over data error you are seeing is a bit tricky. When Daffodil is invoked to consume data from a stream, this situation is not even an error at all, as it is perfectly n

Daffodil error messages are awful

2022-04-06 Thread Roger L Costello
Hi Folks, I ran Daffodil on my DFDL schema and got this error message: [error] Left over data. Consumed 1504 bit(s) with at least 3040 bit(s) remaining. Left over data (Hex) starting at byte 189 is: (0x0d0a47454e544558...) Left over data (UTF-8) starting at byte 189 is: (??GENTEX...) That is a