> pattern facet alternatives don't need to be sorted

That is terrific news! That makes things even simpler - if given a simpleType 
with a pattern facet, just use it as is. No sorting of its alternatives needed. 

Thanks Steve!

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]> 
Sent: Tuesday, August 9, 2022 7:41 AM
To: [email protected]
Subject: [EXT] Re: Do I need to sort the xs:pattern regex alternatives 
longest-to-shortest?

I thought restriction pattern facet alternatives don't need to be 
sorted? This is because the pattern must match the entire value of the 
element--there's essentially an implied ^ and $ surrounding the pattern.

So for example, suppose the pattern facet was "(ab|abc.*)" and the infoset was

   <foo>abcxyz</foo>

The first alternative would match the start of the value, but not the 
entire value. And so the second alternative would be tried and would 
match, so this would validate. The order it tries the match doesn't 
really matter since it's all or nothing.
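
A rough way to see this outside of any XSD validator is the Python sketch below. It is only an illustration: Python's re module is not the XSD regex dialect, and facet_matches is a hypothetical helper name, but for simple patterns like these re.fullmatch plays the role of the implied ^ and $.

```python
import re

# Pattern and value taken from the example above.
PATTERN = r"(ab|abc.*)"
VALUE = "abcxyz"


def facet_matches(pattern: str, value: str) -> bool:
    """Approximate pattern-facet validation: the whole value must match."""
    return re.fullmatch(pattern, value) is not None


# A prefix match stops at the first alternative that succeeds ("ab").
prefix = re.match(PATTERN, VALUE)
print("prefix match:", prefix.group(0))

# A whole-value match backtracks into the second alternative, so the
# order of the alternatives makes no difference to the result.
print("original order:", facet_matches(PATTERN, VALUE))
print("reversed order:", facet_matches(r"(abc.*|ab)", VALUE))
```

The same holds for Roger's pattern: "abc|abcd" and "abcd|abc" both accept the value "abcd" when the match has to cover the whole value.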

The reason why they need to be sorted with Daffodil's lengthPattern is 
because Daffodil doesn't know where the end of the data is. We use the 
pattern for scanning. So once a pattern comes back with a match (which 
could be the first alternative) we stop scanning. We don't continue 
scanning trying all regex alternatives to find the longest, for example.
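
The scanning behavior can be mimicked with a prefix match, as in the Python sketch below. This is not Daffodil's code; scan_length is a hypothetical helper that only imitates "take the first match from the current position and stop", using Roger's "abc|abcd" pattern.

```python
import re
from typing import Tuple


def scan_length(pattern: str, data: str) -> Tuple[str, str]:
    """Return (matched text, left over data) for a prefix match on data."""
    m = re.match(pattern, data)
    if m is None:
        raise ValueError("pattern did not match at the start of the data")
    return m.group(0), data[m.end():]


# Shortest alternative first: the scan stops at "abc" and leaves "d" behind.
print(scan_length(r"abc|abcd", "abcd"))

# Longest alternative first: the scan consumes all four characters.
print(scan_length(r"abcd|abc", "abcd"))
```

With the shorter alternative first, the trailing "d" is never consumed, which corresponds to the "left over data" error Roger describes.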

On 8/5/22 5:50 PM, Mike Beckerle wrote:
> Yes you do. All the regex engines I know are greedy.
> 
> Besides regexes just being fussy, this is the main reason DFDL has a delimiter
> language that is its own thing. Because the delimiters are specified in
> different places, not all together as in a regex. Hence the user has no
> opportunity to sort longest to shortest, so DFDL delimiters match all the
> possible delimiters that can appear at a point with longest match preferred.
> 
> 
> 
> On Fri, Aug 5, 2022, 1:54 PM Roger L Costello <[email protected]
> <mailto:[email protected]>> wrote:
> 
>      Hi Folks,
> 
>      Recall that when using dfdl:lengthPattern you must specify its regex
>      alternatives longest-to-shortest. For example, if you specify this:
> 
>      dfdl:lengthPattern="abc|abcd"
> 
>      then you will get a "left over data" error message.
> 
>      So you must sort the alternatives in longest-to-shortest order. That is a
>      hassle.
> 
>      The "-V limited" option changes things. It enables me to abandon
>      dfdl:lengthPattern and instead use the XSD pattern facet:
> 
>      <simpleType>
>           <restriction base="string">
>               <pattern value="abc|abcd"/>
>           </restriction>
>      </simpleType>
> 
>      Question: Do I need to sort the pattern facet alternatives in
>      longest-to-shortest order? I am hoping the answer is "no".
> 
>      /Roger
> 
