On Tue, 8 Sep 2015 13:46:48 +0200 Mark Davis ☕️ <m...@macchiato.com> wrote:
> On Tue, Sep 8, 2015 at 9:53 AM, Asmus Freytag (t)
> <asmus-...@ix.netcom.com> wrote:

<snip>

> > What about set operations on sets with string ranges?

> Again, the range notation is just a formatting issue. Anything you
> can do with [{ax}-{bz}] you can also do with
> [{ax}{ay}{az}{bx}{by}{bz}], and vice versa, since the former is
> defined to be equivalent to the latter. These are just string
> representations of the same *logical* underlying implementation.

> > Can they be expressed (other than working them out and writing down
> > the full enumeration of the resulting set)?

> I'm not quite sure what you mean. That's like asking, "Can [a-z] be
> expressed, other than by writing out the full enumeration [a b c d
> e ... z]?". Well, yes. You could represent [a-z] in many ways:
> [\p{ASCII}&\p{lu}], for example. Or [\u0061 \u0062 ...]. Or....
> But I'm probably misunderstanding what you are trying to say.

I think Asmus is asking whether the result of a set operation on
string ranges has a more compact representation than a list of all
the string elements. The answer would then be yes. Just as
[a-z]-[e-s] can be written (and represented internally) as [a-dt-z],
so [{aa}-{zz}]-[{ee}-{ss}] can be written (and represented internally)
as the union of four non-overlapping string ranges:

[{aa}-{dz} {ea}-{sd} {et}-{sz} {ta}-{zz}]

Fortunately, unions of string ranges of the same length commute,
which is not necessarily the case for Unicode sets. (It is possible
that [[a][{ab}]] might preferentially match "a" while [[{ab}][a]]
preferentially matches "ab".)

Richard.
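
P.S. A minimal sketch of how such a difference could be computed without enumerating the strings. It assumes the reading quoted above, where [{ax}-{bz}] means exactly {ax}{ay}{az}{bx}{by}{bz}, so each character position varies independently over its own interval. The names parse, subtract, expand and show are hypothetical helpers for illustration, not part of any Unicode or ICU API.

```python
from itertools import product

def parse(lo, hi):
    """Turn endpoints such as ("aa", "zz") into one (lo, hi) code point
    interval per character position."""
    if len(lo) != len(hi):
        raise ValueError("endpoints must have the same length")
    if any(a > b for a, b in zip(lo, hi)):
        raise ValueError("each position must satisfy lo <= hi")
    return tuple((ord(a), ord(b)) for a, b in zip(lo, hi))

def subtract(a, b):
    """Return disjoint string ranges that together cover a minus b."""
    # If the ranges miss each other in any position, nothing is removed.
    if any(alo > bhi or ahi < blo
           for (alo, ahi), (blo, bhi) in zip(a, b)):
        return [a]
    pieces = []
    rest = list(a)
    for i, ((alo, ahi), (blo, bhi)) in enumerate(zip(a, b)):
        # Slice off what lies below and above b in position i ...
        if alo < blo:
            pieces.append(tuple(rest[:i] + [(alo, blo - 1)] + rest[i + 1:]))
        if ahi > bhi:
            pieces.append(tuple(rest[:i] + [(bhi + 1, ahi)] + rest[i + 1:]))
        # ... then keep only the overlap in this position and move on.
        rest[i] = (max(alo, blo), min(ahi, bhi))
    return pieces

def expand(r):
    """Enumerate every string in a range (only used for checking)."""
    return {"".join(map(chr, cps))
            for cps in product(*(range(lo, hi + 1) for lo, hi in r))}

def show(r):
    """Format a range in the {lo}-{hi} notation used in this thread."""
    return "{%s}-{%s}" % ("".join(chr(lo) for lo, _ in r),
                          "".join(chr(hi) for _, hi in r))

whole = parse("aa", "zz")
hole = parse("ee", "ss")
result = sorted(subtract(whole, hole))
print("[" + " ".join(show(r) for r in result) + "]")
```

For strings of length n the sketch produces at most 2n ranges per subtraction, so the result stays compact even when the enumerated set is large.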