Re: Tag characters

2015-05-18 Thread Mark Davis ☕️
A few notes. A more concrete proposal will be in a PRI to be issued soon, and people will have a chance to comment more then. (I'm not trying to discourage discussion, just pointing out that there will be something more concrete relatively soon to comment on; people are pretty busy getting 8.0 out

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
On Tue, 19 May 2015 01:25:54 +0200 Philippe Verdy wrote: > I don't work with strings, but with what you seem to call "traces", For the concept of traces, Wikipedia suffices: https://fr.wikipedia.org/wiki/Mono%C3%AFde_des_traces . As far as text manipulation is concerned, the word 'trace' is an

Re: [OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-19 0:50 GMT+02:00 Doug Ewell : > > but for now we just have "i-", deprecated but still valid, > > "i-" is not deprecated. In the IANA database they are all replaced. I call that "deprecated" a bit abusively, but there's no longer any interest in them. >> for all other letters there's no par

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Philippe Verdy
I don't work with strings, but with what you seem to call "traces", but that I call sets of states (they are in fact bitsets, which may be compacted or just stored as arrays of bytes containing just 1 useful bit, but which may be a bit faster; byte arrays are just simpler to program), in a stack
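The idea of tracking sets of states instead of backtracking over the input can be sketched as a Thompson-style NFA simulation in which the active states form an integer bitset. This is a minimal illustration of the general technique, not Philippe's code: the two-state automaton, the `TRANSITIONS` table and `run()` are hypothetical, and the pattern is \u0F73*\u0F72 on raw code points, with canonical equivalence deliberately ignored.

```python
# Hypothetical two-state NFA for the pattern \u0F73*\u0F72 on raw code points
# (canonical equivalence is deliberately ignored here).
# State 0 is the start state; state 1 is accepting.
TRANSITIONS = {
    (0, "\u0F73"): 0b01,  # loop in state 0 on U+0F73
    (0, "\u0F72"): 0b10,  # move to state 1 on U+0F72
}
NUM_STATES = 2
START = 0b01
ACCEPT = 0b10


def run(text: str) -> bool:
    """Whole-string match: each input character is read exactly once and the
    set of active states is kept as an integer bitset, so nothing is backtracked."""
    active = START
    for ch in text:
        following = 0
        for state in range(NUM_STATES):
            if active & (1 << state):
                following |= TRANSITIONS.get((state, ch), 0)
        active = following
        if not active:  # no live states left: fail early
            return False
    return bool(active & ACCEPT)


print(run("\u0F73\u0F73\u0F72"))  # True
print(run("\u0F72\u0F72"))        # False
```

A plain bitset like this suffices for regular languages; the extra machinery discussed in the thread (such as a stack of state sets) becomes necessary because canonical equivalence can make the accepted language non-regular.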

[OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
This is why I knew I would regret it. Clearing up some errors here. No more posts from me on this non-Unicode topic after this one. Philippe Verdy wrote: >> This would be a major revision to BCP 47, it would have nothing to do >> with reordering, > > It would have to do because all subtags aft

Re: [OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-18 23:55 GMT+02:00 Doug Ewell : > Philippe Verdy wrote: > > > If ever the country codes used in BCP47 become full (all pairs of > > letters used), just some time before this happens, we could see new > > prefixes added before a new range of codes. It is possible to use a > > 1-letter pref

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-18 23:38 GMT+02:00 Doug Ewell : > Philippe Verdy wrote: > > > So country codes cannot be reassigned (and we can expect many more > > merges/splits or changes of regimes in the many troubled areas of the > > world. > > Changes of regimes don't usually result in new 3166 code elements. The

[OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
Philippe Verdy wrote: > If ever the country codes used in BCP47 become full (all pairs of > letters used), just some time before this happens, we could see new > prefixes added before a new range of codes. It is possible to use a > 1-letter prefix for new country/territory code extensions, but wi

RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
Philippe Verdy wrote: >> ISO 3166-1 already defines alpha-3 and numeric code elements, as well >> as alpha-2. > > But how to work with the 2-letter limitation when the world wants > more stability in codes (this was an important reason why ISO 639 was > not fully integrated in IETF tags, and why

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
On Mon, 18 May 2015 22:56:47 +0200 Philippe Verdy wrote: > Isn't it possible for your basic substitution to transform \u0F73 > into a character class [\u0F71\u0F72\u0F73] that the regexp considers > as a single entity to check? > In that case, backtracking for matching \u0F73*\u0F72 is simpler:

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Philippe Verdy
Isn't it possible for your basic substitution to transform \u0F73 into a character class [\u0F71\u0F72\u0F73] that the regexp considers as a single entity to check? In that case, backtracking for matching \u0F73*\u0F72 is simpler: [\u0F71\u0F72\u0F73]*\u0F72, as it just requires backtracking only
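On its own, the character class has the attraction of being a regular pattern, but as a filter it over-accepts. The sketch below compares it with a brute-force canonical check; it assumes the intended pattern is \u0F73*\u0F72 with the class written as [\u0F71\u0F72\u0F73], tests whole strings only, and both helper names are hypothetical. For example, <U+0F72, U+0F72> satisfies the class pattern but is not canonically equivalent to any string that \u0F73*\u0F72 matches.

```python
import re
import unicodedata

# The character-class approximation from the thread (U+0F7x code points).
CLASS_PATTERN = re.compile("[\u0F71\u0F72\u0F73]*\u0F72")


def class_match(s: str) -> bool:
    """Hypothetical helper: whole-string match against the class pattern."""
    return CLASS_PATTERN.fullmatch(s) is not None


def canonical_match(s: str) -> bool:
    """Hypothetical helper: is `s` canonically equivalent to U+0F73 repeated n
    times followed by U+0F72, for some n?  Brute force over n via NFD."""
    target = unicodedata.normalize("NFD", s)
    for n in range(len(target) + 1):
        candidate = unicodedata.normalize("NFD", "\u0F73" * n + "\u0F72")
        if candidate == target:
            return True
    return False


for sample in ["\u0F73\u0F72", "\u0F72\u0F71\u0F72", "\u0F72\u0F72"]:
    print([f"U+{ord(c):04X}" for c in sample], class_match(sample), canonical_match(sample))
```

The last sample is accepted by the class but rejected by the canonical check, so a class-based pre-filter still needs a further counting step.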

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
If ever the country codes used in BCP47 become full (all pairs of letters used), just some time before this happens, we could see new prefixes added before a new range of codes. It is possible to use a 1-letter prefix for new country/territory code extensions, but with some maintenance of BCP47 par
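For reference, RFC 5646 (BCP 47) gives the region subtag the syntax 2ALPHA / 3DIGIT, so the alpha-2 space under discussion holds 26 × 26 = 676 combinations in total, assigned or not. The sketch checks syntax only; whether a subtag is actually registered is a separate question answered by the IANA Language Subtag Registry. The helper name is hypothetical.

```python
import re
from string import ascii_uppercase

# RFC 5646 (BCP 47): region = 2ALPHA / 3DIGIT  (syntax only, not registry validity)
REGION_SYNTAX = re.compile(r"[A-Za-z]{2}|[0-9]{3}")


def is_region_subtag_syntax(subtag: str) -> bool:
    """Hypothetical helper: does `subtag` have the form of a BCP 47 region subtag?"""
    return REGION_SYNTAX.fullmatch(subtag) is not None


# Every possible two-letter combination, assigned or not.
ALPHA2_CAPACITY = len(ascii_uppercase) ** 2

print(ALPHA2_CAPACITY)                 # 676
print(is_region_subtag_syntax("GB"))   # True
print(is_region_subtag_syntax("419"))  # True (UN M.49 code, as in es-419)
print(is_region_subtag_syntax("GBR"))  # False: alpha-3 codes are not region subtags
```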

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
On Mon, 18 May 2015 22:40:21 +0300 Eli Zaretskii wrote: > > Date: Mon, 18 May 2015 19:35:45 +0100 > > From: Richard Wordingham > > > > Mark Davis has published an algorithm to generate all strings > > canonically equivalent to a Unicode string > > Where can I find the description of that algor
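The published generator is the full answer to Eli's question; the fragment below is much weaker and is not that algorithm. It only enumerates reorderings of the NFD form that remain canonically equivalent, ignores precomposed alternatives (for instance U+1EA1 followed by U+0301 as an equivalent of a + U+0301 + U+0323), and is factorial in the string length, so it suits only short inputs. It still shows how quickly equivalents multiply: the decomposed marks of <U+0F73, U+0F73, U+0F72> already admit ten distinct equivalent orderings. `mark_reorderings` is a hypothetical name.

```python
import unicodedata
from itertools import permutations


def mark_reorderings(s: str) -> set[str]:
    """Hypothetical helper: all orderings of the code points of NFD(s) whose NFD
    equals NFD(s).  Precomposed equivalents are NOT generated, and the cost is
    factorial in len(NFD(s)), so use it only on short strings."""
    nfd = unicodedata.normalize("NFD", s)
    found = set()
    for perm in set(permutations(nfd)):
        candidate = "".join(perm)
        if unicodedata.normalize("NFD", candidate) == nfd:
            found.add(candidate)
    return found


print(len(mark_reorderings("a\u0301\u0323")))       # 2: the two marks differ in ccc
print(len(mark_reorderings("\u0F73\u0F73\u0F72")))  # 10
```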

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
On Mon, 18 May 2015 21:05:49 +0200 Philippe Verdy wrote: > 2015-05-18 20:35 GMT+02:00 Richard Wordingham < > richard.wording...@ntlworld.com>: > > > The algorithm itself should be tractable - Mark Davis has published > > an algorithm to generate all strings canonically equivalent to a > > Unicod

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-18 22:14 GMT+02:00 Doug Ewell : > I know I'll regret this... > You should not > > Philippe Verdy wrote: > > > Some time in the future, two letters will not be enough even in ISO > > 3166-1, if countries continue to split/merge (this does not happen > > frequently but it occurs every few yea

RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
I know I'll regret this... Philippe Verdy wrote: > Some time in the future, two letters will not be enough even in ISO > 3166-1, if countries continue to split/merge (this does not happen > frequently but it occurs every few years; and it will not be possible > to reuse old codes that are maintaine

RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
Markus Scherer wrote: > As far as I can tell from your quotes, CLDR will say what's valid > (plus containment info), and Unicode permits you to show a flag for > any valid tag. North Lanarkshire seems perfectly fine. I'm under the impression that this will be a standard Unicode mechanism, define

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Eli Zaretskii
> Date: Mon, 18 May 2015 19:35:45 +0100 > From: Richard Wordingham > > Mark Davis has published an algorithm to generate all strings > canonically equivalent to a Unicode string Where can I find the description of that algorithm?

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Richard Wordingham
On Mon, 18 May 2015 19:37:06 +0100 Andrew West wrote: > > <1F3F3 E0047 E0042 E002D E004E E004C E004B> (GB-NLK) > > for the North Lanarkshire council area flag > > I don't believe that North Lanarkshire has an associated flag, which I > think is the case for most UK counties and councils (Cornwal
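The tag sequences in this thread follow a simple mapping: after the base U+1F3F3 WAVING WHITE FLAG, each printable ASCII character c of the code becomes the tag character U+E0000 + c. The sketch below reproduces the GB-NLK example quoted above. It reflects the proposal as quoted here rather than any final specification, appends no terminator because none appears in the quoted sequences, and uses hypothetical function names.

```python
FLAG_BASE = 0x1F3F3   # WAVING WHITE FLAG, the base in the quoted examples
TAG_OFFSET = 0xE0000  # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def flag_tag_sequence(code: str) -> str:
    """Hypothetical helper: encode a code such as 'GB-NLK' as U+1F3F3 plus tag
    characters.  No terminator is added (none appears in the quoted examples)."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("only printable ASCII maps to tag characters")
    return chr(FLAG_BASE) + "".join(chr(TAG_OFFSET + ord(c)) for c in code)


def code_points(s: str) -> str:
    """Render a string as space-separated hex code points."""
    return " ".join(f"{ord(c):X}" for c in s)


print(code_points(flag_tag_sequence("GB-NLK")))
# 1F3F3 E0047 E0042 E002D E004E E004C E004B
```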

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Philippe Verdy
2015-05-18 20:35 GMT+02:00 Richard Wordingham < richard.wording...@ntlworld.com>: > The algorithm itself should be tractable - Mark Davis has published > an algorithm to generate all strings canonically equivalent to a > Unicode string, and what we need might not be so complex. Even this algorit

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
The hyphen is not redundant in ISO 3166, which defines primary codes with variable length (even if ISO 3166 part 1 for now only uses two-letter codes). Some time in the future, two letters will not be enough even in ISO 3166-1, if countries continue to split/merge (this does not happen frequently but it
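The point about the hyphen can be made concrete. With today's fixed two-letter country part, a hyphenless code like GBNLK still splits uniquely, but if the country part could be two or three letters (the hypothetical raised here), the hyphen becomes the only reliable delimiter. The regular expression assumes the ISO 3166-2 shape of an alpha-2 country code, a hyphen, and one to three alphanumerics; the helper names are hypothetical.

```python
import re

# Assumed ISO 3166-2 shape: alpha-2 country code, hyphen, 1-3 alphanumerics.
SUBDIVISION = re.compile(r"([A-Z]{2})-([A-Z0-9]{1,3})")


def split_subdivision(code: str) -> tuple[str, str]:
    """Split a hyphenated code such as 'GB-NLK' into (country, subdivision)."""
    match = SUBDIVISION.fullmatch(code)
    if match is None:
        raise ValueError(f"not an ISO 3166-2 style code: {code!r}")
    return match.group(1), match.group(2)


def hyphenless_splits(code: str, country_lengths=(2, 3)) -> list[tuple[str, str]]:
    """Hypothetical: every way to split a code with no hyphen if the country
    part could be any of `country_lengths` letters long."""
    return [(code[:n], code[n:]) for n in country_lengths if 0 < n < len(code)]


print(split_subdivision("GB-NLK"))  # ('GB', 'NLK')
print(hyphenless_splits("GBNLK"))   # [('GB', 'NLK'), ('GBN', 'LK')]
```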

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Andrew West
On 18 May 2015 at 19:19, Doug Ewell wrote: > > Is the new mechanism intended to allow flag tags that include either > "subtype" values or "contains" values? For example: That is my understanding. > <1F3F3 E0047 E0042 E002D E0053 E0043 E0054> (GB-SCT) > for the Scottish flag > > and > > <1F3F3 E0

Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
Philippe and I have got bogged down in a long discussion of how to parse traces of Unicode strings under canonical equivalence against non-regular Kleene star of regular expressions. Fortunately, such expressions can be expected to have very little use. A seemingly simple example is the regex \u0
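To see why \u0F73*\u0F72 (the pattern quoted in Philippe's reply) behaves non-regularly, note that U+0F73 has the canonical decomposition <U+0F71, U+0F72>, and that U+0F71 (ccc 129) sorts before U+0F72 (ccc 130) under canonical ordering. The NFD of any string canonically equivalent to a match is therefore n copies of U+0F71 followed by n + 1 copies of U+0F72, a counting language that no finite automaton recognizes. The sketch assumes whole-string matching; the function name is hypothetical.

```python
import unicodedata

VOWEL_AA = "\u0F71"  # TIBETAN VOWEL SIGN AA, canonical combining class 129
VOWEL_I = "\u0F72"   # TIBETAN VOWEL SIGN I, canonical combining class 130


def matches_canonically(s: str) -> bool:
    """Whole-string test: is `s` canonically equivalent to some string that
    \\u0F73*\\u0F72 matches?  In NFD such strings are AA^n followed by I^(n+1)."""
    nfd = unicodedata.normalize("NFD", s)
    n = len(nfd) - len(nfd.lstrip(VOWEL_AA))  # number of leading AA signs
    return nfd[n:] == VOWEL_I * (n + 1)


# U+0F73 decomposes to AA + I, and canonical ordering moves every AA ahead of
# every I within this run of combining marks: AA AA I I I.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u0F73\u0F73\u0F72")])
print(matches_canonically("\u0F73\u0F73\u0F72"))  # True
print(matches_canonically("\u0F72\u0F71\u0F72"))  # True: reorders to AA I I
print(matches_canonically("\u0F73"))              # False: AA I lacks the extra I
```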

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Markus Scherer
On Mon, May 18, 2015 at 11:19 AM, Doug Ewell wrote: > Is the new mechanism intended to allow flag tags that include either > "subtype" values or "contains" values? As far as I can tell from your quotes, CLDR will say what's valid (plus containment info), and Unicode permits you to show a flag f
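The description above implies two steps: recover the code from the tag characters, then consult validity data. Below is a sketch of the first step plus a stand-in for the second. `SAMPLE_VALID` is a hypothetical two-entry set written in the thread's own GB-XXX notation, not CLDR data, and the function names are hypothetical.

```python
FLAG_BASE = "\U0001F3F3"  # WAVING WHITE FLAG, the base used in the thread's examples
TAG_OFFSET = 0xE0000

# Hypothetical stand-in for CLDR validity data (NOT taken from CLDR).
SAMPLE_VALID = frozenset({"GB-SCT", "GB-NLK"})


def decode_flag_tags(seq: str) -> str:
    """Map U+1F3F3 followed by tag characters U+E0020..U+E007E back to ASCII."""
    if not seq.startswith(FLAG_BASE):
        raise ValueError("sequence does not start with U+1F3F3")
    out = []
    for ch in seq[len(FLAG_BASE):]:
        cp = ord(ch)
        if not 0xE0020 <= cp <= 0xE007E:
            raise ValueError(f"U+{cp:04X} is not a printable tag character")
        out.append(chr(cp - TAG_OFFSET))
    return "".join(out)


def may_show_flag(seq: str, valid=SAMPLE_VALID) -> bool:
    """True if the sequence decodes to a code present in the validity set."""
    try:
        return decode_flag_tags(seq) in valid
    except ValueError:
        return False


nlk = FLAG_BASE + "".join(chr(TAG_OFFSET + ord(c)) for c in "GB-NLK")
print(decode_flag_tags(nlk))  # GB-NLK
print(may_show_flag(nlk))     # True
```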

Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
L2/15-145R says: > In CLDR 28, LDML will define a unicode_subdivision_subtag which also > provides validity criteria for the codes used for regional > subdivisions (see CLDR ticket #8423). When representing regional > subdivisions using ISO 3166-2 codes, only those codes that are valid > for the L

Re: Arabic diacritics

2015-05-18 Thread عبد الرحمان أيمن
Many thanks, this is exactly the information I needed :) Respectfully 2015-05-15 19:09 GMT+03:00 Denis Jacquerye : > You should use ARABIC SHADDA U+0651 in all positions. The presentation > forms (isolated, medial, final forms) are for compatibility with legacy > systems. > See what is said in http:/
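This advice can be checked against the Unicode Character Database: in Arabic Presentation Forms-B, the shadda forms carry only compatibility decompositions back to U+0651, with a space (isolated form) or U+0640 ARABIC TATWEEL (medial form) as the carrier, so NFKC folds them to the nominal character. The sketch uses Python's standard `unicodedata` module; the helper name is hypothetical, and results depend on the Unicode version bundled with Python.

```python
import unicodedata

SHADDA = "\u0651"  # ARABIC SHADDA, the character to use in all positions


def presentation_forms_of(mark: str, start: int = 0xFE70, end: int = 0xFEFF) -> list[int]:
    """Hypothetical helper: code points in [start, end] whose compatibility
    decomposition ends with `mark` (U+FE70..U+FEFF is Presentation Forms-B)."""
    target = f"{ord(mark):04X}"
    return [cp for cp in range(start, end + 1)
            if unicodedata.decomposition(chr(cp)).endswith(target)]


for cp in presentation_forms_of(SHADDA):
    ch = chr(cp)
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{cp:04X}", unicodedata.decomposition(ch),
          "->", " ".join(f"U+{ord(c):04X}" for c in folded))
```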