Re: Public Review Issues Update

2004-10-21 Thread Theo Veenker
Mark Davis wrote: All comments are reviewed at the next UTC meeting. Due to the volume, we don't reply to each and every one what the disposition was. If actions were taken, they are recorded in the minutes of the meetings. But what if an action was not taken? Do I have to keep reporting a

Re: Public Review Issues Update

2004-10-20 Thread Theo Veenker
Rick McGowan wrote: If you have comments for official UTC consideration, please post them by submitting your comments through our feedback reporting page: http://www.unicode.org/reporting.html Hi Rick, Please tell us, why does one never get feedback on submitted comments regarding Public

UAX 15 hangul composition

2004-08-03 Thread Theo Veenker
Don't know if this has been asked/reported before, but is the example code for hangul composition in UAX 15 correct? The code is: public static String composeHangul(String source) { int len = source.length(); if (len == 0) return ""; StringBuffer result = new
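For readers following the question, here is a minimal sketch of arithmetic Hangul composition along the lines of the conjoining jamo algorithm in the Unicode Standard. It is not the UAX 15 listing itself, which is cut off above; the class name and constant names are illustrative assumptions. The trailing-consonant test uses a strict lower bound (0 < tIndex) because T_BASE itself is one below the first trailing jamo and must not compose.

```java
// Sketch only: class and constant names are assumptions, not the UAX 15 listing.
final class HangulComposeSketch {
    static final int S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7;
    static final int L_COUNT = 19, V_COUNT = 21, T_COUNT = 28;
    static final int N_COUNT = V_COUNT * T_COUNT;   // 588 syllables per leading consonant
    static final int S_COUNT = L_COUNT * N_COUNT;   // 11172 precomposed syllables

    static String composeHangul(String source) {
        if (source.isEmpty()) return "";
        StringBuilder result = new StringBuilder();
        char last = source.charAt(0);
        result.append(last);
        for (int i = 1; i < source.length(); i++) {
            char ch = source.charAt(i);
            // Case 1: leading consonant followed by a vowel gives an LV syllable.
            int lIndex = last - L_BASE;
            if (0 <= lIndex && lIndex < L_COUNT) {
                int vIndex = ch - V_BASE;
                if (0 <= vIndex && vIndex < V_COUNT) {
                    last = (char) (S_BASE + (lIndex * V_COUNT + vIndex) * T_COUNT);
                    result.setCharAt(result.length() - 1, last);
                    continue;
                }
            }
            // Case 2: LV syllable followed by a trailing consonant gives an LVT syllable.
            int sIndex = last - S_BASE;
            if (0 <= sIndex && sIndex < S_COUNT && sIndex % T_COUNT == 0) {
                int tIndex = ch - T_BASE;
                if (0 < tIndex && tIndex < T_COUNT) {
                    last = (char) (last + tIndex);
                    result.setCharAt(result.length() - 1, last);
                    continue;
                }
            }
            // Otherwise the character is copied through unchanged.
            last = ch;
            result.append(ch);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String composed = composeHangul("\u1112\u1161\u11AB");
        System.out.println(Integer.toHexString(composed.charAt(0)));
    }
}
```

The input is assumed to be a sequence of UTF-16 code units in which the jamo are already in canonical order; no other normalization step is attempted here.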

2nd attempt: final_sigma vs final_cased

2004-06-21 Thread Theo Veenker
Hi, Is there somebody out there who can answer this question? Casing context Final_Sigma is being used in SpecialCasing.txt, but its specification is no longer present in the standard (at least I can't find it). Obviously this context is now called Final_Cased, but the specification for

final_sigma vs final_cased

2004-06-14 Thread Theo Veenker
Hi, Casing context Final_Sigma is being used in SpecialCasing.txt, but its specification is no longer present in the standard (at least I can't find it). Obviously this context is now called Final_Cased, but the specification for Final_Cased (section 3.13) is not identical to that of Final_Sigma

base character

2004-06-10 Thread Theo Veenker
According to the definition a base character is: A character that does not graphically combine with preceding characters, and that is neither a control nor a format character. How is this expressed in terms of properties? Something like this? cc==0 AND GC!=Cc AND GC!=Cf AND GC!=Cn Theo
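One way to try out a predicate of this shape is sketched below in Java. It is an approximation, not a statement of what the standard requires: java.lang.Character does not expose the canonical combining class, so the cc==0 test is swapped for a check that the General Category is not a mark (Mn, Mc, Me). That swap, and the class and method names, are assumptions made for illustration.

```java
// Sketch only: approximates the proposed expression using General Category alone.
final class BaseCharacterSketch {
    static boolean isBaseCharacter(int codePoint) {
        switch (Character.getType(codePoint)) {
            // Stand-in for "cc == 0": treat every combining mark as non-base.
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
            case Character.ENCLOSING_MARK:
            // GC != Cc, GC != Cf, GC != Cn
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.UNASSIGNED:
                return false;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isBaseCharacter('a'));
        System.out.println(isBaseCharacter(0x0301));
    }
}
```

The mark-based test and cc==0 are not equivalent: some combining marks have a combining class of 0, so the two formulations classify those characters differently.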

A binary file format for storing character properties

2004-05-04 Thread Theo Veenker
. Please check it out. Feedback is welcome. Regards, Theo Veenker

Re: Suggestion: use of symbolic links in the FTP site

2004-04-22 Thread Theo Veenker
Tom Emerson wrote: Philippe Verdy writes: Symbolic links is a bad idea on FTP. They are resolved by the client... Really? Depends on your server: proftpd handles them fine. I think it would also give the false feeling that a new 4.01 file exist when in fact it's the same as 4.00. No, the

Re: Downloading UCD 4.0.0

2004-04-20 Thread Theo Veenker
Asmus Freytag wrote: At 08:42 AM 4/19/2004, Theo Veenker wrote: Hi, Until now I always downloaded the latest version of the UCD and worked with that. Now I want to download the UCD files for 4.0.0 again. I know it is all in http://www.unicode.org/Public/4.0-Update/, but in http

Downloading UCD 4.0.0

2004-04-19 Thread Theo Veenker
Hi, Until now I always downloaded the latest version of the UCD and worked with that. Now I want to download the UCD files for 4.0.0 again. I know it is all in http://www.unicode.org/Public/4.0-Update/, but in http://www.unicode.org/ucd/ I read this: The complete set of all files for a given

Re: ISO 8859-11 (Thai) cross-mapping table

2002-10-09 Thread Theo Veenker
Marco Cimarosti wrote: John Aurelio Cowan wrote:) Marco Cimarosti scripsit: Talking about the format of mapping tables, I always wondered why not using ranges. In the case of ISO 8859-11, the table would become as compact as three lines: Well, that wins for 8859-1 and 8859-11

\p{} and \g{} in regexp

2002-07-23 Thread Theo Veenker
Hi, I have a few questions regarding Unicode regular expressions. 1) I'm working on a regexp matcher and I'd like to know which properties are never needed in a \p{...} item. Currently I have included the properties listed below, but for efficiency reasons I'd like to throw out what isn't

Re: unidata is big

2002-04-24 Thread Theo Veenker
andreas palsson wrote: Hi. I would just like to know if someone could give me a tip on how to structure all the unicode-information in memory? All the UNIDATA does contain quite a bit of information and I can't see any obvious method of which is memory-efficient and gives fast access.

Re: Whence UniData.txt? (was Re: unidata is big)

2002-04-24 Thread Theo Veenker
[EMAIL PROTECTED] wrote: Theo's comment leads me to a question I've pondered recently: Assumptions: Many apps, from independent sources, need to access the Unicode character data, A lot of these apps aren't overly concerned with the slight overhead of parsing the data as

grapheme length

2002-04-18 Thread Theo Veenker
Hi, I'd like to know if there is something like a longest grapheme length. From the UTR-18 I see there is no limit, but in practice, can someone give an estimate of how many code points a longest grapheme would occupy, roughly? Theo

UCD 3.2.0

2002-04-04 Thread Theo Veenker
Hi all, I'd like to make a few remarks about the UCD files. The following things I ran into when checking out the 3.2.0 release: o In PropertyValueAliases-3.2.0.txt line 79: ccc; 202; ATBL ; Attached_Below_Left whereas in UnicodeData-3.2.0.html I read: 200: Below left

mnemonic input

2002-03-27 Thread Theo Veenker
Hi all, Suppose I want to enable mnemonic input in my software. Using mnemonics allows one to write e' (of course embedded in some escape sequence) instead of \u00e9 or &eacute; Which sets of mnemonics are being used or should I use? I found the ISO-10646 charmap file which gives mnemonic.ds

Re: mnemonic input

2002-03-27 Thread Theo Veenker
Marco Cimarosti wrote: Ooops! Of course, I was replying to a different question: Does it make sense to use mnemonics for ideographic scripts? I hadn't even noticed you quoted the wrong question, but I understood it anyway. What I meant was: I can use mnemonic characters in a plain ASCII

UCD 3.2.0

2002-03-11 Thread Theo Veenker
. Is this correct, or should it just read WAW? Another question, when can I expect a new edition of the unicode book? Regards, Theo Veenker