Re: (Offtopic) The unicode name for a character

2004-12-23 Thread Chris Hostetter
: However, I don't think that the names are consistent enough to permit a
: generic use of regular expressions. What Daniel is trying to achieve
: looks interesting anyway,

I'm not sure that that really matters in the long run ... I think the OP was asking if there was a way to get the name in jav

Re: (Offtopic) The unicode name for a character

2004-12-22 Thread Otis Gospodnetic
If you are not tied to Java, see 'unac' at http://www.senga.org/. It's old, but if nothing else you could see how it works and rewrite it in Java. And if you can, you can donate it to Lucene Sandbox.

Otis

--- Peter Pimley <[EMAIL PROTECTED]> wrote:
>
> Hi everyone,
>
> The Question:
> In Java

Re: (Offtopic) The unicode name for a character

2004-12-22 Thread Pierrick Brihaye
Hi,

Morus Walter wrote:
> If you cannot find that list somewhere I can mail you a copy.

ICU4J's one is here: http://oss.software.ibm.com/cvs/icu4j/icu4j/src/com/ibm/icu/dev/data/unicode/UnicodeData.txt?rev=1.7&content-type=text/x-cvsweb-markup

See also Unicode's one: http://www.unicode.org/Public
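For readers who want to use a UnicodeData.txt list like the ones mentioned here: each line of that file is a set of semicolon-separated fields, with the code point in hexadecimal first and the character name second. The following is a minimal sketch, assuming the lines have already been read into memory; the class and method names (`UnicodeDataParser`, `parseNames`) are made up for illustration, and the sample lines in `main` are only there to show the layout.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnicodeDataParser {

    // Builds a code point -> name map from lines in UnicodeData.txt layout.
    // Only the first two fields are used; the rest are ignored.
    static Map<Integer, String> parseNames(List<String> lines) {
        Map<Integer, String> names = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // limit -1 keeps trailing empty fields, though we only need two
            String[] fields = line.split(";", -1);
            if (fields.length < 2) {
                continue; // not a data line
            }
            int codePoint = Integer.parseInt(fields[0].trim(), 16);
            names.put(codePoint, fields[1]);
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample lines in UnicodeData.txt layout (illustrative input only).
        List<String> sample = Arrays.asList(
            "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
            "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041");
        Map<Integer, String> names = parseNames(sample);
        System.out.println(names.get((int) 'a'));
    }
}
```

Note that in the real file some entries do not carry a usable name in the second field (control characters and range markers use placeholders in angle brackets), so a lookup built this way would need to handle those separately.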

Re: (Offtopic) The unicode name for a character

2004-12-22 Thread Morus Walter
Hi Peter,

>
> The Question:
> In Java generally, Is there an easy way to get the unicode name of a
> character? (e.g. "LATIN SMALL LETTER A" from 'a')
> ...
>
> I'm considering taking the unicode name for each character I encounter
> and regexping it against something like:
> ^LATIN .* LETTER

(Offtopic) The unicode name for a character

2004-12-22 Thread Peter Pimley
Hi everyone,

The Question:
In Java generally, is there an easy way to get the Unicode name of a character? (e.g. "LATIN SMALL LETTER A" from 'a')

The Reasoning (for those who are interested):
The documents I'm indexing have quite a lot of characters that are basically variations on the basic A-
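As a note for later readers: on Java 7 and newer, `Character.getName(int codePoint)` returns the Unicode name directly (it was not available when this thread was written). The sketch below shows that call together with one possible way of mapping a Latin letter to its base letter by matching on the name. The class name, the helper `baseLetter`, and the regular expression are assumptions made for illustration; the pattern is not the one from the original message, which is cut off in this archive.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeNameDemo {

    // Hypothetical pattern: a single basic letter, optionally followed by
    // " WITH <something>". Names that do not fit (e.g. ligatures) are rejected.
    private static final Pattern LATIN_LETTER =
        Pattern.compile("^LATIN (SMALL|CAPITAL) LETTER ([A-Z])( WITH .*)?$");

    // Returns the Unicode name, or null if the code point is unassigned.
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    // Returns the base letter in matching case, or null if the name
    // does not fit the pattern above.
    static String baseLetter(int codePoint) {
        String name = Character.getName(codePoint);
        if (name == null) {
            return null;
        }
        Matcher m = LATIN_LETTER.matcher(name);
        if (!m.matches()) {
            return null;
        }
        String letter = m.group(2);
        return "SMALL".equals(m.group(1))
            ? letter.toLowerCase(Locale.ROOT)
            : letter;
    }

    public static void main(String[] args) {
        System.out.println(nameOf('a'));          // LATIN SMALL LETTER A
        System.out.println(baseLetter(0x00E9));   // e  (from LATIN SMALL LETTER E WITH ACUTE)
        System.out.println(baseLetter(0x00E6));   // null (LATIN SMALL LETTER AE does not match)
    }
}
```

The third call illustrates the concern raised elsewhere in the thread: names such as LATIN SMALL LETTER AE do not follow the simple "letter WITH modifier" shape, so a name-based regex only covers part of the characters one might want to fold.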