: However, I don't think that the names are consistent enough to permit a
: generic use of regular expressions. What Daniel is trying to achieve
: looks interesting anyway,
I'm not sure that that really matters in the long run ... I think the OP
was asking if there was a way to get the name in jav
If you are not tied to Java, see 'unac' at http://www.senga.org/.
It's old, but if nothing else you could see how it works and rewrite it
in Java. And if you can, you could donate it to the Lucene Sandbox.
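
For anyone who wants to stay in Java, here is a minimal sketch of accent removal. It does not reproduce how 'unac' works; it swaps in a different technique, Unicode decomposition via java.text.Normalizer (Java 6 or later). The class and method names (AccentStripper, stripAccents) are made up for illustration.

```java
import java.text.Normalizer;

class AccentStripper {
    /**
     * Decomposes each character into base letter + combining marks (NFD),
     * then drops the combining marks. Hypothetical helper, not part of
     * unac or Lucene.
     */
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches any Unicode combining mark.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00E9 is LATIN SMALL LETTER E WITH ACUTE
        System.out.println(stripAccents("caf\u00E9"));
    }
}
```

Note that this only handles characters that have a canonical decomposition. Letters such as LATIN SMALL LETTER O WITH STROKE have none and pass through unchanged, so a lookup table would still be needed for those.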
Otis
--- Peter Pimley <[EMAIL PROTECTED]> wrote:
>
> Hi everyone,
>
> The Question:
> In Java
Hi,
Morus Walter wrote:
If you cannot find that list somewhere I can mail you a copy.
ICU4J's copy is here:
http://oss.software.ibm.com/cvs/icu4j/icu4j/src/com/ibm/icu/dev/data/unicode/UnicodeData.txt?rev=1.7&content-type=text/x-cvsweb-markup
See also Unicode's own copy:
http://www.unicode.org/Public
Hi Peter,
>
> The Question:
> In Java generally, is there an easy way to get the Unicode name of a
> character? (e.g. "LATIN SMALL LETTER A" from 'a')
>
...
>
> I'm considering taking the Unicode name for each character I encounter
> and matching it against a regex like:
> ^LATIN .* LETTER
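
The idea quoted above can be sketched as follows, assuming Java 7 or later, where Character.getName(int) is available. The class and method names (UnicodeNames, nameOf, isLatinLetter) are hypothetical; only the regex comes from the quoted message.

```java
import java.util.regex.Pattern;

class UnicodeNames {
    // Pattern taken from the quoted message. find() is used below so the
    // pattern only has to match the start of the name, not all of it.
    private static final Pattern LATIN_LETTER =
            Pattern.compile("^LATIN .* LETTER");

    /** Returns the Unicode name of a code point, or null if it is unassigned. */
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    /** True if the character's Unicode name matches the LATIN ... LETTER pattern. */
    static boolean isLatinLetter(int codePoint) {
        String name = nameOf(codePoint);
        return name != null && LATIN_LETTER.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(nameOf('a'));             // LATIN SMALL LETTER A
        System.out.println(isLatinLetter('\u00E9')); // e with acute
        System.out.println(isLatinLetter('\u03B1')); // Greek alpha
    }
}
```

Whether the names are regular enough for this to work across all the characters in question is a separate matter, as noted elsewhere in the thread.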
Hi everyone,
The Question:
In Java generally, is there an easy way to get the Unicode name of a
character? (e.g. "LATIN SMALL LETTER A" from 'a')
The Reasoning (for those who are interested):
The documents I'm indexing have quite a lot of characters that are
basically variations on the basic A-