Re: Removing Accents from unicode strings

2005-10-31 Thread David Graff
[EMAIL PROTECTED] said: I need to convert strings obtained from a MySQL database in UTF-8 into a file format to be uploaded to specific hardware (specifically GPS units). Some of these formats may only allow unaccented characters, so I need a way to convert accented characters into their

Removing Accents from unicode strings

2005-10-30 Thread Agnar Renolen
Hello, I need to convert strings obtained from a MySQL database in UTF-8 into a file format to be uploaded to specific hardware (specifically GPS units). Some of these formats may only allow unaccented characters, so I need a way to convert accented characters into their respective base

Re: removing accents

2004-01-03 Thread Jarkko Hietaniemi
I'm afraid the process of taking NFD and then removing \pM characters (remove_accent() as below) would also remove marks that are not accents. Say, it replaces '≠' (U+2260, NOT EQUAL TO) with '=' (EQUALS SIGN), since a mathematical negation slash is encoded by U+0338 COMBINING LONG SOLIDUS
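
The pitfall described above can be reproduced in a few lines. The sketch below uses Python's standard `unicodedata` module as a stand-in for Perl's Unicode::Normalize; the function names are hypothetical, and the second function is only one possible heuristic (drop a mark only when its base character is a letter), not something proposed in the thread.

```python
import unicodedata


def strip_all_marks(text):
    """NFD-decompose, then drop every character in general category M (\\pM)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


def strip_marks_after_letters(text):
    """Assumed heuristic: drop a mark only if the preceding base is a letter."""
    out = []
    base_is_letter = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch).startswith("M"):
            if base_is_letter:
                continue  # mark attached to a letter: treat as an accent
            out.append(ch)  # mark attached to a symbol: keep it
        else:
            base_is_letter = unicodedata.category(ch).startswith("L")
            out.append(ch)
    # Recompose so that '=' + U+0338 becomes a single U+2260 again.
    return unicodedata.normalize("NFC", "".join(out))


print(strip_all_marks("a \u2260 b"))            # the negation slash is lost
print(strip_marks_after_letters("caf\u00e9 \u2260 cafe"))
```

The heuristic still removes non-accent marks that happen to follow letters, so it narrows the problem rather than solving it.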

Re: removing accents

2004-01-03 Thread Eric Cholet
slash is encoded by U+0338 COMBINING LONG SOLIDUS OVERLAY which is to be removed. Also, although they are not accents, it's unclear (and quite language-dependent) what should be done with ligatures. Thanks to you both for your replies. I did some more research and found that even removing accents

Re: removing accents

2004-01-02 Thread Eric Cholet
/2001-12/ msg4.html I hope something there can help you. == P.S. UTR #30, Character Foldings, has two concepts about removing accents. [cf. http://www.unicode.org/reports/tr30/ ] One is accent removal, and the other is diacritic removal (includes stroke, hook, descender). The accent removal

Re: removing accents

2004-01-02 Thread SADAHIRO Tomoyuki
On Fri, 2 Jan 2004 11:56:12 +0100 Eric Cholet [EMAIL PROTECTED] wrote: Thanks for your detailed reply. I looked into this and found that I can use Unicode::Normalize to decompose a string into NFD form and then remove the accents with a regex removing /\pM/. I wonder if I overlooked a
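
The two steps mentioned here (decompose to NFD, then delete everything matching \pM) might look roughly like this. The thread is about Perl's Unicode::Normalize; this is a Python sketch using the standard `unicodedata` module instead, and `remove_accents` is a hypothetical name.

```python
import unicodedata


def remove_accents(text):
    """Decompose to NFD, then drop all mark characters (general category M*)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        # Categories Mn, Mc and Me together correspond to Perl's \pM.
        if not unicodedata.category(ch).startswith("M")
    )


print(remove_accents("\u00c5ngstr\u00f6m caf\u00e9"))
```

Note that this removes every combining mark, not only accents, and leaves letters without a canonical decomposition (such as 'ø') untouched.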

removing accents

2003-12-27 Thread Eric Cholet
Here's another naive question from a Unicode newbie: Is there a way, using Perl's Unicode support, to remove accents from a string? I looked at \pM but can't figure out how it works; I wasn't able to match anything with it. Thanks, -- Eric Cholet

Re: removing accents

2003-12-27 Thread SADAHIRO Tomoyuki
help you. == P.S. UTR #30, Character Foldings, has two concepts about removing accents. [cf. http://www.unicode.org/reports/tr30/ ] One is accent removal, and the other is diacritic removal (includes stroke, hook, descender). The accent removal utilizes canonical decomposition, and non
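
The distinction between accent removal and diacritic removal can be illustrated with a small sketch. It is written in Python with the standard `unicodedata` module rather than Perl. The names `remove_accents`, `remove_diacritics` and `STROKE_FOLDINGS` are hypothetical, and the table is a deliberately tiny, incomplete assumption, not data taken from UTR #30.

```python
import unicodedata

# Hypothetical, incomplete table: letters with a stroke have no canonical
# decomposition, so they must be folded explicitly.
STROKE_FOLDINGS = {
    "\u00f8": "o", "\u00d8": "O",   # o with stroke
    "\u0142": "l", "\u0141": "L",   # l with stroke
    "\u0111": "d", "\u0110": "D",   # d with stroke
}


def remove_accents(text):
    """Accent removal: canonical decomposition, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


def remove_diacritics(text):
    """Accent removal plus explicit folding of stroke letters."""
    return "".join(STROKE_FOLDINGS.get(ch, ch) for ch in remove_accents(text))


word = "\u0141\u00f3d\u017a"          # Polish city name with stroke and acutes
print(remove_accents(word))            # the stroke on the first letter survives
print(remove_diacritics(word))
```

A real diacritic folding would need a much larger table (hooks, descenders and so on), and whether such folding is acceptable is language-dependent.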