[EMAIL PROTECTED] said:
Hello,
I need to convert strings obtained from a MySQL database in UTF-8 format
into a file format to be uploaded to specific hardware (specifically
GPS units). Some of these formats may only allow unaccented characters, so
I need a way to convert accented characters into their respective base
I'm afraid that taking NFD and then removing \pM characters
(remove_accent() as below) would remove too many marks other than
accents.
For example, it replaces '≠' (U+2260, NOT EQUAL TO) with '=' (EQUALS SIGN),
since the mathematical negation slash is encoded by U+0338
COMBINING LONG SOLIDUS OVERLAY, which is removed.
Also, although they are not accents, it's unclear (and quite
language-dependent) what should be done with ligatures.
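
The over-removal described above can be reproduced with a short sketch.
The body of remove_accent() here is a guess at the approach under
discussion (NFD, then strip \pM), not the poster's actual code.
remove_listed_accents() and its list of combining marks are purely
illustrative assumptions: one way to narrow the removal, not a complete
or recommended set.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Broad approach discussed in the thread: decompose, then drop every mark.
sub remove_accent {
    my ($s) = @_;
    $s = NFD($s);
    $s =~ s/\pM//g;
    return $s;
}

# Narrower variant (hypothetical helper): drop only a hand-picked set of
# combining accents, then recompose whatever marks are left.
sub remove_listed_accents {
    my ($s) = @_;
    $s = NFD($s);
    $s =~ s/[\x{0300}-\x{0304}\x{0306}-\x{030C}\x{0327}\x{0328}]//g;
    return NFC($s);
}

my $text = "caf\x{E9} \x{2260}";    # "café ≠"

# %vX prints the code points of the string in hex, joined by dots.
printf "%vX\n", remove_accent($text);          # 63.61.66.65.20.3D   ("cafe =")
printf "%vX\n", remove_listed_accents($text);  # 63.61.66.65.20.2260 ("cafe ≠")
```

The second function keeps U+0338 because it is not in the list, so NFC
puts '≠' back together. Which marks belong in such a list depends on the
target format and language.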
Thanks to you both for your replies. I did some more research
and found that even removing accents
/2001-12/
msg4.html
I hope something there can help you.
==
P.S. UTR #30, Character Foldings, has two concepts about removing
accents.
[cf. http://www.unicode.org/reports/tr30/ ]
One is accent removal, and
the other is diacritic removal (includes stroke, hook, descender).
The accent removal utilizes canonical decomposition, and
non
On Fri, 2 Jan 2004 11:56:12 +0100
Eric Cholet [EMAIL PROTECTED] wrote:
Thanks for your detailed reply. I looked into this and found that I
can use Unicode::Normalize to decompose a string into NFD form and then
remove the accents with a regex that strips \pM characters. I wonder if
I overlooked a
Here's another naive question from a Unicode newbie:
Is there a way, using Perl's Unicode support, to remove
accents from a string? I looked at \pM but can't figure
out how it works; I wasn't able to match anything with it.
Thanks,
--
Eric Cholet
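
One possible reason \pM matches nothing (an assumption; the message does
not say what was tried) is that the accented letters are precomposed.
'é' stored as U+00E9 is a single letter with no separate mark, so \pM
only has something to match once the string has been decoded into
characters and decomposed. A minimal sketch, with the sample bytes
invented for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# UTF-8 bytes for "é", as they might arrive from a database (sample data).
my $bytes = "\xC3\xA9";

# Decode to a character string: one character, U+00E9.
my $precomposed = decode('UTF-8', $bytes);

# Decompose: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
my $decomposed = NFD($precomposed);

print "precomposed: ", ($precomposed =~ /\pM/ ? "match" : "no match"), "\n";  # no match
print "decomposed:  ", ($decomposed  =~ /\pM/ ? "match" : "no match"), "\n";  # match
```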