Just to throw this out there: you may be interested in Text::Unidecode
(http://search.cpan.org/~sburke/Text-Unidecode-0.04/) if your ultimate
goal is to represent a Unicode character with its closest ASCII
(or perhaps I should say, "romanized") equivalent.
-- Brad
On Wed, May 7, 2008 at 9:
I received a number of helpful suggestions and solutions. The approach I
decided to adopt in my larger script is to decode all the incoming form input
as UTF-8, as well as the input from the database that I'll be matching the form
input against. This seems to allow the '\p{M}' syntax to work.
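A minimal sketch of that decode-then-strip approach, assuming the raw form and database values arrive as UTF-8 bytes (the helper name `strip_diacritics` and the sample string are hypothetical, not from the thread). Decoding turns the bytes into Perl characters; NFD then splits a precomposed letter such as e-acute into a base letter plus a combining mark, which `\p{M}` can match:

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: takes raw UTF-8 bytes (e.g. a form field or a
# database value) and returns the text with combining marks removed.
sub strip_diacritics {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # bytes -> Perl character string
    $text = NFD($text);                   # e.g. "\x{e9}" -> "e\x{301}"
    $text =~ s/\p{M}+//g;                 # drop runs of combining marks
    return $text;
}

my $raw = "caf\xc3\xa9";                  # UTF-8 bytes for "caf" + e-acute
print strip_diacritics($raw), "\n";       # prints "cafe"
```

The `decode` call is the part that matters here: without it, Perl treats the bytes `"\xc3\xa9"` as two separate Latin-1 characters (U+00C3 and U+00A9), so neither normalization nor `\p{M}` ever operates on the e-acute the user actually typed.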
Hi Michael,
"Doran, Michael D" <[EMAIL PROTECTED]> wrote:
> I'm trying to strip out combining diacritics from some form input using
> this code:
> [...]
> $sans_diacritics =~ s/\p{M}*//g;
I do it like this:
use Encode;
use Unicode::Normalize qw(normalize);
my $ascii = encode('ascii', normali
# http://rocky.uta.edu/doran/
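Leif's one-liner is cut off above, so the following is only a guess at the general technique, not his exact code: normalize to decomposed form with Unicode::Normalize, then encode to ASCII and drop whatever cannot be represented. The helper name `to_ascii` is hypothetical, and the empty-string fallback is supplied to `encode` as a code-reference CHECK argument.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(normalize);

# Hypothetical helper: expects an already-decoded Perl character string.
sub to_ascii {
    my ($text) = @_;
    # 'D' = canonical decomposition (NFD): base letters + combining marks.
    my $decomposed = normalize('D', $text);
    # The code-ref CHECK is called with the code point of each character
    # that ASCII cannot represent; returning '' silently drops it.
    return encode('ascii', $decomposed, sub { '' });
}

print to_ascii("Andr\x{e9} \x{c5}ngstr\x{f6}m"), "\n";   # prints "Andre Angstrom"
```

Letters with no canonical decomposition, such as ß or ø, are simply dropped rather than transliterated; that gap is where something like Text::Unidecode, mentioned at the top of the thread, comes in.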
> -----Original Message-----
> From: Leif Andersson [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 06, 2008 3:33 AM
> To: Doran, Michael D
> Subject: Re: Stripping out Unicode combining characters (diacritics)
>
> Oh, now I see your REAL
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: RE: Stripping out Unicode combining characters (diacritics)
Hi Mike,
I appreciate the quick reply. I am familiar with the Unicode::Normalize module
(and will also be using that), but I left it out of this question because it's
not relevant to t
Mon 5/5/2008 8:52 PM
To: Doran, Michael D
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: Re: Stripping out Unicode combining characters (diacritics)
On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D <[EMAIL PROTECTED]> wrote:
[snip]
>
> I'm pulling my hair out on this... so any help would be appreciated. If
> there's any other info I can provide, let me know.
>
You'll want to transform the text to NFD format (nominally, base
characters plus