Re: Stripping out Unicode combining characters (diacritics)

2008-05-07 Thread David Kaufman
Hi Michael,

Doran, Michael D [EMAIL PROTECTED] wrote:
I'm trying to strip out combining diacritics from some form input using this code:
[...]
    $sans_diacritics =~ s/\p{M}*//g;

I do it like this:

    use Encode;
    use Unicode::Normalize qw(normalize);
    my $ascii = encode('ascii',
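David's snippet is cut off in the archive, so the following is not a reconstruction of it. It is a minimal sketch of the general decompose-then-strip technique using the core Unicode::Normalize module, assuming the input is already a decoded character string. The helper name `strip_diacritics` is hypothetical. Decomposing with NFD first matters because a precomposed letter such as é (U+00E9) is a single letter, not a base letter plus a combining mark, so `\p{M}` alone would leave it untouched. The sketch uses `\p{M}+` rather than `\p{M}*`: both give the same result, but `+` avoids matching an empty string at every position.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: expects a decoded (character) string, not raw bytes.
sub strip_diacritics {
    my ($text) = @_;
    # NFD splits precomposed letters such as U+00E9 into
    # a base letter plus a combining mark ("e" + U+0301).
    my $decomposed = NFD($text);
    # Remove every combining mark (Unicode general category M).
    $decomposed =~ s/\p{M}+//g;
    return $decomposed;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_diacritics("Caf\x{e9} cr\x{e8}me br\x{fb}l\x{e9}e"), "\n";   # Cafe creme brulee
```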

RE: Stripping out Unicode combining characters (diacritics)

2008-05-07 Thread Doran, Michael D
I received a number of helpful suggestions and solutions. The approach I decided to adopt in my larger script is to 'decode' all the incoming form input as UTF-8 as well as the input from the database that I'll be matching the form input against. This seems to allow the '\p{M}' syntax to work
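To illustrate why decoding made `\p{M}` work, here is a minimal sketch using only the core Encode and Unicode::Normalize modules. The sample bytes and the `match_key` helper are hypothetical; the idea of running both the form input and the database values through the same decode step before matching follows the approach described above. Undecoded, the UTF-8 encoding of U+0301 is two separate bytes (0xCC, 0x81) that Perl treats as two Latin-1 characters, neither of which is a combining mark.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical raw UTF-8 bytes, as they might arrive from a form or a
# database driver: "cafe" with U+0301 COMBINING ACUTE ACCENT after the "e".
my $bytes = "caf" . "e\xCC\x81";

# Undecoded: 0xCC and 0x81 are seen as U+00CC and U+0081; \p{M} matches neither.
(my $undecoded = $bytes) =~ s/\p{M}+//g;

# Decoded: the two bytes become the single character U+0301, which \p{M} matches.
my $chars = decode('UTF-8', $bytes);
(my $stripped = $chars) =~ s/\p{M}+//g;

printf "undecoded: %d chars, unchanged: %s\n",
    length $undecoded, ($undecoded eq $bytes ? 'yes' : 'no');
print "decoded: $stripped\n";

# Hypothetical helper: apply the same decode, decompose and strip steps
# to both the form input and the database value before comparing them.
sub match_key {
    my ($raw) = @_;
    my $text = NFD(decode('UTF-8', $raw));
    $text =~ s/\p{M}+//g;
    return $text;
}
```

Because both sides pass through the same function, a precomposed "café" from one source (bytes `caf\xC3\xA9`) and a decomposed one from the other reduce to the same key, `cafe`.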

Re: Stripping out Unicode combining characters (diacritics)

2008-05-07 Thread Brad Baxter
Just to throw this out there: you may be interested in Text::Unidecode (http://search.cpan.org/~sburke/Text-Unidecode-0.04/) if your ultimate goal is to try to represent a Unicode character with its closest ASCII (or perhaps I should say, romanized) equivalent.

--
Brad

On Wed, May 7, 2008 at

Re: Stripping out Unicode combining characters (diacritics)

2008-05-06 Thread Leif Andersson
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: RE: Stripping out Unicode combining characters (diacritics)

Hi Mike,

I appreciate the quick reply. I am familiar with the Unicode::Normalize module (and will also be using that), but I left it out of this question because it's not relevant to the problem

RE: Stripping out Unicode combining characters (diacritics)

2008-05-05 Thread Doran, Michael D
:52 PM
To: Doran, Michael D
Cc: [EMAIL PROTECTED]; Perl4lib
Subject: Re: Stripping out Unicode combining characters (diacritics)

On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D [EMAIL PROTECTED] wrote:
[snip]
I'm pulling my hair out on this... so any help would be appreciated. If there's any