On 28 Dec 03, at 04:45, SADAHIRO Tomoyuki wrote:
On Sat, 27 Dec 2003 13:30:19 +0100
Eric Cholet [EMAIL PROTECTED] wrote:
Here's another naive question from a unicode newbie:
Is there a way, using perl's unicode support, to remove
accents from a string? I looked at \pM but can't figure
out how
Eric Cholet [EMAIL PROTECTED] writes:
On 1 Jan 04, at 17:50, Rafael Garcia-Suarez wrote:
+(However, and as a limitation of the current implementation, using
+C<\w> or C<\W> I<inside> a C<[...]> character class will still match
+with byte semantics.)
I don't think it applies to \w, only \W. \x{df}
Do negated classes work at all ?
What does /[^\w]/ do ?
(I looked at this stuff ages ago and I thought unicode classes,
including negated ones, worked. If that is true, then the fix may just be
the magical \W expander expanding to the wrong thing...)
I think it's the evil characters in the 0x80..0xFF
Dear Perl Unicode experts,
http://www.perldoc.com/perl5.8.0/pod/perlunicode.html says:
In future, Perl-level operations will be expected to work with characters
rather than bytes.
I very much appreciate all your hard work on the internationalization of Perl.
However, recently I have been
However, recently I have been working on some things that let me think
that the above statement, if taken directly, may be
At 11:31 am +0100 16/9/03, [EMAIL PROTECTED] wrote:
I am running Perl 5.8. and trying to filter out some invalid Unicode
characters from Unicoded texts of some South Asian languages. There
are 28 such characters in my data (all control characters):
0x1, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15,
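A minimal sketch of one way to strip such characters with tr///d. The list of 28 code points is cut off above, so the set used here (all C0 controls except TAB, LF and CR) is only a stand-in assumption, and strip_controls is a hypothetical helper name, not something from the message:

```perl
use strict;
use warnings;

# Hypothetical helper. The character set is an assumption: the C0 control
# range minus TAB (0x09), LF (0x0A) and CR (0x0D), not the poster's 28
# code points, which are truncated in the archive.
sub strip_controls {
    my ($text) = @_;
    (my $clean = $text) =~ tr/\x00-\x08\x0B\x0C\x0E-\x1F//d;
    return $clean;
}

# U+0915 is DEVANAGARI LETTER KA, used here as a sample South Asian character.
my $sample = "ab\x{01}c\x{10}\x{915}\n";
my $clean  = strip_controls($sample);
printf "%d -> %d characters\n", length $sample, length $clean;   # 7 -> 5 characters
```

tr///d works on characters rather than bytes once the data has been decoded, so the Devanagari letter passes through untouched.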
At 11:47 pm + 2/1/04, I wrote:
$f = "/tmp/zili.txt";
open F, $f ;...
Sorry. I had my mailbox sorted by sender rather than by date, so
this message appeared at the bottom unread. My memory's not good
enough to recall I'd read it and actually replied 4 months ago :)
Happy new year!
JD
Hello Jarkko,
Many thanks for your very quick answer.
At 00:31 04/01/03 +0200, Jarkko Hietaniemi wrote:
In future, Perl-level operations will be expected to work with
characters rather than bytes.
I very much appreciate all your hard work on the internationalization of
Perl.
However, recently
On Fri, 02 Jan 2004 18:17:13 -0500, Martin Duerst [EMAIL PROTECTED] said:
Jungshik has also reported that
it fails with Perl 5.8.0 with a UTF-8 locale.
Perl 5.8.0 was very broken with UTF-8 locales since it auto-PERL_UNICODEd.
We saw (keep seeing) a lot of that since RedHat 8 and 9
if (eval "use bytes;") { use bytes; }
That would be
use if $] >= 5.006, 'bytes';
But you would have to make sure that if.pm is available, which is not an option IMO.
I think the way used in AxKit by the Matt/axkit-dev folks was to put
this line
$INC{'bytes.pm'}++ if $] < 5.006;
before any mention of
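A rough sketch of how the %INC trick could be laid out. Wrapping the line in a BEGIN block is an assumption on my part (the message is cut off before it says where the line goes); it is there because "use" runs at compile time. The length demonstration at the end assumes perl 5.6 or later:

```perl
use strict;

# On perls older than 5.6 there is no bytes.pm. Marking it as already
# loaded makes a later "use bytes" skip the file lookup instead of dying.
BEGIN { $INC{'bytes.pm'}++ if $] < 5.006 }

use bytes;   # the real pragma on 5.6+, a no-op on older perls

my $s = "\x{263A}";                           # one character, three UTF-8 bytes
my $byte_len = length $s;                     # byte semantics under "use bytes"
my $char_len = do { no bytes; length $s };    # character semantics
print "$byte_len bytes, $char_len character\n";   # 3 bytes, 1 character
```

Unlike the "use if" form, this does not depend on if.pm being installed.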
On Fri, 2 Jan 2004 11:56:12 +0100
Eric Cholet [EMAIL PROTECTED] wrote:
Thanks for your detailed reply. I looked into this and found that I
can use Unicode::Normalize to decompose a string in NFD form and then
remove the accents with a regex removing /\pM/. I wonder if I overlooked
a
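A small sketch of the approach described above: decompose with Unicode::Normalize's NFD, then delete everything matching \pM. The helper name strip_marks and the final NFC step are my own additions, not part of the original message:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Hypothetical helper name, for illustration only.
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);   # U+00E9 becomes "e" + U+0301 COMBINING ACUTE ACCENT
    $decomposed =~ s/\pM//g;       # drop every character with the Mark property
    return NFC($decomposed);       # recompose whatever is left (an added step)
}

# \x{c9} is LATIN CAPITAL LETTER E WITH ACUTE, \x{e9} the lowercase form.
print strip_marks("\x{c9}ric a \x{e9}crit"), "\n";   # Eric a ecrit
```

Letters that have no canonical decomposition, such as U+00F8 (o with stroke), are not affected by this and would need a separate mapping.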