Re: UTF-8 in web pages

2001-08-09 Thread Martin Duerst
At 22:39 01/08/06 -0700, Brian Stell wrote: >Netscape 4.x and earlier did not use Unicode. Not completely true. Netscape 4 used Unicode, but only in a separate code path, i.e. legacy encodings didn't get converted to Unicode. Regards, Martin.

Re: Unicode Normalization Forms

2001-08-09 Thread Martin Duerst
At 09:39 01/08/09 -0500, Jarkko Hietaniemi wrote: > > NormalizationTest-3.1.0.txt seems to have a few bugs. > > (exactly speaking, on nine lines) > >Have you reported this? If not, please do so as soon as possible >so that Unicode 3.1.1 will have them fixed. I reported them quite a while ago and

Re: Japanese text search problem

2001-08-09 Thread Martin Duerst
At 12:17 01/08/08 -0700, Benjamin Franz wrote: >Oh, yeah. I forgot about that since I don't normally keep stuff in >JIS/SJIS/EUC-JP once I've acquired it. I always make my working store >UTF8. In UTF8 the 'frame' problem doesn't exist because character start >bytes _ALWAYS_ have bit eight set to
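
The point about UTF-8 start bytes can be illustrated with a short sketch. In UTF-8 every continuation byte has the bit pattern 10xxxxxx, while a byte that begins a character is either 0xxxxxxx (ASCII) or 11xxxxxx, so a byte-level match cannot begin in the middle of a character. In Shift_JIS, by contrast, the second byte of a two-byte character can fall in the ASCII range. The helper names below are hypothetical, and Python is used only for illustration; it is not taken from the thread.

```python
def is_char_start(byte):
    """Return True if this byte begins a UTF-8 encoded character."""
    # Continuation bytes look like 10xxxxxx; any other byte starts a character.
    return (byte & 0xC0) != 0x80


def char_offsets(data):
    """Byte offsets at which a character begins in UTF-8 encoded data."""
    return [i for i, b in enumerate(data) if is_char_start(b)]


text = "日本語 ok"
utf8 = text.encode("utf-8")
offsets = char_offsets(utf8)
print(offsets)  # one offset per character in `text`

# The same katakana letter in two encodings: the Shift_JIS form ends in
# byte 0x5C (the ASCII backslash), the UTF-8 form contains no ASCII byte.
so_sjis = "ソ".encode("shift_jis")
so_utf8 = "ソ".encode("utf-8")
print(b"\\" in so_sjis, b"\\" in so_utf8)
```

A naive byte search for a backslash therefore reports a false hit in the Shift_JIS data but not in the UTF-8 data.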

Re: Unicode Normalization Forms

2001-08-09 Thread Bjoern Hoehrmann
* SADAHIRO Tomoyuki wrote: >How about the following interface? > >| $normalized_string = normalize($raw_string) >| >| You can use this function only if the normalization form >| you require is specified in the C<use> statement: >| >| use Text::Unicode::Normalize 'C'; # Normalization Form C Also fin

Re: Unicode Normalization Forms

2001-08-09 Thread Jarkko Hietaniemi
On Thu, Aug 09, 2001 at 11:57:24PM +0900, SADAHIRO Tomoyuki wrote: > > On Thu, 9 Aug 2001 09:39:41 -0500 > Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote: > > > > NormalizationTest-3.1.0.txt seems to have a few bugs. > > > (exactly speaking, on nine lines) > > > > Have you reported this? If not,

Re: Unicode Normalization Forms

2001-08-09 Thread SADAHIRO Tomoyuki
On Thu, 9 Aug 2001 09:39:41 -0500 Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote: > > NormalizationTest-3.1.0.txt seems to have a few bugs. > > (exactly speaking, on nine lines) > > Have you reported this? If not, please do so as soon as possible > so that Unicode 3.1.1 will have them fixed. > >

Re: Unicode Normalization Forms

2001-08-09 Thread SADAHIRO Tomoyuki
> > use Text::Unicode::Normalize; > > > > $stringNFD = NFD($string); # Normalization Form D > > $stringNFC = NFC($string); # Normalization Form C > > $stringNFKD = NFKD($string); # Normalization Form KD > > $stringNFKC = NFKC($string); # Normalization Form KC > > a normalize function in
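
The four forms named in the quoted interface can be compared on a concrete string. The sketch below uses Python's standard `unicodedata.normalize` rather than the Perl module under discussion, purely to show what each form produces; the sample string and the `codepoints` helper are assumptions chosen for the example.

```python
import unicodedata

# Sample input: precomposed e-acute (U+00E9) followed by the "fi" ligature (U+FB01).
s = "\u00e9\ufb01"

nfd = unicodedata.normalize("NFD", s)    # canonical decomposition
nfc = unicodedata.normalize("NFC", s)    # canonical decomposition, then composition
nfkd = unicodedata.normalize("NFKD", s)  # compatibility decomposition
nfkc = unicodedata.normalize("NFKC", s)  # compatibility decomposition, then composition


def codepoints(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


for name, value in (("NFD", nfd), ("NFC", nfc), ("NFKD", nfkd), ("NFKC", nfkc)):
    print(name, codepoints(value))
```

The canonical forms (NFD, NFC) leave the ligature alone, while the compatibility forms (NFKD, NFKC) replace it with the plain letters "f" and "i".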

Re: Unicode Normalization Forms

2001-08-09 Thread Jarkko Hietaniemi
> NormalizationTest-3.1.0.txt seems to have a few bugs. > (exactly speaking, on nine lines) Have you reported this? If not, please do so as soon as possible so that Unicode 3.1.1 will have them fixed. http://www.unicode.org/unicode/standard/versions/beta.html > This module requires the follow

Re: Unicode Normalization Forms

2001-08-09 Thread Jarkko Hietaniemi
On Thu, Aug 09, 2001 at 10:31:14AM +0100, Nick Ing-Simmons wrote: > Bjoern Hoehrmann <[EMAIL PROTECTED]> writes: > >* SADAHIRO Tomoyuki wrote: > >>Now a pre-release module to get Unicode Normalization Forms > >>(UAX #15) is available. > > > >Cool! :-) > > > >>NAME (a temporary name) > >> > >>Text