Mikko,

The Oracle UTF8 character set encodes a surrogate pair as two 3-byte sequences, one per surrogate, so that it stays in sync with UTF-16 in binary sort order and code point values. If you want to match surrogate support exactly, you therefore have the same problem working out how many bytes a string needs in UTF8 as working out how many ushorts it needs in UTF-16. However, since storage for the varchar type is allocated dynamically based on the actual data, you can simply declare the size a little larger to allow for potential surrogate support.
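
To make the byte counting concrete, here is a minimal sketch in Python. It assumes the scheme described above (each surrogate of a pair stored as its own 3-byte sequence, so 6 bytes per supplementary character) and compares it with standard UTF-8 and with the UTF-16 unit count. The function names are illustrative only and are not part of any Oracle API.

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit units (ushorts) needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def standard_utf8_bytes(text: str) -> int:
    """Bytes needed in standard UTF-8 (4 bytes per supplementary character)."""
    return len(text.encode("utf-8"))


def surrogate_pair_utf8_bytes(text: str) -> int:
    """Bytes needed when each surrogate is stored as its own 3-byte sequence.

    Assumption: this mirrors the Oracle UTF8 behaviour described above.
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1      # ASCII
        elif cp < 0x800:
            total += 2      # two-byte range
        elif cp < 0x10000:
            total += 3      # rest of the Basic Multilingual Plane
        else:
            total += 6      # surrogate pair: 2 x 3 bytes
    return total


# 'A', e-acute, a BMP ideograph, and U+20000 (first code point of Extension B)
sample = "A\u00e9\u4e2d\U00020000"
print(utf16_units(sample), standard_utf8_bytes(sample),
      surrogate_pair_utf8_bytes(sample))
```

Under this scheme a string never needs more than 3 bytes per UTF-16 unit, so a size of 3 times the UTF-16 length is always enough.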

Regards,
Jianping.

Mikko Lahti wrote:

What is the recommendation when it comes to dealing with surrogate pairs and supporting CJK Unified Ideographs Extension B (especially HKSCS), which will be in the next version of the Unicode Standard?

Mikko

-----Original Message-----
From: Jianping Yang [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 24, 2000 5:08 PM
To: Mikko Lahti
Cc: Unicode List
Subject: Re: Oracle and Surrogate Pairs

Mikko, 

Since no characters are defined in the surrogate range in Unicode 3.0, the maximum width of a character in the Oracle UTF8 character set is 3 bytes. I recommend sizing the column, in bytes, at 3 times the number of characters you intend to store.
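
As a rough illustration of that sizing rule, the following Python sketch multiplies the intended character count by 3 and checks whether a given string fits. It assumes only Basic Multilingual Plane characters are stored, as stated above; the helper names are hypothetical.

```python
BYTES_PER_BMP_CHAR = 3  # maximum UTF-8 width of a BMP character


def column_bytes(max_chars: int, bytes_per_char: int = BYTES_PER_BMP_CHAR) -> int:
    """Byte size to declare for a column meant to hold max_chars characters."""
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    return max_chars * bytes_per_char


def fits(text: str, max_chars: int) -> bool:
    """True if text, encoded as UTF-8, fits in a column sized for max_chars.

    Assumption: text contains only BMP characters (no surrogate pairs).
    """
    return len(text.encode("utf-8")) <= column_bytes(max_chars)


print(column_bytes(40))            # size for a 40-character field
print(fits("\u4e2d" * 40, 40))     # 40 three-byte ideographs
```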

Regards,
Jianping.

Mikko Lahti wrote: 

What is the correct way to support surrogate pairs in Oracle 8? Is there anything wrong with making fields 3 times as long as their ASCII size, or should they be 4 times as long, as per the UTF-8 spec?

Later,

Mikko
Globalization Specialist
Onyx Software
[EMAIL PROTECTED]
www.onyx.com
425.519.4172

