Re: [gtk-i18n-list] Determine the best encoding/script for a given text

mpsuzuki Tue, 13 Mar 2007 02:53:00 -0800

Hi,

Although some characters are used only in the PRC, or
only in Taiwan, and so on, a single codepoint is not
enough to tell Traditional Chinese, Simplified Chinese
and Vietnamese apart. Usually a string of some length
has to be examined, and even then the result is not
definitive. I suppose the glib designers do not want glib
to include non-definitive script-guessing algorithms, so
I think using another library would be better.
fontconfig's script detection algorithm might be
informative, although I often see posts from Chinese
users complaining about unexpected results.
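
To illustrate why a whole string is needed and why the answer stays
a guess, here is a minimal sketch in Python. It is not what glib or
fontconfig does. It assumes, purely for illustration, that the legacy
character sets GB2312 and Big5 can stand in for "used in Simplified
text" and "used in Traditional text". The function name
guess_chinese_variant is hypothetical.

```python
def _encodable(ch, codec):
    """Return True if the single character ch exists in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def guess_chinese_variant(text):
    """Heuristic guess: 'simplified', 'traditional' or 'undetermined'.

    Assumption: GB2312 approximates the Simplified repertoire and Big5
    the Traditional one. Characters present in both (or neither) carry
    no evidence, so short or neutral strings stay undetermined.
    """
    simplified_only = 0
    traditional_only = 0
    for ch in text:
        # Only look at CJK Unified Ideographs in the BMP (U+4E00..U+9FFF).
        if not ('\u4e00' <= ch <= '\u9fff'):
            continue
        in_gb = _encodable(ch, 'gb2312')
        in_big5 = _encodable(ch, 'big5')
        if in_gb and not in_big5:
            simplified_only += 1
        elif in_big5 and not in_gb:
            traditional_only += 1
    if simplified_only > traditional_only:
        return 'simplified'
    if traditional_only > simplified_only:
        return 'traditional'
    return 'undetermined'


if __name__ == '__main__':
    for sample in ('汉语', '漢語', '中文'):
        print(sample, guess_chinese_variant(sample))
```

Both characters of the last sample exist in GB2312 and in Big5, so
the sketch cannot decide, which is exactly the non-definitive case
described above.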


BTW, for Vietnamese script (Chu-Han & Chu-Nom), I'm
not sure whether pre-Unicode encodings are in popular use.
For example, TCVN 5773:1993 looks like a character set
that is the intersection of Chu-Han and the (existing) CJK
Unified Ideographs in the BMP, so I think it is not a good
character set to use as a character encoding for Vietnamese
script.


Regards,
mpsuzuki

Gaurav Jain wrote:
> Hi,
> 
> I need to find out the Script code for a given Unicode string.  I
> found the API g_unichar_get_script() available in GLIB 2.10 which does
> this, but this doesn't seem to have support for Chinese script.  For
> e.g., is it possible to find out if the given character falls under
> Traditional Chinese or Simplified Chinese code range?  Similarly for
> Vietnamese?
> 
> Is there any other API available in GLIB that I can use to determine
> the best encoding/script for a given text?
> 
> Thanks,
> Gaurav
> _______________________________________________
> gtk-i18n-list mailing list
> gtk-i18n-list@gnome.org
> http://mail.gnome.org/mailman/listinfo/gtk-i18n-list

