Re: Unicode in C

2012-03-12 Thread kobi zamir
Enchant uses hspell as is (ISO-8859-8) and just converts the strings when calling the hspell lib: http://www.abisource.com/viewvc/enchant/trunk/src/hspell/hspell_provider.c?view=markup IMHO, because hspell only uses Hebrew, it can internally keep using a Hebrew-only charset without nikud, ISO-8859-8 (

Re: Unicode in C

2012-03-12 Thread Elazar Leibovich
On Mon, Mar 12, 2012 at 7:37 PM, Nadav Har'El wrote: > On Mon, Mar 12, 2012, Elazar Leibovich wrote about "Re: Unicode in C": > > The simplest option is, to accept StringPiece-like structure (pointer to > > buffer + size), and encoding, then to convert the data internally to your > > encoding (say

Re: Unicode in C

2012-03-12 Thread Nadav Har'El
On Mon, Mar 12, 2012, Elazar Leibovich wrote about "Re: Unicode in C": > The simplest option is, to accept StringPiece-like structure (pointer to > buffer + size), and encoding, then to convert the data internally to your > encoding (say, ISO-8859-8, replacing illegal characters with whitespace), >

Re: Unicode in C

2012-03-12 Thread Elazar Leibovich
On Mon, Mar 12, 2012 at 5:39 PM, E L wrote: > What's the advantage of using ucs-4 internally? > Especially if the program needs to save memory (embedded devices are > pretty common these days). > UTF-32 (or UCS-4) is the only encoding form that allows random access to each Unicode code point, each

Re: Unicode in C

2012-03-12 Thread Ely Levy
What's the advantage of using ucs-4 internally? Especially if the program needs to save memory (embedded devices are pretty common these days). Ely 2012/3/12 Dov Grobgeld > My suggestion is go the glib/gtk approach and use utf-8 everywhere and > have the API accept char*, i.e. there is no typed

Re: Unicode in C

2012-03-12 Thread Dov Grobgeld
My suggestion is to go the glib/gtk approach: use UTF-8 everywhere and have the API accept char*, i.e. there is no typedef for Unicode character strings. If this is not acceptable because of speed (its only tradeoff), then use UCS-4 internally and provide two external interfaces for UCS-4

Re: Unicode in C

2012-03-12 Thread Elazar Leibovich
The simplest option is to accept a StringPiece-like structure (pointer to buffer + size) and an encoding, then convert the data internally to your encoding (say, ISO-8859-8, replacing illegal characters with whitespace), and convert the output back. Do you mind using an iconv-like library? On

Re: Unicode in C

2012-03-12 Thread Elazar Leibovich
On Mon, Mar 12, 2012 at 3:20 PM, Omer Zak wrote: > > If you need to use Far Eastern fonts and/or have random access for your > text, use fixed size wide character encoding (16 bit or 32 bit size). > > Note that UTF-16 doesn't really offer random access, due to surrogate pairs (not all Unicode co

Re: Unicode in C

2012-03-12 Thread Omer Zak
It depends upon your tradeoffs. If you use mostly Western fonts (Latin, Hebrew, etc.) and want to economize on memory use, use UTF-8. However, for Chinese, it costs more memory than it saves. If you need to use Far Eastern fonts and/or have random access for your text, use fixed size wide charact

Unicode in C

2012-03-12 Thread Nadav Har'El
Hi, I have a question that I was sort of sad that I couldn't readily find the answer to... Let's say I want to create a C API (a C library), with functions which take strings as arguments. What am I supposed to use if I want these strings to be in any language? Obviously the answer is "Unicode", b